1487122675 Q * sannes Ping timeout: 480 seconds
1487123223 J * sannes ~ace@2a02:fe0:c131:9070:2daa:c503:b0b3:18c4
1487124796 J * derjohn_mobi ~aj@p2003008E6C1E49007D250562C1BF6825.dip0.t-ipconnect.de
1487125225 Q * derjohn_mob Ping timeout: 480 seconds
1487128940 J * fstd_ ~fstd@x4e301437.dyn.telefonica.de
1487129405 Q * fstd Ping timeout: 480 seconds
1487129405 N * fstd_ fstd
1487129765 Q * sannes Ping timeout: 480 seconds
1487130303 J * sannes ~ace@2a02:fe0:c131:9070:c5f:93de:b455:6e6c
1487136905 Q * sannes Ping timeout: 480 seconds
1487137491 J * sannes ~ace@2a02:fe0:c131:9070:e56a:84f0:51aa:293a
1487141175 Q * derjohn_mobi Ping timeout: 480 seconds
1487143800 J * derjohn_mobi ~aj@46.183.103.17
1487144065 Q * sannes Ping timeout: 480 seconds
1487144711 J * sannes ~ace@2a02:fe0:c131:9070:88e6:a58e:fbba:d82f
1487145780 J * Ghislain ~ghislain@adsl1.aqueos.com
1487147324 M * Bertl_oO off to bed now ... have a good one everyone!
1487147326 N * Bertl_oO Bertl_zZ
1487148428 J * nikolay ~nikolay@HOST.255.3.ixos.de
1487149135 Q * derjohn_mobi Ping timeout: 480 seconds
1487150456 J * derjohn_mobi ~aj@2a01:598:a841:e231:b5e7:3a22:317a:f44b
1487151185 Q * derjohn_mobi Ping timeout: 480 seconds
1487151195 Q * sannes Ping timeout: 480 seconds
1487151690 J * sannes ~ace@2a02:fe0:c131:9070:79d3:e08c:f6ce:4822
1487154054 Q * Ghislain Quit: Leaving.
1487154134 J * Ghislain ~ghislain@adsl1.aqueos.com
1487154661 Q * Ghislain Quit: Leaving.
1487155196 J * Ghislain ~ghislain@adsl1.aqueos.com
1487158263 Q * sannes Ping timeout: 480 seconds
1487158855 J * sannes ~ace@2a02:fe0:c131:9070:7db9:7a67:fe49:6ff8
1487160823 J * derjohn_mob ~aj@tmo-122-213.customers.d1-online.com
1487163825 Q * Aiken Remote host closed the connection
1487164268 Q * derjohn_mob Ping timeout: 480 seconds
1487165368 Q * sannes Ping timeout: 480 seconds
1487166032 J * sannes ~ace@2a02:fe0:c131:9070:a019:4986:7bb0:4a00
1487169318 J * derjohn_mob ~aj@tmo-122-213.customers.d1-online.com
1487169660 N * Bertl_zZ Bertl
1487169662 M * Bertl morning folks!
1487169740 M * nikolay afternoon Bertl
1487169825 M * Bertl how's going?
1487169831 M * Guy- my load spikes are back :)
1487169858 M * Guy- but I don't have time to investigate right now
1487169866 M * Guy- (also, hello and all)
1487169903 M * Bertl do you have swap space configured? if so, how much compared to the memory and what is your swappiness?
1487169914 M * nikolay +11 degrees, weather is nice...
1487170103 M * Bertl can't complain here either ... sunny and relatively warm
1487170586 M * Guy- KiB Mem : 32729512 total, 5011888 free, 25972360 used, 1745264 buff/cache
1487170586 M * Guy- KiB Swap: 3779580 total, 3687372 free, 92208 used. 6308688 avail Mem
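[The figures Guy- pastes above are the memory summary lines from top(1). A minimal sketch of how to collect the same numbers on a comparable box, using only standard procfs/procps tools; nothing here is VServer-specific:]

    # memory / swap summary as printed by top (batch mode, single iteration)
    top -bn1 | head -n 5

    # same information from procfs, plus the current swappiness setting
    free -k
    cat /proc/sys/vm/swappiness    # 60 is the kernel default, matching Guy-'s value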
1487170599 M * Guy- swappiness is 60
1487170616 M * Guy- according to munin there is very little (if any) swapping going on, also during load spikes
1487170622 M * Guy- swap is on SSDs
1487170641 M * Guy- kernel is 4.1.36-vs2.3.8.5.2, fwiw
1487170682 M * Bertl yeah, doesn't look like the swap space is used much
1487170740 M * Guy- when I have more time I'll reproduce the problem interactively and try to see what's going on
1487170745 M * Bertl but the buffer/cache memory is quite small which suggests that the memory is mostly occupied
1487170771 M * Guy- I think it's mostly used by zfs for caching, which doesn't show up in buff/cache (iiuc)
1487170855 M * Guy- but (again according to munin) minimum unused is still over 3GB
1487170871 M * Guy- so I don't think there is much memory pressure
1487170894 M * Bertl well, it might mean that the inode caches get purged quite often
1487170921 M * Bertl which is not bad per se, but might explain the sudden activity when you traverse the filesystem(s)
1487170982 N * DLange DLange_dc
1487170989 M * Guy- yes, that's plausible
1487171015 M * Bertl I would suggest to have a cron job like 10 or 15 minutes before the rsync, which just iterates over the files to be synced
1487171021 M * Guy- inode cache is slowly and unsteadily declining until first load spike, then peaking along with load
1487171025 M * Bertl then see if that moves the spike :)
1487171066 M * Guy- then it scales back a bit (to about 50% of max), and when the next backup job runs it goes up again (along with the load)
1487171088 M * Guy- I have looked at this before but haven't really found a good way of tuning the inode cache
1487171123 Q * derjohn_mob Ping timeout: 480 seconds
1487171566 M * Guy- maybe I should decrease vfs_cache_pressure?
1487171986 M * Bertl maybe ... really depends on what your other vm settings are (min free, watermark scale, etc)
1487172080 M * Guy- defaults; I didn't touch anything
1487172511 Q * nikolay Quit: Leaving
1487172543 Q * sannes Ping timeout: 480 seconds
1487173206 J * sannes ~ace@2a02:fe0:c131:9070:bd62:6462:406f:9573
1487178382 N * DLange_dc DLange
1487179660 Q * sannes Ping timeout: 480 seconds
1487180226 J * sannes ~ace@2a02:fe0:c131:9070:6ce1:4eee:335a:2f50
1487186737 Q * sannes Ping timeout: 480 seconds
1487187310 J * sannes ~ace@2a02:fe0:c131:9070:9dd8:fe8f:31e3:df7a
1487190678 J * Aiken ~Aiken@d63f.h.jbmb.net
1487192192 M * Guy- fwiw, call_rwsem_down_read_failed() is another kernel function lots of processes are hanging in (in D state), when the load spikes happen
1487192285 M * Guy- and it takes about a minute to recover, even after I suspend the find(1) that I use to induce the load spike
1487192287 M * daniel_hozac do you have a trace of where it is invoked from?
1487192308 M * Bertl probably read()
1487192323 M * Guy- daniel_hozac: no, but I'll try to get one
1487192366 M * Bertl hey daniel_hozac, how's going?
1487192410 J * derjohn_mob ~aj@46.183.103.8
1487192436 M * daniel_hozac pretty good, been far too busy lately... how about you?
1487192479 M * Bertl same here, but everything fine so far, thanks for asking!
1487192511 M * Guy- trace: http://sprunge.us/GJcg
1487192550 M * Guy- there are quite a few zfs related calls in there
1487192809 M * Guy- but I don't see this behaviour on other boxes that also use zfs, so I'm puzzled
1487192963 M * daniel_hozac it looks starved for IO to me.
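[A sketch of Bertl's cache-warming suggestion and the vfs_cache_pressure tweak discussed above. The backup path and the schedule are invented for illustration; the log does not say what the rsync covers or when it runs:]

    # /etc/cron.d/prewarm-inode-cache  (hypothetical: assumes the rsync starts
    # at 02:00, so walk the tree 15 minutes earlier to populate dentry/inode caches;
    # -printf '%i %s' forces a stat of every entry)
    45 1 * * * root find /srv/backup-source -xdev -printf '%i %s\n' > /dev/null 2>&1

    # make the kernel reclaim dentry/inode caches less aggressively
    # (100 is the default; Guy- later reports having lowered it)
    sysctl -w vm.vfs_cache_pressure=50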
1487193006 M * Guy- daniel_hozac: iostat -x shows no activity
1487193090 M * Guy- to me it looks like the kernel is doing some internal housekeeping that's blocking all these processes
1487193109 M * Guy- like trying to free up memory oslt
1487193171 M * Guy- (but maybe not specifically that, because there is plenty of free memory)
1487193200 M * Guy- it does seem to be related to the inode cache
1487193212 M * Guy- whenever the inode cache has to grow quickly I get these load spikes
1487193231 M * Guy- (this is not just a gut feeling, it's visible in munin graphs)
1487193374 M * Guy- oh and there was a change between 3.10.16-vs2.3.6.6 and 4.1.36-vs2.3.8.5.2 that caused the inode cache to be smaller; looking at the yearly graph it seems the inode table size always hovered around 1.6 million with the old kernel but it's around 1 million with the new one (frequently even smaller)
1487193412 M * Guy- I realize this is probably a change in the vanilla kernel, but I'd be grateful for hints on how to improve the situation
1487193423 M * Guy- I already decreased vfs_cache_pressure
1487193443 M * Guy- so maybe the kernel won't shrink the inode cache so aggressively? we'll see
1487193942 Q * sannes Ping timeout: 480 seconds
1487194491 J * sannes ~ace@2a02:fe0:c131:9070:8c94:cd2f:f02a:efb9
1487196874 Q * derjohn_mob Ping timeout: 480 seconds
1487197314 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:b576:138f:bd64:aefd
1487199764 Q * Long_yanG Remote host closed the connection
1487200129 J * LongyanG ~long@15255.s.t4vps.eu
1487201035 Q * sannes Ping timeout: 480 seconds
1487201673 J * sannes ~ace@2a02:fe0:c131:9070:ac92:7639:f351:dcc
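[A sketch of how the observations in this part of the discussion can be reproduced: watching the inode/dentry caches that munin graphs, capturing where D-state tasks are blocked during a spike, and re-checking the "no disk activity" claim. Commands require root; sysrq must be enabled for the task dump:]

    # inode cache size as seen by the kernel (allocated and free inode counts)
    cat /proc/sys/fs/inode-nr
    grep -E 'dentry|inode_cache' /proc/slabinfo

    # during a spike: dump stacks of blocked (D-state) tasks to the kernel log,
    # or read the stack of one stuck process directly
    echo w > /proc/sysrq-trigger && dmesg | tail -n 100
    cat /proc/<pid>/stack    # substitute the PID of a D-state process

    # confirm whether the disks are actually idle while processes hang
    iostat -x 1 5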