1487122675 Q * sannes Ping timeout: 480 seconds
1487123223 J * sannes ~ace@2a02:fe0:c131:9070:2daa:c503:b0b3:18c4
1487124796 J * derjohn_mobi ~aj@p2003008E6C1E49007D250562C1BF6825.dip0.t-ipconnect.de
1487125225 Q * derjohn_mob Ping timeout: 480 seconds
1487128940 J * fstd_ ~fstd@x4e301437.dyn.telefonica.de
1487129405 Q * fstd Ping timeout: 480 seconds
1487129405 N * fstd_ fstd
1487129765 Q * sannes Ping timeout: 480 seconds
1487130303 J * sannes ~ace@2a02:fe0:c131:9070:c5f:93de:b455:6e6c
1487136905 Q * sannes Ping timeout: 480 seconds
1487137491 J * sannes ~ace@2a02:fe0:c131:9070:e56a:84f0:51aa:293a
1487141175 Q * derjohn_mobi Ping timeout: 480 seconds
1487143800 J * derjohn_mobi ~aj@46.183.103.17
1487144065 Q * sannes Ping timeout: 480 seconds
1487144711 J * sannes ~ace@2a02:fe0:c131:9070:88e6:a58e:fbba:d82f
1487145780 J * Ghislain ~ghislain@adsl1.aqueos.com
1487147324 M * Bertl_oO off to bed now ... have a good one everyone!
1487147326 N * Bertl_oO Bertl_zZ
1487148428 J * nikolay ~nikolay@HOST.255.3.ixos.de
1487149135 Q * derjohn_mobi Ping timeout: 480 seconds
1487150456 J * derjohn_mobi ~aj@2a01:598:a841:e231:b5e7:3a22:317a:f44b
1487151185 Q * derjohn_mobi Ping timeout: 480 seconds
1487151195 Q * sannes Ping timeout: 480 seconds
1487151690 J * sannes ~ace@2a02:fe0:c131:9070:79d3:e08c:f6ce:4822
1487154054 Q * Ghislain Quit: Leaving.
1487154134 J * Ghislain ~ghislain@adsl1.aqueos.com
1487154661 Q * Ghislain Quit: Leaving.
1487155196 J * Ghislain ~ghislain@adsl1.aqueos.com
1487158263 Q * sannes Ping timeout: 480 seconds
1487158855 J * sannes ~ace@2a02:fe0:c131:9070:7db9:7a67:fe49:6ff8
1487160823 J * derjohn_mob ~aj@tmo-122-213.customers.d1-online.com
1487163825 Q * Aiken Remote host closed the connection
1487164268 Q * derjohn_mob Ping timeout: 480 seconds
1487165368 Q * sannes Ping timeout: 480 seconds
1487166032 J * sannes ~ace@2a02:fe0:c131:9070:a019:4986:7bb0:4a00
1487169318 J * derjohn_mob ~aj@tmo-122-213.customers.d1-online.com
1487169660 N * Bertl_zZ Bertl
1487169662 M * Bertl morning folks!
1487169740 M * nikolay afternoon Bertl
1487169825 M * Bertl how's going?
1487169831 M * Guy- my load spikes are back :)
1487169858 M * Guy- but I don't have time to investigate right now
1487169866 M * Guy- (also, hello and all)
1487169903 M * Bertl do you have swap space configured? if so, how much compared to the memory and what is your swappiness?
1487169914 M * nikolay +11 degrees, weather is nice...
1487170103 M * Bertl can't complain here either ... sunny and relatively warm
1487170586 M * Guy- KiB Mem : 32729512 total, 5011888 free, 25972360 used, 1745264 buff/cache
1487170586 M * Guy- KiB Swap: 3779580 total, 3687372 free, 92208 used. 6308688 avail Mem
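[The figures Guy- pastes above are the memory summary lines from top(1). A minimal sketch of how to collect the same numbers on a comparable box, using only standard procfs/procps tools; nothing here is VServer-specific:]

    # memory / swap summary as printed by top (batch mode, single iteration)
    top -bn1 | head -n 5

    # same information from procfs, plus the current swappiness setting
    free -k
    cat /proc/sys/vm/swappiness    # 60 is the kernel default, matching Guy-'s value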
1487170599 M * Guy- swappiness is 60
1487170616 M * Guy- according to munin there is very little (if any) swapping going on, also during load spikes
1487170622 M * Guy- swap is on SSDs
1487170641 M * Guy- kernel is 4.1.36-vs2.3.8.5.2, fwiw
1487170682 M * Bertl yeah, doesn't look like the swap space is used much
1487170740 M * Guy- when I have more time I'll reproduce the problem interactively and try to see what's going on
1487170745 M * Bertl but the buffer/cache memory is quite small which suggests that the memory is mostly occupied
1487170771 M * Guy- I think it's mostly used by zfs for caching, which doesn't show up in buff/cache (iiuc)
1487170855 M * Guy- but (again according to munin) minimum unused is still over 3GB
1487170871 M * Guy- so I don't think there is much memory pressure
1487170894 M * Bertl well, it might mean that the inode caches get purged quite often
1487170921 M * Bertl which is not bad per se, but might explain the sudden activity when you traverse the filesystem(s)
1487170982 N * DLange DLange_dc
1487170989 M * Guy- yes, that's plausible
1487171015 M * Bertl I would suggest to have a cron job like 10 or 15 minutes before the rsync, which just iterates over the files to be synced
1487171021 M * Guy- inode cache is slowly and unsteadily declining until first load spike, then peaking along with load
1487171025 M * Bertl then see if that moves the spike :)
1487171066 M * Guy- then it scales back a bit (to about 50% of max), and when the next backup job runs it goes up again (along with the load)
1487171088 M * Guy- I have looked at this before but haven't really found a good way of tuning the inode cache
1487171123 Q * derjohn_mob Ping timeout: 480 seconds
1487171566 M * Guy- maybe I should decrease vfs_cache_pressure?
1487171986 M * Bertl maybe ... really depends on what your other vm settings are (min free, watermark scale, etc)
1487172080 M * Guy- defaults; I didn't touch anything
1487172511 Q * nikolay Quit: Leaving
1487172543 Q * sannes Ping timeout: 480 seconds
1487173206 J * sannes ~ace@2a02:fe0:c131:9070:bd62:6462:406f:9573
1487178382 N * DLange_dc DLange
1487179660 Q * sannes Ping timeout: 480 seconds
1487180226 J * sannes ~ace@2a02:fe0:c131:9070:6ce1:4eee:335a:2f50
1487186737 Q * sannes Ping timeout: 480 seconds
1487187310 J * sannes ~ace@2a02:fe0:c131:9070:9dd8:fe8f:31e3:df7a
1487190678 J * Aiken ~Aiken@d63f.h.jbmb.net
1487192192 M * Guy- fwiw, call_rwsem_down_read_failed() is another kernel function lots of processes are hanging in (in D state), when the load spikes happen
1487192285 M * Guy- and it takes about a minute to recover, even after I suspend the find(1) that I use to induce the load spike
1487192287 M * daniel_hozac do you have a trace of where it is invoked from?
1487192308 M * Bertl probably read()
1487192323 M * Guy- daniel_hozac: no, but I'll try to get one
1487192366 M * Bertl hey daniel_hozac, how's going?
1487192410 J * derjohn_mob ~aj@46.183.103.8
1487192436 M * daniel_hozac pretty good, been far too busy lately... how about you?
1487192479 M * Bertl same here, but everything fine so far, thanks for asking!
1487192511 M * Guy- trace: http://sprunge.us/GJcg
1487192550 M * Guy- there are quite a few zfs related calls in there
1487192809 M * Guy- but I don't see this behaviour on other boxes that also use zfs, so I'm puzzled
1487192963 M * daniel_hozac it looks starved for IO to me.
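[A sketch of Bertl's cache-warming suggestion and the vfs_cache_pressure tweak discussed above. The backup path and the schedule are invented for illustration; the log does not say what the rsync covers or when it runs:]

    # /etc/cron.d/prewarm-inode-cache  (hypothetical: assumes the rsync starts
    # at 02:00, so walk the tree 15 minutes earlier to populate dentry/inode caches;
    # -printf '%i %s' forces a stat of every entry)
    45 1 * * * root find /srv/backup-source -xdev -printf '%i %s\n' > /dev/null 2>&1

    # make the kernel reclaim dentry/inode caches less aggressively
    # (100 is the default; Guy- later reports having lowered it)
    sysctl -w vm.vfs_cache_pressure=50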
1487193006 M * Guy- daniel_hozac: iostat -x shows no activity
1487193090 M * Guy- to me it looks like the kernel is doing some internal housekeeping that's blocking all these processes
1487193109 M * Guy- like trying to free up memory oslt
1487193171 M * Guy- (but maybe not specifically that, because there is plenty of free memory)
1487193200 M * Guy- it does seem to be related to the inode cache
1487193212 M * Guy- whenever the inode cache has to grow quickly I get these load spikes
1487193231 M * Guy- (this is not just a gut feeling, it's visible in munin graphs)
1487193374 M * Guy- oh and there was a change between 3.10.16-vs2.3.6.6 and 4.1.36-vs2.3.8.5.2 that caused the inode cache to be smaller; looking at the yearly graph it seems the inode table size always hovered around 1.6 million with the old kernel but it's around 1 million with the new one (frequently even smaller)
1487193412 M * Guy- I realize this is probably a change in the vanilla kernel, but I'd be grateful for hints on how to improve the situation
1487193423 M * Guy- I already decreased vfs_cache_pressure
1487193443 M * Guy- so maybe the kernel won't shrink the inode cache so aggressively? we'll see
1487193942 Q * sannes Ping timeout: 480 seconds
1487194491 J * sannes ~ace@2a02:fe0:c131:9070:8c94:cd2f:f02a:efb9
1487196874 Q * derjohn_mob Ping timeout: 480 seconds
1487197314 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:b576:138f:bd64:aefd
1487199764 Q * Long_yanG Remote host closed the connection
1487200129 J * LongyanG ~long@15255.s.t4vps.eu
1487201035 Q * sannes Ping timeout: 480 seconds
1487201673 J * sannes ~ace@2a02:fe0:c131:9070:ac92:7639:f351:dcc
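[A sketch of how the observations in this part of the discussion can be reproduced: watching the inode/dentry caches that munin graphs, capturing where D-state tasks are blocked during a spike, and re-checking the "no disk activity" claim. Commands require root; sysrq must be enabled for the task dump:]

    # inode cache size as seen by the kernel (allocated and free inode counts)
    cat /proc/sys/fs/inode-nr
    grep -E 'dentry|inode_cache' /proc/slabinfo

    # during a spike: dump stacks of blocked (D-state) tasks to the kernel log,
    # or read the stack of one stuck process directly
    echo w > /proc/sysrq-trigger && dmesg | tail -n 100
    cat /proc/<pid>/stack    # substitute the PID of a D-state process

    # confirm whether the disks are actually idle while processes hang
    iostat -x 1 5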