1444360776 J * ripstop ~knoppix@199-7-158-56.eng.wind.ca
1444366568 Q * ripstop Ping timeout: 480 seconds
1444369887 J * derjohn_mob ~aj@tmo-111-131.customers.d1-online.com
1444370783 M * Bertl off to bed now ... have a good one everyone!
1444370784 N * Bertl Bertl_zZ
1444373224 J * Ghislain ~aqueos@adsl1.aqueos.com
1444375235 J * Gremble ~Gremble@cpc29-aztw22-2-0-cust128.18-1.cable.virginm.net
1444376375 Q * Gremble Quit: I Leave
1444376688 M * Ghislain this is the second time i get a process locked in unknown land that makes it completely impossible to run top/ps/lsof etc. on the machine (it hangs the process indefinitely if i do)
1444376709 M * Ghislain the only thing i have in the logs (cannot be sure this is related, but i got it each time) is vcontext[10837]: segfault at 7fff713b4ff8 ip 000000000040329f sp 00007fff713b5000 error 6 in vcontext[400000+9000]
1444379004 Q * derjohn_mob Read error: Connection reset by peer
1444379302 J * derjohn_mob ~aj@tmo-111-131.customers.d1-online.com
1444380162 Q * derjohn_mob Ping timeout: 480 seconds
1444381322 J * derjohn_mob ~aj@185.7.33.128
1444386599 J * fstd_ ~fstd@xdsl-84-44-220-50.netcologne.de
1444386781 Q * fstd Read error: Connection reset by peer
1444386782 N * fstd_ fstd
1444390727 N * Bertl_zZ Bertl
1444390728 M * Bertl morning folks!
1444390752 M * Bertl Ghislain: very interesting ...
1444390763 M * Bertl daniel_hozac: any idea what could segfault in vcontext?
1444391074 Q * Aiken Remote host closed the connection
1444391853 Q * derjohn_mob Ping timeout: 480 seconds
1444391960 Q * fstd Remote host closed the connection
1444391971 J * fstd ~fstd@xdsl-87-78-143-156.netcologne.de
1444392202 M * Ghislain the ps has been hung for 5h now, it does not react to ctrl-z, ctrl-d or anything. cannot really investigate here as the basic process tools just hang
1444392361 M * Bertl why was the vcontext executed?
1444392497 M * Ghislain no idea what vcontext does at all
1444392537 M * Ghislain if vnamespace does a vcontext, it could be the backup that does an rsync in the guest context
1444392562 M * Ghislain the hour would correspond to the backup hours
1444392595 M * Bertl so you change into the guest to make a backup?
1444392613 M * Ghislain yes to have access to the guest diskS
1444392619 M * Ghislain i mean partitions
1444392655 M * Ghislain and launch mysql dumps etc...
1444392686 M * Bertl I see, and somehow that causes a segfault in vcontext
1444392717 M * Bertl the memory on the host system is verified?
1444392734 M * Bertl i.e. either via ECC or by memory tests
1444392796 M * Ghislain it has been rebooted some days ago and the bios memory test did not show any issues
1444392822 M * Bertl well, the bios memory test seldom shows anything, except for the memory size :)
1444392824 M * Ghislain it is not ecc
1444392872 M * Bertl do the individual incidents show similar or even identical addresses for the segfault?
1444392881 M * Bertl can you collect and upload them somewhere?
1444392999 M * Ghislain http://pastebin.com/raw.php?i=fESrSzYD
1444393262 M * Ghislain when i strace the ps auxwf that hangs, it stops at:
1444393262 M * Ghislain open("/proc/10837/cmdline", O_RDONLY) = 6
1444393262 M * Ghislain read(6,
1444393276 M * Ghislain read(6, "Name:\tvcontext\nState:\tD (disk sleep)
1444393313 M * Ghislain i mean it reads /proc/xxx/status, which says read(6, "Name:\tvcontext\nState:\tD (disk sleep)... and then hangs on open("/proc/10837/cmdline", O_RDONLY) = 6 read(6,
1444393608 J * derjohn_mob ~aj@185.7.33.128
1444393838 M * Ghislain don't know if this helps
1444395978 M * Ghislain does the vserver patch change anything in the oom killer event remotely ?
1444396000 M * Bertl not in recent kernels
1444396033 M * Ghislain i bet you mean in post-memcgroup kernels like 3.4.108
1444396828 M * Bertl yep
1444401021 Q * derjohn_mob Ping timeout: 480 seconds
1444401419 J * Gremble ~Gremble@cpc29-aztw22-2-0-cust128.18-1.cable.virginm.net
1444403564 M * Ghislain this guest has no cgroup related limits on the disk
1444404483 M * Ghislain this seems a lot like what is described here https://rachelbythebay.com/w/2014/10/27/ps/
1444404497 M * Ghislain this is why i wondered if any limit was hit that locks it
1444404532 M * Ghislain i have no cgroup limits apart from cpuset and swappiness, no vserver limits set, only defaults
1444405878 J * bonbons ~bonbons@2001:a18:204:4d01:81db:98ac:8d89:4a5d
1444409579 M * daniel_hozac i don't know why vcontext would segfault. no other kernel messages aside from that?
1444409638 M * daniel_hozac looks like the kernel is unhappy about the process though.
1444410856 Q * Jb_boin Ping timeout: 480 seconds
1444411191 J * Jb_boin ~dedior@proxad.eu
1444415885 M * Bertl yes, but I think that might be the result of the segfault
1444415912 M * Bertl i.e. something causes the segfault, which seems to happen at similar locations in the executable
1444415947 M * Bertl and once the process has segfaulted, accessing (proc) data about that specific process locks up the kernel (or at least ps)
1444415970 M * Bertl (but I might be wrong here)
1444416105 M * daniel_hozac can you get the command line through magic sysrq?
1444416138 M * daniel_hozac that might tell us where it is coming from at least.
1444416435 Q * Ghislain Quit: Leaving.
1444418142 Q * Gremble Quit: I Leave
1444418950 M * Bertl hmm, might be in the task dump
1444418972 M * Bertl but if not, we can probably add it, if the issue is somewhat reproducible
1444422913 J * Aiken ~Aiken@d63f.h.jbmb.net
1444423491 M * Bertl off for a nap ... bbl
1444423493 N * Bertl Bertl_zZ
1444426020 Q * bonbons Quit: Leaving
1444435160 Q * fstd Remote host closed the connection
1444435174 J * fstd ~fstd@xdsl-84-44-227-66.netcologne.de
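
Bertl asked above whether the individual incidents show similar or identical segfault addresses. A minimal Python sketch for comparing such lines follows. It assumes the x86 kernel message format quoted in the log and the usual x86 page-fault error bits (bit 0: page present, bit 1: write access, bit 2: user mode). The names `SEGFAULT_RE` and `decode_segfault` are made up for illustration and are not part of util-vserver or any other tool mentioned here.

```python
import re

# Matches lines like the one quoted in the log:
#   vcontext[10837]: segfault at ADDR ip IP sp SP error N in vcontext[BASE+SIZE]
# All numeric fields except the pid are treated as hexadecimal.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[^\[\s]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+)"
    r"(?: in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\])?"
)


def decode_segfault(line):
    """Return the decoded fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    addr = int(m.group("addr"), 16)
    ip = int(m.group("ip"), 16)
    sp = int(m.group("sp"), 16)
    err = int(m.group("err"), 16)
    info = {
        "comm": m.group("comm"),
        "pid": int(m.group("pid")),
        "addr": addr,
        "ip": ip,
        "sp": sp,
        # Assumed x86 page-fault error bits.
        "page_present": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
        # Distance of the faulting address from the stack pointer.
        "addr_minus_sp": addr - sp,
        # Offset of the instruction inside the mapped object, if reported.
        "ip_offset": None,
    }
    if m.group("base") is not None:
        info["ip_offset"] = ip - int(m.group("base"), 16)
    return info


if __name__ == "__main__":
    sample = ("vcontext[10837]: segfault at 7fff713b4ff8 ip 000000000040329f "
              "sp 00007fff713b5000 error 6 in vcontext[400000+9000]")
    for key, value in decode_segfault(sample).items():
        print(key, hex(value) if key in ("addr", "ip", "sp", "ip_offset") else value)
```

Running each collected line through `decode_segfault` and comparing `ip_offset` and `addr_minus_sp` across incidents is one way to check whether the faults land at the same place in the executable. For the line quoted in this log, the faulting address is 8 bytes below the stack pointer and the access is a user-mode write to a page that is not present. That would be consistent with the stack running out, but the line alone does not prove it.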