1347410340 Q * nakacya Remote host closed the connection
1347410363 J * nakacya ~nakacya@KD118152083243.ppp-bb.dion.ne.jp
1347410847 Q * nakacya Ping timeout: 480 seconds
1347413898 J * nakacya ~nakacya@169.208.138.210.vmobile.jp
1347416401 J * clopez ~clopez@17.28.165.83.dynamic.mundo-r.com
1347417382 Q * nlm Read error: Connection reset by peer
1347417405 J * nlm ~nlm@host77.190-30-39.telecom.net.ar
1347417732 F * Bertl -b *!*nkukard@*.dsl.mweb.co.za
1347418698 Q * nakacya Read error: Connection reset by peer
1347418885 Q * Guy- Ping timeout: 480 seconds
1347419894 J * fisted ~fisted@xdsl-81-173-184-197.netcologne.de
1347419930 Q * fisted_ Read error: Operation timed out
1347420213 Q * nlm Ping timeout: 480 seconds
1347420329 Q * clopez Ping timeout: 480 seconds
1347420452 Q * AndrewLee Ping timeout: 480 seconds
1347420834 J * Guy- ~korn@elan.rulez.org
1347422539 Q * ivan` Remote host closed the connection
1347422589 J * ivan` ~ivan`@li125-242.members.linode.com
1347424249 Q * ivan` Read error: Operation timed out
1347424256 J * ivan` ~ivan`@li125-242.members.linode.com
1347424386 J * vn585 R3h@89.205.104.146
1347425365 Q * fisted Quit: leaving
1347426650 M * Bertl off to bed now ... have a good one everyone!
1347426656 N * Bertl Bertl_zZ
1347427737 J * fleischergesell ~fleischer@p5B0A0709.dip.t-dialin.net
1347428095 J * fisted ~fisted@xdsl-81-173-184-197.netcologne.de
1347429779 J * ghislain ~AQUEOS@adsl2.aqueos.com
1347430657 Q * fisted Quit: leaving
1347433298 Q * FireEgl Ping timeout: 480 seconds
1347434701 Q * hparker Read error: Operation timed out
1347435224 J * hparker ~hparker@2001:470:1f0f:32c:beae:c5ff:fe01:b647
1347437823 J * nou Chaton@causse.larzac.fr.eu.org
1347438177 Q * vn585
1347442321 J * BenG ~bengreen@cpc29-aztw23-2-0-cust105.18-1.cable.virginmedia.com
1347442950 Q * Romster Quit: Geeks shall inherit properties and methods of object earth.
1347443973 Q * BenG Quit: I Leave
1347444296 J * clopez ~clopez@fanzine.igalia.com
1347444574 M * uranus Bertl_zZ, disabling "detect hung tasks" didn't solve the problem
1347445521 J * BenG ~bengreen@cpc10-aztw24-2-0-cust114.aztw.cable.virginmedia.com
1347445870 J * kir ~kir@swsoft-msk-nat.sw.ru
1347445955 Q * fleischergesell Ping timeout: 480 seconds
1347445996 Q * BenG Quit: I Leave
1347447160 P * kir PING 1347447160
1347451146 N * Bertl_zZ Bertl
1347451151 M * Bertl morning folks!
1347451164 M * Bertl uranus: good to know! any news on the nfs part?
1347451185 M * uranus no, this will start in around 1h had to do some other things
1347451190 M * Bertl btw, did you get a trace from the machine with hung task detection disabled?
1347451195 M * uranus no
1347451196 M * uranus nothing
1347451197 F * Bertl -o Bertl
1347451208 M * uranus dmesg is "empty"
1347451219 M * Bertl but you can still logon to the system
1347451235 M * Bertl ?
1347451279 M * uranus yes
1347451292 M * uranus but ps aux did not get to the end
1347451295 M * uranus vtop is running
1347451300 M * uranus vserver-stat also
1347451307 M * Bertl does /proc/sysrq-trigger exist?
1347451333 M * uranus yes
1347451381 M * Bertl then I'd be interested in the output (dmesg) of echo T >/proc/sysrq-trigger
1347451391 M * Bertl and echo L >/proc/sysrq-trigger
1347451416 M * uranus just a moment
1347451451 M * uranus it's running
1347451469 M * Bertl okay, please upload the output somewhere
1347451585 M * uranus the output is realy realy long
1347451664 M * uranus hopefully my serial console won't crash with to many data
1347451689 M * uranus btw did you see any interesting thing in the config?
1347451767 M * Bertl nothing obvious
1347451819 J * BenG ~bengreen@cpc10-aztw24-2-0-cust114.aztw.cable.virginmedia.com
1347451875 M * uranus -T still running
1347452179 M * uranus -T ended with: BUG: unable to handle kernel NULL pointer dereference at 0000000000000030
1347452231 M * Bertl sounds interesting
1347452248 M * uranus I even cannot send breaks to trigger your -L
1347452258 M * uranus host is going in reboot
1347452287 J * nlm ~nlm@host77.190-30-39.telecom.net.ar
1347453208 Q * Aiken Remote host closed the connection
1347453429 M * uranus Bertl, the size of the -T output is 7 MB
1347453634 M * uranus Bertl, here is the output: https://docs.google.com/open?id=0B4SdzIotUw8nd3h3ZFd5LVdTU3M
1347453759 M * aurel since I'm a lazy fuck… anybody happen to have a check_mk_agent (that's a Nagios/Icinga addon) thingie for vserver-stat? Perhaps even with PerfData and PNP4Nagios templates?
1347454025 M * uranus Bertl, do you need a addr2line from one of the addresses inside the sysrq-output?
1347454159 M * Bertl the bug related trace might be good to have
1347454204 M * uranus the complete output from start of sysrq -T till the restart sequence is included
1347454215 M * uranus it at the bottom of the file
1347454806 M * Bertl yep, that one, also the awk trace around 57065.356738 please
1347454831 M * Bertl and 57065.820703
1347454971 M * WMP Bertl: hello, i have big problem
1347455044 M * Bertl let's hear ...
1347455054 M * WMP Bertl: http://pastebin.com/rnuaRnNx
1347455382 M * Bertl looks like your inode permissions are a little off
1347455421 M * Bertl most likely due to a guest with too many capabilities/devices
1347455445 M * uranus Bertl, awk: the complete trace or only the proc_cgroupstats_show?
1347455503 M * WMP Bertl: is simple method to check what guest have too big inodes?
1347455569 M * daniel_hozac Bertl: the tag mount option disappearing, was that ever resolved? or does it just not happen on recent kernels?
1347455628 M * Bertl not that I know (resolved)
1347455710 M * Bertl but I didn't receive any related reports either
1347455712 M * WMP Bertl: all guet have less inodes_used that inodes_total
1347455752 M * WMP Bertl: i check this this may: vdlimit --xid GUEST /home/vservers/
1347455764 M * Bertl check guests with xid 40011 and 40015 and 40019
1347455770 M * uranus Bertl, the IP points to kernel/lockdep.c:479 (lockdep_on+0x20/0x20) @57265.398984
1347455792 M * daniel_hozac WMP: it looks more like you have problems with guests accessing eachother's files. did you change tags/xids?
1347455894 M * WMP Bertl: i havent guest with xid 40015 and 40019
1347455930 M * Bertl daniel_hozac: is 40000+ dynamically assigned at creation?
1347455947 M * WMP daniel_hozac: no, after move files i execute this script: http://linux-vserver.org/Disk_Limits_and_Quota#Controling_disk_limits and start vserver
1347456002 M * daniel_hozac yeah
1347456033 M * Bertl WMP: and 'move files' means?
1347456064 M * WMP Bertl: i move vservers from other machine
1347456096 M * Bertl okay, then most likely the tags are wrong somehow
1347456148 M * WMP so what i shout doing now?
1347456203 M * Bertl first, I'd suggest to assign fixed xid/tag values to each of your guests
1347456287 Q * ensc|w Remote host closed the connection
1347456295 J * ensc|w ~ensc@www.sigma-chemnitz.de
1347456336 M * Bertl then you probably want to re-tag each guest with the new tag
1347456342 M * WMP what mean fixed xid/tag ?
1347456343 M * Bertl after that, everything should be fine
1347456347 M * uranus Bertl, on the host without nfs i have a new wonderfull thing :)
1347456347 M * uranus http://paste.linux-vserver.org/22936
1347456378 M * uranus guest is not stoppable
1347456392 M * WMP Bertl: http://pastebin.com/MebAwxwR
1347456400 M * uranus but at the moment no state d processes
1347456444 M * Bertl uranus: hmm, in many of your guests, you are running rsyslogd with a config to read the kernel ring buffer
1347456449 M * uranus load of this guest: 176.00 176.20 160.24
1347456482 M * Bertl would it be possible to adjust the rsyslogd config in each of them for a test?
1347456507 M * uranus what should be changed?
1347456687 M * Bertl there is a config part related to /proc/kmsg
1347456715 M * Bertl (depends on the rsyslogd version and distribution where it actually resides)
1347456784 M * Bertl on recent rsyslogd it is called imklog (module)
1347456823 M * uranus what let you guess that this is causing that issue?
1347456857 M * Bertl well, it shows as defunct in the guest shutdown, and it appears very often in the task traces
1347456881 M * Bertl and I just checked the code we are using and it might be related to the issue
1347456896 M * uranus that will be a problem, because if somone changes the logfile my kernel crashes
1347456920 M * uranus but well in this test case i let the configs be changed
1347456927 M * Bertl well, I'm not saying that the solution is to remove the module :)
1347456936 M * Bertl it's just for testing the theory
1347456942 M * uranus k
1347456966 M * Bertl but you have to make sure that it is disabled in all guests
1347456973 M * WMP Bertl: http://pastebin.com/13Dtw7Gn
1347457012 M * uranus Bertl, now i also have state d processes (without the nfs mounts)
1347457073 M * Bertl WMP: unfortunately, it's not that simple with your current setup
1347457075 M * uranus should i collect a sysrq L dump?
1347457085 M * Bertl won't hurt
1347457131 M * Bertl WMP: see that vxW: [»vcontext«,32230:#40011|40002|40002] denied ...
1347457146 M * Bertl this means that the guest with xid 40011 uses a tag value of 40002
1347457159 M * WMP wtf? how to debug this?
1347457186 M * WMP (i havent 40011)
1347457191 M * Bertl that's why I advised to give all the guests a fixed xid/tag
1347457222 M * Bertl but you need to stop the guest for this to work, then add the proper config entries, re-tag the files and start the guest new
1347457235 M * WMP but what mean fixed xid/tag ?
1347457250 M * uranus Bertl, now sysrq L doesn't respond :/
1347457368 M * Bertl WMP: it just means: pick a number for each guest, e.g. 1001, 1002, ... and assign it for xid and tag
1347457387 M * uranus WMP, echo "1000" > /etc/vservers//context
1347457410 M * uranus chxid -c `cat /etc/vservers//context` -R -- /etc/vservers//vdir/.
1347457423 M * uranus next geust gets 1001
1347457425 M * uranus and so on
1347457429 M * WMP ok
1347457438 M * Bertl make sure to stop the guest first
1347457450 M * Bertl but you can process one at a time
1347457456 M * WMP Bertl: i shoud stop all guest on only this what i fixing?
1347457462 M * WMP ok
1347457924 M * WMP Bertl: http://pastebin.com/Kd34yknk
1347457989 M * uranus Bertl, in http://paste.linux-vserver.org/22936 the strange thing in my opinion is also that all processes inside the guest are in State R
1347458225 M * Bertl well, R is fine if there is work to be done but no CPU cycles available (for whatever reason)
1347458242 M * uranus k
1347458854 M * Bertl have to take a short nap ... bbl
1347458863 N * Bertl Bertl_zZ
1347465160 Q * BenG Quit: I Leave
1347465182 J * BenG ~bengreen@cpc10-aztw24-2-0-cust114.aztw.cable.virginmedia.com
1347465871 J * nakacya ~nakacya@KD118152083243.ppp-bb.dion.ne.jp
1347466008 J * bonbons ~bonbons@2001:960:7ab:0:bc30:bddc:77f0:3c93
1347471646 Q * BenG Quit: I Leave
1347471796 N * Bertl_zZ Bertl
1347471800 M * Bertl back now ..
1347472647 Q * clopez Ping timeout: 480 seconds
1347472880 J * fleischergesell ~fleischer@p5B0A3DCA.dip.t-dialin.net
1347474428 Q * Guy- Ping timeout: 480 seconds
1347475122 J * fisted ~fisted@xdsl-87-78-231-19.netcologne.de
1347475142 J * Guy- ~korn@elan.rulez.org
1347475307 J * nkukard ~nkukard@41-133-138-36.dsl.mweb.co.za
1347476090 Q * fisted Read error: Connection reset by peer
1347476402 N * Bertl Bertl_oO
1347476475 J * fisted ~fisted@xdsl-87-78-231-19.netcologne.de
1347476740 N * ensc Guest6841
1347476750 J * ensc ~irc-ensc@p4FFCF039.dip.t-dialin.net
1347477158 Q * Guest6841 Ping timeout: 480 seconds
1347479142 Q * fisted Remote host closed the connection
1347479402 J * fisted ~fisted@xdsl-87-78-231-19.netcologne.de
1347479593 J * hijacker ~hijacker@cable-84-43-134-121.mnet.bg
1347479932 J * clopez ~clopez@17.28.165.83.dynamic.mundo-r.com
1347481580 Q * fleischergesell Ping timeout: 480 seconds
1347483101 Q * bonbons Quit: Leaving
1347484087 Q * hijacker Quit: Leaving
1347485351 Q * fisted Quit: leaving
1347485816 J * fisted ~fisted@xdsl-87-78-231-19.netcologne.de
1347486886 J * Aiken ~Aiken@2001:44b8:2168:1000:21f:d0ff:fed6:d63f
1347488384 Q * bergerx_ Quit: Leaving
1347488810 Q * ghislain Quit: Leaving.
1347488880 Q * fisted Quit: brb
1347489251 Q * clopez Ping timeout: 480 seconds
1347490393 J * Romster ~romster@202.168.100.149.dynamic.rev.eftel.com
1347491164 Q * nlm Remote host closed the connection
1347491349 J * nlm ~nlm@host77.190-30-39.telecom.net.ar
1347492548 Q * nlm Ping timeout: 480 seconds