1444446485 P * undefined
1444446918 J * undefined ~undefined@00011a48.user.oftc.net
1444449505 N * Bertl_zZ Bertl
1444449506 M * Bertl back now ...
1444456108 J * Ghislain ~aqueos@adsl1.aqueos.com
1444472889 M * Ghislain hi
1444472906 M * Ghislain i got the issue on all the servers i put on the latest patch it seems
1444472929 M * Ghislain daniel_hozac: which sysrq will do ?
1444473711 M * Bertl sysrq-t might work (show tasks)
1444473745 M * Bertl sysrq-w might be better/shorter if the task is in D state
1444473780 M * daniel_hozac from reading the code, it looks like it only prints comm, and not the command line.
1444473848 M * Ghislain the t gives this http://pastebin.com/raw.php?i=gwc4TK0f
1444473854 M * Ghislain 26042 is the vcontext one
1444473856 M * Bertl IIRC, the rest of the system is working as expected
1444473888 M * Bertl can you try to cat something in the process's /proc entry?
1444473908 M * Ghislain yes but if i hit /proc/26042 all stops
1444473932 M * Bertl try the following:
1444473951 M * Ghislain yes the rest seems ok but any process that touches /proc just hangs as soon as it hits the process
1444473962 M * Bertl cat /proc/26042/stat
1444473975 M * Bertl do not use any commandline expansion (tab) or so
1444474036 M * Ghislain 26042 (vcontext) D 26041 26041 26031 0 -1 4195584 5654 7375 4 1 3 4 0 0 20 0 1 0 121832860 8441856 2033 18446744073709551615 4194304 4230024 140735967868784 140735959486472 4207263 256 0 0 134283266 0 0 0 17 3 0 0 2 0 0 6328304 6328396 36065280
1444474056 M * Bertl excellent, now let's try:
1444474062 M * Bertl cat /proc/26042/cmdline
1444474127 M * Ghislain this one crashes
1444474147 M * Bertl and crash means?
1444474164 M * Ghislain http://pastebin.com/raw.php?i=Kf8tXrjz
1444474171 M * Ghislain no return to the shell
1444474178 M * Ghislain stuck
1444474209 M * Bertl anything in dmesg?
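Editor's note on the `cat /proc/26042/stat` output pasted above: the field right after the parenthesised command name is the process state, and `D` means uninterruptible sleep, which fits the hang being discussed. The following is a minimal Python sketch, assuming the field order documented in proc(5); `parse_stat` and `SAMPLE` are hypothetical names introduced here for illustration and are not part of util-vserver or of the conversation.

```python
# Sample line copied verbatim from the log (output of `cat /proc/26042/stat`).
SAMPLE = (
    "26042 (vcontext) D 26041 26041 26031 0 -1 4195584 5654 7375 4 1 3 4 "
    "0 0 20 0 1 0 121832860 8441856 2033 18446744073709551615 4194304 "
    "4230024 140735967868784 140735959486472 4207263 256 0 0 134283266 "
    "0 0 0 17 3 0 0 2 0 0 6328304 6328396 36065280"
)


def parse_stat(line):
    """Extract a few leading fields from a /proc/<pid>/stat line.

    The command name may itself contain spaces or parentheses, so split
    on the LAST ')' instead of splitting the whole line on whitespace.
    """
    head, _, tail = line.rpartition(")")
    pid, _, comm = head.partition(" (")
    fields = tail.split()
    return {
        "pid": int(pid),
        "comm": comm,
        "state": fields[0],         # R, S, D, Z, T, ...
        "ppid": int(fields[1]),
        "pgrp": int(fields[2]),
        "session": int(fields[3]),
    }


info = parse_stat(SAMPLE)
# 'D' is uninterruptible sleep: the task is blocked inside the kernel.
blocked = info["state"] == "D"
print(info, "blocked in kernel:", blocked)
```

Reading only `stat` is the safer probe in this situation, since in the log it returned normally while reading `cmdline` for the same PID hung.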
1444474219 M * Bertl try to raise the log level to at least 7
1444474226 M * Ghislain seems to happen a lot on the machines i upgraded to 3.4.108+
1444474243 M * daniel_hozac from what to what did you upgrade?
1444474264 M * Ghislain 3.4.106 mostly
1444474280 M * Bertl and that was fine?
1444474295 M * Ghislain yes but since then it seems i got those locks
1444474317 M * Ghislain i do not remember stepping into a lock like this before
1444474321 M * Ghislain but could be wrong
1444474329 M * Ghislain ok log to 7 done
1444474371 M * Bertl now try the hanging 'cat' again and see if something gets logged
1444474403 M * Ghislain ok we will have to wait i did a silly 't' and it is still writing like crazy on the console :(
1444474408 M * daniel_hozac after that, if you do sysrq-t, what does it say for the cat process?
1444474408 A * Bertl hopes PETA is not reading up on the logs 8-)
1444474414 M * daniel_hozac lol
1444474452 M * Ghislain PETA ?
1444474467 M * Bertl http://www.peta.org/international/
1444474478 P * undefined
1444474522 M * Ghislain i got no log output and the same lock
1444474555 M * Bertl okay, so probably the entry is "just" locked in the kernel and thus userspace hangs waiting for a release
1444474577 M * Bertl it might even be a different process holding the lock now
1444474615 M * Ghislain the only ones are ssh rsync vnamespace and sudo i launch no other
1444474656 M * Ghislain oh yes i launch bash and mysqldump but it does not seem to be the one locked, logs show that the mysqldump finished okay
1444474709 M * daniel_hozac what is spawning the vcontext? are you doing vserver ... exec?
1444474809 M * Ghislain i ssh to a server inside a guest to a host sudo /usr/sbin/vnamespace -e guest01 /usr/sbin/chroot "/vservers/guest01 rsync
1444474855 M * Ghislain i see only this can trigger the vcontext i guess
1444474857 M * daniel_hozac well, vnamespace doesn't call vcontext.
1444474887 M * Ghislain so the only other one is the mysqldump
1444474895 M * Ghislain sudo /usr/sbin/vserver guest01 exec /usr/local/.aqadmin/bin/mysqlbackup-bases.sh
1444474904 M * daniel_hozac yes, that will call vcontext.
1444474960 M * daniel_hozac and you see that script complete?
1444475030 M * Ghislain at least the dump yes
1444475054 M * Ghislain but the script cleans up after and calls the monitoring to tell 'done'
1444475072 M * Ghislain so i could have a zabbix client dead here
1444475198 M * Ghislain i just got the PETA thing now...:)
1444475222 M * daniel_hozac well, vcontext shouldn't be running after the script starts executing. it execs its argument.
1444475293 M * Ghislain the backup server connects to the machine, then launches the dump, then rsyncs the files, basically this is all that is done
1444475305 M * Ghislain what triggers vcontext apart from vserver ... exec ?
1444475319 M * Bertl aren't there two processes on an enter/exec, one _outside_ and one _inside_ the guest?
1444475320 M * daniel_hozac vserver stop, start, exec
1444475325 M * daniel_hozac only on enter.
1444475332 M * daniel_hozac for the tty forwarding
1444475360 M * Bertl ah, okay, so exec is a clean chcontext plus overload
1444475375 M * Ghislain the dump is done via exec, the rsync is done via vnamespace
1444475403 M * Ghislain and they are supposed to run one after the other
1444475437 M * Bertl would be interesting to have some "debug" output at different stages inside the backup.sh
1444475457 M * daniel_hozac do you have several guests where this is run against?
1444475463 M * daniel_hozac or is it just one, and that one succeeds?
1444475473 M * Bertl maybe run it with -x and exec 2>/tmp/some.log
1444475478 M * Ghislain i have this backup on all my guests
1444475497 M * daniel_hozac ah, so there may be some guest where it doesn't succeed?
1444475575 M * daniel_hozac would be interesting to know if it's always the same guest, or if it is random.
1444475609 J * jrklein_ ~cloud@proxy.dnihost.net
1444475625 M * Ghislain i got the issue on 3 servers that i can see, all are ones that i rebooted with 3.4.108+
1444475701 M * daniel_hozac how reproducible is it? does it happen every time you try to take backups?
1444475716 M * daniel_hozac on those hosts.
1444475719 M * Bertl can you run a diff between your 3.4.106 and 3.4.108+ kernel?
1444475745 M * Ghislain they are vanilla kernel+vserver patch and the very same .config
1444475768 M * Bertl (best make a copy of the build dir, then do a make clean; make mrproper; and then run a diff -NurpP --minimal)
1444475773 Q * jrklein Ping timeout: 480 seconds
1444475913 Q * Aiken Remote host closed the connection
1444475915 M * Ghislain this gives a never ending list of lines
1444475941 M * Ghislain 15602320 lines
1444475948 M * Bertl yeah, dump them into a file, it is a patch :)
1444475960 M * daniel_hozac that seems excessively large
1444475983 M * Bertl probably not properly cleaned up, i.e. some garbage left over
1444476000 M * Ghislain i have done it to 3.4.109 as the 108 dir was destroyed some time ago by mistake
1444476051 M * Ghislain sry got a cut/paste issue the result is 55900 lines
1444476138 M * Ghislain you need the result ?
1444476319 M * Bertl yeah, please upload it somewhere
1444476338 M * Bertl but it needs to be between the old kernel and a kernel which shows the observed effect
1444476365 M * Bertl i.e. no point to do a diff between 106 and 109 if 109 doesn't exhibit the problem
1444476438 M * Ghislain i will test on 3.4.109 and tell you
1444476473 M * Ghislain i just got back my test machine so i think it is time to screw it over again no ?
1444476474 M * Bertl okay, maybe increase the number of backup runs just to increase probability for something to go wrong :)
1444476506 M * Ghislain i will first try to do the same operation the backup does manually inside the host, then try from the backup server
1444478050 M * daniel_hozac did your sysrq-t output include locks held?
1444478361 Q * fstd Remote host closed the connection
1444478372 J * fstd ~fstd@xdsl-84-44-220-196.netcologne.de
1444478562 M * Ghislain it listed all the processes like the other time
1444478693 M * Ghislain if i read the dump well
1444478719 M * Ghislain the lock is vcontext that launches tcsh (the login shell)
1444478768 M * Ghislain sudo S 0000000000000006 0 26041 1 => vcontext D 0000000000000002 0 26042 26041 => tcsh x ffff8808555ee330 0 26082 26042
1444478768 M * Ghislain then no trace of a 26082 pid
1444479105 M * Bertl off to bed now ... have a good one everyone!
1444479136 N * Bertl Bertl_zZ
1444481692 J * derjohn_mob ~aj@x4db053bb.dyn.telefonica.de
1444483628 M * Ghislain i rebooted backup server and guest server in 3.4.109 and for now cannot reproduce the problem
1444484007 M * Ghislain i will try a test including timeouts in the mysqldump then i will retry with 3.4.108
1444484033 M * Ghislain if this is a mainline issue in 3.4.108 then the problem is solved
1444485900 Q * fstd Remote host closed the connection
1444485907 J * fstd ~fstd@xdsl-84-44-220-196.netcologne.de
1444486678 J * undefined ~undefined@00011a48.user.oftc.net
1444487748 M * Ghislain i will let the test server run a couple of days and see
1444490960 J * bonbons ~bonbons@2001:a18:204:4d01:b941:9ba4:37fb:97d4
1444495870 M * daniel_hozac Ghislain: are you still able to reproduce the invalid shell issue? i haven't been able to. was that also on 3.4?
1444495921 M * Ghislain 99% of my machines are 3.4 :)
1444495930 M * Ghislain i do not remember, what was it ?
1444495962 M * Ghislain i remember an issue when the default shell was not installed in the guest, is it that one ?
1444495966 M * daniel_hozac yeah
1444495979 M * Ghislain easy to try let me see
1444496015 M * Ghislain hum this time i got : vlogin: execvp(): No such file or directory
1444496020 M * Ghislain you modified it ?
1444496022 M * daniel_hozac no
1444496028 M * daniel_hozac that's what i get, and what is expected.
1444496069 M * Ghislain okay i think i will definitely leave the linux world
1444496086 M * Ghislain i want to work on computers not on magical ponies that behave mysteriously
1444496126 M * Ghislain it's not as if i did not have everything in puppet with the same kernel and config everywhere
1444496155 M * Ghislain there has been a new vserver util version since then if i recall
1444496204 P * undefined
1444496446 M * daniel_hozac hmm, okay
1444496465 M * daniel_hozac yeah, computers aren't good for sanity
1444498772 J * undefined ~undefined@00011a48.user.oftc.net
1444499798 Q * derjohn_mob Ping timeout: 480 seconds
1444505174 J * Aiken ~Aiken@d63f.h.jbmb.net
1444514955 Q * bonbons Quit: Leaving
1444514975 M * bXi would there be interest in a simple vserver management webinterface that's not outdated or nonfree
1444515957 N * Bertl_zZ Bertl
1444515977 M * Bertl bXi: I guess so, at least folks always ask for it on the channel
1444516232 M * bXi cool i've started work on something already
1444516476 M * bXi probably gonna keep it at multiple host servers and let it index the vservers from there and just show info like uptime/cpu%/mem% maybe I/O and basic start/restart stuff (and perhaps some creation type page)
1444516609 M * bXi but i'll have to get back on some of these to figure out how to read all of the required info
1444516648 M * Bertl IIRC, OpenVCP was quite popular for some time, so if not already done, you might get some inspiration from there
1444516667 M * bXi i know that i don't like the iptables requirement :P
1444516757 M * bXi but i haven't been able to run it properly
1444516766 M * bXi the daemon won't run for whatever reason
1444516776 M * Bertl well, it was written around 2004 IIRC
1444516777 M * bXi it'll start and create a zombie process
1444516799 M * Bertl so I guess a lot has changed since
1444516811 M * bXi the only thing that might be a problem for users is that i'm using nodejs
1444516890 M * bXi but on the other hand it's less heavy than running a full on webserver
1444521561 Q * fstd Remote host closed the connection
1444521573 J * fstd ~fstd@xdsl-87-78-80-42.netcologne.de