1264032554 Q * dna_ Quit: Verlassend
1264035450 J * balbir ~balbir@122.172.55.187
1264035701 J * FireEgl FireEgl@173-16-9-10.client.mchsi.com
1264039293 Q * xdr_ Read error: Connection reset by peer
1264039523 N * Bertl_oO Bertl
1264039527 M * Bertl back now ..
1264042640 Q * hparker Quit: Quit
1264043240 Q * balbir Ping timeout: 480 seconds
1264046422 J * SauLus_ ~SauLus@d003145.adsl.hansenet.de
1264046830 Q * SauLus Ping timeout: 480 seconds
1264046830 N * SauLus_ SauLus
1264047897 Q * FloodServ Service unloaded
1264047959 J * FloodServ services@services.oftc.net
1264050749 J * matthew-1 ~ms@ns2.wellquite.org
1264050835 Q * matthew-_ Ping timeout: 480 seconds
1264055481 M * Bertl off to bed now ... have a good one everyone!
1264055485 N * Bertl Bertl_zZ
1264058268 Q * derjohn_foo Ping timeout: 480 seconds
1264059797 J * derjohn_foo ~aj@213.238.45.2
1264060248 J * ghislain ~AQUEOS@adsl2.aqueos.com
1264060392 J * niki ~niki@cpe.fe4-0-120.0x50a6de52.kdnxd4.customer.tele.dk
1264060911 J * sharkjaw ~gab@90.149.121.45
1264064458 Q * Piet Remote host closed the connection
1264067816 M * NOC|YEP vdu: lstat("config.autogenerated"): Stale NFS file handle - what is this for a problem?
1264067938 Q * FireEgl Ping timeout: 480 seconds
1264068105 M * hijacker fellows, morning
1264068130 M * hijacker why is the util-vserver latest prerelease 2864 asking for 'checking for vconfig... no'
1264068147 M * hijacker will there be a negative effect if switching that off?
1264069647 M * hijacker ah, it is in the build requirements ...
1264069659 M * hijacker Build requirements:
1264069659 M * hijacker * iproute/iproute2
1264069659 M * hijacker * iptables
1264069660 M * hijacker * vconfig/vlan (see http://www.candelatech.com/~greear/vlan.html)
1264069661 M * hijacker * ...
1264070537 J * BenG ~bengreen@cpc2-aztw22-2-0-cust521.aztw.cable.virginmedia.com
1264072420 J * barismetin ~barismeti@zanzibar.inria.fr
1264072931 J * thierryp ~thierry@zankai.inria.fr
1264072963 J * balbir ~balbir@122.172.55.187
1264072963 Q * infowolfe Ping timeout: 480 seconds
1264073116 J * infowolfe ~infowolfe@c-71-236-152-35.hsd1.or.comcast.net
1264075640 J * SubZero ~SubZero@chello089076140236.chello.pl
1264076570 Q * SubZero Ping timeout: 480 seconds
1264078182 Q * barismetin Remote host closed the connection
1264079198 Q * thierryp Remote host closed the connection
1264079231 J * thierryp ~thierry@zankai.inria.fr
1264080130 J * barismetin ~barismeti@zanzibar.inria.fr
1264080182 J * vortex7 ~gf@194.219.129.169
1264080417 Q * nkukard Remote host closed the connection
1264080889 N * vortex7 [vortex7]
1264081494 N * [vortex7] vortex7
1264082432 Q * BenG Quit: I Leave
1264082487 N * Bertl_zZ Bertl
1264082492 M * Bertl morning folks!
1264082505 M * Bertl hijacker: your distro should provide that :)
1264082812 J * fluor- ~fluor@silentio.us
1264082871 M * fluor- hey there
1264082913 M * fluor- is it possible to run a i386 vserver guest under an amd64 system?
1264082927 M * daniel_hozac yes
1264082943 M * fluor- oh, good
1264082963 M * fluor- I need to move a few vserver guests to a new server that is amd64
1264082979 M * fluor- is there anything special to do, or will the i386 binaries within the guest vserver run just like before?
1264083012 M * fback daniel_hozac: is there big difference between debian-supplied and util-vserver supplied startup scripts?
1264083328 M * Bertl fluor-: you might want to set the guest personality to 32bit
1264083358 M * daniel_hozac fback: yes, they are nothing alike at all.
1264083370 M * hijacker hey Bertl
1264083373 M * hijacker it does provide it
1264083419 M * fluor- Bertl: thanks
1264083437 M * fluor- Bertl: would that be :
1264083438 M * fluor- echo linux_32bit > /etc/vservers/$NAME/personality
1264083438 M * fluor- echo i686 > /etc/vservers/$NAME/uts/machine
1264083589 M * Bertl not sure the personality file is case insensitive .. the machine part should not be required, i.e. it should get set with the personality
1264083663 M * fback daniel_hozac: can I safely use them on debian host, or take debian sources and sync them with svn is better approach?
1264083743 M * daniel_hozac they work fine on Debian
1264083763 M * Bertl off to grab some groceries ... bbl
1264083772 N * Bertl Bertl_oO
1264084119 M * hijacker but as I do not use it, i just wanted to skip installing it
1264084148 M * hijacker seems like it is no longer an option to not have vconfig on a vserver enabled box
1264084197 M * geb http://www.kroah.com/log/linux/stable-status-01-2010.html : the 2.6.32, will be the next -stable kernel
1264084201 Q * sharkjaw Remote host closed the connection
1264084403 M * daniel_hozac hijacker: never has been.
1264084454 M * daniel_hozac note though that you don't have to build it on the box you're installing it on.
1264084615 M * hijacker i never had it installed, and i always compiled util-vserver on that same box
1264084626 M * hijacker it is the first release that complained about it
1264084637 M * fback daniel_hozac: hm... otoh, sid's r2864 should be recent enough, no?
1264084642 M * hijacker thus I was interested if I can go without installing it
1264084881 M * daniel_hozac every release of util-vserver has required it.
1264084973 M * daniel_hozac well, from the last 5 years.
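The personality setting that fluor- and Bertl discuss above comes down to writing one value into the guest's configuration directory. A minimal Python sketch of that step, assuming the `linux_32bit` value and the `/etc/vservers/$NAME/personality` path quoted by fluor- are what the installed util-vserver expects (Bertl was not sure about case handling). The function name `set_guest_personality` and its `config_root` parameter are made up for illustration:

```python
from pathlib import Path


def set_guest_personality(name, personality="linux_32bit", config_root="/etc/vservers"):
    """Write the personality file for guest `name` and return its path.

    Assumption: the value and the file location are taken from the chat
    above, not checked against the util-vserver documentation.
    """
    guest_dir = Path(config_root) / name
    if not guest_dir.is_dir():
        raise FileNotFoundError(f"no guest configuration at {guest_dir}")
    target = guest_dir / "personality"
    # Same effect as: echo linux_32bit > /etc/vservers/$NAME/personality
    target.write_text(personality + "\n")
    return target
```

Following Bertl's remark that the machine part should get set with the personality, the sketch leaves `uts/machine` untouched.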
1264085157 J * nkukard ~nkukard@41.145.124.234
1264085234 M * hijacker daniel_hozac, i am with util-vserver: 0.30.216-pre2827; Jan 13 2009, 23:55:09
1264085240 M * hijacker and it did not
1264085250 M * hijacker very strange indeed
1264085258 M * hijacker anyways
1264085271 M * hijacker i am going to live with it and install the package
1264085342 N * vortex7 [vortex7]
1264085376 M * daniel_hozac it did.
1264085404 M * daniel_hozac the last version to not require it was 0.30.196.
1264085446 M * hijacker lool
1264085450 M * hijacker daniel_hozac, you're right
1264085460 M * hijacker just checked the config.log of the current build
1264085463 M * hijacker and it had it
1264085528 M * hijacker apt-get autoremove might have removed it if it was not used by any package on the host
1264085538 M * hijacker thanks for the clarification daniel_hozac
1264087804 Q * dowdle Remote host closed the connection
1264087967 J * dowdle ~dowdle@scott.coe.montana.edu
1264089748 J * orzel ~orzel@berlioz.ethernet.freehackers.org
1264089808 M * orzel Hello. Any idea why 'date' in the guest is one hour before the 'date' result in the host. The one in the host is the right one (even synced by ntp). Of course my localtime is utc+1 and i guess this is somehow related, though 'date' is just about a syscall, isn't it ?
1264089886 M * geb timezone problem
1264089911 M * geb you have to run tzconfig on the guests
1264089918 M * geb or they will be in utc
1264090229 Q * thierryp Remote host closed the connection
1264090256 J * thierryp ~thierry@zankai.inria.fr
1264092052 M * orzel geb: yop, found, thanks!
1264092109 Q * derjohn_foo Ping timeout: 480 seconds
1264092479 J * vserverUser ~vServer_U@host90-152-15-246.ipv4.regusnet.com
1264093076 J * FireEgl FireEgl@173-16-9-10.client.mchsi.com
1264093368 Q * thierryp Quit: ciao folks
1264094156 N * Bertl_oO Bertl
1264094160 M * Bertl back now ...
1264094837 J * hparker ~hparker@linux.homershut.net
1264095155 Q * barismetin Quit: Leaving...
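geb's diagnosis can be checked by hand: host and guest share one kernel clock, and only the configured timezone changes how the same instant is printed. A small Python sketch, assuming the host sits at a fixed UTC+1 offset as orzel describes and the guest has fallen back to UTC. The epoch value is simply the timestamp of orzel's question in this log, and `wall_clock` is a helper name chosen for illustration:

```python
from datetime import datetime, timedelta, timezone

EPOCH = 1264089808  # timestamp of orzel's question in this log


def wall_clock(epoch, utc_offset_hours):
    """Render one epoch value as local wall-clock time at a fixed UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    return datetime.fromtimestamp(epoch, tz).strftime("%Y-%m-%d %H:%M:%S")


guest_view = wall_clock(EPOCH, 0)  # guest without a configured timezone (UTC)
host_view = wall_clock(EPOCH, 1)   # host configured for UTC+1

print("guest:", guest_view)
print("host: ", host_view)
```

The two strings differ by exactly one hour although the underlying epoch value is identical, which matches what orzel saw before configuring the guest's timezone.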
1264095612 J * SubZero ~SubZero@chello089076140236.chello.pl
1264097100 J * bonbons ~bonbons@2001:960:7ab:0:2c0:9fff:fe2d:39d
1264097481 M * fzylogic Bertl: Did you see my message about the new memeater Tuesday night?
1264097482 M * fzylogic http://karategerbil.com/kernel_debug/memeater2.c
1264097506 Q * Loki|muh Remote host closed the connection
1264099867 Q * balbir Read error: Connection reset by peer
1264099926 Q * niki Quit: Leaving
1264100367 J * kbad ~kyle@ip-66-33-206-8.dreamhost.com
1264100567 J * balbir ~balbir@122.172.52.154
1264101055 Q * balbir Ping timeout: 480 seconds
1264101077 J * balbir ~balbir@122.172.52.154
1264101944 J * hijacker_ ~hijacker@87-126-142-51.btc-net.bg
1264103445 Q * nkukard Ping timeout: 480 seconds
1264103471 J * nkukard ~nkukard@196.212.73.74
1264103574 P * fluor-
1264104446 M * Bertl fzylogic: nope, obviously missed that one .. sorry for the delay
1264104455 M * fzylogic no worries
1264104469 M * fzylogic so far it's crashed everything I've thrown it at
1264104489 M * fzylogic even have a 64-bit version that'll crash a guest configured with 2G hard rss/4G soft
1264104501 M * fzylogic http://karategerbil.com/kernel_debug/m3.c
1264104533 M * fzylogic and if I'm not mistaken, the guy who started the "Exploding Load in v2.3" thread on the mailing list is hitting the same bug
1264104538 M * Bertl okay, let me see what it does on my test setup .. same arguments as before?
1264104658 M * fzylogic yeah
1264104662 M * fzylogic or the defaults will work too
1264104749 M * Bertl for my low memory limits setup as well?
1264104789 M * fzylogic well, probably not the 2MB one :)
1264104805 J * derjohn_foo ~aj@c193198.adsl.hansenet.de
1264104805 M * fzylogic should work for anything 10MB or up, I think...
1264104817 M * Bertl excellent, let me give it a try then ...
1264104935 M * Bertl hmm, all I get is a simple OOM kill ... something must be different in our setups
1264104991 M * Bertl let me update util-vserver to a more recent version just to make sure
1264105082 M * Bertl doesn't change anything either ...
1264105127 M * Bertl maybe I need a different shell
1264105136 M * fzylogic hrm...this is with 2.6.32.3?
1264105300 M * Bertl currently I tested with 2.6.31.12, but I can test with 2.6.32.4 as well
1264105324 M * fzylogic most of my testing has been 2.6.32.2 and 2.6.31.[56]
1264105331 M * Bertl the interesting part is, something 'unusual' happens, but not what you describe
1264105353 M * Bertl i.e. the OOM killer is invoked, and it seems that the guest processes remain in some kind of stopped state
1264105373 M * fzylogic that is what I'm seeing :)
1264105378 M * Bertl investigating that right now, might be realted to the vshelper not acting upon
1264105383 M * fzylogic if you turn on soft lockup detection, you'll see one core get locked up
1264105387 M * Bertl *related
1264105397 M * Bertl I have the soft lockup detection enabled
1264105400 M * fzylogic and eventually enough stuff gets backed up on that core that the machine essentially deadlocks
1264105404 M * fzylogic oh?
1264105412 M * Bertl I verified that yesterday
1264105417 M * fzylogic interesting...
1264105478 J * niki ~niki@0x5553169c.adsl.cybercity.dk
1264105479 M * Bertl and while they are contributing to the load, they do not consume cpu
1264105515 M * fzylogic I wonder if there's some other kernel option that's turning what you see as a simple hang into my system-wide issue
1264105517 M * Bertl anyway, this behaviour seems reproducible, so I'll start fixing that one, maybe it's the same you are observing after all :)
1264105552 M * Bertl so, I presume, when I fix that, you can test that on one of your hang cases quite simple, yeah?
1264105574 M * fzylogic yeah, I've got lots of hardware to test on
1264105607 M * fzylogic and if I allocate a 20MB guest, reproduction takes just a few seconds
1264105658 M * Bertl good, then give me a few minutes to add some debug statements
1264105689 M * fzylogic sure thing
1264107138 Q * balbir Ping timeout: 480 seconds
1264107243 Q * FireEgl Remote host closed the connection
1264107789 J * balbir ~balbir@122.172.51.38
1264107807 Q * kezar Ping timeout: 480 seconds
1264108007 J * kezar ~kezar@rb178-1-88-163-25-248.fbx.proxad.net
1264108846 Q * hijacker_ Quit: Leaving
1264109815 J * FireEgl FireEgl@173-16-9-10.client.mchsi.com
1264111242 Q * bonbons Quit: Leaving
1264111272 Q * fback Remote host closed the connection
1264111407 P * kbad
1264111775 J * fback fback@red.fback.net
1264112125 Q * derjohn_foo Ping timeout: 480 seconds
1264112192 J * fback_ fback@red.fback.net
1264112198 Q * fback Remote host closed the connection
1264112212 J * fback fback@red.fback.net
1264112265 P * fback_
1264112429 J * derjohn_foo ~aj@c193198.adsl.hansenet.de
1264113181 J * jrklein ~jrklein@2001:0:53aa:64c:0:15b5:63e5:65c4
1264113293 J * aj__ ~aj@c150098.adsl.hansenet.de
1264113293 Q * derjohn_foo Read error: Connection reset by peer
1264113347 M * Bertl fzylogic: hmm, that's quite interesting .. have to dig into it a little more
1264113357 M * fzylogic alright
1264113377 M * Bertl but I guess what you are seeing is the same than I observe now
1264113385 M * fzylogic sounds like it
1264113390 M * Bertl although I do not see why I do not get the soft lockup
1264113413 M * fzylogic I'm trying to figure out what significance the memory allocation limit has
1264113416 M * Bertl but it's rather easy to recreate in kvm here, so I can look at the structures easily
1264113453 M * Bertl it seems that the actual trigger is the oom, causing a signal to be sent to the process, and some locking issues preventing that signal from being delivered properly
1264113466 M * fzylogic yeah
1264113492 M * Bertl not sure that the dual process stuff is actually needed to trigger it, just increasing the probability I guess
1264113503 M * fzylogic just noticed the crash does happen at (free + swap) memory. for some reason, I thought it was happening prematurely before
1264113525 M * fzylogic oh, yeah. I haven't actually been using 2 processes since I rewrote it to walk the memory :)
1264113535 M * fzylogic one has proven more than sufficient
1264113542 M * Bertl it happens whenever the oom is invoked inside a guest
1264113561 M * fzylogic the funny thing is it's not every time it's invoked
1264113562 M * Bertl (and some magic additional condition is met :)
1264113573 M * fzylogic most of our hosts fire off OOM killers every few minutes, if not seconds
1264113579 M * fzylogic they'll go days before crashing, however
1264113601 M * fzylogic unless OOM gets triggered by whatever situation my memeater creates
1264113605 M * Bertl yeah, otherwise I had found it earlier in my testing
1264113635 M * fzylogic I found an old 2.6.22.19 kernel that I'm testing on now
1264113636 M * Bertl anyway, I should have something to test with tomorrow, even if it is just a debug framework to gather more data
1264113642 M * fzylogic cool
1264113669 M * Bertl it would be quite interesting to narrow it down to a specific kernel change
1264113677 M * Bertl (e.g. via bisection or so)
1264113696 M * Bertl so, if you feel like doing that in the meantime ... go crazy :)
1264113712 M * fzylogic unfortunately we went straight from 2.6.22.19 to 2.6.31.5, so I don't have anything readily available outside those
1264113718 M * fzylogic I can try to get something together, though
1264113754 M * Bertl it pays off to reduce the kernel config to the actual hardware/software requirements of your test system
1264113783 M * Bertl with that done, you should be able to build a new kernel within a few minutes
1264113787 M * fzylogic we have pretty well
1264113797 M * fzylogic with our 200 or so hosts, we really only have 3 hardware configurations in use
1264113806 M * Bertl excellent then
1264113957 M * fzylogic 2.6.22.19 test just finished with a normal oom kill
1264113976 Q * aj__ Remote host closed the connection
1264113998 M * Bertl try it a few times, it might be just a little harder to trigger
1264114018 M * Bertl maybe spawn the eaters in a loop or so
1264114050 M * Bertl hmm, no, I guess I found it :)
1264114060 M * fzylogic sweet
1264114083 M * Bertl do you have a test setup ready for 2.6.31.x?
1264114106 M * fzylogic absolutely
1264114108 M * Bertl I mean, if I tell you to comment out two lines can you test easily?
1264114116 M * fzylogic yes
1264114127 M * Bertl okay, check include/linux/vs_context.h
1264114132 M * Bertl around line 230
1264114148 M * Bertl you should see (in __task_is_init)
1264114159 M * Bertl a task_lock(p) and task_unlock(p)
1264114172 M * fzylogic yep
1264114181 M * Bertl comment them out, with e.g. //
1264114216 M * Bertl note: this is not a proper fix, as those protect the check for init, but it should do the trick for a short test
1264114224 M * fzylogic ok
1264114251 M * Bertl my assumption is that the task is already locked when sending the signal
1264114286 M * fzylogic looks like somebody hosed our 2.6.31.5 git branch. 2.6.32.2 ok? looks like the code's in the same place
1264114301 M * Bertl yeah, fine too
1264114376 M * fzylogic ok, rebooting
1264114389 M * Bertl that was quick indeed :)
1264114490 M * fzylogic you can tell I've been doing this a lot lately :)
1264114557 M * Bertl yeah, maybe you want to start as Linux-VServer kernel tester :)
1264114635 M * fzylogic my day job keeps me plenty busy. just so happens that that day job has been kernel testing for the past month
1264114681 M * Bertl so a fortunate overlap then ... well, fine for me
1264114749 M * fzylogic still triggered, but I'm rebuilding the kernel from a clean slate this time
1264114775 M * Bertl hmm, so you got the soft lockup?
1264114787 M * fzylogic yeah
1264114803 M * fzylogic make showed oom_kill.c get rebuilt, but I'm rebuilding again to be sure
1264114807 M * Bertl could you do a magic-sysrq-t for me in that state?
1264114820 J * derjohn_mob ~aj@c150098.adsl.hansenet.de
1264114848 M * fzylogic once it stops printing, I'll upload it
1264114913 M * Bertl excellent, will print a dump for each process
1264114919 M * fzylogic http://karategerbil.com/kernel_debug/sysrq-t.txt
1264114929 M * Bertl the only actual interesting one is the one for the memeater
1264114933 M * Mr_Smoke excellent domain name btw ;)
1264114940 M * fzylogic thanks :)
1264114996 M * Bertl hmm, can't see the process in that dump though
1264115003 M * fzylogic it's named "mem3"
1264115053 M * fzylogic know what's interesting? it's not deadlocked.
1264115054 M * fzylogic just slow
1264115072 M * fzylogic these soft lockup messages that get printed every minute show the stack changing
1264115282 M * fzylogic http://karategerbil.com/kernel_debug/soft_lockup.txt
1264115340 M * Bertl interesting, do we have the first one?
1264115390 M * fzylogic reload the file
1264115392 M * fzylogic just prepended it
1264115471 M * Bertl okay, could you addr2line -e vmlinux the following address for me?
1264115482 M * Bertl ffffffff810282b9
1264115618 M * fzylogic didn't match anything. might've been changed by my rebuild, so I'll boot this new kernel and try again
1264115636 M * Bertl okay, make sure that you have DEBUG_INFO enabled
1264115649 M * Bertl otherwise you won't get anything reasonable out of the kernel
1264115690 M * Bertl and it's the first 'stuck' in mem3 we are interested in
1264115704 M * fzylogic ok
1264115939 Q * ghislain Quit: Leaving.
1264115987 Q * jrklein Quit: jrklein
1264116074 M * fzylogic rebooted and re-running
1264116234 M * fzylogic http://karategerbil.com/kernel_debug/soft_lockup2.txt
1264116273 M * fzylogic jeremy@womb:~/kernels/ndn-2.6$ addr2line -e vmlinux ffffffff810ae889
1264116293 M * fzylogic /data/home/jeremy/kernels/ndn-2.6/arch/x86/include/asm/atomic_64.h:93
1264116364 M * Bertl okay, let's reduce the address by a few bytes, e.g. ffffffff810ae885
1264116381 M * Bertl see what you get there .. do that till you get out of the atomic
1264116402 M * fzylogic ffffffff810ae880 returns:
1264116408 M * fzylogic /home/jeremy/kernels/ndn-2.6/include/linux/vserver/limit_int.h:113
1264117099 M * Bertl okay, let's go a little forward, i.e. checl ffffffff810ae890
1264117103 M * Bertl *check
1264117142 M * fzylogic had to go up to ffffffff810ae894 to get out of atomic
1264117147 M * fzylogic that's mm/memory.c:2989
1264117205 M * Bertl hmm, you don't have hugepages enabled?
1264117222 M * fzylogic apparently not!
1264117236 M * Bertl okay, that's fine ...
1264117728 J * jrklein ~jrklein@2001:0:53aa:64c:0:408d:b4d8:690
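The address stepping that Bertl walks fzylogic through above can be scripted rather than typed one address at a time. The Python sketch below only builds the `addr2line -e vmlinux` command lines for the addresses around a starting point; actually running them needs a `vmlinux` built with DEBUG_INFO, as Bertl notes. The helper names `nearby_addresses` and `addr2line_commands` are invented for illustration, and the window of 9 bytes back and 11 bytes forward is chosen only because it covers the addresses tried in the chat:

```python
def nearby_addresses(center, before, after):
    """Return 16-digit hex strings for center-before .. center+after inclusive."""
    return [format(addr, "016x") for addr in range(center - before, center + after + 1)]


def addr2line_commands(addresses, image="vmlinux"):
    """Build one addr2line argument list per address (nothing is executed here)."""
    return [["addr2line", "-e", image, address] for address in addresses]


# Address reported in fzylogic's second soft lockup trace.
stuck = 0xFFFFFFFF810AE889

addresses = nearby_addresses(stuck, before=9, after=11)
commands = addr2line_commands(addresses)

for command in commands:
    print(" ".join(command))
```

Each argument list can be handed to `subprocess.run` on a machine that has the matching `vmlinux`; the first result on either side that no longer points into `atomic_64.h` is the line of interest. addr2line also has an `-i` option that reports inlined callers, which may shorten the manual search, though that was not tried in this conversation.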