1262910027 J * fzylogic ~fzylogic@dsl081-243-128.sfo1.dsl.speakeasy.net
1262911058 Q * imcsk8 Quit: This computer has gone to sleep
1262911393 Q * dowdle Remote host closed the connection
1262911875 M * fzylogic got a fun case of the OOM killer triggering soft CPU lockups when it kills (presumamably) uninterruptable processes
1262911897 M * fzylogic enough so that after a few rounds of the OOM killer firing off, the machine deadlocks
1262911902 M * fzylogic http://karategerbil.com/kernel_debug/vserver102.kernlog
1262911906 M * Bertl kernel/patch version?
1262911970 M * fzylogic 2.6.32.2-vs2.3.0.36.28
1262911990 M * fzylogic though I'm pretty sure it's been happening for a long time
1262912003 M * fzylogic the other OOM bug I brought up was most likely hidding it (OOM causing the host to panic)
1262912009 M * fzylogic hiding*
1262912050 M * Bertl hmm, interesting .. but why would apache become unkillable?
1262912061 M * Bertl and what is aufs?
1262912103 M * fzylogic aufs is something in place between the underlying nfsroot and the live root filesystem
1262912111 M * fzylogic though we see the lockup even without that patch involved
1262912138 M * fzylogic as for apache, I don't have the slightest idea why it's uninterruptable
1262912148 M * fzylogic the deadlocked processes can be anything
1262912161 M * fzylogic I've got examples of cron and even sh locking up
1262912269 M * fzylogic it's a bug we've been dealing with since 2.6.22.19 that I just now got the right debug options to pinpoint
1262912280 M * Bertl do you have a test setup to trigger that?
1262912283 M * fzylogic well, 2.6.22.19 was the last kernel to not exhibit the issue
1262912297 M * fzylogic I'm working on getting one that I can replicate
1262912313 M * fzylogic it's unfortunately happening to our live servers right now
1262912319 M * fzylogic about half a dozen crashes/day
1262912333 M * Bertl well, that would be enough to test with :)
1262912375 M * Bertl first, I presume you are hitting guest limits, to be precise RSS limits, yes?
1262912382 M * fzylogic correct
1262912396 M * fzylogic one of the downsides of letting customers essentially set their memory as low as they want
1262912434 M * Bertl okay, so we reach the guest limit and invoke the OOM killer, which successfully picks a process and 'seems' to kill it
1262912445 M * fzylogic yeah
1262912450 M * Bertl but that results (as a side effect?) in a stuck CPU
1262912462 M * Bertl waiting on some short time spinlock as it seems
1262912476 M * fzylogic the oom killer also gives the killed process realtime priority so it can cleanly shut down in a timely manner
1262912512 M * Bertl which obviously doesn't happen, but could be related to the fact that it cannot be shut down
1262912553 M * fzylogic I'm wondering if maybe hitting the OOM killer in quick enough succession is locking all the cpus long enough that nothing ever gets to actually shut down, resulting in the deadlock
1262912577 M * fzylogic it's hard to know if it's in a D state before the OOM trigger, or as a result of it
1262912599 M * Bertl would be easy to output that as part of the kill
1262913235 Q * balbir Read error: Operation timed out
1262913248 M * fzylogic how do you grab that from task_struct?
1262913259 M * fzylogic or is it elsewhere?
1262913272 M * Bertl the status contains that
1262913492 M * fzylogic got it
1262913537 Q * matthew-_ Ping timeout: 480 seconds
1262914051 J * balbir ~balbir@122.172.59.203
1262914916 J * matthew-_ ~ms@ns2.wellquite.org
1262916246 Q * yarihm Quit: This computer has gone to sleep
1262916355 Q * mkee Remote host closed the connection
1262916454 M * Bertl fzylogic: hmm, do you have VSERVER_DEBUG enabled?
1262916493 M * fzylogic yes
1262916536 M * Bertl enable bit 5 in misc debug then
1262916555 M * fzylogic just rebooted with a patch to log process states before being OOM killed too
1262916575 M * Bertl okay, you can enable it any time before the actual oom happens
1262916616 M * fzylogic where's that done?
1262916647 M * Bertl sysctl on the host
1262916664 M * fzylogic k
1262916674 M * fzylogic gotta run, but I'll check back with more info tomorrow
1262916706 M * Bertl sysctl -w vserver.debug_misc=32
1262916963 Q * thierryp Remote host closed the connection
1262917553 Q * SubZero
1262917750 Q * fLoo Ping timeout: 480 seconds
1262918423 Q * geb Ping timeout: 480 seconds
1262921485 Q * balbir Ping timeout: 480 seconds
1262922124 J * balbir ~balbir@122.172.109.16
1262923258 J * saulus_ ~saulus@c150041.adsl.hansenet.de
1262923258 Q * SauLus Read error: Connection reset by peer
1262923268 N * saulus_ SauLus
1262924185 Q * nkukard Ping timeout: 480 seconds
1262924947 Q * nenolod Quit: Leaving
1262925714 Q * balbir Read error: Operation timed out
1262928595 J * AndrewLe1 ~andrew@u7.hlc.edu.tw
1262928596 Q * AndrewLee Read error: Connection reset by peer
1262930379 J * thierryp ~thierry@home.parmentelat.net
1262930936 Q * niki Quit: Leaving
1262931800 J * sharkjaw ~gab@90.149.121.45
1262932026 N * AndrewLe1 AndrewLee
1262932156 J * nkukard ~nkukard@196.212.73.74
1262932249 M * Bertl off to bed now .. have a good one everyone!
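The two diagnostics discussed above can be approximated from userspace on the host:

- checking whether a task was already in uninterruptible sleep (state `D`) before the OOM killer fired;
- turning on bit 5 of the vserver misc debug mask.

The Python sketch below is not fzylogic's kernel patch, which logged task state from inside the kernel. It makes these assumptions:

- The host uses the standard Linux `/proc/<pid>/stat` layout, where the state letter is the first field after the parenthesized command name.
- `vserver.debug_misc` is a plain bitmask in which bit n has the value `1 << n`. This is consistent with Bertl's "bit 5" being written as 32.

The helper names `set_bit`, `stat_state` and `uninterruptible_pids` are hypothetical.

```python
import os

# Assumption: vserver.debug_misc is an ordinary bitmask where bit n == 1 << n,
# matching the chat (bit 5 -> sysctl -w vserver.debug_misc=32).


def set_bit(mask, bit):
    """Return mask with `bit` turned on, leaving any other bits unchanged."""
    return mask | (1 << bit)


def stat_state(stat_line):
    """Extract the one-letter task state from a /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the LAST ')' instead of on whitespace.
    """
    after_comm = stat_line.rsplit(")", 1)[1]
    return after_comm.split()[0]


def uninterruptible_pids(proc_root="/proc"):
    """Return the sorted PIDs whose state is 'D' (uninterruptible sleep)."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, name, "stat")) as f:
                if stat_state(f.read()) == "D":
                    pids.append(int(name))
        except OSError:
            # The process exited between listdir() and open(); ignore it.
            continue
    return sorted(pids)
```

To see whether tasks are already stuck in `D` before the OOM killer fires, call `uninterruptible_pids()` periodically on the host. For the debug mask, apply `set_bit(current_value, 5)` to the current value and write the result back with `sysctl -w vserver.debug_misc=<value>`. This keeps any misc debug bits that are already enabled, whereas writing 32 directly clears them.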
1262932265 N * Bertl Bertl_zZ
1262932880 Q * derjohn_mob Ping timeout: 480 seconds
1262932979 J * ghislain ~AQUEOS@LPuteaux-151-41-11-129.w217-128.abo.wanadoo.fr
1262933702 J * nenolod ~nenolod@petrie.dereferenced.org
1262934861 Q * thierryp Remote host closed the connection
1262935495 J * SubZero ~SubZero@chello089076140236.chello.pl
1262936659 Q * sharkjaw Remote host closed the connection
1262936822 J * niki ~niki@cpe.fe4-0-120.0x50a6de52.kdnxd4.customer.tele.dk
1262937295 M * karasz is there any reason why proc/vmstat is not in defaults/vprocunhide-files ?
1262937935 J * sharkjaw ~gab@90.149.121.45
1262937974 J * fLoo fLoo@188.194.83.192
1262938430 J * thierryp ~thierry@zankai.inria.fr
1262938455 Q * fLoo Ping timeout: 480 seconds
1262939337 J * ghislain1 ~AQUEOS@adsl2.aqueos.com
1262939640 Q * ghislain Ping timeout: 480 seconds
1262943260 J * beby ~planet@118.96.184.87
1262943493 P * beby Konversation terminated!
1262945786 J * gnuk ~F404ror@pla93-3-82-240-11-251.fbx.proxad.net
1262948355 J * yarihm ~yarihm@84-72-135-146.dclient.hispeed.ch
1262949158 Q * yarihm Ping timeout: 480 seconds
1262949168 J * yarihm ~yarihm@217-162-53-251.dclient.hispeed.ch
1262950571 J * BenG ~bengreen@cpc2-aztw22-2-0-cust521.aztw.cable.virginmedia.com
1262951036 Q * SubZero
1262951741 Q * BenG Quit: I Leave
1262951765 J * fLoo fLoo@188.194.83.192
1262951999 Q * yarihm Quit: This computer has gone to sleep
1262952242 N * Bertl_zZ Bertl
1262952250 M * Bertl morning folks!
1262952263 M * Mr_Smoke mo'in
1262952370 M * Bertl karasz: it would leak host information
1262952479 J * SubZero ~SubZero@chello089076140236.chello.pl
1262954750 M * karasz Bertl: any other way to get sysstat worling? especially iostat and vmstat?
1262954779 M * karasz no, wrong question
1262954803 M * karasz actually what i want is to get io and vm information on each guest
1262954838 J * ktwilight ~keliew@85.34-241-81.adsl-dyn.isp.belgacom.be
1262954978 M * Bertl karasz: io and vm is shared/optimized across the host, any data you'd get there would not really reflect the guest activity
1262954996 M * karasz ic
1262955004 M * karasz so it is a deadend :-/
1262955024 M * Bertl if you need, for whatever purpose, fully separated guests, look into kvm
1262955061 M * karasz nah, somehow it is vserver of nothing :), i can't stand the other virtualisations.
1262955599 Q * fLoo
1262955794 M * karasz s/of/or/
1262956411 M * Bertl the problem with io is that at the time when most operations (async ones) are accounted, the relation between request and guest is already lost
1262956469 M * Bertl for vm it really depends on the data you're interested in, you could get for example guest-only read/write ops to unshared inodes
1262956474 Q * SubZero
1262957148 J * davidkarban ~david@80.250.18.198
1262958260 Q * ghislain1 Ping timeout: 480 seconds
1262958466 J * derjohn_mob ~aj@tmo-108-92.customers.d1-online.com
1262958507 J * ghislain ~AQUEOS@LPuteaux-151-41-11-129.w217-128.abo.wanadoo.fr
1262958555 J * fLoo fLoo@188.194.83.192
1262958782 Q * bzed Remote host closed the connection
1262958977 J * bzed ~bzed@devel.recluse.de
1262959003 Q * groente Remote host closed the connection
1262959031 J * groente ~groente@shell.puscii.nl
1262960138 Q * derjohn_mob Ping timeout: 480 seconds
1262961744 J * barismetin ~barismeti@zanzibar.inria.fr
1262962470 Q * ghislain Ping timeout: 480 seconds
1262963105 J * docelic ~docelic@78-2-71-58.adsl.net.t-com.hr
1262963432 J * yarihm ~yarihm@80-219-169-54.dclient.hispeed.ch
1262963436 Q * gnuk Quit: NoFeature
1262963708 Q * hijacker__ Quit: Leaving
1262963717 J * hijacker ~hijacker@213.91.163.5
1262964765 J * imcsk8 ~ichavero@evdomip-109-112.iusacell.net
1262964896 J * derjohn_mob ~aj@tmo-101-33.customers.d1-online.com
1262964910 J * SubZero ~SubZero@chello089076140236.chello.pl
1262965024 J * gnuk ~F404ror@pla93-3-82-240-11-251.fbx.proxad.net
1262965144 J * geb ~geb@earth.gebura.eu.org
1262965243 Q * thierryp Ping timeout: 480 seconds
1262965636 J * dowdle ~dowdle@scott.coe.montana.edu
1262966400 Q * sharkjaw Remote host closed the connection
1262966506 Q * imcsk8 Quit: This computer has gone to sleep
1262967183 Q * matthew-_ Quit: leaving
1262967196 J * matthew-_ ~ms@ns2.wellquite.org
1262967238 Q * niki Quit: Leaving
1262967536 P * jamiem_
1262968423 J * thierryp ~thierry@ANice-256-1-96-131.w83-197.abo.wanadoo.fr
1262971863 Q * docelic Quit: http://www.spinlocksolutions.com/
1262971934 Q * derjohn_mob Ping timeout: 480 seconds
1262972232 M * Bertl nap attack ... bbl
1262972237 N * Bertl Bertl_zZ
1262972471 Q * davidkarban Quit: Ex-Chat
1262972768 Q * barismetin Quit: Leaving...
1262972792 J * imcsk8 ~ichavero@148.229.1.11
1262973442 J * pologtijaune ~gillou@trash.mana.pf
1262973492 Q * pologtijaune
1262973744 Q * gnuk Quit: NoFeature
1262974051 Q * thierryp Remote host closed the connection
1262974071 J * thierryp ~thierry@ANice-256-1-96-131.w83-197.abo.wanadoo.fr
1262974328 Q * nou Ping timeout: 480 seconds
1262974500 Q * jrklein Ping timeout: 480 seconds
1262974558 Q * thierryp Ping timeout: 480 seconds
1262974727 J * jrklein ~jrklein@2001:0:53aa:64c:0:44ad:b4d8:690
1262975109 J * jrklein_ ~jrklein@2001:0:53aa:64c:0:44ad:b4d8:690
1262975210 Q * jrklein Ping timeout: 480 seconds
1262975210 N * jrklein_ jrklein
1262975783 J * niki ~niki@0x5553169c.adsl.cybercity.dk
1262975911 J * hijacker_ ~hijacker@87-126-142-51.btc-net.bg
1262976993 J * derjohn_mob ~aj@e180194153.adsl.alicedsl.de
1262979590 Q * jrklein Ping timeout: 480 seconds
1262980218 J * bonbons ~bonbons@2001:960:7ab:0:2c0:9fff:fe2d:39d
1262980918 J * thierryp ~thierry@home.parmentelat.net
1262982148 Q * thierryp Quit: ciao folks
1262982149 Q * yarihm Quit: This computer has gone to sleep
1262982194 Q * hijacker_ Quit: Leaving
1262982730 J * cuba33ci_ ~cuba33ci@118-160-160-63.dynamic.hinet.net
1262982799 J * Walex ~Walex@82-69-39-138.dsl.in-addr.zen.co.uk
1262982838 Q * cuba33ci Ping timeout: 480 seconds
1262983738 N * Bertl_zZ Bertl
1262983753 M * Bertl bacn now ..
1262983763 M * Bertl *back even
1262984113 J * yarihm ~yarihm@217.150.254.84
1262984454 M * trippeh Commuting by bike to/from work is starting to get a wee bit chilly.
1262984458 M * trippeh -26C today ;)
1262984489 M * ruskie fun
1262984517 Q * Walex Read error: Connection reset by peer
1262984631 M * trippeh Two layers of wool over my face, yet my beard underneath freezes solid. Things like smiling gets reeeally uncomfortable.
1262984659 M * trippeh Wonder when I'll cave in and use public transport ;)
1262985065 M * fback trippeh: unless public transport stop working too ;)
1262985081 M * Bertl hmm, I refuse to go outside when it is below zero (C) so you must belong to one of those people I see on documentations which work and live on the south or north pole (and which have my respect for doing so :)
1262985147 M * fback -20 in february here is not uncommon
1262985151 J * ichavero_ ~ichavero@148.229.1.11
1262985156 M * fback and I actually like this weather
1262985181 M * trippeh Much nicer than ~0C and wet
1262985186 M * fback much more than -5 and very humid air
1262985248 M * fback and you feel every your breath
1262985316 M * fback trippeh: totally agree :-)
1262985416 M * trippeh Had to bring my fileserver back in yesterday though, power went out, and the server dropped to below -20 in an instant :P
1262985517 M * trippeh Outdoor server shed for the win
1262985655 M * ruskie whoa
1262985725 M * trippeh A few hours indoor + a couple of resets it eventually POSTed. Phew ;)
1262985731 M * ruskie scary
1262986780 M * Bertl now that is a reason for making it fully redundant :)
1262986813 M * Bertl i.e. with generator and everything required to keep it running 24/7 :)
1262987003 M * pmjdebruijn standard colocation :)
1262987136 M * fback Bertl: ah, almost forgot -- monday around the noon
1262989638 Q * yarihm Ping timeout: 480 seconds
1262990663 J * yarihm ~yarihm@84-72-135-146.dclient.hispeed.ch
1262991993 J * nou Chaton@causse.larzac.fr.eu.org
1262993160 Q * SubZero
1262993930 Q * bonbons Quit: Leaving
1262994085 Q * Piet Remote host closed the connection
1262994097 Q * ichavero_ Quit: This computer has gone to sleep
1262994152 M * Bertl fback: okay, tx
1262994152 J * Piet ~Piet__@659AAAXMU.tor-irc.dnsbl.oftc.net
1262994258 Q * imcsk8 Quit: Leaving