1210809625 Q * edlinuxguru Ping timeout: 480 seconds
1210812193 Q * mick_work Ping timeout: 480 seconds
1210812864 Q * besonen_mobile Ping timeout: 480 seconds
1210812971 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210813248 J * dna ~dna@106-233-dsl.kielnet.net
1210813643 Q * dna Quit: Verlassend
1210814620 Q * mire Ping timeout: 480 seconds
1210814713 Q * mick_work Ping timeout: 480 seconds
1210815461 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210818318 Q * zbyniu Ping timeout: 480 seconds
1210824793 Q * mick_work Ping timeout: 480 seconds
1210825279 J * cryptronic ~oli@p54A3B3C7.dip0.t-ipconnect.de
1210825566 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210826473 Q * balbir Read error: Operation timed out
1210827193 Q * mick_work Ping timeout: 480 seconds
1210827482 Q * cryptronic Quit: Leaving.
1210827759 J * yarihm ~yarihm@84-75-103-252.dclient.hispeed.ch
1210827943 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210829371 Q * yarihm Ping timeout: 480 seconds
1210829425 J * zbyniu ~zbyniu@host13-188.crowley.pl
1210829880 J * sharkjaw ~gab@64.28.12.166
1210829881 J * Slydder ~chuck@194.59.17.53
1210830369 J * yarihm ~yarihm@vpn-global-dhcp3-076.ethz.ch
1210831001 M * Bertl finally off to bed now ... have a good one everyone!
1210831006 N * Bertl Bertl_zZ
1210832162 J * balbir ~balbir@59.145.136.1
1210832382 J * ntrs__ ~ntrs@77.29.67.161
1210833897 J * hijacker__ ~hijacker@213.91.163.5
1210833897 Q * hijacker_ Read error: Connection reset by peer
1210834383 J * ntrs_ ~ntrs@77.29.67.79
1210834674 J * rgl ~rgl@bl8-135-125.dsl.telepac.pt
1210834677 M * rgl hi
1210834819 Q * ntrs__ Ping timeout: 480 seconds
1210835546 J * JonB ~NoSuchUse@77.75.164.169
1210836009 Q * rob-84x^ Ping timeout: 480 seconds
1210836320 Q * Slydder Remote host closed the connection
1210836352 J * Slydder ~chuck@194.59.17.53
1210836498 Q * ntrs_ Ping timeout: 480 seconds
1210836932 J * bfremon ~ben@lns-bzn-33-82-252-45-56.adsl.proxad.net
1210837007 J * rob-84x^ ~rob@submarine.ath.cx
1210837393 Q * mick_work Ping timeout: 480 seconds
1210837436 N * DoberMann[ZZZzzz] DoberMann
1210837531 Q * wibble Remote host closed the connection
1210837753 J * MatBoy ~MatBoy@wiljewelwetenhe.xs4all.nl
1210837772 Q * JonB Quit: This computer has gone to sleep
1210838167 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210839491 Q * bfremon Remote host closed the connection
1210839673 Q * mick_work Ping timeout: 480 seconds
1210839714 J * bfremon ~ben@lns-bzn-33-82-252-45-56.adsl.proxad.net
1210839936 J * JonB ~NoSuchUse@130.227.63.19
1210840196 J * dna ~dna@88-233-dsl.kielnet.net
1210840421 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210842114 Q * [PUPPETS]Gonzo Remote host closed the connection
1210843880 Q * bfremon Quit: Leaving.
1210844276 J * bfremon ben@lns-bzn-33-82-252-45-56.adsl.proxad.net
1210845140 J * bfremo1 ben@lns-bzn-52-82-65-102-239.adsl.proxad.net
1210845242 J * Pazzo ~ugelt@reserved-225136.rol.raiffeisen.net
1210845479 Q * bfremon Ping timeout: 480 seconds
1210846123 Q * rob-84x^ Quit: That's it for today
1210846130 J * rob-84x^ ~rob@submarine.ath.cx
1210846904 Q * Fire_Egl Ping timeout: 480 seconds
1210847515 J * Fire_Egl FireEgl@adsl-147-90-212.bhm.bellsouth.net
1210847692 J * Supaplex_ supaplex@166.70.62.194
1210847693 J * micah_ ~micah@micah.riseup.net
1210847726 Q * mick_work resistance.oftc.net tachyon.oftc.net
1210847726 Q * balbir resistance.oftc.net tachyon.oftc.net
1210847726 Q * hparker resistance.oftc.net tachyon.oftc.net
1210847726 Q * micah resistance.oftc.net tachyon.oftc.net
1210847726 Q * brc resistance.oftc.net tachyon.oftc.net
1210847726 Q * Supaplex resistance.oftc.net tachyon.oftc.net
1210848096 Q * MatBoy Remote host closed the connection
1210848239 J * MatBoy ~MatBoy@wiljewelwetenhe.xs4all.nl
1210849325 J * friendly ~friendly@ppp59-167-137-15.lns3.mel6.internode.on.net
1210849514 J * mire ~mire@36-175-222-85.adsl.verat.net
1210850738 Q * nox Ping timeout: 480 seconds
1210851418 Q * JonB Ping timeout: 480 seconds
1210851653 Q * Aiken Quit: Leaving
1210851690 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210852518 Q * mick_work Ping timeout: 480 seconds
1210852967 Q * friendly Quit: Leaving.
1210852978 J * JonB ~NoSuchUse@192.38.8.25
1210853128 Q * mire Ping timeout: 480 seconds
1210853279 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210854126 J * nox ~nox@static.88-198-17-175.clients.your-server.de
1210854388 Q * sharkjaw Quit: Leaving
1210855845 Q * Fire_Egl Ping timeout: 480 seconds
1210856769 J * FloF ~FloF@p50813807.dip0.t-ipconnect.de
1210856789 M * FloF hi
1210856802 Q * FloF
1210857420 Q * dennis__ Remote host closed the connection
1210857436 J * ki1 ~kir@swsoft-msk-nat.sw.ru
1210857471 J * FireEgl FireEgl@adsl-17-148-127.bhm.bellsouth.net
1210857679 J * ki2 ~kir@swsoft-msk-nat.sw.ru
1210858041 Q * ki1 Ping timeout: 480 seconds
1210859002 J * squat_ ~squat@85-10-210-61.clients.your-server.de
1210859002 Q * squat Read error: Connection reset by peer
1210859006 N * squat_ squat
1210859216 J * edlinuxguru ~edlinuxgu@72.sub-72-125-71.myvzw.com
1210859223 Q * opuk Remote host closed the connection
1210859747 Q * Slydder Quit: Leaving.
1210859950 Q * Loki|muh Remote host closed the connection
1210860226 N * Bertl_zZ Bertl
1210860232 M * Bertl morning folks!
1210860264 J * rgl_ ~rgl@bl8-135-125.dsl.telepac.pt
1210860289 M * JonB hey Bertl
1210860435 M * sandman May I apply a new vserver patch to an older kernel (2.6.18)?
1210860538 M * Bertl sure, but it will most likely fail
1210860674 Q * rgl Ping timeout: 480 seconds
1210860705 J * opuk ~kupo@2001:16d8:ffbd:100::10
1210860729 M * sandman I see.
1210860791 J * ntrs_ ~ntrs@77.29.66.22
1210860803 M * Bertl why would you want to apply a (more) recent patch to 2.6.18?
1210861387 M * sandman Just that it's the one that's running in Debian Stable
1210861402 M * sandman not a problem though, I'll likely just be dist-upgrading to debian Lenny shortly
1210861576 J * pmenier ~pmenier@ACaen-152-1-68-97.w83-115.abo.wanadoo.fr
1210861599 M * kriebel what feature(s) are you looking for in this patch?
1210861619 M * daniel_hozac "maintainedness" is one that comes to mind...
1210861645 M * kriebel is the stable fork getting long in the tooth?
1210861657 M * daniel_hozac what?
1210861682 M * kriebel I thought the patch in debian stable was the advertised "stable" version of vserver
1210861689 M * kriebel but I never really checked
1210861704 M * Bertl it is the _previous_ stable release :)
1210861706 M * daniel_hozac that's old-stable.
1210861721 M * Bertl kriebel: i.e. the one we stopped working on about a year or so ago :)
1210861722 M * daniel_hozac i.e. nobody-cares-about-it-same-bugs-will-always-be-present-stable.
1210861957 M * kriebel I'm having trouble telling the version of the patch by poking at the running system, actually
1210861959 Q * edlinuxguru Ping timeout: 480 seconds
1210861999 M * Bertl kriebel: because debian, for whatever reason, removes the kernel name extension
1210862024 M * Bertl kriebel: but you can check the API version in /proc/virtual/info
1210862060 M * Bertl (or read up on the debian info/logs)
1210862098 M * kriebel I should email someone to request patch versions get put into package descriptions
1210862150 M * Bertl hehe, sounds like resolution: wontfix :)
1210862163 J * mire ~mire@36-175-222-85.adsl.verat.net
1210862593 Q * mick_work Ping timeout: 480 seconds
1210863143 M * kriebel well, I tried submitting a bug
1210863153 M * kriebel I think this is my first, or first in a looooong time
1210863161 M * kriebel don't know if it emailed properly
1210863166 M * Bertl keep us updated how it goes ...
1210863173 M * kriebel I will if it works
1210863368 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210863616 Q * phedny Ping timeout: 480 seconds
1210863916 J * phedny ~mark@2001:610:656::115
1210864100 J * _e1 ~sapan@aegis.CS.Princeton.EDU
1210864111 M * _e1 hullo
1210864115 M * _e1 hm.
1210864118 N * _e1 er
1210864124 M * Bertl hullo er!
1210864133 M * er greetings Bertl
1210864235 M * er so that kernel bug we were facing did in fact turn out to be in the vx_unhold loop
1210864247 M * er dunno if Daniel mentioned that here
1210864285 M * er anyhow, onward to the next bug
1210864288 M * Bertl don't remember .. so in princeton code only?
1210864305 M * er Bertl: in Princeton workloads only
1210864339 M * Bertl hmm? do we need to change anything in the mainline patch?
1210864363 M * er I think Andy is investigating the issue
1210864368 M * daniel_hozac i think andy's fixes are legit, and they're what caused the problem.
1210864400 M * er well, the freeze goes away when you disable his patch
1210864402 M * daniel_hozac i.e. i don't think you'll see it on a vanilla Linux-VServer kernel, without that patch.
1210864412 M * er yes, that's true.
1210864441 M * daniel_hozac but i think we should apply the current patch.
1210864444 M * er but he's of the opinion that the workload combined with his patch
1210864472 M * er leads to a degenerate case that could also happen in mainline, hence causing the freeze.
1210864482 M * Bertl okay, what's the patch/fixes we are talking about?
1210864484 M * er He's putting together a test to try to prove that, so we should hear from him about that soon.
1210864499 M * daniel_hozac https://svn.planet-lab.org/browser/linux-2.6/trunk/linux-2.6-210-vserver-cpu-sched.patch?format=txt
1210864583 N * micah_ micah
1210864656 M * er anyhow, the bug I'm trying to stomp out is a different one and involves this line in VNET that you wrote when you were down here:
1210864660 M * er skb->skb_tag = nx_current_nid();
1210864693 M * er in alloc_skb. So it turns out that alloc_skb need not always happen in process context
1210864719 M * Bertl no, but it should happen in the network context :)
1210864751 M * er hm, and what would that be for an incoming ICMP packet?
1210864769 M * daniel_hozac i think by process context he refers to != interrupt context.
1210864772 M * Bertl incoming packets without classification do not have any context by default
1210864795 M * er I see, and how does nx_current_nid report "not having any context" ?
1210864821 M * er right now it appears to be returning the previously active network context
1210864841 M * Bertl probably because the 'previous task' is still current
1210864867 M * Bertl you might want to special case that for irqs, as daniel pointed out
1210864871 M * er right.
1210864889 M * er so we tried that... we added an if (in_interrupt()) prior to that
1210864918 Q * Pazzo Quit: Ex-Chat
1210864919 M * er but that drops the skb_tag for all packets, suggesting that alloc_skb even for outgoing packets happens in interrupt context
1210864936 M * daniel_hozac hmm, it was !in_interrupt(), right?
1210864951 J * cryptronic ~oli@p54A3B3C7.dip0.t-ipconnect.de
1210864957 M * er daniel_hozac: urrrkkkkkkkkkkkkk
1210864994 M * Bertl regarding the svn patch, the third hunk in sched.c, why advance normal time in integral steps?
1210865065 M * daniel_hozac well, the integral is how much time it's going to use, no?
1210865100 M * daniel_hozac it's what's done for idle time already, and, as i recall, how you explained it to me :)
1210865113 M * daniel_hozac why should it be different from idle time?
1210865137 M * Bertl good point, so that is a clear bug then
1210865229 M * Bertl and what's the change in the math for delta_min?
1210865332 M * daniel_hozac that took me quite some time to figure out...
1210865448 M * Bertl ah, I see, we allocate the 'wrong' part of the interval
1210865461 Q * phedny Ping timeout: 480 seconds
1210865473 Q * mick_work Ping timeout: 480 seconds
1210865732 M * Bertl the delta<0 case only happens for new contexts, I guess?
1210865746 M * Bertl I mean, is an overrun really realistic?
1210865770 M * daniel_hozac new contexts are initialized to the current value of jiffies.
1210865795 M * daniel_hozac IIRC, the math worked out to 49 days or something for an overrun.
1210865848 M * Bertl well, 49 days without scheduling activity, and I presume the context is dead anyway :)
1210865939 M * daniel_hozac heh, well, it's not impossible on PL, with persistent contexts and all.
1210865948 M * Bertl okay
1210865953 M * daniel_hozac but yes, i'm not sure we want another check on such a hot path.
1210865986 M * daniel_hozac if anything, it should be an unlikely else if branch.
1210866247 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210866270 Q * ntrs_ Ping timeout: 480 seconds
1210866487 M * Bertl daniel_hozac: what about this one: http://vserver.13thfloor.at/Experimental/delta-sched-fix05.diff ?
1210866496 M * nox daniel_hozac: is it possible 2 apply the signal patch to a running kernel?
1210866525 M * Bertl IIRC, yes, there is a project to do that
1210866530 Q * sandman Remote host closed the connection
1210866556 M * nox Bertl: will it be included in the 2.3.0.35?
1210866564 J * balbir ~balbir@122.167.177.15
1210866607 M * Bertl nox: we are talking about this one: http://people.linux-vserver.org/~dhozac/p/k/delta-signal-fix03.diff, right?
1210866658 M * nox yes
1210866676 M * Bertl yep, is already in my tree
1210866690 M * nox great
1210866703 M * daniel_hozac Bertl: http://people.linux-vserver.org/~dhozac/p/k/delta-space-fix01.diff btw. pidspace patches are still in progress...
1210866725 M * Bertl yeah, saw that one
1210866728 M * Bertl tx
1210866760 Q * bronson Read error: Connection reset by peer
1210866802 J * bronson ~bronson@adsl-68-122-117-135.dsl.pltn13.pacbell.net
1210866852 M * daniel_hozac Bertl: what about ceiling the result?
1210866871 M * daniel_hozac (i.e. the tokens % sched_pc->fill_rate[1] part)
1210866901 M * daniel_hozac don't forget the sched_hard.h change, that's what caused the hang.
1210867151 M * Bertl well, the tokens % sched_pc->fill_rate[1] part looks like a bug to me .. i.e. I do not see a point in doing that
1210867194 Q * rob-84x^ Ping timeout: 480 seconds
1210867248 M * nox is there an estimated release date for .35?
1210867262 M * Bertl near future
1210867396 M * Bertl daniel_hozac: for the vx_try_unhold() part, what is the point in running the
1210867410 M * Bertl vx_set_rq_max_idle/min_skip without proper values?
1210867455 M * daniel_hozac i thought they were initialized to 0
1210867471 M * Bertl well, maxidle = HZ, minskip=0
1210867477 M * daniel_hozac right.
1210867508 M * daniel_hozac minskip not being 0 is what causes an infinite loop in schedule.
1210867521 M * daniel_hozac i guess this is not all that relevant though, since most of this will need to be rewritten for 2.6.23+ anyway...
1210867555 M * Bertl yep, anyway, that looks like papering over a real issue
1210867624 M * daniel_hozac well, there's no context to schedule since all contexts require a bigger time slice than what is currently available.
1210867644 M * daniel_hozac but there's still some time left.
1210867659 M * Bertl wait, when the hold queue is empty, we have no _waiting_ contexts
1210867704 M * daniel_hozac hmm, right... so i guess there just aren't any processes to schedule, at all?
1210867718 M * daniel_hozac i never quite understood how this was triggered.
1210867720 M * Bertl in which case we should skip idle time or go idle
1210867733 M * daniel_hozac i guess it's the go idle part that doesn't happen.
1210867757 M * daniel_hozac vx_try_skip should probably check for an empty hold queue?
1210867800 M * Bertl yep, but if that was the error case, then there should be a log entry like:
1210867805 M * Bertl hold queue empty on cpu %d", cpu
1210867810 M * Bertl do we have those?
1210867823 M * daniel_hozac no, debugging is disabled in the kernel, and i don't think anybody tried to enable it.
1210867854 M * Bertl okay
1210867878 M * Bertl so I think we should assume that exactly this happened
1210867910 M * Bertl and I opt for adding a check to try_skip or even before that in the scheduler
1210867916 M * Bertl (will think about that)
1210868258 M * Bertl daniel_hozac: how about this one: http://vserver.13thfloor.at/Experimental/delta-sched-fix06.diff ?
1210868703 J * bonbons ~bonbons@2001:960:7ab:0:2c0:9fff:fe2d:39d
1210869406 N * DoberMann DoberMann[PullA]
1210871258 J * phedny ~mark@2001:610:656::115
1210871302 J * ntrs_ ~ntrs@77.29.66.22
1210871396 Q * er Quit: Leaving.
1210871444 N * Guest818 phedny_
1210871668 Q * Linus Ping timeout: 480 seconds
1210872314 Q * yarihm Quit: This computer has gone to sleep
1210873602 Q * bfremo1 Quit: Leaving.
1210873929 J * rob-84x^ ~rob@submarine.ath.cx
1210874309 Q * Bertl Ping timeout: 480 seconds
1210874427 N * pmenier pmenier_off
1210874521 J * hparker ~hparker@linux.homershut.net
1210874812 J * Piet ~piet@tor.noreply.org
1210875193 Q * mick_work Ping timeout: 480 seconds
1210875804 J * Bertl herbert@IRC.13thfloor.at
1210875967 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210876783 M * daniel_hozac Bertl: looks good.
1210876873 Q * mick_work Ping timeout: 480 seconds
1210876950 M * Bertl could you be so kind and point andy to those patches?
1210876970 M * daniel_hozac okay, will do.
1210876977 M * Bertl it might make perfect sense to test them on planetlab with the problematic loads
1210877459 Q * JonB Quit: This computer has gone to sleep
1210877620 J * ntrs__ ~ntrs@77.29.65.228
1210877640 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210878055 Q * ntrs_ Ping timeout: 480 seconds
1210879821 J * doener_ ~doener@i577AFC80.versanet.de
1210879926 Q * doener Ping timeout: 480 seconds
1210879933 J * docelic ~docelic@78.134.196.168
1210880244 Q * rgl_ Quit: Saindo
1210880289 Q * Bertl Ping timeout: 480 seconds
1210882451 J * phedny__ ~mark@2001:610:656::115
1210882490 J * arthur_ ~arthur@pan.madism.org
1210882577 J * ktwilight ~ktwilight@136.76-66-87.adsl-dyn.isp.belgacom.be
1210882718 Q * phedny charon.oftc.net venus.oftc.net
1210882718 Q * arthur charon.oftc.net venus.oftc.net
1210882718 Q * bzed charon.oftc.net venus.oftc.net
1210882718 Q * nou charon.oftc.net venus.oftc.net
1210882718 Q * ard charon.oftc.net venus.oftc.net
1210882753 J * nou Chaton@causse.larzac.fr.eu.org
1210882854 Q * ktwilight_ Ping timeout: 480 seconds
1210883737 J * ard ~ard@shell2.kwaak.net
1210883759 J * bzed ~bzed@devel.recluse.de
1210883980 J * grim1 ~ben@68-117-50-137.dhcp.roch.mn.charter.com
1210884130 M * grim1 Hi there -- are we aware of any awful bugs in 2.6.25-vs2.3.0.34.9?
1210884192 M * daniel_hozac apart from the missing features? 2 or 3 bugs are known, yes.
1210884229 M * grim1 which features are still missing? (unless there are too many to list)
1210884299 M * daniel_hozac pid spaces, scheduler support, vtime(?).
1210884358 M * emag booting, fs access, actually virtualizing, not sending swarms of zombie robots out to kill all humans...
1210884362 M * emag :-)
1210884374 M * daniel_hozac oh no, those all work.
1210884381 M * grim1 hah -- so not worth experimenting with?
1210884385 M * daniel_hozac there's a limit on the number of robots though.
1210884397 M * daniel_hozac just 49151 robots are supported at this time.
1210884568 M * grim1 showstopper type bugs? or just run of the mill box getting pwnd 30 seconds after it's booted?
1210884611 M * daniel_hozac if you're just gonna experiment, it's fine.
1210884628 M * daniel_hozac just make you get http://people.linux-vserver.org/~dhozac/p/k/delta-signal-fix03.diff and http://people.linux-vserver.org/~dhozac/p/k/delta-space-fix01.diff
1210884631 M * daniel_hozac +sure
1210884694 Q * nou Ping timeout: 480 seconds
1210884739 M * grim1 but avoid production use, mainly because... swarms of zombie robots will destroy my data center?
1210884774 M * daniel_hozac right.
1210884785 M * grim1 fun fun
1210884804 M * grim1 few more months?
1210884827 M * daniel_hozac until what?
1210884833 M * grim1 until a stable release?
1210884840 M * daniel_hozac for 2.6.25?
1210884842 M * grim1 yep
1210884856 M * grim1 or something higher than 2.6.22
1210884862 M * daniel_hozac well, it depends.
1210884902 M * daniel_hozac it's maybe two weeks actual work, but then you need testing. loooooots of testing.
1210884945 M * daniel_hozac (note: two weeks is an optimistic guess. since (AFAIK) nobody has tried working with CFS, it may very well take longer)
1210885103 Q * bonbons Quit: Leaving
1210885184 M * grim1 thanks for the update
1210885191 J * JonB ~NoSuchUse@77.75.164.169
1210885251 M * grim1 is the project status published anywhere? I couldn't find anything on the experimental releases
1210885267 M * daniel_hozac IRC logs, i guess.
1210885279 M * grim1 yeah, that's about it... tough sifting though
1210885358 M * daniel_hozac feel free to start publishing it :)
1210885398 M * grim1 then I'd have to pay attention!
1210885657 J * nou Chaton@causse.larzac.fr.eu.org
1210885693 J * Aiken ~james@ppp121-45-230-114.lns1.bne4.internode.on.net
1210885762 J * Bertl herbert@IRC.13thfloor.at
1210886172 M * Bertl hmm .. strange .. I keep timing out today for no apparent reason
1210886217 M * daniel_hozac yeah, Pinky survives...
1210886228 M * Bertl yeah, that's the funny part :)
1210886327 M * daniel_hozac Andy preferred removing the optimization in vx_try_unhold (or explicitly setting the idle_skip before returning) over sched-fix06.
1210886382 M * Bertl okay, that's fine for me, shouldn't make any big difference
1210886416 M * daniel_hozac right
1210886457 M * Bertl is he fine with the rest?
1210886486 M * daniel_hozac i guess so, he didn't explicitly comment on them.
1210886510 M * Bertl doener_: didn't you plan to play with the scheduler last time? (for 2.6.25+ that is)
1210886542 M * daniel_hozac i think we've all planned on playing with the scheduler...
1210886579 M * doener_ Bertl: yeah, but I got lost in the code at some point, and life caught up as well :-(
1210886585 M * Bertl hmm, well, I didn't plan to do so (yet), was hoping for the control groups stuff to improve
1210886649 M * daniel_hozac it doesn't look like anyone cares for limiting CPU.
1210886659 M * Bertl doener_: np, just curious what you found so far
1210886706 M * doener_ I turned that off in my config when I first saw it limiting a task to 50% cpu because another user has another cpu churning task (which was niced to 19...)
1210886714 M * Bertl doener_, daniel_hozac: care to do a short brainstorming about this subject?
1210886745 M * Bertl doener_: turned what off?
1210886757 M * doener_ the group scheduler
1210886786 M * doener_ the uid based grouping was crap (hey, I said nice 19 for a reason...)
1210886796 M * Bertl ah, because it gave fair sharing between uids, I get it
1210886799 M * doener_ and the explicit groups were of no interest to me
1210886821 M * daniel_hozac Bertl: sure.
1210886877 M * doener_ WRT to working on it, I basically ended up looking at some bug report again, about breakage with some hr timer stuff. And there I got stuck for a looong time
1210886975 M * doener_ I wanted to check at some point, whether the control groups are namespace-aware in _some_ way, but never got around to it
1210886984 M * Bertl well, I think the actual changes in the hard scheduler design are minimal to make it work with the 'completely fair' scheduler
1210886993 N * DoberMann[PullA] DoberMann
1210887015 M * Bertl IMHO we have three issues/areas to work on
1210887028 M * daniel_hozac doener_: what do you mean?
1210887030 M * Bertl first, the change from runqueues to rb-trees
1210887081 M * Bertl second, the change from tick based to completely tickless (i.e. time based)
1210887099 M * doener_ because OToneH it seems like using the control groups would be smart, but OTOH, we'd then need them recursively. Once to group the guests and then in those groups, the "normal" support
1210887145 M * Bertl and finally, from priority adjustments to calculated fair share priorities
1210887174 M * Bertl yes, I think the control groups will give us some headache to incorporate in Linux-VServer
1210887215 M * daniel_hozac i don't know... it shouldn't be too hard.
1210887224 M * Bertl but I also think we should treat them like a namespace, i.e. allow them to be set on a per guest basis
1210887226 M * daniel_hozac but i haven't looked at the controllers yet
1210887264 M * Bertl (and ignore stuff like per user control groups and such)
1210887307 M * doener_ if there's sub-control-group support, it should be quite easy to adapt that, the math should be "kind of" there for us... but I didn't look at it yet either
1210887337 M * daniel_hozac i don't think any of the controllers support hierarchical resources yet.
1210887346 M * daniel_hozac but i think it'd be fine to just not allow guests to use cgroups.
1210887360 M * Bertl well, there is a reason for not having guests inside guests ... they tend to complicate typical hot paths
1210887370 M * daniel_hozac (i thought they needed something like CAP_SYS_ADMIN either way)
1210887418 M * Bertl doener_: don't forget that although the math is there, it needs to be calculated (and that will happen fairly often)
1210887447 Q * MatBoy Quit: Ik ga weg
1210887480 M * doener_ Bertl: yeah, but if it's there, mainline has probably already optimized it. Though I kinda doubt that it actually _is_ there.
1210887517 M * daniel_hozac if it is, it's on a per-controller basis. there's no high-level support for it.
1210887520 M * Bertl now, IMHO the alternative to sticking the hard cpu scheduler on top of the cfscheduler we could think about improving the cfs scheduler to allow for cpu limits
1210887522 Q * cryptronic Quit: Leaving.
1210887539 M * Bertl +is
1210887561 M * Bertl i.e. for actual hard limits
1210887572 M * daniel_hozac wouldn't that be pretty much the same thing, only the latter drops all the vx_ names? :)
1210887582 M * daniel_hozac oh, no idle time?
1210887599 M * Bertl precisely, that could be covered by the fair scheduling part
1210887614 M * daniel_hozac hmm, i suppose.
1210887649 M * daniel_hozac nice values don't have quite the same granularity though.
1210887695 M * Bertl the thing is, although I'd love to improve on the mainline scheduler, the hard cpu scheduler is a big development and (except for corner cases and small bugs we still keep fixing) has proven quite stable
1210887710 M * daniel_hozac right.
1210887793 Q * mick_work Ping timeout: 480 seconds
1210887881 M * Bertl okay, any ideas/insights so far how to control the time a process is scheduled, without harming the entire cfs framework?
1210887901 Q * balbir Read error: Operation timed out
1210887904 J * brc bruce@megarapido.cliquerapido.com.br
1210888384 M * Bertl I also had the strange idea to add a 'dummy' CPU which just doesn't get work done at all, but allows a control mechanism (token bucket) to move tasks to/from
1210888463 M * doener_ hm, somehow that sounds familiar...
1210888568 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210888577 J * yarihm ~yarihm@84-75-103-252.dclient.hispeed.ch
1210888594 M * daniel_hozac that sounds like a lot of work.
1210888608 M * doener_ ah, it was with cpusets, but for memory (fake NUMA nodes)
1210888624 Q * ntrs__ Ping timeout: 480 seconds
1210888659 M * Bertl the main question is, what happens when we grab a task out of the rb tree and put it on our own rb-tree for some time
1210888673 M * Bertl (like with the hold-queue)
1210888703 M * daniel_hozac do we need our own rb-tree?
1210888713 M * Bertl will this or putting it back make the cfs go haywire?
1210888723 Q * dna Quit: Verlassend
1210888736 M * daniel_hozac i'd assume tasks that aren't running also aren't present in the rb-tree. is that a faulty assumption?
1210888743 M * Bertl good question, a different approach would be to keep the tasks from scheduling
1210888781 M * Bertl the question is, how does the rb-tree perform if there are 1000+ tasks which do not want to be scheduled :)
1210888786 Q * JonB Ping timeout: 480 seconds
1210888831 M * Bertl i.e. we would have to keep them somehow out of the calculation
1210888843 M * daniel_hozac isn't a hold queue easier?
1210888874 M * Bertl well, yeah, but that would require us to have hold-queue and rb-tree structures
1210888888 M * daniel_hozac why the need for the rb-tree?
1210888899 M * Bertl because mainline cfs uses it?
1210888917 M * daniel_hozac right, but why would _we_ need it? or do you mean the mainline rb-tree?
1210888948 M * Bertl yes, the current structure in each task
1210888961 M * daniel_hozac okay.
1210888997 M * Bertl but, I think the rb tree could actually serve some purpose in our hold-rbt too
1210889057 M * Bertl it could structure the tasks/contexts according to the time they are supposed to be idle
1210889080 M * daniel_hozac yeah, i guess that makes sense.
1210889094 M * daniel_hozac though we could do that with a list too, no?
1210889102 M * Bertl yes, definitely
1210889106 Q * yarihm Quit: This computer has gone to sleep
1210889109 M * doener_ a quick glance over the code suggests the thread is dropped from the rb tree when it is not runnable
1210889128 M * Bertl dropped where?
1210889139 M * Bertl I mean, where is it at that time?
1210889173 M * Bertl hmm, let me rephrase that once again: where does it get 'stored' instead?
1210889175 M * doener_ that, I'm still trying to figure out, just noticed an rb_erase call somewhere in the call-chain
1210889250 M * doener_ the sched_entity stuff still confuses the hell out of me...
1210889275 M * daniel_hozac yeah...
1210889279 M * Bertl it was added to somehow accommodate the cgroups
1210889283 M * daniel_hozac i guess we'll need to use that.
1210889294 M * daniel_hozac for the idle time replacement.
1210889299 M * Bertl we could basically disable it for now
1210889321 Q * esa Ping timeout: 480 seconds
1210889337 M * Bertl (if we put the hard cpu scheduler on top as controller)
1210889407 Q * Piet Quit: Piet
1210889497 Q * _gh_ Remote host closed the connection
1210889660 Q * opuk Remote host closed the connection
1210889674 J * opuk ~kupo@2001:16d8:ffbd:100::10
1210889796 J * esa ~esa@ip-87-238-2-45.static.adsl.cheapnet.it
1210890129 J * bfremon ~ben@lns-bzn-52-82-65-102-239.adsl.proxad.net
1210890464 N * DoberMann DoberMann[ZZZzzz]
1210890918 Q * mick_work Ping timeout: 480 seconds
1210890969 M * doener_ hm, ok, so sched_entity does actually have a parent pointer
1210890975 N * arthur_ arthur
1210891692 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210891809 J * [PUPPETS]Gonzo gonzo@fellatio.deswahnsinns.de
1210892069 J * Linus ~nuhx@bl7-130-23.dsl.telepac.pt
1210892077 Q * bfremon Quit: Leaving.
1210894804 M * Guy- hi, has the xfs link breaking corruption issue been fixed yet?
1210894844 M * Bertl don't think so .. and I doubt that xfs will get a lot of care in the near future
1210894870 M * Bertl (unless somebody _really_ cares about it :)
1210894897 M * Guy- are there any other known issues?
1210894902 M * Guy- with xfs, I mean
1210894925 M * Bertl with the changes in 2.6.24+, xfs would need a complete overhaul
1210894943 M * Guy- yikes
1210894954 M * Guy- what does that mean for the end user?
1210894977 M * Guy- sharing xfs filesystems between guests doesn't work? having an xfs filesystem mounted inside a single guest doesn't work?
1210894992 M * Bertl basically that we are likely going to remove the tagging and barrier flags
1210895019 M * Bertl which in turn, would make xfs not suitable for secure guests
1210895035 M * Guy- at least not as their rootfs, right?
1210895043 M * Bertl correct
1210895055 M * Guy- I don't think I care about tagging
1210895080 M * Bertl but we are investigating the options there .. actually the tagging is less of a problem than the flags, it seems
1210895116 M * Guy- can't you use extended attributes?
1210895130 M * Bertl the flagword we used (inside xfs) has been used up by other xfs flags, which basically kills the piggy-back ride we used to have
1210895161 M * Bertl you definitely don't want extended attribute checks in time sensitive code
1210895172 M * Guy- I see
1210895186 M * Bertl well, I definitely wouldn't want that, but I do not consider xfs a performant filesystem either
1210895204 M * Guy- it seems to be doing pretty well in multi-threaded benchmarks though
1210895232 M * Bertl do you have some with a good comparison to e.g. ext3?
1210895240 M * Guy- well, yes and no
1210895264 M * Bertl just curious, i.e. not starting the fs flame war (yet :)
1210895269 M * Guy- a friend ran some benchmarks that I'm inclined to take seriously
1210895288 M * Guy- he didn't just use bonnie but also iozone and postmark
1210895294 M * Guy- with several threads
1210895307 M * Bertl on the same physical partition(s)?
1210895312 M * Guy- but only one or two different hardware configurations, I don't recall that exactly
1210895315 M * Guy- yes, same physical devices
1210895338 M * Bertl and the results are somewhere?
1210895344 M * Guy- he created one fs, ran the benchmarks (this took a couple of days), then removed the fs, created the new one, etc
1210895370 M * Guy- yes, I'm sure he still has them, I'll ask him when he wakes up
1210895386 M * Guy- the trend I though I could perceive was that ext3 deteriorated pretty rapidly as the number of threads went up
1210895395 M * Guy- whereas xfs and jfs didn't
1210895405 M * Guy- *thought
1210895419 M * Bertl would be interesting to look at
1210895538 M * Guy- anyway, doesn't xfs perform xattr lookups anyway? I thought posix ACLs were implemented as xattrs
1210895559 M * Bertl xattrs and extended attributes are two different things
1210895579 M * Bertl but yes, posix ACLs are implemented in extended attributes
1210895590 M * Bertl (doesn't make them more performant)
1210895644 M * Guy- what I meant was, would an additional attribute lookup make that much of a difference?
1210895660 M * Guy- also, I was under the impression that xattr and extended attributes were the same thing
1210895670 M * Guy- there's even a user_xattr or similar mount option
1210895677 M * Bertl no, but it would require having those attributes turned on unconditionally, which I don't want to do
1210895689 Q * infowolfe Read error: Connection reset by peer
1210895743 M * Guy- it was my understanding that extended attributes were 'forks', as in alternative data areas of a file
1210895759 M * Guy- and that xattr was an abbreviation for extended attributes
1210895766 M * Guy- isn't this the case?
1210895780 M * Bertl nah, that is some kind of confusion the kernel creates
1210895783 J * infowolfe ~infowolfe@c-67-160-167-96.hsd1.or.comcast.net
1210895815 M * Bertl attributes like for example the immutable 'flag' or 'append'
1210895815 M * Guy- so which is what then? :)
1210895823 M * Guy- yes, those are attributes
1210895830 M * Bertl are not realized with extended attributes
1210895834 M * Guy- but they're not 'extended' attributes
1210895838 M * Guy- exactly
1210895842 M * Guy- they're just flags in the fs
1210895844 M * Bertl they are so called 'xattribs'
1210895851 M * Guy- oh, wonderful :)
1210895856 M * Bertl (guess what that stands for :)
1210895872 M * Guy- extra attributes? :)
1210895878 M * Bertl so, we are basically using those flags for our purpose
1210895899 M * Bertl unfortunately, xfs was kind of cheap when allocation space for them
1210895905 M * Bertl *allocating
1210895918 M * Bertl and now, they have run out of them completelsy
1210895921 M * Bertl *completely
1210895948 M * Bertl could be, that the xfs folks extend that in the near future
1210895962 M * Bertl at which point, we would be back in the game
1210895970 M * Guy- speaking of which, what's with this filesystem capabilities stuff?
1210895989 M * Bertl posix filesystem capabilities?
1210895990 M * Guy- have they finally gotten around to actually doing this so it works? we no longer need setuid root?
1210895993 M * Guy- yes