1210809625 Q * edlinuxguru Ping timeout: 480 seconds
1210812193 Q * mick_work Ping timeout: 480 seconds
1210812864 Q * besonen_mobile Ping timeout: 480 seconds
1210812971 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210813248 J * dna ~dna@106-233-dsl.kielnet.net
1210813643 Q * dna Quit: Verlassend
1210814620 Q * mire Ping timeout: 480 seconds
1210814713 Q * mick_work Ping timeout: 480 seconds
1210815461 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210818318 Q * zbyniu Ping timeout: 480 seconds
1210824793 Q * mick_work Ping timeout: 480 seconds
1210825279 J * cryptronic ~oli@p54A3B3C7.dip0.t-ipconnect.de
1210825566 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210826473 Q * balbir Read error: Operation timed out
1210827193 Q * mick_work Ping timeout: 480 seconds
1210827482 Q * cryptronic Quit: Leaving.
1210827759 J * yarihm ~yarihm@84-75-103-252.dclient.hispeed.ch
1210827943 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210829371 Q * yarihm Ping timeout: 480 seconds
1210829425 J * zbyniu ~zbyniu@host13-188.crowley.pl
1210829880 J * sharkjaw ~gab@64.28.12.166
1210829881 J * Slydder ~chuck@194.59.17.53
1210830369 J * yarihm ~yarihm@vpn-global-dhcp3-076.ethz.ch
1210831001 M * Bertl finally off to bed now ... have a good one everyone!
1210831006 N * Bertl Bertl_zZ
1210832162 J * balbir ~balbir@59.145.136.1
1210832382 J * ntrs__ ~ntrs@77.29.67.161
1210833897 J * hijacker__ ~hijacker@213.91.163.5
1210833897 Q * hijacker_ Read error: Connection reset by peer
1210834383 J * ntrs_ ~ntrs@77.29.67.79
1210834674 J * rgl ~rgl@bl8-135-125.dsl.telepac.pt
1210834677 M * rgl hi
1210834819 Q * ntrs__ Ping timeout: 480 seconds
1210835546 J * JonB ~NoSuchUse@77.75.164.169
1210836009 Q * rob-84x^ Ping timeout: 480 seconds
1210836320 Q * Slydder Remote host closed the connection
1210836352 J * Slydder ~chuck@194.59.17.53
1210836498 Q * ntrs_ Ping timeout: 480 seconds
1210836932 J * bfremon ~ben@lns-bzn-33-82-252-45-56.adsl.proxad.net
1210837007 J * rob-84x^ ~rob@submarine.ath.cx
1210837393 Q * mick_work Ping timeout: 480 seconds
1210837436 N * DoberMann[ZZZzzz] DoberMann
1210837531 Q * wibble Remote host closed the connection
1210837753 J * MatBoy ~MatBoy@wiljewelwetenhe.xs4all.nl
1210837772 Q * JonB Quit: This computer has gone to sleep
1210838167 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210839491 Q * bfremon Remote host closed the connection
1210839673 Q * mick_work Ping timeout: 480 seconds
1210839714 J * bfremon ~ben@lns-bzn-33-82-252-45-56.adsl.proxad.net
1210839936 J * JonB ~NoSuchUse@130.227.63.19
1210840196 J * dna ~dna@88-233-dsl.kielnet.net
1210840421 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210842114 Q * [PUPPETS]Gonzo Remote host closed the connection
1210843880 Q * bfremon Quit: Leaving.
1210844276 J * bfremon ben@lns-bzn-33-82-252-45-56.adsl.proxad.net
1210845140 J * bfremo1 ben@lns-bzn-52-82-65-102-239.adsl.proxad.net
1210845242 J * Pazzo ~ugelt@reserved-225136.rol.raiffeisen.net
1210845479 Q * bfremon Ping timeout: 480 seconds
1210846123 Q * rob-84x^ Quit: That's it for today
1210846130 J * rob-84x^ ~rob@submarine.ath.cx
1210846904 Q * Fire_Egl Ping timeout: 480 seconds
1210847515 J * Fire_Egl FireEgl@adsl-147-90-212.bhm.bellsouth.net
1210847692 J * Supaplex_ supaplex@166.70.62.194
1210847693 J * micah_ ~micah@micah.riseup.net
1210847726 Q * mick_work resistance.oftc.net tachyon.oftc.net
1210847726 Q * balbir resistance.oftc.net tachyon.oftc.net
1210847726 Q * hparker resistance.oftc.net tachyon.oftc.net
1210847726 Q * micah resistance.oftc.net tachyon.oftc.net
1210847726 Q * brc resistance.oftc.net tachyon.oftc.net
1210847726 Q * Supaplex resistance.oftc.net tachyon.oftc.net
1210848096 Q * MatBoy Remote host closed the connection
1210848239 J * MatBoy ~MatBoy@wiljewelwetenhe.xs4all.nl
1210849325 J * friendly ~friendly@ppp59-167-137-15.lns3.mel6.internode.on.net
1210849514 J * mire ~mire@36-175-222-85.adsl.verat.net
1210850738 Q * nox Ping timeout: 480 seconds
1210851418 Q * JonB Ping timeout: 480 seconds
1210851653 Q * Aiken Quit: Leaving
1210851690 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210852518 Q * mick_work Ping timeout: 480 seconds
1210852967 Q * friendly Quit: Leaving.
1210852978 J * JonB ~NoSuchUse@192.38.8.25
1210853128 Q * mire Ping timeout: 480 seconds
1210853279 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210854126 J * nox ~nox@static.88-198-17-175.clients.your-server.de
1210854388 Q * sharkjaw Quit: Leaving
1210855845 Q * Fire_Egl Ping timeout: 480 seconds
1210856769 J * FloF ~FloF@p50813807.dip0.t-ipconnect.de
1210856789 M * FloF hi
1210856802 Q * FloF
1210857420 Q * dennis__ Remote host closed the connection
1210857436 J * ki1 ~kir@swsoft-msk-nat.sw.ru
1210857471 J * FireEgl FireEgl@adsl-17-148-127.bhm.bellsouth.net
1210857679 J * ki2 ~kir@swsoft-msk-nat.sw.ru
1210858041 Q * ki1 Ping timeout: 480 seconds
1210859002 J * squat_ ~squat@85-10-210-61.clients.your-server.de
1210859002 Q * squat Read error: Connection reset by peer
1210859006 N * squat_ squat
1210859216 J * edlinuxguru ~edlinuxgu@72.sub-72-125-71.myvzw.com
1210859223 Q * opuk Remote host closed the connection
1210859747 Q * Slydder Quit: Leaving.
1210859950 Q * Loki|muh Remote host closed the connection
1210860226 N * Bertl_zZ Bertl
1210860232 M * Bertl morning folks!
1210860264 J * rgl_ ~rgl@bl8-135-125.dsl.telepac.pt
1210860289 M * JonB hey Bertl
1210860435 M * sandman May I apply a new vserver patch to an older kernel (2.6.18)?
1210860538 M * Bertl sure, but it will most likely fail
1210860674 Q * rgl Ping timeout: 480 seconds
1210860705 J * opuk ~kupo@2001:16d8:ffbd:100::10
1210860729 M * sandman I see.
1210860791 J * ntrs_ ~ntrs@77.29.66.22
1210860803 M * Bertl why would you want to apply a (more) recent patch to 2.6.18?
1210861387 M * sandman Just that it's the one that's running in Debian Stable
1210861402 M * sandman not a problem though, I'll likely just be dist-upgrading to debian Lenny shortly
1210861576 J * pmenier ~pmenier@ACaen-152-1-68-97.w83-115.abo.wanadoo.fr
1210861599 M * kriebel what feature(s) are you looking for in this patch?
1210861619 M * daniel_hozac "maintainedness" is one that comes to mind...
1210861645 M * kriebel is the stable fork getting long in the tooth?
1210861657 M * daniel_hozac what?
1210861682 M * kriebel I thought the patch in debian stable was the advertised "stable" version of vserver
1210861689 M * kriebel but I never really checked
1210861704 M * Bertl it is the _previous_ stable release :)
1210861706 M * daniel_hozac that's old-stable.
1210861721 M * Bertl kriebel: i.e. the one we stopped working on about a year or so ago :)
1210861722 M * daniel_hozac i.e. nobody-cares-about-it-same-bugs-will-always-be-present-stable.
1210861957 M * kriebel I'm having trouble telling the version of the patch by poking at the running system, actually
1210861959 Q * edlinuxguru Ping timeout: 480 seconds
1210861999 M * Bertl kriebel: because debian, for whatever reason, removes the kernel name extension
1210862024 M * Bertl kriebel: but you can check the API version in /proc/virtual/info
1210862060 M * Bertl (or read up on the debian info/logs)
1210862098 M * kriebel I should email someone to request patch versions get put into package descriptions
1210862150 M * Bertl hehe, sounds like resolution: wontfix :)
1210862163 J * mire ~mire@36-175-222-85.adsl.verat.net
1210862593 Q * mick_work Ping timeout: 480 seconds
1210863143 M * kriebel well, I tried submitting a bug
1210863153 M * kriebel I think this is my first, or first in a looooong time
1210863161 M * kriebel don't know if it emailed properly
1210863166 M * Bertl keep us updated how it goes ...
1210863173 M * kriebel I will if it works
1210863368 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210863616 Q * phedny Ping timeout: 480 seconds
1210863916 J * phedny ~mark@2001:610:656::115
1210864100 J * _e1 ~sapan@aegis.CS.Princeton.EDU
1210864111 M * _e1 hullo
1210864115 M * _e1 hm.
1210864118 N * _e1 er
1210864124 M * Bertl hullo er!
1210864133 M * er greetings Bertl
1210864235 M * er so that kernel bug we were facing did in fact turn out to be in the vx_unhold loop
1210864247 M * er dunno if Daniel mentioned that here
1210864285 M * er anyhow, onward to the next bug
1210864288 M * Bertl don't remember .. so in princeton code only?
1210864305 M * er Bertl: in Princeton workloads only
1210864339 M * Bertl hmm? do we need to change anything in the mainline patch?
1210864363 M * er I think Andy is investigating the issue
1210864368 M * daniel_hozac i think andy's fixes are legit, and they're what caused the problem.
1210864400 M * er well, the freeze goes away when you disable his patch
1210864402 M * daniel_hozac i.e. i don't think you'll see it on a vanilla Linux-VServer kernel, without that patch.
1210864412 M * er yes, that's true.
1210864441 M * daniel_hozac but i think we should apply the current patch.
1210864444 M * er but he's of the opinion that the workload combined with his patch
1210864472 M * er leads to a degenerate case that could also happen in mainline, hence causing the freeze.
1210864482 M * Bertl okay, what's the patch/fixes we are talking about?
1210864484 M * er He's putting together a test to try to prove that, so we should hear from him about that soon.
1210864499 M * daniel_hozac https://svn.planet-lab.org/browser/linux-2.6/trunk/linux-2.6-210-vserver-cpu-sched.patch?format=txt
1210864583 N * micah_ micah
1210864656 M * er anyhow, the bug I'm trying to stomp out is a different one and involves this line in VNET that you wrote when you were down here:
1210864660 M * er skb->skb_tag = nx_current_nid();
1210864693 M * er in alloc_skb. So it turns out that alloc_skb need not always happen in process context
1210864719 M * Bertl no, but it should happen in the network context :)
1210864751 M * er hm, and what would that be for an incoming ICMP packet?
1210864769 M * daniel_hozac i think by process context he refers to != interrupt context.
1210864772 M * Bertl incoming packets without classification do not have any context by default
1210864795 M * er I see, and how does nx_current_nid report "not having any context" ?
1210864821 M * er right now it appears to be returning the previously active network context
1210864841 M * Bertl probably because the 'previous task' is still current
1210864867 M * Bertl you might want to special case that for irqs, as daniel pointed out
1210864871 M * er right.
1210864889 M * er so we tried that... we added an if (in_interrupt()) prior to that
1210864918 Q * Pazzo Quit: Ex-Chat
1210864919 M * er but that drops the skb_tag for all packets, suggesting that alloc_skb even for outgoing packets happens in interrupt context
1210864936 M * daniel_hozac hmm, it was !in_interrupt(), right?
1210864951 J * cryptronic ~oli@p54A3B3C7.dip0.t-ipconnect.de
1210864957 M * er daniel_hozac: urrrkkkkkkkkkkkkk
1210864994 M * Bertl regarding the svn patch, the third hunk in sched.c, why advance normal time in integral steps?
1210865065 M * daniel_hozac well, the integral is how much time it's going to use, no?
1210865100 M * daniel_hozac it's what's done for idle time already, and, as i recall, how you explained it to me :)
1210865113 M * daniel_hozac why should it be different from idle time?
1210865137 M * Bertl good point, so that is a clear bug then
1210865229 M * Bertl and what's the change in the math for delta_min?
1210865332 M * daniel_hozac that took me quite some time to figure out...
1210865448 M * Bertl ah, I see, we allocate the 'wrong' part of the interval
1210865461 Q * phedny Ping timeout: 480 seconds
1210865473 Q * mick_work Ping timeout: 480 seconds
1210865732 M * Bertl the delta<0 case only happens for new contexts, I guess?
1210865746 M * Bertl I mean, is an overrun really realistic?
1210865770 M * daniel_hozac new contexts are initialized to the current value of jiffies.
1210865795 M * daniel_hozac IIRC, the math worked out to 49 days or something for an overrun.
1210865848 M * Bertl well, 49 days without scheduling activity, and I presume the context is dead anyway :)
1210865939 M * daniel_hozac heh, well, it's not impossible on PL, with persistent contexts and all.
1210865948 M * Bertl okay
1210865953 M * daniel_hozac but yes, i'm not sure we want another check on such a hot path.
1210865986 M * daniel_hozac if anything, it should be an unlikely else if branch.
1210866247 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210866270 Q * ntrs_ Ping timeout: 480 seconds
1210866487 M * Bertl daniel_hozac: what about this one: http://vserver.13thfloor.at/Experimental/delta-sched-fix05.diff ?
1210866496 M * nox daniel_hozac: is it possible 2 apply the signal patch to a running kernel?
1210866525 M * Bertl IIRC, yes, there is a project to do that
1210866530 Q * sandman Remote host closed the connection
1210866556 M * nox Bertl: will it be included in the 2.3.0.35?
1210866564 J * balbir ~balbir@122.167.177.15
1210866607 M * Bertl nox: we are talking about this one: http://people.linux-vserver.org/~dhozac/p/k/delta-signal-fix03.diff, right?
1210866658 M * nox yes
1210866676 M * Bertl yep, is already in my tree
1210866690 M * nox great
1210866703 M * daniel_hozac Bertl: http://people.linux-vserver.org/~dhozac/p/k/delta-space-fix01.diff btw. pidspace patches are still in progress...
1210866725 M * Bertl yeah, saw that one
1210866728 M * Bertl tx
1210866760 Q * bronson Read error: Connection reset by peer
1210866802 J * bronson ~bronson@adsl-68-122-117-135.dsl.pltn13.pacbell.net
1210866852 M * daniel_hozac Bertl: what about ceiling the result?
1210866871 M * daniel_hozac (i.e. the tokens % sched_pc->fill_rate[1] part)
1210866901 M * daniel_hozac don't forget the sched_hard.h change, that's what caused the hang.
1210867151 M * Bertl well, the tokens % sched_pc->fill_rate[1] part looks like a bug to me .. i.e. I do not see a point in doing that
1210867194 Q * rob-84x^ Ping timeout: 480 seconds
1210867248 M * nox is there an estimated release date for .35?
1210867262 M * Bertl near future
1210867396 M * Bertl daniel_hozac: for the vx_try_unhold() part, what is the point in running the
1210867410 M * Bertl vx_set_rq_max_idle/min_skip without proper values?
1210867455 M * daniel_hozac i thought they were initialized to 0
1210867471 M * Bertl well, maxidle = HZ, minskip=0
1210867477 M * daniel_hozac right.
1210867508 M * daniel_hozac minskip not being 0 is what causes an infinite loop in schedule.
1210867521 M * daniel_hozac i guess this is not all that relevant though, since most of this will need to be rewritten for 2.6.23+ anyway...
1210867555 M * Bertl yep, anyway, that looks like papering over a real issue
1210867624 M * daniel_hozac well, there's no context to schedule since all contexts require a bigger time slice than what is currently available.
1210867644 M * daniel_hozac but there's still some time left.
1210867659 M * Bertl wait, when the hold queue is empty, we have no _waiting_ contexts
1210867704 M * daniel_hozac hmm, right... so i guess there just aren't any processes to schedule, at all?
1210867718 M * daniel_hozac i never quite understood how this was triggered.
1210867720 M * Bertl in which case we should skip idle time or go idle
1210867733 M * daniel_hozac i guess it's the go idle part that doesn't happen.
1210867757 M * daniel_hozac vx_try_skip should probably check for an empty hold queue?
1210867800 M * Bertl yep, but if that was the error case, then there should be a log entry like:
1210867805 M * Bertl hold queue empty on cpu %d", cpu
1210867810 M * Bertl do we have those?
1210867823 M * daniel_hozac no, debugging is disabled in the kernel, and i don't think anybody tried to enable it.
1210867854 M * Bertl okay
1210867878 M * Bertl so I think we should assume that exactly this happened
1210867910 M * Bertl and I opt for adding a check to try_skip or even before that in the scheduler
1210867916 M * Bertl (will think about that)
1210868258 M * Bertl daniel_hozac: how about this one: http://vserver.13thfloor.at/Experimental/delta-sched-fix06.diff ?
1210868703 J * bonbons ~bonbons@2001:960:7ab:0:2c0:9fff:fe2d:39d
1210869406 N * DoberMann DoberMann[PullA]
1210871258 J * phedny ~mark@2001:610:656::115
1210871302 J * ntrs_ ~ntrs@77.29.66.22
1210871396 Q * er Quit: Leaving.
1210871444 N * Guest818 phedny_
1210871668 Q * Linus Ping timeout: 480 seconds
1210872314 Q * yarihm Quit: This computer has gone to sleep
1210873602 Q * bfremo1 Quit: Leaving.
1210873929 J * rob-84x^ ~rob@submarine.ath.cx
1210874309 Q * Bertl Ping timeout: 480 seconds
1210874427 N * pmenier pmenier_off
1210874521 J * hparker ~hparker@linux.homershut.net
1210874812 J * Piet ~piet@tor.noreply.org
1210875193 Q * mick_work Ping timeout: 480 seconds
1210875804 J * Bertl herbert@IRC.13thfloor.at
1210875967 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210876783 M * daniel_hozac Bertl: looks good.
1210876873 Q * mick_work Ping timeout: 480 seconds
1210876950 M * Bertl could you be so kind and point andy to those patches?
1210876970 M * daniel_hozac okay, will do.
1210876977 M * Bertl it might make perfect sense to test them on planetlab with the problematic loads
1210877459 Q * JonB Quit: This computer has gone to sleep
1210877620 J * ntrs__ ~ntrs@77.29.65.228
1210877640 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210878055 Q * ntrs_ Ping timeout: 480 seconds
1210879821 J * doener_ ~doener@i577AFC80.versanet.de
1210879926 Q * doener Ping timeout: 480 seconds
1210879933 J * docelic ~docelic@78.134.196.168
1210880244 Q * rgl_ Quit: Saindo
1210880289 Q * Bertl Ping timeout: 480 seconds
1210882451 J * phedny__ ~mark@2001:610:656::115
1210882490 J * arthur_ ~arthur@pan.madism.org
1210882577 J * ktwilight ~ktwilight@136.76-66-87.adsl-dyn.isp.belgacom.be
1210882718 Q * phedny charon.oftc.net venus.oftc.net
1210882718 Q * arthur charon.oftc.net venus.oftc.net
1210882718 Q * bzed charon.oftc.net venus.oftc.net
1210882718 Q * nou charon.oftc.net venus.oftc.net
1210882718 Q * ard charon.oftc.net venus.oftc.net
1210882753 J * nou Chaton@causse.larzac.fr.eu.org
1210882854 Q * ktwilight_ Ping timeout: 480 seconds
1210883737 J * ard ~ard@shell2.kwaak.net
1210883759 J * bzed ~bzed@devel.recluse.de
1210883980 J * grim1 ~ben@68-117-50-137.dhcp.roch.mn.charter.com
1210884130 M * grim1 Hi there -- are we aware of any awful bugs in 2.6.25-vs2.3.0.34.9?
1210884192 M * daniel_hozac apart from the missing features? 2 or 3 bugs are known, yes.
1210884229 M * grim1 which features are still missing? (unless there are too many to list)
1210884299 M * daniel_hozac pid spaces, scheduler support, vtime(?).
1210884358 M * emag booting, fs access, actually virtualizing, not sending swarms of zombie robots out to kill all humans...
1210884362 M * emag :-)
1210884374 M * daniel_hozac oh no, those all work.
1210884381 M * grim1 hah -- so not worth experimenting with?
1210884385 M * daniel_hozac there's a limit on the number of robots though.
1210884397 M * daniel_hozac just 49151 robots are supported at this time.
1210884568 M * grim1 showstopper type bugs? or just run of the mill box getting pwnd 30 seconds after it's booted?
1210884611 M * daniel_hozac if you're just gonna experiment, it's fine.
1210884628 M * daniel_hozac just make you get http://people.linux-vserver.org/~dhozac/p/k/delta-signal-fix03.diff and http://people.linux-vserver.org/~dhozac/p/k/delta-space-fix01.diff
1210884631 M * daniel_hozac +sure
1210884694 Q * nou Ping timeout: 480 seconds
1210884739 M * grim1 but avoid production use, mainly because... swarms of zombie robots will destroy my data center?
1210884774 M * daniel_hozac right.
1210884785 M * grim1 fun fun
1210884804 M * grim1 few more months?
1210884827 M * daniel_hozac until what?
1210884833 M * grim1 until a stable release?
1210884840 M * daniel_hozac for 2.6.25?
1210884842 M * grim1 yep
1210884856 M * grim1 or something higher than 2.6.22
1210884862 M * daniel_hozac well, it depends.
1210884902 M * daniel_hozac it's maybe two weeks actual work, but then you need testing. loooooots of testing.
1210884945 M * daniel_hozac (note: two weeks is an optimistic guess. since (AFAIK) nobody has tried working with CFS, it may very well take longer)
1210885103 Q * bonbons Quit: Leaving
1210885184 M * grim1 thanks for the update
1210885191 J * JonB ~NoSuchUse@77.75.164.169
1210885251 M * grim1 is the project status published anywhere? I couldn't find anything on the experimental releases
1210885267 M * daniel_hozac IRC logs, i guess.
1210885279 M * grim1 yeah, that's about it... tough sifting though
1210885358 M * daniel_hozac feel free to start publishing it :)
1210885398 M * grim1 then I'd have to pay attention!
1210885657 J * nou Chaton@causse.larzac.fr.eu.org
1210885693 J * Aiken ~james@ppp121-45-230-114.lns1.bne4.internode.on.net
1210885762 J * Bertl herbert@IRC.13thfloor.at
1210886172 M * Bertl hmm .. strange .. I keep timing out today for no apparent reason
1210886217 M * daniel_hozac yeah, Pinky survives...
1210886228 M * Bertl yeah, that's the funny part :)
1210886327 M * daniel_hozac Andy preferred removing the optimization in vx_try_unhold (or explicitly setting the idle_skip before returning) over sched-fix06.
1210886382 M * Bertl okay, that's fine for me, shouldn't make any big difference
1210886416 M * daniel_hozac right
1210886457 M * Bertl is he fine with the rest?
1210886486 M * daniel_hozac i guess so, he didn't explicitly comment on them.
1210886510 M * Bertl doener_: didn't you plan to play with the scheduler last time? (for 2.6.25+ that is)
1210886542 M * daniel_hozac i think we've all planned on playing with the scheduler...
1210886579 M * doener_ Bertl: yeah, but I got lost in the code at some point, and life caught up as well :-(
1210886585 M * Bertl hmm, well, I didn't plan to do so (yet), was hoping for the control groups stuff to improve
1210886649 M * daniel_hozac it doesn't look like anyone cares for limiting CPU.
1210886659 M * Bertl doener_: np, just curious what you found so far
1210886706 M * doener_ I turned that off in my config when I first saw it limiting a task to 50% cpu because another user has another cpu churning task (which was niced to 19...)
1210886714 M * Bertl doener_, daniel_hozac: care to do a short brainstorming about this subject?
1210886745 M * Bertl doener_: turned what off?
1210886757 M * doener_ the group scheduler
1210886786 M * doener_ the uid based grouping was crap (hey, I said nice 19 for a reason...)
1210886796 M * Bertl ah, because it gave fair sharing between uids, I get it
1210886799 M * doener_ and the explicit groups were of no interest to me
1210886821 M * daniel_hozac Bertl: sure.
1210886877 M * doener_ WRT to working on it, I basically ended up looking at some bug report again, about breakage with some hr timer stuff. And there I got stuck for a looong time
1210886975 M * doener_ I wanted to check at some point, whether the control groups are namespace-aware in _some_ way, but never got around to it
1210886984 M * Bertl well, I think the actual changes in the hard scheduler design are minimal to make it work with the 'completely fair' scheduler
1210886993 N * DoberMann[PullA] DoberMann
1210887015 M * Bertl IMHO we have three issues/areas to work on
1210887028 M * daniel_hozac doener_: what do you mean?
1210887030 M * Bertl first, the change from runqueues to rb-trees
1210887081 M * Bertl second, the change from tick based to completely tickless (i.e. time based)
1210887099 M * doener_ because OToneH it seems like using the control groups would be smart, but OTOH, we'd then need them recursively. Once to group the guests and then in those groups, the "normal" support
1210887145 M * Bertl and finally, from priority adjustments to calculated fair share priorities
1210887174 M * Bertl yes, I think the control groups will give us some headache to incorporate in Linux-VServer
1210887215 M * daniel_hozac i don't know... it shouldn't be too hard.
1210887224 M * Bertl but I also think we should treat them like a namespace, i.e. allow them to be set on a per guest basis
1210887226 M * daniel_hozac but i haven't looked at the controllers yet
1210887264 M * Bertl (and ignore stuff like per user control groups and such)
1210887307 M * doener_ if there's sub-control-group support, it should be quite easy to adapt that, the math should be "kind of" there for us... but I didn't look at it yet either
1210887337 M * daniel_hozac i don't think any of the controllers support hierarchical resources yet.
1210887346 M * daniel_hozac but i think it'd be fine to just not allow guests to use cgroups.
1210887360 M * Bertl well, there is a reason for not having guests inside guests ... they tend to complicate typical hot paths
1210887370 M * daniel_hozac (i thought they needed something like CAP_SYS_ADMIN either way)
1210887418 M * Bertl doener_: don't forget that although the math is there, it needs to be calculated (and that will happen fairly often)
1210887447 Q * MatBoy Quit: Ik ga weg
1210887480 M * doener_ Bertl: yeah, but if it's there, mainline has probably already optimized it. Though I kinda doubt that it actually _is_ there.
1210887517 M * daniel_hozac if it is, it's on a per-controller basis. there's no high-level support for it.
1210887520 M * Bertl now, IMHO the alternative to sticking the hard cpu scheduler on top of the cfscheduler we could think about improving the cfs scheduler to allow for cpu limits
1210887522 Q * cryptronic Quit: Leaving.
1210887539 M * Bertl +is
1210887561 M * Bertl i.e. for actual hard limits
1210887572 M * daniel_hozac wouldn't that be pretty much the same thing, only the latter drops all the vx_ names? :)
1210887582 M * daniel_hozac oh, no idle time?
1210887599 M * Bertl precisely, that could be covered by the fair scheduling part
1210887614 M * daniel_hozac hmm, i suppose.
1210887649 M * daniel_hozac nice values don't have quite the same granularity though.
1210887695 M * Bertl the thing is, although I'd love to improve on the mainline scheduler, the hard cpu scheduler is a big development and (except for corner cases and small bugs we still keep fixing) has proven quite stable
1210887710 M * daniel_hozac right.
1210887793 Q * mick_work Ping timeout: 480 seconds
1210887881 M * Bertl okay, any ideas/insights so far how to control the time a process is scheduled, without harming the entire cfs framework?
1210887901 Q * balbir Read error: Operation timed out
1210887904 J * brc bruce@megarapido.cliquerapido.com.br
1210888384 M * Bertl I also had the strange idea to add a 'dummy' CPU which just doesn't get work done at all, but allows a control mechanism (token bucket) to move tasks to/from
1210888463 M * doener_ hm, somehow that sounds familiar...
1210888568 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210888577 J * yarihm ~yarihm@84-75-103-252.dclient.hispeed.ch
1210888594 M * daniel_hozac that sounds like a lot of work.
1210888608 M * doener_ ah, it was with cpusets, but for memory (fake NUMA nodes)
1210888624 Q * ntrs__ Ping timeout: 480 seconds
1210888659 M * Bertl the main question is, what happens when we grab a task out of the rb tree and put it on our own rb-tree for some time
1210888673 M * Bertl (like with the hold-queue)
1210888703 M * daniel_hozac do we need our own rb-tree?
1210888713 M * Bertl will this or putting it back make the cfs go haywire?
1210888723 Q * dna Quit: Verlassend
1210888736 M * daniel_hozac i'd assume tasks that aren't running also aren't present in the rb-tree. is that a faulty assumption?
1210888743 M * Bertl good question, a different approach would be to keep the tasks from scheduling
1210888781 M * Bertl the question is, how does the rb-tree perform if there are 1000+ tasks which do not want to be scheduled :)
1210888786 Q * JonB Ping timeout: 480 seconds
1210888831 M * Bertl i.e. we would have to keep them somehow out of the calculation
1210888843 M * daniel_hozac isn't a hold queue easier?
1210888874 M * Bertl well, yeah, but that would require us to have hold-queue and rb-tree structures
1210888888 M * daniel_hozac why the need for the rb-tree?
1210888899 M * Bertl because mainline cfs uses it?
1210888917 M * daniel_hozac right, but why would _we_ need it? or do you mean the mainline rb-tree?
1210888948 M * Bertl yes, the current structure in each task
1210888961 M * daniel_hozac okay.
1210888997 M * Bertl but, I think the rb tree could actually serve some purpose in our hold-rbt too
1210889057 M * Bertl it could structure the tasks/contexts according to the time they are supposed to be idle
1210889080 M * daniel_hozac yeah, i guess that makes sense.
1210889094 M * daniel_hozac though we could do that with a list too, no?
1210889102 M * Bertl yes, definitely
1210889106 Q * yarihm Quit: This computer has gone to sleep
1210889109 M * doener_ a quick glance over the code suggests the thread is dropped from the rb tree when it is not runnable
1210889128 M * Bertl dropped where?
1210889139 M * Bertl I mean, where is it at that time?
1210889173 M * Bertl hmm, let me rephrase that once again: where does it get 'stored' instead?
1210889175 M * doener_ that, I'm still trying to figure out, just noticed an rb_erase call somewhere in the call-chain
1210889250 M * doener_ the sched_entity stuff still confuses the hell out of me...
1210889275 M * daniel_hozac yeah...
1210889279 M * Bertl it was added to somehow accommodate the cgroups
1210889283 M * daniel_hozac i guess we'll need to use that.
1210889294 M * daniel_hozac for the idle time replacement.
1210889299 M * Bertl we could basically disable it for now
1210889321 Q * esa Ping timeout: 480 seconds
1210889337 M * Bertl (if we put the hard cpu scheduler on top as controller)
1210889407 Q * Piet Quit: Piet
1210889497 Q * _gh_ Remote host closed the connection
1210889660 Q * opuk Remote host closed the connection
1210889674 J * opuk ~kupo@2001:16d8:ffbd:100::10
1210889796 J * esa ~esa@ip-87-238-2-45.static.adsl.cheapnet.it
1210890129 J * bfremon ~ben@lns-bzn-52-82-65-102-239.adsl.proxad.net
1210890464 N * DoberMann DoberMann[ZZZzzz]
1210890918 Q * mick_work Ping timeout: 480 seconds
1210890969 M * doener_ hm, ok, so sched_entity does actually have a parent pointer
1210890975 N * arthur_ arthur
1210891692 J * mick_work ~clamwin@h-74-2-196-226.miatflad.covad.net
1210891809 J * [PUPPETS]Gonzo gonzo@fellatio.deswahnsinns.de
1210892069 J * Linus ~nuhx@bl7-130-23.dsl.telepac.pt
1210892077 Q * bfremon Quit: Leaving.
1210894804 M * Guy- hi, has the xfs link breaking corruption issue been fixed yet?
1210894844 M * Bertl don't think so .. and I doubt that xfs will get a lot of care in the near future
1210894870 M * Bertl (unless somebody _really_ cares about it :)
1210894897 M * Guy- are there any other known issues?
1210894902 M * Guy- with xfs, I mean
1210894925 M * Bertl with the changes in 2.6.24+, xfs would need a complete overhaul
1210894943 M * Guy- yikes
1210894954 M * Guy- what does that mean for the end user?
1210894977 M * Guy- sharing xfs filesystems between guests doesn't work? having an xfs filesystem mounted inside a single guest doesn't work?
1210894992 M * Bertl basically that we are likely going to remove the tagging and barrier flags
1210895019 M * Bertl which in turn, would make xfs not suitable for secure guests
1210895035 M * Guy- at least not as their rootfs, right?
1210895043 M * Bertl correct
1210895055 M * Guy- I don't think I care about tagging
1210895080 M * Bertl but we are investigating the options there .. actually the tagging is less of a problem than the flags, it seems
1210895116 M * Guy- can't you use extended attributes?
1210895130 M * Bertl the flagword we used (inside xfs) has been used up by other xfs flags, which basically kills the piggy-back ride we used to have
1210895161 M * Bertl you definitely don't want extended attribute checks in time sensitive code
1210895172 M * Guy- I see
1210895186 M * Bertl well, I definitely wouldn't want that, but I do not consider xfs a performant filesystem either
1210895204 M * Guy- it seems to be doing pretty well in multi-threaded benchmarks though
1210895232 M * Bertl do you have some with a good comparison to e.g. ext3?
1210895240 M * Guy- well, yes and no
1210895264 M * Bertl just curious, i.e. not starting the fs flame war (yet :)
1210895269 M * Guy- a friend ran some benchmarks that I'm inclined to take seriously
1210895288 M * Guy- he didn't just use bonnie but also iozone and postmark
1210895294 M * Guy- with several threads
1210895307 M * Bertl on the same physical partition(s)?
1210895312 M * Guy- but only one or two different hardware configurations, I don't recall that exactly
1210895315 M * Guy- yes, same physical devices
1210895338 M * Bertl and the results are somewhere?
1210895344 M * Guy- he created one fs, ran the benchmarks (this took a couple of days), then removed the fs, created the new one, etc
1210895370 M * Guy- yes, I'm sure he still has them, I'll ask him when he wakes up
1210895386 M * Guy- the trend I though I could perceive was that ext3 deteriorated pretty rapidly as the number of threads went up
1210895395 M * Guy- whereas xfs and jfs didn't
1210895405 M * Guy- *thought
1210895419 M * Bertl would be interesting to look at
1210895538 M * Guy- anyway, doesn't xfs perform xattr lookups anyway? I thought posix ACLs were implemented as xattrs
1210895559 M * Bertl xattrs and extended attributes are two different things
1210895579 M * Bertl but yes, posix ACLs are implemented in extended attributes
1210895590 M * Bertl (doesn't make them more performant)
1210895644 M * Guy- what I meant was, would an additional attribute lookup make that much of a difference?
1210895660 M * Guy- also, I was under the impression that xattr and extended attributes were the same thing
1210895670 M * Guy- there's even a user_xattr or similar mount option
1210895677 M * Bertl no, but it would require having those attributes turned on unconditionally, which I don't want to do
1210895689 Q * infowolfe Read error: Connection reset by peer
1210895743 M * Guy- it was my understanding that extended attributes were 'forks', as in alternative data areas of a file
1210895759 M * Guy- and that xattr was an abbreviation for extended attributes
1210895766 M * Guy- isn't this the case?
1210895780 M * Bertl nah, that is some kind of confusion the kernel creates
1210895783 J * infowolfe ~infowolfe@c-67-160-167-96.hsd1.or.comcast.net
1210895815 M * Bertl attributes like for example the immutable 'flag' or 'append'
1210895815 M * Guy- so which is what then? :)
1210895823 M * Guy- yes, those are attributes
1210895830 M * Bertl are not realized with extended attributes
1210895834 M * Guy- but they're not 'extended' attributes
1210895838 M * Guy- exactly
1210895842 M * Guy- they're just flags in the fs
1210895844 M * Bertl they are so called 'xattribs'
1210895851 M * Guy- oh, wonderful :)
1210895856 M * Bertl (guess what that stands for :)
1210895872 M * Guy- extra attributes? :)
1210895878 M * Bertl so, we are basically using those flags for our purpose
1210895899 M * Bertl unfortunately, xfs was kind of cheap when allocation space for them
1210895905 M * Bertl *allocating
1210895918 M * Bertl and now, they have run out of them completelsy
1210895921 M * Bertl *completely
1210895948 M * Bertl could be, that the xfs folks extend that in the near future
1210895962 M * Bertl at which point, we would be back in the game
1210895970 M * Guy- speaking of which, what's with this filesystem capabilities stuff?
1210895989 M * Bertl posix filesystem capabilities?
1210895990 M * Guy- have they finally gotten around to actually doing this so it works? we no longer need setuid root?
1210895993 M * Guy- yes