1158970195 M * spq_ derjohn2, how do i have to configure these rlimits? like written in the flower page or http://oldwiki.linux-vserver.org/Resource+Limits
1158970321 M * coocoon g8 to all
1158970324 Q * coocoon Quit: KVIrc 3.2.0 'Realia'
1158970742 M * derjohn2 spq_, http://linux-vserver.org/Frequently_Asked_Questions#How_do_I_limit_a_guests_RAM.3F_I_want_to_prevent_OOM_situations_on_the_host.21
1158970766 M * derjohn2 spq_, you can add more like that for the other options you'll find on the GFP.
1158970775 M * derjohn2 i am tired now, and off ... bye !
1158970795 M * spq_ ok gn8 thx
1158971242 Q * spq_ Quit: spq_
1158972305 Q * derjohn2 Ping timeout: 480 seconds
1158972310 J * derjohn2 ~aj@dslb-084-058-205-159.pools.arcor-ip.net
1158972949 J * bluelines ~bronson@c-71-198-75-160.hsd1.ca.comcast.net
1158973262 J * mircomayic ~AK768@65.23.247.223
1158973270 M * mircomayic !list
1158973278 P * mircomayic 
1158975110 J * shedi ~siggi@inferno.lhi.is
1158976010 Q * bluelines Ping timeout: 480 seconds
1158976165 Q * Johnnie Ping timeout: 480 seconds
1158976483 J * Johnnie ~jdlewis@jdlewis.org
1158978101 M * ntrs Can you limit a guest to run on only one CPU on a SMP server?
1158978165 M * doener using cpusets
1158978177 M * ntrs cpusets?
1158978181 M * ntrs how?
1158978243 M * ntrs where do you configure that in the guest's configuration?
1158978269 M * doener baggins made a patch for util-vserver to support that
1158978276 M * doener I'm currently searching for it
1158978291 M * doener AFAIK at least one distro (debian?) includes it
1158978339 M * ntrs no, I need it for vanilla
1158978381 M * doener http://list.linux-vserver.org/archive/vserver/msg11505.html
1158978398 M * doener no idea if that applies to .210/.211-rc1 
1158978468 M * doener on devel the new token bucket scheduler also supports per-cpu buckets (I think), so that would also be an option if you run devel
1158978534 M * ntrs Ok, thanks
1158978568 M * doener np
1158978602 M * doener off to bed now *yawns*
1158982144 Q * Johnnie Quit: G'bye!
1158982150 J * Johnnie ~jdlewis@jdlewis.org
1158982955 Q * _node Ping timeout: 480 seconds
1158983530 J * kseeker stanking@unix.shell.la
1158984960 Q * gerrit_ Ping timeout: 480 seconds
1158990717 J * AjAx-- hiddenserv@tor.noreply.org
1158991205 P * AjAx-- 
1158993592 J * gerrit ~gerrit@1153ahost99.starwoodbroadband.com
1158994627 J * meandtheshell ~markus@85.124.140.164
1158997828 J * dna_ ~naucki@157-211-dsl.kielnet.net
1158998336 J * bluelines ~bronson@c-71-198-75-160.hsd1.ca.comcast.net
1158999145 Q * derjohn2 Quit: Leaving
1159000646 J * bonbons ~bonbons@83.222.36.111
1159001380 Q * bluelines Ping timeout: 480 seconds
1159002479 Q * harry Read error: Operation timed out
1159003415 M * daniel_hozac ntrs: 0.30.211-rc1 includes it.
1159004743 J * coocoon ~coocoon@p54A0536E.dip.t-dialin.net
1159004774 M * coocoon morning
1159004805 M * daniel_hozac good morning.
1159004830 M * coocoon hello daniel have u seen the how to
1159004856 M * daniel_hozac i saw the link, but i haven't read it yet.
1159004874 M * coocoon ah ok dunno if the name is better
1159004880 M * coocoon but ok
1159004888 M * daniel_hozac i think the name is fine.
1159005133 M * coocoon ok, good, but it's a little bit too long for searching; "vcd how to" was shorter ;-)
1159005362 J * harry ~harry@d54C2508C.access.telenet.be
1159005748 M * yang does "reboot" on a host also successfully shut down all guests, or does each guest need to be shut down individually before that?
1159005789 M * daniel_hozac do you have an initscript in your reboot/halt runlevel that stops guests?
1159005800 M * matti If so, then yes.
1159005800 M * matti ;)
1159005804 M * yang i don't think so
1159005815 M * daniel_hozac so you don't start guests during boot either?
1159005820 M * yang nope
1159005825 M * yang i start them manually
1159005835 M * daniel_hozac then i guess you'll have to stop them manually too.
1159005839 M * yang ok
1159006404 J * ensc ~irc-ensc@p54B4DB36.dip.t-dialin.net
1159006419 M * ensc hi, is there an official fix for
1159006420 M * ensc kernel/built-in.o: In function `fill_pid':
1159006420 M * ensc taskstats.c:(.text+0x33c2e): undefined reference to `vx_rmap_pid'
1159006427 M * ensc with 2.6.18?
1159006445 M * ensc and vs2.0.2.1-t5?
1159007054 M * goblin vshelper.init: can not determine xid of vserver 'samba2'; returned value was ''
1159007063 M * goblin any ideas where this may come from? :-)
1159007085 M * goblin just after rc script finishes after I run vserver samba2 start
1159007874 Q * Greek0 Read error: Connection reset by peer
1159007936 J * Greek0 ~greek0@85.255.145.201
1159008182 M * goblin what is a xid?
1159008193 M * mnemoc context id
1159008270 M * goblin mhm
1159008518 M * goblin +++ /usr/local/sbin/vserver-info /usr/local/etc/vservers/samba2 CONTEXT false
1159008518 M * goblin ++ xid=
1159008533 M * goblin so why isn't it allocated...?
1159008541 M * mnemoc you have to define it
1159008556 M * goblin aaah.. that'd explain a lot...
1159008752 P * kseeker 
1159008753 M * goblin vserver build build --help shows: --context   ...  the static context of the vserver [default: none; a dynamic context will be assumed]
1159008778 M * goblin and I see plenty of symlinks in /etc/vservers/.defaults/run.rev
1159008784 M * goblin with different numbers
1159008790 M * goblin where do I define it then?
1159009019 M * goblin If I build the vserver with --context 100, I get the same error message from vshelper.init
1159009064 M * mnemoc read the flower page
1159009223 M * goblin mnemoc, hmm... the two places it mentions xid are /etc/vservers/name/run, which looks like a dynamically allocated thing, and /etc/vservers/.defaults/run.rev, which already has plenty of symlinks named with numbers and pointing to vserver directories
1159009305 M * mnemoc look for 'context'
1159009412 M * goblin /etc/vservers/name/context ? yes, I have this file in the vserver that I build with --context 100
1159009422 M * goblin and it contains '100', as expected
1159009461 M * goblin oh, you probably missed my main problem...
1159009485 M * goblin vshelper.init: can not determine xid of vserver 'samba3'; returned value was ''
1159009526 M * goblin which is what I get after I try to start the vserver, and after rc script has finished
1159009636 M * mnemoc uhm
1159009645 M * goblin the output of vserver --debug samba3 start is at http://uukgoblin.net/g
1159009711 Q * AndrewLee Ping timeout: 480 seconds
1159009896 M * mnemoc uhm
1159011204 Q * mire Quit: Leaving
1159011227 M * goblin hmm... I created a wrapper script which calls vserver-info <name> CONTEXT true instead of CONTEXT false
1159011252 M * goblin this returns the static xid of 100 properly now, starting the server doesn't produce the error any more
1159011269 M * goblin but when I try to enter the guest, it turns out that it's not running :-/
1159011943 M * goblin oh great.
1159011959 M * goblin I just need to run a background process in a guest! :-D
1159011979 M * goblin otherwise it apparently dies with no processes left. piece of cake. :->
1159013433 Q * shedi Quit: Leaving
1159013792 M * derjohn daniel_hozac, the patch compiled and the kernel boots. the patch compiled a 2nd time, with v6 not as module but compiled in. now I think I need patched tools.
1159014817 M * daniel_hozac derjohn: yep, or at least chbind6.
1159014845 M * cryptronic hi all, is someone here who develops vserver-stat from util-vserver?
1159014856 M * daniel_hozac ensc: #include <linux/vs_cvirt.h> is required in kernel/taskstats.c, kernel/rtmutex-debug.c and mm/migrate.c
1159014868 M * daniel_hozac cryptronic: don't trust the values.
1159014908 M * cryptronic daniel_hozac, that's not the thing i want to know ;)
1159014913 M * daniel_hozac goblin: that error message means that your guest doesn't keep a process running.
1159014932 M * goblin daniel_hozac, exactly. :-)
1159014936 M * cryptronic the question is how vserver-stat generates the uptime
1159014943 M * cryptronic of the vservers
1159014962 M * goblin daniel_hozac, I figured it out just a few messages ago :-)
1159014964 M * cryptronic because the content of /proc/virtual/xid/cvirt biasuptime is very strange
1159014986 M * daniel_hozac cryptronic: BiasUptime is IIRC the offset from the host's uptime.
1159015010 M * cryptronic ah, ok thanks :)
1159015020 M * cryptronic i'll try to get the correct value
1159015054 M * daniel_hozac cryptronic: i wouldn't trust vserver-stat's value for the uptime either, AFAICT it's using the oldest process's start time to determine it.
1159015160 M * cryptronic it's not that i want to trust vserver-stat ;) i'm developing a new web interface for openvcp, and there we show the uptime of the vservers based on the BiasUptime value in /proc/virtual/xid/cvirt. So i have to know how to use this value
1159015185 M * daniel_hozac i'm just saying, you shouldn't look at vserver-stat for how to do it correctly ;)
1159015211 M * cryptronic ok ;) next time i ask you directly ;)
1159015781 Q * weasel Ping timeout: 480 seconds
1159015905 M * cryptronic daniel_hozac, using biasuptime as offset works well :) thanks a lot :)
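A minimal C sketch of what cryptronic describes, treating BiasUptime as an offset from the host's uptime as daniel_hozac explains (rather than using the oldest process's start time, as vserver-stat reportedly does). The "BiasUptime: <seconds>" line layout of /proc/virtual/<xid>/cvirt and the helper names are assumptions for illustration; /proc/uptime's first field is the host uptime in seconds.

```c
#include <stdio.h>
#include <string.h>

/* Read the host's uptime in seconds: the first field of /proc/uptime. */
int read_host_uptime(double *uptime)
{
    FILE *f = fopen("/proc/uptime", "r");
    int ok;

    if (f == NULL)
        return -1;
    ok = fscanf(f, "%lf", uptime) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Hypothetical helper: pull the BiasUptime value out of the text of
 * /proc/virtual/<xid>/cvirt. The "BiasUptime: <seconds>" layout is assumed. */
int parse_bias_uptime(const char *cvirt, double *bias)
{
    const char *p = strstr(cvirt, "BiasUptime:");

    if (p == NULL)
        return -1;
    return sscanf(p, "BiasUptime: %lf", bias) == 1 ? 0 : -1;
}

/* Treat BiasUptime as an offset from the host's uptime:
 * guest uptime = host uptime - bias. */
double guest_uptime(double host_uptime, double bias)
{
    return host_uptime - bias;
}
```

A web interface would call read_host_uptime() and parse_bias_uptime() on the same refresh and display guest_uptime() of the two, so both values refer to the same moment.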
1159015935 M * daniel_hozac np.
1159016274 J * weasel weasel@asteria.debian.or.at
1159018541 Q * Kowi 
1159019220 N * Bertl_zZ Bertl
1159019224 M * Bertl morning folks!
1159019231 M * daniel_hozac morning Bertl!
1159019253 M * Bertl ensc: when do you get that?
1159019266 M * gdm Bertl: morning!
1159019268 M * daniel_hozac Bertl: when CONFIG_TASKSTATS is enabled.
1159019280 M * gdm Bertl: i have that old problem with a machine hanging again...
1159019281 M * daniel_hozac Bertl: http://people.linux-vserver.org/~dhozac/p/k/delta-headers-fix84.diff
1159019287 M * Bertl ah, good, tx!
1159019307 M * Bertl gdm: hanging means lockup?
1159019352 M * gdm Bertl: it was running backports kernel - i have to find out what that was again
1159019355 M * gdm Bertl: yep
1159019369 M * Bertl backports mean?
1159019370 M * gdm Bertl: i can show you some munin graphs about it... it  has been down for about 2 hours
1159019373 M * Bertl *means
1159019378 M * gdm debian backports
1159019390 M * Bertl which version is that in numbers?
1159019424 M * gdm 2.6.16-2 backport i think 
1159019434 M * gdm micah just had the exact same problem as well
1159019435 M * ensc Bertl: http://ensc.de/kernel-kosh-config
1159019456 M * Bertl ensc: thanks! seems to be fixed by daniels patch
1159019471 M * gdm there is absolutely nothing in the munin graphs that suggests any problems.. cpu, memory, netstat, all the same
1159019497 M * gdm i guess the thing to do now is to get it rebooted and then to put the latest vanilla kernel on, yes?
1159019501 M * ensc yes; including the right headers is not a problem. I just wondered why -t5 does not have the fix yet ;)
1159019511 M * Bertl micah, gdm: is some soft lockup and mutex/spinlock check enabled?
1159019520 M * Bertl ensc: yeah, obviously I had that option turned off ...
1159019529 M * Bertl ensc: will be in t6
1159019614 M * gdm oh, i don't understand what that means... i know that previously there has been no response to anything (keyboard, serial, ssh) and nothing on the screen when a monitor is plugged in
1159019614 M * Bertl ensc: glad to hear that you are testing 2.6.18 kernels!
1159019628 M * gdm currently, there is no response to sysrq (well, break via minicom)
1159019646 M * Bertl gdm: how long does it take the machine to hang?
1159019681 M * Bertl gdm: and more recent kernels might help, check with waldi for bleeding edge debian stuff
1159019698 M * ensc Bertl: why 'testing'? I hope I can use them ;)
1159019708 M * Bertl hehe, even better :)
1159019726 M * gdm Bertl: 7 1/2 days this time
1159019764 M * gdm but i now know of at least 5-6 different servers that this is happening to
1159019771 M * gdm run by 4 different sysadmins, at least
1159019780 M * gdm in several different locations :/
1159019782 M * daniel_hozac ensc: oh, and now that you're here, what was "if ((flag & MS_NODEV)!=0) flag |= MS_NODEV;" supposed to do in src/secure-mount.c?
1159019828 M * daniel_hozac (line 432 in 0.30.210)
1159019831 M * Bertl gdm: hmm, that's a long time for testing ...
1159019856 M * Bertl gdm: I would suggest to enable spinlock/mutex debugging and make sure that magic sysrq works
1159019858 M * gdm 7 1/2 days? yes
1159019870 M * gdm magic sysrq did work before
1159019881 M * Bertl but not when it happened?
1159019892 M * gdm not working now, no
1159019905 M * Bertl nmi watchdog is also on?
1159020185 M * gdm sorry, am just trying to get hold of someone to check/reboot the machine
1159020187 M * gdm brb
1159020343 M * ensc daniel_hozac: mmh... I am not sure what I meant with it... it should perhaps check the mask, not the flag
1159020494 Q * meandtheshell Quit: exit (0);
1159020911 M * ensc daniel_hozac: should be 'if (!(mnt->mask & MS_NODEV)) flag |= MS_NODEV;'
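ensc's correction can be seen in isolation with the minimal C sketch below. The original line only set MS_NODEV when it was already set, so it was a no-op; the fixed test defaults to nodev unless the mount entry's mask says the option was given explicitly. The struct and function names are hypothetical stand-ins for the real code in src/secure-mount.c, and the assumption is that `mask` records which MS_* bits the entry specified either way.

```c
#include <sys/mount.h>  /* MS_NODEV */

/* Hypothetical stand-in for the mount entry in src/secure-mount.c;
 * only the 'mask' field used in ensc's fix is modelled. */
struct mount_entry {
    unsigned long mask;  /* assumed: MS_* bits the entry set explicitly */
};

unsigned long apply_nodev_default(const struct mount_entry *mnt,
                                  unsigned long flag)
{
    /* Old code, a no-op: if ((flag & MS_NODEV) != 0) flag |= MS_NODEV; */
    if (!(mnt->mask & MS_NODEV))   /* dev/nodev not chosen explicitly */
        flag |= MS_NODEV;          /* so default to nodev */
    return flag;
}
```

An entry that never mentions dev/nodev ends up with MS_NODEV, while an entry that explicitly asked for "dev" (mask bit set, flag bit clear) keeps device access.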
1159020976 M * ensc negated logic confuses me :(
1159021110 M * gdm Bertl: ok. someone else is testing sysrq as well for me.. but nothing for him, either
1159021143 M * Bertl gdm: hmm, that looks like a hardware issue then ...
1159021157 M * gdm no Bertl it is not a hardware issue
1159021159 Q * derjohn Read error: Connection reset by peer
1159021165 J * bone_idol ~bone_idol@springnight.burngreave.net
1159021165 M * gdm it has been extensively tested
1159021173 M * gdm memtest86 for about a week
1159021182 M * Bertl gdm: well, when you end up with irqs disabled, the nmi should kick in
1159021185 M * gdm cpuburn for over a month - like 42 days or something
1159021204 M * Bertl and when you have irqs enabled, the magic sysrq should work
1159021227 J * dkg ~dkg@lair.fifthhorseman.net
1159021241 M * Bertl gdm: but anyways, let's check a mainline kernel
1159021247 M * Bertl welcome dkg!
1159021250 M * dkg hi Bertl!
1159021254 M * dkg thanks for yer advice here.
1159021261 M * gdm Bertl: dkg is another sysadmin with me who has access
1159021276 M * gdm bone_idol is also a sysadmin of this same machine, but has no serial access
1159021281 M * bone_idol hi
1159021296 M * Bertl ah, cool, welcome bone_idol! :)
1159021338 M * gdm dkg: you can call in the reboot, right?
1159021347 M * Bertl gdm: I assume we already checked the config, yes?
1159021358 M * dkg yeah, i can call in a reboot.  shall i do that?
1159021366 M * dkg and what should we reboot into?
1159021384 M * gdm well, this is the stock debian backports kernel from backports .org
1159021395 M * gdm [sorry - buggy wireless here ';-)]
1159021409 M * Bertl gdm: ah, so whatever debian considers appropriate 
1159021432 M * gdm i think we should boot into the same... ?
1159021437 M * Bertl well, no idea about that actually, i.e. we have to ask/check with the maintainer
1159021438 M * gdm Bertl: yeah.
1159021449 M * dkg Bertl: not sure what background you've had here: the cpuburn was running on a RIP kernel (slackware-based)
1159021466 M * gdm well, how about we boot in to it, get the config out to show Bertl and then we can disable all the vservers
1159021472 M * gdm and set up a new kernel from there?
1159021476 M * dkg gdm: sounds good to me.
1159021476 M * gdm i.e. with nothing running?
1159021481 M * dkg calling it in now...
1159021486 M * gdm ok, thanks
1159021555 M * gdm Bertl: sorry to take over the channel here, you want us to go elsewhere for a bit?
1159021578 M * Bertl try to provide a few details for that machine, like memory, cpus and such (upload that to paste.linux-vserver.org)
1159021588 M * Bertl gdm: nah, channel is fine for this purpose
1159021606 M * gdm ok, i will upload stuff while dkg reboots
1159021795 M * gdm how long does stuff stay on paste.linux-vserver.org for?
1159021799 M * gdm does it get erased?
1159021805 M * Bertl not really
1159021813 M * gdm err, ok
1159021869 M * gdm ok, i just gave you the login details for our admin wiki backup. hope that's ok
1159021880 M * gdm there is all the hardware info on that page
1159022025 M * Bertl okay
1159022174 J * mkhl mkhl@200-148-40-125.dsl.telesp.net.br
1159022178 M * gdm dkg, bone_idol - <Bertl> and do you have a typical memory (/proc/meminfo) output/graph when everything is running?
1159022188 M * gdm i think the answer is 'yes' 
1159022205 M * Bertl what I'm trying to figure is, are the 5GB memory actually used?
1159022218 M * Bertl (by the 14 guests running there)
1159022242 M * dkg machine is back up, but it's resyncing the RAID array, so it'll be a little while before it's accessible :/
1159022308 M * gdm Bertl: we have used the 5gb before, but i think we never got it all used.
1159022322 M * gdm this time, i think the backports kernel only had 4gb enabled
1159022350 M * dkg i could bring it down and back up into a ramfs-only boot so we have something to work with while the RAID arrays re-sync if that's desired.
1159022396 M * gdm ask Bertl ;-) but he's currently looking at the munin graphs too
1159022462 M * Bertl dkg: soft or hard raid?
1159022470 M * dkg Bertl: soft raid
1159022481 M * Bertl then it should be fine even with reconstruction
1159022489 M * Bertl (just not that performant)
1159022537 M * goblin where is http://www.linux-vserver.org/index.php?page=Linux-Vserver+FAQ gone?
1159022538 M * dkg yeah, it's just that crypto on LVM on top of the RAID tends to lag during the initramfs stage of a standard boot if the RAID arrays aren't properly synced.
1159022557 M * dkg but they've synced and we're almost up already anyway...
1159022558 M * goblin or, why do I have to run vprocunhide and what does it do? :-)
1159022603 M * Bertl goblin: I'm not sure that was ever there (try removing the www and/or replace it by oldwiki)
1159022612 M * gdm goblin: http://linux-vserver.org/Frequently_Asked_Questions <== is that the page you are looking for?
1159022645 M * Bertl goblin: procfs security is required to make the guest's procfs secure, it is 'configured' by the vprocunhide script
1159022686 M * goblin gdm, not sure. The link I pasted was output by "vserver name start" after saying that /proc/uptime cannot be accessed
1159022699 M * goblin I've found some forum which said to run vprocunhide first
1159022727 M * goblin Bertl, oh, OK then. I thought it removes some proc security from my host. :-)
1159022731 M * gdm goblin: i think that is a page from the 'oldwiki' - the site has been recently upgraded
1159022733 M * Bertl goblin: yes, vprocunhide should be run once after host-system startup
1159022786 M * goblin thanks a lot then :-) perhaps something in util-vserver requires an update of this link (I'm using vs2.0.2-grsec2.1.9)
1159022805 M * Bertl gdm: ECC is enabled in the bios? (just checking)
1159022821 M * gdm ooh, i don't recall.. dkg ?
1159022833 M * dkg i think ECC is enabled in the BIOS, yes.
1159022834 M * gdm i will look and see if i have any notes anywhere about that
1159022874 M * dkg it showed up in the memtest console, iirc.
1159022888 P * mkhl 
1159022894 M * dkg btw, the machine is accessible on the network now.
1159022924 M * Bertl okay, let's start the guest and capture /proc/meminfo before and after the startup
1159022929 M * Bertl *guests
1159022941 M * gdm err, they might be started already....
1159022944 M * gdm i guess i will go login
1159022948 M * dkg they have started already.
1159022950 M * Bertl okay, then only the meminfo after :)
1159023010 M * gdm http://paste.linux-vserver.org/400
1159023011 M * dkg Bertl: do you want that on paste.l-v.o?
1159023017 M * dkg nevermind
1159023021 M * gdm yeah, they are all started
1159023024 M * gdm sorry dkg ;-)
1159023056 M * Bertl okay, as I thought, memory is almost unused, 4GB max atm, with highmem
1159023089 M * Bertl on this particular setup, you might be better off if you configure a 1/3 split and disable highmem completely
1159023098 M * Bertl will give you 3GB accessible memory
1159023136 M * Bertl is the swapspace on raid too? maybe encrypted?
1159023138 M * dkg Bertl: we've never gotten around to rebuilding the kernel with 64GB support at all.
1159023152 M * dkg swap is on crypt on lvm on raid :)
1159023168 M * Bertl dmcrypt or cryptloop or aes?
1159023168 M * dkg randomly keyed crypt
1159023172 M * dkg dmcrypt
1159023236 M * Bertl do we have a total/host cpu graph too somewhere? (or am I just too blind to see it :)
1159023293 M * Bertl the cpus have HT flag, so basically you should end up with 4 virtual cpus, no?
1159023347 M * matti Bertl: :)
1159023422 M * gdm Bertl: https://chili.freeit.org/munin/mayfirst.org/shadow.mayfirst.org-cpu.html
1159023435 M * matti gdm: :)
1159023438 M * dkg Bertl: we've disabled HT in the bios because it was previously a suspect cause of the flakiness.
1159023457 M * gdm we did have 64g highmem enabled with our first ever kernel which was 2.6.12 - but that was last year. we didn't have any problems with that kernel
1159023461 M * gdm matti: :)
1159023468 M * gdm matti: just a few days to go!
1159023516 M * matti gdm: Few?
1159023537 M * matti gdm: I have flight tommorow at 12:05 GMT +0200 ;p
1159023544 M * matti Damn.
1159023546 M * matti I am so tired.
1159023549 M * matti Sorry
1159023577 M * matti s/tommorow/tomorrow/
1159023677 M * Bertl dkg: okay, my suggestion would be to do the following:
1159023699 M * Bertl - remove the 512MB modules
1159023710 M * Bertl - configure 1/3 split w/o highmem
1159023714 M * matti k, sorry for interrupting.
1159023716 M * Bertl - reenable HT
1159023737 M * Bertl - configure/compile your own 2.6.17.13 kernel
1159023756 M * Bertl - bind certain irqs and kernel threads to fixed vCPUs
1159023810 M * dkg thanks for all the suggestions, Bertl!
1159023825 M * Bertl well, doesn't really address the hang issue
1159023831 M * matti :)
1159023833 M * dkg i don't think i know how to bind irqs and kernel threads, but i'll read up on that.
1159023868 M * gdm when you say "remove the 512MB modules" you mean take that bit of memory physically out of the machine?
1159023868 M * dkg When you say "remove the 512MB modules", you mean physically take them out of the machine, yes?
1159023873 M * dkg ha ha
1159023961 M * dkg i swear we are different people!
1159023985 M * m4z with different bodies also?
1159024002 M * gdm actually in different countries at the moment, so yes!
1159024019 M * gdm although i guess we both got the same ip as we're logged in from the same server i think
1159024021 M * m4z thats the point where i stop to believe you (;
1159024096 M * gdm actually, i think we're using different servers
1159024132 M * gdm Bertl: [question above about memory], also, any thoughts about the hang issue itself?
1159024145 M * Bertl dkg: yes, I would remove them physically
1159024180 M * Bertl well, I would like to see that it actually hangs with e.g. 2.0.2.1 on 2.6.17.x or 2.6.18
1159024226 M * gdm ok... our experience is that it has hung up to 14 days (or maybe a bit more) after starting up before
1159024238 M * gdm so we were planning to wait at least a month to call it "stable"
1159024263 M * gdm all the patchiness in the munin graphs from around may/june was due to hanging
1159024550 M * Bertl lowering memory and/or increasing system load (i.e. enable that tor guest for example) might help to trigger issues
1159024586 M * Bertl a possible source for issues might also be the filesystem + iosystem
1159024618 M * Bertl e.g. crypt + lvm + raid was known to have issues (especially stack wise)
1159024632 M * Bertl similar goes for certain sata chipsets
1159024669 M * dkg Bertl: can you give me some pointers to the crypt+lvm+raid discussion?
1159024677 M * Bertl maybe xfs or reiser ontop of that, and you almost certainly get a good chance for crashes :)
1159024701 M * dkg no xfs or reiser, fortunately.
1159024710 M * dkg all ext3
1159024741 M * Bertl http://www.uwsg.iu.edu/hypermail/linux/kernel/0605.1/0133.html
1159024751 M * Bertl (just a random example :)
1159024771 M * gdm Bertl: but i don't think the other servers with similar issues have such a complex underlying setup
1159024790 M * gdm e.g. the ones run by micah and stefani and also someone i know in canada who has seen this happen
1159024792 M * Bertl that would be a valuable information
1159024811 M * dkg Bertl: thanks for the link.
1159024813 M * Bertl but for micah it would at least point towards the debian kernel
1159024834 M * Bertl (i.e. I would even more like to check with a mainline version :)
1159024841 M * gdm micah and stefani both run debian, and so does the guy in .ca - i suspect they mostly use debian kernels too
1159024854 M * gdm yes, we are going to rebuild a mainline kernel now for you ;-)
1159024871 M * gdm and then dkg will remove the memory either later today or tomorrow when he can access the box
1159024909 M * gdm also, should we disable the vservers for now?
1159024931 A * gdm thinks we probably should, so the machine doesn't become unresponsive during the kernel build
1159025068 M * ensc Bertl: ctx 1 does not seem to work anymore with -t5; vserver-stat gives out main-processes only. 'vkill' fails too with
1159025071 M * ensc  /usr/sbin/vkill -s INT --xid 152 -- 1
1159025074 M * ensc vkill: vc_ctx_kill(): No such process
1159025198 M * Bertl ah, probably get_task_for_pid() doesn't handle that case
1159025202 M * Bertl checking now
1159025302 Q * bubulak Ping timeout: 480 seconds
1159025345 J * bubulak ~bubulak@whisky.pendo.sk
1159026169 M * daniel_hozac Bertl: btw, any reason lxdialog.scrltmp is still in the patches? :) does it have some purpose
1159026171 M * dkg Bertl: we're rebuilding a stock vanilla 2.6.17.13 with vserver 2.0.2.1 as per your suggestion: do you have any kernel config preferences we should use?
1159026232 M * Bertl daniel_hozac: hum?
1159026306 M * gdm dkg: we should ensure nmiwatch is set and vserver kernel debug stuff, i think
1159026307 M * daniel_hozac Bertl: there's a file at the top of the tree named lxdialog.scrltmp that's created by the patch, and it contains 11.
1159026309 M * Bertl daniel_hozac: just means that distclean didn't remove that properly
1159026338 M * gdm dkg: also, if we post the config up, i'm sure Bertl and maybe matti and/or some others will chip in ideas ;-)
1159026342 M * Bertl daniel_hozac: i.e. probably a kernel build system bug :)
1159026356 J * shedi ~siggi@dsl-149-109-85.hive.is
1159026368 M * Bertl gdm, dkg: would not hurt :)
1159026397 M * Bertl daniel_hozac: i.e. will remove it manually now :)
1159026412 M * daniel_hozac ;)
1159026538 M * daniel_hozac Bertl: hmm, shouldn't we check for context visibility in kernel/pid.c:get_task_pid?
1159026562 M * daniel_hozac s/get_task_pid/get_pid_task/
1159026582 M * Bertl the readdir does return everything in xid=1
1159026592 M * Bertl so it must be a problem of the pid lookup
1159026597 M * [PUPPETS]Gonzo good evening ladies and gentlemen
1159026635 M * [PUPPETS]Gonzo is it possible to add more than one ip to a single device within the vserver-guest? how could I do this (or where could I find some docs?) - thanks
1159026654 M * daniel_hozac just specify another interface with the same device?
1159026664 M * Bertl [PUPPETS]Gonzo: hey!
1159026686 M * Bertl [PUPPETS]Gonzo: there are no devices for guests :)
1159026701 M * Bertl [PUPPETS]Gonzo: you can have as many as 16 ips per guest
1159026703 M * gdm Bertl: http://paste.linux-vserver.org/401
1159026710 M * [PUPPETS]Gonzo daniel_hozac: create another directory (1) in interfaces and set ip up? wouldn't I end having another (virtual) device within the guest?
1159026723 M * daniel_hozac only if you specify name.
1159026739 M * daniel_hozac but that's just an alias, not another device.
1159026748 M * [PUPPETS]Gonzo ok, thanks
1159026748 M * gdm Bertl, dkg, matti, that is the config from the debian backports that we currently have and should modify
1159026753 M * gdm Bertl: any advice?
1159026842 M * Bertl well, I'd try with a 'defconfig' and work from there
1159026856 M * Bertl the debian config includes everything plus the kitchen sink ...
1159026886 M * Bertl you do not want drivers for stuff you do not have, as they add to instability
1159026991 M * [PUPPETS]Gonzo works fine, thanks a lot
1159028172 M * mnemoc hi, is there a port of 1.2.11-rc1 to 2.4.33(.3)?
1159028422 J * meandtheshell ~markus@85-124-62-144.dynamic.xdsl-line.inode.at
1159028533 M * trippeh Are there any experimental patches for 2.6.18 available yet?
1159028670 M * Bertl http://vserver.13thfloor.at/Experimental/patch-2.6.18-vs2.0.2.1-t5.diff
1159028690 M * trippeh Thanks
1159028832 M * Bertl np
1159028978 Q * michal` Ping timeout: 480 seconds
1159029135 J * VxJasonxV ~jason@ip68-110-115-17.ph.ph.cox.net
1159029352 J * michal` ~michal@www.rsbac.org
1159029907 M * Bertl gdm: disable highmem, enable the 1/3 split
1159029932 M * Bertl gdm: disable sparse mem
1159029949 M * Bertl enable smt scheduler and irq balance
1159029960 M * Bertl (maybe enable regparm)
1159029969 M * Bertl set hz to 100
1159029998 M * Bertl disable power management and all drivers/components you do not have
1159030016 M * Bertl disable frequency scaling
1159030031 M * Bertl compile it for your cpu/hardware
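Bertl's checklist roughly corresponds to the i386 .config symbols sketched below. This is a sketch assuming a 2.6.17-era 32-bit x86 tree; each symbol should be confirmed in `make menuconfig`, since the memory-split choice may only be offered in some configurations (for example with CONFIG_EXPERIMENTAL enabled), and the processor-family line is a guess that depends on the actual hardware.

```
# Hypothetical .config excerpt for Bertl's suggestions (i386, 2.6.17-era names)
CONFIG_NOHIGHMEM=y
# 1G user / 3G kernel, i.e. daniel_hozac's reading of the "1/3 split"
CONFIG_VMSPLIT_1G=y
CONFIG_FLATMEM_MANUAL=y
# CONFIG_SPARSEMEM_MANUAL is not set
CONFIG_SCHED_SMT=y
CONFIG_IRQBALANCE=y
CONFIG_REGPARM=y
CONFIG_HZ_100=y
# CONFIG_PM is not set
# CONFIG_CPU_FREQ is not set
# processor family for the actual CPU, e.g. CONFIG_MPENTIUM4=y (assumed hardware)
```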
1159030058 M * cehteh mhm
1159030086 M * Bertl gdm: a good start would be to check with lsmod what modules are loaded, you can then disable all options showing =m which are _not_ listed :
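Bertl's lsmod hint can be sketched in C: lsmod formats /proc/modules, whose first field on each line is the module name, so printing that field gives the list of modules to keep. The helper names are hypothetical; mapping module names back to CONFIG_ options stays a manual step, because module and option names do not always match.

```c
#include <stdio.h>
#include <string.h>

/* Copy the module name (first whitespace-separated field of a
 * /proc/modules line) into buf. Returns 0 on success, -1 otherwise. */
int module_name(const char *line, char *buf, size_t len)
{
    size_t n = strcspn(line, " \t\n");

    if (n == 0 || n >= len)
        return -1;
    memcpy(buf, line, n);
    buf[n] = '\0';
    return 0;
}

/* Print the names of all currently loaded modules, one per line. */
int list_loaded_modules(FILE *out)
{
    char line[512], name[64];
    FILE *f = fopen("/proc/modules", "r");

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        if (module_name(line, name, sizeof name) == 0)
            fprintf(out, "%s\n", name);
    fclose(f);
    return 0;
}
```

Any option showing =m whose module does not appear in this list on the running system is a candidate for disabling.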
1159030379 M * Bertl http://vserver.13thfloor.at/Experimental/patch-2.6.18-vs2.0.2.1-t6.diff
1159030387 M * Bertl ensc: this should fix the issues for you
1159031428 M * gdm Bertl: we read you  :)
1159031447 M * gdm Bertl: should be done with make menuconfig soon - will put .config up for you when it's done
1159031458 M * Bertl excellent!
1159031824 M * Bertl okay, off for now .. back later ...
1159031830 N * Bertl Bertl_oO
1159032488 Q * ComplexMind Remote host closed the connection
1159033325 Q * gerrit Ping timeout: 480 seconds
1159034049 J * gerrit ~gerrit@01153bhost130.starwoodbroadband.com
1159034983 Q * ruskie Remote host closed the connection
1159035111 M * gdm matti or Bertl_oO?
1159035141 M * daniel_hozac mnemoc: it doesn't apply cleanly?
1159035157 M * daniel_hozac mnemoc: and eyck is our 2.4 guy :) (i.e. the only one who has admitted to using it ;))
1159035170 J * ruskie ~ruskie@ruskie.user.oftc.net
1159035180 M * gdm Bertl_oO: we got confused over the mem split, weren't sure if you meant 1G kernel/3G user or 3G kernel/1G user.
1159035192 M * daniel_hozac how much RAM do you have?
1159035194 M * gdm if you could tell us, would be good again. thanks!
1159035210 M * gdm daniel_hozac: well, we have 5G but gonna change to 4G on Bertl_oO's advice
1159035219 M * gdm i.e. remove 2x512mb 
1159035222 M * daniel_hozac ah, so you'll need HIGHMEM regardless.
1159035229 M * gdm well....
1159035238 M * gdm < Bertl> gdm: disable highmem, enable the 1/3 split
1159035246 M * daniel_hozac i'd assume he meant 3G kernel/1G user then.
1159035247 M * gdm but really?!
1159035272 M * daniel_hozac so you'll get at least 3 GiB of RAM.
1159035283 M * gdm i mean, does he really mean disable highmem if we have 4GB?
1159035332 M * daniel_hozac i guess he does.
1159035340 M * daniel_hozac since that's what he said :)
1159035366 M * gdm but then the other 3GB won't be used at all, will it?
1159035416 M * daniel_hozac hmmm, with a 3 G kernel/1G user split, you'd have access to 3 GiB.
1159035426 M * daniel_hozac so you'd only lose the one gibibyte.
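A rough check of daniel_hozac's numbers, assuming a 32-bit x86 kernel with the 1G/3G user/kernel split, highmem disabled, and the 4 GiB left after removing the two 512 MB modules. The kernel also reserves part of its 3 GiB window (for example for vmalloc), so directly usable memory ends up somewhat below 3 GiB:

```latex
\begin{aligned}
\text{lowmem} &\lesssim 4\,\text{GiB (address space)} - 1\,\text{GiB (user)} = 3\,\text{GiB}\\
\text{unused RAM} &\gtrsim 4\,\text{GiB (installed)} - 3\,\text{GiB} = 1\,\text{GiB}
\end{aligned}
```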
1159036179 M * gdm ok, so here is our new .config file then, if anyone is able to look/advise
1159036181 M * gdm Bertl_oO: http://paste.linux-vserver.org/402
1159036381 Q * shedi Quit: Leaving
1159037551 M * waldi okay, 2.0.2.1-t6 can build powerpc64
1159037716 Q * comfrey Ping timeout: 480 seconds
1159038227 Q * meandtheshell Quit: exit (0);
1159038630 J * comfrey ~comfrey@h-64-105-215-75.sttnwaho.covad.net
1159038937 M * gdm waldi: hi
1159038948 M * gdm waldi: bertl said you were the person to speak to about debian kernels
1159038967 M * gdm waldi: but have just actually used the vanilla kernel to make our own
1159038982 M * gdm waldi: cos debian backports kernel "hangs" without explanation
1159040551 M * waldi s390 builds also
1159041441 Q * glut Read error: No route to host
1159041460 M * vasko Bertl_oO: hi, i've just tried Experimental/patch-2.6.18-vs2.0.2.1-t6 on amd64 with the same config as the seamlessly working 2.6.17.11-vs2.0.2-rc31, and it didn't boot. on several reboots it got stuck at different points; the latest was a note about going to run fscks...
1159041583 M * vasko i understand it is not a release yet, but it still seems quite unstable to my amateur eye :)
1159041639 M * daniel_hozac are you sure you configured it correctly?
1159041650 M * daniel_hozac 2.6.18 might've added/removed/renamed options.
1159041670 M * trippeh Run diff on the configs.
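trippeh's suggestion can be scripted so that renamed or dropped options stand out. This is a minimal sketch that compares two kernel `.config` texts and reports options that were removed, added, or changed. It assumes the usual `CONFIG_FOO=value` and `# CONFIG_FOO is not set` line formats; the function names are hypothetical.

```python
import re

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map option name -> value; '# ... is not set' lines are recorded as 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def diff_configs(old_text, new_text):
    """Return (removed, added, changed) option lists between two configs."""
    old, new = parse_config(old_text), parse_config(new_text)
    removed = sorted(set(old) - set(new))
    added = sorted(set(new) - set(old))
    changed = sorted(
        (name, old[name], new[name])
        for name in set(old) & set(new)
        if old[name] != new[name]
    )
    return removed, added, changed
```

For example, read the working 2.6.17 config and the 2.6.18 config from disk and pass both texts to `diff_configs()`. Options that appear only in `removed` are candidates for settings that `make oldconfig` silently dropped or renamed.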
1159041688 J * DreamerC_ ~dreamerc@61-224-132-191.dynamic.hinet.net
1159041953 Q * DreamerC Read error: Connection reset by peer
1159042058 N * Bertl_oO Bertl
1159042070 M * Bertl back for a moment ...
1159042089 M * Bertl vasko: did you try to boot with a vanilla kernel with the same config?
1159042125 M * Bertl i.e. just take the .config file from the 2.6.18-vs2* and do 'make oldconfig' in a vanilla tree
1159042253 M * Bertl gdm: looks pretty good from the first glance
1159042298 M * dkg Bertl: we have the machine booted into a 2.6.17.13 kernel with the latest vserver patch with that config now.
1159042305 M * dkg and it runs :)
1159042331 M * daniel_hozac Bertl: hmm, fs/ext3/balloc.c:1180 is giving me warnings.
1159042368 M * dkg we can try loading it up any way you like, if it would help to test.
1159042370 M * daniel_hozac Bertl: i'd say they are legit, ext3_fsblk_t is unsigned long and __dl_adjust_block expects unsigned ints.
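To see why the warning is plausible: on a 64-bit build, `unsigned long` is 64 bits wide and `unsigned int` is 32 bits. Passing an `ext3_fsblk_t` where an `unsigned int` is expected therefore silently drops the upper bits. The sketch below mimics that implicit C conversion with `ctypes`, which performs no overflow checking. It assumes an LP64 platform such as amd64 Linux, and the block number is made up for illustration.

```python
import ctypes


def as_c_unsigned_int(value):
    """Mimic C's implicit conversion of a wider unsigned value to unsigned int."""
    return ctypes.c_uint(value).value


block = 2 ** 32 + 5                      # hypothetical block number above 2^32
print(ctypes.sizeof(ctypes.c_ulong))     # 8 on LP64 platforms
print(ctypes.sizeof(ctypes.c_uint))      # 4
print(as_c_unsigned_int(block))          # 5: the high bits are lost
```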
1159042477 M * Bertl was that changed recently?
1159042543 M * Bertl I remember seeing some new options regarding structure/counter sizes
1159042588 M * gdm Bertl: or we could change some of the config settings and modify the kernel somehow
1159042610 M * Bertl vasko: don't get me wrong, we really appreciate such input
1159042644 M * Bertl gdm: looks good to me, let's see how it does in action ...
1159042663 M * gdm Bertl: one other question.... in the bios, it gives an option called "Watch Dog Timer" which can be set to 2, 5, 10 or 15 minutes
1159042669 M * gdm but currently it is disabled
1159042685 M * Bertl could be a hardware watchdog
1159042696 M * gdm so that is ok to be disabled then?
1159042706 M * Bertl if you figure out which type, then it might be usable in the future
1159042708 M * gdm we have nmi_watchdog set and also the sysrq works
1159042722 M * gdm yeah, i can do some research on the bios later
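If the BIOS option turns out to be a hardware watchdog with a Linux driver, it is normally driven through the generic `/dev/watchdog` interface. Opening the device arms the timer, and any write pings it. Writing the magic character `V` just before closing asks drivers that support "magic close" to disarm it. This is a minimal sketch: the device path is a parameter so the loop can be tried against an ordinary file first, and the interval is an arbitrary assumption that must stay below the configured timeout. On real hardware, a process that exits without the magic close (or a driver built with "nowayout") will lead to a reboot.

```python
import time


def pet_watchdog(device="/dev/watchdog", interval=10.0, pings=3):
    """Ping the watchdog a few times, then request a magic close."""
    # Unbuffered binary mode so every write reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        for _ in range(pings):
            wd.write(b"\0")   # any write counts as a keep-alive
            time.sleep(interval)
        wd.write(b"V")        # magic close: disarm, if the driver supports it
```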
1159042740 M * gdm dkg says he will try to remove the memory tomorrow
1159042766 M * gdm and if you have suggestions (like restarting the tor server?) then we can do that too
1159042792 M * Bertl yes, I would suggest to stress test it a little
1159042807 M * gdm is it essential to remove the memory, do you think?
1159042823 M * gdm as that will mean two trips - one to remove and one to replace - that could otherwise be avoided?
1159042838 M * Bertl no, it's only one possible source of problems
1159042956 M * gdm ok. thank you. and thanks a lot for all the help today!!
1159042976 M * gdm i must go eat some food now, i will fix up the rest of the vservers and start some load etc after dinner
1159042989 M * Bertl you're welcome! hope it really helps ...
1159043007 M * Bertl enjoy your meal ..
1159043017 M * gdm i hope it helps too :)
1159043025 M * gdm but if not, i hope we can solve the problem together
1159043029 M * gdm and help everyone
1159043038 M * gdm hasta luego
1159043047 M * Bertl if it is a Linux-VServer issue, we will solve it :)
1159043056 M * gdm :)
1159043070 M * dkg thanks!
1159043731 M * Bertl okay, off again .. back later ...
1159043738 N * Bertl Bertl_oO
1159043874 M * vasko Bertl_oO: back, i've just finished the vanilla compilation, but can't reboot until tomorrow. i'll let you know
1159045287 Q * Greek0 Quit: Lost terminal
1159045600 J * Greek0 ~greek0@85.255.145.201
1159045766 J * mire ~mire@65-167-222-85.COOL.ADSL.VLine.verat.net
1159047836 Q * coocoon helium.oftc.net hydrogen.oftc.net
1159047836 Q * dna_ helium.oftc.net hydrogen.oftc.net
1159047836 Q * sladen helium.oftc.net hydrogen.oftc.net
1159047836 Q * node helium.oftc.net hydrogen.oftc.net
1159047836 Q * Curus helium.oftc.net hydrogen.oftc.net
1159047836 Q * Medivh helium.oftc.net hydrogen.oftc.net
1159047836 Q * ex helium.oftc.net hydrogen.oftc.net
1159047836 Q * Vudumen helium.oftc.net hydrogen.oftc.net
1159047836 Q * gdm helium.oftc.net hydrogen.oftc.net
1159047836 Q * bragon helium.oftc.net hydrogen.oftc.net
1159047836 Q * meebey helium.oftc.net hydrogen.oftc.net
1159047836 Q * SNy helium.oftc.net hydrogen.oftc.net
1159047836 Q * fs helium.oftc.net hydrogen.oftc.net
1159047836 Q * pusling helium.oftc.net hydrogen.oftc.net
1159047861 M * ensc Bertl_oO: vserver-stat works with -t6, vkill still fails
1159047862 M * ensc /usr/sbin/vkill -s INT --xid 141 -- 1
1159047862 M * ensc vkill: vc_ctx_kill(): No such process
1159048045 J * coocoon ~coocoon@p54A0536E.dip.t-dialin.net
1159048045 J * dna_ ~naucki@157-211-dsl.kielnet.net
1159048045 J * sladen paul@starsky.19inch.net
1159048045 J * node ~dwindsor@stanford.columbia.tresys.com
1159048045 J * Curus ~Curus@kbhn-vbrg-sr0-vl209-213-185-8-10.perspektivbredband.net
1159048045 J * gdm ~gdm@www.iteration.org
1159048045 J * Vudumen ~vudumen@perverz.hu
1159048045 J * fs fs@213.178.77.98
1159048045 J * ex ex@valis.net.pl
1159048045 J * meebey meebey@booster.qnetp.net
1159048045 J * pusling pusling@195.215.29.124
1159048045 J * bragon ~weechat@sd866.sivit.org
1159048045 J * SNy 6cfbac777d@bmx-chemnitz.de
1159048045 J * Medivh ck@paradise.by.the.dashboardlight.de
1159048146 Q * dna_ Quit: Verlassend
1159048176 M * daniel_hozac ensc: vserver debugging enabled?
1159048253 M * daniel_hozac echo 16 > /proc/sys/vserver/debug_misc might shed some light on it.
1159049861 M * daniel_hozac ensc: does anything using vc_task_xid on the guest's init's pid work?
1159049924 M * daniel_hozac i.e. does vps show the context?
1159050096 N * Bertl_oO Bertl
1159050099 M * daniel_hozac ensc: and does /proc/virtual/<xid>/info have the correct initpid?
1159050122 M * Bertl the problem is that the find_task_by_real_pid() doesn't work as expected in the current setup
1159050140 M * daniel_hozac oh?
1159050149 M * Bertl sec, I'll upload a trace :)
1159050162 M * daniel_hozac is this new for 2.6.18, or has this been a silent problem for a while?
1159050177 M * Bertl no idea, but I suspect it is new ...
1159050220 M * Bertl http://paste.linux-vserver.org/403
1159050249 M * Bertl added this debug line in signal.c ~58
1159050250 M * Bertl                 printk(" kill #%u,%d %p[#%u,%d]\n", vxi->vx_id, pid, p, p?p->xid:0, p?p->pid:0);
1159050340 M * Bertl note, the sleep is init in xid=100 and it has pid=34
1159050400 M * Bertl nope, it is the result of my changes ...
1159050420 M * daniel_hozac ah?
1159050423 M * Bertl the thing is, the find_task_by_real_pid() uses the pid_task()
1159050437 M * Bertl which is modified to limit the pids
1159050453 M * daniel_hozac ah, of course.
1159050493 M * Bertl maybe we can use this to completely work around modifying find_task_by_pid()
1159050529 M * Bertl the only thing we need to change when we want to address all pids 
1159050546 M * Bertl is to move the process into the spectator context
1159050578 M * Bertl (or to pass some flag down the callchain)
1159050589 M * Bertl now the pid_task() has pid, and the type
1159050606 M * Bertl the pid is out of question here, but the type could be interesting
1159050615 M * daniel_hozac agreed.
1159050721 M * Bertl I wonder if adding VX_ADMIN would harm/break anything atm?
1159050734 M * Bertl (to the check in pid_task())
1159050781 M * daniel_hozac we still have checks elsewhere, don't we?
1159050803 M * Bertl yes, it would basically allow signalling and such from the host context, I guess
1159050820 M * Bertl (checking now)
1159050871 Q * bonbons Quit: Leaving
1159050900 M * Bertl looks good here, i.e. ps looks quite normal on xid=0
1159050928 M * Bertl signalling works fine too
1159050953 M * daniel_hozac ok, sounds good then.
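This is a toy model of the failure and of the VX_ADMIN idea, meant only to make the thread easier to follow. It is not the kernel code. The task table, the `pid_task_visible()` helper, and the flag value are hypothetical, and treating xid 1 as the spectator context is an assumption of the model. It mirrors what was described above: `find_task_by_real_pid()` goes through a `pid_task()` that hides tasks outside the caller's context. As a result, a host-side kill of the guest init (real pid 34 in xid 100, per the trace) comes back with "No such process".

```python
import errno
from dataclasses import dataclass

VX_ADMIN = 1 << 0     # hypothetical bit value; only the name comes from the discussion
SPECTATOR_XID = 1     # assumption: the spectator context that may see every pid


@dataclass
class Task:
    pid: int
    xid: int


# Hypothetical task table: host init plus a guest whose init is real pid 34.
TASKS = {1: Task(pid=1, xid=0), 34: Task(pid=34, xid=100)}


def pid_task_visible(task, caller_xid, caller_flags, allow_admin):
    """Context filter applied inside the (modelled) pid_task()."""
    if caller_xid == SPECTATOR_XID or task.xid == caller_xid:
        return True
    # The proposed change: also let VX_ADMIN callers through.
    return allow_admin and bool(caller_flags & VX_ADMIN)


def find_task_by_real_pid(pid, caller_xid, caller_flags, allow_admin):
    task = TASKS.get(pid)
    if task is None or not pid_task_visible(task, caller_xid, caller_flags, allow_admin):
        return None
    return task


def ctx_kill(real_pid, caller_xid=0, caller_flags=VX_ADMIN, allow_admin=False):
    """Return 0 on success or -ESRCH, as vc_ctx_kill() reports 'No such process'."""
    task = find_task_by_real_pid(real_pid, caller_xid, caller_flags, allow_admin)
    return 0 if task is not None else -errno.ESRCH
```

With `allow_admin=False` the host-side call fails the same way `vkill` did. Enabling the VX_ADMIN exception makes it succeed, while a caller without the flag stays confined to its own context.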
1159050963 M * Bertl ensc: could you test this modification if it works for you?
1159050978 M * Bertl ensc: would you prefer a patch or a delta?
1159051222 M * Bertl ensc: well, here you go: http://vserver.13thfloor.at/Experimental/patch-2.6.18-vs2.0.2.1-t7.diff, http://vserver.13thfloor.at/Experimental/delta-vkill-fix01.diff
1159051323 M * Bertl but I think we definitely should look into getting rid of the _real_pid() part in devel (maybe even stable) soon
1159051353 M * daniel_hozac hmm, wouldn't _real_pid still be a handy shortcut for the new pidtype, or whichever way it's solved?
1159051394 M * Bertl yes, indeed, but we might get around modifying the original macro
1159051412 M * daniel_hozac right, _real_pid would just be an additional one.
1159051439 M * Bertl and external modules would be 'auto' limited to the context they are called in
1159051443 M * daniel_hozac would solve all those pesky vx_rmap_pid: undefined reference errors once and for all :)
1159051469 M * Bertl precisely
1159051473 M * daniel_hozac yep, sounds good.
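The design agreed on here can be pictured as two lookups sharing one table. The ordinary lookup stays limited to the caller's context, so external modules are confined automatically. A separate `_real_pid` variant opts out of the filter by passing a flag down the call chain, which is one of the options Bertl mentioned. This is another toy model: the table, the flag, and the helper names are hypothetical, and it says nothing about how the kernel eventually implemented it.

```python
from typing import Dict, Optional

# Hypothetical table: real pid -> xid of the owning context.
PID_XID: Dict[int, int] = {1: 0, 34: 100, 35: 100}


def _lookup(pid: int, caller_xid: int, unrestricted: bool) -> Optional[int]:
    """Shared helper; 'unrestricted' plays the role of the flag passed down the chain.

    Returns the pid as a stand-in for the task, or None if it is not visible.
    """
    xid = PID_XID.get(pid)
    if xid is None:
        return None
    if not unrestricted and xid != caller_xid:
        return None  # hidden: outside the caller's context
    return pid


def find_task_by_pid(pid: int, caller_xid: int) -> Optional[int]:
    """Default lookup, limited to the caller's context (what modules would get)."""
    return _lookup(pid, caller_xid, unrestricted=False)


def find_task_by_real_pid(pid: int, caller_xid: int) -> Optional[int]:
    """Additional shortcut that can address every pid, e.g. for vkill from the host."""
    return _lookup(pid, caller_xid, unrestricted=True)
```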
1159052059 M * gdm Bertl: all is started up again. feel free to check the munin graphs again or something in a few hours maybe. but i'm off to bed now. so back in 9+ hours. thanks again for all the help!! :)
1159052192 M * dkg i'm heading off also.  thanks for the clear thinking and explanations, y'all.  it's good to learn.
1159052199 Q * dkg Quit: bye!
1159052206 M * Bertl np
1159052720 Q * Johnnie Remote host closed the connection
1159054602 J * shedi ~siggi@inferno.lhi.is
1159055350 J * matti_ matti@linux.gentoo.pl
1159055369 Q * matti Read error: Connection reset by peer
1159055373 N * matti_ matti