1158970195 M * spq_ derjohn2, how do i have to configure these rlimits? like written in the flower page or http://oldwiki.linux-vserver.org/Resource+Limits
1158970321 M * coocoon g8 too all
1158970324 Q * coocoon Quit: KVIrc 3.2.0 'Realia'
1158970742 M * derjohn2 spq_, http://linux-vserver.org/Frequently_Asked_Questions#How_do_I_limit_a_guests_RAM.3F_I_want_to_prevent_OOM_situations_on_the_host.21
1158970766 M * derjohn2 spq_, you can add more like for those options you'll find on the GFp.
1158970775 M * derjohn2 i am tired now, and off ... bye !
1158970795 M * spq_ ok gn8 thx
1158971242 Q * spq_ Quit: spq_
1158972305 Q * derjohn2 Ping timeout: 480 seconds
1158972310 J * derjohn2 ~aj@dslb-084-058-205-159.pools.arcor-ip.net
1158972949 J * bluelines ~bronson@c-71-198-75-160.hsd1.ca.comcast.net
1158973262 J * mircomayic ~AK768@65.23.247.223
1158973270 M * mircomayic !list
1158973278 P * mircomayic
1158975110 J * shedi ~siggi@inferno.lhi.is
1158976010 Q * bluelines Ping timeout: 480 seconds
1158976165 Q * Johnnie Ping timeout: 480 seconds
1158976483 J * Johnnie ~jdlewis@jdlewis.org
1158978101 M * ntrs Can you limit a guest to run on only one CPU on a SMP server?
1158978165 M * doener using cpusets
1158978177 M * ntrs cpusets?
1158978181 M * ntrs how?
1158978243 M * ntrs where do you configure that in the guest's configuration?
1158978269 M * doener baggins made a patch for util-vserver to support that
1158978276 M * doener I'm currently searching for it
1158978291 M * doener AFAIK at least one distro (debian?) includes it
1158978339 M * ntrs no, I need it for vanilla
1158978381 M * doener http://list.linux-vserver.org/archive/vserver/msg11505.html
1158978398 M * doener no idea if that applies to .210/.211-rc1
1158978468 M * doener on devel the new token bucket scheduler also supports per-cpu buckets (I think), so that would also be an option if you run devel
1158978534 M * ntrs Ok, thanks
1158978568 M * doener np
1158978602 M * doener off to bed now *yawns*
1158982144 Q * Johnnie Quit: G'bye!
1158982150 J * Johnnie ~jdlewis@jdlewis.org
1158982955 Q * _node Ping timeout: 480 seconds
1158983530 J * kseeker stanking@unix.shell.la
1158984960 Q * gerrit_ Ping timeout: 480 seconds
1158990717 J * AjAx-- hiddenserv@tor.noreply.org
1158991205 P * AjAx--
1158993592 J * gerrit ~gerrit@1153ahost99.starwoodbroadband.com
1158994627 J * meandtheshell ~markus@85.124.140.164
1158997828 J * dna_ ~naucki@157-211-dsl.kielnet.net
1158998336 J * bluelines ~bronson@c-71-198-75-160.hsd1.ca.comcast.net
1158999145 Q * derjohn2 Quit: Verlassend
1159000646 J * bonbons ~bonbons@83.222.36.111
1159001380 Q * bluelines Ping timeout: 480 seconds
1159002479 Q * harry Read error: Operation timed out
1159003415 M * daniel_hozac ntrs: 0.30.211-rc1 includes it.
1159004743 J * coocoon ~coocoon@p54A0536E.dip.t-dialin.net
1159004774 M * coocoon morning
1159004805 M * daniel_hozac good morning.
1159004830 M * coocoon hello daniel have u seen the how to
1159004856 M * daniel_hozac i saw the link, but i haven't read it yet.
1159004874 M * coocoon ah ok dunno if the name is better
1159004880 M * coocoon but ok
1159004888 M * daniel_hozac i think the name is fine.
1159005133 M * coocoon ok good but for searching a little bit too long vcd how to was shorter ;-)
1159005362 J * harry ~harry@d54C2508C.access.telenet.be
1159005748 M * yang does "reboot" on a host succesfully shutdowns also all guests, or do the guests need to be shutdown each by itself before that?
1159005789 M * daniel_hozac do you have an initscript in your reboot/halt runlevel that stops guests?
1159005800 M * matti If so, then yes.
1159005800 M * matti ;)
1159005804 M * yang i don't think so
1159005815 M * daniel_hozac so you don't start guests during boot either?
1159005820 M * yang nope
1159005825 M * yang i start them manually
1159005835 M * daniel_hozac then i guess you'll have to stop them manually too.
1159005839 M * yang ok
1159006404 J * ensc ~irc-ensc@p54B4DB36.dip.t-dialin.net
1159006419 M * ensc hi, is there an official fix for
1159006420 M * ensc kernel/built-in.o: In function `fill_pid':
1159006420 M * ensc taskstats.c:(.text+0x33c2e): undefined reference to `vx_rmap_pid'
1159006427 M * ensc with 2.6.18?
1159006445 M * ensc and vs2.0.2.1-t5?
1159007054 M * goblin vshelper.init: can not determine xid of vserver 'samba2'; returned value was ''
1159007063 M * goblin any ideas where this may come from? :-)
1159007085 M * goblin just after rc script finishes after I run vserver samba2 start
1159007874 Q * Greek0 Read error: Connection reset by peer
1159007936 J * Greek0 ~greek0@85.255.145.201
1159008182 M * goblin what is a xid?
1159008193 M * mnemoc context id
1159008270 M * goblin mhm
1159008518 M * goblin +++ /usr/local/sbin/vserver-info /usr/local/etc/vservers/samba2 CONTEXT false
1159008518 M * goblin ++ xid=
1159008533 M * goblin so why isn't it allocated...?
1159008541 M * mnemoc you have to define it
1159008556 M * goblin aaah.. that'd explain a lot...
1159008752 P * kseeker
1159008753 M * goblin vserver build build --help shows: --context ... the static context of the vserver [default: none; a dynamic context will be assumed]
1159008778 M * goblin and I see plenty of symlinks in /etc/vservers/.defaults/run.rev
1159008784 M * goblin with different numbers
1159008790 M * goblin where do I define it then?
1159009019 M * goblin If I build the vserver with --context 100, I get the same error message from vshelper.init
1159009064 M * mnemoc read the flower page
1159009223 M * goblin mnemoc, hmm... the two places it mentions xid are /etc/vservers/name/run, which looks like a dynamically allocated thing, and /etc/vservers/.defaults/run.rev, which already has plenty of symlinks named with numbers and pointing to vserver directories
1159009305 M * mnemoc look for 'context'
1159009412 M * goblin /etc/vservers/name/context ? yes, I have this file in the vserver that I build with --context 100
1159009422 M * goblin and it contains '100', as expected
1159009461 M * goblin oh, you probably missed my main problem...
1159009485 M * goblin vshelper.init: can not determine xid of vserver 'samba3'; returned value was ''
1159009526 M * goblin which is what I get after I try to start the vserver, and after rc script has finished
1159009636 M * mnemoc uhm
1159009645 M * goblin the output of vserver --debug samba3 start is at http://uukgoblin.net/g
1159009711 Q * AndrewLee Ping timeout: 480 seconds
1159009896 M * mnemoc uhm
1159011204 Q * mire Quit: Leaving
1159011227 M * goblin hmm... I created a wrapper script which calls vserver-info CONTEXT true instead of CONTEXT false
1159011252 M * goblin this returns the static xid of 100 properly now, starting the server doesn't produce the error any more
1159011269 M * goblin but when I try to enter the guest, it turns out that it's not running :-/
1159011943 M * goblin oh great.
1159011959 M * goblin I just need to run a background process in a guest! :-D
1159011979 M * goblin otherwise it apparently dies with no processes left. piece of cake. :->
1159013433 Q * shedi Quit: Leaving
1159013792 M * derjohn daniel_hozac, the patch compiled and the kernel boots. the patch compiled a 2nd time, with v6 not as module but compiled in. now I think I need patched tools.
1159014817 M * daniel_hozac derjohn: yep, or at least chbind6.
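A minimal sketch of the static-xid lookup goblin describes above, assuming only what the log states: a guest built with `--context 100` has a `context` file in its configuration directory containing `100`, and a guest without one gets a dynamic context. The function names `parse_context` and `read_static_xid` are hypothetical and are not part of util-vserver.

```python
from pathlib import Path
from typing import Optional


def parse_context(text: str) -> Optional[int]:
    """Parse the contents of a guest's `context` file into a static xid.

    Returns None for empty content, which this sketch treats the same as
    a missing file (no static context configured).
    """
    stripped = text.strip()
    return int(stripped) if stripped else None


def read_static_xid(guest_dir: str) -> Optional[int]:
    """Return the static xid stored in <guest_dir>/context, or None.

    guest_dir is the guest's configuration directory, for example the
    /etc/vservers/name directory mentioned in the log.
    """
    context_file = Path(guest_dir) / "context"
    if not context_file.is_file():
        return None  # no static xid; a dynamic context would be assumed
    return parse_context(context_file.read_text())
```

This only reports what is configured. As daniel_hozac points out later in the log, the vshelper.init message in this case came from the guest having no process left running, not from a missing `context` file.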
1159014845 M * cryptronic hi all, someone here who develops vserver-stat from util-vserver
1159014856 M * daniel_hozac ensc: #include is required in kernel/taskstats.c, kernel/rtmutex-debug.c and mm/migrate.c
1159014868 M * daniel_hozac cryptronic: don't trust the values.
1159014908 M * cryptronic daniel_hozac, that's not the thing i want to know ;)
1159014913 M * daniel_hozac goblin: that error message means that your guest doesn't keep a process running.
1159014932 M * goblin daniel_hozac, exactly. :-)
1159014936 M * cryptronic the question is how generete vserver-stat the uptime
1159014943 M * cryptronic from vservers
1159014962 M * goblin daniel_hozac, I've figured it out just few messages ago :-)
1159014964 M * cryptronic because the content of /proc/virtual/xid/cvirt biasuptime is very strange
1159014986 M * daniel_hozac cryptronic: BiasUptime is IIRC the offset from the host's uptime.
1159015010 M * cryptronic ah, ok thanks :)
1159015020 M * cryptronic i'll try to get the correct value
1159015054 M * daniel_hozac cryptronic: i wouldn't trust vserver-stat's value for the uptime either, AFAICT it's using the oldest process's start time to determine it.
1159015160 M * cryptronic it's not the thing i want to trust vserver-stat ;) i'm developing a new webinterface for openvcp and there we include the uptime of the vservers with the content of /proc/virtual/xid/cvirt biasuptime. So i have to know how i have to use this value
1159015185 M * daniel_hozac i'm just saying, you shouldn't look at vserver-stat for how to do it correctly ;)
1159015211 M * cryptronic ok ;) next time i ask you directly ;)
1159015781 Q * weasel Ping timeout: 480 seconds
1159015905 M * cryptronic daniel_hozac, using biasuptime as offset works well :) thanks a lot :)
1159015935 M * daniel_hozac np.
1159016274 J * weasel weasel@asteria.debian.or.at
1159018541 Q * Kowi
1159019220 N * Bertl_zZ Bertl
1159019224 M * Bertl morning folks!
1159019231 M * daniel_hozac morning Bertl!
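The uptime calculation cryptronic settles on above can be sketched as follows. Two things here are assumptions rather than facts from the log: that the `cvirt` file has a `BiasUptime:` line whose first value is in seconds, and that the bias is subtracted from the host's uptime (the log only says it is "the offset from the host's uptime"). Both function names are hypothetical.

```python
def parse_bias_uptime(cvirt_text: str) -> float:
    """Extract the BiasUptime value from the text of a guest's cvirt file.

    Assumes a "BiasUptime: <seconds>" line; the exact file layout is not
    shown in the log.
    """
    for line in cvirt_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "biasuptime" and value.split():
            return float(value.split()[0])
    raise ValueError("no BiasUptime field found")


def guest_uptime(host_uptime: float, bias_uptime: float) -> float:
    """Guest uptime in seconds, treating BiasUptime as an offset from the host's uptime."""
    return host_uptime - bias_uptime
```

The host's uptime would come from the first field of the host's `/proc/uptime`. Keeping the subtraction in its own function makes it easy to flip the sign if the assumption about direction turns out to be wrong.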
1159019253 M * Bertl ensc: when do you get that?
1159019266 M * gdm Bertl: morning!
1159019268 M * daniel_hozac Bertl: when CONFIG_TASKSTATS is enabled.
1159019280 M * gdm Bertl: i have that old problem with a machine hanging again...
1159019281 M * daniel_hozac Bertl: http://people.linux-vserver.org/~dhozac/p/k/delta-headers-fix84.diff
1159019287 M * Bertl ah, good, tx!
1159019307 M * Bertl gdm: hanging means lockup?
1159019352 M * gdm Bertl: it was running backports kernel - i have to find out what that was again
1159019355 M * gdm Bertl: yep
1159019369 M * Bertl backports mean?
1159019370 M * gdm Bertl: i can show you some munin graphs about it... it has been down for about 2 hours
1159019373 M * Bertl *means
1159019378 M * gdm debian backports
1159019390 M * Bertl which version is that in numbers?
1159019424 M * gdm 2.6.16-2 backport i think
1159019434 M * gdm micah just had the exact same problem as well
1159019435 M * ensc Bertl: http://ensc.de/kernel-kosh-config
1159019456 M * Bertl ensc: thanks! seems to be fixed by daniels patch
1159019471 M * gdm htere is absolutely nothing in the munin graphs that suggest any problems.. cpu, memory, netstat, all the same
1159019497 M * gdm i guess the thing to do now is to get it rebooted and then to put the latest vanilla kernel on, yes?
1159019501 M * ensc yes; including right headers is not a problem. I just wondered that -t5 does not have the fix yet ;)
1159019511 M * Bertl micah, gdm: is some soft lockup and mutex/spinlock check enabled?
1159019520 M * Bertl ensc: yeah, obviously I ahd that option turned off ...
1159019529 M * Bertl ensc: will be in t6
1159019614 M * gdm oh, i don't understand what that means... i know that previously there has been no response to anything (keyboard, serial, ssh) and nothing on the screen when a monitor is plugged in
1159019614 M * Bertl ensc: glad to hear that you are testing 2.6.18 kernels!
1159019628 M * gdm currently, there is no response to sysrq (well, break via minicom)
1159019646 M * Bertl gdm: how long does it take the machine to hang?
1159019681 M * Bertl gdm: and more recent kernels might help, check with waldi for bleeding edge debian stuff
1159019698 M * ensc Bertl: why 'testing'? I hope I can use them ;)
1159019708 M * Bertl hehe, even better :)
1159019726 M * gdm Bertl: 7 1/2 days this time
1159019764 M * gdm but i now know of at least 5-6 different servers that this is happening to
1159019771 M * gdm run by 4 different sysadmins, at least
1159019780 M * gdm in several different locations :/
1159019782 M * daniel_hozac ensc: oh, and now that you're here, what was "if ((flag & MS_NODEV)!=0) flag |= MS_NODEV;" supposed to do in src/secure-mount.c?
1159019828 M * daniel_hozac (line 432 in 0.30.210)
1159019831 M * Bertl gdm: hmm, that's a long time for testing ...
1159019856 M * Bertl gdm: I would suggest to enable spinlock/mutex debugging and make sure that magic sysrq works
1159019858 M * gdm 7 1/2 days? yes
1159019870 M * gdm magic sysrq did work before
1159019881 M * Bertl but not when it happened?
1159019892 M * gdm not working now, no
1159019905 M * Bertl nmi watchdog is also on?
1159020185 M * gdm sorry, am just trying to get hold of someone to check/reboot the machine
1159020187 M * gdm brb
1159020343 M * ensc daniel_hozac: mmh... I am not sure what I meant with it... should be perhaps check the mask, not the flag
1159020494 Q * meandtheshell Quit: exit (0);
1159020911 M * ensc daniel_hozac: should be 'if (!(mnt->mask & MS_NODEV)) flag |= MS_NODEV;'
1159020976 M * ensc negated logic confuses me :(
1159021110 M * gdm Bertl: ok. someone else is testing sysrq as well for me.. but nothing for him, either
1159021143 M * Bertl gdm: hmm, that looks like a hardware issue then ...
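The secure-mount.c line daniel_hozac asks about above is a no-op, which the following sketch makes concrete by transcribing both the original condition and ensc's proposed replacement into Python. `MS_NODEV = 4` is the value from Linux's mount flag definitions. What `mnt->mask` actually represents is not explained in the log, so the `mask` parameter here is simply an opaque bit field, and both function names are hypothetical.

```python
MS_NODEV = 4  # mount flag value as defined by Linux


def original_check(flag: int) -> int:
    """Transcription of: if ((flag & MS_NODEV) != 0) flag |= MS_NODEV;"""
    if (flag & MS_NODEV) != 0:
        flag |= MS_NODEV  # the bit is already set here, so nothing changes
    return flag


def corrected_check(flag: int, mask: int) -> int:
    """Transcription of: if (!(mnt->mask & MS_NODEV)) flag |= MS_NODEV;"""
    if not (mask & MS_NODEV):
        flag |= MS_NODEV  # sets the bit whenever the mask does not have it
    return flag


# The original never changes its input, whatever the flags are.
assert all(original_check(f) == f for f in range(16))
```

With the corrected form, `MS_NODEV` ends up set in `flag` unless the mask already carries that bit, which matches ensc's remark that the check should look at the mask rather than the flag.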
1159021157 M * gdm no Bertl it is not a hardware issue
1159021159 Q * derjohn Read error: Connection reset by peer
1159021165 J * bone_idol ~bone_idol@springnight.burngreave.net
1159021165 M * gdm it has been extensively tested
1159021173 M * gdm memtest86 for about a week
1159021182 M * Bertl gdm: well, when you end up with irqs disabled, the nmi should kick in
1159021185 M * gdm cpuburn for over a month - like 42 days or somehing
1159021204 M * Bertl and when you have irqs enabled, the magic sysrq should work
1159021227 J * dkg ~dkg@lair.fifthhorseman.net
1159021241 M * Bertl gdm: but anyways, let's check a mainline kernel
1159021247 M * Bertl welcome dkg!
1159021250 M * dkg hi Bertl!
1159021254 M * dkg thanks for yer advice here.
1159021261 M * gdm Bertl: dkg is another sysadmin with me who has access
1159021276 M * gdm bone_idol is also a sysadmin of this same machine, but has no serial access
1159021281 M * bone_idol hi
1159021296 M * Bertl ah, cool, welcome bone_idol! :)
1159021338 M * gdm dkg: you can call in the reboot, right?
1159021347 M * Bertl gdm: I assume we already checked the config, yes?
1159021358 M * dkg yeah, i can call in a reboot. shall i do that?
1159021366 M * dkg and what should we reboot into?
1159021384 M * gdm well, this is the stock debian backports kernel from backports .org
1159021395 M * gdm [sorry - buggy wirless here ';-)]
1159021409 M * Bertl gdm: ah, so whatever debian considers appropriate
1159021432 M * gdm i think we should boot into the same... ?
1159021437 M * Bertl well, no idea about that actually, i.e. we have to ask/check with the maintainer
1159021438 M * gdm Bertl: yeah.
1159021449 M * dkg Bertl: not sure what background you've had here: the cpuburn was running on a RIP kernel (slackware-based)
1159021466 M * gdm well, how about we boot in to it, get the config out to show Bertl and then we can disable all the vservers
1159021472 M * gdm and set up a new kernel from there?
1159021476 M * dkg gdm: sounds good to me.
1159021476 M * gdm i.e. with nothing running?
1159021481 M * dkg calling it in now...
1159021486 M * gdm ok, thanks
1159021555 M * gdm Bertl: sorry to take over the channel here, you want us to go elsewhere for a bit?
1159021578 M * Bertl try to provide a few details for that machine, like memory, cpus and such (upload that to paste.linux-vserver.org)
1159021588 M * Bertl gdm: nah, channel is fine for this purpose
1159021606 M * gdm ok, i will upload stuff while dkg reboot
1159021606 M * gdm s
1159021795 M * gdm how long does stuff stay on paste.linux-vserver.org for?
1159021799 M * gdm does it get erased?
1159021805 M * Bertl not really
1159021813 M * gdm err, ok
1159021869 M * gdm ok, i just gave youo the login details for our admin wiki backup. hope that's ok
1159021880 M * gdm there is all the hardware info on that page
1159022025 M * Bertl okay
1159022174 J * mkhl mkhl@200-148-40-125.dsl.telesp.net.br
1159022178 M * gdm dkg, bone_idol - and do you have a typical memory (/proc/meminfo) output/graph when everything is running?
1159022188 M * gdm i think the answer is 'yes'
1159022205 M * Bertl what I'm trying to figure is, are the 5GB memory actually used?
1159022218 M * Bertl (by the 14 guests running there)
1159022242 M * dkg machine is back up, but it's resyncing the RAID array, so it'll be a little while before it's accessible :/
1159022308 M * gdm Bertl: we have used the 5gb before, but i think we never got it all used.
1159022322 M * gdm this time, i think the backports kernel only had 4gb enabled
1159022350 M * dkg i could bring it down and back up into a ramfs-only boot so we have something to work with while the RAID arrays re-sync if that's desired.
1159022396 M * gdm ask Bertl ;-) but he's currently looking at the munin graphs too
1159022462 M * Bertl dkg: soft or hard raid?
1159022470 M * dkg Bertl: soft raid
1159022481 M * Bertl then it should be fine even with reconstruction
1159022489 M * Bertl (just not that performant)
1159022537 M * goblin where is http://www.linux-vserver.org/index.php?page=Linux-Vserver+FAQ gone?
1159022538 M * dkg yeah, it's just that crypto on LVM on top of the RAID tends to lag during the initramfs stage of a standard boot if the RAID arrays aren't properly synced.
1159022557 M * dkg but they've synced and we're almost up already anyway...
1159022558 M * goblin or, why do I have to run vprocunhide and what does it do? :-)
1159022603 M * Bertl goblin: I'm not sure that was ever there (try removing the www and/or replace it by oldwiki)
1159022612 M * gdm goblin: http://linux-vserver.org/Frequently_Asked_Questions <== is that the page you are looking for?
1159022645 M * Bertl goblin: procfs security is required to make the guest's procfs secure, it is 'configured' by the vprocunhide script
1159022686 M * goblin gdm, not sure. The link I pasted was output by "vserver name start" after saying that /proc/uptime cannot be accessed
1159022699 M * goblin I've found some forum which said to run vprocunhide first
1159022727 M * goblin Bertl, oh, OK then. I thought it removes some proc security from my host. :-)
1159022731 M * gdm goblin: i think that is a page from the 'oldwiki' - the site has been recently upgraded
1159022733 M * Bertl goblin: yes, vprocunhide should be run once after host-system startup
1159022786 M * goblin thank a lot then :-) perhaps something in util-vserver requires an update of this link (I'm using vs2.0.2-grsec2.1.9)
1159022805 M * Bertl gdm: ECC is enabled in the bios? (just checking)
1159022821 M * gdm ooh, i don't recall.. dkg ?
1159022833 M * dkg i think ECC is enabled in the BIOS, yes.
1159022834 M * gdm i will look and see if i have any notes anywhere about that
1159022874 M * dkg it showed up in the memtest console, iirc.
1159022888 P * mkhl
1159022894 M * dkg btw, the machine is accessible on the network now.
1159022924 M * Bertl okay, let's start the guest and capture /proc/meminfo before and after the startup
1159022929 M * Bertl *guests
1159022941 M * gdm err, they maight be started already....
1159022944 M * gdm i guess i will go login
1159022948 M * dkg they have started already.
1159022950 M * Bertl okay, then only the meminfo after :)
1159023010 M * gdm http://paste.linux-vserver.org/400
1159023011 M * dkg Bertl: do you want that on paste.l-v.o?
1159023017 M * dkg nevermind
1159023021 M * gdm yeah, they are all started
1159023024 M * gdm sorry dkg ;-)
1159023056 M * Bertl okay, as I thought, memory is almsot unused, 4GB max atm, with highmem
1159023089 M * Bertl on this particular setup, you might be better of if you configure a 1/3 split and disable highmem completely
1159023098 M * Bertl will give you 3GB accessible memory
1159023136 M * Bertl is the swapspace on raid too? maybe encrypted?
1159023138 M * dkg Bertl: we've never gotten around to rebuilding the kernel with 64GB support at all.
1159023152 M * dkg swap is on crypt on lvm on raid :)
1159023168 M * Bertl dmcrypt or cryptloop or aes?
1159023168 M * dkg randomly keyed crypt
1159023172 M * dkg dmcrypt
1159023236 M * Bertl do we have a total/host cpu graph too somewhere? (or am I just too blind to see it :)
1159023293 M * Bertl the cpus have HT flag, so basically you should end up with 4 virtual cpus, no?
1159023347 M * matti Bertl: :)
1159023422 M * gdm Bertl: https://chili.freeit.org/munin/mayfirst.org/shadow.mayfirst.org-cpu.html
1159023435 M * matti gdm: :)
1159023438 M * dkg Bertl: we've disabled HT in the bios because it was previously a suspect cause of the flakiness.
1159023457 M * gdm we did have 64g highmem enabled with our first ever kernel which was 2.6.12 - but that was last year. we didn't have any problems with that kernel
1159023461 M * gdm matti: :)
1159023468 M * gdm matti: just a few days to go!
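The arithmetic behind Bertl's "1/3 split ... will give you 3GB accessible memory" can be sketched as below. The model is a deliberate simplification and an assumption of this note: on a 32-bit kernel with highmem disabled, usable RAM is capped by the part of the 4 GiB virtual address space given to the kernel, and the space the kernel reserves for its own mappings is ignored, so the result is an upper bound. The function name is hypothetical.

```python
GIB = 1024 ** 3  # bytes in one gibibyte


def directly_mapped_ram(ram_bytes: int, kernel_space_bytes: int) -> int:
    """Upper bound on RAM usable by a 32-bit kernel with highmem disabled."""
    return min(ram_bytes, kernel_space_bytes)


# The machine in the log has 5 GiB installed.
for kernel_gib in (1, 2, 3):
    usable = directly_mapped_ram(5 * GIB, kernel_gib * GIB)
    print(f"{kernel_gib} GiB kernel space -> at most {usable // GIB} GiB usable")
```

Under this model the usual 3 GiB user / 1 GiB kernel layout would cap the machine at about 1 GiB without highmem, while giving the kernel 3 GiB raises the cap to about 3 GiB at the cost of leaving each process only 1 GiB of address space.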
1159023516 M * matti gdm: Few?
1159023537 M * matti gdm: I have flight tommorow at 12:05 GMT +0200 ;p
1159023544 M * matti Damn.
1159023546 M * matti I am so tired.
1159023549 M * matti Sorry
1159023577 M * matti s/tommorow/tomorrow/
1159023677 M * Bertl dkg: okay, my suggestion would be to do the following:
1159023699 M * Bertl - remove the 512MB modules
1159023710 M * Bertl - configure 1/3 split w/o highmem
1159023714 M * matti k, sorry for interrupting.
1159023716 M * Bertl - reenable HT
1159023737 M * Bertl - configure/compile your own 2.6.17.13 kernel
1159023756 M * Bertl - bind certain irqs and kernel threads to fixed vCPUs
1159023810 M * dkg thanks for all the suggestions, Bertl!
1159023825 M * Bertl well, doesn't really address the hang issue
1159023831 M * matti :)
1159023833 M * dkg i don't think i know how to bind irqs and kernel threads, but i'll read up on that.
1159023868 M * gdm when you say "remove the 512MB modules" you mean take that bit of memory physically out of the machine?
1159023868 M * dkg When you say "remove the 512MB modules", you mean physically take them out of the machine, yes?
1159023873 M * dkg ha ha
1159023961 M * dkg i swear we are different people!
1159023985 M * m4z with different bodies also?
1159024002 M * gdm actually in different countries at the moment, so yes!
1159024019 M * gdm although i guess we both gott the same ip as we're logged in from same server i think
1159024021 M * m4z thats the point where i stop to believe you (;
1159024096 M * gdm actually, i think we're using different servers
1159024132 M * gdm Bertl: [question above about memory], also, any thoughts about the hang issue itself?
1159024145 M * Bertl dkg: yes, I would remove them physically
1159024180 M * Bertl well, I would like to see that it actually hangs with e.g. 2.0.2.1 on 2.6.17.x or 2.6.18
1159024226 M * gdm ok... our experience is that it has hung up to 14 days (or maybe a bit more) after starting up before
1159024238 M * gdm so we were planning to wait at least a month to call it "stable"
1159024263 M * gdm all the dottines on the munin graphs from around may/june was due to hanging
1159024550 M * Bertl lowering memory and/or increasing system load (i.e. enable that tor guest for example) might help to trigger issues
1159024586 M * Bertl a possible source for issues might also be the filesystem + iosystem
1159024618 M * Bertl e.g. crypt + lvm + raid was known to have issues (especially stack wise)
1159024632 M * Bertl similar goes for certain sata chipsets
1159024669 M * dkg Bertl: can you give me some pointers to the crypt+lvm+raid discussion?
1159024677 M * Bertl maybe xfs or reiser ontop of that, and you almost certainly get a good chance for crashes :)
1159024701 M * dkg no xfs or reiser, fortunately.
1159024710 M * dkg all ext3
1159024741 M * Bertl http://www.uwsg.iu.edu/hypermail/linux/kernel/0605.1/0133.html
1159024751 M * Bertl (just a random example :)
1159024771 M * gdm Bertl: but i don't think the other servers with similar issues have such a complex underlying setup
1159024790 M * gdm e.g. the ones run by micah and stefani and also someone i know in canada who has seen this happen
1159024792 M * Bertl that would be a valuable information
1159024811 M * dkg Bertl: thanks for the link.
1159024813 M * Bertl but for micah it would at least point towards the debian kernel
1159024834 M * Bertl (i.e. I would even more like to check with a mainline version :)
1159024841 M * gdm micah and stefani both run debian, and so does the guy in .ca - i suspect they mostly use debian kernels too
1159024854 M * gdm yes, we are going to rebuild a mainline kernel now for you ;-)
1159024871 M * gdm and then dkg will remove the memory either later today or tomorrow when he can access the box
1159024909 M * gdm also, should we disable the vservers for now?
1159024931 A * gdm thinks we probably should, so the machine doesn't become unresponsive during the kernel build
1159025068 M * ensc Bertl: ctx 1 does not seem to work anymore with -t5; vserver-stat gives out main-processes only. 'vkill' fails too with
1159025071 M * ensc /usr/sbin/vkill -s INT --xid 152 -- 1
1159025074 M * ensc vkill: vc_ctx_kill(): No such process
1159025198 M * Bertl ah, probably get_task_for_pid() doesn't handle that case
1159025202 M * Bertl checking now
1159025302 Q * bubulak Ping timeout: 480 seconds
1159025345 J * bubulak ~bubulak@whisky.pendo.sk
1159026169 M * daniel_hozac Bertl: btw, any reason lxdialog.scrltmp is still in the patches? :) does it have some purpose
1159026171 M * dkg Bertl: we're rebuilding a stock vanilla 2.6.17.13 with vserver 2.0.2.1 as per your suggestion: do you have any kernel config preferences we should use?
1159026232 M * Bertl daniel_hozac: hum?
1159026306 M * gdm dkg: we should ensure nmiwatch is set and vserver kernel debug stuff, i think
1159026307 M * daniel_hozac Bertl: there's a file at the top of the tree named lxdialog.scrltmp that's created by the patch, and it contains 11.
1159026309 M * Bertl daniel_hozac: just means that distclean didn't reamove that properly
1159026338 M * gdm dkg: also, if we post the config up, i'm sure Bertl and maybe matti and/or some others will chip in ideas ;-)
1159026342 M * Bertl daniel_hozac: i.e. probably a kernel build system bug :)
1159026356 J * shedi ~siggi@dsl-149-109-85.hive.is
1159026368 M * Bertl gdm, dkg: would not hurt :)
1159026397 M * Bertl daniel_hozac: i.e. will remove it manually now :)
1159026412 M * daniel_hozac ;)
1159026538 M * daniel_hozac Bertl: hmm, shouldn't we check for context visibility in kernel/pid.c:get_task_pid?
1159026562 M * daniel_hozac s/get_task_pid/get_pid_task/
1159026582 M * Bertl the readdir does return everything in xid=1
1159026592 M * Bertl so it must be a problem of the pid lookup
1159026597 M * [PUPPETS]Gonzo good evening ladies and gentlemen
1159026635 M * [PUPPETS]Gonzo is it possible to add more than one ip to a single device within the vserver-guest? how could I do this (or where could I find some docs?) - thanks
1159026654 M * daniel_hozac just specify another interface with the same device?
1159026664 M * Bertl [PUPPETS]Gonzo: hey!
1159026686 M * Bertl [PUPPETS]Gonzo: there are no devices for guests :)
1159026701 M * Bertl [PUPPETS]Gonzo: you can have as many as 16 ips per guest
1159026703 M * gdm Bertl: http://paste.linux-vserver.org/401
1159026710 M * [PUPPETS]Gonzo daniel_hozac: create another directory (1) in interfaces and set ip up? wouldn't I end having another (virtual) device within the guest?
1159026723 M * daniel_hozac only if you specify name.
1159026739 M * daniel_hozac but that's just an alias, not another device.
1159026748 M * [PUPPETS]Gonzo ok, thanks
1159026748 M * gdm Bertl, dkg, matti, that is the config from teh debian backports that we currently have adn should modify
1159026753 M * gdm Bertl: any advice?
1159026842 M * Bertl well, I'd try with a 'defconfig' and work from there
1159026856 M * Bertl the debian config includes everything plus the kitchen sink ...
1159026886 M * Bertl you do not want drivers for stuff you do not have, as they add to isntalbility
1159026991 M * [PUPPETS]Gonzo works fine, thanks a lot
1159028172 M * mnemoc hi, there port of 1.2.11-rc1 to 2.4.33(.3) ?
1159028422 J * meandtheshell ~markus@85-124-62-144.dynamic.xdsl-line.inode.at
1159028533 M * trippeh Are there any experimental patches for 2.6.18 available yet?
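The extra-IP setup [PUPPETS]Gonzo and daniel_hozac agree on above can be sketched as a description of which files to create. From the log: a numbered directory such as `1` under the guest's `interfaces` directory, an `ip` entry inside it, an optional `name` that creates an alias, and Bertl's limit of 16 IPs per guest. That each entry is a one-line file is an assumption of this sketch, and the function name and the example address are hypothetical.

```python
from typing import Dict, Optional

MAX_IPS_PER_GUEST = 16  # limit quoted by Bertl in the log


def interface_files(index: int, ip: str, name: Optional[str] = None) -> Dict[str, str]:
    """Map relative paths under a guest's config directory to file contents.

    Nothing is written to disk; the caller decides where the guest's
    configuration directory lives.
    """
    if not 0 <= index < MAX_IPS_PER_GUEST:
        raise ValueError("interface index outside the 16-address limit")
    files = {f"interfaces/{index}/ip": ip + "\n"}
    if name is not None:
        # Per the log, giving a name only creates an alias, not a new device.
        files[f"interfaces/{index}/name"] = name + "\n"
    return files


# Example with a documentation-range address, purely for illustration.
print(interface_files(1, "192.0.2.10"))
```

Returning a mapping rather than writing files keeps the sketch free of assumptions about where the configuration tree is installed, which differs between the hosts mentioned in the log.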
1159028670 M * Bertl http://vserver.13thfloor.at/Experimental/patch-2.6.18-vs2.0.2.1-t5.diff
1159028690 M * trippeh Thanks
1159028832 M * Bertl np
1159028978 Q * michal` Ping timeout: 480 seconds
1159029135 J * VxJasonxV ~jason@ip68-110-115-17.ph.ph.cox.net
1159029352 J * michal` ~michal@www.rsbac.org
1159029907 M * Bertl gdm: disable highmem, enable the 1/3 split
1159029932 M * Bertl gdm: disable sparse mem
1159029949 M * Bertl enable smt scheduler and irq balance
1159029960 M * Bertl (maybe enable regparm)
1159029969 M * Bertl set hz to 100
1159029998 M * Bertl disable power management and all drivers/components you do not have
1159030016 M * Bertl disable frequency scaling
1159030031 M * Bertl compile it for your cpu/hardware
1159030058 M * cehteh mhm
1159030086 M * Bertl gdm: a good start would be to check with lsmod what modules are loaded, you can then disable all options showing =m which are _not_ listed :
1159030379 M * Bertl http://vserver.13thfloor.at/Experimental/patch-2.6.18-vs2.0.2.1-t6.diff
1159030387 M * Bertl ensc: this should fix the issues for you
1159031428 M * gdm Bertl: we read you :)
1159031447 M * gdm Bertl: should be done with make menuconfig soon - will put .config up for you when it's done
1159031458 M * Bertl excellent!
1159031824 M * Bertl okay, off for now .. back later ...
1159031830 N * Bertl Bertl_oO
1159032488 Q * ComplexMind Remote host closed the connection
1159033325 Q * gerrit Ping timeout: 480 seconds
1159034049 J * gerrit ~gerrit@01153bhost130.starwoodbroadband.com
1159034983 Q * ruskie Remote host closed the connection
1159035111 M * gdm matti or Bertl_oO?
1159035141 M * daniel_hozac mnemoc: it doesn't apply cleanly?
1159035157 M * daniel_hozac mnemoc: and eyck is our 2.4 guy :) (i.e. the only one who has admitted to using it ;))
1159035170 J * ruskie ~ruskie@ruskie.user.oftc.net
1159035180 M * gdm Bertl_oO: we got confused over the mem split, weren't sure if you meant 1G kernel/3G user or 3G kernel/1G user.
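Bertl's lsmod tip above (find the loaded modules, then look at every option set to `=m`) can be started with the sketch below. It only collects the two lists: the names in the first column of `lsmod` output and the options marked `=m` in a `.config`. Matching one against the other is left to the reader, because option names and module names do not correspond mechanically. Both function names are hypothetical.

```python
from typing import List, Set


def loaded_modules(lsmod_text: str) -> Set[str]:
    """Module names from lsmod output: first column, header line skipped."""
    names = set()
    for line in lsmod_text.splitlines()[1:]:
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def modular_options(config_text: str) -> List[str]:
    """Names of the options set to =m in a kernel .config, in file order."""
    options = []
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # comments such as "# CONFIG_FOO is not set" are skipped
        key, _, value = line.partition("=")
        if value == "m":
            options.append(key)
    return options
```

In practice the inputs would be the captured output of `lsmod` on the running host and the text of the `.config` being trimmed. Every option in the second list with no counterpart in the first is a candidate for disabling, as Bertl suggests.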
1159035192 M * daniel_hozac how much RAM do you have? 1159035194 M * gdm if you could tell us, would be good again. thanks! 1159035210 M * gdm daniel_hozac: well, we have 5G but gonna change to 4G on Bertl_oO's advice 1159035219 M * gdm i.e. remove 2x512mb 1159035222 M * daniel_hozac ah, so you'll need HIGHMEM regardless. 1159035229 M * gdm well.... 1159035238 M * gdm < Bertl> gdm: disable highmem, enable the 1/3 split 1159035246 M * daniel_hozac i'd assume he meant 3G kernel/1G user then. 1159035247 M * gdm but reall?! 1159035272 M * daniel_hozac so you'll get at least 3 GiB of RAM. 1159035283 M * gdm i mean, does he really mean disable highmem if we have 4GB? 1159035332 M * daniel_hozac i guess he does. 1159035340 M * daniel_hozac since that's what he said :) 1159035366 M * gdm but then the other 3GB won't be used at all, will it? 1159035416 M * daniel_hozac hmmm, with a 3 G kernel/1G user split, you'd have access to 3 GiB. 1159035426 M * daniel_hozac so you'd only lose the one gibibyte. 1159036179 M * gdm ok, so here is our new .config file then, if anyone is able to look/advise 1159036181 M * gdm Bertl_oO: http://paste.linux-vserver.org/402 1159036381 Q * shedi Quit: Leaving 1159037551 M * waldi okay, 2.0.2.1-t6 can build powerpc64 1159037716 Q * comfrey Ping timeout: 480 seconds 1159038227 Q * meandtheshell Quit: exit (0); 1159038630 J * comfrey ~comfrey@h-64-105-215-75.sttnwaho.covad.net 1159038937 M * gdm waldi: hi 1159038948 M * gdm waldi: bertl said you were the person to speak to about debian kernels 1159038967 M * gdm waldi: but have just actually used the vanilla kernel to make our own 1159038982 M * gdm waldi: cos debian backports kernel "hangs" without explanation 1159040551 M * waldi s390 builds also 1159041441 Q * glut Read error: No route to host 1159041460 M * vasko Bertl_oO: hi, i've just tried Experimental/patch-2.6.18-vs2.0.2.1-t6 on amd64, the same config like seemlesly working 2.6.17.11-vs2.0.2-rc31, it didn't booted. 
over several reboots it got stuck at different points, the latest point was a note about going to do fscks... 1159041583 M * vasko i understand it is still not yet a release, but anyway it seems to be quite unstable to my amateur eye :) 1159041639 M * daniel_hozac are you sure you configured it correctly? 1159041650 M * daniel_hozac 2.6.18 might've added/removed/renamed options. 1159041670 M * trippeh Run diff on the configs. 1159041688 J * DreamerC_ ~dreamerc@61-224-132-191.dynamic.hinet.net 1159041953 Q * DreamerC Read error: Connection reset by peer 1159042058 N * Bertl_oO Bertl 1159042070 M * Bertl back for a moment ... 1159042089 M * Bertl vasko: did you try to boot with a vanilla kernel with the same config? 1159042125 M * Bertl i.e. just take the .config file from the 2.6.18-vs2* and do 'make oldconfig' in a vanilla tree 1159042253 M * Bertl gdm: looks pretty good at first glance 1159042298 M * dkg Bertl: we have the machine booted into a 2.6.17.13 kernel with the latest vserver patch with that config now. 1159042305 M * dkg and it runs :) 1159042331 M * daniel_hozac Bertl: hmm, fs/ext3/balloc.c:1180 is giving me warnings. 1159042368 M * dkg we can try loading it up any way you like, if it would help to test. 1159042370 M * daniel_hozac Bertl: i'd say they are legit, ext3_fsblk_t is unsigned long and __dl_adjust_block expects unsigned ints. 1159042477 M * Bertl was that changed recently? 1159042543 M * Bertl I remember seeing some new options regarding structure/counter sizes 1159042588 M * gdm Bertl: or we could change some of the config settings and modify the kernel somehow 1159042610 M * Bertl vasko: don't get me wrong, we really appreciate such input 1159042644 M * Bertl gdm: looks good to me, let's see how it does in action ... 1159042663 M * gdm Bertl: one other question....
in the bios, it gives an option called "Watch Dog Timer" which can be set to 2, 5, 10 or 15 minutes 1159042669 M * gdm but currently it is disabled 1159042685 M * Bertl could be a hardware watchdog 1159042696 M * gdm so that is ok to be disabled then? 1159042706 M * Bertl if you figure which type, then it might be usable in the future 1159042708 M * gdm we have nmi_watchdog set and also the sysrq works 1159042722 M * gdm yeah, i can do some research on the bios later 1159042740 M * gdm dkg says he will try to remove the memory tomorrow 1159042766 M * gdm and if you have suggestions (like restarting the tor server?) then we can do that too 1159042792 M * Bertl yes, I would suggest to stress test it a little 1159042807 M * gdm is it essential to remove the memory, do you think? 1159042823 M * gdm as that will mean two trips - one to remove and one to replace - that could otherwise be avoided? 1159042838 M * Bertl no, it's only one possible source of problems 1159042956 M * gdm ok. thank you. and thanks a lot for all the help today!! 1159042976 M * gdm i must go eat some food now, i will fix up the rest of the vservers and start some load etc after dinner 1159042989 M * Bertl you're welcome! hope it really helps ... 1159043007 M * Bertl enjoy your meal .. 1159043017 M * gdm i hope it helps too :) 1159043025 M * gdm but if not, i hope we can solve the problem together 1159043029 M * gdm and help everyone 1159043038 M * gdm hasta luego 1159043047 M * Bertl if it is a Linux-VServer issue, we will solve it :) 1159043056 M * gdm :) 1159043070 M * dkg thanks! 1159043731 M * Bertl okay, off again .. back later ... 1159043738 N * Bertl Bertl_oO 1159043874 M * vasko Bertl_oO: back, i've just finished with vanilla compilation, but cannot reboot until tomorrow.
i'll let you know 1159045287 Q * Greek0 Quit: Lost terminal 1159045600 J * Greek0 ~greek0@85.255.145.201 1159045766 J * mire ~mire@65-167-222-85.COOL.ADSL.VLine.verat.net 1159047836 Q * coocoon helium.oftc.net hydrogen.oftc.net 1159047836 Q * dna_ helium.oftc.net hydrogen.oftc.net 1159047836 Q * sladen helium.oftc.net hydrogen.oftc.net 1159047836 Q * node helium.oftc.net hydrogen.oftc.net 1159047836 Q * Curus helium.oftc.net hydrogen.oftc.net 1159047836 Q * Medivh helium.oftc.net hydrogen.oftc.net 1159047836 Q * ex helium.oftc.net hydrogen.oftc.net 1159047836 Q * Vudumen helium.oftc.net hydrogen.oftc.net 1159047836 Q * gdm helium.oftc.net hydrogen.oftc.net 1159047836 Q * bragon helium.oftc.net hydrogen.oftc.net 1159047836 Q * meebey helium.oftc.net hydrogen.oftc.net 1159047836 Q * SNy helium.oftc.net hydrogen.oftc.net 1159047836 Q * fs helium.oftc.net hydrogen.oftc.net 1159047836 Q * pusling helium.oftc.net hydrogen.oftc.net 1159047861 M * ensc Bertl_oO: vserver-stat works with -t6, vkill still fails 1159047862 M * ensc /usr/sbin/vkill -s INT --xid 141 -- 1 1159047862 M * ensc vkill: vc_ctx_kill(): No such process 1159048045 J * coocoon ~coocoon@p54A0536E.dip.t-dialin.net 1159048045 J * dna_ ~naucki@157-211-dsl.kielnet.net 1159048045 J * sladen paul@starsky.19inch.net 1159048045 J * node ~dwindsor@stanford.columbia.tresys.com 1159048045 J * Curus ~Curus@kbhn-vbrg-sr0-vl209-213-185-8-10.perspektivbredband.net 1159048045 J * gdm ~gdm@www.iteration.org 1159048045 J * Vudumen ~vudumen@perverz.hu 1159048045 J * fs fs@213.178.77.98 1159048045 J * ex ex@valis.net.pl 1159048045 J * meebey meebey@booster.qnetp.net 1159048045 J * pusling pusling@195.215.29.124 1159048045 J * bragon ~weechat@sd866.sivit.org 1159048045 J * SNy 6cfbac777d@bmx-chemnitz.de 1159048045 J * Medivh ck@paradise.by.the.dashboardlight.de 1159048146 Q * dna_ Quit: Verlassend 1159048176 M * daniel_hozac ensc: vserver debugging enabled? 
1159048253 M * daniel_hozac echo 16 > /proc/sys/vserver/debug_misc might shed some light on it. 1159049861 M * daniel_hozac ensc: does anything using vc_task_xid on the guest's init's pid work? 1159049924 M * daniel_hozac i.e. does vps show the context? 1159050096 N * Bertl_oO Bertl 1159050099 M * daniel_hozac ensc: and does /proc/virtual//info have the correct initpid? 1159050122 M * Bertl the problem is that the find_task_by_real_pid() doesn't work as expected in the current setup 1159050140 M * daniel_hozac oh? 1159050149 M * Bertl sec, I'll upload a trace :) 1159050162 M * daniel_hozac is this new for 2.6.18, or has this been a silent problem for a while? 1159050177 M * Bertl no idea, but I suspect it is new ... 1159050220 M * Bertl http://paste.linux-vserver.org/403 1159050249 M * Bertl added this debug line in signal.c ~58 1159050250 M * Bertl printk(" kill #%u,%d %p[#%u,%d]\n", vxi->vx_id, pid, p, p?p->xid:0, p?p->pid:0); 1159050340 M * Bertl note, the sleep is init in xid=100 and it has pid=34 1159050400 M * Bertl nope, it is the result of my changes ... 1159050420 M * daniel_hozac ah? 1159050423 M * Bertl the thing is, the find_task_by_real_pid() uses the pid_task() 1159050437 M * Bertl which is modified to limit the pids 1159050453 M * daniel_hozac ah, of course. 1159050493 M * Bertl maybe we can use this to completely work around modifying find_task_by_pid() 1159050529 M * Bertl the only thing we need to change when we want to address all pids 1159050546 M * Bertl is to move the process into the spectator context 1159050578 M * Bertl (or to pass some flag down the callchain) 1159050589 M * Bertl now the pid_task() has pid, and the type 1159050606 M * Bertl the pid is out of question here, but the type could be interesting 1159050615 M * daniel_hozac agreed. 1159050721 M * Bertl I wonder if adding VX_ADMIN would harm/break anything atm? 1159050734 M * Bertl (to the check in pid_task()) 1159050781 M * daniel_hozac we still have checks elsewhere, don't we?
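The checks daniel_hozac suggests for the vkill failure can be collected into one rough sketch. Assumptions to note: the helper name `init_pid_from_info` is made up for illustration, and the info file is assumed to contain a line of the form `Init: <pid>`, which should be verified against the running kernel before relying on it. The path to the info file is passed in as an argument rather than guessed here.

```sh
#!/bin/sh
# Sketch only: the helper name and the assumed "Init: <pid>" line format
# are hypothetical and should be checked against the actual kernel output.

# Print the init pid recorded in the context info file given as $1.
init_pid_from_info() {
    awk '$1 == "Init:" { print $2; exit }' "$1"
}

# Example use on the host as root (commented out so sourcing this file
# has no side effects):
#   echo 16 > /proc/sys/vserver/debug_misc   # debug setting suggested in the log
#   init_pid_from_info "$info_file"          # $info_file: the guest's info file under /proc/virtual
#   /usr/sbin/vkill -s INT --xid 141 -- 1    # the failing command reported by ensc
```

If the pid printed here differs from the pid of the guest's init as seen from the host, that would point at the bookkeeping rather than at vkill itself. In this log the cause turned out to be the pid lookup in the kernel patch.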
1159050803 M * Bertl yes, it would basically allow signalling and such from the host context, I guess 1159050820 M * Bertl (checking now) 1159050871 Q * bonbons Quit: Leaving 1159050900 M * Bertl looks good here, i.e. ps looks quite normal on xid=0 1159050928 M * Bertl signalling works fine too 1159050953 M * daniel_hozac ok, sounds good then. 1159050963 M * Bertl ensc: could you test this modification if it works for you? 1159050978 M * Bertl ensc: would you prefer a patch or a delta? 1159051222 M * Bertl ensc: well, here you go: http://vserver.13thfloor.at/Experimental/patch-2.6.18-vs2.0.2.1-t7.diff, http://vserver.13thfloor.at/Experimental/delta-vkill-fix01.diff 1159051323 M * Bertl but I think we definitely should look into getting rid of the _real_pid() part in devel (maybe even stable) soon 1159051353 M * daniel_hozac hmm, wouldn't _real_pid still be a handy shortcut for the new pidtype, or whichever way it's solved? 1159051394 M * Bertl yes, indeed, but we might get around modifying the original macro 1159051412 M * daniel_hozac right, _real_pid would just be an additional one. 1159051439 M * Bertl and external modules would be 'auto' limited to the context they are called in 1159051443 M * daniel_hozac would solve all those pesky vx_rmap_pid: undefined reference errors once and for all :) 1159051469 M * Bertl precisely 1159051473 M * daniel_hozac yep, sounds good. 1159052059 M * gdm Bertl: all is started up again. feel free to check the munin graphs again or something in a few hours maybe. but i'm off to bed now. so back in 9+ hours. thanks again for all the help!! :) 1159052192 M * dkg i'm heading off also. thanks for the clear thinking and explanations, y'all. it's good to learn. 1159052199 Q * dkg Quit: bye! 
1159052206 M * Bertl np 1159052720 Q * Johnnie Remote host closed the connection 1159054602 J * shedi ~siggi@inferno.lhi.is 1159055350 J * matti_ matti@linux.gentoo.pl 1159055369 Q * matti Read error: Connection reset by peer 1159055373 N * matti_ matti