1213920126 Q * yarihm Quit: Leaving
1213922596 N * Guest862 Genghis
1213922632 N * Genghis Guest868
1213923492 Q * arapaho Ping timeout: 480 seconds
1213923682 J * arapaho ~arapaho@LAubervilliers-151-12-45-14.w80-14.abo.wanadoo.fr
1213924074 J * doener_ ~doener@i577B97A5.versanet.de
1213924175 Q * doener Ping timeout: 480 seconds
1213926286 N * Guest868 Genghis
1213926322 N * Genghis Guest878
1213929976 N * Guest878 Genghis
1213930012 N * Genghis Guest891
1213930820 Q * xdr_ Ping timeout: 480 seconds
1213933082 Q * derjohn_mob Ping timeout: 480 seconds
1213933666 N * Guest891 Genghis
1213933702 N * Genghis Guest898
1213936383 Q * dddd osmotic.oftc.net resistance.oftc.net
1213936383 Q * fatgoose_ osmotic.oftc.net resistance.oftc.net
1213936383 Q * nenolod osmotic.oftc.net resistance.oftc.net
1213936383 Q * Hunger osmotic.oftc.net resistance.oftc.net
1213936383 Q * Hollow osmotic.oftc.net resistance.oftc.net
1213936383 Q * quasisane osmotic.oftc.net resistance.oftc.net
1213936383 Q * MooingLemur osmotic.oftc.net resistance.oftc.net
1213936383 Q * nkukard osmotic.oftc.net resistance.oftc.net
1213936383 Q * AndrewLee osmotic.oftc.net resistance.oftc.net
1213936383 Q * laptopnenolod osmotic.oftc.net resistance.oftc.net
1213936383 Q * tam osmotic.oftc.net resistance.oftc.net
1213936383 Q * awk osmotic.oftc.net resistance.oftc.net
1213936383 Q * micah osmotic.oftc.net resistance.oftc.net
1213936398 J * dddd ~matthew@scorpion.sorbs.net
1213936666 J * cryptronic ~oli@p54A3B3AC.dip0.t-ipconnect.de
1213937356 N * Guest898 Genghis
1213937388 N * Genghis Guest904
1213938606 J * fatgoose_ ~samuel@76-10-149-199.dsl.teksavvy.com
1213938606 J * nenolod ~nenolod@ip70-189-74-62.ok.ok.cox.net
1213938606 J * Hunger ~Hunger.hu@213.163.11.138
1213938606 J * Hollow ~hollow@proteus.croup.de
1213938606 J * quasisane ~sanep@c-75-68-59-175.hsd1.nh.comcast.net
1213938606 J * MooingLemur ~troy@shells195.pinchaser.com
1213938661 Q * cryptronic Quit: Leaving.
1213939282 Q * bronson Read error: Connection reset by peer
1213939307 J * nkukard ~nkukard@196.212.73.74
1213939307 J * AndrewLee ~andrew@flat.iis.sinica.edu.tw
1213939307 J * laptopnenolod ~nenolod@ip70-189-74-62.ok.ok.cox.net
1213939307 J * tam ~tam@gw.nettam.com
1213939307 J * awk ~awk@security.web.za
1213939307 J * micah ~micah@micah.riseup.net
1213939323 J * bronson ~bronson@adsl-68-122-117-135.dsl.pltn13.pacbell.net
1213939342 Q * bronson
1213941046 N * Guest904 Genghis
1213941086 N * Genghis Guest910
1213941350 J * derjohn_mob ~aj@e180206148.adsl.alicedsl.de
1213942298 J * yarihm ~yarihm@whitehead2.nine.ch
1213944357 J * FireEgl FireEgl@adsl-159-182-181.bhm.bellsouth.net
1213944612 J * larsivi ~larsivi@85.221.53.194
1213944701 J * ktwilight ~ktwilight@218.119-66-87.adsl-dyn.isp.belgacom.be
1213944737 N * Guest910 Genghis
1213944776 N * Genghis Guest913
1213945100 Q * ktwilight_ Ping timeout: 480 seconds
1213945978 J * meandtheshell ~sa@d91-129-28-167.cust.tele2.at
1213946007 Q * Aiken Remote host closed the connection
1213946538 J * pisco ~pisco@tor.noreply.org
1213946648 J * dna ~dna@107-243-dsl.kielnet.net
1213947152 J * bfremon ben@lns-bzn-57-82-249-11-231.adsl.proxad.net
1213947326 Q * pisco Quit: leaving
1213947333 Q * ktwilight Read error: Connection reset by peer
1213947392 J * ktwilight ~ktwilight@87.66.206.234
1213948426 N * Guest913 Genghis
1213948466 N * Genghis Guest917
1213949093 Q * meandtheshell Quit: Leaving.
1213949098 J * kir ~kir@swsoft-msk-nat.sw.ru
1213949328 N * DoberMann[ZZZzzz] DoberMann
1213949402 J * meandtheshell ~sa@d91-129-28-167.cust.tele2.at
1213952116 N * Guest917 Genghis
1213952156 N * Genghis Guest923
1213952712 N * Bertl_zZ Bertl
1213952716 M * Bertl morning folks!
1213952847 M * kwowt yo:p
1213953713 M * padde hi
1213953728 J * Aiken ~james@ppp59-167-113-120.lns3.bne4.internode.on.net
1213953880 M * padde i have an ldap server (openldap) running on one vserver (db1) and an MTA (postfix) on another vserver (mail1), both on the same host. now when one of the vservers is very busy (usually mail1), and a mail comes in, the ldap directory doesn't answer quickly enough when postfix wants to know if the receiver is really a local user or not, and the mail gets rejected with "Temporary lookup failure"
1213953912 M * padde how would I configure the vservers so this can't happen (or at least is less likely)?
1213954077 M * blathijs Configure postfix with a larger timeout? :-)
1213954094 M * blathijs What is "not quickly enough" currently?
1213954102 M * Bertl well, I'd first investigate _why_ the other guest doesn't answer within time
1213954119 M * Bertl (or more precisely, the ldap server in that other guest)
1213954284 J * pisco ~pisco@tor.noreply.org
1213954377 J * Mojo1978 ~Mojo1978@ip-78-94-98-56.hsi.ish.de
1213954511 M * padde blathijs: i raised this from 15 to 30 seconds already...
1213954582 M * padde Bertl: well, that's definitely part of the question ;) for example just now this happened while i unpacked a 1G tar archive on mail1
1213954596 M * blathijs Ah, if we're talking about seconds, then raising the timeout is not a solution :-)
1213954610 Q * yarihm Quit: This computer has gone to sleep
1213954619 M * blathijs Is it a large LDAP database?
1213954644 M * padde blathijs: no, it's ridiculously small. 45 users
1213954658 M * blathijs then it's unlikely that it's an I/O problem I guess
1213954660 M * padde blathijs: (and it's not distributed, all local)
1213954677 M * blathijs perhaps changing the scheduler on the host system could make a difference?
1213954688 M * padde i also find it to be very unlikely. but it just happens. happened 4 times already (in 4 months)
1213954696 M * Bertl padde: what kernel version and util-vserver?
1213954740 M * padde 2.6.22.19-vs2.3.0.34.1 #1 SMP Mon Mar 17 05:32:04 EDT 2008 i686 i686 i386 GNU/Linux (centos 5 rpm from daniel's repo)
1213954763 M * padde util-vserver 0.30.215
1213954836 M * Bertl okay, you said, it is related to larger I/O operations?
1213954858 M * Bertl how much memory do you have on the host?
1213954872 M * padde Bertl: i think so. just now i unpacked a big tar. it wasn't compressed, so mainly i/o, no cpu load...
1213954898 M * padde Bertl: 2 GB + 2 GB swap
1213954919 M * Bertl have to run right now ... but should be back in 30min, try to use the CFQ I/O scheduler
1213954922 M * padde Bertl: storage is on a software RAID 5, consisting of 4 hdds
1213954931 N * Bertl Bertl_oO
1213954933 M * padde Bertl_oO: thanks
1213955143 Q * kir Quit: Leaving.
1213955806 N * Guest923 Genghis
1213955846 N * Genghis Guest928
1213955892 J * yarihm ~yarihm@gw.ptr-80-238-203-84.customer.ch.netstream.com
1213955896 J * bfremon1 ben@lns-bzn-22-82-249-77-86.adsl.proxad.net
1213956124 Q * Aiken Remote host closed the connection
1213956195 Q * bfremon Ping timeout: 480 seconds
1213956695 J * Aiken ~james@ppp59-167-113-120.lns3.bne4.internode.on.net
1213957052 N * Bertl_oO Bertl
1213957059 M * Bertl back now ...
1213957111 M * Bertl padde: you can also try to increase the I/O priority for the ldap process, if the timing is really that critical
1213957146 M * Bertl padde: also make sure that the ldap guest has enough tokens to be able to run
1213957163 M * Bertl (if you are using the hard/priority cpu scheduler)
1213957166 M * padde what do you mean by 'tokens'?
1213957188 M * padde actually i don't even know which scheduler i'm using. i left everything at the defaults
1213957459 M * Bertl okay, then no worries, you are not using any scheduler restrictions then
1213957478 M * Bertl you know how to set the I/O scheduler on the host?
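A note on the CFQ suggestion above: on 2.6 kernels the I/O schedulers available for a block device are listed in /sys/block/DEVICE/queue/scheduler, with the active one in square brackets, and writing a scheduler name to that file (as root) switches it. The sketch below only illustrates that mechanism; the function names and the device name in the usage note are assumptions, not something stated in the log.

```python
def parse_schedulers(text):
    """Split the contents of a queue/scheduler file into (active, available)."""
    active = None
    available = []
    for name in text.split():
        if name.startswith("[") and name.endswith("]"):
            name = name[1:-1]  # the bracketed entry is the active scheduler
            active = name
        available.append(name)
    return active, available


def scheduler_path(device):
    """Path of the sysfs scheduler file for a block device name such as 'sda'."""
    return "/sys/block/%s/queue/scheduler" % device


def set_scheduler(device, name):
    """Switch the scheduler for one device; needs root on the host."""
    path = scheduler_path(device)
    with open(path) as f:
        _, available = parse_schedulers(f.read())
    if name not in available:
        raise ValueError("%s is not offered for %s: %s" % (name, device, available))
    with open(path, "w") as f:
        f.write(name)
```

With a software RAID like the one described, the scheduler would typically be set on each member disk, so a call such as set_scheduler("sda", "cfq") would be repeated per disk (the device names here are hypothetical).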
1213957640 M * padde Bertl: you mean in the kernel?
1213957650 M * padde Bertl: via proc i guess... i did that before on my desktop
1213957674 M * padde Bertl: i'll look into it on monday, have to catch a train now. thanks for your time and suggestions for now! :)
1213957692 M * Bertl you're welcome! cya!
1213958895 Q * yarihm Ping timeout: 480 seconds
1213958968 J * yarihm ~yarihm@gw.ptr-80-238-203-84.customer.ch.netstream.com
1213959058 M * Bertl nap attack ... off for now, bbl
1213959070 N * Bertl Bertl_zZ
1213959496 N * Guest928 Genghis
1213959536 N * Genghis Guest932
1213959735 Q * yarihm Ping timeout: 480 seconds
1213960147 Q * Mojo1978 Read error: Connection reset by peer
1213960977 Q * pisco Ping timeout: 480 seconds
1213960982 J * pisco ~pisco@tor.noreply.org
1213961652 Q * pisco Ping timeout: 480 seconds
1213961785 Q * laptopnenolod Ping timeout: 480 seconds
1213962099 J * yarihm ~yarihm@gw.ptr-80-238-203-84.customer.ch.netstream.com
1213962141 Q * larsivi Quit: Konversation terminated!
1213962335 Q * yarihm
1213963130 J * Hawq ~hawk@limanowa.net
1213963136 M * Hawq hello
1213963187 N * Guest932 Genghis
1213963226 N * Genghis Guest941
1213963228 J * laptopnenolod ~nenolod@ip70-189-74-62.ok.ok.cox.net
1213964454 J * loddafni1 ~mike@193.170.48.107
1213964526 J * pisco ~pisco@tor.noreply.org
1213964753 P * Hawq
1213964924 Q * pisco Remote host closed the connection
1213965454 Q * Aiken Remote host closed the connection
1213965544 Q * FireEgl Quit: Leaving...
1213965703 J * yarihm ~yarihm@whitehead2.nine.ch
1213966134 Q * loddafni1 Quit: Leaving.
1213966513 Q * dna Quit: Verlassend
1213966877 N * Guest941 Genghis
1213966916 N * Genghis Guest946
1213968758 J * cryptronic ~oli@p54A3B3AC.dip0.t-ipconnect.de
1213970266 Q * cryptronic Quit: Leaving.
1213970567 N * Guest946 Genghis
1213970606 N * Genghis Guest951
1213973852 J * dowdle ~dowdle@scott.coe.montana.edu
1213973891 M * nkukard is there something similar to vserver-stat that i can use to get more parsable stats for cacti graphing?
1213974257 N * Guest951 Genghis
1213974296 N * Genghis Guest953
1213975046 Q * balbir Ping timeout: 480 seconds
1213975106 M * daniel_hozac nkukard: i'd use the API directly.
1213975156 M * nkukard oooo
1213975161 M * nkukard thanks for the tip daniel_hozac
1213975671 J * loddafnir ~mike@193.170.48.107
1213975750 J * balbir ~balbir@122.167.180.64
1213977948 N * Guest953 Genghis
1213977986 N * Genghis Guest957
1213978354 N * Bertl_zZ Bertl_oO
1213981637 N * Guest957 Genghis
1213981676 N * Genghis Guest966
1213982263 Q * micah Remote host closed the connection
1213982309 J * micah ~micah@micah.riseup.net
1213982657 J * larsivi ~larsivi@169.80-202-217.nextgentel.com
1213982742 J * mrfree ~mrfree@host59-183-dynamic.27-79-r.retail.telecomitalia.it
1213985327 N * Guest966 Genghis
1213985366 N * Genghis Guest976
1213986203 Q * yarihm Quit: This computer has gone to sleep
1213986650 Q * bfremon1 Ping timeout: 480 seconds
1213989017 N * Guest976 Genghis
1213989056 N * Genghis Guest985
1213990295 Q * derjohn_mob Remote host closed the connection
1213991046 Q * meandtheshell Quit: Leaving.
1213992522 J * bonbons ~bonbons@2001:960:7ab:0:2c0:9fff:fe2d:39d
1213992707 N * Guest985 Genghis
1213992746 N * Genghis Guest992
1213993710 J * bfremon ben@lns-bzn-22-82-249-77-86.adsl.proxad.net
1213995565 Q * mrfree Quit: Leaving
1213996107 Q * bonbons Quit: Leaving
1213996307 J * Aiken ~james@ppp59-167-113-120.lns3.bne4.internode.on.net
1213996397 N * Guest992 Genghis
1213996436 N * Genghis Guest999
1213997077 Q * fatgoose_ Quit: fatgoose_
1213997932 Q * larsivi Ping timeout: 480 seconds
1214000087 N * Guest999 Genghis
1214000126 N * Genghis Guest1006
1214000599 M * micah has anyone been running into this new openssh proc oom_adj thing in a vserver?
1214000612 M * micah sshd: error writing /proc/self/oom_adj: Operation not permitted
1214000633 M * micah obviously its not permitting in a guest, but its annoying that openssh is trying to adjust a proc value like that because it causes log errors
1214000642 M * micah so I'm wondering how to turn that off
1214000721 M * daniel_hozac i don't see a way to turn it off.
1214000730 M * daniel_hozac might make more sense to ask for the output to be redirected to /dev/null though.
1214000770 M * micah it would be nice if it could be configurable in some way
1214000775 M * Bertl_oO we could ignore writes to oom_adj or actually permit them (based on flags)
1214000797 M * micah or even better... ignore them unless they are permitted based on flags?
1214000835 M * micah daniel_hozac: how are you thinking would be a good way to ask for the output to be redirected to /dev/null?
1214000841 M * Bertl_oO well, I'm not really sure that applications setting such kind of kernel attributes are a good idea in the first place
1214000846 M * daniel_hozac file a bug against openssh-server?
1214000873 M * daniel_hozac it looks like you could set /etc/default/ssh too.
1214000874 M * micah daniel_hozac: oh I was going to do that, but I mean... do you think by default it should be redirected to /dev/null?
1214000878 M * Bertl_oO micah: how long will it take till _every_ application sets the adjustments to _not_ be killed?
1214000885 M * micah Bertl_oO: seriously :)
1214000901 M * micah well not killing ssh does seem like at least a reasonable one to nice the oom_adj
1214000917 M * daniel_hozac micah: i.e. setting SSHD_OOM_ADJUST=0 in /etc/default/ssh should "fix" it.
1214000922 M * Bertl_oO sure, definitely, but is the application itself the one to decide that?
1214000945 M * micah Bertl_oO: yeah, its a convenience setting I think
1214000953 M * Bertl_oO similar goes for 'nice' and 'ionice', btw
1214000975 M * micah do a lot of applications set nice/ionice?
1214001008 M * Bertl_oO I home not, except for pam stuff
1214001011 M * Bertl_oO *hope
1214001034 M * Bertl_oO but to me it feels like sshd is overdoing it there
1214001048 M * micah daniel_hozac: I dont think openssh even has an /etc/default file, why do you say "it looks like you could set /etc/default/ssh"? or do you mean that as a suggested fix
1214001048 Q * Aiken Remote host closed the connection
1214001049 M * Bertl_oO no problem with startup scripts or so, adjusting that value
1214001062 Q * mick_home Remote host closed the connection
1214001098 M * micah Bertl_oO: the problem with doing this in an initscript for ssh is that will end up immortalizing child processes of ssh as well
1214001134 M * Bertl_oO shouldn't that be handled by the privilege separation part?
1214001151 M * Bertl_oO btw, I'm not sure the oom stuff is inherited
1214001166 M * micah looks like this is actually the path that was taken: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=480020
1214001178 M * micah Bertl_oO: it is, see the above url
1214001222 M * Bertl_oO well, looks to me like papering over the issue :)
1214001223 M * micah hmmm there is an environment variable?
1214001228 M * micah https://bugzilla.mindrot.org/attachment.cgi?id=1507
1214001240 M * micah +const char *oom_adj = getenv("SSHD_OOM_ADJUST");
1214001245 M * micah +if (!oom_adj)
1214001254 M * micah which is what daniel_hozac was suggesting above :)
1214001408 J * Aiken ~james@ppp59-167-113-120.lns3.bne4.internode.on.net
1214001916 M * daniel_hozac sshd itself is doing it now?
1214001934 M * daniel_hozac in my somewhat dated lenny guest, it's in the initscript.
1214002087 J * infowolfe_ ~infowolfe@c-67-160-167-96.hsd1.or.comcast.net
1214002087 Q * infowolfe Read error: Connection reset by peer
1214002243 M * daniel_hozac micah: do you know if there's a way to tell dpkg to always keep a local file, ignoring the new one?
1214002264 M * daniel_hozac effectively telling dpkg "don't mind this file, i'm taking care of it"?
1214002444 M * micah daniel_hozac: yeah, see the url I posted above, the initscript mechanism didn't work, so it was pulled into sshd... also there appears to be a /etc/default/ssh setting for this, but even setting it to 0 doesn't work
1214002493 M * micah daniel_hozac: a file, not a package? hmm
1214002518 M * micah you might be able to do that with a dpkg-divert
1214002797 Q * Aiken Remote host closed the connection
1214002841 M * daniel_hozac micah: huh, EPERM isn't the error i'd expect... i guess you need CAP_SYS_ADMIN or similar to set it on a process != self.
1214002850 M * micah daniel_hozac: hah
1214002854 M * micah export SSHD_OOM_ADJUST=-17
1214002859 M * micah is in the initscript
1214002863 M * daniel_hozac ... geez.
1214002869 M * micah yeah, will be filing a bug now
1214002875 M * daniel_hozac after the sourcing of /etc/default/ssh?
1214002901 M * micah right before
1214002922 M * micah hmm, I would expect the default/ssh to override it then
1214002932 M * daniel_hozac yeah.
1214002954 M * daniel_hozac you could always do ps fauxe and inspect sshd's environment.
1214002956 J * Aiken ~james@ppp59-167-113-120.lns3.bne4.internode.on.net
1214002984 M * micah if I remove that export from the initscript, and unset it in /etc/default/ssh it works
1214003034 M * micah odd
1214003035 M * daniel_hozac that's expected, it ignores it completely then.
1214003049 M * micah yeah
1214003101 M * micah I wonder why that happens
1214003118 M * micah probably ssh is scrubbing its environment somehow? or the /etc/default setting needs to be exported
1214003174 M * micah yeah both
1214003253 J * meandtheshell1 ~sa@d91-129-28-167.cust.tele2.at
1214003257 M * daniel_hozac well, it seems to be set correctly for me.
1214003265 M * daniel_hozac i still get the error though :)
1214003327 M * micah to solve it you either have to:
1214003332 M * micah a. remove the export from the initscript
1214003338 M * micah b. export the value in /etc/default/ssh
1214003364 M * daniel_hozac hmm? i didn't do either, and sshd has the correct value.
1214003369 M * micah hm
1214003402 M * micah it seems like the only way to make the error go away is to remove the export in the initscript then?
1214003449 M * daniel_hozac well, i'd like to know why it fails in the first place.
1214003500 M * micah hmm, now I cant get it to stop giving me the error
1214003582 M * micah the only way I can get it to stop giving me the error is if I comment out the export in the initscript, and in the default file
1214003598 M * daniel_hozac right, since it's not doing anything then.
1214003662 N * DoberMann DoberMann[ZZZzzz]
1214003716 M * micah or is it because the mere fact of the environment variable existing that makes the if (!oom_adj) fail?
1214003726 M * daniel_hozac right.
1214003726 M * micah if it is set to 0, -17 or even nothing
1214003740 M * daniel_hozac it needs to be non-existent.
1214003765 M * micah yeah
1214003766 M * daniel_hozac 0+ _should_ work though, assuming the shell you start sshd from isn't adjusted.
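A note on the conclusion just reached: the quoted patch tests the pointer returned by getenv, so only a variable that is absent from the environment skips the adjustment, while one set to 0, -17 or an empty string does not. The Python sketch below mirrors that distinction. It is an illustration rather than the sshd code: the function name is hypothetical, and it assumes a write to oom_adj is attempted whenever the variable exists, which the log implies but does not show.

```python
def attempts_oom_adjust(environ):
    """Return True if a write to oom_adj would be attempted.

    Mirrors the `if (!oom_adj)` test from the quoted patch: getenv()
    returns NULL only when the variable is missing, so an empty value
    still counts as "set".
    """
    value = environ.get("SSHD_OOM_ADJUST")  # None plays the role of NULL
    return value is not None
```

Passing os.environ (or a plain dict) shows why commenting out the export is the only thing that silenced the error: attempts_oom_adjust({"SSHD_OOM_ADJUST": ""}) is still true.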
1214003777 N * Guest1006 Genghis
1214003790 M * daniel_hozac i'm trying to figure out why it doesn't...
1214003816 N * Genghis Guest1014
1214004842 M * daniel_hozac micah: hmm.
1214004884 M * daniel_hozac it seems it's just _really_ strict about what options you open it with... O_CREAT and O_TRUNC are absolutely forbidden.
1214004886 M * micah daniel_hozac: i reported the bug
1214004903 M * daniel_hozac O_WRONLY lets me write 0\n to it.
1214004931 M * micah weird
1214005262 M * daniel_hozac ... and now it seems to work.
1214005287 M * daniel_hozac i.e. no error, with SSHD_OOM_ADJUST=0 in /etc/default/ssh...
1214005320 M * micah errr
1214005341 M * micah daniel_hozac: you have the latest testing one installed?
1214005347 M * daniel_hozac yep.
1214005357 M * daniel_hozac 1:4.7p1-12
1214005416 M * micah if I set mine to 0 and restart ssh i get the error
1214005439 M * daniel_hozac really?
1214005443 M * micah yes
1214005443 M * daniel_hozac could you strace that?
1214005460 M * daniel_hozac and just to make sure, cat /proc/self/oom_adj outputs 0?
1214005465 M * micah do you have the initscript export commented out?
1214005479 M * daniel_hozac no.
1214005514 M * micah yes, /proc/self/oom_adj is 0
1214005551 M * micah trying to think of how to do the strace in the initscript
1214005565 M * daniel_hozac just strace the whole thing.
1214005567 M * micah so that I dont lose environment
1214005574 M * micah oh strace /etc/init.d/ssh restart?
1214005575 M * daniel_hozac i.e. strace -fF -o sshd.strace /etc/init.d/rsetart
1214005588 M * micah ok
1214005593 M * daniel_hozac umm... with ssh, and a proper spelling of restart :)
1214005629 M * micah heh
1214005741 M * micah daniel_hozac: if you want to look at it: http://micah.riseup.net/sshd.strace
1214005747 M * micah I have to step out for a bit
1214005901 M * daniel_hozac micah: what kernel are you on?
1214005914 M * micah daniel_hozac: 2.6.18
1214005927 M * micah its an etch host, with a testing guest
1214005956 M * daniel_hozac ah. that's your problem :-)
1214006025 M * daniel_hozac 2.6.20 removed the unconditional capable(CAP_SYS_RESOURCE) check.
1214006115 M * micah hmmm, i dont really follow you
1214006181 M * daniel_hozac in kernels less than 2.6.20, CAP_SYS_RESOURCE is required to modify the oom_adj value at all.
1214006196 M * daniel_hozac with newer kernels, it's just needed to lower the value.
1214006232 M * Bertl_oO maybe something to backport?
1214006235 M * daniel_hozac (which is why it works for me, and not for you)
1214006254 M * daniel_hozac anywho, the bug is still legit.
1214006262 M * daniel_hozac the initscript shouldn't make it impossible to unset it.
1214006318 M * micah daniel_hozac: ahhh, nie
1214006321 M * micah nice even
1214006339 M * micah daniel_hozac: would you care to follow-up to #487325 with that information? otherwise I can tomorrow
1214006380 M * micah (I think having more voices in a bug sometimes draws the maintainer's attention towards resolution faster :)
1214006388 M * daniel_hozac hehe
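A note on the rule daniel_hozac describes above: before 2.6.20 any write to oom_adj needs CAP_SYS_RESOURCE, while from 2.6.20 on the capability is only needed to lower the value. The Python sketch below models just that rule as stated in the log. It is a simplified illustration with a hypothetical function name, and it ignores any other permission checks the kernel performs.

```python
def may_write_oom_adj(kernel, old, new, has_cap_sys_resource):
    """Model of the oom_adj write rule described in the log.

    kernel is a version tuple such as (2, 6, 18); old and new are the
    current and requested oom_adj values.
    """
    if has_cap_sys_resource:
        return True
    if kernel < (2, 6, 20):
        # Older kernels check the capability unconditionally.
        return False
    # Newer kernels only require the capability to lower the value.
    return new >= old
```

Under this model a guest without the capability cannot write 0 over 0 on micah's 2.6.18 host, yet the same write succeeds on a 2.6.22 host, and lowering the value to -17 is refused on both.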