1396310401 Q * fisted Remote host closed the connection 1396310412 J * fisted ~fisted@xdsl-87-78-189-245.netcologne.de 1396312181 Q * N3mesis1 Remote host closed the connection 1396313763 Q * Aiken Ping timeout: 480 seconds 1396315557 J * Aiken ~Aiken@2001:44b8:2168:1000:21f:d0ff:fed6:d63f 1396316976 N * l0kit Guest5097 1396316981 J * l0kit ~1oxT@0001b54e.user.oftc.net 1396317383 Q * Guest5097 Ping timeout: 480 seconds 1396328691 Q * SteeleNivenson_ Read error: Operation timed out 1396329051 Q * SteeleNivenson Ping timeout: 480 seconds 1396329088 J * SteeleNivenson ~SteeleNiv@pool-108-29-139-222.nycmny.fios.verizon.net 1396329101 J * SteeleNivenson_ ~SteeleNiv@pool-108-29-139-222.nycmny.fios.verizon.net 1396329899 N * Bertl_zZ Bertl 1396329913 M * Bertl morning folks! 1396330220 J * fisted_ ~fisted@xdsl-87-78-141-95.netcologne.de 1396330662 Q * fisted Ping timeout: 480 seconds 1396330663 N * fisted_ fisted 1396333013 M * hijacker morning 1396333452 J * Ghislain ~aqueos@adsl1.aqueos.com 1396334798 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:8873:750:665b:f777 1396341560 J * beng_ ~BenG@cpc29-aztw22-2-0-cust128.18-1.cable.virginm.net 1396349683 Q * xdr Ping timeout: 480 seconds 1396349883 J * xdr ~xdr@h56n6-aahm-a11.ias.bredband.telia.com 1396350464 N * Roomster Romster 1396350470 Q * Romster Quit: Geeks shall inherit properties and methods of object earth. 1396350485 J * Romster ~Romster@202.168.100.149.dynamic.rev.eftel.com 1396350562 M * Bertl off for now .. 
bbl 1396350573 N * Bertl Bertl_oO 1396350738 Q * ircuser-1 Ping timeout: 480 seconds 1396351001 Q * thierryp Remote host closed the connection 1396353076 Q * Aiken Remote host closed the connection 1396353281 J * ircuser-1 ~ircuser-1@35.222-62-69.ftth.swbr.surewest.net 1396353601 Q * fisted Remote host closed the connection 1396353612 J * fisted ~fisted@xdsl-87-78-141-95.netcologne.de 1396354006 J * thierryp ~thierry@62.200.30.45 1396354035 Q * thierryp Remote host closed the connection 1396357115 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:c49d:996b:d5f4:7367 1396357691 J * n ~n@s0.servercrunch.com 1396357711 M * n Hi guys 1396357726 M * n I want to setup memory limits with cgroups 1396357760 M * n I'm running 3.13.6-vs2.3.6.11-beng, util-vserver 0.30.216-pre3054-1 1396357777 M * n First issue that I have is when executing vserver-stat 1396357781 M * n I get errors messages 1396357795 M * n open(memory.stat): No such file or directory 1396357819 M * beng_ did you ask about this on the mailing list? 1396358182 M * n beng_: it was part of one of my emails, btw you responded to it :) 1396358190 M * n I just thought it would be faster to ask here 1396358204 M * n You suggested booting with 3.10 kernel 1396358208 M * n I will try to do that 1396359033 Q * xdr Ping timeout: 480 seconds 1396360126 J * xdr ~xdr@h56n6-aahm-a11.ias.bredband.telia.com 1396360168 Q * SteeleNivenson_ Quit: Leaving 1396360178 J * SteeleNivenson_ ~SteeleNiv@pool-108-29-139-222.nycmny.fios.verizon.net 1396360222 Q * fisted Read error: Connection reset by peer 1396360263 J * fisted ~fisted@xdsl-87-78-141-95.netcologne.de 1396360455 Q * n Quit: leaving 1396360484 J * n ~n@s0.servercrunch.com 1396362233 M * Bertl_oO n: sounds like a setup problem 1396362238 N * Bertl_oO Bertl 1396362323 M * beng_ n, if you unmount your cgroupfs then run "service util-vserver start" what happens? 1396362341 M * beng_ I made a few suggestions on the mailing list n 1396362348 M * beng_ did you have a go? 
1396363232 M * beng_ I just got your email responses n 1396363444 M * n Bertl: probably 1396363448 M * n What how can I debug this 1396363461 M * n beng_: cgroupfs is unmounted 1396363467 M * n actually booted with 3.10 1396363471 M * daniel_hozac what util-vserver package are you running? 1396363478 M * daniel_hozac are you sure it's not from Debian? 1396363486 M * daniel_hozac because that initscript only does like a third of what's needed. 1396363494 M * n when i run service util-vserver start it does not mount cgroupfs 1396363503 M * daniel_hozac which suggests it's the Debian one. 1396363505 M * n daniel_hozac: let me double check that 1396363733 M * n daniel_hozac: you were right, i was not running the package from repo.psand.net 1396363796 M * Bertl now you owe beng_ an apology :) 1396363823 M * n i'm rebooting now, so we will see if it will work 1396363829 M * n Bertl: probably yes 1396363832 M * beng_ no need n 1396363844 M * beng_ just run "service util-vserver start" 1396363853 M * beng_ no need to reboot I mean 1396363865 M * beng_ I would accept and apology though :) 1396363865 M * n i need to check why the package from psand repo was upgraded with the one from debian 1396363909 M * beng_ only some apt-pinning would have done that n 1396363922 M * Bertl didn't debian promise to drop the packages :) 1396363968 M * n ok, seems like the limits are working now 1396363983 M * n beng_: i do appologize if i offended you in any way :) 1396364024 M * beng_ Bertl, no Debian promised to drop the kernel packages 1396364043 M * beng_ they've update the util-vserver packages for Debian Jessie 1396364051 M * beng_ https://packages.debian.org/jessie/util-vserver 1396364097 M * Bertl *sigh* 1396364175 M * beng_ lol 1396364182 M * beng_ I'll give them a try at some point 1396364247 M * beng_ ah, now those packages would get installed over the top of the ones from repo.psand.net, I'll make sure I increment the version number when I start packaging for Jessie 1396364289 M 
* n beng_: i think is what happened, because i'm running sid, the package overwritten by the one from debian 1396364312 M * Bertl do they still do 0.31 packages or so? 1396364324 M * beng_ ah, you said you where running wheezy on the mailing list n 1396364360 M * beng_ Bertl - Package: util-vserver (0.30.216-pre3054-1) 1396364372 M * n "I'm on Debian wheezy with sid" 1396364379 M * beng_ that's the Debian Jessie version of util-vserver 1396364387 M * n i think i wrote it wrong 1396364397 M * beng_ n, that's a bit crazy 1396364403 M * n yes :) 1396364407 M * n so i am running sid 1396364419 M * n my bad, i will clarify everything on the list 1396364431 M * beng_ if you run sid, you should also have jessie in the sources.list, and not wheezy 1396364481 M * beng_ a sid wheezy/jessie combination will not be fun 1396364521 M * beng_ sorry, a sid wheezy combination will not be fun 1396364524 M * n currently i have wheezy / sid 1396364534 M * n so this needs to be fixed 1396364538 M * n on my end 1396364538 M * beng_ well don't ;P 1396364556 M * n i will do that later then :) 1396364568 M * n thanks guys for your help, i appreciate it! 1396364593 M * n i will also switch back to 3.13 1396364602 M * beng_ for util-vserver in Debian Jessie, systemd will probably do the cgroup mounting 1396364618 M * beng_ perhaps try that n 1396364629 M * beng_ and submit bug reports to Debian if it doesn't work 1396364764 M * n for now i will stick what you provide on psand 1396364779 M * n it worked for me for years so 1396364875 M * n when you will increment the package version the issue will be gone :) 1396364940 M * daniel_hozac you could update to 3060 1396364955 M * n daniel_hozac: true :) 1396365042 M * beng_ ah, I hadn't noticed 3060 1396365049 M * beng_ I'll compiled that soon 1396365049 M * daniel_hozac but yes, does dpkg have something like rpm's epoch? you might want to set that. 
1396365094 M * beng_ packages can be pinned, held and so on, but I don't know what 'epoch' is 1396365121 M * daniel_hozac epoch is basically a "i don't care what versions you got, i am best!" as far as version comparisons go. 1396365189 M * beng_ held would be the equivalent 1396365205 M * daniel_hozac no, that's on the client side. 1396365208 M * daniel_hozac epoch is in the package. 1396365216 M * beng_ ah right 1396365246 M * beng_ no, I would supersede the version number to create a similar situation 1396365249 M * daniel_hozac ensuring that the Debian package is never considered an "upgrade". 1396365281 M * beng_ i think that that could be achieved by "pinning" also 1396365302 M * beng_ ah, no, that's client-side also 1396365322 M * daniel_hozac looks like epochs are in dpkg too. 1396365580 M * beng_ it appears that they are part of a package's version number. Is this how it works in RPM too? 1396365704 M * beng_ anyhow, the packages in repo.psand.net aren't prepared for jessie yet, when they are I will make sure they take precedence over the Debian ones 1396368330 J * bonbons ~bonbons@2001:a18:209:4501:918a:46bd:1160:66bb 1396371614 M * daniel_hozac essentially, it's a separate field in RPM. but the version for all intents and purposes is [epoch:]version-release in RPM too.
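Note on the epoch discussion above: a minimal shell sketch of why an epoch makes a package outrank one with a numerically higher upstream version, assuming the `[epoch:]version-release` layout daniel_hozac describes. The function names `epoch_of` and `epoch_wins` and both version strings are hypothetical, chosen only for illustration. The sketch looks at the epoch alone and is not a full version comparison.

```shell
#!/bin/sh
# Extract the epoch from a version string of the form "[epoch:]version-release".
# A version without an explicit epoch is treated as epoch 0.
epoch_of() {
    case "$1" in
        *:*) printf '%s\n' "${1%%:*}" ;;
        *)   printf '0\n' ;;
    esac
}

# Succeed when the first version's epoch is strictly higher than the second's.
# Equal epochs would still need a full version comparison, which this sketch omits.
epoch_wins() {
    [ "$(epoch_of "$1")" -gt "$(epoch_of "$2")" ]
}

# Hypothetical version strings, for illustration only.
ours="1:0.30.216-pre3054-1"
theirs="0.30.216-pre3060-1"

if epoch_wins "$ours" "$theirs"; then
    echo "$ours outranks $theirs on epoch alone"
fi
```

On a Debian system the real comparison can be checked with `dpkg --compare-versions <ver1> gt <ver2>`, which returns success when the relation holds.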
1396372106 J * zerick ~eocrospom@190.187.21.53 1396372162 Q * zerick Read error: Connection reset by peer 1396372174 J * zerick ~eocrospom@190.187.21.53 1396372245 Q * zerick Read error: Connection reset by peer 1396372262 J * zerick ~eocrospom@190.187.21.53 1396373633 J * fisted_ ~fisted@xdsl-87-78-187-244.netcologne.de 1396373691 J * benl ~benl@dockoffice.sonassihosting.com 1396373702 M * benl Hey troops 1396373728 M * benl @Bertl - remember the file descriptor limit issue I saw in a guest a few weeks ago 1396373734 M * benl Its reared its head again 1396373777 M * benl FILES: 494868 0/ 494868 -1/ -1 0 1396373794 M * benl I've not restarted the guest yet, so what is your suggestion to trace the open files? 1396373855 M * Bertl so this shows that the files are actually allocated and not freed 1396373857 M * benl There are limits applied to the guest now. So trying to enter via `vserver _x_ enter` no longer works 1396373860 M * benl vlogin: openpty(): File table overflow 1396373884 M * Bertl yeah, increse them slightly and enter, then dump a list of processes 1396373892 M * Bertl (or use vps) 1396373902 M * benl How can you increase the limit on-the-fly 1396373920 M * Bertl look for processes in the 'D' state or zombies 'Z' 1396373931 M * Bertl just use vps instead, it's probably safer 1396373953 M * benl [/]$ vps ax | grep -E " [D|Z] " 1396373953 M * benl 2922 0 MAIN ? 
Z 0:00 [watchdog] 1396373974 M * benl Nothing out of the ordinary 1396373975 M * Bertl yeah, okay, that's on the host, so not an issue 1396374069 M * Bertl list all the processes of this specific guest, and iterate over them to check (from the spectator context) how many file descriptors they have 1396374074 Q * fisted Ping timeout: 480 seconds 1396374076 N * fisted_ fisted 1396374084 M * Bertl I still suspect those will be sockets, but just to make sure 1396374118 M * benl Don't suppose you could give me a few pointers how to list the guest processes without being able to use "vserver x enter" 1396374149 M * benl nm 1396374152 M * benl I'm being thick. 1396374266 M * benl For anyone reading. 1396374273 M * benl vps axu | grep -E "[a-z-]+[\t ]+[0-9]+[\t ]+__XID__" 1396374277 M * benl replace __XID__ as necessary 1396374369 J * derjohn_mob ~aj@tmo-106-84.customers.d1-online.com 1396374736 M * benl Ok 1396374739 M * benl Wrote a quick script 1396374740 M * benl TOTAL_FILES=0 1396374740 M * benl for PID in $(vps axu | grep -E "[a-z-]+[\t ]+[0-9]+[\t ]+101" | awk '{print $2}'); do 1396374740 M * benl FILE_COUNT=$(chcontext --xid 1 bash -c "ls /proc/$PID/fd | wc -l") 1396374740 M * benl TOTAL_FILES=$(( TOTAL_FILES + FILE_COUNT )) 1396374741 M * benl done 1396374742 M * benl echo $TOTAL_FILES 1396374758 M * benl Result is fairly insignificant 1396374759 M * benl 284 1396374803 Q * derjohn_mob Read error: Connection reset by peer 1396374863 M * Bertl okay, kind of expected 1396374872 J * derjohn_mob ~aj@tmo-106-84.customers.d1-online.com 1396374904 M * benl And 1396374905 M * benl chcontext --xid 1 bash -c "cat /proc/net/* 2>/dev/null | wc -l" 1396374910 M * benl Results in 577 1396374951 M * Bertl dump the /proc//net/sockstat for the guest processes 1396374958 M * Bertl and upload the output somewhere 1396375076 M * benl For yours/other ref. 1396375083 M * benl . 
1396375083 M * benl TOTAL_SOCKETS=0 1396375083 M * benl for PID in $(vps axu | grep -E "[a-z-]+[\t ]+[0-9]+[\t ]+101" | awk '{print $2}'); do 1396375083 M * benl SOCKETS=$(chcontext --xid 1 cat /proc/4944/net/sockstat | awk 'NR==1{print $3}') 1396375084 M * benl TOTAL_SOCKETS=$(( TOTAL_SOCKETS + SOCKETS )) 1396375084 M * benl done 1396375085 M * benl echo $TOTAL_SOCKETS 1396375091 M * benl Results in 15540 1396375097 M * benl I'll post a more verbose output shortly 1396375132 M * Bertl the orphan/tw count might be interesting 1396375158 M * Bertl the used count is probably the normal usage 1396375201 M * benl Oddly the sockstat's are the same for all PIDs in the guest? 1396375240 M * Bertl hmm, okay, forget it, wrong file 1396375265 M * Bertl I remember a per pid file, this obviously isn't it 1396375385 M * benl Ok 1396375389 M * benl For your ref. http://paste.linux-vserver.org/58643 1396375404 M * benl nothing near the 512k limit set on the guest :\ 1396375587 M * benl What next? 1396375654 M * Bertl good question, it looks like something in the kernel is 'losing' sockets (or at least misplacing them) 1396375667 M * Bertl or maybe just file descriptors 1396375675 M * benl FYI. 1396375675 M * benl 3.9.5-vs2.3.6.5-beng 1396375684 M * benl 0.30.216-pre3038 1396375689 M * Bertl yeah, it might help to update that and recheck 1396375725 M * Bertl one option is to use the debug framework to drop a conditional message for every allocated and freed filehandle 1396375754 M * Bertl and then see what gets allocated but never freed 1396375761 M * benl [~]$ cat /boot/config-$(uname -r) | grep CONFIG_VSERVER_DEBUG 1396375761 M * benl # CONFIG_VSERVER_DEBUG is not set 1396375770 M * benl Out of luck here! 1396375780 M * Bertl yeah, not sure it has the proper hooks anyway 1396375788 M * Bertl i.e. would probably need a new kernel compile 1396375867 M * benl Not sure how to proceed from here! 
1396375888 M * benl Is there no way to inform the kernel the file descriptors aren't in use? 1396375900 M * benl Where is the funky accounting coming from to begin with 1396376032 Q * derjohn_mob Read error: Connection reset by peer 1396376152 J * derjohn_mob ~aj@tmo-106-84.customers.d1-online.com 1396376815 M * Bertl the funky accounting is done by Linux-VServer 1396376836 M * benl :( 1396376842 M * Bertl i.e. it keeps count whenever a kernel structure is allocated and/or freed 1396376858 M * Bertl now there are two possibilities here 1396376875 M * Bertl a) we missed a place where it is freed 1396376881 M * Bertl b) it is never freed 1396376896 M * benl Ok 1396376900 M * Bertl in case a) it's not a big deal, just the accounting/limit is wrong 1396376912 M * Bertl no memory is wasted for that 1396376924 M * benl But in a) it does mean that the host limit still needs to continually be increased 1396376929 M * Bertl in case b) there is a problem, but most likely outside Linux-VServer 1396376961 M * Bertl no, the host accounting is mainline, i.e. not Linux-VServer related/patched 1396376993 M * Bertl we just hook into the alloc/free places to do our own accounting per guest 1396376996 M * benl But the host limit also goes up 1396377004 M * benl And stays up - despite the guest being restarted 1396377017 M * benl s/host limit/host max count/g 1396377024 M * Bertl which suggests that it is b 1396377092 M * benl So how do you trace this exactly? 1396377100 M * benl Given no normal tools are showing anything 1396377109 M * benl (lsof etc.) 
1396377132 M * Bertl as I said, you can instrument the alloc/free and run some analysis over the collected data 1396377202 M * Bertl personally, if that was my server, I'd first move the suspicious guest to a different host running the 'latest and greatest' with some debug output enabled/added 1396377217 M * Bertl then see if the issue happens again and if so, analyze the output 1396377356 M * benl Its already happening 1396377365 M * benl A guest restart begins the process all over again 1396377513 J * derjohn_mobi ~aj@tmo-106-84.customers.d1-online.com 1396377852 M * Bertl I can probably whip up a patch for the instrumentation part 1396377914 Q * derjohn_mob Ping timeout: 480 seconds 1396378143 Q * thierryp Remote host closed the connection 1396378412 Q * beng_ Quit: I Leave 1396378686 Q * derjohn_mobi Remote host closed the connection 1396379245 M * benl @Bertl 1396379251 M * benl I've been graphing socket usage by guest 1396379261 M * benl and that never increased beyond ~300 1396379266 M * benl So I doubt it was sockets 1396379405 M * benl oddly OFD always stays normal 1396379418 M * benl but FILES is what continually grows 1396379980 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:e1cb:dcc:4a0c:c406 1396380298 Q * benl Quit: HydraIRC -> http://www.hydrairc.com <- Organize your IRC 1396380463 J * thierryp_ ~thierry@home.parmentelat.net 1396380465 Q * thierryp Ping timeout: 480 seconds 1396381917 Q * zerick Read error: Connection reset by peer 1396381949 J * zerick ~eocrospom@190.187.21.53 1396383552 J * Aiken ~Aiken@2001:44b8:2168:1000:21f:d0ff:fed6:d63f 1396385847 Q * bonbons Quit: Leaving 1396386358 J * N3mesis1 ~N3mesis@659AAISHB.tor-irc.dnsbl.oftc.net 1396390408 Q * N3mesis1 Remote host closed the connection 1396392351 Q * Ghislain Quit: Leaving. 1396394159 M * Bertl off to bed now ... have a good one everyone! 
1396394167 N * Bertl Bertl_zZ 1396396205 N * l0kit Guest5188 1396396211 J * l0kit ~1oxT@0001b54e.user.oftc.net 1396396613 Q * Guest5188 Ping timeout: 480 seconds
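Note on the descriptor counting earlier in this log: benl's loop sums the entries under `/proc/$PID/fd` for every process of the guest. The sketch below isolates that summing step as a reusable function. It is not the script from the log: it takes the PIDs as arguments instead of parsing `vps axu`, it does not wrap the read in `chcontext --xid 1`, and `PROC_ROOT` and `total_fds` are hypothetical names added so the function can be pointed at a test directory.

```shell
#!/bin/sh
# Root of the proc filesystem; overridable (hypothetical knob) for dry runs.
PROC_ROOT="${PROC_ROOT:-/proc}"

# Sum the number of open file descriptors over the PIDs given as arguments.
total_fds() {
    total=0
    for pid in "$@"; do
        fd_dir="$PROC_ROOT/$pid/fd"
        # Skip processes that have exited or whose fd directory is missing.
        [ -d "$fd_dir" ] || continue
        count=$(ls -A "$fd_dir" 2>/dev/null | wc -l)
        total=$((total + count))
    done
    echo "$total"
}

# Example: descriptors currently held by this shell.
total_fds "$$"
```

As the log shows, the per-process total (284) stayed far below the guest's FILES counter, so a sum like this only rules out descriptors held by live processes. It does not locate the ones the kernel accounting still counts.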