1379656073 N * Bertl_zZ Bertl
1379656076 M * Bertl morning folks!
1379657877 J * Ghislain ~aqueos@adsl1.aqueos.com
1379657997 M * Bertl off for now ... bbl
1379658001 N * Bertl Bertl_oO
1379663806 Q * hparker Quit: I've fallen off the 'net and can't get up
1379663826 J * hparker ~hparker@0000fb24.user.oftc.net
1379674821 Q * ircuser-1 Ping timeout: 480 seconds
1379677343 J * ircuser-1 ~ircuser-1@35.222-62-69.ftth.swbr.surewest.net
1379688343 Q * thierryp Remote host closed the connection
1379688642 J * thierryp ~thierry@home.parmentelat.net
1379688906 Q * thierryp Remote host closed the connection
1379696911 J * thierryp ~thierry@lns-bzn-47f-62-147-212-202.adsl.proxad.net
1379697493 Q * thierryp Remote host closed the connection
1379701714 J * thierryp ~thierry@lns-bzn-47f-62-147-212-202.adsl.proxad.net
1379702196 Q * thierryp Ping timeout: 480 seconds
1379706294 J * bonbons ~bonbons@2001:a18:20f:4601:e8c1:9485:a64f:356a
1379706559 J * hijacker_ ~hijacker@cable-84-43-134-121.mnet.bg
1379708055 Q * hijacker_ Quit: Leaving
1379708098 N * l0kit Guest7296
1379708106 J * l0kit ~1oxT@0001b54e.user.oftc.net
1379708501 Q * Guest7296 Ping timeout: 480 seconds
1379708568 J * thierryp ~thierry@lns-bzn-47f-62-147-212-202.adsl.proxad.net
1379708672 Q * bonbons Quit: Leaving
1379712094 Q * rawplayer Ping timeout: 480 seconds
1379712154 J * rawplayer ~xyzzy@shell.students.os3.nl
1379714567 Q * sannes Remote host closed the connection
1379714723 J * pistache ~nikolay@carlingue.pstch.net
1379714731 M * pistache Hi
1379714779 M * pistache Some hours after boot of my vServer guest, I get many errors related to flock() returning ENOMEM
1379714796 M * pistache From what I know, flock() should not be returning ENOMEM (undocumented error), and I have lots of memory available
1379714843 M * pistache I get this error when postfix is running and receiving mails, usually after the 15th mail. Then, postfix must be force killed, and won't restart anymore because it can't call flock() on the needed files (such as alias.db)
1379714918 M * pistache If I run, for example, newaliases, I get : lock /etc/aliases.db: Cannot allocate memory. Same if I manually set a lock using the flock command, or if I try to restart rsyslogd (can't lock its pid-file)
1379714932 M * Bertl_oO kernel/patch/util-vserver version?
1379714992 M * pistache Linux 2.6.36 x86_64
1379715020 M * pistache util-vserver 0.30.216-pre2864 (debian squeeze)
1379715065 M * Bertl_oO 2.6.36.4 with vs2.3.0.36.39?
1379715078 M * pistache 36.38, but yes
1379715094 M * pistache well, more precisely 2.6.36.1-vs2.3.0.36.38.build1-cti
1379715149 Q * thierryp Remote host closed the connection
1379715299 M * Bertl_oO you might actually be out of kernel memory
1379715307 M * Bertl_oO what does /proc/meminfo show?
1379715313 M * Bertl_oO (please use paste.linux-vserver.org for everything longer than 3 lines)
1379715378 M * pistache here it is http://paste.linux-vserver.org/25156
1379715386 M * pistache thanks a lot for your help, Bertl_oO
1379715506 M * Bertl_oO np, what is the actual syscall failing?
1379715520 M * Bertl_oO can you try with strace -fF ?
1379715654 M * pistache open("/etc/aliases.db", O_RDWR) = 5
1379715665 M * pistache flock(5, LOCK_EX) = -1 ENOMEM (Cannot allocate memory)
1379715681 M * pistache I don't see any other interesting lines in the strace, but I'll paste it anyway since I'm not an expert
1379715835 M * pistache the error above is the one from postfix's newaliases, here is the one from "flock 1" : http://paste.linux-vserver.org/25157
1379715860 M * Bertl_oO okay, sec, checking the kernel source
1379715929 M * Bertl_oO well, despite the manual saying ENOMEM is undocumented, it comes kind of natural
1379715947 M * Bertl_oO fs/locks.c does fl = locks_alloc_lock();
1379715963 M * Bertl_oO and if that returns NULL, it returns -ENOMEM
1379715989 M * Bertl_oO now that is where Linux-VServer comes into play
1379715989 M * pistache but why wouldn't the kernel allocate a lock ?
1379715992 M * pistache ah
1379715994 J * carpoon ~carpoon@carpoon.hu
1379716004 M * Bertl_oO either kmem_cache_alloc() fails
1379716017 M * Bertl_oO or, which is more likely, this check hits:
1379716024 M * Bertl_oO if (!vx_locks_avail(1))
1379716041 M * Bertl_oO which means, that you somehow ran out of the available locks
1379716046 M * pistache yes
1379716050 M * pistache I thought about this
1379716053 M * Bertl_oO let's check /proc/virtual//limits
1379716060 M * pistache ok
1379716097 M * pistache I do not have /proc/virtual
1379716108 M * Bertl_oO the xid is the context number of your guest
1379716125 M * Bertl_oO (as seen in vserver-stat for example)
1379716133 M * pistache I do not have access to the host
1379716150 M * Bertl_oO hmm, okay, that's a problem then
1379716170 M * pistache well I'll contact them with these infos, you made me make some progress
1379716173 M * Bertl_oO I presume, the admin has set a certain lock limit which you reached
1379716173 M * pistache just a thing :
1379716177 M * pistache yes yes
1379716180 M * pistache but I thought of it
1379716197 M * Bertl_oO the issue could be that you actually used up the locks, or that the accounting is buggy
1379716201 M * pistache and I set up locks myself using (flock 9; sleep 1000) 9> test
1379716215 M * pistache I managed to get as high as 500 locks, then stopped
1379716239 M * pistache when I test with postfix, it never gets higher than 30 locks, however there are lots of locks/unlocks
1379716260 M * pistache would that mean that as you said, the accounting is buggy ?
1379716260 M * Bertl_oO which would suggest an accounting error
1379716273 M * pistache in the host ?
1379716286 M * Bertl_oO it's possible with older kernels/patches, the admin can easily check
1379716291 M * pistache and would that be a configuration error, a bad build, or a source code bug ?
1379716292 M * pistache ok
1379716309 M * Bertl_oO most likely a kernel patch bug
1379716322 M * pistache do you agree if I paste this conversation to the vserver admin ? (I can strip out your name if needed)
1379716394 M * Bertl_oO yes please, go ahead, and you can leave my name there for contact
1379716424 M * Bertl_oO i.e. either Bertl or Bertl_oO/Bertl_zZ
1379716433 M * pistache Okay
1379716509 M * pistache Thanks a lot, Bertl_oO, you really enlightened me. You can't imagine how depressed I was today, I lost all self-confidence about my sysadmin skills, and was banging my head on the walls because of this. Have a nice day/night, maybe I'll come again if I need more information.
1379716524 M * pistache Thanks a lot again, your help was greatly appreciated
1379716552 M * Bertl_oO you're welcome! note that the problem can still be caused by your userspace
1379716572 M * pistache because of too frequent lock/unlocks, right ? or even something else ?
1379716581 M * Bertl_oO (i.e. recursive locks or processes hanging on to locks or locks not getting 'unlocked')
1379716599 M * pistache but wouldn't that make the list in /proc/locks increase ?
1379716606 M * Bertl_oO although a limit of 500 locks might be a little low in general
1379716618 M * pistache it keeps really low, even when the problem is happening (never more than 21, yes 21, locks)
1379716624 M * Bertl_oO (but that's up to the admin/provider)
1379716770 M * pistache ok, thanks again :)
1379716775 M * pistache i'll keep that in mind
1379716977 M * Bertl_oO np
1379719371 J * thierryp ~thierry@lns-bzn-47f-62-147-212-202.adsl.proxad.net
1379719856 Q * thierryp Ping timeout: 480 seconds
1379720164 Q * Ghislain Quit: Leaving.
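
Editor's note, not part of the log: the two checks discussed above (does flock() fail with ENOMEM, and how many locks does /proc/locks show at that moment) can be sketched in Python roughly as follows. The helper names try_flock and count_locks and the probe file path are hypothetical. Reading ENOMEM as a sign of an exhausted per-guest lock limit is only what Bertl_oO suggests for this patched kernel, so treat that interpretation as an assumption elsewhere.

```python
import errno
import fcntl
import os
import tempfile


def try_flock(path):
    """Try an exclusive, non-blocking flock() on path.

    Returns "ok", "busy" (lock held elsewhere) or "enomem".
    Any other OSError is re-raised unchanged.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as exc:
            if exc.errno == errno.ENOMEM:
                # The case seen in the strace output in the log.
                return "enomem"
            if exc.errno in (errno.EWOULDBLOCK, errno.EAGAIN):
                return "busy"
            raise
        fcntl.flock(fd, fcntl.LOCK_UN)
        return "ok"
    finally:
        os.close(fd)


def count_locks(text):
    """Count held locks in /proc/locks-style text.

    Lines whose second field is "->" describe blocked waiters,
    not held locks, so they are skipped.
    """
    count = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] != "->":
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical probe file; any writable path would do.
    probe = os.path.join(tempfile.gettempdir(), "flock-probe")
    print("flock:", try_flock(probe))
    try:
        with open("/proc/locks") as fh:
            print("locks visible:", count_locks(fh.read()))
    except OSError:
        print("/proc/locks is not readable here")
```

Run periodically inside the guest while postfix handles mail, this would show whether flock() starts returning ENOMEM while the visible lock count stays low, which is the pattern pistache describes. The limit itself can only be inspected on the host.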