1303171343 J * ichavero_ ~ichavero@189.231.8.242
1303171523 J * ichavero__ ~ichavero@189.155.149.105
1303171724 Q * imcsk8 Ping timeout: 480 seconds
1303171901 Q * ichavero_ Ping timeout: 480 seconds
1303172340 Q * ichavero__ Quit: This computer has gone to sleep
1303175392 Q * PowerKe Ping timeout: 480 seconds
1303175964 J * PowerKe ~tom@94-226-192-17.access.telenet.be
1303176506 J * ichavero__ ~ichavero@189.155.149.105
1303178042 Q * ichavero__ Quit: This computer has gone to sleep
1303182806 J * imcsk8 ~ichavero@148.229.9.250
1303184212 Q * FireEgl Quit: Leaving...
1303186238 J * FireEgl FireEgl@2001:470:e056:1:58dd:8216:6981:1a3d
1303187747 J * manana ~mayday090@84.17.25.149
1303187827 Q * imcsk8 Quit: This computer has gone to sleep
1303187914 J * imcsk8 ~ichavero@148.229.9.250
1303189183 N * Bertl_zZ Bertl
1303189187 M * Bertl morning folks!
1303189670 Q * manana Remote host closed the connection
1303190076 J * derjohn_mob ~aj@d025137.adsl.hansenet.de
1303191576 J * darkhawk ~darkhawk@shell.offlinehoster.de
1303191695 Q * darkhawk_ Ping timeout: 480 seconds
1303194780 Q * derjohn_mob Ping timeout: 480 seconds
1303194928 M * daniel_hozac Bertl: chcontext doesn't wait for context destruction, so in some ways i suppose those traces are expected.
1303194940 M * daniel_hozac although the half-destroyed context is somewhat concerning.
1303195032 M * Bertl yeah, we should make sure that they are either there or gone ... but I'm not sure if userspace or kernel space is to blame here
1303195098 J * ghislain ~AQUEOS@adsl2.aqueos.com
1303195283 M * daniel_hozac hmm
1303195362 M * daniel_hozac shouldn't __lookup_vx_info check for VXS_SHUTDOWN?
1303195400 M * daniel_hozac and maybe unhash_vx_info call __shutdown with the lock?
1303195522 M * daniel_hozac (or unhash before shutdown
1303195548 M * daniel_hozac (which seems more sensical to me)
1303195573 M * Bertl we cannot call shutdown with the lock held
1303195600 M * daniel_hozac shouldn't we unhash before shutdown though?
1303195644 M * daniel_hozac i.e. make it unreachable before we start making it unusable
1303195668 M * Bertl yes, that should be possible and makes sense to me
1303195698 J * petzsch ~markus@dslb-088-075-165-242.pools.arcor-ip.net
1303195772 M * Bertl at least I don't see any reason why the __shutdown() should happen before unhashing
1303195818 M * Bertl even if we end up creating a new context (with the same id) while the __shutdown() is still in progress, we won't interfere as it isn't hashed anymore
1303195830 M * daniel_hozac right
1303195866 M * Bertl so let's see what happens if we move the shutdown after the unhash :^)
1303195875 M * daniel_hozac exciting! :)
1303195940 M * Bertl the interesting part is that this seems to be the first case where this race condition shows up ... so I wonder what the variables are which trigger this
1303195973 M * daniel_hozac well
1303195981 M * Bertl i.e. I couldn't recreate the issue neither in kvm nor on real hardware here
1303195986 M * daniel_hozac i have seen a few race conditions over the years
1303195990 M * daniel_hozac particularly with rpm-fake
1303196004 M * daniel_hozac that i have been unable to explain with userspace
1303196036 M * daniel_hozac this could explain all of those.
1303196038 M * Bertl well, it boots and passes testme.sh here, so it can't be that bad :)
1303196041 M * daniel_hozac sure
1303196052 M * daniel_hozac it's something that very rarely happens
1303196055 M * Bertl no, I meant the new change :)
1303196057 M * daniel_hozac ah
1303196059 M * daniel_hozac hehe
1303196067 M * daniel_hozac sounds promising
1303196075 J * derjohn_mob ~aj@213.238.45.2
1303196086 M * Bertl so, we'll see what happens when frank gets around for some testing ....
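[editor's note] The ordering discussed above (make the context unreachable first, then tear it down without holding the lock) can be sketched in userspace Python. This is only an illustration of the idea, not the Linux-VServer kernel code: `ContextTable`, `create`, `lookup`, `destroy` and `_shutdown` are hypothetical names, and raising `FileExistsError` for a duplicate id is an assumption made for the sketch.

```python
import threading


class ContextTable:
    """Toy lookup table keyed by context id (hypothetical, for illustration)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_id = {}

    def create(self, ctx_id):
        with self._lock:
            if ctx_id in self._by_id:
                # Assumption: a still-hashed id is reported as busy.
                raise FileExistsError(f"context {ctx_id} is busy")
            ctx = {"id": ctx_id, "state": "active"}
            self._by_id[ctx_id] = ctx
            return ctx

    def lookup(self, ctx_id):
        with self._lock:
            return self._by_id.get(ctx_id)

    def destroy(self, ctx_id):
        # Step 1: unhash under the lock, so no new lookup can find it.
        with self._lock:
            ctx = self._by_id.pop(ctx_id, None)
        # Step 2: shut down outside the lock; it may be slow or may block.
        if ctx is not None:
            self._shutdown(ctx)
        return ctx

    @staticmethod
    def _shutdown(ctx):
        ctx["state"] = "shutdown"
```

With this ordering a lookup never returns a half-destroyed entry, and a new context with the same id can be created while the old one is still shutting down, because the old one is no longer in the table.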
1303196108 M * daniel_hozac yeah...
1303196111 M * Bertl it's quite interesting that he can trigger the issue with a single bash line
1303196121 M * daniel_hozac yeah
1303196124 M * daniel_hozac that's convenient.
1303196133 M * daniel_hozac i've never seen it be that easy before
1303196134 M * daniel_hozac heh
1303196160 M * daniel_hozac maybe his kernels have/lack some debug option?
1303196207 M * Bertl I've been running the line in a while sleep 0.1; do since I went to bed .. not a single failure there
1303196216 M * daniel_hozac interesting.
1303196234 M * daniel_hozac maybe run them in parallel?
1303196236 M * Bertl it might be related to scheduler settings though
1303196242 M * daniel_hozac yeah...
1303196273 M * Bertl yes, when I run them in parallel I can trigger this (or similar)
1303196311 M * daniel_hozac even with the changed kernel?
1303196321 M * Bertl well, the messages are different I think
1303196326 M * Bertl vcontext: vc_ctx_migrate(): No such process
1303196331 M * Bertl vcontext: vc_ctx_create(): Device or resource busy
1303196333 M * Bertl vs
1303196341 M * Bertl vspace: vc_enter_namespace(): No such process
1303196347 M * daniel_hozac right...
1303196387 M * daniel_hozac i suppose that is kind of expected.
1303196405 M * daniel_hozac STATE_SETUP should return EBUSY.
1303196475 M * Bertl where?
1303196488 M * daniel_hozac for migrate
1303196494 M * daniel_hozac sorry, create.
1303196518 M * Bertl ah, you are confirming the output, not commenting on a bug :)
1303196530 M * daniel_hozac yes :)
1303196541 M * Bertl got it, was confused for a moment ...
1303198832 Q * imcsk8 Quit: This computer has gone to sleep
1303201554 J * frank\ d5d3efb2@ircip4.mibbit.com
1303201563 M * frank\ good morning!
1303201600 M * frank\ Bertl: I'm at work now - if you need me to compile something with heavy debug.
1303201688 M * Bertl we'll try a simple patch first, if that's okay with you :)
1303201733 M * Bertl http://vserver.13thfloor.at/ExperimentalT/delta-unhash-fix01.diff
1303201890 M * frank\ with my plain .config as usual - or anything with more debug?
1303201901 M * Bertl same setup as before
1303202153 M * frank\ ok - build started (2.6.38.3 + patch-2.6.38.3-vs2.3.0.37-rc14.diff + delta-unhash-fix01.diff)
1303202166 M * Bertl great! thanks!
1303202926 M * frank\ build done - rebooting into this kernel
1303203676 M * frank\ sorry - was distracted...
1303203677 M * frank\ =)
1303203697 M * frank\ did reboot - new kernel is up and running with patches as stated above ---> failure is gone
1303203736 M * Bertl okay, so we consider this issue fixed then :)
1303203820 M * frank\ nice - thanks a lot!
1303203832 M * Bertl no problem, thanks for reporting and testing!
1303203901 M * frank\ will there be rc15 vserver patch anytime soon today then?
1303203939 J * thierryp ~thierry@lns-bzn-47f-62-147-212-202.adsl.proxad.net
1303203947 M * Bertl maybe, but I wouldn't count on it, it's still a minor issue
1303203959 M * frank\ kk
1303203963 M * frank\ ;)
1303203972 Q * thierryp Remote host closed the connection
1303205066 Q * frank\ Quit: http://www.mibbit.com ajax IRC Client
1303205150 J * bsingh ~balbir@121.245.0.235
1303212689 J * BenG ~bengreen@cpc12-aztw24-2-0-cust146.aztw.cable.virginmedia.com
1303213349 M * Bertl nap attack ... bbl
1303213353 N * Bertl Bertl_zZ
1303213395 Q * BenG Quit: I Leave
1303213465 Q * derjohn_mob Ping timeout: 480 seconds
1303215421 Q * petzsch Quit: Leaving.
1303215579 J * derjohn_mob ~aj@213.238.45.2
1303215975 J * ktwilight__ ~keliew@91.176.23.140
1303216270 Q * ktwilight_ Ping timeout: 480 seconds
1303220164 M * disposable i have a vserver running asterisk and am seeing lots of messages saying - "Too many open files." How do i increase this limit for a vserver? it's kind of urgent and I can't make a mistake.
1303220182 M * daniel_hozac /etc/vservers//ulimits/nofile
1303220194 M * daniel_hozac assuming you use vserver ... enter.
1303220200 M * daniel_hozac if you use e.g. ssh, check your pam configuration
1303220213 M * disposable daniel_hozac: so i'll put a value in there and will have to restart the vserver?
1303220232 M * daniel_hozac no, another vserver ... enter will suffice.
1303220236 M * daniel_hozac and restarting the service from there.
1303220265 M * daniel_hozac you might want to check with ulimit -Hn and ulimit -Sn
1303220274 M * daniel_hozac before restarting it in vain
1303220522 Q * Chlorek Remote host closed the connection
1303220585 M * disposable daniel_hozac: sorry for asking again, but i just need a y/n answer. after "echo 100000 > /etc/vservers//ulimits/nofile" do i need to restart the ?
1303220641 M * hijacker disposable, vserver ... enter , check ulimit, then restart the service
1303220643 J * Chlorek chlorek@chlorek.com
1303220782 M * disposable hijacker: by 'restart the service' you mean restart the program that's having issues with opening new files, right?
1303220786 J * petzsch ~markus@dslb-088-075-165-242.pools.arcor-ip.net
1303220794 M * hijacker indeed
1303220803 M * hijacker from the shell that has the proper ulimits
1303220861 M * disposable hijacker: thank you
1303220915 M * hijacker you're welcome, i just rewrote what daniel_hozac said ;-)
1303221046 M * disposable hijacker: does the base server need to have ulimit on open files higher_or_equal than the vserver?
1303221058 M * disposable hijacker: or are the two values independent?
1303221175 M * daniel_hozac no relation
1303222310 M * disposable daniel_hozac: thank you
1303222627 M * Mr_Smoke Hm
1303222645 M * Mr_Smoke Any reason why ~single_ip could cause bind9 to stop responding to IPv6 requests ?
1303222665 M * daniel_hozac no.
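[editor's note] For the open-files question above: the suggestion to check `ulimit -Sn` and `ulimit -Hn` before restarting the service has a rough Python equivalent, shown here as a minimal sketch. It reads the limits of the current process only, so it would need to be run from the shell that will start the service. `nofile_limits` and `show` are hypothetical helper names; `resource.getrlimit` and `RLIMIT_NOFILE` are from the Python standard library (Unix only).

```python
import resource


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def show(value):
    """Render a limit the way ulimit does for the unlimited case."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("soft (ulimit -Sn):", show(soft))
    print("hard (ulimit -Hn):", show(hard))
```

A process inherits these limits from whatever started it, which is why the service has to be restarted from a shell that already has the new values.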
1303222674 M * Mr_Smoke Ok, misconfiguration then probably
1303222676 M * Mr_Smoke Looking it up
1303222727 M * Mr_Smoke Hm weird
1303222732 M * Mr_Smoke Restarting the service worked
1303222744 M * Mr_Smoke Looks like a race condition upon vserver startup
1303222770 M * daniel_hozac unlikely
1303222797 M * Mr_Smoke But still
1303222801 M * Mr_Smoke I just recreated it
1303222814 M * Mr_Smoke Apr 19 16:19:17 ns0 named[18003]: additionally listening on IPv6 interface eth0, 2001:758:d00d::180#53
1303222817 M * Mr_Smoke Apr 19 16:19:17 ns0 named[18003]: could not listen on UDP socket: permission denied
1303222820 M * Mr_Smoke Apr 19 16:19:17 ns0 named[18003]: creating IPv6 interface eth0 failed; interface ignored
1303222844 M * daniel_hozac strace
1303222913 M * Mr_Smoke strace what ? the initscript ?
1303222919 M * daniel_hozac sure
1303222962 M * Mr_Smoke That calls for some meddling, hang on
1303223045 M * Mr_Smoke daniel_hozac: it only fails *during* vserver startup
1303223059 M * Mr_Smoke So I'll have to modify the stock initscript
1303223105 M * daniel_hozac sure
1303223610 M * Mr_Smoke daniel_hozac: 20475 bind(513, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:758:d00d::180", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
1303223687 M * Mr_Smoke It really looks like a race
1303223704 M * Mr_Smoke Although I've never seen such a race before. The only difference in setup here is ~single_ip
1303223748 M * daniel_hozac and is the IP address there when you start it?
1303223780 M * Mr_Smoke Later on there is :
1303223780 M * Mr_Smoke 20475 bind(513, {sa_family=AF_INET6, sin6_port=htons(53), inet_pton(AF_INET6, "2001:758:d00d::180", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EACCES (Permission denied)
1303223790 M * Mr_Smoke daniel_hozac: when the vserver is up, yes, it's there alright
1303223844 M * daniel_hozac sure
1303223849 M * daniel_hozac but is it there when that is executed?
1303223858 M * Mr_Smoke How can I know that ?
1303223866 M * Mr_Smoke Adding more stuff to init script I guess
1303223867 M * Mr_Smoke Let's see
1303223977 M * Mr_Smoke daniel_hozac: aha. It's there, but as tentative
1303223986 M * Mr_Smoke Meaning DAD is probably still ongoing
1303223995 M * daniel_hozac yep
1303224016 M * Mr_Smoke Hm
1303224027 M * daniel_hozac of course, you can just bind to ::.
1303224030 M * Mr_Smoke VServer is *too* fast then ?
1303224038 M * daniel_hozac seemingly so.
1303224052 M * Mr_Smoke Hm I don't get it, how will binding to :: solve this issue ?
1303224068 M * Mr_Smoke if it's "tentative" at startup, it won't retry later
1303224079 M * daniel_hozac :: is not limited to a specific IP.
1303224091 M * Mr_Smoke Sure, but that's the only one available to that context
1303224095 M * daniel_hozac sure.
1303224101 M * Mr_Smoke sorry, not following :)
1303224120 Q * petzsch Read error: Connection reset by peer
1303224121 M * daniel_hozac a socket bound to :: will allow access from any IPv6 address that is available.
1303224129 M * daniel_hozac at the time of the connection.
1303224137 M * daniel_hozac as opposed to at the time of starting the service.
1303224141 M * Mr_Smoke Oh
1303224143 M * Mr_Smoke Let's try then
1303224221 M * Mr_Smoke daniel_hozac: damn straight :) Thanks a bunch :)
1303224246 M * daniel_hozac personally, i never specify IP addresses if i can help it.
1303224271 M * daniel_hozac makes it easier to clone it to a test vserver with a different IP, and do upgrades/change config/etc.
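[editor's note] The difference between binding to one specific IPv6 address and binding to `::` can be tried with a small Python sketch. Binding to a specific address needs that address to be usable at the moment of `bind()`, while the wildcard does not depend on any particular address being ready yet. `try_bind_udp6` is a hypothetical helper; port 0 (any free port) is used instead of 53 so no privileges are needed, and `2001:db8::1` is a documentation-range address that a host would normally not have configured.

```python
import errno
import socket


def try_bind_udp6(address, port=0):
    """Try to bind a UDP/IPv6 socket.

    Returns (socket, None) on success or (None, error_name) on failure.
    """
    sock = None
    try:
        sock = socket.socket(socket.AF_INET6, socket.SOCK_DGRAM)
        sock.bind((address, port))
        return sock, None
    except OSError as exc:
        if sock is not None:
            sock.close()
        # Map the errno to its symbolic name where one is known.
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    for addr in ("::", "2001:db8::1"):
        sock, err = try_bind_udp6(addr)
        print(addr, "->", "bound" if sock else f"failed ({err})")
        if sock is not None:
            sock.close()
```

On a host with IPv6 enabled, the wildcard bind would be expected to succeed and the specific address to fail because it is not assigned, which is the same class of failure as the `EADDRNOTAVAIL` in the strace output above.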
1303224309 M * Mr_Smoke True.
1303224313 M * Mr_Smoke I'll follow your advice :)
1303225701 J * dowdle ~dowdle@scott.coe.montana.edu
1303226020 N * Bertl_zZ Bertl
1303226038 M * Bertl back now ...
1303230112 J * bonbons ~bonbons@2001:960:7ab:0:a04e:e5a6:af54:aff9
1303230260 J * imcsk8 ~ichavero@148.229.9.250
1303230353 Q * derjohn_mob Ping timeout: 480 seconds
1303231316 J * petzsch ~markus@dslb-088-075-165-242.pools.arcor-ip.net
1303232411 Q * imcsk8 Quit: This computer has gone to sleep
1303233885 J * derjohn_mob ~aj@d025137.adsl.hansenet.de
1303235477 Q * derjohn_mob Ping timeout: 480 seconds
1303236671 J * derjohn_mob aj@88.128.65.177
1303238633 Q * derjohn_mob Ping timeout: 480 seconds
1303240298 Q * jrklein Remote host closed the connection
1303240328 J * jrklein ~quassel@2001:470:1f0f:572::250:160
1303240724 J * hijacker_ ~hijacker@87-126-142-51.btc-net.bg
1303241041 Q * jrklein Remote host closed the connection
1303241070 J * jrklein ~quassel@2001:470:1f0f:572::250:160
1303241932 Q * jrklein Quit: Quitting
1303241984 J * jrklein ~quassel@2001:470:1f0f:572::250:160
1303242317 Q * daniel_hozac Ping timeout: 480 seconds
1303242606 Q * jrklein Quit: Quitting
1303242637 J * jrklein ~quassel@2001:470:1f0f:572::250:160
1303243060 J * daniel_hozac ~daniel@c-923071d5.08-230-73746f22.cust.bredbandsbolaget.se
1303243516 Q * jrklein Remote host closed the connection
1303243545 J * jrklein ~osx@2001:470:1f0f:572::250:160
1303244224 Q * hijacker_ Quit: Leaving
1303244545 J * imcsk8 ~ichavero@189.155.126.15
1303244697 J * ichavero_ ~ichavero@189.155.102.11
1303245001 J * thierryp ~thierry@lns-bzn-47f-62-147-212-202.adsl.proxad.net
1303245008 Q * imcsk8 Read error: Operation timed out
1303245009 Q * thierryp Remote host closed the connection
1303245194 Q * ichavero_ Quit: This computer has gone to sleep
1303246377 Q * petzsch Quit: Leaving.
1303246846 Q * bonbons Quit: Leaving
1303246910 J * imcsk8 ~ichavero@189.155.102.11
1303247495 J * cuba33ci_ ~cuba33ci@111-240-169-22.dynamic.hinet.net
1303247824 Q * cuba33ci Ping timeout: 480 seconds
1303247836 N * cuba33ci_ cuba33ci
1303247994 Q * imcsk8 Ping timeout: 480 seconds
1303249920 J * derjohn_mob ~aj@d047079.adsl.hansenet.de
1303251377 J * imcsk8 ~ichavero@148.229.9.250
1303254419 Q * Piet_ Remote host closed the connection
1303254450 Q * nkukard Read error: Operation timed out
1303254466 J * Piet_ ~Piet__@1RDAAAJ3D.tor-irc.dnsbl.oftc.net
1303255141 Q * dowdle Remote host closed the connection
1303255243 J * dowdle ~dowdle@scott.coe.montana.edu
1303257562 Q * ghislain Quit: Leaving.