1393547719 Q * zerick Ping timeout: 480 seconds
1393548491 J * zerick ~eocrospom@190.114.248.34
1393549212 Q * zerick Ping timeout: 480 seconds
1393557502 N * Bertl_zZ Bertl
1393557523 M * Bertl morning folks!
1393563916 J * undefined ~undefined@00011a48.user.oftc.net
1393564428 N * l0kit Guest1697
1393564433 J * l0kit ~1oxT@0001b54e.user.oftc.net
1393564799 Q * Guest1697 Ping timeout: 480 seconds
1393566963 J * Ghislain ~aqueos@adsl1.aqueos.com
1393569133 J * SteeleNivenson ~SteeleNiv@105-237-5-122.access.mtnbusiness.co.za
1393571484 Q * Romster Quit: Geeks shall inherit properties and methods of object earth.
1393571693 J * Romster ~Romster@202.168.100.149.dynamic.rev.eftel.com
1393573908 Q * Romster Quit: Geeks shall inherit properties and methods of object earth.
1393574034 J * Romster ~Romster@202.168.100.149.dynamic.rev.eftel.com
1393574202 M * hijacker morning
1393583455 Q * SteeleNivenson Ping timeout: 480 seconds
1393586571 J * SteeleNivenson ~SteeleNiv@105-236-119-12.access.mtnbusiness.co.za
1393587538 J * SteeleNivenson_ ~SteeleNiv@105-237-5-122.access.mtnbusiness.co.za
1393587634 Q * SteeleNivenson Ping timeout: 480 seconds
1393589378 Q * Aiken Remote host closed the connection
1393589536 Q * ircuser-1 Ping timeout: 480 seconds
1393589660 M * ard daniel_hozac : do you use memory cgroups with or without swap control...
1393589696 M * ard Or do your vservers have separate filesystems?
1393589784 M * ard http://paste.linux-vserver.org/54800 after a lot of these OOMs ext4 locks up
1393589805 M * ard ext4 on that partition I mean...
1393589862 M * Bertl usually with swap control
1393589918 M * Bertl locks up where? do you have a kernel stack trace?
1393589941 M * ard I've got a console log full of trace :-)
1393589954 M * Bertl upload them somewhere to look at
1393589993 M * ard it's very hard to trigger...
1393590042 M * Bertl that's why you already triggered it and collected all the cpu stack traces :)
1393590068 M * ard actually I have a system set up doing just OOM and disk write, never triggered it :-(
1393590101 M * Bertl so no stack trace from the ext4 lockup
1393590132 M * ard coming...
1393590143 M * ard have to cp cp cp cross DMZ :-)
1393590158 M * Bertl hehe
1393590329 M * ard http://217.196.41.9/~ard/bertl-is-the-nicest-guy-in-the-world/jimmy.txt
1393590345 M * ard From initial boot on friday till the lockup this morning
1393590397 M * daniel_hozac that problem sounds familiar.
1393590450 M * Bertl but I don't see any ext4 in the traces
1393590481 M * Bertl we got our favorite friend (stuck cpu)
1393590496 M * daniel_hozac first one has __ext4_journal_stop
1393590499 M * Bertl and we got a lot of mem_cgroup trying to charge
1393590519 M * ard Hmmm.... yes... ...
1393590554 M * ard the 3.10.15 and lower did have a lot of ext4... But in the beginning it starts off with ext4
1393590562 M * Bertl yeah, it looks like there is an oom kill
1393590572 M * ard a single one? 8-D
1393590578 M * Bertl and the journal_stop never finishes because it got killed
1393590589 M * Bertl yes, a single one on the ext4 path :)
1393590609 M * ard wait, what?? you guys just see that from this dump...
1393590622 M * Bertl you could probably easily trigger this by increasing the time the journal stop takes
1393590632 M * ard I should just write an extra hour, and paypal that to you guys :-)
1393590640 Q * undefined Quit: Closing object
1393590697 M * Bertl daniel_hozac: do you agree that this might be the cause?
1393590798 M * ard Ok... this is even better than watching you*rn... I've been trying to trigger this thing for months
1393590874 M * ard Hmmm, increasing the journal stop means I have to increase flushing times or hack some sleep in the code :-)
1393590902 M * Bertl sleep won't work, I presume you wouldn't be allowed to sleep in this path
1393590910 M * ard :-)
1393590926 M * Bertl but you could run a busy loop for a little while
1393590971 M * ard My idea was to trigger it and then slowly rebuild the system to an lxc, because as I see it, it is an upstream bug, right?
1393590973 M * Bertl I'm going to update the kernel to .32, maybe check if there is a fix in the changelog
1393591020 A * ard already has applied the patch on a .32, and it seems to apply with a few offsets
1393591051 M * Bertl really? I see quite a number of collisions here
1393591064 M * daniel_hozac Bertl: yeah, that seems likely.
1393591198 M * ard I mean the 3.10.27-vs2.3.6.8 patch :-)
1393591215 M * Bertl yes, that's what I'm talking about too
1393591249 M * ard 3 offsets, 1 fuzz, and the Makefile
1393591308 M * ard I checked the fuzz, and it seemed ok to me
1393591317 M * Bertl okay
1393591320 M * ard (inet_diag)
1393591352 M * Bertl so updating won't help you then :)
1393591673 M * ard there are some changes in oom handling
1393591684 M * Bertl not on our side
1393591716 M * Bertl i.e. it looks like an ext4 problem which usually doesn't trigger
1393591748 M * ard Anyway: as a workaround I was thinking about turning off the swap in the cgroup handler, and was hoping that ext4 then gets the memory it desires
1393591851 M * ard Or is page cache considered part of the memory?
1393591852 A * ard sighs
1393591962 M * Bertl well, without oom kill, the issue should go away
1393592041 J * ircuser-1 ~ircuser-1@35.222-62-69.ftth.swbr.surewest.net
1393592209 M * ard Task in / killed as a result of limit of /loureed is also nice to read ... :-)
1393592564 M * daniel_hozac yeah...
1393592580 M * daniel_hozac that's what you want to see from a process isolation framework :)
1393592734 M * ard I couldn't care less if the complete vserver got killed, but it actually kills the host too :-(
1393592743 M * daniel_hozac right
1393592756 M * ard (and with that the other vservers)
1393592967 M * Bertl in any case, report the stack trace to upstream
1393592985 M * Bertl best cc to one of the memory cgroup maintainers
1393593019 M * Bertl but it might be already fixed in 3.13+
1393593048 M * ard Well, the 3.10 should be a long term stable :-)
1393593079 M * ard (and there are several other bugs that I need to test the patches for today)
1393593114 M * ard The kernel is getting quite huge...
1393593232 Q * ard Remote host closed the connection
1393593233 J * ard ~ard@gw-cistron.kwaak.net
1393593295 M * ard I got this so far: lvs pmtud bug for ipv6 (going to test), nat sequence recalculation bug (fixed), queued invalidate of iommu TLB's (work around with intel_iommu=strict, might be fixed), and this...
1393593349 M * ard Maybe I need some jenkins to test all this stuff...
1393593664 J * beng_ ~BenG@cpc29-aztw22-2-0-cust128.18-1.cable.virginm.net
1393593902 Q * mcp Remote host closed the connection
1393594628 J * undefined ~undefined@66-190-97-211.dhcp.unas.tx.charter.com
1393595867 Q * beng_ Quit: I Leave
1393595872 Q * SteeleNivenson_ Ping timeout: 480 seconds
1393596127 J * mcp ~mcp@wolk-project.de
1393601762 J * zerick ~eocrospom@190.187.21.53
1393602138 M * Ghislain just a little question. I tried to share a directory between 2 guests with a bind mount
1393602138 M * Ghislain but it failed (i saw the file but could not write to it). The goal was to have the web server see the socket of the mysql server so i can skip the network stack
1393602241 M * daniel_hozac unix sockets also go through the network stack.
1393602250 M * daniel_hozac just differently.
1393602265 M * Ghislain oh
1393602279 M * daniel_hozac what error did you get?
1393602284 M * daniel_hozac anything in dmesg?
1393602396 M * Ghislain not anything, just that the mysql client refused any connection to the server
1393602413 M * Bertl off for a nap ... bbl
1393602425 M * Ghislain using a network socket works. The setup is a dir on the host that is bind mounted on both servers in /var/run/mysqld/
1393602427 N * Bertl Bertl_zZ
1393602484 M * Ghislain so if i understand your question, apart from the fact that i could never get the right user/group permissions i should be able to write just fine to it
1393602504 M * Ghislain so this is my setup then i must try again, just wanted to check that it was indeed possible
1393602523 M * Ghislain the socket being 777 user/grp should not be an issue
1393602563 M * daniel_hozac yeah shouldn't be a problem.
1393602570 M * daniel_hozac is mysql rejecting the connection or the kernel?
1393602612 M * ard shouldn't you be able to see that with screen? screen /.../mysqlsocket ?
1393602643 M * daniel_hozac probably not my test of choice...
1393602666 A * ard also thinks that's not the case...
1393602765 M * Ghislain well the client just says connection refused right away
1393602766 M * ard the filename is more like a key for the socket
1393602798 M * daniel_hozac what does strace -fF -e connect mysql... say?
1393602826 M * Ghislain i cannot do the test right now but i will try to do it asap :)
1393602903 M * Ghislain for now it runs on a network socket and is in use, i will have to wait for a maintenance window to retry
1393603081 Q * renihs Quit: narf
1393603100 M * Ghislain but the goal was to shortcut MTU and iptables and other happy things happening. Now that i know it should be possible i will try harder at the next opportunity
1393607822 M * Guy- I have an ancient box with 2.6.38-vs2.3.0.37-rc14
1393607833 M * Guy- I'm trying to start an ircnet ircd in it, and it fails with this:
1393607834 M * Guy- bind(5, {sa_family=AF_INET6, sin6_port=htons(6667), inet_pton(AF_INET6, "::ffff:0.0.0.0", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 EADDRNOTAVAIL (Cannot assign requested address)
1393607849 M * Guy- the guest does have an ipv6 address
1393607856 M * Guy- nothing is using port 6667
1393607861 M * Guy- what could be wrong?
1393608004 M * shaggy64 Guy- is it set to only bind to the ipv6 address?
1393608026 M * Guy- unfortunately I know very little about it
1393608032 M * Guy- so I have no idea
1393608059 M * shaggy64 Also, did you check netstat on the HOST to see if it is binding to the address already?
1393608061 M * Guy- otoh, it also uses different binds that succeed: bind(5, {sa_family=AF_INET6, sin6_port=htons(6668), inet_pton(AF_INET6, "::", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
1393608089 M * Guy- shaggy64: nothing is using port 6667 on the host either
1393608141 N * shaggy64 Shaggy63
1393608176 M * Guy- and this is the only guest
1393608234 M * Shaggy63 I could be wrong but it looks like the guest is trying to bind to 0.0.0.0 and ::ffff IE all ips.
1393608250 M * Guy- that's exactly what's intended
1393608262 M * Guy- shouldn't this work?
1393608272 M * Guy- it should end up binding to all IPs available to the guest, no?
1393608391 M * Shaggy63 Will it bind to all ips on 6668 but not 6667?
1393608475 M * Guy- apparently yes, but for 6668 it specifies "::" instead of "::ffff:0.0.0.0", and I don't know what that means
1393608488 M * Guy- the bind succeeds, but the ircd fails to listen() on the socket for some reason anyway
1393608619 M * Guy- if I compile it without ipv6 support, the first (failing) bind becomes this:
1393608620 M * Guy- bind(5, {sa_family=AF_INET, sin_port=htons(6667), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
1393608649 M * Guy- and the second one this: bind(6, {sa_family=AF_INET, sin_port=htons(6668), sin_addr=inet_addr("0.0.0.0")}, 16) = 0
1393608660 M * Guy- (so they're identical then, other than the port number)
1393608897 M * Shaggy63 Then I have no idea. Maybe Bertl can help when he wakes up from his nap.
1393609205 M * Guy- still, thanks
1393609695 M * Guy- fwiw, it doesn't listen() with ipv6 support disabled either
1393609695 Q * ensc|w Remote host closed the connection
1393610344 N * Bertl_zZ Bertl
1393612040 J * SteeleNivenson ~SteeleNiv@105-236-119-12.access.mtnbusiness.co.za
1393613091 P * undefined
1393613985 Q * SteeleNivenson Ping timeout: 480 seconds
1393614248 J * SteeleNivenson ~SteeleNiv@105-237-5-122.access.mtnbusiness.co.za
1393615405 M * daniel_hozac Guy-: ::ffff:0.0.0.0... so you don't want to receive IPv6 connections then?
1393617074 Q * Defaultti Quit: Quitting.
1393617193 J * Defaultti defaultti@lakka.kapsi.fi
1393617380 Q * SteeleNivenson Ping timeout: 480 seconds
1393618278 J * Aiken ~Aiken@2001:44b8:2168:1000:21f:d0ff:fed6:d63f
1393620070 J * bonbons ~bonbons@2001:a18:207:c601:892e:de1f:d05a:e4bb
1393620777 M * Guy- daniel_hozac: no, the only ipv6 address the guest has is bogus anyway
1393620792 M * Guy- daniel_hozac: it's just that the ircd had been compiled with ipv6 support and failed to start if there was no ipv6 in the guest
1393620905 M * Guy- turns out the failure to listen was caused by a configuration error on a different server that resulted in no meaningful error message locally... *sigh*
1393622199 M * Guy- (and the "solution" to the ipv6 problem was to recompile ircd without ipv6 support)
1393628293 J * undefined ~undefined@00011a48.user.oftc.net
1393629935 Q * Ghislain Quit: Leaving.
1393631006 M * Bertl off to bed now ... have a good one everyone!
1393631014 N * Bertl Bertl_zZ
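
Note on the two bind() calls Guy- pasted: "::" is the IPv6 wildcard address, while "::ffff:0.0.0.0" is the IPv4-mapped form of 0.0.0.0, which fits daniel_hozac's remark that such a socket would not receive IPv6 connections. The following is a minimal sketch for comparing the two on a given machine. The helper name try_bind is hypothetical (not from the log), it uses port 0 rather than 6667/6668 so it cannot clash with a running service, and it makes no attempt to reproduce the vserver guest's address restrictions; the result depends on the kernel and on which addresses the host or guest actually has.

```python
import errno
import socket


def try_bind(addr, port=0):
    """Bind an AF_INET6 TCP socket to (addr, port).

    Returns None on success, or the symbolic errno name
    (for example 'EADDRNOTAVAIL') on failure.
    Port 0 lets the kernel pick any free port.
    """
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            # Assumption: dual-stack mode, so IPv4-mapped addresses
            # are permitted on this AF_INET6 socket.
            sock.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0)
            sock.bind((addr, port))
            return None
    except OSError as exc:
        # Covers missing IPv6 support as well as bind() failures.
        return errno.errorcode.get(exc.errno, str(exc.errno))


if __name__ == "__main__":
    for addr in ("::", "::ffff:0.0.0.0"):
        result = try_bind(addr)
        print(f"bind({addr!r}): {'ok' if result is None else result}")
```

Running this once on the host and once inside the guest would show whether the EADDRNOTAVAIL is specific to the guest's network context or also appears outside it.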