1368576023 M * ser i mean the whole PERF_EVENTS thingy
1368576438 J * thierryp ~thierry@home.parmentelat.net
1368576491 M * Jb_boin well, it can be nice to "debug" performance issues
1368576607 M * Jb_boin but usually not that useful in production, especially for a non-root user
1368576925 Q * thierryp Ping timeout: 480 seconds
1368577334 Q * Kabaka Remote host closed the connection
1368577987 J * Kabaka kabaka@equine.vacantminded.com
1368580069 J * thierryp ~thierry@home.parmentelat.net
1368580555 Q * thierryp Ping timeout: 480 seconds
1368583066 J * neofutur neofutur@cahier2.ww7.be
1368583699 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:847d:3917:af50:adfb
1368584181 Q * thierryp Ping timeout: 480 seconds
1368587331 J * thierryp ~thierry@home.parmentelat.net
1368587815 Q * thierryp Ping timeout: 480 seconds
1368590416 Q * alex3 Remote host closed the connection
1368590962 J * thierryp ~thierry@home.parmentelat.net
1368591184 N * Bertl_zZ Bertl
1368591188 M * Bertl morning folks!
1368591448 Q * thierryp Ping timeout: 480 seconds
1368592581 J * alex1 ~alex@v1.fob.spline.inf.fu-berlin.de
1368594171 J * thierryp ~thierry@home.parmentelat.net
1368594456 Q * marcel__ Ping timeout: 480 seconds
1368594967 J * marcel__ ~marcel@pd95c7474.dip0.t-ipconnect.de
1368595521 Q * thierryp Remote host closed the connection
1368599559 Q * hparker Ping timeout: 480 seconds
1368599847 J * thierryp ~thierry@zebra.inria.fr
1368599927 Q * nkukard Ping timeout: 480 seconds
1368600118 J * hparker ~hparker@2001:470:1f0f:32c:beae:c5ff:fe01:b647
1368600164 M * Bertl off for now ... bbl
1368600170 N * Bertl Bertl_oO
1368600559 J * Ghislain ~aqueos@adsl1.aqueos.com
1368600725 J * nkukard ~nkukard@197.87.148.190
1368601765 Q * quasisane Quit: leaving
1368602740 Q * ncopa Quit: Leaving
1368602928 J * ncopa ~test@3.203.202.84.customer.cdi.no
1368603142 J * quasisane ~sanep@0001267b.user.oftc.net
1368603333 Q * thierryp Remote host closed the connection
1368603659 J * thierryp ~thierry@zebra.inria.fr
1368604527 J * grembleb ~bengreen@cpc35-aztw23-2-0-cust207.18-1.cable.virginmedia.com
1368604617 Q * grembleb Remote host closed the connection
1368605094 J * grembleb ~bengreen@cpc35-aztw23-2-0-cust207.18-1.cable.virginmedia.com
1368605183 M * Ghislain Bertl, is "honor privacy of guest" just a host-to-guest limitation, or does it improve guest/guest security ?
1368605275 M * fback Ghislain: afair spectator context will have limited access to guests
1368605585 M * Ghislain yes, i wonder what the other effects of that setting are
1368605619 M * Ghislain host => guest is not an issue to me, but if it lowers guest/guest security that is one, so before disabling it i would love to know :)
1368606081 J * thierryp_ ~thierry@zebra.inria.fr
1368606081 Q * thierryp Read error: Connection reset by peer
1368606391 J * derjohn_mob ~aj@87.253.171.198
1368606727 M * Bertl_oO Ghislain: guest/guest is not affected
1368607050 Q * derjohn_mob Ping timeout: 480 seconds
1368607589 Q * nkukard Read error: Connection timed out
1368607619 M * Ghislain thanks bertl, great, then i will disable it; this will help with monitoring the guest from the host
1368607765 J * derjohn_mob ~aj@87.253.171.214
1368607992 J * nkukard ~nkukard@197.87.148.190
1368608101 Q * ncopa Quit: Leaving
1368608377 Q * Aiken Remote host closed the connection
1368608849 J * ncopa ~test@3.203.202.84.customer.cdi.no
1368608956 Q * nkukard Read error: Connection timed out
1368609272 J * nkukard ~nkukard@197.87.148.190
1368613714 M * Bertl_oO off for a nap ... bbl
1368613725 N * Bertl_oO Bertl_zZ
1368616190 Q * ircuser-1 Ping timeout: 480 seconds
1368617253 Q * thierryp_ Remote host closed the connection
1368617491 Q * BlackPanx Read error: Connection reset by peer
1368619482 J * thierryp ~thierry@zebra.inria.fr
1368619653 J * ircuser-1 ~ircuser-1@35.222-62-69.ftth.swbr.surewest.net
1368622316 Q * thierryp Remote host closed the connection
1368624146 J * thierryp ~thierry@zebra.inria.fr
1368624553 J * nlm__ ~nlm@host47.190-230-238.telecom.net.ar
1368624630 Q * thierryp Ping timeout: 480 seconds
1368624944 Q * nlm_ Ping timeout: 480 seconds
1368626800 Q * grembleb Quit: I Leave
1368626844 N * Bertl_zZ Bertl_oO
1368627404 J * thierryp ~thierry@zebra.inria.fr
1368628118 J * nlm_ ~nlm@host47.190-230-238.telecom.net.ar
1368628403 Q * hparker Remote host closed the connection
1368628404 Q * thierryp Remote host closed the connection
1368628526 Q * nlm__ Ping timeout: 480 seconds
1368628673 J * hparker ~hparker@2001:470:1f0f:32c:beae:c5ff:fe01:b647
1368629554 J * grembleb ~bengreen@cpc35-aztw23-2-0-cust207.18-1.cable.virginmedia.com
1368630452 J * whocares ~whocares@host-82-135-63-232.customer.m-online.net
1368630495 M * whocares hey guys. does anybody know if iptables in a guest system is possible?
1368630781 M * Bertl_oO yes, it is possible with network namespaces, but usually it is simpler to 'relay' the necessary rules to the host system
1368630891 M * whocares hi bertl: i need to run it with fail2ban so this could be problematic from host side.
1368631033 M * hparker I run fail2ban on the host, just send the logs from the guest to host
1368631136 M * Ghislain you can also use hosts.allow files to do the job if you really need to stay in the guest
1368632617 J * thierryp ~thierry@home.parmentelat.net
1368632719 Q * whocares
1368632721 Q * thierryp Read error: No route to host
1368632721 J * thierryp_ ~thierry@2a01:e35:2e2b:e2c0:207f:58b5:29db:16a4
1368634148 Q * grembleb Quit: I Leave
1368634315 J * bonbons ~bonbons@2001:a18:20a:9d01:7403:ffd3:683e:2e78
1368634689 Q * derjohn_mob Ping timeout: 480 seconds
1368636293 M * Jb_boin or use denyhosts which is using hosts.allow but it doesnt have all the fail2ban capabilities
1368637072 J * Aiken ~Aiken@2001:44b8:2168:1000:21f:d0ff:fed6:d63f
1368637614 J * BlackPanx ~alen@31.15.133.178
1368638115 M * Ghislain yep it does only ssh i think
1368640766 Q * nlm_ Remote host closed the connection
1368641387 J * hijacker_ ~hijacker@cable-84-43-134-121.mnet.bg
1368642067 J * nlm_ ~nlm@host47.190-230-238.telecom.net.ar
1368642530 Q * nlm_ Remote host closed the connection
1368643468 J * nlm_ ~nlm@host47.190-230-238.telecom.net.ar
1368643949 Q * thierryp_ Remote host closed the connection
1368643972 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:207f:58b5:29db:16a4
1368644046 J * thierryp_ ~thierry@home.parmentelat.net
1368644181 J * BenG_ ~bengreen@cpc35-aztw23-2-0-cust207.18-1.cable.virginmedia.com
1368644456 Q * thierryp Ping timeout: 480 seconds
1368644528 Q * thierryp_ Ping timeout: 480 seconds
1368645289 Q * BenG_ Quit: I Leave
1368646488 Q * hijacker_ Quit: Leaving
1368646997 Q * Ghislain Quit: Leaving.
1368646999 J * Ghislain ~aqueos@adsl1.aqueos.com
1368647264 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:8820:3acf:b189:b2cb
1368647484 Q * Ghislain Ping timeout: 480 seconds
1368648046 Q * thierryp Ping timeout: 480 seconds
1368648313 Q * PowerKe Quit: leaving
1368648832 J * Arach ~arach@04ZAAATDC.tor-irc.dnsbl.oftc.net
1368649438 J * PowerKe ~tom@94-227-30-112.access.telenet.be
1368649818 N * l0kit Guest5552
1368649824 J * l0kit ~1oxT@0001b54e.user.oftc.net
1368649940 J * orzel ~orzel@000127e2.user.oftc.net
1368650033 M * orzel Bertl_oO: hi... there ? My server is doing really weird things again. There are 2 cpus and cpu usage shows that 100% (of 200% total) is used in 'system'. I can't see that in top, but loadavg is ~12 which is not at all usual
1368650075 M * orzel i tried to restart the vserver, and now it's stuck (the prompt doesn't come back after "vserver web restart")
1368650093 M * orzel on another console 'vserver-stats' still shows rootserver/monitoring/web
1368650106 M * orzel but i can NOT enter it, of course
1368650125 M * orzel (and i have yet another console stuck). Otherwise, the main host seems to behave ok
1368650138 M * orzel i can't find anything useful with sysrq... thought that you might have more luck
1368650225 Q * Guest5552 Ping timeout: 480 seconds
1368650362 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:e085:3977:29bf:4c93
1368650374 Q * thierryp Remote host closed the connection
1368650411 M * Bertl_oO orzel: did you dump the 'stuck cpu' and similar?
1368650431 M * Bertl_oO in case of doubt, dump all tasks and cpus with magic-sysrq
1368650567 Q * nkukard Max SendQ exceeded
1368650727 J * thierryp ~thierry@home.parmentelat.net
1368650786 Q * orzel Read error: Connection reset by peer
1368650796 J * orzel ~orzel@000127e2.user.oftc.net
1368650804 M * orzel Bertl_oO: this ? 'w' - Dumps tasks that are in uninterruptable (blocked) state.
1368650850 Q * thierryp Remote host closed the connection
1368650882 Q * nlm_ Read error: Operation timed out
1368650883 J * thierryp ~thierry@home.parmentelat.net
1368650965 M * Bertl_oO yep, for example, but also dump the cpu state/traces
1368651038 M * orzel this one is weird : it only displayed one cpu
1368651053 M * Bertl_oO but you have more than one?
1368651091 M * Bertl_oO and do you use a serial console or a vga one?
1368651207 M * orzel Bertl_oO: i have 2 cpus (/proc/cpuinfo confirms this) : here's an example http://pastebin.com/5rVf9DFt
1368651227 M * orzel do get that i used 'echo > /proc/sysrq-trigger
1368651228 J * nkukard ~nkukard@197.87.148.190
1368651234 M * orzel so now i got all of those in /var/log/kern.log
1368651244 M * orzel s/do/to :)
1368651352 M * orzel btw, i just noticed this in kernel log : May 15 22:25:27 localhost kernel: INFO: rcu_sched self-detected stall on CPU { 0} (t=3030201 jiffies)
1368651356 M * orzel looks bad, right ?
1368651541 Q * thierryp Remote host closed the connection
1368651544 M * orzel here's another sysrq output for "blocked state" http://pastebin.com/cnvraz0T
1368651545 M * Bertl_oO do you have a trace for that one?
1368651548 M * orzel (doesn't speak to me)
1368651555 M * Bertl_oO i.e. for the self-detected stall
1368651564 M * orzel yes, there's a stack
1368651566 Q * bonbons Quit: Leaving
1368651586 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:141e:4374:aa32:98d0
1368651595 M * orzel there : http://pastebin.com/Zvn107ap
1368651685 M * Bertl_oO how often does this happen?
1368651721 M * Bertl_oO or the better question is: how long do you have to wait until something like this happens?
1368651769 M * orzel you mean in the log ? very often (every ~4min ?)
1368651803 M * Bertl_oO so, once you reboot, it takes less than 10 minutes till you get a self detected stall?
1368651807 M * orzel ah... this time i didn't have to wait long, the server had an uptime of 3 days 22h
1368651817 M * orzel but usually it's more like 30 days before i get some really bad problems
1368651839 M * orzel nono, the ~4min is the frequency in logs for the stalled dump
1368651852 M * Bertl_oO ah, okay, understood
1368651870 M * orzel started at 18:56 today according to logs
1368651885 M * orzel (european times, it's 23:4 right now)
1368651896 M * Bertl_oO well, I see ext4 over dm, which I remember having problems at some point, but they should be fixed in this kernel
1368651921 M * Bertl_oO is the trace (for the stall) you uploaded the first one?
1368651939 M * Bertl_oO if not, please see if you can find the first occurrence in the logs
1368651947 M * orzel currently using 3.6.11-vs2.3.4.6 but i had the pb with 3.7.10-vs2.3.5.6
1368652016 M * orzel it was not the first, here's the first stall trace : http://pastebin.com/cBFSWqfd
1368652028 M * orzel (not sure if the end is related/relevant)
1368652123 M * Bertl_oO okay, do you have the kernel source/build tree at hand?
1368652130 M * orzel yeps
1368652143 M * orzel at least for the one currently running (3.6.11-vs2.3.4.6)
1368652153 M * Bertl_oO check if addr2line -e vmlinux ffffffff8103ba8f
1368652160 M * Bertl_oO gives you a file and a line number
1368652185 M * orzel nope, it says : ??:?
1368652219 M * Bertl_oO okay, so the kernel was built without debug information :(
1368652236 M * orzel yes, very probably. it's a production one (sorta)...
1368652269 M * Bertl_oO next time you build a kernel, please enable CONFIG_DEBUG_INFO=y
1368652290 M * orzel there's no such thing in my .config
1368652291 M * orzel ?
1368652292 M * Bertl_oO it doesn't affect performance on the system, it just generates the necessary debug information
1368652309 M * Bertl_oO it depends on CONFIG_DEBUG_KERNEL=y IIRC
1368652329 M * orzel k
1368652417 M * Bertl_oO please check the log, if the process listed around such traces (the initial ones) is usually uptimed
1368652439 M * Bertl_oO (it might be triggering the issue somehow)
1368652448 M * orzel found kernel option
1368652458 M * orzel you know what uptimed is ?
1368652481 M * orzel not very important, i can stop it. It's supposed to be very unintrusive (just keeps track of uptime)
1368652494 M * Bertl_oO yeah, I know
1368652509 M * Bertl_oO but it might help to pinpoint the issue
1368652536 M * Bertl_oO it might even be a good test tool if it can be made more aggressive
1368652582 M * orzel i had already disabled (soft) watchdog in fear it was responsible. i can't remember why i thought it might be the source
1368652607 M * Bertl_oO probably because I told you at some point that it often causes soft lockups
1368652660 M * orzel oh ? i dont think you told me, but great to know :)
1368652667 M * Bertl_oO if you have more logs from previous issues, try to find the first stack trace you get (in each of them) and upload them somewhere (or look for similarities)
1368652705 M * orzel unfortunately this is the first time i got any trace
1368652730 M * orzel ah, wait, no. i do for this 'stall' stuff
1368652795 M * orzel such as this one http://pastebin.com/BFC4Hwcg
1368652803 M * orzel according to last i had to reboot a few hours later
1368652840 M * Bertl_oO is that one followed by something else after this (incomplete) trace?
1368652888 M * orzel more : http://pastebin.com/TXEzSyXN
1368652897 M * orzel (i thought it was looping on the stall stuff)
1368652947 M * Bertl_oO hmm, yeah, looks like it is, strange ...
1368652972 M * orzel it goes on like this until the reboot ...
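A minimal sketch of the log search Bertl_oO asks for (finding the first stall report and the trace that follows it), assuming GNU grep and a syslog-style kernel log such as the /var/log/kern.log mentioned earlier in this conversation. The function name first_stall and the default of 40 context lines are hypothetical choices, not something taken from the log.

```shell
# Print the first "self-detected stall" line found in a kernel log,
# followed by the next N lines (the start of its stack trace).
#   $1: path to the log file
#   $2: number of trailing context lines (hypothetical default: 40)
first_stall() {
    log=$1
    context=${2:-40}
    # -m 1 stops after the first matching line; -A still prints the
    # requested trailing context for that match.
    grep -m 1 -A "$context" 'self-detected stall on CPU' "$log"
}
```

Something like `first_stall /var/log/kern.log` would then show the earliest report still present in that file; rotated logs would have to be searched separately.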
1368652998 M * orzel this probably will sound stupid to you, but i used to check /var/log/messages and never checked this kern.log
1368652998 M * Bertl_oO a hardware problem can be excluded? i.e. did you switch out the machine or do you experience this on more than one hardware?
1368653001 M * orzel i discover lots of things :)
1368653009 M * Bertl_oO :)
1368653046 M * orzel i have this only on this computer. My first guess was bad ram and I tested this. It was ok
1368653063 M * orzel it might be something else but i dont know what / how to check
1368653082 M * orzel the server has worked for several years before this
1368653108 M * Bertl_oO well, from the traces, I would suspect the APIC, I saw similar issues years ago on an older Asus mainboard
1368653136 A * orzel checks his logs : bought in september 2005, used as server since october 2009
1368653156 M * Bertl_oO but that should show up on a normal kernel as well (so nothing Linux-VServer specific)
1368653162 M * orzel any way to check the 'apic' ? I dont know what this is.... or just that it's somehow related to interrupts and/or smp
1368653197 M * Bertl_oO yeah, it's the Advanced Programmable Interrupt Controller
1368653209 M * orzel is that something that can break ? Because i'm pretty sure the computer went very smooth for years (as desktop first and server then)
1368653218 M * Bertl_oO and also a source for high resolution timers used in the kernel
1368653235 M * orzel ah, i remember seeing options about this in configuring the kernel
1368653249 M * Bertl_oO it can cause hard to detect issues with multi processing (i.e. SMP)
1368653288 M * Bertl_oO but I'm not saying that this is the case here, I'm just doing some brainstorming
1368653302 M * orzel ok
1368653310 M * Bertl_oO how did you test the memory?
1368653329 M * orzel running memtest86plus for some hours
1368653338 M * orzel (aka "huge server downtime")
1368653345 M * Bertl_oO hmm, okay, I see
1368653362 M * orzel i had also used 'memtester' before (userland stuff, doesn't test it all but still doesn't break uptime)
1368653365 M * Bertl_oO I have a better suggestion for some hardware testing you can do without downtime
1368653373 M * orzel i'm all listening :)
1368653394 M * Bertl_oO get a linux kernel source (doesn't matter which one)
1368653437 M * Bertl_oO make allyesconfig, let it build, remove any modules/configs which cause an error (if there are any)
1368653455 M * Bertl_oO (probably simpler to use 'make defconfig')
1368653489 M * Bertl_oO then write a short script which does a 'make clean' and 'make -j 99'
1368653514 M * Bertl_oO and check that the make -j 99 did finish with success
1368653517 Q * thierryp Remote host closed the connection
1368653555 M * orzel ouch, it will be on its knees. almost down
1368653555 M * Bertl_oO then run this in a loop with e.g. ionice -c3 -- nice -n 20 -- ./build.sh
1368653573 M * orzel gee, i had never heard about ionice
1368653592 M * Bertl_oO that should not affect your system much, but it should use any idle cycles for building in many threads
1368653627 M * Bertl_oO you can reduce the -j 99 to e.g. -j 8 or -j 6 if it brings down the system too much
1368653640 M * orzel i'd rather, yes.. still enough to fill the two cpus, no ?
1368653654 M * Bertl_oO nah, it should be at least two threads on each cpu
1368653666 M * Bertl_oO so don't go below 4 (with two cpus)
1368653669 M * orzel -j 8 should do it
1368653672 M * orzel ok
1368653675 M * orzel and your hope is that it will trigger the pb,.. right ?
1368653690 M * Bertl_oO there are two possible results
1368653711 M * Bertl_oO 1) the build fails at some point (please check and log the build success)
1368653738 M * Bertl_oO 2) the problem is triggered and your system logs a trace
1368653776 M * Bertl_oO well, in theory there is also 3) nothing happens
1368653805 M * Bertl_oO which means the problem is neither load nor scheduling related and we have to look somewhere else
1368653813 M * orzel mm,ok
1368653820 M * orzel i'm installing the new kernel (with debug stuff in)
1368653823 M * Bertl_oO but kernel building with a known good kernel config should always succeed
1368653845 M * orzel i have a lot of 'known' good .config lying around. i can just use that then ?
1368653846 M * Bertl_oO and that's an excellent test for memory and cpu
1368653879 M * Bertl_oO you can, but it should have a bunch of modules (i.e. not a stripped down config) because otherwise you won't get several threads
1368653925 M * orzel g
1368653966 M * orzel find /lib/modules/3.6.11-vs2.3.4.6/ -name '*.ko' |wc -l -> 331
1368653969 M * orzel seems enough ?
1368654404 J * derjohn_mob ~aj@85.182.152.178
1368654911 M * orzel ok, server now booted with a kernel CONFIG_DEBUG_INFO=y
1368655041 M * orzel tail -f on kern.log
1368655048 M * orzel and playing with "build a lot with -j 8"
1368655325 J * thierryp ~thierry@home.parmentelat.net
1368655754 Q * fichte` Quit: bashpipe
1368655807 Q * thierryp Ping timeout: 480 seconds
1368655814 J * FIChTe ~fichte@bashpipe.de
1368656096 M * Bertl_oO excellent! off to bed now ... have a good one everyone!
1368656100 N * Bertl_oO Bertl_zZ
1368656113 M * orzel Bertl_zZ: good night. Already one run without pb. wait&see
1368656820 Q * derjohn_mob Read error: Connection timed out
1368657111 J * derjohn_mob ~aj@d046100.adsl.hansenet.de
1368657965 Q * derjohn_mob Ping timeout: 480 seconds
1368658501 J * derjohn_mob ~aj@tmo-104-111.customers.d1-online.com
1368658974 J * thierryp ~thierry@2a01:e35:2e2b:e2c0:9803:ec2d:ec4b:bd98
1368659295 J * nlm ~nlm@host47.190-230-238.telecom.net.ar
1368659456 Q * thierryp Ping timeout: 480 seconds
1368661003 Q * Aiken Remote host closed the connection
1368661182 J * Aiken ~Aiken@2001:44b8:2168:1000:21f:d0ff:fed6:d63f
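The build loop Bertl_oO describes (a 'make clean' followed by 'make -j N', repeated, with each result checked and logged) could be sketched roughly as follows. This is a sketch under stated assumptions, not the script orzel actually ran: the function names, the argument order, and the MAKE override variable are all hypothetical.

```shell
#!/bin/sh
# build.sh: repeated clean + parallel kernel build used as a load test.
# All names below (build_once, build_loop, MAKE) are hypothetical.

# Run one 'make clean' plus 'make -j N' in a source tree and append the
# outcome to a log file. Returns 0 on success, 1 on failure.
#   $1: kernel source directory   $2: job count   $3: log file
build_once() {
    src=$1
    jobs=$2
    log=$3
    make_cmd=${MAKE:-make}   # overridable, e.g. for a dry run
    if "$make_cmd" -C "$src" clean >/dev/null 2>&1 &&
       "$make_cmd" -C "$src" -j "$jobs" >/dev/null 2>&1; then
        echo "$(date '+%F %T') build OK" >>"$log"
        return 0
    fi
    echo "$(date '+%F %T') build FAILED" >>"$log"
    return 1
}

# Repeat the build up to 'count' times and stop at the first failure,
# since a failed build of a known good config is itself a finding.
#   $1: source directory  $2: job count  $3: log file  $4: count
build_loop() {
    src=$1
    jobs=$2
    log=$3
    count=$4
    i=0
    while [ "$i" -lt "$count" ]; do
        build_once "$src" "$jobs" "$log" || return 1
        i=$((i + 1))
    done
}
```

With a call such as `build_loop /path/to/linux 8 build.log 100` appended (the path and count are placeholders), the script can be started at idle priority as quoted in the conversation: `ionice -c3 -- nice -n 20 -- ./build.sh`. Stopping at the first failure keeps the failing run as the last entry in the log, next to whatever the kernel logged at that time.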