1540685008 J * uberjay ~uberjay@114-37-208-105.dynamic-ip.hinet.net
1540685051 Q * uberjay Remote host closed the connection
1540687300 N * arekmx arekm
1540687334 N * arekm Guest1565
1540689171 J * DougieBot5000 ~DougieBot@31.130.106.66
1540689302 Q * DougieBot5000 Remote host closed the connection
1540689790 J * giesen ~giesen@lfbn-rei-1-371-249.w86-225.abo.wanadoo.fr
1540689793 Q * giesen Remote host closed the connection
1540690015 J * awal6 ~awal@ip158c40.banglalionwimax.com
1540690515 Q * awal6 Ping timeout: 480 seconds
1540691514 Q * any0n Remote host closed the connection
1540691553 J * any0n ~k@4G4AAAUYD.tor-irc.dnsbl.oftc.net
1540692142 J * fstd_ ~fstd@xdsl-85-197-57-125.netcologne.de
1540692593 Q * fstd Ping timeout: 480 seconds
1540692638 Q * romster Quit: Leaving
1540695329 J * cybrNaut ~cybrNaut@211.176.11.141
1540695376 Q * cybrNaut Remote host closed the connection
1540696184 J * peikk0 ~peikk0@cpe-24-90-155-113.nyc.res.rr.com
1540696216 Q * peikk0 Remote host closed the connection
1540705526 M * Guy- Bertl_oO: I can reproduce arekm's issue reliably on one specific box, but on no others
1540706141 M * Bertl_oO cool, what makes that box different from all the others?
1540706371 J * buirg ~buirg@212.92.123.192
1540706465 Q * buirg Remote host closed the connection
1540712346 J * OmniMancer ~OmniMance@5.37.250.169.dynamic-dsl-ip.omantel.net.om
1540712806 M * Guy- unfortunately, nothing is immediately obvious
1540712841 M * Guy- I can post dmidecode, dmesg, lspci, cpuinfo and such later
1540712845 Q * OmniMancer Ping timeout: 480 seconds
1540712857 M * Guy- (in the evening, probably)
1540712885 M * Guy- if you think anything else would be useful, tell me and I'll post that too
1540713115 J * Boniche ~Boniche@179.42.224.139
1540713117 M * Bertl_oO a decoded kernel trace of the stuck CPU would be cool :)
1540713147 Q * Boniche Remote host closed the connection
1540716020 J * romster ~romster@158.140.215.184
1540717679 M * Guest1565 I have two almost exactly the same machines and yet only one triggers that bug
1540717684 N * Guest1565 arekm
1540717905 M * arekm or even exact machines if I remember correctly
1540717918 A * arekm back to fighting with sysrq
1540720387 J * Chon_Lee ~Chon_Lee@1.246.40.192
1540720392 Q * Chon_Lee autokilled: Possible spambot. Mail support@oftc.net if you think this is in error. (2018-10-28 09:53:12)
1540722074 M * Ghislain differences in firware versions ?
1540722080 M * Ghislain firmware
1540722751 M * arekm sysrq (even just h) fails to work when lockup happens
1540723019 M * arekm cpu is different, memory different, disks different, the rest is the same
1540723066 M * arekm wake up switch as reported by dmitool is different, Wake-up Type: Power Switch vs Wake-up Type: LAN Remote but that shouldn't matter
1540723088 M * arekm - Version: Intel(R) Xeon(R) CPU X5470 @ 3.33GHz
1540723088 M * arekm + Version: Intel(R) Xeon(R) CPU E5462 @ 2.80GHz
1540723101 M * arekm E5462 is where problem happens. X5470 is immune
1540723241 M * arekm E cpu doesn't support xsave
1540723417 M * Bertl_oO "rest is the same" means same network card, same operating system, same kernel, yes?
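arekm's observation that the E5462 box lacks xsave can be checked from the `flags` line of /proc/cpuinfo, which is part of the cpuinfo output Guy- offered to post. The following is a minimal sketch, assuming the usual Linux /proc/cpuinfo layout (one `flags : ...` line per logical CPU); the helper names `cpu_flags` and `has_flag` are illustrative only, not part of util-vserver or any tool mentioned in this log.

```python
"""Report whether the CPU advertises a feature flag such as xsave."""
import os


def cpu_flags(cpuinfo_text: str) -> set[str]:
    """Return the flags listed on the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the plain "flags" key only, not e.g. "vmx flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_flag(cpuinfo_text: str, flag: str) -> bool:
    """True if `flag` appears in the first CPU's feature flags."""
    return flag in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            text = f.read()
        print("xsave:", "present" if has_flag(text, "xsave") else "absent")
    else:
        print(path, "not found (not a Linux system?)")
```

/proc/cpuinfo shows what the running kernel enabled, so a kernel booted with `noxsave` also hides the flag; that can make the X5470 box look like the E5462 one in this respect.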
1540723450 M * Bertl_oO also same network setup (except for IP) and same filter rules/tables?
1540723586 M * arekm mainboard, kernel, firmware - the same
1540723628 M * arekm if I remember correctly previously I tested "ping" in single mode, so there were no iptables rules besides simple routing
1540723644 M * arekm let me retry to be 100% sure
1540724248 J * rxcomm18 ~rxcomm@119.42.83.172
1540724735 Q * rxcomm18 Ping timeout: 480 seconds
1540726490 J * stealth_ ~stealth_@85.102.72.136
1540726533 Q * stealth_ Remote host closed the connection
1540727002 M * arekm so far my reproducer is, two sessions, A and B - A: ip a a 1.1.1.1 dev eth0; A: ncontext --create --nid 999 /bin/bash; and in separate B: naddress --add --nid 999 --ip 1.1.1.1; A: wget wp.pl
1540727401 M * arekm some backtrace (caught via ipmi which makes it messy/losing lines etc): https://pastebin.com/uF47umws
1540727536 M * arekm https://pastebin.com/6xh5LrAs
1540727539 M * Bertl_oO messy is a nice description :)
1540727582 M * Bertl_oO but the trace is interesting
1540727599 M * Bertl_oO can you at least addr2line the v4_dev_in_nx_info+0x6a/0x140
1540729129 M * arekm (gdb) info line *v4_dev_in_nx_info+0x6a/0x140
1540729131 M * arekm Line 72 of "/tmp/B.FgMWvw/BUILD/kernel-4.9-4.9.135/linux-4.9/kernel/vserver/inet.c" starts at address 0xffffffff810e9c70 and ends at 0xffffffff810e9c75 .
1540729271 M * arekm (gdb) info line *0xffffffff810e9c70
1540729272 M * arekm Line 72 of "/tmp/B.FgMWvw/BUILD/kernel-4.9-4.9.135/linux-4.9/kernel/vserver/inet.c" starts at address 0xffffffff810e9c70 and ends at 0xffffffff810e9c75 .
1540729275 M * arekm (gdb) info line *0xffffffff810e9c70+0x6a
1540729278 M * arekm Line 291 of "/tmp/B.FgMWvw/BUILD/kernel-4.9-4.9.135/linux-4.9/include/linux/spinlock.h" starts at address 0xffffffff810e9cd2 and ends at 0xffffffff810e9ce1 .
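The first gdb query lands on line 72, the first line of the function, because gdb evaluates `v4_dev_in_nx_info+0x6a/0x140` as a C expression: `/` binds tighter than `+`, and the integer division 0x6a/0x140 is 0. In the kernel's `symbol+offset/size` notation the part after the slash is the function's size, not part of the address. The sketch below redoes that arithmetic in Python, using the function start 0xffffffff810e9c70 and the spinlock.h address range printed by gdb above; `parse_symbol` and `resolve` are illustrative helper names, not kernel or gdb APIs.

```python
"""Resolve a kernel 'symbol+offset/size' token to an absolute address."""
import re

# Tokens such as "v4_dev_in_nx_info+0x6a/0x140" (symbol+offset/size).
_SYM_RE = re.compile(
    r"^(?P<name>[A-Za-z_][\w.]*)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)$"
)


def parse_symbol(token: str) -> tuple[str, int, int]:
    """Split a symbol+offset/size token into (name, offset, size)."""
    m = _SYM_RE.match(token)
    if m is None:
        raise ValueError(f"not a symbol+offset/size token: {token!r}")
    return m["name"], int(m["off"], 16), int(m["size"], 16)


def resolve(func_start: int, token: str) -> int:
    """Absolute address = function start + offset (offset must be inside the function)."""
    name, off, size = parse_symbol(token)
    if off >= size:
        raise ValueError(f"offset {off:#x} lies outside {name} (size {size:#x})")
    return func_start + off


# Values taken from the gdb output in the log.
FUNC_START = 0xFFFFFFFF810E9C70                            # v4_dev_in_nx_info, inet.c line 72
SPINLOCK_LINE = (0xFFFFFFFF810E9CD2, 0xFFFFFFFF810E9CE1)  # spinlock.h line 291 [start, end)

addr = resolve(FUNC_START, "v4_dev_in_nx_info+0x6a/0x140")
print(hex(addr))                                   # 0xffffffff810e9cda
print(SPINLOCK_LINE[0] <= addr < SPINLOCK_LINE[1])  # True

# What "*v4_dev_in_nx_info+0x6a/0x140" evaluated to: 0x6a // 0x140 == 0,
# so the address is simply the start of the function.
print(hex(FUNC_START + 0x6A // 0x140))             # 0xffffffff810e9c70
```

Writing the gdb expression as `*(v4_dev_in_nx_info+0x6a)`, or passing the resolved address to binutils as `addr2line -e vmlinux 0xffffffff810e9cda`, avoids the precedence trap.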
1540729553 M * Bertl_oO well, line 72 is the start of the function
1540729564 M * Bertl_oO we want the offset of 0x6A
1540729592 M * arekm (gdb) info line *v4_dev_in_nx_info+0x6a
1540729592 M * arekm warning: Could not find DWO CU kernel/vserver/inet.dwo(0x5968e3d9ab23cf78) referenced by CU at offset 0x495a [in module /tmp/vmlinux-4.9.135-1]
1540729595 M * arekm Line 291 of "/tmp/B.FgMWvw/BUILD/kernel-4.9-4.9.135/linux-4.9/include/linux/spinlock.h" starts at address 0xffffffff810e9cd2 and ends at 0xffffffff810e9ce1 .
1540729744 M * arekm (gdb) info line *(v4_dev_in_nx_info+0x5f)
1540729745 M * arekm Line 70 of "/tmp/B.FgMWvw/BUILD/kernel-4.9-4.9.135/linux-4.9/include/linux/vs_inet.h" starts at address 0xffffffff810e9cc9 and ends at 0xffffffff810e9cd2 .
1540729776 M * arekm and line 70 is v4_addr_in_nx_info: if ((tmask & NXA_LOOPBACK) &&
1540730215 M * arekm now such trace: https://pastebin.com/QYaxgRWm
1540731264 M * arekm Bertl_oO: could you try this? ip rule add from 123.123.123.0/29 table 10 and then reproducer from "so far my reproducer is, two sessions..." above?
1540731299 M * arekm Ghislain: do you use multiple routing tables on the machine where you can reproduce?
1540731468 J * ip_forwa1d ~ip_forwa1@203-222-12-243.veetime.com
1540731495 M * Bertl_oO arekm: please let me know the exact steps you do after the system comes up
1540731503 M * Bertl_oO I'll try to recreate here ...
1540731516 Q * ip_forwa1d Remote host closed the connection
1540731520 M * arekm everything on host:
1540731523 M * arekm ip rule add from 123.123.123.0/29 table 10
1540731528 M * arekm ip a a 1.1.1.1 dev eth0
1540731533 M * arekm ncontext --create --nid 999 /bin/bash
1540731546 M * arekm in separate host session assign ip to nid 999: naddress --add --nid 999 --ip 1.1.1.1
1540731556 M * arekm back inside /bin/bash session: ping 12.12.12.12
1540731661 M * Bertl_oO nothing unusual happens ...
1540731946 M * Bertl_oO can you try if the following script triggers it for you?
1540731948 M * Bertl_oO https://pastebin.com/raw/2SYkBJwf
1540731995 M * Bertl_oO note that initially all interfaces are down and without IPs
1540732007 M * Bertl_oO (which might differ from your setup)
1540732044 M * arekm it differs, lo and eth0 are up and eth0 has assigned public routable IP + default routing is set and I'm logged in over ssh to do "host" stuff
1540732062 M * arekm and system is booted with init=/bin/sh, so nothing else runs
1540732075 M * arekm let me try to simplify things here and test
1540732254 M * Bertl_oO okay, doesn't trigger with lo/eth0 up and IP assigned either
1540732404 M * arekm sshd in, so there will be some traffic?
1540732499 M * Bertl_oO not yet
1540732552 J * cYmen ~cYmen@host86-191-38-251.range86-191.btcentralplus.com
1540732596 Q * cYmen Remote host closed the connection
1540732816 M * arekm here adding public ip, default working routing + script (so ping doesn't return "unreachable") is enough to trigger
1540732902 M * Bertl_oO but this still only works on the 'special' machine, yes?
1540733188 Q * Aiken Remote host closed the connection
1540733367 M * arekm my current reproducer is: https://pastebin.com/wp3DgSFR
1540733934 M * arekm Bertl_oO: tested on other machine that works fine with 4.9 but doesn't use multiple routing tables and boom: https://pastebin.com/KA9H9ud0
1540734040 M * arekm takes about 30-60s for kernel to notice lockup. new ssh connections to host stop working earlier
1540734531 J * MrAlexandr0 ~MrAlexand@198.255.26.2
1540734573 M * Bertl_oO works just fine here, no problem at all
1540734577 Q * MrAlexandr0 Remote host closed the connection
1540734590 M * arekm real hw or qemu?
1540734619 M * Bertl_oO currently kvm, I don't have a test system with recent 4.9
1540734800 J * anthk_ ~anthk_@125-231-108-69.dynamic-ip.hinet.net
1540734817 Q * anthk_ Remote host closed the connection
1540734914 J * deanman ~deanman@205.185.209.115
1540734937 Q * deanman Remote host closed the connection
1540735394 J * GTHaxor ~GTHaxor@5.145.203.170
1540735417 Q * GTHaxor Remote host closed the connection
1540736313 J * L0j1k ~L0j1k@194.187.249.190
1540736318 Q * L0j1k Remote host closed the connection
1540737141 M * arekm Bertl_oO: locked up in virtualbox guest (can't log in, can't ctrl+c but no stack traces though... )
1540737161 M * Bertl_oO nice, can you upload the image for me?
1540737256 M * arekm ftp://ftp1.pld-linux.org/dists/th/PLD/x86_64/RPMS/kernel-4.9-4.9.133-1.x86_64.rpm, can you extract from rpm easily? (if not I'll convert to tgz)
1540737294 M * Bertl_oO rpm is not a problem, but I was talking about the virtualbox image
1540737299 J * Alibaba ~Alibaba@5ED181FA.cm-7-2c.dynamic.ziggo.nl
1540737305 M * arekm ah
1540737338 Q * Alibaba Remote host closed the connection
1540737747 J * tobie ~tobie@p1742003-ipngn15001hodogaya.kanagawa.ocn.ne.jp
1540737759 Q * tobie Remote host closed the connection
1540738029 M * arekm Bertl_oO: http://ixion.pld-linux.org/~arekm/PLD%20Test/
1540738148 M * arekm root/passwd
1540738799 J * martiert_work ~martiert_@148.101.58.209
1540739202 Q * martiert_work Remote host closed the connection
1540739214 J * ChirnoBot ~ChirnoBot@181.166.151.99
1540739237 M * Bertl_oO arekm: okay, tx, got it, will test tonight
1540739259 Q * ChirnoBot Remote host closed the connection
1540739634 M * arekm poldek; install package if you want something installed inside
1540739755 M * Bertl_oO okay, thanks!
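The two-session reproducer arekm described (an extra routing rule, an address on eth0, a new network context, then assigning that address to the context from a second host session before pinging) can be driven from a single script. This is a rough sketch, assuming util-vserver's `ncontext` and `naddress` accept exactly the arguments arekm used in the log and that the host already has its public address and working default route, as in his setup. The `--` separator, the short sleeps standing in for switching terminals, `ping -c 5`, and the names `plan`/`run` are all assumptions for illustration. By default it only prints the commands; with `--run` it executes them, which needs root and, on an affected kernel, may hang the machine as described above.

```python
"""Drive the two-session ncontext/naddress reproducer from one script (sketch)."""
import subprocess
import sys
import time

NID = "999"
IP = "1.1.1.1"
DEV = "eth0"
TARGET = "12.12.12.12"

# Host-side setup, run in the initial (host) context.
SETUP = [
    ["ip", "rule", "add", "from", "123.123.123.0/29", "table", "10"],
    ["ip", "a", "a", IP, "dev", DEV],
]

# "Session A": new network context whose shell waits briefly, then pings.
# Assumption: '--' ends ncontext's option parsing so '-c' reaches /bin/sh,
# and the sleep stands in for switching to the second terminal.
SESSION_A = ["ncontext", "--create", "--nid", NID, "--",
             "/bin/sh", "-c", f"sleep 2; ping -c 5 {TARGET}"]

# "Session B": assign the address to nid 999 from the host side.
SESSION_B = ["naddress", "--add", "--nid", NID, "--ip", IP]


def plan() -> list[list[str]]:
    """Return every command in the order run() issues it."""
    return [*SETUP, SESSION_A, SESSION_B]


def run() -> None:
    for cmd in SETUP:
        subprocess.run(cmd, check=True)
    guest = subprocess.Popen(SESSION_A)    # start the context in the background
    time.sleep(1)                          # give ncontext time to create the nid
    subprocess.run(SESSION_B, check=True)  # should land before the ping starts
    guest.wait()


if __name__ == "__main__":
    if "--run" in sys.argv[1:]:
        run()
    else:
        for cmd in plan():
            print(" ".join(cmd))
```

The sleep-based ordering is racy by design; it only mimics the manual sequence, so interactive sessions remain the more reliable way to reproduce.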
1540740205 J * cougar_ ~cougar_@211.198.66.46
1540740220 Q * cougar_ Remote host closed the connection
1540740717 J * dElAvA ~dElAvA@87.110.160.245
1540740759 Q * dElAvA Remote host closed the connection
1540742105 J * zopieux ~zopieux@170.80.227.120
1540742140 Q * zopieux Remote host closed the connection
1540743203 J * mdo ~mdo@27.34.41.244
1540743221 Q * mdo Remote host closed the connection
1540744582 J * Jupelius ~Jupelius@190.186.119.69
1540745084 Q * Jupelius Ping timeout: 480 seconds
1540746202 J * A|TARIS ~A|TARIS@94.15.238.200
1540746222 Q * A|TARIS Remote host closed the connection
1540746278 J * merpnderp ~merpnderp@176.194.35.197
1540746282 Q * merpnderp Remote host closed the connection
1540747269 J * ephemeron ~ephemeron@abrx239.neoplus.adsl.tpnet.pl
1540747302 Q * ephemeron Remote host closed the connection
1540755492 J * savoir-faire ~savoir-fa@123.21.108.245
1540755493 Q * savoir-faire autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 19:38:12)
1540757327 J * hiyosi ~hiyosi@178.121.226.231
1540757432 J * Aiken ~Aiken@b951.h.jbmb.net
1540757809 Q * hiyosi Ping timeout: 480 seconds
1540758559 J * Juggie ~Juggie@175.214.213.136
1540758560 Q * Juggie autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 20:29:20)
1540758622 J * DrTenma ~DrTenma@81.171.81.127
1540758623 Q * DrTenma autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 20:30:23)
1540760475 J * jvanbure ~jvanbure@121.131.147.201
1540760476 Q * jvanbure autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 21:01:15)
1540762786 J * neet ~neet@anon-62-97.vpn.ipredator.se
1540762847 J * Llamatron2112 ~Llamatron@188.250.5.90
1540762849 Q * Llamatron2112 autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 21:40:49)
1540763619 J * Spec-Chum ~Spec-Chum@177.11.47.48
1540764114 Q * Spec-Chum Ping timeout: 480 seconds
1540765750 J * molz ~molz@194.242.96.25
1540765792 Q * molz Remote host closed the connection
1540765923 J * davido ~davido@148.0.33.47
1540765944 Q * davido autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 22:32:24)
1540766597 J * pizzaops ~pizzaops@64.145.94.71
1540766613 Q * pizzaops autokilled: Spambot. Don't mail support@oftc.net if you think this is in error. (2018-10-28 22:43:33)
1540770695 Q * neet