From: Mathieu Desnoyers
Date: Wed, 2 Oct 2013 00:06:37 +0000 (-0400)
Subject: Fix: urcu-bp segfault in glibc pthread_kill()
X-Git-Tag: v0.9.0~137
X-Git-Url: https://git.liburcu.org/?p=urcu.git;a=commitdiff_plain;h=c1be8fb947a1d56a0f0b2fd82eca6f23c467b0ee;hp=c1be8fb947a1d56a0f0b2fd82eca6f23c467b0ee

Fix: urcu-bp segfault in glibc pthread_kill()

This fixes an issue that appears after this recent urcu-bp fix is
applied:

  Fix: urcu-bp: Bulletproof RCU arena resize bug

Prior to this fix, on Linux at least, the behavior was to allocate (and
leak) one memory map region per reader thread. It worked, except for the
unfortunate leak. The fact that it worked, even though not the way we
had intended it to, is why testing did not raise any red flag.

That state of affairs has prevailed for a long time, but it was masking
other issues. After fixing the underlying bug that was causing the
memory map leak, another issue appears.

The garbage collection scheme reclaiming the thread tracking structures
in urcu-bp fails in stress tests due to a bug in glibc (tested against
glibc 2.13 and 2.17). Under this workload, on a 2-core/hyperthreaded i7:

  ./test_urcu_bp 40 4 10

we can easily trigger a segmentation fault in the pthread_kill() code.

Program terminated with signal 11, Segmentation fault.

Backtrace:

#0  __pthread_kill (threadid=140723681437440, signo=0) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:42
42      ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
(gdb) bt full
#0  __pthread_kill (threadid=140723681437440, signo=0) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:42
        __x =
        pd = 0x7ffcc90b2700
        tid =
        val =
#1  0x0000000000403009 in rcu_gc_registry () at ../../urcu-bp.c:437
        tid = 140723681437440
        ret = 0
        chunk = 0x7ffcca0b8000
        rcu_reader_reg = 0x7ffcca0b8120
        __PRETTY_FUNCTION__ = "rcu_gc_registry"
#2  0x0000000000402b9c in synchronize_rcu_bp () at ../../urcu-bp.c:230
        cur_snap_readers = {next = 0x7ffcb4888cc0, prev = 0x7ffcb4888cc0}
        qsreaders = {next = 0x7ffcb4888cd0, prev = 0x7ffcb4888cd0}
        newmask = {__val = {18446744067267100671, 18446744073709551615 }}
        oldmask = {__val = {0, 140723337334144, 0, 0, 0, 140723690351643, 0,
            140723127058464, 4, 0, 140723698253920, 140723693868864, 4096,
            140723690370432, 140723698253920, 140723059951840}}
        ret = 0
        __PRETTY_FUNCTION__ = "synchronize_rcu_bp"
#3  0x0000000000401803 in thr_writer (_count=0x76b2f0) at test_urcu_bp.c:223
        count = 0x76b2f0
        new = 0x7ffca80008c0
        old = 0x7ffca40008c0
#4  0x00007ffcc9c83f8e in start_thread (arg=0x7ffcb4889700) at pthread_create.c:311
        __res =
        pd = 0x7ffcb4889700
        now =
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140723337336576,
            6546223316613858487, 0, 140723698253920, 140723693868864, 4096,
            -6547756131873848137, -6547872135220034377}, mask_was_saved = 0}},
            priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0,
            cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
        pagesize_m1 =
        sp =
        freesize =
        __PRETTY_FUNCTION__ = "start_thread"
#5  0x00007ffcc99ade1d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

It appears that the memory backing the thread information can be
relinquished by NPTL concurrently with execution of pthread_kill()
targeting an already joined thread and cause this segfault. We were
using pthread_kill(tid, 0) to discover if the target thread was alive or
not, as documented in pthread_kill(3):

  If sig is 0, then no signal is sent, but error checking is still
  performed; this can be used to check for the existence of a thread ID.

but it appears that the glibc implementation is racy.

Instead of using the racy pthread_kill implementation, implement cleanup
using a pthread_key destroy notifier for a dummy key. This notifier is
called whenever a thread exits or is destroyed.

Signed-off-by: Mathieu Desnoyers
---
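
A minimal sketch of the pthread_key destroy notifier approach described
above, to illustrate the mechanism only. It is not the urcu-bp code: every
name in it (cleanup_key, thread_exit_notifier, register_thread, run_demo,
and so on) is hypothetical, and the counter stands in for reclaiming a
per-thread tracking structure. The point it shows is that the notifier runs
in the exiting thread itself, so no pthread_kill(tid, 0) liveness probe is
needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_DEMO_THREADS 64

/* Hypothetical names throughout; this is not the urcu-bp implementation. */
static pthread_key_t cleanup_key;
static pthread_once_t cleanup_key_once = PTHREAD_ONCE_INIT;
static atomic_int nr_cleaned;	/* Counts notifier invocations. */

/*
 * Destroy notifier: invoked in the context of the exiting thread, with the
 * value that thread stored for the key. Real code would unregister and
 * reclaim the tracking structure here (assumption).
 */
static void thread_exit_notifier(void *tracking)
{
	(void) tracking;
	atomic_fetch_add(&nr_cleaned, 1);
}

static void create_cleanup_key(void)
{
	int ret = pthread_key_create(&cleanup_key, thread_exit_notifier);

	assert(!ret);
	(void) ret;
}

/* Called once per thread. A non-NULL value is what arms the notifier. */
static void register_thread(void *tracking)
{
	int ret;

	pthread_once(&cleanup_key_once, create_cleanup_key);
	ret = pthread_setspecific(cleanup_key, tracking);
	assert(!ret);
	(void) ret;
}

static void *reader_thread(void *tracking)
{
	register_thread(tracking);
	return NULL;	/* Thread exit triggers thread_exit_notifier(). */
}

/* Spawn and join nr_threads readers; return how many notifiers ran. */
static int run_demo(int nr_threads)
{
	static int dummy_tracking[MAX_DEMO_THREADS];
	pthread_t tids[MAX_DEMO_THREADS];
	int before = atomic_load(&nr_cleaned);
	int i, ret = 0;

	if (nr_threads > MAX_DEMO_THREADS)
		nr_threads = MAX_DEMO_THREADS;
	for (i = 0; i < nr_threads; i++) {
		ret = pthread_create(&tids[i], NULL, reader_thread,
				&dummy_tracking[i]);
		assert(!ret);
	}
	for (i = 0; i < nr_threads; i++) {
		ret = pthread_join(tids[i], NULL);
		assert(!ret);
	}
	(void) ret;
	return atomic_load(&nr_cleaned) - before;
}
```

Calling run_demo(4) from a main() built with -pthread returns 4: each
thread stores a non-NULL value for the key, so its notifier runs exactly
once before pthread_join() returns. A thread that never stores a value (or
stores NULL) does not trigger the notifier, which is why the key is set at
registration time.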