urcu/annotate: Add CMM annotation

The CMM annotation is highly experimental and not meant to be used by
users for now, even though it is exposed in the public API since some
parts of the liburcu public API require those annotations.

The main primitive is the cmm_annotate_t, which denotes a group of
memory operations associated with a memory barrier. A group follows a
state machine, starting from the `CMM_ANNOTATE_VOID' state. The
following are the only valid transitions:

  CMM_ANNOTATE_VOID -> CMM_ANNOTATE_MB (acquire & release MB)
  CMM_ANNOTATE_VOID -> CMM_ANNOTATE_LOAD (acquire memory)
  CMM_ANNOTATE_LOAD -> CMM_ANNOTATE_MB (acquire MB)

The macro `cmm_annotate_define(name)' can be used to create an
annotation object on the stack. The rest of the `cmm_annotate_*' macros
can be used to change the state of the group after validating that the
transition is allowed. Some of these macros also inject TSAN
annotations to help it understand the flow of events in the program,
since it does not currently support thread fences.

Sometimes, a single memory access does not need to be associated with a
group. In that case, the acquire/release macro variants without the
`group' infix can be used to annotate memory accesses.

Note that TSAN cannot be used on the liburcu-signal flavor. This is
because TSAN hijacks calls to sigaction(3) and installs its own handler,
which delivers the signal to the application at a synchronization
point.

Thus, the usage of TSAN on the signal flavor is undefined behavior.
However, there is at least one known behavior: a deadlock between a
reader that wants to unregister itself by locking the
`rcu_registry_lock' and a writer performing a synchronize RCU, which has
already locked that mutex and holds it until all the registered readers
execute a memory barrier in a signal handler defined by liburcu-signal.
TSAN will not call the registered handler while the reader is waiting on
the mutex. Therefore, the writer spins indefinitely on pthread_kill(3p)
because the reader never completes the handshake.

See the deadlock minimal reproducer below.

Deadlock reproducer:
```
#include <poll.h>
#include <signal.h>
#include <pthread.h>

#define SIGURCU SIGUSR1

static pthread_mutex_t rcu_registry_lock = PTHREAD_MUTEX_INITIALIZER;
static int need_mb = 0;

/* Stands in for a reader unregistering itself: blocks on the registry lock. */
static void *reader_side(void *nil)
{
	(void) nil;

	pthread_mutex_lock(&rcu_registry_lock);
	pthread_mutex_unlock(&rcu_registry_lock);

	return NULL;
}

/* Holds the registry lock (taken in main) until the reader's handler runs. */
static void writer_side(pthread_t reader)
{
	__atomic_store_n(&need_mb, 1, __ATOMIC_RELEASE);
	while (__atomic_load_n(&need_mb, __ATOMIC_ACQUIRE)) {
		pthread_kill(reader, SIGURCU);
		(void) poll(NULL, 0, 1);
	}
	pthread_mutex_unlock(&rcu_registry_lock);

	pthread_join(reader, NULL);
}

/* Completes the handshake by clearing need_mb from the signal handler. */
static void sigrcu_handler(int signo, siginfo_t *siginfo, void *context)
{
	(void) signo;
	(void) siginfo;
	(void) context;

	__atomic_store_n(&need_mb, 0, __ATOMIC_SEQ_CST);
}

static void install_signal(void)
{
	struct sigaction act;

	act.sa_sigaction = sigrcu_handler;
	act.sa_flags = SA_SIGINFO | SA_RESTART;
	sigemptyset(&act.sa_mask);

	(void) sigaction(SIGURCU, &act, NULL);
}

int main(void)
{
	pthread_t th;

	install_signal();

	pthread_mutex_lock(&rcu_registry_lock);
	pthread_create(&th, NULL, reader_side, NULL);

	writer_side(th);

	return 0;
}
```

Change-Id: I9c234bb311cc0f82ea9dbefdf4fee07047ab93f9
Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Olivier Dion <odion@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
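The group state machine described in the urcu/annotate commit above can
be sketched as follows. This is only an illustration of the three listed
transitions, not the liburcu implementation: the `demo_*' names and the
`DEMO_ANNOTATE_*' enumerators are hypothetical stand-ins for the
`CMM_ANNOTATE_*' states, and the TSAN annotations are left out.

```c
#include <stdbool.h>

/* Hypothetical mirror of the three group states named in the commit. */
enum demo_annotate_state {
	DEMO_ANNOTATE_VOID,
	DEMO_ANNOTATE_LOAD,
	DEMO_ANNOTATE_MB,
};

/* Return true only for the three transitions the commit lists as valid. */
static bool demo_transition_allowed(enum demo_annotate_state from,
		enum demo_annotate_state to)
{
	switch (from) {
	case DEMO_ANNOTATE_VOID:
		return to == DEMO_ANNOTATE_MB || to == DEMO_ANNOTATE_LOAD;
	case DEMO_ANNOTATE_LOAD:
		return to == DEMO_ANNOTATE_MB;
	default:
		/* DEMO_ANNOTATE_MB is terminal in this sketch. */
		return false;
	}
}

/* Move the group to `to' if allowed; leave it untouched otherwise. */
static bool demo_annotate_transition(enum demo_annotate_state *group,
		enum demo_annotate_state to)
{
	if (!demo_transition_allowed(*group, to))
		return false;
	*group = to;
	return true;
}
```

A group starting in the VOID state can thus go through LOAD and then MB,
but can never go back to VOID or LOAD once it has reached MB.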
src: use SPDX identifiers

The SPDX identifiers [1] are a legally binding shorthand, which can be
used instead of the full boilerplate text.

This is another step towards implementing the full REUSE spec [2] to
help with copyright and licensing audits and compliance.

This will reduce a lot of the manual work required for the licensing
audit required in Debian on each update.

[1] https://spdx.org/ids-how
[2] https://reuse.software/tutorial/

Change-Id: Ia28ed8c14984ac9acd140ef544fd6e09b96fb03b
Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: call_rcu: teardown default call_rcu worker on application exit

Tear down the default call_rcu worker thread if there are no queued
callbacks on process exit. This prevents leaking memory.

Here is how an application can ensure graceful teardown of this worker
thread:

- An application queuing call_rcu callbacks should invoke
  rcu_barrier() before it exits.
- When chaining call_rcu callbacks, the number of calls to
  rcu_barrier() on application exit must match at least the maximum
  number of chained callbacks.
- If an application chains callbacks endlessly, it would have to be
  modified to stop chaining callbacks when it detects an application
  exit (e.g. with a flag), and wait for quiescence with rcu_barrier()
  after setting that flag.
- The statements above also apply to a library which queues call_rcu
  callbacks, except that it needs to invoke rcu_barrier() in its
  library destructor.

Fixes: #1317
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I40556bc872d3df58a22fb88a0dbb528ce5c9b4af
Fix: urcu-qsbr: futex wait: handle spurious futex wakeups

Observed issue
==============

The urcu-qsbr wait_gp() implements a futex wait/wakeup scheme identical
to the workqueue code, which has an issue with spurious wakeups.

A spurious wakeup on wait_gp can cause wait_gp to return with a
urcu_qsbr_gp.futex state of -1, which is unexpected. It would cause the
following loops in wait_for_readers() to decrement the
urcu_qsbr_gp.futex to values below -1, thus actively using CPU as values
will be decremented to very low negative values until it reaches 0
through underflow, or until the input_readers list is found to be empty.

The state is restored to 0 when the input_readers list is found to be
empty, which restores the futex state to a correct state for the
following calls to wait_for_readers().

This issue will cause spurious unexpected high CPU use, but will not
lead to data corruption.

Cause
=====

From futex(5):

  FUTEX_WAIT
    Returns 0 if the caller was woken up. Note that a wake-up can
    also be caused by common futex usage patterns in unrelated code
    that happened to have previously used the futex word's memory
    location (e.g., typical futex-based implementations of Pthreads
    mutexes can cause this under some conditions). Therefore, callers
    should always conservatively assume that a return value of 0 can
    mean a spurious wake-up, and use the futex word's value (i.e., the
    user-space synchronization scheme) to decide whether to continue
    to block or not.

Solution
========

We therefore need to validate whether the value differs from -1 in
user-space after the call to FUTEX_WAIT returns 0.

Known drawbacks
===============

None.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: I87f7cd3b02820cefe850c3bdb8da27fb2f9be9b2
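The solution described in the urcu-qsbr futex commit above, re-checking
the futex word in user-space whenever the wait returns 0, can be
sketched as below. This is a minimal illustration under stated
assumptions, not the liburcu code: `demo_wait_gp', `demo_futex_wait_fn'
and `demo_fake_wait' are hypothetical names, and the futex system call
is replaced by a function pointer so the loop can be exercised without
a second thread.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a FUTEX_WAIT call: returns 0 when woken up
 * (possibly spuriously), or -1 with errno set (EAGAIN, EINTR, ...).
 */
typedef int (*demo_futex_wait_fn)(int32_t *uaddr, int32_t expected);

/* Block until the futex word differs from -1. */
static void demo_wait_gp(int32_t *futex, demo_futex_wait_fn futex_wait)
{
	while (__atomic_load_n(futex, __ATOMIC_RELAXED) == -1) {
		if (futex_wait(futex, -1) == 0) {
			/* Woken up, maybe spuriously: re-check the value. */
			continue;
		}
		switch (errno) {
		case EAGAIN:
			/* Value already differed from -1. */
			return;
		case EINTR:
			/* Interrupted by a signal: check again. */
			break;
		default:
			/* Unexpected error. */
			abort();
		}
	}
}

/* Fake waiter for demonstration: two spurious wakeups, then a real one. */
static int demo_wakeups;

static int demo_fake_wait(int32_t *uaddr, int32_t expected)
{
	(void) expected;
	if (++demo_wakeups == 3)
		__atomic_store_n(uaddr, 0, __ATOMIC_RELAXED);
	return 0;
}
```

With `demo_fake_wait', the first two returns of 0 leave the word at -1,
so the loop waits again instead of returning to the caller with the
unexpected -1 state.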
Add `urcu_posix_assert()` as `assert()` replacement

This macro acts like the regular `assert()` macro unless NDEBUG is
defined, in which case it consumes the expression and becomes a no-op.

This consumption trick (see the `_urcu_use_expression()` macro) prevents
the compiler from warning about unused variables even when the
`assert()` calls are removed by the NDEBUG define.

This macro is also used for the existing `urcu_assert_debug()` macro.

The implementation of `_urcu_use_expression()` is inspired by the
Babeltrace 2 approach. See the `BT_USE_EXPR()` macro and documentation
in Babeltrace commit [1]:

  commit 1778c2a4134647150b199b2b57130817144446b0
  Author: Philippe Proulx <eeppeliteloop@gmail.com>
  Date:   Tue Apr 21 11:15:42 2020 -0400

      lib: assign a unique ID to each pre/postcond. and report it on failure

All assertion macros are moved to the new urcu/assert.h file.

Link: https://github.com/efficios/babeltrace/commit/1778c2a4134647150b199b2b57130817144446b0 [1]
Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Change-Id: If60ce2d3f45ea8f5ec1dbb92fb43f83fd9f8102b
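The consumption trick mentioned in the `urcu_posix_assert()` commit
above can be illustrated with the sketch below. It assumes one common
way to "use" an expression without evaluating it, namely placing it
inside `sizeof`; `DEMO_USE_EXPRESSION`, `demo_assert` and `demo_half`
are hypothetical names, not the macros from urcu/assert.h.

```c
#include <assert.h>

/*
 * Hypothetical macro: sizeof does not evaluate its operand here, so the
 * expression counts as used for the compiler yet never runs.
 */
#define DEMO_USE_EXPRESSION(_expr)	((void) sizeof((void) (_expr), 0))

#ifdef NDEBUG
/* Assertions disabled: consume the condition, generate no code. */
# define demo_assert(_cond)	DEMO_USE_EXPRESSION(_cond)
#else
# define demo_assert(_cond)	assert(_cond)
#endif

static int demo_half(int value)
{
	int is_even = (value % 2 == 0);	/* only read by the assertion */

	/* No unused-variable warning for is_even, with or without NDEBUG. */
	demo_assert(is_even);
	return value / 2;
}
```

With a plain `assert()`, building with NDEBUG would leave `is_even`
unreferenced and trigger an unused-variable warning; the `sizeof`
wrapper keeps the reference without evaluating the condition.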
Port: no symbols aliases on MacOS

There is no equivalent to symbol aliases on MacOS. This will
unfortunately break the ABI for SONAME(6) and will require a rebuild of
client applications.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Refactor liburcu to support many flavors per compile unit

This refactoring keeps the prior use of liburcu "map" APIs unchanged.
However, it introduces the following new APIs:

Each urcu flavor is now available as its own header:

  include/urcu/urcu-memb.h
  include/urcu/urcu-mb.h
  include/urcu/urcu-signal.h
  include/urcu/urcu-bp.h
  include/urcu/urcu-qsbr.h

The installed urcu headers that were not under the urcu/ subdirectory
are moved there:

  include/urcu-call-rcu.h -> include/urcu/call-rcu.h
  include/urcu-defer.h    -> include/urcu/defer.h
  include/urcu-flavor.h   -> include/urcu/flavor.h
  include/urcu-pointer.h  -> include/urcu/pointer.h
  include/urcu-bp.h       -> include/urcu/urcu-bp.h
  include/urcu.h          -> include/urcu/urcu.h
  include/urcu-qsbr.h     -> include/urcu/urcu-qsbr.h

The liburcu "map" API is now only available for use when URCU_API_MAP
is defined before including the liburcu flavor headers.

The old headers are now placeholders defining URCU_API_MAP and
including the new headers for backward compatibility:

  include/urcu-bp.h
  include/urcu-call-rcu.h
  include/urcu-defer.h
  include/urcu-flavor.h
  include/urcu-pointer.h
  include/urcu-qsbr.h
  include/urcu.h

The header include/urcu/urcu.h now includes the right header between
the memb, signal, or mb flavors based on the compiler defines.

The symbol names of liburcu flavors are cleaned up, favoring the
following hierarchy:

  urcu_<flavor name>_...

This is an ABI-breaking change; however, the previous symbol names are
kept as aliases to maintain backward compatibility. They will be removed
when the next SONAME bump occurs.

The new liburcu-memb.so shared object is introduced, properly
namespacing this flavor. It is a duplicate of the previous liburcu.so,
which is kept around for backward compatibility.

The new URCU_API_MAP macro is introduced, controlling whether the urcu
API "mapping" should stay defined after inclusion of the flavor headers.
Users wishing to use the prior urcu API should either explicitly define
URCU_API_MAP before including the urcu/urcu*.h flavor headers, or
include the flavor header files from the include toplevel directory,
which are placeholders for backward compatibility.

Code using many urcu flavors within the same _LGPL_SOURCE compile unit
should not use the "map" APIs.

Internally, the "map" header files are split into one header per flavor.
The include guards are removed, so their effect can be applied more than
once. A new include/urcu/map/clear.h header is introduced, which
undefines the mappings at the end of the flavor header if URCU_API_MAP
is not set.

The new APIs namespaced for each urcu flavor are the recommended way to
use liburcu. We can expect the prior APIs to eventually become
deprecated over time.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Revert "Use initial-exec tls model"

This reverts commit 6fd172f599e8d798e68974a786dd930d876f182e.

The initial-exec model seems to behave differently than global-dynamic
with respect to lazy initialization, causing locks to be taken the
first time each thread touches the TLS. This introduces deadlocks with
library constructors waiting on other threads.

This will require further investigation.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Use initial-exec tls model

The initial-exec tls model removes the requirement to perform memory
allocation the first time a tls variable is touched by any given thread.
This is needed to ensure usage of the TLS from a signal handler works
fine.

Given that the link-editor figures out the right model to use at
runtime, we can change the tls model without changing the soname major
version.

This also brings interesting speedups over the GD model. This does not
affect TLS accesses performed by executables, but does affect TLS
accesses performed by libraries.

* Executable

(no change)
./test_urcu 1 0 10
SUMMARY /media/truecrypt1/compudj/doc/userspace-rcu/tests/benchmark/.libs/test_urcu testdur 10 nr_readers 1 rdur 0 wdur 0 nr_writers 0 wdelay 0 nr_reads 4420328692 nr_writes 0 nr_ops 4420328692

(with initial-exec)
./test_urcu 1 0 10
SUMMARY /media/truecrypt1/compudj/doc/userspace-rcu/tests/benchmark/.libs/test_urcu testdur 10 nr_readers 1 rdur 0 wdur 0 nr_writers 0 wdelay 0 nr_reads 4424925864 nr_writes 0 nr_ops 4424925864

* Library

(with global-dynamic)
./test_urcu_dynamic_link 1 0 10
SUMMARY /media/truecrypt1/compudj/doc/userspace-rcu/tests/benchmark/.libs/test_urcu_dynamic_link testdur 10 nr_readers 1 rdur 0 wdur 0 nr_writers 0 wdelay 0 nr_reads 573209491 nr_writes 0 nr_ops 573209491

(with initial-exec)
./test_urcu_dynamic_link 1 0 10
SUMMARY /media/truecrypt1/compudj/doc/userspace-rcu/tests/benchmark/.libs/test_urcu_dynamic_link testdur 10 nr_readers 1 rdur 0 wdur 0 nr_writers 0 wdelay 0 nr_reads 1088836185 nr_writes 0 nr_ops 1088836185

Link: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter8-20.html
Link: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attributes.html#Common-Variable-Attributes
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
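For readers unfamiliar with the mechanism, the TLS model of a variable
can be requested with the GCC/Clang `tls_model' variable attribute
documented at the second link of the initial-exec commit above. The
sketch below only shows the attribute syntax; `demo_counter' and
`demo_count_event' are hypothetical names, and this is not claimed to
be how liburcu declared its own TLS variables.

```c
/*
 * GCC/Clang variable attribute selecting the TLS access model.
 * `demo_counter' is a hypothetical per-thread variable.
 */
static __thread unsigned long demo_counter
	__attribute__((tls_model("initial-exec")));

/* Each thread increments and reads its own copy of the counter. */
static unsigned long demo_count_event(void)
{
	return ++demo_counter;
}
```

The model mostly matters when such a variable lives in a shared library,
which is where the benchmark numbers in the commit differ.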
Cleanup: Re-organise source dir

Re-organise the sources: add top-level "src" and "include" dirs and
move the relevant files.

Disable autotools automated includes and define them manually. This
fixes problems with collision of header names with system headers.

Include the autoconf config.h in the default includes and remove it
where it's explicitly included.

Remove the _GNU_SOURCE defines, since it is detected at configure time
for platforms that require it and added to config.h.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>