Fix: change order of _cds_lfht_new_with_alloc parameters The "flavor" parameter should come before the "alloc" parameter to match the order of cds_lfht_new_with_flavor_alloc() parameters. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: Ia704a0fd9cb90af966464e25e6202fed1a952eed
Add support for custom memory allocators for rculfhash The current implementation of rculfhash relies on calloc() to allocate memory for its buckets. This can in some cases lead to latency spikes when accessing the hash table, which can be avoided by using an optimized custom memory allocator. However, there is currently no way of replacing the default allocator with a custom one. This commit allows custom allocators to be used during the table initialization. The default behavior of the hash table remains unaffected, by using the stdlib calloc() and free(), if no custom allocator is given. Signed-off-by: Xenofon Foukas <fon1989@gmail.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: Id9a405e5dc42e5564ff8623394c86056a4d1ff48
Fix -Walloc-size GCC 14 introduces a new -Walloc-size included in -Wextra which gives: ``` urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] workqueue.c:401:20: warning: allocation of insufficient size '1' for type 'struct urcu_workqueue_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] workqueue.c:432:14: warning: allocation of insufficient size '1' for type 'struct urcu_workqueue_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:912:20: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion' with size '16' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] urcu-call-rcu-impl.h:927:22: warning: allocation of insufficient size '1' for type 'struct call_rcu_completion_work' with size '24' [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] qsbr.c:49:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] mb.c:50:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] membarrier.c:50:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] signal.c:49:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] bp.c:49:14: warning: allocation of insufficient size ‘1’ for type ‘struct mynode’ with size ‘40’ [-Walloc-size[https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#index-Walloc-size]] ``` The calloc prototype is: ``` void *calloc(size_t nmemb, size_t size); ``` So, just swap the number of members and size arguments to match the prototype, as we're initialising 1 struct of size `sizeof(struct ...)`. GCC then sees we're not doing anything wrong. Signed-off-by: Sam James <sam@gentoo.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: Id84ce5cf9a1b97bfa942597aa188ef6e27e7c10d
cleanup: use an enum for the error states of nr_cpus_mask Using an enum with labels for error states instead of literal values will make the code easier to read and understand. Change-Id: I4558e17ccb45ab40515bb516af840b2852ee8fc3 Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: urcu-bp: misaligned reader accesses This is a port from a fix in LTTng-UST's embedded urcu (d1a0fad8). The original message follows: Running the LTTng-tools tests (test_valid_filter, for example) under address sanitizer results in the following warning: /usr/include/lttng/urcu/static/urcu-ust.h:155:6: runtime error: member access within misaligned address 0x7fc45db3a020 for type 'struct lttng_ust_urcu_reader', which requires 128 byte alignment 0x7fc45db3a020: note: pointer points here c4 7f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ While the node member of lttng_ust_urcu_reader has an "aligned" attribute of CAA_CACHE_LINE_SIZE, the compiler can't ensure the alignment of members for dynamically allocated instances. The `data` pointer is changed from char* to struct lttng_ust_urcu_reader*, allowing the compiler to enforce the expected alignment constraints. Since `data` was addressed in bytes, the code using this field is adapted to use element counts. As the chunks are only used to allocate reader instances (and not other types), it makes the code a bit easier to read. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: I89ea1c32ca3c5c45621b562ab68f47a8428d3574
rculfhash: Only pass integral types to atomic builtins Clang expects the pointers passed to atomic builtins to be integral. Fix this by casting nodes address to uintptr_t *. Change-Id: Ifb8833c493df849a542a22f0bb2baeeb85be0297 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Complete removal of urcu-signal flavor This commit completes removal of the urcu-signal flavor. Users can migrate to liburcu-memb with a kernel implementing the membarrier(2) system call to have similar read-side performance without requiring use of a reserved signal, and with improved grace period performance. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: I75b9171e705b9b2ef4c8eeabe6164e5587816fb4
Phase 1 of deprecating liburcu-signal The first phase of liburcu-signal deprecation consists of implementing it in term of liburcu-mb. In other words, liburcu-signal is identical to liburcu-mb at the exception of the function symbols and public header files. This is done by: 1) Removing the RCU_SIGNAL specific code in urcu.c 2) Making the RCU_MB specific code also specific to RCU_SIGNAL in urcu.c 3) Rewriting _urcu_signal_read_unlock_update_and_wakeup to use a atomic store with CMM_SEQ_CST instead of a store CMM_RELAXED with cmm_barrier() around it. We could keep the explicit barriers, but that would require to add some cmm_annotate annotations. Therefore, to be less intrusive in a public header file, simply use the CMM_SEQ_CST like for the mb flavor. Change-Id: Ie406f7df2f47da0a9f464df94b968ad9204821f3 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
urcu/annotate: Add CMM annotation The CMM annotation is highly experimental and not meant to be used by user for now, even though it is exposed in the public API since some parts of the liburcu public API require those annotations. The main primitive is the cmm_annotate_t which denotes a group of memory operations associated with a memory barrier. A group follows a state machine, starting from the `CMM_ANNOTATE_VOID' state. The following are the only valid transitions: CMM_ANNOTATE_VOID -> CMM_ANNOTATE_MB (acquire & release MB) CMM_ANNOTATE_VOID -> CMM_ANNOTATE_LOAD (acquire memory) CMM_ANNOTATE_LOAD -> CMM_ANNOTATE_MB (acquire MB) The macro `cmm_annotate_define(name)' can be used to create an annotation object on the stack. The rest of the `cmm_annotate_*' macros can be used to change the state of the group after validating that the transition is allowed. Some of these macros also inject TSAN annotations to help it understand the flow of events in the program since it does not currently support thread fence. Sometime, a single memory access does not need to be associated with a group. In the case, the acquire/release macros variant without the `group' infix can be used to annotate memory accesses. Note that TSAN can not be used on the liburcu-signal flavor. This is because TSAN hijacks calls to sigaction(3) and places its own handler that will deliver the signal to the application at a synchronization point. Thus, the usage of TSAN on the signal flavor is undefined behavior. However, there's at least one known behavior which is a deadlock between readers that want to unregister them-self by locking the `rcu_registry_lock' while a synchronize RCU is made on the writer side which has already locked that mutex until all the registered readers execute a memory barrier in a signal handler defined by liburcu-signal. However, TSAN will not call the registered handler while waiting on the mutex. Therefore, the writer spin infinitely on pthread_kill(3p) because the reader simply never complete the handshake. See the deadlock minimal reproducer below. Deadlock reproducer: ``` #include <poll.h> #include <signal.h> #include <pthread.h> #define SIGURCU SIGUSR1 static pthread_mutex_t rcu_registry_lock = PTHREAD_MUTEX_INITIALIZER; static int need_mb = 0; static void *reader_side(void *nil) { (void) nil; pthread_mutex_lock(&rcu_registry_lock); pthread_mutex_unlock(&rcu_registry_lock); return NULL; } static void writer_side(pthread_t reader) { __atomic_store_n(&need_mb, 1, __ATOMIC_RELEASE); while (__atomic_load_n(&need_mb, __ATOMIC_ACQUIRE)) { pthread_kill(reader, SIGURCU); (void) poll(NULL, 0, 1); } pthread_mutex_unlock(&rcu_registry_lock); pthread_join(reader, NULL); } static void sigrcu_handler(int signo, siginfo_t *siginfo, void *context) { (void) signo; (void) siginfo; (void) context; __atomic_store_n(&need_mb, 0, __ATOMIC_SEQ_CST); } static void install_signal(void) { struct sigaction act; act.sa_sigaction = sigrcu_handler; act.sa_flags = SA_SIGINFO | SA_RESTART; sigemptyset(&act.sa_mask); (void) sigaction(SIGURCU, &act, NULL); } int main(void) { pthread_t th; install_signal(); pthread_mutex_lock(&rcu_registry_lock); pthread_create(&th, NULL, reader_side, NULL); writer_side(th); return 0; } ``` Change-Id: I9c234bb311cc0f82ea9dbefdf4fee07047ab93f9 Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
urcu-wait: Fix wait state load/store The state of a wait node must be accessed atomically. Also, the action of busy loading until the teardown state is seen must follow a CMM_ACQUIRE semantic while storing the teardown must follow a CMM_RELEASE semantic. Change-Id: I9cd9cf4cd9ab2081551d7f33c0b1c23c3cf3942f Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Add CMM memory model Introducing the CMM memory model with the following new primitives: - uatomic_load(addr, memory_order) - uatomic_store(addr, value, memory_order) - uatomic_and_mo(addr, mask, memory_order) - uatomic_or_mo(addr, mask, memory_order) - uatomic_add_mo(addr, value, memory_order) - uatomic_sub_mo(addr, value, memory_order) - uatomic_inc_mo(addr, memory_order) - uatomic_dec_mo(addr, memory_order) - uatomic_add_return_mo(addr, value, memory_order) - uatomic_sub_return_mo(addr, value, memory_order) - uatomic_xchg_mo(addr, value, memory_order) - uatomic_cmpxchg_mo(addr, old, new, memory_order_success, memory_order_failure) The CMM memory model reflects the C11 memory model with an additional CMM_SEQ_CST_FENCE memory order. The memory order can be selected through the enum cmm_memorder. * With Atomic Builtins If configured with atomic builtins, the correspondence between the CMM memory model and the C11 memory model is a one to one at the exception of the CMM_SEQ_CST_FENCE memory order which implies the memory order CMM_SEQ_CST and a thread fence after the operation. * Without Atomic Builtins However, if not configured with atomic builtins, the following stipulate the memory model. For load operations with uatomic_load(), the memory orders CMM_RELAXED, CMM_CONSUME, CMM_ACQUIRE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are allowed. A barrier may be inserted before and after the load from memory depending on the memory order: - CMM_RELAXED: No barrier - CMM_CONSUME: Memory barrier after read - CMM_ACQUIRE: Memory barrier after read - CMM_SEQ_CST: Memory barriers before and after read - CMM_SEQ_CST_FENCE: Memory barriers before and after read For store operations with uatomic_store(), the memory orders CMM_RELAXED, CMM_RELEASE, CMM_SEQ_CST and CMM_SEQ_CST_FENCE are allowed. A barrier may be inserted before and after the store to memory depending on the memory order: - CMM_RELAXED: No barrier - CMM_RELEASE: Memory barrier before operation - CMM_SEQ_CST: Memory barriers before and after operation - CMM_SEQ_CST_FENCE: Memory barriers before and after operation For load/store operations with uatomic_and_mo(), uatomic_or_mo(), uatomic_add_mo(), uatomic_sub_mo(), uatomic_inc_mo(), uatomic_dec_mo(), uatomic_add_return_mo() and uatomic_sub_return_mo(), all memory orders are allowed. A barrier may be inserted before and after the operation depending on the memory order: - CMM_RELAXED: No barrier - CMM_ACQUIRE: Memory barrier after operation - CMM_CONSUME: Memory barrier after operation - CMM_RELEASE: Memory barrier before operation - CMM_ACQ_REL: Memory barriers before and after operation - CMM_SEQ_CST: Memory barriers before and after operation - CMM_SEQ_CST_FENCE: Memory barriers before and after operation For the exchange operation uatomic_xchg_mo(), any memory order is valid. A barrier may be inserted before and after the exchange to memory depending on the memory order: - CMM_RELAXED: No barrier - CMM_ACQUIRE: Memory barrier after operation - CMM_CONSUME: Memory barrier after operation - CMM_RELEASE: Memory barrier before operation - CMM_ACQ_REL: Memory barriers before and after operation - CMM_SEQ_CST: Memory barriers before and after operation - CMM_SEQ_CST_FENCE: Memory barriers before and after operation For the compare exchange operation uatomic_cmpxchg_mo(), the success memory order can be anything while the failure memory order cannot be CMM_RELEASE nor CMM_ACQ_REL and cannot be stronger than the success memory order. A barrier may be inserted before and after the store to memory depending on the memory orders: Success memory order: - CMM_RELAXED: No barrier - CMM_ACQUIRE: Memory barrier after operation - CMM_CONSUME: Memory barrier after operation - CMM_RELEASE: Memory barrier before operation - CMM_ACQ_REL: Memory barriers before and after operation - CMM_SEQ_CST: Memory barriers before and after operation - CMM_SEQ_CST_FENCE: Memory barriers before and after operation Barriers after the operations are only emitted if the compare exchange succeed. Failure memory order: - CMM_RELAXED: No barrier - CMM_ACQUIRE: Memory barrier after operation - CMM_CONSUME: Memory barrier after operation - CMM_SEQ_CST: Memory barriers before and after operation - CMM_SEQ_CST_FENCE: Memory barriers before and after operation Barriers after the operations are only emitted if the compare exchange failed. Barriers before the operation are never emitted by this memory order. Change-Id: I213ba19c84e82a63083f00143a3142ffbdab1d52 Co-authored-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
urcu-wait: Initialize node in URCU_WAIT_NODE_INIT C++ emits warnings with the URCU_WAIT_NODE_INIT() macro because the member node is not initialized. Fix this by initializing the node to null. Change-Id: I7ee3b35624ef61cab826e3668f111e2483ca3c05 Signed-off-by: Olivier Dion <odion@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
src: use SPDX identifiers The SPDX identifiers [1] are a legally binding shorthand, which can be used instead of the full boiler plate text. This is another step towards implementing the full REUSE spec [2] to help with copyright and licensing audits and compliance. This will reduce a lot a manual work required for the licensing audit required in Debian on each update. [1] https://spdx.org/ids-how [2] https://reuse.software/tutorial/ Change-Id: Ia28ed8c14984ac9acd140ef544fd6e09b96fb03b Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Build system: use SPDX identifiers The SPDX identifiers [1] are a legally binding shorthand, which can be used instead of the full boiler plate text. This is the first step towards implementing the full REUSE spec [2] to help with copyright and licensing audits and compliance. This will reduce a lot a manual work required for the licensing audit required in Debian on each update. For files that lacked copyright and licensing information, I used the following guidelines. If a clear author could be determined from the git history use it, otherwise use 'EfficiOS Inc.'. For build system files, use 'MIT', for documentation 'CC-BY-4.0' and for data files 'CC-1.0'. [1] https://spdx.org/ids-how [2] https://reuse.software/tutorial/ Change-Id: Ie507130c00b95606dc439616fda4fd9b1d35353d Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: use __noreturn__ for C11-compatibility The noreturn convenience macro provided by stdnoreturn.h might get included before urcu headers, use __noreturn__ for better compatibility with code using <stdnoreturn.h> header. Signed-off-by: Ondřej Surý <ondrej@sury.org> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix: urcu-bp: only teardown call-rcu worker in destructor Do not invoke urcu_call_rcu_exit() every time a reader thread unregisters from urcu-bp. This causes pthread join hangs observed on Cygwin. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: I4e5c6e06df9966d65f2dcf01bb3281cbfcb05a5b
Fix: rculfhash: urcu_die() takes positive error value Found by Coverity: ** CID 1504537: Error handling issues (NEGATIVE_RETURNS) /src/rculfhash.c: 1934 in do_auto_resize_destroy_cb() Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: I6c130fc18fc36da8d0ed13188cc9415e7ee6104b
Fix: call_rcu: teardown default call_rcu worker on application exit Teardown the default call_rcu worker thread if there are no queued callbacks on process exit. This prevents leaking memory. Here is how an application can ensure graceful teardown of this worker thread: - An application queuing call_rcu callbacks should invoke rcu_barrier() before it exits. - When chaining call_rcu callbacks, the number of calls to rcu_barrier() on application exit must match at least the maximum number of chained callbacks. - If an application chains callbacks endlessly, it would have to be modified to stop chaining callbacks when it detects an application exit (e.g. with a flag), and wait for quiescence with rcu_barrier() after setting that flag. - The statements above apply to a library which queues call_rcu callbacks, only it needs to invoke rcu_barrier in its library destructor. Fixes: #1317 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: I40556bc872d3df58a22fb88a0dbb528ce5c9b4af
Fix: join worker thread in call_rcu_data_free When freeing the call rcu descriptor, join its associated thread as well to ensure complete teardown. Fixes: #1317 Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Change-Id: Ic522e0466f56deca8bde519e082f3d7f99ec2e7f