Cleanup: Re-organise source dir Re-organise the sources: add top-level "src" and "include" directories and move the relevant files into them. Disable the includes that autotools adds automatically and define them manually. This fixes collisions between project header names and system headers. Add the autoconf config.h to the default includes and remove it where it is explicitly included. Remove the _GNU_SOURCE defines, since configure detects the platforms that require it and adds it to config.h. Signed-off-by: Michael Jeanson <mjeanson@efficios.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Fix call_rcu fork handling Fix call_rcu fork handling by putting all call_rcu threads in a quiescent (paused) state before fork(), and unpausing them when the parent returns from fork(). In the child, everything will run fine as long as we don't issue fork() from a call_rcu callback. Side note: pthread_atfork is not appropriate for multithreaded programs that use malloc/free. The glibc malloc implementation sadly expects all malloc/free calls to be executed from the context of a single thread while pthread_atfork handlers are running, which leads to interesting hangs in glibc. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
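The pause/unpause scheme described above can be sketched with plain pthreads. This is a minimal illustration under stated assumptions, not liburcu's code: `struct worker`, `worker_before_fork()`, `worker_after_fork_parent()` and `fork_with_paused_worker()` are hypothetical names, and the worker only yields instead of invoking callbacks. The point is that the forking thread waits until the worker has parked on a condition variable (holding no lock) before calling fork(), and releases it afterwards in the parent.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker state for illustration; not liburcu's data structures. */
struct worker {
	pthread_t tid;
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool pause_requested;	/* set by the thread about to fork */
	bool paused;		/* acknowledged by the worker */
	bool stop;
};

static void *worker_main(void *arg)
{
	struct worker *w = arg;

	pthread_mutex_lock(&w->lock);
	while (!w->stop) {
		if (w->pause_requested) {
			/* Park here; the mutex is released while blocked. */
			w->paused = true;
			pthread_cond_broadcast(&w->cond);
			while (w->pause_requested && !w->stop)
				pthread_cond_wait(&w->cond, &w->lock);
			w->paused = false;
			pthread_cond_broadcast(&w->cond);
			continue;
		}
		pthread_mutex_unlock(&w->lock);
		sched_yield();	/* stand-in for invoking queued callbacks */
		pthread_mutex_lock(&w->lock);
	}
	pthread_mutex_unlock(&w->lock);
	return NULL;
}

static int worker_start(struct worker *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->pause_requested = w->paused = w->stop = false;
	return pthread_create(&w->tid, NULL, worker_main, w);
}

static void worker_stop(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->stop = true;
	pthread_cond_broadcast(&w->cond);
	pthread_mutex_unlock(&w->lock);
	pthread_join(w->tid, NULL);
}

/* Ask the worker to park and wait until it has done so. */
static void worker_before_fork(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->pause_requested = true;
	while (!w->paused)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Let the worker resume once fork() has returned in the parent. */
static void worker_after_fork_parent(struct worker *w)
{
	pthread_mutex_lock(&w->lock);
	w->pause_requested = false;
	pthread_cond_broadcast(&w->cond);
	while (w->paused)
		pthread_cond_wait(&w->cond, &w->lock);
	pthread_mutex_unlock(&w->lock);
}

/* Fork with the worker paused; returns the child's exit status, or -1. */
static int fork_with_paused_worker(struct worker *w)
{
	int status;
	pid_t pid;

	worker_before_fork(w);
	assert(w->paused);
	pid = fork();
	if (pid == 0)
		_exit(w->paused ? 0 : 1);	/* child sees the paused snapshot */
	worker_after_fork_parent(w);
	if (pid < 0 || waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this sketch the child simply exits; a real child would have to recreate its worker thread, since only the forking thread survives fork().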
urcu call_rcu: Use RCU read-side protection for per-cpu call_rcu data A concurrent get_cpu_call_rcu_data(), called by get_call_rcu_data(), could dereference the per_cpu_call_rcu_data pointer without holding any mutex. This situation would happen if we have a concurrent call_rcu() executing while we do the create_all_cpu_call_rcu_data(). I think we would need to put a rcu_dereference() around the per_cpu_call_rcu_data read within get_cpu_call_rcu_data() too. Updates to per_cpu_call_rcu_data should be done with rcu_set_pointer(). Also, a RCU read-side critical section would be required around any usage of per_cpu_call_rcu_data, and tearing down the per-cpu data would require waiting for a quiescent state. So call_rcu users would basically need to be registered as RCU reader threads. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
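The publish, dereference and teardown ordering discussed above can be illustrated without liburcu, using C11 atomics. This is only a sketch under loud assumptions: a global reader counter stands in for real RCU reader registration and grace periods, and `slot`, `slot_publish()`, `slot_dereference()` and `slot_teardown()` are hypothetical names that merely play the roles of per_cpu_call_rcu_data, rcu_set_pointer() and rcu_dereference(). It is far less scalable than RCU, but it shows why teardown must unpublish the pointer first and only free it once no reader can still hold it.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical per-CPU payload; names are illustrative only. */
struct cpu_data {
	int cpu;
};

static _Atomic(struct cpu_data *) slot;	/* shared pointer, like one per-CPU entry */
static atomic_int active_readers;	/* crude stand-in for RCU reader tracking */

static void read_lock(void)
{
	atomic_fetch_add(&active_readers, 1);
}

static void read_unlock(void)
{
	atomic_fetch_sub(&active_readers, 1);
}

/* Reader-side load; plays the role of rcu_dereference(). */
static struct cpu_data *slot_dereference(void)
{
	return atomic_load(&slot);
}

/* Publish a fully initialised object; plays the role of rcu_set_pointer(). */
static int slot_publish(int cpu)
{
	struct cpu_data *expected = NULL;
	struct cpu_data *d = malloc(sizeof(*d));

	if (!d)
		return -1;
	d->cpu = cpu;	/* initialise before the pointer becomes visible */
	if (!atomic_compare_exchange_strong(&slot, &expected, d)) {
		free(d);	/* something is already published */
		return -1;
	}
	return 0;
}

/* Unpublish, wait until no reader can still hold the old pointer, then free. */
static void slot_teardown(void)
{
	struct cpu_data *old = atomic_exchange(&slot, NULL);

	while (atomic_load(&active_readers) > 0)
		sched_yield();	/* stand-in for waiting for a grace period */
	free(old);
}

/* Reader: returns the CPU number, or -1 if no data is published. */
static int read_cpu(void)
{
	struct cpu_data *d;
	int cpu = -1;

	read_lock();
	d = slot_dereference();
	if (d)
		cpu = d->cpu;
	read_unlock();
	return cpu;
}
```

With sequentially consistent atomics, a reader that still sees the old pointer must have incremented the counter before the exchange, so the teardown loop observes it and waits.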
call_rcu: use cpu affinity for per-cpu call_rcu threads I played a bit with the call_rcu() implementation alongside my rbtree tests, and noticed the following: if I use per-cpu call_rcu threads with the URCU_CALL_RCU_RT flag, with only one updater thread for my rbtree (no reader), I get 38365 updates/s. If I add cpu affinity to these per-cpu call_rcu threads (I have prepared a patch that does this), it jumps to 54219 updates/s, roughly a 41% increase. So it looks like keeping per-cpu affinity for the call_rcu threads is a good thing. Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Add call_rcu() interface Adds call_rcu(), with RCU threads to invoke the callbacks. By default, there will be one such RCU thread per process, created the first time that call_rcu() is invoked. On systems supporting sched_getcpu(), it is possible to create one RCU thread per CPU by calling create_all_cpu_call_rcu_data(). This version includes feedback from Mathieu Desnoyers. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
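The "one helper thread per process, created on first use" behaviour described above can be sketched as a deferred-callback queue. This is an illustration under stated assumptions, not liburcu's implementation: `defer_call()`, `defer_barrier()`, `struct defer_head` and `struct item` are hypothetical names, and the grace-period wait that a real call_rcu thread performs before invoking callbacks is reduced to a comment. In liburcu itself, callers embed a `struct rcu_head` in their object and pass it to call_rcu() together with the callback.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not liburcu's implementation. */
struct defer_head {
	struct defer_head *next;
	void (*func)(struct defer_head *head);
};

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t q_work = PTHREAD_COND_INITIALIZER;	/* queue non-empty */
static pthread_cond_t q_idle = PTHREAD_COND_INITIALIZER;	/* a batch finished */
static struct defer_head *q_first, **q_last = &q_first;
static unsigned long q_pending;		/* callbacks queued or running */
static int q_thread_started;

static void *defer_thread(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&q_lock);
	for (;;) {
		struct defer_head *batch, *next;
		unsigned long n = 0;

		while (!q_first)
			pthread_cond_wait(&q_work, &q_lock);
		batch = q_first;	/* detach the whole batch */
		q_first = NULL;
		q_last = &q_first;
		pthread_mutex_unlock(&q_lock);

		/* A real call_rcu thread waits for a grace period here. */
		for (; batch; batch = next, n++) {
			next = batch->next;	/* func may free the node */
			batch->func(batch);
		}

		pthread_mutex_lock(&q_lock);
		q_pending -= n;
		pthread_cond_broadcast(&q_idle);
	}
}

/* Queue func(head) for later invocation; starts the helper thread on first use. */
static int defer_call(struct defer_head *head,
		      void (*func)(struct defer_head *head))
{
	int ret = 0;

	pthread_mutex_lock(&q_lock);
	if (!q_thread_started) {
		pthread_t tid;

		ret = pthread_create(&tid, NULL, defer_thread, NULL);
		if (!ret) {
			pthread_detach(tid);
			q_thread_started = 1;
		}
	}
	if (!ret) {
		head->next = NULL;
		head->func = func;
		*q_last = head;
		q_last = &head->next;
		q_pending++;
		pthread_cond_signal(&q_work);
	}
	pthread_mutex_unlock(&q_lock);
	return ret;
}

/* Block until no callbacks are queued or running. */
static void defer_barrier(void)
{
	pthread_mutex_lock(&q_lock);
	while (q_pending)
		pthread_cond_wait(&q_idle, &q_lock);
	pthread_mutex_unlock(&q_lock);
}

/* Example user object embedding the callback head. */
struct item {
	int value;
	struct defer_head head;
};

static int reclaimed_sum;	/* written only by the helper thread */

static void reclaim_item(struct defer_head *head)
{
	struct item *it = (struct item *)((char *)head - offsetof(struct item, head));

	reclaimed_sum += it->value;
}
```

Embedding the head in the user object means queuing a callback needs no allocation, which is the same design choice the rcu_head-based interface makes.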