Mathieu Desnoyers [Wed, 13 May 2009 17:40:26 +0000 (13:40 -0400)]
Add rewrite of rep_nop
Rewritten from scratch, OK for LGPL license.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 13 May 2009 17:38:50 +0000 (13:38 -0400)]
Remove rep_nop() (GPL)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 13 May 2009 15:21:26 +0000 (11:21 -0400)]
LGPLv2.1 relicensing statement
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 13 May 2009 15:21:02 +0000 (11:21 -0400)]
Apply MIT-style license to compiler.h
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 13 May 2009 15:15:37 +0000 (11:15 -0400)]
Rewrite of likely, unlikely, barrier and ACCESS_ONCE
compiler.h now licensed under MIT-style license.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 13 May 2009 15:09:49 +0000 (11:09 -0400)]
Removing GPL likely, unlikely, ACCESS_ONCE and barrier
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 13 May 2009 13:59:01 +0000 (09:59 -0400)]
Create separate document for relicensing details
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Wed, 13 May 2009 03:43:01 +0000 (23:43 -0400)]
Fix some typos in PowerPC support code.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Tue, 12 May 2009 19:31:55 +0000 (15:31 -0400)]
Remove bogus comment.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Tue, 12 May 2009 19:28:55 +0000 (15:28 -0400)]
Implementation of xchg primitives derived from MIT license
See LICENSE for details.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 20:58:17 +0000 (16:58 -0400)]
Add Paul's URCU model
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 02:53:36 +0000 (22:53 -0400)]
Fix arch_ppc precompiler error
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 02:50:48 +0000 (22:50 -0400)]
Fix precompiler error in arch_*.h, add arch-api test
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 02:24:24 +0000 (22:24 -0400)]
Add final license to test file, cleanup makefile
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 02:22:02 +0000 (22:22 -0400)]
Remove automatically generated api.h from repository
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 02:20:47 +0000 (22:20 -0400)]
Add missing urcu-static.h
This new file should be inly included in LGPL-compatible code.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 02:15:08 +0000 (22:15 -0400)]
LGPL relicensing part 2
Rewrote the non-trivial functions in arch_ppc.h and arch_x86.h.
Added complete header license declarations. Created LICENSE file to explain the
library licence.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 11 May 2009 00:54:43 +0000 (20:54 -0400)]
LGPLv2.1 relicensing
Except the content of arch_x86.h and arch_ppc.h, everything that was not
LGPL-compatible has been either rewritten or was simply a trivial one-liner.
The only content which could still be non-LGPL licensable in this commit is :
arch_ppc.h: atomic_inc()
__xchg_u32()
get_cycles (maybe ?)
arch_x86.h: __xchg()
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sat, 9 May 2009 14:48:38 +0000 (10:48 -0400)]
Add ACCESS_ONCE to _STORE_SHARED
We assume that the compiler does not reorder _STORE_SHARED across each other
and wrt _READ_SHARED. We therefore need to mark this access as volatile. The
compiler cannot reorder volatile accesses.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Sat, 9 May 2009 05:11:49 +0000 (01:11 -0400)]
LGPL relicensing of IBM's contributions
Add comments noting IBM's permission to relicense its contributions to the
urcu.h and urcu.c files under the LGPLv2 license, or any later version.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reviewed-by: Steven L. Bennett <steven.bennett@us.ibm.com>
Mathieu Desnoyers [Fri, 8 May 2009 19:54:54 +0000 (15:54 -0400)]
formal verif : move bits produced declarations closer to processes
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 8 May 2009 18:18:03 +0000 (14:18 -0400)]
Add no sync_core() test to ooo two writes model
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 8 May 2009 18:14:30 +0000 (14:14 -0400)]
Add instruction scheduling model using SSA model
As promised, here is a new model :-) So far the results are very good.
Basically, I model the dependencies between the instructions using a SSA
assignment. It behaves a bit like a Petri network, passing tokens
between the "instructions". When instructions have multiple dependencies
as input, they wait for all their input tokens to be ready before they
execute. It assumes we have an infinite number of registers available to
hold the variables, given this is an "abstract" architecture.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 8 May 2009 15:26:27 +0000 (11:26 -0400)]
Add readme file
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 8 May 2009 14:29:10 +0000 (10:29 -0400)]
Remove arch.h
Created by build scripts.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 8 May 2009 14:21:10 +0000 (10:21 -0400)]
Add missing arch.h
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 27 Apr 2009 20:19:17 +0000 (16:19 -0400)]
Add ooo mem instruction scheduling
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sun, 26 Apr 2009 23:13:58 +0000 (19:13 -0400)]
Remove unneeded signal in the cache-coherent case
the "reader kick" signal is only needed to perform a smp_mc() on readers, which
is pointless in a cache-coherent arch.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Tue, 21 Apr 2009 17:30:39 +0000 (13:30 -0400)]
Minor fix to userspace-rcu Makefile
Small update to the Makefile to allow it to clean up more completely.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Sat, 18 Apr 2009 06:43:32 +0000 (02:43 -0400)]
Support kernels with broken signal delivery
Some Linux kernels seems to have problems sending _all_ the signals. Use a
workaround which re-sends the signal when we wait for it to be executed.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Sat, 18 Apr 2009 06:34:37 +0000 (02:34 -0400)]
Add x86 and ppc arch definitions
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sun, 12 Apr 2009 02:59:42 +0000 (22:59 -0400)]
change reader_data for reader_registry
> > In Figure 9, "reader_data" shouldn't be used as both a variable name
> > and a structure tag. Furthermore, the text needs to explain what it
> > is.
>
> I will defer to Mathieu on this one, though it would certainly prevent
> using this code in a C++ program.
>
Yes, I've been lazy on the naming here. How about :
struct reader_registry {
pthread_t tid;
long *urcu_active_readers;
};
static struct reader_registry *registry;
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Fri, 10 Apr 2009 16:41:19 +0000 (12:41 -0400)]
Split out architecture-dependent definitions into api.h and arch.h.
Break out architecture-specific definitions into api.h and arch.h.
Modify Makefile to add pthreads-x86 and pthreads-ppc targets to adapt
to either architecture. Create api_x86.h, api_ppc.h, arch_x86.h, and
arch_ppc.h files.
The api/arch distinction is historical in nature. In a perfect world,
these would be merged. In reality, the __thread storage class does not
adapt nicely to accessing one thread's variables from another thread,
though sufficiently insane gcc hackery could probably make this work.
In the event, arch.h manually constructs an array of pointers to the
__thread variables that need cross-thread access, while api.h punts
and uses an array for the per-thread variables themselves, paying a
significant performance penalty for so doing.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 8 Apr 2009 00:35:46 +0000 (20:35 -0400)]
Don't mix pthread sleepable lock with busy-waiting locking.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 6 Apr 2009 20:41:20 +0000 (16:41 -0400)]
Use more standard flags
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 6 Apr 2009 20:30:04 +0000 (16:30 -0400)]
Check double write read order
Check the following scenario :
> > P.S.: Regarding very weak memory-ordering models, I recall that in our
> > earlier discussions we considered this scenario:
> > CPU 0 CPU 1
> > ----- -----
> > x = 1; y = 1;
> > mb(); mb();
> > read y read x
> > with the question being whether it was possible for both reads to
> > obtain the value 0. With sufficiently weak ordering it is possible,
> > but it turned out that there are places in the Linux kernel (I don't
> > remember where) that essentially assume this cannot happen. Have you
> > gotten far enough in your outline of the semi-formal model to address
> > this sort of question?
Mathieu Desnoyers [Wed, 1 Apr 2009 02:01:30 +0000 (22:01 -0400)]
Commit urcu verif results
Use make summary is each dir.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 1 Apr 2009 02:01:09 +0000 (22:01 -0400)]
Commit for tests
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 30 Mar 2009 20:52:46 +0000 (16:52 -0400)]
Execute sig handler unconditionnally
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sun, 29 Mar 2009 02:40:33 +0000 (22:40 -0400)]
RCU signal handler reader over reader
Added RCU test for read over read signal handler.
Data structures can now support multiple readers, but it fills my system's
memory (16GB+).
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 19 Mar 2009 20:47:29 +0000 (16:47 -0400)]
remove duplicate ooo_mem statements
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 19 Mar 2009 19:13:12 +0000 (15:13 -0400)]
add missing ooo_mem() to writer model
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
compudj [Thu, 19 Mar 2009 18:05:02 +0000 (14:05 -0400)]
spin model : inline reader
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
compudj [Mon, 2 Mar 2009 16:15:54 +0000 (11:15 -0500)]
Add documentation of urcu
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 27 Feb 2009 19:13:12 +0000 (14:13 -0500)]
Add remote barrier model
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 26 Feb 2009 23:13:46 +0000 (18:13 -0500)]
Fix makefile, set default nesting to 2
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 26 Feb 2009 23:03:24 +0000 (18:03 -0500)]
Default nesting level to 1 (< 2)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 26 Feb 2009 23:01:12 +0000 (18:01 -0500)]
Add reader nesting test
Add readers with configurable nesting level.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 25 Feb 2009 01:58:55 +0000 (20:58 -0500)]
Add independent reader and writer progress checks
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 23 Feb 2009 07:02:54 +0000 (02:02 -0500)]
dual writer fix
Make mutex between writers really atomic. Fixes the two flips run.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 23 Feb 2009 06:31:11 +0000 (01:31 -0500)]
Run 2 writers and show single flip error case
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 23 Feb 2009 05:53:37 +0000 (00:53 -0500)]
Add ooomem and urcu checks
Create a more simple model for ooo memory accesses and basic urcu checks.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Fri, 20 Feb 2009 16:56:20 +0000 (11:56 -0500)]
Restructure urcu_updater() to more accurately reflect actual failure scenario
Restructure urcu_updater() to more accurately reflect actual failure
scenario.
This allows an easier transformation to force failure -- simple #ifdef
out the second counter flip out of urcu_updater()'s model of
"current synchronize_rcu()".
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Fri, 20 Feb 2009 16:55:38 +0000 (11:55 -0500)]
Remove spurious read-side infinite loops.
Remove spurious read-side infinite loops from urcu_reader() model.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 13 Feb 2009 15:37:58 +0000 (10:37 -0500)]
Add _STORE_SHARED() and _LOAD_SHARED()
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 13 Feb 2009 12:25:33 +0000 (07:25 -0500)]
Turn *_REMOTE into *_SHARED
"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>:
I suspect that LOAD_SHARED() and STORE_SHARED() would be more intuitive.
But shouldn't we align with the Linux-kernel usage where reasonable?
(Yes, this can be a moving target, but there isn't much else that
currently supports this level of SMP function on quite the variety of
CPU architectures.)
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 21:53:23 +0000 (16:53 -0500)]
Add missing cpu_relax in loop
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 21:45:19 +0000 (16:45 -0500)]
Fix compiler error.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 21:35:01 +0000 (16:35 -0500)]
Add smp_mc() to force_mb_single_thread so we don't assume anything from the OS.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 21:32:30 +0000 (16:32 -0500)]
Add gitignore files
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 21:29:55 +0000 (16:29 -0500)]
Support architectures with non-coherent caches
Add
wmc(), rmc() and mc() which turns into barrier() or cache flush depending on the
architecture.
mb(), rmb() and wmb() turns into memory barriers or cache flush depending on the
architecture.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 18:42:31 +0000 (13:42 -0500)]
Add support for x86 older than P4, with CONFIG_HAS_FENCE option
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Thu, 12 Feb 2009 18:29:18 +0000 (13:29 -0500)]
Fix warnings in urcutorture and use access once in urch.u
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Thu, 12 Feb 2009 18:25:05 +0000 (13:25 -0500)]
Fix formal model nesting
Also, the original model I sent out has a minor bug that prevents it
from fully modeling the nested-read-side case. The patch below fixes this.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 06:44:34 +0000 (01:44 -0500)]
Add read/write counts to test
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 06:43:54 +0000 (01:43 -0500)]
Fix force_mb_all_threads must be called within internal local
It iterates on threads, and must therefore be called within the mutex.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 06:00:00 +0000 (01:00 -0500)]
Add barriers
Use smp_rmb() is busy-loops to make sure we re-read the variable.
Kick the reader threads we are waiting for after a few loops.
Documents memory ordering in rcu_read_lock().
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 04:52:31 +0000 (23:52 -0500)]
Remove debug yield statements
Just too ugly.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 04:47:15 +0000 (23:47 -0500)]
use smp_*mb()
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Thu, 12 Feb 2009 03:49:22 +0000 (22:49 -0500)]
Add Promela model
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Wed, 11 Feb 2009 21:52:54 +0000 (16:52 -0500)]
Add missing memory barriers to ensure progress and remove an unnecessary ACCESS_ONCE
[...]
So we could have the following reader+writer sequence :
Interleaved writer Sequence 2 and reader seq. 1.
Reader:
R.1 - read urcu_active_readers
S.2 - read counter
Writer:
B - read other threads urcu_active_readers (there are none)
A.1 - flip counter
A.2 - read counter
Reader:
RS.2- write urcu_active_readers, depends on read counter and read
urcu_active_readers
Here, the reader would have updated its counter as belonging to the old
q.s. period, but the writer will later wait for the new period. But
given the writer will eventually do a second flip+wait, the reader in
the other q.s. window will be caught by the second flip.
Therefore, we could be tempted to think that those mb() could be
unnecessary, which would lead to a scheme where urcu_active_readers and
urcu_gp_ctr are done in a completely random order one vs the other.
Let's see what it gives :
synchronize_rcu()
force_mb_all_threads() /*
* Orders pointer publication and
* (urcu_active_readers/urcu_gp_ctr accesses)
*/
switch_qparity()
switch_next_urcu_qparity() [just does counter flip 0->1]
wait_for_quiescent_state()
wait for all threads in parity 0
switch_qparity()
switch_next_urcu_qparity() [Just does counter flip 1->0]
wait_for_quiescent_state()
Wait for all threads in parity 1
force_mb_all_threads() /*
* Orders
* (urcu_active_readers/urcu_gp_ctr accesses)
* and old data removal.
*/
*but* ! There is a reason why we don't want to do this. If
switch_next_urcu_qparity() [Just does counter flip 1->0]
happens before the end of the previous
Wait for all threads in parity 0
We enter in a situation where all newly coming readers will see the
parity bit as 0, although we are still waiting for that parity to end.
We end up in a state when the writer can be blocked forever (no possible
progress) if there are steadily readers subscribed for the data.
Basically, to put it differently, we could simply remove the bit
flipping from the writer and wait for *all* readers to exit their
critical section (even the ones simply interested in the new pointer).
But this shares the same problem the version above has, which is that we
end up in a situation where the writer won't progress if there are
always readers in a critical section.
The same applies to
switch_next_urcu_qparity() [Just does counter flip 0->1]
wait for all threads in parity 0
If we don't put a mb() between those two (as I mistakenly did), we can
end up waiting for readers in parity 0 while the parity bit wasn't
flipped yet. oops. Same potential no-progress situation.
The ordering of memory reads in the reader for
urcu_active_readers/urcu_gp_ctr accesses does not seem to matter because
the data contains information about which q.s. period parity it is in.
In whichever order those variables are read seems to all work fine.
In the end, it's to insure that the writer will always progress that we
have to enforce smp_mb() between *all* switch_next_urcu_qparity and wait
for threads. Mine and yours.
The code in rcu_old_gp_ongoing (in my git tree) uses ACCESS_ONCE around
urcu_active_readers and urcu_gp_ctr reads. I think the ACCESS_ONCE
around urcu_gp_crt is useless because urcu_gp_ctr is only being modified
by ourself and we hold a mutex.
However, making sure that urcu_active_readers is only read once is
clearly required.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Tue, 10 Feb 2009 18:57:53 +0000 (13:57 -0500)]
Enhance test cases
New -n (to disable sleep in writer)
Now use a random delay for -w and -r.
Add -DDEBUG_FULL_MB option to use mb() in the reader rather than the
signal-based alternative.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 19:05:19 +0000 (14:05 -0500)]
Add comment in rcu_add_lock
Comment optimization to make it clear to the reader.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 18:52:41 +0000 (13:52 -0500)]
Fix int->long and keep a reader count of 1 in the global GP variable
- Forgot a few migration from int to long
- Accelerate fast path by reading a GP variable which already has 1 reader
count. This saves an increment in the fast path.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 18:48:38 +0000 (13:48 -0500)]
Fix RCU_GP_CTR_BIT
> > You lost me on this one:
> >
> > sizeof(long) << 2 = 0x10
> >
> > I could believe the following (run on a 32-bit machine):
> >
> > 1 << (sizeof(long) * 8 - 1) = 0x80000000
> >
> > Or, if you were wanting to use a bit halfway up the word, perhaps this:
> >
> > 1 << (sizeof(long) * 4 - 1) = 0x8000
> >
> > Or am I confused?
>
> Well, I am at least partly confused. You were wanting a low-order bit,
> so you want to lose the "- 1" above. Here are some of the possibilities:
>
> sizeof(long) = 0x4
> sizeof(long) << 2 = 0x10
> 1 << (sizeof(long) * 8 - 1) = 0x80000000
> 1 << (sizeof(long) * 4) = 0x10000
> 1 << (sizeof(long) * 4 - 1) = 0x8000
> 1 << (sizeof(long) * 2) = 0x100
> 1 << (sizeof(long) * 2 - 1) = 0x80
>
> My guess is that 1 << (sizeof(long) * 4) and 1 << (sizeof(long) * 2)
> are of the most interest.
>
Exactly. I'll change it to :
I somehow thought this define was used as a bit number rather than the
bit mask.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 18:00:31 +0000 (13:00 -0500)]
Fix data type, should now be long rather than int.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:55:09 +0000 (12:55 -0500)]
Raise the number of nested readers limit 2^16 and 2^32 on 32 and 64-bits arch.
> I don't know of any code in the Linux kernel that nests rcu_read_lock()
> anywhere near that deep. And if someone does find such a case, it is
> pretty easy to use 15 bits rather than 8 to hold the nesting depth, just
> by changing the definition of RCU_GP_CTR_BIT.
>
You know what ? Changing RCU_GP_CTR_BIT to 16 uses a
testw %ax, %ax instead of a testb %al, %al. The trick here is that
RCU_GP_CTR_BIT must be a multiple of 8 so we can use a full 8-bits,
16-bits or 32-bits bitmask for the lower order bits.
On 64-bits, using a RCU_GP_CTR_BIT of 32 is also ok. It uses a testl.
To provide 32-bits compability and allow the deepest nesting possible, I
think it makes sense to use
/* Use the amount of bits equal to half of the architecture long size */
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:58:11 +0000 (12:58 -0500)]
Fix get_cycles for 32-bits x86
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:06:53 +0000 (12:06 -0500)]
don't __USE_GNU in urcu.h
Not required. Let the .c do it.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:05:45 +0000 (12:05 -0500)]
Include pthread.h in urcu.h
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 17:04:15 +0000 (12:04 -0500)]
Add branch prediction, fix xchg for -mtune=core2
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:48:04 +0000 (01:48 -0500)]
Add urcu-asm.c
Small file to get the assembly output of fast path.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:44:01 +0000 (01:44 -0500)]
Add rcutorture with yield
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:31:05 +0000 (01:31 -0500)]
Add rcutorture
Add a modified version of rcutorture.h and api.h.
Used bu urcutorture.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 06:27:48 +0000 (01:27 -0500)]
Fix lock -> unlock in synchronize_rcu
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 05:54:50 +0000 (00:54 -0500)]
Use xchg in publish content
Also changes the publish content parameter. Now takes the pointer itself as
first parameter rather than a pointer to the pointer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 05:29:58 +0000 (00:29 -0500)]
Add rcu_assign_pointer
rcu_assign_pointer has a memory barrier which lets the writer make sure the data
has been properly written to memory before setting the pointer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 04:56:15 +0000 (23:56 -0500)]
Remove parameter from rcu_read_lock()
Also makes the read fast-path twice faster :
7 cycles instead of 14 on a 8-cores x86_64.
Mathieu :
I limited the amount of nested readers to 256. Should be enough and lets us use
testb generically.
Changed the 64-bits code to make it the same as 32-bits. I prefer to have the
exact same behavior on both architectures.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 00:21:34 +0000 (19:21 -0500)]
Add randomness to yield debug test
Add randomness to allow 1/2 odds that the yield will be done.
This lets some paths go quickly and others not.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Mon, 9 Feb 2009 00:01:58 +0000 (19:01 -0500)]
Add DEBUG_YIELD, add test duration
Add some testing calling the scheduler, and add a duration parameter to
test_urcu.c.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sun, 8 Feb 2009 22:10:22 +0000 (17:10 -0500)]
Change API
* rcu_read_lock
> A patch below to allow nested rcu_read_lock() while keeping to the Linux
> kernel API, just FYI. One can argue that the overhead of accessing the
> extra per-thread variables is offset by the fact that there no longer
> needs to be a return value from rcu_read_lock() nor an argument to
> rcu_read_unlock(), but hard to say.
>
I ran your modified version within my benchmarks :
with return value : 14.164 cycles per read
without return value : 16.4017 cycles per read
So we have a 14% performance decrease due to this. We also pollute the
branch prediction buffer and we add a cache access due to the added
variables in the TLS. Returning the value has the clear advantage of
letting the compiler keep it around in registers or on the stack, which
clearly costs less.
So I think the speed factor outweights the visual considerations. Maybe
we could switch to something like :
unsigned int qparity;
urcu_read_lock(&qparity);
...
urcu_read_unlock(&qparity);
That would be a bit like local_irq_save() in the kernel, except that we
could do it in a static inline because we pass the address. I
personnally dislike the local_irq_save() way of hiding the fact that it
writes to the variable in a "clever" macro. I'd really prefer to leave
the " & ".
* rcu_write_lock
Removed.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Sun, 8 Feb 2009 20:42:08 +0000 (15:42 -0500)]
Add timing tests
initial results :
On a 8-cores x86_64
test_rwlock_timing.c : 4339.07 cycles
test_urcu_timing.c : 14.16 cycles
Speedup : 306
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Paul E. McKenney [Sun, 8 Feb 2009 00:00:55 +0000 (19:00 -0500)]
Do two parity flip in the writer to fix race condition
On Sat, Feb 07, 2009 at 07:10:28AM -0800, Paul E. McKenney wrote:
> So, how to fix? Here are some approaches:
>
> o Make urcu_publish_content() do two parity flips rather than one.
> I use this approach in my rcu_rcpg, rcu_rcpl, and rcu_rcpls
> algorithms in CodeSamples/defer.
>
> o Use a single free-running counter, in a manner similar to rcu_nest,
> as suggested earlier. This one is interesting, as I rely on a
> read-side memory barrier to handle the long-preemption case.
> However, if you believe that any thread that waits several minutes
> between executing adjacent instructions must have been preempted
> (which the memory barriers that are required to do a context
> switch), then a compiler barrier suffices. ;-)
>
> Of course, the probability of seeing this failure during test is quite
> low, since it is unlikely that thread 0 would run just long enough to
> execute its signal handler. However, it could happen. And if you were
> to adapt this algorithm for use in a real-time application, then priority
> boosting could cause this to happen naturally.
And here is a patch, taking the first approach. It also exposes a
synchronize_rcu() API that is used by the existing urcu_publish_content()
API. This allows easier handling of structures that are referenced by
more than one pointer. And should also allow to be more easily plugged
into my rcutorture test. ;-)
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Bert Wesarg [Sat, 7 Feb 2009 23:59:29 +0000 (18:59 -0500)]
test_urcu.c: use gettid()
It's probably better to print the tid for each thread, not the pid.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Bert Wesarg [Fri, 6 Feb 2009 11:45:42 +0000 (06:45 -0500)]
URCU : use pthread_equal()
But you should use pthread_equal() for you equality test of pthread_t.
Signed-off-by: Bert Wesarg <bert.wesarg@googlemail.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 6 Feb 2009 11:13:49 +0000 (06:13 -0500)]
Run longer tests
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Mathieu Desnoyers [Fri, 6 Feb 2009 03:25:51 +0000 (22:25 -0500)]
update Makefile, -Wall
Mathieu Desnoyers [Fri, 6 Feb 2009 02:47:01 +0000 (21:47 -0500)]
remove ugly gcc warning removal ack, simply cast the caller parameter
Mathieu Desnoyers [Fri, 6 Feb 2009 02:41:04 +0000 (21:41 -0500)]
add acknowledgements, fix gcc warnings
Mathieu Desnoyers [Fri, 6 Feb 2009 02:14:20 +0000 (21:14 -0500)]
fix wait_for_quiescent_state
This page took 0.04411 seconds and 4 git commands to generate.