urcu.git
11 years agolist: implement cds_list_for_each_safe()
Mathieu Desnoyers [Wed, 13 Mar 2013 16:23:11 +0000 (12:23 -0400)] 
list: implement cds_list_for_each_safe()

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix: tests/api.h use cpuset.h
Mathieu Desnoyers [Fri, 22 Feb 2013 16:34:25 +0000 (11:34 -0500)] 
Fix: tests/api.h use cpuset.h

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix hurd-i386: move cpuset tests outside of sched_setaffinity conditional
Mathieu Desnoyers [Fri, 22 Feb 2013 15:57:48 +0000 (10:57 -0500)] 
Fix hurd-i386: move cpuset tests outside of sched_setaffinity conditional

Comment about introduction of cpuset.h within urcu tests:

> Unfortunately it doesn't work, because sched_setaffinity is for now
> just a fail-stub on hurd-i386, and thus configure considers it as
> missing, and thus the CPU_SET test is disabled completely.
>
> I however guess you could just disable defining your own cpu_set_t
> when !HAVE_SCHED_SETAFFINITY, since it is probably used only for using
> sched_setaffinity.

Fix by moving cpu_set_t, CPU_SET and CPU_ZERO tests outside of the
sched_setaffinity conditional.

Reported-by: Samuel Thibault <sthibault@debian.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix tests: finer-grained use of CPU_SET, CPU_ZERO and cpu_set_t
Mathieu Desnoyers [Fri, 22 Feb 2013 14:05:32 +0000 (09:05 -0500)] 
Fix tests: finer-grained use of CPU_SET, CPU_ZERO and cpu_set_t

Noticed build failure at
https://buildd.debian.org/status/package.php?p=liburcu :

Tail of log for liburcu on hurd-i386:

test_urcu.c:110:0: warning: "CPU_SET" redefined [enabled by default]
In file included from /usr/include/pthread/pthread.h:50:0,
                 from /usr/include/pthread.h:2,
                 from test_urcu.c:26:
/usr/include/sched.h:80:0: note: this is the location of the previous definition
make[3]: *** [test_urcu.o] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all] Error 2
dh_auto_build: make -j1 returned exit code 2
make: *** [build-arch] Error 2
dpkg-buildpackage: error: debian/rules build-arch gave error exit status 2
make[3]: Entering directory `/build/buildd-liburcu_0.7.6-1-hurd-i386-wGBAtt/liburcu-0.7.6/tests'
  CC     test_urcu.o
make[3]: Leaving directory `/build/buildd-liburcu_0.7.6-1-hurd-i386-wGBAtt/liburcu-0.7.6/tests'
make[2]: Leaving directory `/build/buildd-liburcu_0.7.6-1-hurd-i386-wGBAtt/liburcu-0.7.6'

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoTest for CPU_SET
Mathieu Desnoyers [Fri, 22 Feb 2013 13:50:49 +0000 (08:50 -0500)] 
Test for CPU_SET

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix build on architectures with HAVE_SCHED_GETCPU but without HAVE_SYSCONF
Mathieu Desnoyers [Fri, 22 Feb 2013 13:35:37 +0000 (08:35 -0500)] 
Fix build on architectures with HAVE_SCHED_GETCPU but without HAVE_SYSCONF

Noticed on: https://buildd.debian.org/status/package.php?p=liburcu

Tail of log for liburcu on kfreebsd-amd64:

  CC     urcu.lo
In file included from urcu.c:450:0:
urcu-call-rcu-impl.h:145:12: error: static declaration of 'sched_getcpu' follows non-static declaration
In file included from /usr/include/sched.h:43:0,
                 from /usr/include/pthread.h:20,
                 from urcu.c:30:
/usr/include/x86_64-kfreebsd-gnu/bits/sched.h:65:12: note: previous declaration of 'sched_getcpu' was here
make[3]: *** [urcu.lo] Error 1
make[3]: Leaving directory `/build/buildd-liburcu_0.7.6-1-kfreebsd-amd64-nnkICd/liburcu-0.7.6'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/build/buildd-liburcu_0.7.6-1-kfreebsd-amd64-nnkICd/liburcu-0.7.6'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/build/buildd-liburcu_0.7.6-1-kfreebsd-amd64-nnkICd/liburcu-0.7.6'
dh_auto_build: make -j1 returned exit code 2
make: *** [build-arch] Error 2

Tail of log for liburcu on kfreebsd-i386:

  CC     urcu.lo
In file included from urcu.c:450:0:
urcu-call-rcu-impl.h:145:12: error: static declaration of 'sched_getcpu' follows non-static declaration
In file included from /usr/include/sched.h:43:0,
                 from /usr/include/pthread.h:20,
                 from urcu.c:30:
/usr/include/i386-kfreebsd-gnu/bits/sched.h:65:12: note: previous declaration of 'sched_getcpu' was here
make[3]: *** [urcu.lo] Error 1
make[3]: Leaving directory `/build/buildd-liburcu_0.7.6-1-kfreebsd-i386-sWzNKU/liburcu-0.7.6'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/build/buildd-liburcu_0.7.6-1-kfreebsd-i386-sWzNKU/liburcu-0.7.6'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/build/buildd-liburcu_0.7.6-1-kfreebsd-i386-sWzNKU/liburcu-0.7.6'
dh_auto_build: make -j1 returned exit code 2
make: *** [build-arch] Error 2

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoREADME: document that Clang 3.0 (based on LLVM 3.0) is supported
Mathieu Desnoyers [Fri, 22 Feb 2013 13:04:29 +0000 (08:04 -0500)] 
README: document that Clang 3.0 (based on LLVM 3.0) is supported

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoclang: silence "unused expression result" warning
Mathieu Desnoyers [Fri, 22 Feb 2013 12:57:16 +0000 (07:57 -0500)] 
clang: silence "unused expression result" warning

CMM_STORE_SHARED(x, v) is a macro that really acts like an assignment
expression, e.g.:

  x = v;

but internally also has "mc" barriers (useful for cache-incoherent
architectures).

The issue here is that (x = v) can evaluate to "v", but very often we're
not interested in using the assignment expression result. When we have an
explicit assignment, the compiler won't complain that the result of this
expression is unused, but given that the added barrier requires that we
make this macro evaluate explicitly to a value, clang complains.

Fix this by adding "_v = _v" at the last line of the macro, thus
performing what would appear like an effect-less assignment, but
actually tricks clang into thinking we are evaluating to an assignment
expression, thus suppressing the warning.
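
The trick described above can be sketched as follows. This is a minimal
illustration, not the actual macro from the tree: SKETCH_STORE_SHARED,
sketch_smp_wmc and sketch_store_demo are hypothetical names, the barrier
is reduced to a plain compiler barrier, and the code assumes GNU C
statement expressions and __typeof__ (supported by gcc and clang).

```c
#include <assert.h>

/* Hypothetical stand-in for the "mc" barrier: a compiler barrier only. */
#define sketch_smp_wmc()	__asm__ __volatile__ ("" : : : "memory")

/*
 * Store v into x, issue the barrier, then evaluate to the stored value.
 * Ending the statement expression with "_v = _v" makes its last
 * statement an assignment, which is the form the commit message says
 * keeps clang from reporting an unused expression result.
 */
#define SKETCH_STORE_SHARED(x, v)				\
	__extension__						\
	({							\
		__typeof__(x) _v = (v);				\
		*(volatile __typeof__(x) *) &(x) = _v;		\
		sketch_smp_wmc();				\
		_v = _v;					\
	})

static int sketch_store_demo(void)
{
	int shared = 0;
	int result;

	SKETCH_STORE_SHARED(shared, 5);		/* result ignored */
	assert(shared == 5);
	result = SKETCH_STORE_SHARED(shared, 7);	/* result used */
	return shared + result;
}
```

Both call sites compile: the first uses the macro as a statement, the
second uses the value it evaluates to.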

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agorculfhash: add assertions on node alignment
Mathieu Desnoyers [Thu, 14 Feb 2013 16:36:43 +0000 (11:36 -0500)] 
rculfhash: add assertions on node alignment

I've had a report of someone running into issues with the RCU lock-free
hash table by embedding the struct cds_lfht_node into a packed structure
by mistake, thus not respecting alignment requirements stated in
urcu/rculfhash.h. Assertions on "replace" and "add" operations should
catch this, but I notice that we should add assertions on the
REMOVAL_OWNER_FLAG to cover all possible misalignments.
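
As an illustration of the kind of check discussed above, here is a
minimal sketch. It assumes that the three low-order bits of a node
pointer are reserved for flags, so a node must be at least 8-byte
aligned; the mask value and all sketch_* names are hypothetical and are
not taken from rculfhash.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the three low-order pointer bits are reserved for flags. */
#define SKETCH_FLAGS_MASK	((uintptr_t) 0x7)

/* A node address is usable only if none of the flag bits are set. */
static int sketch_is_aligned(const void *node)
{
	return ((uintptr_t) node & SKETCH_FLAGS_MASK) == 0;
}

/* What an operation receiving a node could assert on entry. */
static void sketch_check_node(const void *node)
{
	assert(sketch_is_aligned(node));
}

static int sketch_alignment_demo(void)
{
	static unsigned char buf[32] __attribute__((aligned(8)));

	sketch_check_node(buf);		/* 8-byte aligned: assertion holds */
	/* buf + 1 mimics a node placed at an odd offset by a packed struct. */
	return sketch_is_aligned(buf) && !sketch_is_aligned(buf + 1);
}
```

A node embedded in a packed structure can end up at an address like
buf + 1, where the low bits used as flags would no longer be zero.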

Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoSpelling cleanups within comments and documentation
Etienne Bergeron [Wed, 13 Feb 2013 02:33:16 +0000 (21:33 -0500)] 
Spelling cleanups within comments and documentation

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix configure checks for Tile
Simon Marchi [Tue, 12 Feb 2013 00:10:44 +0000 (19:10 -0500)] 
Fix configure checks for Tile

The previous method of checking whether the architecture is TileGx or
not was buggy. urcu/arch/tile.h included urcu/arch/gcc.h, which was not
installed on the system, causing a configure error. I am not sure why it
worked when I tested commit 1000f1f4204e5fbb337f4ea911f1e29f67df79aa,
maybe some previous partial install or something.

The check is now done earlier, during the configure step and should not
cause any trouble.

Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agouatomic: style fix
Mathieu Desnoyers [Thu, 31 Jan 2013 16:31:39 +0000 (11:31 -0500)] 
uatomic: style fix

- Functions that don't take arguments should be declared with "(void)"
  in C, otherwise those functions can take an unspecified number of
  arguments.
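
A short illustration of the difference, using hypothetical names
(sketch_next, sketch_next_demo). In C standards before C23, an empty
parameter list leaves the arguments unspecified, whereas "(void)" gives
the compiler a prototype to check calls against.

```c
#include <assert.h>

static int sketch_counter;

/*
 * "(void)" makes this a prototype: a call such as sketch_next(42) is
 * rejected at compile time. Declared as "int sketch_next()", the same
 * call would be accepted by compilers following pre-C23 rules.
 */
static int sketch_next(void)
{
	return ++sketch_counter;
}

static void sketch_next_demo(void)
{
	assert(sketch_next() == 1);
	assert(sketch_next() == 2);
}
```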

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agodoc/cds-api.txt: expand documentation
Mathieu Desnoyers [Sat, 26 Jan 2013 15:51:48 +0000 (10:51 -0500)] 
doc/cds-api.txt: expand documentation

Expand explanations, reorder items to have all wait-free descriptions
first, so that the rculfqueue API comes last, since it is less
featureful and is the only API of the queues/stacks to actually rely on
RCU.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoREADME: document each API file
Mathieu Desnoyers [Sat, 26 Jan 2013 15:51:31 +0000 (10:51 -0500)] 
README: document each API file

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoREADME: reorganize
Mathieu Desnoyers [Sat, 26 Jan 2013 15:48:28 +0000 (10:48 -0500)] 
README: reorganize

Move debug build options, and smp support description, to end of README

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoAdd compilation support for the TileGX architecture
Simon Marchi [Thu, 24 Jan 2013 20:40:54 +0000 (15:40 -0500)] 
Add compilation support for the TileGX architecture

This patch adds compilation support for the TileGx architecture. Since
the tests were not run on other architectures of the Tile family
(Tile64, TilePro), errors are triggered during compilation if the
architecture is another Tile arch.

Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack: add nonblocking to _LGPL_SOURCE API
Mathieu Desnoyers [Sun, 20 Jan 2013 21:59:36 +0000 (16:59 -0500)] 
wfstack: add nonblocking to _LGPL_SOURCE API

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoDiscourage use of pthread_atfork() for call_rcu handlers
Mathieu Desnoyers [Wed, 26 Dec 2012 17:18:06 +0000 (12:18 -0500)] 
Discourage use of pthread_atfork() for call_rcu handlers

Discourage use of glibc pthread_atfork() for call_rcu handlers due to
its inappropriate assumptions about single-threadedness while pthread
atfork handlers are executing. This results in hangs within the glibc
memory allocator.

Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix call_rcu fork handling
Mathieu Desnoyers [Wed, 19 Dec 2012 00:31:21 +0000 (19:31 -0500)] 
Fix call_rcu fork handling

Fix call_rcu fork handling by putting all call_rcu threads in a
quiescent state before fork (paused state), and unpausing them when the
parent returns from fork.

On the child, everything will run fine as long as we don't issue fork()
from a call_rcu callback.

Side-note: pthread_atfork is not appropriate for multithreaded programs
that use malloc/free. The glibc malloc implementation sadly expects that
all malloc/free are executed from the context of a single thread while
pthread atfork handlers are running, which leads to interesting hangs in
glibc.
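
The pause/unpause idea can be sketched with plain pthreads. This is only
an illustration under stated assumptions, not the call_rcu code: a
single hypothetical worker thread stands in for the call_rcu threads,
all sk_* names are invented for the example, and the child exits
immediately instead of running callbacks.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t sk_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t sk_cond = PTHREAD_COND_INITIALIZER;
static bool sk_pause_requested, sk_paused, sk_stop;

/* Hypothetical worker: idles, and parks itself when asked to pause. */
static void *sk_worker(void *arg)
{
	(void) arg;
	pthread_mutex_lock(&sk_lock);
	while (!sk_stop) {
		if (sk_pause_requested) {
			sk_paused = true;	/* acknowledge: now quiescent */
			pthread_cond_broadcast(&sk_cond);
			while (sk_pause_requested && !sk_stop)
				pthread_cond_wait(&sk_cond, &sk_lock);
			sk_paused = false;
			pthread_cond_broadcast(&sk_cond);
			continue;
		}
		/* A real worker would invoke queued callbacks here. */
		pthread_cond_wait(&sk_cond, &sk_lock);
	}
	pthread_mutex_unlock(&sk_lock);
	return NULL;
}

/* Request a pause (or resume) and wait until the worker has complied. */
static void sk_set_pause(bool pause)
{
	pthread_mutex_lock(&sk_lock);
	sk_pause_requested = pause;
	pthread_cond_broadcast(&sk_cond);
	while (sk_paused != pause)
		pthread_cond_wait(&sk_cond, &sk_lock);
	pthread_mutex_unlock(&sk_lock);
}

/* Returns the child's exit status, or -1 on error. */
static int sk_fork_with_paused_worker(void)
{
	pthread_t tid;
	pid_t pid;
	int status = -1;

	if (pthread_create(&tid, NULL, sk_worker, NULL))
		return -1;
	sk_set_pause(true);		/* before fork: park the worker */
	pid = fork();
	if (pid == 0)
		_exit(0);		/* child: only the forking thread exists */
	sk_set_pause(false);		/* parent: let the worker run again */
	if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		status = WEXITSTATUS(status);
	else
		status = -1;
	pthread_mutex_lock(&sk_lock);
	sk_stop = true;
	pthread_cond_broadcast(&sk_cond);
	pthread_mutex_unlock(&sk_lock);
	pthread_join(tid, NULL);
	assert(!sk_paused);		/* worker was resumed before exiting */
	return status;
}
```

The pause and resume calls are made explicitly around fork() rather than
from pthread_atfork() handlers, in line with the side-note above.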

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest: fork handling
Mathieu Desnoyers [Tue, 18 Dec 2012 04:43:14 +0000 (23:43 -0500)] 
test: fork handling

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agorculfhash: add cds_lfht_replace to the write operations in the comments
Lai Jiangshan [Thu, 20 Dec 2012 11:13:57 +0000 (06:13 -0500)] 
rculfhash: add cds_lfht_replace to the write operations in the comments

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu: fix comments for cds_list_for_each_prev()
Lai Jiangshan [Thu, 20 Dec 2012 11:13:09 +0000 (06:13 -0500)] 
urcu: fix comments for cds_list_for_each_prev()

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agodocumentation: fix rcu-api.txt duplicates
Mathieu Desnoyers [Mon, 10 Dec 2012 22:24:33 +0000 (17:24 -0500)] 
documentation: fix rcu-api.txt duplicates

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest wfcq: remove unneeded urcu.h include
Mathieu Desnoyers [Sat, 8 Dec 2012 15:16:10 +0000 (10:16 -0500)] 
test wfcq: remove unneeded urcu.h include

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest wfs: remove unneeded urcu.h include
Mathieu Desnoyers [Sat, 8 Dec 2012 15:15:49 +0000 (10:15 -0500)] 
test wfs: remove unneeded urcu.h include

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu: declare test_urcu_multiflavor functions
Lai Jiangshan [Fri, 7 Dec 2012 16:37:21 +0000 (11:37 -0500)] 
urcu: declare test_urcu_multiflavor functions

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu: remove the wrong comma
Lai Jiangshan [Fri, 7 Dec 2012 16:33:38 +0000 (11:33 -0500)] 
urcu: remove the wrong comma

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack: implement nonblocking pop and next
Mathieu Desnoyers [Wed, 5 Dec 2012 14:41:08 +0000 (09:41 -0500)] 
wfstack: implement nonblocking pop and next

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue: document first/next return values
Mathieu Desnoyers [Thu, 6 Dec 2012 21:02:30 +0000 (16:02 -0500)] 
wfcqueue: document first/next return values

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack: update comments about cds_wfs_empty/first being wait-free
Mathieu Desnoyers [Wed, 5 Dec 2012 14:20:52 +0000 (09:20 -0500)] 
wfstack: update comments about cds_wfs_empty/first being wait-free

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack API: rename cds_wfs_first_blocking to cds_wfs_first
Mathieu Desnoyers [Wed, 5 Dec 2012 14:01:21 +0000 (09:01 -0500)] 
wfstack API: rename cds_wfs_first_blocking to cds_wfs_first

cds_wfs_first never needs to block. This operation can be used to check
if the stack returned by pop_all is empty or not, so it is quite
interesting to have fully non-blocking semantics for all of the
push/pop_all/first operations. Only cds_wfs_next may block.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack test: test if number of push to empty vs pop_all match
Mathieu Desnoyers [Wed, 5 Dec 2012 13:57:44 +0000 (08:57 -0500)] 
wfstack test: test if number of push to empty vs pop_all match

Do the same as wfcqueue: we can test whether the number of pushes to an
empty stack matches the number of pop_all calls that return a non-empty
stack.

Can be tested with:
./test_urcu_wfs 5 5 10 -w

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack: document first/next return values
Mathieu Desnoyers [Wed, 5 Dec 2012 13:53:08 +0000 (08:53 -0500)] 
wfstack: document first/next return values

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest wfstack: enforce external mutex if needed by default
Mathieu Desnoyers [Wed, 5 Dec 2012 11:13:08 +0000 (06:13 -0500)] 
test wfstack: enforce external mutex if needed by default

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest wfcqueue: enforce external mutex if needed by default
Mathieu Desnoyers [Wed, 5 Dec 2012 11:12:42 +0000 (06:12 -0500)] 
test wfcqueue: enforce external mutex if needed by default

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-mb/signal/membarrier: batch concurrent synchronize_rcu()
Mathieu Desnoyers [Mon, 26 Nov 2012 03:02:18 +0000 (22:02 -0500)] 
urcu-mb/signal/membarrier: batch concurrent synchronize_rcu()

Here are benchmarks on batching of synchronize_rcu(). Batching leads to
significant scalability improvements and speedups, e.g., on a 24-core
AMD, with a write-heavy scenario (4 reader threads, 20 updater threads,
each updater using synchronize_rcu()):

* Serialized grace periods:

./test_urcu 4 20 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers  20 wdelay      0
nr_reads    714598368 nr_writes      5032889 nr_ops    719631257

* Batched grace periods:

./test_urcu 4 20 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers  20 wdelay      0
nr_reads    611848168 nr_writes      9877965 nr_ops    621726133

For a 9877965/5032889 = 1.96 speedup for 20 updaters.
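
The ratio above can be recomputed from the nr_writes and nr_reads
figures of the two 20-updater runs. The helper names below
(sketch_ratio, sketch_print_ratios) are hypothetical and only serve this
worked example.

```c
#include <assert.h>
#include <stdio.h>

/* Ratio of a counter after the change to the same counter before it. */
static double sketch_ratio(unsigned long long before, unsigned long long after)
{
	assert(before != 0);
	return (double) after / (double) before;
}

static void sketch_print_ratios(void)
{
	/* nr_writes: serialized 5032889, batched 9877965 */
	printf("write ratio: %.2f\n", sketch_ratio(5032889ULL, 9877965ULL));
	/* nr_reads: serialized 714598368, batched 611848168 */
	printf("read ratio:  %.2f\n", sketch_ratio(714598368ULL, 611848168ULL));
}
```

A write ratio above 1 is the updater speedup; a read ratio below 1
corresponds to the reader slowdown in the same pair of runs.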

Of course, we can see that readers have slowed down, probably due to
increased update traffic, given there is no change to the read-side code
whatsoever.

Now let's see the penalty of managing the stack for single-updater.
With 4 readers, single updater:

* Serialized grace periods :

./test_urcu 4 1 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers   1 wdelay      0
nr_reads    241959144 nr_writes     11146189 nr_ops    253105333
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers   1 wdelay      0
nr_reads    257131080 nr_writes     12310537 nr_ops    269441617
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers   1 wdelay      0
nr_reads    259973359 nr_writes     12203025 nr_ops    272176384

* Batched grace periods :

SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers   1 wdelay      0
nr_reads    298926555 nr_writes     14018748 nr_ops    312945303
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers   1 wdelay      0
nr_reads    272411290 nr_writes     12832166 nr_ops    285243456
SUMMARY ./test_urcu               testdur   20 nr_readers   4
rdur       0 wdur      0 nr_writers   1 wdelay      0
nr_reads    267511858 nr_writes     12822026 nr_ops    280333884

Serialized vs batched seem similar, batched possibly even slightly
faster, but this is probably caused by NUMA affinity.

More benchmark results:

* Serialized synchronize_rcu() -- test_urcu (mb)

./test_urcu 4 1 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads    222512859 nr_writes     10723654 nr_ops    233236513
./test_urcu 4 20 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads    722096653 nr_writes      5012429 nr_ops    727109082
./test_urcu 12 12 20
SUMMARY ./test_urcu               testdur   20 nr_readers  12 rdur      0 wdur      0 nr_writers  12 wdelay      0 nr_reads   1822868768 nr_writes      2300787 nr_ops   1825169555
./test_urcu 16 8 20
SUMMARY ./test_urcu               testdur   20 nr_readers  16 rdur      0 wdur      0 nr_writers   8 wdelay      0 nr_reads   2355908375 nr_writes      1604850 nr_ops   2357513225
./test_urcu 20 4 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   4 wdelay      0 nr_reads   3003457459 nr_writes      1074828 nr_ops   3004532287
./test_urcu 20 3 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   3 wdelay      0 nr_reads   2956972543 nr_writes      1036556 nr_ops   2958009099
./test_urcu 20 2 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   2 wdelay      0 nr_reads   2890178860 nr_writes      1030095 nr_ops   2891208955
./test_urcu 20 1 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads   3017482290 nr_writes       783420 nr_ops   3018265710

* Batched synchronize_rcu() -- test_urcu (mb)

./test_urcu 4 1 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads    271476751 nr_writes     12858885 nr_ops    284335636
./test_urcu 4 20 20
SUMMARY ./test_urcu               testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads    608488583 nr_writes     10080610 nr_ops    618569193
./test_urcu 12 12 20
SUMMARY ./test_urcu               testdur   20 nr_readers  12 rdur      0 wdur      0 nr_writers  12 wdelay      0 nr_reads   1260044362 nr_writes      7957711 nr_ops   1268002073
./test_urcu 16 8 20
SUMMARY ./test_urcu               testdur   20 nr_readers  16 rdur      0 wdur      0 nr_writers   8 wdelay      0 nr_reads   2048890674 nr_writes      5440985 nr_ops   2054331659
./test_urcu 20 4 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   4 wdelay      0 nr_reads   2819267217 nr_writes      3093008 nr_ops   2822360225
./test_urcu 20 3 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   3 wdelay      0 nr_reads   3067795320 nr_writes      2817760 nr_ops   3070613080
./test_urcu 20 2 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   2 wdelay      0 nr_reads   3116770603 nr_writes      2404242 nr_ops   3119174845
./test_urcu 20 1 20
SUMMARY ./test_urcu               testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads   2238534130 nr_writes      3737588 nr_ops   2242271718

* Serialized synchronize_rcu() -- test_urcu_signal

./test_urcu_signal 4 1 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  16063309841 nr_writes         9217 nr_ops  16063319058
./test_urcu_signal 4 20 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  16065183739 nr_writes         9182 nr_ops  16065192921
./test_urcu_signal 12 12 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  12 rdur      0 wdur      0 nr_writers  12 wdelay      0 nr_reads  48028512672 nr_writes         8890 nr_ops  48028521562
./test_urcu_signal 16 8 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  16 rdur      0 wdur      0 nr_writers   8 wdelay      0 nr_reads  64001589198 nr_writes         8756 nr_ops  64001597954
./test_urcu_signal 20 4 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   4 wdelay      0 nr_reads  79907434070 nr_writes         9068 nr_ops  79907443138
./test_urcu_signal 20 3 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   3 wdelay      0 nr_reads  79987250839 nr_writes         8589 nr_ops  79987259428
./test_urcu_signal 20 2 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   2 wdelay      0 nr_reads  79749947176 nr_writes         8596 nr_ops  79749955772
./test_urcu_signal 20 1 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  79751023090 nr_writes         8624 nr_ops  79751031714

* Batched synchronize_rcu() -- test_urcu_signal

./test_urcu_signal 4 1 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  15739087241 nr_writes         9218 nr_ops  15739096459
./test_urcu_signal 4 20 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  15662135806 nr_writes        94833 nr_ops  15662230639
./test_urcu_signal 12 12 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  12 rdur      0 wdur      0 nr_writers  12 wdelay      0 nr_reads  46634363289 nr_writes        56903 nr_ops  46634420192
./test_urcu_signal 16 8 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  16 rdur      0 wdur      0 nr_writers   8 wdelay      0 nr_reads  62263951759 nr_writes        39058 nr_ops  62263990817
./test_urcu_signal 20 4 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   4 wdelay      0 nr_reads  77799768623 nr_writes        21065 nr_ops  77799789688
./test_urcu_signal 20 3 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   3 wdelay      0 nr_reads  76408008440 nr_writes        17026 nr_ops  76408025466
./test_urcu_signal 20 2 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   2 wdelay      0 nr_reads  77868927424 nr_writes        12630 nr_ops  77868940054
./test_urcu_signal 20 1 20
SUMMARY ./test_urcu_signal        testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  77293186844 nr_writes         8680 nr_ops  77293195524

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-wait: move queue management code into urcu-wait.h
Mathieu Desnoyers [Mon, 19 Nov 2012 23:16:53 +0000 (18:16 -0500)] 
urcu-wait: move queue management code into urcu-wait.h

Note: urcu-wait.h is not yet exposed outside of userspace RCU.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-wait: move wait code into separate file
Mathieu Desnoyers [Sun, 18 Nov 2012 20:16:43 +0000 (15:16 -0500)] 
urcu-wait: move wait code into separate file

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-qsbr: batch concurrent synchronize_rcu()
Mathieu Desnoyers [Mon, 12 Nov 2012 17:40:12 +0000 (12:40 -0500)] 
urcu-qsbr: batch concurrent synchronize_rcu()

Here are benchmarks on batching of synchronize_rcu(). Batching leads to
significant scalability improvements and speedups, e.g., on a 24-core
AMD, with a write-heavy scenario (4 reader threads, 20 updater threads,
each updater using synchronize_rcu()):

* Serialized grace periods :

./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4
rdur      0 wdur      0 nr_writers  20 wdelay      0
nr_reads  20251412728 nr_writes      1826331 nr_ops  20253239059

* Batched grace periods :

./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4
rdur      0 wdur      0 nr_writers  20 wdelay      0
nr_reads  15141994746 nr_writes      9382515 nr_ops  15151377261

For a 9382515/1826331 = 5.13 speedup for 20 updaters.

Of course, we can see that readers have slowed down, probably due to
increased update traffic, given there is no change to the read-side code
whatsoever.

Now let's see the penalty of managing the stack for single-updater.
With 4 readers, single updater:

* Serialized grace periods :

./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4
rdur      0 wdur      0 nr_writers   1 wdelay      0
nr_reads  19240784755 nr_writes      2130839 nr_ops  19242915594

* Batched grace periods :

./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4
rdur      0 wdur      0 nr_writers   1 wdelay      0
nr_reads  19160162768 nr_writes      2253068 nr_ops  1916241583

2253068 vs 2137036 -> a couple of runs show that this difference is lost
in the noise for single updater.

More benchmark results:

* Serialized synchronize_rcu() -- test_urcu_qsbr

./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  18841016559 nr_writes      1857130 nr_ops  18842873689
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  20272811733 nr_writes      1837027 nr_ops  20274648760
./test_urcu_qsbr 12 12 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  12 rdur      0 wdur      0 nr_writers  12 wdelay      0 nr_reads  60343516643 nr_writes      2353685 nr_ops  60345870328
./test_urcu_qsbr 16 8 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  16 rdur      0 wdur      0 nr_writers   8 wdelay      0 nr_reads  78202711840 nr_writes      2326331 nr_ops  78205038171
./test_urcu_qsbr 20 4 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   4 wdelay      0 nr_reads  94553396003 nr_writes      2238396 nr_ops  94555634399
./test_urcu_qsbr 20 3 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   3 wdelay      0 nr_reads  95004708661 nr_writes      2165966 nr_ops  95006874627
./test_urcu_qsbr 20 2 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   2 wdelay      0 nr_reads  95386506198 nr_writes      2194352 nr_ops  95388700550
./test_urcu_qsbr 20 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  84705972017 nr_writes      2609595 nr_ops  84708581612

* Batched synchronize_rcu() -- test_urcu_qsbr

./test_urcu_qsbr 4 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  19154850714 nr_writes      2238834 nr_ops  19157089548
./test_urcu_qsbr 4 20 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers   4 rdur      0 wdur      0 nr_writers  20 wdelay      0 nr_reads  15114131760 nr_writes      9370255 nr_ops  15123502015
./test_urcu_qsbr 12 12 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  12 rdur      0 wdur      0 nr_writers  12 wdelay      0 nr_reads  45541854970 nr_writes      5786496 nr_ops  45547641466
./test_urcu_qsbr 16 8 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  16 rdur      0 wdur      0 nr_writers   8 wdelay      0 nr_reads  66217337547 nr_writes      4257427 nr_ops  66221594974
./test_urcu_qsbr 20 4 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   4 wdelay      0 nr_reads  95048642908 nr_writes      2416266 nr_ops  95051059174
./test_urcu_qsbr 20 3 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   3 wdelay      0 nr_reads  96679609928 nr_writes      2211168 nr_ops  96681821096
./test_urcu_qsbr 20 2 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   2 wdelay      0 nr_reads  92166219811 nr_writes      1968725 nr_ops  92168188536
./test_urcu_qsbr 20 1 20
SUMMARY ./test_urcu_qsbr          testdur   20 nr_readers  20 rdur      0 wdur      0 nr_writers   1 wdelay      0 nr_reads  87986181951 nr_writes      3278737 nr_ops  87989460688

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotests: use standard malloc/free for synchronize_rcu()
Mathieu Desnoyers [Mon, 12 Nov 2012 14:07:34 +0000 (09:07 -0500)] 
tests: use standard malloc/free for synchronize_rcu()

Allows removing the mutex from tests, which allows testing scalability
of concurrent synchronize_rcu() executions.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-bp: move quiescent threads to separate list
Mathieu Desnoyers [Mon, 12 Nov 2012 03:33:34 +0000 (22:33 -0500)] 
urcu-bp: move quiescent threads to separate list

Accelerate 2-phase grace period by not having to iterate twice on
threads not within an RCU read-side critical section.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-mb/signal/membarrier: move quiescent threads to separate list
Mathieu Desnoyers [Mon, 12 Nov 2012 03:32:28 +0000 (22:32 -0500)] 
urcu-mb/signal/membarrier: move quiescent threads to separate list

Accelerate 2-phase grace period by not having to iterate twice on
threads not nested within an RCU read-side lock.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-qsbr: move offline threads to separate list
Mathieu Desnoyers [Mon, 12 Nov 2012 03:31:28 +0000 (22:31 -0500)] 
urcu-qsbr: move offline threads to separate list

Accelerate 2-phase grace period by not having to iterate on offline
threads twice.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-bp: improve 2-phase wait scheme
Mathieu Desnoyers [Mon, 12 Nov 2012 02:44:59 +0000 (21:44 -0500)] 
urcu-bp: improve 2-phase wait scheme

In the single-bit, 2-phase grace period scheme, all we need to do is to
observe each reader going through a quiescent state while we are in the
grace period.

We therefore only need to perform one global counter update, surrounded
by 2 iterations on readers to observe change in their snapshot.

We can therefore remove the first counter update (prior to the first
iteration on readers): it was useless and was only slowing down the
grace period.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
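
The 2-phase wait described above can be illustrated with a small
single-threaded model. This is only a sketch under stated assumptions:
every toy_* name is hypothetical, memory barriers and the reader registry
are omitted, and the "relax" hook stands in for whatever the writer does
while waiting. It shows a single counter update surrounded by two
iterations over the readers.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_GP_PHASE 1UL		/* hypothetical single phase bit */

struct toy_reader {
	bool in_critical_section;	/* true between read lock and unlock */
	unsigned long snapshot;		/* copy of toy_gp_ctr taken at read lock */
};

static unsigned long toy_gp_ctr;	/* global grace-period counter */
static unsigned long toy_gp_updates;	/* number of counter updates done */

/*
 * A reader holds up the writer only if it is inside a critical section
 * whose snapshot phase differs from the current global phase.
 */
static bool toy_reader_ongoing(const struct toy_reader *r)
{
	return r->in_critical_section &&
		((r->snapshot ^ toy_gp_ctr) & TOY_GP_PHASE) != 0;
}

/* Loop until no reader is ongoing, calling relax() between scans. */
static void toy_wait_for_readers(struct toy_reader *readers, size_t n,
		void (*relax)(struct toy_reader *, size_t))
{
	for (;;) {
		size_t i, ongoing = 0;

		for (i = 0; i < n; i++)
			if (toy_reader_ongoing(&readers[i]))
				ongoing++;
		if (!ongoing)
			return;
		relax(readers, n);
	}
}

/* One counter update, surrounded by two iterations on readers. */
static void toy_synchronize(struct toy_reader *readers, size_t n,
		void (*relax)(struct toy_reader *, size_t))
{
	toy_wait_for_readers(readers, n, relax);	/* iteration 1 */
	toy_gp_ctr ^= TOY_GP_PHASE;			/* single update */
	toy_gp_updates++;
	toy_wait_for_readers(readers, n, relax);	/* iteration 2 */
}

/* Demo relax hook: every reader leaves its critical section. */
static void toy_relax_all_exit(struct toy_reader *readers, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++)
		readers[i].in_critical_section = false;
}
```

In this model, a reader that entered its critical section before the
phase flip only becomes "ongoing" in the second iteration, which is where
the writer actually waits for it.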
11 years agourcu-mb/signal/membarrier: improve 2-phase wait scheme
Mathieu Desnoyers [Mon, 12 Nov 2012 02:44:20 +0000 (21:44 -0500)] 
urcu-mb/signal/membarrier: improve 2-phase wait scheme

In the single-bit, 2-phase grace period scheme, all we need to do is to
observe each reader going through a quiescent state while we are in the
grace period.

We therefore only need to perform one global counter update, surrounded
by 2 iterations on readers to observe change in their snapshot.

We can therefore remove the first counter update (prior to the first
iteration on readers): it was useless and was only slowing down the
grace period.

CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
CC: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-qsbr: improve 2-phase wait scheme
Mathieu Desnoyers [Mon, 12 Nov 2012 02:40:23 +0000 (21:40 -0500)] 
urcu-qsbr: improve 2-phase wait scheme

In the single-bit, 2-phase grace period scheme, all we need to do is to
observe each reader going through a quiescent state while we are in the
grace period.

We therefore only need to perform one global counter update, surrounded
by 2 iterations on readers to observe change in their snapshot.

We can therefore remove the first counter update (prior to the first
iteration on readers): it was useless and was only slowing down the
grace period.

Suggested-by: Alan Stern <stern@rowland.harvard.edu>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue: implement mutex-free splice
Mathieu Desnoyers [Tue, 20 Nov 2012 02:45:04 +0000 (21:45 -0500)] 
wfcqueue: implement mutex-free splice

A carefully crafted splice operation does not need to use an external
mutex to synchronize against other splice operations.

The trick is to atomically exchange the head next pointer with
NULL. If the pointer we replaced was NULL, it means the queue was
possibly empty. If head next was not NULL, by setting head to NULL, we
ensure that concurrent splice operations are going to see an empty
queue, even if concurrent enqueue operations move tail further. This
means that as long as we are within splice, after setting head to NULL,
but before moving tail back to head, concurrent splice operations will
always see an empty queue, therefore acting as mutual exclusion.

If exchange returns a NULL head, we confirm that it was indeed empty by
checking if the tail pointer points to the head node, busy-waiting if
necessary.

Then the last step is to move the tail pointer to head. At that point,
enqueuers are going to start enqueuing at head again, and other splice
operations will be able to proceed.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
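
The steps above can be sketched with C11 atomics. This is a simplified
model, not the wfcqueue implementation: the toy_* types and functions are
hypothetical, default sequentially consistent ordering is used everywhere,
and the detached chain is returned to the caller instead of being appended
to a destination queue.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
	_Atomic(struct toy_node *) next;
};

struct toy_queue {
	struct toy_node head;			/* dummy head node */
	_Atomic(struct toy_node *) tail;	/* points to head when empty */
};

static void toy_queue_init(struct toy_queue *q)
{
	atomic_init(&q->head.next, NULL);
	atomic_init(&q->tail, &q->head);
}

/* Enqueue: swing tail first, then link the previous tail to the node. */
static void toy_enqueue(struct toy_queue *q, struct toy_node *n)
{
	struct toy_node *old_tail;

	atomic_store(&n->next, NULL);
	old_tail = atomic_exchange(&q->tail, n);
	atomic_store(&old_tail->next, n);
}

/*
 * Detach the whole content of src without an external mutex.
 * Returns false if src was empty, true with [*first, *last] otherwise.
 */
static bool toy_detach_all(struct toy_queue *src,
		struct toy_node **first, struct toy_node **last)
{
	struct toy_node *head;

	for (;;) {
		/* Exchange head next with NULL: others now see "empty". */
		head = atomic_exchange(&src->head.next, NULL);
		if (head)
			break;
		/* NULL head: really empty only if tail points to head. */
		if (atomic_load(&src->tail) == &src->head)
			return false;
		/* An enqueue is in progress: busy-wait and retry. */
	}
	/* Move tail back to head: enqueuers restart at head. */
	*last = atomic_exchange(&src->tail, &src->head);
	*first = head;
	return true;
}
```

A caller walking the detached chain from *first to *last would still need
to busy-wait on an intermediate NULL next pointer, since an enqueuer may
have swung the tail without having linked its node yet.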
11 years agowfcqueue: document empty criterion
Mathieu Desnoyers [Tue, 20 Nov 2012 03:36:05 +0000 (22:36 -0500)] 
wfcqueue: document empty criterion

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-call-rcu: use wait-free splice return value
Mathieu Desnoyers [Tue, 20 Nov 2012 10:28:42 +0000 (05:28 -0500)] 
urcu-call-rcu: use wait-free splice return value

We can now use the splice return value to know if the source queue was
empty rather than testing for destination queue emptiness after the
splice operation.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest wfcqueue: add tests for queue state return value
Mathieu Desnoyers [Sun, 18 Nov 2012 15:35:35 +0000 (10:35 -0500)] 
test wfcqueue: add tests for queue state return value

with e.g. ./test_urcu_wfcq 2 2 10 -w

we can confirm that we see as many "enqueue to empty queue" as we see
"splice from non-empty queue", which confirms that the queue state
returned by enqueue is indeed sampled atomically with enqueue.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
11 years agowfcqueue: enqueue and splice return queue state
Mathieu Desnoyers [Sun, 18 Nov 2012 15:31:35 +0000 (10:31 -0500)] 
wfcqueue: enqueue and splice return queue state

enqueue can return whether the queue was empty or not prior to enqueue.

splice can return this information about destination queue too, but
there are more cases to handle, because we don't touch the destination
queue if the source queue was empty, and in the nonblocking case, we
return that we would need to block on the source queue.

The destination queue state is sampled atomically with enqueue/splice to
destination operations.

Knowing this state is useful when "ownership" on a batch of queue items
can be assigned to those enqueuing the first items, e.g. to implement
wait/wakeup schemes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
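
A minimal sketch of the wait/wakeup use case mentioned above, assuming a
toy queue written with C11 atomics. All toy_* names are hypothetical, and
the wakeup is reduced to a counter. The state returned by enqueue comes
from the same atomic exchange that publishes the node, so only the
enqueuer that found the queue empty takes "ownership" of the wakeup.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
	_Atomic(struct toy_node *) next;
};

struct toy_queue {
	struct toy_node head;			/* dummy head node */
	_Atomic(struct toy_node *) tail;	/* points to head when empty */
};

static unsigned int toy_wakeups;	/* counts simulated consumer wakeups */

static void toy_queue_init(struct toy_queue *q)
{
	atomic_init(&q->head.next, NULL);
	atomic_init(&q->tail, &q->head);
}

/* Returns true if the queue was non-empty prior to this enqueue. */
static bool toy_enqueue(struct toy_queue *q, struct toy_node *n)
{
	struct toy_node *old_tail;

	atomic_store(&n->next, NULL);
	old_tail = atomic_exchange(&q->tail, n);
	atomic_store(&old_tail->next, n);
	return old_tail != &q->head;
}

/* Only the producer that enqueued into an empty queue wakes the consumer. */
static void toy_produce(struct toy_queue *q, struct toy_node *n)
{
	if (!toy_enqueue(q, n))
		toy_wakeups++;	/* stand-in for a futex or condvar signal */
}
```

With three items produced before the consumer runs, this model issues a
single wakeup for the whole batch.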
11 years agoFix: wfcqueue nonblocking dequeue
Mathieu Desnoyers [Tue, 20 Nov 2012 04:22:50 +0000 (23:22 -0500)] 
Fix: wfcqueue nonblocking dequeue

Failures were not handled in the nonblocking dequeue implementation.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue: Fix lock and unlock functions
Paul E. McKenney [Fri, 16 Nov 2012 03:07:03 +0000 (22:07 -0500)] 
wfcqueue: Fix lock and unlock functions

The current implementation of cds_wfcq_dequeue_lock() and
cds_wfcq_dequeue_unlock() entails mutually assured recursion.
Redirect to _cds_wfcq_dequeue_lock() and _cds_wfcq_dequeue_unlock(),
respectively.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoruntests: Make path of time binary configurable
Simon Marchi [Thu, 15 Nov 2012 14:29:42 +0000 (09:29 -0500)] 
runtests: Make path of time binary configurable

I work on a platform that does not come with a time program. This patch
makes it possible to specify the path of the time binary or not use it
if none is available.

If the URCU_TEST_TIME_BIN environment variable exists and is executable,
it is used. Otherwise it tries with /usr/bin/time, the most common
location. If it is not there, the tests are run without timing info.

[ Edit by Mathieu Desnoyers: use "." instead of "source" (no bash-ism),
  edit commit about check for emptiness vs definition to match the code. ]

Signed-off-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agourcu-qsbr: skip Q.S. reporting if already reported
Mathieu Desnoyers [Sun, 11 Nov 2012 16:20:07 +0000 (11:20 -0500)] 
urcu-qsbr: skip Q.S. reporting if already reported

We can skip both memory barriers and store reporting quiescent state if
we notice we already reported Q.S. for the current value of
"rcu_gp_ctr".

It covers the two implementations of QSBR:

* 64-bit architecture: we assume the counter never overflows, and
  therefore only perform one increment followed by waiting for readers.
  In this scenario, we don't care if the rcu_gp_ctr load is moved into
  the prior read-side critical section, as long as the
  URCU_TLS(rcu_reader).ctr store is ordered.

* 32-bit architecture: given the 32-bit counter could overflow,
  we rely on a 2-phase approach, using a single bit: we flip
  the rcu_gp_ctr bit, then wait to observe that all readers have
  taken a copy of the new rcu_gp_ctr. We flip it again, and wait until
  we observe that all readers have copied its new value. We are then
  certain that each reader necessarily passed through a quiescent state
  during the grace period (and that Q.S. was not located prior to our
  grace period). This scheme works even if the rcu_gp_ctr load is moved
  into the prior read-side critical section, as long as store to
  URCU_TLS(rcu_reader).ctr is ordered with respect to other memory
  accesses within that thread.

Suggested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
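
The fast path described above can be sketched as follows. This is an
illustration only: toy_gp_ctr and toy_reader_ctr are hypothetical stand-ins
for "rcu_gp_ctr" and URCU_TLS(rcu_reader).ctr, the fences approximate the
two memory barriers, and the wakeup of a waiting writer is left out. In
the real library the per-thread counter is also visible to the writer,
which plain thread-local storage does not model.

```c
#include <stdatomic.h>

static _Atomic unsigned long toy_gp_ctr = 1;		/* global counter */
static _Thread_local _Atomic unsigned long toy_reader_ctr; /* last reported */
static _Thread_local unsigned long toy_reports;	/* slow-path executions */

static void toy_quiescent_state(void)
{
	unsigned long gp_ctr;

	gp_ctr = atomic_load_explicit(&toy_gp_ctr, memory_order_relaxed);
	/* Already reported Q.S. for this counter value: skip everything. */
	if (gp_ctr == atomic_load_explicit(&toy_reader_ctr,
			memory_order_relaxed))
		return;
	/* Slow path: barrier, store of the new snapshot, barrier. */
	atomic_thread_fence(memory_order_seq_cst);
	atomic_store_explicit(&toy_reader_ctr, gp_ctr, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	toy_reports++;
}
```

Calling toy_quiescent_state() repeatedly without any writer activity only
pays for two relaxed loads after the first call.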
11 years agoFix TLS detection: test with linker, add --disable-compiler-tls
Mathieu Desnoyers [Fri, 9 Nov 2012 02:45:04 +0000 (21:45 -0500)] 
Fix TLS detection: test with linker, add --disable-compiler-tls

NetBSD 5.1 and older, as well as Darwin, can compile code containing
TLS, but cannot link it. Test with linker in addition to
compiler for TLS support.

Also add a --disable-compiler-tls configure option to allow users to
force using the pthread getspecific fall back.

Fixes #288

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoCleanup: cast pthread_self() return value to unsigned long
Mathieu Desnoyers [Thu, 8 Nov 2012 20:28:59 +0000 (15:28 -0500)] 
Cleanup: cast pthread_self() return value to unsigned long

pthread_t can map to other things than unsigned long (e.g. pointer).
Cast it to unsigned long for debug printing and for debug delay random
value purposes.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
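
A minimal sketch of the cast for debug printing. The helper name is
hypothetical, and the cast assumes pthread_t is an arithmetic or pointer
type on the target platform; it would not compile where pthread_t is a
structure.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

/* Format the calling thread's id for debug output. */
static int toy_format_thread_id(char *buf, size_t len)
{
	/* pthread_t is opaque: cast explicitly to match the %lu format. */
	return snprintf(buf, len, "%lu", (unsigned long) pthread_self());
}
```

The printed value is only meant to tell threads apart in debug traces; it
is not guaranteed to be meaningful beyond that.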
11 years agoFallback mechanism not working on platform where TLS is unsupported
Christian Babeux [Thu, 8 Nov 2012 19:30:08 +0000 (14:30 -0500)] 
Fallback mechanism not working on platform where TLS is unsupported

The CONFIG_RCU_TLS entry in config.h.in is defined by default to "TLS".
This has the unfortunate consequence of defining CONFIG_RCU_TLS on
platform where TLS is unsupported and effectively disabling the pthread
based fallback mechanism. This macro should be #undef by default and the
AX_TLS m4 macro will properly detect if TLS is supported.

Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoRevert "Fix: cross-build: configure.ac should use --target, not --host"
Mathieu Desnoyers [Wed, 7 Nov 2012 20:22:57 +0000 (15:22 -0500)] 
Revert "Fix: cross-build: configure.ac should use --target, not --host"

This reverts commit 1eade46a854eb8211be9fd32e0cf6835576deb63.

No. --target is for building cross-compilers. --host was appropriate.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix: cross-build: configure.ac should use --target, not --host
Mathieu Desnoyers [Wed, 7 Nov 2012 20:09:28 +0000 (15:09 -0500)] 
Fix: cross-build: configure.ac should use --target, not --host

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest_urcu_wfcq: add splice and nosync tests
Mathieu Desnoyers [Sun, 4 Nov 2012 18:04:40 +0000 (13:04 -0500)] 
test_urcu_wfcq: add splice and nosync tests

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest_urcu_wfs: cleanup
Mathieu Desnoyers [Sun, 4 Nov 2012 18:03:59 +0000 (13:03 -0500)] 
test_urcu_wfs: cleanup

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest_urcu_lfs: cleanup
Mathieu Desnoyers [Sun, 4 Nov 2012 18:03:32 +0000 (13:03 -0500)] 
test_urcu_lfs: cleanup

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix static linking: add missing static for _defer_rcu
Mathieu Desnoyers [Thu, 1 Nov 2012 22:34:40 +0000 (18:34 -0400)] 
Fix static linking: add missing static for _defer_rcu

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotests: report error value for make check
Mathieu Desnoyers [Thu, 1 Nov 2012 21:56:04 +0000 (17:56 -0400)] 
tests: report error value for make check

exit 1 as soon as a test fails.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoAdd multiflavor test program
Mathieu Desnoyers [Thu, 1 Nov 2012 21:50:24 +0000 (17:50 -0400)] 
Add multiflavor test program

Add a multiflavor test program to catch symbol name clashes earlier next
time.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix static linking: fix symbol name namespaces
Mathieu Desnoyers [Thu, 1 Nov 2012 21:49:39 +0000 (17:49 -0400)] 
Fix static linking: fix symbol name namespaces

gp_futex, yield_active, rand_yield, has_sys_membarrier, rcu_defer_exit,
call_rcu_data_free, call_rcu_before_fork, call_rcu_after_fork_parent,
call_rcu_after_fork_child are exported by each urcu flavor.

In order to fix use-cases where multiple flavors are statically linked
into the same application, we need to move these symbols to local
namespaces.

Ensure that all symbols are prefixed by "rcu_".

Also add each of those symbols into urcu/map/*.h headers, so they get
mapped to their flavor-specific symbol name by the preprocessor.

This requires bumping our .so version from 1.0.0 to 2.0.0, because it
changes some symbol names.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
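
The preprocessor mapping can be illustrated with a reduced example. The
names below (TOY_FLAVOR_MB, toy_gp_futex, toy_defer_exit and their _mb
variants) are made up for the illustration and are not the actual contents
of the urcu/map/*.h headers.

```c
/* Hypothetical flavor selection, normally done by the including code. */
#define TOY_FLAVOR_MB

/* Map generic names to flavor-specific symbol names. */
#if defined(TOY_FLAVOR_MB)
#define toy_gp_futex	toy_gp_futex_mb
#define toy_defer_exit	toy_defer_exit_mb
#elif defined(TOY_FLAVOR_QSBR)
#define toy_gp_futex	toy_gp_futex_qsbr
#define toy_defer_exit	toy_defer_exit_qsbr
#endif

/* After preprocessing, this defines the symbol toy_gp_futex_mb. */
int toy_gp_futex;

/* After preprocessing, this defines the function toy_defer_exit_mb(). */
int toy_defer_exit(void)
{
	return toy_gp_futex;
}
```

Because each flavor ends up with its own symbol names, two flavors built
this way can be linked statically into the same program without clashing.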
11 years agoFix static linking: add missing static to thr_defer
Mathieu Desnoyers [Thu, 1 Nov 2012 20:37:04 +0000 (16:37 -0400)] 
Fix static linking: add missing static to thr_defer

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix static linking: add missing static
Mathieu Desnoyers [Thu, 1 Nov 2012 20:33:01 +0000 (16:33 -0400)] 
Fix static linking: add missing static

update_counter_and_wait and call_rcu_data_list are only used locally.
Add the static keyword to ensure their symbols are not exported. This
helps fix static linking of many URCU flavors into the same program.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agodeprecation: fix build with gcc < 4.5
Mathieu Desnoyers [Tue, 23 Oct 2012 15:40:37 +0000 (11:40 -0400)] 
deprecation: fix build with gcc < 4.5

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack.c: update copyright notice
Mathieu Desnoyers [Tue, 23 Oct 2012 15:22:56 +0000 (11:22 -0400)] 
wfstack.c: update copyright notice

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoUpdate wfstack copyright notice
Mathieu Desnoyers [Tue, 23 Oct 2012 15:02:27 +0000 (11:02 -0400)] 
Update wfstack copyright notice

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoComment fix: update associated LGPL header name
Mathieu Desnoyers [Tue, 23 Oct 2012 15:00:30 +0000 (11:00 -0400)] 
Comment fix: update associated LGPL header name

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoUpdate cds-api.txt following API deprecations
Mathieu Desnoyers [Tue, 23 Oct 2012 12:53:56 +0000 (08:53 -0400)] 
Update cds-api.txt following API deprecations

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoDeprecate wfqueue
Mathieu Desnoyers [Tue, 23 Oct 2012 12:43:33 +0000 (08:43 -0400)] 
Deprecate wfqueue

Replaced by "wfcqueue", which has a semantic that allows placing head
and tail on different cache lines, and does not allocate memory
internally. wfqueue users can easily migrate to wfcqueue.

We choose to deprecate wfqueue rather than reimplementing it on top of
wfcqueue to ensure we keep strong ABI compatibility for existing wfqueue
users.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoDeprecate rculfstack
Mathieu Desnoyers [Tue, 23 Oct 2012 12:36:42 +0000 (08:36 -0400)] 
Deprecate rculfstack

Replaced by "lfstack", which has a less restrictive semantic, and covers
rculfstack completely.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue: introduce nonblocking API
Mathieu Desnoyers [Mon, 22 Oct 2012 12:55:22 +0000 (08:55 -0400)] 
wfcqueue: introduce nonblocking API

Introduce nonblocking API in wfcqueue, allowing RT threads to try to
dequeue, splice, or iterate on spliced queues without blocking: the
caller needs to handle CDS_WFCQ_WOULDBLOCK return value (or nonzero
return value for splice).

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
CC: Paul McKenney <paulmck@linux.vnet.ibm.com>
CC: Lai Jiangshan <laijs@cn.fujitsu.com>
11 years agolfstack: test pop_all and pop
Mathieu Desnoyers [Fri, 12 Oct 2012 13:51:41 +0000 (09:51 -0400)] 
lfstack: test pop_all and pop

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agolfstack: implement empty, pop_all and iterators, document API
Mathieu Desnoyers [Fri, 12 Oct 2012 13:30:15 +0000 (09:30 -0400)] 
lfstack: implement empty, pop_all and iterators, document API

We are changing the ABI by adding a mutex into struct cds_lfs_stack.
This ABI has never been exposed in a release so far, so we can change
it.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agolfstack: implement test
Mathieu Desnoyers [Thu, 11 Oct 2012 20:44:40 +0000 (16:44 -0400)] 
lfstack: implement test

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agolfstack: implement lock-free stack
Mathieu Desnoyers [Thu, 11 Oct 2012 19:08:57 +0000 (15:08 -0400)] 
lfstack: implement lock-free stack

This stack does not require holding the RCU read-side lock across push,
and allows multiple strategies to be used for pop.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack: implement pop_all and iteration tests
Mathieu Desnoyers [Sat, 13 Oct 2012 02:11:49 +0000 (22:11 -0400)] 
wfstack: implement pop_all and iteration tests

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfstack: implement cds_wfs_pop_all and iterators, document API
Mathieu Desnoyers [Sat, 13 Oct 2012 01:47:05 +0000 (21:47 -0400)] 
wfstack: implement cds_wfs_pop_all and iterators, document API

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agorculfhash test: fix trivial memleak and return node leak and errors
Mathieu Desnoyers [Mon, 22 Oct 2012 22:17:24 +0000 (18:17 -0400)] 
rculfhash test: fix trivial memleak and return node leak and errors

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agorculfhash: add missing extern
Mathieu Desnoyers [Mon, 22 Oct 2012 21:37:38 +0000 (17:37 -0400)] 
rculfhash: add missing extern

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoCleanup: fix cppcheck errors
Mathieu Desnoyers [Mon, 22 Oct 2012 21:34:31 +0000 (17:34 -0400)] 
Cleanup: fix cppcheck errors

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue: remove ancient comment
Mathieu Desnoyers [Sun, 14 Oct 2012 15:59:31 +0000 (11:59 -0400)] 
wfcqueue: remove ancient comment

This comment is a leftover from wfqueue and is now inappropriate in the
context of wfcqueue: the dequeue operation busy-waits if it sees a NULL
next pointer from a node that is not the tail node.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest_urcu_lfq: remove rcu_defer_register_thread() from test_urcu_lfq
Lai Jiangshan [Sat, 13 Oct 2012 16:48:54 +0000 (12:48 -0400)] 
test_urcu_lfq: remove rcu_defer_register_thread() from test_urcu_lfq

test_urcu_lfq has already switched to call_rcu(),
rcu_defer_register_thread() is unneeded.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest_urcu_lfq: test for the proper pointer
Lai Jiangshan [Sat, 13 Oct 2012 16:46:45 +0000 (12:46 -0400)] 
test_urcu_lfq: test for the proper pointer

We should use "if (qnode)" instead of "if (node)" in case
the struct cds_lfq_node_rcu is not the first field of struct node.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
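
The reason for testing the embedded node pointer can be shown with a small
example. The toy_* types and the container-of macro below are hypothetical
stand-ins for the test program's structures. When the embedded node is not
the first field, converting a NULL node pointer to its container would not
yield NULL, so the check has to happen on the node pointer before the
conversion.

```c
#include <stddef.h>

struct toy_lfq_node {
	struct toy_lfq_node *next;
};

struct toy_item {
	int value;			/* embedded node is NOT the first field */
	struct toy_lfq_node list;
};

#define toy_container_of(ptr, type, member) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

/* Convert a dequeued node to its container, preserving NULL. */
static struct toy_item *toy_item_from_node(struct toy_lfq_node *qnode)
{
	if (!qnode)		/* test the pointer that dequeue returned */
		return NULL;
	return toy_container_of(qnode, struct toy_item, list);
}
```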
11 years agotest_urcu_lfs: remove rcu_defer_register_thread() from test_urcu_lfs
Lai Jiangshan [Sat, 13 Oct 2012 16:45:33 +0000 (12:45 -0400)] 
test_urcu_lfs: remove rcu_defer_register_thread() from test_urcu_lfs

test_urcu_lfs has already switched to call_rcu(),
rcu_defer_register_thread() is unneeded.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agotest_urcu_lfs: test for the proper pointer
Lai Jiangshan [Sat, 13 Oct 2012 16:41:17 +0000 (12:41 -0400)] 
test_urcu_lfs: test for the proper pointer

We should use "if (snode)" instead of "if (node)" in case
the struct cds_lfs_node_rcu is not the first field of struct node.

Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue: clarify locking usage
Mathieu Desnoyers [Fri, 12 Oct 2012 14:33:20 +0000 (10:33 -0400)] 
wfcqueue: clarify locking usage

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoDocument APIs in README
Mathieu Desnoyers [Fri, 12 Oct 2012 11:47:11 +0000 (07:47 -0400)] 
Document APIs in README

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoTest cleanup: replace "l" parameter by "loops"
Mathieu Desnoyers [Fri, 12 Oct 2012 11:29:34 +0000 (07:29 -0400)] 
Test cleanup: replace "l" parameter by "loops"

Reported-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoAdd wfcqueue header to cds.h
Mathieu Desnoyers [Thu, 11 Oct 2012 20:41:16 +0000 (16:41 -0400)] 
Add wfcqueue header to cds.h

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix: urcu-bp, urcu, urcu-qsbr should include wfcqueue
Mathieu Desnoyers [Thu, 11 Oct 2012 16:44:10 +0000 (12:44 -0400)] 
Fix: urcu-bp, urcu, urcu-qsbr should include wfcqueue

Those are still including wfqueue.h, but need to move to wfcqueue.h,
since this is now needed by call_rcu. It was still working because the
call_rcu headers include wfcqueue.h, but they were doing so _after_ #undef
_LGPL_SOURCE was issued, which made wfcqueue.h depend on
liburcu-common.so to find the wfcqueue symbols. This was in turn adding
a transitive dependency that was not present before, and thus causing
build failure in cross-build environments, especially those on Debian
systems, due to special handling of transitive dependencies on Debian
autotools.

Reported-by: Simon Marchi <simon.marchi@polymtl.ca>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agoFix: call_rcu list corruption on teardown (documentation)
Mathieu Desnoyers [Thu, 11 Oct 2012 16:28:23 +0000 (12:28 -0400)] 
Fix: call_rcu list corruption on teardown (documentation)

This commit is a place-holder to document that commit
5161f31e09ce33dd79afad8d08a2372fbf1c4fbe fixed a list corruption bug in
call_rcu.

Introducing __cds_wfcq_splice_blocking() fixed a list corruption bug in
the 0.7.x series. The equivalent fix appeared in 0.6.8 for the
stable-0.6 branch.

Description of the bug:

* Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> * Lai Jiangshan (laijs@cn.fujitsu.com) wrote:
> > test code:
> > ./tests/test_urcu_lfs 100 10 10
> >
> > bug produce rate > 60%
> >
> > {{{
> > I didn't see any bug when "./tests/test_urcu_lfs 10 10 10" Or
> +"./tests/test_urcu_lfs 100 100 10"
> > But I just test it about 5 times
> > }}}
> >
> > 4cores*1threads: Intel(R) Core(TM) i5 CPU         760
> > RCU_MB (no time to test for other rcu type)
> > test commit: 768fba83676f49eb73fd1d8ad452016a84c5ec2a
> >
> > I didn't see any bug when "./tests/test_urcu_mb 10 100 10"
> >
> > Sorry, I tried, but I failed to find out the root cause currently.
>
> I think I managed to narrow down the issue:
>
> 1) the master branch does not reproduce it, but commit
>    768fba83676f49eb73fd1d8ad452016a84c5ec2a reproduces it about 50% of the
>    time.
>
> 2) the main change between 768fba83676f49eb73fd1d8ad452016a84c5ec2a and
>    current master (f94061a3df4c9eab9ac869a19e4228de54771fcb) is call_rcu
>    moving to wfcqueue.
>
> 3) the bug always arises, for me, at the end of the 10 seconds.
>    However, it might be simply due to the fact that most of the memory
>    get freed at the end of program execution.
>
> 4) I've been able to get a backtrace, and it looks like we have some
>    call_rcu callback-invocation threads still working while
>    call_rcu_data_free() is invoked. In the backtrace, call_rcu_data_free()
>    is nicely waiting for the next thread to stop, and during that time,
>    two callback-invocation threads are invoking callbacks (and one of
>    them triggers the segfault).
>
> So I expect that commit
>
> commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
> Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
> Date:   Tue Sep 25 10:50:49 2012 -0500
>
>     call_rcu: use wfcqueue, eliminate false-sharing
>
>     Eliminate false-sharing between call_rcu (enqueuer) and worker threads
>     on the queue head and tail.
>
>     Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
>     Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
>
> could have managed to fix the issue, or changed the timing enough that it
> does not reproduce. I'll continue investigating.

The bug was in call_rcu. It is not required for master, because we fixed
it while moving to wfcqueue.  We were erroneously writing to the head
field of the default call_rcu_data rather than tail.

The conditions to reproduce this bug:

1) setup per-cpu callback-invocation threads,
2) use call_rcu
3) call call_rcu_data_free() while there are still some pending
   callbacks that have not yet been executed by the callback-invocation
   threads,
4) we then get corruption due to the "default" callback invocation
   that walks through a corrupted queue.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agocall_rcu: remove head field alignment, explain wfcqueue motivation
Mathieu Desnoyers [Thu, 11 Oct 2012 15:41:48 +0000 (11:41 -0400)] 
call_rcu: remove head field alignment, explain wfcqueue motivation

The following commit:

commit 5161f31e09ce33dd79afad8d08a2372fbf1c4fbe
Author: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Date:   Tue Sep 25 10:50:49 2012 -0500

    call_rcu: use wfcqueue, eliminate false-sharing

    Eliminate false-sharing between call_rcu (enqueuer) and worker threads
    on the queue head and tail.

introduced a change in call_rcu: it moved from "wfqueue" to "wfcqueue".
Its changelog states that the goal is to eliminate false-sharing, but
the changelog rationale is wrong.

The actual primary goal is to use the "splice" operation (which is
similar to the "dequeue_all" operation proposed by Lai Jiangshan),
instead of open-coding this operation directly within the call_rcu
implementation. The objective stated by Lai was to make testing of this
code-path easier, and he was right: we ended up noticing a bug in the
original call_rcu implementation (in this open-coded splice operation)
that was really hard to trigger, which was fixed by the move to
wfcqueue.

About false-sharing: In the case of call_rcu callback-invocation threads
vs call_rcu callers, we do not care about false-sharing because call_rcu
callback-invocation threads use batching ("splice") to get an entire
list of callbacks, which effectively empties the queue, and requires
touching the tail anyway. Ensuring that head and tail are placed on
different cache lines would matter only if we would be using "dequeue"
in the callback-invocation thread, which is not the case: we grab the
whole queue, and then iterate from our local head to our local tail.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
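
For reference, placing head and tail on different cache lines can be
sketched with C11 alignment specifiers. The structures and the 64-byte
cache line size are assumptions made for this illustration, not the
layout used by the library.

```c
#include <stdalign.h>
#include <stddef.h>

#define TOY_CACHE_LINE 64	/* assumed cache line size, in bytes */

struct toy_node {
	struct toy_node *next;
};

/* Head and tail share a cache line: fine when consumers splice. */
struct toy_packed_queue {
	struct toy_node head;
	struct toy_node *tail;
};

/* Head and tail on separate cache lines: helps per-item dequeue. */
struct toy_split_queue {
	alignas(TOY_CACHE_LINE) struct toy_node head;
	alignas(TOY_CACHE_LINE) struct toy_node *tail;
};
```

The split layout costs two cache lines per queue, which only pays off when
a consumer touches the head without also touching the tail.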
11 years agowfcqueue: update credits in patch documentation
Mathieu Desnoyers [Thu, 11 Oct 2012 15:27:37 +0000 (11:27 -0400)] 
wfcqueue: update credits in patch documentation

Give credits to those responsible for the design and implementation of
commit 8ad4ce587f001ae026d5560ac509c2e48986130b, "wfcqueue: implement
concurrency-efficient queue", which happened through rounds of email and
patch exchanges.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
11 years agowfcqueue documentation: hint at for_each iterators
Mathieu Desnoyers [Mon, 8 Oct 2012 16:11:30 +0000 (12:11 -0400)] 
wfcqueue documentation: hint at for_each iterators

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>