rculfhash: fix: race between replace and del operations
authorMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mon, 7 May 2012 23:07:35 +0000 (19:07 -0400)
committerMathieu Desnoyers <mathieu.desnoyers@efficios.com>
Mon, 7 May 2012 23:07:35 +0000 (19:07 -0400)
Bug introduced by commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0

Here is the race:

Initially in hash table:  A

T0                                         T1
replace A by B
                                           del A
  read A->next
  -> check REMOVED flag, not set yet.
                                           read A->next
                                           -> check REMOVED flag, not set yet.
  cmpxchg A->next to set REMOVED flag
  -> cmpxchg succeeds
                                           uatomic_or to set REMOVED flag
                                           uatomic_xchg to atomically set the REMOVAL_OWNER flag
                                           -> first to set the flag.
  Replace returns node -> free(A)          Del success -> free(A)

With this race, we have a double-free.

The problem with the replace code is that it does not set the
"REMOVAL_OWNER" flag.

Test case to reproduce the bug:

test_urcu_hash 0 2 20 -A -s -M 1 -N 1 -O 1

(2 threads, doing replace/del, with a hash table that has only a single
key for all values). After just a couple of seconds, either the program
hangs, or, more often, it does:

*** glibc detected ***
/media/truecrypt1/compudj/doc/userspace-rcu/tests/.libs/test_urcu_hash:
malloc(): memory corruption (fast): 0x00007ffff3a29e25 ***

Program received signal SIGSEGV, Segmentation fault.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
rculfhash.c

index b26a69085c35cbbf86d173193920c9e51f3b9e68..e9cf0622f53576ef8ad8c08eda082c37f1151b86 100644 (file)
@@ -734,12 +734,6 @@ int is_removed(struct cds_lfht_node *node)
        return ((unsigned long) node) & REMOVED_FLAG;
 }
 
-static
-struct cds_lfht_node *flag_removed(struct cds_lfht_node *node)
-{
-       return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG);
-}
-
 static
 int is_bucket(struct cds_lfht_node *node)
 {
@@ -764,6 +758,12 @@ struct cds_lfht_node *flag_removal_owner(struct cds_lfht_node *node)
        return (struct cds_lfht_node *) (((unsigned long) node) | REMOVAL_OWNER_FLAG);
 }
 
+static
+struct cds_lfht_node *flag_removed_or_removal_owner(struct cds_lfht_node *node)
+{
+       return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG | REMOVAL_OWNER_FLAG);
+}
+
 static
 struct cds_lfht_node *get_end(void)
 {
@@ -894,6 +894,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size,
                }
                assert(old_next == clear_flag(old_next));
                assert(new_node != old_next);
+               /*
+                * REMOVAL_OWNER flag is _NEVER_ set before the REMOVED
+                * flag. It is either set atomically at the same time
+                * (replace) or after (del).
+                */
+               assert(!is_removal_owner(old_next));
                new_node->next = old_next;
                /*
                 * Here is the whole trick for lock-free replace: we add
@@ -906,10 +912,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size,
                 * the old node, but will not see the new one.
                 * This is a replacement of a node with another node
                 * that has the same value: we are therefore not
-                * removing a value from the hash table.
+                * removing a value from the hash table. We set both the
+                * REMOVED and REMOVAL_OWNER flags atomically so we own
+                * the node after successful cmpxchg.
                 */
                ret_next = uatomic_cmpxchg(&old_node->next,
-                             old_next, flag_removed(new_node));
+                       old_next, flag_removed_or_removal_owner(new_node));
                if (ret_next == old_next)
                        break;          /* We performed the replacement. */
                old_next = ret_next;
This page took 0.026141 seconds and 4 git commands to generate.