From: Mathieu Desnoyers Date: Mon, 7 May 2012 23:07:35 +0000 (-0400) Subject: rculfhash: fix: race between replace and del operations X-Git-Tag: v0.7.0~17 X-Git-Url: https://git.liburcu.org/?p=urcu.git;a=commitdiff_plain;h=71bb3aca70894bd78ec75f47c0c464261ad1dc86 rculfhash: fix: race between replace and del operations Bug introduced by commit db00ccc36e7fb04ce8044fb1be7964acd1de6ae0 Here is the race: Initially in hash table: A T0 T1 replace A by B del A read A->next -> check REMOVED flag, not set yet. read A->next -> check REMOVED flag, not set yet. cmpxchg A->next to set REMOVED flag -> cmpxchg succeeds uatomic_or to set REMOVED flag uatomic_xchg to atomically set the REMOVAL_OWNER flag -> first to set the flag. Replace returns node -> free(A) Del success -> free(A) With this race, we have a double-free. The problem with the replace code is that it does not set the "REMOVAL_OWNER" flag. Test case to reproduce the bug: test_urcu_hash 0 2 20 -A -s -M 1 -N 1 -O 1 (2 threads, doing replace/del, with a hash table that has only a single key for all values). After just a couple of seconds, either the program hangs, or, more often, it does: *** glibc detected *** /media/truecrypt1/compudj/doc/userspace-rcu/tests/.libs/test_urcu_hash: malloc(): memory corruption (fast): 0x00007ffff3a29e25 *** Program received signal SIGSEGV, Segmentation fault. Signed-off-by: Mathieu Desnoyers Tested-by: Mathieu Desnoyers --- diff --git a/rculfhash.c b/rculfhash.c index b26a690..e9cf062 100644 --- a/rculfhash.c +++ b/rculfhash.c @@ -734,12 +734,6 @@ int is_removed(struct cds_lfht_node *node) return ((unsigned long) node) & REMOVED_FLAG; } -static -struct cds_lfht_node *flag_removed(struct cds_lfht_node *node) -{ - return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG); -} - static int is_bucket(struct cds_lfht_node *node) { @@ -764,6 +758,12 @@ struct cds_lfht_node *flag_removal_owner(struct cds_lfht_node *node) return (struct cds_lfht_node *) (((unsigned long) node) | REMOVAL_OWNER_FLAG); } +static +struct cds_lfht_node *flag_removed_or_removal_owner(struct cds_lfht_node *node) +{ + return (struct cds_lfht_node *) (((unsigned long) node) | REMOVED_FLAG | REMOVAL_OWNER_FLAG); +} + static struct cds_lfht_node *get_end(void) { @@ -894,6 +894,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size, } assert(old_next == clear_flag(old_next)); assert(new_node != old_next); + /* + * REMOVAL_OWNER flag is _NEVER_ set before the REMOVED + * flag. It is either set atomically at the same time + * (replace) or after (del). + */ + assert(!is_removal_owner(old_next)); new_node->next = old_next; /* * Here is the whole trick for lock-free replace: we add @@ -906,10 +912,12 @@ int _cds_lfht_replace(struct cds_lfht *ht, unsigned long size, * the old node, but will not see the new one. * This is a replacement of a node with another node * that has the same value: we are therefore not - * removing a value from the hash table. + * removing a value from the hash table. We set both the + * REMOVED and REMOVAL_OWNER flags atomically so we own + * the node after successful cmpxchg. */ ret_next = uatomic_cmpxchg(&old_node->next, - old_next, flag_removed(new_node)); + old_next, flag_removed_or_removal_owner(new_node)); if (ret_next == old_next) break; /* We performed the replacement. */ old_next = ret_next;