In cds_list_add_rcu(), use rcu_assign_pointer() to update head->next
atomically and to provide the memory barrier needed before publishing
head->next.
Note that we don't need the wmb() prior to the store to prev, because
RCU traversals only go forward, and thus only follow "next".
In cds_list_del_rcu(), use CMM_STORE_SHARED() to store to
elem->prev->next atomically.
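
For illustration, the store ordering described above can be sketched as
follows. This is a minimal standalone sketch, not the library source: the
sketch_* names are hypothetical, and the two macros are stand-ins built on
the GCC/Clang __atomic builtins (an assumption) so that it compiles without
liburcu. The release store approximates rcu_assign_pointer(), and the
relaxed atomic store approximates CMM_STORE_SHARED().

```c
#include <stddef.h>

/* Hypothetical stand-ins for the liburcu macros (assumption: GCC/Clang). */
#define sketch_assign_pointer(p, v) \
	__atomic_store_n(&(p), (v), __ATOMIC_RELEASE)
#define SKETCH_STORE_SHARED(p, v) \
	__atomic_store_n(&(p), (v), __ATOMIC_RELAXED)

struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

static void sketch_list_add_rcu(struct sketch_list_head *newp,
				struct sketch_list_head *head)
{
	/* Initialise the new element before it becomes reachable. */
	newp->next = head->next;
	newp->prev = head;
	/* Plain store: readers only walk forward, so prev needs no barrier. */
	head->next->prev = newp;
	/* Atomic publish, ordered after the initialising stores above. */
	sketch_assign_pointer(head->next, newp);
}

static void sketch_list_del_rcu(struct sketch_list_head *elem)
{
	elem->next->prev = elem->prev;
	/* Atomic store: a concurrent reader sees the old or new next. */
	SKETCH_STORE_SHARED(elem->prev->next, elem->next);
}
```

The removed element keeps its own next pointer, so a reader already
standing on it can still continue forward through the list.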