+/*
+ * Based on the following articles:
+ * - Ori Shalev and Nir Shavit. Split-ordered lists: Lock-free
+ * extensible hash tables. J. ACM 53, 3 (May 2006), 379-405.
+ * - Michael, M. M. High performance dynamic lock-free hash tables
+ * and list-based sets. In Proceedings of the fourteenth annual ACM
+ * symposium on Parallel algorithms and architectures, ACM Press,
+ * (2002), 73-82.
+ *
+ * Some specificities of this Lock-Free Resizable RCU Hash Table
+ * implementation:
+ *
+ * - RCU read-side critical section allows readers to perform hash
+ * table lookups and use the returned objects safely by delaying
+ * memory reclaim of a grace period.
+ * - Add and remove operations are lock-free, and do not need to
+ * allocate memory. They need to be executed within RCU read-side
+ * critical section to ensure the objects they read are valid and to
+ * deal with the cmpxchg ABA problem.
+ * - add and add_unique operations are supported. add_unique checks if
+ * the node key already exists in the hash table. It ensures no key
+ * duplicata exists.
+ * - The resize operation executes concurrently with add/remove/lookup.
+ * - Hash table nodes are contained within a split-ordered list. This
+ * list is ordered by incrementing reversed-bits-hash value.
+ * - An index of dummy nodes is kept. These dummy nodes are the hash
+ * table "buckets", and they are also chained together in the
+ * split-ordered list, which allows recursive expansion.
+ * - The resize operation for small tables only allows expanding the hash table.
+ * It is triggered automatically by detecting long chains in the add
+ * operation.
+ * - The resize operation for larger tables (and available through an
+ * API) allows both expanding and shrinking the hash table.
+ * - Per-CPU Split-counters are used to keep track of the number of
+ * nodes within the hash table for automatic resize triggering.
+ * - Resize operation initiated by long chain detection is executed by a
+ * call_rcu thread, which keeps lock-freedom of add and remove.
+ * - Resize operations are protected by a mutex.
+ * - The removal operation is split in two parts: first, a "removed"
+ * flag is set in the next pointer within the node to remove. Then,
+ * a "garbage collection" is performed in the bucket containing the
+ * removed node (from the start of the bucket up to the removed node).
+ * All encountered nodes with "removed" flag set in their next
+ * pointers are removed from the linked-list. If the cmpxchg used for
+ * removal fails (due to concurrent garbage-collection or concurrent
+ * add), we retry from the beginning of the bucket. This ensures that
+ * the node with "removed" flag set is removed from the hash table
+ * (not visible to lookups anymore) before the RCU read-side critical
+ * section held across removal ends. Furthermore, this ensures that
+ * the node with "removed" flag set is removed from the linked-list
+ * before its memory is reclaimed. Only the thread which removal
+ * successfully set the "removed" flag (with a cmpxchg) into a node's
+ * next pointer is considered to have succeeded its removal (and thus
+ * owns the node to reclaim). Because we garbage-collect starting from
+ * an invariant node (the start-of-bucket dummy node) up to the
+ * "removed" node (or find a reverse-hash that is higher), we are sure
+ * that a successful traversal of the chain leads to a chain that is
+ * present in the linked-list (the start node is never removed) and
+ * that is does not contain the "removed" node anymore, even if
+ * concurrent delete/add operations are changing the structure of the
+ * list concurrently.
+ * - The add operation performs gargage collection of buckets if it
+ * encounters nodes with removed flag set in the bucket where it wants
+ * to add its new node. This ensures lock-freedom of add operation by
+ * helping the remover unlink nodes from the list rather than to wait
+ * for it do to so.
+ * - A RCU "order table" indexed by log2(hash index) is copied and
+ * expanded by the resize operation. This order table allows finding
+ * the "dummy node" tables.
+ * - There is one dummy node table per hash index order. The size of
+ * each dummy node table is half the number of hashes contained in
+ * this order (except for order 0).
+ * - synchronzie_rcu is used to garbage-collect the old dummy node table.
+ * - The per-order dummy node tables contain a compact version of the
+ * hash table nodes. These tables are invariant after they are
+ * populated into the hash table.
+ *
+ * Dummy node tables:
+ *
+ * hash table hash table the last all dummy node tables
+ * order size dummy node 0 1 2 3 4 5 6(index)
+ * table size
+ * 0 1 1 1
+ * 1 2 1 1 1
+ * 2 4 2 1 1 2
+ * 3 8 4 1 1 2 4
+ * 4 16 8 1 1 2 4 8
+ * 5 32 16 1 1 2 4 8 16
+ * 6 64 32 1 1 2 4 8 16 32
+ *
+ * When growing/shrinking, we only focus on the last dummy node table
+ * which size is (!order ? 1 : (1 << (order -1))).
+ *
+ * Example for growing/shrinking:
+ * grow hash table from order 5 to 6: init the index=6 dummy node table
+ * shrink hash table from order 6 to 5: fini the index=6 dummy node table
+ *
+ * A bit of ascii art explanation:
+ *
+ * Order index is the off-by-one compare to the actual power of 2 because
+ * we use index 0 to deal with the 0 special-case.
+ *
+ * This shows the nodes for a small table ordered by reversed bits:
+ *
+ * bits reverse
+ * 0 000 000
+ * 4 100 001
+ * 2 010 010
+ * 6 110 011
+ * 1 001 100
+ * 5 101 101
+ * 3 011 110
+ * 7 111 111
+ *
+ * This shows the nodes in order of non-reversed bits, linked by
+ * reversed-bit order.
+ *
+ * order bits reverse
+ * 0 0 000 000
+ * 1 | 1 001 100 <-
+ * 2 | | 2 010 010 <- |
+ * | | | 3 011 110 | <- |
+ * 3 -> | | | 4 100 001 | |
+ * -> | | 5 101 101 |
+ * -> | 6 110 011
+ * -> 7 111 111
+ */
+