X-Git-Url: http://git.liburcu.org/?a=blobdiff_plain;f=rcuja%2Fdesign.txt;h=c126f7716726355d9cdce25adc9fd43b49965bed;hb=refs%2Fheads%2Furcu%2Frcuja-range;hp=775ec970827e5d138f8a3010b3b11b3d7b33ac27;hpb=61009379872b40dbd0d69bb3baaf82e55ec58b39;p=userspace-rcu.git diff --git a/rcuja/design.txt b/rcuja/design.txt index 775ec97..c126f77 100644 --- a/rcuja/design.txt +++ b/rcuja/design.txt @@ -89,31 +89,105 @@ Array of 8-bit values followed by array of associated pointers. 64-bit: 1 byte + 113 bytes + 6 bytes pad + 113*8 = 1024 bytes -- Type B: bitmap, followed by pointers array. +- Type B: pools of values and pointers arrays -bitmask (256 entries -> 256 bits -> 32 bytes) of populated children -followed by an array of children pointers, in same order as appears in -the bitmask +Pools of values and pointers arrays. Each pool values array is 32-bytes +in size (so it fits in a L1 cacheline). Each pool begins with an 8-bit +integer, which is the number of children in this pool, followed by an +array of 8-bit values, padding, and an array of pointers. Values and +pointer arrays are associated as in Type A. -+ Allows lookup failure to use 32-byte cache-line only. (1 cacheline) -+ Allows lookup success to use 32-byte cache-line (bitmap), - followed by direct lookup into pointer array. (2 cachelines) +The entries of a node are associated to their respective pool based +on their index position. -Filled at 8 entries 32-bit, 12 entries 64-bit -32-bit: 32 + (4*8) -> 64 bytes -64-bit: 32 + (8*12) -> 128 bytes ++ Allows lookup failure to use 1 32-byte cache-line only. (1 cacheline) + lookup success: 2 cache lines. -Filled at 24 entries 32-bit, 28 entries 64-bit -32-bit: 32 + (4*24) -> 128 bytes -64-bit: 32 + (8*28) -> 256 bytes ++ Allows in-place updates without reallocation, except when a pool is + full. (this was not possible with bitmap-based nodes) +- If one pool exhausts its space, we need to increase the node size. + Therefore, for very dense populations, we will end up using the + pigeon-hole node type sooner, thus consuming more space. -Filled at 56 entries 32-bit, 60 entries 64-bit -32-bit: 32 + (4*56) -> 256 bytes -64-bit: 32 + (8*60) -> 512 bytes +Pool configuration: + +Per pool, filled at 25 entries (32-bit), 28 entries (64-bit) +32-bit: 1 byte + 25 bytes + 2 bytes pad + 25*4bytes = 128 bytes +64-bit: 1 byte + 28 bytes + 3 bytes pad + 28*8 = 256 bytes + +Total up to 50 entries (32-bit), 56 entries (64-bit) +2 pools: 32-bit = 256 bytes +2 pools: 64-bit = 512 bytes + +Total up to 100 entries (32-bit), 112 entries (64-bit) +4 pools: 32-bit = 512 bytes +4 pools: 32-bit = 1024 bytes + + +* Choice of pool configuration distribution: + +We have pools of either 2 or 4 linear arrays. Their total size is +between 256 bytes (32-bit 2 arrays) and 1024 bytes (64-bit 4 arrays). + +Alignment on 256 bytes means that we can spare the 8 least significant +bits of the pointers. Given that the type selection already uses 3 bits, +we have 7 bits left. + +Alignment on 512 bytes -> 8 bits left. + +We can therefore encode which bit, or which two bits, are used as +distribution selection. We can use this technique to reequilibrate pools +if they become unbalanced (e.g. all children are within one of the two +linear arrays). + +Assuming that finding the exact sub-pool usage maximum for any given +distribution is NP complete (not proven). + +Taking into account sub-class size unbalance (tested programmatically by +randomly taking N entries from 256, calculating the distribution for +each bit (number of nodes for which bit is one/zero), and calculating +the difference in number of nodes for each bit, choosing the minimum +difference -- for millions of runs). + +We start from the "ideal" population size (with no unbalance), and do a +fixed point to find the appropriate population size. + +tot entries subclass extra items largest linear array (stat. approx.) +--------------------------------------------------------------------- +48 entries: 1 (98%) 24+1=25 (target ~50/2=25) +54 entries: 1 (97%) 27+1=28 (target ~56/2=28) + +Note: there exists rare worse cases where the unbalance is larger, but +it happens _very_ rarely. But need to provide a fallback if the subclass +does not fit, but it does not need to be efficient. -Filled at 120 entries 32-bit, 124 entries 64-bit -32-bit: 32 + (4*95) -> 512 bytes -64-bit: 32 + (8*124) -> 1024 bytes + +For pool of size 4, we need to approximate what is the maximum unbalance +we can get for choice of distributions grouped by pairs of bits. + +tot entries subclass extra items largest linear array (stat. approx.) +--------------------------------------------------------------------- +92 entries: 2 (99%) 23+2=25 (target: ~100/4=25) +104 entries: 2 (99%) 26+2=28 (target: ~112/4=28) + + +Note: there exists rare worse cases where the unbalance is larger, but +it happens _very_ rarely. But need to provide a fallback if the subclass +does not fit, but it does not need to be efficient. + + +* Population "does not fit" and distribution fallback + +When adding a child to a distribution node, if the child does not fit, +we recalculate the best distribution. If it does not fit in that +distribution neither, we need to expand the node type. + +When removing a child, if the node child count is brought to the number +of entries expected to statistically fit in the lower order node, we try +to shrink. However, if we notice that the distribution does not actually +fit in that shrinked node, we abort the shrink operation. If shrink +fails, we keep a counter of insertion/removal operations on the node +before we allow the shrink to be attempted again. - Type C: pigeon-hole array @@ -152,17 +226,18 @@ up to 6 children up to 12 children -- Type B: bitmap, followed by pointers array. -- 2 cache line hit for lookup success - 128 bytes storage -up to 24 children +up to 25 children + +- Type B: pool - 256 bytes storage -up to 56 children + +up to 50 children - 512 bytes storage -up to 120 children +up to 100 children - Type C: pigeon-hole array - 1 cache line hit for lookup success @@ -192,17 +267,17 @@ up to 7 children up to 14 children -- Type B: bitmap, followed by pointers array. -- 2 cache line hit for lookup success - 256 bytes storage up to 28 children +- Type B: pool + - 512 bytes storage -up to 60 children +up to 56 children - 1024 bytes storage -up to 124 children +up to 112 children - Type C: pigeon-hole array - 1 cache line hit for lookup success @@ -223,16 +298,16 @@ least-significant bits. Types of children: enum child_type { - LINEAR = 0, /* Type A */ - /* 32-bit: 1 to 12 children, 32 to 64 bytes */ - /* 64-bit: 1 to 14 children, 32 to 128 bytes */ - BITMAP = 1, /* Type B */ - /* 32-bit: 13 to 120 children, 128 to 512 bytes */ - /* 64-bit: 15 to 124 children, 256 to 1024 bytes */ - PIGEON = 2, /* Type C */ - /* 32-bit: 121 to 256 children, 1024 bytes */ - /* 64-bit: 125 to 256 children, 2048 bytes */ - LEAF = 3, /* Leaf node */ + RCU_JA_LINEAR = 0, /* Type A */ + /* 32-bit: 1 to 25 children, 8 to 128 bytes */ + /* 64-bit: 1 to 28 children, 16 to 256 bytes */ + RCU_JA_POOL = 1, /* Type B */ + /* 32-bit: 26 to 100 children, 256 to 512 bytes */ + /* 64-bit: 29 to 112 children, 512 to 1024 bytes */ + RCU_JA_PIGEON = 2, /* Type C */ + /* 32-bit: 101 to 256 children, 1024 bytes */ + /* 64-bit: 113 to 256 children, 2048 bytes */ + /* Leaf nodes are implicit from their height in the tree */ }; If entire pointer is NULL, children is empty.