From: compudj
Date: Mon, 12 Mar 2007 22:01:36 +0000 (+0000)
Subject: update doc
X-Git-Tag: v0.12.20~1075
X-Git-Url: https://git.liburcu.org/?a=commitdiff_plain;h=d26b24d9d0d2455e0a9f4f66035a79effc280638;p=lttv.git

update doc

git-svn-id: http://ltt.polymtl.ca/svn@2414 04897980-b3bd-0310-b5e0-8ef037075253
---

diff --git a/ltt/branches/poly/doc/developer/tsc-smallv2.txt b/ltt/branches/poly/doc/developer/tsc-smallv2.txt
new file mode 100644
index 00000000..e21cb9ee
--- /dev/null
+++ b/ltt/branches/poly/doc/developer/tsc-smallv2.txt
@@ -0,0 +1,142 @@
+Adding support for "compact" 32 bits events.
+
+Mathieu Desnoyers
+March 12, 2007
+
+Use a separate channel for compact events
+
+Mux those events into this channel and magically they are "compact". Isn't it
+beautiful.
+
+event header
+
+### COMPACT EVENTS
+
+32 bits header
+Aligned on 32 bits
+  5 bits event ID
+    32 events
+  27 bits TSC (cut MSB)
+    wraps 32 times per second at 4GHz
+      each wraps spaced from 0.03125s
+    100HZ clock : tick each 0.01s
+      detect wrap at least each 3 jiffies (dangerous, may miss)
+    granularity : 2^0 = 1 cycle : 0.25ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+  5 bits event ID
+    32 events
+  27 bits TSC (cut LSB)
+    wraps each second at 4GHz
+    100HZ clock : tick each 0.01s
+    granularity : 2^5 = 32 cycles : 8ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+  6 bits event ID
+    64 events
+  26 bits TSC (cut LSB)
+    wraps each second at 4GHz
+    100HZ clock : tick each 0.01s
+    granularity : 2^6 = 64 cycles : 16ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+  7 bits event ID
+    128 events
+  25 bits TSC (cut LSB)
+    wraps each second at 4GHz
+    100HZ clock : tick each 0.01s
+    granularity : 2^7 = 128 cycles : 32ns @4GHz
+payload size known by facility
+
+
+
+### NORMAL EVENTS
+
+64 bits header
+Aligned on 64 bits
+  32 bits TSC
+    wraps each second at 4GHz
+    100HZ clock : tick each 0.01s
+  16 bits event id, (major 8 minor 8)
+    65536 events
+  16 bits event size (extra)
+
+96 bits header (full 64 bits TSC, useful when no heartbeat available)
+Aligned on 64 bits
+  64 bits TSC
+    wraps each 146.14 years at 4GHz
+  16 bits event id, (major 8 minor 8)
+    65536 events
+  16 bits event size (extra)
+
+
+## Discussion of compact events
+
+Must put the event ID fields first in the large (64, 96-128 bits) event headers
+What is the minimum granularity required ? (so we know how much LSB to cut)
+  - How much can synchronized CPU TSCs drift apart one from another ?
+    PLL
+      http://en.wikipedia.org/wiki/Phase-locked_loop
+      static phase offset -> tracking jitter
+    25 MHz oscillator on motherboard for CPU
+    jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns)
+      http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
+    NEED MORE INFO.
+  - What is the cacheline synchronization latency between the CPUs ?
+    Worst case : Intel Core 2, Intel Xeon 5100, Intel core solo, intel core duo
+      Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
+    Intel Core 2, Intel Xeon 5100
+      http://www.intel.com/design/processor/manuals/253665.pdf
+      Up to 10.7 GB/s FSB
+    http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
+                          Intel Core Duo    Intel Core 2 Duo
+      L2 cache latency    14 cycles         14 cycles
+      (round-trip : 28 cycles) 7ns @4GHz
+    sparc64 : between threads : shares L1 cache.
+      suspected to be ~2 cycles total (1+1) (to check)
+  - How close (cycle-wise) can be two consecutive recorded events in the same
+    buffer ? (~200ns, time for logging an event) (~800 cycles @4GHz)
+  - Tracing code itself : if it's at a subbuffer boundary, more check to do.
+    Must see the maximum duration of a non interrupted probe.
+      Worst case (had NMIs enabled) : 6997 cycles. 1749 ns @4GHz.
+      TODO : test with NMIs disabled and HT disabled.
+    Ordering can be changed if an interrupt comes between the memory operation
+    and the tracer call. Therefore, we cannot rely on more precision than the
+    expected interrupt handler duration. (guess : ~10000cycles, 2500ns@4GHz)
+  - If there is a faster interconnect between the CPUs, it can be a problem, but
+    seems to only be proprietary interconnects, not used in general.
+  - IPI are expected to take much more than 28 cycles.
+What is the minimum wrap-around interval ? (must be safe for timer interrupt
+miss and multiple timer HZ (configurable) and CPU MHZ frequencies)
+
+Granularity : 200ns (800 cycles@4GHz) : 2^9 = 512 (remove 9 LSB)
+  Probe never takes 1 cycle.
+  Number of LSB skipped : max(0, (long)find_first_bit(probe_duration_in_cycles)-1)
+
+Min wrap : 100HZ system, each 3 timer ticks : 0.03s (32-4 MSB for 4 GHZ : 0.26s)
+  (heartbeat each 100HZ, to be safe)
+  Number of MSB to skip :
+    32 - find_first_bit(( (expected_longest_interrupt_latency()[ms] + max_timer_interval[ms]) * cpu_khz )) - 1
+    (the last -1 is to make sure we remove less or exact amount of bits, round
+    near to 0, not round up).
+
+Heartbeat timer :
+  Each timer interrupt
+    Event : 32 bytes in size
+    each timer tick : 100HZ
+    3.2kB/s
+
+9LSB + 4MSB = 13 bits total. 13 bits for event IDs : 8192 events.
+
+
+
+
+
+
+
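The compact header variants listed in the notes can be sketched as a small pack/unpack pair. This is only an illustration under stated assumptions: the notes say the event ID should come first, so the sketch places it in the most significant bits of the 32-bit word, but the actual bit order and endianness are not specified in the notes. The function names (`compact_layout`, `pack_compact_header`, `unpack_compact_header`) are hypothetical and not taken from the LTTng/LTTV code.

```python
# Hypothetical sketch of a 32-bit compact event header:
#   [ event ID : id_bits ][ truncated TSC : 32 - id_bits ]
# Assumption: the event ID sits in the most significant bits.

def compact_layout(id_bits, lsb_cut, cpu_hz=4_000_000_000):
    """Derive the figures quoted in the notes for one header variant."""
    tsc_bits = 32 - id_bits
    return {
        "max_events": 1 << id_bits,
        "tsc_bits": tsc_bits,
        # One TSC field unit spans 2^lsb_cut cycles.
        "granularity_ns": (1 << lsb_cut) * 1e9 / cpu_hz,
        # The field wraps after 2^(tsc_bits + lsb_cut) cycles.
        "wrap_s": (1 << (tsc_bits + lsb_cut)) / cpu_hz,
    }

def pack_compact_header(event_id, tsc, id_bits=5, lsb_cut=5):
    """Pack an event ID and a 64-bit TSC into a 32-bit header."""
    tsc_bits = 32 - id_bits
    if not 0 <= event_id < (1 << id_bits):
        raise ValueError("event ID does not fit in the ID field")
    # Drop lsb_cut low bits, then keep only the tsc_bits that fit.
    tsc_field = (tsc >> lsb_cut) & ((1 << tsc_bits) - 1)
    return (event_id << tsc_bits) | tsc_field

def unpack_compact_header(header, id_bits=5, lsb_cut=5):
    """Return (event_id, partial TSC); the cut low bits read back as zero."""
    tsc_bits = 32 - id_bits
    event_id = header >> tsc_bits
    tsc_partial = (header & ((1 << tsc_bits) - 1)) << lsb_cut
    return event_id, tsc_partial

if __name__ == "__main__":
    header = pack_compact_header(3, 0x12345678)
    print(hex(header), unpack_compact_header(header))
    print(compact_layout(id_bits=5, lsb_cut=5))
```

With `lsb_cut=0` the same functions model the "cut MSB" variant, where the field keeps full cycle granularity but wraps after only 2^27 cycles.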
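The two bit-count formulas from the discussion (number of LSB to skip, number of MSB to skip) can be checked numerically. The sketch below makes two assumptions that are not stated in the notes: `find_first_bit` is read as the 1-based position of the most significant set bit (the behaviour of the Linux `fls()` helper, and of Python's `int.bit_length()`), and the duration in milliseconds is converted to cycles by multiplying by `cpu_khz`. Under that reading the formulas give the 9 LSB and 4 MSB quoted in the notes.

```python
# Hypothetical check of the LSB/MSB skip formulas from the notes.
# Assumption: find_first_bit() means "1-based index of the highest set bit".

def highest_set_bit(x):
    """1-based position of the most significant set bit; 0 for x == 0."""
    return int(x).bit_length()

def lsb_to_skip(probe_duration_cycles):
    """max(0, find_first_bit(probe_duration_in_cycles) - 1)"""
    return max(0, highest_set_bit(probe_duration_cycles) - 1)

def msb_to_skip(latency_ms, timer_interval_ms, cpu_khz, tsc_bits=32):
    """32 - find_first_bit((latency + timer interval) in cycles) - 1"""
    # ms * kHz = cycles (assumed unit conversion).
    cycles = int((latency_ms + timer_interval_ms) * cpu_khz)
    return tsc_bits - highest_set_bit(cycles) - 1

if __name__ == "__main__":
    lsb = lsb_to_skip(800)                 # ~200ns probe at 4GHz
    msb = msb_to_skip(0, 30, 4_000_000)    # 3 ticks of a 100HZ timer at 4GHz
    print(lsb, msb, 1 << (lsb + msb))      # bits freed for event IDs
```

The interrupt latency is passed as 0 ms here because the guessed 2500 ns handler duration is negligible next to the 30 ms timer window; a larger value can lower the number of MSB that are safe to drop.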