From 9675a7c5ba4387dfb61d9a8ea6ebd127888112e8 Mon Sep 17 00:00:00 2001
From: compudj
Date: Mon, 12 Mar 2007 21:19:01 +0000
Subject: [PATCH] update roadmap

git-svn-id: http://ltt.polymtl.ca/svn@2410 04897980-b3bd-0310-b5e0-8ef037075253
---
 .../doc/developer/lttng-lttv-roadmap.html |  16 ++-
 .../poly/doc/developer/tsc-smallv1.txt    | 134 ++++++++++++++++++
 2 files changed, 146 insertions(+), 4 deletions(-)
 create mode 100644 ltt/branches/poly/doc/developer/tsc-smallv1.txt

diff --git a/ltt/branches/poly/doc/developer/lttng-lttv-roadmap.html b/ltt/branches/poly/doc/developer/lttng-lttv-roadmap.html
index 4cd49d1b..c0701fb7 100644
--- a/ltt/branches/poly/doc/developer/lttng-lttv-roadmap.html
+++ b/ltt/branches/poly/doc/developer/lttng-lttv-roadmap.html
@@ -20,6 +20,7 @@ The % symbol marks who is interested in the realisation of the item.
# Eric Clement
(3) Make LTTV aware of type formats (visual separators) defined in the XML file.
+ # Gabriel Matnibr
(3) Use a per architecture enumeration for traps.
(3) Change the byte pair "facility, event" id for a short combining the informatinon.
@@ -59,15 +60,20 @@ of process 0.
LTT Next Generation Roadmap

* TODO
+(1) efficient dynamic event filtering while recording trace.
+ % Google
+ % Sensis Corp. Tim Bish
+ # Mathieu Desnoyers
+(1) Support for compact event trace channel.
+ % Google
+ # Mathieu Desnoyers
(1) CPU Hotplug support. (Only ltt-heartbeat needs to be fixed).
-(1) Add Xen support.
+ # Mathieu Desnoyers
+(1) Add Xen support. (Trace buffer deallocation needs to be fixed)
# Mathieu Desnoyers
(1) Integrate SystemTAP logging with LTTng.
-(2) Test and post to LKML Linux Kernel Markers.
(3) Change the byte pair "facility, event" id for a short combining the informatinon.
-(4) efficient dynamic event filtering while recording trace.
- % Sensis Corp. Tim Bish
(4) instrument kernel bottom half irqsave, spinlocks, rwlocks, seqlocks, semaphores, mutexes, brlock.
(4) integrate NPTL instrumentation (see PTT).
@@ -92,6 +98,8 @@ Xen
S/390
RTLinux
% Wind River for 2.6.14
+sparc64
+# Wind River
sh4


diff --git a/ltt/branches/poly/doc/developer/tsc-smallv1.txt b/ltt/branches/poly/doc/developer/tsc-smallv1.txt
new file mode 100644
index 00000000..58f19520
--- /dev/null
+++ b/ltt/branches/poly/doc/developer/tsc-smallv1.txt
@@ -0,0 +1,134 @@
+Adding support for "compact" 32 bits events.
+
+Mathieu Desnoyers
+March 9, 2007
+
+
+event header
+
+
+32 bits header
+Aligned on 32 bits
+	1 bit to select event type
+	4 bits event ID
+		16 events (too few)
+	27 bits TSC (cut MSB)
+		wraps 32 times per second at 4GHz
+		wraps spaced 0.03125s apart
+		100HZ clock : tick each 0.01s
+		must detect a wrap at least every 3 jiffies (dangerous, may miss)
+		granularity : 2^0 = 1 cycle : 0.25ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+	1 bit to select event type
+	4 bits event ID
+		16 events (too few)
+	27 bits TSC (cut LSB)
+		wraps each second at 4GHz
+		100HZ clock : tick each 0.01s
+		granularity : 2^5 = 32 cycles : 8ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+	1 bit to select event type
+	5 bits event ID
+		32 events
+	26 bits TSC (cut LSB)
+		wraps each second at 4GHz
+		100HZ clock : tick each 0.01s
+		granularity : 2^6 = 64 cycles : 16ns @4GHz
+payload size known by facility
+
+32 bits header
+Aligned on 32 bits
+	1 bit to select event type
+	6 bits event ID
+		64 events
+	25 bits TSC (cut LSB)
+		wraps each second at 4GHz
+		100HZ clock : tick each 0.01s
+		granularity : 2^7 = 128 cycles : 32ns @4GHz
+payload size known by facility
+
+64 bits header
+Aligned on 32 bits
+	1 bit to select event type
+	15 bits event id, (major 8 minor 8)
+		32768 events
+	16 bits event size (extra)
+	32 bits TSC
+		wraps each second at 4GHz
+		100HZ clock : tick each 0.01s
+
+96 or 128 bits header (full 64 bits TSC, useful when no heartbeat is available;
+	size depends on internal alignment)
+Aligned on 32 bits
+	1 bit to select event type
+	15 bits event id, (major 8 minor 8)
+		32768 events
+	16 bits event size (extra)
+Align on 64 bits
+	64 bits TSC
+		wraps each 146.14 years at 4GHz
+
+
+
+
+
+Must put the event ID fields first in the large (64, 96-128 bits) event headers.
+Create a "compact" facility which reserves the facility IDs with the MSB at 1.
+	- or better : select mapping for events
+What is the minimum granularity required ? (so we know how many LSBs to cut)
+	- How far can synchronized CPU TSCs drift apart from one another ?
+		PLL
+		http://en.wikipedia.org/wiki/Phase-locked_loop
+		static phase offset -> tracking jitter
+		25 MHz oscillator on motherboard for CPU
+		jitter : expressed in ±picoseconds (should therefore be lower than 0.25ns)
+		http://www.eetasia.com/ART_8800082274_480600_683c4e6b200103.HTM
+		NEED MORE INFO.
+	- What is the cacheline synchronization latency between the CPUs ?
+		Worst case : Intel Core 2, Intel Xeon 5100, Intel Core Solo, Intel Core Duo
+			Unified L2 cache. http://www.intel.com/design/processor/manuals/253668.pdf
+		Intel Core 2, Intel Xeon 5100
+			http://www.intel.com/design/processor/manuals/253665.pdf
+			Up to 10.7 GB/s FSB
+		http://www.xbitlabs.com/articles/mobile/display/core2duo_2.html
+					Intel Core Duo	Intel Core 2 Duo
+			L2 cache latency	14 cycles	14 cycles
+			(round-trip : 28 cycles) 7ns @4GHz
+		sparc64 : between threads : shares L1 cache.
+			suspected to be ~2 cycles total (1+1) (to check)
+	- How close (cycle-wise) can two consecutive recorded events in the same
+	  buffer be ? (~200ns, time for logging an event) (~800 cycles @4GHz)
+	- Tracing code itself : if it's at a subbuffer boundary, there are more checks to do.
+		Must measure the maximum duration of a non-interrupted probe.
+		Worst case (with NMIs enabled) : 6997 cycles. 1749 ns @4GHz.
+		TODO : test with NMIs disabled and HT disabled.
+		Ordering can be changed if an interrupt comes between the memory operation
+		and the tracer call. Therefore, we cannot rely on more precision than the
+		expected interrupt handler duration. (guess : ~10000 cycles, 2500ns @4GHz)
+	- If there is a faster interconnect between the CPUs, it can be a problem, but
+	  this seems limited to proprietary interconnects, not used in general.
+	- IPIs are expected to take much more than 28 cycles.
+What is the minimum wrap-around interval ? (must be safe for timer interrupt
+misses, multiple (configurable) timer HZ values and CPU MHz frequencies)
+Must align _all_ headers on 32 bits, not 64.
+
+Granularity : 800 cycles (200ns @4GHz) : 2^9 = 512 (remove 9 LSB)
+	Number of LSB skipped : first_bit(probe_duration_in_cycles)-1
+
+Min wrap : 100HZ system, each 3 timer ticks : 0.03s (32-4 MSB for 4GHz : 0.067s)
+	(heartbeat each 100HZ, to be safe)
+	Number of MSB to skip :
+	32 - first_bit(( (expected_longest_cli()[ms] + max_timer_interval[ms]) * 2 *
+		cpu_khz ))
+
+
+9LSB + 4MSB = 13 bits total. 12 bits for event IDs : 4096 events.
+
+
+
-- 
2.34.1