1 <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
2 <html>
3 <head>
4 <title>The LTTng trace format</title>
5 </head>
6 <body>
7
8 <h1>The LTTng trace format</h1>
9
10 <P>
11 <EM>Last update: 2008/05/23</EM>
12
13 <P>
14 This document describes the LTTng trace format. It is mainly useful to
15 developers working on the LTTng tracer or on the LTTV traceread library; other
16 users can rely on that library, which offers all the necessary abstractions on top of the raw trace data.
17
18 <P>
19 A trace is contained in a directory tree. To send a trace remotely, the
20 directory tree may be tar-gzipped. The trace <tt>foo</tt>, placed in the home
21 directory of user john, /home/john, would have the following contents:
22
23 <PRE><TT>
24 $ cd /home/john
25 $ tree foo
26 foo/
27 |-- control
28 | |-- facilities_0
29 | |-- facilities_1
30 | |-- facilities_...
31 | |-- interrupts_0
32 | |-- interrupts_1
33 | |-- interrupts_...
34 | |-- modules_0
35 | |-- modules_1
36 | |-- modules_...
37 | `-- processes_0
38 | `-- processes_1
39 | `-- processes_...
40 |-- cpu_0
41 |-- cpu_1
42 `-- cpu_...
43
44 </TT></PRE>
45
46 <P>
47 The root directory contains one tracefile per CPU, numbered from 0,
48 in .trace format. A uniprocessor trace thus contains only the file cpu_0.
49 A multiprocessor system with some unused (possibly hotplug) CPU slots may skip
50 some CPU numbers. For instance, an 8-way SMP board with 6 CPUs installed in
51 arbitrary slots may produce tracefiles named cpu_0, cpu_1, cpu_2, cpu_4, cpu_6 and cpu_7.
52
53 <P>
54 The files in the control directory also follow the .trace format and are
55 also per CPU. The "facilities" files contain only the "core" marker_id,
56 marker_format and time_heartbeat events. The first two describe the
57 events that are in the trace. The other control files contain the initial
58 system state and various important subsequent events, for example process
59 creation and exit. The advantage of placing such subsequent events in control
60 trace files instead of (or in addition to) the per-CPU trace files is that
61 they may be accessed more quickly and conveniently, and that they may be kept even
62 when the per-CPU files are overwritten in "flight recorder mode".
63
64 <H2>Trace format</H2>
65
66 <P>
67 Each tracefile is divided into equal size blocks with a header at the beginning
68 of the block. Events are packed sequentially in the block starting right after
69 the block header.
70 <P>
71 Each block consists of:
72 <PRE><TT>
73 block start/end header
74 trace header
75 event 1 header
76 event 1 variable length data
77 event 2 header
78 event 2 variable length data
79 ....
80 padding
81 </TT></PRE>
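<P>
The equal-size block layout described above can be walked with a few lines of
code. The following is only a minimal sketch: the function name
<tt>iter_blocks</tt> is hypothetical, and the block size is assumed to be known
by the caller (for instance from the <tt>buf_size</tt> field of a block header).

```python
from typing import Iterator, Tuple


def iter_blocks(data: bytes, block_size: int) -> Iterator[Tuple[int, bytes]]:
    """Yield (block_index, block_bytes) for each equal-size block.

    Hypothetical helper: `data` is the full content of one tracefile and
    `block_size` is assumed to be known by the caller.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    if len(data) % block_size != 0:
        # This sketch simply rejects a truncated last block.
        raise ValueError("tracefile length is not a multiple of block_size")
    for offset in range(0, len(data), block_size):
        yield offset // block_size, data[offset:offset + block_size]
```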
82
83 <H3>The block start/end header</H3>
84
85 <PRE><TT>
86 begin
87 * the beginning of buffer information
88 uint64 cycle_count
89 * TSC at the beginning of the buffer
90 uint64 freq
91 * frequency of the CPUs at the beginning of the buffer.
92 end
93 * the end of buffer information
94 uint64 cycle_count
95 * TSC at the end of the buffer
96 uint64 freq
97 * frequency of the CPUs at the end of the buffer.
98 uint32 lost_size
99 * number of bytes of padding at the end of the buffer.
100 uint32 buf_size
101 * size of the sub-buffer.
102 </TT></PRE>
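<P>
As an illustration, the fields listed above could be decoded as follows. This
is a sketch under stated assumptions, not the traceread implementation: it
assumes the fields are stored packed, in the order listed, with no padding
between them (40 bytes in total), and that the caller already knows the trace
byte order. The names <tt>BlockHeader</tt>, <tt>parse_block_header</tt> and
<tt>used_bytes</tt> are hypothetical.

```python
import struct
from typing import NamedTuple


class BlockHeader(NamedTuple):
    begin_cycle_count: int
    begin_freq: int
    end_cycle_count: int
    end_freq: int
    lost_size: int
    buf_size: int


# Assumption: four uint64 followed by two uint32, packed without padding.
_BLOCK_HEADER_LAYOUT = "QQQQII"


def parse_block_header(block: bytes, byte_order: str = "<") -> BlockHeader:
    """Decode the block start/end header found at the start of `block`.

    `byte_order` is a struct prefix: "<" (little endian) or ">" (big endian).
    """
    fmt = byte_order + _BLOCK_HEADER_LAYOUT
    return BlockHeader(*struct.unpack_from(fmt, block, 0))


def used_bytes(header: BlockHeader) -> int:
    """Bytes of the sub-buffer that are not trailing padding."""
    return header.buf_size - header.lost_size
```

<P>
Since <tt>lost_size</tt> counts the padding at the end of the buffer,
<tt>buf_size - lost_size</tt> gives the offset at which a reader should stop
looking for events in the block.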
103
104
105
106 <H3>The trace header</H3>
107
108 <PRE><TT>
109 uint32 magic_number
110 * 0x00D6B7ED, used to check the trace byte order vs host byte order.
111 uint32 arch_type
112 * Architecture type of the traced machine.
113 uint32 arch_variant
114 * Architecture variant of the traced machine. May be unused on some arch.
115 uint32 float_word_order
116 * Byte order of floats and doubles, sometimes different from integer byte
117 order. Useful only for user space traces.
118 uint8 arch_size
119 * Size (in bytes) of the void * on the traced machine.
120 uint8 major_version
121 * major version of the trace.
122 uint8 minor_version
123 * minor version of the trace.
124 uint8 flight_recorder
125 * Is flight recorder mode activated? If yes, data might be missing
126 (overwritten) in the trace.
127 uint8 has_heartbeat
128 * Does this trace have the heartbeat timer event activated?
129 Yes (1) -> Event header has a 32-bit TSC
130 No (0) -> Event header has a 64-bit TSC
131 uint8 alignment
132 * Are event headers in this trace aligned?
133 Yes -> the value indicates the alignment
134 No (0) -> data is packed.
135 uint8 tsc_lsb_truncate
136 uint8 tscbits
137 uint8 compact_data_shift
138 uint32 freq_scale
139 event time is always calculated from:
140 trace_start_time + ((event_tsc - trace_start_tsc) * (freq / freq_scale))
141 uint64 start_freq
142 * CPU clock frequency at the beginning of the trace.
143 uint64 start_tsc
144 * TSC at the beginning of the trace.
145 uint64 start_monotonic
146 * monotonically increasing time at the beginning of the trace.
147 (currently not supported)
148 start_time
149 * Real time at the beginning of the trace (as given by date, adjusted by NTP)
150 This is the only time reference with the real world: the rest of the trace
151 has monotonically increasing time from this point (with TSC difference and
152 clock frequency).
153 uint32 seconds
154 uint32 nanoseconds
155 </TT></PRE>
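<P>
The <tt>magic_number</tt> field lets a reader determine the byte order of the
trace before decoding anything else. A minimal sketch of that check follows;
the function name <tt>detect_byte_order</tt> is hypothetical, and the caller is
assumed to pass exactly the four bytes that hold <tt>magic_number</tt>.

```python
import struct

LTT_MAGIC_NUMBER = 0x00D6B7ED  # value given in the trace header description


def detect_byte_order(magic_bytes: bytes) -> str:
    """Return the struct prefix matching the trace byte order.

    "<" means the trace is little endian, ">" means big endian.
    Raises ValueError if the bytes match the magic number in neither order.
    """
    if len(magic_bytes) != 4:
        raise ValueError("magic_number is a uint32: expected 4 bytes")
    for prefix in ("<", ">"):
        (value,) = struct.unpack(prefix + "I", magic_bytes)
        if value == LTT_MAGIC_NUMBER:
            return prefix
    raise ValueError("not an LTTng trace header (bad magic number)")
```

<P>
The returned prefix can then be reused for every other multi-byte field read
from the same trace.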
156
157
158 <H3>Event header</H3>
159
160 <P>
161 Event headers differ according to the following conditions: does the
162 traced system have a heartbeat timer? Is tracing alignment activated?
163
164 <P>
165 Event header:
166 <PRE><TT>
167 { uint32 timestamp
168 or
169 uint64 timestamp }
170 * if has_heartbeat: the 32 LSBs of the cycle counter at the event record time.
171 * else: the complete 64-bit cycle counter.
172 uint8 facility_id
173 * Numerical ID of the facility corresponding to the event. See the facility
174 tracefile to know which facility ID matches which facility name and
175 description.
176 uint8 event_id
177 * Numerical ID of the event inside the facility.
178 uint16 event_size
179 * Size of the variable length data that follows this header.
180 </TT></PRE>
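<P>
The two header variants can be decoded with a single routine that switches on
<tt>has_heartbeat</tt>. The sketch below assumes the unaligned (packed) case,
where <tt>alignment</tt> is 0, and a byte order already known to the caller.
<tt>EventHeader</tt> and <tt>parse_event_header</tt> are hypothetical names.

```python
import struct
from typing import NamedTuple


class EventHeader(NamedTuple):
    timestamp: int     # 32 LSBs of the TSC, or the full 64-bit TSC
    facility_id: int
    event_id: int
    event_size: int    # size of the variable length data that follows
    header_size: int   # bytes consumed by this header (packed case)


def parse_event_header(buf: bytes, offset: int, has_heartbeat: bool,
                       byte_order: str = "<") -> EventHeader:
    """Decode one packed event header starting at `offset` in `buf`."""
    # uint32 timestamp when the heartbeat is active, uint64 otherwise.
    timestamp_code = "I" if has_heartbeat else "Q"
    fmt = byte_order + timestamp_code + "BBH"
    timestamp, facility_id, event_id, event_size = struct.unpack_from(
        fmt, buf, offset)
    return EventHeader(timestamp, facility_id, event_id, event_size,
                       struct.calcsize(fmt))
```

<P>
In the packed case, the next event header starts at
<tt>offset + header_size + event_size</tt>.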
181
182 <P>
183 Event header alignment
184
185 <P>
186 If trace alignment is activated (<tt>alignment</tt> is non-zero), the event
187 header is aligned. In addition, padding is added after the event header so
188 that the variable length data is aligned on the architecture size.
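<P>
The amount of padding needed to reach the next aligned offset can be computed
as sketched below. The helper name <tt>align_padding</tt> is hypothetical, and
the sketch assumes offsets are counted from a base that is itself aligned.

```python
def align_padding(offset: int, alignment: int) -> int:
    """Return the number of padding bytes needed to align `offset`.

    An `alignment` of 0 means the data is packed, so no padding is added.
    """
    if alignment == 0:
        return 0
    # Distance from `offset` up to the next multiple of `alignment`.
    return (-offset) % alignment
```

<P>
For example, with an alignment of 4, an event header ending at offset 13 would
be followed by 3 bytes of padding so that the data starts at offset 16.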
189
190 <P>
191 <!--
192 <H2>System description</H2>
193
194 <P>
195 The system type description, in system.xml, looks like:
196
197 <PRE><TT>
198 &lt;system
199 node_name="vaucluse"
200 domainname="polymtl.ca"
201 cpu=4
202 arch_size="ILP32"
203 endian="little"
204 kernel_name="Linux"
205 kernel_release="2.4.18-686-smp"
206 kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
207 machine="i686"
208 processor="unknown"
209 hardware_platform="unknown"
210 operating_system="Linux"
211 ltt_major_version="2"
212 ltt_minor_version="0"
213 ltt_block_size="100000"
214 &gt;
215 Some comments about the system
216 &lt;/system&gt;
217 </TT></PRE>
218
219 <P>
220 The system attributes kernel_name, node_name, kernel_release,
221 kernel_version, machine, processor, hardware_platform and operating_system
222 come from the uname(1) program. The domainname attribute is obtained from
223 the "hostname &#045;&#045;domain" command. The arch_size attribute is one of
224 LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
225 long (L) and pointers (P). The endian attribute is "little" or "big".
226 While the arch_size and endian attributes could be deduced from the platform
227 type, having these explicit allows analysing traces from yet unknown
228 platforms. The cpu attribute specifies the maximum number of processors in
229 the system; only tracefiles 0 to this maximum - 1 may exist in the cpu
230 directory.
231
232 <P>
233 Within the system element, the text enclosed may describe further the
234 system traced.
235
236 <H2>Bookmarks</H2>
237
238 <P>
239 Bookmarks are user supplied information added to a trace. They contain user
240 annotations attached to a time interval.
241
242 <PRE><TT>
243 &lt;bookmarks&gt;
244 &lt;location name=name cpu=n start_time=t end_time=t&gt;Some text&lt;/location&gt;
245 ...
246 &lt;/bookmarks&gt;
247 </TT></PRE>
248
249 <P>
250 The interval is defined using either "time=" or "start_time=" and
251 "end_time=", or "cycle=" or "start_cycle=" and "end_cycle=".
252 The time is in seconds with decimals up to nanoseconds and cycle counts
253 are unsigned integers with a 64 bits range. The cpu attribute is optional.
254 -->
255 </BODY>
256 </HTML>