<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
 <title>The LTTng trace format</title>
</head>
 <body>

<h1>The LTTng trace format</h1>
9
10<P>
This document describes the LTTng trace format. It is intended only for
developers who work on the LTTng tracer or on the traceread LTTV library;
other users should rely on that library, which offers all the necessary
abstractions on top of the raw trace data.
584db146 14
15<P>
16A trace is contained in a directory tree. To send a trace remotely,
17the directory tree may be tar-gzipped. Trace foo, placed in the home
18directory of user john, /home/john, would have the following content:
19
20<PRE><TT>
21$ cd /home/john
22$ tree foo
23foo/
24|-- eventdefs
25| |-- core.xml
a25fb9c4 26| |-- fs.xml
cb28e4a2 27| |-- ipc.xml
a25fb9c4 28| |-- kernel.xml
29| |-- memory.xml
30| |-- network.xml
31| |-- process.xml
32| |-- s390_kernel.xml
33| |-- socket.xml
34| |-- timer.xml
35| `-- ...
584db146 36|-- info
37| |-- bookmarks.xml
38| `-- system.xml
39|-- control
a25fb9c4 40| |-- facilities_0
41| |-- facilities_1
42| |-- facilities_...
43| |-- interrupts_0
44| |-- interrupts_1
45| |-- interrupts_...
46| |-- modules_0
47| |-- modules_1
48| |-- modules_...
| |-- processes_0
| |-- processes_1
| `-- processes_...
52|-- cpu_0
53|-- cpu_1
54`-- cpu_...
55
584db146 56</TT></PRE>
57
58<P>
The eventdefs directory contains the event descriptions for all the
facilities used. The syntax is a simple subset of XML; XML is widely
known and easily parsed or hand edited. Each file contains one or more
&lt;facility name=name&gt;...&lt;/facility&gt; elements. Several
facilities may have the same name but different content (and thus
a different checksum). This typically happens when, while tracing
is enabled, a module using the named facility is unloaded, modified
(along with the description of some events), recompiled and reloaded.
The trace then contains events from two different, similarly named,
facility versions.
584db146 69
70<P>
a25fb9c4 71A small number of events are predefined, part of the "core" facility,
72and are not present there. These "core" events include "facility_load",
73"facility_unload", "time_heartbeat" and "state_dump_facility_load".
584db146 74
75<P>
The root directory contains a tracefile for each cpu, numbered from 0,
in .trace format. A uniprocessor trace thus only contains the file cpu_0.
A multi-processor with some unused (possibly hotplug) CPU slots may have some
unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs randomly
installed may produce tracefiles named cpu_0, cpu_1, cpu_2, cpu_4, cpu_6
and cpu_7.
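
<P>
The numbering rule above can be illustrated with a minimal Python sketch that
collects the per cpu tracefiles of a trace and tolerates gaps in the CPU
numbers. The helper name list_cpu_tracefiles and its return shape are
hypothetical and are not part of the traceread library.

```python
import re
from pathlib import Path

# Hypothetical helper, for illustration only: matches names such as cpu_0, cpu_12.
_CPU_FILE = re.compile(r"^cpu_(\d+)$")


def list_cpu_tracefiles(trace_root):
    """Return {cpu_number: path} for every cpu_N tracefile found in trace_root."""
    found = {}
    for entry in Path(trace_root).iterdir():
        match = _CPU_FILE.match(entry.name)
        if match and entry.is_file():
            found[int(match.group(1))] = entry
    # Sort by CPU number; missing numbers are expected when CPU slots are empty.
    return dict(sorted(found.items()))
```

A reader should therefore iterate over the numbers actually found rather than
assume a contiguous range starting at 0.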
81
82<P>
The files in the control directory also follow the .trace format and are also
per cpu.
The "facilities" file only contains the "core" facility_load, facility_unload,
time_heartbeat and state_dump_facility_load events,
and is used to determine the facilities used and the code range assigned
to each facility. The other control files contain the initial system
state and various subsequent important events, for example process
creations and exits. The interest of placing such subsequent events
in control trace files instead of (or in addition to) the per cpu
trace files is that they may be accessed more quickly and conveniently,
and that they may be kept even when the per cpu files are overwritten
in "flight recorder mode".
95
96<P>
The info directory contains in system.xml a description of the system on which
the trace was created, as well as different user annotations in bookmarks.xml.
This directory may also contain various information about the trace, generated
during trace analysis (statistics, index...).
101
102
103<H2>Trace format</H2>
104
105<P>
a25fb9c4 106Each tracefile is divided into equal size blocks with a header at the beginning
107of the block. Events are packed sequentially in the block starting right after
108the block header.
109<P>
110Each block consists of :
111<PRE><TT>
112block start/end header
113trace header
114event 1 header
115event 1 variable length data
116event 2 header
117event 2 variable length data
118....
119padding
120</TT></PRE>
121
122<P>
123The block start/end header
124
125<PRE><TT>
begin
  * the beginning of buffer information
  timestamp
    * Used only when no TSC is available.
    uint32 seconds
    uint32 microseconds
  uint64 cycle_count
    * TSC at the beginning of the buffer
  uint64 freq
    * frequency of the CPUs at the beginning of the buffer.
end
  * the end of buffer information
  timestamp
    * Used only when no TSC is available.
    uint32 seconds
    uint32 microseconds
  uint64 cycle_count
    * TSC at the end of the buffer
  uint64 freq
    * frequency of the CPUs at the end of the buffer.
uint32 lost_size
  * number of bytes of padding at the end of the buffer.
uint32 buf_size
  * size of the sub-buffer.
150</TT></PRE>
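
<P>
As an illustration, the block start/end header listed above could be decoded
with Python's struct module. This is only a sketch: it assumes the fields are
laid out back to back in the order shown, with no extra padding, and that the
byte order is already known. The names BlockHeader, parse_block_header and
BLOCK_HEADER_SIZE are hypothetical.

```python
import struct
from collections import namedtuple

# Field order follows the listing above: begin group, end group, then the sizes.
BlockHeader = namedtuple(
    "BlockHeader",
    "begin_seconds begin_microseconds begin_cycle_count begin_freq "
    "end_seconds end_microseconds end_cycle_count end_freq "
    "lost_size buf_size",
)

# Two (uint32, uint32, uint64, uint64) groups followed by two uint32 values.
_LAYOUT = "IIQQ" + "IIQQ" + "II"
BLOCK_HEADER_SIZE = struct.calcsize("<" + _LAYOUT)  # 56 bytes under this assumption


def parse_block_header(data, byte_order="<"):
    """Decode a block header; byte_order is '<' (little) or '>' (big endian)."""
    return BlockHeader(*struct.unpack_from(byte_order + _LAYOUT, data, 0))
```

With such a header, buf_size - lost_size gives the number of bytes of the
sub-buffer that are not padding.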
151
152
153
154<P>
155The trace header
156
157<PRE><TT>
158uint32 magic_number
159 * 0x00D6B7ED, used to check the trace byte order vs host byte order.
160uint32 arch_type
161 * Architecture type of the traced machine.
162uint32 arch_variant
163 * Architecture variant of the traced machine. May be unused on some arch.
164uint32 float_word_order
165 * Byte order of floats and doubles, sometimes different from integer byte
166 order. Useful only for user space traces.
167uint8 arch_size
168 * Size (in bytes) of the void * on the traced machine.
169uint8 major_version
170 * major version of the trace.
171uint8 minor_version
172 * minor version of the trace.
173uint8 flight_recorder
174 * Is flight recorder mode activated ? If yes, data might be missing
175 (overwritten) in the trace.
176uint8 has_heartbeat
177 * Does this trace have heartbeat timer event activated ?
178 Yes (1) -> Event header has 32 bits TSC
179 No (0) -> Event header has 64 bits TSC
180uint8 has_alignment
181 * Is the information in this trace aligned ?
182 Yes (1) -> aligned on min(arch size, atomic data size).
183 No (0) -> data is packed.
uint8 has_tsc
 * Does the traced machine have a working TSC ?
   Yes (1) -> event time is calculated from :
     trace_start_time + ((event_tsc - trace_start_tsc) / freq)
   No (0) -> event time is calculated from :
     trace_start_time
     + (buffer start timestamp - trace start_monotonic)
     + (event_time_delta)
     (not supported)
uint64 start_freq
 * CPUs clock frequency at the beginning of the trace.
195uint64 start_tsc
196 * TSC at the beginning of the trace.
197uint64 start_monotonic
198 * monotonically increasing time at the beginning of the trace.
199 (currently not supported)
200start_time
201 * Real time at the beginning of the trace (as given by date, adjusted by NTP)
202 This is the only time reference with the real world : the rest of the trace
203 has monotonically increasing time from this point (with TSC difference and
204 clock frequency).
205 uint32 seconds
206 uint32 nanoseconds
207</TT></PRE>
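
<P>
Two uses of the trace header can be sketched in Python: detecting the trace
byte order from magic_number, and converting an event TSC into a time when
has_tsc is set. The function names are hypothetical. The time conversion
assumes that the frequency is expressed in cycles per second, so the elapsed
cycle count is divided by it, and it ignores frequency changes during the
trace.

```python
import struct

LTT_MAGIC = 0x00D6B7ED  # magic_number value given in the trace header


def detect_byte_order(data, offset=0):
    """Return '<' or '>' depending on how the 4-byte magic number decodes."""
    for prefix in ("<", ">"):
        (value,) = struct.unpack_from(prefix + "I", data, offset)
        if value == LTT_MAGIC:
            return prefix
    raise ValueError("magic number not found: not a trace header")


def event_time_seconds(start_time, event_tsc, start_tsc, freq):
    """Event time in seconds, assuming freq is in cycles per second."""
    return start_time + (event_tsc - start_tsc) / freq
```

The returned prefix can be passed directly to the struct module when decoding
the remaining multi-byte fields of the trace.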
208
584db146 209
210<P>
a25fb9c4 211Event header
584db146 212
a25fb9c4 213<P>
The event header layout depends on two conditions: does the traced system have
a heartbeat timer, and is tracing alignment activated?
216
217<P>
218Event header :
219<PRE><TT>
220{ uint32 timestamp
221 or
222 uint64 timestamp }
223 * if has_heartbeat : 32 LSB of the cycle counter at the event record time.
224 * else : 64 bits complete cycle counter.
225 * note : if there is no working TSC (has_tsc == 0), then this field contains
226 either the complete monotonically increasing time or the time delta from the
227 previous heartbeat event. (unsupported)
228uint8 facility_id
229 * Numerical ID of the facility corresponding to the event. See the facility
230 tracefile to know which facility ID matches which facility name and
231 description.
232uint8 event_id
233 * Numerical ID of the event inside the facility.
234uint16 event_size
235 * Size of the variable length data that follows this header.
236</TT></PRE>
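
<P>
The event header above could be decoded along the following lines. This sketch
assumes a packed trace (has_alignment == 0) and a known byte order; EventHeader
and parse_event_header are hypothetical names.

```python
import struct
from collections import namedtuple

EventHeader = namedtuple("EventHeader", "timestamp facility_id event_id event_size")


def parse_event_header(data, offset, has_heartbeat, byte_order="<"):
    """Decode one packed event header starting at offset.

    Returns the header and the offset of the variable length data that follows.
    """
    # 32-bit timestamp when the heartbeat is active, 64-bit otherwise.
    timestamp_code = "I" if has_heartbeat else "Q"
    fmt = byte_order + timestamp_code + "BBH"
    header = EventHeader(*struct.unpack_from(fmt, data, offset))
    return header, offset + struct.calcsize(fmt)
```

Under these assumptions the header occupies 8 bytes with a heartbeat and 12
bytes without one, and the next event header starts event_size bytes after the
returned offset.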
237
238<P>
239Event header alignment
240
241<P>
If trace alignment is activated (has_alignment), the event header is aligned
on the architecture size (void pointer size). In addition, padding is
added after the event header so that the variable length data is
also aligned on the architecture size.
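
<P>
The padding computation can be sketched as follows, assuming offsets are
measured from an address that is itself aligned on the architecture size. The
helper names padding_for and align_up are hypothetical.

```python
def padding_for(offset, arch_size):
    """Number of padding bytes needed to bring offset to a multiple of arch_size."""
    return -offset % arch_size


def align_up(offset, arch_size):
    """Smallest multiple of arch_size that is greater than or equal to offset."""
    return offset + padding_for(offset, arch_size)
```

For example, on a trace with an 8-byte arch_size and no heartbeat, the event
header fields take 12 bytes, so padding_for(12, 8) yields 4 bytes of padding
before the variable length data.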
246
247<P>
584db146 248
249<H2>System description</H2>
250
251<P>
252The system type description, in system.xml, looks like:
253
254<PRE><TT>
255&lt;system
256 node_name="vaucluse"
257 domainname="polymtl.ca"
 cpu="4"
259 arch_size="ILP32"
260 endian="little"
261 kernel_name="Linux"
262 kernel_release="2.4.18-686-smp"
263 kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
264 machine="i686"
265 processor="unknown"
266 hardware_platform="unknown"
267 operating_system="Linux"
268 ltt_major_version="2"
269 ltt_minor_version="0"
270 ltt_block_size="100000"
271&gt;
272Some comments about the system
273&lt;/system&gt;
274</TT></PRE>
275
276<P>
The system attributes kernel_name, node_name, kernel_release,
kernel_version, machine, processor, hardware_platform and operating_system
come from the uname(1) program. The domainname attribute is obtained from
the "hostname --domain" command. The arch_size attribute is one of
LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
longs (L) and pointers (P). The endian attribute is "little" or "big".
While the arch_size and endian attributes could be deduced from the platform
type, having them explicit allows analysing traces from yet unknown
platforms. The cpu attribute specifies the maximum number of processors in
the system; only tracefiles cpu_0 to cpu_(maximum - 1) may exist in the trace
root directory.
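
<P>
The arch_size naming scheme can be illustrated with a small Python sketch that
expands the attribute into the widths it specifies. The function name
decode_arch_size is hypothetical; types whose letter is absent from the value
are simply not reported, since the attribute says nothing about them.

```python
_TYPE_NAMES = {"I": "int", "L": "long", "P": "pointer"}
_KNOWN_VALUES = ("LP32", "ILP32", "LP64", "ILP64")


def decode_arch_size(arch_size):
    """Map an arch_size value to {type_name: width_in_bits}."""
    if arch_size not in _KNOWN_VALUES:
        raise ValueError(f"unexpected arch_size: {arch_size!r}")
    # The trailing two digits are the width; the leading letters are the types.
    letters, bits = arch_size[:-2], int(arch_size[-2:])
    return {_TYPE_NAMES[letter]: bits for letter in letters}
```

For instance, decode_arch_size("LP64") reports 64 bits for long and pointer
and leaves the int width unspecified.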
288
289<P>
290Within the system element, the text enclosed may describe further the
291system traced.
292
293
294<H2>Event type descriptions</H2>
295
296<P>
297A facility contains the descriptions of several event types. When a structure
298is reused in several event types, a named type is defined and may be referenced
299by several other event types or named types.
300
301<PRE><TT>
302&lt;facility name=facility_name&gt;
303 &lt;description&gt;Some text&lt;/description&gt;
304 &lt;event name=eventtype_name&gt;
305 &lt;description&gt;Some text&lt;/description&gt;
306 --type structure--
307 &lt;/event&gt;
308 ...
309 &lt;type name=type_name&gt;
310 --type structure--
311 &lt;/type&gt;
312&lt;/facility&gt;
313</TT></PRE>
314
315<P>
316The type structure may be one of the following primitive type elements.
317Whenever the keyword isize is used, the allowed values are
318short, medium, long, 1, 2, 4, 8, indicating the size in bytes.
319The fsize keyword represents one of medium, long, 4 and 8 bytes.
320
321<PRE><TT>
322&lt;int size=isize format="printf format"/&gt;
323
324&lt;uint size=isize format="printf format"/&gt;
325
326&lt;float size=fsize format="printf format"/&gt;
327
328&lt;string format="printf format"/&gt;
329
330&lt;enum size=isize format="printf format"&gt;label1 label2 ...&lt;/enum&gt;
331</TT></PRE>
332
333<P>
334The string is null terminated. For the enumeration, the size of the integer
335used for its representation is specified.
336
337<P>
338The type structure may also be a compound type.
339
340<PRE><TT>
341&lt;array size=n&gt; --type structure-- &lt;/array&gt;
342
343&lt;sequence lengthsize=isize&gt; --type structure-- &lt;/sequence&gt;
344
345&lt;struct&gt;
346 &lt;field name=field_name&gt;
347 &lt;description&gt;Some text&lt;/description&gt;
348 --type structure--
349 &lt;/field&gt;
350 ...
351&lt;/struct&gt;
352
353&lt;union typecodesize=isize&gt;
354 &lt;field name=field_name&gt;
355 &lt;description&gt;Some text&lt;/description&gt;
356 --type structure--
357 &lt;/field&gt;
358 ...
359&lt;/union&gt;
360</TT></PRE>
361
362<P>
Array is a fixed size array of length size. Sequence is a variable size
array with its length stored as a prepended uint of length lengthsize.
A structure is simply an aggregation of fields. A union is one of its n
fields (variant record), as indicated by a preceding code (0 to n - 1)
of the specified size typecodesize.
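
<P>
As an example of a compound type, a sequence of unsigned integers could be
decoded as sketched below. The sketch assumes packed data, a known byte order,
and that the prepended length counts elements rather than bytes;
decode_uint_sequence is a hypothetical name.

```python
import struct

# struct format codes for unsigned integers of 1, 2, 4 and 8 bytes.
_UINT_CODES = {1: "B", 2: "H", 4: "I", 8: "Q"}


def decode_uint_sequence(data, offset, lengthsize, elemsize, byte_order="<"):
    """Decode a length-prefixed sequence of uints.

    Returns the list of values and the offset just past the sequence.
    """
    (count,) = struct.unpack_from(byte_order + _UINT_CODES[lengthsize], data, offset)
    offset += lengthsize
    fmt = byte_order + str(count) + _UINT_CODES[elemsize]
    values = list(struct.unpack_from(fmt, data, offset))
    return values, offset + struct.calcsize(fmt)
```

A union would be read in a similar way: the preceding code of size
typecodesize is decoded first and then selects which field decoder to apply.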
368
369<P>
370Finally the type structure may be defined by referencing a named type.
371
372<PRE><TT>
373&lt;typeref name=type_name/&gt;
</TT></PRE>
375
376<H2>Builtin events</H2>
377
378<P>
379The facility named "builtin" is always present and contains at least the
380following event types.
381
382<PRE><TT>
383&lt;event name=facility_load&gt;
384 &lt;description&gt;Facility used in the trace&lt;/description&gt;
385 &lt;struct&gt;
386 &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
387 &lt;field name="checksum"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
388 &lt;field name="base_code"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
389 &lt;/struct&gt;
390&lt;/event&gt;
391
392&lt;event name=block_start&gt;
393 &lt;description&gt;Block start timestamp&lt;/description&gt;
394 &lt;typeref name=block_timestamp/&gt;
395&lt;/event&gt;
396
397&lt;event name=block_end&gt;
398 &lt;description&gt;Block end timestamp&lt;/description&gt;
399 &lt;typeref name=block_timestamp/&gt;
400&lt;/event&gt;
401
402&lt;event name=time_heartbeat&gt;
403 &lt;description&gt;System time values sent periodically to minimize cycle counter
404 drift with respect to real time clock and to detect cycle counter
405 rollovers
406 &lt;/description&gt;
407 &lt;typeref name=timestamp/&gt;
408&lt;/event&gt;
409
&lt;type name=block_timestamp&gt;
  &lt;struct&gt;
    &lt;field name=timestamp&gt;&lt;typeref name=timestamp/&gt;&lt;/field&gt;
    &lt;field name=block_id&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;

&lt;type name=timestamp&gt;
  &lt;struct&gt;
    &lt;field name=time&gt;&lt;typeref name=timespec/&gt;&lt;/field&gt;
    &lt;field name="cycle_count"&gt;&lt;uint size=8/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;
423
424&lt;type name=timespec&gt;
425 &lt;struct&gt;
426 &lt;field name="seconds"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
427 &lt;field name="nanoseconds"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
428 &lt;/struct&gt;
429&lt;/type&gt;
430</TT></PRE>
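
<P>
To make the facility_load description concrete, the following Python sketch
decodes its payload: a null terminated name followed by two 4-byte unsigned
integers. It assumes packed data, an ASCII name and a known byte order;
decode_facility_load is a hypothetical name.

```python
import struct


def decode_facility_load(payload, byte_order="<"):
    """Decode the variable length data of a facility_load event."""
    end = payload.index(b"\x00")  # the string field is null terminated
    name = payload[:end].decode("ascii")
    checksum, base_code = struct.unpack_from(byte_order + "II", payload, end + 1)
    return {"name": name, "checksum": checksum, "base_code": base_code}
```

Because two facility versions may share a name, a reader would probably index
its facility table by the (name, checksum) pair rather than by name alone.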
431
432<H2>Control files</H2>
433
434<P>
435The interrupts file reflects the content of the /proc/interrupts system file.
436It contains one event describing each interrupt. At trace start, events are
437generated describing all the current interrupts. If the assignment of
438interrupts changes later, due to devices or device drivers being activated or
439deactivated, additional events may be added to the file. Each interrupt
440event has the following structure.
441
442<PRE><TT>
443&lt;event name=interrupt&gt;
  &lt;description&gt;Interrupt request number assignment&lt;/description&gt;
445 &lt;struct&gt;
446 &lt;field name="number"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
447 &lt;field name="count"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
448 &lt;field name="controller"&gt;&lt;string/&gt;&lt;/field&gt;
449 &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
450 &lt;/struct&gt;
451&lt;/event&gt;
452</TT></PRE>
453
454<P>
455The processes file contains the list of processes already created when the
456trace starts. Each process describing event is modeled after the
457/proc/self/status system file. The number of fields in this event is
458expected to be expanded in the future to include groups, signal masks,
459opened file descriptors and address maps.
460
461<PRE><TT>
462&lt;event name=process&gt;
  &lt;description&gt;Existing process&lt;/description&gt;
464 &lt;struct&gt;
465 &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
466 &lt;field name="pid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
467 &lt;field name="ppid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
468 &lt;field name="tracer_pid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
469 &lt;field name="uid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
470 &lt;field name="euid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
471 &lt;field name="suid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
472 &lt;field name="fsuid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
473 &lt;field name="gid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
474 &lt;field name="egid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
475 &lt;field name="sgid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
476 &lt;field name="fsgid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
477 &lt;field name="state"&gt;&lt;enum size=4&gt;
478 Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
479 &lt;/enum&gt;&lt;/field&gt;
480 &lt;/struct&gt;
481&lt;/event&gt;
482</TT></PRE>
483
484<H2>Facilities</H2>
485
486<P>
Facilities define a granularity of event grouping for filtering, activation
and compilation. Each facility costs a table entry in the kernel (name,
checksum, event type code range), somewhere between 20 and 30 bytes. Having
one facility per tracing statement in the kernel would be too much (assuming
that tracing statements eventually are routinely inserted in the kernel code
and replace the 80000+ printk statements in some proportion). However, having
a few facilities, up to a few tens, would make sense.
494
495<P>
496The "builtin" facility contains a small number of predefined events which must
497always exist. The "core" facility contains a small subset of OS events which
498are almost always of interest (scheduling, interrupts, faults, system calls).
499Then, specialized facilities may exist for each subsystem (network, disks,
500USB, SCSI...).
501
502
503<H2>Bookmarks</H2>
504
505<P>
506Bookmarks are user supplied information added to a trace. They contain user
507annotations attached to a time interval.
508
509<PRE><TT>
510&lt;bookmarks&gt;
511 &lt;location name=name cpu=n start_time=t end_time=t&gt;Some text&lt;/location&gt;
512 ...
513&lt;/bookmarks&gt;
514</TT></PRE>
515
516<P>
The interval is defined using either "time=", or "start_time=" and
"end_time=", or "cycle=", or "start_cycle=" and "end_cycle=".
The time is in seconds with decimals up to nanoseconds, and cycle counts
are unsigned integers with a 64-bit range. The cpu attribute is optional.
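
<P>
A bookmarks file of this shape could be produced with Python's standard
xml.etree.ElementTree module, as in the sketch below. The input dictionary
layout and the names format_time and build_bookmarks are hypothetical; times
are passed as (seconds, nanoseconds) integer pairs to avoid floating point
rounding, and attribute values are written quoted, as well-formed XML requires.

```python
import xml.etree.ElementTree as ET


def format_time(seconds, nanoseconds):
    """Render a time as seconds with exactly nine decimals (nanoseconds)."""
    return f"{seconds}.{nanoseconds:09d}"


def build_bookmarks(locations):
    """Build a <bookmarks> document from a list of location dictionaries."""
    root = ET.Element("bookmarks")
    for loc in locations:
        attrs = {"name": loc["name"]}
        if "cpu" in loc:  # the cpu attribute is optional
            attrs["cpu"] = str(loc["cpu"])
        attrs["start_time"] = format_time(*loc["start"])
        attrs["end_time"] = format_time(*loc["end"])
        node = ET.SubElement(root, "location", attrs)
        node.text = loc["text"]
    return ET.tostring(root, encoding="unicode")
```

A cycle based bookmark would use start_cycle and end_cycle attributes holding
plain unsigned integers instead of the two time attributes.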
521
522</BODY>
523</HTML>