<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
 <title>The new LTT trace format</title>
</head>
 <body>

<h1>The new LTT trace format</h1>

<P>
A trace is contained in a directory tree. To send a trace remotely,
the directory tree may be tar-gzipped. Trace foo, placed in the home
directory of user john, /home/john, would have the following content:

<PRE><TT>
$ cd /home/john
$ tree foo
foo/
|-- eventdefs
|   |-- core.xml
|   |-- net.xml
|   |-- ipv4.xml
|   `-- ide.xml
|-- info
|   |-- bookmarks.xml
|   `-- system.xml
|-- control
|   |-- facilities
|   |-- interrupts
|   `-- processes
`-- cpu
    |-- 0
    |-- 1
    |-- 2
    `-- 3
</TT></PRE>

<P>
The eventdefs directory contains the event descriptions for all the
facilities used. The syntax is a simple subset of XML; XML is widely
known and easily parsed or hand edited. Each file contains one or more
&lt;facility name=name&gt;...&lt;/facility&gt; elements. Several
facilities may have the same name but different content (and thus
a different checksum), typically when the event descriptions
for a given facility change from one version to the next, for instance
if a module is recompiled and reloaded during a trace.

<P>
A small number of events are predefined as part of the "builtin" facility
and are not described in the eventdefs directory. These "builtin" events
include "facility_load", "block_start", "block_end" and "time_heartbeat".

<P>
The cpu directory contains a tracefile for each cpu, numbered from 0,
in .trace format. A uniprocessor trace thus contains only the file cpu/0.
A multi-processor with some unused (possibly hotplug) CPU slots may have some
unused CPU numbers. For instance, an 8-way SMP board with 6 CPUs installed
in arbitrary slots may produce tracefiles named 0, 1, 2, 4, 6, 7.
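
<P>
As a rough illustration of the layout above, the following Python sketch
lists the CPU numbers for which a tracefile exists. The helper names
(cpu_tracefiles, make_demo_trace) are hypothetical and not part of LTT; the
demo directory is an empty throwaway tree that only mimics the naming.

```python
import os
import tempfile


def cpu_tracefiles(trace_dir):
    """Return the sorted CPU numbers that have a tracefile in trace_dir/cpu."""
    cpu_dir = os.path.join(trace_dir, "cpu")
    # Tracefiles are named after the CPU number; gaps are legal.
    return sorted(int(name) for name in os.listdir(cpu_dir) if name.isdigit())


def make_demo_trace(cpus):
    """Create an empty, throwaway trace layout (illustration only)."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "cpu"))
    for n in cpus:
        open(os.path.join(root, "cpu", str(n)), "wb").close()
    return root


# The 8-way board with 6 CPUs from the text.
demo = make_demo_trace([0, 1, 2, 4, 6, 7])
present = cpu_tracefiles(demo)
```

<P>
A reader should therefore not assume that the CPU numbers are contiguous.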

<P>
The files in the control directory also follow the .trace format.
The "facilities" file only contains "builtin" facility_load events
and is used to determine the facilities used and the code range assigned
to each facility. The other control files contain the initial system
state and various subsequent important events, for example process
creations and exits. The advantage of placing such subsequent events
in control tracefiles instead of (or in addition to) the per cpu
tracefiles is that they may be accessed more quickly and conveniently,
and that they may be kept even when the per cpu files are overwritten
in "flight recorder mode".

<P>
The info directory contains, in system.xml, a description of the system on
which the trace was created, and, in bookmarks.xml, user annotations.
This directory may also contain various information about the trace, generated
during trace analysis (statistics, index...).


<H2>Trace format</H2>

<P>
Each tracefile is divided into equal size blocks with a uint32 at the block
end giving the offset to the last event in the block. Events are packed
sequentially in the block starting at offset 0 with a "block_start" event
and ending, at the offset stored in the last 4 bytes of the block, with a
"block_end" event. Both the block_start and block_end events
contain the kernel timestamp (timespec binary structure,
uint32 seconds, uint32 nanoseconds), the cycle counter (uint64 cycles),
and the buffer id (uint64).

<P>
Each event consists of an event type id (a uint16 equal to the event type id
within the facility plus the facility base id), a time delta (a uint32 in
cycles or nanoseconds, depending on configuration, measured from the last time
value, found in the block header or in a "time_heartbeat" event) and the event
type specific data. All values are packed in native byte order binary format.
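
<P>
The block layout described above might be read as in the following Python
sketch. It assumes that the event header is packed without padding, and it
uses a tiny 64 byte block and made-up ids for block_start and block_end; the
actual block size comes from the trace itself and the actual ids are not
given here.

```python
import struct

# "=" selects native byte order with standard sizes and no padding (assumed).
EVENT_HEADER = struct.Struct("=HI")  # uint16 event type id, uint32 time delta
BLOCK_TRAILER = struct.Struct("=I")  # uint32 offset of the last event


def last_event_offset(block):
    """Read the offset stored in the last 4 bytes of a block."""
    (offset,) = BLOCK_TRAILER.unpack_from(block, len(block) - BLOCK_TRAILER.size)
    return offset


def event_header(block, offset):
    """Return (event type id, time delta) for the event starting at offset."""
    return EVENT_HEADER.unpack_from(block, offset)


# Build a toy block. The size and both ids are hypothetical values.
BLOCK_SIZE = 64
BLOCK_START_ID, BLOCK_END_ID = 1, 2
block = bytearray(BLOCK_SIZE)
EVENT_HEADER.pack_into(block, 0, BLOCK_START_ID, 0)
EVENT_HEADER.pack_into(block, 40, BLOCK_END_ID, 1234)
BLOCK_TRAILER.pack_into(block, BLOCK_SIZE - BLOCK_TRAILER.size, 40)

end_offset = last_event_offset(block)
first = event_header(block, 0)
last = event_header(block, end_offset)
```

<P>
Reading the trailer first lets a reader find the block_end event, and hence
the time span of a block, without decoding every event in between.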


<H2>System description</H2>

<P>
The system type description, in system.xml, looks like:

<PRE><TT>
&lt;system
  node_name="vaucluse"
  domainname="polymtl.ca"
  cpu="4"
  arch_size="ILP32"
  endian="little"
  kernel_name="Linux"
  kernel_release="2.4.18-686-smp"
  kernel_version="#1 SMP Sun Apr 14 12:07:19 EST 2002"
  machine="i686"
  processor="unknown"
  hardware_platform="unknown"
  operating_system="Linux"
  ltt_major_version="2"
  ltt_minor_version="0"
  ltt_block_size="100000"
&gt;
Some comments about the system
&lt;/system&gt;
</TT></PRE>

<P>
The system attributes kernel_name, node_name, kernel_release,
kernel_version, machine, processor, hardware_platform and operating_system
come from the uname(1) program. The domainname attribute is obtained from
the "hostname --domain" command. The arch_size attribute is one of
LP32, ILP32, LP64 or ILP64 and specifies the length in bits of integers (I),
longs (L) and pointers (P). The endian attribute is "little" or "big".
While the arch_size and endian attributes could be deduced from the platform
type, stating them explicitly allows analysing traces from as yet unknown
platforms. The cpu attribute specifies the maximum number of processors in
the system; only tracefiles 0 to this maximum - 1 may exist in the cpu
directory.

<P>
Within the system element, the enclosed text may further describe the
system traced.
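
<P>
A minimal Python sketch of reading such a description follows. It assumes
that every attribute value is quoted, so that a standard XML parser accepts
the file; the sample is abridged, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

TYPE_NAMES = {"I": "int", "L": "long", "P": "pointer"}


def decode_arch_size(arch_size):
    """Split e.g. "ILP32" into (32, ["int", "long", "pointer"])."""
    letters = arch_size.rstrip("0123456789")
    bits = int(arch_size[len(letters):])
    return bits, [TYPE_NAMES[c] for c in letters]


def read_system(xml_text):
    """Return the system attributes plus the enclosed free-form comment."""
    root = ET.fromstring(xml_text)
    info = dict(root.attrib)
    info["cpu"] = int(info["cpu"])
    info["comment"] = (root.text or "").strip()
    return info


# Abridged sample; attribute values are quoted (an assumption of this sketch).
SAMPLE = """<system
  node_name="vaucluse"
  cpu="4"
  arch_size="ILP32"
  endian="little"
  ltt_block_size="100000"
>
Some comments about the system
</system>"""

info = read_system(SAMPLE)
bits, types = decode_arch_size(info["arch_size"])
# Only these tracefile names may exist in the cpu directory.
allowed_cpus = list(range(info["cpu"]))
```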


<H2>Event type descriptions</H2>

<P>
A facility contains the descriptions of several event types. When a structure
is reused in several event types, a named type is defined and may be referenced
by several other event types or named types.

<PRE><TT>
&lt;facility name=facility_name&gt;
  &lt;description&gt;Some text&lt;/description&gt;
  &lt;event name=eventtype_name&gt;
    &lt;description&gt;Some text&lt;/description&gt;
    --type structure--
  &lt;/event&gt;
  ...
  &lt;type name=type_name&gt;
    --type structure--
  &lt;/type&gt;
&lt;/facility&gt;
</TT></PRE>

<P>
The type structure may be one of the following primitive type elements.
Whenever the keyword isize is used, the allowed values are
short, medium, long, 1, 2, 4, 8, indicating the size in bytes.
The fsize keyword accepts one of medium, long, 4 or 8.

<PRE><TT>
&lt;int size=isize format="printf format"/&gt;

&lt;uint size=isize format="printf format"/&gt;

&lt;float size=fsize format="printf format"/&gt;

&lt;string format="printf format"/&gt;

&lt;enum size=isize format="printf format"&gt;label1 label2 ...&lt;/enum&gt;
</TT></PRE>

<P>
The string is null terminated. For the enumeration, the size of the integer
used for its representation is specified.

<P>
The type structure may also be a compound type.

<PRE><TT>
&lt;array size=n&gt; --type structure-- &lt;/array&gt;

&lt;sequence lengthsize=isize&gt; --type structure-- &lt;/sequence&gt;

&lt;struct&gt;
  &lt;field name=field_name&gt;
    &lt;description&gt;Some text&lt;/description&gt;
    --type structure--
  &lt;/field&gt;
  ...
&lt;/struct&gt;

&lt;union typecodesize=isize&gt;
  &lt;field name=field_name&gt;
    &lt;description&gt;Some text&lt;/description&gt;
    --type structure--
  &lt;/field&gt;
  ...
&lt;/union&gt;
</TT></PRE>

<P>
Array is a fixed size array of length size. Sequence is a variable size
array with its length stored as a prepended uint of length lengthsize.
A structure is simply an aggregation of fields. A union is one of its n
fields (variant record), as indicated by a preceding code (0 to n - 1)
of the specified size typecodesize.
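
<P>
To make the sequence and union encodings concrete, here is a small Python
sketch. It is only an illustration under simplifying assumptions: sizes are
given numerically in bytes, every element or field is an unsigned integer,
and values are packed without padding. All function names are hypothetical.

```python
import struct

# struct format letters for unsigned integers of 1, 2, 4 and 8 bytes.
UINT = {1: "B", 2: "H", 4: "I", 8: "Q"}


def read_uint(buf, offset, size):
    """Read a native-order uint of the given size; return (value, next offset)."""
    (value,) = struct.unpack_from("=" + UINT[size], buf, offset)
    return value, offset + size


def read_sequence(buf, offset, lengthsize, elem_size):
    """A sequence is a uint length followed by that many elements."""
    count, offset = read_uint(buf, offset, lengthsize)
    items = []
    for _ in range(count):
        item, offset = read_uint(buf, offset, elem_size)
        items.append(item)
    return items, offset


def read_union(buf, offset, typecodesize, field_sizes):
    """A union is a uint code (0 to n - 1) followed by the selected field."""
    code, offset = read_uint(buf, offset, typecodesize)
    value, offset = read_uint(buf, offset, field_sizes[code])
    return code, value, offset


# A sequence of three uint32 with lengthsize=2, then a union with
# typecodesize=1 whose field number 1 is a uint64.
payload = struct.pack("=H3I", 3, 10, 20, 30) + struct.pack("=BQ", 1, 99)
items, pos = read_sequence(payload, 0, lengthsize=2, elem_size=4)
code, value, end = read_union(payload, pos, typecodesize=1, field_sizes=[4, 8])
```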

<P>
Finally the type structure may be defined by referencing a named type.

<PRE><TT>
&lt;typeref name=type_name/&gt;
</TT></PRE>

<H2>Builtin events</H2>

<P>
The facility named "builtin" is always present and contains at least the
following event types.

<PRE><TT>
&lt;event name=facility_load&gt;
  &lt;description&gt;Facility used in the trace&lt;/description&gt;
  &lt;struct&gt;
    &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
    &lt;field name="checksum"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="base_code"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/event&gt;

&lt;event name=block_start&gt;
  &lt;description&gt;Block start timestamp&lt;/description&gt;
  &lt;typeref name=block_timestamp/&gt;
&lt;/event&gt;

&lt;event name=block_end&gt;
  &lt;description&gt;Block end timestamp&lt;/description&gt;
  &lt;typeref name=block_timestamp/&gt;
&lt;/event&gt;

&lt;event name=time_heartbeat&gt;
  &lt;description&gt;System time values sent periodically to minimize cycle counter
    drift with respect to real time clock and to detect cycle counter
    rollovers
  &lt;/description&gt;
  &lt;typeref name=timestamp/&gt;
&lt;/event&gt;

&lt;type name=block_timestamp&gt;
  &lt;struct&gt;
    &lt;field name=timestamp&gt;&lt;typeref name=timestamp/&gt;&lt;/field&gt;
    &lt;field name=block_id&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;

&lt;type name=timestamp&gt;
  &lt;struct&gt;
    &lt;field name=time&gt;&lt;typeref name=timespec/&gt;&lt;/field&gt;
    &lt;field name="cycle_count"&gt;&lt;uint size=8/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;

&lt;type name=timespec&gt;
  &lt;struct&gt;
    &lt;field name="seconds"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="nanoseconds"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/type&gt;
</TT></PRE>
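
<P>
Since an event type id is the id within the facility plus the facility base
code, the facility_load events are enough to map an id found in a tracefile
back to its facility. The Python sketch below shows one possible lookup. The
facility names, checksums and base codes are invented for the example, and
the sketch assumes that a facility's range extends up to the next base code,
which the facility_load event itself does not state.

```python
import bisect


def build_index(facility_loads):
    """Sort facility_load records by base_code for binary search."""
    loads = sorted(facility_loads, key=lambda load: load["base_code"])
    return [load["base_code"] for load in loads], loads


def resolve(index, event_id):
    """Map a global event type id to (facility name, id within facility)."""
    bases, loads = index
    # Pick the facility with the greatest base_code not above event_id.
    pos = bisect.bisect_right(bases, event_id) - 1
    if pos == -1:
        raise ValueError("event id below every facility base code")
    facility = loads[pos]
    return facility["name"], event_id - facility["base_code"]


# Hypothetical facility_load records, in arbitrary order.
index = build_index([
    {"name": "core", "checksum": 0x1234, "base_code": 10},
    {"name": "builtin", "checksum": 0, "base_code": 0},
    {"name": "net", "checksum": 0xBEEF, "base_code": 40},
])

core_event = resolve(index, 12)
net_event = resolve(index, 40)
builtin_event = resolve(index, 3)
```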

<H2>Control files</H2>

<P>
The interrupts file reflects the content of the /proc/interrupts system file.
It contains one event describing each interrupt. At trace start, events are
generated describing all the current interrupts. If the assignment of
interrupts changes later, due to devices or device drivers being activated or
deactivated, additional events may be added to the file. Each interrupt
event has the following structure.

<PRE><TT>
&lt;event name=interrupt&gt;
  &lt;description&gt;Interrupt request number assignment&lt;/description&gt;
  &lt;struct&gt;
    &lt;field name="number"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="count"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="controller"&gt;&lt;string/&gt;&lt;/field&gt;
    &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/event&gt;
</TT></PRE>
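
<P>
As an illustration, the data part of such an interrupt event could be decoded
as in the Python sketch below. It assumes that the fields follow each other
without padding and that strings are ASCII; the sample values are made up,
and the event header (type id and time delta) is not included.

```python
import struct


def read_cstring(buf, offset):
    """Read a null terminated string; return (text, offset after the null)."""
    end = buf.index(b"\0", offset)
    return buf[offset:end].decode("ascii"), end + 1


def decode_interrupt(payload):
    """Decode number, count, controller and name from the event data."""
    number, count = struct.unpack_from("=II", payload, 0)
    controller, offset = read_cstring(payload, 8)
    name, offset = read_cstring(payload, offset)
    return {"number": number, "count": count,
            "controller": controller, "name": name}


# Hypothetical event data, loosely modeled on a /proc/interrupts line.
payload = struct.pack("=II", 14, 5120) + b"XT-PIC\0ide0\0"
interrupt = decode_interrupt(payload)
```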

<P>
The processes file contains the list of processes already created when the
trace starts. Each event describing a process is modeled after the
/proc/self/status system file. The number of fields in this event is
expected to be expanded in the future to include groups, signal masks,
opened file descriptors and address maps.

<PRE><TT>
&lt;event name=process&gt;
  &lt;description&gt;Existing process&lt;/description&gt;
  &lt;struct&gt;
    &lt;field name="name"&gt;&lt;string/&gt;&lt;/field&gt;
    &lt;field name="pid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="ppid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="tracer_pid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="uid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="euid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="suid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="fsuid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="gid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="egid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="sgid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="fsgid"&gt;&lt;uint size=4/&gt;&lt;/field&gt;
    &lt;field name="state"&gt;&lt;enum size=4&gt;
      Running WaitInterruptible WaitUninterruptible Zombie Traced Paging
    &lt;/enum&gt;&lt;/field&gt;
  &lt;/struct&gt;
&lt;/event&gt;
</TT></PRE>

<H2>Facilities</H2>

<P>
Facilities define a granularity of event grouping for filtering, activation
and compilation. Each facility costs a table entry in the kernel (name,
checksum, event type code range), somewhere between 20 and 30 bytes. Having
one facility per tracing statement in the kernel would be too much (assuming
that they eventually are routinely inserted in the kernel code and replace
the 80000+ printk statements in some proportion). However, having a few
facilities, up to a few tens, would make sense.

<P>
The "builtin" facility contains a small number of predefined events which must
always exist. The "core" facility contains a small subset of OS events which
are almost always of interest (scheduling, interrupts, faults, system calls).
Then, specialized facilities may exist for each subsystem (network, disks,
USB, SCSI...).


<H2>Bookmarks</H2>

<P>
Bookmarks are user supplied information added to a trace. They contain user
annotations attached to a time interval.

<PRE><TT>
&lt;bookmarks&gt;
  &lt;location name=name cpu=n start_time=t end_time=t&gt;Some text&lt;/location&gt;
  ...
&lt;/bookmarks&gt;
</TT></PRE>

<P>
The interval is defined using either "time=", or "start_time=" and
"end_time=", or "cycle=", or "start_cycle=" and "end_cycle=".
The time is in seconds with decimals up to nanoseconds, and cycle counts
are unsigned integers with a 64-bit range. The cpu attribute is optional.
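
<P>
A possible way to read bookmarks is sketched below in Python. For brevity it
only handles the "start_time=" and "end_time=" form, assumes quoted attribute
values so that a standard XML parser accepts the file, and uses invented
sample bookmarks. Decimal arithmetic is used so that nanosecond digits are
not lost to floating point rounding.

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

NSEC_PER_SEC = 1000000000


def to_timespec(text):
    """Convert "seconds.decimals" into (seconds, nanoseconds)."""
    value = Decimal(text)
    seconds = int(value)
    nanoseconds = int((value - seconds) * NSEC_PER_SEC)
    return seconds, nanoseconds


def read_bookmarks(xml_text):
    """Return one dict per location element; cpu is None when absent."""
    result = []
    for loc in ET.fromstring(xml_text).findall("location"):
        cpu = loc.get("cpu")
        result.append({
            "name": loc.get("name"),
            "cpu": None if cpu is None else int(cpu),
            "start": to_timespec(loc.get("start_time")),
            "end": to_timespec(loc.get("end_time")),
            "text": loc.text or "",
        })
    return result


# Hypothetical bookmarks; the second one omits the optional cpu attribute.
SAMPLE = """<bookmarks>
  <location name="probe" cpu="0" start_time="12.5" end_time="13.000000250">Disk probe</location>
  <location name="idle" start_time="20" end_time="21.25">Nothing happens</location>
</bookmarks>"""

bookmarks = read_bookmarks(SAMPLE)
```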

</BODY>
</HTML>