Benjamin Poirier
benjamin.poirier@polymtl.ca
2009

+ About time synchronization
This framework performs offline time synchronization. This means that the
synchronization is done after tracing is over. It is not the same as online
synchronization like what is done by NTP, nor is it directly influenced by it.

Event timestamps are adjusted according to a clock correction function that
compensates for initial offset and rate offset (i.e. clocks that don't start
out at the same value and clocks that don't run at the same speed). It can
work on two or more traces.

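As an illustration only, the correction can be pictured as a linear function
of the local timestamp. The sketch below is Python rather than the framework's
own code; the function name and the numeric factors are hypothetical, and the
values are exaggerated so that the effect is visible.

```python
def correct_timestamp(t, offset, drift):
    """Map a local timestamp onto the reference clock.

    Assumed linear model: corrected = offset + drift * t, where `offset`
    absorbs the initial offset and `drift` absorbs the rate offset.
    """
    return offset + drift * t


# Hypothetical timestamps recorded on one node, in arbitrary clock units.
node_events = [1000, 2000, 3000]

# Hypothetical factors; real drift values are expected to be very close to 1.
corrected = [correct_timestamp(t, offset=-250.0, drift=1.25) for t in node_events]
```

With a drift of exactly 1 and an offset of 0 the function leaves timestamps
unchanged, which is what one would expect for the reference trace.
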
The synchronization is based on relations identified in network traffic
between nodes. So, for it to work, there must be traffic exchanged between the
nodes. At the moment, this must be TCP traffic. Any kind will do (ssh, http,
...).

For scientific information about the algorithms used, see:
* Duda, A., Harrus, G., Haddad, Y., and Bernard, G.: Estimating global time in
  distributed systems, Proc. 7th Int. Conf. on Distributed Computing Systems,
  Berlin, volume 18, 1987
* Ashton, P.: Algorithms for Off-line Clock Synchronisation, University of
  Canterbury, December 1995
  http://www.cosc.canterbury.ac.nz/research/reports/TechReps/1995/tr_9512.pdf

+ Using time synchronization
++ Recording traces
To use time synchronization you have to record traces on multiple nodes
simultaneously with lttng (the tracer). While recording the traces, you have
to make sure the following markers are enabled:
* dev_receive
* dev_xmit_extended
* tcpv4_rcv_extended
* udpv4_rcv_extended
You can use the 'ltt-armall' and 'ltt-armnetsync' scripts for this.

You also have to make sure there is some TCP traffic between the traced nodes.

++ Viewing traces
Afterwards, you have to make sure all the traces are accessible from a single
machine, where lttv (the viewer) is run.

Time synchronization is enabled and controlled via the following lttv options,
as seen with "-h":
--sync
        synchronize the time between the traces
--sync-stats
        print statistics about the time synchronization
--sync-null
        read the events but do not perform any processing, this
        is mostly for performance evaluation
--sync-analysis - argument: chull, linreg
        specify the algorithm to use for event analysis
--sync-graphs
        output gnuplot graph showing synchronization points
--sync-graphs-dir - argument: DIRECTORY
        specify the directory where to store the graphs, by
        default in "graphs-<lttv-pid>"

To enable synchronization, start lttv with the "--sync" option. It can be
used in text mode or in GUI mode. You can add the traces one by one in the GUI
but this will recompute the synchronization after every trace that is added.
Instead, you can save some time by specifying all your traces on the command
line (using -t).

Example:
lttv-gui -t traces/node1 -t traces/node2 --sync

++ Statistics
The --sync-stats option is useful to make sure the synchronization algorithms
worked. Here is an example output (with added comments) from a successful
chull (one of the synchronization algorithms) run of two traces:
    LTTV processing stats:
        received frames: 452
        received frames that are IP: 452
        received and processed packets that are TCP: 268
        sent packets that are TCP: 275
    TCP matching stats:
        total input and output events matched together to form a packet: 240
        Message traffic:
            0 - 1 : sent 60 received 60
# Note that 60 + 60 < 240, this is because there was loopback traffic, which is
# discarded.
    Convex hull analysis stats:
        out of order packets dropped from analysis: 0
        Number of points in convex hulls:
            0 - 1 : lower half-hull 7 upper half-hull 9
        Individual synchronization factors:
            0 - 1 : Middle a0= -1.33641e+08 a1= 1 - 4.5276e-08 accuracy 1.35355e-05
                a0: -1.34095e+08 to -1.33187e+08 (delta= 907388)
                a1: 1 -6.81298e-06 to +6.72248e-06 (delta= 1.35355e-05)
    Resulting synchronization factors:
        trace 0 drift= 1 offset= 0 (0.000000) start time= 18.799023588
        trace 1 drift= 1 offset= 1.33641e+08 (0.066818) start time= 19.090688494
    Synchronization time:
        real time: 0.113308
        user time: 0.112007
        system time: 0.000000

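The chull figures in this output can be cross-checked by hand. The short
Python sketch below assumes that "Middle" is the midpoint of the reported a0
and a1 ranges and that "accuracy" is the width of the a1 range. This reading
is inferred from the numbers themselves, not from the source code. Small
differences come from the rounding of the printed values.

```python
# Bounds copied from the "Individual synchronization factors" lines.
a0_min, a0_max = -1.34095e+08, -1.33187e+08
a1_low, a1_high = -6.81298e-06, 6.72248e-06   # deviations of a1 from 1

a0_middle = (a0_min + a0_max) / 2          # compare with a0= -1.33641e+08
a0_delta = a0_max - a0_min                 # compare with delta= 907388
a1_middle_dev = (a1_low + a1_high) / 2     # compare with a1= 1 - 4.5276e-08
a1_accuracy = a1_high - a1_low             # compare with accuracy 1.35355e-05

print(a0_middle, a0_delta, a1_middle_dev, a1_accuracy)
```

If the recomputed values are far from the reported ones, the output was
probably misread or comes from a different analysis module.
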
++ Algorithms
The synchronization framework is extensible and already includes two
algorithms: chull and linreg. You can choose which analysis algorithm to use
with the --sync-analysis option.

+ Design
This part describes the design of the synchronization framework. This is to
help programmers interested in:
* adding new synchronization algorithms (analysis part)
  There are already two analysis algorithms available: chull and linreg
* using new types of events (processing and matching parts)
* using time synchronization with another data source/tracer (processing part)
  There are already two data sources available: lttng and unittest

++ Sync chain
This part is specific to the framework in use: the program doing
synchronization, the executable linking to the event_*.o
e.g. LTTV, unittest

This reads parameters, creates SyncState and calls the processing init
function. The "sync chain" is the set of event_* modules. At the moment there
is only one module at each stage. However, as more modules are added, it will
become relevant to have many modules at the same stage simultaneously. This
will require some modifications. I've kept this possibility at the back of my
mind while designing.

++ Stage 1: Event processing
Specific to the tracing data source.
e.g. LTTng, LTT userspace, libpcap

Read the events from the trace and stuff them in an appropriate Event object.

++ Communication between stages 1 and 2: events
Communication is done via objects specialized from Event. At the moment, all
*Event are in data_structures.h. Specific event structures and functions could
be in separate files. This way, adding a new set of modules would require
shipping extra data_structures* files instead of modifying the existing one.
For this to work, Event.type could not be an enum; it could instead be an int
and use #defines or constants defined in the specialized data_structures*
files. Event.event could be a void*.

++ Stage 2: Event matching
This stage and its modules are specific to the type of event. Event processing
feeds the events one at a time but event analysis works on groups of events.
Event matching is responsible for forming these groups. Generally speaking,
these can have different types of relations ("one to one", "one to many", or a
mix) and this will influence the overall behavior of the module.
e.g. TCP, UDP, MPI

matchEvent() takes an Event pointer. An actual matching module doesn't have to
be able to process every type of event. It will only be passed events of a
type it can process (according to the .canMatch field of its MatchingModule
struct).

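The filtering on .canMatch can be sketched as follows. This is a Python
analogue written for illustration, not the framework's C code: the type
constants, the dispatch() helper and the use of a set for canMatch are
assumptions, and only the names Event, MatchingModule, canMatch, matchEvent,
time and traceNb come from this document.

```python
from dataclasses import dataclass
from typing import Callable, List, Set

TCP, UDP = 0, 1  # hypothetical event type constants


@dataclass
class Event:
    time: float
    traceNb: int
    type: int


@dataclass
class MatchingModule:
    name: str
    canMatch: Set[int]                    # event types this module accepts
    matchEvent: Callable[[Event], None]   # single entry point for all events


def dispatch(event: Event, modules: List[MatchingModule]) -> List[str]:
    """Pass the event only to modules that declare they can match its type."""
    handled = []
    for module in modules:
        if event.type in module.canMatch:
            module.matchEvent(event)
            handled.append(module.name)
    return handled


received = []
modules = [MatchingModule(name="tcp", canMatch={TCP}, matchEvent=received.append)]
dispatch(Event(time=1.0, traceNb=0, type=TCP), modules)  # reaches the tcp module
dispatch(Event(time=2.0, traceNb=1, type=UDP), modules)  # no module accepts it
```

Because the caller filters on canMatch, a matching module never needs to
reject an event type it does not understand.
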
++ Communication between stages 2 and 3: event groups
Communication consists of events grouped in Message, Exchange or Broadcast
structs.

About exchanges:
If one event pair is a packet (more generally, something representable as a
Message), an exchange is composed of at least two packets, one in each
direction. There should be a non-negative minimum "round trip time" (RTT)
between the first and last event of the exchange. This RTT should be as small
as possible, so these packets should be closely related in time, like a data
packet and an acknowledgement packet. If the events analyzed are such that the
minimum RTT can be zero, there's nothing gained in analyzing exchanges beyond
what can already be figured out by analyzing packets.

An exchange can also consist of more than two packets, in case one packet
single-handedly acknowledges many data packets. In this case, it is best to
use the last acknowledged packet. Assuming a linear clock, an acknowledged
packet is as good as any other. However, since the linear clock assumption is
further from reality as the interval grows longer, it is best to keep the
interval between the two packets as short as possible.

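The choice of the last acknowledged packet can be illustrated with a small
Python sketch. The function names and the timestamps are hypothetical, and all
times are assumed to be taken on the sender's clock; the framework's own
Exchange handling may be organized differently.

```python
def pick_data_packet(acked_send_times):
    """Among data packets covered by one acknowledgement, keep the last sent."""
    return max(acked_send_times)


def exchange_interval(acked_send_times, ack_receive_time):
    """Interval between the chosen data packet and the acknowledgement."""
    return ack_receive_time - pick_data_packet(acked_send_times)


# Three hypothetical data packets acknowledged by a single packet.
send_times = [10.0, 10.5, 11.0]
ack_time = 11.25

shortest = exchange_interval(send_times, ack_time)   # uses the packet sent at 11.0
longest = ack_time - min(send_times)                 # what the first packet would give
```

Here the last acknowledged packet gives an interval of 0.25 while the first
one would give 1.25, so the linear clock assumption is stretched over a much
shorter span.
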
++ Stage 3: Event analysis
This stage and its modules are specific to the algorithm that analyzes events
to deduce synchronization factors.
e.g. convex hull, linear regression, broadcast Maximum Likelihood Estimator

Instead of having one analyzeEvents() function that can receive any sort of
grouping of events, there are three prototypes: analyzeMessage(),
analyzeExchange() and analyzeBroadcast(). A module implements only the
relevant one(s) and sets the other function pointers to NULL in its
AnalysisModule struct.

The approach is different from matchEvent() where there is one point of entry
no matter the type of event. The analyze*() approach has the advantage that
there is no casting or type detection to do. It is also possible to deduce
from the function pointers which groupings of events a module can analyze.
However, it means each analysis module will have to be modified if there is
ever a new type of event grouping.

I chose this approach because:
1) I thought it likely that there will be new types of events but not so
   likely that there will be new types of event groups.
2) all events share some members (time, traceNb, ...) but not event groups
3) we'll see which one of the two approaches works best and we can adapt
   later.

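The function-pointer convention described in this section can be sketched in
Python, with None standing in for NULL. The supported_groupings() helper and
the "example" module are hypothetical; only AnalysisModule and the three
analyze*() names come from this document.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class AnalysisModule:
    name: str
    analyzeMessage: Optional[Callable] = None     # None plays the role of NULL
    analyzeExchange: Optional[Callable] = None
    analyzeBroadcast: Optional[Callable] = None


def supported_groupings(module: AnalysisModule) -> List[str]:
    """Deduce from the set entry points which groupings a module can analyze."""
    names = ("analyzeMessage", "analyzeExchange", "analyzeBroadcast")
    return [n for n in names if getattr(module, n) is not None]


# A hypothetical module that only knows how to analyze messages.
message_only = AnalysisModule(name="example", analyzeMessage=lambda message: None)
print(supported_groupings(message_only))
```

Adding a fourth kind of grouping would mean adding a field to AnalysisModule,
which is the cost mentioned above: every existing module definition has to be
revisited.
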
++ Data flow
Data from traces flows "down" from processing to matching to analysis. Factors
come back up.

++ Evolution and adaptation
It is possible to change/add another sync chain and to add other event_*
modules. It has been done. New types of events may need to be added to
data_structures.h. This is only to link between event_* modules. If the data
does not have to be shared, data_structures.h does not have to be modified.

At the moment there is some code duplication in the last steps of linreg and
chull analysis: the code to propagate the factors when there are more than two
nodes. Maybe there could be a Stage 4 that does that?
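
One way to picture what such a propagation step has to do is to compose
pairwise linear corrections along a path of nodes. The Python sketch below
assumes each correction has the form offset + drift * t; the function names
and numbers are hypothetical, and this is not necessarily how linreg or chull
currently propagate their factors.

```python
def apply(factors, t):
    """Apply one (offset, drift) correction to a timestamp."""
    offset, drift = factors
    return offset + drift * t


def compose(outer, inner):
    """Return factors equivalent to applying `inner` first, then `outer`.

    outer(inner(t)) = o_off + o_drift * (i_off + i_drift * t)
    """
    o_off, o_drift = outer
    i_off, i_drift = inner
    return (o_off + o_drift * i_off, o_drift * i_drift)


# Hypothetical pairwise factors: node 2 -> node 1, then node 1 -> node 0.
node2_to_node1 = (3.0, 0.5)
node1_to_node0 = (10.0, 2.0)

node2_to_node0 = compose(node1_to_node0, node2_to_node1)
```

Applying node2_to_node0 to a timestamp gives the same result as applying the
two pairwise corrections one after the other.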