David Goulet [Thu, 10 Jan 2013 17:07:35 +0000 (12:07 -0500)]
Fix: update next_net_seq_num after sending header
Increment the sequence number only after we are sure that the relayd has
correctly received the data header. If an error occurs when sending the
header, the data won't be extracted from the buffers, so this sequence
number stays untouched.
Furthermore, after sending the header, if the relayd dies, this value
won't matter much, and if there is an error on the stream when reading
the trace data, the stream will be deleted and thus closed on the relayd,
making this value useless.
It's important to note that this sequence number is updated on the
relayd side if the full expected data packet was received. So,
incrementing the value after the transmission of the header does not
change anything in terms of value coherency. The point is to have clear
semantics: once the value has been read and used successfully
(transmitted to the relayd), update it.
In that code flow, the stream's lock is acquired, so there is no need to
read/update it atomically. I've also added comments to better explain
the purpose of this variable and how to use it.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
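A minimal sketch of the "increment only after the header is sent" rule described above. The struct, the function pointer type and the stub senders are hypothetical stand-ins, not the consumer's real types; only the field name next_net_seq_num comes from the commit. As in the commit, the caller is assumed to hold the stream lock, so a plain increment is used.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the consumer stream; only the field touched
 * by this fix is modelled. */
struct stream_sketch {
	uint64_t next_net_seq_num;
};

/* Hypothetical transport hook: returns 0 if the relayd received the data
 * header, a negative value otherwise. */
typedef int (*send_header_fn)(uint64_t seq_num);

/* The caller is assumed to hold the stream lock, so no atomics are needed. */
static int send_header_and_commit(struct stream_sketch *stream,
		send_header_fn send_header)
{
	int ret;

	assert(stream);
	ret = send_header(stream->next_net_seq_num);
	if (ret < 0) {
		/* Header not sent: data stays in the buffers, keep the number. */
		return ret;
	}
	/* Header received by the relayd: this sequence number is consumed. */
	stream->next_net_seq_num++;
	return 0;
}

/* Stub transports used to exercise both paths. */
static int fake_send_ok(uint64_t seq_num)
{
	(void) seq_num;
	return 0;
}

static int fake_send_error(uint64_t seq_num)
{
	(void) seq_num;
	return -1;
}
```

A failed send leaves the counter unchanged, so the same sequence number is reused for the next attempt on that data.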
David Goulet [Thu, 10 Jan 2013 15:18:31 +0000 (10:18 -0500)]
Fix: wrong loop continuation in metadata thread
The validation of the endpoint status can change the metadata hash table,
meaning stream(s) can be removed from it and from the poll set. After
that, continuing the for loop made the thread use possibly invalid file
descriptors that were no longer in the hash table, triggering the lookup
assert on the node just after the for loop.
The very important part here is that when the metadata ht changes, we
MUST go back to the poll wait() to synchronize the subset of fds we are
looking at.
Reported-by: Jesus Garcia <jesus.garcia@ericsson.com>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 9 Jan 2013 22:06:38 +0000 (17:06 -0500)]
Fix: lttng create session memleaks
The uri_parse() function call was leaking copy(ies) of lttng_uri
structure.
Fixes #420
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 9 Jan 2013 15:14:15 +0000 (10:14 -0500)]
Fix: remove unused session id map
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 9 Jan 2013 15:03:38 +0000 (10:03 -0500)]
Fix: wrong session id used on relayd lookup
The relayd session id might not be unique with multiple relayds, so the
lookup could choose the wrong relayd for the given sessiond session id.
Fixes #419
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 7 Jan 2013 22:44:03 +0000 (17:44 -0500)]
Fix: don't do custom lookup to relayd stream ht
Use the function made for stream lookup at every lookup call site. The
function had to be moved up so it is visible.
Signed-off-by: David Goulet <dgoulet@efficios.com>
Yannick Brosseau [Thu, 20 Dec 2012 19:31:06 +0000 (14:31 -0500)]
Add pkg-config for liblttng-ctl
Signed-off-by: Yannick Brosseau <yannick.brosseau@gmail.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Andrew Gabbasov [Mon, 10 Dec 2012 19:37:03 +0000 (13:37 -0600)]
Add kernel modules loading for new probes
New probes introduced by commit
b87700e318c27267890cbd6fb5e50b687279131b
in lttng-modules.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 7 Jan 2013 19:37:16 +0000 (14:37 -0500)]
Fix: add missing UST abi header for make dist
Reported-by: Samuel Martin <smartin@aldebaran-robotics.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 7 Jan 2013 18:45:29 +0000 (13:45 -0500)]
Fix: add missing rcu read side lock/unlock
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 18:53:18 +0000 (13:53 -0500)]
Update version to v2.1.0
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 19:13:07 +0000 (14:13 -0500)]
Fix: lttng create URI parsing and check
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 18:06:50 +0000 (13:06 -0500)]
Fix: missing scripts for make dist
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 17:17:11 +0000 (12:17 -0500)]
Add disable-event to man page and clarify enable-event
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 15:51:41 +0000 (10:51 -0500)]
Fix: update to latest UST abi
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 01:56:04 +0000 (20:56 -0500)]
Fix: bad check of accept() return value
Also fix a missing ret = -1 assignment. Although it is unlikely to hit a
positive ret value that does not match the structure size, better safe
than sorry.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 01:37:14 +0000 (20:37 -0500)]
Fix: missing mutex lock if relayd was not created
Also add a missing ret = -1 assignment in the error path when adding a
relayd socket in the consumer.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 01:21:54 +0000 (20:21 -0500)]
Fix: return error if sendmsg fails on relayd
Also, remove a FIXME that was referring to something that disappeared
(data_size).
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 20 Dec 2012 00:58:30 +0000 (19:58 -0500)]
Fix: variable usage for data pending and add comments
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 23:49:37 +0000 (18:49 -0500)]
Fix: print ret value on ust_app start/stop error
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 23:30:37 +0000 (18:30 -0500)]
Fix: compare write() return value to size
Now also check if the ret value of a write() operation is not equal to
the given size.
Signed-off-by: David Goulet <dgoulet@efficios.com>
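A minimal sketch of the check described above: a write() that returns fewer bytes than requested is treated as an error, just like a negative return value. The EINTR retry mirrors the "sessiond write() to handle EINTR" fix listed further down in this log. The helper name write_checked is hypothetical, not the sessiond's actual function.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 only if all `size` bytes were written.
 * A short write is reported as an error instead of being silently
 * accepted. */
static int write_checked(int fd, const void *buf, size_t size)
{
	ssize_t ret;

	assert(buf);
	do {
		ret = write(fd, buf, size);
	} while (ret < 0 && errno == EINTR); /* retry if a signal interrupted us */

	if (ret < 0 || (size_t) ret != size) {
		return -1;
	}
	return 0;
}
```

Rejecting short writes rather than looping keeps the error handling simple for small control messages; a caller streaming large buffers would more likely loop until everything is written.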
David Goulet [Wed, 19 Dec 2012 23:25:49 +0000 (18:25 -0500)]
Fix: handle orderly shutdown from transport layer
Print a debug statement if a shutdown is detected or else an error. The
transport layer will print the perror in case of an error.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 23:11:12 +0000 (18:11 -0500)]
Fix: change perror to debug statement
Most of the changes here remove a double PERROR, since one is already
done by the transport layer. The debug message now indicates where the
transport error occurred.
Also, don't print an error if the relayd is not found. This is possible
if the relayd dies, so an error here is useless to the common user but
useful as a debug statement.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 22:54:25 +0000 (17:54 -0500)]
Fix: don't print EPIPE error which can happen
Anytime a relayd is killed, writing on a closed fd is entirely possible,
so a PERROR on an EPIPE error is useless as an error; we now print it as
a debug message instead.
Signed-off-by: David Goulet <dgoulet@efficios.com>
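A minimal sketch of the idea, assuming Linux: when the peer (for example a killed relayd) has closed its end, send() fails with EPIPE, which is expected and therefore only logged at debug level, while any other errno is still reported as an error. MSG_NOSIGNAL prevents SIGPIPE from being raised. The helper name send_logged and the stderr messages are hypothetical, not LTTng's DBG()/PERROR() macros.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success, -errno on failure. */
static int send_logged(int sock, const void *buf, size_t len)
{
	ssize_t ret = send(sock, buf, len, MSG_NOSIGNAL); /* no SIGPIPE */

	if (ret < 0) {
		int err = errno;

		if (err == EPIPE) {
			/* Peer closed the connection: expected, debug only. */
			fprintf(stderr, "DBG: peer closed the connection\n");
		} else {
			perror("send");
		}
		return -err;
	}
	return 0;
}
```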
David Goulet [Wed, 19 Dec 2012 22:51:25 +0000 (17:51 -0500)]
Fix: handle shutdown on recv reply in relayd
Print a meaningful error when the recvmsg for the reply gets an orderly
shutdown or an error.
Return a negative value in both cases, since this means that we have to
stop everything for that socket and clean up.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 19 Dec 2012 20:36:59 +0000 (15:36 -0500)]
Fix: Off by one in seq num for data pending command
Like the close stream command, the next sequence number of the stream
needs to be used minus 1 for the data pending command, or else we are off
by one on the relayd during the check: 4 data packets, for instance, mean
a prev_seq value of 4 but a last_next_seq_num of 5, hence an off by one
in the data pending check.
Furthermore, the check on the relayd side was actually wrong. Having a
previous sequence number lower than the last one seen does NOT mean that
the data is not pending, so the check needed was actually "equal or
greater".
Signed-off-by: David Goulet <dgoulet@efficios.com>
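A minimal sketch of the corrected arithmetic, following the commit's own example (4 packets sent, next sequence number 5). The function names are hypothetical: the consumer sends next_net_seq_num - 1, and the relayd considers the data no longer pending once the sequence number it last received is equal to or greater than that value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Consumer side (hypothetical helper): the last sequence number actually
 * used is one less than the next one to be used. */
static uint64_t last_sent_seq(uint64_t next_net_seq_num)
{
	assert(next_net_seq_num > 0);
	return next_net_seq_num - 1;
}

/* Relayd side (hypothetical helper): data is still pending only while the
 * relayd has not yet received up to the last sent sequence number. */
static bool data_pending(uint64_t relayd_prev_seq, uint64_t last_sent)
{
	return relayd_prev_seq < last_sent;
}
```

Passing next_net_seq_num directly (5 instead of 4) would make data_pending() report pending data forever in the example, which is the off by one the commit fixes.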
David Goulet [Wed, 19 Dec 2012 19:13:24 +0000 (14:13 -0500)]
Fix: wrong check on session started on stop command
This is problematic for applications that live longer than the tracing
session. The make check unfortunately did not catch this problem since
we either kill the applications before the stop or wait for them to die.
To quote a colleague of mine on IRC after discovering this:
14:14 < cbab> moar tests!
:)
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 21:50:59 +0000 (16:50 -0500)]
Fix: for librelayd, fix negative reply ret code
Negating a uint32_t does not yield a negative value, so set ret to -1 and
print the actual host byte order ret code as an error.
Signed-off-by: David Goulet <dgoulet@efficios.com>
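A minimal sketch of why negation does not work and what the commit does instead. The reply struct and check_reply() are hypothetical, not librelayd's actual types. Negating an unsigned 32-bit code wraps around to a large positive value, so the error path returns a plain -1 and logs the code after converting it to host byte order with ntohl().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical reply layout: a return code in network byte order. */
struct reply_sketch {
	uint32_t ret_code;
};

static int check_reply(const struct reply_sketch *reply, uint32_t ok_code)
{
	uint32_t code;

	assert(reply);
	code = ntohl(reply->ret_code); /* network -> host byte order */
	if (code != ok_code) {
		/* -code would wrap to a large unsigned value, so return -1. */
		fprintf(stderr, "relayd reply error: %u\n", code);
		return -1;
	}
	return 0;
}
```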
Christian Babeux [Tue, 18 Dec 2012 21:31:18 +0000 (16:31 -0500)]
run-report: Add filtering, health and streaming tests
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:17 +0000 (16:31 -0500)]
run-report: Allow tests to spawn and control their own sessiond
The run-report script can spawn a sessiond if the 'daemon' key value is
set to 'True' in the test description dictionary. If the 'daemon' key is
set to 'False', the TEST_NO_SESSIOND environment variable is set so no
sessiond can be spawned in the tests. This variable is also set when
run-report spawns its own sessiond.
This behavior has the unfortunate side effect of preventing the tests
from spawning and controlling a sessiond themselves.
Fix this issue by allowing the tests to spawn their own sessiond. We
need to pass an additional env dictionary to the TestWorker in order to
spawn the test with the proper environment variables set.
To indicate that a test will spawn and manage its own sessiond, the
'daemon' key value should be set to the "test" string.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:16 +0000 (16:31 -0500)]
run-report: Fix CPU usage stats computation
The CPU usage statistics are computed by grepping the top command
output. The top output format has since changed, so the CPU usage
statistics were not properly computed.
Fix this by adjusting to the new top command output format.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:15 +0000 (16:31 -0500)]
run-report: Restore SIGPIPE default handler in subprocess calls
Python overrides the SIGPIPE default handler because it prefers to check
every write and raise an IOError exception rather than take SIGPIPE
[1].
This behavior has the unfortunate side effect of polluting stdout with
broken pipe messages on shell pipeline invocations (e.g. echo foo |
grep something | etc.) in shell scripts spawned via subprocess.Popen().
This commit fixes the pollution of stdout by restoring the default
SIGPIPE handler on subprocess calls.
[1] - http://bugs.python.org/issue1652
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Tue, 18 Dec 2012 21:31:14 +0000 (16:31 -0500)]
run-report: Use libtool wrapper to spawn the sessiond for tests
The run-report script was using the sessiond binary generated via
libtool under the ".libs/" folder. When using this binary, the consumerd
used when starting the sessiond is the one installed system-wide (if
any). This could lead to test failures if no consumerd is installed on
the system or if a version mismatch occurs.
This commit fixes this by using the libtool wrapper, so the consumerd
built in the local source tree is used.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 21:19:34 +0000 (16:19 -0500)]
Fix: sessiond write() to handle EINTR
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 21:04:19 +0000 (16:04 -0500)]
Fix: change ERR/PERROR statement to DBG
Most of the explanation is added as comments in the code.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:38:25 +0000 (15:38 -0500)]
Fix: DBG statement in relayd
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:30:25 +0000 (15:30 -0500)]
Fix: handle EINTR for every read()
Signed-off-by: David Goulet <dgoulet@efficios.com>
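A minimal sketch of the usual pattern for handling EINTR on read(): if a signal interrupts the call before any data is transferred, read() returns -1 with errno set to EINTR, and the call is simply retried. The wrapper name read_retry_eintr is hypothetical; any other error, a partial read or end of file (0) is returned to the caller unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: same contract as read(2), minus EINTR failures. */
static ssize_t read_retry_eintr(int fd, void *buf, size_t count)
{
	ssize_t ret;

	assert(buf);
	do {
		ret = read(fd, buf, count);
	} while (ret < 0 && errno == EINTR); /* interrupted: try again */
	return ret;
}
```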
David Goulet [Tue, 18 Dec 2012 20:21:33 +0000 (15:21 -0500)]
Fix: handle consumer data pipe read error
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 20:18:27 +0000 (15:18 -0500)]
Fix: don't print usage when listing fails
Fixes #414
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 19:50:51 +0000 (14:50 -0500)]
Fix: possible invalid free in kernel thread
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 19:02:14 +0000 (14:02 -0500)]
Fix: flag metadata stream on quiescent control cmd
For the relayd, when doing a quiescent control command, we have to flag
the corresponding metadata stream, or else it will simply stay alive
until a close stream command and the final data pending command will
always report that data is inflight.
Add a stream id to the relayd command so the relayd can identify which
stream to flag.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 00:04:13 +0000 (19:04 -0500)]
Fix: prioritize control socket communication in relayd
Add LTTNG_POLL_GET_PREV_FD for the relayd listener thread, which needs
to access the previous valid fd during a poll loop.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 20:46:28 +0000 (15:46 -0500)]
Fix: poll and epoll fd set reallocation
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Mathieu Desnoyers [Mon, 17 Dec 2012 23:32:27 +0000 (18:32 -0500)]
Fix: cppcheck linter cleanups
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 17:09:09 +0000 (12:09 -0500)]
Fix: add missing goto pending if data is inflight
There was only detection for data NOT inflight. For data inflight, if a
relayd was found, the code simply exited the loop and returned no data
pending.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 17:05:24 +0000 (12:05 -0500)]
Fix: remove ua_sess->started assert on stop trace
It's entirely possible that a start failed for a specific app while the
started flag is set for the global session, making a stop trace possible
on a session that failed to start.
The assert is no longer valid since this code flow is possible.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 18 Dec 2012 13:59:07 +0000 (08:59 -0500)]
Fix: remove bash quote when starting relayd in tests
Signed-off-by: David Goulet <dgoulet@efficios.com>
Julien Desfossez [Mon, 17 Dec 2012 17:13:38 +0000 (12:13 -0500)]
Set classes of traffic in high_throughput_limits
This patch creates 2 classes for the bandwidth limited test instead of
one. The intent is to have multiple queues in the kernel instead of just
one. That way we can prioritize the control port over the data port and
make sure it gets its share of the bandwidth.
With this update, the control port gets 1/10th of the limit and the data
port gets the remaining 9/10th. If unused, the data connection can borrow
the remaining bandwidth.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 17:37:42 +0000 (12:37 -0500)]
Fix: use the poll wait ret value when iterating on fd(s)
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 17 Dec 2012 17:19:56 +0000 (12:19 -0500)]
Fix: force the poll() return value to be nb_fd
With poll(), we have to iterate over all fds in the pollset since it is
handled in user space, whereas with epoll we don't have to.
This is a first patch to fix the fact that we should iterate over the
number of fds the lttng_poll_wait() call returns, which is, for epoll,
the number of returned events and, for poll, the whole set of fds.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
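A minimal sketch of why poll() needs a full scan, using plain poll() rather than LTTng's lttng_poll_* wrappers. poll() returns how many entries are ready but does not move them to the front of the array: each entry reports readiness in its own revents field. Iterating only over the first `ret` entries can therefore miss a ready fd, whereas epoll_wait() fills a compact events array where iterating `ret` entries is correct. The helper name count_readable is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of readable fds, or -1. */
static int count_readable(struct pollfd *fds, nfds_t nfds, int timeout_ms)
{
	int ready = 0;
	int ret = poll(fds, nfds, timeout_ms);

	if (ret < 0) {
		return -1;
	}
	/* Scan the whole set: ready entries can sit at any index. */
	for (nfds_t i = 0; i < nfds; i++) {
		if (fds[i].revents & POLLIN) {
			ready++;
		}
	}
	return ready;
}
```

In the check below, poll() returns 1 but the ready fd is at index 1, so a loop bounded by the return value would only look at index 0 and miss it.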
David Goulet [Mon, 17 Dec 2012 16:30:24 +0000 (11:30 -0500)]
Fix: add missing pollset reset in relayd listener thread
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 20:11:49 +0000 (15:11 -0500)]
Fix: Wrong check of node when cleaning up ht
The deletion should be ignored when the node is NOT in the hash table,
not the contrary.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 15:28:31 +0000 (10:28 -0500)]
Revert adding LTTNG_PACKED in lttng.h
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 14:47:21 +0000 (09:47 -0500)]
Fix: cleanup high_throughput_limits test
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 01:40:53 +0000 (20:40 -0500)]
Fix: set started flag of ust app after ustctl
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 14 Dec 2012 01:30:50 +0000 (20:30 -0500)]
Fix: memory leak in add relayd socket error path
Signed-off-by: David Goulet <dgoulet@efficios.com>
Julien Desfossez [Fri, 14 Dec 2012 01:01:52 +0000 (20:01 -0500)]
Move relay commands out of lttcomm_sessiond_command
Introduce a new enum for relayd commands: lttcomm_relayd_command. This
will make further additions to either enum cleaner.
Signed-off-by: Julien Desfossez <jdesfossez@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Thu, 13 Dec 2012 23:39:13 +0000 (18:39 -0500)]
Tests: Add health check testpoint fail test
This test triggers a failure in a specified thread by using the testpoint
mechanism. The testpoint behavior is implemented in health_fail.c. The
testpoint code simply returns 1 (non-zero values are considered errors
for testpoints) to trigger the specific thread error handling mechanism.
This test ensures that we can detect health failures for each thread's
error handling path.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Thu, 13 Dec 2012 23:38:56 +0000 (18:38 -0500)]
Add return code to the testpoint mechanism
The testpoint processing could fail and currently there is no mechanism
to notify the caller of such failures. This patch adds an int return
code to the testpoint prototype. Non-zero return codes indicate failure.
When using the testpoint mechanism, the caller should properly handle
testpoint failure cases and trigger the appropriate response (error
handling, thread teardown, etc.).
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
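A minimal sketch of how a caller might react to a testpoint's int return code. This does not reproduce LTTng's actual testpoint macros: the hook is modelled as a hypothetical function pointer that stays NULL unless a test installs one, and a non-zero return value makes the worker take its error path, as the commit message describes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical testpoint hook: NULL unless a test installs one.
 * A non-zero return value means the testpoint failed. */
static int (*testpoint_hook)(void) = NULL;

/* One iteration of a hypothetical worker thread loop. */
static int worker_iteration(void)
{
	if (testpoint_hook != NULL && testpoint_hook() != 0) {
		/* Propagate the failure into the thread's error handling. */
		return -1;
	}
	/* Normal work would happen here. */
	return 0;
}

/* What a failure-injecting test might install. */
static int failing_testpoint(void)
{
	return 1;
}
```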
David Goulet [Thu, 13 Dec 2012 23:27:23 +0000 (18:27 -0500)]
Fix: put back the high-throughput test removed by mistake
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 23:15:56 +0000 (18:15 -0500)]
Fix: Bad error handling when enable channel fails
Fixes #403
Signed-off-by: David Goulet <dgoulet@efficios.com>
Christian Babeux [Mon, 10 Dec 2012 19:46:15 +0000 (14:46 -0500)]
Tests: Fix sleep interruption in health stall test
The sleep(3) call can return the number of seconds left to sleep if
interrupted. Handle the interruption in the health stall test.
Signed-off-by: Christian Babeux <christian.babeux@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
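A minimal sketch of handling an interrupted sleep(3). sleep() returns the number of seconds still left when a signal cuts it short, so looping on that value completes the full delay. The helper name sleep_full is hypothetical, not the test's actual code.

```c
#include <assert.h>
#include <unistd.h>

/* Hypothetical helper: sleep for the whole duration even if signals
 * interrupt sleep(3) part way through. */
static void sleep_full(unsigned int seconds)
{
	while (seconds > 0) {
		/* sleep() returns 0 when done, or the seconds still left. */
		seconds = sleep(seconds);
	}
}
```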
David Goulet [Thu, 13 Dec 2012 22:51:45 +0000 (17:51 -0500)]
Fix: RCU unlock out of error path
On channel error, the RCU read-side lock was not released. Furthermore,
remove a check for a NULL session that also did not go through an RCU
unlock, and change it to an assert.
This also adds a channel subbuf size check when enabling a channel.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 22:30:40 +0000 (17:30 -0500)]
Fix: update file listing for licensing
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 21:55:08 +0000 (16:55 -0500)]
Fix: missing health exit in registration app thread
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 21:41:57 +0000 (16:41 -0500)]
Fix: add packed attribute to filter structure
Also fix the internal UST abi by swapping two variables to match the
upstream UST abi.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 21:35:44 +0000 (16:35 -0500)]
Fix: Add missing health code update for consumer command
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 20:25:03 +0000 (15:25 -0500)]
Fix: pack every sessiond-comm.h structure passed over sockets
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 20:15:10 +0000 (15:15 -0500)]
Add LTTNG_PACKED macro
This adds the macro and sets it on all lttng.h structures. Also, replace
the already packed relayd structures' attribute with the macro.
Signed-off-by: David Goulet <dgoulet@efficios.com>
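A minimal sketch of what a packing macro like this does, assuming a GCC/Clang definition based on the packed type attribute (the exact definition in lttng-tools is not shown in this log). Packing removes the compiler's alignment padding, so a structure sent over a socket has the same layout on both ends regardless of the padding each side would otherwise insert. The two example structs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: GCC/Clang packed type attribute. */
#define LTTNG_PACKED __attribute__((__packed__))

/* Hypothetical wire message without packing: the compiler may insert
 * padding between cmd and value to align value. */
struct msg_unpacked {
	uint8_t cmd;
	uint64_t value;
};

/* Same message, packed: exactly 1 + 8 bytes, no padding. */
struct msg_packed {
	uint8_t cmd;
	uint64_t value;
} LTTNG_PACKED;
```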
David Goulet [Thu, 13 Dec 2012 18:58:31 +0000 (13:58 -0500)]
Fix: clear the fixme in high_throughput_limits
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Thu, 13 Dec 2012 01:16:33 +0000 (20:16 -0500)]
Fix data pending for inflight streaming
The consumer_data_pending() function had a badly named label. The goto
label data_not_pending actually led to the return value for pending data
(1). So, this patch fixes that by renaming the label to reflect its
actual meaning.
Add a missing destroy of the relayd session id mapping hash table.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 22:39:06 +0000 (17:39 -0500)]
Map session id of relayd and sessiond in consumer
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 22:05:45 +0000 (17:05 -0500)]
Add the relayd create session command
This is needed in order to fix a specific data pending condition where
we need streams to be associated with a session. This command will also
be used for new features in the future.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 16:23:20 +0000 (11:23 -0500)]
Make the consumer send an ACK after each command
This is needed to avoid buffer bloat when throttling communication
between the consumer and the relayd. Considering a very low bandwidth
limit between the relayd and consumerd, the session daemon would send a
high rate of commands to the consumer without the unix socket queue ever
being emptied. This makes the UNIX socket reach buffer-full conditions,
which is prone to trigger corner-case behaviors in blocking send/recv
with MSG_WAITALL, which is likely the cause of the hang experienced when
limiting relayd bandwidth.
Adding an ACK to each command makes sure that we, the consumer,
acknowledge to the session daemon that we have emptied the unix socket
buffer.
NOTE: In consumer_add_relayd_socket(), there might be a problem with the
error path and message status to the sessiond. A subsequent patch might
fix a possible issue but for now it is not at all critical since any
critical error on the consumer side will notify the sessiond through the
error socket.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Wed, 12 Dec 2012 18:39:37 +0000 (13:39 -0500)]
Remove MSG_WAITALL on every recvmsg() socket type
In order to handle messages that are possibly larger than the socket
buffer size set by wmem_max and rmem_max /proc files, ensure that the
recv-side reads the data chunk-wise rather than hanging on a
MSG_WAITALL.
In addition to fixing this issue, chances are that it will also help fix
hangs detected due to UNIX socket buffers filling up. The
MSG_WAITALL behavior in such situations might be unexpected.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
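A minimal sketch of chunk-wise reception without MSG_WAITALL. It uses recv() instead of recvmsg() for brevity, and the helper name recv_full is hypothetical. Each call takes whatever the kernel currently has buffered, and the loop keeps going until the full message has arrived, the peer performs an orderly shutdown (recv() returns 0) or a real error occurs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the number of bytes received (less than
 * len only on orderly shutdown), or -1 on error. */
static ssize_t recv_full(int sock, void *buf, size_t len)
{
	char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t ret = recv(sock, p + done, len - done, 0);

		if (ret < 0) {
			if (errno == EINTR) {
				continue; /* interrupted by a signal: retry */
			}
			return -1;
		}
		if (ret == 0) {
			break; /* orderly shutdown by the peer */
		}
		done += (size_t) ret; /* got one chunk, ask for the rest */
	}
	return (ssize_t) done;
}
```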
David Goulet [Mon, 10 Dec 2012 23:24:42 +0000 (18:24 -0500)]
Fix: overlap bash escaping for wildcard event name
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 22:18:23 +0000 (17:18 -0500)]
Fix: Wrong path in the overlap test
Also, activate the overlap.sh tests by default in the make check.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:27:55 +0000 (16:27 -0500)]
Fix: Add missing relayd ht cleanup and ht destroy
Add a function to clean up every element of the relayd ht and free them
in a call_rcu.
Also, destroy the stream_list_ht on cleanup.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:11:15 +0000 (16:11 -0500)]
Fix: Allocate stream hash table in respective threads
Allocation and destroy are now in the same thread for both metadata and
data hash table.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 21:03:58 +0000 (16:03 -0500)]
Fix: Use stream deletion function when cleaning up
In theory, once the destroy stream ht function is called with the hash
table, it should be empty. However, for some fatal errors, it might not
be, so it's imperative that we gracefully delete the stream and free it
using an RCU call so both hash tables (the stream one and the one for the
pending command) are synchronized.
Simply freeing the stream could have created possible fd leaks and
invalid nodes in the data pending hash table.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 18:45:45 +0000 (13:45 -0500)]
Fix: Missing umask when using run as no clone
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 17:16:15 +0000 (12:16 -0500)]
Fix: Relayd and sessiond version check
Now only check that the major versions are equal. After the 2.1 stable
release, both components will adapt to the lowest minor version for the
same major version. For this, the session daemon now sends its version
values to the relayd, hence a slight change in the protocol here.
For instance, with a relayd 2.4 talking to a sessiond 2.8, the
communication and available features will only be those of version 2.4.
For, say, a relayd 3.2 and a sessiond 2.2, the communication stops right
there since the major versions differ.
Acked-by: Julien Desfossez <julien.desfossez@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
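A minimal sketch of the negotiation rule described above: the major versions must match exactly, and the two peers then settle on the lower of their minor versions. The struct and function names are hypothetical, not the relayd protocol's actual types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical version pair exchanged between sessiond and relayd. */
struct version_sketch {
	uint32_t major;
	uint32_t minor;
};

/* Returns 0 and fills *out with the version both sides can speak, or -1
 * if the major versions differ (communication must stop). */
static int negotiate_version(struct version_sketch a,
		struct version_sketch b, struct version_sketch *out)
{
	assert(out);
	if (a.major != b.major) {
		return -1;
	}
	out->major = a.major;
	out->minor = a.minor < b.minor ? a.minor : b.minor; /* lowest minor */
	return 0;
}
```

With the commit's examples, 2.4 and 2.8 negotiate to 2.4, while 3.2 and 2.2 are rejected.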
David Goulet [Mon, 10 Dec 2012 16:38:35 +0000 (11:38 -0500)]
Fix: FD leak on consumer add relayd socket error
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 10 Dec 2012 16:20:30 +0000 (11:20 -0500)]
Fix: Consumer sockets leak on error
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 21:03:04 +0000 (16:03 -0500)]
Fix: Use endpoint status enum value in checks
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 21:00:48 +0000 (16:00 -0500)]
Fix: protect consumer_find_channel with rcu locking
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 20:54:19 +0000 (15:54 -0500)]
Fix: Rename ust_app_destroy_trace and set it static
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 18:54:44 +0000 (13:54 -0500)]
Fix: UST app session teardown process
This patch removes the ht_del of sessions from the delete_ust_app RCU
call and puts it in the unregister app function just before the call_rcu
is done.
To be able to free the sessions in the call_rcu, a list is added which,
when tearing down an application or session, is used to get the session
reference for deletion.
Note that within the RCU call, we are assured that the list is accessed
exclusively, so there is no need for any locking.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Fri, 7 Dec 2012 17:05:24 +0000 (12:05 -0500)]
Fix: check ht_del ret value of ust app session
A UST app session can be destroyed by two execution paths: either the app
unregisters or a destroy session is triggered. So, a failing ht_del is
allowed and means that the session is already scheduled for teardown in
an RCU call.
Furthermore, this means that a lookup for a ust app session that returns
nothing becomes valid, since the session is in the teardown process.
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 4 Dec 2012 23:10:45 +0000 (18:10 -0500)]
Fix: locking order between consumer and stream
Also, lock the stream BEFORE calling the read subbuffer so as not to race
with the data pending command.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Tue, 4 Dec 2012 23:17:55 +0000 (18:17 -0500)]
Fix: don't steal key when adding a metadata stream
This was causing corruption of the stream's node key if the stream->key
of the metadata matched a stream wait_fd, making the stream impossible
to find and triggering an assert when getting out of the metadata poll
wait.
Now we look up the stream before adding it to make sure it's unique, and
no longer try to steal the key since wait_fd is unique to the consumer.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@efficios.com>
Mathieu Desnoyers [Thu, 6 Dec 2012 14:20:11 +0000 (09:20 -0500)]
Consumer hold mutex for add stream
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: David Goulet <dgoulet@ev0ke.net>
David Goulet [Mon, 3 Dec 2012 21:57:57 +0000 (16:57 -0500)]
Fix: audit all close/fclose and check returned code
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 21:43:43 +0000 (16:43 -0500)]
Fix: update/clean lttng.h comments
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 21:14:31 +0000 (16:14 -0500)]
Fix: install lttng health check man page
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 21:07:45 +0000 (16:07 -0500)]
Fix: ship relevant documentations with tarball
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 21:01:09 +0000 (16:01 -0500)]
Remove useless AUTHORS and NEWS files
Authors are listed in each code file along with the copyright statement.
AUTHORS is useless and out of date. NEWS contains nothing.
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 21:00:45 +0000 (16:00 -0500)]
Fix: update urcu version in README and configure.ac
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 20:08:48 +0000 (15:08 -0500)]
Update version to v2.1.0-rc9
Signed-off-by: David Goulet <dgoulet@efficios.com>
David Goulet [Mon, 3 Dec 2012 19:15:25 +0000 (14:15 -0500)]
Fix: set the stream ht static in consumer file
Signed-off-by: David Goulet <dgoulet@efficios.com>