Fix: consumer: snapshot: assertion on subsequent snapshot
Observed issue
==============
While a snapshot is being taken, the containing folder can disappear
unexpectedly. This can lead to the following errors, which are expected
and mostly handled fine:
PERROR - 14:47:32.002564464 [2922498/2922507]: Failed to open file relative to trace chunk file_path = "channel0_0", flags = 577, mode = 432: No such file or directory (in _lttng_trace_chunk_open_fs_handle_locked() at trace-chunk.cpp:1411)
Error: Failed to open stream file "channel0_0"
Error: Snapshot channel failed
The problem happens on the subsequent snapshot for the session:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007fbbdadb3859 in __GI_abort () at abort.c:79
#2 0x00007fbbdadb3729 in __assert_fail_base (fmt=0x7fbbdaf49588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55c4212cfbb5 "!stream->trace_chunk", file=0x55c4212cf820 "kernel-co
#3 0x00007fbbdadc5006 in __GI___assert_fail (assertion=0x55c4212cfbb5 "!stream->trace_chunk", file=0x55c4212cf820 "kernel-consumer/kernel-consumer.cpp", line=188, function=0x55c4212cfb00 "
#4 0x000055c421268cc6 in lttng_kconsumer_snapshot_channel (channel=0x7fbbc4000b60, key=1, path=0x7fbbd37f8fd4 "", relayd_id=18446744073709551615, nb_packets_per_stream=0) at kernel-consume
#5 0x000055c42126b39d in lttng_kconsumer_recv_cmd (ctx=0x55c421b80a90, sock=31, consumer_sockpoll=0x7fbbd37fd280) at kernel-consumer/kernel-consumer.cpp:986
#6 0x000055c4212546d1 in lttng_consumer_recv_cmd (ctx=0x55c421b80a90, sock=31, consumer_sockpoll=0x7fbbd37fd280) at consumer/consumer.cpp:2090
#7 0x000055c421259963 in consumer_thread_sessiond_poll (data=0x55c421b80a90) at consumer/consumer.cpp:3281
#8 0x00007fbbdaf8b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#9 0x00007fbbdaeb0163 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
How to reproduce:
1. Set a breakpoint on snapshot_channel() inside
src/common/ust-consumer/ust-consumer.cpp.
2. When the breakpoint hits, remove the complete lttng directory
containing the session data.
3. Continue the lttng_consumerd process from gdb.
4. consumer_stream_create_output_files() then returns -1 inside
snapshot_channel().
5. Take another snapshot: lttng_consumerd crashes because
of the `assert(!stream->trace_chunk)` in snapshot_channel().
This last step does not require any breakpoint intervention.
Cause
=====
During the snapshot, the stream is assigned the channel's current chunk.
It is expected that the stream does not have a chunk at this point.
The error handling is faulty here: when the snapshot fails, the stream
keeps its chunk. The stream chunk must be invalidated/reset on error to
allow its reuse later on.
The problem exists for both consumer domains (user/kernel).
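The failure mode can be illustrated with a minimal, self-contained
sketch. It is not the lttng-tools code: `TraceChunk`, `Stream`,
`SnapshotResult` and `snapshot_stream_buggy()` are hypothetical stand-ins,
and the real consumer asserts where this model returns `StaleChunk`.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the consumer types (assumption, not the real API).
struct TraceChunk {};

struct Stream {
	// Expected to be empty between two snapshots.
	std::shared_ptr<TraceChunk> trace_chunk;
};

enum class SnapshotResult { Ok, OutputError, StaleChunk };

// Models the faulty error handling: on an output error, the stream keeps
// the chunk it was assigned at the start of the snapshot.
SnapshotResult snapshot_stream_buggy(Stream& stream,
		const std::shared_ptr<TraceChunk>& channel_chunk,
		bool output_ok)
{
	if (stream.trace_chunk) {
		// The real code has assert(!stream->trace_chunk) here.
		return SnapshotResult::StaleChunk;
	}

	stream.trace_chunk = channel_chunk;

	if (!output_ok) {
		// Bug: returning without resetting stream.trace_chunk.
		return SnapshotResult::OutputError;
	}

	// ... stream data would be copied to the output here ...

	stream.trace_chunk.reset();
	return SnapshotResult::Ok;
}
```

In this model, a first call with `output_ok == false` reports
`OutputError`, and any later call on the same stream reports `StaleChunk`,
even if the output is available again.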
Solution
========
For the ust consumer, we can directly use the `error_close_stream`
label.
For the kernel consumer, the code path is slightly different since it
does not use `consumer_stream_close`. Note that `consumer_stream_close`
cannot be used as is for the kernel consumer. The current implementation
partially resembles `consumer_stream_close` at the end of the iteration.
That end-of-iteration code is extracted to its own function for easier
reuse from the new `error_finalize_stream` label.
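The shape of the kernel-side change, as described above, can be sketched
as follows. This is only an illustration under assumptions: the types, the
`output_ok` flag and the helper name `finalize_stream()` are hypothetical,
and only the `error_finalize_stream` label name comes from this patch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the consumer types (assumption, not the real API).
struct TraceChunk {};

struct Stream {
	std::shared_ptr<TraceChunk> trace_chunk;
	bool output_ok = true; // models whether output file creation succeeds
};

// Hypothetical helper: the end-of-iteration cleanup, extracted so that the
// success path and the error path share it.
static void finalize_stream(Stream& stream)
{
	stream.trace_chunk.reset();
}

int snapshot_channel(std::vector<Stream>& streams,
		const std::shared_ptr<TraceChunk>& channel_chunk)
{
	Stream *stream = nullptr;

	for (std::size_t i = 0; i < streams.size(); i++) {
		stream = &streams[i];

		// Holds again on every snapshot once errors reset the chunk.
		assert(!stream->trace_chunk);
		stream->trace_chunk = channel_chunk;

		if (!stream->output_ok) {
			goto error_finalize_stream;
		}

		// ... stream data would be copied to the output here ...

		finalize_stream(*stream);
	}

	return 0;

error_finalize_stream:
	// Release the chunk so a later snapshot can assign a new one.
	finalize_stream(*stream);
	return -1;
}
```

With this structure, a failed snapshot leaves every stream without a
chunk, so a subsequent call no longer trips the assertion.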
Known drawbacks
===============
None.
Fixes: #1352
Signed-off-by: Marcel Hamer <marcel.hamer@windriver.com>
Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I9fc81917b19aa436ed8e8679672648f2d5baf41a