From: Nils Carlson
Date: Mon, 27 Sep 2010 08:54:48 +0000 (+0200)
Subject: Re-write ustcomm parts of UST v2
X-Git-Tag: v0.8~12
X-Git-Url: https://git.liburcu.org/?a=commitdiff_plain;h=4723ca096d740ff93da400df304c9902e9834e5f;hp=4723ca096d740ff93da400df304c9902e9834e5f;p=ust.git

Re-write ustcomm parts of UST v2

Changes since v1: Updated after comments from David Goulet and resulting
insights.

* Add a continue after a failed accept
* Fix some malloc issues
* Fix some coding style in the patch
* Make del_named_sock free the memory, even if all else fails

Notice: the valgrind test case is currently broken and needs an exception.

Description:

This is a very big patch, so it requires a bit of explaining. It is a step
towards accomplishing several goals I have in this area:

1. Use enums for commands and eliminate text-based commands. This does not
mean that we will stop processing strings for trace, channel and marker
names; just that the long series of if statements doing token and string
matching will be replaced with a switch statement. To this end I have
created a ustcomm_header struct that contains the length of the data field
and some other fields. This allows us to first receive the header, allocate
memory for the data and then receive the data, eliminating all scanning of
messages.

2. Reduce the complexity of the implementation. To put it simply, I don't
like callbacks. They reduce transparency and make it difficult to follow
the flow of the code, so I have eliminated multipoll and replaced it with a
normal epoll. I have also replaced almost all the different server,
connection and source structs with a single one, called ustcomm_sock.

3. Make ustd scale better. Currently ustd scales terribly: we allocate one
thread per CPU per channel per process, so five applications, each with
three channels, on a four-CPU machine lead to 5*3*4 = 60 threads. Part of
the reason for this multitude of threads was that we used a ustcomm_request
call (consisting of a send and a receive) to wait for a subbuffer to be
written. The sequence for a subbuffer to be written was as follows: one of
the ustd threads calls send with a 'get_subbuffer' command and then hangs
on a recv on the socket. Upon filling the subbuffer, the traced app writes
'1' to a pipe. The ust_thread inside the app, which was listening to the
other end of the pipe, wakes up when the '1' is written. The callback from
multipoll then calls send, which sends a reply to the ustd thread over the
socket. The ustd thread wakes up and reads the message, continuing along in
its execution.

I replace this with a somewhat different mechanism, which should eventually
allow us to reduce the number of threads to one per CPU: ustd requests a
buffer_fd, which causes the ust_thread inside the app to send the file
descriptor for the read end of the pipe to ustd. The ustd thread now does a
read on the pipe, halting its execution until the app fills the subbuffer
and writes '1' to the pipe, waking up the ustd thread. Ustd then makes the
'get_subbuffer' call, which the ust_thread inside the app responds to with
information about the subbuffer. Ustd writes the subbuffer out and then
goes back to the read call, hanging on the pipe.

So we are still stuck with the multitude of threads, but we are in a much
better position to move forward. Replacing the read with an epoll call and
pointing the epoll event data at the buffer struct to which the pipe
belongs should be relatively easy. We can then, instead of spawning a new
thread for each buffer, just allocate the buffer_info struct and assign it
to one of the per-CPU threads in ustd to poll on.
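To make the new sequence a bit more concrete, here is a minimal sketch of
the ustd side under this scheme. The names buffer_info, pipe_fd, app_sock,
get_subbuffer, write_subbuffer and put_subbuffer are illustrative
stand-ins, not necessarily the exact symbols introduced by this patch:

    #include <unistd.h>

    /* Illustrative stand-ins; the patch has its own definitions. */
    struct buffer_info {
            int pipe_fd;   /* read end of the pipe, passed from the app */
            int app_sock;  /* socket connected to the traced application */
    };

    extern int get_subbuffer(struct buffer_info *buf);
    extern int write_subbuffer(struct buffer_info *buf);
    extern int put_subbuffer(struct buffer_info *buf);

    /* Still one ustd thread per buffer, for now. */
    static void *consumer_thread(void *arg)
    {
            struct buffer_info *buf = arg;
            char inbuf;

            for (;;) {
                    /* Hang on the pipe until the app fills a subbuffer
                     * and writes '1' to its end. */
                    if (read(buf->pipe_fd, &inbuf, 1) <= 0)
                            break;

                    /* Ask the app for the subbuffer details over the
                     * socket, write the data out, then release it. */
                    if (get_subbuffer(buf) < 0)
                            break;
                    write_subbuffer(buf);
                    put_subbuffer(buf);
            }

            return NULL;
    }

The blocking read on the pipe is the piece that the per-CPU epoll threads
would later take over.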
4. Replace poll with epoll, which scales better, especially when the number
of ready events is much smaller than the number of fds. This is complete.

5. Allow UST to handle arbitrarily long unix socket names. This is done by
careful allocation of the sockaddr_un struct with a dynamic length;
truncating is ugly and dangerous. A rough sketch of the idea follows after
the sign-off.

There is a lot of work still left to be done. This is only the first of a
number of patches that I expect in this area. If someone feels like
tackling ustd head-on to reduce the number of threads, that would be great.

I have kept Pierre-Marc's form of error handling for the I/O wrapping
functions because I want to propagate return codes up to the apps that are
using them, so they can close file descriptors and free associated
resources. If somebody knows of a better approach, please make yourself
heard.

Signed-off-by: Nils Carlson
Acked-by: David Goulet
---
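A rough sketch of the dynamic sockaddr_un allocation described in point 5;
alloc_sock_addr and its signature are an assumption for illustration, not
necessarily the helper this patch adds:

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>

    /* Size the sockaddr_un to the actual path instead of truncating the
     * name into the fixed-size sun_path array. */
    static struct sockaddr_un *alloc_sock_addr(const char *name,
                                               size_t *addr_size)
    {
            struct sockaddr_un *addr;
            size_t size;

            size = offsetof(struct sockaddr_un, sun_path) +
                    strlen(name) + 1;
            addr = malloc(size);
            if (!addr)
                    return NULL;

            addr->sun_family = AF_UNIX;
            strcpy(addr->sun_path, name);
            *addr_size = size;

            return addr;
    }

The size is returned alongside the struct so that callers can pass the
exact length to bind() or connect().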