SLIDE 1

LTTng Project Updates

Polytechnique Montréal, December 2019

SLIDE 2

Polytechnique Progress Report - December 2019 2

Outline

  • LTTng 2.11
  • Upcoming LTTng features

– LTTng 2.12 & 2.13

  • Babeltrace 2.0
  • Restartable Sequences

SLIDE 3

Ericsson Workshop - December 2019 3

LTTng 2.11 – Release Status

Released on October 19th 2019 (v2.11.0).

Very big release:

– Two years of development,
– Lots of new features,
– Required significant re-engineering:

  • Protocols (no breaking changes),
  • Internal file management.

Spent ~1 year in Release Candidate (beta) to ensure a smooth release:

– Fixing issues uncovered in testing,
– Developing 2.12 in parallel.

SLIDE 4

LTTng 2.11 – New Features

  • Session rotation (details on following slides),
  • Dynamic tracing of user-space (from kernel, Uprobe-based),
  • Support of arrays and bit-wise binary operators in filters,
  • User and kernel space call-stack capture (from kernel-space),
  • Improved performance of relay daemon:

– Handling of slow clients and network errors,

  • NUMA-aware buffer allocations by the user-space tracer,
  • Support unloading of user-space probe providers (dlclose).

SLIDE 5

Session Rotation

Motivation:

– Tracing can be left running for a long time,
– Resulting traces can be huge,
– Want to process traces as they are being produced.

Apply the concept of log rotations to traces:

– Provide trace archives (“chunks”) that can be processed independently.

SLIDE 6

Session Rotation – Use-cases

  • Process traces before the end of a test run,
  • Read traces without stopping tracing (without using “live”),
  • Pipeline and/or shard trace analysis (scale-out),
  • Encryption,
  • Compression,
  • Clean-up of old chunks (keep a bounded backlog of traces),
  • Integration with external message buses (Kafka, ZeroMQ, etc.)

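The clean-up use-case (keeping a bounded backlog of chunks) could be scripted along the following lines. This is only a sketch: it assumes each chunk is a sub-directory of some archive directory and that modification time reflects chunk age. `prune_chunks` and its parameters are illustrative names, not part of LTTng.

```python
import shutil
from pathlib import Path


def prune_chunks(archive_dir, keep):
    """Delete all but the `keep` most recent chunk directories.

    Assumption: every sub-directory of `archive_dir` is one trace chunk
    and its modification time reflects when the chunk was produced.
    Returns the names of the removed chunks, oldest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    chunks = sorted(
        (p for p in Path(archive_dir).iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )
    stale = chunks[: max(len(chunks) - keep, 0)]
    for chunk in stale:
        shutil.rmtree(chunk)
    return [chunk.name for chunk in stale]
```

Such a function would typically run after each rotation, once the new chunk has been processed or shipped elsewhere.
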
SLIDE 7

Rotating a tracing session

Immediate rotation:

$ lttng rotate --session my_session

Scheduled rotation:

$ lttng enable-rotation --session my_session --timer 3s
$ lttng enable-rotation --session my_session --size 5M

SLIDE 8

Session Rotation

As produced by LTTng, a CTF trace is a set of files:

– One event stream file per CPU,
– A metadata file describing the layout of the event streams.

[Figure: CPU 0 writes packets to Stream 0 and CPU 1 writes packets to Stream 1, alongside a metadata stream.]

SLIDE 9

Session rotation – step by step

[Figure: Chunk 0 holds the kernel and user space domains, each with Stream 0, Stream 1 and a metadata stream.]

$ lttng rotate --session my_session

  • Sample production position of every stream
  • Establish a per-stream “switch-over” point
  • Flush the layout description of all events declared up to the “switch-over” point
  • Consume tracing data up to the “switch-over” point
  • Notify user of trace archive chunk availability

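The sampling and consumption steps can be illustrated with a toy in-memory model. It is a sketch under strong assumptions: streams are plain Python lists of packets, the metadata flush is left out, and `Stream`, `sample_switch_over` and `consume_chunk` are hypothetical names rather than LTTng internals.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    """Toy stream (hypothetical model, not an LTTng structure)."""
    name: str
    packets: list = field(default_factory=list)  # everything produced so far
    consumed: int = 0  # packets already handed to earlier chunks


def sample_switch_over(streams):
    """Sample each stream's production position; that position becomes
    the stream's switch-over point for the pending rotation."""
    return {s.name: len(s.packets) for s in streams}


def consume_chunk(streams, switch_over):
    """Consume every stream up to its switch-over point and return the
    resulting chunk. Later packets are left for the next chunk."""
    chunk = {}
    for s in streams:
        end = switch_over[s.name]
        chunk[s.name] = s.packets[s.consumed:end]
        s.consumed = end
    return chunk


cpu0 = Stream("cpu0", ["p0", "p1"])
cpu1 = Stream("cpu1", ["p2"])
points = sample_switch_over([cpu0, cpu1])
cpu0.packets.append("p3")  # produced after the switch-over point
chunk0 = consume_chunk([cpu0, cpu1], points)
chunk1 = consume_chunk([cpu0, cpu1], sample_switch_over([cpu0, cpu1]))
print(chunk0)  # {'cpu0': ['p0', 'p1'], 'cpu1': ['p2']}
print(chunk1)  # {'cpu0': ['p3'], 'cpu1': []}
```

The packet produced after sampling ends up in the second chunk, which is the point of fixing a per-stream switch-over position before consuming.
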
SLIDE 10

Session rotation

[Figure: Chunk 0 and Chunk 1, each holding the kernel and user space domains with Stream 0, Stream 1 and a metadata stream.]

SLIDE 11

Session rotation

[Figure: Chunk 0 and Chunk 1, each holding the kernel and user space domains with Stream 0, Stream 1 and a metadata stream.]

SLIDE 12

LTTng 2.12 – New Features

  • UID/GID tracker,
  • File descriptor pooling (relay daemon),
  • Fast clear,
  • Container support (namespace contexts),
  • Working directory override (relay daemon),
  • Trace hierarchy by session or host name (relay daemon),
  • Version tracking.

SLIDE 13

UID/GID Tracker

  • Specialized filtering mechanism for UID/GID tracking:

– Makes it possible to create tracing buffers only for some users/groups (or applications, in per-PID buffering mode),
– Works in the same way as the existing PID tracker functionality,

  • Reduces memory use on multi-user setups when tracing in per-UID mode.

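A back-of-the-envelope estimate shows why tracking only some users matters in per-UID mode. The sketch below assumes one ring buffer per CPU for each traced user, made of a fixed number of equally sized sub-buffers. `per_uid_buffer_bytes` and the figures used with it are illustrative, not measurements.

```python
def per_uid_buffer_bytes(uids, tracked, n_cpus, subbuf_size, subbuf_count):
    """Estimate ring-buffer memory for one channel in per-UID mode.

    Assumption: each traced user gets one ring buffer per CPU, made of
    `subbuf_count` sub-buffers of `subbuf_size` bytes. With
    `tracked=None` every user in `uids` is traced.
    """
    traced = set(uids) if tracked is None else set(uids) & set(tracked)
    return len(traced) * n_cpus * subbuf_count * subbuf_size


# Ten users running instrumented applications on a 4-CPU machine,
# with four 4 KiB sub-buffers per ring buffer (hypothetical values).
everyone = per_uid_buffer_bytes(range(1000, 1010), None, 4, 4096, 4)
one_user = per_uid_buffer_bytes(range(1000, 1010), {1000}, 4, 4096, 4)
print(everyone, one_user)  # 655360 65536
```
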
SLIDE 14

File Descriptor Pooling

  • Impose a hard cap on the number of file descriptors opened by the relay daemon (--fd-pool-size),
  • The LTTng file format causes many files to be opened simultaneously:

– Metadata file + one file per data stream (i.e. per CPU),
– Doubled when a live client is consuming the trace (files opened for writing and reading),

  • Many support cases reported file descriptor exhaustion:

– Not always possible to increase the system limit for administrative reasons (team doesn’t have the necessary permissions on the system).

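The general idea of capping open files can be sketched with a small pool that closes the least recently used file when the cap is reached and reopens it on demand. The slides do not describe how the relay daemon picks which descriptor to close, so the LRU policy and the `FdPool` class below are assumptions made for illustration.

```python
from collections import OrderedDict


class FdPool:
    """Keep at most `capacity` files open, closing the least recently
    used one when the cap would be exceeded (illustrative policy)."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._open = OrderedDict()  # path -> file object, oldest first

    def append(self, path, data):
        """Append bytes to `path`, (re)opening the file if needed."""
        f = self._open.pop(path, None)
        if f is None:
            if len(self._open) >= self.capacity:
                # Evict the least recently used file to stay under the cap.
                _, victim = self._open.popitem(last=False)
                victim.close()
            f = open(path, "ab")
        self._open[path] = f  # re-insert as most recently used
        f.write(data)

    def open_count(self):
        return len(self._open)

    def close(self):
        for f in self._open.values():
            f.close()
        self._open.clear()
```

With a metadata file plus one stream file per CPU, a 64-CPU trace already needs 65 files, and twice that with a live client attached, so a pool like this trades some reopen overhead for a predictable descriptor count.
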
SLIDE 15

Clear command

  • Discard the data recorded for a session,
  • Builds on the work done in 2.11 for session rotations,
  • Tracing setup time is greatly reduced for teams running multiple test runs:

– Run test, read trace, clear,
– No need to re-create the session, channels, etc.

  • Works with live clients:

– Live clients will skip ahead to the newest data after a clear,

  • Useful when debugging:

– Try to reproduce a problem, clear between attempts,

$ lttng clear --session my_session

  • Use of clear can be disallowed per relayd process:

– LTTNG_RELAYD_DISALLOW_CLEAR environment variable.

SLIDE 16

Container Support (namespace contexts)

  • Allow the capture of the namespaces of the current process when an event occurs (available from both kernel and user space tracers):

– Cgroup,
– IPC,
– Mount,
– Network,
– PID,
– User,
– UTS (hostname and domain name).

  • It is then possible to map the events back to a container name (e.g. Docker or LXD user-visible name),
  • Namespace hierarchy can be dumped to the trace on-demand.

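For reference, the namespace identifiers of a process can be inspected from user space on Linux through the symbolic links under /proc/<pid>/ns, whose targets look like `uts:[4026531838]`. The sketch below only shows what such identifiers look like. It is not how the tracers capture them, and `parse_ns_link` and `current_namespaces` are illustrative helpers.

```python
import os
import re

_NS_LINK = re.compile(r"^(?P<kind>[a-z_]+):\[(?P<inode>\d+)\]$")


def parse_ns_link(target):
    """Split a namespace link target such as 'uts:[4026531838]'."""
    m = _NS_LINK.match(target)
    if m is None:
        raise ValueError(f"not a namespace link: {target!r}")
    return m.group("kind"), int(m.group("inode"))


def current_namespaces(pid="self"):
    """Return {entry: inode} for a process, read from /proc (Linux only)."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    for entry in os.listdir(ns_dir):
        _, inode = parse_ns_link(os.readlink(os.path.join(ns_dir, entry)))
        result[entry] = inode
    return result
```

Two processes in the same container report the same inode for a given namespace, which is what makes it possible to group events by container after the fact.
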
SLIDE 17

Working Directory Override (Relay Daemon)

  • New --working-directory option changes the working directory of the relay daemon,
  • Helpful for teams who launch the relay daemon from a drive that should be un-mountable,
  • Used to set the working directory to a writeable directory so that core dumps can be written.

SLIDE 18

Trace hierarchy by session or host name

  • Two new options for the relay daemon:

– --group-output-by-session,
– --group-output-by-host.

  • Allows users to control the path hierarchy of traces produced by the relay daemon:

– By hostname (default): relayd_output/host_name/session_name/
– By session name: relayd_output/session_name/host_name/

  • Makes it easier to collect all traces from a cluster.

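The two layouts differ only in the order of the path components, as this small sketch shows. `trace_output_path` and its `group_by` parameter are hypothetical names used to mirror the two relay daemon options.

```python
from pathlib import PurePosixPath


def trace_output_path(relayd_output, host_name, session_name, group_by="host"):
    """Build the trace directory for one session of one host.

    `group_by` mirrors the two layouts from the slide: "host" (default)
    or "session". The function itself is illustrative only.
    """
    root = PurePosixPath(relayd_output)
    if group_by == "host":
        return root / host_name / session_name
    if group_by == "session":
        return root / session_name / host_name
    raise ValueError(f"unknown grouping: {group_by!r}")


print(trace_output_path("/traces", "node1", "my_session"))
# /traces/node1/my_session
print(trace_output_path("/traces", "node1", "my_session", group_by="session"))
# /traces/my_session/node1
```

Grouping by session places the traces of every node of a cluster under a single session directory, which is what simplifies collection.
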
SLIDE 19

Version Tracking

  • Introduced a mechanism to register out-of-tree changes applied on top of LTTng,
  • Objective is to make it easy to know the exact version of LTTng running on systems when a support ticket is created,
  • Vendors often add custom patches which can cause problems that are hard to track for us,
  • Requires the cooperation of the vendors to “register” those patches at build time:

$ lttng --version

SLIDE 20

LTTng 2.12 – Release Status

  • Currently putting the finishing touches on the clear command:

– Fixing issues following internal testing.

  • Most of the features are present upstream (master branch),
  • Release Candidate planned by the end of the year (before December 20th):

– Final release date depends on the feedback we get,
– We expect this phase to be fairly short as the changes were not as invasive as in previous releases.

SLIDE 21

LTTng 2.13 – New Features

  • Dynamic Snapshots (triggers) is the major focus of this release,
  • A new top-level concept will be introduced: triggers

– Triggers can be associated to an event rule and trigger an action when that event rule is met,

  • Supported actions:

– Start tracing,
– Stop tracing,
– Rotate session,
– Record snapshot,
– Notify.

SLIDE 22

Dynamic Snapshot / Triggers

$ lttng create-trigger --id my_id --userspace --tracepoint provider:hello \
    --filter 'caller_id == 1422432' \
    --action stop session_name --action snapshot session_name

  • When the hello event occurs with caller_id 1422432, a session is stopped and a snapshot is recorded.

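Conceptually, a trigger pairs an event rule with an ordered list of actions. The toy model below illustrates that pairing only. `Trigger` and `handle` are hypothetical names, and the actions return strings instead of acting on a real session.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Trigger:
    """Toy trigger: an event rule (name + filter) and ordered actions."""
    event_name: str
    event_filter: Callable[[dict], bool]
    actions: list = field(default_factory=list)

    def handle(self, name, payload):
        """Run every action, in order, if the event satisfies the rule."""
        if name != self.event_name or not self.event_filter(payload):
            return []
        return [action(payload) for action in self.actions]


# Same rule as the command above; the actions are stand-ins.
trigger = Trigger(
    "provider:hello",
    lambda payload: payload.get("caller_id") == 1422432,
    [lambda _: "stop session_name", lambda _: "snapshot session_name"],
)
print(trigger.handle("provider:hello", {"caller_id": 1422432}))
# ['stop session_name', 'snapshot session_name']
print(trigger.handle("provider:hello", {"caller_id": 7}))
# []
```
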
SLIDE 23

Dynamic Snapshot / Triggers

  • The notify action allows external applications to receive the contents of an event associated to a trigger,
  • Allows complex scenarios that reach beyond the scope of LTTng, for example:

– A communication error occurs in a code path instrumented with an LTTng tracepoint,
– An application can listen for that specific event and receive a notification when it occurs,
– Inspect the payload of the event to connect to the machine that was involved and take a snapshot on that machine.

SLIDE 24

Dynamic Snapshot / Triggers

  • Like regular events, triggers can be dropped when the system is overloaded:

– Dropped events are accounted for in aggregation maps,

  • Triggers can be associated to counters:

– Trigger once after n matches,
– Trigger after every n matches.

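The two counter modes can be expressed in a few lines. This is a sketch of the counting logic only. `CountedTrigger` and `on_match` are illustrative names, and the action is any callable.

```python
class CountedTrigger:
    """Fire an action once after n matches, or after every n matches."""

    def __init__(self, n, action, every=False):
        if n < 1:
            raise ValueError("n must be at least 1")
        self.n = n
        self.action = action
        self.every = every
        self.matches = 0

    def on_match(self, event):
        """Record one event-rule match; return True if the action ran."""
        self.matches += 1
        if self.every:
            fire = self.matches % self.n == 0
        else:
            fire = self.matches == self.n
        if fire:
            self.action(event)
        return fire


fired_once, fired_every = [], []
once = CountedTrigger(3, fired_once.append)
every = CountedTrigger(3, fired_every.append, every=True)
for i in range(1, 8):
    once.on_match(i)
    every.on_match(i)
print(fired_once)   # [3]
print(fired_every)  # [3, 6]
```
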
SLIDE 25

Babeltrace 2.0

  • Reaching a stable release after 5 years of development,
  • The last year was mostly spent on performance improvements and API clean-ups,
  • Focus on easing the transition from Babeltrace 1:

– Performance is now slightly better than Babeltrace 1,
– Can co-exist with Babeltrace 1 on the same machine.

  • Documentation is the only remaining milestone for release.

SLIDE 26

Restartable Sequences

  • Restartable sequence system call:

– Allow per-CPU operations in user space,
– End goal is to eliminate atomic operations from the user space tracer’s fast-path,
– Useful for other use-cases (e.g. memory allocators),
– Merged in Linux 4.18.

  • Integrating the syscall in glibc is crucial for adoption,
  • Still working on the missing pieces for LTTng-ust integration.

SLIDE 27

Questions?

  • lttng.org
  • lttng-dev@lists.lttng.org
  • @lttng_project
  • #lttng (OFTC)