Add the ability to store/retrieve incomplete/partial spans
#729 opened on Mar 6, 2018
Summary
Allow clients to export partial spans, to support two use cases:
- Flush a long-running span before it is finished, in case the process crashes before the span completes
- Enrich an existing span with information from other sources, e.g. to record log events not captured via the tracing SDK
Details
For the first version of this feature, we shall assume the following:
- clients can flush and report partial spans, but not span deltas
- duration is monotonically increasing, so the final span has the longest duration
For example, a client can do the following:
time = 1s
Report Span[traceID=1, spanID=2, duration=10s, operationName="someOperation"]
time = 2s
Report Span[traceID=1, spanID=2, duration=20s, operationName="someOperation"]
time = 3s
Report Span[traceID=1, spanID=2, duration=40s, operationName="someOperation", tags=...]
To support this, the backend would need the ability to resolve merge conflicts on spans. For V1, this means simply selecting the longest span.
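The "select the longest span" rule can be sketched in Go as follows. This is only an illustration under stated assumptions: `Span`, `spanKey`, and `MergePartialSpans` are hypothetical names, not the Jaeger model or any existing Jaeger API, and the sketch keys on (traceID, spanID) alone, so it ignores the Zipkin shared-spanID case. Because the rule compares durations only, the result does not depend on the order in which the partial reports arrive.

```go
package main

import (
	"fmt"
	"time"
)

// Span is a simplified, hypothetical stand-in for the real span model.
type Span struct {
	TraceID       uint64
	SpanID        uint64
	OperationName string
	Duration      time.Duration
	Tags          map[string]string
}

// spanKey identifies all partial reports that belong to the same span.
type spanKey struct {
	TraceID uint64
	SpanID  uint64
}

// MergePartialSpans keeps, for every (traceID, spanID) pair, the report
// with the longest duration. Spans are returned in first-seen key order.
func MergePartialSpans(reports []Span) []Span {
	index := make(map[spanKey]int, len(reports))
	merged := make([]Span, 0, len(reports))
	for _, s := range reports {
		k := spanKey{TraceID: s.TraceID, SpanID: s.SpanID}
		i, seen := index[k]
		if !seen {
			index[k] = len(merged)
			merged = append(merged, s)
			continue
		}
		// V1 conflict resolution: the longest report wins.
		if s.Duration > merged[i].Duration {
			merged[i] = s
		}
	}
	return merged
}

func main() {
	// The three reports from the example, deliberately out of order.
	reports := []Span{
		{TraceID: 1, SpanID: 2, Duration: 40 * time.Second, OperationName: "someOperation"},
		{TraceID: 1, SpanID: 2, Duration: 10 * time.Second, OperationName: "someOperation"},
		{TraceID: 1, SpanID: 2, Duration: 20 * time.Second, OperationName: "someOperation"},
	}
	for _, s := range MergePartialSpans(reports) {
		fmt.Printf("traceID=%d spanID=%d duration=%s\n", s.TraceID, s.SpanID, s.Duration)
	}
}
```

Where such a merge would run (at write time in the collector or at read time in the query service) is left open here; the sketch only shows the selection rule itself.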
On Cassandra, the jaeger-collector stores each span together with a hash of the span model (spanhash), which guarantees that all partial spans are stored: https://github.com/jaegertracing/jaeger/blob/412baf60db217d6f41d7fdca04358069362b29ca/plugin/storage/cassandra/spanstore/dbmodel/converter.go#L45
On Elasticsearch, we use the index API, which performs upserts: https://github.com/jaegertracing/jaeger/blob/e52ecffbf69a572593504ea9acc4ff65854a3e9a/plugin/storage/es/spanstore/writer.go#L179-L182
To make matters more interesting, note that Jaeger supports storing Zipkin spans that share the same spanID for client and server spans. Jaeger adjusts these spanIDs at query time, as seen here: https://github.com/jaegertracing/jaeger/blob/412baf60db217d6f41d7fdca04358069362b29ca/model/adjuster/span_id_deduper.go#L23-L31
Another point to note is that jaeger-collector makes no guarantee that spans are stored in the order that they are received.
Enhancements
- Communicate to the user that jaeger-query is serving incomplete spans. One approach is to do something similar to https://github.com/jaegertracing/jaeger-ui/issues/132. We might have to enhance the Span model to store this information.
- Allow jaeger-clients to report that the span they are sending is the final span. This allows us to remove the assumption about monotonically increasing duration.
- Add support for span deltas, which might allow clients to reduce the state they maintain. (This might be at odds with the previous point.)
- Enhance the Span model to store the lineage of a span (whether it was generated from a Zipkin or Jaeger span); this allows for more robust merging behavior.
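As a rough illustration of the "final span" enhancement, the sketch below shows how an explicit marker could replace the duration assumption during conflict resolution. Everything here is hypothetical: the `Final` field, the `FlaggedSpan` type, and the `Resolve` function do not exist in the Jaeger model and are only assumptions made for the example. A report marked final always beats a partial one; between two reports of the same kind, the sketch falls back to the longest-duration rule.

```go
package main

import (
	"fmt"
	"time"
)

// FlaggedSpan is a hypothetical span model with an explicit final marker.
type FlaggedSpan struct {
	TraceID  uint64
	SpanID   uint64
	Duration time.Duration
	Final    bool // assumed field: set by the client on the last report
}

// Resolve picks the winner between two reports of the same span.
// A final report wins over a partial one regardless of duration;
// otherwise the longer report wins, and ties keep the current one.
func Resolve(current, incoming FlaggedSpan) FlaggedSpan {
	switch {
	case current.Final && !incoming.Final:
		return current
	case incoming.Final && !current.Final:
		return incoming
	case incoming.Duration > current.Duration:
		return incoming
	default:
		return current
	}
}

func main() {
	partial := FlaggedSpan{TraceID: 1, SpanID: 2, Duration: 40 * time.Second}
	final := FlaggedSpan{TraceID: 1, SpanID: 2, Duration: 30 * time.Second, Final: true}

	// The final report wins even though it is shorter and arrives first.
	winner := Resolve(final, partial)
	fmt.Printf("duration=%s final=%t\n", winner.Duration, winner.Final)
}
```

With a marker like this, a late-arriving partial report can no longer displace the final span, which matters because the collector does not guarantee storage order.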
For more context on use cases and prior discussion see https://github.com/jaegertracing/jaeger-client-java/issues/231
Similar tickets: