jaegertracing/jaeger

Jaeger Query can OOM when retrieving large traces

Open

#1051 opened on September 5, 2018

38 comments, 1 reaction, 0 assignees · Go · 18,974 stars, 2,326 forks
Labels: area/storage, bug, help wanted, performance

Description

Requirement - what kind of business use case are you trying to solve?

  • Retrieve large traces
  • Be resilient against bad instrumentation that uses the same traceID for all traces

Problem - what in Jaeger blocks you from solving the requirement?

Jaeger Query runs out of memory (OOM) when retrieving large traces from Cassandra. A malicious user could easily create a trace with millions of spans and then request it repeatedly to systematically bring down every jaeger-query instance.

Proposed Solution - Cassandra

We might do some combination of the following:

  • Trace retrieval limits: check that the number of spans in a trace is below a user-defined threshold before retrieving its spans.
  • Protect against large spans submitted to the HTTP POST endpoints by enforcing a user-defined span size limit.
  • Limit the number of concurrent requests served by the HTTP GET handler so that worst-case memory utilization can be accurately predicted and bounded.
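Bounding concurrency on the GET handler can be sketched as a counting semaphore built on a buffered channel. This is an illustrative middleware, not existing jaeger-query code; `inflightLimiter`, `newInflightLimiter` and the choice to answer excess requests with 503 plus `Retry-After` are assumptions.

```go
package main

import "net/http"

// inflightLimiter caps how many requests may be served at the same time.
// It is a counting semaphore built on a buffered channel.
type inflightLimiter struct {
	slots chan struct{}
}

func newInflightLimiter(maxInFlight int) *inflightLimiter {
	return &inflightLimiter{slots: make(chan struct{}, maxInFlight)}
}

// TryAcquire takes a slot without blocking; it reports false when all slots are busy.
func (l *inflightLimiter) TryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// Release frees a slot previously obtained with TryAcquire.
func (l *inflightLimiter) Release() { <-l.slots }

// Wrap rejects requests with 503 while all slots are busy instead of queueing
// them, so the number of concurrently served trace requests never exceeds the cap.
func (l *inflightLimiter) Wrap(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.TryAcquire() {
			w.Header().Set("Retry-After", "1")
			http.Error(w, "too many concurrent trace queries", http.StatusServiceUnavailable)
			return
		}
		defer l.Release()
		next.ServeHTTP(w, r)
	})
}
```

Rejecting excess requests instead of queueing them keeps waiting requests from piling up behind a slow query. Combined with a per-trace span limit, the bound becomes simple arithmetic: with a hypothetical cap of 8 in-flight queries and a limit of 50,000 spans per trace, trace retrieval holds at most 8 × 50,000 = 400,000 spans in memory at once (ignoring other allocations).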

Any open questions to address

  • Does this affect Elasticsearch (ES) as well?

