jaegertracing/jaeger

gRPC plugin framework should be able to recover from panics

Open

#1742 opened on Aug 19, 2019

Labels: area/storage, help wanted

Description

Requirement - what kind of business use case are you trying to solve?

We are implementing a custom gRPC-based storage plugin as per this doc.

Problem - what in Jaeger blocks you from solving the requirement?

There are two related problems:

  • When a gRPC plugin panics, the reason for the panic is not shown in the Jaeger logs, which makes the cause very hard to identify (related to #1529)
  • When a plugin panics, it is not restarted, so Jaeger enters an unusable state

Impact:

  • Makes it very hard to make any plugin production-ready
  • Makes the developer experience of writing a plugin very frustrating
  • Plugin code becomes littered with defer/recover, both to keep the plugin from crashing outright and to surface helpful debug info
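For illustration, the defer/recover boilerplate mentioned in the last point tends to look like the sketch below. `safeCall` and the method name `GetTrace` are hypothetical and used only for this example; they are not part of Jaeger or of any plugin SDK.

```go
package main

import (
	"fmt"
	"log"
	"runtime/debug"
)

// safeCall is a hypothetical helper: it runs fn, and if fn panics it logs
// the panic value with a stack trace and returns the panic as an error
// instead of letting the whole plugin process crash.
func safeCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			log.Printf("plugin method %s panicked: %v\n%s", name, r, debug.Stack())
			err = fmt.Errorf("panic in %s: %v", name, r)
		}
	}()
	return fn()
}

func main() {
	// Every storage method in the plugin would need to be wrapped like this.
	err := safeCall("GetTrace", func() error {
		panic("boom") // simulated bug inside the plugin
	})
	log.Printf("call returned: %v", err)
}
```

Repeating this wrapper around every method is the clutter the issue describes; handling panics in the framework would make it unnecessary.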

Proposal - what do you suggest to solve the problem or improve the existing situation?

The panic reason and a stack trace should already be written to the plugin process's stderr at the time of the panic, so Jaeger should be able to capture and log them.

Crashed plugins should be restarted.

Any open questions to address
