jaegertracing/jaeger
gRPC plugin framework should be able to recover from panics
Open
#1742 opened on Aug 19, 2019
area/storage, help wanted
Description
Requirement - what kind of business use case are you trying to solve?
We are implementing a custom gRPC-based storage plugin as per this doc.
Problem - what in Jaeger blocks you from solving the requirement?
There are two related problems:
- When a gRPC plugin panics, the reason for the panic is not shown in the Jaeger logs, which makes the cause very hard to identify (related to: #1529)
- When a plugin panics, it is not restarted, so Jaeger enters an unusable state
Impact:
- Makes it very hard to make any plugin production-ready
- Makes the developer experience of writing a plugin very frustrating
- Plugin code becomes littered with defer/recover to prevent it from fully crashing and to display helpful debug info
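For illustration, the defer/recover workaround mentioned above tends to look roughly like the sketch below. It is not taken from any real plugin: `safeCall` and the `GetTrace` label are hypothetical names, and only the Go standard library is used.

```go
package main

import (
	"fmt"
	"os"
	"runtime/debug"
)

// safeCall (hypothetical helper) runs fn and converts a panic into an error,
// printing the panic value and stack trace to stderr so the cause is not lost.
func safeCall(name string, fn func() error) (err error) {
	defer func() {
		if r := recover(); r != nil {
			fmt.Fprintf(os.Stderr, "panic in %s: %v\n%s", name, r, debug.Stack())
			err = fmt.Errorf("panic in %s: %v", name, r)
		}
	}()
	return fn()
}

func main() {
	// Simulate a storage method that panics: writing to a nil map.
	err := safeCall("GetTrace", func() error {
		var index map[string]int
		index["trace-id"] = 1 // runtime panic: assignment to entry in nil map
		return nil
	})
	fmt.Println("recovered:", err)
}
```

Every storage method needs a wrapper like this, which is the clutter the issue is about.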
Proposal - what do you suggest to solve the problem or improve the existing situation?
The panic reason and a stack trace should already be dumped to the plugin process's stderr at the time of the panic, so Jaeger should be able to capture and log them.
Crashed plugins should be restarted.
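A minimal sketch of the idea, assuming the plugin is an ordinary child process: capture its stderr, log it when the process exits with an error, and start it again up to a retry limit. This uses only the Go standard library (`os/exec`) and does not reflect Jaeger's actual plugin-loading code. `runOnce`, `supervise`, and the `./my-storage-plugin` path are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"log"
	"os/exec"
	"time"
)

// runOnce (hypothetical) starts the plugin binary, waits for it to exit, and
// returns whatever it wrote to stderr together with the exit error, if any.
func runOnce(path string, args ...string) (string, error) {
	var stderr bytes.Buffer
	cmd := exec.Command(path, args...)
	cmd.Stderr = &stderr // a Go panic message and stack trace are written here
	err := cmd.Run()
	return stderr.String(), err
}

// supervise (hypothetical) reruns the plugin after each failed exit, logging
// the captured stderr, until it exits cleanly or maxRestarts is reached.
// It returns the number of restarts performed.
func supervise(maxRestarts int, backoff time.Duration, path string, args ...string) int {
	restarts := 0
	for {
		stderr, err := runOnce(path, args...)
		if err == nil {
			return restarts // clean exit, nothing to restart
		}
		log.Printf("plugin exited: %v\nplugin stderr:\n%s", err, stderr)
		if restarts >= maxRestarts {
			return restarts
		}
		restarts++
		time.Sleep(backoff)
	}
}

func main() {
	// Assumed path to a plugin binary; replace with a real one to try this out.
	n := supervise(3, 100*time.Millisecond, "./my-storage-plugin")
	fmt.Println("restarts performed:", n)
}
```

A real implementation would also need to re-establish the gRPC connection after each restart and probably use exponential backoff. Both are left out here.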