jaegertracing/jaeger

gRPC plugin framework should be able to recover from panics

Open

#1742 · created on August 19, 2019

7 comments · 0 reactions · 0 assignees · Go · 18,974 stars · 2,326 forks
Labels: area/storage, help wanted

Description

Requirement - what kind of business use case are you trying to solve?

We are implementing a custom gRPC-based storage plugin as per this doc.

Problem - what in Jaeger blocks you from solving the requirement?

There are two related problems:

  • When a gRPC plugin panics, the reason for the panic is not shown in the Jaeger logs, which makes the cause very hard to identify (related to #1529)
  • When a plugin panics, it is not restarted, so Jaeger is left in an unusable state

Impact:

  • Makes it very hard to get any plugin production-ready
  • Makes the developer experience of writing a plugin very frustrating
  • Plugin code becomes littered with defer/recover blocks, both to keep the process from crashing and to surface helpful debug info

Proposal - what do you suggest to solve the problem or improve the existing situation?

The panic reason and stack trace are already written to the plugin process's stderr at the time of the panic, so Jaeger should be able to capture and log them.

Crashed plugins should be restarted.

Any open questions to address

