Productionize Streaming Jobs for Service Dependencies · jaegertracing/jaeger#4590

17 comments · 0 reactions · 0 assignees · Go · 2,326 forks · batch import

Labels: enhancement, help wanted, possible mentorship

Currently we have two analytics solutions for generating service maps:

Jaeger Analytics Flink
- Real time streaming, requires Kafka.
- More feature-rich: includes code for both 1-hop and transitive dependency graphs -- https://www.jaegertracing.io/docs/1.47/features/#topology-graphs
- Aggregates data for a given time window (originally 15 min at Uber) and writes a summary snapshot to storage.
- No easy deployment solution is provided in the repository.
Spark Dependencies
- Batch job that reads all data for a period of time, aggregates, and writes a summary snapshot to storage.
- Does not require Kafka.
- In theory it can be run as often as every 15 min to produce results similar to the Flink jobs above, but the Cassandra implementation may need to be tweaked for that.
- Does not support transitive dependency graphs.

Objectives:

- Ideally we want a single code base that supports both types of service dependencies.
- The solution needs to be documented, packaged (e.g. published containers), and easy to deploy (e.g. with docker compose or a k8s operator).
- Supporting both batch (goes directly against span storage) and streaming (reads from Kafka) is nice to have.

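Research direction: Investigate the existing Jaeger Analytics Flink and Spark Dependencies projects, and design a unified Go-based service dependency graph solution that supports both batch and streaming modes. Study deployment options using Docker Compose and Kubernetes.
Tech stack: go
Domain: backend, data
Issue type: feature
Difficulty: 3
Estimated time: more than 1 week
Activity status: active
Clarity: needs investigation first
Prerequisites: Go, Docker, Kubernetes, Distributed Tracing
Beginner friendliness: 30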