Productionize Streaming Jobs for Service Dependencies · jaegertracing/jaeger#4590

17 comments · 0 reactions · 0 assignees · Go · 2,326 forks

enhancement, help wanted, possible mentorship


Currently we have two analytics solutions for generating service maps:

Jaeger Analytics Flink
- Real time streaming, requires Kafka.
- More feature-rich: includes code for both 1-hop and transitive dependency graphs (see https://www.jaegertracing.io/docs/1.47/features/#topology-graphs).
- Aggregates data over a given time window (originally 15 min at Uber) and writes a summary snapshot to storage.
- No easy deployment solution is provided in the repository.
Spark Dependencies
- Batch job that reads all data for a period of time, aggregates, and writes a summary snapshot to storage.
- Does not require Kafka.
- In theory it can run as often as every 15 min to produce results similar to the Flink jobs above, but the Cassandra implementation may need to be tweaked for that.
- Does not support transitive dependency graphs.
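
To make the "1-hop dependency graph" concrete, here is a minimal Go sketch of the aggregation both jobs perform in some form: join each span to its parent and count calls between distinct services. The `Span` and `Link` types and the `DependencyLinks` function are hypothetical simplifications for illustration, not Jaeger's data model or the code in either project. Skipping same-service parent/child pairs is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, simplified span record (not Jaeger's model).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for root spans
	Service  string
}

// Link is a 1-hop dependency edge with the number of observed calls.
type Link struct {
	Parent    string
	Child     string
	CallCount int
}

// DependencyLinks counts parent -> child service calls across all spans.
func DependencyLinks(spans []Span) []Link {
	// Index spans by (trace ID, span ID) so each parent can be looked up.
	type key struct{ trace, span string }
	byID := make(map[key]Span, len(spans))
	for _, s := range spans {
		byID[key{s.TraceID, s.SpanID}] = s
	}

	type edge struct{ parent, child string }
	counts := make(map[edge]int)
	for _, s := range spans {
		if s.ParentID == "" {
			continue // root span: no caller
		}
		p, ok := byID[key{s.TraceID, s.ParentID}]
		if !ok || p.Service == s.Service {
			// Assumption: ignore missing parents and in-process child spans.
			continue
		}
		counts[edge{p.Service, s.Service}]++
	}

	links := make([]Link, 0, len(counts))
	for e, n := range counts {
		links = append(links, Link{Parent: e.parent, Child: e.child, CallCount: n})
	}
	// Sort for deterministic output (map iteration order is random).
	sort.Slice(links, func(i, j int) bool {
		if links[i].Parent != links[j].Parent {
			return links[i].Parent < links[j].Parent
		}
		return links[i].Child < links[j].Child
	})
	return links
}

func main() {
	spans := []Span{
		{TraceID: "t1", SpanID: "a", Service: "frontend"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Service: "checkout"},
		{TraceID: "t1", SpanID: "c", ParentID: "b", Service: "payments"},
		{TraceID: "t2", SpanID: "a", Service: "frontend"},
		{TraceID: "t2", SpanID: "b", ParentID: "a", Service: "checkout"},
	}
	for _, l := range DependencyLinks(spans) {
		fmt.Printf("%s -> %s: %d\n", l.Parent, l.Child, l.CallCount)
	}
}
```

A transitive graph would additionally need to follow whole call paths (for example frontend -> checkout -> payments) rather than isolated parent/child pairs, which this sketch does not attempt.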

Objectives:

- Ideally we want a single code base that supports both types of service dependencies.
- The solution needs to be documented, packaged (e.g. published containers), and easy to deploy (e.g. with docker compose or a k8s operator).
- Supporting both batch (reading directly from span storage) and streaming (reading from Kafka) is nice to have.

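Research direction: Investigate the existing Jaeger Analytics Flink and Spark Dependencies projects, and design a unified Go-based service dependency graph solution that supports both batch and streaming modes. Research deployment options using Docker Compose and Kubernetes.
Tech stack: go
Domain: backend, data
Issue type: Feature
Difficulty: 3
Estimated time: more than 1 week
Activity status: Active
Clarity: Needs research first
Prerequisites: Go, Docker, Kubernetes, Distributed Tracing
Beginner friendliness: 30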