Productionize Streaming Jobs for Service Dependencies · jaegertracing/jaeger#4590

Labels: enhancement, help wanted, possible mentorship

Currently we have two analytics solutions for generating service maps:

Jaeger Analytics Flink
- Real time streaming, requires Kafka.
- More feature-rich: includes code for both 1-hop and transitive dependency graphs (see https://www.jaegertracing.io/docs/1.47/features/#topology-graphs).
- Aggregates data over a fixed time window (originally 15 min at Uber) and writes a summary snapshot to storage.
- No easy deployment solution is provided in the repository.
Spark Dependencies
- Batch job that reads all data for a period of time, aggregates, and writes a summary snapshot to storage.
- Does not require Kafka.
- In theory it can be run as often as every 15 min to produce results similar to the Flink jobs above, but the Cassandra implementation may need to be tweaked for that.
- Does not support transitive dependency graphs.
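
To make the "aggregate and write a summary snapshot" step concrete, here is a minimal Go sketch of 1-hop dependency aggregation over a batch of spans. The `Span` and `Link` types and the `OneHopLinks` function are hypothetical and pared down for illustration; they are not the Jaeger data model or the logic of either existing project. The sketch assumes every span carries its parent's span ID and that an edge is any parent-to-child hop that crosses a service boundary.

```go
package main

import (
	"fmt"
	"sort"
)

// Span is a hypothetical, minimal span record (not the Jaeger model).
type Span struct {
	TraceID  string
	SpanID   string
	ParentID string // empty for a root span
	Service  string
}

// Link is one edge of the 1-hop graph: Parent called Child CallCount times.
type Link struct {
	Parent    string
	Child     string
	CallCount int
}

// OneHopLinks counts cross-service parent->child calls in a batch of spans.
func OneHopLinks(spans []Span) []Link {
	// Index every span's service by (trace ID, span ID) so parents can be looked up.
	type key struct{ trace, span string }
	serviceOf := make(map[key]string, len(spans))
	for _, s := range spans {
		serviceOf[key{s.TraceID, s.SpanID}] = s.Service
	}

	type edge struct{ parent, child string }
	counts := make(map[edge]int)
	for _, s := range spans {
		if s.ParentID == "" {
			continue // root span: no caller
		}
		parentSvc, ok := serviceOf[key{s.TraceID, s.ParentID}]
		if !ok || parentSvc == s.Service {
			continue // parent not in this batch, or a call within one service
		}
		counts[edge{parentSvc, s.Service}]++
	}

	// Flatten to a slice and sort for deterministic output.
	links := make([]Link, 0, len(counts))
	for e, n := range counts {
		links = append(links, Link{Parent: e.parent, Child: e.child, CallCount: n})
	}
	sort.Slice(links, func(i, j int) bool {
		if links[i].Parent != links[j].Parent {
			return links[i].Parent < links[j].Parent
		}
		return links[i].Child < links[j].Child
	})
	return links
}

func main() {
	spans := []Span{
		{TraceID: "t1", SpanID: "a", Service: "frontend"},
		{TraceID: "t1", SpanID: "b", ParentID: "a", Service: "cart"},
		{TraceID: "t1", SpanID: "c", ParentID: "b", Service: "cart"},
		{TraceID: "t1", SpanID: "d", ParentID: "c", Service: "db"},
		{TraceID: "t2", SpanID: "a", Service: "frontend"},
		{TraceID: "t2", SpanID: "b", ParentID: "a", Service: "cart"},
	}
	for _, l := range OneHopLinks(spans) {
		fmt.Printf("%s -> %s: %d\n", l.Parent, l.Child, l.CallCount)
	}
}
```

A transitive graph would need whole call paths per trace rather than single hops, so it cannot be derived from these per-edge counts alone.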

Objectives:

- Ideally we want a single code base that supports both types of service dependency graphs (1-hop and transitive).
- The solution needs to be documented, packaged (e.g. as published containers), and easy to deploy (e.g. with Docker Compose or a Kubernetes operator).
- Supporting both batch (reads directly from span storage) and streaming (reads from Kafka) modes is nice to have.

Research direction: Investigate the existing Jaeger Analytics Flink and Spark Dependencies projects to design a unified Go-based solution for service dependency graphs that supports both batch and streaming modes. Research deployment options with Docker Compose and Kubernetes.
Tech stack: Go
Domain: backend, data
Issue type: Feature
Difficulty: 3
Estimated time: Over 1 week
Activity status: Active
Clarity: Needs investigation
Prerequisites: Go, Docker, Kubernetes, Distributed Tracing
Newbie friendliness: 30