apache/seatunnel

[Docs][Core] Add Javadoc to key lifecycle, checkpoint, and multi-table methods across seatunnel-api and seatunnel-engine

Open

#10533 opened on Feb 28, 2026

View on GitHub
 (2 comments) (1 reaction) (0 assignees)Java (6,897 stars) (1,432 forks)batch import
help wanted

Description

Summary

SeaTunnel's core runtime code is impressively engineered, but most key lifecycle methods across seatunnel-api (Connector V2 SPI) and seatunnel-engine (Zeta) carry zero Javadoc. This makes the learning curve for new contributors very steep.

This issue tracks adding concise Javadoc to the most critical areas. It is intentionally split into small, claimable sub-tasks so contributors can pick one file and submit a focused PR.


Why this matters (measured, not estimated)

File Lines Javadoc lines Coverage
CheckpointCoordinator.java 1 174 ~21 ~1.8 %
TaskExecutionService.java 1 018 ~24 ~2.4 %
CoordinatorService.java 1 177 ~50 ~4.3 %
MultiTableSinkWriter.java 373 0 0 %
MultiTableSink.java 207 0 0 %
MultiTableWriterRunnable.java 98 0 0 %
SupportResourceShare.java 30 0 0 %
SeaTunnelTask.java ~360 <5 < 2 %
SourceFlowLifeCycle.java ~350 <5 < 2 %
SinkFlowLifeCycle.java ~300 <5 < 2 %

Sub-tasks (each is a standalone good-first PR)

Claim a sub-task by commenting below.


[ ] Sub-task 1 — SupportResourceShare.java + MultiTableResourceManager.java

seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/SupportResourceShare.java seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/MultiTableResourceManager.java

These two small interfaces (~60 lines combined) are the extension points for sharing a connection pool across multiple sink writers. Without Javadoc, connector authors don't know when or whether to override them.

Methods to document:

Method Location What to explain
initMultiTableResourceManager(int tableSize, int queueSize) SupportResourceShare.java:22 When to override; what tableSize and queueSize mean; lifecycle (called once per job)
setMultiTableResourceManager(MultiTableResourceManager<T>, int queueIndex) SupportResourceShare.java:27 When called relative to initMultiTableResourceManager; meaning of queueIndex
getSharedResource() MultiTableResourceManager.java:25 Thread-safety contract; when Optional.empty() is acceptable
close() MultiTableResourceManager.java:29 Called by whom; ordering relative to writer close()

[ ] Sub-task 2 — MultiTableSink.java

seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableSink.java

The central sink adapter wrapping multiple per-table sinks. None of its 12 public methods have Javadoc.

Methods to document:

Method Line What to explain
MultiTableSink(MultiTableFactoryContext) 59 How sinks map is populated; role of replicaNum and how it multiplies writer indices
createWriter(SinkWriter.Context) 71 How index = subtaskIndex * replicaNum + i scatters writers across queues
restoreWriter(SinkWriter.Context, List<MultiTableState>) 90 How checkpoint states are matched back to per-table writers
createCommitter() 130 Aggregation of per-table committers
createAggregatedCommitter() 152 Aggregation across tables
getSinkTables() 168 Fallback logic when getWriteCatalogTable() is absent
getWriteCatalogTable() 195 Why always Optional.empty() in multi-table context
supports() 200 Schema evolution delegation to sub-sinks

[ ] Sub-task 3 — MultiTableSinkWriter.java

seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableSinkWriter.java

The most complex undocumented class: 373 lines, 0 method-level Javadoc, heavy concurrency.

Methods to document:

Method Line What to explain
Constructor 64 queueSize * 2 thread pool; BlockingQueue-to-writer assignment; queueIndex modulo distribution
initResourceManager(int) 112 One-shot init from first writer; broadcast to all via setMultiTableResourceManager
subSinkErrorCheck() 134 Propagates errors from async writer threads to the caller thread
write(SeaTunnelRow) 179 Primary-key hashing for ordered delivery (primaryKeyhashCode % queueSize); random queue for no primary key; role of submitted flag
prepareCommit(long) 246 Parallel per-table prepareCommit inside executor; future aggregation
snapshotState(long) 222 Synchronization with runnable to prevent concurrent mutation
abortPrepare() 294 Error aggregation across sub-writers
close() 322 executorService.shutdownNow() then per-writer close(); resource manager teardown
checkQueueRemain() 361 Busy-wait until all queues drain; purpose before state snapshot / prepareCommit

[ ] Sub-task 4 — MultiTableWriterRunnable.java

seatunnel-api/src/main/java/org/apache/seatunnel/api/sink/multitablesink/MultiTableWriterRunnable.java

Methods to document:

Method Line What to explain
Constructor 37 tableWriterMap keyed by tableIdentifier; queue ownership
run() 45 Infinite loop draining queue; row arity == 0 as control signal for schema evolution; synchronized(this) for consistent snapshotState
getThrowable() 91 Returns first unhandled exception; null means healthy
getCurrentTableId() 95 Diagnostic; may be null/empty between rows

[ ] Sub-task 5 — SeaTunnelTask.java

seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/task/SeaTunnelTask.java

The abstract base for all Zeta task executions. The state machine in stateProcess is completely undocumented.

Methods to document:

Method Line What to explain
init() 123 Flow-to-lifecycle conversion; metrics context wiring; flowFutures registration
stateProcess() 137 Full state machine: WAITING_RESTORE → open() → RUNNING → collect() → PREPARE_CLOSE → close(); restore completion check; barrier-driven close
convertFlowToActionLifeCycle(Flow) 197 Recursive DAG traversal; ordering of outputs list
ack(Barrier) 312 Per-barrier ACK accumulation; prepareClose signal from barrier
addState(Barrier, ActionStateKey, List<byte[]>) 352 Checkpoint state accumulation before ACK
triggerSchemaChangeBeforeCheckpoint() 334 Propagates schema-change barrier to upstream
triggerSchemaChangeAfterCheckpoint() 343 Propagates schema-change-complete barrier
close() 299 Ordered teardown of all flow lifecycles; error aggregation

[ ] Sub-task 6 — SourceFlowLifeCycle.java

seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/task/flow/SourceFlowLifeCycle.java

Methods to document:

Method Line What to explain
init() 115 Reader creation from SourceAction; split serializer setup
open() 129 Reader open() + enumerator registration via register()
collect() 150 reader.pollNext(collector) loop; checkpoint-lock interaction; schema-change phase detection; idle yield
triggerBarrier(Barrier) 267 Lock-acquire on checkpointLock before injecting barrier; prepareClose propagation; schema-change checkpoint branching
signalNoMoreElement() 198 Initiates graceful reader shutdown; sends ReaderCloseOperation to enumerator
requestSplit() 230 Sends RequestSplitOperation to remote enumerator
receivedSplits(List<SplitT>) 259 Deserializes and forwards splits to reader.addSplits
notifyCheckpointComplete(long) 330 Delegates to reader.notifyCheckpointComplete
notifyCheckpointAborted(long) 335 Clears schema-change phase if it matches the aborted checkpoint
register() 215 Sends RegisterReaderOperation to enumerator; resolves enumerator address lazily

[ ] Sub-task 7 — SinkFlowLifeCycle.java

seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/task/flow/SinkFlowLifeCycle.java

Methods to document (key ones):

Method What to explain
init() Writer creation from context (fresh vs restore path)
received(Record<?>) Barrier vs data row dispatch; prepareCommit trigger
triggerBarrier(Barrier) prepareCommit() call and state collection; close-barrier propagation
close() Ordered writer teardown; committer teardown

[ ] Sub-task 8 — CoordinatorService.java (key private methods)

seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/CoordinatorService.java

Selected methods (full class is large; focus on the least-documented scheduling logic):

Method Line What to explain
startPendingJobScheduleThread() 227 Background scheduling thread purpose; loop/exception handling
pendingJobSchedule() 250 Peek-then-consume pattern; PendingSourceState.RESTORE vs SUBMIT semantics; resource pre-check
completeFailJob(JobMaster, ...) 334 Why insufficient-resource jobs are removed differently than other failures
restoreAllRunningJobFromMasterNodeSwitch() 449 Master HA failover: job state recovery from IMap
restoreJobFromMasterActiveSwitch(Long, JobInfo) 506 Per-job restore path during master switch
checkNewActiveMaster() 539 Hazelcast master detection and coordinator activation
clearCoordinatorService() 566 Graceful shutdown ordering

[ ] Sub-task 9 — CheckpointCoordinator.java (key methods)

seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/checkpoint/CheckpointCoordinator.java

Selected undocumented methods:

Method Line What to explain
reportedTask(TaskReportStatusOperation) 256 State machine entry: maps task status to restore or allTaskReady
restoreTaskState(TaskLocation) 306 Checkpoint-state lookup and distribution to tasks; parallelism mapping; COORDINATOR_INDEX special case
allTaskReady() 346 Triggers first checkpoint cycle or savepoint; sets isAllTaskReady
tryTriggerPendingCheckpoint(CheckpointType) 500 Interval enforcement; concurrent pending checkpoint guard; scheduling
acknowledgeTask(TaskAcknowledgeOperation) 905 Accumulates per-task ACKs in PendingCheckpoint; triggers completePendingCheckpoint when all tasks ACK
completePendingCheckpoint(CompletedCheckpoint) 942 Persists to storage; updates IMap state; notifies all tasks; prunes old checkpoints
cleanPendingCheckpoint(CheckpointCloseReason) 853 Abort path: reason-to-action mapping; failure vs cancel semantics
readyToClose(TaskLocation) 424 Tracks source-complete signals; triggers final savepoint barrier
notifyTaskStart() 397 Broadcasts task-start event; role in startup sequencing
scheduleSchemaChangeBeforeCheckpoint() 1111 Initiates schema-evolution checkpoint protocol

Undocumented state fields (add field-level Javadoc):

Field Line What to explain
readyToCloseStartingTask 119 Which tasks have finished their data; used to gate final barrier
readyToCloseIdleTask 120 Source tasks in idle (no-data) state awaiting close
closedIdleTask 121 Idle tasks that have fully closed
schemaChanging 136 Guards checkpoint trigger during DDL events
isAllTaskReady 143 Set once; used to skip task-ready wait on subsequent checkpoints

[ ] Sub-task 10 — TaskExecutionService.java (inner classes)

seatunnel-engine/seatunnel-engine-server/src/main/java/org/apache/seatunnel/engine/server/TaskExecutionService.java

The inner classes implement the cooperative/blocking execution model and are completely undocumented:

Class / Method Line What to explain
CooperativeTaskWorker (class) 731 Work-stealing cooperative executor; exclusiveTaskTracker for single-task phases
CooperativeTaskWorker.run() 753 Task-stealing loop; exclusive vs shared task semantics
RunBusWorkSupplier.runNewBusWork(boolean) 849 Creates new CooperativeTaskWorker threads on demand
BlockingWorker.run() 667 Dedicated thread per blocking task; lifecycle tracking
TaskGroupExecutionTracker.taskDone(Task) 928 All-tasks-complete detection; triggers group completion
TaskGroupExecutionTracker.exception(Throwable) 901 First-exception wins; cancels all sibling tasks
submitThreadShareTask(...) 210 Routes cooperative tasks into the shared queue
submitBlockingTask(...) 235 Launches dedicated blocking thread per task
deployTask(TaskGroupImmutableInformation) 276 Deserializes, classloads, and starts a task group
notifyTaskStatusToMaster(...) 442 Reports task-level status changes to CoordinatorService

Comment style

Follow the existing project convention:

/**
 * Drains all blocking queues before a checkpoint snapshot or commit.
 *
 * <p>Busy-waits until every per-table {@link BlockingQueue} is empty, polling every 100 ms.
 * Must be called while holding the runnable lock to prevent concurrent writes.
 *
 * @throws RuntimeException if interrupted while waiting
 */
private void checkQueueRemain() { ... }

Guidelines:

  • Describe what the method does and when it is called in the lifecycle
  • Note thread-safety and concurrency constraints where relevant
  • Reference related methods or state fields with {@link ...}
  • 2–6 lines per method is enough; avoid restating the signature

How to contribute

  1. Comment on this issue to claim a sub-task (e.g., "I'll take sub-task 3")
  2. Add Javadoc to the methods listed for that sub-task
  3. Run ./mvnw spotless:apply and ./mvnw -q -DskipTests verify before opening a PR
  4. PR title format: [Docs][Core] Add Javadoc to MultiTableSinkWriter (one class per PR preferred)

This is a good first issue — no logic changes, just reading and writing clear documentation. It is also a great way to deeply understand how SeaTunnel moves data from source to sink.

Contributor guide