apache/seatunnel

[Feature] Connector prepare for RAG

Open

#9713 opened on Aug 18, 2025

Labels: feature, good first issue, help wanted, llm


Search before asking

  • I have searched the existing feature requests and found no similar feature requirement.

Description

SeaTunnel is a multimodal data integration tool, so we would like it to support parsing complex file types, converting their contents into structured data streams, generating embeddings from them, and finally writing those embeddings into a vector database. This issue tracks the related tasks.

For chunking, please refer to https://docs.dify.ai/en/guides/knowledge-base/create-knowledge-and-upload-documents/chunking-and-cleaning-text and https://docs.llamaindex.ai/en/stable/examples/node_parsers/semantic_chunking/
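To make the chunking step more concrete, here is a minimal Java sketch of two strategies in the spirit of the linked guides. The first is fixed-length chunking, where neighboring chunks overlap. The second is a simplified semantic chunking, which starts a new chunk when adjacent sentence embeddings stop being similar. `ChunkingSketch`, its methods, and the `embed` function are hypothetical illustrations only. They are not part of SeaTunnel's connector API. The sketch also assumes text has already been extracted from the source file and, for the semantic variant, already split into sentences.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper for illustration only; not part of SeaTunnel's API. */
final class ChunkingSketch {

    private ChunkingSketch() {}

    /**
     * Fixed-length chunking: windows of at most maxChars characters (UTF-16 units,
     * not tokens), where consecutive windows share `overlap` characters.
     */
    public static List<String> fixedSizeChunks(String text, int maxChars, int overlap) {
        if (maxChars <= 0 || overlap < 0 || overlap >= maxChars) {
            throw new IllegalArgumentException("require maxChars > 0 and 0 <= overlap < maxChars");
        }
        List<String> chunks = new ArrayList<>();
        int step = maxChars - overlap; // how far each window advances
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + maxChars, text.length());
            chunks.add(text.substring(start, end));
            if (end == text.length()) {
                break; // this window already reached the end of the text
            }
        }
        return chunks;
    }

    /**
     * Simplified semantic chunking: start a new chunk whenever the cosine similarity
     * between adjacent sentence embeddings drops below `threshold`.
     */
    public static List<String> semanticChunks(List<String> sentences,
                                              Function<String, double[]> embed,
                                              double threshold) {
        List<String> chunks = new ArrayList<>();
        if (sentences.isEmpty()) {
            return chunks;
        }
        StringBuilder current = new StringBuilder(sentences.get(0));
        double[] previous = embed.apply(sentences.get(0));
        for (int i = 1; i < sentences.size(); i++) {
            double[] next = embed.apply(sentences.get(i));
            if (cosine(previous, next) < threshold) {
                chunks.add(current.toString()); // likely topic shift: close the current chunk
                current = new StringBuilder(sentences.get(i));
            } else {
                current.append(' ').append(sentences.get(i));
            }
            previous = next;
        }
        chunks.add(current.toString());
        return chunks;
    }

    /** Cosine similarity of two equal-length vectors; 0 if either vector is all zeros. */
    private static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }
}
```

For example, `fixedSizeChunks("abcdefghij", 4, 1)` yields `["abcd", "defg", "ghij"]`. The overlap keeps a little context across chunk boundaries. The semantic variant uses a fixed similarity cut-off to keep the sketch short. LlamaIndex's semantic splitter compares embeddings of small groups of adjacent sentences and chooses breakpoints by a percentile threshold. A real connector would also plug in an actual embedding model and would likely count tokens rather than characters.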

Usage Scenario

No response

Related issues

No response

Are you willing to submit a PR?

  • Yes, I am willing to submit a PR!

Code of Conduct

Contributor guide