Support feeding from Spark · vespa-engine/vespa#9158

(6 comments) (0 reactions) (0 assignees) Java (561 forks) batch import

enhancement, good first issue

Repository metrics

Stars: (4,948 stars)
PR merge metrics: (Avg merge 1d 22h) (209 merged PRs in 30d)

Description

Today, the Hadoop integration tools for Vespa support Hadoop and Pig for feeding and querying Vespa. The Pig feeder is a thin wrapper around the Vespa HTTP client.

We should support feeding directly from Spark as well, so that Spark pipelines do not have to write to HDFS and then run a separate Pig job for the actual feeding. Like the Pig feeder, this could be implemented as a thin wrapper around the HTTP client.
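As a rough illustration of the "thin wrapper" idea, here is a minimal sketch of a per-partition feeder. It is not the existing Hadoop/Pig code and does not use the Vespa HTTP client library; it assumes documents are posted to Vespa's `/document/v1/` HTTP API using the JDK's `java.net.http` classes. The class name `PartitionFeeder`, its methods, and the `sender` callback are hypothetical names introduced only for this example.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Map;
import java.util.function.ToIntFunction;

/** Hypothetical per-partition feeder; a sketch, not part of the Vespa codebase. */
final class PartitionFeeder {
    private final URI endpoint;     // assumed container endpoint, e.g. http://localhost:8080
    private final String namespace; // document namespace (assumed to be URL-safe)
    private final String docType;   // document type (assumed to be URL-safe)

    PartitionFeeder(URI endpoint, String namespace, String docType) {
        this.endpoint = endpoint;
        this.namespace = namespace;
        this.docType = docType;
    }

    /** Builds a POST to /document/v1/{namespace}/{doctype}/docid/{id}. */
    HttpRequest buildRequest(String docId, String fieldsJson) {
        // URLEncoder does form encoding, so turn '+' back into '%20' for use in a path.
        String encodedId = URLEncoder.encode(docId, StandardCharsets.UTF_8).replace("+", "%20");
        String path = "/document/v1/" + namespace + "/" + docType + "/docid/" + encodedId;
        String body = "{\"fields\":" + fieldsJson + "}";
        return HttpRequest.newBuilder(endpoint.resolve(path))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    /**
     * Feeds every (docId, fieldsJson) pair in one partition.
     * The sender maps a request to an HTTP status code; it is injected so that
     * the transport (and retry policy) stays outside this sketch.
     * Returns the number of documents answered with HTTP 200.
     */
    int feedPartition(Iterator<Map.Entry<String, String>> docs,
                      ToIntFunction<HttpRequest> sender) {
        int ok = 0;
        while (docs.hasNext()) {
            Map.Entry<String, String> doc = docs.next();
            int status = sender.applyAsInt(buildRequest(doc.getKey(), doc.getValue()));
            if (status == 200) {
                ok++;
            }
        }
        return ok;
    }
}
```

In a Spark job, a method like `feedPartition` could be called from `JavaRDD.foreachPartition`, which hands each task an `Iterator` over its rows, with the sender backed by a `java.net.http.HttpClient` created once per partition. A real feeder would also need retries, throttling, and error reporting, which this sketch leaves out.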

Contributor guide

Research direction: Study the existing Hadoop and Pig integration code to understand how it wraps the Vespa HTTP client, then implement a similar DataSource or Foreach writer for Spark.
Tech stack: Java
Domain: backend, database
Issue type: Feature
Difficulty: 2
Estimated time: 1-2 days
Activity status: Active
Clarity: Clear
Prerequisites: Java, Git
Newbie friendliness: 60
