Support feeding from Spark · vespa-engine/vespa#9158

(6 comments) (0 reactions) (0 assignees) · Java · (4,948 stars) (561 forks) · batch import

enhancement, good first issue

Description

Today, the Hadoop integration tools for Vespa support Hadoop and Pig for feeding and querying Vespa. The Pig feeder is a thin wrapper around the Vespa HTTP client.

We should support feeding directly from Spark as well, to avoid Spark pipelines having to write to HDFS and run another Pig job for the actual feeding. Similarly to the Pig feeder, this could be implemented as a thin wrapper around the HTTP client.
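A minimal sketch of what such a thin wrapper could look like, written without Spark or Vespa dependencies so it can run on its own. Everything here is hypothetical and not taken from the Vespa codebase: the class `SparkFeedSketch`, the `Sender` hook (a stand-in for whichever Vespa HTTP client is used), and the choice of flat string maps as records. The path layout assumes Vespa's `/document/v1` put API, and the JSON writer is deliberately simplistic.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch; none of these names come from the Vespa codebase. */
final class SparkFeedSketch {

    /** Hypothetical transport hook: performs one put and returns the HTTP status code. */
    interface Sender {
        int put(String path, String jsonBody) throws Exception;
    }

    /** Builds a put path, assuming the /document/v1/{namespace}/{doctype}/docid/{id} layout. */
    static String documentPath(String namespace, String docType, String id) {
        // URLEncoder targets form encoding, so turn '+' back into '%20' for use in a path.
        String encodedId = URLEncoder.encode(id, StandardCharsets.UTF_8).replace("+", "%20");
        return "/document/v1/" + namespace + "/" + docType + "/docid/" + encodedId;
    }

    /** Serializes a flat record as {"fields":{...}}; only string values are handled. */
    static String toPutJson(Map<String, String> fields) {
        StringBuilder sb = new StringBuilder("{\"fields\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : fields.entrySet()) {
            if (!first) sb.append(',');
            sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
            first = false;
        }
        return sb.append("}}").toString();
    }

    /** Minimal JSON string quoting: escapes quotes, backslashes and control characters. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') sb.append('\\').append(c);
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    /** Feeds every record of one partition and returns the ids that did not get a 200. */
    static List<String> feedPartition(Iterator<Map<String, String>> records,
                                      String namespace, String docType,
                                      String idField, Sender sender) {
        List<String> failed = new ArrayList<>();
        while (records.hasNext()) {
            Map<String, String> record = records.next();
            String id = record.get(idField);
            try {
                int status = sender.put(documentPath(namespace, docType, id), toPutJson(record));
                if (status != 200) failed.add(String.valueOf(id));
            } catch (Exception e) {
                // Missing ids and transport errors are recorded rather than aborting the partition.
                failed.add(String.valueOf(id));
            }
        }
        return failed;
    }
}
```

In a Spark job, `feedPartition` would presumably be called from `JavaRDD.foreachPartition` or `Dataset.foreachPartition`, so each executor feeds its own partition directly and no intermediate HDFS write or Pig job is needed. The `Sender` would have to be created inside that call, since HTTP clients are generally not serializable.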

Contributor guide

Tech stack: java, rest api
Area: backend, data
Issue type: feature
Difficulty: 3
Estimated time: 3-5 days
Activity status: stale
Clarity: clear
Prerequisites: Familiarity with Spark, understanding of the Vespa HTTP client, Java development experience
Beginner friendliness: 40
Investigation approach: Review the existing Hadoop integration in the vespa hadoop directory and the Vespa HTTP client documentation. Identify the key components used in the Pig feeder wrapper and adapt them for Spark. Consider how to handle parallel data ingestion and error handling. Check the comments on the issue for any additional requirements or constraints.
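
One possible shape for the error-handling part, as a sketch only: retry a single feed operation with exponential backoff when the response looks transient. The class `RetrySketch`, the choice of 429 and 503 as retryable statuses, and the backoff schedule are all assumptions for illustration; the issue does not specify a retry policy.

```java
import java.util.function.IntSupplier;

/** Hypothetical retry helper; the policy below is an assumption, not Vespa's. */
final class RetrySketch {

    /** Assumed set of statuses worth retrying: throttling and temporary unavailability. */
    static boolean isTransient(int status) {
        return status == 429 || status == 503;
    }

    /**
     * Runs the attempt up to maxAttempts times, doubling the delay after each
     * transient failure, and returns the last status code seen.
     */
    static int callWithRetry(IntSupplier attempt, int maxAttempts, long baseDelayMillis) {
        int status = attempt.getAsInt();
        for (int tries = 1; tries < maxAttempts && isTransient(status); tries++) {
            try {
                Thread.sleep(baseDelayMillis << (tries - 1));
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up early.
                Thread.currentThread().interrupt();
                return status;
            }
            status = attempt.getAsInt();
        }
        return status;
    }
}
```

Parallelism would then come from Spark itself: each partition is fed by its own task, so the number of concurrent feeders follows the partition count, and a helper like this only governs what happens to a single operation inside one task.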