Support feeding from Spark · vespa-engine/vespa#9158

6 comments · 0 reactions · 0 assignees · Java · 4,948 stars · 561 forks · batch import

enhancement, good first issue

Description

Today, the Hadoop integration tools for Vespa support Hadoop and Pig for feeding and querying Vespa. The Pig feeder is a thin wrapper around the Vespa HTTP client.

We should support feeding directly from Spark as well, so that Spark pipelines do not have to write to HDFS and then run a separate Pig job for the actual feeding. Like the Pig feeder, this could be implemented as a thin wrapper around the HTTP client.
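To make the "thin wrapper" idea a bit more concrete, here is a minimal sketch of what the per-partition feeding logic could look like. Everything in it is an assumption for illustration: `PartitionFeeder`, `DocumentSender`, and `Result` are hypothetical names and do not exist in the Vespa code base, and `DocumentSender` only stands in for whatever the Vespa HTTP client actually exposes. Spark itself is left out so the block compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical per-partition feeder; a sketch, not the Vespa implementation. */
final class PartitionFeeder {

    /** Hypothetical stand-in for the client call that sends one document. */
    interface DocumentSender {
        void send(String documentId, String jsonBody) throws Exception;
    }

    /** Outcome of feeding one partition. */
    static final class Result {
        final int succeeded;
        final List<String> failedIds;

        Result(int succeeded, List<String> failedIds) {
            this.succeeded = succeeded;
            this.failedIds = failedIds;
        }
    }

    private final DocumentSender sender;
    private final int maxAttempts;

    PartitionFeeder(DocumentSender sender, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        this.sender = sender;
        this.maxAttempts = maxAttempts;
    }

    /**
     * Sends every (document id, JSON body) pair in the partition.
     * A document that still fails after maxAttempts tries is recorded
     * and skipped, so one bad document does not abort the partition.
     */
    Result feed(Iterator<Map.Entry<String, String>> docs) {
        int succeeded = 0;
        List<String> failedIds = new ArrayList<>();
        while (docs.hasNext()) {
            Map.Entry<String, String> doc = docs.next();
            boolean sent = false;
            for (int attempt = 1; attempt <= maxAttempts && !sent; attempt++) {
                try {
                    sender.send(doc.getKey(), doc.getValue());
                    sent = true;
                } catch (Exception e) {
                    // A real feeder would likely log and back off here.
                }
            }
            if (sent) {
                succeeded++;
            } else {
                failedIds.add(doc.getKey());
            }
        }
        return new Result(succeeded, failedIds);
    }
}
```

From a Spark job, each partition's iterator could be handed to `feed` (for example inside `JavaRDD.foreachPartition`, after mapping rows to id/JSON pairs), so that every executor feeds its own partitions in parallel. Whether failed documents should be retried, collected, or fail the whole job is a design choice this issue leaves open.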

Contributor guide

Tech stack: Java, REST API
Domain: backend, data
Issue type: feature
Difficulty: 3
Estimated time: 3-5 days
Activity status: stale
Clarity: clear
Prerequisites: Familiarity with Spark, understanding of the Vespa HTTP client, Java development experience
Beginner friendliness: 40
Research direction: Review the existing Hadoop integration in the vespa hadoop directory and the Vespa HTTP client documentation. Identify the key components used in the Pig feeder wrapper and adapt them for Spark. Consider how to handle parallel data ingestion and error handling. Check the comments on the issue for any additional requirements or constraints.