feathr-ai/feathr

[BUG] get_offline_feature ignores `parquet` output file option

Open

#716 created on September 29, 2022

2 comments, 0 reactions, 0 assignees, Scala, 1,929 stars, 244 forks
Labels: bug, good first issue

Description

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Feathr version

0.8.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): reproduced on both Linux Ubuntu 20 and Databricks
  • Python version: 3.10
  • Spark version, if reporting runtime issue:

Describe the problem

get_offline_feature always writes its output as Avro, regardless of the output format set in the execution configuration.

Tracking information

No response

Code to reproduce bug

Run:

get_offline_feature(
    execution_configurations=SparkExecutionConfiguration({
        "spark.feathr.inputFormat": "parquet",
        "spark.feathr.outputFormat": "parquet",
    }),
    ....
)

The output files are still written as Avro.
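One way to confirm which format the job actually produced is to inspect the leading bytes of the output part files rather than trusting their extensions. The following is a minimal sketch, assuming the job result has already been downloaded to a local directory and that Spark wrote the usual `part-*` files. The helper names `detect_format` and `summarize_output` are hypothetical and not part of Feathr. It relies only on the file signatures: Avro object container files begin with `Obj\x01`, and Parquet files begin with `PAR1`.

```python
from pathlib import Path

AVRO_MAGIC = b"Obj\x01"   # header of an Avro object container file
PARQUET_MAGIC = b"PAR1"   # header of a Parquet file


def detect_format(path):
    """Return 'avro', 'parquet', or 'unknown' based on the first four bytes.

    Hypothetical helper for illustration; not a Feathr API.
    """
    with open(path, "rb") as f:
        head = f.read(4)
    if head == AVRO_MAGIC:
        return "avro"
    if head == PARQUET_MAGIC:
        return "parquet"
    return "unknown"


def summarize_output(directory):
    """Count the detected formats of Spark part files in a local directory.

    Only files named part-* are inspected, so _SUCCESS markers and
    hidden .crc checksum files are skipped.
    """
    counts = {}
    for part in sorted(Path(directory).glob("part-*")):
        fmt = detect_format(part)
        counts[fmt] = counts.get(fmt, 0) + 1
    return counts
```

If the bug is present, calling `summarize_output` on the downloaded result directory would be expected to report only `avro` entries even though `spark.feathr.outputFormat` was set to `parquet`.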

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly written in Scala and Spark.
  • Feature Registry API: The frontend API layer, which supports SQL and Purview (Atlas) as storage. The API layer is written in Python (FastAPI).
  • Feature Registry Web UI: The web UI for the feature registry. Written in React.
