feathr-ai/feathr

[BUG] `materialize_features` fails with some combinations of features

Open

#920 opened on December 13, 2022

Labels: bug, good first issue

Description

Willingness to contribute

No. I cannot contribute a bug fix at this time.

Feathr version

0.9.0

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): Ubuntu 20.0
  • Python version: 3.10
  • Spark version, if reporting runtime issue: 3.2.x and 3.3.1

Describe the problem

The materialization job fails for some combinations of features with the following error:

Caused by: java.lang.NullPointerException
        at com.linkedin.feathr.common.types.protobuf.FeatureValueOuterClass$FeatureValue$Builder.setStringValue(FeatureValueOuterClass.java:1728)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$getConversionFunction$4(RedisOutputUtils.scala:110)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2(RedisOutputUtils.scala:51)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$2$adapted(RedisOutputUtils.scala:48)
        at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
        at scala.collection.immutable.Range.foreach(Range.scala:158)
        at scala.collection.TraversableLike.map(TraversableLike.scala:286)
        at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
        at scala.collection.AbstractTraversable.map(Traversable.scala:108)
        at com.linkedin.feathr.offline.generation.outputProcessor.RedisOutputUtils$.$anonfun$encodeDataFrame$1(RedisOutputUtils.scala:48)
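The top frame is the protobuf-generated builder: protobuf's Java setters such as setStringValue throw NullPointerException when passed null, so the failure suggests that some row reached the Redis encoder with a null value for a string feature (presumably account_country, the only STRING feature in the reproduction code). The Python sketch below only mimics that behavior and one possible null-safe conversion; StrictFeatureValue and encode_row are hypothetical stand-ins, not Feathr APIs, and skipping nulls is just one option (writing a default value is another).

```python
from typing import Any, Callable, Dict, Optional


class StrictFeatureValue:
    """Hypothetical stand-in for the protobuf FeatureValue builder.

    Like the generated Java setter, it rejects None instead of storing it.
    """

    def __init__(self) -> None:
        self.string_value: Optional[str] = None

    def set_string_value(self, value: Optional[str]) -> "StrictFeatureValue":
        if value is None:
            raise TypeError("set_string_value() does not accept None")
        self.string_value = value
        return self


def encode_row(row: Dict[str, Any],
               converters: Dict[str, Callable[[Any], Any]]) -> Dict[str, Any]:
    """Encode each non-null feature value; null values are skipped."""
    encoded = {}
    for name, value in row.items():
        if value is None:
            # Skip the missing value instead of handing None to the builder.
            continue
        encoded[name] = converters[name](value)
    return encoded


converters = {
    "account_country": lambda v: StrictFeatureValue().set_string_value(v),
    "avg_transaction_amount": float,
}

row = {"account_country": None, "avg_transaction_amount": 42.5}
print(encode_row(row, converters))  # {'avg_transaction_amount': 42.5}
```

Calling StrictFeatureValue().set_string_value(row["account_country"]) directly raises, which is the Python analogue of the NullPointerException above.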

Tracking information

No response

Code to reproduce bug

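from feathr import (
    FLOAT,
    STRING,
    Feature,
    MaterializationSettings,
    RedisSink,
    WindowAggTransformation,
)

# account_id (the account key), client, ACCOUNT_FEATURE_TABLE_NAME and
# backfill_time are defined in code omitted from this report.

# anchored feature
account_country = Feature(
    name="account_country",
    key=account_id,
    feature_type=STRING,
    transform="accountCountry",
)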
...

# average amount of transaction in that week
avg_transaction_amount = Feature(
    name="avg_transaction_amount",
    key=account_id,
    feature_type=FLOAT,
    transform=WindowAggTransformation(
        agg_expr="cast_float(transactionAmount)", agg_func="AVG", window="7d"
    ),
)
...

client.materialize_features(
    MaterializationSettings(
        ACCOUNT_FEATURE_TABLE_NAME,
        backfill_time=backfill_time,
        sinks=[RedisSink(table_name=ACCOUNT_FEATURE_TABLE_NAME)],
        feature_names=["account_country", "avg_transaction_amount"],
    ),
    allow_materialize_non_agg_feature=True,
)
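A plausible mechanism for the failure of this combined job, not confirmed in the report: assuming the per-account results of the two features are combined with an outer join on the key, an account that has transactions in the 7-day window but no account_country row ends up with a null string value, which would then reach setStringValue. The plain-Python sketch below models that with dictionaries and lists the keys that would carry nulls; outer_join, null_keys and the sample data are invented for illustration.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[object]]


def outer_join(left: Dict[str, object], left_name: str,
               right: Dict[str, object], right_name: str) -> Dict[str, Row]:
    """Full outer join of two key -> value maps; a missing side becomes None."""
    keys = sorted(set(left) | set(right))
    return {k: {left_name: left.get(k), right_name: right.get(k)} for k in keys}


def null_keys(joined: Dict[str, Row], column: str) -> List[str]:
    """Keys whose value for `column` is missing after the join."""
    return [k for k, row in joined.items() if row[column] is None]


# Invented sample data: acct_3 has recent transactions but no country record.
account_country = {"acct_1": "US", "acct_2": "DE"}
avg_transaction_amount = {"acct_1": 12.0, "acct_3": 80.5}

joined = outer_join(account_country, "account_country",
                    avg_transaction_amount, "avg_transaction_amount")
print(null_keys(joined, "account_country"))         # ['acct_3']
print(null_keys(joined, "avg_transaction_amount"))  # ['acct_2']
```

Running the equivalent null check on the real source data (accounts present in the transaction window but missing from the account table) would be one way to test this hypothesis.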

Materializing feature_names=["account_country"] alone, feature_names=["avg_transaction_amount"] alone, or other combinations such as ["account_country", "num_transaction_count_in_day"] works without errors.

Only the combination ["account_country", "avg_transaction_amount"] fails.
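Because each feature materializes successfully on its own, one possible workaround while the bug is open is to run one materialization job per feature. The helper below is a hypothetical sketch, not part of Feathr: make_settings is assumed to wrap the MaterializationSettings call from the reproduction code, and whether two jobs writing to the same Redis table combine cleanly (rather than one overwriting the other's data) should be verified for your setup.

```python
from typing import Any, Callable, Iterable, List


def materialize_separately(
    client: Any,
    make_settings: Callable[[List[str]], Any],
    feature_names: Iterable[str],
) -> List[Any]:
    """Run one materialization job per feature instead of one combined job.

    Hypothetical helper: `make_settings` builds the settings object for a
    single-feature list, e.g. by wrapping MaterializationSettings(...).
    """
    results = []
    for name in feature_names:
        settings = make_settings([name])
        results.append(
            client.materialize_features(
                settings, allow_materialize_non_agg_feature=True
            )
        )
    return results


# Usage with the names from this report (same imports and variables assumed):
# materialize_separately(
#     client,
#     lambda names: MaterializationSettings(
#         ACCOUNT_FEATURE_TABLE_NAME,
#         backfill_time=backfill_time,
#         sinks=[RedisSink(table_name=ACCOUNT_FEATURE_TABLE_NAME)],
#         feature_names=names,
#     ),
#     ["account_country", "avg_transaction_amount"],
# )
```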

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for feature registry. Written in React
