feathr-ai/feathr

[BUG] Feature Materialization hang in stage "RedisOutputUtils.scala:37" in local spark env

Open

#693 opened on Sep 22, 2022

Labels: batch import, bug, good first issue

Description

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the Feathr community.

Feathr version

0.7.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): macOS
  • Python version: 3.9
  • Spark version, if reporting runtime issue: 3.3.0

Describe the problem

The feature generation job hangs in the Redis write stage (reported as "RedisOutputUtils.scala:37") when run in a local Spark environment. No error message is logged.

Tracking information

22/09/22 10:39:55 INFO TaskSchedulerImpl: Adding task set 8.0 with 3 tasks resource profile 0
22/09/22 10:39:55 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 10) (localhost, executor driver, partition 0, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 11) (localhost, executor driver, partition 1, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 2.0 in stage 8.0 (TID 12) (localhost, executor driver, partition 2, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO Executor: Running task 0.0 in stage 8.0 (TID 10)
22/09/22 10:39:55 INFO Executor: Running task 1.0 in stage 8.0 (TID 11)
22/09/22 10:39:55 INFO Executor: Running task 2.0 in stage 8.0 (TID 12)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 0, [0 - 5461] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 1, [5462 - 10922] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 2, [10923 - 16383] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 11:09:37 INFO BlockManagerInfo: Removed broadcast_7_piece0 on localhost:56004 in memory (size: 46.5 KiB, free: 434.4 MiB)
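
The hang shows up in the log above only as a jump in timestamps after the three RedisKeysRDD entries. Below is a minimal Python sketch for locating the longest silent interval in a driver log. It assumes the log uses Spark's default `yy/MM/dd HH:mm:ss` timestamp prefix. The helper names `parse_log_time` and `largest_gap` are hypothetical and not part of Feathr or Spark, and the sample lines are shortened copies of two entries from the log above.

```python
from datetime import datetime

# Spark's default log4j pattern prints timestamps as yy/MM/dd HH:mm:ss.
LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"
TIMESTAMP_LENGTH = 17  # e.g. "22/09/22 10:39:55"


def parse_log_time(line):
    """Parse the leading timestamp of one log line (hypothetical helper)."""
    return datetime.strptime(line[:TIMESTAMP_LENGTH], LOG_TIME_FORMAT)


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest pause between
    two consecutive log lines (hypothetical helper)."""
    stamped = [(parse_log_time(line), line) for line in lines]
    gaps = [
        (later_time - earlier_time, earlier_line, later_line)
        for (earlier_time, earlier_line), (later_time, later_line)
        in zip(stamped, stamped[1:])
    ]
    return max(gaps, key=lambda item: item[0])


# Shortened copies of entries from the log in this issue.
SAMPLE_LINES = [
    "22/09/22 10:39:55 INFO Executor: Running task 2.0 in stage 8.0 (TID 12)",
    "22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 2, [10923 - 16383]",
    "22/09/22 11:09:37 INFO BlockManagerInfo: Removed broadcast_7_piece0 on localhost:56004 in memory",
]

gap, before, after = largest_gap(SAMPLE_LINES)
print(f"Longest pause: {gap}")  # prints "Longest pause: 0:29:42"
print(f"Last line before the pause: {before}")
print(f"First line after the pause: {after}")
```

On this excerpt, the longest pause starts right after the last RedisKeysRDD entry and lasts just under 30 minutes. No task in stage 8.0 logs a completion or a failure during that interval.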

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is written in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for the feature registry. Written in React.
