feathr-ai/feathr

[BUG] Feature materialization hangs in stage "RedisOutputUtils.scala:37" in local Spark env

Open

Opened on 22 Sep 2022

(5 comments) (0 reactions) (1 assignee) Scala (1929 stars) (244 forks)
Labels: bug, good first issue

Description

Willingness to contribute

Yes. I would be willing to contribute a fix for this bug with guidance from the Feathr community.

Feathr version

0.7.2

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 20.0): Mac OS
  • Python version: 3.9
  • Spark version, if reporting runtime issue: 3.3.0

Describe the problem

The feature generation job hangs in the Redis write stage without any error message.

Tracking information

22/09/22 10:39:55 INFO TaskSchedulerImpl: Adding task set 8.0 with 3 tasks resource profile 0
22/09/22 10:39:55 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 10) (localhost, executor driver, partition 0, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 11) (localhost, executor driver, partition 1, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 2.0 in stage 8.0 (TID 12) (localhost, executor driver, partition 2, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO Executor: Running task 0.0 in stage 8.0 (TID 10)
22/09/22 10:39:55 INFO Executor: Running task 1.0 in stage 8.0 (TID 11)
22/09/22 10:39:55 INFO Executor: Running task 2.0 in stage 8.0 (TID 12)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 0, [0 - 5461] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 1, [5462 - 10922] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 2, [10923 - 16383] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 11:09:37 INFO BlockManagerInfo: Removed broadcast_7_piece0 on localhost:56004 in memory (size: 46.5 KiB, free: 434.4 MiB)
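
To help read the log above, here is a minimal Python sketch of two things it implies: how long the job was silent after the three tasks of stage 8.0 started, and how the three `[start - end]` ranges in the `RedisKeysRDD` lines cover the 0..16383 slot range shown in `RedisNode(...)`. The helper names `log_gap` and `split_slots` are hypothetical, the timestamp layout is assumed to be `yy/MM/dd HH:mm:ss`, and the even split is only one scheme that reproduces the logged ranges, not necessarily what the Redis connector does internally.

```python
from datetime import datetime, timedelta

# Assumption: the log timestamps use the layout yy/MM/dd HH:mm:ss.
SPARK_LOG_TIME_FORMAT = "%y/%m/%d %H:%M:%S"
# The RedisNode entries in the log span slots 0..16383, i.e. 16384 slots.
TOTAL_HASH_SLOTS = 16384


def log_gap(earlier: str, later: str) -> timedelta:
    """Return the time elapsed between two log timestamps."""
    start = datetime.strptime(earlier, SPARK_LOG_TIME_FORMAT)
    end = datetime.strptime(later, SPARK_LOG_TIME_FORMAT)
    return end - start


def split_slots(partitions: int, total: int = TOTAL_HASH_SLOTS) -> list:
    """Split slots 0..total-1 into contiguous inclusive ranges.

    Any remainder is given to the first partitions, one slot each.
    """
    base, extra = divmod(total, partitions)
    ranges, start = [], 0
    for i in range(partitions):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


# Last stage 8.0 entry vs. the next entry in the log.
gap = log_gap("22/09/22 10:39:55", "22/09/22 11:09:37")
slot_ranges = split_slots(3)

print(gap)          # 0:29:42
print(slot_ranges)  # [(0, 5461), (5462, 10922), (10923, 16383)]
```

So the log shows roughly half an hour with no task completion or error for TIDs 10 to 12, and the three partitions together cover the full slot range, which suggests that all three tasks were started against the same single Redis endpoint.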

Code to reproduce bug

No response

What component(s) does this bug affect?

  • Python Client: The client users use to interact with most of our API. Mostly written in Python.
  • Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
  • Feature Registry API: The frontend API layer, which supports SQL and Purview (Atlas) as storage. The API layer is in Python (FastAPI).
  • Feature Registry Web UI: The Web UI for the feature registry. Written in React.
