[BUG] Feature materialization hangs in stage "RedisOutputUtils.scala:37" in local Spark env
#693 opened on September 22, 2022
Description
Willingness to contribute
Yes. I would be willing to contribute a fix for this bug with guidance from the Feathr community.
Feathr version
0.7.2
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 20.0): Mac OS
- Python version: 3.9
- Spark version, if reporting runtime issue: 3.3.0
Describe the problem
The feature generation job hangs in the Redis write stage without any error message.
Tracking information
22/09/22 10:39:55 INFO TaskSchedulerImpl: Adding task set 8.0 with 3 tasks resource profile 0
22/09/22 10:39:55 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 10) (localhost, executor driver, partition 0, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 1.0 in stage 8.0 (TID 11) (localhost, executor driver, partition 1, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO TaskSetManager: Starting task 2.0 in stage 8.0 (TID 12) (localhost, executor driver, partition 2, ANY, 5196 bytes) taskResourceAssignments Map()
22/09/22 10:39:55 INFO Executor: Running task 0.0 in stage 8.0 (TID 10)
22/09/22 10:39:55 INFO Executor: Running task 1.0 in stage 8.0 (TID 11)
22/09/22 10:39:55 INFO Executor: Running task 2.0 in stage 8.0 (TID 12)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 0, [0 - 5461] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 1, [5462 - 10922] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 10:39:55 INFO RedisKeysRDD: Computing partition, get keys partId: 2, [10923 - 16383] nodes: RedisNode(RedisEndpoint(feathrazuretest3redis.redis.cache.windows.net,6380,null,,0,2000,true),0,16383,0,1)
22/09/22 11:09:37 INFO BlockManagerInfo: Removed broadcast_7_piece0 on localhost:56004 in memory (size: 46.5 KiB, free: 434.4 MiB)
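For anyone reading the log: the three `RedisKeysRDD` lines each cover a contiguous range of the 16384 Redis hash slots (0 to 16383), one range per Spark task. The sketch below only reproduces the ranges printed in the log by splitting the slot space evenly and giving the remainder to the first partitions. `split_slots` is a hypothetical helper written for illustration; it is not a spark-redis or Feathr API, and the connector's actual partitioning logic may differ.

```python
def split_slots(total_slots: int, partitions: int) -> list[tuple[int, int]]:
    """Split slots 0..total_slots-1 into contiguous inclusive ranges.

    Assumption: the remainder is handed to the first partitions, which
    matches the ranges seen in the log for 16384 slots and 3 partitions.
    """
    base, extra = divmod(total_slots, partitions)
    ranges = []
    start = 0
    for i in range(partitions):
        # The first `extra` partitions each take one additional slot.
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges


if __name__ == "__main__":
    for part_id, (lo, hi) in enumerate(split_slots(16384, 3)):
        print(f"partId: {part_id}, [{lo} - {hi}]")
```

All three tasks log their range at 10:39:55, and the next log entry does not appear until 11:09:37, so the hang seems to occur while these partitions are being computed.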
Code to reproduce bug
No response
What component(s) does this bug affect?
- Python Client: This is the client users use to interact with most of our API. Mostly written in Python.
- Computation Engine: The computation engine that executes the actual feature join and generation work. Mostly in Scala and Spark.
- Feature Registry API: The frontend API layer supports SQL and Purview (Atlas) as storage. The API layer is written in Python (FastAPI).
- Feature Registry Web UI: The Web UI for the feature registry. Written in React.