microsoft/nni

NNI is infinitely waiting when running in remote mode

Open

#4072, created on August 16, 2021

(6 comments) (0 reactions) (1 assignee) Python (13,504 stars) (1,830 forks)
bug, help wanted, user raised

Description

Discussed in https://github.com/microsoft/nni/discussions/4070

Originally posted by ZhiyuanChen August 14, 2021

[2021-08-14 10:13:41] INFO (NNIDataStore) Datastore initialization done
[2021-08-14 10:13:41] INFO (RestServer) RestServer start
[2021-08-14 10:13:41] INFO (RestServer) RestServer base port is 8080
[2021-08-14 10:13:41] INFO (main) Rest server listening on: http://0.0.0.0:8080
[2021-08-14 10:13:42] INFO (NNIManager) Starting experiment: VBgChK3z
[2021-08-14 10:13:42] INFO (NNIManager) Setup training service...
[2021-08-14 10:13:42] INFO (TrialDispatcher) TrialDispatcher: GPU scheduler is enabled.
[2021-08-14 10:13:42] INFO (RemoteEnvironmentService) connecting to machine1
[2021-08-14 10:13:42] INFO (RemoteEnvironmentService) connecting to machine2
[2021-08-14 10:13:42] INFO (NNIManager) Setup tuner...
[2021-08-14 10:13:42] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2021-08-14 10:13:42] INFO (NNIManager) Add event listeners
[2021-08-14 10:13:43] INFO (NNIManager) NNIManager received command from dispatcher: ID, 
[2021-08-14 10:13:43] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"lr": 0.004360754539476665}, "parameter_index": 0}
[2021-08-14 10:13:43] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"lr": 0.0035240674041291577}, "parameter_index": 0}
[2021-08-14 10:13:44] INFO (RemoteEnvironmentService) ssh connection initialized!
[2021-08-14 10:13:44] INFO (TrialDispatcher) TrialDispatcher: started channel: WebCommandChannel
[2021-08-14 10:13:44] INFO (TrialDispatcher) TrialDispatcher: copying code and settings.
[2021-08-14 10:13:44] INFO (TrialDispatcher) Initialize environments total number: 2
[2021-08-14 10:13:44] INFO (TrialDispatcher) Assign environment service remote to environment fzYKh
[2021-08-14 10:13:45] INFO (TrialDispatcher) requested environment fzYKh and job id is nni_exp_VBgChK3z_env_fzYKh.
[2021-08-14 10:13:45] INFO (TrialDispatcher) Assign environment service remote to environment D6h8H
[2021-08-14 10:13:46] INFO (TrialDispatcher) requested environment D6h8H and job id is nni_exp_VBgChK3z_env_D6h8H.
[2021-08-14 10:13:46] INFO (TrialDispatcher) TrialDispatcher: run loop started.
[2021-08-14 10:13:47] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 0,
  hyperParameters: {
    value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"lr": 0.004360754539476665}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2021-08-14 10:13:47] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 1,
  hyperParameters: {
    value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"lr": 0.0035240674041291577}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}

