Add MAML Sampler for faster meta-task sampling · rlworkgroup/garage#1115


Labels: good first issue, tf

Description

I found that ProMP samples much faster than garage. This is because ProMP has a specialized sampler, called the MAML Sampler, that parallelizes sampling at the task level. I think this is important for garage as well.

A MAML Sampler is a sampler that samples all tasks in one run (i.e. one call to sampler.obtain_samples()). This is in contrast to the current sampler design, which handles one task at a time. Because the MAML Sampler controls task-level scheduling, it can parallelize across tasks.

With a MAML Sampler, the training loop would look something like this:

sampler = MAMLSampler(tasks)
for many batches
    policies = num_tasks copies of policy
    paths_batch = []
    for some gradient steps
        paths_all_tasks = sampler.obtain_samples(policies)
        update policies using paths_all_tasks
        add paths_all_tasks to paths_batch
    optimize policy using policies and paths_batch
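
To make the proposed interface concrete, here is a minimal sketch of a sampler that covers every task in a single obtain_samples() call. It is only an illustration under stated assumptions: this MAMLSampler class, the rollout() helper, and the float-valued "policies" and "tasks" are hypothetical stand-ins, not garage's or ProMP's actual API, and a thread pool is used purely to show where task-level scheduling would live.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def rollout(policy, task, n_steps=5, seed=0):
    """Toy rollout (assumption): the policy is a float gain, the task a float goal."""
    rng = random.Random(seed)
    state, rewards = 0.0, []
    for _ in range(n_steps):
        action = policy * (task - state) + rng.gauss(0.0, 0.01)
        state += action
        rewards.append(-abs(task - state))
    return {"task": task, "rewards": rewards}


class MAMLSampler:
    """Hypothetical sampler that owns the task list and schedules per-task work."""

    def __init__(self, tasks, rollouts_per_task=2, max_workers=4):
        self.tasks = list(tasks)
        self.rollouts_per_task = rollouts_per_task
        self.max_workers = max_workers

    def _sample_task(self, index, policy):
        # Collect all rollouts for one task with that task's adapted policy.
        task = self.tasks[index]
        return [rollout(policy, task, seed=index * 1000 + k)
                for k in range(self.rollouts_per_task)]

    def obtain_samples(self, policies):
        """Sample every task in one call; returns one list of paths per task."""
        if len(policies) != len(self.tasks):
            raise ValueError("need exactly one policy per task")
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # Executor.map preserves input order, so result i belongs to task i.
            return list(pool.map(self._sample_task,
                                 range(len(self.tasks)), policies))


sampler = MAMLSampler(tasks=[1.0, -1.0, 0.5])
policies = [0.5, 0.5, 0.5]  # num_tasks copies of the same toy policy
paths_all_tasks = sampler.obtain_samples(policies)
```

Threads will not speed up CPU-bound rollouts in CPython; a real implementation would more likely use worker processes, which is a design choice this sketch leaves open.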

Currently, by contrast, a MAML training loop has to switch tasks outside of the sampler, as in the loop below. Although the sampler does parallelize sampling at the rollout level, this has higher overhead than the MAML Sampler loop above.

for many batches
    policies = num_tasks copies of policy
    paths_batch = []
    for tasks[i]
        for some gradient steps
            paths = sampler.obtain_samples(policies[i], tasks[i])
            update policies[i] using paths
            add paths to paths_batch
    optimize policy using policies and paths_batch
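
One rough way to see the difference between the two loops is to count sampler invocations. The sketch below uses a hypothetical CountingSampler that records calls and returns no data; the loop functions mirror the two pseudocode loops in this issue and are not garage code. Call count is only a proxy for scheduling overhead, not a benchmark.

```python
class CountingSampler:
    """Hypothetical stand-in that only records how often it is invoked."""

    def __init__(self):
        self.calls = 0

    def obtain_samples(self, *args):
        self.calls += 1
        return []


def per_task_loop(sampler, policies, tasks, grad_steps):
    """Current design: the training loop switches tasks outside the sampler."""
    paths_batch = []
    for i, task in enumerate(tasks):
        for _ in range(grad_steps):
            paths = sampler.obtain_samples(policies[i], task)
            paths_batch.append(paths)
    return paths_batch


def task_level_loop(sampler, policies, grad_steps):
    """Proposed design: one sampler call per gradient step covers all tasks."""
    paths_batch = []
    for _ in range(grad_steps):
        paths_all_tasks = sampler.obtain_samples(policies)
        paths_batch.append(paths_all_tasks)
    return paths_batch


tasks = ["task_a", "task_b", "task_c", "task_d"]
policies = ["policy"] * len(tasks)
grad_steps = 2

current = CountingSampler()
per_task_loop(current, policies, tasks, grad_steps)

proposed = CountingSampler()
task_level_loop(proposed, policies, grad_steps)

print(current.calls, proposed.calls)  # prints "8 2"
```

With num_tasks tasks and a fixed number of gradient steps, the per-task loop makes num_tasks times as many sampler calls, each of which has to start and drain its own batch of rollouts.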

Contributor guide

Tech stack: Python
Domain: machine learning
Issue type: feature
Difficulty: 4
Estimated time: over 1 week
Activity: stale
Clarity: clear
Prerequisites: Reinforcement learning, Meta learning (MAML), Python, Garage sampler API
Beginner-friendliness: 25
Investigation approach: Study the ProMP MAML Sampler for parallel task sampling. Analyze the current garage sampler architecture in 'garage/sampler/' to understand task handling. Design a MAMLSampler class that obtains samples for all tasks simultaneously, enabling task-level parallelism. Modify the MAML training loop to use this new sampler, as outlined in the issue's proposed pseudocode.