Imitation Learning

Basic Usage

LocoMuJoCo comes with many baseline algorithms, all of which are implemented in MushroomRL. Here we show how to set up an experiment to train a policy using a baseline algorithm. To easily schedule experiments on local PCs and Slurm compute clusters, we use the experiment_launcher package, which is installed with LocoMuJoCo. The experiment launcher needs two files: launcher.py and experiment.py. Note that imitation_lib has to be installed before running the experiment (check out Installing the Baselines).

Note

All files shown here can be found under examples/imitation_learning in the LocoMuJoCo repository.

The launcher file defines the parameters of the experiment, and the experiment file defines the experiment itself. Let’s say you would like to train on almost all environments in LocoMuJoCo. Then the launcher file will look like this:

from experiment_launcher import Launcher
from experiment_launcher.utils import is_local


if __name__ == '__main__':
    LOCAL = is_local()
    TEST = False
    USE_CUDA = False

    N_SEEDS = 3

    launcher = Launcher(exp_name='loco_mujoco_evalution',
                        exp_file='experiment',
                        n_seeds=N_SEEDS,
                        n_cores=1,  # only used for slurm
                        memory_per_core=1500,   # only used for slurm
                        n_exps_in_parallel=10,  # should not be used in slurm
                        days=2,     # only used for slurm
                        hours=0,    # only used for slurm
                        minutes=0,  # only used for slurm
                        use_timestamp=True,
                        )

    default_params = dict(n_epochs=400,
                          n_steps_per_epoch=100000,
                          n_epochs_save=25,
                          n_eval_episodes=10,
                          n_steps_per_fit=1000,
                          use_cuda=USE_CUDA)

    env_ids = ["Atlas.walk", "Atlas.carry",
               "Talos.walk", "Talos.carry",
               "UnitreeH1.walk", "UnitreeH1.run", "UnitreeH1.carry",
               "HumanoidTorque.walk", "HumanoidTorque.run",
               "HumanoidMuscle.walk", "HumanoidMuscle.run",
               "UnitreeA1.simple", "UnitreeA1.hard"]

    for env_id in env_ids:
        launcher.add_experiment(env_id__=env_id, **default_params)

    launcher.run(LOCAL, TEST)

In the launcher file, we defined information about the execution of the experiment (e.g., number of cores, memory per core, number of seeds to run, etc.). We also defined the parameters of the experiments. Here, the only parameter that differs between experiments is the task ID of the LocoMuJoCo environment; all other parameters are shared through default_params.
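As a rough sanity check before submitting jobs, the size of the sweep defined above can be worked out by hand. The sketch below only reuses values from the launcher file. The variable names n_runs and steps_per_run are illustrative and not part of experiment_launcher, and it assumes that every added experiment is run once per seed.

```python
# Values copied from the launcher file above.
n_seeds = 3
n_epochs = 400
n_steps_per_epoch = 100_000

env_ids = ["Atlas.walk", "Atlas.carry",
           "Talos.walk", "Talos.carry",
           "UnitreeH1.walk", "UnitreeH1.run", "UnitreeH1.carry",
           "HumanoidTorque.walk", "HumanoidTorque.run",
           "HumanoidMuscle.walk", "HumanoidMuscle.run",
           "UnitreeA1.simple", "UnitreeA1.hard"]

# Assumption: each add_experiment call is executed once per seed.
n_runs = len(env_ids) * n_seeds

# Training steps collected in a single run (evaluation episodes not included).
steps_per_run = n_epochs * n_steps_per_epoch

print(f"{n_runs} runs, {steps_per_run:,} training steps per run")
```

With these settings the launcher schedules 39 runs of 40 million training steps each, so it can be worth lowering n_epochs or shortening env_ids for a first local test.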

The experiment file will look like this:

import os
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
from experiment_launcher import run_experiment
from mushroom_rl.core import Core
from mushroom_rl.utils.dataset import compute_J, compute_episodes_length
from mushroom_rl.core.logger.logger import Logger

from imitation_lib.utils import BestAgentSaver

from loco_mujoco import LocoEnv
from utils import get_agent


def experiment(env_id: str = None,
               n_epochs: int = 500,
               n_steps_per_epoch: int = 10000,
               n_steps_per_fit: int = 1024,
               n_eval_episodes: int = 50,
               n_epochs_save: int = 500,
               gamma: float = 0.99,
               results_dir: str = './logs',
               use_cuda: bool = False,
               seed: int = 0):

    np.random.seed(seed)
    torch.random.manual_seed(seed)

    results_dir = os.path.join(results_dir, str(seed))

    # logging
    sw = SummaryWriter(log_dir=results_dir)     # tensorboard
    logger = Logger(results_dir=results_dir, log_name="logging", seed=seed, append=True)    # numpy
    agent_saver = BestAgentSaver(save_path=results_dir, n_epochs_save=n_epochs_save)

    print(f"Starting training {env_id}...")

    # create environment, agent and core
    mdp = LocoEnv.make(env_id)
    agent = get_agent(env_id, mdp, use_cuda, sw)
    core = Core(agent, mdp)

    for epoch in range(n_epochs):

        # train
        core.learn(n_steps=n_steps_per_epoch, n_steps_per_fit=n_steps_per_fit, quiet=True, render=False)

        # evaluate
        dataset = core.evaluate(n_episodes=n_eval_episodes)
        R_mean = np.mean(compute_J(dataset))
        J_mean = np.mean(compute_J(dataset, gamma=gamma))
        L = np.mean(compute_episodes_length(dataset))
        logger.log_numpy(Epoch=epoch, R_mean=R_mean, J_mean=J_mean, L=L)
        sw.add_scalar("Eval_R-stochastic", R_mean, epoch)
        sw.add_scalar("Eval_J-stochastic", J_mean, epoch)
        sw.add_scalar("Eval_L-stochastic", L, epoch)
        agent_saver.save(core.agent, R_mean)

    agent_saver.save_curr_best_agent()
    print("Finished.")


if __name__ == "__main__":
    run_experiment(experiment)

The most important parts of the experiment file are the definitions of the environment, the agent, and the MushroomRL core. The environment is created as usual, by calling the make method of LocoEnv with the desired task ID. The agent is created by a helper function, get_agent, which can be found in the utils.py file under examples/imitation_learning in the LocoMuJoCo repository. This helper function returns an agent with fine-tuned parameters for the respective environment. These parameters can be found in the confs.yaml file. The MushroomRL core is created by passing the agent and the environment to the Core class from MushroomRL. Finally, at each epoch, the agent is trained using core.learn and evaluated using core.evaluate.

That’s it! Now you can run the experiment by executing the following command in the terminal:

python launcher.py

Visualizing the Results

The results are saved in the ./logs directory. To visualize them, you can use TensorBoard. To do so, run the following command in the terminal:

tensorboard --logdir ./logs

Focus on the following three metrics: “Eval_R-stochastic”, “Eval_J-stochastic”, and “Eval_L-stochastic”, which are the mean undiscounted return, the mean discounted return, and the mean episode length of the agent, respectively. The return is calculated based on the reward specified for each environment. Note that this reward is not used for training but only for evaluation.
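To make the difference between the three metrics concrete, here is a minimal pure-Python sketch of how they relate. It is not the MushroomRL implementation of compute_J or compute_episodes_length. The helper episode_return and the reward values are hypothetical and chosen only for illustration.

```python
def episode_return(rewards, gamma=1.0):
    """Sum of rewards, with the reward at step t weighted by gamma**t.

    gamma=1.0 gives the undiscounted return.
    """
    return sum(r * gamma ** t for t, r in enumerate(rewards))


# Hypothetical per-step rewards of two evaluation episodes.
episodes = [[1.0, 1.0, 1.0], [0.5, 0.5]]
n = len(episodes)

R_mean = sum(episode_return(ep) for ep in episodes) / n              # undiscounted
J_mean = sum(episode_return(ep, gamma=0.99) for ep in episodes) / n  # discounted
L_mean = sum(len(ep) for ep in episodes) / n                         # episode length

print(R_mean, J_mean, L_mean)
```

Because later rewards are weighted down, the discounted return is never larger than the undiscounted one when rewards are non-negative, and the gap grows with episode length.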

Tuning the Hyperparameters

If you want to change the hyperparameters or the algorithm, we suggest copying the confs.yaml file and passing the new configuration file to the get_agent function in experiment.py.

Alternatively, you can directly use a specific agent getter (e.g., create_gail_agent or create_vail_agent, which can be found in utils.py). This way you can pass the hyperparameters directly to the agent and easily loop over them to perform a search. To do so, specify the parameter you would like to search over in the launcher file. As an example, let’s say you want to perform a hyperparameter search over the critic’s learning rate of a VAIL agent.

To do so, change the loop in launcher.py from:

for env_id in env_ids:
    launcher.add_experiment(env_id__=env_id, **default_params)

to:

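from itertools import product   # add this import at the top of launcher.py

critic_lrs = [1e-3, 1e-4, 1e-5]
for env_id, lr_critic in product(env_ids, critic_lrs):
    # the keyword (without the trailing underscores) must match the argument name in experiment.py
    launcher.add_experiment(env_id__=env_id, lr_critic__=lr_critic, **default_params)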

The trailing double underscores are important: they give each experiment a separate logging directory when looping over a parameter.
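The loop over product(env_ids, critic_lrs) registers one experiment per combination of environment and learning rate. The following sketch enumerates such a grid with a shortened, illustrative environment list to show how quickly the number of runs grows. It again assumes that each registered experiment is repeated once per seed.

```python
from itertools import product

env_ids = ["Atlas.walk", "Talos.walk"]   # shortened list, for illustration only
critic_lrs = [1e-3, 1e-4, 1e-5]
n_seeds = 3

# One entry per (environment, learning rate) combination.
grid = list(product(env_ids, critic_lrs))

# Assumption: every registered experiment is run once per seed.
n_runs = len(grid) * n_seeds

for env_id, lr in grid:
    print(env_id, lr)
print(f"{len(grid)} experiments, {n_runs} runs in total")
```

With the full list of 13 environments, the same three learning rates and three seeds would already amount to 117 runs.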

Note

You have to add the new parameter, including its type annotation, to the signature of the experiment function in experiment.py.

Hence, the experiment file changes from:

def experiment(env_id: str = None,
               n_epochs: int = 500,
               n_steps_per_epoch: int = 10000,
               n_steps_per_fit: int = 1024,
               n_eval_episodes: int = 50,
               n_epochs_save: int = 500,
               gamma: float = 0.99,
               results_dir: str = './logs',
               use_cuda: bool = False,
               seed: int = 0):
    # ...

to:

def experiment(env_id: str = None,
               n_epochs: int = 500,
               n_steps_per_epoch: int = 10000,
               n_steps_per_fit: int = 1024,
               n_eval_episodes: int = 50,
               n_epochs_save: int = 500,
               lr_critic: float = 1e-3,     # WE ADDED THIS LINE
               gamma: float = 0.99,
               results_dir: str = './logs',
               use_cuda: bool = False,
               seed: int = 0):
    # ...

    # pass the new learning rate to the agent

Load and Evaluate a Trained Agent

The best agents are saved every n_epochs_save epochs in the directory you specified, or in the default directory ./logs. To load and evaluate a trained agent, you can use the following code:

from mushroom_rl.core import Core, Agent
from loco_mujoco import LocoEnv

env = LocoEnv.make("Atlas.walk")

agent = Agent.load("./path/to/agent.msh")

core = Core(agent, env)

core.evaluate(n_episodes=10, render=True)

In the example above, first an Atlas environment is created. Then, the agent is loaded from the specified path. Finally, the agent is evaluated for 10 episodes with rendering enabled.
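If you are unsure where the agent files ended up, a small helper like the one below can list candidate checkpoints. This is only a sketch: find_checkpoints is a hypothetical helper, not part of LocoMuJoCo or MushroomRL, and it assumes only that saved agents use the .msh extension, as in the path above. The exact sub-directory layout depends on the launcher settings.

```python
from pathlib import Path


def find_checkpoints(results_dir="./logs"):
    """Return all .msh files below results_dir, sorted by path."""
    return sorted(Path(results_dir).rglob("*.msh"))


if __name__ == "__main__":
    for path in find_checkpoints():
        print(path)
```

Any of the printed paths can then be passed to Agent.load.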

Continue Training from a Checkpoint

Similarly, if you want to continue training from a checkpoint, replace the line agent = get_agent(env_id, mdp, use_cuda, sw) in experiment.py with agent = Agent.load("./path/to/agent.msh"). Training then continues from the specified checkpoint.
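Instead of editing the line by hand each time, the choice between a fresh agent and a checkpoint can be made explicit. The sketch below is one possible way to do this. The helper build_or_load and the checkpoint argument are hypothetical and not part of the LocoMuJoCo examples, and the two callables stand in for get_agent and Agent.load.

```python
from typing import Callable, Optional


def build_or_load(checkpoint: Optional[str],
                  build: Callable[[], object],
                  load: Callable[[str], object]) -> object:
    """Load an agent from `checkpoint` if one is given, otherwise build a new one."""
    if checkpoint:
        return load(checkpoint)
    return build()


# Hypothetical usage inside experiment.py, with `checkpoint` added as a
# typed argument of the experiment function:
#
#   agent = build_or_load(checkpoint,
#                         build=lambda: get_agent(env_id, mdp, use_cuda, sw),
#                         load=Agent.load)
```

Passing the builder as a callable means a new agent is only created when no checkpoint is given.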