Imitation Learning
Basic Usage
LocoMuJoCo comes with many baseline algorithms, all of which are implemented in MushroomRL.
Here we show how to set up an experiment that trains a policy using one of these baselines.
To easily schedule experiments on local PCs and Slurm compute clusters, we use
the experiment_launcher package,
which is installed with LocoMuJoCo. The experiment launcher needs two files: launcher.py and
experiment.py. Note that imitation_lib has to be installed before running the experiment (see
Installing the Baselines).
Note
All files shown here can be found under examples/imitation_learning in the LocoMuJoCo repository.
The launcher file defines the parameters of the experiment, and the experiment file defines the experiment itself. Let’s say you would like to train on almost all environments in LocoMuJoCo. Then the launcher file will look like this:
from experiment_launcher import Launcher
from experiment_launcher.utils import is_local

if __name__ == '__main__':
    LOCAL = is_local()
    TEST = False
    USE_CUDA = False
    N_SEEDS = 3

    launcher = Launcher(exp_name='loco_mujoco_evalution',
                        exp_file='experiment',
                        n_seeds=N_SEEDS,
                        n_cores=1,  # only used for slurm
                        memory_per_core=1500,  # only used for slurm
                        n_exps_in_parallel=10,  # should not be used in slurm
                        days=2,  # only used for slurm
                        hours=0,  # only used for slurm
                        minutes=0,  # only used for slurm
                        use_timestamp=True,
                        )

    default_params = dict(n_epochs=400,
                          n_steps_per_epoch=100000,
                          n_epochs_save=25,
                          n_eval_episodes=10,
                          n_steps_per_fit=1000,
                          use_cuda=USE_CUDA)

    env_ids = ["Atlas.walk", "Atlas.carry",
               "Talos.walk", "Talos.carry",
               "UnitreeH1.walk", "UnitreeH1.run", "UnitreeH1.carry",
               "HumanoidTorque.walk", "HumanoidTorque.run",
               "HumanoidMuscle.walk", "HumanoidMuscle.run",
               "UnitreeA1.simple", "UnitreeA1.hard"]

    for env_id in env_ids:
        launcher.add_experiment(env_id__=env_id, **default_params)

    launcher.run(LOCAL, TEST)
In the launcher file, we define how the experiment is executed (e.g., number of cores, memory per core, number of seeds to run). We also define the parameters of the experiments: the training settings in default_params are shared by all experiments, and only the task ID of the LocoMuJoCo environment varies between them.
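To get a feel for the size of this launch, the following minimal sketch counts the runs and training steps implied by the values in launcher.py. It is a back-of-the-envelope calculation in plain Python and does not use the experiment_launcher API; the names n_runs and steps_per_run are introduced here only for illustration.

```python
# Values copied from the launcher file shown above.
N_SEEDS = 3
n_epochs = 400
n_steps_per_epoch = 100000
env_ids = ["Atlas.walk", "Atlas.carry",
           "Talos.walk", "Talos.carry",
           "UnitreeH1.walk", "UnitreeH1.run", "UnitreeH1.carry",
           "HumanoidTorque.walk", "HumanoidTorque.run",
           "HumanoidMuscle.walk", "HumanoidMuscle.run",
           "UnitreeA1.simple", "UnitreeA1.hard"]

# One run per (environment, seed) pair.
n_runs = len(env_ids) * N_SEEDS

# Training steps collected by a single run over all epochs.
steps_per_run = n_epochs * n_steps_per_epoch

print(f"{n_runs} runs, {steps_per_run:,} training steps each")
```

Numbers like these can help when choosing n_exps_in_parallel for a local machine or the time limit for Slurm.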
The experiment file will look like this:
import os
import numpy as np
import torch
from torch.utils.tensorboard import SummaryWriter
from experiment_launcher import run_experiment
from mushroom_rl.core import Core
from mushroom_rl.utils.dataset import compute_J, compute_episodes_length
from mushroom_rl.core.logger.logger import Logger
from imitation_lib.utils import BestAgentSaver
from loco_mujoco import LocoEnv
from utils import get_agent


def experiment(env_id: str = None,
               n_epochs: int = 500,
               n_steps_per_epoch: int = 10000,
               n_steps_per_fit: int = 1024,
               n_eval_episodes: int = 50,
               n_epochs_save: int = 500,
               gamma: float = 0.99,
               results_dir: str = './logs',
               use_cuda: bool = False,
               seed: int = 0):

    np.random.seed(seed)
    torch.random.manual_seed(seed)

    results_dir = os.path.join(results_dir, str(seed))

    # logging
    sw = SummaryWriter(log_dir=results_dir)  # tensorboard
    logger = Logger(results_dir=results_dir, log_name="logging", seed=seed, append=True)  # numpy
    agent_saver = BestAgentSaver(save_path=results_dir, n_epochs_save=n_epochs_save)

    print(f"Starting training {env_id}...")

    # create environment, agent and core
    mdp = LocoEnv.make(env_id)
    agent = get_agent(env_id, mdp, use_cuda, sw)
    core = Core(agent, mdp)

    for epoch in range(n_epochs):
        # train
        core.learn(n_steps=n_steps_per_epoch, n_steps_per_fit=n_steps_per_fit, quiet=True, render=False)

        # evaluate
        dataset = core.evaluate(n_episodes=n_eval_episodes)
        R_mean = np.mean(compute_J(dataset))
        J_mean = np.mean(compute_J(dataset, gamma=gamma))
        L = np.mean(compute_episodes_length(dataset))
        logger.log_numpy(Epoch=epoch, R_mean=R_mean, J_mean=J_mean, L=L)
        sw.add_scalar("Eval_R-stochastic", R_mean, epoch)
        sw.add_scalar("Eval_J-stochastic", J_mean, epoch)
        sw.add_scalar("Eval_L-stochastic", L, epoch)
        agent_saver.save(core.agent, R_mean)

    agent_saver.save_curr_best_agent()
    print("Finished.")


if __name__ == "__main__":
    run_experiment(experiment)
The most important parts of the experiment file are the definitions of the environment, the agent,
and the MushroomRL core. The environment is created as usual with the make
method from LocoMuJoCo together with the desired task ID. The agent is created with
a helper function, which can be found in the utils.py file under examples/imitation_learning in the LocoMuJoCo
repository. This helper function returns an agent with fine-tuned parameters for the respective environment. These
parameters can be found in the confs.yaml file. The MushroomRL core is created by passing the agent
and the environment to the Core class from MushroomRL. Finally, in each epoch, the agent is trained using
core.learn and evaluated using core.evaluate.
That’s it! Now you can run the experiment by executing the following command in the terminal:
python launcher.py
Visualizing the Results
The results are saved in the ./logs directory. To visualize them, you can use TensorBoard. To do so, run the following command in the terminal:
tensorboard --logdir ./logs
The three metrics to focus on are “Eval_R-stochastic”, “Eval_J-stochastic”, and “Eval_L-stochastic”, which are the mean undiscounted return, the mean discounted return, and the mean episode length of the agent, respectively. The return is calculated from the reward specified for each environment. Note that this reward is used only for evaluation, not for training.
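To make the three quantities concrete, here is a minimal sketch that computes them for two toy episodes in plain Python. It is not MushroomRL's implementation of compute_J or compute_episodes_length; the helper names undiscounted_return and discounted_return and the reward values are made up for illustration.

```python
def undiscounted_return(rewards):
    """Sum of all rewards collected in one episode."""
    return sum(rewards)


def discounted_return(rewards, gamma):
    """Sum of rewards weighted by gamma**t, where t is the step index."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# Two toy evaluation episodes, each given as a list of per-step rewards.
episodes = [[1.0, 1.0, 1.0], [1.0, 1.0]]
gamma = 0.5

# Average each quantity over the evaluation episodes.
R_mean = sum(undiscounted_return(ep) for ep in episodes) / len(episodes)
J_mean = sum(discounted_return(ep, gamma) for ep in episodes) / len(episodes)
L_mean = sum(len(ep) for ep in episodes) / len(episodes)

print(R_mean, J_mean, L_mean)
```

With a discount factor below 1, the discounted return weights early rewards more heavily, so it is never larger than the undiscounted return when all rewards are non-negative.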
Tuning the Hyperparameters
If you want to change the hyperparameters or the algorithm, we suggest copying the confs.yaml file and
passing the new configuration file to the get_agent method in experiment.py.
Alternatively, you can directly use a specific agent getter (e.g., create_gail_agent or create_vail_agent,
which can be found in utils.py). This way you can pass the hyperparameters directly to the agent, which
makes it easy to loop over hyperparameters to perform a search. To do this, specify the parameter
you would like to search over in the launcher file. Here is an example. Let’s say you
want to perform a hyperparameter search over the critic’s learning rate of a VAIL agent.
To do so, change the loop in launcher.py from:
for env_id in env_ids:
    launcher.add_experiment(env_id__=env_id, **default_params)
to:
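from itertools import product  # place this with the other imports at the top of launcher.py

critic_lrs = [1e-3, 1e-4, 1e-5]
for env_id, lr_critic in product(env_ids, critic_lrs):
    # the keyword (without the trailing underscores) must match the argument name in experiment.py
    launcher.add_experiment(env_id__=env_id, lr_critic__=lr_critic, **default_params)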
The two trailing underscores are important: they ensure that each experiment gets a separate logging directory when looping over a parameter.
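If itertools.product is unfamiliar, the following minimal sketch shows which combinations such a loop visits. It uses only the Python standard library and a shortened environment list chosen for illustration; no experiments are launched.

```python
from itertools import product

env_ids = ["Atlas.walk", "Talos.walk"]  # shortened list, for illustration only
critic_lrs = [1e-3, 1e-4, 1e-5]

# product yields every (env_id, lr) pair; the last iterable varies fastest.
grid = list(product(env_ids, critic_lrs))

for env_id, lr in grid:
    print(env_id, lr)
```

The number of experiments is the product of the list lengths, so adding a second searched hyperparameter multiplies the number of runs accordingly.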
Note
You have to add the new parameter, including its type annotation, to the signature of the experiment function in the experiment.py file.
Hence, the experiment file changes from:
def experiment(env_id: str = None,
               n_epochs: int = 500,
               n_steps_per_epoch: int = 10000,
               n_steps_per_fit: int = 1024,
               n_eval_episodes: int = 50,
               n_epochs_save: int = 500,
               gamma: float = 0.99,
               results_dir: str = './logs',
               use_cuda: bool = False,
               seed: int = 0):
    # ...
to:
def experiment(env_id: str = None,
               n_epochs: int = 500,
               n_steps_per_epoch: int = 10000,
               n_steps_per_fit: int = 1024,
               n_eval_episodes: int = 50,
               n_epochs_save: int = 500,
               lr_critic: float = 1e-3,  # WE ADDED THIS LINE
               gamma: float = 0.99,
               results_dir: str = './logs',
               use_cuda: bool = False,
               seed: int = 0):
    # ...
    # pass the new learning rate to the agent
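One plausible reason the type annotation matters is that parameter values passed between processes arrive as strings and have to be converted back to the right type. The sketch below illustrates that idea with the standard inspect module. It is an assumption about the general mechanism, not the experiment_launcher implementation; parse_overrides is a hypothetical helper, and the experiment signature is shortened.

```python
import inspect


def experiment(env_id: str = None, lr_critic: float = 1e-3, seed: int = 0):
    """Shortened stand-in for the experiment function in experiment.py."""


def parse_overrides(func, raw):
    """Hypothetical helper: cast string values to each parameter's annotated type."""
    params = inspect.signature(func).parameters
    return {name: params[name].annotation(value) for name, value in raw.items()}


# String values, as they might arrive from a command line.
parsed = parse_overrides(experiment, {"lr_critic": "1e-4", "seed": "2"})
print(parsed)
```

Without the annotation on lr_critic, a helper like this would have no way of knowing that "1e-4" should become a float.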
Load and Evaluate a Trained Agent
The best agents are saved every n_epochs_save epochs in the directory you specified, or in the default directory
./logs. To load and evaluate a trained agent, you can use the following code:
from mushroom_rl.core import Core, Agent
from loco_mujoco import LocoEnv
env = LocoEnv.make("Atlas.walk")
agent = Agent.load("./path/to/agent.msh")
core = Core(agent, env)
core.evaluate(n_episodes=10, render=True)
In the example above, first an Atlas environment is created. Then, the agent is loaded from the specified path. Finally, the agent is evaluated for 10 episodes with rendering enabled.
Continue Training from a Checkpoint
Similarly, if you want to continue training from a checkpoint, you can replace the line
agent = get_agent(env_id, mdp, use_cuda, sw) in the experiment.py file with the line
agent = Agent.load("./path/to/agent.msh"), importing Agent from mushroom_rl.core as in the example above.
Training then continues from the specified checkpoint.