Reinforcement Learning

Even though LocoMuJoCo focuses on imitation learning, it can also be used for plain reinforcement learning. The challenge here is to define a reward function that produces the desired behavior. Here is a minimal example of setting up a reinforcement learning task with a custom reward:

Note

This is for didactic purposes only! It will not produce any useful gait.

import numpy as np
from loco_mujoco import LocoEnv
import gymnasium as gym


# define whatever reward function you want
def my_reward_function(state, action, next_state):
    return -np.mean(action)     # here we just return the negative mean of the action


# create the environment and task together with the reward function
env = gym.make("LocoMujoco", env_name="UnitreeH1.run.real", reward_type="custom",
               reward_params=dict(reward_callback=my_reward_function))

action_dim = env.action_space.shape[0]

env.reset()
env.render()
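terminated = False
truncated = False
i = 0

while True:
    # start a new episode after 1000 steps or when the current one has ended
    if i == 1000 or terminated or truncated:
        env.reset()
        i = 0
    action = np.random.randn(action_dim)    # random action as a placeholder for a policy
    nstate, reward, terminated, truncated, info = env.step(action)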

    # HERE is your favorite RL algorithm

    env.render()
    i += 1

Right now, LocoMuJoCo supports only Markovian reward functions (i.e., functions that depend only on the current state transition). We are considering support for non-Markovian reward functions as well, by giving the reward function access to the environment. Open an issue or drop us a message if you think this would be useful!
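To illustrate what "Markovian" means here, the following is a minimal sketch of a reward that depends only on the transition (state, action, next_state), using the same callback signature as my_reward_function above. The factory make_markovian_reward, its weights, and the two cost terms are hypothetical choices made for illustration and are not part of LocoMuJoCo. Like the example above, it is not expected to produce a useful gait.

```python
import numpy as np


def make_markovian_reward(action_weight=0.1, change_weight=0.01):
    """Build a reward callback (hypothetical helper, not part of LocoMuJoCo).

    The returned function uses only the current transition, so it is Markovian.
    Both weights are illustrative values, not tuned for any task.
    """
    def reward(state, action, next_state):
        # penalize large actions (mean squared action)
        action_cost = action_weight * np.mean(np.square(action))
        # penalize large changes between consecutive states
        change_cost = change_weight * np.mean(np.square(next_state - state))
        return -(action_cost + change_cost)
    return reward


my_reward_function = make_markovian_reward()

# quick check on dummy arrays (shapes are arbitrary here)
state = np.zeros(4)
next_state = np.ones(4)
action = np.array([1.0, -1.0])
r = my_reward_function(state, action, next_state)
```

Assuming the callback is passed in the same way as in the example above, this function could be supplied through reward_params=dict(reward_callback=my_reward_function). A reward that needs information beyond the current transition, such as a running average over past steps, would fall outside this interface.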