Reinforcement Learning
Although LocoMuJoCo focuses on imitation learning, it can also be used for plain reinforcement learning. The challenge is to define a reward function that produces the desired behavior. Here is a minimal example of setting up an environment with a custom reward function:
Note
This is for didactic purposes only! It will not produce any useful gait.
import numpy as np
from loco_mujoco import LocoEnv
import gymnasium as gym


# define whatever reward function you want
def my_reward_function(state, action, next_state):
    return -np.mean(action)  # here we just return the negative mean of the action


# create the environment and task together with the reward function
env = gym.make("LocoMujoco", env_name="UnitreeH1.run.real", reward_type="custom",
               reward_params=dict(reward_callback=my_reward_function))

action_dim = env.action_space.shape[0]

env.reset()
env.render()
terminated = False
truncated = False
i = 0

while True:
    # start a new episode after 1000 steps or when the current one has ended
    if i == 1000 or terminated or truncated:
        env.reset()
        i = 0
    action = np.random.randn(action_dim)
    nstate, reward, terminated, truncated, info = env.step(action)

    # HERE is your favorite RL algorithm

    env.render()
    i += 1
Right now, LocoMuJoCo only supports Markovian reward functions (i.e., functions that depend only on the current state transition). We are considering adding support for non-Markovian reward functions by giving the reward function access to the environment. Open an issue or drop me a message if you think this is something we should really do!
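To make the Markovian restriction concrete, here is a minimal sketch of a reward that uses only the three arguments of the callback signature shown above. The weights and the choice of penalty terms are hypothetical and not part of LocoMuJoCo. The sketch also assumes that state and next_state are one-dimensional arrays of equal length.

```python
import numpy as np

# Hypothetical weights, chosen only for illustration.
ACTION_COST = 0.01
CHANGE_COST = 0.1


def markovian_reward(state, action, next_state):
    """Reward computed from a single transition (s, a, s') and nothing else."""
    action = np.asarray(action, dtype=float)
    delta = np.asarray(next_state, dtype=float) - np.asarray(state, dtype=float)
    effort = ACTION_COST * np.sum(action ** 2)  # penalize large actions
    change = CHANGE_COST * np.sum(delta ** 2)   # penalize large state changes
    return float(-(effort + change))


# Toy transition to check the function in isolation (no environment needed).
s = np.zeros(3)
a = np.array([1.0, -1.0, 0.0])
s_next = np.array([0.1, 0.0, 0.0])
r = markovian_reward(s, a, s_next)
```

A function like this could be passed as reward_callback in the same way as my_reward_function. A term that needs information from earlier steps, such as a penalty on the difference between the current and the previous action, cannot be written from these three arguments alone. That is the non-Markovian case described above.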