RL Basics | How to Build a Custom RL Environment Using the OpenAI Gym Interface (Detailed Version)

References:

Official link: Gym documentation | Make your own custom environment
Tencent Cloud | OpenAI Gym Intermediate Tutorial - Environment Customization and Creation
Zhihu | How to register a custom environment in Gym?
(I wrote this post before realizing I had already written one: RL Basics | How to build a custom gym environment)

(This post applies to the gym interface. The gymnasium interface is similar: read its interface definitions carefully and adapt the code accordingly.)

Install OpenAI Gym:

# pip install gym
import gym
from gym import spaces

Two main pieces of functionality need to be implemented: initializing/resetting the environment, and stepping it.

01 env initialization and reset

env.__init__() function:

The inputs are the initialization conditions for env, such as how big the map is, how many gold coins are in the environment, and where each coin is located. If you are only training for one specific task, such as collecting a gold coin in the upper right corner of a 3×3 map, these settings can be hard-coded and do not need to be passed as arguments when env is initialized.
In the env.__init__() function, you need to define self.observation_space and self.action_space.
- If the state space/action space is discrete, use spaces.Discrete([space dim]);
- If it is continuous, use spaces.Box(low=np.array([0,1]), high=np.array([100,50]), dtype=np.float32). low and high must match the dimensions of the state space/action space, and give the minimum and maximum values of each dimension, respectively.
- A continuous space can also be written as spaces.Box(low=0, high=255, shape=(84, 84), dtype=np.uint8). In this form every dimension of the space has the same minimum and maximum, and shape gives the dimensions of the space.
- For details on usage, see the official documentation, Gym documentation | Spaces, and Zhihu | Understanding Spaces in Gym in a Shallow and Simple Way.
You can call self.reset() at the end of the env.__init__() function.
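As a rough illustration of the space definitions listed above, here is a minimal sketch. The variable names (action_space, position_space, image_space) and the choice of 4 discrete actions are made up for this example and are not from the original post.

```python
import numpy as np
from gym import spaces

# Discrete space with 4 possible values: 0, 1, 2, 3 (assumed: four move directions).
action_space = spaces.Discrete(4)

# Continuous 2-D space with per-dimension bounds:
# dimension 0 lies in [0, 100], dimension 1 lies in [1, 50].
position_space = spaces.Box(
    low=np.array([0, 1], dtype=np.float32),
    high=np.array([100, 50], dtype=np.float32),
    dtype=np.float32,
)

# Continuous space where every dimension shares the same bounds,
# e.g. an 84x84 grayscale image with pixel values in [0, 255].
image_space = spaces.Box(low=0, high=255, shape=(84, 84), dtype=np.uint8)

# Every space can draw a random element and check membership.
sample_action = action_space.sample()
sample_position = position_space.sample()
sample_image = image_space.sample()
```

Calling sample() and contains() like this is a quick way to check that a space has the shape and bounds you intended before wiring it into an environment.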

obs = env.reset() function:

The purpose is to initialize the environment, e.g. put the agent in the bottom left corner of the map, put the gold coin in the top right corner of the map, reset the built-in step counter to 0, and so on.
Its return value, obs, should have the same dimension as the state space self.observation_space.

obs, reward, done, info = env.step(action) function:

The input action should have the same dimension as the action space self.action_space. (An ordinary environment generally does not support batched actions, i.e. action.shape = (batch_size, action_dim).)
The step function is used in the interaction between the agent and the env. After the env receives the input action, it performs an internal state transition and outputs:
- New state obs: same dimension as the state space self.observation_space;
- reward: the reward value, a real number;
- done: a bool. True means the episode has finished (e.g. the agent has collected the gold coin, or the agent has taken 1000 steps) and it is time to reset; False means the episode has not finished yet.
- info: a Python dictionary (dict) that can carry extra information; if there is nothing to pass, set it to {}.
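Putting __init__, reset and step together, a minimal sketch of the 3×3 coin example might look like the following. The class name CoinGridEnv, the MOVES action mapping, and the reward of 1.0 for reaching the coin are assumptions for illustration only. The sketch follows the four-value step signature described in this post; gym 0.26+ and gymnasium instead return (obs, info) from reset and five values from step.

```python
import numpy as np
import gym
from gym import spaces


class CoinGridEnv(gym.Env):
    """Hypothetical grid world: agent starts bottom-left, coin sits top-right."""

    # Assumed action mapping: index -> (dx, dy) for up, down, left, right.
    MOVES = {0: (0, 1), 1: (0, -1), 2: (-1, 0), 3: (1, 0)}

    def __init__(self, size=3, max_steps=1000):
        super().__init__()
        self.size = size
        self.max_steps = max_steps
        self.coin = np.array([size - 1, size - 1], dtype=np.float32)
        # Observation: the agent's (x, y) cell. Action: one of four moves.
        self.observation_space = spaces.Box(
            low=0, high=size - 1, shape=(2,), dtype=np.float32
        )
        self.action_space = spaces.Discrete(4)
        self.reset()

    def reset(self):
        self.agent = np.array([0, 0], dtype=np.float32)  # bottom-left corner
        self.steps = 0  # built-in step counter
        return self.agent.copy()

    def step(self, action):
        move = np.array(self.MOVES[int(action)], dtype=np.float32)
        # Clip so the agent cannot leave the map.
        self.agent = np.clip(self.agent + move, 0, self.size - 1)
        self.steps += 1
        got_coin = bool(np.array_equal(self.agent, self.coin))
        reward = 1.0 if got_coin else 0.0
        done = got_coin or self.steps >= self.max_steps
        return self.agent.copy(), reward, done, {}


env = CoinGridEnv()
obs = env.reset()
total_reward, done = 0.0, False
for action in [3, 3, 0, 0]:  # right, right, up, up
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

The fixed action list is only there to make the rollout deterministic; in training, the actions would come from the agent's policy (or env.action_space.sample() for a random baseline), and env.reset() would be called whenever done is True.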

env.__init__(render_mode="human" or "rgb_array") and rgb_frame = env.render(): render_mode="human" appears to work with pygame, while "rgb_array" directly returns a frame, e.g. an array with shape (256, 256, 3), which can be saved as a video with imageio.
How to register a gym environment: RL Basics | How to register a custom gym environment
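For the "rgb_array" case, the frame is just a uint8 NumPy array. The sketch below shows one possible way to draw such a frame for a grid world. The function render_rgb_frame, its parameters, and the colors are hypothetical; an env's render() method could return the result of a helper like this when render_mode is "rgb_array".

```python
import numpy as np


def render_rgb_frame(agent_xy, coin_xy, grid_size=4, cell_px=64):
    """Return an RGB frame of shape (grid_size*cell_px, grid_size*cell_px, 3)."""
    side = grid_size * cell_px
    frame = np.full((side, side, 3), 255, dtype=np.uint8)  # white background

    def fill_cell(xy, color):
        x, y = xy
        # Image row 0 is the top, so flip y to keep (0, 0) at the bottom-left.
        row = (grid_size - 1 - y) * cell_px
        col = x * cell_px
        frame[row:row + cell_px, col:col + cell_px] = color

    fill_cell(coin_xy, (255, 215, 0))  # gold coin
    fill_cell(agent_xy, (0, 0, 255))   # blue agent
    return frame


# One frame per step as the agent walks along the bottom row.
frames = [render_rgb_frame((x, 0), (3, 3)) for x in range(4)]

# Saving with imageio (assumed to be installed) could look like:
# import imageio
# imageio.mimsave("rollout.gif", frames)
```

With the default 4×4 grid and 64-pixel cells, each frame has shape (256, 256, 3), matching the example shape mentioned above.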