Behavior Cloning
Collection | Collect human demonstrations
Playback | Play back the demonstrations
Training | robomimic
Dataset Structure
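robopal's HDF5 dataset files follow the **robomimic** dataset structure and can be used for robomimic training without any preprocessing. Note that robopal cannot use datasets from robosuite/robomimic: training in a robopal-based environment requires datasets collected with robopal.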
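As a rough illustration of what following this structure means in practice, the sketch below reads the top-level metadata of such a file. It assumes the `h5py` and `numpy` packages, which this page does not name, and relies only on the groups and attributes documented in this section (`data`, `total`, `env_args`, `demo_*`, `num_samples`, `actions`). The helper names `summarize_dataset` and `make_toy_dataset`, the file name, and every value in the toy file are hypothetical and not taken from robopal.

```python
import json

import h5py
import numpy as np


def summarize_dataset(f):
    """Summarize an open h5py.File that uses the layout described in this section."""
    data = f["data"]
    # env_args is stored as a JSON string attribute on the `data` group.
    env_args = json.loads(data.attrs["env_args"])
    # One group per trajectory: demo_0, demo_1, ... (sorted numerically).
    demos = sorted(
        (name for name in data.keys() if name.startswith("demo_")),
        key=lambda name: int(name.split("_")[1]),
    )
    lengths = {name: int(data[name].attrs["num_samples"]) for name in demos}
    return {
        "env_name": env_args["env_name"],
        "total": int(data.attrs["total"]),
        "num_demos": len(demos),
        "lengths": lengths,
        # actions has shape (N, A); A is the action space dimension.
        "action_dim": data[demos[0]]["actions"].shape[1] if demos else None,
    }


def make_toy_dataset():
    """Build a tiny in-memory file with made-up contents, for illustration only."""
    # driver="core" with backing_store=False keeps the file in memory.
    f = h5py.File("toy_demo.hdf5", "w", driver="core", backing_store=False)
    data = f.create_group("data")
    data.attrs["env_args"] = json.dumps(
        # Placeholder values; a real robopal file stores its own metadata here.
        {"env_name": "HypotheticalEnv", "env_type": 0, "env_kwargs": {}}
    )
    total = 0
    for i, n in enumerate([5, 3]):
        demo = data.create_group(f"demo_{i}")
        demo.attrs["num_samples"] = n
        demo.create_dataset("states", data=np.zeros((n, 10)))
        demo.create_dataset("actions", data=np.zeros((n, 7)))
        demo.create_dataset("rewards", data=np.zeros(n))
        demo.create_dataset("dones", data=np.zeros(n))
        total += n
    data.attrs["total"] = total
    return f


if __name__ == "__main__":
    with make_toy_dataset() as f:
        print(summarize_dataset(f))
```

To inspect a real dataset, the same function can be called on a file opened with `h5py.File(path, "r")`, where `path` points to a dataset collected with robopal.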
A single dataset is a single HDF5 file with the following structure:
- **`data`** (group)
    - **`total`** (attribute) - number of state-action samples in the dataset.
    - **`env_args`** (attribute) - a JSON string that contains metadata on the environment and the relevant arguments used for collecting data. It has three keys: `env_name`, the name of the environment or task to create; `env_type`, one of robomimic's supported [environment types](https://github.com/ARISE-Initiative/robomimic/blob/master/robomimic/envs/env_base.py#L9); and `env_kwargs`, a dictionary of keyword arguments to be passed into the environment of type `env_name`.
    - **`demo_0`** (group) - group for the first trajectory (every trajectory has its own group).
        - **`num_samples`** (attribute) - the number of state-action samples in this trajectory.
        - **`model_file`** (attribute) - the XML string corresponding to the MJCF MuJoCo model.
        - **`states`** (dataset) - flattened raw MuJoCo states, ordered by time. Shape (N, D), where N is the length of the trajectory and D is the dimension of the state vector.
        - **`actions`** (dataset) - environment actions, ordered by time. Shape (N, A), where N is the length of the trajectory and A is the action space dimension.
        - **`rewards`** (dataset) - environment rewards, ordered by time. Shape (N,), where N is the length of the trajectory.
        - **`dones`** (dataset) - done signal, equal to 1 if playing the corresponding action in the state should terminate the episode. Shape (N,), where N is the length of the trajectory.
        - **`obs`** (group) - group for the observation keys. Each key is stored as a dataset.
            - **`