Code for performing Hierarchical RL based on the following publications:

"Data-Efficient Hierarchical Reinforcement Learning" by
Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1805.08296).

"Near-Optimal Representation Learning for Hierarchical Reinforcement Learning"
by Ofir Nachum, Shixiang (Shane) Gu, Honglak Lee, and Sergey Levine
(https://arxiv.org/abs/1810.01257).
Requirements:

* TensorFlow (see http://www.tensorflow.org for how to install/upgrade)
* Gin Config (see https://github.com/google/gin-config)
* TensorFlow Agents (see https://github.com/tensorflow/agents)
* OpenAI Gym (see http://gym.openai.com/docs, be sure to install MuJoCo as well)
* NumPy (see http://www.numpy.org/)
Quick Start:

Run a training job based on the original HIRO paper on Ant Maze:

```
python scripts/local_train.py test1 hiro_orig ant_maze base_uvf suite
```

Run a continuous evaluation job for that experiment:

```
python scripts/local_eval.py test1 hiro_orig ant_maze base_uvf suite
```

To run the same experiment with online representation learning (the
"Near-Optimal" paper), change `hiro_orig` to `hiro_repr`. You can also use
`hiro_xy` to run HIRO on only the xy coordinates of the agent.
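
For example, the representation-learning variant can be trained with the same
command as above, only swapping the config name (reuse or change the
experiment name `test1` as you prefer):

```
python scripts/local_train.py test1 hiro_repr ant_maze base_uvf suite
```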

To run on other environments, change `ant_maze` to something else, e.g.,
`ant_push_multi`, `ant_fall_multi`, etc. See `context/configs/*` for other options.
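
For example, to train the original HIRO agent using the `ant_push_multi`
configuration:

```
python scripts/local_train.py test1 hiro_orig ant_push_multi base_uvf suite
```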

Basic Code Guide:

The code for training resides in train.py. It trains a lower-level policy
(a UVF agent in the code) and a higher-level policy (a MetaAgent in the code)
concurrently. The higher-level policy communicates goals to the lower-level
policy; in the code, such a goal is called a context. Not only does the
lower-level policy act with respect to a context (a goal specified by the
higher-level policy), but the higher-level policy also acts with respect to an
environment-specified context (corresponding to the navigation target location
associated with the task). Therefore, in `context/configs/*` you will find
specifications for both task setup and goal configurations. Most remaining
hyperparameters used for training/evaluation may be found in `configs/*`.
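
To make this two-level structure concrete, here is a minimal, self-contained
sketch in plain NumPy. It is an illustration only and does not use the
repository's actual classes or training code: a toy point mass stands in for
the Ant environments, hand-coded policies stand in for the trained MetaAgent
and UVF agent, and the higher-level policy emits a new goal every
`meta_period` steps.

```
import numpy as np


def meta_policy(position, target, goal_scale=2.0):
  """Higher-level policy: propose a nearby goal in the direction of the
  environment-specified context (the navigation target)."""
  direction = target - position
  distance = np.linalg.norm(direction)
  if distance < 1e-6:
    return position.copy()
  return position + goal_scale * direction / distance


def uvf_policy(position, goal, max_speed=0.5):
  """Lower-level goal-conditioned policy: step toward the current goal."""
  direction = goal - position
  distance = np.linalg.norm(direction)
  if distance <= max_speed:
    return direction
  return max_speed * direction / distance


def run_episode(target, meta_period=10, max_steps=200):
  """Roll out one episode of the two-level control loop."""
  position = np.zeros(2)
  goal = meta_policy(position, target)
  for step in range(max_steps):
    if step % meta_period == 0:
      goal = meta_policy(position, target)             # higher-level policy acts
    position = position + uvf_policy(position, goal)   # lower-level policy acts
    if np.linalg.norm(target - position) < 0.1:
      return step + 1
  return max_steps


if __name__ == '__main__':
  print('Reached target after', run_episode(np.array([8.0, 5.0])), 'steps')
```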

NOTE: Not all the code corresponding to the "Near-Optimal" paper is included.
Namely, changes to low-level policy training proposed in the paper (discounting
and auxiliary rewards) are not implemented here. Performance should not change
significantly.

Maintained by Ofir Nachum (ofirnachum).