Adjustable Robust Reinforcement Learning for Online 3D Bin Packing

Introduction

This is the official PyTorch implementation of the paper "Adjustable Robust Reinforcement Learning for Online 3D Bin Packing". The paper introduces the AR2L framework, which accounts for both the average performance and the worst-case performance of a packing policy. A packing policy trained with this framework is more robust while still maintaining acceptable performance in nominal cases. Each training iteration alternates between training the packing policy, the permutation-based attacker, and the mixture-dynamics model. All three policies are trained with the PPO algorithm, and the packing policy is built on the PCT algorithm. A video demonstration is available via the YouTube link.
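The alternating schedule described above can be pictured with a minimal sketch. It is not the repository's training loop: the function name, the update callbacks, and the order of the three updates within an iteration are all assumptions made for illustration, and the real PPO updates are replaced by placeholders.

```python
from typing import Callable, Dict, List

# Assumed order of the three updates within one iteration (illustrative only).
UPDATE_ORDER = ["packing_policy", "permutation_attacker", "mixture_dynamics"]


def run_alternating_training(
    num_iterations: int,
    updates: Dict[str, Callable[[int], None]],
) -> List[str]:
    """Call each component's update once per iteration, in a fixed order.

    `updates` maps a component name to a hypothetical callback that would
    perform one PPO update for that component.
    """
    log: List[str] = []
    for iteration in range(num_iterations):
        for name in UPDATE_ORDER:
            updates[name](iteration)  # placeholder for one PPO update
            log.append(f"{iteration}:{name}")
    return log


if __name__ == "__main__":
    # Placeholder callbacks that only report what would be trained.
    stubs = {
        name: (lambda it, n=name: print(f"iteration {it}: update {n}"))
        for name in UPDATE_ORDER
    }
    run_alternating_training(2, stubs)
```

The point of the sketch is only the control flow: every iteration touches all three components once before the next iteration begins.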

Dependencies

Before executing the training process, please ensure that the necessary requirements have been installed.

pip install -r requirements.txt

Training

The packing policy can observe a configurable number of next boxes (NNB). The robustness of the policy is adjusted through the hyperparameter alpha. In the commands below, the first argument is NNB and the second is alpha.

Environment: discrete, NNB=5, alpha=1.0

bash scripts train_disc.sh 5 1.0

Environment: discrete, NNB=10, alpha=1.2

bash scripts train_disc.sh 10 1.2

Environment: discrete, NNB=15, alpha=1.3

bash scripts train_disc.sh 15 1.3

Environment: discrete, NNB=20, alpha=1.0

bash scripts train_disc.sh 20 1.0

Environment: continuous, NNB=5, alpha=1.0

bash scripts train_cont.sh 5 1.0

Environment: continuous, NNB=10, alpha=1.0

bash scripts train_cont.sh 10 1.0

Environment: continuous, NNB=15, alpha=1.0

bash scripts train_cont.sh 15 1.0

Environment: continuous, NNB=20, alpha=1.0

bash scripts train_cont.sh 20 1.0
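This README does not spell out how alpha enters the training objective. One reading, consistent with a framework that weighs average and worst-case performance, is that alpha scales the worst-case term in a weighted sum. The sketch below assumes exactly that; the function name and the numbers are hypothetical and are not taken from the repository.

```python
def adjustable_robust_objective(
    nominal_return: float,
    worst_case_return: float,
    alpha: float,
) -> float:
    """Weighted sum of nominal and worst-case returns.

    Assumption: alpha >= 0 is the weight on the worst-case term, so
    alpha = 0 ignores the attacker and larger alpha emphasizes robustness.
    """
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    return nominal_return + alpha * worst_case_return


if __name__ == "__main__":
    # Made-up returns, only to show how the weight shifts the objective.
    for alpha in (1.0, 1.2, 1.3):
        value = adjustable_robust_objective(0.5, 0.25, alpha)
        print(f"alpha={alpha}: objective={value:.4f}")
```

Under this assumption, a larger alpha makes a drop in worst-case return more costly relative to the same drop in nominal return.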

Validation

To select an effective AR2L packing policy, you can evaluate various packing policies with and without the permutation-based attacker.

bash val_disc.sh [NNB] [path to the parent directory where all the models are saved] load_adv

example: bash val_disc.sh 5 ./logs/experiment/timeStr load_adv

bash val_disc.sh [NNB] [path to the parent directory where all the models are saved] not_load_adv

example: bash val_disc.sh 5 ./logs/experiment/timeStr not_load_adv

After validation, for each model, sum its space utilization under the nominal dynamics (not_load_adv) and its space utilization under the worst-case dynamics (load_adv). Then choose the model with the best combined score.
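The selection rule above can be sketched in a few lines. The helper name, the model names, and the utilization values are hypothetical; in practice the two numbers per model would be read from the output of the two validation runs, whose format is not described here.

```python
from typing import Dict, Tuple


def select_best_model(results: Dict[str, Tuple[float, float]]) -> str:
    """Return the model with the highest combined space utilization.

    `results` maps a model name to a pair
    (nominal utilization from not_load_adv, worst-case utilization from load_adv).
    """
    if not results:
        raise ValueError("no validation results given")
    # Score each model by the sum of its nominal and worst-case utilization.
    return max(results, key=lambda name: results[name][0] + results[name][1])


if __name__ == "__main__":
    # Made-up numbers for two hypothetical checkpoints.
    validation_results = {
        "model_a": (0.70, 0.50),
        "model_b": (0.68, 0.60),
    }
    print(select_best_model(validation_results))
```

In this made-up example, model_b has slightly lower nominal utilization but a clearly better worst case, so its combined score is higher.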

Evaluation

You can evaluate the selected packing policy in various settings.

bash eval_disc.sh [NNB] [path to the BPP model] [path to the adv model] load_adv

example: bash eval_disc.sh 5 ./logs/experiment/timeStr/BPP-subtimeStr.pt ./logs/experiment/timeStr/Adv-subtimeStr.pt load_adv

bash eval_disc.sh [NNB] [path to the BPP model] [path to the adv model] not_load_adv

example: bash eval_disc.sh 5 ./logs/experiment/timeStr/BPP-subtimeStr.pt ./logs/experiment/timeStr/Adv-subtimeStr.pt not_load_adv

Acknowledgement

We thank the anonymous reviewers, (S)ACs, and PCs of NeurIPS 2023 for their insightful comments, which helped improve our paper, and for their service to the community. We also thank the authors of PCT for providing their highly valuable implementation, and the authors of the PPO PyTorch implementation.

Citation

@inproceedings{
pan2023adjustable,
title={Adjustable Robust Reinforcement Learning for Online 3D Bin Packing},
author={Yuxin Pan and Yize Chen and Fangzhen Lin},
booktitle={Thirty-seventh Conference on Neural Information Processing Systems},
year={2023},
url={https://openreview.net/forum?id=1mdTYi1jAW}
}

License

This source code is provided solely for academic use. Please refrain from using it for commercial purposes without obtaining proper authorization from the author.
