Giter VIP home page Giter VIP logo

phadonp / rubiks-cube-reinforcement-learning Goto Github PK

View Code? Open in Web Editor NEW
39.0 2.0 5.0 104.08 MB

Solving a Rubik's Cube and 15 Puzzle using the Deep Reinforcement Learning and Search

License: MIT License

Python 12.02% JavaScript 7.14% HTML 0.36% CSS 0.53% Jupyter Notebook 79.95%
rubikscube rubiks-cube-solver rubiks-cube-simulator astar-algorithm pytorch 15puzzle deep-reinforcement-learning slide-puzzle value-iteration

rubiks-cube-reinforcement-learning's Introduction

CubeDemonstration

Rubik's Cube Reinforcement Learning

Solving a Rubik's Cube using Deep Reinforcement Learning and Search.

๐Ÿ“ Table of Contents

๐Ÿง About

This project is an implementation of the following paper http://deepcube.igb.uci.edu/static/files/SolvingTheRubiksCubeWithDeepReinforcementLearningAndSearch_Final.pdf.

The network was trained using a method named Deep Approximate Value Iteration (DAVI). This estimates the number of moves a scramble is from being solved. A batched weighted A Star search is then done to solve the puzzle. Implementations to solve the 2x2 Cube, 15-Puzzle and 24-Puzzle are also included in the code.

This project contrasts with many other deep learning projects solving the Rubik's Cube as it is uses pure reinforcement learning, learning without copying other solvers.

For a deep dive into the technical aspects of the project, you can browse through the report in the pdf. Feel free to ask me any questions about using the code.

๐Ÿ“ˆ Results

Each puzzle was tested by testing them on a set of 100 random scrambles. The tests were run on my ultrabook.

The results for the 2x2 Cube and 15-Puzzle were extremely good and produced the optimal solution most of the time within a few seconds. The 24-Puzzle averaged 101.88 moves with a search time around 20 seconds, which is roughly 12 moves away from optimal solutions. For the 3x3 Cube, average move count was 41.45 with an average time of 62.51 seconds. This is quite poor compared to the results achieved in the above paper, however computing resources were limited and parallel training was not implemented. 98 out of 100 cubes were solved, which is good compared to many other projects on Github which struggle with scrambles of length 10 or more. Different network architectures and training hyperparameters need to be tried to maximise performance.

Training was accomplished on a Tesla P100 GPU. For the 2x2 Cube and 15-Puzzle, training took around 1 hour. The 24-Puzzle took 1 day. The 3x3 Cube took around 7 days.

15PuzzleDemonstration

๐Ÿ“– Usage

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Training

Training a new model is done by running the train file. The config file location has to be written as an argument and it will contain the parameters for training. Mul

The names of the different networks to be written in the networkType setting are given in the networks/getNetwork.py file.

python train.py -c config/cube3.ini -mp True

Training can be restarted by giving a existing network as an argument.

python train.py -c config/cube3.ini -nt saves\YourNetworkHere.pt -mp True

The run can be tracked while training using tensorboard.

tensorboard --logdir=runs

Testing

Testing of a model can be done using the solve file. This file tests the model on a test set with a chosen size and generates performance statistics.

python solve.py -n saves/YourNetworkHere.pt -c config/cube3.ini 

Running the GUIs

To run the Sliding Puzzle GUI, run the following line.

python puzzleNgui.py -n saves/YourNetworkHere.pt -c config/puzzle15.ini

To run the GUI for the Rubik's Cube is a little more complicated. The simulation was written in Javascript and communication was done using Flask as an API.

To run the simulation, you can run a live server on the index.html file.

To connect your network to the simulation, run the following line. Unfortunately, running the network alongside the GUI can slow down the solve time if the API is running on the same computer as the GUI.

python cubeAPI.py -n2 saves/Your2x2NetworkHere.pt -n3 saves/Your3x3NetworkHere.pt -c2 config/cube2.ini -c3 config/cube3.ini

โ›๏ธ Prerequisites

This project uses a couple libraries that need to be installed.

  1. PyTorch
  2. Tensorboard
  3. Flask (Cube API)
  4. Three.js (Cube GUI)
  5. Tween.js (Cube GUI)
  6. Pygame (Sliding Puzzle GUI)

Installing

The python packages can be installed using the following code.

pip install torch===1.7.1 torchvision===0.8.2 torchaudio===0.7.2 -f https://download.pytorch.org/whl/torch_stable.html

python -m pip install -U pygame --user

pip install -U Flask flask-restful flask-cors

The Javascript modules can be downloaded using npm. These lines should be run while inside the cubeGUI folder.

npm install --save three

git clone https://github.com/tweenjs/tween.js
cd tween.js
npm i .
npm run build

๐ŸŽ‰ Acknowledgements

I would like to give thanks to my supervisor for this project, Mehrtash Harandi. Another big thanks has to go to the creators of DeepCubeA, which provided the algorithm of this project.

rubiks-cube-reinforcement-learning's People

Contributors

phadonp avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar

rubiks-cube-reinforcement-learning's Issues

Training Stops at epoch 110/200

After running the train command, "python train.py -c config/cube3.ini -mp True" the process stops at epoch 110/200. The final error messages are attached in the following screenshot.
Capture

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.