
pydsrl's Introduction

PyDSRL

Faithful Python implementation of the paper "Towards Deep Symbolic Reinforcement Learning" by Garnelo et al.

Work in progress; contributions are welcome.

Instructions

Confirmed to work with Python 3.8.5.

Install the dependencies with

pip install -r requirements.txt

or

pip install -r requirements.freeze

Then run python main.py.

Issues:

  • Object tracking occasionally fails and the relations between entities get jumbled. The cause is unknown.
  • The agent breaks when it encounters a cross during an episode.

pydsrl's Issues

Test Phase

Hello,

How can the code be tested once training has finished?

Thanks

Understanding Problems / Code mistakes

Hey,

thanks for providing a base implementation for the method presented in "Towards Deep Symbolic Reinforcement Learning".

I have the following comments and questions about your implementation:

state_builder.py:

(1) line 177: The check should use the absolute difference between x.position and entity.position. Otherwise, when the difference is negative, x ends up in the list even though it is outside the radius.

(2) lines 143-145: Deletion of entities that are no longer tracked. This only causes a problem when object tracking fails and the number of objects seen changes. The entities to be deleted come from the previous state, so their indices in the self.tracked_entities list may have changed in the meantime. For example, suppose you track 15 entities and your current do_not_exist contains index 10, which you delete. If 15 is then added to the newlynonexistent list in the current timestep and no new entities are added in the next timestep, this loop will throw an error.
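The two points above can be illustrated with a minimal sketch. It is not the code in state_builder.py: the Entity type, the within_radius and drop_missing helpers, the per-axis distance test, and the id-keyed dictionary are all assumptions made for illustration. The first helper applies abs() before comparing against the radius, and the second removes entities by a stable id instead of by list position, so a stale id cannot point past the end of a shrunken list.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entity(NamedTuple):
    # Hypothetical stand-in for the repository's entity class.
    entity_id: int
    position: Tuple[int, int]


def within_radius(entity: Entity, others: List[Entity], radius: int) -> List[Entity]:
    """Keep entities whose per-axis distance to `entity` is below `radius`."""
    ex, ey = entity.position
    return [
        x for x in others
        # abs() makes the check symmetric; without it, any negative
        # difference would pass the `< radius` comparison.
        if abs(x.position[0] - ex) < radius and abs(x.position[1] - ey) < radius
    ]


def drop_missing(tracked: Dict[int, Entity], missing_ids: List[int]) -> None:
    """Remove entities by stable id; ids that are already gone are skipped."""
    for entity_id in missing_ids:
        tracked.pop(entity_id, None)


center = Entity(0, (10, 10))
near = Entity(1, (11, 9))
far = Entity(2, (0, 0))  # signed difference is -10, which is below 3
neighbours = within_radius(center, [near, far], radius=3)

tracked = {e.entity_id: e for e in (center, near, far)}
drop_missing(tracked, [2])
drop_missing(tracked, [2, 7])  # stale or unknown ids do not raise
```

Keying by id trades the compact list for a dictionary lookup; whether that fits the rest of the state builder would need to be checked against the actual code.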

agent.py

(1) The update function of the Tabular Agent. If I am not completely mistaken, the update equation for the tabular Q-function in the paper contains a mistake. At least, I do not see a reason why the action value of (s_t, a_t) should also be discounted. Additionally, when you compute the current and next-step action values, you always sum over all interactions present at the current timestep, while only updating the Q-table for the specific type of interaction, as described in the paper. Is there a reason for this?
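For comparison, here is a sketch of the textbook tabular Q-learning update, in which only the next-state value is multiplied by the discount factor. This is the standard rule, not the update from the paper or from agent.py; the q_update function, the dictionary-based table, and the state and action labels are assumptions for illustration.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state, action, reward: float, next_state,
             actions: Sequence, alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one standard Q-learning step and return the new Q(state, action)."""
    # Greedy value of the next state; this is the only discounted term.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference error: Q(state, action) enters undiscounted.
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


q: QTable = defaultdict(float)
q[("s1", "up")] = 2.0
new_value = q_update(q, "s0", "up", reward=1.0, next_state="s1",
                     actions=["up", "down"], alpha=0.5, gamma=0.9)
# td_error = 1.0 + 0.9 * 2.0 - 0.0 = 2.8, so the new value is 0.5 * 2.8 = 1.4
```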

autoencoder.py

(1) Type consistency: In the current get_entities function of the autoencoder, the representative activations are determined anew at every timestep, which can lead to inconsistent types across timesteps. For example, an entity in the top-left corner with a specific type gets assigned type 0. If the agent collects this entity after a certain number of timesteps, the agent itself gets assigned type 0. I think this is undesired behaviour, because it makes matching tracked_entities against new entities harder in the build representation function.

With the changes mentioned above, the implementation runs without problems. I hope these comments help to fix your implementation; if I misunderstood some part, let me know.

Since the paper itself provides very little information on how the method was actually implemented, I was wondering whether you have already contacted the authors and received additional information about the implementation that is not part of the paper.
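One way to keep types stable is to remember representative activations across timesteps and only create a new type when nothing stored is close enough. The sketch below illustrates that idea under stated assumptions: TypeRegistry, its tolerance parameter, and the use of Euclidean distance are hypothetical and are not taken from autoencoder.py.

```python
import math
from typing import List, Sequence, Tuple


class TypeRegistry:
    """Persistent mapping from representative activations to type ids."""

    def __init__(self, tolerance: float) -> None:
        self.tolerance = tolerance
        self.representatives: List[Tuple[float, ...]] = []

    def type_of(self, activation: Sequence[float]) -> int:
        """Return the id of the first stored type within tolerance, else a new id."""
        for type_id, rep in enumerate(self.representatives):
            if math.dist(rep, activation) <= self.tolerance:
                return type_id
        # No stored representative matches, so register a new type.
        self.representatives.append(tuple(activation))
        return len(self.representatives) - 1


registry = TypeRegistry(tolerance=0.5)
# First frame: a collectible in the top-left corner, then the agent.
collectible_type = registry.type_of((1.0, 0.0))
agent_type = registry.type_of((0.0, 1.0))
# Later frame: the collectible is gone and the agent is seen first.
agent_type_later = registry.type_of((0.05, 0.95))
```

Because the registry outlives a single frame, the order in which entities are encountered no longer decides which type id they receive.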

All the best,

N

How to run your program?

How do I run your program? Should I create a virtual environment named CrossCircle-MixedRand-v0?

I ran python main.py and got the error: layout() got an unexpected keyword argument 'num_entities', at line 56:
states.append(temp_env.make_random_state(min_entities, max_entities))

Environment Problem

Hi ivegner,
I ran into some problems with PyDSRL while trying to reproduce the Python environment from your requirements.txt.
Could you please specify the package versions in requirements.txt, and which Python version is needed to run the program?
Thanks!
