Hey,
thanks for providing a base implementation for the method presented in "Towards Deep Symbolic Reinforcement Learning".
With respect to your implementation, I have the following comments/questions:
state_builder.py:
(1) line 177: We should check the absolute difference between x.position and entity.position; otherwise x might not be within the radius but still end up in the list whenever the difference is negative.
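To make the point concrete, here is a minimal sketch of the check I mean. The function name and the assumption that entities expose a scalar `position` attribute are mine, not taken from your code:

```python
class Entity:
    """Stand-in for the entities in state_builder.py (illustrative)."""
    def __init__(self, position):
        self.position = position

def entities_in_radius(entity, entities, radius):
    # Use abs() so entities on either side of `entity` are treated
    # symmetrically; without it, a negative difference always passes
    # a `difference < radius` check, even for far-away entities.
    return [x for x in entities
            if abs(x.position - entity.position) < radius]
```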
(2) lines 143-145: the deletion of no-longer-tracked entities. This only becomes a problem when the object tracking fails and the number of detected objects changes. Since the entities to be deleted stem from the last state, their indices in the self.tracked_entities list may have shifted in the meantime. Let's say you track 15 entities and your current do_not_exist contains index 10, which you delete; in the current timestep you add index 15 to the newlynonexistent list. If no new entities are added in the next timestep, this loop throws an error.
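One way to sidestep the stale-index problem is to filter by a stable entity id instead of deleting by list position. This is only a sketch of the idea; the names and the assumption that entities carry a persistent `id` are mine:

```python
class TrackedEntity:
    """Stand-in for a tracked entity with a stable identifier (illustrative)."""
    def __init__(self, entity_id):
        self.id = entity_id

def prune_entities(tracked_entities, newly_nonexistent_ids):
    # Filter by stable id rather than by list index: indices recorded
    # in the previous timestep become stale once earlier deletions (or
    # a changed object count) shift the list, which can raise an
    # IndexError or delete the wrong entity.
    return [e for e in tracked_entities
            if e.id not in newly_nonexistent_ids]
```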
agent.py:
(1) The update function of the tabular agent. If I am not completely mistaken, there is an error in the paper's update equation for the tabular Q-function: I see no reason why the action value of (s_t, a_t) should also be discounted. Additionally, in your calculation of the current and next-step action values you always sum over all interactions present at the current timestep, while the paper describes updating the Q-table only for the specific type of interaction. Is there a reason behind this?
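For reference, this is the standard tabular Q-learning update I would expect, applied per interaction type. The table layout, function signature, and default hyperparameters here are my own illustration, not the paper's or your implementation's:

```python
def q_update(q_tables, interaction, s, a, r, s_next, actions,
             alpha=0.1, gamma=0.9):
    # One Q-table per interaction type: only the table belonging to
    # the interaction that actually occurred is updated, and
    # Q(s_t, a_t) enters the TD error undiscounted.
    q = q_tables.setdefault(interaction, {})
    q_sa = q.get((s, a), 0.0)
    best_next = max(q.get((s_next, b), 0.0) for b in actions)
    q[(s, a)] = q_sa + alpha * (r + gamma * best_next - q_sa)
    return q[(s, a)]
```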
autoencoder.py:
(1) type consistency: In the current get_entities function of the autoencoder, the representative activations are determined anew at every timestep, which can lead to inconsistencies across timesteps. For example, an entity of a specific type in the top-left corner gets assigned type 0. If the agent then collects this entity after a certain number of timesteps, the agent itself gets assigned type 0, which I think is undesired behaviour because it makes the matching of tracked_entities and new entities in the build representation function harder.
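What I had in mind is a persistent mapping from representative activations to type ids, so that the same activation pattern keeps the same type across timesteps. A minimal sketch, assuming activations can be compared elementwise within a tolerance (class name and tolerance are my own choices):

```python
class TypeRegistry:
    """Assigns persistent type ids: an activation close to a previously
    seen prototype reuses that prototype's id; a genuinely new
    activation gets the next free id (illustrative sketch)."""

    def __init__(self, tol=1e-3):
        self.prototypes = []  # one representative activation per type
        self.tol = tol

    def type_of(self, activation):
        for type_id, proto in enumerate(self.prototypes):
            if len(proto) == len(activation) and all(
                    abs(a - p) <= self.tol
                    for a, p in zip(activation, proto)):
                return type_id
        self.prototypes.append(list(activation))
        return len(self.prototypes) - 1
```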
With the points above changed, the implementation runs without problems for me. I hope these comments help fix your implementation; if I misunderstood some part, let me know.
Since the paper itself provides very little information on how the method was actually implemented, I was wondering whether you have already contacted the authors and received additional details about the implementation that are not part of the paper.
All the best,
N