
deer's People

Contributors

aaspeel, adrienctx, arjoly, dtaralla, geoffreyvd, gregoire-moreau, ntasfi, qiuz, vinf

deer's Issues

[Feature Request] Growing toy_env further?

Hi Vince, many thanks for this wonderful small toy environment called 'toy_env'. It is a joy to watch it learn how to make progress on a 'buy and sell' technique, almost as taught in a trading textbook!
My background is in trading, not coding, so I find it difficult to refine this framework further, for example changing the price feed from random values to real numbers read from a CSV file, or providing more market information so it can make a better (or, of course, possibly worse) decision. Most likely this kind of work is for me to sort out, but I wondered whether you have any plans to mature this toy environment further.
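
For what it's worth, here is a minimal sketch of how a CSV-backed price feed could replace the random one; the class and names below are made-up illustrations, not deer's actual toy_env internals:

import numpy as np
import pandas as pd

class CsvPriceFeed:
    """Hypothetical helper: serve prices from a CSV column instead of a random walk."""
    def __init__(self, csv_path, column="price"):
        self._prices = pd.read_csv(csv_path)[column].to_numpy(dtype=np.float64)
        self._t = 0

    def reset(self):
        self._t = 0

    def next_price(self):
        price = self._prices[self._t % len(self._prices)]  # wrap around at the end of the file
        self._t += 1
        return price

The environment's act/reset methods would then call next_price() wherever a random price is drawn today.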

More information about the LongerExplorationPolicy

Hey VinF,

do you have more information about the LongerExplorationPolicy?

I'm wondering whether this policy is suitable for my environment. How should the length parameter be chosen?
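
For context, my current understanding is that the policy explores over short sequences of actions rather than redrawing a random action at every step, roughly like the illustrative sketch below (certainly not deer's exact implementation). That would suggest choosing length on the order of how many consecutive steps an action needs before its effect becomes visible in the environment.

import numpy as np

class RepeatedExplorationSketch:
    """Illustration only: with probability epsilon, play one random action for `length` consecutive steps."""
    def __init__(self, n_actions, epsilon, length, rng):
        self.n_actions = n_actions
        self.epsilon = epsilon
        self.length = length
        self.rng = rng
        self._steps_left = 0       # steps remaining in the current exploratory burst
        self._burst_action = None

    def act(self, greedy_action):
        if self._steps_left == 0 and self.rng.rand() < self.epsilon:
            self._burst_action = self.rng.randint(self.n_actions)  # start a new exploratory burst
            self._steps_left = self.length
        if self._steps_left > 0:
            self._steps_left -= 1
            return self._burst_action
        return greedy_action

policy = RepeatedExplorationSketch(n_actions=4, epsilon=0.1, length=10, rng=np.random.RandomState(0))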

Thanks!

Best wishes

MemoryError on run_PLE.py example

Hi, I tried running the PLE example on a simple pygame I came up with but encountered the following error. Would appreciate some guidance here on how to overcome the error. Thanks.

Traceback (most recent call last):
File "run_PLE.py", line 190, in
agent.run(parameters.epochs, parameters.steps_per_epoch)
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 282, in run
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\experiment\base_controllers.py", line 346, in onEpochEnd
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 173, in startMode
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 434, in init
File "C:\Users\speedy\Anaconda3\lib\site-packages\deer-0.3-py3.5.egg\deer\agent.py", line 676, in init
MemoryError
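
For reference, a MemoryError at that point usually means the replay memory does not fit in RAM: by default it reserves room for on the order of a million transitions, each holding a full observation. The usual workaround is to pass a smaller replay memory size when the agent is built; a hedged sketch (keyword names as in the bundled toy_env examples, so they may differ slightly across deer versions):

# Excerpt-style sketch: `env`, `qnetwork` and `rng` are assumed to be built as in run_PLE.py.
agent = NeuralAgent(
    env,
    qnetwork,
    replay_memory_size=100000,   # default is around 1e6 transitions; lower it until it fits in RAM
    batch_size=32,
    random_state=rng)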

Bug found

Found a bug in the overridden "append" method of CircularBuffer in agent.py. I fixed it. Basically it is an indexing error.
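
For reference, the usual pitfall in such a buffer is the wrap-around of the write index; an illustrative (not deer's actual) fixed-size circular append looks like this:

import numpy as np

class CircularBufferSketch:
    """Illustration only: fixed-size buffer whose write index wraps around."""
    def __init__(self, size, dtype=np.float32):
        self._data = np.zeros(size, dtype=dtype)
        self._size = size
        self._head = 0   # index where the next element is written
        self._n = 0      # number of valid elements stored so far

    def append(self, value):
        self._data[self._head] = value
        self._head = (self._head + 1) % self._size   # the modulo is the classic place for an off-by-one error
        self._n = min(self._n + 1, self._size)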

TypeError: __init__() got an unexpected keyword argument 'random_state'

envy@ub1404:/os_pri/github/General_Deep_Q_RL/examples/toy_env$ python run_toy_env_simple.py
Using gpu device 0: GeForce GTX 950M (CNMeM is disabled, CuDNN 4007)
/home/envy/.local/lib/python2.7/site-packages/theano/tensor/signal/downsample.py:5: UserWarning: downsample module has been moved to the pool module.
warnings.warn("downsample module has been moved to the pool module.")
Traceback (most recent call last):
File "run_toy_env_simple.py", line 23, in
random_state=rng)
TypeError: __init__() got an unexpected keyword argument 'random_state'
envy@ub1404:/os_pri/github/General_Deep_Q_RL/examples/toy_env$

First Install issue

I ran the pip installer under Ubuntu 15.10 with sudo and got the following:

$ sudo pip install deer
Downloading/unpacking deer
Downloading deer-0.2.4-py2-none-any.whl (122kB): 122kB downloaded
Installing collected packages: deer
Compiling /tmp/pip-build-duZeFH/deer/deeprl/core_optim.py ...
Sorry: IndentationError: unindent does not match any outer indentation level (core_optim.py, line 144)
Successfully installed deer

I don't know if this is going to be a problem, but subsequent PIP installs yield:
$ sudo pip install deer
Requirement already satisfied (use --upgrade to upgrade): deer in /usr/local/lib/python2.7/dist-packages
Cleaning up...

So it might have installed properly.

Great project, I can't wait to work with it.

ImportError: No module named 'deer.policies'

python run_toy_env.py returns:

Traceback (most recent call last):
File "run_toy_env.py", line 17, in
from deer.policies import EpsilonGreedyPolicy
ImportError: No module named 'deer.policies'

Using Python 3.5.2

Standalone Python module

It would help organize the project if the Python code were packaged as a standalone library instead of a collection of scripts. That way, you could also write clean standalone examples and unit tests.

Naming convention for this project

The naming convention for this project has been fixed: http://deer.readthedocs.io/en/latest/user/development.html#naming-conv

The downside is that if you upgrade from 0.2.x (stable branch) to 0.3.devx (master branch) and still want to use the "old" examples from the 0.2.x version, you will need to slightly modify the run_XXX scripts of your examples. You can take a look at how run_toy_env.py was modified from 0.2.x to 0.3.devx and do the same:
0b97398#diff-df55532cc225e6233c89a825fb61048c

Sorry for the hassle (should be the only one of that kind for a while!)

MG example with custom environment

Hi, first of all congratulations, interesting repo.
I would like to use the MG example with a slightly modified environment. In particular, in my case there is no long-term storage, only the battery (and obviously also consumption and production). I also have historical data about the energy storage system (kWh) of a building and would like to integrate it into the environment.
Can you help me?
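
Not an answer, just a sketch of the general direction: if I understand the example correctly, the MG environment works from consumption and production profiles stored as arrays, so historical building data can be substituted by loading it yourself, and dropping the long-term storage then amounts to removing its state variable and its action from the environment definition. The file names and units below are made up for the illustration:

import numpy as np

# Hypothetical data files: hourly consumption and PV production of one building, in kWh.
consumption = np.loadtxt("building_consumption_kwh.csv", delimiter=",")
production = np.loadtxt("building_pv_production_kwh.csv", delimiter=",")

# Scaling the profiles to a comparable range keeps the reward magnitudes manageable,
# in the same spirit as the normalised data used by the original example.
consumption = consumption / consumption.max()
production = production / production.max()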

CRAR continuous action space

Hi,

thank you very much for your work, it has helped me a lot until now!

Is there any possibility that the CRAR implementation will be adapted for the continuous action space in the near future? I managed to adapt the NN_CRAR_keras adapter (and fix a few bugs), but the CRAR learning algo itself is a little bit over my head for now.

Best regards,
Nik

MG_two_storages

Hi dear Vincent,
Thank you very much for your work, it is very helpful. I tried to run MG_two_storages and got the error below, in FindBestController. Does it need any configuration before running?

Best regards,
EJ
"Average (on the epoch) training loss: 0.9318450183311346
Episode average V value: 0
epoch 1:
Learning rate: 0.0002
Discount factor: 0.9
Epsilon: 0.9987931999999653
Best neural net obtained after 1 epochs, with validation score -78.40202677778416
Traceback (most recent call last):
File "run_MG_two_storages.py", line 194, in
agent.run(parameters.epochs, parameters.steps_per_epoch)
File "/usr/local/lib/python3.7/dist-packages/deer/agent.py", line 269, in run
self._run_train(n_epochs, epoch_length)
File "/usr/local/lib/python3.7/dist-packages/deer/agent.py", line 296, in _run_train
for c in self._controllers: c.onEpochEnd(self)
File "/usr/local/lib/python3.7/dist-packages/deer/experiment/base_controllers.py", line 338, in onEpochEnd
agent._run_non_train(n_epochs=1, epoch_length=self._epoch_length)
File "/usr/local/lib/python3.7/dist-packages/deer/agent.py", line 324, in _run_non_train
for c in self._controllers: c.onEnd(self)
File "/usr/local/lib/python3.7/dist-packages/deer/experiment/base_controllers.py", line 558, in onEnd
print("Test score of this neural net: {}".format(self._testScores[bestIndex]))
IndexError: list index out of range

MG two storages

Hello dear Vincent,
Thank you so much for your precious work, it is very helpful and practical for me. I have some questions about the microgrid two-storages example. With my data, training is time consuming (about 53 minutes per epoch), and every time I want to do a little tuning, I must run it again with a new, untrained network. Is it possible to save the trained network and reuse it in a new training run?

Thank you
Erfan
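
For reference, the agent exposes dumpNetwork and setNetwork (the same methods discussed in a later issue), which save and restore the network parameters; a minimal hedged sketch, assuming an agent built as in run_MG_two_storages.py and that the weights end up under an "nnets" directory as in the current implementation:

# After an expensive training run, dump the network weights to disk ...
agent.dumpNetwork("mg_two_storages_weights", nEpoch=50)

# ... then, in a later script, rebuild the same environment, network and agent,
# reload the weights and continue training (or just evaluate) from there.
agent.setNetwork("mg_two_storages_weights", nEpoch=50)
agent.run(n_epochs=10, epoch_length=steps_per_epoch)   # steps_per_epoch as in the example script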

Error for bleeding edge version installation

Dear Mr.VinF,

I followed your instructions to install the bleeding-edge version, but the command you mention does not work for me: pip install git+git://github.com/VINF/deer.git@master.

I got the error "The system cannot find the file specified" while executing the command git clone -q git://github.com/VINF/deer.git C:\Users\admin\AppData\Local\Temp\pip-r24t5v_x-build, followed by "Cannot find command 'git'".

Could you help me figure out this problem?

[FEATURE REQUEST] - TensorFlow

I would like to request adding TensorFlow compatibility. I would like to help on this if I can, but I don't fully understand what needs to be done/built in TensorFlow, nor how to integrate that into your library.

Action limits are getting exceeded

Dear VinF,

thank you very much for this great library!

I have noticed following behavior:
Example: mountain_car_continuous_env.py
The action space is limited to [-1.0, 1.0], but during training values larger than 1.0 sometimes occur.

Is it possible to prevent this?
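
The obvious stop-gap seems to be clamping the action inside the environment before it is applied, e.g. with this minimal sketch:

import numpy as np

def clamp_action(raw_action, low=-1.0, high=1.0):
    """Clip a continuous action to the environment's valid range before using it."""
    return np.clip(raw_action, low, high)

# e.g. at the top of the environment's act() method:
# action = clamp_action(action)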

Thanks!

[Feature Request] Weight Normalization

Hi VinF,

your library is very helpful. Thank you!

Weight normalization might be a way to make SGD-based algorithms suitable for a wider range of environments without the need to manually scale observation vectors and finely tune hyper-parameters. Moreover, the training process might be accelerated considerably for certain environments.

Maybe there is a straightforward way to apply weight normalization to your implementation of actor-critic learning, as the example code by OpenAI suggests. It appears that only the initialization of the critic would need to be adapted. The example code provides the adaptations for the SGD and Adam optimizers of Keras, as well as the initialization of the critic's parameters based on a single minibatch of data.

The two challenges I see are the following:

  • creating the minibatch for the data-dependent initialization of the critic's parameters prior to training
  • adapting the RMSprop optimizer of Keras (Should be similar to Adam, though.)

I would really appreciate your thoughts on this.

EDIT: url of the links fixed
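
For context, weight normalisation reparameterises each weight vector as w = g * v / ||v||, so the optimiser updates the direction v and the scale g separately; a small numpy illustration of the reparameterisation itself (not of the Keras optimizer changes discussed above):

import numpy as np

def weight_norm(v, g):
    """Return w = g * v / ||v||, the weight-normalised form of a weight vector."""
    return g * v / np.linalg.norm(v)

v = np.array([3.0, 4.0])   # unconstrained direction parameters
g = 2.0                    # learned scale
w = weight_norm(v, g)      # -> array([1.2, 1.6]); note that ||w|| equals g
print(w, np.linalg.norm(w))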

Conv2D channels_last in Keras

Hi Vince, I've noticed that the default data format of the Conv2D layer in Keras is channels_last. On lines 57 and 73 of NN_keras.py, the Reshape operation puts the channels in the first dimension. In my tests, this does not work properly. The mountain_car_continuous example works only because dim == 1 and dim[0] == 1.
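
A small illustration of the mismatch, assuming Keras' default "channels_last" data format: either the Reshape has to put the channel axis last, or the Conv2D layers have to be built with data_format="channels_first". The input shape below is made up for the example:

from keras.layers import Input, Reshape, Conv2D

inp = Input(shape=(4 * 64 * 64,))   # hypothetical flattened history of 4 frames of 64x64 pixels

# channels-first reshape, as in NN_keras.py: only valid if Conv2D is told about it
x_first = Reshape((4, 64, 64))(inp)
conv_first = Conv2D(16, (3, 3), data_format="channels_first")(x_first)

# with Keras' default channels_last, the channel axis must come last instead
x_last = Reshape((64, 64, 4))(inp)
conv_last = Conv2D(16, (3, 3))(x_last)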

AC_net_keras qnetwork.getAllParams()

Hey VinF, thank you very much for your great work! I'm enjoying using deer a lot.

I would like to save trained ACNetworks.

But when I train an ACNetwork, for example with run_mountain_car_continuous.py, I am not able to get (and save) the network parameters.

qnetwork.getAllParams()
-> AttributeError: 'Variable' object has no attribute 'get_value'

How can I save a network?
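
The workaround I am considering is to save the underlying Keras models directly with Keras' own save_weights / load_weights; the attribute names below are only guesses to illustrate the idea, the actual names of the critic and actor models inside AC_net_keras would have to be checked:

# Hypothetical attribute names: inspect the AC_net_keras instance to find the
# Keras models that hold the critic and the actor.
qnetwork.q_vals.save_weights("critic_weights.h5")
qnetwork.policy.save_weights("actor_weights.h5")

# Later, after rebuilding an identically shaped network:
qnetwork.q_vals.load_weights("critic_weights.h5")
qnetwork.policy.load_weights("actor_weights.h5")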

Thanks for your help!

compatibility with H2O deep learning

Hi Vincent,

Thank you for this nice and useful work. I am wondering whether deep learning models from H2O can be integrated and used as part of the learning model in deer.

DDPG implementation

Hey VinF,

thanks for your work!

I have questions about the DDPG implementation in deer.

Patrick Emami recommends in http://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html implementing the actor and the critic as two functions in separate classes.

Additionally, he adds the action tensor in the 2nd hidden layer of the Critic Network.

Is my assumption correct that the DDPG implementation in deer is different?
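
For reference, my reading of his critic is roughly the following Keras sketch, where the action tensor only enters at the second hidden layer (this illustrates his recipe, not deer's implementation; the dimensions are made up):

from keras.layers import Input, Dense, Concatenate
from keras.models import Model

state_in = Input(shape=(8,))      # hypothetical state dimension
action_in = Input(shape=(2,))     # hypothetical action dimension

h1 = Dense(400, activation="relu")(state_in)
# the action tensor is only merged in at the second hidden layer
h2 = Dense(300, activation="relu")(Concatenate()([h1, action_in]))
q_value = Dense(1, activation="linear")(h2)

critic = Model(inputs=[state_in, action_in], outputs=q_value)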

Kind regards,
Roman

Agent not learning for maze environment with CRAR

I am trying to reproduce the results of the CRAR agent in the maze environment and am observing that the agent's test reward is not improving at all. It stays at about -5 for all the 250 epochs. Can you please point me to the experiment settings that can reproduce the results?

Naming convention - Policy

Environment and Policy both contain an act method, but they do quite different things.
In my opinion, act is a verb for performing something. Therefore, in the Policy abstract class it should be a noun, action, just like bestAction. However, chooseAction and chooseBestAction would be good, too.

class Policy(object):
    """Abstract class for all policies, i.e. objects that can take any space as input, and output an action.
    """

    def __init__(self, q_network, n_actions, random_state):
        self.q_network = q_network
        self.n_actions = n_actions
        self.random_state = random_state

    def bestAction(self, state):
        """ Returns the best Action
        """
        action = self.q_network.chooseBestAction(state)
        V = max(self.q_network.qValues(state))
        return action, V

    def act(self, state):
        """Main method of the Policy class. It can be called by agent.py, given a state,
        and should return a valid action w.r.t. the environment given to the constructor.
        """
        raise NotImplementedError()
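
For comparison, a concrete subclass in the spirit of the existing EpsilonGreedyPolicy would look roughly like this, whichever name the method ends up with (illustrative sketch, not the library's exact code):

import numpy as np

class MyEpsilonGreedyPolicy(Policy):
    """Illustration: random action with probability epsilon, best action otherwise."""
    def __init__(self, q_network, n_actions, random_state, epsilon):
        Policy.__init__(self, q_network, n_actions, random_state)
        self._epsilon = epsilon

    def act(self, state):
        if self.random_state.rand() < self._epsilon:
            action = self.random_state.randint(0, self.n_actions)
            V = 0
        else:
            action, V = self.bestAction(state)
        return action, V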

How to use LSTM?

Dear VinF,

thank you for your great work.

I am currently using the DDPG algorithm. I would like to try using an LSTM network.

For this purpose I changed line 10 of AC_net_keras to
from .NN_keras_LSTM import NN

When I try to start the optimisation, the following error occurs:
Q_net = neural_network_critic(self._batch_size, self._input_dimensions, self._n_actions, self._random_state, True)

TypeError: __init__() takes 5 positional arguments but 6 were given

How is it possible to use it?

Thanks
dynamik

TypeError: _buildDQN() takes exactly 2 arguments (1 given)

I am running run_ALE.py, and I use the Keras model:

from deer.q_networks.q_net_keras import MyQNetwork.

However, this goes wrong.

Traceback (most recent call last):
File "run_ALE.py", line 85, in
rng)
File "/share/syou/deer/local/lib/python2.7/site-packages/deer/q_networks/q_net_keras.py", line 60, in init
self.q_vals, self.params = Q_net._buildDQN()
TypeError: _buildDQN() takes exactly 2 arguments (1 given)

Btw, when I use Theano, it works fine. Any ideas?

Q_network set / dump does not work as expected

Hi Vince, thank you so much for offering this useful toolbox. I find that the Q_network cannot be dumped and resumed appropriately with setNetwork / dumpNetwork: the learning rate / epsilon / discount factor are not transferred from the trained model to the new one. We can add some logging in agent.py, in _runEpisode, right after self._total_mode_reward += reward, as follows:
print('Action is {}, V is {}'.format(action, V))
print('#{} --- Reward is {}:'.format(maxSteps, reward))
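
A hedged way to narrow this down is to compare the raw parameters before and after a dump/restore cycle, since dumpNetwork appears to serialise only the network weights, while the learning rate, epsilon and discount factor are driven by the agent's controllers and policy at each epoch and therefore have to be reconfigured separately. Attribute and method names below may differ between deer versions:

import numpy as np

params_before = agent._network.getAllParams()        # attribute name may differ per version
agent.dumpNetwork("checkpoint", nEpoch=10)
# ... rebuild an identical agent in a fresh run, then:
agent.setNetwork("checkpoint", nEpoch=10)
params_after = agent._network.getAllParams()
print(all(np.allclose(a, b) for a, b in zip(params_before, params_after)))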

TRPO algorithm

Hi Vince, many thanks for your fantastic work! I would like to know if there is a plan to support the TRPO algorithm. Thanks a lot!
