Comments (5)
And I win for misspelling function in the issue title ;).
from async-rl.
@stokasto, probably not for issue 1.
Look closely at:
def sampled_actions_log_probs(self):
    return F.select_item(
        self.log_probs,
        chainer.Variable(np.asarray(self.action_indices, dtype=np.int32)))
I assume the function is equivalent to multiplying by a one-hot encoded target.
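The equivalence assumed above can be checked with a small plain-Python sketch. It does not use Chainer; `pick_selected`, `one_hot`, and `pick_by_one_hot` are hypothetical helper names, and the numbers are made up for illustration.

```python
def pick_selected(rows, indices):
    """Return rows[i][indices[i]] for every row i (index-based selection)."""
    return [row[a] for row, a in zip(rows, indices)]


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at `index` and 0.0 elsewhere."""
    return [1.0 if j == index else 0.0 for j in range(size)]


def pick_by_one_hot(rows, indices):
    """Multiply each row by a one-hot mask and sum, keeping one entry per row."""
    return [
        sum(x * m for x, m in zip(row, one_hot(a, len(row))))
        for row, a in zip(rows, indices)
    ]


# Illustrative log probabilities for 2 states and 3 actions (made-up values).
log_probs = [[-1.2, -0.4, -2.3], [-0.1, -3.0, -2.5]]
action_indices = [1, 0]

picked = pick_selected(log_probs, action_indices)
masked = pick_by_one_hot(log_probs, action_indices)
print(picked, masked)
```

Both routes keep only the log probability of the action that was actually taken, so the other actions contribute nothing to the loss.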
from async-rl.
@etienne87 Ah yes, that is fine then. As I wrote, I did not study the code very carefully during my re-implementation, but I thought I would bring these two issues up since, if they were real, they would be hard to detect and could lurk in the code base for a while.
Thank you for falsifying the first issue.
from async-rl.
Thank you for your comments.
- As @etienne87 wrote, F.select_item returns only the log probability of the selected action (which is stored in self.action_indices of SoftmaxPolicyOutput).
- float(advantage.data) is what makes sure the gradients won't flow through v. advantage is a chainer.Variable, which can back-propagate gradients, but float(advantage.data) is literally just a Python float, and no gradients will be back-propagated through it.
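To illustrate the second point without depending on Chainer, here is a toy sketch of scalar reverse-mode differentiation. The `Var` class is a hypothetical stand-in for chainer.Variable, not its real implementation, and the numeric values are made up. It only shows that converting the advantage to a plain float cuts it out of the graph, so the value estimate v receives no gradient from the policy loss.

```python
class Var:
    """Toy scalar with reverse-mode autodiff (hypothetical, not Chainer)."""

    def __init__(self, data, parents=()):
        self.data = float(data)
        self.grad = 0.0
        # Each parent entry is (parent Var, local derivative d(self)/d(parent)).
        self.parents = parents

    @staticmethod
    def _wrap(x):
        return x if isinstance(x, Var) else Var(x)

    def __mul__(self, other):
        other = Var._wrap(other)
        return Var(self.data * other.data,
                   ((self, other.data), (other, self.data)))

    __rmul__ = __mul__

    def __sub__(self, other):
        other = Var._wrap(other)
        return Var(self.data - other.data, ((self, 1.0), (other, -1.0)))

    def __rsub__(self, other):
        return Var._wrap(other) - self

    def __neg__(self):
        return self * -1.0

    def backward(self, upstream=1.0):
        # Accumulate the incoming gradient, then pass it on along every edge.
        self.grad += upstream
        for parent, local in self.parents:
            parent.backward(upstream * local)


def policy_loss_grads(detach):
    """Return (d loss / d log_prob, d loss / d v) for loss = -log_prob * advantage."""
    log_prob = Var(-0.5)   # log probability of the selected action
    v = Var(1.0)           # value estimate
    R = 3.0                # return, a plain float
    advantage = R - v      # a Var that is connected to v
    # float(advantage.data) is an ordinary number with no link back to v.
    weight = float(advantage.data) if detach else advantage
    loss = -(log_prob * weight)
    loss.backward()
    return log_prob.grad, v.grad


print(policy_loss_grads(detach=True))
print(policy_loss_grads(detach=False))
```

With `detach=True` the gradient with respect to v stays at zero, while the gradient with respect to the log probability is the same in both cases.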
from async-rl.
Hey, great, that explains it. I was wondering how the implementation could have converged to a reasonable value function if that error were indeed there.
As I said, I am not experienced with Chainer, so I missed this little interaction. My results also look fairly similar to the graphs you have posted. I'll upload it later and link to your implementation.
Feel free to close this issue.
from async-rl.