reinforcement-learning-kr / pg_travel
Policy Gradient algorithms (REINFORCE, NPG, TRPO, PPO)
License: MIT License
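Training by collecting samples with a single actor-learner seems slow. The resulting policy is also considerably lower in quality than that of an agent trained with multiple actor-learners, so we should train with multiple actor-learners. We can proceed in the following order.
Other tasks depend on this, so it would be good to set it up as quickly as possible.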
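The current Unity PPO code runs on a local laptop (CPU only), but unlike MuJoCo, the state and action spaces are large, so it needs to run on a server with a GPU. In addition, PPO is an algorithm that can make better use of a GPU than TRPO.
Therefore, the following needs to be done.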
I ran the code and saw no GPU usage, so I have a question.
Does this code run on the CPU by default?
When I run this code, there seems to be no GPU usage during execution.
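As a hedged illustration of the question above: in PyTorch, modules and tensors stay on the CPU unless they are moved to a CUDA device explicitly, so a script that never calls `.to(device)` or `.cuda()` shows no GPU usage. The sketch below uses a hypothetical `TinyPolicy` module (an assumption for this example, not the repository's model) to show one way to check where parameters live. Whether this repository moves its models to the GPU is not stated in the issue.

```python
import torch
import torch.nn as nn


class TinyPolicy(nn.Module):
    """Hypothetical stand-in for a policy network (not the repository's model)."""

    def __init__(self, num_inputs=4, num_outputs=2):
        super().__init__()
        self.fc = nn.Linear(num_inputs, num_outputs)

    def forward(self, x):
        return self.fc(x)


# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

policy = TinyPolicy()
# Freshly constructed modules keep their parameters on the CPU.
default_device = next(policy.parameters()).device

# Both the model and its inputs must be moved to the same device.
policy = policy.to(device)
state = torch.zeros(1, 4, device=device)
output = policy(state)

print("default parameter device:", default_device)
print("device after .to():", next(policy.parameters()).device)
```

If the second line printed is still `cpu` on a machine with a GPU, PyTorch cannot see the device, which usually points to the installed build or the driver rather than the training code.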
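Train using the PPO agent previously trained on flat terrain as the baseline.
For the environment, get help from Kyushik Min (민규식) if possible.
The split below is only a rough one, so it would be good for the two of you to discuss it as you go.
Please post progress updates on this issue along the way!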
I successfully installed mujoco, but when I import it, I get this error:
--
PermissionError Traceback (most recent call last)
/usr/local/lib/python3.5/dist-packages/lockfile/linklockfile.py in acquire(self, timeout)
18 try:
---> 19 open(self.unique_name, "wb").close()
20 except IOError:
PermissionError: [Errno 13] Permission denied: '/usr/local/lib/python3.5/dist-packages/mujoco_py-1.50.1.59-py3.5.egg/mujoco_py/generated/wonchul-60572700.8099-2917094554558463988'
I followed all the steps you mentioned...
Could you help me?
In main.py line 93,
action = get_action(mu, std)[0]
then action is just a scalar.
Is that a problem?
I see that the actor-critic model (model.py) outputs mu and logstd. In the code, logstd is fixed to 0 by the definition "logstd = torch.zeros_like(mu)", which fixes the standard deviation to 1. As far as I know, logstd should also be learned by the network (in that case logstd would be the output of some layer). Is there a reason for this behavior?
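For illustration, the two options discussed above can be sketched side by side. This is a minimal sketch assuming PyTorch; the class names `FixedStdActor` and `LearnedStdActor`, the layer sizes, and the state-independent `nn.Parameter` parameterization are assumptions for this example, not the repository's code. A state-independent learnable log standard deviation is one common choice; producing `logstd` from a separate output layer is another.

```python
import torch
import torch.nn as nn


class FixedStdActor(nn.Module):
    """Mean is learned; logstd is fixed at 0, so std is always 1."""

    def __init__(self, num_inputs, num_outputs, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(num_inputs, hidden)
        self.fc2 = nn.Linear(hidden, num_outputs)

    def forward(self, x):
        mu = self.fc2(torch.tanh(self.fc1(x)))
        logstd = torch.zeros_like(mu)  # constant, receives no gradient
        return mu, torch.exp(logstd), logstd


class LearnedStdActor(nn.Module):
    """Mean is learned; logstd is a trainable, state-independent parameter."""

    def __init__(self, num_inputs, num_outputs, hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(num_inputs, hidden)
        self.fc2 = nn.Linear(hidden, num_outputs)
        self.logstd = nn.Parameter(torch.zeros(num_outputs))

    def forward(self, x):
        mu = self.fc2(torch.tanh(self.fc1(x)))
        logstd = self.logstd.expand_as(mu)  # broadcast over the batch
        return mu, torch.exp(logstd), logstd


states = torch.randn(5, 3)
fixed_mu, fixed_std, _ = FixedStdActor(3, 2)(states)
learned = LearnedStdActor(3, 2)
mu, std, logstd = learned(states)

# Backpropagate a Gaussian log-likelihood; only the learned variant has a
# logstd parameter that can receive a gradient.
log_prob = torch.distributions.Normal(mu, std).log_prob(torch.zeros_like(mu))
log_prob.sum().backward()
print("logstd gradient:", learned.logstd.grad)
```

With the fixed variant the exploration noise never changes during training, while the learned variant lets the optimizer shrink or grow it along with the mean.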
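We could probably proceed in the following order.
Please ask for help whenever you need it!
The README should include the following.
Code comments should be added to the parts of the algorithm that are hard to understand without them.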
When I run main.py, I get this error:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
How can I solve this? Thanks.
I think this line (https://github.com/reinforcement-learning-kr/pg_travel/blob/master/mujoco/main.py#L117) should be changed to the following:
if iter % 100 == 0:
Hello,
I'm planning to use your PPO implementations, which seem well-written, clear and easy to understand. But first, I'd like to have the answer to the following question:
In OpenAI baselines, environments are passed to various classes, such as VecNormalize, Observation/Reward Wrappers, or even Monitor. In these cases, observations and rewards are transformed in order to ease learning. However, there is a lot of encapsulation, which makes it difficult to follow the chain. After a quick glance at your implementations, I'm under the impression that you do transform the observations in unity/utils/running_state.py. Is that so? Are there other transformations? Or were you just careful while designing the environment, making sure rewards were appropriately scaled?
Thanks a lot for your answers.
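To make the kind of observation transformation asked about above concrete, here is a minimal sketch of a running mean and standard-deviation filter based on Welford's online algorithm. The class name `RunningNormalizer`, the `clip` default of 5.0, and the plain-list interface are assumptions for illustration; this is not the code in unity/utils/running_state.py, which may differ.

```python
import math


class RunningNormalizer:
    """Hypothetical observation filter: normalizes with running statistics."""

    def __init__(self, size, clip=5.0, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.clip = clip
        self.eps = eps

    def update(self, obs):
        # Welford's online update, applied per dimension.
        self.n += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def std(self):
        # Sample standard deviation; defined as 1.0 until two samples are seen.
        if self.n < 2:
            return [1.0] * len(self.mean)
        return [math.sqrt(m2 / (self.n - 1)) for m2 in self.m2]

    def __call__(self, obs, update=True):
        if update:
            self.update(obs)
        out = []
        for x, mean, std in zip(obs, self.mean, self.std()):
            z = (x - mean) / (std + self.eps)
            out.append(max(-self.clip, min(self.clip, z)))
        return out


normalizer = RunningNormalizer(size=2)
for observation in ([1.0, 10.0], [2.0, 20.0], [3.0, 30.0]):
    normalized = normalizer(observation)

print("running mean:", normalizer.mean)
print("last normalized observation:", normalized)
```

Only observations are filtered in this sketch; rewards are left untouched, so any reward scaling would have to come from the environment design or a separate wrapper.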