Currently we are trying to recreate the door-binary results but it seems either the pe

Reproducing normalized scores on door-binary (rlpd, CLOSED)

ikostrikov commented on August 11, 2024

from rlpd.

Comments (5)

StoneT2000 commented on August 11, 2024

Ah OK, I misunderstood: I thought the paper reported absolute values, not percentages. Thank you, this clears it up.

philipjball commented on August 11, 2024

No problem. I'd just like to add, for future readers, that this is the same evaluation methodology for these domains as was used in prior and concurrent work.

philipjball commented on August 11, 2024

I think this is actually correct; bear in mind that the door environment has a horizon of 200 (whereas pen is 100)! So basically if you halve everything, you seem to get a score of around 80, which I believe matches the paper? Let me know if this makes sense 😄

StoneT2000 commented on August 11, 2024

Sorry this doesn't quite make sense?

I believe the reward function is -1 if the agent is not in a success state, and 0 if it is in a success state.

In that case, even if the horizon is 1000 steps, I should expect the same total return as with a horizon of 100 steps: in both cases the agent completes the task in the same amount of time, accumulating the same number of -1 rewards before getting 0 for the rest of the episode.

Let me know if that helps explain my confusion?

And thanks for responding to all the other issues!
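
The point about raw returns can be illustrated with a minimal sketch. It assumes the reward described in this comment (-1 per step before success, 0 afterwards) and an agent that stays in the success state once it reaches it. The function name `sparse_return` and the step counts are hypothetical and are not taken from the rlpd code.

```python
def sparse_return(steps_to_success: int, horizon: int) -> int:
    """Undiscounted return under the assumed reward: -1 before success, 0 after.

    Assumes the agent remains in the success state once it gets there.
    """
    # The agent can collect at most `horizon` penalties.
    steps_before_success = min(steps_to_success, horizon)
    return -steps_before_success


# Hypothetical agent that needs 50 steps to solve the task.
print(sparse_return(50, horizon=100))
print(sparse_return(50, horizon=1000))
```

Under these assumptions both calls give the same raw return, so any dependence on the horizon has to come from how the score is normalized, not from the return itself.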

philipjball commented on August 11, 2024

Recall that we normalize w.r.t. the horizon so that all values fall between 0 and 100. In this case with T=1000, let's say you solved the task in 900 steps; this means getting a normalized score of 10 (not -800), since you only resided in a "success" state for 10% of the duration of the experiment. I hope this makes sense!
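
The normalization described here can be sketched as follows. This is an illustration under stated assumptions, not the rlpd implementation: it assumes the -1/0 reward discussed in this thread, so that `horizon + episode_return` equals the number of steps spent in a success state. The function name `normalized_score` and the door/pen return of -40 are hypothetical.

```python
def normalized_score(episode_return: float, horizon: int) -> float:
    """Percentage of the episode spent in a success state.

    Assumes a reward of -1 outside the success state and 0 inside it,
    so the return is minus the number of non-success steps.
    """
    success_steps = horizon + episode_return
    return 100.0 * success_steps / horizon


# Example from the comment: T=1000, task solved after 900 steps.
print(normalized_score(-900, horizon=1000))  # 10.0

# Hypothetical: the same raw return of -40 under the two horizons
# mentioned in this thread (pen: 100, door: 200).
print(normalized_score(-40, horizon=100))  # 60.0
print(normalized_score(-40, horizon=200))  # 80.0
```

With this form of normalization, the same raw return maps to different scores depending on the horizon, which is why the door and pen numbers are not directly comparable until both are expressed as a percentage of their own horizon.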
