Is there any limitation as to what those numbers can be?

magic numbers about cartpolechallenge HOT 8 OPEN

iacore commented on August 16, 2024

magic numbers

from cartpolechallenge.

Comments (8)

Blimpyway commented on August 16, 2024

Sorry for the delayed response.
This strategy should work for problems that require physical skill or fast response. The value of 10 loosely corresponds to the gamma parameter (aka discount factor) in RL/Q-learning. It is hand-picked for sample efficiency; as far as I remember from testing, with values between 8 and 18 the CartPole env is solved in a couple dozen episodes, give or take. I mean failed episodes: more than 90% of its time the algorithm spends running the extra ~100 episodes without failing (or learning) until the required score is reached.
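A minimal sketch of "learn only from the last 10 steps before a failure", under stated assumptions: `run_episode`, `memory`, `horizon` and `penalty` are hypothetical names, and each state is reduced to a single hashable key instead of a list of bit pair addresses. It is not the repository's code.

```python
from collections import deque

def run_episode(steps, memory, horizon=10, penalty=1.0):
    """steps: iterable of (state_key, failed) tuples.
    memory: dict mapping state_key -> accumulated danger (hypothetical store)."""
    recent = deque(maxlen=horizon)  # older states fall out automatically
    for state_key, failed in steps:
        recent.append(state_key)
        if failed:
            # Only the states seen just before the failure are penalised.
            for key in recent:
                memory[key] = memory.get(key, 0.0) + penalty
            return True
    return False  # episode ended without failure: nothing is learned

memory = {}
episode = [(step, step == 24) for step in range(25)]  # fails at step 24
run_episode(episode, memory)
print(sorted(memory))  # the last 10 state keys: 15 .. 24
```

A larger `horizon` spreads the blame further back in time, which is the sense in which it resembles a discount factor.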

I don't remember where I got the idea. Initially I was thinking of associative memories where each bit pair maps to a bunch of pointers, pointers towards anything. So it's an idea that came from another idea.

The takeaway here is the bit pair expansion of an SDR: mapping an N-dimensional array of sparse bits to an array of N*(N-1)//2 bits (which are "bit pair addresses"). I guess the juice is in this bit-to-bit relationship mapping.

E.g. if you expand a (28x28 =) 784-bit B/W MNIST digit image to a whopping 784*783//2 = 306936 bit pairs, a linear classifier will give much better results when trained on the expanded images than on the images themselves, because the expansion provides more context: not just points of data but correlations between points of data.
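The expansion can be sketched in a few lines of plain Python. `pair_count` and `expand_bits` are hypothetical helper names, and the input is assumed to be a dense list of 0/1 values; a real implementation would more likely work on the indices of the active bits only.

```python
from itertools import combinations

def pair_count(n):
    """Number of unordered bit pairs in an n-bit vector."""
    return n * (n - 1) // 2

def expand_bits(bits):
    """Dense 0/1 list of length N -> dense 0/1 list of length N*(N-1)//2.
    Output bit `hi*(hi-1)//2 + lo` is 1 when input bits lo < hi are both 1."""
    active = [i for i, b in enumerate(bits) if b]  # ascending indices
    out = [0] * pair_count(len(bits))
    for lo, hi in combinations(active, 2):
        out[hi * (hi - 1) // 2 + lo] = 1
    return out

print(pair_count(784))             # 306936
print(expand_bits([1, 0, 1, 1]))   # [0, 1, 0, 1, 0, 1]
```

Each output bit encodes the co-occurrence of two input bits, which is the "correlation" a linear classifier cannot see in the raw image.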

I used it to make an MNIST classifier, a regressor, and an ID-map where the SDR is the key and the value is an arbitrary number, which can be a pointer or an identifier.

The only person I know using the same addressing concept (pairing bits in SDRs) is this one; he uses them to map SDRs to other SDRs. It is an associative memory. What I do here (and am more interested in) is the "dyadic" variant, where a single SDR is used as the key. The "triadic" variant maps the members of a triple (x, y, z) to each other, which means any two of x, y, z can be the "key" and the remaining one the value.

It shares similarities with a vector database but is different. In a database there is room for however many vectors you want to put in. Here there is competition: the most frequent or most recent (depending on implementation) value stored is the most representative.

So, unlike a database, it has a limited (fixed) capacity, which suits different use cases (e.g. it works well as a short-term memory: writes are quite fast and old values tend to be overwritten).
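To make the fixed-capacity, "most recent wins" behaviour concrete, here is a rough sketch of an ID-map keyed by SDRs. The class `BitPairIDMap` and its methods are hypothetical, the SDR is assumed to be a list of active bit indices, and the voting rule is only one possible choice; the actual implementation may differ.

```python
from collections import Counter

class BitPairIDMap:
    """Fixed-size map from SDRs (lists of active bit indices) to identifiers."""

    def __init__(self, sdr_size):
        # One slot per possible bit pair; capacity never grows.
        self.slots = [None] * (sdr_size * (sdr_size - 1) // 2)

    @staticmethod
    def _addresses(sdr):
        bits = sorted(sdr)
        return [hi * (hi - 1) // 2 + lo
                for i, hi in enumerate(bits) for lo in bits[:i]]

    def store(self, sdr, ident):
        for a in self._addresses(sdr):
            self.slots[a] = ident  # most recent write wins, old values are lost

    def query(self, sdr):
        votes = Counter(self.slots[a] for a in self._addresses(sdr)
                        if self.slots[a] is not None)
        return votes.most_common(1)[0][0] if votes else None

m = BitPairIDMap(50)
m.store([1, 5, 9, 20], 7)
m.store([2, 6, 9, 30], 8)
print(m.query([1, 5, 9, 21]))  # 7: a noisy key still shares three pairs with the first entry
```

Writes touch only the pair addresses of the key, and a query tolerates a few flipped bits because the surviving pairs still vote for the stored value.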


iacore commented on August 16, 2024

Also, I don't quite understand sdr[i] - 1 and range(1, ...) here:

def address_list(sdr): 
    """
    Converts a SDR to a list of value map addresses
    """
    ret = []
    for i in range(1, len(sdr)):
        ival = (sdr[i]*(sdr[i]-1))//2
        for j in range(i):
            ret.append((ival + sdr[j]))
    return ret


Blimpyway commented on August 16, 2024

Last time I tested, it worked with all 4 values set to -inf; the only difference is that it spends a few more rounds until the encoder max limits dynamically adjust to the actual max values received from the environment.
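As an illustration only, a limit that starts at -inf and grows to the largest magnitude observed could look like the sketch below. `AdaptiveLimit` and `normalise` are hypothetical names; how the repository's encoder actually rescales its inputs is not shown in this thread.

```python
class AdaptiveLimit:
    """Tracks the largest |x| seen so far and scales inputs into [-1, 1]."""

    def __init__(self, initial=float("-inf")):
        self.limit = initial  # -inf means "nothing known about the range yet"

    def normalise(self, x):
        self.limit = max(self.limit, abs(x))  # widen the range when needed
        if self.limit <= 0:
            return 0.0
        return max(-1.0, min(1.0, x / self.limit))

cold = AdaptiveLimit()     # has to discover the range from the data
warm = AdaptiveLimit(2.0)  # seeded with a hand-picked starting limit
print(cold.normalise(0.5), warm.normalise(0.5))  # 1.0 0.25
```

With a cold start the scaling keeps changing while the limit is still growing, so the first rounds are spent settling the range rather than learning.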

The magic ones cut that adjustment time, and were found by running a few random rounds.

You can simply check that by uncommenting the following line in the source.


iacore commented on August 16, 2024

Thanks for answering! I'm new to the sparse encoding thing (SDR, CSDR, SPH). What's the best way I can learn about all these? Is there an active community?


Blimpyway commented on August 16, 2024

Regarding the address there:
It takes a bit more explaining; you should first be familiar with Numenta's concept of SDRs (Sparse Distributed Representations).
The short story is that it simply projects an SDR of size (e.g.) 100 bits into a much larger one of size 100*(100-1)/2 = 4950, by enumerating all pairs of 1s in the initial SDR. What makes this projection interesting is that it can be used as an associative memory: if the current state "looks" similar to a previous state, you can read the recorded values to predict the value of (the danger associated with) the current state.
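A small check of the addressing, assuming the SDR is a list of active bit indices in ascending order (the quoted function does not sort them itself). The function body repeats the logic of the snippet quoted earlier in the thread; the comments are an interpretation, not the author's own explanation.

```python
def address_list(sdr):
    ret = []
    # Start at 1: the first active bit has no smaller bit to pair with.
    for i in range(1, len(sdr)):
        hi = sdr[i]
        # hi*(hi-1)//2 counts all pairs whose larger member is below hi,
        # so the pairs (lo, hi) occupy a block starting at that offset.
        ival = hi * (hi - 1) // 2
        for j in range(i):
            ret.append(ival + sdr[j])
    return ret

print(address_list([2, 5, 7]))  # [12, 23, 26] for pairs (2,5), (2,7), (5,7)

# With every bit of a 100-bit SDR active, each of the 4950 addresses appears once.
n = 100
every_pair = address_list(list(range(n)))
print(sorted(every_pair) == list(range(n * (n - 1) // 2)))  # True
```

So each unordered pair of bit positions gets its own slot, and the map needs exactly N*(N-1)//2 slots.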

So the pipeline in my CartPole demo is:

1. Encode each env scalar into a 100-value-long vector.
2. Add the resulting 4 vectors (sum).
3. Generate an SDR by simply taking the top (e.g.) 20 values of the sum.
4. Expand this SDR to a larger one using address().
5. Add "danger" values at every address.
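The steps above can be sketched end to end as follows. This is a rough illustration under stated assumptions: `encode_scalar` is a hypothetical stand-in (a rotated Gaussian bump), not the CycleEncoder used in the demo, and the value ranges in `LIMITS` are assumed rather than taken from the source.

```python
import math

SIZE = 100    # length of each encoded vector (from the thread)
ACTIVE = 20   # number of top values kept as the SDR (from the thread)

def encode_scalar(x, lo, hi, feature):
    """Hypothetical encoder: a bump centred where x falls in [lo, hi],
    rotated by a per-feature offset so the four features land apart."""
    pos = (min(max(x, lo), hi) - lo) / (hi - lo) * (SIZE - 1)
    bump = [math.exp(-((k - pos) / 8.0) ** 2) for k in range(SIZE)]
    shift = feature * (SIZE // 4)
    return [bump[(k - shift) % SIZE] for k in range(SIZE)]

def to_sdr(observation, limits):
    """Encode, sum, then keep the indices of the ACTIVE largest sums."""
    total = [0.0] * SIZE
    for feature, (x, (lo, hi)) in enumerate(zip(observation, limits)):
        for k, v in enumerate(encode_scalar(x, lo, hi, feature)):
            total[k] += v
    top = sorted(range(SIZE), key=lambda k: total[k], reverse=True)[:ACTIVE]
    return sorted(top)  # ascending indices keep the pair addresses unique

def address_list(sdr):
    """One address per pair of active bits."""
    return [hi * (hi - 1) // 2 + lo for i, hi in enumerate(sdr) for lo in sdr[:i]]

def add_danger(value_map, sdr, amount=1.0):
    """Raise the stored value at every pair address of this state."""
    for a in address_list(sdr):
        value_map[a] += amount

def read_danger(value_map, sdr):
    """Average stored value over the pair addresses of this state."""
    addresses = address_list(sdr)
    return sum(value_map[a] for a in addresses) / len(addresses)

LIMITS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]  # assumed ranges
value_map = [0.0] * (SIZE * (SIZE - 1) // 2)

failing = to_sdr([0.1, -0.2, 0.03, 0.5], LIMITS)
add_danger(value_map, failing)
print(read_danger(value_map, failing))  # 1.0 for the exact state that was recorded
```

A state that shares only some active bits with a recorded one reads back a proportionally smaller danger, which is what makes the map usable for prediction.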

There is Numenta's forum; however, apart from the concept of SDRs for representing values (or a bunch of them), everything here is my own research project.

Here's an old mention of the CycleEncoder concept: https://discourse.numenta.org/t/cycleencoder-and-varcycleencoder/10751

Most likely all the above is confusing. What the example here does is quite similar to Q-tables, except that instead of binning the 4 state values into a huge 4-dimensional array, it just projects them into a one-dimensional bit pair value map.
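For a sense of scale, the two table sizes can be compared directly. The bin count per dimension is an assumption chosen for illustration; the thread does not say what resolution a Q-table would need here.

```python
def q_table_cells(bins_per_dim, dims=4):
    """Cells in a table that bins each of `dims` state values separately."""
    return bins_per_dim ** dims

def pair_map_cells(sdr_size):
    """Cells in a bit pair value map over an SDR of `sdr_size` bits."""
    return sdr_size * (sdr_size - 1) // 2

print(q_table_cells(100))   # 100000000 (assuming 100 bins per state value)
print(pair_map_cells(100))  # 4950
```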


iacore commented on August 16, 2024

"Most likely all the above is confusing"

I somehow understood it.

I'll read more about CSDR and SDR before asking more questions.


Blimpyway commented on August 16, 2024

Any time, thanks for giving it a try.


iacore commented on August 16, 2024

I thought about the code a bit more.

learning from only the last 10 failed frames -> I think this works because CartPole needs a PID controller. Does this strategy (learning only from the moments before failure) work for other problems?

bitpair value maps -> This is the first time I have heard of this technique. Where did you get the idea? Is it useful for solving other problems?
It looks vaguely like a "vector database" (where you associate arbitrary data with a float vector), but for SDRs.

