
Comments (8)

Blimpyway commented on August 16, 2024

Sorry for the delayed response.
This strategy should work for problems that require physical skill or fast response. It loosely corresponds to the gamma parameter (aka discount factor) in RL/Q-learning. The value of 10 is hand-picked for sample efficiency. I remember testing values between 8 and 18; the CartPole env is solved after a couple dozen failed episodes, give or take. More than 90% of the algorithm's time is spent running the extra ~100 episodes without failing (or learning) until the required score is reached.
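A minimal sketch of the "learn only from the moments before failure" idea. It assumes a plain dict stands in for the value map and that each of the last 10 steps gets the same danger increment; both are assumptions, and the names FailureMemory, step and episode_end are hypothetical, not taken from the repository.

```python
from collections import deque

HORIZON = 10  # number of steps before a failure that get blamed


class FailureMemory:
    """Hypothetical helper: remembers recent steps and blames them on failure."""

    def __init__(self, horizon=HORIZON):
        self.recent = deque(maxlen=horizon)  # older steps fall off automatically
        self.danger = {}                     # step key -> accumulated danger

    def step(self, key):
        """Record one visited step (any hashable key, e.g. a state/action pair)."""
        self.recent.append(key)

    def episode_end(self, failed):
        """On failure, add danger to the remembered steps; always reset the buffer."""
        if failed:
            for key in self.recent:
                self.danger[key] = self.danger.get(key, 0.0) + 1.0
        self.recent.clear()


memory = FailureMemory()
for t in range(25):            # a 25-step episode that ends in failure
    memory.step(t)
memory.episode_end(failed=True)
blamed = sorted(memory.danger)  # only steps 15..24 were updated
```

Episodes that end without failure leave the danger values untouched, which matches the remark that the long successful runs involve no learning.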


I don't remember where I got the idea. Initially I was thinking of associative memories where each bit pair maps to a bunch of pointers, pointers towards anything. So it's an idea derived from another idea.

The takeaway here is the bit pair expansion of an SDR: mapping an N-dimensional array of sparse bits to an N*(N-1)//2 array of bits (which are "bit pair addresses"). I guess the juice is in this bit-to-bit relationship mapping.

E.g. if you expand a (28x28 =) 784-bit B/W MNIST digit image to a whopping 784*783//2 = 306936 bit pairs, a linear classifier will give much better results when trained on the expanded images than on the images themselves, because the expansion provides more context: not just points of data but correlations between points of data.
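To make the expansion concrete, here is a small sketch that turns a dense 0/1 vector into its bit pair vector: one output bit per unordered pair of input positions, set only when both inputs are set. The function name expand_bit_pairs is hypothetical, and the output ordering is the one itertools.combinations produces, which is not necessarily the ordering used in the repository.

```python
from itertools import combinations


def expand_bit_pairs(bits):
    """Return one bit per unordered pair (i, j), i < j: 1 only if both inputs are 1."""
    return [a & b for a, b in combinations(bits, 2)]


# Pairs are visited as (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
small = expand_bit_pairs([1, 0, 1, 1])      # -> [0, 1, 1, 0, 0, 1]

# A flattened 28x28 binary image has 784 bits, hence 784*783//2 pair bits.
mnist_sized = expand_bit_pairs([0] * 784)
pair_count = len(mnist_sized)               # -> 306936
```

Each set bit in the output says "these two pixels were on together", which is the correlation information a linear classifier cannot see in the raw pixels.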

I used it to make an MNIST classifier, a regressor, and an ID-map where the SDR is the key and the value is an arbitrary number, which can be a pointer or an identifier.

The only other person I know of using the same addressing concept (pairing bits in SDRs) is this one; he uses them to map SDRs to other SDRs, as an associative memory. What I do here (and am more interested in) is the "diadic" variant, where a single SDR is used as a key. The "triadic" variant maps a triple (x, y, z) to each other, which means any two of x, y, z can be the "key" and the remaining one the value.

It shares similarities with a vector database but is different. A database has room for however many vectors you want to put in. Here there is competition: the most frequent or most recent (depending on implementation) value stored is the most representative.

So unlike a database it has a limited (fixed) capacity, which suits different use cases (e.g. it works well as a short-term memory: writes are quite fast, and old values tend to be overwritten).
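A rough sketch of such a fixed-capacity ID-map, under stated assumptions: the "most recent wins" policy is used for writes, reads take a majority vote over the addressed slots, and the class name PairIdMap and its methods are hypothetical. It illustrates the competition described above and is not the repository's implementation.

```python
from collections import Counter


class PairIdMap:
    """Hypothetical fixed-size map from an SDR (active bit indices) to one value."""

    def __init__(self, n_bits):
        # One slot per unordered bit pair; capacity never grows.
        self.slots = [None] * (n_bits * (n_bits - 1) // 2)

    @staticmethod
    def _addresses(sdr):
        s = sorted(sdr)
        # Triangular index of every pair of active bits (lower bit s[j], higher bit s[i]).
        return [s[i] * (s[i] - 1) // 2 + s[j]
                for i in range(1, len(s)) for j in range(i)]

    def write(self, sdr, value):
        for a in self._addresses(sdr):
            self.slots[a] = value  # newest write overwrites whatever was there

    def read(self, sdr):
        votes = Counter(self.slots[a] for a in self._addresses(sdr)
                        if self.slots[a] is not None)
        return votes.most_common(1)[0][0] if votes else None


id_map = PairIdMap(n_bits=20)
id_map.write([1, 4, 6, 9], 42)
exact = id_map.read([1, 4, 6, 9])    # same key
noisy = id_map.read([1, 4, 6, 10])   # one bit differs, three pairs still match
```

Writing a different value under an overlapping key overwrites the shared pair slots, which is how old entries fade without any explicit eviction step.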

from cartpolechallenge.

iacore commented on August 16, 2024

Also, I don't quite understand sdr[i] - 1 and range(1, ...) here.

def address_list(sdr): 
    """
    Converts a SDR to a list of value map addresses
    """
    ret = []
    for i in range(1, len(sdr)):
        ival = (sdr[i]*(sdr[i]-1))//2
        for j in range(i):
            ret.append((ival + sdr[j]))
    return ret


Blimpyway commented on August 16, 2024

Last time I tested, it worked with all 4 values set to -inf; the only difference is that it spends a few more rounds until the encoder max limits dynamically adjust to the actual max values received from the environment.

The magic values cut that adjustment time; they were found by running a few rounds at random.

You can simply check that by uncommenting the following line in the source.


iacore commented on August 16, 2024

Thanks for answering! I'm new to the sparse encoding thing (SDR, CSDR, SPH). What's the best way I can learn about all these? Is there an active community?


Blimpyway commented on August 16, 2024

Regarding the address function there: it takes a bit more explaining, and you should first be familiar with Numenta's concept of SDRs (Sparse Distributed Representations).
The short story is that it simply projects an SDR of size (e.g.) 100 bits into a much larger one of size 100*(100-1)/2 by enumerating all pairs of 1s in the initial SDR. What makes this projection interesting is that it can be used as an associative memory: if the current state "looks" similar to a previous state, you can read the recorded values to predict the value of (the danger associated with) the current state.
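A worked example of the address_list function quoted earlier in the thread may help with the sdr[i] - 1 and range(1, ...) question. It assumes sdr is a list of active bit indices sorted in ascending order (an assumption about the caller, not something the function checks).

```python
def address_list(sdr):
    """Map a sorted list of active bit indices to one address per pair of bits."""
    ret = []
    # Start at 1: the smallest active bit has no smaller partner to pair with.
    for i in range(1, len(sdr)):
        # Number of pairs whose higher bit is below sdr[i]; this is the offset
        # of "row" sdr[i] in a flattened triangular table.
        ival = (sdr[i] * (sdr[i] - 1)) // 2
        for j in range(i):
            # Within that row, the lower bit sdr[j] selects the column.
            ret.append(ival + sdr[j])
    return ret


# Pairs (1,4), (1,6), (4,6): 4*3//2 + 1 = 7, 6*5//2 + 1 = 16, 6*5//2 + 4 = 19
addresses = address_list([1, 4, 6])          # -> [7, 16, 19]

# With all 100 bits active every pair appears once, filling 0..4949 exactly.
all_pairs = sorted(address_list(list(range(100))))
```

So sdr[i]*(sdr[i]-1)//2 is a triangular number, and the largest possible address for 100 bits is 99*98//2 + 98 = 4949, one less than 100*99//2.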

So the pipeline in my CartPole demo is:

  • encode each env scalar into a 100-value vector
  • add the resulting 4 vectors (sum)
  • generate an SDR by simply taking the top (e.g.) 20 values of the sum
  • expand this SDR to a larger one using address()
  • add "danger" values at every address
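The steps above can be sketched end to end as follows. This is only an illustration under loud assumptions: encode_scalar is a hypothetical Gaussian-bump stand-in and not the CycleEncoder, the LIMITS values are illustrative guesses for the four CartPole observations, and reading back the mean danger is one possible choice of readout.

```python
import math

SIZE, ACTIVE = 100, 20   # vector length and number of SDR bits from the list above
# Assumed (lo, hi) range per observation; illustrative guesses only.
LIMITS = [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]


def encode_scalar(value, lo, hi, size=SIZE, width=4.0):
    """Hypothetical encoder: a Gaussian bump whose centre tracks the value."""
    frac = min(max((value - lo) / (hi - lo), 0.0), 1.0)
    centre = frac * (size - 1)
    return [math.exp(-((k - centre) / width) ** 2) for k in range(size)]


def to_sdr(observation, limits=LIMITS, active=ACTIVE):
    """Sum the per-scalar vectors and keep the indices of the top values."""
    total = [0.0] * SIZE
    for value, (lo, hi) in zip(observation, limits):
        for k, v in enumerate(encode_scalar(value, lo, hi)):
            total[k] += v
    top = sorted(range(SIZE), key=lambda k: total[k], reverse=True)[:active]
    return sorted(top)


def address_list(sdr):
    """Pair addresses of a sorted SDR, as in the function quoted in the thread."""
    return [sdr[i] * (sdr[i] - 1) // 2 + sdr[j]
            for i in range(1, len(sdr)) for j in range(i)]


danger = [0.0] * (SIZE * (SIZE - 1) // 2)   # one value per bit pair


def add_danger(sdr, amount=1.0):
    for a in address_list(sdr):
        danger[a] += amount


def read_danger(sdr):
    addrs = address_list(sdr)
    return sum(danger[a] for a in addrs) / len(addrs)


sdr = to_sdr([0.0, 0.1, 0.02, -0.3])
add_danger(sdr)                 # e.g. this state preceded a failure
estimate = read_danger(sdr)     # mean danger over the 190 addressed pairs
```

A state whose SDR shares many bits with a previously marked one also shares many pair addresses, so its readout is pulled towards the recorded danger.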

There is Numenta's forum; however, apart from the concept of SDRs for representing values (or a bunch of them), everything here is my own research project.

Here's an old mention of the CycleEncoder concept: https://discourse.numenta.org/t/cycleencoder-and-varcycleencoder/10751

Most likely all the above is confusing. What the example here does is quite similar to Q-tables, except that instead of binning the 4 state values into a huge 4-dimensional array it just projects them into a one-dimensional bit pair value map.


iacore commented on August 16, 2024

Most likely all the above is confusing

I somehow understood it.

I'll read more about CSDR and SDR before asking more questions.


Blimpyway commented on August 16, 2024

Any time, thanks for giving it a try.


iacore commented on August 16, 2024

I thought about the code a bit more.

Learning from only the last 10 frames before failure -> I think this works because CartPole needs a PID controller. Does this strategy (only learning from the moments before failure) work for other problems?

Bit pair value maps -> This is the first time I have heard of this technique. Where did you get the idea? Is it useful for solving other problems?
It looks vaguely like a "vector database" (where you associate arbitrary data with a float vector), but for SDRs.

