
comportex-archived's Issues

Substitution pooling experiments

Posting this in a third place! It struck me that it makes most sense to discuss it close to the code.

@floybix, as part of these substitution pooling experiments you've implemented a concatenated "context space" representation, which concatenates all sequences (or maybe you allow some decay??) into a single SDR.

As a first application you used this to give extended context (as I understand it) to input states via feedback.

And in these latest commits you've implemented code to find overlaps between cells in different substrings of this concatenated "context space" representation.

Those are two applications of such a concatenated "context space" representation.

I now think full substitution pooling might be implemented in as few as two combined steps applied to such a representation (two steps: suggestive of oscillation in the cortex??)

Those steps would be, for a presented sequence:

  1. Generalize columns based on cell states
  2. Group columns by number of cell states

To make it concrete, in the context of your latest "overlap" commits:

You've identified overlaps (between strings or subsequences). If we now merge overlapping sequences together based on these overlaps, that would be something like my step 1).

Now, I'm not sure what it means to find overlaps within a single undifferentiated sequence and merge them. In concrete terms, I guess it means we run over the columns of the concatenated "context space", and anywhere two columns share cells (your "overlap") we take the rest of the cells from one column and add them to the other. So if two columns have enough context similarity, we overlay (stack) all the contexts of each onto the other.

This is the "sensitivity" (thinking) step, because it varies with how we decide that the cells of two columns have "enough" similarity. A sketch of this merge step follows below.
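Here is a minimal sketch of step 1), assuming the concatenated "context space" can be read out as a map from column id to its set of active cell indices. The data shape, the `min_overlap` threshold, and all names here are my own illustration, not Comportex's actual representation:

```python
def generalize_columns(context_space, min_overlap=3):
    """Merge the cell sets of any two columns whose shared cells (the
    "overlap") meet the sensitivity threshold, overlaying (stacking)
    each column's contexts onto the other."""
    merged = {col: set(cells) for col, cells in context_space.items()}
    cols = list(merged)
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            if len(merged[a] & merged[b]) >= min_overlap:
                union = merged[a] | merged[b]   # stack both contexts
                merged[a], merged[b] = union, set(union)
    return merged

# Columns 0 and 1 share cells {2, 5, 7}, so with min_overlap=3 they pool
# into one generalized cell set; column 2 is left alone.
space = {0: {2, 5, 7, 9}, 1: {1, 2, 5, 7}, 2: {4, 8}}
print(generalize_columns(space))
# {0: {1, 2, 5, 7, 9}, 1: {1, 2, 5, 7, 9}, 2: {4, 8}}
```

Lowering `min_overlap` merges more aggressively; that threshold is exactly the sensitivity knob.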

For step 2) I currently think we would want to take an average of cells per column as we follow paths through the columns of the concatenated context space (a little like our original "path counting"! But this time with context information not only kept, but enhanced by generalization). I think this average should go down as more columns are added internal to a pooled state by step 1), i.e. as more states are substituted/generalized. And the average should go up as we approach a state boundary/sequence division point, which by definition will be marked by the possibility for the (pooled) state to occur in many contexts.

We might do an experiment to plot this average as we trace all sequences through the "context space". My hunch is that it should resolve into clear boundaries in some way (probably as highs in the average cells per column, averaged over all paths to a given point); a sketch of such a trace follows below.
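Here is a minimal sketch of that experiment, under the same illustrative data shape: each path is a list of column ids, `cells_per_column[c]` is the size of column c's (generalized) cell set, and the peak-finding heuristic at the end is just my guess at a boundary criterion:

```python
from collections import defaultdict

def boundary_profile(paths, cells_per_column):
    """Average cells-per-column at each step, over all paths traced
    through the context space."""
    totals, counts = defaultdict(int), defaultdict(int)
    for path in paths:
        for pos, col in enumerate(path):
            totals[pos] += cells_per_column[col]
            counts[pos] += 1
    return [totals[p] / counts[p] for p in sorted(totals)]

def candidate_boundaries(profile):
    """Positions where the average peaks, i.e. where a (pooled) state can
    occur in many contexts -- the hypothesized sequence division points."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i] > profile[i - 1] and profile[i] >= profile[i + 1]]

paths = [[0, 1, 2, 3], [0, 1, 2, 4]]
cells = {0: 2, 1: 3, 2: 8, 3: 2, 4: 3}   # column 2 sits at a boundary
profile = boundary_profile(paths, cells)
print(profile, candidate_boundaries(profile))  # [2.0, 3.0, 8.0, 2.5] [2]
```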

But step 2) would basically be a "decoding" step, in the sense of our previous discussions: identifying a sequence of states and unfolding it from the concatenated "context space" representation. (The only difference is that the context space representation will now have been generalized by step 1) -- note: we might need another layer in which to put this generalization -- and so its "states", and the sequences they occur in, will have been generalized/simplified/enlarged too.) You may have other insights into how to perform "decoding" which make more sense than my "average cell states per column over all paths" state-boundary criterion.

Phoneme Sequence experiments

Our brains are amazing at converting sequences of syllables into words, finding the gaps between them.

Experiment: First, convert bodies of text into phonemes, possibly using this dictionary: http://www.speech.cs.cmu.edu/cgi-bin/cmudict . Observe an HTM's ability to put names on sequences of phonemes, to recognize them.
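As a rough sketch of the conversion step, assuming a local copy of the CMU dictionary file (its plain-text format is `WORD  PH1 PH2 ...`, with comment lines starting with `;;;`; the file name and encoding below are assumptions about that download):

```python
import re

def load_cmudict(path):
    """Parse the CMU pronouncing dictionary into word -> phoneme list."""
    pron = {}
    with open(path, encoding="latin-1") as f:
        for line in f:
            if not line.strip() or line.startswith(";;;"):
                continue                            # blank or comment line
            word, *phones = line.split()
            word = re.sub(r"\(\d+\)$", "", word)    # drop variants like WORD(2)
            pron.setdefault(word.lower(), phones)   # keep first pronunciation
    return pron

def text_to_phonemes(text, pron):
    """Flatten a body of text into one phoneme sequence, skipping
    out-of-vocabulary words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [ph for w in words if w in pron for ph in pron[w]]

pron = load_cmudict("cmudict-0.7b")  # path to the downloaded dictionary
print(text_to_phonemes("hello world", pron))
# ['HH', 'AH0', 'L', 'OW1', 'W', 'ER1', 'L', 'D']
```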

This experiment could come in two stages:

  1. Supervised learning
  2. Unsupervised learning

In the supervised experiment, we train the HTM by forcing encodings in a higher region. Give it a sequence of phonemes at the bottom, and a word encoding at the top. Then, to test it, give it a long sequence of phonemes and let it convert it to words.
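Here is a sketch of what that supervised loop might look like. The two-region interface (`htm.step` with a `force` argument, `reset_sequence`, and the encoder/decoder helpers) is entirely hypothetical, a stand-in for whatever clamping mechanism the implementation actually exposes, not a real Comportex API:

```python
def train_supervised(htm, phoneme_encoder, word_encoder, corpus):
    """corpus: iterable of (word, [phoneme, ...]) pairs."""
    for word, phonemes in corpus:
        target = word_encoder(word)          # fixed SDR for the word
        for ph in phonemes:
            # Drive the bottom region with the phoneme while clamping
            # (forcing) the higher region onto the word's encoding.
            htm.step(input=phoneme_encoder(ph), force={"higher": target})
        htm.reset_sequence()                 # word boundary, known in training

def test_supervised(htm, phoneme_encoder, word_decoder, phonemes):
    """Feed a long phoneme stream and read words off the higher region."""
    words = []
    for ph in phonemes:
        state = htm.step(input=phoneme_encoder(ph))
        word = word_decoder(state["higher"]) # nearest known word SDR, if any
        if word is not None:
            words.append(word)
    return words
```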

In the unsupervised experiment, the HTM endlessly inhales phonemes, and decides which sequences to give names to. I haven't decided what this testing process would look like.

I'd start with the supervised experiment. In some sense, the HTM "commits" an interpretation of a word when it rules out the possibility that a subsequent phoneme is a continuation of the word. From afar, this seems doable, but perfecting this process is a main goal of this experiment.
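To make that commit rule concrete, here is a sketch of it as greedy segmentation over a phoneme lexicon: keep absorbing phonemes while the buffer is still a prefix of some known word, and commit the longest complete match the moment a phoneme rules out any continuation. (The real HTM would express "rules out" via failed predictions rather than a prefix table; this just makes the decision rule explicit, and it assumes the stream segments cleanly into known words.)

```python
def segment(phonemes, lexicon):
    """lexicon: dict mapping phoneme tuples -> word."""
    prefixes = {k[:i] for k in lexicon for i in range(1, len(k) + 1)}
    words, buf, last = [], [], None          # last = (word, phonemes consumed)
    i = 0
    while i < len(phonemes):
        buf.append(phonemes[i])
        if tuple(buf) in lexicon:
            last = (lexicon[tuple(buf)], i + 1)
        if tuple(buf) not in prefixes or i + 1 == len(phonemes):
            word, consumed = last            # commit the last complete word
            words.append(word)
            buf, last, i = [], None, consumed - 1
        i += 1
    return words

lexicon = {("HH", "AH0", "L", "OW1"): "hello", ("W", "ER1", "L", "D"): "world"}
print(segment(["HH", "AH0", "L", "OW1", "W", "ER1", "L", "D"], lexicon))
# ['hello', 'world']
```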

The end result: it can reconstruct the text of a Wikipedia article from its phonemes. A big win here is that we don't need to hand-craft the input data or manually evaluate the HTM's results. If the output word sequence is equal to the input word sequence, it worked. The internet is our corpus.
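The success criterion itself is then tiny, reusing `text_to_phonemes` and `segment` from the sketches above. One caveat baked in here: homophones make the phoneme-to-word mapping lossy, so perfect recovery is not always possible even in principle.

```python
def round_trip_ok(text, pron):
    """True iff the recovered word sequence equals the original text's words."""
    lexicon = {tuple(phones): word for word, phones in pron.items()}
    words = re.findall(r"[a-z']+", text.lower())
    return segment(text_to_phonemes(text, pron), lexicon) == words
```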

I haven't decided whether phoneme transitions/predictions come solely from lateral connections, or also from higher layer feedback. Our current thinking is a little unclear on which context lives in higher layers and which lives in the which-cell-in-the-column choice. I'm hoping in this project I'd get some more clarity on that.
