
Comments (7)

honnibal commented on May 17, 2024

Neural network is finally live :).

The new code is much simpler and better, and supports pickle — so serialisation is now no problem.
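
For reference, a minimal round-trip is just the standard pickle calls. A sketch, assuming model is whatever NeuralNet instance you've built (e.g. as constructed in the tagger code further down):

    import pickle

    def save_model(model, loc):
        # model is assumed to be a pickleable thinc NeuralNet instance
        with open(loc, 'wb') as f:
            pickle.dump(model, f, -1)

    def load_model(loc):
        with open(loc, 'rb') as f:
            return pickle.load(f)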


honnibal commented on May 17, 2024

Ah, don't use this yet — that tagger script in particular has some bugs. I have a branch of spaCy where I'm doing the tagger experiments now. I'll let you know when it's pushed; it shouldn't be long.

There's no serialization for the neural network yet. We'd accept a pull request if you want to add one. Use the cfile.CFile class; the code should be pretty similar to the AvgTron code.
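
I won't vouch for the exact CFile interface here, so purely as a sketch of the general binary-dump pattern (the helper names and the use of the standard array module are mine, not thinc's API):

    from array import array

    def save_weights(loc, widths, weights):
        # widths: the layer widths; weights: the flat weight vector
        with open(loc, 'wb') as f:
            array('i', [len(widths)] + list(widths)).tofile(f)
            array('d', weights).tofile(f)

    def load_weights(loc):
        with open(loc, 'rb') as f:
            n = array('i')
            n.fromfile(f, 1)
            widths = array('i')
            widths.fromfile(f, n[0])
            weights = array('d')
            weights.frombytes(f.read())  # everything that remains is weights
        return list(widths), list(weights)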


geovedi commented on May 17, 2024

OK, I think I have a workaround for the serialisation problem. Sorry, no PR, as I'm still not 100% sure I've done it the correct way.

It took me a while to understand your NN datatypes, so I made this change:

diff --git a/thinc/neural/nn.pyx b/thinc/neural/nn.pyx
index ff706c4..fab1d15 100644
--- a/thinc/neural/nn.pyx
+++ b/thinc/neural/nn.pyx
@@ -362,11 +362,23 @@ cdef class NeuralNet:
             cdef int k = 0
             cdef key_t key
             cdef void* value
+            embeddings = []
             for i in range(self.c.embed.nr):
                 j = 0
+                table = []
                 while Map_iter(self.c.embed.weights[i], &j, &key, &value):
                     emb = <weight_t*>value
-                    yield key, [emb[k] for k in range(self.c.embed.lengths[i])]
+                    table.append((key, [emb[k] for k in range(self.c.embed.lengths[i])]))
+                embeddings.append(table)
+            return embeddings
+        def __set__(self, embeddings):
+            cdef float val
+            for i, table in enumerate(embeddings):
+                for key, value in table:
+                    emb = <float*>self.mem.alloc(self.c.embed.lengths[i], sizeof(emb[0]))
+                    for j, val in enumerate(value):
+                        emb[j] = val
+                    Map_set(self.mem, self.c.embed.weights[i], <key_t>key, emb)

     property nr_layer:
         def __get__(self):

Then in tagger.py, I use this:

    def save(self, model_loc):
        # Pickle as a binary file
        pickle.dump((self, self.model.weights, self.model.embeddings),
            open(model_loc, 'wb'), -1)

    @classmethod
    def load(cls, loc):
        t, w, e = pickle.load(open(loc, 'rb'))
        widths = [t.ex.input_length] + [t.hidden_width] * t.depth + [len(t.classes)]
        t.model = NeuralNet(
            widths,
            embed=(t.ex.tables, t.ex.slots),
            rho=t.L2,
            eta=t.learn_rate,
            update_step=t.solver)
        t.model.weights = w
        t.model.embeddings = e
        return t

Also, I think I fixed the bug in that script:

-    word_context = [-(i+1) for i in range(left_words)] + [0] + [i+1 for i in range(right_words)]
-    tag_context = [-i for i in range(left_tags)]
+    word_context = [-(i+1) for i in reversed(range(left_words))] + [0] + [i+1 for i in range(right_words)]
+    tag_context = [-(i+1) for i in reversed(range(left_tags))]
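
For concreteness, with left_words = right_words = left_tags = 2 (values picked just for illustration), the old and new expressions give:

    left_words = right_words = left_tags = 2   # illustrative values only

    # before the fix
    print([-(i+1) for i in range(left_words)] + [0] + [i+1 for i in range(right_words)])
    # -> [-1, -2, 0, 1, 2]   (left offsets out of order)
    print([-i for i in range(left_tags)])
    # -> [0, -1]             (includes offset 0, misses -2)

    # after the fix
    print([-(i+1) for i in reversed(range(left_words))] + [0] + [i+1 for i in range(right_words)])
    # -> [-2, -1, 0, 1, 2]
    print([-(i+1) for i in reversed(range(left_tags))])
    # -> [-2, -1]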

I used learn_rate=0.001 and chars_per_word=10 and it works like a charm!


honnibal commented on May 17, 2024

Seems logical. Being able to read/write from the Python properties is useful. I've been doing that a bit on the Example class.

What sort of results are you getting with the tagger? Is it working better than the word-based model for you?


geovedi commented on May 17, 2024

I've only used the word-based model so far; I'll try the char-based one soon. It took more than 40 training iterations to get accuracy above 97% on my dataset. It tags about 600 sentences per second on tokenized data with an average of 15 words per sentence. I also still need to watch the memory usage, but I'm guessing it's related to FeatureExtractor.strings.


geovedi commented on May 17, 2024

Apparently, self.c.embed.momentum is also required:

diff --git a/thinc/neural/nn.pyx b/thinc/neural/nn.pyx
index ff706c4..ec914b0 100644
--- a/thinc/neural/nn.pyx
+++ b/thinc/neural/nn.pyx
@@ -362,11 +362,48 @@ cdef class NeuralNet:
             cdef int k = 0
             cdef key_t key
             cdef void* value
+            embeddings = []
             for i in range(self.c.embed.nr):
                 j = 0
+                table = []
                 while Map_iter(self.c.embed.weights[i], &j, &key, &value):
                     emb = <weight_t*>value
-                    yield key, [emb[k] for k in range(self.c.embed.lengths[i])]
+                    table.append((key, [emb[k] for k in range(self.c.embed.lengths[i])]))
+                embeddings.append(table)
+            return embeddings
+        def __set__(self, embeddings):
+            cdef float val
+            for i, table in enumerate(embeddings):
+                for key, value in table:
+                    emb = <float*>self.mem.alloc(self.c.embed.lengths[i], sizeof(emb[0]))
+                    for j, val in enumerate(value):
+                        emb[j] = val
+                    Map_set(self.mem, self.c.embed.weights[i], <key_t>key, emb)
+
+    property momentum:
+        def __get__(self):
+            cdef int i = 0
+            cdef int j = 0
+            cdef int k = 0
+            cdef key_t key
+            cdef void* value
+            momentum = []
+            for i in range(self.c.embed.nr):
+                j = 0
+                table = []
+                while Map_iter(self.c.embed.momentum[i], &j, &key, &value):
+                    mom = <weight_t*>value
+                    table.append((key, [mom[k] for k in range(self.c.embed.lengths[i])]))
+                momentum.append(table)
+            return momentum
+        def __set__(self, momentum):
+            cdef float val
+            for i, table in enumerate(momentum):
+                for key, value in table:
+                    mom = <float*>self.mem.alloc(self.c.embed.lengths[i] * 2, sizeof(mom[0]))
+                    for j, val in enumerate(value):
+                        mom[j] = val
+                    Map_set(self.mem, self.c.embed.momentum[i], <key_t>key, mom)

     property nr_layer:
         def __get__(self):
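
With that change, the save/load above would presumably also need to carry the momentum alongside the weights and embeddings. Rough sketch, same caveats as before:

    def save(self, model_loc):
        pickle.dump((self, self.model.weights, self.model.embeddings, self.model.momentum),
            open(model_loc, 'wb'), -1)

    @classmethod
    def load(cls, loc):
        t, w, e, m = pickle.load(open(loc, 'rb'))
        widths = [t.ex.input_length] + [t.hidden_width] * t.depth + [len(t.classes)]
        t.model = NeuralNet(
            widths,
            embed=(t.ex.tables, t.ex.slots),
            rho=t.L2,
            eta=t.learn_rate,
            update_step=t.solver)
        t.model.weights = w
        t.model.embeddings = e
        t.model.momentum = m
        return t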


honnibal commented on May 17, 2024

I think you're clipping the values in that __set__ method --- weight_t is double in recent versions, so be careful to use the typedef.
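
A minimal sketch of what that __set__ would look like with the typedef used throughout (untested, just to illustrate the point):

    def __set__(self, embeddings):
        cdef weight_t* emb
        # use weight_t (double in recent versions) rather than float,
        # so the stored values aren't clipped to single precision
        cdef weight_t val
        for i, table in enumerate(embeddings):
            for key, value in table:
                emb = <weight_t*>self.mem.alloc(self.c.embed.lengths[i], sizeof(emb[0]))
                for j, val in enumerate(value):
                    emb[j] = val
                Map_set(self.mem, self.c.embed.weights[i], <key_t>key, emb)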
