
kann's Introduction

Getting Started

# acquire source code and compile
git clone https://github.com/attractivechaos/kann
cd kann; make  # or "make CBLAS=/path/to/openblas" for faster matrix multiplication
# learn unsigned addition (30000 samples; numbers within 10000)
seq 30000 | awk -v m=10000 '{a=int(m*rand());b=int(m*rand());print a,b,a+b}' \
  | ./examples/rnn-bit -m7 -o add.kan -
# apply the model (output 1138429, the sum of the two numbers)
echo 400958 737471 | ./examples/rnn-bit -Ai add.kan -

Introduction

KANN is a standalone and lightweight library in C for constructing and training small to medium artificial neural networks such as multi-layer perceptrons, convolutional neural networks and recurrent neural networks (including LSTM and GRU). It implements graph-based reverse-mode automatic differentiation and makes it possible to build topologically complex neural networks with recurrence, shared weights and multiple inputs/outputs/costs. In comparison to mainstream deep learning frameworks such as TensorFlow, KANN is not as scalable, but it is close in flexibility, has a much smaller code base and only depends on the standard C library. In comparison to other lightweight frameworks such as tiny-dnn, KANN is still smaller, several times faster and much more versatile, supporting RNNs, VAEs and non-standard neural networks that may break these lightweight frameworks.

KANN could be potentially useful when you want to experiment with small to medium neural networks in C/C++, to deploy not-so-large models without worrying about dependency hell, or to learn the internals of deep learning libraries.

Features

  • Flexible. Models are constructed by building a computational graph of operators. Supports RNNs, weight sharing and multiple inputs/outputs.

  • Efficient. Reasonably optimized matrix product and convolution. Supports mini-batching and effective multi-threading. Sometimes faster than mainstream frameworks in their CPU-only mode.

  • Small and portable. As of now, KANN has less than 4000 lines of code in four source code files, with no non-standard dependencies by default. Compatible with ANSI C compilers.

Limitations

  • CPU only. As such, KANN is not intended for training huge neural networks.

  • Lack of some common operators and architectures such as batch normalization.

  • Verbose APIs for training RNNs.

Installation

The KANN library is composed of four files: kautodiff.{h,c} and kann.{h,c}. You are encouraged to include these files in your source code tree. No installation is needed. To compile examples:

make

This generates a few executables in the examples directory.
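
To use KANN in your own program, compiling the two .c files together with your source is usually enough; a minimal sketch (my_model.c is a hypothetical file name; BLAS linking flags depend on your installation):

gcc -O2 -o my_model my_model.c kann.c kautodiff.c -lm
# optionally, define HAVE_CBLAS and link a CBLAS library (e.g. OpenBLAS) for faster sgemm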

Documentation

Comments in the header files briefly explain the APIs. More documentation can be found in the doc directory. Examples using the library are in the examples directory.

A tour of basic KANN APIs

Working with neural networks usually involves three steps: model construction, training and prediction. We can use layer APIs to build a simple model:

kann_t *ann;
kad_node_t *t;
t = kann_layer_input(784); // for MNIST
t = kad_relu(kann_layer_dense(t, 64)); // a 64-neuron hidden layer with ReLU activation
t = kann_layer_cost(t, 10, KANN_C_CEM); // softmax output + multi-class cross-entropy cost
ann = kann_new(t, 0);                   // compile the network and collate variables

For this simple feedforward model with one input and one output, we can train it with:

int n;     // number of training samples
float **x; // model input, of size n * 784
float **y; // model output, of size n * 10
// fill in x and y here and then call:
// arguments (see kann.h): learning rate 0.001, mini-batch size 64, at most 25 epochs,
// early-stop streak of 10 epochs, 10% of samples held out for validation
kann_train_fnn1(ann, 0.001f, 64, 25, 10, 0.1f, n, x, y);

We can save the model to a file with kann_save() or use it to classify an MNIST image:

float *x;       // of size 784
const float *y; // this will point to an array of size 10
// fill in x here and then call:
y = kann_apply1(ann, x);
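
For completeness, a short sketch of persisting the trained model and loading it back; kann_save() and kann_load() take a file name and are the same calls the bundled examples use (the file name here is arbitrary):

kann_save("mlp.kan", ann);   // write the model to disk
kann_delete(ann);            // free the in-memory network
ann = kann_load("mlp.kan");  // reload it later for prediction with kann_apply1()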

Working with complex models requires the use of low-level APIs. Please see 01user.md for details.

A complete example

This example learns to count the number of "1" bits in an integer (i.e. popcount):

// to compile and run: gcc -O2 this-prog.c kann.c kautodiff.c -lm && ./a.out
#include <stdlib.h>
#include <stdio.h>
#include "kann.h"

int main(void)
{
	int i, k, max_bit = 20, n_samples = 30000, mask = (1<<max_bit)-1, n_err, max_k;
	float **x, **y, max, *x1;
	kad_node_t *t;
	kann_t *ann;
	// construct an MLP with one hidden layer
	t = kann_layer_input(max_bit);
	t = kad_relu(kann_layer_dense(t, 64));
	t = kann_layer_cost(t, max_bit + 1, KANN_C_CEM); // output uses 1-hot encoding
	ann = kann_new(t, 0);
	// generate training data
	x = (float**)calloc(n_samples, sizeof(float*));
	y = (float**)calloc(n_samples, sizeof(float*));
	for (i = 0; i < n_samples; ++i) {
		int c, a = kad_rand(0) & (mask>>1);
		x[i] = (float*)calloc(max_bit, sizeof(float));
		y[i] = (float*)calloc(max_bit + 1, sizeof(float));
		for (k = c = 0; k < max_bit; ++k)
			x[i][k] = (float)(a>>k&1), c += (a>>k&1);
		y[i][c] = 1.0f; // c ranges from 0 to max_bit, inclusive
	}
	// train
	kann_train_fnn1(ann, 0.001f, 64, 50, 10, 0.1f, n_samples, x, y);
	// predict
	x1 = (float*)calloc(max_bit, sizeof(float));
	for (i = n_err = 0; i < n_samples; ++i) {
		int c, a = kad_rand(0) & (mask>>1); // generating a new number
		const float *y1;
		for (k = c = 0; k < max_bit; ++k)
			x1[k] = (float)(a>>k&1), c += (a>>k&1);
		y1 = kann_apply1(ann, x1);
		for (k = 0, max_k = -1, max = -1.0f; k <= max_bit; ++k) // find the max
			if (max < y1[k]) max = y1[k], max_k = k;
		if (max_k != c) ++n_err;
	}
	fprintf(stderr, "Test error rate: %.2f%%\n", 100.0 * n_err / n_samples);
	kann_delete(ann); // TODO: also free x, y and x1
	return 0;
}

Benchmarks

  • First of all, this benchmark only evaluates relatively small networks, but in practice, it is huge networks on GPUs that really demonstrate the true power of mainstream deep learning frameworks. Please don't read too much into the table.

  • "Linux" has 48 cores on two Xeno E5-2697 CPUs at 2.7GHz. MKL, NumPy-1.12.0 and Theano-0.8.2 were installed with Conda; Keras-1.2.2 installed with pip. The official TensorFlow-1.0.0 wheel does not work with Cent OS 6 on this machine, due to glibc. This machine has one Tesla K40c GPU installed. We are using by CUDA-7.0 and cuDNN-4.0 for training on GPU.

  • "Mac" has 4 cores on a Core i7-3667U CPU at 2GHz. MKL, NumPy and Theano came with Conda, too. Keras-1.2.2 and Tensorflow-1.0.0 were installed with pip. On both machines, Tiny-DNN was acquired from github on March 1st, 2017.

  • mnist-mlp implements a simple MLP with one layer of 64 hidden neurons. mnist-cnn applies two convolutional layers with 32 3-by-3 kernels and ReLU activation, followed by 2-by-2 max pooling and one 128-neuron dense layer. mul100-rnn uses two GRUs of size 160. Both input and output are 2-D binary arrays of shape (14,2) -- 28 GRU operations for each of the 30000 training samples.

Task        Framework     Machine  Device  Real     CPU      Command line
mnist-mlp   KANN+SSE      Linux    1 CPU   31.3s    31.2s    mlp -m20 -v0
            KANN+SSE      Mac      1 CPU   27.1s    27.1s
            KANN+BLAS     Linux    1 CPU   18.8s    18.8s
            Theano+Keras  Linux    1 CPU   33.7s    33.2s    keras/mlp.py -m20 -v0
            Theano+Keras  Linux    4 CPUs  32.0s    121.3s
            Theano+Keras  Mac      1 CPU   37.2s    35.2s
            Theano+Keras  Mac      2 CPUs  32.9s    62.0s
            TensorFlow    Mac      1 CPU   33.4s    33.4s    tensorflow/mlp.py -m20
            TensorFlow    Mac      2 CPUs  29.2s    50.6s    tensorflow/mlp.py -m20 -t2
            Tiny-dnn      Linux    1 CPU   2m19s    2m18s    tiny-dnn/mlp -m20
            Tiny-dnn+AVX  Linux    1 CPU   1m34s    1m33s
            Tiny-dnn+AVX  Mac      1 CPU   2m17s    2m16s
mnist-cnn   KANN+SSE      Linux    1 CPU   57m57s   57m53s   mnist-cnn -v0 -m15
            KANN+SSE      Linux    4 CPUs  19m09s   68m17s   mnist-cnn -v0 -t4 -m15
            Theano+Keras  Linux    1 CPU   37m12s   37m09s   keras/mlp.py -Cm15 -v0
            Theano+Keras  Linux    4 CPUs  24m24s   97m22s
            Theano+Keras  Linux    1 GPU   2m57s             keras/mlp.py -Cm15 -v0
            Tiny-dnn+AVX  Linux    1 CPU   300m40s  300m23s  tiny-dnn/mlp -Cm15
mul100-rnn  KANN+SSE      Linux    1 CPU   40m05s   40m02s   rnn-bit -l2 -n160 -m25 -Nd0
            KANN+SSE      Linux    4 CPUs  12m13s   44m40s   rnn-bit -l2 -n160 -t4 -m25 -Nd0
            KANN+BLAS     Linux    1 CPU   22m58s   22m56s   rnn-bit -l2 -n160 -m25 -Nd0
            KANN+BLAS     Linux    4 CPUs  8m18s    31m26s   rnn-bit -l2 -n160 -t4 -m25 -Nd0
            Theano+Keras  Linux    1 CPU   27m30s   27m27s   rnn-bit.py -l2 -n160 -m25
            Theano+Keras  Linux    4 CPUs  19m52s   77m45s

  • In the single-thread mode, Theano is about 50% faster than KANN, probably due to the efficient matrix multiplication (a.k.a. sgemm) implemented in MKL. As shown in a previous micro-benchmark, MKL/OpenBLAS can be twice as fast as the implementation in KANN.

  • KANN can optionally use the sgemm routine from a BLAS library (enabled by the macro HAVE_CBLAS). Linked against OpenBLAS-0.2.19, KANN matches the single-thread performance of Theano on mul100-rnn. KANN doesn't reduce convolution to matrix multiplication, so mnist-cnn doesn't benefit from OpenBLAS. We observed that OpenBLAS is slower than the native KANN implementation when the mini-batch size is 1. The cause is unknown.

  • KANN's intra-batch multi-threading model works better than Theano+Keras'. However, in its current form, this model probably won't get along well with GPUs.

kann's People

Contributors

alperyilmaz, attractivechaos, gareins, lh3, timgates42

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

kann's Issues

resnet

I want to apply it to ResNet, so I wrote:

kad_node_t *basic_block(kad_node_t *x, int channel)
{
	kad_node_t *y = kad_relu(kann_layer_conv2d(x, channel, 3, 3, 1, 1, 1, 1));
	y = kann_layer_conv2d(y, channel, 3, 3, 1, 1, 1, 1);
	return kad_relu(kad_add(x, y));
}

but it raises a segmentation fault.

Image classification of cats, dogs, and pandas

I am attempting to use KANN to classify cats, dogs, and pandas. I have the data pre-processed so that every image is read in as RGB bytes, scaled to float, and resized to 32x32 (still 3 channels). I store the images and labels as float **x, float **y, where x has dimension [nsamples][32x32x3] (a flattened array with rows = ncols * 3, laid out as RGB per pixel) and y is [nsamples][3] (for the 3 classes of cat/dog/panda). I split off 75% of my data for training and send it into a modified version of the "Complete example" provided:

int train_kann(float **x, int nrows, int ncols, int nbands, float **y, int nclasses, int n_samples)
{
	int max_bit, i;
	kad_node_t *t;
	kann_t *ann;

	max_bit = nrows * ncols * nbands;

	// construct an MLP with one hidden layer
	t = kann_layer_input(max_bit);
	t = kad_relu(kann_layer_dense(t, 64));
	t = kann_layer_cost(t, nclasses, KANN_C_CEM); // output uses 1-hot encoding
	ann = kann_new(t, 0);

	// train
	kann_train_fnn1(ann, 0.001f, 64, 50, 10, 0.1f, n_samples, x, y);

	return 0;
}

However, I am getting some strange output from kann_train_fnn1: it is not reporting the class error in training or validation, so I am getting n_train_base == 0 and n_val == 0 (meaning no class error?).

epoch: 1; training cost: 13.2655; validation cost: 13.8155
epoch: 2; training cost: 13.8112; validation cost: 13.8155
epoch: 3; training cost: 13.8155; validation cost: 13.8155
epoch: 4; training cost: 13.8155; validation cost: 13.8155
(repeats these values for the remaining epochs)

I have a feeling this is an issue of how I set up my data and labels. Any help would be greatly appreciated.

More examples (classify text)

Hi, basically I'm hoping someone can point me to (or create) an example of a text-sentiment conv-based classifier using KANN. It doesn't need to be exact, just something I can use as a base to start from. Something like this example from Keras; even just the core model code would give me a starting point...

embedding_dim = 100
model = Sequential()
model.add(layers.Embedding(vocab_size, embedding_dim, input_length=maxlen))
model.add(layers.Conv1D(128, 5, activation='relu'))
model.add(layers.GlobalMaxPooling1D())
model.add(layers.Dense(10, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(optimizer='adam',
loss='binary_crossentropy',
metrics=['accuracy'])
model.summary()

(example from) https://realpython.com/python-keras-text-classification/

// here is my guess... but it's pure guesswork so it may be nonsense :-)

kann_t *model_gen_classify(int n_h_flt, int n_h_fc)
{
int wordsize=10; // lets assume word embedding
int sentence=100; // length of each sentence to analyze
int wordgroup[]={3,4,5}; // group words in 3,4,5 word groups in conv layers.
float dropout = 0.2f;
kann_t *ann;
kad_node_t *t;
t = kad_feed(4, 1, 1, sentence, wordsize), t->ext_flag |= KANN_F_IN;
t = kad_relu(kann_layer_conv1d(t, n_h_flt, wordgroup[0], wordsize, 1, 1, 0, 0));
t = kad_relu(kann_layer_conv1d(t, n_h_flt, wordgroup[1], wordsize, 1, 1, 0, 0));
t = kad_relu(kann_layer_conv1d(t, n_h_flt, wordgroup[2], wordsize, 1, 1, 0, 0));
t = kann_layer_dropout(t, dropout);
t = kann_layer_dense(t, n_h_fc);
t = kad_relu(t);
ann = kann_new(kann_layer_cost(t,1, KANN_C_CEB), 0);
return ann;
}

Possible Conv1D and Max1D Issue

Hi there.
I am dealing with a 1D signal, and hence I have modified the mnist-cnn.c example and changed the model as shown below:

kad_node_t *t;
t = kann_layer_input(200);
t = kad_relu(kann_layer_conv1d(t, n_h_flt, 5, 1, 2));
t = kad_max1d(t, 2, 1, 1);
t = kad_relu(kann_layer_conv1d(t, n_h_flt, 5, 1, 2));
t = kad_max1d(t, 2, 1, 1);
t = kann_layer_dense(t, 200);
t = kad_relu(t);
t = kann_layer_dense(t, 100);
t = kad_relu(t);
t = kann_layer_dense(t, 50);
t = kad_relu(t);
ann = kann_new(kann_layer_cost(t, 2, KANN_C_CEB), 0);

I added the padding so as to keep the length. My input is 200 in length and the output is a simple true/false.
However, when I compile and run training on this, the terminal immediately shows "Segmentation fault: 11". I believe the model to be correct, so I suspect the issue is in a 1D-related function?

Thanks in advance!
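
A possibly related observation rather than a confirmed fix: the "Convolutional recurrent neural network" issue further down this page hit the same symptom and worked around it by feeding a 3-dimensional input via kad_feed() instead of kann_layer_input(). A sketch adapted to this model, with the two leading dimensions assumed to be mini-batch and channel:

t = kad_feed(3, 1, 1, 200), t->ext_flag |= KANN_F_IN; // 3-D input; (mini-batch, channel, length) layout assumed
t = kad_relu(kann_layer_conv1d(t, n_h_flt, 5, 1, 2)); // rest of the model unchanged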

Transform script for dataset

Hi,
I could download the dataset you provided. Can I get your transformation script so that we can properly convert a dataset to load into KANN? (There is no such script in the kann repository.)

Thank you

License Type

Hello,

I am looking for a lightweight and standalone framework for deep learning, and this one looks like it could match my needs.
What licensing type is covering the source code?
MIT? BSD-3?

Thanks,
Mathieu

Adding CUDA support ?

Hi,

Is there any plan to add CUDA support in the near future? It would be very useful for training medium-size networks, and it would also be very attractive for platforms like the Tegra TK1. Libraries like Caffe and MXNet rely on too many other libraries, and sometimes it takes too much time to resolve library conflicts during installation.

RNN classification example

When classifying a sequence, we would like the network to have one output instead of a sequence of outputs. According to 01user.md, kad_avg can be used to classify a sequence. I tried this on MNIST. It works, but
I am not sure how to train such a network. During the training process we don't even know the output values other than the last one. In this line, memcpy(&y[k][b * d->n_out], d->y[s], d->n_out * sizeof(float)); each y in the output sequence gets the same value d->y[s], which looks strange.

#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "kann_extra/kann_data.h"
#include "kann.h"

typedef struct {
  int n_in, n_out, ulen, n;
  float **x, **y;
} train_data;

static void train(kann_t *ann, train_data *d, float lr, int mini_size, int max_epoch, const char *fn, int n_threads)
{
  float **x, **y, *r, best_cost = 1e30f;
  int epoch, j, n_var, *shuf;
  kann_t *ua;

  n_var = kann_size_var(ann);
  r = (float*)calloc(n_var, sizeof(float));
  x = (float**)malloc(d->ulen * sizeof(float*));
  y = (float**)malloc(d->ulen * sizeof(float*));
  for (j = 0; j < d->ulen; ++j) {
    x[j] = (float*)calloc(mini_size * d->n_in, sizeof(float));
    y[j] = (float*)calloc(mini_size * d->n_out, sizeof(float));
  }
  shuf = (int*)calloc(d->n, sizeof(int));

  ua = kann_unroll(ann, d->ulen);
  kann_set_batch_size(ua, mini_size);
  kann_mt(ua, n_threads, mini_size);
  kann_feed_bind(ua, KANN_F_IN,    0, x);
  kann_feed_bind(ua, KANN_F_TRUTH, 0, y);
  kann_switch(ua, 1);
  for (epoch = 0; epoch < max_epoch; ++epoch) {
    kann_shuffle(d->n, shuf);
    double cost = 0.0;
    int tot = 0, tot_base = 0, n_cerr = 0;
    for (j = 0; j < d->n - mini_size; j += mini_size) {
      int b, k;
      for (k = 0; k < d->ulen; ++k) {
        for (b = 0; b < mini_size; ++b) {
          int s = shuf[j + b];
          memcpy(&x[k][b * d->n_in], &d->x[s][k * d->n_in], d->n_in * sizeof(float));
          memcpy(&y[k][b * d->n_out], d->y[s], d->n_out * sizeof(float));
        }
      }
      cost += kann_cost(ua, 0, 1) * d->ulen * mini_size;
      n_cerr += kann_class_error(ua, &k);
      tot_base += k;
      //kad_check_grad(ua->n, ua->v, ua->n-1);
      kann_RMSprop(n_var, lr, 0, 0.9f, ua->g, ua->x, r);
      tot += d->ulen * mini_size;
    }
    if (cost < best_cost) {
      best_cost = cost;
      if (fn) kann_save(fn, ann);
    }
    fprintf(stderr, "epoch: %d; cost: %g (class error: %.2f%%)\n", epoch+1, cost / tot, 100.0f * n_cerr / tot_base);
  }

  kann_delete_unrolled(ua);

  for (j = 0; j < d->ulen; ++j) {
    free(y[j]); free(x[j]);
  }
  free(y); free(x); free(r); free(shuf);
}

static train_data* create_train_data(kann_t *ann, kann_data_t *x, kann_data_t *y)
{
  train_data *d;
  d = (train_data*)malloc(sizeof(*d));
  assert(d);
  assert(x->n_row == y->n_row);
  d->x = x->x;
  d->y = y->x;
  d->ulen = 28; // 28x28
  d->n = x->n_row;
  d->n_in = kann_dim_in(ann);
  d->n_out = kann_dim_out(ann);
  return d;
}

int main(int argc, char *argv[])
{
  kann_t *ann;
  kann_data_t *x, *y;
  char *fn_in = 0, *fn_out = 0;
  int c, i, mini_size = 64, max_epoch = 50, seed = 84, n_h_layers = 1, n_h_neurons = 64, norm = 1, n_threads = 1;
  float lr = 0.001f, dropout = 0.2f;

  while ((c = getopt(argc, argv, "i:o:m:l:n:d:s:t:N")) >= 0) {
    if (c == 'i') fn_in = optarg;
    else if (c == 'o') fn_out = optarg;
    else if (c == 'm') max_epoch = atoi(optarg);
    else if (c == 'l') n_h_layers = atoi(optarg);
    else if (c == 'n') n_h_neurons = atoi(optarg);
    else if (c == 'd') dropout = atof(optarg);
    else if (c == 's') seed = atoi(optarg);
    else if (c == 't') n_threads = atoi(optarg);
    else if (c == 'N') norm = 0;
  }

  if (argc - optind == 0 || (argc - optind == 1 && fn_in == 0)) {
    FILE *fp = stdout;
    fprintf(fp, "Usage: mnist-cnn [-i model] [-o model] [-t nThreads] <x.knd> [y.knd]\n");
    return 1;
  }

  kad_trap_fe();
  kann_srand(seed);
  if (fn_in) {
    ann = kann_load(fn_in);
  } else {
    kad_node_t *t;
    int rnn_flag = KANN_RNN_VAR_H0;
    if (norm) rnn_flag |= KANN_RNN_NORM;
    t = kann_layer_input(28); // 28x28
    for (i = 0; i < n_h_layers; ++i) {
      t = kann_layer_gru(t, n_h_neurons, rnn_flag);
      t = kann_layer_dropout(t, dropout);
    }
    t = kad_avg(1, &t);
    ann = kann_new(kann_layer_cost(t, 10, KANN_C_CEB), 0);
  }

  x = kann_data_read(argv[optind]);
  assert(x->n_col == 28 * 28);
  y = argc - optind >= 2? kann_data_read(argv[optind+1]) : 0;

  if (y) { // training
    assert(y->n_col == 10);
    if (n_threads > 1) kann_mt(ann, n_threads, mini_size);
    train_data *d;
    d = create_train_data(ann, x, y);
    train(ann, d, lr, mini_size, max_epoch, fn_out, n_threads);
    free(d);
    kann_data_free(y);
  } else { // applying
    int i, j, k, n_out;
    kann_switch(ann, 0);
    n_out = kann_dim_out(ann);
    assert(n_out == 10);
    for (i = 0; i < x->n_row; ++i) {
      const float *y;
      kann_rnn_start(ann);
      for(k = 0; k < 28; ++k) {
        float x1[28];
        memcpy(x1, &x->x[i][k * 28], sizeof(x1));
        y = kann_apply1(ann, x1);
      }
      if (x->rname) printf("%s\t", x->rname[i]);
      for (j = 0; j < n_out; ++j) {
        if (j) putchar('\t');
        printf("%.3g", y[j] + 1.0f - 1.0f);
      }
      putchar('\n');
      kann_rnn_end(ann);
    }
  }

  kann_data_free(x);
  kann_delete(ann);
  return 0;
}

It would be great to see a simple RNN classification example.

kann_layer_linear

I have not been able to find the kann_layer_linear definition in the source code.

It is from the example (A complete example) given on the page.

t = kad_relu(kann_layer_linear(t, 64));

It's not that I need it right now, but it is the first thing I tried running.

Why does accuracy not increase anymore after a specific epoch?

Given the MNIST CNN example, the validation cost stops improving from approximately epoch 11, so running more epochs is useless: the validation cost only fluctuates near the minimum value reached around epoch 11. Could you explain why this happens and how to solve it? (I also tested a variety of CNN structures, but there was no big difference.)

Batch processing of large dataset

Is it currently possible to process a large dataset in batches rather than loading it all into memory? Possibly it's simply a matter of calling kann_train_fnn1 with each batch?
Any clues most welcome.

ChrisP.
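
One possible reading of the suggestion above, as a rough sketch; load_chunk() and free_chunk() are hypothetical helpers that fill and release the x/y arrays for one chunk of the dataset:

for (int chunk = 0; chunk < n_chunks; ++chunk) {
	int n = load_chunk(chunk, &x, &y);                       // hypothetical: fill x[i]/y[i] for this chunk
	kann_train_fnn1(ann, 0.001f, 64, 1, 10, 0.1f, n, x, y);  // one pass over the chunk
	free_chunk(n, x, y);                                     // hypothetical: release the chunk's memory
}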

Can we define the neuron number for each layer?

Hi,

From the MLP code, I can see that I can set the number of hidden layers, but the number of neurons is the same for every layer. Is there any method to define a different number of neurons for different layers?

Cheers,
Travis
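
For reference, the layer API shown earlier on this page already takes a width per call, so each hidden layer can have its own size; a minimal sketch with three hidden layers of different widths:

kann_t *ann;
kad_node_t *t;
t = kann_layer_input(784);
t = kad_relu(kann_layer_dense(t, 128)); // first hidden layer: 128 neurons
t = kad_relu(kann_layer_dense(t, 64));  // second hidden layer: 64 neurons
t = kad_relu(kann_layer_dense(t, 32));  // third hidden layer: 32 neurons
ann = kann_new(kann_layer_cost(t, 10, KANN_C_CEM), 0);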

Can KANN train an embedding layer?

Or should I just use word2vec instead as a pre-processing step?
Again, sorry for the stupid questions; happy to read more docs if you can point me at them :-)
Thanks.

Questions about kann_apply1

Hi, I am new to C++. My code had a bug when I used the library.
This is my input and label:

float xx[1][100][400];
float yy[1][100][3];
float **x = (float **)xx;
float **y = (float **)yy;

and then train the net:

kann_train_fnn1(ann, lr, batch_size, epoch, max_drop_streak, frac_val, 1, x, y);

When I tried to test the net, something went wrong:

auto y1 = kann_apply1(ann, x->x[0]); // It caused an error here.

BTW, I didn't save the model between executing kann_train_fnn1() and kann_apply1().
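
A note that may be relevant (a general C observation, not KANN-specific): a statically sized array such as float yy[1][100][3] is one contiguous block, not an array of row pointers, so casting it to float ** does not produce the one-pointer-per-sample layout that kann_train_fnn1() expects. A sketch of that layout, with the sample count and sizes assumed:

enum { N = 100, N_IN = 400, N_OUT = 3 }; // assumed: 100 samples, 400 inputs, 3 outputs
static float xx[N][N_IN], yy[N][N_OUT]; // raw sample storage
float *x[N], *y[N];                     // one pointer per sample, as kann_train_fnn1() expects
for (int i = 0; i < N; ++i) x[i] = xx[i], y[i] = yy[i];
kann_train_fnn1(ann, lr, batch_size, epoch, max_drop_streak, frac_val, N, x, y);
const float *y1 = kann_apply1(ann, x[0]); // apply to a single input vector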

Getting Started example not working on macOS 11.3, Apple M1 chip

Hello, people.

I just wanted to let you know that at least the Getting Started example doesn't seem to work on macOS 11.3 with an Apple M1 chip, whereas it works correctly on my old machine, an Intel MacBook Pro running macOS 10.15.7.

If I print the size of int, float, and double types, I get the same results on both machines (4, 4, and 8).

This is the output when I try running the Getting Started example on Big Sur:

dariosanfilippo@Darios-MBP kann % seq 30000 | awk -v m=10000 '{a=int(m*rand());b=int(m*rand());print a,b,a+b}' \
  | ./examples/rnn-bit -m7 -o add.kan -
epoch: 1; cost: 0.0587254 (class error: 2.59%)
epoch: 2; cost: 0.000135723 (class error: 0.00%)
epoch: 3; cost: 7.7552e-05 (class error: 0.00%)
epoch: 4; cost: 4.28615e-05 (class error: 0.00%)
epoch: 5; cost: 4.24452e-05 (class error: 0.00%)
epoch: 6; cost: 2.26656e-05 (class error: 0.00%)
epoch: 7; cost: 1.84629e-05 (class error: 0.00%)
dariosanfilippo@Darios-MBP kann % echo 400958 737471 | ./examples/rnn-bit -Ai add.kan -
1924146487037

Would you know what the issue might be?

Thank you so much for your help.

Dario

Exporting weights?

I am hoping to train a KANN model with a genetic algorithm, but in order to do this I will need to be able to get an array of network weights, and I did not see a way of doing this in the documentation. I could be missing something obvious though.
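
A sketch of one way to reach the raw parameters, based on fields the issue code elsewhere on this page already uses (kann_size_var() and the model's x array); treat the exact field name as an assumption to verify against kann.h:

int n_var = kann_size_var(ann); // number of trainable parameters
float *w = ann->x;              // assumed: flat array holding all weights and biases
// a genetic algorithm could mutate w[0..n_var-1] in place, then score the candidate
// with kann_apply1() or kann_cost() instead of gradient-based training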

xor example

Hi, here is my code:

// gcc xor.c ../kann.c ../kautodiff.c -I. -I../ -lm && ./a.out

#include "kann.h"

static kann_t *model_gen(int n_in, int n_out, int loss_type, int n_h_layers, int n_h_neurons)
{
  int i;
  kad_node_t *t;
  t = kann_layer_input(n_in);
  for (i = 0; i < n_h_layers; ++i)
    t = kad_relu(kann_layer_dense(t, n_h_neurons));
  return kann_new(kann_layer_cost(t, n_out, loss_type), 0);
}

static void train(kann_t *ann)
{
  enum { n = 4 };

  float *x[n] = {
    (float[]){ 0, 0, },
    (float[]){ 0, 1, },
    (float[]){ 1, 0, },
    (float[]){ 1, 1, },
  };

  float *y[n] = {
    (float[]){ 0, },
    (float[]){ 1, },
    (float[]){ 1, },
    (float[]){ 0, },
  };

  kann_train_fnn1(ann, 0.001f, 64, 10000, 10, 0.1f, n, x, y);
}

void predict(kann_t *ann)
{
  printf("%f | %f\n", *kann_apply1(ann, (float[]){ 0, 0 }), 0.0f);
  printf("%f | %f\n", *kann_apply1(ann, (float[]){ 0, 1 }), 1.0f);
  printf("%f | %f\n", *kann_apply1(ann, (float[]){ 1, 0 }), 1.0f);
  printf("%f | %f\n", *kann_apply1(ann, (float[]){ 1, 1 }), 0.0f);
}

int main(int argc, char *argv[])
{
  kann_t *ann = model_gen(2, 1, KANN_C_CEB, 1, 5);
  train(ann);
  predict(ann);
  kann_delete(ann);

  return 0;
}

Program output:

0.000902 | 0.000000
0.999955 | 1.000000
0.999937 | 1.000000
0.000029 | 0.000000

As far as I know, XOR requires 3 neurons in the hidden layer, not 5. Here is a Keras example:

model = Sequential()
model.add(Dense(3, input_dim=2, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='mean_squared_error', optimizer='adam', metrics=['binary_accuracy'])
model.fit(training_data, target_data, epochs=10000, verbose=2)
print model.predict(training_data)
[[0.0073216]
 [0.9848797]
 [0.9848797]
 [0.0067511]]

Why 5 neurons?

Model inference on ARM M4F

Hi there.
Great work on the project! I have made successful progress on macOS, but I was wondering whether the trained model, say mnist-cnn.kan, could be transferred to an ARM M4F chip for inference? There would definitely be data input and output processing, but apart from that, is it possible to use the trained model on ARM? Thanks in advance!

mnist-cnn example fails an assert on training

I'm not sure if I'm missing something, but I tried the mnist-cnn example in the same way as the README, and the training fails an assert on line 51:

assert(x->n_col == 28 * 28);

I printed x->n_col and the result is 0. I'm not sure whether the problem is in the code or in the data (I got the data from this repo as well, as stated in the examples readme file).

I tried removing the assert but it naturally just segfaults elsewhere.

The mlp example works fine, so I assume it isn't the data.

What does "va_start" mean in kann ?

In kann.c, line 521:
va_start(ap, n_d); for (i = 0; i < n_d; ++i) d[i] = va_arg(ap, int); va_end(ap)
My question is: what is this used for?
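
For context, this is not KANN-specific: va_start, va_arg and va_end are the standard <stdarg.h> macros for reading a variable number of arguments, and the quoted line collects n_d dimension sizes passed after the fixed arguments into d[]. A minimal sketch of the same pattern (sum_dims() is a hypothetical helper):

#include <stdarg.h>

// read n_d ints passed as variable arguments, the way the quoted KANN line fills d[]
int sum_dims(int n_d, ...)
{
	va_list ap;
	int i, s = 0;
	va_start(ap, n_d);                              // begin after the last named argument
	for (i = 0; i < n_d; ++i) s += va_arg(ap, int); // fetch each int in turn
	va_end(ap);
	return s;                                       // e.g. sum_dims(3, 1, 28, 28) == 57
}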

Why isn't KANN scalable?

Hi,
I was wondering about the statement in the README - "KANN is not as scalable, but it is close in flexibility, has a much smaller code base and only depends on the standard C library.".

Why isn't KANN scalable, and why isn't it suitable for training deeper networks?

A time series data example for LSTM

I have an input sequence (dimension 2) and an output (dimension 1) like below; all numbers are normalized (-1 to 1).

Below are 2 samples copied from the training data:

(-0.70,-0.23) (-0.70,-0.23) (-0.70,-0.23) (-0.70,-0.23) (-0.70,-0.23) 0.03
(-0.61,-0.26) (-0.61,-0.26) (-0.61,-0.26) (-0.61,-0.26) (-0.61,-0.26) -0.20

Here the last column is an output vector of size 1, and before that we have 5 unrolled pairs of data points. Can you please point me to how to write the training routine for the above example? I didn't understand much from your rnn-bit example, which is quite a different use case, and textgen is difficult to understand.

I tried the code below, but I don't think the KANN_F_TRUTH array is correctly populated:

for (int j = 0; j < num_rows - batch_size_; j += batch_size_) {

			int k;
			for (k = 0; k < ulen; ++k) {
				for (int b = 0; b < batch_size_; ++b) {
					int s = j + b;/// shuf[j + b];
					for (int i = 0; i < input_; ++i)
						x[k][b*input_ + i] = data.x[s][k][i];
					
					for (int i = 0; i < output_; ++i)
						y[k][b*output_ + i] = data.y[s][i]; // <--------------------------- some fix required here
				}
			}
			
				
			cost += kann_cost(ua, 0, 1) * ulen * batch_size_;
			n_cerr += kann_class_error(ua, &k);
			tot_base += k;
			//kad_check_grad(ua->n, ua->v, ua->n-1);
			kann_RMSprop(n_var, error, NULL, 0.9f, ua->g, ua->x, r);
			tot += ulen * batch_size_;
			
		}

Custom loss function example using kad_op functions

Hi,

I'm trying to implement a custom loss function with a simple MLP.
Is there an example of using the kad_op functions to accomplish this so that I benefit from automatic differentiation?
I don't want to explicitly write the backward computation as is the case for the currently implemented loss functions (mse, ce, etc).

Or is this approach not feasible (for memory consumption reasons) as it will require the computation and storage of the gradients for each operation in the loss function?

I'd greatly appreciate any help/feedback/example!

Thanks!

Format of model file

I would like to use a model that is pre-trained in Keras or TensorFlow and run it with KANN.
I am trying to find the file format that the weights need to be saved in so that KANN can load them.
Please advise.

Convolutional recurrent neural network

I want to combine a convolutional layer with a recurrent one. This code is based on #19:

    kad_node_t *t;
    int rnn_flag = KANN_RNN_VAR_H0;
    if (norm) rnn_flag |= KANN_RNN_NORM;
    t = kad_feed(3, 1, 1, 28), t->ext_flag |= KANN_F_IN;
    t = kad_relu(kann_layer_conv1d(t, 32, 3, 1, 0)); // 3 kernel; 1 stride; 0 padding
    t = kann_layer_dropout(t, dropout);
    t = kad_max1d(t, 2, 2, 0); // 2 kernel; 2 stride; 0 padding
    for (i = 0; i < n_h_layers; ++i) {
      t = kann_layer_gru(t, n_h_neurons, rnn_flag);
      t = kann_layer_dropout(t, dropout);
    }
    t = kad_select(1, &t, -1);
    ann = kann_new(kann_layer_cost(t, 10, KANN_C_CEB), 0);
    kad_print_graph(stdout, ann->n, ann->v);

It works:

./mnist-crnn -i mnist-crnn.kan kann-data/mnist-test-x.knd | kann-data/mnist-eval.pl
Error rate: 1.19%

Questions:

  • I stumbled across the same problem as #6 at first; then I replaced kann_layer_input with kad_feed(3, 1, 1, 28) to make it work, but the numbers 1, 1 still look like magic to me... Are they correct?

  • Does backprop work correctly for conv1d on an unrolled RNN?

Whole code:

#include <unistd.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
#include "kann_extra/kann_data.h"
#include "kann.h"

typedef struct {
  int n_in, n_out, ulen, n;
  float **x, **y;
} train_data;

static void train(kann_t *ann, train_data *d, float lr, int mini_size, int max_epoch, const char *fn, int n_threads)
{
  float **x, **y, *r, best_cost = 1e30f;
  int epoch, j, n_var, *shuf;
  kann_t *ua;

  n_var = kann_size_var(ann);
  r = (float*)calloc(n_var, sizeof(float));
  x = (float**)malloc(d->ulen * sizeof(float*));
  y = (float**)malloc(1 * sizeof(float*));
  for (j = 0; j < d->ulen; ++j) {
    x[j] = (float*)calloc(mini_size * d->n_in, sizeof(float));
  }
  y[0] = (float*)calloc(mini_size * d->n_out, sizeof(float));
  shuf = (int*)calloc(d->n, sizeof(int));

  ua = kann_unroll(ann, d->ulen);
  kann_set_batch_size(ua, mini_size);
  kann_mt(ua, n_threads, mini_size);
  kann_feed_bind(ua, KANN_F_IN,    0, x);
  kann_feed_bind(ua, KANN_F_TRUTH, 0, y);
  kann_switch(ua, 1);
  for (epoch = 0; epoch < max_epoch; ++epoch) {
    kann_shuffle(d->n, shuf);
    double cost = 0.0;
    int tot = 0, tot_base = 0, n_cerr = 0;
    for (j = 0; j < d->n - mini_size; j += mini_size) {
      int b, k;
      for (b = 0; b < mini_size; ++b) {
        int s = shuf[j + b];
        for (k = 0; k < d->ulen; ++k) {
          memcpy(&x[k][b * d->n_in], &d->x[s][k * d->n_in], d->n_in * sizeof(float));
        }
        memcpy(&y[0][b * d->n_out], d->y[s], d->n_out * sizeof(float));
      }
      cost += kann_cost(ua, 0, 1) * d->ulen * mini_size;
      n_cerr += kann_class_error(ua, &k);
      tot_base += k;
      //kad_check_grad(ua->n, ua->v, ua->n-1);
      kann_RMSprop(n_var, lr, 0, 0.9f, ua->g, ua->x, r);
      tot += d->ulen * mini_size;
    }
    if (cost < best_cost) {
      best_cost = cost;
      if (fn) kann_save(fn, ann);
    }
    fprintf(stderr, "epoch: %d; cost: %g (class error: %.2f%%)\n", epoch+1, cost / tot, 100.0f * n_cerr / tot_base);
  }

  kann_delete_unrolled(ua);

  for (j = 0; j < d->ulen; ++j) {
    free(x[j]);
  }
  free(y[0]); free(y); free(x); free(r); free(shuf);
}

static train_data* create_train_data(kann_t *ann, kann_data_t *x, kann_data_t *y)
{
  train_data *d;
  d = (train_data*)malloc(sizeof(*d));
  assert(d);
  assert(x->n_row == y->n_row);
  d->x = x->x;
  d->y = y->x;
  d->ulen = 28; // 28x28
  d->n = x->n_row;
  d->n_in = kann_dim_in(ann);
  d->n_out = kann_dim_out(ann);
  return d;
}

int main(int argc, char *argv[])
{
  kann_t *ann;
  kann_data_t *x, *y;
  char *fn_in = 0, *fn_out = 0;
  int c, i, mini_size = 64, max_epoch = 50, seed = 84, n_h_layers = 1, n_h_neurons = 64, norm = 1, n_h_flt = 32, n_threads = 1;
  float lr = 0.001f, dropout = 0.2f;

  while ((c = getopt(argc, argv, "i:o:m:l:n:d:s:t:N")) >= 0) {
    if (c == 'i') fn_in = optarg;
    else if (c == 'o') fn_out = optarg;
    else if (c == 'm') max_epoch = atoi(optarg);
    else if (c == 'l') n_h_layers = atoi(optarg);
    else if (c == 'n') n_h_neurons = atoi(optarg);
    else if (c == 'd') dropout = atof(optarg);
    else if (c == 's') seed = atoi(optarg);
    else if (c == 't') n_threads = atoi(optarg);
    else if (c == 'N') norm = 0;
  }

  if (argc - optind == 0 || (argc - optind == 1 && fn_in == 0)) {
    FILE *fp = stdout;
    fprintf(fp, "Usage: mnist-cnn [-i model] [-o model] [-t nThreads] <x.knd> [y.knd]\n");
    return 1;
  }

  kad_trap_fe();
  kann_srand(seed);
  if (fn_in) {
    ann = kann_load(fn_in);
  } else {
    kad_node_t *t;
    int rnn_flag = KANN_RNN_VAR_H0;
    if (norm) rnn_flag |= KANN_RNN_NORM;
    t = kad_feed(3, 1, 1, 28), t->ext_flag |= KANN_F_IN;
    t = kad_relu(kann_layer_conv1d(t, 32, 3, 1, 0)); // 3 kernel; 1 stride; 0 padding
    t = kann_layer_dropout(t, dropout);
    t = kad_max1d(t, 2, 2, 0); // 2 kernel; 2 stride; 0 padding
    for (i = 0; i < n_h_layers; ++i) {
      t = kann_layer_gru(t, n_h_neurons, rnn_flag);
      t = kann_layer_dropout(t, dropout);
    }
    t = kad_select(1, &t, -1);
    ann = kann_new(kann_layer_cost(t, 10, KANN_C_CEB), 0);
    kad_print_graph(stdout, ann->n, ann->v);
  }

  x = kann_data_read(argv[optind]);
  assert(x->n_col == 28 * 28);
  y = argc - optind >= 2? kann_data_read(argv[optind+1]) : 0;

  if (y) { // training
    assert(y->n_col == 10);
    if (n_threads > 1) kann_mt(ann, n_threads, mini_size);
    train_data *d;
    d = create_train_data(ann, x, y);
    train(ann, d, lr, mini_size, max_epoch, fn_out, n_threads);
    free(d);
    kann_data_free(y);
  } else { // applying
    int i, j, k, n_out;
    kann_switch(ann, 0);
    n_out = kann_dim_out(ann);
    assert(n_out == 10);
    for (i = 0; i < x->n_row; ++i) {
      const float *y;
      kann_rnn_start(ann);
      for(k = 0; k < 28; ++k) {
        float x1[28];
        memcpy(x1, &x->x[i][k * 28], sizeof(x1));
        y = kann_apply1(ann, x1);
      }
      if (x->rname) printf("%s\t", x->rname[i]);
      for (j = 0; j < n_out; ++j) {
        if (j) putchar('\t');
        printf("%.3g", y[j] + 1.0f - 1.0f);
      }
      putchar('\n');
      kann_rnn_end(ann);
    }
  }

  kann_data_free(x);
  kann_delete(ann);
  return 0;
}
