
Comments (13)

liziru commented on May 26, 2024

MFCC setting in Python:

```python
# get the MFCC of the noisy voice
mfcc_feat = mfcc(sig, sample_rate, winlen=0.032, winstep=0.032 / 2,
                 numcep=20, nfilt=20, nfft=512, lowfreq=20, highfreq=8000,
                 winfunc=np.hanning, ceplifter=0, preemph=0, appendEnergy=True)
```

MFCC setting in main.c:

```c
// 20 features, 0 offset, 20 bands, 512-point FFT, no pre-emphasis, energy appended to band 0
mfcc_t *mfcc = mfcc_create(NUM_FEATURES, 0, NUM_FEATURES, 512, 0, true);

#define SAMP_FREQ     16000
#define MEL_LOW_FREQ  20
#define MEL_HIGH_FREQ 8000
```


majianjia commented on May 26, 2024

Hi @liziru

Please use some real numbers to test both functions. An all-zero input simply means there is no energy in each band, so the first band will sit at its minimum, caused by log(0). With a real signal (or just some random noise), you can plot both outputs or use a metric such as MSE or cosine similarity to compare the two.
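
For reference, a minimal Python sketch of such a comparison (the names py_feat / c_feat are placeholders, e.g. one Python MFCC frame and the values printed by the MCU):

```python
import numpy as np

def compare_features(py_feat, c_feat):
    """Compare two feature vectors with MSE and cosine similarity."""
    a = np.asarray(py_feat, dtype=np.float64).ravel()
    b = np.asarray(c_feat, dtype=np.float64).ravel()
    mse = np.mean((a - b) ** 2)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return mse, cos

# example usage with one frame from each side:
# mse, cos = compare_features(mfcc_feat[0], c_frame)
# print("MSE = %.6f, cosine similarity = %.6f" % (mse, cos))
```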

Since we use the option appendEnergy=True in Python and mfcc_create(..., true) in main.c, the first band represents the energy of the FFT. I believe Python uses 64-bit float arithmetic while the C code uses 32-bit floats, which might be the cause of the difference. Anyway, both -84 and -36 are just the respective minimum values.
In both the Python and the C code, the features are then saturated to 2^3 = 8:

quantize_data(nn_features, nn_features_q7, NUM_FEATURES+20, 3);

x_train = normalize(x_train, 3, quantize=False)

Both will be saturated to -8 after these two quantisation/saturation steps, so this energy difference will not affect anything.
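
A rough sketch of that saturation step, assuming q7 data with 3 integer bits (Q3.4, range roughly [-8, 8)); the exact rounding inside quantize_data / normalize may differ slightly:

```python
import numpy as np

def saturate_q3_4(x):
    """Quantise floats to q7 with 3 integer bits and convert back to float."""
    q = np.round(np.asarray(x, dtype=np.float64) * 16.0)   # 4 fractional bits
    q = np.clip(q, -128, 127)                               # int8 saturation
    return q / 16.0

# both "minimum energy" values end up at the same -8.0 after saturation
print(saturate_q3_4([-84.0, -36.0, 29.3225]))   # -> [-8.     -8.      7.9375]
```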


liziru commented on May 26, 2024

@majianjia
Thank you for your reply.
I did the test following your advice and it works.
However, I found two more problems.
First, with the sample input (0-512, 512 samples), the result of the Python code is slightly different from that of the C code. As you said, Python using 64-bit float arithmetic while C uses 32-bit floats may cause this.
Before being saturated to -8, the Python and C results with the same input are:
-8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457, 1.3851, 1.5110, 1.0347, 1.0887, 0.7333, 0.7338, 0.4813, 0.4191, 0.2408, 0.1338
-4.5886, 30.0869, 7.2367, 7.8549, 3.8652, 4.3900, 2.5817, 2.9530, 1.8638, 2.1015, 1.4186, 1.5603, 1.0761, 1.1739, 0.7649, 0.7836, 0.5086, 0.5034, 0.2978, 0.1628

Second, with the same input, the result of NNoM inference is somewhat different from the result of the tf model.predict API.
Input features:
-8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457, 1.3851, 1.5110, 1.0347, 1.0887, 0.7333, 0.7338, 0.4813, 0.4191, 0.2408, 0.1338, -8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457, -8.0303, 29.3225, 6.7850, 7.4641, 3.6157, 4.1926, 2.4651, 2.8310, 1.8338, 2.0457
Results of NNoM inference and of the tf API inference:
0.4724, 0.7480, 0.8504, 0.8583, 0.8583, 0.8583, 0.8346, 0.8425, 0.8110, 0.8346, 0.8110, 0.8268, 0.8268, 0.8031, 0.8268, 0.8268, 0.8346, 0.8425, 0.8031, 0.8346
0.9275, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000, 1.0000
The first row is from NNoM inference; the second row is from the tf API inference. Quantisation of the inference and of the features can lead to some loss, but this loss seems rather large.
Is the loss acceptable? Do you have any advice on reducing it?
Looking forward to your reply!


liziru commented on May 26, 2024

As a footnote, my NN model is made up of four fully-connected layers, so there is no hidden state as in an RNN. Also, the result distributions of the two inference engines are almost the same.


majianjia commented on May 26, 2024

The 8-bit resolution might not be good for a regression application. Please also try this to see if it is related: #104

I will check in detail later when I am back.


liziru commented on May 26, 2024

The 8-bit resolution might not be good for a regression application. Please also try this to see if it is related: #104

I will check in detail later when I am back.

Thank you very much. I checked my code and NNOM_TRUNCATE was already defined in nnom_port.h as you advised in #104, but I did not do the following step, because I think this operation will round the results: change the line `#define NNOM_ROUND(out_shift) ( (0x1u << out_shift) >> 1 )` to `#define NNOM_ROUND(out_shift) ((q31_t)( (0x1u << out_shift) >> 1 ))`. But what about the ARM version? It is still not working.

Sadly, the loss has not changed.


majianjia commented on May 26, 2024

Rounding or flooring doesn't actually change the result, because it only affects the output by 0.5/128.
In the denoise example, the output gains look like this, with columns representing the gain index (1~20) and rows representing the timestamp. You can see that they do reach 1 here, after hard_sigmoid() as the final output layer.

Did you try Conv or RNN layers? They might behave differently; dense does not work well when the two vectors differ hugely in size (e.g. 1000 input units vs 2 output units).

[screenshot: table of output gains, columns = gain index 1~20, rows = timestamp]
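
As for the rounding-versus-flooring point above, a small sketch of truncating versus rounding the output shift of an 8-bit result (the accumulator value is made up for illustration):

```python
def shift_truncate(acc, out_shift):
    # floor division by 2**out_shift, i.e. the NNOM_TRUNCATE behaviour
    return acc >> out_shift

def shift_round(acc, out_shift):
    # add half of the shifted-out range first, i.e. round to nearest
    return (acc + (1 << (out_shift - 1))) >> out_shift

acc, out_shift = 1240, 4
print(shift_truncate(acc, out_shift), shift_round(acc, out_shift))  # 77 vs 78
# the two results differ by at most one step of the q7 output, i.e. 1/128 of full scale
```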


liziru commented on May 26, 2024

Rounding or flooring doesn't actually change the result, because it only affects the output by 0.5/128.
In the denoise example, the output gains look like this, with columns representing the gain index (1~20) and rows representing the timestamp. You can see that they do reach 1 here, after hard_sigmoid() as the final output layer.

Did you try Conv or RNN layers? They might behave differently; dense does not work well when the two vectors differ hugely in size (e.g. 1000 input units vs 2 output units).

[screenshot: table of output gains, columns = gain index 1~20, rows = timestamp]

Thank you for your reply.
I have to use dense (fully-connected) layers in the rnn-denoise project due to some limitations. The input size and output size are both 20, so dense layers should be fine. I use four dense layers with ReLU activations, except that the last layer uses sigmoid rather than hard-sigmoid.
I can understand the gains table shown in the picture, but the loss between the two inference engines is definitely there, which leaves me sad and confused.
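
For context, a minimal Keras sketch of the architecture described above; the hidden layer widths are my assumption, only the 20-unit input/output and the activations come from the post:

```python
from tensorflow.keras import layers, models

# four dense layers: ReLU hidden layers, sigmoid on the 20-unit output
model = models.Sequential([
    layers.Input(shape=(20,)),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(64, activation='relu'),
    layers.Dense(20, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='mse')
model.summary()
```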


majianjia commented on May 26, 2024

The RNN currently runs with 8-bit input/output data and 16-bit memory (state) data, which might preserve more information.
I am not sure what causes the loss you are seeing. Would you be able to validate the model on more data?
You may also try a Conv-based network; a TCN (consisting of Conv layers with dilation > 1) works completely fine with NNoM and can outperform RNN-type models.
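
A rough Keras sketch of the kind of TCN being suggested here (a stack of causal Conv1D layers with increasing dilation); the filter counts, kernel sizes, and depth are illustrative assumptions, not a verified NNoM configuration:

```python
from tensorflow.keras import layers, models

# dilated causal Conv1D stack producing 20 per-timestep gains
model = models.Sequential([
    layers.Input(shape=(None, 20)),
    layers.Conv1D(32, 3, dilation_rate=1, padding='causal', activation='relu'),
    layers.Conv1D(32, 3, dilation_rate=2, padding='causal', activation='relu'),
    layers.Conv1D(32, 3, dilation_rate=4, padding='causal', activation='relu'),
    layers.Conv1D(20, 1, activation='sigmoid'),
])
model.summary()
```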


liziru commented on May 26, 2024

The RNN currently runs with 8-bit input/output data and 16-bit memory (state) data, which might preserve more information.
I am not sure what causes the loss you are seeing. Would you be able to validate the model on more data?
You may also try a Conv-based network; a TCN (consisting of Conv layers with dilation > 1) works completely fine with NNoM and can outperform RNN-type models.

OK, I think I am close to the answer. I found that the weights.h file differs depending on x_train. I have to say that my x_train is generated randomly, and the same x_train is used as input when comparing against the NNoM inference in the C code. As a result, the NNoM inference result changes with each weights.h file generated from a different x_train.
```python
# now generate the NNoM model
generate_model(model, x_train[:2048 * 4], name='weights.h')
```
Comparison of weights.h files generated with different x_train:

[screenshot: diff of weights.h files showing different quantisation shift values]

So, the loss between the two inference engines may be caused by the settings in weights.h.
However, NNOM_TRUNCATE was already defined in nnom_port.h as you advised in #104, which I thought meant I am using float computation now.


liziru commented on May 26, 2024

@majianjia After I set x_train in generate_model to the training x_train, as you did in the main.py example,
the result of NNoM inference changes a lot but is still hugely different from that of the tf predict API inference.
The first row is generated by NNoM inference (the second by the tf predict API):
0.5827, 0.5197, 0.4409, 0.3937, 0.2992, 0.1654, 0.1102, 0.1181, 0.1260, 0.1417, 0.1575, 0.1575, 0.1654, 0.1732, 0.1811, 0.1969, 0.2126, 0.2126, 0.2047, 0.2047
0.9952, 0.9999, 0.9998, 0.9994, 0.9898, 0.9204, 0.6904, 0.6838, 0.7321, 0.8566, 0.8668, 0.8191, 0.7994, 0.8683, 0.8680, 0.9044, 0.9288, 0.9375, 0.9346, 0.9124


majianjia commented on May 26, 2024

Forget about NNOM_TRUNCATE since you don't use RNN layers. Also, this macro has nothing to do with floating-point numbers; NNoM currently only runs on 8-bit fixed-point data.

For the calibration step generate_model(model, x_train[:2048 * 4], name='weights.h'), you should use real data; it can be training or testing data but not random numbers. The data should cover as many cases as possible; you can enlarge the slice x_train[:2048 * 4] to see if it helps.
The calibration step generates the numbers in your screenshot. They are determined by the output of each layer, so that the chosen Q-format can contain the maximum/minimum of each layer's outputs/weights. With a different calibration dataset, these bits/shifts are expected to change. However, calibrating with different real signals brings only small changes, while calibrating with fake signals can change them quite a lot.
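
A simplified sketch of the idea behind that calibration (not the actual NNoM code): for each layer, the number of integer bits is chosen so the observed output range fits into q7, and the remaining bits become the fractional shift written into weights.h:

```python
import numpy as np

def choose_q7_format(layer_output):
    """Pick (integer_bits, fractional_bits) so the observed range fits in q7."""
    max_abs = float(np.max(np.abs(layer_output)))
    int_bits = max(0, int(np.ceil(np.log2(max_abs)))) if max_abs > 0 else 0
    return int_bits, 7 - int_bits

# larger activations in the calibration set need more integer bits, which is
# why a random (fake) calibration set can move these shifts around a lot
print(choose_q7_format(np.array([0.3, -0.7, 0.9])))    # -> (0, 7)
print(choose_q7_format(np.array([3.6, -7.9, 29.3])))   # -> (5, 2)
```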

I suggest you run the example first. Once it is successful, modify the tf model and see if it still works.


liziru commented on May 26, 2024

Forget about NNOM_TRUNCATE since you don't use RNN layers. Also, this macro has nothing to do with floating-point numbers; NNoM currently only runs on 8-bit fixed-point data.

For the calibration step generate_model(model, x_train[:2048 * 4], name='weights.h'), you should use real data; it can be training or testing data but not random numbers. The data should cover as many cases as possible; you can enlarge the slice x_train[:2048 * 4] to see if it helps.
The calibration step generates the numbers in your screenshot. They are determined by the output of each layer, so that the chosen Q-format can contain the maximum/minimum of each layer's outputs/weights. With a different calibration dataset, these bits/shifts are expected to change. However, calibrating with different real signals brings only small changes, while calibrating with fake signals can change them quite a lot.

I suggest you run the example first. Once it is successful, modify the tf model and see if it still works.

I am sorry to say that enlarging the size of x_train[:2048 * 4], running the example first, and then modifying the tf model did not work. Your NNoM inference is a good project. Do you have a plan to support floating-point computation? I think developers in other areas would like this project very much.

