Comments (15)
@TheMightyRaider the trained model is available here
from neural-audio-fp.
Thanks. Yes, actually the training part is the same.
I have a plan for Colab. The g-drive (raw) files are there exactly so that they can be mounted on Colab.
Training in colab:
I didn't test it, but it should work. You first need to modify config/default.yaml. OUTPUT_ROOT_DIR and LOG_ROOT_DIR must be set to your gdrive directory, and other paths like SOURCE_ROOT etc. should point to the dataset (raw) I shared.
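A minimal Python sketch of the config change described above. It rewrites the value of a named key in the YAML text line by line, so it makes no assumption about how the keys are nested. The Drive folder and the sample text are hypothetical stand-ins; check the real config/default.yaml for its actual layout and default values.

```python
import re

def set_yaml_value(text: str, key: str, value: str) -> str:
    """Replace the value on every `key : value` line, keeping any trailing comment."""
    pattern = re.compile(
        rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*)([^#\n]*?)([ \t]*(?:#.*)?)$",
        re.MULTILINE,
    )
    new_text, n_hits = pattern.subn(
        lambda m: f"{m.group(1)}{value}{m.group(3)}", text
    )
    if n_hits == 0:
        raise KeyError(f"{key} not found in config text")
    return new_text

# Hypothetical Drive folder; adjust to wherever your Drive is mounted.
GDRIVE_DIR = "/content/drive/MyDrive/neural-audio-fp"

# Stand-in for the contents of config/default.yaml (layout is an assumption).
sample = "DIR:\n  OUTPUT_ROOT_DIR : ./out/  # fingerprints\n  LOG_ROOT_DIR : ./logs/\n"

patched = sample
for key, sub in [("OUTPUT_ROOT_DIR", "emb/"), ("LOG_ROOT_DIR", "logs/")]:
    patched = set_yaml_value(patched, key, f"{GDRIVE_DIR}/{sub}")
print(patched)
```

In practice you would read the file, apply the same calls to its text, and write it back. A missing key raises KeyError instead of silently leaving the old path in place.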
During training, it saves a model checkpoint every epoch. That is usually every twenty minutes, but it can take longer.
So if the Colab session is shut down automatically, you can continue training from the last checkpoint.
If you run into any problems, just let me know. It will be a nice contribution.
About sharing a trained model: yes, I can. The plan for the next update is to write a one-page Colab demo that loads it.
But if you want to try it early, here is the link.
I really welcome feedback from Colab users. I feel this is the way for this open project to go.
I was able to run the training process in Colab with Miniconda, but just installing requirements without Miniconda leads to an error. #12 should fix it.
Restoring from that checkpoint doesn't work for some reason. It outputs a long list of messages like WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.div_enc.split_fc_layers.124.layer_with_weights-0.bias for all the layers, weights, etc., and this warning at the end:
WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details
I got unsupported operand type(s) for +: 'PosixPath' and 'str' from line 306 of dataset.py when I tried to generate from a custom source.
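For readers who hit the same message, here is a small Python illustration of what triggers it and two common ways around it. This is not the code at line 306 of dataset.py; the directory and file names are hypothetical.

```python
from pathlib import Path

source_dir = Path("custom_audio")  # hypothetical directory name

# Reproduces the reported error: a Path object does not support `+` with a str.
try:
    pattern = source_dir + "/*.wav"
except TypeError as err:
    message = str(err)
    print(message)

# Two common fixes: join with `/`, or convert to str before concatenating.
joined = source_dir / "clip.wav"
as_text = str(source_dir) + "/*.wav"
```

Either fix works. Converting to str is the smaller change when the surrounding code already treats paths as plain strings.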
@Mihonarium Solved by removing pathlib for argin. I also fixed the same issue for the --output option.
Oh, sorry, I just saw that you actually use the mini dataset for training and the full one for a full-scale evaluation. Closing the issue
I am wondering if it is possible to install faiss (required for constructing the search engine) smoothly in Colab. I haven't tried it yet. It is also an important prerequisite for developing the Colab demo. I'll test it out a bit tonight.
- Installation of faiss-gpu on Colab.
@Mihonarium Thanks for the report. Yes, it seems we don't need conda for Colab. Just pip install works smoothly. Installing faiss-gpu was super smooth too: !pip install faiss-gpu.
About your checkpoint loading issue, let me ask:
- Just use the config/640_lamb.yaml in the repo.
- Did you specify the config? The command should be like:
python run.py train -c 640_lamb 640_lamb # ignore this line..
!python run.py generate -c 640_lamb 640_lamb 101
BTW, just try the generate command. Continuing training from a checkpoint made on a different type of device is an unusual scenario.
If you send me your notebook, I'll look at it tomorrow.
Yes, I did specify the config.
What's even more strange, the issue with a lot of warnings appears only with run.py train and doesn't appear for generate.
The notebook: https://gist.github.com/Mihonarium/e3fd355cb560b82373fd2186139f1bc2 (the last cells show that generate and training from scratch work).
@Mihonarium Oh, that is expected behavior, as I wrote above. The checkpoint file contains the optimizer's state, which depends on the GPU device. So continuing training with my checkpoint as the initial parameters is possible, but I didn't consider that use case. You would need to load the model first without attaching the optimizer (as generate does), then initialize the optimizer and start training.
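A rough Python sketch of that idea using the standard TensorFlow checkpoint API. It is not the repository's own loading code. The `model=` keyword is an assumption taken from the `(root).model` key in the warning messages quoted earlier in the thread, and `restore_model_only` is a hypothetical helper name.

```python
def restore_model_only(model, checkpoint_dir):
    """Load only the model weights from the newest checkpoint in `checkpoint_dir`.

    The optimizer is deliberately left out of the Checkpoint object, so its
    saved state is never matched against a live optimizer.
    """
    import tensorflow as tf  # imported lazily so the sketch loads without TF

    latest = tf.train.latest_checkpoint(checkpoint_dir)
    if latest is None:
        raise FileNotFoundError(f"no checkpoint found in {checkpoint_dir}")

    # `model=` mirrors the `(root).model` key seen in the warnings (assumption).
    checkpoint = tf.train.Checkpoint(model=model)
    # expect_partial() silences warnings about the unused optimizer values.
    checkpoint.restore(latest).expect_partial()
    return latest
```

After this call, you would create a fresh optimizer and begin training, so the optimizer state starts from scratch while the model weights come from the checkpoint.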
@Mihonarium About the training-from-scratch error: first, for a P100 GPU, I recommend
BSZ:
TR_BATCH_SZ : 320
# Training batch size N must be EVEN number.
TR_N_ANCHOR : 160
You didn't get an out of memory error, though, and this is not related to your issue.
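As a quick sanity check on those two values, here is a small Python sketch. It assumes the anchor count is always half of the training batch size, which is inferred only from the 320 / 160 pair above; `anchors_for_batch` is a hypothetical helper, not part of the repository.

```python
def anchors_for_batch(batch_size: int) -> int:
    """Return the anchor count for a training batch, assuming anchors are half of it."""
    if batch_size <= 0 or batch_size % 2 != 0:
        raise ValueError("TR_BATCH_SZ must be a positive even number")
    return batch_size // 2

print(anchors_for_batch(320))  # 160, matching TR_N_ANCHOR above
```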
I am now checking CPU info of colab.
In config, try:
DEVICE:
CPU_N_WORKERS : 4 # 4 for minimal system. 8 is recommended.
CPU_MAX_QUEUE : 10 # 10 for minimal system. 20 is recommended.
It depends on how many threads the system can handle.
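One way to pick between the two sets of values quoted above is to look at the number of logical CPUs the runtime reports. The sketch below is a heuristic only: the threshold of 8 threads is an assumption, and `suggest_loader_settings` is a hypothetical helper.

```python
import os

def suggest_loader_settings(n_threads=None):
    """Choose the 'minimal' or 'recommended' values from the config comments."""
    if n_threads is None:
        n_threads = os.cpu_count() or 1  # cpu_count() may return None
    if n_threads >= 8:  # assumed threshold for the recommended settings
        return {"CPU_N_WORKERS": 8, "CPU_MAX_QUEUE": 20}
    return {"CPU_N_WORKERS": 4, "CPU_MAX_QUEUE": 10}

print(suggest_loader_settings())
```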
I will run it tomorrow.
it is an expected behavior as I wrote it above. The checkpoint file contains optimizer's states info which is GPU device dependent.
Got it, makes sense. Thanks!
Training from scratch didn't give any errors; I interrupted it. I included it to show that the errors come from loading the checkpoint (I didn't know it was the expected behavior) and not from something else. You're right though, I would probably get an out of memory error if I trained for longer. I was actually able to train the model successfully with a batch size of 320.
@mimbres @Mihonarium Would it be possible for you to share the trained model? It's quite hard to train with a batch size of 320.
Thanks! @Mihonarium
I used the pretrained model and the same database (Dataset-mini) for the evaluation step, but I got very poor results. Why is that? This is my command and its output:
`
CUDA_VISIBLE_DEVICES=1 python run.py evaluate 640_lamb 101.index -c 640_lamb
cli: Configuration from ./config/640_lamb.yaml
Load 29,500 items from ./logs/emb/640_lamb/101.index/query.mm.
Load 29,500 items from ./logs/emb/640_lamb/101.index/db.mm.
Load 581,922 items from ./logs/emb/640_lamb/101.index/dummy_db.mm.
Creating index: ivfpq
Copy index to GPU.
Training index...
Elapsed time: 23.07 seconds.
581922 items from dummy DB
29500 items from reference DB
Added total 611422 items to DB. 2.25 sec.
Created fake_recon_index, total 611422 items. 0.04 sec.
test_id: icassp, n_test: 2000
========= Top1 hit rate (%) of segment-level search =========
---------------- Query length ----------------
segments 1 3 5 9 11 19
seconds (1s) (2s) (3s) (5s) (6s) (10s)
Top1 exact 3.75 5.90 6.45 7.25 7.25 7.80
Top1 near 4.00 6.15 6.70 7.30 7.30 7.80
Top3 exact 4.40 7.00 7.85 8.60 8.45 8.95
Top10 exact 5.40 8.35 9.40 10.90 11.15 10.90
average search + evaluation time 7.25 ms/query
Saved test_ids and raw score to ./logs/emb/640_lamb/101.index/.
`
Do I need to retrain?