
Comments (15)

Mihonarium avatar Mihonarium commented on May 30, 2024 2

@TheMightyRaider the trained model is available here

from neural-audio-fp.

mimbres avatar mimbres commented on May 30, 2024 1

Thanks. Yes, the training part is actually the same.
I plan to support Colab. The Google Drive (raw) files are there precisely so they can be mounted in Colab.

Training in colab:
I haven't tested it, but it should work. First, modify config/default.yaml: OUTPUT_ROOT_DIR and LOG_ROOT_DIR must be set to your Google Drive directory, and the other paths, such as SOURCE_ROOT, should point to the (raw) dataset I shared.
During training, a model checkpoint is saved every epoch, which usually takes about twenty minutes but can take longer.
So if Colab shuts down automatically, you can continue training from the last checkpoint.
If you run into any problems, just let me know. It would be a nice contribution.
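For anyone trying this, here is a minimal sketch of pointing the config at Google Drive from a notebook cell. It rewrites `KEY : value` lines in the YAML text with the standard library only, so existing comments survive. The Drive paths and the helper name `override_yaml_values` are hypothetical, and the sample text only imitates the layout of config/default.yaml; the real file may nest these keys differently.

```python
import re

# In Colab, Drive is usually mounted first (not executed here):
#   from google.colab import drive
#   drive.mount("/content/drive")

def override_yaml_values(text, overrides):
    """Return `text` with the value of each matching `KEY : value` line replaced.

    Indentation, the separator, and any trailing `# comment` are preserved.
    """
    pattern = re.compile(
        r"^(\s*)([A-Za-z_][A-Za-z0-9_]*)(\s*:\s*)([^#]*?)(\s*#.*)?$"
    )
    out = []
    for line in text.splitlines():
        m = pattern.match(line)
        if m and m.group(2) in overrides:
            indent, key, sep, _old, comment = m.groups()
            line = f"{indent}{key}{sep}{overrides[key]}{comment or ''}"
        out.append(line)
    return "\n".join(out) + "\n"

# Sample text that only imitates the config layout (an assumption).
sample = (
    "DIR:\n"
    "    SOURCE_ROOT : ./dataset/  # raw dataset\n"
    "    OUTPUT_ROOT_DIR : ./logs/\n"
    "    LOG_ROOT_DIR : ./logs/\n"
)

# Hypothetical Drive locations; replace with your own folders.
drive_root = "/content/drive/MyDrive/neural-audio-fp"
patched = override_yaml_values(sample, {
    "SOURCE_ROOT": f"{drive_root}/dataset/",
    "OUTPUT_ROOT_DIR": f"{drive_root}/logs/",
    "LOG_ROOT_DIR": f"{drive_root}/logs/",
})
print(patched)
```

To apply it to the real file, read config/default.yaml into a string, pass it through the function, and write the result back.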

About sharing a trained model: yes, I can. The plan for the next update is to write a one-page Colab demo that loads it.
But if you want to try it early, here is the link.

I really welcome feedback from Colab users. I feel this is the way for this open project to go.


Mihonarium avatar Mihonarium commented on May 30, 2024 1

I was able to run the training process in Colab with Miniconda, but just installing requirements without Miniconda leads to an error. #12 should fix it.

Restoring from that checkpoint doesn't work for some reason. It outputs a long list of messages like WARNING:tensorflow:Unresolved object in checkpoint: (root).optimizer's state 'v' for (root).model.div_enc.split_fc_layers.124.layer_with_weights-0.bias for all the layers, weights, etc., and this warning at the end:

WARNING:tensorflow:A checkpoint was restored (e.g. tf.train.Checkpoint.restore or tf.keras.Model.load_weights) but not all checkpointed values were used. See above for specific issues. Use expect_partial() on the load status object, e.g. tf.train.Checkpoint.restore(...).expect_partial(), to silence these warnings, or use assert_consumed() to make the check explicit. See https://www.tensorflow.org/guide/checkpoint#loading_mechanics for details


Mihonarium avatar Mihonarium commented on May 30, 2024 1

I got the error unsupported operand type(s) for +: 'PosixPath' and 'str' from line 306 of dataset.py when I tried to generate from a custom source.


mimbres avatar mimbres commented on May 30, 2024 1

@Mihonarium Solved by removing pathlib for argin. Also fixed the same issue for the --output option.
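For reference, this class of error can be reproduced with the standard library alone. The sketch below is not the code from dataset.py; the directory name and glob pattern are made up for illustration. It shows why `Path + str` fails and two common ways around it.

```python
from pathlib import Path

source = Path("my_audio")  # hypothetical custom source directory

# Path objects do not define `+` with strings, so this raises TypeError.
try:
    bad = source + "/*.wav"
except TypeError as err:
    message = str(err)
    print(message)

# Option 1: convert to str before concatenating.
pattern_a = str(source) + "/*.wav"

# Option 2: stay in pathlib and join with the `/` operator.
pattern_b = (source / "*.wav").as_posix()

print(pattern_a, pattern_b)
```

Either option works; the fix described above went the first way by dropping pathlib for that argument.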


Mihonarium avatar Mihonarium commented on May 30, 2024

Oh, sorry, I just saw that you actually use the mini dataset for training and the full one for a full-scale evaluation. Closing the issue


mimbres avatar mimbres commented on May 30, 2024

I am wondering whether faiss (required for constructing the search engine) can be installed smoothly in Colab. I haven't tried it yet. It is also an important prerequisite for developing the Colab demo. I'll test it out a bit tonight.

  • Installation of faiss-gpu on colab.


mimbres avatar mimbres commented on May 30, 2024

@Mihonarium Thanks for the report. Yes, it seems we don't need conda for Colab; a plain pip install works smoothly. Installing faiss-gpu was super smooth too: !pip install faiss-gpu.

About your checkpoint loading issue, let me ask:

  • Just use the config/640_lamb.yaml in repo.
  • Did you specify config? The command should be like:
python run.py train -c 640_lamb  640_lamb # ignore this line..
!python run.py generate -c 640_lamb 640_lamb 101

BTW, just try the generate command. Continuing training from a checkpoint made on a different type of device is an unusual scenario.
If you send me your notebook, I'll look at it tomorrow.


Mihonarium avatar Mihonarium commented on May 30, 2024

Yes, I did specify the config.

What's even stranger, the flood of warnings appears only with run.py train and not with generate.

The notebook: https://gist.github.com/Mihonarium/e3fd355cb560b82373fd2186139f1bc2 (the last cells show that generate and training from scratch work).


mimbres avatar mimbres commented on May 30, 2024

@Mihonarium Oh, this is expected behavior, as I wrote above. The checkpoint file contains the optimizer's state, which is GPU-device dependent. So continuing training with my checkpoint as the initial parameters is possible, but I didn't design for that use. You need to first load the model without attaching the optimizer (as in generate), then initialize the optimizer and start training.
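A rough sketch of that recipe with the public TensorFlow checkpoint API, not the code used in run.py. It assumes the checkpoint stores the network under the attribute name `model`, as the `(root).model...` paths in the warnings suggest. The helper name and the Adam optimizer in the usage note are stand-ins chosen for illustration.

```python
def restore_model_only(model, checkpoint_dir):
    """Restore only the model weights, ignoring the saved optimizer state.

    `model` is an already constructed tf.keras.Model (or any trackable)
    with the same architecture as the checkpointed one.
    """
    import tensorflow as tf  # imported lazily so the helper can be defined without TF

    # Track only the model; the optimizer slots in the file are left unused.
    ckpt = tf.train.Checkpoint(model=model)
    path = tf.train.latest_checkpoint(checkpoint_dir)
    if path is None:
        raise FileNotFoundError(f"No checkpoint found in {checkpoint_dir}")

    # expect_partial() silences the "unresolved object" warnings for the
    # optimizer values that are intentionally not restored.
    ckpt.restore(path).expect_partial()
    return path
```

After calling it, create a fresh optimizer (for example `tf.keras.optimizers.Adam()`, a stand-in for whatever the config specifies) and start training, so that no device-dependent optimizer state is carried over.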


mimbres avatar mimbres commented on May 30, 2024

@Mihonarium About the training-from-scratch error: first, for a P100 GPU, I recommend

BSZ:
    TR_BATCH_SZ : 320
        # Training batch size N must be EVEN number.
    TR_N_ANCHOR : 160

You didn't get an out-of-memory error, though, and this is not related to your issue.
I am now checking the CPU info of Colab.
In the config, try:

DEVICE:
    CPU_N_WORKERS : 4 # 4 for minimal system. 8 is recommended.
    CPU_MAX_QUEUE : 10 # 10 for minimal system. 20 is recommended.

It depends on how many threads the system can handle.
I will run it tomorrow.
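One quick way to check how many threads a runtime offers is `os.cpu_count()`. The sketch below maps that number to the two settings above. The cut-off of 8 threads and the helper name are assumptions for illustration, not a rule from the project.

```python
import os

def suggest_loader_settings(n_threads=None):
    """Pick the 'minimal' or 'recommended' values quoted above.

    Heuristic (an assumption): use the recommended values only when the
    machine reports at least 8 logical CPUs.
    """
    if n_threads is None:
        n_threads = os.cpu_count() or 1  # cpu_count() may return None
    if n_threads >= 8:
        return {"CPU_N_WORKERS": 8, "CPU_MAX_QUEUE": 20}
    return {"CPU_N_WORKERS": 4, "CPU_MAX_QUEUE": 10}

print(os.cpu_count(), suggest_loader_settings())
```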


Mihonarium avatar Mihonarium commented on May 30, 2024

it is an expected behavior as I wrote it above. The checkpoint file contains optimizer's states info which is GPU device dependent.

Got it, makes sense. Thanks!

Training from scratch didn't give any errors; I interrupted it myself. I included it to show that the errors come from loading the checkpoint (I didn't know that was expected behavior) and not from something else. You're right, though, that I would probably have hit an out-of-memory error if I had trained for longer. I was actually able to train the model successfully with a batch size of 320.


TheMightyRaider avatar TheMightyRaider commented on May 30, 2024

@mimbres @Mihonarium Would it be possible for you to share the trained model? It's quite hard to train with a batch size of 320. 🤞


TheMightyRaider avatar TheMightyRaider commented on May 30, 2024

Thanks! @Mihonarium


haha010508 avatar haha010508 commented on May 30, 2024

I used the pretrained model and the same database (Dataset-mini) for the evaluation step, but I got very poor results. Why? This is my command and its output:
`
CUDA_VISIBLE_DEVICES=1 python run.py evaluate 640_lamb 101.index -c 640_lamb
cli: Configuration from ./config/640_lamb.yaml
Load 29,500 items from ./logs/emb/640_lamb/101.index/query.mm.
Load 29,500 items from ./logs/emb/640_lamb/101.index/db.mm.
Load 581,922 items from ./logs/emb/640_lamb/101.index/dummy_db.mm.
Creating index: ivfpq
Copy index to GPU.
Training index...
Elapsed time: 23.07 seconds.
581922 items from dummy DB
29500 items from reference DB
Added total 611422 items to DB. 2.25 sec.
Created fake_recon_index, total 611422 items. 0.04 sec.
test_id: icassp, n_test: 2000
========= Top1 hit rate (%) of segment-level search =========
           ---------------- Query length ----------------
segments         1      3      5      9     11     19
seconds       (1s)   (2s)   (3s)   (5s)   (6s)  (10s)

Top1 exact    3.75   5.90   6.45   7.25   7.25   7.80
Top1 near     4.00   6.15   6.70   7.30   7.30   7.80
Top3 exact    4.40   7.00   7.85   8.60   8.45   8.95
Top10 exact   5.40   8.35   9.40  10.90  11.15  10.90

average search + evaluation time 7.25 ms/query
Saved test_ids and raw score to ./logs/emb/640_lamb/101.index/.
`
Do I need to retrain?
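For readers interpreting the table, here is a small sketch of how a top-k hit rate of this kind can be computed. It is not the project's evaluation code. It assumes segment IDs are integers that are consecutive in time, and it treats "near" as a match within one segment of the ground truth, which is an assumption about the metric. The toy data is made up.

```python
def topk_hit_rate(ranked_ids, true_ids, k=1, tolerance=0):
    """Percentage of queries whose ground-truth ID is among the top-k results.

    ranked_ids: per query, a list of retrieved segment IDs, best match first.
    true_ids:   per query, the ground-truth segment ID.
    tolerance:  0 for an exact match; 1 allows an off-by-one segment ("near").
    """
    hits = 0
    for candidates, truth in zip(ranked_ids, true_ids):
        if any(abs(c - truth) <= tolerance for c in candidates[:k]):
            hits += 1
    return 100.0 * hits / len(true_ids)

# Toy search results for four queries (made-up numbers).
ranked = [[10, 11, 50], [21, 20, 99], [7, 8, 9], [40, 41, 42]]
truth = [10, 20, 30, 41]

top1_exact = topk_hit_rate(ranked, truth, k=1)               # 25.0
top1_near = topk_hit_rate(ranked, truth, k=1, tolerance=1)   # 75.0
top3_exact = topk_hit_rate(ranked, truth, k=3)               # 75.0
print(top1_exact, top1_near, top3_exact)
```

Read this way, a Top1 exact rate of 3.75% with n_test: 2000 corresponds to 75 correctly matched queries.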

