MicrobeLab commented on June 8, 2024

You could try adding a print statement around lines 94-97 of seq2tfrec_onehot.py to make sure that a training set (rather than a test set) is being converted.

By the way, if you only want to reproduce 16S prediction using the seq2species model, the original implementation by Google might be helpful:

https://github.com/tensorflow/models/tree/master/research/seq2species

Bartvelp commented on June 8, 2024

Yes, I already made sure it is converted as a training set with the convert_advance_file function, and that function correctly extracts the information.

It turns out that input_fn_train is chosen based on the --encode_method flag, which I had not set, so it defaulted to kmer, which is of course wrong here. Setting --encode_method to one_hot fixes the TFRecord parsing, and training starts successfully.

Calculating the loss seems to fail, however, and I am not sure what is causing it.
I get this error:

tensorflow.python.framework.errors_impl.InvalidArgumentError: Tried to explicitly squeeze dimension 1 but dimension was not 1: 0
         [[Node: sparse_softmax_cross_entropy_loss/remove_squeezable_dimensions/Squeeze = Squeeze[T=DT_INT64, squeeze_dims=[-1], _device="/job:localhost/replica:0/task:0/device:CPU:0"](IteratorGetNext:1)]]

Full log here

Any idea on how to fix this/what causes this?

P.S. I found the original paper and code very convoluted and difficult to work with, and I am also interested in trying the other models in this repo.

MicrobeLab commented on June 8, 2024

I'm not really sure about the solution, but I think the problem lies in the training data (e.g., the length of the DNA sequences) rather than in the model. The model needs a --max_len flag, whose default value is 150 bp. Try setting it to the maximum length of your full-length 16S sequences.

Bartvelp commented on June 8, 2024

Oops, I see I forgot to include the command I ran:
DeepMicrobes.py --input_tfrec=combined_train_small.tfrec --model_name=seq2species --model_dir=seq2species_new_weights_small --max_len=400 --encode_method=one_hot
(I trimmed the sequences to 400 bp.)
So that should not be the problem. When I did forget to set --max_len, I got an error about padding to a length shorter than the original sequence.
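
The --max_len error makes sense once you picture the padding step: every sequence is turned into a fixed-size matrix, and padding can only add rows, never remove them. Below is a minimal sketch of that idea, assuming a simple A/C/G/T one-hot scheme with zero-padding at the end. The function name one_hot_pad is hypothetical, and DeepMicrobes' own encoder in seq2tfrec_onehot.py may order or pad things differently.

```python
from typing import List

# Hypothetical encoding: one row per position, columns in A, C, G, T order.
# DeepMicrobes' own one-hot scheme (seq2tfrec_onehot.py) may differ.
BASES = "ACGT"


def one_hot_pad(seq: str, max_len: int = 150) -> List[List[int]]:
    """One-hot encode a DNA sequence and zero-pad it to exactly max_len rows."""
    if len(seq) > max_len:
        # Padding cannot shrink a sequence, so an overly long read is an error.
        raise ValueError(
            f"sequence length {len(seq)} exceeds max_len={max_len}; "
            "raise max_len or trim the sequences"
        )
    rows = []
    for base in seq.upper():
        row = [0, 0, 0, 0]
        if base in BASES:
            row[BASES.index(base)] = 1
        # Ambiguous bases such as N are left as an all-zero row.
        rows.append(row)
    # Append fresh all-zero rows until the matrix is max_len long.
    rows.extend([0, 0, 0, 0] for _ in range(max_len - len(seq)))
    return rows
```

For example, one_hot_pad("ACGN", max_len=6) returns six rows: one-hot rows for A, C and G, an all-zero row for N, and two all-zero padding rows. A 400 bp read with the default max_len of 150 raises instead, which matches the behaviour described when the flag is left unset.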

MicrobeLab commented on June 8, 2024

Try deleting the model_dir (rm -rf seq2species_new_weights_small) and running again.

Bartvelp commented on June 8, 2024

Still no luck, unfortunately.
Log
This is my repo, in case you are puzzled by the print statements:
github.com/Bartvelp/DeepMicrobes_clone

MicrobeLab commented on June 8, 2024

You should set the --num_classes flag to your actual number of categories. The default value is --num_classes=2505 (I had 2505 species for the pre-trained model).
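
A quick way to pick a safe value is to look at the labels themselves. Sparse softmax cross-entropy (as in TensorFlow's tf.losses.sparse_softmax_cross_entropy) expects every label to be an integer class index in [0, num_classes). The sketch below assumes the labels have already been read out of the TFRecord into a list of such integers (that reading step is not shown), and the helper names are hypothetical, not part of DeepMicrobes.

```python
from typing import Iterable


def required_num_classes(labels: Iterable[int]) -> int:
    """Smallest --num_classes that covers every integer label in `labels`.

    Sparse softmax cross-entropy expects each label to lie in
    [0, num_classes), so the largest label plus one is the minimum.
    """
    labels = list(labels)
    if not labels:
        # An empty label set suggests the records carry no labels at all.
        raise ValueError("no labels found")
    if min(labels) < 0:
        raise ValueError("labels must be non-negative integers")
    return max(labels) + 1


def check_num_classes(labels: Iterable[int], num_classes: int) -> None:
    """Raise if any label would fall outside [0, num_classes)."""
    needed = required_num_classes(labels)
    if needed > num_classes:
        raise ValueError(
            f"labels need num_classes >= {needed}, but num_classes={num_classes}"
        )
```

This only catches a value that is too small. Leaving the default of 2505 for a smaller label set would pass the check, but the output layer would then include classes that never occur in the data.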

Bartvelp commented on June 8, 2024

Yes, thank you, I forgot that.

I figured it out: due to a weird bug or something, my TFRecord file did not contain the classes/labels. When I recreated it, everything worked out of the box. Thanks a lot for your help!
Closing
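
In hindsight the earlier error message fits this cause. One plausible reading, assuming the loss receives labels shaped [batch, 1] and that missing labels come out of parsing as an empty variable-length feature: the loss squeezes away the last label dimension, and with no labels that dimension has size 0 instead of 1, hence "dimension was not 1: 0". The pure-Python sketch mimics that check and flags unlabeled examples; the function names are hypothetical and belong to neither DeepMicrobes nor TensorFlow.

```python
from typing import List, Sequence


def squeeze_labels(label_batch: Sequence[Sequence[int]]) -> List[int]:
    """Squeeze a [batch, 1] label matrix down to a flat [batch] list.

    Mirrors the rule behind TensorFlow's Squeeze op: the squeezed dimension
    must have size exactly 1. Rows of width 0 (no label stored) fail it.
    """
    for row in label_batch:
        if len(row) != 1:
            raise ValueError(
                f"Tried to squeeze dimension 1 but dimension was not 1: {len(row)}"
            )
    return [row[0] for row in label_batch]


def unlabeled_indices(label_batch: Sequence[Sequence[int]]) -> List[int]:
    """Return the positions of examples that carry no label value."""
    return [i for i, row in enumerate(label_batch) if len(row) == 0]
```

Running a check like unlabeled_indices over a sample of parsed records before training would surface a label-less TFRecord directly, rather than through the loss.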

from deepmicrobes.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.