
Comments (10)

pichuan commented on August 26, 2024

Hi,
can you tell me where you got your inception_v3.ckpt* model files?
And, can you paste the content of your test_train.config.txt file?

from deepvariant.

pichuan commented on August 26, 2024

Oh!! By the way, if you want to train a model, you shouldn't be using make_examples.zip (like you listed in your example). You should be using: model_train.zip.

Here is an example of a recent small training run that I did:
python "${BIN_DIR}"/model_train.zip \
  --dataset_config_pbtxt /home/pichuan/training/training.dataset_config.pbtxt \
  --train_dir /home/pichuan/train_dir \
  --start_from_checkpoint /home/pichuan/inception_v3_ckpt/model.ckpt

My training.dataset_config.pbtxt looks like this:
name: "small_dataset"
tfrecord_path: "/home/pichuan/training/small_dataset.training.shuffled.examples-?????-of-00010.tfrecord.gz"
num_examples: 72178


TuBieJun commented on August 26, 2024

Sorry, I really did use model_train.zip; I just pasted the wrong command into the issue.
This is my correct command:

python /leostore/software/deepvariant/bazel-bin/deepvariant/model_train.zip --dataset_config_pbtxt "/leostore/analysis/development/liteng/deepvariant_test/test_train.config.txt" --start_from_checkpoint inception_v3.ckpt

The inception_v3.ckpt was downloaded from https://github.com/tensorflow/models/tree/master/research/slim#Data
This is my config file:

name: "test-training-dataset"
tfrecord_path: "/leostore/analysis/development/liteng/deepvariant_test/train_set/test_train.tfrecord.gz"
num_examples: 1


pichuan commented on August 26, 2024

I suggest that you try a dataset with more training examples.
Can you try providing a tfrecord_path with more examples (let's say, 100 or 1000 examples?) and provide the corresponding num_examples, and let us know if that works for you?


TuBieJun commented on August 26, 2024

I only changed "num_examples" from 1 to 2, without adding any tfrecord.gz files, and then it started running. But it is really slow: it has been running for two hours and is still going. Is this normal? The tfrecord.gz file is about 2.4 MB.


Here is my config.txt:

name: "test-training-dataset"
tfrecord_path: "/leostore/analysis/development/liteng/deepvariant_test/train_set/test_train.tfrecord.gz"
num_examples: 2

This is my command:

python /leostore/software/deepvariant/bazel-bin/deepvariant/model_train.zip --dataset_config_pbtxt /leostore/analysis/development/liteng/deepvariant_test/test_train.config.txt --start_from_checkpoint inception_v3.ckpt


pichuan commented on August 26, 2024

First of all, num_examples is meant to be the number of tensorflow.Example records that are actually in the file(s) tfrecord_path points to. You should set it correctly. (I assume you're not actually training on only 2 examples?)

Second, if you want to run training, I highly recommend that you use a machine with a GPU.
See: https://github.com/google/deepvariant/blob/r0.4/docs/deepvariant-details.md on how to build with GPU.


TuBieJun commented on August 26, 2024

The tfrecord file I used comes from one sample's data. By the way, if I have many tfrecord files that come from several different samples, what should I set "tfrecord_path" to? Should I use a wildcard like "/leostore/analysis/development/liteng/deepvariant_test/train_set/*tfrecord.gz", or just use the "cat" command to combine all the tfrecord files into one tfrecord file?


pgrosu commented on August 26, 2024

Based on the proto code at the link below, it seems to require full paths without wildcards:

// Full path of the tensorflow.Example TFRecord file.
string tfrecord_path = 2;


pichuan commented on August 26, 2024

If I remember correctly, wildcards like * and ? should work. We can probably improve the comment there.
But concatenating everything together works too. You can directly cat all *tfrecord.gz into another big all.tfrecord.gz file. I would suggest trying wildcard first though.
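
As a rough illustration of what the ? and * wildcards select, here is a minimal sketch using Python's fnmatch module. It is only a stand-in: DeepVariant resolves tfrecord_path through TensorFlow's own file globbing, which may differ in details from fnmatch, and the file names below are hypothetical shard names modeled on the paths earlier in this thread.

```python
import fnmatch

# Hypothetical file names, modeled on the paths earlier in this thread.
candidates = [
    "small_dataset.training.shuffled.examples-00000-of-00010.tfrecord.gz",
    "small_dataset.training.shuffled.examples-00009-of-00010.tfrecord.gz",
    "small_dataset.training.shuffled.examples-0001-of-00010.tfrecord.gz",
    "test_train.tfrecord.gz",
]

# Each "?" matches exactly one character, so "?????" needs a five-character shard index.
shard_pattern = "small_dataset.training.shuffled.examples-?????-of-00010.tfrecord.gz"
# "*" matches any run of characters, including an empty one.
star_pattern = "*tfrecord.gz"

shard_matches = [n for n in candidates if fnmatch.fnmatchcase(n, shard_pattern)]
star_matches = [n for n in candidates if fnmatch.fnmatchcase(n, star_pattern)]

print(len(shard_matches), "file(s) match the ? pattern")
print(len(star_matches), "file(s) match the * pattern")
```

The third name has only a four-character index, so the ? pattern skips it, while the * pattern picks up every file that ends in tfrecord.gz.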

In terms of how to set num_examples: for now, if you know roughly how many examples you have (I can't remember whether make_examples prints out that information), you can just set a rough number. It's only being used here:
https://github.com/google/deepvariant/blob/r0.4/deepvariant/model_train.py#L211
It does affect the learning rate decay, but it doesn't have to be exact.
I'll see if I can come back later with a better example of how to count examples.
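
To see why a rough num_examples is usually good enough, here is a minimal sketch of a staircase exponential decay schedule in the style commonly used with TF-Slim training scripts. It is not copied from model_train.py; the formula, the function names, and the batch size, epochs-per-decay, and decay-factor values are all assumptions for illustration.

```python
def decay_steps(num_examples, batch_size, num_epochs_per_decay):
    """Number of training steps between learning-rate drops (assumed formula)."""
    return int(num_examples / batch_size * num_epochs_per_decay)


def learning_rate(step, initial_lr, decay_factor, steps_per_decay):
    """Staircase exponential decay: the rate drops once every steps_per_decay steps."""
    if steps_per_decay <= 0:
        raise ValueError("steps_per_decay must be positive")
    return initial_lr * decay_factor ** (step // steps_per_decay)


# Hypothetical hyperparameters; only num_examples=72178 comes from this thread.
BATCH_SIZE = 64
EPOCHS_PER_DECAY = 2.0

exact = decay_steps(72178, BATCH_SIZE, EPOCHS_PER_DECAY)
rough = decay_steps(70000, BATCH_SIZE, EPOCHS_PER_DECAY)
print("decay interval with exact count:", exact)
print("decay interval with rough count:", rough)
print("rate at step 5000 (exact):", learning_rate(5000, 0.001, 0.94, exact))
print("rate at step 5000 (rough):", learning_rate(5000, 0.001, 0.94, rough))
```

Under these assumed values, rounding 72178 down to 70000 moves the decay interval by about 3 percent, so the schedule shifts slightly but keeps its shape.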


pichuan commented on August 26, 2024

Update:
if you look at https://github.com/google/deepvariant/blob/r0.5/docs/visualizing_examples.ipynb
the code in read_tfrecords is an example of how you can read a tfrecord file, and count examples if you like.
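
If TensorFlow is not at hand, records can also be counted with only the Python standard library. The sketch below relies on the TFRecord on-disk framing (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload). The helpers count_tfrecords and count_matching are hypothetical names, not part of DeepVariant; the sketch skips CRC verification, and it uses Python's glob, which may not match TensorFlow's wildcard handling in every detail.

```python
import glob
import gzip
import struct


def count_tfrecords(path):
    """Count records in one TFRecord file (gzip-compressed if the name ends in .gz)."""
    opener = gzip.open if path.endswith(".gz") else open
    count = 0
    with opener(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte record length + 4-byte CRC of the length
            if not header:
                break  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated record header in " + path)
            (length,) = struct.unpack("<Q", header[:8])
            body = f.read(length + 4)  # payload + 4-byte CRC of the payload
            if len(body) < length + 4:
                raise ValueError("truncated record in " + path)
            count += 1
    return count


def count_matching(pattern):
    """Sum record counts over every file matching a wildcard pattern."""
    return sum(count_tfrecords(p) for p in sorted(glob.glob(pattern)))
```

Calling count_matching with the same wildcard string used for tfrecord_path would give a number to put in num_examples.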

