

StephenYangjz commented on September 23, 2024

Thank you! Reinstalling the package made it go away; it seems to have been a kernel/package issue.

from meshgpt-pytorch.

MarcusLoppe commented on September 23, 2024

Hi Stephen,

So, MeshDataset is a class I created for meshgpt_pytorch. I made a pull request for it, but I'm not sure why it wasn't accepted.
Whatever the reason, the only differences between my fork and meshgpt-pytorch are the modified trainer class (it trains by epochs instead of steps and reports progress with tqdm) and MeshDataset.

If you'd like to use my MeshDataset, you can install my fork or just copy and paste MeshDataset into your code.
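Here is a minimal plain-Python sketch of "training by epochs" with a per-epoch progress report and an optional early stop. It is not the fork's actual trainer. The names `train_by_epochs` and `step_fn` are hypothetical, as is the toy data. The sketch also assumes that `stop_at_loss` is compared against an epoch's mean loss. In a real trainer a tqdm progress bar would take the place of the print call.

```python
import random
from typing import Callable, List, Optional, Sequence


def train_by_epochs(
    dataset: Sequence,
    step_fn: Callable[[list], float],
    num_epochs: int,
    batch_size: int = 8,
    stop_at_loss: Optional[float] = None,
    seed: int = 0,
) -> List[float]:
    """Run up to `num_epochs` passes over `dataset`; return the mean loss per epoch.

    Hypothetical sketch: `step_fn` stands in for one optimizer step on a batch
    and returns that batch's loss.
    """
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    epoch_losses: List[float] = []
    for epoch in range(num_epochs):
        rng.shuffle(indices)  # new sample order every epoch
        batch_losses = []
        for start in range(0, len(indices), batch_size):
            batch = [dataset[i] for i in indices[start:start + batch_size]]
            batch_losses.append(step_fn(batch))
        mean_loss = sum(batch_losses) / len(batch_losses)
        epoch_losses.append(mean_loss)
        # a progress bar (e.g. tqdm) would normally display this per epoch
        print(f"epoch {epoch + 1}/{num_epochs}: loss {mean_loss:.4f}")
        if stop_at_loss is not None and mean_loss <= stop_at_loss:
            break  # assumed semantics: stop once the epoch mean reaches the target
    return epoch_losses
```

Counting in epochs means one "unit" of training is always one full pass over the dataset, whatever its size. The early-stop check then runs once per pass rather than after every batch.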


StephenYangjz commented on September 23, 2024

That resolves it (the init also needs to be updated). Thanks!


StephenYangjz commented on September 23, 2024

Hi @MarcusLoppe, I don't think I had this issue before, but now I'm getting:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[10], line 11
      1 # autoencoder_trainer = MeshAutoencoderTrainer(model =autoencoder ,warmup_steps = 10, dataset = dataset, num_train_steps=100,
      2 #                                              batch_size=8,
      3 #                                              grad_accum_every=2,
      4 #                                              learning_rate = 1e-2)
      5 # loss = autoencoder_trainer.train(280,stop_at_loss = 0.7, diplay_graph= True)
      7 autoencoder_trainer = MeshAutoencoderTrainer(model =autoencoder ,warmup_steps = 10, dataset = dataset, num_train_steps=100,
      8                                              batch_size=8,
      9                                              grad_accum_every=2,
     10                                              learning_rate = 4e-3)
---> 11 loss = autoencoder_trainer.train(280,stop_at_loss = 0.28, diplay_graph= True)

TypeError: train() got an unexpected keyword argument 'stop_at_loss'

Do you by any chance have any pointers? Thank you!


MarcusLoppe commented on September 23, 2024

Oh, I'm not sure. In my fork, the train function is defined as:
def train(self, num_epochs, stop_at_loss = None, diplay_graph = False):

Python environments can behave strangely after reinstalling packages; have you tried restarting the notebook kernel?
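A `TypeError` about an unexpected keyword argument usually means the imported `train()` is a different version from the one you expect. A quick check is to inspect its signature with the standard library's `inspect` module. The helper `accepts_kwarg` and the two stand-in classes below are hypothetical. `ForkTrainer.train` copies the signature quoted above, and `StepTrainer` represents any trainer whose `train()` lacks `stop_at_loss`.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """Return True if `func` can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # a **kwargs parameter also accepts arbitrary keyword arguments
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


class ForkTrainer:
    # stand-in with the fork's train() signature as quoted in this thread
    def train(self, num_epochs, stop_at_loss=None, diplay_graph=False):
        pass


class StepTrainer:
    # hypothetical stand-in for a trainer whose train() has no stop_at_loss
    def train(self):
        pass
```

In the notebook, you could call `accepts_kwarg(MeshAutoencoderTrainer.train, "stop_at_loss")` to see which version was imported. `inspect.getfile(MeshAutoencoderTrainer)` shows which installed copy it was loaded from. If it is the wrong one, reinstall and restart the kernel.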

I'm currently running the code below and it works.
By the way, I should have removed one of the autoencoder_trainer definitions so that there is only one.
I've found it better to start training at a low learning rate: it keeps the commit loss steadier, and I haven't noticed any improvement from using a higher learning rate at the start.
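A linear warmup is one common way to start training at a low learning rate. The trainer's `warmup_steps` argument suggests something similar, but the exact schedule `MeshAutoencoderTrainer` uses is not shown in this thread. This sketch is therefore an assumption: the hypothetical `warmup_lr` ramps linearly up to the base rate and then holds it constant.

```python
def warmup_lr(step: int, base_lr: float, warmup_steps: int) -> float:
    """Learning rate at optimizer step `step` (0-based) under a linear warmup.

    Ramps from base_lr / warmup_steps up to base_lr over `warmup_steps` steps,
    then stays at base_lr. Hypothetical helper, not the trainer's own scheduler.
    """
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


# values from the example configuration in this reply: learning_rate=1e-4, warmup_steps=100
schedule = [warmup_lr(s, 1e-4, 100) for s in (0, 49, 99, 500)]
```

With these settings, the first step runs at 1e-6 and the rate reaches 1e-4 at step 99. This keeps early updates small while the codebook and commit loss settle.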

Also, target a batch size of 64. If you have enough VRAM, set batch_size to 64 and grad_accum_every to 1; a larger batch size means faster training.
When training on a large dataset, you can set commit_loss_weight to 0.25; otherwise the commit loss can shoot up into the hundreds. This way it puts pressure on the encoder to compress the tokens better.
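The sketch below assumes the autoencoder's objective is the usual VQ-style sum of a reconstruction term and a weighted commitment term. The helper `total_loss` is hypothetical and only illustrates that arithmetic, so check the library source for the exact formula. Under that assumption, `commit_loss_weight` directly scales how much a large commit loss contributes to the total.

```python
def total_loss(recon_loss: float, commit_loss: float, commit_loss_weight: float) -> float:
    """Assumed objective: reconstruction loss + commit_loss_weight * commit loss."""
    return recon_loss + commit_loss_weight * commit_loss


# a commit loss of 100 adds 100 at weight 1.0 but only 25 at weight 0.25
full_weight = total_loss(2.0, 100.0, 1.0)
quarter_weight = total_loss(2.0, 100.0, 0.25)
```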

If you don't have enough VRAM for that, aim for a total effective batch size of 64 by adjusting grad_accum_every so that:
batch_size * grad_accum_every = 64
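This toy example shows why batch_size * grad_accum_every acts as the effective batch size. Averaging the gradients of 4 micro-batches of 16 gives the same gradient as one batch of 64. The 1-D model, the data and the helper names are hypothetical. A real trainer does this with framework gradients, but the arithmetic is the same as long as each micro-batch loss is a mean and the micro-batches are equally sized.

```python
from typing import List, Tuple


def grad_mse(w: float, batch: List[Tuple[float, float]]) -> float:
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w: float, data: List[Tuple[float, float]],
                     batch_size: int, grad_accum_every: int) -> float:
    """Average per-micro-batch gradients, as gradient accumulation does."""
    assert len(data) == batch_size * grad_accum_every
    total = 0.0
    for i in range(grad_accum_every):
        micro = data[i * batch_size:(i + 1) * batch_size]
        # scale each micro-batch by 1 / grad_accum_every so the sum becomes a mean
        total += grad_mse(w, micro) / grad_accum_every
    return total


def grad_accum_for(target: int, batch_size: int) -> int:
    """grad_accum_every needed so that batch_size * grad_accum_every == target."""
    if target % batch_size:
        raise ValueError("target must be a multiple of batch_size")
    return target // batch_size


data = [(float(i), 2.0 * i + 1.0) for i in range(64)]
full_batch = grad_mse(0.5, data)
accumulated = accumulated_grad(0.5, data, batch_size=16, grad_accum_every=4)
```

For example, grad_accum_for(64, 16) returns 4, which matches the batch_size=16, grad_accum_every=4 configuration in this reply.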

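from meshgpt_pytorch import MeshAutoencoderTrainer  # the epoch-based train() comes from MarcusLoppe's fork

# assumes `autoencoder` and `dataset` were created earlier in the notebook
save_name = "16k_2_4"
batch_size = 16  # 16 * grad_accum_every (4) = effective batch size of 64

autoencoder.commit_loss_weight = 0.25
autoencoder_trainer = MeshAutoencoderTrainer(model=autoencoder, warmup_steps=100, dataset=dataset, num_train_steps=100,
                                             batch_size=batch_size,
                                             grad_accum_every=4,
                                             learning_rate=1e-4,
                                             checkpoint_every_epoch=1)
loss = autoencoder_trainer.train(480, stop_at_loss=0.2, diplay_graph=True)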
