Comments (12)
I also have the same issue... Googling doesn't seem to turn up anything on this error
from meshgpt-pytorch.
I think they dropped support for dropout in SAGEConv. I just removed the argument and that resolved it.
e.g.:
from:
sageconv_kwargs = {**sageconv_kwargs, 'sageconv_dropout' : sageconv_dropout}
to:
sageconv_kwargs = {**sageconv_kwargs}
from meshgpt-pytorch.
Hey @MarcusLoppe thanks -- the default is 0 anyway so I guess we can do that
from meshgpt-pytorch.
Another error I'm getting is:
File ~/anaconda3/envs/meshgpt/lib/python3.9/site-packages/meshgpt_pytorch/mesh_dataset.py:106, in <listcomp>(.0)
     98 batch_codes = autoencoder.tokenize(
     99     vertices=padded_batch_vertices,
    100     faces=padded_batch_faces,
    101     face_edges=padded_batch_face_edges
    102 )
    ...
--> 106 item['codes'] = [code for code in codes if code != autoencoder.pad_id and code != -1]
    108 self.sort_dataset_keys()
    109 print(f"[MeshDataset] Generated codes for {len(self.data)} entrys")
Is this still an issue specific to my end as well? :(
Another idea I'm thinking of is to generate meshes with bounding boxes as input text token parameters. Do you think this is possible with the setup we have if we insert that info into the training loop? @MarcusLoppe Thank you so much for all the help!
from meshgpt-pytorch.
Ah no sorry! I always used to generate tokens using 1 item per batch to avoid VRAM issues.
But since doing that for 218k items takes a while I implemented batch processing. I made a few mistakes but the latest commit should have resolved that issue.
Sorry about that, I was doing the changes while half asleep :)
On the bounding box idea: you mean providing it with more details about the wanted mesh model size?
The autoencoder 'simplifies'/discretizes the mesh so that the lowest point on an axis is 0 and the highest is 127.
This way all the inputs are the same size, since they share the same dimensions, and it lets the autoencoder learn in a more uniform way.
You can increase this value (num_discrete_coors) if you are dealing with very big meshes. Otherwise it seems to work fine with the meshes I'm currently dealing with.
So I don't think it would matter much. It might be more helpful for the transformer if you provide descriptive words about the shape, e.g. "very big chair" or something like that. Then it will create a relationship between that mesh model and the text 'very big'; doing this over all the big meshes will create a correlation for the transformer and help it understand what 'big' means.
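To make the 0 to 127 idea above concrete, here is a minimal sketch of per-axis min/max discretization. It only illustrates the description in this comment and is not the library's code: the helper names `discretize_axis` and `discretize_vertices` are made up, and the library itself may normalize against a different range.

```python
def discretize_axis(values, num_discrete_coors=128):
    """Map one axis of continuous coordinates to integer bins 0..num_discrete_coors-1."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Degenerate axis (all points share one coordinate): everything lands in bin 0.
        return [0] * len(values)
    top = num_discrete_coors - 1
    # Scale into [0, num_discrete_coors] and clamp the maximum into the last bin.
    return [min(top, int((v - lo) / span * num_discrete_coors)) for v in values]


def discretize_vertices(vertices, num_discrete_coors=128):
    """Discretize a list of (x, y, z) vertices independently per axis (hypothetical helper)."""
    axes = [discretize_axis(list(axis), num_discrete_coors) for axis in zip(*vertices)]
    return [tuple(v) for v in zip(*axes)]


vertices = [(-1.5, 0.0, 2.0), (0.0, 0.5, 2.5), (1.5, 1.0, 3.0)]
print(discretize_vertices(vertices))
```

Because every mesh ends up in the same integer grid, a small chair and a large chair look identical after this step, which is why size hints would have to come from the text label instead.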
from meshgpt-pytorch.
Thank you so much @MarcusLoppe -- that makes a lot of sense! May I also know the reason why you have two trainers -- is it just because you want to do learning rate scheduling at different losses?
trainer = MeshTransformerTrainer(model = transformer, warmup_steps = 10, grad_accum_every = 4, num_train_steps = 100, dataset = dataset,
                                 learning_rate = 1e-3, batch_size = 8)
loss = trainer.train(100, stop_at_loss = 0.009)

trainer = MeshTransformerTrainer(model = transformer, warmup_steps = 10, grad_accum_every = 4, num_train_steps = 100, dataset = dataset,
                                 learning_rate = 5e-4, batch_size = 8)
loss = trainer.train(200, stop_at_loss = 0.00001)
from meshgpt-pytorch.
Correct, at the start I thought that the autoencoder could benefit from a higher learning rate early in training.
But I discovered that it didn't really matter.
I tried to use the existing LRScheduler class, but the lr scheduler that handles the stepping of the learning rate doesn't derive from _LRScheduler. Accelerate requires an lr scheduler of type _LRScheduler, so that one isn't compatible with the accelerator.
I also tried to recreate it by hand, but there were so many issues that I didn't bother finishing it. So I just stop at a specific loss and set up the training again with a lower learning rate :)
Depending on hardware resources, it might be worth using a higher learning rate on the transformer, but I'm not 100% sure.
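The "stop at a loss, then restart with a lower learning rate" workaround can be written as a small loop over stages. This is only a sketch under assumptions: `train_in_stages` and `FakeTrainer` are hypothetical names, and `FakeTrainer` just halves a stored loss each epoch so the example runs without the library; the only thing taken from this thread is the `train(num_epochs, stop_at_loss=...)` call shape.

```python
class FakeTrainer:
    """Stand-in for a real trainer so the sketch runs without a GPU or dataset."""

    def __init__(self, state, learning_rate):
        self.state = state                  # shared dict holding the current loss
        self.learning_rate = learning_rate  # stored only; the fake loss ignores it

    def train(self, num_epochs, stop_at_loss=None):
        for _ in range(num_epochs):
            # Pretend each epoch halves the loss.
            self.state["loss"] *= 0.5
            if stop_at_loss is not None and self.state["loss"] < stop_at_loss:
                break
        return self.state["loss"]


def train_in_stages(make_trainer, stages):
    """Run stages back to back; each stage builds a fresh trainer with its own learning rate."""
    loss = None
    for learning_rate, num_epochs, stop_at_loss in stages:
        trainer = make_trainer(learning_rate)
        loss = trainer.train(num_epochs, stop_at_loss=stop_at_loss)
    return loss


state = {"loss": 1.0}
stages = [
    (1e-3, 100, 0.009),    # higher learning rate until the loss drops below 0.009
    (5e-4, 200, 0.00001),  # then a lower learning rate down to 1e-5
]
final_loss = train_in_stages(lambda lr: FakeTrainer(state, lr), stages)
print(final_loss)
```

With the real trainer, `make_trainer` would be a function that builds a new trainer around the same model with the given learning rate, so the weights carry over between stages.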
from meshgpt-pytorch.
Got it, thank you! I am thinking of getting more resources from my lab and training it on Objaverse (thinking of training this on the order of thousands of objects at least). Do you have a rough estimate of the compute needed, and whether it's easy to parallelize over multiple GPUs? Happy to share a pretrained model afterward and I'd like to hear what you think! @MarcusLoppe
from meshgpt-pytorch.
I've only trained with a maximum of 250 faces per mesh. That was on 14k 3D mesh models (x15 augments), using 4 encoder and 8 decoder attention layers, which gave a total of 75M parameters.
The result was very good at 0.36 MSE loss and took only about 20hrs on a single P100.
Check out the discussion I posted, Pre-trained autoencoder & data sourcing #66; you can see the results and a link to the Google Drive with the rendered output & model.
Each epoch took 2.5hrs, so it was about 8-9 epochs over the 218k dataset.
It seems like the attention layers are a must if you don't want to train much longer; without attention I got to 0.48 loss after 30hrs+ of training.
When I used 1 encoder and 1 decoder layer it got down to 0.42 loss, so more attention layers definitely help.
However, the transformer takes much longer: using the model below, it takes about 4.5hrs per epoch with a batch size of 8 (grad_accum_every of 8).
If you have enough VRAM, use a dim size of 1024 and an attn_depth of either 12 or 24. For context, GPT-2 Medium uses 1024 & 24 attention layers, I believe.
The autoencoder seems to scale with model size, so it's probably faster/better to train the transformer as big as possible.
Depending on the GPUs it will probably take a day or two; in the paper they did 4 days with 4 A100s, I believe.
However, this repo has had massive amounts of upgrades compared to the paper.
I managed to train the autoencoder on 14k models in only 1 day on a single P100, while it took them 2 days on x4 A100.
So hopefully something similar might be the case for the transformer.
transformer = MeshTransformer(
    autoencoder,
    dim = 512,
    attn_depth = 24,
    attn_heads = 8,
    coarse_pre_gateloop_depth = 6,
    fine_pre_gateloop_depth = 4,
    max_seq_len = max_seq,
    condition_on_text = True,
    text_condition_model_types = "bge",
    text_condition_cond_drop_prob = 0.01,
)
from meshgpt-pytorch.
@MarcusLoppe Thank you so much for the response! That's super helpful :)
For now, I've been training with the demo mesh, with an autoencoder loss of 0.277, and the transformer loss plateaued at around 0.005. I tried to keep training on a 3090 for a couple more hours and the results don't seem to get much better than what I have below (when using only text prompts). Do you think this is expected, or do you by any chance have any insights? Thanks in advance!
from meshgpt-pytorch.
Try providing it with 10-50% of the tokens for a model and see. The original paper never used text as a guide, only the tokens it was prompted with. The text guidance seems to be pretty weak at the start of the generation, but if it's given about 10% of the tokens, that jump-starts the generation and it outputs a very good mesh.
I've had some difficulty when training on small datasets. If it's just one shape it's fine and the transformer can over-fit the model.
However, when dealing with multiple shapes it gets harder. I'll probably swap the demo meshes back to the old poly mesh.
Without providing any tokens I've gotten decent results with a chair, table and sofa dataset, but the only meshes I've output flawlessly using text-to-3D are the 4 basic shapes (cube, cone, etc.).
But when providing about 10% of the tokens to jump-start it, the transformer does very well.
The issue when dealing with more complex shapes is that it needs to be more generalized.
That's why I stopped dealing with small datasets of 400-800 models and moved on to the larger 14k model dataset.
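As a rough illustration of the "give it about 10% of the tokens" suggestion, the sketch below cuts a prompt off the front of a token sequence. `make_prompt` is a hypothetical helper, and the default of 6 tokens per face is an assumption (3 vertices times 2 quantizer codes); check it against your autoencoder settings before relying on it.

```python
def make_prompt(codes, fraction=0.1, tokens_per_face=6):
    """Return the leading `fraction` of a token sequence, rounded down to whole faces."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    keep = int(len(codes) * fraction)
    keep -= keep % tokens_per_face  # do not cut a face in half (assumed face size)
    return codes[:keep]


codes = list(range(600))  # placeholder for one mesh's tokenized codes
prompt = make_prompt(codes, fraction=0.1)
print(len(prompt))
```

The resulting slice would then be handed to the transformer's generation call as the starting tokens, alongside the text condition.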
from meshgpt-pytorch.
That makes a lot of sense, thank you :)
from meshgpt-pytorch.