@MarcusLoppe wait, those chairs are generated from the transformer? if you increase the temperature, can you get a chair that is dissimilar to the ones found in the dataset?
Okay, here is the result of stepping the temperature by 0.1 from 0.1 to 1.0.
Since it's trained on only one model, it doesn't know any other shapes, so it just gets messed up :)
from meshgpt-pytorch.
@MarcusLoppe you made my day
@fire yeah, don't worry about it, just show the autoencoder works as in the paper without caveat, and I can figure out the attention portion. I know all there is to know about it
@MarcusLoppe wait, those chairs are generated from the transformer?
Yes, below are the training loops. I trained for 10 epochs (a 200-example dataset), but it seems like 3-5 epochs would work as well. I used a 1e-3 lr for the encoder and a 1e-2 lr for the transformer, since it's training on only one shape.
@MarcusLoppe did you leave this flag on btw? you may have inadvertently proved out a twist to a latest quantization research if so
I left everything as default, but I created my own trainer, so I didn't use the warmup.
Not sure; I've tested it on versions 0.1.1 & 0.1.12.
@MarcusLoppe ah amazing. you proved out residual LFQ without knowing it. i can probably skip out on the stochastic sampling temperature annealing logic. that complexity is not needed anymore
thank you thank you
Turning off use_residual_lfq makes it pretty bad.
If you want to test, try out my notebook. As a warning, it's very ugly, debug-style code, just something I smashed together quickly.
Use the generate_spheres function if you'd like to test something fast, since it's only 80 faces.
https://file.io/y1mpUmYSJctm
@MarcusLoppe you may not know it, but collect a few more nice lines and you got yourself a short arxiv paper. you can ask chatgpt to fill in the boring expose
there is no published work on residual LFQ
I'll give you the honour since I have no idea what it is :)
Just to be sure, since I used other versions to train the encoder in the previous screenshot, I tried again with the latest version and use_residual_lfq = True, and it showed significant improvements as expected. :)
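For readers unfamiliar with the term: lookup-free quantization (LFQ) quantizes each latent dimension to its sign, so the code index is simply the bit pattern, and a residual variant repeats this on whatever error is left over, at a smaller scale. The pure-Python sketch below illustrates that idea only; the halving scale schedule and the function names are illustrative assumptions, not the vector-quantize-pytorch implementation that use_residual_lfq switches on.

```python
from typing import List, Tuple

def lfq(x: List[float], scale: float) -> Tuple[List[float], int]:
    """Quantize each dim to +scale (if > 0) or -scale; the code index is the sign bit pattern."""
    quantized = [scale if v > 0 else -scale for v in x]
    # Bit i is set when dimension i is positive, so a d-dim latent maps to one of 2**d codes.
    index = sum(1 << i for i, v in enumerate(x) if v > 0)
    return quantized, index

def residual_lfq(x: List[float], num_quantizers: int) -> Tuple[List[float], List[int]]:
    """Apply LFQ repeatedly to the remaining residual, halving the scale at each level (assumed schedule)."""
    residual = list(x)
    total = [0.0] * len(x)
    indices = []
    for level in range(num_quantizers):
        q, idx = lfq(residual, scale=2.0 ** -level)
        indices.append(idx)
        total = [t + qi for t, qi in zip(total, q)]
        residual = [r - qi for r, qi in zip(residual, q)]
    return total, indices

x = [0.7, -0.3, 0.1]
approx, codes = residual_lfq(x, num_quantizers=2)
print(approx, codes)  # [0.5, -0.5, 0.5] [5, 2]
```

Each extra quantizer level refines the previous approximation, which is the intuition for why the residual version reconstructs better than a single LFQ step.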
@MarcusLoppe ok, it is done! 6x shorter sequence length. i haven't fixed the kv cache yet, so inference will be slow, but let me know if you can still overfit to your repeated chair dataset (and you can try some larger meshes too)
Very nice, the RAM usage has dropped quite a lot; I can even train the transformer with batch size 16, and it then only reaches a total of 14 GB.
The memory usage didn't move when I trained with batch size 1; it went up from 2699 MB to 4291 MB when training with batch size 4 (240 faces).
So, a very successful implementation! :)
I managed to generate the chair again successfully, but I have not had any success using the text conditioning yet. It seems to slow down the training quite a lot. Maybe implement caching for the tokens the text conditioner generates?
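One way to do the caching suggested here is to memoize the text embedding per unique caption, so repeated captions (such as the same chair 200 times) are encoded only once. A minimal sketch, assuming a hypothetical `embed_text` function standing in for the text conditioner; it is not part of the meshgpt-pytorch API.

```python
from functools import lru_cache
from typing import Tuple

CALLS = {"count": 0}  # counts how often the expensive encoder actually runs

def embed_text(text: str) -> Tuple[float, ...]:
    """Hypothetical stand-in for an expensive text encoder call."""
    CALLS["count"] += 1
    return tuple(float(ord(c)) for c in text[:4])

@lru_cache(maxsize=None)
def cached_embed_text(text: str) -> Tuple[float, ...]:
    # The result is an immutable tuple, so handing out the cached object is safe.
    return embed_text(text)

captions = ["chair"] * 200 + ["table"] * 50
embeddings = [cached_embed_text(c) for c in captions]
print(CALLS["count"])  # 2: the encoder ran once per unique caption
```

With real tensors the same idea applies, but the cached embeddings should be detached and kept on the right device.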
I have to say that it takes a lot of VRAM to run the trainer/transformer; it's very compute intensive.
Yup, reduce the batch sizes if you are running out of memory. But it's much better today than yesterday.
@MarcusLoppe ok, it is done! 6x shorter sequence length. i haven't fixed the kv cache yet, so inference will be slow, but let me know if you can still overfit to your repeated chair dataset (and you can try some larger meshes too)
I'm trying to train on a 4k-triangle model, but the encoder runs out of memory pretty fast:
OutOfMemoryError: CUDA out of memory. Tried to allocate 61.65 GiB......
If I train on a model with 466 vertices and 852 faces:
The encoder uses 4.2 GB @ 1 batch size.
The Transformer uses 5 GB @ 1 batch size.
This means that the new efficiency for the transformer is (852 faces x 6 tokens) / 5 GB = 5112 / 5 ≈ 1022 tokens ≈ 170 triangles per 1 GB.
This equals a 4x increase in efficiency per 1 GB of VRAM.
Not quite the 6x, but I'm guessing there is some other factor that increases the memory requirement.
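The efficiency arithmetic can be checked in a few lines. This sketch assumes 6 tokens per face, as described in this thread, and that VRAM scales linearly with token count, which is a simplification; the 252 tokens/GB baseline is the pre-change figure quoted elsewhere in the thread.

```python
TOKENS_PER_FACE = 6  # assumption from this thread: each triangle becomes 6 tokens

def tokens_per_gb(num_faces: int, vram_gb: float) -> float:
    """Tokens per GB of VRAM, assuming memory grows linearly with sequence length."""
    return num_faces * TOKENS_PER_FACE / vram_gb

new = tokens_per_gb(852, 5.0)  # 852 faces -> 5112 tokens in 5 GB
old = 252.0                    # earlier tokens/GB figure quoted in this thread
print(round(new), round(new / TOKENS_PER_FACE), round(new / old, 1))  # 1022 170 4.1
```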
@MarcusLoppe oh strange, it runs for me
nvm some indexing error, gimme 10
Success :)
Now it's at 1.7 GB @ batch size 1.
Should I set linear_attention to True?
yeah, this is my first time using a graph convolution, so forgive the bugs
Unforgivable
The inference time is quite bad (it hovers around 10 iter/s), but that isn't a priority at the moment.
yup will fix by weekend, it is tricky with hierarchical transformers
yes!
Great work with the optimizations, it's very nice to be able to have a batch size of 1 :D
Do you think adding some dropout in the resnet will improve the performance?
Setting linear_attention to True worsens the performance, in both speed and accuracy.
Setting it to True gets the loss stuck at 0.3, while turning it off lets the loss go below 0.2.
thanks for sharing the linear attention results
I thought about it and decided to update where they are placed. If that doesn't work, we can just go with local attention
Some more tests; each epoch is 2000 steps/examples.
It seems that when I increased the dataset size, the old linear_attention almost caught up.
Weirdly enough, linear_attn_depth = 0 has the best results.
@MarcusLoppe once people start seeing some shapes being generated by the attention network, i can start applying my expertise here. mainly i will bring in RQ transformer (which will incidentally allow one to increase D. i think they kept it at 2 because of issues you see). can also bring in reversible networks
@MarcusLoppe are you using flash attention?
@MarcusLoppe if you show me that the attention network is able to generate a novel shape, then i'll start on this issue. as they say in software, "make it work, make it right, make it fast, in that order"
Another idea is some sort of multiply token, where the last triangle can affect the next triangle instead of the sequence just continuing.
This is like emoji encoding
@MarcusLoppe once people start seeing some shapes being generated by the attention network, i can start applying my expertise here. mainly i will bring in RQ transformer (which will incidentally allow one to increase D. i think they kept it at 2 because of issues you see). can also bring in reversible networks
Alright, sounds great :) Do you think that will have such a huge effect? It needs at least a 10x increase if you want to train on 4000-face models, and more if you want a higher batch size (4000 faces * 6 = 24,000 tokens; 24,000 tokens / 10 GB = 2,400 tokens/GB, while the current efficiency is 252 tokens/GB).
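The target can be computed directly. A minimal sketch, assuming 6 tokens per face and that memory scales linearly with the total number of tokens in a batch (a simplification; full attention does not scale this way); the 252 tokens/GB value is the current efficiency quoted in the comment.

```python
TOKENS_PER_FACE = 6  # assumption from this thread: each triangle becomes 6 tokens

def required_tokens_per_gb(num_faces: int, vram_budget_gb: float, batch_size: int = 1) -> float:
    """Token density needed to fit a batch of num_faces-face meshes into the VRAM budget."""
    return batch_size * num_faces * TOKENS_PER_FACE / vram_budget_gb

current = 252.0  # current efficiency quoted in the comment, in tokens/GB
target = required_tokens_per_gb(4000, 10.0)
print(target, round(target / current, 1))  # 2400.0 9.5
print(round(required_tokens_per_gb(4000, 10.0, batch_size=4) / current, 1))  # 38.1
```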
@MarcusLoppe are you using flash attention?
Yup.
@MarcusLoppe if you show me that the attention network is able to generate a novel shape, then i'll start on this issue. as they say in software, "make it work, make it right, make it fast, in that order"
Well, I did, but without the text conditioner: I trained the encoder & transformer on a 240-face chair for 1000 steps (200 examples x 5 epochs) and was able to generate a visually identical chair.
Here are the generated mesh and the ground truth:
https://file.io/VCuXAwfJ4zDc
Here is the comparison, the left side is the generated one and on the right side it's the ground truth.
The only difference I see is that the corners of the generated one aren't rounded.
@MarcusLoppe No way, that is an amazing result. How much training time did it take and on which dataset?
amazing. i'll start work on the reversible network + other efficient transformer tricks tomorrow
Yep, it seems pretty nice, but don't get your hopes up, since the dataset is just the same chair repeated 200 times; still, it's at least a rough proof of concept.
I'll try to modify the temperature.
@MarcusLoppe just curious, but are you an academic, independent researcher, startup founder? you got this working quite quickly!
Unemployed :D But it wasn't too hard to test it out.
Btw, I got the error below with the latest version when importing MeshAutoencoder.
     21 class DatasetFromTransforms(Dataset):
     22     @beartype
     23     def __init__(
     24         self,
     25         folder: str,
---> 26         transforms: Dict[str, Callable[Path, Tuple[Vertices, Faces]]]

TypeError: Expected a list of types, an ellipsis, ParamSpec, or Concatenate. Got <class 'pathlib.Path'>
@MarcusLoppe overfitting to a small dataset is always the first step in "make it work" for deep learning. so this is good news
you mean funemployed. you are in the right place if you are trying to break into ML
oops, let me fix
mmm. So does translating, rotating and scaling affect the results? The paper mentions they use that to get more data.
@MarcusLoppe ok, let me know if that type error was fixed
@fire yea, it is just standard data augmentations. you do this for any modality you train with
Would be interested in helping write a paper if there are any novel results we discover here (https://github.com/lucidrains/meshgpt-pytorch).
ok, i'll step on the gas pedal a bit starting tomorrow
Do you think that the ideas you got in mind will make it effective enough to train using 3D models with 4k triangles?
Most 3D models have at least 4k-12k triangles; I've been struggling to find low-poly models.
If I set the batch size to 1 and train on a 4k-triangle model, it will require 94 GB of VRAM.
If I want to train it using only 10 GB of VRAM, the memory requirement must be lowered by roughly 10x for a 4k-triangle model.
I think using bit-packing techniques is possible, since the positional coordinates use 1/128 quantization and a 12k-triangle mesh has fewer than 2^16 (65,536) vertices.
I'll try out some math estimates.
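To make the bit-packing idea concrete: with coordinates quantized into 128 bins, each coordinate fits in 7 bits, so a vertex packs into 21 bits, and a vertex index below 2^16 fits in 16 bits. A minimal pure-Python sketch; the [-1, 1] coordinate range, the packing layout, and the function names are illustrative assumptions. Note that packing shrinks storage, not the 6-tokens-per-face sequence length that drives transformer memory.

```python
from typing import Tuple

BITS_PER_COORD = 7               # 128 = 2**7 discrete coordinate bins
NUM_BINS = 1 << BITS_PER_COORD

def quantize(coord: float) -> int:
    """Map a coordinate in [-1, 1] to one of 128 integer bins (clamped at the edges)."""
    scaled = int((coord + 1.0) / 2.0 * NUM_BINS)
    return min(max(scaled, 0), NUM_BINS - 1)

def pack_vertex(x: int, y: int, z: int) -> int:
    """Pack three 7-bit bins into one 21-bit integer: x in the high bits, z in the low bits."""
    return (x << (2 * BITS_PER_COORD)) | (y << BITS_PER_COORD) | z

def unpack_vertex(packed: int) -> Tuple[int, int, int]:
    mask = NUM_BINS - 1
    return ((packed >> (2 * BITS_PER_COORD)) & mask,
            (packed >> BITS_PER_COORD) & mask,
            packed & mask)

bins = tuple(quantize(c) for c in (-1.0, 0.0, 0.999))
packed = pack_vertex(*bins)
print(bins, packed, unpack_vertex(packed) == bins)  # (0, 64, 127) 8319 True

# Even if a 12k-face mesh shared no vertices (3 per face), indices would still fit in 16 bits.
assert 12_000 * 3 < 2 ** 16
```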
@MarcusLoppe Yeah I can make it work. I'm an expert in this arena
however, you should make sure flash attention is working properly as a first step. what type of GPU do you have?
I think using bit-packing techniques is possible, since the positional coordinates use 1/128 quantization and a 12k-triangle mesh has fewer than 2^16 (65,536) vertices.
I'll try out some math estimates.
I might be misunderstanding, but the issue lies in the token sequence length, since it uses 6 tokens per face; the position coordinates don't take much space.
@MarcusLoppe Yeah I can make it work. I'm an expert in this arena
however, you should make sure flash attention is working properly as a first step. what type of GPU do you have?
I'm using Kaggle's free P100 GPU (16 GB).
I turned it off and the loss got considerably worse, so it does something.
Do you know if we can add an extension token, i.e. a special kind of token that extends the meaning of a base token? For example, encoding the most common face sequences as one token instead of 6n tokens.
Wikipedia:
A dictionary coder, also sometimes known as a substitution coder, is a class of lossless data compression algorithms which operate by searching for matches between the text to be compressed and a set of strings contained in a data structure (called the 'dictionary') maintained by the encoder. When the encoder finds such a match, it substitutes a reference to the string's position in the data structure.
Grammar-based codes or Grammar-based compression are compression algorithms based on the idea of constructing a context-free grammar (CFG) for the string to be compressed.
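A simple instance of the dictionary idea is a byte-pair-encoding style merge: repeatedly find the most frequent adjacent token pair, replace it with a new token id, and record the rule so the sequence can be expanded again. The sketch below is an illustrative assumption, not something meshgpt-pytorch does; the token ids are hypothetical, and every merge grows the vocabulary by one.

```python
from collections import Counter
from typing import Dict, List, Tuple

def most_frequent_pair(tokens: List[int]) -> Tuple[int, int]:
    return Counter(zip(tokens, tokens[1:])).most_common(1)[0][0]

def merge_pair(tokens: List[int], pair: Tuple[int, int], new_id: int) -> List[int]:
    """Replace every non-overlapping occurrence of pair with new_id, scanning left to right."""
    out, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out

def compress(tokens: List[int], num_merges: int, first_new_id: int) -> Tuple[List[int], Dict[int, Tuple[int, int]]]:
    rules: Dict[int, Tuple[int, int]] = {}
    for k in range(num_merges):
        if len(tokens) < 2:
            break
        pair = most_frequent_pair(tokens)
        new_id = first_new_id + k
        rules[new_id] = pair
        tokens = merge_pair(tokens, pair, new_id)
    return tokens, rules

def decompress(tokens: List[int], rules: Dict[int, Tuple[int, int]]) -> List[int]:
    """Recursively expand merged ids back into the original tokens."""
    out: List[int] = []
    for t in tokens:
        out.extend(decompress(list(rules[t]), rules) if t in rules else [t])
    return out

seq = [3, 7, 3, 7, 3, 7, 1, 2]  # hypothetical token ids
short, rules = compress(seq, num_merges=2, first_new_id=1000)
print(short, rules)  # [1001, 1000, 1, 2] {1000: (3, 7), 1001: (1000, 1000)}
```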
@fire don't worry about it
more fruitful would be if you focused on the data portion, ie functions for converting all formats into the tensors needed for training, or augmentation
I now support ".glb", ".gltf", ".ply", ".obj", ".stl" in the MeshDataset of the Github pull request #6.
I wasn't able to get all shapes to match numerically and in sorted order. I don't think trying to solve the many-imports problem will improve meshgpt, but for data augmentation one approach is uniform scaling, rotating the chair so that its four legs stay on the floor, and translating it along the floor, i.e. axis-locked.
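The axis-locked augmentation can be sketched as three steps: a uniform scale about the origin, a rotation about the vertical axis, and a translation in the floor plane. This sketch assumes y is up and the mesh rests on the floor at y = 0; the function name and the parameter ranges are illustrative assumptions.

```python
import math
import random
from typing import List, Tuple

Vertex = Tuple[float, float, float]

def augment_axis_locked(vertices: List[Vertex], rng: random.Random) -> List[Vertex]:
    """Uniform scale, rotate about the vertical (y) axis, translate in the x-z floor plane.

    Assumes y is up and the mesh rests on the floor at y = 0, so every step keeps it there.
    """
    scale = rng.uniform(0.75, 1.25)           # hypothetical scale range
    angle = rng.uniform(0.0, 2.0 * math.pi)
    dx, dz = rng.uniform(-0.1, 0.1), rng.uniform(-0.1, 0.1)  # hypothetical shift range
    c, s = math.cos(angle), math.sin(angle)
    out = []
    for x, y, z in vertices:
        x, y, z = x * scale, y * scale, z * scale  # uniform scale about the origin
        x, z = c * x + s * z, -s * x + c * z       # rotation about the y axis
        out.append((x + dx, y, z + dz))            # translation along the floor only
    return out

chair = [(-0.5, 0.0, -0.5), (0.5, 0.0, 0.5), (0.0, 1.0, 0.0)]
augmented = augment_axis_locked(chair, random.Random(0))
print(min(v[1] for v in augmented))  # 0.0: the legs stay on the floor
```

Because the scale is uniform and positive, the rotation leaves y untouched, and the translation has no y component, points on the floor stay on the floor.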
ok, the RQ transformer design has crystallized during my sleep. i think i can build it this morning, and bring the sequence length down by 6x (for starters). the hardest part of the whole thing is maintaining two kv caches for the hierarchical transformers
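The 6x reduction comes from grouping: if each face is 6 tokens, a coarse transformer can attend over one position per face, while a small fine transformer predicts the 6 tokens within each face. The pure-Python sketch below only shows the reshaping and the resulting change in attention length; the sum-of-embeddings pooling and the toy embedding are assumptions for illustration, not the repository's RQ transformer code.

```python
from typing import Callable, List

TOKENS_PER_FACE = 6  # assumption from this thread: each face is 6 tokens

def group_into_faces(tokens: List[int]) -> List[List[int]]:
    """Reshape a flat (num_faces * 6) token sequence into num_faces groups of 6."""
    if len(tokens) % TOKENS_PER_FACE != 0:
        raise ValueError("sequence must contain whole faces")
    return [tokens[i:i + TOKENS_PER_FACE] for i in range(0, len(tokens), TOKENS_PER_FACE)]

def pool_face(group: List[int], embed: Callable[[int], List[float]]) -> List[float]:
    """Sum the token embeddings of one face into a single coarse position (assumed pooling)."""
    vectors = [embed(t) for t in group]
    return [sum(dims) for dims in zip(*vectors)]

def toy_embed(token: int) -> List[float]:
    """Hypothetical 2-dim embedding, only for illustration."""
    return [float(token), 1.0]

flat = list(range(240 * TOKENS_PER_FACE))  # a 240-face mesh -> 1440 tokens
faces = group_into_faces(flat)
coarse = [pool_face(g, toy_embed) for g in faces]
print(len(flat), len(coarse))  # 1440 240
```

The coarse transformer then sees 240 positions instead of 1440, which is where the shorter sequence length comes from.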
@MarcusLoppe Can you help me out on this, as to what might be the issue that I am getting?
I set the transformer temperature to 0.1. I use a window mesh with sizes:
torch.Size([286, 3])
torch.Size([200, 3])
@MarcusLoppe ok, it is done! 6x shorter sequence length. i haven't fixed the kv cache yet, so inference will be slow, but let me know if you can still overfit to your repeated chair dataset (and you can try some larger meshes too)
Awesome, I'll do some tests. But is it possible to also optimize the encoder? It takes 5 GB for a 240-face sequence; that's less than the transformer, but still a limitation if you want to go for longer sequences.
@MarcusLoppe the autoencoder doesn't have full attention, so memory should scale linearly. I can bring in some tricks there later if needed
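To illustrate why full attention, rather than the autoencoder, dominates memory at long sequence lengths: a naively materialized attention score matrix has one entry per query-key pair, so it grows quadratically with sequence length, while an ordinary activation grows linearly. A back-of-the-envelope sketch assuming fp32, a hypothetical 8 heads, dim 512, 6 tokens per face, and no flash attention (which avoids storing the full score matrix).

```python
BYTES_FP32 = 4

def attention_scores_gib(seq_len: int, heads: int = 8, batch: int = 1) -> float:
    """One dense seq_len x seq_len fp32 score matrix per head, in GiB (naive attention)."""
    return batch * heads * seq_len * seq_len * BYTES_FP32 / 2**30

def activation_gib(seq_len: int, dim: int = 512, batch: int = 1) -> float:
    """One seq_len x dim fp32 activation tensor, in GiB; grows linearly with seq_len."""
    return batch * seq_len * dim * BYTES_FP32 / 2**30

for faces in (240, 852, 4000):
    n = faces * 6  # assumed 6 tokens per face
    print(f"{faces} faces: scores {attention_scores_gib(n):.3f} GiB, "
          f"activation {activation_gib(n):.4f} GiB")
```

Doubling the sequence length quadruples the score matrix but only doubles the activation, which is why a layer without full attention scales much more gracefully.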
@MarcusLoppe Can you help me out on this, as to what might be the issue that I am getting? I set the transformer temperature to 0.1. I use a window mesh with sizes: torch.Size([286, 3]) torch.Size([200, 3])
I've noticed it's very sensitive. I had to set the encoder learning rate to 1e-3 and the transformer to 1e-2 and train exactly 10 epochs with 200 examples per epoch.
Sometimes, when I trained the transformer for only 7 epochs, the generation was messed up because the loss was still too high (e.g. 0.04 vs 0.003).
I hope this will resolve itself when it trains on a larger dataset since it can generalize better.
success! out with doggo but will be back later
@MarcusLoppe interesting re: autoencoder
want to try the latest version? turned off the linear attention
@MarcusLoppe autoencoder should be even more efficient in the latest version
Did you test it? It's stuck at the start of training; it won't train with either my train function or the forward function.
runs for me now with the script below - also found a bug where it was auto deriving more face edges than there are, may explain why the memory was a bit high!
import torch
from random import randrange
from torch.utils.data import Dataset

from meshgpt_pytorch import (
    MeshAutoencoder,
    MeshTransformer,
    MeshAutoencoderTrainer,
    MeshTransformerTrainer,
    DatasetFromTransforms
)
from meshgpt_pytorch.data import (
    derive_face_edges_from_faces
)

# autoencoder

autoencoder = MeshAutoencoder(
    dim = 512,
    encoder_depth = 6,
    decoder_depth = 6,
    num_discrete_coors = 128,
    linear_attention = True
)

# mock dataset: each item is (vertices, faces)

class MockDataset(Dataset):
    def __init__(self):
        pass

    def __len__(self):
        return 100

    def __getitem__(self, idx):
        # 10-19 random xyz vertices, and 4-7 triangles whose indices point into the first 10 vertices
        return torch.randn(randrange(10, 20), 3), torch.randint(0, 10, (randrange(4, 8), 3))

# tiny CPU-only smoke test of the autoencoder trainer

trainer = MeshAutoencoderTrainer(
    autoencoder,
    dataset = MockDataset(),
    batch_size = 2,
    grad_accum_every = 2,
    num_train_steps = 10,
    checkpoint_every = 5,
    accelerator_kwargs = dict(
        cpu = True
    )
)

trainer()
@MarcusLoppe thank you
decided to switch it to local attention, the conservative choice
Do you think adding some dropout in the resnet will improve the performance?
yup, it could help for the autoencoder in general, should be available and customizable!
main issue should be resolved