
Comments (13)

MarcusLoppe commented on June 21, 2024

@lucidrains @shanemankiw

At the start of the project I tested smaller group sizes and even tried layernorm, but GroupNorm seemed better at the time since it gave better loss improvements.

But I kept running into the issue that 90% of the models reconstructed perfectly while the rest were massively messed up.
At first I thought some shapes were simply easier to generalize than others.
But it may have been that some shapes were normalized too aggressively, so anything outside the 'norm' of the batch average got squished.
An example of this: the dataset contains many similar-looking thick chairs and tables, but the models that are a bit off the norm, like a one-legged chair or a super-thin glass table, got messed up quite a bit.

I've been testing with layernorm now and that issue seems to be gone!
The 'catastrophic forgetting' is no longer a problem, and it seems like even a 2k codebook manages to store the shapes accurately!
Using such a small codebook might even mean I can release a little demo, since the transformer won't take too much time to train. After that I'll start right away on the holodeck.
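
For reference, the kind of swap I'm talking about looks roughly like this (just a toy illustration, not the actual meshgpt-pytorch modules):

import torch
from torch import nn

dim = 128
x = torch.randn(2, dim, 500)  # (batch, channels, faces)

# GroupNorm variant: normalizes each sample over channel groups and the face dimension together
group_norm = nn.GroupNorm(num_groups = 8, num_channels = dim)
y_group = group_norm(x)

# LayerNorm variant: normalizes each face's feature vector independently
layer_norm = nn.LayerNorm(dim)
y_layer = layer_norm(x.transpose(1, 2)).transpose(1, 2)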

Here is an example of the 'catastrophic forgetting' (I think it had around 0.4 loss with a somewhat bigger parameter count (33M+), probably 12h+ of training):
[image]

vs
Training using (almost) the same parameters as the paper (15M); this would never have worked in the past.
0.43 loss after 4hrs:
[image]

PS: Here are the parameters. I used smaller embedding dims since larger ones were causing problems (the defaults create a total embedding size of about 840 vs the paper's 192).

from meshgpt_pytorch import MeshAutoencoder

num_layers = 23

autoencoder = MeshAutoencoder(
    decoder_dims_through_depth = (128,) * 3 + (192,) * 4 + (256,) * num_layers + (384,) * 3,
    dim_codebook = 192,
    codebook_size = 2048,
    dim_area_embed = 16,
    dim_coor_embed = 16,
    dim_normal_embed = 16,
    dim_angle_embed = 8
)


lucidrains commented on June 21, 2024

@shanemankiw thank you for those results 🙏 i've made the change in 1.1.0

just in the nick of time! alright, time to get back to those emails. go make the holodeck happen 😉


shanemankiw commented on June 21, 2024

Sure I will try it! But I will have to get back to you after I wake up in the morning, in no less than 7-8 hours...
I highly suspect the overfitting results would be similar, at least on my humble little dataset. From my understanding, as long as the normalization is over the 128-dim feature dimension and has nothing to do with the sequence-length dimension, the results should be fine. But let's wait for the results.
btw, why do you favor pixelnorm over layernorm? I am not very familiar with pixelnorm's advantages


shanemankiw commented on June 21, 2024

Thank you for the heads up about pixelnorm, very informative!
The experiment results are indeed similar; the loss curves look almost the same (ignore the 100/280 difference in the run names...):
[image]
And I checked the mesh reconstructions; they are flawless as well.


shanemankiw commented on June 21, 2024

@lucidrains Thanks for your efforts!

@MarcusLoppe Thanks for the experiments! The 'catastrophic forgetting' problem you talked about is precisely what made me start debugging. You would think that a model this size could figure out a way to overfit on a few hundred meshes, but it always failed on around 10% of the cases. In the paper, MeshGPT could achieve 98% accuracy even on the test set, so this is definitely not normal...
The thing about the loss is that, even if you can achieve a low loss under GroupNorm at batchsize>1, the output would not be the same during evaluation at batchsize=1.
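
A toy sketch of what I mean (my own illustration, assuming the norm sees the padded positions; the real model may handle masking differently): when GroupNorm is applied over a padded (batch, channels, faces) tensor, the padded zeros are counted in the per-sample statistics, so the same mesh normalizes differently than it would unpadded at batchsize=1.

import torch
from torch import nn

torch.manual_seed(0)
norm = nn.GroupNorm(num_groups = 1, num_channels = 64)

mesh = torch.randn(1, 64, 100)    # one mesh with 100 faces
padded = torch.zeros(1, 64, 250)  # same mesh, zero-padded as it would be inside a larger batch
padded[..., :100] = mesh

out_alone = norm(mesh)
out_padded = norm(padded)[..., :100]

# the padded zeros shift the mean/variance, so the two outputs disagree
print((out_alone - out_padded).abs().max())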


lucidrains commented on June 21, 2024

Hi Jionghao and thanks for the kind words

Your results line up with some papers I've been reading recently. Could you try a pixelnorm, either in place of the groupnorm, or in the direct main path, and see if it leads to comparable results to your layernorm run? Today is my last day open sourcing, but I can throw in this last change if you get the experiments to me in time


lucidrains commented on June 21, 2024

@shanemankiw there's a trend in transformers to remove the mean centering in layernorms (rmsnorm), so it lines up with Tero Karras' usage of pixelnorm
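
roughly, the difference looks like this (just a quick sketch, not the exact modules in the repo) - pixelnorm / rmsnorm rescale by the root mean square over the feature dimension without any mean centering, while layernorm also subtracts the mean

import torch

def pixel_norm(x, dim = -1, eps = 1e-4):
    # rescale by the RMS over the feature dimension, no mean centering
    return x * torch.rsqrt(x.pow(2).mean(dim = dim, keepdim = True) + eps)

def layer_norm(x, dim = -1, eps = 1e-5):
    # subtract the mean, then rescale by the standard deviation
    x = x - x.mean(dim = dim, keepdim = True)
    return x * torch.rsqrt(x.var(dim = dim, unbiased = False, keepdim = True) + eps)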


lucidrains commented on June 21, 2024

@MarcusLoppe awesome! thanks for the corroboration!

you should switch into the field.. i really think you have a lot of potential

even your name is initialed ML lol

PS: i'm not kidding about the holodeck. in a decade, mark my words


MarcusLoppe commented on June 21, 2024

> @MarcusLoppe Thanks for the experiments! The 'catastrophic forgetting' problem you talked about is precisely what made me start debugging. You would think that a model this size could figure out a way to overfit on a few hundred meshes, but it always failed on around 10% of the cases. In the paper, MeshGPT could achieve 98% accuracy even on the test set, so this is definitely not normal... The thing about the loss is that, even if you can achieve a low loss under GroupNorm at batchsize>1, the output would not be the same during evaluation at batchsize=1.

In the above I used 150 chairs and 150 tables and augmented each x50, so the dataset is 15,000 meshes.
I ran a test over the 15,000 meshes and got these MAE results: Avg: 0.004, Min: 0.0028, Max: 0.016.
I ran the code below to calculate the MAE; as you can see, it's not batch processing.

I'm pretty sure that it's possible to get great results as in the paper.

import torch
from tqdm import tqdm

maes = []

for item in tqdm(dataset.data, desc="Processing samples"):
    codes = autoencoder.tokenize(
        vertices = item['vertices'],
        faces = item['faces'],
        face_edges = item['face_edges']
    )
    codes = codes.flatten().unsqueeze(0)
    # truncate to an even number of codes
    codes = codes[:, :codes.shape[-1] // 2 * 2]

    coords, mask = autoencoder.decode_from_codes_to_faces(codes)
    orgs = item['vertices'][item['faces']].unsqueeze(0)

    # mean absolute error between original and reconstructed face coordinates
    abs_diff = torch.abs(orgs.view(-1, 3).cpu() - coords.view(-1, 3).cpu())
    maes.append(torch.mean(abs_diff))
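
The Avg/Min/Max numbers above then just come from summarizing the collected values, something like:

maes = torch.stack(maes)
print(f"Avg: {maes.mean():.4f}, Min: {maes.min():.4f}, Max: {maes.max():.4f}")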

While running I stored the worst and best results; in the image the first row is the best and the second row the worst, the rest are 40 random samples.
In my book, that is a perfect result!
[image]

Here is the worst mesh; you can see some defects, but that's pretty good after a few hours of training!
[image]

I trained across 16 different categories with 50 models each (800 models total) and augmented them x100 (80k meshes). I let it run for about 10hrs and got 0.5 MSE loss. The results usually get good at 0.4 loss, so some fragments are expected.
I used a 2k codebook size to test whether the chairs and tables were just such simple shapes that they could be compressed into a small codebook, but it seems like even loads of different shapes can be compressed!
Although a hint that the codebook size is a bit off is that the commit loss was high when I restarted the training run; it usually gets lower after training in the same session for a while.

[image]


MarcusLoppe commented on June 21, 2024

> @MarcusLoppe awesome! thanks for the corroboration!
>
> you should switch into the field.. i really think you have a lot of potential
>
> even your name is initialed ML lol
>
> PS: i'm not kidding about the holodeck. in a decade, mark my words

I heard about ring attention on the Last Week in AI podcast; it seems like they used it with sparse attention and a dozen other small things. I'm not quite sure it lives up to the hype. In the testing I've seen they ask it about one thing in the context window, but what if you ask it an abstract question where it needs to find 10-20 needles in the haystack/context window? 😕

Maybe. I'm not an ML programmer and don't know how to debug a model the way @shanemankiw did; if I did, I might've been able to resolve this issue a long time ago :(

But I like using and training them in my software. For example, I used Mistral-7B to extract the requirements from job adverts and output them as JSON; I was extracting information such as hard skills, soft skills, certifications, company culture, education and other qualifications.
It's not perfect, but I got around 89k labels from about 4k job adverts in 12 hours.
Then I extracted the data from the JSON output and fine-tuned a reranker on the different labels, using unmatched body text or the other labels as negatives.

I then converted it to an ONNX model, used it in my ASP.NET backend, and made a nice little React front-end. This way you can quickly sift through many job ads so you don't waste time reading the whole thing only to realize they want 7+ years of experience :)

Notice the 'job duties' it marked? :) It knows too much 😨
[image]


shanemankiw commented on June 21, 2024

@MarcusLoppe Your results are great! Thank you so much for sharing. All of this with only a 2k codebook? This thing sure has a lot of potential.
btw I don't know if I am qualified to say this, but I concur with all the nice things @lucidrains said about you. My tests on this project could not have gone anywhere without your notebook demo! Besides, the way you design and present your experiments is fantastic, and your results in multiple issues have been extremely helpful.


MarcusLoppe commented on June 21, 2024

> @MarcusLoppe Your results are great! Thank you so much for sharing. All of this with only a 2k codebook? This thing sure has a lot of potential. btw I don't know if I am qualified to say this, but I concur with all the nice things @lucidrains said about you. My tests on this project could not have gone anywhere without your notebook demo! Besides, the way you design and present your experiments is fantastic, and your results in multiple issues have been extremely helpful.

Correct, only using 2k.
Here it is at 0.42 loss; there are some fragments, but there was still some room for improvement in the loss.
During an 11hr run on Kaggle's free P100 I went from 0.45 loss @ 0.8 commit loss to 0.4235 @ 0.58 commit loss.
I think this means that it can still compress the meshes some more.

Thank you very much :) I appreciate your and @lucidrains' comments; not many people in real life care about this, so it's refreshing and heartwarming to get some compliments :)

https://file.io/Mpg7AoUYoBgC (the mse_rows(63) file contains the original models plus the reconstructed ones)
[image]


MarcusLoppe commented on June 21, 2024

@shanemankiw
I got some strange results...
I was thinking of how AlphaGeometry managed to get results with a relatively small model; it has a vocab of 757 tokens and a 1024 context window.
They talked about testing with a small vocab size to compress the information and reduce the complexity.
So I used 400 meshes (fewer than 250 faces each) from 16 categories and augmented them x50, resulting in about 20k meshes.

Then I tested using a 128 codebook size and had great success: 0 fragments, and it took only about 2hrs to reach 0.44 loss.
The commit loss was consistently low, so I guess reaching any sort of good result requires you to somehow estimate the right codebook size. It sort of explains the earlier bad results, and that you need to adjust the model to the dataset.

You'll probably need a bigger codebook for more meshes, but when dealing with a smaller/test dataset it's probably better to use a smaller codebook size.
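
For reference, that run used the same kind of config as earlier, just with a much smaller codebook (illustrative, not my exact settings):

autoencoder = MeshAutoencoder(
    dim_codebook = 192,
    codebook_size = 128,  # much smaller than the earlier 2048-entry codebook
    dim_area_embed = 16,
    dim_coor_embed = 16,
    dim_normal_embed = 16,
    dim_angle_embed = 8
)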

[image]

