
Comments (8)

xichenpan commented on August 26, 2024

The second point is a bit different, though.
We use the 1st and 2nd raw images to properly generate the 3rd image.

from arldm.

xichenpan commented on August 26, 2024

You can refer to Section 3.2 and Figure 2.a of our paper: https://arxiv.org/abs/2211.10950

xichenpan commented on August 26, 2024

@KyonP Yeah, this line provides all images to the model during training to perform teacher forcing.

KyonP commented on August 26, 2024

Oh, I see. 😄

So, when generating the 3rd image of the story sequence, ARLDM is given both the 1st and 2nd raw (true) images?
(And the 1st to 3rd raw images to generate the 4th image?)

Judging from the "teacher forcing" you mentioned, I assume the 1st and 2nd raw images are used to properly generate the 2nd image, which is then given to synthesize the 3rd image.

Is my understanding correct?

xichenpan commented on August 26, 2024

@KyonP Yeah, exactly!

KyonP commented on August 26, 2024

Thanks. 😅

So, during the generation of the 3rd image (in Figure 2.a), the 2nd raw image is given to the auto-regressive process (the long arrow on the right-hand side) to perform teacher forcing?

Can you give me a link to where the teacher forcing occurs? Maybe within the UNet?

BTW, thank you for your speedy reply, I didn't expect it 😄

xichenpan commented on August 26, 2024

@KyonP Yes, and the teacher forcing is implemented through the attention mask:

ARLDM/main.py

Lines 218 to 221 in 5b03fc4

square_mask = torch.triu(torch.ones((V, V), device=self.device)).bool()
square_mask = square_mask.unsqueeze(0).unsqueeze(-1).expand(B, V, V, S)
square_mask = square_mask.reshape(B * V, V * S)
attention_mask[:, -V * S:] = torch.logical_or(square_mask, attention_mask[:, -V * S:])

This mask is then passed into the UNet through:

ARLDM/main.py

Line 231 in 5b03fc4

noise_pred = self.unet(noisy_latents, timesteps, encoder_hidden_states, attention_mask).sample
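To make the mask pattern easier to see, here is a minimal sketch in plain Python (no PyTorch needed) that rebuilds the same upper-triangular layout for one story. The function names `teacher_forcing_mask` and `visible_frames` are hypothetical and not part of ARLDM, and the sketch assumes that `True` in the mask means "this position is hidden from attention", which matches the description above but is not confirmed by the quoted lines alone.

```python
def teacher_forcing_mask(num_frames, tokens_per_frame):
    """Rebuild the (V, V * S) mask pattern for one story.

    Row i covers the tokens of all V frames; entry (i, j * S + s) is True
    when j >= i, the same pattern as torch.triu(torch.ones(V, V)).bool()
    with each entry repeated S times. Assumption: True means "masked out".
    """
    rows = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            row.extend([j >= i] * tokens_per_frame)
        rows.append(row)
    return rows


def visible_frames(mask, tokens_per_frame):
    """For each row, list the frame indices whose tokens are not masked."""
    result = []
    for row in mask:
        num_frames = len(row) // tokens_per_frame
        result.append(
            [j for j in range(num_frames) if not row[j * tokens_per_frame]]
        )
    return result


mask = teacher_forcing_mask(num_frames=3, tokens_per_frame=2)
for row in mask:
    # "x" marks a masked position, "." a visible one
    print("".join("x" if masked else "." for masked in row))
print(visible_frames(mask, tokens_per_frame=2))
# prints:
# xxxxxx
# ..xxxx
# ....xx
# [[], [0], [0, 1]]
```

Under that assumption, the frame at index 2 (the 3rd image) can attend only to the raw images at indices 0 and 1, which is the teacher-forcing behaviour discussed in this thread.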

KyonP commented on August 26, 2024

Thanks, I will look into it! 👍
