carl_code's Issues

torch.fx

torch.fx was only released in PyTorch versions later than 1.6.0, but it is used in this project.
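A quick way to check whether an installed PyTorch is new enough is to compare its version string against the release that introduced torch.fx. The sketch below assumes that release is 1.8.0; the helper name `supports_torch_fx` is hypothetical and not part of this repository.

```python
import re

# Assumption: torch.fx first shipped (as a beta feature) in PyTorch 1.8.0.
MIN_FX_VERSION = (1, 8)


def supports_torch_fx(version: str) -> bool:
    """Return True if a PyTorch version string is new enough to include torch.fx."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= MIN_FX_VERSION


print(supports_torch_fx("1.6.0"))        # False
print(supports_torch_fx("1.8.0+cu111"))  # True
```

In a real environment the argument would be `torch.__version__`.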

CARL embedding for single images

Hello, thank you for sharing your work.

I want to use a model trained with the CARL method to obtain an embedding for single images (not part of a video).

Would it be fine to input the image with size (batch size, T, 3, 224, 224) with T = 1 to the model?

For a video input of size (batch size, T, 3, 224, 224) with T > 1, I observed a difference between the embeddings obtained by inputting the whole video at once and those obtained by inputting the frames of the video one by one.
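One plausible explanation for that difference, assuming the embedding network mixes information across frames with self-attention (the config name scl_transformer_config suggests a Transformer), is that each frame's output depends on the other frames in the clip. The toy sketch below is not CARL's code: it uses a single attention head with identity projections and made-up 2-D frame features, purely to show the effect.

```python
import math


def self_attention(frames):
    """Single-head self-attention with identity projections over a list of frame vectors."""
    outputs = []
    for query in frames:
        # Dot-product score of this frame against every frame in the clip.
        scores = [sum(q * k for q, k in zip(query, key)) for key in frames]
        peak = max(scores)
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is a weighted average of all frames in the clip.
        mixed = [sum(w * value[d] for w, value in zip(weights, frames))
                 for d in range(len(query))]
        outputs.append(mixed)
    return outputs


# Hypothetical per-frame features for a clip with T = 3.
video = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

together = self_attention(video)                               # whole clip at once
one_by_one = [self_attention([frame])[0] for frame in video]   # T = 1 per call

print(together[0])    # frame 0 mixed with frames 1 and 2
print(one_by_one[0])  # frame 0 attends only to itself
```

With T = 1 each frame can only attend to itself, so its embedding carries no temporal context. Feeding single images is therefore possible under this assumption, but the result is not the same quantity as a frame embedding computed inside a longer clip.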

Question on Performance Reporting

Hi, I've been experimenting with your code and I've noticed that some of the metrics (specifically event_completion and retrieval) sometimes peak in earlier epochs and then fall in later epochs. I am training for 300 epochs and evaluating every 50 epochs, as is the default in the configs.

I just wanted to check: for your results in "Frame-wise Action Representations for Long Videos via Sequence Contrastive Learning" do you report the metrics for the "best" intermediate epoch per metric, or do you always report the results for the final epoch (300)? I searched over the paper and I could not find an explicit answer to this, so I wanted to ask here. Thanks!
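The two reporting conventions in question can be made concrete with a small sketch. The numbers below are made up for illustration and are not results from the paper or from any run; `summarize` is a hypothetical helper that assumes higher values are better for every metric.

```python
def summarize(history):
    """history: {epoch: {metric: value}} -> per-metric best and final values."""
    final_epoch = max(history)
    summary = {}
    for metric in history[final_epoch]:
        # Epoch at which this metric reaches its highest value.
        best_epoch = max(history, key=lambda epoch: history[epoch][metric])
        summary[metric] = {
            "best_epoch": best_epoch,
            "best": history[best_epoch][metric],
            "final_epoch": final_epoch,
            "final": history[final_epoch][metric],
        }
    return summary


# Hypothetical evaluation log: one entry every 50 epochs up to epoch 300.
history = {
    50:  {"event_completion": 0.80, "retrieval": 0.85},
    100: {"event_completion": 0.86, "retrieval": 0.88},
    150: {"event_completion": 0.90, "retrieval": 0.91},
    200: {"event_completion": 0.89, "retrieval": 0.92},
    250: {"event_completion": 0.88, "retrieval": 0.90},
    300: {"event_completion": 0.87, "retrieval": 0.89},
}

for metric, row in summarize(history).items():
    print(metric, row)
```

Picking the best epoch per metric uses the evaluation set for model selection, so it generally gives numbers at least as high as the final-epoch convention; the two should not be compared directly.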

Request for Hardware and Runtime Information

Hello, if possible, could the authors provide the recommended hardware (CPU/GPU counts) and expected runtimes for the jobs described in the "Training" section? I am attempting to run these jobs, and data loading seems to be a major bottleneck for my runtime. For example, training one epoch on Penn Action takes about 30 minutes. Is this normal?
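To confirm whether data loading really dominates, one option is to time how long the loop waits for each batch versus how long the training step takes. This is a generic sketch, not code from the repository: `profile_epoch`, `slow_loader`, and `fake_step` are hypothetical names, and the loader can be any iterable of batches.

```python
import time


def profile_epoch(loader, train_step, clock=time.perf_counter):
    """Return (seconds waiting on data, seconds in train_step, number of batches)."""
    load_time = 0.0
    step_time = 0.0
    batches = 0
    iterator = iter(loader)
    while True:
        start = clock()
        try:
            batch = next(iterator)  # time spent here is data-loading wait
        except StopIteration:
            break
        loaded = clock()
        train_step(batch)           # forward/backward/optimizer in a real run
        done = clock()
        load_time += loaded - start
        step_time += done - loaded
        batches += 1
    return load_time, step_time, batches


def slow_loader(num_batches, delay):
    """Stand-in for a data loader that takes `delay` seconds per batch."""
    for index in range(num_batches):
        time.sleep(delay)
        yield index


def fake_step(batch):
    """Stand-in for a training step that is much cheaper than loading."""
    return batch * 2


load_s, step_s, n = profile_epoch(slow_loader(3, 0.01), fake_step)
print(f"{n} batches: {load_s:.3f}s loading, {step_s:.3f}s compute")
```

If the loading share is large, the usual suspects are video decoding on too few worker processes or slow storage. When the step runs on a GPU, the step timing is only meaningful if the device is synchronised before reading the clock, since CUDA kernels launch asynchronously.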

Problem downloading from Baidu

Hi,
I'm trying to download the FineGym dataset from Baidu, but it keeps showing a dialog that asks me to select an rpm or deb file.
I should mention that I can neither read nor understand Chinese :(

If it is not too big, say up to 1 GB, I can send you a link to my drive.

Thanks,
Moshe

code confusion and inconsistent reproduction results

Congratulations, you've done a fantastic job. I tried to reproduce your work but ran into some problems.
First, I noticed that the parameter DATA.SAMPLING_REGION in *_config.yml refers to the variable alpha in your paper, and that block_size in the dataset class's sample_frames function (e.g. for Pouring) is the size of the region to sample from. Should it be the product of expand_ratio and num_frames rather than the product of expand_ratio and seq_len? For example:

def sample_frames(self, seq_len, num_frames, pre_steps=None):
        ...
        elif sampling_strategy == 'time_augment':
            num_valid = min(seq_len, num_frames)
            expand_ratio = np.random.uniform(low=1.0, high=self.cfg.DATA.SAMPLING_REGION) if self.cfg.DATA.SAMPLING_REGION>1 else 1.0
            # block_size = math.ceil(expand_ratio*seq_len)
            block_size = math.ceil(expand_ratio*num_frames)
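The practical difference between the two formulas is easiest to see with numbers. The sketch below only reproduces the arithmetic from the snippet above; the video length, frame count, and ratio are hypothetical values, and `block_sizes` is an illustrative helper rather than repository code.

```python
import math


def block_sizes(seq_len, num_frames, expand_ratio):
    """Return block_size computed from seq_len and from num_frames, in that order."""
    from_seq_len = math.ceil(expand_ratio * seq_len)
    from_num_frames = math.ceil(expand_ratio * num_frames)
    return from_seq_len, from_num_frames


# Hypothetical case: a 1000-frame video, 240 sampled frames, expand_ratio = 1.5.
print(block_sizes(1000, 240, 1.5))  # (1500, 360)
```

With seq_len the block (1500) is longer than the video itself, whereas with num_frames it is a 360-frame window inside the video, which is why the choice of factor changes what the temporal augmentation samples from.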

Second, I ran training on the Pouring dataset but only got 90.3% classification accuracy, which is lower than the 93.73% reported in your paper. Can you give some advice? I used batch_size = 4 and embedde_model.num_layers = 3; the other parameters are the same as in scl_transformer_config in your repo.
