
Comments (15)

Parskatt commented on May 30, 2024

Hi!

I think you are mistaken regarding the number of training pairs in LoFTR. Could you tell me where you get the number 100 from?
From my understanding they are using the pairs in here: https://drive.google.com/drive/folders/1SrIn9WJ1IuG08yh2nEvIsLftXHLrrIwh

And they load those npz files into this dataset loader: https://github.com/zju3dv/LoFTR/blob/master/src/datasets/megadepth.py

Running the following code:

import numpy as np

# Count the total number of training pairs across all MegaDepth scenes
sampled_pair_files = [f for f in open("trainvaltest_list/train_list.txt", "r").read().split("\n") if len(f) > 0]
num_pairs = 0
for scene_name in sampled_pair_files:
    scene = np.load(f"scene_info_0.1_0.7/{scene_name}.npz", allow_pickle=True)
    scene_pairs = len(scene['pair_infos'])
    num_pairs = num_pairs + scene_pairs
print(num_pairs)

yields 8862673. So they have around 9 million unique pairs.
For Scannet they use the same procedure as SuperGlue and end up with 240M pairs.

The main reasons we don't follow this exact procedure are:

  1. We believe our approach is more modular and easier to modify for someone looking to improve the sampling, or to put focus on certain overlaps.
  2. It is more transparent in exactly how the pairs are sampled.

However, we did not find that our sampling procedure produces better results after training than the original version on the benchmarks.

from dkm.

noone-code commented on May 30, 2024

Well, take it easy.
I just want to explore the influence of the amount of training data on performance.
I think you may have missed the n_samples_per_subset parameter in sampler.py; it is set to 100 for the MegaDepth dataset.
So LoFTR actually only uses 15300 image pairs.


Parskatt commented on May 30, 2024

Hi again!
I hadn't seen that detail before :)

From reading their implementation:
https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py#L5
it says the following:

Random sampler for ConcatDataset. At each epoch, n_samples_per_subset samples will be draw from each subset
in the ConcatDataset.

I'm guessing they run more than one epoch? Hence the correct number should be 368 * 100 * num_epochs?


noone-code commented on May 30, 2024

In fact, they do not resample each epoch, since reload_dataloaders_every_epoch=False in train.py.
If reload_dataloaders_every_epoch=True, the sampler would resample every epoch.


Parskatt commented on May 30, 2024

Aha, got it. However, they also use 64 GPUs which I guess means that each GPU gets its own sampler?


Parskatt commented on May 30, 2024

My general guess is that they found that the exact specifics of the sampling were not very important for the final performance?


noone-code commented on May 30, 2024

I don't know why they do not resample the training data each epoch; it would be better to ask the authors.
As for the sampling: since self.generator = torch.manual_seed(seed) fixes the generator, it fixes the sampling results, and the sampled indices are assigned uniformly to each GPU.
Even if each GPU samples by itself, the generator is fixed, so they still get the same sample indices.
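As a minimal sketch of this claim (a standalone toy example, not LoFTR's actual sampler code): two processes that seed a generator with the same value draw identical index sequences.

```python
import torch

# Two "GPUs" each build a generator seeded with the same value.
# Note: LoFTR's code uses torch.manual_seed(seed), which returns the
# global default generator; torch.Generator() is used here for isolation.
g0 = torch.Generator().manual_seed(66)
g1 = torch.Generator().manual_seed(66)

idx0 = torch.randperm(1000, generator=g0)
idx1 = torch.randperm(1000, generator=g1)

# With identical seeds, both processes draw exactly the same indices.
print(torch.equal(idx0, idx1))  # True
```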


Parskatt commented on May 30, 2024

I find it hard to believe that each GPU would sample the exact same indices, but I'm not completely familiar with their exact sampling. I'll run their code to get a better understanding.

I'll get back to you after I have done this so that we can have a more informed discussion.


noone-code commented on May 30, 2024

Yep, anyway, I think DKM is a good method given its impressive results; I like it.
One more question: can I use DKM on 1920*1080 images when it was trained on another size, like 520*720?


Parskatt commented on May 30, 2024

Yes. We don't have a perfectly clean way of doing it, but there are two alternatives:

  1. Set the internal dimensions (we always resize to a fixed size, so you can change this resolution as desired; note however that the method may become quite slow for large images).
  2. Keep the internal dimensions at (540, 720) but upsample the prediction by rerunning the final layer.

In the model zoo:

def DKMv3_outdoor():
    """
    Loads DKMv3 outdoor weights, uses internal resolution of (540, 720) by default
    resolution can be changed by setting model.h_resized, model.w_resized later.
    Additionally upsamples preds to fixed resolution of (864, 1152),
    can be turned off by model.upsample_preds = False
    """
    weights = torch.hub.load_state_dict_from_url(weight_urls["DKMv3"]["outdoor"])
    return DKMv3(weights, 540, 720, upsample_preds=True)

you can see some API for changing these variables. However, right now the "upsample_preds" resolution is hardcoded to (864, 1152), see:

hs, ws = 864,1152

You can change this hardcoding so that it's settable. If you do change it, please submit a pull request so I can update the code. There is a lot of mess, so I'd appreciate it.
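The two alternatives could be sketched like this (against a stand-in object, since only the attribute names quoted above are taken from the actual DKM code):

```python
from types import SimpleNamespace

# Stand-in for the DKM model object; in the real code these attributes
# come from DKMv3_outdoor() as quoted above.
model = SimpleNamespace(h_resized=540, w_resized=720, upsample_preds=True)

# Option 1: raise the internal resolution directly
# (may become quite slow for large images).
model.h_resized, model.w_resized = 1080, 1920
model.upsample_preds = False

# Option 2: keep the internal (540, 720) and rely on upsampling the
# prediction by rerunning the final layer.
model.h_resized, model.w_resized = 540, 720
model.upsample_preds = True
```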


noone-code commented on May 30, 2024

Wow, thank you so much.
My questions are addressed.
Thank you.
🌹


Parskatt commented on May 30, 2024

@noone-code
Ok, I ran LoFTR's training code and here's how I think it works:

They use RandomConcatSampler : https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py

When using distributed training they split the scenes over the GPUs; each GPU gets 384//world_size scenes. The GPUs are initialized with a seeded generator. Their iter method, defined here:

https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py#L44

samples 100 pairs for each scene and shuffles them. This defines one epoch for each worker.

The next epoch this is done again. Note that the sampler is not reinitialized, so the state of the generator differs from the first epoch. Therefore, the 100 pairs drawn in the second epoch will differ from those in the first.
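A minimal sketch of why the pairs differ between epochs (a toy generator, not their sampler):

```python
import torch

# The generator is seeded once and NOT reinitialized between epochs,
# so its state advances and each epoch yields different indices.
g = torch.Generator().manual_seed(66)

epoch1 = torch.randint(0, 8_000_000, (100,), generator=g)
epoch2 = torch.randint(0, 8_000_000, (100,), generator=g)

# Equal only with vanishing probability, since the generator state
# has advanced between the two draws.
print(torch.equal(epoch1, epoch2))  # False
```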

I think I got this correct; I went through their code by debugging the train.py function on two GPUs using their standard outdoor training setting. However, I have not actually run a full epoch yet, so I might have misunderstood something. However, if you look at their comment:
https://github.com/zju3dv/LoFTR/blob/4feac496c1eacebc49ce53793039a8162930935e/src/datasets/sampler.py#L15

this leads me to believe that they are aware of the potential issue with repeated samples, and therefore make sure not to reinitialize it.

Please let me know if I got anything wrong, it might be the case that I misunderstood something.


noone-code commented on May 30, 2024

I found the code local_npz_names = get_local_split(npz_names, self.world_size, self.rank, self.seed).
I agree that LoFTR assigns different scenes to each GPU, and each GPU samples 100 image pairs per scene each epoch.
So, in total, LoFTR uses up to 384 (scenes) * 100 (samples per scene) * 30 (epochs) = 1,152,000 image pairs?
Is that correct?
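The arithmetic can be written out directly (the per-scene count is the n_samples_per_subset value discussed above; the epoch count is from the training config):

```python
scenes = 384            # scenes split across GPUs
pairs_per_scene = 100   # n_samples_per_subset
epochs = 30             # outdoor training schedule

print(scenes * pairs_per_scene * epochs)  # 1152000
```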


Parskatt commented on May 30, 2024

I think so! (but not sure)

There is some potential that our sampling may yield slightly better (or worse) results than theirs if used in DKM. Of course, our method has been developed using our sampling and theirs with theirs, so it might be the case that both would degrade using the other's sampling ;)

In conclusion, I would say that since they do sample quite a lot of pairs, they are comparable to us. However, it would of course be interesting to investigate more deeply how to sample good pairs for training feature matchers.


noone-code commented on May 30, 2024

Yes, finally, thank you so much.

