
Comments (18)

dbickson commented on July 17, 2024

Hi @lolongcovas, we most likely have a bug in the latest released version, 0.125. Which version are you using? The bug happens when there are numerical errors for a few images, and then the image order gets messed up. Did you get any printouts of the form "bug on XX actual XX pos XX dist"? Also, which OS are you on?

from fastdup.

dbickson commented on July 17, 2024

P.S. Can you try installing version 0.123 and let me know if it works for you?

lolongcovas commented on July 17, 2024

Ubuntu 20.04,

Fastdup version:

In [16]: fastdup.__version__
Out[16]: '0.125'

dbickson commented on July 17, 2024

Hi @lolongcovas, I have just released v0.126 for Ubuntu 20.04. Can you try it out and let me know if it works? You can upgrade using: python3.8 -m pip install -U --force-reinstall fastdup
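
To confirm which build is actually active in a given interpreter after a forced reinstall, a check along these lines may help. This is only a minimal sketch: `installed_version` and `version_tuple` are hypothetical helper names, not part of fastdup, and the comparison assumes plain dotted numeric versions such as 0.126.

```python
from importlib import metadata


def installed_version(package="fastdup"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Turn '0.126' into (0, 126) so versions compare numerically.

    Assumption: every dot-separated part is made of digits only.
    """
    return tuple(int(part) for part in version.split("."))
```

For example, `version_tuple(installed_version()) >= (0, 126)` would tell whether the upgrade took effect, provided `installed_version()` did not return None.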

lolongcovas commented on July 17, 2024

With 0.123 it still gives these results:
[image]

I am going to try v0.126.

lolongcovas commented on July 17, 2024

> Hi @lolongcovas, I have just released v0.126 for Ubuntu 20.04. Can you try it out and let me know if it works? You can upgrade using: python3.8 -m pip install -U --force-reinstall fastdup

Sorry, could you compile it for Ubuntu 18.04?

Thanks

dbickson commented on July 17, 2024

I just released v0.126 for Ubuntu 18.04 as well. Please try it out, and if it does not work, send me the output of a run with verbose=1. Also, let me know whether there are identical images lower down in the HTML, or whether everything is messed up.

lolongcovas commented on July 17, 2024

Still seeing the issue. Does fastdup guard against invalid images?

Replacing lower threshold 0.05 with position1899997 top_k.size() 1999996 loc pos: 0.8098 last pos: 00.95 1.9e+06
Total time took 5192278 ms
Found a total of 108835 fully identical images (d>0.990), which are 3.63 %
Found a total of 48956 nearly identical images(d>0.980), which are 1.63 %
Found a total of 1161555 above threshold images (d>0.900), which are 38.72 %
Found a total of 99997 outlier images         (d<0.050), which are 3.33 %
Min distance found 0.000 max distance 1.000

About the 999_998: might we have missed 1 image? And does that make everything fail?

dbickson commented on July 17, 2024

If there was a bad image, you should see a file named atrain_features.bad.csv under the work_dir, listing the filenames of the bad images.
I have no clue what is happening. I think you may have a numerical error, but I need the full output. Did you see something printed like "bug on...."? I will be happy to set up a Zoom session tomorrow to debug what is going on.
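
A quick way to look for that file could be sketched as follows. `read_bad_images` is a hypothetical helper, not a fastdup function, and the file layout is an assumption: the thread does not describe it, so each non-empty line is returned as raw text and the result may include a header row.

```python
import os


def read_bad_images(work_dir, filename="atrain_features.bad.csv"):
    """Return the non-empty lines of the bad-image file, or [] if it is absent.

    Assumption: the file is plain text with one entry per line; no CSV
    columns are parsed because the layout is not documented in this thread.
    """
    path = os.path.join(work_dir, filename)
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

An empty list means the file is missing or empty, which is consistent with a run that reports 0 bad images.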

lolongcovas commented on July 17, 2024

The log is huge and I didn't store it as a file. But the bug you mention is here:

KNN results                                                                                                                    
    0 : 1.00000    14 : 1.00000 814470 : 0.93696                                                                              
    1 : 1.00000 606001 : 0.90209 502470 : 0.88668                                                                             
    2 : 1.00000 955562 : 0.87797 271322 : 0.87653                                                                            
    3 : 1.00000 733775 : 0.97630 351683 : 0.95423                                                                             
    4 : 1.00000 139433 : 0.86121 586396 : 0.85914                                                                             
    5 : 1.00000 131656 : 0.91445 814949 : 0.91100                                                                             
    6 : 1.00000 408403 : 0.88667 948122 : 0.88571                                                                             
    7 : 1.00000 75745 : 0.94520 266015 : 0.94113
    8 : 1.00000 388340 : 0.88984 806809 : 0.88518
    9 : 1.00000 241051 : 0.83276 788561 : 0.82800
Bug on 104675 1 I 441151 actual 999999 pos 314026 k 3 dist -0.023188
Bug on 104675 2 I 677748 actual 999999 pos 314027 k 3 dist -0.029119
Bug on 214965 1 I 638478 actual 999999 pos 644896 k 3 dist -0.002828
Bug on 214965 2 I 939452 actual 999999 pos 644897 k 3 dist -0.003723
Bug on 249287 1 I 948374 actual 999999 pos 747862 k 3 dist -0.004687
Bug on 249287 2 I 750071 actual 999999 pos 747863 k 3 dist -0.006296
Bug on 394579 1 I 53821 actual 999999 pos 1183738 k 3 dist -0.001735
Bug on 394579 2 I 273531 actual 999999 pos 1183739 k 3 dist -0.002979
Bug on 397983 1 I 929964 actual 999999 pos 1193950 k 3 dist -0.064297
Bug on 397983 2 I 32925 actual 999999 pos 1193951 k 3 dist -0.064344
Bug on 936462 1 I 538900 actual 999999 pos 2809387 k 3 dist -0.048341
Bug on 936462 2 I 62374 actual 999999 pos 2809388 k 3 dist -0.058699

I also found that 1 image failed. Now I am recomputing for the rest.

dbickson commented on July 17, 2024

The "bug on" messages are for 6 images out of 1M that hit a numerical error and thus got distance = 0; you should ignore those. I am still not sure about the duplicates that got distance 1 but are not identical. Can you send me the top 3 rows of images that are shown as duplicates but are not? I want to run the computation on my side. All the tests pass on my machines (Ubuntu 18 + 20 + Mac M1), and I have not been able to reproduce this error yet.
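
To count how many distinct images are behind a set of "Bug on" lines, a small parser like the one below could be used. It is a sketch under stated assumptions: `affected_images` is a hypothetical helper, and the meaning of each field (image index first, then a neighbor slot, and so on) is inferred only from the log lines pasted in this thread, not from fastdup documentation.

```python
import re

# Assumed layout, inferred from the pasted log lines:
# "Bug on <image index> <slot> I <neighbor index> actual <n> pos <p> k <k> dist <d>"
BUG_LINE = re.compile(
    r"^Bug on (\d+) (\d+) I (\d+) actual (\d+) pos (\d+) k (\d+) dist (-?\d+\.\d+)"
)


def affected_images(log_text):
    """Return the sorted distinct image indices that appear in 'Bug on' lines."""
    indices = set()
    for line in log_text.splitlines():
        match = BUG_LINE.match(line.strip())
        if match:
            indices.add(int(match.group(1)))
    return sorted(indices)


sample = """\
Bug on 104675 1 I 441151 actual 999999 pos 314026 k 3 dist -0.023188
Bug on 104675 2 I 677748 actual 999999 pos 314027 k 3 dist -0.029119
Bug on 214965 1 I 638478 actual 999999 pos 644896 k 3 dist -0.002828
"""
print(affected_images(sample))  # [104675, 214965]
```

Run over the twelve "Bug on" lines posted earlier in the thread, this gives six distinct indices, which matches the count of 6 images mentioned above.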

lolongcovas commented on July 17, 2024

After removing that bad image, I got these duplicates:
[image]
I still have the issue.
The log:

FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
Going to loop over dir /tmp/zara.txt                                  
Found total 999998 images to run on                                   
libpng warning: sRGB: out of place                 ] 31% Estimated: 66 Minutes
libpng warning: sRGB: out of place                 ] 42% Estimated: 54 Minutes
Wrote total of 999998 features , found 0 bad images] 100% Estimated: 0 Minutes
Found total 999998 images to run on                                   
                                                                      
111037) Finished write_index() NN model                               
Stored nn model index file ../fastdup/out/nnf.index                   
Bug on 388602 1 I 169647 actual 999998 pos 1165807 k 3 dist -0.132486 
Bug on 388602 2 I 255365 actual 999998 pos 1165808 k 3 dist -0.138604 
Bug on 390441 1 I 774903 actual 999998 pos 1171324 k 3 dist -0.017260 
Bug on 390441 2 I 161661 actual 999998 pos 1171325 k 3 dist -0.017907 
Bug on 568423 1 I 659105 actual 999998 pos 1705270 k 3 dist -0.044372 
Bug on 568423 2 I 755145 actual 999998 pos 1705271 k 3 dist -0.044996 
Bug on 596245 2 I 150585 actual 999998 pos 1788737 k 3 dist -0.000455 
Bug on 779143 1 I 2287 actual 999998 pos 2337430 k 3 dist -0.058276
Bug on 779143 2 I 62638 actual 999998 pos 2337431 k 3 dist -0.058675
Bug on 946843 1 I 197592 actual 999998 pos 2840530 k 3 dist -0.047367    
Bug on 946843 2 I 474090 actual 999998 pos 2840531 k 3 dist -0.049710    
1659980440 : INFO:     (add_vertices:460): Num vertices for group 0: 999998 
1659980440 : INFO:     (commit_edge_buffer:609): In commit edge buffer (0,0)
1659980440 : INFO:     (commit_edge_buffer:680): Shuffling edges ...
1659980440 : INFO:     (commit_edge_buffer:688): Done shuffling edges in 0.04235 secs
1659980440 : INFO:     (commit_edge_buffer:692): Aggregating unique vertices...
1659980440 : INFO:     (commit_edge_buffer:705): Done aggregating unique vertex in 0.022127 secs
1659980440 : INFO:     (commit_edge_buffer:713): Combine vertex data
1659980440 : INFO:     (commit_edge_buffer:779): Done phase 2 in 0.062082 secs                         
1659980440 : INFO:     (commit_edge_buffer:787): Rename id columns 
1659980440 : INFO:     (commit_edge_buffer:890): Done in 0.131735 secs                                 
1659980440 : INFO:     (commit_edge_buffer:892): Finish committing edge in 0.258471 secs
1659980440 : INFO:     (add_edges:584): Num vertices for group 0: 999998
Num vertices for group 0: 999998
Num edges 0 -> 0: 117374
1659980441 : PROGRESS: (_p:516): +-----------------------------+
1659980441 : PROGRESS: (_p:516): | Number of components merged |
1659980441 : PROGRESS: (_p:516): +-----------------------------+
1659980441 : PROGRESS: (_p:516): | 72425                       |
1659980441 : PROGRESS: (_p:516): | 0                           |
1659980441 : PROGRESS: (_p:516): +-----------------------------+
1659980442 : PROGRESS: (triple_apply_pagerank:69): Counting out degree
1659980442 : PROGRESS: (triple_apply_pagerank:78): Done counting out degree
1659980442 : PROGRESS: (_p:516): +-----------+-----------------------+
1659980442 : PROGRESS: (_p:516): | Iteration | L1 change in pagerank |
1659980442 : PROGRESS: (_p:516): +-----------+-----------------------+
1659980442 : PROGRESS: (_p:516): | 1         | 771632                |
1659980442 : PROGRESS: (_p:516): | 2         | 4258.05               |
1659980442 : PROGRESS: (_p:516): | 3         | 2664.38               |
1659980442 : PROGRESS: (_p:516): | 4         | 1918.96               |
1659980442 : PROGRESS: (_p:516): | 5         | 1438.16               |
1659980442 : PROGRESS: (_p:516): | 6         | 1133.33               |
1659980442 : PROGRESS: (_p:516): | 7         | 907.308               |
1659980442 : PROGRESS: (_p:516): | 8         | 737.427               |
1659980442 : PROGRESS: (_p:516): | 9         | 603.873               |
1659980442 : PROGRESS: (_p:516): | 10        | 499.224               |
1659980442 : PROGRESS: (_p:516): | 11        | 414.308               |
1659980442 : PROGRESS: (_p:516): | 12        | 346.034               |
1659980442 : PROGRESS: (_p:516): | 13        | 289.312               |
1659980442 : PROGRESS: (_p:516): | 14        | 242.876               |
1659980442 : PROGRESS: (_p:516): | 15        | 203.985               |
1659980442 : PROGRESS: (_p:516): | 16        | 171.77                |
1659980443 : PROGRESS: (_p:516): | 17        | 144.69                |
1659980443 : PROGRESS: (_p:516): | 18        | 122.08                |
1659980443 : PROGRESS: (_p:516): | 19        | 103.042               |
1659980443 : PROGRESS: (_p:516): | 20        | 87.0725               |
1659980443 : PROGRESS: (_p:516): +-----------+-----------------------+
Wrote total of 999998 components
Total time took 5673428 ms
Found a total of 66513 fully identical images (d>0.990), which are 2.22 %
Found a total of 49974 nearly identical images(d>0.980), which are 1.67 %
Found a total of 1162877 above threshold images (d>0.900), which are 38.76 %
Found a total of 99999 outlier images         (d<0.050), which are 3.33 %
Min distance found 0.001 max distance 1.000

dbickson commented on July 17, 2024

Hi @lolongcovas, we just released another version, 0.127, for Ubuntu 18. Please try again and let us know if this works!

lolongcovas commented on July 17, 2024

Now with 0.127 it worked. Thanks!

lolongcovas commented on July 17, 2024

Hi, sorry again. At position 187 I found wrong duplicates:
[image]
However, 184 and 189 are OK.

dbickson commented on July 17, 2024

This is totally strange and I can't reproduce it on my side. Are you open to setting up a Zoom meeting or communicating in the Slack channel so we can try to reproduce it together? Once I reproduce the problem I am sure I can solve it, but it is hard to reproduce without having the data.

dbickson commented on July 17, 2024

Hi @lolongcovas, we have released version 0.130, which tries to fix the issue observed. Please try it out.

dbickson commented on July 17, 2024

Solved.
