Comments (18)
Hi @lolongcovas, we most likely have a bug in the latest released version, 0.125. Which version are you using? The bug happens when a few images run into numerical errors, after which the image order gets messed up. Did you get any printouts of the form "bug on XX actual XX pos XX dist"? Also, which OS are you on?
P.S. Can you try installing version 0.123 and let me know if it works for you?
OS: Ubuntu 20.04
fastdup version:
In [16]: fastdup.__version__
Out[16]: '0.125'
Hi @lolongcovas, I have just released v0.126 for Ubuntu 20.04. Can you try it out and let me know if it works? You can upgrade using python3.8 -m pip install -U --force-reinstall fastdup
With 0.123 I still get these results.
I am going to try v0.126.
Sorry, could you compile it for Ubuntu 18.04?
Thanks
I just released v0.126 for Ubuntu 18.04 as well. Please try it out, and if it does not work, send me the output of your run with verbose=1. Also, let me know whether there are identical images further down the HTML report, or whether everything is messed up.
Still seeing the issue. Does fastdup handle invalid images?
Replacing lower threshold 0.05 with position1899997 top_k.size() 1999996 loc pos: 0.8098 last pos: 00.95 1.9e+06
Total time took 5192278 ms
Found a total of 108835 fully identical images (d>0.990), which are 3.63 %
Found a total of 48956 nearly identical images(d>0.980), which are 1.63 %
Found a total of 1161555 above threshold images (d>0.900), which are 38.72 %
Found a total of 99997 outlier images (d<0.050), which are 3.33 %
Min distance found 0.000 max distance 1.000
Regarding the 999,998: might we have missed 1 image, and could that make the whole run fail?
If there were bad images, you should see a file named atrain_features.bad.csv under the work_dir listing their filenames.
I have no clue what is happening; I think you may have a numerical error. I need the full output. Did you see something printed like "bug on..."? I will be happy to set up a Zoom session tomorrow to debug what is going on.
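A minimal Python sketch for inspecting that file, assuming it is an ordinary CSV whose first row is a header (its columns are not shown in this thread). `read_bad_images` is a hypothetical helper, not a fastdup function:

```python
import csv
import os
from typing import Dict, List


def read_bad_images(work_dir: str) -> List[Dict[str, str]]:
    """Return the rows of work_dir/atrain_features.bad.csv, or [] if the file is absent.

    Assumption: the file is a plain CSV whose first row is a header.
    """
    path = os.path.join(work_dir, "atrain_features.bad.csv")
    if not os.path.exists(path):
        # No file written: presumably no image failed to load.
        return []
    with open(path, newline="") as f:
        # Each row becomes a dict keyed by the (assumed) header names.
        return list(csv.DictReader(f))


if __name__ == "__main__":
    # Hypothetical work_dir path; replace with the one passed to fastdup.
    for row in read_bad_images("/path/to/work_dir"):
        print(row)
```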
The log is huge and I didn't store it in a file, but the bug you mention is here:
KNN results
0 : 1.00000 14 : 1.00000 814470 : 0.93696
1 : 1.00000 606001 : 0.90209 502470 : 0.88668
2 : 1.00000 955562 : 0.87797 271322 : 0.87653
3 : 1.00000 733775 : 0.97630 351683 : 0.95423
4 : 1.00000 139433 : 0.86121 586396 : 0.85914
5 : 1.00000 131656 : 0.91445 814949 : 0.91100
6 : 1.00000 408403 : 0.88667 948122 : 0.88571
7 : 1.00000 75745 : 0.94520 266015 : 0.94113
8 : 1.00000 388340 : 0.88984 806809 : 0.88518
9 : 1.00000 241051 : 0.83276 788561 : 0.82800
Bug on 104675 1 I 441151 actual 999999 pos 314026 k 3 dist -0.023188
Bug on 104675 2 I 677748 actual 999999 pos 314027 k 3 dist -0.029119
Bug on 214965 1 I 638478 actual 999999 pos 644896 k 3 dist -0.002828
Bug on 214965 2 I 939452 actual 999999 pos 644897 k 3 dist -0.003723
Bug on 249287 1 I 948374 actual 999999 pos 747862 k 3 dist -0.004687
Bug on 249287 2 I 750071 actual 999999 pos 747863 k 3 dist -0.006296
Bug on 394579 1 I 53821 actual 999999 pos 1183738 k 3 dist -0.001735
Bug on 394579 2 I 273531 actual 999999 pos 1183739 k 3 dist -0.002979
Bug on 397983 1 I 929964 actual 999999 pos 1193950 k 3 dist -0.064297
Bug on 397983 2 I 32925 actual 999999 pos 1193951 k 3 dist -0.064344
Bug on 936462 1 I 538900 actual 999999 pos 2809387 k 3 dist -0.048341
Bug on 936462 2 I 62374 actual 999999 pos 2809388 k 3 dist -0.058699
I also found that 1 image failed. I am now recomputing for the rest.
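The input here appears to be a text file listing images (the later log shows `/tmp/zara.txt`), so one way to recompute without the failed image is to write a filtered copy of that list. A minimal Python sketch, assuming one image path per line; `filter_image_list` is a hypothetical helper, not part of fastdup:

```python
from typing import Iterable, Set


def filter_image_list(src: str, dst: str, bad: Iterable[str]) -> int:
    """Copy the image list in `src` to `dst`, dropping every path found in `bad`.

    Assumption: `src` holds one image path per line. Returns the number of kept paths.
    """
    bad_set: Set[str] = {p.strip() for p in bad}
    kept = 0
    with open(src) as fin, open(dst, "w") as fout:
        for line in fin:
            path = line.strip()
            # Skip blank lines and any image known to fail.
            if not path or path in bad_set:
                continue
            fout.write(path + "\n")
            kept += 1
    return kept
```

The filtered file can then be passed as the input in place of the original list.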
The "bug on" messages refer to 6 images out of 1M that ran into a numerical error and therefore got distance = 0; you should ignore those. I am still not sure about the duplicates that got a score of 1 but are not identical. Can you send me the top 3 rows of images that are shown as duplicates but are not? I want to run the computation on my side. All the tests pass on my machines (Ubuntu 18, Ubuntu 20, and Mac M1), and I have not been able to reproduce this error yet.
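To count how many distinct images the "Bug on" lines actually cover, they can be grouped by their first number. This Python sketch assumes, following the reading of 12 lines as 6 images, that the first number identifies the image and `dist` is the reported value; the other fields are matched but not interpreted, since their meaning is not documented here:

```python
import re
from typing import Dict, List

# Matches lines such as:
# "Bug on 104675 1 I 441151 actual 999999 pos 314026 k 3 dist -0.023188"
BUG_RE = re.compile(
    r"Bug on (\d+) (\d+) I (\d+) actual (\d+) pos (\d+) k (\d+) dist (-?\d+(?:\.\d+)?)"
)


def images_with_bug(log_lines: List[str]) -> Dict[int, List[float]]:
    """Map each image id found on a 'Bug on' line to the dist values reported for it."""
    affected: Dict[int, List[float]] = {}
    for line in log_lines:
        m = BUG_RE.search(line)
        if m is None:
            continue  # not a "Bug on" line
        image_id = int(m.group(1))
        affected.setdefault(image_id, []).append(float(m.group(7)))
    return affected


sample = [
    "Bug on 104675 1 I 441151 actual 999999 pos 314026 k 3 dist -0.023188",
    "Bug on 104675 2 I 677748 actual 999999 pos 314027 k 3 dist -0.029119",
    "Bug on 214965 1 I 638478 actual 999999 pos 644896 k 3 dist -0.002828",
]
print(len(images_with_bug(sample)))  # 2 distinct images
```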
After removing that bad image, I got these duplicates:
I still have the issue.
The log:
FastDup Software, (C) copyright 2022 Dr. Amir Alush and Dr. Danny Bickson.
Going to loop over dir /tmp/zara.txt
Found total 999998 images to run on
libpng warning: sRGB: out of place ] 31% Estimated: 66 Minutes
libpng warning: sRGB: out of place ] 42% Estimated: 54 Minutes
Wrote total of 999998 features , found 0 bad images] 100% Estimated: 0 Minutes
Found total 999998 images to run on
111037) Finished write_index() NN model
Stored nn model index file ../fastdup/out/nnf.index
Bug on 388602 1 I 169647 actual 999998 pos 1165807 k 3 dist -0.132486
Bug on 388602 2 I 255365 actual 999998 pos 1165808 k 3 dist -0.138604
Bug on 390441 1 I 774903 actual 999998 pos 1171324 k 3 dist -0.017260
Bug on 390441 2 I 161661 actual 999998 pos 1171325 k 3 dist -0.017907
Bug on 568423 1 I 659105 actual 999998 pos 1705270 k 3 dist -0.044372
Bug on 568423 2 I 755145 actual 999998 pos 1705271 k 3 dist -0.044996
Bug on 596245 2 I 150585 actual 999998 pos 1788737 k 3 dist -0.000455
Bug on 779143 1 I 2287 actual 999998 pos 2337430 k 3 dist -0.058276
Bug on 779143 2 I 62638 actual 999998 pos 2337431 k 3 dist -0.058675
Bug on 946843 1 I 197592 actual 999998 pos 2840530 k 3 dist -0.047367
Bug on 946843 2 I 474090 actual 999998 pos 2840531 k 3 dist -0.049710
1659980440 : INFO: (add_vertices:460): Num vertices for group 0: 999998
1659980440 : INFO: (commit_edge_buffer:609): In commit edge buffer (0,0)
1659980440 : INFO: (commit_edge_buffer:680): Shuffling edges ...
1659980440 : INFO: (commit_edge_buffer:688): Done shuffling edges in 0.04235 secs
1659980440 : INFO: (commit_edge_buffer:692): Aggregating unique vertices...
1659980440 : INFO: (commit_edge_buffer:705): Done aggregating unique vertex in 0.022127 secs
1659980440 : INFO: (commit_edge_buffer:713): Combine vertex data
1659980440 : INFO: (commit_edge_buffer:779): Done phase 2 in 0.062082 secs
1659980440 : INFO: (commit_edge_buffer:787): Rename id columns
1659980440 : INFO: (commit_edge_buffer:890): Done in 0.131735 secs
1659980440 : INFO: (commit_edge_buffer:892): Finish committing edge in 0.258471 secs
1659980440 : INFO: (add_edges:584): Num vertices for group 0: 999998
Num vertices for group 0: 999998
Num edges 0 -> 0: 117374
1659980441 : PROGRESS: (_p:516): +-----------------------------+
1659980441 : PROGRESS: (_p:516): | Number of components merged |
1659980441 : PROGRESS: (_p:516): +-----------------------------+
1659980441 : PROGRESS: (_p:516): | 72425 |
1659980441 : PROGRESS: (_p:516): | 0 |
1659980441 : PROGRESS: (_p:516): +-----------------------------+
1659980442 : PROGRESS: (triple_apply_pagerank:69): Counting out degree
1659980442 : PROGRESS: (triple_apply_pagerank:78): Done counting out degree
1659980442 : PROGRESS: (_p:516): +-----------+-----------------------+
1659980442 : PROGRESS: (_p:516): | Iteration | L1 change in pagerank |
1659980442 : PROGRESS: (_p:516): +-----------+-----------------------+
1659980442 : PROGRESS: (_p:516): | 1 | 771632 |
1659980442 : PROGRESS: (_p:516): | 2 | 4258.05 |
1659980442 : PROGRESS: (_p:516): | 3 | 2664.38 |
1659980442 : PROGRESS: (_p:516): | 4 | 1918.96 |
1659980442 : PROGRESS: (_p:516): | 5 | 1438.16 |
1659980442 : PROGRESS: (_p:516): | 6 | 1133.33 |
1659980442 : PROGRESS: (_p:516): | 7 | 907.308 |
1659980442 : PROGRESS: (_p:516): | 8 | 737.427 |
1659980442 : PROGRESS: (_p:516): | 9 | 603.873 |
1659980442 : PROGRESS: (_p:516): | 10 | 499.224 |
1659980442 : PROGRESS: (_p:516): | 11 | 414.308 |
1659980442 : PROGRESS: (_p:516): | 12 | 346.034 |
1659980442 : PROGRESS: (_p:516): | 13 | 289.312 |
1659980442 : PROGRESS: (_p:516): | 14 | 242.876 |
1659980442 : PROGRESS: (_p:516): | 15 | 203.985 |
1659980442 : PROGRESS: (_p:516): | 16 | 171.77 |
1659980443 : PROGRESS: (_p:516): | 17 | 144.69 |
1659980443 : PROGRESS: (_p:516): | 18 | 122.08 |
1659980443 : PROGRESS: (_p:516): | 19 | 103.042 |
1659980443 : PROGRESS: (_p:516): | 20 | 87.0725 |
1659980443 : PROGRESS: (_p:516): +-----------+-----------------------+
Wrote total of 999998 components
Total time took 5673428 ms
Found a total of 66513 fully identical images (d>0.990), which are 2.22 %
Found a total of 49974 nearly identical images(d>0.980), which are 1.67 %
Found a total of 1162877 above threshold images (d>0.900), which are 38.76 %
Found a total of 99999 outlier images (d<0.050), which are 3.33 %
Min distance found 0.001 max distance 1.000
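The percentages in this summary are not relative to the image count (66513 / 999998 would be about 6.65 %). As a quick arithmetic check, they match a denominator of N × k, with N = 999998 images ("Found total 999998 images to run on") and k = 3 (the `k 3` in the "Bug on" lines). Reading k as the number of neighbours kept per image is an assumption:

```python
# Counts copied from the second run's summary above.
N_IMAGES = 999_998
K = 3  # assumed: neighbours kept per image, taken from "k 3" in the Bug lines

counts = {
    "fully identical (d>0.990)": 66513,
    "nearly identical (d>0.980)": 49974,
    "above threshold (d>0.900)": 1162877,
    "outliers (d<0.050)": 99999,
}


def percent_of_pairs(count: int, n: int = N_IMAGES, k: int = K) -> float:
    """Percentage of `count` relative to n * k image pairs, rounded to 2 decimals."""
    return round(100.0 * count / (n * k), 2)


for label, c in counts.items():
    print(f"{label}: {percent_of_pairs(c)} %")
# fully identical (d>0.990): 2.22 %
# nearly identical (d>0.980): 1.67 %
# above threshold (d>0.900): 38.76 %
# outliers (d<0.050): 3.33 %
```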
Hi @lolongcovas, we just released another version, 0.127, for Ubuntu 18. Please try again and let us know if it works!
Now it works with 0.127. Thanks!
Hi, sorry again. At position 187 I found wrong duplicates:
However, 184 and 189 are OK.
This is very strange, and I can't reproduce it on my side. Are you open to setting up a Zoom meeting or communicating in the Slack channel so we can try to reproduce it together? Once I reproduce the problem I am sure I can solve it, but it is hard to reproduce without the data.
Hi @lolongcovas, we have released version 0.130, which tries to fix the observed issue. Please try it out.
Solved.
Related Issues (20)
- [Bug]: fastdup fails to create above 10M object crops on ubuntu 20 (due to file system exhaustion)
- [Feature Request]: Allow search on multiple work dirs in parallel
- [Feature Request]: Compile fastdup for arm to allow docker run on mac m1
- [Feature Request]: add jfif support for windows OS
- [Feature Request]: add mkv video support for fastdup
- [Feature Request]: fastdup video extraction to provide timing info for each extracted frame
- [Bug]: Pinned `requests` makes `fastdup` incompatible with other packages
- [Feature Request]: mean_distance in image cluster relative to centroïd + distance between different clusters (using centroïds)
- [Bug]: Kernel keep crashing when trying to run fd.run()
- [Bug]: Fix thumbnail resize to look better
- [Bug]:AssertionError: For removing wrong labels created by the create_similarity_gallery() need to run stats_file=df where df is the output of create_similarity_gallery()
- Use fastdup in code pipeline rather than reports
- [Bug]: RuntimeError: fastdup detected your are running an old version 1.60 (10 versions or more vs. the latest) please upgrade fastdup)
- [Bug]: Oxford pet dataset, fastdup fails on 8 bad images
- [Bug]: bad images warning is gibberish
- [Bug]: When running fastdup as two steps (and there are bad images) connected component ids do not match atrain_features.dat.csv
- [Bug]: Can't pip install
- [Bug]: Run is crashing when specifying embeddings
- [Bug]: UnicodeDecodeError when running fd.run
- [Feature Request]: Reidentification mode