nil0x42 / duplicut

Remove duplicates from MASSIVE wordlist, without sorting it (for dictionary-based password cracking)

License: GNU General Public License v3.0

Makefile 0.71% C 96.45% Shell 2.20% Python 0.64%

hashcat password hashes cracking c wordlist wordlists uniq unique dedupe

duplicut's Introduction

Duplicut ✂️

Quickly dedupe massive wordlists, without changing the order

Created by nil0x42 and contributors

📖 Overview

Nowadays, building a password wordlist usually means concatenating multiple data sources.

Ideally, the most probable passwords should sit at the start of the wordlist, so that the most common passwords are cracked almost instantly.

With existing dedupe tools, you are forced to choose between preserving the order OR handling massive wordlists.

Unfortunately, wordlist creation requires both.

So I wrote duplicut in highly optimized C to address this very specific need 🤓 💻

💡 Quick start

git clone https://github.com/nil0x42/duplicut
cd duplicut/ && make
./duplicut wordlist.txt -o clean-wordlist.txt

🔧 Options

Features:
- Handle massive wordlists, even those whose size exceeds available RAM
- Filter lines by max length (-l option)
- Can remove lines containing non-printable ASCII chars (-p option)
- Press any key to show program status at runtime.
Implementation:
- Written in pure C code, designed to be fast
- Compressed hashmap items on 64 bit platforms
- Multithreading support
Limitations:
- Any line longer than 255 chars is ignored
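
The -l and -p filters listed above can be pictured as a per-line predicate. This is a minimal sketch, not duplicut's actual code: the function names are hypothetical, and it assumes that "printable" means bytes in the range 0x20 to 0x7E and that line length is counted in bytes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not duplicut's actual code.
 * Returns true when every byte of the line is printable ASCII,
 * assumed here to mean the range 0x20 (space) to 0x7E ('~'). */
bool line_is_printable(const char *line, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)line[i];
        if (c < 0x20 || c > 0x7E)
            return false;
    }
    return true;
}

/* Hypothetical helper: a line is kept when it fits the byte-length limit
 * and, if the printable filter is enabled, holds only printable ASCII. */
bool line_is_kept(const char *line, size_t len, size_t max_len,
                  bool filter_printable)
{
    if (len > max_len)
        return false;
    if (filter_printable && !line_is_printable(line, len))
        return false;
    return true;
}
```

With these definitions, a line containing a tab or a multi-byte UTF-8 character would be rejected only when the printable filter is enabled, while an over-long line would be rejected in both modes.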

📖 Technical Details

🔸 1- Memory optimized:

A single uint64 is enough to index a line in the hashmap, by packing the line's size into the pointer's unused extra bits.
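
As an illustration of the idea, here is a minimal sketch that stores an 8-bit size in the top byte of a uint64 and the address in the remaining 56 bits. The exact layout is an assumption (duplicut's real bit split may differ); an 8-bit size field would at least be consistent with the 255-char line limit listed above. The slot_* names are hypothetical, and the trick relies on addresses never using the top byte, which is typical of 64-bit user-space addresses but not guaranteed on every platform.

```c
#include <stdint.h>

/* Assumed layout (illustrative, not taken from duplicut's source):
 *   bits 63..56 : line size in bytes (0..255)
 *   bits 55..0  : address of the line */
#define SIZE_BITS 8
#define ADDR_BITS (64 - SIZE_BITS)
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Packs an address and a size into one 64-bit hashmap slot. */
static inline uint64_t slot_pack(uint64_t addr, uint8_t size)
{
    return ((uint64_t)size << ADDR_BITS) | (addr & ADDR_MASK);
}

/* Extracts the address part of a slot. */
static inline uint64_t slot_addr(uint64_t slot)
{
    return slot & ADDR_MASK;
}

/* Extracts the size part of a slot. */
static inline uint8_t slot_size(uint64_t slot)
{
    return (uint8_t)(slot >> ADDR_BITS);
}
```

Keeping the size next to the address lets a lookup reject most non-matching lines by comparing sizes before touching the line's bytes.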

🔸 2- Massive file handling:

If the whole file can't fit in memory, it is split into virtual chunks, in such a way that each chunk uses as much RAM as possible.

Each chunk is then loaded into the hashmap, deduped, and tested against subsequent chunks.

That way, execution time is bounded by the nth triangle number, n(n+1)/2, where n is the number of chunks.
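
To make the triangle-number bound concrete, here is a minimal sketch that counts one task per (loaded chunk, tested chunk) pair, including the pass that dedupes a chunk against itself. The loop structure is an assumption drawn from the description above, not from duplicut's source, and the function names are hypothetical.

```c
#include <stddef.h>

/* Illustrative sketch: one task per pair (i, j) with j >= i, where chunk i
 * is loaded in the hashmap and chunk j is tested against it. */
size_t count_chunk_tasks(size_t n_chunks)
{
    size_t tasks = 0;
    for (size_t i = 0; i < n_chunks; i++) {      /* chunk loaded in hashmap */
        for (size_t j = i; j < n_chunks; j++)    /* itself, then later chunks */
            tasks++;
    }
    return tasks;
}

/* Closed form of the same count: the nth triangle number. */
size_t triangle_number(size_t n)
{
    return n * (n + 1) / 2;
}
```

For a file split into 4 chunks, both functions give 10 tasks.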

💡 Troubleshooting

If you find a bug, or something doesn't work as expected, please compile duplicut in debug mode and post an issue with attached output:

# debug level can be from 1 to 4
make debug level=1
./duplicut [OPTIONS] 2>&1 | tee /tmp/duplicut-debug.log

duplicut's Issues

Why is any line longer than 255 chars ignored?

One question...

Why is any line longer than 255 chars ignored?

Is there a reasonable explanation?

Regards.

UX: Add ansi terminal colors to program status if STDOUT isatty()

ANSI colors are now implemented for the DEBUG lines, and adding them to the program status as well has virtually no cost.
So let's improve UX!

Purge both duplicates

First suggestion on GitHub: I would be interested in a flag so that, when a duplicate is found, NO copy of that line is written to the output file.

I think I could cat together a number of wordlists that I've already tried and dedupe them, then generate a larger wordlist, cat it with my merged wordlist, and dedupe again, this time with the flag set so that the dupes are not written to the output file. That would leave me with a list of words not yet tried. Hope that makes sense.

Many unique lines with spaces or non-ASCII characters being deleted

Thanks so much for crafting and sharing duplicut.

It appears to delete many unique lines containing spaces or non-ASCII characters, e.g.,

$ cat 1

foo
bar
pass with spaces
a pass word
another unique password

$ duplicut 1 -o 1.out

$ cat 1.out

foo
bar
a pass word

Any line with 5 or more Japanese characters is cut:

$ cat 2

一
十一
百十一
千百十一
万千百十一
ほげふがぴよ

$ duplicut 2 -o 2.out

$ cat 2.out

一
十一
百十一
千百十一
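
One pattern in the samples above: every line that survives is at most 12 bytes long in UTF-8, and every line that disappears is at least 15 bytes long. That is consistent with a byte-length filter rather than anything specific to spaces or non-ASCII characters. The sketch below only illustrates the byte counting; the 14-byte limit is a hypothetical value chosen because it fits these samples, not a value read from duplicut's source or options, and the function names are made up for the example.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical limit, chosen only because it is consistent with the
 * samples in this report; it is not taken from duplicut. */
#define ASSUMED_MAX_LINE_BYTES 14

/* strlen() counts bytes, not characters: each kana/kanji in the samples
 * takes 3 bytes in UTF-8 (the source file is assumed to be UTF-8). */
size_t line_bytes(const char *line)
{
    return strlen(line);
}

/* True when the line is longer, in bytes, than the assumed limit. */
bool would_be_dropped(const char *line)
{
    return line_bytes(line) > ASSUMED_MAX_LINE_BYTES;
}
```

Under that assumption, "a pass word" (11 bytes) and 千百十一 (12 bytes) are kept, while "pass with spaces" (16 bytes) and 万千百十一 (15 bytes) are dropped, matching the output shown in the report.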

Add support for removing duplicates from other file

It's often useful to be able to check whether a new wordlist you find is "interesting" at all. Currently duplicut can only remove duplicates within a single file, but how about something like this:
duplicut wordlist1.txt -i wordlist2.txt -o clean-wordlist1.txt
It could be achieved by creating a temporary file that combines wordlist1.txt and wordlist2.txt; it is just important to skip the first n lines (the number of lines in wordlist2.txt) in the output. The rest could function the same way it does currently...

Can this program sort the password dictionary?

Optimize duplicut for SSDs

HDD vs SSD

On HDD, sequential access is relatively fast, while random access is terribly slow. That's why duplicut, written back in 2014, was optimized with HDDs in mind.
At that time, it made no sense to have multiple threads concurrently reading a massive wordlist's content, so sequential access with a single thread was more performant when all lines could fit in the hashmap at once.

Now that we have entered the SSD era, concurrency could bring great performance gains, as random access is way faster.

@solardiz suggested OpenMP, which would probably increase perf a lot.

TODO

compare duplicut/unique/rling on HDD to verify my assumption
compare duplicut/unique/rling on massive wordlist (>30GB)

@solardiz i'd love your suggestions & opinion about duplicut & ways to optimize 😄

core/status: status display sometimes fails to show coherent output

Example

$ ./duplicut chunk1.txt -o dedupe.lst 
time: 0:00:00:22 94.00% (ETA: Tue Oct  6 11:21:57 2020)  step 2/3: cleaning chunk 5/4 (task 11/10) ...

chunk 5 of 4, task 11 of 10 ...
must be fixed

Not 100% sure this has worked.

Hi,
I have a 23 GB wordlist. I ran the program, and it looked as though it cloned/output the exact same file, then went to 0 bytes. Is this normal? I expected a fair few duplicates in this.
It could be useful to get some kind of stats after completion, to tell what it has achieved. At the moment I can't really tell if it's working.

Thanks

Ideas for enhancement

Support for taking input from multiple input files.
A wordlist can be spread across multiple files. Currently I am merging them and then passing the result to duplicut.
It would need to work something like: duplicut -p 1.txt 2.txt 3.txt 4.txt -o output.txt
Progress bar. It need not be accurate; it can be a guesstimate.

Thanks for the software 👍

make gives an error on Ubuntu 23

I wanted to build the project on Ubuntu 23 and I get an error indicating that the project was not built correctly.

rm -rf objects/
rm -f gmon.out
rm -f tags
rm -f duplicut
mkdir -p `dirname objects/main.o`
cc -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-implicit-fallthrough -Wno-error=implicit-function-declaration -mtune=native -ffast-math -O0 -D DEBUG=1 -std=gnu99 -g3 -c src/main.c -o objects/main.o
make: cc: No such file or directory
make: *** [Makefile:63: objects/main.o] Error 127

I will add that I built it previously on Windows 11 and everything was fine, but it doesn't work on Ubuntu.

confusing default `--line-max-size`

I just tested duplicut with a 2.5 GB dictionary.

file dictionary_private.dic : data
time sort -u dictionary_private.dic >dict_sort_uniq.txt
real 5m40,168s
user 13m9,512s
sys 0m7,682s
time duplicut dictionary_private.dic -o dict_dedupe.txt

real 0m47,435s
user 0m32,963s

duplicut is much faster than the "sort -u" command,
but the result is not the same when counting the lines of the new wordlists:

wc -l dict_*
171193011 dict_dedupe.txt
205241662 dict_sort_uniq.txt

Number of lines in the original file:
wc -l dictionary_private.dic
206282806 dictionary_private.dic

What can cause this discrepancy?
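
The issue title points at the default --line-max-size as a likely cause: lines longer than the limit would be dropped entirely, which sort -u never does. One way to check that hypothesis is to count how many lines of the input exceed a given limit. The sketch below does this for an in-memory buffer; the function name is hypothetical and the limit to test is left to the caller, since the default value is not stated here.

```c
#include <stddef.h>
#include <string.h>

/* Counts the lines in buf[0..len) whose length in bytes, excluding the
 * trailing '\n', exceeds max_bytes. A final line without '\n' is counted
 * as a line too. */
size_t count_long_lines(const char *buf, size_t len, size_t max_bytes)
{
    size_t count = 0;
    size_t pos = 0;

    while (pos < len) {
        const char *start = buf + pos;
        const char *nl = memchr(start, '\n', len - pos);
        size_t line_len = nl ? (size_t)(nl - start) : len - pos;

        if (line_len > max_bytes)
            count++;
        pos += line_len + 1;  /* skip the line and its newline */
    }
    return count;
}
```

If the count for the limit in use is close to the gap between the two wc -l results, the length filter is a plausible explanation for the discrepancy.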

Add OSX testing on travis-ci

Duplicut has been initially designed to work on both linux and OSX, so an OSX test-suite for travis-ci should be implemented.

Improve `MEDIUM_LINE_BYTES` guessing with heuristic

MEDIUM_LINE_BYTES is currently hardcoded in const.h, to a value of 8.
The hashmap & chunks are then sized in such a way that, if the real medium length of lines is MEDIUM_LINE_BYTES, the hashmap will be filled by a factor defined by HMAP_LOAD_FACTOR (currently set to 0.5, for 50% hmap filling).

Therefore, we could read some random pages in the file (e.g: start/middle/end of file), and get a better guess of MEDIUM_LINE_BYTES from there.

It would greatly improve performance on wordlists with a lot of very long lines (for example, a list of md5).
Because if lines are 32 bytes long, the hmap will be only 12.5% filled (50%/2/2), and a lot more chunks are needed.
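
The arithmetic above, and the proposed sampling heuristic, can be sketched as follows. Both functions are hypothetical illustrations, not code from duplicut, and the choice to count the newline as part of each line's size is an assumption.

```c
#include <stddef.h>

/* Expected hashmap fill when slots are provisioned for `assumed` bytes per
 * line but lines really take `real` bytes on average. */
double expected_fill(double load_factor, double assumed, double real)
{
    return load_factor * assumed / real;
}

/* Hypothetical heuristic: average bytes per line (newline included) in a
 * sampled page of the file. Returns `fallback` if the sample has no newline. */
double estimate_line_bytes(const char *sample, size_t len, double fallback)
{
    size_t lines = 0;

    for (size_t i = 0; i < len; i++) {
        if (sample[i] == '\n')
            lines++;
    }
    return lines ? (double)len / (double)lines : fallback;
}
```

With a load factor of 0.5, an assumed size of 8 and a real size of 32, expected_fill gives 0.125, which is the 12.5% figure quoted in the issue. Averaging estimate_line_bytes over a few pages taken from the start, middle and end of the file would give the better guess the issue asks for.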

how to use in kali linux?

No output produced (0 byte) for 9.2 Gb tab separated text file

xxx-Product-Name:~/duplicut$ ./duplicut wordhuge.txt -o clean-wordhuge.txt 2>&1 | tee /tmp/duplicut-debug.log
[DLOG1 26 13:16:34 optparse.c:236]: using wordhuge.txt as input file
[DLOG1 26 13:16:34 status.c:70 ]: CALL update_status(FCOPY_START)
[DLOG1 26 13:16:34 status.c:104]: CALL set_status(FILE_SIZE, 9944557870)
[DLOG1 26 13:16:40 file.c:217]:
[DLOG1 26 13:16:40 file.c:218]: ---------- g_file ------------
[DLOG1 26 13:16:40 file.c:219]: g_file->fd: 4
[DLOG1 26 13:16:40 file.c:220]: g_file->name: clean-wordhuge.txt
[DLOG1 26 13:16:40 file.c:221]: g_file->addr: 0x7f09fd9ea000
[DLOG1 26 13:16:40 file.c:222]: g_file->info.st_size: 9.3G (9944557870)
[DLOG1 26 13:16:40 file.c:224]: ------------------------------
[DLOG1 26 13:16:40 config.c:120]:
[DLOG1 26 13:16:40 config.c:121]: --------- memstate -----------
[DLOG1 26 13:16:40 config.c:122]: memstate.page_size: 4096
[DLOG1 26 13:16:40 config.c:123]: memstate.mem_available: 80.5G (86460444672)
[DLOG1 26 13:16:40 config.c:125]: ------------------------------
[DLOG1 26 13:16:40 config.c:127]:
[DLOG1 26 13:16:40 config.c:128]: ---------- g_conf ------------
[DLOG1 26 13:16:40 config.c:129]: g_conf.infile_name: wordhuge.txt
[DLOG1 26 13:16:40 config.c:130]: g_conf.outfile_name: clean-wordhuge.txt
[DLOG1 26 13:16:40 config.c:131]: g_conf.threads: 48
[DLOG1 26 13:16:40 config.c:132]: g_conf.line_max_size: 64
[DLOG1 26 13:16:40 config.c:133]: g_conf.hmap_size: 18.5G (2486139419 slots of 64bits)
[DLOG1 26 13:16:40 config.c:136]: g_conf.chunk_size: 9.3G (9944557870)
[DLOG1 26 13:16:40 config.c:138]: g_conf.filter_printable: 0
[DLOG1 26 13:16:40 config.c:139]: g_conf.memlimit: 9223372036854775807
[DLOG1 26 13:16:40 config.c:140]: ------------------------------
[DLOG1 26 13:16:40 config.c:141]:
[DLOG1 26 13:16:40 status.c:112]: CALL set_status(CHUNK_SIZE, 9944557870)
[DLOG1 26 13:16:40 status.c:74 ]: CALL update_status(TAGDUP_START)
[DLOG1 26 13:16:46 status.c:86 ]: CALL update_status(CTASK_DONE)
[DLOG1 26 13:16:46 status.c:80 ]: CALL update_status(CHUNK_DONE)
[DLOG1 26 13:16:46 status.c:93 ]: CALL update_status(FCLEAN_START)

(On a 24-core machine with 192 GB of RAM, of which about 80 GB is available.)

implement sendfile() copy mode

splice() method could be implemented too, if it works on more platforms

REFERENCE: http://blog.superpat.com/2010/06/01/zero-copy-in-linux-with-sendfile-and-splice/
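
For reference, a sendfile()-based copy could look roughly like the sketch below. It is Linux-specific (it relies on sendfile(2) accepting a regular file as destination, which Linux supports since kernel 2.6.33), the function name is hypothetical, and it is not how duplicut currently copies its input.

```c
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies src to dst with sendfile(2), so the data never passes through a
 * user-space buffer. Returns 0 on success, -1 on error. */
int copy_with_sendfile(const char *src, const char *dst)
{
    int ret = -1;
    int out_fd = -1;
    struct stat st;
    int in_fd = open(src, O_RDONLY);

    if (in_fd < 0)
        return -1;
    if (fstat(in_fd, &st) == 0)
        out_fd = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out_fd >= 0) {
        off_t offset = 0;

        ret = 0;
        /* sendfile() may transfer fewer bytes than requested, so loop. */
        while (offset < st.st_size) {
            ssize_t n = sendfile(out_fd, in_fd, &offset,
                                 (size_t)(st.st_size - offset));
            if (n <= 0) {
                ret = -1;
                break;
            }
        }
        close(out_fd);
    }
    close(in_fd);
    return ret;
}
```

A portable build would still need a read()/write() fallback for platforms where sendfile() is missing or has a different signature.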

Duplicut not cutting all duplicates

I tested duplicut on a VM with files larger than RAM (2GB of RAM). I created a test file with:

yes this is test file | head -c 4GB > test.file

This creates a 4GB file where every line is: this is test file
I used duplicut on the file

duplicut test.file -o test.duplicut

Once it finished, I expected test.duplicut to have exactly 1 line, but to my surprise there were a couple of lines left, 9 to be exact.
All 9 lines were the same: this is test file.
This seemed odd, so I decided to run duplicut again on test.duplicut

duplicut test.duplicut -o test.duplicut.2

After this second run, there is only 1 line left, as I expected from the beginning.

Since this is so strange, I decided to test the same thing again:

duplicut test.file -o second.test.duplicut

But there are exactly 9 lines left again. I have no idea why this happens, but it is reproducible.

display duplicut status line on SIGHUP

It is useful when duplicut is run using stdin as the input file.

[Chore] Typo

-dictionnary
+dictionary

Output to stdout by default

It would be a nice feature to output to stdout by default, as sort -u or anew do.

handle SIGINT to cleanout program on Ctrl-C

The SIGINT unix signal should be caught to properly clean up things such as temporary files that were created.

Can't Install

Can you tell me what I'm doing wrong? I'm using Kali Linux 2019.
Can you tell me how to install it?

root@Kali:~# cd duplicut/
root@Kali:~/duplicut# make release
rm -rf objects/
rm -f gmon.out
rm -f tags
rm -f duplicut
mkdir -p `dirname objects/main.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/main.c -o objects/main.o
mkdir -p `dirname objects/thpool.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/thpool.c -o objects/thpool.o
src/thpool.c: In function ‘thpool_resume’:
src/thpool.c:274:29: warning: unused parameter ‘thpool_p’ [-Wunused-parameter]
  274 | void thpool_resume(thpool_* thpool_p) {
      |                    ~~~~~~~~~^~~~~~~~
src/thpool.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/file.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/file.c -o objects/file.o
mkdir -p `dirname objects/chunk.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/chunk.c -o objects/chunk.o
mkdir -p `dirname objects/line.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/line.c -o objects/line.o
mkdir -p `dirname objects/tag_duplicates.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/tag_duplicates.c -o objects/tag_duplicates.o
mkdir -p `dirname objects/optparse.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/optparse.c -o objects/optparse.o
mkdir -p `dirname objects/config.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/config.c -o objects/config.o
mkdir -p `dirname objects/error.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/error.c -o objects/error.o
mkdir -p `dirname objects/memstate.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/memstate.c -o objects/memstate.o
mkdir -p `dirname objects/meminfo.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/meminfo.c -o objects/meminfo.o
mkdir -p `dirname objects/bytesize.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/bytesize.c -o objects/bytesize.o
mkdir -p `dirname objects/hmap.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hmap.c -o objects/hmap.o
mkdir -p `dirname objects/hash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hash.c -o objects/hash.o
src/hash.c: In function ‘murmur3’:
src/hash.c:24:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   24 |     h = *((unsigned long long*)buf128);
      |          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/hash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/fasthash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/fasthash.c -o objects/fasthash.o
src/fasthash.c: In function ‘fasthash64’:
src/fasthash.c:54:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   54 |  case 7: v ^= (uint64_t)pos2[6] << 48;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:55:2: note: here
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |  ^~~~
src/fasthash.c:55:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:56:2: note: here
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |  ^~~~
src/fasthash.c:56:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:57:2: note: here
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |  ^~~~
src/fasthash.c:57:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:58:2: note: here
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |  ^~~~
src/fasthash.c:58:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:59:2: note: here
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |  ^~~~
src/fasthash.c:59:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:60:2: note: here
   60 |  case 1: v ^= (uint64_t)pos2[0];
      |  ^~~~
src/fasthash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/murmur3.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/murmur3.c -o objects/murmur3.o
src/murmur3.c: In function ‘MurmurHash3_x86_32’:
src/murmur3.c:110:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  110 |   case 3: k1 ^= tail[2] << 16;
      |           ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:111:3: note: here
  111 |   case 2: k1 ^= tail[1] << 8;
      |   ^~~~
src/murmur3.c:111:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  111 |   case 2: k1 ^= tail[1] << 8;
      |           ~~~^~~~~~~~~~~~~~~
src/murmur3.c:112:3: note: here
  112 |   case 1: k1 ^= tail[0];
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x86_128’:
src/murmur3.c:186:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  186 |   case 15: k4 ^= tail[14] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:187:3: note: here
  187 |   case 14: k4 ^= tail[13] << 8;
      |   ^~~~
src/murmur3.c:187:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  187 |   case 14: k4 ^= tail[13] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:188:3: note: here
  188 |   case 13: k4 ^= tail[12] << 0;
      |   ^~~~
src/murmur3.c:189:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  189 |            k4 *= c4; k4  = ROTL32(k4,18); k4 *= c1; h4 ^= k4;
      |                                                     ~~~^~~~~
src/murmur3.c:191:3: note: here
  191 |   case 12: k3 ^= tail[11] << 24;
      |   ^~~~
src/murmur3.c:191:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  191 |   case 12: k3 ^= tail[11] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:192:3: note: here
  192 |   case 11: k3 ^= tail[10] << 16;
      |   ^~~~
src/murmur3.c:192:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  192 |   case 11: k3 ^= tail[10] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:193:3: note: here
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |   ^~~~
src/murmur3.c:193:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:194:3: note: here
  194 |   case  9: k3 ^= tail[ 8] << 0;
      |   ^~~~
src/murmur3.c:195:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  195 |            k3 *= c3; k3  = ROTL32(k3,17); k3 *= c4; h3 ^= k3;
      |                                                     ~~~^~~~~
src/murmur3.c:197:3: note: here
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |   ^~~~
src/murmur3.c:197:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:198:3: note: here
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |   ^~~~
src/murmur3.c:198:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:199:3: note: here
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |   ^~~~
src/murmur3.c:199:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:200:3: note: here
  200 |   case  5: k2 ^= tail[ 4] << 0;
      |   ^~~~
src/murmur3.c:201:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  201 |            k2 *= c2; k2  = ROTL32(k2,16); k2 *= c3; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:203:3: note: here
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |   ^~~~
src/murmur3.c:203:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:204:3: note: here
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |   ^~~~
src/murmur3.c:204:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:205:3: note: here
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |   ^~~~
src/murmur3.c:205:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:206:3: note: here
  206 |   case  1: k1 ^= tail[ 0] << 0;
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x64_128’:
src/murmur3.c:276:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  276 |   case 15: k2 ^= (uint64_t)(tail[14]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:277:3: note: here
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |   ^~~~
src/murmur3.c:277:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:278:3: note: here
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |   ^~~~
src/murmur3.c:278:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:279:3: note: here
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |   ^~~~
src/murmur3.c:279:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:280:3: note: here
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |   ^~~~
src/murmur3.c:280:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:281:3: note: here
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |   ^~~~
src/murmur3.c:281:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:282:3: note: here
  282 |   case  9: k2 ^= (uint64_t)(tail[ 8]) << 0;
      |   ^~~~
src/murmur3.c:283:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  283 |            k2 *= c2; k2  = ROTL64(k2,33); k2 *= c1; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:285:3: note: here
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |   ^~~~
src/murmur3.c:285:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:286:3: note: here
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |   ^~~~
src/murmur3.c:286:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:287:3: note: here
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |   ^~~~
src/murmur3.c:287:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:288:3: note: here
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |   ^~~~
src/murmur3.c:288:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:289:3: note: here
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |   ^~~~
src/murmur3.c:289:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:290:3: note: here
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |   ^~~~
src/murmur3.c:290:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:291:3: note: here
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |   ^~~~
src/murmur3.c:291:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:292:3: note: here
  292 |   case  1: k1 ^= (uint64_t)(tail[ 0]) << 0;
      |   ^~~~
src/murmur3.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/status.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/status.c -o objects/status.o
mkdir -p `dirname objects/user_input.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/user_input.c -o objects/user_input.o
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -O2 -D NDEBUG -o duplicut  objects/main.o  objects/thpool.o  objects/file.o  objects/chunk.o  objects/line.o  objects/tag_duplicates.o  objects/optparse.o  objects/config.o  objects/error.o  objects/memstate.o  objects/meminfo.o  objects/bytesize.o  objects/hmap.o  objects/hash.o  objects/fasthash.o  objects/murmur3.o  objects/status.o  objects/user_input.o -lm -pthread
strip -s duplicut
root@ENiSEC:~/duplicut# make release
rm -rf objects/
rm -f gmon.out
rm -f tags
rm -f duplicut
mkdir -p `dirname objects/main.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/main.c -o objects/main.o
mkdir -p `dirname objects/thpool.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/thpool.c -o objects/thpool.o
src/thpool.c: In function ‘thpool_resume’:
src/thpool.c:274:29: warning: unused parameter ‘thpool_p’ [-Wunused-parameter]
  274 | void thpool_resume(thpool_* thpool_p) {
      |                    ~~~~~~~~~^~~~~~~~
src/thpool.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/file.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/file.c -o objects/file.o
mkdir -p `dirname objects/chunk.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/chunk.c -o objects/chunk.o
mkdir -p `dirname objects/line.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/line.c -o objects/line.o
mkdir -p `dirname objects/tag_duplicates.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/tag_duplicates.c -o objects/tag_duplicates.o
mkdir -p `dirname objects/optparse.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/optparse.c -o objects/optparse.o
mkdir -p `dirname objects/config.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/config.c -o objects/config.o
mkdir -p `dirname objects/error.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/error.c -o objects/error.o
mkdir -p `dirname objects/memstate.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/memstate.c -o objects/memstate.o
mkdir -p `dirname objects/meminfo.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/meminfo.c -o objects/meminfo.o
mkdir -p `dirname objects/bytesize.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/bytesize.c -o objects/bytesize.o
mkdir -p `dirname objects/hmap.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hmap.c -o objects/hmap.o
mkdir -p `dirname objects/hash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hash.c -o objects/hash.o
src/hash.c: In function ‘murmur3’:
src/hash.c:24:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   24 |     h = *((unsigned long long*)buf128);
      |          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/hash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/fasthash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/fasthash.c -o objects/fasthash.o
src/fasthash.c: In function ‘fasthash64’:
src/fasthash.c:54:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   54 |  case 7: v ^= (uint64_t)pos2[6] << 48;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:55:2: note: here
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |  ^~~~
src/fasthash.c:55:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:56:2: note: here
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |  ^~~~
src/fasthash.c:56:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:57:2: note: here
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |  ^~~~
src/fasthash.c:57:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:58:2: note: here
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |  ^~~~
src/fasthash.c:58:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:59:2: note: here
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |  ^~~~
src/fasthash.c:59:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:60:2: note: here
   60 |  case 1: v ^= (uint64_t)pos2[0];
      |  ^~~~
src/fasthash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/murmur3.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/murmur3.c -o objects/murmur3.o
src/murmur3.c: In function ‘MurmurHash3_x86_32’:
src/murmur3.c:110:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  110 |   case 3: k1 ^= tail[2] << 16;
      |           ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:111:3: note: here
  111 |   case 2: k1 ^= tail[1] << 8;
      |   ^~~~
src/murmur3.c:111:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  111 |   case 2: k1 ^= tail[1] << 8;
      |           ~~~^~~~~~~~~~~~~~~
src/murmur3.c:112:3: note: here
  112 |   case 1: k1 ^= tail[0];
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x86_128’:
src/murmur3.c:186:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  186 |   case 15: k4 ^= tail[14] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:187:3: note: here
  187 |   case 14: k4 ^= tail[13] << 8;
      |   ^~~~
src/murmur3.c:187:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  187 |   case 14: k4 ^= tail[13] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:188:3: note: here
  188 |   case 13: k4 ^= tail[12] << 0;
      |   ^~~~
src/murmur3.c:189:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  189 |            k4 *= c4; k4  = ROTL32(k4,18); k4 *= c1; h4 ^= k4;
      |                                                     ~~~^~~~~
src/murmur3.c:191:3: note: here
  191 |   case 12: k3 ^= tail[11] << 24;
      |   ^~~~
src/murmur3.c:191:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  191 |   case 12: k3 ^= tail[11] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:192:3: note: here
  192 |   case 11: k3 ^= tail[10] << 16;
      |   ^~~~
src/murmur3.c:192:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  192 |   case 11: k3 ^= tail[10] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:193:3: note: here
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |   ^~~~
src/murmur3.c:193:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:194:3: note: here
  194 |   case  9: k3 ^= tail[ 8] << 0;
      |   ^~~~
src/murmur3.c:195:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  195 |            k3 *= c3; k3  = ROTL32(k3,17); k3 *= c4; h3 ^= k3;
      |                                                     ~~~^~~~~
src/murmur3.c:197:3: note: here
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |   ^~~~
src/murmur3.c:197:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:198:3: note: here
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |   ^~~~
src/murmur3.c:198:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:199:3: note: here
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |   ^~~~
src/murmur3.c:199:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:200:3: note: here
  200 |   case  5: k2 ^= tail[ 4] << 0;
      |   ^~~~
src/murmur3.c:201:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  201 |            k2 *= c2; k2  = ROTL32(k2,16); k2 *= c3; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:203:3: note: here
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |   ^~~~
src/murmur3.c:203:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:204:3: note: here
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |   ^~~~
src/murmur3.c:204:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:205:3: note: here
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |   ^~~~
src/murmur3.c:205:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:206:3: note: here
  206 |   case  1: k1 ^= tail[ 0] << 0;
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x64_128’:
src/murmur3.c:276:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  276 |   case 15: k2 ^= (uint64_t)(tail[14]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:277:3: note: here
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |   ^~~~
src/murmur3.c:277:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:278:3: note: here
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |   ^~~~
src/murmur3.c:278:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:279:3: note: here
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |   ^~~~
src/murmur3.c:279:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:280:3: note: here
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |   ^~~~
src/murmur3.c:280:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:281:3: note: here
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |   ^~~~
src/murmur3.c:281:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:282:3: note: here
  282 |   case  9: k2 ^= (uint64_t)(tail[ 8]) << 0;
      |   ^~~~
src/murmur3.c:283:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  283 |            k2 *= c2; k2  = ROTL64(k2,33); k2 *= c1; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:285:3: note: here
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |   ^~~~
src/murmur3.c:285:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:286:3: note: here
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |   ^~~~
src/murmur3.c:286:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:287:3: note: here
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |   ^~~~
src/murmur3.c:287:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:288:3: note: here
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |   ^~~~
src/murmur3.c:288:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:289:3: note: here
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |   ^~~~
src/murmur3.c:289:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:290:3: note: here
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |   ^~~~
src/murmur3.c:290:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:291:3: note: here
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |   ^~~~
src/murmur3.c:291:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:292:3: note: here
  292 |   case  1: k1 ^= (uint64_t)(tail[ 0]) << 0;
      |   ^~~~
src/murmur3.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/status.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/status.c -o objects/status.o
mkdir -p `dirname objects/user_input.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/user_input.c -o objects/user_input.o
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -O2 -D NDEBUG -o duplicut  objects/main.o  objects/thpool.o  objects/file.o  objects/chunk.o  objects/line.o  objects/tag_duplicates.o  objects/optparse.o  objects/config.o  objects/error.o  objects/memstate.o  objects/meminfo.o  objects/bytesize.o  objects/hmap.o  objects/hash.o  objects/fasthash.o  objects/murmur3.o  objects/status.o  objects/user_input.o -lm -pthread
strip -s duplicut
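
The `-Wstrict-aliasing` warning reported at `src/hash.c:24` in the log above comes from reading a buffer through a cast to `unsigned long long *`. A common way to silence this class of warning is to copy the bytes with `memcpy` instead of dereferencing a cast pointer. The sketch below is only an illustration: `first_64_bits` is a hypothetical helper, and it assumes `buf128` holds at least 8 readable bytes. It is not taken from duplicut's source.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from duplicut): return the first 8 bytes of a
 * digest buffer as a 64-bit integer without type-punning.
 * Assumption: buf128 points to at least 8 readable bytes. */
uint64_t first_64_bits(const void *buf128)
{
    uint64_t h;

    assert(buf128 != NULL);
    /* memcpy is allowed to alias any object type, so the compiler has
     * nothing to warn about; optimizers usually turn this into one load. */
    memcpy(&h, buf128, sizeof h);
    return h;
}
```

The result has the same byte order as the cast-based read, so on a given platform the value should be unchanged.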

Feature request: Word length

Hi
Using the same technique as for duplicates, would it be possible to remove words shorter or longer than a given length (min/max word length)?
Could the same method also split one large wordlist into separate wordlists by word length, e.g. 8-character words, 9-character words, 10-character words, etc.?

Just a thought as you've nailed the duplicates with this technique
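
To make the request concrete, here is a minimal standalone sketch of a min/max length filter. It is not duplicut's code: `length_in_range` and `filter_by_length` are hypothetical names, length is counted in bytes (not characters), and the line ending is excluded from the count.

```c
#define _POSIX_C_SOURCE 200809L /* for getline() */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: true when the word's byte length, ignoring a
 * trailing "\n" or "\r\n", lies within [min_len, max_len] inclusive. */
bool length_in_range(const char *word, size_t min_len, size_t max_len)
{
    assert(word != NULL && min_len <= max_len);
    size_t len = strcspn(word, "\r\n");
    return len >= min_len && len <= max_len;
}

/* Hypothetical helper: copy lines from `in` to `out`, keeping only those
 * whose length is in range. Input order is preserved.
 * Returns the number of lines kept. */
size_t filter_by_length(FILE *in, FILE *out, size_t min_len, size_t max_len)
{
    char *line = NULL;
    size_t cap = 0, kept = 0;

    while (getline(&line, &cap, in) != -1) {
        if (length_in_range(line, min_len, max_len)) {
            fputs(line, out);
            kept++;
        }
    }
    free(line);
    return kept;
}
```

Splitting by length could reuse the same predicate with one output stream per length, for example calling `filter_by_length(in, out8, 8, 8)` after rewinding the input for each target length.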

Run Duplicut on Windows?

Hello, is it possible to compile Duplicut and run it on Windows 10?

Verbose output

Hi,
I think it would be a nice addition to have duplicut show how many duplicate entries got removed and how many junk entries were filtered out etc.

add verbosity level (-vvv) option

everything is in the title ...

Seems like it hangs

I am using it on Kali Linux 2019.4 VM
I gave the VM 24 GB of RAM and 4 cores so resources shouldn't be a problem
The command I used:
/root/duplicut/duplicut combined.txt -l 60 -o clean1.txt
time: 0:01:44:58 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
time: 0:01:56:10 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
time: 0:02:02:24 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
time: 0:02:20:17 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
The file I am trying to clean is 24 GB, and it has been sitting at 5% for more than 3 hours so far.

ensure that INFILE and OUTFILE are not the same

otherwise, the file will be deleted ...

reproducing the bug:

duplicut file.txt -o file.txt
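
One possible guard, sketched under assumptions: on POSIX systems two paths refer to the same file when their device and inode numbers match, which also catches aliases such as `./file.txt`, symlinks, and hard links. `same_file` is a hypothetical helper name, not an existing duplicut function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>

/* Hypothetical helper (not from duplicut): true when both paths resolve
 * to the same existing file. If either path cannot be stat()ed, for
 * example because the output file does not exist yet, they cannot be
 * the same file, so the result is false. */
bool same_file(const char *infile, const char *outfile)
{
    struct stat in_st, out_st;

    assert(infile != NULL && outfile != NULL);
    if (stat(infile, &in_st) != 0 || stat(outfile, &out_st) != 0)
        return false;
    /* Same device and same inode means same underlying file. */
    return in_st.st_dev == out_st.st_dev && in_st.st_ino == out_st.st_ino;
}
```

Such a check would have to run before the output file is opened for writing, so the program can exit with an error instead of overwriting its own input.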

Inconsistency sometimes occurs across multiple runs on the same file

Running with the default command on larger files (over 1GB) leads to inconsistency across multiple runs