
duplicut's Introduction

Duplicut ✂️

Quickly dedupe massive wordlists, without changing the order



Mentioned in awesome-pentest

Created by nil0x42 and contributors


📖 Overview

Nowadays, password wordlist creation usually implies concatenating multiple data sources.

Ideally, the most probable passwords should come first in the wordlist, so that the most common passwords are cracked instantly.

With existing dedupe tools, you are forced to choose between preserving the order OR handling massive wordlists.

Unfortunately, wordlist creation requires both.

So I wrote duplicut in highly optimized C to address this very specific need 🤓 💻


💡 Quick start

git clone https://github.com/nil0x42/duplicut
cd duplicut/ && make
./duplicut wordlist.txt -o clean-wordlist.txt

🔧 Options

  • Features:

    • Handle massive wordlists, even those whose size exceeds available RAM
    • Filter lines by max length (-l option)
    • Can remove lines containing non-printable ASCII chars (-p option)
    • Press any key to show program status at runtime.
  • Implementation:

    • Written in pure C code, designed to be fast
    • Compressed hashmap items on 64 bit platforms
    • Multithreading support
  • Limitations:

    • Any line longer than 255 chars is ignored

📖 Technical Details

🔸 1- Memory optimized:

A uint64 is enough to index a line in the hashmap, by packing its size info into the pointer's unused bits.
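
The packing idea can be sketched as follows. This is only an illustration under assumptions: the 56/8 bit split, the use of a byte offset instead of a raw pointer, and the names `slot_pack`, `slot_offset` and `slot_length` are hypothetical and not taken from duplicut's source. An 8-bit size field would be consistent with the 255-character line limit listed above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout (an assumption, not duplicut's actual one):
 * low 56 bits = byte offset of the line, high 8 bits = line length. */
#define SLOT_OFFSET_BITS 56
#define SLOT_OFFSET_MASK ((UINT64_C(1) << SLOT_OFFSET_BITS) - 1)

/* Pack an offset and a length (0..255) into a single 64-bit slot. */
static uint64_t slot_pack(uint64_t offset, uint8_t length)
{
    assert(offset <= SLOT_OFFSET_MASK); /* offset must fit in 56 bits */
    return ((uint64_t)length << SLOT_OFFSET_BITS) | offset;
}

/* Recover the offset stored in the low 56 bits. */
static uint64_t slot_offset(uint64_t slot)
{
    return slot & SLOT_OFFSET_MASK;
}

/* Recover the length stored in the high 8 bits. */
static uint8_t slot_length(uint64_t slot)
{
    return (uint8_t)(slot >> SLOT_OFFSET_BITS);
}
```

With such a layout, each hashmap item costs 8 bytes, and the line length is known without touching the line itself.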

🔸 2- Massive file handling:

If the whole file can't fit in memory, it is split into virtual chunks, in such a way that each chunk uses as much RAM as possible.

Each chunk is then loaded into the hashmap, deduped, and tested against subsequent chunks.

That way, execution time is bounded by at most the n-th triangle number of chunk passes, n(n+1)/2 for n chunks.
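
The bound can be checked with a small counting sketch. It assumes one pass for loading and deduping a chunk, plus one pass for each later chunk tested against it; the function name `count_chunk_passes` is hypothetical and the code does not read any file.

```c
#include <assert.h>
#include <stdint.h>

/* Count the passes of the scheme described above: chunk i is loaded and
 * deduped (one pass), then every later chunk j > i is tested against it. */
static uint64_t count_chunk_passes(uint64_t n_chunks)
{
    uint64_t passes = 0;

    for (uint64_t i = 0; i < n_chunks; i++) {
        passes++;                          /* load and dedupe chunk i */
        for (uint64_t j = i + 1; j < n_chunks; j++)
            passes++;                      /* test chunk j against chunk i */
    }
    /* Sanity check against the closed form, the n-th triangle number. */
    assert(passes == n_chunks * (n_chunks + 1) / 2);
    return passes;
}
```

For example, a file split into 4 chunks needs at most 4 + 3 + 2 + 1 = 10 passes, while a file that fits in a single chunk needs only one.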

💡 Troubleshooting

If you find a bug, or something doesn't work as expected, please compile duplicut in debug mode and post an issue with attached output:

# debug level can be from 1 to 4
make debug level=1
./duplicut [OPTIONS] 2>&1 | tee /tmp/duplicut-debug.log

duplicut's People

Contributors

imgbotapp, nil0x42, solardiz, timgates42


duplicut's Issues

Purge both duplicates

First suggestion on GitHub, but I would be interested in a flag so that, when a duplicate is found, it is NOT written to the output file.

I think I could cat together a number of wordlists that I've already tried and dedupe them, then generate a larger wordlist, cat it with my merged wordlist, and dedupe again, this time with the flag set so that the dupes are not written to the output file at all. That would leave me with a list of words not yet tried. Hope that makes sense.

Many unique lines with spaces or non-ASCII characters being deleted

Thanks so much for crafting and sharing duplicut.

It appears to delete many unique lines containing spaces or non-ASCII characters, e.g.,

$ cat 1

foo
bar
pass with spaces
a pass word
another unique password

$ duplicut 1 -o 1.out

$ cat 1.out

foo
bar
a pass word

Any line with 5 or more Japanese characters is cut:

$ cat 2

一
十一
百十一
千百十一
万千百十一
ほげふがぴよ

$ duplicut 2 -o 2.out

$ cat 2.out

一
十一
百十一
千百十一
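
One thing worth checking, as an assumption rather than a confirmed diagnosis: whether a length limit that counts bytes instead of characters is involved. In UTF-8, each of the Japanese characters above takes 3 bytes, so the surviving lines are at most 12 bytes long ("a pass word" is 11, "千百十一" is 12) while every dropped line is at least 15 bytes long ("万千百十一" is 15, "pass with spaces" is 16). The helper names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Length of a NUL-terminated string in bytes, as a byte-based limit sees it. */
static size_t utf8_bytes(const char *s)
{
    assert(s != NULL);
    return strlen(s);
}

/* Number of Unicode code points in a valid UTF-8 string: every byte that
 * is not a continuation byte (bit pattern 10xxxxxx) starts a new one. */
static size_t utf8_codepoints(const char *s)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    return n;
}
```

Comparing both values for each input line shows which lines would cross a given byte limit even though they look short on screen.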

Add support for removing duplicates from other file

It's often useful to be able to check whether a new wordlist you find is "interesting" at all. Currently duplicut can only remove duplicates within a single file, but how about something like this:
duplicut wordlist1.txt -i wordlist2.txt -o clean-wordlist1.txt
It could be achieved by creating a temporary file that combines wordlist1.txt and wordlist2.txt; it is just important to skip the first n lines (the number of lines in wordlist2.txt) in the output. The rest could function the same way it does currently...
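
The requested behaviour can be illustrated in memory. Note that this sketch swaps in a different technique from the temporary-file idea above: it sorts the reference list and looks each candidate up with a binary search. It is not how duplicut works, the `-i` option is only the proposal from this issue, and `filter_new_lines` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Comparator for qsort/bsearch over arrays of C strings. */
static int cmp_str(const void *a, const void *b)
{
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* Keep only the lines of `cand` that do not appear in `ref`, preserving
 * their original order. `ref` is sorted in place. Kept lines are written
 * to `out`, which must have room for n_cand entries. Returns the number
 * of kept lines. Duplicates inside `cand` itself are not removed here. */
static size_t filter_new_lines(const char **ref, size_t n_ref,
                               const char **cand, size_t n_cand,
                               const char **out)
{
    size_t kept = 0;

    assert(out != NULL);
    qsort(ref, n_ref, sizeof *ref, cmp_str);
    for (size_t i = 0; i < n_cand; i++)
        if (bsearch(&cand[i], ref, n_ref, sizeof *ref, cmp_str) == NULL)
            out[kept++] = cand[i];
    return kept;
}
```

Sorting only the reference list keeps the candidate order intact, which is the property the wordlist use case cares about.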

Optimize duplicut for SSDs

HDD vs SSD

On HDD, sequential access is relatively fast, while random access is terribly slow. That's why duplicut, written back in 2014, was optimized with that in mind.
At that time, it made no sense to have multiple threads concurrently reading a massive wordlist's content, so sequential access with a single thread was more performant when all lines could fit in the hashmap at once.

Now that we have entered the SSD era, concurrency could bring great performance gains, as random access is way faster.

@solardiz suggested OpenMP, which would probably increase perf a lot.

TODO

  • compare duplicut/unique/rling on HDD to verify my assumption
  • compare duplicut/unique/rling on massive wordlist (>30GB)

@solardiz I'd love your suggestions & opinion about duplicut & ways to optimize it 😄

Not 100% sure this has worked.

Hi,
I have a 23 GB wordlist. I ran the program, and it looked as though it cloned/output the exact same file, which then went to 0 bytes. Is this normal? I expected a fair few duplicates in this.
It could be useful to get some kind of stats after completion, to tell what it has achieved. At the moment I can't really tell if it's working.

Thanks

Ideas for enhancement

  1. Support for taking input from multiple input files.
    Wordlist can be spread across multiple files. Currently I am merging it and then passing to duplicut.
    It would need to work something like: duplicut -p 1.txt 2.txt 3.txt 4.txt -o output.txt

  2. Progress bar. Need not be accurate. Can be a guesstimate.

Thanks for the software 👍

make gives an error on Ubuntu 23

I wanted to build a project on Ubuntu 23 and I get an error that indicates that the project was not built correctly.

rm -rf objects/
rm -f gmon.out
rm -f tags
rm -f duplicut
mkdir -p 'dirname objects/main.o'
cc -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-implicit-fallthrough -Wno-error=implicit-function-declaration -mtune=native -ffast-math -O0 -D DEBUG=1 -std=gnu99 -g3 -c src/main.c -o objects/main.o
make: cc: No such file or directory
make: *** [Makefile:63: objects/main.o] Error 127

I will add that I built it previously on Windows 11 and everything was fine, but it doesn't work on Ubuntu.

confusing default `--line-max-size`

I just tested duplicut with a 2.5 GB dictionary.

file dictionary_private.dic : data
time sort -u dictionary_private.dic >dict_sort_uniq.txt
real 5m40,168s
user 13m9,512s
sys 0m7,682s
time duplicut dictionary_private.dic -o dict_dedupe.txt

real 0m47,435s
user 0m32,963s

duplicut is much faster than the "sort -u" command,
but the result is not the same. Counting the lines of the new wordlists:

wc -l dict_*
171193011 dict_dedupe.txt
205241662 dict_sort_uniq.txt

number of lines of the original file:
wc -l dictionary_private.dic
206282806 dictionary_private.dic
What can cause this discrepancy?
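
Given the issue title, one way to test whether a line length limit accounts for the gap is to count how many input lines exceed a given size. The sketch below is a hypothetical diagnostic helper working on an in-memory buffer; it counts bytes, excludes the newline, and makes no claim about how duplicut itself measures lines.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the lines in buf[0..len) whose length in bytes, newline excluded,
 * is strictly greater than max_bytes. A last line without a trailing
 * newline is taken into account as well. */
static size_t count_long_lines(const char *buf, size_t len, size_t max_bytes)
{
    size_t count = 0;
    size_t start = 0;   /* index of the first byte of the current line */

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i <= len; i++) {
        if (i == len || buf[i] == '\n') {
            if (i - start > max_bytes)
                count++;
            start = i + 1;
        }
    }
    return count;
}
```

If the count for the limit in use is close to the number of missing lines, the discrepancy is likely explained by filtering rather than by deduplication.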

Add OSX testing on travis-ci

Duplicut was initially designed to work on both Linux and OSX, so an OSX test-suite for travis-ci should be implemented.

Improve `MEDIUM_LINE_BYTES` guessing with heuristic

MEDIUM_LINE_BYTES is currently hardcoded in const.h, to a value of 8.
The hashmap & chunks are then sized in such a way that, if the real average line length is MEDIUM_LINE_BYTES, the hashmap will be filled by a factor defined by HMAP_LOAD_FACTOR (currently set to 0.5, for 50% hmap filling).

Therefore, we could read some random pages in the file (e.g. start/middle/end of file), and get a better guess of MEDIUM_LINE_BYTES from there.

It would greatly improve performance on wordlists with a lot of very long lines (for example, a list of md5 hashes).
Because if lines are 32 bytes long, the hmap will be only 12.5% full (50%/2/2), and a lot more chunks are needed.
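
The arithmetic and the sampling idea can be sketched as follows. Both function names are hypothetical, and counting the newline as part of each line's size is an assumption; the real MEDIUM_LINE_BYTES may be defined differently.

```c
#include <assert.h>
#include <stddef.h>

/* Expected hashmap fill when the map is sized for `assumed_bytes` per
 * line but lines really average `real_bytes`. */
static double expected_fill(double load_factor, double assumed_bytes,
                            double real_bytes)
{
    assert(real_bytes > 0.0);
    return load_factor * assumed_bytes / real_bytes;
}

/* Estimate the average bytes per line (newline included) from a sample
 * buffer. Only complete lines are counted; a trailing partial line is
 * ignored. Returns 0 if the sample holds no complete line. */
static double estimate_line_bytes(const char *sample, size_t len)
{
    size_t lines = 0;
    size_t last_end = 0;    /* index just past the last newline seen */

    for (size_t i = 0; i < len; i++) {
        if (sample[i] == '\n') {
            lines++;
            last_end = i + 1;
        }
    }
    return lines ? (double)last_end / (double)lines : 0.0;
}
```

With the values quoted above, `expected_fill(0.5, 8, 32)` gives 0.125, matching the 12.5% figure. Averaging `estimate_line_bytes` over a few pages taken from the start, middle and end of the file would give the better guess suggested here.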

No output produced (0 byte) for 9.2 Gb tab separated text file

xxx-Product-Name:~/duplicut$ ./duplicut wordhuge.txt -o clean-wordhuge.txt 2>&1 | tee /tmp/duplicut-debug.log
[DLOG1 26 13:16:34 optparse.c:236]: using wordhuge.txt as input file
[DLOG1 26 13:16:34 status.c:70 ]: CALL update_status(FCOPY_START)
[DLOG1 26 13:16:34 status.c:104]: CALL set_status(FILE_SIZE, 9944557870)
[DLOG1 26 13:16:40 file.c:217]:
[DLOG1 26 13:16:40 file.c:218]: ---------- g_file ------------
[DLOG1 26 13:16:40 file.c:219]: g_file->fd: 4
[DLOG1 26 13:16:40 file.c:220]: g_file->name: clean-wordhuge.txt
[DLOG1 26 13:16:40 file.c:221]: g_file->addr: 0x7f09fd9ea000
[DLOG1 26 13:16:40 file.c:222]: g_file->info.st_size: 9.3G (9944557870)
[DLOG1 26 13:16:40 file.c:224]: ------------------------------
[DLOG1 26 13:16:40 config.c:120]:
[DLOG1 26 13:16:40 config.c:121]: --------- memstate -----------
[DLOG1 26 13:16:40 config.c:122]: memstate.page_size: 4096
[DLOG1 26 13:16:40 config.c:123]: memstate.mem_available: 80.5G (86460444672)
[DLOG1 26 13:16:40 config.c:125]: ------------------------------
[DLOG1 26 13:16:40 config.c:127]:
[DLOG1 26 13:16:40 config.c:128]: ---------- g_conf ------------
[DLOG1 26 13:16:40 config.c:129]: g_conf.infile_name: wordhuge.txt
[DLOG1 26 13:16:40 config.c:130]: g_conf.outfile_name: clean-wordhuge.txt
[DLOG1 26 13:16:40 config.c:131]: g_conf.threads: 48
[DLOG1 26 13:16:40 config.c:132]: g_conf.line_max_size: 64
[DLOG1 26 13:16:40 config.c:133]: g_conf.hmap_size: 18.5G (2486139419 slots of 64bits)
[DLOG1 26 13:16:40 config.c:136]: g_conf.chunk_size: 9.3G (9944557870)
[DLOG1 26 13:16:40 config.c:138]: g_conf.filter_printable: 0
[DLOG1 26 13:16:40 config.c:139]: g_conf.memlimit: 9223372036854775807
[DLOG1 26 13:16:40 config.c:140]: ------------------------------
[DLOG1 26 13:16:40 config.c:141]:
[DLOG1 26 13:16:40 status.c:112]: CALL set_status(CHUNK_SIZE, 9944557870)
[DLOG1 26 13:16:40 status.c:74 ]: CALL update_status(TAGDUP_START)
[DLOG1 26 13:16:46 status.c:86 ]: CALL update_status(CTASK_DONE)
[DLOG1 26 13:16:46 status.c:80 ]: CALL update_status(CHUNK_DONE)
[DLOG1 26 13:16:46 status.c:93 ]: CALL update_status(FCLEAN_START)

(On a 24-core machine with 192 GB of RAM, of which 80 GB is available.)

Duplicut not cutting all duplicates

I tested duplicut on a VM with files larger than RAM (2GB of RAM). I created a test file with

yes this is test file | head -c 4GB > test.file

This creates a 4GB file where every line is "this is test file".
I used duplicut on the file

duplicut test.file -o test.duplicut

Once it finished, I expected test.duplicut to have exactly 1 line, but to my surprise there were a couple of lines left, 9 to be exact.
All 9 lines were the same: "this is test file".
This seemed odd, so I decided to run duplicut again on test.duplicut

duplicut test.duplicut -o test.duplicut.2

After this second run, there is only 1 line left as I expected it from the beginning.

Since this is so strange, I decided to test the same thing again:

duplicut test.file -o second.test.duplicut

But there are exactly 9 lines left again. I have no idea why this happens, but it is reproducible.

Can't install

Can you tell me what I'm doing wrong? I'm using Kali Linux 2019.
Can you tell me how to install it?

root@Kali:~# cd duplicut/
root@Kali:~/duplicut# make release
rm -rf objects/
rm -f gmon.out
rm -f tags
rm -f duplicut
mkdir -p `dirname objects/main.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/main.c -o objects/main.o
mkdir -p `dirname objects/thpool.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/thpool.c -o objects/thpool.o
src/thpool.c: In function ‘thpool_resume’:
src/thpool.c:274:29: warning: unused parameter ‘thpool_p’ [-Wunused-parameter]
  274 | void thpool_resume(thpool_* thpool_p) {
      |                    ~~~~~~~~~^~~~~~~~
src/thpool.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/file.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/file.c -o objects/file.o
mkdir -p `dirname objects/chunk.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/chunk.c -o objects/chunk.o
mkdir -p `dirname objects/line.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/line.c -o objects/line.o
mkdir -p `dirname objects/tag_duplicates.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/tag_duplicates.c -o objects/tag_duplicates.o
mkdir -p `dirname objects/optparse.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/optparse.c -o objects/optparse.o
mkdir -p `dirname objects/config.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/config.c -o objects/config.o
mkdir -p `dirname objects/error.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/error.c -o objects/error.o
mkdir -p `dirname objects/memstate.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/memstate.c -o objects/memstate.o
mkdir -p `dirname objects/meminfo.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/meminfo.c -o objects/meminfo.o
mkdir -p `dirname objects/bytesize.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/bytesize.c -o objects/bytesize.o
mkdir -p `dirname objects/hmap.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hmap.c -o objects/hmap.o
mkdir -p `dirname objects/hash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hash.c -o objects/hash.o
src/hash.c: In function ‘murmur3’:
src/hash.c:24:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   24 |     h = *((unsigned long long*)buf128);
      |          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/hash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/fasthash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/fasthash.c -o objects/fasthash.o
src/fasthash.c: In function ‘fasthash64’:
src/fasthash.c:54:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   54 |  case 7: v ^= (uint64_t)pos2[6] << 48;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:55:2: note: here
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |  ^~~~
src/fasthash.c:55:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:56:2: note: here
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |  ^~~~
src/fasthash.c:56:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:57:2: note: here
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |  ^~~~
src/fasthash.c:57:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:58:2: note: here
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |  ^~~~
src/fasthash.c:58:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:59:2: note: here
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |  ^~~~
src/fasthash.c:59:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:60:2: note: here
   60 |  case 1: v ^= (uint64_t)pos2[0];
      |  ^~~~
src/fasthash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/murmur3.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/murmur3.c -o objects/murmur3.o
src/murmur3.c: In function ‘MurmurHash3_x86_32’:
src/murmur3.c:110:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  110 |   case 3: k1 ^= tail[2] << 16;
      |           ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:111:3: note: here
  111 |   case 2: k1 ^= tail[1] << 8;
      |   ^~~~
src/murmur3.c:111:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  111 |   case 2: k1 ^= tail[1] << 8;
      |           ~~~^~~~~~~~~~~~~~~
src/murmur3.c:112:3: note: here
  112 |   case 1: k1 ^= tail[0];
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x86_128’:
src/murmur3.c:186:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  186 |   case 15: k4 ^= tail[14] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:187:3: note: here
  187 |   case 14: k4 ^= tail[13] << 8;
      |   ^~~~
src/murmur3.c:187:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  187 |   case 14: k4 ^= tail[13] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:188:3: note: here
  188 |   case 13: k4 ^= tail[12] << 0;
      |   ^~~~
src/murmur3.c:189:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  189 |            k4 *= c4; k4  = ROTL32(k4,18); k4 *= c1; h4 ^= k4;
      |                                                     ~~~^~~~~
src/murmur3.c:191:3: note: here
  191 |   case 12: k3 ^= tail[11] << 24;
      |   ^~~~
src/murmur3.c:191:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  191 |   case 12: k3 ^= tail[11] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:192:3: note: here
  192 |   case 11: k3 ^= tail[10] << 16;
      |   ^~~~
src/murmur3.c:192:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  192 |   case 11: k3 ^= tail[10] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:193:3: note: here
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |   ^~~~
src/murmur3.c:193:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:194:3: note: here
  194 |   case  9: k3 ^= tail[ 8] << 0;
      |   ^~~~
src/murmur3.c:195:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  195 |            k3 *= c3; k3  = ROTL32(k3,17); k3 *= c4; h3 ^= k3;
      |                                                     ~~~^~~~~
src/murmur3.c:197:3: note: here
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |   ^~~~
src/murmur3.c:197:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:198:3: note: here
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |   ^~~~
src/murmur3.c:198:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:199:3: note: here
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |   ^~~~
src/murmur3.c:199:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:200:3: note: here
  200 |   case  5: k2 ^= tail[ 4] << 0;
      |   ^~~~
src/murmur3.c:201:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  201 |            k2 *= c2; k2  = ROTL32(k2,16); k2 *= c3; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:203:3: note: here
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |   ^~~~
src/murmur3.c:203:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:204:3: note: here
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |   ^~~~
src/murmur3.c:204:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:205:3: note: here
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |   ^~~~
src/murmur3.c:205:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:206:3: note: here
  206 |   case  1: k1 ^= tail[ 0] << 0;
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x64_128’:
src/murmur3.c:276:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  276 |   case 15: k2 ^= (uint64_t)(tail[14]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:277:3: note: here
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |   ^~~~
src/murmur3.c:277:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:278:3: note: here
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |   ^~~~
src/murmur3.c:278:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:279:3: note: here
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |   ^~~~
src/murmur3.c:279:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:280:3: note: here
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |   ^~~~
src/murmur3.c:280:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:281:3: note: here
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |   ^~~~
src/murmur3.c:281:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:282:3: note: here
  282 |   case  9: k2 ^= (uint64_t)(tail[ 8]) << 0;
      |   ^~~~
src/murmur3.c:283:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  283 |            k2 *= c2; k2  = ROTL64(k2,33); k2 *= c1; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:285:3: note: here
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |   ^~~~
src/murmur3.c:285:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:286:3: note: here
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |   ^~~~
src/murmur3.c:286:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:287:3: note: here
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |   ^~~~
src/murmur3.c:287:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:288:3: note: here
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |   ^~~~
src/murmur3.c:288:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:289:3: note: here
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |   ^~~~
src/murmur3.c:289:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:290:3: note: here
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |   ^~~~
src/murmur3.c:290:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:291:3: note: here
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |   ^~~~
src/murmur3.c:291:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:292:3: note: here
  292 |   case  1: k1 ^= (uint64_t)(tail[ 0]) << 0;
      |   ^~~~
src/murmur3.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/status.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/status.c -o objects/status.o
mkdir -p `dirname objects/user_input.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/user_input.c -o objects/user_input.o
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -O2 -D NDEBUG -o duplicut  objects/main.o  objects/thpool.o  objects/file.o  objects/chunk.o  objects/line.o  objects/tag_duplicates.o  objects/optparse.o  objects/config.o  objects/error.o  objects/memstate.o  objects/meminfo.o  objects/bytesize.o  objects/hmap.o  objects/hash.o  objects/fasthash.o  objects/murmur3.o  objects/status.o  objects/user_input.o -lm -pthread
strip -s duplicut
root@ENiSEC:~/duplicut# make release
rm -rf objects/
rm -f gmon.out
rm -f tags
rm -f duplicut
mkdir -p `dirname objects/main.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/main.c -o objects/main.o
mkdir -p `dirname objects/thpool.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/thpool.c -o objects/thpool.o
src/thpool.c: In function ‘thpool_resume’:
src/thpool.c:274:29: warning: unused parameter ‘thpool_p’ [-Wunused-parameter]
  274 | void thpool_resume(thpool_* thpool_p) {
      |                    ~~~~~~~~~^~~~~~~~
src/thpool.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/file.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/file.c -o objects/file.o
mkdir -p `dirname objects/chunk.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/chunk.c -o objects/chunk.o
mkdir -p `dirname objects/line.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/line.c -o objects/line.o
mkdir -p `dirname objects/tag_duplicates.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/tag_duplicates.c -o objects/tag_duplicates.o
mkdir -p `dirname objects/optparse.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/optparse.c -o objects/optparse.o
mkdir -p `dirname objects/config.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/config.c -o objects/config.o
mkdir -p `dirname objects/error.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/error.c -o objects/error.o
mkdir -p `dirname objects/memstate.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/memstate.c -o objects/memstate.o
mkdir -p `dirname objects/meminfo.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/meminfo.c -o objects/meminfo.o
mkdir -p `dirname objects/bytesize.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/bytesize.c -o objects/bytesize.o
mkdir -p `dirname objects/hmap.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hmap.c -o objects/hmap.o
mkdir -p `dirname objects/hash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/hash.c -o objects/hash.o
src/hash.c: In function ‘murmur3’:
src/hash.c:24:11: warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]
   24 |     h = *((unsigned long long*)buf128);
      |          ~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/hash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/fasthash.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/fasthash.c -o objects/fasthash.o
src/fasthash.c: In function ‘fasthash64’:
src/fasthash.c:54:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   54 |  case 7: v ^= (uint64_t)pos2[6] << 48;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:55:2: note: here
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |  ^~~~
src/fasthash.c:55:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   55 |  case 6: v ^= (uint64_t)pos2[5] << 40;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:56:2: note: here
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |  ^~~~
src/fasthash.c:56:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   56 |  case 5: v ^= (uint64_t)pos2[4] << 32;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:57:2: note: here
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |  ^~~~
src/fasthash.c:57:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   57 |  case 4: v ^= (uint64_t)pos2[3] << 24;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:58:2: note: here
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |  ^~~~
src/fasthash.c:58:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   58 |  case 3: v ^= (uint64_t)pos2[2] << 16;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:59:2: note: here
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |  ^~~~
src/fasthash.c:59:12: warning: this statement may fall through [-Wimplicit-fallthrough=]
   59 |  case 2: v ^= (uint64_t)pos2[1] << 8;
      |          ~~^~~~~~~~~~~~~~~~~~~~~~~~~
src/fasthash.c:60:2: note: here
   60 |  case 1: v ^= (uint64_t)pos2[0];
      |  ^~~~
src/fasthash.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/murmur3.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/murmur3.c -o objects/murmur3.o
src/murmur3.c: In function ‘MurmurHash3_x86_32’:
src/murmur3.c:110:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  110 |   case 3: k1 ^= tail[2] << 16;
      |           ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:111:3: note: here
  111 |   case 2: k1 ^= tail[1] << 8;
      |   ^~~~
src/murmur3.c:111:14: warning: this statement may fall through [-Wimplicit-fallthrough=]
  111 |   case 2: k1 ^= tail[1] << 8;
      |           ~~~^~~~~~~~~~~~~~~
src/murmur3.c:112:3: note: here
  112 |   case 1: k1 ^= tail[0];
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x86_128’:
src/murmur3.c:186:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  186 |   case 15: k4 ^= tail[14] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:187:3: note: here
  187 |   case 14: k4 ^= tail[13] << 8;
      |   ^~~~
src/murmur3.c:187:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  187 |   case 14: k4 ^= tail[13] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:188:3: note: here
  188 |   case 13: k4 ^= tail[12] << 0;
      |   ^~~~
src/murmur3.c:189:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  189 |            k4 *= c4; k4  = ROTL32(k4,18); k4 *= c1; h4 ^= k4;
      |                                                     ~~~^~~~~
src/murmur3.c:191:3: note: here
  191 |   case 12: k3 ^= tail[11] << 24;
      |   ^~~~
src/murmur3.c:191:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  191 |   case 12: k3 ^= tail[11] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:192:3: note: here
  192 |   case 11: k3 ^= tail[10] << 16;
      |   ^~~~
src/murmur3.c:192:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  192 |   case 11: k3 ^= tail[10] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:193:3: note: here
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |   ^~~~
src/murmur3.c:193:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  193 |   case 10: k3 ^= tail[ 9] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:194:3: note: here
  194 |   case  9: k3 ^= tail[ 8] << 0;
      |   ^~~~
src/murmur3.c:195:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  195 |            k3 *= c3; k3  = ROTL32(k3,17); k3 *= c4; h3 ^= k3;
      |                                                     ~~~^~~~~
src/murmur3.c:197:3: note: here
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |   ^~~~
src/murmur3.c:197:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  197 |   case  8: k2 ^= tail[ 7] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:198:3: note: here
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |   ^~~~
src/murmur3.c:198:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  198 |   case  7: k2 ^= tail[ 6] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:199:3: note: here
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |   ^~~~
src/murmur3.c:199:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  199 |   case  6: k2 ^= tail[ 5] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:200:3: note: here
  200 |   case  5: k2 ^= tail[ 4] << 0;
      |   ^~~~
src/murmur3.c:201:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  201 |            k2 *= c2; k2  = ROTL32(k2,16); k2 *= c3; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:203:3: note: here
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |   ^~~~
src/murmur3.c:203:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  203 |   case  4: k1 ^= tail[ 3] << 24;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:204:3: note: here
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |   ^~~~
src/murmur3.c:204:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  204 |   case  3: k1 ^= tail[ 2] << 16;
      |            ~~~^~~~~~~~~~~~~~~~~
src/murmur3.c:205:3: note: here
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |   ^~~~
src/murmur3.c:205:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  205 |   case  2: k1 ^= tail[ 1] << 8;
      |            ~~~^~~~~~~~~~~~~~~~
src/murmur3.c:206:3: note: here
  206 |   case  1: k1 ^= tail[ 0] << 0;
      |   ^~~~
src/murmur3.c: In function ‘MurmurHash3_x64_128’:
src/murmur3.c:276:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  276 |   case 15: k2 ^= (uint64_t)(tail[14]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:277:3: note: here
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |   ^~~~
src/murmur3.c:277:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  277 |   case 14: k2 ^= (uint64_t)(tail[13]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:278:3: note: here
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |   ^~~~
src/murmur3.c:278:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  278 |   case 13: k2 ^= (uint64_t)(tail[12]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:279:3: note: here
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |   ^~~~
src/murmur3.c:279:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  279 |   case 12: k2 ^= (uint64_t)(tail[11]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:280:3: note: here
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |   ^~~~
src/murmur3.c:280:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  280 |   case 11: k2 ^= (uint64_t)(tail[10]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:281:3: note: here
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |   ^~~~
src/murmur3.c:281:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  281 |   case 10: k2 ^= (uint64_t)(tail[ 9]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:282:3: note: here
  282 |   case  9: k2 ^= (uint64_t)(tail[ 8]) << 0;
      |   ^~~~
src/murmur3.c:283:56: warning: this statement may fall through [-Wimplicit-fallthrough=]
  283 |            k2 *= c2; k2  = ROTL64(k2,33); k2 *= c1; h2 ^= k2;
      |                                                     ~~~^~~~~
src/murmur3.c:285:3: note: here
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |   ^~~~
src/murmur3.c:285:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  285 |   case  8: k1 ^= (uint64_t)(tail[ 7]) << 56;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:286:3: note: here
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |   ^~~~
src/murmur3.c:286:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  286 |   case  7: k1 ^= (uint64_t)(tail[ 6]) << 48;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:287:3: note: here
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |   ^~~~
src/murmur3.c:287:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  287 |   case  6: k1 ^= (uint64_t)(tail[ 5]) << 40;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:288:3: note: here
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |   ^~~~
src/murmur3.c:288:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  288 |   case  5: k1 ^= (uint64_t)(tail[ 4]) << 32;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:289:3: note: here
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |   ^~~~
src/murmur3.c:289:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  289 |   case  4: k1 ^= (uint64_t)(tail[ 3]) << 24;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:290:3: note: here
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |   ^~~~
src/murmur3.c:290:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  290 |   case  3: k1 ^= (uint64_t)(tail[ 2]) << 16;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:291:3: note: here
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |   ^~~~
src/murmur3.c:291:15: warning: this statement may fall through [-Wimplicit-fallthrough=]
  291 |   case  2: k1 ^= (uint64_t)(tail[ 1]) << 8;
      |            ~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~
src/murmur3.c:292:3: note: here
  292 |   case  1: k1 ^= (uint64_t)(tail[ 0]) << 0;
      |   ^~~~
src/murmur3.c: At top level:
cc1: warning: unrecognized command line option ‘-Wno-unknown-warning-option’
mkdir -p `dirname objects/status.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/status.c -o objects/status.o
mkdir -p `dirname objects/user_input.o`
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -c src/user_input.c -o objects/user_input.o
cc  -Iinclude -Wall -Wextra -Wdisabled-optimization -Winline -Wdouble-promotion -Wunknown-pragmas -Wno-unknown-warning-option -mtune=native -ffast-math -O2 -D NDEBUG -O2 -D NDEBUG -o duplicut  objects/main.o  objects/thpool.o  objects/file.o  objects/chunk.o  objects/line.o  objects/tag_duplicates.o  objects/optparse.o  objects/config.o  objects/error.o  objects/memstate.o  objects/meminfo.o  objects/bytesize.o  objects/hmap.o  objects/hash.o  objects/fasthash.o  objects/murmur3.o  objects/status.o  objects/user_input.o -lm -pthread
strip -s duplicut

Feature request: Word length

Hi
Using the same technique as for duplicates, would it be possible to remove words that fall outside a given length range (min/max word length)?
Could the same method also split a wordlist into separate wordlists based on word length, so that one large wordlist becomes separate lists of 8-char words, 9-char words, 10-char words, etc.?

Just a thought as you've nailed the duplicates with this technique
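
The request above could be sketched in C roughly as follows. This is only an illustration of the idea, not duplicut's code: `line_length`, `keep_line` and `bucket_name` are hypothetical helper names, and the `<prefix>.<length>` output naming is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: length of a word, not counting the line terminator. */
size_t line_length(const char *line)
{
    return strcspn(line, "\r\n");
}

/* True when the word length lies within [min_len, max_len] (inclusive). */
bool keep_line(const char *line, size_t min_len, size_t max_len)
{
    size_t len = line_length(line);
    return len >= min_len && len <= max_len;
}

/*
 * Build the name of the per-length output file for a line, e.g. "words.8"
 * for an 8-char word. The "<prefix>.<length>" scheme is an assumption.
 * Returns the snprintf() result (negative on error).
 */
int bucket_name(char *buf, size_t bufsize, const char *prefix, const char *line)
{
    return snprintf(buf, bufsize, "%s.%zu", prefix, line_length(line));
}
```

A caller would read the wordlist line by line, skip lines for which `keep_line` returns false, and append the remaining lines to the file named by `bucket_name`. Because each line goes to exactly one bucket, the original order is preserved inside every per-length list.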

Verbose output

Hi,
I think it would be a nice addition to have duplicut show how many duplicate entries got removed and how many junk entries were filtered out etc.

Seems like it hangs

I am using it on a Kali Linux 2019.4 VM.
I gave the VM 24 GB of RAM and 4 cores, so resources shouldn't be a problem.
The command I used:
/root/duplicut/duplicut combined.txt -l 60 -o clean1.txt
time: 0:01:44:58 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
time: 0:01:56:10 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
time: 0:02:02:24 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
time: 0:02:20:17 5.00% (ETA: unknown) step 2/3: cleaning chunk 1/5 ...
The file I am trying to clean is 24 GB, and it has been sitting at 5% for more than 3 hours so far.

Inconsistency sometimes occurs across multiple runs on the same file

Running the default command on larger files (over 1 GB) gives inconsistent results across multiple runs:

$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'

duplicut successfully removed 0 duplicates and 42 filtered lines in 05 seconds

$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'

duplicut successfully removed 384 duplicates and 0 filtered lines in 02 seconds

$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'

duplicut successfully removed 0 duplicates and 384 filtered lines in 02 seconds

$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'

duplicut successfully removed 0 duplicates and 385 filtered lines in 02 seconds

$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'

duplicut successfully removed 221 duplicates and 385 filtered lines in 02 seconds

$ duplicut '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.found' -o '/media/sf_kalishare/Wordlists/hashmob.net_2023-05-14.medium.DUPLICUT'

duplicut successfully removed 378 duplicates and 385 filtered lines in 02 seconds

Any idea why this is occurring? I was expecting the same results on every run.
On further testing, it seems the cleanup stats go wild when writing to an already existing file, resulting in inconsistent file sizes and word counts.

Transform lines to lowercase

Wordlists are typically used with transform rules. Personally, I would like an option to transform all lines to lowercase.

Perhaps additionally even an option to strip digits ([0-9]) and special characters.
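
A minimal sketch of both transforms in C, assuming ASCII input and the default "C" locale. The function names `lowercase_line` and `strip_non_alpha` are hypothetical and are not part of duplicut.

```c
#include <ctype.h>
#include <stddef.h>

/* Lowercase a NUL-terminated line in place (ASCII letters only in the "C" locale). */
void lowercase_line(char *line)
{
    for (; *line != '\0'; line++)
        *line = (char)tolower((unsigned char)*line);
}

/*
 * Drop every character that is not a letter, keeping the newline so the
 * line structure survives. Works in place and returns the new length.
 */
size_t strip_non_alpha(char *line)
{
    char *out = line;

    for (const char *in = line; *in != '\0'; in++) {
        if (isalpha((unsigned char)*in) || *in == '\n')
            *out++ = *in;
    }
    *out = '\0';
    return (size_t)(out - line);
}
```

Note that either transform can turn previously distinct lines into duplicates (for example `Password1` and `password`), so it would have to run before the dedupe step rather than after it.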

Use a more performant hash function

xxHash (used by rling) and t1ha (which Solar Designer considers a hash function candidate for JtR) are good candidates to test, to see whether duplicut's performance improves.

[BUG] duplicut hangs with a specific wordlist

Launching duplicut on the following wordlist triggers a bug: the process stays alive and never exits.

wordlist:

445
139
3389
5985
135
137
80
443
22
50000
21
1720
80
443
143
623
3306
110
5432
25
22
23
1521
50013
161
2222
17185
135
8080
4848
1433
5560
512
513
514
445
5900
5901
5902
5903
5904
5905
5906
5907
5908
5909
5038
111
139
49
515
7787
2947
7144
9080
8812
2525
2207
3050
5405
1723
1099
5555
921
10001
123
3690
548
617
6112
6667
3632
783
10050
38292
12174
2967
5168
3628
7777
6101
10000
6504
41523
41524
2000
1900
10202
6503
6070
6502
6050
2103
41025
44334
2100
5554
12203
26000
4000
1000
8014
5250
34443
8028
8008
7510
9495
1581
8000
18881
57772
9090
9999
81
3000
8300
8800
8090
389
10203
5093
1533
13500
705
4659
20031
16102
6080
6660
11000
19810
3057
6905
1100
10616
10628
5051
1582
65535
105
22222
30000
113
1755
407
1434
2049
689
3128
20222
20034
7580
7579
38080
12401
910
912
11234
46823
5061
5060
2380
69
5800
62514
42
5631
902
5985
5986
6000
6001
6002
6003
6004
6005
6006
6007
47001
523
3500
6379
8834

[enhancement] Sort options

Hi! First of all, thank you for this project.
Could I ask you to consider implementing a sort feature?
