edoardottt / cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

Home Page: https://edoardoottavianelli.it

License: GNU General Public License v3.0

Go 98.83% Makefile 0.35% Batchfile 0.82%
endpoints endpoint-discovery bugbounty crawler secret-keys secrets-detection infosec reconnaissance recon crawling

cariddi's People

Contributors

cyb3rjerry, dependabot[bot], edoardottt, mrnfrancesco, noraj, ocervell, rodnt, w1kend


cariddi's Issues

Regex (or other method) for Intensive mode

Bug description
When cariddi runs without the -intensive flag it works fine: all the URLs crawled on the target(s) belong to the input domain(s).
This is the method used for the normal behaviour (line 108). As you can see on line 108, there is a colly option that restricts the crawler to URLs belonging to the input domain(s).

Instead, when the -intensive flag is set, this is the method used: lines 114 to 119.
The problem with this method is that it can produce false positives, like facebook.com?q=c.target.com.
I'm trying to figure out how to pick only target URLs even when cariddi is running in intensive mode.

Just to be clear:

  • The normal behaviour (without -intensive) is that if you pass target.com as input, cariddi crawls only (and strictly) URLs belonging to target.com.
  • If -intensive is set, cariddi should crawl all URLs belonging to *.target.com.

Additional context
Go-colly is the module used for crawling.
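A minimal sketch of one possible approach (not cariddi's actual code): parse each discovered URL and compare only its host against the input domain, which avoids regex false positives such as facebook.com?q=c.target.com. belongsToTarget is a hypothetical helper name.

package main

import (
    "fmt"
    "net/url"
    "strings"
)

// belongsToTarget reports whether rawURL's host is target or a subdomain of it.
func belongsToTarget(rawURL, target string) bool {
    u, err := url.Parse(rawURL)
    if err != nil {
        return false
    }
    host := u.Hostname()
    return host == target || strings.HasSuffix(host, "."+target)
}

func main() {
    fmt.Println(belongsToTarget("https://login.target.com/x", "target.com"))           // true
    fmt.Println(belongsToTarget("https://facebook.com/?q=c.target.com", "target.com")) // false
}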

Can't parse URL list with ports

Describe the bug
If you provide a URL with a port (not every web app lives on port 80 or 443 =) ), cariddi can't parse it.

For example:

echo http://ya.ru:80 | cariddi
The URL provided is not built in a proper way: http://ya.ru:80
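For reference, a minimal sketch showing that Go's net/url handles the host:port form, so validation could inspect the hostname and port separately instead of rejecting the whole URL (this is an illustration, not cariddi's parsing code):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    u, err := url.Parse("http://ya.ru:80")
    if err != nil {
        panic(err)
    }
    fmt.Println(u.Hostname()) // "ya.ru" (host without the port)
    fmt.Println(u.Port())     // "80"
}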

slice bounds out of range

Describe the bug
After a while we (for some reason) get a slice out of bounds error.

goroutine 25981 [running]:
github.com/edoardottt/cariddi/pkg/crawler.New.func15(0xc072bd1380)
	/opt/hacking/repository/cariddi/pkg/crawler/colly.go:215 +0x950
github.com/gocolly/colly.(*Collector).handleOnResponse(0xc072bd1380?, 0xc072bd1380)
	/home/cyb3rjerry/go/pkg/mod/github.com/gocolly/[email protected]/colly.go:936 +0x1be
github.com/gocolly/colly.(*Collector).fetch(0xc00018d520, {0x0?, 0x0?}, {0x8c1eaa, 0x3}, 0x1, {0x0?, 0x0}, 0x0?, 0xc0304a96e0, ...)
	/home/cyb3rjerry/go/pkg/mod/github.com/gocolly/[email protected]/colly.go:621 +0x69b
created by github.com/gocolly/colly.(*Collector).scrape
	/home/cyb3rjerry/go/pkg/mod/github.com/gocolly/[email protected]/colly.go:532 +0x645

To Reproduce
Steps to reproduce the behavior:

  1. run echo "https://mapbox.com/" | cariddi -s -intensive

Desktop (please complete the following information):
Linux cyb3rjerry 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Second ctrl+c should force quit the program

Describe the bug
Pressing CTRL+C once initiates the "smooth" exit correctly. However, pressing it a second time should "force quit"; currently the program simply hangs.

To Reproduce
Steps to reproduce the behavior:

  1. Run any long scan
  2. Press CTRL+C twice
  3. See error

Expected behavior
Force quit on second CTRL+C


Desktop (please complete the following information):
Linux cyb3rjerry 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
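A minimal sketch of the suggested behaviour, assuming a plain os/signal handler (not cariddi's actual shutdown code): the first SIGINT starts a graceful stop, the second forces the exit.

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 2)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

    done := make(chan struct{})
    go func() {
        <-sigs
        fmt.Println("received CTRL+C, finishing in-flight requests...")
        close(done) // tell the crawler to stop gracefully
        <-sigs
        fmt.Println("received second CTRL+C, forcing exit")
        os.Exit(1)
    }()

    // ... crawler work would go here, watching <-done ...
    <-done
    fmt.Println("graceful shutdown complete")
}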

help

internal/unsafeheader

compile: version "go1.17.6" does not match go tool version "go1.18.2"

internal/cpu

compile: version "go1.17.6" does not match go tool version "go1.18.2"

internal/abi

How can I fix this?

Thanks.

Raw requests improvements

A few features would be useful for storing raw responses (-sr):

Rationale:
A custom directory is nice to have when you run in distributed environments and want to save all requests to a shared mount, so that later you can run batch tools to collect all the raw requests and analyze them offline (with cariddi, nuclei, etc.).

The stored_response_path field can be useful when saving results to a database, so that the corresponding response txt file can be retrieved later.

Too many files open

Describe the bug
When there are too many ongoing requests, cariddi opens the output files too many times (especially with both -ot and -oh enabled).

To Reproduce
Just run cariddi with txt and html output enabled.
echo <TARGET> | cariddi -e -ot out-txt -oh out-html

Expected behavior
Run without problems and finish; don't interrupt with errors.

Error

2021/07/03 16:16:40 open output-cariddi/<TARGET>.results.txt: too many open files
2021/07/03 16:16:40 invalid argument
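A minimal sketch of one way to avoid exhausting file descriptors (an assumption about the fix, not cariddi's implementation): open each output file once, reuse the handle behind a mutex, and flush on close instead of re-opening the file for every result.

package main

import (
    "bufio"
    "os"
    "sync"
)

// ResultWriter keeps a single open handle per output file.
type ResultWriter struct {
    mu  sync.Mutex
    buf *bufio.Writer
    f   *os.File
}

func NewResultWriter(path string) (*ResultWriter, error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
    if err != nil {
        return nil, err
    }
    return &ResultWriter{f: f, buf: bufio.NewWriter(f)}, nil
}

// WriteLine appends one result line without re-opening the file.
func (w *ResultWriter) WriteLine(line string) error {
    w.mu.Lock()
    defer w.mu.Unlock()
    _, err := w.buf.WriteString(line + "\n")
    return err
}

func (w *ResultWriter) Close() error {
    w.mu.Lock()
    defer w.mu.Unlock()
    if err := w.buf.Flush(); err != nil {
        return err
    }
    return w.f.Close()
}

func main() {
    w, err := NewResultWriter("out.txt")
    if err != nil {
        panic(err)
    }
    defer w.Close()
    _ = w.WriteLine("https://example.com/found")
}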

Proxy settings not honored

It seems that proxies are not honored: looking at the traffic in Wireshark, I see some requests not going through any proxy.

I think this is related to gocolly/colly#392

We probably need to set

c.WithTransport(&http.Transport{
  DisableKeepAlives: true,
})

in the code here
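A minimal sketch of that idea (illustrative, not a patch to cariddi): disabling keep-alives forces new connections, and setting Proxy on the same transport keeps every request on the proxy. The newCollector helper and the proxy address are assumptions.

package main

import (
    "net/http"
    "net/url"

    "github.com/gocolly/colly"
)

func newCollector(proxyAddr string) (*colly.Collector, error) {
    c := colly.NewCollector()
    proxyURL, err := url.Parse(proxyAddr)
    if err != nil {
        return nil, err
    }
    c.WithTransport(&http.Transport{
        Proxy:             http.ProxyURL(proxyURL),
        DisableKeepAlives: true, // force new connections so the proxy is always used
    })
    return c, nil
}

func main() {
    c, err := newCollector("http://127.0.0.1:8080")
    if err != nil {
        panic(err)
    }
    _ = c
}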

The URL provided is not built in a proper way: ${URL}

The URL provided is not built in a proper way: www.edoardoottavianelli.it

I tried with the sample I saw in the video, but it's still not working. I have all the packages and libraries installed, Go included, so I don't know why it is not working. Have a nice Sunday!

*EDIT
It works using PowerShell but not the terminal.
Also, some endpoints don't work, maybe because they are private, e.g. https://api.nike.com

Enhancement of Sorting URLs like URO

Hi @edoardottt

Hope you are doing well !!

Cariddi is an awesome tool that I came across recently and use actively.

It effectively collects and crawls endpoints, finds juicy things, and more...

Describe the solution you'd like

I would like to suggest an enhancement: sort and deduplicate URLs like URO does, so the output isn't bloated with duplicates and junk entries.

Describe alternatives you've considered

Currently I'm using URO, which is written in Python.

Additional context

Example output

Kindly let me know if I missed anything; I'm glad to help with any questions or additional info if needed.

Thanks & Regards,
@zy9ard3
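A minimal sketch of the kind of URO-style deduplication being asked for (an illustration, not a proposed patch): collapse URLs that differ only in query-parameter values by keying on host, path, and the sorted parameter names.

package main

import (
    "fmt"
    "net/url"
    "sort"
    "strings"
)

// dedupKey builds a key from host, path and the sorted query parameter names,
// so URLs differing only in parameter values map to the same key.
func dedupKey(raw string) (string, bool) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", false
    }
    names := make([]string, 0, len(u.Query()))
    for name := range u.Query() {
        names = append(names, name)
    }
    sort.Strings(names)
    return u.Host + u.Path + "?" + strings.Join(names, "&"), true
}

// Dedup keeps the first URL seen for each key.
func Dedup(urls []string) []string {
    seen := make(map[string]bool)
    var out []string
    for _, raw := range urls {
        key, ok := dedupKey(raw)
        if !ok || seen[key] {
            continue
        }
        seen[key] = true
        out = append(out, raw)
    }
    return out
}

func main() {
    fmt.Println(Dedup([]string{
        "https://example.com/p?id=1",
        "https://example.com/p?id=2", // dropped: same host, path and param names
        "https://example.com/q?id=1",
    }))
}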

Add JSON functionality and console reporting of secrets, errors

It would be great if the tool could take a -json option to output JSON like similar tools do (katana, gospider, gau).

It could output JSON Lines, with a 'type' key for secrets, regex matches, etc., instead of putting everything in a folder.

Example output:

{"source":"href","type":"url","output":"https://example.com/path/","status":403,"length":140}
{"source":"body","type":"url","output":"https://example.com/path2/","status":403,"length":140}
{"source":"https://example.com/path3.js?v=1652869476","type":"secretfinder","output":"<AWS_SECRET_KEY>","status":0,"length":0}

Thoughts ?
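A minimal sketch of what such a record could look like in Go, matching the example lines above (field and type names are illustrative, not an agreed schema):

package main

import (
    "encoding/json"
    "fmt"
)

type Result struct {
    Source string `json:"source"` // e.g. "href", "body", or the URL of the JS file
    Type   string `json:"type"`   // e.g. "url", "secretfinder"
    Output string `json:"output"`
    Status int    `json:"status"`
    Length int    `json:"length"`
}

func main() {
    r := Result{Source: "href", Type: "url", Output: "https://example.com/path/", Status: 403, Length: 140}
    line, _ := json.Marshal(r)
    fmt.Println(string(line)) // one JSON Lines record per finding
}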

Problem with Crawling POST Parameters

Hello developers,

The tool is great, but after many scans I've found that it does not crawl and catch all the parameters on a page, especially the parameters in the "Filter Categories"; most of these filters are sent as POST requests.

Here is a live example for the filter categories:
when you check a filter, the parameter is added so you can see it in the URL.

I tried scanning with several modes (-ext, -e, -c); none of them caught the POST parameters.

?manufacturer=1-batella&c=1-baby-food&label=1-new&s[flavour]=mango&price_from=2&price_to=3

Step 1
https://i.ibb.co/0KNTH20/1.png

Step 2
https://i.ibb.co/y69q7x7/2.png

Step 3
https://i.ibb.co/YpD7vWJ/3.png

I hope my explanation is clear, and I hope the developers find a way to improve the crawling technique so the tool also crawls these POST requests and extracts more parameters.

Best regards, and keep this tool up!

Add intensive switch

When providing a list of subdomains to cariddi, like:

cat alive.txt | cariddi

normally it takes one line at a time and fires a crawler instance looking for resources on that website.
If the first line is (just as an example) sub1.example.com, it will look only for resources in that subdomain, having this statement in the colly (colly, the Go-based crawler) settings:

c.AllowedDomains(target)

This means that if there are resources pointing to login.example.com, cariddi will not consider them.

With the -intensive switch, cariddi doesn't care about allowed domains; instead, a regex matches the second-level domain.
This means that every resource found on sub1.example.com will be crawled, even if it doesn't belong to that exact subdomain.
This is a computationally heavy process and it's likely to produce a high rate of duplicate resources in the standard output, but not in the output file.

This means two things:

  1. You want to go really deep in the recon process.
  2. You will see a lot of duplicates in the standard output, but not in the output file (target.results.txt).

So please don't use the stdout as input for another command; use the txt output file instead!
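A minimal sketch of the two modes as colly collector options (flag handling and the exact regex are assumptions, not cariddi's code): normal mode pins the collector to the exact input host, while intensive mode accepts any host ending in the registrable domain.

package main

import (
    "regexp"

    "github.com/gocolly/colly"
)

// newCollector builds a collector that is either pinned to the exact target
// host (normal mode) or accepts any *.target host (intensive mode).
func newCollector(target string, intensive bool) *colly.Collector {
    if !intensive {
        return colly.NewCollector(colly.AllowedDomains(target))
    }
    re := regexp.MustCompile(`(?i)^https?://([a-z0-9-]+\.)*` + regexp.QuoteMeta(target) + `(/|$|\?|:)`)
    return colly.NewCollector(colly.URLFilters(re))
}

func main() {
    _ = newCollector("example.com", false) // strict: only example.com
    _ = newCollector("example.com", true)  // intensive: *.example.com too
}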

Add https://github.com/aels/subdirectories-discover as a wordlist

Is your feature request related to a problem? Please describe.
Ways to enrich crawler results.

Describe the solution you'd like
Add more lists to the brute force stage.
For example, take them from https://github.com/aels/subdirectories-discover (yes, I'm the curator of this list).

Signal Killed error

When using -cache, and cariddi reads data really fast from the cache, it eats too much RAM and the process crashes with the error signal: killed.

Quick workaround: don't use -cache. Sorry for this. If you have ideas or knowledge about how to solve this problem, just comment down here.

Scan Only

@edoardottt It would be helpful if cariddi had a feature to only scan endpoints (without crawling), e.g. with an argument like -only-scan:

echo "https://example.com/x.js" | cariddi -only-scan
cat "links.txt" | cariddi -onlyscan

JSON lines aggregate results

When crawling a target and searching for info, if multiple matches occur in the same URL the JSON struct holds distinct elements instead of a single element with an array of matches:

Now:

{
  "url": "http://testphp.vulnweb.com/index.php",
  "method": "GET",
  "status_code": 200,
  "words": 388,
  "lines": 110,
  "content_type": "text/html",
  "matches": {
    "filetype": {
      "extension": "php",
      "severity": 5
    },
    "infos": [
      {
        "name": "Email address",
        "match": "[email protected]"
      },
      {
        "name": "HTML comment",
        "match": "<!-- InstanceEndEditable -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- here goes headers headers -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- end masthead -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- begin content -->"
      },
      {
        "name": "HTML comment",
        "match": "<!--end content -->"
      },
      {
        "name": "HTML comment",
        "match": "<!--end navbar -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- InstanceEnd -->"
      }
    ]
  }
}

Desired output:

{
  "url": "http://testphp.vulnweb.com/index.php",
  "method": "GET",
  "status_code": 200,
  "words": 388,
  "lines": 110,
  "content_type": "text/html",
  "matches": {
    "filetype": {
      "extension": "php",
      "severity": 5
    },
    "infos": [
      {
        "name": "Email address",
        "match": [
          "[email protected]"
        ]
      },
      {
        "name": "HTML comment",
        "match": [
          "<!-- InstanceEndEditable -->",
          "<!-- here goes headers headers -->",
          "<!-- end masthead -->",
          "<!-- begin content -->",
          "<!--end content -->",
          "<!--end navbar -->",
          "<!-- InstanceEnd -->"
        ]
      }
    ]
  }
}

cc @ocervell what do you think?
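A minimal sketch of the grouping this implies (type and field names are illustrative, not cariddi's actual structs): the match field becomes a slice, and matches sharing the same name are merged into one entry.

package main

import (
    "encoding/json"
    "fmt"
)

type Info struct {
    Name  string   `json:"name"`
    Match []string `json:"match"` // previously a single string per match
}

// groupMatches turns flat (name, match) pairs into one Info per name.
func groupMatches(pairs [][2]string) []Info {
    index := map[string]int{}
    var out []Info
    for _, p := range pairs {
        name, match := p[0], p[1]
        i, ok := index[name]
        if !ok {
            index[name] = len(out)
            out = append(out, Info{Name: name})
            i = len(out) - 1
        }
        out[i].Match = append(out[i].Match, match)
    }
    return out
}

func main() {
    infos := groupMatches([][2]string{
        {"Email address", "[email protected]"},
        {"HTML comment", "<!-- begin content -->"},
        {"HTML comment", "<!-- InstanceEnd -->"},
    })
    b, _ := json.MarshalIndent(infos, "", "  ")
    fmt.Println(string(b))
}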

-i docs doesn't ignore subdomains containing "docs"

Describe the bug
When crawling https://mapbox.com/ with -i docs, we notice that docs.mapbox.com still gets crawled.

To Reproduce
Steps to reproduce the behavior:

  1. Run echo "https://mapbox.com/" | cariddi -s -intensive -i docs
  2. Look at output

Expected behavior
docs.* should not be crawled.

Desktop (please complete the following information):

  • OS: Linux WKS-001772 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Version: v1.1.9
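A minimal sketch of one way to make the ignore words also apply to hostnames, using colly's DisallowedURLFilters option (an assumption about the fix, not cariddi's current -i handling):

package main

import (
    "regexp"
    "strings"

    "github.com/gocolly/colly"
)

// newCollector skips any URL containing one of the ignored words, which also
// covers hosts like docs.mapbox.com.
func newCollector(ignored []string) *colly.Collector {
    pattern := "(?i)(" + strings.Join(escapeAll(ignored), "|") + ")"
    return colly.NewCollector(
        colly.DisallowedURLFilters(regexp.MustCompile(pattern)),
    )
}

func escapeAll(words []string) []string {
    out := make([]string, len(words))
    for i, w := range words {
        out[i] = regexp.QuoteMeta(w)
    }
    return out
}

func main() {
    _ = newCollector([]string{"docs", "blog"})
}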

Panic while compiling some regex during a find the secrets run

Describe the bug
Panic while compiling some regex during a find-the-secrets (-s) run. It also happens with the -e flag.

panic: regexp: Compile(`*`): error parsing regexp: missing argument to repetition operator: `*`

goroutine 1 [running]:
regexp.MustCompile(0x14fb20a, 0x1, 0x0)
	/usr/local/Cellar/go/1.16.4/libexec/src/regexp/regexp.go:311 +0x157
github.com/edoardottt/cariddi/crawler.Crawler(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x14, 0x1, 0x0, ...)
	/Users/sean/repos/cariddi/crawler/colly.go:54 +0x1ce
main.main()
	/Users/sean/repos/cariddi/main.go:91 +0x3cf

To Reproduce
Steps to reproduce the behavior:

  1. Create a urls file with a valid URL in it
  2. Run the following command: cat urls | ./cariddi -d 2 -s
  3. See the stack trace shortly after launching

Expected behavior
Cariddi should process the provided site and find any/all secrets

Desktop (please complete the following information):

  • OS: Mac OS
  • Version: 11.4 (Big Sur)
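A minimal sketch of a defensive alternative (an assumption about the fix, not cariddi's actual code path): compile patterns with regexp.Compile and skip the invalid ones, so an input like * can't trigger the MustCompile panic.

package main

import (
    "fmt"
    "regexp"
)

// compilePatterns compiles what it can and reports the rest instead of panicking.
func compilePatterns(patterns []string) []*regexp.Regexp {
    var out []*regexp.Regexp
    for _, p := range patterns {
        re, err := regexp.Compile(p)
        if err != nil {
            fmt.Printf("skipping invalid pattern %q: %v\n", p, err)
            continue
        }
        out = append(out, re)
    }
    return out
}

func main() {
    res := compilePatterns([]string{`*`, `AKIA[0-9A-Z]{16}`})
    fmt.Println(len(res), "pattern(s) compiled") // only the valid one survives
}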

Initial call to robots.txt and sitemap.xml don't enforce ignored words

After reviewing the code, I've noticed that these two files don't follow the ignored-words list that's passed via -i.

@edoardottt Since it's only two calls at the very beginning, do you feel we should enforce the check, or should we leave it as is? As the creator, I feel you would have the best insight on what behaviour should or should not be enforced :)

path, err := urlUtils.GetPath(protocolTemp + "://" + target)
if err == nil {
    if path == "" {
        err = c.Visit(protocolTemp + "://" + target + "/" + "robots.txt")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
        err = c.Visit(protocolTemp + "://" + target + "/" + "sitemap.xml")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
    } else if path == "/" {
        err = c.Visit(protocolTemp + "://" + target + "robots.txt")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
        err = c.Visit(protocolTemp + "://" + target + "sitemap.xml")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
    }
}

"domain formatted in a bad way" kills scan and debug doesn't give any info on the URL that caused this

Describe the bug
Hey there. After running the scanner on an endpoint, I noticed that out of the blue I get a domain formatted in a bad way error. However, even with -debug enabled, I get no additional info about which URL killed the scan. Is this intended?

To Reproduce
Steps to reproduce the behavior:

  1. Run echo "https://xxxxxx.com" | cariddi -s -sf regex.txt -intensive -ot results.txt
  2. Wait
  3. Observe the domain formatted in a bad way error

Expected behavior
Would it not be better to simply skip that URL rather than killing the scan via os.Exit(1)?


Desktop (please complete the following information):

  • OS: Linux XXXXXXXX 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Version: Cariddi v1.1.9

Additional context

Let me know if you'd appreciate a hand in any way!
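A minimal sketch of the suggested behaviour (hostOrSkip is a hypothetical helper, not cariddi's code): log and skip a URL that can't be parsed instead of terminating the whole scan with os.Exit(1).

package main

import (
    "log"
    "net/url"
)

// hostOrSkip returns the host of a well-formed URL, or false so the caller
// can skip the entry instead of aborting the whole scan.
func hostOrSkip(raw string) (string, bool) {
    u, err := url.Parse(raw)
    if err != nil || u.Host == "" {
        log.Printf("skipping URL formatted in a bad way: %q", raw)
        return "", false
    }
    return u.Host, true
}

func crawlAll(targets []string) {
    for _, t := range targets {
        host, ok := hostOrSkip(t)
        if !ok {
            continue // skip instead of os.Exit(1)
        }
        _ = host // ... crawl host ...
    }
}

func main() {
    crawlAll([]string{"https://example.com", "http//broken"})
}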
