edoardottt / cariddi

Take a list of domains, crawl urls and scan for endpoints, secrets, api keys, file extensions, tokens and more

Home Page: https://edoardoottavianelli.it

License: GNU General Public License v3.0

Go 98.83% Makefile 0.35% Batchfile 0.82%
endpoints endpoint-discovery bugbounty crawler secret-keys secrets-detection infosec reconnaissance recon crawling

cariddi's People

Contributors

cyb3rjerry, dependabot[bot], edoardottt, mrnfrancesco, noraj, ocervell, rodnt, w1kend


cariddi's Issues

Regex (or other method) for Intensive mode

Bug description
When cariddi runs without the -intensive flag it works fine: all the URLs crawled on the target(s) belong to the input domain(s).
This is the method used for the normal behaviour (line 108). As you can see on line 108, there is a colly option that restricts the crawler to URLs belonging to the input domain(s).

Instead, when the -intensive flag is set, this is the method used: lines 114 to 119.
The problem with this method is that it can produce false positives, like facebook.com?q=c.target.com.
I'm trying to figure out how to pick only target URLs even when cariddi is running in intensive mode.

Just to be clear:

  • The normal behaviour (without -intensive) is that if you pass target.com as input, cariddi crawls only (and strictly) URLs belonging to target.com.
  • If -intensive is set, cariddi should crawl all URLs belonging to *.target.com.

Additional context
Go-colly is the module used for crawling.
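A minimal sketch of one possible approach (not cariddi's actual code): parse each discovered URL and compare only its host against the input domain, which avoids regex false positives such as facebook.com?q=c.target.com. belongsToTarget is a hypothetical helper name.

package main

import (
    "fmt"
    "net/url"
    "strings"
)

// belongsToTarget reports whether rawURL's host is target or a subdomain of it.
func belongsToTarget(rawURL, target string) bool {
    u, err := url.Parse(rawURL)
    if err != nil {
        return false
    }
    host := u.Hostname()
    return host == target || strings.HasSuffix(host, "."+target)
}

func main() {
    fmt.Println(belongsToTarget("https://login.target.com/x", "target.com"))           // true
    fmt.Println(belongsToTarget("https://facebook.com/?q=c.target.com", "target.com")) // false
}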

Can't parse URL list with ports

Describe the bug
If you provide a URL with a port (not every web app lives on port 80 or 443 =) ), cariddi can't parse it.

For example:

echo http://ya.ru:80 | cariddi
The URL provided is not built in a proper way: http://ya.ru:80
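For reference, a minimal sketch showing that Go's net/url handles the host:port form, so validation could inspect the hostname and port separately instead of rejecting the whole URL (this is an illustration, not cariddi's parsing code):

package main

import (
    "fmt"
    "net/url"
)

func main() {
    u, err := url.Parse("http://ya.ru:80")
    if err != nil {
        panic(err)
    }
    fmt.Println(u.Hostname()) // "ya.ru" (host without the port)
    fmt.Println(u.Port())     // "80"
}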

slice bounds out of range

Describe the bug
After a while we (for some reason) get a slice out of bounds error.

goroutine 25981 [running]:
github.com/edoardottt/cariddi/pkg/crawler.New.func15(0xc072bd1380)
	/opt/hacking/repository/cariddi/pkg/crawler/colly.go:215 +0x950
github.com/gocolly/colly.(*Collector).handleOnResponse(0xc072bd1380?, 0xc072bd1380)
	/home/cyb3rjerry/go/pkg/mod/github.com/gocolly/[email protected]/colly.go:936 +0x1be
github.com/gocolly/colly.(*Collector).fetch(0xc00018d520, {0x0?, 0x0?}, {0x8c1eaa, 0x3}, 0x1, {0x0?, 0x0}, 0x0?, 0xc0304a96e0, ...)
	/home/cyb3rjerry/go/pkg/mod/github.com/gocolly/[email protected]/colly.go:621 +0x69b
created by github.com/gocolly/colly.(*Collector).scrape
	/home/cyb3rjerry/go/pkg/mod/github.com/gocolly/[email protected]/colly.go:532 +0x645

To Reproduce
Steps to reproduce the behavior:

  1. run echo "https://mapbox.com/" | cariddi -s -intensive

Desktop (please complete the following information):
Linux cyb3rjerry 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

Second ctrl+c should force quit the program

Describe the bug
Pressing CTRL+C once initiates the "smooth" exit correctly. However, pressing it a second time should "force quit"; currently the program simply hangs.

To Reproduce
Steps to reproduce the behavior:

  1. Run any long scan
  2. Press CTRL+C twice
  3. See error

Expected behavior
Force quit on second CTRL+C


Desktop (please complete the following information):
Linux cyb3rjerry 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
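A minimal sketch of the suggested behaviour, assuming a plain os/signal handler (not cariddi's actual shutdown code): the first SIGINT starts a graceful stop, the second forces the exit.

package main

import (
    "fmt"
    "os"
    "os/signal"
    "syscall"
)

func main() {
    sigs := make(chan os.Signal, 2)
    signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)

    done := make(chan struct{})
    go func() {
        <-sigs
        fmt.Println("received CTRL+C, finishing in-flight requests...")
        close(done) // tell the crawler to stop gracefully
        <-sigs
        fmt.Println("received second CTRL+C, forcing exit")
        os.Exit(1)
    }()

    // ... crawler work would go here, watching <-done ...
    <-done
    fmt.Println("graceful shutdown complete")
}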

help

internal/unsafeheader

compile: version "go1.17.6" does not match go tool version "go1.18.2"

internal/cpu

compile: version "go1.17.6" does not match go tool version "go1.18.2"

internal/abi

How can I fix this?

Thanks.

Raw requests improvements

A few features would be useful for storing raw responses (-sr):

Rationale:
A custom directory is nice to have when you run in distributed environments and want to save all requests to a shared mount, so that later you can run batch tools to collect all the raw requests and analyze them offline (with cariddi, nuclei, etc.).

The stored_response_path field can be useful when saving results to a database, so that the corresponding response txt file can be retrieved later.

Too many files open

Describe the bug
When there are too many ongoing requests, cariddi opens the output files too many times (especially with both -ot and -oh enabled).

To Reproduce
Just run cariddi with txt and html output enabled.
echo <TARGET> | cariddi -e -ot out-txt -oh out-html

Expected behavior
Run without problems and finish; don't interrupt with errors.

Error

2021/07/03 16:16:40 open output-cariddi/<TARGET>.results.txt: too many open files
2021/07/03 16:16:40 invalid argument
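A minimal sketch of one way to avoid exhausting file descriptors (an assumption about the fix, not cariddi's implementation): open each output file once, reuse the handle behind a mutex, and flush on close instead of re-opening the file for every result.

package main

import (
    "bufio"
    "os"
    "sync"
)

// ResultWriter keeps a single open handle per output file.
type ResultWriter struct {
    mu  sync.Mutex
    buf *bufio.Writer
    f   *os.File
}

func NewResultWriter(path string) (*ResultWriter, error) {
    f, err := os.OpenFile(path, os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0o644)
    if err != nil {
        return nil, err
    }
    return &ResultWriter{f: f, buf: bufio.NewWriter(f)}, nil
}

// WriteLine appends one result line without re-opening the file.
func (w *ResultWriter) WriteLine(line string) error {
    w.mu.Lock()
    defer w.mu.Unlock()
    _, err := w.buf.WriteString(line + "\n")
    return err
}

func (w *ResultWriter) Close() error {
    w.mu.Lock()
    defer w.mu.Unlock()
    if err := w.buf.Flush(); err != nil {
        return err
    }
    return w.f.Close()
}

func main() {
    w, err := NewResultWriter("out.txt")
    if err != nil {
        panic(err)
    }
    defer w.Close()
    _ = w.WriteLine("https://example.com/found")
}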

Proxy settings not honored

It seems that proxies are not honored: looking at the traffic in Wireshark, I see some requests not going through any proxy.

I think this is related to gocolly/colly#392

We probably need to set

c.WithTransport(&http.Transport{
  DisableKeepAlives: true,
})

in the code here
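A minimal sketch of that idea (illustrative, not a patch to cariddi): disabling keep-alives forces new connections, and setting Proxy on the same transport keeps every request on the proxy. The newCollector helper and the proxy address are assumptions.

package main

import (
    "net/http"
    "net/url"

    "github.com/gocolly/colly"
)

func newCollector(proxyAddr string) (*colly.Collector, error) {
    c := colly.NewCollector()
    proxyURL, err := url.Parse(proxyAddr)
    if err != nil {
        return nil, err
    }
    c.WithTransport(&http.Transport{
        Proxy:             http.ProxyURL(proxyURL),
        DisableKeepAlives: true, // force new connections so the proxy is always used
    })
    return c, nil
}

func main() {
    c, err := newCollector("http://127.0.0.1:8080")
    if err != nil {
        panic(err)
    }
    _ = c
}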

The URL provided is not built in a proper way: ${URL}

The URL provided is not built in a proper way: www.edoardoottavianelli.it

I tried with the sample I saw in the video, but it's still not working. I have all the packages and libraries installed, Go included, so I don't know why it is not working. Have a nice Sunday!

*EDIT
It works using PowerShell but not the terminal.
Also, some endpoints don't work, maybe because they are private, e.g. https://api.nike.com

Enhancement of Sorting URLs like URO

Hi @edoardottt

Hope you are doing well !!

Cariddi is an awesome tool that I came across recently and use actively.

It effectively collects and crawls endpoints, finds juicy things, and more...

Describe the solution you'd like

I would like to suggest an enhancement: sort and deduplicate URLs like URO does, so the output isn't bloated with duplicates and junk entries.

Describe alternatives you've considered

Currently I'm using URO, which is written in Python.

Additional context

Example output

Kindly let me know if I missed anything; I'm glad to help with any questions or additional info if needed.

Thanks & Regards,
@zy9ard3
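A minimal sketch of the kind of URO-style deduplication being asked for (an illustration, not a proposed patch): collapse URLs that differ only in query-parameter values by keying on host, path, and the sorted parameter names.

package main

import (
    "fmt"
    "net/url"
    "sort"
    "strings"
)

// dedupKey builds a key from host, path and the sorted query parameter names,
// so URLs differing only in parameter values map to the same key.
func dedupKey(raw string) (string, bool) {
    u, err := url.Parse(raw)
    if err != nil {
        return "", false
    }
    names := make([]string, 0, len(u.Query()))
    for name := range u.Query() {
        names = append(names, name)
    }
    sort.Strings(names)
    return u.Host + u.Path + "?" + strings.Join(names, "&"), true
}

// Dedup keeps the first URL seen for each key.
func Dedup(urls []string) []string {
    seen := make(map[string]bool)
    var out []string
    for _, raw := range urls {
        key, ok := dedupKey(raw)
        if !ok || seen[key] {
            continue
        }
        seen[key] = true
        out = append(out, raw)
    }
    return out
}

func main() {
    fmt.Println(Dedup([]string{
        "https://example.com/p?id=1",
        "https://example.com/p?id=2", // dropped: same host, path and param names
        "https://example.com/q?id=1",
    }))
}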

Add JSON functionality and console reporting of secrets, errors

It would be great if the tool could take a -json option to output JSON like similar tools do (katana, gospider, gau).

It could output JSON Lines, with a 'type' key for secrets, regex matches, etc., instead of putting everything in a folder.

Example output:

{"source":"href","type":"url","output":"https://example.com/path/","status":403,"length":140}
{"source":"body","type":"url","output":"https://example.com/path2/","status":403,"length":140}
{"source":"https://example.com/path3.js?v=1652869476","type":"secretfinder","output":"<AWS_SECRET_KEY>","status":0,"length":0}

Thoughts ?
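A minimal sketch of what such a record could look like in Go, matching the example lines above (field and type names are illustrative, not an agreed schema):

package main

import (
    "encoding/json"
    "fmt"
)

type Result struct {
    Source string `json:"source"` // e.g. "href", "body", or the URL of the JS file
    Type   string `json:"type"`   // e.g. "url", "secretfinder"
    Output string `json:"output"`
    Status int    `json:"status"`
    Length int    `json:"length"`
}

func main() {
    r := Result{Source: "href", Type: "url", Output: "https://example.com/path/", Status: 403, Length: 140}
    line, _ := json.Marshal(r)
    fmt.Println(string(line)) // one JSON Lines record per finding
}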

Problem with Crawling POST Parameters

Hello developers,

The tool is great, but after many scans I've found that it does not crawl and catch all the parameters on a page, especially the parameters in the "Filter Categories"; most of these filters are sent as POST requests.

Here is a live example for the filter categories:
when you check a filter, the parameter is added so you can see it in the URL.

I tried scanning with several modes (-ext, -e, -c); none of them caught the POST parameters.

?manufacturer=1-batella&c=1-baby-food&label=1-new&s[flavour]=mango&price_from=2&price_to=3

Step 1
https://i.ibb.co/0KNTH20/1.png

Step 2
https://i.ibb.co/y69q7x7/2.png

Step 3
https://i.ibb.co/YpD7vWJ/3.png

I hope my explanation is clear, and I hope the developers find a way to improve the crawling technique so the tool also crawls these POST requests and extracts more parameters.

Best regards, and keep this tool up!

Add intensive switch

When providing a list of subdomains to cariddi, like:

cat alive.txt | cariddi

normally it takes one line at a time and fires a crawler instance looking for resources on that website.
If the first line is (just as an example) sub1.example.com, it will look only for resources in that subdomain, having this statement in the colly (colly, the Go-based crawler) settings:

c.AllowedDomains(target)

This means that if there are resources pointing to login.example.com, cariddi will not consider them.

With the -intensive switch, cariddi doesn't care about allowed domains; instead, a regex matches the second-level domain.
This means that every resource found on sub1.example.com will be crawled, even if it doesn't belong to that exact subdomain.
This is a computationally heavy process and it's likely to produce a high rate of duplicate resources in the standard output, but not in the output file.

This means two things:

  1. You want to go really deep in the recon process.
  2. You will see a lot of duplicates in the standard output, but not in the output file (target.results.txt).

So please don't use the stdout as input for another command; use the txt output file instead!
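A minimal sketch of the two modes as colly collector options (flag handling and the exact regex are assumptions, not cariddi's code): normal mode pins the collector to the exact input host, while intensive mode accepts any host ending in the registrable domain.

package main

import (
    "regexp"

    "github.com/gocolly/colly"
)

// newCollector builds a collector that is either pinned to the exact target
// host (normal mode) or accepts any *.target host (intensive mode).
func newCollector(target string, intensive bool) *colly.Collector {
    if !intensive {
        return colly.NewCollector(colly.AllowedDomains(target))
    }
    re := regexp.MustCompile(`(?i)^https?://([a-z0-9-]+\.)*` + regexp.QuoteMeta(target) + `(/|$|\?|:)`)
    return colly.NewCollector(colly.URLFilters(re))
}

func main() {
    _ = newCollector("example.com", false) // strict: only example.com
    _ = newCollector("example.com", true)  // intensive: *.example.com too
}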

Add https://github.com/aels/subdirectories-discover as a wordlist

Is your feature request related to a problem? Please describe.
Ways to enrich crawler results.

Describe the solution you'd like
Add more lists to the brute force stage.
For example, take them from https://github.com/aels/subdirectories-discover (yes, I'm the curator of this list).

Signal Killed error

When using -cache, and cariddi reads data really fast from the cache, it eats too much RAM and the process crashes with the error signal: killed.

Quick workaround: don't use -cache. Sorry for this. If you have ideas or knowledge about how to solve this problem, just comment down here.

Scan Only

@edoardottt It would be helpful if cariddi had a feature to only scan endpoints (without crawling), e.g. with an argument like -only-scan:

echo "https://example.com/x.js" | cariddi -only-scan
cat "links.txt" | cariddi -onlyscan

JSON lines aggregate results

When crawling a target and searching for info, if multiple matches occur in the same URL the JSON struct holds distinct elements instead of a single element with an array of matches:

Now:

{
  "url": "http://testphp.vulnweb.com/index.php",
  "method": "GET",
  "status_code": 200,
  "words": 388,
  "lines": 110,
  "content_type": "text/html",
  "matches": {
    "filetype": {
      "extension": "php",
      "severity": 5
    },
    "infos": [
      {
        "name": "Email address",
        "match": "[email protected]"
      },
      {
        "name": "HTML comment",
        "match": "<!-- InstanceEndEditable -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- here goes headers headers -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- end masthead -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- begin content -->"
      },
      {
        "name": "HTML comment",
        "match": "<!--end content -->"
      },
      {
        "name": "HTML comment",
        "match": "<!--end navbar -->"
      },
      {
        "name": "HTML comment",
        "match": "<!-- InstanceEnd -->"
      }
    ]
  }
}

Desired output:

{
  "url": "http://testphp.vulnweb.com/index.php",
  "method": "GET",
  "status_code": 200,
  "words": 388,
  "lines": 110,
  "content_type": "text/html",
  "matches": {
    "filetype": {
      "extension": "php",
      "severity": 5
    },
    "infos": [
      {
        "name": "Email address",
        "match": [
          "[email protected]"
        ]
      },
      {
        "name": "HTML comment",
        "match": [
          "<!-- InstanceEndEditable -->",
          "<!-- here goes headers headers -->",
          "<!-- end masthead -->",
          "<!-- begin content -->",
          "<!--end content -->",
          "<!--end navbar -->",
          "<!-- InstanceEnd -->"
        ]
      }
    ]
  }
}

cc @ocervell what do you think?
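A minimal sketch of the grouping this implies (type and field names are illustrative, not cariddi's actual structs): the match field becomes a slice, and matches sharing the same name are merged into one entry.

package main

import (
    "encoding/json"
    "fmt"
)

type Info struct {
    Name  string   `json:"name"`
    Match []string `json:"match"` // previously a single string per match
}

// groupMatches turns flat (name, match) pairs into one Info per name.
func groupMatches(pairs [][2]string) []Info {
    index := map[string]int{}
    var out []Info
    for _, p := range pairs {
        name, match := p[0], p[1]
        i, ok := index[name]
        if !ok {
            index[name] = len(out)
            out = append(out, Info{Name: name})
            i = len(out) - 1
        }
        out[i].Match = append(out[i].Match, match)
    }
    return out
}

func main() {
    infos := groupMatches([][2]string{
        {"Email address", "[email protected]"},
        {"HTML comment", "<!-- begin content -->"},
        {"HTML comment", "<!-- InstanceEnd -->"},
    })
    b, _ := json.MarshalIndent(infos, "", "  ")
    fmt.Println(string(b))
}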

-i docs doesn't ignore subdomains containing "docs"

Describe the bug
When crawling https://mapbox.com/ with -i docs, we notice that docs.mapbox.com still gets crawled.

To Reproduce
Steps to reproduce the behavior:

  1. Run echo "https://mapbox.com/" | cariddi -s -intensive -i docs
  2. Look at output

Expected behavior
docs.* should not be crawled.

Desktop (please complete the following information):

  • OS: Linux WKS-001772 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Version: v1.1.9
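A minimal sketch of one way to make the ignore words also apply to hostnames, using colly's DisallowedURLFilters option (an assumption about the fix, not cariddi's current -i handling):

package main

import (
    "regexp"
    "strings"

    "github.com/gocolly/colly"
)

// newCollector skips any URL containing one of the ignored words, which also
// covers hosts like docs.mapbox.com.
func newCollector(ignored []string) *colly.Collector {
    pattern := "(?i)(" + strings.Join(escapeAll(ignored), "|") + ")"
    return colly.NewCollector(
        colly.DisallowedURLFilters(regexp.MustCompile(pattern)),
    )
}

func escapeAll(words []string) []string {
    out := make([]string, len(words))
    for i, w := range words {
        out[i] = regexp.QuoteMeta(w)
    }
    return out
}

func main() {
    _ = newCollector([]string{"docs", "blog"})
}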

Panic while compiling some regex during a find the secrets run

Describe the bug
Panic while compiling some regex during a find-the-secrets (-s) run. It also happens with the -e flag.

panic: regexp: Compile(`*`): error parsing regexp: missing argument to repetition operator: `*`

goroutine 1 [running]:
regexp.MustCompile(0x14fb20a, 0x1, 0x0)
	/usr/local/Cellar/go/1.16.4/libexec/src/regexp/regexp.go:311 +0x157
github.com/edoardottt/cariddi/crawler.Crawler(0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x2, 0x14, 0x1, 0x0, ...)
	/Users/sean/repos/cariddi/crawler/colly.go:54 +0x1ce
main.main()
	/Users/sean/repos/cariddi/main.go:91 +0x3cf

To Reproduce
Steps to reproduce the behavior:

  1. Create a urls file with a valid URL in it
  2. Run the following command: cat urls | ./cariddi -d 2 -s
  3. See the stack trace shortly after launching

Expected behavior
Cariddi should process the provided site and find any/all secrets

Desktop (please complete the following information):

  • OS: Mac OS
  • Version: 11.4 (Big Sur)
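A minimal sketch of a defensive alternative (an assumption about the fix, not cariddi's actual code path): compile patterns with regexp.Compile and skip the invalid ones, so an input like * can't trigger the MustCompile panic.

package main

import (
    "fmt"
    "regexp"
)

// compilePatterns compiles what it can and reports the rest instead of panicking.
func compilePatterns(patterns []string) []*regexp.Regexp {
    var out []*regexp.Regexp
    for _, p := range patterns {
        re, err := regexp.Compile(p)
        if err != nil {
            fmt.Printf("skipping invalid pattern %q: %v\n", p, err)
            continue
        }
        out = append(out, re)
    }
    return out
}

func main() {
    res := compilePatterns([]string{`*`, `AKIA[0-9A-Z]{16}`})
    fmt.Println(len(res), "pattern(s) compiled") // only the valid one survives
}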

Initial call to robots.txt and sitemap.xml don't enforce ignored words

After reviewing the code, I've noticed that these two files don't follow the ignored-words list that's passed via -i.

@edoardottt Since it's only two calls at the very beginning, do you feel we should enforce the check, or should we leave it as is? As the creator, I feel you would have the best insight on what behaviour should or should not be enforced :)

path, err := urlUtils.GetPath(protocolTemp + "://" + target)
if err == nil {
    if path == "" {
        err = c.Visit(protocolTemp + "://" + target + "/" + "robots.txt")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
        err = c.Visit(protocolTemp + "://" + target + "/" + "sitemap.xml")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
    } else if path == "/" {
        err = c.Visit(protocolTemp + "://" + target + "robots.txt")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
        err = c.Visit(protocolTemp + "://" + target + "sitemap.xml")
        if err != nil && debug && !errors.Is(err, colly.ErrAlreadyVisited) {
            log.Println(err)
        }
    }
}

"domain formatted in a bad way" kills scan and debug doesn't give any info on the URL that caused this

Describe the bug
Hey there. After running the scanner on an endpoint, I noticed that out of the blue I get a domain formatted in a bad way error. However, even with -debug enabled, I get no additional info about which URL killed the scan. Is this intended?

To Reproduce
Steps to reproduce the behavior:

  1. Run echo "https://xxxxxx.com" | cariddi -s -sf regex.txt -intensive -ot results.txt
  2. Wait
  3. Observe the domain formatted in a bad way error

Expected behavior
Would it not be better to simply skip that URL rather than killing the scan via os.Exit(1)?


Desktop (please complete the following information):

  • OS: Linux XXXXXXXX 5.19.0-23-generic 24-Ubuntu SMP PREEMPT_DYNAMIC Fri Oct 14 15:39:57 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
  • Version: Cariddi v1.1.9

Additional context

Let me know if you'd appreciate a hand in any way!
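A minimal sketch of the suggested behaviour (hostOrSkip is a hypothetical helper, not cariddi's code): log and skip a URL that can't be parsed instead of terminating the whole scan with os.Exit(1).

package main

import (
    "log"
    "net/url"
)

// hostOrSkip returns the host of a well-formed URL, or false so the caller
// can skip the entry instead of aborting the whole scan.
func hostOrSkip(raw string) (string, bool) {
    u, err := url.Parse(raw)
    if err != nil || u.Host == "" {
        log.Printf("skipping URL formatted in a bad way: %q", raw)
        return "", false
    }
    return u.Host, true
}

func crawlAll(targets []string) {
    for _, t := range targets {
        host, ok := hostOrSkip(t)
        if !ok {
            continue // skip instead of os.Exit(1)
        }
        _ = host // ... crawl host ...
    }
}

func main() {
    crawlAll([]string{"https://example.com", "http//broken"})
}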
