theharvester's Introduction

theHarvester

What is this?

theHarvester is a simple to use, yet powerful tool designed to be used during the reconnaissance stage of a red
team assessment or penetration test. It performs open source intelligence (OSINT) gathering to help determine
a domain's external threat landscape. The tool gathers names, emails, IPs, subdomains, and URLs by using
multiple public resources that include:

Passive modules:

Active modules:

  • DNS brute force: dictionary brute force enumeration
  • Screenshots: Take screenshots of subdomains that were found

Modules that require an API key:

Documentation on setting up API keys can be found at https://github.com/laramies/theHarvester/wiki/Installation#api-keys

  • bevigil - Free up to 50 queries. Pricing can be found here: https://bevigil.com/pricing/osint
  • binaryedge - $10/month
  • bing
  • bufferoverun - uses the free API
  • censys - API keys are required and can be retrieved from your Censys account.
  • criminalip
  • fullhunt
  • github
  • hunter - limited to 10 results on the free plan, so you will need to use the -l 10 switch
  • hunterhow
  • intelx
  • netlas - $
  • onyphe - $
  • pentestTools - $
  • projectDiscovery - invite only for now
  • rocketreach - $
  • securityTrails
  • shodan - $
  • tomba - Free up to 50 searches.
  • zoomeye

Install and dependencies:

Comments, bugs, and requests:

  • Christian Martorella @laramies
  • Matthew Brown @NotoriousRebel1
  • Jay "L1ghtn1ng" Townsend @jay_townsend1

Main contributors:

  • Matthew Brown @NotoriousRebel1
  • Jay "L1ghtn1ng" Townsend @jay_townsend1
  • Lee Baird @discoverscripts

Thanks:

  • John Matherly - Shodan project
  • Ahmed Aboul Ela - subdomain names dictionaries (big and small)

theharvester's People

Contributors

apehex, as77c, blshkv, captain686, chrissparksnj, dbfreem, dependabot-preview[bot], dependabot[bot], digininja, dkasak, fproldan, frapava98, initbar, jenstimmerman, jzold, kernelpan1k, l1ghtn1ng, laramies, leebaird, may55, mmynk, munahaf, notoriousrebel, pierce403, tdefise, thehappydinoa, thehexable, wez3, yalattas, yoonthegoon

theharvester's Issues

Clean up emails

Convert emails to lower case.
Delete emails that contain a comma.
Delete emails that start with 3 periods: (...)
Delete emails that start with a non-letter: ~`!@#$%^&*()_-+={[}]|:;"'<,>.?/
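The four rules above can be sketched as a single filter. This is a minimal illustration, not code from theHarvester: the function name `clean_emails` is hypothetical, and removing duplicates after lowercasing is an added assumption that the issue does not ask for.

```python
def clean_emails(emails):
    """Apply the clean-up rules to an iterable of harvested email strings."""
    cleaned = []
    seen = set()
    for raw in emails:
        email = raw.strip().lower()          # rule 1: convert to lower case
        if not email:
            continue
        if "," in email:                     # rule 2: drop emails containing a comma
            continue
        if email.startswith("..."):          # rule 3: drop emails starting with three periods
            continue
        if not ("a" <= email[0] <= "z"):     # rule 4: drop emails starting with a non-letter
            continue
        if email not in seen:                # assumption: duplicates are not wanted
            seen.add(email)
            cleaned.append(email)
    return cleaned
```

Rule 4 already covers rule 3, since a period is not a letter; both checks are kept so the code reads like the list.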

RFE - Identify URLs

I'd like to suggest theharvester identifies URLs as well as e-mails and domains. I find a lot of sites and files that shouldn't be available publicly during OSINT analysis. There is a lot of value in this information and even more if theharvester could make them available.

Bing search and email "at" symbol

It looks like Bing, which theHarvester uses, no longer searches by the @ symbol, but the code that searches by the "at" symbol still persists in the program. Just letting you know.

Cannot use socks proxy

Due to my working environment, I have to use a proxy to reach the internet, so I set up an SSH dynamic tunnel on port 1080 and put it into my Kali Linux system proxy settings.
This is what came up when I tried to run theHarvester.
Input: theharvester -d google.com -b all
Output:
Full harvest.. [-] Searching in Google.. Unable to determine SOCKS version from socks://127.0.0.1:1080/
Can someone please give some advice? Thanks.
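The message appears to come from the SOCKS support in urllib3 (used by requests), which expects an explicit version in the proxy scheme such as socks4, socks4a, socks5 or socks5h, rather than a bare socks://. A hedged sketch of normalising such a URL before handing it to an HTTP client follows; the helper name `normalize_socks_url` and the choice of socks5 as the default are assumptions, not part of theHarvester.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_socks_url(proxy_url, default_scheme="socks5"):
    """Replace a version-less 'socks://' scheme with an explicit SOCKS version."""
    parts = urlsplit(proxy_url)
    if parts.scheme == "socks":
        # Assumption: the tunnel speaks SOCKS5 (an SSH dynamic tunnel does).
        parts = parts._replace(scheme=default_scheme)
    return urlunsplit(parts)
```

Setting the system or environment proxy to the explicit form (for example socks5://127.0.0.1:1080) may avoid the error without any code change.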

theHarvester quits without any results

After entering the last command, the only output is the green title with the version, nothing more.
I use KDE neon and I installed the requests library. I am not an experienced Linux user. Any ideas?

Source inconsistencies

The sources are inconsistent between the README, usage text, invalid search engine text, and files in the discovery folder. What should be the current source list? It might also make sense to have them lexicographically ordered throughout.

No results found

It worked about 20 minutes ago, but now it just doesn't give me any names from LinkedIn anymore!

command: theharvester -d lidl.nl -l 500 -b linkedin

Error Message on Launch - Mac OSX Yosemite

Traceback (most recent call last):
  File "/Users/totallynotme/Downloads/theHarvester-master/theHarvester.py", line 10, in <module>
    from discovery import *
  File "/Users/totallynotme/Downloads/theHarvester-master/discovery/googlesearch.py", line 6, in <module>
    import requests
ImportError: No module named requests

Shodan API is broken [WIP]

There is a new Shodan API version. The old one is deprecated and broken.

I am working to support this new version and improve some outputs.

Won't run on Kali Linux, get error message

Traceback (most recent call last):
  File "theHarvester.py", line 10, in <module>
    from discovery import *
  File "/root/Downloads/theHarvester-master/discovery/googlesearch.py", line 6, in <module>
    import requests
  File "/usr/local/lib/python2.7/dist-packages/requests/__init__.py", line 53, in <module>
    from .packages.urllib3.contrib import pyopenssl
  File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/contrib/pyopenssl.py", line 55, in <module>
    orig_connectionpool_ssl_wrap_socket = connectionpool.ssl_wrap_socket
AttributeError: 'module' object has no attribute 'ssl_wrap_socket'

Enhancement : sort final results

Could we have the <ip>:<address> pairs sorted at the end?

For example, in the "[+] Hosts found in search engines:" results, these could go from:

23.209.109.116:www.godaddy.com
97.74.104.218:253Dppc.google.godaddy.com
23.209.103.194:in.godaddy.com
23.209.98.69:m.godaddy.com
64.202.188.33:who.godaddy.com
184.168.130.123:gateway.godaddy.com
68.178.178.34:support.godaddy.com
64.202.188.108:auctions.godaddy.com
50.62.173.171:garage.godaddy.com
97.74.104.70:help.godaddy.com
68.178.211.43:whois.godaddy.com
216.69.149.215:mya.godaddy.com
216.69.149.53:dns.godaddy.com
173.201.19.2:certs.godaddy.com
72.167.239.239:ocsp.godaddy.com
208.109.255.100:cns1.godaddy.com
216.69.185.100:cns2.godaddy.com
97.74.104.218:mails.godaddy.com

Into:

173.201.19.2:certs.godaddy.com
184.168.130.123:gateway.godaddy.com
208.109.255.100:cns1.godaddy.com
216.69.149.215:mya.godaddy.com
216.69.149.53:dns.godaddy.com
216.69.185.100:cns2.godaddy.com
23.209.103.194:in.godaddy.com
23.209.109.116:www.godaddy.com
23.209.98.69:m.godaddy.com
50.62.173.171:garage.godaddy.com
64.202.188.108:auctions.godaddy.com
64.202.188.33:who.godaddy.com
68.178.178.34:support.godaddy.com
68.178.211.43:whois.godaddy.com
72.167.239.239:ocsp.godaddy.com
97.74.104.218:253Dppc.google.godaddy.com
97.74.104.218:mails.godaddy.com
97.74.104.70:help.godaddy.com

Which gives more intuitive results.
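The ordering in the example is a plain string sort, so 216.x lands before 23.x. A short sketch of both that ordering and a numeric one follows; the function names are hypothetical, and the code assumes IPv4 addresses only, because an IPv6 address contains colons and would not split cleanly at the first one.

```python
import ipaddress


def sort_lexicographic(pairs):
    """Sort 'ip:host' strings as plain text, as in the example above."""
    return sorted(pairs)


def sort_by_address(pairs):
    """Sort 'ip:host' strings numerically by IPv4 address, then by host name."""
    def key(pair):
        ip, _, host = pair.partition(":")
        return (ipaddress.ip_address(ip), host)
    return sorted(pairs, key=key)


pairs = [
    "97.74.104.70:help.godaddy.com",
    "23.209.98.69:m.godaddy.com",
    "173.201.19.2:certs.godaddy.com",
    "216.69.149.53:dns.godaddy.com",
    "216.69.149.215:mya.godaddy.com",
]
```

With the numeric key, 23.209.98.69 sorts first and 216.69.149.53 sorts before 216.69.149.215, which may be closer to what a reader expects from "sorted by IP".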

GOOGLE CAPTCHA

Hello! When I use Google as the data source, what should I do if the CAPTCHA turns up? How can I ensure stable data acquisition and avoid being blocked?
Thank you so much!

Zero results in Mac OS High Sierra

Hello,
I'm running theHarvester on macOS High Sierra, but I get zero results every time.
It seems that the software runs when I make a search, but I cannot get any results, even when looking for major companies or domains.
I'm very new to theHarvester and information-gathering tools.
I don't know if I have to install additional libraries or APIs, or if something in macOS is blocking theHarvester.
Please help.

RequestsDependencyWarning: urllib3 (1.21.1) or chardet (2.1.1) doesn't match a supported version!

I'm trying to run theHarvester on Kali 2017.3 rolling, but I get this error message:

/usr/lib/python2.7/dist-packages/requests/__init__.py:80: RequestsDependencyWarning: urllib3 (1.21.1) or chardet (2.1.1) doesn't match a supported version!
RequestsDependencyWarning)
Request library not found, please install it before proceeding

I have checked, and both my urllib3 and my chardet are the correct versions.

use asynchronous DNS request with ANY rather than A queries in hostchecker.py

For async DNS, you may use dnspython or Twisted.
Both can issue requests asynchronously.
I found the adns and asyncore-based DNS examples to be lacking "ANY" requests, which is precisely what is needed here.

Note that your current code misses every non-"A" resource record, that is any NS, MX, AAAA (IPv6), etc.
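As a rough illustration of the concurrency part of this suggestion, the sketch below resolves many hosts at once using only the standard library. It is not the hostchecker.py code: the function names are hypothetical, and `getaddrinfo` only covers A and AAAA records, so NS, MX or ANY queries would still need a DNS library such as dnspython.

```python
import asyncio
import socket


async def lookup_addresses(host):
    """Resolve a host to its IPv4 and IPv6 addresses through the OS resolver."""
    loop = asyncio.get_running_loop()
    try:
        infos = await loop.getaddrinfo(host, None, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return []  # unresolvable hosts yield an empty list
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


async def check_hosts(hosts, lookup=lookup_addresses):
    """Run all lookups concurrently and return {host: [addresses]}."""
    results = await asyncio.gather(*(lookup(host) for host in hosts))
    return dict(zip(hosts, results))
```

A caller would run it with `asyncio.run(check_hosts(["www.example.com", "mail.example.com"]))`; the `lookup` parameter exists so a different resolver can be swapped in.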

tag releases

Would it be possible to tag stable releases, for the benefit of package managers? Thanks!

List modules in alphabetical order

in the README file.

-baidu: Baidu search engine
-bing: Microsoft search engine - www.bing.com
-bingapi: Microsoft search engine, through the API (you need to add your key in the discovery/bingsearch.py file)
-google: Google search engine - www.google.com
-googleCSE: Google custom search engine
-googleplus: users that work in the target company (uses Google search)
-google-profiles: Google search engine, specific search for Google profiles
-linkedin: Google search engine, specific search for LinkedIn users
-pgp: PGP key server - pgp.rediris.es
-shodan: Shodan computer search engine, will search for ports and banners of the discovered hosts (http://www.shodanhq.com/)
-twitter: Twitter accounts related to a specific domain (uses Google search)
-vhost: Bing virtual hosts search
-yahoo: Yahoo search engine

sudden stop and no output file

Kali 2018.02, theHarvester 3.0.
Executed command: theHarvester.py -d $target_domain -l 500 -b all -f ./report.htm

Most of the time it stops during the harvest with RC=0. No preconditions.

Fault points:
[-] Searching in PGP Key server..
Searching PGP result..
---->{sudden stop}

or during Virtual Hosts scanning.

Most of the time the output files are not written (no specific precondition).

Problems in the stash.sqlite file:

  • Mail items are registered, but the data is incomplete (e.g. it saves 'evilcorp.com' instead of the full email address).
  • DNS scope is not saved (compare with the Virtual Hosts output).

Adding a feature; how to contribute back!

Hi, I added a feature that saves emails in .json and .csv format to a folder within the directory called "harvestedEmails" > "Domain".

The point is to get them in an easy-to-manipulate .json format of {Full Name: Email address, ...}
and a .csv format where the first row is full names and the next row is emails.

The way this works is that if you search for emails in the domain "@microsoft.com", you get a folder called harvestedEmails and a subfolder called "microsoft.com" with the saved {fullnames: emails} .json file and the .csv file.

So my question is:
How do I contribute this back and commit, and are you looking for features like this to be contributed back? I am not used to committing to another project's repo, so I was wondering how this works. I'm also not sure how author recognition works in that regard.
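For readers trying to picture the layout described in this issue, here is a minimal sketch of it. This is not the contributor's code: the function name `save_emails` and the file names emails.json and emails.csv are assumptions made for illustration.

```python
import csv
import json
from pathlib import Path


def save_emails(domain, emails_by_name, base_dir="harvestedEmails"):
    """Write {full name: email} to <base_dir>/<domain>/ as JSON and CSV."""
    out_dir = Path(base_dir) / domain
    out_dir.mkdir(parents=True, exist_ok=True)

    json_path = out_dir / "emails.json"   # hypothetical file name
    csv_path = out_dir / "emails.csv"     # hypothetical file name

    json_path.write_text(json.dumps(emails_by_name, indent=2), encoding="utf-8")

    with csv_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(emails_by_name.keys())    # first row: full names
        writer.writerow(emails_by_name.values())  # second row: emails
    return json_path, csv_path
```

Calling `save_emails("microsoft.com", {...})` would create harvestedEmails/microsoft.com/ with both files inside.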

Twitter output unreliable

Hey,

The Twitter output is absolutely unreliable. There are many Twitter "users" which are false positives. The following "users" appear in nearly every request:

python theHarvester/theHarvester.py -d idontexist.com -b twitter -n

Warning: Pycurl is not compiled against Openssl. Wfuzz might not work correctly when fuzzing SSL sites. Check Wfuzz's documentation for more information.


  *******************************************************************
*                                                                 *
* | |_| |__   ___    /\  /\__ _ _ ____   _____  ___| |_ ___ _ __  *
* | __| '_ \ / _ \  / /_/ / _` | '__\ \ / / _ \/ __| __/ _ \ '__| *
* | |_| | | |  __/ / __  / (_| | |   \ V /  __/\__ \ ||  __/ |    *
*  \__|_| |_|\___| \/ /_/ \__,_|_|    \_/ \___||___/\__\___|_|    *
*                                                                 *
* TheHarvester Ver. 3.0                                           *
* Coded by Christian Martorella                                   *
* Edge-Security Research                                          *
* [email protected]                                   *
*******************************************************************


[-] Starting harvesting process for domain: idontexist.com

[-] Searching in Twitter ..
	Searching 100 results..
	Searching 200 results..
	Searching 300 results..
	Searching 400 results..
	Searching 500 results..
Users from Twitter:
-------------------
@-moz-keyframes gb__a
@keyframes gb__a
@media 
@-moz-keyframes progressmove
@keyframes progressmove
@-moz-keyframes gb__nb
@keyframes gb__nb
@keyframes qli-container-rotate 
@keyframes qli-fill-unfill-rotate 
@keyframes qli-blue-fade-in-out 
@keyframes qli-red-fade-in-out 
@keyframes qli-yellow-fade-in-out 
@keyframes qli-green-fade-in-out 
@keyframes qli-left-spin 
@keyframes qli-right-spin 
@
@broofa.com
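Most of the entries in this output look like CSS at-rules (@keyframes, @media) taken from the result page rather than Twitter handles. One possible filter, sketched under the assumption that a valid handle is "@" followed by 1 to 15 letters, digits or underscores, is shown below; the function name and the list of at-rule keywords are illustrative and not part of theHarvester.

```python
import re

# Twitter handles: '@' plus 1-15 letters, digits or underscores.
HANDLE_RE = re.compile(r"^@[A-Za-z0-9_]{1,15}$")

# CSS at-rule keywords that would otherwise pass the handle pattern.
CSS_AT_RULES = {"@media", "@keyframes", "@import", "@charset",
                "@supports", "@page", "@namespace"}


def filter_handles(candidates):
    """Keep only strings that look like real Twitter handles."""
    kept = []
    for raw in candidates:
        token = raw.strip()
        if not HANDLE_RE.match(token):
            continue  # contains spaces, dots, dashes, or is just '@'
        if token.lower() in CSS_AT_RULES:
            continue  # a bare CSS keyword such as '@media'
        kept.append(token)
    return kept
```

A real account that happens to be named like a CSS keyword would be dropped too, so this trades a little recall for far fewer false positives.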

Rate limit information

Is there any way to determine when you have been rate limited while running the script?

For example, when I want to get some emails, the script returning 0 results could mean either that it didn't find any or that I'm rate limited. It would be useful to get some feedback in the results saying that the request was limited.
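One way the distinction asked for here could be drawn is by looking at the HTTP status and the response body before counting results. The sketch below is a heuristic only: the function name, the status set and the body markers are assumptions, and each search engine signals throttling differently.

```python
# HTTP 429 means "Too Many Requests"; 503 is treated as throttling by assumption.
RATE_LIMIT_STATUS = {429, 503}

# Assumed phrases that tend to appear on block or CAPTCHA pages.
BLOCK_MARKERS = ("captcha", "unusual traffic")


def classify_response(status_code, body, result_count):
    """Return 'rate_limited', 'no_results' or 'ok' for one search response."""
    if status_code in RATE_LIMIT_STATUS:
        return "rate_limited"
    lowered = body.lower()
    if any(marker in lowered for marker in BLOCK_MARKERS):
        return "rate_limited"
    if result_count == 0:
        return "no_results"
    return "ok"
```

Reporting the returned label next to the result count would let a user tell an empty search apart from a blocked one.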

cannot see clearly output

Hello,

I run theHarvester with: sudo theharvester.py -d domainname.* -b all

but the output is not clear at all. For example:

n" div class "ftrD" id "ftrD_Language" a class "b_toggle" role "menuitem" href " search?q %40ludia.* amp count 50 amp adlt strict amp lf 1 amp qpvt %40ludia.*" h "ID SERP,5815.1" Only English a a class

I want to search across all top-level domains. Maybe I am doing something wrong; I have tried searching on Google ...

Finally, how do I search with: -d recrutement@*.sometopleveldomain
I don't want to spam, just to find an internship. Sorry if this is not the right place.

Best regards.

Google redirects to captcha page after a few hits

PGP keyserver

There is also something wrong with the PGP key server.
When the data source is given as all, it starts with Google, then PGP, and then exits.
On checking the code for the pgp module in discovery:
self.server = "pgp.rediris.es:11371"
self.hostname = "pgp.rediris.es"
Neither of these hosts is working. Any help?

Thanks,

Not outputting html file

When using the syntax theHarvester -d domain.com -l 100 -b all -h myresults.html,
no HTML file is generated.

DNS Brute Force Error

Not sure if this is something I'm doing wrong, but when trying to invoke the DNS Brute Force option, it gives an error that it can't open the dictionary file. After looking at the source files, I noticed the "dns-names.txt" file wasn't there. I'm using Kali 2.0, and there's another instance of theHarvester under /usr/share/golismero/tools, which contains the dictionary file.

I copied the dictionary file to /usr/share/theharvester, then edited the "dnssearch.py" file to point to the dictionary file, and it worked fine.

I'm not a Python expert by any means, and this was probably a quick and dirty fix, but was wondering if there was another way I was supposed to run the brute force option?

Skipping emails when word-break <wbr>

Results seem to be incomplete or missing when there are tags after the '@' before parsing.

The result displayed will be "@domainname.com", with no name before the '@'.

I've tried cleaning up the results, but without success so far.
Has anyone found a fix for this?
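One possible approach is to remove the <wbr> tags from the page text before the email pattern is applied, so that the two halves of an address are joined again. This is a sketch only: `extract_emails` is a hypothetical helper, not theHarvester's parser, and the email pattern is deliberately simple.

```python
import re

# Matches <wbr>, <wbr/>, </wbr> and spaced variants, in any letter case.
WBR_RE = re.compile(r"<\s*/?\s*wbr\s*/?\s*>", re.IGNORECASE)


def extract_emails(html, domain):
    """Strip <wbr> tags, then collect addresses at the domain or its subdomains."""
    cleaned = WBR_RE.sub("", html)
    pattern = re.compile(
        r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)*"
        + re.escape(domain)
        + r"(?![A-Za-z0-9-])",  # do not match a longer domain label
        re.IGNORECASE,
    )
    return sorted({match.lower() for match in pattern.findall(cleaned)})
```

The same substitution idea could be extended to other inline tags that search engines insert into result snippets.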

ImportError: cannot import name htmlExport

root@kali:~# theharvester
Traceback (most recent call last):
  File "/usr/bin/theharvester", line 19, in <module>
    from lib import htmlExport
ImportError: cannot import name htmlExport
root@kali:~# 

Does anyone have any ideas? I have searched pip for htmlExport, and even Google, but found no wheel or questions about this issue.

Many thanks, John

ImportError: cannot import name rewriter_config_pb2

I am simply following the TensorFlow tutorials, and running the following command gives the error mentioned above:
python ptb_word_lm.py --data_path=/home/priyankit/data/validsrc2-hi --model=small

The error is:

Traceback (most recent call last):
  File "ptb_word_lm.py", line 68, in <module>
    import util
  File "/home/priyankit/models-master/tutorials/rnn/ptb/util.py", line 23, in <module>
    from tensorflow.core.protobuf import rewriter_config_pb2
ImportError: cannot import name rewriter_config_pb2

system install (as a module) support and setup.py for easier installation

A few minor changes would be required. Something like this:

sed -e 's|from discovery|from theHarvester.discovery|' -i theHarvester.py || die "sed failed"
sed -e 's|from lib|from theHarvester.lib|' -i theHarvester.py || die "sed failed"
sed -e 's|from lib|from theHarvester.lib|' -i lib/htmlExport.py || die "sed failed"
for i in discovery/*.py; do
     sed -e 's|import myparser|from theHarvester import myparser|' -i $i || die "sed for $i failed"
done
touch __init__.py

<strong> in virtual host results

Not sure why, but < strong > is displaying in my results for virtual hosts

example
./theHarvester.py -d godaddy.com -b google -v

Virtual hosts:
23.43.189.116 < strong >www.godaddy.com<

any ideas?

Fix Disclosing Emails

Hello Team,

I'm just wondering if there is a way to hide emails from your tool, so that our email addresses are not disclosed to the public.

Can you suggest a possible mitigation for this?

Missing modules

When the tool is run with no arguments, not all of the modules are listed.
Also, please place the modules in alphabetical order.

baidu - missing
bing
bingapi
google
googleCSE
googleplus
google-profiles
linkedin
pgp
shodan - missing
twitter
vhost - missing
yahoo - missing

dogpilesearch - not listed in README or tool
jigsaw - not listed in README

please tag a proper release

Version 2.7.2 is mentioned in a few places, but the source code was not tagged properly and it is not available for download.

Please consider making an official release.

Cant provide a list of sources ??

Hey,

am I overlooking something, or is it not possible to provide multiple sources?
I know I can specify a source with -b, but this allows only one parameter.
This means that if I want to check, let's say, Google, Bing and Twitter, I can't do something like:

theharvester -d example.com -b google,bing,twitter

Instead I have to start theHarvester three times:

theharvester -d example.com -b google
theharvester -d example.com -b bing
theharvester -d example.com -b twitter

Otherwise theHarvester will only use the first source (or throw an 'Invalid search engine' error).

It would be very useful to be able to use more than just one (or all) sources.
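The behaviour requested here could be sketched with argparse by giving -b a custom type that splits on commas. This is an illustration, not theHarvester's argument handling: the `SUPPORTED` tuple is a hypothetical subset of sources, and the parser name is made up.

```python
import argparse

# Hypothetical subset of sources, for illustration only.
SUPPORTED = ("bing", "google", "twitter")


def parse_sources(value):
    """Turn 'google,bing' into ['google', 'bing'], validating each name."""
    names = [name.strip().lower() for name in value.split(",") if name.strip()]
    if not names:
        raise argparse.ArgumentTypeError("no search engine given")
    if "all" in names:
        return list(SUPPORTED)
    unknown = [name for name in names if name not in SUPPORTED]
    if unknown:
        raise argparse.ArgumentTypeError("Invalid search engine: " + ", ".join(unknown))
    return list(dict.fromkeys(names))  # drop duplicates, keep the given order


def build_parser():
    parser = argparse.ArgumentParser(prog="harvester-sketch")
    parser.add_argument("-d", "--domain", required=True)
    parser.add_argument("-b", "--source", type=parse_sources, required=True)
    return parser
```

A single run with `-b google,bing,twitter` then yields a list the main loop can iterate over, instead of three separate invocations.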

getting 0 results after multiple sites.

After some email harvests on multiple sites, theHarvester gives 0 results.

I think the Google CAPTCHA detects it; whenever I use SOCKS5, an HTTP proxy, or a VPN, I get the same results.
Is there any way to resolve this issue?

Thanks.

theharvester hangs with -b google, but works fine with -b bing

Hi,

theHarvester on Kali hangs with -b google, but works fine with -b bing.

This is a new install in a VMware guest under Windows 10.

Any ideas? Any suggestions on where I should look first?

FYI, I have a 2k8 Hyper-V with the same revision of Kali and everything works great.

Thanks, Damian
