crawlergo_x_xray's Introduction

crawlergo_x_XRAY

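Combines the crawlergo dynamic crawler from 360 0Kee-Team with the passive scanning feature of the Chaitin XRAY scanner (other passive scanners work the same way).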

https://github.com/0Kee-Team/crawlergo

https://github.com/chaitin/xray

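20201130 update: launcher_new.py now adds random HTTP request headers to avoid being detected.

Note: this requires the fake_useragent library, installed with pip: pip install fake_useragent

20190115 update: launcher_new.py uses the method provided by crawlergo to push requests to xray.

The drawback of crawlergo's default push method is that it cannot run asynchronously with the crawl. Using launcher.py runs the two asynchronously and saves time.

Note: if you get a permission error at run time, delete the empty crawlergo folder.

If you hit an error, make sure the 64-bit crawlergo.exe, launcher.py, and targets.txt are in the same directory, and delete the crawlergo directory.

20190113 update: added fault tolerance to fix the crawler hanging on unreachable sites.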

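Introduction

I had long wanted a small but powerful crawler to pair with xray's passive scanning. I once thought about writing my own, but my skills fell short: halfway through, it still felt less usable than the AWVS crawler.

360 0Kee-Team recently released the dynamic crawler module used in their own product. After some experimenting I found it was exactly what I wanted, so I wrote this script.

Because the crawler does not expose a proxy option and does not visit some of the links it extracts from pages, I use the officially recommended approach: once the crawl finishes, parse the JSON output and then visit each request one by one with Python's requests library.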
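The parsing step described above might look like the following sketch. It assumes, based on the crawlergo README, that JSON output mode prints a `--[Mission Complete]--` marker followed by a JSON object containing `req_list` and `sub_domain_list`; check both against your crawlergo version. `parse_crawl_output` is a hypothetical helper name, not a function from launcher.py.

```python
import json

# Assumed delimiter that crawlergo prints before its JSON result (per its README).
MARKER = "--[Mission Complete]--"


def parse_crawl_output(raw: bytes):
    """Return (requests, sub_domains) parsed from crawlergo's stdout."""
    text = raw.decode("utf-8", errors="replace")
    if MARKER not in text:
        # The crawl failed or produced no result; return nothing instead of crashing.
        return [], []
    payload = text.split(MARKER, 1)[1]
    try:
        result = json.loads(payload)
    except json.JSONDecodeError:
        return [], []
    # Key names are assumptions; .get() keeps the sketch tolerant of missing keys.
    return result.get("req_list") or [], result.get("sub_domain_list") or []


# Simulated crawler output, so the sketch runs without crawlergo installed.
fake_result = {
    "req_list": [{"url": "http://example.com/a", "method": "GET", "headers": {}, "data": ""}],
    "sub_domain_list": ["www.example.com"],
}
sample = b"crawl log line\n" + MARKER.encode() + json.dumps(fake_result).encode()

reqs, subs = parse_crawl_output(sample)
print(reqs[0]["url"], subs)
```

Each parsed request can then be replayed through the xray proxy, which is what lets the passive scanner see the traffic.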

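The rough logic was shown in a flowchart image, which is not reproduced here.

Crawling and request replay use multiple threads and a queue, so that sending requests does not block crawling the next page.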
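A minimal sketch of that thread-and-queue arrangement is shown below. It is not the code from launcher.py: `run_pipeline`, `fake_crawl`, and `record` are hypothetical names, the crawl is simulated, and in real use `send` would issue each request through the xray proxy.

```python
import queue
import threading


def run_pipeline(targets, crawl, send, workers=4):
    """Crawl targets one by one while worker threads replay discovered URLs."""
    tasks = queue.Queue()

    def worker():
        while True:
            url = tasks.get()
            try:
                if url is None:  # sentinel: no more work for this worker
                    return
                send(url)
            except Exception as exc:
                # One failed request must not stop the worker.
                print(f"request failed for {url}: {exc}")
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(workers)]
    for t in threads:
        t.start()

    # The main thread keeps crawling; workers drain the queue in parallel.
    for target in targets:
        for url in crawl(target):
            tasks.put(url)

    for _ in threads:
        tasks.put(None)  # one sentinel per worker
    tasks.join()
    for t in threads:
        t.join()


def fake_crawl(target):
    """Stand-in for running crawlergo on one target (hypothetical)."""
    return [f"{target}/page{i}" for i in range(3)]


seen = []
seen_lock = threading.Lock()


def record(url):
    """Stand-in for sending the request through the proxy."""
    with seen_lock:
        seen.append(url)


run_pipeline(["http://a.example", "http://b.example"], fake_crawl, record)
print(sorted(seen))
```

The queue decouples the two stages: the crawler only has to enqueue URLs, so a slow request never delays the start of the next crawl.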

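Usage

1. Download the latest xray release and the latest crawlergo release.

Note: download the prebuilt binaries rather than git cloning the repositories.

2. Put launcher.py and targets.txt in the same directory as crawlergo.exe.

3. Configure and start xray passive scanning (the script defaults to 127.0.0.1:7777). If you change the port, also change proxies in launcher.py.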

See the official XRAY documentation for configuration parameters.

4. Set the crawlergo options in the cmd variable of launcher.py (mainly, change the Chrome path to your local path). The default is:

./crawlergo -c C:\Program Files (x86)\Google\Chrome\Application\chrome.exe -t 20 -f smart --fuzz-path --output-mode json target

See the official crawlergo documentation for configuration parameters.

5. Write the target URLs into targets.txt, one URL per line.

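6. Run launcher.py with python3 (XRAY passive scanning must already be running).

7. The generated sub_domains.txt contains the subdomains found by the crawler, and crawl_result.txt contains the URLs it crawled.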
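The configuration in steps 3 to 5 can be sketched as follows. The crawlergo flags and the 127.0.0.1:7777 proxy address are the defaults quoted above; the helper names `build_cmd` and `read_targets` are hypothetical and the real launcher.py may be organised differently. The Chrome path is written as a raw string and passed as a single list element, which avoids both backslash-escape errors and problems with the spaces in the path.

```python
import os
import tempfile
from pathlib import Path

# Assumed local values; adjust both to your machine.
CHROME_PATH = r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe"
XRAY_PROXY = "http://127.0.0.1:7777"

# Proxy mapping in the shape accepted by the requests library's `proxies` argument.
PROXIES = {"http": XRAY_PROXY, "https": XRAY_PROXY}


def build_cmd(target, chrome_path=CHROME_PATH):
    """Build the crawlergo argument list for one target."""
    return [
        "./crawlergo", "-c", chrome_path,
        "-t", "20", "-f", "smart", "--fuzz-path",
        "--output-mode", "json", target,
    ]


def read_targets(path="targets.txt"):
    """Read one URL per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


# Demo with a temporary targets file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_file = os.path.join(tmp, "targets.txt")
    Path(demo_file).write_text(
        "http://testphp.vulnweb.com/\n\nhttp://example.com/\n", encoding="utf-8"
    )
    targets = read_targets(demo_file)

print(targets)
print(build_cmd(targets[0]))
```

If xray listens on a different port, only `XRAY_PROXY` needs to change in this layout.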

etc

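  1. Most of the open-sourced samples probably no longer evade antivirus detection and need to be modified on your own

  2. I believe open-sourcing the basic core code helps people who want to learn

  3. I have learned a great deal from projects by experts on GitHub

  4. If my projects are used for HW exercises, red/blue team engagements, APT activity, black-market activity, malicious behaviour, illegal behaviour, fleecing people, or similar, I accept no responsibility and it has nothing to do with me

  5. I no longer take part on the attacking side of HW events, large or small; if attribution leads to the timwhite id, it has nothing to do with me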

crawlergo_x_xray's People

Contributors

bugxh1, thelsa, timwhitez

crawlergo_x_xray's Issues

OSError: [WinError 193] %1 is not a valid Win32 application

crawlergo_x_XRAY-master>python launcher_new.py
File "launcher_new.py", line 29
cmd = ["./crawlergo", "-c", "C:\Users\gkd\Desktop\chrome-win\chrome.exe","-t", "5","-f","smart","--fuzz-path", "--push-to-proxy", "http://127.0.0.1:7888/", "--push-pool-max", "10","--output-mode", "json" , target]
^
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

C:\Users\gkd\Desktop\crawlergo_x_XRAY-master>python launcher_new.py
Traceback (most recent call last):
  File "launcher_new.py", line 50, in <module>
    main(data1)
  File "launcher_new.py", line 30, in main
    rsp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\gkd\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\gkd\AppData\Local\Programs\Python\Python38-32\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
OSError: [WinError 193] %1 is not a valid Win32 application

Could someone more experienced please help?

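Hi, a question: I have everything configured and crawlergo does return crawl results, but xray never starts scanning. What could be causing this? Any help is appreciated.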
Is it a problem with xray receiving the requests?

Is there supposed to be no output after it starts running?

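After starting, it just sits like this with no activity, however long I wait. Looking at the processes, it is using quite a lot of CPU. Does output only appear once everything has been scanned, or did my run fail?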

FileNotFoundError: [WinError 2] The system cannot find the file specified

Running it reports FileNotFoundError: [WinError 2] The system cannot find the file specified.
I checked the path and it is correct.
The error output is as follows:
Traceback (most recent call last):
  File "launcher.py", line 89, in <module>
    main(data1)
  File "launcher.py", line 59, in main
    rsp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "H:\python\python3.7\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "H:\python\python3.7\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
FileNotFoundError: [WinError 2] The system cannot find the file specified

Why won't it run?

I ran
python3 launcher_new.py
and after a while it exits on its own. There is no error at all, and I cannot tell what is going on.

win7 x64 error

C:\Users\Public\crawlergo_x_XRAY>python3 launcher.py
Traceback (most recent call last):
  File "launcher.py", line 83, in <module>
    main(data1)
  File "launcher.py", line 59, in main
    rsp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Program Files\Python37\lib\subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "C:\Program Files\Python37\lib\subprocess.py", line 1207, in _execute_child
    startupinfo)

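CA certificate

Can proxy mode capture HTTPS traffic without the CA certificate installed?

Single-threaded, too inefficient

I took a look: it is single-threaded, and every loop iteration blocks for a while, so it is too inefficient. The queue used here also seems pointless, since it does not take advantage of multithreading. After I switched to multithreading, efficiency improved significantly.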
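The poster's actual change is not shown. One way to process several targets concurrently, as a sketch under that caveat, is a thread pool from the standard library. `scan_target` and `scan_all` are hypothetical names, and the sleep stands in for the real crawl-and-replay work.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def scan_target(target):
    """Stand-in for crawling one target and replaying its requests."""
    time.sleep(0.1)  # simulated blocking work
    return "done"


def scan_all(targets, max_workers=5):
    """Run scan_target for every target using a pool of worker threads."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scan_target, t): t for t in targets}
        for fut in as_completed(futures):
            target = futures[fut]
            try:
                results[target] = fut.result()
            except Exception as exc:
                # Record the failure and keep going with the other targets.
                results[target] = f"error: {exc}"
    return results


print(scan_all(["http://a.example", "http://b.example", "http://c.example"]))
```

Because each target mostly waits on an external process and on network I/O, threads are usually sufficient here despite the GIL.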

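Workarounds for fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached when using launcher_new.py

launcher_new.py uses fake-useragent, which may raise this error:
fake_useragent.errors.FakeUserAgentError: Maximum amount of retries reached

I tried the following fixes.

First:
pip3 install --upgrade fake_useragent

1. Change line 11 of launcher_new.py to
ua = UserAgent(use_cache_server=False,verify_ssl=False,cache=False)

2. Use a temporary file
Download https://fake-useragent.herokuapp.com/browsers/0.1.11 (it seems to be unreachable from within China)
Rename it to fake_useragent_0.1.11.json
Put it in the temporary directory (for example /tmp)

import tempfile
tempfile.gettempdir()

Neither of these solved the problem.

Looking at site-packages/fake_useragent/settings.py,
I found that it requests BROWSERS_STATS_PAGE = 'https://www.w3schools.com/browsers/default.asp'
That URL also seems to be unreachable from within China, which is why the error is raised.

So I suggest using an overseas VPS, or dropping the fake-useragent library.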
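A middle ground between those two suggestions, sketched below, is to fall back to a small built-in list when fake-useragent cannot be imported or raises. This is an assumption about how one might structure it, not code from launcher_new.py; `random_user_agent` and `FALLBACK_UAS` are hypothetical names, and the listed strings are only examples.

```python
import random

# Example User-Agent strings used when fake-useragent is unavailable.
FALLBACK_UAS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0.1 Safari/605.1.15",
]


def random_user_agent(use_library=True):
    """Return a random User-Agent, preferring fake-useragent when it works."""
    if use_library:
        try:
            # Optional dependency; imported lazily so the script runs without it.
            from fake_useragent import UserAgent
            return UserAgent().random
        except Exception:
            # Covers ImportError as well as FakeUserAgentError.
            pass
    return random.choice(FALLBACK_UAS)


# Offline path: never touches the network.
print(random_user_agent(use_library=False))
```

The fallback list is less varied than the library's data, but it removes the dependency on the remote pages that fail to load.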

PermissionError: [WinError 5] Access is denied

Traceback (most recent call last):
  File "launcher.py", line 80, in <module>
    main(data1)
  File "launcher.py", line 59, in main
    rsp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "D:\Python38\Lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Python38\Lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

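subprocess's communicate() function hangs

subprocess's communicate() function hangs.

The command runs successfully in PowerShell, but it hangs when run through subprocess's communicate().
I wrote a test myself: with the crawlergo command it hangs, while with the dir/b command it does not.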

#!/usr/bin/python3
# coding: utf-8
import simplejson
import threading
import subprocess
import requests
import warnings
warnings.filterwarnings(action='ignore')

def runcmd():
    # Raw string, so that \t in the Windows path is not interpreted as a tab character.
    command = ["./crawlergo", "-c", r"D:\tools\crawlergo_x_XRAY\chromium\chrome.exe","-t", "20","-f","smart","--fuzz-path", "--output-mode", "json", "http://testphp.vulnweb.com/"]
    #command = ['dir','/b']
    #print(command)
    ret = subprocess.Popen(command, stdout=subprocess.PIPE,stderr=subprocess.PIPE)
    output, error = ret.communicate()
    print(output.decode('gbk'))
    print(error.decode('gbk'))

#runcmd(["dir","/b"])  # sequence argument
#runcmd("exit 1")  # string argument

runcmd()

Is this a bug in this function, or is my 8 GB of RAM too little?
Is there a fix?
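The cause of the hang is not established in this thread. As a general safeguard, and only as a sketch, communicate() accepts a timeout, so a stuck child process can be killed instead of blocking the script forever. `run_with_timeout` is a hypothetical helper, and the demo commands use the Python interpreter rather than crawlergo.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout):
    """Run cmd; return (returncode, stdout, stderr), with returncode None on timeout."""
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        out, err = proc.communicate(timeout=timeout)
        return proc.returncode, out, err
    except subprocess.TimeoutExpired:
        # Kill the child, then call communicate() again to reap it
        # and collect whatever output it produced before the timeout.
        proc.kill()
        out, err = proc.communicate()
        return None, out, err


# A command that finishes quickly.
quick_code, quick_out, _ = run_with_timeout([sys.executable, "-c", "print('ok')"], timeout=30)
print(quick_code, quick_out.strip())

# A command that would block for 30 seconds is cut off after 1 second.
slow_code, _, _ = run_with_timeout(
    [sys.executable, "-c", "import time; time.sleep(30)"], timeout=1
)
print(slow_code)
```

Note that kill() only terminates the direct child; browser processes started by a crawler may need separate cleanup.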

PermissionError: [WinError 5] Access is denied

Traceback (most recent call last):
  File "launcher.py", line 83, in <module>
    main(data1)
  File "launcher.py", line 59, in main
    rsp = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 854, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "C:\Users\DELL\AppData\Local\Programs\Python\Python38\lib\subprocess.py", line 1307, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
PermissionError: [WinError 5] Access is denied

How can this be resolved?
