
datahenhq / till


DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with minimal code changes on your scraper. Integrates with any scraper in 5 minutes.

Home Page: https://till.datahen.com

License: Apache License 2.0

Go 58.26% HTML 41.20% JavaScript 0.49% CSS 0.05%
web-scraping man-in-the-middle proxy-server mitm scraping crawler scraper

till's Issues

Invalid memory address or nil pointer dereference

A user reported the following issue:

gotten error: Put "https://till.datahen.com/api/v1/instances/default/stats": read tcp 192.168.1.100:54493->104.21.62.154:443: read: connection reset by peer
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1028d8244]

goroutine 76 [running]:
github.com/DataHenHQ/till/server.startRecurringStatUpdate()
	/__w/till/till/server/stats.go:51 +0x194
created by github.com/DataHenHQ/till/server.Serve
	/__w/till/till/server/server.go:108 +0x3a8

version `GLIBC_2.28' not found, when running Till 0.8.0 on Ubuntu 18.04

Hello, when I try to run Till (till_0.8.0_Linux_x86_64.tar.gz) on Ubuntu 18.04, I get this error:
./till: /lib/x86_64-linux-gnu/libc.so.6: version 'GLIBC_2.28' not found (required by ./till)

I think this is because the Till binary requires a newer GLIBC than the one that ships with Ubuntu 18.04 or lower.
These are the GLIBC versions that ship with those releases:

Ubuntu 16.04 -> GLIBC 2.23
Ubuntu 18.04 -> GLIBC 2.27

Compiling the Till binaries against a lower version of GLIBC might fix this issue.

These are my system details:
[image]
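
The version gap in this report can be made concrete with a small comparison. This is a sketch only: `parseVersion` and `satisfies` are hypothetical helpers, not part of Till, and the version numbers are the ones quoted in the report. Comparing the parts as integers matters, because a plain string comparison would rank "2.3" above "2.28".

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion splits a "major.minor" string such as "2.27" into integers.
func parseVersion(v string) (major, minor int, err error) {
	parts := strings.SplitN(v, ".", 2)
	if len(parts) != 2 {
		return 0, 0, fmt.Errorf("unexpected version format: %q", v)
	}
	if major, err = strconv.Atoi(parts[0]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(parts[1]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

// satisfies reports whether the installed glibc version is at least
// the version required by the binary.
func satisfies(installed, required string) (bool, error) {
	imaj, imin, err := parseVersion(installed)
	if err != nil {
		return false, err
	}
	rmaj, rmin, err := parseVersion(required)
	if err != nil {
		return false, err
	}
	return imaj > rmaj || (imaj == rmaj && imin >= rmin), nil
}

func main() {
	const required = "2.28" // version named in the error message
	systems := []struct{ name, glibc string }{
		{"Ubuntu 16.04", "2.23"},
		{"Ubuntu 18.04", "2.27"},
	}
	for _, s := range systems {
		ok, err := satisfies(s.glibc, required)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s (glibc %s) satisfies %s: %v\n", s.name, s.glibc, required, ok)
	}
}
```

Both releases listed in the report come out as not satisfying the 2.28 requirement.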

Doesn't work with puppeteer

When I use Till with curl, it works fine after installing the cert, but when I specify the proxy with puppeteer:

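const puppeteer = require('puppeteer');

(async () => {
  // Launch a visible browser that routes all traffic through Till.
  const browser = await puppeteer.launch({
    headless: false,
    args: [
      '--proxy-server=http://localhost:2933',
      '--ignore-certificate-errors',
      '--ignore-certificate-errors-spki-list ',
    ],
  });
  // ...
})();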

I get an ERR_SSL_VERSION_OR_CIPHER_MISMATCH error in Chrome on Windows 10.

POST with body does not work

When doing POST with a request body, it drops the connection:

curl -X POST 'https://postman-echo.com/post' -H 'X-DH-Cache-Freshness: now' -H "Content-Type: application/json" --data '{"hello":"world"}' -kv --proxy http://localhost:2933
Note: Unnecessary use of -X or --request, POST is already inferred.
...
> POST /post HTTP/1.1
> Host: postman-echo.com
> User-Agent: curl/7.58.0
> Accept: */*
> X-DH-Cache-Freshness: now
> Content-Type: application/json
> Content-Length: 17
>
* upload completely sent off: 17 out of 17 bytes
* Connection #0 to host localhost left intact

Doesn't work with node

Works with curl but doesn't work with node:

const got = require('got');
const {HttpsProxyAgent} = require('hpagent');

// Placeholder value: `query` was not defined in the original report.
const query = 'example';

(async function main() {
  const response = await got.post('https://example.com/', {
    agent: {
      https: new HttpsProxyAgent({
        proxy: 'http://localhost:2933',
      }),
    },

    https: {
      rejectUnauthorized: false,
    },

    json: { query },
  });

  console.log({ response });
})();

I tried with http.request and request.post as well; no joy; I get ECONNRESET every time.

Insecure Browser Detected!

I am using puppeteer through Till on the site below. Till is started with the command:

till serve --proxy-file c:\temp\till\proxylist.txt --token --force-user-agent --ua-type desktop

https://secure.utah.gov/llv/search/index.html

Insecure Browser Detected!
We noticed that your browser is REALLY OLD.

Let me know if there is additional information I can provide to help debug.

Info:
Cache MISS
RID 01FDWJ3NMJK05AZM77M50DGTHS
GID secure.utah.gov-08228751e1bab3b452a3e0d6830a1e50
SID
Timestamp 2021-08-24 13:07:45

Config:
ForceUA true
UaType desktop
UseProxy true
StickyCookies true
StickyUA true
IgnoreInterceptors []
IgnoreAllInterceptors false
CacheFreshness now
CacheServeFailures false

Request:
Method POST
URL https://secure.utah.gov/llv/search/index.html
Header
Accept text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Encoding gzip, deflate, br
Accept-Language en-US
Cache-Control no-cache
Connection keep-alive
Content-Length 573
Content-Type application/x-www-form-urlencoded
Cookie JSESSIONID=997DA0A8AD40B2C1AB70B5B556A1E845; TS01bdb7d2=0143bf51700840319b0a08eeb8fbe8681009410c51139c8d6e58c60496aabe65498495e0a8af296ef0edd92ba99bd801c0a1857f32b92dcf6a887da12f8c350179f47b9356; TS01959f26=0143bf5170adbfef85245e0b4e92afb875bb8f3992139c8d6e58c60496aabe65498495e0a8edfd3d3b396fd57ee8cec40e757bb731; __utma=128287630.704573828.1629824827.1629824827.1629824827.1; __utmc=128287630; __utmz=128287630.1629824827.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmt=1; fontsize=90%25; _ga=GA1.2.704573828.1629824827; _gid=GA1.2.1327052485.1629824827; _gat_UA-103830962-11=1; __utmb=128287630.2.9.1629824865296
Origin https://secure.utah.gov
Pragma no-cache
Referer https://secure.utah.gov/llv/search/index.html
Sec-Fetch-Dest document
Sec-Fetch-Mode navigate
Sec-Fetch-Site same-origin
Sec-Fetch-User ?1
Upgrade-Insecure-Requests 1
User-Agent Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko
ContentLength 573
Body g-recaptcha-response=03AGdBq25ElbcQolMLQOZ3PnpAWF7Tvqhe55PBHrAp1oWqRWEQyBdlZquRQa5rXaucm4CiAxyZR7roFzAQsWJjjzesc_br5Sywr21DVfYyDNCjHEbGYorI3fsPOgxmijW0p9TJZGLtiaZmBVE9J4MeOZxGrUD9NQSR517qnmphipvyOqODKOPESQwcKoYOLDIaNsA4PWOMsbj2EnKZZc4j-79W5FBPT5hDmnyfgSNLcW7VFSQ73O5muE_jybf0LvgyWKKtaKexuIK_lLwjt44qXAzz7xoG2ruafB6N7xo2vrUoplhQ394iSeO7chDN47QlbI_4x1SIq0f0g5KRxVN9GYuCgPsCRCc547f6HkigwK-BZueHD8eSVG2YglwxL7vvUHj_eXHml2_gdHUVwp_Gk9QJdeFnM6vfPKDxpEqMgcjdH8IBzqxJmzo&licenseNumberCore=339473&licenseNumberFourDigit=5518&type=by_number&_csrf=e92e3abe-4aab-4843-87d4-bbbd87d4b879

Response:
Status 200 OK
Proto HTTP/1.1
Header
Accept-Ranges bytes
Connection Keep-Alive
Content-Length 17488
Content-Type text/html; charset=UTF-8
Date Tue, 24 Aug 2021 17:07:46 GMT
Keep-Alive timeout=5, max=100
Server Apache
Strict-Transport-Security max-age=16070400; includeSubDomains
ContentLength 17488
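
One possible reading of the dump above: the User-Agent header in the request is an Internet Explorer 11 on Windows 7 string (the `Trident/7.0; rv:11.0` tokens), which may be what the site flags as an old browser. The sketch below is not Till code and not the site's actual detection logic; `looksLikeLegacyIE` is a hypothetical helper that checks a User-Agent for the tokens Internet Explorer is known to send.

```go
package main

import (
	"fmt"
	"strings"
)

// looksLikeLegacyIE reports whether a User-Agent string carries the
// tokens used by Internet Explorer: "MSIE " (IE 10 and earlier) or
// "Trident/" (the rendering engine token, present in IE 11).
func looksLikeLegacyIE(userAgent string) bool {
	return strings.Contains(userAgent, "MSIE ") ||
		strings.Contains(userAgent, "Trident/")
}

func main() {
	// User-Agent copied from the request dump in this report.
	ua := "Mozilla/5.0 (Windows NT 6.1; Trident/7.0; rv:11.0) like Gecko"
	fmt.Println(looksLikeLegacyIE(ua))
}
```

A check like this could be run against the user agents a proxy hands out to see whether any fall into the legacy-IE bucket before they reach sites that reject old browsers.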
