
google-parser's Introduction

google-parser is a lightweight, HTTP-client-based scraper and parser for Google Search results that sends browser-like requests out of the box. Browser-like requests matter in web scraping because they help a scraper blend in with regular website traffic.

Questions

  1. Does this work with serverless functions? Yes, it works with AWS Lambda. Other serverless platforms are untested but should work too.
  2. Are more features coming? Yes, more features such as pagination are in progress. Proxy support is already available (see Features).
  3. I'm stuck, what should I do? Create an issue on GitHub. Pull requests are also welcome.
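
For the serverless question above, a handler could be sketched roughly as follows. This is only an illustration: it assumes an API Gateway proxy-style event, a query parameter named `q` (a hypothetical name), and that `googleSearch` resolves to the `{ code, status, ... }` object shown under Usage. `createHandler` is an illustrative helper, not part of the package.

```javascript
// Hypothetical helper: wraps a search function in a Lambda-style handler.
// `search` is expected to behave like googleSearch from '@nrjdalal/google-parser'.
function createHandler(search) {
  return async function handler(event) {
    const params = (event && event.queryStringParameters) || {}
    const query = params.q // assumed parameter name

    if (!query) {
      return {
        statusCode: 400,
        body: JSON.stringify({ status: 'error', message: 'Missing query parameter "q".' }),
      }
    }

    const result = await search({ query })

    // Reuse the code reported by the search result as the HTTP status, defaulting to 200.
    return { statusCode: result.code || 200, body: JSON.stringify(result) }
  }
}

// In a real function (ES module), this might be wired up as:
// import { googleSearch } from '@nrjdalal/google-parser'
// export const handler = createHandler(googleSearch)
```

Passing the search function in as an argument keeps the handler easy to exercise locally with a stub before deploying.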

Features

  • Proxy support ✅︎
  • Custom Headers support ✅︎

Installation

pnpm add @nrjdalal/google-parser

Or, with yarn or npm:

yarn add @nrjdalal/google-parser
npm install @nrjdalal/google-parser

Usage

1. Browser Info

Usage:

import { browserInfo } from '@nrjdalal/google-parser'

const response = await browserInfo()

Response:

{
  method: 'GET',
  // IP address of the client
  clientIp: '182.69.180.111',
  // country code of the client
  countryCode: 'US',
  bodyLength: 0,
  headers: {
    'x-forwarded-for': '182.69.180.111',
    'x-forwarded-proto': 'https',
    'x-forwarded-port': '443',
    host: 'api.apify.com',
    // random user agent client hint
    'sec-ch-ua': '"Google Chrome";v="113", "Chromium";v="113", "Not-A.Brand";v="24"',
    // devices: ['Desktop']
    'sec-ch-ua-mobile': '?0',
    // operatingSystems: ['windows', 'linux', 'macos']
    'sec-ch-ua-platform': '"macOS"',
    'upgrade-insecure-requests': '1',
    // random user agent
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/113.0.0.0 Safari/537.36',
    accept: '*/*',
    'sec-fetch-site': 'same-site',
    'sec-fetch-mode': 'cors',
    'sec-fetch-user': '?1',
    'sec-fetch-dest': 'empty',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.5',
    'alt-used': 'www.google.com',
    referer: 'https://www.google.com/'
  }
}

2. Google Search

Usage:

import { googleSearch } from '@nrjdalal/google-parser'

const response = await googleSearch({ query: 'nrjdalal' })

Output:

{
  code: 200,
  status: 'success',
  message: 'Found 5 results in 1s',
  query: 'nrjdalal',
  data: {
    results: [
      {
        title: 'Neeraj Dalal nrjdalal',
        link: 'https://github.com/nrjdalal',
        description: 'Web Developer & Digital Strategist. Follow their code on GitHub.',
        ...
      }
    ]
  }
}
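
As a small illustration of consuming this output, the sketch below pulls the links out of a response. It assumes only the fields shown above (`status`, `data.results`, `link`); `extractLinks` is a hypothetical helper, not part of the package.

```javascript
// Hypothetical helper: collects result links from a googleSearch-style response.
function extractLinks(response) {
  // Anything other than a successful response yields an empty list.
  if (!response || response.status !== 'success') return []

  const results = (response.data && response.data.results) || []
  return results.map((result) => result.link).filter(Boolean)
}

// Sample object shaped like the output shown above.
const sample = {
  code: 200,
  status: 'success',
  query: 'nrjdalal',
  data: {
    results: [{ title: 'Neeraj Dalal nrjdalal', link: 'https://github.com/nrjdalal' }],
  },
}

console.log(extractLinks(sample)) // [ 'https://github.com/nrjdalal' ]
```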

Error:

  • This error occurs when Google blocks the request, typically because of too many requests from the same IP address or a captcha challenge.
{
  code: 429,
  status: 'error',
  message: 'Captcha or too many requests.',
  query: 'nrjdalal'
}
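
One way to cope with this error is to retry with an increasing delay. The sketch below is an assumption-laden illustration: `searchWithRetry` is a hypothetical helper, the retry counts and delays are arbitrary, and it accepts the 429 object whether it is returned or thrown, since either is plausible from the description above.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

// Hypothetical helper: retries a googleSearch-style call while it reports code 429.
async function searchWithRetry(search, args, { retries = 3, baseDelayMs = 1000 } = {}) {
  let response

  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      response = await search(args)
    } catch (err) {
      // Only a 429-style error is treated as retryable; anything else is rethrown.
      if (!err || err.code !== 429) throw err
      response = err
    }

    if (response.code !== 429) return response

    // Exponential backoff: baseDelayMs, then 2x, then 4x, and so on.
    if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt)
  }

  return response // still blocked after all attempts
}

// Possible usage with the package:
// import { googleSearch } from '@nrjdalal/google-parser'
// const response = await searchWithRetry(googleSearch, { query: 'nrjdalal' })
```

Waiting may not be enough if the IP address itself is blocked; in that case switching to a different proxy is likely the more effective option.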

3. Google Search with Same Headers

Why? Changing headers on every request can lead to detection, so reuse the same headers for all requests sent from a single IP.

Usage:

import { getHeaders, googleSearch } from '@nrjdalal/google-parser'

const headers = getHeaders()

// same headers for same IP
console.log(await googleSearch({ query: 'facebook', options: { headers } }))
console.log(await googleSearch({ query: 'apple', options: { headers } }))

// regeneration of headers for new IP if needed
console.log(
  await googleSearch({ query: 'netflix', options: { headers: getHeaders() } })
)
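
When several IPs or proxies are in play, the same idea can be kept tidy with a small per-key store. This is a sketch only: `createHeaderStore` is a hypothetical helper, and it assumes `getHeaders()` returns a fresh header object on each call, as the usage above suggests.

```javascript
// Hypothetical helper: remembers one header set per key (for example, a proxy URL or IP).
function createHeaderStore(generate) {
  const store = new Map()

  return {
    // Returns the stored headers for this key, generating them on first use.
    get(key) {
      if (!store.has(key)) store.set(key, generate())
      return store.get(key)
    },
    // Forgets the headers for this key so the next get() regenerates them.
    reset(key) {
      store.delete(key)
    },
  }
}

// Possible usage with the package:
// import { getHeaders, googleSearch } from '@nrjdalal/google-parser'
// const headerStore = createHeaderStore(getHeaders)
// const proxyUrl = 'http://username:password@host:port'
// await googleSearch({
//   query: 'facebook',
//   options: { proxyUrl, headers: headerStore.get(proxyUrl) },
// })
```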

4. Google Search with Proxy

Usage:

import { googleSearch } from '@nrjdalal/google-parser'

console.log(
  await googleSearch({
    query: 'microsoft',
    options: {
      proxyUrl: 'http://username:password@host:port',
    },
  })
)
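
Credentials that contain characters such as `@` or `:` would break the `http://username:password@host:port` format if pasted in directly. The sketch below percent-encodes them first. `buildProxyUrl` is a hypothetical helper, and it is an assumption that the package decodes percent-encoded credentials the way most proxy clients do.

```javascript
// Hypothetical helper: assembles a proxy URL in the format shown above,
// percent-encoding the credentials so special characters stay unambiguous.
function buildProxyUrl({ protocol = 'http', username, password, host, port }) {
  const auth = username
    ? `${encodeURIComponent(username)}:${encodeURIComponent(password || '')}@`
    : ''
  return `${protocol}://${auth}${host}:${port}`
}

console.log(
  buildProxyUrl({
    username: 'user',
    password: 'p@ss:word',
    host: 'proxy.example.com', // placeholder host
    port: 8080,
  })
)
// http://user:p%40ss%3Aword@proxy.example.com:8080
```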

google-parser's People

Contributors

nrjdalal


google-parser's Issues

Pagination Request

Hi nrjdalal,

Very nicely done! I'm looking for a good scraping solution but there are not so many out there! Scraping Google search results is not an easy thing and I love your straightforward approach!

I have a question about pagination: the results I get through google-parser seem to be limited to 30 so far. Do you plan to extend it with a pagination mechanism?

Thanks

Parsing approach

Hi Neeraj. I see that this parser relies on hardcoded class names like .co8aDb. Have you figured out whether those classes are more or less permanent? If they are per-release or per-build hashes, this parser can break at any moment. If they are hashes of, say, internal class names that rotate relatively rarely, it's another story.

I'm considering using the library, but I'm concerned about this aspect.
