
Web Scraping API code examples for Python, PHP and Node.js


Web Scraping API


Introduction

With our Web Scraping API, you can scrape various websites en masse.

Authentication

Once you have an active Web Scraping API subscription, you can send a test request directly from the dashboard: open Web Scraping API > Authentication method, enter your username and password, and click Generate. An example curl request is generated right below the user:pass you entered.

Note that this is only an example with preset values to help you form your own request. You cannot change the request values in the dashboard itself; that has to be done in your code.
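As a rough illustration of what the user:pass pair turns into inside a request, the sketch below builds an HTTP Basic `Authorization` header. It assumes the API accepts standard Basic authentication, which the user:pass wording suggests but this README does not state explicitly; the function name `basic_auth_header` is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Return an HTTP Basic Authorization header for user:pass credentials.

    Assumption: the API uses standard HTTP Basic authentication.
    """
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Calling `basic_auth_header("user", "pass")` yields a header whose value is `Basic` followed by the Base64 encoding of `user:pass`.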

Scraping

Set the target parameter to universal and supply any URL you want; the API returns the HTML of that URL.

API Link: https://scraper-api.smartproxy.com/v2/scrape

  POST /scrape

Target: universal (not parseable)

Required parameters: url (ip.smartproxy.com in this example)

| Parameter | Type | Description |
|---|---|---|
| url | url | Target URL |
| target | string | Scraping target - universal |
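A minimal sketch of such a request, using only the Python standard library, might look like the following. It assumes the endpoint accepts a JSON body and HTTP Basic authentication; neither is confirmed in this README, and the helper names `build_scrape_request` and `scrape` are hypothetical. The repository's own example scripts remain the reference.

```python
import base64
import json
import urllib.request

API_URL = "https://scraper-api.smartproxy.com/v2/scrape"


def build_scrape_request(username, password, url, target="universal"):
    """Prepare (but do not send) a POST request for the scrape endpoint.

    Assumptions: JSON request body and HTTP Basic authentication.
    """
    payload = json.dumps({"target": target, "url": url}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        API_URL,
        data=payload,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def scrape(username, password, url):
    """Send the request and return the decoded JSON response."""
    request = build_scrape_request(username, password, url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

With valid credentials, `scrape("USERNAME", "PASSWORD", "https://ip.smartproxy.com")` would perform the call; building the request separately makes it easy to inspect before anything is sent.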

Examples

| Programming Language | Example location | Download |
|---|---|---|
| Python | python/ipsmartproxy.py | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/python/ipsmartproxy.py > ipsmartproxy.py |
| PHP | php/ipsmartproxy.php | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/php/ipsmartproxy.php > ipsmartproxy.php |
| Node.js | nodejs/ipsmartproxy.js | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/nodejs/ipsmartproxy.js > ipsmartproxy.js |

Response

{
  "results": [
    {
      "content": "Your Ip is: 213.87.163.6",
      "status_code": 200,
      "url": "https://ip.smartproxy.com/",
      "task_id": "6971034977135771649",
      "created_at": "2022-09-01 09:24:14",
      "updated_at": "2022-09-01 09:24:17"
    }
  ]
}
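To pull the scraped content out of a response shaped like the one above, something along these lines should work. The field names come from the sample response; the function name `extract_results` is hypothetical, and real responses may carry additional fields.

```python
import json

# Sample body copied from the response shown above.
SAMPLE_RESPONSE = """
{
  "results": [
    {
      "content": "Your Ip is: 213.87.163.6",
      "status_code": 200,
      "url": "https://ip.smartproxy.com/",
      "task_id": "6971034977135771649",
      "created_at": "2022-09-01 09:24:14",
      "updated_at": "2022-09-01 09:24:17"
    }
  ]
}
"""


def extract_results(response_body: str) -> list:
    """Return (url, status_code, content) tuples for every entry in "results"."""
    data = json.loads(response_body)
    return [
        (item["url"], item["status_code"], item["content"])
        for item in data.get("results", [])
    ]
```

For the universal target, `content` holds the raw page body, so any further parsing of the HTML is up to your own code.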

Headless

Not seeing the results you wanted?

Try enabling JavaScript rendering with the headless parameter (see Parameters).

This parameter renders JavaScript on the target website, making more data available for scraping.
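One way to apply this advice is to request the page without rendering first and fall back to `headless` only when the result looks incomplete. The sketch below assumes two caller-supplied helpers that are not part of the API: `fetch`, which sends a payload and returns the page content, and `looks_complete`, which decides whether that content is good enough.

```python
def fetch_with_headless_fallback(fetch, url, looks_complete):
    """Try a plain request first; retry with JavaScript rendering if needed.

    fetch and looks_complete are hypothetical callables supplied by the caller.
    """
    payload = {"target": "universal", "url": url}
    content = fetch(payload)
    if looks_complete(content):
        return content
    # Second attempt: ask the API to render JavaScript and return HTML.
    return fetch({**payload, "headless": "html"})
```

Whether a plain request is cheaper or faster than a rendered one is not stated here, so treat the fallback order as a design choice rather than a recommendation from the API.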

Facebook

Facebook Page

Target: universal (not parseable)

Required parameters: url

| Parameter | Type | Description |
|---|---|---|
| url | url | Target URL |
| target | string | Scraping target - universal |

Response

{
  "results": [
    {
      "content": "<html> Facebook page content</html>",
      "status_code": 200,
      "url": "https://www.facebook.com/ladygaga",
      "task_id": "6972452679540839425",
      "created_at": "2022-09-05 07:17:40",
      "updated_at": "2022-09-05 07:17:45"
    }
  ]
}

Examples

| Programming Language | Example location | Download |
|---|---|---|
| Python | python/facebookpage.py | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/python/facebookpage.py > facebookpage.py |
| PHP | php/facebookpage.php | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/php/facebookpage.php > facebookpage.php |
| Node.js | nodejs/facebookpage.js | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/nodejs/facebookpage.js > facebookpage.js |

Facebook Post

Target: universal (not parseable)

Required parameters: url

| Parameter | Type | Description |
|---|---|---|
| url | url | Target URL |
| target | string | Scraping target - universal |
| headless | string | JavaScript rendering - html |

Response

{
  "results": [
    {
      "content": "<html> Facebook page content</html>",
      "status_code": 200,
      "url": "https://www.facebook.com/zuck/posts/pfbid0HeY54v4LMcv2EMxDz5RvnWaR6swsGFWikzUbrsEFtvxu9n4GCx7zA2YTM69XdiYnl",
      "task_id": "6972484278999372801",
      "created_at": "2022-09-05 09:23:14",
      "updated_at": "2022-09-05 09:23:32"
    }
  ]
}

Examples

| Programming Language | Example location | Download |
|---|---|---|
| Python | python/facebookpost.py | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/python/facebookpost.py > facebookpost.py |
| PHP | php/facebookpost.php | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/php/facebookpost.php > facebookpost.php |
| Node.js | nodejs/facebookpost.js | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/nodejs/facebookpost.js > facebookpost.js |

Facebook Group

Target: universal (not parseable)

Required parameters: url

| Parameter | Type | Description |
|---|---|---|
| url | url | Target URL |
| target | string | Scraping target - universal |

Response

{
  "results": [
    {
      "content": "<html> Facebook page content</html>",
      "status_code": 200,
      "url": "https://www.facebook.com/groups/1394454774138066",
      "task_id": "6972486765374350337",
      "created_at": "2022-09-05 09:33:07",
      "updated_at": "2022-09-05 09:33:33"
    }
  ]
}

Examples

| Programming Language | Example location | Download |
|---|---|---|
| Python | python/facebookgroup.py | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/python/facebookgroup.py > facebookgroup.py |
| PHP | php/facebookgroup.php | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/php/facebookgroup.php > facebookgroup.php |
| Node.js | nodejs/facebookgroup.js | curl https://raw.githubusercontent.com/Smartproxy/Web-Scraping-API/main/nodejs/facebookgroup.js > facebookgroup.js |

Parameters

| Parameter | Type | Description |
|---|---|---|
| target | string | Data source (universal). |
| url | url | Direct URL (link). |
| locale | string | Changes the web interface language. Examples: en-US, en-GB. |
| geo | string | The geographical location that the result depends on. Full country names are required. |
| device_type | string | Device type and browser. Supported: desktop, desktop_chrome, desktop_firefox, mobile, mobile_android, mobile_ios. |
| headless | string | Enables JavaScript rendering. Supported: html, png. |
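The parameters above can be assembled into a request payload with a small helper that rejects unsupported values before anything is sent. This is only a sketch: the function name `build_payload` is hypothetical, the accepted values are copied from the table, and the API itself may validate differently.

```python
SUPPORTED_DEVICE_TYPES = {
    "desktop", "desktop_chrome", "desktop_firefox",
    "mobile", "mobile_android", "mobile_ios",
}
SUPPORTED_HEADLESS = {"html", "png"}


def build_payload(url, locale=None, geo=None, device_type=None, headless=None):
    """Build a universal-target payload, omitting parameters that are not set."""
    if device_type is not None and device_type not in SUPPORTED_DEVICE_TYPES:
        raise ValueError(f"unsupported device_type: {device_type!r}")
    if headless is not None and headless not in SUPPORTED_HEADLESS:
        raise ValueError(f"unsupported headless value: {headless!r}")

    payload = {"target": "universal", "url": url}
    optional = {
        "locale": locale,
        "geo": geo,  # full country name, per the table above
        "device_type": device_type,
        "headless": headless,
    }
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload
```

For example, `build_payload("https://example.com", geo="United States", device_type="mobile")` produces a payload with only those two optional keys added; the URL and country here are placeholders.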

Response Codes

HTTP Response Codes

| Response | Description | Solution |
|---|---|---|
| 200 - Success | Server has replied and given requested response. | Celebrate! |
| 204 - No content | Job not completed yet. | Wait a few seconds before trying again. |
| 400 - Multiple error messages | Bad structure of the request. | Re-check your request to make sure it is in the correct format. |
| 401 - Invalid / not provided authorization header (client not found) | Incorrect login credentials or missing authorization. | Re-check your provided credentials for authorization. |
| 403 - Forbidden | Your account does not have access to this resource. | Make sure the target is supported by us. |
| 404 - Not found | Your target was not found. | Re-check your targeted URL. |
| 429 - Too many requests | Exceeded rate limit for your subscription. | Make sure you still have at least one request left. Wait a couple of minutes and try again. If you encounter the error often, chat with us to see if your rate limit can be increased. |
| 500 - Internal error | Service unavailable, possibly due to some issues we are encountering. | Wait a couple of minutes and send another request. Contact us for more information. |
| 524 - Timeout | Service unavailable, possibly due to some issues we are encountering. | Wait a couple of minutes and send another request. Contact us for more information. |
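The "wait and try again" advice in the table can be turned into a simple retry loop. The sketch below treats 204, 429, 500 and 524 as retryable, based on the solutions listed above, and everything else as a request that needs fixing. The `send` callable, which should return a `(status_code, body)` pair, is a hypothetical stand-in for your own request code, and the default delay is an arbitrary choice.

```python
import time

# Codes whose listed solution is to wait and send the request again.
RETRYABLE_STATUS_CODES = {204, 429, 500, 524}


def send_with_retries(send, max_attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call send() until it returns HTTP 200 or the attempts run out.

    send is a caller-supplied function returning (status_code, body).
    """
    for attempt in range(1, max_attempts + 1):
        status_code, body = send()
        if status_code == 200:
            return body
        if status_code not in RETRYABLE_STATUS_CODES or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status_code}")
        sleep(delay_seconds)
```

Codes such as 400, 401, 403 and 404 fail immediately because repeating an unchanged request would not help; the request or credentials need correcting first.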

License

All code is released under the MIT License.
