
curl-robot's Introduction

### Installation

You can install this package with Composer. Run the following command:

composer require stil/curl-robot:dev-master

### Example 1

<?php
require __DIR__.'/../vendor/autoload.php';

use cURL\Request;
use cURL\Robot\RobotSwarm;
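use cURL\Robot\Robot;
// RateLimit is used below; its namespace is assumed to match the other Robot classes
use cURL\Robot\RateLimit;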
use cURL\Robot\Event\RequestAttachingEvent;
use cURL\Robot\Event\RequestCompletedEvent;
use cURL\Robot\RequestProviderInterface;

class Crawler implements RequestProviderInterface
{
    protected $number = 0;

    /**
     * Method returning next request to execute
     */
    public function nextRequest()
    {
        return new Request("http://httpbin.org/delay/1?num=".($this->number++));
    }
}

$swarm = new RobotSwarm();
$swarm->setRequestProvider(new Crawler());
$swarm->getDefaultOptions()->set([
    CURLOPT_TIMEOUT        => 5,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_HTTPHEADER     => [
        'User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:31.0) Gecko/20100101 Firefox/31.0',
        'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'
    ]
]);

$robot1 = new Robot();
$robot1->setQueueSize(1);
$robot1->addRateLimit(new RateLimit(20, 60));
$robot1->addListener('request.attaching', function (RequestAttachingEvent $e) {
    echo "Attaching request from robot1\n";
});

$robot2 = new Robot();
$robot2->setQueueSize(3);
$robot2->addRateLimit(new RateLimit(120, 60));
$robot2->addListener('request.attaching', function (RequestAttachingEvent $e) {
    // Proxy requests
    $e->request->getOptions()->set(CURLOPT_PROXY, '10.0.0.1:8080');
    echo "Attaching request from robot2 (proxied)\n";
});

$swarm->addListener('request.completed', function (RequestCompletedEvent $e) {
    $httpCode = $e->response->getInfo(CURLINFO_HTTP_CODE);

    if ($httpCode == 200) {
        $json = $e->response->getContent();
        $data = json_decode($json, true);
        printf("Successful request #%d\n", $data['args']['num']);
    } else {
        printf("Wrong HTTP code %d\n", $httpCode);
        // Retry request until we exceeded allowed amount of attempts
        if ($e->handler->getAttempts() < 3) {
            printf("Retrying, attempt %d\n", $e->handler->getAttempts());
            $e->swarm->retry($e->handler);
        }
    }
});

$swarm->add($robot1);
$swarm->add($robot2);
$swarm->run();

curl-robot's People

Contributors

jannejava, stil

curl-robot's Issues

Make it possible to pause crawler

When executing thousands of requests, it is often a good choice to buffer the responses in memory. However, once the buffer is full, it has to be flushed, for example by storing the responses in a database, which may take several minutes. This is a problem because the crawler may still hold unprocessed requests, which will have timed out by the time database processing is done.

curl-robot should provide straightforward methods to pause execution and resume it.

Better RPM calculation

Currently, requests per minute (RPM) are calculated from the start of the run. This can make the result inaccurate, especially for long-running tasks.
My idea is to calculate RPM from only the last several minutes of the running task. Even better would be the ability to set a custom time period with a class method.
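
One possible shape for this idea is a sliding-window counter. The sketch below is not part of curl-robot: the class `SlidingWindowRpm` and its methods `record()`, `setWindow()` and `getRpm()` are hypothetical names chosen for illustration. It assumes completion timestamps could be recorded from a `request.completed` listener.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a curl-robot class): reports requests per minute
 * over a sliding time window instead of over the whole run.
 */
final class SlidingWindowRpm
{
    /** @var float[] Completion timestamps in seconds, oldest first. */
    private array $timestamps = [];
    private float $windowSeconds;

    public function __construct(float $windowSeconds = 300.0)
    {
        $this->setWindow($windowSeconds);
    }

    /** Sets the custom period of time used for the calculation. */
    public function setWindow(float $windowSeconds): void
    {
        if ($windowSeconds <= 0.0) {
            throw new InvalidArgumentException('Window must be positive.');
        }
        $this->windowSeconds = $windowSeconds;
    }

    /** Records one completed request; $now is injectable for testing. */
    public function record(?float $now = null): void
    {
        $this->timestamps[] = $now ?? microtime(true);
    }

    /** Returns the request rate per minute within the current window. */
    public function getRpm(?float $now = null): float
    {
        $now = $now ?? microtime(true);
        $cutoff = $now - $this->windowSeconds;

        // Discard timestamps that have fallen out of the window.
        while ($this->timestamps !== [] && $this->timestamps[0] < $cutoff) {
            array_shift($this->timestamps);
        }

        return count($this->timestamps) * 60.0 / $this->windowSeconds;
    }
}

// Usage with explicit timestamps: a 120-second window evaluated at t = 130 s.
$rpm = new SlidingWindowRpm(120.0);
$rpm->record(0.0);    // older than the window at t = 130, so it is dropped
$rpm->record(100.0);
$rpm->record(110.0);
echo $rpm->getRpm(130.0), "\n";
```

Note that this sketch divides by the full window length, so it underestimates the rate until the task has been running for at least one window. Dividing by the elapsed time while it is shorter than the window would be one way to address that.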
