goutte's People

Contributors

brmatt, brunochalopin, christianchristensen, csarrazi, davedevelopment, dunglas, everzet, fabpot, ganchiku, hason, havvg, hnw, igorw, jakoch, keradus, larowlan, mtdowling, nek-, pborreli, robo47, siwinski, spolischook, stof, thewilkybarkid, tiger-seo, tomasvotruba, tombevers, tony-co, zachbadgett, zeopix

goutte's Issues

Unable to use Guzzle setBaseUrl and setUserAgent

The following code

use Goutte\Client;
$client = new Client();
$guzzle = $client->getClient();
$guzzle->setBaseUrl('www.example.com');
$guzzle->setUserAgent('foo');

has no effect because these values are overridden by BrowserKit later. The only way to set them is through server parameters:

$client->setServerParameter('HTTP_HOST', 'www.example.com');
$client->setServerParameter('HTTP_USER_AGENT', 'foo');

I think this is less convenient, because we may want to tweak these in Guzzle directly, like other settings. At the very least, the docs should explain that these parameters shouldn't be modified directly in Guzzle.

Any way to access Guzzle response

Hello,

I'm trying to access the Guzzle response because I want to get the effective URL.

Currently, as far as I can see, there is absolutely no way to access the response. Why?

Cannot git submodule init recursive

Not really a bug, but if you can help me solve this issue it would save my life.

I'm trying to run git submodule update --init --recursive and getting this:

fatal: Not a git repository: ../../../../d:/uniserver/vhosts/cc.meg/.git/modules/lib/guotte/modules/vendor/Symfony/Component/BrowserKit
Unable to fetch in submodule path 'vendor/Symfony/Component/BrowserKit'
Failed to recurse into submodule path 'lib/guotte'

Any ideas?
Thank you a lot.

Please make a 1.0.4 release to work with latest version of Guzzle

Hello,

I ran into a dependency hell when trying to use Mink, MinkGoutteDriver and Guzzle.

From what I understand, the problem is the following:

  • I'm using the very latest version of Guzzle, i.e. 3.8.*
  • MinkGoutteDriver requires the latest 1.0 version of Goutte, i.e. ~1.0
  • The latest available version of Goutte on Packagist is 1.0.3, which requires guzzle/http in version >=3.0.5,<3.8-dev

This means I just can't use the latest Guzzle with MinkGoutteDriver.

The master branch of Goutte requires guzzle/http in version >=3.0.5,<3.9-dev, and all the tests are passing.

Moreover, can't we just remove the <3.9-dev constraint? Why this restriction?

Enforce or detect UTF-8 encoding when 'charset' is not set

I am using Goutte to scrape a couple of sites and a few of them provide UTF-8 content but only set "text/html" as the Content-Type, thus making the DomCrawler assume it is ISO-8859-1 which results in double-encoded UTF-8 strings in the returned DOMDocument (and in the results for text() and so on).

Right now I am working around this by extending Goutte\Client and overriding createCrawlerFromContent, calling the parent method with ";charset=UTF-8" added to the type when there is no charset attribute. Probably not a really good way to do it, so I didn't want to make a pull request just yet.

My main point is that this took me quite a while to figure out and Goutte could probably be more convenient/save other new users from falling into the same trap by letting users specify an encoding. Besides that, thanks for a great library!
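The workaround described above can be sketched roughly as follows. This is only an illustration in plain PHP (7 or later): withDefaultCharset is a hypothetical helper name, not part of Goutte or DomCrawler, and it only shows the "add a charset when none is declared" step that would run before the type is handed to the parent method.

```php
<?php
// Hypothetical helper, not part of Goutte or DomCrawler.
// Appends a charset to a Content-Type value only when none is declared.
function withDefaultCharset(string $contentType, string $charset = 'UTF-8'): string
{
    if (stripos($contentType, 'charset=') !== false) {
        return $contentType; // the server already declared an encoding
    }

    // Drop a trailing ";" or whitespace before appending the parameter.
    return rtrim($contentType, "; \t") . '; charset=' . $charset;
}

echo withDefaultCharset('text/html'), "\n";
echo withDefaultCharset('text/html; charset=ISO-8859-1'), "\n";
```

Forcing UTF-8 is itself an assumption about the scraped sites; a page that is really ISO-8859-1 but declares no charset would be misread by this approach.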

SSL certificate problem

Hi,

Here is an error I get when trying to get https://ocean.ac-guadeloupe.fr/publinet/resultats

( ! ) Fatal error: Uncaught exception 'Guzzle\Http\Exception\CurlException' with message ' in phar://C:/wamp/www/PERSO/Crawler/goutte.phar/vendor/guzzle/http/Guzzle/Http/Curl/CurlMulti.php on line 382
( ! ) Guzzle\Http\Exception\CurlException: [curl] 60: SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed [url] https://ocean.ac-guadeloupe.fr/publinet/resultats in phar://C:/wamp/www/PERSO/Crawler/goutte.phar/vendor/guzzle/http/Guzzle/Http/Curl/CurlMulti.php on line 382

Here is the minimal code to reproduce the error:

<?php
require_once "goutte.phar";

use Goutte\Client;

// performs a real request to an external site
$client = new Client();
$crawler = $client->request('GET', 'https://ocean.ac-guadeloupe.fr/publinet/resultats');

var_dump($crawler);

Am I doing something wrong?

Problem with file upload

I'm using Behat and Mink to test a file upload. The upload action looks like this:

public function importCsvAction()
{
    $csvFile = new CSVFile($this->getUser());
    $form = $this->createFormBuilder($csvFile)
        ->add('file')
        ->getForm();

    if ($this->getRequest()->getMethod() === 'POST') {
        $form->bindRequest($this->getRequest());

        if ($form->isValid()) {
            $csvFile->upload();
            $this->persistAndFlush($csvFile);
            return new RedirectResponse($this->generateUrl('_contact_importcsvmapping', array('hash' => $csvFile->getHash())));
        }
    }

    return array('form' => $form->createView());
}

However, if I run the test script, the form tells me "This value should not be empty". My script looks like this:

Scenario: Import from CSV
    Given I am logged in as "promoter"
    And I follow "Contacts"
    And I follow "import email contacts"
    And I attach the file "/Users/stephan/Downloads/email-export.csv" to "form[file]"
    And I press "import"
    Then show last response  
    And I press "import data now"
    Then I should see "success"

Is this a bug in goutte? It seems like there is no file data sent in the request.

Composer

Hello,

Can you add Goutte to Composer? :) This would be really nice, because I use Symfony 2.1 with Composer.

Best regards

I'm getting an error regarding Client.php

Fatal error: Class 'Goutte\Goutte\Client' not found in /var/www/scrapper/index.php on line 8
my code:

request('GET', 'http://www.symfony-project.org/'); ?>

But Client.php is already there at that path. I don't know why this issue is popping up.

Caching

Is there a way to cache the response for a configurable time?
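One way to approach this outside the library is a small time-to-live wrapper around whatever performs the request. The sketch below is plain PHP (7.1 or later) and assumes nothing about Goutte: TtlCache, its methods and the injectable clock are all hypothetical names used for illustration.

```php
<?php
// Hypothetical TTL cache wrapper; nothing here is Goutte API.
final class TtlCache
{
    private $entries = [];
    private $ttl;
    private $clock;

    // $clock returns the current Unix time; it is injectable so the
    // expiry logic can be exercised without waiting.
    public function __construct(int $ttlSeconds, ?callable $clock = null)
    {
        $this->ttl = $ttlSeconds;
        $this->clock = $clock ?? 'time';
    }

    // Returns the cached value for $key, or calls $fetch and stores its
    // result when the entry is missing or expired.
    public function get(string $key, callable $fetch)
    {
        $now = ($this->clock)();
        if (isset($this->entries[$key]) && $this->entries[$key]['expires'] > $now) {
            return $this->entries[$key]['value'];
        }

        $value = $fetch();
        $this->entries[$key] = ['value' => $value, 'expires' => $now + $this->ttl];

        return $value;
    }
}
```

In use, $key would typically be the URL and $fetch a closure that performs the real request and returns the body. The cache here lives in memory for one process only; persisting it is left out of the sketch.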

Parameters are being ignored when sending a GET request.

In order to scrape data that change dynamically according to GET parameters in a website,
I need to create a GET request with parameters such as:

//a=a&b=b
$params = array(
            'a' => 'a',
            'b' => 'b',
);

$crawler = $client->request('GET', 'http://www.example.com/', $params);

Then the complete URL should be formed by the URI (http://www.example.com/) plus the Request parameters (a=a&b=b) so that the request is sent to the correct URL.
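Until the client does this itself, the expected URL can be built by hand before the request is made. The following is a minimal sketch in plain PHP; withQuery is a hypothetical helper name, and it deliberately ignores URI fragments.

```php
<?php
// Hypothetical helper: appends parameters to a URI as a query string.
function withQuery(string $uri, array $params): string
{
    if ($params === []) {
        return $uri;
    }

    // Use "&" when the URI already carries a query string.
    $separator = strpos($uri, '?') === false ? '?' : '&';

    return $uri . $separator . http_build_query($params, '', '&');
}

echo withQuery('http://www.example.com/', ['a' => 'a', 'b' => 'b']), "\n";
```

The resulting string can then be passed as the URI argument of the request, with no separate parameters array.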

Cookies with same name on different (sub)domains overwrite each other

I'm trying to crawl a url that's part of a signon process, which issues a couple of redirects, in combination with cookies:

1: subdomain1.domain.tld => sets session_id cookie, issues 302 redirect to subdomain2
2: subdomain2.domain.tld => sets session_id cookie, issues 302 redirect to subdomain1
3: subdomain1.domain.tld => issues 302 redirect to the final page

The first 2 steps work fine, but as a result of the second url, the session_id cookie overwrites the first one, even though it's on a different subdomain. When requesting the third url, the (required) session_id cookie is not being sent, because it is stored as belonging to the second subdomain, while the request is for the first subdomain.

I'm not sure if the CookieJar class in the BrowserKit needs to be fixed for this, or if Goutte needs to be fixed, but I think it's safe to say this case is handled differently in a browser (I've inspected the flow in Chrome, which sends the cookie on the third request).
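The behaviour expected here can be illustrated with a store that keys cookies by host as well as by name. This is only a sketch in plain PHP (7.1 or later): HostScopedCookies is a hypothetical class, not BrowserKit's CookieJar, and it ignores paths, expiry and Domain attributes that real cookie handling has to consider.

```php
<?php
// Illustrative only: cookies are stored per host, so equally named cookies
// from different subdomains do not overwrite each other.
final class HostScopedCookies
{
    private $cookies = [];

    public function set(string $host, string $name, string $value): void
    {
        $this->cookies[strtolower($host)][$name] = $value;
    }

    // Returns name => value pairs stored for exactly this host.
    public function forHost(string $host): array
    {
        return $this->cookies[strtolower($host)] ?? [];
    }
}

$jar = new HostScopedCookies();
$jar->set('subdomain1.domain.tld', 'session_id', 'one');
$jar->set('subdomain2.domain.tld', 'session_id', 'two');

// The third request goes to subdomain1 and still finds its own session_id.
var_dump($jar->forHost('subdomain1.domain.tld'));
```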

handle redirection loop

BrowserKit's redirection doesn't handle a loop of redirects; I can only disable or enable redirection.

So I have this loop and can't catch it:

Symfony\Component\BrowserKit\Client->request( ) ..\Client.php:423
Symfony\Component\BrowserKit\Client->followRedirect( )  ..\Client.php:274

I solved my problem by catching the loop in createResponse:

use Goutte\Client;
use Guzzle\Http\Message\Response as GuzzleResponse;

class MyClient extends Client{
    const MAX_REDIRECTS = 5;

    public $redirectCount = 0;

    public function request($x, $y){
        return parent::request($x,$y);
    }

    protected function createResponse(GuzzleResponse $response)
    {
        $this->checkRedirectionLoop($response->getStatusCode());
        return parent::createResponse($response);
    }

    protected function checkRedirectionLoop($statusCode){
        $isRedirect = $statusCode == 301 || $statusCode == 302;
        if($isRedirect){
            $this->redirectCount++;
            if($this->redirectCount >= self::MAX_REDIRECTS){
                $this->redirectCount = 0;
                throw new Exception('intercepted loop of redirects');
            }
        }else{
            $this->redirectCount = 0;
        }
    }
}

Problem when several "Cookie" http header was sent to the server

When I use the following code as the client:

use Goutte\Client;
$goutte = new Client();
$guzzleRequest = $goutte->getClient()->createRequest('get', 'http://localhost/cookies.php');
$guzzleRequest->addCookie('foo', 'FOO');
$guzzleRequest->addCookie('bar', 'BAR');
$guzzleRequest->send();

and the following code on the server:

<?php
ob_start();
var_dump($_COOKIE);
file_put_contents('/path/to/cookies.log', ob_get_clean());

I get the following contents in the /path/to/cookies.log file:

array(1) {
  'foo' =>
  string(12) "FOO, bar=BAR"
}

I'm using PHP 5.3.14 with XDebug 2.2.0 on Ubuntu.
Goutte is installed in Symfony 2.1 with the following composer.json :

{
    "require": {
        "php": ">=5.3.6",
        "symfony/symfony": "dev-master",
        "symfony/assetic-bundle": "dev-master",
        "symfony/swiftmailer-bundle": "dev-master",
        "symfony/monolog-bundle": "dev-master",
        "symfony/twig-bundle": "2.1.*@stable",
        "symfony/yaml": "dev-master",
        "symfony/config": "dev-master",
        "symfony/translation": "dev-master",
        "symfony/config": "dev-master",
        "sensio/distribution-bundle": "dev-master",
        "sensio/framework-extra-bundle": "dev-master",
        "sensio/generator-bundle": "dev-master",
        "doctrine/orm": "dev-master",
        "doctrine/doctrine-bundle": "dev-master",
        "doctrine/doctrine-fixtures-bundle": "dev-master",
        "twig/extensions": "dev-master",
        "jms/security-extra-bundle": "1.1.*",
        "stof/doctrine-extensions-bundle": "dev-master",
        "whiteoctober/breadcrumbs-bundle": "2.1.*-dev",
        "behat/behat": "2.4.*@stable",
        "behat/mink": "1.4.*@stable",
        "behat/gherkin": ">=2.2.1",
        "behat/mink-extension": "dev-master",
        "behat/mink-goutte-driver": "*",
        "behat/mink-sahi-driver": "*",
        "behat/mink-browserkit-driver":  "*",
        "behat/mink-selenium2-driver":   "*",
        "behat/symfony2-extension": "dev-master",
        "mageekguy/atoum": "dev-master"
    },
    "scripts": {
        "post-install-cmd": [
            "Sensio\\Bundle\\DistributionBundle\\Composer\\ScriptHandler::buildBootstrap",
            "Sensio\\Bundle\\DistributionBundle\\Composer\\ScriptHandler::clearCache",
            "Sensio\\Bundle\\DistributionBundle\\Composer\\ScriptHandler::installAssets"
        ],
        "post-update-cmd": [
            "Sensio\\Bundle\\DistributionBundle\\Composer\\ScriptHandler::buildBootstrap",
            "Sensio\\Bundle\\DistributionBundle\\Composer\\ScriptHandler::clearCache",
            "Sensio\\Bundle\\DistributionBundle\\Composer\\ScriptHandler::installAssets"
        ]
    },
    "extra": {
        "symfony-app-dir": "app",
        "symfony-web-dir": "web",
        "symfony-assets-install": "symlink"
    },
    "autoload": {
        "psr-0": {
            "Webloc": "src/"
        }
    },
    "config": {
        "bin-dir": "bin"
    }
}

Remove goutte.phar when installed via Composer

I include several packages via Composer that require Goutte, so it's installed automatically.

The problem is that my PHPStorm installation sees the Goutte.phar and uses it for autocompletion.

Is there a way to remove the Goutte.phar automatically when installing via Composer? Or maybe you should remove the Goutte.phar from Git?

submitting forms without submit button

I'm dealing with forms without a submit button, they are submitted with a link and javascript.

My approach would be to inject a submit button and then create the form object and submit it.

So would it make sense to auto inject a submit button into a form, if you create the form based on the form tag and it does not include a submit button already?

Best

Thomas

Redirection disabled ?

Hello,

I was just wondering: why were redirections deactivated in #85? Before that PR, Goutte diligently followed redirections, as we would expect from this client.

thanks.

CP1251 problem

I want to get some information from a site with CP1251 encoding.

use Goutte\Client;
use Nonlux\Bundle\Entity\News;
....
protected function downloadQueuePage()
{
    $cli = new Client();
    $url = array_pop($this->_url);
    $this->output->writeln("http://www.baikal-daily.ru" . $url);
    $cra = $cli->request("get", "http://www.baikal-daily.ru" . $url);
    $news = new News();
    $news->setSiteId(1);
    $news->setUrl($url);
    $news->setTitle($cra->filter("#content .main h3")->text());
}

On some pages the default Crawler returns empty h1 nodes, even though they exist on the page and the markup looks valid. After some experiments with Goutte, the Crawler and iconv, in one case I got:

В Улан-Удэ трёхлетний мальчик упал в открытый колодец
упал в открытый колодец
�й колодец
дец

rather than:

В Улан-Удэ трёхлетний мальчик упал в открытый колодец

Another time I got a lot of beeps from the console that dumps the received pages.
How can I solve this problem? Where is the source of the evil?

How to handle file download on form submission?

I'm using Goutte to submit a form where the response isn't an HTML page but rather a MS Excel file. Specifically the response has these headers:

Content-Type: application/vnd.ms-excel
Content-Disposition: attachment; filename="stuff.xls"

How can I access the contents of this file?

form attribute is not respected

Goutte seems to not respect the "form" attribute as an overwrite to form ownership.

http://dev.w3.org/html5/spec/single-page.html#attr-fae-form

With the exception of IE (10 included), all modern browsers support it fine. In the case of IE it's a simple javascript fix—just like every other normal html5 feature.

I would expect the driver to do the standard behaviour.

@environment
Feature: test environment drivers
  In order perform tests successfully
  As a Developer
  Test drivers must conform to standards

  Background:
    Given I am on the test site
      And I am on "/tests/form-attribute.php"

  Scenario: press button_1 of form_1 from inside of form_1
    Given I press "Capture fields"
     Then I should see "form_name is form_1"
      And I should see "2 apples"
      And I should see "0 oranges"
      And I should see "button_1 present"
      And I should see "button_2 not present"
      And I should see "button_3 not present"
      And I should see "outer_field present"

  Scenario: press button_3 of form_1 from inside of form_2
    Given I press "Submit from outside the form"
     Then I should see "form_name is form_1"
      And I should see "2 apples"
      And I should see "0 oranges"
      And I should see "button_1 not present"
      And I should see "button_2 not present"
      And I should see "button_3 present"
      And I should see "outer_field present"

  Scenario: press button_2 of form_2 from inside of form_1
    Given I press "Submit form_2"
     Then I should see "form_name is form_2"
      And I should see "0 apples"
      And I should see "3 oranges"
      And I should see "button_1 not present"
      And I should see "button_2 present"
      And I should see "button_3 not present"
      And I should see "outer_field not present"

Where /tests/form-attribute.php is as follows...

<!DOCTYPE html>
<meta charset="utf-8">
<title>Test Case</title>

<?php if ($_SERVER['REQUEST_METHOD'] !== 'POST'): ?>

    <form id="form_1" action="<?= $_SERVER['REQUEST_URI'] ?>" method="POST">
        <input type="checkbox" name="apples[]" value="1" checked/>
        <input form="form_1" type="hidden" name="form_name" value="form_1"/>
        <button form="form_1" type="submit" name="button_1">Capture fields</button>
        <button form="form_2" type="submit" name="button_2">Submit form_2</button>
    </form>

    <input form="form_1" type="checkbox" name="apples[]" value="2" checked/>

    <form id="form_2" action="<?= $_SERVER['REQUEST_URI'] ?>" method="POST">
        <input form="form_2" type="checkbox" name="oranges[]" value="1" checked/>
        <input form="form_2" type="checkbox" name="oranges[]" value="2" checked/>
        <input form="form_2" type="checkbox" name="oranges[]" value="3" checked/>
        <input form="form_2" type="hidden" name="form_name" value="form_2"/>
        <input form="form_1" type="hidden" name="outer_field" value="success"/>
        <button form="form_1" type="submit" name="button_3">Submit from outside the form</button>
    </form>

<?php else: ?>

<?
    echo 'form_name is '.(isset($_POST['form_name']) ? $_POST['form_name'] : 'undefined')."<br/>\n";
    echo (isset($_POST['apples']) ? \count($_POST['apples']) : 0)." apples<br/>\n";
    echo (isset($_POST['oranges']) ? \count($_POST['oranges']) : 0)." oranges<br/>\n";

    foreach (['button_1', 'button_2', 'button_3', 'outer_field'] as $key)
    {
        echo $key.' '.(isset($_POST[$key]) ? 'present' : 'not present')."<br/>\n";
    }

    echo "<hr/>\n";

    \var_dump($_POST);
?>

<?php endif; ?>

System Packages; updated to the time of this post

installed:
  behat/behat [v2.4.0] : Scenario-oriented BDD framework for PHP 5.3
  behat/gherkin [v2.2.5] : Gherkin DSL parser for PHP 5.3
  behat/mink [1.4.x-dev] : Web acceptance testing framework for PHP 5.3
  behat/mink-browserkit-driver [dev-master] : Symfony2 BrowserKit driver for Mink framework
  behat/mink-extension [dev-master] : Mink extension for Behat
  behat/mink-goutte-driver [dev-master] : Goutte driver for Mink framework
  behat/mink-selenium2-driver [dev-master] : Selenium2 (WebDriver) driver for Mink framework
  fabpot/goutte [dev-master] : A simple PHP Web Scraper
  guzzle/guzzle [v3.0.5] : Guzzle is a PHP HTTP client library and framework for building RESTful web service clients
  instaclick/php-webdriver [dev-master] : PHP WebDriver for Selenium 2
  ...
  symfony/browser-kit [2.1.x-dev] : Symfony BrowserKit Component
  symfony/config [2.1.x-dev] : Symfony Config Component
  symfony/console [2.1.x-dev] : Symfony Console Component
  symfony/css-selector [2.1.x-dev] : Symfony CssSelector Component
  symfony/dependency-injection [2.1.x-dev] : Symfony DependencyInjection Component
  symfony/dom-crawler [2.1.x-dev] : Symfony DomCrawler Component
  symfony/event-dispatcher [2.1.x-dev] : Symfony EventDispatcher Component
  symfony/finder [2.1.x-dev] : Symfony Finder Component
  symfony/process [2.1.x-dev] : Symfony Process Component
  symfony/translation [2.1.x-dev] : Symfony Translation Component
  symfony/yaml [2.1.x-dev] : Symfony Yaml Component

No local changes

For brevity, I removed the obviously irrelevant ones.
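The ownership rule this issue asks for can be sketched with PHP's DOM extension. The code below is an illustration only, not DomCrawler's implementation: ownedFieldNames is a hypothetical function, it assumes the form id contains no quote characters, and it handles only the simple case where a control is owned either through its form attribute or, failing that, by its ancestor form.

```php
<?php
// Illustrative sketch: collects the names of the controls owned by a form,
// honouring the HTML5 "form" attribute. Requires the DOM extension.
function ownedFieldNames(string $html, string $formId): array
{
    libxml_use_internal_errors(true); // keep parser warnings out of the output
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    // A control belongs to the form if its form attribute names it, or if it
    // has no form attribute and sits inside that form element.
    $query = sprintf(
        '//*[(self::input or self::button or self::select or self::textarea)'
        . ' and (@form="%1$s" or (not(@form) and ancestor::form[@id="%1$s"]))]',
        $formId
    );

    $names = [];
    foreach ($xpath->query($query) as $node) {
        $names[] = $node->getAttribute('name');
    }

    return $names;
}

$html = '<form id="form_1"><input name="inside">'
    . '<button form="form_2" name="b2">x</button></form>'
    . '<input form="form_1" name="outside">'
    . '<form id="form_2"><input form="form_1" name="outer_field"></form>';

print_r(ownedFieldNames($html, 'form_1'));
```

The sample markup is a cut-down variant of the test page above: the button that points at form_2 is left out of form_1's fields even though it is nested inside form_1.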

Exception error: Curl exception [curl] 28. Please help me!

A request to a URL followed by clicking a link sometimes fails with the error "[curl] 28: Operation timed out after 30750 milliseconds with 177049 out of -1 bytes received [url] http://ww....". What causes this error and how can I fix it? (The error occurs occasionally, not every time we run the system.)

Allow to override Curl options passed to Guzzle request

Currently curl options are hardcoded in Client.php, but sometimes more options have to be set (increase timeout or max. redirects).

Code from Client.php:

 $guzzleRequest->getCurlOptions()
        ->set(CURLOPT_FOLLOWLOCATION, false)
        ->set(CURLOPT_MAXREDIRS, 0)
        ->set(CURLOPT_TIMEOUT, 30);

Crawler unable to select UTF-8 links

This bug is probably related to the crawler component.

Simple example:

use Goutte\Client;

$client = new Client();
$crawler = $client->request('GET', 'https://www.eregitra.lt/viesa/interv/Menu.php');
$crawler = $client->click($crawler->selectLink('Teorija')->link());
var_dump($crawler->selectLink('Egzaminų tvarkaraštis'));

Output:

object(Symfony\Component\DomCrawler\Crawler)#13 (2) {
  ["uri":"Symfony\Component\DomCrawler\Crawler":private]=>
  string(83) "https://www.eregitra.lt/viesa/interv/Menu.php?Action=nveis_teor&node=4&statid=menu1"
  ["storage":"SplObjectStorage":private]=>
  array(0) {
  }
}

It looks like the non-ASCII characters in "Egzaminų tvarkaraštis" are causing the problem.

cannot pass proxy adapter config to Goutte client

I wonder if you could help. I am passing $zendOptions as an array to new \Goutte\Client($zendOptions, array()), but I cannot update the adapter option unless I explicitly call $client->setAdapter(new \Zend\Http\Client\Adapter\Proxy) in Goutte/Client.php above the setConfig call. Why is this so? Also, the vendor folder in src/Goutte is empty after installation, and executing update_vendors.sh returns "HEAD is now at dd59ba9 merged branch ganchiku/fixed-typo (PR #36)".

UTF-8 BOM problem in crawler creation

If the response is UTF-8 content that includes a BOM, the crawler returns garbage.

I've created the following workaround in a child class (extending Goutte\Client):

        if(substr($content, 0,3) == pack("CCC",0xef,0xbb,0xbf))
        { 
            $content=substr($content, 3); 
        }
        return parent::createCrawlerFromContent($uri, $content, $type);
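The same check can be written as a standalone function, which makes it easier to try outside the client. This is a minimal sketch in plain PHP (7 or later); stripUtf8Bom is a hypothetical name, and only the UTF-8 BOM (bytes EF BB BF) is handled.

```php
<?php
// Hypothetical standalone helper mirroring the workaround above.
// Removes a leading UTF-8 byte order mark, if present.
function stripUtf8Bom(string $content): string
{
    $bom = "\xEF\xBB\xBF";

    if (strncmp($content, $bom, 3) === 0) {
        return substr($content, 3);
    }

    return $content;
}

var_dump(stripUtf8Bom("\xEF\xBB\xBF<html></html>"));
```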

Recent "src/" update breaks Composer install?

I'm trying to install the latest Goutte version via Composer and I keep getting an error that "Goutte\Client" can't be found. I noticed that there was a recent change that pulled the code out of the "src/" directory. When Composer builds the autoloader, it still includes this directory in the namespace mapping:

'Goutte' => $vendorDir . '/fabpot/goutte/src/',

If I manually remove it, I get this exception: https://gist.github.com/2704894

Something seems to be broken, but I'm not sure what.

Architectural Problem

The object returned from a filtered crawler is a DOM object, so the crawler's text() method cannot be called on it directly.

The new Symfony API returns every filtered Crawler result as a Crawler instance, so searching the documentation is misleading and hard to follow.

I simply hacked around it by using "new Crawler()" to wrap DOM objects in Crawler objects, which makes them easy to use. The code is here:

require_once '../goutte.phar';  
use Goutte\Client;
use Symfony\Component\DomCrawler\Crawler;

$client = new Client();
$crawler = $client->request('GET','http://*****');

$navigation = $crawler->filter('body #anaNavList > li > h2 > a');
foreach ($navigation as $value) {
   $new_crawler = new Crawler($value);
   echo $new_crawler->text()."<br>";
}

I wish this powerful library would update to the newest Crawler API. With the newest version, there would be no need for the hack.

Edit: (Examples)

Crawler Object:

$x = $crawler->filter('body');
$y = $x->filter('div');
// both of them are "crawler object".

Dom Object:

$z = $x->filter('div')->siblings(); // DOM object

So using "filter" is fine, but when we go deeper and use functions like siblings, children, etc., the result is converted to a DOM object because of the Crawler API version.

Notice - Uninitialized string offset

I've got the notice below, after upgrading Goutte.

Notice - Uninitialized string offset: 0 in phar://APPPATH/vendor/Goutte/goutte.phar/vendor/guzzle/guzzle/src/Guzzle/Http/Message/EntityEnclosingRequest.php on line 132

Is this a bug, or am I doing something wrong?

Thanks.

Dependency Issue

Is there a reason the Guzzle dependency is set to be less than 3.7? We use Guzzle in another library and this is causing an issue.

Invalid characters in URI

     Symfony\Component\DomCrawler\Crawler Object
(
    [uri:Symfony\Component\DomCrawler\Crawler:private] => https://xxx.testrail.com/index.php?/auth/login/index.php?/auth/login/
    [storage:SplObjectStorage:private] => Array
        (
            [0000000038ada3700000000031bb568f] => Array
                (
                    [obj] => DOMElement Object
                        (
                            [tagName] => html
                            [schemaTypeInfo] => 
                            [nodeName] => html
                            [nodeValue] => Invalid characters in URI - TestRail

here is the code that's being used:

      require_once 'php/Goutte/goutte.phar';
      use Goutte\Client;
      $client  = new Client();
      $crawler = $client->request( 'POST', 'https://xxx.testrail.com/index.php?/auth/login/' );
      $form    = $crawler->selectButton( 'Login' )->form();
      $crawler = $client->submit( $form, array( 'name' => 'xxx', 'password' => 'xxx' ) );
      print_r( $crawler );
      $nodes   = $crawler->filter( '.errorPanel' )->text();

note the doubling of the path... is that a bug, or am I doing something incorrectly?

Unable to clone using composer: reference in packagist is wrong

In order to use Goutte together with Mink / Behat, I tried installing it using Composer by adding a requirement for package "fabpot/goutte" in my composer.json. When running Composer, this results in an exception:

Updating dependencies
Installing guzzle/guzzle (v2.6.0)
  - Installing guzzle/guzzle (v2.6.0)
    Downloading: 100%         
    Unpacking archive
    Cleaning up

Installing fabpot/goutte (dev-master 0fdf7f)
  - Installing fabpot/goutte (dev-master)
    Cloning 0fdf7fe60e1e87b2b126886295e32054ccd02dc5
Cloning into /Users/holtkamp/workspace/project/vendor/fabpot/goutte...
Cloning into /Users/holtkamp/workspace/project/vendor/fabpot/goutte...
Cloning into /Users/holtkamp/workspace/project/vendor/fabpot/goutte...



  [RuntimeException]
  Failed to clone http://github.com/fabpot/goutte via git, https and http protocols, aborting.

  fatal: http://github.com/fabpot/goutte/info/refs not found: did you run git update-server-info on the server?

I also tried to clone the Goutte repository directly. At this point I found out that the repository name starts with an uppercase letter:

$ git clone https://github.com/fabpot/goutte.git
Cloning into goutte...
fatal: https://github.com/fabpot/goutte/info/refs not found: did you run git update-server-info on the server?

$ git clone https://github.com/fabpot/Goutte.git
Cloning into Goutte...
remote: Counting objects: 435, done.
remote: Compressing objects: 100% (240/240), done.
remote: Total 435 (delta 179), reused 399 (delta 144)
Receiving objects: 100% (435/435), 599.52 KiB | 111 KiB/s, done.
Resolving deltas: 100% (179/179), done.

So the second one works. By configuring my 'own' repository in Composer, which refers to the proper Goutte repository, I made it work. However, Composer prefers Packagist packages, so for package "fabpot/goutte" it would still go to Packagist and ignore my own repository.

Conclusion: can you update your Packagist account to have the repository refer to git://github.com/fabpot/Goutte.git and not git://github.com/fabpot/goutte.git?

Non-body HTML tags

I'm using Goutte through Mink to run some functional tests on a website. One of these tests consists of parsing the "robots" meta tag in order to check that it has the correct value on each kind of page the website has.

Is there a way to retrieve non-body HTML tags through Goutte?

Thanks in advance! :)

Greetings!
Christian.

Latest goutte.phar: curl: error setting certificate verify locations

See here:
http://stackoverflow.com/questions/13288680/error-setting-certificate-verify-locations-vagrant-guzzle-curl

Guzzle\Http\Exception\CurlException: [curl] 77: error setting certificate verify locations:
CAfile: phar:///home/samuel/code/retail/autoTests/GoutteTests/goutte.phar/vendor/guzzle/guzzle/src/Guzzle/Http/Resources/cacert.pem
CApath: /etc/ssl/certs

I'm having the same problem with the latest goutte.phar.
The older version I still had lying around works fine though. If you tell me how, I tell you which version that is.

Update Guzzle Dependency

The Goutte package on packagist.org depends on guzzle/guzzle 2.6.* but the current version of Guzzle is way ahead of that. The 2.6.6 version has a problem with some versions of cURL where multiple headers with the same name are concatenated with commas instead of semi-colons. The latest version of Guzzle fixes this problem by manually concatenating headers instead of relying on cURL to do it.
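As a point of reference, the pairs in a Cookie request header are separated by a semicolon and a space (RFC 6265). A minimal sketch of joining them by hand in plain PHP follows; cookieHeader is a hypothetical helper, not Guzzle's code, and it performs no escaping of names or values.

```php
<?php
// Hypothetical helper: joins name/value pairs into one Cookie header value.
function cookieHeader(array $cookies): string
{
    $pairs = [];
    foreach ($cookies as $name => $value) {
        $pairs[] = $name . '=' . $value;
    }

    return implode('; ', $pairs);
}

echo 'Cookie: ', cookieHeader(['foo' => 'FOO', 'bar' => 'BAR']), "\n";
```

Joined this way, a server-side parser sees two separate cookies instead of a single value that contains a comma.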

Disabling ssl certification check

Hi,

According to http://mink.behat.org/#gouttedriver the client can be initiated with an array of zend_http_client options. Since I don't have valid certificates on all machines, I have tried to disable certificate verification as follows, with no success:

$gouttedriver = new \Behat\Mink\Driver\GoutteDriver(
    new \Behat\Mink\Driver\Goutte\Client(array(
        'curloptions' => array(
            CURLOPT_SSL_VERIFYPEER => false,
            CURLOPT_CERTINFO => false
        ),
    )),
    array()
);

Is there a way of doing it that works?

Invalid namespace for use Zend\HTTP\Response\Response as ZendResponse;

Hi Fabien,

I'm trying Goutte inside the Symfony2 sandbox. I've installed the ZF2 dependencies, which include the following folders:

Zend/
  Http/
    Response/
    Response.php

I've got the following error with Goutte when calling Goutte\Client::request().

Catchable Fatal Error: Argument 1 passed to Goutte\Client::createResponse() must be an instance of Zend\HTTP\Response\Response, instance of Zend\HTTP\Response given, called in /Users/Hugo/Sites/Symfony_2_0/Sandbox/src/vendor/goutte/src/Goutte/Client.php on line 48 and defined in /Users/Hugo/Sites/Symfony_2_0/Sandbox/src/vendor/goutte/src/Goutte/Client.php line 76

In fact, the Zend\HTTP\Response\Response class doesn't exist. Zend\HTTP\Response is the correct one.

I've submitted a pull request that fixes the issue.

Hugo.

Goutte changes space to %2B

Form: <input name="a b" value="3" type="text" />

Query String in GET request: a%2Bb=3

Should be: a+b=3

Code:

$crawler = $client->request('GET', 'http://.../form.php');
$form = $crawler->selectButton('s')->form();
$crawler = $client->submit($form);
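
One way to get a%2Bb from the field name "a b" is to URL-encode it twice. That the name is double-encoded somewhere between the form and the request is an assumption, not something confirmed in Goutte; the sketch below only shows that PHP's own functions produce exactly these two strings. The function names are hypothetical.

```php
<?php
// Encode one form field the way a query string expects it:
// urlencode() turns a space into "+".
function encodeField($name, $value)
{
    return urlencode($name) . '=' . urlencode($value);
}

// Hypothetical reproduction of the symptom: the name is encoded a second time,
// so the "+" from the first pass becomes "%2B".
function encodeFieldTwice($name, $value)
{
    return urlencode(urlencode($name)) . '=' . urlencode($value);
}

echo encodeField('a b', '3'), "\n";                 // a+b=3
echo encodeFieldTwice('a b', '3'), "\n";            // a%2Bb=3
echo http_build_query(array('a b' => '3')), "\n";   // a+b=3
```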

Any way to use this tool with PHP 5.3.1?

Hello,

We are stuck on PHP 5.3.1 and would love to use this tool. Is there any way we can get around this problem? Is there an older version of this tool? Is there a good alternative that would work on the version we are stuck on?

Thank You

Notice: Undefined index: name in vendor/Goutte/src/Goutte/Client.php:100

This error occurs only on my form with upload fields, so I guess it's related to the upload functionality.

Here goes the stacktrace:

Stack trace:
      #0 vendor/Goutte/src/Goutte/Client.php(100): Behat\Behat\Definition\Annotation\Definition->errorHandler(8, 'Undefined index...', '/Library/WebSer...', 100, Array)
      #1 vendor/Goutte/src/Goutte/Client.php(60): Goutte\Client->createClient(Object(Symfony\Component\BrowserKit\Request))
      #2 vendor/symfony/src/Symfony/Component/BrowserKit/Client.php(262): Goutte\Client->doRequest(Object(Symfony\Component\BrowserKit\Request))
      #3 vendor/symfony/src/Symfony/Component/BrowserKit/Client.php(222): Symfony\Component\BrowserKit\Client->request('POST', 'http://localhos...', Array, Array)
      #4 vendor/Behat/Mink/src/Behat/Mink/Driver/GoutteDriver.php(343): Symfony\Component\BrowserKit\Client->submit(Object(Symfony\Component\DomCrawler\Form))
      #5 vendor/Behat/Mink/src/Behat/Mink/Element/NodeElement.php(104): Behat\Mink\Driver\GoutteDriver->click('(//html/.//inpu...')
      #6 vendor/Behat/Mink/src/Behat/Mink/Element/NodeElement.php(112): Behat\Mink\Element\NodeElement->click()
      #7 vendor/Behat/Mink/src/Behat/Mink/Element/TraversableElement.php(125): Behat\Mink\Element\NodeElement->press()
      #8 vendor/Behat/Mink/src/Behat/Mink/Behat/Context/MinkContext.php(194): Behat\Mink\Element\TraversableElement->pressButton('Veranstaltung s...')
      #9 [internal function]: Behat\Mink\Behat\Context\MinkContext->pressButton('Veranstaltung s...')
      #10 vendor/Behat/Behat/src/Behat/Behat/Definition/Annotation/Definition.php(157): call_user_func_array(Array, Array)
      #11 vendor/Behat/Behat/src/Behat/Behat/Tester/StepTester.php(179): Behat\Behat\Definition\Annotation\Definition->run(Object(Ajado\EventHubBundle\Features\Context\FeatureContext), Array)
      #12 vendor/Behat/Behat/src/Behat/Behat/Tester/StepTester.php(155): Behat\Behat\Tester\StepTester->runStepDefinition(Object(Behat\Behat\Definition\Annotation\When))
      #13 vendor/Behat/Behat/src/Behat/Behat/Tester/StepTester.php(119): Behat\Behat\Tester\StepTester->executeStep(Object(Behat\Gherkin\Node\StepNode))
      #14 vendor/Behat/Gherkin/src/Behat/Gherkin/Node/AbstractNode.php(42): Behat\Behat\Tester\StepTester->visit(Object(Behat\Gherkin\Node\StepNode))
      #15 vendor/Behat/Behat/src/Behat/Behat/Tester/ScenarioTester.php(139): Behat\Gherkin\Node\AbstractNode->accept(Object(Behat\Behat\Tester\StepTester))
      #16 vendor/Behat/Behat/src/Behat/Behat/Tester/ScenarioTester.php(87): Behat\Behat\Tester\ScenarioTester->visitStep(Object(Behat\Gherkin\Node\StepNode), Object(Ajado\EventHubBundle\Features\Context\FeatureContext), Array, false)
      #17 vendor/Behat/Gherkin/src/Behat/Gherkin/Node/AbstractNode.php(42): Behat\Behat\Tester\ScenarioTester->visit(Object(Behat\Gherkin\Node\ScenarioNode))
      #18 vendor/Behat/Behat/src/Behat/Behat/Tester/FeatureTester.php(81): Behat\Gherkin\Node\AbstractNode->accept(Object(Behat\Behat\Tester\ScenarioTester))
      #19 vendor/Behat/Gherkin/src/Behat/Gherkin/Node/AbstractNode.php(42): Behat\Behat\Tester\FeatureTester->visit(Object(Behat\Gherkin\Node\FeatureNode))
      #20 vendor/Behat/Behat/src/Behat/Behat/Console/Command/BehatCommand.php(108): Behat\Gherkin\Node\AbstractNode->accept(Object(Behat\Behat\Tester\FeatureTester))
      #21 vendor/symfony/src/Symfony/Component/Console/Command/Command.php(214): Behat\Behat\Console\Command\BehatCommand->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
      #22 vendor/symfony/src/Symfony/Component/Console/Application.php(194): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
      #23 vendor/symfony/src/Symfony/Bundle/FrameworkBundle/Console/Application.php(75): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
      #24 vendor/symfony/src/Symfony/Component/Console/Application.php(118): Symfony\Bundle\FrameworkBundle\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
      #25 app/console(16): Symfony\Component\Console\Application->run()
      #26 {main}

the html is here: https://gist.github.com/1103686

Any ideas why this might happen?

Checkboxes array issue

Imagine we have a complex form which contains following part:

<input name="foo[]" id="foo-10" value="10" type="checkbox">
<input name="foo[]" id="foo-11" value="11" type="checkbox">

<input name="bar[]" id="bar-11" value="11" type="checkbox">
<input name="bar[]" id="bar-12" value="12" type="checkbox">

This part is submitted as an array:

foo[]=10&foo[]=11&bar[]=11&bar[]=12

And on the server side:

// $_POST['foo']
array(0 => '10', 1 => '11')

But when I create the form with Goutte, I get one 'foo[]' checkbox with the possible value 11 and one 'bar[]' checkbox with the possible value 12.

So I can't check both foo[] checkboxes, or only the first one (the same goes for bar[]).

Is there a workaround, or is this a bug? I'm not sure which component produces this behavior (Goutte, Guzzle, or the Crawler), sorry.
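
The symptom matches what happens when fields are stored in a PHP array keyed by their name: the last 'foo[]' overwrites the first. That this is the actual cause inside the form handling is an assumption. The sketch below only contrasts keying by name with keeping every field, using the values from the form above; all function names are hypothetical.

```php
<?php
// The four checkboxes from the form, as (name, value) pairs.
$fields = array(
    array('foo[]', '10'),
    array('foo[]', '11'),
    array('bar[]', '11'),
    array('bar[]', '12'),
);

// Keying by name: a later field with the same name replaces the earlier one.
function indexByName(array $fields)
{
    $byName = array();
    foreach ($fields as $field) {
        $byName[$field[0]] = $field[1];
    }

    return $byName;
}

// Keeping every pair: all four values survive in the query string.
function buildQuery(array $fields)
{
    $pairs = array();
    foreach ($fields as $field) {
        $pairs[] = $field[0] . '=' . urlencode($field[1]);
    }

    return implode('&', $pairs);
}

// Parse a query string the way PHP does when filling $_POST.
function parseQuery($query)
{
    parse_str($query, $result);

    return $result;
}

print_r(indexByName($fields));            // only foo[] => 11 and bar[] => 12 remain
print_r(parseQuery(buildQuery($fields))); // foo => (10, 11), bar => (11, 12)
```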

It looks like the cacert.pem file is missing from the .phar

I just got an error:
" copy(phar:///var/www/goutte/goutte.phar/vendor/guzzle/http/Guzzle/Http/Resources/cacert.pem): failed to open stream: phar error: "vendor/guzzle/http/Guzzle/Http/Resources/cacert.pem" is not a file in phar "/var/www/goutte/goutte.phar""
So I extracted the phar with PHP's extractTo method. I was able to navigate to /vendor/guzzle/http/Guzzle/Http. Inside the Http directory there are some directories and files, but no Resources directory as expected. I downloaded the newest phar just a few hours ago ("Latest commit: 2f51047").
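
The same check can be done without extracting the archive, by asking the phar:// stream wrapper whether the entry exists. This is a minimal sketch; the function name `pharContains` is hypothetical, and the paths in the usage comment are the ones from the error message above.

```php
<?php
// Return true if $innerPath exists as an entry inside the phar at $pharPath.
function pharContains($pharPath, $innerPath)
{
    return file_exists('phar://' . $pharPath . '/' . ltrim($innerPath, '/'));
}

// Usage with the paths from the error message:
// var_dump(pharContains(
//     '/var/www/goutte/goutte.phar',
//     'vendor/guzzle/http/Guzzle/Http/Resources/cacert.pem'
// ));
```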

Changelog?

The current changelog is very out of date. Do you plan on updating it? If not, perhaps just remove it?
