
ganon's People

Contributors

nielsad

ganon's Issues

getPlainText html_entity_decode encoding error

PHP versions before 5.4 (and even later versions, if you believe the comments at php.net) default to ISO-8859-1, not UTF-8, in the html_entity_decode(...) function. This creates problems in getPlainText(), because it does not take the document's encoding into account.

What will reproduce the problem?
Just parse something in an encoding other than the default of *YOUR* html_entity_decode(...) function and the problems should be easy to see.

What is the expected output? What do you see instead?
Expected output: correctly converted HTML entities.
Instead I get an empty string, like " " => "",
but I would expect " " => " ".


Which version are you using?
Ganon single file PHP5 (rev. #78)

Please provide any additional information below.
It can easily be fixed by changing the return statement of getPlainText() from

return preg_replace('`\s+`', ' ', html_entity_decode($this->toString(true, true, true), ENT_QUOTES));

to

return preg_replace('`\s+`', ' ', html_entity_decode($this->toString(true, true, true), ENT_QUOTES, $this->getEncoding()));

Original issue reported on code.google.com by [email protected] on 19 Jan 2013 at 1:57

Unable to set the disabled attribute of an input tag in a valid way

Setting the disabled attribute of an input tag as follows

$node->disabled = 'disabled';

results in the output: <input disabled="">
NOTE: the value is not inserted.

whereas doing the following:

$node->disabled = 1;

results in: <input disabled="1">
NOTE: the value is inserted.

The problem is the '$this->attributes[$a] !== $a' check at line 337 within the 
HTML_Node::toString_attributes() method in gan_node_html.php 

This check prevents the value from being written when it equals the attribute name, but in the case of disabled="disabled" the value is required.
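For comparison, a minimal sketch of attribute serialization that always writes the value, so that `disabled="disabled"` survives (and `name="name"` too). `render_attributes()` is a hypothetical standalone helper, not ganon's `toString_attributes()`:

```php
<?php
// Hypothetical helper: serialize an attribute map, always emitting the value.
// There is no "name !== value" check, so disabled="disabled" is kept intact.
function render_attributes(array $attributes)
{
    $out = '';
    foreach ($attributes as $name => $value) {
        $out .= ' ' . $name . '="' . htmlspecialchars((string) $value, ENT_QUOTES) . '"';
    }
    return $out;
}

echo '<input' . render_attributes(array('disabled' => 'disabled')) . '>', "\n"; // <input disabled="disabled">
```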

This happens in rev72

Original issue reported on code.google.com by [email protected] on 20 Jul 2012 at 4:04

Inserted (addChild) elements are not in future queries

What will reproduce the problem?
If you insert an element (such as a new input field) and then run a select that should match it (such as "input"), the new child doesn't show up in the select results.

What is the expected output? What do you see instead?
The inserted item should be included in future results.

Which version are you using?
 * Ganon single file version - PHP5+ version
 * Generated on 24 Mar 2012

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 11 Sep 2012 at 9:02

Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM

What steps will reproduce the problem?
1. Running the youtube-sample on a host with PHP Version 5.2.6-1+lenny8

What is the expected output? What do you see instead?
I get an error:
Parse error: syntax error, unexpected T_PAAMAYIM_NEKUDOTAYIM in 
/var/www/test/ganon.php  on line 1053

What version of the product are you using? On what operating system?
Ganon Rev28, Debian Lenny



Original issue reported on code.google.com by [email protected] on 20 Jul 2010 at 12:48

Memory Leaks, and solution for...

Hi!

First of all, thank you for your ganon library... it's great code (very useful for me).

Well, I am writing some web scrapers with ganon, and I have memory leak problems with long-running scripts.

Eventually Linux kills my script because it exhausts all the available memory (even the swap partition) and leaves my server running very slowly.

I spent a few days investigating the cause of these memory leaks, and these are my conclusions:

First, a small test that shows the memory leak when running ganon:

<?php
include('lib/ganon.php');

set_time_limit(3600); //For slow servers...

function ParseTest($TheHtml){

  // Do several parses to check that memory is released,
  // without leaving the function scope:
  for ($f = 1; $f <= 20; $f++){

   $test_html = new HTML_Parser_HTML5($TheHtml);
   $span=$test_html->root->select('span[class="IsThis"]',0);

   //Test if the select works...
   if (!$span) echo 'Select Error...';
  }//for f

}//ParseTest



echo '<pre>';
echo 'Php Version:'.phpversion().'<br><br>';

//Build an html for testing
$test_string=str_repeat('<div><span class="NOIsThis">Foo</span></div><div><span 
class="IsThis">Bar</span></div>',40);

//Loop for testing memory consumption
for ($i = 1; $i <= 20; $i++){
 ParseTest($test_string);
 echo sprintf( '>>>>>>>>>> Iteration: %4s, Memory Usage: %8s <br>',
               $i,number_format(memory_get_usage()) );


}
echo '</pre>';
?>


If I run the test with the original ganon (Ganon single file PHP5, rev. #72), the script stops because it consumes all the memory available to PHP (in my case, I think, 128 MB).

This is the output of the test:

Php Version:5.2.14

>>>>>>>>>> Iteration:    1, Memory Usage: 9,277,760 
>>>>>>>>>> Iteration:    2, Memory Usage: 18,021,192 
>>>>>>>>>> Iteration:    3, Memory Usage: 26,567,912 
>>>>>>>>>> Iteration:    4, Memory Usage: 35,508,408 
>>>>>>>>>> Iteration:    5, Memory Usage: 44,055,968 
>>>>>>>>>> Iteration:    6, Memory Usage: 52,602,616 
>>>>>>>>>> Iteration:    7, Memory Usage: 61,935,256 
>>>>>>>>>> Iteration:    8, Memory Usage: 70,482,696 
>>>>>>>>>> Iteration:    9, Memory Usage: 79,028,872 
>>>>>>>>>> Iteration:   10, Memory Usage: 87,575,696 
>>>>>>>>>> Iteration:   11, Memory Usage: 96,122,120 
>>>>>>>>>> Iteration:   12, Memory Usage: 104,669,872 
>>>>>>>>>> Iteration:   13, Memory Usage: 113,216,320 
>>>>>>>>>> Iteration:   14, Memory Usage: 123,336,072 
>>>>>>>>>> Iteration:   15, Memory Usage: 131,883,464 


Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to 
allocate 3441 bytes) in \lib\ganon.php on line 247


Well, as I said, I spent a few days investigating this problem, and I found two things that cause these memory leaks:

- The base class of any extended class must also be destroyed (when necessary) by calling parent::__destruct();
- Functions created with create_function() can never be destroyed, and Ganon creates a lot of them.
  Reference: the comments in the PHP manual at http://php.net/manual/en/function.create-function.php

It is always better not to use generated code.
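A minimal sketch of the kind of change described above: building a filter with an anonymous function (available since PHP 5.3) instead of `create_function()`, which defines a new named function on every call that is never freed, and which was removed in PHP 8.0. `filter_by_attribute()` and the plain-array node shape are assumptions for illustration, not ganon's internals:

```php
<?php
// Hypothetical example: nodes are plain arrays with an 'attributes' map.
// The closure is an ordinary object, so it is released when it goes out of
// scope, unlike functions generated by create_function().
function filter_by_attribute(array $nodes, $attribute, $value)
{
    $matches = function ($node) use ($attribute, $value) {
        return isset($node['attributes'][$attribute])
            && $node['attributes'][$attribute] === $value;
    };
    // array_values() re-indexes the filtered result from 0.
    return array_values(array_filter($nodes, $matches));
}

$nodes = array(
    array('tag' => 'span', 'attributes' => array('class' => 'NOIsThis')),
    array('tag' => 'span', 'attributes' => array('class' => 'IsThis')),
);
$found = filter_by_attribute($nodes, 'class', 'IsThis');
echo count($found), "\n"; // 1
```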

I made all these modifications to ganon.php, creating nml_ganon.php. Using it, this is the result of the previous test:

Php Version:5.2.14

>>>>>>>>>> Iteration:    1, Memory Usage:  600,712 
>>>>>>>>>> Iteration:    2, Memory Usage:  600,712 
>>>>>>>>>> Iteration:    3, Memory Usage:  600,712 
>>>>>>>>>> Iteration:    4, Memory Usage:  600,712 
>>>>>>>>>> Iteration:    5, Memory Usage:  600,712 
>>>>>>>>>> Iteration:    6, Memory Usage:  600,720 
>>>>>>>>>> Iteration:    7, Memory Usage:  600,720 
>>>>>>>>>> Iteration:    8, Memory Usage:  600,720 
>>>>>>>>>> Iteration:    9, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   10, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   11, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   12, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   13, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   14, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   15, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   16, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   17, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   18, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   19, Memory Usage:  600,720 
>>>>>>>>>> Iteration:   20, Memory Usage:  600,720 

Ok, no memory leaks...

NOTE:
For now I have changed only one of the callback functions; there are other create_function() calls in the getChildrenByAttribute function of the HTML_Node class, so the patch is not complete yet (I may finish it in a few days).

I don't know whether I can attach a file here (I will try); in any case, I have made the file available on one of my servers at:

http://trucomania.org/inaki/nml_ganon_rev72.zip

I hope you will consider this for your next revision. I will change the rest of the callbacks when I find time.

Thanks again for your great library! 

Original issue reported on code.google.com by [email protected] on 20 Sep 2012 at 9:42

wrap() sets wrapped element as last child

What will reproduce the problem?
Wrap an element that isn't the last among its siblings with another element.

What is the expected output? What do you see instead?
Expected: Element is wrapped in another element; nothing else.
Actual: the element is wrapped in the new element, but the new parent (with the element inside it) is moved to the end, becoming the last child of the original parent element.
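To make the expected behaviour concrete: the wrapper should take over the wrapped element's position among its siblings. A minimal sketch on a plain array of siblings (an assumption; ganon's node objects are not used, and `wrap_at()` is hypothetical):

```php
<?php
// Hypothetical model: $siblings is a list of child nodes (plain strings here).
// Replace the child at $index with a wrapper that contains it, so the wrapper
// keeps the original position instead of being appended at the end.
function wrap_at(array $siblings, $index, $wrapperTag)
{
    $wrapper = array('tag' => $wrapperTag, 'children' => array($siblings[$index]));
    // The replacement list holds one element: the wrapper array itself.
    array_splice($siblings, $index, 1, array($wrapper));
    return $siblings;
}

$children = wrap_at(array('a', 'b', 'c'), 0, 'div');
// $children[0] is the div wrapping 'a'; 'b' and 'c' keep their order.
```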

Which version are you using?
Rev. 72, PHP 5.3.6

Original issue reported on code.google.com by [email protected] on 27 Sep 2012 at 2:30

Not selecting by class properly

What will reproduce the problem?

<?php
include 'ganon.php';

$html = '<html><head><body><div class="special-post">This is a special 
post</div></body></html>';
$dom = str_get_dom($html);
$special = $dom('.special');
echo $special[0]->getPlainText();


What is the expected output? What do you see instead?

An exception: $special[0] shouldn't be set, because the document doesn't have any element with the class "special". Instead I get the string "This is a special post".
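For reference, CSS class selectors match whole whitespace-separated tokens of the `class` attribute, so `.special` should not match `class="special-post"`, while `.text` should match `class="text test"`. A minimal sketch of that rule (`has_class()` is a hypothetical helper, not part of ganon):

```php
<?php
// Hypothetical helper: does the class attribute contain $class as a whole token?
function has_class($classAttribute, $class)
{
    $tokens = preg_split('/\s+/', trim($classAttribute), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($class, $tokens, true);
}

var_dump(has_class('special-post', 'special')); // bool(false)
var_dump(has_class('text test', 'text'));       // bool(true)
var_dump(has_class('NOIsThis', 'IsThis'));      // bool(false)
```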


Which version are you using?
r78

Please provide any additional information below.

I am evaluating the library and found this serious bug in my very first test. I like the idea and the way ganon works, and I hope it gets fixed.

Thanks for your work

Original issue reported on code.google.com by [email protected] on 29 Oct 2012 at 12:24

When input name attribute equals "name" the value is removed

What will reproduce the problem?

    $result = str_get_dom('<input name="name"/>');
    echo $result->toString(true, true, 1);

What is the expected output? What do you see instead?

    Expected output is:
        <input name="name"/>
    Actual output is:
        <input name />

Which version are you using?

    Not sure, but the top of the file says:
    * Ganon single file version - PHP5+ version
    * Generated on 20 Oct 2012

Original issue reported on code.google.com by [email protected] on 17 Apr 2013 at 9:43

setInnerText content is not query-able

Hello,

I'm curious whether the following behavior is supposed to work, or, if not, whether there is some kind of workaround. Thanks!

What will reproduce the problem?

$html = str_get_dom('<div id="a"></div>');
$html('#a', 0)->setInnerText('<div id="b"></div>');
$html('#b', 0)->setInnerText('hello');
echo $html;

What is the expected output?
<div id="a"><div id="b">hello</div></div>

What do you see instead?
<div id="a"><div id="b"></div></div>


Original issue reported on code.google.com by [email protected] on 13 Apr 2011 at 3:18

Curl support

This is not exactly an issue, but I'd like to use this beautiful parser with cURL. How can I use the following function with your class?

function get_web_page( $url )
{
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle compressed
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
    );

    $ch      = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
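One possible way to connect the two, sketched under assumptions: check the array returned by `get_web_page()` for a cURL error and a 200 status (`http_code` is the key `curl_getinfo()` uses), then pass the body to ganon's `str_get_dom()`. `html_from_response()` is a hypothetical helper:

```php
<?php
// Hypothetical helper: extract the HTML body from the array returned by
// get_web_page(), or return null on a cURL error or a non-200 status.
function html_from_response(array $page)
{
    if ($page['errno'] !== 0) {
        return null; // transport-level failure (DNS, timeout, ...)
    }
    if (!isset($page['http_code']) || (int) $page['http_code'] !== 200) {
        return null; // the server answered, but not with 200 OK
    }
    return is_string($page['content']) ? $page['content'] : null;
}

// Usage with ganon (not run here):
// $html = html_from_response(get_web_page('http://example.com/'));
// if ($html !== null) { $dom = str_get_dom($html); }
```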

Original issue reported on code.google.com by [email protected] on 24 Oct 2012 at 10:34

Problem to understand how Ganon works

Hi @all!

So far, I've used phpQuery in my projects. Now I have found Ganon, and I want to use it in the future. But I'm having trouble understanding how Ganon works.


This is what I did in phpQuery: load some HTML into my template and change some attributes.

$index = phpQuery::newDocumentHTML('HTML-Code of the entire page');
$content = phpQuery::newDocumentHTML('Some HTML-Code who has to be in index');

phpQuery::selectDocument($index);
pq('#content')->append($content);
pq('#content a')->attr("href", "chmod")->text("Next");
die ($index);


And now I've tried to do this with Ganon:
$index = str_get_dom('HTML-Code of the entire page');
$content = str_get_dom('Some HTML-Code who has to be in index');

$index->select('#content', 0)->setInnerText($content);


And here this error comes: "Fatal error: Cannot use object of type HTML_Node as 
array"

Could anybody help me with the correct code for this: loading some HTML into my template and changing some attributes?
That would be great :)

Regards, Steff

Original issue reported on code.google.com by [email protected] on 19 May 2013 at 2:54

Auto charset conversion for getPlainText()?

I'm scraping a web page in ISO-8859-1, but my scripts work in UTF-8 (PHP code, MySQL databases, etc.), so when I get the text of a node, getPlainText() returns it in ISO-8859-1 (the charset of the loaded HTML) and I can't do equality comparisons in my code.

I solved this (for this particular case) by converting to UTF-8 inside the getPlainText() implementation:

function getPlainText() {
    return preg_replace('`\s+`', ' ', utf8_encode( html_entity_decode($this->toString(true, true, true), ENT_QUOTES) ));
}

But I'm wondering: what about automatically detecting the encoding of the loaded HTML, plus an option to set the charset of the strings returned by getPlainText()?

It's just an idea O:)
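A sketch of the conversion step with an explicit source charset, using `iconv()` instead of `utf8_encode()`, which only handles ISO-8859-1 and is deprecated as of PHP 8.2. `strip_tags()` stands in here for ganon's `toString(true, true, true)`, `plain_text_utf8()` is hypothetical, and passing the charset in by hand is an assumption; detecting it from the page is not shown:

```php
<?php
// Decode entities in the page's own charset, then convert to UTF-8 and
// collapse whitespace, mirroring what getPlainText() does.
function plain_text_utf8($html, $sourceCharset)
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, $sourceCharset);
    $text = iconv($sourceCharset, 'UTF-8', $text);
    // The /u modifier is safe here because $text is now valid UTF-8.
    return preg_replace('/\s+/u', ' ', $text);
}

echo plain_text_utf8("<b>caf\xE9</b>  &amp; more", 'ISO-8859-1'), "\n"; // café & more
```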

Original issue reported on code.google.com by [email protected] on 6 Sep 2012 at 7:50

Doesn't work with PHP 5.4

What will reproduce the problem?
Running it with 5.4 :)

Which version are you using?
Latest Rev from Feb 16 with PHP 5.4

Fatal error: 'break' operator with non-constant operand is no longer supported 
in C:\work\php\someproject\libs\ganon.php on line 1609 
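For context: since PHP 5.4, `break` and `continue` accept only a literal level such as `break 2`; `break $n` with a variable is a compile-time fatal error. A sketch of the usual rewrite (illustrative only; `find_cell()` is not the code at ganon.php line 1609):

```php
<?php
// Find the first cell equal to $needle in a 2D array.
// PHP 5.4+ rejects `break $levels;`, so both loops are left with a literal `break 2`.
function find_cell(array $rows, $needle)
{
    $found = null;
    foreach ($rows as $r => $row) {
        foreach ($row as $c => $cell) {
            if ($cell === $needle) {
                $found = array($r, $c);
                break 2; // literal level: accepted by every PHP version
            }
        }
    }
    return $found;
}

$pos = find_cell(array(array(1, 2), array(3, 4)), 3);
```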

Original issue reported on code.google.com by [email protected] on 12 Mar 2012 at 12:55

Incorrect parsing for children of children

The following functions return all children in the DOM object. However, when there is text between nested tags, a child sometimes seems to be missed. For example, with <div>Hello<span>world</span></div> the span data is missing. Also, the ability to dump the DOM into a JSON object, as below, would be a nice feature.

function get_all_children($el) {
    $output = array();
    $row = array(
        'name' => $el->getTag(),
        'raw' => $el->getInnerText()
    );
    for ($i = 0; $i < $el->childCount(); $i++) {
        $row['children'] = get_all_children($el->getChild($i));
    }
    foreach($el->attributes as $attr => $value) {
        $row['attribs'] = array(
            $attr => $value
        );
    }
    array_push($output, $row);
    return $output;
}

function get_dom_array($html, $selector) {
    $output = array();
    foreach($html($selector) as $el) {
        $row = array(
            'name' => $el->getTag(),
            'raw' => $el->getInnerText()
        );
        for ($i = 0; $i < $el->childCount(); $i++) {
            $row['children'] = get_all_children($el->getChild($i));
        }
        foreach($el->attributes as $attr => $value) {
            $row['attribs'] = array(
                $attr => $value
            );
        }
        array_push($output, $row);
    }
    return $output;
}

$html = str_get_dom('<html><body><div>Hello World</div></body></html>');
$dom_array = get_dom_array($html, 'div');
echo json_encode($dom_array);
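One thing worth checking before blaming the parser: inside the loops above, `$row['children'] = ...` and `$row['attribs'] = array(...)` overwrite the previous value on every iteration, so only the last child and the last attribute survive. A minimal sketch of the accumulating version, on a plain nested array instead of ganon nodes (an assumption, so that it runs without the library; `to_tree()` is hypothetical):

```php
<?php
// Hypothetical node shape: array('tag' => ..., 'attributes' => [...], 'children' => [...]).
// Append to the children and attribute lists instead of overwriting them.
function to_tree(array $node)
{
    $row = array('name' => $node['tag'], 'children' => array(), 'attribs' => array());
    foreach ($node['children'] as $child) {
        $row['children'][] = to_tree($child); // append, do not overwrite
    }
    foreach ($node['attributes'] as $attr => $value) {
        $row['attribs'][$attr] = $value;      // one entry per attribute
    }
    return $row;
}

$div = array('tag' => 'div', 'attributes' => array('id' => 'a', 'class' => 'b'), 'children' => array(
    array('tag' => 'text', 'attributes' => array(), 'children' => array()),
    array('tag' => 'span', 'attributes' => array(), 'children' => array()),
));
$tree = to_tree($div);
echo json_encode($tree), "\n";
```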

Original issue reported on code.google.com by [email protected] on 18 Oct 2012 at 1:47

typo in "filter_contains" function name

There is a typo in the packed version of ganon.php on line 1860:

protected function filter_containts($text) {

should be

protected function filter_contains($text) {

Original issue reported on code.google.com by [email protected] on 11 Nov 2010 at 12:15

problems getting text from SPAN

The following code echoes nothing. It should echo some prices.

$html = file_get_dom('http://www.libertysilver.se/kopa/guldtackor');
foreach($html->select('div.productBox') as $product){
   echo $product->select('span.productUnitSellPrice span', 0)->getPlainText() . "<br>";
}

It seems like whenever I have a problem with ganon, it's related to SPAN tags.

I'm using the latest version of ganon.

Thanks for your help :)

Original issue reported on code.google.com by [email protected] on 17 Jun 2012 at 3:40

The parser doesn't work for every URL

ganon.php (html_parser) doesn't work with URLs such as:
http://www.google.it/language_tools


Maximum execution time of 30 seconds exceeded in ganon.php on line 247


PHP Version 5.3.8
last version of ganon.php


Original issue reported on code.google.com by [email protected] on 1 Dec 2011 at 5:29

Does not recognize <!DOCTYPE html> as open HTML tag

What will reproduce the problem?
Trying to get nodes inside the <html> element when the document uses HTML5.

If 'file.html' starts with the HTML5 doctype:
<!DOCTYPE html>
...
</html>

$html_node = $html('html', 0);
echo gettype($html_node);     // RETURNS NULL


However, if the document starts with

<html>
...
</html>

it works as intended



What is the expected output? What do you see instead?


Which version are you using?


Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 5 Dec 2012 at 8:56

What is $node?

This is example code from this wiki page: http://code.google.com/p/ganon/wiki/AccesElements

Please let me know what $node is and how to define it.

Thanks in advance.

    // To use a CSS selector query on a node, you simply use the node as a function.
    // The result will be stored in an array (of nodes).
    $match_array = $node('.myclass');


    // To iterate the result, you can use foreach
    foreach($match_array as $element) {
      echo $element, "<br>\n"; 
    }


    // The above can be shortened to the following
    foreach($node('.myclass') as $element) {
      echo $element, "<br>\n"; 
    }


    // Because $element is also a node, you can also perform a query on that node
    // and nest queries
    foreach($node('.myclass') as $element) {
      foreach($element('.myotherclass') as $new_element) {
        echo $new_element, "<br>\n"; 
      }
    }

    // If you know which element of the array you
    // are going to need, you can pass an index to the function
    $a = $node('a', 2);
    // A negative index will start counting from the end of the array
    $a = $node('a', -1);






Original issue reported on code.google.com by [email protected] on 4 Jan 2013 at 7:26

Wrong class detection in Select

An example:

$test_string = '<div><span class="NOIsThis">Foo</span></div><div><span 
class="IsThis">Bar</span></div>';
$test_html = str_get_dom($test_string);
$spans = $test_html->select('span.IsThis');
echo 'Spans with class IsThis (should be one):'.count($spans);
echo "\r\n";
echo 'This should print Bar: 
'.$test_html->select('span.IsThis',0)->getPlainText();

I want to select the span with class "IsThis", but the query returns the first span (with class "NOIsThis").

I think this is wrong, don't you?

Original issue reported on code.google.com by [email protected] on 10 Sep 2012 at 3:01

0 as text doesn't work

I've been using your excellent DOM parser for a project of mine recently, and 
came across this bug:

In the latest version (r55), consider the following to be part of the HTML 
input:

<b>0</b> zero
<b>1</b> one
<b>2</b> two

The output generated is then:

<b></b> zero
<b>1</b> one
<b>2</b> two

This is because of the following line in parse_text():
if ($this->status['text']) {
which needs to be
if ($this->status['text'] !== "") {

Because "0" evaluates to false in PHP, the text content of the <b> tag never gets saved.
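A quick standalone illustration of the difference between the two conditions (the helper names are hypothetical):

```php
<?php
// "0" is the only non-empty string PHP treats as false in a boolean context,
// which is why `if ($text)` drops it while `$text !== ""` keeps it.
function has_text_loose($text)  { return (bool) $text; }
function has_text_strict($text) { return $text !== ''; }

var_dump(has_text_loose('0'));  // bool(false)
var_dump(has_text_strict('0')); // bool(true)
```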

Original issue reported on code.google.com by [email protected] on 30 Mar 2011 at 3:53

Fatal error: Call to undefined function reg_replace() on line 1509

What will reproduce the problem?

- Using the function removeClass();

What is the expected output? What do you see instead?

 - Fatal error:  Call to undefined function reg_replace() on line 1509

Which version are you using?

 - Generated on 20 Oct 2012

Please provide any additional information below.

It should be 'preg_replace'.
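For reference, a standalone sketch of removing a single class token with `preg_replace()`. `remove_class_token()` is hypothetical and not ganon's `removeClass()`; it only matches whole tokens, so `special-post` is left alone when removing `special`:

```php
<?php
// Remove every whole-token occurrence of $class from a class attribute value,
// then normalise the remaining whitespace.
function remove_class_token($classAttribute, $class)
{
    // Token must start the string or follow whitespace, and end the string
    // or precede whitespace (checked with a lookahead so it is not consumed).
    $pattern = '/(^|\s)' . preg_quote($class, '/') . '(?=\s|$)/';
    $result = preg_replace($pattern, ' ', $classAttribute);
    return trim(preg_replace('/\s+/', ' ', $result));
}

echo remove_class_token('a b c', 'b'), "\n"; // a c
```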

Original issue reported on code.google.com by [email protected] on 31 Jul 2013 at 12:53

Attribute's quotation type is not preserved

What will reproduce the problem?

<tag attr="value">
<tag attr='value'>

both get output as:

<tag attr="value">

when reconstructing the HTML.

What is the expected output? What do you see instead?

Expected output is to preserve the type of quotes used, single ' or double ". 
This is important with inline/embedded javascript in attributes.
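A minimal sketch of how a serializer could keep the original quote character, assuming the parser recorded it and stores attribute values unescaped (`render_attr()` is hypothetical). Only `&` and the chosen quote need escaping, so JavaScript containing the other quote type stays readable:

```php
<?php
// Hypothetical serializer: write one attribute using the quote character the
// parser saw (' or "). Values are assumed to be stored unescaped, so only '&'
// and the chosen quote character need escaping.
function render_attr($name, $value, $quote = '"')
{
    $escaped = str_replace('&', '&amp;', $value);
    $escaped = str_replace($quote, $quote === '"' ? '&quot;' : '&#39;', $escaped);
    return $name . '=' . $quote . $escaped . $quote;
}

echo render_attr('onclick', 'alert("hi")', "'"), "\n"; // onclick='alert("hi")'
```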

Which version are you using?

Ganon single file PHP5 (rev. #78)

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 19 Mar 2013 at 6:56

Fatal error if object not found

What will reproduce the problem?

Calling getPlainText() on an element that isn't found.


What is the expected output? What do you see instead?

I would like this to return an empty string, false, or something similar instead of causing a fatal error.
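A small guard keeps chained calls from failing fatally in the meantime. A sketch, assuming `select()` with an index returns null when nothing matches (`plain_text_or()` is a hypothetical helper):

```php
<?php
// Hypothetical guard: return the node's plain text, or $default when the
// lookup found nothing (assumed here to be null rather than a node object).
function plain_text_or($node, $default = '')
{
    return is_object($node) ? $node->getPlainText() : $default;
}

// Usage with ganon (not run here):
// $price = plain_text_or($html->select('span.price', 0), 'n/a');
```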

Which version are you using?
rev72

Please provide any additional information below.


Original issue reported on code.google.com by [email protected] on 5 Jun 2012 at 5:04

file_get_dom cannot pass arguments to file_get_contents

Some sites serve content differently (or not at all, *cough* Facebook *cough*) depending on the user_agent string sent in the request headers. As the function stands, users would have to hard-code these extra parameters. If you change the file_get_dom function to the following:

function file_get_dom($file, $return_root = true, $use_include_path = false, 
$context = null) {
    $f = file_get_contents($file, $use_include_path, $context);
    return (($f === false) ? false : str_get_dom($f, $return_root));
}

You could set this on a case-by-case basis like so:

$opts = array('http' => array('method' => 'GET', 'header' => "Accept-language: en\r\n", 'user_agent' => 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.16) Gecko/20110319 Firefox/3.6.16'));
$context = stream_context_create($opts);
$tmp_discussion_html = file_get_dom($some_url, true, false, $context);

Original issue reported on code.google.com by [email protected] on 29 Apr 2011 at 10:45

span with line break seems to inject br tag.

The parser seems to insert a <br> tag whenever I have a span followed by a line break.

Ex: When presented with the following HTML
<span>
1
</span>

I expect the following (when using $html->select('body')):

<span>
1
</span>

Instead all <span> with breaks are replaced with:
<span><br/>
1<br/>
</span><br/>

Using version 72.


Original issue reported on code.google.com by [email protected] on 16 Jul 2012 at 5:44

getChildrenByTag, getChildrenByID, getChildrenByClass, etc. don't work

Hi

These functions in ganon.php:

    function getChildrenByID($id, $recursive = true) {
        return getChildrenByAttribute('id', $id, 'equals', 'total', $recursive);
    }
    function getChildrenByClass($class, $recursive = true) {
        return getChildrenByAttribute('class', $id, 'equals', 'total', $recursive);
    }
    function getChildrenByName($name, $recursive = true) {
        return getChildrenByAttribute('name', $name, 'equals', 'total', $recursive);
    }

return an error, because getChildrenByAttribute() is called without $this->, so PHP looks for a global function instead of the class method.

The calls must be written as:

    function getChildrenByID($id, $recursive = true) {
        return $this->getChildrenByAttribute('id', $id, 'equals', 'total', $recursive);
    }
    function getChildrenByClass($class, $recursive = true) {
        return $this->getChildrenByAttribute('class', $class, 'equals', 'total', $recursive);
    }
    function getChildrenByName($name, $recursive = true) {
        return $this->getChildrenByAttribute('name', $name, 'equals', 'total', $recursive);
    }   

in order to work...

Alternatively, you can drop the create_function() approach and use a normal select instead:

    function getChildrenByAttribute($attribute, $value, $mode = 'equals', $compare = 'total', $recursive = true) {
        return $this->select( sprintf('[%s="%s"]',$attribute,$value) );
    }
    function getChildrenByTag($tag, $compare = 'total', $recursive = true) {
        return $this->select( $tag );
    }
    function getChildrenByID($id, $recursive = true) {
        return $this->select( sprintf('[id="%s"]',$id) );
    }
    function getChildrenByClass($class, $recursive = true) {
        return $this->select( sprintf('[class="%s"]',$class) );
    }
    function getChildrenByName($name, $recursive = true) {
        return $this->select( sprintf('[name="%s"]',$name) );
    }



Original issue reported on code.google.com by [email protected] on 20 Sep 2012 at 11:37

Maximum execution time of 30 seconds

CODE: $html = file_get_dom('http://www.wikieasy.it');

ERROR: Fatal error: Maximum execution time of 30 seconds exceeded in 
C:\Inetpub\wwwroot\uno\ganon.php on line 238

VERSION:  Last version for php5


It gives me the same error with other sites, such as:
http://www.univpm.it

Thanks

Original issue reported on code.google.com by [email protected] on 10 Apr 2013 at 2:57

memory exhausted

What will reproduce the problem?


What is the expected output? What do you see instead?


Which version are you using?

Generated on 20 Oct 2012

Please provide any additional information below.

array(4) {
  ["type"]=>
  int(1)
  ["message"]=>
  string(81) "Allowed memory size of 134217728 bytes exhausted (tried to allocate 114078 bytes)"
  ["file"]=>
  string(69) "libs/pquery/ganon.php"
  ["line"]=>
  int(238)
}

Original issue reported on code.google.com by [email protected] on 31 Jul 2013 at 3:08

Fatal error with deleteChild()

Thank you for this code! It is great!

Working with it, I found a little issue:

When I want to get an element without children:

        $p = str_get_dom($html);

        $b = $p('*',0);

        //Iterate over childnodes
        for ($i = 1; $i < $b->childCount(); $i++) {
          $b->deleteChild($i);
        }

I get this:

Notice: Undefined offset: 0 in 
/Applications/XAMPP/xamppfiles/htdocs/lubith/v2/version/2.0.0/library/html/ganon
/ganon.php on line 1302

Fatal error: Call to a member function delete() on a non-object in 
/Applications/XAMPP/xamppfiles/htdocs/lubith/v2/version/2.0.0/library/html/ganon
/ganon.php on line 1302

I changed line 1302 from

    $this->children[$child]->delete();

to

    if(isset($this->children[$child])) $this->children[$child]->delete();

Now it is working.
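Separately, the loop itself deserves a look. Assuming `deleteChild()` re-indexes the remaining children, deleting by ascending index skips every other child, because each removal shifts the later ones down. Walking from the last index to the first avoids that. A sketch on a plain array, with the hypothetical `delete_child()` standing in for the real method:

```php
<?php
// Stand-in for deleteChild(): remove one child and re-index the list.
function delete_child(array &$children, $index)
{
    array_splice($children, $index, 1);
}

// Ascending loop, as in the report: each deletion shifts later children down.
function delete_ascending(array $children)
{
    for ($i = 1; $i < count($children); $i++) {
        delete_child($children, $i);
    }
    return $children;
}

// Descending loop: removing an element never moves an index still to be visited.
function delete_all_but_first(array $children)
{
    for ($i = count($children) - 1; $i >= 1; $i--) {
        delete_child($children, $i);
    }
    return $children;
}

$skipped = delete_ascending(array('a', 'b', 'c', 'd'));     // 'c' survives
$left    = delete_all_but_first(array('a', 'b', 'c', 'd')); // only 'a' remains
```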


Original issue reported on code.google.com by [email protected] on 26 Mar 2013 at 3:03

Some sites not being loaded by file_get_dom

What will reproduce the problem?
Grabbing the DOM of some sites just doesn't seem to work.  Here's one that 
fails for me: http://www.hisradio.com

What is the expected output? What do you see instead?
I expect it to grab the DOM, like when I use http://www.google.com

Which version are you using?
Latest, using php 5.3

Please provide any additional information below.
I'm thinking there are server settings that disallow php access, possibly in a 
robots.txt file or something along those lines.  Am I missing something?

Original issue reported on code.google.com by [email protected] on 31 Aug 2013 at 3:22

All text nodes that are not inside element nodes

How can I find all text nodes that are not inside an element?

For example, I have this text:
<strong>Hallo!</strong> What <strong>are</strong> <strong>you doing</strong>?

I want to find only "What" and "?". Is that possible?

Original issue reported on code.google.com by [email protected] on 1 Aug 2013 at 2:09

Some notices with E_NOTICE

What will reproduce the problem?
Turn on error_reporting(E_ALL);

What is the expected output? What do you see instead?
It shouldn't kick up any warnings, but it kicks these up:

Notice: Uninitialized string offset: 1 in /mnt/..../ganon.php on line 2086

Notice: Uninitialized string offset: 1 in /mnt/..../ganon.php on line 2086


Which version are you using?
 * Ganon single file version - PHP5+ version
 * Generated on 24 Mar 2012

Please provide any additional information below.

That's it! Thanks for the great library. :-)

Original issue reported on code.google.com by [email protected] on 10 Sep 2012 at 4:09

str_get_dom() error

$rt="<td>my name somebody</td>";
$html= str_get_dom($rt);
foreach($html('input[class]') as $element) {
    echo $element->class; 
}

Line 2 shows this error:

Fatal error: Function name must be a string in 
/home/content/18/7124318/html/rkys/geL.php

Original issue reported on code.google.com by [email protected] on 28 Mar 2012 at 8:10

PHPDoc

Would it be possible to add PHPDoc comments to every method?
In NetBeans, this would help with autocomplete.
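For illustration, this is the kind of PHPDoc block IDEs such as NetBeans read for autocomplete; the function is a hypothetical example, not taken from ganon:

```php
<?php
/**
 * Collapse every run of whitespace in a string to a single space.
 *
 * Illustrative only: shows the @param and @return tags that IDEs
 * use for autocomplete and type hints.
 *
 * @param string $text Text to normalise.
 * @return string The text with whitespace runs collapsed.
 */
function collapse_whitespace($text)
{
    return preg_replace('`\s+`', ' ', $text);
}
```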

Original issue reported on code.google.com by [email protected] on 25 Apr 2013 at 8:15

output from echoing $html is already decoded

What will reproduce the problem?
$html = str_get_dom("<div>&nbsp;&gt;&lt;</div>");
echo $html;

What is the expected output? What do you see instead?
I expect the raw source code to be output, the same as what I put in. Instead, 
I get:
<div>�><</div>

Basically, it's the same as what I put in, but run through 
html_entity_decode(). Is there some way to get raw html?

Which version are you using?
PHP 5.3.3-7, ganon rev #69

Original issue reported on code.google.com by [email protected] on 1 Mar 2012 at 4:35

filter_element is protected

What will reproduce the problem?
Fatal error: Call to protected method HTML_Node::filter_element() from context 
'HTML_Formatter' in *.* on line 2761


Which version are you using?
ganon.php rev#59




Original issue reported on code.google.com by [email protected] on 14 Feb 2012 at 9:27

toString_attributes, Invalid argument supplied for foreach()

What will reproduce the problem?
include( 'ganon.php' );
$html = str_get_dom( '<html><body>foo bar<p>foobar</p><?php echo "foobar"; 
?></body></html>' );
echo $html;

What is the expected output? What do you see instead?
Expected:
<html><body>foo bar<p>foobar</p><?php echo "foobar"; ?></body></html>
Got:
PHP Warning:  Invalid argument supplied for foreach()
<html><body>foo bar<p>foobar</p><?php echo "foobar"; ?></body></html>

Which version are you using?
rev78

Please provide any additional information below.
Easy fix; in function toString_attributes( ) surround:
foreach($this->attributes as $a => $v) {
  $s .= ' '.$a.(((!$this->attribute_shorttag) || ($this->attributes[$a] !== $a)) ? '="'.htmlspecialchars($this->attributes[$a], ENT_QUOTES,$
}
with:
if(is_array($this->attributes)){
  ...
}

Original issue reported on code.google.com by [email protected] on 2 Sep 2013 at 10:52

Undefined variable: tag_ns

    PHP Notice:  Undefined variable: tag_ns in blah/vendor/ganon.php on line 1177
    PHP Notice:  Undefined variable: tag_ns in blah/vendor/ganon.php on line 1160

Replacing `$tag_ns` with `$_->tag_ns` solved this for me.

Original issue reported on code.google.com by pushkov.alexander.110 on 15 Mar 2013 at 1:01

  • Merged into: #27

$tag_ns is uninitialized

What will reproduce the problem?
Using ganon in PHP 5.4

What is the expected output? What do you see instead?
No warnings - instead warnings are shown for lines 1160 and 1177 where $tag_ns 
is used

Which version are you using?
Ganon single file PHP5 (rev. #78)

Please provide any additional information below.

Original issue reported on code.google.com by [email protected] on 25 Oct 2013 at 4:31

Class selector won't work if element has multiple classes

The following code produces an empty array for '$spans':

$test_string = '<div><span class="text test">Foo</span></div><div><span class="text test">Bar</span></div>';

$test_html = str_get_dom($test_string);
$spans = $test_html('.text');
$results = '.text: ' . count($spans);
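
For comparison, a CSS class selector such as `.text` matches any element whose `class` attribute contains `text` as one of its whitespace-separated tokens, so both spans here should be counted. A minimal sketch of that token test; `has_class()` is a hypothetical helper, not part of ganon.

```php
<?php
// Hypothetical helper: does a class attribute value contain $name as a token?
function has_class($class_attr, $name) {
    // Split on any run of whitespace and drop empty pieces.
    $tokens = preg_split('/\s+/', trim($class_attr), -1, PREG_SPLIT_NO_EMPTY);
    // Strict comparison: the class name must equal a whole token exactly.
    return in_array($name, $tokens, true);
}

var_dump(has_class('text test', 'text'));   // bool(true)
var_dump(has_class('text test', 'test'));   // bool(true)
var_dump(has_class('textual', 'text'));     // bool(false)
```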

Original issue reported on code.google.com by [email protected] on 29 Apr 2011 at 10:41

file_get_dom runs forever

What will reproduce the problem?
$html = file_get_dom('http://www.nhl.com/ice/schedulebyseason.htm');

What is the expected output? What do you see instead?
It takes more than 30 seconds and repeatedly triggers this fatal error:
"Fatal error: Maximum execution time of 30 seconds exceeded in C:\xampp\htdocs\hockey\ganon.php on line 238"
After I set `set_time_limit(0);`, it has been running for about 15 minutes and still has not finished.

Which version are you using?
Ganon single file PHP5 (rev. #78)
PHP 5.4.7

Please provide any additional information below.
It worked with the examples provided ('code.google.com').

Original issue reported on code.google.com by [email protected] on 8 Mar 2013 at 1:24

Suggestion to avoid: Fatal error: Call to a member function getPlainText() on a non-object

Hi (again),

This is just a suggestion for improvement.

I am writing scrapers to collect data from several websites, and I'm concerned about
possible changes in the structure of the sites I'm scraping.

My scrapers do systematic (and unattended) work, so I always need to check that all the
tags I expect are present in the web page, and log any that are missing for later analysis.

With this in mind, I can never chain several operations (select, getPlainText, etc.),
because if any of the selects returns null, the script crashes with the error:

Fatal error: Call to a member function getPlainText() on a non-object in ...

Sometimes I call select just to test whether a node is present (for example, whether
the div with id "LastMinuteOffer" is present).
In this case I don't chain calls; I just do:

$t1=$html->select('div#LastMinuteOffer',0);
if ($t1){
//There is a last-minute offer...
}

But sometimes I just want to get the text of a specific node, so in those cases I chain
several calls into one, something like this:

$MovieTitle=$html->select('h3.title a.title',0)->getPlainText();

In this case, if the select fails it returns null, so the getPlainText() call fires
the error:

Fatal error: Call to a member function getPlainText() on a non-object in ...

and the script fails.

This forces me to avoid chaining entirely and test everything, with clumsy code
like this:

$t1=$html->select('h3.title a.title',0);
if (!$t1) {$TheError='Fail in Movie Title'; return false;}
$MovieTitle=$t1->getPlainText();

I have written a new function to improve my code; perhaps someone else will find
it useful:

select_imperative

With this function I can chain as many calls as I want without risking a fatal error,
and catch the exception if any of the selects fails. For example:

  try {
    $MovieTitle=$html->select_imperative('h3.title a.title',0)->getPlainText();
  } catch(Exception $e) {
    $TheError='Fail in Movie Title: '.$e->getMessage()."\n";
    return false; //Return with error
  }
  return true;   //Return All ok

Or I can group all the error handling into a single block:

  try {
    $MovieTitle=$html->select_imperative('h3.title a.title',0)->getPlainText();
    $Author=$html->select_imperative('span.author',0)->getPlainText();
    $Date=$html->select_imperative('span.date',0)->getPlainText();
    $Format=$html->select_imperative('span.format',0)->getPlainText();

  } catch(Exception $e) {
    $TheError='Error scrapping Movie: '.$e->getMessage();
    return false; //Return with error
  }
  return true;   //Return All ok

This reduces my code a lot.


In the class HTML_Node:

  function select_imperative($query = '*', $index = false, $recursive = true, $check_self = false) {
    // Loose == null: a null result, false, or an empty array all throw.
    if ( ($rv=$this->select($query,$index,$recursive, $check_self)) == null){
      throw new Exception('Null query in select: '.$query);
    } else return $rv;
  }

and, in the class HTML_Parser:

  function select_imperative($query = '*', $index = false, $recursive = true, $check_self = false) {
    return $this->root->select_imperative($query, $index, $recursive, $check_self);
  }
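
If patching ganon's classes is not an option, the same fail-fast idea can live in a free function that wraps whatever `select()` returns. A sketch; `require_node()` is a hypothetical name, and it keeps the loose `== null` test of `select_imperative()`, so `null`, `false` and an empty array all count as "not found".

```php
<?php
// Hypothetical helper: pass a select() result through, or throw if it is empty.
function require_node($result, $query) {
    // Loose comparison, as in select_imperative(): null, false and array()
    // all compare equal to null.
    if ($result == null) {
        throw new Exception('Null query in select: ' . $query);
    }
    return $result;
}

// Usage with ganon (not executed here):
// $MovieTitle = require_node($html->select('h3.title a.title', 0), 'h3.title a.title')->getPlainText();
```

The trade-off is that the query string has to be passed twice, in exchange for leaving the library untouched.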

Regards!

Original issue reported on code.google.com by [email protected] on 21 Sep 2012 at 6:26

Typo in removeClass

In gan_node_html.php, the line:

$class = reg_replace('`\b'.preg_quote($c).'\b`si', '', $class);

should be:

$class = preg_replace('`\b'.preg_quote($c).'\b`si', '', $class);
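
Beyond the typo, note that `\b` treats a hyphen as a word boundary, so this pattern also strips `text` out of a class such as `text-big`, and the `i` flag makes the match case-insensitive. A sketch of a token-based alternative; `remove_class_token()` is a hypothetical helper, not ganon's `removeClass()`, and the `$regex` line reproduces the corrected pattern for comparison.

```php
<?php
// Hypothetical helper (not ganon's removeClass()): drop every occurrence of
// $name from a space-separated class attribute, comparing whole tokens only.
function remove_class_token($class_attr, $name) {
    $tokens = preg_split('/\s+/', trim($class_attr), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array();
    foreach ($tokens as $t) {
        if ($t !== $name) {
            $kept[] = $t;
        }
    }
    return implode(' ', $kept);
}

// The corrected pattern from the report, for comparison.
$regex = preg_replace('`\b' . preg_quote('text') . '\b`si', '', 'text-big text');

echo remove_class_token('text-big text', 'text'), "\n"; // text-big
echo var_export($regex, true), "\n";                    // '-big '
```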

Original issue reported on code.google.com by [email protected] on 16 Jun 2013 at 12:41

Infinite Loop

I got an infinite loop in function parse(), line 1186, after this call in my script:
$postImages = $html->find('img');


The input was:
<p>     &nbsp;</p> <h2 style="text-transform: uppercase; font-weight: normal; 
font-size: 17px; color: rgb(241, 214, 143); font-family: 'Trebuchet Ms', 
Verdana, Arial, sans-serif; line-height: 17px;">    <em>CEDRO</em></h2> <p> 
    <em><span style="color: rgb(241, 214, 143); font-family: 'Trebuchet Ms', 
Verdana, Arial, sans-serif; line-height: 17px; background-color: rgb(43, 63, 
28);">O Cedro &eacute; uma das madeiras mais conhecidas, mas pouca gente 
j&aacute; viu a &aacute;rvore em s&iacute;. Ele serviu de suporte para uma das 
primeiras manifesta&ccedil;&otilde;es art&iacute;sticas brasileiras: o Barroco 
Guarani.</span></em></p> <p>    <br />  <p>         <br />      <p>             <br />          <h2 
style="text-transform: uppercase; font-weight: normal; font-size: 17px; color: 
rgb(241, 214, 143); font-family: 'Trebuchet Ms', Verdana, Arial, sans-serif; 
line-height: 17px;">                <img
src="http://www.umpedeque.com.br/site_umpedeque/public/img/arvores/cedro_inteiro.jpg" data_ratio="0.76" style="" data_width="380" data_height="500" /></h2>
        </p>    </p> </p>

Formatted:

<p>     &nbsp;</p> 
<h2 style="text-transform: uppercase; font-weight: normal; font-size: 17px; 
color: rgb(241, 214, 143); font-family: 'Trebuchet Ms', Verdana, Arial, 
sans-serif; line-height: 17px;">    <em>CEDRO</em></h2>
 <p>    
    <em>
        <span style="color: rgb(241, 214, 143); font-family: 'Trebuchet Ms', Verdana, Arial, sans-serif; line-height: 17px; background-color: rgb(43, 63, 28);">O Cedro &eacute; uma das madeiras mais conhecidas, mas pouca gente j&aacute; viu a &aacute;rvore em s&iacute;. Ele serviu de suporte para uma das primeiras manifesta&ccedil;&otilde;es art&iacute;sticas brasileiras: o Barroco Guarani.</span>
    </em>
</p> 
<p> 
    <br />  
    <p>
        <br /> 
        <p> 
            <br />  
            <h2 style="text-transform: uppercase; font-weight: normal; font-size: 17px; color: rgb(241, 214, 143); font-family: 'Trebuchet Ms', Verdana, Arial, sans-serif; line-height: 17px;">                <img src="http://www.umpedeque.com.br/site_umpedeque/public/img/arvores/cedro_inteiro.jpg" data_ratio="0.76" style="" data_width="380" data_height="500" /></h2>
        </p> 
    </p>
</p>


Original issue reported on code.google.com by [email protected] on 18 Aug 2013 at 9:11


setIndex function

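function setIndex($index) {
    if ($this->parent) {
        // Removing this node first shifts later siblings one slot left;
        // the decrement compensates when moving the node forward.
        if ($index > $this->index()) {
            --$index;
        }
        $this->parent->deleteChild($this, true);
        $this->parent->addChild($this, $index);
    }
}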
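
The index adjustment is easiest to see on a plain array. This sketch mirrors the delete-then-insert steps with `array_splice()`; `move_child()` is hypothetical, and it assumes `deleteChild()` and `addChild($node, $index)` behave like removing and inserting at a position. With the decrement, the node ends up just before the sibling that was at `$index` when the call was made, whichever direction it moves.

```php
<?php
// Hypothetical array analogue of setIndex(): $list plays the parent's
// children, $from is the node's current index, $index the requested one.
function move_child(array $list, $from, $index) {
    if ($index > $from) {
        // Removing the node first shifts every later sibling one slot left.
        --$index;
    }
    $node = $list[$from];
    array_splice($list, $from, 1);                  // like deleteChild()
    array_splice($list, $index, 0, array($node));   // like addChild($node, $index)
    return $list;
}

echo implode(',', move_child(array('A', 'B', 'C', 'D'), 0, 2)), "\n"; // B,A,C,D
echo implode(',', move_child(array('A', 'B', 'C', 'D'), 3, 1)), "\n"; // A,D,B,C
```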

Original issue reported on code.google.com by [email protected] on 17 May 2013 at 12:08
