sunra / php-simple-html-dom-parser
PHP Simple HTML DOM Parser adaptation for Composer and PSR-0
I have a problem. When I use the $html->load_file method, it shows errors if a page doesn't exist.
The error says: 'Warning: file_get_contents(http://auto.desko.kg/car/24779): failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found in C:\xampp\htdocs\deskoparse\simple_html_dom.php on line 1080'. Is it possible to add checks to the parser so that it can find out whether such a page exists? Also, why doesn't that method return 'True' if a page exists?
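One way to detect a missing page before parsing can be sketched in plain PHP, without touching the parser. It assumes the HTTP stream wrapper is enabled. The helper names statusFromHeaders() and fetchIfExists() are hypothetical and not part of the library.

```php
<?php
// Hypothetical helper: extract the last HTTP status code from a list of
// raw response header lines (redirects can produce several status lines).
function statusFromHeaders(array $headers): int
{
    $status = 0;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical helper: return the body only for a 2xx response, else null.
function fetchIfExists(string $url): ?string
{
    // ignore_errors makes file_get_contents() return the body of 4xx/5xx
    // responses instead of emitting "failed to open stream".
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = @file_get_contents($url, false, $context);
    if ($body === false) {
        return null; // network-level failure (DNS, timeout, ...)
    }
    // $http_response_header is filled in by the HTTP wrapper in this scope.
    $status = statusFromHeaders($http_response_header ?? []);
    return ($status >= 200 && $status < 300) ? $body : null;
}
```

A non-null result could then be handed to str_get_html() instead of letting the parser fetch the URL itself.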
I have been experiencing the error "Warning: file_get_contents(): stream does not support seeking..." since I upgraded to PHP 7.1.x.
Are there any fixes?
Here is an updated package for this library:
composer require caophihung94/php-simple-html-dom-parser
Here is my testing code:
$CC = <<<EOF
<p style="max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); text-align: center; font-family: 微软雅黑; font-size: 14px; line-height: 24px; box-sizing: border-box !important; word-wrap: break-word !important; background-color: rgb(255, 255, 255);"><img img_width="500" img_height="398" data-type="jpeg" data-ratio="NaN" data-w="0" width="auto" width="auto" data-src="http://mmbiz.qpic.cn/mmbiz/fZ6yVsBCVhLQdrDUBay4Ps1qhhKGiadibMIdicxOXx74cXsIVxk0Emib1XpZxHUXLuToWEMibPRr0I8noqtuWZfowNg/640?wx_fmt=jpeg"/></p>
EOF;
// load the class as you usually do
//Loader::import('SimpleHtmlDom', 'html');
$DOM = str_get_html($CC);
if ( $DOM == false )
{
return false;
}
echo $DOM->innertext;
and the output is:
<p style="max-width: 100%; min-height: 1em; white-space: pre-wrap; color: rgb(62, 62, 62); text-align: center; font-family: 微软雅黑; font-size: 14px; line-height: 24px; box-sizing: border-box !important; word-wrap: break-word !important; background-color: rgb(255, 255, 255);"><img img_width="500" img_height="398" data-type="jpeg" data-ratio="NaN" data-w="0" width="auto"></p>
Well, something is missing: the duplicate width attribute and the data-src attribute are gone.
If I comment out the code fragment between lines 1488 and 1491, I get what I want:
if (isset($node->attr[$name]))
{
return;
}
It may be a bug!
As far as I can determine, Simple HTML DOM does not have a way to actually remove DOM elements from a document. This can be troublesome, especially if you're using mpdf to make a PDF file and there's an <svg> tag in there; mpdf flips out whenever it sees one.
There may be a good reason removeChild() has not been implemented, but as a suggestion for a future update, could such a function be implemented?
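For comparison, here is a minimal sketch of element removal using PHP's bundled DOM extension rather than Simple HTML DOM. It assumes ext-dom is available. removeElements() is a hypothetical helper name, and the sample markup is made up.

```php
<?php
// Sketch using PHP's bundled DOM extension (not Simple HTML DOM).
// removeElements() is a hypothetical helper name.
function removeElements(string $html, string $tagName): string
{
    $doc = new DOMDocument();
    // Collect libxml warnings about tags such as <svg> instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first: removing while iterating skips nodes.
    foreach (iterator_to_array($doc->getElementsByTagName($tagName)) as $node) {
        $node->parentNode->removeChild($node);
    }
    return $doc->saveHTML();
}

$clean = removeElements('<div><p>Report</p><svg><circle r="4"/></svg></div>', 'svg');
echo $clean;
```

The cleaned markup could then be passed on to mpdf. Note that saveHTML() returns a full document with doctype, html and body wrappers.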
The parser won't find <p class="body"> in this line:
<script id="forecast-summary-0" type="text/x-jquery-tmpl"> <div id="forecast-summary" class="summary-column"> <h3>Forecast Summary</h3> <div class="forecast-summary" lang="en-GB"> <ul > <li> <h4 class="title">This Evening and Tonight</h4> <p class="body">Fairly cloudy this evening with scattered heavy showers, which gradually ease through the evening. However cloud thickeing overnight to bring periods of occasionally heavy rain before dawn as southeast winds increase strong to near gale.</p> </li> </ul> </div> </div> </script>
(all one line). I use "$body = $body[0]->find('p[body]');" to find it, but it returns no results. Is there something I've missed? Can you help?
The find method cannot find tags that have additional classes.
For example, I want to find all tags that have 'services' class:
<div class='services'>
or
<div class='services last-item'>
or
<div class='services active'>
But if I run:
$html->find('div[class=services]');
I will only get one result:
<div class='services'>
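The difference reported here comes down to exact attribute comparison versus class-token matching. Below is a minimal sketch of the two in plain PHP. It is not the parser's internal logic, and hasClass() is a hypothetical helper.

```php
<?php
// Hypothetical helper: true when $token is one of the whitespace-separated
// class names in a class attribute value.
function hasClass(string $classAttr, string $token): bool
{
    $tokens = preg_split('/\s+/', trim($classAttr), -1, PREG_SPLIT_NO_EMPTY);
    return in_array($token, $tokens, true);
}

$values = ['services', 'services last-item', 'services active', 'other'];
foreach ($values as $value) {
    // Exact comparison is what an attribute selector like [class=services] expresses.
    $exact = ($value === 'services');
    printf(
        "%-20s exact=%s token=%s\n",
        $value,
        var_export($exact, true),
        var_export(hasClass($value, 'services'), true)
    );
}
```

In selector terms this is the difference between div[class=services] and a class selector such as div.services. Whether the installed parser version handles the latter for multi-class elements is worth testing.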
This pattern:
([\w-:*])(?:#([\w-]+)|.([\w-]+))?(?:[@?(!?[\w-:]+)(?:([!^$]?=)["']?(.*?)["']?)?])?([/, ]+)
treats the - in both character classes as a range operator rather than a literal character. The regex is therefore looking for everything between \w and : inclusive, rather than the three items by themselves. The same issue is repeated near the middle of the regex.
See pr #70
Is there a way to avoid something like this?
$dom = HtmlDomParser::str_get_html($html_str);
if($dom->find('h1', 0))
return $dom->find('h1', 0)->plaintext;
if($dom->find('h2', 0))
return $dom->find('h2', 0)->plaintext;
if($dom->find('h3', 0))
return $dom->find('h3', 0)->plaintext;
if($dom->find('h4', 0))
return $dom->find('h4', 0)->plaintext;
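One way to collapse the repeated checks is to loop over the selectors and return the first hit. In this sketch firstPlaintext() is a hypothetical helper, and the $find closure stands in for a call such as $dom->find($selector, 0) so that the example runs without the library.

```php
<?php
// Hypothetical helper: $find is any callable that takes a selector and
// returns a node object or null, e.g. fn($s) => $dom->find($s, 0).
function firstPlaintext(callable $find, array $selectors): ?string
{
    foreach ($selectors as $selector) {
        $node = $find($selector);
        if ($node) {
            return $node->plaintext;
        }
    }
    return null;
}

// Stand-in for a parsed document so the sketch runs without the library.
$fake = ['h3' => (object) ['plaintext' => 'Third-level title']];
$find = function (string $selector) use ($fake) {
    return $fake[$selector] ?? null;
};

echo firstPlaintext($find, ['h1', 'h2', 'h3', 'h4']); // prints "Third-level title"
```

Each selector is also queried only once, whereas the original queries every heading twice.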
If you use PHP 7.3 or higher, use my edits. Otherwise, you will get errors caused by the migration to PCRE2 in newer versions of PHP.
For example: Warning: preg_match_all (): Compilation failed: invalid range in character class at offset 4
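A minimal sketch of the kind of change involved. It uses the tag-name pattern /^[\w-:]+$/ that appears in the stack trace of a later report on this page. Escaping the hyphen is one possible fix and not necessarily the exact edit made in any fork.

```php
<?php
// The reported pattern is /^[\w-:]+$/ ; under PCRE2 (PHP 7.3+) the
// unescaped "-" after \w is rejected as an invalid range. Escaping it
// (or moving it to the end of the class) keeps the intended meaning:
// word characters, a literal hyphen, or a colon.
$fixed = '/^[\w\-:]+$/';

var_dump(preg_match($fixed, 'p'));        // int(1)
var_dump(preg_match($fixed, 'svg:rect')); // int(1)
var_dump(preg_match($fixed, 'my-tag'));   // int(1)
var_dump(preg_match($fixed, 'bad tag'));  // int(0)
```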
Hello, I'm using the latest version of the parser (1.8.1), downloaded from the official SourceForge page. When I use the function file_get_html() to pull a webpage from a remote host, I get a warning that the request has timed out (at line 136). The warning occurs only when the request is made from a remote host/environment; it works perfectly fine from my local server.
Edit: The whole code is on GitHub.
Additional edit: You can reproduce the warning in the integrated GitHub environment or on my remote server.
When I use your library together with Laravel and start Laravel's local development server with the command php artisan serve, I run into an issue where the Laravel server gets stuck in an endless loop of calls to simple_html_dom_node->__destruct(). After the maximum execution time is exceeded, the Laravel server reports:
Laravel development server started: <http://127.0.0.1:8000>
[Thu Jun 14 09:24:43 2018] PHP Fatal error: Maximum execution time of 60 seconds exceeded in C:\Users\[REDACTED]\Desktop\Tests\PHP\blog\blog\vendor\sunra\php-simple-html-dom-parser\Src\Sunra\PhpSimple\simplehtmldom_1_5\simple_html_dom.php on line 140
[Thu Jun 14 09:24:43 2018] PHP Stack trace:
[Thu Jun 14 09:24:43 2018] PHP 1. simplehtmldom_1_5\simple_html_dom_node->__destruct() C:\Users\[REDACTED]\Desktop\Tests\PHP\blog\blog\vendor\sunra\php-simple-html-dom-parser\Src\Sunra\PhpSimple\simplehtmldom_1_5\simple_html_dom.php:0
I debugged the issue for a while but could not resolve it in any other way than deleting, renaming or commenting out your destructors in the mentioned class.
Minimum, Complete and Verifiable example/Steps to reproduce:
1. Create a new Laravel project: composer create-project --prefer-dist laravel/laravel blog
2. Run cd blog and update composer.json with the required dependency on your library: "sunra/php-simple-html-dom-parser": "^1.5"
3. Run composer update to fetch the newly added dependency.
4. Open routes/web.php and update its contents to contain the following:
use Sunra\PhpSimple\HtmlDomParser;
Route::get('/', function(){
$input = <<<EOM
<!-- PUT YOUR NON-TRIVIAL HTML MARKUP HERE -->
EOM;
$parser = new HtmlDomParser();
$dom = $parser->str_get_html($input);
return view('welcome');
});
(1) Note that <!-- YOUR NON-TRIVIAL HTML MARKUP HERE --> should really be replaced with non-trivial markup, e.g. the source of google.com's front page.
5. Start the local development server with php artisan serve and access the address it uses (it defaults to 127.0.0.1:8000).
(2) Note that I was not able to reproduce it on PHPv7.1.14 or PHPv7.1, but PHPv7.1.13, PHPv7.1.18 and even PHPv7.2 do suffer from this behavior.
I worked around this issue by setting up a composer script on the post-autoload-dump event, where I search for and destroy (rename) your destructors.
PROBLEM
When parsing a document containing <input name="me" value="my { dog is nice">, the document is parsed incorrectly. The value property for $input in
foreach($this->html->find('input[name="me"]') as $input)
is "my {dog is nice" plus all remaining HTML, instead of "my {dog is nice".
WORKAROUND
I commented out $this->remove_noise("'({\w)(.*?)(})'s", true); in the load method, but I guess improving remove_noise so that it is aware of quotes would be a better solution.
Regards, Pablo.
Hi sunra, I am having an issue using Simple HTML DOM Parser. I have used it several times before, but only now have I come across this issue:
When searching for TDs, if there is a blank TD (with or without content), I get the next TDs as its result.
I also found that someone reported the same problem on Stack Overflow: http://stackoverflow.com/questions/11123267/simple-html-dom-parser-return-empty-td-with-all-tds-values
Example result of var_dumping $html->find('td'); (element 2 should be blank!):
0
12/02/2014 09:14 AM
1
MEXICO D.F. En proceso de entrega MEX MEXICO D.F.
2
12/02/2014 08:27 AM
MEXICO D.F. Llegada a centro de distribucion
Envio en proceso de entrega
The recent changes to the composer.json file are in the master branch, and since the default version is 1.5.1, the composer.json file from master needs to be merged into the 1.5.1 branch.
I'm having some trouble trying to parse documents > MAX_FILE_SIZE. Since this is a constant, I can't redefine this in a clean way. I think you could define this as a public static var in class simple_html_dom_node and use it from there.
https://github.com/voku/simple_html_dom
A HTML DOM parser written in PHP - let you manipulate HTML in a very easy way! This is a fork of PHP Simple HTML DOM Parser project but instead of string manipulation we use DOMDocument and modern php classes like "Symfony CssSelector".
PHP Warning 'yii\base\ErrorException' with message 'preg_match(): Compilation failed: invalid range in character class at offset 4'
in .../sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php:1378
on $contents = file_get_contents($url, $use_include_path, $context, $offset = 0) put an & on character &
It has happened only since I changed my hosting provider; with the other hosting provider it works great.
Problem getting content from khmer24.com.
//sample code
$html = file_get_html('http://khmer24.com');
print_r($html);
//result is blank page
Note: if I change url to http://google.com it works.
This is a simple PHP script:
<?php
$html = HtmlDomParser::str_get_html('<html><body><span>a</span><span>{b</span><span>c}</span><span>d</span></body></html>');
foreach ($html->find('span') as $v) {
echo $v->innertext."\n";
}
?>
I expected the following:
a
{b
c}
d
But the result is the following:
a
{b</span><span>c}
d
My class looks like this:
public function LayHinhTuDong9GagAction($id="aP9QwYV%2CaRjvbnq%2CaOB9wDy")
{
$client = new \GuzzleHttp\Client();
$res = $client->request('GET', 'https://9gag.com/?id='.$id.'&c=10',
[
'headers' => [
'referer'=>'https://9gag.com/',
'x-requested-with'=>'XMLHttpRequest',
'method'=>'GET',
'authority'=>'9gag.com',
'path'=>'/?id=aP9QwYV%2CaRjvbnq%2CaOB9wDy&c=10',
'scheme'=>'https',
'accept'=>'application/json, text/javascript, */*; q=0.01',
'accept-encoding'=>'gzip, deflate, br',
'user-agent'=>'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
]
]
);
$chuoi_dulieu= $res->getBody();
$stringBody = (string) $chuoi_dulieu;
$stringBody=\GuzzleHttp\json_decode($stringBody);
$array_hinh=$stringBody->items;
echo count($array_hinh);
var_dump($array_hinh);
foreach($array_hinh as $key=>$hinh){
$la_video=0;
$dom = HtmlDomParser::str_get_html( $hinh );
$tua_de=$dom->find("h2",0)->plaintext;
$elems = $dom->find("source");
if(empty($elems))// handle the case of a gif/mp4 video image
{
$elems_2 = $dom->find("div[class=badge-video-container]");
if(!empty($elems_2)){
if($elems_2[0]->{'data-video-source'}=="YouTube"){
echo "YouTube";
continue;
}
}
}else
{
$link_video=$elems[0]->src;
$link_hinh=preg_replace('/460(\w*)/', "700b", $elems[0]->src);
$link_hinh=str_replace("mp4","jpg",$link_hinh);
$slug_id=$this->LuuVideo($link_video,$link_hinh);
if(!$slug_id)
{
echo "loi";exit;
}
$la_video=1;
// continue;// continue the loop, skipping the functions that follow
}
if($la_video!=1)
{
$elems = $dom->find("img");// find the image when it is not a video
$link_hinh=preg_replace('/460(\w*)/', "700b", $elems[0]->src);
$slug_id=$this->LuuAnh($link_hinh);
if(!$slug_id)
{
echo "loi";exit;
}
}
echo $link_hinh.'<br>';
$post=new \PostsCollection();
$post->tua_de=$tua_de;
$post->link_hinh="/photo/".$slug_id.'.jpg';
$post->link_goc=$link_hinh;
$post->save();
// $dom->clear();
// unset($dom);
$this->view->pick("LayHinhTuDong/index");
}
Hello,
I am looping through a HTML string as follows:
foreach ( $dom->find( 'text' ) as $element ) {
if ( !in_array( $element->parent()->tag, $excludedParents ) ) {
$element->innertext = preg_replace(
'/(?<!\w)' . preg_quote( $search, "/" ) . '(?!\w)/i',
$replace,
$element->innertext
);
}
}
This works fine for excluded parents like a, div or em, but not for a.test or div#test. Is there an elegant way to solve that?
When you attempt to get the text of an element that has no HTML elements inside it, it returns a string that is not UTF-8 encoded. For an element such as
<h3>Технические работы на сервере<h3>
the string returned by innerText() is not encoded properly, but the string returned by outerText() has the proper encoding. This refers to the simple_html_dom_node class.
Is it possible to extract the contents of the first text node? I.e. the string Hello in the subtree
<div>
Hello
<strong>World!</strong>
</div>
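For comparison, this is how the first direct text child can be read with PHP's bundled DOM extension instead of Simple HTML DOM. It is a sketch that assumes ext-dom is available, and firstTextOf() is a hypothetical helper name.

```php
<?php
// Sketch using PHP's bundled DOM extension rather than Simple HTML DOM.
// firstTextOf() is a hypothetical helper name.
function firstTextOf(DOMNode $element): ?string
{
    foreach ($element->childNodes as $child) {
        // Only direct text children count; skip whitespace-only nodes.
        if ($child->nodeType === XML_TEXT_NODE && trim($child->nodeValue) !== '') {
            return trim($child->nodeValue);
        }
    }
    return null;
}

$doc = new DOMDocument();
$doc->loadHTML("<div>\nHello\n<strong>World!</strong>\n</div>");
$div = $doc->getElementsByTagName('div')->item(0);

echo firstTextOf($div); // prints "Hello"
```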
(1/1) ErrorException
file_get_contents(): stream does not support seeking
$html = HtmlDomParser::file_get_html('http://www.google.com/');
foreach($html->find('a') as $element)
echo $element->href . '<br>';
Running composer validate gives the following:
"./composer.json" does not match the expected JSON schema:
- authors[0].name : The property name is required
Note: I have the patch file but do not have push access to the repository to create a pull request.
So, about the page that I am parsing: it has a td tag that in some cases contains an a tag and in some cases doesn't.
However, I have tried $row->find('td', 2)->find('a', 0) and it says it can't find a value on null.
Is there any way to find out whether the child exists?
One way that I have found is count($row->find('td', 2)->find('a', 0)): if it returns 1, there's a child, and otherwise none.
Is there any other way to find it?
Thanks in advance.
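A guard at each step avoids calling find() on null. In this sketch StubNode is a hypothetical stand-in so the example runs without the library. It only mimics the behaviour described above, where find($selector, $idx) returns a node or null.

```php
<?php
// Hypothetical stand-in for a parsed node: find($selector, $idx) returns
// a child node or null, as the report above describes for the parser.
class StubNode
{
    private $children;
    public $href;

    public function __construct(array $children = [], ?string $href = null)
    {
        $this->children = $children;
        $this->href = $href;
    }

    public function find(string $selector, int $idx)
    {
        return $this->children[$selector][$idx] ?? null;
    }
}

$anchor = new StubNode([], '/item/1');
$row = new StubNode(['td' => [new StubNode(), new StubNode(), new StubNode(['a' => [$anchor]])]]);

// Check each step before going deeper.
$td = $row->find('td', 2);
$link = $td ? $td->find('a', 0) : null;

echo $link ? $link->href : 'no link'; // prints "/item/1"
```

With the real parser the same two lines would apply to $row as returned by find(), assuming it returns null for a missing index as the error message suggests.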
When will PHP 7.1 be supported?
After upgrading to v1.5.2, the file_get_html() function always shows an error.
We are currently using v1.5.0, but updating now shows this error:
Your requirement could not be resolved to an installation set of packages.
Problem 1
I have the following table - only 2 rows shown for brevity. How do I traverse the table to extract the price class value for Catalog ID 100245 i.e. H1?
<tbody>
<tr class="catalog_line">
<td class="properties">
<div class="grid-prop">
<span class="label nom">Catalog ID</span>
<span class="catdata1 cdatamarker">100245</span>
</div>
<div class="grid-prop nom">
<span class="label">Product, price class</span>
<span class="catdata1">
<span class="category">Cars</span>
, H1
</span>
</div>
</td>
</tr>
<tr class="catalog_line">
<td class="properties">
<div class="grid-prop">
<span class="label nom">Catalog ID</span>
<span class="catdata1 cdatamarker">100246</span>
</div>
<div class="grid-prop nom">
<span class="label">Product, price class</span>
<span class="catdata1">
<span class="category">Cars</span>
, H1
</span>
</div>
</td>
</tr>
</tbody>
Example code
$content = "<div class=test><embed src='http://....swf' quality='high' width='480' height='400' align='middle' allowScriptAccess='always' allowFullScreen='true' mode='transparent' type='application/x-shockwave-flash'></embed></div>";
$dom = HtmlDomParser::str_get_html($content);
$newsContent = $dom->find(".test", 0)->text();
var_dump($newsContent);
Expected result:
string(0) ""
Result:
string(8) "</embed>"
//function file_get_html($url, $use_include_path = false, $context=null, $offset = -1, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
function file_get_html($url, $use_include_path = false, $context=null, $offset = 0, $maxLen=-1, $lowercase = true, $forceTagsClosed=true, $target_charset = DEFAULT_TARGET_CHARSET, $stripRN=true, $defaultBRText=DEFAULT_BR_TEXT, $defaultSpanText=DEFAULT_SPAN_TEXT)
Setting $offset = 0 fixes the problem with PHP 7.1.
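The background, as far as the PHP manual describes it: since PHP 7.1, file_get_contents() accepts negative offsets that count from the end of the stream, and seeking is not supported on remote files. An old default of -1 therefore turns into a seek request that an HTTP stream cannot satisfy. The semantics can be checked locally with a temporary file (the file name prefix is arbitrary):

```php
<?php
// Local demonstration of the offset semantics with a temporary file.
$path = tempnam(sys_get_temp_dir(), 'shd');
file_put_contents($path, '<html><body>ok</body></html>');

// Offset 0 reads from the start, matching the patched default above.
$whole = file_get_contents($path, false, null, 0);

// On PHP >= 7.1 a negative offset counts from the end of the stream,
// so -1 yields only the last byte of a seekable local file.
$tail = file_get_contents($path, false, null, -1);

var_dump($whole); // the full document
var_dump($tail);  // string(1) ">"

unlink($path);
```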
getAttribute error when the attribute name includes '-'
Has anybody met this problem before?
Like this:
$element->data-lazyload;
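This looks like a PHP syntax matter rather than a parser one: a hyphen cannot appear in a bare property name, so the expression is read as a subtraction. Curly-brace property syntax, as used with {'data-video-source'} in another report on this page, passes the full name. StubElement below is a hypothetical stand-in that exposes attributes through __get() so the sketch runs without the library.

```php
<?php
// Hypothetical stand-in for a parsed element: attributes are exposed
// through __get(), the way $element->src is used elsewhere in these reports.
class StubElement
{
    private $attr;

    public function __construct(array $attr)
    {
        $this->attr = $attr;
    }

    public function __get($name)
    {
        return $this->attr[$name] ?? false;
    }
}

$element = new StubElement(['data-lazyload' => '/img/photo.jpg']);

// $element->data-lazyload would parse as ($element->data) - lazyload,
// i.e. a subtraction. Curly braces pass the full name as a string.
echo $element->{'data-lazyload'}; // prints "/img/photo.jpg"
```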
Hi,
First of all, thanks for this great tool. I'm having a little problem. When I use either HtmlDomParser::file_get_html($urlOfThePage), or get the HTML of the page with curl and use HtmlDomParser::file_get_html($str), for one specific HTML page those functions return false. They work perfectly fine with other pages, just not this one. Why would that be?
Thanks.
I've stumbled upon an edge case where the HTML exceeded the MAX_FILE_SIZE constant; it would be nice to be able to increase it.
It could be implemented really easily by defining the constant only if it is not already defined, so the user could define it as necessary.
Even better would be an exception, so you know what happened without diving into the library code itself.
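The suggested guard could look roughly like this. It is a sketch of the proposal, not the library's current code: the 5000000 override is an arbitrary illustration, and 600000 is assumed to be the shipped default, so check your copy.

```php
<?php
// Application bootstrap, before the library file is loaded
// (value chosen only for illustration).
define('MAX_FILE_SIZE', 5000000);

// What the library-side definition could look like if it were guarded;
// 600000 is assumed to be the shipped default.
if (!defined('MAX_FILE_SIZE')) {
    define('MAX_FILE_SIZE', 600000);
}

echo MAX_FILE_SIZE; // prints 5000000
```

Because constants cannot be redefined, the application's value wins only if it is defined before the library file is included.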
Hello and thank you for your great work,
I'm using php-simple-html-dom-parser in a free project and try to solve a bug that occurred.
This is my code:
foreach ( $dom->find( 'text' ) as $element ) {
if ( !in_array( $element->parent()->tag, [ 'a', 'pre', 'code' ] ) ) {
foreach ( $markers as $marker ) {
$text = $marker[ 'text' ];
$url = $marker[ 'url' ];
$tip = strip_tags( $marker[ 'excerpt' ] );
$tooltip = ( $tooltip ? "data-uk-tooltip title='$tip'" : "" );
$tmpval = "tmpval-$i";
$element->innertext = preg_replace(
'/\b' . preg_quote( $text, "/" ) . '\b/i',
"<a href='$url' $hrefclass target='$target' $tmpval>\$0</a>",
$element->innertext,
1
);
$element->innertext = str_replace( $tmpval, $tooltip, $element->innertext );
$i++;
}
}
}
This code searches for text on a page and replaces words with other words.
It works fine.
But as I found out, this code is removing new lines from <pre><code>...</code></pre>:
This is an example-output using the code above:
<pre><code><div class="uk-form-row"> <label class="uk-form-label">{{ 'Pages' | trans }}</label> <div class="uk-form-controls uk-form-controls-text"> <input-tree :active.sync="package.config.nodes"></input-tree> </div> </div> </code></pre>
This is an example-output without using the code above:
<pre><code><div class="uk-form-row">
<label class="uk-form-label">{{ 'Pages' | trans }}</label>
<div class="uk-form-controls uk-form-controls-text">
<input-tree :active.sync="package.config.nodes"></input-tree>
</div>
</div>
</code></pre>
file_get_html returns false for this URL: https://tripadvisor.ca/Restaurant_Review-g255344-d724335-Reviews-Dynasty_Chinese_Restaurant-Launceston_Tasmania.html
which can be loaded in the browser, but this URL works fine: 'https://tripadvisor.ca'
I'm trying to fetch some data from an external website, which requires many requests. file_get_contents() may then throw some authorization error. What are your thoughts on this?
I'm trying to select 'td > a span' but it's selecting 'td a span'...
simplehtmldom is currently in version 1.8.1
Why not use the latest version?
I had trouble with the current version because mb_detect_encoding isn't available on all systems. This is fixed in version 1.8.1.
I have the following warnings when using this library
Warning: file_get_contents(): stream does not support seeking in vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php on line 81
Warning: file_get_contents(): Failed to seek to position -1 in the stream in vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php on line 81
Can someone help?
I found a problem with the :first-child pseudo-class selector.
For this HTML
<div>
<a href="javascript:void(0)">×</a>
<div class="links">
<ul>
<li>
<a href="https://github.com/">link 1</a>
<span>(info)</span>
</li>
<li>
<a href="https://github.com/">link 2</a>
<span>(info)</span>
</li>
</ul>
</div>
</div>
The selector .links > ul > li:first-child > a matches 0 elements, while the selector .links > ul > li > a matches two elements.
The expected behavior is that the selector .links > ul > li:first-child > a matches this element:
<li>
<a href="https://github.com/">link 1</a>
<span>(info)</span>
</li>
$dom = HtmlDomParser::str_get_html('
欢迎来到。这是我的第一篇文章。最先写作吧!
');
What is the cause of this error?
ErrorException : preg_match(): Compilation failed: invalid range in character class at offset 4
at /Users/enle/app/hyena-cms/vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php:1378
1374| $this->char = $this->doc[--$this->pos]; // prev
1375| return true;
1376| }
1377|
1378| if (!preg_match("/^[\w-:]+$/", $tag)) {
1379| $node->_[HDOM_INFO_TEXT] = '<' . $tag . $this->copy_until('<>');
1380| if ($this->char==='<') {
1381| $this->link_nodes($node, false);
1382| return true;
Exception trace:
1 preg_match("/^[\w-:]+$/", "p")
/Users/enle/app/hyena-cms/vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php:1378
2 simplehtmldom_1_5\simple_html_dom::read_tag()
/Users/enle/app/hyena-cms/vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php:1187
3 simplehtmldom_1_5\simple_html_dom::parse()
/Users/enle/app/hyena-cms/vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/simplehtmldom_1_5/simple_html_dom.php:1081
4 simplehtmldom_1_5\simple_html_dom::load("
欢迎来到。这是我的第一篇文章。最先写作吧!
")
5 simplehtmldom_1_5\str_get_html("
欢迎来到。这是我的第一篇文章。最先写作吧!
")
6 call_user_func_array("\simplehtmldom_1_5\str_get_html")
/Users/enle/app/hyena-cms/vendor/sunra/php-simple-html-dom-parser/Src/Sunra/PhpSimple/HtmlDomParser.php:21
7 Sunra\PhpSimple\HtmlDomParser::str_get_html("
欢迎来到。这是我的第一篇文章。最先写作吧!
")
8 App\Service\ArticleFormatter::convertImage(Object(Closure))
/Users/enle/app/hyena-cms/app/Service/ArticleFormatter.php:91
9 App\Service\ArticleFormatter::importImage()
/Users/enle/app/hyena-cms/app/Console/Commands/WpSynchronizationImage.php:50
10 App\Console\Commands\WpSynchronizationImage::handle()
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:32
11 call_user_func_array([])
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:32
12 Illuminate\Container\BoundMethod::Illuminate\Container{closure}()
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:90
13 Illuminate\Container\BoundMethod::callBoundMethod(Object(Illuminate\Foundation\Application), Object(Closure))
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:34
14 Illuminate\Container\BoundMethod::call(Object(Illuminate\Foundation\Application), [])
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Container/Container.php:576
15 Illuminate\Container\Container::call()
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Console/Command.php:183
16 Illuminate\Console\Command::execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
/Users/enle/app/hyena-cms/vendor/symfony/console/Command/Command.php:255
17 Symfony\Component\Console\Command\Command::run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Illuminate\Console\OutputStyle))
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Console/Command.php:170
18 Illuminate\Console\Command::run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
/Users/enle/app/hyena-cms/vendor/symfony/console/Application.php:908
19 Symfony\Component\Console\Application::doRunCommand(Object(App\Console\Commands\WpSynchronizationImage), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
/Users/enle/app/hyena-cms/vendor/symfony/console/Application.php:269
20 Symfony\Component\Console\Application::doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
/Users/enle/app/hyena-cms/vendor/symfony/console/Application.php:145
21 Symfony\Component\Console\Application::run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Console/Application.php:90
22 Illuminate\Console\Application::run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
/Users/enle/app/hyena-cms/vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:122
23 Illuminate\Foundation\Console\Kernel::handle(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
/Users/enle/app/hyena-cms/artisan:38
If you try to get all the anchors like this, then by default one (the last) element is returned:
$anchors = $soup->getElementsByTagName('a');
The following does give me all the elements:
$anchors = $soup->getElementsByTagName('a', null);
The idx value defaults to -1. Is this on purpose?
Hi,
I installed the library using composer, so now I have a folder under "vendor/sunra/php-simple-html-dom-parser/....".
I am pasting my controller code here (using CodeIgniter); for some reason the library doesn't load properly.
I keep getting the error "Call to undefined function file_get_html()" when running the create_main_array() function.
Is there something that I'm not getting right?
I did include the autoload.php file, as with any other library installed with composer, and this has worked until now.
I did the same with use Sunra\PhpSimple\HtmlDomParser;.
<?php
require FCPATH. 'vendor/autoload.php';
use Sunra\PhpSimple\HtmlDomParser;
/******************************************/
/* example Scraping */
/******************************************/
class Example extends CI_Controller {
public function __construct() {
parent::__construct();
// Check if the user is logged in else KICK!:
if ( ! $this->session->userdata('is_logged_in') ) {
redirect('login');
}
// Load 'kas_model' Model
$this->load->model('users_model');
$this->load->model('expenses_model');
// Sets the server not to have a time out.
ini_set('max_execution_time', 0);
ini_set('memory_limit', '-1');
// More Of MySQL
ini_set('mysql.connect_timeout','0');
// Expand the array displays
ini_set('xdebug.var_display_max_depth', 5);
ini_set('xdebug.var_display_max_children', 256);
ini_set('xdebug.var_display_max_data', 1024);
}
// Main Page
public function index(){
$this->load->view('header');
$this->load->view('dashboard');
$this->load->view('example/main_example');
$this->load->view('footer');
}
// Gets a page [string] variable and returns a string of the HTML.
public function scrape_page($page) {
// $string = file_get_contents($page);
$string = file_get_html($page);
return $string;
}
// Running this controller
public function create_main_array() {
$string = $this->scrape_page('https://example.com/websites');
// Find all images
foreach($string->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($string->find('a') as $element)
echo $element->href . '<br>';
}