thomasweinert / fluentdom
A fluent API for working with XML in PHP
Home Page: https://thomas.weinert.info/FluentDOM/
License: MIT License
When running this code, the website crashes with error ERR_EMPTY_RESPONSE...
$html='<a>hello</a>';
require_once($this->plugin_dir . '_inc/php/autoload.php');
$fd = FluentDOM::load($html, 'text/html');
The line
$fd = FluentDOM::load($html, 'text/html');
makes it crash.
Any ideas?
Select the top level nodes (document element) for new instances if a manipulation function is used before find().
When passing fragments of HTML into FluentDOM::QueryCss(), it will wrap them with
... . Is it possible not to have the fragment wrapped?
Port the iCal to xCal converter from Carica Status Monitor to FluentDOM, providing a loader for iCal.
Replace calls to func_get_args() with variadics. This will raise the minimum version requirement to PHP 5.6, so it should be a major release.
I think I'm seeing a double decode error on a utf-8 string.
In the test below, the href attribute is a 'RIGHT SINGLE QUOTATION MARK' which is U+2019 aka the bytes e2 80 99 .
When I do $element->getAttribute('href');
the byte values present are c3, a2, c2, 80, c2, 99.
These just happen to be the characters U+00E2, U+0080, U+0099 - i.e. it appears the right quotation mark is decoded to bytes, and then those bytes are then decoded again.
// U+2019 e2 80 99 RIGHT SINGLE QUOTATION MARK
// U+00E2 c3 a2 LATIN SMALL LETTER A WITH CIRCUMFLEX
// U+0080 c2 80
// U+0099 c2 99
<?php
use FluentDOM\Document;
use FluentDOM\Element;
require_once(__DIR__.'/../../vendor/autoload.php');
$rightQuoteMark = "’";
if (!function_exists('getRawCharacters')) {
function getRawCharacters($result)
{
$resultInHex = unpack('H*', $result);
$resultInHex = $resultInHex[1];
$resultSeparated = implode(', ', str_split($resultInHex, 2)); //byte safe
return $resultSeparated;
}
}
echo "Raw characters are: " . getRawCharacters($rightQuoteMark) . "\n";
$html = <<< HTML
<html>
<body>
<a href="%s"><span>blah</span>
</a>
</body>
</html>
HTML;
$html = sprintf($html, $rightQuoteMark);
$document = new Document();
$document->loadHTML($html);
$linkClosure = function (Element $element) {
$href = $element->getAttribute('href');
echo "href chars after parsing are: " . getRawCharacters($href) . "\n" ;
};
$document->find('//a')->each($linkClosure);
// FluentDOM 5.3.0
// "reference": "19c5a3c77c91871d2a2545949b5bde20889fcb45"
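The byte sequence reported above can be reproduced without any DOM parsing. This is a minimal sketch, assuming the mbstring extension is available; it only demonstrates what double encoding looks like (UTF-8 bytes misread as ISO-8859-1 and encoded to UTF-8 again). A plausible cause is that the HTML loader falls back to ISO-8859-1 when the document declares no charset, but that is an assumption, not something the report confirms.

```php
<?php
// U+2019 RIGHT SINGLE QUOTATION MARK as raw UTF-8 bytes.
$rightQuoteMark = "\xE2\x80\x99";

// Treat the UTF-8 bytes as ISO-8859-1 and encode them to UTF-8 a second time.
$doubleEncoded = mb_convert_encoding($rightQuoteMark, 'UTF-8', 'ISO-8859-1');

echo bin2hex($rightQuoteMark), "\n"; // e28099
echo bin2hex($doubleEncoded), "\n";  // c3a2c280c299
```

The second line matches the bytes c3, a2, c2, 80, c2, 99 from the report.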
Package suggests fluentdom/css-selector, but it should be fluentdom/selectors-phpcss.
Newer libxml versions have several options that control the loading process. It could be useful to wrap those options.
Not all options are widely available at the moment. Some emulation for features like LIBXML_HTML_NODEFDTD might be useful.
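For context, this is roughly how such an option is passed to plain ext/dom today. The sketch uses only DOMDocument::loadHTML() and the LIBXML_HTML_NODEFDTD constant; how FluentDOM would wrap or emulate the option is not shown and is left open here.

```php
<?php
$html = '<p>Hello</p>';

// Collect parser messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Default behaviour of the HTML parser.
$default = new DOMDocument();
$default->loadHTML($html);

// Ask libxml not to add a default doctype when none is found.
$noDoctype = new DOMDocument();
$noDoctype->loadHTML($html, LIBXML_HTML_NODEFDTD);

$defaultHasDoctype = $default->doctype !== null;
$optionHasDoctype = $noDoctype->doctype !== null;

var_dump($defaultHasDoctype, $optionHasDoctype);
```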
Overload all methods that have an *NS version to resolve namespaces using the document defined namespaces. Some are already implemented. Add the missing methods.
getAttribute()
getAttributeNode()
getElementsByTagName()
hasAttribute()
removeAttribute()
setAttribute()
setAttributeNode()
setIdAttribute()
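The idea behind these overloads can be sketched for getAttribute() with plain ext/dom. The function name getAttributeResolved() and the $namespaces map are hypothetical and stand in for whatever registry the document keeps; this is not the FluentDOM implementation.

```php
<?php
/**
 * Resolve a prefixed attribute name against a prefix => URI map and
 * delegate to the *NS method; fall back to the plain method otherwise.
 */
function getAttributeResolved(DOMElement $element, string $name, array $namespaces): string
{
    if (strpos($name, ':') !== false) {
        [$prefix, $localName] = explode(':', $name, 2);
        if (isset($namespaces[$prefix])) {
            return $element->getAttributeNS($namespaces[$prefix], $localName);
        }
    }
    return $element->getAttribute($name);
}

$document = new DOMDocument();
$document->loadXML('<e xmlns:a="urn:foo" a:attr="42" plain="21"/>');

// The prefix used in the query ("foo") differs from the one in the document ("a").
$namespaces = ['foo' => 'urn:foo'];
$prefixed = getAttributeResolved($document->documentElement, 'foo:attr', $namespaces);
$plain = getAttributeResolved($document->documentElement, 'plain', $namespaces);
```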
I'm not sure if it's a FluentDOM issue or not. I believe CSS selectors should be case-insensitive, but they are not.
$fd = FluentDOM::QueryCss('<div></div>')
->find('DIV')
->text('Hello World!');
echo $fd->document->toHtml(); //returns <div></div> (symfony css converter)
Evaluates the expression expecting a node list, but returns a FluentDOM\Query instance.
Usage:
$dom = new FluentDOM\Document();
$dom->loadXml($xml);
foreach ($dom->find('//atom:entry') as $entry) {
echo $entry->find('atom:title')->text();
}
This allows an alternative access to the fluent api.
At the moment a parsing error might result in just the message Invalid/empty content parameter. If this is because of a fatal error in the parsing, it would be nice to include information about that error in the message.
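One way to surface the parser details is libxml's internal error buffer. The following is a sketch with plain ext/dom; the helper name parseXmlOrFail() is hypothetical, and the message prefix is only borrowed from the text above.

```php
<?php
function parseXmlOrFail(string $xml): DOMDocument
{
    // Buffer libxml errors instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $document = new DOMDocument();
    $loaded = $document->loadXML($xml);
    $errors = libxml_get_errors();

    // Restore the previous error handling state.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        $details = [];
        foreach ($errors as $error) {
            $details[] = sprintf('%s (line %d)', trim($error->message), $error->line);
        }
        throw new InvalidArgumentException(
            'Invalid/empty content parameter: ' . implode('; ', $details)
        );
    }
    return $document;
}

$message = '';
try {
    parseXmlOrFail('<a><b></a>');
} catch (InvalidArgumentException $e) {
    $message = $e->getMessage();
}
echo $message, "\n";
```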
Is there any way to have custom tags with implicit namespaces left alone? For example, I have custom tags along the lines of:
<wt:folder id="123" foo="bar">blah blah</wt:folder>
<wt:person id="1234" />
I use some parsers to convert certain blocks of html to other html structures, each parser processes the html and then ends up returning the finished html with:
$fd = \FluentDOM::load($html, 'text/html');
$fd->registerNamespace('wt', 'urn:wt');
// do some stuff here with the nodes
return new \FluentDOM\HTML5\Serializer($return->document);
Eventually, when all the parsers are finished, I have to seemingly load it back into FluentDOM to be able to get just the content of the body tag (unless there's a way to output the content without the body wrappers and doctype, etc.?):
$fd = \FluentDOM($html, 'text/html');
$fd->registerNamespace('wt', 'urn:wt');
return $fd->find('body')->html();
But it will output the custom tag as something like:
<folder foo="bar" something="other" xmlns:wt="">blah blah</folder>
Is there any way to retain the original format of the tag?
If I compare the output of
$fd = FluentDOM::QueryCss($output, 'text/html');
die($fd);
and
die($output);
I notice that the output differs. Now, I have not done a single selection or change, only loaded the HTML and echoed it. It seems to try to close tags, but it does so incorrectly.
In the middle of a bit of javascript it breaks the document.
This is what the original looks like if I don't run it through FluentDOM at all
...
}).on('error', function (event, id, name, errorReason, xhrOrXdr) {
$('#restricted-fine-uploader .flashmessage-error').remove();
$('#restricted-fine-uploader').append('<div class="flashmessage flashmessage-error">' + errorReason + '<a class="close" onclick="javascript:$(\'.flashmessage-error\').remove();" >X</a></div>');
...
But if it is loaded into FluentDOM and echoed right away this changes to this
...
}).on('error', function (event, id, name, errorReason, xhrOrXdr) {
$('#restricted-fine-uploader .flashmessage-error').remove();
$('#restricted-fine-uploader').append('<div class="flashmessage flashmessage-error">' + errorReason + '<a class="close" onclick="javascript:$(\'.flashmessage-error\').remove();" >X</script>
</fieldset>
</form>
</div>');
...
The closing of the a tag is removed; a closing script tag is inserted instead, along with several other tags. My gut feeling is that it has something to do with the handling of scripts and of text strings within scripts that contain HTML.
Hello, I am using FluentDOM. Now I have a xml example:
<?xml version="1.0" encoding="UTF-8"?>
<a a1="xxx">
<b bid="p1">
<c>1</c>
<d>2</d>
</b>
<b bid="p2">
<c>3</c>
<c>3</c>
<c>3</c>
<d name="k1" value="v1"></d>
<d name="k2" value="v2"></d>
<e>5</e>
</b>
</a>
I want to iterate each element and get attribute 'bid'. Here is my php code:
$nodes = FluentDOM::Query($xml, 'text/xml')
->find('/a/b');
foreach ($nodes as $node) {
$elements = $node->find("./@bid");
echo count($elements);
}
It prints out '0', '0', which means there is no result found. I just want to get attribute 'bid',
so can anyone help me point it out?
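For comparison, here is how the same attribute can be read with plain DOMXPath, using a shortened version of the XML above. This is only a sketch with ext/dom; it does not explain why the FluentDOM call returns 0.

```php
<?php
$xml = '<a a1="xxx"><b bid="p1"><c>1</c></b><b bid="p2"><c>3</c></b></a>';

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$bids = [];
foreach ($xpath->query('/a/b') as $node) {
    // Evaluate relative to the current <b> element and cast the result to a string.
    $bids[] = $xpath->evaluate('string(@bid)', $node);
}

print_r($bids);
```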
Currently FluentDOM allows the use of CSS selectors if Carica/PhpCss is found. If PhpCss is not installed but Symfony/CssSelector is, use that to translate the CSS selectors to XPath.
@f433aa41
The find() method always uses the option Nodes\Fetcher::UNIQUE.
I think UNIQUE is not necessary in many situations, but it may cause a large amount of calculation and use too many CPU resources.
This simple find could lead to hundreds of XPath evaluations.
$html->find('table#content tr');
The examples directory has really grown over the years. As has the FluentDOM API. So the directory needs a major cleanup.
Appending a string of html fails if it contains <a href with & in the url.
Error message given is:
Invalid/empty content parameter.
If I first replace the & with _ or some other character, the HTML is appended just fine.
The query is created with.
$fd = FluentDOM::QueryCss($output, 'text/html');
The document is then extracted and passed on
$document = $fd->getDocument();
The document is then used for the actual appending
try {
FluentDOM::QueryCss($document)->find('button')->parent()->after($MY_HTML_STRING);
}
catch(\Exception $e) {
die($e->getMessage());
}
A quick test can be done with something like
FluentDOM::QueryCss($document)->find('body')->append('<a href="http://www.google.com?test1=foo&test2=bar">FooBar</a>');
which fails, and
FluentDOM::QueryCss($document)->find('body')->append('<a href="http://www.google.com">FooBar</a>');
which works fine.
I can add that the Symfony CSS selector is used right now. I'm not sure if it's the same for the other ones, but I guess the "issue" is not with the selector but deeper in the library.
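A bare & is not well-formed XML, so if the string is parsed as an XML fragment (an assumption here, not confirmed in the report), the parser rejects it. The sketch below reproduces that with plain ext/dom and shows one possible workaround; escapeBareAmpersands() is a hypothetical helper, not part of FluentDOM.

```php
<?php
/** Escape ampersands that do not already start an entity or character reference. */
function escapeBareAmpersands(string $markup): string
{
    return preg_replace(
        '/&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $markup
    );
}

$document = new DOMDocument();
$document->loadXML('<body/>');
$raw = '<a href="http://www.google.com?test1=foo&test2=bar">FooBar</a>';

// Parsing the raw string as an XML fragment fails because of the bare "&".
libxml_use_internal_errors(true);
$rejected = $document->createDocumentFragment();
$rawAccepted = $rejected->appendXML($raw);
libxml_clear_errors();

// With the ampersand escaped, the fragment parses and can be appended.
$fragment = $document->createDocumentFragment();
$escapedAccepted = $fragment->appendXML(escapeBareAmpersands($raw));
$document->documentElement->appendChild($fragment);

// The attribute value read back contains the original, unescaped "&".
$href = $document->getElementsByTagName('a')->item(0)->getAttribute('href');
```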
Please consider the following code:
$first = FluentDOM::QueryCss('<input/>');
$second = FluentDOM::QueryCss('<div></div>');
The first line works as expected.
echo $second->find(':root')->append($first->document->toHtml())->document->toHtml(); // works
echo $second->find(':root')->append($second->document->toHtml())->document->toHtml(); // fails
But the second line fails with the following exception:
Fatal error: Uncaught InvalidArgumentException: Invalid/empty content parameter. in D:\www\www\lab\vendor\fluentdom\fluentdom\src\FluentDOM\Nodes\Builder.php:108
Stack trace:
#0 D:\www\www\lab\vendor\fluentdom\fluentdom\src\FluentDOM\Query.php(242): FluentDOM\Nodes\Builder->getContentNodes('<input>\n')
#1 D:\www\www\lab\vendor\fluentdom\fluentdom\src\FluentDOM\Query.php(273): FluentDOM\Query->apply(Array, '<input>\n', Object(Closure))
#2 D:\www\www\lab\vendor\fluentdom\fluentdom\src\FluentDOM\Query.php(814): FluentDOM\Query->applyToSpawn(Array, '<input>\n', Object(Closure))
#3 D:\www\www\lab\qp-test.php(8): FluentDOM\Query->append('<input>\n')
Is there any workaround? Is there a more concise way to append one QueryCss tag to another?
Add the replaceWholeText() method to FluentDOM\Text and FluentDOM\CdataSection.
https://www.w3.org/TR/DOM-Level-3-Core/core.html#Text3-replaceWholeText
If the FluentDOM\Query instance is in html mode (content type) treat the provided fragment string as HTML fragments, not XML fragments.
Hi,
I have a complex system that performs different kinds of handling on a DOM document using PHP's DOMDocument. For one part, I have chosen FluentDOM to handle only a special part of the document (a large element). Is it possible to load FluentDOM with a DOMNode object?
Right now we can only load a whole document, but for performance reasons I don't want to reload it again. It would be great if I could pass that special DOMNode to FluentDOM.
Something like this is what we do in jquery, where "element" can be a jquery object or a DOM object:
$(element).text('foo bar');
When, for example, $dom->text($text) is invoked while $text=null, the method acts as a getter although it is intended to be a setter (following jQuery). This is also true for attr('foo',null) and others. Sending null and sending no parameter need to be distinguished. I think public function html($html = NULL) {} needs to be converted to public function html() {}, with the arguments fetched by the func_get_args function.
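The distinction between "no argument" and "null argument" can be sketched like this. TextHolder is a hypothetical stand-in class, not FluentDOM code, and it uses func_num_args() rather than func_get_args() because only the argument count is needed.

```php
<?php
final class TextHolder
{
    private string $text = 'initial';

    /** Getter when called without arguments, setter otherwise (even for null). */
    public function text(?string $value = null)
    {
        if (func_num_args() === 0) {
            return $this->text;
        }
        // An explicit null is treated as "set to empty string".
        $this->text = (string)$value;
        return $this;
    }
}

$holder = new TextHolder();
$before = $holder->text();            // getter: no argument passed
$after = $holder->text(null)->text(); // setter with null, then getter
```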
Try to support ArrayAccess in FluentDOM\Element. If the key is an integer or a string of digits, the child node should be returned. If it is a string, return the attribute.
$node[42] is $node->childNodes->item(42)
$node['id'] is $node->getAttribute('id')
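The proposed key handling can be sketched as a wrapper around DOMElement. ElementAccess is a hypothetical class for illustration only (the proposal targets FluentDOM\Element itself), and the syntax assumes PHP 8.0 or newer.

```php
<?php
final class ElementAccess implements ArrayAccess
{
    public function __construct(private DOMElement $element) {}

    /** Integers and digit-only strings address child nodes, other keys attributes. */
    private function isIndex(mixed $offset): bool
    {
        return is_int($offset)
            || (is_string($offset) && preg_match('/^[0-9]+$/', $offset) === 1);
    }

    public function offsetExists(mixed $offset): bool
    {
        return $this->isIndex($offset)
            ? $this->element->childNodes->item((int)$offset) !== null
            : $this->element->hasAttribute($offset);
    }

    public function offsetGet(mixed $offset): mixed
    {
        return $this->isIndex($offset)
            ? $this->element->childNodes->item((int)$offset)
            : $this->element->getAttribute($offset);
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null || $this->isIndex($offset)) {
            throw new LogicException('Only attributes can be set in this sketch.');
        }
        $this->element->setAttribute($offset, (string)$value);
    }

    public function offsetUnset(mixed $offset): void
    {
        if (!$this->isIndex($offset)) {
            $this->element->removeAttribute($offset);
        }
    }
}

$document = new DOMDocument();
$document->loadXML('<ul id="list"><li>a</li><li>b</li></ul>');
$node = new ElementAccess($document->documentElement);

$secondChild = $node[1]->textContent;
$id = $node['id'];
```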
A loader that loads HTML fragments, not adding html and body automatically. It might be possible to extend the HTML loader that way.
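Until such a loader exists, one common approach with plain ext/dom is to let the parser add the wrappers and then serialize only the children of body. The helper name htmlFragmentToString() is hypothetical, and the sketch assumes ASCII input with element nodes at the top level.

```php
<?php
function htmlFragmentToString(string $fragment): string
{
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    // The parser wraps the fragment in html and body elements.
    $document->loadHTML($fragment);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $document->getElementsByTagName('body')->item(0);
    $result = '';
    if ($body !== null) {
        // Serialize each child of body, leaving out the wrappers and the doctype.
        foreach ($body->childNodes as $child) {
            $result .= $document->saveHTML($child);
        }
    }
    return $result;
}

$output = htmlFragmentToString('<p>one</p><p>two</p>');
echo $output, "\n";
```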
Add a loader for JSONx. This loader would convert JSONx into JsonDOM, allowing easier Xpath expressions.
<json:object>
<json:string name="ticker">IBM</json:string>
</json:object>
would be converted to:
<json:json>
<ticker>IBM</ticker>
</json:json>
If there is a loader, it would make sense to add a serializer, too, so you can save the loaded file in its original format.
Hello,
as the title says, how do I do that?
I tried doing something like this:
echo FluentDOM($request->getResponseText())
->find('//title')
->text();
where $request->getResponseText() is the response returned from curl, but it gives me errors:
Warning: DOMDocument::loadXML(): Entity 'eacute' not defined in Entity, line: 6033 in ..vendor\fluentdom\fluentdom\src\FluentDOM\Loader\Xml.php on line 38
Thanks :)
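The warning above comes from the XML loader, which does not know HTML entities such as &eacute;. A sketch with plain ext/dom, assuming the response is an HTML page, shows that the HTML parser resolves the entity; with FluentDOM this would presumably correspond to passing 'text/html' as the content type, as done elsewhere on this page.

```php
<?php
$html = '<html><head><title>Caf&eacute;</title></head><body></body></html>';

// Buffer parser messages instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$document = new DOMDocument();
$document->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($document);
// The HTML parser knows the entity and yields U+00E9 in the text content.
$title = $xpath->evaluate('string(//title)');

echo $title, "\n";
```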
I do appreciate this excellent package. Could you please provide some info in the docs on choosing between Carica/PhpCss and Symfony/CssSelector?
Back in the day I used phpQuery for altering rendered pages just before they are sent to the client. Since that project has been quiet for a long time, I decided to look for something else when I needed the same functionality again. I found a few candidates, and FluentDOM was one of them. I tested it along with the two officially supported CSS selectors.
I decided to start by benchmarking the same functionality in FluentDOM and phpQuery and was shocked to see that initialization, which took a few ms with phpQuery, took almost 200 ms with FluentDOM.
$query = FluentDOM::QueryCss($html);
After a while I figured out that there was a large overhead when it tried to figure out what content it was given that I could get rid of by specifying that it was html I was feeding it with. So if I instead used
$fd = FluentDOM::QueryCss($html, 'text/html');
the time for initialization was on par with phpQuery.
So I started updating some old code that used phpQuery and everything went smoothly. But a few times I ran into a minor issue where it would complain about tag mismatches etc. I was confused since HTML, contrary to XML, is very loose with this stuff. But then I noticed that the supplied HTML was handled internally as XML, which caused these "errors" when, for instance, doing append operations.
One part of me likes that it complains, so I can spot any issue and fix it. But another part of me feels it's a bit confusing to be able to specify HTML and yet have it tested against the XML rules.
Is this by design, a mistake, a bug, or just a side effect of libxml and the other underlying libraries used?
Implement a FluentDOM::attr property that allows getting and setting attributes using array syntax.
Examples:
$fd->attr['foo'] = 'bar';
$fd->attr = array('foo' => 'bar', 'bar' => 'foo');
$value = $fd->attr['foo'];
A loader for YAML files maybe based on an existing library. It should convert it into a JsonDOM representation.
$fd = FluentDOM::load($source, 'text/yaml');
A method that duplicates/clones the current FluentDOM object, its document, namespaces and loaders.
If no source argument is provided, it will copy the references to the matched nodes, too. If a source is provided, it will load it.
It would be possible to add query selectors to the extended DOM classes, but it would require a CSS selector library and I am not sure it is needed.
XPath is a lot more powerful and Query selectors do not support XML namespaces (by definition).
On the other hand, FluentDOM already supports CSS selectors for the FluentDOM\Query class.
setAttributeNodeNS() actually behaves differently from setAttributeNode(). Think about redefining the behavior or at least documenting it:
$dom = new DOMDocument();
$dom->formatOutput = TRUE;
$dom->appendChild($dom->createElement('element'));
$dom->documentElement->setAttributeNS('urn:foo', 'foo:attribute', 42);
$attribute = $dom->createAttributeNS('urn:bar', 'bar:attribute');
$attribute->value = 21;
$dom->documentElement->setAttributeNode($attribute);
echo $dom->saveXml();
$dom = new DOMDocument();
$dom->formatOutput = TRUE;
$dom->appendChild($dom->createElement('element'));
$dom->documentElement->setAttributeNS('urn:foo', 'foo:attribute', 42);
$attribute = $dom->createAttributeNS('urn:bar', 'bar:attribute');
$attribute->value = 21;
$dom->documentElement->setAttributeNodeNS($attribute);
echo $dom->saveXml();
Output
<?xml version="1.0"?>
<element xmlns:foo="urn:foo" xmlns:bar="urn:bar" bar:attribute="21"/>
<?xml version="1.0"?>
<element xmlns:foo="urn:foo" xmlns:bar="urn:bar" foo:attribute="42" bar:attribute="21"/>
Allow to register/inject fragment loaders that are used to parse string arguments for methods like FluentDOM\Query::append() depending on the content type. Allow the current loader to register itself for this, too.
DOMDocument::saveXml() (and saveHtml()) allow a node as argument. Here is a ticket in the PHP Bugtracker that suggests to allow node lists as well.
It should be possible to implement this in FluentDOM\Document without waiting for the PHP implementation.
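In userland this amounts to saving each node and concatenating the results. The function saveNodeList() below is a hypothetical sketch with plain ext/dom, not the FluentDOM\Document API.

```php
<?php
/** Serialize every node of a list with saveXML() and concatenate the results. */
function saveNodeList(DOMDocument $document, iterable $nodes): string
{
    $result = '';
    foreach ($nodes as $node) {
        $result .= $document->saveXML($node);
    }
    return $result;
}

$document = new DOMDocument();
$document->loadXML('<r><a>1</a><b/><a>2</a></r>');
$xpath = new DOMXPath($document);

$xml = saveNodeList($document, $xpath->query('//a'));
echo $xml, "\n"; // <a>1</a><a>2</a>
```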
Since FluentDOM uses DOMDocument as its HTML parser, it suffers from a limitation of DOMDocument: any ETAGO contained within a SCRIPT tag will prematurely end the script block, causing the script to fail. For example, the following block:
<script type="text/template" id="tmpl-variation-template">
<div class="woocommerce-variation-description">
{{{ data.variation.variation_description }}}
</div>
</script>
will be transformed by FluentDOM into:
<script type="text/template" id="tmpl-variation-template">
<div class="woocommerce-variation-description">
{{{ data.variation.variation_description }}}
</script>
</div>
This issue is discussed in detail at the following URLs:
http://stackoverflow.com/questions/4029341/dom-parser-that-allows-html5-style-in-script-tag
https://mathiasbynens.be/notes/etago
I'm wondering if there's any way you could extend FluentDOM\Document to work around this DOMDocument limitation and handle this properly?
PHPUnit_Framework_Testcase::getMock() is deprecated.
I can't find any information in the README where to report security vulnerabilities. Please add a section with security contact information.
Why is this piece of code 6x slower with FluentDOM 5.x compared to 4.x ? (HHVM or not)
$fd = FluentDOM($html, 'text/html');
$r = array();
foreach ($fd->find("//tr[@class='product']") as $fd_child)
{
$rr = array();
$rr['imgsrc'] = $fd_child->find("td[@class='image']//img")->attr("src");
$h3 = $fd_child->find("td[@class='specs']//h3");
$rr['url'] = $h3->find("a")->attr("href");
$rr['title'] = $h3->text();
$rr['desc'] = $fd_child->find("td[@class='specs']")->xml();
$rr['price'] = $fd_child->find("td[@class='purchase-info']//span[@itemprop='price']")->text();
$rr['savings'] = $fd_child->find("td[@class='purchase-info']//p[@class='savings']")->text();
$r[] = $rr;
}
Hi,
I'm writing a WordPress plugin and would like to use FluentDOM with the Selectors-Symfony selector in it.
I don't know much about composer so I would like to avoid using it to install the whole thing.
So I downloaded FluentDOM and Selectors-Symfony.
I extracted them like this :
[..]/lib/FluentDOM-master
[..]/lib/Selectors-Symfony-master
and I'm loading FluentDOM like this :
if (!class_exists('FluentDOM')) require_once([..]/lib/FluentDOM-master/src/FluentDOM.php');
But I'm not sure of the place where I extracted Selectors-Symfony, and I don't know how to "register" it.
Got PHP Fatal error: Interface 'FluentDOM\Node\QuerySelector' not found in...
When trying to run
$fd = FluentDOM::load($htmlstring, 'text/html');
Could you help me ?
Thanks !
Hi,
I pass nodes to different classes for special handling, and I want each class to check whether the correct tag type has been provided. I can check whether the required attributes are there with hasAttr, but how do I check whether the correct tag is provided in the query?
I think it is possible to get tag name by using the dom object associated with the node but that would somehow kill the purpose. How about adding a function for this?
Thanks
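On the underlying DOM node, such a check is short. The helper isTag() is hypothetical and uses plain ext/dom; comparing case-insensitively is an assumption that suits HTML but may be wrong for XML vocabularies.

```php
<?php
/** Check whether a node is an element with the given local name. */
function isTag(DOMNode $node, string $tagName): bool
{
    return $node instanceof DOMElement
        && strcasecmp($node->localName, $tagName) === 0;
}

$document = new DOMDocument();
$document->loadXML('<div><span/>text</div>');
$root = $document->documentElement;

$rootIsDiv = isTag($root, 'div');
$childIsDiv = isTag($root->firstChild, 'div');
$textIsDiv = isTag($root->lastChild, 'div'); // text nodes never match
```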
Implement a FluentDOMStyle::css property that allows getting and setting CSS style properties using array syntax.
$border = $fd->css['border'];
$fd->css['border'] = 'none';
$fd->css = array('border' => 'none', 'color' => '#000');
This would be syntax sugar for the FluentDOMStyle::css method.
// FluentDOM
$first = FluentDOM::QueryCss('<input/>');
$second = FluentDOM::QueryCss('<div></div>');
echo $second->find(':root')->append($first->find(":root"))->document->toHtml();
// jQuery
$('<div></div>').append('<input/>');
The PHP version could be as simple as the JS version.
Import VCard 4.0 to its XML representation.
This shares a lot of logic with the iCalendar loader/format