note / xml-lens
XML Optics library for Scala
Home Page: https://note.github.io/xml-lens/
License: MIT License
This lib looks great!
But I'm a bit stuck on how to do something that I'd expected to be straightforward. Maybe I just don't know enough about optics. What I want is to select only the elements that have an attribute with a given value.
So given this example:
val xml =
"""
|<a>
| <b>
| <c example="1">1234</c>
| <c example="2">5678</c>
| <c example="3">9123</c>
| </b>
|</a>
""".stripMargin
I'd be after only the <c> that has an example attribute with value 2. Using this expression gives me all of the <c> elements:
val c = root \ "b" \ "c"
println(pl.msitko.xml.parsing.XmlParser.parse(xml).map(c.getAll))
//prints:
//Right(List(Element(Vector(Attribute(ResolvedName(,,example),1)),List(Text(1234)),Vector()), Element(Vector(Attribute(ResolvedName(,,example),2)),List(Text(5678)),Vector()), Element(Vector(Attribute(ResolvedName(,,example),3)),List(Text(9123)),Vector())))
I've tried using having, something like this:
val c = root \ "b" \ "c" having {
case LabeledElement(_, Element(attr, _, _)) => attr.find(_.key.localName == "example").exists(_.value == "1")
}
but that seems to only pass child elements of <c> to the partial function, which doesn't give me a chance to inspect the attributes.
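As a stopgap, one option is to filter the result of getAll with ordinary collection operations instead of inside the optic. The sketch below is only an illustration: the case classes are stand-ins that mirror the shapes visible in the printed output above (they are not imported from xml-lens), and hasAttr and mkC are hypothetical helper names.

```scala
object AttrFilterSketch {
  // Stand-in types mirroring the shapes in the printed output above.
  // Assumption: these are NOT the xml-lens classes, only look-alikes.
  final case class ResolvedName(prefix: String, uri: String, localName: String)
  final case class Attribute(key: ResolvedName, value: String)
  final case class Text(text: String)
  final case class Element(
    attributes: Seq[Attribute],
    children: Seq[Text],
    namespaceDeclarations: Seq[String]
  )

  // True when the element carries an attribute `name` whose value is `expected`.
  def hasAttr(el: Element, name: String, expected: String): Boolean =
    el.attributes.exists(a => a.key.localName == name && a.value == expected)

  // Hypothetical helper that builds one <c example="n">body</c> element.
  def mkC(n: String, body: String): Element =
    Element(Vector(Attribute(ResolvedName("", "", "example"), n)), List(Text(body)), Vector.empty)

  // Plays the role of the list returned by c.getAll in the example above.
  val allCs: List[Element] = List(mkC("1", "1234"), mkC("2", "5678"), mkC("3", "9123"))

  // Keep only the elements whose example attribute equals "2".
  val onlySecond: List[Element] = allCs.filter(hasAttr(_, "example", "2"))
}
```

This only covers reading. It does not give an optic that can modify just the matching element, which is what the question is really after.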
Is it possible to have some kind of interoperability with XML literals?
At some point we will want reasonable output. Beyond pure formatting, it would be nice to, for example, avoid multiple namespace declarations for the same namespace. Probably all namespace declarations should be moved to the root element.
Such operations should be optional: there may be cases where a user wants to avoid unnecessary transformations and keep the output as similar to the input as possible.
There's an example of such behavior (namely, many namespace declarations for one namespace) in the test replaceOrAddAttr for ResolvedNameMatcher in OpticsBuilderSpec.
On the other hand, I'm not sure whether any values other than 1.0 and utf-8 are in practical use...
There are already some simple tests, but they're very synthetic. They're useful in that they let us easily find the bottleneck. Besides them, we should have tests mimicking real-world usage: doing some transformations on real-world XML and operating on fairly big files (a few MBs may also be interesting).
It would be nice to add the test results to the docs (probably in a separate MD file, so as not to clutter the main docs).
In case you're interested:
https://gitlab.com/fommil/scalaz-deriving/tree/master/examples/xmlformat/src/main/scala/xmlformat
You might be interested in the XNode.scala file, and from there the encoders/decoders and the parsers/printers.
It's not obvious how equality should be implemented.
Probably related to #5
When replacingEntityReferences is enabled, several Text nodes in a row may appear. In theory javax.xml.stream.isCoalescing should control this behavior, but unfortunately, while setting it to true solves that issue, it has some unexpected side effects: namely, EntityReferences are not parsed if replacingEntityReferences is set to true. It may seem that we could set isCoalescing only when replacing... is set to true, but that will not work, as it also causes CData not to be parsed. It's described here: https://docs.oracle.com/cd/E17802_01/webservices/webservices/docs/1.5/sjsxp/ReleaseNotes.html
To avoid relying on strange behaviors of Java parsers, I think xml-lens should provide coalescing functionality itself, either as part of the parser or as post-processing (the same way minimize is done).
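A post-processing pass of that kind could look roughly like the sketch below. It is a minimal illustration under stated assumptions: the Node hierarchy is a hypothetical, simplified AST (not the xml-lens one), and coalesce is an illustrative name, not an existing function in the library.

```scala
object CoalesceSketch {
  // Hypothetical minimal AST, used only for this illustration.
  sealed trait Node
  final case class Text(text: String) extends Node
  final case class EntityReference(name: String) extends Node
  final case class Element(label: String, children: List[Node]) extends Node

  // Merge runs of adjacent Text nodes into one, recursing into child elements.
  // Other node kinds (e.g. EntityReference) are left as-is and break a run.
  def coalesce(nodes: List[Node]): List[Node] =
    nodes.foldRight(List.empty[Node]) {
      case (Text(a), Text(b) :: rest) => Text(a + b) :: rest
      case (Element(l, cs), acc)      => Element(l, coalesce(cs)) :: acc
      case (other, acc)               => other :: acc
    }
}
```

Because the pass runs on the already parsed tree, it does not depend on how a particular StAX implementation interprets isCoalescing.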
Possibly related to #7
We find that two parsed ASTs often don't compare equal because the attributes/namespace declarations are in a different order. The order of attributes should be irrelevant - https://www.w3.org/TR/REC-xml/#sec-starttags
This seems to be caused by the attributes/namespace declarations being stored in a Seq:
final case class Element(attributes: Seq[Attribute] = Seq.empty, children: Seq[Node] = Seq.empty, namespaceDeclarations: Seq[NamespaceDeclaration] = Seq.empty)
Could this be solved by using a Map? For instance:
final case class Element(attributes: Map[ResolvedName, String] = Map.empty, children: Seq[Node] = Seq.empty, namespaceDeclarations: Map[String, String] = Map.empty)
I guess this loses the use of the explicit Attribute and NamespaceDeclaration types, but it is worth the tradeoff IMHO.
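An alternative that keeps the Seq-based representation is to compare attributes through a Map view only when checking equality. The sketch below uses stand-in case classes shaped like the ones quoted above; sameAttributes is a hypothetical helper name, not part of xml-lens.

```scala
object AttrOrderSketch {
  // Stand-ins shaped like the types quoted above; assumptions, not library code.
  final case class ResolvedName(prefix: String, uri: String, localName: String)
  final case class Attribute(key: ResolvedName, value: String)

  // Compare two attribute sequences as name -> value maps, ignoring order.
  // Note: if a name is repeated, toMap keeps only its last value; well-formed
  // XML does not allow repeated attribute names on one element.
  def sameAttributes(a: Seq[Attribute], b: Seq[Attribute]): Boolean =
    a.size == b.size &&
      a.map(x => x.key -> x.value).toMap == b.map(x => x.key -> x.value).toMap
}
```

The same idea would apply to namespace declarations. The trade-off is that equality then has to be a custom method rather than the case-class default.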
Entity references and PCData, among others, are missing.
Ideas for additional options in PrinterConfig:
Handling of repeated attributes (e.g. <a attr="val1" attr="val2"></a>). Example behaviors: ignore it and print all of them, flatten them by concatenating them separated by spaces, use the last value, use the first value.
Add appropriate package private modifiers etc.
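The repeated-attribute behaviors listed for PrinterConfig could be modeled as a small policy type. This is only a sketch under assumptions: DuplicateAttrPolicy and resolve are hypothetical names, attributes are simplified to (name, value) pairs, and nothing here reflects an existing PrinterConfig field.

```scala
object DuplicateAttrSketch {
  // Hypothetical option type; not an existing part of PrinterConfig.
  sealed trait DuplicateAttrPolicy
  case object PrintAll extends DuplicateAttrPolicy
  case object ConcatWithSpaces extends DuplicateAttrPolicy
  case object UseFirst extends DuplicateAttrPolicy
  case object UseLast extends DuplicateAttrPolicy

  // Attributes are simplified to (name, value) pairs.
  // Names keep the order of their first occurrence.
  def resolve(attrs: List[(String, String)], policy: DuplicateAttrPolicy): List[(String, String)] =
    policy match {
      case PrintAll => attrs // leave repeated attributes untouched
      case other =>
        attrs.map(_._1).distinct.map { name =>
          // All values recorded for this name, in document order (never empty).
          val values = attrs.collect { case (n, v) if n == name => v }
          val picked = other match {
            case ConcatWithSpaces => values.mkString(" ")
            case UseFirst         => values.head
            case _                => values.last // UseLast
          }
          name -> picked
        }
    }
}
```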