xml-lens's Issues

How would I focus on only the instances of an element which have a particular attribute value?

This lib looks great!

But I'm a bit stuck on how to do something that I'd expected to be straightforward. Maybe I just don't know enough about optics. What I want is to select only the elements that have an attribute with a given value.

So given this example:

val xml =
  """
    |<a>
    |  <b>
    |    <c example="1">1234</c>
    |    <c example="2">5678</c>
    |    <c example="3">9123</c>
    |  </b>
    |</a>
  """.stripMargin

I'd be after only the <c> that has an example attribute with value 2. Using this expression gives me all of the <c> elements:

val c = root \ "b" \ "c"
println(pl.msitko.xml.parsing.XmlParser.parse(xml).map(c.getAll))

//prints:
//Right(List(Element(Vector(Attribute(ResolvedName(,,example),1)),List(Text(1234)),Vector()), Element(Vector(Attribute(ResolvedName(,,example),2)),List(Text(5678)),Vector()), Element(Vector(Attribute(ResolvedName(,,example),3)),List(Text(9123)),Vector())))

I've tried using having, something like this:

val c = root \ "b" \ "c" having {
  case LabeledElement(_, Element(attr, _, _)) => attr.find(_.key.localName == "example").exists(_.value == "2")
}

but that seems to only pass child elements of <c> to the partial function, which doesn't give me a chance to inspect the attributes.
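Until there is an optic for this, one possible workaround is to filter the result of getAll after the fact. The sketch below is only an illustration: the case classes are simplified stand-ins that mirror the shapes in the printed output above, not the real xml-lens types, and hasAttr is a hypothetical helper. It only reads values, so it does not help if the selected element also needs to be modified.

```scala
// Simplified stand-ins for the AST shapes visible in the printed output above.
// These are NOT the xml-lens types; they only mirror their field layout.
object AttributeFilterSketch {
  final case class ResolvedName(prefix: String, uri: String, localName: String)
  final case class Attribute(key: ResolvedName, value: String)
  final case class Text(text: String)
  final case class Element(attributes: Seq[Attribute], children: Seq[Text])

  // Hypothetical helper: true when the element has an attribute `name` with value `expected`.
  def hasAttr(name: String, expected: String)(el: Element): Boolean =
    el.attributes.exists(a => a.key.localName == name && a.value == expected)

  // Builds one <c example="..."> element for the sample data.
  private def c(example: String, body: String): Element =
    Element(Vector(Attribute(ResolvedName("", "", "example"), example)), List(Text(body)))

  // Stand-in for the list returned by `c.getAll` in the question.
  val allCs: List[Element] = List(c("1", "1234"), c("2", "5678"), c("3", "9123"))

  // Keep only the <c> whose `example` attribute is "2".
  val selected: List[Element] = allCs.filter(hasAttr("example", "2"))
}
```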

Normalization of XML

At some point we will want to have reasonable output. Beyond pure formatting, it would be nice to, e.g., avoid multiple namespace declarations for the same namespace. Probably all namespace declarations should be moved to the root element.

Such operations should be optional. There may be cases where the user wants to avoid unnecessary transformations and keep the output as similar to the input as possible.

There's an example of such behavior (namely, many namespace declarations for one namespace) in the test replaceOrAddAttr for ResolvedNameMatcher in OpticsBuilderSpec.

Performance tests mimicking real usage

There are already some simple tests but they're very synthetic. They're useful in the sense that they allow us to easily find what the bottleneck is. Besides them, we should have tests mimicking real-world usage: doing some transformations on real-world XML and operating on fairly big files (a few MBs may also be interesting).

It would be nice to add the test results to the docs (probably in a separate MD file so as not to clutter the main docs).

Equivalent to `javax.xml.stream.isCoalescing` in `XmlParser`

When replacingEntityReferences is enabled, several Text nodes may appear in a row. In theory javax.xml.stream.isCoalescing should control this behavior. Unfortunately, while setting it to true solves that issue, it has some unexpected side effects: namely, EntityReferences are not parsed if replacingEntityReferences is set to true. It may seem that we could set isCoalescing only when replacing... is set to true, but that will not work as it also causes CData not to be parsed. It's described here: https://docs.oracle.com/cd/E17802_01/webservices/webservices/docs/1.5/sjsxp/ReleaseNotes.html

To avoid relying on strange behaviors of Java parsers, I think xml-lens should provide coalescing functionality itself, either as part of the parser or as post-processing (the same way minimize is done).
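A rough idea of what the post-processing variant could look like is sketched below. The Node, Text and Element types are simplified stand-ins rather than the real xml-lens AST, and coalesce is a hypothetical name. It merges each run of adjacent Text nodes into one and recurses into child elements.

```scala
object CoalesceSketch {
  // Simplified stand-ins for the AST; not the xml-lens types.
  sealed trait Node
  final case class Text(text: String) extends Node
  final case class Element(label: String, children: List[Node]) extends Node

  // Merge every run of adjacent Text nodes into a single Text,
  // applying the same rule to the children of each Element.
  def coalesce(nodes: List[Node]): List[Node] =
    nodes.foldRight(List.empty[Node]) {
      case (Text(a), Text(b) :: rest) => Text(a + b) :: rest
      case (Element(l, cs), acc)      => Element(l, coalesce(cs)) :: acc
      case (other, acc)               => other :: acc
    }
}
```

Doing this as a separate pass keeps the parser settings untouched, at the cost of one extra traversal of the tree.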

`Element`s don't compare equal when attributes are in different order

Possibly related to #7

We find that two parsed ASTs often don't compare equal because the attributes/namespace declarations are in a different order. The order of attributes should be irrelevant - https://www.w3.org/TR/REC-xml/#sec-starttags

This seems to be caused by the attributes/namespace declarations being stored in a Seq:

final case class Element(attributes: Seq[Attribute] = Seq.empty, children: Seq[Node] = Seq.empty, namespaceDeclarations: Seq[NamespaceDeclaration] = Seq.empty)

Could this be solved by using a Map? For instance:

final case class Element(attributes: Map[ResolvedName, String] = Map.empty, children: Seq[Node] = Seq.empty, namespaceDeclarations: Map[String, String] = Map.empty)

I guess this loses the use of the explicit Attribute and NamespaceDeclaration types, but it is worth the tradeoff IMHO.
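To make the difference concrete, here is a small self-contained sketch. SeqElement and MapElement are cut-down, hypothetical versions of the two Element definitions above, with only the attributes field kept. It also shows a third option, sameAttributes (again a hypothetical helper), which keeps Seq storage and compares attributes as sets.

```scala
object AttributeOrderSketch {
  final case class ResolvedName(prefix: String, uri: String, localName: String)
  final case class Attribute(key: ResolvedName, value: String)

  // Cut-down versions of the two proposed Element shapes (attributes only).
  final case class SeqElement(attributes: Seq[Attribute])
  final case class MapElement(attributes: Map[ResolvedName, String])

  private val id  = ResolvedName("", "", "id")
  private val cls = ResolvedName("", "", "class")

  // Same attributes, different order: Seq equality is order-sensitive.
  val seqA = SeqElement(Seq(Attribute(id, "1"), Attribute(cls, "x")))
  val seqB = SeqElement(Seq(Attribute(cls, "x"), Attribute(id, "1")))

  // Same attributes, different insertion order: Map equality ignores order.
  val mapA = MapElement(Map(id -> "1", cls -> "x"))
  val mapB = MapElement(Map(cls -> "x", id -> "1"))

  // Alternative that keeps Seq storage: compare the attributes as sets.
  // Note that this also ignores duplicated attributes.
  def sameAttributes(a: SeqElement, b: SeqElement): Boolean =
    a.attributes.toSet == b.attributes.toSet
}
```

One thing to keep in mind with the Map variant: a plain Map does not guarantee iteration order, so the printed attribute order may differ from the input.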

Add more options to PrinterConfig

Ideas for additional options in PrinterConfig:

  • add a Boolean option for repairing namespaces (i.e. automatically defining used namespaces in case they're not yet defined)
  • add an option which defines how to treat multiple attributes with the same name on one element (namely <a attr="val1" attr="val2"></a>). Example behaviors: ignore it and print all of them, flatten them by concatenating the values separated by spaces, use the last value, use the first value.
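The duplicate-attribute behaviors listed above could be modeled roughly as follows. Everything here is hypothetical: DuplicateAttributes, its four cases and resolve are names made up for this sketch, nothing like them exists in PrinterConfig yet, and attributes are represented as plain (name, value) pairs instead of the real Attribute type.

```scala
object DuplicateAttrSketch {
  // Hypothetical policy type for attributes repeated on one element.
  sealed trait DuplicateAttributes
  case object PrintAll    extends DuplicateAttributes // print every occurrence unchanged
  case object Concatenate extends DuplicateAttributes // join the values with spaces
  case object UseFirst    extends DuplicateAttributes // keep the first value only
  case object UseLast     extends DuplicateAttributes // keep the last value only

  // `attrs` holds (name, value) pairs in document order.
  def resolve(attrs: List[(String, String)], policy: DuplicateAttributes): List[(String, String)] =
    policy match {
      case PrintAll => attrs
      case _ =>
        // Emit each name once, in order of first appearance.
        attrs.map(_._1).distinct.map { name =>
          val values = attrs.collect { case (`name`, v) => v }
          val value = policy match {
            case Concatenate => values.mkString(" ")
            case UseFirst    => values.head
            case _           => values.last
          }
          name -> value
        }
    }
}
```

For <a attr="val1" attr="val2"></a>, Concatenate would give attr="val1 val2", UseFirst attr="val1" and UseLast attr="val2".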
