xmlr

An XML DOM package for R, implemented using Reference Classes.

The JDOM project (www.jdom.org) provided much of the initial inspiration for the API, but there are several differences, mainly:

  • Attributes: an attribute is not a separate class but just a name/value entry in the named list of attributes held by an Element.
  • Namespaces: whereas in JDOM a namespace is a special class that exists as a specific object attribute, xmlr takes a "simpler" approach: namespace declarations are just another element attribute, and namespace prefixes are part of the element name. This might change in the future, but for now this is how it is done.
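To illustrate the two points above, here is a minimal sketch of a prefixed document. It uses only the constructors and methods that appear elsewhere in this README; the prefix "h" and the variable names are made up for the example, and the exact printed form is not shown because it is not confirmed here.

```r
library(xmlr)

# The namespace prefix is simply part of the element name
root <- Element$new("h:table")

# The namespace declaration is an ordinary attribute (a name/value pair)
root$setAttribute("xmlns:h", "http://www.w3.org/TR/html4/")

root$addContent(
  Element$new("h:tr")
    $addContent(Element$new("h:td")$setText("Apples"))
    $addContent(Element$new("h:td")$setText("Bananas"))
)

doc <- Document$new()
doc$setRootElement(root)

# Prints the document as a single compact line of XML
print(doc)
```

Because the prefix lives in the name itself, nothing ties "h:td" to the "xmlns:h" declaration; keeping the two consistent is up to the caller.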

You can create an xmlr Document programmatically or by parsing text or a file.

Creating the DOM programmatically

To create the following XML:

<table xmlns='http://www.w3.org/TR/html4/'>
    <tr>
        <td>Apples</td>
        <td>Bananas</td>
    </tr>
</table>

You could do something like this:

  doc <- Document$new()
  root <- Element$new("table")
  root$setAttribute("xmlns", "http://www.w3.org/TR/html4/")

  root$addContent(
    Element$new("tr")
      $addContent(Element$new("td")$setText("Apples"))
      $addContent(Element$new("td")$setText("Bananas"))
  )
  doc$setRootElement(root)

Or you could parse it from a string:

doc2 <- parse.xmlstring("
<table xmlns='http://www.w3.org/TR/html4/'>
    <tr>
        <td>Apples</td>
        <td>Bananas</td>
    </tr>
</table>")

Note that there is no pretty printing available (yet), so if you print it with print(doc2) it will look like this:

> print(doc2)
<table xmlns='http://www.w3.org/TR/html4/'><tr><td>Apples</td><td>Bananas</td></tr></table>
> 
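Until pretty printing is available, compact output like the above can be re-indented with a few lines of base R. The following is only a sketch: pretty_xml is a hypothetical helper, not part of xmlr. It assumes the input contains nothing but elements, attributes and text (no comments, CDATA, processing instructions or XML declaration) and that no attribute value contains a ">" character.

```r
# Hypothetical helper, not part of xmlr: re-indent a compact XML string.
pretty_xml <- function(xml, indent = "    ") {
  # Split the string into tags ("<...>") and runs of text between them
  tokens <- regmatches(xml, gregexpr("<[^>]+>|[^<]+", xml))[[1]]
  # Drop whitespace-only text between tags
  tokens <- tokens[nzchar(trimws(tokens))]

  lines <- character(0)
  depth <- 0
  i <- 1
  while (i <= length(tokens)) {
    tok <- tokens[i]
    is_close <- startsWith(tok, "</")
    is_self <- grepl("/>$", tok)
    is_open <- startsWith(tok, "<") && !is_close && !is_self

    # A closing tag is printed one level shallower than its content
    if (is_close) depth <- max(depth - 1, 0)
    pad <- strrep(indent, depth)

    # Keep <tag>text</tag> together on a single line
    if (is_open && i + 2 <= length(tokens) &&
        !startsWith(tokens[i + 1], "<") &&
        startsWith(tokens[i + 2], "</")) {
      lines <- c(lines, paste0(pad, tok, tokens[i + 1], tokens[i + 2]))
      i <- i + 3
      next
    }

    lines <- c(lines, paste0(pad, trimws(tok)))
    if (is_open) depth <- depth + 1
    i <- i + 1
  }
  paste(lines, collapse = "\n")
}

compact <- "<table xmlns='http://www.w3.org/TR/html4/'><tr><td>Apples</td><td>Bananas</td></tr></table>"
cat(pretty_xml(compact), "\n")
```

For this input the result has the same layout as the indented XML shown at the start of this section. The helper works on a plain string, so it has no dependency on xmlr itself.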

Using xmlr

xmlr is published both to CRAN for use in GNU R and to Maven Central for use in Renjin. For CRAN it is as simple as install.packages('xmlr') followed by library('xmlr'). For Renjin, add the following to your pom.xml:

<dependency>
  <groupId>se.alipsa</groupId>
  <artifactId>xmlr</artifactId>
  <version>0.2.1</version>
</dependency>

...and then library('se.alipsa:xmlr')

Limitations

Processing instructions, custom entity references, and comments are not yet supported.

Any well-formed XML, including CDATA, comments, processing instructions etc., can still be parsed; only elements, attributes and text are retained.

CDATA can be read from a string or file but is handled as ordinary text after that, i.e. the output might not be valid XML.
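The limitations above can be tried out with a small input that mixes a comment and a CDATA section. This is a sketch using only parse.xmlstring and print as shown earlier; the element names are invented for the example, and the printed result is deliberately not reproduced here since it depends on how the text is serialized.

```r
library(xmlr)

# Input with a comment and a CDATA section (element names are illustrative)
doc <- parse.xmlstring(
  "<root><!-- a comment --><item><![CDATA[a < b]]></item></root>"
)

# Per the limitations above, the comment is expected to be dropped and the
# CDATA content kept as ordinary text, so inspect the output before
# treating it as valid XML.
print(doc)
```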

There are probably issues (memory, performance) with very large XML trees.

Why xmlr?

I had problems installing the XML package on one of my machines (some gcc issue), so I needed an alternative. As I had been thinking about doing something more comprehensive with Reference Classes, I decided to create a pure base-R DOM model with some simple ways to do input and output to strings and files. As it turned out to be useful to me, I thought it might be useful to others as well, so I decided to open-source and publish it.

xmlr's People

Contributors

pernyfelt

xmlr's Issues

Pretty printing

{htmltidy} (https://github.com/hrbrmstr/htmltidy) has a widget-based pretty printer.

The underlying libtidy that it uses (http://api.html-tidy.org/tidy/quickref_next.html#PrettyPrintHeader) can pretty-print/reformat XML for text-mode output.

I could either add explicit textual pretty-print functions to it that you could then use (it's on CRAN, and could be made a Suggests with a namespace check, so pretty printing works when it is also loaded without creating a hard dependency), or PR into here with the libtidy code and just the text pretty printing (unless you also want the widget).

For console output, checking for "being in RStudio" should be "a thing", since one can still DoS RStudio with too much text output (yay using a browser DOM for text rendering).

Give it a ponder and lemme know.
