xsltjson's Introduction

XSLTJSON: Transforming XML to JSON using XSLT

XSLTJSON is an XSLT 2.0 stylesheet to transform arbitrary XML to JavaScript Object Notation (JSON). JSON is a lightweight data-interchange format based on a subset of the JavaScript language, and is often offered as an alternative to XML in, for example, web services. To make life easier, XSLTJSON lets you transform XML to JSON automatically.

XSLTJSON supports several different JSON output formats, from a compact output format to support for the BadgerFish convention, which allows round-trips between XML and JSON. To make things even better, it is completely free and open-source. If you do not have an XSLT 2.0 processor, you can use XSLTJSON Lite, an XSLT 1.0 stylesheet that transforms XML to the JSONML format.

Usage

There are three ways to use XSLTJSON: you can call the stylesheet from the command line, call it programmatically, or import it into your own stylesheets.

The stylesheet example below transforms any node matching my-node to JSON. If you import XSLTJSON into your stylesheet, you have to add the JSON namespace xmlns:json="http://json.org/" to your stylesheet, because all functions and templates are in that namespace. The json:generate() function takes an XML node as input, generates a JSON representation of that node and returns it as an xs:string. This is the only function you should call from your stylesheet.

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:json="http://json.org/">
    <xsl:import href="xml-to-json.xsl"/>
    <xsl:template match="my-node">
        <xsl:value-of select="json:generate(.)"/>
    </xsl:template>
</xsl:stylesheet>

If your stylesheet's sole purpose is to transform XML to JSON, it would be easier to use the xml-to-json.xsl stylesheet directly from the command line. The following line shows how to do that using Java and Saxon.

java net.sf.saxon.Transform source.xml xml-to-json.xsl

You can also call the stylesheet programmatically, but this depends heavily on your programming environment, so please consult the documentation of your programming language or XSLT processor.

Parameters

There are seven parameters to control the stylesheet. Six of them are Boolean and turned off by default (set to false()); jsonp is a string that defaults to an empty string. You can control them from the command line, from your program or from another stylesheet. Four of the parameters control the output format and are discussed in more detail in the section on output formats.

  • use-badgerfish — Use the BadgerFish convention to output JSON without XML namespaces.
  • use-rabbitfish — Output basic JSON with an @ to mark XML attributes.
  • use-rayfish — Use the Rayfish convention to output JSON without XML namespaces.
  • use-namespaces — Output XML namespaces according to the BadgerFish convention.
  • debug — Enable or disable the output of the temporary XML tree used to generate JSON. Note that turning this on invalidates the JSON output.
  • jsonp — Enable JSONP; prepend the JSON output with the given string. Defaults to an empty string.
  • skip-root — Enable or disable skipping the root element and returning only the child elements of the root. Disabled by default.

For example, to transform source.xml to BadgerFish JSON with Saxon, you would invoke the following on the command line:

java net.sf.saxon.Transform source.xml xml-to-json.xsl use-badgerfish=true()

For other options consult the Saxon manual, or your XSLT processor's documentation.

If you import the stylesheet into your own stylesheet, you can override the default parameters by redefining them. So if you want to output JSON using the BadgerFish convention, add the following parameter definition to your stylesheet:

    <xsl:param name="use-badgerfish" as="xs:boolean" select="true()"/>

You can force the creation of an array by adding the json:force-array attribute to your XML. Instead of creating two nested objects, the following example creates an object containing an array.

<list json:force-array="true" xmlns:json="http://json.org/">
  <item>one</item>
</list>

{ "list": { "item": [ "one" ] } }

The force-array attribute is not copied to the output JSON.
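A minimal Python model of the force-array behaviour, assuming (from the example above) that json:force-array="true" on an element keeps its child values in arrays even when there is only one child. It uses the standard library's xml.etree.ElementTree; the function name to_json_value is hypothetical, and this is a sketch of the documented effect, not XSLTJSON's implementation.

```python
import json
import xml.etree.ElementTree as ET

# ElementTree reports namespaced attributes as "{namespace-uri}local-name".
FORCE_ARRAY = "{http://json.org/}force-array"


def to_json_value(elem):
    """Simplified model: leaf elements become their text, other elements become objects."""
    children = list(elem)
    if not children:
        return elem.text
    force = elem.get(FORCE_ARRAY) == "true"
    grouped = {}
    for child in children:
        grouped.setdefault(child.tag, []).append(to_json_value(child))
    # A single child normally collapses to a plain value; force-array keeps the list.
    # The force-array attribute itself is never emitted, since attributes are ignored here.
    return {tag: vals if force or len(vals) > 1 else vals[0] for tag, vals in grouped.items()}


source = '<list json:force-array="true" xmlns:json="http://json.org/"><item>one</item></list>'
root = ET.fromstring(source)
print(json.dumps({root.tag: to_json_value(root)}))
```

Without the attribute, the same model returns {"item": "one"} for the list element, which is the nested-object shape the attribute is meant to avoid.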

Output formats

There are four output formats in XSLTJSON; which one to use depends on your target application. If you want the most compact JSON, use the basic output. If you want to transform XML to JSON and JSON back to XML, use the BadgerFish output. If you want something in between, you could use the RabbitFish output, which is similar to the basic version but does distinguish between elements and attributes. If you're dealing with a lot of data-centric XML, you could use the highly structured Rayfish output. All four output formats ignore XML namespaces unless the use-namespaces parameter is set to true(), in which case namespaces are created according to the BadgerFish convention.

Each format has a list of rules by which XML is transformed to JSON. All but one of the examples for these rules are taken from the BadgerFish convention website, to make comparing them easier.

Basic output (default)

The purpose of the basic output is to generate the most compact JSON possible. This is useful if you do not require round-trips between XML and JSON or if you need to send a large amount of data over a network. It borrows the $ syntax for text elements from the BadgerFish convention but attempts to avoid needless text-only JSON properties. It also does not distinguish between elements and attributes. The rules are:

  • Element names become object properties.

  • Text content of elements goes directly in the value of an object.

     <alice>bob</alice>
    

    becomes

     { "alice": "bob" }
    
  • Nested elements become nested properties.

     <alice><bob>charlie</bob><david>edgar</david></alice>
    

    becomes

     { "alice": { "bob": "charlie", "david": "edgar" } }
    
  • Multiple elements with the same name and at the same level become array elements.

    <alice><bob>charlie</bob><bob>david</bob></alice>
    

    becomes

    { "alice": { "bob": [ "charlie", "david" ] } }
    
  • Mixed content (element and text nodes) at the same level become array elements.

    <alice>bob<charlie>david</charlie>edgar</alice>
    

    becomes

    { "alice": [ "bob", { "charlie": "david" }, "edgar" ] }
    
  • Attributes go in properties.

    <alice charlie="david">bob</alice>
    

    becomes

    { "alice": { "charlie": "david", "$": "bob" } }
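To make the rules above concrete, here is a simplified Python model of the basic output, written with the standard library's xml.etree.ElementTree. It is a sketch, not the XSLTJSON stylesheet: the function names are hypothetical, all values stay strings (XSLTJSON itself also detects numbers), whitespace-only text is assumed to be ignored, and same-name siblings are grouped even when they are not adjacent, which may not match the stylesheet.

```python
import json
import xml.etree.ElementTree as ET


def has_text(s):
    return s is not None and s.strip() != ""


def basic(elem):
    """Simplified model of the basic output rules; all values stay strings."""
    children = list(elem)
    text = elem.text if has_text(elem.text) else None
    if children and (text is not None or any(has_text(c.tail) for c in children)):
        # Mixed content: text and element nodes become array items in document order.
        items = [text] if text is not None else []
        for c in children:
            items.append({c.tag: basic(c)})
            if has_text(c.tail):
                items.append(c.tail)
        return items
    if not children and not elem.attrib:
        return text  # text-only element: the text is the value (None -> null)
    obj = dict(elem.attrib)  # attributes become plain properties, like elements
    grouped = {}
    for c in children:
        grouped.setdefault(c.tag, []).append(basic(c))
    for tag, values in grouped.items():
        # Same-name siblings become an array; a lone child stays a plain value.
        obj[tag] = values[0] if len(values) == 1 else values
    if text is not None:
        obj["$"] = text  # text next to attributes goes in "$"
    return obj


def to_basic_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: basic(root)})


print(to_basic_json("<alice><bob>charlie</bob><bob>david</bob></alice>"))
```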
    

BadgerFish convention (use-badgerfish)

The BadgerFish convention was invented by David Sklar; more detailed information can be found on his BadgerFish website. I have taken some liberties in supporting BadgerFish. For example, the treatment of mixed content nodes (nodes with both text and element nodes as children) was not covered in the convention (except for a mention in the to-do list) but is supported by XSLTJSON. The other change is that namespaces are optional instead of mandatory (which is also mentioned in the to-do list). The rules are:

  • Element names become object properties.

  • Text content of elements goes in the $ property of an object.

    <alice>bob</alice>
    

    becomes

    { "alice": { "$": "bob" } }
    
  • Nested elements become nested properties.

    <alice><bob>charlie</bob><david>edgar</david></alice>
    

    becomes

    { "alice": {"bob": { "$": "charlie" }, "david": { "$": "edgar" } } }
    
  • Multiple elements with the same name and at the same level become array elements.

    <alice><bob>charlie</bob><bob>david</bob></alice>
    

    becomes

    { "alice": { "bob": [ { "$": "charlie" }, { "$": "david" } ] } }
    
  • Mixed content (element and text nodes) at the same level become array elements.

    <alice>bob<charlie>david</charlie>edgar</alice>
    

    becomes

    { "alice": [ { "$": "bob" }, { "charlie": { "$": "david" } }, { "$": "edgar" } ] }
    
  • Attributes go in properties whose names begin with @.

    <alice charlie="david">bob</alice>
    

    becomes

    { "alice": { "@charlie": "david", "$": "bob" } }
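A simplified Python model of the BadgerFish rules listed above, again using xml.etree.ElementTree. It is a sketch under stated assumptions, not XSLTJSON's code: the function names are hypothetical, namespaces are ignored (as when use-namespaces is off), whitespace-only text is assumed to be dropped, and the mixed-content branch follows the XSLTJSON extension described above rather than the original convention.

```python
import json
import xml.etree.ElementTree as ET


def has_text(s):
    return s is not None and s.strip() != ""


def badgerfish(elem):
    """Simplified model of the BadgerFish rules listed above (namespaces ignored)."""
    children = list(elem)
    text = elem.text if has_text(elem.text) else None
    if children and (text is not None or any(has_text(c.tail) for c in children)):
        # Mixed content (XSLTJSON extension): array of {"$": text} and element objects.
        items = [{"$": text}] if text is not None else []
        for c in children:
            items.append({c.tag: badgerfish(c)})
            if has_text(c.tail):
                items.append({"$": c.tail})
        return items
    obj = {"@" + name: value for name, value in elem.attrib.items()}  # attributes get "@"
    grouped = {}
    for c in children:
        grouped.setdefault(c.tag, []).append(badgerfish(c))
    for tag, values in grouped.items():
        obj[tag] = values[0] if len(values) == 1 else values
    if text is not None:
        obj["$"] = text  # text content always goes in "$"
    return obj


def to_badgerfish_json(xml_text):
    root = ET.fromstring(xml_text)
    return json.dumps({root.tag: badgerfish(root)})


print(to_badgerfish_json('<alice charlie="david">bob</alice>'))
```

Unlike the basic model, text is never collapsed into a bare string here, which is what makes the round-trip back to XML unambiguous.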
    

RabbitFish (use-rabbitfish)

RabbitFish is identical to the basic output format except that it uses Rule 6 “Attributes go in properties whose name begin with @” from the BadgerFish convention in order to distinguish between elements and attributes.
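Since RabbitFish differs from the basic output only in how attributes are named, a Python sketch can express both with one hypothetical attr_prefix argument ("" for basic, "@" for RabbitFish). This is a simplified model using xml.etree.ElementTree, not the stylesheet; all values stay strings.

```python
import json
import xml.etree.ElementTree as ET


def has_text(s):
    return s is not None and s.strip() != ""


def convert(elem, attr_prefix):
    """Basic output model; attr_prefix="" gives basic output, "@" gives RabbitFish."""
    children = list(elem)
    text = elem.text if has_text(elem.text) else None
    if children and (text is not None or any(has_text(c.tail) for c in children)):
        items = [text] if text is not None else []
        for c in children:
            items.append({c.tag: convert(c, attr_prefix)})
            if has_text(c.tail):
                items.append(c.tail)
        return items
    if not children and not elem.attrib:
        return text
    obj = {attr_prefix + k: v for k, v in elem.attrib.items()}
    grouped = {}
    for c in children:
        grouped.setdefault(c.tag, []).append(convert(c, attr_prefix))
    for tag, values in grouped.items():
        # With attr_prefix="", a child named like an attribute overwrites it in this model.
        obj[tag] = values[0] if len(values) == 1 else values
    if text is not None:
        obj["$"] = text
    return obj


root = ET.fromstring('<alice charlie="david">bob</alice>')
print(json.dumps({root.tag: convert(root, "")}))   # basic
print(json.dumps({root.tag: convert(root, "@")}))  # RabbitFish
```

The prefix matters when an attribute and a child element share a name: for `<alice charlie="x"><charlie>y</charlie></alice>`, the basic model loses the attribute, while RabbitFish keeps both as "@charlie" and "charlie".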

Rayfish (use-rayfish)

The Rayfish convention was invented by Micheal Matthew and aims to create highly structured JSON that is easy to parse and extract information from due to its regularity. This makes it an excellent choice for data-centric XML documents. The downside is that it does not support mixed content (elements and text nodes at the same level) and is slightly more verbose than the other output formats. The rules are:

  • Elements are transformed into an object with three properties: #name, #text and #children. The name property contains the name of the element, the text property contains the text contents of the element and the children property contains an array of the child elements.

    <alice/>
    

    becomes

    { "#name": "alice", "#text": null, "#children": [ ] }
    
  • Nested elements become members of the #children property of the parent element.

    <alice><bob>charlie</bob><david>edgar</david></alice>
    

    becomes

    { "#name": "alice", "#text": null, "#children": [ 
        { "#name": "bob", "#text": "charlie", "#children": [ ] }, 
        { "#name": "david", "#text": "edgar", "#children": [ ] }
    ]}
    
  • Attributes become objects in the #children property, with names that begin with @.

    <alice charlie="david">bob</alice>
    

    becomes

    { "#name": "alice", "#text": "bob", "#children": [ 
        { "#name": "@charlie", 
          "#text": "david", 
          "#children": [ ] 
        }
    ]}
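A simplified Python model of the Rayfish rules above, using xml.etree.ElementTree. It is a sketch, not the stylesheet: the function name is hypothetical, placing attribute nodes before child elements in #children is an assumption, and text between child elements is dropped because Rayfish does not support mixed content.

```python
import json
import xml.etree.ElementTree as ET


def rayfish(elem):
    """Simplified model of the Rayfish rules: every node has #name, #text and #children."""
    # Attributes become leaf nodes named "@attr" (placed first: an assumption).
    attrs = [{"#name": "@" + k, "#text": v, "#children": []} for k, v in elem.attrib.items()]
    text = elem.text if elem.text is not None and elem.text.strip() else None
    # Child elements follow; text between children (mixed content) is not represented.
    return {"#name": elem.tag, "#text": text, "#children": attrs + [rayfish(c) for c in elem]}


print(json.dumps(rayfish(ET.fromstring('<alice charlie="david">bob</alice>'))))
```

Every node has the same three keys, so a consumer can walk the tree with one recursive function and never has to check whether a value is a string, an object or an array.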
    

Namespaces (use-namespaces)

When turned on, namespaces are created according to the BadgerFish convention. In basic output, the @ is left out of the property name.
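The BadgerFish convention puts namespace declarations in an "@xmlns" property, with the default namespace under "$" and prefixed namespaces under their prefix. The Python sketch below collects declarations with ElementTree's iterparse "start-ns" events and formats them that way. It is a simplification: it gathers every declaration in the document rather than the namespaces active on each element, the function name is hypothetical, and using "xmlns" with the same inner layout for basic output is an assumption based only on the sentence above.

```python
import io
import xml.etree.ElementTree as ET


def xmlns_property(xml_text, use_badgerfish=True):
    """Collect namespace declarations into a BadgerFish-style xmlns object."""
    decls = {}
    source = io.BytesIO(xml_text.encode("utf-8"))
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        # The default namespace (empty prefix) goes under "$", others under their prefix.
        decls["$" if prefix == "" else prefix] = uri
    # BadgerFish uses "@xmlns"; basic output is assumed to drop the "@".
    key = "@xmlns" if use_badgerfish else "xmlns"
    return {key: decls} if decls else {}


doc = '<alice xmlns="http://some-namespace" xmlns:charlie="http://some-other-namespace">bob</alice>'
print(xmlns_property(doc))
```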

XSLTJSON Lite (XSLT 1.0 compatible)

The XSLTJSON Lite stylesheet transforms arbitrary XML to the JSONML format. It is written in XSLT 1.0, so it is compatible with all XSLT 1.0 and 2.0 processors, as well as the XSLT processor built into most modern browsers (for client-side transformations.) The stylesheet doesn't take any parameters and has no configurable options. Use it like you would use any XSLT stylesheet.
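In JsonML, an element becomes an array whose first item is the tag name, followed by an optional attribute object and then the child nodes, with text nodes as plain strings. The Python sketch below shows that shape using xml.etree.ElementTree; it is an illustration of the format, not XSLTJSON Lite itself, and it keeps all text (including whitespace) exactly as ElementTree reports it, which may differ from the Lite stylesheet.

```python
import json
import xml.etree.ElementTree as ET


def jsonml(elem):
    """Element -> ["tag", {attributes}?, child...]; text nodes become plain strings."""
    node = [elem.tag]
    if elem.attrib:
        node.append(dict(elem.attrib))  # the attribute object is omitted when empty
    if elem.text:
        node.append(elem.text)
    for child in elem:
        node.append(jsonml(child))
        if child.tail:
            node.append(child.tail)  # text that follows a child element
    return node


root = ET.fromstring('<alice charlie="david">bob<charlie>david</charlie>edgar</alice>')
print(json.dumps(jsonml(root)))
```

Because children are kept in a single ordered array, JsonML preserves document order and mixed content without any special rules.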

Requirements

XSLTJSON requires an XSLT 2.0 processor. An excellent option is Saxon, which was used to test and develop XSLTJSON.

XSLT 2.0?

Don't have an XSLT 2.0 processor? Check out Micheal Matthew's Rayfish project, xml2json, or a modified xml2json version by Martynas Jusevičius. You can also use XSLTJSON Lite to transform XML to JSONML.

License

XSLTJSON is licensed under the new BSD License (see the header comment.)

Credits

Thanks to: Chick Markley (Octal number & numbers with terminating period fix), Torben Schreiter (Suggestions for skip root, and newline entities bug fix), Michael Nilsson (Bug report & test cases for json:force-array), Rick Brown (bug report and fix for numbers starting with '+' symbol).

xsltjson's Issues

Doesn't handle arrays correctly

Given this XML:

<thing>
<commonthing>
    <subthing>party</subthing>
</commonthing>

<someotherthing>
    <test>yup</test>
</someotherthing>

<commonthing>
    <subthing>party too</subthing>
</commonthing>

<someotherthing>
    <test>yup too</test>
</someotherthing>

</thing>

it returns this JSON:

{
    "thing": {
        "commonthing": {
            "subthing": "party too"
        },
        "someotherthing": {
            "test": "yup too"
        }
    }
}

I would expect something like this:

{
    "thing": {
        "commonthing": [
            {
                "subthing": "party"
            }, 
            {
                "subthing": "party too"
            }
        ], 
        "someotherthing": [
            {
                "test": "yup"
            }, 
            {
                "test": "yup too"
            }
        ]
    }
}

This only seems to be an issue when the repeated elements are not directly adjacent to each other.
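The grouping the reporter expects can be sketched in Python by collecting children by name before building the object, so same-name siblings are merged even when other elements sit between them. This is a hypothetical illustration using xml.etree.ElementTree, not a patch for the stylesheet; leaf values stay strings.

```python
import json
import xml.etree.ElementTree as ET


def group_by_name(elem):
    """Leaf elements become their text; same-name siblings are merged into one array
    even when other elements sit between them."""
    children = list(elem)
    if not children:
        return elem.text
    groups = {}  # dicts keep the first-seen order of each element name
    for child in children:
        groups.setdefault(child.tag, []).append(group_by_name(child))
    return {tag: vals[0] if len(vals) == 1 else vals for tag, vals in groups.items()}


source = """<thing>
<commonthing><subthing>party</subthing></commonthing>
<someotherthing><test>yup</test></someotherthing>
<commonthing><subthing>party too</subthing></commonthing>
<someotherthing><test>yup too</test></someotherthing>
</thing>"""
root = ET.fromstring(source)
print(json.dumps({root.tag: group_by_name(root)}, indent=4))
```

The trade-off is that the interleaving of commonthing and someotherthing in the source is lost; only the order within each name is preserved.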

XML2Json converts string to number

Hi everyone,

I am using the xml2json transform file.

My XML element:

<id>9999</id>

But after the xml2json transform I get:

"id": 9999,

I want this:

"id": "9999"

The target system expects this field to be a string, but this file automatically converts my fields to numbers.

Please help

Badly formatted numeric string not being enclosed in quotes in JSON output

The following XML segment creates invalid JSON output
(see the very first line: "1." followed by a newline):

                <text id="12">1.
                    <variable-string id="13">expression="vehicle"&gt;VEHICLE?</variable-string>
                    (vehicle)
                    <image id="14">
                        <size>
                            <width>150</width>
                            <height>150</height>
                        </size>
                        <description>Your vehicle</description>
                        <url>
                            <variable-string expression="vehicleUrl">http://t0.gstatic.com/images?q=tbn:ANd9GcRMnKChEmsa3UKl7ZGAY_aK9Qjo9ttVYBI59nYebm8miKiHBp11cw</variable-string>
                        </url>
                    </image>
                </text>

The output does not wrap the '1.' in quotes and therefore fails JSON validity checks.

The fix is simple: in the XSLT, when detecting string types, just apply normalize-space. In this case I apply normalize-space for all non-string type detection, just in case XSLT parsers consider 'true\n' to be a boolean.

            <xsl:choose>
                <xsl:when test="(string(number(normalize-space(.))) = 'NaN' or ends-with(normalize-space(.), '.') or starts-with(normalize-space(.),'0') and not(normalize-space(.) eq '0')) and not(normalize-space(.) = 'false') and not(normalize-space(.) = 'true') and not(normalize-space(.) = 'null')">
                    <xsl:text/>"<xsl:value-of select="json:encode-string(.)"/>"<xsl:text/>
                </xsl:when>
                <xsl:otherwise>
                    <xsl:text/><xsl:value-of select="."/><xsl:text/>
                </xsl:otherwise>
            </xsl:choose>

FYI, I am using EditiX 2010 Free edition as a test bed, and I am not sure whether type detection is buggy in this engine. It may be worth running this test case against your Saxon version.

Thanks!
--Wilson Cheung

Signed +ve numbers (and international telephone numbers)

Hi Bram

First thing let me say thanks for this first rate tool, love your work.

I have had to patch my local copy because it creates invalid JSON when it encounters signed positive numbers (decimals and integers). The XML below demonstrates three variations of this:

<?xml version="1.0" encoding="UTF-8"?>
<numbertest>
    <positiveInt>+100</positiveInt>
    <positiveDec>+100.00</positiveDec>
    <sms>+61262641111</sms>
</numbertest>

JSON does not allow numbers to start with a '+', so there are two solutions:

  1. Treat these numbers as strings.
  2. Strip off the '+', which would work for numbers but ruin telephone numbers.

I think option 1 is safest and in keeping with the way you have treated similar situations.

This is how I have patched the XSL:

<xsl:choose>
  <!--
      A value is considered a string if:
       * there is whitespace/formatting around the value of the node, or
       * the value is not a valid JSON number (i.e. '01', '+1', '1.', and '.5' are not valid JSON numbers),
      and the value does not equal any of the following strings: 'false', 'true', 'null'.
  -->
  <xsl:when test="normalize-space(.) ne . or not((string(.) castable as xs:integer  and not(starts-with(string(.),'+')) and not(starts-with(string(.),'0') and not(. = '0'))) or (string(.) castable as xs:decimal  and not(starts-with(string(.),'+')) and not(starts-with(.,'-.')) and not(starts-with(.,'.')) and not(starts-with(.,'-0') and not(starts-with(.,'-0.'))) and not(ends-with(.,'.')) and not(starts-with(.,'0') and not(starts-with(.,'0.'))) )) and not(. = 'false') and not(. = 'true') and not(. = 'null')">
    <xsl:text/>"<xsl:value-of select="json:encode-string(.)"/>"<xsl:text/>
  </xsl:when>
  <xsl:otherwise>
    <!-- Valid JSON numbers and the literals true/false/null are emitted unquoted. -->
    <xsl:text/><xsl:value-of select="."/><xsl:text/>
  </xsl:otherwise>
</xsl:choose>
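The check in the patch approximates the JSON number grammar from RFC 8259: an optional minus sign (no plus), an integer part without leading zeros, an optional fraction with at least one digit, and an optional exponent. A hypothetical Python helper, json_scalar, shows that rule directly with a regular expression; it is not part of XSLTJSON, only a way to test candidate values outside the stylesheet.

```python
import json
import re

# JSON number grammar (RFC 8259): optional "-", no leading zeros, no "+",
# and a fraction must have at least one digit after the ".".
JSON_NUMBER = re.compile(r"-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?")


def json_scalar(text):
    """Emit text bare only if it is already a valid JSON number or literal; quote it otherwise."""
    if text in ("true", "false", "null") or JSON_NUMBER.fullmatch(text):
        return text
    return json.dumps(text)  # quoted and escaped JSON string


for value in ["100", "+100", "+61262641111", "1.", "01", "-0.5", "1.\n"]:
    print(repr(value), "->", json_scalar(value))
```

Using fullmatch rather than a "$" anchor matters here: "$" also matches before a trailing newline, which is exactly the '1.' plus newline case reported above.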

Json To Xml supported?

Hello,

This is not exactly an issue, more of a question, but here it goes.

I found your library on the internet today and it's really useful to convert Xml to Json. Works pretty well. Thanks for that.

Now I'm trying to do the opposite: convert JSON to XML. Does your implementation also support this? Any idea how to do it without any Java code?

Thanks in advance,
Ângelo Costa

Feature Request: Maintain cardinal order with # array for all children nodes

Hello Bram,

We have been using your XSLT with great success, however, we have a requirement to maintain cardinal ordering for all nodes that are potentially repeatable, for example:

<scenario id="9">
    <paragraph id="10">
        <text id="11">This is a paragraph in the scenario. The scenario should start
            a few lines below the title of the problem. This paragraph should
            occur at the very top of the scenario.
        </text>
    </paragraph>
    <paragraph id="12">
        <text id="13">This is another paragraph in the scenario. It should be separated
            from the above paragraph by a gutter whose height is the equivalent
            of one line of text. Since this is the last paragraph in the scenario,
            the scenario block should end directly below it.
        </text>
    </paragraph>
    <exhibit id="117">
        <image ascent="0" description="Map of USA" height="352" id="118"
               url="http://geology.com/world/the-united-states-of-america-map.gif" width="550"/>
    </exhibit>
    <paragraph id="119">
        <text id="120">The above image is in an "exhibit" element. It should be separated
            from the content blocks above
            <image ascent="0" description="Map of USA" height="352" id="118"
                   url="http://geology.com/world/the-united-states-of-america-map.gif" width="550"/>
            and below by a one-line gutter, just
            as paragraphs are separated
            <image ascent="0" description="Map of USA" height="352" id="118"
                   url="http://geology.com/world/the-united-states-of-america-map.gif" width="550"/>
            from one another. It should also be centered.
        </text>
    </paragraph>
</scenario>

I would like to suggest creating a "#" children array for nodes with more than one child, with the intent of guaranteeing cardinal ordering:

NOTE: This is a sample JSON that I crafted manually:

{
    "scenario": {
        "@id": 9,
        "#": [
            {
                "paragraph": {
                    "@id": 10,
                    "text": {
                        "@id": 11,
                        "$": "This is a paragraph in the scenario. The scenario should start\n            a few lines below the title of the problem. This paragraph should\n            occur at the very top of the scenario.\n        "
                    }
                }
            },
            {
                "paragraph": {
                    "@id": 12,
                    "text": {
                        "@id": 13,
                        "$": "This is another paragraph in the scenario. It should be separated\n            from the above paragraph by a gutter whose height is the equivalent\n            of one line of text. Since this is the last paragraph in the scenario,\n            the scenario block should end directly below it.\n        "
                    }
                }
            },
            {
                "paragraph": {
                    "@id": 119,
                    "text": {
                        "@id": 120,
                        "$": [
                            {
                                "$": "The above image is in an \"exhibit\" element. It should be separated\n            from the content blocks above\n            "
                            },
                            {
                                "image": {
                                    "@ascent": 0,
                                    "@description": "Map of USA",
                                    "@height": 352,
                                    "@id": 118,
                                    "@url": "http://geology.com/world/the-united-states-of-america-map.gif",
                                    "@width": 550
                                }
                            },
                            {
                                "$": "\n            and below by a one-line gutter, just\n            as paragraphs are separated\n            "
                            },
                            {
                                "image": {
                                    "@ascent": 0,
                                    "@description": "Map of USA",
                                    "@height": 352,
                                    "@id": 118,
                                    "@url": "http://geology.com/world/the-united-states-of-america-map.gif",
                                    "@width": 550
                                }
                            },
                            {
                                "$": "\n            from one another. It should also be centered.\n        "
                            }
                        ]
                    }
                }
            },
            {
                "exhibit": {
                    "@id": 117,
                    "image": {
                        "@ascent": 0,
                        "@description": "Map of USA",
                        "@height": 352,
                        "@id": 118,
                        "@url": "http://geology.com/world/the-united-states-of-america-map.gif",
                        "@width": 550
                    }
                }
            }
        ]
    }
}

I have tried Rayfish, which does some of this, but it doesn't handle mixed text/element content like BadgerFish does. So I would like to make the above suggestion.

It is likely that we will have someone in-house to modify the XSLT, so we might be able to contribute to the project if you are booked.

Thanks again, please let me know what you think about this.
--Wilson Cheung

Force string

A force-string attribute of some sort would be helpful.

Its purpose would be for cases where you want the output to be a string, but the XML sometimes contains plain numbers and sometimes does not.

Perhaps with the option to specify which attributes need to be forced in this way.

Output {} instead of null for self-closing element

This isn't a bug as such, but as there's no discussion functionality on GitHub...

If I run your XSLT over this:

<outer>
    <inner />
</outer>

I get this:

"outer" : {
    "inner" : null
}

But in my case what I need is this:

"outer" : {
    "inner" : { }
}

Is there any XML construct that would cause it to output the second? For the moment I've changed your XSLT (see tstibbs@884fc32), but have I missed something?

Comments stop parsing of text string in default case

If you use the following input:

<root>\"
<!-- &#x8; -->
<!-- &#xc; -->
&#xd;
&#x9;
</root>

and run the following:

java -cp lib/saxon/saxon9.jar net.sf.saxon.Transform t.xml conf/xml-to-json.xsl

You'll get:

{
    "root": [
        "\\\"\n",
        {
            "": null
        },
        {
            "": null
        }
    ]
}

Whereas if you move the comments to after the character entities, you get:

{
    "root": [
        "\\\"\n\r\n\t\n",
        {
            "": null
        },
        {
            "": null
        }
    ]
}

With the CR and HT both properly in the output.

Problem with same-name elements at the same level becoming array elements

Dear,

<alice><bob>charlie</bob><bob>david</bob></alice>

becomes

{ "alice": { "bob": [ "charlie", "david" ] } }

But when I use:

<alice><bob>charlie</bob><a>test</a><bob>david</bob></alice>

why doesn't it become

{ "alice": { "bob": [ "charlie", "david" ], "a": "test" } }

Instead I get

{ "alice": { "bob": "charlie", "a": "test", "bob": "david" } }

Please help.
