
txml's Introduction

tXml

A very small and probably the fastest xml parser in pure javascript.

This lib is easy to use: txml.parse(xml);.

  1. this code is about 255 lines and can be easily extended.
  2. this code is 1.6kb minified + gzipped.
  3. this code is 5 - 10 times faster than sax/xml2js and still 2 - 3 times faster than fast-xml-parser.
  4. this code can run in a worker.
  5. this code parses at about the same average speed as the native DOMParser, with potential to be faster.
  6. this code is easy to read and good for study.
  7. this code creates a domObject with minimal footprint that is easy to traverse.
  8. this code has proven itself in different projects, such as an RSS reader, openStreetMap, and websites.
  9. this code can even parse handwritten XML that contains various errors.
  10. this code works in client and server.
  11. this code is 100% covered by unit tests.
  12. this code is extremely small, perfect for browser, node, cloud function, edge.

So, there are good reasons to give tXml.js a try.

XML - features

  1. tags
  2. childTags
  3. text-nodes
  4. white-spaces
  5. attributes with single and double quotes
  6. attributes without value
  7. xmlComments (ignored or kept)
  8. CDATA
  9. embedded CSS and Javascript
  10. HTML singleTag elements br, img, link, meta, hr (configurable)
  11. doctype definitions
  12. xml namespaces
  13. sync API for a sync process
  14. getElementById / getElementsByClassName directly on the xmlString
  15. simplify, similar to PHP's SimpleXML
  16. simplifyLostLess
  17. filter, similar to underscore, as an alternative to CSS selectors
  18. monomorphism for fast processing and fewer if statements (a node always has tagName:'', attributes:{} and children:[])
  19. streamSupport ! ! !
  20. process stream with for await loop

Try Online

Try without installing online: https://tnickel.de/2017/04/02/txml-online

new in version 4

  • improved support for CDATA
  • option to keep comments
  • comment support for transformStream (comments inside elements are working, but not top level)
  • allow options for transformStream
  • export parser function only as txml, it will be the cleanest in all environments and let you use txml.parse(xml) where xml is the string.
  • remove .parseStream in favor of transformStream
  • more stable auto generated typescript definitions.

Installation

In the browser you can load it however you want, for example as a tag: <script src="dist/txml.min.js"></script>.

In node and browserify, run "npm install txml" in your project, then require it in your script with const txml = require('txml'); or, in typescript, import * as txml from 'txml';.

For especially small builds using modern module bundlers like rollup or webpack, you can import txml/txml or txml/dist/txml. This will not add transformStream to the bundle and thereby excludes the Node.js files.

Methods

txml.parse (xmlString, options)

  1. xmlString is the XML to parse.
  2. options is optional
    • searchId the ID of an element to query. Using this is incredibly fast.
    • filter a function to filter for interesting nodes; use it like Array.filter.
    • simplify simplifies the object for easier access.
    • pos the position where parsing starts.
    • keepComments keep comments in your data (kept as strings including <!-- -->) (default false)
    • keepWhitespace keep whitespace like spaces, tabs and line breaks as string content (default false)
    • noChildNodes array of tag names that have no children and don't need to be closed. The default works well for html. For example, when parsing rss, the link tag really provides a URL that the user can open. In html, however, a link tag binds css or another resource into the document and does not need to be closed, so by default noChildNodes contains the tagName 'link', along with 'img', 'br', 'input' and 'meta'. That means: when parsing rss, it makes sense to set noChildNodes to [], an empty array.
txml.parse(`<user is='great'>
    <name>Tobias</name>
    <familyName>Nickel</familyName>
    <profession>Software Developer</profession>
    <location>Shanghai / China</location>
</user>`);
// will return an object like: 
[{
    "tagName": "user",
    "attributes": {
        "is": "great"
    },
    "children": [{
            "tagName": "name",
            "attributes": {},
            "children": [ "Tobias" ]
        }, {
            "tagName": "familyName",
            "attributes": {},
            "children": [ "Nickel" ]
        }, {
            "tagName": "profession",
            "attributes": {},
            "children": [ "Software Developer" ]
        }, {
            "tagName": "location",
            "attributes": {},
            "children": [ "Shanghai / China" ]
        }
    ]
}];  
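
Because every element node carries the same three fields, walking the result takes only a few lines. The following is a minimal sketch, not part of txml: collectText is a hypothetical helper, and the dom literal is an abbreviated copy of the output above.

```javascript
// Hypothetical helper (not part of txml): concatenates all text found
// below a list of tDOM children, depth first.
function collectText(children, separator = ' ') {
  const parts = [];
  for (const child of children) {
    if (typeof child === 'string') {
      parts.push(child); // text node
    } else {
      // element node: recurse into its children
      const inner = collectText(child.children, separator);
      if (inner !== '') parts.push(inner);
    }
  }
  return parts.join(separator);
}

// Abbreviated copy of the parse result shown above.
const dom = [{
  tagName: 'user',
  attributes: { is: 'great' },
  children: [
    { tagName: 'name', attributes: {}, children: ['Tobias'] },
    { tagName: 'familyName', attributes: {}, children: ['Nickel'] },
  ],
}];

const fullName = collectText(dom); // "Tobias Nickel"
```

The typeof check is the only branching needed, since children contains either strings or nodes with tagName, attributes and children.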

txml.simplify (tXml_DOM_Object)

This has the same purpose as the simplify option: to make the data easier to access. It is modeled after PHP's SimpleXML. You can quickly access properties. However, some attributes might be lost, and some string values can be lost as well. For details see Issue 19. This method is used by the simplify parsing option.

  1. tXml_DOM_Object the object to simplify.
txml.simplify(txml.parse(`<user is='great'>
    <name>Tobias</name>
    <familyName>Nickel</familyName>
    <profession>Software Developer</profession>
    <location>Shanghai / China</location>
</user>`));
// will return an object like: 
{
    "user": {
        "name": "Tobias",
        "familyName": "Nickel",
        "profession": "Software Developer",
        "location": "Shanghai / China",
        "_attributes": {
            "is": "great"
        }
    }
}

txml.simplifyLostLess (tXml_DOM_Object)

This version is not the same as PHP's SimpleXML, but in return you do not lose any information. If there are attributes, you get an _attributes property. Even if there is only one element of a kind, it will be an array with one item, for consistent code.

txml.filter (tXml_DOM_Object, f)

This method backs the filter option and is used like Array.filter, but it traverses the entire tree, including nested children.

  1. tXml_DOM_Object the object to filter.
  2. f a function that returns true if you want this element in the result set.
const dom = txml.parse(`
<html>
    <head>
        <style>
            p { color: "red" }
        </style>
    </head>
    <body>
        <p>hello</p>
    </body>
</html>`);
const styleElement = txml.filter(dom, node=>node.tagName.toLowerCase() === 'style')[0];
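
To make the difference from Array.filter concrete, here is a rough sketch of a depth-first filter over a tDOM. It is an assumption about the behaviour described above, not txml's actual source; deepFilter and the page literal are made up for illustration.

```javascript
// Sketch of a depth-first filter over a tDOM (illustrative only,
// not txml's implementation).
function deepFilter(children, predicate, result = []) {
  for (const child of children) {
    if (typeof child === 'string') continue; // skip text nodes
    if (predicate(child)) result.push(child);
    deepFilter(child.children, predicate, result); // descend into the subtree
  }
  return result;
}

// Hand-written tDOM for a small html document.
const page = [{
  tagName: 'html', attributes: {}, children: [
    { tagName: 'head', attributes: {}, children: [
      { tagName: 'style', attributes: {}, children: ['p { color: "red" }'] },
    ] },
    { tagName: 'body', attributes: {}, children: [
      { tagName: 'p', attributes: {}, children: ['hello'] },
    ] },
  ],
}];

const isStyle = (node) => node.tagName.toLowerCase() === 'style';
const nested = deepFilter(page, isStyle);  // finds the nested <style> element
const topLevelOnly = page.filter(isStyle); // Array.prototype.filter only sees <html>
```

Calling .filter directly on the array returned by parse only inspects the top-level nodes, which is why it comes back empty for nested matches.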

txml.getElementById (xml, id)

Finds an element by ID. If you are only interested in the information of a specific node, this is easy and fast, because the entire xml text does not need to be parsed, only the small section you are interested in.

  1. xml the xml string to search in.
  2. id the id of the element to find. Returns one node.

txml.getElementsByClassName (xml, className)

Finds the elements with the given class without parsing the entire xml into a tDOM, so it is very fast and convenient. Returns a list of elements.

  1. xml the xml string to search in.
  2. className the className of the element to find

txml.transformStream (offset, parseOptions?)

  1. offset optional, a position shortly before the first item. Usually files begin with something like "", so the offset needs to be before the first item starts, such that there is no "<" character between the offset and that item. Alternatively, pass a string containing this preamble.
  2. parseOptions optional, similar to the parse method's options. Returns a transformStream.
const xmlStream = fs.createReadStream('your.xml')
  .pipe(txml.transformStream());
for await(let element of xmlStream) {
  // your logic here ...
}

The transform stream is great because, when your logic within the processing loop is slow, the file read stream also slows down and does not fill up the RAM. For a more detailed explanation read here

Changelog

  • version 5.1.0
    • export ./* in package.json to allow older bundlers to import sub path directly. import { parse } from 'txml/dist/txml.mjs';
  • version 5.0.1
    • fix simplify empty objects (issue #24)
  • version 5.0.0
    • improved handling of whitespace (issue #21)
    • automated build with rollup (PR #23)
  • version 4.0.1
    • fixed children type definition not to include number (issue #20)
    • add hr to self closing tags
    • new parser option keepWhitespace (issue #21)

Developer

Tobias Nickel

Tobias Nickel, German software developer in Shanghai.

txml's People

Contributors

dasdaniel, dependabot[bot], manzt, tobiasnickel, trusktr

txml's Issues

Parse special entities

With DOMParser, attribute values containing special entities like &gt; and &lt; are replaced with the corresponding characters. Would it be possible for txml to do the same?
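
Until something like this exists in the library, decoding can be done as a post-processing step on the parsed values. A minimal sketch, assuming only the five predefined XML entities and numeric character references need handling; decodeEntities and decodeAttributes are hypothetical helpers, not txml functions.

```javascript
// Hypothetical post-processing helpers (not part of txml): decode the five
// predefined XML entities plus numeric character references.
const NAMED_ENTITIES = { lt: '<', gt: '>', amp: '&', quot: '"', apos: "'" };

function decodeEntities(text) {
  // A single pass, so "&amp;lt;" becomes "&lt;" and is not decoded twice.
  return text.replace(/&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x';
      const code = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      return code <= 0x10ffff ? String.fromCodePoint(code) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match; // unknown entity: leave untouched
  });
}

// Returns a new attributes object with every string value decoded.
// Non-string values (for example attributes without a value) are kept as-is.
function decodeAttributes(attributes) {
  const decoded = {};
  for (const [name, value] of Object.entries(attributes)) {
    decoded[name] = typeof value === 'string' ? decodeEntities(value) : value;
  }
  return decoded;
}
```

Applied to a parsed node, this would look like node.attributes = decodeAttributes(node.attributes); text children can be passed through decodeEntities the same way.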

Unexpected close tag

const str = '<?xml version="1.0" encoding="UTF-8" ?>\n' +
'<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">\n' +
'<fields />\n' +
'<add>\n' +
'<link page="0" rect="556.5,684.5,563,691" color="#000000" flags="print" name="f54cfadd-dc5a-126e-9512-e0812552a6a3" subject="Annotation" date="D:20220223101227+01\'00\'" width="0" style="solid"><OnActivation><Action Trigger="U"><URI Name="https://google.com"/></Action></OnActivation></link>\n' +
'<link page="0" rect="165.5,483,176,492" color="#000000" flags="print" name="4fc2499f-e94f-a45c-bfee-0cd68357f219" subject="Annotation" date="D:20220223101227+01\'00\'" width="0" style="solid"><OnActivation><Action Trigger="U"><URI Name="https://google.com"/></Action></OnActivation></link>\n' +
'<highlight page="0" rect="117.0469,461.6675,317.36563304933884,469.7305" color="#00CC63" flags="print" name="d5041f3b-57b9-5f66-8a33-15bc7b41b426" title="1" subject="Highlight" date="D:20220223101241+01\'00\'" creationdate="D:20220223101241+01\'00\'" coords="117.0469,469.7305,317.36563304933884,469.7305,117.0469,461.6675,317.36563304933884,461.6675"><contents>HI-does anyone know what the new program they are</contents></highlight>\n' +
'</add>\n' +
'<modify />\n' +
'<delete />\n' +
'</xfdf>';

const parsed= txml.parse(str);
Uncaught (in promise) Error: Unexpected close tag
Line: 4
Column: 289
Char: >
    at parseChildren (txml.mjs?a401:82:1)
    at parseNode (txml.mjs?a401:239:1)
    at parseChildren (txml.mjs?a401:138:1)
    at parseNode (txml.mjs?a401:239:1)
    at parseChildren (txml.mjs?a401:138:1)
    at parseNode (txml.mjs?a401:239:1)
    at parseChildren (txml.mjs?a401:138:1)
    at Module.parse (txml.mjs?a401:292:1)
    at eval (PdfPreview.vue?05a7:292:1)

Backwards compatibility with popular libraries

The value of this library increases if it can be configured to build the resulting object similar to popular projects like rapidx2j, nkit, x2je, xml2json and xml2js. It would instantly become a drop-in replacement for each of those libraries used in other node projects.

In order to do that, you'd need to:

  1. add nodes with their tagName as keys to an object rather than values to a property;
  2. group attributes to a key with a custom name, in this case "@";
  3. only turn the object of the tagName key into an array if the tagName occurs more than once.

Would you be interested in adding the option(s) to do so?

Example: In stead of this:

[
  "children": [
    {
      "tagName": "child1",
      "attributes": {},
      "children": [
        {
          "tagName": "child2",
          "attributes": {},
          "children": [
            {
              "tagName": "child3",
              "attributes": {},
              "children": [
                "tagName": "child4",
                // ...

We need to end up with this:

{
  "child1": {
    "@": {},
    "child2": {
      "@": {},
      "child3": {
        "@": {},
        "child4": [
          // ...

In response to Redsandro/node-xml2js-olympics#1

transformStream doesn't seem to be working correctly

Hello 👋

I went looking for a fast xml library for a particular use case I have and boy did I ever find one! Amazing work Tobias!
I can use streams with it too?! Superb!

Unfortunately, using the code from the readme doesn't seem to work for me :(. I'm simply told xmlStream is not async iterable
I'm on node v18.11.0.

const xmlStream = fs
    .createReadStream("./my-file.xml")
    .pipe(txml.transformStream());

  for await (let element of xmlStream) {
    // your logic here ...
  }

I tried attaching event handles like

const xmlStream = fs
    .createReadStream("./my-file.xml")
    .pipe(txml.transformStream());

xmlStream.on('data', (data) => { console.log(data); });

And that didn't work, it just hangs, but if I remove the pipe I see the chunks coming in from the readStream

const xmlStream = fs
    .createReadStream("./my-file.xml");

xmlStream.on('data', (data) => {console.log(data.toString())});

Not sure what else I can try from my side.

office document whitespace

See my comment in #21.
Another thing: in addition to tagName, can we add prefix and localName fields to the element?
Also, can we skip creating attributes/children if there is nothing there?

filter and getElementById not working as expected

The parse options and the methods filter and getElementById are somehow not working as I understand them.
Is there a bug, or are the docs lacking?

I expect the same result for a and b and furthermore for c and d, right?

const a = txml.parse(rawXML, {
  filter: (dom) => dom.attributes.webdavID == id,
});
const b = txml.parse(rawXML).filter((dom) => dom.attributes.webdavID == id)
// expect a==b
// result: a is node with id and b == []

const c = txml.parse(rawXML, { searchId: 'N10005' });
const d = txml.getElementById(rawXML, 'N10005');
// expect c==d
// result: c is whole xml and d is the node with id

Using the library with Sveltekit - Cannot read properties of undefined (reading 'from')

I am just learning Svelte and JavaScript web development, and I am having issues using the library.

I have done:
npm i txml

But I am not sure how to 'import' it. I have tried:
import * as txml from 'txml';

But that gives me a 500 error and the message:
TypeError: Cannot read properties of undefined (reading 'from')
at node_modules/safe-buffer/index.js (index.js:11:12)
at __require2 (chunk-OZI5HTJH.js?v=5487c821:15:50)
at node_modules/readable-stream/lib/_stream_readable.js (_stream_readable.js:55:14)
at __require2 (chunk-OZI5HTJH.js?v=5487c821:15:50)
at node_modules/readable-stream/readable-browser.js (readable-browser.js:1:28)
at __require2 (chunk-OZI5HTJH.js?v=5487c821:15:50)
at node_modules/through2/through2.js (through2.js:1:17)
at __require2 (chunk-OZI5HTJH.js?v=5487c821:15:50)
at transformStream.mjs:2:22

Parsing doesn't error on certain invalid xml.

There are 2 scenarios I have found where the parser does not throw an error even though the xml is invalid.

  1. root level elements without a close tag that aren't marked as a noChildNodes element
parse("<a><b></b>")
// returns
// [
//   {
//     tagName: 'a',
//     attributes: {},
//     children: [ { tagName: 'b', attributes: {}, children: [] } ]
//   }
// ]
  2. root level orphaned close tags (stops parsing after the first one encountered)
parse("</a><b></b>")
// returns
// []

parse("<a></a></b><c></c>")
// returns
// [ { tagName: 'a', attributes: {}, children: [] } ]

Technically there is a third scenario: it doesn't error if there is more than one root element, e.g. <a></a><a></a>, but I am assuming that is intended behavior so it can parse xml fragments. Maybe there should be an option to toggle whether to parse as a fragment or not? Either way, this one isn't a big deal because it is very easy to check after parsing whether there are multiple root elements.
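
The post-parse check mentioned in the last paragraph could look roughly like this. getSingleRoot is a hypothetical helper, and skipping nodes whose tagName starts with "?" or "!" is an assumption about how declarations might show up in the result, not documented txml behaviour.

```javascript
// Hypothetical strictness check (not part of txml): run it on the array
// returned by txml.parse to reject results with zero or several roots.
function getSingleRoot(parsed) {
  const elements = parsed.filter((node) =>
    typeof node !== 'string' &&
    // Assumption: declarations such as <?xml ...?> may appear as nodes whose
    // tagName starts with "?" or "!"; they are not counted as roots here.
    !/^[?!]/.test(node.tagName));
  if (elements.length !== 1) {
    throw new Error(`Expected exactly one root element, found ${elements.length}`);
  }
  return elements[0];
}
```

This covers the multiple-root case only; an unclosed root or an orphaned close tag cannot be detected from the parsed array alone.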

Cut off text content of <style>

With the following HTML:

<style>p { color: "red" }</style>

txml returns:

[
      {
        tagName: 'style',
        attributes: {},
        children: [ 'p { color: "red" ' ] // note here the closing bracket is missing
      }
]

and so the text content of the <style> tag loses the } closing bracket

Attributes are not included in simple result for elements without children

Wasn't sure whether this was a feature or bug, but I'm thinking it might be the latter.

Here is the input

<response>
  <error msg="haha, nice try">
    <anything></anything>
  </error>
</response>

and this is the output

{
  "response": {
    "error": {
      "anything": "",
      "_attributes": {
        "msg": "haha, nice try"
      }
    },
    "_attributes": {}
  }
}

if I take out the anything tags, the msg attribute is not included anymore

<response>
  <error msg="haha, nice try">
  </error>
</response>

result:

{
  "response": {
    "error": "",
    "_attributes": {}
  }
}

but I'm expecting:

{
  "response": {
    "error": {
      "_attributes": {
        "msg": "haha, nice try"
      }
    },
    "_attributes": {}
  }
}

It looks like the presence of an additional child tag is required to treat the element as an object instead of as a string value
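
As a possible workaround, the conversion can be done by hand on the non-simplified parse result. The sketch below is not txml's simplify: toObject is a hypothetical function, the _text key is invented for this example, and the responseDom literal is what I would expect parse to return for the second input with default options.

```javascript
// Hypothetical alternative to simplify (not part of txml): an element stays
// a plain string only when it has neither attributes nor child elements.
function toObject(node) {
  const elements = node.children.filter((c) => typeof c !== 'string');
  const text = node.children.filter((c) => typeof c === 'string').join('');
  const hasAttributes = Object.keys(node.attributes).length > 0;
  if (elements.length === 0 && !hasAttributes) return text;

  const out = {};
  for (const child of elements) {
    const value = toObject(child);
    if (!Object.prototype.hasOwnProperty.call(out, child.tagName)) {
      out[child.tagName] = value; // first occurrence
    } else if (Array.isArray(out[child.tagName])) {
      out[child.tagName].push(value); // third and later occurrences
    } else {
      out[child.tagName] = [out[child.tagName], value]; // second occurrence
    }
  }
  if (text !== '') out._text = text; // `_text` is an invented key
  out._attributes = node.attributes;
  return out;
}

// Assumed tDOM for the second input (whitespace-only text dropped).
const responseDom = [{
  tagName: 'response', attributes: {}, children: [
    { tagName: 'error', attributes: { msg: 'haha, nice try' }, children: [] },
  ],
}];

const simplified = { response: toObject(responseDom[0]) };
```

With this rule, error keeps its msg attribute even though it has no child elements, which matches the expected output above.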

Also parse comments

It would be nice to have an option to parse comments, too.
Sometimes comments in the AST can be useful.
As an optional option, this wouldn't unnecessarily impede performance.

Properly parse unquoted attributes (please🙏?)

Whenever I parse this string:

<p type=bold>hello mom!</p>

I get unexpected results:

{
  tagName:"p",
  attributes:{
    type:null,
    bold:null
  },
  children:["hello mom!"]
}

I know this is an xml parser but wouldn't it just make sense to generate:

{
  tagName:"p",
  attributes:{
    type:"bold"
  },
  children:["hello mom!"]
}

Please consider adding such type parsing in the next bug fix, I love your library! 🥰🥰

Fails to parse XML comment as stream

When an SVG XML file begins with an XML comment, the SVG file is not parsed (the stream ends immediately):

<!-- Test -->
<svg height="200" width="500">
  <polyline points="20,20 40,25 60,40 80,120 120,140 200,180" style="fill:none;stroke:black;stroke-width:3" />
</svg>

Without the comment, parsing as stream works fine:

<svg height="200" width="500">
  <polyline points="20,20 40,25 60,40 80,120 120,140 200,180" style="fill:none;stroke:black;stroke-width:3" />
</svg>

Error in tXml.d.ts

Hi @TobiasNickel,

just got this issue while importing txml in my angular application.

Error: node_modules/txml/tXml.d.ts:25:23 - error TS1246: An interface property cannot have an initializer.

25     setPos: boolean = false;

Angular doesn't allow an initializer in interfaces :-/

Thanks for your help :-)

Issue importing into browser with typescript

I am having issues including this library in the browser. It appears that this library uses Node-only APIs (e.g. fs). I am using typescript and webpack.

[0] ERROR in ./node_modules/tXml/tXml.js
Module not found: Error: Can't resolve 'fs' in '.../node_modules/tXml'

I have tried importing a few different ways without any success
import { xml as tXml, simplify } from 'tXml';
import * as tXml from 'tXml';

Also, it appears that either the README is out of date or the generated typescript definitions are incorrect, as the type definition does not match the README examples.

simplify is not invoked like described by the README.md

Hi,

README.md states:
tXml("<user is='great'><name>Tobias</name><familyName>Nickel</familyName><profession>Software Developer</profession><location>Shanghai / China</location></user>",{simplify:1});

however, simplify is not invoked like this. Rather it is invoked by:
tXml.simplify(tXml(xmlString));

kind regards
Martin

tXml.d.ts

Should the type be

export type tNode = {
tagName: string;
attributes: object;
children: (tNode | string | number)[];
};

?

rollup includes the entire node stream api into the bundle

As mentioned in the svgo PR, rollup includes too much unnecessary code in the bundle: svg/svgo#1301 (comment)

We should find a way to support proper tree shaking for smaller bundles.

Through2 does not look necessary. It can be replaced with const { PassThrough } = require('stream');

The reason for that is, that txml is pure js. There will be no issues about compiling any native modules. It is also much faster than the native module.

AFAIK rollup will still hoist lazy requires when converting to es modules. It would be better to split the stream-based extension into a separate entry point.

Unexpected close tag (<link>)

tXml fails to parse perfectly valid XML like this:

<a><link>foobar</link></a>

I get error "Error: Unexpected close tag" (for the record, fix is to give noChildNodes: [] as option for parse()).

This is kind of a documented feature: the documentation describes noChildNodes as "array of nodes, that have no children and don't need to be closed. Default is working good for html." and gives link as an example. However, it is surprising that parsing fails when a closing tag does exist (I am not sure, but I think a link tag can be closed?). It is also surprising that an XML parser, by default, fails to parse valid XML, even if this is stated in the documentation for those who read far enough, especially when everything seems to work at first and then crashes later on when a link tag makes its way into the XML.

So I would propose enhancements along these lines:

  1. Make parse() accept end tags for noChildNodes as well, if this is possible
  2. Update the documentation so that it is obvious that by default parse() is designed for HTML and may fail with XML
  3. Perhaps even separate functions with different defaults, like parseHtml() and parseXml(); this would also help e.g. with #44, so parseXml and parseHtml would by default decode entities and CDATA while parse() would stay backwards compatible

But hey, great parser in any case, worked just fine with a product using Rhino engine blocking access to everything otherwise able to decode XML...
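
In the meantime, the workaround from above can be wrapped once so it is not forgotten at individual call sites. A small sketch: makeXmlParser and parseXml are hypothetical names, and the parse argument is assumed to be txml.parse, passed in so the wrapper itself has no dependency.

```javascript
// Sketch: wrap a parse function so that no tag is treated as self-closing.
// `parse` is expected to be txml.parse; `makeXmlParser` and `parseXml` are
// made-up names for this example.
function makeXmlParser(parse) {
  return function parseXml(xml, options = {}) {
    // noChildNodes defaults to an empty array; caller options still win.
    return parse(xml, { noChildNodes: [], ...options });
  };
}
```

Usage would be const parseXml = makeXmlParser(txml.parse); followed by parseXml('<a><link>foobar</link></a>'), which forwards noChildNodes: [] on every call.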

MEI element attributes missing through simplify

@TobiasNickel thanks for providing this fantastic library!

I was trying to parse MEI xml code (music score encodings following https://music-encoding.org), using txml online.

However, it seems that simplify() ignores the attributes of the lowest level elements (that is in my example e.g., the attributes).

Example XML encoding:
https://raw.githubusercontent.com/trompamusic-encodings/Beethoven_WoOAnh5_BreitkopfHaertel/master/Beethoven_WoOAnh5_Nr1_1-Breitkopf.mei

Excerpt of simplified JSON output from txml online:

                              {
                                "layer": {
                                  "note": [
                                    "",
                                    ""
                                  ],
                                  "beam": {
                                    "note": [
                                      "",
                                      "",
                                      "",
                                      ""
                                    ],
                                    "_attributes": {
                                      "xml:id": "beam-0000001293813124"
                                    }
                                  },
                                  "_attributes": {
                                    "xml:id": "layer-0000000369478504",
                                    "n": "1"
                                  }
                                },
                                "_attributes": {
                                  "xml:id": "staff-0000000951146630",
                                  "n": "1"
                                }
                              },                             

index.d.ts errors in console

node_modules/txml/dist/index.d.ts:2:10 - error TS2552: Cannot find name 'transformSt
ream'. Did you mean 'TransformStream'?

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
           ~~~~~~~~~~~~~~~

  node_modules/typescript/lib/lib.dom.d.ts:13783:13
    13783 declare var TransformStream: {
                      ~~~~~~~~~~~~~~~
    'TransformStream' is declared here.

node_modules/txml/dist/index.d.ts:2:27 - error TS2552: Cannot find name 'filter'. Di
d you mean 'File'?

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
                            ~~~~~~

  node_modules/typescript/lib/lib.dom.d.ts:5014:13
    5014 declare var File: {
                     ~~~~
    'File' is declared here.

node_modules/txml/dist/index.d.ts:2:35 - error TS2304: Cannot find name 'getElementB
yId'.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
                                    ~~~~~~~~~~~~~~

node_modules/txml/dist/index.d.ts:2:51 - error TS2304: Cannot find name 'getElements
ByClassName'.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
                                                    ~~~~~~~~~~~~~~~~~~~~~~

node_modules/txml/dist/index.d.ts:2:75 - error TS2304: Cannot find name 'parse'.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
                                                                            ~~~~~

node_modules/txml/dist/index.d.ts:2:82 - error TS2304: Cannot find name 'simplify'.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
                                                                                   ~
~~~~~~~

node_modules/txml/dist/index.d.ts:2:92 - error TS2304: Cannot find name 'simplifyLos
tLess'.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
  
         ~~~~~~~~~~~~~~~~

node_modules/txml/dist/index.d.ts:2:110 - error TS2304: Cannot find name 'stringify'
.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
  
                           ~~~~~~~~~

node_modules/txml/dist/index.d.ts:2:121 - error TS2304: Cannot find name 'toContentS
tring'.

2 export { transformStream, filter, getElementById, getElementsByClassName, parse, s
implify, simplifyLostLess, stringify, toContentString };
  

Have a lot of errors and it's annoying.

Inconsistent interface for utility methods

I'm a bit confused about how to use the utility methods and I'm getting inconsistent outputs.

With the following HTML string

<head>
    <style>
        p { color: "red" }
    </style>
</head>

<body>
    <p>hello</p>
</body>

I get different results depending on how I use filter.

Following the README and the typescript types this example seems to be the correct one:

    xml(x).filter(element => {
      return element.tagName.toLowerCase() == 'style';
    })

However, it returns an empty array.

After digging into the unit tests, it appears that using filter as an option renders the desired output.

   xml(x, {
      filter: function (element: txml.INode) {
        return element.tagName.toLowerCase() == 'style';
      }
    } as any)

But this feels weird because it's not documented (I don't know if it will break in certain circumstances) and the typescript types don't reflect it

Dealing with whitespaces

  let str = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?><w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"><w:body><w:p><w:r><w:t>aaaaa</w:t></w:r><w:r><w:rPr><w:sz w:val="48"></w:sz></w:rPr><w:t>       </w:t></w:r><w:r><w:t>bbbbbb</w:t></w:r></w:p><w:sectPr><w:pgSz w:w="793.7008056640625" w:h="1122.5196533203125" w:orient="portrait"></w:pgSz><w:pgMar w:top="95.99999237060547" w:right="119.81102752685547" w:bottom="95.99999237060547" w:left="119.81102752685547" w:header="47.24409484863281" w:footer="47.24409484863281" w:gutter="0"></w:pgMar></w:sectPr></w:body></w:document>';
  const resut = txml.parse(str);

Note the <w:t> </w:t> part.
Checking the result object, that 'w:t' item has an empty 'children' array: children: [].

After reading source code, it seems that parseChildren function has the following lines:

var text = parseText()
if (text.trim().length > 0)
    children.push(text);
pos++;

text.trim() causes the issue. Is there any particular reason trim() is needed here? Or am I missing something in the process?

Thanks!
