NOTICE (May 2017): I'm no longer able to maintain this project or look at posted issues. If you're interested in maintaining, please let me know.

xmlutils.py

xmlutils.py is a set of Python utilities for processing XML files serially and converting them to various formats (SQL, CSV, JSON). The scripts use ElementTree.iterparse() to iterate through the nodes of an XML document, so the entire DOM never has to be loaded into memory. This lets them churn through large XML files (albeit slowly :P) without memory hiccups.
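
As an illustration of the streaming approach described above, here is a minimal sketch using only the standard library. It is not the code xmlutils itself runs; the helper name `iter_records` and the inline sample document are assumptions made for the example.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample document, kept inline so the sketch is self-contained.
SAMPLE = b"""<fruits>
  <item><name>apple</name><color>red</color></item>
  <item><name>banana</name><color>yellow</color></item>
</fruits>"""


def iter_records(source, tag):
    """Yield one dict per <tag> element without building the whole tree first."""
    # "end" events fire once an element and all of its children are parsed.
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield {child.tag: (child.text or "") for child in elem}
            # Drop the record's children and text once it has been consumed.
            elem.clear()


records = list(iter_records(io.BytesIO(SAMPLE), "item"))
print(records)
```

Because each record is cleared as soon as it has been yielded, memory use depends mostly on the size of a single record rather than on the size of the file.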

Simple XML files that represent tables can be converted to CSV using xmltable2csv. It assumes each entry is encapsulated in some tag, and it has been successfully tested on some XLSX files.

Blind conversion of XML to CSV and SQL is not recommended. It only works if the structure of the XML document is simple (flat). On the other hand, xml2json supports complex XML documents with multiple nested hierarchies. Lastly, the XML files are not validated at the time of conversion.
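
To make the flat versus nested distinction concrete, the sketch below turns a small nested document into a JSON-ready dictionary. It is a generic illustration, not xml2json's implementation, and the resulting layout is not claimed to match xml2json's output. The function `element_to_dict` and the sample document are hypothetical, attributes are ignored, and the whole document is parsed at once for brevity.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample: the first <item> has a nested <origin>, which a single
# CSV row or SQL row cannot represent without flattening decisions.
SAMPLE = """<fruits>
  <item><name>apple</name><origin><country>India</country></origin></item>
  <item><name>banana</name></item>
</fruits>"""


def element_to_dict(elem):
    """Recursively convert an element to nested dicts; leaves become strings."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = existing = [existing]
            existing.append(value)
        else:
            result[child.tag] = value
    return result


root = ET.fromstring(SAMPLE)
data = {root.tag: element_to_dict(root)}
print(json.dumps(data, indent=2))
```

The nested `origin` value has no natural column in a single table, which is why the tabular converters expect flat records.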

Installation

With pip or easy_install

pip install xmlutils or easy_install xmlutils

Or from the source

python setup.py install

Commandline utilities

Once the package is installed, the bundled command-line utilities should be available from the terminal.

xml2csv

Convert an XML document to a CSV file.

xml2csv --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "item"
Arguments
--input 	Input XML document's filename*
--output 	Output CSV file's filename*
--tag 		The tag of the node that represents a single record (Eg: item, record)*
--delimiter 	Delimiter for separating items in a row. Default is , (a comma followed by a space)
--ignore 	A space separated list of element tags in the XML document to ignore
--noheader 	Exclude CSV fields header (first line). Off by default
--encoding 	Character encoding of the document. Default is utf-8
--limit 	Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--buffer 	The number of records to be kept in memory before it is written to the output CSV file. Helps reduce the number of disk writes. Default is 1000
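
The effect of --buffer can be pictured with the following sketch, which collects records in a list and writes them out in batches. It is an assumption about the general technique, not xml2csv's actual code; `write_buffered` and its parameters are hypothetical names.

```python
import csv
import io


def write_buffered(records, out, fieldnames, buffer_size=1000):
    """Write dict records to `out` as CSV, flushing every `buffer_size` rows.

    Returns the number of batch writes performed.
    """
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    pending, flushes = [], 0
    for record in records:
        pending.append(record)
        if len(pending) >= buffer_size:
            writer.writerows(pending)
            pending.clear()
            flushes += 1
    if pending:  # write whatever is left after the last full batch
        writer.writerows(pending)
        flushes += 1
    return flushes


out = io.StringIO()  # stands in for the output file
rows = [{"name": "fruit%d" % i, "color": "red"} for i in range(5)]
flushes = write_buffered(rows, out, ["name", "color"], buffer_size=2)
print(flushes, "batch writes")
```

A larger buffer means fewer, larger writes at the cost of holding more records in memory at once.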

xmltable2csv

Convert an XML table to a CSV file.

xmltable2csv --input "samples/fruits.xml" --output "samples/fruits.csv" --tag "Data"
Arguments
--input         Input XML table's filename*
--output        Output CSV file's filename*
--tag           The tag of the node that represents a single record (Eg: Data, record)*
--delimiter     Delimiter for separating items in a row. Default is , (a comma followed by a space)
--header        Whether to print the header (first row of records in the XML) in the first line; 1=yes, 0=no. Default is 1.
--encoding      Character encoding of the document. Default is utf-8
--limit         Limit the number of records to be processed from the document to a particular number. Default is no limit (-1)
--buffer        The number of records to be kept in memory before it is written to the output CSV file. Helps reduce the number of disk writes. Default is 1000.

xml2sql

Convert an XML document to an SQL file.

xml2sql --input "samples/fruits.xml" --output "samples/fruits.sql" --tag "item" --table "myfruits"
Arguments
--tag           the record tag. eg: item
--table         table name
--ignore        list of tags to ignore
--limit         maximum number of records to process
--packet        maximum size of an insert query in MB (MySQL's max_allowed_packet)
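
One way to read the --packet option is as a cap on the size of each generated INSERT statement. The sketch below groups rows into multi-row INSERTs under a byte limit. It is a guess at the general idea, not xml2sql's implementation: `batch_inserts` is a hypothetical helper, the limit is given in bytes rather than MB, and the quoting is deliberately naive.

```python
def batch_inserts(table, columns, rows, max_bytes):
    """Group rows into multi-row INSERT statements of at most max_bytes each.

    A single row that is larger than max_bytes is still emitted on its own.
    """
    head = "INSERT INTO %s (%s) VALUES " % (table, ", ".join(columns))
    head_size = len(head.encode("utf-8"))
    statements, current, size = [], [], head_size
    for row in rows:
        # Naive quoting for illustration only; real code needs proper escaping.
        values = "(" + ", ".join(
            "'%s'" % str(v).replace("'", "''") for v in row
        ) + ")"
        # +2 conservatively covers the ", " separator or the closing ";".
        extra = len(values.encode("utf-8")) + 2
        if current and size + extra > max_bytes:
            statements.append(head + ", ".join(current) + ";")
            current, size = [], head_size
        current.append(values)
        size += extra
    if current:
        statements.append(head + ", ".join(current) + ";")
    return statements


rows = [("apple", "red"), ("banana", "yellow"), ("cherry", "red")]
statements = batch_inserts("myfruits", ["name", "color"], rows, max_bytes=90)
for statement in statements:
    print(statement)
```

Keeping each statement under the server's max_allowed_packet avoids having the server reject an oversized query.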

xml2json

Convert XML to JSON. xml2json supports hierarchies nested to any number of levels.

xml2json --input "samples/fruits.xml" --output "samples/fruits.json"

Modules

xmlutils.xml2sql

from xmlutils.xml2sql import xml2sql

converter = xml2sql("samples/fruits.xml", "samples/fruits.sql", encoding="utf-8")
converter.convert(tag="item", table="table")
Arguments
tag 	-- the record tag. eg: item
table	-- table name
ignore	-- list of tags to ignore
limit	-- maximum number of records to process
packet	-- maximum size of an insert query in MB (MySQL's max_allowed_packet)

Returns:
{	num: number of records converted,
	num_insert: number of sql insert statements generated
}

xmlutils.xml2csv

from xmlutils.xml2csv import xml2csv

converter = xml2csv("samples/fruits.xml", "samples/fruits.csv", encoding="utf-8")
converter.convert(tag="item")
Arguments
tag	-- the record tag. eg: item
delimiter -- csv field delimiter
ignore	-- list of tags to ignore
limit	-- maximum number of records to process
buffer	-- number of records to keep in buffer before writing to disk

Returns:
number of records converted

xmlutils.xml2json

from xmlutils.xml2json import xml2json

converter = xml2json("samples/fruits.xml", "samples/fruits.json", encoding="utf-8")
converter.convert()

# to get a json string
converter = xml2json("samples/fruits.xml", encoding="utf-8")
print(converter.get_json())
Arguments
pretty	-- pretty print?

Contributors

knadh, lorcan, dunglehome, jadavbheda, hellosputnik, egalev, ropp, billwanjohi, onemoretime
