ONIX-Data

This solution provides a C# library, in both .NET Framework and .NET Standard versions, that offers .NET data structures (and an accompanying set of parsers) for the ONIX XML format, the international standard for representing electronic data about books (and other media). The format was established by EDITEUR, the international book trade body. Within this solution, you will find two collections of classes for serialization/deserialization: one that represents the legacy format (i.e., 2.1 and earlier) and another that represents the current format (i.e., 3.0). In addition, two parser classes are included to help populate those collections.

Even though the "sunset date" for the legacy version 2.1 has passed, many (if not most) organizations still use 2.1, and it will likely remain in use for the near future.

Unfortunately, since validation of ONIX files has proven problematic on the .NET platform, there is an accompanying Java project that can serve to validate those files instead.

NOTE: The Framework project is now considered to be deprecated. All future development will only occur in the Standard project.

Requirements

  • Visual Studio 2019 (at least)
  • An unconditional love for an XML tag collection that attempts to cover the ontology of the known universe.

ONIX Editions Handled

  • ONIX 3.0 (short tags)
  • ONIX 3.0 (reference tags)
  • ONIX 2.1.3 and earlier (short tags)
  • ONIX 2.1.3 and earlier (reference tags)

NOTE: Even though this project addresses many tags of both ONIX versions, it does not currently parse all of them, especially in the case of ONIX 3.0 (which appears to aim at supporting the ontology of the known universe). If you find something unsupported that you need, you can create an issue within this repo, and I will attempt to address it in my free time. (Or you can implement it on your own and then open a pull request.) The same applies to any features that could be incorporated into the Extensions folder (like autocorrection with ChatGPT, etc.).

For Large ONIX Files

When parsing larger ONIX files (typically anything greater than 250 MB), it's strongly encouraged to use the OnixLegacyPlusParser class (for ONIX 2.1) and the OnixPlusParser class (for ONIX 3.0). These two classes are used just like the OnixLegacyParser and OnixParser classes, and they will help the user to avoid out-of-memory exceptions.
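The size guidance above can be turned into a small helper. This is only a sketch: the class `OnixParserChooser` and its methods are hypothetical names that are not part of the library, and treating 1 MB as 1024 * 1024 bytes is an assumption. It returns the name of the parser class that the guidance points to and leaves the construction of the parser to the caller.

```csharp
using System;
using System.IO;

public static class OnixParserChooser
{
    // Rule of thumb from the text above: anything greater than about 250 MB is "large".
    // Assumption: 1 MB is counted as 1024 * 1024 bytes.
    public const long LargeFileThresholdBytes = 250L * 1024 * 1024;

    // Returns the name of the parser class suggested for the given size and format.
    public static string SuggestParserName(long fileSizeInBytes, bool isLegacyFormat)
    {
        bool isLarge = fileSizeInBytes > LargeFileThresholdBytes;

        if (isLegacyFormat)
            return isLarge ? "OnixLegacyPlusParser" : "OnixLegacyParser";

        return isLarge ? "OnixPlusParser" : "OnixParser";
    }

    // Convenience overload that reads the size from the file on disk.
    public static string SuggestParserName(FileInfo onixFile, bool isLegacyFormat)
    {
        return SuggestParserName(onixFile.Length, isLegacyFormat);
    }
}
```

For example, `OnixParserChooser.SuggestParserName(new FileInfo(sFilepath), false)` could be logged before parsing to confirm which parser class is appropriate for an ONIX 3.0 file.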

Notes

There is one caveat to know before using any of the parsers: the ONIX-Data project performs mandatory preprocessing on the ONIX file before doing any actual parsing. These changes merely substitute real-world characters for the ONIX encodings (found in the ONIX DTD), which produces the same output as parsing with a DTD. These replacements change the file itself, and they can take a few minutes to finish (roughly 6-8 minutes per 400 MB), depending on the machine's specs. So, if you value the original copy of your ONIX file (i.e., with non-standard ONIX encodings), be sure to create a backup copy beforehand.
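Since the preprocessing rewrites the file in place, a backup can be made with the standard `System.IO` calls before any parser is constructed. The sketch below is one possible approach, not something the library provides: the class `OnixBackup`, the method name, and the ".bak" suffix are all hypothetical choices.

```csharp
using System;
using System.IO;

public static class OnixBackup
{
    // Copies the ONIX file next to the original with a ".bak" suffix (a hypothetical
    // naming choice) and returns the backup path. An existing backup is left untouched,
    // so the first (unmodified) copy is never overwritten by a later run.
    public static string BackUpBeforeParsing(string onixFilepath)
    {
        if (!File.Exists(onixFilepath))
            throw new FileNotFoundException("ONIX file not found.", onixFilepath);

        string backupPath = onixFilepath + ".bak";

        if (!File.Exists(backupPath))
            File.Copy(onixFilepath, backupPath);

        return backupPath;
    }
}
```

Calling `OnixBackup.BackUpBeforeParsing(sFilepath)` before creating the parser keeps a copy of the file as it was delivered.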

The parsers also have an optional preprocessing step (invoked via the constructor), which performs other friendly edits (like removing malformed HTML encodings, etc.) that clean the file of any suspicious characters. These characters can cause the Microsoft XML libraries to throw an exception.

If you would like to become better acquainted with the legacy format of the ONIX standard, you can find documentation and relevant files (XSDs, DTDs, etc.) on the archive page of EDITEUR.

If you would like to become better acquainted with the current version of the ONIX standard, you can find documentation and relevant files (XSDs, DTDs, etc.) on the current page of EDITEUR.

Projects

Project Source | Nuget_Package | Description
OnixData | https://www.nuget.org/packages/ONIX-Data/ | This C# library serves to provide .NET data structures (and an accompanying set of helpful parsers) for the ONIX XML format.
OnixData.Standard | https://www.nuget.org/packages/ONIX-Data.Standard/ | Packaged as a .NET Standard library, this C# library serves to provide .NET data structures (and an accompanying set of helpful parsers) for the ONIX XML format.
OnixData.Standard.Benchmarks | - | This project benchmarks the Standard version of the library, running its own simple tests against a variety of sample sizes and providing reports of its performance.
OnixData.Standard.BaseTests | - | This library contains more thorough unit tests of several ONIX sample files, which are then used to validate the library against various Microsoft frameworks.
OnixData.Standard.NetFrameworkTests | - | This project uses the BaseTests project to run unit tests against the .NET 4.6 framework.
OnixData.Standard.CoreTests | - | This project uses the BaseTests project to run unit tests against the .NET Core framework.
OnixData.Standard.Net5Tests | - | This project uses the BaseTests project to run unit tests against the .NET 5 framework.
OnixTestHarness | - | This project is a simple test harness that provides some use cases on how to use the ONIX-Data parser.

Usage Examples

// An example of using the ONIX parser for the contemporary ONIX standard (i.e., 3.0)
int nOnixPrdIdx = 0;
string sFilepath = @"YourVer3OnixFilepath.xml";

FileInfo CurrentFileInfo = new FileInfo(sFilepath);
using (OnixParser V3Parser = new OnixParser(CurrentFileInfo, true))
{
    OnixHeader Header = V3Parser.MessageHeader;

    foreach (OnixProduct TmpProduct in V3Parser)
    {
        string tmpISBN = TmpProduct.ISBN;

        var Title       = TmpProduct.Title;
        var Author      = TmpProduct.PrimaryAuthor;
        var Language    = TmpProduct.DescriptiveDetail.LanguageOfText;
        var PubDate     = TmpProduct.PublishingDetail.PublicationDate;
        var SeriesTitle = TmpProduct.SeriesTitle;
        var USDPrice    = TmpProduct.USDRetailPrice;

        var BarCodes = TmpProduct.OnixBarcodeList;

        /*
         * The IsValid method will inform the caller if the XML within the Product tag is invalid due to syntax
         * or due to invalid data types within the tags (i.e., a Price with text).
         *
         * (The functionality to fully validate the product in accordance with the ONIX standard is beyond the scope
         * of this library.)
         *
         * If the product is valid, we can use it; if not, we can record its issue.  In this way, we can proceed 
         * with parsing the file, without being blocked by a problem with one record.
         */
        if (TmpProduct.IsValid())
        {
            System.Console.WriteLine("Product [" + (nOnixPrdIdx++) + "] has EAN(" +
                                     TmpProduct.EAN + ") and USD Retail Price(" + TmpProduct.USDRetailPrice.PriceAmount +
                                     ") - HasUSRights(" + TmpProduct.HasUSRights() + ").");
                                     
            /*
            * For 1-to-many composites, where a product can have more than one subitem (like Contributor), you should
            * use the lists that have a prefix of 'Onix', so that you can avoid having to detect whether or not the
            * reference or short composites have been used.
            */
            if (TmpProduct.DescriptiveDetail.OnixContributorList != null)
            {
                foreach (var TmpContrib in TmpProduct.DescriptiveDetail.OnixContributorList)
                {
                    System.Console.WriteLine("\tAnd has a contributor with key name (" + TmpContrib.KeyNames + ").");
                }
            }                                         
        }
        else
        {
            System.Console.WriteLine(TmpProduct.GetParsingError());
        }
    }
}

// An example of using the ONIX parser for the legacy ONIX standard (i.e., 2.1)
int nLegacyShortIdx = 0;
string sLegacyShortFilepath = @"YourOnixFilepath.xml";
using (OnixLegacyParser onixLegacyShortParser = new OnixLegacyParser(new FileInfo(sLegacyShortFilepath), true))
{
    OnixLegacyHeader Header = onixLegacyShortParser.MessageHeader;

    // Check some values of the header

    foreach (OnixLegacyProduct TmpProduct in onixLegacyShortParser)
    {
        string Ean = TmpProduct.EAN;

        /*
         * The IsValid method will inform the caller if the XML within the Product tag is invalid due to syntax
         * or due to invalid data types within the tags (i.e., a Price with text).
         *
         * (The functionality to fully validate the product in accordance with the ONIX standard is beyond the scope
         * of this library.)
         *
         * If the product is valid, we can use it; if not, we can record its issue.  In this way, we can proceed 
         * with parsing the file, without being blocked by a problem with one record.
         */
        if (TmpProduct.IsValid())
        {
            System.Console.WriteLine("Product [" + (nLegacyShortIdx++) + "] has EAN(" +
                                     TmpProduct.EAN + ") and USD Retail Price(" + TmpProduct.USDRetailPrice.PriceAmount +
                                     ") - HasUSRights(" + TmpProduct.HasUSRights() + ").");
                                     

            /*
             * For 1-to-many composites, where a product can have more than one subitem (like Contributor), you should
             * use the lists that have a prefix of 'Onix', so that you can avoid having to detect whether or not the
             * reference or short composites have been used.
             */
            if (TmpProduct.OnixContributorList != null)
            {
                foreach (OnixLegacyContributor TempContrib in TmpProduct.OnixContributorList)
                {
                    System.Console.WriteLine("\tAnd has a contributor with key name (" + TempContrib.KeyNames + ")."); 
                }
            }
        }
        else
        {
            System.Console.WriteLine(TmpProduct.GetParsingError());
        }
    }
}

onix-data's People

Contributors

danieloppenlander, dgil-unedbarbastro, jaerith, szolkowski, vegardlarsen

onix-data's Issues

serialization

I see in the sample code that we can deserialize. I checked the source code, and I can't find how to serialize.

// something like this
OnixProductId s = new OnixProductId();
s.ProductIDType = 15;
s.IDValue = "99999999999999";

OnixData.Version3.OnixProduct p = new OnixData.Version3.OnixProduct();
p.productidentifier = new OnixProductId[] { s };

// Save to XML?

Thanks!

Determine Onix file version

Hi Jaerith,

Thank you again for making the time for the issue I had with the Default header price type codes. I appreciate it a lot.

I was wondering (when you have time) if there was a way to determine which ONIX version was sent to us.

Thank you!
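One possible approach, sketched here with the standard `System.Xml` reader rather than with anything from the library itself: ONIX 3.0 messages carry a `release` attribute on the root element, so reading only that element is enough to tell the versions apart in many cases. The class and method names below are hypothetical, and treating a missing attribute as "probably legacy" is an assumption, since 2.1 files frequently omit it.

```csharp
using System;
using System.IO;
using System.Xml;

public static class OnixVersionSniffer
{
    // Reads only as far as the root element and returns its "release" attribute
    // (e.g. "3.0"), or null when the attribute is absent.
    public static string GetReleaseAttribute(TextReader input)
    {
        var settings = new XmlReaderSettings
        {
            // Legacy files usually start with a DOCTYPE; skip it instead of throwing,
            // and never fetch the external DTD.
            DtdProcessing = DtdProcessing.Ignore,
            XmlResolver = null
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            reader.MoveToContent(); // positions the reader on the root element
            return reader.GetAttribute("release");
        }
    }

    // Assumption: a missing release attribute is treated as a legacy (2.1 or earlier) file.
    public static bool LooksLikeVersion3(TextReader input)
    {
        string release = GetReleaseAttribute(input);
        return release != null && release.StartsWith("3.", StringComparison.Ordinal);
    }
}
```

With a file on disk, this could be called as `OnixVersionSniffer.LooksLikeVersion3(new StreamReader(path))` before deciding between the legacy and current parsers.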

Missing Price.Territory.CountriesIncluded

Hi,

I'm trying to filter the price list by country using the field Territory.CountriesIncluded, but I'm not able to find it.

When I inspect the OnixData.Version3.Price.OnixPrice object, I can't find it.

Am I searching in the wrong place?

Regards,
Dani

ONIX-Data + SQUIDEX CMS

Hi, first of all I want to say that your project is very interesting. I haven't found any other like it, open source, .net, onix. Secondly, I was wondering if you knew about Squidex the .NET Headless CMS. My end goal is to build a system to send, receive, and store Book Metadata.

Could I theoretically use your project for metadata ingestion? What would be the best-practice way of doing it?

Header DefaultPriceTypeCode isn't used when no PriceTypeCode found in Product

Hello jaerith,

First off, I want to thank you for this amazing project. It has made parsing ONIX legacy and 3.0 files easy. I've had to make some additions to the ONIX 3 parsing, which was easy to do.

I was wondering if you have an idea on how to refer to the header DefaultPriceTypeCode when no PriceTypeCode is found in the Product segment?

For example...

                    bHasUSDPrice =
                        TmpSupplyDetail.OnixPriceList.Any(x => x.HasSoughtPriceTypeCode() && (x.CurrencyCode == "USD"));

Returns false because PriceType is -1...

    public bool HasSoughtPriceTypeCode()
    {
        return CONST_SOUGHT_PRICE_TYPES.Contains(this.PriceType);
    }

Yet, there is a default Price Type in the Header.

Thank you.
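The fallback being asked about can be expressed on its own, independent of the library's classes. In this sketch, the class `PriceTypeFallback` and both method names are hypothetical, and the price type codes are passed in as plain integers; the only detail taken from the report above is that a missing PriceType shows up as -1.

```csharp
using System;

public static class PriceTypeFallback
{
    // -1 is the "not present" value reported in the issue above.
    public const int MissingPriceType = -1;

    // Uses the product-level price type when present, otherwise the header default.
    public static int ResolvePriceType(int productPriceType, int headerDefaultPriceType)
    {
        return productPriceType != MissingPriceType
            ? productPriceType
            : headerDefaultPriceType;
    }

    // Checks the effective (resolved) price type against the set of sought types.
    public static bool HasSoughtPriceType(int productPriceType,
                                          int headerDefaultPriceType,
                                          int[] soughtPriceTypes)
    {
        int effectiveType = ResolvePriceType(productPriceType, headerDefaultPriceType);
        return Array.IndexOf(soughtPriceTypes, effectiveType) >= 0;
    }
}
```

A caller would read the default price type from the message header once and pass it in for every price whose own type is missing.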

New Tag Request

Hello,

I was wondering if the following tag could be added to both parsers?

B044 - Biographical Notes

Thank you.

The 'xmlns' attribute is bound to the reserved namespace 'http://www.w3.org/2000/xmlns/'

Hi,
I'm trying to use the ONIX-Data parser with ONIX files from a provider and I'm getting this exception:

  Message: 
    System.ArgumentException : The 'xmlns' attribute is bound to the reserved namespace 'http://www.w3.org/2000/xmlns/'.
  Stack Trace: 
    XmlTextWriter.WriteStartAttribute(String prefix, String localName, String ns)
    XmlDOMTextWriter.WriteStartAttribute(String prefix, String localName, String ns)
    XmlAttribute.WriteTo(XmlWriter w)
    XmlElement.WriteStartElement(XmlWriter w)
    XmlElement.WriteElementTo(XmlWriter writer, XmlElement el)
    XmlElement.WriteTo(XmlWriter w)
    XmlNode.get_OuterXml()
    OnixEnumerator.MoveNext()
    MyService.ParseOnixFile(String filePath) line 42
    MyServiceTests.ParseOnixFile_ShouldReturnOK() line 34
    GenericAdapter`1.GetResult()
    AsyncToSyncAdapter.Await(Func`1 invoke)
    TestMethodCommand.RunTestMethod(TestExecutionContext context)
    TestMethodCommand.Execute(TestExecutionContext context)
    SimpleWorkItem.PerformWork()

The ONIXMessage and all Product XML nodes have the namespace attribute xmlns="http://ns.editeur.org/onix/3.0/reference".

If I remove all the xmlns attributes from the file then the parsing works.

Is there some way to get the parser working without modifying the original ONIX files?

Thanks in advance.
Best Regards,
Dani

Extending project with GitHub Actions (CI)

Hello,

I wanted to learn GitHub Actions and used my fork of this project as a playground. I think it could also be added to the main project, since it is always good to have CI working for projects. I prepared the workflows just for the NETStandard part, since that is where the tests are.

You can take a look at how it looks in my fork:

https://github.com/Stanislaw000/ONIX-Data/actions

PR: szolkowski#6 (you need to click on show details box).

Tests results: https://github.com/Stanislaw000/ONIX-Data/runs/3182479870?check_suite_focus=true

Workflow files are in pull request #18

If you prefer to change something or customize let me know, so we can improve this solution.

Create Onix file

Hello jaerith, first I want to congratulate you on this excellent project.

I work at a publisher in Brazil, and starting next month, our system should generate Onix files to insert in the portals of Google, Amazon, Kobo and Apple.

I loved your project and would like to know if I can generate ONIX files with it.
I made an example and it generates the file with duplicate Tags.

I also don't know if it is possible to insert "refname" in short tags, or to insert version and xmlns in ONIXmessage.

Can you help me?

OnixLegacyPlusParser skipping every 2nd record

Hi there,

The OnixLegacyPlusParser (haven't tried this with OnixPlusParser) is skipping every 2nd record when the file doesn't have line separators.

When there are line separators between fields, it works fine.

serializer and deserializer ?

Hi,
Great work.

Is there a way to serialize a given onix file with the parser or do you have any example on how to serialize it?
I am currently using the XmlSerializer with typeof OnixMessage, but in most cases the OnixMessage is null and I need to get the list of OnixProducts from the foreach you showed in the README.

Any suggestions?

SupportingResource node support

Hello,

I have added SupportingResource support on .NET Standard branch. PR: #14

Changes are done both in NET Standard project and old OnixData.
I also propagated changes from PR: #11 to .NET Standard branch to both projects.

.NET core/ .NET 5 version

Hello,

First of all I must say great work with this library!

  1. Is there any known problem/issues/blocker with porting this library to .NET core from classic .NET framework?
  2. Do you accept contributions to your repo?

Legacy Plus Parser crashes at "MoveNext"

Hello,

When an error occurs during the MoveNext function in the Legacy Plus Parser, the reading of the file stops completely.
In the Legacy Parser, it does not do that.

Legacy Plus MoveNext does not return a true on the bResult when this happens.
It doesn't do that in the Legacy unless the # of products count is finished.

Thank you.

Out Of Memory Issues

Hi Jaerith,

Thank you again for help on the other items! It has been very helpful!

I'm now getting a lot of OutOfMemory errors on big files. These files are in the 800 MB range.

The errors happen during the processing in OnixData.

Thursday, March 4, 2021 11:18:32 AM: exception processing Single file: System.OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.
at System.String.CtorCharArrayStartLength(Char[] value, Int32 startIndex, Int32 length)
at System.Xml.XmlTextReaderImpl.get_Value()
at System.Xml.XmlLoader.LoadNode(Boolean skipOverWhitespace)
at System.Xml.XmlLoader.LoadDocSequence(XmlDocument parentDoc)
at System.Xml.XmlLoader.Load(XmlDocument doc, XmlReader reader, Boolean preserveWhitespace)
at System.Xml.XmlDocument.Load(XmlReader reader)
at OnixData.OnixParser.get_MessageHeader() in Z:\Visual Studio Projects\ONIX-Data-master\OnixData\OnixParser.cs:line 169
at OnixData.OnixEnumerator.MoveNext() in Z:\Visual Studio Projects\ONIX-Data-master\OnixData\OnixParser.cs:line 316

I was wondering if you had any ideas on what I can do to try and have the full file process.

Thanks again!

Missing fields TaxRatePercent and CountryOfPublication

Hi,
I need to receive those fields which currently are not being parsed:

  • OnixPriceTax.TaxRatePercent
  • OnixPublishingDetail.CountryOfPublication

I've already added them to my local clone of the ONIX-Data repo. If you want, I can create a pull request so we can contribute to this repo.

Regards,
Daniel
