weso / cwr-dataapi
CWR-DataApi
License: MIT License
Create a decoder which transforms a dictionary into a CWR class.
Create a decoder for JSON. It can build on the basic dictionary decoder.
Create an encoder for Mongo. It can build on the basic dictionary encoder.
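A minimal sketch of the decoder side, assuming hypothetical class names (FileTagDictionaryDecoder, FileTagJSONDecoder) and a stand-in FileTag with assumed fields; the Mongo encoder would mirror the same layering on top of the dictionary encoder.

    import json
    from collections import namedtuple

    # Stand-in for the library's FileTag model class, only for this sketch.
    FileTag = namedtuple('FileTag', 'year sequence_n sender receiver version')

    class FileTagDictionaryDecoder(object):
        """Hypothetical decoder turning a plain dictionary into a FileTag."""

        def decode(self, data):
            return FileTag(year=data['year'],
                           sequence_n=data['sequence_n'],
                           sender=data['sender'],
                           receiver=data['receiver'],
                           version=data['version'])

    class FileTagJSONDecoder(object):
        """JSON decoder reusing the dictionary decoder for the actual work."""

        def __init__(self):
            self._dict_decoder = FileTagDictionaryDecoder()

        def decode(self, text):
            # json.loads already produces a dictionary, so everything else is delegated.
            return self._dict_decoder.decode(json.loads(text))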
As with CWRTables, CWRConfiguration is not configurable.
This problem is reduced by the fact that it is meant to be set up by editing the configuration files, but the class is still referenced directly from several modules.
The CWRConfiguration class should implement an interface or an abstract class, and then be accessed through a factory or a facade.
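A rough sketch of that pattern, with made-up names (Configuration, FileConfiguration, ConfigurationFactory) and an illustrative configuration value; the real CWRConfiguration contents will differ.

    class Configuration(object):
        """Contract the rest of the modules would depend on."""

        def get_config_value(self, key):
            raise NotImplementedError()

    class FileConfiguration(Configuration):
        """Default implementation, loading the bundled configuration files."""

        def __init__(self, values):
            self._values = values

        def get_config_value(self, key):
            return self._values[key]

    class ConfigurationFactory(object):
        """Single place deciding which Configuration implementation is used."""

        _instance = None

        @classmethod
        def get_configuration(cls):
            if cls._instance is None:
                # Reading the actual configuration files would happen here.
                cls._instance = FileConfiguration({'default_version': '2.1'})
            return cls._instance

    # Modules ask the factory instead of importing CWRConfiguration directly.
    config = ConfigurationFactory.get_configuration()
    version = config.get_config_value('default_version')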
While there are classes in the model to represent transactions, right now they are just grouped into a plain collection.
It is necessary to find out which of the two approaches is better, checking both the simplest and the most complex cases of each type of transaction.
Executable Python files, such as tests, should have a shebang like:
'#!/usr/bin/env python'
The file encoding is indicated using:
'# -*- coding: utf-8 -*-'
Instead of:
'# -*- encoding: utf-8 -*-'
This should be corrected on all files.
The Transmission Trailer rule should indicate that this record ends with a line end.
But this causes an error when reading from a file, if the file does not end right at the end of the trailer.
I've been unable to replicate this with tests, so for now the Transmission Trailer rule lacks the end-of-line requirement.
The country code in the ISRC follows the ISO 3166-1 alpha-2 standard, and should be validated against it.
Currently it is not.
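A minimal sketch of the missing check, using a trimmed set of example codes; a real implementation would load the full ISO 3166-1 alpha-2 list, for instance from the CWR tables.

    # Trimmed example set; the full ISO 3166-1 alpha-2 list has around 250 codes.
    ISO_3166_1_ALPHA_2 = {'DE', 'ES', 'FR', 'GB', 'US'}

    def valid_isrc_country(isrc):
        """True if the first two characters of the ISRC are a known country code."""
        return len(isrc) >= 2 and isrc[:2].upper() in ISO_3166_1_ALPHA_2

    assert valid_isrc_country('ESA029800123')      # Spanish code, accepted
    assert not valid_isrc_country('QQA029800123')  # unknown code, rejected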
AgreementTerritory & Territory are very similar. It may be possible to swap AgreementTerritory for Territory, removing the former in the process.
The dictionary parser should be checked with a Group containing several transactions.
Some of the exceptions used in the grammar, mostly for field values, seem to be incorrectly initialized, and they will cause an 'unprintable exception' error when raised.
The CWR model classes should override the data model methods, such as the special methods __str__ or __repr__, where needed.
More information about this can be found at:
https://docs.python.org/2/reference/datamodel.html
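As an illustration of the kind of override meant here (the class and its fields are simplified stand-ins, not the actual model code):

    class AlternateTitle(object):
        """Simplified stand-in for one of the CWR model classes."""

        def __init__(self, title, title_type):
            self._title = title
            self._title_type = title_type

        def __str__(self):
            # Human-readable form, useful for the console printer.
            return '%s (%s)' % (self._title, self._title_type)

        def __repr__(self):
            # Unambiguous form, useful when inspecting collections of records.
            return 'AlternateTitle(title=%r, title_type=%r)' % (self._title, self._title_type)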
Classes created from ValueEntity seem to be all the same. It may be possible to remove them all, using only the base ValueEntity.
There seem to be several variations of the ISRC field. There must be a way to accept them all.
Otherwise, a text field should be used.
The current implementation was prepared for revision 3, but the latest is revision 13. Check what has changed and apply the changes.
The dictionary parser should be checked with a Transmission containing several Groups.
NATRecord and NRARecordWork are very similar. Maybe they can be partially combined.
Right now Travis runs a few hundred tests. While it is necessary to add many more to check the parser, it is also needed to somehow simplify the cases where the same groups of tests are repeated, as happens for example with the field grammar variations.
The distribution file contains .pyc files in the data folder.
They should be removed.
The CWRFileDecoder class only decodes the filename into a FileTag if it follows the new naming format.
It should first try the new format and, if that fails, try again with the old format. A second failure means the filename can't be decoded.
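The fallback could look roughly like this; the two helpers are placeholders standing in for the existing new-format decoding and the old-format decoding still to be written.

    def _decode_new_format(filename):
        # Placeholder for the current new-format parsing; raises ValueError on failure.
        raise ValueError(filename)

    def _decode_old_format(filename):
        # Placeholder for the old-format parsing to be added.
        raise ValueError(filename)

    def decode_filename(filename):
        """Tries the new naming convention first, then falls back to the old one."""
        try:
            return _decode_new_format(filename)
        except ValueError:
            # If this second attempt also raises, the filename just can't be decoded.
            return _decode_old_format(filename)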
The classes to parse into and from JSON should be redone, adding new tests for them.
Check if ComponentRecord and AuthoredWorkRecord can be at least partially combined.
There are many record constraints missing which should be added.
When the same Interested Party appears several times in the file, a new instance containing all its data is created each time.
Instead, it may be a good idea to create a single instance for the Party and then reuse it each time it appears.
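One possible shape for that reuse is a registry keyed by the Interested Party id; all names here are illustrative.

    class InterestedPartyRegistry(object):
        """Keeps a single instance per Interested Party id and hands it back on repeats."""

        def __init__(self):
            self._parties = {}

        def get_or_create(self, ip_id, factory):
            # 'factory' builds the instance only the first time the id is seen;
            # later appearances of the same party receive the stored instance.
            if ip_id not in self._parties:
                self._parties[ip_id] = factory()
            return self._parties[ip_id]

    registry = InterestedPartyRegistry()
    first = registry.get_or_create('IP123', lambda: {'id': 'IP123'})
    second = registry.get_or_create('IP123', lambda: {'id': 'IP123'})
    assert first is second  # the same object is reused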
Right now only the first kind of CWR file is supported, the one sent to the receiver for processing.
A second type, the acknowledgement file, is created from the first and indicates the results of processing it.
While this is closely related to the validation process, the parser, the console printer, and any other piece using the model classes should support reading Acknowledgement files.
Check if NOWRecord and NPRRecord can be partially combined.
This is not required, but it would be nice to add Jython support.
In practice, it would mean adding Jython to the Tox test environments. I've already tried doing so, but Travis was unable to run it (a problem with Java versions).
Still, some basic configuration, including a script, remains in the project, waiting for another try.
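For reference, adding Jython to the Tox environments would roughly mean extending the env list as below; this is an untested sketch and still depends on the CI machine providing a working Jython and Java setup.

    [tox]
    envlist = py27, py33, jython

    [testenv:jython]
    # Assumes a 'jython' interpreter is available on the PATH of the test machine.
    basepython = jython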
Move most of the CWR details to another document (maybe the Github page?), and use the wiki just as a quick guide for the library.
Check the InterestedParty class.
Is it really needed? Should it be removed? Should it be used more often?
Some related classes which don't use it:
After reading a CWR file and creating the model, it should be possible to save it back into a file.
This is mostly required for generating Acknowledgement files.
The CWRFile and FileTag classes should be parsed into dictionaries.
Create an encoder for JSON. It can build on the basic dictionary encoder.
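The dictionary step could be as small as the function below, mirroring the decoder sketch above; the attribute names are assumptions.

    def file_tag_to_dict(tag):
        """Turns a FileTag-like object into a plain dictionary."""
        return {'year': tag.year,
                'sequence_n': tag.sequence_n,
                'sender': tag.sender,
                'receiver': tag.receiver,
                'version': tag.version}

    # json.dumps(file_tag_to_dict(tag)) then covers the JSON encoder on top of it.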
While CWRTables is useful for keeping all the CWR table info and files in a single place, the parser may be too dependent on it.
Right now it is used only for the Lookup fields in the grammar.field.table module, which reduces the coupling, but this can be improved.
It should be possible to implement a custom version of this class, so an interface or abstract base class should be defined, and an implementation should then be made accessible through a factory or a facade.
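Following the same idea as with the configuration, a sketch of how a table source could be injected into the lookup field creation instead of reaching for CWRTables directly; the names and the get_data() method are illustrative, not the current API.

    import pyparsing as pp

    class InMemoryTableSource(object):
        """Custom replacement for CWRTables, e.g. for tests; only needs get_data()."""

        def __init__(self, tables):
            self._tables = tables

        def get_data(self, table_id):
            return self._tables[table_id]

    def lookup_field(table_id, source):
        # The field depends only on the source's interface, not on CWRTables itself.
        return pp.oneOf(source.get_data(table_id))

    source = InMemoryTableSource({'language_codes': ['EN', 'ES', 'FR']})
    field = lookup_field('language_codes', source)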
Check if the Writer and Publisher records can be at least partially combined.
AlternateTitleRecord and NATRecord are very similar. Check if they can be combined.
There is a problem with grammar fields where sometimes the base field name is returned instead of the one given to the current field.
For example, after creating a numeric field and naming it 'Field 1', its name may still be 'Numeric Field'. This seems to be a problem related to fields being a combination of optional rules.
The easiest way to solve this is to add a 'name' parameter to the field creation methods.
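A sketch of the proposed 'name' parameter using pyparsing's setName; the factory signature is hypothetical.

    import pyparsing as pp

    def numeric_field(columns, name=None):
        """Creates a fixed-width numeric field, optionally giving it a specific name."""
        field = pp.Word(pp.nums, exact=columns)

        if name is None:
            name = 'Numeric Field'
        # setName controls the label shown in messages and string output,
        # setResultsName the key used to access the value in the parse results.
        field.setName(name)
        field = field.setResultsName(name)

        return field

    field = numeric_field(3, name='Field 1')
    # str(field) now reports 'Field 1' instead of the generic 'Numeric Field'.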
EntireWorkTitle & OriginalWorkTitle seem to be equivalent. This must be checked, and one of them removed if so.
Right now the validation is added manually to the nodes.
There should be a system where a node is assigned a constraint identifier, and the validation is then configured from that, as sketched below.
It should be noted that the validation configuration is composed of, at least, the following pieces (check the CWR and error specifications for the actual requirements):
They should be configurable for two reasons:
It would be better if this configuration were set in a file which could be easily modified.
Also, it should be possible to completely deactivate the validation system for testing purposes, or to swap it for another one.
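A rough sketch of that direction: constraints registered by identifier, wired up from configuration, and easy to switch off for tests. All names and the example constraint are made up.

    class ConstraintRegistry(object):
        """Maps constraint identifiers to validation callables."""

        def __init__(self, enabled=True):
            self._constraints = {}
            self._enabled = enabled

        def register(self, constraint_id, check):
            self._constraints[constraint_id] = check

        def validate(self, constraint_id, value):
            # With validation deactivated (e.g. for tests) everything passes.
            if not self._enabled:
                return []
            return self._constraints[constraint_id](value)

    # The configuration, ideally read from a file, assigns constraints to nodes.
    registry = ConstraintRegistry(enabled=True)
    registry.register('length_11', lambda v: [] if len(v) == 11 else ['wrong length'])

    errors = registry.validate('length_11', '12345678901')  # -> []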
Controlled publishers stored in the CWR file represent a tree which indicates the relationship between them and the territories.
The model contains classes to build this tree, but currently it is not being built.
Make sure the parser takes care of this.
The leaveWhitespace() method is used too often. It should be required only on the fields.
That is, only the terminal rules of the grammar should care about keeping or skipping whitespace.
Right now the model instances are created in the same module that contains the rules.
Instead, a factory should be used to decouple the parser from the model.
Field factories should be fully configurable. For example, the adapters are right now hardcoded, but should be set through parameters or a configuration file.
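A sketch of that decoupling, with the rule delegating to a decoder looked up in a factory instead of building the model class inline; the names and the registered decoder are illustrative.

    import pyparsing as pp

    class DecoderFactory(object):
        """Resolves which decoder builds the model instance for each rule."""

        def __init__(self):
            self._decoders = {}

        def register(self, rule_id, decoder):
            self._decoders[rule_id] = decoder

        def get_decoder(self, rule_id):
            return self._decoders[rule_id]

    factory = DecoderFactory()
    # The registered decoder is the only piece that knows about the model classes.
    factory.register('alternate_title', lambda parsed: {'alternate_title': parsed[0]})

    rule = pp.Word(pp.alphas)
    rule.setParseAction(lambda parsed: factory.get_decoder('alternate_title')(parsed))

    result = rule.parseString('SOMETITLE')[0]  # the decoded object, not raw tokens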
The name of some fields changes from one model class to another. They should all have the same name.
Also, the field names should be as close to the specification as possible.
The grammar for the fields should be homogeneous. Try to make all of them be created through methods with the same kind of parameters (columns, compulsory, name, etc.).
At least during the tests, the parsing process seems to be slow. Try to make it as fast as possible.
A Github page, giving details about the project and its background, would help a lot to make the library usable for third parties.
Maybe Sphinx would help here?
Add support for Transactions in the Dictionary encoders.
Create a decoder for Mongo. It can build on the basic dictionary decoder.
As parsing a file takes a long time, a logger would help to know whether everything is going as expected.
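A minimal sketch of what that logging could look like around the decoding loop; the line-by-line structure is assumed.

    import logging

    logger = logging.getLogger('cwr.parser')

    def decode_lines(lines):
        logger.info('Starting to decode %d lines', len(lines))
        results = []
        for i, line in enumerate(lines, start=1):
            if i % 1000 == 0:
                # Periodic progress messages so long runs show signs of life.
                logger.info('Decoded %d of %d lines', i, len(lines))
            results.append(line.strip())  # the actual record parsing would go here
        logger.info('Finished decoding')
        return results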