catmandu's Introduction

LibreCat - an institutional repository

LibreCat is a new institutional repository system developed by the LibreCat Group. Its key features are:

  • institutional repository
  • publication list manager for researchers
  • institutional research data archive.

Development started in 2013 in Bielefeld, and the code was made available on GitHub from the start. The code has been in production at Bielefeld since 2015. In 2016 Ghent University started using the cataloging backend in production.

Features

  • Cataloging of many record types: Book, Book (Editor), Book Chapter, Book Review, Conference Abstract, Conference (Editor), Conference Paper, Dissertation, Encyclopedia Article, Journal Article, Special Issue, Newspaper Article, Preprint, Report, Translation, Translation (Section), Working Paper, Thesis, Research Data, Project, Award, Research Group
  • Drag and drop upload of full-text publications
  • Copycat from DOI, PubMed, arXiv and Web of Science
  • Google Scholar indexation support
  • Citation styles configurable from Zotero Style Repository
  • Full MathJax LaTeX support to add mathematical formulas in abstracts and titles
  • Pluggable authentication modules
  • Delegate input and management to other users
  • Multilingual support
  • ElasticSearch indexing
  • Pluggable file store backend
  • Command line support using 'Catmandu'
  • OAI-PMH and SRU
  • REST / content negotiation
  • Signposting
  • LibreCat is open source and shipped under the same license as the Perl language: http://dev.perl.org/licenses/

Install

See our Wiki at: https://github.com/LibreCat/LibreCat/wiki

catmandu's Issues

Refactor the Fix Path language in a separate module

It would be cleaner for Catmandu::Fix to put the Fix Path parser ("foo.bar.1.*.name") in a separate module. This would make the Path language easier to document and test, and might provide a way to plug in other Path parsing modules.
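
To make the idea concrete, here is a minimal sketch of what a standalone path parser could return. It is written in Python only to keep it self-contained; the function name `parse_path` and the segment kinds are assumptions for illustration, not Catmandu's actual API, and treating every all-digit segment as a list index is a simplification.

```python
import re

def parse_path(path):
    """Split a dotted path such as "foo.bar.1.*.name" into typed segments.

    Hypothetical helper: names and segment kinds are illustrative only.
    """
    segments = []
    for part in path.split("."):
        if part == "*":
            segments.append(("wildcard", None))     # every element of a list
        elif re.fullmatch(r"\d+", part):
            segments.append(("index", int(part)))   # a list position
        elif part.startswith("$"):
            segments.append(("special", part[1:]))  # e.g. $append
        else:
            segments.append(("key", part))          # a hash key
    return segments
```

With the parser isolated like this, each segment kind can be documented and unit-tested without touching the code that applies a fix.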

retain_key fix should support multiple keys

Delete every key from foo except bar and doz:

retain_field('foo.bar','foo.doz');

One might also support this syntax:

retain_field('foo.(bar|doz)');
retain_field('bar|doz'); # remove all keys but bar and doz

My current workaround to remove all keys but foo and bar:

move_field("foo","tmp.foo");
move_field("bar","tmp.bar");
retain_field("tmp");
move_field("tmp.foo","foo");
move_field("tmp.bar","bar");
remove_field("tmp");
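
The requested multi-key behaviour can be sketched as follows. This is a Python illustration under assumptions, not the Catmandu implementation: `retain_fields` is a hypothetical name, it handles only plain dotted paths, and it does not cover the proposed `(bar|doz)` syntax.

```python
from collections import defaultdict

def retain_fields(record, *paths):
    """For each parent path, delete every key except the listed ones (in place).

    Hypothetical helper for illustration only.
    """
    keep = defaultdict(set)              # parent path -> keys to keep there
    for path in paths:
        *parent, key = path.split(".")
        keep[tuple(parent)].add(key)

    for parent, keys in keep.items():
        node = record
        for step in parent:              # walk down to the parent hash
            node = node.get(step) if isinstance(node, dict) else None
        if isinstance(node, dict):
            for key in list(node):       # copy keys, since we delete while looping
                if key not in keys:
                    del node[key]
    return record
```

Grouping the paths by parent first is what lets several keys under the same hash survive a single call.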

Support command line help for importers and exporters

This shows help on Catmandu::Importer:

catmandu convert --help

should show help/documentation on Catmandu::Importer::CSV:

catmandu convert CSV --help

should show help/documentation on Catmandu::Exporter::CSV:

catmandu convert to CSV --help

The help command may read more information from the module's POD. Right now I am regularly switching between command line and metacpan to find the right command line arguments.

catmandu command broken

In version 0.8007 of Catmandu, the command "catmandu" shows this error:

Can't use an undefined value as an ARRAY reference at /home/njfranck/git/imaging/local/lib/perl5/Catmandu/CLI.pm line 33

Probably due to $lib_path being undefined.

copy_field asterisk copies only first value to destination

#!/usr/bin/env perl
use Catmandu::Sane;
use Catmandu::Fix;
use Data::Dumper;

my $hash = {
  a => [1..4],
  b => []
};

my $fixer = Catmandu::Fix->new(fixes => ["copy_field('a.*','b.\$append')"]);
print Dumper($fixer->fix($hash));

output:

$VAR1 = {
          'a' => [
                   1,
                   2,
                   3,
                   4
                 ],
          'b' => [
                   1
                 ]
        };

Only the first value from "a" is copied.
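
Judging by the issue title, the expected result is presumably that every element of "a" ends up in "b". A small Python sketch of that assumed behaviour (the helper `copy_all` is hypothetical and only mirrors this one case):

```python
def copy_all(record, source_key, target_key):
    """Append every element of record[source_key] to the list at record[target_key]."""
    record.setdefault(target_key, []).extend(record.get(source_key, []))
    return record

result = copy_all({"a": [1, 2, 3, 4], "b": []}, "a", "b")
```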

Catmandu::Importer::YAML fails on UTF8: Catmandu::Importer encoding broken

With YAML::XS, which is preferred by YAML::Any, the YAML importer fails when importing UTF-8 YAML files:

$ echo "umlaut: Ü" | catmandu convert YAML
YAML::XS::Load Error: The problem:

    invalid trailing UTF-8 octet

was found at document: 0

This behaviour of YAML::XS is documented (see https://rt.cpan.org/Public/Bug/Display.html?id=54683) and won't be changed. I suppose it can be fixed by adding an encoding parameter, but this is not documented, and because of a bug in Catmandu::Importer or Catmandu::App::convert the parameter is not passed to the file handle anyway. The following should work (it does when "raw" is hard-coded in Catmandu::Importer!), but it does not:

$ echo "umlaut: Ü" | catmandu convert YAML --encoding :raw

Even if it worked, the default encoding setting (:utf8) is annoying, at least for YAML. I have not tested with JSON and I won't invest more work in fixes that don't get released anyway :-(.

Support loading modules from another library path

nichtich@033d096 (proposed as pull request #40) allows this:

catmandu -I lib convert MyFormat to JSON < file.myformat

where Catmandu/Importer/MyFormat.pm is located in lib. My current nasty workaround is

perl -Ilib `which catmandu` convert MyFormat to JSON < file.myformat

It's covered by a unit test. Looks like Catmandu::CLI was not being tested before.

Rewrite JSON importer

The current JSON importer can only handle line-based JSON as emitted by the JSON exporter. One should also be able to parse arbitrary JSON documents, including multiple objects in one file. For instance the following file includes three JSON objects:

{ "id": 1 }
{ 
  "id": 2
}

{ "id": 
3 }
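
One way to read such a stream of concatenated JSON documents is to decode one value at a time and continue from where the decoder stopped. The sketch below uses Python's standard `json.JSONDecoder.raw_decode` purely as an illustration of the technique; `iter_json_objects` is a hypothetical name, and a Perl importer would need an equivalent incremental parser.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # raw_decode does not skip leading whitespace, so do it here
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            return
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

sample = '{ "id": 1 }\n{ \n  "id": 2\n}\n\n{ "id": \n3 }\n'
records = list(iter_json_objects(sample))
```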

In addition the importer should support a path option as introduced in Catmandu-Importer-XML. For instance the following file with option path=/record.* could import the same three objects:

{
  "records": [ { "id": 1 }, { "id": 2 }, { "id": 3 } ]
}

This Perl module may help with the implementation.

Error reporting in Catmandu::Exporter::Template

The Programmers Guide tutorial contains a section showing basic usage of the TT2-based exporter. When trying out the following code snippet, it took me a long time to figure out why no output was produced:

use Catmandu::Exporter::Template;
my $data     = [
 { name => { first => 'James' , last => 'Bond' } , occupation => 'Secret Agent' } ,
 { name => { first => 'Ernst' , last => 'Blofeld' } , occupation => 'Supervillain' } ,
];
my $exporter = Catmandu::Exporter::Template->new(template => '/home/phochste/example.tt');
$exporter->add_many($data);

Of course, I am not phochste, so I changed the path to the template file into "example.tt", expecting the exporter to read the template from the current directory. That was a mistake. This value must be a full, absolute path, and no tilde (~) expansion is performed.

Perhaps this module should make more of an effort to find the template file. In any case, some kind of error message, warning or exception should be produced when the template file cannot be found.
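
A sketch of the kind of lookup and error reporting suggested here, in Python for illustration. The function `resolve_template` and its `search_dirs` parameter are assumptions, not part of Catmandu::Exporter::Template.

```python
from pathlib import Path

def resolve_template(template, search_dirs=(".",)):
    """Return the first existing template path, or raise a clear error.

    Hypothetical helper: expands "~" and tries relative names in search_dirs.
    """
    candidate = Path(template).expanduser()
    if candidate.is_absolute():
        candidates = [candidate]
    else:
        candidates = [Path(d) / candidate for d in search_dirs]
    for path in candidates:
        if path.is_file():
            return path.resolve()
    tried = ", ".join(str(p) for p in candidates)
    raise FileNotFoundError(f"template not found; tried: {tried}")
```

Listing the paths that were tried in the error message would have made the "example.tt" mistake obvious immediately.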

Exporter JSON provides invalid JSON

Using Catmandu::Exporter::JSON gives you something like this:

{"name": "Catmandu", "creator": "Nicolas"}
{"name": "Perl", "creator": "Larry"}

But for a valid JSON document it should be:

[ {"name": "Catmandu", "creator": "Nicolas"},
{"name": "Perl", "creator": "Larry"} ]
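
The difference between the two outputs can be shown with a short Python sketch (illustration only, using the standard `json` module rather than the exporter itself): line-delimited output parses one line at a time, while only the array form parses as a single document.

```python
import json

records = [
    {"name": "Catmandu", "creator": "Nicolas"},
    {"name": "Perl", "creator": "Larry"},
]

# One JSON object per line: each line is valid JSON, the whole stream is not.
as_lines = "\n".join(json.dumps(r) for r in records)

# A single JSON array: the whole output parses as one document.
as_array = json.dumps(records)
```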

support Windows

Catmandu.pm fails on Windows, probably just because of the different file system structure.

fixes introduce a new key if the key is not found

This shell command line works as expected, downcasing the value of the job feature:

$ echo '{"job":"Artist"}' | catmandu data --fix 'downcase("job")'
{"job":"artist"}

When trying to fix the value of a non-existing feature, a new feature with this key and a null value is created:

$ echo '{"job":"Artist"}' | catmandu data --fix 'downcase("occupation")'
{"occupation":null,"job":"Artist"}

This should not happen.
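
The expected behaviour, sketched in Python under the assumption that a fix should be a no-op when its key is absent (the function below is a stand-in, not the Catmandu `downcase` fix):

```python
def downcase(record, key):
    """Lowercase record[key] in place; leave the record untouched if the key is absent."""
    value = record.get(key)
    if isinstance(value, str):
        record[key] = value.lower()
    return record

existing = downcase({"job": "Artist"}, "job")
missing = downcase({"job": "Artist"}, "occupation")
```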

Create boilerplate code for easy Fixes

We use emit functions in Fix modules to parse JSON paths and make the Fix code run very efficiently over deeply nested hashes/arrays. But the emit logic is quite verbose, and something we might change in the future.

However, some patterns in creating Fixes are repeated over and over and could easily be factored out into a separate module.

For example, a Fix that changes the value at a path to another value could be implemented as:

package SomeFix;

use Moo;
with 'EasyEmit';    # proposed role, does not exist yet

# Called with the value found at the path; returns the new value.
sub on_path {
    my ($self, $value) = @_;
    return somefunc($value);
}

1;

Catmandu::SRU corrupts XML data - choose another mapping of XML

Catmandu::SRU is not usable for data such as MARCXML and PICAXML because it corrupts the order of XML elements. The module should not use XMLin from XML::LibXML::Simple but XML::LibXML::Reader, and should emit a different object format for mapping XML.

Mapping XML to record structures with irreversible order is common and useful for specific formats. See this summary for most common mapping rules. Whether this mapping as done by XML::Simple is suitable, however, depends on the data format.

I'd choose a format as sketched here. Catmandu::SRU should also not parse the XML itself, but only unpack the SRU response and pass the XML records to a Catmandu::Importer::XML.

use Config::Onion

The config branch already uses Dave's Config::Onion, which gives us config merging, local configs, etc. However, this breaks the subkey feature in Catmandu: for example, catmandu.list.yml was loaded into a 'list' subhash. Is this a problem for anyone?

Create Catmandu::Registry

Create Catmandu::Registry for handling bundles.

Catmandu::Registry

  • loads list of bundles
  • registers & receives events
  • manages list of services

Bundle of functionality

  • load/unload
  • contains new model
  • contains Plack middleware/Plack application
  • config files
  • event handling / messaging
  • templates
  • files
  • js, css

remove cruft from Util

There are way too many general functions in there; look into Type::Tiny, Path::Tiny, etc. as replacements.

split into separate modules

Just for your info: I'm working on that.

Step 1: Separate

Catmandu::Exporter::BibTeX

Catmandu::Exporter::XLS

Catmandu::Importer::Atom

Catmandu::Store::DBI (?)

Step 2:

Add to Task::Catmandu

Simple mapping table as fix

Many data conversions consist of simple mapping tables given as spreadsheets/CSV. Fields included in a mapping table are moved/renamed, and any fields not included are removed. How about using CSV as the input format for such mapping fixes?
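
A minimal sketch of such a mapping fix, in Python for illustration. The two-column layout (source field, target field), the function names and the sample field names are all assumptions.

```python
import csv
import io

def load_mapping(csv_text):
    """Read a two-column CSV (source field, target field) into a dict."""
    reader = csv.reader(io.StringIO(csv_text))
    return {row[0]: row[1] for row in reader if len(row) >= 2}

def apply_mapping(record, mapping):
    """Rename mapped fields and drop every field that is not in the mapping."""
    return {mapping[key]: value for key, value in record.items() if key in mapping}

# Hypothetical mapping table and record, for illustration only.
mapping = load_mapping("title,dc_title\nauthor,dc_creator\n")
result = apply_mapping({"title": "T", "author": "A", "isbn": "X"}, mapping)
```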

Allow passing of primary options as arguments in command line

Why can't I just type

catmandu import JSON test.json to DBI dbi:SQLite:test.sqlite

The primary and most common option to Catmandu::Importer is file and the primary, mandatory option to Catmandu::Store is data_source, so the call above should be expanded to

catmandu import JSON --file test.json to DBI --data_source dbi:SQLite:test.sqlite

Typical usage should be simple and non-typical usage should be possible.
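
The expansion described above could look roughly like this. It is a Python sketch of the argument rewriting only; `PRIMARY_OPTION` and `expand_primary` are hypothetical names, and the table holds just the two primary options named in this issue.

```python
# Primary option per component kind, as named in the issue text.
PRIMARY_OPTION = {"importer": "file", "store": "data_source"}

def expand_primary(kind, args):
    """Turn a leading bare argument into the primary --option for this kind."""
    if args and not args[0].startswith("-"):
        return ["--" + PRIMARY_OPTION[kind], args[0]] + list(args[1:])
    return list(args)   # already explicit options: leave unchanged
```

For instance, `expand_primary("importer", ["test.json"])` yields `["--file", "test.json"]`, while arguments that already start with an option are passed through untouched.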

Programmers Guide (tutorial)

I encountered a few small issues when working through http://librecat.org/tutorial/index.html ...

People who have not used cpan a lot will have to spend an enormous amount of time confirming each dependency when doing sudo cpan Catmandu. The following lines make sure dependencies are installed fully automatically:

    $ sudo cpan
    o conf prerequisites_policy follow
    o conf build_requires_install_policy yes
    o conf commit
    install Catmandu
    q

Some code examples did not work immediately because of missing semicolons at the end of lines:

    printf "SUM [ Iterator * 2] = %d\n" , $result
    use Catmandu::Exporter::YAML

Fix::uniq and Fix::sort

We just had a Catmandu workshop and the question arose how to sort lists and how to remove duplicates in a fix. I think sort should be limited to simple lists of strings but uniq may be more complex if it also applies to nested structures (not the most common use case).
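
A sketch of both operations, in Python for illustration. The function names are hypothetical, and comparing nested values by their canonical JSON form is just one possible design, not a decided one.

```python
import json

def sort_strings(values):
    """Sort a simple list of strings."""
    return sorted(values)

def uniq(values):
    """Remove duplicates, keeping the first occurrence of each value."""
    seen, result = set(), []
    for value in values:
        # canonical JSON form, so nested hashes/lists compare by content
        key = json.dumps(value, sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result
```

Keeping the first occurrence preserves the input order, so `uniq` and `sort_strings` can be combined in either order.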
