
darlingtonia's People

Contributors

bess, cjcolvar, jenlindner, little9, mark-dce, val99erie

darlingtonia's Issues

Configure Logfile

ACCEPTANCE

  • Allow me to configure the log location & name
  • Allow me to display output on STDOUT optionally

Requires #36
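
One way to meet both acceptance criteria is sketched below using only Ruby's standard `Logger`. The names `MultiIO` and `build_import_logger` are hypothetical and not part of Darlingtonia; the sketch assumes the importer accepts any `Logger`-compatible object.

```ruby
require 'logger'

# Hypothetical helper: fans each write out to several IO-like targets,
# so one Logger can feed a log file and STDOUT at the same time.
class MultiIO
  def initialize(*targets)
    @targets = targets
  end

  def write(*args)
    @targets.each { |target| target.write(*args) }
  end

  def close
    # Leave STDOUT/STDERR open; close every other target.
    @targets.each { |target| target.close unless [$stdout, $stderr].include?(target) }
  end
end

# Hypothetical factory: `path` sets the log location and name;
# `stdout: true` additionally echoes each line to STDOUT.
def build_import_logger(path:, stdout: false)
  file = File.open(path, 'a')
  file.sync = true # flush each line so the log can be tailed during a run
  targets = [file]
  targets << $stdout if stdout
  Logger.new(MultiIO.new(*targets))
end
```

For example, `build_import_logger(path: 'log/import.log', stdout: true)` would append to `log/import.log` and echo the same lines to the terminal.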

Update Metadata (Overwrite)

ACCEPTANCE
Given a unique field that I can match records on.

  • I can configure the importer to update the metadata (completely overwrite) for the matching record
  • Files are not reimported

TODO: add import truth table
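
The update rule above can be sketched in plain Ruby. `Work` and `import_record` are hypothetical stand-ins (an in-memory array plays the role of the repository); this is an illustration of the intended behavior, not Darlingtonia's implementation.

```ruby
# Hypothetical in-memory stand-in for a repository work.
Work = Struct.new(:metadata, :files)

# Overwrites the metadata of the work whose `match_field` value equals the
# incoming record's; creates a new work otherwise. Files on an existing
# work are left untouched, so nothing is reimported.
def import_record(repository, record, match_field:, files: [])
  key = record.fetch(match_field)
  existing = repository.find { |work| work.metadata[match_field] == key }

  if existing
    existing.metadata = record.dup # complete overwrite, not a merge
    :updated
  else
    repository << Work.new(record.dup, files.dup)
    :created
  end
end
```

Because the overwrite replaces the whole metadata hash, any field absent from the incoming record is dropped from the matched work.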

Import Logging

Every Import

Should display the following info for each record being processed:

  • Source record unique identifier
    • Filename OR record number in stream OR unique ID in record
      (i.e., if an error is reported, how do I find the thing I need to go back and fix?)
  • Any record validation (input format) errors
  • Any work validation (model level) errors
  • File(s) being attached
    • Any file-level errors (missing files, etc.)

Should display a summary of the overall import run

  • Number of input records read
  • Number of new objects created
  • Number of existing objects
    • skipped (if the rule is to skip existing), OR
    • updated (if the rule is to update existing)
  • Number of objects not created due to errors

Timestamps would be nice for profiling

  • Start and end time for each record
    • OR (even better) start and duration for each record
      (this gives us important performance info)

Logs should be machine-parsable for easy integration into Splunk and other log aggregation systems.

BONUS: Configurable verbosity
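
A minimal sketch of per-record, machine-parsable logging follows, assuming one JSON object per line is an acceptable format for the aggregator. `JSON_FORMATTER` and `log_record` are hypothetical names, and the field names in each entry are illustrative only.

```ruby
require 'json'
require 'logger'
require 'time'

# Hypothetical formatter: emits one JSON object per log line.
JSON_FORMATTER = proc do |severity, time, _progname, msg|
  payload = msg.is_a?(Hash) ? msg : { message: msg.to_s }
  JSON.generate({ severity: severity, time: time.getutc.iso8601(3) }.merge(payload)) + "\n"
end

# Hypothetical wrapper: logs the record identifier, any error raised while
# the block processes the record, and the start time and duration.
def log_record(logger, record_id)
  started_at = Time.now.utc
  clock = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  errors = []
  begin
    yield
  rescue StandardError => e
    errors << e.message
  end
  duration = Process.clock_gettime(Process::CLOCK_MONOTONIC) - clock
  entry = { record_id: record_id, started_at: started_at.iso8601(3),
            duration_s: duration.round(6), errors: errors }
  errors.empty? ? logger.info(entry) : logger.error(entry)
  errors.empty?
end
```

With this shape, configurable verbosity falls out of the standard `Logger#level=` setting: for instance, `logger.level = Logger::ERROR` would keep only the entries for failed records.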

Quickstart Guide

As a repository owner, I'd like a quick way to get my application set up for batch ingest.

ACCEPTANCE
Given a CSV with headings matching the Hyrax Basic Metadata Fields and corresponding content (single files per row)

  • There is a guide that walks me through setting up a vanilla Hyrax app to import my content

  • There is a guide that walks me through custom field mappings for

    • additional fields
    • non-string field types
    • fields with some kind of transformation
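
To make the three mapping cases concrete, here is a rough sketch in plain Ruby. `FIELD_MAP` and `map_row` are hypothetical and do not reflect Darlingtonia's mapper API; the headings, the `department` field, and the `|` delimiter are assumptions chosen for illustration. Each row is assumed to arrive as a hash of heading to string value.

```ruby
require 'date'

# Hypothetical mapping table: CSV heading => [target field, transformation].
FIELD_MAP = {
  'title'        => [:title,        ->(v) { [v.strip] }],
  'date_created' => [:date_created, ->(v) { Date.iso8601(v) }],           # non-string type
  'keyword'      => [:keyword,      ->(v) { v.split('|').map(&:strip) }], # transformation
  'department'   => [:department,   ->(v) { v }]                          # additional field
}.freeze

# Builds an attributes hash from one row, ignoring unmapped headings
# and empty cells.
def map_row(row, field_map = FIELD_MAP)
  row.each_with_object({}) do |(heading, value), attrs|
    field, transform = field_map[heading]
    next if field.nil? || value.nil?

    attrs[field] = transform.call(value)
  end
end
```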

Summary Logging

ACCEPTANCE
Provide a summary of as many of the following stats as feasible (in the scoped time)

  • Number of input records read
  • Number of new objects created
  • Number of objects not created due to errors
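
The three statistics could be tallied with something as small as the sketch below. `ImportSummary` is a hypothetical name, and the sketch assumes every input record ends in exactly one of two outcomes, `:created` or `:failed`.

```ruby
# Hypothetical tally of import outcomes; the fields mirror the three
# statistics listed above.
ImportSummary = Struct.new(:records_read, :created, :failed) do
  def initialize
    super(0, 0, 0)
  end

  # Call once per input record with its outcome, :created or :failed.
  def record(outcome)
    self.records_read += 1
    self[outcome] += 1
  end

  def to_s
    "records read: #{records_read}, created: #{created}, failed: #{failed}"
  end
end
```

An importer would call `summary.record(:created)` or `summary.record(:failed)` after each record and print `summary.to_s` at the end of the run.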

Rename `Parser`

"Parser" is already proving to be a slightly unwieldy name. We chose this name to differentiate from an Importer and "parser" seemed viable given that examples read (parsed) records from a manifest file. However, the laevigata importer will not parse records from a manifest, but instead draw them from a live Fedora 3 instance.

In this case, the job of the so-called Parser is to provide a stream of mapped and validated InputRecord objects to the Importer. It turns out that "parsing" is an implementation detail.

Perhaps RecordStream?

I don't expect major Parser API changes to accompany any name change. We'll need to deprecate the old class names for removal in 2.0.0, however.
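
One possible shape for the rename, sketched under the assumption that RecordStream is the chosen name: keep the old constant as an alias and mark it with Ruby's built-in `Module#deprecate_constant`. The `ImporterSketch` module is hypothetical and stands in for the real namespace.

```ruby
module ImporterSketch
  # Hypothetical new name: yields records regardless of where they come
  # from (a manifest file, a live repository, and so on).
  class RecordStream
    include Enumerable

    def initialize(source)
      @source = source
    end

    def each(&block)
      @source.each(&block)
    end
  end

  # The old name remains usable as an alias until it is removed.
  Parser = RecordStream
  deprecate_constant :Parser
end
```

Existing code that references `ImporterSketch::Parser` keeps working. Ruby prints a deprecation warning on access, although on Ruby 2.7.2 and later such warnings are hidden unless enabled (for example with `-W:deprecated` or `Warning[:deprecated] = true`).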

Publish YARD docs

  • Publish Darlingtonia YARD docs either on github.io or on a hosted service
