ami-preservation's People

Contributors

bturkus, cgmcnamara, dependabot[bot], genfhk, gregh18, nkrabben, noellebyer, rdmarino, rdostaly, seansmalley

ami-preservation's Issues

Add to Google Sheets resources: Filter and list lines that contain a specific value

=FILTER(D:D,SEARCH("invalid",D:D,1))

Example: column D (above) contains 10,000 lines of text, and we need to pull out every line that describes an invalid file,
e.g. "abc_123456_v01_pm.json is invalid", in order to get a list of every invalid JSON file. The formula searches column D and drops the list of matching lines into the cells below where it is entered.
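
For logs that never make it into a spreadsheet, the same filter can be sketched in Python. This is only an illustration of what the formula does, not part of the existing resources; the function name `filter_lines` and the sample lines are hypothetical. The match ignores case, as SEARCH does.

```python
def filter_lines(lines, needle="invalid"):
    """Return every line that contains `needle`, ignoring case."""
    needle = needle.lower()
    return [line for line in lines if needle in line.lower()]


if __name__ == "__main__":
    # Hypothetical sample standing in for the contents of column D.
    sample = [
        "abc_123456_v01_pm.json is invalid",
        "abc_123457_v01_pm.json is valid",
    ]
    for line in filter_lines(sample):
        print(line)
```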

Update links on Quality Control Workflow page

On https://nypl.github.io/ami-preservation/pages/mps/qc-workflow.html,
all of the following links essentially go here: https://github.com/NYPL/ami-preservation

Under 'Mounting Drives Read-Only'

  • The most important step during QC is to mount your drive(s) Read-Only.

Under 'Media specification validation (MediaConch)' (2 links)

Under 'Generating a QC list'

  • Use Terminal to generate a QC list for each drive you are QCing by following the steps outlined here.

Under 'Content Inspection'

  • Content inspection can be completed either on ICC or on the drive by following the steps outlined here.

Under 'SPOT CHECKING CONTENT & JSON'

  • Use a text-editor (Atom / Notepad / Text Edit etc.) to open and inspect JSON files.

Under 'LOGGING QC FAILURES & FLAGS'

  • Use this list of definitions to review and mark-off the items listed in the QC log

Under 'MEDIA INGEST PREPARATION'

  • Follow these steps to prepare media for ingest.

endless loop cassette protocol

Add to specs: protocol for digitizing endless-loop cassettes (e.g. cartridges, compact cassettes):

  • digitize for several minutes until it seems evident that the entire content has been captured, then trim the PM to the beginning and end of content and make the EM match the PM.
  • if one cannot determine the beginning / end, determine the maximum duration of the tape and use that as the basis for capture.

Provide guidance on interpreting ami-tools error messages

ERROR - There is no media file found at /Volumes/lpasync/!-PAMI/InHouse/2022_004_pami_222_scl5356/512856/data/EditMasters/scb_512856_v01f01s02_em.wav

The above error was encountered because the assetReferenceFilename in the JSON metadata listed the filename as 'scb_512856_v01f01s02_em.wav' but the filename was actually 'scb_512856_v01f01s02_em.flac'. The error message is correct, but one must know that the tool is looking for the filename as listed in the JSON metadata, to confirm a match.
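
A rough way to confirm this kind of mismatch by hand is sketched below. It is not how ami-tools does the check; the function names are hypothetical, and the default `key_path` of `("asset", "referenceFilename")` is an assumption about where the JSON stores the filename, so adjust it to the actual metadata layout.

```python
import json
from pathlib import Path


def read_reference_filename(json_path, key_path=("asset", "referenceFilename")):
    """Read the referenced media filename from a JSON metadata file.

    key_path is an assumed location of the field; change it if the layout differs.
    """
    with open(json_path, encoding="utf-8") as f:
        value = json.load(f)
    for key in key_path:
        value = value[key]
    return value


def explain_missing_media(json_path, key_path=("asset", "referenceFilename")):
    """Return None if the referenced file exists next to the JSON, else a hint."""
    json_path = Path(json_path)
    referenced = read_reference_filename(json_path, key_path)
    if (json_path.parent / referenced).is_file():
        return None
    # Look for files with the same base name but a different extension.
    stem = Path(referenced).stem
    lookalikes = sorted(
        p.name
        for p in json_path.parent.iterdir()
        if p.is_file() and p.stem == stem and p.suffix.lower() != ".json"
    )
    if lookalikes:
        return f"JSON lists '{referenced}' but the folder contains: {', '.join(lookalikes)}"
    return f"JSON lists '{referenced}' and no file with that base name was found"
```

Run against the case above, the hint would name the .flac file that sits where the JSON expects a .wav.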

BWF metadata in FLAC

Hi all,

I have been the 'maintainer' of the reference FLAC implementation at https://github.com/xiph/flac for about 6 months now. I get a lot of feedback from users, but almost exclusively from people maintaining their own private, relatively small audio collections, and I'd like some feedback from the archiving community.

Because of that I visited the NTTW conference this week, where Andrew Weaver pointed me to this particular GitHub repo. I'm not sure an issue in this repository is the best place for such feedback; if not, perhaps we can get in touch?

I am reaching out beyond the community I can currently reach because of the '--keep-foreign-metadata' option in FLAC, which enables storing BWF metadata (and other WAV metadata) in a FLAC file. I would like to improve on it, but am unsure how. Besides that, you might have some pointers for me on specific improvements that would be beneficial for your use cases.

Thanks in advance

how to fix comma issues in JSON

leading commas: find path/to/dir/of/json -type f -iname '*.json' -exec gsed -i 's/{,/{/g' {} \;
trailing commas: find /Volumes/NYPL454416/jsonCommaIssue_audio/ -type f -iname '*.json' -exec any-json --input-format=hjson {} {} \;

"Trailing commas" here covers a stray "}," as well as cases where there are too many commas after objects.
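
Where gsed or any-json are not available, the same clean-up can be sketched in Python. This is an assumption-laden alternative rather than the method used above: the function name `strip_stray_commas` is hypothetical, and the regular expressions do not know about string values, so a value that itself contains text like "{," would also be altered. Work on copies and validate the result.

```python
import json
import re


def strip_stray_commas(text):
    """Remove leading, repeated, and trailing commas from JSON-like text."""
    # Collapse runs of commas (",," or ", ,") into a single comma.
    text = re.sub(r",(\s*,)+", ",", text)
    # Drop a comma that directly follows an opening brace or bracket.
    text = re.sub(r"([{\[])\s*,", r"\1", text)
    # Drop a comma that directly precedes a closing brace or bracket.
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return text


if __name__ == "__main__":
    broken = '{,"a": 1,, "b": [1, 2,],}'
    fixed = strip_stray_commas(broken)
    # json.loads raises an error if the text is still not valid JSON.
    print(json.loads(fixed))
```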

Create new file path based on filerole instead of current file path

new_file_path = os.path.join(source_directory, cms_id, file_path)

Sometimes files are placed in the wrong folder, e.g. example_sc.json is put in PreservationMasters/

split_projects could potentially address this by moving files based on their file name.

Instead of

new_file_path = os.path.join(source_directory, cms_id, file_path)
...
if old_file_path.endswith(('mkv', 'json', 'mp4', 'dv', 'flac')):
            shutil.move(old_file_path, new_file_path)

it could be

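if old_file_path.endswith(('mkv', 'json', 'mp4', 'dv', 'flac')):
            # compare the role suffix against the filename without its extension
            stem = os.path.splitext(os.path.basename(old_file_path))[0]
            if stem.endswith('_em'):
                        new_file_path =  os.path.join(source_directory, cms_id, 'EditMasters')
            elif ... ('_pm'):
...
            elif .. ('_sc'):
...
            # exist_ok avoids an error when the folder is already there
            os.makedirs(new_file_path, exist_ok=True)
            shutil.move(old_file_path, new_file_path)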
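
A self-contained sketch of the same idea follows. It is not the current split_projects code: the names `ROLE_DIRS`, `role_directory`, and `move_by_role` are hypothetical, and the mapping only covers the three roles mentioned in this issue, so other file roles would need to be added.

```python
import os
import shutil

# Assumed mapping from filename role suffix to destination folder.
ROLE_DIRS = {
    '_em': 'EditMasters',
    '_pm': 'PreservationMasters',
    '_sc': 'ServiceCopies',
}


def role_directory(filename):
    """Return the folder name for a file's role suffix, or None if unknown."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    for suffix, directory in ROLE_DIRS.items():
        if stem.endswith(suffix):
            return directory
    return None


def move_by_role(old_file_path, source_directory, cms_id):
    """Move a file into the folder matching its role; return the new path."""
    directory = role_directory(old_file_path)
    if directory is None:
        return None  # leave files with an unrecognized role where they are
    new_dir = os.path.join(source_directory, cms_id, directory)
    os.makedirs(new_dir, exist_ok=True)
    new_file_path = os.path.join(new_dir, os.path.basename(old_file_path))
    shutil.move(old_file_path, new_file_path)
    return new_file_path
```

With this approach example_sc.json found in PreservationMasters/ would end up in ServiceCopies/, because only the filename decides the destination.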

Add newer Kodak Edge Codes to some form of documentation

The official Kodak guide only goes up to 2016, I think. We need a way to document newer years.
Possible Kodak Edge Code dates for recent years (unconfirmed as of 2022 via Mark Toscano / AMIA-L):
2020 = KN
2021 = NM
2022 = DK

(confirmed?)
2017 = ET
2018 = AM
2019 = SN

@rdmarino / @cgmcnamara - might be worth discussing for future film docs updates (not that we get many new prints..)

More metadata about payload

copied from issue on ami-specs https://github.com/NYPL/ami-specifications/issues/19

Define a folder and file name scheme for any metadata about digitized objects that should not go into the payload. Examples include QCTools reports, extracted timecode, ffmpeg logs.

Proposal:
Top-level directory named metadata. Files in the directory should have the same name as the file they relate to, up to the extension, e.g. abc_123456_v01_pm.mkv would have a QCTools report named abc_123456_v01_pm.qctools.tar.gz. Checksums for files in the metadata directory are optional and should be written to the tagmanifest (as part of the BagIt spec).

Example:

581608
├── bag-info.txt
├── bagit.txt
├── data
│   ├── PreservationMasters
│   │   ├── myt_581608_v01_pm.json
│   │   └── myt_581608_v01_pm.mkv
│   └── ServiceCopies
│       ├── myt_581608_v01_sc.json
│       └── myt_581608_v01_sc.mp4
├── manifest-md5.txt
├── metadata
│   ├── myt_581608_v01_pm_rp188any_frame_timecodes.txt
│   └── myt_581608_v01_pm.qctools.tar.gz
└── tagmanifest-md5.txt
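
The naming rule in the proposal can be sketched as a small helper. This is only an illustration of the proposed scheme, not an agreed implementation; the function name `metadata_sidecar_path` is hypothetical, and the suffixes are taken from the example tree above.

```python
from pathlib import PurePosixPath


def metadata_sidecar_path(media_filename, suffix):
    """Build the bag-relative path of a metadata file for a media file.

    The media file's name is kept up to its extension and `suffix` is appended.
    """
    stem = PurePosixPath(media_filename).stem
    return PurePosixPath("metadata") / f"{stem}{suffix}"


if __name__ == "__main__":
    media = "data/PreservationMasters/myt_581608_v01_pm.mkv"
    print(metadata_sidecar_path(media, ".qctools.tar.gz"))
    print(metadata_sidecar_path(media, "_rp188any_frame_timecodes.txt"))
```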

Fix broken links (404) on 'Outsourced Digitization' page

On https://nypl.github.io/ami-preservation/pages/mps/outsourced-digitization.html:

Under 'Project Creation'

Under 'Project Preparation and Logistics'

Under 'Project tracking'

Under 'Project Close-Out,' 'Quality Control'
