nypl / ami-preservation

Repo for NYPL's Audio and Moving Image Preservation Unit. Documentation Site:

Home Page: https://nypl.github.io/ami-preservation/

Python 96.82% Shell 3.18%

ami-preservation's People

Contributors

nkrabben jyw321 dcallehdz

ami-preservation's Issues

rename master to main

reasons for doing this are pretty clear; also the change has already been made at ami-tools.

Instructions here:
https://github.com/github/renaming

Add transfer checker to snowball / Glacier migration documentation

echo "Total transferred: " ; snowballEdge describe-device --profile MPS-XFR3 | jq '.DeviceCapacities[] | select( .Name == "HDD Storage") | (.Total - .Available)/1e12'
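The same arithmetic as the jq filter above can be sketched in Python. This is only an illustration: the function name and the sample values are hypothetical, and it assumes the JSON printed by `snowballEdge describe-device` has the `DeviceCapacities` entries with `Name`, `Total`, and `Available` fields that the jq filter reads.

```python
import json

def total_transferred(describe_device_json, storage_name="HDD Storage"):
    """Return (Total - Available) / 1e12 for the named storage entry."""
    device = json.loads(describe_device_json)
    for capacity in device["DeviceCapacities"]:
        if capacity["Name"] == storage_name:
            return (capacity["Total"] - capacity["Available"]) / 1e12
    raise KeyError(f"no capacity entry named {storage_name!r}")

# Made-up sample containing only the fields the jq filter uses
sample = json.dumps({"DeviceCapacities": [
    {"Name": "HDD Storage", "Total": 80_000_000_000_000, "Available": 30_000_000_000_000},
    {"Name": "SSD Storage", "Total": 0, "Available": 0},
]})
print("Total transferred:", total_transferred(sample))
```

If the capacities are reported in bytes, the result is in terabytes.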

co-create docs for AWS migration

@nkrabben

Add to Google Sheets resources: Filter and list lines that contain a specific value

=FILTER(D:D,SEARCH("invalid",D:D,1))

example: column D (above) contains 10000 lines of text; need to filter out every line that describes an invalid file
e.g. "abc_123456_v01_pm.json is invalid" in order to get a list of every invalid json file. This will search column D and drop a list into the cells below where this formula is entered.
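For validation output saved as a text file rather than a sheet, the same filter can be sketched in Python. The function name and all sample lines except the first are made up for illustration; the match ignores case, as SEARCH does.

```python
def lines_containing(lines, term="invalid"):
    """Return every line that contains term, ignoring case."""
    needle = term.lower()
    return [line for line in lines if needle in line.lower()]

report = [
    "abc_123456_v01_pm.json is invalid",
    "abc_123457_v01_pm.json is valid",    # hypothetical line
    "abc_123458_v01_pm.json is INVALID",  # hypothetical line
]
invalid_files = [line.split()[0] for line in lines_containing(report)]
print(invalid_files)
```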

mounting read only

put the actual command here
https://nypl.github.io/ami-preservation/pages/qualityControl/qc-workflow.html#mounting-drives-read-only

move / copy image specs to Specs page

https://nypl.github.io/ami-preservation/pages/ami-handling.html#images

split projects moves isos to tags

add awscli-tools to media migration workflow docs

https://github.com/NYPL/awscli_tools

Add -65db to trimming specs

bag spec changes

add in-house eavie sync command

aws s3 sync --exclude '*' --include '*_sc*' --include '*_em*' /source/path/ s3://ami-carnegie-servicecopies/
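To preview which files the filters above would select, the include/exclude logic can be approximated in Python. This is a rough sketch, not the AWS CLI's own implementation: it assumes filters are applied in order with later filters taking precedence, and the function name and example paths are hypothetical.

```python
from fnmatch import fnmatchcase

# Same order as the command line: exclude everything, then re-include SC and EM files
FILTERS = [("exclude", "*"), ("include", "*_sc*"), ("include", "*_em*")]

def would_sync(relative_path, filters=FILTERS):
    """Apply (action, pattern) pairs in order; the last matching filter wins."""
    included = True  # with no filters, every file is synced
    for action, pattern in filters:
        if fnmatchcase(relative_path, pattern):
            included = (action == "include")
    return included

for path in ["EditMasters/scb_512856_v01f01s02_em.flac",
             "PreservationMasters/scb_512856_v01f01s02_pm.flac"]:
    print(path, would_sync(path))
```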

mediaconch: SC audio should allow silent

change to allow either 0 or 2 audio channels

Update links on Quality Control Workflow page

On https://nypl.github.io/ami-preservation/pages/mps/qc-workflow.html,
all of the following links essentially go here: https://github.com/NYPL/ami-preservation

Under 'Mounting Drives Read-Only'

The most important step during QC is to mount your drive(s) Read-Only.

Under 'Media specification validation (MediaConch)' (2 links)

Use either MediaConch CLI or MediaConch GUI to make sure files meet NYPL specifications.

Under 'Generating a QC list'

Use Terminal to generate a QC list for each drive you are QCing by following the steps outlined here.

Under 'Content Inspection'

Content inspection can be completed either on ICC or on the drive by following the steps outlined here.

Under 'SPOT CHECKING CONTENT & JSON'

Use a text-editor (Atom / Notepad / Text Edit etc.) to open and inspect JSON files.

Under LOGGING QC FAILURES & FLAGS

Use this list of definitions to review and mark-off the items listed in the QC log

Under 'MEDIA INGEST PREPARATION'

Follow these steps to prepare media for ingest.

endless loop cassette protocol

Add to specs: protocol for digitizing endless-loop cassettes (i.e. cartridges, compact cassettes etc.):

Digitize for several minutes, until it seems evident that the entire content has been captured, then trim the PM to the beginning and end of the content and make the EM match the PM.
If the beginning and end cannot be determined, determine the maximum duration of the tape and use that as the basis for capture.

Provide guidance on interpreting ami-tools error messages

ERROR - There is no media file found at /Volumes/lpasync/!-PAMI/InHouse/2022_004_pami_222_scl5356/512856/data/EditMasters/scb_512856_v01f01s02_em.wav

The above error was encountered because the assetReferenceFilename in the JSON metadata listed the filename as 'scb_512856_v01f01s02_em.wav' but the filename was actually 'scb_512856_v01f01s02_em.flac'. The error message is correct, but one must know that the tool is looking for the filename as listed in the JSON metadata, to confirm a match.
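A quick way to confirm this kind of mismatch is to compare the filename recorded in the JSON with the files sitting next to it. The sketch below is not part of ami-tools. It assumes the value is stored at `asset` > `referenceFilename` in the JSON, which should be checked against the actual metadata, and the function name is hypothetical.

```python
import json
import tempfile
from pathlib import Path

def check_reference_filename(json_path):
    """Compare the filename recorded in a JSON sidecar with the files beside it."""
    json_path = Path(json_path)
    metadata = json.loads(json_path.read_text())
    # ASSUMPTION: the filename is stored at metadata["asset"]["referenceFilename"]
    expected = metadata["asset"]["referenceFilename"]
    if (json_path.parent / expected).is_file():
        return f"OK - {expected} found"
    # List files with the same stem but a different extension (the case described above)
    stem = Path(expected).stem
    similar = sorted(p.name for p in json_path.parent.glob(stem + ".*")
                     if p.suffix != ".json")
    return f"ERROR - no media file named {expected}; similar files: {similar}"

# Recreate the situation from the error above in a temporary folder
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "scb_512856_v01f01s02_em.flac").write_bytes(b"")
    sidecar = folder / "scb_512856_v01f01s02_em.json"
    sidecar.write_text(json.dumps(
        {"asset": {"referenceFilename": "scb_512856_v01f01s02_em.wav"}}))
    result = check_reference_filename(sidecar)
print(result)
```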

Add file request ServiceNow link to docs site

create video capture prep page

use child page template

Add to resources: Google Sheets 'filter/search' for partial text match

example, to create a filter to search for the term 'invalid' in column 'A', where column A has a header in cell A1 (partial match, so it will find any cell with this word):

=filter(A2:A,search("invalid",A2:A))

Minidiscs don't require .cue files

Update specs to clarify this
per vendor meeting notes: https://docs.google.com/document/d/1fQtxKQKPestGvwcx9pO8MUJ5ncNmvvN6LWyscn5_Dko/edit?disco=AAAAK7yAJCY

BWF metadata in FLAC

Hi all,

For about 6 months now I've been the 'maintainer' of the reference FLAC implementation at https://github.com/xiph/flac. I get a lot of feedback from users, but almost exclusively from people maintaining their own private, relatively small audio collections, and I'd like some feedback from the archiving community.

Because of that I visited the NTTW conference this week. I was pointed by Andrew Weaver to this particular github repo. I'm not sure an issue at this repository is the best place for such feedback, if not, perhaps we can get in touch?

The reason I reach out beyond the community I can currently reach is because of the '--keep-foreign-metadata' option in FLAC, which enables storing BWF metadata (and other WAV metadata) in a FLAC file. I would like to improve on it, but am unsure how. Besides that, you might have some pointers for me on specific improvements that might be beneficial for your use cases.

Thanks in advance

Command for "To make a Prores 422 HQ from an uncompressed or lossless source" has incorrect audio codec

currently -c:a copy but should be -c:a pcm_s24le

https://nypl.github.io/ami-preservation/pages/resources.html#transcoding-and-packaging-for-in-house-video-files
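For scripted transcodes, the corrected audio option could be wired in as below. This is a sketch only: the helper name is hypothetical, and the video options (`-c:v prores -profile:v 3`) are an assumption about a typical ProRes 422 HQ encode, not a copy of the command on the resources page. Only `-c:a pcm_s24le` comes from this issue.

```python
def prores_hq_command(source, destination):
    """Build an ffmpeg argument list for a ProRes 422 HQ transcode."""
    return [
        "ffmpeg", "-i", source,
        "-c:v", "prores", "-profile:v", "3",  # assumed video options (profile 3 = HQ)
        "-c:a", "pcm_s24le",                  # per this issue, instead of "-c:a copy"
        destination,
    ]

command = prores_hq_command("input.mkv", "output.mov")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` once the video options have been checked against the documented command.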

Errors with multiple tag files

ami-preservation/pami_scripts/split_projects_into_objects.py

Line 81 in 5a172d1

os.makedirs(tag_dir)

This code errors if the folder already exists (e.g. multiple tag files)

Change to
os.makedirs(tag_dir, exist_ok=True)
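The difference between the two calls can be reproduced in a temporary folder. The folder name `tags` is only illustrative.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    tag_dir = os.path.join(tmp, "tags")
    os.makedirs(tag_dir)                 # first tag file: creates the folder
    try:
        os.makedirs(tag_dir)             # second tag file: the folder already exists
        raised = False
    except FileExistsError:
        raised = True
    os.makedirs(tag_dir, exist_ok=True)  # proposed fix: repeat calls are harmless
    still_there = os.path.isdir(tag_dir)

print("second plain call raised FileExistsError:", raised)
```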

how to fix comma issues in JSON

leading commas: find path/to/dir/of/json -type f -iname '*.json' -exec gsed -i 's/{,/{/g' {} \;
trailing commas: find /Volumes/NYPL454416/jsonCommaIssue_audio/ -type f -iname '*.json' -exec any-json --input-format=hjson {} {} \;

The trailing-comma fix covers "}," and cases where there are too many commas after objects.
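Where gsed or any-json is not available, a rough Python equivalent might look like the sketch below. It is a text-level fix, so it would also alter a string value that happens to contain `{,` or `,}`; the function name is hypothetical, and the result is parsed afterwards to confirm it is valid JSON.

```python
import json
import re

def strip_stray_commas(text):
    """Remove commas right after '{' and commas right before '}' or ']'."""
    text = re.sub(r"\{(?:\s*,)+", "{", text)           # leading: "{," -> "{"
    text = re.sub(r"(?:,\s*)+([}\]])", r"\1", text)    # trailing: ",}" / ",]" -> "}" / "]"
    return text

broken = '{, "a": 1, "b": [1, 2,], }'
fixed = strip_stray_commas(broken)
data = json.loads(fixed)  # raises json.JSONDecodeError if the text is still invalid
print(fixed)
```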

make TOC hierarchy consistent across all media types

TOC includes media groups only for film.

add detail to QC Workflow overview (specific checks)

Create new file path based on filerole instead of current file path

ami-preservation/pami_scripts/split_projects_into_objects.py

Line 49 in 5a172d1

new_file_path = os.path.join(source_directory, cms_id, file_path)

Sometimes files are placed in the wrong folder, e.g. example_sc.json is put in PreservationMasters/

split_projects could potentially address this by moving files based on their file name.

Instead of

new_file_path = os.path.join(source_directory, cms_id, file_path)
...
if old_file_path.endswith(('mkv', 'json', 'mp4', 'dv', 'flac')):
            shutil.move(old_file_path, new_file_path)

it could be

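if old_file_path.endswith(('mkv', 'json', 'mp4', 'dv', 'flac')):
            # check the role on the name without its extension; the basename itself ends in e.g. '.json'
            file_stem = os.path.splitext(os.path.basename(old_file_path))[0]
            if file_stem.endswith('_em'):
                        new_file_path =  os.path.join(source_directory, cms_id, 'EditMasters')
            elif ... ('_pm'):
...
            elif .. ('_sc'):
...
            # exist_ok avoids an error when the role folder was already created for an earlier file
            os.makedirs(new_file_path, exist_ok=True)
            shutil.move(old_file_path, new_file_path)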
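The idea above can be sketched as a small standalone function. This is not the script's actual code: the function name and the suffix-to-folder mapping are assumptions based on the `_em`, `_pm`, and `_sc` suffixes and the folder names used elsewhere in these issues.

```python
import os

# Assumed mapping from file-role suffix to destination folder
ROLE_FOLDERS = {
    "_em": "EditMasters",
    "_pm": "PreservationMasters",
    "_sc": "ServiceCopies",
}

def folder_for_role(old_file_path, source_directory, cms_id):
    """Pick the destination folder from the file name, not from its current folder."""
    stem = os.path.splitext(os.path.basename(old_file_path))[0]
    for suffix, folder in ROLE_FOLDERS.items():
        if stem.endswith(suffix):
            return os.path.join(source_directory, cms_id, folder)
    return None  # unknown role: leave the decision to the caller

# A service copy JSON that was misfiled under PreservationMasters/
print(folder_for_role("PreservationMasters/example_sc.json", "/source", "123456"))
```

Returning `None` for unrecognized names lets the caller fall back to the existing path-based behavior instead of guessing.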

add audio configuration grid to specs / QC resources

https://docs.google.com/document/d/1gwb2iZNyd17F1fwAif3YkiaFKIe-D5Gu3Q2HsxGH2mY/edit

add --all-errors etc. to JSON validation cheat sheet

ajv validate --all-errors --multiple-of-precision=0 --verbose -s

...

Remove testing code from ReadMe.md

Fix typos on Outsourced Digitization page

On https://nypl.github.io/ami-preservation/pages/mps/outsourced-digitization.html

Change 'wrapp' to 'wrap'
Change ';' to '.' at end of same sentence

Add newer Kodak Edge Codes to some form of documentation

The official Kodak guide only goes up to 2016, I think. We need a way to document newer years.
Possible Kodak Edge Code dates for recent years (unconfirmed as of 2022 via Mark Toscano / AMIA-L):
2020 = KN
2021 = NM
2022 = DK

(confirmed?)
2017 = ET
2018 = AM
2019 = SN

@rdmarino / @cgmcnamara - might be worth discussing for future film docs updates (not that we get many new prints..)

add pre-acquisition digitization QC workflow / qc form link

https://docs.google.com/document/d/1z8_nLlbzcZYB7ShSRnnv-OQxlCo37Dy0jDtSD2Zfets/edit - QC
https://docs.google.com/document/d/1WwjIyBmBgH48jTYqu5OG_OpNqjRrDnHy_3ZNMDj1xXo/edit# - draft workflow

add container deaccession cms protocol

https://docs.google.com/document/d/18JMgQzlgBeeFIHlRrYYcX_fr42QSACoS/edit

Add film triage practice: split out 35mm mag track into separate containers

add info about isolyzer

https://github.com/KBNLresearch/isolyzer

specifications: TOC links incorrect

example:

video media deliverables links to film section

check all.

More metadata about payload

copied from issue on ami-specs https://github.com/NYPL/ami-specifications/issues/19

Define a folder and file name scheme for any metadata about digitized objects that should not go into the payload. Examples include QC Tools reports, extracted timecode, ffmpeg logs.

Proposal:
Top level directory named metadata. Files in the directory should have the same name as the file they are related to, up until the extension, e.g. abc_123456_v01_pm.mkv would have a qctool report named abc_123456_v01_pm.qctools.tar.gz. Checksums for any files in the metadata are optional and should be written to the tagmanifest (as part of bagit spec)

Example:

581608
├── bag-info.txt
├── bagit.txt
├── data
│   ├── PreservationMasters
│   │   ├── myt_581608_v01_pm.json
│   │   └── myt_581608_v01_pm.mkv
│   └── ServiceCopies
│       ├── myt_581608_v01_sc.json
│       └── myt_581608_v01_sc.mp4
├── manifest-md5.txt
├── metadata
│   ├── myt_581608_v01_pm_rp188any_frame_timecodes.txt
│   └── myt_581608_v01_pm.qctools.tar.gz
└── tagmanifest-md5.txt
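The proposed naming rule could be sketched as follows. The function name is hypothetical, and the suffixes are taken from the example tree above.

```python
from pathlib import Path

def metadata_file_for(bag_root, media_filename, report_suffix):
    """Build <bag_root>/metadata/<media name up to the extension><report_suffix>."""
    stem = Path(media_filename).stem  # e.g. "myt_581608_v01_pm"
    return Path(bag_root) / "metadata" / f"{stem}{report_suffix}"

qctools = metadata_file_for("581608", "myt_581608_v01_pm.mkv", ".qctools.tar.gz")
timecodes = metadata_file_for("581608", "myt_581608_v01_pm.mkv",
                              "_rp188any_frame_timecodes.txt")
print(qctools.as_posix())
print(timecodes.as_posix())
```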