digitalarchive's Introduction

This client no longer functions: the Wilson Center has disabled API access to the Digital Archive.

Digital Archive

A Python client for the Wilson Center's Digital Archive ("DA") of historical primary sources. This library provides an ORM for searching and accessing documents and other resources in the Digital Archive.

Installation

The client is available on PyPI. It requires Python 3.7+.

pip install digitalarchive

Usage

>>> import digitalarchive

# Search for documents:
>>> soviet_docs = digitalarchive.Document.match(title="soviet").all()

# Collections and other resource types are also searchable.
>>> soviet_collections = digitalarchive.Collection.match(name="soviet")

# Grab a single, specific document:
>>> document = digitalarchive.Document.match(id="112566").first()

# Pull transcripts, translations, and original scans of documents:
>>> document.hydrate()
>>> transcript_html = document.transcripts[0].html

# Pull the metadata and other assets for an entire result set.
>>> chernobyl_docs = digitalarchive.Document.match(title="chernobyl")
>>> chernobyl_docs.hydrate()
>>> chernobyl_docs.all()

# Or just download all the documents!
>>> all_documents = digitalarchive.Document.match().all()

Complete documentation for the client and the Digital Archive's models is available in the project's online documentation.

Disclaimers

  • This is an unofficial library. I am not presently affiliated with the Wilson Center. I understand that the API is unlikely to change in the near future, but I cannot guarantee that this library won't break without warning.
  • If you plan to scrape the DA, please be respectful.

Planned Features

  • Support for searching by date range.
  • Asynchronous hydration of large result sets.
  • For Collections, include keyword hits in short_description for searches (modify collection searches to use the record.json endpoint instead of the collection.json endpoint).
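
As a rough illustration of the planned asynchronous hydration, the sketch below fetches several records concurrently while capping the number of requests in flight, which also keeps the load on the DA modest. It is not part of the library: `fetch_record`, `hydrate_all`, and the `limit` parameter are hypothetical names, and the network call is simulated with a short sleep.

```python
import asyncio
from typing import Dict, List


async def fetch_record(doc_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for one request to the archive."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": doc_id, "hydrated": "yes"}


async def hydrate_all(doc_ids: List[str], limit: int = 5) -> List[Dict[str, str]]:
    """Fetch every record concurrently, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(doc_id: str) -> Dict[str, str]:
        # The semaphore caps how many fetches run at once.
        async with semaphore:
            return await fetch_record(doc_id)

    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(bounded(d) for d in doc_ids))


records = asyncio.run(hydrate_all(["1", "2", "3"]))
```

A small `limit` trades some speed for politeness toward the server; a real implementation would also need error handling and retries.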

digitalarchive's People

Contributors

dependabot[bot], epikulski

Forkers

mirror-dump

digitalarchive's Issues

Type models not parsed during hydration.

When a record set is hydrated, the type and rights attributes on each document are left as raw dicts (or lists of dicts) rather than parsed into instances of digitalarchive.models.Type or digitalarchive.models.Right.

Example:

    >>> from digitalarchive import Document
    >>> docs = Document.match(description="Taiwan Strait Crisis")
    >>> docs.hydrate()
    >>> docs.list[0].type
    [{'id': '34', 'name': 'Telegram'}]
    >>> docs.list[0].rights
    {'id': '4', 'name': 'CWIHP', 'rights': 'The Cold War International History Project welcomes reuse of Digital Archive materials for research and educational purposes. Some documents may be subject to copyright, which  is retained by the rights holders in accordance with US and international copyright laws.  To enquire about this document&#39;s rights status or request permission for commercial use, please contact the Cold War International History Project at <a href="mailto:[email protected]">[email protected]</a>.'}
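
One possible shape for a fix is sketched below: a small helper that converts a dict, or a list of dicts, into model instances. The `Type` and `Right` classes here are simplified stand-ins written as dataclasses, not the library's actual models, and `coerce` is a hypothetical helper name.

```python
from dataclasses import dataclass


@dataclass
class Type:
    """Simplified stand-in for digitalarchive.models.Type."""
    id: str
    name: str


@dataclass
class Right:
    """Simplified stand-in for digitalarchive.models.Right."""
    id: str
    name: str
    rights: str


def coerce(model, raw):
    """Turn a dict, or a list of dicts, into instances of `model`."""
    if isinstance(raw, list):
        return [coerce(model, item) for item in raw]
    if isinstance(raw, dict):
        return model(**raw)
    return raw  # already parsed (or None): leave untouched


doc_types = coerce(Type, [{"id": "34", "name": "Telegram"}])
doc_rights = coerce(Right, {"id": "4", "name": "CWIHP", "rights": "(rights statement)"})
```

Applying a helper like this to each document during hydration would make `type` and `rights` behave like the other parsed attributes.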

Make all models properly serializable

Add a serialize method or other mechanism to easily convert DA models into JSON for storage by users or consumption by other applications.

Requirements:

  1. Each document model can be serialized:
  • Subject
  • Language
  • Transcript
  • Translation
  • MediaFile
  • Contributor
  • Donor
  • Coverage
  • Collection
  • Repository
  • Publisher
  • Type
  • Right
  • Classification
  • Document
  • Theme
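
A minimal sketch of what such a serialize method could look like, assuming the models were plain dataclasses (the library's real models may be built differently). `Subject` and `Document` below are cut-down, hypothetical versions with only a few fields.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class Subject:
    """Cut-down, hypothetical version of a DA model."""
    id: str
    name: str


@dataclass
class Document:
    """Cut-down, hypothetical document with one nested model list."""
    id: str
    title: str
    subjects: List[Subject] = field(default_factory=list)

    def serialize(self) -> str:
        # asdict() recurses into nested dataclasses and lists of them.
        return json.dumps(asdict(self), sort_keys=True)


doc = Document(
    id="112566",
    title="Example title",
    subjects=[Subject(id="1", name="Example subject")],
)
payload = doc.serialize()
```

Because nested models are converted recursively, serializing a document also serializes the subjects, languages, and other models attached to it.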
