code-gov-harvester-deprecated's Introduction

Code.gov Harvester

Harvester to process agency code.json files.

DEPRECATION WARNING

This repository is considered deprecated and will be archived. For the new version of this tool please go to GSA/code-gov-harvester.

Running

  1. Clone repo: $> git clone git@github.com:GSA/code-gov-harvester.git
  2. Move into the project directory
  3. Install npm modules: $> npm install
  4. Run index.js: $> node index.js

Generated Files

Three files will be generated:

  1. harvester.log: harvester log file
  2. data/release.json<timestamp>: JSON file with all the released projects found in each agency's code.json.
  3. data/releaseIndex.json<timestamp>: JSON file with the created LunrJS index
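
Because the output file names carry a timestamp suffix, a consumer has to work out which file is the newest. The following is a minimal sketch, assuming the most recently modified file is the one wanted; findLatestOutput is a hypothetical helper and not part of the harvester.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the path of the most recently modified file in
// `dir` whose name starts with `prefix`, or null when no file matches.
function findLatestOutput(dir, prefix) {
  const candidates = fs.readdirSync(dir)
    .filter((name) => name.startsWith(prefix))
    .map((name) => {
      const fullPath = path.join(dir, name);
      return { fullPath, mtimeMs: fs.statSync(fullPath).mtimeMs };
    })
    // Newest first, judged by modification time (an assumption, since the
    // timestamp format in the file name is not documented here).
    .sort((a, b) => b.mtimeMs - a.mtimeMs);
  return candidates.length > 0 ? candidates[0].fullPath : null;
}
```

Calling findLatestOutput('data', 'release.json') would skip the releaseIndex.json files, since their names do not start with that prefix.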

code-gov-harvester-deprecated's People

Contributors

danieljdufour, ricardoareyes

code-gov-harvester-deprecated's Issues

Add repo topics

It would be awesome if this repo could include a code-gov topic to make it easily discoverable alongside other code.gov related utilities on GitHub.com.

Topics are now an even more prominent part of the discovery experience of GitHub.com (https://github.com/topics), and are being used to help users discover projects they may be interested in (https://github.com/updates). Creating a centralized place for code.gov repositories may also help reduce duplication of effort.

Also, that would then make them more easily searchable on https://code.gov... how meta.

Discrepancies with agency releases on code.gov platform

The releases in agencies' code.json files are not fully reflected on the code.gov platform. Some governmentWideReuse and openSource projects that agencies report in their code.json files do not appear on the code.gov website.

Example:

Agency                                                Releases in JSON  OSS   GovWideReuse  Expected on web  On web  Discrepancy (*)
Department of Energy (DOE)                            711               711   0             711              704     *7
General Services Administration (GSA)                 1514              1377  145           1522             1500    *22
National Aeronautics and Space Administration (NASA)  1010              531   715           1246             1006    *240

DOE, GSA, and NASA show discrepancies of 7, 22, and 240 releases respectively, as marked with * above.
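
The figures in the table are consistent with the expected count being the sum of the OSS and GovWideReuse columns, and the discrepancy being that sum minus the number shown on the web. That reading is inferred from the numbers, not stated in the issue. A short sketch that reproduces the arithmetic (property names are illustrative):

```javascript
// Figures copied from the table in this issue.
const agencies = [
  { agency: 'DOE', openSource: 711, governmentWideReuse: 0, onWeb: 704 },
  { agency: 'GSA', openSource: 1377, governmentWideReuse: 145, onWeb: 1500 },
  { agency: 'NASA', openSource: 531, governmentWideReuse: 715, onWeb: 1006 },
];

// Expected = OSS + GovWideReuse; missing = expected minus what the site shows.
function discrepancy({ openSource, governmentWideReuse, onWeb }) {
  const expected = openSource + governmentWideReuse;
  return { expected, missing: expected - onWeb };
}

const summary = agencies.map((a) => ({ agency: a.agency, ...discrepancy(a) }));
console.log(summary);
// DOE: expected 711, missing 7; GSA: expected 1522, missing 22;
// NASA: expected 1246, missing 240
```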

It appears that the harvester's check for duplicate release titles is causing the issue, due to special characters in the field value.
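
The harvester's actual duplicate check is not shown in this issue, so the following is only an illustration of how special characters could cause distinct releases to be dropped. It assumes a hypothetical key function that strips non-alphanumeric characters, and contrasts it with a key that keeps them. The sample titles are made up.

```javascript
// Hypothetical lossy key: drops every character that is not a letter or digit,
// so titles that differ only in special characters collide.
function lossyKey(release) {
  return release.name.toLowerCase().replace(/[^a-z0-9]/g, '');
}

// Alternative key: keeps special characters, normalizes only Unicode form,
// case, and whitespace, and scopes the key to the agency.
function saferKey(release) {
  const name = release.name.normalize('NFC').trim().replace(/\s+/g, ' ').toLowerCase();
  return `${release.agency}::${name}`;
}

// Keep the first release seen for each key.
function dedupe(releases, keyFn) {
  const seen = new Map();
  for (const release of releases) {
    const key = keyFn(release);
    if (!seen.has(key)) seen.set(key, release);
  }
  return [...seen.values()];
}

// Made-up sample data: two distinct projects plus one true duplicate.
const sample = [
  { agency: 'NASA', name: 'C++ Toolkit' },
  { agency: 'NASA', name: 'C# Toolkit' },
  { agency: 'NASA', name: 'c++  toolkit ' },
];

console.log(dedupe(sample, lossyKey).length); // 1 (a distinct project is lost)
console.log(dedupe(sample, saferKey).length); // 2 (only the true duplicate is dropped)
```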

Add additional repo information from integrations

Add integrations that pull in more data for each repo. For now, this means integrating GitHub into the harvester. The information will be added per project repo.

E.g.

{
    "releases": {
        "ProjectRepo": {
            ...
            "githubInfo": {},
            "collaborators": []
        }
    }
}
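
A minimal sketch of how such an enrichment step might look. It assumes each release carries a repositoryURL field, and it uses the public GitHub REST endpoints for a repository and for its contributors as a stand-in for the collaborators list. parseGithubRepo, addGithubInfo, and the injected fetchJson are hypothetical names, not part of the harvester.

```javascript
// Extract { owner, repo } from a GitHub repository URL, or null for other hosts.
function parseGithubRepo(repositoryURL) {
  const pattern = /^https?:\/\/github\.com\/([^/]+)\/([^/#?]+?)(?:\.git)?\/?$/;
  const match = pattern.exec(repositoryURL || '');
  return match ? { owner: match[1], repo: match[2] } : null;
}

// Return a copy of `release` with githubInfo and collaborators attached.
// `fetchJson(url)` is injected so the sketch can run without network access.
async function addGithubInfo(release, fetchJson) {
  const parsed = parseGithubRepo(release.repositoryURL);
  if (!parsed) return release; // not hosted on GitHub: leave untouched

  const base = `https://api.github.com/repos/${parsed.owner}/${parsed.repo}`;
  const [githubInfo, collaborators] = await Promise.all([
    fetchJson(base),
    // Assumption: the contributors list is used here because it is public.
    fetchJson(`${base}/contributors`),
  ]);
  return { ...release, githubInfo, collaborators };
}
```

On Node 18 or later, fetchJson could be as simple as (url) => fetch(url).then((res) => res.json()). A real integration would also need authentication and rate-limit handling.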

Add Reporting

Summary

The API generates a report.json which feeds the compliance dashboard. The harvester needs to create the report.json in the same way as the API.

Steps

  1. Implement Report features
  2. Generate report while validating the code.json
  3. Add issues to report
  4. Add an agencyMetadata.json file to the project.
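
The steps above could be sketched roughly as follows. The structure of report.json is not described in this issue, so the report shape, the REQUIRED_FIELDS list, and both function names are assumptions made for illustration only.

```javascript
// Hypothetical required fields; the real code.json schema defines the full list.
const REQUIRED_FIELDS = ['name', 'description', 'permissions'];

// Return one issue object per required field missing from a release.
function validateRelease(release) {
  return REQUIRED_FIELDS
    .filter((field) => release[field] === undefined || release[field] === null || release[field] === '')
    .map((field) => ({
      project: release.name || '(unnamed)',
      issue: `missing required field "${field}"`,
    }));
}

// Build a per-agency report entry while validating each release.
function buildAgencyReport(agencyAcronym, codeJson) {
  const releases = Array.isArray(codeJson.releases) ? codeJson.releases : [];
  const issues = releases.flatMap(validateRelease);
  return {
    agency: agencyAcronym,
    releaseCount: releases.length,
    issueCount: issues.length,
    issues,
  };
}
```

Collecting issues during validation means the code.json is only walked once, which matches the second and third steps.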

Change Agency names

Some agencies have asked us to update the names shown on the compliance dashboard:

  • Update NARA's name to National Archives and Records Administration
  • Update Treasury to Department of the Treasury
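
One way to apply such renames is a small override table consulted when the dashboard name is resolved. This is a sketch only: the acronym keys and the shape of the agency object are assumptions, not the harvester's actual metadata format.

```javascript
// Display-name overrides keyed by agency acronym (the keys are assumptions).
const DISPLAY_NAME_OVERRIDES = new Map([
  ['NARA', 'National Archives and Records Administration'],
  ['TREASURY', 'Department of the Treasury'],
]);

// Prefer an override when one exists, otherwise keep the agency's own name.
function displayName(agency) {
  return DISPLAY_NAME_OVERRIDES.get(agency.acronym) || agency.name;
}

console.log(displayName({ acronym: 'NARA', name: 'NARA' }));
// National Archives and Records Administration
```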
