gtfs-feed-archive's People

Contributors

ed-g, kmvoorhees, lisaattrillium

gtfs-feed-archive's Issues

Reliability improvements

Reliability improvements for archive.oregon-gtfs.com.

  • Skip feed URLs which cannot be fetched; report the error but still allow the ZIP archive download.
  • More convenient logging and error reporting. YES
  • Internal upgrade: create a database storage engine instead of keeping everything in memory. This allows much easier debugging for me, and would make the software more modular and easier to maintain.
  • Change the agent-based download mechanism to a pipeline system using either core.async or simple work queues. Again, the benefit is debugging.
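
A minimal sketch of the "simple work queues" idea combined with skipping unfetchable feeds. The project itself is Clojure; this is Python only to keep the sketch short, and `run_downloads`, its `fetch` argument, and the worker count are all hypothetical names, not part of the existing code.

```python
import queue
import threading


def run_downloads(feed_urls, fetch, workers=4):
    """Fetch every feed through a shared work queue.

    feed_urls maps feed name -> URL. fetch is any callable taking a URL
    and returning the downloaded bytes (hypothetical; supplied by caller).
    Returns (results, errors) so a failed feed is reported, not fatal.
    """
    jobs = queue.Queue()
    for name, url in feed_urls.items():
        jobs.put((name, url))

    results, errors = {}, {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                name, url = jobs.get_nowait()
            except queue.Empty:
                return  # queue drained, worker is done
            try:
                data = fetch(url)
            except Exception as exc:
                # Record the failure and move on to the next feed.
                with lock:
                    errors[name] = str(exc)
            else:
                with lock:
                    results[name] = data

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, errors
```

Because each job is a plain tuple on a queue and each outcome lands in one of two dictionaries, the state of a run can be inspected at any point, which is the debugging benefit the bullets are after.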

web site page: download archive

If the archive file is available, send its contents with status 200 OK.

If archive file has been requested but is not yet available, return 204 No Content.

If the archive file does not exist and has not been requested, return 404 Not Found.
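
The three cases above can be sketched as a small lookup. This is Python rather than the project's Clojure, and `ArchiveState` and `download_status` are hypothetical names used only for illustration.

```python
from enum import Enum
from http import HTTPStatus


class ArchiveState(Enum):
    AVAILABLE = "available"  # archive file has been built and is on disk
    REQUESTED = "requested"  # build was requested but has not finished
    UNKNOWN = "unknown"      # no file, and nobody has requested one


def download_status(state: ArchiveState) -> HTTPStatus:
    """Map an archive's state to the response status described above."""
    if state is ArchiveState.AVAILABLE:
        return HTTPStatus.OK
    if state is ArchiveState.REQUESTED:
        return HTTPStatus.NO_CONTENT
    return HTTPStatus.NOT_FOUND
```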

cache persistence for downloaded zip files

Load/save cache manager to an EDN file, so we can remember what we've already downloaded.

Running download agents should not be persisted.

When loading we should verify that referenced files actually exist.

This will create a need to expire old/unnecessary cache entries, since otherwise the cache will just keep growing... :-)
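
A rough sketch of the load/save behaviour described above. It is in Python, and JSON stands in for EDN only because Python's standard library has no EDN reader. The entry keys (`feed`, `file`, `state`) are assumptions, not the cache manager's real shape.

```python
import json
import os


def save_cache(entries, path):
    """Write cache entries to disk, leaving out running download agents."""
    finished = [e for e in entries if e.get("state") != "running"]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(finished, f, indent=2)


def load_cache(path):
    """Read the cache back, keeping only entries whose file still exists."""
    if not os.path.exists(path):
        return []  # first run: nothing remembered yet
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e for e in entries if os.path.isfile(e["file"])]
```

Expiry of old entries is not covered here; it would be a separate pass over the loaded list.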

give a warning if not all GTFS feeds can be fetched

Give a warning if not all files from the GTFS list can be downloaded, and if the user elects to continue anyway, produce a subset of the feeds as Oregon-GTFS-feeds-INCOMPLETE-date.zip

This seems better than producing no archive file if a feed is unfetchable.

Of course, when they go to their feeds page the broken feeds will show up, but they may not have checked recently.
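
For illustration, the naming rule could look like this in Python. The ISO date format and the name used for a complete archive are assumptions; the issue only gives the INCOMPLETE pattern. `archive_filename` is a hypothetical helper.

```python
from datetime import date


def archive_filename(all_fetched: bool, day: date) -> str:
    """Build the ZIP name, marking archives that are missing some feeds."""
    # Assumption: a complete archive uses the same pattern minus the marker.
    marker = "" if all_fetched else "INCOMPLETE-"
    return f"Oregon-GTFS-feeds-{marker}{day.isoformat()}.zip"
```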

web site page: generate archives

Either for all GTFS feeds, or only those changed since a certain date. Drop the archives in the users' archive download directory.
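
A small Python sketch of the "all feeds, or only those changed since a date" selection. `feeds_to_archive` is a hypothetical name, and treating a feed modified exactly on the cutoff date as changed is an assumption.

```python
from datetime import date


def feeds_to_archive(last_modified, since=None):
    """Pick feed names for an archive.

    last_modified maps feed name -> date of last modification.
    With since=None every feed is included; otherwise only feeds
    modified on or after that date.
    """
    if since is None:
        return sorted(last_modified)
    return sorted(name for name, d in last_modified.items() if d >= since)
```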

[last modified] date of each feed

  • GTFS Archive Tool needs to run automatically on a regular basis
  • We need agency name, feed name, feed URL, and last modified date
    to be available either in a CSV or from an API

Capturing a note.

I think the ZIP file download system is clunky for any purpose other than actually making an archive.

What we want is a way to query the historical versions of feeds contained in the archive and download their data via REST.

How about an endpoint that gives:

  • Feed name
  • Last modification-time
  • Last check time
  • Link to gtfs-api, which will give you the feed version there, provided the import is up to date
  • Download link to the feed directly from gtfs-archive. I suggest this since the URL changes depending on which version we're talking about, and there's no guarantee older download URLs will work.
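
One possible shape for a single record from such an endpoint, sketched in Python. Every field name here is an assumption, and the example.org URLs in the usage note are placeholders, not real endpoints.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeedStatus:
    feed_name: str
    last_modified: str         # ISO 8601 timestamp of the feed version
    last_checked: str          # ISO 8601 timestamp of our last check
    gtfs_api_url: str          # where gtfs-api serves this version
    archive_download_url: str  # direct download from the archive


def to_json(status: FeedStatus) -> str:
    """Serialize one feed record as a JSON object."""
    return json.dumps(asdict(status), sort_keys=True)
```

A caller might build `FeedStatus("example-feed", "2014-01-01T00:00:00Z", "2014-01-02T00:00:00Z", "https://example.org/api/example-feed", "https://example.org/archive/example-feed.zip")` and return `to_json(...)` as the response body.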

compare last-modified dates in download agent so we don't grab files twice

Compare last-modified dates of files in the cache manager. Add a function (already-have-fresh-feed? feed-name date) which checks based on the cache refresh interval.

If we already have a fresh copy, the download agent should end in a "successful" state with file-saved => true; however, it should change its file name to match the existing finished download.
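
One reading of the freshness check, sketched in Python rather than Clojure: a cached copy counts as fresh when the date passed in is no more than one refresh interval past the cached copy's date. That interpretation, the one-hour default, and the shape of `cache` (feed name mapped to a datetime) are all assumptions.

```python
from datetime import datetime, timedelta


def already_have_fresh_feed(cache, feed_name, date,
                            refresh_interval=timedelta(hours=1)):
    """Return True if the cached copy of feed_name is fresh as of `date`.

    cache maps feed name -> datetime of the copy we already hold.
    """
    cached = cache.get(feed_name)
    if cached is None:
        return False  # never downloaded
    return date - cached <= refresh_interval
```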

Ignore CSV entries with no URL

Some agencies don't provide a public download link but the data is available after filling out a form. We should just skip entries that don't have a download URL.
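
Skipping such entries might look like this in Python. The column name `feed_url` is a guess; the real CSV header is not given in this issue.

```python
import csv
import io


def feeds_with_urls(csv_text, url_column="feed_url"):
    """Return the CSV rows that have a non-empty download URL."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # A missing column or a whitespace-only cell both count as "no URL".
    return [row for row in reader if (row.get(url_column) or "").strip()]
```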

Handle servers which don't provide modification-time

Download the file if they don't provide modification-time, then compare against existing archives to see if it's the same file we already have: if it compares the same, then pretend as though it had the same modification time.

We could also short-cut by checking the file size. It's not guaranteed to change when the feed does, but it probably will. Then download, say, once per week to make sure.
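
A Python sketch of the "compare against existing archives" step, using a SHA-256 digest as the comparison; the issue does not say how files should be compared, so that choice is an assumption. The size check here runs on local files after download, which is a cheaper variant of the shortcut above, not a way to avoid the download itself.

```python
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_as_archived(new_path, archived_path):
    """True if the freshly downloaded file matches the archived copy."""
    # Different sizes can never match, so skip hashing in that case.
    if os.path.getsize(new_path) != os.path.getsize(archived_path):
        return False
    return file_digest(new_path) == file_digest(archived_path)
```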
