trillium-solutions / gtfs-feed-archive

GTFS (general transit feed) archive utility

Clojure 84.24% Shell 1.43% Perl 14.33%

gtfs-feed-archive's Issues

create script to launch the server in the background as a daemon

Using https://github.com/strongh/lein-init-script

web site page: configure CSV data sources

proxy error with ODOT private archiver

@ed-g -- When attempting to download a full archive from the ODOT private portal, users are consistently receiving this error message. Could you take a look when you have a minute?

Let me know if you need any other info.

http://archive.oregon-gtfs.com/oregon-private-feeds/archive-target

web site page: show errors / problems or debugging log.

Integrate with the timbre library so we can capture log messages to show them in the web interface as well as writing to a file, or printing to console.

Reliability improvements

Reliability improvements for archive.oregon-gtfs.com.

Skip feed urls which cannot be fetched, report error but allow ZIP archive download.
More convenient logging and error reporting. YES
Internal upgrade: create a database storage engine instead of in-memory for everything. This allows much easier debugging for me, and would make the software more modular and easy to maintain.
Change the agent-based download mechanism to a pipeline system using either core.async or simple work queues. Again, the benefit is easier debugging.

web site page: download archive

If archive file is available send its contents with status 200 OK.

If archive file has been requested but is not yet available, return 204 No Content.

If the archive file does not exist and has not been requested, return 404 Not Found.
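
The three cases above can be sketched as a small lookup. This is a minimal Python illustration, not the project's Clojure code; the `ArchiveState` names and the `download_status` function are hypothetical.

```python
from enum import Enum


class ArchiveState(Enum):
    """Hypothetical states an archive file can be in."""
    AVAILABLE = "available"   # file exists on disk and can be sent
    REQUESTED = "requested"   # generation was requested but is not finished
    UNKNOWN = "unknown"       # never requested, no file on disk


def download_status(state: ArchiveState) -> int:
    """Map an archive state to the HTTP status described in the issue."""
    if state is ArchiveState.AVAILABLE:
        return 200  # OK: send the archive contents
    if state is ArchiveState.REQUESTED:
        return 204  # No Content: requested, not yet available
    return 404      # Not Found: does not exist and was not requested
```

A client polling for an archive could treat 204 as "try again later" and 404 as "request the archive first".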

cache persistence for downloaded zip files

Load/save cache manager to an EDN file, so we can remember what we've already downloaded.

Running download agents should not be persisted.

When loading we should verify that referenced files actually exist.

This will create a need to expire old/unnecessary cache entries, since otherwise the cache will just keep growing... :-)
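
A rough sketch of the load/save rules above, written in Python with JSON standing in for EDN. The entry layout (`feed_name`, `file`, `state`) and both function names are assumptions made for illustration only.

```python
import json
import os


def save_cache(entries, path):
    """Write cache entries to disk, leaving out downloads still running."""
    finished = [e for e in entries if e.get("state") != "running"]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(finished, f, indent=2)


def load_cache(path):
    """Read cache entries, keeping only those whose file still exists."""
    if not os.path.exists(path):
        return []  # nothing saved yet: start with an empty cache
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    return [e for e in entries if os.path.isfile(e["file"])]
```

Expiring old entries is not covered here; it would need a policy (age, count, or disk usage) that the issue leaves open.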

cache-manager should guarantee only one running download per unique feed-name.

That way multiple downloads won't clobber each other.
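
One way to picture this guarantee is a registry that refuses a second download for a feed name that is already running. The Python sketch below is an assumption about how it could look; the `DownloadRegistry` class and its methods are hypothetical and are not the cache-manager's actual interface.

```python
import threading


class DownloadRegistry:
    """Tracks which feed names currently have a running download."""

    def __init__(self):
        self._lock = threading.Lock()
        self._running = set()

    def try_start(self, feed_name):
        """Return True if the caller may start downloading this feed."""
        with self._lock:
            if feed_name in self._running:
                return False  # another download for this feed is active
            self._running.add(feed_name)
            return True

    def finish(self, feed_name):
        """Mark the download as done, whether it succeeded or failed."""
        with self._lock:
            self._running.discard(feed_name)
```

A caller would wrap the download in `try`/`finally` so that `finish` always runs and a failed download does not block later attempts.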

feed location URL in last_updates.csv?

I am interested in adding some information to the zip file of zip files:
the data location URL, perhaps as a third column in the last-updates-csv file?
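
As an illustration of the proposed third column, here is a short Python sketch. The first two column names (`feed_name`, `last_modified`) are assumptions, since the issue does not list the existing columns, and the URL is a placeholder.

```python
import csv
import io


def write_last_updates(rows):
    """Render (feed_name, last_modified, feed_url) rows as CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Hypothetical header: the third column is the proposed addition.
    writer.writerow(["feed_name", "last_modified", "feed_url"])
    for feed_name, last_modified, feed_url in rows:
        writer.writerow([feed_name, last_modified, feed_url])
    return buf.getvalue()


print(write_last_updates(
    [("example-feed", "2014-03-01", "https://example.org/gtfs.zip")]
))
```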

Fix HTTPS download

For example, https://www.miapp.ca/GTFS/google_transit.zip

It looks like the download never starts, or it times out.

give a warning if not all GTFS feeds can be fetched

Give a warning if not all files from the GTFS list can be downloaded, and if the user elects to continue anyway, produce a subset of the feeds as Oregon-GTFS-feeds-INCOMPLETE-date.zip

This seems better than producing no archive file if a feed is unfetchable.

Of course when they go to their feeds page the broken feeds will show up but they may not have checked recently.
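
The naming rule could be sketched like this in Python. The ISO date format and the name used for a complete archive are assumptions; the issue only gives the pattern Oregon-GTFS-feeds-INCOMPLETE-date.zip.

```python
from datetime import date


def archive_name(total_feeds, fetched_feeds, on=None):
    """Build the archive file name, flagging archives that miss feeds."""
    on = on or date.today()
    # Only mark the archive when at least one feed could not be fetched.
    marker = "" if fetched_feeds == total_feeds else "-INCOMPLETE"
    return f"Oregon-GTFS-feeds{marker}-{on.isoformat()}.zip"
```

The warning itself would list the feeds that failed, so the user can decide whether to continue with the subset.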

web site page: generate archives

Either for all GTFS feeds, or only those changed since a certain date. Drop the archives in the users' archive download directory.

[last modified] date of each feed

GTFS Archive Tool needs to run automatically on a regular basis

We need the agency name, feed name, feed URL, and last modified date
to be available either in a CSV or from an API.

Capturing a note.

I think the ZIP file download system is clunky for any purpose other than truly making an archive.

What we want is a way to query the historical versions of feeds contained in the archive and download their data via REST.

How about an endpoint that gives:

Feed name
Last modification-time
Last check time
Link to gtfs-api, which will give you the feed version there, provided the import is up to date
Download link to feed directly from gtfs-archive. I suggest this since URL changes depending on which version we're talking about, and there's no guarantee older download URLs will work.
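
To make the proposal concrete, the record such an endpoint returns might look like the Python sketch below. All field names and both URLs are placeholders invented for illustration; nothing here reflects an existing gtfs-api or gtfs-archive route.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FeedVersion:
    """One historical version of a feed, as the endpoint might report it."""
    feed_name: str
    last_modified: str   # last modification time of the feed
    last_checked: str    # last time the archive checked the feed
    gtfs_api_url: str    # link to the same version in gtfs-api
    download_url: str    # direct download from the archive


def to_json(version):
    """Serialize a FeedVersion as a JSON object."""
    return json.dumps(asdict(version), sort_keys=True)


example = FeedVersion(
    feed_name="example-feed",
    last_modified="2014-03-01T12:00:00Z",
    last_checked="2014-03-02T08:00:00Z",
    gtfs_api_url="https://example.org/gtfs-api/example-feed",
    download_url="https://example.org/archive/example-feed.zip",
)
print(to_json(example))
```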

compare last-modified dates in download agent so we don't grab files twice

Compare the last-modified dates of files in the cache-manager. Add a function (already-have-fresh-feed? feed-name date) that checks freshness based on the cache refresh interval.

If so, the download agent should have a "successful" state with file-saved => true; however, it should change its file name to match the existing finished download.
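
One possible reading of the freshness check, sketched in Python rather than Clojure. Here `cache` maps feed names to the last-modified time of the cached copy, and `date` is taken to be the remote feed's last-modified time; both the data layout and the default interval are assumptions.

```python
from datetime import datetime, timedelta


def already_have_fresh_feed(cache, feed_name, date,
                            refresh_interval=timedelta(hours=1)):
    """Return True if the cached copy is recent enough to skip a download."""
    cached = cache.get(feed_name)
    if cached is None:
        return False  # nothing cached for this feed yet
    # Fresh when the remote copy is not newer than the cached one
    # by more than the refresh interval.
    return date - cached <= refresh_interval
```

When this returns True, the download agent could record the existing file's name instead of fetching the feed again.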

oregon-gtfs.com error

this link is published on oregon-gtfs.com
http://archive.oregon-gtfs.com/oregon-public-feeds/archive-creator

it yields an error message

cc @ed-g

spotted by ODOT

download agent should start with a :download-directory and only later determine a :download-file

The :download-file name depends on the modification time of the feed file.

Therefore until the feed is reachable, we won't know what to call it. Feeds where the network was down when the download agent was started were getting file names with no date. Instead, we can just wait until we grab the file, and then use its modification date.
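
The idea of deferring the file name can be sketched as a function that is only called once the modification time is known. This is a Python illustration; the function name and the timestamp format in the file name are assumptions.

```python
import os
from datetime import datetime, timezone


def download_file_path(download_directory, feed_name, modified):
    """Build the final file path once the feed's modification time is known.

    `modified` must be a timezone-aware datetime.
    """
    stamp = modified.astimezone(timezone.utc).strftime("%Y-%m-%dT%H-%M-%SZ")
    return os.path.join(download_directory, f"{feed_name}-{stamp}.zip")
```

Until the feed has been fetched, the agent would hold only the download directory, so an unreachable feed never produces a file name without a date.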
