
Comments (7)

carlfredl commented on May 27, 2024

In Europe the NAP model has solved this issue in a handful of countries so far. The political effort that went into clarifying the source of truth in each of these is commendable and a huge achievement. It might be helpful to provide guidance on how that has been achieved in places like Austria, the Netherlands, and Norway.

Meanwhile, many NAPs, including Germany's, host numerous overlapping datasets. Other NAPs are offline entirely or lack transit data altogether. In addition to what @evansiroky shared, this all seems to indicate that we shouldn't depend solely on centralized management being achieved universally, whether from a regulatory or a resourcing standpoint.


westontrillium commented on May 27, 2024

I don’t know about calling this a best practice, as there are factors that may make it not the recommended course of action, and I’m skeptical that advocating for the use of an agency’s official URL would do much to guarantee more stability. Whether for organizational or funding reasons, sometimes an agency’s URL does not match its name (e.g., “ECCOG” vs. “Outback Express”); agencies rebrand, change their name or URL, procure new websites, or merge with others. An agency may also choose not to publish GTFS under its own domain for a variety of reasons. Trillium publishes feeds at data.trilliumtransit.com, oregon-gtfs.com, etc., many for small agencies or cities that don’t have the capability to publish data at their own domain, or whose website content management system poses barriers (e.g., an unavoidable automation that changes the suffix every time a new file is uploaded).

Establishing the use of agency domains as a best practice seems a bit restrictive given the breadth of circumstances that might steer an agency toward a different approach. I can understand advocating for as much agency control as possible over how and where their data is published, but that doesn’t really seem to be the topic of this discussion.

Looking at the user story…

…avoid having to update our database of which URL to download a transit agency’s feeds from,
So that I can consistently download each transit agency’s most up-to-date data even if they change their internal GTFS publishing process

The direct factor in avoiding constant updates to a database’s fetch URLs is simply that those URLs don’t change, regardless of what the domain might be. But stable URLs are already a best practice; the agencies (and vendors) with constantly changing URLs are just not following it. Apart from those cases, though, URLs still change for a variety of legitimate reasons. So is there a way to mitigate this pain point other than creating an additional best practice? Perhaps this is where the establishment of something like https://database.mobilitydata.org/ as a single source of truth could come into play…?
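If a shared registry did become the single source of truth, consumers could resolve an agency’s current download URL from it at fetch time instead of caching URLs locally. The following is only a minimal sketch of that idea; the catalog URL and the "provider" / "direct_download_url" column names are assumptions for illustration, not the actual Mobility Database schema.

import csv
import io
import urllib.request

# Hypothetical registry export; a real catalog may live elsewhere and use a
# different schema.
CATALOG_URL = "https://example.org/feed-catalog.csv"

def resolve_feed_url(provider_name):
    """Look up a provider's current GTFS Schedule download URL in a shared
    registry at fetch time, instead of hardcoding it in a local database."""
    with urllib.request.urlopen(CATALOG_URL) as response:
        text = response.read().decode("utf-8")
    for row in csv.DictReader(io.StringIO(text)):
        if row.get("provider", "").strip().lower() == provider_name.strip().lower():
            return row.get("direct_download_url") or None
    return None

# With this pattern, a vendor or hosting change only has to be recorded once,
# in the registry, rather than in every consumer's fetch database.
print(resolve_feed_url("Outback Express"))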

I would also be interested to hear from other consumers on this pain point.


skinkie commented on May 27, 2024

This does not make sense to me, since the agency does not have to be the initiator of the GTFS publication in the first place. What you want is called a "National Access Point", where data providers are required to register their dataset along with the available metadata. It works well in Europe and would make sense in the rest of the world.


GraemeLeighton00 commented on May 27, 2024

+1 This will be very useful!


e-lo commented on May 27, 2024

Since the agency does not have to be the initiator of the GTFS publication in the first place.

That's why it would be a best practice, not a requirement. I agree that this would be useful not only for URL stability but also for many of the issues I've heard about ensuring consumers use the "official" schedule that the agency wants them to...since there are quite a few agencies that have more than one feed floating around.


skinkie commented on May 27, 2024

I would like to avoid having to update our database of which URL to download a transit agency's feeds from,
So that I can consistently download each transit agency's most up-to-date data even if they change their internal GTFS publishing process.

My reasoning is that your user story should be resolved in a better way, not by scraping agency websites.


evansiroky commented on May 27, 2024

Hello @skinkie. Thanks for your feedback. In this case, my organization (the State of California) maintains something similar to what you describe as a national access point. We maintain our own list of GTFS datasets, which we publish here: https://data.ca.gov/dataset/cal-itp-gtfs-ingest-pipeline-dataset/resource/e4ca5bd4-e9ce-40aa-a58a-3a6d78b042bd

We manually maintain those URLs as best as we can because this is our only option at this time. We frequently run into issues with outdated data because transit agencies are not required to notify us when they update their data. While there is now a mandate in our country for most transit agencies to provide their URLs, they are only mandated to do so for GTFS Schedule data and not Realtime. They also only report this once a year at most, to a federal agency. Furthermore, we are uncertain whether we will have access to these URLs from the federal agency.
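Until producers are required to notify consumers of updates, one consumer-side stopgap is to poll the known URL with conditional HTTP requests so that unchanged feeds are not re-downloaded. This is only a rough sketch: it assumes the hosting server returns ETag or Last-Modified headers and honors the matching request headers, which not every producer's server does (hashing the downloaded file is a fallback).

import urllib.error
import urllib.request

def fetch_if_changed(url, etag=None, last_modified=None):
    """Download a GTFS zip only if the producer's copy has changed since the
    last poll, using standard HTTP caching headers."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    if last_modified:
        request.add_header("If-Modified-Since", last_modified)
    try:
        with urllib.request.urlopen(request) as response:
            data = response.read()
            return data, response.headers.get("ETag"), response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: the copy we already have is current
            return None, etag, last_modified
        raise

# Store the returned ETag/Last-Modified values and pass them back on the next
# poll; a None payload means the feed has not changed since the last download.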

At this time, creating a mandate for URLs to be reported is outside of our control. And even if there were a mandate, some transit agencies might forget to provide their most up-to-date URL when they change vendors. Or they may only report it once a year, as required, creating a potentially large gap between when they change their URLs and when the change is reported. On top of that, there may be an additional gap between when the data is reported and when the agency it is reported to makes those URLs available to other organizations such as ours.

Given all of this, I still recommend creating this best practice, both to aid feed aggregators and entities producing a national access point, and for direct data consumers as well.

