Comments (10)
Hi @emmambd and @isabelle-dr! Thanks for moving this all forward! We (Trillium/Optibus) discussed this internally in more depth. First, since I led the GTFS Best Practices working group, I thought it might be helpful to provide a bit of extra institutional memory around the Best Practices.
The purpose of the Best Practices was to align industry interpretation around the Spec. As GTFS gained broader use, we encountered situations where trip planners, CAD/AVL vendors, and others had different expectations and interpretations of the spec. This created problems for everyone: data consumers, GTFS producers (vendors and agencies), and transit riders. So we assembled some prominent GTFS consumers and producers to agree on Best Practices.
It was always the vision that some of these Best Practices would make their way into the Spec reference and be subject to the larger governance process.
In the early days of MobilityData, we also discussed transitioning some of the Best Practices to a “how to” guide. So, BP that define what correct GTFS is would go into the reference. BP that define how to use the spec and provide examples would go to this “How To Guide”.
Trillium (@trilliumtransit) & Optibus (@Optibus) support the goal of aligned expectations/specs across data consumers. Misalignment creates issues for our business, other data producers, data consumers, and transit passengers. We support a process that moves obvious and well-supported Best Practices into the Reference document, discards Best Practices that have gone out of date or disagree with the reference, and also revisits more complex Best Practices in discussion with the community, to see if they should be reformulated or moved to a different document.
from transit.
Thanks for providing that critical context @antrim! We definitely want to start this process by focusing on increasing clarity and removing duplication. Based on all this feedback, I'm thinking that we alter the proposed scope for this first iteration in #375 to cover:
- Harmonizing/merging best practices that conflict with the field requirement level severity in the spec — meaning that the requirement level in the spec should be updated to "Recommended" where it's currently Optional (or sometimes not in the spec at all).
- Merging any individual best practices from the Dataset Publishing section into the spec that we know are widely used. The main one we've heard about from the community so far is adding:
  > At any time, the published GTFS dataset should be valid for at least the next 7 days.
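The 7-day rule above can be checked mechanically. Below is a minimal sketch, assuming service windows come from calendar.txt only; a real check would also account for calendar_dates.txt and feed_info.feed_end_date. The function names are my own, not part of any validator.

```python
import csv
import io
from datetime import date, timedelta

def feed_valid_through(calendar_csv):
    """Return the latest service end date found in calendar.txt content."""
    rows = csv.DictReader(io.StringIO(calendar_csv))
    # end_date uses YYYYMMDD, so lexicographic max equals chronological max.
    latest = max(row["end_date"] for row in rows)
    return date(int(latest[:4]), int(latest[4:6]), int(latest[6:8]))

def meets_seven_day_rule(calendar_csv, today=None):
    """True if the feed remains valid for at least the next 7 days."""
    today = today or date.today()
    return feed_valid_through(calendar_csv) >= today + timedelta(days=7)

sample = "service_id,start_date,end_date\nWK,20240101,20241231\n"
print(meets_seven_day_rule(sample, today=date(2024, 6, 1)))  # True
```

A feed whose last service day is within a week of "today" would fail this check and, per the best practice, should be refreshed by the producer.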
I started an audit of best practices that fit this scope and suggested improvements that the community is welcome to give feedback on. We'll be talking about it internally and planning to move forward with a PR based on it in the next few weeks.
I think the community would definitely benefit from increased visibility of the Best Practices. Including them in the transit/gtfs GH repo seems to me the most obvious way to do that; in fact, I can't seem to find them referenced anywhere at all there. However, I am not sure embedding them in the reference.md file is the right approach, if that is what is being suggested. I worry about making that page more bloated and inaccessible with such a significant addition of text (long term; I understand the scope of this proposal is just the Dataset Publishing & General Practices section, but that would presumably be just the first wave of merging all Best Practices).
Another consideration: what value do we get from having essentially four tiers of requirement: Required, Conditionally Required, Should/Recommended, and Optional? At this point, why not instead make the Shoulds/Recommendeds into Conditionally Required (e.g., "should include agency phone if one is available" becomes Conditionally Required if such a number is available)?
We should think globally and consider how these kinds of changes may increase the barrier to entry for producers.
Are there other ways of increasing visibility of Best Practices in this space? Could the Best Practices live in a dedicated .md page? Can we actually reference the Best Practices in the reference.md page (e.g., “see Best Practice on X component here”)?
Thanks for providing this feedback @westontrillium! I definitely agree we should think about this change globally, and I appreciate the prompts to consider different solutions.
However, I am not sure embedding them in the reference.md file is the right approach, if that is what is being suggested. I worry about making that page more bloated and inaccessible with such a significant addition of text
We want the spec to be accessible, but the spec is also expanding more and more over time (e.g., fares adoption, the community's pending work on flex). The spec is going to get longer regardless of where the best practices live. Because of this, I think we need to separate the spec itself (reference.md) from how the spec is rendered. We can find more accessible ways to render the spec on https://gtfs.org while still allowing the spec to expand.
One solution MobilityData is exploring to render the spec in an easier-to-use way:
Defining components of GTFS (e.g., core GTFS for required files, pathways, translations) so documentation readers can more easily find the use cases they care about and how to model them in GTFS.
One result of this could be a dynamic interface on https://gtfs.org/ where I can choose what use cases I want to represent (e.g., I want GTFS basic requirements, text-to-speech, and pathways), and then everything irrelevant to me is filtered out.
Working on improving the rendering would be out of scope for #375, but MobilityData would expect to work on it in parallel to adding the best practices to the spec so we can improve the spec’s accessibility.
There’s also some usability issues with keeping the best practices and spec separate. For example:
I want to create a GTFS dataset. Let's start with agency.txt.
1. I go to https://gtfs.org/schedule/reference/#agencytxt and read the agency_id description:
   > Conditionally Required:
   > - Required when the dataset contains data for multiple transit agencies.
2. Cool, I only have one agency, so I am not including this. Moving on.
3. I notice that the Best Practices exist.
4. I go to https://gtfs.org/schedule/best-practices/#agencytxt and read, in the agency_id description:
   > Should be included, even if there is only one agency in the feed.
5. Ugh, I guess I should change that!
Since there are so many per-file best practices, this flow seems unintuitive, even for linking to best practices in the spec. Ideally the vision is that we’d have one source of truth (reference.md) that we can render in different ways for user types that want to see a simpler version of the spec.
One thought regarding the requirement tiers:
What value do we get from having essentially four tiers of requirement–Required, Conditionally Required, Should/Recommended, and Optional? At this point, why not instead move to make Shoulds/Recommended into Conditionally Required (e.g., “should include agency phone if one available” becomes Conditionally Required if such a number is available)?
My understanding is that Must/Required and Conditionally Required are essentially one tier; it relates to data validity, and violations trigger ERRORS in the Canonical GTFS Schedule Validator.
There are two additional tiers that relate to data quality: Should/Recommended and Optional:
- The Should/Recommended tier deals with minor issues (e.g., route_desc should not be a duplicate of route_short_name) and triggers WARNINGS.
- The Optional tier relates to completeness (e.g., adding wheelchair accessibility fields makes the data more complete), and it's also used for fields that are not always applicable.
I recognize this is a generalization and there are some exceptions.
Migrating the Should to Conditionally Required would have significant implications for existing datasets and we should probably discuss this in a separate issue.
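The tiers described above map naturally onto validator notice severities. Here is an illustrative mapping, my own sketch based on this discussion rather than the Canonical Validator's actual API:

```python
# Assumed mapping from spec presence tiers to validator notice severities.
# "INFO" for Optional is an illustration; in practice, many Optional fields
# produce no notice at all.
PRESENCE_SEVERITY = {
    "Required": "ERROR",                # data validity
    "Conditionally Required": "ERROR",  # when the condition applies
    "Recommended": "WARNING",           # data quality, minor issues
    "Optional": "INFO",                 # completeness
}

def severity_for(presence):
    """Look up the notice severity for a given presence tier (None if unknown)."""
    return PRESENCE_SEVERITY.get(presence)
```

Under this framing, moving a best practice into the spec as "Recommended" keeps its violation at WARNING severity rather than escalating it to an ERROR.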
@emmambd I think your example is the reason why we ought to be thinking about why the shoulds aren't considered required or conditional, and why simply adding them to the spec itself (instead of keeping them siloed in Best Practices) creates more confusion.
In the case of wanting to build agency.txt, the spec is explicit that agency_id is optional for single agency datasets. So I don’t understand why we want a producer to then experience “oh, I am supposed to use this.” Effectively, that means the spec’s optional descriptor is meaningless if the shoulds override it.
@isabelle-dr I definitely agree this is outside this discussion, but I think this is why it’s worth considering this question before simply adding more confusing elements to the spec. Again, referencing agency_id for a single agency dataset, 4.0 validator does not flag this as a warning, so it does not fall into the realm of minor issues and would not warrant a should.
I imagine there are other scenarios besides agency_id where the breakdown of R/CR, Shoulds (to avoid warnings), and Optional (for data completeness) falls apart (oh hey, blocks). I think having this kind of clear, unambiguous routing of requirements makes sense and ought to be pursued instead of merging best practices into the spec itself. Doing so will likely surface other examples where we can clean up wording and better define requirements to create a holistic, accessible, navigable spec that works in tandem with the validator (there's also a huge documentation gap between the spec and the validator we can address).
@evantrillium Thanks for expanding on the agency_id example and clarifying why retaining the shoulds would be confusing in this instance. You're right that this is an example where it's important to change the severity from Conditionally Required to Required, since it 1) significantly improves the readability of the spec and 2) wouldn't cause backwards compatibility issues.
I think having this kind of clear, unambiguous routing of requirements makes sense and ought to be pursued instead of merging best practices into the spec itself.
I'm confused about why we wouldn't want to pursue both, since merging them increases visibility and clarifying the requirement level increases readability. Doing both seems to help improve data quality (though I agree reviewing requirement levels is a must for merging the per-file requirements). The next iteration of this work after #375 could focus on discrepancies between requirement tiers across the best practices and the spec.
Are there any examples of where having best practices separated from the spec reduces confusion for producers? Or perhaps where we should reconsider the requirement level of any best practices within the scope of Dataset Publishing and General Practices?
After looking into some other examples (like the timepoint best practice in stop_times.txt or the block_id best practice in frequencies.txt), we're envisioning that part of this merging work would be to add a Recommended requirement level to the presence options in cases where modifying the best practice to be Conditionally Required or Required is not possible or easy to understand.
This would look like the following for the agency_id case:
> agency_id (Conditionally Required): Identifies a transit brand which […]. Should be included, even if there is only one agency in the feed.
> Conditionally Required:
> - Required when the dataset contains data for multiple transit agencies.
> - Recommended otherwise.
Regardless, we agree this kind of harmonization work between the best practices and the spec is key to improving readability; merging best practices into the spec without it would just increase confusion. If there are places where this is relevant to #375, we would include the Recommended requirement level now. If not, it would wait until the next iteration of this work.
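As a sketch of how a validator might implement that combined presence, here is a hypothetical check (my own function, not the Canonical Validator's code) that emits an ERROR for the Required branch and a WARNING for the Recommended branch:

```python
def agency_id_notices(agencies):
    """agencies: list of dicts parsed from agency.txt rows.

    Proposed rule from this discussion: agency_id is Required (ERROR when
    missing) in multi-agency feeds, and Recommended (WARNING when missing)
    in single-agency feeds.
    """
    missing = [a for a in agencies if not a.get("agency_id")]
    if not missing:
        return []
    if len(agencies) > 1:
        return [("ERROR", "agency_id is required when multiple agencies are present")]
    return [("WARNING", "agency_id is recommended even for a single-agency feed")]
```

A single check like this is only expressible once the Recommended branch lives in the spec; with the best practice in a separate document, the WARNING half of the rule has no normative source.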
From @evantrillium
In the case of wanting to build agency.txt, the spec is explicit that agency_id is optional for single agency datasets. So I don’t understand why we want a producer to then experience “oh, I am supposed to use this.” Effectively, that means the spec’s optional descriptor is meaningless if the shoulds override it.
I'd argue that having the shoulds in a completely separate place is even more confusing.
The user experience is essentially:
- Ah, I don't need to build agency.txt, so I'm going to leave it out of my software implementation
- Client is upset that agency.txt is not there b/c its use is widespread... points them to the best practices documentation
- Ohhh, there is a whole other set of suggestions that I should have been doing all along?
I'd argue that replacing a MAY with a SHOULD (ideally in sync with adding a warning to the validator) is backwards compatible because it does not produce an ERROR and the feed would still be considered valid GTFS.
Put another way - you shouldn't get a warning in the validator for something that isn't a SHOULD in the GTFS spec.
Hi @e-lo
We are in alignment on a consolidated reference. My example was specifically about agency_id (not the entirety of agency.txt) being an Optional element for single-agency feeds but a Recommended element for all feeds per the BPs. @emmambd's recommendations in #376 are a great step toward bringing the spec, the BPs, and the validator into tighter sync, which Trillium fully supports.