Comments (10)
Hi @emmambd and @isabelle-dr! Thanks for moving this all forward! We (Trillium/Optibus) discussed this internally in more depth. First, since I led the GTFS Best Practices working group, I thought it might be helpful to provide a bit of extra institutional memory around the Best Practices.
The purpose of the Best Practices was to align industry interpretation around the Spec. As GTFS gained broader use, we encountered situations where trip planners, CAD/AVL vendors, and others had different expectations and interpretations of the spec. This created problems for everyone: data consumers, GTFS producers (vendors and agencies), and transit riders. So we assembled some prominent GTFS consumers and producers to agree on Best Practices.
It was always the vision that some of these Best Practices would make their way into the Spec reference and be subject to the larger governance process.
In the early days of MobilityData, we also discussed transitioning some of the Best Practices to a “how to” guide. So, BP that define what correct GTFS is would go into the reference. BP that define how to use the spec and provide examples would go to this “How To Guide”.
Trillium (@trilliumtransit) & Optibus (@Optibus) support the goal of aligned expectations/specs across data consumers. Misalignment creates issues for our business, other data producers, data consumers, and transit passengers. We support a process that moves obvious and well-supported Best Practices into the Reference document, discards Best Practices that have gone out of date or disagree with the reference, and also revisits more complex Best Practices in discussion with the community, to see if they should be reformulated or moved to a different document.
from transit.
Thanks for providing that critical context @antrim! We definitely want to start this process by focusing on increasing clarity and removing duplication. Based on all this feedback, I'm thinking that we alter the proposed scope for this first iteration in #375 to cover:
- Harmonizing/merging best practices that conflict with the field requirement level severity in the spec — meaning that the requirement level in the spec should be updated to "Recommended" where it's currently Optional (or sometimes not in the spec at all).
- Merging any individual best practices from the Dataset Publishing section into the spec that we know are widely used. The main one we've heard about from the community so far is adding:
  > At any time, the published GTFS dataset should be valid for at least the next 7 days.
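The 7-day rule above can be checked mechanically. Below is a minimal sketch, assuming service windows come from calendar.txt only; a real check would also account for calendar_dates.txt and feed_info.feed_end_date. The function names are my own, not part of any validator.

```python
import csv
import io
from datetime import date, timedelta

def feed_valid_through(calendar_csv):
    """Return the latest service end date found in calendar.txt content."""
    rows = csv.DictReader(io.StringIO(calendar_csv))
    # end_date uses YYYYMMDD, so lexicographic max equals chronological max.
    latest = max(row["end_date"] for row in rows)
    return date(int(latest[:4]), int(latest[4:6]), int(latest[6:8]))

def meets_seven_day_rule(calendar_csv, today=None):
    """True if the feed remains valid for at least the next 7 days."""
    today = today or date.today()
    return feed_valid_through(calendar_csv) >= today + timedelta(days=7)

sample = "service_id,start_date,end_date\nWK,20240101,20241231\n"
print(meets_seven_day_rule(sample, today=date(2024, 6, 1)))  # True
```

A feed whose last service day is within a week of "today" would fail this check and, per the best practice, should be refreshed by the producer.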
I started an audit of best practices that fit this scope and suggested improvements that the community is welcome to give feedback on. We'll be talking about it internally and planning to move forward with a PR based on it in the next few weeks.
I think the community would definitely benefit from increased visibility of the Best Practices. Including them in the transit/gtfs GH repo seems to me the most obvious way to do that; in fact, I can't seem to find them referenced anywhere at all there. However, I am not sure embedding them in the reference.md file is the right approach, if that is what is being suggested. I worry about making that page more bloated and inaccessible with such a significant addition of text (long term; I understand the scope of this proposal is just the Dataset Publishing & General Practices section, but that would presumably be just the first wave of merging all Best Practices).
Another consideration: what value do we get from having essentially four tiers of requirement: Required, Conditionally Required, Should/Recommended, and Optional? At this point, why not instead make the Shoulds/Recommendeds into Conditionally Required (e.g., "should include agency phone if one is available" becomes Conditionally Required if such a number is available)?
We should think globally and consider how these kinds of changes may increase the barrier to entry for producers.
Are there other ways of increasing visibility of Best Practices in this space? Could the Best Practices live in a dedicated .md page? Can we actually reference the Best Practices in the reference.md page (e.g., “see Best Practice on X component here”)?
Thanks for providing this feedback @westontrillium! I definitely agree we should think about this change globally, and I appreciate the prompts to consider different solutions.
However, I am not sure embedding them in the reference.md file is the right approach, if that is what is being suggested. I worry about making that page more bloated and inaccessible with such a significant addition of text
We want the spec to be accessible, but the spec is also expanding more and more over time (e.g., fares adoption, the community's pending work on flex). The spec is going to get longer regardless of where the best practices live. Because of this, I think we need to separate the spec itself (reference.md) from how the spec is rendered. We can find more accessible ways to render the spec on https://gtfs.org while still allowing the spec to expand.
One solution MobilityData is exploring to render the spec in an easier-to-use way:
Defining components of GTFS (e.g., core GTFS for required files, pathways, translations) so documentation readers can more easily find the use cases they care about and how to model them in GTFS.
One result of this could be a dynamic interface on https://gtfs.org/ where I can choose what use cases I want to represent (e.g., I want GTFS basic requirements, text-to-speech, and pathways), and then everything irrelevant to me is filtered out.
Working on improving the rendering would be out of scope for #375, but MobilityData would expect to work on it in parallel to adding the best practices to the spec so we can improve the spec’s accessibility.
There’s also some usability issues with keeping the best practices and spec separate. For example:
I want to create a GTFS dataset. Let's start with agency.txt.
1. I go to https://gtfs.org/schedule/reference/#agencytxt and read the agency_id description:
   > Conditionally Required:
   > - Required when the dataset contains data for multiple transit agencies.
2. Cool, I only have one agency, so I am not including this. Moving on.
3. I notice that the Best Practices exist.
4. I go to https://gtfs.org/schedule/best-practices/#agencytxt and read, in the agency_id description:
   > Should be included, even if there is only one agency in the feed.
5. Ugh, I guess I should change that!
Since there are so many per-file best practices, this flow seems unintuitive, even for linking to best practices in the spec. Ideally the vision is that we’d have one source of truth (reference.md) that we can render in different ways for user types that want to see a simpler version of the spec.
One thought regarding the requirement tiers:
What value do we get from having essentially four tiers of requirement–Required, Conditionally Required, Should/Recommended, and Optional? At this point, why not instead move to make Shoulds/Recommended into Conditionally Required (e.g., “should include agency phone if one available” becomes Conditionally Required if such a number is available)?
My understanding is that Must/Required and Conditionally Required are essentially one tier; it relates to data validity, and violations trigger ERRORS in the Canonical GTFS Schedule Validator.
There are two additional tiers that relate to data quality: Should/Recommended and Optional:
- The Should/Recommended tier deals with minor issues (e.g., route_desc should not be a duplicate of route_short_name) and triggers WARNINGS.
- The Optional tier relates to completeness (e.g., adding wheelchair accessibility fields makes the data more complete), and it's also used for fields that are not always applicable.
I recognize this is a generalization and there are some exceptions.
Migrating the Should to Conditionally Required would have significant implications for existing datasets and we should probably discuss this in a separate issue.
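The tiers described above map naturally onto validator notice severities. Here is an illustrative mapping, my own sketch based on this discussion rather than the Canonical Validator's actual API:

```python
# Assumed mapping from spec presence tiers to validator notice severities.
# "INFO" for Optional is an illustration; in practice, many Optional fields
# produce no notice at all.
PRESENCE_SEVERITY = {
    "Required": "ERROR",                # data validity
    "Conditionally Required": "ERROR",  # when the condition applies
    "Recommended": "WARNING",           # data quality, minor issues
    "Optional": "INFO",                 # completeness
}

def severity_for(presence):
    """Look up the notice severity for a given presence tier (None if unknown)."""
    return PRESENCE_SEVERITY.get(presence)
```

Under this framing, moving a best practice into the spec as "Recommended" keeps its violation at WARNING severity rather than escalating it to an ERROR.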
@emmambd I think your example is the reason why we ought to be thinking about why the shoulds aren't considered required or conditional, and why simply adding them to the spec itself (instead of keeping them siloed in Best Practices) creates more confusion.
In the case of wanting to build agency.txt, the spec is explicit that agency_id is optional for single agency datasets. So I don’t understand why we want a producer to then experience “oh, I am supposed to use this.” Effectively, that means the spec’s optional descriptor is meaningless if the shoulds override it.
@isabelle-dr I definitely agree this is outside this discussion, but I think this is why it’s worth considering this question before simply adding more confusing elements to the spec. Again, referencing agency_id for a single agency dataset, 4.0 validator does not flag this as a warning, so it does not fall into the realm of minor issues and would not warrant a should.
I imagine there are other scenarios besides agency_id where the breakdown of R/CR, Shoulds (to avoid warnings), and Optional (for data completeness) falls apart (oh hey, blocks). I think having this kind of clear, unambiguous routing of requirements makes sense and ought to be pursued instead of merging best practices into the spec itself. Doing so will likely surface other examples where we can clean up wording and better define requirements to create a holistic, accessible, navigable spec that works in tandem with the validator (there's also a huge documentation gap between the spec and the validator we can address).
@evantrillium Thanks for expanding on the agency_id example and clarifying why retaining the shoulds would be confusing in this instance. You're right that this is an example where it's important to change the severity from Conditionally Required to Required, since it 1) significantly improves the readability of the spec and 2) wouldn't cause backwards compatibility issues.
I think having this kind of clear, unambiguous routing of requirements makes sense and ought to be pursued instead of merging best practices into the spec itself.
I'm confused about why we wouldn't want to pursue both, since merging them increases visibility and clarifying the requirement level increases readability. Doing both seems to help improve data quality (though I agree reviewing requirement levels is a must for merging the per-file requirements). The next iteration of this work after #375 could focus on discrepancies between requirement tiers across the best practices and the spec.
Are there any examples of where having best practices separated from the spec reduces confusion for producers? Or perhaps where we should reconsider the requirement level of any best practices within the scope of Dataset Publishing and General Practices?
After looking into some other examples (like the timepoint best practice in stop_times.txt or the block_id best practice in frequencies.txt), we're envisioning that part of this merging work would be to add a Recommended requirement level to the presence options in cases where modifying the best practice to be Conditionally Required or Required is not possible or easy to understand.
This would look like the following for the agency_id case:
> agency_id (Conditionally Required): Identifies a transit brand which […]. Should be included, even if there is only one agency in the feed.
> Conditionally Required:
> - Required when the dataset contains data for multiple transit agencies.
> - Recommended otherwise.
Regardless, we agree this kind of harmonization work between the best practices and the spec is key to improving readability; merging best practices into the spec without it would just increase confusion. If there are places where this is relevant to #375, we would include the Recommended requirement level now. If not, it would wait until the next iteration of this work.
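As a sketch of how a validator might implement that combined presence, here is a hypothetical check (my own function, not the Canonical Validator's code) that emits an ERROR for the Required branch and a WARNING for the Recommended branch:

```python
def agency_id_notices(agencies):
    """agencies: list of dicts parsed from agency.txt rows.

    Proposed rule from this discussion: agency_id is Required (ERROR when
    missing) in multi-agency feeds, and Recommended (WARNING when missing)
    in single-agency feeds.
    """
    missing = [a for a in agencies if not a.get("agency_id")]
    if not missing:
        return []
    if len(agencies) > 1:
        return [("ERROR", "agency_id is required when multiple agencies are present")]
    return [("WARNING", "agency_id is recommended even for a single-agency feed")]
```

A single check like this is only expressible once the Recommended branch lives in the spec; with the best practice in a separate document, the WARNING half of the rule has no normative source.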
From @evantrillium
In the case of wanting to build agency.txt, the spec is explicit that agency_id is optional for single agency datasets. So I don’t understand why we want a producer to then experience “oh, I am supposed to use this.” Effectively, that means the spec’s optional descriptor is meaningless if the shoulds override it.
I'd argue that having the shoulds in a completely separate place is even more confusing.
The user experience is essentially:
- Ah, I don't need to build agency.txt, so I'm going to leave it out of my software implementation
- Client is upset that agency.txt is not there b/c its use is widespread... points them to the best practices documentation
- Ohhh, there is a whole other set of suggestions that I should have been doing all along?
I'd argue that replacing a MAY with a SHOULD (ideally in sync with adding a warning to the validator) is backwards compatible because it does not produce an ERROR and the feed would still be considered valid GTFS.
Put another way - you shouldn't get a warning in the validator for something that isn't a SHOULD in the GTFS spec.
Hi @e-lo
We are in alignment on a consolidated reference. My example was specifically about agency_id (not the entirety of agency.txt) being an Optional element for single-agency feeds but a Recommended element for all feeds per the BPs. @emmambd's recommendations in #376 are a great step toward bringing the spec, the BPs, and the validator into tighter sync, which Trillium fully supports.