Comments (17)

LeoFrachet avatar LeoFrachet commented on May 13, 2024

Pinging:

from transit.

mgilligan avatar mgilligan commented on May 13, 2024

+1 for option 1

barbeau avatar barbeau commented on May 13, 2024

+1 for option 1

gcamp avatar gcamp commented on May 13, 2024

I would prefer option 2, since this would reduce the complexity of fallbacks that don't seem necessary.

skinkie avatar skinkie commented on May 13, 2024

I am missing another argument in the discussion. https://developers.google.com/transit/gtfs/reference/gtfs-extensions#translationstxt is also using BCP 47. I would also go for option 1.

But I would also want to know what the rationale was for adding feed_lang, especially if multiple agencies could have different languages within a single feed. Belgium is a very practical example if a national feed would exist ;-)

LeoFrachet avatar LeoFrachet commented on May 13, 2024

What was the rationale of adding feed_lang? (asked by @skinkie)

According to the original draft (by Joe Hughes) of feed_info.txt, the field feed_lang was meant to override any agency_lang values in agency.txt.

feed_lang (required):
The feed_lang field contains a two-letter ISO 639-1 code for the default language used for the text in this feed. This setting helps GTFS consumers choose capitalization rules and other language-specific settings. Please refer to http://www.loc.gov/standards/iso639-2/php/code_list.php for a list of valid values. This value overrides any agency_lang values in agency.txt

(Source: https://groups.google.com/d/msg/gtfs-changes/muxms00iBos/vXOGJZSOU9oJ)
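To make the override rule in the draft concrete, here is a minimal sketch of how a consumer might resolve the language for an agency's text. The function name `effective_lang` and the choice to treat empty strings as missing are assumptions for illustration; they are not part of the draft or of any GTFS library.

```python
from typing import Optional


def effective_lang(feed_lang: Optional[str], agency_lang: Optional[str]) -> Optional[str]:
    """Pick the language code to use for an agency's text.

    Follows the draft wording quoted above: feed_lang, when present,
    overrides agency_lang. Treating blank values as missing is an
    assumption of this sketch.
    """
    for candidate in (feed_lang, agency_lang):
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # neither field is set
```

With this rule, `effective_lang("fr", "nl")` yields `"fr"`, and agency_lang only matters when feed_info.txt gives no feed_lang.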

The logic behind such a requirement might be explained in the following message from Joe Hughes, on the same thread, explaining that agency_lang could be deprecated (which is Option 2):

[...] As I recall, the purpose of the feed_timezone (and feed_lang) is to deprecate agency_timezone and agency_lang, as those are holdovers from the early days when only one agency was allowed per feed. Once multiple agencies were allowed, the semantics of those fields became blurred.

That said, we should consider whether there are any meaningful cases for having two agencies in the same feed with differing timezones. Are there any clients out there that handle the case of differing agency_timezones in a single feed, and if so, how do you interpret & handle it?

(Source: https://groups.google.com/d/msg/gtfs-changes/muxms00iBos/TrCb9YYteCMJ)

Brian Ferris answered that question by saying that he could not find such a feed in the "Google Transit corpus" (and even suggested that feed_lang might in fact be useless):

[...] That is, I haven't been able to find a GTFS feed with two agencies and more than one language defined in the Google Transit corpus.  As such, we could pretty safely constrain the spec to say that 'agency_lang' must be the same for all agencies if multiple agencies are defined in the same feed.  And finally, we'd come to the same conclusion that 'feed_lang' isn't really buying us all that much and should probably be removed.

(Source: https://groups.google.com/d/msg/gtfs-changes/Sh0e4o9o2Gw/z91_sSyHXFsJ)

My conclusion is that agency.txt was the old metadata file. When multiple agencies were allowed in it, it lost the ability to carry feed-level metadata, so feed_info.txt was added. agency_lang therefore seems to be a deprecated field without real value, existing only for historical reasons (hence option 2).

Multiple agencies with different languages within a single feed.

@skinkie, you asked:

Especially if multiple agencies could have different languages within a single feed. Belgium is a very practical example if a national feed would exist ;-)

If you want to use agency_lang to describe a multilingual transit network in a bilingual country (FWIW I'm living in Canada which is bilingual), you need every agency to be fully unilingual, which means:

  • no area of the country is bilingual (or this area has no transit service).
  • no trip spans a language border.

... Which is obviously not the case in Canada, Belgium and Switzerland, which have bilingual areas (e.g. respectively New Brunswick, Brussels and Bern) and where country-wide train services span the whole country and need to serve railway stations under the different names of each city.

Therefore, IMHO, agency_lang is not the best way to describe a multilingual network (and is therefore useless). We need a way to translate fields, and the private Google extension you mentioned is one example of it (https://developers.google.com/transit/gtfs/reference/gtfs-extensions#translationstxt).

But let's just focus on agency_lang vs feed_lang for today, and we'll define an official translation extension for GTFS in the future.

If the majority goes to option 1, I'll open a proposal with this option.

abyrd avatar abyrd commented on May 13, 2024

Thanks for the mailing list archaeology @LeoFrachet. Considering this past conversation, maybe we should seriously consider the option of deprecating all language fields. I don't see any entity in GTFS (including the Agency or an entire feed) that corresponds to a linguistic border. In many places there is no one-to-one mapping between contiguous geographic areas, organizations, and languages.

One way I can see feed_lang being useful is if multiple identical feeds were published, with only the stop and route names changed. That seems like a lot of redundant, potentially mismatched data though just to swap in different stop names.

I think this deserves further discussion, but I am +1 on option 2: deprecating the agency_lang field and keeping the BCP-47 feed_lang field until proper translation files are added.

skinkie avatar skinkie commented on May 13, 2024

I still think this becomes a very bad idea. There can be multiple agencies in the same feed with different languages, because their target groups differ. You want to differentiate between these agencies. Removing agency_lang would not allow that behavior, and you would not be able to publish a feed that has French and Dutch, but no generic translation.

In addition, agency_lang might also describe the way an agency communicates (phone numbers, e-mail addresses, etc.). This would not be uniform either.

LeoFrachet avatar LeoFrachet commented on May 13, 2024

Ok, so if I try to sum up the state of the conversation:

  • Option 1 has the consensus: Nobody opposed extending agency_lang to BCP-47.
  • Option 2 has 3 in favor and 1 against: Some people would go the extra mile and remove agency_lang once and for all (@gcamp, @abyrd and myself), but some oppose it (@skinkie).
  • An option 3 has emerged: Some people even suggest removing both fields (@abyrd).

=> I would suggest moving forward with the simplification (aka option 1), a simple change that can move forward easily and brings a short-term win (simplification for validators).

=> I would suggest waiting for a broader conversation on translations before changing the structure further. Both fields (agency_lang & feed_lang) are optional anyway.

Therefore I'm gonna open a pull request with option 1, and try to draft something related to translations.
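As a rough illustration of the validator simplification behind option 1: if both agency_lang and feed_lang accept BCP 47 tags, one check can serve both fields. The sketch below only recognises a simplified subset of BCP 47 (language, optional script, optional region); it is an assumption made for illustration, not the full grammar, and it does not verify that the subtags are registered. The function name is hypothetical.

```python
import re

# Simplified subset of BCP 47: language[-Script][-REGION].
# Variants, extensions, private-use and grandfathered tags are NOT covered.
_LANG_TAG_SUBSET = re.compile(
    r"[a-z]{2,3}"            # primary language subtag, e.g. "fr"
    r"(-[a-z]{4})?"          # optional script subtag, e.g. "Hant"
    r"(-([a-z]{2}|[0-9]{3}))?",  # optional region subtag, e.g. "CA" or "419"
    re.IGNORECASE,
)


def looks_like_lang_tag(value: str) -> bool:
    """Shape check shared by agency_lang and feed_lang (hypothetical helper)."""
    return _LANG_TAG_SUBSET.fullmatch(value) is not None
```

A plain two-letter code such as `fr` still passes, so feeds written against the older two-letter wording would remain valid under this check.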

Edit:

  • Here is the link to the pull request about simplification of definitions: #98
  • Here is the link to the issue about translations and localization: #97

abyrd avatar abyrd commented on May 13, 2024

Thanks for summarizing @LeoFrachet. I am not necessarily in favor of removing both fields. But we should seriously ask what the goal is of including language information in the feed, whether it is reliably provided, whether it is ever actually used for anything, and whether there is even any entity in the feed that is "naturally" monolingual.

@skinkie is it a common/recurring thing for each agency to have a single language in polyglot areas? It seems to me just a coincidence that in some places, linguistic borders might follow agencies.

skinkie avatar skinkie commented on May 13, 2024

@abyrd a common case would be a feed that includes a border region, for example The Netherlands / Germany. So integrated feeds have this problem, as opposed to single-operator feeds. And why not single-operator feeds? Because they would not be able to model transfers between agencies.

abyrd avatar abyrd commented on May 13, 2024

So I see that such situations exist in cross-border (merged) feeds where different agencies are imported from different sides of the border. But consider these points:

  1. agency_lang covers your merged cross-border case, but does not cover any other case, e.g. Switzerland where the agency is omnilingual (SBB CFF FFS) but signs and information will be in different languages depending on the city or the individual using the data. It seems like no combination of existing or future X_lang fields is going to cover all these cases because in the end it's the passengers who want/need different languages, not the agencies or trips or other entities.

  2. How is this information supposed to be used by the consumers of the feed? Or put differently, why is this information included in the feed? Does the fact that NS trains are run by a "Dutch speaking" agency really tell us anything about how to communicate to a French-speaking Belgian mobile app user?

skinkie avatar skinkie commented on May 13, 2024

@abyrd

  1. there is translations.txt?
  2. isn't this about how the agency communicates names of cities, or what number to call for a specific language support?

abyrd avatar abyrd commented on May 13, 2024

@skinkie

  1. Yes, translations.txt has been proposed as a solution to an actual identifiable problem, of providing names and URLs for multiple different languages. I'm not sure what problem or need the x_lang fields are trying to address.
  2. Maybe, I don't know. The source of my questions is not knowing what these fields are for. Based on the email exchanges @LeoFrachet unearthed from as far back as 7 years ago, the people who created these fields were questioning whether they were needed, which is what led me to ask the same question.

gcamp avatar gcamp commented on May 13, 2024

Can't speak to the advantage of having a lang per agency, but having the lang for a feed is definitely useful for us as consumers. If we want to add/change some translations automatically, we need to know the language of the strings before translation.

abyrd avatar abyrd commented on May 13, 2024

Thanks @gcamp for the feedback. I see the original intent of the field, which is similar to what you are describing. But do you actually do this in practice, i.e. would you really automatically translate strings found in a GTFS feed (which would include place names and other difficult to translate expressions) based on the declared language in this field of a non-required file?

It seems likely to me that the decision to translate fields of an entire feed would not be taken lightly and would in fact always be initiated manually by someone who knows the source of the feed and the source language.

Anyway I'm not going to push for removal of the field, it seems like potentially useful metadata to have in the feed. I was just trying to take seriously the comments from Brian in 2011 that this field was not very useful.

LeoFrachet avatar LeoFrachet commented on May 13, 2024

But do you actually do this in practice, i.e. would you really automatically translate strings found in a GTFS feed

@abyrd: The lang field is used for a few cleaning and improving processes:

  • Capitalisation: e.g. "LA UNION STATION" will be cased as "LA Union Station" in English, but as "La Union Station" in French (because "la" is a word in French, but an abbreviation in English)
  • Mode names: e.g. route_type 1 (subway/metro) will be displayed differently in New York or Paris:
    • In Paris, it will be "Métro" for the francophone users, but "Métro (Subway)" for the anglophone users
    • In NYC, it will be "Subway" for the anglophone users, but "Subway (Métro)" for the francophone users
    • => The logic is that an anglophone in Paris needs to know that the mode is called "Métro", because it is signed as such, but also needs to know what a "Métro" is, so the translation is provided in parentheses.
  • Validation: A simple way to validate the textual data in a GTFS is to check if the *_name fields of the GTFS contain only "allowed" characters, and to send a warning if there are unusual characters. A francophone dataset will emit a warning on ñ but not on ç, and a hispanophone dataset will do the opposite.

Those are three examples which are (AFAIK) currently running in production. They all rely on the language field.
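To show how a language code can drive these three steps, here is a minimal sketch. Every table and function name in it is hypothetical and made up for illustration; it is not the production code referred to above, and real pipelines would need far larger word lists and character sets.

```python
import unicodedata
from typing import Set

# All tables below are hypothetical illustrations, not real production data.
ABBREVIATIONS = {"en": {"LA"}}  # tokens kept upper-case in a given language
MODE_NAMES = {"fr": "Métro", "en": "Subway"}  # name of the metro mode per language
BASE_CHARS = set("abcdefghijklmnopqrstuvwxyz0123456789 -'./()&")
EXTRA_LETTERS = {
    "fr": set("àâæçéèêëîïôœùûüÿ"),
    "es": set("áéíóúüñ"),
}


def recase(name: str, lang: str) -> str:
    """Title-case an all-caps name, keeping known abbreviations of `lang`."""
    keep = ABBREVIATIONS.get(lang, set())
    return " ".join(w if w in keep else w.capitalize() for w in name.split())


def display_mode_name(feed_lang: str, user_lang: str) -> str:
    """Show the local mode name, with the user's own term in parentheses."""
    local = MODE_NAMES[feed_lang]
    if user_lang == feed_lang:
        return local
    return f"{local} ({MODE_NAMES[user_lang]})"


def unusual_characters(name: str, lang: str) -> Set[str]:
    """Return the characters of `name` that are unexpected for `lang`."""
    allowed = BASE_CHARS | EXTRA_LETTERS.get(lang, set())
    # Compose accents first so "n" + combining tilde is compared as "ñ".
    normalized = unicodedata.normalize("NFC", name)
    return {ch for ch in normalized if ch.lower() not in allowed}
```

For instance, `recase("LA UNION STATION", "fr")` gives "La Union Station", `display_mode_name("fr", "en")` gives "Métro (Subway)", and `unusual_characters("Español", "fr")` flags only ñ.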
