
gtfs2lc's People

Contributors

brechtvdv, dependabot[bot], derhuerst, greenkeeper[bot], j-steinbach, julianrojas87, pietercolpaert, rodklerc, sballieu, smazzoleni


gtfs2lc's Issues

Split in workers

To achieve a faster translation, we could split the connections.txt file into parts and launch multiple gtfs2lc.js workers, depending on the number of cores of the machine.

Platform numbers for SNCB/NMBS

We are starting to talk about using Itinero-transit commercially, but not having platform numbers is a bit of a deal breaker for that.

I know this is a bit of an issue, so I wanted to log this here and (re)start the effort to get the data.

An in-range update of fast-csv is breaking the build 🚨

The dependency fast-csv was updated from 3.2.0 to 3.3.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

fast-csv is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Release Notes for v3.3.0
  • [FIXED] First row of CSV is removed when headers array is provided #252
Commits

The new version differs by 2 commits.

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Frequencies

We need to process frequencies as well...

This means reading an extra file if it exists, and adding extra connections based on the connectionRules stream
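A minimal sketch of the expansion step, assuming start_time, end_time and headway_secs have already been parsed to seconds (the function name is hypothetical):

```javascript
// Expand one frequencies.txt rule into concrete trip start times.
// GTFS semantics: trips start every headway_secs from start_time
// until (roughly) end_time; extra connections would be derived from
// the connectionRules stream for each of these start times.
function expandFrequency(startSecs, endSecs, headwaySecs) {
  const starts = [];
  for (let t = startSecs; t < endSecs; t += headwaySecs) {
    starts.push(t);
  }
  return starts;
}
```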

Add a way to configure base URIs

Right now, example.org is used in every RDF output. This should be changed to something configurable

I suggest -b --baseUris <baseUri>: a mapping file with base URIs for RDF outputs
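For example, such a mapping file could look like this (all URIs below are illustrative):

```json
{
  "stop": "http://data.example.org/stops/{stop_id}",
  "route": "http://data.example.org/routes/{route_id}",
  "trip": "http://data.example.org/trips/{trip_id}",
  "connection": "http://data.example.org/connections/{trip_id}/{departure_time}"
}
```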

gtfs2lc in macOS

I'm trying to run gtfs2lc on macOS, but some problems appear:

  • When I run gtfs2lc-sort, it seems like the command removes all "r" characters from the files and changes their format; an example: https://github.com/dachafra/spaingtfs/tree/master/gtfs2lc/metro. The output doesn't say anything about an error:
    dchaves$ gtfs2lc-sort metro/
    Converting newlines dos2unix
    Removing UTF-8 artifacts in directory metro/
    Sorting files in directory metro/
  • After that, if I run gtfs2lc, the output is:
    dchaves$ gtfs2lc metro/
    GTFS to linked connections converter use --help to discover more functions

The same dataset in an Ubuntu dist works perfectly.

fails to convert

I tried to convert the 2021-02-12 VBB GTFS feed.

npm init --yes
npm i gtfs2lc -D
wget -r --no-parent --no-directories -P gtfs -N 'https://vbb-gtfs.jannisr.de/2021-02-12/'
# rename all .csv to .txt …
env NODE_ENV=production gtfs2lc gtfs -f jsonld | head -n 3
# GTFS to linked connections converter use --help to discover more functions
# Indexing of stops, services, routes and trips completed successfully!
# Created worker thread (PID 1)
# Created worker thread (PID 2)
# Created worker thread (PID 3)
# Created worker thread (PID 4)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_0.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_0.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_1.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_1.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_2.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_2.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)
# [Error: ENOENT: no such file or directory, open 'gtfs/connections_3.txt'] {
#   errno: -2,
#   code: 'ENOENT',
#   syscall: 'open',
#   path: 'gtfs/connections_3.txt'
# }
# Error: Worker stopped with exit code 1
#     at Worker.<anonymous> (/Users/j/playground/vbb-gtfs-lc/node_modules/gtfs2lc/lib/gtfs2connections.js:145:27)
#     at Worker.emit (node:events:378:20)
#     at Worker.[kOnExit] (node:internal/worker:260:10)
#     at Worker.<computed>.onexit (node:internal/worker:187:20)

It has also created 4 files inside gtfs:

ls -l gtfs
# -rw-r--r--@  1 j  staff       3537 Feb 22 01:10 agency.txt
# -rw-r--r--   1 j  staff      79382 Feb 22 01:14 calendar.txt
# -rw-r--r--   1 j  staff     859354 Feb 22 01:14 calendar_dates.txt
# -rw-r--r--@  1 j  staff         64 Feb 22 01:10 frequencies.txt
# -rw-r--r--@  1 j  staff        140 Feb 22 01:10 pathways.txt
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_0.json
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_1.json
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_2.json
# -rw-r--r--   1 j  staff          0 Feb 22 01:23 raw_3.json
# -rw-r--r--@  1 j  staff      48812 Feb 22 01:10 routes.txt
# -rw-r--r--@  1 j  staff  143590907 Feb 22 01:10 shapes.txt
# -rw-r--r--   1 j  staff  269753688 Feb 22 01:14 stop_times.txt
# -rw-r--r--@  1 j  staff    4723089 Feb 22 01:10 stops.txt
# -rw-r--r--@  1 j  staff    4200935 Feb 22 01:10 transfers.txt
# -rw-r--r--@  1 j  staff   14019736 Feb 22 01:10 trips.txt

Trying gtfs2lc with EMT data from Madrid

After installing the modules csv, level, n3, q and unzip with "npm install", I started the execution of gtfs2lc at 10:35 am: ./gtfs-csv2connections path/to/data/transitEMT.zip > path/to/data/emtConnections.ttl

Let me know if you want to access the EMT GTFS data.

After completing some steps, at 4:15 pm it crashed with the following message:
Draining Agencies
Transforming Calendar
Transforming CalendarDates
Transforming Frequencies
Transforming Routes
Draining Shapes and Shape Segments
Draining Stops
Transforming Stop Times
Transforming Trips
Transforming GTFS store to arrival/departures
FATAL ERROR: JS Allocation failed - process out of memory
Aborted (core dumped)

The system crash message is titled "nodejs crashed with SIGABRT in v8::Function::Call()". A crash report was created at /var/crash/ (~90MB).

The following folders were created at the execution path: arrivals, dates, departures, and stop_times. They all contain .ldb documents and a LOG, among others. Let me know if you need to see any of the logs.

I am using Ubuntu 14.04.1 LTS "Trusty Tahr", running on a Toshiba Portégé with Intel CORE i7 and 16GB RAM.

An in-range update of fast-csv is breaking the build 🚨

The dependency fast-csv was updated from 4.0.3 to 4.1.0.

🚨 View failing branch.

This version is covered by your current version range and after updating it in your project the build failed.

fast-csv is a direct dependency of this project, and it is very likely causing it to break. If other packages depend on yours, this update is probably also breaking those in turn.

Status Details
  • continuous-integration/travis-ci/push: The Travis CI build could not complete due to an error (Details).

Commits

The new version differs by 4 commits.

  • 682710d v4.1.0
  • b9dd314 Merge pull request #327 from C2FO/v4.1.0-rc
  • 22e4fb7 Added benchmarks for files of 1000 and 10000
  • c0d8f72 Added headers event #321

See the full diff

FAQ and help

There is a collection of frequently asked questions. If those don’t help, you can always ask the humans behind Greenkeeper.


Your Greenkeeper Bot 🌴

Map a mode to a trip

A trip is executed with a certain mode. E.g., Bus, Tram, Gondola, etc.

How can we map this into gtfs2lc?
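One possible approach is to map the route_type column of routes.txt to a mode term. A sketch, where the mode names follow the GTFS route_type table but are illustrative of whatever vocabulary the output would actually target:

```javascript
// Map the GTFS route_type code (routes.txt) to a mode term.
const ROUTE_TYPE_TO_MODE = {
  0: 'LightRail', // tram, streetcar, light rail
  1: 'Subway',    // metro
  2: 'Rail',
  3: 'Bus',
  4: 'Ferry',
  5: 'CableCar',
  6: 'Gondola',
  7: 'Funicular'
};

function routeTypeToMode(routeType) {
  return ROUTE_TYPE_TO_MODE[routeType] ?? null;
}
```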

time zone

Make sure that the timezone is configurable or readable from the feed info.

Interpolation support

It is valid GTFS to define stop_times with both arrival_time and departure_time empty. However, we cannot have Connections with empty departure or arrival times.

According to the spec, this implies that the consumer needs to interpolate these stop times, which is difficult in this case given that Connections are created in a streaming way.

A possible fix could be to do this as part of the pre-processing step that already takes place to order stop_times. Reusing an existing tool that handles this scenario might make things easier.
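As a sketch of the interpolation itself, assuming the times of one trip are already collected in stop_sequence order as seconds, with null for missing values (per the spec, the first and last stop always have times):

```javascript
// Linearly interpolate missing arrival/departure times across a
// trip's ordered stop_times.
function interpolateTimes(times) {
  const out = times.slice();
  let prev = 0; // index of the last known time
  for (let i = 1; i < out.length; i++) {
    if (out[i] !== null) {
      const gap = i - prev;
      for (let j = prev + 1; j < i; j++) {
        out[j] = out[prev] + ((out[i] - out[prev]) * (j - prev)) / gap;
      }
      prev = i;
    }
  }
  return out;
}
```

A distance-based interpolation over shape_dist_traveled would be more accurate where that column is available; the index-based version above is the simplest possible variant.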

PickupType and DropOffType unavailable

If both are unavailable, we just discard this connection right now. However, when the real-time version is then run, it will not be able to match an RT update with this connection. We should keep it in there after all.
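A sketch of the direction this could take: keep the connection and expose the availability as flags instead of dropping it (the property names are illustrative, not necessarily what gtfs2lc emits):

```javascript
// Keep a connection even when pickup_type and drop_off_type are "1"
// (not available), so the GTFS-RT pipeline can still match updates.
function connectionAvailability(stopTime, nextStopTime) {
  return {
    departureStop: stopTime.stop_id,
    arrivalStop: nextStopTime.stop_id,
    pickupType: stopTime.pickup_type === '1' ? 'NotAvailable' : 'Regular',
    dropOffType: nextStopTime.drop_off_type === '1' ? 'NotAvailable' : 'Regular'
  };
}
```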

Transfer times

We need to map the minimum transfer times to Linked Connections as well. Not sure how to model that. Any ideas?

Basically we want to express that if you’re at a gtfs:Station, you can transfer from its gtfs:Stops only if you take into account a minimum transfer time of X seconds.

Add a URI template parameter feed_version to indicate the version of the feed

There are different options to implement the identifier strategy, e.g. for keeping a block ID persistent.

Using just local identifiers, e.g. for a block id, gives two problems:

  1. It breaks federation: multiple GTFS feeds get translated into Linked Connections and will reuse the same block IDs.
  2. When an updated GTFS feed gets published, the block ids might conflict with the earlier version.

Suggested solution: introduce a global identifier

https://example.org/blocks/{block_id}

This solves the problem of federating over different sources, but not yet the problem of making it work when an updated GTFS file gets translated to LC (unless, for your GTFS feed, block ids are incremental over time and you can rely on this).

So we need to scope it to the specific GTFS feed, and this brings us to another problem: how do you identify this specific GTFS feed, or the fact that it got translated to RDF here?

Suggestions for a version number:

  1. Rely on feed_version in feed_info.txt. Design issue: don’t include the patch version, so that a block id stays the same when the minor and major version numbers didn’t change (e.g., 1.2.0 → 1.2.1)?
  2. When the GTFS feed’s version is not set, use a timestamp from the moment gtfs2lc started instead.

The URI template for e.g. a block then becomes:
https://example.org/blocks/{feed_version}/{block_id}
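A sketch of the fallback logic (the function and parameter names are hypothetical):

```javascript
// Scope block URIs to a feed version; fall back to a run timestamp
// when feed_info.txt provides no feed_version.
function blockUri(baseUri, feedVersion, blockId, runTime = new Date()) {
  const version = feedVersion || runTime.toISOString();
  return `${baseUri}/blocks/${version}/${blockId}`;
}
```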

Error when calendar.txt is not set

calendar.txt is required according to the GTFS reference, yet not all GTFS feeds actually give a calendar.txt. We should handle the case when there's no calendar.txt available.

Fatal crash on GTFS files depending on calendar_dates

If a lot of calendar entries are present in a GTFS file for the same trip id, the script will crash. This is caused in calendar.js:88.


RangeError: Maximum call stack size exceeded
    at RegExp.test (<anonymous>)
    at expandFormat (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:627:48)
    at configFromStringAndFormat (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2407:18)
    at prepareConfig (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2575:13)
    at createFromConfig (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2544:44)
    at createLocalOrUTC (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2631:16)
    at createLocal (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:2635:16)
    at hooks (/home/bert/Desktop/linked-connections-server/node_modules/moment/moment.js:12:29)
    at StreamIterator.next (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/StreamIterator.js:34:56)
    at CalendarToServices._processCalendarDates (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:86:33)
    at /home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:88:14
    at StreamIterator.next (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/StreamIterator.js:36:5)
    at CalendarToServices._processCalendarDates (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:86:33)
    at /home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:88:14
    at StreamIterator.next (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/StreamIterator.js:36:5)
    at CalendarToServices._processCalendarDates (/home/bert/Desktop/linked-connections-server/node_modules/gtfs2lc/lib/services/calendar.js:86:33)

Example GTFS file (Too big for git): https://filehost.net/89f4172762918be7

I am aware that this GTFS file does not follow GTFS best practices (calendar.txt is not used; calendar_dates is used instead), but if mass adoption is to follow, it might be best to support this.
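The stack trace shows StreamIterator.next and _processCalendarDates recursing into each other once per calendar_dates row, so feeds with many entries per service blow the call stack. A sketch of the fix direction, processing the rows iteratively (names are illustrative, not the actual gtfs2lc code):

```javascript
// Collect service dates per service_id with a plain loop instead of
// one recursive call per calendar_dates.txt row.
function expandCalendarDates(rows) {
  const services = new Map(); // service_id -> dates on which it runs
  for (const row of rows) { // iterative, so stack depth stays constant
    if (!services.has(row.service_id)) services.set(row.service_id, []);
    if (row.exception_type === '1') { // 1 = service added on this date
      services.get(row.service_id).push(row.date);
    }
  }
  return services;
}
```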

Jsonld and mongold formats are slow

When converting to jsonld (or mongold), our process takes a lot longer due to jsonld compaction taking place after raw triples are brought together in a JSON object.

We could however do the conversion to jsonld a lot faster by directly converting the JSON objects that come out of the transformer, instead of using the jsonld-stream library.
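A sketch of that direct conversion: emit an already-compact JSON-LD object per connection, skipping the generic compaction step. The @context and property names below are illustrative, not necessarily the exact terms gtfs2lc emits:

```javascript
// Build a compact JSON-LD document straight from a transformer
// output object, with no jsonld library involved.
function connectionToJsonLd(c) {
  return {
    '@context': { lc: 'http://semweb.mmlab.be/ns/linkedconnections#' },
    '@id': c.id,
    '@type': 'lc:Connection',
    'lc:departureStop': c.departureStop,
    'lc:arrivalStop': c.arrivalStop,
    'lc:departureTime': c.departureTime,
    'lc:arrivalTime': c.arrivalTime
  };
}
```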

URI strategy for connections

How are we going to give a unique and persistent identifier to the connections, even when the data gets erased and added again?

First of all, for a real-world connection published by us, I suggest using http://id.linkedconnections.org/{feedid}/{version}/{localid}.

The localid is then composed out of:

  • gtfs:trip local identifier
  • the YYYY-MM-DD string for when the trip is executed
  • a count for the x-th time the trip is being executed that day

Mind that this URI strategy is specific to gtfs2lc. A different URI strategy can be chosen for different systems.

I have created a separate repository for redirecting (303) GET requests to URIs of pages containing this connection.
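A sketch of how such a localid could be assembled from the three parts above (the function name and separator choices are assumptions):

```javascript
// Build a connection URI from the proposed parts: trip id, the
// YYYY-MM-DD service date, and a counter for the n-th run that day.
function connectionUri(feedId, version, tripId, date, runCount) {
  return `http://id.linkedconnections.org/${feedId}/${version}/${tripId}/${date}/${runCount}`;
}
```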

Use methods for the base URIs instead

After thinking about this with @smazzoleni, we need the flexibility to generate the base URIs with any kind of JavaScript functionality instead of only URI templates.

The solution we favored was to make the baseURIs.json config file a JS file with methods instead. Every kind of GTFS feed will need to extend this class for its own system.

One such class could actually be an implementation where a config file with URI templates is taken into account (for backwards compatibility), possibly linked as follows for slightly more functionality (idea by @smazzoleni):

{
  "stop": "http://data.gtfs.org/example/stops/{stop_id}",
  "route": "http://data.gtfs.org/example/routes/{route_short_id}",
  "trip": "http://data.gtfs.org/example/trips/{trip_id}/{trip_startTime}",
  "connection": "http://example/linkedconnections.org/connections/{trip_startTime}/{departureStop}/{trip_id}",
  "resolve": {
    "route_short_id": "connection.trip.route.route_id.substring(0,5)",
    "trip_id": "connection.trip.trip_id",
    "trip_startTime": "format(connection.trip.startTime, 'YYYYMMDDTHHMM');",
    "departureStop": "connection.departureStop"
  }
}
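A minimal sketch of what such a class hierarchy could look like (the class and method names are assumptions):

```javascript
// Base class: every feed can override these methods with arbitrary JS.
class UriBuilder {
  getStopUri(stop) {
    return `http://example.org/stops/${stop.stop_id}`;
  }
}

// Backwards-compatible variant driven by a URI-template config file.
class TemplateUriBuilder extends UriBuilder {
  constructor(templates) {
    super();
    this.templates = templates;
  }
  getStopUri(stop) {
    return this.templates.stop.replace('{stop_id}', stop.stop_id);
  }
}
```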

Crash when a trip isn't in trips.txt

When trying to convert the attached GTFS file, GTFS2LC crashes


3|datasets | Indexing services and routes succesful!
3|datasets | Error: Unhandled "error" event. (Did not find this trip id in trips.txt: 55408414)
3|datasets |     at ConnectionsBuilder.emit (events.js:186:19)
3|datasets |     at ConnectionsBuilder.onerror (_stream_readable.js:663:12)
3|datasets |     at emitOne (events.js:116:13)
3|datasets |     at ConnectionsBuilder.emit (events.js:211:7)
3|datasets |     at onwriteError (_stream_writable.js:417:12)
3|datasets |     at onwrite (_stream_writable.js:439:5)
3|datasets |     at ConnectionsBuilder.afterTransform (_stream_transform.js:90:3)
3|datasets |     at _expandTrip.then.catch (/var/www/dk.lc.bertmarcelis.be/node_modules/gtfs2lc/lib/ConnectionsBuilder.js:79:5)
3|datasets |     at <anonymous>
3|datasets |     at process._tickCallback (internal/process/next_tick.js:189:7)

DSB.zip

My guess: FeedValidator does not complain about unused trips. Only trips which are in trips.txt should be used, not all trips in stop_times.txt.

Removing references to this trip from stop_times resolves the issue.
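A sketch of the suggested behaviour, skipping stop_times rows for unknown trips instead of raising a fatal stream error (names are illustrative):

```javascript
// Keep only stop_times rows whose trip_id appears in trips.txt.
function filterKnownTrips(stopTimes, trips) {
  const known = new Set(trips.map(t => t.trip_id));
  return stopTimes.filter(st => known.has(st.trip_id));
}
```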

Trains that split or merge

The NMBS operates several trains which are split into two independent trains at some point in their journey. The reverse situation also occurs, with two trains merging into one. Splitting or merging always takes place in a station where travellers can (dis)embark the train.

Splitting

First of all, the train drives the first part of its journey as a whole, during which it is identified by a single identifier, in this case IC4310. When the train splits, one part of the train keeps the identifier (IC4310) for the remainder of its journey. The other part gets a new identifier, in this case IC4410. Although this is clearly one train, even the NMBS website indicates that there are 2 trips. This causes routeplanning to think a transfer is needed, even though travellers can remain seated if they are in the right part of the train. NMBS' own routeplanning takes this into account, but we need a way to determine this for 3rd party routeplanning. NMBS does not publish data on which carriages travel where.

(image: splitting train)

IC4310 in trips.txt:

220,000095,88____:007::8821006:8832409:13:1102:20180625,Hamont,4310,,5481,,1

IC4410 in trips.txt:

225,000297,88____:007::8832409:8831005:9:1152:20180625,Hasselt,4410,,5658,,1

Merging

Merging trains are similar, and have different identifiers until they merge. Once they merge, the train continues to travel with the identifier of one of said two trains. It should be noted that the train which loses its identifier on merging doesn't seem to have platform information. Again, routeplanning will consider this a transfer, even though passengers can remain seated (and thus don't have to transfer). NMBS' own routeplanning takes this into account, but we need a way to determine this for 3rd party routeplanning.

(image: merging train)

Solutions for routeplanning

A possible solution for this would be to label the common parts of a journey with both trip ids, by publishing two connection objects for each connection, both objects being identical except for the trip/route id.

Another possible solution would be to create an index of splitting trains, where each row contains two identifiers which belong together in one splitting or merging train. This would be less 'intrusive' to the Linked Connections list, and would prevent certain edge cases: splitting Linked Connections into fragments of a certain file size could cause two "identical" connections to end up on different pages, and having to combine and check multiple connection objects would be more complicated than only checking a list to determine, during routeplanning, whether a transfer is 'real' or a split/merge.
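A sketch of that index-based option (the data shape and function names are assumptions):

```javascript
// Index of trip-id pairs that form one physically coupled train.
function buildSplitMergeIndex(pairs) {
  const index = new Map();
  for (const [a, b] of pairs) {
    if (!index.has(a)) index.set(a, new Set());
    if (!index.has(b)) index.set(b, new Set());
    index.get(a).add(b);
    index.get(b).add(a);
  }
  return index;
}

// During routeplanning: a "transfer" between these two trips is not
// a real transfer if they belong to the same split/merge group.
function isSameTrain(index, tripA, tripB) {
  return index.has(tripA) && index.get(tripA).has(tripB);
}
```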

Current status

  • Investigate how the NMBS handles this
  • Investigate how this is published in GTFS

This data is not published in GTFS. There doesn't seem to be any field which implies a train split, or that a train drove part of its journey attached to another train.
