gobble's People

Contributors

devinmatte, hamima-halim, hhalim1, mathcolo, nathan-weinberg

Forkers

nathan-weinberg

gobble's Issues

Make data available to the dashboard

So right now events are being written to disk on the instance. The data dashboard needs to get a hold of them somehow, though. I see a few distinct options...

a) Upload the events files to S3 overnight every night, accepting that we just won't have live bus or CR. (lame)
b) Every time we append an events.csv on disk, upload the entire thing to S3. (maybe, but like...no)
c) Serve live events over http that the dashboard can request on-demand.

My hunch is that we want to do (c), with some (a) sprinkled in. It's cool when things are live, and we shouldn't give that up. So rough steps:

  1. In a new process, create an express server that serves up events from ./output.
  2. Throw a load balancer in front of it, and wire up the load balancer to a .labs DNS record with the wildcard cert for https.
  3. Maybe add pre-shared key auth, since this is for internal use only?
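A rough sketch of steps 1 and 3, written with Python's standard library rather than express so it matches the rest of gobble. The header name `X-Api-Key`, the key value, and the port are all hypothetical placeholders; TLS is assumed to terminate at the load balancer from step 2.

```python
import hmac
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlsplit

OUTPUT_DIR = Path("./output")  # directory named in the issue
PRESHARED_KEY = "change-me"    # hypothetical; read from the environment in practice


def is_authorized(provided_key: Optional[str], expected_key: str = PRESHARED_KEY) -> bool:
    """Compare the caller's key to the shared key in constant time."""
    if provided_key is None:
        return False
    return hmac.compare_digest(provided_key.encode(), expected_key.encode())


def resolve_events_path(request_path: str, root: Path = OUTPUT_DIR) -> Optional[Path]:
    """Map a URL path to a file under root, refusing paths that escape it."""
    root = root.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    if root not in candidate.parents:
        return None
    return candidate


class EventsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # "X-Api-Key" is an assumed header name, not an existing convention.
        if not is_authorized(self.headers.get("X-Api-Key")):
            self.send_error(401)
            return
        path = resolve_events_path(unquote(urlsplit(self.path).path))
        if path is None or not path.is_file():
            self.send_error(404)
            return
        body = path.read_bytes()
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8080) -> None:
    """Blocks forever; run this in its own process, separate from the ingester."""
    ThreadingHTTPServer(("0.0.0.0", port), EventsHandler).serve_forever()
```

Reading the whole file per request is fine for a day's worth of events on one route, but would want streaming or caching if the files grow.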

FAQ
a) Why can't the dashboard talk to the EC2 instance via its private IP, so that we can keep the EC2 instance off the public internet? That's possible, but it's a pain.
b) If the EC2 instance has a public IP address, can the dashboard lambda just talk to that? Yes, it could. But the load balancer option lets us easily add https using the wildcard cert, which is good citizenry even with no private data involved.

Duplicate events

Maybe this bus became stalled or something? gobble wrote the same event to disk a bunch of times because the current stop kept flip-flopping.

2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:51.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:54.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:54.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:57.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:00.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:03.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:03.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:06.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:11.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:39.000Z,0,0
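One possible guard, as a sketch: remember every event already written for the service date, ignoring the timestamp, and skip repeats. This assumes the tenth column is the event timestamp (as the rows above suggest) and that a trip legitimately produces a given event at a given stop at most once per service date. The function name is hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple

TIMESTAMP_COLUMN = 9  # assumed: the tenth field is the event timestamp


def drop_repeated_events(rows: Iterable[List[str]]) -> Iterator[List[str]]:
    """Yield each event once, keeping the earliest timestamp seen for it."""
    seen: Set[Tuple[str, ...]] = set()
    for row in rows:
        # Two rows are "the same event" if they match on everything but the timestamp.
        key = tuple(value for i, value in enumerate(row) if i != TIMESTAMP_COLUMN)
        if key in seen:
            continue
        seen.add(key)
        yield row
```

The `seen` set would need to be cleared when the service date rolls over, otherwise it grows without bound.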

Check for new GTFS bundle every day

At the moment the list of GTFS bundles is being read into memory once at launch, but we should check every day to see if that's the same one we should still be using. (Or something?)
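A minimal sketch of one way to do this: wrap the loader so it re-runs whenever the date passed in changes. `DailyRefreshed` and the `load` callable are hypothetical names; the real loader would be whatever gobble already calls at launch.

```python
from datetime import date
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class DailyRefreshed(Generic[T]):
    """Cache a loaded value and reload it the first time a new date is seen."""

    def __init__(self, load: Callable[[date], T]) -> None:
        self._load = load
        self._loaded_for: Optional[date] = None
        self._value: Optional[T] = None

    def get(self, today: date) -> T:
        if self._loaded_for != today:
            # First call of the day (or first call ever): hit the loader again.
            self._value = self._load(today)
            self._loaded_for = today
        return self._value  # type: ignore[return-value]
```

Passing the service date rather than the calendar date would keep the reload from landing in the middle of late-night service.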

Calculate scheduled headway per event

In order to display colored dots in the data dashboard, scheduled_headway needs to be filled in.

Unfortunately, we don't immediately know what the scheduled headway is from a particular vehicle, so we need to calculate it ourselves from GTFS. It also might be possible to request them on-demand, once per day, from the MBTA v3 API.
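As a sketch of the GTFS route: group scheduled departures by route, direction, and stop, sort them, and take the gap to the previous trip. The tuple layout and function names below are hypothetical; the inputs would come from joining `stop_times.txt` with `trips.txt` for the trips running on the service date.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# (trip_id, route_id, direction_id, stop_id, departure_seconds)
StopTime = Tuple[str, str, int, str, int]


def gtfs_seconds(hhmmss: str) -> int:
    """Parse a GTFS time; hours may exceed 23 for trips running past midnight."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scheduled_headways(stop_times: Iterable[StopTime]) -> Dict[Tuple[str, str], Optional[int]]:
    """Return {(trip_id, stop_id): seconds since the previous scheduled departure}."""
    by_stop: Dict[Tuple[str, int, str], List[Tuple[int, str]]] = defaultdict(list)
    for trip_id, route_id, direction_id, stop_id, departure in stop_times:
        by_stop[(route_id, direction_id, stop_id)].append((departure, trip_id))

    headways: Dict[Tuple[str, str], Optional[int]] = {}
    for (_route, _direction, stop_id), departures in by_stop.items():
        departures.sort()
        previous: Optional[int] = None
        for departure, trip_id in departures:
            # The first trip of the day at a stop has no scheduled headway.
            headways[(trip_id, stop_id)] = None if previous is None else departure - previous
            previous = departure
    return headways
```

Each event can then look up its `scheduled_headway` by `(trip_id, stop_id)` when it is written.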

Ensure service is always running

We can't afford for the service to stop running for a long period of time, so we need ways to ensure it stays up:

  • Restarts
    • We should ensure that the service restarts every night when the T isn't running to clear any potential memory leaks and things that can be fixed with a restart
  • Monitoring
    • Datadog should notify us when the service isn't currently running so we can intervene

Crash due to nonexistent stop id

Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: Traceback (most recent call last):
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:   File "/home/ubuntu/gobble/src/gobble.py", line 45, in <module>
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:     main()
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:   File "/home/ubuntu/gobble/src/gobble.py", line 40, in main
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:     process_event(update, current_stop_state, gtfs_service_date, scheduled_trips, scheduled_stop_times, stops)
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:   File "/home/ubuntu/.cache/pypoetry/virtualenvs/gobble-i42h0hpV-py3.11/lib/python3.11/site-packages/ddtrace/tracer.py", line 975, in func_wrapper
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:     return f(*args, **kwargs)
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:            ^^^^^^^^^^^^^^^^^^
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:   File "/home/ubuntu/gobble/src/event.py", line 85, in process_event
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:     ) = reduce_update_event(update)
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:   File "/home/ubuntu/gobble/src/event.py", line 49, in reduce_update_event
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:     stop_id = update["relationships"]["stop"]["data"]["id"]
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]:               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: TypeError: 'NoneType' object is not subscriptable
Jan 08 02:51:28 ip-172-31-95-22 systemd[1]: gobble.service: Main process exited, code=exited, status=1/FAILURE
Jan 08 02:51:28 ip-172-31-95-22 systemd[1]: gobble.service: Failed with result 'exit-code'.
Jan 08 02:51:28 ip-172-31-95-22 systemd[1]: gobble.service: Consumed 45min 9.973s CPU time.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: gobble.service: Scheduled restart job, restart counter is at 3963.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: Stopped gobble.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: gobble.service: Consumed 45min 9.973s CPU time.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: Started gobble.
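The traceback shows the value under `data` was `None` when `["id"]` was applied. A defensive lookup could look like the sketch below; `extract_stop_id` is a hypothetical helper, and the caller would need to decide what to do with a `None` result (most likely skip the update).

```python
from typing import Any, Mapping, Optional


def extract_stop_id(update: Mapping[str, Any]) -> Optional[str]:
    """Return the stop id from an update, or None if any level is missing or null."""
    relationships = update.get("relationships") or {}
    stop = relationships.get("stop") or {}
    data = stop.get("data") or {}
    return data.get("id")
```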

Data Quality: Headways

When viewing commuter rail data in the dashboard, the headway graphs are often missing points, yet the points that do exist show realistic headway numbers.

Screenshot 2023-12-26 at 12 06 07 PM

Some charts look very strange

Screenshot 2023-12-26 at 12 07 28 PM

Add more bus lines to gobble

Gobble currently queries and uploads realtime data for the 1 bus only. We'll want to add at least the bus lines currently available on the dashboard so that we have parity with the ~monthly backfill data.

intuit stop ID for vehicles that don't report them

fairly regularly (5000ish times maybe), we will get gps pings from vehicles that are reporting their route information, but not their upcoming stop. here's a map of ~2000 such pings over the course of a day https://www.google.com/maps/d/u/0/edit?mid=1ttstvWGxhXTY62ZOA7YYQr-o3Srnbj8&usp=sharing

we currently ignore these pings, which does give us decent headway calculations but can produce holes in our records. as far as i can tell, there are 3 types of stop id outages we tend to see

  1. short stretch outages, which occur for less than a minute and tend to happen at the beginning/end of the stop. these are short enough that we could probably ignore them and have reasonable calculations, even if they happen in the middle of a trip.
  2. medium stretch outages, which occur for maybe 2-10 minutes at a time. we see these a lot on the 39 (potentially caused by a glitchy AVL) and they can cause us to lose information for a couple of stops.
  3. long stretch outages, which might be because the AVL for a vehicle wasn't turned on but GPS was still reporting info.

if a vehicle has been dark for more than ~a minute, we should start trying to interpolate its progress along its shape if possible. there will probably be some complexity wrt

  • small route diversions
  • figuring out the inbound/outbound direction of the trip, if that's null (we might be able to grab this from gtfs? or just use previous ping info)
  • how to store shape information in-memory for quick enough calculations without ballooning memory usage (maybe just cache the more problematic route shapes)
  • monitoring the duration of a vehicle's outage in order to kick off this calculation
  • determining whether an event is an arrival or a departure (or neither.) i haven't seen any STOPPED_AT events in these outages, and it's unclear if the vehicle is actually making any stops and not reporting them.
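a minimal sketch of the interpolation idea, assuming a route shape is available as an ordered list of (lat, lon) points and that each stop's distance along that shape is already known. all names here are hypothetical. it uses a flat-earth approximation around the first shape point, which should be fine at bus-route scale, and it does not handle diversions or shapes that loop back on themselves.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float]  # (lat, lon) in degrees
EARTH_RADIUS_M = 6_371_000


def _to_xy(point: Point, origin: Point) -> Tuple[float, float]:
    """Project lat/lon to local metres east/north of origin (equirectangular)."""
    lat, lon = point
    lat0, lon0 = origin
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def distance_along_shape(shape: Sequence[Point], ping: Point) -> float:
    """Metres from the start of shape to the point on it closest to ping."""
    origin = shape[0]
    pts = [_to_xy(p, origin) for p in shape]
    px, py = _to_xy(ping, origin)
    best_dist_sq = math.inf
    best_along = 0.0
    travelled = 0.0
    for (ax, ay), (bx, by) in zip(pts, pts[1:]):
        dx, dy = bx - ax, by - ay
        seg_len_sq = dx * dx + dy * dy
        seg_len = math.sqrt(seg_len_sq)
        # Fraction along this segment of the closest point, clamped to its ends.
        t = 0.0 if seg_len_sq == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
        cx, cy = ax + t * dx, ay + t * dy
        dist_sq = (px - cx) ** 2 + (py - cy) ** 2
        if dist_sq < best_dist_sq:
            best_dist_sq = dist_sq
            best_along = travelled + t * seg_len
        travelled += seg_len
    return best_along


def last_stop_passed(stop_offsets: Sequence[Tuple[str, float]], along: float) -> Optional[str]:
    """stop_offsets is (stop_id, metres along shape), sorted by offset."""
    passed = None
    for stop_id, offset in stop_offsets:
        if offset > along:
            break
        passed = stop_id
    return passed
```

comparing `last_stop_passed` between consecutive pings from a dark vehicle would give a candidate departure event whenever the answer changes.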

Add Commuter Rail lines

To be able to populate the data dashboard with commuter rail data, it would be nice to start processing at least 1 commuter rail line to start setting up the frontend against 🚆
