transitmatters / gobble
Process MBTA events into a format that can be consumed by the Data Dashboard
License: MIT License
So right now events are being written to disk on the instance. The data dashboard needs to get a hold of them somehow, though. I see a few distinct options...
a) Upload the events files to S3 overnight every night, accepting that we just won't have live bus or CR. (lame)
b) Every time we append an events.csv on disk, upload the entire thing to S3. (maybe, but like...no)
c) Serve live events over http that the dashboard can request on-demand.
My hunch is that we want to do (c), with some (a) sprinkled in. It's cool when things are live, and we shouldn't give that up. So rough steps:
express server that serves up events from ./output
..labs DNS record with the wildcard cert for https.
FAQ
a) Why can't the dashboard talk to the EC2 instance via its private IP, so that we can keep the EC2 instance off the public internet? That's possible, but it's a pain.
b) If the EC2 instance has a public IP address, can the dashboard lambda just talk to that? Yes, it could. But the load balancer option lets us easily add https using the wildcard cert, which is good citizenry even with no private data involved.
Maybe this bus became stalled or something? gobble wrote the same event to disk a bunch of times because the current stop kept flip-flopping.
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:51.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:54.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:54.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:57.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:00.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:03.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:03.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:06.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:11.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:39.000Z,0,0
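One way to suppress repeats like these is to treat two rows as the same event when they match on every column except the timestamp. The sketch below assumes the timestamp is the tenth column, which is inferred only from the sample rows above; `drop_repeated_events` is a hypothetical helper, not something gobble is known to contain.

```python
import csv
import io

# Assumed position of the event timestamp, inferred from the sample rows only.
EVENT_TIME_INDEX = 9


def drop_repeated_events(rows):
    """Yield a row unless an earlier row matched it on every column but the timestamp."""
    seen = set()
    for row in rows:
        key = tuple(value for i, value in enumerate(row) if i != EVENT_TIME_INDEX)
        if key in seen:
            continue  # same trip, stop and event type already recorded
        seen.add(key)
        yield row


sample = """\
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:51.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:34:54.000Z,0,0
2023-11-03,66,58403721,1,2553,6,0,1734,DEP,2023-11-04T03:35:39.000Z,0,0
"""
deduped = list(drop_repeated_events(csv.reader(io.StringIO(sample))))
```

This keeps the first occurrence. Whether the first or the last departure is the "real" one for a flip-flopping vehicle is a judgment call, and the `seen` set would need to be cleared per service date so it doesn't grow forever.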
At the moment the list of GTFS bundles is being read into memory once at launch, but we should check every day to see if that's the same one we should still be using. (Or something?)
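A minimal sketch of the "check every day" idea: cache the chosen bundle, and redo the selection the first time a new service date is seen. `Bundle`, `pick_bundle` and `DailyBundleCache` are hypothetical names, and the rule "most recently started bundle that covers the date" is an assumption, not necessarily the right one for MBTA feeds.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bundle:
    # Hypothetical stand-in for a GTFS bundle: a label plus the dates it covers.
    name: str
    start: date
    end: date


def pick_bundle(bundles, service_date):
    """Return the most recently started bundle covering service_date, or None."""
    covering = [b for b in bundles if b.start <= service_date <= b.end]
    return max(covering, key=lambda b: b.start, default=None)


class DailyBundleCache:
    """Re-run the selection the first time it is asked about a new service date."""

    def __init__(self, load_bundles):
        self._load_bundles = load_bundles  # callable returning a fresh bundle list
        self._checked_on = None
        self._current = None

    def get(self, service_date):
        if service_date != self._checked_on:
            self._current = pick_bundle(self._load_bundles(), service_date)
            self._checked_on = service_date
        return self._current
```

Calling `cache.get(gtfs_service_date)` from the event loop would then refresh at most once per service date, with no separate scheduler needed.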
In order to display colored dots in the data dashboard, scheduled_headway needs to be filled in.
Unfortunately, we don't immediately know what the scheduled headway is from a particular vehicle, so we need to calculate it ourselves from GTFS. It also might be possible to request them on-demand, once per day, from the MBTA v3 API.
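A rough sketch of the GTFS calculation, assuming we already have the sorted scheduled departure times at one stop for one route and direction: find the scheduled departure closest to the observed event and take the gap to the scheduled departure before it. Both function names are hypothetical, and matching by nearest scheduled time is an assumption; matching by trip_id may well be better.

```python
from bisect import bisect_left


def gtfs_time_to_seconds(hhmmss):
    """Convert a GTFS HH:MM:SS time (hours may exceed 24) to seconds into the service day."""
    hours, minutes, seconds = (int(part) for part in hhmmss.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def scheduled_headway(departures, event_time):
    """Return the scheduled gap in seconds that ends at the departure nearest event_time.

    departures: sorted scheduled departure times (seconds into the service day)
    at one stop, for one route and direction.
    """
    if len(departures) < 2:
        return None
    i = bisect_left(departures, event_time)
    # Step back when the earlier neighbour is at least as close as the later one.
    if i == len(departures) or (
        i > 0 and event_time - departures[i - 1] <= departures[i] - event_time
    ):
        i -= 1
    if i == 0:
        return None  # the first trip of the day has no preceding trip
    return departures[i] - departures[i - 1]
```

For example, with departures at 01:00:00, 01:10:00 and 01:25:00, an event observed 30 seconds after 01:10:00 gets a scheduled headway of 600 seconds.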
We can't afford for the service to stop running for a long period of time, so we need ways to ensure it is running.
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: Traceback (most recent call last):
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: File "/home/ubuntu/gobble/src/gobble.py", line 45, in <module>
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: main()
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: File "/home/ubuntu/gobble/src/gobble.py", line 40, in main
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: process_event(update, current_stop_state, gtfs_service_date, scheduled_trips, scheduled_stop_times, stops)
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: File "/home/ubuntu/.cache/pypoetry/virtualenvs/gobble-i42h0hpV-py3.11/lib/python3.11/site-packages/ddtrace/tracer.py", line 975, in func_wrapper
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: return f(*args, **kwargs)
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: ^^^^^^^^^^^^^^^^^^
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: File "/home/ubuntu/gobble/src/event.py", line 85, in process_event
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: ) = reduce_update_event(update)
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: File "/home/ubuntu/gobble/src/event.py", line 49, in reduce_update_event
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: stop_id = update["relationships"]["stop"]["data"]["id"]
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
Jan 08 02:51:27 ip-172-31-95-22 poetry[3252872]: TypeError: 'NoneType' object is not subscriptable
Jan 08 02:51:28 ip-172-31-95-22 systemd[1]: gobble.service: Main process exited, code=exited, status=1/FAILURE
Jan 08 02:51:28 ip-172-31-95-22 systemd[1]: gobble.service: Failed with result 'exit-code'.
Jan 08 02:51:28 ip-172-31-95-22 systemd[1]: gobble.service: Consumed 45min 9.973s CPU time.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: gobble.service: Scheduled restart job, restart counter is at 3963.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: Stopped gobble.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: gobble.service: Consumed 45min 9.973s CPU time.
Jan 08 02:51:33 ip-172-31-95-22 systemd[1]: Started gobble.
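The markers in the traceback point at the final `["id"]` lookup, which suggests `update["relationships"]["stop"]["data"]` was `None` for this update. A defensive version of that lookup could look like the sketch below; `extract_stop_id` is a hypothetical helper, and what the caller should do with a missing stop (skip, log, or handle separately) is left open.

```python
def extract_stop_id(update):
    """Return the stop id from a vehicle update, or None when no stop is attached."""
    # Each level may be missing or explicitly None, so fall back to an empty dict.
    relationships = update.get("relationships") or {}
    stop = relationships.get("stop") or {}
    data = stop.get("data") or {}
    return data.get("id")
```

A caller can then check for `None` and skip the update instead of crashing and relying on systemd to restart the service.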
Gobble currently queries and uploads realtime data for the 1 bus only. We'll want to add at least the bus lines currently available on the dashboard so that we have parity with the ~monthly backfill data.
fairly regularly (5000ish times maybe), we will get gps pings from vehicles that are reporting their route information, but not their upcoming stop. here's a map of ~2000 such pings over the course of a day https://www.google.com/maps/d/u/0/edit?mid=1ttstvWGxhXTY62ZOA7YYQr-o3Srnbj8&usp=sharing
we currently ignore these pings, which does give us decent headway calculations but can produce holes in our records. as far as i can tell, there are 3 types of stop id outages we tend to see
if a vehicle has been dark for more than ~a minute, we should start trying to interpolate its progress along its shape if possible. there will probably be some complexity wrt STOPPED_AT events in these outages, and it's unclear if the vehicles are actually making any stops and not reporting it.
We should have datadog monitoring both at the EC2 instance level (agent) and the python code level (APM). This will allow us to track EC2 resources, network usage, and code performance.
https://docs.datadoghq.com/agent/basic_agent_usage/ansible/
https://docs.datadoghq.com/tracing/trace_collection/dd_libraries/python/
To be able to populate the data dashboard with commuter rail data, it would be nice to start processing at least 1 commuter rail line so we can start setting up the frontend against it.