
airq-dev / hazebot

9 stars, 5 watchers, 1 fork, 16.34 MB

Building the 411 for air quality in the United States: a texting platform accessible to all, that provides actionable local information to protect you and your community.

Home Page: https://www.hazebot.org/

License: MIT License

Dockerfile 0.30% Python 94.59% Shell 1.23% Mako 0.19% HTML 3.69%
air-quality purpleair sms sustainability flask postgresql docker python python3 education

hazebot's Introduction

Hazebot

Building the 411 for air quality in the United States: a texting platform accessible to all, that provides actionable local information to protect you and your community. Simply text your zipcode to (262) 747-2332 to receive timely alerts when the air quality near you changes.

You can also visit us at hazebot.org. Hazebot is built on top of data from PurpleAir.


Contributing

Contributions are very welcome. Please see a detailed guide to contributing here. You can always reach us on our Slack if you'd like to get involved.

Features

To use Hazebot, simply text your zipcode to 26AQISAFE2 or (262) 747-2332, and we will send you an alert when the air quality in your zipcode changes categories. Hazebot sends each user no more than one alert every two hours, and only between the hours of 8AM and 9PM. You can also customize your alerting preferences via SMS.

We also support several SMS "commands", the full list of which can be viewed in the Hazebot menu (by texting "M" to Hazebot).

If interested, you can read about the technical implementation of Hazebot in our architecture docs.

hazebot's People

Contributors

e3klund, emjgreen, ianhoffman, wdanfort


Forkers

jungrishi

hazebot's Issues

Be more permissive in what commands we accept

From Slack:

"Might want to account for punctuation in the menu options, if we’re sticking w/ number options? It’s not totally clear if “.” is required after the number from the instructions, but if you input “1.” you’re told option is unrecognizable"

We should accept both "1." and "1" as a valid command.
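A sketch of what that normalization might look like (the helper name and regex are ours, not from the codebase):

import re

def normalize_command(raw: str) -> str:
    # Normalize input so that "1.", " 1 ", and "1" all parse as "1".
    text = raw.strip().lower()
    # Strip trailing punctuation such as "." or "!" after a menu number.
    match = re.fullmatch(r"(\d+)\s*[.!]?", text)
    if match:
        return match.group(1)
    return text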

Add SQLAlchemy stubs

Right now all the SQLAlchemy code is effectively untyped, and it's a lot of code. That's dangerous. Adding stubs would protect us from dumb bugs.
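One low-effort option would be Dropbox's sqlalchemy-stubs package, which ships a mypy plugin. A minimal sketch of the wiring, assuming we keep our mypy config in mypy.ini:

; mypy.ini (after running "pip install sqlalchemy-stubs")
[mypy]
plugins = sqlmypy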

Gracefully handle JSONDecodeError when reading PurpleAir Data

We get this every once in a while:

Expecting ',' delimiter: line 1 column 5999899 (char 5999898)

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/celery/app/trace.py", line 412, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/app/app/airq/celery.py", line 52, in __call__
    return self.run(*args, **kwargs)
  File "/home/app/app/airq/tasks.py", line 8, in models_sync
    models_sync()
  File "/home/app/app/airq/sync/__init__.py", line 33, in models_sync
    purpleair_sync()
  File "/home/app/app/airq/sync/purpleair.py", line 264, in purpleair_sync
    purpleair_data = _get_purpleair_data()
  File "/home/app/app/airq/sync/purpleair.py", line 41, in _get_purpleair_data
    results = resp.json().get("results", [])
  File "/usr/local/lib/python3.8/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/local/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/usr/local/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/local/lib/python3.8/json/decoder.py", line 353, in raw_decode
    obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 1 column 5999899 (char 5999898)

The app can tolerate a partial failure for 30 minutes or so; the data just gets a bit stale, so this isn't a big issue. We should handle this error gracefully: probably still log an exception, but with a better message. Or we could log a warning, and then log an exception later in the sync job if the data is more than 30 minutes stale (or something like that).
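A minimal sketch of the warning-based option, assuming the code structure shown in the traceback (the endpoint constant is an assumption):

import logging

import requests

logger = logging.getLogger(__name__)

PURPLEAIR_URL = "https://www.purpleair.com/json"  # assumed endpoint

def _get_purpleair_data() -> list:
    resp = requests.get(PURPLEAIR_URL)
    try:
        return resp.json().get("results", [])
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        # The data can tolerate being ~30 minutes stale, so log a warning
        # here and let the sync job escalate if the staleness persists.
        logger.warning("Malformed JSON from PurpleAir; skipping this sync")
        return []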

Improve architecture docs

We should include an overview of our schema and the technologies we use in our architecture docs. This will help would-be contributors get up to speed.

Stagger AQI updates

From Slack:

We will notify users only when AQI changes by a certain amount, if they've been notified in the past several hours. That is, if we texted you less than six hours ago (or something like that), we'll only text you again if AQI has changed by more than 40 points (and moved categories) since the last time we texted you. (Input on exactly how this should work is really welcome!)

We need to flesh out exactly how this will work, but that's the gist.
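A sketch of one possible rule, with placeholder numbers and hypothetical fields on the client model:

from datetime import datetime, timedelta

RENOTIFY_WINDOW = timedelta(hours=6)  # placeholder
MIN_AQI_DELTA = 40  # placeholder

def should_alert(client, new_aqi: int, new_category: str, now: datetime) -> bool:
    # Always alert if we've never texted, or if the window has passed.
    if client.last_alerted_at is None:
        return True
    if now - client.last_alerted_at >= RENOTIFY_WINDOW:
        return True
    # Within the window, require both a category change and a big AQI swing.
    return (
        new_category != client.last_alerted_category
        and abs(new_aqi - client.last_alerted_aqi) > MIN_AQI_DELTA
    )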

Translations

Support translations. This will require integrating with some sort of Flask translation extension (Flask-Babel, maybe?) and then wrapping all the strings with some version of gettext. I think the whole process should be made pretty clear on the Flask-Babel website. We will need to get someone to actually do the translations, of course.

Then we will create a new Twilio phone number for each new language and update the sms-reply endpoint to sms-reply/en, sms-reply/es, etc.; the second part of the path will determine the language.
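A rough sketch of the routing and locale selection with Flask-Babel (the supported-language list is an assumption):

from flask import Flask, request
from flask_babel import Babel, gettext as _

app = Flask(__name__)
babel = Babel(app)

@babel.localeselector
def get_locale():
    # The language segment of the path (e.g., /sms-reply/es) picks the locale.
    lang = (request.view_args or {}).get("lang", "en")
    return lang if lang in ("en", "es") else "en"

@app.route("/sms-reply/<lang>", methods=["POST"])
def sms_reply(lang):
    # Every user-facing string gets wrapped in gettext so Babel can translate it.
    return _("Text your zipcode to get an air quality report.")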

Better Details / Recommendations

When using recommendations / details, we could also recommend behaviors: "go outside", "stay in", stuff like that.

We also might want to consolidate recommendations and details? They're so similar in what they do.

Delete old relationships when a sensor moves

Right now, when a sensor moves, we don't refresh the mapping between it and each zipcode near it — we just add to that mapping. This is obviously wrong. We should delete the mapping and start fresh.

This is actually relatively simple to implement and I'd be happy to walk someone through it. It just involves updating the _relationships_sync() function in airq.sync.purpleair so that, before processing a given sensor, we delete all its existing relationships.
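Roughly (the model name here is a guess at whatever backs the sensors_zipcodes mapping):

def _relationships_sync(purpleair_data):
    for sensor in purpleair_data:
        # Drop the sensor's stale zipcode mappings before rebuilding them,
        # so a sensor that moved doesn't keep its old neighbors.
        SensorZipcodeRelation.query.filter_by(sensor_id=sensor.id).delete()
        ...  # existing logic that recreates the mapping
    db.session.commit()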

Add task to build zipcodes table every week

We should start storing zipcodes and the distances from them to their nearby sensors in Postgres, and then migrate off using sqlite3 entirely. This will let us simplify our architecture and use only one DB for queries.

Switch to PurpleAir's Experimental JSON API

I think switching to PurpleAir's experimental JSON API, in which sensors are encoded as lists instead of maps, would offer a slight speed increase for uncached requests, because the amount of data we'd need to receive over the wire would be substantially less. For details on the experimental JSON API, see this document under the "JSON data available from PurpleAir" section.

Create LIST command

A command to list the zipcodes you're subscribed to. Open question: what is the mechanism for unsubscribing from one of them? "U 94118"? Something else?

Add tests

Exactly what it sounds like. Add any tests for any part of this app. This will help us make sure we're delivering a consistent experience to users. Use whatever testing framework you want 😄

Handle Twilio error responses appropriately

Noticed this in the logs:

[2020-09-17 03:10:33,083] INFO in http_client: POST Response: 400 {"code": 21610, "message": "The message From/To pair violates a blacklist rule.", "more_info": "https://www.twilio.com/docs/errors/21610", "status": 400}

We should handle the 21610 error code by unsubscribing the client.
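Something along these lines, assuming we send via the official Twilio Python client (the unsubscribe helper is hypothetical):

from twilio.base.exceptions import TwilioRestException

BLACKLIST_ERROR_CODE = 21610  # recipient has opted out at the carrier level

def send_alert(client, body: str):
    try:
        twilio_client.messages.create(
            to=client.phone_number, from_=TWILIO_NUMBER, body=body
        )
    except TwilioRestException as e:
        if e.code == BLACKLIST_ERROR_CODE:
            # They blocked us via STOP; stop alerting instead of retrying.
            client.unsubscribe()  # hypothetical helper
        else:
            raise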

Command to get the AQI for the last zipcode you entered

There should be a command to get the AQI for the last zipcode you entered.

I'm not yet sure what that command should be... maybe just "l" for Last?

We'd query Postgres for your last request, pull up the zip, and then carry on as if you'd entered that zip.
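A sketch, assuming a Request model with a timestamp and zipcode (the names are guesses):

def handle_last_command(client_id: int) -> str:
    last = (
        Request.query.filter_by(client_id=client_id)
        .order_by(Request.created_at.desc())
        .first()
    )
    if last is None:
        return "Text a zipcode first and I'll remember it."
    # Reuse the normal lookup path as if the user had texted the zip again.
    return handle_zipcode_command(client_id, last.zipcode)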

Surface distances (in miles) in recommendations

It would be useful to know how far away places with good AQI are. We could surface this information (in mi) when giving recommendations, since we already calculate it when determining the nearest zipcodes. We'd just need to convert km to mi and add it to the copy.
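The conversion itself is trivial (1 mile is exactly 1.609344 km):

KM_PER_MILE = 1.609344  # exact by definition

def km_to_miles(km: float) -> float:
    return km / KM_PER_MILE

# e.g., in the recommendation copy:
# f"{km_to_miles(2.5):.1f} mi away" -> "1.6 mi away"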

Weight readings by distance

When calculating the estimated AQI of a zipcode, we can do better than taking the average of all sensors within n km if we instead "weight" the reading of each sensor by its distance from the zipcode centroid.

Caveat: I'm not exactly sure how to do this. But it seems doable.
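One standard option is inverse-distance weighting, where closer sensors count for more. A sketch (the epsilon and the input shape are our choices):

def weighted_pm25(readings: list) -> float:
    # readings: (pm25, distance_km) pairs for sensors near the zipcode centroid.
    eps = 0.1  # avoids division by zero for a sensor at the centroid itself
    weights = [1.0 / (distance + eps) for _, distance in readings]
    total = sum(weights)
    return sum(w * pm25 for w, (pm25, _) in zip(weights, readings)) / total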

Come up with better SMS schedule

We shouldn't just send SMS every 3 hours: we probably want to start getting smart about what qualifies as a valid time to alert someone.

We need to think about this and come up with a set of rules.

CC @wdanfort.

Command to re-enable alerts instead of auto-reenabling them

When a user who has disabled alerts inputs a different zipcode, we should not automatically re-enable their alerts, as doing so would be impolite. Instead, we should return some additional copy telling them they can type "Yes" to re-enable alerts. Or something like that. We would then need a command to make "Yes" re-enable alerts.

Resend menu and unsub info every nth alert

Every nth alert, we should remind people about the menu and that they can unsubscribe. Or it might suffice to just send info about the menu every alert (from which they can unsub), since that's easy and still one segment.
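A sketch of the nth-alert variant (the reminder copy and the value of n are placeholders):

MENU_REMINDER_EVERY = 10  # placeholder n

def build_alert_body(client, base_message: str) -> str:
    # Hypothetical alert_count field tracking how many alerts we've sent.
    if client.alert_count % MENU_REMINDER_EVERY == 0:
        return base_message + ' Reply "M" for the menu, or "U" to unsubscribe.'
    return base_message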

Forecasting

Investigate plugging into a forecasting API and sending forecasts every morning.

Use ELK stack for logging

Right now we do our own custom logging in Postgres. This is nice because we can use these events for business logic, but it won't scale. We should at some point switch to using a "real" logging solution. I think using the ELK stack would make sense because integration with AWS is easy.

Website

We should serve up something better than "OK" when someone hits /.

This likely needs some design and PM work in addition to just coding.

Create an admin UI for responding to feedback

It should be possible to respond to feedback. If the client responds to our response, we should be able to respond to that. Basically this will require creating a page per feedback "thread" and persisting whatever message we send to the client in response to their feedback as a new type of event. We can then have an index page from which you can access these various threads.

Feedback command

Would be cool to have a command to give feedback directly over the app.

Like typing: "Feedback: This app is garbage."

Refactor subscriptions table

In the new world, clients only have one subscription at a time. I'm bastardizing the current schema to make this work. We should refactor so that the client holds an FK to its one subscription.

Steps:

  • Add subscription_id (nullable int) column to clients table.
  • Make this point to the client's current subscription.
  • Backfill this value from existing subscriptions.
  • Start reading from clients.subscription_id.
  • Drop the client_id column on the subscriptions table.

Before doing this we might want to discuss whether we envision a world where clients can have multiple subscriptions. If so there's no need for this change. CC @wdanfort
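A sketch of the first and third steps as an Alembic migration (column names per the list above):

import sqlalchemy as sa
from alembic import op

def upgrade():
    # Step 1: nullable pointer from each client to its current subscription.
    op.add_column(
        "clients", sa.Column("subscription_id", sa.Integer(), nullable=True)
    )
    # Step 3: backfill from the existing subscriptions.client_id mapping.
    op.execute(
        "UPDATE clients SET subscription_id = s.id "
        "FROM subscriptions s WHERE s.client_id = clients.id"
    )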

Split welcome message into two texts

Instead of sending one big scary text when people join, we should send the normal text and then enqueue a job to follow up with the second text.
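With Celery this is just a countdown on the follow-up task; a sketch with hypothetical copy constants and helpers:

@celery.task
def send_welcome_followup(client_id: int):
    client = Client.query.get(client_id)
    if client is not None:
        send_sms(client, FOLLOWUP_COPY)  # hypothetical helper and copy

def on_new_subscription(client):
    send_sms(client, WELCOME_COPY)
    # Enqueue the second text to arrive a minute later.
    send_welcome_followup.apply_async(args=[client.id], countdown=60)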

Staging environment

It will be helpful to have a staging environment as more people use the service and features become more complex, to avoid major service disruptions and breaking changes. Potentially we can use our original Twilio number?

Slack integration

We should provide a Slackbot integration along with our existing SMS functionality.

Functionality should be pretty much identical, but formatting of the response should differ.

LRAPA conversion

We should either do the LRAPA conversion by default or add it as an option.

Allow users to select which AQI conversion they want to use

Users should be able to set a preference to choose which AQI conversion (none, LRAPA, USEPA) they want applied to data.

To support the USEPA conversion, we'll need to start tracking humidity and something called pm_cf_1 as well. pm_cf_1 is not yet available in the PurpleAir API, but it's coming soon per the person I talked to.

In terms of how to track this additional data, I'm starting to feel like it's redundant to store all of this on both client and zipcode. Instead, I think we should add a new table as follows:

CREATE TABLE metrics (
    id SERIAL PRIMARY KEY,
    pm25 DOUBLE PRECISION NOT NULL,
    pm_cf_1 DOUBLE PRECISION NOT NULL,
    humidity INTEGER NOT NULL
);

Then zipcodes and clients can grow a new metrics_id column joining them to the metrics table.

As part of the sync process, we will insert into the metrics table instead of updating the zipcodes table. We will, however, keep track of which zipcode rows map to which metrics. We will then point these rows to the new metrics entries and delete the old metrics if they are not referenced by any other rows (more on this later).

Then, when we send an alert to a client, we will update the client's metrics_id to point to its zipcode's metrics row. We will not delete a metrics row as long as any client still points to it.

In the future, metrics could grow a unique hash column and we could use that to avoid duplicating rows. However, as metrics range over real numbers, I don't foresee this having as much space savings as just doing automatic GC when a row becomes unreferenced.
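The GC itself could be a single DELETE that runs after each sync; a sketch using the table and column names proposed above:

import sqlalchemy as sa

def gc_metrics(session):
    # Delete metrics rows that no zipcode or client references anymore.
    session.execute(
        sa.text(
            """
            DELETE FROM metrics m
            WHERE NOT EXISTS (SELECT 1 FROM zipcodes z WHERE z.metrics_id = m.id)
              AND NOT EXISTS (SELECT 1 FROM clients c WHERE c.metrics_id = m.id)
            """
        )
    )
    session.commit()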

Text menu

Create interface for future feature development

Create test harness

We need to set up a separate Docker configuration to run tests.

The harness should probably use a docker-compose override file to spin up our infrastructure with different image names, and then run tests using pytest or unittest.

The test runner will create all tables before running the tests, and drop them afterwards.

I don't think we'll need to run Celery for these tests, at least not initially. So we can remove the worker, scheduler, and Redis from our config, and instead test the sync functionality directly.
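A sketch of the table setup and teardown in pytest terms, assuming an app factory (the import paths are guesses):

import pytest

from airq import create_app, db  # hypothetical import paths

@pytest.fixture(scope="session")
def app():
    app = create_app(testing=True)
    with app.app_context():
        db.create_all()  # create all tables before the test run
        yield app
        db.drop_all()  # and drop them afterwards

@pytest.fixture
def client(app):
    return app.test_client()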

For fixture data, we can download files from GeoNames and PurpleAir and use them as input to the sync process. This will then build the database used by the rest of the tests. So we might want three distinct test suites:

  1. testing syncing: runs the synchronization process and asserts the output is correct
  2. testing the app: makes requests against an endpoint and ensures the responses look correct
  3. unit tests: testing functionality which does not depend on the DB being populated

Then we could run the sync tests (1) and use the output of that to run the tests against the app (2). Then when re-running app tests, you wouldn't need to build all the data each time.

Create Messages Table

We should replace the Requests table with a more generic table about the messages a user has sent us. We'd probably like to store:

  • the command type
  • the client_id
  • the timestamp
  • details_json [e.g., the zipcode if it's a command to look up a zipcode]

This table could get quite large, so we could consider an alternate schema whose row count is bounded by the number of distinct (client, command) pairs rather than by total message volume:

  • the command type
  • the client_id
  • the number of times the client sent us this message
  • the first timestamp at which the client sent us this message
  • the last timestamp at which the client sent us this message
  • details_json

The problem here is that if we make this table unique on (command_type, client_id), details_json will not preserve a perfect history of message sending. OTOH, do we really need a perfect history?
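For concreteness, the aggregated variant as a Flask-SQLAlchemy model (all names are ours):

from airq.config import db  # hypothetical import path

class Message(db.Model):
    __tablename__ = "messages"

    id = db.Column(db.Integer, primary_key=True)
    command_type = db.Column(db.Integer, nullable=False)
    client_id = db.Column(db.Integer, db.ForeignKey("clients.id"), nullable=False)
    count = db.Column(db.Integer, nullable=False, default=1)
    first_sent_at = db.Column(db.DateTime, nullable=False)
    last_sent_at = db.Column(db.DateTime, nullable=False)
    details_json = db.Column(db.Text, nullable=True)

    __table_args__ = (
        db.UniqueConstraint("command_type", "client_id"),
    )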

Just some food for thought...

We should also consider whether to store invalid messages.

CC @emjgreen @wdanfort

Refactor PurpleAir sync process to remove intermediate tables

Now that all metrics we need to serve queries and send alerts are stored in the zipcodes and clients tables, we could do away with the sensors and sensors_zipcodes tables entirely. We'd need to build the mappings between sensors and zipcodes in-memory every time we sync, which might be slow (it's a lot of geohashing), but it's definitely doable.

This is just an avenue for exploration.
