isg-ics / wildfires

Python 67.72% JavaScript 0.44% TypeScript 27.19% HTML 0.89% CSS 2.65% PLpgSQL 1.11%

wildfires's Issues

Frontend Development Server Failure

Describe the bug
Temporarily, we are using the Angular development server to deploy the demo. However, it fails after about 2 days.

To Reproduce
Run the Angular development server on wildfires.ics.uci.edu for more than 2 days.

Expected behavior
Either solution would benefit us:

Fix the issue so that the development server does not go down after about 2 days.
Build the frontend in production mode to generate the HTML, CSS, and JS, and serve them from the backend Flask server.
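
For the second option, the backend needs to map each request path to a file in the production build output. The following is a minimal, framework-independent sketch of that mapping, not the project's implementation: `resolve_static` and the `dist_dir` layout are assumptions made for illustration. Unknown paths fall back to `index.html` so client-side routes still load, and a malformed escape such as the `/%NETHOOD%/` seen in the log is treated as an ordinary missing file.

```python
from pathlib import Path
from urllib.parse import unquote


def resolve_static(dist_dir: Path, url_path: str) -> Path:
    """Map a request path to a file inside the build output directory.

    Unknown paths fall back to index.html so client-side routes still load.
    (Hypothetical helper; the name and layout are assumptions.)
    """
    root = dist_dir.resolve()
    index = root / "index.html"
    # unquote() leaves malformed escapes such as "%NETHOOD%" untouched
    # instead of raising, so a bad URL does not produce an exception here.
    relative = unquote(url_path).lstrip("/")
    candidate = (root / relative).resolve()
    # Reject anything that escapes the build directory (e.g. "../secret").
    if candidate != root and root not in candidate.parents:
        return index
    return candidate if candidate.is_file() else index
```

In a Flask view, the returned path could be handed to `flask.send_file`; that wiring is left out here to keep the sketch dependency-free.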

Error Messages

URIError: Failed to decode param '/%NETHOOD%/'
    at decodeURIComponent (<anonymous>)
    at decode_param (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:172:12)
    at Layer.match (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:123:27)
    at matchLayer (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:574:18)
    at next (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:220:15)
    at expressInit (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/middleware/init.js:40:5)
    at Layer.handle [as handle_request] (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:317:13)
    at /extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:284:7
    at Function.process_params (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:335:12)
    at next (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:275:10)
    at query (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/middleware/query.js:45:5)
    at Layer.handle [as handle_request] (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:317:13)
    at /extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:284:7
    at Function.process_params (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:335:12)
events.js:170
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:171:27)
Emitted 'error' event at:
    at emitErrorNT (internal/streams/destroy.js:91:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
    at processTicksAndRejections (internal/process/task_queues.js:81:17)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `ng serve --host 0.0.0.0 --port 2333 --disableHostCheck`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/yicongh1/.npm/_logs/2019-10-05T05_08_45_087Z-debug.log
yicongh1@cloudberry05 22:08:45 /extra/yicongh10/demo/Wildfires/frontend

Desktop (please complete the following information):

OS: CentOS 7

Dynamic Tweet Crawler Keyword Expansion

Whenever the fire data reports a new wildfire, dynamically add the new wildfire's name to the running crawler's keywords.
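
One way to let a running crawler pick up new names is a shared, thread-safe keyword set that the fire data task writes to and the crawler reads from. This is only a sketch under that assumption; `KeywordRegistry` and its methods are hypothetical names, not part of the existing code.

```python
import threading


class KeywordRegistry:
    """Thread-safe set of crawler keywords that can grow while crawling."""

    def __init__(self, initial=()):
        self._lock = threading.Lock()
        self._keywords = {k.strip().lower() for k in initial}

    def add_fire_names(self, fire_names):
        """Add names not seen before; return the newly added ones."""
        added = []
        with self._lock:
            for name in fire_names:
                key = name.strip().lower()
                if key and key not in self._keywords:
                    self._keywords.add(key)
                    added.append(key)
        return added

    def snapshot(self):
        """Return a sorted copy the crawler can iterate over safely."""
        with self._lock:
            return sorted(self._keywords)
```

The crawler would call `snapshot()` at the start of each search round, so names added by the fire data task take effect on the next round without a restart.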

Duplicate Fires Crawled

There are multiple entries with the same fire name in the database. This is related to the fire data runnable.

@ScarlettZ98 can you check please?

ngx-leaflet Restructure Frontend

Restructure frontend with `ngx-leaflet`

Currently, the tweet crawler works as follows:

1. Send a request to twitter.com/search?q={keyword} to get the HTML page.
2. Use a regex to match all the related tweet ids in this HTML.
3. After collecting a batch (>100) of tweet ids, use the Tweet API to request the complete tweet data for the collected ids.
4. Send the crawled data to the dumper for insertion.

This logic has the following issues:

It cannot get rid of duplicate ids between batches.
Due to the restriction of the Tweet API in step 3 above, we cannot request tweet data too often; otherwise the API key will likely get banned. The current model therefore has a static wait time of 20 seconds between API calls. Since steps 1 to 3 all run in a single thread, step 1 also waits 20 seconds between search requests.
If records are generated too fast, the 20-second interval will miss some records.
Because of the single thread, if the crawler goes down, the real-time stream data is lost and cannot be recovered.

A proposed solution:

Separate steps 1 and 2 from step 3. When a tweet id that has never been crawled before is found, insert it into the database.
Step 3 should run in another thread, consuming tweet ids that have no data in the database, calling the Tweet API at the highest frequency that won't get the key banned to request the tweet data for those ids, and dumping the results to the database.
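
The proposed split can be sketched as a producer that records new ids and a consumer thread that fetches data for them at a fixed interval. This is an illustration under stated assumptions, not the project's code: `TweetIdStore` is an in-memory stand-in for the database table, and `fetch_batch` is a hypothetical callable wrapping the Tweet API call.

```python
import queue
import threading


class TweetIdStore:
    """In-memory stand-in for the database table of tweet ids (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}  # tweet id -> tweet data, None until fetched

    def insert_if_new(self, tweet_id):
        """Record an id; return True only the first time it is seen."""
        with self._lock:
            if tweet_id in self._data:
                return False
            self._data[tweet_id] = None
            return True

    def save(self, tweet_id, data):
        with self._lock:
            self._data[tweet_id] = data

    def fetched(self):
        with self._lock:
            return {k: v for k, v in self._data.items() if v is not None}


def collect_ids(store, pending, found_ids):
    """Steps 1-2: queue only ids that were never seen before."""
    for tweet_id in found_ids:
        if store.insert_if_new(tweet_id):
            pending.put(tweet_id)


def fetch_worker(store, pending, fetch_batch, stop, batch_size=100, interval=20.0):
    """Step 3: drain queued ids in batches, waiting `interval` between calls."""
    while not (stop.is_set() and pending.empty()):
        batch = []
        while len(batch) < batch_size:
            try:
                batch.append(pending.get_nowait())
            except queue.Empty:
                break
        if batch:
            for tweet_id, data in fetch_batch(batch).items():
                store.save(tweet_id, data)
        # Wait between API calls, but wake up early on shutdown.
        stop.wait(interval)
```

With the ids persisted in a real table rather than in memory, ids that have no data yet would survive a crawler restart, and the search loop no longer waits on the API interval.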

Is your feature request related to a problem? Please describe.
In the current design, all tweets are fetched to the frontend, and the frontend tweet.layer handles data slicing and display, which consumes almost all available memory on the frontend.

Describe the solution you'd like
Two-step loading of tweets:

When initialized, load the tweet count aggregated by date.
When a time range is selected, request the exact tweet ids and locations within that time range.
Select tweets based on the map range.
Keep a frontend buffer to cache tweets for duplicate or overlapping selections.
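
The two queries behind this loading scheme can be sketched in Python as they might look on the backend. The record fields (`id`, `created_at`, `lat`, `lon`) and both function names are assumptions for illustration; the real schema may differ, and in practice these would be database queries rather than in-memory loops.

```python
from collections import Counter
from datetime import date, datetime


def count_by_date(tweets):
    """Step 1: aggregate tweets into {date: count} for the time bar."""
    return dict(Counter(t["created_at"].date() for t in tweets))


def select_tweets(tweets, start, end, bbox=None):
    """Step 2: return (id, lat, lon) for tweets dated within [start, end].

    bbox is an optional (south, west, north, east) map range.
    """
    selected = []
    for t in tweets:
        if not (start <= t["created_at"].date() <= end):
            continue
        if bbox is not None:
            south, west, north, east = bbox
            if not (south <= t["lat"] <= north and west <= t["lon"] <= east):
                continue
        selected.append((t["id"], t["lat"], t["lon"]))
    return selected
```

Only the per-date counts are sent on initial load; ids and coordinates are sent per selection, which keeps the payload proportional to what is actually displayed.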

A nice example of temperature layer

https://darksky.net/forecast/33.6712,-117.8303/us12/en

Develop Environment

The problem
The production system and the development system should be separated. Right now, we are developing on the shared production system, which is hosted on the wildfires server.

Describe the solution you'd like
To separate out the development environment, we need to set up the following:

A mock PostgreSQL database, with tables mirrored from the production system.
Some mock data, including tweets, locations, fire polygons, PRISM, and NOAA data.
A Dockerfile to set up all environments, including mock data.

ImageFromTweet Runnable Error

Describe the bug
When launching ImageFromTweet Runnable, it gives the following error:

SQL: select id, text from records r WHERE NOT EXISTS (select distinct id from images i where i.id = r.id) limit 100
[DATABASE] HOST = cloudberry05.ics.uci.edu, CONNECTION COUNT = 13, MAXIMUM = 100
extracting [], results = []
error: Traceback (most recent call last):
  File "/extra/yicongh10/wildfires/backend/task/image_from_tweet.py", line 25, in run
    f"select id, text from records r WHERE NOT EXISTS (select distinct id from images i where i.id = r.id) limit {batch_num}")})
  File "/extra/yicongh10/wildfires/backend/task/image_from_tweet.py", line 23, in <dictcomp>
    self.dumper.insert({id: self.extractor.extract(text) for id, text in
  File "/extra/yicongh10/wildfires/backend/data_preparation/extractor/tweet_media_extractor.py", line 28, in extract
    link_type: MediaURL = URLClassifier.classify(short_url)
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 91, in new_function
    return timeout_wrapper(*args, **kwargs)
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 150, in __call__
    return self.value
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 173, in value
    raise load
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /how-to-make-a-hydrogen-conversion-kit-at-home/ (Caused by None)


To Reproduce

1. Start `TaskManager`.
2. Start a thread for `ImageFromTweet`.
3. Set the loop time to 600; other parameters are left at their defaults.

Expected behavior
It should run without error and extract image links from tweets.

Desktop (please complete the following information):

OS: CentOS 7
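
In the traceback above, one unreachable URL raises inside a dict comprehension, so the whole batch is lost. A possible mitigation, sketched here as an assumption rather than the project's fix, is to isolate failures per row. `extract_all` is a hypothetical helper; `extract` stands in for the extractor's `extract` method.

```python
def extract_all(rows, extract):
    """Run `extract` on each (id, text) row, isolating per-row failures.

    Returns (results, failures): results maps id -> extracted value,
    failures maps id -> the exception raised for that row.
    """
    results, failures = {}, {}
    for tweet_id, text in rows:
        try:
            results[tweet_id] = extract(text)
        except OSError as err:
            # requests.exceptions.ConnectionError derives from OSError,
            # so a network failure for one URL is recorded here instead
            # of aborting the remaining rows.
            failures[tweet_id] = err
    return results, failures
```

The successful results can then be passed to the dumper, while the failed ids are logged or retried in a later loop.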

Getting more Tweets from Twitter Sample API

Currently we have a running crawler that uses the Twitter search API to fetch data by keyword search.

There is another API, the Twitter Sample API, which gives us a random 1% of tweets and could include another set of tweets related to wildfires.

We want to maximize the data set. There are two ways to do this:

Since cloudberry has an ongoing crawler that collects data from the Twitter Sample API daily, we could get the data directly from the cloudberry server, which sits on top of AsterixDB. Essentially, this is fetching data from AsterixDB. We could run a crawler adapter daily to get data from AsterixDB and dump it into our database.
Re-implement the crawler with the Twitter Sample API, independent of other projects. Because of storage constraints, we may not store all the tweets, only those that are of interest to us. How to define "of interest" is another issue.

Time Selection Error

Describe the bug
When selecting a time range on the time bar, the latest day (today) cannot be selected.

To Reproduce
Steps to reproduce the behavior:

Open the Fire Tweets layer, the Fire Polygon layer, or both.
Use the mouse to select a time range on the upper time bar.
Try to select the latest day (today).
The selected tweets or fire polygons do not match the date (today).

Expected behavior
Data for the latest day (today) should be displayed.