
wildfires's People

Contributors

ac-unicorn, dependabot[bot], gutingxuan, hugoqn, huiminliu09, jsid8qihgds3, kathyzhuang, pagefau1t, roseart, scarlettz98, shu155, suquantum, xinyuehan7, yicong-huang, yuanf9


wildfires's Issues

Frontend Development Server Failure

Describe the bug
We are temporarily using the Angular development server to deploy the demo. However, it fails after about 2 days of running.

To Reproduce
Launch the Angular server on wildfires.ics.uci.edu and leave it running for more than 2 days.

Expected behavior
Either of the following solutions would work for us:

  1. Fix the issue so that the development server no longer goes down after about 2 days.
  2. Build the frontend in production mode to generate the HTML, CSS, and JS, and serve those files from the backend Flask server.
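
Option 2 could look roughly like the sketch below, assuming the production build has been written to a directory (called `build_dir` here, a hypothetical name). The helper `resolve_static` is illustrative and not part of the project: it maps a request path to a file inside the build directory and falls back to `index.html` so that client-side routes still load. A Flask route could pass the result to `flask.send_file`; that wiring is omitted to keep the sketch dependency-free.

```python
from pathlib import Path


def resolve_static(build_dir: Path, request_path: str) -> Path:
    """Map a URL path to a file inside build_dir, falling back to index.html.

    Hypothetical helper: the name and the fallback policy are assumptions.
    """
    root = build_dir.resolve()
    candidate = (root / request_path.lstrip("/")).resolve()
    # Refuse anything that escapes the build directory (e.g. "/../secret").
    inside = candidate == root or root in candidate.parents
    if inside and candidate.is_file():
        return candidate
    # Unknown paths (including client-side routes) get the app shell.
    return root / "index.html"
```

With this policy, a path such as `/%NETHOOD%/` from the log above simply resolves to `index.html` instead of raising.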

Error Messages

URIError: Failed to decode param '/%NETHOOD%/'
    at decodeURIComponent (<anonymous>)
    at decode_param (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:172:12)
    at Layer.match (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:123:27)
    at matchLayer (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:574:18)
    at next (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:220:15)
    at expressInit (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/middleware/init.js:40:5)
    at Layer.handle [as handle_request] (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:317:13)
    at /extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:284:7
    at Function.process_params (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:335:12)
    at next (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:275:10)
    at query (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/middleware/query.js:45:5)
    at Layer.handle [as handle_request] (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:317:13)
    at /extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:284:7
    at Function.process_params (/extra/yicongh10/demo/Wildfires/frontend/node_modules/express/lib/router/index.js:335:12)
events.js:170
      throw er; // Unhandled 'error' event
      ^

Error: read ECONNRESET
    at TCP.onStreamRead (internal/stream_base_commons.js:171:27)
Emitted 'error' event at:
    at emitErrorNT (internal/streams/destroy.js:91:8)
    at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
    at processTicksAndRejections (internal/process/task_queues.js:81:17)
npm ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR! [email protected] start: `ng serve --host 0.0.0.0 --port 2333 --disableHostCheck`
npm ERR! Exit status 1
npm ERR!
npm ERR! Failed at the [email protected] start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.

npm ERR! A complete log of this run can be found in:
npm ERR!     /home/yicongh1/.npm/_logs/2019-10-05T05_08_45_087Z-debug.log
yicongh1@cloudberry05 22:08:45 /extra/yicongh10/demo/Wildfires/frontend

Desktop (please complete the following information):

  • OS: CentOS 7

ngx-leaflet Restructure Frontend

Restructure frontend with ngx-leaflet

  • core-map

  • time-bar

  • side-bar

  • time-service

  • tweet-service

  • map-service

  • fire-tweet-layer

  • wind-layer

  • fire-polygon-layer

Tweet Crawler Logic Change

Currently, the tweet crawler works as follows:

  1. It sends a request to twitter.com/search?q={keyword} to get the HTML page.

  2. It uses a regex to match all the related tweet IDs in this HTML.

  3. After collecting a batch (>100) of tweet IDs, it uses the Tweet API to request the complete tweet data for the collected IDs.

  4. The crawled data is sent to the dumper for insertion.

This logic has the following issues:

  1. It cannot eliminate duplicate IDs between batches.

  2. Due to rate restrictions on the Tweet API used in step 3, we cannot request tweet data too often, otherwise the API key will likely get banned. The current model therefore has a static wait time of 20 seconds between API calls. Since steps 1 to 3 all run in one single thread, step 1 also waits 20 seconds between search requests.
    If tweets are generated too quickly, some of them will be missed during the 20-second interval.

  3. Because everything runs in a single thread, if the crawler goes down, the real-time stream data is lost and cannot be recovered.

A proposed solution:

  1. Separate steps 1 and 2 from step 3. When a tweet ID that has never been crawled before is found, insert it into the database.

  2. Run step 3 in another thread that consumes tweet IDs with no data in the database yet, calls the Tweet API at the highest frequency that will not get the key banned, requests the tweet data for those IDs, and dumps it to the database.
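
The proposed split can be sketched as a producer that records unseen IDs and a worker thread that fetches their data. This is a minimal sketch under stated assumptions, not the project's implementation: `TweetStore` is an in-memory stand-in for the database, `fetch_batch` stands in for the real Tweet API call, and all names are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, Iterable, List, Optional


class TweetStore:
    """In-memory stand-in for the database table (hypothetical)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._rows: Dict[int, Optional[dict]] = {}

    def add_id(self, tweet_id: int) -> bool:
        """Insert an ID with no data yet; return False if it is already known."""
        with self._lock:
            if tweet_id in self._rows:
                return False
            self._rows[tweet_id] = None
            return True

    def save(self, tweet_id: int, data: dict) -> None:
        with self._lock:
            self._rows[tweet_id] = data

    def get(self, tweet_id: int) -> Optional[dict]:
        with self._lock:
            return self._rows.get(tweet_id)


def collect_ids(store: TweetStore, ids: Iterable[int], pending: "queue.Queue[int]") -> int:
    """Steps 1-2: record only unseen IDs and queue them for fetching."""
    added = 0
    for tweet_id in ids:
        if store.add_id(tweet_id):
            pending.put(tweet_id)
            added += 1
    return added


def fetch_worker(
    store: TweetStore,
    pending: "queue.Queue[int]",
    fetch_batch: Callable[[List[int]], Dict[int, dict]],
    stop: threading.Event,
    batch_size: int = 100,
    interval: float = 20.0,
) -> None:
    """Step 3: drain queued IDs in batches, pausing between API calls."""
    while not stop.is_set():
        batch: List[int] = []
        try:
            batch.append(pending.get(timeout=0.1))
            while len(batch) < batch_size:
                batch.append(pending.get_nowait())
        except queue.Empty:
            pass
        if not batch:
            continue
        for tweet_id, data in fetch_batch(batch).items():
            store.save(tweet_id, data)
        stop.wait(interval)  # rate limit; returns early once stop is set
```

Because the search side never waits on the API, the 20-second pause only delays fetching. With a real database in place of `TweetStore`, IDs that still have no data would survive a crash and could be re-queued on restart.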

Tweet Load Memory Issue

Is your feature request related to a problem? Please describe.
In the current design, all tweets are fetched to the frontend, and the frontend tweet.layer handles data slicing and display. This consumes almost all available memory on the frontend.

Describe the solution you'd like
Load tweets in two steps, then narrow and cache the result:

  1. On initialization, load only the tweet counts aggregated by date.
  2. When a time range is selected, request the exact tweet IDs and locations within that time range.
  3. Select tweets based on the visible map range.
  4. Keep a frontend buffer to cache tweets for duplicate or overlapping selections.
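
Steps 1 to 3 could be backed by two small queries, sketched here over an in-memory list rather than the real database. The `Tweet` fields, the function names, and the `(south, west, north, east)` bounding-box layout are assumptions for illustration; the frontend cache from step 4 is not covered.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, datetime
from typing import Dict, Iterable, List, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (south, west, north, east), assumed layout


@dataclass(frozen=True)
class Tweet:
    id: int
    created_at: datetime
    lat: float
    lon: float


def count_by_date(tweets: Iterable[Tweet]) -> Dict[date, int]:
    """Step 1: a small per-day summary, enough to draw the time bar."""
    return dict(Counter(t.created_at.date() for t in tweets))


def select_tweets(
    tweets: Iterable[Tweet],
    start: datetime,
    end: datetime,
    bbox: Optional[BBox] = None,
) -> List[Tweet]:
    """Steps 2-3: tweets in [start, end), optionally limited to a map range."""
    selected = []
    for t in tweets:
        if not (start <= t.created_at < end):
            continue
        if bbox is not None:
            south, west, north, east = bbox
            # Simple box test; ranges crossing the antimeridian are not handled.
            if not (south <= t.lat <= north and west <= t.lon <= east):
                continue
        selected.append(t)
    return selected
```

Only the per-day counts are needed up front, so the initial payload grows with the number of days rather than the number of tweets.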

Develop Environment

The problem
The production system and the development system should be separated. Right now, we are developing on the same shared production system, which is hosted on the wildfires server.

Describe the solution you'd like
To separate the development environment, we need to set up the following:

  1. A mock PostgreSQL database, with tables mirrored from the production system.
  2. Some mock data, including tweets, locations, fire polygons, PRISM, and NOAA data.
  3. A Dockerfile to set up the whole environment, including the mock data.

ImageFromTweet Runnable Error

Describe the bug
When launching ImageFromTweet Runnable, it gives the following error:

SQL: select id, text from records r WHERE NOT EXISTS (select distinct id from images i where i.id = r.id) limit 100
[DATABASE] HOST = cloudberry05.ics.uci.edu, CONNECTION COUNT = 13, MAXIMUM = 100
extracting [], results = []
error: Traceback (most recent call last):
  File "/extra/yicongh10/wildfires/backend/task/image_from_tweet.py", line 25, in run
    f"select id, text from records r WHERE NOT EXISTS (select distinct id from images i where i.id = r.id) limit {batch_num}")})
  File "/extra/yicongh10/wildfires/backend/task/image_from_tweet.py", line 23, in <dictcomp>
    self.dumper.insert({id: self.extractor.extract(text) for id, text in
  File "/extra/yicongh10/wildfires/backend/data_preparation/extractor/tweet_media_extractor.py", line 28, in extract
    link_type: MediaURL = URLClassifier.classify(short_url)
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 91, in new_function
    return timeout_wrapper(*args, **kwargs)
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 150, in __call__
    return self.value
  File "/extra/yicongh10/wildfires/venv/lib/python3.7/site-packages/timeout_decorator/timeout_decorator.py", line 173, in value
    raise load
requests.exceptions.ConnectionError: None: Max retries exceeded with url: /how-to-make-a-hydrogen-conversion-kit-at-home/ (Caused by None)


To Reproduce

  1. Start TaskManager.
  2. Start a thread for ImageFromTweet.
  3. Set the loop time to 600; leave the other parameters at their defaults.

Expected behavior
It should run without error and extract image links from tweets.

Desktop (please complete the following information):

  • OS: CentOS 7
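
The traceback shows the exception escaping from the dict comprehension in `image_from_tweet.py`, so a single unreachable URL aborts the whole batch. One possible mitigation, sketched under assumptions, is to handle failures per record. `extract_batch` is a hypothetical helper, and the broad `except Exception` is a placeholder that real code would probably narrow to `requests.exceptions.RequestException`.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def extract_batch(
    rows: Iterable[Tuple[int, str]],
    extract: Callable[[str], List[str]],
) -> Tuple[Dict[int, List[str]], Dict[int, Exception]]:
    """Run extract() on each (id, text) row, isolating per-record failures."""
    results: Dict[int, List[str]] = {}
    failures: Dict[int, Exception] = {}
    for tweet_id, text in rows:
        try:
            results[tweet_id] = extract(text)
        except Exception as exc:  # assumption: narrow this in real code
            failures[tweet_id] = exc
    return results, failures
```

The caller could insert `results` as before and log or mark `failures`. Marking matters because the query above selects records that have no row in `images`, so a record that always fails would likely be selected again on every loop.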

Getting more Tweets from Twitter Sample API

Currently we have a running crawler that uses the Twitter search API to fetch data by keyword search.

There is another API, the Twitter Sample API, which returns a random 1% of all tweets. This sample could include another set of tweets related to wildfires.

We want to maximize the data set. There are two ways to do this:

  1. Cloudberry has an ongoing crawler that collects data from the Twitter Sample API daily, so we could get the data directly from the cloudberry server, which sits on top of AsterixDB. Essentially, this means fetching data from AsterixDB. We could run the crawler adapter daily to get data from AsterixDB and dump it into our database.

  2. Re-implement the crawler with the Twitter Sample API, independent of other projects. Because of storage constraints, we may not store all the tweets, only those that are of interest to us. How to define "of interest" is a separate issue.

Time Selection Error

Describe the bug
When selecting a time range on the time bar, the latest day (today) cannot be selected.

To Reproduce
Steps to reproduce the behavior:

  1. Open the Fire Tweets layer, the Fire Polygon layer, or both.
  2. Use the mouse to select a time range on the upper time bar.
  3. Try to select the latest day (today).
  4. The selected tweets or fire polygons do not match the date (today).

Expected behavior
The data for the latest day (today) should be displayed.

Desktop (please complete the following information):

  • OS: CentOS
  • Browser: Chrome & Safari

Additional context
The time series might only support selecting the range as [start, end), so it does not include the last day.
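
If the half-open range is indeed the cause, one way to include the last day is to push the exclusive end to the start of the following day. The sketch below uses Python for illustration, and `inclusive_day_range` is a hypothetical helper, not a function in the project.

```python
from datetime import date, datetime, time, timedelta
from typing import Tuple


def inclusive_day_range(first_day: date, last_day: date) -> Tuple[datetime, datetime]:
    """Return a half-open [start, end) range that covers last_day entirely."""
    if last_day < first_day:
        raise ValueError("last_day must not be before first_day")
    start = datetime.combine(first_day, time.min)
    # The end is exclusive, so move it to midnight after the last selected day.
    end = datetime.combine(last_day + timedelta(days=1), time.min)
    return start, end
```

A query that filters with `start <= created_at < end` then still works unchanged, and tweets from any time on the last selected day fall inside the range.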
