Machine-Learning-Project-China-Air-Pollution

This repository hosts a course project for UCLA Math 285J (machine learning) on PM2.5 air pollution in China. It includes research references, data sources, and our code and results.

Air pollution is a pressing issue in Chinese society, since it may be a cause of the recent dramatic increase in lung cancer cases.

## Background reading:

Machine Learning research on pollution prediction

Evolving the neural network model for forecasting air pollution time series
Intercomparison of air quality data using principal component analysis, and forecasting of PM10 and PM2.5 concentrations using artificial neural networks
Machine learning in geosciences and remote sensing

Data Source on weather and pollution

Weather data on NOAA
Air pollution data sources: [1](http://aqi.cga.harvard.edu/china/), [2](https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/24826)

Model Assumptions

Plots of the time series at various stations in Beijing show a clear intraday seasonality: pollution peaks every 8 hours. However, no significant short-term trends are identified. Based on these observations, we assume the following:

The PM2.5 index is driven by the weather conditions and pollution levels of the previous 8 hours.
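One way to check an 8-hour cycle numerically, rather than by eye, is to look at the sample autocorrelation at different lags. The sketch below is only an illustration: the series is synthetic (a sine wave with an 8-hour period plus noise), and the function name `autocorrelation` is our own, not part of the project code.

```python
import numpy as np

def autocorrelation(series, lag):
    """Sample autocorrelation of a 1-D series at a positive integer lag."""
    x = np.asarray(series, dtype=float)
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

# Synthetic hourly series with an 8-hour cycle (illustrative, NOT project data).
rng = np.random.default_rng(0)
hours = np.arange(24 * 30)
pm25 = 80 + 20 * np.sin(2 * np.pi * hours / 8) + rng.normal(0, 5, hours.size)

# Autocorrelation for lags of 1 to 12 hours; a periodic series peaks at its period.
acf = {lag: autocorrelation(pm25, lag) for lag in range(1, 13)}
best_lag = max(acf, key=acf.get)
print("lag with the highest autocorrelation:", best_lag)
```

On real station data the peak would likely be less sharp, and missing hours would need to be handled before computing the lags.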

Model Formulation

Let P_i(t) denote the time series of the i-th pollutant, where t is the time in hours. Let W_j(t) denote the time series of the j-th weather condition, such as wind speed, temperature, humidity, or air pressure.

Then,
PM2.5(t) = F(PM2.5(t-8), P_1(t-8), ..., P_n(t-8), W_1(t-8), ..., W_m(t-8))
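The formula pairs each PM2.5 value at hour t with every variable observed at hour t-8. A minimal sketch of building such a training set is shown here, assuming the data are already in a gap-free hourly array whose columns are ordered as [PM2.5, P_1..P_n, W_1..W_m]. The function name `make_lagged_dataset` and the random input are hypothetical, not taken from the project code.

```python
import numpy as np

def make_lagged_dataset(data, target_col=0, lag=8):
    """Return (X, y) where row k of X holds all variables at hour k
    and y[k] is the target column at hour k + lag."""
    data = np.asarray(data, dtype=float)
    X = data[:-lag]              # everything observed at t - lag
    y = data[lag:, target_col]   # PM2.5 observed at t
    return X, y

# Illustrative input: 48 hours of 4 variables (random values, NOT project data).
rng = np.random.default_rng(0)
hourly = rng.uniform(0, 100, size=(48, 4))

X, y = make_lagged_dataset(hourly)
print(X.shape, y.shape)
```

The first 8 hours have no target-side partner in X and the last 8 have no feature-side partner in y, so the dataset is 8 rows shorter than the input.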

The project aims to learn F using several machine learning methods: linear models (Lasso, Ridge), Random Forest, Extra-Trees, and neural networks.
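As a rough illustration of comparing these model families, the sketch below fits four of them with scikit-learn and scores each with time-ordered cross-validation. Using scikit-learn is an assumption on our part, and the data, hyperparameters, and fold count are placeholders rather than the project's actual settings.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

# Synthetic stand-in for the lagged features and PM2.5 target (NOT project data).
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 5))
true_w = np.array([1.5, -2.0, 0.5, 0.0, 1.0])
y = X @ true_w + rng.normal(scale=0.1, size=400)

# Placeholder hyperparameters; in practice these would be tuned.
models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesRegressor(n_estimators=50, random_state=0),
}

# TimeSeriesSplit always trains on earlier rows and validates on later ones,
# which avoids leaking future observations into the training folds.
cv = TimeSeriesSplit(n_splits=4)
scores = {name: cross_val_score(model, X, y, cv=cv).mean()
          for name, model in models.items()}

for name, r2 in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: mean R^2 = {r2:.3f}")
```

A shuffled K-fold split would also run, but for time series it tends to give optimistic scores because neighbouring hours end up on both sides of the split.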

Codes

This is SQL code for preprocessing the data.
This is Python code for a vanilla neural network with an arbitrary number of layers, trained with mini-batch SGD.
This is Python code for model selection among several methods: Ridge, Lasso, Random Forest, Extra-Trees, and M-regression.
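For readers who want a feel for what a vanilla network with an arbitrary number of layers and mini-batch SGD involves, here is a minimal NumPy sketch. It is not the project's implementation: the ReLU hidden layers, squared-error loss, He-style initialisation, and all function names and hyperparameters are our own assumptions.

```python
import numpy as np

def init_params(layer_sizes, rng):
    """One (W, b) pair per layer; layer_sizes looks like [n_inputs, 16, 16, 1]."""
    return [(rng.normal(0, np.sqrt(2.0 / m), (m, n)), np.zeros(n))
            for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(params, X):
    """Return activations of every layer: ReLU on hidden layers, linear output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(z if i == len(params) - 1 else np.maximum(z, 0.0))
    return acts

def sgd_step(params, X, y, lr):
    """One gradient step on mean squared error for a single mini-batch."""
    acts = forward(params, X)
    delta = 2.0 * (acts[-1] - y) / len(X)          # dLoss/d(output)
    updated = []
    for i in reversed(range(len(params))):
        W, b = params[i]
        grad_W = acts[i].T @ delta
        grad_b = delta.sum(axis=0)
        if i > 0:
            delta = (delta @ W.T) * (acts[i] > 0)  # backpropagate through ReLU
        updated.append((W - lr * grad_W, b - lr * grad_b))
    return updated[::-1]

def train(X, y, hidden=(16, 16), lr=0.01, batch_size=32, epochs=200, seed=0):
    """Shuffle the data each epoch and update on consecutive mini-batches."""
    rng = np.random.default_rng(seed)
    params = init_params([X.shape[1], *hidden, 1], rng)
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = order[start:start + batch_size]
            params = sgd_step(params, X[batch], y[batch], lr)
    return params

def mse(params, X, y):
    return float(np.mean((forward(params, X)[-1].ravel() - y) ** 2))

# Synthetic regression problem (NOT project data).
rng = np.random.default_rng(1)
X = rng.normal(size=(256, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=256)

mse_before = mse(init_params([3, 16, 16, 1], np.random.default_rng(0)), X, y)
mse_after = mse(train(X, y), X, y)
print(f"MSE before training: {mse_before:.3f}, after: {mse_after:.3f}")
```

The number of layers is controlled entirely by the `hidden` tuple, so `hidden=(32,)` gives one hidden layer and `hidden=(16, 16, 16)` gives three.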
