AquaViva

Table of Contents
  1. About
  2. Data
  3. Machine Learning
  4. Visualization
  5. Future Work
  6. Contributing
  7. Built With
  8. License
  9. Contact
  10. Acknowledgements
  11. References

About

AquaViva is an innovative project aimed at one of the most important Sustainable Development Goals and global humanitarian challenges of our time: the lack of access to clean water (SDG 6). To address it, we use machine learning models trained on satellite imagery, climatic variables, and geological features to produce near-real-time, high-resolution maps of groundwater level.

We believe this tool has great potential to help communities mitigate water scarcity, monitor groundwater, and efficiently identify suitable sources of clean water. We are therefore committed to keeping the project open source and free to use, and we welcome contributors who want to build on what we have done. This project is part of NASA's Pale Blue Dot Challenge, which shares our deep commitment to using technology for environmental and social good. We are ecstatic to say that we were recognized as Best Overall in the challenge!

Created by Team Viva Aqua | Francisco Furey, Adam Zheng, Malena Vildoza, El Hadji Malick DIEYE (AKA Jay) 😊

Data

For our training data, we conducted an extensive literature review of past studies and key concepts, such as the water balance equation, to determine which variables would provide a comprehensive set of information for predicting groundwater level. We then collected, cleaned, preprocessed, and integrated the datasets using Python scripts (see scripts/preprocessing) and Jupyter notebooks (see notebooks/preprocessing).
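
For reference, a common simplified form of the water balance equation for an area over one time step, neglecting lateral groundwater inflow and outflow, is:

$$
P = ET + Q + \Delta S_{sm} + \Delta S_{gw}
$$

Rearranged for the groundwater storage term:

$$
\Delta S_{gw} = P - ET - Q - \Delta S_{sm}
$$

Here P is precipitation, ET is evapotranspiration, Q is surface runoff (streamflow), and ΔS_sm and ΔS_gw are the changes in soil moisture and groundwater storage. Several of the features in the Features table line up with these terms (NASA_IMERG_Late for P, LIS_ET for ET, LIS_Streamflow for Q, LIS_Soil_Moisture_Combined for soil moisture). This mapping is an illustrative reading of why those variables are informative, not a formula the model evaluates directly.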

  1. Data Collection: First, we used IGRAC/GGIS to obtain piezometric (groundwater level) data from 2015 to 2022 for 36 wells distributed across Gambia. We then gathered corresponding data for our 13 input variables (see Features) from AρρEEARS, ClimateSERV, BGS, and GGIS (see Data Sources). Most of the raw data is available under data/original_data, except for a few files that were too large to upload.
  2. Data Cleaning and Preprocessing: We used Jupyter notebooks (see notebooks/preprocessing) to handle the various data formats (.nc4, .nc, .csv), visualize and analyze the raw data, and account for missing or erroneous values using nearest-neighbor algorithms and linear interpolation. QGIS was also used to process the hydrogeological region and topographical data. All processed data is available under data/processed_data.
  3. Data Integration: Using pandas and geopandas, we merged the datasets on date, latitude, and longitude to form our primary dataset of ~6,600 rows (see data/processed_data/wells_data_gambia_for_machine_learning.csv).
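
The gap-filling and merge steps above can be sketched with pandas. This is a minimal example on made-up data, not the project's notebook code: the column names (well_id, date, lat, lon) and all values are hypothetical, and the exact-coordinate join assumes each feature was already sampled at the well coordinates. Gridded products that are not sampled at the wells would instead need a nearest-neighbor spatial join (for example geopandas.sjoin_nearest).

```python
import numpy as np
import pandas as pd

# Hypothetical well observations; one gap (NaN) in well W1's time series.
wells = pd.DataFrame({
    "well_id": ["W1", "W1", "W1", "W2"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01"]),
    "lat": [13.45, 13.45, 13.45, 13.30],
    "lon": [-16.58, -16.58, -16.58, -15.90],
    "GROUNDWATER_LEVEL": [5.2, np.nan, 5.6, 12.1],
})
# Hypothetical feature table already sampled at the same dates and coordinates.
precip = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-01"]),
    "lat": [13.45, 13.45, 13.45, 13.30],
    "lon": [-16.58, -16.58, -16.58, -15.90],
    "NASA_IMERG_Late": [0.0, 1.5, 0.3, 0.0],
})

# Fill gaps by linear interpolation within each well's own time series,
# so values never leak from one well into another.
wells = wells.sort_values(["well_id", "date"])
wells["GROUNDWATER_LEVEL"] = (
    wells.groupby("well_id")["GROUNDWATER_LEVEL"]
    .transform(lambda s: s.interpolate(method="linear"))
)

# Attach features to each well observation by date and coordinates.
merged = wells.merge(precip, on=["date", "lat", "lon"], how="left")
print(merged)
```

Grouping by well before interpolating matters: interpolating the whole column at once would blend the last reading of one well into the first reading of the next.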

Data Sources

  • Global Groundwater Information System (GGIS): An interactive portal by IGRAC that compiles data on global groundwater resources. We use it to access groundwater level data as well as data on hydrogeological regions.

  • British Geological Survey (BGS): A BGS research project on the resilience of African groundwater to climate change. We incorporate its depth-to-groundwater data, which classifies depth into 6 categories (0-7, 7-25, 25-50, 50-100, 100-250, and >250 meters). This is significantly lower in resolution and precision than our target variable, but still potentially useful.

  • Application for Extracting and Exploring Analysis Ready Samples (AρρEEARS): We used this tool to extract various parameters such as NDVI, MIR, EVI, Elevation, Curvature, Drainage Density, and Slope.

  • ClimateSERV: A tool by SERVIR, NASA, & USAID that provides climatic and vegetation data. We wrote a custom Python library (climateservAccess) for accessing the ClimateSERV API and used it to gather soil moisture, evapotranspiration, streamflow, and precipitation data.
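
Because the BGS depth-to-groundwater data is categorical, it has to be turned into a number before a regression model can use it. One option is an ordinal encoding, sketched below. This is an assumption for illustration: the project may have encoded this feature differently, and the exact label strings in the BGS file are hypothetical.

```python
import pandas as pd

# BGS depth-to-groundwater classes in metres, ordered shallowest to deepest.
# The label strings are assumed; match them to the actual file before use.
BGS_CLASSES = ["0-7", "7-25", "25-50", "50-100", "100-250", ">250"]

def encode_depth_class(labels: pd.Series) -> pd.Series:
    """Map BGS class labels to ordinal codes 0..5; unrecognized labels become -1."""
    cat = pd.Categorical(labels, categories=BGS_CLASSES, ordered=True)
    return pd.Series(cat.codes, index=labels.index, name="DepthToGroundwater")

codes = encode_depth_class(pd.Series(["7-25", ">250", "0-7", "unknown"]))
print(codes.tolist())
```

An ordinal code keeps the depth ordering without inventing a representative value for the open-ended ">250" class. The -1 code for unrecognized labels should be treated as missing before training.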

Features

Datatype | Description | Data Source | Resolution
LIS_Soil_Moisture_Combined | Soil Moisture | ClimateSERV/LIS | 3 km
LIS_Streamflow | Streamflow | ClimateSERV/LIS | 3 km
LIS_ET | Evapotranspiration | ClimateSERV/LIS | 3 km
MOD13Q1_061__250m_16_days_EVI | Enhanced Vegetation Index (EVI) | AρρEEARS/MODIS | 250 m
MOD13Q1_061__250m_16_days_MIR_reflectance | Mid-Infrared Reflectance | AρρEEARS/MODIS | 250 m
MOD13Q1_061__250m_16_days_NDVI | Normalized Difference Vegetation Index (NDVI) | AρρEEARS/MODIS | 250 m
NASA_IMERG_Late | Precipitation | ClimateSERV/IMERG | 10 km
DepthToGroundwater | Estimated Groundwater Level Range | BGS | 5 km
Curvatu_tif2 | Curvature | AρρEEARS/NASADEM | 30 m
Drainage_density | Drainage Density | AρρEEARS/NASADEM | 30 m
Slope_tif2 | Slope | AρρEEARS/NASADEM | 30 m
Hydrogeo | Hydrogeological Region | IGRAC/GGIS | N/A
NASADEM_HGT | Elevation | AρρEEARS/NASADEM | 30 m

Output

Datatype | Description | Data Source
GROUNDWATER_LEVEL | Groundwater Level | IGRAC/GGIS

Machine Learning

All relevant Jupyter Notebooks are located in notebooks/machine_learning.

  1. Model Selection and Training: First, we split the dataset by well ID, allocating 83% for training and 17% for testing, so that no well appears in both sets. This prevents information leaking from training into testing and gives a more honest estimate of performance at unseen locations. We trained 6 different regression models using scikit-learn: SVR, AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor, SGDRegressor, and LinearSVR. Limited computational resources kept us from testing more intensive models such as neural networks; with more powerful machines, these would be worth exploring.
  2. Model Evaluation: We assessed performance with Mean Squared Error (MSE), Mean Absolute Error (MAE), and the coefficient of determination (R²). Our best result (MAE = 2.6 m, R² = 0.42) came from LinearSVR.
  3. Model Optimization: We applied cross-validation and GridSearchCV for hyperparameter tuning, and combined LinearSVR with the Nystroem kernel approximation, which lets the linear model approximate a nonlinear kernel SVR at much lower computational cost.
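
The split-by-well, tuning, and evaluation workflow above can be sketched in scikit-learn. This runs on synthetic stand-in data; the data, the parameter grid, and the use of GroupShuffleSplit and GroupKFold to keep wells separate are assumptions for illustration, not the project's notebook code. Holding out 6 of 36 wells mirrors the 83%/17% split.

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, GroupKFold, GroupShuffleSplit
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVR

# Synthetic stand-in data (hypothetical): 36 wells x 20 rows, 13 features.
rng = np.random.default_rng(0)
n_wells, rows_per_well, n_features = 36, 20, 13
groups = np.repeat(np.arange(n_wells), rows_per_well)  # well ID for each row
X = rng.normal(size=(groups.size, n_features))
y = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=groups.size)

# Hold out 6 whole wells (~17%) so no well is in both train and test.
splitter = GroupShuffleSplit(n_splits=1, test_size=6, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups))

# Scale -> approximate RBF feature map (Nystroem) -> linear SVR.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("nystroem", Nystroem(kernel="rbf", random_state=0)),
    ("svr", LinearSVR(dual=True, max_iter=10_000, random_state=0)),
])
param_grid = {
    "nystroem__gamma": [0.01, 0.1],
    "nystroem__n_components": [50, 100],
    "svr__C": [0.1, 1.0, 10.0],
}
# Group-aware cross-validation within the training wells, selecting on MAE.
search = GridSearchCV(pipe, param_grid, cv=GroupKFold(n_splits=5),
                      scoring="neg_mean_absolute_error")
search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])

# Evaluate once on the held-out wells with the three metrics named above.
pred = search.predict(X[test_idx])
mae = mean_absolute_error(y[test_idx], pred)
mse = mean_squared_error(y[test_idx], pred)
r2 = r2_score(y[test_idx], pred)
print(f"best params: {search.best_params_}")
print(f"MAE={mae:.3f}  MSE={mse:.3f}  R2={r2:.3f}")
```

Passing the well IDs as groups to the cross-validator keeps the same leakage protection inside hyperparameter tuning, not only in the final train/test split.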

Visualization

  1. Visualization Data: We first defined an area of interest in QGIS and divided it into ~2,874 points, each representing a 500 m pixel. We then gathered feature data for each point (see data/final_dataset), processed and compiled it as before (see notebooks/gambia_dataset), and ran it through the LinearSVR model (see notebooks/gambia_dataset/LinearSVR_final_dataset.ipynb) to get predicted groundwater levels. Note: we used 500 m resolution only because of time constraints; higher resolutions would otherwise have been entirely feasible.
  2. Visualization Creation: With the model's predictions in hand, we used IDW (Inverse Distance Weighting) interpolation in QGIS to resample the results onto a finer grid of about 177 m and exported the data to a CSV file. We then uploaded the CSV to kepler.gl, built our interactive visualization, exported it to an HTML file, and customized it to create our Aqua Viva website.
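
The IDW step can be sketched in NumPy. This is a minimal illustration of the technique, not QGIS's implementation; it assumes coordinates are already in a projected CRS in metres, and the grid extents, spacings, and stand-in prediction values are hypothetical. A power of 2 is a common default for IDW.

```python
import numpy as np

def idw(known_xy, known_vals, query_xy, power=2.0, eps=1e-9):
    """Inverse Distance Weighting: each query value is a weighted mean of the
    known values, with weights proportional to 1 / distance**power."""
    # Pairwise distances between query and known points, shape (n_query, n_known)
    d = np.linalg.norm(query_xy[:, None, :] - known_xy[None, :, :], axis=-1)
    w = 1.0 / np.maximum(d, eps) ** power
    # A query point that coincides with a known point takes that point's value
    exact = d < eps
    w = np.where(exact.any(axis=1, keepdims=True), exact.astype(float), w)
    return (w @ known_vals) / w.sum(axis=1)

# Hypothetical 500 m prediction points (projected metres) and stand-in predictions
xs, ys = np.meshgrid(np.arange(0.0, 2000.0, 500.0), np.arange(0.0, 2000.0, 500.0))
coarse_xy = np.column_stack([xs.ravel(), ys.ravel()])
coarse_pred = 5.0 + 0.002 * coarse_xy[:, 0]  # stand-in for model output (m)

# Finer ~177 m target grid inside the same extent
fx, fy = np.meshgrid(np.arange(0.0, 1500.0, 177.0), np.arange(0.0, 1500.0, 177.0))
fine_xy = np.column_stack([fx.ravel(), fy.ravel()])
fine_pred = idw(coarse_xy, coarse_pred, fine_xy)
```

Because every output is a weighted average of the 500 m predictions, IDW smooths and resamples them; it does not add new information at the finer scale.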

Future Work

Given the enormous potential scale of this project, and the fact that we were just 4 people working on it for about a month, much remains to be done:

  • Model verification. Although our model was trained on the best open-source data we could find, that data was still limited (~6,600 data points across 36 wells). Despite our best efforts and what we believe to be reasonably accurate results, groundwater level is a very complex variable to predict, and this project would benefit greatly from more data to verify and improve the model.
  • Streamline model usage. This was only a rough first pass at the process of gathering feature data, running it through our model, and visualizing the results. An important next step would be a tool (perhaps a single Jupyter notebook) that streamlines this process and lets the user adjust parameters easily.
  • Time series data. Due to time constraints, we only visualized data for one day (December 1, 2023). Once model usage is streamlined, it will be much easier to visualize time series data, which would be very useful for evaluating changes in groundwater level over time.
  • Near real-time data. It is entirely possible to build a tool that automatically retrieves near real-time data, runs it through our model, and outputs data for visualization. Such a tool could be used for groundwater monitoring.
  • Expand area of interest. Again due to time constraints, we narrowed our focus to a smaller (but still high-impact) region of Gambia. With more time, it would be relatively straightforward to create a visualization for all of Gambia. We do not yet know whether the model can be extrapolated to other regions of the world, but it might succeed in regions with a biome similar to Gambia's. More work is needed to verify this.

Contributing

Whether you would like to help with the future work outlined above, add your own data or ML models, or share other ideas or suggestions, all contributions are welcome and encouraged! Simply fork the repo and create a pull request. You can also open an issue with the label "enhancement". Thanks in advance for your contributions, and feel free to contact us with questions!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Built With

Python, scikit-learn, TensorFlow, Jupyter, HTML, CSS, JavaScript, Pandas, NumPy, Matplotlib, Seaborn, SciPy, Geopandas, netCDF4, Xarray, climateservAccess, QGIS, Kepler.gl, ChatGPT, Copilot

License

Distributed under the MIT License. See LICENSE.txt for more information.

Contact

GMAIL LinkedIn LinkedIn

Acknowledgements

  • Thank you to TANGO (The Association of Non-Governmental Organizations in the Gambia) for their insights and for connecting us with community leaders in the Gambia
  • Thank you to Jun Yuan Zhang for his advice regarding groundwater level prediction

References
