
gsoc's Introduction

Google Summer of Code


Announcement: GSoC 2024 Project Ideas Needed!

IOOS is looking for new project ideas for GSoC 2024! GSoC Organization applications are due February 6, 2024 - we need a strong list of projects and mentors by this deadline to reapply for 2024.

If you are interested in mentoring a student during this year's GSoC program, please submit your project idea below:

2024 Mentors: submit your project idea here

Questions about the GSoC mentoring experience can be directed to any of IOOS' former GSoC mentors - see Past IOOS GSoC Projects for all of our former projects, students, and mentors - or posted to the 'ioos_tech' mailing list.

U.S. IOOS® - Eyes on the Ocean, Coasts, and Great Lakes™

Our Mission

To produce, integrate, and communicate high quality ocean, coastal and Great Lakes information that meets the safety, economic, and stewardship needs of the Nation.

The U.S. Integrated Ocean Observing System (IOOS®) is a national-regional partnership working to provide new tools and forecasts to improve safety, enhance the economy, and protect our environment. Integrated ocean information is available in near real time, as well as retrospectively. Easier and better access to this information is improving our ability to understand and predict coastal events - such as storms, wave heights, and sea level change. Such knowledge is needed for everything from retail to development planning.

For more information about IOOS visit: https://ioos.noaa.gov.

Our Community

The IOOS Data Management and Cyberinfrastructure (DMAC) community is a diverse and inclusive group of scientific software developers focused on developing and supporting open source software to deliver oceanographic and other earth science data to users worldwide.

Events

IOOS hosts annual DMAC data management meetings, code sprints (2019, 2022), monthly technical webinars, and maintains virtual communications through our 'ioos_tech' mailing list and other platforms such as Slack and GitHub.

2024 IOOS Code Sprint - Washington, DC: IOOS will be hosting a Code Sprint in Washington, DC (May 21 - 23, 2024), providing a great opportunity for mentors and GSoC students to have some face time to work together in person. Some travel grants/reimbursements will be available. Mark your calendars!

U.S. IOOS and Google Summer of Code

IOOS has participated as a GSoC mentoring organization during GSoC 2021 and 2022.

Past IOOS GSoC Projects

IOOS' project ideas list from the GSoC 2023 application is available here.

gsoc's People

Contributors

@7yl4r, @leewujung, @lsetiawan, @mathewbiddle, @melissazweng, @mwengren, @ocefpaf

gsoc's Issues

Machine Learning with Sea Floor Sampling Video

Project Description:

Traditionally, surveys of the sea floor are conducted via vessel-mounted cameras which record video as the vessel moves through the water. Hundreds of hours of video are recorded and are often manually processed to determine which species are present at the locations in the video. This project seeks to automate that process using image processing. The intern will help prepare machine learning models, such as artificial neural networks, using available video footage from benthic surveying missions. The intern will partner with biology staff and software staff to train the model and perform data validation.
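
The following is a minimal, illustrative sketch of one way a frame-classification loop could look, using OpenCV to sample frames and a pretrained torchvision backbone as a stand-in for the model the intern would train; the video file name and the use of ResNet-18 are assumptions for demonstration only, not project requirements.

```python
import cv2
import torch
from torchvision import models, transforms

# Standard ImageNet preprocessing; a trained benthic-species model would
# define its own transforms and label set.
preprocess = transforms.Compose([
    transforms.ToPILImage(),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Pretrained backbone used only as a placeholder; in practice the final layer
# would be re-trained on labeled frames from benthic survey footage.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.eval()

cap = cv2.VideoCapture("benthic_transect.mp4")  # hypothetical input video
fps = int(cap.get(cv2.CAP_PROP_FPS)) or 30
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % fps == 0:  # sample roughly one frame per second
        rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        with torch.no_grad():
            scores = model(preprocess(rgb).unsqueeze(0))
        print(frame_idx, int(scores.argmax()))
    frame_idx += 1
cap.release()
```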

Expected Outcomes: A capable machine learning model which can be used to identify species from video transect data of the sea floor.

Skills required:

Familiarity with a programming language (Python, R) and a general understanding of how machine learning models operate. Experience with image processing is a bonus.

Difficulty:

Moderately difficult

Mentor(s):

Dalton Kell (@daltonkell, Software Engineer), Matt Iannucci (Software Engineer), Tara Franey (GIS Specialist), Stephanie Berkman (Biologist), Joe Zottoli (Biologist)

Guidance regarding Machine Learning with Sea Floor Sampling Video for GSoC'21

I am a pre-final-year Computer Engineering student at Pune Institute of Computer Technology. My CGPA is 9.5/10 for the first four semesters. I am excited about advances in Machine Learning and have been exploring this field lately. I am interested in this project and have read the abstract provided on the GSoC'21 projects page. I would like more insight into what is expected for the project and would be glad to get your guidance on how to proceed further.
Regards,
Kunal Shah.

Update instructions to point folks to template issue

I created a template for proposing GSoC projects. Maybe it's not a great idea, but I wanted to give it a shot. I was thinking folks could put their project proposals in as tickets to this repo, instead of having them spread around different places.

SDM toolkit

Project Description:

Develop packages that enable seamless Species Distribution Model (SDM) creation, for people with zero programming experience, using openly available taxonomic occurrence data pulled from OBIS, GBIF, and other DwC-compliant, FAIR databases.
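
As a rough illustration of the workflow such a toolkit would wrap, the sketch below fits a presence/background classifier on synthetic environmental covariates with scikit-learn; the covariate values, sample sizes, and choice of a random forest are assumptions for demonstration only - real usage would pull occurrences from OBIS/GBIF and covariates from gridded environmental products.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic environmental covariates (e.g., temperature, depth) at presence
# points and randomly sampled background points.
presence_env = rng.normal(loc=[26.0, -30.0], scale=[1.0, 10.0], size=(200, 2))
background_env = rng.normal(loc=[20.0, -200.0], scale=[4.0, 150.0], size=(1000, 2))

X = np.vstack([presence_env, background_env])
y = np.concatenate([np.ones(len(presence_env)), np.zeros(len(background_env))])

# A random forest stands in here for dedicated SDM frameworks (e.g., MaxEnt).
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Predicted habitat suitability for two hypothetical sets of conditions.
new_sites = np.array([[25.5, -40.0], [18.0, -500.0]])
print(model.predict_proba(new_sites)[:, 1])
```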

Expected Outcomes:

The output from this project will be an R or Python package (or a combination of these programming languages) enabling users to run SDMs for multiple taxonomic groups and species using records available on open-access databases (OBIS, GBIF, EcoTaxa, and others).

Skills required:

Python, Jupyter Notebooks, Species Distribution Modeling Frameworks, data schema (ideally Darwin Core)

Mentor(s):

Tylar Murray (@7yl4r), Enrique Montes (@eqmh), Mat Biddle (@MathewBiddle), Ayush Anand (@ayushanand18)

Expected Project Size (175 or 350 hours):

350 hours

Difficulty:

Medium

Enhance IOOS colocate library for web-based query, preview and download of oceanographic data in ERDDAP

Project Description:

The ioos/colocate project started in OceanHackWeek 2019 as a way to find similar (colocated in space and time) oceanographic data hosted in ERDDAPs worldwide - see Awesome ERDDAP's server list for an idea of ERDDAP's popularity. The initial version was intended to run as a lightweight UI in a Jupyter Notebook environment, but it need not remain notebook-based. A number of enhancements can be made from the existing code base to extend colocate's capabilities:

  • improve the query functionality and performance of colocate's ERDDAP query implementation, including updating the erddapy API dependency and/or implementing better parallelism in the query algorithms (see the erddapy sketch below)
  • develop a standalone/webapp-like version to run via Panel or PyScript
  • allow interactive preview and/or download of ERDDAP datasets returned via user query (e.g. generate a timeseries plot for CF timeSeries or timeSeriesProfile datasets, scatter or profile plot for trajectory datasets)
  • build a colocate extension for Jupyter Lab

See the following issue in ioos/colocate for more info about the project and to get started: ioos/colocate#29.
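
For orientation, here is a minimal erddapy query sketch of the kind colocate builds on; the server and dataset (NOAA CoastWatch's cwwcNDBCMet) are commonly used public examples rather than part of the project specification, and may need adjusting.

```python
from erddapy import ERDDAP

# Query a tabledap dataset on a public ERDDAP server.
e = ERDDAP(server="https://coastwatch.pfeg.noaa.gov/erddap", protocol="tabledap")
e.dataset_id = "cwwcNDBCMet"  # example NDBC meteorological dataset
e.variables = ["station", "time", "latitude", "longitude", "wtmp"]
e.constraints = {
    "time>=": "2023-01-01T00:00:00Z",
    "time<=": "2023-01-02T00:00:00Z",
}

# Download the matching records into a pandas DataFrame.
df = e.to_pandas()
print(df.head())
```

colocate wraps queries like this across many servers at once, which is where the parallelism and performance work comes in.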

Expected Outcomes:

The colocate package is a useful tool for lightweight discovery of in situ oceanographic data. IOOS sees benefit in developing tools to assist users in searching, previewing, and downloading oceanographic data from the constellation of ERDDAP servers operated worldwide. Any or all of the enhancements above will help to bring IOOS' and other organizations' oceanographic data to the fingertips (and data analysis environments) of scientists, data analysts, and general users worldwide.

Skills required:

Python, erddapy/ERDDAP, Panel/HoloViz/DataShader, PyScript, Jupyter Notebook, Jupyter Lab; familiarity with netCDF and oceanographic data standards, including the Climate and Forecast (CF) conventions, is helpful but not required - a willingness to learn these data types is enough

Mentor(s):

Micah Wengren (@mwengren), Mathew Biddle (@MathewBiddle), Filipe Fernandes (@ocefpaf)

Expected Project Size (175 or 350 hours):

175 or 350 hours

Difficulty:

Medium/Hard

Echopype: Upgrade robustness and scalability of ocean sonar data processing

Project Description

Echosounders, or high-frequency ocean sonar systems, are the workhorse for studying life in the ocean. They provide continuous observations of fish and zooplankton by transmitting sounds and analyzing the echoes bounced off these animals, much like how medical ultrasound images the interior of the human body. In recent years echosounders have been widely deployed on ships, autonomous vehicles, and moorings, bringing in significant volumes of data that allow scientists to study rapidly changing marine ecosystems. This project aims to upgrade the robustness and scalability of the Echopype package, which standardizes data from different echosounder instruments into widely accessible netCDF or Zarr files. The project work will focus on making the Echopype testing suite more robust by overhauling its Continuous Integration (CI) mechanisms and on tackling distributed computing bottlenecks in processing irregularly spaced echosounder data across computing agents.
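
A hedged sketch of the conversion path this project would harden is shown below; the file name and sonar model are placeholders, and keyword names may differ slightly between Echopype versions.

```python
import echopype as ep

# Parse a raw instrument file into the standardized EchoData object.
ed = ep.open_raw("survey_file.raw", sonar_model="EK60")  # hypothetical file

# Serialize to a cloud-friendly Zarr store (netCDF output is also supported).
ed.to_zarr("survey_file.zarr", overwrite=True)

# Downstream processing, e.g. calibrated volume backscattering strength (Sv),
# is where Dask-backed distributed computing becomes important for large data.
ds_Sv = ep.calibrate.compute_Sv(ed)
print(ds_Sv)
```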

Expected Outcomes

  • Robust Continuous Integration mechanisms that utilize GitHub release assets for hosting test files
  • Increased test coverage for foundational data conversion functions
  • Improved distributed computing performance for major processing functions on large (100s of GB) data sets

Skills required

Python; Libraries: Xarray, Dask, Zarr; Interests in working with oceanographic, acoustic or geospatial data

Mentor(s)

Wu-Jung Lee (@leewujung), Valentina Staneva (@valentina-s)

Expected Project Size

175 or 350

What is the difficulty of the project?

Intermediate

"Big Gridded Data": Distributed Cloud Storage for Physical Oceanography Data

Project Description:

Storing highly voluminous, high-dimensional data has always presented challenges, and while hardware advancements have eased some of the burden, software remains the critical component in data management systems. This project will explore burgeoning solutions in the big-data realm for storing massive volumes of high-dimensional numeric data across distributed cloud platforms. Participants will examine tradeoffs between technologies and develop a deeper understanding of how new data storage and access solutions may be implemented in the oceanography industry.
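
As one illustration of the kind of storage layout such an analysis would weigh, the sketch below writes a chunked, multi-dimensional xarray Dataset to a Zarr store (Dask is assumed for chunking); the variable, dimensions, and bucket path are hypothetical, and Zarr is only one of the candidate technologies to compare.

```python
import numpy as np
import xarray as xr

# A synthetic 3-D field standing in for gridded physical oceanography output.
ds = xr.Dataset(
    {"sea_water_temperature": (("time", "lat", "lon"),
                               np.random.rand(240, 180, 360).astype("float32"))},
    coords={"time": np.arange(240),
            "lat": np.linspace(-89.5, 89.5, 180),
            "lon": np.linspace(-179.5, 179.5, 360)},
)

# Chunking controls how the array is split into independently readable objects,
# which drives both cost and access performance on distributed object storage.
ds = ds.chunk({"time": 24, "lat": 90, "lon": 90})

# Local path shown here; an fsspec URL such as "s3://my-bucket/sst.zarr"
# (hypothetical) would target cloud object storage instead.
ds.to_zarr("sst_demo.zarr", mode="w")
```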

Expected Outcomes:

A software cost-benefit analysis of data storage and access scenarios.

Skills required:

Familiarity with Linux/UNIX operating systems and a working knowledge of Python and C/C++. An understanding of basic database architecture is a plus.

Difficulty:

Moderately difficult

Mentor(s):

Dalton Kell (@daltonkell, Software Engineer), Ben Adams (@benjwadams, Software Engineer)

Plankton Imagery Processing

Project Description:

The SE Marine Biodiversity Observation Network program (SE MBON) is collecting tens of thousands of plankton images during oceanographic surveys across south Florida that must become publicly available on several open-access repositories like OBIS, GBIF, and EcoTaxa. To do this, an automated system is needed for aligning taxonomic records extracted from imagery, and their associated metadata, to the Darwin Core schema. Plankton imagery is collected from more than 80 stations during bi-monthly cruises in south Florida waters with a Continuous Particle Imaging and Classification System (an underwater microscope) mounted on a CTD rosette.
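
A minimal sketch of the core alignment step is shown below: renaming classifier/instrument output columns onto standard Darwin Core occurrence terms with pandas. The input column names and the occurrenceID scheme are hypothetical; the Darwin Core terms themselves are standard.

```python
import pandas as pd

# Hypothetical classifier output for two plankton images.
raw = pd.DataFrame({
    "image_id": ["IMG_0001", "IMG_0002"],
    "predicted_taxon": ["Tripos furca", "Copepoda"],
    "capture_time_utc": ["2023-05-04T14:22:10Z", "2023-05-04T14:22:12Z"],
    "lat": [24.55, 24.55],
    "lon": [-81.75, -81.75],
})

# Map columns onto Darwin Core occurrence terms.
dwc = raw.rename(columns={
    "predicted_taxon": "scientificName",
    "capture_time_utc": "eventDate",
    "lat": "decimalLatitude",
    "lon": "decimalLongitude",
})
dwc["occurrenceID"] = "SEMBON:" + raw["image_id"]   # hypothetical ID scheme
dwc["basisOfRecord"] = "MachineObservation"
dwc["occurrenceStatus"] = "present"

dwc.drop(columns=["image_id"]).to_csv("occurrence.csv", index=False)
```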

Expected Outcomes:

The output from this project will be an R or Python package (or a combination of these programming languages) enabling users to align image-based taxonomic data and metadata to the Darwin Core standard and publish the data in the aforementioned data repositories.

Skills required:

Python, R, experience with machine-learning-based image classification methods, knowledge of Darwin Core, image-upload API

Mentor(s):

Tylar Murray (@7yl4r), Enrique Montes (@eqmh), Mat Biddle (@MathewBiddle)

Expected Project Size (175 or 350 hours):

175 hours

Difficulty:

Medium

Making ocean biodiversity data easily accessible with python (pyobis revamp)

Project Description:

The Ocean Biodiversity Information System (OBIS) is a global open-access data and information clearing-house on marine biodiversity for science, conservation and sustainable development. OBIS harvests occurrence records from thousands of datasets and makes them available as a single integrated dataset via various services including the OBIS API. This project is intended to update the existing pyobis python package to use the new OBIS API and ensure the package is usable for product generation in the future.
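
For context, the sketch below hits the OBIS v3 occurrence endpoint directly with requests - this is the kind of call pyobis wraps; the species name and page size are arbitrary examples.

```python
import requests

resp = requests.get(
    "https://api.obis.org/v3/occurrence",
    params={"scientificname": "Mola mola", "size": 10},
    timeout=30,
)
resp.raise_for_status()

for rec in resp.json().get("results", []):
    print(rec.get("scientificName"), rec.get("eventDate"),
          rec.get("decimalLatitude"), rec.get("decimalLongitude"))
```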

Expected Outcomes:

  • Update pyobis to use the new OBIS API.
  • Make the mof response more efficient.
  • Create tutorial usage examples for each module.
  • Clarify the listing of GBIF instead of OBIS in the package description/references.
  • Increase test coverage.
  • Use CI to push releases to PyPI.
  • Merge in the URL changes done on the forks.
  • Create a Jupyter notebook with a demo to analyze/visualize data grabbed using this package.

Skills required:

  • Python
  • Some knowledge/use of Web Application Programming Interfaces (API)

Difficulty:

  • Moderate

Mentor(s):

Tylar Murray (@7yl4r), Filipe Fernandes (@ocefpaf), Mathew Biddle (@MathewBiddle)

Efficient access to IOOS data in the cloud


Project Description:

Want to spend your summer getting paid by Google to improve our ability to work with climate, weather and remote sensing data in the Cloud? Come join us to work on Kerchunk, a Python package that turbocharges the world's most common scientific data formats, allowing efficient, parallel, cloud-native access!

Much of IOOS data, especially that from models, is in NetCDF format. Currently it's served primarily through ERDDAP and THREDDS, but in the best interests of open science, IOOS data could be made available on the Cloud, and services like ERDDAP and THREDDS layered on top if necessary.

While new cloud-performant formats like Zarr have been created to represent the NetCDF Data Model, it has been shown that NetCDF files themselves can be made cloud-performant by creating a JSON reference file that makes a collection of NetCDF files readable by the Zarr library.

There is a package that assists in the creation of these JSON files, called Kerchunk. It currently reads collections of NetCDF4 and GRIB2 files but could be expanded to cover a wider array of formats, including NetCDF3, a common format used in IOOS.

The student will work with Kerchunk to expand its capabilities and develop pipelines that convert massive collections of forecast and remote sensing data on the Cloud into virtual Zarr datasets that can be used efficiently and effectively in Python-based workflows, for example, https://registry.opendata.aws/ecmwf-era5/.

See this Medium blog post for a description of this powerful unifying approach to handling scientific data in the Cloud: https://medium.com/pangeo/cloud-performant-netcdf4-hdf5-with-zarr-fsspec-and-intake-3d3a3e7cb935
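
Below is a hedged sketch of that workflow: build a Kerchunk reference for a single NetCDF4/HDF5 file and open it lazily through the Zarr machinery. The S3 object and output file names are placeholders; storage options would be adapted to the bucket actually targeted.

```python
import json

import fsspec
import xarray as xr
from kerchunk.hdf import SingleHdf5ToZarr

url = "s3://example-bucket/model_output.nc"  # hypothetical cloud object
so = {"anon": True}                          # fsspec storage options

# Scan the file once and write the byte-range references to JSON.
with fsspec.open(url, "rb", **so) as f:
    refs = SingleHdf5ToZarr(f, url, inline_threshold=300).translate()
with open("model_output.json", "w") as out:
    json.dump(refs, out)

# Read the original NetCDF file lazily via the reference file and Zarr.
mapper = fsspec.get_mapper(
    "reference://", fo="model_output.json",
    remote_protocol="s3", remote_options=so,
)
ds = xr.open_dataset(mapper, engine="zarr", backend_kwargs={"consolidated": False})
print(ds)
```

Extending this pattern to NetCDF3 and to large multi-file collections (e.g., via Kerchunk's combine utilities) is the kind of work the project targets.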

Expected Outcomes:

The GSoC student would work with mentors to extend the current code base and to generate Jupyter notebooks, documentation, and blog posts that demonstrate the new capabilities added and the workflows generated. All work will be done on GitHub, and weekly virtual meetings will take place with mentors.

Skills required:

Python

Difficulty:

Moderate

Mentor(s):

@rsignell-usgs (research oceanographer, USGS)

@martindurant (professional open-source software developer, Anaconda, Inc)

update README table

The table in the readme reflects last year's projects. We should update it once we close #8.

Interactive visualization of cloud-hosted ocean sonar data

Project Description:

Ocean sonar systems, such as echosounders, are the workhorse for studying life in the ocean. They provide continuous observations of fish and zooplankton by transmitting sounds and analyzing the echoes bounced off these animals, much like how medical ultrasound images the interior of the human body. In recent years these systems have been widely deployed on ships, autonomous vehicles, and moorings, bringing in large volumes of data that allow scientists to study changes in the marine ecosystem.

This project aims to create the capability to interactively visualize large, cloud-based ocean sonar data to accelerate data exploration and discovery. Developments in this project will go hand-in-hand with ongoing development of the echopype package that handles the standardization, pre-processing, and organization of these data.
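
As a taste of the HoloViz building blocks echoshader would package up, the sketch below renders a synthetic echogram with hvplot; the variable names and dimensions are assumptions modeled loosely on echopype-style Sv output.

```python
import numpy as np
import xarray as xr
import hvplot.xarray  # noqa: F401  (registers the .hvplot accessor)

# Synthetic Sv values on ping_time x depth, standing in for processed sonar data.
ping_time = np.arange("2023-08-01T00:00", "2023-08-01T01:00", dtype="datetime64[s]")
depth = np.linspace(0, 200, 250)
Sv = xr.DataArray(
    -70 + 10 * np.random.rand(depth.size, ping_time.size),
    coords={"depth": depth, "ping_time": ping_time},
    dims=("depth", "ping_time"),
    name="Sv",
)

# Interactive echogram; in a notebook, displaying `echogram` gives pan/zoom,
# and the same object can be placed in a Panel dashboard.
echogram = Sv.hvplot.image(
    x="ping_time", y="depth", cmap="viridis", flip_yaxis=True, clabel="Sv (dB)"
)
```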

Expected Outcomes:

The GSoC contributor will work with mentors to develop a new package, echoshader, that provides core ocean sonar data visualization functionality based on the HoloViz suite of tools, test configurations for using echoshader widgets in Panel dashboards, and create Jupyter notebooks demonstrating how the tools work together.

Skills required:

  • Python
  • Interest in working with oceanographic, acoustic and geospatial data

Bonus skills:

  • Cloud computing
  • Visualization

Project Size:

  • 175 or 350 h

Difficulty

  • Moderate

Mentor(s):
Wu-Jung Lee (@leewujung), Emilio Mayorga (@emiliom), Valentina Staneva (@valentina-s), Landung "Don" Setiawan (@lsetiawan), Brandon Reyes (@b-reyes)

Cloud-optimized package for interactive visualization of ocean sonar data

Project Description:

Ocean sonar systems, such as echosounders, are the workhorse for studying life in the ocean. They provide continuous observations of fish and zooplankton by transmitting sounds and analyzing the echoes bounced off these animals, much like how medical ultrasound images the interior of the human body. In recent years these systems have been widely deployed on ships, autonomous vehicles, and moorings, bringing in large volumes of data that allow scientists to study changes in the marine ecosystem.

echoshader (repo, docs) is an open source Python package developed by GSoC'22 contributor Dingrei Lei (@ldr426) to facilitate ocean sonar data visualization. It currently contains the basic building blocks for a variety of visualizations of ocean sonar data, such as echogram plots, ship cruise maps, and 3D curtain renderings. The goal of this GSoC project is to develop and test a robust cloud deployment of the package and to demonstrate its capabilities on several types of datasets (a minimal cloud-access sketch follows the objectives below).

The objectives are:

  • test and optimize the toolbox for large datasets hosted on the cloud
  • create a cloud-based demo of the functionalities
  • improve code structure
  • add functionality to annotate and visualize regions of interest
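
Here is the minimal cloud-access sketch referred to above, assuming (hypothetically) a processed Sv dataset already published as a Zarr store in object storage; the bucket, store name, and variable/dimension names are placeholders.

```python
import fsspec
import xarray as xr

# Lazy, chunk-aware open of a Zarr store directly from object storage;
# nothing is downloaded until values are actually needed.
store = fsspec.get_mapper("s3://example-sonar-archive/cruise01_Sv.zarr", anon=True)
ds = xr.open_zarr(store)

# Subset before computing so only the required chunks are transferred.
subset = ds["Sv"].sel(ping_time=slice("2023-08-01", "2023-08-02"))
print(subset)
```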

Expected Outcomes:

  • A functional demo of the software on the cloud
  • Software components that are easy for researchers to reuse

Skills required:

  • Python, object oriented programming, cloud computing
  • Interest in working with oceanographic, acoustic and geospatial data

Bonus skills:

  • Visualization
  • UX design
  • Work with large datasets

Mentor(s):

Wu-Jung Lee (@leewujung), Valentina Staneva (@valentina-s)

Expected Project Size (175 or 350 hours):

175 or 350 h

Difficulty:

Medium

Add links to student project summaries after each GSOC year

We should at a minimum link to the summary report/page that each student submits at the end of their projects.

Ideally, make a GH pages site or some other way to present these in a compelling way. If that's too complicated, we can just make a markdown page and list them out with links.

Hopefully, most of the places the students posted their reports are persistent (?)
