
model_zoo's Introduction

Project BioPY


Project Live Demo Link
Deployed on Vercel

High-Level Summary

We have crafted a comprehensive, dynamic database that seamlessly integrates publicly available large-scale biomedical datasets, space biology datasets, and pertinent pre-trained models.

  1. Our database meticulously connects both terrestrial and space datasets with their associated pre-trained models.
  2. We've pinpointed the primary challenges in space biology and curated the most relevant datasets and models to address them.
  3. We've established a user-friendly website to help scientists apply these models for transfer learning and to encourage them to contribute their own.
  4. Building on this interactive platform and its dual-linkage database, we envision a dataset-model map in the future. This map will show how models and datasets relate to one another, giving researchers deeper insight and speeding up their work.
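
The dual-linkage idea in item 4 can be pictured as two indexes kept in sync, so that a lookup works from either side. The following is only a minimal sketch: the `LinkRegistry` class and its method names are hypothetical and do not reflect the project's actual schema. The dataset and model names are borrowed from the worked example later in this README.

```python
from collections import defaultdict


class LinkRegistry:
    """Hypothetical two-way index between datasets and pre-trained models."""

    def __init__(self):
        self._models_by_dataset = defaultdict(set)
        self._datasets_by_model = defaultdict(set)

    def link(self, dataset, model):
        # Record the relation in both directions so either side can be queried.
        self._models_by_dataset[dataset].add(model)
        self._datasets_by_model[model].add(dataset)

    def models_for(self, dataset):
        # Sorted list of models linked to a dataset (empty if none).
        return sorted(self._models_by_dataset.get(dataset, ()))

    def datasets_for(self, model):
        # Sorted list of datasets linked to a model (empty if none).
        return sorted(self._datasets_by_model.get(model, ()))


registry = LinkRegistry()
registry.link("OSD-427", "scBERT")
registry.link("OSD-480", "scBERT")

print(registry.datasets_for("scBERT"))
print(registry.models_for("OSD-427"))
```

Keeping both directions explicit makes "which models fit this dataset?" and "which datasets was this model used with?" equally cheap to answer, which is the property a dataset-model map would rely on.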

Deployment/Run locally

In order to run the frontend locally, you need to follow these steps.

  1. Clone the repository:
    $ git clone https://github.com/OWO1430/Model_zoo/
    
  2. Install the dependencies:
    $ npm install
    
  3. Fill in the required environment variables in the ".env" file, using ".env.example" as a template:
    GOOGLE_CLIENT_ID="Your ID"
    GOOGLE_CLIENT_SECRET="Your secret"
    NEXTAUTH_SECRET="Your secret"
    
    1. To get the Google client ID and secret, create OAuth credentials in the Google Cloud console.
    2. To generate the NEXTAUTH_SECRET, run the following command:
    $ openssl rand -base64 32
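
If `openssl` is not available on your machine, a value of the same shape (32 random bytes, base64-encoded) can be produced with Python's standard library. This is a sketch of the same idea rather than part of the project; the helper name `generate_secret` is made up for illustration.

```python
import base64
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically strong randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


secret = generate_secret()
# Print a line that can be pasted into the ".env" file.
print(f'NEXTAUTH_SECRET="{secret}"')
```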

Problems you may encounter

  1. Errors might occur when you click the "Network" button on the sidebar and the page compiles.

The cause is that the local development server uses only HTTP, not HTTPS.
Because we've linked external resources over HTTPS, errors might occur locally. This won't happen on cloud deployments.

External links/resources used

  1. The 3D graph on the "Network" page - Source

    We used it to demonstrate our future vision: an intuitive graph showing how the models and datasets in the database relate to each other.
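
As a rough illustration of the data behind such a graph, the snippet below turns dataset-model pairs into a nodes-and-links structure, a shape that many force-directed graph libraries accept. The field names (`id`, `group`, `source`, `target`) and the function `to_graph` are assumptions made for this sketch, not necessarily what the graph component on the "Network" page expects.

```python
import json


def to_graph(pairs):
    """Convert (dataset, model) pairs into a nodes/links dictionary."""
    nodes = {}
    links = []
    for dataset, model in pairs:
        # setdefault keeps the first entry, so each node appears only once.
        nodes.setdefault(dataset, {"id": dataset, "group": "dataset"})
        nodes.setdefault(model, {"id": model, "group": "model"})
        links.append({"source": dataset, "target": model})
    return {"nodes": list(nodes.values()), "links": links}


# Pairs taken from the worked example later in this README.
pairs = [("OSD-427", "scBERT"), ("OSD-480", "scBERT")]
graph = to_graph(pairs)
print(json.dumps(graph, indent=2))
```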


How do users use model zoo?

  1. Exploration and Search: Dr. Smith was conducting a space biology project to study the impact of microgravity on plant genes and needed a model to help classify gene patterns. She visited the Space Biology Model Zoo and searched for models related to plant genetics.

  2. View Model Details: She found a model called "Plant Genome Sequence Classifier - Microgravity Effects." The model card explains that it was trained on thousands of plant genome sequences and has been optimized to identify patterns related to microgravity effects.

  3. Download/Import Model: Dr. Smith downloaded the model's weights and configuration files. The Model Zoo provided direct download links and also offered code snippets in popular programming languages for the import process.

  4. Fine-Tuning: Although the model was promising, Dr. Smith had her own data collected from experiments. She decided to fine-tune the model on that data so that it would adapt to her experimental conditions.

  5. Deployment and Usage: After fine-tuning, she deployed the model in her genome analysis pipeline. This model successfully helped her classify gene patterns, accelerating her research.

  6. Feedback and Contribution: Several months later, Dr. Smith further improved the model. She contributed her version along with notes on its enhanced performance on specific plant species back to the Model Zoo.

These steps create a positive feedback loop for the scientific community, especially for researchers in space biology.

Which models should we add to the model zoo?

Models

Models trained on the TCGA (The Cancer Genome Atlas) could be a good starting point for genomic tasks. For imaging tasks, models trained on large-scale medical imaging datasets, such as those from the RSNA (Radiological Society of North America), could be adapted.

  1. DeepBind: Description: This deep learning model predicts DNA and RNA binding specificities for different proteins. Potential Application: Understanding protein-DNA/RNA interactions in space biology, especially under microgravity conditions, to study gene regulation.

  2. AlphaFold: Description: Developed by DeepMind, AlphaFold predicts the 3D structures of proteins based on their amino acid sequences. Potential Application: Predicting the structural changes of proteins that may be induced under space conditions, which can offer insights into functional alterations.

  3. EpiDeep: Description: This model predicts epigenomic features, such as histone modifications, from DNA sequences. Potential Application: Understanding the epigenetic landscape of organisms in space and how it might differ from terrestrial conditions.

  4. ResNet for Microscopy: Description: Residual networks (ResNets) that are fine-tuned for high-content microscopy images. Potential Application: Analyzing morphological changes in cells or tissues during spaceflight, including aspects like cell shape, organelle health, and cell-cell interactions.

  5. seq2seq for DNA Sequences: Description: Sequence-to-sequence models that can predict, for instance, potential coding sequences within DNA. Potential Application: Discovering new genes or regulatory elements that become active in space environments.


What are the corresponding datasets?

Microbiology

  • Open Source Repository: NCBI Microbiome Central - A collection of databases and tools designed to support the study of microbiomes.
  • Space Experiment Data: NASA GeneLab - Provides datasets from numerous space biology experiments. Some of these experiments have focused on the effects of space on microbial organisms, including bacteria and fungi.

Cell and Molecular Biology

  • Open Source Repository: GEO (Gene Expression Omnibus) - A public functional genomics data repository that archives and freely distributes microarray, next-generation sequencing, and other forms of high-throughput functional genomics data.
  • Space Experiment Data: NASA GeneLab - Contains datasets from various cell biology experiments conducted in space. For instance, studies on human cells to understand the impact of microgravity on cellular function.

Plant Biology

  • Open Source Repository: TAIR (The Arabidopsis Information Resource) - Provides a comprehensive collection of data and information on the genetics and molecular biology of the plant Arabidopsis thaliana.
  • Space Experiment Data: NASA GeneLab - Includes experiments that investigate the effects of spaceflight on different plant species. For example, how plants grow in microgravity or how space radiation affects plant genetics.

Animal Biology

  • Open Source Repository: Ensembl - Offers high-quality genome-wide sequence and annotation data for vertebrates and key model organisms.
  • Space Experiment Data: NASA GeneLab - Houses datasets from experiments on various animals, like rodents, sent to space. These studies can range from understanding bone density loss in microgravity to more complex behavioral studies.

Developmental, Reproductive and Evolutionary Biology

  • Open Source Repository: EvoDevoJ (Evolution & Development Journal) - While not a database in the traditional sense, this is a leading journal in the field of evolutionary developmental biology, and many articles provide supplemental data.
  • Space Experiment Data: NASA GeneLab - While it may not have a vast collection in this specific field, there are some datasets that explore how microgravity affects development, reproduction, and potentially evolutionary trajectories. For instance, studies might investigate how animals develop in space from embryo to maturity.

Microbiology

  • Human Microbiome Project (HMP) Data: A comprehensive resource that has sequences of microbial genomes found in the human body. HMP Dataset
  • IMG/M: The Integrated Microbial Genomes & Microbiomes system offers tools for the analysis of microbial community genomes. IMG/M

Cell and Molecular Biology

  • The Cancer Genome Atlas (TCGA): Detailed genomic information for over 30 types of cancer. TCGA Dataset
  • Gene Expression Omnibus (GEO): A public functional genomics data repository supporting MIAME-compliant data submissions. GEO

Plant Biology

  • The 1001 Genomes Project for Arabidopsis thaliana: Sequencing of over 1000 different strains of the model plant Arabidopsis. 1001 Genomes Dataset
  • Plant PhenomeNET: A dataset connecting phenotypic effect with gene function in plants. Plant PhenomeNET

Animal Biology

  • Mouse Genome Informatics (MGI): A comprehensive database on the genetics and genomics of the laboratory mouse. MGI
  • Zebrafish Model Organism Database (ZFIN): Provides integrated access to curated zebrafish genetic and genomic data. ZFIN

Developmental, Reproductive and Evolutionary Biology

  • FaceBase: Datasets aimed at studying craniofacial development and disorders. FaceBase
  • TreeBASE: A repository of phylogenetic information, specifically user-submitted phylogenetic trees and the data used to generate them. TreeBASE
  • Bgee: A database to retrieve and compare gene expression patterns in multiple animal species, produced from multiple data types such as RNA-seq, microarrays, and in situ hybridization. Bgee

How do users use a pre-trained model?

Earth-based models that correspond to GeneLab's data.

RNA-Seq data from plants to study gene expression changes in space:

  • DeepCount: A deep learning model for predicting gene expression levels based on sequence information.
  • D-GEX: Uses deep learning to predict gene expression across different conditions.
  • Transformer models like BERT and its variations have been adapted for biological sequences in tools like BioBERT or BioTransformers. Though they aren't pretrained on RNA-Seq data per se, they can be fine-tuned on such data.

Microbial gene expression data to study microbial behavior in space:

  • DeepMAsED: A deep learning-based method for differential expression analysis.
  • DRAGON: A deep learning model that can predict gene expression levels from the gene's regulatory region sequence.
  • Again, Transformer models adapted for biological sequences could be fine-tuned on microbial gene expression datasets.

Animal protein expression data to study protein synthesis changes in microgravity:

  • DeepProfile: Uses autoencoders to learn embeddings of gene expression profiles, which can be used for various downstream tasks.
  • DeepAffinity: Predicts protein-ligand affinity using convolutional neural networks.
  • AlphaFold: Though it's a model for protein structure prediction, it signifies how deep learning models can be used effectively for protein-related tasks. Fine-tuning a model like AlphaFold on protein expression data can provide meaningful embeddings or predictions.

Other image data and corresponding pre-trained models

  1. Microscopy of Cellular Structures: Observing cells in space can reveal how microgravity affects cellular structure and function. For instance, observing changes in the cytoskeleton of cells can provide insights into how cells sense and adapt to microgravity.

  2. Bone Densitometry: Astronauts in space undergo bone density loss. Imaging the bone over time using densitometry can help in understanding the rate of bone degradation and the efficacy of countermeasures.

  3. MRI Scans of Astronauts' Brains: Some studies have indicated changes in astronauts' brain structures after prolonged spaceflight. MRI scans can help in mapping these changes and understanding their implications.

  4. Optical Coherence Tomography (OCT) for Eye Health: Extended space missions can affect eye health. OCT provides detailed images of the retina, helping in monitoring the health of an astronaut's eyes over time.

  5. Biofilm Formation: Microorganisms in space have been observed to form biofilms differently than on Earth. Observing these structures can help understand microbial behavior in space.

  6. Plant Growth Patterns: Microgravity affects how plants grow. Imaging the growth patterns can provide insights into plant behavior in space, crucial for potential long-term space missions where plants might be used for food and oxygen.

  • Convolutional Neural Networks (CNNs):

  • VGG (VGG16, VGG19): These are excellent for basic image classification tasks and can be fine-tuned for specific space biology imaging data.

  • ResNet (ResNet50, ResNet101): These have deeper architectures and can capture more complex patterns in images.

  • InceptionV3: Known for its efficiency and high performance in image classification.

  • U-Nets: Particularly useful for segmentation tasks, such as segmenting specific cellular structures in microscopy images.

An example

  1. Goal & Hypothesis: The space biologist aims to decipher how specific plant genes react to the microgravity conditions in space. She hypothesizes that certain genes play a pivotal role in plant adaptation to space and may be responsible for observed changes in growth or health.
  2. Data Collection: She begins with the Arabidopsis thaliana datasets OSD-427 and OSD-480 from NASA GeneLab which have RNA-Seq data of the plant in microgravity. She also has her own RNA-Seq data from a similar experiment she conducted recently.
  3. Pre-trained Model Exploration: On browsing the model zoo, she identifies a promising model from 2022 named scBERT specifically designed for RNA-Seq data. The model has been pre-trained on a vast array of Earth-based RNA-Seq datasets, making it adept at capturing the nuances of gene expression data.
  4. Data Preprocessing: Before using scBERT, she preprocesses the RNA-Seq data: she aligns and quantifies the sequences, normalizes gene expression values, and handles missing data.
  5. Transfer Learning with scBERT: She loads the scBERT model and fine-tunes it using her space-based RNA-Seq datasets: The model is trained on OSD-427, OSD-480, and her experiment data. During training, she adjusts the model's parameters slightly to adapt its knowledge to the specifics of microgravity-based gene expressions.
  6. Results & Interpretation: Once training is completed, she utilizes the fine-tuned scBERT model to: Identify genes that have significantly altered expression in space. Understand the potential biological pathways impacted by these genes. Determine if any of these genes are associated with stress responses, growth patterns, or other vital processes in the plant.
  7. Contribute: The scientist uploads her fine-tuned model to the model zoo.
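
Steps 4 and 6 can be made a little more concrete with a toy calculation. The sketch below normalizes raw read counts to counts per million (CPM), treats missing values as zero, and ranks genes by log2 fold change between a ground sample and a flight sample. Everything here is an assumption for illustration: the gene names and counts are invented placeholders, the zero-fill policy and pseudocount are arbitrary choices, and a real analysis would use replicates and a proper statistical test rather than a single ratio.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts so each sample sums to one million."""
    # Assumption: a missing measurement (None) is treated as zero reads.
    filled = {gene: (value or 0) for gene, value in counts.items()}
    total = sum(filled.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: value * 1_000_000 / total for gene, value in filled.items()}


def log2_fold_change(flight, ground, pseudocount=1.0):
    """log2(flight / ground) per gene; the pseudocount avoids dividing by zero."""
    return {
        gene: math.log2((flight[gene] + pseudocount) / (ground[gene] + pseudocount))
        for gene in ground
    }


# Invented counts for three placeholder genes.
ground_raw = {"gene_a": 100, "gene_b": 300, "gene_c": 600}
flight_raw = {"gene_a": 400, "gene_b": None, "gene_c": 600}

ground = counts_per_million(ground_raw)
flight = counts_per_million(flight_raw)
changes = log2_fold_change(flight, ground)

# List genes with the largest absolute change first.
for gene in sorted(changes, key=lambda g: abs(changes[g]), reverse=True):
    print(f"{gene}: {changes[gene]:+.2f}")
```

Normalizing before comparing matters because samples are rarely sequenced to the same depth; without it, a gene could appear to change simply because one sample has more reads overall.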

Aim:

  1. To design a comprehensive database of publicly available biomedical datasets that could be used to pretrain different models for a “model zoo,” and
  2. To determine relevant publicly available space biology datasets that could then be used to refine the models to investigate specific space biology questions.
