
epigenetic-modeling-for-cancer-detection's Introduction

Cancer Analysis Pipeline

This repository contains a pipeline for analyzing next-generation sequencing data from cancer and control patients. The pipeline is a sequence of scripts that cover each stage, from data preprocessing to neural network-based analysis. The analysis is memory-intensive, so your computer may not have enough memory to process all the data.

Function Descriptions

1. metadata_treat.py

This script reads the SraRunTable.txt metadata file, which can be obtained through the NCBI SRA Run Selector, and creates a dataset containing information about control and cancer patients, along with their SRA run names.
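
The split into cancer and control runs could be sketched as follows. This is a minimal illustration, not the code in metadata_treat.py: it assumes the metadata file is comma-separated with a `Run` column, and it uses a hypothetical `disease_state` column whose values contain the word "control" for control patients. The real column name and labels depend on the study, and the accessions below are made up.

```python
import csv
import io

def split_runs(metadata_file, label_column="disease_state"):
    """Split SRA run names into (cancer_runs, control_runs).

    `label_column` and the "control" keyword are assumptions; adjust them
    to the columns actually present in your SraRunTable.txt.
    """
    cancer_runs, control_runs = [], []
    for row in csv.DictReader(metadata_file):
        label = row[label_column].strip().lower()
        # Anything not labelled as control is treated as a cancer sample.
        target = control_runs if "control" in label else cancer_runs
        target.append(row["Run"])
    return cancer_runs, control_runs

# Tiny in-memory stand-in for SraRunTable.txt (made-up accessions).
example = io.StringIO(
    "Run,disease_state\n"
    "SRR0000001,breast cancer\n"
    "SRR0000002,healthy control\n"
)
cancer_runs, control_runs = split_runs(example)
print(cancer_runs, control_runs)
```

For a real file, pass `open("SraRunTable.txt", newline="")` instead of the in-memory example.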

2. sra_script.py

Using the SRA run names from the dataset generated by metadata_treat.py, this script downloads the SRA run data, aligns it to the human hg38 genome, and generates BED files. To process the df_control dataset, replace "cancer" with "control" in the script.
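
The download, alignment, and BED conversion steps could be chained roughly as shown below. This is a sketch under several assumptions that the repository does not confirm: reads are single-end, `fasterq-dump` (from sra-tools) is used for the download, `bwa mem` is the aligner, and the reference `hg38.fa` has already been indexed with `bwa index`. The function and path names are hypothetical. The code only builds the shell commands so they can be reviewed before anything is run.

```python
import shlex

def build_commands(run, group="cancer", genome="hg38.fa"):
    """Return the shell commands for one SRA run (dry run, nothing is executed).

    `group` is "cancer" or "control" and is used as the output directory.
    The aligner (bwa mem) and file layout are assumptions for illustration.
    """
    q = shlex.quote
    fastq = f"{group}/{run}.fastq"       # single-end output of fasterq-dump
    bam = f"{group}/{run}.sorted.bam"
    bed = f"{group}/{run}.bed"
    return [
        # 1. Download the run and convert it to FASTQ.
        f"fasterq-dump {q(run)} --outdir {q(group)}",
        # 2. Align to hg38 and sort the alignments by coordinate.
        f"bwa mem {q(genome)} {q(fastq)} | samtools sort -o {q(bam)} -",
        # 3. Convert the sorted BAM file to BED.
        f"bedtools bamtobed -i {q(bam)} > {q(bed)}",
    ]

for command in build_commands("SRR0000001", group="control"):
    print(command)
```

Each printed command can be passed to `subprocess.run(command, shell=True, check=True)` once the paths and tools match your setup.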

3. Test_on_Script.py

This script runs tests and quality checks on the BED files to confirm that the data is reliable enough for the downstream analysis.
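
The repository does not list which checks Test_on_Script.py performs. As one example of what such a check might look like, the sketch below validates the first three BED columns (chromosome, start, end). The function name and the chosen rules are assumptions for illustration only.

```python
def check_bed_line(line):
    """Return None if the BED line looks valid, otherwise a short reason."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        return "fewer than 3 columns"
    _chrom, start, end = fields[:3]
    if not (start.isdigit() and end.isdigit()):
        return "non-integer coordinates"
    # A sequenced fragment should span at least one base.
    if int(start) >= int(end):
        return "start is not below end"
    return None

lines = ["chr1\t100\t267\n", "chr1\t500\t400\n", "chr2\tabc\t10\n"]
problems = [
    (number, reason)
    for number, line in enumerate(lines, 1)
    if (reason := check_bed_line(line))
]
print(problems)
```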

4. Chrom_info.py

Extracts chromosome positioning information from the BED files, which is essential for generating histograms containing fragment distribution data.
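
A minimal sketch of this extraction, assuming standard tab-separated BED files whose first three columns are chromosome, start, and end. The function name is hypothetical and the actual output format of Chrom_info.py may differ.

```python
from collections import defaultdict

def positions_by_chromosome(bed_lines):
    """Map each chromosome name to a list of (start, end) tuples."""
    positions = defaultdict(list)
    for line in bed_lines:
        # Skip blank lines and optional BED header lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        positions[chrom].append((int(start), int(end)))
    return dict(positions)

bed = [
    "chr1\t100\t267\tread1\n",
    "chr2\t50\t216\tread2\n",
    "chr1\t900\t1066\tread3\n",
]
print(positions_by_chromosome(bed))
```

For a real file, pass an open file handle (for example `open("sample.bed")`) instead of the list.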

5. histogram_creation.py

Uses the chromosome positioning information to create histograms that provide insights into fragment distribution patterns within the dataset.
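
As an illustration, a fragment distribution can be summarized as a histogram of fragment lengths (end minus start). The sketch below assumes that interpretation and a hypothetical bin width of 10 bases; histogram_creation.py may bin the data differently.

```python
from collections import Counter

def length_histogram(fragments, bin_width=10):
    """Count fragments per length bin; keys are the lower edge of each bin."""
    counts = Counter()
    for start, end in fragments:
        length = end - start
        counts[(length // bin_width) * bin_width] += 1
    return dict(sorted(counts.items()))

# Made-up (start, end) positions for four fragments.
fragments = [(100, 267), (900, 1066), (50, 216), (10, 155)]
print(length_histogram(fragments))
```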

6. AI_simple_NN_WRST.py

Implements a small neural network that takes the histograms produced by histogram_creation.py as input and is used to analyze the data.
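
The architecture and framework used by AI_simple_NN_WRST.py are not described here, so the following is only a pure-Python sketch of what a small network on histogram inputs could look like: one hidden tanh layer, a sigmoid output, and plain stochastic gradient descent. The class name, layer size, learning rate, and toy data (label 1 for cancer, 0 for control) are all assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class TinyNet:
    """One hidden tanh layer and a sigmoid output, trained with plain SGD."""

    def __init__(self, n_inputs, n_hidden=4, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, y, lr=0.5):
        hidden, out = self.forward(x)
        d_out = out - y  # cross-entropy gradient w.r.t. the pre-sigmoid value
        for j, h in enumerate(hidden):
            # Back-propagate through tanh before w2[j] is updated.
            d_hidden = d_out * self.w2[j] * (1.0 - h * h)
            self.w2[j] -= lr * d_out * h
            self.b1[j] -= lr * d_hidden
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hidden * xi
        self.b2 -= lr * d_out

def mean_loss(net, data):
    """Average binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for x, y in data:
        p = net.forward(x)[1]
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)

# Toy normalized histograms (made up): label 1 = cancer, 0 = control.
data = [([0.6, 0.3, 0.1], 1), ([0.5, 0.4, 0.1], 1),
        ([0.1, 0.3, 0.6], 0), ([0.2, 0.2, 0.6], 0)]

net = TinyNet(n_inputs=3)
loss_before = mean_loss(net, data)
for _ in range(500):
    for x, y in data:
        net.train_step(x, y)
loss_after = mean_loss(net, data)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

In practice a library such as TensorFlow or PyTorch would replace the hand-written training loop; the pure-Python version is used here only to keep the example self-contained.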

Important Notice

The analysis is complex and involves a large amount of data, so your computer's memory may be insufficient for some stages of this pipeline. Make sure your system has enough memory and processing power before running the analysis.

Usage

  1. Clone the repository to your local machine.

  2. Create and activate a Conda environment to isolate dependencies for this pipeline:

    conda create -n cancer_analysis_env python=<python_version>
    conda activate cancer_analysis_env
    

     Replace <python_version> with the desired Python version.

  3. Install the required dependencies using Conda and Bioconda, including:

       • FastQC
       • Bedtools
       • Samtools

    conda install -c bioconda fastqc bedtools samtools

  4. Ensure you are using a Linux-based system, as the pipeline is designed to work best on this platform.

  5. Run the scripts in the order listed above, providing the necessary inputs and configurations for each.

  6. Monitor memory usage during execution, and move to a system with more memory if memory-related errors occur.

Dependencies

  • FastQC
  • Bedtools
  • Samtools

Contributions

Contributions to this repository are welcome. If you encounter issues or have ideas for improvements, feel free to open an issue or submit a pull request.

Contributors

  • sammburn
  • samuelbernard4
