hcapca

hcapca is an R script for performing Hierarchical Clustering Analysis (HCA) and Principal Component Analysis (PCA) on LC-MS data. It was tested on the following operating systems:

  • macOS 10.14.4
  • Ubuntu 18.04.2 LTS
  • Windows 7
  • Windows 10

Demo

[Image: hcapca-demo]

Instructions

1. Install Docker

Install Docker Community Edition (CE) for your operating system. For older Mac and Windows systems, you will need to install Docker Toolbox instead.

2. Get example data, scripts, and config file

Download this repo to get the example data, R scripts, and config_file.yaml. After unzipping, your directory structure should look like this:

base
  |--config_file.yaml
  |--hcapca.R
  |--accessory_functions.R
  |--app.R
  |--data
      |--example_feature_table.csv

You can use your own data instead of the example data. Each file must follow the same format as the example data.
Note: The base directory is where you unzip the data folder, R scripts, and config_file.yaml.
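
Before running anything, it can help to confirm that the base directory matches the layout above. The following is a minimal sketch and not part of hcapca: the function name check_base_dir is hypothetical, and it only looks for the four files and the data folder shown in the tree.

```shell
# check_base_dir DIR: report which entries from the layout above are
# missing from DIR (defaults to the current directory).
# Returns 0 when everything is present, 1 otherwise.
check_base_dir() {
    dir=${1:-.}
    missing=0
    for f in config_file.yaml hcapca.R accessory_functions.R app.R; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing file: $f"
            missing=1
        fi
    done
    if [ ! -d "$dir/data" ]; then
        echo "missing directory: data"
        missing=1
    fi
    return "$missing"
}
```

From the base directory, `check_base_dir .` prints nothing and returns 0 when the layout is complete. It needs a POSIX shell, so it applies to macOS, Linux, and the Docker Quickstart Terminal rather than PowerShell.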

3. Housekeeping:

  • You must have administrator access to install Docker or Docker Toolbox.
  • You should increase the memory limits to allow the script to run. I recommend 4 GB of RAM and 2 cores to run the example data.
    • For Windows 7, open VirtualBox (installed as part of Docker Toolbox) from the Start Menu as admin and stop the running virtual machine named default. In the settings for default, change the RAM and processor allocation.
    • For Linux, you don't need to do this since Docker has access to the entire system's resources.
    • For Windows 10 and macOS, open the preferences from the Docker app and increase resources as needed.
  • You must enable shared folders:
    • In Windows 10, you need to enable shared folders in preferences. Right-click on the Docker icon in the system tray > settings > shared drives > check appropriate drives > Apply
    • In Windows 7, as before, open VirtualBox as admin > stop the default virtual machine > go to settings for the virtual machine > Shared Folders > Add as needed
  • For Windows 7 please make a note of the IP address that is displayed when you first start the Docker Quickstart Terminal. Usually it is something like 192.168.99.100. This will be important in step 5 below.
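
To see how much memory and how many cores Docker actually has after changing these settings, you can ask the Docker daemon directly. This is a sketch under assumptions: the function name show_docker_resources is hypothetical, and it relies on the MemTotal and NCPU fields that `docker info --format` exposes. The reported memory is often slightly below the configured value, so treat it as a rough guide against the 4 GB and 2 core suggestion above.

```shell
# show_docker_resources: print the RAM (in MiB) and CPU count that the
# Docker daemon reports. Returns 1 if the daemon cannot be queried.
show_docker_resources() {
    mem_bytes=$(docker info --format '{{.MemTotal}}') || return 1
    cpus=$(docker info --format '{{.NCPU}}') || return 1
    # MemTotal is reported in bytes; convert to whole MiB.
    mem_mib=$(echo "$mem_bytes" | awk '{ printf "%d", $1 / 1048576 }')
    echo "Docker reports $mem_mib MiB of RAM and $cpus CPUs"
}
```

On Linux the numbers reflect the whole host, which is consistent with the note above that no adjustment is needed there.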

4. Run the script to process data

4.1 For macOS and Linux

Open a terminal, cd to the base directory and run:

docker run --rm \
 --tty \
 --interactive \
 --volume "$(pwd)":/srv/shiny-server/hcapca:rw \
 --workdir /srv/shiny-server/hcapca \
 schanana/hcapca:devel hcapca.R

4.2 For Windows 10

Use PowerShell (not the x86 or ISE variants, just PowerShell) and type:

docker run --interactive `
  --tty `
  --rm `
  --volume //c/Users/username/path/to/base/directory:/srv/shiny-server/hcapca `
  --workdir /srv/shiny-server/hcapca `
  schanana/hcapca:devel hcapca.R

Be sure to replace /username/path/to/base/directory in the above command with the path to the base directory.

4.3 For Windows 7

Use the Docker Quickstart Terminal, cd to the base directory and run:

 docker run --interactive \
   --tty \
   --rm \
   --volume "$(pwd)":/srv/shiny-server/hcapca:rw \
   --workdir /srv/shiny-server/hcapca \
   schanana/hcapca:devel hcapca.R

The commands above use the devel tag, which is the more cutting-edge version. To run the latest tag instead, replace devel with latest in the last argument.

Regardless of OS, a folder called output should be created within the base directory with the following structure:

base
   |--output
         |--report.html
         |--hca
         |   |--lots_of.pdfs
         |
         |--pca
            |--names_with_underscores.html
            |--directories_with_the_same_names

If your config file sets the output_pca parameter to FALSE, the pca folder will be empty. Do not worry: you can explore individual PCAs for each node in the next step.
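
A short check can confirm that the run produced the structure shown above. This is only a sketch: summarize_output is a hypothetical helper, it assumes the output folder sits where the tree above places it, and it counts PCA HTML files recursively, including those inside the per-node directories.

```shell
# summarize_output DIR: confirm report.html exists in DIR (default:
# ./output) and count the HCA PDFs and PCA HTML files beneath it.
summarize_output() {
    out=${1:-output}
    if [ ! -f "$out/report.html" ]; then
        echo "report.html not found in $out"
        return 1
    fi
    # tr strips the padding that some wc implementations add.
    n_pdf=$(find "$out/hca" -type f -name '*.pdf' 2>/dev/null | wc -l | tr -d ' ')
    n_html=$(find "$out/pca" -type f -name '*.html' 2>/dev/null | wc -l | tr -d ' ')
    echo "report.html found; $n_pdf HCA PDF file(s), $n_html PCA HTML file(s)"
}
```

A count of 0 PCA HTML files is expected when output_pca is FALSE.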

5. Explore results

5.1 For macOS and Linux

In the base directory, run:

docker run --rm \
   --interactive \
   --tty \
   --name hcapca \
   --detach \
   --volume "$(pwd)":/srv/shiny-server/hcapca:rw \
   --workdir /srv/shiny-server/hcapca/ \
   --publish 3838:3838 \
   schanana/hcapca:devel

Navigate to http://127.0.0.1:3838/hcapca to view your results in an interactive website!

5.2 For Windows 10

Use PowerShell (not the x86 or ISE variants, just PowerShell) and type:

docker run --rm `
   --interactive `
   --tty `
   --name hcapca `
   --detach `
   --volume //c/Users/<username>/path/to/base/directory:/srv/shiny-server/hcapca:rw `
   --workdir /srv/shiny-server/hcapca `
   --publish 3838:3838 `
   schanana/hcapca:devel

As before, replace username/path/to/base/directory with the path to the base directory and navigate to http://127.0.0.1:3838/hcapca to view your results in an interactive website!

5.3 For Windows 7

On Windows 7, use the Docker Quickstart Terminal and in the base directory run:

docker run --interactive \
  --tty \
  --rm \
  --detach \
  --name hcapca \
  --volume "$(pwd)":/srv/shiny-server/hcapca \
  --workdir /srv/shiny-server/hcapca \
  --publish 3838:3838 \
  schanana/hcapca:devel

Navigate to http://your.ip:3838/hcapca, replacing your.ip with the IP address shown when Docker starts up, as noted in step 3 (Housekeeping) above.
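
Because the commands in this step start the container detached, it keeps running in the background until it is stopped. The sketch below wraps the two standard Docker commands for inspecting and stopping it. The function names hcapca_logs and hcapca_stop are hypothetical; the container name comes from the --name flag used above.

```shell
# Container name set by the --name flag in the commands above.
HCAPCA_NAME=hcapca

# Print what the container has written to stdout/stderr so far.
hcapca_logs() {
    docker logs "$HCAPCA_NAME"
}

# Stop the detached container. It was started with --rm, so Docker
# also removes it once it has stopped.
hcapca_stop() {
    docker stop "$HCAPCA_NAME"
}
```

The underlying commands, `docker logs hcapca` and `docker stop hcapca`, can also be typed directly, which is the simpler route in PowerShell.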

Table Format

1. Analyses.dat

Contains: Sample names
Format: One name per line; names can be any combination of letters and numbers. Duplicates will be removed.

A123
B123
C122
B455

2. Variables_m.dat

Contains: m/z values separated by spaces all on one line. Keep them in the same order as the retention time and table data.

198.0390 758.3503 0.0000 395.1877 366.2404 ...

3. Variables_t.dat

Contains: retention time values (in minutes or seconds) separated by spaces all on one line. Keep them in the same order as the m/z and table data.

12.3 8.0 7.3 2.02 12.3 ...

4. Table.dat

Contains: intensity values
Format: Space separated values; numbers only. Each line contains the values corresponding to the sample names from Analyses.dat above.

198.0390 758.3503 0.0000 395.1877 366.2404 ...
438.1456 378.1118 435.0968 525.2830 0.0000 ...
312.1296 349.1325 184.0433 0.0000 201.0702 ...
425.0835 0.0000 311.1384 371.2302 337.0677 ...
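
The four files have to agree with each other: one Table.dat row per sample name, and one value per row for every m/z and retention time entry. The sketch below checks exactly that and nothing more. It is not part of hcapca; check_tables is a hypothetical name, it assumes all four files sit in the same directory, and it compares raw line counts without removing duplicate sample names first.

```shell
# check_tables DIR: cross-check the four .dat files described above.
# Returns 0 when the counts agree, 1 otherwise.
check_tables() {
    dir=${1:-.}
    # Non-empty lines: sample names and intensity rows.
    n_samples=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$dir/Analyses.dat")
    n_rows=$(awk 'NF > 0 { n++ } END { print n + 0 }' "$dir/Table.dat")
    # Space-separated values on the first line of each variables file.
    n_mz=$(awk 'NR == 1 { n = NF } END { print n + 0 }' "$dir/Variables_m.dat")
    n_rt=$(awk 'NR == 1 { n = NF } END { print n + 0 }' "$dir/Variables_t.dat")
    # Intensity rows whose value count differs from the m/z count.
    n_bad=$(awk -v n="$n_mz" 'NF > 0 && NF != n { c++ } END { print c + 0 }' "$dir/Table.dat")
    status=0
    if [ "$n_samples" -ne "$n_rows" ]; then
        echo "Analyses.dat has $n_samples names but Table.dat has $n_rows rows"
        status=1
    fi
    if [ "$n_mz" -ne "$n_rt" ]; then
        echo "Variables_m.dat has $n_mz values but Variables_t.dat has $n_rt"
        status=1
    fi
    if [ "$n_bad" -ne 0 ]; then
        echo "Table.dat: $n_bad row(s) do not have $n_mz values"
        status=1
    fi
    return "$status"
}
```

Running `check_tables path/to/files` prints one line per mismatch and stays silent when the files are consistent.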

Example outputs

1. HCA

b0-15.pdf: HCA of node b0 with 15 samples

2. PCA

b0_PC1-2_S.html: PCA scores plot of b0 for PC1 and PC2

Troubleshooting:

  • In some flavors of Windows, there may be an error in mounting a shared drive. If that happens, try the following:

    • Make sure the path is specified as //<drive_letter>/<path>
    • Make sure the shared drives are enabled and that particular path is shared
    • Make sure the terminal is running as administrator
  • Docker abruptly stops running the script

    • Make sure to allocate sufficient RAM and processing power to Docker. Usually, if the virtual OS cannot get more memory, it hits an Out Of Memory (OOM) error and kills the offending process, which exits the container.
  • For any issues, please open an issue here on GitHub or email me at [email protected]

Wiki

The wiki page is still under construction.

License

This project is licensed under the GNU General Public License v3.0 - please see the LICENSE.md file for full license.

Acknowledgments

A huge thank you to Chris Thomas for helping bring this idea to fruition by being a sounding board for ideas. He does not have a GitHub account, so here is a link to a PubMed search with his publications. Another big thank you to @cecileane for pointing me in the right direction to generate a tree from a dataframe. Finally, a shoutout to @WiscEvan for helping figure out bits and pieces of code here and there, especially with the Shiny UX.

hcapca's People

Contributors

chanana


hcapca's Issues

can't find function "auto_process"

hiヾ(•ω•`)o,I want to run this script, but when I reach the tree building step, the system reminds me that the function "auto_process" is missing, I don't know how to install it, can you help me::>_<::?

------calculating pca, hca-------

------calculated pca, hca-------

------processing tree-------

[1/4] found node at position: 1
[2/4] selected and processed data and made dendrogram
[3/4] made left child called: b0
[4/4] made right child called: b1
Error in auto_process(dataframe = df, numOrVar = parameters$numOrVar, :
could not find function "auto_process"
Execution halted
