
ddot's Introduction

The Data-Driven Ontology Toolkit (DDOT)

The Data-Driven Ontology Toolkit (DDOT) facilitates the inference, analysis, and visualization of biological hierarchies using a data structure called an ontology.

  • Open-source Python package under MIT license. Supports Python 2.7 or >=3.6.
  • The HiView web application visualizes hierarchical structure and the biological evidence for that structure.

Documentation

For a quick start on DDOT's functionality, please see the tutorial and other Jupyter notebooks in the examples folder.

For further documentation, please see http://ddot.readthedocs.io/. This includes a description of the Ontology class and a list of utility functions.

Please post questions or issues to the Google Groups forum.

Installation

DDOT requires the following software

The recommended method for installing these dependencies is to use the Anaconda distribution of Python, and then install Python packages via the conda and pip repositories.

# Create and activate a virtual environment (optional, but recommended).
# Learn more about virtual environments at https://conda.io/docs/user-guide/tasks/manage-environments.html
conda create -n <environment_name>
source activate <environment_name>
 
# Install dependencies
conda install -y pandas numpy scipy networkx=1.11
conda install -y -c conda-forge python-igraph
conda install -y libiconv # Needed for igraph to run properly
pip install tulip-python
pip install ndex-dev

Install the ddot Python package

After dependencies are satisfied, download or clone this repository

git clone https://github.com/michaelkyu/ddot.git

Next, compile C++ files for running CliXO v0.3 and an ontology alignment algorithm.

cd /path/to/ddot_repository
make

Finally, install ddot using pip. If you are installing within a conda virtual environment, remember to enter the environment with source activate <environment_name> before running pip.

pip install /path/to/ddot_repository

Known installation problems and tips

  • Older versions of Anaconda (<= v4.5) might not install the dependencies correctly. Consider updating Anaconda to the newest version by running conda update conda (outside of a virtual environment).
  • Make sure that no other local installation of Python conflicts with Anaconda. In particular, check that the directory $HOME/.local/lib does not contain Python packages. If it does, check that those packages are not being imported.
  • If ddot does not import successfully in a Python terminal, first check that the dependencies can be imported. In particular, check that import ndex, networkx, igraph, tulip, numpy, scipy, pandas works.
  • Please raise any other installation problems as an issue on this GitHub repo.
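
The import check from the tips above can be run in one pass instead of module by module. The following is only a sketch: the module names come from the list above, while `find_missing` is a hypothetical helper that is not part of DDOT.

```python
import importlib

# Module names taken from the import check listed above.
DEPENDENCIES = ["ndex", "networkx", "igraph", "tulip", "numpy", "scipy", "pandas"]


def find_missing(modules):
    """Return a dict mapping each module that fails to import to a short reason."""
    missing = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or an error raised while the module loads
            missing[name] = "{}: {}".format(type(exc).__name__, exc)
    return missing


if __name__ == "__main__":
    problems = find_missing(DEPENDENCIES)
    for name, reason in sorted(problems.items()):
        print("cannot import {} ({})".format(name, reason))
    if not problems:
        print("all dependencies import correctly")
```

Any module reported here should be fixed before trying import ddot again.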

Docker image

A Docker image of DDOT is located online at Docker Hub. To learn more about Docker, see https://docs.docker.com/get-started/

Download and run image from Docker Hub

For Python 3.6,

# Download image installed with DDOT in anaconda3 (Python 3.6)
docker pull michaelkyu/ddot-anaconda3
# Run image in a container
docker run -it -p 8888:8888 michaelkyu/ddot-anaconda3

For Python 2.7,

# Download image installed with DDOT in anaconda2 (Python 2.7)
docker pull michaelkyu/ddot-anaconda2
# Run image in a container
docker run -it -p 8888:8888 michaelkyu/ddot-anaconda2

Using DDOT in Docker

After running the image, you will be inside the container's command line. Here, you can run DDOT in a basic Python terminal

(base) root@<container>:/$ python

Python 2.7.14 |Anaconda, Inc.| (default, Dec  7 2017, 17:05:42) 
[GCC 7.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import ddot

Alternatively, you can run DDOT in example Jupyter notebooks. To do so, start a Jupyter server in the container's command line

(base) root@<container>:/$ jupyter notebook --no-browser --allow-root --ip 0.0.0.0 --NotebookApp.token=''

Next, open up your web browser and access the notebooks at http://0.0.0.0:8888/. We recommend starting with the tutorial Tutorial.ipynb.

Citing DDOT

If you find DDOT helpful in your research, please cite

Michael Ku Yu, Jianzhu Ma, Keiichiro Ono, Fan Zheng, Samson H Fong, Aaron Gary, Jing Chen, Barry Demchak, Dexter Pratt, Trey Ideker. "DDOT: A Swiss Army Knife for Investigating Data-Driven Biological Ontologies". Cell Systems. 2019 Mar 27;8(3):267-273.

ddot's Issues

[`to_ndex`] "description" field does not actually upload from ddot into NDEx

With the to_ndex function, the "description" field does not seem to upload from ddot into NDEx, or it uploads but NDEx does not accept it. The description field could be extremely useful: I am uploading a large number of hierarchies and auto-generating description text that records model-building parameters, data provenance, etc. However, the text never reaches NDEx; the "description" field there is just empty.

url, _ = ont.to_ndex(name="sho6Nooz",
  description="a_0.07_b_0.5_m_0.004_z_0.1, cutoff 0.287 algnmt GO BP DNA Repair",
  ndex_server='http://test.ndexbio.org',
  ndex_user='kratz',
  ndex_pass='XXXXXXXX',
  network=sim_long,
  main_feature='similarity',
  layout='bubble-collect')

I have described this issue in the Cytoscape Slack on 2019-01-19.

Servers I'm using: http://test.ndexbio.org, http://hiview-test.ucsd.edu, http://hiview.ucsd.edu

Pre-compiled CLIXO does not work

Having pre-compiled C++ code in the repository is a bit confusing, since it doesn't generally work on each person's system. This isn't obvious from the Python code, since subprocess.Popen fails completely silently if the current system can't run the clixo binary. Note: I'm running macOS 10.13 (High Sierra) with the following clang:

$ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 10.0.0 (clang-1000.11.45.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

Further, I went to re-compile myself and got the following error due to the inclusion of the -static flag to include static libraries:

$ make clean
rm *.o
(ddot) [529] [13:20] [cthoyt@wlan-224:~/dev/ddot/ddot/mhk7-clixo_0.3-cec3674]
$ make all
g++ -Wall -O4 -c -std=c++11 -fomit-frame-pointer -funroll-loops -fforce-addr -fexpensive-optimizations -static -I . clixo.cpp
clang: warning: -O4 is equivalent to -O3 [-Wdeprecated]
clang: warning: optimization flag '-fexpensive-optimizations' is not supported [-Wignored-optimization-argument]
In file included from clixo.cpp:3:
./dagConstruct.h:850:32: warning: '/*' within block comment [-Wcomment]
    /*if ((printClusterInfo && /*!combiningNow &&*//* !currentClusters[clusterToDelete].wasCheckedFinal()) || (currentClusters[clusterToDelete]...
                               ^
./dagConstruct.h:932:14: warning: '/*' within block comment [-Wcomment]
      /*if ((/*!combiningNow && *//*!clusterToExtend_it->wasCheckedFinal()) || clusterToExtend_it->isValid()) {
             ^
./dagConstruct.h:1403:17: warning: private field 'numClusters' is not used [-Wunused-private-field]
  unsigned long numClusters;
                ^
3 warnings generated.
g++ -Wall -O4 -std=c++11 -fomit-frame-pointer -funroll-loops -fforce-addr -fexpensive-optimizations -static -I . clixo.o -o clixo
ld: library not found for -lcrt0.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [clixo] Error 1

I'm not much of a dev-ops person and I'm still working through the issue, but this thread had some information that sort of worked: https://discussions.apple.com/thread/1945589?answerId=9213715022#9213715022. It suggested removing the -static flags, or even better, adding more conditionals in the makefile to check what system is being run and if this is a good idea or not. After, I was able to re-compile it and got the following output:

$ make clean
rm *.o
(ddot) [533] [13:23] [cthoyt@wlan-224:~/dev/ddot/ddot/mhk7-clixo_0.3-cec3674]
$ make all
g++ -Wall -O4 -c -std=c++11 -fomit-frame-pointer -funroll-loops -fforce-addr -fexpensive-optimizations -I . clixo.cpp
clang: warning: -O4 is equivalent to -O3 [-Wdeprecated]
clang: warning: optimization flag '-fexpensive-optimizations' is not supported [-Wignored-optimization-argument]
In file included from clixo.cpp:3:
./dagConstruct.h:850:32: warning: '/*' within block comment [-Wcomment]
    /*if ((printClusterInfo && /*!combiningNow &&*//* !currentClusters[clusterToDelete].wasCheckedFinal()) || (currentClusters[clusterToDelete]...
                               ^
./dagConstruct.h:932:14: warning: '/*' within block comment [-Wcomment]
      /*if ((/*!combiningNow && *//*!clusterToExtend_it->wasCheckedFinal()) || clusterToExtend_it->isValid()) {
             ^
./dagConstruct.h:1403:17: warning: private field 'numClusters' is not used [-Wunused-private-field]
  unsigned long numClusters;
                ^
3 warnings generated.
g++ -Wall -O4 -std=c++11 -fomit-frame-pointer -funroll-loops -fforce-addr -fexpensive-optimizations  -I . clixo.o -o clixo
g++ -Wall -O4 -c -std=c++11 -fomit-frame-pointer -funroll-loops -fforce-addr -fexpensive-optimizations -I . clustersToDAG.cpp
clang: warning: -O4 is equivalent to -O3 [-Wdeprecated]
clang: warning: optimization flag '-fexpensive-optimizations' is not supported [-Wignored-optimization-argument]
In file included from clustersToDAG.cpp:3:
./dagConstruct.h:850:32: warning: '/*' within block comment [-Wcomment]
    /*if ((printClusterInfo && /*!combiningNow &&*//* !currentClusters[clusterToDelete].wasCheckedFinal()) || (currentClusters[clusterToDelete]...
                               ^
./dagConstruct.h:932:14: warning: '/*' within block comment [-Wcomment]
      /*if ((/*!combiningNow && *//*!clusterToExtend_it->wasCheckedFinal()) || clusterToExtend_it->isValid()) {
             ^
./dagConstruct.h:1403:17: warning: private field 'numClusters' is not used [-Wunused-private-field]
  unsigned long numClusters;
                ^
3 warnings generated.
g++ -Wall -O4 -std=c++11 -fomit-frame-pointer -funroll-loops -fforce-addr -fexpensive-optimizations  -I . clustersToDAG.o -o clustersToDAG
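
The silent failure mentioned at the top of this issue can be surfaced by checking whether the child process started and what exit status it returned. This is a minimal sketch and not DDOT's actual code: `run_binary` is a hypothetical helper, and the demo command is a stand-in for the clixo binary.

```python
import subprocess
import sys


def run_binary(cmd):
    """Run cmd and return its stdout; raise RuntimeError if it cannot start or fails."""
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    except OSError as exc:
        # Raised when the file is missing, not executable, or built for another platform.
        raise RuntimeError("could not start {!r}: {}".format(cmd[0], exc))
    out, err = proc.communicate()
    if proc.returncode != 0:
        message = err.decode("utf-8", "replace").strip()
        raise RuntimeError(
            "{!r} exited with status {}: {}".format(cmd[0], proc.returncode, message)
        )
    return out.decode("utf-8", "replace")


if __name__ == "__main__":
    # Stand-in command; a real call would pass the path to the compiled binary.
    print(run_binary([sys.executable, "-c", "print('ok')"]))
```

With a check like this, a binary compiled for a different system produces an immediate error instead of empty output further down the pipeline.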

Tulip does not work on python37

Right now, there hasn't been a wheel added for py37 so it can't yet be used. It seems like they do not have their source code publicly available so I don't think it's an option to build from scratch either...

unable to pull either of the two docker images

I am trying to use ddot through Docker. As the subject line says, I am unable to pull either of the two Docker images:

kratz@kratz-VirtualBox:~ (master) $ docker pull michaelkyu/ddot-anaconda2
Using default tag: latest
Error response from daemon: manifest for michaelkyu/ddot-anaconda2:latest not found
kratz@kratz-VirtualBox:~ (master) $ docker pull michaelkyu/ddot-anaconda3
Using default tag: latest
Error response from daemon: manifest for michaelkyu/ddot-anaconda3:latest not found

EmptyDataError when running Clixo

I am trying to run CliXO on a pairwise matrix of gene similarity scores. If I'm understanding the error message correctly, the temporary table generated to store the ontology is empty, which results in this error. Do you know why that is? Or, if I am reading the error message incorrectly, could you elaborate on how I can go about fixing the issue?

CliXO_HPAbicor.pdf
HPA_bicor.csv
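
For context, pandas raises EmptyDataError when read_csv is given input with no content at all, which is consistent with the reading of the error above. The sketch below shows one way to tell an empty file apart from other failures. `read_table_or_none` is a hypothetical helper, and the tab-separated, header-less layout is an assumption rather than a description of DDOT's temporary file.

```python
import io

import pandas as pd
from pandas.errors import EmptyDataError


def read_table_or_none(source):
    """Read a tab-separated table; return None if the source has no content."""
    try:
        return pd.read_csv(source, sep="\t", header=None)
    except EmptyDataError:
        # The file exists but is empty, so the step that wrote it produced nothing.
        return None


if __name__ == "__main__":
    print(read_table_or_none(io.StringIO("")))               # empty input
    print(read_table_or_none(io.StringIO("a\tb\t0.5\n")))    # one data row
```

If the table is empty, the next place to look is the step that was supposed to write it.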

CalculateFDR failed to write the FDRs output

After all the iterations are done, I get the error sed: -e expression #1, char 5: unknown command: `' and the FDRs output is blank. I suspected a problem with writing the final outputs, and traced the error to lines 47-51 of the CalculateFDR function:
for i in $(seq 0 $(($NUM_ITER - 1)) )
do
    END_OF_HEADER=$(grep -n Matched $RESULTS_DIR/rand_files/alignment_"$i" | sed 's/:.*//');
    sed "1,$END_OF_HEADER""d" $RESULTS_DIR/rand_files/alignment_"$i" | sort -r -n -k 3 | cut -f 1,2,3 > $RESULTS_DIR/rand_files/alignment_without_descendents_$i
done

This loop looks for the line number where 'Matched' occurs in the file and gathers the information after that line. However, in some of my alignment files 'Matched' occurs twice, so grep returns two line numbers and the sed command that generates the "alignment_without_descendents" file fails. What causes the additional 'Matched' to appear in the file? Thanks.
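
To illustrate the failure mode, here is a rough Python equivalent of the loop body that tolerates more than one 'Matched' line. It is a sketch under assumptions: `marker_lines` and `top_alignments` are hypothetical helpers, data rows are assumed to be tab-separated with a numeric third column, and cutting at the last 'Matched' line is a guess, since the issue does not establish which occurrence ends the header.

```python
def marker_lines(lines, marker="Matched"):
    """Return the 0-based indices of all lines that contain marker."""
    return [i for i, line in enumerate(lines) if marker in line]


def top_alignments(lines, marker="Matched", which="last"):
    """Drop the header, keep the first three columns, sort by column 3 descending."""
    hits = marker_lines(lines, marker)
    if not hits:
        raise ValueError("no line contains {!r}".format(marker))
    # Assumption: the header ends at the last marker line unless which="first".
    cut = hits[0] if which == "first" else hits[-1]
    rows = []
    for line in lines[cut + 1:]:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:  # skip blank or non-tabular lines
            rows.append(fields[:3])
    return sorted(rows, key=lambda row: float(row[2]), reverse=True)
```

Calling `marker_lines` on an alignment file first shows how many 'Matched' lines it contains, which is the condition that breaks the sed command.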

Ddot Tutorial Error

Hi, I installed all of the packages required to run ddot per instructions and cloned the repository, and I was trying to run the tutorial notebook in Jupyter. When I try to construct the ontology in the tutorial, I get this error:

TypeError Traceback (most recent call last)
in
20
21 # Construct ontology
---> 22 ont = Ontology(hierarchy, mapping)
23
24 # Prints a summary of the ontology's structure

~\Anaconda3\lib\site-packages\ddot\Ontology.py in __init__(self, hierarchy, mapping, edge_attr, node_attr, parent_child, add_root_name, propagate, ignore_orphan_terms, verbose, **kwargs)
606
607 if edge_attr is None:
--> 608 self.clear_edge_attr()
609 else:
610 assert edge_attr.index.nlevels == 2

~\Anaconda3\lib\site-packages\ddot\Ontology.py in clear_edge_attr(self)
709
710 self.edge_attr = pd.DataFrame()
--> 711 self.edge_attr.index = pd.MultiIndex(levels=[[],[]],
712 labels=[[],[]],
713 names=['Child', 'Parent'])

TypeError: __new__() got an unexpected keyword argument 'labels'

It seems to be an error in the source code.
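
The `labels` keyword of pandas.MultiIndex was renamed to `codes` in pandas 0.24 and the old name was later removed, which matches the TypeError above. A minimal sketch of the same empty two-level index built with `codes` follows. It is not the maintainers' fix: `empty_edge_index` is a hypothetical helper and the code assumes pandas 0.24 or newer.

```python
import pandas as pd


def empty_edge_index(names=("Child", "Parent")):
    """Build an empty MultiIndex using `codes` in place of the removed `labels`."""
    empty = [[] for _ in names]
    return pd.MultiIndex(levels=empty, codes=empty, names=list(names))


# Mirrors the assignment shown in the traceback above.
edge_attr = pd.DataFrame()
edge_attr.index = empty_edge_index()
```

Another workaround is to install an older pandas release that still accepts `labels`, at the cost of pinning the environment.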
