
kinc's People

Contributors

4ctrl-alt-del, bentsherman, biggstd, cwytko, johnhadish, mitchsgreer, spficklin

kinc's Issues

Encoding of sample mask in cluster matrix

Now that the clustering analytics are more or less complete, I wanted to figure out what all the different values for the sample mask are. Right now the analytics just write 1 or 0 to denote membership in a cluster, but I know that KINCv1 used a few additional values (looking at PairWiseCluster.cpp). Which ones do we still need?

Working list of values:

  • 0: not in cluster
  • 1: in cluster
  • 6: not in cluster, removed by minimum threshold
  • 7: not in cluster, removed by pre-clustering outliers
  • 8: not in cluster, removed by post-clustering outliers
  • 9: not in cluster, removed by nan-check
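
A helper for decoding these masks might look like the following sketch, assuming the working list above (only '1' denotes membership; '6' through '9' are exclusion reasons; `isErrorCode` and `clusterSize` are hypothetical names):

```cpp
#include <cassert>
#include <string>

// Hypothetical decoder for a sample mask string, assuming the codes above:
// '1' = in cluster, '0' = not in cluster, '6'-'9' = excluded for a reason.
inline bool isErrorCode(char c)
{
    return c >= '6' && c <= '9';
}

// Number of samples that are members of this cluster.
inline int clusterSize(const std::string& mask)
{
    int n = 0;
    for (char c : mask)
    {
        if (c == '1') ++n;
    }
    return n;
}
```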

Loop unrolling for pair-wise GMMs

Since we are only using GMMs with D=2, I think we can get a significant speedup with loop unrolling. GMMs of this problem size perform many of the following operations:

  • adding 2x1 vectors
  • scaling 2x1 and 2x2 matrices
  • computing products of 2x1 and 2x2 matrices
  • computing determinants and inverses of 2x2 matrices

These operations are typically done with BLAS / LAPACK routines which use loops, but because these matrices are so small, the loop overhead is significant. I wrote a quick program that runs a few tests with addition and multiplication and I'm able to get roughly 2x speedup. I think this is definitely worth doing in the GPU implementation because the GSL routines won't be available there, but I may go ahead and do it for the serial implementation too.
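
For illustration, a fully unrolled 2x2 determinant and inverse (a sketch of the technique, not the actual KINC kernels) looks like this:

```cpp
#include <cassert>
#include <cmath>

// Unrolled 2x2 determinant and inverse -- no loops, no BLAS/GSL calls.
// Matrix stored row-major: a[0]=a11, a[1]=a12, a[2]=a21, a[3]=a22.
inline float det2x2(const float a[4])
{
    return a[0] * a[3] - a[1] * a[2];
}

inline void inv2x2(const float a[4], float out[4])
{
    float d = det2x2(a);
    out[0] =  a[3] / d;
    out[1] = -a[1] / d;
    out[2] = -a[2] / d;
    out[3] =  a[0] / d;
}
```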

Comparison of KINC-python / KINCv1 / KINCv3 output data

A few weeks ago I generated cluster data for Yeast using KINCv1, and I used the Import Correlation Matrix analytic to convert the cluster data into CCM and CMX files.

The KINCv1 CCM is 3.4 GB while the KINCv3 CCM is 8 GB, which is a pretty big discrepancy. Now, considering that we're using a different GMM implementation in KINCv3, we should expect some difference. But it seems like the KINCv1 CCM is just generally less dense, so either KINCv1 finds multiple clusters less often or it is filtering out clusters at the end. I will be looking into this.

In the meantime, we can still do a data evaluation for the paper by comparing KINCv1 data with its equivalent imported data. And that comparison is still favorable: the EMX is smaller than the raw text, and the combined CCM / CMX is smaller than the KINCv1 cluster data.

Here is a more exhaustive listing of what current data sizes look like for Yeast. Note that I don't have a reliable data point for KINCv3 CMX yet.

6.7G	data/Yeast/clusters-sc
8.1G	data/Yeast/Yeast.ccm
6.7G	data/Yeast/Yeast-clusters.txt
20M	data/Yeast/Yeast-ematrix.txt
5.3M	data/Yeast/Yeast.emx
3.4G	data/Yeast/Yeast-imported.ccm
395M	data/Yeast/Yeast-imported.cmx

Crashes with no GPU

When I go to Settings > Set OpenCL Device, KINC crashes with a core dump. I'm assuming this is because I have no GPU in my VM.

Problem using Spearman with existing cmx file

I tried re-running a Spearman analytic after I had prematurely killed KINC in a previous run. Repeating the same Spearman command led to this error:

ACE Exception Caught!
What: AccelCompEng::NVMemory::Node::NullPtr
Location: void AccelCompEng::NVMemory::Node::read(int64_t):378

Removing the output .cmx file that was there from the previous run resolved the problem.

Minor changes

Just a few minor changes:

  1. In the source code, let's change the 'run' folder to 'bin'. I think this is more consistent with most software.
  2. Can we add a space between the console prompt (i.e. KINC:>) and where users type commands? This would be similar to the bash console.
  3. On Kamiak, KINC takes a while to load. I assume it's because it's checking the available GPUs. Perhaps we should immediately print something to the screen, like R does--a bit of a textual "splash screen". Some initial help text would be beneficial, and maybe some text to let the user know why it's hanging a bit, so it doesn't seem stuck.

Command-line argument error

When I try to run KINC with the ACE console, I get an argument error with both clustering analytics. To make sure it wasn't just me, I also tried to run spearman and got the same error:

$ build/KINC run spearman --input "data/Yeast.emx" --output "data/Yeast.cmx" --min 30 --minthresh 0.5 --maxthresh 1 --bsize 4 --ksize 4096
build/KINC: /usr/local/cuda/lib64/libOpenCL.so.1: no version information available (required by build/KINC)
build/KINC: /usr/local/cuda/lib64/libOpenCL.so.1: no version information available (required by /home/bent/software/lib/libacecore.so.0)
0%CRITICAL ERROR

Argument Error

Did not get valid input and/or output arguments.

File: ../src/spearman.cpp
Line: 237
Function: virtual bool Spearman::initialize()

The command was taken verbatim from the KINC GUI. It looks like input / output data objects aren't being initialized properly, even though the ACE source code looks good and this operation works with ACE GUI. As a result I haven't been able to run anything from the command-line yet.

(Also, the version information lines don't seem to cause any problems but if you know how to get rid of them that would be nice to know.)

KINC not using all GPUs?

I'm running KINC on the yeast data on Kamiak and I'm wondering if it's using all GPUs. When I run KINC stand-alone it sees 4 GPUs (there are 2 K80s). If I run KINC with 2 tasks per node, the GPU with index 0 shows about 50% utilization (my observation). If I run KINC with 4 tasks per node, the GPU with index 0 is 100% busy. However, GPUs 1, 2, and 3 show 0% utilization.

I'm using this command to check on utilization:

nvidia-smi -q -g 0 -d UTILIZATION -l 1

Thoughts?

CL_OUT_OF_RESOURCES with GMM OpenCL

I've had a few cases now on Palmetto where when I try to run GMM OpenCL I get this OpenCL error. It has occurred as early as 0% and as late as ~50%, and it has occurred for both single-GPU and multi-GPU. However, it doesn't always happen. I've been able to produce a Yeast CCM multiple times on Palmetto, although I think it was on an interactive node.

Unfortunately this OpenCL error is broadly defined, so there are a few possibilities that I'm aware of:

  • the device ran out of kernel / memory resources
  • a segfault occurred in a kernel (such as invalid memory access)
  • a kernel took too long to execute

I'm interested to see if this problem occurs on Kamiak, but either way I'll be investigating these possibilities today. I've only ever seen this problem on Palmetto, never on my laptop, so there could be some clashing caused by sharing resources with other users. As we've said before, PBS doesn't always hide GPU resources perfectly from users who didn't request them.

Suggested changes to similarity analytic

Can we make these changes to the similarity analytic?

  1. Have the ability to turn off clustering altogether? We have 'gmm' and 'kmeans'; can we have a 'none'?
  2. Change the default "Criterion" from BIC to ICL. I think that for KINC v1 the default was BIC, but after discussions with the MixMod folks and running my own tests, ICL does seem to be a better default, as it does a better job of finding fewer clusters.
  3. Can we put the Block Size and Kernel Size in a "GPU settings" section? Does Qt support collapsible field sets? It would be nice to hide those away by default but let the user tweak them if needed.
  4. I think the pre- and post-clustering outlier removal should be checked by default. Any objections?

Issues with creating cluster matrix

I'm still having issues reading the cluster matrix produced by GMM. Again, I'm not sure if it's a problem with my code or the CCMatrix code, since there aren't any other analytics that produce this data type. As a quick experiment, I added this code to GMM::runSerial() in order to produce a cluster matrix with just the first few pairs in a few seconds:

   // increment through all gene pairs
   int numPairs = 1000;
   while ( numPairs-- > 0 && vector.geneX() < _output->geneSize() )

   // also remove OpenCL capability in gmm.h to force runSerial()

Then when I try to open the cluster matrix I get this error repeatedly:

Attempting to seek to gene pair 754 when total size is 607.
File: ../src/genepair_base.cpp
Function: void GenePair::Base::seekPair(qint64) const
Line: 277

As far as I can tell, inserting that 1000-pair limit shouldn't affect the validity of the cluster matrix; it only lowers the number of pairs that are written to it. But to be sure, I ran GMM completely on Yeast and I encountered the same error, just with larger numbers. Am I doing something wrong with creating the cluster matrix? I don't really understand why this is happening.

Optimize gene pair reads in correlation analytics

After looking through the RMT code a while back, I realized that GenePair::Base provides methods to iterate through the sparse gene pair list, which is significantly faster than reading every single gene pair vector, so I should be able to use this technique in the correlation analytics. It might take some gymnastics, but adding the CCM input seriously increased the runtime of both analytics, so it'll be worth it.

Provide correlation analytics with optional cluster matrix input

I'm pretty much done with the clustering analytics; we still need to test them thoroughly, but they can run and they can output a cluster matrix. So it looks like the last piece needed to use GMMs in the KINC pipeline is to enable each correlation analytic to use a cluster matrix. I'm guessing this will be an optional argument, so that the analytic would use its current behavior if no cluster matrix is provided.

@4ctrl-alt-del were you already planning to do this? If so then it's ready for you, but I'm prepared to implement it myself if you're working on other things.

Command-line usage text

I'm wondering if it would be helpful to output help text when the user tries something like KINC help. The help text would list the command-line equivalents of each analytic. It would be a great way to show very quickly everything KINC can do from the command line, and it shouldn't be too hard to generate the commands from the analytics, since it would use the same information used to generate the input forms. The same applies if the user runs KINC run [...] with invalid arguments; as far as I can tell, if that happens then KINC just starts up like nothing happened.

Now that I think about it, this could probably be implemented within the ACE framework. Something to think about.

No Qt support on Kamiak

Hey @bentsherman, I have run into a wall trying to run the new KINC on Kamiak: it has no Qt support. I tried to install Qt manually, but apparently Qt only offers GUI installers now. I am at a bit of a loss for what to do. All I need is QtCore, but I can't find its source. Did you run into this?

Use all GPUs?

According to @4ctrl-alt-del, the ability to use all GPUs on a node will be available in the next update to ACE, and then KINC has to be adjusted to use the newer version of ACE. I have access to 20 GPUs, 4 each on 5 nodes. But I'm currently only able to use 1 at a time on each node. I've found on Kamiak that 4 processes will keep 1 GPU fully busy. If I could launch a KINC job that can run 60 MPI processes, that would be awesome. I was able to process 30% of the yeast data in 2 hours using 10 MPI processes. So, would it be possible to tweak ACE/KINC such that I can use all the GPUs available to me before the next version comes out? This way we can start cranking on networks.

Optimize MPI / OpenCL parameters in analytics

Since we can query OpenCL device information, I'm wondering if there is an optimal way to choose the block size and kernel size based on the capacity of the OpenCL device. For example, set the block size to the number of compute units and the kernel size to the maximum workgroup size. If I understand everything correctly, that would keep the device fully utilized.

However, I wonder about the trade-off of having fewer blocks with proportionately more kernels. For example, for a device with 10 compute units and max workgroup size of 1024 (like my GPU), you could have 10 blocks with 1024 kernels, or 5 blocks with 2048 kernels, or 1 block with 10240 kernels... I think each case would keep the device fully utilized but perhaps with different ramifications for the I/O between host and device. Perhaps if we could automate that trade-off, then we would be set. This would all be very useful once we start using MPI on a cluster of nodes that may not have uniform capabilities.
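
The trade-off described above can be sketched as a tiny helper: hold the total in-flight kernel count at computeUnits * maxWorkGroupSize (the "fully utilized" point, using values a host would query via clGetDeviceInfo) and vary how that total is split across blocks. The function name is hypothetical, not part of ACE:

```cpp
#include <cassert>

// Sketch: keep total kernels = computeUnits * maxWorkGroupSize for full
// device utilization, and split that total across a given number of blocks.
// Inputs correspond to CL_DEVICE_MAX_COMPUTE_UNITS and
// CL_DEVICE_MAX_WORK_GROUP_SIZE from clGetDeviceInfo.
inline int kernelsPerBlock(int computeUnits, int maxWorkGroupSize, int blocks)
{
    return computeUnits * maxWorkGroupSize / blocks;
}
```

With 10 compute units and a max workgroup size of 1024, this reproduces the configurations above: 10 blocks of 1024, 5 blocks of 2048, or 1 block of 10240.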

Multi-processing with MPI

Josh mentioned that you guys plan to use MPI in order to harness multiple compute nodes. Not necessarily a high priority at the moment, but I was wondering if you guys have any particular plans for how you will do this, especially in conjunction with ACE?

Import/export cluster matrix analytics

I think it would be helpful to be able to import output data from KINCv1 into the new format so that we can compare it easily with the new version. The import analytic would take a directory of cluster data produced by similarity and produce both a cluster matrix and a correlation matrix, and vice versa for the export analytic. It also might give us another window into the CCM issues we've been having.

I think I will start working on this when I have time.

Update comments

Just a reminder to myself to add comments to function headers where appropriate, especially in the Similarity classes to clarify how data is moved around.

KINCv3 can't import file

When I run this command on Kamiak:

kinc-cli run import_emx --input Yest-ematrix.txt --output Yeast.emx --nan NA

I'm getting MPI errors:

 --------------------------------------------------------------------------
 It looks like MPI_INIT failed for some reason; your parallel process is
 likely to abort.  There are many reasons that a parallel process can
 fail during MPI_INIT; some of which are due to configuration or environment
 problems.  This failure appears to be an internal failure; here's some
 additional information (which may only be relevant to an Open MPI
 developer):
 
   ompi_mpi_init: ompi_rte_init failed
   --> Returned "(null)" (14) instead of "Success" (0)
 --------------------------------------------------------------------------
 *** An error occurred in MPI_Init
 *** on a NULL communicator
 *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
 ***    and potentially your MPI job)

I'm using openmpi 1.10.1

Make error messages more specific

When loading an expression file, if you don't provide --nosample=NA and missing values are present, you get this error message:

KINC[GEM-log-no.emx]:>load GEM-log-no.txt
Importing header information...
Importing gene samples[0%]...ACE Exception Caught!
What: EMatrix::InvalidFile
Location: void EMatrix::read_gene_expressions(std::ifstream&, AccelCompEng::Terminal&, const string&):428

Can you add a message to the Ace::assert function that provides a bit more information about why the file is invalid? Otherwise the user only knows that the file is invalid, not why.

open -> select -> load shortcut

To keep from needing to execute the open, select, and load commands separately, I think we need a shortcut command that can do all three at once. Especially since, if loading fails, you have to do all three again.

Provenance

Add the ability to include provenance from previous steps in the workflow.

Use sample string which embeds all clusters to reduce size

It seems that discussions keep spawning more discussions... but I've had this idea for a while and the previous issue about the similarity analytic prompted me to bring it up.

We currently represent the sample string as a list of lists denoting binary membership in a cluster, for example:

00119 10009 01009

This format is highly redundant but allows us to easily include "error" codes like 6, 7, 8, and 9. However, the clustering analytics actually use this format internally:

1200(-9)

So each number actually denotes the cluster index, with negative numbers denoting error codes. Not as readable, but much more compact. Up to this point I would just use this format and convert appropriately when saving to the CCM, but we could also just use this format in the CCM. I know you guys said you were saving samples as 4-bit values but as far as I can tell from ccmatrix.cpp they are still saved as 8-bit values, so if that's the case then this compressed format should reduce the file size by 2-4x depending on the number of clusters per gene pair. Which might be enough savings to allow us to keep our analytics separated.

We can also convert easily between the two formats for things like displaying the CCM and converting between the KINC.R format.
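
That conversion is cheap; here is a minimal sketch, assuming the compact labels use 0 = unclustered, k > 0 = member of cluster k, and negative values for error codes (so -9 expands to the '9' code in every binary mask). expandMasks is a hypothetical helper, not KINC code:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Expand a compact per-sample cluster assignment into one binary sample
// mask per cluster, as currently stored in the CCM. Error codes (negative
// labels) are written as their positive digit in every mask.
std::vector<std::string> expandMasks(const std::vector<int>& labels, int K)
{
    std::vector<std::string> masks(K);
    for (int k = 1; k <= K; ++k)
    {
        for (int label : labels)
        {
            char c;
            if (label < 0)       c = static_cast<char>('0' - label); // error code
            else if (label == k) c = '1';                            // in cluster k
            else                 c = '0';                            // not in cluster k
            masks[k - 1] += c;
        }
    }
    return masks;
}
```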

Organization of clustering and correlation analytics

As I'm looking at how to implement mixture model clustering, I'm beginning to see a multi-stage pipeline with options at several points:

*.emx ---> clustering [---> ???] ---> correlation ---> *.cmx

clustering:
- none
- k-means
- GMM

correlation:
- Pearson
- Spearman
- ...

So I'm trying to figure out how to best implement this pipeline for the long-term. It looks like KINCv1 can combine clustering with any correlation method, with minimal duplication. Perhaps we will need to create a new data type for the "augmented" expression matrix? It would parallel the PairWiseClusterList from KINCv1. Then the clustering and correlation analytics could be kept separate and the user could simply use the pipeline illustrated above.

Combine clustering and correlation analytics

Based on the discussion from #34, it looks like we may need to create an analytic which can perform both clustering and correlation at once. This is because the correlation analytics can throw away output data which does not meet a threshold (currently 0.5 by default), but the clustering analytics can't do this because they don't know the correlation. As a result, the CMX and CCM files produced from an EMX will be mismatched; the CMX will not contain data for every cluster in the CCM, and there will be no way to match the correlation data with the cluster data. So we have two options:

  1. Change the correlation analytics to save all correlations
  2. Create an analytic to perform both clustering and correlation

Since the whole reason for thresholding the correlations was to reduce file size, I feel that we have no choice but to take option 2. I'm imagining an analytic with the following arguments:

  • input EMX file
  • output CCM file
  • output CMX file
  • clustering method (none, GMM, k-means)
  • correlation method (pearson, spearman)
  • min samples
  • min expression threshold
  • min clusters
  • max clusters
  • criterion (BIC, ICL)
  • remove pre-outliers
  • remove post-outliers
  • min correlation threshold
  • max correlation threshold

So essentially an analytic which parallels KINCv1 similarity.

Any thoughts @spficklin @4ctrl-alt-del ?

GraphML output

KINC should output a GraphML version of a network that has all of the usual KINCv1 output attributes on the edges.
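
As a rough sketch of what each exported edge could look like (the key id "sc" and the single correlation attribute here are placeholders, not a settled schema):

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Minimal sketch of serializing one GraphML edge with a correlation
// attribute. A real exporter would also emit the <graphml>/<graph>
// wrapper and <key> declarations for every KINCv1 edge attribute.
std::string graphmlEdge(const std::string& source,
                        const std::string& target,
                        double correlation)
{
    std::ostringstream out;
    out << "<edge source=\"" << source
        << "\" target=\"" << target << "\">"
        << "<data key=\"sc\">" << correlation << "</data>"
        << "</edge>";
    return out.str();
}
```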

Save sample string with 4-bit elements in CCM

I know we've said that we're saving sample strings with 4-bit elements instead of 8-bit in the CCM format, but as far as I can tell, the CCM code is still saving 8-bit values. @4ctrl-alt-del can you confirm or deny? We should make sure this is implemented at some point.

Parallel random number generator for analytics that use MPI

In particular, the clustering analytics rely on RNG for initialization, so currently if MPI is used then each worker ends up using the same sequence. I handle this problem in the OpenCL kernels by seeding rand() with the work-item's global ID. I could do something similar in the MPI code by calling srand(mpi.rank()). However, I think this issue might warrant some thought.

Right now I just use rand(), which is the most basic RNG, but there are many choices: C also provides drand48(), C++11 provides <random>, and Qt provides QRandomGenerator. It may even be worthwhile to create a class in KINC (or even in ACE) which provides a unified solution to RNG for both single-core and MPI.
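
A minimal sketch of the per-worker seeding idea using C++11 <random> (std::mt19937 stands in for whichever RNG we settle on; makeWorkerRng is a hypothetical helper, not ACE or KINC API):

```cpp
#include <cassert>
#include <random>

// Give each MPI worker its own generator, seeded from a base seed plus the
// worker's rank, so ranks draw independent sequences while a fixed base
// seed keeps runs reproducible.
std::mt19937 makeWorkerRng(unsigned baseSeed, int rank)
{
    std::seed_seq seq { baseSeed, static_cast<unsigned>(rank) };
    return std::mt19937(seq);
}
```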

Singularity image for version1

It is difficult to compile KINC 1.0 in new environments; it is often necessary for the user to modify the makefile significantly. Mixmodlib is not typically installed as a software module on HPC clusters (i.e. Palmetto or Comet), and we've had issues with memory leaks in the past. I think that having a Singularity image for a stable KINC build would be helpful. It will be a large image, but it would save users a lot of headache when trying to compile KINC 1.0.

Compile issues

When trying to compile KINC v3.2.0 on kamiak I get the following errors:

/usr/bin/ld: cannot find -lmpi_cxx
/usr/bin/ld: cannot find -lkinccore

I had to edit the makefile to remove the -lmpi_cxx flag, and I had to alter the path of the build directory containing libkinccore from this:

-L/data/ficklin/software/src/KINC/src/../../build/libs

to this:

-L/data/ficklin/software/src/KINC/src/../build/libs

GSL error with RMT on yeast data

When running RMT on the yeast data I'm getting the following error:

threshold: 0.99
prune matrix: 5246
eigenvalues: 5246
unique eigenvalues: 2560
gsl: interp.c:83: ERROR: x values must be monotonically increasing

Here's my command-line:

kinc run rmt --input yeast.cmx --log yeast-RMT.log --tstart 0.99 --tstep 0.001 --tstop 0.5 --minpace 10 --maxpace 40 --bins 60

Wrong input args for KINC cause MPI to hang

I reported this verbally to @4ctrl-alt-del but I thought I'd add it here for the record. When running KINC v3 on Kamiak for the first time I requested 4 nodes with 4 GPUs and used the following submission script:

#!/bin/sh
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=4
#SBATCH --gres=gpu:tesla:4
#SBATCH --time=12:00:00    
#SBATCH --job-name=SC_similarity
#SBATCH --output=logs/02-SC_similarity.log
#SBATCH --mail-type=ALL

module load gcc/6.1.0 MPI/openmpi/3.0.0 cuda/9.1.85 qt/5.10.1 ACE/dev KINC/3.2.1

srun -v --mpi=pmi2 -l kinc run similarity --input "Yeast.emx" --clus "Yeast.ccm" --corr "Yeast.cmx" --clusmethod "gmm" --corrmethod "spearman" --minexpr -inf --minsamp 15 --minclus 1 --maxclus 5 --crit "ICL" --preout TRUE --postout TRUE --mincorr 0.5 --maxcorr 1 --ksize 4096

The job launched just fine, but I observed the following incorrect behavior:

  1. The processes on 3 of the 4 nodes properly bound to the GPUs, but they were using 0% of the GPU. Thus I had 12 processes assigned to 12 GPUs, but they were doing nothing, not even on the CPU.
  2. The processes on the master node (where the MPI master was running) were not bound to the GPU, and one of the processors was periodically busy. An 'strace' showed it seemed to just be looping and sleeping.

It turns out the problem was caused by my providing old arguments rather than the new, updated ones. When I changed the arguments to the following, it worked just fine and finished in 1 hr 13 min:

srun -v --mpi=pmi2 -l kinc run similarity --input Yeast.emx --ccm yeast.ccm --cmx yeast.cmx --clusmethod gmm --corrmethod spearman --minexpr -inf --minsamp 15 --minclus 1 --maxclus 5 --crit ICL --preout TRUE --postout TRUE --mincorr 0.5 --maxcorr 1 --ksize 4096

So, KINC or ACE needs to handle the situation gracefully when bad, unknown or incorrect arguments are provided.

Unit tests

Need to develop a suite of unit tests as our code begins to solidify. Going to try to document some tests here so that I remember to develop them later:

Data types: test writing to file / reading from file

  • CCMatrix
  • CorrelationMatrix
  • ExpressionMatrix

Algorithms: test with small data (Iris / Yeast), verify results against other libraries (gsl, sklearn)

  • GMM
  • KMeans
  • Pearson
  • Spearman

Analytics: test all applicable capabilities with small data, verify results across capabilities

  • import/export
  • similarity
  • RMT
  • extract

Random number generation in OpenCL kernels

I should preface by saying that I could be doing something wrong here, and I'm not an expert on the CCMatrix source code, but as far as I can tell, everything looks right in the clustering analytics when writing to the cluster matrix. I ran the Yeast GEM through K-means, and I know that some gene pairs produced multiple clusters, but when I view the cluster matrix I never see more than one sample mask per pair. I think either the clustering analytic isn't saving the clusters properly or the CCMatrix itself isn't displaying the clusters properly. I'd like to see if anyone else can produce the same cluster matrix as me and see for themselves:

1. Import the Yeast GEM to emx
2. Run K-means with the Yeast GEM (should take only a few minutes with a GPU)
3. Open the cluster matrix

The relevant code can be seen in KMeans::savePair() and KMeans::runReadBlock(). The GMM analytic uses the same code but K-means is much quicker to test.

Structure of sample mask for gene pair clusters

While implementing the k-means analytic I came across a design choice over the sample string used in the CCM. That is, the sample string could contain values for all samples, or it could contain values for only those sample pairs that didn't have NaNs. The latter is a little easier to implement in the k-means analytic (it's what I have now) and it would probably compress the CCM a little bit, but gene pairs in the CCM wouldn't have a constant size. I'm wondering if that would cause difficulties for other pieces of code that will use the CCM? I figure if you always have to filter out NaNs anyway, might as well compress the sample string too.

OpenCL error OUT_OF_RESOURCES in similarity with --clusmethod=gmm

While testing on Palmetto I get an OUT_OF_RESOURCES OpenCL error when I run similarity with GMMs enabled. It happens consistently on a K40 GPU; it seems to happen only sometimes on P100 GPUs, particularly if I try to do multi-GPU. It does not seem to happen on my local machine (GTX 1080Ti) which is frustrating. This problem is currently keeping me from being able to do multi-GPU tests.

This OpenCL error typically occurs when the GPU runs out of memory or something akin to a segfault happens on the GPU. Since we're not anywhere near the global memory limit I'm guessing it's the latter, so I'm going to study the GMM kernel for potential bugs. I'll update this issue when I find anything.

Mixture model data makes files too big

Mixture model data makes file sizes insanely large with large sample counts (74,000 genes and 2,000 samples produces a 6.67-terabyte file).

As a result, we need to remove the mixture model data from the correlation matrix.

Compare results of serial/OpenCL implementations for clustering analytics

Need to make sure that serial and OpenCL implementations of each clustering analytic are getting similar results (that is, compare KMeans serial to KMeans OpenCL, compare GMM serial to GMM OpenCL). Because of randomness we shouldn't expect identical results, but I would expect to see similar distributions of cluster sizes. That is, if GMM serial picks K=2 a lot but GMM OpenCL picks K=5 a lot, I think that would be a problem.

Also may want to consider how we want to seed the RNG (random number generator). Currently all implementations use a fixed seed, which means that several independent runs of GMM clustering on the same dataset should yield the exact same results. We could instead seed the RNG with something like time so that it varies across each run. Not sure how I would do that in OpenCL though.
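
One simple way to do the serial-vs-OpenCL comparison is to histogram the chosen cluster count K across gene pairs for each implementation and compare the two distributions. A sketch (histogramK is a hypothetical helper; maxK = 5 matches the --maxclus default used elsewhere in these issues):

```cpp
#include <cassert>
#include <vector>

// Tally how often each cluster count K (1..maxK) was chosen across gene
// pairs, so the serial and OpenCL runs can be compared distribution-wise.
std::vector<int> histogramK(const std::vector<int>& choices, int maxK)
{
    std::vector<int> hist(maxK + 1, 0); // hist[k] = times K == k was chosen
    for (int k : choices)
    {
        if (k >= 1 && k <= maxK) ++hist[k];
    }
    return hist;
}
```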

Add documentation

There is a lack of documentation for how to run the three steps (similarity, threshold, extract). At the very least, example command lines should be provided for the version1 branch (I can work on this).

Waiting for multiple OpenCL events in parallel

I wanted to point out that there are now a few places where multiple OpenCL events are generated and ostensibly should happen in parallel, but right now the implementation is not completely correct. In particular, runExecuteBlock in KMeans / GMM and runStartBlock in Spearman / Pearson. In all four cases, two buffers are copied in parallel, so block.event only holds the event for the second copy at the end of the function. Since the last event happens to be for the larger buffer in every case, practically speaking it probably isn't a problem, but it's still a race condition I'd rather not have.

@4ctrl-alt-del what do you think the best way would be to handle parallel events like this? Now that I'm writing about it, I could replace block.event with block.events and write a method in Block that checks if all events are done... but still I'd like to get your thoughts on this issue, make sure I'm understanding it correctly.

Unknown error with RMT

Construction of the CMX seems to have worked just fine for the Yeast data; it took 1 hour 13 minutes using 15 GPUs. However, while trying to run RMT for the yeast data I get the following error, and I'm not sure how to debug it.

$ kinc run rmt --input Yeast.cmx --log Yeast-RMT.log --tstart 0.99 --tstep 0.001 --tstop 0.5 --minpace 10 --maxpace 40 --bins 60
../../src/core/ace_dataobject.cpp:473
void Ace::DataObject::read(char*, qint64) const
SYSTEM ERROR
Failed reading from data object file: Unknown error

RMT considerations

The RMT analytic still assumes that the input correlation matrix has only one cluster per gene pair; we need to refactor it so that it reads all clusters.

Also it might be good to have RMT just write the recommended threshold to the output log instead of generating a new correlation matrix, since we're going to make another analytic for extracting the net list anyway.
