xfurna / coalapy

coalapy: A multimodal data clustering solution.

License: GNU General Public License v3.0

Python 100.00%

coalapy's Issues

Identical relevance of multiple modalities

Quoting myself:

I was experimenting with some omics data taken from TCGA and came across cases where multiple modalities had the same relevance. The paper does not discuss such cases. I thought that dampening the weight of modalities with the same relevance might wash out the cluster information. When I ran the algorithm on such data, keeping the value of alpha equal for the modalities with the same relevance (in other words, keeping β = 1), I found that the cluster labels were biased towards the clusters of whichever modality came first in the input order.
For example, with two modalities of equal relevance, the final cluster labels share more labels with the ground truth of the modality that was fed to the algorithm as X1.
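A quick way to spot such ties before running the algorithm is sketched below. The function name `tied_groups` and the tolerances are assumptions made for illustration; they are not part of coalapy.

```python
import numpy as np

def tied_groups(relevance, rtol=1e-9, atol=1e-12):
    """Return groups of modality indices whose relevance values are equal.

    Hypothetical helper: only groups with more than one member are returned.
    """
    relevance = np.asarray(relevance, dtype=float)
    if relevance.size == 0:
        return []
    # Sort by decreasing relevance; a stable sort keeps the input order for ties.
    order = np.argsort(-relevance, kind="stable")
    groups = [[int(order[0])]]
    for idx in order[1:]:
        previous = groups[-1][-1]
        if np.isclose(relevance[idx], relevance[previous], rtol=rtol, atol=atol):
            groups[-1].append(int(idx))
        else:
            groups.append([int(idx)])
    return [group for group in groups if len(group) > 1]

# Modalities 0 and 2 share the same relevance.
ties = tied_groups([0.8, 0.5, 0.8, 0.3])
```

If `ties` is non-empty, the run can be repeated with the tied modalities in a different input order to see how much the final labels depend on that order.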

[FEATURE] Normalize the final matrix V

Refer to the following pseudocode:

# column normalization: divide each column of V by its own root mean square
for (i in columns(V)) {
    rms <- rootmeansquare(V[:, i])
    V[:, i] /= rms
}
# row normalization
V = rownormalize(Lkstar)
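The two normalizations could look like this in NumPy. This is a minimal sketch: the function names are hypothetical, and treating `rownormalize` as scaling every row to unit Euclidean length is an assumption, since the issue does not define it.

```python
import numpy as np

def normalize_columns_rms(V):
    """Divide each column of V by that column's root mean square."""
    V = np.asarray(V, dtype=float)
    rms = np.sqrt(np.mean(V ** 2, axis=0))
    # Leave all-zero columns untouched instead of dividing by zero.
    rms = np.where(rms == 0, 1.0, rms)
    return V / rms

def normalize_rows_l2(V):
    """Scale each row of V to unit Euclidean norm (assumed meaning of rownormalize)."""
    V = np.asarray(V, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)
    return V / norms

V = np.array([[3.0, 4.0],
              [0.0, 4.0]])
V_cols = normalize_columns_rms(V)
V_rows = normalize_rows_l2(V)
```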

[BUG] Sum of alpha not 1

There is scope for improvement in the relevance computation.
Try:

def compute_alpha(...):
    .
    .
    .
    return your_alpha
your_alpha = compute_alpha(...)
# rescale so that the entries of alpha sum to 1
new_alpha = [alpha / np.sum(your_alpha) for alpha in your_alpha]
.
.
.

Similarity matrix does not seem to align

Gaussian kernel has issues. Look into it.
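For comparison while debugging, here is a reference Gaussian (RBF) similarity matrix in plain NumPy. It is a sketch under stated assumptions: rows of `X` are samples, columns are features, and the bandwidth `sigma` defaults to 1.0. The function name is hypothetical and this is not coalapy's implementation.

```python
import numpy as np

def gaussian_similarity(X, sigma=1.0):
    """Return the n_samples x n_samples matrix W[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    X = np.asarray(X, dtype=float)
    sq_norms = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances between all pairs of rows.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    # Clamp small negative values caused by floating point cancellation.
    sq_dists = np.maximum(sq_dists, 0.0)
    np.fill_diagonal(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [0.0, 2.0]])
W = gaussian_similarity(X)
```

A correct result is square with one row per sample, symmetric, and has ones on the diagonal; checking those three properties against the package's output is a cheap first test.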

[BUG] orthonormal basis construction

The orthonormal basis construction takes a low-rank approximation of the Laplacians, which seems off-track. Look into it.

[FEATURE] Enhance data (CLT)

Bring the Central Limit Theorem into the data preprocessing.
Try manual scaling of the data points:

(x - mean(x)) / sd(x)
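The scaling above, applied to every feature column, might be written as follows. The function name is hypothetical, and `ddof=1` is an assumption chosen to match the sample standard deviation that `sd(x)` usually denotes.

```python
import numpy as np

def standardize(X, ddof=1):
    """Z-score each column: (x - mean(x)) / sd(x)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=ddof)
    # Constant columns have sd == 0; leave them centered but unscaled.
    sd = np.where(sd == 0, 1.0, sd)
    return (X - mean) / sd

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
Z = standardize(X)
```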

[FEATURE] improve codecov

Write tests.

[BUG] write transformations

Some data may need to undergo a transformation (such as power, log, etc.) before processing.
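Two such transformations are sketched below. The function names and the default exponent are assumptions for illustration; which transformation suits a given modality is left open by the issue.

```python
import numpy as np

def log_transform(X):
    """Apply log(1 + x) elementwise; defined here for non-negative data only."""
    X = np.asarray(X, dtype=float)
    if np.any(X < 0):
        raise ValueError("log_transform expects non-negative values")
    return np.log1p(X)

def power_transform(X, exponent=0.5):
    """Apply a sign-preserving power: sign(x) * |x| ** exponent."""
    X = np.asarray(X, dtype=float)
    return np.sign(X) * np.abs(X) ** exponent

logged = log_transform([0.0, np.e - 1.0])
rooted = power_transform([-4.0, 9.0])
```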

[BUG] check orthogonalisation algorithm

The orthogonalisation process gives wrong (though orthogonal) results.
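A result can be orthonormal and still wrong if it spans a different subspace than the input. The checks below test both properties separately; the helper names are hypothetical and `np.linalg.qr` serves only as a reference orthogonalisation, not as the algorithm coalapy uses.

```python
import numpy as np

def is_orthonormal(Q, tol=1e-10):
    """True if the columns of Q are orthonormal, i.e. Q^T Q = I."""
    Q = np.asarray(Q, dtype=float)
    return bool(np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=tol))

def spans_same_space(A, Q, tol=1e-10):
    """True if orthonormal Q spans exactly the column space of A."""
    A = np.asarray(A, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Every column of A must be reproduced by projecting onto span(Q) ...
    residual = A - Q @ (Q.T @ A)
    inside = np.allclose(residual, 0.0, atol=tol)
    # ... and Q must not contain extra directions.
    return bool(inside and np.linalg.matrix_rank(A) == Q.shape[1])

A = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [0.0, 0.0]])
Q_ref, _ = np.linalg.qr(A)          # reference orthonormal basis of A's columns
Q_bad = np.array([[0.0, 0.0],
                  [1.0, 0.0],
                  [0.0, 1.0]])      # orthonormal, but spans a different plane
```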

[BUG] kmeans is occurring with first eigenvector

This goes with #15, as the eigenvector under consideration has not been omitted.
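Omitting the trivial eigenvector before clustering might look like the sketch below. It assumes the unnormalized Laplacian L = D - W of a connected graph, whose smallest eigenvalue is 0 with a constant eigenvector. Which end of the spectrum is trivial depends on the Laplacian coalapy actually uses, so treat the slicing as an assumption. The function name is hypothetical.

```python
import numpy as np

def spectral_embedding(W, k):
    """Return k eigenvectors of L = D - W, skipping the first (constant) one."""
    W = np.asarray(W, dtype=float)
    L = np.diag(W.sum(axis=1)) - W
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigvecs = np.linalg.eigh(L)
    # Column 0 belongs to eigenvalue 0 and carries no cluster information.
    return eigvecs[:, 1:k + 1]

# Two tight pairs {0, 1} and {2, 3}, weakly linked to each other.
W = np.array([[0.00, 1.00, 0.01, 0.00],
              [1.00, 0.00, 0.00, 0.01],
              [0.01, 0.00, 0.00, 1.00],
              [0.00, 0.01, 1.00, 0.00]])
embedding = spectral_embedding(W, k=1)
```

The rows of `embedding` are what k-means would then cluster; in this toy graph the sign of the single retained eigenvector already separates the two pairs.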

round off at some decimal place

recommended: 1e-10
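One reading of this recommendation is to round to 10 decimal places so that values differing only by floating point noise compare equal. A minimal sketch under that assumption, with a hypothetical helper name:

```python
import numpy as np

def round_off(M, decimals=10):
    """Round every entry to the given number of decimal places."""
    return np.round(np.asarray(M, dtype=float), decimals=decimals)

noisy = np.array([1.0 + 1e-13, 1.0 - 1e-13, 1e-16])
clean = round_off(noisy)
```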

[BUG] list of relevance unaligned

np.partition is applied incorrectly. Fix it.
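The issue does not say how the call is misapplied. A common pitfall is treating the output of `np.partition` as fully sorted, or losing track of which modality each value belongs to. `np.partition(a, kth)` only guarantees that the element at index `kth` is in its sorted position; the order on either side is unspecified. When relevance values must stay aligned with modality indices, `np.argsort` is the safer tool, as in this sketch with made-up values:

```python
import numpy as np

relevance = np.array([0.2, 0.9, 0.5, 0.7])   # illustrative values only

# Indices of modalities from most to least relevant.
order = np.argsort(-relevance, kind="stable")
ranked = relevance[order]

# np.partition is still fine for picking a single order statistic:
# index 1 holds the second-smallest value after partitioning.
second_smallest = np.partition(relevance, 1)[1]
```

Here `order` can be used to reorder the modalities themselves, so values and modalities stay paired.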

[BUG] handle the case of eigenvalue multiplicity > 1

Try applying this for it.

[FEATURE] Weight every feature

Every feature in each modality can be weighted according to the structure it provides to the clusters.

IndexError: list index out of range

The error occurs while running tests on RawToLap with generate_data called inside it.
generate_data on its own, however, works fine.
Verbose output:

(base) evi1haxor@Devi1ixir > /hdd/Ztudy/BTP/code/CoALa/algo > master ● > python checks.py
Saving toy data csv w/d 3
Saving toy data csv w/d 3
recieved dframe_csv object with path- /hdd/Ztudy/BTP/code/CoALa/algo/X1.csv
calling matrix with df_csv args /hdd/Ztudy/BTP/code/CoALa/algo/X1.csv
Traceback (most recent call last):
  File "checks.py", line 69, in <module>
    RawToLap()
  File "checks.py", line 58, in RawToLap
    x1 = src.modalities.modality(path_x1, mat_type="gaussian")
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/modalities.py", line 8, in __init__
    self.W = helperFunc.get_similarity(dfhandler.dframe_csv(path, mat_type=mat_type))
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/helperFunc.py", line 13, in get_similarity
    mat = matrix(df_csv = df_csv) #dframe_csv obj
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/helperFunc.py", line 23, in matrix
    return sm.Gaussian(df_csv.df, df_csv.df.shape[1])
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/matrices.py", line 12, in Gaussian
    A[x][y]=hf.dist(x+1,y+1, df)
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/helperFunc.py", line 35, in dist
    result_vector = df[list(df.columns)[i]] - df[list(df.columns)[j]]
IndexError: list index out of range
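One plausible reading of the traceback is an off-by-one: `Gaussian` calls `hf.dist(x+1, y+1, df)`, and `dist` indexes `list(df.columns)` with those values. If `x` and `y` run over `range(n)` with `n = df.shape[1]` (an assumption, since the loop itself is not shown), the last iteration asks for column `n`, which does not exist. The pure-Python sketch below reproduces that pattern with a hypothetical column list:

```python
columns = ["f0", "f1", "f2"]   # stand-in for list(df.columns)
n = len(columns)

def shifted_lookup(columns, n):
    """Mimic the suspected pattern: index with x + 1 and y + 1."""
    try:
        for x in range(n):
            for y in range(n):
                _ = (columns[x + 1], columns[y + 1])
    except IndexError as exc:
        return str(exc)
    return None

def unshifted_lookup(columns, n):
    """Index with x and y directly; every pair is in range."""
    return [(columns[x], columns[y]) for x in range(n) for y in range(n)]

error_message = shifted_lookup(columns, n)
pairs = unshifted_lookup(columns, n)
```

If the `+1` is meant to skip a leading index column, the loop bound would need to shrink by one instead of dropping the offset.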