xfurna / coalapy
coalapy: A multimodal data clustering solution.
License: GNU General Public License v3.0
Quoting myself:
While experimenting with some omics data taken from TCGA, I came across cases where multiple modalities had the same relevance. The paper does not discuss such cases. I suspected that dampening the weight of modalities with the same relevance might wash out the cluster information. When I ran the algorithm on such data, keeping the value of alpha equal for the modalities with the same relevance (in other words, keeping β = 1), I found that the cluster labels were biased towards the clusters of whichever modality came first in the input.
For example, with two modalities of equal relevance, the final cluster labels agreed more with the ground truth of the modality that was fed to the algorithm as X1.
Refer to the following pseudocode:
# column normalization
for (i in columns(V)) {
  rms <- rootmeansquare(V[:,i])
  V[:,i] /= rms
}
# row normalization
V=rownormalize(Lkstar)
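The two normalization steps in the pseudocode above could be sketched in NumPy roughly as follows. The function names `column_rms_normalize` and `row_normalize` are hypothetical, and row normalization is assumed here to mean scaling each row to unit Euclidean length, which the pseudocode does not spell out.

```python
import numpy as np

def column_rms_normalize(V):
    """Divide each column of V by its root mean square (hypothetical helper)."""
    V = np.asarray(V, dtype=float)
    rms = np.sqrt(np.mean(V ** 2, axis=0))
    rms = np.where(rms == 0, 1.0, rms)  # leave all-zero columns unchanged
    return V / rms

def row_normalize(V):
    """Scale each row of V to unit Euclidean length (assumed meaning of rownormalize)."""
    V = np.asarray(V, dtype=float)
    norms = np.linalg.norm(V, axis=1, keepdims=True)
    norms = np.where(norms == 0, 1.0, norms)  # leave all-zero rows unchanged
    return V / norms

V = np.array([[3.0, 4.0], [0.0, 2.0]])
print(column_rms_normalize(V))
print(row_normalize(V))
```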
Scope of improvements in relevance computation.
Try-
def compute_alpha(...):
.
.
.
return your_alpha
your_alpha=compute_alpha(...)
new_alpha = [Alpha/np.sum(your_alpha) for Alpha in your_alpha]
.
.
.
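A runnable version of the normalization step above might look like the sketch below. The name `normalize_alpha` and the guard against a non-positive sum are assumptions added for illustration; how the raw relevance values are computed is left out, as in the snippet above.

```python
import numpy as np

def normalize_alpha(alpha):
    """Rescale relevance weights so they sum to 1 (hypothetical helper)."""
    alpha = np.asarray(alpha, dtype=float)
    total = alpha.sum()
    if total <= 0:
        raise ValueError("relevance values must have a positive sum")
    return alpha / total

# Modalities with equal raw relevance keep equal weights after rescaling.
weights = normalize_alpha([2.0, 2.0, 1.0])
print(weights, weights.sum())
```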
Gaussian kernel has issues. Look into it.
The orthonormal basis is taking a low-rank approximation of the Laplacians, which seems off-track. Look into it.
Mix Central Limit Theorem with your data.
Try manual scaling of data points:
(x - mean(x)) / sd(x)
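The scaling formula above could be applied per feature as in this sketch. It assumes rows are samples and columns are features, and it uses the sample standard deviation (`ddof=1`), which is what R's `sd()` computes; the name `standardize` is hypothetical.

```python
import numpy as np

def standardize(X):
    """Column-wise (x - mean(x)) / sd(x) for a samples-by-features array."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)          # sample standard deviation
    sd = np.where(sd == 0, 1.0, sd)     # constant features are only centered
    return (X - mean) / sd

X = np.array([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
print(standardize(X))
```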
Write tests.
Some data may need to undergo a transformation (power, log, etc.) before processing.
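As an illustration of such transformations, the sketch below shows a log transform and a signed power transform. Both helper names are hypothetical, and which transform (if any) suits a given modality is a per-dataset decision that is not settled here.

```python
import numpy as np

def log_transform(X):
    """log(1 + x); defined only for x > -1, so suited to counts and intensities."""
    return np.log1p(np.asarray(X, dtype=float))

def power_transform(X, exponent=0.5):
    """Signed power transform: keeps the sign and raises the magnitude to `exponent`."""
    X = np.asarray(X, dtype=float)
    return np.sign(X) * np.abs(X) ** exponent

print(log_transform([0.0, 9.0, 99.0]))
print(power_transform([4.0, -9.0]))
```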
The orthogonalisation process is giving wrong (though orthogonal) results.
It goes with #15, as the eigenvector under consideration has not been omitted.
recommended: 1e-10
np.partition is applied incorrectly. Fix it.
Try applying this for it.
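For reference, `np.partition(a, kth)` only guarantees that the element at index `kth` is in its sorted position, with smaller-or-equal elements before it and larger-or-equal elements after it; neither side is itself sorted. The sketch below shows one way to pull out the k smallest or k largest values on that basis. The helper names are hypothetical, and how the project actually uses `np.partition` is not shown here.

```python
import numpy as np

def k_smallest(values, k):
    """Return the k smallest entries in ascending order."""
    values = np.asarray(values).ravel()
    if not 0 < k <= values.size:
        raise ValueError("k must be between 1 and the number of values")
    part = np.partition(values, k - 1)[:k]  # first k entries are the k smallest, unordered
    return np.sort(part)

def k_largest(values, k):
    """Return the k largest entries in descending order."""
    values = np.asarray(values).ravel()
    if not 0 < k <= values.size:
        raise ValueError("k must be between 1 and the number of values")
    start = values.size - k
    part = np.partition(values, start)[start:]  # last k entries are the k largest, unordered
    return np.sort(part)[::-1]

print(k_smallest([5, 1, 4, 2, 3], 2))
print(k_largest([5, 1, 4, 2, 3], 2))
```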
Every feature in each modality can be weighted according to the structure it provides to the clusters.
Tests on RawToLap with generate_data within it. generate_data alone, however, is cool.
(base) evi1haxor@Devi1ixir > /hdd/Ztudy/BTP/code/CoALa/algo > master ● > python checks.py
Saving toy data csv w/d 3
Saving toy data csv w/d 3
recieved dframe_csv object with path- /hdd/Ztudy/BTP/code/CoALa/algo/X1.csv
calling matrix with df_csv args /hdd/Ztudy/BTP/code/CoALa/algo/X1.csv
Traceback (most recent call last):
  File "checks.py", line 69, in <module>
    RawToLap()
  File "checks.py", line 58, in RawToLap
    x1 = src.modalities.modality(path_x1, mat_type="gaussian")
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/modalities.py", line 8, in __init__
    self.W = helperFunc.get_similarity(dfhandler.dframe_csv(path, mat_type=mat_type))
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/helperFunc.py", line 13, in get_similarity
    mat = matrix(df_csv = df_csv) #dframe_csv obj
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/helperFunc.py", line 23, in matrix
    return sm.Gaussian(df_csv.df, df_csv.df.shape[1])
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/matrices.py", line 12, in Gaussian
    A[x][y]=hf.dist(x+1,y+1, df)
  File "/hdd/Ztudy/BTP/code/CoALa/algo/src/helperFunc.py", line 35, in dist
    result_vector = df[list(df.columns)[i]] - df[list(df.columns)[j]]
IndexError: list index out of range
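Judging only from the traceback, one plausible cause is the `x+1` and `y+1` offsets passed to `hf.dist`: if `x` and `y` already run over zero-based positions up to the column count, then `list(df.columns)[x+1]` steps past the last column. This is an assumption, since the loop bounds in matrices.py are not shown. A minimal sketch of a zero-based pairwise column distance follows; the function name is hypothetical and it works on a plain array (a DataFrame could be converted with `df.to_numpy()`).

```python
import numpy as np

def pairwise_column_distances(X):
    """Euclidean distance between every pair of columns, using zero-based indices."""
    X = np.asarray(X, dtype=float)
    n = X.shape[1]
    A = np.zeros((n, n))
    for x in range(n):          # x and y are valid column positions 0 .. n-1
        for y in range(n):
            diff = X[:, x] - X[:, y]
            A[x, y] = np.sqrt(np.sum(diff ** 2))
    return A

X = np.array([[0.0, 3.0], [0.0, 4.0]])
print(pairwise_column_distances(X))
```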
It seems omitting the first eigenvector would yield better results.
The Laplacians seem to have varied precisions. Round them off once and for all.
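One way to bring every Laplacian to a common precision is sketched below. The number of decimals is an assumed placeholder rather than a value taken from the project, and the symmetrization step is an extra assumption that only makes sense if the Laplacians are meant to be symmetric.

```python
import numpy as np

DECIMALS = 10  # assumed precision for illustration only

def round_laplacian(L, decimals=DECIMALS):
    """Symmetrize L, then round every entry to a fixed number of decimals."""
    L = np.asarray(L, dtype=float)
    L = (L + L.T) / 2.0           # remove tiny asymmetries from floating-point noise
    return np.round(L, decimals)  # elementwise rounding keeps the matrix symmetric

L = np.array([[1.00000000001, -0.3333333333333],
              [-0.33333333334, 1.0]])
print(round_laplacian(L))
```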
The test dataset has modalities with equal relevance, so the tests are failing due to the assignment of an inappropriate alpha.
Commit failed tests. Maybe the test script needs to be updated.
The resulting orthonorm_basis from either package are not aligning beyond the rth column.
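When comparing bases produced by two packages, it may help to remember that eigenvectors are only defined up to sign, and that vectors spanning a repeated-eigenvalue eigenspace are only defined up to a rotation within that space, so a column-by-column comparison can fail even when both bases span the same subspace. Whether that explains the mismatch here is an assumption. The sketch below compares two bases through their projection matrices instead; the name `same_subspace` is hypothetical and both inputs are assumed to have orthonormal columns.

```python
import numpy as np

def same_subspace(U1, U2, tol=1e-8):
    """True if the orthonormal columns of U1 and U2 span the same subspace."""
    U1 = np.asarray(U1, dtype=float)
    U2 = np.asarray(U2, dtype=float)
    P1 = U1 @ U1.T  # orthogonal projector onto span(U1)
    P2 = U2 @ U2.T  # orthogonal projector onto span(U2)
    return bool(np.allclose(P1, P2, atol=tol))

U1 = np.eye(3)[:, :2]
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
U2 = -(U1 @ R)  # rotated and sign-flipped, but spanning the same plane
print(same_subspace(U1, U2))
```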