
jajupmochi / graphkit-learn


A python package for graph kernels, graph edit distances, and graph pre-image problem.

Home Page: https://graphkit-learn.readthedocs.io

License: GNU General Public License v3.0

Languages: Jupyter Notebook 95.84%, Python 3.80%, Shell 0.01%, C++ 0.16%, Cython 0.19%
Topics: chemoinformatics, graph-edit-distance, graph-kernels, graph-representations, kernel-methods, machine-learning, paths, pattern-recognition, pre-image, walks

graphkit-learn's People

Contributors

bgauzere, gitter-badger, jajupmochi


graphkit-learn's Issues

function generate_median_preimages_by_class() does not work correctly sometimes

When using the function generate_median_preimages_by_class(), the results after the first class are not correct when I run the code in an Ubuntu terminal with python3.

The answer here says it has something to do with Cython and conda. I do have conda installed, so I tried a fresh virtual environment without conda and the results seem correct, but I still do not know exactly why.

Reproducing code example:

Here is the test.py:

import multiprocessing
import functools
from gklearn.utils.kernels import deltakernel, gaussiankernel, kernelproduct
from gklearn.preimage.utils import generate_median_preimages_by_class


def xp_median_preimage_1_1():
	"""xp 1_1: Letter-high, sspkernel.
	"""
	# set parameters.
	ds_name = 'Letter-high'
	mpg_options = {'fit_method': 'k-graphs',
				   'init_ecc': [3, 3, 1, 3, 3],
				   'ds_name': ds_name,
				   'parallel': True, # False
				   'time_limit_in_sec': 0,
				   'max_itrs': 100,
				   'max_itrs_without_update': 3,
				   'epsilon_residual': 0.01,
				   'epsilon_ec': 0.1,
				   'verbose': 2}
	mixkernel = functools.partial(kernelproduct, deltakernel, gaussiankernel)
	sub_kernels = {'symb': deltakernel, 'nsymb': gaussiankernel, 'mix': mixkernel}
	kernel_options = {'name': 'structuralspkernel',
					  'edge_weight': None,
					  'node_kernels': sub_kernels,
					  'edge_kernels': sub_kernels, 
					  'compute_method': 'naive',
					  'parallel': 'imap_unordered', 
# 						  'parallel': None, 
					  'n_jobs': multiprocessing.cpu_count(),
					  'normalize': True,
					  'verbose': 2}
	ged_options = {'method': 'IPFP',
				   'initialization_method': 'RANDOM', # 'NODE'
				   'initial_solutions': 1, # 1
				   'edit_cost': 'LETTER2',
				   'attr_distance': 'euclidean',
				   'ratio_runs_from_initial_solutions': 1,
				   'threads': multiprocessing.cpu_count(),
				   'init_option': 'EAGER_WITHOUT_SHUFFLED_COPIES'}
	mge_options = {'init_type': 'MEDOID',
				   'random_inits': 10,
				   'time_limit': 600,
				   'verbose': 2,
				   'refine': False}
	save_results = True
	
	# print settings.
	print('parameters:')
	print('dataset name:', ds_name)
	print('mpg_options:', mpg_options)
	print('kernel_options:', kernel_options)
	print('ged_options:', ged_options)
	print('mge_options:', mge_options)
	print('save_results:', save_results)
	
	# generate preimages.
	for fit_method in ['k-graphs', 'expert', 'random', 'random', 'random']:
		print('\n-------------------------------------')
		print('fit method:', fit_method, '\n')
		mpg_options['fit_method'] = fit_method
		generate_median_preimages_by_class(ds_name, mpg_options, kernel_options, ged_options, mge_options, save_results=save_results, save_medians=True, plot_medians=True, load_gm='auto', dir_save='../results/xp_median_preimage/')


if __name__ == "__main__":
	
	#### xp 1_1: Letter-high, sspkernel.
	xp_median_preimage_1_1()

Error message:

When I run it in an Ubuntu terminal:

python3 test.py

The output results are not correct after the first class. However, if I remove the first class before the computation, then the results for the first class of the remainder (the original second class) are correct, and the results for the new second class (the original third class) are wrong. This problem does not occur in the Spyder 3 (4.1.1) console with IPython 7.0.1, nor in a fresh virtualenv with only the required Python modules installed.

graphkit-learn/Python version information:

Python 3.6.9
graphkit-learn 0.1
Ubuntu 18.04.4 LTS

A question about normalization in model_selection_precomputed.py

Hi Linlin, I have two questions:
1. In model_selection_precomputed.py, at line 144 I see that you want to delete graphs whose kernels with themselves are zero:

remove graphs whose kernels with themselves are zeros

i.e. the rows and columns of Kmatrix at the indices where Kmatrix_diag is zero are removed.
But at line 151, the normalization again reads Kmatrix_diag and uses it to rescale Kmatrix. With my own test data (I could not get your complete code to run), I found that if a zero element remains in Kmatrix_diag after the previous step, the product here is zero and the division fails (division by zero). What is the purpose of this formula, and have you run into this bug?

normalization

      Kmatrix[i][j] /= np.sqrt(Kmatrix_diag[i] * Kmatrix_diag[j])
      Kmatrix[j][i] = Kmatrix[i][j]
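
For reference, here is a minimal sketch of this cosine-style normalization with a guard against zero diagonal entries; the guard is my own assumption about how the division by zero could be avoided, not what model_selection_precomputed.py currently does:

import numpy as np

def normalize_kernel_matrix(Kmatrix):
    # Cosine-normalize a precomputed kernel matrix in place.
    Kmatrix_diag = Kmatrix.diagonal().copy()
    n = len(Kmatrix)
    for i in range(n):
        for j in range(i, n):
            denom = np.sqrt(Kmatrix_diag[i] * Kmatrix_diag[j])
            if denom == 0:
                continue  # assumed guard: skip pairs whose self-kernel is zero
            Kmatrix[i][j] /= denom
            Kmatrix[j][i] = Kmatrix[i][j]
    return Kmatrix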

2. Also, in several .ipynb files starting with "run_" and "test_" I found a few incorrect variable names (capitalization and the like; maybe leftovers from different versions, or maybe I just missed something). Could you tell me which parts of this code base are current and usable, and which are old and deprecated? During testing I ran into quite a few problems with the .ipynb files under the notebooks directory. Any guidance would be much appreciated, thanks!

Citing

How can I cite the library?

Key Error gklearn.kernels.treeletKernel

For some graphs, gklearn throws a KeyError when generating canonical keys. This does not happen for all graphs; I assume it is limited to this pattern. Help would be highly appreciated!

File ~\Anaconda3\lib\site-packages\gklearn\kernels\treeletKernel.py:128, in treeletkernel(sub_kernel, node_label, edge_label, parallel, n_jobs, chunksize, verbose, *args)
126 canonkeys = []
127 for g in (tqdm(Gn, desc='getting canonkeys', file=sys.stdout) if verbose else Gn):
--> 128 canonkeys.append(get_canonkeys(g, node_label, edge_label, labeled,
129 ds_attrs['is_directed']))
131 # compute kernels.
132 from itertools import combinations_with_replacement

File ~\Anaconda3\lib\site-packages\gklearn\kernels\treeletKernel.py:324, in get_canonkeys(G, node_label, edge_label, labeled, is_directed)
322 treelet = []
323 for pattern in patterns[str(i) + 'star']:
--> 324 canonlist = [tuple((G.nodes[leaf][node_label],
325 G[leaf][pattern[0]][edge_label])) for leaf in pattern[1:]]
326 canonlist.sort()
327 canonlist = list(chain.from_iterable(canonlist))

File ~\Anaconda3\lib\site-packages\gklearn\kernels\treeletKernel.py:325, in <listcomp>(.0)
322 treelet = []
323 for pattern in patterns[str(i) + 'star']:
324 canonlist = [tuple((G.nodes[leaf][node_label],
--> 325 G[leaf][pattern[0]][edge_label])) for leaf in pattern[1:]]
326 canonlist.sort()
327 canonlist = list(chain.from_iterable(canonlist))

File ~\Anaconda3\lib\site-packages\networkx\classes\coreviews.py:51, in AtlasView.__getitem__(self, key)
50 def __getitem__(self, key):
---> 51 return self._atlas[key]

KeyError: 1
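
For context, the failing line indexes G[leaf][pattern[0]][edge_label], and in networkx G[u][v] raises a KeyError whenever u and v are not adjacent, which appears to be what happens here. A minimal sketch with a toy graph of my own (not one of the failing graphs):

import networkx as nx

G = nx.Graph()
G.add_nodes_from([0, 1, 2], atom='C')
G.add_edge(0, 2, bond_type='1')

print(G[0][2]['bond_type'])      # works: nodes 0 and 2 are adjacent
try:
    G[0][1]['bond_type']         # nodes 0 and 1 are not adjacent
except KeyError as err:
    print('KeyError:', err)      # raised by AtlasView.__getitem__, as in the traceback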

Weisfeiler_Lehman graph kernel

Hello, I am just getting started with graphs. When using the Weisfeiler-Lehman graph kernel, must the adjacency matrix be symmetric (i.e. must the graph be undirected)?
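
Independent of gklearn's Weisfeiler-Lehman implementation, here is a small sketch of what symmetry means for the adjacency matrix, and how a directed graph can be symmetrized with plain networkx (the example graph is my own):

import networkx as nx
import numpy as np

DG = nx.DiGraph([(0, 1), (1, 2)])   # directed graph: adjacency matrix is not symmetric
A = nx.to_numpy_array(DG)
print(np.allclose(A, A.T))          # False

UG = DG.to_undirected()             # drop edge directions to obtain a symmetric matrix
B = nx.to_numpy_array(UG)
print(np.allclose(B, B.T))          # True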

Request for the atom types labels of NCI1 datasets

I have searched almost all of the literature, but have not found the atom types corresponding to the node labels of the NCI1 dataset. We know that NCI1 consists of molecular graphs whose atoms fall into 37 categories, just as the molecules in the MUTAG dataset are composed of atoms from 7 categories: {0: 'C', 1: 'N', 2: 'O', 3: 'F', 4: 'I', 5: 'Cl', 6: 'Br'}. I would sincerely appreciate any reply!

In ./notebooks/utils/plot_all_graphs.py, it plots graphs in MUTAG dataset with node(atom) labels like this:

# line [19 - 40]
    dataset, y = loadDataset("../../datasets/MUTAG/MUTAG_A.txt")
    for idx in [6]: #[65]:#
        G = dataset[idx]
        ncolors= []
        for node in G.nodes:
            if G.nodes[node]['atom'] == '0':
                G.nodes[node]['atom'] = 'C'
                ncolors.append('#bd3182')
            elif G.nodes[node]['atom'] == '1':
                G.nodes[node]['atom'] = 'N'
                ncolors.append('#3182bd')
            elif G.nodes[node]['atom'] == '2':
                G.nodes[node]['atom'] = 'O'
                ncolors.append('#82bd31')
            elif G.nodes[node]['atom'] == '3':
                G.nodes[node]['atom'] = 'F'
            elif G.nodes[node]['atom'] == '4':
                G.nodes[node]['atom'] = 'I'
            elif G.nodes[node]['atom'] == '5':
                G.nodes[node]['atom'] = 'Cl'
            elif G.nodes[node]['atom'] == '6':
                G.nodes[node]['atom'] = 'Br'
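
For comparison, the same relabeling can be written with a lookup table; the mapping below is the MUTAG one quoted above, and the analogous 37-entry table for NCI1 is exactly what this issue asks for. The loadDataset import path is taken from graphfiles.py as discussed in the issue below, so treat it as an assumption for gklearn 0.1:

from gklearn.utils.graphfiles import loadDataset  # import path assumed from graphfiles.py

# MUTAG node labels -> atom symbols, as quoted from plot_all_graphs.py above.
MUTAG_ATOMS = {'0': 'C', '1': 'N', '2': 'O', '3': 'F', '4': 'I', '5': 'Cl', '6': 'Br'}

dataset, y = loadDataset("../../datasets/MUTAG/MUTAG_A.txt")
for G in dataset:
    for node in G.nodes:
        G.nodes[node]['atom'] = MUTAG_ATOMS[G.nodes[node]['atom']]
# The missing piece requested here is the corresponding 37-entry table for NCI1.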

Is there any chance to add more node_label and edge_label?

Hi, Lin. In the file 'graphfiles.py' you use the 'loadDataset' function to read a dataset and build graphs. I found that all the functions for the different dataset formats, like 'loadCT', 'loadGXL', 'loadSDF', etc., usually add only one node label, 'atom', and one edge label, 'bond_type'. Is there any chance to add more node labels and edge labels to make the classification more accurate?
Thanks, man.
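
Until such an option exists, extra labels can be attached by hand to the networkx graphs returned by loadDataset; a minimal sketch, where the attribute names 'charge' and 'bond_order' are placeholders of my own, not attributes the loaders provide:

from gklearn.utils.graphfiles import loadDataset  # loader discussed above

dataset, y = loadDataset("../../datasets/MUTAG/MUTAG_A.txt")
for G in dataset:
    for n in G.nodes:
        G.nodes[n]['charge'] = 0            # hypothetical extra node label
    for u, v in G.edges:
        G.edges[u, v]['bond_order'] = 1.0   # hypothetical extra edge label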
