hin-datasets-for-recommendation-and-network-embedding's Introduction

Dataset Statistics

MovieLens

(Containing rating and timestamp information)

(Note: We use Pearson's correlation coefficient to measure similarity in the KNN algorithm)

(Source : https://grouplens.org/datasets/movielens/)

Entity Statistics

Entity #Entity
User 943
Age 8
Occupation 21
Movie 1,682
Genre 18

Relation Statistics

Relation #Relation
User - Movie 100,000
User - User (KNN) 47,150
User - Age 943
User - Occupation 943
Movie - Movie (KNN) 82,798
Movie - Genre 2,861
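
The KNN relations above are described only as using Pearson's coefficient; the README does not say which features are compared or how unrated items are handled. As a minimal sketch, assuming each user is represented by a row of a dense rating matrix with 0 for unrated movies (a simplification, not necessarily the authors' preprocessing), the neighbor lists could be built as follows. The function name `pearson_knn` and the toy matrix are hypothetical. One of the issues further down reports 50 neighbors per user, which is consistent with 943 × 50 = 47,150 User - User (KNN) relations.

```python
import numpy as np


def pearson_knn(ratings: np.ndarray, k: int) -> np.ndarray:
    """For each row, return the indices of the k most Pearson-similar other rows.

    Assumption: unrated entries are stored as 0 and take part in the
    correlation. The README does not document this choice.
    """
    # np.corrcoef treats every row as one variable and returns a row-by-row matrix.
    sim = np.corrcoef(ratings)
    # Rows with constant values have undefined correlation; rank them last.
    sim = np.nan_to_num(sim, nan=-1.0)
    # A row must never be selected as its own neighbor.
    np.fill_diagonal(sim, -np.inf)
    # Sort by descending similarity and keep the first k columns.
    return np.argsort(-sim, axis=1)[:, :k]


# Toy data (made up): users 0 and 1 share tastes, as do users 2 and 3.
toy_ratings = np.array(
    [
        [5, 4, 0, 1],
        [4, 5, 0, 1],
        [1, 0, 5, 4],
        [0, 1, 4, 5],
    ],
    dtype=float,
)
toy_neighbors = pearson_knn(toy_ratings, k=1)

# Each user contributes k directed edges, so the relation count is n_users * k.
toy_edge_count = toy_neighbors.size
```

Passing the item-by-user transpose of the same matrix would give item-item neighbors in the same way. The Movie - Movie (KNN) count of 82,798 is below 1,682 × 50, so some additional filtering was presumably applied that this sketch does not reproduce.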

Douban Movie

(Containing rating information)

Entity Statistics

Entity #Entity
User 13,367
Movie 12,677
Group 2,753
Actor 6,311
Director 2,449
Type 38

Relation Statistics

Relation #Relation
User - Movie 1,068,278
User - Group 570,047
User - User 4,085
Movie - Actor 33,587
Movie - Director 11,276
Movie - Type 27,668

Douban Book

(Containing rating information)

Entity Statistics

Entity #Entity
User 13,024
Book 22,347
Group 2,936
Location 38
Author 10,805
Publisher 1,815
Year 64

Relation Statistics

Relation #Relation
User - Book 792,062
User - Group 1,189,271
User - User 169,150
User - Location 10,592
Book - Author 21,907
Book - Publisher 21,773
Book - Year 21,192

Amazon

(Containing rating and timestamp information)

(Source : http://jmcauley.ucsd.edu/data/amazon/)

Entity Statistics

Entity #Entity
User 6,170
Item 2,753
View 3,857
Category 22
Brand 334

Relation Statistics

Relation #Relation
User - Item 195,791
Item - View 5,694
Item - Category 5,508
Item - Brand 2,753

LastFM

(Note: We use Pearson's correlation coefficient to measure similarity in the KNN algorithm)

(Source : https://grouplens.org/datasets/hetrec-2011/)

Entity Statistics

Entity #Entity
User 1,892
Artist 17,632
Tag 11,945

Relation Statistics

Relation #Relation
User - Artist 92,834
User - User (Original) 25,434
User - User (KNN) 18,802
Artist - Artist (KNN) 153,399
Artist - Tag 184,941

Yelp

(Containing rating information)

Entity Statistics

Entity #Entity
User 16,239
Business 14,284
Compliment 11
Category 511
City 47

Relation Statistics

Relation #Relation
User - Business 198,397
User - User 158,590
User - Compliment 76,875
Business - City 14,267
Business - Category 40,009

Yelp-2

(Containing rating information)

Entity Statistics

Entity #Entity
User 1,286
Business 2,614
Service 2
Star level 9
Reservation 2
Category 3

Relation Statistics

Relation #Relation
User - Business 30,838
Business - Service 2,614
Business - Star level 2,614
Business - Reservation 2,614
Business - Category 2,614

DBLP

(Note: author_map_id.dat maps each author id to a unique id)

Entity Statistics

Entity #Entity
Author 14,475
Paper 14,376
Author_label 4
Conference 20
Type 8,920

Relation Statistics

Relation #Relation
Author - Label 4,057
Paper - Author 41,794
Paper - Conference 14,376
Paper - Type 114,624

Aminer

(Note: author_map_id.dat maps each author id to a unique id)

Entity Statistics

Entity #Entity
Author 164,472
Paper 127,623
Paper_label 10
Conference 101
Reference 147,251

Relation Statistics

Relation #Relation
Paper - Label 127,623
Paper - Author 355,072
Paper - Conference 127,632
Paper - Reference 392,519

hin-datasets-for-recommendation-and-network-embedding's People

Contributors

librahu

hin-datasets-for-recommendation-and-network-embedding's Issues

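About the Aminer dataset

Hello, I would like to ask whether the Aminer dataset in this heterogeneous network collection can be used for author name disambiguation.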

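A question about the Aminer dataset

Hello, and thank you for sharing this dataset.
I am working on time-based relation strength mining in HINs. While analyzing the Aminer dataset, I found that the number of papers per year is very unbalanced: up to and including year 23, each year has between 2,000 and 9,000 papers, whereas each year after 23 has only about 10 to 1,000. What is the reason for this?
Thank you very much for sharing. I look forward to your reply.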

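A question about the Yelp_2 dataset

In the Yelp dataset, the business_user.txt file has three columns, which is quite different from the other data files in that folder. What does the third column refer to? Thank you!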
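
The README does not document the file layout, so the meaning of the third column in business_user.txt remains unanswered here. As a hedged sketch, assuming the relation files are plain text with whitespace- or tab-separated columns (an assumption, not confirmed by the source), a tolerant reader can keep the first two columns as the node pair and carry any further columns along untouched. The function name `parse_relation_lines` and the sample rows are hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_relation_lines(lines: Iterable[str]) -> List[Tuple[str, str, List[str]]]:
    """Parse relation rows into (source id, target id, extra columns).

    Assumption: columns are separated by whitespace or tabs. Any column
    beyond the first two is returned as-is, without interpretation.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        rows.append((fields[0], fields[1], fields[2:]))
    return rows


# Made-up sample: one three-column row, one two-column row, one blank line.
sample_lines = ["1\t10\t5", "2\t11", ""]
sample_rows = parse_relation_lines(sample_lines)
```

To read a real file, the same function can be applied to an open file handle, for example `parse_relation_lines(open(path, encoding="utf-8"))`, where `path` points at the downloaded data.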

About relations

For the datasets you list that contain trust information, is the trust relation one-way or two-way?
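
The README does not state whether the User - User relations are directed. One way to find out from the data itself is to check whether every edge also appears in reverse. The sketch below assumes the relation has already been loaded as a list of (source, target) pairs; the function name `asymmetric_edges` and the toy edges are hypothetical.

```python
from typing import Hashable, Iterable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def asymmetric_edges(edges: Iterable[Edge]) -> List[Edge]:
    """Return the edges whose reverse direction is missing from the list."""
    edge_set = set(edges)
    return sorted((u, v) for (u, v) in edge_set if (v, u) not in edge_set)


# Toy data (made up): 1 <-> 2 is mutual, 1 -> 3 has no reverse edge.
toy_edges = [(1, 2), (2, 1), (1, 3)]
missing_reverse = asymmetric_edges(toy_edges)
```

An empty result means the relation is stored symmetrically (two-way); a non-empty result means at least some links are one-way.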

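About the Amazon dataset

Hello, I would like to ask about the item_view.dat file in the Amazon dataset: which piece of information in the original dataset does "view" refer to? From what I have found, the item descriptions in the Amazon dataset do not include such an attribute. If you don't mind, could you upload the code used to process the dataset?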

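Two questions about the Aminer and DBLP datasets

First of all, thank you very much for publishing these heterogeneous information network datasets. I have recently been doing related research and have looked at your datasets, and I have two questions:
First, in the Aminer dataset, I do not understand what the paper_type.txt file means. Why are there so many paper types?
Second, in the DBLP dataset, author_label.dat contains the authors' labels. Do these refer to the authors' research areas?

I look forward to your answer.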

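About KNN

Hello,
Taking the MovieLens data as an example, the user-user relation is built by using the KNN algorithm to find, for each user, the 50 most similar nearest neighbors. Which user features are used when computing distances in KNN?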

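Dataset problem

I have tried the datasets here and keep feeling that something may be wrong with them: multiplying the matrices along different meta-paths gives very dense results.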
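
Dense meta-path products do not necessarily indicate a problem with the data. As an illustrative sketch on made-up toy data (not taken from any of the datasets), a single popular item is enough to connect every pair of users along a User - Item - User meta-path, so the product matrix can be far denser than the relation it is built from. The function name `density` is hypothetical.

```python
import numpy as np


def density(matrix: np.ndarray) -> float:
    """Fraction of non-zero entries in a matrix."""
    return np.count_nonzero(matrix) / matrix.size


# Toy User - Item adjacency (made up): item 0 was rated by every user.
user_item = np.array(
    [
        [1, 1, 0],
        [1, 0, 0],
        [1, 0, 1],
        [1, 0, 0],
    ]
)

# Entry (i, j) counts the User - Item - User path instances between users i and j.
user_item_user = user_item @ user_item.T

input_density = density(user_item)        # half of the entries are non-zero
product_density = density(user_item_user)  # every pair of users is connected
```

For the real datasets a sparse matrix type would be preferable to dense arrays, and normalizing or thresholding the path counts is a common way to keep the resulting graph manageable.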

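A question about the Yelp dataset

Thank you very much for publicly sharing these datasets.
While using the Yelp dataset (the version with 30k nodes), I found that the readme says there are 47 categories, but the business-category file provided contains 511 categories, so the two do not match. Is the business-category file actually the business-city file, and the business-city file actually the business-category file? I hope you can answer my question. Thank you.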

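About the raw data of the Douban datasets

Thank you very much for publishing these heterogeneous information network datasets.
Could you upload the raw data of the Douban Movie and Douban Book datasets, such as movie titles, actor names, or book titles?
Many thanks!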
