Information-Retrieval

Information Retrieval and Data Mining

submit

Experiment 1 (2021.9.14 -- 2021.10.13)

Build an inverted index on the tweets dataset.

Implement the Boolean Retrieval Model and test it with the TREC 2014 test topics: https://trec.nist.gov/data/microblog/2014/topics.desc.MB171-225.txt

Boolean Retrieval Model:

Input: a query (like Ron and Weasley)

Output: print the qualified tweets.

Support and, or, and not (query optimization is optional).

Apply the same preprocessing to tweets and queries.
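The Experiment 1 requirements can be sketched as follows. This is a minimal illustration, not the repository's implementation: the function names (`preprocess`, `build_index`, `boolean_query`), the regex tokenizer, and the strict left-to-right evaluation without parentheses or operator precedence are all assumptions made for the example.

```python
import re
from collections import defaultdict

def preprocess(text):
    """Lowercase and split into alphanumeric tokens (assumed normalizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(tweets):
    """Map each term to the set of tweet IDs that contain it."""
    index = defaultdict(set)
    for tweet_id, text in tweets.items():
        for term in preprocess(text):
            index[term].add(tweet_id)
    return index

def boolean_query(query, index, all_ids):
    """Evaluate a flat query left to right, e.g. 'Ron and not Weasley'."""
    result = None      # running result set
    op = "and"         # operator applied to the next term
    negate = False     # whether the next term is preceded by 'not'
    for tok in query.lower().split():
        if tok in ("and", "or"):
            op = tok
        elif tok == "not":
            negate = True
        else:
            terms = preprocess(tok)  # same preprocessing as the tweets
            if not terms:
                continue
            postings = set(index.get(terms[0], set()))
            if negate:
                postings = all_ids - postings
                negate = False
            if result is None:
                result = postings
            elif op == "and":
                result &= postings
            else:
                result |= postings
            op = "and"
    return result if result is not None else set()
```

A possible usage, with made-up tweets:

```python
tweets = {1: "Ron Weasley is here", 2: "Ron and Harry", 3: "Hermione Granger"}
index = build_index(tweets)
print(boolean_query("Ron and Weasley", index, set(tweets)))
```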

Experiment 2 (2021.10.13 -- 2021.11.9)

Building on Experiment 1, implement a basic Ranked retrieval model.

  • Input: a query (like Ron and Weasley)

  • Output: return the top K (e.g., K=100) relevant tweets.

Use the SMART notation lnc.ltn:

Document: logarithmic tf (l as first character), no idf (n), and cosine normalization (c)

Query: logarithmic tf (l in leftmost column), idf (t in second column), no normalization (n)
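The lnc.ltn weighting above can be sketched like this. It is an illustrative example under stated assumptions: the function names are hypothetical, documents are passed as already-tokenized lists, the logarithm is base 10, and document weights are recomputed per query rather than read from an index.

```python
import math
from collections import Counter

def lnc_doc_weights(tokens):
    """lnc: logarithmic tf, no idf, cosine-normalized."""
    raw = {t: 1 + math.log10(c) for t, c in Counter(tokens).items()}
    norm = math.sqrt(sum(w * w for w in raw.values()))
    return {t: w / norm for t, w in raw.items()}

def ltn_query_weights(tokens, df, n_docs):
    """ltn: logarithmic tf times idf, no normalization."""
    return {
        t: (1 + math.log10(c)) * math.log10(n_docs / df[t])
        for t, c in Counter(tokens).items()
        if df.get(t, 0) > 0  # terms absent from the collection contribute nothing
    }

def rank(query_tokens, docs, k=100):
    """Return the top-k (doc_id, score) pairs by dot product of the weights."""
    df = Counter(t for tokens in docs.values() for t in set(tokens))
    q = ltn_query_weights(query_tokens, df, len(docs))
    scores = {}
    for doc_id, tokens in docs.items():
        d = lnc_doc_weights(tokens)
        score = sum(w * d.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scores[doc_id] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:k]
```

Because the query side is not normalized, scores are comparable across documents for one query but not across different queries.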

Improve the inverted index:

Store the DF of each term in the Dictionary.

Store the TF of the term in each doc in the posting list, as (docID, tf) pairs.
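One possible layout for the improved index is sketched below. The name `build_tf_index` and the choice of two plain dicts are assumptions for illustration; the repository may organize the data differently.

```python
from collections import Counter, defaultdict

def build_tf_index(docs):
    """Build a dictionary (term -> DF) and posting lists (term -> [(docID, tf)]).

    `docs` maps each docID to its list of preprocessed tokens.
    """
    postings = defaultdict(list)
    for doc_id in sorted(docs):  # keep posting lists ordered by docID
        for term, tf in Counter(docs[doc_id]).items():
            postings[term].append((doc_id, tf))
    # DF is the number of documents containing the term, i.e. the posting list length.
    dictionary = {term: len(plist) for term, plist in postings.items()}
    return dictionary, dict(postings)
```

Storing DF alongside the term avoids recounting it at query time, and the per-document tf in each posting is what a logarithmic tf weight would be computed from.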

Optional: support all SMART notations.

Experiment 3 (2021.11.9 -- 2021.12.8)

Implement the following evaluation metrics and use them to evaluate the retrieval results of Experiment 2:

Mean Average Precision (MAP)

Mean Reciprocal Rank (MRR)

Normalized Discounted Cumulative Gain (NDCG)
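The three metrics can be sketched per query and then averaged over all queries. The function names and input shapes are assumptions for this example, and the DCG shown uses the linear-gain form gain / log2(rank + 1); other variants, such as an exponential gain, also exist.

```python
import math

def average_precision(ranked, relevant):
    """Mean of precision@i over the ranks i where a relevant doc appears."""
    hits, total = 0, 0.0
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / i
    return total / len(relevant) if relevant else 0.0

def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant doc, or 0 if none is retrieved."""
    for i, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / i
    return 0.0

def dcg(gains):
    """Discounted cumulative gain with a log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))

def ndcg(ranked, gains, k=None):
    """DCG of the ranking divided by the DCG of the ideal ordering."""
    k = k or len(ranked)
    actual = [gains.get(doc, 0) for doc in ranked[:k]]
    ideal = sorted(gains.values(), reverse=True)[:k]
    best = dcg(ideal)
    return dcg(actual) / best if best > 0 else 0.0

def mean(values):
    """Average per-query scores to obtain MAP, MRR, or mean NDCG."""
    values = list(values)
    return sum(values) / len(values) if values else 0.0
```

MAP is then `mean(average_precision(run[q], qrels[q]) for q in qrels)`, where `run` and `qrels` are hypothetical dicts mapping a query ID to its ranked list and its relevant set; MRR and mean NDCG follow the same pattern.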

Contributors

nuanbaobao
