KLiK Engine

The KLiK Engine is a C++-powered file search engine for the Enron email sample dataset.

Table of Contents

Features

Components

Languages

C++
PHP 5.6.40
SQL 14.0
HTML5
CSS3

Dataset

Development Environment

Visual Studio 2017
WampServer Stack 3.0.6
Windows 10

Database

MySQL Database 8.0.13

API

MySQLx APIs

Frameworks and Libraries

C++ Boost Library
Bootstrap v4.2.1

Details

Details of the application's important features

Performance Analysis

  • Forward Indexing
    • 300000 files/s
    • Incremental processing (10000 files): 10 min
    • Total time: 3 hr
  • Reverse Indexing
    • 352000 files/s
    • Incremental processing (10000 files): 15 min
    • Total time: 2.4 hr
  • Querying
    • Single-word querying: 0.1 - 0.7 sec
    • Multi-word querying: 0.4 - 2.3 sec

Forward Indexer

  • Uses the C++ Boost library to facilitate file I/O, since the dataset consists of many small files.
  • Loads email files into memory in increments of 10000 and processes each loaded batch in bulk. The memory is then freed and the process starts anew for the next 10000 files.
  • Filters stop words out of the email files.
  • Uses the MySQLx APIs for SQL connections.
  • Uses unordered maps to improve in-memory performance.
  • Times both the entire run and each incremental batch.
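
The batching and stop-word steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `Email`, `TermCounts`, `ForwardIndex`, `countTerms`, and `indexInBatches` are hypothetical, the stop-word list is a tiny placeholder, and the sketch works on in-memory strings rather than reading files through Boost or writing to MySQL.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical types: a forward index maps each file path to its term counts.
using TermCounts = std::unordered_map<std::string, int>;
using ForwardIndex = std::unordered_map<std::string, TermCounts>;

struct Email {
    std::string path;
    std::string text;
};

// Placeholder stop-word list (assumption); the project's real list is not shown.
inline const std::unordered_set<std::string>& stopWords() {
    static const std::unordered_set<std::string> words = {
        "a", "an", "and", "the", "of", "to", "in", "is"};
    return words;
}

// Split text into lowercase alphanumeric tokens and count them, skipping stop words.
inline TermCounts countTerms(const std::string& text) {
    TermCounts counts;
    std::string token;
    auto flush = [&]() {
        if (!token.empty() && stopWords().count(token) == 0) ++counts[token];
        token.clear();
    };
    for (unsigned char c : text) {
        if (std::isalnum(c)) token.push_back(static_cast<char>(std::tolower(c)));
        else flush();
    }
    flush();
    return counts;
}

// Index emails in fixed-size batches. Each batch is handed to `sink`
// (for example, a database writer) and released before the next one starts.
template <typename Sink>
std::size_t indexInBatches(const std::vector<Email>& emails,
                           std::size_t batchSize, Sink sink) {
    if (batchSize == 0) return 0;
    std::size_t batches = 0;
    for (std::size_t start = 0; start < emails.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, emails.size());
        ForwardIndex batch;
        for (std::size_t i = start; i < end; ++i)
            batch[emails[i].path] = countTerms(emails[i].text);
        sink(batch);
        ++batches;  // `batch` is destroyed at the end of this iteration
    }
    return batches;
}
```

Calling `indexInBatches(emails, 10000, writer)` would mirror the 10000-file increments described above, with `writer` standing in for whatever persists each batch.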

Reverse Indexer

  • Uses the forward index to create the reverse index.
  • Processes files incrementally, as in forward indexing.
  • Times both the incremental batches and the complete run.
  • Implements ranking to ease later searching.
  • Implements relevance ranking.
  • Implements search normalization to prevent misuse of the ranking system by files that repeat the same word many times.
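
One way to picture the inversion and normalization steps is the sketch below. It is an assumption-laden illustration: the names `Posting`, `ReverseIndex`, and `invert` are hypothetical, and dividing each term count by the file's total term count is just one common normalization; the README does not say which formula the project uses.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical types, matching a path -> (term -> count) forward index.
using TermCounts = std::unordered_map<std::string, int>;
using ForwardIndex = std::unordered_map<std::string, TermCounts>;

struct Posting {
    std::string path;
    double weight;  // term count divided by the file's total term count (assumed scheme)
};

// A reverse index maps each term to the files that contain it.
using ReverseIndex = std::unordered_map<std::string, std::vector<Posting>>;

inline ReverseIndex invert(const ForwardIndex& forward) {
    ReverseIndex reverse;
    for (const auto& doc : forward) {
        long total = 0;
        for (const auto& term : doc.second) total += term.second;
        if (total == 0) continue;  // skip files with no indexed terms
        for (const auto& term : doc.second) {
            // Normalizing by file length keeps a long file that repeats a word
            // from outranking a short file that is mostly about that word.
            const double weight = static_cast<double>(term.second) / total;
            reverse[term.first].push_back(Posting{doc.first, weight});
        }
    }
    return reverse;
}
```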

Searcher

  • Uses the reverse index for searching.
  • Calculates the document score and inverse document score for relevance ranking.
  • Retrieves the search query string from the GUI.
  • Returns the top 15 results from the calculated search results.
  • Safely removes stop words from the search string.

    Single-word querying

    • Calculates the score of each result and sorts the results in descending order.

    Multi-word querying

    • Multiplies the scores of results whose keywords occur in the same file to obtain a common score.
    • Uses ordered maps to order results automatically by score.
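
The querying steps above might look something like the following sketch. The names `Posting`, `ReverseIndex`, and `rankedSearch` are hypothetical, and the inverse-document factor `log(1 + N / df)` is an assumed formula chosen for illustration, not necessarily the one the project uses. The sketch multiplies per-keyword scores for files that contain every keyword, orders them with a `std::multimap`, and returns at most 15 results by default.

```cpp
#include <cmath>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

struct Posting {
    std::string path;
    double weight;  // normalized per-file score of the term
};
using ReverseIndex = std::unordered_map<std::string, std::vector<Posting>>;
using Result = std::pair<std::string, double>;  // (file path, combined score)

inline std::vector<Result> rankedSearch(const ReverseIndex& index,
                                        const std::vector<std::string>& terms,
                                        std::size_t totalDocs,
                                        std::size_t limit = 15) {
    std::unordered_map<std::string, double> score;         // path -> product of scores
    std::unordered_map<std::string, std::size_t> matched;  // path -> keywords found
    for (const auto& term : terms) {
        const auto it = index.find(term);
        if (it == index.end() || it->second.empty()) return {};  // no file has every keyword
        const auto& postings = it->second;
        // Assumed inverse-document factor: rarer keywords contribute more.
        const double idf =
            std::log(1.0 + static_cast<double>(totalDocs) / postings.size());
        for (const auto& p : postings) {
            auto entry = score.emplace(p.path, 1.0).first;
            entry->second *= p.weight * idf;
            ++matched[p.path];
        }
    }
    // An ordered map keyed by score keeps results sorted, highest first.
    std::multimap<double, std::string, std::greater<double>> ranked;
    for (const auto& entry : score)
        if (matched[entry.first] == terms.size())
            ranked.emplace(entry.second, entry.first);

    std::vector<Result> top;
    for (const auto& r : ranked) {
        if (top.size() == limit) break;
        top.emplace_back(r.second, r.first);
    }
    return top;
}
```

A single-word query is simply the case where `terms` has one element, so the product reduces to that keyword's own score.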

GUI

  • Created in PHP, HTML5 and CSS3.
  • Uses the Bootstrap 4 framework for a presentable interface.
  • Passes the input search query to the C++ Searcher and receives the list of results as output.
  • Displays all results with the email subject as the title, along with the file path.
  • The result titles are file links that open a new browser window displaying the full content of the relevant file.
  • Times each query on the GUI so the user can see the query time as well.

Future Improvements

  • Optimization (in components such as indexing)
  • Implementation of more advanced indexing and ranking algorithms
  • Continuous bug fixes and improvements

The Team

A huge thanks to the wonderful team, without whom this entire project would not have been possible. Check out their profiles and star their repos! :)

msaad1999 mshaharyar17 ahmed aitasadduq

KLiK - Social Media Website

Check out the complete project for this login system. KLiK is a complete Social Media website, along with a Complete Login/Registration system, Profile system, Chat room, Forum system and Blog/Polls/Event Management System.

Check out KLiK here

Do star my projects! :)

If you liked my work, please show support by starring the repository! It means a lot to me, and is all I'm asking for.
