sego

Search Engine written in Go.

This engine indexes the Linux API documentation stored in the linux-docs folder inside the linux-kernel-docs.tgz archive, and ranks documents using the TF-IDF method.

Also, it can:

  • Accept queries about the documents through an API.
  • Accept queries about the documents through a web UI.

Documentation

Wikipedia

  • For term frequency, we use the raw count weighting scheme.
  • For inverse document frequency, we use the smoothed inverse document frequency weighting scheme.
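
The two weighting schemes above can be sketched in Go as follows. This is a minimal illustration and not the project's actual code: the function names and the `map[string]map[string]int` layout are hypothetical, and the smoothed IDF is assumed to follow the formula listed on Wikipedia as "inverse document frequency smooth", log(N / (1 + n_t)) + 1.

```go
package main

import "math"
import "fmt"

// termFrequency returns the raw count of term in one document's word counts.
func termFrequency(term string, doc map[string]int) float64 {
	return float64(doc[term])
}

// inverseDocumentFrequency applies the assumed smoothed variant:
// log(N / (1 + n_t)) + 1, where N is the number of documents and
// n_t is the number of documents that contain the term.
func inverseDocumentFrequency(term string, docs map[string]map[string]int) float64 {
	if len(docs) == 0 {
		return 0 // no documents, nothing to weigh
	}
	containing := 0
	for _, counts := range docs {
		if counts[term] > 0 {
			containing++
		}
	}
	return math.Log(float64(len(docs))/float64(1+containing)) + 1
}

// tfidf multiplies the raw count by the smoothed IDF.
func tfidf(term string, doc map[string]int, docs map[string]map[string]int) float64 {
	return termFrequency(term, doc) * inverseDocumentFrequency(term, docs)
}

func main() {
	// Hypothetical word counts for two documents.
	docs := map[string]map[string]int{
		"mm.html":    {"memory": 3, "management": 1},
		"sched.html": {"scheduler": 4},
	}
	fmt.Println(tfidf("memory", docs["mm.html"], docs))
}
```

With this smoothing, a term that appears in all but one document gets an IDF of exactly 1, and rarer terms get progressively larger weights.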

Run

  • Index files:
go run main.go -index
  • Serve files:
go run main.go -serve
  • Query the server:
curl 'localhost:4000/search?query=memory%20management'
  • Specify the result count (defaults to 5):
curl 'localhost:4000/search?query=memory%20management&count=10'
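
For callers that build the request in Go rather than with curl, the query string can be assembled with `net/url`. This is only a sketch: `searchURL` is a hypothetical helper, and the endpoint and the `query` and `count` parameters are taken from the curl examples above. The response format is not documented here, so the sketch stops at building the URL.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchURL builds the /search URL for the given query and result count.
// base is the server address, e.g. "http://localhost:4000".
func searchURL(base, query string, count int) string {
	params := url.Values{}
	params.Set("query", query)
	params.Set("count", strconv.Itoa(count))
	// Encode escapes the values and sorts the parameters by key.
	return base + "/search?" + params.Encode()
}

func main() {
	fmt.Println(searchURL("http://localhost:4000", "memory management", 10))
}
```

`Encode` writes the space as `+` instead of `%20`; both forms decode to a space in a query string.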

Frontend

cd ui
npm install
npm run dev

Inner workings

  • Index: parse the .html docs into a JSON file that records, for each document, the occurrences of every word inside it.
  • Serve: load the JSON file and rank the documents by applying the TF-IDF algorithm to the search terms.
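
The index step could look roughly like the sketch below. It makes several assumptions: the `Index` type and `countWords` helper are hypothetical, the JSON layout (document name to word to count) may differ from the project's real schema, and the HTML-to-text extraction is skipped, so the input is treated as plain text.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
	"unicode"
)

// Index maps a document name to the number of times each word occurs in it.
// This layout is an assumption, not the project's confirmed schema.
type Index map[string]map[string]int

// countWords lowercases text, splits it on every character that is not
// a letter or digit, and counts how often each word occurs.
func countWords(text string) map[string]int {
	counts := make(map[string]int)
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		counts[w]++
	}
	return counts
}

func main() {
	// Hypothetical document name and already-extracted text.
	idx := Index{
		"mm.html": countWords("Memory management: memory zones"),
	}
	out, err := json.Marshal(idx)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Storing raw counts per document keeps the serve step simple: the term frequency is a direct lookup, and the document frequency is the number of documents whose map contains the term.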

TODO

  • enable debug logs
  • try changing representation format to a more performant one
  • docker/docker-compose

Indexed files

We index the Linux kernel documentation. We obtained these docs from the Linux repo:

git clone --depth 1 https://github.com/torvalds/linux.git
cd linux
make htmldocs

After the build finishes, Documentation/output contains all the docs in .html format.
