Search Engine written in Go.
This engine indexes the Linux API documentation stored in the linux-docs folder inside the linux-kernel-docs.tgz archive using the TF-IDF method.
Also, it can:
- Accept queries about the documents through an API.
- Accept queries about the documents through a web UI.
- For Term Frequency, we use the raw count weighting scheme.
- For Inverse Document Frequency, we use the inverse document frequency smooth weighting scheme.
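The two weighting schemes above can be sketched in Go as follows. This is a minimal illustration, not the project's actual code: the function names (tokenize, termFrequency, inverseDocumentFrequency) are hypothetical, and the smooth IDF is assumed to be the commonly cited form log(N / (1 + n_t)) + 1, where N is the number of documents and n_t the number of documents containing the term.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// tokenize is a hypothetical helper: lowercase the text and split on whitespace.
func tokenize(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// termFrequency returns the raw count of term in a tokenized document.
func termFrequency(term string, doc []string) float64 {
	count := 0
	for _, w := range doc {
		if w == term {
			count++
		}
	}
	return float64(count)
}

// inverseDocumentFrequency assumes the smooth variant log(N / (1 + n_t)) + 1.
func inverseDocumentFrequency(term string, docs [][]string) float64 {
	containing := 0
	for _, doc := range docs {
		for _, w := range doc {
			if w == term {
				containing++
				break // count each document at most once
			}
		}
	}
	return math.Log(float64(len(docs))/float64(1+containing)) + 1
}

func main() {
	docs := [][]string{
		tokenize("memory management and memory zones"),
		tokenize("memory barriers"),
		tokenize("scheduler domains"),
	}
	term := "memory"
	idf := inverseDocumentFrequency(term, docs)
	for i, doc := range docs {
		fmt.Printf("doc %d: tf-idf(%q) = %.3f\n", i, term, termFrequency(term, doc)*idf)
	}
}
```

For a multi-word query, one plausible approach is to sum the per-term scores for each document and return the highest-scoring documents first.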
- Index files:
go run main.go -index
- Serve files:
go run main.go -serve
- Query the server:
curl 'localhost:4000/search?query=memory%20management'
- Specify the result count (defaults to 5):
curl 'localhost:4000/search?query=memory%20management&count=10'
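The same request URLs can be built from Go. The sketch below only constructs the URL; searchURL is a hypothetical helper, and the shape of the server's response is not described here, so the example does not try to decode it. It assumes the server decodes `+` in the query string as a space, which is how Go's standard net/url parsing behaves.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
)

// searchURL builds a URL for the /search endpoint.
// A count of 0 or less omits the parameter so the server default applies.
func searchURL(base, query string, count int) string {
	params := url.Values{}
	params.Set("query", query)
	if count > 0 {
		params.Set("count", strconv.Itoa(count))
	}
	// Encode sorts parameters by key and escapes spaces as '+'.
	return base + "/search?" + params.Encode()
}

func main() {
	fmt.Println(searchURL("http://localhost:4000", "memory management", 10))
	// prints: http://localhost:4000/search?count=10&query=memory+management
}
```

The resulting string can be passed to http.Get while the server is running.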
cd ui
npm install
npm run dev
- Index: parse the .html docs into a JSON file that records, for each document, every word occurrence inside it.
- Serve: load the JSON file and apply the TF-IDF algorithm to the search terms.
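One possible shape for that JSON index is a map from document name to per-word counts. The sketch below is an assumption for illustration, not the project's actual format: the Index type and buildIndex function are hypothetical, and the input is taken to be plain text already extracted from the .html files (HTML parsing is omitted).

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Index maps a document name to the number of times each word occurs in it.
type Index map[string]map[string]int

// buildIndex counts lowercase, whitespace-separated words per document.
func buildIndex(docs map[string]string) Index {
	idx := make(Index)
	for name, text := range docs {
		counts := make(map[string]int)
		for _, w := range strings.Fields(strings.ToLower(text)) {
			counts[w]++
		}
		idx[name] = counts
	}
	return idx
}

func main() {
	// Index step: build the structure and serialize it to JSON.
	idx := buildIndex(map[string]string{
		"mm.html": "Memory management memory zones",
	})
	data, err := json.Marshal(idx)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data))

	// Serve step: load the JSON back before scoring queries against it.
	var loaded Index
	if err := json.Unmarshal(data, &loaded); err != nil {
		panic(err)
	}
	fmt.Println(loaded["mm.html"]["memory"])
}
```

Storing raw counts keeps the index step simple and lets the serve step derive both term frequency and document frequency from the same file.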
- enable debug logs
- try changing representation format to a more performant one
- docker/docker-compose
We index the Linux kernel documentation. We obtained these docs from the Linux repo:
git clone --depth 1 https://github.com/torvalds/linux.git
cd linux
make htmldocs
Now, inside Documentation/output, there will be all the docs in .html format.