kidsearch

A search engine built by Qingsong Lv, Shulin Cao and Yifan Wang. Our tutors are Qian Yin and Xin Zheng.

This repository is based on a national undergraduate scientific research and innovation project: a simplified Chinese search engine for kids.

Sorry, this project is not available yet, but it will be available soon (maybe within a year).

introduction

We were sophomores when we built kidsearch. Over time, we realized that there is more we can do to make it valuable, so we decided to create this GitHub repository. This project aims to tidy up the kidsearch code, which we wrote from 2016 to 2017, and to open-source part of it. We will try our best to make this project a unified system and provide as many APIs as we can.

We think the best explanation of an API is the comments in its code, but some tutorials will also be available soon. If you want to read some of the literature now, the related work may help.

The initial version of our project is built on Java (Lucene), Python (crawler), PHP (frontend) and sockets (communication). We think the most useful part is the socket layer, because we added multi-threading to it. Since Python is so popular at present, we are also building a second, Python version of the project that uses PyLucene in place of Lucene and Django in place of PHP, which removes part of the socket communication. Both versions will be open-sourced.

Our project is mainly a search engine for simplified Chinese. We use English in the documentation and comments because we think this project may also be helpful for other languages.
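To illustrate the idea of a multi-threaded socket layer between the frontend and the index, here is a minimal sketch in Python. It is not the project's actual code: the names `handle_query`, `QueryHandler`, `ThreadedServer`, `start_server` and `send_query`, as well as the one-line-in, one-line-out protocol, are assumptions made for this example. It relies only on the standard `socketserver` module, whose `ThreadingMixIn` serves each connection in its own thread.

```python
import socket
import socketserver
import threading


def handle_query(query: str) -> str:
    """Hypothetical stand-in for the real index lookup."""
    return "results for: " + query


class QueryHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Assumed protocol: the client sends one UTF-8 line (the query)
        # and receives one UTF-8 line (the answer).
        line = self.rfile.readline()
        query = line.decode("utf-8").strip()
        self.wfile.write((handle_query(query) + "\n").encode("utf-8"))


class ThreadedServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    # ThreadingMixIn handles every connection in a separate thread.
    daemon_threads = True
    allow_reuse_address = True


def start_server(host="127.0.0.1", port=0):
    """Start the server in a background thread; port 0 picks a free port."""
    server = ThreadedServer((host, port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def send_query(host: str, port: int, query: str) -> str:
    """Client side: send one query line and read one answer line."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall((query + "\n").encode("utf-8"))
        with conn.makefile("r", encoding="utf-8") as reader:
            return reader.readline().strip()
```

A frontend process could call `send_query(host, port, "some words")` against a server started with `start_server()`. Because each connection gets its own thread, one slow query does not block the others.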

environment

This project is aimed at lightweight search engine tasks, so it mainly runs on Windows.

requirement

Python 3.x (x >= 5), Django (django-rest may also be needed), PyLucene, Apache, MySQL.

Some other Python packages are also needed: requests, ...

goal

Build a convenient Python package for search engine tasks. Here is an ideal example:

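import kidsearch as ks

# Crawl the seed sites, limited by page count and link depth
webpages = ks.crawler(['http://www.61tom.com', 'http://www.61baobao.com/'], max_page=1000, max_depth=10)
# Build a searchable index from the crawled pages
indexes = ks.make_index(webpages)
key_words = 'fairy tales'  # placeholder query; any search string
results = indexes.search(key_words)
print(ks.show(results))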
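As a rough illustration of what the `max_page` and `max_depth` limits in the example could mean, here is a minimal breadth-first crawler sketch. It is not the kidsearch crawler: the names `crawl`, `LinkParser` and `fetch_page`, the injectable `fetch` parameter, and the choice of breadth-first order are all assumptions made for this example. It uses only the Python standard library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page as text (assumes UTF-8, replacing bad bytes)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, max_page=1000, max_depth=10, fetch=fetch_page):
    """Breadth-first crawl; returns a dict mapping URL to HTML text."""
    seeds = list(dict.fromkeys(seeds))  # drop duplicate seeds, keep order
    pages = {}
    seen = set(seeds)
    queue = deque((url, 0) for url in seeds)
    while queue and len(pages) < max_page:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be downloaded
        pages[url] = html
        if depth >= max_depth:
            continue  # keep the page but do not follow its links
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

Passing a custom `fetch` function makes the sketch easy to try without network access, for example with a dictionary of fake pages. A real crawler would also need politeness delays, robots.txt handling and encoding detection, which are left out here.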
