Rearchor

A lightweight search engine

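Use case:

http://www.smalinuxer.net/?page_id=218 Assembling and configuring the components through an IoC container is recommended, since it makes them easier to manage.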

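Implementation:

  • Crawl rules are defined as a chain of responsibility
  • Requests that fail with an exception are returned to the thread pool's request pool for retry
  • Queue with producers and consumers, plus congestion control
  • lucene core for tokenization, search-term highlighting, and a well-known dictionary
  • netty serves the result pages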
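
The retry and congestion-control points above can be illustrated with a minimal sketch. It is not the project's actual code: the class `CrawlQueueSketch`, the `Request` type, the retry limit of 3, and the queue capacity of 100 are all assumptions made for illustration. A bounded `LinkedBlockingQueue` makes the producer block while the queue is full, and a failed request is put back on the queue until the retry limit is reached.

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

/** Hypothetical sketch: a bounded crawl queue whose failed requests are re-queued. */
class CrawlQueueSketch {
    static final int MAX_ATTEMPTS = 3;     // assumed retry limit
    static final int QUEUE_CAPACITY = 100; // assumed bound; a full queue blocks the producer

    /** A URL plus the number of attempts already made (illustrative type). */
    static final class Request {
        final String url;
        final int attempts;
        Request(String url, int attempts) { this.url = url; this.attempts = attempts; }
    }

    /**
     * Feeds the URLs to `workers` consumer threads. `fetch` returns true on success;
     * returning false or throwing counts as a failure. Returns the URLs that succeeded.
     */
    static Set<String> crawl(List<String> urls, int workers, Predicate<String> fetch) {
        BlockingQueue<Request> queue = new LinkedBlockingQueue<>(QUEUE_CAPACITY);
        Set<String> succeeded = ConcurrentHashMap.newKeySet();
        CountDownLatch pending = new CountDownLatch(urls.size()); // one count per URL
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        Runnable consumer = () -> {
            try {
                while (pending.getCount() > 0) {
                    Request r = queue.poll(50, TimeUnit.MILLISECONDS);
                    // attempt() hands the request back only if the queue was full,
                    // in which case this thread retries it directly.
                    while (r != null) {
                        r = attempt(r, queue, succeeded, pending, fetch);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < workers; i++) {
            pool.submit(consumer);
        }

        try {
            for (String url : urls) {
                queue.put(new Request(url, 0)); // producer blocks while the queue is full
            }
            pending.await(); // every URL has either succeeded or been given up on
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return succeeded;
    }

    /** Runs one attempt; returns a request to retry inline, or null if nothing is left to do here. */
    private static Request attempt(Request r, BlockingQueue<Request> queue, Set<String> succeeded,
                                   CountDownLatch pending, Predicate<String> fetch) {
        boolean ok;
        try {
            ok = fetch.test(r.url);
        } catch (RuntimeException e) {
            ok = false;
        }
        if (ok) {
            succeeded.add(r.url);
            pending.countDown();
            return null;
        }
        if (r.attempts + 1 >= MAX_ATTEMPTS) {
            pending.countDown(); // give up on this URL
            return null;
        }
        Request retry = new Request(r.url, r.attempts + 1);
        return queue.offer(retry) ? null : retry; // re-queue, or retry inline if full
    }
}
```

Consumers never block on a full queue when re-queueing, which avoids a deadlock where every worker waits for space that only workers can free.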

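Customizable interfaces:

  • The spider's crawl scope (for example, a single domain) and the content it crawls (for example, dropping # fragment links)
  • Thread pool configuration; the thread pool can be adjusted to suit your current environment
  • lucene configuration, including tokenization, indexing, and the storage and access model
  • Front-end interface configuration: change front-end features and how results are displayed
  • Crawled-site configuration: crawled sites are serialized for fast recovery, with a configurable expiry time
  • Custom scoring interface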
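
As an illustration of how crawl scope and link cleanup could be expressed as a chain of responsibility, here is a minimal sketch. The names `CrawlRuleChainSketch`, `CrawlRule`, `sameHost`, `stripFragment`, and `applyAll` are hypothetical and do not come from the project; each rule either passes a (possibly rewritten) URI to the next rule or rejects it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of crawl rules arranged as a chain of responsibility. */
class CrawlRuleChainSketch {
    /** One link in the chain: returns the (possibly rewritten) URI, or empty to reject it. */
    interface CrawlRule {
        Optional<URI> apply(URI uri);
    }

    /** Scope rule: accepts only http/https URIs on the given host. */
    static CrawlRule sameHost(String host) {
        return uri -> {
            String scheme = uri.getScheme();
            boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
            return web && host.equalsIgnoreCase(uri.getHost())
                    ? Optional.of(uri)
                    : Optional.empty();
        };
    }

    /** Content rule: drops the #fragment so in-page anchors are not treated as new pages. */
    static CrawlRule stripFragment() {
        return uri -> {
            String text = uri.toString();
            int hash = text.indexOf('#');
            return Optional.of(hash < 0 ? uri : URI.create(text.substring(0, hash)));
        };
    }

    /** Passes the URL through every rule in order; stops at the first rejection. */
    static Optional<URI> applyAll(List<CrawlRule> chain, String rawUrl) {
        Optional<URI> current;
        try {
            current = Optional.of(new URI(rawUrl).normalize());
        } catch (URISyntaxException e) {
            return Optional.empty(); // unparseable links are rejected
        }
        for (CrawlRule rule : chain) {
            current = current.flatMap(rule::apply);
        }
        return current;
    }
}
```

With a chain of `sameHost("example.com")` followed by `stripFragment()`, a link such as `http://example.com/a#top` is kept as `http://example.com/a`, while a link to another host is rejected. New rules can be added to the list without touching the existing ones.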

Performance test:

  • 10M broadband, dual-core i3 processor, 4 GB RAM
  • 135 pages in 73 s, about 100 MB of memory used; outbound network traffic was unstable

Bottleneck:

  • HTTP connections cannot be released quickly; Java's native HTTP support does not release them fast enough
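
One common way to limit how long a connection is held with the JDK's `HttpURLConnection` is sketched below. This is only a possible mitigation under stated assumptions, not the project's code and not a guaranteed fix for the bottleneck above: the class `FetchSketch`, the `fetch` method, the `Connection: close` header, and the UTF-8 decoding are all choices made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch: fetch a page with timeouts and explicit connection release. */
class FetchSketch {
    /**
     * Downloads the body at `url`, giving up after `timeoutMillis` on connect or read.
     * Error responses (status >= 400) surface as IOException from getInputStream().
     */
    static String fetch(String url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(timeoutMillis); // do not wait forever for a slow host
        conn.setReadTimeout(timeoutMillis);    // do not wait forever for a stalled body
        conn.setRequestProperty("Connection", "close"); // ask the server not to keep the socket open
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            // Assumption: the page is UTF-8; a real crawler would honor the declared charset.
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect(); // release the underlying socket instead of pooling it
        }
    }
}
```

Reading the body to the end and closing the stream lets the JDK tidy up the connection, and `disconnect()` in the `finally` block releases it even when the request fails.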

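Porting to another platform:

Only LOCAL_IP in Searchor.java needs to be changed. Make sure, however, that the target platform supports the required character encoding.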

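Supported features:

  • Fast site crawling, with scheduled re-crawling
  • Crawling specific data from specific sites while avoiding malicious loops
  • Can be extended to support robots.txt
  • Custom site scoring rules
  • The front end shows the title and URL, highlights the search terms, and excerpts the relevant content
  • Redirects record click counts, and page snapshots are supported
  • Customizable front-end display that can imitate Google or Baidu
  • Retrieval in many languages, including mixed-language queries such as Chinese combined with Japanese
  • Queries are tokenized before matching, which gives highly relevant matches
  • The back end supports intranet-only access, IP filtering, and other administrative features
  • The back end serves HTTP directly over sockets, so it is lightweight and fast
  • Reads and writes do not conflict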
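
The robots.txt extension mentioned above could start from something like the following minimal sketch. It is an assumption about how such support might look, not code from the project: `RobotsSketch`, `disallowedForAll`, and `isAllowed` are hypothetical names, and the sketch only handles `Disallow` prefixes in groups addressed to `User-agent: *`, ignoring `Allow` lines and wildcard patterns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch: minimal robots.txt handling for groups that apply to all agents. */
class RobotsSketch {
    /** Collects the Disallow path prefixes from groups addressed to "User-agent: *". */
    static List<String> disallowedForAll(String robotsTxt) {
        List<String> prefixes = new ArrayList<>();
        boolean applies = false;      // are we inside a group that covers "*"?
        boolean lastWasAgent = false; // consecutive User-agent lines share one group
        for (String raw : robotsTxt.split("\\r?\\n")) {
            String line = raw;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            line = line.trim();
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // blank or malformed line
            }
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastWasAgent) {
                    applies = false; // a new group starts here
                }
                if (value.equals("*")) {
                    applies = true;
                }
                lastWasAgent = true;
            } else {
                lastWasAgent = false;
                // An empty Disallow value means "nothing is disallowed", so it is skipped.
                if (field.equals("disallow") && applies && !value.isEmpty()) {
                    prefixes.add(value);
                }
            }
        }
        return prefixes;
    }

    /** A path is allowed unless it starts with one of the disallowed prefixes. */
    static boolean isAllowed(List<String> disallowed, String path) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) {
                return false;
            }
        }
        return true;
    }
}
```

A crawler would fetch `/robots.txt` once per host, cache the resulting prefix list, and call `isAllowed` before queueing each URL on that host.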
