Personal Search engine

Combined Bookmarks and external search

What is this?

Aren't you frustrated having a boatload of quality bookmarks, but not using them because it is faster to just fire up a browser and do a Google search instead? Yeah, me too!

You no longer need to do that. Enter the Personal Search Engine (PSE), which you can use to index your bookmarks and search them the way you search with Google.

But wait, there is more: when you issue a search query, the PSE also runs a Google search for you in the background (or another search, if you implement it :)) and displays both sets of results.

The code works but is still in the alpha stage. Once it reaches beta, I will write an article at http://www.igrok.site on how it works. Below is a quick recipe for installing and using it.


INSTALLATION AND RUNNING

Clone the PSE repository

> git clone https://github.com/vsraptor/pse.git
> cd pse

Dependencies

You probably have these installed already, but I list them here for completeness. In most cases you can skip this section.

> apt-get install build-essential
> apt-get install python-dev python-numpy python-scipy libatlas-dev libatlas3-base

Installation

You need to install scikit-learn (for TF-IDF support) and Flask (for the web app), along with the supporting packages below:

> pip install lxml
> pip install numpy
> pip install requests
> pip install stop_words
> pip install scikit-learn
> pip install flask
> pip install flask-script
> pip install flask-bootstrap

Create the url.lst file

Next, either create the url.lst file manually in the data directory or generate one using bin/bm2urlst.py. url.lst is simply a list of URLs. (This repo contains one just for testing, but it is better to generate your own once you have the app running. You can also leave empty lines in the file or comment out URLs with a hash so they are not included in the index.)
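
To make the url.lst rules concrete, here is a minimal sketch of how such a file could be read. It is not the code PSE uses; the function name read_url_list is hypothetical, and it assumes one URL per line and that a hash only counts as a comment marker at the start of a line (so URLs containing a #fragment are kept).

```python
def read_url_list(lines):
    """Return the URLs from an iterable of lines.

    Assumption: blank lines are ignored and a line whose first
    non-space character is '#' is treated as commented out.
    """
    urls = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


# Sample content standing in for data/url.lst
sample = [
    "https://example.com/a\n",
    "\n",
    "# https://example.com/skipped\n",
    "  https://example.org/b#section  \n",
]
urls = read_url_list(sample)
print(urls)
```

With a real file you would pass an open file handle instead of the sample list, since iterating over a file yields its lines.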

Create the index

Now run the indexer to create the TF-IDF index matrices. It goes through the list of URLs, fetches the pages and builds the index, which you will later use to run searches.

> cd bin
> python idx.py
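
For readers unfamiliar with TF-IDF, the idea behind the index can be sketched in plain Python. This is an illustration only, not what idx.py does: PSE relies on scikit-learn, whereas the sketch below hand-rolls a smoothed TF-IDF weighting with cosine similarity, skips the page fetching, and uses hypothetical names (build_index, weigh, search) and made-up sample documents.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def weigh(tokens, idf):
    """Turn tokens into an L2-normalised TF-IDF vector (a dict)."""
    tf = Counter(t for t in tokens if t in idf)
    vec = {t: count * idf[t] for t, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def build_index(docs):
    """docs maps URL -> page text. Returns (idf, vectors)."""
    tokens = {url: tokenize(text) for url, text in docs.items()}
    n_docs = len(docs)
    df = Counter()
    for toks in tokens.values():
        df.update(set(toks))  # count each term once per document
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = {url: weigh(toks, idf) for url, toks in tokens.items()}
    return idf, vectors


def search(query, idf, vectors, top=5):
    """Rank URLs by cosine similarity between query and page vectors."""
    q = weigh(tokenize(query), idf)
    scored = []
    for url, vec in vectors.items():
        score = sum(w * vec.get(t, 0.0) for t, w in q.items())
        if score > 0:
            scored.append((score, url))
    scored.sort(reverse=True)
    return [url for _, url in scored[:top]]


# Made-up page texts standing in for fetched bookmarks
docs = {
    "https://example.com/bio": "The history of biology traces the study of the living world.",
    "https://example.com/py": "Python is a programming language.",
    "https://example.com/hist": "History of the Roman empire.",
}
idf, vectors = build_index(docs)
results = search("history biology", idf, vectors)
print(results)
```

The page that matches both query terms ranks first, the page that matches only one term ranks second, and the page that matches neither is left out.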

Run the cmd-line app

The command-line app is mainly for testing purposes. You can run it like this (-b bookmark search, -g Google search):

> cd bin
> python query.py -b -q 'history biology'
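
The flags above could be parsed along these lines. This is a sketch based only on the command shown, not the actual contents of query.py; the attribute names bookmarks, google and query are assumptions, as is making -q mandatory.

```python
import argparse


def build_parser():
    """Build a parser for the -b, -g and -q flags (illustrative only)."""
    parser = argparse.ArgumentParser(
        description="Query the bookmark index and/or an external search engine."
    )
    parser.add_argument("-b", dest="bookmarks", action="store_true",
                        help="search the bookmark index")
    parser.add_argument("-g", dest="google", action="store_true",
                        help="run a Google search")
    parser.add_argument("-q", dest="query", required=True,
                        help="the search query")
    return parser


# Parse the same arguments as the example command
args = build_parser().parse_args(["-b", "-q", "history biology"])
print(args.bookmarks, args.google, args.query)
```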

Run the web app

Or, better, run the web app:

> cd site
> python manage.py runserver

Then go to the following web address:

http://localhost:5000

Converting Firefox bookmarks to url.lst

> cd bin
> python bm2urlst.py /path/to/bookmarks.html | grep -v 'png$\|gif$\|jpg$' > ../data/url.lst
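
If you want to see what this conversion amounts to, the sketch below does roughly the same thing with the standard library. It is not the source of bm2urlst.py. It assumes the exported bookmarks file is HTML with the URLs in the href attribute of anchor tags, keeps only http and https links (an extra assumption, to drop internal entries), and mirrors the grep filter by dropping URLs that end in png, gif or jpg. The names LinkCollector and bookmarks_to_urls are hypothetical.

```python
from html.parser import HTMLParser

# Same endings the grep command above filters out
IMAGE_ENDINGS = ("png", "gif", "jpg")


class LinkCollector(HTMLParser):
    """Collect the href of every anchor tag that points to a web page."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...> works
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(("http://", "https://")):
            self.urls.append(href)


def bookmarks_to_urls(html_text):
    """Return bookmark URLs, skipping those that end in an image extension."""
    collector = LinkCollector()
    collector.feed(html_text)
    return [u for u in collector.urls if not u.endswith(IMAGE_ENDINGS)]


# Small made-up excerpt in the style of an exported bookmarks file
sample = """<DL><p>
<DT><A HREF="https://example.com/article" ADD_DATE="1">Article</A>
<DT><A HREF="https://example.com/logo.png">Logo</A>
<DT><A HREF="place:sort=8">Most Visited</A>
</DL>"""
urls = bookmarks_to_urls(sample)
print("\n".join(urls))
```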
