cosr-back

Chat with us on Slack | Build Status | Coverage Status | Apache License 2.0

This repository contains the main components of the Common Search backend.

Your help is welcome! We have a complete guide on how to contribute.

Understand the project

This repository has 4 components:

  • cosrlib: Python code for parsing, analyzing and indexing documents
  • spark: Spark jobs using cosrlib
  • urlserver: A service for getting metadata about URLs from static databases
  • explainer: A web service for explaining and debugging results, hosted at explain.commonsearch.org

Here is how they fit in our general architecture:

[Figure: General technical architecture of Common Search]

Local install

A complete guide is available in INSTALL.md.

Launching the tests

Make sure to start the services (make start_services) before running any tests.

Inside Docker, you can run our full test suite easily:

make test

Alternatively, you can run it from outside Docker with:

make docker_test

You may also want to run only part of the tests, for instance all those that do not use Elasticsearch:

py.test tests/ -v -m "not elasticsearch"

If you want to evaluate the speed of a component, for instance HTML parsing, you can repeat the tests N times and output a Python profile:

py.test tests/cosrlibtests/document/html/ -v --repeat 50 --profile
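
The idea of repeating a piece of code and dumping a profile can be sketched in plain Python with the standard library's cProfile and pstats modules. This is only an illustration under stated assumptions, not how the --repeat and --profile options are implemented in this repository; parse_stub is a hypothetical stand-in for the component being measured.

```python
import cProfile
import io
import pstats


def parse_stub(html):
    """Hypothetical stand-in for the component under test (not cosrlib code)."""
    return html.replace("<", " <").split()


def profile_repeated(func, arg, repeat=50, top=5):
    """Call func(arg) `repeat` times under cProfile and return a text report."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeat):
        func(arg)
    profiler.disable()

    # Sort by cumulative time and keep only the `top` most expensive entries.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


report = profile_repeated(parse_stub, "<p>hello <b>world</b></p>", repeat=50)
print(report)
```

Repeating the call many times makes short functions show up with measurable cumulative time, which is usually why a repeat count is combined with profiling.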

Launching an index job

spark-submit spark/jobs/index.py --source commoncrawl:limit=1 --plugin plugins.filter.Homepages:index=1 --profile

After this, if you have a cosr-front instance connected to the same Elasticsearch service, you will see the results!

Using plugins

Common Search supports inserting user-provided plugins into the indexing pipeline. Some are included by default, for instance:

spark-submit spark/jobs/index.py --source url:https://about.commonsearch.org/ --plugin plugins.filter.All:index=0 --plugin 'plugins.grep.Words:words=common search,path=/tmp/grep_result'

See the plugins/ directory for more examples.
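
The --plugin values above appear to follow a "dotted.path.ClassName:key=value,key=value" shape. A minimal sketch of how such a string could be split is shown here; parse_plugin_spec is a hypothetical helper, not the repository's parser, and it assumes that options are comma-separated key=value pairs whose values contain neither "," nor "=".

```python
def parse_plugin_spec(spec):
    """Split 'module.path.ClassName:key=value,key=value' into its parts.

    Assumption: options are comma-separated key=value pairs and values
    contain neither ',' nor '='. The real parser may behave differently.
    """
    target, _, raw_options = spec.partition(":")
    module_path, _, class_name = target.rpartition(".")

    options = {}
    for pair in raw_options.split(","):
        if not pair:
            continue  # no options given, e.g. "plugins.filter.All"
        key, _, value = pair.partition("=")
        options[key.strip()] = value.strip()

    return module_path, class_name, options


for spec in (
    "plugins.filter.All:index=0",
    "plugins.grep.Words:words=common search,path=/tmp/grep_result",
):
    print(parse_plugin_spec(spec))
```

Option values stay as strings in this sketch; converting "0" or "1" to a number or boolean would be up to each plugin.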

Launching the explainer

The explainer allows you to debug results easily. Just run:

make docker_explainer

Then open http://192.168.99.100:9703 in your browser (assuming 192.168.99.100 is the IP of your Docker host).
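
If your Docker host has a different IP, the address to open can be derived from the DOCKER_HOST environment variable, which Docker Machine typically sets to a value such as tcp://192.168.99.100:2376. The sketch below is a hypothetical helper, not part of this repository; it assumes the explainer listens on port 9703 as in the URL above, and that a missing or socket-style DOCKER_HOST means Docker runs locally.

```python
import os
from urllib.parse import urlparse

EXPLAINER_PORT = 9703  # port taken from the URL above


def explainer_url(docker_host, port=EXPLAINER_PORT):
    """Build the explainer URL from a DOCKER_HOST-style value.

    Falls back to localhost when the value has no hostname, e.g. an empty
    string or a unix:// socket path (assumed to mean a local Docker daemon).
    """
    host = urlparse(docker_host).hostname or "localhost"
    return "http://{}:{}".format(host, port)


print(explainer_url(os.environ.get("DOCKER_HOST", "")))
```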

Contributors

sylvinus, sentimentron, jhildreth, mlinksva, bakztfuture, hjacobs, vanhalt
