Exam Search

ExamSearch is a website that allows Cambridge students to go through past Cambridge exam papers and search them for content to study.

Currently, ExamSearch only supports Biology 9700 Multiple Choice Papers, but support for free-response and other types of papers in Biology 9700 as well as for other subjects is on the roadmap.

Installation

Note

The following installation instructions are only for creating your own instance of the server. A working copy of this project can be found at my website.

If you'd like your own version of the parser, continue following the directions. Clone the repository, and install the requirements listed below.

Requirements

Python 3

Well, this is a Python project, so you are expected to have Python installed. Make sure you get Python 3!

Tesseract-OCR

Follow the installation instructions here. This is the main OCR tool used. Make sure to add it to the path! Otherwise, PyOCR will not be able to find it.

Poppler

Download the latest binary for Windows here. You can find binaries for other systems with a Google search. For this library, you just need to extract the package and add the bin folder to the path.
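If you are not sure whether both tools ended up on the path, a quick check like the one below can help. This is only a sketch: the executable names are assumptions (`tesseract` for Tesseract-OCR and `pdftoppm` as one of the programs in Poppler's bin folder), and `missing_tools` is a made-up helper, not part of this project.

```python
import shutil

# Assumed executable names: "tesseract" ships with Tesseract-OCR and
# "pdftoppm" is one of the programs in Poppler's bin folder.
REQUIRED_TOOLS = ("tesseract", "pdftoppm")


def missing_tools(names=REQUIRED_TOOLS):
    """Return the executable names that cannot be found on the PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not on the PATH:", ", ".join(missing))
    else:
        print("Tesseract and Poppler were both found.")
```

If anything is reported as missing, fix the path and open a new terminal before moving on.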

PyOCR

Python 3

pip install pyocr

PyOCR does most of the heavy lifting, as it is the wrapper between Tesseract-OCR and Python. Installing PyOCR also installs Pillow, which this project uses as well.

PDF2Image

Python 3

pip install pdf2image

If you're stuck installing pdf2image, this is the GitHub page. It also details the dependencies for pdf2image.
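To see how pdf2image and PyOCR fit together, here is a minimal sketch that renders each page of a PDF to an image and runs OCR on it. It is not the code in main.py: the function names `ocr_pages` and `pdf_to_page_texts`, the 300 dpi setting, and the `eng` language code are all assumptions made for illustration.

```python
def ocr_pages(pages, tool, builder, lang="eng"):
    """Run OCR on every page image and return one string per page."""
    return [tool.image_to_string(page, lang=lang, builder=builder) for page in pages]


def pdf_to_page_texts(pdf_path, dpi=300, lang="eng"):
    """Hypothetical helper: render a PDF with pdf2image, then OCR it with PyOCR."""
    # Imported here so the module still loads if the packages are missing.
    import pyocr
    import pyocr.builders
    from pdf2image import convert_from_path

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found. Is Tesseract on the PATH?")

    # convert_from_path needs Poppler on the PATH; it returns Pillow images.
    pages = convert_from_path(pdf_path, dpi=dpi)
    return ocr_pages(pages, tools[0], pyocr.builders.TextBuilder(), lang=lang)
```

Calling `pdf_to_page_texts` with the path to any downloaded paper should give you a list of strings, one per page. A higher dpi usually helps OCR accuracy but makes rendering slower.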

NLTK

Python 3

pip install nltk

Before you run main.py, make sure you download the stopwords corpus via:

import nltk
nltk.download('stopwords')

You only need to run this once before you run main.py.
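If you would rather not remember whether you already ran the download, something like the sketch below only downloads the corpus when it is missing, then uses it to drop common words from a piece of text. How the project itself uses the stopwords is not described here, so the helper names `load_stopwords` and `strip_stopwords` and the filtering step are assumptions.

```python
def load_stopwords(language="english"):
    """Return the NLTK stopword list as a set, downloading it only if needed."""
    # Imported here so the module still loads if nltk is missing.
    import nltk

    try:
        nltk.data.find("corpora/stopwords")
    except LookupError:
        nltk.download("stopwords")

    from nltk.corpus import stopwords
    return set(stopwords.words(language))


def strip_stopwords(text, stop_words):
    """Lowercase the text, split on whitespace, and drop the stop words."""
    return [word for word in text.lower().split() if word not in stop_words]
```

For example, `strip_stopwords("What is the function of the ribosome", load_stopwords())` keeps only the words that carry meaning for a search.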

Usage

  • First, run initializeDirectories() from main.py to download all past Cambridge papers from PapaCambridge (currently only the Biology section). This must finish before you proceed with the next steps.
  • Grab the file path of any multiple choice PDF and feed it through pdfToText(filePath).
  • Run getMultipleChoiceQuestions(filePath) to, well, get the multiple choice questions.
  • Run tagImage(filePath) to store image tags for each question in a database.
  • Finally, you may run search() to search for questions! Note that search() has been moved to app/routes.py because it is now part of the website's search algorithm.
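The per-paper steps above can be chained in a small wrapper. This is a sketch only: `process_paper` is a made-up helper, and it assumes the three functions take just the file path, as the steps suggest. It ignores whatever they return, since that is not documented here.

```python
def process_paper(main_module, file_path):
    """Run the per-paper steps from the list above, in order.

    main_module is expected to be the imported main.py; it is passed in
    as an argument so the wrapper does not import anything by itself.
    """
    main_module.pdfToText(file_path)                 # OCR the PDF into text
    main_module.getMultipleChoiceQuestions(file_path)  # split it into questions
    main_module.tagImage(file_path)                  # store image tags in the database
```

From the repository root you would `import main`, call `main.initializeDirectories()` once to download the papers, and then call `process_paper(main, file_path)` for each multiple choice PDF you want indexed.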

Happy studying!

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Road Map

Currently, Exam Search is able to parse a majority of Biology 9700 Multiple Choice papers.

Planned features

  • Expand Exam Search to all Biology 9700 papers
  • Expand Exam Search to other subjects like A Level History
  • Pull up the mark scheme alongside the question
  • Index the Biology textbook and pull up relevant paragraphs from the text to be used to answer the question
    • This feature is the end goal for Exam Search at least for Biology 9700
  • Work on the UI/create a good-looking application

For further details, visit this page for the complete roadmap.

Authors and acknowledgment

Head Developer - Nithish Narasimman
