
pistachio-moustachio's Introduction

MedicalLetterAI-CoreNLP

Implementation of MedicalLetterAI concepts using Stanford's pretrained models in CoreNLP and the Tesseract-OCR engine and pytesseract module.

Program execution:

NOTE: Before running any code in these files, you must unzip stanford-ner-2017-06-09.zip into a folder called nerzip in the root directory.

The main program can be executed by entering the main app folder, and running: $python3 main.py --d

The input directory may contain any number of PDFs, with minimal preprocessing required. The PDFs must be of relatively high quality for the OCR engine to recognize the words accurately, and the pages must not be skewed or rotated.

Output is in the form of numbered text files, corresponding to the alphabetically ordered contents of the input directory. Each txt file contains the "person" names encountered in the PDF document, in order of appearance, followed by the line "#####reached EOF or error occured#####", which marks the end of the names. A single line is generated after that line, specifying the hypothesized name of the patient discussed in the medical letter.

The hypothesis is based on an intuitive heuristic which assigns the patient name according to the highest number of the following fields matching in the PDF:

- Patient first or last name
- Personal Health Number
- Date of Birth
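The output layout described above can be read back with a few lines of Python. This is only a sketch: it assumes one name per line and that the marker sits on a line of its own, neither of which is confirmed by the project. The function name `parse_output` is hypothetical and not part of the repository.

```python
from typing import List, Optional, Tuple

# Marker text copied from the output description (spelling kept as the program writes it).
SENTINEL = "#####reached EOF or error occured#####"


def parse_output(text: str) -> Tuple[List[str], Optional[str]]:
    """Split one output file's text into (person names, hypothesized patient line).

    Assumption: one name per line, marker on its own line, hypothesis on the
    first non-empty line after the marker.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if SENTINEL not in lines:
        # No marker found: treat every line as a name and report no hypothesis.
        return lines, None
    cut = lines.index(SENTINEL)
    names = lines[:cut]
    after = lines[cut + 1:]
    hypothesis = after[0] if after else None
    return names, hypothesis


if __name__ == "__main__":
    sample = (
        "Jane Doe\n"
        "John Smith\n"
        "Jane Doe\n"
        "#####reached EOF or error occured#####\n"
        "Jane Doe\n"
    )
    names, patient = parse_output(sample)
    print(names)
    print(patient)
```

Keeping the names in order of appearance, duplicates included, mirrors the description of the files; deduplicate afterwards if only the distinct names are needed.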

The heuristic may not always be accurate, so there is room for improvement here.
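As a rough illustration of the field-matching idea, the sketch below scores each candidate patient by how many of the three fields appear in the OCR text and picks the highest scorer. It is not the project's actual implementation: the `PatientRecord` class, the `match_score` and `guess_patient` functions, the plain substring matching, and the assumption that a list of candidate patients is available are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PatientRecord:
    # Hypothetical record layout; dates are assumed to be written
    # exactly as they appear in the letter.
    first_name: str
    last_name: str
    health_number: str
    date_of_birth: str


def _contains(needle: str, haystack: str) -> bool:
    # An empty field never counts as a match.
    return bool(needle) and needle in haystack


def match_score(record: PatientRecord, text: str) -> int:
    """Count how many of the three fields match the letter text (0 to 3)."""
    lowered = text.lower()
    name_hit = (_contains(record.first_name.lower(), lowered)
                or _contains(record.last_name.lower(), lowered))
    phn_hit = _contains(record.health_number, text)
    dob_hit = _contains(record.date_of_birth, text)
    return int(name_hit) + int(phn_hit) + int(dob_hit)


def guess_patient(records: List[PatientRecord], text: str) -> Optional[PatientRecord]:
    """Return the record with the most matching fields, or None if nothing matches."""
    best = max(records, key=lambda r: match_score(r, text), default=None)
    if best is None or match_score(best, text) == 0:
        return None
    return best


if __name__ == "__main__":
    records = [
        PatientRecord("Jane", "Doe", "9123456789", "1980-02-14"),
        PatientRecord("John", "Smith", "9000000001", "1975-07-01"),
    ]
    letter = "Re: Jane Doe, PHN 9123456789, DOB 1980-02-14. Referred by Dr. John Smith."
    guess = guess_patient(records, letter)
    print(guess.first_name, guess.last_name)
```

Substring matching is deliberately naive: a short surname can match inside an unrelated word, and OCR errors in a health number or date will cause a miss. Those are the kinds of cases where a heuristic like this can pick the wrong name.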

Contributors

sofiabesenski-old, sofiabesenski4
