
## File Indexer

This project is a step towards an indexer/term frequency counter that finds the top ten words in a given collection of texts. In its current state, it contains the logic for word splitting and term frequency counting, yielding the top ten most common words and the number of times each occurs in the text. Note: while this implementation will process texts containing Unicode code points, the results may be poor.
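To make the word splitting and term frequency counting described above concrete, here is a minimal sketch using only the standard library. The names `split_words` and `top_words` are hypothetical and are not taken from `fileindexer.counter`; the ASCII-only pattern is likewise an assumption chosen for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: runs of ASCII letters, digits, and apostrophes.
# This is an assumption for illustration, not the project's actual rule.
WORD_RE = re.compile(r"[a-z0-9']+")


def split_words(text):
    """Lowercase the text and return the words matched by WORD_RE."""
    return WORD_RE.findall(text.lower())


def top_words(text, n=10):
    """Return the n most common words as (word, count) pairs."""
    return Counter(split_words(text)).most_common(n)


sample = "The cat and the hat. The cat sat."
print(top_words(sample, n=2))  # [('the', 3), ('cat', 2)]
```

With a pattern like this one, a word such as "café" is cut to "caf", which is one way non-ASCII text can give poor results in a simple splitter.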

### Installation

Installing this module into a virtual environment is recommended, since it makes per-project dependency management easier. Resources on creating virtual environments: virtualenv docs, a useful guide.

Once your virtual environment is created and activated, run:

> git clone https://github.com/fhocutt/fileindexer
> pip install ./fileindexer

You should then be able to run `import fileindexer.counter` with no errors.

### Tests

Unit tests, along with an example text, are located in the tests/ folder. To run the unit tests:

> cd /path/to/fileindexer/tests/
> pytest

You may also run the counter.py script from the top-level fileindexer folder to print the top ten words of Anne of Green Gables:

> cd /path/to/fileindexer/
> python3 fileindexer/counter.py

### Next steps

  • Write a command-line application to take multiple texts as input and return the top ten words from all of them combined.
  • Extend to execute concurrently.
  • Extend to be distributed.
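
As a rough illustration of the first two items, the sketch below counts words in several files concurrently and merges the per-file counts into one combined top ten. It is one possible shape under stated assumptions, not the project's design: `count_file` and `combined_top_words` are hypothetical names, the ASCII-only word pattern is an assumption, and a thread pool is only one of several ways to run the work concurrently.

```python
import re
import sys
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical ASCII-only word pattern, assumed for illustration.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_file(path):
    """Return a Counter of lowercased words found in one file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return Counter(WORD_RE.findall(handle.read().lower()))


def combined_top_words(paths, n=10):
    """Count each file in a worker thread, then merge the counts."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for counts in pool.map(count_file, paths):
            total.update(counts)  # adds per-file counts into the total
    return total.most_common(n)


if __name__ == "__main__":
    # File paths are taken from the command line.
    for word, count in combined_top_words(sys.argv[1:]):
        print(f"{word}\t{count}")
```

Threads mainly help while files are being read; because the counting itself is CPU-bound in CPython, swapping in `ProcessPoolExecutor` may be worth trying for large inputs. Since per-file `Counter` objects merge by simple addition, the same split-then-merge shape could also carry over to a distributed version.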

This project is coded as an exercise for Rackspace Managed Security.
