analytics-info-parser

A text cleaning module that validates, parses and cleans the following inputs:
a) People names (e.g. Jeffery O’brien)
b) Email addresses (e.g. [email protected])
c) Phone numbers (e.g. +61 4123 567 891)
d) HTTP URLs (e.g. https://www.linkedin.com/company/corvid/)
e) Addresses (e.g. 123 Accra Road, Dansoman City, Australia)
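
As an illustration of what cleaning a people name might involve, here is a minimal sketch. It is not taken from cleantext.py: the function name clean_name and the rules it applies (straightening curly apostrophes, dropping digits and stray symbols, collapsing spaces, capitalizing each letter run) are assumptions chosen for the example.

```python
import re
import unicodedata


def clean_name(raw: str) -> str:
    """Hypothetical helper: normalize spacing, apostrophes and capitalization of a name."""
    name = unicodedata.normalize("NFKC", raw)
    # Curly apostrophe (U+2019) becomes a straight apostrophe.
    name = name.replace("\u2019", "'")
    # Drop digits, underscores and any symbol other than apostrophe or hyphen.
    name = re.sub(r"[\d_]|[^\w\s'-]", "", name)
    # Collapse runs of whitespace and trim both ends.
    name = " ".join(name.split())
    # Capitalize every run of letters, so parts after ' or - are capitalized too.
    return re.sub(r"[^\W\d_]+", lambda m: m.group().capitalize(), name)


print(clean_name("  jeffery   o\u2019brien "))
print(clean_name("mary-anne SMITH3"))
```

Capitalizing after every apostrophe or hyphen is a simplification; real names do not always follow that rule, so a production version would need exceptions.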

  • The main goal is to validate, clean and standardize the data, handling different input scenarios and removing noise, so that it can be used downstream for purposes such as analytics.
  • TODO: wrap the script with a command-line argument parser.
  • SETUP
    a) Make sure Java is installed and note its installation path on your OS.
    b) Open a terminal and create a virtual environment.
    c) Activate the virtual environment and run pip install -r requirements.txt inside it.
    d) Update the Java path in the script.
    e) Run the app with: python cleantext.py
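
The TODO item about a command-line argument parser could be approached with the standard argparse module. The sketch below is only one possible shape: the positional input argument and the --java-path option are hypothetical names, not options that cleantext.py currently accepts. Passing the Java path as an option would also avoid editing it inside the script.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser; every option name here is hypothetical."""
    parser = argparse.ArgumentParser(
        prog="cleantext.py",
        description="Validate, parse and clean names, emails, phone numbers, "
                    "URLs and addresses found in a block of text.",
    )
    parser.add_argument(
        "input",
        nargs="?",
        default="-",
        help="path to a text file, or '-' to read standard input (default)",
    )
    parser.add_argument(
        "--java-path",
        default=None,
        help="path to the Java installation, instead of editing it in the script",
    )
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["notes.txt", "--java-path", "/usr/bin/java"])
print(args.input, args.java_path)
```

In the real script, parse_args() would be called without arguments inside an if __name__ == "__main__": guard, and the parsed values handed to the existing cleaning code.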

  • Block of Text
  • """My name is Ernest Appau , I am an Engineer at Corvid.ai . You can contact me on 02344077208 and +233501591897 or 703-4800500 . I live in Ghana and want to travel one day to the US ,UK ,China,France and Australia. Corvid.ai is an Artificial Intelligence consulting company .The website address is www.corvid.com .You can reach out to the administrator of the site by [email protected] or mine ([email protected]). Please note the website is not any of these go to https//:www.givers.com to donateThe link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd&params2=kjhdkjshd The code below catches all urls in text and returns urls in list . The address of the company is 123 Accra Road, Dansoman City, Australia """

  • Results for Block of Text
  • {'names': ['Ernest', 'Appau'],
     'numbers': [{'GH': ['+233501591897']},
                 {'US': ['+233501591897', '+17034800500']}],
     'emails': ['[email protected]', '[email protected]'],
     'urls': ['Corvid.ai', 'Corvid.ai', 'www.corvid.com', 'corvid.ai', 'corvid.ai', 'www.givers.com',
              'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string',
              'www.google.com', 'facebook.com', 'http://test.com/method?param=wasd',
              'http://test.com/method?param=wasd&params2=kjhdkjshd'],
     'locations': ['Ghana', 'US', 'UK', 'China', 'France', 'Australia', 'Accra', 'Road', 'Dansoman', 'City']}
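
To give a feel for how the 'emails' and 'urls' entries of such a result might be produced, here is a rough sketch using only the standard re module. The patterns and the helper names extract_emails and extract_urls are assumptions made for illustration, not the ones used in cleantext.py. Unlike the results above, this sketch does not pick up bare domains such as facebook.com, and it removes duplicate emails.

```python
import re

# Deliberately simple patterns: illustrative assumptions that will miss
# some valid inputs and are not the patterns used by cleantext.py.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)[^\s,()]+")


def extract_emails(text: str) -> list[str]:
    """Return lower-cased emails in order of first appearance, without duplicates."""
    return list(dict.fromkeys(match.lower() for match in EMAIL_RE.findall(text)))


def extract_urls(text: str) -> list[str]:
    """Return URLs starting with http://, https:// or www., minus trailing dots."""
    return [match.rstrip(".") for match in URL_RE.findall(text)]


sample = (
    "Write to Admin@Example.com or admin@example.com. "
    "See www.example.com, or http://test.com/method?param=wasd."
)
print(extract_emails(sample))  # ['admin@example.com']
print(extract_urls(sample))    # ['www.example.com', 'http://test.com/method?param=wasd']
```

Lower-casing before deduplication treats addresses that differ only in case as the same mailbox, which is a common simplification rather than a guarantee of the email standard.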
