# Implementation of a text-cleaning module that validates, parses, and cleans the following inputs:
a) People names (e.g. Jeffery O’brien)
b) Email addresses (e.g. [email protected])
c) Phone numbers (e.g. +61 4123 567 891)
d) HTTP URLs (e.g. https://www.linkedin.com/company/corvid/)
e) Addresses (e.g. 123 Accra Road, Dansoman City, Australia)

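
As a rough illustration of what extracting two of the input types above can look like, here is a minimal sketch using only Python's standard `re` module. It is not the code in cleantext.py; the function names `extract_emails` and `extract_http_urls`, the patterns, and the example.com address are all assumptions made for this example. The patterns are deliberately simple, so they will miss unusual addresses and do not match bare domains such as www.google.com.

```python
import re
from typing import List

# Illustrative patterns only (assumptions, not the module's own rules).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s,()]+")


def extract_emails(text: str) -> List[str]:
    """Return every substring that looks like an email address."""
    return EMAIL_RE.findall(text)


def extract_http_urls(text: str) -> List[str]:
    """Return every http:// or https:// URL, minus trailing punctuation."""
    return [url.rstrip(".,;:!?") for url in URL_RE.findall(text)]


if __name__ == "__main__":
    sample = (
        "Write to jane.doe@example.com or see "
        "https://www.linkedin.com/company/corvid/ for details."
    )
    print(extract_emails(sample))
    print(extract_http_urls(sample))
```

Stripping trailing punctuation after matching keeps the URL pattern short while still handling URLs that end a sentence.
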

To run the module:
a) Make sure Java is installed and note its path on your OS.
b) Open a terminal and create a virtual environment.
c) Activate the virtual environment and run pip install -r requirements.txt inside it.
d) Update the Java path in the script.
e) Run the app with the command: python cleantext.py

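
For the Java path in steps a) and d), one possible way to look it up from Python is sketched below. The helper name `find_java_path` is hypothetical and not part of cleantext.py. It checks the conventional `JAVA_HOME` environment variable first and then falls back to searching `PATH`. Whether the script expects the `java` executable or the JDK home directory is not stated here, so adjust the value you paste into the script accordingly.

```python
import os
import shutil
from typing import Optional


def find_java_path() -> Optional[str]:
    """Return a path to the java executable, or None if none is found."""
    # Prefer JAVA_HOME when it is set and contains bin/java.
    java_home = os.environ.get("JAVA_HOME")
    if java_home:
        exe_name = "java.exe" if os.name == "nt" else "java"
        candidate = os.path.join(java_home, "bin", exe_name)
        if os.path.isfile(candidate):
            return candidate
    # Otherwise search the directories listed in PATH.
    return shutil.which("java")


if __name__ == "__main__":
    print(find_java_path() or "Java not found; install it first")
```
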
"""My name is Ernest Appau , I am an Engineer at Corvid.ai . You can contact me on 02344077208 and +233501591897 or 703-4800500 . I live in Ghana and want to travel one day to the US ,UK ,China,France and Australia. Corvid.ai is an Artificial Intelligence consulting company .The website address is www.corvid.com .You can reach out to the administrator of the site by [email protected] or mine ([email protected]). Please note the website is not any of these go to https//:www.givers.com to donateThe link of this question: https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string Also there are some urls: www.google.com, facebook.com, http://test.com/method?param=wasd, http://test.com/method?param=wasd¶ms2=kjhdkjshd The code below catches all urls in text and returns urls in list . The address of the company is 123 Accra Road, Dansoman City, Australia """

Sample output:

{'names': ['Ernest', 'Appau'], \
'numbers': [{'GH': ['+233501591897']}, \
{'US': ['+233501591897', '+17034800500']}], \
'emails': ['[email protected]', '[email protected]'], \
'urls': ['Corvid.ai', 'Corvid.ai', 'www.corvid.com', 'corvid.ai', 'corvid.ai', 'www.givers.com', 'https://stackoverflow.com/questions/6038061/regular-expression-to-find-urls-within-a-string', 'www.google.com', 'facebook.com', 'http://test.com/method?param=wasd', 'http://test.com/method?param=wasd&params2=kjhdkjshd'], \
'locations': ['Ghana', 'US', 'UK', 'China', 'France', 'Australia', 'Accra', 'Road', 'Dansoman', 'City']}