daveoncode / python-string-utils

A handy Python library to validate, manipulate and generate strings

License: MIT License

Python 99.98% Shell 0.02%

string-manipulation strings camelcase snake-case utility manipulating-strings random-string roman-number-converter isbn-13 isbn-10 ipv4 ipv6 email-validation string-compressor

python-string-utils's Introduction

Python String Utils

Latest version: 1.0.0 (March 2020)

A handy library to validate, manipulate and generate strings, which is:

Simple and "pythonic"
Fully documented and with examples! (html version on readthedocs.io)
100% code coverage! (see it with your own eyes on codecov.io)
Tested (automatically on each push thanks to Travis CI) against all officially supported Python versions
Fast (mostly based on compiled regex)
Free from external dependencies
PEP8 compliant

What's inside...

Library structure

The library consists of the Python package string_utils, which contains the following modules:

validation.py (contains string check api)
manipulation.py (contains string transformation api)
generation.py (contains string generation api)
errors.py (contains library-specific errors)
_regex.py (contains compiled regex FOR INTERNAL USAGE ONLY)

There is also a secondary package, tests, which includes several submodules: one for each test suite, named after the API under test (e.g. tests for is_ip() are in test_is_ip.py, and so on).

The entire public API is importable directly from the main package string_utils, so this:

from string_utils.validation import is_ip

can be simplified as:

from string_utils import is_ip

Api overview

Bear in mind: this is just an overview, for the full API documentation see: readthedocs.io

String validation functions:

is_string: checks if the given object is a string

is_string('hello') # returns true
is_string(b'hello') # returns false

is_full_string: checks if the given object is a non-empty string

is_full_string(None) # returns false
is_full_string('') # returns false
is_full_string(' ') # returns false
is_full_string('foo') # returns true

is_number: checks if the given string represents a valid number

is_number('42') # returns true
is_number('-25.99') # returns true
is_number('1e3') # returns true
is_number(' 1 2 3 ') # returns false

is_integer: checks if the given string represents a valid integer

is_integer('42') # returns true
is_integer('42.0') # returns false

is_decimal: checks if the given string represents a valid decimal number

is_decimal('42.0') # returns true
is_decimal('42') # returns false

is_url: checks if the given string is a URL

is_url('foo.com') # returns false
is_url('http://www.foo.com') # returns true
is_url('https://foo.com') # returns true

is_email: Checks if the given string is an email

is_email('[email protected]') # returns true
is_email('@gmail.com') # returns false

is_credit_card: Checks if the given string is a credit card

is_credit_card(value)

# returns true if `value` represents a valid card number for one of these:
# VISA, MASTERCARD, AMERICAN EXPRESS, DINERS CLUB, DISCOVER or JCB

is_camel_case: Checks if the given string is formatted as camel case

is_camel_case('MyCamelCase') # returns true
is_camel_case('hello') # returns false

is_snake_case: Checks if the given string is formatted as snake case

is_snake_case('snake_bites') # returns true
is_snake_case('nope') # returns false

is_json: Checks if the given string is a valid json

is_json('{"first_name": "Peter", "last_name": "Parker"}') # returns true
is_json('[1, 2, 3]') # returns true
is_json('{nope}') # returns false

is_uuid: Checks if the given string is a valid UUID

is_uuid('ce2cd4ee-83de-46f6-a054-5ee4ddae1582') # returns true

is_ip_v4: Checks if the given string is a valid ip v4 address

is_ip_v4('255.200.100.75') # returns true
is_ip_v4('255.200.100.999') # returns false (999 is out of range)

is_ip_v6: Checks if the given string is a valid ip v6 address

is_ip_v6('2001:db8:85a3:0000:0000:8a2e:370:7334') # returns true
is_ip_v6('123:db8:85a3:0000:0000:8a2e:370,1') # returns false

is_ip: Checks if the given string is a valid ip (any version)

is_ip('255.200.100.75') # returns true
is_ip('2001:db8:85a3:0000:0000:8a2e:370:7334') # returns true
is_ip('255.200.100.999') # returns false
is_ip('123:db8:85a3:0000:0000:8a2e:370,1') # returns false

is_isbn_13: Checks if the given string is a valid ISBN 13

is_isbn_13('9780312498580') # returns true
is_isbn_13('978-0312498580') # returns true
is_isbn_13('978-0312498580', normalize=False) # returns false

is_isbn_10: Checks if the given string is a valid ISBN 10

is_isbn_10('1506715214') # returns true
is_isbn_10('150-6715214') # returns true
is_isbn_10('150-6715214', normalize=False) # returns false

is_isbn: Checks if the given string is a valid ISBN (any version)

is_isbn('9780312498580') # returns true
is_isbn('1506715214') # returns true

is_slug: Checks if the string is a slug (as created by slugify())

is_slug('my-blog-post-title') # returns true
is_slug('My blog post title') # returns false

contains_html: Checks if the string contains one or more HTML/XML tags

contains_html('my string is <strong>bold</strong>') # returns true
contains_html('my string is not bold') # returns false

words_count: Returns the number of words contained in the string

words_count('hello world') # returns 2
words_count('one,two,three') # returns 3 (no need for spaces, punctuation is recognized!)

is_palindrome: Checks if the string is a palindrome

is_palindrome('LOL') # returns true
is_palindrome('ROTFL') # returns false

is_pangram: Checks if the string is a pangram

is_pangram('The quick brown fox jumps over the lazy dog') # returns true
is_pangram('hello world') # returns false

is_isogram: Checks if the string is an isogram

is_isogram('dermatoglyphics') # returns true
is_isogram('hello') # returns false

String manipulation:

camel_case_to_snake: Converts a camel case formatted string into a snake case one

camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_string_test'

snake_case_to_camel: Converts a snake case formatted string into a camel case one

snake_case_to_camel('the_snake_is_green') # returns 'TheSnakeIsGreen'

reverse: Returns the string in a reversed order

reverse('hello') # returns 'olleh'

shuffle: Returns the string with its original chars but at randomized positions

shuffle('hello world') # possible output: 'l wodheorll'

strip_html: Removes all the HTML/XML tags found in a string

strip_html('test: <a href="foo/bar">click here</a>') # returns 'test: '
strip_html('test: <a href="foo/bar">click here</a>', keep_tag_content=True) # returns 'test: click here'

prettify: Reformat a string by applying basic grammar and formatting rules

prettify(' unprettified string ,, like this one,will be"prettified" .it\' s awesome! ')
# the output will be: 'Unprettified string, like this one, will be "prettified". It\'s awesome!'

asciify: Converts all non-ascii chars contained in a string into the closest possible ascii representation

asciify('èéùúòóäåëýñÅÀÁÇÌÍÑÓË') 
# returns 'eeuuooaaeynAAACIINOE' (string is deliberately dumb in order to show char conversion)

slugify: Convert a string into a formatted "slug"

slugify('Top 10 Reasons To Love Dogs!!!') # returns: 'top-10-reasons-to-love-dogs'

booleanize: Convert a string into a boolean based on its content

booleanize('true') # returns true
booleanize('YES') # returns true
booleanize('y') # returns true
booleanize('1') # returns true
booleanize('something else') # returns false

strip_margin: Removes left indentation from multi-line strings (inspired by Scala)

strip_margin('''
        line 1
        line 2
        line 3
''')

#returns:
'''
line 1
line 2
line 3
'''

compress/decompress: Compress strings into shorter ones that can be restored to the original later on

compressed = compress(my_long_string) # shorter string (URL safe base64 encoded)

decompressed = decompress(compressed) # string restored

assert(my_long_string == decompressed) # yep
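
As a rough illustration of the idea (not necessarily how the library implements it), a compress/restore round trip can be sketched with the standard library's zlib and base64 modules. The names sketch_compress and sketch_decompress are hypothetical and are not part of string_utils.

```python
import base64
import zlib


def sketch_compress(text: str, encoding: str = "utf-8") -> str:
    # Hypothetical helper: deflate the encoded bytes, then turn the
    # binary result into URL-safe base64 text.
    raw = zlib.compress(text.encode(encoding), 9)
    return base64.urlsafe_b64encode(raw).decode("ascii")


def sketch_decompress(data: str, encoding: str = "utf-8") -> str:
    # Reverse the two steps: base64-decode, then inflate.
    raw = base64.urlsafe_b64decode(data)
    return zlib.decompress(raw).decode(encoding)


original = "lorem ipsum " * 50
packed = sketch_compress(original)
restored = sketch_decompress(packed)
```

Compression only pays off for inputs with enough repetition; very short strings can come out longer than they went in.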

roman_encode: Encode integers/string into roman numbers

roman_encode(37) # returns 'XXXVII'

roman_decode: Decode roman number into an integer

roman_decode('XXXVII') # returns 37

roman_range: Generator which returns roman numbers on each iteration

for n in roman_range(10): print(n) # prints: I, II, III, IV, V, VI, VII, VIII, IX, X
for n in roman_range(start=7, stop=1, step=-1): print(n) # prints: VII, VI, V, IV, III, II, I

String generations:

uuid: Returns the string representation of a newly created UUID object

uuid() # possible output: 'ce2cd4ee-83de-46f6-a054-5ee4ddae1582'
uuid(as_hex=True) # possible output: 'ce2cd4ee83de46f6a0545ee4ddae1582'

random_string: Creates a string of the specified size with random chars

random_string(9) # possible output: 'K1URtlTu5'

secure_random_hex: Creates a hexadecimal string using a cryptographically strong random generator

secure_random_hex(12) 
# possible output: 'd1eedff4033a2e9867c37ded' 
# (len is 24, because 12 represents the number of random bytes generated, which are then converted to hexadecimal value)
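
The byte-count versus string-length relationship can be checked with the standard library's secrets module, used here only as a stand-in to show the arithmetic (it is an assumption that the library behaves the same way internally):

```python
import secrets

# 12 random bytes, each rendered as two hexadecimal characters.
token = secrets.token_hex(12)
print(len(token))  # 24
```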

Installation

pip install python-string-utils

Checking installed version

import string_utils
string_utils.__version__
'1.0.0' # (if '1.0.0' is the installed version)

Documentation

Full API documentation available on readthedocs.io

Support the project!

Do you like this project? Would you like to see it updated more often with new features and improvements? If so, you can make a small donation by clicking the button down below, it would be really appreciated! :)

python-string-utils's Issues

Email validation- constraints on domain label and presence of unicode unhandled

Hello,

I'm listing some scenarios where the is_email fails:

domain with localhost not accepted by is_email: email@localhost, email@[127.0.0.1] are valid while the function returns False
unicode not handled- this should be valid but returns false: [email protected].\\xe0\\xa4\\x89\\xe0\\xa4\\xa6\\xe0\\xa4\\xbe\\xe0\\xa4\\xb9\\xe0\\xa4\\xb0\\xe0\\xa4\\xa3.\\xe0\\xa4\\xaa\\xe0\\xa4\\xb0\\xe0\\xa5\\x80\\xe0\\xa4\\x95\\xe0\\xa5\\x8d\\xe0\\xa4\\xb7\\xe0\\xa4\\xbe
domain labels can't begin or end in hyphens '-': These should be invalid but is_email gives true: [email protected] and [email protected]

Truncate/ellipsify long strings

Would it be acceptable adding a function for truncating long strings at word boundaries?

I am thinking something along the lines of this answer: https://stackoverflow.com/a/250373

Happy to prepare a PR for this if that would be ok.
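
For reference, a minimal sketch of word-boundary truncation using the standard library's textwrap.shorten. The function name truncate_words and its defaults are hypothetical, not an existing or planned string_utils API.

```python
import textwrap


def truncate_words(text: str, width: int, placeholder: str = "...") -> str:
    # textwrap.shorten collapses runs of whitespace and drops whole words
    # from the end until the result (placeholder included) fits in `width`.
    return textwrap.shorten(text, width=width, placeholder=placeholder)


short = truncate_words("short", 20)
long_result = truncate_words("The quick brown fox jumps over the lazy dog", 20)
```

Text that already fits is returned unchanged (apart from whitespace normalization); longer text is cut before a word and ends with the placeholder.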

List of values is also a valid json

The is_json function only seems to support a collection of name-value pairs. However, an ordered list of values is also a valid json format (http://json.org/) now, which the validator does not take into account.

For instance, is_json('{"abcs"}') should return True, which is not currently the case.
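
For comparison, here is a small cross-check against the standard library's json parser (the helper name parses_as_json is hypothetical). Under this parser the array '["abcs"]' is valid, whereas '{"abcs"}' (braces around a bare string) is rejected, so the example above was presumably meant to use square brackets. Note that json.loads also accepts bare scalars such as '42', which a stricter validator may choose to refuse.

```python
import json


def parses_as_json(text: str) -> bool:
    # json.JSONDecodeError is a subclass of ValueError.
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


print(parses_as_json('["abcs"]'))   # True
print(parses_as_json('{"abcs"}'))   # False
```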

Check for IPv6 address and false positives with ipv4

The is_ip() seems to be missing on checks for the validity of ipv6 addresses, for instance, "3ffe::1".
Also, for ipv4 address, the validator unexpectedly returns True for the invalid address "016.016.016.016"
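
One way to cross-check such cases is the standard library's ipaddress module; the sketch below is a reference point, not the library's implementation, and the helper name is_valid_ip is hypothetical. How it treats octets with leading zeros such as "016.016.016.016" depends on the Python version: recent releases reject them, older ones accepted them.

```python
import ipaddress


def is_valid_ip(text: str) -> bool:
    # ip_address() accepts both IPv4 and IPv6 literals and raises
    # ValueError for anything it cannot parse.
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


print(is_valid_ip("3ffe::1"))          # True
print(is_valid_ip("255.200.100.999"))  # False
```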

snake case check fails for one word strings

# This should return True I believe
>>> string_utils.is_snake_case("user")
False
# looks like correct behavior
>>> string_utils.is_snake_case("user_name")
True

Support for ipv4 mapped ipv6 address

Hello,
The current is_ip and is_ip_v6 functions do not validate an ipv4 mapped ipv6 address.
For instance, is_ip("7::128.128.0.127") returns False while it should return True.
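
The standard library's ipaddress module accepts this notation, which gives a reference point for the expected behaviour (a sketch for comparison only). Strictly speaking, the "IPv4-mapped" form uses the ::ffff: prefix; the address in the example above is a general IPv6 address whose last 32 bits are written in dotted-quad form.

```python
import ipaddress

# An IPv6 literal may spell its last 32 bits as a dotted quad.
addr = ipaddress.IPv6Address("7::128.128.0.127")

# The IPv4-mapped form proper lives under the ::ffff:0:0/96 prefix.
mapped = ipaddress.IPv6Address("::ffff:128.128.0.127")
print(mapped.ipv4_mapped)  # 128.128.0.127
```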

Deprecation warnings over invalid escape sequences in Python 3.7

Deprecation warnings are generated for invalid escape sequences. This can be fixed by using raw strings or by escaping the backslashes.
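
Both fixes are sketched here for one of the patterns flagged in this report. The second constant name, SAME_PATTERN_RE, is made up for the comparison.

```python
import re

# Raw string: the backslash reaches the regex engine untouched, so Python
# never tries to interpret \d as a string escape.
SNAKE_CASE_REPLACE_DASH_RE = re.compile(r'(-)([a-z\d])')

# Equivalent alternative: escape the backslash in a normal string.
SAME_PATTERN_RE = re.compile('(-)([a-z\\d])')

print(SNAKE_CASE_REPLACE_DASH_RE.pattern == SAME_PATTERN_RE.pattern)  # True
```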

find . -iname '*.py'  | xargs -P 4 -I{} python -Walways -m py_compile {} 

./string_utils.py:59: DeprecationWarning: invalid escape sequence \d
  SNAKE_CASE_REPLACE_DASH_RE = re.compile('(-)([a-z\d])')
./string_utils.py:303: DeprecationWarning: invalid escape sequence \d
  re_template = '^[a-z]+([a-z\d]+{sign}|{sign}[a-z\d]+)+[a-z\d]+$'
./string_utils.py:494: DeprecationWarning: invalid escape sequence \d
  r = re_map.get(separator, re.compile('({sign})([a-z\d])'.format(sign=re.escape(separator))))
./tests.py:590: DeprecationWarning: invalid escape sequence \ 
  self.assertEqual(words_count('. . ! <> [] {} + % --- _ = @ # ~ | \ / " \''), 0)
./string_utils.py:541: DeprecationWarning: invalid escape sequence \*
  """

Double quotes in email address

RFC 5322 (https://tools.ietf.org/html/rfc5322) allows usage of double quotes in email, such as "test@test"@example.com. However, the is_email() function does not account for this.

Check on slug

This is a valid slug '123____123' while the is_slug returns false for instances containing consecutive underscore('_')

Prettify function screws up with urls

Install fails with UnicodeDecodeError in some locales

When trying to install python-string-utils with python 3.6 and pip 20.0.2 in a SLES 15 SP1 docker container, the install fails with this error:

    ERROR: Command errored out with exit status 1:
     command: /usr/bin/python3 -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-4lc71oyc/python-string-utils/setup.py'"'"'; __file__='"'"'/tmp/pip-install-4lc71oyc/python-string-utils/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-install-4lc71oyc/python-string-utils/pip-egg-info
         cwd: /tmp/pip-install-4lc71oyc/python-string-utils/
    Complete output (7 lines):
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-4lc71oyc/python-string-utils/setup.py", line 4, in <module>
        long_description = readme.read()
      File "/usr/lib64/python3.6/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 8392: ordinal not in range(128)
    ----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.

README.md has some non-ASCII characters, and it appears that the locale in the docker container uses ascii encoding by default. Likely setup.py just needs to specify utf-8 encoding to open README.md.
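
A minimal sketch of that suggestion, assuming setup.py reads the README into long_description as the traceback indicates. The helper name read_long_description is hypothetical.

```python
def read_long_description(path: str = "README.md") -> str:
    # An explicit encoding makes the read independent of the system
    # locale, so an ASCII default no longer breaks on non-ASCII bytes.
    with open(path, encoding="utf-8") as readme:
        return readme.read()
```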

Url validation

Hello,

Here are some scenarios where is_url shows unexpected behaviour:

http://google.abcdefghi is valid but is_url states False in its test suite- constraining TLD to length 6 but TLD length specified by RFC 1034 is 63 octets. Real TLDs available here of length >6 http://data.iana.org/TLD/tlds-alpha-by-domain.txt
url with host ending in dot: http://www.foo.bar./ is valid as per https://www.w3.org/Addressing/URL/url-spec.txt#page13 while is_url returns False
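
To make the two points concrete, here is a structural check built on the standard library's urllib.parse. It is only a sketch of the label-length rule (1 to 63 octets per label, optional trailing dot), not a full URL validator, and the helper name host_labels_ok is hypothetical.

```python
from urllib.parse import urlsplit


def host_labels_ok(url: str) -> bool:
    host = urlsplit(url).hostname
    if not host:
        return False
    # A single trailing dot denotes the DNS root and is allowed.
    if host.endswith("."):
        host = host[:-1]
    # Every label, the TLD included, may be 1 to 63 octets long.
    return all(1 <= len(label) <= 63 for label in host.split("."))


print(host_labels_ok("http://google.abcdefghi"))  # True
print(host_labels_ok("http://www.foo.bar./"))     # True
```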

package is installed but script says it is not installed via script

Hi. Below is AI generated code to check if a certain python package is installed or not.
It checks for the existence of the python-string-utils and jsonschema packages.
For python-string-utils it prints the following EVERY TIME the script is run:

python-string-utils is not installed or is not the desired version. Installing...
python-string-utils (1.0.0) has been successfully installed.

For jsonschema it prints below immediately:

jsonschema (4.17.3) is already installed.

What could be the issue that is causing python-string-utils to be checked always?
Any leads are appreciated.

Code:

import importlib
import subprocess

# Define a list of packages and their desired versions
packages_to_check = [
    {"name": "python-string-utils", "version": "1.0.0"},
    {"name": "jsonschema", "version": "4.17.3"},
]

for package_info in packages_to_check:
    package_name = package_info["name"]
    desired_version = package_info["version"]

    try:
        # Attempt to import the package
        importlib.import_module(package_name)
        print(f"{package_name} ({desired_version}) is already installed.")
    except ImportError:
        print(f"{package_name} is not installed or is not the desired version. Installing...")

        # Install the package with the desired version
        install_command = ["pip3", "install", f"{package_name}=={desired_version}"]

        # Run the installation command
        installation_result = subprocess.run(install_command, capture_output=True, text=True)

        if installation_result.returncode == 0:
            print(f"{package_name} ({desired_version}) has been successfully installed.")
        else:
            print(f"Failed to install {package_name} ({desired_version}).")
            print("Installation error output:")
            print(installation_result.stderr)
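
A likely explanation: importlib.import_module expects an importable module name, and this library's module is string_utils (see the import examples in the README), whereas python-string-utils is the distribution name used by pip. A name containing hyphens cannot be imported, so the ImportError branch runs every time; jsonschema happens to use the same name for both. The script also never compares versions. A sketch that looks packages up by distribution name instead, assuming Python 3.8+ for importlib.metadata (the helper name installed_version is hypothetical):

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    # Look up by distribution (pip) name, not by importable module name.
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


for name, wanted in [("python-string-utils", "1.0.0"), ("jsonschema", "4.17.3")]:
    found = installed_version(name)
    status = "ok" if found == wanted else "missing or different version"
    print(f"{name}: installed={found!r} wanted={wanted!r} -> {status}")
```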