Machine Learning Things

Machine Learning Things is a lightweight Python library that contains functions and code snippets that I use in my everyday research with Machine Learning, Deep Learning, and NLP.

I created this repo because I was tired of always looking up the same code from older projects and I wanted to gain some experience in building a Python library. Making it available to everyone gives me easy access to code I use frequently, and it can help others in their machine learning work. If you find any bugs or something doesn't make sense, please feel free to open an issue.

That is not all! This library also contains Python code snippets and notebooks that speed up my Machine Learning workflow.

ML_things: Details on the ml_things library, including how to install and use it.
Snippets: Curated list of Python snippets I frequently use.
Comments: Some small snippets of how I like to comment my code.
Notebooks: Google Colab notebooks from old projects that I converted to tutorials.
Final Note

ML_things

Installation

This repo is tested with Python 3.6+.

It's always good practice to install ml_things in a virtual environment. If you need guidance on using Python's virtual environments, you can check out the user guide here.

You can install ml_things with pip from GitHub:

pip install git+https://github.com/gmihaila/ml_things

Functions

pad_array [source]

def pad_array(variable_length_array, fixed_length=None, axis=1)

Description:	Pad variable length array to a fixed numpy array. It can handle single arrays [1,2,3] or nested arrays [[1,2],[3]].
Parameters:	:param variable_length_array: Single arrays [1,2,3] or nested arrays [[1,2],[3]]. :param fixed_length: max length of rows for numpy. :param axis: directions along rows: 1 or columns: 0 :param pad_value: what value to use as padding, default is 0.
Returns:	:return: numpy_array: axis=1: fixed numpy array shape [len of array, fixed_length]. axis=0: fixed numpy array shape [fixed_length, len of array].

Example:

>>> from ml_things import pad_array
>>> pad_array(variable_length_array=[[1,2],[3],[4,5,6]], fixed_length=5)
array([[1., 2., 0., 0., 0.],
       [3., 0., 0., 0., 0.],
       [4., 5., 6., 0., 0.]])

batch_array [source]

def batch_array(list_values, batch_size)

Description:	Split a list into batches/chunks. The last batch holds whatever values remain, so it can be shorter than batch_size.
Parameters:	:param list_values: can be any kind of list/array. :param batch_size: int value of the batch length.
Returns:	:return: List of batches from list_values.
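
The batching behavior described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: `chunk_list` is a hypothetical name, and the check on `batch_size` is an assumption added for safety.

```python
from typing import List, Sequence


def chunk_list(list_values: Sequence, batch_size: int) -> List[Sequence]:
    """Hypothetical stand-in for batch_array, written for illustration only."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Step through the sequence batch_size items at a time. Slicing past the
    # end is safe, so the last batch simply holds whatever is left.
    return [list_values[start:start + batch_size]
            for start in range(0, len(list_values), batch_size)]


batches = chunk_list([1, 2, 3, 4, 5, 6, 7], batch_size=3)
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Seven values with a batch size of 3 give two full batches and a final batch of one.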

plot_array [source]

plot_array(array, step_size=1, use_label=None, use_title=None, use_xlabel=None, use_ylabel=None,
               style_sheet='ggplot', use_grid=True, width=3, height=1, use_linestyle='-', use_dpi=20, path=None,
               show_plot=True)

Description:	Create plot from a single array of values.
Parameters:	:param array: list of values. Can be of type list or np.ndarray. :param step_size: step shown on the x-axis. Change it if each step is different from 1. :param use_label: display label of values from array. :param use_title: title on top of plot. :param use_xlabel: horizontal axis label. :param use_ylabel: vertical axis label. :param style_sheet: style of plot. Use plt.style.available to show all styles. :param use_grid: show grid on plot or not. :param width: horizontal length of plot. :param height: vertical length of plot. :param use_linestyle: what style to use for the line, from ['-', '--', '-.', ':']. :param use_dpi: quality of the image saved from the plot. 100 is pretty high. :param path: path where to save the plot as an image - if set to None no image will be saved. :param show_plot: whether to call `plt.show()` or not (for example, when you run on a headless server).
Returns:

plot_dict [source]

plot_dict(dict_arrays, step_size=1, use_title=None, use_xlabel=None, use_ylabel=None,
              style_sheet='ggplot', use_grid=True, width=3, height=1, use_linestyles=None, use_dpi=20, path=None,
              show_plot=True)

Description:	Create plot from a dictionary of lists.
Parameters:	:param dict_arrays: dictionary of lists or np.array. :param step_size: step shown on the x-axis. Change it if each step is different from 1. :param use_title: title on top of plot. :param use_xlabel: horizontal axis label. :param use_ylabel: vertical axis label. :param style_sheet: style of plot. Use plt.style.available to show all styles. :param use_grid: show grid on plot or not. :param width: horizontal length of plot. :param height: vertical length of plot. :param use_linestyles: array of styles to use for the lines, from ['-', '--', '-.', ':']. :param use_dpi: quality of the image saved from the plot. 100 is pretty high. :param path: path where to save the plot as an image - if set to None no image will be saved. :param show_plot: whether to call `plt.show()` or not (for example, when you run on a headless server).
Returns:

plot_confusion_matrix [source]

plot_confusion_matrix(y_true, y_pred, classes='', normalize=False, title=None, cmap=plt.cm.Blues, image=None,
                          verbose=0, magnify=1.2, dpi=50)

Description:	This function prints and plots the confusion matrix. Normalization can be applied by setting normalize=True. y_true needs to contain all possible labels.
Parameters:	:param y_true: array of true label values. :param y_pred: array of predicted label values. :param classes: array list of label names. :param normalize: bool normalize confusion matrix or not. :param title: str string title of plot. :param cmap: plt.cm plot theme. :param image: str path to save plot in an image. :param verbose: int print confusion matrix when calling function. :param magnify: int zoom of plot. :param dpi: int clarity of plot.
Returns:
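
To make the description concrete, here is a small pure-Python sketch of what a confusion matrix holds and what normalization does to it. It is not the code behind plot_confusion_matrix: the helper names `confusion_counts` and `normalize_rows` are hypothetical, and row-wise normalization (each row of true labels sums to 1) is an assumption about what normalize=True means.

```python
from typing import List, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> List[List[int]]:
    """Hypothetical helper: rows are true labels, columns are predicted labels."""
    # The label set is taken from y_true only, so a predicted label that never
    # appears in y_true would raise a KeyError below. This mirrors the note
    # that y_true needs to contain all possible labels (an assumption).
    labels = sorted(set(y_true))
    index = {label: position for position, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def normalize_rows(matrix: List[List[int]]) -> List[List[float]]:
    """Hypothetical helper: divide each row by its total (assumed normalization)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)                  # [[1, 1, 0], [0, 2, 0], [1, 0, 1]]
print(normalize_rows(counts))  # [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.5, 0.0, 0.5]]
```

After normalization, each row reads as the fraction of examples of that true label assigned to each predicted label.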

download_from [source]

download_from(url, path)

Description:	Download file from url.
Parameters:	:param url: web path of file. :param path: path to save the file.
Returns:	:return: path where file was saved
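
A download helper with this shape can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's code: `fetch_file` is a hypothetical name, and creating the destination folder is an added convenience. The demo uses a local `file://` URL so it runs without network access.

```python
import os
import pathlib
import tempfile
import urllib.request


def fetch_file(url: str, path: str) -> str:
    """Hypothetical stand-in for download_from, for illustration only."""
    # Create the destination folder if the path includes one.
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # urlretrieve copies the resource at url to the local path.
    urllib.request.urlretrieve(url, path)
    return path


# Demo: "download" a local file through a file:// URL.
with tempfile.TemporaryDirectory() as workdir:
    source = pathlib.Path(workdir) / "source.txt"
    source.write_text("hello", encoding="utf-8")
    saved_path = fetch_file(source.as_uri(), os.path.join(workdir, "copy", "source.txt"))
    downloaded = pathlib.Path(saved_path).read_text(encoding="utf-8")

print(downloaded)  # hello
```

Returning the path, as the documented function does, makes it easy to pass the result straight to whatever reads the file next.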

clean_text [source]

clean_text(text, full_clean=False, punctuation=False, numbers=False, lower=False, extra_spaces=False,
               control_characters=False, tokenize_whitespace=False, remove_characters='')

Description:	Clean text using various techniques.
Parameters:	:param text: string that needs cleaning. :param full_clean: remove punctuation, numbers, extra spaces and control characters, and lower case the text. :param punctuation: remove punctuation from text. :param numbers: remove digits from text. :param lower: lower case all text. :param extra_spaces: remove extra spaces - everything beyond one space. :param control_characters: remove characters like `\n`, `\t` etc. :param tokenize_whitespace: return a list of tokens split on whitespace. :param remove_characters: remove defined characters from text.
Returns:	:return: cleaned text or list of tokens of cleaned text.
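
The flags above map to simple string operations. The sketch below shows a reduced version of the idea using only the standard library. `simple_clean` is a hypothetical name that covers only some of the flags, and trimming the ends of the string under `extra_spaces` is an assumption, so treat it as an illustration rather than the actual clean_text code.

```python
import re
import string
from typing import List, Union


def simple_clean(text: str, punctuation: bool = False, numbers: bool = False,
                 lower: bool = False, extra_spaces: bool = False,
                 tokenize_whitespace: bool = False) -> Union[str, List[str]]:
    """Hypothetical, reduced version of clean_text for illustration only."""
    if punctuation:
        # Delete every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    if numbers:
        # Delete runs of digits.
        text = re.sub(r"\d+", "", text)
    if lower:
        text = text.lower()
    if extra_spaces:
        # Collapse runs of spaces into one and trim the ends (assumption).
        text = re.sub(r" {2,}", " ", text).strip()
    # Either return the cleaned string or its whitespace-separated tokens.
    return text.split() if tokenize_whitespace else text


cleaned = simple_clean("Hello,   World 42!", punctuation=True, numbers=True,
                       lower=True, extra_spaces=True)
print(cleaned)  # hello world
```

Each step is optional and applied in a fixed order, so the same function can do a light or a heavy clean depending on the flags.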

Snippets

This is a large collection of Python snippets with no particular theme. I list the ones I use most frequently, while keeping a logical order. I like to keep them as simple and as efficient as possible.

Name	Description
Read File	One liner to read any file.
Write File	One liner to write a string to a file.
Debug	Start debugging after this line.
Pip Install GitHub	Install library directly from GitHub using `pip`.
Parse Argument	Parse arguments given when running a `.py` file.
Doctest	How to run a simple unit test using function documentation. Useful when you need to run unit tests inside a notebook.
Fix Text	Since text data is always messy, I always use it. It is great at fixing any bad Unicode.
Current Date	How to get the current date in Python. I use this when I need to name log files.
Current Time	Get current time in Python.
Remove Punctuation	The fastest way to remove punctuation in Python3.
PyTorch-Dataset	Code sample on how to create a PyTorch Dataset.
PyTorch-Device	How to set up the device in PyTorch to detect if a GPU is available.
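
As a rough idea of what a few of these look like, here is a sketch of the Write File, Read File, Current Date and Remove Punctuation entries using only the standard library. These are my assumptions about reasonable one-liners, not copies of the snippets in the repo; the file name and the date format are made up for the example.

```python
import string
import tempfile
from datetime import datetime
from pathlib import Path

# Work in a temporary folder so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    file_path = Path(folder) / "example.txt"
    # Write File: one line to write a string to a file.
    file_path.write_text("Hello, snippets!", encoding="utf-8")
    # Read File: one line to read a whole file into a string.
    content = file_path.read_text(encoding="utf-8")

# Current Date: handy for naming log files (format chosen for this example).
log_name = f"log_{datetime.now().strftime('%Y-%m-%d')}.txt"

# Remove Punctuation: str.translate deletes every character in the table.
no_punctuation = content.translate(str.maketrans("", "", string.punctuation))

print(content)         # Hello, snippets!
print(no_punctuation)  # Hello snippets
```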

Comments

These are a few snippets of how I like to comment my code. I saw a lot of different ways of how people comment their code. One thing is for sure: any comment is better than no comment.

I try to follow PEP 8, the Style Guide for Python Code, as closely as I can.

When I comment a function or class:

# required import for variable type declarations
from typing import List, Optional, Tuple, Dict

def my_function(function_argument: str, another_argument: Optional[List[int]] = None,
                another_argument_: bool = True) -> Dict[str, int]:
    r"""Function/Class main comment.

    More details with enough spacing to make it easy to follow.

    Arguments:

        function_argument (:obj:`str`):
            A function argument description.

        another_argument (:obj:`List[int]`, `optional`):
            This argument is optional and it will have a None value attributed inside the function.

        another_argument_ (:obj:`bool`, `optional`, defaults to :obj:`True`):
            This argument is optional and it has a default value.
            The variable name has `_` to avoid conflict with a similar name.

    Returns:

        :obj:`Dict[str, int]`: The function returns a dictionary with string keys and int values.
            A class will not have a return of course.

    """

    # make sure we keep our promise and return the variable type we described.
    return {'argument': len(function_argument)}

Notebooks

This is where I keep notebooks from previous projects that I turned into small tutorials. I often use them as the basis for starting a new project.

All of the notebooks are in Google Colab. Never heard of Google Colab? 🙀 You have to check out the Overview of Colaboratory, Introduction to Colab and Python, and what I think is a great Medium article on how to configure Google Colab Like a Pro.

If you check /ml_things/notebooks/, you will see that a lot of notebooks are not listed here because they are not in a 'polished' form yet. These are the notebooks that are good enough to share with everyone:

Name	Description	Google Colab
PyTorchText	This notebook is an example of using pytorchtext's powerful BucketIterator, which groups examples of similar lengths to make batching as efficient as possible.
Pretrain Transformers	This notebook is used to pretrain transformers models using Huggingface.

Final Note

Thank you for checking out my repo. I am a perfectionist, so I will make a lot of changes when it comes to small details.

Learn more about me? Check out my website gmihaila.github.io
