flow2ml's Introduction

Flow2ML

Introduction

Write only a few lines of machine learning code using Flow2ML

Quickly design and customize pre-processing workflows for machine learning. Obtain training and validation samples with only 3 lines of code using the Flow2ML toolkit.

Check the Installation and Sample Code sections to flow into your ML model faster and more efficiently.

Why Flow2ML

Flow2ML is an open-source library that makes the machine learning process much simpler. It loads the image data, applies the given filters, and returns train data, train labels, validation data, and validation labels. All of these steps take just 3 lines of code. It is mostly aimed at beginners in machine learning and deep learning who work with image-related data.

Programming languages and technologies used:

  1. Python
  2. HTML
  3. Numpy library
  4. OpenCV
  5. Machine Learning

Dependencies

Before running the code, the following packages need to be available (os and shutil are part of Python's standard library and need no installation):

  1. cv2
  2. os
  3. shutil
  4. sklearn
  5. numpy
  6. matplotlib
  7. tensorflow
  8. tensorflowjs
  9. unittest2
  10. pandas
  11. seaborn

Open Source programs that Flow2ML is a part of:

Download all dependencies with:

pip install -r requirements.txt

Getting Started

Flow2ML for Python can be installed from source, from PyPI, or as a Docker container.

From Source

$ git clone https://github.com/flow2ml/Flow2ML.git
$ cd Flow2ML

Or using pip:

$ git clone https://github.com/flow2ml/Flow2ML.git
$ cd Flow2ML
$ pip install .

From PyPI

$ pip install flow2ml

From Docker Image / Container

Clone this repo and cd into it:
$ git clone https://github.com/flow2ml/Flow2ML.git
$ cd Flow2ML

Build the Docker image:
$ docker build -t flow2ml .

Now you can run any of the code in this directory.

Run the container, specifying which code to run:
-v : the volume on which the repo code is mounted
Replace "script.py" with the name of the script you'd like to run.
Replace "pwd" with the path of the example file you'd like to run.

$ docker run -it --rm -v $(pwd)/:/root/flow2ml/ python script.py

Sample Code

Sample code for using the package can be found here.

Contributing

If you want to contribute to Flow2ML, please look into the issues and propose your solutions to them. We welcome contributions from all developers, whether beginner or pro. We go by the motto Caffeinate☕ || Collaborate🤝🏼 || Celebrate🎊. Before contributing, please read the contributing guidelines.

Contributors👩🏽‍💻👨‍💻

Credit goes to these wonderful people:✨

flow2ml's People

Contributors

aditi1403, ainy-123, anirudhsai20, arkaprabhachakraborty, aryamanz29, ashukv, aslmanasa, chebroluharika, earnsans, fomalhauting, himanshu007-creator, khareyash05, neerajap2001, noobkid2411, polokghosh53, rajeshpanjiyar, rubyruins, sahu-01, shikha-16, yvkrishna

flow2ml's Issues

add contributing.md

Hi! I would like to create a file named contributing.md that covers the following:

1. Difference between Git and GitHub
2. How to clone and fork a repository
3. How to create a branch and then use git push to push to the repo
4. How to create a PR
5. How to squash the commits for a single issue into one
6. How to update the fork and the local repo as changes are made upstream

I would like to work on this as a part of GSSOC'21.

Automate confusion matrix, ROC curve, and accuracy plot code

After training a model, we need to check it using a few result metrics such as accuracy plots, the confusion matrix, and ROC curves.

This issue focuses on coding the functions for those metrics:

  • For accuracy plots:
    • Take the trained model as input and save the model accuracy plots based on the metrics the user chose while training.
  • For the confusion matrix and ROC plots:
    • Take the trained model and test data as inputs and fetch the predicted results. Compare those values to the truth values and generate the corresponding confusion matrix and ROC curves.
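
The comparison of predicted and true values described above could be sketched with scikit-learn, which is already among the listed dependencies. This is only an illustration under assumptions: the function name `evaluate_binary_predictions`, the 0.5 threshold, and the sample arrays are hypothetical and not part of Flow2ML, and `y_score` stands in for the probabilities a trained model would return for the test data.

```python
import numpy as np
from sklearn.metrics import auc, confusion_matrix, roc_curve


def evaluate_binary_predictions(y_true, y_score, threshold=0.5):
    """Return the confusion matrix and ROC data for binary predictions."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    # Turn predicted probabilities into hard 0/1 labels.
    y_pred = (y_score >= threshold).astype(int)
    matrix = confusion_matrix(y_true, y_pred, labels=[0, 1])
    # roc_curve sweeps over thresholds, so it works on the raw scores.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    return matrix, fpr, tpr, auc(fpr, tpr)


# Hypothetical truth values and model scores, for illustration only.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
matrix, fpr, tpr, roc_auc = evaluate_binary_predictions(y_true, y_score)
print(matrix)
print(roc_auc)
```

The returned `fpr` and `tpr` arrays can be passed straight to a plotting call to draw the ROC curve; a multi-class model would need one curve per class.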

Improving code quality

  1. Add exception handling for all scenarios, such as file management, for-loop exhaustion, and OS management.
  2. Add documentation wherever required.
  3. Remove commented-out code.

Automated unit tests for functionalities

Since we are working in a test-driven environment, writing unit tests is crucial.

  1. Write unit tests for the methods in the .py files using the pytest or unittest framework.
  2. Create a GitHub workflow that runs those unit tests for every pull request raised.
  3. Make sure that coverage stays above 90%.

Contribution for GSSOC

Hi, I would love to contribute to this project during GSSOC 21. Could you tell me a bit more about the progress on the project and what sort of features you expect us to implement?

Add Examples

Examples need to be added for this library. Users can understand the usage of the library more easily if a few examples are available.
Documenting the example code also helps.

Duplicate images being stored after processing

Current Behavior:

Currently, this is the code I have run:

# To be given input by the user.
img_dimensions = (150,150,1)
test_val_split = 0.5

# Import flow2ml package
from flow2ml import Flow

# Give the Dataset and Data directories
flow = Flow( 'dataset_dir' , 'data_dir' )

# Define the filters to be used
filters = ["median", "gaussian", "sobelx", "sobely"]

# Apply The Filters
flow.applyFilters(filters)

(train_x, train_y, val_x, val_y) = flow.getDataset(img_dimensions, test_val_split)

I used 2 classes with 5 images each to try the filters feature. A new folder, processedData, contains the processed images with the filters applied; however, the processed images are also present in the original folder data_dir that contains the class images. The same image is duplicated and stored in 2 different places. Screenshot attached below:

Screenshots (optional):

Expected Behavior:

The processed images are getting stored in the data_dir for the class, and are duplicates of the images in the processedData folder. It would be more convenient for the user if the processed images are only there in the separate folder processedData, and not in the original folder containing the class images.

I am also working on the unit tests issue, and it would be easier to test the functionalities if this issue is resolved.

Solution:

After the processedData folder is generated, delete the duplicate images and folders from the data_dir folder. Eg: data_dir/apple/GaussianImages, data_dir/apple/MedianImages etc.
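
A minimal sketch of that cleanup step, assuming each class folder inside data_dir holds one sub-folder per filter (such as GaussianImages and MedianImages, as named above). The function `remove_filter_folders` is a hypothetical helper, not an existing Flow2ML method, and the folder names to delete are passed in explicitly so that nothing else is removed by accident.

```python
import shutil
from pathlib import Path


def remove_filter_folders(data_dir, folder_names):
    """Delete per-class filter output folders such as data_dir/apple/GaussianImages."""
    removed = []
    for class_dir in Path(data_dir).iterdir():
        if not class_dir.is_dir():
            continue
        for name in folder_names:
            target = class_dir / name
            if target.is_dir():
                shutil.rmtree(target)  # removes the folder and everything in it
                removed.append(target)
    return removed
```

It would be called once the processedData folder has been generated, for example `remove_filter_folders("data_dir", ["GaussianImages", "MedianImages"])`. The original class images are left untouched because only the named sub-folders are deleted.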

Deploying tensorflow models

You will have a TensorFlow model trained on the data.
As part of this issue, we need to automate model deployment using a TensorFlow.js method such as tfjs.converters.save_keras_model().

As part of this issue,

  1. we need to use the flow.py file.
  2. We need to have deploy_tensorflow_model() as a class method in it which can be used to convert the input model to tfjs model or tflite model based on user inputs.

Later we can try to include various other frameworks as well.
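
One possible shape for the conversion step is sketched below as a standalone function rather than the class method the issue asks for. The `target` and `output_path` parameters are assumptions made for illustration. The two conversion calls, `tfjs.converters.save_keras_model` and `tf.lite.TFLiteConverter.from_keras_model`, are the public converters of tensorflowjs and TensorFlow; they are imported lazily so that each package is only needed for its own target.

```python
def deploy_tensorflow_model(model, target, output_path):
    """Convert a trained Keras model to TensorFlow.js or TensorFlow Lite format."""
    if target == "tfjs":
        # Imported lazily so that tensorflowjs is only required for this target.
        import tensorflowjs as tfjs
        tfjs.converters.save_keras_model(model, output_path)  # writes a directory
    elif target == "tflite":
        import tensorflow as tf
        converter = tf.lite.TFLiteConverter.from_keras_model(model)
        with open(output_path, "wb") as handle:
            handle.write(converter.convert())  # convert() returns the model bytes
    else:
        raise ValueError(f"Unsupported target {target!r}; expected 'tfjs' or 'tflite'")
    return output_path
```

For the "tfjs" target `output_path` is a directory, while for "tflite" it is a single file, for example `deploy_tensorflow_model(model, "tflite", "model.tflite")`.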

Update .gitignore file

A .gitignore file specifies intentionally untracked files that Git should ignore.
Update the .gitignore file to include *.egg-info, which gets created when we run setuptools to upload the package to PyPI and therefore does not need to be tracked by Git.

Store image arrays as a class variable

Up to now, processed images have been stored as image files in folders.

This issue focuses on creating a class variable.
All the images would be stored in the class variable and also in the folders given by the user.

Bug in create_dataset() and prepare_dataset() in DataLoader.py

A few fixes are needed in the DataLoader.py file.

  1. Fix the error shown below in the create_dataset function.

Percent: [##########] 100% Filtered all images in apple ...
Percent: [##########] 100% Filtered all images in banana ...

Percent: [----------] 0.0%
Traceback (most recent call last):
  File "tryout.py", line 18, in <module>
    (train_x, train_y, val_x, val_y) = flow.getDataset( img_dimensions, test_val_split )
  File "C:\Users\ACER\Downloads\Flow2ML\flow2ml\Flow.py", line 116, in getDataset
    self.img_label = self.create_dataset(path)
  File "C:\Users\ACER\Downloads\Flow2ML\flow2ml\Data_Loader.py", line 94, in create_dataset
    folderName = list(classPath.split("/"))[2]
IndexError: list index out of range

  2. Correct the misspelling "tulpe" to "tuple".

  3. Make the NumPy dataset creation adapt to the dimensions and number of channels of the loaded images.
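
The IndexError in the traceback above comes from splitting the path on "/" and taking a fixed index, which fails whenever the path has fewer than three "/"-separated parts, as can happen with Windows-style backslash paths. A separator-independent alternative is sketched below. It assumes the wanted folder name is the final path component, which the traceback does not confirm, and `class_folder_name` is a hypothetical helper name.

```python
from pathlib import PureWindowsPath


def class_folder_name(class_path):
    """Return the final path component, accepting forward or backward slashes."""
    # PureWindowsPath accepts both separators and never touches the filesystem,
    # so it behaves the same on Windows, Linux and macOS.
    return PureWindowsPath(class_path).name


print(class_folder_name("data_dir/apple"))
print(class_folder_name(r"C:\Users\ACER\Downloads\data_dir\banana"))
```

One caveat of this choice: a POSIX file name that itself contains a backslash would be split as well, which is unlikely for class folders but worth knowing.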

Complete visualizeFilters method in Filters.py File

At the end of the Filters.py file there is a method named visualizeFilters().

In general, when a user applies a few filters, the resulting image data is stored in the processedData folder.
As part of this issue, we need to take an original sample image and its filtered versions and plot the original against the filtered versions in a single plot. The input to the method would be the filters selected by the user.

I am attaching the expected output after completing this issue. The image is a color image. We can leave out the HOG feature and keep the other filters selected by the user.
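
A rough sketch of such a single-figure comparison with matplotlib follows. It does not reproduce the attached expected output. The function name `plot_original_vs_filtered` and its parameters are assumptions: `filtered` is taken to be a dictionary mapping each filter name to an already filtered image array, and the images are assumed to be NumPy arrays in grayscale or RGB layout.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch also runs without a display
import matplotlib.pyplot as plt


def plot_original_vs_filtered(original, filtered, out_file):
    """Save one figure with the original image followed by each filtered version."""
    panels = [("original", original)] + list(filtered.items())
    fig, axes = plt.subplots(1, len(panels), figsize=(3 * len(panels), 3), squeeze=False)
    for ax, (title, image) in zip(axes[0], panels):
        # Two-dimensional arrays are drawn as grayscale, three-dimensional ones as RGB.
        ax.imshow(image, cmap="gray" if image.ndim == 2 else None)
        ax.set_title(title)
        ax.axis("off")
    fig.savefig(out_file)
    plt.close(fig)
    return len(panels)
```

Passing `squeeze=False` keeps `axes` two-dimensional even when no filters are selected, so the loop needs no special case for a single panel.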

sample image

Initial Test

Before adding new features, it would be better to test the existing codebase so that the corrected code can be reused when implementing new features.

Requirements not updated

Describe the bug:

Running pip install -r requirements.txt gives the following error:

image

To Reproduce:

Please provide the steps to reproduce this bug/error:
This issue is faced when working in a virtual environment and trying to run any sample code from this package. I think there are a few packages that are not yet updated in the requirements. The code in question is given below and was previously working before merging with the upstream.

from flow2ml import Flow

flow = Flow( 'dataset_dir' , 'data_dir' )
operations = {'crop': [50, 100, 50, 100]}
flow.applyAugmentation( operations )

Expected behavior:

The code should run smoothly and free of errors after installing all the requirements.


Generating a requirements.txt File

I'd like to generate a requirements.txt file for this project. It would help people install all the dependencies in one step and get to work faster.

I'd like to work on this issue as part of GSSOC-21.
Thanks

Add Examples -Horses Vs Humans

Examples need to be added for this library. Users can understand the usage of the library more easily if a few examples are available.
Documenting the example code also helps.

Before contributing to this issue, #55 and its sub-issues need to be completed.

Once the data augmentation techniques have also been completed, this issue can be completed.

Add automated analysis support for tensorflow models

  • I checked to make sure that this issue has not already been filed.

  • I'm reporting the issue to the correct repository (for multi-repository projects) (optional)

Expected Behavior:

There should be a class for automated evaluation of TensorFlow models that accepts the model's history as input and produces the ROC curves, confusion matrix, and PR curves. The class can have an additional method to create a doc file with those plots.

Solution:

Similar to the Auto_Results class.

Data Augmentation -- Rotation

This is a sub-issue of #55.

This issue needs to be resolved as soon as possible.

A method for rotating the image needs to be added to the Data_Augumentation.py file.

Face_Recognition using KNN

Hi, I'm a GSSOC'21 participant and want to contribute to your repository.
I want to contribute the face_detection part, which detects faces using the KNN algorithm.

@yvkrishna Please assign this part to me.

Updates for README.md

  1. Provide link to contributing.md file under contribution section.
  2. Provide proper introduction.

Data Augmentation -- Cropping

This is a sub-issue of #55.

This issue needs to be resolved as soon as possible.

A method for cropping the image needs to be added to the Data_Augumentation.py file.

Add exception handling for os related commands

Add proper exception handling for os related commands.
os.mkdir() creates a new directory at the given path.

What if a directory with the given name already exists?
Create a utility function that follows the pipeline below, and call it wherever a new directory is created using os.mkdir().
Display a message on the console saying "Directory with same name already exists, do you want me to delete it and create a new one". If the user answers "Y", remove the old directory and create it again. If the user answers "N", display the message "Please provide me another name for directory" and create a directory with the newly proposed name.
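
The pipeline above could be sketched as follows. The helper name `create_directory` and its `ask` parameter are assumptions introduced here: `ask` defaults to the built-in `input`, and making it a parameter allows the prompts to be answered by a script in unit tests. Any answer other than "Y" is treated as "N" in this sketch.

```python
import os
import shutil


def create_directory(path, ask=input):
    """Create `path`, asking the user what to do if it already exists."""
    while os.path.isdir(path):
        answer = ask("Directory with same name already exists, "
                     "do you want me to delete it and create a new one? (Y/N) ")
        if answer.strip().upper() == "Y":
            shutil.rmtree(path)  # remove the old directory and its contents
        else:
            path = ask("Please provide me another name for directory: ").strip()
    os.mkdir(path)
    return path
```

The loop re-checks the newly proposed name as well, so a second clash leads to another prompt instead of an unhandled FileExistsError from os.mkdir().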

Create GitHub workflow

Create a GitHub workflow for the scenarios below:

  1. Run syntax checks on the .py files for every pull request.
  2. Publish the package to PyPI whenever a new release is published on GitHub.

Add welcome bot

I would like to add a welcome bot for new issues and new PRs.
Kindly assign it to me under GSSOC, thanks!

Making code compatible enough for non-image datasets

We need to automate functionality such as

  • exploring the data,
  • counting missing values,
  • checking for duplicate values in columns,
  • visualizing features with scatter plots,
  • plotting the correlation matrix,
  • one-hot encoding of categorical target variables, if there are any,

which are the initial pre-processing steps for non-image data.

Please feel free to add any similar steps to this issue that you think might be useful for text datasets.
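
A few of the listed steps can be sketched with pandas, which is among the listed dependencies. The function `summarize_table`, the column names, and the sample data are hypothetical. Note that this sketch counts fully duplicated rows, which is one possible reading of the duplicate check described above.

```python
import pandas as pd


def summarize_table(frame, target):
    """Report missing values and duplicate rows, then one-hot encode the target column."""
    summary = {
        "missing_per_column": frame.isna().sum().to_dict(),
        "duplicate_rows": int(frame.duplicated().sum()),
    }
    encoded = pd.get_dummies(frame, columns=[target])
    return summary, encoded


# Hypothetical tabular data, for illustration only.
frame = pd.DataFrame({
    "height": [1.0, None, 1.0, 2.5],
    "label": ["cat", "dog", "cat", "dog"],
})
summary, encoded = summarize_table(frame, target="label")
print(summary)
print(list(encoded.columns))
```

`frame.corr(numeric_only=True)` would give the correlation matrix for the numeric columns in recent pandas versions, and could be added to the summary in the same way.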

Data Augmentation -- Shearing

This is a sub-issue of #55.

This issue needs to be resolved as soon as possible.

A method for applying shearing to the image needs to be added to the Data_Augumentation.py file.

Color reversal in filters feature

For the example image given below, I applied the filters functionality on it:
apple_3

This is the result after applying Gaussian filter on the image:
gaussianapple_3

This is the median filter result:
Medianapple_3

The other 3 filters seem to be working fine (Sobelx, Sobely and Laplacian), but for these images there seems to be a colour reversal. If it is a known issue, can I work on this, please?
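
One common cause of swapped colours in OpenCV pipelines is channel order: OpenCV loads images as BGR, while most other tools (matplotlib, for example) expect RGB. Whether that is what happens here is not confirmed by the issue, so the snippet below is only a way to test the hypothesis. `bgr_to_rgb` is a hypothetical helper.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of an H x W x 3 array (BGR to RGB or back)."""
    return image[..., ::-1]


# A pure-red pixel stored in OpenCV's BGR order: blue=0, green=0, red=255.
pixel_bgr = np.array([[[0, 0, 255]]], dtype=np.uint8)
print(bgr_to_rgb(pixel_bgr).tolist())
```

If the Gaussian and median outputs look correct after this reversal, the filters themselves are fine and only the channel order at save or display time needs attention.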

Create a report File

You will have result metrics such as the confusion matrix, ROC curves, etc. in a folder.
As part of this issue, we need a way to generate a report.pdf file that contains the following information:

  1. Accuracy-loss plots
  2. Confusion matrix
  3. ROC curves

Data Augmentation -- Scaling

This is a sub-issue of #55.

This issue needs to be resolved as soon as possible.

A method for scaling the image needs to be added to the Data_Augumentation.py file.

Data Augmentation

Data augmentation plays a key role in any ML project, so we need to implement data augmentation techniques such as

  1. Flipping
  2. Rotation
  3. Shearing
  4. Cropping
  5. Zoom in, zoom out
  6. Scaling

using a Python dictionary, where each key denotes the data augmentation type and each value denotes the amount of that technique to apply.

For example:
augmentation_techniques = { "rotation": 15, "zoom_range": 0.5, "shear_range": 0.5, ... }
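
The dictionary-driven design can be illustrated with a small dispatch table. Everything below is a sketch under assumptions: only flipping and cropping are shown because they need nothing beyond NumPy, the "flip" and "crop" keys and the `[top, bottom, left, right]` crop order are choices made for this example, and the techniques are chained on one image, which may differ from how Flow2ML applies them.

```python
import numpy as np


def flip(image, mode):
    """Flip horizontally (left-right) or vertically (up-down)."""
    return np.fliplr(image) if mode == "horizontal" else np.flipud(image)


def crop(image, box):
    """Crop to the rows and columns given as [top, bottom, left, right]."""
    top, bottom, left, right = box
    return image[top:bottom, left:right]


# Maps each dictionary key to the function that implements it.
OPERATIONS = {"flip": flip, "crop": crop}


def apply_augmentations(image, techniques):
    """Apply every requested technique to a copy of the image, in dictionary order."""
    result = image.copy()
    for name, amount in techniques.items():
        if name not in OPERATIONS:
            raise KeyError(f"Unknown augmentation: {name}")
        result = OPERATIONS[name](result, amount)
    return result


image = np.arange(16).reshape(4, 4)
augmented = apply_augmentations(image, {"flip": "horizontal", "crop": [0, 2, 0, 2]})
print(augmented)
```

New techniques such as rotation or shearing would be added by registering one more function in `OPERATIONS`, leaving `apply_augmentations` unchanged.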

Add Examples -Cats Vs Dogs

Examples need to be added for this library. Users can understand the usage of the library more easily if a few examples are available.
Documenting the example code also helps.

Before contributing to this issue, #55 and its sub-issues need to be completed.

Once the data augmentation techniques have also been completed, this issue can be completed.

Automating Results

Feature request:

Results are another main part of any machine learning project. They consist of accuracy-loss plots, the confusion matrix, ROC curves, etc. Flow2ML focuses on making the entire machine learning pipeline much easier, so in this feature issue I propose automating the results.

The Idea:

A class should be created that contains methods for the various functionalities and takes the trained models as input. With the trained models, the methods should generate result metrics such as the confusion matrix, ROC curves, etc. Finally, with all available result metrics, a report.pdf file should be created describing the entire process involved in the workflow.

Uses:

With only a few lines of code, the entire results part could be completed, and a report would be generated that can be beneficial to researchers.


Side Note:
If anyone has anything to add to this idea, they are free to mention it in this thread.

Update Readme File

Hey, I am a GSSOC'21 participant.
I would like to update the README file by adding sections such as "How to contribute" and "How to make pull request".
Can you please assign this to me?

Data Augmentation -- Flipping

This is a sub-issue of #55.

This issue needs to be resolved as soon as possible.

A method for flipping the image needs to be added to the Data_Augumentation.py file.

Create Logo

To create a new Logo for this organization.
Need UI developers.

Data Augmentation

Data augmentation plays a key role in any ML project, so we need to implement data augmentation techniques such as

  1. Flipping
  2. Rotation
  3. Shearing
  4. Cropping
  5. Zoom in, Zoom out
  6. scaling

Visualize countPlot of categories after loading dataset

This should be the pipeline:

  1. After initializing and getting the classNames of the dataset_dir that we pass, calculate the number of images present in each classificationType.
  2. Display the count on the console.
  3. Save the countplot visualization to a PDF.
  4. If a class-imbalance problem exists, ask the user whether they want to view that PDF; if "Y", display it.
  5. Then show a message such as "Class-imbalance problem exists for your dataset, do you want me to proceed"; if the user answers "Y", continue the execution, otherwise exit.
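
The counting and imbalance check at the start of this pipeline could look roughly like the sketch below. It assumes dataset_dir contains one sub-folder per class. The helper names, the list of image extensions, and the `max_ratio` threshold of 2.0 are illustrative assumptions, since the issue does not define when a dataset counts as imbalanced.

```python
import os
from collections import Counter

IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".bmp")


def count_images_per_class(dataset_dir):
    """Count image files in each class sub-folder of dataset_dir."""
    counts = Counter()
    for class_name in sorted(os.listdir(dataset_dir)):
        class_path = os.path.join(dataset_dir, class_name)
        if os.path.isdir(class_path):
            counts[class_name] = sum(
                1 for name in os.listdir(class_path)
                if name.lower().endswith(IMAGE_EXTENSIONS)
            )
    return counts


def is_imbalanced(counts, max_ratio=2.0):
    """Flag the dataset when the largest class exceeds max_ratio times the smallest."""
    if not counts:
        return False
    smallest, largest = min(counts.values()), max(counts.values())
    return smallest == 0 or largest / smallest > max_ratio
```

The resulting counts can be printed to the console and passed to a plotting library for the countplot; the prompts to the user would follow only when `is_imbalanced` returns True.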

Please feel free to add any enhancement that you think will add value to this.

Readme file enhancement

The README file can be enhanced by adding a list of all the open source programs in which this project is listed.
