klareda / klar-eda

A Python library for automated exploratory data analysis

Home Page: https://klareda.github.io/klar-EDA/

License: MIT License

Python 47.17% Makefile 0.28% CSS 11.85% JavaScript 29.00% HTML 11.39% Batchfile 0.32%

klar-eda

Overview

Documentation - https://klareda.github.io/klar-EDA/

Presentation - https://youtu.be/FsDV6a-L-wo

The library aims to ease data exploration and preprocessing by providing smart, automated techniques for exploratory analysis of the data.

The library consists of the following modules:

CSV Data Visualization
CSV Data Preprocessing
Image Data Visualization
Image Data Preprocessing

Usage

You can install the test version of the library with the command below::

$ pip3 install -i https://test.pypi.org/simple/ klar-eda

The modules listed above can be used as follows::

>>> import klar_eda

CSV Data Visualization

>>> from klar_eda.visualization import visualize_csv

>>> visualize_csv(<csv-file-path>) 

OR

>>> visualize_csv(<data-frame>)

CSV Data Preprocessing

>>> from klar_eda.preprocessing import preprocess_csv

>>> preprocess_csv(<csv-file-path>) 

OR

>>> preprocess_csv(<data-frame>)

Image Data Visualization

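>>> from random import randint
>>> import cv2
>>> import tensorflow_datasets as tfds
>>> from klar_eda.visualization import visualize_images

>>> # load CIFAR-10 as (image, label) pairs
>>> ds = tfds.load('cifar10', split='train', as_supervised=True)
>>> images = []
>>> labels = []
>>> for image, label in tfds.as_numpy(ds):
...     # resize each image to a random height and width between 24 and 56 pixels
...     h = randint(24, 56)
...     w = randint(24, 56)
...     image = cv2.resize(image, (w, h))
...     images.append(image)
...     labels.append(label)

>>> visualize_images(images, labels)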

Image Data Preprocessing

>>> from klar_eda.preprocessing import preprocess_images

>>> preprocess_images(<images-folder-path>)

If you like our project, it would be really helpful if you could share it with others.

Contributing

To contribute to this project, feel free to clone the repository::

git clone https://github.com/klarEDA/klar-EDA.git

To install the necessary packages, run the command below::

$ pip3 install -r requirement.txt

Documentation

To build and test the documentation locally::

$ cd docsource/
$ make html

To push the latest documentation to GitHub::

$ cd docsource/
$ make github

License

klar-eda is released under the MIT license.

Please feel free to contact us about any issues or to discuss the future scope of the library at [email protected]

Owners

Ashish Kshirsagar, Rishabh Agarwal, Sayali Deshpande, Ishaan Ballal

References

https://test.pypi.org/project/klar-eda/

klar-eda's Issues

Implement a method for date feature extraction in csv data preprocessor

Description

a. Write a method to identify the columns of type date (this may include iterating over the list of columns and using an appropriate strategy to identify if a column has values of type date)

b. Implement another method that should be able to convert the date column into a specific static format (for example - YYYY-MM-DD) and split the date column into separate columns with the following attribute values:

Date of the month (for example - 28 for '2021-12-28')

Month (Numerical)

Year

Day of the week

c. Appropriate test methods should be implemented in the date_format_tests file

Assumptions

The following assumptions can be made during the implementation

No time is present in the given input date.

The data frame must contain column names

A list of input patterns can be assumed. (For example, you can assume the input will be in one of the known formats listed below.)
input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]

Input (Method-1)

None

Output (Method-1)

list of column names with values of type date

Method details

Use the data frame from the self.df variable.

Input (Method-2)

An expected format the input date should be converted to

Output (Method-2)

None

Method details

Use the data frame from the self.df variable.

Implement a method for the same with appropriate name and parameters in the csv_preprocess.py file.

In the implementation use the method convert_date_format for converting the date into a specific format & the method-1 mentioned above to get a list of columns with date type.

Note

The use of standard python libraries is highly recommended.
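
The two methods described above could be sketched roughly as follows, using only the standard library. This is an illustration under stated assumptions, not the project's implementation: a plain dict of column name to list of strings stands in for `self.df`, the function names `find_column_format`, `get_date_columns` and `split_date_columns` are hypothetical, and the mapping from the issue's patterns to `strptime` directives is assumed. The weekday is reported as an integer where 0 is Monday.

```python
from datetime import datetime

# strptime equivalents of the patterns listed in the issue (assumed mapping)
INPUT_DATE_FORMATS = {
    'DD/MM/YYYY': '%d/%m/%Y',
    'YYYY/DD/MM': '%Y/%d/%m',
    'MM/DD/YYYY': '%m/%d/%Y',
    'YYYY/MM/DD': '%Y/%m/%d',
    'DD-MM-YYYY': '%d-%m-%Y',
    'YYYY-DD-MM': '%Y-%d-%m',
    'MM-DD-YYYY': '%m-%d-%Y',
    'YYYY-MM-DD': '%Y-%m-%d',
}


def find_column_format(values):
    """Return the first strptime pattern that parses every value, else None."""
    if not values:
        return None
    for pattern in INPUT_DATE_FORMATS.values():
        try:
            for value in values:
                datetime.strptime(value, pattern)
        except (TypeError, ValueError):
            continue  # at least one value does not fit this pattern
        return pattern
    return None


def get_date_columns(table):
    """Method 1: names of the columns whose values all parse as dates."""
    return [name for name, values in table.items()
            if find_column_format(values) is not None]


def split_date_columns(table, output_format='%Y-%m-%d'):
    """Method 2: rewrite date columns in place and add the derived columns."""
    for name in get_date_columns(table):
        pattern = find_column_format(table[name])
        parsed = [datetime.strptime(value, pattern) for value in table[name]]
        table[name] = [d.strftime(output_format) for d in parsed]
        table[name + '_day'] = [d.day for d in parsed]
        table[name + '_month'] = [d.month for d in parsed]
        table[name + '_year'] = [d.year for d in parsed]
        table[name + '_weekday'] = [d.weekday() for d in parsed]  # 0 = Monday


# Hypothetical sample data standing in for self.df
table = {
    'name': ['asha', 'ravi'],
    'joined': ['28/12/2021', '13/11/2021'],
}
date_columns = get_date_columns(table)
split_date_columns(table)
```

Checking the whole column against one pattern, rather than each value on its own, reduces ambiguity: a single value with a day greater than 12 rules out the patterns that would read it as a month. A column in which every day is 12 or lower remains ambiguous, and this sketch then simply takes the first pattern in the list.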

JOIN THE SLACK CHANNEL HERE if you wish to contribute to this issue.

Update README file

@ashish-hacker @harshasridhar @ishaanballal21 @rishabh-me
README is the first file one should read when starting a new project. It is a set of useful information about a project, and a kind of manual. README text files appear in many places and are not limited to programming. So I want to make your README file more meaningful and the whole project easier to understand.

Add documentation for the implemented methods in csv data preprocessor

Description

Document the methods present in the csv_preprocess.py abiding by the Sphinx Documentation examples
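
For reference, a docstring in the reStructuredText field-list style that Sphinx renders might look like the sketch below. The function `fill_missing` is a hypothetical example and is not taken from csv_preprocess.py; only the docstring layout is the point.

```python
def fill_missing(values, default=0):
    """Replace missing entries in a column.

    :param values: Column values, where ``None`` marks a missing entry.
    :type values: list
    :param default: Value substituted for each missing entry.
    :type default: int or float
    :return: A new list with every ``None`` replaced by ``default``.
    :rtype: list
    """
    return [default if value is None else value for value in values]


filled = fill_missing([1, None, 3])
```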

Tech Stack Update ( LOGO add )

I would like to add logos for the tech stack used in our project; this will make the project more attractive. I will start working on this issue as soon as I get assigned.

Identify missing csv data visualization methods and implement the methods with a test case | Generic Issue - Not to be assigned

Description

Identify the missing methods of CSV data visualization in this repository.
Find suitable cases of data and machine learning problems for which the method should be used.
Implement the method only for those cases by adding smart logic.

Please note:
This is a generic issue and multiple students can work on it. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign it to you.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

Error while import

Indentation error on import.
Please find the logs attached:

>>> from klar_eda.preprocessing import preprocess_csv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocessing.py", line 1, in <module>
    from .preprocess.csv_preprocess import CSVPreProcess
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocess/csv_preprocess.py", line 71
    if ret == True:
                  ^
IndentationError: unindent does not match any outer indentation level

>>> from klar_eda import visualization
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/visualization.py", line 1, in <module>
    from .visualize.csv_visualize import CSVVisualize
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/visualize/csv_visualize.py", line 13, in <module>
    from ..preprocess.csv_preprocess import CSVPreProcess
  File "/Users/harshams/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocess/csv_preprocess.py", line 71
    if ret == True:
                  ^
IndentationError: unindent does not match any outer indentation level

Implement a format identifier method for date in csv data preprocessor

Description

The method should be able to identify and convert the date into a specific static format.
The functionality of the method can be described as below -

Take in any type of date as input (for example - 2021-11-13)

Identify the format (for example - YYYY-MM-DD)

Convert the date into any desired format (for example - DD/MM/YYYY)

Assumptions

The following assumptions can be made during the implementation

No time is present in the given input date.

The input will be only a string.

A list of input patterns can be assumed. (For example, you can assume the input will be in one of the known formats listed below.)
input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]

Input

A string in any of the formats mentioned above (The contributor is free to add any other formats)

An expected output format the input date should be converted to

Output

A date in string converted into the desired format.

Note

The use of standard python libraries is highly recommended.
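
A minimal sketch of the identify-then-convert flow described above, using only `datetime` from the standard library. The helper names `to_strptime` and `identify_date_format` are hypothetical; `convert_date_format` is the name another issue in this list uses for this method, but its signature here is an assumption.

```python
from datetime import datetime

# Patterns taken from the issue text
INPUT_DATE_FORMATS = [
    'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD',
    'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD',
]

# Tokens of the issue's notation and their strptime directives
TOKENS = (('YYYY', '%Y'), ('MM', '%m'), ('DD', '%d'))


def to_strptime(pattern):
    """Translate a pattern such as 'DD/MM/YYYY' into '%d/%m/%Y'."""
    for token, directive in TOKENS:
        pattern = pattern.replace(token, directive)
    return pattern


def identify_date_format(date_string):
    """Return the first listed pattern that parses date_string."""
    for pattern in INPUT_DATE_FORMATS:
        try:
            datetime.strptime(date_string, to_strptime(pattern))
        except ValueError:
            continue  # this pattern does not fit, try the next one
        return pattern
    raise ValueError(f'unrecognised date format: {date_string!r}')


def convert_date_format(date_string, output_format):
    """Convert date_string from its detected format to output_format."""
    source_format = identify_date_format(date_string)
    parsed = datetime.strptime(date_string, to_strptime(source_format))
    return parsed.strftime(to_strptime(output_format))


detected = identify_date_format('2021-11-13')
converted = convert_date_format('2021-11-13', 'DD/MM/YYYY')
```

Note that a single string can fit more than one pattern: '2021-05-06' parses as both YYYY-DD-MM and YYYY-MM-DD. In this sketch the order of `INPUT_DATE_FORMATS` decides, so a real implementation would need an explicit rule for such cases.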

Add documentation for the implemented methods in image data preprocessor

Description

Document the methods present in the image_preprocess.py abiding by the Sphinx Documentation examples

Adding Contributor section in readme ( Automatically )

I would like to add a feature that updates the contributor list in the README automatically. I will start working on this issue as soon as I get assigned.

Restructure the image preprocessing module to provide submodules

Is your feature request related to a problem? Please describe.

Image preprocessing is a broad umbrella that encompasses various kinds of techniques, which can be grouped into submodules. It is always good to segregate related functionality into a submodule, and it gives a good user experience as well.

Describe the solution you'd like

The module can be divided into sub-modules such as transformation(pixel brightness, geometric), filtering(spatial, frequency), segmentation(edge-based, region-based), morphology(binary, grayscale) and so on.

Input

Raw image

Output

Preprocessed image

Note

The use of standard python libraries is highly recommended.

GUI for Exploratory Data Analysis

Is your feature request related to a problem? Please describe.

This is a calculator for exploratory data analysis.
A dataset can be imported in CSV or Excel format.
The user can then select rows and columns and compute statistics on them, for example mean, median, mode, max, min and variance.
It also helps in making plots such as line, histogram, scatter and regression plots.

Input

CSV or excel file

Output

Numeric or graphical

Note

This is just additional support to library .

Additional context

I will be working on this feature under GSSoC'22.
This way we don't have to go back to the code just to find the same values again and again.
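
The numeric part of such a calculator could be sketched with the standard `csv` and `statistics` modules, as below. The function name `summarise_column` and the sample data are hypothetical; a GUI would call something like this after the user picks a file and a column.

```python
import csv
import io
import statistics

# Hypothetical sample data; a real tool would read the user's file instead
SAMPLE_CSV = """height,weight
150,50
160,60
160,70
"""


def summarise_column(csv_text, column):
    """Compute the statistics named in the issue for one numeric column."""
    rows = csv.DictReader(io.StringIO(csv_text))
    values = [float(row[column]) for row in rows]
    return {
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'mode': statistics.mode(values),
        'max': max(values),
        'min': min(values),
        'variance': statistics.variance(values),  # sample variance
    }


summary = summarise_column(SAMPLE_CSV, 'height')
```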

Add help description for the package

Add help description to the package, that helps the user understand the purpose of the project, modules, submodules etc.
Currently no description is provided; the help output is as follows:

Help on package klar_eda:

NAME
    klar_eda

PACKAGE CONTENTS
    preprocessing
    visualization

SUBMODULES
    preprocess
    visualize

FILE
    (built-in)

(END)
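
The help output gains a description once the package has a module docstring, typically the first statement of its `__init__.py`. The sketch below demonstrates the effect on a throwaway module object rather than on klar_eda itself: the module name `demo_pkg` and the docstring wording are assumptions, while the function names listed in it come from the Usage section of the README.

```python
import pydoc
import types

# Candidate docstring text; in the real package this would sit at the top
# of the package's __init__.py (wording is an assumption)
PACKAGE_DOC = """Automated exploratory data analysis.

Modules:
    preprocessing: preprocess_csv, preprocess_images
    visualization: visualize_csv, visualize_images
"""

# Build a stand-in module carrying that docstring
demo = types.ModuleType('demo_pkg', PACKAGE_DOC)

# render_doc produces the same text that help() would page through
help_text = pydoc.render_doc(demo, renderer=pydoc.plaintext)
print(help_text)
```

The first line of the docstring is shown next to the name in the NAME section, and the remaining lines appear under a DESCRIPTION section.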

Add documentation for the implemented methods in csv data visualizer

Description

Document the methods present in the csv_visualize.py abiding by the Sphinx Documentation examples

Identify missing image data preprocessing methods and implement the methods with a test case | Generic Issue - Not to be assigned

Description

Identify the missing methods of Image data preprocessing in this repository.
Find suitable cases of data and machine learning problems for which the method should be used.
Implement the method only for those cases by adding smart logic.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

To add contributor's name in README.md file.

Is your feature request related to a problem? Please describe.

To add contributor's name in README.md file.

Identify missing csv data preprocessing methods and implement the methods with a test case | Generic Issue - Not to be assigned

Description

Identify the missing methods of CSV data preprocessing in this repository.
Find suitable cases of data and machine learning problems for which the method should be used.
Implement the method only for those cases.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

Update README.md

remove grammatical mistakes

add contributing.md

Hi! I would like to create a file named contributing.md which will cover the following:

1. Difference between Git and GitHub
2. How to clone and fork a repository
3. How to create a branch and then use git push to push to the repo
4. Create a PR
5. Squash the commits for a single issue into one
6. Updating the forked and local repos as updates are made upstream

I would like to work on this as a part of GSSOC'21.

Update README.md file

I have made some minor grammatical edits to the README.md file. Kindly refer to PR #35.

Implement different normalization techniques in csv data preprocessor

Description

The implementation can take one or multiple methods. After the implementation of the method(s), the following should be achievable:

Mean Normalisation of features

Standardization of features

Assumptions

For standardization of features, it is assumed that the data follows a Gaussian distribution.

Input

DataFrame

Output

The data processed according to the chosen method (standardization or normalisation).
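
The two techniques named above can be sketched as follows. Mean normalisation maps each value x to (x - mean) / (max - min), and standardization maps it to (x - mean) / standard deviation. This is only an illustration: a plain list stands in for one DataFrame column, the function names are hypothetical, and the population standard deviation is used (a sample standard deviation would be an equally valid choice).

```python
from statistics import mean, pstdev


def mean_normalise(values):
    """Scale values to (x - mean) / (max - min)."""
    mu = mean(values)
    spread = max(values) - min(values)
    if spread == 0:
        return [0.0 for _ in values]  # constant column, nothing to scale
    return [(value - mu) / spread for value in values]


def standardise(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column, nothing to scale
    return [(value - mu) / sigma for value in values]


column = [2.0, 4.0, 6.0]
normalised = mean_normalise(column)
standardised = standardise(column)
```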

Add documentation for the implemented methods in image data visualizer

Description

Document the methods present in the image_visualize.py abiding by the Sphinx Documentation examples

Fix pip installation issues

pip3 install -i https://test.pypi.org/simple klar-eda fails with an error because of missing dependencies.
It fails on sklearn, opencv-python, tensorflow, pandas, sphinx, matplotlib and seaborn.
This severely impacts ease of use and user experience.

Contribution for GSSOC

Hi, I would love to contribute to this project during GSSOC 21. Could you tell me a bit more about the progress on the project and what sort of features you expect us to implement during the contribution period?
