klareda / klar-eda

A python library for automated exploratory data analysis

Home Page: https://klareda.github.io/klar-EDA/

License: MIT License

Python 47.17% Makefile 0.28% CSS 11.85% JavaScript 29.00% HTML 11.39% Batchfile 0.32%
csv-data-preprocessing klar-eda csv-data-visualization exploratory-analysis python-library image-data-preprocessing image-preprocessing csv-preprocessing csv-visualization data-preprocessing

klar-eda's Introduction

klar-eda

A python library for automated exploratory data analysis

Overview

Documentation - https://klareda.github.io/klar-EDA/

Presentation - https://youtu.be/FsDV6a-L-wo

The library aims to ease the data exploration and preprocessing steps by providing smart, automated techniques for exploratory data analysis.

The library consists of the following modules

  • CSV Data Visualization
  • CSV Data Preprocessing
  • Image Data Visualization
  • Image Data Preprocessing

Usage

You can install the test version of the library with the command below::

$ pip3 install -i https://test.pypi.org/simple/ klar-eda    

The modules listed above can be used as follows::

>>> import klar_eda

CSV Data Visualization

>>> from klar_eda.visualization import visualize_csv

>>> visualize_csv(<csv-file-path>) 

OR

>>> visualize_csv(<data-frame>)

CSV Data Preprocessing

>>> from klar_eda.preprocessing import preprocess_csv

>>> preprocess_csv(<csv-file-path>) 

OR

>>> preprocess_csv(<data-frame>)

Image Data Visualization

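>>> from random import randint
>>> import cv2
>>> import tensorflow_datasets as tfds
>>> from klar_eda.visualization import visualize_images

>>> # Load CIFAR-10 and resize every image to a random height and width
>>> ds = tfds.load('cifar10', split='train', as_supervised=True)
>>> images = []
>>> labels = []
>>> for image, label in tfds.as_numpy(ds):
...     h = randint(24, 56)
...     w = randint(24, 56)
...     image = cv2.resize(image, (w, h))  # cv2.resize expects (width, height)
...     images.append(image)
...     labels.append(label)

>>> visualize_images(images, labels)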

Image Data Preprocessing

>>> from klar_eda.preprocessing import preprocess_images

>>> preprocess_images(<images-folder-path>)

If you liked our project, it would be really helpful if you could share this project with others.

Contributing

To contribute to this project, feel free to clone the repository::

git clone https://github.com/klarEDA/klar-EDA.git

To install the necessary packages, run the command below::

$ pip3 install -r requirement.txt

Documentation

To build and test the documentation locally::

$ cd docsource/
$ make html

To push the latest documentation to GitHub::

$ cd docsource/
$ make github

License

klar-eda is released under the MIT license.

Please feel free to contact us for any issues OR for discussion of future scope of the library at [email protected]

Owners

Ashish Kshirsagar Rishabh Agarwal Sayali Deshpande Ishaan Ballal

References

https://test.pypi.org/project/klar-eda/

klar-eda's People

Contributors

abh33, aditi1403, ashish-hacker, ask149, harshasridhar, ishaanballal21, kajalsinghbaghel, klareda-admin, lekshmissunil, psy2d, rishabh-me, shankhanil, sibasish-padhy

klar-eda's Issues

Implement a method for date feature extraction in csv data preprocessor

Description

a. Write a method to identify the columns of type date (this may include iterating over the list of columns and using an appropriate strategy to identify if a column has values of type date)

b. Implement another method that should be able to convert the date column into a specific static format (for example - YYYY-MM-DD) and split the date column into separate columns with the following attribute values:

  1. Date of the month (for example - 28 for '2021-12-28')
  2. Month (Numerical)
  3. Year
  4. Day of the week

c. Appropriate test methods should be implemented in the date_format_tests file

Assumptions

The following assumptions can be made during the implementation

  1. No time is present in the given input date.
  2. The data frame must contain column names
  3. A list of input patterns can be assumed (for example, you can assume the input will be in one of the known formats listed below).
    input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]

Input (Method-1)

None

Output (Method-1)

list of column names with values of type date

Method details

Use the data frame from the self.df variable.
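One possible approach to Method-1 is sketched below using only the standard library. It is not the project's implementation: the helper names `to_strptime`, `is_date` and `get_date_columns` are hypothetical, and a plain dict of column name to values stands in for `self.df`. A column counts as a date column only if every value parses under at least one of the assumed input formats.

```python
from datetime import datetime

# Format list taken from the assumptions stated in this issue.
INPUT_DATE_FORMATS = [
    'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD',
    'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD',
]


def to_strptime(pattern):
    """Translate a pattern such as 'DD/MM/YYYY' into '%d/%m/%Y'."""
    return pattern.replace('YYYY', '%Y').replace('MM', '%m').replace('DD', '%d')


def is_date(value):
    """Return True if the value parses under at least one known format."""
    for pattern in INPUT_DATE_FORMATS:
        try:
            datetime.strptime(str(value), to_strptime(pattern))
            return True
        except ValueError:
            continue
    return False


def get_date_columns(columns):
    """Return the names of columns whose values all look like dates.

    `columns` is a hypothetical stand-in for self.df: name -> list of values.
    """
    return [
        name for name, values in columns.items()
        if values and all(is_date(value) for value in values)
    ]


sample = {
    'joined': ['28/12/2021', '01/02/2020'],
    'city': ['Pune', 'Mumbai'],
    'score': [10, 12],
}
date_columns = get_date_columns(sample)
print(date_columns)
```

With a real data frame, the same check could be run on each column's values converted to strings.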

Input (Method-2)

An expected format the input date should be converted to

Output (Method-2)

None

Method details

Use the data frame from the self.df variable.

Implement a method for the same with appropriate name and parameters in the csv_preprocess.py file.

In the implementation use the method convert_date_format for converting the date into a specific format & the method-1 mentioned above to get a list of columns with date type.
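The splitting step of Method-2 could look roughly like the sketch below. It assumes the dates have already been converted to the static format YYYY-MM-DD; the function names `split_date` and `split_date_column` are hypothetical, and a plain list stands in for a data frame column. The day of the week is returned as an integer following `datetime.weekday()` (Monday is 0), which is one choice among several.

```python
from datetime import datetime


def split_date(value, fmt='%Y-%m-%d'):
    """Split one date string into the four attributes listed in the issue."""
    parsed = datetime.strptime(value, fmt)
    return {
        'day': parsed.day,
        'month': parsed.month,
        'year': parsed.year,
        'day_of_week': parsed.weekday(),  # Monday == 0, Sunday == 6
    }


def split_date_column(values, fmt='%Y-%m-%d'):
    """Turn a list of date strings into four parallel lists, one per attribute."""
    rows = [split_date(value, fmt) for value in values]
    return {
        key: [row[key] for row in rows]
        for key in ('day', 'month', 'year', 'day_of_week')
    }


parts = split_date('2021-12-28')
new_columns = split_date_column(['2021-12-28', '2021-11-13'])
print(parts)
print(new_columns)
```

Each list in `new_columns` could then be attached to the data frame as a separate column.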

Note

The use of standard python libraries is highly recommended.

Join the Slack channel if you wish to contribute to this issue.

Update README file

@ashish-hacker @harshasridhar @ishaanballal21 @rishabh-me
A README is the first file one should read when starting on a new project. It is a set of useful information about the project and a kind of manual. README text files appear in many places and are not limited to programming. I would like to make the README file more meaningful so that the whole project is easier to understand.

Tech Stack Update ( LOGO add )

I would like to add logos for the tech stack used in the project, which will make the project more attractive. I will start working on this issue as soon as it is assigned to me.

Identify missing csv data visualization methods and implement the methods with a test case | Generic Issue - Not to be assigned

Description

  1. Identify the missing methods of CSV data visualization in this repository.
  2. Find suitable cases of data and machine learning problems for which the method should be used.
  3. Implement the method only for those cases by adding smart logic.

Please Note -
This is a generic issue and multiple students can work on the same. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign you the same.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

Error while import

An indentation error is raised on import. The logs are below:

>>> from klar_eda.preprocessing import preprocess_csv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocessing.py", line 1, in <module>
    from .preprocess.csv_preprocess import CSVPreProcess
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocess/csv_preprocess.py", line 71
    if ret == True:
                  ^
IndentationError: unindent does not match any outer indentation level
>>> from klar_eda import visualization
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/visualization.py", line 1, in <module>
    from .visualize.csv_visualize import CSVVisualize
  File "/Users/harsha/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/visualize/csv_visualize.py", line 13, in <module>
    from ..preprocess.csv_preprocess import CSVPreProcess
  File "/Users/harshams/opt/anaconda3/envs/klar/lib/python3.7/site-packages/klar_eda/preprocess/csv_preprocess.py", line 71
    if ret == True:
                  ^
IndentationError: unindent does not match any outer indentation level
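This kind of error can be reproduced and caught before release without importing the package. The sketch below is only an illustration: the source string is a made-up stand-in, not the contents of csv_preprocess.py. It compiles a snippet whose `if` line is dedented to a level that matches no enclosing block and records the resulting message.

```python
# A made-up snippet: the body is indented 8 spaces, the `if` only 4,
# so the dedent does not match any outer indentation level.
source = (
    "def check(ret):\n"
    "        value = 1\n"
    "    if ret == True:\n"
    "        return value\n"
)

try:
    compile(source, "csv_preprocess_demo.py", "exec")
    message = None
except IndentationError as exc:
    message = f"line {exc.lineno}: {exc.msg}"

print(message)
```

Running the same check over every module (for example with the standard `compileall` module) in a build step would surface such errors before the package is published.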

Implement a format identifier method for date in csv data preprocessor

Description

The method should be able to identify and convert the date into a specific static format.
The functionality of the method can be described as below -

  1. Take in any type of date as input (for example - 2021-11-13)
  2. Identify the format (for example - YYYY-MM-DD)
  3. Convert the date into any desired format (for example - DD/MM/YYYY)

Assumptions

The following assumptions can be made during the implementation

  1. No time is present in the given input date.
  2. The input will be only a string.
  3. A list of input patterns can be assumed (for example, you can assume the input will be in one of the known formats listed below).
    input_date_format = [ 'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD', 'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD' ]

Input

A string in any of the formats mentioned above (The contributor is free to add any other formats)

An expected output format the input date should be converted to

Output

A date in string converted into the desired format.

Note

The use of standard python libraries is highly recommended.
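The three steps in the description can be sketched with the standard library as follows. This is one possible approach, not the project's code: `to_strptime` and `identify_format` are hypothetical helper names, and the candidate formats are tried in the order listed in the assumptions. An ambiguous input such as 01/02/2021 matches both DD/MM/YYYY and MM/DD/YYYY, and this sketch simply returns the first match.

```python
from datetime import datetime

# Format list taken from the assumptions stated in this issue.
INPUT_DATE_FORMATS = [
    'DD/MM/YYYY', 'YYYY/DD/MM', 'MM/DD/YYYY', 'YYYY/MM/DD',
    'DD-MM-YYYY', 'YYYY-DD-MM', 'MM-DD-YYYY', 'YYYY-MM-DD',
]


def to_strptime(pattern):
    """Translate a pattern such as 'DD/MM/YYYY' into '%d/%m/%Y'."""
    return pattern.replace('YYYY', '%Y').replace('MM', '%m').replace('DD', '%d')


def identify_format(date_str):
    """Return the first known pattern that parses date_str, or None."""
    for pattern in INPUT_DATE_FORMATS:
        try:
            datetime.strptime(date_str, to_strptime(pattern))
            return pattern
        except ValueError:
            continue
    return None


def convert_date_format(date_str, output_format):
    """Identify the input format, then re-emit the date in output_format."""
    pattern = identify_format(date_str)
    if pattern is None:
        raise ValueError(f"unrecognised date format: {date_str!r}")
    parsed = datetime.strptime(date_str, to_strptime(pattern))
    return parsed.strftime(to_strptime(output_format))


detected = identify_format('2021-11-13')
converted = convert_date_format('2021-11-13', 'DD/MM/YYYY')
print(detected, converted)
```

Checking a whole column against each pattern, rather than a single value, would reduce the ambiguity, since a day value above 12 anywhere in the column rules out the swapped pattern.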

Restructure the image preprocessing module to provide submodules

Is your feature request related to a problem? Please describe.

Image preprocessing is a broad umbrella that encompasses various kinds of techniques, which can be grouped into submodules. Segregating related functionality into submodules is good practice and also gives a better user experience.

Describe the solution you'd like

The module can be divided into submodules such as transformation (pixel brightness, geometric), filtering (spatial, frequency), segmentation (edge-based, region-based), morphology (binary, grayscale) and so on.

Input

Raw image

Output

Preprocessed image

Note

The use of standard python libraries is highly recommended.

GUI for Exploratory Data Analysis

Is your feature request related to a problem? Please describe.

This is a calculator for exploratory data analysis.
A dataset can be imported in CSV or Excel format.
The user can then select rows and columns and compute their statistics, for example mean, median, mode, max, min and variance.
It also helps in making plots such as line, histogram, scatter and regression plots.

Input

CSV or excel file

Output

Numeric or graphical

Note

This is just additional support for the library.

Additional context

I will be working on this feature under GSSoC'22.
With it, we don't have to go back to the code again and again just to find some value.

Add help description for the package

Add a help description to the package that helps the user understand the purpose of the project, its modules, submodules, etc.
Currently no description is provided; the help output is as follows:

Help on package klar_eda:

NAME
    klar_eda

PACKAGE CONTENTS
    preprocessing
    visualization

SUBMODULES
    preprocess
    visualize

FILE
    (built-in)

(END) 
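Python's help output takes its description from the package docstring, so one likely fix is a docstring at the top of the package's `__init__.py`. The sketch below illustrates the effect on a stand-in module object rather than on klar_eda itself; the docstring wording is a suggestion assembled from the README, and the name `klar_eda_demo` is hypothetical.

```python
import pydoc
import types

# Suggested wording only; the real text would live at the top of
# klar_eda/__init__.py as the package docstring.
PACKAGE_DOC = """A python library for automated exploratory data analysis.

Modules:
    visualization  - CSV and image data visualization
    preprocessing  - CSV and image data preprocessing
"""

# Stand-in module so the example runs without klar_eda installed.
demo = types.ModuleType("klar_eda_demo", PACKAGE_DOC)

# render_doc produces the same text that help() would page through.
help_text = pydoc.render_doc(demo, renderer=pydoc.plaintext)
print(help_text)
```

The first docstring line appears next to the package name in the NAME section, and the remaining lines appear under DESCRIPTION.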

Identify missing image data preprocessing methods and implement the methods with a test case | Generic Issue - Not to be assigned

Description

  1. Identify the missing methods of Image data preprocessing in this repository.
  2. Find suitable cases of data and machine learning problems for which the method should be used.
  3. Implement the method only for those cases by adding smart logic.

Please Note -
This is a generic issue and multiple students can work on the same. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign you the same.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

Identify missing csv data preprocessing methods and implement the methods with a test case | Generic Issue - Not to be assigned

Description

  1. Identify the missing methods of CSV data preprocessing in this repository.
  2. Find suitable cases of data and machine learning problems for which the method should be used.
  3. Implement the method only for those cases.

Please Note -
This is a generic issue and multiple students can work on the same. Notify the mentors once you identify a method (as mentioned above). The mentor will create a separate issue and assign you the same.

Contribution guidelines will be updated soon. Please refer to them for guidance before committing any development work.

add contributing.md

Hi! I would like to create a file named contributing.md, which will cover the following:

1. Difference between Git and GitHub
2. How to clone and fork a repository
3. How to create a branch and then use git push to push to the repo
4. How to create a PR
5. How to squash the commits for a single issue into one
6. How to update the fork and the local repo as changes are made upstream

I would like to work on this as a part of GSSOC'21.

Implement different normalization techniques in csv data preprocessor

Description

The implementation can consist of one or multiple methods. Once implemented, the method(s) should support the following:

  • Mean Normalisation of features
  • Standardization of features

Assumptions

For standardization of features, it is assumed that the data follows a Gaussian distribution.

Input

  1. DataFrame

Output

The data processed according to the chosen method (standardization or mean normalisation).
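The two techniques can be sketched on a single feature as shown below. This is a minimal standard-library illustration, not the project's implementation: a plain list stands in for one DataFrame column, the function names are hypothetical, and the population standard deviation is an assumed choice. Mean normalisation computes (x - mean) / (max - min), and standardization computes (x - mean) / std. Both divide by zero on a constant column, which a real implementation would need to handle.

```python
from statistics import mean, pstdev


def mean_normalise(values):
    """Centre on the mean and scale by the range (max - min)."""
    mu = mean(values)
    spread = max(values) - min(values)
    return [(value - mu) / spread for value in values]


def standardise(values):
    """Centre on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(value - mu) / sigma for value in values]


feature = [1.0, 2.0, 3.0, 4.0, 5.0]
normalised = mean_normalise(feature)
standardised = standardise(feature)
print(normalised)
print(standardised)
```

After standardization the feature has mean 0 and standard deviation 1, while mean normalisation keeps values within a range of width 1 around 0.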

Fix pip installation issues

pip3 install -i https://test.pypi.org/simple klar-eda fails with an error because of a missing dependency.
The failures involve sklearn, opencv-python, tensorflow, pandas, sphinx, matplotlib and seaborn.
This severely impacts ease of use and the user experience.
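While the packaging is being fixed, a quick diagnostic can show which of the dependencies named in this issue are absent from the current environment. The sketch below is an assumption-laden helper, not part of klar-eda: `find_missing` is a hypothetical name, and the list uses the import names that usually correspond to the packages mentioned (for example `cv2` for opencv-python).

```python
from importlib.util import find_spec

# Import names assumed for the packages listed in the issue.
REQUIRED_MODULES = [
    'sklearn', 'cv2', 'tensorflow', 'pandas',
    'sphinx', 'matplotlib', 'seaborn',
]


def find_missing(modules):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


missing = find_missing(REQUIRED_MODULES)
print(missing)
```

Any names printed would need to be installed separately until the package declares its dependencies.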

Contribution for GSSOC

Hi, I would love to contribute to this project during GSSOC 21. Could you tell me a bit more about the progress on the project and what sort of features you expect us to implement during the contribution period?
