adv's Introduction

Automatic Data Visualization Tool - Idea

This file contains the ideas and work plans for the ADV tool.

The Idea:

    Automatic Data Visualization (ADV) is a tool that lets users feed any kind of data into a pipeline and get plots of it.

    ADV analyses the contents of a file and decides which plotting methods best suit the data the user provides. The input is analysed from several aspects, such as column types, relations, variances, means, and medians. The tool then generates metadata for the input, whatever its shape. The system uses state-of-the-art Machine Learning techniques to decide what is best while looking only at the metadata. These decisions are the input for the plotter engine, a general-purpose algorithm that renders every supported plot type. Users can then view the generated results within seconds in a UI where they can interact with the results and make final touches before exporting. The last step is the export of the results, prepared so as to preserve the meaning of every aspect of the data.

    In the end, one simple tool will do the whole job that a data scientist performs on a dataset to get meaningful plots. The mission is to help individuals not only by showing what is hidden inside the data, but also by helping them understand it.

Process:

  1. Import API
  2. Input analysing
    1. File format checker
    2. File content checker
      1. File datatypes checking
      2. Language analysis & defining categorical values
      3. Numeric data stats calculation
    3. Metadata (Data dictionary) generating
      1. Metadata export API
  3. Plot models selection (Metadata fed to ML models)
  4. Plotter Engine
  5. Interactive Visualization Engine
  6. Export API
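
As a rough illustration of how the stages listed above could be chained, here is a minimal sketch in which every stage is a function that receives a shared context and returns an updated one. The names `run_pipeline`, `import_data`, `analyse_input` and `select_plots` are hypothetical placeholders, not part of any existing ADV code.

```python
from typing import Any, Callable

# A stage takes the shared context and returns an updated copy of it.
Stage = Callable[[dict[str, Any]], dict[str, Any]]


def run_pipeline(stages: list[tuple[str, Stage]], context: dict[str, Any]) -> dict[str, Any]:
    """Run each named stage in order and record which stages completed."""
    completed = []
    for name, stage in stages:
        context = stage(context)
        completed.append(name)
    return {**context, "completed": completed}


# Placeholder stages (hypothetical): each one only marks where real logic would go.
def import_data(ctx):
    return {**ctx, "rows": [{"Id": "1", "Size": "L"}]}


def analyse_input(ctx):
    return {**ctx, "metadata": {"columns": list(ctx["rows"][0])}}


def select_plots(ctx):
    return {**ctx, "plots": {col: None for col in ctx["metadata"]["columns"]}}


STAGES = [
    ("import", import_data),
    ("input_analysis", analyse_input),
    ("plot_selection", select_plots),
]

result = run_pipeline(STAGES, {})
print(result["completed"])
```

Keeping each stage behind the same signature means a stage can later be moved into its own service without changing the others.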

Design Notes on Steps:

  1. Import API

    Basic import API. Receives the data through a UI. In production, stores files in S3 or another secure data store.

  2. Input analysing

    1. File format checker

      Validates the file format. Start with .csv and expand later. The variety of formats must not affect the tool's working model, since they all end up as the same metadata. This makes the metadata generator the adaptor between everything on the user side and the rest of the pipeline.
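
A minimal sketch of the file format check described above, assuming the check combines an extension allowlist with a quick delimiter sniff of the first lines. The function name `check_file_format`, the constant `SUPPORTED_EXTENSIONS` and the candidate delimiter set are assumptions made for illustration.

```python
import csv
from pathlib import Path

SUPPORTED_EXTENSIONS = {".csv"}  # assumption: CSV first, more formats later


def check_file_format(filename: str, sample: str) -> bool:
    """Return True when the extension is supported and the sample looks like CSV."""
    if Path(filename).suffix.lower() not in SUPPORTED_EXTENSIONS:
        return False
    try:
        # Sniffer raises csv.Error when it cannot find a consistent delimiter.
        csv.Sniffer().sniff(sample, delimiters=",;\t|")
    except csv.Error:
        return False
    return True


sample_text = "Id,Size,Price\n1,L,5.99\n2,XL,6.99\n"
print(check_file_format("products.csv", sample_text))
print(check_file_format("products.xlsx", sample_text))
```

Supporting a new format would then mean adding its extension and a matching parser, while everything after the metadata step stays unchanged.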

    2. File content checker

      1. File datatypes checking

        Checks for possible problems and errors in the data. The output is a go or no-go for the entire process.
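
One possible shape for this go/no-go check, sketched under the assumption that cells arrive as raw text and that missing values or columns mixing text with numbers count as problems. The helper names `infer_cell_type` and `check_datatypes` and the two problem rules are illustrative choices, not a fixed design.

```python
def infer_cell_type(cell: str) -> str:
    """Classify one raw text cell as 'missing', 'int', 'float' or 'str'."""
    text = cell.strip()
    if not text:
        return "missing"
    for caster, label in ((int, "int"), (float, "float")):
        try:
            caster(text)
            return label
        except ValueError:
            continue
    return "str"


def check_datatypes(columns: dict[str, list[str]]) -> tuple[bool, dict[str, str]]:
    """Return (go, problems); any column with a problem makes the result no-go."""
    problems = {}
    for name, cells in columns.items():
        kinds = {infer_cell_type(cell) for cell in cells}
        if "missing" in kinds:
            problems[name] = "missing values"
        elif "str" in kinds and kinds & {"int", "float"}:
            problems[name] = "mixed text and numbers"
    return (not problems, problems)


raw = {"Price": ["5.99", "6.99", "n/a"], "Weight": ["200", "300", "150"]}
go, problems = check_datatypes(raw)
print(go, problems)
```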

      2. Language analysis & defining categorical values

        NLP tools decide whether the data is hierarchical, or check its context. #Cont. improvement.
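
Until a language model is in place, the hierarchy check could be approximated with a plain vocabulary lookup. The sketch below is a deliberately simple stand-in for the NLP step, not an NLP model: `ORDINAL_SCALES` and both function names are hypothetical, and the vocabularies would need to be maintained per language (EN, TR).

```python
from typing import Optional

# Hypothetical vocabularies; a real system would learn or look these up per language.
ORDINAL_SCALES = {
    "clothing_size": ["XS", "S", "M", "L", "XL", "XXL"],
    "rating": ["LOW", "MEDIUM", "HIGH"],
}


def detect_ordinal_scale(values: list[str]) -> Optional[str]:
    """Return the name of the first scale that contains every distinct value."""
    distinct = {value.strip().upper() for value in values}
    for scale_name, levels in ORDINAL_SCALES.items():
        if distinct and distinct <= set(levels):
            return scale_name
    return None


def ordinal_codes(values: list[str], scale_name: str) -> list[int]:
    """Map each value to its rank within the named scale."""
    levels = ORDINAL_SCALES[scale_name]
    return [levels.index(value.strip().upper()) for value in values]


sizes = ["L", "XL", "S", "S"]
scale = detect_ordinal_scale(sizes)
print(scale, ordinal_codes(sizes, scale))
print(detect_ordinal_scale(["Red", "Red", "Red", "Blue"]))
```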

      3. Numeric data stats calculation

        Basic data science formulas and calculations are applied to numerical columns to measure the data and identify its patterns.
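
A small sketch of these per-column calculations using only the standard library. The choice of population variance (`statistics.pvariance`) is an assumption; it is the variant that yields 1.6875 for the prices 5.99, 6.99, 3.99, 3.99. The function name `numeric_stats` is hypothetical.

```python
import statistics


def numeric_stats(values: list[float]) -> dict[str, float]:
    """Summarise one numeric column; variance is the population variance."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "variance": statistics.pvariance(values),
        "min": min(values),
        "max": max(values),
    }


price_stats = numeric_stats([5.99, 6.99, 3.99, 3.99])
weight_stats = numeric_stats([200, 300, 150, 150])
print(round(price_stats["mean"], 4), round(price_stats["variance"], 4))
print(weight_stats["median"], weight_stats["variance"])
```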

    3. Metadata (Data dictionary) generating

      The most important step of the pipeline. Converts the input data into a single metadata table that is fed to the ML models.
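
To make the conversion concrete, the sketch below turns a column-oriented table into one metadata record per column. The heuristics are assumptions made for illustration only: non-numeric columns are treated as categorical, a column whose name contains "year", "date" or "time" is flagged as time data, and a column named "id" is treated as an identifier with no statistics. `build_metadata` is a hypothetical name.

```python
import statistics
from typing import Any


def build_metadata(table: dict[str, list[Any]],
                   time_hints: tuple[str, ...] = ("year", "date", "time")) -> list[dict[str, Any]]:
    """Describe every column of a column-oriented table with one metadata record."""
    records = []
    for name, values in table.items():
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        # Assumption: a column literally named "id" is an identifier, so stats are skipped.
        wants_stats = is_numeric and name.lower() != "id"
        records.append({
            "Row_Name": name,
            "Data_Type": type(values[0]).__name__,
            "is_categorical": int(not is_numeric),
            "is_time": int(any(hint in name.lower() for hint in time_hints)),
            "mean": statistics.mean(values) if wants_stats else None,
            "variance": statistics.pvariance(values) if wants_stats else None,
        })
    return records


example_table = {
    "Id": [1, 2, 3, 4],
    "Size": ["L", "XL", "S", "S"],
    "Price": [5.99, 6.99, 3.99, 3.99],
    "Color": ["Red", "Red", "Red", "Blue"],
    "Weight": [200, 300, 150, 150],
    "Produce_Year": [2019, 2018, 2019, 2017],
}
metadata = build_metadata(example_table)
for record in metadata:
    print(record)
```

Because every input format is reduced to the same record layout, the later stages never need to know where the data came from.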

      1. Metadata Export API (Export API can be used)

        Extra feature: export the metadata of the input data to the user.

  3. Plot models selection (Metadata fed to ML models)

    ML models that take the metadata and return the best plot options (with custom parameters) for each column of the input data. #Cont. improvement.
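
Before any model is trained, the interface of this step can be exercised with hand-written rules. The sketch below is such a rule-based placeholder, not an ML model; the rules, the `bins` parameter and the function name `choose_plot` are all assumptions, and a trained model would replace the body while keeping the same input and output shape.

```python
from typing import Any, Optional


def choose_plot(record: dict[str, Any]) -> tuple[Optional[str], dict[str, Any]]:
    """Pick a plot type and parameters for one metadata record using fixed rules."""
    if record["is_time"]:
        return "Time_series_plot", {}
    if record["is_categorical"]:
        return "Piechart", {}
    if record["mean"] is None:
        # No statistics were computed (for example an identifier column): skip it.
        return None, {}
    # Hypothetical parameter: more bins for columns with a larger spread.
    bins = 10 if record["variance"] > 100 else 5
    return "Histogram", {"bins": bins}


metadata = [
    {"Row_Name": "Id", "is_categorical": 0, "is_time": 0, "mean": None, "variance": None},
    {"Row_Name": "Size", "is_categorical": 1, "is_time": 0, "mean": None, "variance": None},
    {"Row_Name": "Price", "is_categorical": 0, "is_time": 0, "mean": 5.24, "variance": 1.6875},
    {"Row_Name": "Weight", "is_categorical": 0, "is_time": 0, "mean": 200, "variance": 3750.0},
    {"Row_Name": "Produce_Year", "is_categorical": 0, "is_time": 1, "mean": 2018.25, "variance": 0.6875},
]
choices = {record["Row_Name"]: choose_plot(record) for record in metadata}
for column, choice in choices.items():
    print(column, choice)
```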

  4. Plotter Engine

    Plots the real data in the order, and with the information, provided by the ML models.
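
One way to structure such an engine is a registry that maps each plot type name to a rendering function. In the sketch below the renderers produce plain text so the example stays dependency-free; a real engine would call a plotting library at those points. The registry `RENDERERS`, the decorator `renderer` and the function `plot_column` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Optional

# Registry that maps a plot type name to the function that renders it.
RENDERERS: dict[str, Callable[[list], str]] = {}


def renderer(plot_type: str):
    """Decorator that registers a rendering function under a plot type name."""
    def register(func):
        RENDERERS[plot_type] = func
        return func
    return register


@renderer("Histogram")
def render_histogram(values: list) -> str:
    # Text stand-in: one bar of '#' per distinct value, in sorted order.
    counts = Counter(values)
    return "\n".join(f"{value}: {'#' * counts[value]}" for value in sorted(counts))


@renderer("Piechart")
def render_piechart(values: list) -> str:
    # Text stand-in: share of each value as a percentage, largest share first.
    counts = Counter(values)
    return "\n".join(
        f"{value}: {100 * n / len(values):.0f}%" for value, n in counts.most_common()
    )


def plot_column(plot_type: Optional[str], values: list) -> Optional[str]:
    """Render one column, or return None when no plot was selected for it."""
    if plot_type is None:
        return None
    return RENDERERS[plot_type](values)


print(plot_column("Histogram", [200, 300, 150, 150]))
print(plot_column("Piechart", ["Red", "Red", "Red", "Blue"]))
```

New plot types can then be added by registering another function, without touching the dispatch code.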

  5. Interactive Visualization Engine & Display

    A UI for the user-facing side of the tool. Users can view, edit, and interact with the data and export the final form.

  6. Export API

    Exports the final version of the visuals using customized templates.
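
As one possible reading of "customized templates", the sketch below fills an HTML template with one section per exported visual. The template text, `REPORT_TEMPLATE` and `export_report` are hypothetical, and the visuals are passed as plain strings purely to keep the example self-contained.

```python
import html
from string import Template

# Hypothetical template; real templates would be supplied per export format.
REPORT_TEMPLATE = Template(
    "<html><head><title>$title</title></head>\n"
    "<body><h1>$title</h1>\n$sections\n</body></html>"
)


def export_report(title: str, visuals: dict[str, str]) -> str:
    """Fill the template with one escaped section per exported visual."""
    sections = "\n".join(
        f"<section><h2>{html.escape(name)}</h2><pre>{html.escape(body)}</pre></section>"
        for name, body in visuals.items()
    )
    return REPORT_TEMPLATE.substitute(title=html.escape(title), sections=sections)


report = export_report("ADV export", {"Weight": "150: ##\n200: #\n300: #"})
print(report)
```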

TO-DOs:

List of unordered tasks that need to be done.

  • Import API
  • Input Analysis Algorithm
    • Input Analysis Algorithm - Research & Design
    • Input Analysis Algorithm - File Format Check (Done at Import API)
    • Input Analysis Algorithm - File Datatype Checker
    • Input Analysis Algorithm - Language Analysis Model Build for EN
    • Input Analysis Algorithm - Language Analysis Model Build for TR
    • Input Analysis Algorithm - Language Analysis Model Integration to Pipeline
    • Input Analysis Algorithm - Language Analysis - Process String Input Data
    • Input Analysis Algorithm - Numeric Input Analysis & Calculation
  • Metadata Generator
  • Export API
  • Plot Deciding Model Development
    • Plot Deciding Model Development - Research & Design
    • Plot Deciding Model Development - Data Finding
    • Plot Deciding Model Development - Training
    • Plot Deciding Model Development - Integration
  • Plotter Engine
    • Plotter Engine - Research & Design
    • Plotter Engine - Build
    • Plotter Engine - Integration
  • UI Development
    • UI Development - Design
    • UI Development - Input Page Development
    • UI Development - Output Page Development
  • Export Wrapper Engine
  • Dockerize
    • Dockerize - Convert to Microservice Architecture
    • Dockerize - Dockerfile Configuration for Each Microservice
    • Dockerize - Docker-Compose Configuration
  • Log Module Implementation
  • Git Repo Design & Structure
  • S3 Bucket Configuration
  • Cloud DevOps - Research & Design
  • Proof of Concept

Example Data

import pandas as pd

# Raw input table as the user would upload it: one key per column.
example_input_data = {"Id": [1, 2, 3, 4],
                      "Size": ["L", "XL", "S", "S"],
                      "Price": [5.99, 6.99, 3.99, 3.99],
                      "Color": ["Red", "Red", "Red", "Blue"],
                      "Weight": [200, 300, 150, 150],
                      "Produce_Year": [2019, 2018, 2019, 2017]}

# Metadata (data dictionary): one row per input column.
# Variance is the population variance; identifier and text columns have no stats.
example_metadata = {"Row_Name": ["Id", "Size", "Price", "Color", "Weight", "Produce_Year"],
                    "Data_Type": ["int", "str", "float", "str", "int", "int"],
                    "is_categorical": [0, 1, 0, 1, 0, 0],
                    "is_time": [0, 0, 0, 0, 0, 1],
                    "mean": [None, None, 5.24, None, 200, 2018.25],
                    "variance": [None, None, 1.6875, None, 3750.0, 0.6875]}

# Output of the plot selection models: plot type and parameters per column.
ml_output = {"Row_Name": ["Id", "Size", "Price", "Color", "Weight", "Produce_Year"],
             "Plot_Type": [None, "Histogram", "Histogram", "Piechart", "Histogram", "Time_series_plot"],
             "Custom_Parameters": [None, "p=2", "p=3, k=9", "colors=['r','b','g','o']", None, "zoom_level=3"]}

df1 = pd.DataFrame(data=example_input_data)
df2 = pd.DataFrame(data=example_metadata)
df3 = pd.DataFrame(data=ml_output)

print("Input Data:")
print(df1)

Input Data:
   Id Size  Price Color  Weight  Produce_Year
0   1    L   5.99   Red     200          2019
1   2   XL   6.99   Red     300          2018
2   3    S   3.99   Red     150          2019
3   4    S   3.99  Blue     150          2017

print("Metadata:")
print(df2)

Metadata:
       Row_Name Data_Type  is_categorical  is_time     mean   variance
0            Id       int               0        0      NaN        NaN
1          Size       str               1        0      NaN        NaN
2         Price     float               0        0     5.24     1.6875
3         Color       str               1        0      NaN        NaN
4        Weight       int               0        0   200.00  3750.0000
5  Produce_Year       int               0        1  2018.25     0.6875

print("ML Output:")
print(df3)

ML Output:
       Row_Name         Plot_Type         Custom_Parameters
0            Id              None                      None
1          Size         Histogram                       p=2
2         Price         Histogram                  p=3, k=9
3         Color          Piechart  colors=['r','b','g','o']
4        Weight         Histogram                      None
5  Produce_Year  Time_series_plot              zoom_level=3
