DataPrepKit

Project Description: DataPrepKit is a capstone project in which students develop a Python package named "DataPrepKit", a toolkit for preprocessing datasets. Using NumPy and Pandas, students write functions that read data from several file formats, summarize datasets, handle missing values, and encode categorical data. The end goal is to publish the package on PyPI so that it is available to the wider Python community.

Key Features

  1. Data Reading:

    • Objective: Implement functions that can read data from different file formats such as CSV, Excel, and JSON.
    • Tools: Use Pandas for efficient data importing.
  2. Data Summary:

    • Objective: Develop functions to print key statistical summaries of the data, including metrics like the average and most frequent values.
    • Tools: Utilize NumPy and Pandas to generate these summaries.
  3. Handling Missing Values:

    • Objective: Create functions for addressing missing values, offering solutions to either remove or impute them based on set strategies.
    • Tools: Employ methods that ensure data integrity.
  4. Categorical Data Encoding:

    • Objective: Design functions for encoding categorical data, allowing their conversion into numerical formats for analysis.
    • Tools: Implement encoding techniques effectively.
  5. Package Deployment:

    • Objective: Successfully publish the DataPrepKit package on PyPI to make it readily accessible for downloading and utilization.
    • Tools: Adhere to PyPI guidelines for package deployment.
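
As an illustration of features 1 and 2, here is a minimal sketch of what a reader and a summary function could look like. The names `read_data`, `summarize`, and `_READERS` are hypothetical and not taken from the package itself; only the Pandas calls (`read_csv`, `read_excel`, `read_json`, `Series.mode`) are real library functions.

```python
from pathlib import Path

import pandas as pd

# Hypothetical mapping from file extension to a pandas reader function.
_READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,  # requires an Excel engine such as openpyxl
    ".xls": pd.read_excel,
    ".json": pd.read_json,
}


def read_data(path, **kwargs):
    """Read a CSV, Excel, or JSON file into a DataFrame, chosen by extension.

    Extra keyword arguments are passed through to the pandas reader.
    """
    suffix = Path(path).suffix.lower()
    try:
        reader = _READERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file format: {suffix!r}") from None
    return reader(path, **kwargs)


def summarize(df):
    """Return one row per column: mean (numeric only), most frequent value,
    and number of missing entries."""
    rows = {}
    for name, col in df.items():
        modes = col.mode(dropna=True)
        rows[name] = {
            "mean": col.mean() if pd.api.types.is_numeric_dtype(col) else None,
            "most_frequent": modes.iloc[0] if not modes.empty else None,
            "missing": int(col.isna().sum()),
        }
    return pd.DataFrame.from_dict(rows, orient="index")
```

Dispatching on the file extension keeps a single entry point for all formats; a real implementation might also accept an explicit format argument for files with unusual names.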

Project Requirements:

  • Proficient use of NumPy and Pandas for data analysis and manipulation.
  • Robust function implementation for data reading, summary generation, missing value handling, and categorical data encoding.
  • Successful registration and deployment of the package on PyPI.
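
The missing value handling and categorical encoding requirements could be approached along the following lines. This is only a sketch under stated assumptions: the function names `handle_missing` and `encode_categorical`, and the strategy and method names, are hypothetical and not the package's actual API.

```python
import pandas as pd


def handle_missing(df, strategy="drop"):
    """Return a new DataFrame with missing values removed or imputed.

    strategy (hypothetical names): "drop" removes rows containing any
    missing value; "mean" and "median" fill numeric columns with that
    statistic; "mode" fills with the most frequent value. Non-numeric
    columns always fall back to the most frequent value.
    """
    if strategy not in ("drop", "mean", "median", "mode"):
        raise ValueError(f"Unknown strategy: {strategy!r}")
    if strategy == "drop":
        return df.dropna()
    out = df.copy()
    for name in out.columns:
        col = out[name]
        if not col.isna().any():
            continue
        if strategy != "mode" and pd.api.types.is_numeric_dtype(col):
            fill = col.mean() if strategy == "mean" else col.median()
        else:
            modes = col.mode(dropna=True)
            if modes.empty:  # column is entirely missing; nothing to impute
                continue
            fill = modes.iloc[0]
        out[name] = col.fillna(fill)
    return out


def encode_categorical(df, columns=None, method="onehot"):
    """Encode categorical columns as numbers.

    method (hypothetical names): "onehot" uses pandas.get_dummies;
    "label" replaces each value with an integer code (missing values
    become -1). By default every non-numeric column is encoded.
    """
    if columns is None:
        columns = [c for c in df.columns
                   if not pd.api.types.is_numeric_dtype(df[c])]
    if method == "onehot":
        return pd.get_dummies(df, columns=columns)
    if method == "label":
        out = df.copy()
        for name in columns:
            out[name] = out[name].astype("category").cat.codes
        return out
    raise ValueError(f"Unknown method: {method!r}")
```

Both functions return a new DataFrame rather than modifying the input, so a caller can compare strategies on the same data.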

Evaluation Criteria:

  • Functionality and correctness of the data preprocessing features implemented.
  • Quality and completeness of the documentation provided.
  • Effectiveness of the test suite in ensuring the package's reliability.
  • Successful deployment of the package on PyPI.
  • Adherence to best practices in coding, packaging, and testing.
  • Creativity and efficiency in managing different file formats and data preprocessing challenges.

Datasets Source

Download from Kaggle:

  1. Project https://www.kaggle.com/datasets/divu2001/coffee-shop-sales-analysis
  2. netflix_titles https://www.kaggle.com/datasets/arnavvvvv/netflix-movies-and-tv-shows

Download using the Kaggle API:

  1. Project kaggle datasets download divu2001/coffee-shop-sales-analysis
  2. netflix_titles kaggle datasets download arnavvvvv/netflix-movies-and-tv-shows
