
data-wrangling-with-python's Introduction


Data Wrangling with Python by Packt

Data is the new oil, and it is shaping modern life through smart tools and transformative technologies. But oil does not come out of the rig in its final form; it has to be refined through a complex processing network. Similarly, data needs to be curated, massaged, and refined before it can be used in intelligent algorithms and consumer products. This process is called wrangling, and (according to Forbes) good data scientists spend roughly 60-80% of their time on it, in every project. It involves scraping raw data from multiple sources (including the web and database tables), then imputing, formatting, and transforming it so that it is ready to be used in the modeling process.

This course aims to teach you the core ideas behind this process and to equip you with knowledge of the most popular tools and techniques in the domain. As the programming framework, we have chosen Python, one of the most widely used languages for data science. We work through real-life examples, not toy datasets. By the end of this course, you will be confident handling a wide array of sources to extract, clean, transform, and format your data for the great machine learning app you are thinking of building. Hop on and be a part of this exciting journey.

What you will learn

  • Manipulate simple and complex data structures using Python and its built-in functions
  • Use fundamental and advanced features of Pandas DataFrames and NumPy arrays, and manipulate them at run time
  • Extract and format data from various textual formats: plain text files, SQL, CSV, Excel, JSON, and XML
  • Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
  • Perform advanced string search and manipulation using Python and regular expressions (regex)
  • Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas
  • Apply basic descriptive statistics and plotting techniques in Python for quick examination of data
  • Practice data wrangling and modeling using random data generation techniques (bonus topic)
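
To give a flavor of the topics above, here is a minimal sketch of a wrangling step: parsing CSV text, cleaning strings with a regular expression, and imputing a missing value. It is not taken from the course material. The sample data and the helper names (`RAW_CSV`, `clean_phone`, `wrangle`) are hypothetical, and it uses only the Python standard library rather than Pandas so that it runs without any extra installation.

```python
import csv
import io
import json
import re
import statistics

# Hypothetical raw data: one missing price and inconsistently formatted phones.
RAW_CSV = """name,phone,price
Alice,(555) 123-4567,10.5
Bob,555.987.6543,
Carol,555 222 3333,12.0
"""


def clean_phone(text):
    """Keep only the digits of a phone number, using a regular expression."""
    return re.sub(r"\D", "", text)


def wrangle(raw_csv):
    """Parse CSV text, normalize phones, and impute missing prices with the median."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    # Median of the prices that are present; used to fill the gaps.
    known_prices = [float(r["price"]) for r in rows if r["price"]]
    median_price = statistics.median(known_prices)
    cleaned = []
    for r in rows:
        cleaned.append({
            "name": r["name"].strip(),
            "phone": clean_phone(r["phone"]),
            "price": float(r["price"]) if r["price"] else median_price,
        })
    return cleaned


records = wrangle(RAW_CSV)
print(json.dumps(records, indent=2))
```

Median imputation is only one possible choice here; the same structure works with a mean, a constant, or a value looked up from another source.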

Hardware requirements

For an optimal student experience, we recommend the following hardware configuration:

  • OS: Windows 7 SP1 64-bit, Windows 8.1 64-bit or Windows 10 64-bit, Ubuntu Linux, or the latest version of macOS
  • Processor: Intel Core i5 or equivalent
  • Memory: 8GB RAM or more
  • Hard disk: 40GB or more
  • Stable Internet connection

Software requirements

You'll also need the following software installed in advance:

  • Browser: latest version of Google Chrome or Mozilla Firefox
  • Python 3.4+ (preferably Python 3.6)
  • Python libraries as needed (Jupyter, NumPy, Pandas, Matplotlib, BeautifulSoup4, and so on)
  • Notepad++ or Sublime Text (latest version), Atom IDE (latest version), or a similar text editor

The following Python libraries are needed:

  • NumPy
  • Pandas
  • SciPy
  • scikit-learn
  • Matplotlib
  • BeautifulSoup4
