Data Processing in Python (JEM207)

The course site for the Data Processing in Python course from IES. See the course information on SIS. The course is taught by Martin Hronec and Vítek Macháček.

Course description

The aim of the course is to provide hands-on experience with data-manipulation techniques in Python. Special emphasis is put on widely used libraries such as pandas, NumPy and Matplotlib, and on collecting web data with requests and BeautifulSoup. Students will also be guided through modern social-coding and open-source technologies such as GitHub, Jupyter and Open Data.

Students will gain experience by working with data from the IES website and subject evaluation protocols.

The course makes use of DataCamp online courses to provide students with reliable yet simple resources for learning Python programming.

Learning outcomes

After passing the course, the students will be able to download the data from APIs or directly from the web, pre-process it, analyze it and visualize it.
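As a rough illustration of that workflow, the sketch below parses a JSON payload, drops incomplete records and computes a per-course average. The payload, the field names and the helpers `clean_records` and `mean_score_by_course` are invented for illustration. It uses only the Python standard library so that it runs anywhere; in the course, the same steps would more typically be done with requests and pandas, and the plotting step is omitted here.

```python
import json
from statistics import mean

# Hypothetical payload standing in for the body of an API response.
RAW = (
    '[{"course": "JEM207", "score": "4.5"},'
    ' {"course": "JEB110", "score": null},'
    ' {"course": "JEM207", "score": "3.5"}]'
)


def clean_records(raw_json):
    """Parse JSON text and keep only records with a usable numeric score."""
    cleaned = []
    for record in json.loads(raw_json):
        if record.get("score") is None:
            continue  # pre-processing: skip records with a missing score
        cleaned.append(
            {"course": record["course"], "score": float(record["score"])}
        )
    return cleaned


def mean_score_by_course(records):
    """Group scores by course code and average each group."""
    grouped = {}
    for record in records:
        grouped.setdefault(record["course"], []).append(record["score"])
    return {course: mean(scores) for course, scores in grouped.items()}


records = clean_records(RAW)
summary = mean_score_by_course(records)
print(summary)
```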

Prerequisites

Econometrics II (JEB110) is an explicit prerequisite for bachelor students.

The course is designed for students who have at least some basic coding experience. It does not need to be very advanced, but they should be familiar with concepts such as for loops, if and else statements, variables and functions.

No knowledge of Python is required for entering the course.

Materials

Git

Pro Git book, Atlassian Git tutorials, GitHub resources for learning Git

Python

Resources from the official Python webpage

Documentation

Python, pandas, NumPy, requests, BeautifulSoup and Matplotlib.

Recommended DataCamp Courses

Tools

Introduction to Git for Data Science

General Python

Introduction to Python

Intermediate Python for Data Science

pandas

pandas Foundations

Manipulating DataFrames with pandas

Merging DataFrames with pandas

Cleaning Data in Python

Web Data Formats

Importing Data in Python (Part 1)

Importing Data in Python (Part 2)

Web Scraping with Python

Data Visualizations

Introduction to Data Visualization

Interactive Data Visualization in Bokeh

SQL

Introduction to SQL for Data Science

Introduction to Databases in Python

Others

LearnPython

Learn Python on Codecademy

pandas Cookbook

Practical Introduction to Web Scraping in Python

Credits

Passing the course is rewarded with 5 ECTS credits.

Course requirements

The requirements for passing the course are the DataCamp assignments (6 x 5 pts) and the final project (70 pts).

DataCamp Assignments (30%)

Assignment 1 - Submission on 27/2 (Introduction to Python Course)

  1. Python Lists
  2. Python Basics
  3. Functions and Packages

Assignment 2 - Submission on 6/3 (Manipulating DataFrames with pandas)

  1. Numpy
  2. Extracting and Transforming Data
  3. Advanced Indexing

Assignment 3 - Submission on 13/3 (Object-Oriented Programming in Python)

  1. Getting ready for object-oriented programming
  2. Deep dive into classes and objects
  3. Fancy classes, fancy objects

Assignment 4 - Submission on 20/3 (Web Scraping in Python Course)

  1. Introduction to HTML
  2. XPaths and Selectors
  3. CSS Locators, Chaining, and Responses

Assignment 5 - Submission on 27/3 (Importing Data in Python (Part 2) Course)

  1. Importing data from the Internet
  2. Interacting with APIs to import data from the web
  3. Diving deep into the Twitter API

Assignment 6 - Submission on 4/4 (Merging DataFrames with pandas Course)

  1. Concatenating and merging data
  2. Rearranging and reshaping data
  3. Grouping data

Final project (70%)

Description:

  • Students work in teams of two.
  • The task is to download data from an API or directly from the web, then process and visualize it in a Jupyter notebook. The project is submitted as a GitHub repository.
  • The selection of the data is entirely up to the students.

Deadlines:

March 27th: Project Topic First Submission

April 10th: Project Topic Final Submission

May 31st: Project Submission (to be confirmed)

Evaluation Criteria:

  1. The project uses correctly downloaded data from a public API or website.
  2. The data are cleaned appropriately.
  3. The data are visualized.
  4. The project is submitted as a public GitHub repository.
  5. All team members collaborated on the GitHub repository (note that the commit history is visible).
  6. The code is readable, commented and appropriately structured.
  7. The project includes one ready-to-run method for downloading the data.
  8. The project is submitted as a Jupyter notebook.
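One way to read the "ready-to-run method for downloading the data" criterion is a single function that takes a URL and returns parsed data. The sketch below is only a possible shape, not a required template: the name `download_data`, the `opener` parameter and the example.org URL are assumptions made for illustration. It uses `urllib.request` from the standard library; a project could equally use requests.

```python
import io
import json
from urllib.request import urlopen


def download_data(url, opener=urlopen, timeout=30):
    """Fetch `url` and return the decoded JSON payload.

    `opener` defaults to urllib's urlopen. It is a parameter only so that
    the function can be exercised without network access (an assumption of
    this sketch, not a course requirement).
    """
    with opener(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Offline demonstration: a fake opener returns a fixed payload.
def fake_opener(url, timeout=30):
    return io.BytesIO(b'[{"id": 1}, {"id": 2}]')


# The URL is a placeholder and is never contacted by the fake opener.
rows = download_data("https://example.org/data.json", opener=fake_opener)
print(len(rows))
```

Keeping the download in one function means a reader of the repository can reproduce the dataset with a single call before running the rest of the notebook.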

Grading scale

  • A: above 90 (not inclusive)
  • B: between 80 (not inclusive) and 90 (inclusive)
  • C: between 70 (not inclusive) and 80 (inclusive)
  • D: between 60 (not inclusive) and 70 (inclusive)
  • E: between 50 (not inclusive) and 60 (inclusive)
  • F: below 50 (inclusive)
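The scale above, combined with the 6 x 5 assignment points and the 70 project points, can be written as a small function. This is a sketch for checking boundary cases; the names `total_points` and `letter_grade` are made up for illustration.

```python
def total_points(assignment_points, project_points):
    """Sum the DataCamp assignment points (up to 6 x 5) and project points (up to 70)."""
    return sum(assignment_points) + project_points


def letter_grade(points):
    """Map a total score to a letter; each lower bound is exclusive."""
    if points > 90:
        return "A"
    if points > 80:
        return "B"
    if points > 70:
        return "C"
    if points > 60:
        return "D"
    if points > 50:
        return "E"
    return "F"


# Example: full marks on five assignments, three points on one, 58 on the project.
score = total_points([5, 5, 5, 5, 5, 3], 58)
print(score, letter_grade(score))
```

Note that exactly 90 points gives a B and exactly 50 points gives an F, because the lower bound of each band is not inclusive.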

Our materials

Jupyter and GitHub intro

The Jupyter notebook with IES web parser

Course syllabus

Date | Topic | Who | Project | HW
20-21/2 | Intro + GitHub, Jupyter, DataCamp | Martin | |
27-28/2 | Strings, Floats, Lists, Dictionaries, Functions | Vítek | | HW 1
6-7/3 | Pandas, Matplotlib, NumPy | Martin | | HW 2
13-14/3 | Object-Oriented Programming | Martin | | HW 3
20-21/3 | HTML, XML, JSON, requests, APIs, BeautifulSoup | Vítek | | HW 4
27-28/3 | IES Web Scraper | Vítek | Project Topic Proposal |
3-4/4 | Introduction to SQL | Vítek | | HW 5
10-11/4 | Advanced Pandas | Martin | Project Topic Approval | HW 6
17-18/4 | Project Work 1 | Vítek | |
24-25/4 | Efficient Computing | Martin | |
1-2/5 | Project Work 2 | | |
8-9/5 | Guest Lecture | Guest | |
