Giter VIP home page Giter VIP logo

hw2's Introduction

Homework 2

  • Assigned: 2/18
  • Due: 3/1 at 8:40am
  • worth 3.75% of your grade

We want to emphasize that we have picked this dataset for educational purposes and are not encouraging you to drink.

Solutions

1. Analyze some data

(2 points each, 10 points total)

You will be using the same subset of the Iowa Liquor Dataset from hw0 and writing some Python functions to understand transaction and inventory information.

You will write your code in hw2.py found in this same directory. This file contains predefined methods such as def q1() with dummy method bodies that you will fill in to answer the corresponding questions below. We have also included a load_data() function so you can pass in the file path to the dataset and it will read the file and return a python list. Try running it and printing the first couple of list elements to see what the method returns.

Run the program with:

    python hw2.py (path to iowa-liquor-sample.csv)

Questions To Answer

NOTE: In all cases, we expect string comparisons to be case sensitive and match exactly. That means the SQL strings 'hello world', 'Hello World', and 'hello world!' are all different.

  • Q1: How many distinct types of items (by description attribute) are in this dataset?

    • q1() should return a single number.
  • Q2: How many distinct vendors are in this dataset?

    • q2() should return a single number.
  • Q3: Which store had the most sales (in terms of total bottle_qty)?

    • q3() should return a single number of the store's id.
  • Q4: At the store with the most sales (answer to Q3), what was the most sold item by description (the item that had the highest total bottle_qty)?

    • q4() should return a single string that corresponds to the description attribute's value.
  • Q5: Find the zipcode, with the greatest total bottle_qty for category_name TEQUILA.

    • q5() should return a single number that corresponds to the zipcode attribute’s value.

2. Relational Algebra

(2 points each, 4 points total)

    liquors(<u>lid</u>, name, price, manufacturer)
    sales(<u>month</u>, <u>seller</u>, <u>liquors</u>, county, quantity)

Given the simplified schema above (primary keys are surrounded by <u> </u> tags), construct relational algebra for the following queries. Note: sales(liquors) references liquors(lid), and month is a text field (due to ambiguity in a previous version, we will accept any unambiguous type).

  • Q1: Find the names of liquors that had at least one sale in both "Polk" and "Linn" counties for the month of December.

  • Q2: Find the names of manufacturers that sold at least two different liquors during any January in "Polk" county.

3. More Relational Algebra

(1 point each, 6 points total)

Given the following tables

T1

A B C
1 x a
2 y b
3 z b

T2

A B D
1 x c
3 y a
3 x a

Express the results for the relational algebra expressions:

  1. πB,D(T2)
  2. T2 × πA(T2)
  3. T1 ⨝T1.C=T2.D T2
  4. T1 − (T1 − T2)
  5. (removed please ignore)
  6. T1 ⨝ (σD=x(T2))

Submission

Submit through courseworks

  1. Submit hw2.py for part 1
  2. Submit a text file/pdf for part 2 and part 3

Include your name and UNI at the top of each page.

Hints

  • What does the data argument to the question functions contain? In q1(), try adding print data[0]; print data[1]
  • Test your code on a small sample of data, so you can verify that it works before running it on the entire file.
  • Importing the csv file into Google Sheets or Excel might help visually explore the data.
  • Helper functions can make your code easier to understand. E.g to help refer to the columns by name instead of by position.
  • Python dictionaries are very very useful.

hw2's People

Contributors

evanj avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.