jbryer / data606fall2017

DATA 606 - Statistics and Probability for Data Analytics

data606fall2017's Issues

Week 2 assignment?

Hi,

Is it possible to post a few weeks of assignments ahead of time? My job has a lot of ups and downs, so I would like to take a stab at them when I have time.

Reading the csv file from Github repository

I am not able to load data into R from GitHub. None of the following attempts work; each gives a different error message.

Could you please explain how to load the data?

smoking-UK <- read.csv(url("https://github.com/jbryer/DATA606Spring2017/tree/master/Data/Data from openintro.org/Ch 1 Exercise/smoking.csv"), header = FALSE)

x <- getURL("https://github.com/jbryer/DATA606Spring2017/tree/master/Data/Data from openintro.org/Ch 1 Exercise/smoking.csv")
y <- read.csv(text = x)

urlfile<-'https://github.com/jbryer/DATA606Spring2017/tree/master/Data/Data from openintro.org/Ch 1 Exercise/smoking.csv'
dsin<-read.csv(urlfile)
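
One likely cause, offered as a sketch rather than a confirmed diagnosis: all three attempts point at the github.com page for the file (an HTML page under /tree/) instead of the file's raw contents, and the path contains unencoded spaces. Separately, smoking-UK is not a valid R object name because of the hyphen. The helper github_raw_url() below is hypothetical (it is not part of any package); it assumes GitHub's usual raw.githubusercontent.com URL layout and that the file still exists at that path.

```r
# Hypothetical helper (not from any package): rewrite a github.com page URL
# into the raw-content URL that read.csv() can parse.
github_raw_url <- function(page_url) {
  raw <- sub("^https://github\\.com/", "https://raw.githubusercontent.com/", page_url)
  raw <- sub("/(blob|tree)/", "/", raw)  # drop the first /blob/ or /tree/ segment
  utils::URLencode(raw)                  # encode spaces as %20
}

page_url <- "https://github.com/jbryer/DATA606Spring2017/tree/master/Data/Data from openintro.org/Ch 1 Exercise/smoking.csv"
raw_url <- github_raw_url(page_url)
print(raw_url)

# Needs network access; assumes the file is still at this path.
# smoking_uk <- read.csv(raw_url)
```

The read.csv() call is left commented out because it depends on network access and on the file not having moved.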

Project Proposal

Hi Professor,

When is the project proposal due? I didn't see it listed here - http://data606.net/course-overview/schedule/

Thanks,
Silverio

X11 not available - won't generate PDF or HTML in mac

Hi, the R Markdown file works, but I can't generate the PDF or HTML output: the machine says X11 is not present. I have just installed the latest version of XQuartz and I am on Mac OS Sierra (10.12.5), so I'm not sure why X11 is not being picked up.

Hypothesis Testing Using Confidence Interval

In the textbook's Example 4.19, the author calculates a confidence interval as sample mean ± margin of error, i.e., (2.78-1.96x0.256, 2.78+1.96x0.256). He then concludes that the null value 3.09 is within the interval and therefore cannot be rejected.

But later on, in the one-tailed examples, the interval seems to be constructed with the null value at the center; see Fig 4.13, 4.15, 4.16, 4.18. The author is saying that if the null is true, it is unusual to see an observed value outside the interval, so the null is rejected if the observed sample mean is outside the interval (null - margin, null + margin).

If we use the same logic for Example 4.19, can we construct an interval (3.09-1.96x0.256, 3.09+1.96x0.256)? Then check if the sample mean 2.79 is within the interval?

In this example, it seems both ways work: in each case the value being checked falls within the interval. But is there a difference? Can both work in all cases?
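
The two checks can be compared directly in R. This is a minimal sketch using the numbers quoted above (2.78 as the sample mean, from the first interval); the variable names are made up for illustration.

```r
x_bar <- 2.78   # sample mean, from the first interval in the post
mu_0  <- 3.09   # null value
se    <- 0.256  # standard error
z     <- 1.96   # critical value for 95% confidence

margin <- z * se

# Approach 1: interval centered on the sample mean; is the null value inside?
ci <- c(x_bar - margin, x_bar + margin)
null_in_ci <- mu_0 >= ci[1] && mu_0 <= ci[2]

# Approach 2: region centered on the null value; is the sample mean inside?
region <- c(mu_0 - margin, mu_0 + margin)
mean_in_region <- x_bar >= region[1] && x_bar <= region[2]

# Both checks reduce to the same condition: |x_bar - mu_0| <= margin
same_condition <- abs(x_bar - mu_0) <= margin

print(ci)
print(region)
print(c(null_in_ci, mean_in_region, same_condition))
```

With the same standard error and a two-sided test, both approaches test whether the sample mean and the null value are within one margin of each other, so they always agree. They can differ when the standard error is computed differently in each case (for proportions, the test typically uses the null value while the interval uses the sample proportion), and a one-sided test at the 5% level uses 1.645 rather than 1.96.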

HW1 Q1.70

Hello Professor,

Could you please elaborate on the simulation part of Q1.70?
My understanding is that running it 100 times helps to determine whether the variation in the result is due to chance. But how do we run such a simulation in R? Which variables do we randomize?

I submitted my HW with a couple of unanswered questions and would love to fully understand it and correct my assignment.

Any feedback would be helpful.
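
A randomization simulation of this kind can be sketched in a few lines of R. The counts below are hypothetical and are not taken from Q1.70; the idea is to keep the outcomes fixed, shuffle the group labels, and recompute the difference in proportions each time.

```r
set.seed(606)

# Hypothetical data: 50 subjects per group, counts invented for illustration.
group   <- rep(c("treatment", "control"), times = c(50, 50))
outcome <- c(rep(c("yes", "no"), times = c(30, 20)),   # treatment: 30 of 50 "yes"
             rep(c("yes", "no"), times = c(20, 30)))   # control:   20 of 50 "yes"

# Difference in the proportion of "yes" between the two groups.
diff_prop <- function(outcome, group) {
  mean(outcome[group == "treatment"] == "yes") -
    mean(outcome[group == "control"] == "yes")
}

observed <- diff_prop(outcome, group)

# Shuffle the group labels 100 times; outcomes stay where they are.
sims <- replicate(100, diff_prop(outcome, sample(group)))

# Share of shuffles with a difference at least as large as the observed one.
p_sim <- mean(sims >= observed)
print(observed)
print(p_sim)
```

If shuffled labels rarely produce a difference as large as the observed one, chance alone is an unconvincing explanation for the result.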

Due dates for homeworks

Hello everyone, my name is Mezue. Glad to be part of the class. I am having problems figuring out the due dates of the homework assignments.

Thanks

Due date for assignments

I am a bit confused about the due dates of assignments. As per the schedule on data606.net, today (09/17) we should be submitting Lab2 and Homework 2 (which I submitted last week). I just submitted Lab3 today and am working on Homework 3, but it seems the due date is October 01 (10/01). Am I missing anything here? - Mehdi

Observation Independent or Not Based on Sample Size

In Chapter 4 - Inference, I frequently encountered this statement:

"Because this is a simple random sample from less than 10% of the population, the observations are independent."

On Pg 173 of the text, it states, "A reliable method to ensure sample observations are independent is to conduct a simple random sample consisting of less than 10% of the population."

I wonder why this is so.

Does that mean that if I sample more than 10% of the population, the observations become non-independent?
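
One way to see why 10% is a rule of thumb and not a sharp cutoff: when sampling without replacement, each draw changes what is left, so observations are never exactly independent, but the effect is small when the sample is a small fraction of the population. A minimal sketch using the standard finite population correction factor (the population size here is hypothetical):

```r
# Finite population correction: the factor by which the standard error of a
# sample mean shrinks when sampling n of N without replacement.
fpc <- function(N, n) sqrt((N - n) / (N - 1))

N <- 10000                                   # hypothetical population size
fraction <- c(0.01, 0.05, 0.10, 0.25, 0.50)  # share of the population sampled
correction <- fpc(N, N * fraction)

print(data.frame(fraction, correction = round(correction, 3)))
```

A factor close to 1 means treating the observations as independent changes the standard error very little; the factor drifts further from 1 as the sampled fraction grows.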

Course Exam

Hey Dr Bryer,

I was wondering what format the final exam is in and how we would take one in an asynchronous distributed learning environment. This is my first online adventure in a traditional academic environment and I wanted to be sure I am properly prepared for the final exam. Thanks for your help!

Error with CDC

source("https://github.com/jbryer/DATA606/blob/master/inst/labs/Lab1/more/cdc.R")

I'm getting this OUTPUT:
Error in source("https://github.com/jbryer/DATA606/blob/master/inst/labs/Lab1/more/cdc.R") :
https://github.com/jbryer/DATA606/blob/master/inst/labs/Lab1/more/cdc.R:7:1: unexpected '<'
6:
7: <
^

FYI @jbryer
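
The unexpected '<' suggests that R received an HTML page instead of an R script: a /blob/ URL is the GitHub web page for the file. A minimal sketch of the usual workaround, assuming GitHub's standard raw.githubusercontent.com layout and that the script is still at this path:

```r
blob_url <- "https://github.com/jbryer/DATA606/blob/master/inst/labs/Lab1/more/cdc.R"

# Assumed raw-content equivalent: swap the host and drop the /blob/ segment.
raw_url <- sub("https://github.com/", "https://raw.githubusercontent.com/",
               blob_url, fixed = TRUE)
raw_url <- sub("/blob/", "/", raw_url, fixed = TRUE)
print(raw_url)

# Needs network access; assumes the script is still at this path.
# source(raw_url)
```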

Lab0 will not start in Rstudio

Hello,

I tried starting Lab0; the R commands I used are below:

getLabs()
[1] "Lab0" "Lab1" "Lab2" "Lab3" "Lab4a" "Lab4b"
[7] "Lab5" "Lab6" "Lab7" "Lab8"
startLab('Lab0')
Error in editor(file = file, title = title) :
argument "name" is missing, with no default
startLab('Lab0')
Error in startLab("Lab0") :
The lab did not copy! Not sure why, ask your instructor.
library('DATA606')
startLab('Lab0')
Error in startLab("Lab0") :
The lab did not copy! Not sure why, ask your instructor.
vignette(package='DATA606')
no vignettes found
vignette('os3')
Warning message:
vignette ‘os3’ not found

I went through and tried loading all the libraries one by one:

library("DATA606", lib.loc="/R/win-library/3.4")
library("openintro", lib.loc="/R/win-library/3.4")
Please visit openintro.org for free statistics
materials

Attaching package: ‘openintro’

The following object is masked from ‘package:datasets’:

cars

Warning message:
package ‘openintro’ was built under R version 3.4.1

library("OIdata", lib.loc="/R/win-library/3.4")
Loading required package: RCurl
Loading required package: bitops
Loading required package: maps
Warning messages:
1: package ‘OIdata’ was built under R version 3.4.1
2: package ‘maps’ was built under R version 3.4.1
library("devtools", lib.loc="/R/win-library/3.4")
Warning message:
package ‘devtools’ was built under R version 3.4.1
library("ggplot2", lib.loc="/R/win-library/3.4")
Warning message:
package ‘ggplot2’ was built under R version 3.4.1
library("psych", lib.loc="/R/win-library/3.4")

Attaching package: ‘psych’

The following objects are masked from ‘package:ggplot2’:

%+%, alpha

Warning message:
package ‘psych’ was built under R version 3.4.1

library("reshape2", lib.loc="~/R/win-library/3.4")

Attaching package: ‘reshape2’

The following object is masked from ‘package:openintro’:

tips

Warning message:
package ‘reshape2’ was built under R version 3.4.1

library("knitr", lib.loc="/R/win-library/3.4")
Warning message:
package ‘knitr’ was built under R version 3.4.1
library("markdown", lib.loc="/R/win-library/3.4")
Warning message:
package ‘markdown’ was built under R version 3.4.1
library("shiny", lib.loc="~/R/win-library/3.4")
Warning message:
package ‘shiny’ was built under R version 3.4.1
vignette('os3')
Warning message:
vignette ‘os3’ not found
startLab('Lab0')
Error in startLab("Lab0") :
The lab did not copy! Not sure why, ask your instructor.

What did I do wrong?

Thanks,

Nathan
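
A couple of checks that may help narrow this down. This is a diagnostic sketch, not a fix: it assumes the labs are installed under a labs folder inside the package (the repository keeps them under inst/labs) and that startLab() copies into the current working directory, neither of which is confirmed here.

```r
# Where the installed package keeps its labs, assuming they live under
# inst/labs in the package source; "" if the package or folder is missing.
lab_root <- system.file("labs", package = "DATA606")

if (nzchar(lab_root)) {
  print(list.files(lab_root))   # entries such as "Lab0", "Lab1" would be expected
} else {
  message("No 'labs' folder found for DATA606; the install may be incomplete.")
}

# Assumption: the lab is copied into the working directory, so it must be writable.
target_dir <- getwd()
can_write <- unname(file.access(target_dir, mode = 2) == 0)
print(target_dir)
print(can_write)
```

If the working directory is not writable, changing it with setwd() to a folder under your own user directory before calling startLab() is worth trying.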

Rmarkdown did not save or knit correctly

I was answering the questions in Lab0 and the Rmarkdown would not update as I knitted. Frustrated, I saved the script as an .Rmd file and moved on. When I tried to reopen the .Rmd file again in Rstudio, the page is blank except for an output and a plot, which are in reverse order. It appears I lost everything.

I created a test .Rmd file and the first knit to PDF works. But it's the only knit that works. And if I close that .Rmd file and try to reopen it, it too is a blank page.

I am running win10, Rstudio v1.0.136, and R v3.4.1.

Assignment # 1

I just wanted to quickly ask if we have any assignments due.
Thank You

The R files in Lab0 do not contain any code/data

Hi,

When I try to open 'arbuthnot' and 'present' in the Lab0/more folder, a new tab opens in RStudio but the page is blank. Is this expected?

update: I did source("more/arbuthnot.R") from the working directory and obtained the data set...

Thanks,
Mike

General Question

Good evening Jason,

I might have missed a piece of information when you mentioned the new schedule for the following weeks at the meetup. I just wanted to confirm that the due date for this week's homework (chap 3) and Lab3 is Sept 20, correct?

Thank you
Durley Torres

Lab 5 error

I get an error when I use the following code that was embedded in lab 5 in question 4:

inference(y = nc$weight, x = nc$habit, est = "mean", type = "ht", null = 0,
alternative = "twosided", method = "theoretical")

The error I get is:
Error in FUN(dd[x, ], ...) : could not find function "FUN"

Before the error, I do get the following result-
Summary statistics:
n_nonsmoker = 873, mean_nonsmoker = 7.1443,
I thought I would get statistics for smokers also.
Please let me know how to fix this.
Thanks,
Sarah
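
The cause of the FUN error is not established in this thread. As a cross-check that does not depend on inference(), the same two-group comparison can be run with base R's t.test(). The sketch below uses a small simulated data frame named toy as a stand-in for nc (all values are hypothetical), including one missing habit value to show that the formula interface drops such rows.

```r
set.seed(606)

# Simulated stand-in for the lab's nc data; all values are hypothetical.
toy <- data.frame(
  weight = c(rnorm(40, mean = 7.1, sd = 1.5), rnorm(12, mean = 6.8, sd = 1.4)),
  habit  = c(rep("nonsmoker", 39), NA, rep("smoker", 12))
)

# Group sizes, including any missing group labels.
print(table(toy$habit, useNA = "ifany"))

# Group means; rows with a missing habit are dropped.
print(aggregate(weight ~ habit, data = toy, FUN = mean))

# Two-sided test of a zero difference in means (Welch t-test by default).
result <- t.test(weight ~ habit, data = toy)
print(result)
```

With the real data, replacing toy with nc should give summary statistics for both nonsmokers and smokers.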

Lab 3

Good afternoon,

I recently uploaded the lab for week 3. Can you please look at it to see if the format is acceptable? I come from a SAS background, so R and RStudio are very new to me.

Thanks in advance

Lab 3

In the "try your own" section, it says "Note that normal probability plots C and D have a slight stepwise pattern."

To me, plot C doesn't look skewed, nor do I see a stepwise pattern. How do I detect a stepwise pattern?
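
A stepwise pattern in a normal probability plot usually comes from discrete or rounded data: many observations share the same value, so the points line up in horizontal runs. A minimal sketch with simulated data (the values are hypothetical and unrelated to plots C and D):

```r
set.seed(606)

smooth  <- rnorm(100, mean = 50, sd = 2)  # hypothetical continuous measurements
rounded <- round(smooth)                   # same values recorded to the nearest unit

# qqnorm() returns the plotted coordinates; plot.it = FALSE skips drawing.
qq_smooth  <- qqnorm(smooth,  plot.it = FALSE)
qq_rounded <- qqnorm(rounded, plot.it = FALSE)

# Few distinct y-values means many points share a height: the "steps".
n_levels <- c(smooth  = length(unique(qq_smooth$y)),
              rounded = length(unique(qq_rounded$y)))
print(n_levels)

# To see the steps, draw both plots:
# qqnorm(smooth);  qqline(smooth)
# qqnorm(rounded); qqline(rounded)
```

The fewer distinct values a variable takes, the more obvious the steps; with many distinct values the pattern can be slight and easy to miss.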

What happens when I coerce a categorical variable into numeric when using lm()?

Imagine a data set with response variable "salary", and explanatory variable "education", which is a categorical variable with three levels ("HS", "college", "graduate").

If I run lm(salary ~ education, data), R will produce something like this:
salary = intercept + b1*educationcollege + b2*educationgraduate

Here, R created two dummy variables, "educationcollege" and "educationgraduate", to contrast with "HS".

If I coerce "education" to numeric, with something like lm(salary ~ as.numeric(education), data), R will produce:
salary = intercept + b1*education

Here, "education" takes on the value 1, 2, or 3, each representing an education level.

I'm wondering whether doing this affects the result. What will be the difference? Is this approach allowable?
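
The two fits can be compared side by side. This is a minimal sketch on simulated data (the salaries are hypothetical, in thousands); the factor levels are set explicitly because as.numeric() on a factor returns the level codes, and the default level order is alphabetical, not HS, college, graduate.

```r
set.seed(606)

# Hypothetical data: 20 people per education level.
education <- factor(rep(c("HS", "college", "graduate"), each = 20),
                    levels = c("HS", "college", "graduate"))
salary <- c(rnorm(20, mean = 40, sd = 5),
            rnorm(20, mean = 50, sd = 5),
            rnorm(20, mean = 80, sd = 5))

# Categorical: one intercept (HS mean) plus a separate shift per other level.
fit_factor <- lm(salary ~ education)

# Numeric codes 1, 2, 3: a single slope, so each step up is forced to be
# worth the same amount.
fit_numeric <- lm(salary ~ as.numeric(education))

print(coef(fit_factor))
print(coef(fit_numeric))

# The numeric model is a restricted version of the factor model,
# so the two can be compared with an F-test.
print(anova(fit_numeric, fit_factor))
```

The numeric coding is allowable only if equal spacing between levels is a reasonable assumption; the factor model makes no such assumption and can never fit worse in terms of residual sum of squares.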

can't access arbuthnot data

Hi,
I opened lab 0 but when I type

{r load-abrbuthnot-data, eval=TRUE}
source("more/arbuthnot.R")

at the prompt (or any variation of that), I get the following error

Error in file(filename, "r", encoding = encoding) :
cannot open the connection
In addition: Warning message:
In file(filename, "r", encoding = encoding) :
cannot open file 'more/arbuthnot.R': No such file or directory

Please let me know how to address this.
Thanks,
Sarah
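
The error says the relative path more/arbuthnot.R does not exist under the current working directory. Note also that the {r ...} line is an R Markdown chunk header, not something to type at the prompt; only the source() line is R code. A minimal sketch for checking the location before sourcing:

```r
script <- file.path("more", "arbuthnot.R")

cat("Working directory:", getwd(), "\n")

if (file.exists(script)) {
  source(script)   # loads the data, as in the lab
} else {
  message("'", script, "' not found here. Use setwd() to move to the Lab0 ",
          "folder, or pass source() the full path to the file.")
}
```

When knitting, relative paths are typically resolved from the folder containing the .Rmd file, so the console and the knit can behave differently if the console's working directory is somewhere else.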

Rmarkdowns not displaying Dataframe rows

Hi,

I am having an issue with the R Markdown files for the labs. When I run the statement to display the head(present) dataframe, I don't see the output on the same page (the markdown sheet). But if I run it directly at the console prompt, I see the data.

See screen shot attached

t-scores and z-scores

I am confused about when to calculate a t-score and when to calculate a z-score. In question 5.19, I thought I should use a t-score, but the answer in the back of the book calculated a z-score. Is that because the sample size is 200, which is much more than 30? Please let me know how I could better figure out when to use each type of table.
Thanks,
Sarah
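
This does not settle which method the book intended for 5.19, but a quick comparison of critical values shows how the t distribution approaches the normal as the degrees of freedom grow. The df of 199 below assumes a one-sample setting with n = 200.

```r
df <- c(5, 10, 30, 100, 199, 1000)

t_crit <- qt(0.975, df = df)   # two-sided 95% critical values from t
z_crit <- qnorm(0.975)         # the corresponding normal value

print(data.frame(df, t_crit = round(t_crit, 3), z_crit = round(z_crit, 3)))
```

With a couple of hundred observations the two critical values differ only in the second decimal place, so the conclusions rarely differ; for small samples the t value is noticeably larger.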