jmbejara / comp-econ-sp19

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)

Jupyter Notebook 99.71% Python 0.08% Stata 0.01% R 0.20%

comp-econ-sp19's Issues

hw 4, q12

I do this to calculate ave_wage in part 11:

However, when I do the concat in part 12, my results are slightly different from yours. Do you see a reason for the difference?

I believe the column headers in the table should read "contemporaneous correlation with employment growth" and "lagged correlation with employment", since you are running the corrwith command on employment.pct_change().
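A small sketch on made-up annual data shows what the two columns compute. The column names (wage_growth, hours_growth) and the one-period lag direction are hypothetical, not taken from the HW. Because both correlations are taken against employment.pct_change(), both describe employment growth, not the employment level.

```python
import numpy as np
import pandas as pd

# Made-up annual data indexed by year (hypothetical column names)
rng = np.random.default_rng(0)
years = range(1990, 2020)
employment = pd.Series(100 * np.exp(np.cumsum(rng.normal(0.01, 0.02, 30))), index=years)
df = pd.DataFrame({"wage_growth": rng.normal(size=30),
                   "hours_growth": rng.normal(size=30)}, index=years)

emp_growth = employment.pct_change()  # growth rate, not the level

# corrwith pairs each column of df with emp_growth on the shared index
contemporaneous = df.corrwith(emp_growth)      # corr(x_t, growth_t)
lagged = df.corrwith(emp_growth.shift(1))      # corr(x_t, growth_{t-1})

table = pd.DataFrame({
    "contemporaneous corr. with employment growth": contemporaneous,
    "lagged corr. with employment growth": lagged,
})
print(table)
```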

Installed scikit-misc but cannot be found

Whenever I try to use method='loess' within geom_smooth(), I get the error "For loess smoothing, install 'scikit-misc'", even though I already installed it from the terminal with 'pip install scikit-misc'. Is there something I'm missing that keeps 'loess' from running?
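One common cause is that the terminal's pip installs into a different Python environment than the one the notebook kernel runs. The diagnostic sketch below (not a guaranteed fix) reports which interpreter is running and whether it can import scikit-misc, whose import name is skmisc.

```python
import importlib.util
import sys

def check_package(import_name="skmisc"):
    """Report the running interpreter and whether it can import the package."""
    spec = importlib.util.find_spec(import_name)
    return {
        "interpreter": sys.executable,
        "importable": spec is not None,
        "location": spec.origin if spec is not None else None,
    }

info = check_package("skmisc")
print(info)
# If "importable" is False, install into *this* interpreter, e.g. in a notebook cell:
#   %pip install scikit-misc
```

If the printed interpreter path differs from the one your terminal's pip points to, that mismatch is the likely culprit.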

Hw 1 Q5 plotting several images using for loop

I am trying to plot the requested images using a for loop. Every image shows up except the last one. My code is:

values = [500, 300, 100, 50, 10]
for i in values:
    plt.imshow(compress(face, i), cmap=plt.cm.gray)
    plt.figure(i)
    plt.plot()

Why can't I print the last image?
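In the loop as posted, plt.figure(i) is called after plt.imshow, so each image is drawn into the figure opened on the previous pass, and the figure opened on the last pass (figure 10) stays empty. A minimal fix is to open the figure first and then draw into it. The sketch uses a random array as a placeholder for face and a hypothetical SVD-based compress; the HW's own versions would replace both.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop this in a notebook
import matplotlib.pyplot as plt
import numpy as np

def compress(img, k):
    """Hypothetical stand-in for the HW's compress: rank-k SVD approximation."""
    U, s, Vt = np.linalg.svd(img, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

face = np.random.default_rng(0).random((60, 80))  # placeholder for the real image

values = [500, 300, 100, 50, 10]
figs = []
for k in values:
    fig = plt.figure()                                # open a new figure first...
    plt.imshow(compress(face, k), cmap=plt.cm.gray)   # ...then draw into it
    plt.title(f"k = {k}")
    figs.append(fig)
# plt.show()  # with the notebook inline backend, figures opened in the cell are shown
```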

MLE Q6 negative std

I got a negative standard deviation estimate (-0.03892627616515054), but I'm not sure what is going wrong in my code:
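The code isn't shown, so this is only a guess: nothing in an unconstrained minimize keeps σ positive, and if σ enters the objective only through σ² (or abs(σ)), then -σ fits exactly as well as σ. A common remedy is to optimize log σ and exponentiate afterward (bounds with method='L-BFGS-B' are another option). The sketch uses simulated normal data, not the HW data.

```python
import numpy as np
from scipy import optimize, stats

def neg_log_lik(params, x):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # exp() guarantees sigma > 0 for any log_sigma
    return -stats.norm.logpdf(x, loc=mu, scale=sigma).sum()

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=0.5, size=500)  # simulated data

res = optimize.minimize(neg_log_lik, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)
```

For a normal sample the MLEs are the sample mean and the ddof=0 standard deviation, which makes the result easy to check.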

Question 7

For question 7 (analytical), I get this graph... This may be because I set the initial weight to w0. How should I resolve this, or is this correct?

HW4 Q6 Real_Wages

When I compare my df.describe() with the one provided in the question, every column matches except "real_wage". I think there could be an issue with the way I set up the real wage column, but I'm not sure what the problem is.

Early final for graduating seniors

It might be a bit early, but do we know the date of the early final yet?

HW 0 is due on April 8th, before 11:59 pm.

I made a typo in the due date for HW 0. It has now been fixed. HW 0 is due on April 8th, before 11:59 pm.

Note that you will submit the HW assignment to your personal repository on GitHub, as we discussed in class. However, you first need to create this repo. I've posted instructions on how to do this here. You can go ahead and try it out now. We'll address any questions you have with this in my office hours tomorrow or in class on Thursday.

Numpy lecture

From the numpy lecture, there seems to be some code we did not go over (the last 3 chunks of the numpy.ipynb lecture). Are we expected to know this for the midterm?

from numpy import cumsum
from numpy.random import uniform

class discreteRV:
    """
    Generates an array of draws from a discrete random variable with vector of
    probabilities given by q.
    """

    def __init__(self, q):
        """
        The argument q is a NumPy array, or array like, nonnegative and sums
        to 1
        """
        self.q = q
        self.Q = cumsum(q)

    def draw(self, k=1):
        """
        Returns k draws from q. For each such draw, the value i is returned
        with probability q[i].
        """
        return self.Q.searchsorted(uniform(0, 1, size=k))

"""
Modifies ecdf.py from QuantEcon to add in a plot method

"""

class ECDF:
"""
One-dimensional empirical distribution function given a vector of
observations.

Parameters
----------
observations : array_like
    An array of observations

Attributes
----------
observations : array_like
    An array of observations

"""

def __init__(self, observations):
    self.observations = np.asarray(observations)

def __call__(self, x):
    """
    Evaluates the ecdf at x

    Parameters
    ----------
    x : scalar(float)
        The x at which the ecdf is evaluated

    Returns
    -------
    scalar(float)
        Fraction of the sample less than x

    """
    return np.mean(self.observations <= x)

def plot(self, a=None, b=None):
    """
    Plot the ecdf on the interval [a, b].

    Parameters
    ----------
    a : scalar(float), optional(default=None)
        Lower end point of the plot interval
    b : scalar(float), optional(default=None)
        Upper end point of the plot interval

    """

    # === choose reasonable interval if [a, b] not specified === #
    if a is None:
        a = self.observations.min() - self.observations.std()
    if b is None:
        b = self.observations.max() + self.observations.std()

    # === generate plot === #
    x_vals = np.linspace(a, b, num=100)
    f = np.vectorize(self.__call__)
    plt.plot(x_vals, f(x_vals))
    plt.show()
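The two classes rest on two small ideas that are easy to check directly: discreteRV.draw uses inverse-transform sampling (searchsorted on the cumulative probabilities), and ECDF.__call__ is the share of observations at or below x. A standalone sketch with made-up numbers:

```python
import numpy as np

# Inverse-transform sampling, as in discreteRV.draw
q = np.array([0.25, 0.5, 0.25])
Q = np.cumsum(q)                      # [0.25, 0.75, 1.0]

# searchsorted returns the first index i with u <= Q[i], so a uniform u lands
# on value i with probability Q[i] - Q[i-1] = q[i]
u = np.array([0.1, 0.3, 0.8])
print(Q.searchsorted(u))              # [0 1 2]

rng = np.random.default_rng(0)
draws = Q.searchsorted(rng.uniform(0, 1, size=100_000))
freqs = np.bincount(draws, minlength=len(q)) / draws.size  # close to q

# Empirical CDF, as in ECDF.__call__: share of observations <= x
observations = np.array([1, 2, 2, 3])
share_le_2 = np.mean(observations <= 2)   # 0.75
```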

Index Error for Fixed Effects Regression

I'm following the code used in lecture for a fixed effects regression with statsmodels to estimate a fixed effects regression on the SeatBelts dataset. My line of code is:

fixedeffectsfit = smf.mixedlm('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age', seatbelts, groups=seatbelts['state']).fit()

However, I'm getting the following error after running this code:

IndexError Traceback (most recent call last)
in <module>()
----> 1 fixedeffectsfit = smf.mixedlm('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + C(state)', seatbelts, groups=seatbelts['state']).fit()

~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in from_formula(cls, formula, data, re_formula, vc_formula, subset, use_sparse, *args, **kwargs)
918 exog_re=exog_re,
919 exog_vc=exog_vc,
--> 920 *args, **kwargs)
921
922 # expand re names to account for pairs of RE

~/anaconda3/lib/python3.6/site-packages/statsmodels/base/model.py in from_formula(cls, formula, data, subset, drop_cols, *args, **kwargs)
172 'formula': formula, # attach formula for unpckling
173 'design_info': design_info})
--> 174 mod = cls(endog, exog, *args, **kwargs)
175 mod.formula = formula
176

~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in __init__(self, endog, exog, groups, exog_re, exog_vc, use_sqrt, missing, **kwargs)
687
688 # Split the data by groups
--> 689 self.endog_li = self.group_list(self.endog)
690 self.exog_li = self.group_list(self.exog)
691 self.exog_re_li = self.group_list(self.exog_re)

~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in group_list(self, array)
976 if array.ndim == 1:
977 return [np.array(array[self.row_indices[k]])
--> 978 for k in self.group_labels]
979 else:
980 return [np.array(array[self.row_indices[k], :])

~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in <listcomp>(.0)
976 if array.ndim == 1:
977 return [np.array(array[self.row_indices[k]])
--> 978 for k in self.group_labels]
979 else:
980 return [np.array(array[self.row_indices[k], :])

IndexError: index 556 is out of bounds for axis 1 with size 556

I'm not really sure what this error is referring to since there's nothing with index 556 in the dataset or regression specifications.

When I instead use the following (found on this site):

fitFE1 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + C(state)', seatbelts).fit()

I get a working regression. Can I get some help with what I should fix in the first line of code, and whether there is something wrong with the second method of generating fixed effects?
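One likely explanation, offered as a guess since the data isn't shown: the formula interface drops rows with missing values (for example in sb_useage), but groups=seatbelts['state'] still has one entry per original row, so the group row indices can run past the 556 rows that remain. Dropping incomplete rows before fitting keeps the two aligned. Separately, smf.mixedlm fits random state intercepts (a mixed model), which is a different estimator from fixed effects; OLS with C(state) is the usual dummy-variable way to get state fixed effects. The sketch uses simulated data with hypothetical columns y, x, and state.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for SeatBelts: 10 states x 15 years, hypothetical column names
rng = np.random.default_rng(0)
state_names = [f"s{i}" for i in range(10)]
df = pd.DataFrame({"state": np.repeat(state_names, 15)})
df["x"] = rng.normal(size=len(df))
state_effect = dict(zip(state_names, rng.normal(size=10)))
df["y"] = (1.0 + 2.0 * df["x"] + df["state"].map(state_effect)
           + rng.normal(scale=0.5, size=len(df)))
df.loc[::7, "x"] = np.nan  # some missing regressor values

# Drop incomplete rows first so `groups` has exactly one entry per row the model uses
clean = df.dropna(subset=["y", "x", "state"])

# State fixed effects via dummies (least-squares dummy variables)
fe = smf.ols("y ~ x + C(state)", data=clean).fit()

# Random state intercepts: a different estimator from fixed effects
re = smf.mixedlm("y ~ x", data=clean, groups=clean["state"]).fit()

print(fe.params["x"], re.fe_params["x"])
```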

HW 3 Productivity: Plotting a df with multi-index

Hi! I'm working on Q3-Q5 of HW3's productivity section. Right now I'm doing something that feels inefficient, i.e.:

df['y'].loc[0].plot()
df['y'].loc[1].plot()
df['y'].loc[2].plot()

Is there a better way to do this? I tried reading through the help for the df.plot() function, but it didn't help me much.
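One option is to move the firm level of the index into the columns, so a single .plot() call draws one line per firm; with the data in the question that would be df['y'].unstack(level=0).plot(). The sketch builds a hypothetical panel with index levels named firm and t.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; not needed in a notebook
import numpy as np
import pandas as pd

# Hypothetical panel shaped like the HW data: index levels (firm, t), column y
idx = pd.MultiIndex.from_product([range(3), range(50)], names=["firm", "t"])
df = pd.DataFrame({"y": np.random.default_rng(1).normal(size=len(idx)).cumsum()},
                  index=idx)

# Move the firm level into the columns: rows are t, one column per firm
wide = df["y"].unstack(level="firm")
ax = wide.plot()  # one line per firm, with a legend
```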

Thanks!

Data selection HW#4

My df.info() is largely the same as yours, but the file size is twice as large and WKSWORK1, UHRSWORKLY, INCWAGE are floats instead of ints. My guess is that I selected too much data from IPUMS - is something wrong with my selection?

Colab printing to PDF not displaying some outputs

When I print my notebook to PDF (everything has run and looks right), some of the code-block output does not show up or is cut off by the next block. Any ideas why this might be, or should I not worry about it since the .ipynb is also being submitted? Thanks!

HW 2 Monte Carlo - Var-Cov Matrix

How do you construct the variance-covariance matrix in the function that generates data?

Hw 4 #24, 11

My code for #24 does not seem to be working:

shift = -1
inner_means = (df
               .groupby(by=['YEAR', 'age_binned', 'educ_binned'])
               .apply(lambda x: np.average(x['real_wage'], weights=x.ASECWT))
              )

pd.DataFrame(inner_means)

#Create Bin Weight Sums
weights_2000 = (df[df.YEAR == 2000]
                .dropna()
                .groupby(by=['YEAR','age_binned', 'educ_binned'])
                .ASECWT
                .sum())

adj_series = (inner_means
              .groupby(level='YEAR')
              .apply(lambda x: np.average(x, weights=weights_2000)))
# Lag, since we use "last year's weeks worked", etc.
adj_series = adj_series.tshift(shift)
tdf['adj_ave_wages'] = adj_series

I suspect it has to do with this line:

.groupby(by=['YEAR','age_binned', 'educ_binned'])

In that line, I've also tried including only 'age_binned' and 'educ_binned'. With that version, and with the code shown above, I get the error "Axis must be specified when shapes of a and weights differ", which points to the .apply(lambda x: np.average(...)) line.

Additionally, what exactly are we supposed to compute for the employment variable in #11? I used in_labor_force with np.average, but I'm not sure this is what you wanted.
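For the #24 error, one likely cause is that weights_2000 keeps the YEAR level and covers a different set of cells than each year's group, so np.average receives arrays of different shapes. Dropping YEAR from the weights and reindexing them to each year's (age_binned, educ_binned) cells lines them up. The sketch uses a small random stand-in for the CPS extract; only the column names follow the HW code.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CPS extract (same column names as the HW code)
rng = np.random.default_rng(0)
n = 600
df = pd.DataFrame({
    "YEAR": rng.choice([2000, 2001, 2002], size=n),
    "age_binned": rng.choice(["young", "old"], size=n),
    "educ_binned": rng.choice(["hs", "college"], size=n),
    "real_wage": rng.lognormal(3.0, 0.5, size=n),
    "ASECWT": rng.uniform(0.5, 2.0, size=n),
})

# Weighted mean wage within each (YEAR, age, educ) cell
inner_means = (df.groupby(["YEAR", "age_binned", "educ_binned"])[["real_wage", "ASECWT"]]
                 .apply(lambda g: np.average(g["real_wage"], weights=g["ASECWT"])))

# Year-2000 cell weights, indexed by (age, educ) only, with no YEAR level
weights_2000 = (df[df.YEAR == 2000]
                .groupby(["age_binned", "educ_binned"])["ASECWT"]
                .sum())

def fixed_weight_average(cell_means):
    cell_means = cell_means.droplevel("YEAR")    # index becomes (age, educ)
    w = weights_2000.reindex(cell_means.index)   # line weights up cell by cell
    return np.average(cell_means, weights=w)

adj_series = inner_means.groupby(level="YEAR").apply(fixed_weight_average)
print(adj_series)
```

If some year has a cell that is empty in 2000, its reindexed weight is NaN and that year's average comes out NaN, which is worth checking for in the real data.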

HW 3-Productivity&Monte Carlo-Q6

We are told to use M=100, but you haven't specified which N and T to use with gen_all to create different data sets. Should we continue to use gen_all(N=3, T=50) as in previous examples, or should we loop over gen_all(N=1, T=1) 100 times?

Midterm solutions type error

I was working through last year's midterm and ran into a bug: I got a type error whenever I tried to use pivot tables. When I ran the given solutions, I got the same error:

TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'

This was from questions 6 and 8 of last year's midterm. I am running this in Jupyter Notebook.

Hw 6 Replication Question

Hello, I'm slightly confused about the definitions of variables in the data. Can we find Z and W in the data, and what variables represent them? Also in the definition of F_N in the writeup, what exactly does W_L represent? I'm confused what the L subscript means. Thanks!

error with pdf in Q11

def get_e(params, df):
    b0, b1, b2, b3, sigma = params
    e = df['sick'] - (b0 + b1*df['age'] + b2*df['children'] + b3*df['avgtemp_winter'])
    # e is an array
    return e

def pdf_residuals(params, e):
    b0, b1, b2, b3, sigma = params
    likelihoods = (1 / (2 * np.pi * sigma)) * np.exp(-e**2 / (2 * sigma**2))
    return likelihoods

def neg_log_likes(params, df):
    e = get_e(params, df)
    likes = pdf_residuals(params, e)
    log_likes = np.log(likes)
    neg_sum = -(log_likes.sum())
    return neg_sum


params_init = np.array([1,1,1,1,1])
results = minimize(neg_log_likes, params_init, args=(df))
results.x

I am running the code above for Q11 and getting stuck on an error: the residuals look correct, but when they are put through the pdf, they all become 0 (which throws an error). Yet if I run:

I get real values.

Any ideas for why this may be?
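Two things stand out in pdf_residuals. First, the normal density's constant is 1/(σ√(2π)), not 1/(2πσ). Second, and more likely the cause of the zeros, with params_init all ones the fitted values (e.g. b1*age) sit far from sick, so e²/(2σ²) is in the hundreds or more and np.exp underflows to exactly 0; np.log(0) is then -inf. Working on the log scale avoids that. The sketch below uses scipy.stats.norm.logpdf and optimizes log σ so σ stays positive; the data frame is a hypothetical stand-in with the HW's column names.

```python
import numpy as np
import pandas as pd
from scipy import optimize, stats

def neg_log_likes(params, df):
    b0, b1, b2, b3, log_sigma = params
    sigma = np.exp(log_sigma)  # optimizing log(sigma) keeps sigma > 0
    e = df["sick"] - (b0 + b1 * df["age"] + b2 * df["children"] + b3 * df["avgtemp_winter"])
    # logpdf stays on the log scale: a large residual gives a large negative
    # number instead of exp(...) underflowing to 0 and log(0) = -inf
    return -stats.norm.logpdf(e, loc=0.0, scale=sigma).sum()

# Hypothetical stand-in data with the same column names as the HW
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.uniform(20, 60, n),
    "children": rng.integers(0, 4, n),
    "avgtemp_winter": rng.uniform(-10, 20, n),
})
df["sick"] = (0.5 + 0.01 * df["age"] + 0.3 * df["children"]
              - 0.02 * df["avgtemp_winter"] + rng.normal(0, 0.2, n))

params_init = np.array([1.0, 1.0, 1.0, 1.0, 0.0])  # log_sigma = 0 means sigma = 1
results = optimize.minimize(neg_log_likes, params_init, args=(df,))
b0, b1, b2, b3, log_sigma = results.x
sigma_hat = np.exp(log_sigma)
```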

HW1 Q7 [𝜇 𝟙] clarification

Just want to make sure: in Q7, after the formulas for the analytical solutions, you wrote "Note that [𝜇 𝟙] is an 𝑁×2 matrix." But I think the number of rows of [𝜇 𝟙] is k rather than N. Which is the correct number of rows?

Variable Length of Theta depending on N

In the first question when we are supposed to calculate log-likelihood, you write that the function we write should use fixed lengths of n=6 for both alpha and gamma but aren't the lengths of these vectors dependent on the number of firms? Is it implied that we treat all observations with 5 or more firms the same?

Separate Notebooks for HW2?

Since we reference 3 different notebooks for HW2, do we need to submit a separate notebook for each one? Or is it alright to submit a single notebook that combines all 3?

HW 2 Solutions Released, HW 3 Available

The solutions to HW 2 are available here. https://github.com/econ-21410/hw-jbejarano1/tree/master/hw-submit-02
Please review the solutions. This is material that may appear on the midterm.

Also, HW 3 is available now. You can find it here: https://github.com/jmbejara/comp-econ-sp19/tree/master/HW/hw-03 It is due this coming Monday. The material in HW 3 (panel data, fixed effects, etc.) will NOT be on the midterm.

HW3: accounting for missing data

I was just wondering how we should approach using a dataset to run a regression if there are missing values within it: should we simply use .dropna() to drop the NaN's in the dataset, or should we instead replace NaN with 0 while also using a dummy variable to indicate that the data value is missing (or is there another method we should follow)?
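The sketch below doesn't settle which approach the HW expects; it only shows how each option is set up with statsmodels formulas on simulated data (the variable names and the 30 missing values are hypothetical). Note that the formula API already drops rows with NaN by default, so option 1 is what you get if you do nothing.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=200)})
df["y"] = 1.0 + 2.0 * df["x"] + rng.normal(size=200)
df.loc[df.index[:30], "x"] = np.nan  # hypothetical: first 30 x values missing

# Option 1: listwise deletion (what dropna, and the formula default, do)
fit_drop = smf.ols("y ~ x", data=df.dropna()).fit()

# Option 2: fill with 0 and add a missingness dummy, keeping every row
df_ind = df.assign(x_missing=df["x"].isna().astype(int), x_filled=df["x"].fillna(0))
fit_ind = smf.ols("y ~ x_filled + x_missing", data=df_ind).fit()

print(fit_drop.nobs, fit_ind.nobs)
```

With a single regressor, the dummy absorbs the missing rows entirely, so the slope on x_filled matches the listwise-deletion slope; with several regressors the two approaches generally give different estimates.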

Monte Carlo Demo/Monte Carlo Hw Q1

I still do not understand the generating process for this chunk from monte_carlo_demo:

import numpy as np
import pandas as pd
import scipy.stats

def simulate_data_ex1(N=50, seed=65594):
    # Seed NumPy's global RNG (which scipy.stats also draws from) unless seed is None
    if seed is not None:
        np.random.seed(seed)
    cov = np.array([[2, 1],
                    [1, 2]])
    mean = np.array([0, 0])
    X = scipy.stats.multivariate_normal.rvs(mean=mean, cov=cov, size=N)
    epsilon = np.random.randn(N)  # same draws as randn(N, 1), but 1-D for the column
    beta0, beta1, beta2 = 0, 1, 1
    df = pd.DataFrame(X, columns=['x1', 'x2'])
    df['epsilon'] = epsilon
    df['y'] = beta0 + beta1 * df.x1 + beta2 * df.x2 + df.epsilon
    df = df[['y', 'x1', 'x2', 'epsilon']]
    return df

Why do we need a covariance matrix and a mean array? Why are they set to these particular numbers? In the homework, we are asked to generate data in a way similar to this, and I am not sure what to do other than to remove the x2 and beta2 terms from the code above. How are we supposed to account for the 𝑥1=𝛾0+𝛾1𝑧1+𝑢 equation in the code?
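multivariate_normal needs a mean vector and a covariance matrix because it draws several variables jointly; in the demo, cov = [[2, 1], [1, 2]] gives x1 and x2 variance 2 each and covariance 1, i.e. correlated regressors. For the IV setup, one way (a sketch, not the assignment's official solution) is to draw (z1, u, ε) jointly, then build x1 from the first stage 𝑥1=𝛾0+𝛾1𝑧1+𝑢 and y from x1. All parameter values here are hypothetical; cov(u, ε) ≠ 0 is what makes x1 endogenous, and z1 is uncorrelated with both.

```python
import numpy as np
import pandas as pd

def simulate_iv_data(N=1000, seed=None, gamma0=0.0, gamma1=1.0,
                     beta0=0.0, beta1=1.0, rho_ue=0.5):
    rng = np.random.default_rng(seed)
    # One covariance matrix for (z1, u, epsilon), drawn jointly.
    # z1 is uncorrelated with u and epsilon; cov(u, epsilon) = rho_ue.
    cov = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, rho_ue],
                    [0.0, rho_ue, 1.0]])
    z1, u, epsilon = rng.multivariate_normal(np.zeros(3), cov, size=N).T
    x1 = gamma0 + gamma1 * z1 + u       # first stage
    y = beta0 + beta1 * x1 + epsilon    # outcome equation
    return pd.DataFrame({"y": y, "x1": x1, "z1": z1, "u": u, "epsilon": epsilon})

df = simulate_iv_data(N=100_000, seed=0)
ols_slope = np.cov(df.y, df.x1)[0, 1] / np.var(df.x1, ddof=1)
iv_slope = np.cov(df.y, df.z1)[0, 1] / np.cov(df.x1, df.z1)[0, 1]
print(ols_slope, iv_slope)
```

With these values OLS should settle near 1 + 0.5 / (1 + 1) = 1.25 in large samples, while the IV ratio should settle near the true β1 = 1.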

Python Crash Course on Saturday, 12-2 pm in SHFE 103

Hi all. Our crash course introduction to the Python programming language will be on Saturday, 12-2 pm in SHFE 103. This will be a hands-on session. Please bring a laptop. We will teach you the basics of Python and how to use Jupyter notebooks.

Problem with HW 5 Visualization, Q14

I'm trying to understand plotnine's way of grouping and coloring points. I'm able to replicate Q14's intended diagram up to the legend, where whenever I try to add color = 'class' I get the error

ValueError: b'There are other near singularities as well. 0.65044\n'

Any ideas? My exact code is:

(plotnine.ggplot(data = mpg, mapping = plotnine.aes(x = 'displ', y = 'hwy', color = 'class'))
+ plotnine.geoms.geom_point()
+ plotnine.geom_smooth(method = 'loess'))

Hw 2 Chipotle #15

Is there a reason that chipo['item_price'].sum() returns the full list of item prices instead of summing them? As a workaround I used sum(map(float, chipo['item_price'])), which causes problems since now chipo['item_price'] is of type map.

Previously, this worked: chipo['quantity'].sum()

I think the issue is here (code for #12). This code worked, but I think it may have changed the type of item_price.

chipo['item_price'] = chipo['item_price'].apply(lambda x: x.replace('$','')) #deleting the $ sign
chipo.item_price.apply(lambda x: float(x)) #turning into float

Due to this, I am unable to solve #16 since I have an int and a map.

Thanks!
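Two things seem to be going on. pandas sums a column of strings by concatenating them, which is why .sum() doesn't return a number while item_price still holds strings. And the second line of the #12 code, chipo.item_price.apply(lambda x: float(x)), returns a new Series without storing it, so the column is never converted. Assigning the result back addresses both. The two rows below are hypothetical; .str.strip() is included in case prices carry stray spaces.

```python
import pandas as pd

chipo = pd.DataFrame({
    "item_name": ["Chips", "Barbacoa Bowl"],
    "quantity": [1, 2],
    "item_price": ["$2.39", "$11.75"],
})

# A string column "sums" by concatenation
print(chipo["item_price"].sum())  # $2.39$11.75

# .apply / .str methods return a NEW Series; assign it back to the column
chipo["item_price"] = (chipo["item_price"]
                       .str.replace("$", "", regex=False)
                       .str.strip()
                       .astype(float))
total = chipo["item_price"].sum()
```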

Probability limit

Hello,

I'm a bit confused about the term "probability limit" in Q3 of the Monte Carlo IV assignment. The question asks us to derive an expression for the bias of 𝛽1-hat in the probability limit, but I was under the impression that the probability limit of 𝛽1-hat matters mainly when considering the consistency of an OLS estimator. Is this not the case?
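A sketch of how the two ideas connect, assuming (these are assumptions, to be checked against the assignment) the model y = β0 + β1 x1 + ε with first stage x1 = γ0 + γ1 z1 + u, and z1 uncorrelated with both u and ε. "Bias in the probability limit" is then the gap between plim β̂1 and β1, which is exactly the inconsistency of OLS:

$$
\hat\beta_1 = \frac{\widehat{\operatorname{Cov}}(x_1, y)}{\widehat{\operatorname{Var}}(x_1)}
= \beta_1 + \frac{\widehat{\operatorname{Cov}}(x_1, \varepsilon)}{\widehat{\operatorname{Var}}(x_1)}
\;\xrightarrow{p}\;
\beta_1 + \frac{\operatorname{Cov}(x_1, \varepsilon)}{\operatorname{Var}(x_1)}
= \beta_1 + \frac{\operatorname{Cov}(u, \varepsilon)}{\gamma_1^2 \operatorname{Var}(z_1) + \operatorname{Var}(u)}
$$

The last step uses Cov(x1, ε) = γ1 Cov(z1, ε) + Cov(u, ε) = Cov(u, ε) and Var(x1) = γ1² Var(z1) + Var(u). The gap is nonzero whenever Cov(u, ε) ≠ 0, no matter how large the sample.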

Downloading a CSV from IPUMS

Hello,

I downloaded the dataset from IPUMS but I seem to only be able to download the file as a .dat. How do I download it as a csv instead? Thanks!
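If you end up with the .dat, it is a fixed-width text file: each variable occupies fixed character positions listed in the extract's codebook, and pandas can read it with read_fwf and write a CSV. The column layout and the two data lines below are hypothetical; substitute the positions from your own codebook and the path to your file.

```python
import io
import pandas as pd

# Hypothetical layout: the real variables and positions come from the codebook.
# read_fwf wants 0-based, end-exclusive (start, end) pairs, so if a codebook
# lists 1-based, inclusive columns 1-4, that becomes (0, 4).
colspecs = [(0, 4), (4, 6), (6, 9)]
names = ["YEAR", "AGE", "EDUC"]

# Stand-in for the downloaded file; with the real data, pass its path instead
raw = io.StringIO("201834073\n201851111\n")
df = pd.read_fwf(raw, colspecs=colspecs, names=names, header=None)

csv_text = df.to_csv(index=False)  # or df.to_csv("extract.csv", index=False)
print(df)
```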

Monte Carlo Data Generating

Hello, I'm having lots of issues with the Monte Carlo data generating process!

Are we supposed to set up two separate variance-covariance matrices for the conditions, or a single one? Also, I'm currently generating u, z, and epsilon separately as normally distributed variables, but I'm confused about how to make them jointly multivariate normal. Finally, how do we ensure that every pairwise combination not specified has a covariance of 0?
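One way to handle all three points (a sketch, not necessarily the intended solution): put every jointly normal variable into a single covariance matrix, fill in only the pairs the conditions specify, leave every other off-diagonal entry at 0, and make one multivariate normal draw. The helper name build_cov and the numbers are hypothetical.

```python
import numpy as np

def build_cov(names, variances, covariances):
    """Covariance matrix with the given variances on the diagonal, the listed
    pairwise covariances off the diagonal, and 0 for every pair not listed."""
    pos = {name: i for i, name in enumerate(names)}
    cov = np.diag([float(variances[name]) for name in names])
    for (a, b), c in covariances.items():
        i, j = pos[a], pos[b]
        cov[i, j] = cov[j, i] = c  # keep the matrix symmetric
    np.linalg.cholesky(cov)        # raises LinAlgError if not positive definite
    return cov

names = ["u", "z", "epsilon"]
# Only cov(u, epsilon) is specified, so cov(u, z) = cov(z, epsilon) = 0
cov = build_cov(names, {"u": 1.0, "z": 1.0, "epsilon": 1.0}, {("u", "epsilon"): 0.5})

# One joint draw: each row is a (u, z, epsilon) triple
rng = np.random.default_rng(0)
draws = rng.multivariate_normal(mean=np.zeros(len(names)), cov=cov, size=200_000)
u, z, epsilon = draws.T
```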

HW 2, chipotle (# 19)

The question statement is:

Select and return the rows corresponding to the Barbacoa Bowl. Give two reasons why the item price might vary.

I gave the answer

One reason for the price variation is that different bowls have different choices, and hence, have different final costs depending on the alterations.

Another reason is that there could be multiple Barbacoa Bowls ordered at once (in the same order), so item price would be larger.

However, I wanted to make sure that this is a legitimate claim (specifically the multiple Barbacoa Bowls part). Using

chipo[chipo['item_name'] == 'Barbacoa Bowl']['quantity'].unique()

I get output

array([1])

That implies all quantities are 1. So is my answer still legitimate? It doesn't seem supported by our data set, but I can't think of any other reason, other than heterogeneity across stores (there are 2 stores in our sample, but there is no data on each store's pricing).

Thanks!

HW2 Creating Reading Writing Q7

I'm having trouble with question 7 in the creating-reading-writing section of HW2. After downloading the data, I can't load the file as a CSV. Below is my code.

Slightly different counts

I am dropping too many observations from my dataframe during the Q3 recoding portion, which I think is throwing my values slightly off for the rest of the HW. Below is how I recoded the data. Is it different from how you did it?

HW 6 (SQL Part) - Feedback system marks results 'Incorrect' falsely

The feedback system used in the linked Kaggle notebooks sometimes marks answers incorrect even when they are correct. Moreover, the proposed solutions themselves often yield an 'Incorrect' result as well.

Also, I believe that the dataset referenced in the 'As & With' notebook has changed since the notebook was posted, so the Q6 and the proposed solution to it aren't relevant anymore.

HW3: Two Fixed Effects

I've found documentation in the 4/22 notes which showed how to add fixed effects in a regression. However, there seems to be little documentation (in or out of class notes) on how to account for two fixed effects at once, like problem 3 requires. Could someone point me towards resources/the correct way to group the data?
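One standard way is to add one set of dummies per fixed-effect dimension, e.g. C(firm) + C(year) in a statsmodels formula, which gives two-way fixed effects. The sketch uses a simulated firm-year panel; the column names and the true slope of 1.5 are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
firms = [f"f{i}" for i in range(20)]
years = list(range(2000, 2010))
panel = pd.DataFrame([(f, t) for f in firms for t in years], columns=["firm", "year"])

firm_fe = dict(zip(firms, rng.normal(size=len(firms))))
year_fe = dict(zip(years, rng.normal(size=len(years))))
# x is correlated with the firm effect, so omitting firm FE would bias the slope
panel["x"] = rng.normal(size=len(panel)) + panel["firm"].map(firm_fe)
panel["y"] = (1.5 * panel["x"] + panel["firm"].map(firm_fe) + panel["year"].map(year_fe)
              + rng.normal(scale=0.3, size=len(panel)))

# One dummy set per fixed effect: firm dummies and year dummies
twfe = smf.ols("y ~ x + C(firm) + C(year)", data=panel).fit()
print(twfe.params["x"])
```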

Submission channel

When I tried to submit HW 1, I did the following:

1. Clicked the "Push" button in GitKraken
2. Uploaded the ipynb file to the econ-21410/hw-sicelyli repository

Are both of these steps required for HW submission, or is step 2 enough? What happens when I push my completed HW using GitKraken? Thank you!

Monte Carlo graphing error

Hey, I have been trying to debug this for a while and am confused about why it throws an error. Unless I am missing something, this is the same code as in the lecture notebooks for running and displaying results at different sizes of N. Yet when I try to display it with seaborn, I get:

ValueError: color kwarg must have one color per data set. 1000 data sets and 1 colors were provided

The type of results.x1 is a series, which seems right. Any ideas?

Hw 5, Question 10 and 11

For #10, I am getting the error "invalid callable given", pointing to the integrate.quad lines. My code is:

f = pdf(x, mu_MLE, sig_MLE)
integrate.quad(f, 100000, np.inf)
integrate.quad(f, -np.inf, 75000)

where x = np.linspace(.001, 150_000, 100) (from Question 3), mu_MLE and sig_MLE come from optimize.minimize, and pdf is given by:

def pdf(xvals, mu, sigma):
    pdf_vals = (1 / (xvals * sigma * np.sqrt(2 * np.pi))
                * np.exp(-(np.log(xvals) - mu)**2 / (2 * sigma**2)))
    return pdf_vals
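For #10: f = pdf(x, mu_MLE, sig_MLE) is an array of density values, not a function, and integrate.quad needs a callable it can evaluate wherever it chooses. Passing pdf itself, with the extra parameters in args, fixes that. Also, the lognormal density is only defined for x > 0, so the lower integral should start at 0 rather than -inf. The values of mu_MLE and sig_MLE below are hypothetical stand-ins.

```python
import numpy as np
from scipy import integrate, stats

def pdf(xvals, mu, sigma):
    # Lognormal density, as in the question
    return (1 / (xvals * sigma * np.sqrt(2 * np.pi))
            * np.exp(-(np.log(xvals) - mu) ** 2 / (2 * sigma ** 2)))

mu_MLE, sig_MLE = 11.0, 0.5  # hypothetical stand-ins for the estimates

# Pass the function itself; quad calls pdf(x, *args) at the points it needs
upper_tail, _ = integrate.quad(pdf, 100_000, np.inf, args=(mu_MLE, sig_MLE))
lower_part, _ = integrate.quad(pdf, 0, 75_000, args=(mu_MLE, sig_MLE))

# Cross-check with scipy's lognormal (s = sigma, scale = exp(mu))
check_upper = stats.lognorm.sf(100_000, s=sig_MLE, scale=np.exp(mu_MLE))
print(upper_tail, check_upper, lower_part)
```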

For #11, I am also getting an error, "too many values to unpack (expected 2)", pointing to the results = opt.minimize line:

#initial guesses
beta_0 = 12
beta_age = 0.8 # sig_2
beta_children = 1
beta_temp = 2
sigma = 0.5
params_init2 = np.array([beta_0, beta_age, beta_children, beta_temp, sigma])
mle_args2 = df2[df2.columns[0]] #gets the sick column
results = opt.minimize(crit, params_init2, args=(mle_args2))
beta_0_MLE, beta_age_MLE, beta_children_MLE, beta_temp_MLE, sigma_MLE = results.x

Thanks for your help!

HW 3 Productivity-Monte-Carlo

I keep getting an error when using the fsolve function for Q1.

My code:

from scipy.optimize import fsolve

funct = lambda l_it: (1.0/0.3) * (gamma_i + rho * L_omega_it + 0.3 * k_it + 0.5 * sigma2epsilon - w_it + np.log(0.7)) - l_it

fsolve(funct, 4.0)

Error points to fsolve: TypeError: can't multiply sequence by non-int of type 'float'

I'm not sure what I am doing wrong here. I checked, and all the variables in funct are recognized as floats.
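That TypeError is what Python raises when a float multiplies a list or tuple (e.g. 0.7 * [0.2, 0.4]), so one of the inputs, perhaps L_omega_it, k_it, or w_it, may be a Python list even if its elements are floats. Converting the inputs to NumPy arrays makes the arithmetic elementwise. The input values below are hypothetical; in the HW they come from earlier cells.

```python
import numpy as np
from scipy.optimize import fsolve

# Hypothetical inputs; lists like these trigger the TypeError in arithmetic
gamma_i, rho, sigma2epsilon = 0.1, 0.7, 0.05
L_omega_it = [0.2, 0.4]
k_it = [1.0, 1.5]
w_it = [0.5, 0.6]

# Convert to arrays so * and + work elementwise
L_omega_it = np.asarray(L_omega_it, dtype=float)
k_it = np.asarray(k_it, dtype=float)
w_it = np.asarray(w_it, dtype=float)

def funct(l_it):
    return (1.0 / 0.3) * (gamma_i + rho * L_omega_it + 0.3 * k_it
                          + 0.5 * sigma2epsilon - w_it + np.log(0.7)) - l_it

# One starting value per unknown
l_star = fsolve(funct, x0=np.full(w_it.shape, 4.0))
print(l_star)
```

Since this equation is linear in l_it, its solution is just the bracketed term divided by 0.3, which is a handy check on fsolve.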

error reading df hw6

When I run this code from the HW:

url = 'https://raw.githubusercontent.com/jmbejara/comp-econ-sp18/master/lectures/5-08_Structural_IO_with_MLE/'
filename = 'BresnahanAndReiss1991_DATA.csv'
df = pd.read_csv(url + filename)

I get an HTTPError: HTTP Error 404: Not Found. What should the URL be instead?

HW 3 Indexing on Multiindex Dataframe

For Q8, I can't figure out how to group the dataframe by firm since 'firm' is an index name, not a column in the dataframe. I added a new column 'firm' to the dataframe that matches the firm for each data point. This was enough to run a fixed effects model, but I'm wondering if there is a much better way to approach this?
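pandas can group on an index level directly, so a separate 'firm' column isn't needed: df.groupby(level='firm') works, and df.groupby('firm') also accepts an index level name. For a formula-based regression, reset_index() turns the levels back into ordinary columns. The panel below, with levels firm and year, is hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

idx = pd.MultiIndex.from_product([["A", "B", "C"], range(2000, 2010)],
                                 names=["firm", "year"])
rng = np.random.default_rng(0)
df = pd.DataFrame({"x": rng.normal(size=len(idx))}, index=idx)
firm_effect = df.index.get_level_values("firm").map({"A": 0.0, "B": 1.0, "C": -1.0})
df["y"] = 0.5 * df["x"] + np.asarray(firm_effect) + rng.normal(scale=0.1, size=len(idx))

# 1) Group directly on the index level
firm_means = df.groupby(level="firm")["y"].mean()

# 2) Turn the index levels back into columns so a formula can use them
fe = smf.ols("y ~ x + C(firm)", data=df.reset_index()).fit()
print(firm_means, fe.params["x"])
```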

Can we set up our git, etc on VM?

For students coming from the CS 120s, can we simply clone the repository we create into our virtual machine? I am used to dealing with git from the command line in a linux environment so that may be easier for me plus I already have all the packages/ipython3 installed there.

Thanks!

General Question about GitKraken/Git

For class exercises (such as the portfolio_optimization) I have been just opening up the .ipynb file from the comp-econ-sp19 copy I made on my laptop. After saving my file, GitKraken labels this as a WIP.

I don't want to mess up the class's general central file (i.e. the one in the master copy) but I also want to get rid of the "WIP" in GitKraken without deleting my solution for portfolio_optimization.

Should I be staging my files without committing? Or committing without pushing? Or should I just copy the files to my own repository and save them there? Is there something convenient in Git for this, and what would be the right thing to do in this case?

Thanks in advance.

Update to HW 4

Hi all,

I've made some slight adjustments to HW 4. I've added some hints and removed a problematic question. Please make sure to download the updated version that is available now on GitHub.

Handling data that is "Missing or ___"

I'm working through the guided cleaning of the IPUMS CPS dataset and am removing missing/NIU values. For the variable 'EDUC', the documentation has the following codings:

002: None or preschool
000: NIU or no schooling

How should we treat codes like these, which might be coded as NaN but may also indicate real values?

Hw 1 Question 6 optimize.minimize

I am confused about what this question is asking: "use scipy.optimize.minimize to minimize the sum of squares, using the function that you wrote previously. Save the optimal parameters to the variable xstar."

Are you saying we should disregard yfunc, use minimize to find a new function that minimizes the sum of squares against ydata, and then print the value of that sum of squares?

In the previous problem, you imply that sum_squares(x) should return a single number: the sum of squares between yfunc and ydata, evaluated at x_initial. So I don't see how to tie that into this problem, since it seems to me we shouldn't be using yfunc at all.
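One reading of the question, sketched under assumptions: sum_squares(x) returns one number for a parameter vector x, and minimize searches over x for the value that makes that number smallest, so yfunc is still used inside sum_squares. The yfunc below (a·exp(b·t)) and the data are hypothetical stand-ins for the HW's.

```python
import numpy as np
from scipy import optimize

def yfunc(x, t):
    # Hypothetical model; the HW's yfunc may have a different form
    a, b = x
    return a * np.exp(b * t)

rng = np.random.default_rng(0)
tdata = np.linspace(0, 2, 50)
ydata = yfunc([2.0, 0.7], tdata) + rng.normal(scale=0.05, size=tdata.size)

def sum_squares(x):
    # One number: total squared gap between the model at parameters x and the data
    return np.sum((ydata - yfunc(x, tdata)) ** 2)

x_initial = np.array([1.0, 0.0])
res = optimize.minimize(sum_squares, x_initial)
xstar = res.x  # the optimal parameters
print(xstar, sum_squares(xstar))
```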

when is the final

Hey, when is the regular final going to be? I know a survey was sent out, but I don't think I remember hearing a conclusion on that (I need to buy plane tickets soon).

Convert int64 to datetime64-HW4

I'm having trouble converting the year column into datetime64[ns].
I tried to_datetime a couple of times with different units, but all of them seem to set the year to 1970.
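By default pd.to_datetime treats plain integers as nanoseconds since 1970-01-01, so a year like 2000 becomes a timestamp two microseconds after midnight on 1970-01-01. Converting the years to strings and parsing them with format='%Y' gives January 1 of each year instead. The sample years are made up.

```python
import pandas as pd

df = pd.DataFrame({"YEAR": [1990, 2000, 2018]})  # int64 years (made-up values)

# Parse as four-digit years rather than as nanosecond counts
df["date"] = pd.to_datetime(df["YEAR"].astype(str), format="%Y")
print(df.dtypes)
```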
