jmbejara / comp-econ-sp19
Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)
I believe columns in the table should read contemporaneous correlation with employment growth, and lagged correlation with employment since you are running corrwith command on employment.pct_change().
Whenever I try to use 'method = loess' within geom_smooth() I receive the error "For loess smoothing, install 'scikit-misc'", but I already used the terminal to install 'scikit-misc' via 'pip install scikit-misc'. Is there something I'm missing such that 'loess' doesn't run?
I am trying to plot the requested images using a for loop and I can print everything except the last one. My code is:
values = [500,300,100,50,10]
for i in values:
    plt.imshow(compress(face,i), cmap=plt.cm.gray)
    plt.figure(i)
    plt.plot()
why can't I print the last image?
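A likely cause is the order of the calls: `plt.figure(i)` runs after `plt.imshow(...)`, so each image is drawn into the figure opened on the previous pass and the last figure opened stays empty. The sketch below creates the figure first and then draws into it. The `compress` function here is a hypothetical stand-in (a rank-k SVD approximation), `face` is a random placeholder array, and the values of k are smaller than in the question only because the placeholder image is small.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np


def compress(image, k):
    """Hypothetical stand-in for the homework's compress(): rank-k SVD approximation."""
    U, s, Vt = np.linalg.svd(image, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
face = rng.random((60, 80))  # placeholder for the real grayscale image

values = [50, 30, 10, 5, 1]
for k in values:
    plt.figure(k)                                    # open the figure first ...
    plt.imshow(compress(face, k), cmap=plt.cm.gray)  # ... then draw into it
    plt.title(f"k = {k}")

# Each figure should now hold exactly one image.
image_counts = [len(plt.figure(k).axes[0].images) for k in values]
```

In a Jupyter notebook with inline plotting, the five figures should render when the cell finishes, without the extra `plt.plot()` call.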
It might be a bit early, but do we know the date of the early final yet?
I made a typo in the due date for HW 0. It has now been fixed. HW 0 is due on April 8th, before 11:59 pm.
Note that you will submit the HW assignment to your personal repository on GitHub, as we discussed in class. However, you first need to create this repo. I've posted instructions on how to do this here. You can go ahead and try it out now. We'll address any questions you have with this in my office hours tomorrow or in class on Thursday.
From the numpy lecture, there seems to be some code we did not go over (the last 3 chunks of the numpy.ipynb lecture). Are we expected to know this for the midterm?
from numpy import cumsum
from numpy.random import uniform

class discreteRV:
    """
    Generates an array of draws from a discrete random variable with vector of
    probabilities given by q.
    """

    def __init__(self, q):
        """
        The argument q is a NumPy array, or array like, nonnegative and sums
        to 1
        """
        self.q = q
        self.Q = cumsum(q)

    def draw(self, k=1):
        """
        Returns k draws from q. For each such draw, the value i is returned
        with probability q[i].
        """
        return self.Q.searchsorted(uniform(0, 1, size=k))
"""
Modifies ecdf.py from QuantEcon to add in a plot method
"""
import numpy as np
import matplotlib.pyplot as plt

class ECDF:
    """
    One-dimensional empirical distribution function given a vector of
    observations.

    Parameters
    ----------
    observations : array_like
        An array of observations

    Attributes
    ----------
    observations : array_like
        An array of observations
    """

    def __init__(self, observations):
        self.observations = np.asarray(observations)

    def __call__(self, x):
        """
        Evaluates the ecdf at x

        Parameters
        ----------
        x : scalar(float)
            The x at which the ecdf is evaluated

        Returns
        -------
        scalar(float)
            Fraction of the sample less than or equal to x
        """
        return np.mean(self.observations <= x)

    def plot(self, a=None, b=None):
        """
        Plot the ecdf on the interval [a, b].

        Parameters
        ----------
        a : scalar(float), optional(default=None)
            Lower end point of the plot interval
        b : scalar(float), optional(default=None)
            Upper end point of the plot interval
        """
        # === choose reasonable interval if [a, b] not specified === #
        if a is None:
            a = self.observations.min() - self.observations.std()
        if b is None:
            b = self.observations.max() + self.observations.std()

        # === generate plot === #
        x_vals = np.linspace(a, b, num=100)
        f = np.vectorize(self.__call__)
        plt.plot(x_vals, f(x_vals))
        plt.show()
I'm following the code used in lecture for a fixed effects regression using statsmodels to generate the fixed effects regression for the SeatBelts dataset. My line of code is:
fixedeffectsfit = smf.mixedlm('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age', seatbelts, groups=seatbelts['state']).fit()
However, I'm getting the following error after running this code:
IndexError Traceback (most recent call last)
in ()
----> 1 fixedeffectsfit = smf.mixedlm('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + C(state)', seatbelts, groups=seatbelts['state']).fit()
~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in from_formula(cls, formula, data, re_formula, vc_formula, subset, use_sparse, *args, **kwargs)
918 exog_re=exog_re,
919 exog_vc=exog_vc,
--> 920 *args, **kwargs)
921
922 # expand re names to account for pairs of RE
~/anaconda3/lib/python3.6/site-packages/statsmodels/base/model.py in from_formula(cls, formula, data, subset, drop_cols, *args, **kwargs)
172 'formula': formula, # attach formula for unpckling
173 'design_info': design_info})
--> 174 mod = cls(endog, exog, *args, **kwargs)
175 mod.formula = formula
176
~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in init(self, endog, exog, groups, exog_re, exog_vc, use_sqrt, missing, **kwargs)
687
688 # Split the data by groups
--> 689 self.endog_li = self.group_list(self.endog)
690 self.exog_li = self.group_list(self.exog)
691 self.exog_re_li = self.group_list(self.exog_re)
~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in group_list(self, array)
976 if array.ndim == 1:
977 return [np.array(array[self.row_indices[k]])
--> 978 for k in self.group_labels]
979 else:
980 return [np.array(array[self.row_indices[k], :])
~/anaconda3/lib/python3.6/site-packages/statsmodels/regression/mixed_linear_model.py in (.0)
976 if array.ndim == 1:
977 return [np.array(array[self.row_indices[k]])
--> 978 for k in self.group_labels]
979 else:
980 return [np.array(array[self.row_indices[k], :])
IndexError: index 556 is out of bounds for axis 1 with size 556
I'm not really sure what this error is referring to since there's nothing with index 556 in the dataset or regression specifications.
When I instead use (found on this site:
fitFE1 = smf.ols('fatalityrate ~ sb_useage + speed65 + speed70 + ba08 + drinkage21 + np.log(income) + age + C(state)', seatbelts).fit()
I get a working regression. Can I get some help as to what I should be looking to fix for the first line of code and if there is something wrong with the second method to generate fixed effects?
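On the second question, a sketch of why the `C(state)` version is a legitimate fixed effects regression: adding one dummy per state to OLS gives the same slope estimate as demeaning every variable within state (the within estimator). The example below uses a synthetic panel and plain NumPy least squares, both assumptions for illustration rather than the SeatBelts data or the lecture code. As for the `IndexError`, one possibility worth checking (not confirmed in this thread) is that the formula drops rows with missing values while `groups=seatbelts['state']` keeps the full length, so the group labels no longer line up with the model's rows.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic panel (made up for illustration, not the SeatBelts data).
n_states, n_years = 5, 40
state = np.repeat(np.arange(n_states), n_years)
alpha = rng.normal(size=n_states)                 # state-specific intercepts
x = rng.normal(size=state.size) + alpha[state]    # regressor correlated with the effects
y = alpha[state] + 2.0 * x + rng.normal(scale=0.1, size=state.size)
df = pd.DataFrame({"state": state, "x": x, "y": y})

# (1) Dummy-variable regression: one indicator per state, no common intercept.
dummies = pd.get_dummies(df["state"], dtype=float).to_numpy()
X_dummy = np.column_stack([df["x"].to_numpy(), dummies])
beta_dummy = np.linalg.lstsq(X_dummy, df["y"].to_numpy(), rcond=None)[0][0]

# (2) Within transformation: subtract each state's mean, then regress y on x.
demeaned = df[["x", "y"]] - df.groupby("state")[["x", "y"]].transform("mean")
beta_within = (demeaned["x"] @ demeaned["y"]) / (demeaned["x"] @ demeaned["x"])
```

Both `beta_dummy` and `beta_within` should agree up to floating-point error and land near the true slope of 2.0 used to generate the data.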
Hi! I'm trying to do Q3-Q5 on HW3's productivity section. I'm right now doing something that I feel is inefficient, i.e.:
df['y'].loc[0].plot()
df['y'].loc[1].plot()
df['y'].loc[2].plot()
but my issue is that this seems really inefficient. Is there some better way of doing this? I tried to read through the help for the df.plot() function here but it didn't seem to help me too much.
Thanks!
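One way to avoid repeating the `.loc[...]` line per firm, assuming the DataFrame has a two-level index with the firm as the first level: `unstack` moves the firm level into the columns, so a single plot call can draw every firm. The index names (`firm`, `t`) and the data below are assumptions for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Made-up panel: 3 firms observed over 50 periods.
index = pd.MultiIndex.from_product([range(3), range(50)], names=["firm", "t"])
df = pd.DataFrame({"y": rng.normal(size=len(index)).cumsum()}, index=index)

# Rows become time periods and columns become firms.
wide = df["y"].unstack(level="firm")
```

Calling `wide.plot()` then draws one line per column, that is, one line per firm, with a legend built from the firm labels.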
When I print my notebook to PDF (everything has run and looks right), some of the code block output does not show up or is cut off by the next block. Any ideas why this may be, and/or should I not worry about this since the ipynb is also being submitted? Thanks!
How do you construct the variance-covariance matrix in the function that generates data?
My code for #24 does not seem to be working:
shift = -1
inner_means = (df
    .groupby(by=['YEAR', 'age_binned', 'educ_binned'])
    .apply(lambda x: np.average(x['real_wage'], weights=x.ASECWT))
)
pd.DataFrame(inner_means)
# Create Bin Weight Sums
weights_2000 = (df[df.YEAR == 2000]
    .dropna()
    .groupby(by=['YEAR','age_binned', 'educ_binned'])
    .ASECWT
    .sum())
adj_series = (inner_means
    .groupby(level='YEAR')
    .apply(lambda x: np.average(x, weights=weights_2000)))
# Lag, since we use "last years weeks worked", etc.
adj_series = adj_series.tshift(shift)
tdf['adj_ave_wages'] = adj_series
I suspect it has to do with the bolded part.
.groupby(by=['YEAR','age_binned', 'educ_binned'])
In that line, I've also tried including only 'age_binned' and 'educ_binned', and when doing that, or doing the code shown above, I get an error: Axis must be specified when shapes of a and weights differ. This points to the .apply(lambda x: np.average... line.
Additionally, what exactly are we supposed to compute for the employment variable in #11? I used in_labor_force and used np.average for this, but not sure if this is what you wanted.
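On the `np.average` error: it is raised when the values and the weights have different shapes, which can happen here if some year has a different set of (age, education) bins than 2000. A minimal sketch of one way around it, assuming the goal is a fixed-weight average using 2000 bin weights: index the weights by bin only, then align them to each year's bins by label before averaging. The toy data and the helper name `fixed_weight_mean` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the CPS extract (values are made up).
df = pd.DataFrame({
    "YEAR":        [2000, 2000, 2000, 2001, 2001],
    "age_binned":  ["25-34", "25-34", "35-44", "25-34", "35-44"],
    "educ_binned": ["HS", "College", "HS", "HS", "HS"],
    "real_wage":   [10.0, 20.0, 12.0, 11.0, 13.0],
    "ASECWT":      [1.0, 3.0, 2.0, 1.0, 1.0],
})
bins = ["age_binned", "educ_binned"]

# Weighted mean wage within each (year, bin) cell.
df = df.assign(wage_x_weight=df["real_wage"] * df["ASECWT"])
cells = df.groupby(["YEAR"] + bins)[["wage_x_weight", "ASECWT"]].sum()
inner_means = cells["wage_x_weight"] / cells["ASECWT"]

# Fixed weights: total ASECWT per bin in 2000, indexed by bin only (no YEAR level).
weights_2000 = df[df["YEAR"] == 2000].groupby(bins)["ASECWT"].sum()


def fixed_weight_mean(year_means):
    """Average one year's bin means using the 2000 weights, aligned by bin label."""
    means = year_means.droplevel("YEAR")
    w = weights_2000.reindex(means.index).fillna(0.0)
    return np.average(means.to_numpy(), weights=w.to_numpy())


adj_series = inner_means.groupby(level="YEAR").apply(fixed_weight_mean)
```

A bin that is missing in a given year simply drops out of that year's average, and the remaining 2000 weights are renormalized by `np.average`.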
We are given to use M=100, but you haven't mentioned any specifications on N and T to use with gen_all to create different data. Should we go ahead and continue to use gen_all(N=3,T=50) as in previous examples or should we loop over gen_all(N=1,T=1) 100 times?
I was working through last year's midterm and I ran into bug where I got a type error whenever I tried to use pivot tables. When I ran the given solutions, I got the same error:
TypeError: Cannot cast array data from dtype('float64') to dtype('<U32') according to the rule 'safe'
This was from question 6 and 8 from last year's midterm. I am running this on jupyter notebook
Hello, I'm slightly confused about the definitions of variables in the data. Can we find Z and W in the data, and what variables represent them? Also in the definition of F_N in the writeup, what exactly does W_L represent? I'm confused what the L subscript means. Thanks!
def get_e(params, df):
    b0, b1, b2, b3, sigma = params
    e = df['sick'] - (b0 + b1*df['age'] + b2*df['children'] + b3*df['avgtemp_winter'])
    # e is an array
    return e

def pdf_residuals(params, e):
    b0, b1, b2, b3, sigma = params
    likelihoods = (1 / (2 * np.pi * sigma)) * np.exp(-e**2 / (2 * sigma**2))
    return likelihoods

def neg_log_likes(params, df):
    e = get_e(params, df)
    likes = pdf_residuals(params, e)
    log_likes = np.log(likes)
    neg_sum = -(log_likes.sum())
    return neg_sum

params_init = np.array([1,1,1,1,1])
results = minimize(neg_log_likes, params_init, args=(df))
results.x
I am running the code above for Q11 and am stuck on an error: the residuals look correct, but when they are put through the pdf, they all become 0 (which throws an error). Yet if I run:
I get real values.
Any ideas for why this may be?
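One plausible explanation is floating-point underflow: with all starting parameters equal to 1, the residuals can be large, `np.exp(-e**2 / (2 * sigma**2))` rounds to exactly 0, and `np.log(0)` is `-inf`. A common workaround is to write the log-density directly instead of taking the log of the density. The sketch below uses made-up residuals. Note also that the normal density's normalizing constant is `1 / (sigma * sqrt(2 * pi))`, which differs from the `1 / (2 * pi * sigma)` in the posted code.

```python
import numpy as np


def normal_logpdf(e, sigma):
    """Log of the N(0, sigma^2) density evaluated at residuals e."""
    return -0.5 * np.log(2 * np.pi) - np.log(sigma) - e**2 / (2 * sigma**2)


e = np.array([0.5, -30.0, 60.0])  # made-up residuals, some far from zero
sigma = 1.0

# Naive route: compute the density, then take logs.
density = np.exp(-e**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
with np.errstate(divide="ignore"):
    naive = np.log(density)        # the density underflows to 0 for e = 60

# Stable route: work in logs from the start.
stable = normal_logpdf(e, sigma)
```

The negative log-likelihood is then `-normal_logpdf(e, sigma).sum()`, which stays finite even for poor starting values.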
Just want to make sure that in Q7, after the formulas for the analytical solutions, you said that "Note that [๐ ๐] is an ๐ร2 matrix." But I think the number of rows for [๐ ๐] is k rather than N. Which is the correct row number?
In the first question when we are supposed to calculate log-likelihood, you write that the function we write should use fixed lengths of n=6 for both alpha and gamma but aren't the lengths of these vectors dependent on the number of firms? Is it implied that we treat all observations with 5 or more firms the same?
Since there are 3 different notebooks we reference for HW2, do we need to submit a different notebook corresponding to each one? Or is it alright if we submit a single notebook that combines the 3?
The solutions to HW 2 are available here. https://github.com/econ-21410/hw-jbejarano1/tree/master/hw-submit-02
Please review the solutions. This is material that may appear on the midterm.
Also, HW 3 is available now. You can find it here: https://github.com/jmbejara/comp-econ-sp19/tree/master/HW/hw-03 It is due on the coming Monday. The material in HW 3 (panel data, fixed effects, etc) will NOT be on the midterm.
I was just wondering how we should approach using a dataset to run a regression if there are missing values within it: should we simply use .dropna() to drop the NaN's in the dataset, or should we instead replace NaN with 0 while also using a dummy variable to indicate that the data value is missing (or is there another method we should follow)?
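The two options named in the question can be sketched as follows. This only shows the mechanics in pandas on a made-up DataFrame; which approach is appropriate depends on why the values are missing and is not settled here.

```python
import numpy as np
import pandas as pd

# Made-up data with missing regressors.
df = pd.DataFrame({
    "y":  [1.0, 2.0, 3.0, 4.0],
    "x1": [0.5, np.nan, 1.5, 2.0],
    "x2": [1.0, 1.0, np.nan, 0.0],
})

# Option 1: listwise deletion, restricted to the columns used in the regression.
complete = df.dropna(subset=["y", "x1", "x2"])

# Option 2: zero-fill plus a missing-value indicator for each affected regressor.
flagged = df.copy()
for col in ["x1", "x2"]:
    flagged[f"{col}_missing"] = flagged[col].isna().astype(int)
    flagged[col] = flagged[col].fillna(0.0)
```

Passing `subset=` to `dropna` avoids discarding rows because of missing values in columns the regression never uses.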
I still do not understand the generating process for this chunk from monte_carlo_demo:
import numpy as np
import pandas as pd
import scipy.stats

def simulate_data_ex1(N=50, seed=65594):
    # Seed the global NumPy generator only when a seed is given
    if seed is not None:
        np.random.seed(seed)
    # Covariance matrix of (x1, x2): variances on the diagonal, covariance off it
    cov = np.array([[2,1],
                    [1,2]])
    mean = np.array([0, 0])
    X = scipy.stats.multivariate_normal.rvs(mean=mean, cov=cov, size=N)
    epsilon = np.random.randn(N,1)
    beta0, beta1, beta2 = 0, 1, 1
    df = pd.DataFrame(X, columns=['x1', 'x2'])
    df['epsilon'] = epsilon
    df['y'] = beta0 + beta1 * df.x1 + beta2 * df.x2 + df.epsilon
    df = df[['y', 'x1', 'x2', 'epsilon']]
    return df
Why do we need a covariance matrix and a mean array? Why are they set to these particular numbers? In the homework, we are asked to generate data in a way similar to this, and I am not sure what to do other than to remove the x2 and beta2 terms from the code above. How are we supposed to account for the $x_1 = \gamma_0 + \gamma_1 z_1 + u$ equation in the code?
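In the lecture function, the mean array sets the means of (x1, x2) and the covariance matrix sets their variances (the diagonal) and their covariance (the off-diagonal), so the two regressors are drawn as correlated normals. For the homework's first-stage equation, one possible structure is sketched below: draw the instrument and the error terms first, then build x1 and y from them. The function name, all parameter values, and the choice to correlate u with epsilon are assumptions for illustration, not the assignment's specification.

```python
import numpy as np
import pandas as pd


def simulate_iv_data(N=50, seed=65594, gamma0=0.0, gamma1=1.0,
                     beta0=0.0, beta1=1.0, cov_u_eps=0.5):
    """Hypothetical DGP: x1 = gamma0 + gamma1*z1 + u, y = beta0 + beta1*x1 + epsilon."""
    rng = np.random.default_rng(seed)
    # (u, epsilon) are drawn jointly so x1 is endogenous; z1 is independent of both.
    cov = np.array([[1.0, cov_u_eps],
                    [cov_u_eps, 1.0]])
    u, epsilon = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=N).T
    z1 = rng.normal(size=N)
    x1 = gamma0 + gamma1 * z1 + u        # first-stage equation
    y = beta0 + beta1 * x1 + epsilon     # outcome equation
    return pd.DataFrame({"y": y, "x1": x1, "z1": z1, "u": u, "epsilon": epsilon})


df = simulate_iv_data(N=1000)
```

The key difference from the lecture code is that x1 is no longer drawn directly: it is computed from z1 and u, which is what makes z1 usable as an instrument.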
Hi all. Our crash course introduction to the Python programming language will be on Saturday, 12-2 pm in SHFE 103. This will be a hands-on session. Please bring a laptop. We will teach you the basics of Python and how to use Jupyter notebooks.
I'm trying to understand plotnine's way of grouping and coloring points. I'm able to replicate Q14's intended diagram up to the legend, where whenever I try to add color = 'class'
I get the error
ValueError: b'There are other near singularities as well. 0.65044\n'
Any ideas? My exact code is:
(plotnine.ggplot(data = mpg, mapping = plotnine.aes(x = 'displ', y = 'hwy', color = 'class'))
+ plotnine.geoms.geom_point()
+ plotnine.geom_smooth(method = 'loess'))
Is there a reason that chipo['item_price'].sum() returns the full list of item prices instead of summing them? Instead, I have used sum(map(float,chipo['item_price'])), which causes problems since now chipo['item_price'] is of type map.
Previously, this worked: chipo['quantity'].sum()
I think the issue is here (code for #12) . This code worked but I think it may have changed the type for item_price.
chipo['item_price'] = chipo['item_price'].apply(lambda x: x.replace('$','')) #deleting the $ sign
chipo.item_price.apply(lambda x: float(x)) #turning into float
Due to this, I am unable to solve #16 since I have an int and a map.
Thanks!
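Two things are probably at play here. Summing a column of strings concatenates them, which is why `.sum()` appears to return the full list of prices. And `apply` returns a new Series rather than changing the column in place, so the float conversion has no effect unless its result is assigned back. A minimal sketch on made-up rows:

```python
import pandas as pd

# Made-up rows in the same format as the Chipotle data.
chipo = pd.DataFrame({
    "quantity": [1, 2, 1],
    "item_price": ["$2.39 ", "$16.98 ", "$8.75 "],
})

# Strip the dollar sign and whitespace, convert, and assign back to the column.
chipo["item_price"] = (
    chipo["item_price"].str.replace("$", "", regex=False).str.strip().astype(float)
)
total = chipo["item_price"].sum()
```

Once the column is numeric, expressions that combine it with `quantity` should work without `map`.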
Hello,
I'm a bit confused about the use of the term "probability limit" in question Q3 of the Monte Carlo IV assignment: the question asks us to derive the expression for the bias of the estimate of $\hat{\beta}_1$ in the probability limit, but I was under the impression that the probability limit of $\hat{\beta}_1$ is more of a concern when considering the consistency of an OLS estimator. Is this not the case?
Hello,
I downloaded the dataset from IPUMS but I seem to only be able to download the file as a .dat. How do I download it as a csv instead? Thanks!
Hello, I'm having lots of issues with the Monte Carlo data generating process!
Are we supposed to set up two separate variance covariance matrices for the conditions, or a single one? Also I'm currently generating u, z and epsilon separately as normally distributed variables, but I'm a bit confused about how to make jointly multivariate normal. Finally, how do we ensure that all pairwise combinations not specified have covariances of 0?
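A single covariance matrix is enough to make the variables jointly normal: put the variances on the diagonal, fill in the covariances the model specifies, and leave every other off-diagonal entry at 0. The sketch below orders the variables as (z, u, epsilon). The variances and the 0.5 covariance between u and epsilon are placeholder values, not the ones from the assignment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Variable order: (z, u, epsilon). All numbers are placeholders.
cov = np.diag([1.0, 1.0, 1.0])   # start with zero covariance everywhere
cov[1, 2] = cov[2, 1] = 0.5      # fill in only the pair the model specifies

# One joint draw per row; columns follow the order above.
draws = rng.multivariate_normal(mean=np.zeros(3), cov=cov, size=100_000)
z, u, epsilon = draws.T

# The sample covariance should be close to the matrix we asked for.
sample_cov = np.cov(draws, rowvar=False)
```

Comparing `sample_cov` with `cov` is a quick check that unspecified pairs, such as z with u and z with epsilon, really are uncorrelated in the simulated data.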
The question statement is:
- Select and return the rows corresponding to the Barbacoa Bowl. Give two reasons why the item price might vary.
I gave the answer
One reason for the price variation is that different bowls have different choices, and hence, have different final costs depending on the alterations.
Another reason is that there could be multiple Barbacoa Bowls ordered at once (in the same order), so item price would be larger.
However, I wanted to make sure that this is a legitimate claim (specifically the multiple Barbacoa Bowls part). Using
chipo[chipo['item_name'] == 'Barbacoa Bowl']['quantity'].unique()
I get output
array([1])
Which implies that all quantities are 1. So is my answer still legit? It doesn't seem supported by our data set, but I can't think of any other possible reason for this, other than heterogeneity of stores (there are 2 stores in our sample, but there is no data on each store's individual pricing).
Thanks!
The feedback system used in the linked Kaggle notebooks will sometimes mark answers incorrect even when they are correct. Moreover, using the proposed solutions will often yield an 'Incorrect' result as well.
Also, I believe that the dataset referenced in the 'As & With' notebook has changed since the notebook was posted, so the Q6 and the proposed solution to it aren't relevant anymore.
I've found documentation in the 4/22 notes which showed how to add fixed effects in a regression. However, there seems to be little documentation (in or out of class notes) on how to account for two fixed effects at once, like problem 3 requires. Could someone point me towards resources/the correct way to group the data?
When I tried to submit hw1, I did:
Hey, I have been trying to debug this for a bit and am confused as to why it is throwing an error. Unless I am missing something, this is the same code as with the lecture notebooks for running/displaying results at different sizes of N. Yet when I try to display it with the seaborn library it gives:
ValueError: color kwarg must have one color per data set. 1000 data sets and 1 colors were provided
The type of results.x1 is a series, which seems right. Any ideas?
where x= np.linspace(.001, 150_000, 100) (Question 3)
mu_MLE and sig_MLE are generated from optimize.minimize
pdf is given by:
def pdf(xvals, mu, sigma):
    pdf_vals = (1/(xvals * sigma * np.sqrt(2 * np.pi))
                * np.exp( - (np.log(xvals) - mu)**2 / (2 * sigma**2)))
    return pdf_vals
Thanks for your help!
I keep getting an error when using the fsolve function for Q1.
My code:
from scipy.optimize import fsolve
funct = lambda l_it: (1.0/0.3) * (gamma_i + rho * L_omega_it + 0.3 * k_it + 0.5 * sigma2epsilon - w_it + np.log(0.7)) - l_it
fsolve(funct, 4.0)
Error points to fsolve: TypeError: can't multiply sequence by non-int of type 'float'
I am not sure what I am doing wrong here? I checked and all the variables in funct are recognized as floats.
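This particular message is what Python raises when a list, tuple, or string is multiplied by a float, so one thing worth checking is whether any of the inputs is a sequence rather than a scalar (for example a one-element list). The sketch below reproduces the message with made-up values and lists the names that are not plain numbers. The dictionary and the helper `non_scalars` are hypothetical, not part of the homework.

```python
# Made-up values; k_it is deliberately a one-element list.
params = {"gamma_i": 0.1, "rho": 0.9, "k_it": [2.0], "w_it": 1.0}


def non_scalars(values):
    """Return the names whose values are not plain int/float scalars."""
    return [name for name, v in values.items() if not isinstance(v, (int, float))]


try:
    0.3 * params["k_it"]   # float times list reproduces the error
    message = ""
except TypeError as err:
    message = str(err)

offenders = non_scalars(params)
```

If a variable turns out to be a list, converting it with `float(value[0])` or `np.asarray(value)` before building the function is one way to proceed.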
When I run this code from the HW:
url = 'https://raw.githubusercontent.com/jmbejara/comp-econ-sp18/master/lectures/5-08_Structural_IO_with_MLE/'
filename = 'BresnahanAndReiss1991_DATA.csv'
df = pd.read_csv(url + filename)
I get a HTTPError: HTTP Error 404: Not Found error. What should the url be instead?
For Q8, I can't figure out how to group the dataframe by firm since 'firm' is an index name, not a column in the dataframe. I added a new column 'firm' to the dataframe that matches the firm for each data point. This was enough to run a fixed effects model, but I'm wondering if there is a much better way to approach this?
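pandas can group on an index level directly, so the extra column is not strictly needed. A minimal sketch on a made-up panel, assuming the index levels are named `firm` and `year`:

```python
import numpy as np
import pandas as pd

# Made-up panel with firm and year in the index.
index = pd.MultiIndex.from_product([["A", "B"], [2001, 2002, 2003]],
                                   names=["firm", "year"])
df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]}, index=index)

# Group on the index level by name.
firm_means = df.groupby(level="firm")["y"].mean()

# The same grouping gives the within (demeaned) variable for fixed effects.
y_within = df["y"] - df.groupby(level="firm")["y"].transform("mean")

# reset_index() turns the levels into ordinary columns when a formula needs them.
flat = df.reset_index()
```

Adding a `firm` column, as described in the question, gives the same groups; `reset_index()` is just a shorter way to get there.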
For students coming from the CS 120s, can we simply clone the repository we create into our virtual machine? I am used to dealing with git from the command line in a linux environment so that may be easier for me plus I already have all the packages/ipython3 installed there.
Thanks!
For class exercises (such as the portfolio_optimization) I have been just opening up the .ipynb file from the comp-econ-sp19 copy I made on my laptop. After saving my file, GitKraken labels this as a WIP.
I don't want to mess up the class's general central file (i.e. the one in the master copy) but I also want to get rid of the "WIP" in GitKraken without deleting my solution for portfolio_optimization.
Should I be staging my files without committing? Or committing without pushing? Or just copy the files to my own repository and save them there? Is there something convenient in Git that I can do, or what would be the right thing to do in this case?
Thanks in advance.
Hi all,
I've made some slight adjustments to HW 4. I've added some hints and removed a problematic question. Please make sure to download the updated version that is available now on GitHub.
I'm working through the guided cleaning of the IPUMS CPS dataset and am removing missing/NIU values. For the variable 'EDUC', the documentation has the following codings:
002: None or preschool
000: NIU or no schooling
How should we treat codes like these that might be coded as NaN but may also be indicating real values?
I am confused as to what this question is asking: "use scipy.optimize.minimize to minimize the sum of squares, using the function that you wrote previously. Save the optimal parameters to the variable xstar."
Are you saying to disregard yfunc, and to use the minimize function to find a function that will minimize the sum of squares between ydata and this new function, and then to print the value of this sum of squares?
In the problem before, you imply that sum_squares(x) should return a number indicating the value of the sum of squares between yfunc and ydata, when given x_initial. Thus, I do not see how to tie this into the problem above, as it does not seem to me that we should be using yfunc at all.
Hey, when is the regular final going to be? I know a survey was sent out, but I don't think I remember hearing a conclusion on that (I need to buy plane tickets soon).