ietoolkit's Introduction

ietoolkit - Stata toolkit for analysis

Content

ietoolkit provides a set of commands that address different aspects of data management and data analysis. Many of the commands were initially conceived in the context of primary data and impact evaluations, but are implemented to be general and applicable to many other fields. Some of the commands are related to standardized best practices developed at DIME (The World Bank’s department for Impact Evaluations). For these commands, the corresponding help files provide justifications for the standardized best practices applied.

Command       Description
iebaltab      Produces balance tables with multiple groups or treatment arms
ieboilstart   Applies best practices for collaboration and reproducibility within a project
ieddtab       Runs a diff-in-diff regression and displays the baseline values, the two first differences and the second difference
iedropone     An extension of the command drop with features that prevent additional observations from being unintentionally dropped
iefolder      Sets up project folders and master do-files according to World Bank DIME's standards
iegitaddmd    Creates a placeholder file in subfolders of a GitHub repository folder, which allows committing folder structures with empty folders
iegraph       Generates graphs based on regressions with treatment dummies common in impact evaluations
iekdensity    Plots univariate kernel density estimates by treatment assignment
iematch       Matches base observations towards target observations based on a single continuous variable
iesave        Applies best practices before saving data, with an option to save a metadata report about the data saved
ietoolkit     Returns information on the version of ietoolkit installed

Install and Update

Installing published versions of ietoolkit

To install ietoolkit, type ssc install ietoolkit in Stata. This will install the latest published version of ietoolkit. The main version of the code in the repo (the master branch) is what is published on SSC as well.

If you think the version in this repo differs from the version installed on your computer, make sure both that you are looking at the master branch in this repo and that you have the most recent version of ietoolkit installed. To update all files associated with ietoolkit, type adoupdate ietoolkit, update in Stata. (It is wise to be in the habit of regularly checking whether any of the .ado files installed in Stata need updates by typing adoupdate.)
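As a minimal sketch of how to inspect the installed copy before comparing it with the master branch: which is a built-in Stata command, and ietoolkit is the package's own version command listed in the table above. The sketch lists whatever the command returns rather than assuming specific result names.

```stata
* Show where the installed ietoolkit.ado is located and its version line
which ietoolkit

* Run the package's own version command, then list whatever results it returns
ietoolkit
return list
```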

When we publish a new version of ietoolkit, there can be a discrepancy between the master branch and the version on SSC, as the master branch is updated a couple of days earlier. You can confirm whether that could be the case by checking if we recently published a new release.

Installing unpublished branches of this repository

Follow the instructions above if you want the most recent published version of ietoolkit. If you want a version of ietoolkit that is yet to be published, you can use the code below. The code installs the version currently in the master branch; replace master in the URL with the name of the branch you want to install from. You can also install older versions of ietoolkit this way, but only going back to January 2019, when we set up this method of installing the package.

    net install ietoolkit , from("https://raw.githubusercontent.com/worldbank/ietoolkit/master/src") replace

Requirements

Stata version 11 or later is required for this package of commands.

Background

These commands are developed by people that work at or with the Development Impact Evaluations (DIME) unit at the World Bank. While the commands are developed with best practices for impact evaluations in mind, these commands can be useful outside that field as well.

Bug Reports and Feature Requests

If you are familiar with GitHub go to the Contributions section below for advanced instructions.

An easy but still very efficient way to provide any feedback on these commands is to create an issue in GitHub. You can read issues submitted by other users or create a new issue in the top menu below worldbank/ietoolkit at https://github.com/worldbank/ietoolkit. While the word issue has a negative connotation outside GitHub, it can be used for any kind of feedback. If you have an idea for a new command, or a new feature on an existing command, creating an issue is a great tool for suggesting that. Please read already existing issues to check whether someone else has made the same suggestion or reported the same error before creating a new issue.

While we have a slight preference for receiving feedback here on GitHub, you are still very welcome to send a regular email with your feedback to [email protected].

Contributions

If you are not familiar with GitHub see the Bug reports and feature requests section above for a less technical but still very helpful way to contribute to ietoolkit.

GitHub is a wonderful tool for collaboration on code. We appreciate contributions directly to the code and will of course give credit to anyone providing contributions that we merge to the master branch. If you have any questions on anything in this section, please do not hesitate to email [email protected]. See CONTRIBUTING.md for more details on, for example, naming conventions.

The Stata files on the master branch are the files most recently released on the SSC server. README, LICENSE and similar files are updated directly to master in between releases. Check out any of the develop branches (if there are any) if you want to see what future updates we are currently working on.

Please make pull requests to the master branch only if you wish to contribute to README, LICENSE or similar metadata files. If you wish to make a contribution to any Stata file, then please do not use the master branch. If you wish to make a contribution to any Stata files that we have published at least once, then please fork from and make your pull request to the develop branch. The develop branch includes all minor edits we have made to already published commands since the last release, which we will include in the next version released on the SSC server. If your addition is related to a specific issue in this repository, then see the naming convention in the CONTRIBUTING.md file.

All Stata commands we are working on that we have yet to release a first version of are found in the branches called develop-NAME, where NAME corresponds to the working name of the command that is yet to be published. If you wish to contribute to any of those commands, then please fork from the branch of the command you want to contribute to, and only make edits to the .ado/.do and .sthlp files that correspond to that command. If you want to make contributions to multiple commands that have yet to be released, then you will have to fork from and make pull requests to multiple branches.

If you wish to make a contribution by making forks and pull requests but are not exactly sure how to do so, feel free to send an email to [email protected].

License

ietoolkit is developed under the MIT license. See http://adampritchard.mit-license.org/ or the LICENSE file for details.

Contact

DIME Analytics ([email protected])

About us

DIME is the World Bank's impact evaluation department. Part of DIME’s mission is to intensify the production of and access to public goods that improve the quantity and quality of global development research, while lowering the costs of doing IE for the entire research community. This Library is developed and maintained by DIME Analytics. DIME Analytics supports quality research processes across the DIME portfolio, offers public trainings, and develops tools for the global community of development researchers.

ietoolkit's People

Contributors

ankritisingh, avnish95, bbdaniels, denisseo, eatorange, kbjarkefur, luisesanmartin, luizaandrade, mariarrt94, mrimal, mruzzante, pythagoraswitch, roshni13khincha, saoriiwa

ietoolkit's Issues

iebaltab : issue in version 11

The import command does not work in Stata version 11.

Investigate whether we can substitute insheet for it, or whether we need to set ietoolkit to require version 13.

iebaltab : allow grpvar to be string

Simply encode it and do not allow any option that makes references to the group codes.

Options that will not be allowed if grpvar is a string variable:

  • control(groupcode)
  • order(groupcodelist)
  • grpcodes
  • grplabels(codetitles)

iebaltab: balance tests for different values of a categorical variable

I'm doing balance tests on a single variable for different values of a categorical variable, which means using different if statements. I couldn't find a quick way to do that and have them all in a single table. Is there a way to do that?

If not, maybe it could be interesting to add something like an over(varname) option? Or simply add append as an option when saving the file?

iebaltab : filepath issue in Mac

File paths in save() and savetex() do not work on Mac. The command saves the full path as the file name in the current directory.

iegitaddtxt : test new command

Please find iegitaddtxt.ado in the develop-iegitaddtxt branch. See iegitaddtxt.sthlp in the same branch for the helpfile.

@mrimal please test this command for bugs and errors

@luizaandrade or anyone else, feel free to test it too if you have some time to spare!

Report any bugs by writing to the branch (or creating a pull request) or by commenting in this thread, and I will fix them.

Manually naming duplicate reports

Is there a way to manually name the file of duplicate reports? Suppose we use the “ieduplicates” command for many different datasets; then we need to save each report separately.

iefolder : add readme file in folders

In addition to being a great way of documenting the intended use of the folder structure, it would also solve the problem of GitHub not syncing empty folders.

iefolder : put root file path in global_setup.do

In projects with many rounds there are many places where all users have to specify their root folder path.

By handling root folders in the global_setup.do file and calling that file in each master do-file, we limit the places where this has to be specified to one, and it won't conflict with the preference that led to issue #80.

iebaltab: Wrapping text when long varlabels are used

The command does not automatically wrap text when varlabels are especially long and the table is exported to LaTeX. This can be solved manually in LaTeX, but there is no option that does it through iebaltab.

Thanks.

iematch - variable labels for _match*

It would be great if there could be variable labels for the following variables that are generated once you run the iematch command:
_matchResult
_matchID
_matchDiff
_matchCount

iebaltab : file path in save() cannot have a . in the name

When we test which file extension is used in the save() option, we look for the first dot and compare the rest of the string to the allowed file extensions. However, it is possible to have a . in a folder name or elsewhere, meaning that there will be a . in the middle of the file path in save(). Then we are not comparing the file extension but a large part of the file path to the allowed file extensions, and we will always get an error.

Does this apply to savetex() as well?

This problem should also be the case in commands with similar code, for example iegraph.
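One possible approach, sketched below under stated assumptions, is to treat a dot as the start of the extension only if it is the last dot and it comes after the last slash. The path is hypothetical, and strrpos() requires Stata 14 or later, so this is an illustration of the logic rather than a drop-in fix for a package that supports Stata 11.

```stata
* Hypothetical file path with a dot in a folder name
local path "C:/my.project/output/balance_table.xlsx"

* Standardize backslashes (character 92) to forward slashes
local path = subinstr("`path'", char(92), "/", .)

* Position of the last slash and the last dot (0 if not found)
local last_slash = strrpos("`path'", "/")
local last_dot   = strrpos("`path'", ".")

* A dot only marks a file extension if it comes after the last slash
if `last_dot' > `last_slash' {
    local ext = substr("`path'", `last_dot', .)
}
else {
    local ext ""
}

* Displays .xlsx for the hypothetical path above
display "`ext'"
```

The resulting local can then be compared to the list of allowed extensions, regardless of any dots earlier in the path.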

two-way option "yaxis and barwidth" doesn't work for iegraph

I wanted to have a specific value labeled on the second y-axis on the graph and therefore tried to generate the second y-axis using yaxis(1 2). But an error happened saying "option yaxis() not allowed". The same situation happened to barwidth.

Thanks,
Wenqing

iebaltab : issue with escape character

When a label has a $ inside a word, as in Fanta$tic, the escape character does not work. When a number follows, as in $2000, it works.

I have only tested this in the table notes, not in any other label.

iegraph : test that valid dummies are used

In order for the graph to be valid each dummy used in varlist should be a dummy per treatment arm. And there can be anywhere from one to many treatment arms. Since no observation can be in two treatment arms, then there should be no overlap. And some observations, i.e. the control, should have zeros in all dummies. We should test that this is the case, currently we are only testing that they are dummies.

The only thing that makes this difficult is that we also want this command to be working for diff-in-diff regressions. In a diff-in-diff there are exactly three dummies. One of those dummies should be the product of the other two. And then there must be observations in all groups, [0,0], [0,1], [1,0] and [1,1].

And we need to have an error message that explains why we are doing this test.

We should do this, but it does not have to be done in the update we are planning to send at the end of July.
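The checks for the treatment-arm case described in this issue could be sketched as follows. This is a minimal illustration on a hypothetical dataset with two hypothetical dummies, treat1 and treat2; it does not cover the diff-in-diff case, and the error messages are placeholders.

```stata
* Hypothetical data: one control observation and one observation per arm
clear
input byte(treat1 treat2)
0 0
1 0
0 1
end

* Count how many treatment-arm dummies equal 1 for each observation
egen byte n_arms = rowtotal(treat1 treat2)

* No observation may be in more than one treatment arm
capture assert n_arms <= 1
if _rc {
    display as error "Some observations are in more than one treatment arm."
    error 198
}

* Some observations (the control group) must have zeros in all dummies
quietly count if n_arms == 0
if r(N) == 0 {
    display as error "No observation has zeros in all dummies, so there is no control group."
    error 198
}
```

With the hypothetical data above, both checks pass and no error is thrown.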

Add option for latex in iebaltab

This is on our list and will hopefully be included in the next version of ietoolkit. If you have resources for your favorite standardized format for a balance table, then please post them below. There are many different schools of thought on how tables should be formatted, and deciding on one of them will be what takes us the most time when adding a LaTeX option to iebaltab.

graph title format

I was trying to create a graph with a long title but the title cannot be shown in full length. So I wrote the code below:

iegraph treat, ///
title ("how easy or difficult is it to find out how the LGA is spending its budget" ///
, size(small) position(11))

But the title doesn't change at all. How can I adjust the size and style of the title?

Thanks,
Wenqing

iebaltab : in latex underscores in variable names are interpreted as subscript

If a variable name is used as a row title and that variable name includes an underscore, for example d_harverst, then the "h" will be rendered as a subscript to the "d".

The way to get around that now is to give the variable a variable label and use the option rowvarlabel, or to give it another label using the rowlabel() option. But if there is an easy way to include an escape character before the underscore, then that would be great.
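A minimal sketch of the escaping idea, assuming the row title is held in a hypothetical local called rowtitle before it is written to the .tex file (the actual local names in iebaltab may differ):

```stata
* Hypothetical row title taken from a variable name
local rowtitle "d_harverst"

* Put a backslash before every underscore so LaTeX prints it literally
local rowtitle = subinstr("`rowtitle'", "_", "\_", .)

display "`rowtitle'"
```

This sketch does not guard against underscores that are already escaped, which a real implementation would need to handle.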

Add all option to iegitaddmd.ado

Add option to add readme placeholder to every subfolder. Syntax would be:

(line 7) syntax , folder(string) [all]
(line 25) if (`"`flist'`dlist'`olist'"' == "") | ("`all'" == "all") { 

iegraph : yzero() does not work if the bars are negative

In a regression on a variable where the mean (for control) and the mean+beta (for treatment arms) are negative yzero() creates an error. yzero() looks for the maximum value to determine the length of the axis but that is only correct for positive bars. Negative bars should use the minimum value.

This is most likely easily solved with an if-statement based on a test if the values for the bars are positive or negative. The only special case is when some bars are positive and some are negative. But in that case yzero() does not make sense as the 0 on the y-axis is included anyways.

So the solution to this issue is that if yzero() is used, then we should test that the value for all bars (and confidence intervals if used) are all positive or negative. If they are not then throw an error saying that the options cannot be used. If error is not thrown, then test if they are all positive and use maximum, otherwise use minimum.
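The decision rule above could be sketched as follows. The list of bar values is hypothetical, and the local names are illustrative rather than taken from iegraph.ado; the sketch relies on Stata's min() and max() ignoring missing values.

```stata
* Hypothetical bar values (and confidence interval bounds, if used)
local values -4.2 -1.5 -3.0

* Find the smallest and largest value; min() and max() ignore missing values
local min_value = .
local max_value = .
foreach value of local values {
    local min_value = min(`min_value', `value')
    local max_value = max(`max_value', `value')
}

* Mixed signs: zero is already on the y-axis, so yzero() makes no sense
if `min_value' < 0 & `max_value' > 0 {
    display as error "yzero() cannot be used when values are both positive and negative."
    error 198
}
* All values positive: the axis length is determined by the maximum
else if `max_value' > 0 {
    local axis_bound `max_value'
}
* All values negative (or zero): the axis length is determined by the minimum
else {
    local axis_bound `min_value'
}

display "`axis_bound'"
```

For the hypothetical all-negative values above, the minimum is selected as the axis bound.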

iebaltab : option onerown in help file is onenrow in ado-file

The default setting is that the N is reported for each group for each variable. There is an option that puts the N at a row below all variables with just one N per group. This option is only possible if the N is the same within each group across all variables.

Currently the help file calls this onerown, but the code expects the name to be onenrow. What is the best name for this option? onerown or onenrow? Or is nonerow better? Or something completely different?

@luizaandrade, @eatorange, @morevba, @mrimal, do any of you have an opinion on what the best name for this option is?

ieduplicates : instead of requiring the daily folder, just create it

Instead of throwing an error or requiring nodaily to be turned on, just use the command mkdir to create it.

nodaily can be kept as an option, but instead of disabling the requirement, it should disable the creation of a new folder when the folder does not exist.

iefolder : create an option to organize rounds in sub-folders

In large and complex projects we might have 5 data sources in one round, and maybe 5 rounds. That creates too many folders. One solution would be to create a round called, for example, Baseline, which is only a folder and a master do-file, and then create folders inside that folder. The keyword round in the command would then be reserved for creating this type of folder, and a new keyword source would be used to create what is now called round. For a source, the keyword in lets you create that source inside the round specified. See an example of how it would be specified below.

*Create a new project
iefolder new project                        , projectfolder("$projectABC")
	
*Add a unit of observation
iefolder new unitofobs household 	    , projectfolder("$projectABC")
	
*Create a new round called baseline
iefolder new round Baseline	            , projectfolder("$projectABC") 

*Create two new sources in the Baseline folder
iefolder new source farmer_BL in Baseline   , projectfolder("$projectABC") abbreviation("farmerBL")
iefolder new source hhh_BL in Baseline	    , projectfolder("$projectABC") abbreviation("hhhBL")

*Create a new source that is not in Baseline
iefolder new source monitoring              , projectfolder("$projectABC") abbreviation("monit")

This is not a project for right now, but let's keep this as a place where we can discuss how it should work and how to implement it. So feel free to brainstorm features, challenges and solutions.

This idea came up in a discussion with @gbedoyaWB

iegitaddtxt : proofread helpfile

Please proofread helpfile iegitaddtxt.sthlp in branch develop-iegitaddtxt.

I will test the command a bit more and then afterwards ask you to also test it.

Thanks!

ieboilstart: Typo in help file

In "ieboilstart"'s help file, "custom" option refers to "example 4" as below,

custom(string) allows the user to add one or multiple custom lines of code. Each line of code should be seperated with a "@". See example 4 below for more details.

However, there's no example 4 in the help file. I believe it should be example 2, which shows an example of "custom" option as below.

ieboilstart, versionnumber(12.1) custom(ssc install estout @ ssc install winsor)
`r(version)'

Bug in syntax for ieboilstart

A test feature was incorrectly left in the syntax for ieboilstart. This bug was fatal but is now fixed for version 2.1 of ietoolkit. The update has been submitted to the server, and this issue will be closed as soon as the update is reflected on the server.

iegraph : change significance level on confidence interval

This is easy to implement. The solution is to not hard-code the value 1.96. (1.96 is only accurate in large samples anyway.)

This section:

scalar  conf_int_min_`var'   =	coeff_`var'-(1.96*coeff_se_`var') + ctl_mean
scalar  conf_int_max_`var'   =	coeff_`var'+(1.96*coeff_se_`var') + ctl_mean

Change to:

*from the regression:
local df = `e(df_r)'

*From option or set default
if "`confbarval'" == "" {
	local confbarval = .95
}

*Go to one tail value
local confbarval_1tail = ( `confbarval' + (1-`confbarval' ) / 2)

*Calculate t-stats
local tstats = invt(`df' , `confbarval_1tail'  )

*Use it in the old code
scalar  conf_int_min_`var'   =	coeff_`var'-(`tstats' * coeff_se_`var') + ctl_mean
scalar  conf_int_max_`var'   =	coeff_`var'+(`tstats' * coeff_se_`var') + ctl_mean

What requires a little bit more thinking is how to document this in the output.

iefolder : add unit globals to the round master do file

In order to make the round master do-files independent from the project master do-file, they also need to have the unit global section. This issue is related to issue #80.

It is not great to have to define this in multiple locations, but at least it is only in a few places.

Perhaps this can be solved by storing these in a separate file that is run in the different locations.

iebaltab : append tables

Allow appending tables exported from iebaltab the way, for example, estout does it.

Is that something we can implement? Working with and modifying different file formats can be tricky, but if someone sees a straightforward way to implement this, then I would be happy to include it.

I guess it would only apply to tables exported to Excel. Or could this make sense in LaTeX too?

Based on feedback from @aidancoville.
