Giter VIP home page Giter VIP logo

dod-over-budget-prediction's Introduction

Estimating Defense Contract Overages and Termination Risks with Supervised Learning

Overview/Synopsis

With an annual budget in excess of $500 Billion, the U.S. Department of Defense (DOD) is one of the single largest money-spenders in the world. Each year, they award billions of dollars in contracts to research, develop, build, procure, or otherwise produce materials and technology to promote the public defense. These contracts can take vastly different forms and terms, encompassing everything from next-generation weapon systems to replenishing office supplies.

Given the sheer scope and variety of these contacts, it can be difficult for those making procurement decisions to reliably know what new contracts are the best deals, and which have the highest risk. Using a record of completed DoD contracts from the past 16 years, we use a Random Forest model to see what combination of contract features (e.g. duration, cost, competitive bids) are the most powerful predictors for whether a given new contract will a) exceed its original budget (i.e. a raised cost ceiling), and/or b) be terminated before completion.

Use

A new or proposed contract will be evaluated based on the conditions known at the time of selection, including proposed amount, number of competitive bids, type of work, the prime contractor, etc. Then, two separate Random Forest models (with the same features) that have each been trained on a set of fully completed contracts predict the likelihood that the new contract will fall into one of the two risk categories.

Data Used

The data for this project originally come from the Federal Procurement Data System, available from USAspending.gov. These data consist of tens of millions of line items, recording every instance of an action on federal contracts: Starts, modifications, task orders, and terminations. The data are available from Fiscal Year 2000. New entries appear daily, but are not made publically available until 30 days after the contract award (90 days for DoD). For our analysis, the data will be aggregated to the contract level. Each contract will then have its own characteristics including:

  • Prime Contractor Name and Details
  • Contract Start Date and Projected Completion Date
  • Contract Budget Amount (base and all options)
  • Total amount spent on contract over its full run
  • Contract pricing type (e.g. Fixed Price, Cost-based, Combination)
  • Price and performance incentives associated with the contract
  • Whether a contract was intended for a single purpose, or as a vehicle for multiple task orders
  • Whether a contract was made available for competitive bidding
  • Number of offers received for the contract, if it was bid competitively
  • Which component of DoD awarded the contract (e.g. Army, Navy, Air Force)
  • Whether any of the contract performance took place outside the U.S.
  • Whether the contract was for Products, Services, or R&D

To ensure that all data points are on a similar level of data completeness, we will only include in our analysis contracts that have already concluded, either by completion or cancellation. This will help to avoid the training uncertainty of "contracts that might go over budget but haven't yet."

Usage

All analysis, from exploratory data analysis to model training and predictions, is contained within the file "Defense Contract Prediction.R"

Later versions may include additional code to directly query the raw database for more precise variable aggregation.

Progress Log

  • (April 2017) Identified data source
  • (April 2017) Identified candidate dependent and independent variables
  • (April 2017) Procured codebook descriptions for component variables
  • (May 2017) Narrowing dataset to population of key analysis
  • (May 2017) Recoded variables for more precision and detail
  • (May 2017) Model specification and training for supervised learning algorithm
  • (May 2017) Project results for test group to verify model predictive power
  • (May 2017) Analyze relative importance/predictive power of component variables
  • (May 2017) Produce report for model and results

Credits

Project Team: Bryan Baird, Loren Lipsy, Tommie Thompson -- Georgetown University

Preliminary data processing based on the work of Greg Sanders at the Center for Strategic and International Studies

License

Licensed under GNU General Public License v3.0

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.