cap_1's Introduction


The Toxics Release Inventory (TRI)

Contains EPA Data on Chemical Fates within US Industry

 

Motivation and Background...

The United States Environmental Protection Agency (EPA) is charged with keeping our air, waterways, and ground free from harmful chemicals. The Agency maintains a database that records the "fate" of industrial chemicals used by US-based companies, so that potentially harmful chemicals can be tracked.

  • Data on the amounts of each chemical traveling in any of the 60+ streams are delivered through self-reporting.
  • Complete sets are available for years 1987 - 2016, inclusive.

The publicly available repository contains over 6 million records on more than 600 unique chemicals traversing various pathways. The data are highly organized, but comparisons can be obscured: some categories overlap, some combine multiple fates, and some fates appear only in later reports.

Plenty of ponderables served to motivate this study, such as:

  • Do the fates of chemicals evolve predictably and reliably over time?
  • Can identified trends be correlated with enhanced Federal and/or local environmental regulations? Might we become able to better predict the effect of future proposed regulation?
  • Are the early data detectably padded because of the very lax reporting standards that, anecdotally, led to over-reporting of certain hazardous chemicals?
  • Are trends toward more society-wide recycling apparent in industry as well?

The primary goal was to plot a few fates of select chemicals in order to test for industry-wide evolution in chemical fate. Quantification would involve hypothesis testing.

As the database contains no information regarding environmental regulation, regression analysis to model the effects of regulations would require merging these data with a secondary regulations database.

 

Investigation...

The primary data source is the EPA's own website, where a single csv contains all of the records. A secondary copy is hosted on Kaggle.com, where the records are split into 29 separate files organized by calendar year.

Data were downloaded from both sites. However, the csv file from the EPA site was corrupted and, when imported as a Pandas dataframe, did not reliably return complete records. As an alternative, we reconstructed a single database by reading each Kaggle file into a Pandas dataframe, merging the dataframes, and saving the result as a single csv file. Data values were kept as found throughout the project. At nearly one GB in size, the single file was too large to manage comfortably; we addressed the following issues while reorganizing the data:

  • We shrank the data by dropping 39 of the 109 columns, namely those dealing with company info, EPA-specific chemical classification, or other EPA geographical site-related information. We kept all of the numeric columns associated with chemical fate.
  • We found many rows with NaN values. Because these appeared to result from the expansion of reporting options since the original design, no data imputation was warranted at this early stage of exploration.
  • We then split the dataframe into 613 dataframes, each of which housed the data for a unique chemical, and stored these as csv files. Each dataframe contained the 66 columns related to fate, and the file names were coded according to chemical name, so that querying any given chemical could be largely automated. A lookup function initially queried the full file for chemical names and stored them in a separate small file to be read at the start of each session.
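The reorganization steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column name `CHEMICAL`, the helper names, the lookup file name, and the directory layout are all assumptions, and the real TRI headers may differ.

```python
from pathlib import Path
import re

import pandas as pd

# Hypothetical column name; the real TRI header for the chemical may differ.
CHEMICAL_COL = "CHEMICAL"


def merge_yearly_files(src_dir: Path) -> pd.DataFrame:
    """Concatenate every yearly csv found in src_dir into one dataframe."""
    frames = [pd.read_csv(p, low_memory=False) for p in sorted(src_dir.glob("*.csv"))]
    return pd.concat(frames, ignore_index=True)


def safe_name(chemical: str) -> str:
    """Turn a chemical name into a file-system-friendly token."""
    return re.sub(r"[^A-Za-z0-9]+", "_", chemical).strip("_").upper()


def split_by_chemical(df: pd.DataFrame, out_dir: Path, drop_cols=()) -> list:
    """Drop unwanted columns, then write one csv per chemical plus a name lookup file.

    NaN values are left untouched (no imputation). Two chemicals whose names
    map to the same safe_name() would overwrite each other; a real pipeline
    would need to guard against that.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    slim = df.drop(columns=list(drop_cols), errors="ignore")
    names = []
    for chemical, group in slim.groupby(CHEMICAL_COL):
        group.to_csv(out_dir / f"{safe_name(chemical)}.csv", index=False)
        names.append(chemical)
    # Small lookup file so later sessions need not scan the full dataset.
    pd.Series(names, name=CHEMICAL_COL).to_csv(out_dir / "_chemical_names.csv", index=False)
    return names
```

With this layout, a later session could load a single chemical with `pd.read_csv(out_dir / f"{safe_name('BENZENE')}.csv")` instead of reading the full merged file.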

 

Start the EDA...

[Figure: Some of the 630 chemicals in TRI]

[Figure: New individual csv files, named by chemical]

[Figure: Fate classifications]

[Figure: Cross section of fate data for Benzene]

[Figure: All chemicals]

 

Graphing by the 1000's...

A few graphs showing fate over time

  • Benzene, on-site release
  • Benzene, total releases
  • Creosote, on-site recycling
  • Ethylene in air
  • Methanol in air
  • MEK in air
  • Toluene in air

 

Gathering Data...

To test the proposed hypothesis, "Fates of chemicals do not change over time across US industry", we next sought to gather fate data for the compelling case of FUGITIVE AIR emissions, which appeared to be diminishing over time, especially for volatile chemicals. We divided the emissions data for this fate into two temporal groupings: 1987-2001 and 2002-2016.

The distribution graphs of emission amounts that follow were meant to show whether an effect is worth investigating empirically.

Had this worked well, we could have selected other fates and attempted to discover which ones change over time.

For starters, the admittedly selective data (emission amounts up to 100,000 pounds) for select volatile chemicals reveal diminished fates in fugitive and stack air emissions. (We also included styrene in water, for additional comparison.)
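One way to quantify the comparison between the two periods is sketched below. The source does not say which statistical test was intended, so this uses a simple two-sided permutation test on the difference of means as a stand-in. The function names, the `(year, pounds)` record layout, and the constants are assumptions made for illustration only.

```python
import random
from statistics import mean

SPLIT_YEAR = 2002     # first year of the later period (1987-2001 vs. 2002-2016)
MAX_POUNDS = 100_000  # cap matching the selective view described above


def split_periods(records, max_pounds=MAX_POUNDS):
    """Split (year, pounds) records into early and late lists.

    Missing amounts and amounts above max_pounds are skipped.
    """
    early, late = [], []
    for year, pounds in records:
        if pounds is None or pounds > max_pounds:
            continue
        (early if year < SPLIT_YEAR else late).append(pounds)
    return early, late


def permutation_p_value(early, late, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Null hypothesis: the period label carries no information, i.e. the fate
    does not change over time.
    """
    rng = random.Random(seed)
    observed = abs(mean(early) - mean(late))
    pooled = list(early) + list(late)
    n_early = len(early)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_early]) - mean(pooled[n_early:]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

A small p-value would argue against the hypothesis that the fate is unchanged between periods. Because emission amounts are typically heavily skewed, a rank-based statistic or a comparison of medians may be a better choice than means in practice.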

 

Distribution graphs by chemical and fate:

  • M-Xylene, fugitive
  • M-Xylene, stack
  • Methylene, fugitive
  • Methylene, stack
  • Styrene, fugitive
  • Styrene, water
  • Toluene, fugitive
  • Toluene, stack
  • Xylene, fugitive
  • Xylene, stack

 

Results...

The data are interesting in that emission fates exhibit marked variability over time for many of the chemicals. However, without significant further numerical analysis, we cannot test the data using regression models in an attempt to ascertain true correlation between the various fates.

 

Looking Ahead...

We know that the chemical fate numbers in this database are intrinsically related, since the database affords a complete accounting for the measured chemicals. If one fate decreases as a percentage of the total, then one or more others must compensate. Further, there may be correlation with outside forces, such as environmental regulations. We may be able to develop a model that predicts the change in fates based upon regulation, for instance.
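The accounting constraint can be illustrated with a short sketch. The fate names and amounts below are invented purely for illustration and are not taken from the TRI data; `fate_shares` is a hypothetical helper.

```python
def fate_shares(amounts):
    """Convert absolute amounts per fate into fractions of the total."""
    total = sum(amounts.values())
    if total == 0:
        raise ValueError("total amount is zero; shares are undefined")
    return {fate: qty / total for fate, qty in amounts.items()}


# Invented example amounts (pounds) for one chemical in two periods.
early = fate_shares({"fugitive_air": 40.0, "stack_air": 40.0, "recycling": 20.0})
late = fate_shares({"fugitive_air": 10.0, "stack_air": 40.0, "recycling": 50.0})

# Shares always sum to 1, so the changes in share must sum to 0:
# a drop in one fate is offset by gains elsewhere.
changes = {fate: late[fate] - early[fate] for fate in early}
```

Because the shares are constrained to sum to one, any correlation analysis between fates would need to account for this built-in dependence.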

 

Data Sources...

The US EPA maintains the TRI database.
Kaggle.com maintains a copy of the original.
With gratitude, we accessed both.

cap_1's People

Contributors

kevhandel
