wellbook

The wellbook concept is a single view of an oil well and its history, something akin to a "Facebook Wall" for oil wells.

This repo is built from data collected and made available by the North Dakota Industrial Commission.

I used the wellindex.csv file to obtain a list of well file numbers (file_no), scraped each well's Production, Injection, and Scout Ticket web pages along with any available LAS-format well log files, and loaded everything into HDFS (/user/dev/wellbook/) for analysis.
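As a rough illustration of the first step, here is a minimal Python 3 sketch that pulls the unique file numbers out of CSV text. It assumes the column header is literally `file_no`; the real wellindex.csv header may be spelled differently. The helper name `read_file_numbers` and the sample rows are hypothetical, and the scraping itself is left out.

```python
import csv
import io


def read_file_numbers(csv_text, column="file_no"):
    """Return the unique, non-empty well file numbers in first-seen order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    file_numbers = []
    for row in reader:
        # Assumption: the header row names this column "file_no".
        value = (row.get(column) or "").strip()
        if value and value not in seen:
            seen.add(value)
            file_numbers.append(value)
    return file_numbers


# Hypothetical sample rows; the real wellindex.csv has more columns.
sample = "file_no,well_name\n1001,ALPHA 1\n1002,BRAVO 2\n1001,ALPHA 1\n,MISSING\n"
print(read_file_numbers(sample))
```

Duplicate and blank file numbers are dropped so that each well would be scraped only once.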

To avoid the HDFS small files problem, I used the Apache Mahout seqdirectory tool to combine my text files into SequenceFiles: the keys are the filenames and the values are the contents of each text file.
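The key/value layout described above can be sketched in plain Python 3. This does not write the Hadoop SequenceFile binary format and is not what seqdirectory runs internally. It only shows the pairing of filename (key) and file contents (value) that lets many small files travel as one stream of records. The function name and the sample filenames are hypothetical.

```python
import os
import tempfile


def iter_key_value_records(root_dir):
    """Yield (filename, contents) pairs, one per text file under root_dir."""
    for dirpath, _dirnames, filenames in os.walk(root_dir):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            with open(path, "r", encoding="utf-8", errors="replace") as handle:
                yield name, handle.read()


# Demo on a throwaway directory holding two hypothetical scraped pages.
with tempfile.TemporaryDirectory() as tmp:
    for name, body in [("1001_production.html", "<html>a</html>"),
                       ("1002_production.html", "<html>b</html>")]:
        with open(os.path.join(tmp, name), "w", encoding="utf-8") as handle:
            handle.write(body)
    records = list(iter_key_value_records(tmp))

print(records)
```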

Then I used a combination of Hive queries and the pyquery Python library to parse relevant fields out of the raw HTML pages.
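To give a feel for the parsing step, the sketch below extracts table cells from an HTML page. It deliberately swaps pyquery for the standard-library `html.parser` so it runs without extra dependencies (Python 3). The markup, the class name `TableCellParser`, and the helper `parse_rows` are assumptions for illustration; the real NDIC pages and the repo's parsing code will differ.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Rows with no <td> cells (for example header rows) are skipped.
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def parse_rows(html):
    parser = TableCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


# Hypothetical markup; the real pages are laid out differently.
page = """
<table>
  <tr><th>Month</th><th>Oil</th><th>Gas</th></tr>
  <tr><td>1-2014</td><td> 1200 </td><td>340</td></tr>
  <tr><td>2-2014</td><td>1100</td><td>310</td></tr>
</table>
"""
print(parse_rows(page))
```

Each returned row is a list of trimmed strings, which maps naturally onto one delimited record per month for loading into a table such as wellbook.production.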

Tables:
wellbook.wells -- well metadata including geolocation and owner
wellbook.well_surveys -- borehole curve
wellbook.production -- how much oil, gas, and water was produced for each well on a monthly basis
wellbook.auctions -- how much was paid for each parcel of land at auction
wellbook.injections -- how much fluid and gas was injected into each well (for enhanced oil recovery and disposal purposes)
wellbook.log_metadata -- metadata for each LAS well log file
wellbook.log_readings -- sensor readings for each depth step in all LAS well log files
wellbook.log_key -- map of log mnemonics to their descriptions
wellbook.formations -- manually annotated map of well depths to rock formations
wellbook.formations_key -- descriptions of rock formations
wellbook.water_sites -- metadata for water quality monitoring stations in North Dakota

Setup:

git clone https://github.com/randerzander/wellbook

#Prereqs
sudo wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
sudo yum groupinstall -y 'development tools'
sudo yum install -y apache-maven mahout
#for python libs
sudo yum install -y python-devel libxslt-devel blas-devel lapack-devel gcc-gfortran
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64
echo 'export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk.x86_64' >> ~/.bashrc

#Download and install virtualenv
wget https://bootstrap.pypa.io/ez_setup.py
sudo python ez_setup.py
sudo easy_install pip
sudo pip install virtualenv

#Create a relocatable Python virtualenv
virtualenv ~/wellbook/pyenv
source ~/wellbook/pyenv/bin/activate
pip install pyquery numpy scipy scikit-learn
cp ~/wellbook/etl/lib/recordhelper.py ~/wellbook/pyenv/lib/python2.6/site-packages/
deactivate
virtualenv --relocatable ~/wellbook/pyenv

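#Clone repo $1 into $3/$2 and build it with Maven
function mvn_package(){
  #The subshell keeps the caller's working directory unchanged
  (
    git clone "$1" "$3/$2" &&
    cd "$3/$2" &&
    mvn package
  )
}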
#Download and build the custom Hive InputFormat
mvn_package https://github.com/randerzander/SequenceFileKeyValueInputFormat SequenceFileKeyValueInputFormat ~/wellbook

#Download and build necessary Hive UDFs
mkdir ~/wellbook/udfs
mvn_package https://github.com/Esri/spatial-framework-for-hadoop spatial-framework-for-hadoop ~/wellbook/udfs
mvn_package https://github.com/randerzander/CurveUDFs CurveUDFs ~/wellbook/udfs

#Download and build necessary Hive SerDes
mkdir ~/wellbook/serdes
mvn_package https://github.com/ogrodnek/csv-serde csv-serde ~/wellbook/serdes

cd ~/
#Sets up HDFS folder structure
sh ~/wellbook/scripts/hdfs_setup.sh
#Sets up Hive tables
sh ~/wellbook/scripts/hive_setup.sh
