
justingosses / predictatops

58 stars · 9 watchers · 17 forks · 93.3 MB

Stratigraphic pick prediction via supervised machine-learning

Home Page: https://justingosses.github.io/predictatops/html/index.html

License: MIT License

Languages: Makefile 0.97% · Python 93.84% · Jupyter Notebook 5.18%
Topics: geology, stratigraphy, geoscience, machine-learning, hackathon-project, well-logs, dataunderground, athabasca, athabasca-preprocessed

predictatops's Introduction

predictatops

Code for stratigraphic pick prediction via supervised machine-learning

[logo image: Yale-Peabody-Triceratops-004Trp]

[badges: DOI · License: MIT]

THIS REPOSITORY HAS BEEN ARCHIVED TO SIGNIFY THERE WILL NOT BE ADDITIONAL WORK. HOWEVER, IT HAS ALWAYS BEEN A PROOF OF CONCEPT OF AN APPROACH RATHER THAN A TOOL, SO YOUR USE OF IT SHOULD NOT REALLY CHANGE. YOU CAN STILL STAR OR FORK IT.

Status: Runs and is ready for others to try. This code project is most useful as a working proof-of-concept. It is not optimized for plug-and-play use or for use as a dependency. Updated to v0.0.4-alpha October 26th, 2019. Updates to dependencies are done, but not frequently. NOTE: Running in a standard Google Colab notebook may fail during model training because the memory required exceeds the default initial amount of RAM.

Current best RMSE on the Top McMurray surface is 6.6 meters.

Related Content

The docs provide additional information beyond this README.

This code is the subject of an abstract submitted to the AAPG ACE convention in 2019.

The slides I presented at AAPG ACE 2019 are available in PDF form. They give an introduction to the theory and thought process behind Predictatops.

Development was originally in the MannvilleGroup_Strat_Hackathon repo but has moved here as the code gets cleaned and modularized. This project is under active development, and a few portions of the code still exist only in the MannvilleGroup_Strat_Hackathon repo at this time. This is a nights-and-weekends side project, but it will continue to be developed by the main developer.

A more non-coder-friendly description of the work can be found in this blog post.

Philosophy

In human-generated stratigraphic correlation there is often talk of lithostratigraphy vs. chronostratigraphy. We propose there is a loose analogy between that distinction and the different methods of computer-assisted stratigraphy. Some past efforts, which work very well under certain circumstances, are similar to lithostratigraphy in what they accomplish: they match curve patterns between neighboring wells and rely on the assumption that changes in lithology, i.e. curve shapes, are equivalent to stratigraphy.

Other papers attempt to use code to correlate well logs on the assumption that stratigraphic surfaces have a mathematical or pattern basis that can be teased out of individual logs. Although recent papers seem to do better with this type of approach, no code was released, and the earlier efforts had problems that appear at least in part related to their assumption that stratigraphic changes have similar expression across large spatial areas.

In contrast to lithostratigraphy, chronostratigraphy assumes that lithology maps to facies belts that shift gradually in space over time, so lithology is not directly correlated with time: two wells with similar lithology patterns can sit in different time packages. When not otherwise constrained by biostratigraphy, chemostratigraphy, or radiometric dating, traditional chronostratigraphy relies on models of how facies belts should change in space.

Instead of relying on stratigraphic models, this project proposes that known picks can define the spatial distribution of, and variance in, well log curve patterns, which are then used to predict picks in new wells. The project focuses on creating programmatic features and operations that mimic the low-level observations of a human geologist, then progressively builds them into the higher-order clustering of patterns across many wells that a human geologist would otherwise perform.
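As a concrete illustration of what those "low-level observations" might look like in code, here is a minimal sketch of rolling-window features computed on a single hypothetical log curve; the column names and window sizes are assumptions, not the repository's actual feature set:

```python
import pandas as pd

def add_local_curve_features(df, curve="GR", windows=(5, 11, 21)):
    """Rolling-window statistics on one log curve, meant to mimic the
    small-scale observations a geologist makes while scanning a well log.
    `df` is assumed to have one row per depth step; `curve` is hypothetical."""
    for w in windows:
        rolled = df[curve].rolling(window=w, center=True, min_periods=1)
        df[f"{curve}_mean_{w}"] = rolled.mean()           # local average level
        df[f"{curve}_std_{w}"] = rolled.std()             # local variability
        df[f"{curve}_slope_{w}"] = df[curve].diff(w) / w  # coarse up/down trend
    return df
```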

Datasets

The default demo dataset is a collection of over 2000 wells made public by the Alberta Geological Survey (part of the Alberta Energy Regulator). To quote their webpage, "In 1986, Alberta Geological Survey began a project to map the McMurray Formation and the overlying Wabiskaw Member of the Clearwater Formation in the Athabasca Oil Sands Area. The data that accompany this report are one of the most significant products of the project and will hopefully facilitate future development of the oil sands." It includes well log curves as LAS files and tops in .txt and .xls files. A Word doc and a text file describe the files and associated metadata.

Wynne, D.A., Attalla, M., Berezniuk, T., Brulotte, M., Cotterill, D.K., Strobl, R. and Wightman, D. (1995): Athabasca Oil Sands data McMurray/Wabiskaw oil sands deposit - electronic data; Alberta Research Council, ARC/AGS Special Report 6.

Please go to the links below for more information and the dataset:

Report for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/document/OFR/OFR_1994_14.PDF

Electronic data for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/publications/SPE_006.html The data is also in the SPE_006_originalData folder of the original repo for this project, here.

In the metadata file SPE_006.txt, the dataset is described as having Access Constraints: Public, and Use Constraints: Credit to originator/source required. Commercial reproduction not allowed.

The latitude and longitude of the wells are not in the original dataset. @dalide used the Alberta Geological Survey's UWI conversion tool to find lat/longs for each of the well UWIs. A CSV with the coordinates of each well's location can be found here. These were then used to find each well's nearest neighbors.
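For readers who want to reproduce the neighbor lookup, a minimal sketch is below. It assumes a CSV named well_lat_lng.csv with UWI, lat, and lng columns (hypothetical names) and uses scikit-learn's BallTree with a haversine metric; the repository's own neighbor-finding code may differ:

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import BallTree

wells = pd.read_csv("well_lat_lng.csv")  # hypothetical filename and columns

# haversine expects (lat, lng) in radians; distances come back in radians too
coords = np.radians(wells[["lat", "lng"]].to_numpy())
tree = BallTree(coords, metric="haversine")

dist, idx = tree.query(coords, k=9)      # each well plus its 8 nearest neighbors
dist_km = dist * 6371.0                  # radians -> kilometers via Earth's radius
neighbor_uwis = wells["UWI"].to_numpy()[idx[:, 1:]]  # drop the self-match in column 0
```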

Please note that there are a few malformed .LAS files in the full dataset, so the code in this repository skips them.
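A minimal sketch of that skip-on-failure pattern, assuming the lasio library and the SPE_006_originalData folder layout (the repository's actual loader may differ):

```python
import glob
import lasio  # assumption: any LAS reader with a read() entry point fits the same pattern

curves_by_well = {}
skipped = []
for path in glob.glob("SPE_006_originalData/**/*.LAS", recursive=True):
    try:
        las = lasio.read(path)
        curves_by_well[path] = las.df()  # curves as a depth-indexed DataFrame
    except Exception as exc:             # malformed files raise assorted parse errors
        skipped.append((path, str(exc)))

print(f"loaded {len(curves_by_well)} wells, skipped {len(skipped)} malformed files")
```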

If for some reason the well data is not found at the links above, you should be able to find it here.

Architecture and Abstraction

Please refer to the Architecture and Abstraction section in the docs; it provides information on code architecture, tasks, and folder organization.

Getting Started

See the Usage and Installation sections of the docs.

Credits

There's a theme here. Check the docs.


Status

The root mean squared error for the Top McMurray surface is down to ~7 meters (with a handful of wells, ~8% depending on settings, identified as too difficult to predict and excluded).

Distribution of absolute error in the test portion of the dataset for the Top McMurray surface, in meters. The Y-axis is the number of picks in each bin; the X-axis is the distance the predicted pick is off from the human-generated pick.

[image: current_errors_TopMcMr_20190517]

The algorithm currently used is XGBoost.
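The repository's actual training lives in its train module; the sketch below only shows the general shape of an XGBoost regression scored with RMSE, with stand-in random data in place of the real per-well features and pick labels, and placeholder hyperparameters:

```python
import numpy as np
import xgboost as xgb
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Stand-in data; in the real pipeline these would be features and
# distance-to-pick labels assembled by the earlier steps.
rng = np.random.default_rng(0)
X = rng.random((2000, 20))
y = rng.random(2000) * 100.0

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBRegressor(n_estimators=300, max_depth=6, learning_rate=0.1)
model.fit(X_train, y_train)

rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
print(f"RMSE: {rmse:.1f} m")
```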

predictatops's People

Contributors: bluetyson, dependabot[bot], justingosses

predictatops's Issues

missing / in path to CSV for picks

Describe the bug
When trying to clone the package into Colab and run all_runner.py, it seems to fail on importing the PICKS file because a / is missing in https://github.com/JustinGOSSES/predictatops/blob/master/predictatops/configurationplusfiles.py on the line

```
self.picks_dic = self.data_directory + "OilSandsDB/PICKS.TXT"
```

which should be (adding the missing /)

```
self.picks_dic = self.data_directory + "/OilSandsDB/PICKS.TXT"
```

To Reproduce
Clone into Colab and follow the normal instructions in the docs.

Expected behavior
The PICKS file import shouldn't fail.

Desktop (please complete the following information):
colab 2020-05-01

Reminder to fix this tomorrow on a different computer.

something in steps 1-3 stores the path to files instead of filenames; may have been an accidental change during documentation?

Describe the bug
A user reported via email a bug in the import script: the intersection of file names and well names fails because the full path, rather than just the filename, is used for the names of wells to import against.

To Reproduce
Steps to reproduce the behavior:

  1. Run notebook for first 3 steps
  2. Try to use import_runner.py on the results of notebook for first three steps.

Expected behavior
The comparison operation should find approximately 1200 well names that match well files.
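A guess at the kind of fix involved, with hypothetical variable names: reduce each path to its bare filename before intersecting with the well-name list.

```python
from pathlib import Path

# Hypothetical illustration: intersecting full paths against bare well
# names matches nothing, so strip paths down to filenames first.
paths_of_las_files = ["data/OilSandsDB/Logs/00-01-01-073-05W4-0.LAS"]
wells_with_picks = ["00-01-01-073-05W4-0"]

names_from_files = {Path(p).stem for p in paths_of_las_files}
matched = names_from_files & set(wells_with_picks)  # non-empty once paths are stripped
```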


Desktop (please complete the following information):
unknown

Additional context
If you emailed me on this issue, please add more context below! I'll try to get to this within the next week.

UMAP for visualization & feature creation step

Cluster wells using unsupervised learning and then see if clusters can be created that correlate with supervised prediction results. (Initial trials with UMAP give encouraging results.)
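A minimal sketch of that idea, assuming the umap-learn package and stand-in per-well features (none of this code ships in the repo):

```python
import numpy as np
import umap  # umap-learn; an assumption, since the repo has no UMAP code yet
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Stand-in for a (n_wells, n_features) matrix of per-well summary features
X = np.random.default_rng(42).random((300, 12))

X_scaled = StandardScaler().fit_transform(X)
embedding = umap.UMAP(n_components=2, n_neighbors=15, random_state=42).fit_transform(X_scaled)

# Cluster in the embedded space, then check whether cluster membership
# correlates with per-well error from the supervised model.
clusters = KMeans(n_clusters=5, n_init=10, random_state=42).fit_predict(embedding)
```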

add example of mapping various attributes to plot.py

Add examples of mapping various attributes to plot.py; see the sketch after this list:

  • Error
  • Range of depth of top X number of depth predictions by probability
  • Actual top depth
  • Unit thickness
  • max or min statistic of certain curve type
  • availability of different curve lists
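One possible shape for such an example, with assumed column names ('lat', 'lng', and a per-well attribute like 'abs_error_m'); this is not the actual plot.py API:

```python
import matplotlib.pyplot as plt

def map_attribute(wells, attribute, cmap="viridis"):
    """Scatter-map wells by location, colored by any per-well attribute.
    `wells` is assumed to be a DataFrame with 'lat' and 'lng' columns."""
    fig, ax = plt.subplots(figsize=(6, 6))
    sc = ax.scatter(wells["lng"], wells["lat"], c=wells[attribute], cmap=cmap, s=12)
    fig.colorbar(sc, ax=ax, label=attribute)
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    ax.set_title(f"{attribute} by well location")
    return fig
```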

Add tests

Make small test datasets with 2 different organizations.

Need to investigate problem in features_runner.py

problem in features_runner.py:

```
len(df_test5) 1302634
/Users/justingosses/anaconda/envs/MannvilleDask2/lib/python3.6/site-packages/pandas/core/generic.py:1996: PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block3_values] [items->['UWI', 'trainOrTest', 'Neighbors_Obj', 'class_DistFrPick_TopTarget', 'class_DistFrPick_TopHelper', 'closerToBotOrTop']]

return pytables.to_hdf(path_or_buf, key, self, **kwargs)
```

create geojson around different 'regions' based on thickness, turn into one-hot vectors

Use thickness, and potentially a residual of depth or thickness, to identify sharp lines where sudden changes occur. Manually or programmatically split the map into contiguous "regions". Create a feature dimension for each region: every well is a 1 in one or more features and a 0 in the rest. Features can overlap, but it might be better if they don't? Not sure.
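A minimal sketch of the one-hot step, assuming shapely for point-in-polygon tests and a GeoJSON file whose features carry a 'name' property (all hypothetical):

```python
import json
from shapely.geometry import Point, shape  # assumption: shapely handles the geometry

def add_region_onehots(wells, geojson_path):
    """Add one 0/1 column per region polygon; overlapping regions are allowed,
    so a well can be 1 in more than one column. `wells` needs 'lat'/'lng'."""
    with open(geojson_path) as f:
        regions = json.load(f)["features"]
    points = [Point(xy) for xy in zip(wells["lng"], wells["lat"])]
    for i, feat in enumerate(regions):
        poly = shape(feat["geometry"])
        name = feat["properties"].get("name", f"region{i}")
        wells[f"in_{name}"] = [int(poly.contains(pt)) for pt in points]
    return wells
```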

need to change package name to match Python standards

Suggested names:
wellpickmimic
stratpicksupml
pickmimic
topmimic
stratpick
supervisedstratigraphy
stratipredict
strataterpretor
stratalmimic
strataxtension
pickpredict
predictatops <- gets a cool logo

Supervised Iterative Machine-Learning for Stratigraphic Interpretation Mimicry = simsim
Well-based Iterative Stratigraphic Supervised Machine-Learning = wissml
supervisedstratigraphy
supervisedstrat
superstrat - an approach for supervised machine-learning prediction of chronostratigraphic well tops.

clean & add to documentation

  • How to install
  • put on pypi? or wait on that until further along probably.
  • Provide a couple "how to use" examples.
  • Provide link to demo notebooks.
  • Add all_runner.py description and examples.
  • Shrink to only 1 README; finish merging markdown into RST format and push more into the docs instead of the README.
  • Make all the functions easier to navigate through.
  • Provide a page with the highest level functions only.

to ensure easier use with data that comes in different formats, move the merge of all input data to the first bits of work.

Currently, things like finding all available curves in wells and finding neighboring wells are done with just the input files needed for those tasks, and the results are then merged.

If someone is bringing in geographic coordinate data that isn't in txt or CSV form, it might be easier to adapt this code if they only had to write their own imports and transforms at the very beginning.

Some of the code having to do with SiteID <=> UWI <=> well log filenames, for example, is transformed a couple of different times in different early modules.

In summary, it would probably make more sense to do this once and carry a single big dataframe from then on. The intermediate file sizes would be bigger, but the code would be less complex to adapt if everything were a dataframe from the check/load step onwards.
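A sketch of what that single early merge could look like, with assumed file layouts and column names (the real PICKS.TXT schema may differ):

```python
import glob
import pandas as pd

# Resolve UWI <=> filename once, then carry one DataFrame through the pipeline.
picks = pd.read_csv("OilSandsDB/PICKS.TXT", sep="\t")   # column names assumed
coords = pd.read_csv("well_lat_lng.csv")                # lat/lng keyed on UWI, hypothetical

files = pd.DataFrame({"las_path": glob.glob("OilSandsDB/Logs/*.LAS")})
files["UWI"] = files["las_path"].str.extract(r"([^/\\]+)\.LAS$", expand=False)

wells = picks.merge(coords, on="UWI", how="inner").merge(files, on="UWI", how="inner")
```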

Change fetch_demo_data.py to load from a zip file of all files instead of individual files! Then the unzipped dataset doesn't have to be saved.

Is your feature request related to a problem? Please describe.
Data loads slowly and has to be stored uncompressed because fetch_demo_data.py loads files individually instead of loading a zip file and unzipping it into place.

Describe the solution you'd like
All demo data in a single zip file, and everything else works as normal.

Describe alternatives you've considered
Leave as is; take away the zip file since it's a duplicate.
Alternatively, put it in another location and load from there?
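A minimal sketch of the zip-based fetch using only the standard library; the URL is a placeholder, not a real download location:

```python
import io
import zipfile
from urllib.request import urlopen

DEMO_ZIP_URL = "https://example.com/predictatops_demo_data.zip"  # hypothetical URL

# Download the demo dataset as one archive and unpack it in place, instead of
# fetching files one at a time and keeping an uncompressed copy around.
with urlopen(DEMO_ZIP_URL) as resp:
    with zipfile.ZipFile(io.BytesIO(resp.read())) as zf:
        zf.extractall("demo_data")
```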

