- Team: Big Data > Big Lore
- Team members: JustinGOSSES, jazzskier, GeophysicsPanda, and dalide.
- When: September 24th, 2017
- Where: Station Houston
- What: Agile Hackathon -> event & sponsors
- Thanks: to AWS cloud services Houston and SigOpt for technical help, and to the other sponsors for feeding us so we didn't have to leave our keyboards.
Predict stratigraphic surfaces by training on human-picked stratigraphic surfaces. We used 2000+ wells with picks from the Mannville Group, including the McMurray, in Alberta, Canada.
There have been studies attempting similar things for decades. Many of them assume a mathematical pattern to stratigraphic surfaces and either don't train specifically on human-picked tops or do so only lightly. We wanted to try as geologic an approach as possible (as opposed to a mathematical or geophysical approach). What we managed to finish by the end of the hackathon is a small-scale first pass. The next stage would have been a larger-scale first pass, in which we generate many (50-500) dumb features and train on all of them with an algorithm, perhaps XGBoost trees or similar, that is good at rapidly ignoring features that don't help prediction.
Eventually, we want to get to the point where we've identified a large number of feature types that both have predictive value and can be tied back to geologist insight. A lot of observations happen visually (and therefore not consciously) when a geologist looks at a well log and correlates it. In addition to automating correlation of nearby wells, we think this will help geologists have better, more quantitative discussions about the basis of their correlations and why correlations might differ between geologists.
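As a rough illustration of what generating many simple window-based features could look like, here is a minimal numpy sketch. It is not the code from the notebooks; the function name `window_features`, its parameters, and the feature naming scheme are all hypothetical, and it assumes a single log curve sampled at regular depth steps with depth increasing along the array.

```python
import numpy as np


def window_features(log, half_widths=(2, 5), n_extremes=(1, 3)):
    """Build simple window features for every depth sample of one log curve.

    Hypothetical helper for illustration only. Assumes regular sampling and
    that index 0 is the shallowest sample.
    """
    log = np.asarray(log, dtype=float)
    features = {}
    for w in half_widths:
        # Pad with edge values so every sample has a full window.
        padded = np.pad(log, w, mode="edge")
        # Each row holds the 2*w + 1 samples centred on one depth point.
        around = np.lib.stride_tricks.sliding_window_view(padded, 2 * w + 1)
        windows = {
            "above": around[:, :w],       # shallower samples only
            "below": around[:, w + 1:],   # deeper samples only
            "around": around,             # full centred window
        }
        for name, win in windows.items():
            # Mean value within the window.
            features[f"mean_{name}_{w}"] = win.mean(axis=1)
            ordered = np.sort(win, axis=1)
            for k in n_extremes:
                if k > win.shape[1]:
                    continue  # window too small for k extreme points
                top = ordered[:, -k:].mean(axis=1)
                bottom = ordered[:, :k].mean(axis=1)
                # Mean of the k largest and k smallest values in the window.
                features[f"max{k}_{name}_{w}"] = top
                features[f"min{k}_{name}_{w}"] = bottom
                if name == "around":
                    # Spread between the extremes of the centred window.
                    features[f"range{k}_{name}_{w}"] = top - bottom
    return features
```

Each returned array has one value per depth sample, so the dictionary can be turned into columns of a feature table. Adding more window sizes or extreme counts grows the feature count quickly, which is why a model that tolerates many weak features is attractive here.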
Report for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/document/OFR/OFR_1994_14.PDF
Electronic data for Athabasca Oil Sands Data McMurray/Wabiskaw Oil Sands Deposit http://ags.aer.ca/publications/SPE_006.html Data is also in the repo folder: SPE_006_originalData
Final Data Prep & Machine Learning for the prediction finished by end of hackathon https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/data_prep_wells_xgb.ipynb
Version of feature engineering work done during hackathon (but didn't get to include during hackathon) https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/Feature_Brainstorm_Justin_vD-Copy1.ipynb
Latest feature engineering work (runs faster, with less complicated code) https://github.com/JustinGOSSES/MannvilleGroup_Strat_Hackathon/blob/master/Feature_Brainstorm_Justin_vE_PH.ipynb
- Optimize the generative feature engineering code, moving as much as possible to more efficient vectorized numpy operations. Feature types:
  - Type A: average value within different windows above, below, and around each depth point.
  - Type B: average value of a varying number of max and min points within different size windows above, around, and below each depth point.
  - Type C: difference between a varying number of max and min values in different size windows around each depth point.
- Shrink feature generation to a more reasonable number of features
- Explore more efficient ways to select which features to include
- Continue deployment on GPU cloud instances with more computing power
- Use geopandas & folium to investigate geographic distribution of pick prediction error
- Explore adding a feature for geographic similarity
- Build upon the early investigation of cross-correlating geographically similar wells as a first step before other feature engineering & modeling.
- Explore other time series matching as pre-modeling step for additional feature generation or weighting.
- Visualize the probability of a pick along the well instead of just returning the max-probability prediction in each well.
- Explore prediction accuracy vs. original pick uncertainty level. Graph percent of picks within different depth cut-offs with different lines for different original uncertainty levels in picks.
- Generate average aggregate wells in different local areas for wells at different prediction levels. See if there are trends or if this helps to identify geologically meaningful features that correlate with many combined machine-learning model features.
- Explore methods to visualize weightings of features on an individual well basis using techniques similar to those used in image-based deep learning.
- Cluster wells using unsupervised learning and then see if clusters can be created that correlate with supervised prediction results.
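One of the items above proposes comparing prediction accuracy against the uncertainty of the original picks. A minimal sketch of that calculation is shown here, assuming one predicted depth, one human-picked depth, and one pick-quality code per well. The function names, the cut-off values, and the form of the quality code are hypothetical and not taken from the notebooks.

```python
import numpy as np


def fraction_within_cutoffs(predicted_depth, picked_depth, cutoffs=(1, 2, 5, 10)):
    """Fraction of wells whose predicted pick is within each depth cut-off."""
    predicted = np.asarray(predicted_depth, dtype=float)
    picked = np.asarray(picked_depth, dtype=float)
    error = np.abs(predicted - picked)
    # Cut-offs are in the same depth units as the inputs.
    return {c: float(np.mean(error <= c)) for c in cutoffs}


def fraction_by_quality(predicted_depth, picked_depth, quality, cutoffs=(1, 2, 5, 10)):
    """Same fractions, computed separately for each pick-quality code."""
    predicted = np.asarray(predicted_depth, dtype=float)
    picked = np.asarray(picked_depth, dtype=float)
    quality = np.asarray(quality)
    result = {}
    for q in np.unique(quality):
        mask = quality == q
        result[q.item()] = fraction_within_cutoffs(
            predicted[mask], picked[mask], cutoffs
        )
    return result
```

Plotting each quality code's fractions against the cut-offs gives one line per original uncertainty level, which is the graph described in that item.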