
fmu-ensemble's Introduction


Introduction to fmu.ensemble

FMU Ensemble is a Python module for handling simulation ensembles originating from an FMU (Fast Model Update) workflow.

For documentation, see the github pages for this repository.

Ensembles consist of realizations. Realizations consist of (input and) output from their associated jobs stored in text or binary files. Selected file formats (text and binary) are supported.

This module helps you handle ensembles and realizations (and their associated data) as Python objects, thereby facilitating the use of other Python visualization modules like webviz and plotly, or interactive usage in IPython/Jupyter.

If run as a post-workflow in Ert, a simple script using this library can replace and extend the existing CSV_EXPORT1 workflow.

This software is released under GPL v3.0.

fmu-ensemble's People

Contributors

anders-kiaer, asnyv, berland, bkhegstad, dafeda, dansava, eivindjahren, fahaddilib, jcrivenaes, jonathan-eq, larsevj


fmu-ensemble's Issues

Easier to work with history vector observations

  • Add doc on how to load observations from a YAML string; easier in notebooks compared to inputting the dict-list-dict structure.
  • Guess the history vector automatically; it should not need to be required for smryh.
  • Fails hard on non-existing summary keys; should perhaps just warn about them.
  • Warnings from the realization object should include the realization index.
  • Allow a "global" time_index to be specified at the top level for smryh?
  • Incomplete docs; no mention of scalar values.

Default log level

Ensure that default log-level is WARNING.

If logging is not set, not even ERROR messages are printed to the console.
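A minimal sketch of how a consumer could ensure WARNING-level output today; the logger name "fmu.ensemble" is an assumption about the library's logger hierarchy, and the pattern is the same for any library logger:

```python
import logging

# Assumed logger name; fmu.ensemble's actual logger hierarchy may differ.
logger = logging.getLogger("fmu.ensemble")

# Attach a handler (if none is configured) and set WARNING as the default
# level, so that WARNING and ERROR messages reach the console:
if not logger.handlers:
    logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.WARNING)

logger.warning("this is printed to stderr")
logger.debug("this is suppressed")
```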

Avoid recursing into restart UNSMRY files

Parsing data from UNSMRY files which are restarted from others is supported by libecl, but is sometimes a minefield.

EclSum supports setting include_restart=False to avoid this recursion, and this should be supported from fmu-ensemble.

Full support for timestamps in observations module

The observation module has been made with date (day) accuracy, so observations timestamped with a full datetime will not work.

This is relevant for DST well tests.

Task is to add support for it, and with extensive tests.

See also #82

EnsembleSet initialization bugs

When standing in the directory containing the realizations directories:

EnsembleSet(frompath='.') does not work, zero realizations.
EnsembleSet(name='foo', frompath='.') does work.
EnsembleSet(frompath='.', name='foo') does not work.

Also check whether frompath='.' can be default in some situations, so all that is needed is
EnsembleSet() when you are in the correct place.

get_smry, time_index

Make it possible to use a specific date with get_smry.

Example:
date='2019-01-01'
smry = ens.get_smry(column_keys=['GPTH:'], time_index=date)

VirtualEnsemble manifest seems to have minor bugs

Line 75 should probably read self._manifest = manifest. Check why this has not been caught by tests.

Defaulting the manifest does not work properly; to_virtual() on an EnsembleCombination will, for example, issue a manifest warning which is not relevant.

Make it possible to turn off auto-discovery of Eclipse UNSMRY files

There can be situations where automatic discovery of UNSMRY files is not wanted. If there are multiple Eclipse runs in eclipse/model, and the user wants to control which one is used, find_files() is the recommended practice. But, if the wanted Eclipse run has crashed in one realization, the auto-discovery might kick in and discover another run, which in that case would be erroneous.

ensemblecombination.get_df() index guessing

It should be possible to override the guessing of indices in ensemblecombination.get_df(). In particular, there will be situations where ZONE and REGION have different names.

Also applies to realizationcombination.

EnsembleCombinations lack a VirtualEnsemble API

EnsembleCombination objects should act as VirtualEnsemble objects, and implement functions like agg() so the user does not have to call to_virtual().

Calling .to_virtual() explicitly could still be recommended when the object is to be reused.

Consider changing default behaviour for missing data in Combinations

RealizationCombination and EnsembleCombination can do linear combination of dataframes. When these are indexed by a DATE column, it will only combine for DATEs existing in both datasets, and drop the rest.

For summary data, VirtualEnsemble's get_smry() will extrapolate any summary data correctly (zero for rate vectors, constant for cumulative vectors), meaning it is technically possible to combine any realizations' summary data (even with no overlapping DATE). This is relevant in situations where the end-date of a simulation is variable by design. Right now this can probably be worked around by providing a list of datetimes, or possibly an end_date, but that would require custom coding.

If the functionality is changed to always extrapolate, there will be side-effects when the end-date varies because of errors. It probably makes more sense to put the responsibility on the user for filtering out bad simulations.
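The current intersect-and-drop behaviour can be illustrated with plain pandas (a stand-in for the combination internals, not fmu.ensemble code):

```python
import pandas as pd

# Two realizations' summary data with only one overlapping DATE:
real_a = pd.DataFrame(
    {"DATE": pd.to_datetime(["2020-01-01", "2020-02-01"]), "FOPT": [100.0, 200.0]}
)
real_b = pd.DataFrame(
    {"DATE": pd.to_datetime(["2020-02-01", "2020-03-01"]), "FOPT": [150.0, 250.0]}
)

# An inner join on DATE keeps only the shared date, silently dropping
# 2020-01-01 and 2020-03-01 from the combination:
delta = real_a.merge(real_b, on="DATE", suffixes=("_a", "_b"), how="inner")
delta["FOPT_DIFF"] = delta["FOPT_a"] - delta["FOPT_b"]
```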

@asnyv

Add dictionary for metadata to ensembles

Some applications need to associate a dictionary of metadata to ensembles.

This probably has to be a class member, typically initialized from a yaml file.

Should __init__ support it, or should we require an extra function call to set it?

It requires support in virtual_ensembles, and in to/from_disk.

Should it be reset to None in EnsembleCombinations?

Should there be a default filename that can be looked for? A filename in use is share/runinfo/runinfo.yaml relative to the ensemble root.

Method to get EnsembleSet statistics.

The method would return a dataframe containing the summary statistics of an EnsembleSet, in a form similar to EnsembleSet.get_smry().

It would wrap Ensemble.get_smry_stats() and aggregate the results into a concatenated dataframe including a column "ENSEMBLE" identifying the ensemble the data came from.

Input: EnsembleSet
Output: dataframe of aggregated summary-statistics of individual ensembles (not statistics of the ensemble-set as a whole).
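The aggregation step could be sketched as follows; the input frames and column names are illustrative assumptions, not the actual shape of Ensemble.get_smry_stats() output:

```python
import pandas as pd

# Hypothetical per-ensemble statistics frames:
stats_by_ensemble = {
    "iter-0": pd.DataFrame({"DATE": ["2020-01-01"], "FOPT_mean": [100.0]}),
    "iter-1": pd.DataFrame({"DATE": ["2020-01-01"], "FOPT_mean": [120.0]}),
}

# Tag each block with its ensemble name, then concatenate:
frames = []
for ensname, dframe in stats_by_ensemble.items():
    dframe = dframe.copy()
    dframe["ENSEMBLE"] = ensname
    frames.append(dframe)
aggregated = pd.concat(frames, ignore_index=True)
```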

load_scalar() is not Python3 compatible

=================================== FAILURES ===================================
_____________________________ test_reek001_scalars _____________________________
    def test_reek001_scalars():
        """Test import of scalar values from files
    
        Files with scalar values can contain numerics or strings,
        or be empty."""
    
        if "__file__" in globals():
            # Easen up copying test code into interactive sessions
            testdir = os.path.dirname(os.path.abspath(__file__))
        else:
            testdir = os.path.abspath(".")
    
        reekensemble = ScratchEnsemble(
            "reektest", testdir + "/data/testensemble-reek001/" + "realization-*/iter-0"
        )
    
        assert "OK" in reekensemble.keys()
        assert isinstance(reekensemble.get_df("OK"), pd.DataFrame)
        assert len(reekensemble.get_df("OK")) == 5
    
        # One of the npv.txt files contains the string "error!"
        reekensemble.load_scalar("npv.txt")
        npv = reekensemble.get_df("npv.txt")
        assert isinstance(npv, pd.DataFrame)
        assert "REAL" in npv
        assert "npv.txt" in npv  # filename is the column name
>       assert len(npv) == 5
E       assert 1 == 5
E        +  where 1 = len(   REAL npv.txt\n0     4  error!)
tests/test_ensemble.py:247: AssertionError

Improve observation support

Possible features for the Observation class:

  • to_yaml() implementation
  • __repr__() so that the content can be explored from the IPython shell
  • mismatch() function should support EnsembleSets.
  • to_ert_observations() to dump an observation file that ERT can load and condition to
  • "to_resinsight_csv()" to dump the observation CSV file that ResInsight can import
  • Be able to compare to data from the RFT file
  • Make an Observations.misfit() that returns one value per realization.
  • Support for observation correlations in the misfit calculation
  • Verify the misfit calculation: is it identical to ERT's, can it be made identical, and should it be?
  • Decide on sensible column names for the returned DataFrame (L1 and L2 might not be intuitive for many people)
  • Ability to instantiate an observation (VirtualObservation??) from a VirtualRealization (implies ScratchRealization); this would allow "finding the realization closest to mean FOPT"
  • Ignore missing data
  • More logging
  • get_smry() should return empty dataframes when asked for non-existing keys, not raise ValueError
  • Decide if extrapolation through get_smry() is a sensible thing to do. If it is not, it requires some changes to the code.

Solve DeprecationWarning on collections

fmu-ensemble/src/fmu/ensemble/realization.py:1759: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3,and in 3.9 it will stop working
    if isinstance(value, collections.MutableMapping):
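The fix is to import the ABCs from collections.abc, with a fallback for Python 2 if that still needs to be supported:

```python
# collections.abc has held the ABCs since Python 3.3; importing them
# from collections stopped working in Python 3.10.
try:
    from collections.abc import MutableMapping  # Python 3.3+
except ImportError:  # pragma: no cover
    from collections import MutableMapping  # Python 2 fallback

# isinstance checks behave exactly as before:
value = {"a": 1}
is_mapping = isinstance(value, MutableMapping)
```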

Observations mismatch fails when one realization is missing the UNSMRY file.

Using the HistoryMatch container in webviz creates an EnsembleSet and computes mismatches from observations. fmu.ensemble.Observations.mismatch fails when a realization has failed (does not have a summary file).

https://github.com/equinor/webviz-subsurface/blob/2d90ffc138ace636c051f3a42e52dc12af586739/webviz_subsurface/datainput/_history_match.py#L30

)[obsunit["key"]].values[0]

Traceback:

File "/.../venv/lib/python3.7/site-packages/webviz_subsurface/datainput/_history_match.py", line 30, in extract_mismatch
    .mismatch(ens_data)
  File "/.../venv/lib/python3.7/site-packages/fmu/ensemble/observations.py", line 116, in mismatch
    mismatches[(ensname, realidx)] = self._realization_mismatch(real)
  File "/.../venv/lib/python3.7/site-packages/fmu/ensemble/observations.py", line 305, in _realization_mismatch
    )[obsunit["key"]].values[0]
  File "/.../venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/.../venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2658, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'WBP4:SOME_WELL'

Using PR #3 does not help and gives this Traceback:

  File "/.../venv/lib/python3.7/site-packages/webviz_subsurface/datainput/_history_match.py", line 30, in extract_mismatch
    .mismatch(ens_data)
  File "/.../venv/lib/python3.7/site-packages/fmu/ensemble/observations.py", line 122, in mismatch
    mismatches[(ensname, realidx)] = self._realization_mismatch(real)
  File "/.../venv/lib/python3.7/site-packages/fmu/ensemble/observations.py", line 342, in _realization_mismatch
    )[obsunit["key"]].values[0]
  File "/.../venv/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/.../venv/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2658, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'WBP4:SOME_WELL'

Use pytest tmpdir fixture, allowing parallel testing

test_ensembleset.py at least is not parallelizable, as it can fail in parallel while working in sequential mode. This is probably due to race conditions on directories being written to. Fix by using tmpdir fixtures in pytest, and verify that pytest -n 20 never fails.

Align get_smry and load_smry

For a ScratchEnsemble, get_smry() and load_smry() may differ in the dates returned: get_smry() first builds a list of dates to obtain data for and then asks every realization for data at those dates, while load_smry() asks each realization independently. This will only happen if the realizations' summary data have different end dates.

It is not given that this difference is a bug rather than a feature, so perhaps it only needs to be documented.

pyarrow dependency

pyarrow is not imported before to_disk() is called, but it is listed as a dependency.

pyarrow is not in komodo, so to_disk() will probably fail in such an environment.

Consider making pyarrow an optional dependency, and have to_disk() fall back to writing only CSV files when the import fails.
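A common optional-dependency pattern would probe the import once at module load; write_frame() below is a hypothetical helper to show the fallback, not the fmu.ensemble API:

```python
# Probe for pyarrow once; downstream code branches on the flag.
try:
    import pyarrow  # noqa: F401
    HAVE_PYARROW = True
except ImportError:
    HAVE_PYARROW = False

def write_frame(dframe, basepath):
    """Write a dataframe, preferring parquet when pyarrow is present."""
    if HAVE_PYARROW:
        dframe.to_parquet(basepath + ".parquet")
    else:
        dframe.to_csv(basepath + ".csv", index=False)
```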

Pandas 0.25 compatibility, Py3.6

With Python 3.6 and Pandas 0.25, a test fails. Works fine with Python 3.6 and Pandas 0.24.

tests/test_ensemble_agg.py:44: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/fmu/ensemble/ensemble.py:1162: in agg
    aggregated = aggobject.quantile(quantile / 100.0)
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/pandas/core/groupby/groupby.py:1908: in quantile
    interpolation=interpolation,
../../../virtualenv/python3.6.7/lib/python3.6/site-packages/pandas/core/groupby/groupby.py:2238: in _get_cythonized_result
    vals, inferences = pre_processing(vals)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
vals = array(['DESIGN2PARAMS', 'DESIGN_KW', 'DESIGN_KW', 'DESIGN_KW',
       'DESIGN_KW', 'DESIGN_KW', 'MAKE_DIRECTORY', 'COP...',
       'RMS_BATCH', 'GENERATE_RELPERM', 'GENERATE_RELPERM', 'INCLUDE_PC',
       'ECLIPSE100_2014.2'], dtype=object)
    def pre_processor(vals: np.ndarray) -> Tuple[np.ndarray, Optional[Type]]:
        if is_object_dtype(vals):
            raise TypeError(
>               "'quantile' cannot be performed against " "'object' dtypes!"
            )
E           TypeError: 'quantile' cannot be performed against 'object' dtypes!

Parameter-merging per realization

Merging an ensemble-wide dataframe (e.g. summary data) with the ensemble-wide parameters dataframe is a common task. Calling pd.merge on these two dataframes from the outside is probably inefficient compared to merging individually per realization.

Perhaps an option can be added to get_df() to perform merging with some dataset at the realization level. This would probably speed up standard operations, and also make it possible to utilize multiprocessing in this operation.
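A sketch of the per-realization approach; the function and input shapes are hypothetical, not the fmu.ensemble API. Since parameters are constant within a realization, they can be assigned as columns instead of merged:

```python
import pandas as pd

def merge_per_realization(smry_by_real, params_by_real):
    """Attach per-realization parameters to each summary frame, then concat."""
    frames = []
    for real, smry in smry_by_real.items():
        merged = smry.copy()
        for key, value in params_by_real[real].items():
            merged[key] = value  # constant within one realization
        merged["REAL"] = real
        frames.append(merged)
    return pd.concat(frames, ignore_index=True)

smry = {0: pd.DataFrame({"FOPT": [1.0, 2.0]}), 1: pd.DataFrame({"FOPT": [3.0, 4.0]})}
params = {0: {"PORO": 0.2}, 1: {"PORO": 0.3}}
merged = merge_per_realization(smry, params)
```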

Multiprocessing

Operations over an ensemble are trivially parallelizable.

We should utilize Python multiprocessing for this.

multiprocessing is what should be used, as multithreading would suffer from the GIL.

This is probably trivial for ensemble.get_smry(), but not so trivial for ensemble.from_smry(), as we need to populate each realization object with smry data in the parent process' memory space.

Maybe ensemble.from_smry() should call realization.get_smry() with multiprocessing, and then the ensemble object (holding the master process) populates each realization's self.data['unsmry-<something>'].

We must ensure CTRL-C works, which is trickier with multiprocessing.

See this: https://stackoverflow.com/questions/11312525/catch-ctrlc-sigint-and-exit-multiprocesses-gracefully-in-python
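The pattern from that discussion can be sketched as below; load_one_realization() is a hypothetical stand-in for realization.get_smry(), and the "fork" start method is assumed (POSIX):

```python
import multiprocessing
import signal

def _init_worker():
    # Workers ignore SIGINT so CTRL-C only reaches the parent process,
    # which can then terminate the pool cleanly.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def load_one_realization(realdir):
    # Stand-in for realization.get_smry(); just echoes its input here.
    return realdir.upper()

# "fork" is assumed here; on Windows ("spawn"), worker functions must be
# importable from a module and guarded by ``if __name__ == "__main__":``.
ctx = multiprocessing.get_context("fork")
with ctx.Pool(2, initializer=_init_worker) as pool:
    try:
        results = pool.map(load_one_realization, ["real-0", "real-1"])
    except KeyboardInterrupt:
        pool.terminate()
        raise
```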

When this is in place, we should also be able to skip issues when libecl is core-dumping due to a difficult UNSMRY-file.

Right now, your Python session will die if libecl crashes on rough data.

Reading and writing VirtualEnsembles to disk

VirtualEnsembles should be dumpable and initializable to/from disk and cloud. This is partially implemented. This issue picks up where #7 left off.

ScratchEnsembles are not disk-dumpable, as they already are on disk. ScratchEnsembles must be virtualized before dumping to disk.

We need

  • Write all internalized data to disk as CSV
  • Figure out what else needs to be dumped. The files dataframe?
  • Option to choose another file format for CSV data, e.g. HDF5 or JSON?
  • Support internalized scalar values (#32)
  • Be able to initialize a VirtualEnsemble from disk.
  • Be tolerant of ensembles that were written to disk with older versions of fmu.ensemble (not relevant today, but it will be!)
  • Decide if we can have CSV and HDF5 files with identical data lying next to each other. If so, decide authority: what happens if the CSV file gets updated and the HDF5 file not?
  • Test/decide whether all data can or should be merged into one HDF5 file. Any upside compared to the multiple-files-in-a-directory concept? What about JSON?
  • Decide on the API. Should it be to_disk() and from_disk()?
  • Be compatible with a cloud solution. Should we have different functions for this than to_disk()?
  • The files dataframe must have a STOREDRELATIVEPATH. FULLPATH should perhaps be renamed to ORIGINALFULLPATH.
  • Add yaml per file to the files dataframe as additional columns
  • Support lazy loading of dataframes from disk in from_disk()
  • Dump a CSV file with an index of the files dumped, except "discovered files"

Write test for summary handling with uneven time-scale lengths

There should be a test setup to verify correctness when one realization has failed, say in the middle of the schedule period. Perturb a schedule file to end prematurely, run it, and save the UNSMRY file with a different filename that can be injected temporarily by the test-code.

Test that we can filter out by date the realization that failed.

load_smry() on that realization should not have any dates past the crash point, for any time_index argument.

get_smry() at ensemble level should behave similarly, with a differing last date per realization.

The behaviour of get_smry_stats() is undefined. Uneven DATE ranges per realization must be padded when realizations have not been excluded, as it is not given that a simulation ending earlier represents an error rather than intent. An option could be added to get_smry_stats() to require the same end-date for all realizations.

Delta profiles should possibly exclude profiles with differing DATE columns.
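The padding rule (zero for rates, constant for cumulatives) can be sketched with plain pandas; this is an illustration of the rule, not the library's implementation:

```python
import pandas as pd

# A realization that crashed after February, reindexed onto the full
# ensemble date range (monthly):
full_index = pd.date_range("2020-01-01", "2020-04-01", freq="MS")
short = pd.DataFrame(
    {"FOPR": [10.0, 12.0], "FOPT": [100.0, 200.0]},
    index=pd.to_datetime(["2020-01-01", "2020-02-01"]),
)

padded = short.reindex(full_index)
padded["FOPR"] = padded["FOPR"].fillna(0.0)  # rate vector: pad with zero
padded["FOPT"] = padded["FOPT"].ffill()      # cumulative vector: hold constant
```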

realization.load_status() can crash on input

Stack-trace:

The script 'ExternalErtScript' caused an error while running:
Traceback (most recent call last):
 File "/pr../wf_well_volumes.py", line 458, in <module>
   main()
 File "/projec.......wf_well_volumes.py", line 354, in main
   ens             = ensemble.ScratchEnsemble('ens', args.scratch_dir+'/realization-*')
 File "/project/res/lib/python2.7/site-packages/fmu/ensemble/ensemble.py", line 125, in __init__
   paths, realidxregexp, autodiscovery=autodiscovery
 File "/project/res/lib/python2.7/site-packages/fmu/ensemble/ensemble.py", line 228, in add_realizations
   realdir, realidxregexp=realidxregexp, autodiscovery=autodiscovery
 File "/project/res/lib/python2.7/site-packages/fmu/ensemble/realization.py", line 137, in __init__
   self.load_status()
 File "/project/res/lib/python2.7/site-packages/fmu/ensemble/realization.py", line 463, in load_status
   hms = list(map(int, jobrow["STARTTIME"].split(":")))
ValueError: invalid literal for int() with base 10: '1/Process'
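A defensive parser would tolerate such malformed fields; parse_starttime() below is a hypothetical helper illustrating the idea, not the library's actual fix:

```python
def parse_starttime(field):
    """Parse an 'HH:MM:SS' STATUS-file field into seconds since midnight.

    Hypothetical helper: returns None for malformed fields (like
    '1/Process') instead of raising ValueError, so one bad line does
    not abort ensemble loading.
    """
    parts = field.split(":")
    if len(parts) != 3:
        return None
    try:
        hours, minutes, seconds = map(int, parts)
    except ValueError:
        return None
    return hours * 3600 + minutes * 60 + seconds
```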

Support status.json

The STATUS-file parsing should be replaced by parsing of status.json whenever that file is available.

status.json probably appeared in Ert 2.3, ca November 2018.

Current parsing of the STATUS-file has unavoidable problems with jobs lasting more than 24 hours.

EnsembleCombinations are really slow

Ensemble arithmetic works, but is most likely suffering from some pythonic mistakes, as the test code test_ensemblecombination.py takes excessive time. It is not unlikely that something bad is going on with garbage collection.
