
Comments (8)

sjpfenninger commented on July 3, 2024

As of 47a49c4 this is sort of possible by extending _TIMESERIES_PARAMS -- but that's still hardcoded.


brynpickering commented on July 3, 2024

To get around the hardcoding issue, it seems sensible to populate _TIMESERIES_PARAMS based on file references in the YAML files. Then it is a matter of deciding how to catch only timeseries files (rather than spatial or load-rate based ones). You could:

  • assume that a file is a time series file if its number of lines matches the length of set_t, but this could cause issues if a file with as many lines refers to something else (unlikely, since the time series is usually the longest dimension, but still possible).
  • name the file to make it obvious (e.g. append "_t", or similar), but then the onus is on the user to remember to do so.
  • have the first column of all time-based files be a timestamp rather than an integer. This would also allow time dependency at intervals that differ from set_t (e.g. a daily change rather than hourly), but then you'd need to know how to interpolate between values (step change, straight line, etc.) -- see the sketch after this list.
  • simply ask the user to list, at the start of the YAML file, all values which change in time. Easier to process, but relies on the user remembering to fill it in.
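
A minimal sketch of how the YAML-based discovery plus the timestamp convention (option 3) might look -- find_file_params and read_timeseries_csv are hypothetical names, not Calliope's actual API:

```python
import pandas as pd
import yaml


def find_file_params(config, prefix=""):
    """Walk a parsed YAML dict and yield (dotted.key, filename) for every
    value of the form 'file=some.csv', rather than hardcoding the list."""
    for key, value in config.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            yield from find_file_params(value, path)
        elif isinstance(value, str) and value.startswith("file="):
            yield path, value.split("=", 1)[1]


def read_timeseries_csv(filename, set_t):
    """Option 3: require the first CSV column to be a timestamp, then align
    it to the model's time index; NaN rows reveal any mismatch."""
    df = pd.read_csv(filename, index_col=0, parse_dates=True)
    return df.reindex(pd.DatetimeIndex(set_t))


example = yaml.safe_load("""
techs:
  ccgt:
    constraints:
      e_eff: file=ccgt_e_eff.csv
      r_eff: 0.95
""")
print(dict(find_file_params(example)))
# -> {'techs.ccgt.constraints.e_eff': 'ccgt_e_eff.csv'}
```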


brynpickering commented on July 3, 2024

Having tested extending _TIMESERIES_PARAMS, I'm certain that there'll need to be an overhaul if it were to go beyond just r and e_eff. The reason is that during constraint generation other values are assumed to be static and so are not indexed over time (e.g. here for r_eff); get_option is simply used instead.

Could we have an optional time input in get_option? I'll test it out as a possibility, but I do wonder whether it might not be cleaner to just create arrays for the entire dataset, not just for r and e_eff. In this case, everything would be searchable as e.g. model.m.r_eff['ccgt', 'r1', '2015 01 01 09:00'] rather than model.get_option('ccgt.constraints.r_eff', x='r1'), and static values would just be repeated across the time dimension.
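
For illustration, a minimal sketch (using xarray, with hypothetical data) of what "static values repeated across the time dimension" could look like:

```python
import numpy as np
import pandas as pd
import xarray as xr

set_t = pd.date_range("2015-01-01", periods=24, freq="h")

# A genuinely time-varying parameter, indexed over (t, x)...
r = xr.DataArray(
    np.random.uniform(0.1, 0.8, (len(set_t), 1)),
    coords={"t": set_t, "x": ["r1"]},
    dims=["t", "x"],
)
# ...and a static one, broadcast along t so every lookup works the same way.
r_eff = xr.DataArray([0.95], coords={"x": ["r1"]}, dims=["x"]).expand_dims(t=set_t)

# Both are now addressed identically:
print(r_eff.loc["2015-01-01 09:00", "r1"].item())  # 0.95
```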


sjpfenninger commented on July 3, 2024

Re the first comment, I would say that option 3 (time-based data indexed by timestamps rather than integers) is probably best -- although it would require some thinking to ensure that all data files contain a consistent time dimension (which is easier to do if the time dimension is specified only once in a separate file).

Re the second comment, it is true that values like r_eff are currently assumed to be static in time. Hmm... I suspect the main issue with adding more parameters defined over time and space will be the performance impact, particularly on Pyomo. Perhaps this requires some testing? Iterating over a pandas DataFrame is likely slower than get_option currently is, and I think iterating over a Pyomo parameter would be too (but I'm not certain about that, nor am I certain whether that is a relevant time cost in the context of all the other stuff Pyomo does when constructing the model).
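
One way to test that concern would be a micro-benchmark along these lines (a rough sketch with made-up data, not the actual Calliope code paths):

```python
import timeit

import numpy as np
import pandas as pd

set_t = pd.date_range("2005-01-01", periods=8760, freq="h")
df = pd.DataFrame({"e_eff": np.random.uniform(0.8, 1.0, len(set_t))}, index=set_t)
options = {"ccgt.constraints.e_eff": 0.9}  # stand-in for get_option's store
t = set_t[1234]

# Dict lookup (roughly what a static get_option amounts to)...
print(timeit.timeit(lambda: options["ccgt.constraints.e_eff"], number=100_000))
# ...versus a per-timestep pandas label lookup inside a constraint rule.
print(timeit.timeit(lambda: df.at[t, "e_eff"], number=100_000))
```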


brynpickering commented on July 3, 2024
  1. The benefit of specifying the time dimension per file would be to remove the need for that consistency. For instance, if your set_t file were hourly from 2005-01-01 to 2005-12-31, you could have an e_eff file with daily values over the same range, and the system could simply infer hourly values (see the resampling sketch after this list). This could get quite complicated, but it essentially means that, provided a file's time range has the same lower and upper bounds as set_t, you could theoretically define any time-step granularity and it would be dealt with. Granted, I'm not certain whether such functionality would be truly desirable.
  2. I've created a version in my fork that can load all the efficiencies in time. I'll test out heavy and light file-reading scenarios and see what happens.
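
As a sketch of point 1 (with hypothetical data), pandas can already infer hourly values from a daily series, with the interpolation choice made explicit:

```python
import numpy as np
import pandas as pd

set_t = pd.date_range("2005-01-01", "2005-12-31 23:00", freq="h")
daily = pd.Series(
    np.random.uniform(0.8, 1.0, 365),
    index=pd.date_range("2005-01-01", "2005-12-31", freq="D"),
)

# Two of the interpolation choices mentioned above:
step_change = daily.reindex(set_t, method="ffill")          # hold each daily value
straight_line = daily.reindex(set_t).interpolate().ffill()  # linear between points
```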


brynpickering commented on July 3, 2024

OK, so I've used the example model running over 768 time steps to do some speed tests, below. They show an understandable increase in preprocessing time, as more files need to be opened and more data sets have to be produced (instead of just relying on get_option). Once the data is produced, though, the optimisation runtime is unchanged: the constraints all iterate over time anyway, so it doesn't matter whether a value is a single number or one that varies in time. It also shows that searching the data array is just as quick as getting a static value from a dictionary.
Basically, Pyomo doesn't change much, but the act of opening and reading CSVs takes its toll. This suggests that additional preprocessing (e.g. folding r_eff and c_eff into e_eff, so that only e_eff is loaded as a time-dependent set) wouldn't improve solution time by much. Instead, each CSV file would need to carry more information, so that fewer files need opening overall (see the sketch after the notes below).
I should also point out that CPLEX reads the Pyomo LP file in ~1s and runs the optimisation in ~1-2s in all cases.

| Case | Preprocessing | Optimisation | Total |
| --- | --- | --- | --- |
| 1. No efficiencies loaded from file (base case) | 25s | 44s | 69.2s |
| 2. e_eff from file for csp, all static for ccgt | 68s | 46s | 113.7s |
| 3. e_eff from file for both csp and ccgt | 68s | 41s | 108.5s |
| 4. e_eff and r_eff from file for both csp and ccgt | 114s | 47s | 161.1s |
| 5. e_eff, r_eff and c_eff from file for both csp and ccgt | 173s | 49s | 222.1s |
| 6. e_eff, r_eff and c_eff from file for just csp | 172s | 59s (not sure what happened here!) | 230.9s |
| 7. e_eff, r_eff and c_eff from file for just ccgt | 168s | 51s | 219.1s |

Note:

  1. All efficiency files were loaded with randomised data: between 0.1 and 0.8 for csp.e_eff and ccgt.r_eff, and between 0.8 and 1 for all other e_eff, r_eff, and c_eff cases.
  2. r is always loaded from file for demand and csp.
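
As a sketch of the "more information per CSV file" idea (the wide layout and column names here are hypothetical):

```python
import io

import pandas as pd

# One wide file: a timestamp column plus one column per tech.parameter,
# so a single read covers several time-varying constraints at once.
csv_text = """t,csp.e_eff,csp.r_eff,ccgt.e_eff
2005-01-01 00:00,0.35,0.78,0.92
2005-01-01 01:00,0.36,0.80,0.91
"""
wide = pd.read_csv(io.StringIO(csv_text), index_col="t", parse_dates=True)

# Split back out into one series per parameter.
per_param = {col: wide[col] for col in wide.columns}
print(per_param["ccgt.e_eff"].loc["2005-01-01 01:00"])  # 0.91
```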


brynpickering commented on July 3, 2024

So this has been on the backburner for a while. I came back to it today and have done some more in-depth comparisons.
The time penalty comes purely from fetching a data element at a given time step during the constraint-setting phase. The previous set of timings used the loc function to search the DataArrays of the timeseries constraints, which is quite time-consuming.
I then tested getting the data from the Pyomo Param instead, which was significantly faster. Below are the results in some more detail (1/01 - 31/03 in the example model):

| Case | Approach | Preprocessing | Solving | Total |
| --- | --- | --- | --- | --- |
| 1. No efficiencies from file | Current master | 77.5s | 115s | 196s |
| | DataArray searching | 86.55s | 112.86s | 204s |
| | Pyomo Param searching | 78s | 112s | 195s |
| 2. e_eff from file | Current master | 87s | 119s | 209s |
| | DataArray searching | 248s | 127s | 378s |
| | Pyomo Param searching | 85s | 119s | 207s |
| 3. e_eff & r_eff from file | Current master | not possible | - | - |
| | DataArray searching | 437s | 190s | 671s |
| | Pyomo Param searching | 86s | 124s | 214s |

So loading multiple efficiencies from file doesn't seem to have any greater effect than loading a single efficiency from file, provided that Pyomo Params are used for storing and accessing the data. In the original master this was done for both r and e_eff anyway; I moved away from it to introduce generality, but I've since been able to provide sufficient generality while still using Params for any possible time-dependent constraint.
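
For reference, a minimal sketch of the two access patterns on a toy model (illustrative only, not the actual constraint code):

```python
import pandas as pd
import xarray as xr
from pyomo.environ import ConcreteModel, Param, Set

set_t = [t.isoformat() for t in pd.date_range("2005-01-01", periods=24, freq="h")]

m = ConcreteModel()
m.t = Set(initialize=set_t)
m.e_eff = Param(m.t, initialize={t: 0.9 for t in set_t})

da = xr.DataArray([0.9] * len(set_t), coords={"t": set_t}, dims=["t"])

# Inside a constraint rule, the Pyomo Param is a plain keyed lookup...
fast = m.e_eff[set_t[9]]
# ...whereas DataArray.loc resolves labels on every call, which adds up
# over thousands of constraint evaluations.
slow = da.loc[set_t[9]].item()
```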

I've attached an Excel file of the Python profiler results for these runs (including all functions with tottime > 1s):
Issue 7 profiler results.xlsx


brynpickering commented on July 3, 2024

This is now set to be merged into master as part of pull request #28.
