holgerteichgraeber / timeseriesclustering.jl

Julia implementation of unsupervised learning methods for time series datasets. It provides functionality for clustering and aggregating, detecting motifs, and quantifying similarity between time series datasets.

License: MIT License

Julia 88.53% TeX 11.47%
clustering optimization energy-systems k-means-clustering k-medoids-clustering hierarchical-clustering representative-days time-series-aggregation julia

timeseriesclustering.jl's People

Contributors

alansill, arbrandt, danielskatz, holgerteichgraeber, juliatagbot, niklashaag, youngfaithful

timeseriesclustering.jl's Issues

Clust-values < e-13

If values are very small and close to 0 (below 1e-13), Gurobi neglects the constraint. I fear that I'll need to revise my results in that regard. I think simple rounding should solve that.
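A minimal sketch of the rounding idea, assuming the clustered values sit in a plain vector. The example values and the choice of 10 digits are assumptions for illustration, not settings taken from the package.

```julia
# Example values are made up; two of them are tiny in magnitude (below 1e-13).
vals = [0.25, 3.0e-14, -7.0e-15, 1.0]

# Round every entry to a fixed number of decimal digits.
# The digit count (10) is an assumed value for this sketch.
rounded = round.(vals; digits=10)

# Entries that were numerically close to zero are now exactly zero,
# while ordinary values are unchanged.
println(rounded)
```

Whether the digit count should be hardcoded or exposed as an option is a separate design question.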

0-1 normalization

Implement 0-1 normalization as an option in run_clust().
This would entail a change in the naming of the ClustInputData struct:
All normalization methods are of the form (A-x)/y, where x is \mu and y is \sigma for z-normalization, and x is min(A) and y is max(A)-min(A) for 0-1 normalization.
Thus, we need to replace mean and sdv with more generic terms.
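The shared form (A-x)/y can be sketched as follows. The function name normalize_generic and the argument names shift and scale are hypothetical placeholders for the "more generic terms"; they are not names from the package. Note that std here uses Julia's default (corrected) sample standard deviation, which may or may not match what the package does.

```julia
using Statistics

# Generic normalization (A - x) / y; `shift` plays the role of x, `scale` of y.
# Both names are hypothetical.
normalize_generic(A, shift, scale) = (A .- shift) ./ scale

A = [2.0, 4.0, 6.0, 8.0]

# z-normalization: x = mean(A), y = std(A)
z = normalize_generic(A, mean(A), std(A))

# 0-1 normalization: x = min(A), y = max(A) - min(A)
u = normalize_generic(A, minimum(A), maximum(A) - minimum(A))

# Undoing either normalization only needs the stored shift and scale.
A_back = u .* (maximum(A) - minimum(A)) .+ minimum(A)
```

Storing just the two generic quantities would let the same de-normalization code serve both methods.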

Tests without Gurobi

To test our package with Travis or a similar service, we would need to make the tests independent of Gurobi.

OptResult - totaldemand

The OptResult struct contains a field total_demand. Replace it with a generic Dict that can contain any data_info.

Automatic testing

Implement automatic testing for clustering methods and extreme value selection.

Import Your Own Data

Hi,
I tried importing my data via your instructions and when I run these lines:

my_path=joinpath(homedir(),"Documents","tutorial","TimeClustInput.csv")
your_data_1=load_timeseries_data(my_path; region="none", T=24)

I get this error:
LoadError: MethodError: objects of type Bool are not callable

Sorry for the basic question!
Dan

Extreme Days

Include extreme days:

Add documentation on testing

Add short documentation on testing in test/runtest.jl to make future additions to the set of tests easier for other users.

Unable to implement example

Hello,

I am trying to test the ClustForOpt.jl software, however when I try running a variation of the example provided in the readme -

using ClustForOpt
ts_input_data = load_timeseries_data("DAM", "GER")

I get the following error:
MethodError: no method matching load_timeseries_data(::String, ::String)
Closest candidates are:
load_timeseries_data(::Any; region, T, years, att) at /home/nvidia/.julia/packages/ClustForOpt/YNrmS/src/utils/load_data.jl:19

Stacktrace:
[1] top-level scope at In[38]:2
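The candidate method listed in the error takes one positional argument followed by keyword arguments (region, T, years, att), so a second positional string cannot match it. The sketch below reproduces that pattern with a hypothetical stand-in function, load_stub, whose defaults are invented for illustration; it does not load any data and is not the package's implementation.

```julia
# Hypothetical stand-in with the same argument layout as the candidate method:
# one positional argument, then keyword arguments after the semicolon.
function load_stub(data_path; region="none", T=24, years=[2016], att=String[])
    return (data_path=data_path, region=region, T=T)
end

# Passing the region positionally finds no matching method.
err = try
    load_stub("DAM", "GER")
catch e
    e
end
println(typeof(err))

# Passing it as a keyword matches the method.
ok = load_stub("DAM"; region="GER")
println(ok.region)
```

By analogy, a call of the form load_timeseries_data("DAM"; region="GER") would at least match the listed signature; whether that data set resolves correctly is a separate question.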

transfer n_digits_data_round to CapacityExpansion.jl

In run_clust.jl

I think the data rounding is an issue that comes up in the CEP, so it should be handled there. There may be more generic applications of ClustForOpt where this hardcoded value adds to complexity.
I am currently rewriting parts of run_clust.jl. After I push to dev, let's aim at this issue.

ClustConfig?

What do you think about ClustConfig like:

"""
        ClustConfig{method::String
        representation::String
        n_clust::Number
        n_init::Number
        n_seg::Number
        iterations::Number
        norm_op::String
        norm_scope::String
        attribute_weights::Dict{String,Float64}}
Collection of cluster configuration
"""
struct ClustConfig
        method::String
        representation::String
        n_clust::Number
        n_init::Number
        n_seg::Number
        iterations::Number
        norm_op::String
        norm_scope::String
        attribute_weights::Dict{String,Float64}
end

Save Clustering Results

As an intermediate step before the optimization, implement saving ClustResults as jld2. The save= keyword argument is already provided.

Documentation

Created branch dev.
Todos on master:

  • Documentation clustering
  • Documentation CEP
  • Update examples on master to use module

clust_data undefined dictionary entries

On master:

clust_data = load_timeseries_data(:CEP_GER1)
clust_data.data.keys

Leads to the following output:

 #undef                
 #undef                
    "solar-germany"    
 #undef                
    "wind-germany"     
 #undef                
 #undef                
 #undef                
 #undef                
 #undef                
 #undef                
 #undef                
 #undef                
 #undef                
 #undef                
    "el_demand-germany"

This does not seem to be an issue in any of the current implementations, but if we want to call keys (e.g. I am trying to find extremes among all wind periods), the #undef entries give errors.
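One likely explanation: clust_data.data.keys reads the internal keys field of a Julia Dict, which is the underlying slot array and contains unassigned positions, whereas the keys(...) function iterates only the stored keys. A minimal sketch on a small stand-in dictionary (the numeric values are made up):

```julia
# Stand-in for clust_data.data; the values are invented for illustration.
d = Dict("solar-germany"     => [0.1, 0.2],
         "wind-germany"      => [0.3, 0.4],
         "el_demand-germany" => [1.0, 2.0])

# keys(d) iterates only the stored keys, with no #undef entries.
all_keys = collect(keys(d))

# Example: select every attribute whose name contains "wind".
wind_keys = [k for k in keys(d) if occursin("wind", k)]
println(wind_keys)
```

Iteration order of a Dict is not guaranteed, so code that needs a fixed order should sort the collected keys.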

Update to v1.0.1 - findmin and find

From documentation:

findmin, findmax, argmin, and argmax used to always return linear indices. They now return CartesianIndexes for all but 1-d arrays, and in general return the keys of indexed collections (e.g. dictionaries) (#22907).

find has been renamed to findall. findall, findfirst, findlast, findnext now take and/or return the same type of indices as keys/pairs for AbstractArray, AbstractDict, AbstractString, Tuple and NamedTuple objects (#24774, #25545). In particular, this means that they use CartesianIndex objects for matrices and higher-dimensional arrays instead of linear indices as was previously the case. Use LinearIndices(a)[findall(f, a)] and similar constructs to compute linear indices.

We should double-check whether we use any of these and update accordingly. I will do that.
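For reference while checking, here is a small sketch of the changed behaviour on a matrix, including the LinearIndices construct mentioned in the release notes. The matrix values are arbitrary.

```julia
A = [3.0 1.0;
     0.5 2.0]

# findmin on a matrix returns the value and a CartesianIndex.
val, idx = findmin(A)
println(val, " at ", idx)

# Convert to a linear (column-major) index where old code expects one.
lin = LinearIndices(A)[idx]

# findall (formerly find) also returns CartesianIndex objects for matrices.
found = findall(x -> x > 1.5, A)
lin_found = LinearIndices(A)[found]
println(lin_found)
```

For 1-d arrays these functions still return plain integer indices, so only code operating on matrices or higher-dimensional arrays should need changes.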

OptVariable field with set name

Would it make sense to add a fourth field to OptVariable: sets?
Axes gives the values within each set, but I feel it would be useful to have something that names the sets explicitly. For example, I want to find the maximum slack variable and get the corresponding day (set K). Right now I have to hardcode the number of the set within axes (note for myself: get_index_inf(::Array{OptVariable}) ).

Maybe call the field axes_name.

Putting this out here so we don't forget over the break.
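A rough sketch of how a set-name field could remove the hardcoded dimension number. NamedVariable and label_of_max are hypothetical stand-ins written for this illustration; the real OptVariable layout may differ, and only the field name axes_names comes from the suggestion above.

```julia
# Hypothetical stand-in for an OptVariable-like container.
struct NamedVariable
    data::Array{Float64}              # variable values
    axes::Vector{Vector{String}}      # labels within each set, one vector per dimension
    axes_names::Vector{String}        # name of the set behind each dimension
end

# Return the label, within the named set, at which the maximum value occurs.
function label_of_max(v::NamedVariable, set_name::String)
    dim = findfirst(==(set_name), v.axes_names)
    dim === nothing && throw(ArgumentError("unknown set $set_name"))
    _, idx = findmax(v.data)          # CartesianIndex for matrices
    return v.axes[dim][idx[dim]]
end

# Made-up slack values over two nodes and three days (set K).
slack = NamedVariable([0.0 0.2 0.1; 0.0 0.9 0.3],
                      [["n1", "n2"], ["k1", "k2", "k3"]],
                      ["nodes", "K"])

day = label_of_max(slack, "K")
println(day)
```

Looking up the dimension by name keeps calling code valid even if the order of sets in axes changes later.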

Master branch not working

I am trying to test PR #22 and noticed that even the current master branch on holgerteichgraeber/ClustForOpt_priv.jl is not working for any of the three examples we have in the examples folder:

  • _bat.jl - somehow kwargs was taken out of run_clust() in the last commit. That must have slipped through my fingers; it should still be in there. Even when I add it back, it gives an error with gurobi_env. Any idea why?
  • _attribute_weighting.jl : example is linked to ClustForOpt but should be linked to _priv.
  • workflow_example.jl : This one is giving me an error in the load_data function.

Could you please check and fix these, @YoungFaithful? Thank you.
When you have a look, please branch off of holgerteichgraeber/ClustForOpt_priv.jl - master without any other changes. Let's try to get that one running first.

kmeansexact optimizer type specification

YoungFaithful on Jul 4 Collaborator

Can we not use:
kmexact_optimizer::DataType=DataType or something else defining the Type?
@holgerteichgraeber

Not sure what you are trying to point to?
The issue we had was that the optimizer object itself does not have a specific type. It's basically Any.
@YoungFaithful
YoungFaithful 21 hours ago Collaborator

I think that changed with the new JuMP update (https://github.com/YoungFaithful/CapacityExpansion.jl/blob/d23ffab3f915d6f02e128774b6d2ae82d1c358f0/src/optim_problems/run_opt.jl#L86)

OptVariable deprecated call

It seems like some elements got lost when merging the last PR. I tested everything on my machine before pushing, and now I get multiple issues due to older code parts. I don't know how this could have happened.

Implement kshape.jl

Requires rewriting kshape.py in Julia.
It may be appropriate to include it in Clustering.jl.

bug - workflow example - OptVariable not defined

Somehow the workflow examples throw an error for me, see below. Do you get the same @YoungFaithful ? OptVariable is defined in datastructs.jl, I can't immediately see why this occurs.

julia> include("workflow_example_attribute_weighting.jl")
ERROR: LoadError: LoadError: LoadError: UndefVarError: OptVariable not defined
Stacktrace:
 [1] top-level scope at none:0
 [2] include at ./boot.jl:317 [inlined]
 [3] include_relative(::Module, ::String) at ./loading.jl:1044
 [4] include(::Module, ::String) at ./sysimg.jl:29
 [5] include(::String) at ./client.jl:392
 [6] top-level scope at none:0
 [7] include at ./boot.jl:317 [inlined]
 [8] include_relative(::Module, ::String) at ./loading.jl:1044
 [9] include(::Module, ::String) at ./sysimg.jl:29
 [10] include(::String) at ./client.jl:392
 [11] top-level scope at none:0
 [12] include at ./boot.jl:317 [inlined]
 [13] include_relative(::Module, ::String) at ./loading.jl:1044
 [14] include(::Module, ::String) at ./sysimg.jl:29
 [15] include(::String) at ./client.jl:392
 [16] top-level scope at none:0
in expression starting at /home/holger/.julia/dev/ClustForOpt/src/utils/datastructs.jl:80
in expression starting at /home/holger/.julia/dev/ClustForOpt/src/ClustForOpt_priv_development.jl:30
in expression starting at /home/holger/.julia/dev/ClustForOpt/examples/workflow_example_attribute_weighting.jl:1

get_cep_slack_variables()

get_cep_slack_variables() currently gets variables ["SLACK"]. I think it would be an enhancement to have it catch all variables that have variable_type "sv". What do you think?

Testing fails due to job killed by travis

Testing sometimes fails. The error is something like /home/travis/.travis/functions: line 104: 3596 Killed. This seems to be an issue with memory on travis.

Todo is to rewrite the tests so that they use less memory. Potentially make the kmedoids-exact cbc case smaller, and also reduce the amount of data that is loaded.

Workaround for now is to just rerun the build on travis, so far it has worked the second time.

Renaming the package from ClustForOpt to TimeSeriesClustering

We are currently in the process of renaming the package. It is unclear if it has to be re-registered, which would require everyone to add it again to their julia package list: JuliaRegistries/General#2770

@mleprovost I saw that you forked the package, appreciated! Because of the rename, there is a chance of hiccups, but they should be resolved in the next 2-3 days.
In the meantime, you should be able to use the package under its old name as ClustForOpt (just do using ClustForOpt) as is.

Best_ids as k_ids into ClustData

Seasonal storage is integrated in the alpha phase by passing best_ids "manually" through the functions; this needs to be improved. And it's super exciting! I wonder if you should penalize the loading and unloading of the battery rather than the installed capacity (because a battery's lifetime is 6,000 cycles, not 25 years regardless of use).
It would be a much nicer workflow for the Opt part, if best_ids would be included in ClustData - What's your opinion on that one?

Loading GER_18 data errs

When I do

data_path=normpath(joinpath(dirname(@__FILE__),"..","data","TS_GER_18"))
ts_input_data = load_timeseries_data(data_path; T=24, years=[2016])

I get the following error:

┌ Error: The time_series dena21 has K=0 != K=366 of the previous
└ @ ClustForOpt ~/.julia/dev/ClustForOpt/src/utils/load_data.jl:80
ERROR: LoadError: BoundsError: attempt to access 0-element Array{Float64,1} at index [1:8784]
Stacktrace:
 [1] throw_boundserror(::Array{Float64,1}, ::Tuple{UnitRange{Int64}}) at ./abstractarray.jl:484
 [2] checkbounds at ./abstractarray.jl:449 [inlined]
 [3] getindex(::Array{Float64,1}, ::UnitRange{Int64}) at ./array.jl:735
 [4] #add_timeseries_data!#13(::Int64, ::Int64, ::Array{Int64,1}, ::Function, ::Dict{String,Array}, ::SubString{String}, ::DataFrames.DataFrame) at /home/holger/.julia/dev/ClustForOpt/src/utils/load_data.jl:84
 [5] #add_timeseries_data! at /home/holger/.julia/dev/ClustForOpt/src/utils/load_data.jl:0 [inlined]
 [6] #add_timeseries_data!#12(::Int64, ::Int64, ::Array{Int64,1}, ::Function, ::Dict{String,Array}, ::SubString{String}, ::String) at /home/holger/.julia/dev/ClustForOpt/src/utils/load_data.jl:58
 [7] (::getfield(ClustForOpt, Symbol("#kw##add_timeseries_data!")))(::NamedTuple{(:K, :T, :years),Tuple{Int64,Int64,Array{Int64,1}}}, ::typeof(ClustForOpt.add_timeseries_data!), ::Dict{String,Array}, ::SubString{String}, ::String) at ./none:0
 [8] #load_timeseries_data#11(::String, ::Int64, ::Array{Int64,1}, ::Array{String,1}, ::Function, ::String) at /home/holger/.julia/dev/ClustForOpt/src/utils/load_data.jl:29
 [9] (::getfield(ClustForOpt, Symbol("#kw##load_timeseries_data")))(::NamedTuple{(:T, :years),Tuple{Int64,Array{Int64,1}}}, ::typeof(load_timeseries_data), ::String) at ./none:0
 [10] top-level scope at none:0

@YoungFaithful do you get the same error on your computer?

When I test the ClustForOpt version #84 in CapacityExpansion, the transmission cases give errors in the test run (which are the only ones that load GER_18 data), this may be one reason.

intra vs. inter -day storage

I've been thinking about naming. Currently, storage can take values "intra" and "inter". I think that is very explicit and I like it. However, I see a challenge for users who are not too familiar with the model and may confuse inter and intra (it just happened to me, even though I thought I knew better...).
Maybe naming it storage=simple/seasonal or similar would be more explicit?
This is nothing urgent, but something we should consider before we publish.

Implement dbaclust()

  • Update TimeWarp.jl
  • Then implement in our package.

Open questions:

  • How to implement parallelization in our framework
  • How to deal with multiple attributes/nodes
