xkdr / tsframes.jl
Timeseries in Julia
License: MIT License
TS.apply() fails to correctly resample weekly data. Consider the following dataset:
julia> ts_daily_1
(360 x 1) TS with Date Index
Index data
Date Float64
───────────────────────
2007-01-01 -0.790931
2007-01-02 1.45561
2007-01-03 -0.496326
2007-01-04 -2.00011
⋮ ⋮
2007-12-23 0.459265
2007-12-24 0.744704
2007-12-25 0.583233
2007-12-26 0.104833
352 rows omitted
The following operation outputs an incorrect result:
julia> ts_weekly = apply(ts_daily_1, Dates.Week(12), first)
(5 x 1) TS with Date Index
Index data_first
Date Float64
────────────────────────
2007-01-01 -0.790931
2007-01-29 -1.81454
2007-04-23 -0.427515
2007-07-16 -0.678125
2007-10-08 -0.236422
Instead of resampling into 12-week buckets throughout, the first bucket spans only 4 weeks (2007-01-01 to 2007-01-29).
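For comparison, here is a minimal sketch of what fixed 12-week bucketing could look like, using only the Dates standard library. It assumes buckets are counted from the first date of the series, which may not be what apply() intends. The helper name bucket_starts is hypothetical and not part of TSx.

```julia
using Dates

# Hypothetical helper (not part of TSx): label every date with the start of
# its fixed-width bucket, counting buckets from the first date in the series.
# Assumes `dates` is sorted in ascending order.
function bucket_starts(dates::AbstractVector{Date}, period::Week)
    origin = first(dates)
    width = Dates.value(period) * 7    # bucket width in days
    return [origin + Day(div(Dates.value(d - origin), width) * width) for d in dates]
end

# Same span as ts_daily_1: 360 daily observations starting 2007-01-01
dates = collect(Date(2007, 1, 1):Day(1):Date(2007, 12, 26))
starts = unique(bucket_starts(dates, Week(12)))
```

Under this assumption every bucket except possibly the last one covers exactly 84 days, so the second bucket starts on 2007-03-26 rather than 2007-01-29.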
I think it is not wise to put data into a repository, although it is pretty small.
When a TS object is viewed, if ts.index != 1 then the Index is not the first column displayed. If the TS contains a large enough number of columns, then the index is not displayed at all in the terminal.
The function Base.getindex(), when accepting arguments of the form (ts, row::T) where {T<:TimeType} or (ts, row::AbstractVector{T}) where {T<:TimeType}, only returns the first value corresponding to the given date. This seems like incorrect behaviour considering that the TS object is designed to allow duplicate indices.
julia> ts
(20 x 1) TS with Date Index
Index data
Date Int64
───────────────────
2008-01-01 1
2008-01-01 2
2008-01-01 3
2008-01-01 4
⋮ ⋮
2008-01-01 17
2008-01-01 18
2008-01-01 19
2008-01-01 20
12 rows omitted
julia> ts[Date(2008,1,1), :data]
1
Ideally it should return
julia> ts[Date(2008,1,1), :data]
20-element Vector{Int64}:
1
2
3
4
5
6
⋮
16
17
18
19
20
Consider the function
function Base.getindex(ts::TS, dt::T, j::AbstractVector{Int}) where {T<:TimeType}
idx = findfirst(x -> x == dt, index(ts))
ts[idx, j]
end
and replace the findfirst with findall, which returns a vector of the positions of all instances of the date in the index.
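The difference between the two lookups can be seen on plain vectors, without TSx. This is only a sketch: index and data stand in for the TS index and the data column from the example above.

```julia
using Dates

# Stand-ins for the TS index and its data column: 20 rows, all on the same date
index = fill(Date(2008, 1, 1), 20)
data = collect(1:20)
dt = Date(2008, 1, 1)

# findfirst stops at the first match, so only one row is returned
first_only = data[findfirst(==(dt), index)]

# findall returns every matching position, so duplicates are preserved
all_rows = data[findall(==(dt), index)]
```

One design point to consider: with findall the result is always a vector, even when the date occurs only once.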
tail(ts::TS) should return the last 10 rows of ts.
DataFrames.jl already supports the Tables.jl interfaces; these need to be implemented and then exported in TSx.
Currently, log is automatically broadcast. It's best not to do this and just let users call log.(ts) instead, since special-casing log doesn't cover other transformations, for instance when a user wants to take sqrt(ts).
In-place renaming of columns while protecting the Index column. Also, the requested names argument should not already contain a column named Index.
Presently getindex(ts::TS, ::Dates.Date) is implemented, which returns the corresponding rows of the TS object.
The interface of DataFrames.jl is also much richer than only renaming by a single vector. https://dataframes.juliadata.org/stable/lib/functions/#DataFrames.rename
Originally posted by @ValentinKaisermayer in #54 (comment)
The function name gets added to the column names.
julia> ts_monthly = apply(ts, Month, last)
(15 x 1) TS with Date Index
Index value_last
Date Float64
────────────────────────
2007-01-01 10.5902
2007-02-01 8.85252
2007-03-01 8.85252
2007-04-01 9.04647
2007-05-01 9.04647
2007-06-01 8.26072
2007-07-01 8.26072
2007-08-01 8.26072
2007-09-01 9.95546
2007-10-01 9.95546
2007-11-01 7.88032
2007-12-01 7.88032
2008-01-01 7.88032
2008-02-01 10.6328
2008-03-01 8.85252
Can we have a parameter to turn this off?
In my case, I had to do:
rename!(ts.coredata, replace.(names(ts.coredata), "_last" => ""))
At the end and construct the ts object again.
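A slightly safer variant of this workaround anchors the pattern to the end of the name, so that a "_last" in the middle of a column name is left alone. This is a sketch on plain strings; the column names are made up for illustration.

```julia
# Made-up column names as they might come out of apply(ts, Month, last)
colnames = ["value_last", "at_last_seen_last"]

# A plain string pattern removes every occurrence of "_last"
everywhere = replace.(colnames, "_last" => "")

# Anchoring with a regex removes only the trailing suffix
suffix_only = replace.(colnames, r"_last$" => "")
```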
Since apply is really aggregation or downsampling, it would be nice to supply some way of applying a function to a column which would keep the index intact.
ts = TS(rand(10))
sin.(ts[:x1]) .+ 2
I think it is already possible with rollapply, but this is kind of abusing the API:
rollapply(x->sin(x) + 2, ts, :x1, 1)
Since the TS object can be manipulated externally, we need to perform consistency checks to make sure certain conditions are not violated before any expensive operation (such as apply(), rollapply(), or even print()).
Conditions which should be checked:
I think this is one of the most important methods for time series data: being able to interpolate and aggregate.
I like the interface of Grafana, i.e. being able to specify not only the interpolation or the aggregation method but both at the same time.
This is useful if you have measurement data at e.g. about 5min intervals but with some holes in it and want to get a clean vector with an equidistant sample time of 15min. Where there is good data it has to be aggregated, and where there are holes it has to be interpolated.
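The 5min to 15min case above could be sketched roughly as follows, using only the standard library. Mean aggregation and linear interpolation are picked here purely as examples; the function names aggregate_mean and fill_linear are hypothetical, and the sample data is made up.

```julia
using Dates, Statistics

# Mean of the samples falling into each [t, t + step) bucket of the grid;
# a bucket without samples stays `missing`.
function aggregate_mean(times, values, grid, step)
    out = Vector{Union{Missing, Float64}}(missing, length(grid))
    for (i, t) in enumerate(grid)
        inside = [v for (ti, v) in zip(times, values) if t <= ti < t + step]
        isempty(inside) || (out[i] = mean(inside))
    end
    return out
end

# Fill interior `missing` entries by linear interpolation between the nearest
# known neighbours. Assumes the grid is equidistant, so positions can be used
# in place of timestamps.
function fill_linear(y)
    filled = collect(Union{Missing, Float64}, y)
    known = findall(!ismissing, y)
    for (a, b) in zip(known[1:end-1], known[2:end])
        for i in (a + 1):(b - 1)
            filled[i] = y[a] + (y[b] - y[a]) * (i - a) / (b - a)
        end
    end
    return filled
end

# Made-up measurements every 5 minutes with a hole between 00:10 and 00:45
t0 = DateTime(2022, 1, 1)
times = t0 .+ Minute.([0, 5, 10, 45, 50, 55])
values = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]

grid = collect(t0:Minute(15):(t0 + Minute(45)))
aggregated = aggregate_mean(times, values, grid, Minute(15))
resampled = fill_linear(aggregated)
```

Buckets with data are aggregated first, and only the empty buckets are interpolated afterwards, which matches the "both at the same time" idea.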
For interpolation, common methods would be
And for aggregation
Currently, TSx.join() can only merge two TS objects. DataFrames.innerjoin() et al. do support joining two or more objects, so it should be possible to replicate that behaviour in the TSx join methods.
Currently, there are two methods for rename!() and both of them contain the same function body. One of them should call the other to reduce the cost of maintaining two methods.
rename!(ts::TS, colnames::AbstractVector{String})
rename!(ts::TS, colnames::AbstractVector{Symbol})
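One possible shape for this, sketched on a toy type rather than on TS itself: the Symbol method holds the logic and the String method only converts and forwards. MiniTS and rename_cols! are hypothetical names, and the check against Index is an assumption about what protecting the index should mean.

```julia
# Toy stand-in for TS that only tracks non-index column names
mutable struct MiniTS
    colnames::Vector{Symbol}
end

# The single method that holds the actual logic
function rename_cols!(ts::MiniTS, colnames::AbstractVector{Symbol})
    length(colnames) == length(ts.colnames) ||
        throw(ArgumentError("expected $(length(ts.colnames)) column names"))
    :Index in colnames &&
        throw(ArgumentError("Index is reserved and cannot be used as a column name"))
    ts.colnames = collect(colnames)
    return ts
end

# The String method converts and delegates, so there is one body to maintain
rename_cols!(ts::MiniTS, colnames::AbstractVector{<:AbstractString}) =
    rename_cols!(ts, Symbol.(colnames))

ts = MiniTS([:x1, :x2])
rename_cols!(ts, ["open", "close"])
```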
It is currently not possible to run multiple regressions using rollapply() because RollingFunctions.jl does not support rolling over tables.
Example code:
function regress(data)
ll = lm(@formula(inrchf ~ usdchf + eurchf + gbpchf + jpychf), data)
co = coef(ll)[coefnames(ll) .== "usdchf"]
sd = Statistics.std(residuals(ll))
return Dict("coeff" => co, "sd" => sd)
end
rollapply(regress, returns, 200) # doesn't work
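As a rough idea of what rolling over a table could mean, here is a sketch that slides a window over the rows of a NamedTuple of columns and hands each window to a function. It does not use RollingFunctions.jl or GLM; rolling_rows is a hypothetical name, and the ratio function is only a placeholder for something like regress.

```julia
# Hypothetical helper: call `fun` on every window of `window` consecutive rows.
# `table` is a NamedTuple of equally long column vectors; each window is a
# NamedTuple with the same column names holding views into the columns.
function rolling_rows(fun, table::NamedTuple, window::Int)
    n = length(first(table))
    return [fun(map(col -> view(col, (i - window + 1):i), table)) for i in window:n]
end

# Made-up data where y is exactly 2x
table = (x = [1.0, 2.0, 3.0, 4.0], y = [2.0, 4.0, 6.0, 8.0])

# Placeholder for a per-window model fit: ratio of column sums
ratio(w) = sum(w.y) / sum(w.x)

result = rolling_rows(ratio, table, 2)
```

Because the window function receives all columns at once, it can return any object (a Dict as in regress, for example), at the cost of the result no longer being a plain numeric column.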
TSx.describe() should accept column names just like DataFrames.describe().
#47 added code for having colnames as a keyword argument for the constructors. The documentation and test cases need to be updated to reflect this change.
To locate points of Index given a particular frequency.
Interface to be similar to R xts:
endpoints(ts::TS, on::Union{String, Symbol}, k::Int=1)
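A sketch of the idea on a plain vector of dates follows. Unlike the proposed signature, it takes a labelling function such as Dates.yearmonth instead of on::Union{String, Symbol}; that is an assumption made to keep the example short. The name endpoints_sketch is hypothetical, and the sketch does not reproduce every detail of the xts behaviour.

```julia
using Dates

# Row positions of the last observation in every k-th period. A new period
# starts wherever `label` changes value. The final row is always included.
# Assumes `dates` is sorted in ascending order.
function endpoints_sketch(dates::AbstractVector{<:TimeType}, label::Function, k::Int=1)
    n = length(dates)
    n == 0 && return Int[]
    labels = label.(dates)
    ends = [i for i in 1:n if i == n || labels[i] != labels[i + 1]]
    picked = ends[k:k:end]
    (isempty(picked) || last(picked) != n) && push!(picked, n)
    return picked
end

# Daily dates covering January, February and half of March 2007
dates = collect(Date(2007, 1, 1):Day(1):Date(2007, 3, 15))
monthly = endpoints_sketch(dates, Dates.yearmonth)
bimonthly = endpoints_sketch(dates, Dates.yearmonth, 2)
```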
We should be able to make a TS object into a TimeArray whenever possible, and vice versa.
For example:
using Dates
dates = collect(Date(2008):Year(1):Date(2010))
ts = TS(1:3, dates)
TimeArray(ts) # Doesn't work
Perhaps TimeArray(ts::TS) can be put in TimeSeries.jl and TS(ts::TimeArray) can be in TSx.
This requires creating a getindex method which takes Vector{Date} as input.
Output should be like:
julia> dates = [Date(2007, 1, 1), Date(2007, 2, 1)]
julia> ts[dates]
(2 x 1) TS with Date Index
Index value
Date Float64
─────────────────────
2007-01-01 10.8087
2007-02-01 8.7392
One should be able to call TS() to return a TS object containing 0x0 data. Currently, DataFrames.jl supports:
julia> DataFrame()
0×0 DataFrame
TSx.plot() should support scatter plots. Plots package provides ScatterPlot function, TSx should be able to make use of it.
first() should return only the first row of the object. head() should return the first 10 rows of the object.
Add more functionality to subset to allow
subset(ts, :, Date(2007, 2, 1))
# or
subset(ts, Date(2007, 2, 1), :)
Otherwise the user has to know the first or last index of the time series.
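One way the colon could be interpreted is as "open on this side", resolved via dispatch on Colon. The sketch below works on a plain date vector; subset_rows, lower_bound and upper_bound are hypothetical names, and inclusive bounds are an assumption.

```julia
using Dates

# A colon means "no bound on this side"; anything else is used as given
lower_bound(::Colon, index) = first(index)
lower_bound(d, index) = d
upper_bound(::Colon, index) = last(index)
upper_bound(d, index) = d

# Positions of all rows whose date lies in [from, to], bounds inclusive
function subset_rows(index::AbstractVector{Date}, from, to)
    lo, hi = lower_bound(from, index), upper_bound(to, index)
    return findall(d -> lo <= d <= hi, index)
end

index = collect(Date(2007, 1, 1):Month(1):Date(2007, 6, 1))
up_to_feb = subset_rows(index, :, Date(2007, 2, 1))
from_feb = subset_rows(index, Date(2007, 2, 1), :)
```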
Log from CI build:
getindex(): Test Failed at /home/runner/work/TSx.jl/TSx.jl/test/getindex.jl:36
Expression: unique(Dates.yearmonth.(TSx.index(ts[y, m]))) == [(2007, 3)]
Evaluated: Tuple{Int64, Int64}[] == [(2007, 3)]
Stacktrace:
[1] top-level scope
@ /opt/hostedtoolcache/julia/1.7.3/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:445
[2] include(fname::String)
@ Base.MainInclude ./client.jl:451
[3] macro expansion
@ ~/work/TSx.jl/TSx.jl/test/runtests.jl:15 [inlined]
[4] macro expansion
@ /opt/hostedtoolcache/julia/1.7.3/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
[5] top-level scope
@ ~/work/TSx.jl/TSx.jl/test/runtests.jl:15
getindex(): Error During Test at /home/runner/work/TSx.jl/TSx.jl/test/runtests.jl:14
Got exception outside of a @test
LoadError: MethodError: no method matching test_types(::Float64)
Closest candidates are:
test_types(!Matched::TS) at ~/work/TSx.jl/TSx.jl/test/getindex.jl:3
Stacktrace:
[1] top-level scope
@ ~/work/TSx.jl/TSx.jl/test/getindex.jl:47
[2] include(fname::String)
@ Base.MainInclude ./client.jl:451
[3] macro expansion
@ ~/work/TSx.jl/TSx.jl/test/runtests.jl:15 [inlined]
[4] macro expansion
@ /opt/hostedtoolcache/julia/1.7.3/x64/share/julia/stdlib/v1.7/Test/src/Test.jl:1283 [inlined]
[5] top-level scope
@ ~/work/TSx.jl/TSx.jl/test/runtests.jl:15
[6] include(fname::String)
@ Base.MainInclude ./client.jl:451
[7] top-level scope
@ none:6
[8] eval
@ ./boot.jl:373 [inlined]
[9] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:268
[10] _start()
@ Base ./client.jl:495
in expression starting at /home/runner/work/TSx.jl/TSx.jl/test/getindex.jl:47
For the code committed in 82189a7.
The fun argument may be a function which only works for scalars or only works for vectors (AbstractVector). The implementation of apply() needs to figure out whether to use the broadcasting operator or not. Or, perhaps create different apply() methods for both cases.
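Detecting this automatically is hard in general, so one option is to let the caller say which kind of function is being passed. The sketch below uses a keyword for that; apply_column and elementwise are hypothetical names, not part of the apply() API.

```julia
# Hypothetical helper: the caller states whether `fun` is a scalar function
# (broadcast over the values) or a function that consumes the whole vector.
function apply_column(fun, values::AbstractVector; elementwise::Bool=false)
    return elementwise ? fun.(values) : fun(values)
end

total = apply_column(sum, [1, 2, 3])                         # vector function
roots = apply_column(sqrt, [1.0, 4.0]; elementwise=true)     # scalar function
```

Two separate methods would work just as well; the keyword simply keeps both cases behind one name.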
It would be nice to be able to specify column names:
https://github.com/xKDR/TSx.jl/blob/cb8c16c5dc943df0523e5b0db400d687dc19c59d/src/TS.jl#L258
https://github.com/xKDR/TSx.jl/blob/cb8c16c5dc943df0523e5b0db400d687dc19c59d/src/TS.jl#L271
https://github.com/xKDR/TSx.jl/blob/cb8c16c5dc943df0523e5b0db400d687dc19c59d/src/TS.jl#L277
This will do: CSV.read(filename::String, DataFrame) |> TS internally. Think about supporting XLS.
Native log function (#8) along with diff will be used for computing log returns.
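On a plain vector the computation looks like this. It is a sketch with made-up prices, using Base's log and diff rather than the TSx methods.

```julia
# Made-up price series growing by 10% per step
prices = [100.0, 110.0, 121.0]

# Log return at step t is log(p_t) - log(p_{t-1}), i.e. diff of log prices
log_returns = diff(log.(prices))
```

The result has one element fewer than the input, so a TS version has to decide how to treat the first row.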
This package probably needs to be renamed if at some point it will be registered. I am quite certain that 3 letter package names are discouraged in the General registry.
https://pkgdocs.julialang.org/v1/creating-packages/#Package-naming-guidelines
Dates.DateTime can only hold resolutions up to Millisecond, whereas Time can hold times with up to Nanosecond precision. Many use cases require Nanosecond precision along with dates (e.g. 2012-07-03T16:17:49.100200999).
TimeDates.TimeDate provides this capability.
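The difference in resolution can be seen with the standard library alone. This sketch uses the timestamp from the example above and does not use TimeDates.

```julia
using Dates

# DateTime: the finest component its constructor accepts is the millisecond
dt = DateTime(2012, 7, 3, 16, 17, 49, 100)

# Time: also carries microseconds and nanoseconds, but has no date part
t = Time(16, 17, 49, 100, 200, 999)

ms = Dates.millisecond(dt)
us = Dates.microsecond(t)
ns = Dates.nanosecond(t)
```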
isregular(ts::TS)
isregular(ts::TS, unit::T) where {T<:Dates.Period}
isregular(timestamps::T) where {T<:AbstractVector{<:TimeType}}
isregular(timestamps::V, unit::T) where {V<:AbstractVector{<:TimeType}, T<:Dates.Period}
<: TimeType is provided, the input will be checked using rule 1 with the period provided. Ref: #48
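A sketch of the two vector methods listed above follows. It assumes that "regular" means all consecutive differences are equal (or equal to the given unit); this is a guess, since the rules themselves are not spelled out here. The name isregular_sketch is hypothetical.

```julia
using Dates

# Assumed rule: a series is regular when all consecutive gaps are identical.
# Series with fewer than two timestamps are treated as regular.
function isregular_sketch(timestamps::AbstractVector{<:TimeType})
    gaps = diff(timestamps)
    return isempty(gaps) || all(==(first(gaps)), gaps)
end

# Same check, but every gap must equal the period supplied by the caller
function isregular_sketch(timestamps::AbstractVector{<:TimeType}, unit::Period)
    return all(==(unit), diff(timestamps))
end

daily = collect(Date(2022, 1, 1):Day(1):Date(2022, 1, 10))
gappy = [Date(2022, 1, 1), Date(2022, 1, 2), Date(2022, 1, 5)]
```

Calendar units such as Month do not map to a fixed number of days, so a real implementation would need a separate rule for them.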
julia> ts[Year(2022), Month(8), Week(1)]
(0 x 1) TS with Date Index
But, the following returns correct output:
julia> ts[Year(2022), Month(8), Week(32)]
(7 x 1) TS with Date Index
Index x1
Date Float64
───────────────────────
2022-08-08 0.647277
2022-08-09 0.800605
2022-08-10 0.698464
2022-08-11 0.868943
2022-08-12 0.510194
2022-08-13 2.4704
2022-08-14 -0.86813
This is because Dates.week() returns the $N^{th}$ week of the year and not of the month. This getindex() method should just take year and week as arguments so that the functionality is clear.
function getindex(ts::TS, y::Year, w::Week)
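The behaviour of Dates.week() and a year/week lookup can be checked on a plain vector of dates. This is a sketch; rows_in_week is a hypothetical name standing in for the proposed getindex method.

```julia
using Dates

# Hypothetical stand-in for getindex(ts, y::Year, w::Week): keep the dates
# whose calendar year and ISO week number match the arguments.
function rows_in_week(dates::AbstractVector{Date}, y::Year, w::Week)
    return filter(d -> Dates.year(d) == Dates.value(y) &&
                       Dates.week(d) == Dates.value(w), dates)
end

dates = collect(Date(2022, 1, 1):Day(1):Date(2022, 12, 31))
wk32 = rows_in_week(dates, Year(2022), Week(32))
first_week_of_august = Dates.week(Date(2022, 8, 1))
```

Note that Dates.week() follows ISO week numbering, so the first days of January can belong to the last week of the previous year; matching on Dates.year() as done here would not pick those up.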
The function should pass the following test:
using Test
ts = TS(DataFrame(Index = [1, 3, 4], Col1 = [1, exp(1), 10]))
ts_log = TSx.log(ts)
@test ts_log.coredata.Index == [1, 3, 4]
@test isapprox(ts_log.coredata.Col1, [0, 1, 2.3025]; atol = 1e-4)
Essentially, a getindex method like: getindex(::TS, ::Vector{Date}, ::T) where {T<:Union{String, Symbol, Int}}. Another method should be created which takes in a scalar Date and calls the previous method internally with [Date()] (related to #19).
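The delegation could look roughly like this on a toy type. ToySeries and rows are hypothetical names, only Symbol column keys are handled, and the third data value is made up.

```julia
using Dates

# Toy stand-in for TS: a date index plus named Float64 columns
struct ToySeries
    index::Vector{Date}
    columns::Dict{Symbol, Vector{Float64}}
end

# Vector method: return the column values of every row whose date is requested
function rows(ts::ToySeries, dates::AbstractVector{Date}, col::Symbol)
    return ts.columns[col][findall(in(dates), ts.index)]
end

# Scalar method: wrap the date in a vector and reuse the method above
rows(ts::ToySeries, date::Date, col::Symbol) = rows(ts, [date], col)

ts = ToySeries([Date(2007, 1, 1), Date(2007, 2, 1), Date(2007, 3, 1)],
               Dict(:value => [10.8087, 8.7392, 9.5]))
two_rows = rows(ts, [Date(2007, 1, 1), Date(2007, 2, 1)], :value)
one_row = rows(ts, Date(2007, 2, 1), :value)
```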
Do you mean you want to apply a function on a single column and return the entire TS object?
Yes, it is sometimes useful to be able to do calculations directly. For instance if ts is a time series of temperature measurements in Kelvin but you want to plot it in Celsius.
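For the temperature example, the operation amounts to broadcasting a scalar function over one column while passing the index through untouched. This is a sketch on plain vectors; map_column is a hypothetical name and the measurements are made up.

```julia
using Dates

# Hypothetical helper: apply a scalar function to every value and return the
# result together with the unchanged index
map_column(fun, index, values) = (index = index, values = fun.(values))

index = collect(Date(2022, 1, 1):Day(1):Date(2022, 1, 3))
kelvin = [273.15, 283.15, 293.15]

celsius = map_column(k -> k - 273.15, index, kelvin)
```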
Originally posted by @ValentinKaisermayer in #45 (comment)
See Matlab documentation.
ismissing
rmmissing
sortrows
unique
retime
isregular
retime