
ts2vec's People

Contributors

linytsysu · zhihanyue


ts2vec's Issues

Questions about the encoder

Hi, I'd like to ask about the dilated convolutions in the encoder.

  1. My reading of the code is that the subsequences a1-b1 and a2-b2 are fed into the encoder separately to obtain two representations, and contrastive learning is then applied to their common segment a2-b1. Is that correct?
  2. Are the dilated convolutions in the encoder causal? That is, does the representation at each point use only data at and before that timestamp?
  3. If they are causal, can we say that when the a2-b2 subsequence is encoded, the representations learned for the a2-b1 segment use no information from the b1-b2 segment?

Many thanks for your help!

A question about the loss function

Hi, I'd like to ask: after randomly cropping the two contexts, how do you guarantee that the first indices of out1 and out2 correspond to the same timestamp in the original series? If they don't, the subsequent loss computation would differ from what the paper describes.
out1 = self._net(take_per_row(x, crop_offset + crop_eleft, crop_right - crop_eleft))
out1 = out1[:, -crop_l:]

out2 = self._net(take_per_row(x, crop_offset + crop_left, crop_eright - crop_left))
out2 = out2[:, :crop_l]
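
Here is a toy check in plain NumPy (my own sketch, not the library code), assuming fit() samples the offsets so that crop_eleft <= crop_left < crop_right <= crop_eright:

import numpy as np

T = 20
x = np.arange(T)                     # values double as timestamps
crop_eleft, crop_left = 3, 7         # assumed ordering: crop_eleft <= crop_left
crop_right, crop_eright = 14, 18     # crop_right <= crop_eright
crop_l = crop_right - crop_left      # length of the common segment

view1 = x[crop_eleft:crop_right]     # first crop fed to the encoder
view2 = x[crop_left:crop_eright]     # second crop fed to the encoder

# out1[:, -crop_l:] and out2[:, :crop_l] both cover [crop_left, crop_right)
assert (view1[-crop_l:] == view2[:crop_l]).all()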

Evaluation code for the univariate forecasting task may leak information.


Firstly, I'm sure it wasn't intentional. I am reopening this issue just to discuss this question clearly. Maybe I'm wrong, but after reading the code, especially encode() in detail, I still cannot understand the following words:

For example, we assume t is the first timestamp in the test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and biased distribution.

First, the whole series is fed into encode() in eval_forecasting(). Does encode() know where the test part starts? In fact, there is no variable t in eval_forecasting() to tell encode() whether the series is the training set or the test set. I printed the sliced representations of the train/valid/test parts and confirmed that the encoded train/valid/test sets overlap. (I hope you can observe this and then reply to me.) The whole series is encoded through a sliding window with sliding_length = 1. If encode() does not know where the test set starts, then the last sample of the validation set covers the first test sample. But I don't think this bug belongs to encode().

I use Electricity as an example to explain this bug. I performed the univariate forecasting task and collected the shapes of some variables. The shape of the original series (the variable data in eval_forecasting()) is (26304, 1). The shape of all_repr, the series encoded by encode(), is (26304, 128). The shapes of the representations of the train/valid/test sets are (15782, 128), (5261, 128), and (5261, 128). If there were no overlapping samples among the train/valid/test sets, the sum of their sample counts should not equal the total number of samples.

Of course, if you skip the head-overlapping samples, you can avoid this leakage. generate_pred_samples() is used to discard head samples, so this problem may be solvable by setting reasonable parameters. But for the valid/test sets, the drop parameter is set to ZERO, which means the valid and test sets remain contiguous with the previous split.

    for pred_len in pred_lens:
        train_features, train_labels = generate_pred_samples(train_repr, train_data, pred_len, drop=padding)
        valid_features, valid_labels = generate_pred_samples(valid_repr, valid_data, pred_len)
        test_features, test_labels = generate_pred_samples(test_repr, test_data, pred_len)

The default value of drop in generate_pred_samples() is 0. Is this the [t-padding, t] you mentioned? Did you forget to set drop?
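
To illustrate, here is a minimal sketch (my own simplification, not the repo's exact helper) of what a nonzero drop does:

import numpy as np

# Features at time t predict data[t+1 : t+1+pred_len]; dropping the first
# `drop` timesteps removes the samples whose encoding window reaches back
# across the split boundary.
def make_pred_samples(repr_, data, pred_len, drop=0):
    T = data.shape[1]
    feats = repr_[:, drop : T - pred_len]            # (n, T - pred_len - drop, C)
    labels = np.stack(
        [data[:, drop + 1 + i : T - pred_len + 1 + i] for i in range(pred_len)],
        axis=-1,
    )                                                # (n, T - pred_len - drop, pred_len)
    return feats, labels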

Finally, I hope you can provide detailed evidence that this leakage is avoided, rather than simply asking me to read the code, because I still think the bug exists after reading it again.

Question on Plotting

I ran the code on the ETTh2 dataset, as described in the paper, for 500 epochs and obtained the pickle file as output.
I am trying to reproduce the prediction vs. ground-truth plot shown in Figure 5 of the paper.
Could you please share the code used to plot the obtained output?

question regarding the implementation of instance contrastive loss

Hello, thank you for sharing your work!

I have a question regarding the implementation of instance_contrastive_loss

import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2):
    B, T = z1.size(0), z1.size(1)
    if B == 1:
        # contrastive loss requires at least one pair
        return z1.new_tensor(0.)
    z = torch.cat([z1, z2], dim=0)  # 2B x T x C
    z = z.transpose(0, 1)  # T x 2B x C
    sim = torch.matmul(z, z.transpose(1, 2))  # T x 2B x 2B
    logits = torch.tril(sim, diagonal=-1)[:, :, :-1]    # T x 2B x (2B-1)
    logits += torch.triu(sim, diagonal=1)[:, :, 1:]
    logits = -F.log_softmax(logits, dim=-1)

    i = torch.arange(B, device=z1.device)
    loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
    return loss

In your implementation, you slice the logits with [:, :, :-1] for tril and [:, :, 1:] for triu. Why is this? Is there something I have missed?

thank you in advance!

best,

Some questions about time series anomaly detection

Hello authors, I think TS2Vec is a GOOD idea for time series representation learning.

However, I have some questions as follows:

Q1

https://github.com/yuezhihan/ts2vec/blob/12a737e6561878452fffb68c81c98d24628f274a/tasks/anomaly_detection.py#L138-L140

Why is test_res[i] explicitly made negative (= 0) when a positive point exists within the previous delay timesteps?

Q2

For the KPI dataset, phase2_train.csv is the training set and phase2_ground_truth.hdf is for testing.

But in preprocess_kpi.py, each time series is split into two halves. Shouldn't we train on phase2_train.csv and test on phase2_ground_truth.hdf?


Thanks.

Evaluation code for the univariate forecasting task may not be right: there is information leaking.

As the title says, the univariate forecasting results provided by this repository and the related paper may not be right, because the evaluation code for univariate forecasting leads to information leaking. The figure below shows the evaluation code for univariate forecasting in this repository.

[figure: the univariate forecasting evaluation code in eval_forecasting()]

The main problem is that eval_forecasting() encodes the data through a sliding window (sliding step = 1) and only then splits the training/validation/test sets. Hence, it leads to information leaking. The general way to do this is to split the data first and then encode each split separately.
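
A sketch of the split-then-encode order (hypothetical variable names; the encode() arguments follow the calls quoted elsewhere in these issues):

# split the raw series first, then encode each split on its own, so no
# sliding window crosses a split boundary
train_repr = model.encode(train_data, casual=True, sliding_length=1,
                          sliding_padding=padding, batch_size=256)
valid_repr = model.encode(valid_data, casual=True, sliding_length=1,
                          sliding_padding=padding, batch_size=256)
test_repr = model.encode(test_data, casual=True, sliding_length=1,
                         sliding_padding=padding, batch_size=256)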

This bug was found in another repository that inherits the evaluation code of TS2Vec; we found that the unfair evaluation code contributed about a 10% improvement on the Electricity dataset for that project. We then checked the evaluation code for univariate forecasting in this repository and found the same operation.

So I hope the authors check the evaluation code for univariate forecasting, and especially the results presented in the TS2Vec paper, because those results may be wrong. If this problem is confirmed, I hope the authors will revise the results in the paper to avoid unfair comparisons.

prediction part

Hello, do you have a main function for the prediction part? Could you send me a copy? I want to study the model.

The dataset about ETT

Hi, this is very nice work, but I have a question about the multivariate time series forecasting results. The GitHub repo https://github.com/zhouhaoyi/ETDataset only offers the ETT-small dataset, which is a univariate time series. I don't know how to obtain the multivariate ETT dataset needed to run this code.
Thank you very much.
Best wishes

Pre-processing for Electricity dataset

Hi, congratulations on the impressive results. Could you give more details about the pre-processing for the electricity dataset, or provide the pre-processing code? Do you use simple resampling, or do you take the average of each hour? Thank you!

How can I resume training?

Hi, my training run with your code was interrupted. I reloaded the model with the provided model.load() and tried to resume training, but the loss does not keep decreasing as I expected; it looks as if training starts from scratch. Could you offer any suggestions?

Q: Multivariate time-series regression

Hi, I'm studying ts2vec owing to my interest in time series.
I wonder how I can obtain results for the multivariate time-series regression task based on this code.
I look forward to your response.
Thank you.

clarification on the sliding length and padding

Thank you for your great contribution. I was unable to understand the difference in usage between the sliding length and the sliding padding. For example, if I wanted to use X days of history for a forecasting problem, what would be the proper settings for these parameters?

Thank you in advance.

sliding_length
sliding_padding

Note: I noticed on my dataset that a sliding length between 1 and 24 yields better results; however, for sliding_length > 24 a size-mismatch error occurs at evaluation.
Increasing the padding had less impact than increasing the length, so if you could clarify the proper usage it would be great.

Question on Yahoo.sh


Could you please explain what the yahoo anomaly_0,1,2 and anomaly_coldstart_0,1,2 settings refer to?
Thanks

How should the number of training epochs be chosen?

Hi, congratulations on the SOTA results. I have two small questions:

  1. I'm not very familiar with unsupervised learning. In supervised learning I mainly detect over- and underfitting via early stopping on a validation set. Here the task is decoupled into feature extraction plus a downstream classifier, so it seems I can only inspect the shape of the pretraining loss curve, and that objective is not the end task's objective. How should I decide how many epochs to train, and is there a risk of over- or underfitting?
  2. My task is time-series classification and all data are labeled. Compared with maximizing the difference between a sequence and a randomly sampled one, would it be better to deliberately sample the negative from a different class?

Many thanks!

data shape, loading custom data, possible lookahead

Hi,
I am trying to test on my own datasets, which are multivariate time series. I load the data into a DataFrame and then create the slices for train, validation, and test, mimicking the existing code.

There is a point where my n x m data, where m is the number of features (covariate time series), is expanded to 1 x n x m. The comments in your code say "number of instances x timestamps x features". What are instances in this context?
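
For example, my own sketch of that reshaping:

import numpy as np
import pandas as pd

# one long multivariate recording becomes a single "instance" of shape
# (1, n_timestamps, n_features)
df = pd.DataFrame(np.random.randn(1000, 5))   # n = 1000 timestamps, m = 5 features
train_data = df.values[None, ...]             # shape (1, 1000, 5)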

I am worried that my results are perhaps too good to be true, and I am trying to make sure I understand where lookahead might occur.

CUDA out of memory

I run
python3 train.py ETTm1 mytest --loader forecast_csv

and got the following error.
Could you please help me? Thanks.
##############################
Dataset: ETTm1
Arguments: Namespace(batch_size=8, dataset='ETTm1', epochs=None, eval=False, gpu=0, irregular=0, iters=None, loader='forecast_csv', lr=0.001, max_threads=None, max_train_length=3000, repr_dims=320, run_name='binh', save_every=None, seed=None)
Loading data... done
Traceback (most recent call last):
  File "train.py", line 120, in <module>
    loss_log = model.fit(
  File "/home/binh/experiments/ts2vec/ts2vec.py", line 137, in fit
    loss.backward()
  File "/home/binh/.local/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "/home/binh/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
    Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 380.00 MiB (GPU 0; 3.82 GiB total capacity; 1.80 GiB already allocated; 254.62 MiB free; 2.25 GiB reserved in total by PyTorch)

Rounding error concerning max_train_length

Hi, I think there is a rounding error concerning max_train_length.

https://github.com/yuezhihan/ts2vec/blob/631bd533aab3547d1310f4e02a20f3eb53de26be/ts2vec.py#L77-L80

To crop the data into sections, each of which has length at most max_train_length, the number of sections should be rounded up.

For example, in the ETTh1 dataset, cropping the train slice of length 8640 with max_train_length = 201 results in 42 sections of length about 206, instead of 43 sections of length about 201.
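
A sketch of the proposed fix:

import numpy as np

# round the number of sections up so each section has length <= max_train_length
ts_l, max_train_length = 8640, 201
sections = int(np.ceil(ts_l / max_train_length))   # 43 sections of length ~201
# the current floor division, ts_l // max_train_length, gives 42 sections of ~206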

ETTh1 data dimension mismatch problem

Hello, I have also been studying the ts2vec model recently. When I used the ETTh1 dataset, I ran into a dimension-mismatch problem. Have you encountered this before? I hope to receive your answer. Thank you very much!

Format of Yahoo dataset for pre-processing

Thank you so much for continuously open-sourcing your findings! I noticed that the data downloaded from Yahoo seems to be in a different format than the one required by preprocess_yahoo.py. Would it be possible for you to look into this? Thank you very much!

Yahoo follows the format of
A1/real_1.csv
...
A2/synthetic_1.csv
...

While the required format is path/1 ... path/367, which seems to contain dictionaries.

Training speed slowdown problem when using gpu

When I train with univariate time-series data, a SamePadConv layer takes 1.33 seconds on the GPU but 0.0002 seconds on the CPU.

As far as I know, the GPU should be faster, so why is this happening?

retrain a model after loading

When I fit a model, save it, load it again, and retrain it, it seems that this doesn't work...

model = TS2Vec(input_dims=5, batch_size = 64, device="cuda", max_train_length=680, output_dims=32, lr=0.001)
loss_log = model.fit(dvt_data, n_epochs=3, verbose=True)
model.save(f"model.sd")

Epoch #0: loss=2.07257523319938
Epoch #1: loss=1.1711310916765094
Epoch #2: loss=0.9953813534158445

for ts2iter in range(5):
    model = TS2Vec(input_dims=5, batch_size = 64, device="cuda", max_train_length=680, output_dims=32, lr=0.001)
    model = TS2Vec.load("model.sd")
    loss_log = model.fit(dvt_data, n_epochs=1, verbose=True)
    model.save(f"model.sd")

Epoch #0: loss=2.0144058981869803
Epoch #0: loss=2.1384957409088194
Epoch #0: loss=1.9966144137437456
Epoch #0: loss=2.0894507004875837
Epoch #0: loss=2.028488379587132
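
For reference, judging from the load() source visible in the traceback of the pretrained-model issue below (it assigns into self.net and returns nothing), the instance-method pattern would be:

model = TS2Vec(input_dims=5, batch_size=64, device="cuda",
               max_train_length=680, output_dims=32, lr=0.001)
model.load("model.sd")   # load() mutates the model in place and returns None
loss_log = model.fit(dvt_data, n_epochs=1, verbose=True)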

Suitability of ts2vec Framework for Lengthy Mono Audio Waveforms

In the context of mono audio classification on raw waveforms, can the ts2vec framework be used for frames of 500 ms at a sample rate of 22050 Hz? This would result in univariate time series of shape (1, 11025). Is 11025 considered too long for the ts2vec framework?
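
For instance, the framing could look like this (toy sketch with a random stand-in waveform):

import numpy as np

sr, frame_ms = 22050, 500
frame_len = sr * frame_ms // 1000                 # 11025 samples per frame
waveform = np.random.randn(sr * 10)               # 10 s of mono audio
n = len(waveform) // frame_len
frames = waveform[: n * frame_len].reshape(n, frame_len, 1)   # (n_instances, 11025, 1)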

Have you tested the ts2vec framework on longer time series datasets, and is there any existing benchmark for evaluating its performance on such lengthy inputs?

Pretrained model loading problem

I ran into an issue when trying to load a pre-trained TS2Vec model.

Here is the code:

# train.py
model = TS2Vec(
        input_dims=input_dims,
        device=device,
        batch_size=batch_size,
        output_dims=output_dims
    )
loss_log = model.fit(X, verbose=verbose, n_epochs=n_epochs)
model.save(r'./models/mv_10_1_5_model.pkl')
# predict.py
model = TS2Vec(
    input_dims=input_dims,
    device=device,
    batch_size=batch_size,
    output_dims=output_dims
)
model.load(r'./models/mv_10_1_5_model.pkl')

And the error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_5364\1224328800.py in <module>()
     10     output_dims=output_dims
     11 )
---> 12 model.load('./models/mv_10_1_5_model.pkl')

E:\Workplace\ts2vec\ts2vec.py in load(self, fn)
    315             fn (str): filename.
    316         '''
--> 317         state_dict = torch.load(fn, map_location=self.device)
    318         self.net.load_state_dict(state_dict)
    319 

e:\Software\anaconda3\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
    710                     opened_file.seek(orig_position)
    711                     return torch.jit.load(opened_file)
--> 712                 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
    713         return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
    714 

e:\Software\anaconda3\lib\site-packages\torch\serialization.py in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
   1044     unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
   1045     unpickler.persistent_load = persistent_load
-> 1046     result = unpickler.load()
   1047 
   1048     torch._utils._validate_loaded_sparse_tensors()

e:\Software\anaconda3\lib\site-packages\torch\serialization.py in persistent_load(saved_id)
   1014         if key not in loaded_storages:
   1015             nbytes = numel * torch._utils._element_size(dtype)
-> 1016             load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
   1017 
   1018         return loaded_storages[key]

e:\Software\anaconda3\lib\site-packages\torch\serialization.py in load_tensor(dtype, numel, key, location)
    999         # stop wrapping with _TypedStorage
   1000         loaded_storages[key] = torch.storage._TypedStorage(
-> 1001             wrap_storage=restore_location(storage, location),
   1002             dtype=dtype)
   1003 

e:\Software\anaconda3\lib\site-packages\torch\serialization.py in restore_location(storage, location)
    974     else:
    975         def restore_location(storage, location):
--> 976             result = map_location(storage, location)
    977             if result is None:
    978                 result = default_restore_location(storage, location)

TypeError: 'int' object is not callable

Do you have an example of saving and loading the pre-trained model? I would very much appreciate it if you could help address this issue!

Unexpected evaluation results with zeroed Loss (anomaly detection)

Hello, I have been studying the ts2vec model recently. When I trained on the yahoo dataset, I ran into an inconsistency regarding the contribution of the loss: even if I manually set the loss to zero during the training phase, the evaluation results are almost the same, sometimes even better. I am wondering whether I am missing something or whether this is a genuine issue, perhaps related to the evaluation phase itself. I appreciate any guidance or assistance you can provide and hope to receive your answer. Thank you for your time and attention!

ts2vec additional use case for non-uniformly sampled ts input

hi!

Just read your paper and am super interested in it. The results are really promising on uniformly sampled input data, but from the way the method is conceptualized and structured, it seems like it should/could be robust to non-uniformly sampled time-series input. By non-uniformly sampled, I mean a time series where not all time points are sampled in the input. Take cancer patients getting X-rays, for example: they might be imaged at irregular intervals. Patient 1 might be imaged at day 0, day 7, day 25, day 47; patient 2 at day 0, day 23, day 56, day 59, day 70; and so on.

One obvious way to adapt this to the ts2vec format is to bin the days, say 20 days per interval, fill in the missing data with some imputation, and then convert that into a uniformly sampled time series. However, do you see any obvious way to use ts2vec that does not require that approximation? Thanks!
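
For concreteness, a toy sketch of that binning (hypothetical data; I'm assuming missing bins can be left as NaN for the model's input masking):

import numpy as np

days = np.array([0, 7, 25, 47])        # patient 1's visit days
vals = np.array([1.2, 0.9, 1.5, 1.1])
bin_w = 20
series = np.full(days.max() // bin_w + 1, np.nan)
series[days // bin_w] = vals           # the last observation in a bin wins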

Incorrect results when using MPS backend (MacOS)

TS2Vec uses the GELU activation function. Unfortunately, there is currently a PyTorch bug affecting GELU on MPS devices (open since April 2023).

This won't throw an error, just leads to rubbish results in both training and inference.

Hopefully this will be patched by PyTorch, but in the meantime a workaround is to change F.gelu(x) to F.gelu(x.contiguous()) on lines 34 and 36 of models/dilated_conv.py. Results are then as expected.
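
That is, a sketch of the change:

import torch
import torch.nn.functional as F

# workaround for the MPS GELU bug: force a contiguous tensor before the activation
x = torch.randn(2, 8, 16).transpose(1, 2)   # a non-contiguous tensor, as in the conv blocks
y = F.gelu(x.contiguous())                  # instead of y = F.gelu(x)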

I won't bother with a PR, as I assume this will be patched in PyTorch in the near future.

More info:
pytorch/pytorch#98212
huggingface/transformers#22468

Question on Instance Contrastive Loss

Problem Description

@yuezhihan Thanks for making the code available, really nice approach.

Going through the code, I have some difficulties understanding the instance contrastive loss, starting from the explanation in your paper.

I was trying to understand your implementation, in particular the final part, where you take specific entries across batches of the logits tensor.

Details

For ease of explanation, I created a small example

import torch
import torch.nn.functional as F

# Create time series tensors (B:3, T:4, C:1)
z1 = torch.tensor([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]], dtype=torch.float32).reshape(3, 4, 1)
z2 = torch.tensor([[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]], dtype=torch.float32).reshape(3, 4, 1)

# get batch size and length of time series
B, T = z1.size(0), z1.size(1)

# edge case: for a single batch element the loss function returns a zero loss
# (B == 3 in this example, so it does not apply)

# concatenate z1 and z2 along the batch dimension to form a tensor of shape (2B, T, C)
z = torch.cat([z1, z2], dim=0)

# transpose z to shape (T, 2B, C)
z = z.transpose(0, 1)

# calculate the dot product between z and its transpose to get a similarity matrix of shape (T, 2B, 2B)
sim = torch.matmul(z, z.transpose(1, 2))

# extract the lower triangular part of sim, excluding the main diagonal that reflects self-similarity (T, 2B, 2B-1)
logits = torch.tril(sim, diagonal=-1)[:, :, :-1]

# add the upper triangular part of sim, excluding the main diagonal that reflects self-similarity (T, 2B, 2B-1)
logits += torch.triu(sim, diagonal=1)[:, :, 1:]

# apply log_softmax to logits
logits = -F.log_softmax(logits, dim=-1)

# use arange to create a tensor of indices 
i = torch.arange(B, device=z1.device)

# calculate the mean of the logits along the specified entries and across batches
loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2

For illustration purposes, let's focus on the first timestamp slice of the similarity matrix sim:

tensor([[[  1.,   5.,   9.,  13.,  17.,  21.],
         [  5.,  25.,  45.,  65.,  85., 105.],
         [  9.,  45.,  81., 117., 153., 189.],
         [ 13.,  65., 117., 169., 221., 273.],
         [ 17.,  85., 153., 221., 289., 357.],
         [ 21., 105., 189., 273., 357., 441.]],

Calling torch.tril(sim, diagonal=-1)[:, :, :-1] gives the lower triangular part (the last column of the strictly lower triangle is always all zeros, so [:, :, :-1] drops it):

tensor([[  0.,   0.,   0.,   0.,   0.],
        [  5.,   0.,   0.,   0.,   0.],
        [  9.,  45.,   0.,   0.,   0.],
        [ 13.,  65., 117.,   0.,   0.],
        [ 17.,  85., 153., 221.,   0.],
        [ 21., 105., 189., 273., 357.]])

Calling torch.triu(sim, diagonal=1)[:, :, 1:] gives the upper triangular part (the first column of the strictly upper triangle is always all zeros, so [:, :, 1:] drops it):

tensor([[  5.,   9.,  13.,  17.,  21.],
        [  0.,  45.,  65.,  85., 105.],
        [  0.,   0., 117., 153., 189.],
        [  0.,   0.,   0., 221., 273.],
        [  0.,   0.,   0.,   0., 357.],
        [  0.,   0.,   0.,   0.,   0.]])

With the self-similarity diagonal removed and the all-zero columns dropped, the sum logits = torch.tril(sim, diagonal=-1)[:, :, :-1] + torch.triu(sim, diagonal=1)[:, :, 1:] gives a matrix of pairwise dot products:

print(logits)

tensor([[  5.,   9.,  13.,  17.,  21.],
        [  5.,  45.,  65.,  85., 105.],
        [  9.,  45., 117., 153., 189.],
        [ 13.,  65., 117., 221., 273.],
        [ 17.,  85., 153., 221., 357.],
        [ 21., 105., 189., 273., 357.]])

Applying the softmax and the negative log gives the negative log-likelihoods, as follows:

logits = -F.log_softmax(logits, dim=-1)

We then use the batch size B=3 to create an index for extracting specific entries of the logits tensor:

i = torch.arange(B, device=z1.device) 
    # i = tensor([0, 1, 2])
loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
    # (B + i - 1) = tensor([2, 3, 4])
    # (B + i) = tensor([3, 4, 5])

Questions

Based on the above example, my questions are as follows:

  • How does loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2 relate to the equation above?
  • Why do we extract the logits[:, i, B + i - 1] entries from the upper triangular part? What do these entries mean? Are these the positive examples? Why exactly these entries?
  • Why do we extract the logits[:, B + i, i] entries from the lower triangular part? What do these entries mean? Are these the negative examples? Why exactly these entries?
  • Why do we create the index i using the batch size B, given that we already take the mean across all batches?

what is the difference between n_instances and n_features?

In ts2vec.py, fit() requires train_data of shape (n_instances, n_timestamps, n_features).
What is the difference between n_instances and n_features?
I think n_features means the number of variables (e.g., a univariate time series has n_features=1).
Is n_instances the same as a window size, or something else?
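My own hedged reading of the convention, as a sketch:

import numpy as np

# instances are independent series; features are parallel channels of one series
one_univariate = np.random.randn(1, 500, 1)     # 1 series, 500 timestamps, 1 channel
one_multivariate = np.random.randn(1, 500, 7)   # 1 series with 7 channels
many_univariate = np.random.randn(32, 500, 1)   # 32 independent univariate series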
thank you.

Training iterations

Thanks for putting out this paper; it sounds very promising. Would you mind clarifying the number of iterations you use for self-supervised training? You mention in the paper that you use 600 for datasets larger than 100,000 (with batch size 8). That seems incredibly low; are you sure you don't mean 600 epochs?

Any clarification would be great, thank you!

where can I download 'electricity.csv'

Hi Zhihan,

Thanks for the very useful repository. Could you point me to where I can download 'electricity.csv'? From the Electricity dataset link in the README, I can only get a file named LD2011_2014.txt.zip, and I am not sure how to convert it to 'electricity.csv'.

Question about the padding in downstream tasks

Hi, I have two questions about the code and hope you can answer them. Thanks!
1. In the downstream forecasting task, padding is set to 200. Does this padding have a counterpart in pretraining? That is, should pretraining also use padding, and if so, must the pretraining padding match the downstream one? How was the value 200 chosen? The electricity dataset you use is very long and padding is set to 200; if the data were much shorter, say only 100 points, what would be a reasonable padding?
padding = 200
t = time.time()
all_repr = model.encode(
    data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=256
)
2. When generating the training samples, why are the first padding samples dropped?
train_features, train_labels = generate_pred_samples(train_repr, train_data, pred_len, drop=padding)

Thanks a lot for your help!

Simple sin wave results

sinwave.csv
I am using simple sine waves to test the algorithm: s1 is a long wave, s2 a medium wave, s3 a short wave, and s4 = s1 + s2 + s3. The model predicts s1/s2/s3 successfully, but on s4 it performs poorly, even compared to an LSTM. Could you share some insights on this?
I've tried the default hyper-parameters and also tried tuning them; no significant improvement.
thanks.

Questions about downstream tasks

Hi, I'd like to ask about the downstream tasks.

Compute timestamp-level representations for the test set:

test_repr = model.encode(test_data) # n_instances x n_timestamps x output_dims

Compute instance-level representations for the test set:

test_repr = model.encode(test_data, encoding_window='full_series') # n_instances x output_dims

  1. Can I read n_instances x n_timestamps x output_dims as batch_size * t * channels?

  2. Is there any performance difference between the timestamp-level and the instance-level representations on classification tasks? Also, I see that the code raises the input dimension to 320, which seems different from traditional feature extraction as I understand it.

A part of the random cropping code I don't understand

In the snippet below, why is crop_eleft subtracted from crop_right instead of crop_left?
out1 = self._net(take_per_row(x, crop_offset + crop_eleft, crop_right - crop_eleft))
out1 = out1[:, -crop_l:]

out2 = self._net(take_per_row(x, crop_offset + crop_left, crop_eright - crop_left))
out2 = out2[:, :crop_l]

How to use your approach for downstream forecasting tasks

Summary

Thanks for making the code available. I really like the idea of first learning the embeddings in a self-supervised manner and then using a simpler model for forecasting. However, I am struggling with how to use the learned embeddings for the forecasting part.

Problem Description

Say you are tasked with forecasting a monthly univariate time series Y = (y1, ..., yT), historically available from January 2010 until December 2020. The task is to forecast 2021, with a forecasting horizon of h=12 months. Based on your framework, we use the TCN encoder to learn embeddings for January 2010 through December 2020. To train the downstream forecasting model, say a ridge regression model, we use the representation at the final timestamp. So far so good.
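
To make my understanding concrete, here is a self-contained sketch with random stand-in data (hypothetical shapes, not your code):

import numpy as np
from sklearn.linear_model import Ridge

N, C, h = 120, 320, 12                  # training samples, repr dims, forecast horizon
train_repr = np.random.randn(N, C)      # stand-in for the learned representations
train_labels = np.random.randn(N, h)    # the h values following each timestamp
ridge = Ridge(alpha=1.0).fit(train_repr, train_labels)

repr_dec_2020 = np.random.randn(1, C)   # representation at the last known month
y_2021 = ridge.predict(repr_dec_2020)   # shape (1, 12): the forecast for 2021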

@yuezhihan & @linytsysu My question is: given the representations and the trained Ridge model, how do we forecast 2021, since the data, and hence the representations, are only available until the end of 2020? More specifically, what are the features the Ridge model uses to forecast 2021?

In your paper, Section C.2, you state that

For each task, we only use the training set to train the representation model, and apply the model to the testing set to
get representations

Does this mean you show the actual test data to the model, create the representations/embeddings based on the test data, and then use these to fit the same test data? Isn't this simply interpolating the test data, using the representations instead of the actuals, rather than forecasting?

I highly appreciate your comments on this. Many thanks.
