zhihanyue / ts2vec
A universal time series representation learning framework
License: MIT License
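Hello author, I would like to ask a question. After randomly cropping two contexts, how do you ensure that the first index of out1 and the first index of out2 in the output sequences correspond to the same timestamp in the original series? If they do not correspond, the loss computation that follows seems to differ from what the paper describes.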
out1 = self._net(take_per_row(x, crop_offset + crop_eleft, crop_right - crop_eleft))
out1 = out1[:, -crop_l:]
out2 = self._net(take_per_row(x, crop_offset + crop_left, crop_eright - crop_left))
out2 = out2[:, :crop_l]
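The alignment asked about here can be checked by tracking timestamps instead of tensors. This is a minimal sketch, assuming the encoder returns one output per input timestamp and ignoring crop_offset, which shifts both crops of a row by the same amount. The function name crop_windows and the example values are hypothetical.

```python
def crop_windows(crop_eleft, crop_left, crop_right, crop_eright):
    """Return the original-series timestamps kept in out1 and out2."""
    assert crop_eleft <= crop_left < crop_right <= crop_eright
    crop_l = crop_right - crop_left               # length of the shared overlap

    view1 = list(range(crop_eleft, crop_right))   # timestamps fed to the first forward pass
    view2 = list(range(crop_left, crop_eright))   # timestamps fed to the second forward pass

    kept1 = view1[-crop_l:]   # mirrors out1[:, -crop_l:]
    kept2 = view2[:crop_l]    # mirrors out2[:, :crop_l]
    return kept1, kept2


kept1, kept2 = crop_windows(crop_eleft=3, crop_left=10, crop_right=25, crop_eright=40)
print(kept1 == kept2, kept1[0], kept1[-1])  # True 10 24
```

Under these assumptions both slices cover exactly the timestamps from crop_left to crop_right - 1, so index 0 of out1 and index 0 of out2 refer to the same original timestamp.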
Firstly, I'm sure it wasn't intentional. I am reopening this issue because I want to discuss the question clearly. Maybe I'm wrong, but after reading the code in detail, especially encode(), I still cannot understand the following statement:
For example, we assume t is the first timestamp in the test set. For the first sample in the test set, the original input is [t-padding, t]. However, your input is [t, t], which feeds only one timestamp as input, resulting in poor performance and biased distribution.
First, the whole series is fed into encode() in eval_forecasting(). Does encode() know where the test part starts? In fact, there is no variable such as t in eval_forecasting() that tells encode() whether a part of the series belongs to the training set or the test set. I printed the sliced representations of the train/valid/test parts and confirmed that the encoded train/valid/test sets overlap. (I hope you can check this and then reply to me.) The whole series is encoded through a sliding window with sliding_length set to 1. If encode() does not know where the test set starts, then the last validation sample would cover the first test sample. However, I think this bug does not belong to encode().
I use Electricity as an example to explain this bug. I ran the univariate forecasting task and recorded the shapes of some variables. The original series (the variable data in eval_forecasting()) has shape (26304, 1). all_repr, the series encoded by encode(), has shape (26304, 128). The train/valid/test representations have shapes (15782, 128), (5261, 128) and (5261, 128). If there were no overlapping samples among the train/valid/test sets, the sum of their sample counts should not equal the total number of samples.
Of course, skipping the overlapping samples at the head of each set would avoid this leakage. generate_pred_samples() can discard head samples, so the problem could be solved by setting its parameters sensibly. But for the valid/test sets, drop is set to zero, which means the valid set and the test set continue directly from the preceding section.
for pred_len in pred_lens:
    train_features, train_labels = generate_pred_samples(train_repr, train_data, pred_len, drop=padding)
    valid_features, valid_labels = generate_pred_samples(valid_repr, valid_data, pred_len)
    test_features, test_labels = generate_pred_samples(test_repr, test_data, pred_len)
The default value of drop in generate_pred_samples() is 0. Is this the [t-padding, t] you mentioned? Did you forget to set drop?
Finally, I hope you can provide detailed evidence that this leakage is avoided, rather than simply asking me to read the code, because I still think the bug exists after reading it again.
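The two facts under dispute can be made concrete with the numbers reported above. This is a sketch under stated assumptions, not a verdict on the leakage claim: it assumes the representation at timestamp t is computed from inputs t-padding through t (as the quoted reply describes) and uses padding = 200, the value from the forecasting code quoted later on this page. The helper causal_window is hypothetical.

```python
n_total = 26304
sizes = {"train": 15782, "valid": 5261, "test": 5261}   # shapes reported in the post
padding = 200                                           # assumed, from the quoted forecasting code


def causal_window(t, padding):
    """Input timestamps assumed to feed the representation at timestamp t."""
    return range(max(0, t - padding), t + 1)


test_start = sizes["train"] + sizes["valid"]            # index of the first test timestamp
window = causal_window(test_start, padding)
borrowed = [t for t in window if t < test_start]        # inputs that lie before the test range

print(sum(sizes.values()) == n_total)   # True: the three slices partition the index range
print(len(borrowed))                    # 200: past inputs taken from the validation range
```

So the slices themselves do not share indices, while the input window of the first test representation does reach back into the validation range. Those borrowed inputs are past observations rather than future labels; whether that constitutes leakage is the point being argued in this thread.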
I ran the code on the ETTh2 dataset for 500 epochs, as mentioned in the paper, and obtained the pickle file as output.
I am trying to reproduce the prediction vs. ground truth plot shown in Figure 5 of the paper.
Could you please share the code used to plot this output?
Hello, thank you for sharing your work!
I have a question regarding the implementation of instance_contrastive_loss
import torch
import torch.nn.functional as F

def instance_contrastive_loss(z1, z2):
    B, T = z1.size(0), z1.size(1)
    if B == 1:
        # contrastive loss requires pair.
        return z1.new_tensor(0.)
    z = torch.cat([z1, z2], dim=0)  # 2B x T x C
    z = z.transpose(0, 1)  # T x 2B x C
    sim = torch.matmul(z, z.transpose(1, 2))  # T x 2B x 2B
    logits = torch.tril(sim, diagonal=-1)[:, :, :-1]  # T x 2B x (2B-1)
    logits += torch.triu(sim, diagonal=1)[:, :, 1:]
    logits = -F.log_softmax(logits, dim=-1)
    i = torch.arange(B, device=z1.device)
    loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
    return loss
In your implementation, you slice the logits with [:, :, :-1] for tril and [:, :, 1:] for triu. Why is this? Is there something that I have missed?
Thank you in advance!
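The effect of the two slices can be traced without tensors. The sketch below, in plain Python, labels every entry of a 2B x 2B matrix with its original (row, column) position and removes the main diagonal the same way tril(sim, -1)[:, :-1] + triu(sim, 1)[:, 1:] does for a single timestamp. The helper drop_diagonal is a hypothetical stand-in for that expression.

```python
def drop_diagonal(sim):
    """Remove the main diagonal of a square matrix, shifting later columns left by one."""
    n = len(sim)
    return [[sim[r][c] for c in range(n) if c != r] for r in range(n)]


B = 3
n = 2 * B
# Label each entry with its original (row, col) so positions can be traced.
sim = [[(r, c) for c in range(n)] for r in range(n)]
logits = drop_diagonal(sim)

for i in range(B):
    # Row i (from z1) is paired with row B + i (from z2), and vice versa.
    assert logits[i][B + i - 1] == (i, B + i)
    assert logits[B + i][i] == (B + i, i)

print(len(logits), len(logits[0]))  # 6 5
```

Dropping the self-similarity entry shortens every row to 2B - 1 columns. For row i the positive originally sat at column B + i, to the right of the removed diagonal, so it moves to B + i - 1; for row B + i the positive at column i sits to the left of the diagonal and keeps its index.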
Hello authors, I think TS2Vec is a GOOD idea for time series representation learning.
However, I have some questions:
Why is test_res[i] explicitly made negative (=0) when a positive point exists in the previous delay timesteps?
For the KPI dataset, phase2_train.csv is the training set and phase2_ground_truth.hdf is for testing. But in preprocess_kpi.py, each time series is split into two halves. Shouldn't the model train on phase2_train.csv and test on phase2_ground_truth.hdf?
Thanks.
As the title says, the univariate forecasting results provided by this repository and the related paper may be incorrect, because the evaluation code for univariate forecasting leaks information. The following figure describes the evaluation code for univariate forecasting in this repository.
The main problem is that eval_forecasting() encodes the data through a sliding window (sliding step = 1) and only then splits it into training/validation/test sets, which leads to information leakage. The usual approach is to split the data first and then encode each part separately.
We found this bug in another repository that inherits the TS2Vec evaluation code, where the unfair evaluation code contributed about a 10% improvement on the Electricity dataset. We then checked the univariate forecasting evaluation code in this repository and found the same operation.
So I hope the authors will check the evaluation code for univariate forecasting, and especially the results presented in the TS2Vec paper, because those results may be wrong. If the problem is confirmed, I hope the authors will revise the results in the paper to avoid unfair comparisons.
Hello, do you have a main function for the prediction part? Can you send me a copy? I want to study the model.
Hi, this is very nice work. But I have a question about the multivariate time series forecasting results. This GitHub repo https://github.com/zhouhaoyi/ETDataset only offers the ETT-small dataset, which is a univariate time series. I don't know how to use a multivariate ETT dataset to run this code.
Thank you very much
Best wishes
Hi, congratulations on the impressive results. May I check if you are able to give more details about the pre-processing details for the electricity dataset, or provide the pre-processing code? Do you use simple resampling, or do you take the average of each hour? Thank you!
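Hello, my training run with your code was interrupted, so I used the model.load() function you provide to load the model and continue training. However, the loss does not keep decreasing as I expected; it looks as if training starts from scratch. Could you offer any suggestions?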
Hi, I'm studying ts2vec owing to my interest in time series.
I wonder how I can obtain results for the multivariate time series regression task based on this code.
I hope for your response.
Thank you.
Thank you for your great contribution. I was unable to understand the difference in usage between the sliding length and the sliding padding. For example, if I wanted to use X days of history for a forecasting problem, what would be the proper values for these parameters?
Thank you in advance.
sliding_length
sliding_padding
Note: On my dataset, a sliding length greater than 1 and up to 24 yields better results; however, for a sliding length above 24, a size mismatch error occurs at evaluation.
Increasing the padding had less impact than increasing the length, so it would be great if you could clarify the proper usage.
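One way to picture the two parameters is to list which input range each window reads and which output range it produces. This is a sketch under assumptions, not the library's implementation: it assumes sliding_length is the number of timestamps whose representations are produced per window, and sliding_padding is the extra history prepended to each window (and appended as well when encoding is not causal). The function sliding_windows is hypothetical.

```python
def sliding_windows(n_timestamps, sliding_length, sliding_padding, causal=True):
    """List (input_start, input_end, output_start, output_end) per window, ends exclusive."""
    windows = []
    for start in range(0, n_timestamps, sliding_length):
        out_end = min(start + sliding_length, n_timestamps)
        in_start = start - sliding_padding      # negative values fall before the series start
        in_end = start + sliding_length + (0 if causal else sliding_padding)
        windows.append((in_start, in_end, start, out_end))
    return windows


# Hourly data: produce one day of representations per window, with two days of history.
for w in sliding_windows(n_timestamps=72, sliding_length=24, sliding_padding=48):
    print(w)
# (-48, 24, 0, 24)
# (-24, 48, 24, 48)
# (0, 72, 48, 72)
```

Under this reading, "use X days of history" maps to sliding_padding, while sliding_length only controls how many timestamps are encoded per forward pass.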
Hello, congratulations on reaching SOTA. I have two small questions:
Thank you very much!
Hi,
I am trying to test on my own datasets which are multivariate time series. I load the data into a Dataframe and then create the slices for train validate and test, just mimicking the existing code.
There is a point where my n x m data, where m is the number of features, or covariate time series, is expanded to 1 x n x m. The comments in your code say "number of instances x timestamps x features". What is instances in this context?
I am worried that my results are perhaps too good to be true and I am trying to make sure I understand where lookahead might be.
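The shape convention can be illustrated with nested lists. In this sketch, an instance is taken to mean one independent series (one recording or one sample), which is an assumption about the intended meaning; the values and the helper shape are hypothetical.

```python
def shape(nested):
    """Return the dimensions of a regular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


# One recording with n = 4 timestamps and m = 2 features (made-up values).
table = [[0.1, 1.0], [0.2, 1.1], [0.3, 1.2], [0.4, 1.3]]

single_instance = [table]                # 1 x n x m: one long multivariate series
three_instances = [table, table, table]  # 3 x n x m: three independent recordings

print(shape(single_instance), shape(three_instances))  # (1, 4, 2) (3, 4, 2)
```

Read this way, a single n x m DataFrame becomes one instance, and the first axis only grows when there are several separate series.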
I run
python3 train.py ETTm1 mytest --loader forecast_csv
and have got an error as follows.
Could you pls help me? thanks
##############################
Dataset: ETTm1
Arguments: Namespace(batch_size=8, dataset='ETTm1', epochs=None, eval=False, gpu=0, irregular=0, iters=None, loader='forecast_csv', lr=0.001, max_threads=None, max_train_length=3000, repr_dims=320, run_name='binh', save_every=None, seed=None)
Loading data... done
Traceback (most recent call last):
File "train.py", line 120, in
loss_log = model.fit(
File "/home/binh/experiments/ts2vec/ts2vec.py", line 137, in fit
loss.backward()
File "/home/binh/.local/lib/python3.8/site-packages/torch/tensor.py", line 245, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
File "/home/binh/.local/lib/python3.8/site-packages/torch/autograd/__init__.py", line 145, in backward
Variable._execution_engine.run_backward(
RuntimeError: CUDA out of memory. Tried to allocate 380.00 MiB (GPU 0; 3.82 GiB total capacity; 1.80 GiB already allocated; 254.62 MiB free; 2.25 GiB reserved in total by PyTorch)
Hi, I think there is a rounding error concerning the max_train_length
https://github.com/yuezhihan/ts2vec/blob/631bd533aab3547d1310f4e02a20f3eb53de26be/ts2vec.py#L77-L80
To crop the data into sequences whose length is at most max_train_length, the number of sections should be rounded up.
For example, in the ETTh1 dataset, cropping the train slice of length 8640 with max_train_length = 201 results in 42 sections of length up to 206, instead of 43 sections of length up to 201.
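The arithmetic behind this report can be reproduced in a few lines. This sketch assumes the data is divided as evenly as possible, in the manner of numpy.array_split; the helper section_lengths is hypothetical and only computes the resulting lengths.

```python
import math


def section_lengths(total, sections):
    """Lengths produced by an even split: the first `extra` sections get one more element."""
    base, extra = divmod(total, sections)
    return [base + 1] * extra + [base] * (sections - extra)


total, max_train_length = 8640, 201
floor_sections = total // max_train_length            # rounding down
ceil_sections = math.ceil(total / max_train_length)   # rounding up

print(floor_sections, max(section_lengths(total, floor_sections)))  # 42 206
print(ceil_sections, max(section_lengths(total, ceil_sections)))    # 43 201
```

With rounding down, the longest section exceeds max_train_length; with rounding up, every section stays within it.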
Hello, I have also been studying the ts2vec model recently. When I used the ETTh1 dataset, I encountered a dimension inconsistency problem. Have you ever encountered this problem? I hope to receive your answer. Thank you very much!
Thank you so much for continuously open-sourcing your findings! I noticed that the downloaded data from Yahoo seems to be in a different format than the one required for preprocess_yahoo.py. Will it be possible for you to look into this? Thank you very much!
Yahoo follows the format of
A1/real_1.csv
...
A2/synthetic_1.csv
...
While the required format is path/1 ... path/367, which seems to contain dictionaries.
When training on univariate time series data, the SamePadConv layer takes 1.33 seconds on the GPU but only 0.0002 seconds on the CPU.
As far as I know, the GPU should be faster, so why does this happen?
Just read your paper. Pretty amazing results! Would you consider integrating this into https://github.com/timeseriesAI/tsai to make it accessible to more people?
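Hello, thank you very much for your open-source project.
I have a question: after obtaining representations of the data, for example for classification data, will the representations of samples from the same class be more similar to each other than to those from different classes?
I ask because I want to demonstrate separately that the representation model I obtained is effective.
I would be very grateful if you could answer my question!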
When I fit a model, save it, load it again and retrain it, this does not seem to work ...
model = TS2Vec(input_dims=5, batch_size = 64, device="cuda", max_train_length=680, output_dims=32, lr=0.001)
loss_log = model.fit(dvt_data, n_epochs=3, verbose=True)
model.save(f"model.sd")
Epoch #0: loss=2.07257523319938
Epoch #1: loss=1.1711310916765094
Epoch #2: loss=0.9953813534158445
for ts2iter in range(5):
    model = TS2Vec(input_dims=5, batch_size = 64, device="cuda", max_train_length=680, output_dims=32, lr=0.001)
    model = TS2Vec.load("model.sd")
    loss_log = model.fit(dvt_data, n_epochs=1, verbose=True)
    model.save(f"model.sd")
Epoch #0: loss=2.0144058981869803
Epoch #0: loss=2.1384957409088194
Epoch #0: loss=1.9966144137437456
Epoch #0: loss=2.0894507004875837
Epoch #0: loss=2.028488379587132
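One possible explanation, offered as an assumption rather than a confirmed diagnosis: the traceback quoted further down this page shows load() restoring weights with self.net.load_state_dict(...), while the training code quoted at the top calls self._net(...). If self.net is a separate averaged copy of self._net, loading would leave the weights that fit() actually updates at their fresh initialisation. The toy class below is entirely hypothetical and only illustrates that mechanism with plain dictionaries.

```python
import copy


class ToyModel:
    """Hypothetical stand-in: `_net` is trained, `net` is a separate copy used for saving."""

    def __init__(self):
        self._net = {"w": 0.0}                # weights that fit() would update
        self.net = copy.deepcopy(self._net)   # independent copy used by save()/load()

    def load_into_copy_only(self, state):
        self.net = dict(state)                # mirrors restoring self.net only

    def load_into_both(self, state):
        self.net = dict(state)
        self._net = dict(state)               # also restore the weights that training uses


saved = {"w": 5.0}
a, b = ToyModel(), ToyModel()
a.load_into_copy_only(saved)
b.load_into_both(saved)
print(a._net["w"], b._net["w"])  # 0.0 5.0
```

If this is what happens, resuming would also need the trained weights copied back into the network that fit() optimises. The optimizer state is a separate concern, since the snippets here save only model weights.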
In the context of mono audio classification on the raw waveforms, can the ts2vec framework be utilized for frames of 500ms with a sample rate of 22050 Hz? This would result in univariate time series of shape (1, 11025). Is 11025 considered too long for the ts2vec framework?
Have you tested the ts2vec framework on longer time series datasets, and is there any existing benchmark for evaluating its performance on such lengthy inputs?
I ran into an issue when trying to load a pre-trained TS2Vec model.
Here is the code:
# train.py
model = TS2Vec(
    input_dims=input_dims,
    device=device,
    batch_size=batch_size,
    output_dims=output_dims
)
loss_log = model.fit(X, verbose=verbose, n_epochs=n_epochs)
model.save(r'./models/mv_10_1_5_model.pkl')
# predict.py
model = TS2Vec(
    input_dims=input_dims,
    device=device,
    batch_size=batch_size,
    output_dims=output_dims
)
model.load(r'./models/mv_10_1_5_model.pkl')
And the bug
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_5364\1224328800.py in ()
10 output_dims=output_dims
11 )
---> 12 model.load('./models/mv_10_1_5_model.pkl')
E:\Workplace\ts2vec\ts2vec.py in load(self, fn)
315 fn (str): filename.
316 '''
--> 317 state_dict = torch.load(fn, map_location=self.device)
318 self.net.load_state_dict(state_dict)
319
e:\Software\anaconda3\lib\site-packages\torch\serialization.py in load(f, map_location, pickle_module, **pickle_load_args)
710 opened_file.seek(orig_position)
711 return torch.jit.load(opened_file)
--> 712 return _load(opened_zipfile, map_location, pickle_module, **pickle_load_args)
713 return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
714
e:\Software\anaconda3\lib\site-packages\torch\serialization.py in _load(zip_file, map_location, pickle_module, pickle_file, **pickle_load_args)
1044 unpickler = UnpicklerWrapper(data_file, **pickle_load_args)
1045 unpickler.persistent_load = persistent_load
-> 1046 result = unpickler.load()
1047
1048 torch._utils._validate_loaded_sparse_tensors()
e:\Software\anaconda3\lib\site-packages\torch\serialization.py in persistent_load(saved_id)
1014 if key not in loaded_storages:
1015 nbytes = numel * torch._utils._element_size(dtype)
-> 1016 load_tensor(dtype, nbytes, key, _maybe_decode_ascii(location))
1017
1018 return loaded_storages[key]
e:\Software\anaconda3\lib\site-packages\torch\serialization.py in load_tensor(dtype, numel, key, location)
999 # stop wrapping with _TypedStorage
1000 loaded_storages[key] = torch.storage._TypedStorage(
-> 1001 wrap_storage=restore_location(storage, location),
1002 dtype=dtype)
1003
e:\Software\anaconda3\lib\site-packages\torch\serialization.py in restore_location(storage, location)
974 else:
975 def restore_location(storage, location):
--> 976 result = map_location(storage, location)
977 if result is None:
978 result = default_restore_location(storage, location)
TypeError: 'int' object is not callable
Do you have an example of saving and loading a pre-trained model? I would greatly appreciate it if you could help with this issue!
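The last frame of the traceback calls map_location(storage, location) and fails because an int is not callable, which suggests the device stored on the model is a bare integer such as 0. torch.load documents map_location as a function, a torch.device, a string or a dict, so an integer index is treated as if it were a callable. A minimal sketch of a workaround, assuming that is the cause, is to turn the index into a device string before constructing the model. The helper normalize_device is hypothetical.

```python
def normalize_device(device):
    """Turn a bare GPU index into a device string; pass other values through as strings."""
    if isinstance(device, bool):
        raise TypeError("device must be an int index or a string, not bool")
    if isinstance(device, int):
        return f"cuda:{device}"
    return str(device)


print(normalize_device(0), normalize_device("cpu"))  # cuda:0 cpu
```

The result would then be passed as the device argument in both train.py and predict.py, for example device=normalize_device(0).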
Hello, I have been studying the ts2vec model recently. When training on the Yahoo dataset, I noticed something inconsistent about the contribution of the loss. Even if I manually set the loss to zero during the training phase, the evaluation results are almost the same, and sometimes even better. I was wondering whether I am missing something or whether this is a possible issue, perhaps related to the evaluation phase itself. I would appreciate any guidance you can provide on this matter and hope to receive your answer. Thank you for your time and attention to this issue!
hi!
Just read your paper and am super interested in it. The results are really promising on uniformly sampled input data, but it seems like from the way that the method itself was conceptualized and structured it should/could be robust to non-uniformly sampled ts input. By non-uniformly sampled time-series, I mean a time series where not all time points are sampled in the input. Using a cancer patient getting X-rays for example, they might get it at irregular intervals. For example, patient 1 might get the radiology at day 0, day 7, day 25, day 47; patient 2 might do so at day 0, day 23, day 56, day 59, day 70; etc..
One obvious way to adapt this to the ts2vec format is to bin the days, say 20 days per interval, fill in the missing data with some imputation, and then convert that into a uniformly sampled time series. However, do you see any obvious way to use ts2vec that does not require that approximation? Thanks!
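The binning approach described above can be sketched as follows. The function bin_irregular, the measurement values and the choice of averaging within a bin are all assumptions made for illustration; empty bins are left as NaN so that the imputation step stays a separate decision.

```python
import math


def bin_irregular(days, values, bin_width, n_bins):
    """Average observations per fixed-width bin; bins with no observation become NaN."""
    buckets = [[] for _ in range(n_bins)]
    for day, value in zip(days, values):
        index = day // bin_width
        if 0 <= index < n_bins:
            buckets[index].append(value)
    return [sum(b) / len(b) if b else math.nan for b in buckets]


# Patient 1 from the example above (days 0, 7, 25, 47), with made-up measurement values.
series = bin_irregular(days=[0, 7, 25, 47], values=[1.0, 3.0, 2.0, 4.0], bin_width=20, n_bins=4)
print(series)  # [2.0, 2.0, 4.0, nan]
```

Whether the encoder accepts NaN inputs as missing values, rather than requiring imputed numbers, should be checked against the repository before relying on it.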
TS2Vec uses the GELU activation function. Unfortunately, there's currently a bug in PyTorch for the GELU activation function on MPS devices (since April 2023).
This won't throw an error; it just leads to rubbish results in both training and inference.
Hopefully this will be patched by PyTorch, but in the meantime a workaround is to change F.gelu(x) to F.gelu(x.contiguous()) on lines 34 and 36 of models/dilated_conv.py. Results are then as expected.
I won't bother with a PR as I assume this will be patched by PyTorch in the (near) future.
More info:
pytorch/pytorch#98212
huggingface/transformers#22468
@yuezhihan Thanks for making the code available, really nice approach.
Going through the code, I have some difficulty understanding the instance contrastive loss. Starting from the explanation in your paper,
I was trying to understand your implementation, in particular the final part, where you take specific entries of the logits tensor across batches.
For ease of explanation, I created a small example
# Create time series tensors (B:3,T:4,C:1)
z1 = torch.tensor([[[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]], dtype=torch.float32).reshape(3,4,1)
z2 = torch.tensor([[[13, 14, 15, 16], [17, 18, 19, 20], [21, 22, 23, 24]]], dtype=torch.float32).reshape(3,4,1)
# get batch size and length of time series
B, T = z1.size(0), z1.size(1)
# handle edge case of single batch element
if B == 1:
    return z1.new_tensor(0.)
# concatenate z1 and z2 along the batch dimension to form a tensor of shape (2B, T, C)
z = torch.cat([z1, z2], dim=0)
# transpose z to shape (T, 2B, C)
z = z.transpose(0, 1)
# calculate the dot product between z and its transpose to get a similarity matrix of shape (T, 2B, 2B)
sim = torch.matmul(z, z.transpose(1, 2))
# extract the lower triangular part of sim, excluding the main diagonal that reflects self-similarity (T, 2B, 2B-1)
logits = torch.tril(sim, diagonal=-1)[:, :, :-1]
# add the upper triangular part of sim, excluding the main diagonal that reflects self-similarity (T, 2B, 2B-1)
logits += torch.triu(sim, diagonal=1)[:, :, 1:]
# apply log_softmax to logits
logits = -F.log_softmax(logits, dim=-1)
# use arange to create a tensor of indices
i = torch.arange(B, device=z1.device)
# calculate the mean of the logits along the specified entries and across batches
loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
For illustration purposes, let's focus on the first slice of the similarity matrix sim (its first dimension is T, so this is the first timestamp)
tensor([[[ 1., 5., 9., 13., 17., 21.],
[ 5., 25., 45., 65., 85., 105.],
[ 9., 45., 81., 117., 153., 189.],
[ 13., 65., 117., 169., 221., 273.],
[ 17., 85., 153., 221., 289., 357.],
[ 21., 105., 189., 273., 357., 441.]],
Calling torch.tril(sim, diagonal=-1)[:, :, :-1] gives the lower triangular matrix
tensor([[ 0., 0., 0., 0., 0.],
[ 5., 0., 0., 0., 0.],
[ 9., 45., 0., 0., 0.],
[ 13., 65., 117., 0., 0.],
[ 17., 85., 153., 221., 0.],
[ 21., 105., 189., 273., 357.]])
Calling torch.triu(sim, diagonal=1)[:, :, 1:] gives the upper triangular matrix
tensor([[ 5., 9., 13., 17., 21.],
[ 0., 45., 65., 85., 105.],
[ 0., 0., 117., 153., 189.],
[ 0., 0., 0., 221., 273.],
[ 0., 0., 0., 0., 357.],
[ 0., 0., 0., 0., 0.]])
Removing the diagonal and the remaining zero elements gives a matrix of pairwise dot products:
logits = torch.tril(sim, diagonal=-1)[:, :, :-1] + torch.triu(sim, diagonal=1)[:, :, 1:]
print(logits)
tensor([[ 5., 9., 13., 17., 21.],
[ 5., 45., 65., 85., 105.],
[ 9., 45., 117., 153., 189.],
[ 13., 65., 117., 221., 273.],
[ 17., 85., 153., 221., 357.],
[ 21., 105., 189., 273., 357.]])
Applying the soft-max and the negative-log gives the negative log-likelihood as follows
logits= -F.log_softmax(logits, dim=-1)
We then use the batch size B=3 to create an index for extracting certain parts of the logits tensor
i = torch.arange(B, device=z1.device)
# i = tensor([0, 1, 2])
loss = (logits[:, i, B + i - 1].mean() + logits[:, B + i, i].mean()) / 2
# (B + i - 1) = tensor([2, 3, 4])
# (B + i) = tensor([3, 4, 5])
Based on the above example, my questions are as follows:
How does loss = (log_prob[:, i, B + i - 1].mean() + log_prob[:, B + i, i].mean()) / 2 relate to the above equation?
Why do we take the logits[:, i, B + i - 1] parts from the upper-diagonal matrix? What do these entries mean? Are these the positive examples? Why do we extract exactly these entries?
Why do we take the logits[:, B + i, i] parts from the lower-diagonal matrix? What do these entries mean? Are these the negative examples? Why do we extract exactly these entries?
Why do we create the index i using the batch size B, as we already take the mean across all batches?
In ts2vec.py, fit() requires train_data of shape (n_instance, n_timestamps, n_features).
What is the difference between n_instance and n_features?
I think n_features means the number of time series (e.g. univariate time series -> n_features=1).
Is n_instance the same as the window size, or something else?
thank you.
Thanks for putting out this paper; it sounds very promising. Would you mind clarifying the number of iterations you use for self-supervised training? You mention in the paper that you use 600 for datasets larger than 100,000 (with batch size 8). That seems incredibly low. Are you sure you don't mean 600 epochs?
Any clarification would be great, thank you!
Hi Zhihan,
Thanks for the very useful repository. Could you point me to where I can download 'electricity.csv'? From the Electricity dataset link in the README, I can only get a file named LD2011_2014.txt.zip, and I am not sure how to convert it to 'electricity.csv'.
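Hello, I have two questions about the code and hope you can help. Many thanks!
1 In the downstream forecasting task, padding is set to 200. Is there a corresponding setting in pre-training, i.e. should pre-training also use padding? If so, must the pre-training padding and the downstream padding be the same? How was the value 200 chosen? The electricity dataset you use is very long, and padding is set to 200. If the data is not that long, for example only 100 points, what would be a reasonable padding?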
padding = 200
t = time.time()
all_repr = model.encode(
    data,
    casual=True,
    sliding_length=1,
    sliding_padding=padding,
    batch_size=256
)
2 When generating the training data, why are the first padding samples dropped?
train_features, train_labels = generate_pred_samples(train_repr, train_data, pred_len, drop=padding)
I would greatly appreciate your explanation. Many thanks!!!
Hi,
Is there any reason you set drop equal to the padding length for training in forecasting, but not for validation and test? Doing so would train the forecasting function only on samples with complete history.
sinwave.csv
I am using a simple sine wave to test the algorithm. s1 is a long wave, s2 is a medium wave, s3 is a short wave, and s4=s1+s2+s3. The model can predict s1/s2/s3 successfully, but for s4 it performs poorly, even compared to an LSTM. Could you share some insights on this?
I've tried default hyper-parameters, and also tried to tune it. No significant improvement.
thanks.
Hello, I would like to ask about downstream tasks.
test_repr = model.encode(test_data) # n_instances x n_timestamps x output_dims
test_repr = model.encode(test_data, encoding_window='full_series') # n_instances x output_dims
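Can I interpret n_instances x n_timestamps x output_dims as batch size x t x channels?
Also, is there any performance difference between the timestamp-level representation and the instance-level representation in classification tasks? In addition, I see that the code raises the input dimension to 320, which seems different from my understanding of traditional feature extraction.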
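The relationship between the two output shapes can be sketched for a single instance. This assumes that encoding_window='full_series' collapses the time axis by max pooling, which should be checked against the repository; the function pool_over_time and the numbers are hypothetical.

```python
def pool_over_time(timestamp_repr):
    """Collapse an (n_timestamps x output_dims) list into one vector by max pooling."""
    return [max(column) for column in zip(*timestamp_repr)]


# One instance with 3 timestamps and 2 output dimensions (made-up numbers).
per_timestamp = [[0.1, 0.9], [0.4, 0.2], [0.3, 0.5]]
print(pool_over_time(per_timestamp))  # [0.4, 0.9]
```

Applied to every instance, this turns n_instances x n_timestamps x output_dims into n_instances x output_dims, which is the form a standard classifier expects.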
For the part below, why is crop_eleft subtracted from crop_right, rather than crop_left?
out1 = self._net(take_per_row(x, crop_offset + crop_eleft, crop_right - crop_eleft))
out1 = out1[:, -crop_l:]
out2 = self._net(take_per_row(x, crop_offset + crop_left, crop_eright - crop_left))
out2 = out2[:, :crop_l]
Thanks for making the code available. I really like the idea of first learning the embeddings in a self-supervised manner and then using a simpler model for forecasting. However, I am struggling to see how to use the learned embeddings for the forecasting part.
Say you are tasked with forecasting a monthly univariate time series Y = (y1, ..., yT), which is historically available from January 2010 until December 2020. The task is to forecast 2021, with the forecasting horizon being h=12 months. Based on your framework, we use the TCN encoder to learn the embeddings for January 2010 until December 2020. To train the downstream forecasting model, say a Ridge regression model, we use the final timestamp of the learned representations. So far so good.
@yuezhihan & @linytsysu My question is: given the representations and the trained Ridge model, how do we forecast 2021, since the data and hence the representations are available only until the end of 2020? More specifically, what are the features used by the Ridge model for forecasting 2021?
In your paper, Section C.2, you state that
For each task, we only use the training set to train the representation model, and apply the model to the testing set to get representations
Does this mean you show the actual test-data to the model, create the representations/embeddings based on the test-data and then use these to fit the same test-data? Isn't this a simple interpolation of the test-data, using the representations instead of the actuals, rather than forecasting?
I highly appreciate your comments on this. Many thanks.
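One common reading of this setup, sketched under assumptions rather than taken from the authors: the regressor is fitted on pairs made of the representation at month t and the next h observed values, all drawn from history. The 2021 forecast then uses the representation at December 2020, which depends only on data up to that month. The function make_pairs and the placeholder data below are hypothetical, and the representations are a stand-in for encoder output.

```python
def make_pairs(representations, series, horizon):
    """Pair the representation at t with the next `horizon` observed values."""
    features, labels = [], []
    for t in range(len(series) - horizon):
        features.append(representations[t])
        labels.append(series[t + 1 : t + 1 + horizon])
    return features, labels


series = list(range(132))                        # 11 years of monthly values (placeholder data)
representations = [[float(v)] for v in series]   # stand-in: one vector per month
features, labels = make_pairs(representations, series, horizon=12)

forecast_input = representations[-1]             # December 2020: last representation available
print(len(features), labels[-1][-1], forecast_input)  # 120 131 [131.0]
```

On this reading, applying the encoder to the test period is not interpolation as long as the representation at each timestamp is computed only from inputs up to that timestamp, and the labels it is scored against lie strictly after it.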