TimesNet (ICLR-2023)
2023-11-28
TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis


Motivation

Task background

Propose a unified backbone / foundation model that can handle a wide range of time series analysis tasks (forecasting, imputation, anomaly detection, classification).
However, the variations of real-world time series always involve intricate temporal patterns, where multiple variations (e.g. rising, falling, fluctuation and etc.) mix and overlap with each other, making the temporal variation modeling extremely challenging.

Limitations of existing methods and why

  1. Classical methods (e.g. ARIMA, Holt-Winters, Prophet) assume that temporal variations follow pre-defined patterns, but real-world time series are usually too complex to be covered by such patterns.
    1. However, the variations of real-world time series are usually too complex to be covered by pre-defined patterns, limiting the practical applicability of these classical methods.
  2. Deep models (e.g. MLP, TCN, RNN) were proposed for temporal modeling, but they do not take into account the temporal 2D-variations that arise from periodicity.
    1. RNNs: built on a Markov-style recurrent assumption, they struggle to capture long-term dependencies, and their efficiency is limited by the sequential computation paradigm.
      1. However, these methods usually fail in capturing the longterm dependencies and their efficiencies suffer from the sequential computation paradigm.
    2. TCNs (temporal convolutional networks): because 1D convolution kernels are local, they can only model variations between adjacent time points and thus still fail on long-term dependencies.
      1. Also, because of the locality property of the one-dimension convolution kernels, they can only model the variations among adjacent time points, thereby still failing in long-term dependencies.
    3. Transformers: it is hard for the attention mechanism to find reliable dependencies directly from scattered time points, since the temporal dependencies can be deeply obscured by intricate temporal patterns.
      1. But it is hard for attention mechanism to find out reliable dependencies directly from scattered time points, since the temporal dependencies can be obscured deeply in intricate temporal patterns.

Key idea (novelty)

Overall design of TimesNet
From the multi-periodicity perspective, real-world time series contain multiple periods that overlap and interact with one another (multi-period coupling).
A modular architecture splits the model into separate modules that each handle one periodic component; however, a plain 1D time series has limited representation capability, and the couplings within and across these components cannot be captured if each is examined in isolation, so they need to be modeled jointly.
Temporal 2D-variation modeling: adopt a new data structure so that the analysis is carried out in a 2D space.
Two complementary views: intraperiod variations (adjacent time points within a period, short-term) and interperiod variations (points at the same phase in adjacent periods, long-term).
Split the series by its period length and put each period segment in its own column: each column then captures the variation within a single period, while each row collects the points at the same phase across different periods.
In practice, the authors observed that after this reshaping the data exhibit clear 2D locality, which motivates processing them with tools from 2D computer vision such as CNNs; turning the long 1D sequence into a tensor also speeds up computation considerably. A small numerical illustration follows.
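A minimal numerical illustration of this folding (my own toy example, not from the paper): a length-12 series with period 4 reshapes into a 4×3 matrix.

```python
import numpy as np

# Toy series of length 12 whose dominant period is 4 (values are illustrative).
x = np.array([0, 3, 5, 2, 1, 4, 6, 3, 0, 3, 5, 2], dtype=float)

period = 4                    # period length p
n_periods = len(x) // period  # number of periods f = 3

# Fold into (f, p) then transpose to (p, f):
# each COLUMN is one period (intraperiod variation),
# each ROW is the same phase across periods (interperiod variation).
x_2d = x.reshape(n_periods, period).T
print(x_2d)
# [[0. 1. 0.]
#  [3. 4. 3.]
#  [5. 6. 5.]
#  [2. 3. 2.]]
```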

Method details

TimesNet
The overall architecture is ResNet-like: TimesBlocks are stacked with residual connections. Each TimesBlock ① reshapes the 1D series into 2D tensors, ② performs representation learning with 2D kernels, and ③ reshapes the 2D representations back into 1D.

1D→2D

Period-based transformation:
  1. Compute the spectrum of the time series with the FFT.
  2. Select the top-k frequencies.
  3. For each selected frequency, reshape the 1D time series into a 2D tensor (see the sketch below this list).
    1. Handling alignment: pad zeros at the end of the series so that its length becomes divisible by the corresponding period; tensors of different sizes can then all be fed into the inception block.
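A minimal PyTorch sketch of this 1D→2D step for an input of shape [batch, length, channels]; the function names and details are illustrative, not the paper's exact implementation.

```python
import torch

def fft_for_period(x, k=3):
    """x: [B, T, C]. Return the top-k period lengths and per-sample amplitudes."""
    xf = torch.fft.rfft(x, dim=1)                     # [B, T//2+1, C]
    amp = xf.abs().mean(dim=0).mean(dim=-1)           # average amplitude over batch and channels
    amp[0] = 0.0                                      # drop the DC component
    _, top_freq = torch.topk(amp, k)                  # indices of the k strongest frequencies
    periods = x.shape[1] // top_freq                  # period length = T / frequency index
    return periods, xf.abs().mean(dim=-1)[:, top_freq]  # periods: [k], amplitudes: [B, k]

def reshape_1d_to_2d(x, period):
    """Zero-pad so T is divisible by the period, then fold into a 2D view [B, C, f, p]."""
    B, T, C = x.shape
    n_periods = -(-T // period)                       # ceil(T / period)
    pad_len = n_periods * period - T
    if pad_len:
        x = torch.cat([x, torch.zeros(B, pad_len, C, device=x.device)], dim=1)
    return x.reshape(B, n_periods, period, C).permute(0, 3, 1, 2)

# Toy usage: 8 series of length 96 with 7 variables.
x = torch.randn(8, 96, 7)
periods, amplitudes = fft_for_period(x, k=3)
views_2d = [reshape_1d_to_2d(x, int(p)) for p in periods]
```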

2D representation learning

  1. The inception block is shared across all selected periods for parameter efficiency (see the sketch below).
    1. The inception block is fast and already contains multi-scale modeling structures, so there is no need to stack many layers.
  2. It can be replaced by any vision backbone, bridging time series analysis and computer vision.
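A minimal sketch of the parameter-sharing point: a single 2D conv module (a simplified multi-kernel stand-in for the paper's inception block) is applied to every period view, so the parameter count does not grow with k.

```python
import torch
import torch.nn as nn

class MultiKernelConv2d(nn.Module):
    """Simplified stand-in for the inception block: parallel 2D convolutions
    with different kernel sizes whose outputs are averaged (multi-scale in one layer)."""
    def __init__(self, channels, kernel_sizes=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2) for k in kernel_sizes
        )

    def forward(self, x):                               # x: [B, C, f, p]
        return sum(b(x) for b in self.branches) / len(self.branches)

# Toy usage: three period views of the same series (e.g. periods 24, 12, 8 of a length-96 series).
views_2d = [torch.randn(8, 7, 96 // p, p) for p in (24, 12, 8)]
block = MultiKernelConv2d(channels=7)                   # ONE set of weights ...
outputs_2d = [block(v) for v in views_2d]               # ... shared across all k period views
```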

2D→1D

Reshape back to the 1D space and aggregate.
The frequencies were originally selected by their prominence in the spectrum, i.e. by their amplitudes. Here the amplitudes are passed through a softmax and used as weights; the k 1D features are aggregated with these weights to obtain the final representation (see the sketch below).
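A minimal sketch of the 2D→1D step and the amplitude-weighted aggregation, assuming each processed 2D view has shape [B, C, f, p] and the amplitudes have shape [B, k]; names are illustrative.

```python
import torch

def reshape_2d_to_1d(x_2d, orig_len):
    """Inverse of the folding: [B, C, f, p] -> [B, T, C], dropping the zero-padding."""
    B, C, f, p = x_2d.shape
    return x_2d.permute(0, 2, 3, 1).reshape(B, f * p, C)[:, :orig_len, :]

def aggregate(outs_1d, amplitudes):
    """outs_1d: list of k tensors [B, T, C]; amplitudes: [B, k].
    Softmax over k turns the amplitudes into mixing weights for the k branches."""
    stacked = torch.stack(outs_1d, dim=-1)              # [B, T, C, k]
    w = torch.softmax(amplitudes, dim=1)                # [B, k]
    return (stacked * w[:, None, None, :]).sum(dim=-1)  # [B, T, C]

# Toy usage with 3 period views of a length-96, 7-variable series.
B, T, C = 8, 96, 7
views_2d = [torch.randn(B, C, 96 // p, p) for p in (24, 12, 8)]
outs_1d = [reshape_2d_to_1d(v, T) for v in views_2d]
y = aggregate(outs_1d, torch.rand(B, 3))
print(y.shape)                                          # torch.Size([8, 96, 7])
# In the TimesBlock this output is added back to the input (residual connection).
```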

Experiments

Setup and results

5 mainstream time series analysis tasks; 36 datasets, 81 settings, 20+ baselines
Most current long-term forecasting work competes on the same nine datasets (ETTm1/ETTm2/ETTh1/ETTh2/Electricity/Traffic/Weather/Exchange/ILI), whose temporal variations are relatively simple. This paper additionally evaluates on M4, whose variations are considerably more complex; there, "simple linear methods degenerate a lot".

Datasets

Especially for the long-term setting, we follow the benchmarks used in Autoformer (2021), including ETT (Zhou et al., 2021), Electricity (UCI), Traffic (PeMS), Weather (Wetterstation), Exchange (Lai et al., 2018) and ILI (CDC, Influenza-like Illness), covering five real-world applications. For the short-term dataset, we adopt the M4 (Spyros Makridakis, 2018), which contains the yearly, quarterly and monthly collected univariate marketing data.

Metrics

MSE and MAE for long-term forecasting and imputation; SMAPE, MASE and OWA for short-term forecasting on M4; accuracy for classification; precision, recall and F1-score for anomaly detection.

Baselines

  • LSTM (1997): A recurrent neural network (RNN) known for its ability to capture long-term dependencies in sequence data.
  • LSTNet (2018): Combines CNN and RNN layers to capture both spatial and temporal correlations in time series data.
  • LSSL (2022): A state-space sequence model (Linear State-Space Layers), grouped here with the RNN-style baselines.
  • TCN (2019): A temporal convolutional network that uses causal convolutions for sequence modeling, offering an alternative to RNNs.
  • LightTS (2022): A lightweight MLP-based model designed for efficient and scalable time series forecasting.
  • DLinear (2023): An MLP-based model that relies on simple linear layers (with series decomposition) for forecasting.
  • Reformer (2020): A Transformer variant that reduces memory usage and computation time, suitable for long sequences.
  • Informer (2021): Optimizes the Transformer architecture for long-sequence forecasting by reducing its computational cost.
  • Pyraformer (2021): Uses a pyramidal attention structure to process long sequences efficiently.
  • Autoformer (2021): Introduces series decomposition and an auto-correlation mechanism to improve long-term forecasting.
  • FEDformer (2022): Employs Fourier and wavelet transformations within the Transformer framework for forecasting.
  • Non-stationary Transformer (2022): Adapts the Transformer architecture to handle non-stationary time series.
  • ETSformer (2022): A Transformer-based model that integrates exponential smoothing techniques for forecasting.
  • N-HiTS (2022): A hierarchical MLP-based model with multi-rate sampling and interpolation; used as a short-term forecasting baseline.
  • N-BEATS (2019): A purely MLP-based model, notable for its stackable and interpretable architecture.
  • Anomaly Transformer (2021): Specialized for time series anomaly detection, leveraging an attention-based association discrepancy.
  • Rocket (2020): A classification method that extracts features with large numbers of random convolutional kernels.
  • Flowformer (2022): A linear-complexity Transformer (attention based on flow conservation); used here as a classification baseline.

Representation analysis

[Insightful] The x-axis is the CKA similarity between the representations of the bottom and top layers, and the y-axis is the model's performance. This prompts the question of whether the task at hand needs hierarchical representations (low bottom-top similarity) or low-level representations (high similarity): in the paper, forecasting and anomaly detection favor low-level representations, while imputation and classification favor hierarchical ones.
Benefiting from temporal 2D-variations, it can learn proper representations for different tasks.

Ablation studies

  1. Replacing the 2D backbone with different vision modules.
  2. Model architecture: combining TimesNet with the deep decomposition architecture of Autoformer does not bring further improvement, probably because when the input series already exhibits clear periodicity, the 2D-variation design can capture the variations effectively on its own.
  3. Adaptive aggregation: TimesNet uses the softmax-normalized amplitudes as the weights for aggregating the k tensors; the ablation compares this with direct summation and with removing the softmax and using the raw coefficients.

Summary and outlook

Contributions

  • Modular architecture and 2D-space modeling: TimesNet disentangles the intricate temporal variations via its modular architecture and captures intraperiod and interperiod variations in 2D space with a parameter-efficient inception block.
  • Broad applicability: across five mainstream time series analysis tasks, TimesNet shows excellent generality and performance, supporting its role as a foundation model for time series analysis.

Limitations

  • No exploration of large-scale pre-training: although TimesNet performs well across many tasks, the paper does not investigate large-scale pre-training for time series, which may limit the model's applicability and optimization potential in broader scenarios.
    • In the future, we will further explore large-scale pre-training methods in time series, which utilize TimesNet as the backbone and can generally benefit extensive downstream tasks.

Possible improvements

  • Explore large-scale pre-training: use TimesNet as the backbone and pre-train it on large-scale data, which could improve generalization and adaptability across downstream tasks.
  • Task-specific optimization: build task-specific plug-ins or modules on top of TimesNet, e.g. domain-specific feature extractors or architectural adjustments that better handle particular kinds of temporal dependencies.

Additional notes

Code repository

  • The repository provides concise re-implementations of iTransformer/PatchTST/TimesNet/DLinear/LightTS/ETSformer/FEDformer/Nonstationary Transformer/Pyraformer/Autoformer/Informer/Reformer/Transformer/Koopa/FiLM/MICN/Crossformer.

The first author's Zhihu posts


Background

CKA

Centered Kernel Alignment (CKA) is a metric for comparing the similarity of representations (feature spaces) between different neural network layers or different models. Concretely, it measures how similar the feature space learned by one part of a model is to that learned by another, which provides insight into how the network processes information.
K and L are the centered kernel (Gram) matrices of the two sets of representations (each entry has its row mean and column mean subtracted and the overall mean added back).
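A minimal sketch of linear CKA, assuming X and Y are representation matrices of shape [n_samples, n_features]; this is the standard linear-kernel formulation, not code from the paper.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between representations X [n, d1] and Y [n, d2]."""
    X = X - X.mean(axis=0, keepdims=True)     # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic_xy = np.linalg.norm(Y.T @ X, "fro") ** 2
    hsic_xx = np.linalg.norm(X.T @ X, "fro") ** 2
    hsic_yy = np.linalg.norm(Y.T @ Y, "fro") ** 2
    return hsic_xy / np.sqrt(hsic_xx * hsic_yy)

# Toy usage: linear CKA is invariant to orthogonal transformations of the features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 64))
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))              # random rotation
print(round(linear_cka(X, X @ Q), 3))                       # ~1.0: same representation up to rotation
print(round(linear_cka(X, rng.normal(size=(100, 32))), 3))  # much smaller for unrelated features
```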

ResNet

Deep Residual Learning for Image Recognition
  • Core idea: introduce residual connections (skip connections) that allow information to bypass some layers.
  • Goal: alleviate the vanishing/exploding gradient problem in deep network training, so that much deeper networks can be trained effectively.
  • Effect: with residual connections a block can easily learn the identity mapping, which helps information propagate through deep networks and improves training stability and efficiency.
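A minimal sketch of a residual (skip) connection; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """y = x + F(x): if the residual branch F learns to output ~0, the block is an identity map."""
    def __init__(self, dim):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x):
        return x + self.body(x)      # the skip path lets gradients bypass the body

x = torch.randn(4, 32)
print(ResidualBlock(32)(x).shape)    # torch.Size([4, 32])
```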

Inception Block

Going deeper with convolutions
  • Core idea: apply convolution kernels of different sizes and pooling operations in parallel within the same layer.
  • Goal: capture information at multiple scales while keeping the computational cost under control.
  • Effect: through its parallel structure, the Inception block extracts and combines multi-scale features effectively without a significant increase in computation, strengthening the model's expressiveness.
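A minimal sketch of the parallel multi-kernel idea (a simplified GoogLeNet-style module; channel counts are illustrative).

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel 1x1 / 3x3 / 5x5 convolutions plus a pooling branch, concatenated on channels."""
    def __init__(self, c_in, c_branch=8):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, c_branch, kernel_size=1)
        self.b3 = nn.Conv2d(c_in, c_branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(c_in, c_branch, kernel_size=5, padding=2)
        self.bp = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(c_in, c_branch, kernel_size=1))

    def forward(self, x):            # x: [B, C, H, W]
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

x = torch.randn(2, 16, 24, 24)
print(InceptionBlock(16)(x).shape)   # torch.Size([2, 32, 24, 24])
```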

Temporal Convolutional Networks

#TODO

ConvNeXt

#TODO

Informer

#TODO

Autoformer

Goal: predict a horizon far longer than the input window, i.e. forecast the distant future from limited information.
Shortcomings of the vanilla Transformer: ① the intricate temporal patterns in long series make it hard for attention to discover reliable temporal dependencies; ② Transformer-based models have to adopt sparse attention to cope with the quadratic complexity, which creates an information-utilization bottleneck.
