Koopa(NIPS-2023)
2023-11-29
Koopa = Koopman Forecaster: a time series forecasting model designed from a dynamics perspective, based on Koopman theory.

Motivation

  1. [Limitations of existing methods (overview)] Existing models struggle to fit the non-stationarity of real-world data, and few deep architectures start from a theoretical foundation that naturally suits non-stationary time series. In spite of elaborately designed models, it is a fundamental problem for deep models to generalize on varying distribution [1, 25, 33], which is widely reflected in real-world time series because of inherent non-stationarity. Few works research the theoretical basis that can be applied to deal with time-variant temporal patterns naturally.
    1. A non-stationary time series is one whose statistical properties (such as mean and variance) change over time. Forecasting such a series is more complex than forecasting a stationary series, whose statistical properties stay constant.
      Non-stationary time series is characterized by time-variant statistics and temporal dependencies in different periods [2, 14], inducing a huge distribution gap between training and inference and even among each lookback window.
  2. [Problems of TSD with DNN] Deep models may still be constrained on non-stationary time series, which are endowed with time-variant properties and pose challenges for model capacity and efficiency. Transformer-based models can indeed be tailored to improve robustness against shifted distributions, and PatchTST obtains better representations through channel-independence and instance normalization, but this may lead to unaffordable computational cost when the number of series variates is large.
    1. TCN-based (Temporal Convolutional Network) models explore hierarchical temporal patterns and adopt shared convolutional kernels with diverse receptive fields. RNN-based (Recurrent Neural Network) models use a recurrent structure with memory to capture the implicit transition between time points. MLP-based (Multi-Layer Perceptron) models learn point-wise weighting to model simple temporal dependencies and show impressive performance and efficiency.
  3. [Building the model from a dynamics perspective]
    1. Real-world time series act like time-variant dynamics.
      Prior Koopman-based work: Koopman Autoencoder learns the measurement function and operator simultaneously; PCL [3] further introduces a backward procedure to improve the consistency and stability of the operator; MDKAE disentangles dominant factors underlying sequential data and is competent to forecast with specific factors; K-Forecast utilizes Koopman theory to handle nonlinearity in temporal signals and proposes to optimize a data-dependent basis for long-term time series forecasting. By leveraging predefined measurement functions, KNF learns the Koopman operator and an attention map to cope with time series forecasting with changing temporal distribution.
      Koopa's Koopman Predictors are more modular (end-to-end forecasting objective optimization), use hierarchically learned operators to handle time-variant and time-invariant dynamics, and renovate the Koopman Autoencoder by removing the reconstruction loss to achieve fully predictive training.

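Key ideas

  1. The advantage of Koopman theory is that nonlinear, time-variant transitions can be handled in a linear space.
    1. It transforms a nonlinear system into a measurement function space, where the dynamics can be described by a linear Koopman operator and analyzed with spectral analysis.
  2. Localized time series exhibit weak stationarity, so the Koopman function space can be divided into several neighborhoods, each described by a localized linear operator.
    1. By Wold's Theorem, every covariance-stationary time series X_t can be formally decomposed into a deterministic component and a stochastic component. Analogously, Koopa splits the series into time-invariant and time-variant parts, handled by globally learned and localized operators respectively.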
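For reference, the decomposition in Wold's Theorem is usually written in the following textbook form. The symbols \eta_t, b_j and \varepsilon_t are the standard ones and are an assumption here, since the original figure is not available:

```latex
X_t = \eta_t + \sum_{j=0}^{\infty} b_j \, \varepsilon_{t-j}
```

Here \eta_t is the deterministic (perfectly predictable) component, and \varepsilon_t is white noise that is passed through the filter coefficients b_j to give the stochastic component.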
[Modular Koopman Predictors (KP)] → hierarchically describe and advance forward series dynamics.

Technical approach

Koopa Block

b-th block
Unlike KAEs (Koopman Autoencoders), which introduce a loss term for rigorous reconstruction of the lookback-window series, we feed the residual X^(b+1) as the input of the next block for learning a corrective operator.

Fourier Filter

The top α percent of spectrums, ranked by amplitude, form the subset G_α ⊂ S, which contains the dominant spectrums shared among all lookback windows and exhibits the time-invariant dynamics underlying the dataset.
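The filtering step can be sketched roughly as follows. This is a minimal NumPy illustration for a single variate, not the authors' implementation: the function names `dominant_frequencies` and `fourier_filter` are hypothetical, α is treated as a fraction in (0, 1], and the dominant set is chosen from the mean amplitude over training windows.

```python
import numpy as np

def dominant_frequencies(train_windows, alpha):
    """Pick the top-alpha fraction of rFFT bins by mean amplitude.

    train_windows: array [N, T] of lookback windows (single variate).
    Returns the indices of the bins treated as the dominant set G_alpha.
    """
    amp = np.abs(np.fft.rfft(train_windows, axis=-1)).mean(axis=0)
    k = max(1, int(round(alpha * amp.size)))
    return np.argsort(amp)[-k:]

def fourier_filter(x, dominant_idx):
    """Split one window x of shape [T] into time-invariant and time-variant parts."""
    spec = np.fft.rfft(x)
    mask = np.zeros(spec.shape, dtype=bool)
    mask[dominant_idx] = True
    # Keep only the dominant bins for the time-invariant component.
    x_inv = np.fft.irfft(np.where(mask, spec, 0), n=len(x))
    # Everything else is treated as the time-variant component.
    x_var = x - x_inv
    return x_inv, x_var
```

By construction `x_inv + x_var` reproduces the input window, so the split loses no information; only the assignment of frequencies to the two branches changes with α.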

Time-invariant KP

globally shared dynamics
D denotes the embedding dimension, and T and C denote the lookback window length and the number of variates
snapshot pairs

Time-variant KP

To obtain semantic snapshots and reduce the number of iterations, the time-variant input is divided into segments of length S, with padding where needed.
We leverage eDMD [45] to find the best-fitted matrix that advances the system forward.
K_var is computed independently within each window, so it differs from window to window.
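The operator fit can be sketched as an ordinary least-squares problem. The snippet below is an illustration under assumptions, not the paper's code: segment embeddings are stored as rows of a matrix `Z` (a hypothetical name), and the operator acts on row vectors from the right, so that `Z[j] @ K` approximates `Z[j + 1]`.

```python
import numpy as np

def fit_local_operator(Z):
    """Fit a localized linear operator from consecutive segment embeddings.

    Z: array [n_seg, D], embeddings of the segments in temporal order.
    Returns K of shape [D, D] minimizing ||Z_back @ K - Z_fore||_F.
    """
    Z_back, Z_fore = Z[:-1], Z[1:]
    # The pseudo-inverse gives the least-squares solution in closed form.
    return np.linalg.pinv(Z_back) @ Z_fore
```

Because the fit only uses the embeddings inside one lookback window, each window yields its own operator, which is what makes this branch time-variant.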
To obtain a prediction of length H, we iterate operator forwarding to get H/S predicted embeddings.
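The iteration can be sketched as repeated multiplication by the fitted operator. This is a minimal illustration with hypothetical names; it assumes row-vector embeddings advanced by right-multiplication, and it rounds H/S up, which is an assumption about how a non-divisible horizon would be handled.

```python
import math
import numpy as np

def forecast_embeddings(K, z_last, H, S):
    """Roll the operator forward to produce ceil(H / S) predicted embeddings.

    K: [D, D] operator, z_last: [D] embedding of the last observed segment.
    """
    n_steps = math.ceil(H / S)
    z = np.asarray(z_last, dtype=float)
    preds = []
    for _ in range(n_steps):
        z = z @ K  # advance the embedding by one segment
        preds.append(z)
    return np.stack(preds)
```

Each predicted embedding would then be decoded into a segment of length S, and the decoded segments concatenated (and truncated to H if needed) to form the forecast.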

Forecasting Objective

In Koopa, the Encoder, Decoder and K_inv are learnable parameters, while K_var is calculated on-the-fly (and differs from block to block).
If reconstruction fails, the prediction must also fail.

Scaling Up Forecast Horizon

Koopa can use incoming data to adaptively update its operator. The paper proposes a mechanism that combines rolling forecasting with operator updates, which lets the model adapt to changes in the data distribution without retraining and further improves forecasting accuracy, with especially notable gains on non-stationary time series.
Since Time-invariant KP has learned the globally shared dynamics and Time-variant KP can calculate the localized operator K_var within the lookback window, we freeze the parameters of the trained Koopa and only use the incoming ground truth to adapt K_var. The naïve implementation uses incremental Koopman embedding with dimension D and conducts Equation 10 to obtain an updated operator, which has a complexity of O(H_te D^3). We further propose an iterative algorithm with improved O((H_te + D) D^2) complexity.
Instead of calculating a new K_var+ from incremental collections, we utilize the already calculated K_var to find the iteration rule for K_var+.
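The paper's exact iteration rule is given in figures that are not reproduced here. As a stand-in, the sketch below uses standard recursive least squares (a rank-one Sherman-Morrison update), which is not claimed to be identical to the paper's algorithm but shows how an operator can be updated in O(D^2) per new snapshot pair instead of being refitted from scratch. Names and the row-vector convention (`x @ K` approximates `y`) are assumptions.

```python
import numpy as np

def init_operator(Z_back, Z_fore):
    """Batch least-squares fit; also returns P = (Z_back^T Z_back)^-1 for later updates."""
    P = np.linalg.inv(Z_back.T @ Z_back)
    K = P @ Z_back.T @ Z_fore
    return K, P

def update_operator(K, P, x, y):
    """Incorporate one new snapshot pair (x -> y) without refitting.

    x, y: 1-D arrays of length D. Costs O(D^2) per update.
    """
    Px = P @ x
    # Sherman-Morrison update of the inverse Gram matrix (P is symmetric).
    P_new = P - np.outer(Px, Px) / (1.0 + x @ Px)
    # Correct K in the direction of the prediction error on the new pair.
    K_new = K + np.outer(P_new @ x, y - x @ K)
    return K_new, P_new
```

After processing all new pairs, the result matches a batch least-squares fit on the full collection, so the saving is purely computational.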

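Experimental results

  • Fits non-stationarity: Koopman theory can model the state transitions of time-variant dynamical systems, so it naturally fits forecasting on non-stationary time series;
  • Lightweight and efficient: built on linear layers; compared with leading models in the field, it achieves SOTA results with, on average, only about 1/4 of the training time and memory footprint;
  • Adapts to changing forecast distributions: combined with rolling forecasting and the operator update mechanism, it adapts to real-time distribution changes in production data without retraining.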

Benchmark datasets

ECL (UCI), ETT [53], Exchange [22], ILI (CDC), Traffic (PeMS), and Weather (Wetterstation). For univariate forecasting, we evaluate the performance on the well-acknowledged M4 dataset.
We set the lookback window length T = 2H, the same as N-BEATS.
  1. ETT contains data collected from electricity transformers, including load and oil temperature, recorded every 15 minutes between July 2016 and July 2018.
  2. Electricity contains the hourly electricity consumption of 321 customers from 2012 to 2014.
  3. Exchange [25] records the daily exchange rates of eight different countries from 1990 to 2016.
  4. Traffic is a collection of hourly data from the California Department of Transportation, which describes the road occupancy rates measured by different sensors on San Francisco Bay area freeways.
  5. Weather is recorded every 10 minutes for the whole year of 2020 and contains 21 meteorological indicators, such as air temperature and humidity.
  6. ILI includes the weekly recorded influenza-like illness (ILI) patient data from the Centers for Disease Control and Prevention of the United States between 2002 and 2021, which describes the ratio of patients seen with ILI to the total number of patients.
  7. M4 contains four subsets of periodically collected univariate marketing data.
We follow the standard protocol and split all datasets into training, validation and test sets in chronological order, with a ratio of 6:2:2 for the ETT dataset and 7:1:2 for the other datasets.
Dataset | Description | Time Interval | Time Range | Split Ratio
ETT | Electricity transformer data, including load and oil temperature | 15 minutes | July 2016 - July 2018 | 6:2:2
Electricity | Hourly electricity consumption of 321 customers | Hourly | 2012 - 2014 | 7:1:2
Exchange | Daily exchange rates of eight different countries | Daily | 1990 - 2016 | 7:1:2
Traffic | Hourly road occupancy rates from the California Department of Transportation on San Francisco Bay area freeways | Hourly | Not specified | 7:1:2
Weather | 21 meteorological indicators such as air temperature and humidity | 10 minutes | 2020 | 7:1:2
ILI | Weekly recorded influenza-like illness (ILI) patient data from the US Centers for Disease Control and Prevention | Weekly | 2002 - 2021 | 7:1:2
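The chronological split described above can be sketched in a few lines. This is only an illustration: the function name is hypothetical, and real benchmark loaders may place the boundaries slightly differently (for example, by extending validation and test ranges backwards by one lookback window).

```python
def chronological_split(n, ratios=(7, 1, 2)):
    """Return (train, val, test) slices over n time steps, in time order.

    ratios: integer split ratio, e.g. (7, 1, 2) or (6, 2, 2) for ETT.
    No shuffling is done, so the test set is always the most recent data.
    """
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = slice(0, n_train)
    val = slice(n_train, n_train + n_val)
    test = slice(n_train + n_val, n)
    return train, val, test
```

Keeping the order intact matters for non-stationary series: the test period can then follow a different distribution from the training period, which is exactly the setting Koopa targets.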
 

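Evaluation metrics

  1. The 6 Autoformer benchmark datasets: MAE, MSE
  2. M4: sMAPE, MASE, OWA
    1. sMAPE (symmetric mean absolute percentage error): a measure of forecast accuracy designed to be symmetric, so that it neither over-penalizes over-forecasts nor under-penalizes under-forecasts.
    2. MASE (mean absolute scaled error): measures forecast accuracy by scaling the error with the historical error of a simple method (usually the naive forecast). It is especially useful when comparing forecasting models on data whose scales differ. The formula is \( \frac{mean(|F_t - A_t|)}{mean(|A_{t} - A_{t-1}|)} \). A MASE below 1 means the forecast is better than the naive forecast on average.
    3. OWA (overall weighted average): a composite metric that combines several accuracy metrics into a single number, commonly used in forecasting competitions to assess a model more comprehensively. In M4 it averages sMAPE and MASE, each normalized by the value obtained by the Naive2 baseline. A lower OWA indicates better overall performance.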
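The three M4 metrics can be sketched as below. This follows the commonly used M4 definitions as far as I am aware (sMAPE scaled to percent with a factor of 200, MASE scaled by the in-sample naive error with period `m`, OWA as the mean of the two ratios against Naive2); the function signatures are hypothetical, and the Naive2 scores are passed in rather than computed.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    return 200.0 * np.mean(np.abs(a - f) / (np.abs(a) + np.abs(f)))

def mase(actual, forecast, insample, m=1):
    """MAE of the forecast, scaled by the in-sample MAE of the (seasonal) naive forecast."""
    a, f = np.asarray(actual, float), np.asarray(forecast, float)
    hist = np.asarray(insample, float)
    scale = np.mean(np.abs(hist[m:] - hist[:-m]))
    return np.mean(np.abs(a - f)) / scale

def owa(smape_model, mase_model, smape_naive2, mase_naive2):
    """Average of the two metrics, each relative to the Naive2 baseline."""
    return 0.5 * (smape_model / smape_naive2 + mase_model / mase_naive2)
```

An OWA of 1.0 means the model is on par with Naive2 overall, so values below 1.0 indicate an improvement over that baseline.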

Baseline&SOTA

  1. Transformer-based model: Autoformer [48], PatchTST [31];
  2. TCN-based model: TimesNet [47], MICN [43];
  3. MLP-based model: DLinear [51];
  4. Fourier forecaster: FiLM [54];
  5. Koopman forecaster: KNF [44].
  6. We also introduce additional specialized models N-HiTS [7] and N-BEATS [32] for univariate forecasting as competitive baselines.

Analysis of main results

  1. Koopa surpasses the SOTA Koopman-based forecaster KNF, which can be attributed to our hierarchical dynamics learning and disentangling mechanism.
  2. As the representative of efficient linear models, DLinear is still subpar on ILI, Traffic and Weather, indicating that the nonlinear dynamics underlying the time series pose challenges for model capacity, and that point-wise weighting may not be appropriate to portray time-variant dynamics.
  3. Compared with the painstakingly trained PatchTST with its channel-independence mechanism, our model achieves close and even better performance by naturally addressing the non-stationary properties of real-world time series. On the traffic forecasting task (Traffic), which has a large number of series, Koopa saves up to 96.5% of the training time and needs only 2.9% of its memory footprint.
    As an efficient MLP-based forecaster, Koopa is also capable of learning nonlinear dynamics from time-variant and time-invariant components, and thus achieves better performance.
  4. Dynamics disentanglement: this experiment examines the disentangling effect of the proposed Fourier Filter. Larger deviations occur in the time-variant component, which indicates that the module successfully disentangles the two types of dynamics from the frequency-domain perspective.
  5. Case study: localized operators exhibit changing temporal patterns in different periods, indicating the necessity of varying operators to describe time-variant dynamics.
 

Ablation study

  1. Effective disentanglement: the Truncated Filter variant replaces the Fourier Filter with a High-Low Pass Filter.
  2. Avoiding rigorous reconstruction: the reconstruction branch, which is only utilized during training in previous KAEs, is removed.
  3. From the spectral perspective: as most non-stationary time series experience distribution shift and can be regarded as an unstable evolution, a learned Koopman operator whose eigenvalue moduli are far from the unit circle will cause divergent and even explosive trends in the long term, leading to training failures.
    1. In dynamical systems theory, the modulus (absolute value) of each eigenvalue of the operator governs the amplitude of the corresponding mode as the system evolves. If the moduli are far from the unit circle, the evolution is unstable and training may fail, because long-term predictions can diverge or explode.
      Corresponding solution: utilize the disentanglement and deep residual structure.
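The link between eigenvalue modulus and long-term behaviour can be checked numerically. The sketch below is a generic illustration with hypothetical helper names, not code from the paper; it uses the same row-vector convention as a forward rollout, `z @ K`.

```python
import numpy as np

def spectral_radius(K):
    """Largest eigenvalue modulus of the operator."""
    return float(np.max(np.abs(np.linalg.eigvals(K))))

def rollout_norms(K, z0, n_steps):
    """Norm of the embedding after each of n_steps applications of K."""
    z = np.asarray(z0, dtype=float)
    norms = []
    for _ in range(n_steps):
        z = z @ K
        norms.append(np.linalg.norm(z))
    return np.array(norms)
```

With a spectral radius above 1 the norms grow geometrically with the number of steps, while a radius below 1 makes them shrink towards zero, which is why operators with moduli close to the unit circle are preferable for long rollouts.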
 

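Summary and outlook

Contributions

  • Koopa model: uses Koopman theory to learn and disentangle the dynamics of non-stationary time series, in particular by separating time-variant and time-invariant components, which improves forecasting accuracy and efficiency.
  • Operator adaptation: an operator adaptation method that scales up the forecast horizon of the model, which is especially valuable for long-term time series forecasting.

Limitations

  • Per-variate dynamics are not modeled separately: the model does not model the distinct dynamics of different variates individually, which may limit its effectiveness on multivariate data with complex interactions.
    • Cause: variates in a multivariate time series may have complex interdependencies, and a single model may fail to capture such multi-dimensional dynamics.

Ideas for improvement

  • For the per-variate limitation: extend the model with explicit modeling of inter-variate interactions, for example by introducing a graph network to capture relations between variates, or by using a multi-task learning framework to learn dedicated sub-models for different variates.
    • Rationale: explicitly modeling interactions between variates can help the model forecast more accurately, especially when the variates have complex interdependencies.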
 

Background

Koopman and eDMD

DMD

Dynamic Mode Decomposition
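Assume that the snapshots at two consecutive time steps are related by a linear map. Using SVD and keeping only the leading singular values, the state is first projected to a lower dimension and then lifted back to the original dimension, which lowers the computational cost and gives a recursive expression for the snapshot at each time step. An eigendecomposition (EVD) is then used to build a direct, closed-form expression for the snapshot at any time step.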
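The procedure can be sketched with NumPy as follows. This is the commonly used "exact DMD" formulation rather than anything specific to the Koopa paper; snapshots are stored as columns, and the names `dmd`, `dmd_predict` and the rank parameter `r` are assumptions for illustration.

```python
import numpy as np

def dmd(X, r):
    """Rank-r DMD of snapshots X = [x_0, ..., x_{m-1}] (one snapshot per column).

    Returns the DMD eigenvalues and modes of the linear map x_{k+1} ~ A x_k.
    """
    X1, X2 = X[:, :-1], X[:, 1:]
    U, s, Vh = np.linalg.svd(X1, full_matrices=False)
    U, s, Vh = U[:, :r], s[:r], Vh[:r]            # keep leading singular values
    V = Vh.conj().T
    A_tilde = U.conj().T @ X2 @ V / s             # reduced r x r operator
    eigvals, W = np.linalg.eig(A_tilde)           # EVD of the reduced operator
    modes = (X2 @ V / s) @ W                      # lift back to the full space
    return eigvals, modes

def dmd_predict(eigvals, modes, x0, k):
    """Closed-form k-step prediction: x_k = modes @ diag(eigvals**k) @ b."""
    b = np.linalg.pinv(modes) @ x0                # mode amplitudes of x_0
    return (modes * eigvals ** k) @ b
```

The result of `dmd_predict` is complex in general; for real data the imaginary part should vanish up to round-off, so taking `.real` is the usual last step.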

Koopman Operator

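As a reminder of the standard definition, written here in common textbook notation that may differ from the figures originally shown at this point: for a discrete-time system with state x_t, dynamics F and measurement function g, the Koopman operator \mathcal{K} advances measurements instead of states.

```latex
x_{t+1} = F(x_t), \qquad (\mathcal{K} g)(x_t) = g(F(x_t)) = g(x_{t+1})
```

\mathcal{K} is linear even when F is nonlinear, but it generally acts on an infinite-dimensional space of functions, which is what motivates finite-dimensional approximations such as DMD and eDMD.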

eDMD

Extended Dynamic Mode Decomposition
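A data-driven method for approximating the leading Koopman eigenvalues, eigenfunctions and modes. The Koopman operator is infinite-dimensional, so a DMD-style procedure is used to reduce it to a finite, low-dimensional approximation.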
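A toy example may help here. The sketch below applies eDMD to a small nonlinear system for which a hand-picked dictionary of observables happens to span a Koopman-invariant subspace, so the fitted operator is exactly linear. The system, the dictionary in `lift` and all names are assumptions chosen for illustration; Koopa learns its measurement functions with an encoder instead of fixing them by hand.

```python
import numpy as np

def step(X, a=0.9, b=0.5):
    """Toy nonlinear map: x1' = a*x1, x2' = b*x2 + (a^2 - b)*x1^2. Rows of X are states."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([a * x1, b * x2 + (a ** 2 - b) * x1 ** 2])

def lift(X):
    """Hand-picked dictionary of observables: psi(x) = (x1, x2, x1^2)."""
    return np.column_stack([X[:, 0], X[:, 1], X[:, 0] ** 2])

def edmd(X, Y):
    """Least-squares Koopman approximation K with lift(X) @ K ~ lift(Y)."""
    K, *_ = np.linalg.lstsq(lift(X), lift(Y), rcond=None)
    return K
```

For this system the lifted dynamics are linear in (x1, x2, x1^2), so the eigenvalues of the fitted K are a, b and a^2, even though the original map is nonlinear in x.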

TCN

TCN-based models explore hierarchical temporal patterns and adopt shared convolutional kernels with diverse receptive fields.
#TODO

Koopman

Attention is computed inside each single_forward call.
