mamba
2024-4-6

Structured State Spaces (S4)

State Space Models (SSM) mechanics

notion image
notion image
Compared with a linear RNN, which jumps directly to the next state, the SSM is defined through a derivative and is therefore continuous (the step size Δ can be set to control the discretization).
Transformers can train in parallel but must still decode serially at inference; RNNs must consume the sequence token by token during training, but inference is cheap since each step only looks at the previous state. SSMs combine the strengths of RNNs and Transformers.
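To make the continuous form concrete, here is a minimal sketch (scalar state, illustrative values) of discretizing an SSM with zero-order hold and running the resulting recurrence; the step size `delta` is the Δ mentioned above:

```python
import math

# Continuous SSM:  x'(t) = A x(t) + B u(t),  y(t) = C x(t)
# Zero-order-hold discretization with step size delta (scalar state for clarity):
#   A_bar = exp(delta * A)
#   B_bar = (A_bar - 1) / A * B      (scalar form of A^{-1}(exp(delta*A) - I) B)
def discretize(A, B, delta):
    A_bar = math.exp(delta * A)
    B_bar = (A_bar - 1.0) / A * B
    return A_bar, B_bar

def ssm_recurrent(A, B, C, delta, u):
    """Run the discretized recurrence x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k."""
    A_bar, B_bar = discretize(A, B, delta)
    x, ys = 0.0, []
    for uk in u:
        x = A_bar * x + B_bar * uk
        ys.append(C * x)
    return ys

# An impulse input: the output decays smoothly, as the continuous dynamics dictate.
y = ssm_recurrent(A=-0.5, B=1.0, C=2.0, delta=0.1, u=[1.0, 0.0, 0.0, 0.0])
```

With a stable `A < 0`, shrinking `delta` makes consecutive states closer together, which is the "continuity" being contrasted with a plain linear RNN.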
 
notion image
notion image
The key properties are: continuous (by definition), recurrent (it can be rewritten in a recurrent form for computation), and convolutional (it can be computed in parallel as a convolution for acceleration).
notion image
notion image
notion image
One can construct a convolution kernel K (for a fixed sequence length L, K is also fixed) so that y can be computed from u without materializing x. Compared with the RNN-style recurrence, this makes y computable with parallelizable, near-linear computation (the convolution can be evaluated via the FFT, which is fast).
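A minimal sketch of the convolution view (scalar state, toy values): precompute the kernel K and check that convolving it with u reproduces the recurrence exactly.

```python
# Convolution view of a discretized SSM (scalar state for clarity).
# y_k = sum_{j=0}^{k} K[j] * u[k-j]  with kernel
#   K = (C*B_bar, C*A_bar*B_bar, C*A_bar^2*B_bar, ...).
# For a fixed length L the kernel K is fixed, so y can be computed in parallel
# (in practice via FFT) instead of stepping through the recurrence.
A_bar, B_bar, C = 0.9, 0.5, 2.0
u = [1.0, 2.0, 3.0, 4.0]
L = len(u)

K = [C * (A_bar ** j) * B_bar for j in range(L)]            # precomputed kernel
y_conv = [sum(K[j] * u[k - j] for j in range(k + 1)) for k in range(L)]

# Reference: the recurrent form x_k = A_bar x_{k-1} + B_bar u_k, y_k = C x_k
x, y_rec = 0.0, []
for uk in u:
    x = A_bar * x + B_bar * uk
    y_rec.append(C * x)

assert all(abs(a - b) < 1e-9 for a, b in zip(y_conv, y_rec))
```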

Structured State Spaces (S4) for long-term dependencies

S4: a special state space model

SSM + HiPPO + Structured Matrices = S4
SSMs are slow to compute:
notion image
  • The hidden state x has k times the dimension of the input u, and computation runs on x (taking k times more memory)
  • Computing the convolution kernel K is even more expensive
Using a structured state space effectively solves this problem.
notion image

S4 variants and simplifications

HiPPO can be used for online function reconstruction (reconstructing u in real time from the state x), and the HiPPO parameterization also avoids the exploding-gradient problem of vanilla RNNs. The S4D kernel trick, which replaces A with an (approximately) diagonal matrix, can be used to make the kernel cheap to compute.
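A sketch of why a diagonal A helps (the values below are illustrative, not the actual HiPPO/S4D initialization): with a diagonal A_bar, the kernel factorizes elementwise into a Vandermonde-style product, costing O(N·L) scalar operations instead of repeated dense matrix powers.

```python
# S4D-style kernel with diagonal A_bar = diag(lambda_1 .. lambda_N):
#   K[l] = sum_n C_n * lambda_n**l * B_n
# Each term is an independent scalar geometric sequence, so no matrix powers
# are ever formed. (Toy values; complex eigenvalues are allowed, and taking
# the real part is a common simplification.)
lambdas = [0.9, 0.8 + 0.1j]      # diagonal of A_bar
B = [1.0, 0.5]
C = [0.7, 0.3]
L = 6

K = [sum(c * (lam ** l) * b for lam, b, c in zip(lambdas, B, C)).real
     for l in range(L)]
```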
notion image

Modeling signals with S4

notion image
Where S4 beats WaveNet for speech modeling is its inductive bias: it captures continuous signals better and can exploit a large amount of context.
notion image

Selective State Space Model

MAMBA and State Space Models explained | SSM explained
We simply explain and illustrate Mamba, State Space Models (SSMs) and Selective SSMs. SSMs match the performance of Transformers, but are faster and more memory-efficient — crucial for long sequences!
📄 Gu, Albert, and Tri Dao. "Mamba: Linear-time sequence modeling with selective state spaces." arXiv preprint arXiv:2312.00752 (2023). https://arxiv.org/abs/2312.00752
📄 MoE-Mamba: https://arxiv.org/abs/2401.04081
📄 Vision Mamba: https://arxiv.org/abs/2401.09417
📄 MambaByte: https://arxiv.org/abs/2401.13660
📖 Prefix sum (Scan) with CUDA: https://developer.nvidia.com/gpugems/gpugems3/part-vi-gpu-computing/chapter-39-parallel-prefix-sum-scan-cuda
📕 The Annotated S4: https://srush.github.io/annotated-s4/
📘 Mamba The Easy Way: https://jackcook.com/2024/02/23/mamba.html
notion image
Whereas an SSM processes every input x identically, a Selective SSM can take each input's importance into account (i.e., it adds input-dependent control, similar to attention).
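A scalar sketch of the selective idea (the weight `w_delta` and all values are hypothetical): unlike a plain SSM whose discretized (A_bar, B_bar) are fixed, here the step size Δ depends on the input, so each token controls how much of the state is kept versus overwritten.

```python
import math

def softplus(z):
    return math.log1p(math.exp(z))

def selective_step(x, u, A=-1.0, B=1.0, C=1.0, w_delta=1.0):
    """One step of an input-dependent (selective) SSM, scalar state."""
    delta = softplus(w_delta * u)          # step size depends on the input
    A_bar = math.exp(delta * A)            # large delta -> forget more state
    B_bar = (A_bar - 1.0) / A * B          # ...and absorb more of the input
    x = A_bar * x + B_bar * u
    return x, C * x

x, ys = 0.0, []
for u in [2.0, 0.0, 0.0, 2.0]:
    x, y = selective_step(x, u)
    ys.append(y)
```

A salient input (u = 2.0) resets the state toward the new token, while uninformative inputs (u = 0.0) let the state decay gently — the "importance-aware" behavior described above.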
notion image
notion image
 
The SCAN algorithm (all-prefix-sums) is used to handle the input-dependent control, which keeps the recurrence computable in parallel.
However, a pure-PyTorch scan is still slow, so the method below was proposed (Mamba takes hardware-level acceleration into account).
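To see why a scan applies here, note that the linear recurrence h_t = a_t·h_{t-1} + b_t (with input-dependent a_t, b_t) can be expressed as a prefix "sum" under an associative operator; associativity is exactly what permits a log-depth parallel implementation on GPU. A minimal sketch (the scan below runs sequentially, but only uses the associative operator):

```python
# Represent each step t by the pair (a_t, b_t), meaning h = a_t * h_prev + b_t.
# Composing step 1 then step 2 gives another pair — an associative operator:
#   (a1, b1) ∘ (a2, b2) = (a1*a2, a2*b1 + b2)
def combine(e1, e2):
    a1, b1 = e1
    a2, b2 = e2
    return (a1 * a2, a2 * b1 + b2)

def inclusive_scan(elems):
    """All-prefix-'sums' under `combine` (sequential here; associativity is
    what would let a parallel version split the work into a log-depth tree)."""
    out = [elems[0]]
    for e in elems[1:]:
        out.append(combine(out[-1], e))
    return out

a = [0.5, 0.9, 0.1, 0.7]
b = [1.0, 2.0, 3.0, 4.0]
h_scan = [bb for _, bb in inclusive_scan(list(zip(a, b)))]   # h_0 = 0 assumed

# Check against the plain sequential recurrence
h, h_seq = 0.0, []
for at, bt in zip(a, b):
    h = at * h + bt
    h_seq.append(h)
assert all(abs(p - q) < 1e-12 for p, q in zip(h_scan, h_seq))
```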
 
notion image
 

Mamba

notion image
notion image
The block's output is the product of the SSM branch's output (which has fused in information from previous tokens) and the gated current embedding.
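The elementwise product above can be sketched as follows (toy values; SiLU is the gate nonlinearity used in Mamba):

```python
import math

def silu(z):
    """SiLU gate: z * sigmoid(z); zero input means the gate closes."""
    return z / (1.0 + math.exp(-z))

ssm_out = [0.5, -1.0, 2.0]   # output of the selective-SSM branch (mixes past tokens)
gate    = [1.0,  0.0, -2.0]  # linear projection of the current embedding
y = [s * silu(g) for s, g in zip(ssm_out, gate)]
```

Channels where the gate is near zero are suppressed regardless of what the SSM branch carried, which is how the current token modulates the accumulated context.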

Links

MambaStock (zshicode) — updated May 8, 2024
Transformers handle discrete, short data better than S4; as a continuous model, S4 does better on data that is continuous and needs a lot of context.
u is the input, x is the hidden state, y is the output.
This video explains it more clearly:
MAMBA and State Space Models explained | SSM explained
