CascadiaPrime Cognition



CascadiaPrime Cognition - State Space Models (SSMs)

Unlike the brain, today's transformer-based LLMs consume staggering amounts of energy.

Simply put, transformer-based LLMs are not only expensive to run; they are also at odds with sustainability objectives for the planet.

SSMs are models that admit three views: a continuous-time view and, once discretized, both a recurrent and a convolutional view. SSMs can handle very long sequences (large numbers of tokens), generally with fewer parameters than other models (ConvNets or transformers), while remaining very fast. They can be applied to text, vision, audio, and time-series tasks (and even graphs).
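The equivalence of the recurrent and convolutional views can be sketched in a few lines of NumPy. This is a toy illustration with made-up diagonal parameters, not a real S4 or Mamba layer: the recurrent view steps through the sequence one token at a time, while the convolutional view precomputes a kernel and applies a single convolution, and both produce the same outputs.

```python
import numpy as np

# Toy discretized SSM (hypothetical parameters, not a trained layer).
# Recurrent view:      x_k = A x_{k-1} + B u_k ;  y_k = C x_k
# Convolutional view:  y = conv(u, K) with kernel K_j = C A^j B

N, L = 4, 8                              # state size, sequence length
rng = np.random.default_rng(0)
A = np.diag(rng.uniform(0.1, 0.9, N))    # stable diagonal state matrix
B = rng.standard_normal((N, 1))
C = rng.standard_normal((1, N))
u = rng.standard_normal(L)               # input sequence (tokens)

# Recurrent view: O(L) sequential scan, suited to autoregressive decoding
x = np.zeros((N, 1))
y_rec = []
for k in range(L):
    x = A @ x + B * u[k]
    y_rec.append(float(C @ x))

# Convolutional view: precompute kernel K once, then convolve (parallel over L)
K = np.array([float(C @ np.linalg.matrix_power(A, j) @ B) for j in range(L)])
y_conv = np.convolve(u, K)[:L]

print(np.allclose(y_rec, y_conv))  # True: the two views agree
```

This dual nature is what the linked papers exploit: train with the parallel convolutional (or scan) form, then decode with the cheap recurrent form, avoiding the quadratic attention cost of transformers.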

Jensen Huang, NVIDIA: "And I think the work around state-space models, or SSMs, that allow you to learn extremely long patterns and sequences without growing quadratically in computation, probably is the next transformer."


What is a State Space Model?

  Wiki: State Space (computer science)
  Mathworks: What are State-Space Models?
  State space model (SSM) definition and history in various fields
  Wiki: State-space representation
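The state-space representation described in the links above comes from control theory: a system is defined by x'(t) = A x(t) + B u(t) and y(t) = C x(t) + D u(t). A minimal sketch, assuming a toy damped-oscillator system and a simple Euler discretization (deep SSM papers use more careful discretizations such as bilinear or zero-order hold):

```python
import numpy as np

# Classic continuous-time state-space representation:
#   x'(t) = A x(t) + B u(t)
#   y(t)  = C x(t) + D u(t)
# Toy assumed system: a damped harmonic oscillator driven by a force u.

A = np.array([[0.0, 1.0],
              [-1.0, -0.5]])   # position/velocity dynamics with damping
B = np.array([[0.0], [1.0]])   # force enters through the velocity state
C = np.array([[1.0, 0.0]])     # observe position only
D = np.array([[0.0]])

dt = 0.01                      # Euler step: discretization turns the ODE
x = np.zeros((2, 1))           # into a recurrence x <- x + dt*(A x + B u)
ys = []
for _ in range(3000):
    u = np.array([[1.0]])      # constant unit force
    x = x + dt * (A @ x + B @ u)
    ys.append(float(C @ x + D @ u))

print(ys[-1])                  # position settles near 1.0 (the steady state)
```

Discretizing this continuous system is exactly the step that yields the recurrent (and hence convolutional) views used in deep SSM layers.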

State Space Model (SSM)

  The Stanford AI Lab Blog : Can Longer Sequences Help Take the Next Leap in AI? (Chris Ré, Tri Dao, Dan Fu, Karan Goel) (June 9, 2022)
  Could State Space Models kill Large Language Models? (January 18, 2024)
  Hugging Face: Introduction to State Space Models (SSM)
  Structured State Space Models for In-Context Reinforcement Learning (Advances in Neural Information Processing Systems 36, NeurIPS 2023, Main Conference Track)
  A Visual Guide to Mamba and State Space Models: An Alternative to Transformers for Language Modeling (February 2024)

State Space Model Papers (SSM)

  arXiv: Learning World Models With Hierarchical Temporal Abstractions: A Probabilistic Perspective Vaisakh Shaj (April 24, 2024)
  arXiv: Efficiently Modeling Long Sequences with Structured State Spaces Albert Gu, Karan Goel, Christopher Ré (August 5, 2023)
  arXiv: Mamba: Linear-Time Sequence Modeling with Selective State Spaces (December 1, 2023)
  arXiv: Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, Christopher Ré (October 26, 2021)
  arXiv: Convolutional State Space Models for Long-Range Spatiotemporal Modeling Jimmy T.H. Smith, Shalini De Mello, Jan Kautz, Scott W. Linderman, Wonmin Byeon (October 30, 2023)
  Amazon Research: Deep State Space Models for Time Series Forecasting (32nd Conference on Neural Information Processing Systems (NeurIPS 2018))
  arXiv: Long Range Arena: A Benchmark for Efficient Transformers (November 8, 2020)

State Space Model (SSM) Talks (YouTube)

  Efficiently Modeling Long Sequences with Structured State Spaces, Albert Gu, Stanford MedAI
  Mamba: Long Range Arena: A Benchmark for Efficient Transformers
  Mamba STRIKES again Overview