Exploring Jamba 1.5: Innovative Hybrid Transformers for AI
Chapter 1: Introduction to Jamba 1.5
AI21 Labs has unveiled the latest version of Jamba, known as Jamba 1.5. This update introduces two new hybrid models that combine the strengths of transformer layers with state-space Mamba layers and a mixture-of-experts (MoE) approach.
The new Jamba 1.5 models come in two variants: the Mini model, which operates with 12 billion active parameters out of a total of 52 billion, and the Large model, featuring 94 billion active parameters from a total of 398 billion. The licensing for Jamba 1.5 is custom, permitting commercial use as long as the company's annual revenue does not exceed $50 million.
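The gap between "active" and "total" parameters comes from the MoE design: a gating network picks a few experts per token, so only that subset of weights runs. The toy router below is a minimal sketch of top-k MoE routing in general (the dimensions, expert count, and top-k value are illustrative, not Jamba's actual configuration):

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Route each token to its top-k experts; only those experts run.

    x: (tokens, d_model), gate_w: (d_model, n_experts),
    experts: list of (d_model, d_model) toy expert weight matrices.
    """
    logits = x @ gate_w                              # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]    # top-k expert ids per token
    sel = np.take_along_axis(logits, top, axis=-1)
    w = np.exp(sel - sel.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax over selected experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for k in range(top_k):
            out[t] += w[t, k] * (x[t] @ experts[top[t, k]])
    return out

rng = np.random.default_rng(0)
d, n_exp, tokens = 8, 16, 4
x = rng.standard_normal((tokens, d))
y = moe_forward(x, rng.standard_normal((d, n_exp)),
                [rng.standard_normal((d, d)) for _ in range(n_exp)])
print(y.shape)  # (4, 8)
```

With top_k=2 out of 16 experts, each token touches only 1/8 of the expert weights per forward pass, which is exactly how a 398B-parameter model can run with 94B active parameters.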
Section 1.1: Long-Context Processing Capabilities
Jamba 1.5 has been specifically designed for long-context retrieval-augmented generation (RAG) tasks, with the ability to support contexts as large as 256,000 tokens. This makes it exceptionally well-suited for scenarios that demand comprehensive contextual comprehension.
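In a long-context RAG pipeline, the practical question is how many retrieved passages fit inside that 256K window. The sketch below greedily packs ranked chunks against a token budget; the word-count `count` function is a crude stand-in for the model's real tokenizer:

```python
def pack_context(chunks, budget_tokens=256_000, count=lambda s: len(s.split())):
    """Greedily pack relevance-ranked chunks into a fixed context budget.

    `count` is a word-count proxy; in practice you would substitute the
    model's actual tokenizer here.
    """
    packed, used = [], 0
    for text in chunks:          # assumes chunks are already ranked by relevance
        n = count(text)
        if used + n > budget_tokens:
            break                # stop at the first chunk that would overflow
        packed.append(text)
        used += n
    return packed, used

docs = ["alpha " * 100_000, "beta " * 100_000, "gamma " * 100_000]
kept, used = pack_context(docs)
print(len(kept), used)  # 2 200000 -- the third 100k-token chunk would overflow
```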
Subsection 1.1.1: Multilingual Support
In addition to its impressive context handling, Jamba 1.5 supports nine languages: English, Spanish, French, Portuguese, Italian, Dutch, German, Arabic, and Hebrew.
Section 1.2: Performance and Efficiency
The model is tailored for function calling and structured outputs, particularly in the JSON format. Notably, a new quantization technique named ExpertsInt8 has been incorporated, enhancing its efficiency. Performance-wise, Jamba 1.5 boasts inference speeds that are up to 2.5 times faster for long-context scenarios compared to traditional decoder-only models.
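The core idea behind weight-only int8 schemes like ExpertsInt8 is to store large expert weight matrices as int8 plus a per-channel scale, roughly quartering memory versus FP32 (or halving it versus FP16). The sketch below shows that general quantize/dequantize round trip; the actual ExpertsInt8 kernels in vLLM differ in their low-level details:

```python
import numpy as np

def quantize_int8(w):
    """Per-output-channel symmetric int8 quantization of a weight matrix."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per row
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)  # a toy expert weight
q, s = quantize_int8(w)
err = np.abs(dequantize(q, s) - w).max()
print(q.dtype, err)  # int8, small reconstruction error (at most half a quant step)
```

Storage drops from 4 bytes to 1 byte per weight (plus one float per row), while the dequantized matrix stays close enough to the original for inference.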
Chapter 2: Practical Applications and Comparisons
For example, the Mini version can manage sequences of up to 140,000 tokens using a single A100 80GB GPU, a feat that standard transformer models struggle to achieve, even with KV cache quantization.
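The reason a hybrid model fits such long sequences on one GPU is the KV cache: only attention layers keep per-token keys and values, while Mamba layers carry a fixed-size state regardless of sequence length. The back-of-the-envelope calculation below uses illustrative layer counts (not Jamba's actual configuration) to show how shrinking the number of attention layers shrinks the cache:

```python
def kv_cache_gib(n_attn_layers, n_kv_heads, head_dim, seq_len, dtype_bytes=2):
    """GiB for K and V caches: 2 * layers * kv_heads * head_dim * seq_len * bytes."""
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * dtype_bytes / 2**30

seq = 140_000
# Hypothetical pure-transformer config (illustrative numbers only)
dense = kv_cache_gib(n_attn_layers=32, n_kv_heads=8, head_dim=128, seq_len=seq)
# Hybrid layout: only a fraction of the layers are attention layers,
# so only those keep a KV cache that grows with sequence length.
hybrid = kv_cache_gib(n_attn_layers=4, n_kv_heads=8, head_dim=128, seq_len=seq)
print(round(dense, 1), round(hybrid, 1))  # 17.1 2.1
```

With an 8x reduction in attention layers, cache memory drops 8x, leaving far more of the 80GB for the model weights themselves.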
The video titled "Hybrid SSM-Transformer Models - Jamba 1.5 is out" provides an in-depth look at the capabilities and applications of Jamba 1.5.
When it comes to accuracy, the Mini version demonstrates performance comparable to Gemma 2 9B, while the Large model aligns closely with Llama 3.1 70B. Due to the substantial size of these models, they are best utilized for applications requiring extensive long-context processing.
For further insights, I previously reviewed the initial release of Jamba in an article for The Salt:
source: The Jamba 1.5 Open Model Family: The Most Powerful and Efficient Long Context Models.