Yarn Mistral 7B 128K: GGUF Model with LangChain and CTransformers
Currently, the best 7B LLM on the market is Mistral 7B v0.1, which ships with an 8K context window. Yarn Mistral 7B 128K, released by NousResearch, extends the base Mistral-7B-v0.1 to a 128K context window.
We know that the context window is crucial to an LLM's performance, but until now, extending it has typically come at the cost of degraded performance.
In this article, I will walk you through:
- A brief explanation of YaRN (Yet another RoPE extensioN method)
- Running the GGUF-format “Yarn Mistral 7B 128K” model, which requires only 8GB of VRAM
- Running the Yarn Mistral 7B 128K GGUF model with LangChain and CTransformers on both short and long contexts
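As a preview of the last two bullets, here is a minimal sketch of loading a GGUF build of the model through LangChain's CTransformers wrapper. The Hugging Face repo id, quantized file name, and config values are assumptions on my part; check the model card for the exact file you want to download.

```python
# ctransformers config: context_length sets the usable context window.
CONFIG = {
    "context_length": 4096,   # raise toward 128K only if you have the memory
    "max_new_tokens": 256,
    "temperature": 0.7,
}

def load_llm(model_file: str = "yarn-mistral-7b-128k.Q4_K_M.gguf"):
    """Build the LLM; the GGUF file is downloaded on first use."""
    # Imported lazily so this sketch parses without LangChain installed.
    from langchain_community.llms import CTransformers

    return CTransformers(
        model="TheBloke/Yarn-Mistral-7B-128K-GGUF",  # assumed HF repo id
        model_file=model_file,
        model_type="mistral",
        config=CONFIG,
    )
```

Calling `load_llm()` returns a standard LangChain LLM that you can invoke directly or drop into a chain, as we will do later in the article.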
Let’s get started!
Introducing YaRN — Yet another RoPE extensioN
YaRN is a compute-efficient method for extending the context window of RoPE-based models: it requires 10 times fewer tokens and 2.5 times fewer training steps than previous extension methods.
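To make the RoPE-scaling idea concrete, here is a toy sketch of how interpolation methods rescale rotary frequencies. The function names are mine, for illustration only; YaRN itself combines per-dimension interpolation with an attention-scaling term that this sketch omits.

```python
def rope_frequencies(dim: int = 8, base: float = 10000.0, scale: float = 1.0):
    """Per-pair rotary frequencies. Linear position interpolation divides
    every frequency by `scale`, stretching the context window by `scale`x."""
    return [base ** (-2 * i / dim) / scale for i in range(dim // 2)]

def ntk_base(base: float, scale: float, dim: int) -> float:
    """The "NTK-aware" trick instead raises the RoPE base, so low
    frequencies stretch more than high ones: b' = b * s^(d/(d-2))."""
    return base * scale ** (dim / (dim - 2))
```

Linear interpolation stretches all frequency bands equally, which hurts the high-frequency components that encode fine-grained positional detail; the NTK-aware variant stretches them unevenly to mitigate that.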
Previous methods, such as the “NTK-aware” (used in CodeLlama) and “Dynamic NTK” (used in Qwen 7B) interpolations, made…