
Starling LM 7B Alpha Surpasses Claude 2, Nears Parity with GPT-4 Turbo

Agent Issue
9 min read · Dec 5, 2023


I recently wrote about OpenChat 3.5, the first 7B model to achieve results comparable to ChatGPT!

Starling-7B is a fine-tuned version of OpenChat 3.5, released by researchers at Berkeley EECS. It was trained with Reinforcement Learning from AI Feedback (RLAIF) on berkeley-nest/Nectar, a new GPT-4-labeled ranking dataset containing 183K chat prompts and 3.8M pairwise comparisons.
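The pairwise count follows from how ranking data expands into comparisons. Assuming each Nectar prompt comes with 7 responses ranked by GPT-4 (an assumption about the dataset layout that the paragraph above does not state), every ranking implies 7 × 6 / 2 = 21 pairwise preferences, and 183K × 21 ≈ 3.8M, which matches the quoted figure. A minimal sketch; `ranking_to_pairs` is a hypothetical helper, not part of any released Starling code:

```python
from itertools import combinations


def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (preferred, rejected) pairs.

    A ranking of k responses implies k * (k - 1) / 2 pairwise preferences:
    every response is preferred over each response ranked below it.
    """
    # combinations() preserves input order, so the first element of each
    # pair is always the higher-ranked (preferred) response.
    return list(combinations(ranked_responses, 2))


# Hypothetical ranking of 7 responses to a single prompt (best first).
ranking = [f"response_{i}" for i in range(1, 8)]
pairs = ranking_to_pairs(ranking)
print(len(pairs))  # 21

# Scaling up, assuming 7 ranked responses for each of ~183K prompts.
num_prompts = 183_000
print(num_prompts * len(pairs))  # 3843000, i.e. roughly 3.8M
```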


The team also built a new reward-training and policy-tuning pipeline, and this pipeline is what sets Starling-7B apart from other models.
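The article does not spell out the reward-training objective. One standard way to train a reward model directly on full rankings is the Plackett-Luce (K-wise) negative log-likelihood, which reduces to the familiar pairwise Bradley-Terry loss when only two responses are ranked. The sketch below illustrates that technique under this assumption; it is not Starling's actual training code, and plain floats stand in for reward-model outputs:

```python
import math


def plackett_luce_nll(rewards_best_to_worst):
    """Negative log-likelihood of an observed ranking under Plackett-Luce.

    rewards_best_to_worst[i] is the reward-model score of the response
    ranked i-th (0 = best). At each position, the ranked item is treated as
    chosen from all items not yet placed, with softmax probabilities.
    """
    nll = 0.0
    # The last position is skipped: its probability is always 1.
    for i in range(len(rewards_best_to_worst) - 1):
        remaining = rewards_best_to_worst[i:]
        # log-sum-exp over the items still available, shifted by the max
        # for numerical stability
        m = max(remaining)
        log_z = m + math.log(sum(math.exp(r - m) for r in remaining))
        nll -= rewards_best_to_worst[i] - log_z
    return nll


def bradley_terry_nll(r_preferred, r_rejected):
    """Pairwise special case: -log(sigmoid(r_preferred - r_rejected))."""
    d = r_preferred - r_rejected
    # Numerically stable form of log(1 + exp(-d)) for either sign of d.
    if d >= 0:
        return math.log1p(math.exp(-d))
    return -d + math.log1p(math.exp(d))


# A reward model whose scores agree with the ranking gets a lower loss.
print(plackett_luce_nll([2.0, 1.0, 0.0]) < plackett_luce_nll([0.0, 1.0, 2.0]))  # True
```

Training a reward model on whole rankings, rather than on each of the implied pairs independently, is the design choice this loss captures: every ranking contributes one likelihood term that accounts for all of its orderings at once.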

Starling-7B was tested on several benchmarks, including MT-Bench, AlpacaEval, and MMLU, which together give a comprehensive view of its capabilities.

MT-Bench and AlpacaEval assess a chatbot's helpfulness. Starling-7B scores 8.09 on MT-Bench and 91.99 on AlpacaEval, outperforming almost every other model to date except GPT-4 and GPT-4 Turbo.
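As a rough illustration of how these two kinds of numbers are formed: MT-Bench has a GPT-4 judge score each answer on a 1 to 10 scale and reports the average, while AlpacaEval reports a win rate, the percentage of prompts on which an automatic judge prefers the model's answer over a reference model's. The helper functions below are hypothetical, not either benchmark's official tooling, and the judgments are made-up toy values rather than real evaluation data (ties are ignored for simplicity):

```python
def mt_bench_style_score(judge_scores):
    """Average of per-answer judge scores on a 1-10 scale."""
    if not judge_scores:
        raise ValueError("need at least one judge score")
    if not all(1 <= s <= 10 for s in judge_scores):
        raise ValueError("judge scores must lie in [1, 10]")
    return sum(judge_scores) / len(judge_scores)


def win_rate_percent(model_preferred):
    """Percentage of comparisons in which the judge preferred the model's
    answer over the reference answer (True = model preferred)."""
    if not model_preferred:
        raise ValueError("need at least one comparison")
    return 100.0 * sum(model_preferred) / len(model_preferred)


# Toy values only, not real benchmark judgments.
scores = [9, 8, 7, 10, 8]
prefs = [True, True, False, True]
print(mt_bench_style_score(scores))  # 8.4
print(win_rate_percent(prefs))       # 75.0
```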
