Starling LM 7B Alpha Surpasses Claude 2, Nears Parity with GPT-4 Turbo
I recently wrote about OpenChat 3.5, the first 7B model to achieve results comparable to ChatGPT!
Starling-7B is a fine-tuned version of OpenChat 3.5, released by researchers from Berkeley EECS. It’s been trained using Reinforcement Learning from AI Feedback (RLAIF) on the latest GPT-4-labeled ranking dataset, berkeley-nest/Nectar (183K chat prompts and 3.8M pairwise comparisons).
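If you want to poke at the model yourself, here’s a minimal sketch using Hugging Face transformers. The model id and the OpenChat-style “GPT4 Correct” prompt template below are taken from the berkeley-nest/Starling-LM-7B-alpha model card as I understand it, so double-check them against the card before relying on this:

```python
# Minimal sketch: loading Starling-7B with transformers (assumes GPU + accelerate installed).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "berkeley-nest/Starling-LM-7B-alpha"  # HF model id, per the model card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Starling keeps OpenChat 3.5's "GPT4 Correct" chat format (assumption based on the model card).
prompt = "GPT4 Correct User: Explain RLAIF in one sentence.<|end_of_turn|>GPT4 Correct Assistant:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)

# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```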
The team also leveraged a new reward-training and policy-tuning pipeline, which is what sets it apart from the rest.
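The paper’s exact pipeline isn’t reproduced here, but as a rough illustration, reward models for RLAIF are typically trained with a pairwise ranking (Bradley–Terry style) loss over higher- vs. lower-ranked responses, which is the kind of signal Nectar’s pairwise comparisons provide. A toy PyTorch sketch:

```python
import torch
import torch.nn.functional as F

def pairwise_ranking_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style loss: push the reward of the response ranked higher
    (by GPT-4, in Nectar's case) above the reward of the lower-ranked one."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy example with scalar rewards that a reward-model head might produce.
chosen = torch.tensor([1.2, 0.3, 2.1])
rejected = torch.tensor([0.4, 0.5, 1.0])
print(pairwise_ranking_loss(chosen, rejected))  # lower when chosen rewards exceed rejected ones
```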
Starling-7B was tested across various benchmarks, including MT-Bench, AlpacaEval, and MMLU, giving a comprehensive view of its capabilities:
MT-Bench and AlpacaEval assess a chatbot’s helpfulness. Starling-7B scores 8.09 on MT-Bench and 91.99 on AlpacaEval, outshining almost every other model to date, except for GPT-4 and GPT-4 Turbo.