Latest Vision, Image and Language Models: Pangea, Ferret, OmniParser, Granite, Pixtral, Aya, SD 3.5
Recently, new model releases have been pouring in from all angles, as if the floodgates of innovation had burst open.
Staying up to date might seem overwhelming, like drinking from a firehose, but rest assured: you’re in the right place.
I’ll guide you through the most groundbreaking developments you won’t want to miss:
- Pangea from CMU
- PUMA
- Ferret-UI from Apple
- OmniParser from Microsoft
- Pixtral 12B Base from Mistral
- Stable Diffusion 3.5 from StabilityAI
- Mochi from GenmoAI
- Granite 3.0 from IBM
- Llama 3.2 1B & 3B from Meta
- Aya Expanse from Cohere
Let’s GO!
I want to personally thank you for taking the time to be here. I truly appreciate it!
If you enjoyed this piece, it would mean a lot if you could follow Agent Issue on Medium, give this article a clap, and drop a quick hello in the comments!
I also share more insights over on X — come join the conversation!
Vision-Language Models
Pangea
Pangea-7B is an open-source multilingual multimodal large language model (MLLM) that…