Latest Vision, Image and Language Models: Pangea, Ferret, OmniParser, Granite, Pixtral, Aya, SD 3.5
Recently, new model releases have been pouring in from all angles, as if the floodgates of innovation had burst open.
Staying up to date might seem overwhelming, like drinking from a firehose, but rest assured: you’re in the right place.
I’ll guide you through the most groundbreaking developments you won’t want to miss:
- Pangea from CMU
- PUMA
- Ferret-UI from Apple
- OmniParser from Microsoft
- Pixtral 12B Base from Mistral
- Stable Diffusion 3.5 from StabilityAI
- Mochi from GenmoAI
- Granite 3.0 from IBM
- Llama 3.2 1B & 3B from Meta
- Aya Expanse from Cohere
Let’s GO!
I want to personally thank you for taking the time to be here. I truly appreciate it!
If you enjoyed this piece, it would mean a lot if you could follow Agent Issue on Medium, give this article a clap, and drop a quick hello in the comments!
I also share more insights over on X — come join the conversation!
Vision-Language Models
Pangea
Pangea-7B is an open-source multilingual multimodal large language model (MLLM) that…