Latest Vision, Image and Language Models: Pangea, Ferret, OmniParser, Granite, Pixtral, Aya, SD 3.5

Agent Issue
17 min read · Oct 29, 2024

Recently, new model releases have arrived from every direction, as if the floodgates of innovation had burst open.

Staying up to date might feel like drinking from a firehose, but rest assured, you’re in the right place.

I’ll guide you through the most groundbreaking developments you won’t want to miss:

  • Pangea from CMU
  • PUMA
  • Ferret-UI from Apple
  • OmniParser from Microsoft
  • Pixtral 12B Base from Mistral
  • Stable Diffusion 3.5 from StabilityAI
  • Mochi from GenmoAI
  • Granite 3.0 from IBM
  • Llama 3.2 1B & 3B from Meta
  • Aya Expanse from Cohere

Let’s GO!

I want to personally thank you for taking your time to be here — I truly appreciate it!

If you enjoyed this piece, it would mean a lot if you could follow Agent Issue on Medium, give this article a clap, and drop a quick hello in the comments!

I also share more insights over on X — come join the conversation!

Vision-Language Models

Pangea

Pangea-7B is an open-source multilingual multimodal large language model (MLLM) that…
