Deep Currents 12.12.25

Welcome to the final 2025 installment of Deep Currents, a monthly curated digest of breakthroughs, product updates, and significant stories that have surfaced in the rapidly evolving world of generative AI. My goal with these posts is to help you keep your head above water – hang in there, it's been quite a ride so far!

Not surprisingly, the past month brought another flood of releases across every category. This month I'm trying something new based on reader feedback: leading with the analysis rather than burying it at the end. If you want the full rundown of updates, scroll down to the detailed breakdown. Otherwise, here's what I think matters most from this month's developments...

Reading the Currents

The LLM race is now about differentiation. December saw major releases from Google (Gemini 3), OpenAI (GPT-5.1 and GPT-5.2), Anthropic (Claude Opus 4.5), DeepSeek (V3.2), Mistral (Mistral 3), Baidu (ERNIE 5), and Amazon (Nova 2). When this many capable models ship in a single month, the conversation shifts from "who can build a good LLM" to "what makes yours different."

The positioning strategies are telling. Google is betting on UI generation and agentic features. OpenAI focused GPT-5.1 on making the model "feel more human" and expanded personality customization with presets like Friendly, Efficient, and Quirky, then optimized GPT-5.2 for math, science, and professional tasks. Anthropic emphasized that Opus 4.5 is not only the best coder but also their most "robustly aligned" model, resistant to prompt injection and less likely to exhibit concerning behavior. DeepSeek and Mistral continue pushing the open-weights frontier.

None of these companies are abandoning raw capability, but they're clearly thinking about what will matter once capability becomes table stakes. Personality, safety, openness, and specialized features are the new differentiators.

Google is betting big on Generative UI. Amid all the model announcements, Google's Gemini 3 stands out for a capability that goes beyond the usual benchmarks: Generative UI. The new Dynamic View and Visual Layout features let Gemini design and build interactive, highly visual experiences on the fly from a simple prompt. This isn't generating an image or writing code that someone else executes. It's generating functional interfaces in real time.

Google is already integrating these into Search's AI Mode, and they're pushing even further with Disco, an experimental browser prototype now in limited testing. Disco's GenTabs feature builds entire interfaces on the fly based on your prompts. Instead of navigating to websites, you describe what you want and the browser constructs it. It's a bold bet that the future of the web isn't about better ways to browse existing pages, but about generating experiences on demand. Whether users actually want this remains to be seen, but Google is clearly committed to finding out.
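
To make this concrete, here's a minimal sketch of my own showing what prompt-to-interface generation looks like from the developer's side, assuming the google-genai Python SDK. The model name and prompt are placeholders, and this illustrates the idea rather than how Dynamic View is actually implemented:

    # A minimal prompt-to-interface sketch using the google-genai SDK
    # (pip install google-genai). The model name "gemini-3-pro" is a
    # placeholder; substitute whatever model your account exposes.
    from google import genai

    client = genai.Client()  # reads the API key from the environment

    prompt = (
        "Build a self-contained HTML page with inline CSS and JavaScript: "
        "an interactive tip calculator with a bill input and a percentage slider."
    )

    response = client.models.generate_content(
        model="gemini-3-pro",  # placeholder model name
        contents=prompt,
    )

    # The returned markup is a working interface: write it out and open it.
    with open("generated_ui.html", "w") as f:
        f.write(response.text)

The shift is in the interaction model: a prompt goes in and a functional interface comes out, with no design or build step in between.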

Visual generation just leveled up. The image and video tools released this month represent a genuine quality leap rather than incremental improvement. Google's Nano Banana Pro delivers remarkable prompt accuracy, blends up to 14 images while keeping up to 5 people consistent, and taps into Google Search for factually accurate visuals when creating infographics. Black Forest Labs' FLUX.2 produces 4MP photorealistic output with complex typography. And Bytedance's Seedream 4.5 made a huge leap in photorealism, editing, and object reference consistency.

On video, Runway's Gen-4.5 topped the leaderboards for realism and excels at accurate physics and fine detail, while Kling's O1 became the first multimodal video model, able to understand text, images, video, and visual elements like characters in a single architecture. When video models can handle both creation and editing in the same system, the workflow friction that's kept AI video feeling experimental starts to dissolve. We're approaching the point where these tools become practical for professional work.

Vibe coding is no longer a curiosity. When Google launches an entirely new IDE (Antigravity) and expands a multi-year enterprise partnership with Replit, and Replit ships a Design Mode that generates interactive mockups in under two minutes using Gemini 3, you're looking at a maturing market rather than a novelty. Warp's Agents 3.0 release added full terminal control, spec-driven development, and integrations with Slack and Linear. OpenAI's GPT-5.1-Codex-Max is their first model natively trained to work across multiple context windows. Webflow now lets you build full-stack apps without leaving its platform. And Claude Code is now integrated into Anthropic's desktop app, so you no longer need to work in an IDE.

The competition has shifted from "can AI write code" to "what's the best workflow for AI-assisted development." These tools are no longer targeting hobbyists and curious developers. They're going after enterprise teams and professional workflows. The vibe coding era that started as a meme is now firmly established.

Competitors are collaborating on agent standards. In a surprising move, Anthropic, Block, and OpenAI joined forces to launch the Agentic AI Foundation under Linux Foundation governance. Each company donated their open-source agent projects: Anthropic contributed MCP, Block contributed Goose, and OpenAI contributed AGENTS.md. AWS, Google, Microsoft, Cloudflare, Salesforce, Oracle, and many others have signed on.

This is notable because these companies are fierce competitors in every other dimension. But they've apparently decided that fragmented agent protocols would be worse for everyone than shared standards. It's reminiscent of how browser vendors eventually converged on web standards despite competing aggressively on everything else. Whether this foundation becomes the W3C of agents or fizzles into irrelevance will depend on whether the donated projects actually become the de facto way agents communicate. The fact that it exists at all suggests the industry is taking the interoperability problem seriously.
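
For a sense of what one of these donated standards looks like in practice, here's a minimal MCP tool server sketch using the official Python SDK (the mcp package). The word-count tool is an invented example; the FastMCP pattern comes from the SDK's documented quickstart:

    # A minimal MCP server sketch (pip install "mcp[cli]"). The
    # word_count tool is a made-up example; any MCP-compatible agent
    # client can discover and call it over the protocol.
    from mcp.server.fastmcp import FastMCP

    mcp = FastMCP("demo-tools")

    @mcp.tool()
    def word_count(text: str) -> int:
        """Count the words in a piece of text."""
        return len(text.split())

    if __name__ == "__main__":
        mcp.run()  # serves over stdio by default

AGENTS.md, the OpenAI contribution, is even simpler: a plain markdown file at the root of a repository that tells coding agents how to build, test, and navigate the project, much as a README does for humans.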

The music licensing wars are becoming a negotiated peace. Both Suno and Udio announced deals with Warner Music Group this month. Udio is now on track to become a fully licensed service in 2026. Meanwhile, Suno raised funding at a $2.45 billion valuation.

The approaches differ in interesting ways. Udio seems committed to transitioning entirely to licensed content, while Suno's path remains less clear. But the broader signal is that the adversarial phase between AI music companies and major labels may be ending. Rather than fighting over whether AI-generated music should exist, the conversation has shifted to how artists and labels participate in the value it creates. Whether these deals will work for artists (rather than just labels) is still an open question.

Meta acquired Limitless. Hackers responded within 48 hours. The AI wearable company Limitless sold out to Meta last week. Just two days later, hackers had figured out how to connect the Limitless pendant to Omi, an open-source alternative. It's a small story, but a tidy parable about platform control and the persistence of open alternatives. Every time a company tries to lock down a hardware ecosystem, someone finds a way around it.

We're finally getting real data on how people actually use AI. Anthropic released two studies this month: one examining how their own engineers use Claude for coding, and a broader survey of 1,250 professionals conducted using a new tool they built called Anthropic Interviewer. OpenAI published their first State of Enterprise AI report with insights from their largest customers. Microsoft analyzed 37.5 million Copilot conversations from the past year, revealing behavioral patterns across devices, time periods, and topics.

This kind of research has been notably absent from the AI hype cycle. Companies have been eager to announce capabilities and benchmarks, but actual usage patterns have remained murky. As these tools become workplace fixtures rather than experiments, understanding how people integrate them into real workflows matters more than what they can theoretically do. The fact that major AI companies are now investing in this kind of research suggests they're thinking beyond acquisition and toward retention and depth of use.

Okay, now for the full rundown of the past month's releases across every category. If the list below seems overwhelming, that's because it is.

LLMs

  • Amazon announced the Nova 2 family of models including Lite, Pro, Sonic (voice), and Omni (multimodal).
  • Anthropic released Claude Opus 4.5, claiming it's both the best coding model and the most robustly aligned model they've ever shipped.
  • AI2 released Olmo 3, the first fully open 32B-parameter reasoning model with complete training transparency.
  • Baidu released ERNIE 5, an omni-modal model claiming to beat GPT-5 on document understanding and chart analysis, plus Famou, a "self-evolving" AI agent.
  • DeepSeek released V3.2 and V3.2-Speciale, both open-source reasoning models performing on par with GPT-5 and Gemini 3 Pro.
  • Google unveiled Gemini 3 with deep agentic features and new Generative UI capabilities (Dynamic View and Visual Layout). Gemini 3 Deep Think is now available to AI Ultra subscribers. They also introduced Private AI Compute, a cloud-based processing platform with hardware-secured isolation for enterprise customers.
  • Mistral released Mistral 3, a family of upgraded multimodal open models comparable to DeepSeek and Kimi K2.
  • NotebookLM gained a new Deep Research feature and support for more file types including spreadsheets.
  • Nous Research released Hermes 3, their first model post-trained entirely on the distributed Psyche network, with up to 512K context length.
  • OpenAI released GPT-5.1 (focused on feeling more human), with Instant for everyday use and Thinking for complex reasoning. They also updated Customization with six personality presets and fine-tuning for traits like conciseness and warmth. Then a month later they released GPT-5.2 optimized for math, science, and agentic tasks with a 400,000-token context window. They also rolled out ChatGPT for Teachers, free through June 2027, and added group chats to ChatGPT.
  • Perplexity released persistent memory, enabling the assistant to remember preferences and conversations across sessions.
  • xAI released Grok 4.1 with personality optimizations.

Images

  • Adobe launched Photoshop for ChatGPT (plus Acrobat and Express) as integrated apps for making basic edits directly in ChatGPT threads.
  • Black Forest Labs released FLUX.2 with 4MP photorealistic output, color-matching, complex typography, and multi-reference control.
  • Bytedance released Seedream 4.5 with improvements to editing, object reference, text rendering, and photorealism.
  • Google launched Nano Banana Pro with improved text rendering, the ability to blend up to 14 images while maintaining consistency of up to 5 people, and integration with Google Search for factually accurate visuals.
  • ImagineArt released Imagine 1.5 Preview emphasizing realism and text rendering, plus a node-based Workflows feature.
  • Krea launched Nodes (a node-based workflow tool) and community Blueprints for sharing templates.
  • Midjourney released a new Style Creator (beta) for generating custom Style Reference codes based on your aesthetic preferences.

Video

  • Kling AI released Kling O1, the first multimodal video model that understands text, images, video, and visual elements in a single architecture that can handle both creation and editing. They also released Avatar 2.0 with enhanced expressiveness and the ability to create lip-sync videos up to 5 minutes long.
  • Pika launched Pika 2.5 with ultra-realistic generations, enhanced physics, and improved prompt adherence.
  • Runway released Gen-4.5, topping leaderboards for realism, and added four new Audio Nodes to their Workflow tool: Text to Speech, Text to SFX, Voice Dubbing, and Voice Isolation.

3D and World Models

  • Marble, the first product from Fei-Fei Li's World Labs, came out of private beta, allowing anyone to turn a single image, video, or text prompt into a high-fidelity 3D world you can explore and interact with.
  • Meta announced SAM 3D with two models for creating 3D objects and bodies from photographs.
  • Runway released GWM-1, their first world model built on Gen-4.5. It generates frame by frame, runs in real time, and can be controlled interactively. Three variants: GWM Worlds (explorable environments), GWM Avatars (conversational characters), and GWM Robotics (robotic manipulation).

Wearables

  • Alibaba released the Quark AI Glasses in China, offering three dual-display (S1) and three camera-focused (G1) models powered by Qwen LLMs.
  • Google revealed more details about their upcoming smart glasses. The first pair is designed for screen-free assistance using built-in speakers, microphones, and cameras for chatting with Gemini and taking photos, launching sometime in 2026. The second pair will add an in-lens display, with no official launch window.
  • Limitless sold their souls and their company to Meta. Within 48 hours, hackers had connected the Limitless pendant to the open-source Omi app as users (myself included) exported all their data, deleted their accounts, and requested refunds.

Web Browsers

  • Google began limited testing of Disco, an experimental browser prototype whose GenTabs feature builds entire interfaces on the fly based on your prompts.

Vibe Coding

  • Anthropic made Claude Code available in their desktop app and launched an enhanced Slack integration for initiating Claude Code sessions directly from a conversation thread.
  • Cursor released version 2.2 with a new drag-and-drop visual editor.
  • Google launched Antigravity, a new agentic IDE with features like artifact-level feedback and self-improvement.
  • Google and Replit announced a multi-year partnership expansion focused on bringing vibe coding to enterprise customers on Google infrastructure.
  • Mistral launched Devstral 2, the next-gen version of its coding-focused model family, and Vibe CLI, its first move into autonomous coding agents.
  • OpenAI released GPT-5.1-Codex-Max, claiming faster, more intelligent, and more token-efficient coding, and their first model natively trained to work across multiple context windows.
  • Replit launched Design Mode using Gemini 3 to generate interactive mockups and static sites in under 2 minutes.
  • Webflow launched App Gen for creating full-stack web apps from inside Webflow.
  • Warp launched Agents 3.0 with full terminal use, /plan for spec-driven development, interactive code review, and Slack/Linear integrations.

Agents

  • Anthropic, Block, and OpenAI launched the Agentic AI Foundation under Linux Foundation governance, each donating their open-source agent projects (MCP, Goose, and AGENTS.md respectively). AWS, Google, Microsoft, Cloudflare, Salesforce, Oracle, and many others have signed on.
  • Amazon released three "frontier agents" that can run autonomously for hours or days: Kiro (coding), Security Agent, and DevOps Agent.
  • Google launched Workspace Studio for building Gemini agents to automate Gmail and Docs tasks without coding.
  • Manus released Web Operator, a Chrome extension that automates tasks in sites you're logged into.
  • OpenAGI came out of stealth with Lux, claiming to be the best, fastest, and cheapest computer-use model, along with an SDK for developers.
  • Taskade launched Genesis, a platform for making dashboards, portals, and automation workflows from a single prompt.

eCommerce

  • Instacart is the first company to launch an app offering checkout directly within ChatGPT.
  • OpenAI launched a shopping research feature in ChatGPT that builds personalized shopping guides using clarifying questions, web research, and your conversation history.
  • Shopify launched Agentic Storefronts, letting merchants configure products to appear in ChatGPT, Microsoft Copilot, and Perplexity conversations.

Education

  • Oboe launched a learning platform that turns prompts into structured, multi-modal courses on virtually any subject. They also landed a major investment from a16z.
  • OpenAI launched their first two certification courses: AI Foundations (currently in pilot programs with select employers and public-service partners) and ChatGPT Foundations for Teachers (available on Coursera). Both are free.

AI Usage

  • Anthropic released two studies: one examining how their own engineers use Claude for coding, and a survey of 1,250 professionals conducted with Anthropic Interviewer, a new research tool they built.
  • Microsoft analyzed 37.5 million Copilot conversations from the past year, revealing behavioral patterns across devices, time periods, and topics.
  • OpenAI published their first State of Enterprise AI report with insights from their largest customers.

That's it for 2025 – what a year it's been! We'll see what 2026 brings us. Till then, please reach out if you have questions or thoughts to share, or if you need any help making sense of all this. Wishing you a happy and safe holiday season!

Cover image created with Midjourney.