Deep Currents 10.10.25

Welcome to the latest installment of Deep Currents, a monthly curated digest of breakthroughs, product updates, and significant stories that have surfaced in the rapidly evolving world of generative AI. Hopefully this post will help you keep your head above water.
October brought yet another wave of releases across every category of generative AI. Across the board, tools are getting smarter, faster, and more specialized. But if the list below becomes overwhelming, feel free to jump to the end where I attempt to synthesize what all these updates mean.
Let's dive into this month's currents…
LLMs
Language models continue to evolve at a blistering pace, with companies pushing both performance boundaries and new interaction paradigms.
- Anthropic had a busy month. First they rolled out new productivity features to Claude, including the ability to create and edit Excel sheets, Word documents, PowerPoint slides, and PDFs directly in chat. Then they announced the rollout of memory for Team and Enterprise users, allowing Claude to optionally remember and use previous chats and project conversations. Finally, they released Claude Sonnet 4.5, claiming it to be the best coding model in the world, the best model for building complex agents, and the best model at using computers.
- Baidu launched ERNIE X1.1, a new reasoning model that approaches GPT-5 and Gemini 2.5 Pro on benchmarks while significantly reducing hallucinations.
- Google maintained their rapid release pace: the Gemini app finally lets you share your custom Gems with others; Gemini now supports adding audio files to your chats in the app; and they've updated the 2.5 Flash and 2.5 Flash Lite models, making them faster while using fewer tokens, so they're cheaper to run via API.
- Google also released VaultGemma, the first large, capable language model trained with mathematical privacy guarantees (differential privacy) that prevent it from memorizing or leaking specific information from its training data.
- IBM released Granite 4.0, a powerful open language model you can run on significantly cheaper GPUs using 70% less memory.
- Meta announced it will use AI chatbot conversation data for ad targeting starting December 2025, affecting over 1 billion monthly users who cannot opt out except in the EU, UK, and South Korea.
- OpenAI also had a busy month. They officially rolled out new safety routing and parental controls to ChatGPT following suicide and violence incidents linked to problematic chatbot interactions. They announced Pulse, a new feature in the ChatGPT app that delivers a daily dose of fresh news based on your past queries, calendar events, and interests (initially rolling out to Pro users, with access for Plus users coming later). They added MCP support to ChatGPT in developer mode so you can connect to virtually any MCP server (see the minimal server sketch after this list). And they announced the Apps SDK, a new app integration program that lets users interact with other apps like Spotify, Figma, Canva, Zillow, and Booking.com right in ChatGPT.
- Qwen3-Omni from Alibaba is a multimodal model designed to process diverse inputs including text, images, audio, and video, while delivering real-time streaming responses in both text and natural speech.
- Tencent released Hunyuan-Vision-1.5-Thinking, a new multimodal vision-language model that comes in at No. 3 on LMArena's Vision Arena leaderboard.
- Thinking Machines Lab, the startup launched by former OpenAI CTO Mira Murati, released their first product, Tinker, an API for fine-tuning open-weights language models (currently Llama and Qwen).
- The UAE launched a new open-source model: K2 Think, a 32-billion-parameter reasoning model that claims to outperform much larger models on math and advanced problem-solving benchmarks.
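Since MCP comes up repeatedly this month (ChatGPT's developer mode above, plus several items in the Vibe Coding section below), here's a minimal sketch of what an MCP tool server looks like, using the official MCP Python SDK. The `get_forecast` tool and its canned reply are made-up placeholders, and this sketch runs locally over stdio, whereas ChatGPT's developer mode expects a remotely hosted server (which the SDK also supports):

```python
# A minimal MCP tool server using the official MCP Python SDK
# (pip install mcp). The tool below is a made-up placeholder; a real
# server would wrap an actual API or data source.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("demo-tools")

@mcp.tool()
def get_forecast(city: str) -> str:
    """Return a short (canned) weather summary for a city."""
    return f"Forecast for {city}: sunny, 21°C"

if __name__ == "__main__":
    # Runs over stdio by default; hosting it remotely over HTTP is what
    # clients like ChatGPT's developer mode would need.
    mcp.run()
```

Any MCP-capable client (Claude, ChatGPT in developer mode, most AI coding assistants) can then discover and call `get_forecast` like any other tool.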
Images
Image generation tools reached new levels of sophistication, with particular advances in story generation, realism, and prompt adherence.
- ByteDance introduced Seedream 4.0, a new image generation and editing model that competes with Nano Banana, offering 4K outputs and multimodal capabilities. It can also create up to 9 coherent story images in a single batch, perfect for making sequential comics or storyboards.
- Google rolled out Gemini 2.5 Flash Image (Nano Banana) to more users, with new features including additional aspect ratios and prompt customizations.
- Google Labs launched a new AI-powered mood board product called Mixboard that lets users (in the U.S. only for now) "explore, expand, and refine" ideas.
- Midjourney continued to improve upon and expand its new Style Explorer.
- Tencent open-sourced HunyuanImage 2.1, a new image generation model with high-quality realism, prompt following, and text rendering. Less than a month later, they open-sourced HunyuanImage 3.0, an even newer text-to-image model that the company says compares to the industry's top closed options and that achieved the top rank on the LMArena text-to-image leaderboard.
Video and Audio
Video generation tools continue to evolve rapidly, with major improvements in audio generation, user control, and production quality.
- Alibaba released Wan 2.5 Preview, a multimodal model that can generate images and video with audio, with great prompt adherence.
- ElevenLabs released Studio 3.0 which brings several of their offerings together in a unified audio and video editor for voice, music, sound effects, and captions.
- Google added new capabilities to its Veo 3 and Veo 3 Fast models, including vertical video outputs, 1080p HD resolution, and a 50% decrease in price.
- Higgsfield Animate allows you to swap any actor in a video with the character in an image, or apply the movement from any video to the character and scene from an image.
- Kling AI launched Avatar, letting you animate a character from an input image, audio, and prompted expressions and emotions.
- Meta launched a new AI video social platform called Vibes. Reactions have been mostly negative despite Meta's partnerships with Midjourney and Black Forest Labs to provide higher-quality generated content.
- OpenAI launched Sora 2 and an all-new mobile social app (called Sora) that lets users apply their likeness and voice to videos, and maintain control over who can see them.
- Luma bills Ray3 as the first physics-aware reasoning video AI; it lets you sketch directly on images to control motion and camera work, iterate quickly in Draft Mode, then output HDR video.
- xAI launched v0.9 of its Grok Imagine video model, featuring upgraded quality and motion, native synced audio creation, and new camera effects.
Music
Music generation tools continued to improve with longer output lengths and new vocal capabilities.
- MiniMax released Music 1.5, which can generate complete four-minute songs with natural sounding vocals and instrumentals, and coherent song structures.
- Spotify announced several new measures to address the increasing use of AI in music production: protecting musicians from blatant cloning, and offering listeners a tagging system that indicates the degree of AI use in original songs.
- Stability AI released Stable Audio 2.5, which can now generate commercially safe, enterprise-grade 3-minute songs.
- Udio introduced Voices, which lets users create or select consistent vocals for their music tracks.
Voice and Translation
Voice synthesis reached new levels of versatility with better multi-language support and creative voice manipulation tools.
- Apple's AirPods Pro got a big update powered by Apple's in-house AI tech and now offer Live Translation, which allows users to have conversations across different languages. The new feature works with both AirPods Pro 3 and AirPods Pro 2 (with a firmware upgrade).
- ElevenLabs launched Agent Workflows, a visual tool for building voice conversations that branch in different directions and change behaviour mid-interaction. They also launched Voice Remixing, which lets you change any aspect of a voice (real or generated), such as gender, age, accent, or tone, simply by describing the desired changes in a text prompt.
- Hume launched Octave 2 with support for 30+ languages, claiming it to be 40% faster than v1 and half the price. They also launched EVI 4 mini, which they describe as their smallest and most capable conversational model; it's essentially a version of the Octave 2 speech-language model tuned for real conversation.
Agents
The agent revolution continues to accelerate, with new commercial applications and improved capabilities for autonomous task completion.
- Anthropic released the Claude Agent SDK (the same day they launched Sonnet 4.5), a collection of tools to build powerful agents on top of Claude Code.
- AUI revealed Apollo-1, a new hybrid model for conversational agents that enables "agentic shopping" to do things like reliably book flights and process refunds. This is achieved using neuro-symbolic reasoning, which combines neural networks with logic-based rules.
- Google introduced the Agent Payments Protocol (AP2), a new open framework that enables AI agents to securely make purchases on a user's behalf, with backing from over 60 financial and tech giants.
- Google also released Gemini 2.5 Computer Use in preview, a new API-accessible model that can control web browsers and complete tasks through direct UI interactions like clicking buttons and filling out forms (the sketch after this list illustrates the underlying loop).
- Microsoft enhanced their Researcher agent and Copilot Studio with two of Anthropic's Claude models (Sonnet 4.5 and Opus 4.1); previously, users only had access to OpenAI's models.
- Microsoft also introduced Agent Mode in Excel and Word, along with Office Agent in Copilot, enabling the creation of spreadsheets, docs, and presentations from text prompts.
- Notion Agents can autonomously execute multi-step workflows, create documents and databases, and work for up to 20 minutes across hundreds of pages.
- OpenAI launched a new "Instant Checkout" feature in ChatGPT (U.S. only for now) that allows users to make online purchases (initially from Etsy, and soon from all Shopify stores) using a new open-source commerce API, the Agentic Commerce Protocol, that they co-developed with Stripe.
- OpenAI also launched AgentKit, offering developers a set of tools to build, deploy, and optimize agents, ship chat UIs with ChatKit, and run scenario testing with Evals.
- Perplexity released an Email Assistant for Gmail and Outlook that drafts replies, triages messages, and schedules meetings (Max tier only for now).
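To ground what models like Gemini 2.5 Computer Use are doing under the hood, here's a conceptual sketch of the observe-act loop common to computer-use agents. Everything in it is a hypothetical stand-in (the fake browser and model are placeholders, not any real SDK); the point is the control flow: the model sees the current screen, proposes one UI action, the client executes it, and the new state is fed back until the task is done.

```python
# Conceptual observe-act loop behind computer-use agents. All classes
# here are hypothetical stand-ins; no real SDK is being called.

class FakeBrowser:
    def screenshot(self) -> bytes:
        return b"png-bytes"  # stand-in for a real screen capture

    def execute(self, action: dict) -> None:
        pass  # stand-in for clicking, typing, scrolling, etc.

class FakeModel:
    def propose_action(self, task: str, screenshot: bytes, history: list) -> dict:
        # A real model returns one grounded UI action per turn, e.g.
        # {"type": "click", "x": 412, "y": 88}; this fake finishes immediately.
        return {"type": "done", "result": f"completed: {task}"}

def run_agent(model, browser, task: str, max_steps: int = 20):
    history: list[dict] = []
    for _ in range(max_steps):
        action = model.propose_action(task, browser.screenshot(), history)
        if action["type"] == "done":
            return action["result"]
        browser.execute(action)  # apply the action, then loop with fresh state
        history.append(action)
    raise TimeoutError("agent did not finish within the step budget")

print(run_agent(FakeModel(), FakeBrowser(), "fill out the signup form"))
```

Real implementations add safety checks (like confirmation prompts before purchases or logins) and cap the step budget, but the loop itself really is this simple.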
Vibe Coding
The vibe coding space matured significantly, with tools offering longer autonomous coding sessions and better integration with design workflows.
- Bolt launched Bolt v2, which now includes integrations with Claude Code and Codex, has enterprise-level backend infrastructure (including databases, hosting, authentication, SEO, payments, and storage), and promises fewer error loops.
- Figma Make turns your design mockups into functional prototypes, and the Figma MCP server allows AI coding assistants to access your Figma designs' components, variables, and styling directly, so they can generate code that matches your design system rather than just working from screenshots (see the client sketch after this list).
- Gemini Canvas now lets you edit any part of your web app by clicking it and describing the changes you want.
- GitHub launched a public directory of open MCP servers.
- Google launched Jules Tools, a new command-line interface and public API for its autonomous coding agent, allowing developers to trigger tasks and monitor progress from terminals rather than switching to separate browser windows.
- Google also released a new Chrome DevTools MCP that connects your preferred AI coding assistant to Chrome's browser tools so it can debug your website, check performance, and fix issues by actually seeing what happens when code runs.
- GPT-5 Codex is OpenAI's latest update to its coding agent, with two major improvements: dynamic thinking time and seamless hand-offs between local and cloud environments. Additionally, they've added a Slack integration (so you can tag it and issue requests directly from a Slack chat), and Codex is now available on all paid plans.
- Lovable rolled out a few big new features. First, they added File Uploads, so you can do things like upload a spreadsheet and turn it into a dashboard, or upload a resume and turn it into a portfolio website. They also dropped Lovable Cloud and Lovable AI. With Lovable Cloud, you can add a backend to your app automatically, meaning it will set up databases, logins, file storage, and security. Lovable AI adds support for AI functionality right in your apps, so you can have a chatbot, AI summaries, translation, etc. powered by Gemini, with the billing and setup handled by Lovable.
- Replit released v3 of their Agent. Key upgrades: 1) It can work autonomously for up to 200 minutes. 2) It periodically tests your apps in the browser, clicking buttons, trying to log in, and so on. 3) Still in beta, but the Agent can now build other agents and automations (powered by Mastra), not just web apps.
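Both the Figma MCP server and the Chrome DevTools MCP above plug into assistants through the same client handshake. Here's a sketch using the official MCP Python SDK to launch a server and list its tools; the `npx chrome-devtools-mcp@latest` launch command reflects the project's README at the time of writing, so treat it as an assumption and check the repo for current usage:

```python
# Connect to an MCP server and enumerate its tools, using the official
# MCP Python SDK (pip install mcp). Node.js is assumed for the npx call.
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the Chrome DevTools MCP server as a subprocess over stdio.
    params = StdioServerParameters(
        command="npx",
        args=["chrome-devtools-mcp@latest"],
    )
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()          # MCP handshake
            tools = await session.list_tools()  # discover available tools
            for tool in tools.tools:
                print(f"{tool.name}: {tool.description}")

asyncio.run(main())
```

Swap in the Figma server's connection details and the same handshake gives a coding assistant direct access to your components, variables, and styles.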
Web Browsers
AI-native browsers continued to proliferate, offering new ways to interact with the web through AI.
- Genspark launched a privacy-focused AI-powered web browser that lets you download and run various open-source LLMs directly on your desktop device.
- Google began rolling out Gemini in Chrome to U.S. desktop users with AI features including automated task handling, multi-tab analysis, and enhanced scam protection.
- Opera launched Neon, a new AI-powered browser that can take agentic actions on a user's behalf and includes some interesting integrations, including the ability to generate videos using Veo 3 or Sora 2. Initially limited to a small rollout via waitlist, it requires a US$20/month premium subscription.
- Perplexity's Comet web browser and the Dia browser (whose maker, The Browser Company, was recently acquired by Atlassian) are both now generally available for anyone to use; both previously required an invite code.
Wearables and Human Augmentation
The wearables space saw another competitor enter the always-on AI assistant market.
- Taya, founded by two ex-Apple engineers, looks to take on Limitless and Omi with a wearable AI pendant that looks more like jewellery than a tech gadget.
Learning & Education
Educational AI tools expanded with new formats for personalized learning experiences.
- Oboe is a brand-new AI-powered learning platform that lets you create custom learning courses on any topic in seconds through a simple prompt, offering nine formats including text, audio, games, and quizzes.
- Google launched a collection of educational games called AI Quests that teach students AI literacy through hands-on challenges and tasks.
Policy & Law
California took significant steps toward AI regulation and infrastructure development.
- CA Governor Gavin Newsom signed SB 53 into law, which launches CalCompute (a public cloud to spur AI innovation), requires large AI developers to publicly disclose safety plans and report incidents, and adds strong whistleblower protections for lab employees.
What This All Means
October's developments reveal several themes that will likely shape the next phase of AI evolution.
The explosion of agentic capabilities across platforms marks a fundamental shift from AI as a passive tool to AI as an active participant in completing complex tasks. From shopping agents that can book flights and process refunds, to coding agents that work autonomously for hours, to payment protocols that enable AI to make purchases on your behalf, we're seeing the infrastructure for AI autonomy rapidly coming together.
The proliferation of AI-native browsers reveals an industry still figuring out what comes after the traditional browser paradigm. Companies like Opera with Neon, Perplexity with Comet, and Google with Gemini in Chrome are all racing to add AI capabilities, but taking the safe evolutionary path of adding AI sidebars and assistants to help manage tabs and windows rather than questioning whether that interface makes sense in the first place. Meanwhile, OpenAI's approach of making ChatGPT better at accessing and interacting with web information and other apps—bypassing the browser interface entirely—represents a more revolutionary direction.
The maturation of multimodal capabilities reveals something interesting about the trajectory of AI development. When an image model can generate coherent 9-image story sequences, when video models can produce physics-aware content with synced audio, and when voice models can handle 30+ languages with natural-sounding output, we're approaching a point where the modality itself becomes less important than what you want to create. The lines between text, image, video, and audio are blurring into a unified creative surface.
The vibe coding space hitting multi-hour autonomous sessions (Replit's Agent can now run unattended for up to 200 minutes) represents more than just a technical milestone. It suggests we're approaching a threshold where AI can handle entire development workflows end-to-end. When you combine this with tools that can directly access design systems, test in browsers automatically, and deploy backends, you're looking at a future where software development becomes much more accessible to people who have great ideas but don't know how to code.
The tension between AI capabilities and AI safety grows more visible each month. OpenAI's parental controls, Spotify's AI tagging system, California's new regulatory framework, and Meta's controversial data usage policy all reveal an industry still figuring out the boundaries between innovation and responsibility.
Finally, the sheer volume of releases this month (across LLMs, image, video, audio, and music generators, agents, and more) reveals that we're still in an explosive growth phase of AI development. Companies are shipping faster than ever, and the pace shows no signs of slowing down. For anyone trying to keep up, this is both exhilarating and exhausting. The tools are here, and they're getting better every month, but the challenge is figuring out how they can transform our work while simultaneously adapting to the relentless pace of change.
Well, that's enough for this month! As always, please reach out if you have questions or thoughts to share, or if you need any help making sense of all this.
Cover image created with Midjourney.