AI Roundup

Deep Currents 09.09.25

Crispin Bailey

09 Sep 2025 • 9 min read

Welcome to the latest installment of Deep Currents, a monthly curated digest of breakthroughs, product updates, and helpful articles that surfaced in the world of generative AI. These are the things that stood out as impactful to a design director in IT trying to stay on top of this rapidly evolving field.

There were dozens of new AI releases and announcements in the past month. Between significant LLM updates, new creative tools, and the continued push toward agentic systems, the pace isn’t slowing down yet. So if the list below becomes overwhelming, feel free to jump to the end where I try to synthesize what all these updates mean.

Okay, let's dive into this month's currents…

LLMs

The core language model space continues its relentless pace of improvement, with several companies pushing the boundaries of context windows, reasoning capabilities, and specialized applications.

OpenAI gave GPT-5 a 'warmer' and 'friendlier' personality upgrade after initial complaints following the recent launch of their new flagship model. They also restored access to the 4o model that many customers had grown attached to, but only for paying subscribers.
Due to popular demand ChatGPT now supports branches in conversations so you can go down multiple rabbit holes simultaneously.
Anthropic’s Claude Sonnet 4 got an updated version with a 1-million token context window — meaning it can now handle the entire Harry Potter series in a single prompt. They also rolled out new memory capabilities in Claude for all Max, Team, and Enterprise users, giving the ability to reference previous chats.
Anthropic’s Claude Opus 4.1 and Sonnet 4 can also now end conversations if they become harmful or abusive toward the model. Developed as part of exploratory work on AI welfare, it has broader relevance to model alignment and safeguards. But it also gets into slightly controversial AI-rights territory.
Google launched temporary chats that expire after 72hrs along with new memory capabilities for Gemini. They also released Gemma 3 270M, an even smaller open-source model that can run directly on smartphones and browsers.
Microsoft launched MAI-1-preview, a text-based model trained on a fraction of the GPUs of rivals, specializing in instruction following and everyday queries.
Cohere dropped a new reasoning model for enterprise applications.
Mistral released Mistral Medium 3.1, showing improvements in overall performance and creative writing. They also expanded Le Chat, their flagship chatbot, with over 20 new enterprise MCP connectors and a "Memories" feature for persistent context.
Tencent released Hunyuan-Vision-Large, a multimodal understanding model that ranks No. 6 in the Vision Arena leaderboard, near GPT-4.5, o4 mini, and Sonnet 4.
DeepSeek's V3.1 model packs 685B parameters, giving it larger working memory for longer, more coherent conversations. The model competes with OpenAI's and Anthropic's flagship offerings while being completely open-source.
Alibaba’s Qwen3 AI models now process up to 1M tokens in a single operation — like Claude, that’s equivalent to analyzing about 1,200-1,500 pages of text simultaneously. The models are open-source and deliver 3x faster processing speeds than previous versions. Qwen3-Max (not open sourced) is a 1Trillion+ parameter model that surpasses other Qwen3 variants, Deepseek V3.1, and Claude Opus 4 (non-reasoning) across several benchmarks.
Moonshot‘s latest Kimi K2 model (K2-Instruct-0905) is a state-of-the-art mixture-of-experts (MoE) language model, featuring 32 billion activated parameters and a total of 1 trillion parameters.
Nous Research released Hermes 4, focusing on hybrid reasoning models with expanded test-time compute capabilities, creativity, and neutrally-aligned performance (meaning, less censorship applied to risky topics).
Apertus launched as Switzerland's new open-source AI model that respects privacy and data rights.

Images

Image generation and editing tools reached new levels of sophistication, with Google taking the lead in several key areas while other players introduced innovative features.

Google's Imagen 4 became generally available, representing their state-of-the-art image generation model.
Google then officially launched Gemini Flash 2.5 Image, codenamed Nano Banana, which is capable of precise, multi-step image editing that preserves character likeness and scene context. It's currently the top-rated image editing tool on LMArena, and by a wide margin.
Alibaba launched Qwen-Image-Edit for sophisticated image manipulation tasks.
Genspark AI Designer turns one prompt into a complete brand system — logo, packaging, website, and all.
Ideogram launched a new Styles feature, offering preset styles plus the ability to create custom styles by uploading up to 3 images, similar to Midjourney's Moodboard feature.
Midjourney launched a Style Explorer feature that allows users to browse and select from an initial selection of top Sref codes (from over 4 billion possible combinations).

Video + Audio

Video generation tools continue to evolve rapidly, with major improvements in audio generation and synchronization, user control, and production quality.

Google launched new features for Google Vids including image-to-video capability powered by Veo 3, transforming photos into eight-second video clips with sound.
Tencent open-sourced HunyuanVideo-Foley, creating professional-grade soundtracks and effects with state-of-the-art audio-visual synchronization.
Pika now has audio-driven lip sync, where you upload any audio file and it makes your AI avatar sing, rap, or act in perfect sync.
Luma Labs launched Modify with Instructions, allowing users to edit generated videos through natural language prompts.
Higgsfield AI launched Draw-to-Video, allowing users to sketch text directions, shapes, and visual instructions on images to create tailored video output. They also added Speak 2.0 for voice generation including multiple voices and stage directions.
Topaz Labs released Astra for video upscaling and frame interpolation, enabling creation of slow-motion video.
Baidu launched MuseStreamer 2.0, a family of image-to-video models with upgrades in multi-character coordination and synced audio outputs.
Wan Video released Wan-S2V, an open AI model that lets you upload an image and audio file to create a video where the person moves and speaks in sync.
ElevenLabs released SFX v2, generating higher-quality sound effects with extended output length (up to 30s) and seamless looping audio, all from text prompts.

Voice + Translation

Voice synthesis and translation tools reached new levels of quality and versatility.

Microsoft released VibeVoice, an open-source text-to-speech model capable of generating up to 90 minutes of multi-speaker conversational audio using just 1.5B parameters.
Microsoft also introduced MAI-Voice-1, marking their first fully in-house voice AI model.
OpenAI officially released Realtime API, introducing a new gpt-realtime speech-to-speech model with SIP (phone system) support and new developer tools.
Cohere introduced Command AI Translate, claiming top scores on key translation benchmarks while allowing for deep customization and secure deployment.
Tencent released Hunyuan Translation Model, an open-source state-of-the-art AI translation system.

Vibe Coding

The convergence of natural language and code generation continues to mature, with tools becoming more sophisticated and accessible to non-technical users.

OpenAI released a cheat-sheet for using GPT-5 for coding and rolled out new features for Codex, including IDE extensions and GitHub code reviews.
Claude Code can now teach you while writing code using two new communication modes: Explanatory mode and Learning mode where Claude shares insights and asks you to contribute strategic pieces of code.
Qoder is Alibaba's new coding platform with an autonomous "Quest Mode" that handles full projects from specs and writes code across multiple files.
xAI released Grok Code Fast 1, a new advanced coding model with very low costs for agentic coding tasks.
WordPress's Telex turns text descriptions into custom WordPress blocks, delivering ready-to-install elements like sliders and galleries without coding.
Replit Agent now works with any framework or language, expanding beyond its previous limitations.
Lovable rolled out Voice Mode, allowing users to build apps via voice commands powered by ElevenLabs' speech-to-text.

Agents + Dev Tools

AI agents and development tools continue to evolve toward more autonomous and capable systems, with several major players introducing new features and capabilities.

Chrome got a new Claude plugin that plugs Claude directly into the browser to interact with content in your tabs. Currently in research preview for Claude Max plan users.
Grammarly released eight new AI agents that act as intelligent writing collaborators.
Google added agentic features to AI Mode, enabling it to book dinner reservations based on preferences — but requires a $250/month Ultra subscription.
Letta lets you build custom AI agents with persistent memory that remember conversations and preferences, creating assistants that improve over time.

3D + World Models

Three-dimensional content generation and world modeling saw significant advances, with several companies pushing toward more realistic and interactive virtual environments.

Microsoft released Copilot 3D, converting images into usable 3D models in a single click for games, animation, and VR/AR applications.
Tripo claims its latest 3D AI model Tripo 3.0 delivers the most accurate, detail-rich geometry yet, generating three-dimensional content from text or image inputs.
World Labs opened early access to Marble, their first product that creates rich, high-fidelity 3D worlds from an image or text prompt.
Mirage 2 generates real-time, playable world engines from text or images.
Skywork AI demonstrated Matrix Game 2.0, where AI creates consistent game worlds on the fly via a generative "world model."
Tencent released HunyuanWorld-Voyager, an open-source "ultra long-range" AI world model that transforms a single photo into an explorable, exportable 3D environment.
Nvidia released Cosmos Reason, a 7-billion-parameter "reasoning" vision language model for physical AI applications and robots.

Wearables + Hardware

AI-powered wearables finally started reaching consumers, marking a potential shift toward more personal and ambient AI experiences.

HTC launched Vive Eagle smart glasses as their entry into the AI wearables market.
Omi finally started shipping the consumer version of its AI pendant.
Friend began shipping their pendant a full year after announcing it.
Alterego came out of stealth mode to announce "the worlds first near-telepathic wearable" that can convert thoughts into speech.

Learning + Education

Educational AI tools and resources expanded significantly, with major companies investing in AI literacy and skill development programs.

Anthropic expanded their AI Fluency program with three new courses: AI Fluency for Educators, AI Fluency for Students, and Teaching AI Fluency. They also extended the Claude 'Learning Mode' feature to regular users (previously only available to Max subscribers).
Google's NotebookLM rolled out new formats for audio overviews: Brief (1-2 minute overviews), Critique (expert reviews), and Debate (thoughtful debates between hosts). Each format offers a different perspective on the source materials added to your project. They've also added several new learning tools including Flash Cards, Quizzes, and a Learning Guide option that encourages participation with open-ended questions.
OpenAI announced a new Jobs Platform and Certification program coming in the next year, with a goal of certifying 10 million Americans by 2030.
Microsoft announced new AI literacy initiatives following the White House's mandate to improve AI literacy in America.

Product Development

New tools are emerging to help teams better define and evaluate products, addressing the growing need for structured approaches to product development and UX research that leverage AI.

ChatPRD helps teams define products before coding with AI-assisted documentation and requirement gathering.
Strella is an AI-moderated all-in-one user research platform that let's you mix and match panel-recruited and self-recruited participants, and run your studies 24/7/365. Technically they launched the platform a year ago, but I only recently had the opportunity to try it out and was very impressed.

What This All Means

September's AI landscape reveals a few themes that will likely shape the technology's trajectory through the rest of 2025.

Context window increases, combined with persistent memory capabilities, offers to significantly improve the capabilities of AI systems for a number of use cases. When an LLM can process the equivalent of multiple books in a single conversation while remembering previous chats, or analyze 1,500 pages simultaneously alongside stored user preferences, we get AI systems that maintain both immediate and long-term context. AI assistants capable of simultaneously remembering your working style, previous projects, and evolving needs moves us toward genuinely personalized AI collaborators that improve over time, potentially transforming everything from creative workflows to strategic decision-making.

The maturation of creative AI tools represents another inflection point. The gap between professional and AI-generated content continues to narrow, with tools like Google's Nano Banana (aka Gemini Flash 2.5 Image) leading editing benchmarks and video generation platforms achieving photorealistic lip-syncing and high-quality audio tracks. These advances suggest we're approaching a world where the distinction between human and AI-generated content may become irrelevant, forcing us to reconsider fundamental questions about creativity, authorship, and value.

Perhaps most significantly, the convergence toward agentic systems signals a shift from AI as a tool to AI as a guided collaborator. We're beginning to see AI systems that can complete multi-step tasks with minimal human oversight, whether taking actions on websites through AI-powered browsers, or building apps and deploying websites through AI-powered coding tools. This evolution suggests that 2025 might be the year AI transitions from being impressive technology to being genuinely useful in everyday workflows.

The recent emphasis on AI literacy and skills development reveals a growing recognition that widespread AI adoption and societal adaptation requires deliberate education and skills development efforts. The expansion of education and certification programs suggests that AI fluency is becoming as fundamental as digital literacy was two decades ago. These initiatives from Anthropic, OpenAI, and Microsoft also demonstrate an industry-wide agenda to support the current administration‘s mandate.

Finally, the emergence of ambient AI experiences marks a significant transition in how we interact with artificial intelligence, and it also raises questions about privacy and social acceptance. Unlike the ill-fated Google Glass which was obvious and weird, today’s AI pendants and glasses appear innocuous while capturing audio and context almost invisibly. As these ambient recording devices proliferate, we’re running headlong into an unplanned social experiment about surveillance, memory, and consent.

Well, that's all for this month! As always, please reach out if you have questions or thoughts to share, or if you need any help making sense of all this.