Deep Currents 08.08.25

Welcome to the latest installment of Deep Currents, a monthly curated digest of the breakthroughs, product updates, and helpful articles that have surfaced in the fast-moving world of generative AI.

These are the things that stood out to me as impactful, as a design director in IT trying to stay on top of this rapidly evolving field. Hopefully this post will help you keep your head above water too.

July started out slowly, but things really heated up during the first week of August. Let's dive into this month's currents…

LLMs

The big model makers keep raising the bar, and the past month has seen intense jockeying for the lead:

  • xAI released Grok 4, which for a brief period was the smartest reasoning model based on Artificial Analysis benchmarks.
  • Mistral rolled out major updates to its Le Chat platform, including Deep Research, Voice Mode, multilingual reasoning, Projects, and new image editing capabilities.
  • Alibaba's Qwen team took the open-source crown with the release of an updated Qwen3 model that beats Kimi K2 across the board and challenges top closed-source models like Anthropic's Claude Opus 4. They also launched an update to Qwen3-Thinking, making it competitive with Gemini 2.5 Pro, OpenAI's o4-mini, and DeepSeek R1 across knowledge, reasoning, and coding benchmarks.
  • Cohere launched a new vision model that excels at extracting data from documents like slides, diagrams and PDFs.
  • Google released Gemini 2.5 Deep Think, its first publicly available multi-agent model, which does “parallel thinking” to help researchers, scientists, and academics tackle complex problems. Gemini 2.5 Deep Think is a variant of the model that achieved gold-medal standard at this year’s International Math Olympiad, and it's available exclusively to users on the $250/month Ultra plan.
  • Anthropic launched an update to its most powerful model. Claude Opus 4.1 has improved performance and precision for real-world coding and agentic tasks, and handles complex, multi-step problems with more rigour and attention to detail.
  • OpenAI had a busy week. After months of promises, they released not one but two open-weight language models under the Apache 2.0 license; the smaller of the two is comparable to o3-mini yet can run on a Mac laptop (see the sketch after this list for one way to try it locally).
  • And finally, to cap off a week of big announcements, OpenAI released GPT-5, which, besides being better at just about everything, aims to simplify the user experience by reducing the number of models to choose from. There’s now a single flagship "GPT-5" model that will work for most use cases; Plus and Team plans also get access to GPT-5 Thinking (which replaces o3), and users on the Pro plan get access to GPT-5 Pro (which replaces o3 Pro). They also added a few nice touches to make ChatGPT more personalized, including customizable color schemes, a choice of chat personalities (“cynic,” “robot,” “listener,” and “nerd”), and the ability to connect Gmail and Google Calendar for even more personalized responses. Overall this makes for a pretty big overhaul of the world's most popular chatbot, and represents another leap toward their goal of achieving AGI (artificial general intelligence).
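If you want to kick the tires on the smaller open-weight model yourself, here's a minimal sketch using the Ollama Python client. Note that the model tag is an assumption on my part, so check Ollama's registry for the actual name before running this:

```python
# Minimal sketch: chat with OpenAI's smaller open-weight model locally.
# Assumes Ollama is installed and that the model is published in its
# registry under the tag below (my assumption -- verify first):
#   ollama pull gpt-oss:20b
import ollama  # pip install ollama

response = ollama.chat(
    model="gpt-oss:20b",  # assumed tag
    messages=[
        {"role": "user", "content": "Explain open weights vs open source in two sentences."},
    ],
)
print(response["message"]["content"])
```

On an Apple Silicon Mac with enough RAM, something like this runs entirely offline, which is the whole appeal of an open-weight release.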

Learning Tools

As concerns grow about AI tools potentially undermining educational outcomes by simply providing answers, major AI companies are now building AI that actively supports the learning process rather than replacing it:

  • OpenAI released Study Mode for ChatGPT, a new feature designed to guide students through problems step by step, using Socratic questioning and feedback instead of just providing solutions. (You can approximate the pattern yourself with a system prompt; see the sketch after this list.)
  • Google just integrated its LearnLM models, which were fine-tuned for learning and grounded in educational research, into Gemini to power a new "Guided Learning" feature.
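Neither company has published its implementation, but the core pattern is easy to sketch: a system prompt that forbids direct answers and forces guiding questions. Here's a minimal, hypothetical illustration using the OpenAI Python client (my own toy version, not how Study Mode or Guided Learning actually work):

```python
# Toy sketch of Socratic-style tutoring via a system prompt.
# My own illustration -- not OpenAI's or Google's implementation.
from openai import OpenAI  # pip install openai

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SOCRATIC_PROMPT = (
    "You are a patient tutor. Never state the final answer. "
    "Ask one guiding question at a time, check the student's reasoning, "
    "and offer a small hint only when they are stuck."
)

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable chat model works for this sketch
    messages=[
        {"role": "system", "content": SOCRATIC_PROMPT},
        {"role": "user", "content": "Why does 0.1 + 0.2 != 0.3 in floating point?"},
    ],
)
print(reply.choices[0].message.content)
```

The real products layer pedagogy research and fine-tuning on top of this, but the prompt-level version is a useful mental model for what "learning mode" means.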

Images

Image generation tools continue to evolve, with particular advances in style control and reference consistency:

  • Black Forest Labs launched Kontext Komposer, which lets you restyle images without writing prompts by using polished presets. They also partnered with Krea to release FLUX.1 Krea, an open-weight image model with improved photorealism and aesthetic quality that ranks higher than previous open-weight FLUX models and approaches FLUX Pro quality.
  • Adobe unveiled new AI features for Photoshop, including Harmonize for realistic blending, Generative Upscale, and more.
  • Ideogram released Character, a character-consistency model that lets users place a specific person into existing scenes and generate new outputs of them from a single reference photo.
  • Leonardo launched a new model called Lucid Origin that promises more vibrancy, greater diversity, and stunning Full HD output.
  • Stability AI launched Stability AI Solutions, a suite of generative AI solutions for enterprise clients with creative production needs including product photography, brand style, product concepting, and "digital twins" for consistent character generation.
  • Grok's Imagine image and video generation tool is now available to all X Premium subscribers via the Grok mobile app.

Video Generation

The AI video space continues to get more crowded and more capable, with several new startups making waves and the incumbents adding impressive new features:

  • Moonvalley, a newcomer in the AI video scene, launched a platform that offers a fully-licensed commercially safe option for video generation.
  • Google Flow users can now animate input images and make them talk using Veo 3, and access has also expanded to 140+ countries.
  • OpenArt Story turns your idea, script, or character into a complete ready-to-post video with motion, music, and narrative arc.
  • Runway launched Act-Two, the company's next-gen motion capture model that translates single performance videos into fully animated characters with head, face, body, and hand tracking across artistic styles and outputs. They also unveiled Aleph, a new "in-context" video model that edits and transforms existing footage through text prompts — handling tasks from generating new camera angles to removing objects and adjusting lighting.
  • Lightricks released an update to its open-weights LTXV model, now allowing for image-to-video generations over 60 seconds long — streamed in real time, with live prompt control and efficient performance on consumer GPUs.
  • Mirage bills itself as the world's first "world transformation model," with AI generating video over anything in real time.
  • YouTube Shorts launched new AI-powered creation tools for Shorts including a photo-to-video feature and generative effects, powered by Veo 2 technology.
  • Midjourney enhanced their new video generator with looping and the ability to specify start and end frames, and also released a 720p HD mode which is only available on Pro and Mega plans due to the high cost to run it (roughly 3x more per second compared to SD video).
  • And just for fun Midjourney launched MidjourneyTV where you can watch an endless stream of top video clips from the Midjourney community. There's also a YouTube stream—just be sure to set it to 1080p for the best quality: youtube.com/live/Jkbmn13F2hA.

Voice + Audio

Voice technology saw significant advances this month, with improvements in both quality and functionality:

  • Mistral released Voxtral, its first open-source audio model suite. Voxtral Small and Mini are designed for transcribing and summarizing speech as well as translating audio across multiple languages. At 24B parameters, Small is best for production-scale apps, while its 3B sibling is efficient enough to run locally (a rough local-usage sketch follows this list). Both match or beat comparable models from ElevenLabs and OpenAI while coming in at half the price.
  • Hume launched Voice Cloning. If you're on the Creator Plan for TTS or above, you can access Voice Cloning for personal content creation or development purposes, and speak with their state-of-the-art speech-to-speech model, Empathic Voice Interface (EVI) 3, at no additional cost. It supports English, Spanish, and German to start, with Portuguese, Japanese, French, and more releasing soon.
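Since the Voxtral weights are open, local transcription should be possible with standard tooling. The sketch below leans on assumptions (the Hugging Face repo id and support in transformers' speech-recognition pipeline are my guesses; check the official model card for the real usage):

```python
# Rough sketch: local transcription with the 3B Voxtral Mini model.
# Assumptions: the checkpoint is on Hugging Face under the repo id
# below and works with transformers' ASR pipeline -- verify both
# against the official model card before relying on this.
from transformers import pipeline  # pip install transformers

asr = pipeline(
    "automatic-speech-recognition",
    model="mistralai/Voxtral-Mini-3B-2507",  # assumed repo id
)

result = asr("meeting_recording.wav")  # path to any local audio file
print(result["text"])
```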

Music

AI-generated music continues to evolve with new features and business models:

  • Suno released v4.5+, a new audio generation model with new song creation features including vocal swaps, playlist inspiration, and more.
  • Udio launched updates to its Styles feature for song generation, with new Blending, Library, and Artist Styles coming alongside expanded access for Basic and free subscribers.
  • ElevenLabs launched a new service called Eleven Music that lets individuals and businesses generate their own music with its artificial intelligence model. A forthcoming "Pro" model will be trained on music from Merlin (a digital licensing organization representing 30,000 independent labels and distributors), Kobalt (an independent publisher), and other music writers who opt in.

World Models

This emerging category shows promise for real-time AI-generated worlds:

  • Google announced Genie 3, DeepMind's latest world model that generates interactive 3D environments from text prompts at 24fps in 720p, maintaining visual consistency for several minutes.

Vibe Coding

The AI-powered development space saw major advances in natural language programming:

  • Deepgram released Saga, a Voice OS that lets developers turn spoken ideas into code without having to leave their workflow.
  • Kiro is Amazon's new Claude-powered "agentic IDE" for vibe coding. It addresses the quality problems of AI-generated code by producing detailed specifications, user stories, and automated checks before generating the code itself, so you can take apps from idea to production without missing requirements or breaking tests.
  • Qwen3-Coder builds complete web apps and simulations from your descriptions, handles massive codebases (1M-token context), and now includes a new Qwen Code CLI for command-line coding assistance. (See the sketch after this list for one way to call the model programmatically.)
  • GitHub Spark is a new competitor to Replit, Lovable, v0, Bolt, and similar platforms, turning natural language descriptions into small, deployable apps.
  • Opal is a new experimental tool from Google Labs that lets you build and share AI mini apps that chain together prompts, models, and tools using simple natural language and visual editing.
  • Cursor launched an AI debugger called Bugbot that analyzes your entire project context, lets you describe bugs in plain English, and suggests real-time fixes across multiple files.
  • Mistral announced Codestral 25.08 as part of a full-stack coding platform for enterprise-grade software development.
  • Anthropic launched a new feature that automates security reviews in Claude Code.
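For the curious, here's a rough sketch of calling Qwen3-Coder from Python through an OpenAI-compatible endpoint. Both the base URL (Alibaba Cloud's DashScope compatible mode) and the model id are assumptions on my part, so consult the official Qwen documentation for the real values:

```python
# Rough sketch: Qwen3-Coder via an OpenAI-compatible API.
# The base_url and model id are my assumptions -- check the
# Qwen/DashScope docs before using them.
from openai import OpenAI  # pip install openai

client = OpenAI(
    api_key="YOUR_DASHSCOPE_API_KEY",  # hypothetical placeholder
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",  # assumed
)

resp = client.chat.completions.create(
    model="qwen3-coder-plus",  # assumed model id
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a singly linked list."},
    ],
)
print(resp.choices[0].message.content)
```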

Agents

A big focus of AI development in 2025 centers on agentic AI, and there have been several significant updates in the agent space:

  • Anthropic published the Claude Directory, a new catalog of tools that connect to Claude via MCP connectors, so you can interact with them directly from Claude and Claude Code. (Writing your own MCP server is surprisingly simple; see the sketch after this list.)
  • Manus released Wide Research for Pro subscribers, which lets you research hundreds of items at once using multiple AI agents working together in parallel.
  • OpenAI launched a powerful Agent feature that combines the reasoning and web-crawling capabilities of Deep Research with the autonomous agent capabilities of their Operator tool, which was previously only available on the Pro plan.
  • Perplexity now integrates with OpenTable, allowing users to find and reserve restaurants directly through natural language queries.
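To demystify what a "connector" is under the hood: MCP (Model Context Protocol) servers expose tools over a standard interface that Claude can discover and call. Here's a minimal toy server using the official Python SDK (a sketch of the protocol, not one of the directory's actual integrations):

```python
# Minimal MCP server using the official Python SDK (pip install mcp).
# A toy illustration of the protocol behind Claude's connectors.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("newsletter-tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serves the tool over stdio by default
```

Register a server like this with Claude Desktop or Claude Code and the model can invoke word_count on its own when a conversation calls for it.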

Browsers

AI-native browsers became a major theme this month, with several significant launches that signal a fundamental shift in how we'll interact with the web:

  • Dia from The Browser Company (makers of Arc) represents their vision for an AI-first browsing experience that goes beyond traditional web navigation.
  • Comet from Perplexity brings their search expertise directly into browser form, creating a seamless research and browsing workflow with powerful agent features.
  • BrowserOS is open source and lets you plug into several of the big LLMs or run Ollama locally, offering developers and power users maximum flexibility in their AI integration.
  • Opera announced a new AI-powered agentic browser called Neon (waitlist only for now), positioning it as a browser that can take actions on your behalf rather than just displaying web pages.
  • Microsoft released Copilot Mode for their Edge browser, integrating their AI assistant directly into the browsing experience for enhanced productivity and assistance.

What This All Means

This month's updates reveal several critical trends reshaping the AI landscape. The rapid-fire LLM releases show companies leapfrogging each other almost weekly, and the emergence of open-source models that rival closed-source capabilities suggests the democratization of AI development is accelerating as well.

The introduction of dedicated learning tools marks a significant maturation in how the industry approaches AI's role in education. Rather than ignoring concerns about AI undermining learning, major companies are actively developing pedagogical approaches that support educational goals. This shift from answer-providing to learning-facilitation could help define how AI gets integrated into classrooms and other educational contexts.

The explosion of AI-native browsers signals a fundamental shift in how we interact with information online. They're more than just browsers with AI features bolted on; they represent entirely new paradigms for web interaction that could reshape digital experiences, especially as they incorporate more agentic features.

The focus on agentic AI across multiple companies (and industries) further indicates 2025 will be the year AI systems move from responding to requests to proactively accomplishing complex, multi-step tasks. This represents a fundamental shift from AI as a tool to AI as a collaborative assistant.

The maturation of coding agents and vibe coding platforms indicates we're approaching a tipping point where software development becomes accessible to anyone who can describe what they want to build. The quality improvements and enterprise focus suggest these tools are moving beyond experimental use cases and into real-world business opportunities.

And perhaps most significantly, the convergence of capabilities across video, voice, music, and image generation points toward a future where creating rich multimedia content and virtual worlds requires only natural language descriptions (and your imagination!). The speed at which these tools are improving suggests we're still in the early acceleration phase of this transformation.

Well, that's plenty for one month! Let me know what's resonated with you lately, either in the comments, or send me an email. I'd love to hear from you.

Cover image generated with Midjourney. Editing assistance provided by Claude Sonnet 4.