Deep Currents: 07.07.2025

Welcome to the July edition of Deep Currents, a monthly curated digest of breakthroughs, product updates, and insightful articles that surfaced over the past month in the rapidly evolving world of generative AI.

These were the things that stood out to me as particularly significant, from the perspective of a design director in IT trying to stay on top of this exciting field. Hopefully this article will help you keep your head above water too.

Alright, let’s dive into this month’s currents…

LLMs

The frontier model race remains as competitive as ever, with incremental but meaningful improvements across the board.

Gemini 2.5 Pro received another upgrade, extending its #1 lead on Chatbot Arena by 30 Elo points. Google’s strategy of continuous iteration appears to be paying off, at least for now.

OpenAI launched o3-pro with significantly reduced token pricing. The model can search the web, analyze files, reason about visual inputs, use Python, personalize responses using memory, and more. Image generation and Canvas mode aren’t supported, however, suggesting OpenAI still relies on separate models for certain use cases.

Mistral released Magistral, their new open reasoning model, available for download, which shows its step-by-step thinking process. This transparency in AI reasoning continues an important trend toward interpretable AI systems.

Shanghai’s MiniMax released M1, featuring an enormous 1M-token context window that can theoretically process an entire book collection rather than a single novel. The model also edges out DeepSeek’s R1, showing that innovation in AI capabilities continues to emerge from Chinese labs.

Images

Image generation tools focused on refinement and enhanced control capabilities over the past month.

Midjourney launched an improved style reference (sref) model for V7 which is much smarter at understanding the style of an image, allowing for more consistent and faithful reproduction of a desired aesthetic. When combined with V7's Omni-reference capabilities, artists can now create consistent objects, characters, and styles across multiple images.

Topaz Labs introduced Bloom, an AI upscaler that makes images 8X bigger while adding creative detail. This addresses the growing need for high-resolution outputs as AI-generated images move into professional contexts.

KREA AI unveiled Krea 1, their first in-house image model, launching in free beta with enhanced aesthetic control and broad artistic knowledge, able to render accurate skin textures, dynamic camera angles, and expressive color.

Adobe released new mobile apps for its Firefly platform, bringing AI image, video, and creative tools to iOS and Android.

Higgsfield launched Soul, a “high-aesthetic” photo model with advanced realism and 50+ style presets, further expanding the options available to creators.

Video

AI video generation reached new heights of sophistication, with major improvements in quality and control, and more affordable access.

HeyGen upgraded Avatar IV with prompt-controlled gestures and hyper-real micro-expressions at 1080p resolution. The level of control users now have over AI-generated human avatars represents a significant leap forward for content creation.

Higgsfield AI launched Speak, transforming scripts into motion-driven talking videos. This feature bridges the gap between written content and dynamic video presentations.

Luma Labs introduced Modify Video, allowing users to change styles, objects, and settings in recorded videos. This post-production capability adds a new dimension to video editing workflows.

MiniMax debuted Hailuo 02, which moved to No. 2 on the Artificial Analysis leaderboard for image-to-video, surpassing Veo 3 and demonstrating how competitive the AI video generation landscape is right now.

Google added a new Veo 3 Fast version in Gemini and Flow that runs at twice the speed of the original model, reducing wait times. Access to the Veo 3 model was also rolled out globally to Google AI Pro users across 159 countries, allowing up to 3 video generations with sound per day.

And after much anticipation, Midjourney finally released their first video model, simply called V1, featuring a user-friendly image-to-video interface with simple but flexible motion controls and the ability to extend an initial 5-second clip up to four times (roughly four seconds per extension), for videos of up to 21 seconds. Available at all subscription tiers, Midjourney's entry into video represents the next step in their journey towards real-time world generation.

Voice

The voice AI space continues to evolve rapidly, with breakthrough advances in expressiveness and functionality.

ElevenLabs dropped their most expressive text-to-speech (TTS) model yet with v3 (alpha), supporting 70+ languages, emotion tags, and multi-speaker dialogue capabilities. They followed up later in the month with the official release of Voice Design V3, cementing their position at the forefront of synthetic voice technology.

The company also launched 11ai, an in-house voice assistant that connects to tools via Anthropic’s Model Context Protocol (MCP). Rather than simply answering questions, 11ai can execute tasks, representing a shift toward truly functional voice interfaces.
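Since MCP comes up repeatedly this month (11ai here, plus Cursor, Claude Integrations, and Warp further down), here's a minimal sketch of what the tool side of an MCP integration looks like, using Anthropic's open TypeScript SDK. The calendar tool is hypothetical; a real server would call an actual calendar backend. But the shape is what matters: an assistant like 11ai discovers the tools a server declares and invokes them on the user's behalf.

```typescript
// Minimal MCP server sketch using the open TypeScript SDK
// (@modelcontextprotocol/sdk). The "create_event" tool is hypothetical;
// a real server would talk to an actual calendar API.
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";

const server = new McpServer({ name: "calendar-demo", version: "0.1.0" });

// Declare a tool with a typed input schema. MCP clients (11ai, Claude,
// Cursor, etc.) can discover this tool and call it with validated arguments.
server.tool(
  "create_event",
  { title: z.string(), start: z.string().describe("ISO 8601 start time") },
  async ({ title, start }) => ({
    content: [{ type: "text", text: `Created "${title}" starting at ${start}.` }],
  })
);

// Serve over stdio, the simplest transport for local integrations.
await server.connect(new StdioServerTransport());
```

That's essentially the whole contract: a server declares tools, and any compliant client can call them. It's why a voice assistant, a code editor, and a terminal can all plug into the same Jira or Zapier integration without bespoke glue code.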

OpenAudio entered the race with S1, a state-of-the-art TTS model featuring actor-level expressiveness and emotion tags. The competition in high-quality voice synthesis has clearly intensified, with each new release pushing the boundaries of what synthetic voices can achieve.

Music + Audio

The AI music landscape continued to mature, with tools becoming more sophisticated and professional-grade.

Suno upgraded its Song Editor with track editing, stem splitting, full-song reference uploads, and creative control sliders. They also acquired boutique DAW maker WavTool, with plans to integrate both tools into a comprehensive AI-powered music studio.

Meanwhile, Udio added Sessions, a new timeline editing view that provides more precise control over track editing and extension. The new feature offers two big workflow improvements to Udio users: edits to a song can be saved as different "takes", and all iterations are organized under a single "Sessions" folder in the Library.

Vibe Coding

The AI-powered development space saw major platform releases and new interaction paradigms.

Anysphere released Cursor 1.0 (quickly followed by v1.1) featuring BugBot for code review, memories, one-click MCP setup, and general availability of their Background Agent. They also launched an “Ultra” pricing tier at $200/month after securing long-term deals with major AI providers, establishing a new standard for unlimited access to cutting-edge models.

Vercel V0 added design tools to their prototyping platform, allowing manual UI changes alongside AI generation. They also added inline code generation for targeted modifications. These features mirror capabilities found in Lovable, which launched Agent mode in beta.

Anthropic added new capabilities to Artifacts, enabling users to build and share Claude-powered apps with API usage paid for by end users (through their own Claude accounts). This represents a significant shift in how AI-powered applications might eventually be monetized.

Figma enhanced their Make tool to pull context from published style libraries, including color palettes, typography, and core styling elements. This allows designers to ensure their Make prototypes are consistent with an existing design system.

Agents + Dev Tools

The AI development ecosystem continues to heat up with new tools and enhanced capabilities.

The Cursor team announced Agents, a new web and mobile app that lets users control a team of agents from their smartphone, extending AI-powered development beyond the desktop.

Replit added Dynamic Intelligence and Web Search to their Agent tool, enhancing reasoning capabilities and overcoming training data limitations.

Google launched Gemini CLI, an open-source tool enabling developers to code with Gemini directly in the terminal for free with generous daily limits.

Anthropic expanded Integrations access for Claude Pro accounts, connecting to tools like Jira, Confluence, Cloudflare, and Zapier through MCP servers.

Warp launched Warp 2.0, featuring a coding agent, multi-threaded agent management, terminal access, and a shareable knowledge base with MCP support.

Airtable relaunched with an AI-native platform featuring Omni, an agentic app-building assistant that leverages production-ready components while maintaining enterprise-grade reliability.

Law + Copyright

The legal landscape around AI training data saw significant developments.

Both Anthropic and Meta won initial cases absolving them of claims related to training on copyrighted works, potentially setting important precedents for the industry. The judge in the Anthropic case ruled that training models on legally acquired content, including copyrighted books, constitutes fair use. Downloading pirated books, however, was not excused, even where they weren't used for training, and that issue will be addressed in a separate trial. In the Meta case, the judge determined that the plaintiffs simply didn't do enough to prove market harm, a key factor in fair-use analysis.

Disney and Universal Pictures filed suit against Midjourney, escalating efforts to rein in copyright violations on the popular image (and now video) generation platform.

Creative Commons unveiled CC Signals, a new opt-in metadata system allowing dataset owners to specify exactly how AI models may reuse their work. This represents a potential framework for more structured licensing in the AI era.

What This All Means

This month’s developments reveal several trends reshaping the AI landscape. The rapid commoditization of advanced capabilities across voice, video, and image generation suggests we’re moving beyond the experimental phase toward practical, production-ready tools.

The emergence of sophisticated agent workflows and comprehensive development environments indicates that AI is becoming integral to professional creative and technical processes. The $200/month pricing tier has become standard for unlimited access to frontier models, suggesting the industry has found a sustainable economic model for high-usage AI subscriptions.

Most significantly, the legal victories for major AI companies and the introduction of structured licensing frameworks like CC Signals suggest the industry is beginning to establish clearer boundaries around training data and copyright. This legal clarity, combined with the technical maturity evident in recent releases, ultimately positions AI tools for broader mainstream adoption.

The convergence of capabilities across modalities (with platforms now offering a mix of text, image, video, voice, and code generation) indicates we’re approaching a unified AI creative suite era. This consolidation will likely accelerate as companies compete to offer comprehensive solutions rather than one-off niche tools.

The month’s releases suggest that raw technical capability is no longer the main constraint; the focus is shifting to user interface design, managing computational costs, and establishing legal frameworks for ethical training. The next phase of development will likely center on making these powerful capabilities more accessible and legally compliant for widespread professional use.

Well, that's it for this month! Let me know what's resonated with you lately. Drop me a line in the comments, or send me an email. I'd love to hear from you.


Cover image generated with Midjourney V7. Editing assistance provided by Claude Sonnet 4.