Welcome to episode 352 of The Cloud Pod, where the weather is always cloudy! Justin, Matt, and Ryan are safely back from Vegas (Ryan and Justin, anyway), and they have all the news and announcements from Google Next. Plus, we have Ryan’s take on Phish, news from Cloudflare, and a shoe company making a pivot. There’s a lot to cover, so let’s get started!
Titles we almost went with this week
- Redact Yourself Before You Wreck Yourself, ~~OpenAI~~ Anthropic
- Fork Yeah Cloudflare Artifacts Is Here
- Git Happens at Scale on Cloudflare
- Bucket List Item Checked Lambda Mounts S3 File Systems
- Terraform Your Agents Before They Terraform You
- Cloud Run Gets GPUs and Finally Hits the Gym
- Spanner Goes Rogue, Leaves the Cloud Behind
- Knowledge Catalog Knows What Your Agents Did Last Query
- One Control Plane to Rule a Million Chips
- No More Incognito Windows for Your AWS Identity Crisis
- Your Agent Can Now Write Files Without Burning Everything Down
- Spend Caps Finally Tell Runaway AI Jobs to Chill
- RIP Vertex, long live the agent
- Agents all the way down
- Google Next: This is the dawning of the Age of Agentic
- Allbirds Proves AI Hype Needs No Infrastructure
A big thanks to this week’s sponsors:
There are a lot of cloud cost management tools out there, but only Archera provides insured commitments. It sounds fancy, but it’s really simple. Archera gives you the cost savings of a 1 or 3-year AWS Savings Plan with a commitment as short as 30 days. If you do not use all the cloud resources you have committed to, Archera will literally cover the difference. Other cost management tools may say they offer “insured commitments”, but remember to ask: Will you actually give me my rebate? Because Archera will.
Check out thecloudpod.net/archera to schedule a demo today.
We also wanted to tell you about something coming to the US for the first time — WeAreDevelopers World Congress!
They’ve been doing this in Europe for years, 15,000-plus attendees in Berlin, it’s one of the biggest developer events over there. Coté from Software Defined Talk is actually speaking at their Berlin event this summer, so we’ve got some firsthand context here. In September, they’re launching the North America edition. San José, September 23 to 25. 500-plus speakers, 18 tracks — cloud, infrastructure, DevOps, security, AI, data engineering, all of it. Speakers from Datadog, Honeycomb, Sentry, Google, LinkedIn, and Stack Overflow. Olivier Pomel, Christine Yen, Milin Desai, Kelsey Hightower – plus workshops and masterclasses, not just talks. These are people who know how to do a developer conference at scale. wearedevelopers.us, code DEVPOD26 for 15% off. Group rates on top of that for 4 or more.
General News
06:12 Amazon invests up to $25 billion in Anthropic as part of AI infrastructure
- Amazon has committed up to $25 billion in additional investment in Anthropic, bringing its total potential investment to $33 billion. The latest $5 billion tranche is based on Anthropic’s $380 billion valuation, with up to $20 billion more tied to commercial milestones.
- In exchange, Anthropic has committed to spending over $100 billion on AWS over the next decade, with a specific focus on Trainium custom AI chips, and plans to bring nearly 1 gigawatt of Trainium2 and Trainium3 capacity online by the end of the year.
- Anthropic cited real infrastructure strain from growing enterprise and consumer demand for Claude, noting reliability and performance impacts, which gives this deal a practical operational motivation beyond financial positioning.
- Amazon is now a substantial investor in both Anthropic and OpenAI, having committed up to $50 billion to OpenAI in February, which raises notable questions for developers about how AWS positions competing AI platforms on its infrastructure.
- With Anthropic also holding compute agreements with Microsoft Azure and Google, and now securing up to 5 gigawatts of total capacity, the company is distributing its infrastructure across multiple providers despite naming AWS its primary training partner.
08:46 📢 Justin – “The big question is going to be when one of these companies – OpenAI or Anthropic – finally goes public, and they start publishing these things; what people’s actual reaction is to their financials.”
10:48 SpaceX Strikes $60 Billion Deal for Right to Buy Coding Startup Cursor
- SpaceX struck a deal with AI coding startup Cursor, valued at either a $60 billion acquisition or a $10 billion partnership fee, giving Cursor access to xAI’s Colossus supercomputer, which runs 200,000 Nvidia H100-equivalent GPUs for model training.
- Cursor had been compute-constrained despite reaching $1 billion in annual recurring revenue and a $29.3 billion valuation, so this deal directly addresses their infrastructure bottleneck for scaling model intelligence.
- The partnership positions SpaceX to compete in the AI coding tools space against Anthropic and others, notable given xAI’s Grok has publicly acknowledged falling behind competitors in coding capabilities.
- For developers and cloud users, this deal signals continued consolidation between compute providers and AI coding tools, which could influence pricing, model availability, and platform lock-in decisions for teams building on AI-assisted development workflows.
- SpaceX’s recent acquisition of xAI, combined with this Cursor deal, suggests a vertical integration strategy connecting rocket-company compute infrastructure directly to developer-facing AI products ahead of a potential IPO later this year.
12:27 📢 Justin – “The thing I don’t get is the $10 billion partnership versus the $60 billion acquisition. What’s the triggering events on those things? When is it a partnership, versus when is it now an acquisition? And does that mean that these people who are working at Cursor – if it’s a partnership, aren’t getting equity? That’s a bummer.”
AI Is Going Great – Or How ML Makes Money
13:36 The next evolution of the Agents SDK
- OpenAI has updated its Agents SDK to general availability, adding native sandbox execution, configurable memory, and filesystem tools modeled after Codex.
- Agents can now read and write files, run shell commands, install dependencies, and apply code patches within controlled environments without developers building that infrastructure themselves.
- The SDK introduces a Manifest abstraction that standardizes how agent workspaces are defined across sandbox providers, including Blaxel, Cloudflare, E2B, Modal, and Vercel, with storage integrations for AWS S3, Google Cloud Storage, Azure Blob Storage, and Cloudflare R2.
- This gives developers a consistent path from local prototype to production deployment.
- Built-in snapshotting and rehydration mean a failed or expired sandbox container does not terminate a long-running agent run, as the SDK can restore state in a fresh container from the last checkpoint. This addresses a practical reliability gap for agents working on multi-step tasks.
- The SDK incorporates several emerging agentic standards, including MCP for tool use, AGENTS.md for custom instructions, and the skills spec for progressive capability disclosure.
- OpenAI positions this as reducing the maintenance burden on developers as these patterns evolve.
- The updated SDK is currently Python-only, with TypeScript support planned for a future release.
- Pricing follows standard API rates based on tokens and tool use, and features like code mode and subagents are still in development for both language runtimes.
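The snapshotting and rehydration story is the interesting reliability piece here. The SDK’s actual API for this isn’t public in the announcement, but the pattern it describes is straightforward: persist agent state after each step so a fresh container can pick up where a dead one left off. A minimal sketch, assuming a made-up `AgentState` shape and checkpoint file (neither is OpenAI’s API):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sketch of the checkpoint/rehydrate pattern the SDK describes.
# The AgentState shape and checkpoint location are our assumptions.

class AgentState:
    def __init__(self, step=0, notes=None):
        self.step = step
        self.notes = notes or []

    def to_json(self):
        return json.dumps({"step": self.step, "notes": self.notes})

    @classmethod
    def from_json(cls, raw):
        data = json.loads(raw)
        return cls(step=data["step"], notes=data["notes"])

def checkpoint(state, path):
    Path(path).write_text(state.to_json())

def rehydrate(path):
    return AgentState.from_json(Path(path).read_text())

# Simulate a sandbox dying mid-run: checkpoint after step 2, then
# restore in a "new container" and continue from step 3.
ckpt = Path(tempfile.mkdtemp()) / "agent.ckpt"
state = AgentState()
for step in range(1, 3):
    state.step = step
    state.notes.append(f"completed step {step}")
    checkpoint(state, ckpt)

restored = rehydrate(ckpt)
print(restored.step)  # resumes from the last completed step
```

The point is that the long-running agent loop, not the container, owns the run; the container is disposable.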
13:59 📢 Ryan – “As long as it also logs and has permissions and some sort of boundaries, I don’t have to kill it. It’s just terrifying because we already have people that are just throwing questions into any chat tool, and just then running whatever command it spits out indiscriminately. And now that’s just going to happen at a faster rate.”
19:56 Introducing Claude Opus 4.7
- Claude Opus 4.7 is now generally available across Claude products, the API, Amazon Bedrock, Google Cloud Vertex AI, and Microsoft Foundry at the same pricing as Opus 4.6: $5 per million input tokens and $25 per million output tokens.
- The model targets complex, long-running agentic coding workflows, with early testers reporting 13% higher resolution on a 93-task coding benchmark and 3x more production task resolution on Rakuten-SWE-Bench compared to Opus 4.6.
- Vision capabilities received a notable upgrade, with Opus 4.7 now supporting images up to 2,576 pixels on the long edge, more than three times the resolution of prior Claude models.
- This opens up use cases like computer-use agents reading dense screenshots and data extraction from complex technical diagrams, though higher-resolution images will consume more tokens.
- Anthropic is using Opus 4.7 as a testbed for cybersecurity safeguards before any broader release of its more capable Mythos Preview model.
- The model includes automatic detection and blocking of prohibited cybersecurity uses, with a new Cyber Verification Program available for legitimate security professionals doing penetration testing or vulnerability research.
- A new effort level sits between the existing high and max settings, giving developers finer control over the reasoning-versus-latency tradeoff.
- Developers migrating from Opus 4.6 should note that the updated tokenizer can increase token counts by roughly 1.0 to 1.35 times, depending on content type, and a migration guide is available on the Claude platform.
- File system-based memory improvements allow Opus 4.7 to retain notes across multi-session agentic work, reducing the need to re-establish context at the start of each task.
- This is particularly relevant for enterprise teams running parallel agent workflows where continuity across long runs matters.
21:50 📢 Ryan – “I didn’t realize it’s the same price because every platform that I’m using this in, Opus 4.7 is so much more expensive than 4.6.”
28:14 Introducing Claude Design by Anthropic Labs
- Anthropic launched Claude Design in research preview for Pro, Max, Team, and Enterprise subscribers, powered by Claude Opus 4.7.
- It enables users to create interactive prototypes, pitch decks, wireframes, and marketing assets through conversational prompts and inline editing controls.
- A notable workflow feature is the Claude Code handoff, where finished designs are packaged into a bundle that developers can pass directly to Claude Code for implementation, creating a tighter loop between design and engineering.
- Claude Design builds a team-specific design system during onboarding by reading codebases and design files, then automatically applies brand colors, typography, and components to every subsequent project.
- Teams can maintain multiple design systems simultaneously.
- Early user data from Brilliant suggests complex pages that required 20-plus prompts in other tools needed only 2 prompts in Claude Design, indicating meaningful efficiency gains for interactive prototype creation.
- Export options include Canva, PDF, PPTX, and standalone HTML, with organization-scoped sharing and collaborative editing.
- For Enterprise customers, the feature is off by default and must be enabled by admins in Organization settings.
30:56 Building the agentic cloud: everything we launched during Agents Week 2026
- Cloudflare held its first Agents Week, shipping a new set of primitives across compute, security, and tooling specifically designed for running AI agents at scale.
- The core premise is that traditional cloud infrastructure built around one app serving many users does not fit a model where individual users each run multiple concurrent agents.
- On the compute side, Cloudflare launched new environments supporting both full operating system containers for package installation and terminal commands, and lightweight isolates that start in milliseconds for high-scale deployments.
- They also shipped a Git-compatible workspace designed for agent-generated code moving from prototype to production.
- Security and identity were treated as built-in defaults rather than add-ons, with new tools for connecting agents to private networks and managing autonomous actions taken on behalf of users across an organization.
- The agent toolbox additions include inference, search, memory, voice, email, and a browser primitive, giving agents the ability to perceive, remember, and communicate without developers assembling separate third-party services.
- Cloudflare also addressed the web infrastructure side, releasing tools for existing websites to control bot access, package content for agent consumption, and measure their readiness for agent-driven traffic, acknowledging that most of the current web was built for human browsers rather than automated agents.
31:40 📢 Justin – “I look forward to Cloudflare taking down Cloudflare, and then writing an RCA with these great tools.”
32:14 Artifacts: versioned storage that speaks Git
- Cloudflare launched Artifacts in private beta, a versioned file system built on Git that lets developers and agents programmatically create, fork, and manage Git repositories at scale via a REST API and native Workers API, with public beta targeted for early May 2026.
- The system is built on Durable Objects with a Git server written in Zig and compiled to a roughly 100KB WebAssembly binary, enabling tens of millions of isolated repo instances per namespace while handling the full Git smart HTTP protocol with zero external dependencies.
- Cloudflare is also open-sourcing ArtifactFS, a filesystem driver that mounts large Git repos using a blobless clone and lazy file hydration, reducing startup times for multi-gigabyte repos from roughly 2 minutes down to 10-15 seconds, which at 10,000 sandbox jobs per month translates to approximately 2,778 compute hours saved.
- Beyond source control, Artifacts supports use cases like per-session agent state persistence, customer config versioning with rollback, and session forking, using Git semantics such as diff, revert, and clone as a general-purpose state management layer rather than just a code storage tool.
- Pricing is designed for agent-scale workloads, charging based on storage consumed and operations performed rather than repo count, with plans to bring Artifacts to the Workers Free plan with fair use limits as the beta progresses.
32:54 📢 Justin – “…another way it’s going to take down Cloudflare, so I look forward to that.”
34:53 Cortex Agents: The Platform Powering Snowflake Intelligence and Enterprise AI Agents
- Snowflake is launching Cortex Agents as a full enterprise agent platform with several capabilities now generally available, including multi-tenancy with row-level data isolation, agent versioning with commit-based rollback, resource budgets for per-agent and per-team spending controls, and Cortex Agent Evaluations using their GPA (Goal-Plan-Action) framework.
- MCP connector support is coming soon to GA, allowing Cortex Agents to connect natively to external tools like Salesforce, Jira, GitHub, Slack, and Google Workspace using the Model Context Protocol standard, with the same Snowflake role-based governance applied to those external connections.
- The Code Execution Tool (public preview soon) gives agents a sandboxed Python environment with session-level isolation, letting agents generate and run code on demand during conversations without accessing data outside the current session scope.
- The GPA evaluation framework is a notable technical detail here: in benchmark testing against TRAIL/GAIA, it captured 95% of human-annotated errors compared to a 55% baseline, and localized errors to specific trace spans with 86% accuracy, giving teams a structured alternative to subjective human review.
- The cost governance model is more granular than typical platforms, supporting both agent-level and per-team shared budgets with configurable threshold actions, such as alerts at 80% spend and automatic access revocation at 100%, which addresses a practical concern for enterprises deploying agents across multiple business units.
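That threshold-action model (alert at 80%, revoke at 100%) is simple enough to sketch. This is our illustration of the behavior described, not Snowflake’s API — the function name and defaults are ours:

```python
# Illustrative sketch of the budget threshold actions described above:
# alert at 80% of budget, revoke access at 100%. Names are ours.

def budget_action(spent, budget, alert_pct=0.80, revoke_pct=1.00):
    """Return the action a per-agent budget controller would take."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    ratio = spent / budget
    if ratio >= revoke_pct:
        return "revoke_access"
    if ratio >= alert_pct:
        return "alert"
    return "ok"

print(budget_action(79, 100))   # "ok"
print(budget_action(85, 100))   # "alert"
print(budget_action(100, 100))  # "revoke_access"
```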
35:06 📢 Justin – “If you need your agents close to your data, this is a great way to do it. I definitely would look into cost with this one, because Snowflake is not cheap.”
36:11 Introducing GPT-5.5
- GPT-5.5 is now generally available in ChatGPT and Codex for Plus, Pro, Business, and Enterprise users, with API access priced at $5 per 1M input tokens and $30 per 1M output tokens, and a Pro variant at $30 input and $180 output per 1M tokens.
- The model shows notable agentic coding improvements, scoring 82.7% on Terminal-Bench 2.0 and 58.6% on SWE-Bench Pro, while using fewer tokens than GPT-5.4 to complete the same tasks, which partially offsets the higher per-token cost.
- For cloud and enterprise workloads, GPT-5.5 was co-designed with and served on NVIDIA GB200 and GB300 NVL72 systems, with inference optimizations including dynamic load balancing heuristics that increased token generation speeds by over 20%.
- Knowledge work benchmarks are worth noting for enterprise buyers: 84.9% on GDPval across 44 occupations, 78.7% on OSWorld-Verified for autonomous computer use, and 98.0% on Tau2-bench Telecom for customer service workflows, suggesting practical applicability across business functions.
- OpenAI is classifying GPT-5.5 as High under its Preparedness Framework for both cybersecurity and biological capabilities, and is introducing a Trusted Access for Cyber program through Codex that gives verified defenders expanded access with fewer restrictions, which has direct implications for security teams evaluating AI-assisted vulnerability management.
37:31 📢 Ryan – “That’s kind of cool. That’s the first I’m hearing of those kind of frameworks for their testing, and testing the safety AI aspects and having a rating, which I like.”
37:58 Introducing workspace agents in ChatGPT
- OpenAI is launching workspace agents in ChatGPT as a research preview for Business, Enterprise, Edu, and Teachers plans, positioning them as an evolution of GPTs powered by Codex and designed for shared team workflows rather than individual use.
- These agents run persistently in the cloud, meaning they can continue working on long-running tasks without user interaction, and can be triggered on a schedule or deployed directly in Slack to handle incoming requests automatically.
- The practical use cases OpenAI highlights include a lead outreach agent that reduced 5-6 hours of weekly rep work to an automated background process, and an accounting agent that handles month-end close tasks, including journal entries and variance analysis in minutes.
- On the enterprise controls side, admins get role-based access management, a Compliance API for auditing every agent configuration and run, built-in prompt injection safeguards, and the ability to suspend agents, which addresses a common concern about autonomous agents operating within sensitive business environments.
- Pricing is worth noting for teams evaluating adoption: workspace agents are free until May 6, 2026, after which credit-based pricing kicks in, giving organizations a window to test and build before committing to costs.
38:42 Introducing OpenAI Privacy Filter
- OpenAI released Privacy Filter, an open-weight 1.5B parameter model (with only 50M active parameters) for detecting and redacting PII in text, available now on Hugging Face and GitHub under the Apache 2.0 license for free commercial use and fine-tuning.
- The model uses a bidirectional token-classification architecture with constrained Viterbi span decoding, processing up to 128,000 tokens in a single forward pass across eight PII categories, including private persons, addresses, account numbers, and secrets like API keys and passwords.
- A key practical advantage for cloud and on-premise deployments is that the model runs locally, meaning sensitive data never needs to leave the device for de-identification, which directly reduces exposure risk in logging, indexing, and training pipelines.
- Performance benchmarks show a 97.43% F1 score on the corrected PII-Masking-300k benchmark, and fine-tuning on small domain-specific datasets can lift accuracy from 54% to 96% F1, making it adaptable for legal, medical, and financial workflows.
- OpenAI explicitly notes this is not a compliance certification or anonymization guarantee, and recommends human review in high-stakes settings, which is an important caveat for developers considering it as a drop-in solution for regulated industries.
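To make the redaction workflow concrete without loading the model, here’s a toy regex-based stand-in covering two of the eight categories (account numbers and API-key-like secrets). This is emphatically not Privacy Filter — the real thing uses the token-classification architecture above — just the shape of the replace-spans-with-labels output:

```python
import re

# Toy stand-in for PII redaction: replace detected spans with category
# labels. The real Privacy Filter model does this with learned span
# detection; these two regexes are our simplified illustration.

PATTERNS = {
    "ACCOUNT_NUMBER": re.compile(r"\b\d{10,16}\b"),
    "SECRET": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
}

def redact(text):
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

sample = "Charge account 1234567890123 using key sk-abcdefghij0123456789XYZ"
print(redact(sample))
# Charge account [ACCOUNT_NUMBER] using key [SECRET]
```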
38:52 📢 Justin – “If you’re looking for a lightweight built-in option inside of Codex to find privacy PII, this little model sits on top of it and does great work.”
40:33 Introducing ChatGPT Images 2.0
- ChatGPT Images 2.0 can now handle small text, UI elements, icons, and complex layouts at up to 2K resolution; no more getting something “close enough” – it actually delivers what you asked for.
- Previous versions struggled outside of Latin-based text, but now it has solid support for Japanese, Korean, Chinese, Hindi, and Bengali, where the language is baked into the design itself.
- When paired with a reasoning model, it can search the web, plan the image structure, self-check its work, and even produce multiple distinct images from a single prompt.
- Images 2.0 supports everything from wide 3:1 banners to tall 1:3 mobile screens. Useful for social graphics, presentations, posters, and more, all without manual resizing.
- This replaces the back-and-forth between prompting, designing, and editing. You describe what you need, it researches, writes, and visualizes from start to finish.
41:29 📢 Matt – “I like that it can do multiple at the same time. That’s a nice feature.”
Cloud Tools
42:19 Register domains wherever you build: Cloudflare Registrar API now in beta
- Cloudflare Registrar API is now in beta, allowing developers to search, check availability, and register domains programmatically through three straightforward API endpoints, keeping the entire workflow inside editors, terminals, or agent-driven tools.
- The API integrates directly with Cloudflare’s MCP server, meaning agents in environments like Cursor or Claude Code can already discover and call Registrar endpoints without any additional integration or custom tool definitions.
- Cloudflare maintains its at-cost pricing model through the API, charging exactly what the registry charges with no markup, and WHOIS privacy protection is enabled by default at no extra charge.
- Registration typically completes synchronously within seconds, but the API also handles longer operations by returning a 202 Accepted with a polling URL, using the same response shape either way to simplify agent logic.
- The beta currently covers search, check, and registration for a curated set of TLDs, with Cloudflare actively working to expand the API to include transfers, renewals, contact updates, and eventually a broader registrar-as-a-service offering for multi-tenant platforms.
AWS
43:29 AWS Interconnect is now generally available, with a new option to simplify last-mile connectivity
- AWS Interconnect is now generally available in two flavors: multicloud for private Layer 3 connections between AWS and other cloud providers (starting with Google Cloud, Azure coming later in 2026), and last-mile for connecting on-premises locations to AWS through network providers like Lumen, AT&T, and Megaport.
- The multicloud option uses IEEE 802.1AE MACsec encryption by default on physical links, routes traffic entirely over private backbones without touching the public internet, and includes built-in redundancy across at least two physical facilities.
- Pricing is a flat hourly rate based on bandwidth tier and region pair, so check the pricing page before sizing your connection.
- Provisioning is handled through the AWS Direct Connect console in a few clicks, generating an activation key that completes the handshake on the partner cloud side.
- However, there are gotchas to watch for, including non-overlapping IP ranges, matching MTU settings between VPCs, and consistent IPv4/IPv6 configuration on both sides.
- Last-mile connectivity automatically provisions four redundant connections, configures BGP routing, enables MACsec and Jumbo Frames by default, and supports 1 Gbps to 100 Gbps with bandwidth adjustable from the console without reprovisioning. It includes a 99.99% availability SLA up to the Direct Connect port.
- Current multicloud availability covers five region pairs across US East, US West, and Europe, connecting to Google Cloud, with last-mile launching in US East N. Virginia only.
- The open specification published on GitHub under Apache 2.0 allows other cloud providers to implement the standard and become Interconnect partners.
- AWS Interconnect multicloud pricing is available here, and last-mile pricing can be found here.
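Of the gotchas listed, overlapping IP ranges is the one you can catch before opening a ticket. A quick preflight check with nothing but the Python standard library (the CIDRs here are example ranges):

```python
import ipaddress

# Preflight check for the non-overlapping-IP-ranges gotcha: the two VPC
# CIDRs on either side of the Interconnect must not overlap.

def ranges_overlap(cidr_a, cidr_b):
    net_a = ipaddress.ip_network(cidr_a)
    net_b = ipaddress.ip_network(cidr_b)
    return net_a.overlaps(net_b)

# AWS VPC vs. partner-cloud VPC CIDRs (example ranges)
print(ranges_overlap("10.0.0.0/16", "10.0.128.0/20"))  # True: must renumber
print(ranges_overlap("10.0.0.0/16", "10.1.0.0/16"))    # False: safe to connect
```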
44:29 📢 Justin – “Good to see it in GA; hopefully it gets expanded out pretty quickly.”
44:43 Amazon Quick for marketing: From scattered data to strategic action
- Amazon Quick is an AI-powered marketing intelligence tool built on AWS that connects to existing tools like HubSpot, Salesforce, Slack, and Adobe to create a unified knowledge graph from scattered marketing data.
- Pricing is available here, with support for MCP and OpenAPI integrations for extending to other systems.
- The tool addresses three specific marketing pain points: campaign performance reporting, competitive intelligence, and content creation. Quick claims to reduce competitive analysis from days to 30 minutes and content production from three hours to under 20 minutes.
- Quick Flows allow teams to automate recurring tasks like weekly performance summaries and monthly competitive reports on a schedule, shifting work from manual queries to automated delivery.
- This is a notable distinction from standard AI chat assistants that require active prompting.
- On the security side, Quick runs within the customer’s AWS environment, queries and responses are not used to train external models, and role-based access controls are included.
- This positions it as an enterprise-focused offering rather than a consumer AI tool.
- The product references an MIT study showing AI cut document creation time by 40% and improved output quality by 18% among 444 professionals, which gives some external grounding to the productivity claims.
- Teams considering this should evaluate it against existing point solutions like dedicated BI tools or standalone AI writing assistants they may already have in place.
48:12 Amazon CloudWatch now supports cross-region telemetry auditing and enablement rules
- CloudWatch now lets customers audit telemetry configuration and enable telemetry from services like EC2, VPC, and CloudTrail across multiple regions from a single control point, reducing the operational overhead of managing observability at scale.
- Enablement rules can be scoped to specific regions or all supported regions, and rules set to cover all regions automatically expand to include new regions as they become available, which is useful for organizations with growing AWS footprints.
- A practical use case is a central security team creating one organization-wide rule for VPC Flow Logs that consistently applies across every account and region, eliminating gaps in telemetry coverage that could create blind spots.
- The feature is available in all AWS commercial regions with standard CloudWatch pricing applying to telemetry ingestion, so costs will scale with the volume of logs and metrics collected rather than the feature itself carrying an additional charge.
- For teams managing multi-account AWS Organizations setups, this reduces the risk of misconfigured or missing telemetry in individual accounts, which has historically required custom automation or third-party tooling to enforce consistently.
47:58 📢 Ryan – “…this has always been a challenge, even before I was doing security and trying to do log governance across these things, trying to have different serving farms basically in multiple regions and having to log into different web pages to view the metrics on each one. They sort of fix that with the ability to reference metrics in a foreign site a little while ago, but you could only do it for metrics. And so this is definitely something I’m glad to see that you can use.”
50:15 Introducing granular cost attribution for Amazon Bedrock
- Amazon Bedrock now automatically attributes inference costs to the IAM principal making the call, with data flowing into CUR 2.0 via a new line_item_iam_principal column.
- This works across all Bedrock models at no additional cost and requires no changes to existing workflows.
- The feature supports four distinct access patterns: direct IAM users or API keys, application roles on AWS compute, federated identity through providers like Okta or Azure AD, and LLM gateway architectures.
- Each scenario has different configuration requirements, with the gateway scenario being the most complex since it requires per-user AssumeRole session management to avoid all traffic appearing under a single identity.
- Cost allocation tags can be attached to IAM users or roles, or passed dynamically as session tags through identity providers, and once activated in AWS Billing, they appear in Cost Explorer under an iamPrincipal prefix. This enables chargeback reporting by team, project, cost center, or tenant without building custom tracking infrastructure.
- For organizations running LLM gateways like LiteLLM or custom proxies, the solution requires the gateway to call AssumeRole per user and cache those credentials for up to one hour, which keeps STS call volume manageable but introduces architectural changes.
- The default STS rate limit of 500 AssumeRole calls per second per account may require a limit increase for high-throughput deployments.
- Tags take 24 to 48 hours to appear in Cost Explorer and CUR 2.0 after activation, and IAM principal data must be explicitly enabled in the CUR 2.0 data export configuration before any attribution data will appear.
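The gateway scenario is the fiddly one, so here’s a sketch of the per-user AssumeRole-plus-cache pattern described above. The STS client is a stub so the logic runs anywhere; with boto3 the call would be `sts.assume_role(..., Tags=[...])`, and the `iamPrincipal` tag key is our example, not a required name:

```python
import time

# Sketch of the LLM-gateway pattern: assume a role per end user (with a
# session tag for cost attribution) and cache credentials for up to an
# hour to keep STS call volume under the rate limit.

CACHE_TTL = 3600  # STS credentials cached for up to one hour

class CredentialCache:
    def __init__(self, sts_client, role_arn):
        self.sts = sts_client
        self.role_arn = role_arn
        self._cache = {}  # user_id -> (expiry, credentials)

    def credentials_for(self, user_id):
        entry = self._cache.get(user_id)
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit: no STS call
        creds = self.sts.assume_role(
            RoleArn=self.role_arn,
            RoleSessionName=f"bedrock-{user_id}",
            Tags=[{"Key": "iamPrincipal", "Value": user_id}],
        )
        self._cache[user_id] = (time.time() + CACHE_TTL, creds)
        return creds

# Stub STS that counts calls, standing in for boto3's client("sts").
class StubSTS:
    def __init__(self):
        self.calls = 0
    def assume_role(self, **kwargs):
        self.calls += 1
        return {"AccessKeyId": f"AKIA{self.calls}",
                "session": kwargs["RoleSessionName"]}

sts = StubSTS()
cache = CredentialCache(sts, "arn:aws:iam::123456789012:role/bedrock-gateway")
cache.credentials_for("alice")
cache.credentials_for("alice")  # served from cache, no second STS call
cache.credentials_for("bob")
print(sts.calls)  # 2
```

At 500 AssumeRole calls per second per account, a one-hour cache means the limit only bites if you have hundreds of thousands of distinct users cycling through per hour.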
52:12 AWS Lambda functions can now mount Amazon S3 buckets as file systems with S3 Files
- Lambda functions can now mount S3 buckets as file systems using S3 Files, which is built on Amazon EFS, allowing standard file operations without the overhead of downloading objects or managing ephemeral storage limits.
- Multiple Lambda functions can connect to the same S3 Files file system simultaneously, enabling shared workspaces without custom synchronization logic, which is particularly useful for multi-step AI and machine learning pipelines.
- The integration pairs well with Lambda durable functions, where an orchestrator can clone a repository to a shared workspace while parallel agent functions analyze it, with automatic checkpointing handling execution state.
- Configuration is supported through the Lambda console, AWS CLI, SDKs, CloudFormation, and SAM, though the feature is limited to Lambda functions not configured with a capacity provider.
- Pricing adds no additional charge beyond standard Lambda and S3 rates, and the feature is available in all AWS regions where both Lambda and S3 Files are supported.
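The payoff is that Lambda code treats the bucket like a local directory. A sketch of a handler doing exactly that — the `/mnt/s3files` mount path is our assumption for illustration, and the handler takes it as a parameter so the logic can be dry-run locally:

```python
import json
import tempfile
from pathlib import Path

# Sketch of a Lambda handler using an S3 Files mount as a plain filesystem.
# The mount path (/mnt/s3files) is an assumed example, not a required name.

def handler(event, context, mount_path="/mnt/s3files"):
    workspace = Path(mount_path) / event["job_id"]
    workspace.mkdir(parents=True, exist_ok=True)
    # Standard file operations -- no GetObject/PutObject calls, no
    # ephemeral-storage bookkeeping.
    result_file = workspace / "result.json"
    result_file.write_text(json.dumps({"status": "done", "job": event["job_id"]}))
    return json.loads(result_file.read_text())

# Local dry run against a temp directory standing in for the mount.
out = handler({"job_id": "job-42"}, None, mount_path=tempfile.mkdtemp())
print(out["status"])  # "done"
```

Because multiple functions can mount the same file system, a second function reading `result.json` from the same workspace gets the shared-workspace behavior described above for free.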
52:19 📢 Justin – “Thanks. Could have announced that last week.”
53:41 From developer desks to the whole organization: Running Claude Cowork in Amazon Bedrock
- Claude Cowork is a desktop application (macOS and Windows) that lets knowledge workers delegate research, document analysis, data processing, and report generation to Claude, with all model inference routed through Amazon Bedrock in your AWS account rather than Anthropic’s infrastructure.
- Pricing is consumption-based through your existing AWS agreement with no per-seat licensing from Anthropic, which is a notable distinction from Claude Enterprise and could make cost modeling more predictable for organizations with variable usage patterns.
- Enterprise security controls are central to the integration, including AWS IAM or Bedrock API key authentication, VPC endpoint network isolation, CloudTrail audit logging, and OpenTelemetry export to CloudWatch, with Anthropic receiving only aggregate telemetry that can be disabled.
- Setup relies on device management tools like Jamf, Microsoft Intune, or Group Policy to push a managed configuration to Claude Desktop, specifying the model ID, Bedrock inference profile, and auth method, which means IT teams control rollout rather than individual users configuring their own credentials.
- Organizations already using Claude Code in Amazon Bedrock can reuse the same infrastructure setup for Cowork, and both in-region and cross-region inference profiles are supported to address data residency requirements across different geographies.
56:51 📢 Justin – “The problem is that instead of building a proper enterprise backend that would do all the things they want, they partnered with Work OS. And so while Work OS has a bunch of things, it doesn’t have all the things that you would want, and this is a problem also for OpenAI, as well, because they also partner the same way. And Snowflake partners with them. But some have done a better job than others in how they lay out some of these tools.”
57:54 Get to your first working agent in minutes: Announcing new features in Amazon Bedrock AgentCore
- Amazon Bedrock AgentCore now includes a managed agent harness feature that lets developers define an agent’s model, tools, and instructions via API calls without writing orchestration code, reducing initial setup from days to minutes.
- It supports popular frameworks, including LangGraph, LlamaIndex, CrewAI, and Strands Agents.
- The new AgentCore CLI (available on GitHub at github.com/aws/agentcore-cli) keeps the full agent lifecycle in one workflow, covering local prototyping, deployment, and operations from a single terminal with CDK support and Terraform coming soon.
- AgentCore now includes persistent session state via a durable filesystem, enabling agents to suspend mid-task and resume where they left off, which makes human-in-the-loop workflows practical without custom storage plumbing.
- Pre-built coding agent skills give tools like Claude Code and Kiro curated knowledge of AgentCore best practices rather than just raw API access, with plugins for Codex and Cursor coming by the end of April.
- The managed agent harness is in preview across four regions (Oregon, N. Virginia, Sydney, Frankfurt) with no additional charge for the CLI, harness, or skills beyond standard resource consumption.
- Full pricing details are here.
58:49 📢 Ryan – “This is a great feature; this now makes it competitive with Vertex AI’s AgentBuilder, and so now it’s a useable option on Amazon. Awesome.”
GCP
Pre-Next Announcements
59:40 Gemini 3.1 Flash TTS: New text-to-speech AI model
- Gemini 3.1 Flash TTS is now available in preview across three surfaces: the Gemini API and Google AI Studio for developers, Vertex AI for enterprises, and Google Vids for Workspace users, giving GCP customers multiple integration paths depending on their use case.
- The model scored an Elo of 1,211 on the Artificial Analysis TTS leaderboard based on blind human preference testing, and was placed in the top quadrant for balancing speech quality with low cost, though specific per-character or per-request pricing was not disclosed in the announcement.
- A new audio tags system lets developers embed natural language commands directly into text input to control vocal style, pace, tone, and accent at a granular level, including mid-sentence expression changes, which reduces the need for custom voice training pipelines.
- The model supports native multi-speaker dialogue and more than 70 languages with localized style and accent controls, making it a practical option for developers building global or multilingual audio applications.
- All generated audio is automatically watermarked using Google’s SynthID technology, embedding an imperceptible signal that allows detection of AI-generated content, which is a relevant consideration for enterprises with compliance or content authenticity requirements.
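The audio-tags idea above boils down to interleaving natural-language style directions with the text itself. Here's a toy helper showing the concept; the square-bracket tag syntax is purely an assumption for illustration, not the documented Gemini API format.

```python
def with_audio_tags(segments):
    """Join (tag, text) pairs into a single TTS input string.

    Each tag is a natural-language direction ("whispering",
    "speed up, excited") applied from that point onward --
    which is how mid-sentence expression changes would work.
    The [square-bracket] syntax is illustrative only; check the
    Gemini API docs for the real tag format.
    """
    return " ".join(f"[{tag}] {text}" if tag else text
                    for tag, text in segments)

prompt = with_audio_tags([
    ("calm, measured narrator", "Welcome to episode 352 of The Cloud Pod,"),
    ("suddenly excited",        "where the weather is always cloudy!"),
])
print(prompt)
```

The point of the design is that style control lives in the input string rather than in a separate voice-training pipeline, so the same model call handles tone, pace, and accent shifts.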
1:00:01 The Gemini App is now available on Mac OS
- Google has released a native Gemini app for macOS, available free to all Gemini users on macOS 15 and above, downloadable at gemini.google/mac.
- This is a desktop client rather than a GCP infrastructure announcement, so its relevance to enterprise GCP customers is indirect.
- The app includes a screen-sharing feature that lets users pass local files and on-screen content directly to Gemini for context-aware assistance, which could be useful for analysts or developers reviewing complex outputs without leaving their workflow.
- A keyboard shortcut (Option + Space) surfaces Gemini from any application, positioning it as a system-level assistant similar to Spotlight, aimed at reducing context-switching during tasks like spreadsheet work or document drafting.
- The app integrates with Google’s existing generative media tools, including image generation via Imagen (Nano Banana) and video generation via Veo, giving creative users access to those capabilities without opening a browser.
- Google has indicated this initial release is a foundation for a broader desktop assistant strategy, with additional features planned, so organizations evaluating AI assistant tooling for their teams should monitor how this evolves alongside Workspace and GCP integrations.
1:01:33 Create Expert Content: Deploying a Multi-Agent System with Terraform and Cloud Run
- Google’s Dev Signal is a four-part tutorial series showing how to build and deploy a production multi-agent system using Google ADK, MCP, Vertex AI memory bank, and Cloud Run, with the full code available at the GoogleCloudPlatform devrel-demos GitHub repository.
- The deployment architecture uses Terraform to provision least-privilege service accounts, Artifact Registry, and Secret Manager integrations, following the Agent Starter Pack patterns to avoid common security pitfalls like over-permissioned default compute accounts.
- Observability is handled through OpenTelemetry integration with a single otel_to_cloud=True flag in the FastAPI server, which exports agent traces to Cloud Console showing LLM invocations and MCP tool calls, though production traces are sampled, so targeted evaluation runs are needed for full request visibility.
- The system distinguishes between two types of monitoring: system traces for identifying latency and timeout issues at scale, and reasoning traces for targeted evaluation of specific agent decisions, which is a practical distinction teams often miss when moving prototypes to production.
- Pricing for this stack depends on Cloud Run usage, Vertex AI memory bank calls, and Secret Manager API requests, all billed separately at standard GCP rates, so teams should factor in the multi-service cost model when estimating production expenses.
1:02:05 Google Next: the Conference
- 32K attendees, 3 keynotes, 25 spotlights, 700+ breakouts, 260 announcements (yeah, we counted.)
Justin:
- Wiz + Google Cloud Security/Product Offering
- Antigravity IDE + Gemini CLI (agent mode) enhancements
- Data Agent Kit with VS Code / Claude Code and Gemini CLI (close but no cigar)
- Ironwood TPU GA and/or dedicated Inference-based CHIP
Ryan
- Gemini 3.1 Pro GA & Teasing Gemini 3.5 or 4 or future model
- Enhancements with agents and Agentic (THE ENTIRE CONFERENCE)
- VMware interruption based on Kubernetes? (Opposite of Tanzu)
Matt
- Default Guardrails in AI in general. How Gemini will have guard rails via Vertex.
- Agent Identity, Agent Gateway, and Model Armor
- Agentic coding tooling and how developers are leveraging Agentic (SDLC)
- Data Agent Kit & Agentic Task Force
- 3 Non-AI Announcements (at the conference, but not on stage, so…)
This is genuinely the best we’ve ever done. Time to go buy a lotto ticket and lose.
Runners-Up
- A2A protocol 1.0 released – Donated to CNCF
- Turboquant Ships in Vertex AI
- Something waymo
- BigQuery AI Agents – Part of Data Agent Kit
- Gemini 3.1 Flash GA
- Axion Gen 2
- Nano Banana updates
- Sovereign Cloud AI
- Gemini Robotics API Preview
- Hugging Face
- AWS Activate type program
- AP2 Payment Protocol
- AI in Android
- Gemini + Boston Dynamics
- Glasswing Answer
How many times was AI said on stage?
JUST THE FIRST KEYNOTE WAS 132 Times!!
2nd Keynote: 55 Times
Matt – 99
Ryan – 75
Justin – 115 Winner
That makes Justin the overall winner for this year’s NEXT predictions.
Here’s our Claude-based tier ranking of the 260 announcements:
1:09:21 TIER S — Headline
Agent platform (Vertex AI evolves)
- The story of the keynote. Vertex AI is being repositioned as the Gemini Enterprise Agent Platform
- 16 named sub-features in three buckets:
- Build: ADK (graph-based sub-agent networks), Agent Studio (no-code → ADK export), Agent Designer
- Run: Agent Runtime (sub-second cold starts), Agent Sandbox (now everyone), Memory Bank, Sessions, long-running agents
- Govern + Optimize: Agent Identity (cryptographic ID per agent), Agent Registry, Agent Gateway, Anomaly Detection, Security dashboard, Simulation, Evaluation, Optimizer
- Strategic frame: the agent is now the unit of work, not the model call
- Hot takes to fight over:
- Is “delegating business outcomes” the new “infrastructure as code”?
- Does the 16-feature stack feel cohesive or like marketing bolt-ons?
- Does this simplify Vertex’s SKU sprawl or make it worse?
- Matt’s guardrails prediction lands naturally inside the Govern bucket.
1:10:27 Customer scale — agents actually in production
- Strongest competitive flex of the keynote. Pick 4-5 to land on-mic:
- Mars — Gemini Enterprise as the primary AI operating system for the global workforce (the headline customer)
- Merck — agentic platform across R&D, manufacturing, commercial; 75K employees
- GE Appliances — 800+ agents across manufacturing, logistics, supply chain
- Tata Steel — 300+ specialized agents in 9 months
- Deutsche Telekom MINDR — 95%+ reduction in event management times (best ROI quote)
- Citadel Securities — TPU research workloads 4x faster, 30% lower cost, days → minutes
- Highmark Health Sidekick — $27.9M in value in 2025 alone
- Frame: Last year was “agents are coming,” this year was “here’s the receipts”
- Hot take: name the equivalent customer slate from AWS, Azure, or Snowflake. You can’t.
- Skip if tight on time: Home Depot Magic Apron, Macy’s “Ask Macy’s”, Papa John’s, Virgin Voyages Rovey, Capcom, Citi Sky, Vodafone, Unilever
1:11:44 TPU 8t and 8i (the silicon split)
- 8t (training): ~3x compute vs Ironwood
- 8i (inference): purpose-built, 80% better perf/$, optimized for MoE + agentic workloads
- TorchTPU: native PyTorch, full Eager Mode — kills the JAX-only friction
- Strategic: only hyperscaler shipping dedicated inference silicon this generation
- Practitioner angle: agent workloads (lots of small inference calls) tilt economically toward Google if 80% perf/$ holds in production
- Justin’s prediction wins twice — he specifically called the dedicated inference chip
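The "80% better perf/$" claim is easy to sanity-check with back-of-the-envelope arithmetic. The baseline price below is invented purely for illustration; only the 80% figure comes from the announcement.

```python
# Hypothetical baseline: $1.00 per 1M inference tokens (made up).
# Only the 80% perf-per-dollar figure comes from the announcement.
baseline_cost_per_m_tokens = 1.00
perf_per_dollar_gain = 0.80  # "80% better perf/$"

# 1.8x the work per dollar means each unit of work costs 1/1.8 as much.
new_cost = baseline_cost_per_m_tokens / (1 + perf_per_dollar_gain)
reduction = 1 - new_cost / baseline_cost_per_m_tokens
print(f"${new_cost:.3f} per 1M tokens, a {reduction:.0%} cost reduction")
```

So if the figure holds in production, agent workloads making lots of small inference calls would see per-call costs drop by roughly 44%, which is the economic tilt the bullet above is pointing at.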
1:12:22 TIER A — Strong second tier
Wiz expands (multi-cloud agent visibility)
- Lead with: acquisition formally closed
- Wiz AI-APP — code-to-cloud-to-runtime AI Application Protection Platform
- Killer move: Wiz now supports AWS AgentCore, Azure Copilot Studio, Salesforce Agentforce, Databricks
- Google is selling security to customers who’ll never run a workload on GCP
- Different posture than they’ve had historically
- Other Wiz news worth a mention:
- Inline AI security hooks in IDEs
- Wiz Skills — validated attack-surface findings exposed to coding agents for auto-remediation
- AI-Bill of Materials — auto-inventory of every AI framework, model, IDE extension across your environment (shadow-AI killer)
- Lovable vibe-coding integration (security scanning inside Lovable)
- Hot take: most strategically interesting acquisition payoff Google has shipped in years.
1:13:46 Partner fund — $750M + Forward-Deployed Engineers
- $750M innovation fund for partner agent development
- Agent Marketplace + Agent Gallery — 70+ partner-built agents at launch
- Accenture, Adobe, Atlassian, Deloitte, Lovable, Oracle, Palo Alto, Replit, S&P Global, Salesforce, ServiceNow, Workday
- Forward-Deployed Engineers with Accenture, Deloitte, McKinsey — Google making its own engineers available through partner GTM
- Hot take: this is a Palantir-style move. Google admitting agent adoption needs hand-holding and putting money + bodies behind it
- Open question: Does this reshape the SI economics, or is it just GTM theater?
1:14:48 Antigravity + Data Agent Kit + Gemini 3.1 Pro
- Gemini 3.1 Pro in preview across Vertex / Gemini Enterprise / Antigravity / Android Studio / Gemini CLI / AI Studio
- Data Agent Kit — portable suite of skills, MCP tools, plugins; turns VS Code and Gemini CLI into native data workspaces
- Full-stack vibe coding from AI Studio → Cloud Run is now GA (Firestore + auth out of the box)
- Hot take: this is the developer story. Cursor / Claude Code / Replit competitors take note.
- Justin and Ryan both have prediction wins here
1:15:25 Agentic Data Cloud — Knowledge Catalog + Cross-cloud Lakehouse + Spanner Omni
- Knowledge Catalog — universal context engine; maps business meaning across the data estate. Foundation for accurate agent execution.
- Cross-cloud Lakehouse (BigLake renamed) — Iceberg REST Catalog, federation with AWS Glue / Databricks / Snowflake / SAP, cross-cloud caching cuts egress
- Spanner Omni — Spanner runs multi-cloud, on-prem, even on a laptop
- This is the most underrated announcement of the keynote
- Fight over: is this the new Aurora-anywhere? Does it actually pull workloads off RDS / Cosmos?
- Lakehouse federation for AlloyDB — live joins between transactional + analytical without ETL.
1:17:17 TIER B — Solid block
Workspace AI — Workspace Intelligence + Studio
- Workspace Intelligence — unified semantic understanding across Docs / Slides / Gmail / projects / org domain knowledge
- Workspace Studio — no-code agent builder; skills deployable across Workspace
- M365 → Workspace migration tool — competitive shot at Microsoft, easy to move emails/files/conversations
- Sovereign controls + client-side encryption — lock processing to US/EU; CSE means even Google can’t see
- Auto browse with Gemini in Chrome Enterprise (US)
1:17:53 Cloud Run grew up
- Full-stack vibe coding deploy from AI Studio (GA)
- NVIDIA RTX PRO 6000 Blackwell support — run 70B+ parameter models without managing GPU infra, scales to zero
- Billing caps (long-requested!) — set max monthly spend, resources de-activate when hit
- Cloud Run sandboxes for ephemeral isolated agent execution
- SSH into running containers (preview)
- Hot take: Cloud Run is positioning itself as the default agent runtime, period
Gemini Enterprise for CX
- Shopping agent + Food Ordering agent (Papa John’s first user)
- Omnichannel Gateway — agent context across web / mobile / voice
- Agent Assist — coaching mode for human agents in complex situations
1:19:04 BigQuery AI
- AI.PARSE_DOCUMENT — single SQL function for OCR + layout + chunking via Gemini’s layout parser
- TabularFM — zero-shot regression/classification, no feature engineering
- BigQuery Graph — entity/relationship modeling natively in the warehouse
- Reverse ETL — one-click sync from lakehouse to AlloyDB/Spanner for low-latency serving
- Connected Sheets with TimesFM — zero-shot forecasting in Google Sheets
- BigQuery hybrid search — semantic + full-text in one function
- 35% YoY perf improvement, lower processing cost
- Hot take: biggest “Monday morning” change for data teams in the entire keynote
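As a sketch of that "Monday morning" change, here's what calling the new AI.PARSE_DOCUMENT function from Python might look like. The function name comes from the announcement; the argument list, table, and column names are assumptions, so check the BigQuery docs for the real signature before using anything like this.

```python
# Compose a query against the announced AI.PARSE_DOCUMENT function.
# The single-argument call, table name, and column name are all
# assumptions for illustration -- verify against the BigQuery docs.
def parse_document_query(table, uri_column):
    return f"""
    SELECT
      {uri_column},
      AI.PARSE_DOCUMENT({uri_column}) AS parsed  -- OCR + layout + chunking
    FROM `{table}`
    """

sql = parse_document_query("my_project.docs.invoices", "gcs_uri")
print(sql)

# To actually run it (requires google-cloud-bigquery and credentials):
#   from google.cloud import bigquery
#   rows = bigquery.Client().query(sql).result()
```

The appeal is that document parsing becomes one SQL function in the warehouse instead of a separate OCR pipeline feeding results back into BigQuery.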
1:19:32 TIER C — Lightning round
Virgo Network
- Custom interconnect: 134K TPUs in a single fabric, 1M+ across sites
- A5X with NVIDIA Vera Rubin NVL72 — up to 960K GPUs cross-site
- The “we can scale further than anyone else” mic drop
1:20:05 Rapid storage
- Rapid Bucket — 15 TB/s bandwidth, 20M req/s, sub-millisecond latency, single-zone
- Rapid Cache (formerly Anywhere Cache) — 2.5 TB/s aggregate read; 2.2x faster checkpoint restores
- Managed Lustre at 10 TB/s throughput; 2.6x faster checkpoints
1:20:54 Axion expands
- N4A GA — 2x price/perf vs x86; 30% better perf/$ for GKE Agent Sandbox vs other hyperscalers
- C4A.metal preview — first Axion bare metal (Android dev, automotive sim, custom hypervisors)
- Confidential Computing on G4 (Blackwell) + C4 (Granite Rapids) — confidential AI workloads
1:21:54 Fraud Defense
- reCAPTCHA evolves into a platform that distinguishes bots, humans, AND agents
- Agent-specific capabilities coming for the digital commerce journey (account → payment → checkout)
- Closest thing in the wrap-up to the AP2 protocol prediction nobody hit
1:21:50 Post-quantum crypto
- KMS Quantum Safe Key Imports (preview)
- PQC in Cross-Cloud Network
- Boring but important — Google front-running the regulatory ask
1:22:00 GKE upgrades
- 4x faster node startup, 80% faster pod startup, 5x faster model loading
- GKE hypercluster — single control plane, millions of accelerators, multi-region (private GA)
- Predictive latency boost in GKE Inference Gateway — up to 70% lower time-to-first-token
- KV Cache tiering across RAM / Local SSD / Cloud Storage / Lustre
- RL Scheduler, RL Sandbox, RL Observability for reinforcement learning workloads
1:22:33 Three themes that emerged
- Agent platform is the new operating system. Vertex’s rebrand to Gemini Enterprise Agent Platform isn’t cosmetic — Google restructured the portfolio so the unit of work is an agent, not a model call.
- Wiz is now Google’s multi-cloud trojan horse. Supporting AWS AgentCore + Azure Copilot Studio + Salesforce Agentforce means Google is happy to sell security to customers who’ll never run on GCP. New posture.
- Customer scale is the real flex. Mars, Merck (75K employees), GE Appliances (800 agents), Tata Steel (300 in 9 months), Deutsche Telekom (95% MTTR reduction). Other hyperscalers can match the silicon. They can’t yet match this deployment depth on stage.
1:23:00 Conspicuously absent
- A2A 1.0 / CNCF donation — third-party press reported it, not in the official wrap-up
- No Boston Dynamics or Waymo crossover
- No Gemini Robotics API preview
- No Hugging Face deal
- No AP2 Payment Protocol (Cloud Fraud Defense is the closest cousin)
- No Nano Banana update
- No Glasswing answer
- No Turboquant in Vertex
1:23:24 Less important stuff
- Bigtable in-memory; Memorystore for Valkey 9.0
- AlloyDB AI search at 10B vectors; new AlloyDB AI functions
- Firestore Enterprise edition (full-text + geospatial + JOINs)
- Firebase SQL Connect; Firebase Phone Number Verification
- NetApp Volumes Flex Unified + ONTAP-mode
- Filestore for GKE; Hyperdisk Exapools / ML / Balanced improvements
- Cloud WAN expansion to 25+ countries; NCC Gateway with Palo Alto + Symantec
- Cloud Armor managed rules (Thales Imperva); Cloud NGFW Advanced Malware Sandbox
- Private Service Connect: 40+ published services, endpoint-based security
- Looker Studio renamed to Data Studio; Looker Dashboard Agents; AI assistants
- CME Group ultra-low-latency partnership for financial exchanges
- Google for Startups AI Agents Challenge ($90K prize, $500 credits)
Google Cloud Next 2026 Wrap Up
- Google Cloud Next 26 featured 260 announcements centered on what Google calls the “Agentic Era,” with the headline being the Gemini Enterprise Agent Platform, which replaces Vertex AI as the primary platform for building, scaling, and governing AI agents with new components like Agent Runtime (sub-second cold starts), Agent Memory Bank, Agent Identity with cryptographic IDs, and Agent Gateway for fleet management.
- On the infrastructure side, Google announced 8th-generation TPUs split into two variants: TPU 8t for training workloads delivering roughly 3x higher compute than the previous generation, and TPU 8i for inference and reinforcement learning with up to 80% better performance-per-dollar, alongside new Axion-based N4A VMs now generally available at up to 2x better price-performance than comparable x86 VMs.
- The Agentic Data Cloud introduces a Knowledge Catalog as a universal context engine, a Cross-Cloud Lakehouse (formerly BigLake) built on Iceberg REST Catalog spanning AWS and Azure, and Spanner Omni, which extends Spanner’s globally consistent database to run on-premises or on other clouds, addressing the challenge of agents needing consistent data access across fragmented environments.
- Security got notable attention with the completed Wiz acquisition now reflected in integrated tooling, Model Armor expanding to Agent Gateway and Firebase, a new Google Cloud Fraud Defense platform (evolved from reCAPTCHA) now generally available, and post-quantum cryptography support in Cloud KMS for quantum-safe key imports, all aimed at securing agentic workloads specifically.
- Storage announcements include the new Cloud Storage Rapid Bucket delivering over 15 TB/s bandwidth with sub-millisecond latency now generally available, Managed Lustre Dynamic tier priced at $0.06/GB-month, and Hyperdisk ML throughput increased to 2 TB/s aggregate, all targeting the checkpoint and model loading bottlenecks common in large-scale AI training.
- Google Cloud Next ’26 centered on moving AI into production at enterprise scale, with the Gemini Enterprise platform serving as the connective tissue across a unified stack spanning chips, models, data, agents, and security. The Gemini Enterprise Agent Platform is essentially a rebranded and expanded Vertex AI with new tools for building, scaling, governing, and optimizing agents.
- On the infrastructure side, Google announced two new TPU 8 variants with distinct purposes: TPU 8t for training scales to 9,600 TPUs with 2 petabytes of shared memory, while TPU 8i for inference delivers 80% better performance per dollar than the prior generation using a new Boardfly topology. The new Virgo Network and Google Cloud Managed Lustre at 10 terabytes per second throughput round out the infrastructure updates.
- The Agentic Data Cloud rebrands and expands Google’s data platform with notable additions, including a Knowledge Catalog for contextual grounding, a Lightning Engine for Apache Spark claiming 4.5x speed over open-source alternatives, and a Cross-Cloud Lakehouse based on Apache Iceberg that lets customers query data in AWS or Azure without copying it.
- Security got substantial attention with three new agents in Google Security Operations for threat hunting, detection engineering, and third-party context enrichment, all currently in preview. The Wiz acquisition is now complete, and new Wiz integrations include inline security scanning in IDEs, an AI Bill of Materials for inventorying AI frameworks and models, and a Lovable platform integration generally available in May.
- Google Workspace is being repositioned from a productivity suite into what Google calls a semantic intelligence layer, with new features like AI Inbox in Gmail, Drive Projects as an active collaborator, and an Ask Gemini interface in Google Chat that can take actions like scheduling meetings or creating documents directly from the chat window.
- Google Cloud Next Day 2 centered on the Gemini Enterprise Agent Platform, positioned as the evolution of Vertex AI, offering tools to build, scale, govern, and optimize autonomous agents. The keynote used a multi-agent marathon route planner for Las Vegas as a practical demonstration of the platform’s capabilities.
- The Agent Development Kit, remote MCP servers, and Agent Runtime work together to give agents instructions, skills, and tools, while Agent Registry functions as a DNS-like directory for discovering and connecting deployed agents across a system.
- Agent Platform Sessions and Memory Bank address a common problem in agentic systems by allowing agents to retain learned knowledge across interactions without stuffing raw text into every request, which improves performance over time.
- Debugging and observability are handled through Agent Runtime trace view and Gemini Cloud Assist, which let developers use natural language to investigate logs and pinpoint issues, with fixes applied directly from an IDE connected via MCP and redeployed automatically.
- Security is addressed through Agent Identity, which gives each agent a unique, immutable credential, and Agent Gateway, which enforces IAM policies to restrict agent actions to approved sources. Wiz integration adds code and infrastructure scanning with remediation suggestions, and notably supports Anthropic Claude Code as an alternative tooling option alongside Google’s own tools.
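The Sessions + Memory Bank pattern described above is worth making concrete: keep compact distilled facts per user instead of replaying raw transcripts every turn. This is a toy sketch with invented storage and budget logic; nothing here reflects the managed service's actual API.

```python
class MemoryBank:
    """Toy version of persistent agent memory: retain short 'facts'
    per user rather than stuffing full conversation history into
    every request. Storage and the recency budget are stand-ins
    for the managed service, purely for illustration."""

    def __init__(self):
        self._facts = {}  # user_id -> list of short strings

    def remember(self, user_id, fact):
        self._facts.setdefault(user_id, []).append(fact)

    def context(self, user_id, budget=3):
        # Only the most recent few facts go into the prompt,
        # not the raw transcript -- that's the token savings.
        return "; ".join(self._facts.get(user_id, [])[-budget:])

bank = MemoryBank()
for f in ["prefers Terraform", "region: us-central1",
          "team: data-eng", "uses Cloud Run"]:
    bank.remember("ryan", f)
print(bank.context("ryan"))  # region: us-central1; team: data-eng; uses Cloud Run
```

The performance-over-time claim in the bullet above follows from exactly this: each session starts from distilled context instead of an ever-growing raw history.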
Partner-built agents available in Gemini Enterprise
- Google has added partner-built agents from its Agent Marketplace directly into the Agent Gallery inside the Gemini Enterprise app, with partners including Salesforce, ServiceNow, Workday, Oracle, Atlassian, and Palo Alto Networks, among others. Each agent must pass a four-step evaluation covering basic functionality, output accuracy, autonomous execution, and enterprise standards to earn the Google Cloud Ready – Gemini Enterprise designation.
- The governance model is worth noting for enterprise IT teams: employees can browse and request agents, but administrators retain approval control over deployments and can manage access at a granular level. Every agent also gets a cryptographically secure identity for audit trail purposes, and Agent Gateway plus Model Armor screen traffic to prevent data from being used for model training.
- Google announced a $750M partner fund for agentic development alongside this launch, and partners selling through the Marketplace are reportedly closing deals 112% larger, with purchasing cycles accelerating by up to 50%. This creates a clear commercial incentive for ISVs to build and list agents on the platform.
- The agent catalog covers a wide range of industries and functions, including supply chain optimization from Accenture, tariff management from Deloitte, financial analysis from S&P Global, identity security from Saviynt, and healthcare intake workflows from Synthpop. This breadth suggests Google is positioning the Agent Gallery as a general-purpose enterprise AI distribution channel rather than a niche tool.
- Pricing for individual agents will vary by partner and likely requires existing subscriptions in some cases, such as the Alteryx AI Insights Agent requiring an Alteryx One subscription. Gemini Enterprise offers a 30-day free trial at console.cloud.google.com/freetrial for organizations wanting to evaluate the platform before committing.
Level Up Your Agents: Announcing Google’s Official Skills Repository
- Google announced an official Agent Skills repository at Cloud Next 2026, launching with 13 skills covering products like BigQuery, Cloud Run, GKE, Firebase, and Gemini API, plus Well-Architected Framework pillars and recipe-style guides for common tasks. The repository is available at github.com/google/skills and is free to use.
- Agent Skills address a practical problem called context bloat, where loading too much information into an AI agent’s context window increases token costs and degrades model performance. Skills are compact Markdown-based documents that agents load only when needed, rather than pulling in full documentation sets.
- The format is described as open, meaning it is not locked to Google’s own tooling. Skills work with Google’s Antigravity and Gemini CLI agents as well as third-party agents, and installation is handled via a single npx command.
- The announcement positions Skills as a complement to existing approaches like the Google developer documentation and the MCP server, giving practitioners a lighter-weight alternative when full real-time documentation grounding is unnecessary or too costly.
- For teams building AI agents on top of Google Cloud services, this provides a structured way to keep agents accurate on GCP-specific APIs and best practices without manual prompt engineering or expensive context loading. Google indicated that more skills will be added in the coming weeks.
Introducing Gemini Enterprise Agent Platform
- Google launched the Gemini Enterprise Agent Platform, which consolidates Vertex AI capabilities with new agent-specific tooling for building, scaling, governing, and optimizing AI agents. All future Vertex AI services and roadmap updates will be delivered exclusively through this platform rather than as a standalone service.
- The platform introduces four governance-focused components: Agent Identity assigns each agent a unique cryptographic ID for auditable trails, Agent Registry maintains a central library of approved tools, Agent Gateway enforces security policies across environments, and Agent Anomaly Detection flags unusual reasoning using an LLM-as-a-judge framework.
- Agent Runtime now supports long-running agents that maintain state for multiple days, with sub-second cold starts and a Memory Bank for persistent context across sessions. This addresses a practical gap where most agent frameworks previously lost context between interactions.
- Developers can access over 200 models through Model Garden, including Gemini 3.1 Pro, Gemma 4, and third-party models like Anthropic Claude, with a low-code Agent Studio path and a code-first Agent Development Kit that processes over six trillion tokens monthly. Agent Garden provides pre-built templates for use cases like invoice processing, financial analysis, and code modernization.
- Real-world deployments mentioned include Comcast rebuilding its Xfinity Assistant, Color Health using agents to schedule cancer screenings, and PayPal using Agent Payment Protocol for secure agent-based commerce. Pricing details are not specified in the announcement and would need to be confirmed through the Google Cloud console at console.cloud.google.com/agent-platform/overview.
Gemini Cloud Assist at Next ‘26
- Gemini Cloud Assist is shifting from a reactive assistant to a proactive operations platform, using an agentic architecture to handle tasks like infrastructure troubleshooting, cost anomaly detection, and application design without waiting for user prompts.
- The redesigned Application Design Center lets teams describe infrastructure goals in plain language and get back visual architectures with deployable Terraform templates, integrated with Security Command Center to enforce organizational policies from the start.
- A 24/7 FinOps agent monitors for cost anomalies and correlates spending spikes with specific triggers like auto-scaling events or new resource creation, allowing teams to query cost data in natural language instead of manually aggregating reports.
- MCP server support extends Gemini Cloud Assist beyond the Google Cloud console into IDEs, CLIs, and third-party tools like ServiceNow and Slack, reducing context switching for development and operations teams.
- Petco reported a 60% reduction in Google Cloud-related questions to their cloud team after adopting Gemini Cloud Assist, suggesting meaningful productivity gains for platform teams supporting large developer organizations. Pricing details are not specified in the announcement, so teams should check the Gemini Cloud Assist admin console for current costs.
Unify analytical and operational data for AI
- Google announced what it calls an “Agentic Data Cloud” at Google Cloud Next, focused on eliminating the separation between operational and analytical data systems. The goal is to let AI agents query both live transactional data and historical analytical data without complex data movement pipelines.
- Three specific capabilities are now available or in preview: Lakehouse federation for AlloyDB lets operational systems query BigQuery data directly, Reverse ETL for BigQuery pushes analytical results into AlloyDB, Bigtable, or Spanner with sub-millisecond read latency, and the Spanner Columnar Engine is now GA with analytical queries running up to 200 times faster than standard transactional queries.
- Datastream now supports real-time Change Data Capture into Apache Iceberg tables from AlloyDB, Cloud SQL, Spanner, and Oracle, streaming operational changes directly into the open Lakehouse format for immediate use in BigQuery ML and feature engineering workflows.
- Knowledge Catalog, formerly Dataplex, is being extended with integrations across AlloyDB, BigQuery, Bigtable, Cloud SQL, and Spanner to provide a unified metadata layer. The intent is to reduce inconsistent data definitions that can cause AI agents to produce inaccurate outputs.
- Native vector and full-text search are being embedded directly into AlloyDB, Bigtable, Cloud SQL, Firestore, and Spanner, and graph federation is being added across BigQuery and Spanner. This removes the need to move data into separate search or graph engines for hybrid retrieval and GraphRAG patterns. Pricing for these features is not specified in the announcement and would vary by service and usage.
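To make "hybrid retrieval" concrete: when vector and full-text search live in the same engine, a query can fuse a semantic similarity score with a keyword relevance score instead of stitching together two systems. Here's a minimal, self-contained Python sketch of that fusion (the tiny documents, the naive keyword scorer standing in for BM25, and the `alpha` blend weight are all illustrative assumptions, not Google's implementation):

```python
import math

def cosine(a, b):
    # Plain cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def keyword_score(query_terms, doc_terms):
    # Crude stand-in for BM25: fraction of query terms present in the doc.
    hits = sum(1 for t in query_terms if t in doc_terms)
    return hits / len(query_terms) if query_terms else 0.0

def hybrid_search(query_vec, query_terms, docs, alpha=0.5, k=2):
    # Fuse vector similarity and keyword relevance with a tunable weight.
    scored = []
    for doc in docs:
        score = (alpha * cosine(query_vec, doc["embedding"])
                 + (1 - alpha) * keyword_score(query_terms, doc["terms"]))
        scored.append((score, doc["id"]))
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

docs = [
    {"id": "a", "embedding": [1.0, 0.0], "terms": {"alloydb", "vector"}},
    {"id": "b", "embedding": [0.0, 1.0], "terms": {"spanner", "graph"}},
    {"id": "c", "embedding": [0.7, 0.7], "terms": {"vector", "graph"}},
]
print(hybrid_search([1.0, 0.0], ["vector", "search"], docs, k=2))  # → ['a', 'c']
```

The point of having both scores in one engine is exactly this blend: a document that matches on meaning but not keywords (or vice versa) can still rank, without an ETL hop to a separate search service.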
Introducing the Google Cloud Knowledge Catalog
- Google is evolving its existing Dataplex service into the Knowledge Catalog, a context engine designed to feed AI agents accurate business semantics, data relationships, and verified SQL patterns to reduce hallucinations and improve query accuracy.
- The service aggregates metadata from a broad range of sources, including BigQuery, AlloyDB, Spanner, Cloud SQL, and third-party catalogs like Collibra and Atlan, plus enterprise platforms like SAP, Salesforce, and Workday through a preview feature called Enterprise Connectivity.
- A notable enrichment capability is Smart Storage, which automatically tags and embeds metadata for files as they land in Google Cloud Storage buckets, making unstructured data immediately discoverable by agents without manual curation steps.
- The search layer uses hybrid retrieval with access control awareness, meaning agents can only retrieve data assets they are explicitly authorized to see, which addresses a practical governance concern when deploying autonomous agents at enterprise scale.
- Bloomberg Media is cited as an early customer, using Knowledge Catalog to power an internal Data Access AI Agent that translates business questions against their data lake. Pricing details are not publicly listed, so teams evaluating this should check cloud.google.com/products/knowledge-catalog for current information.
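The access-control-aware retrieval described above boils down to a simple but important ordering: filter by the caller's permissions *before* ranking, so an agent can never leak an asset its principal couldn't see. A toy Python sketch (the asset records and ACL shape are made up for illustration):

```python
def acl_filter(principal, assets):
    # An agent inherits the permissions of the principal it acts for:
    # only assets whose ACL lists that principal are even candidates.
    return [a for a in assets if principal in a["acl"]]

def answer(principal, query, assets):
    visible = acl_filter(principal, assets)
    # Rank only the visible assets (real ranking elided; simple name match here).
    return [a["name"] for a in visible if query in a["name"]]

assets = [
    {"name": "sales_forecast", "acl": {"analyst", "cfo"}},
    {"name": "sales_payroll",  "acl": {"cfo"}},
]
print(answer("analyst", "sales", assets))  # → ['sales_forecast'] — payroll never surfaces
```

Filtering first matters because an agent that ranks everything and redacts afterward can still leak information through scores, counts, or snippets.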
The future of data lakehouse for the agentic era
- Google Cloud announced a next-generation cross-cloud Lakehouse built around Apache Iceberg, offering fully managed Iceberg storage with read/write interoperability across BigQuery, Managed Apache Spark, and third-party engines like Databricks and Snowflake (Preview). The goal is to let teams process the same data across multiple engines without duplication, which Spotify is already doing across BigQuery and Dataflow.
- A new cross-cloud interconnect and caching capability (Preview) gives BigQuery and Managed Apache Spark high-performance access to data stored in AWS S3 Iceberg tables, with claimed price-performance comparable to AWS-native solutions. Catalog federation (Preview) extends this to AWS Glue, Databricks, SAP, and Snowflake, with Confluent Tableflow support coming later this year.
- The Lightning Engine for Apache Spark claims up to 2x price-performance over competing high-speed Spark alternatives using vectorized execution and optimized I/O, with no code changes required. This runs within Managed Service for Apache Spark, formerly known as Dataproc.
- Knowledge Catalog (formerly Dataplex) now provides always-on context for AI agents by continuously learning how enterprise data is used and mapping relationships within unstructured files. This feeds grounded context to agents built with tools like Agent Developer Kit and Model Context Protocol.
- Real-time change replication from Spanner, AlloyDB, and Cloud SQL into BigQuery is now GA, with Iceberg replication in Preview, enabling operational data to feed directly into lakehouse workloads. Pricing is not specified in the announcement and would vary based on storage, compute, and cross-cloud data transfer usage.
What’s New in the Agentic Data Cloud
- Google is rebranding and expanding Dataplex Universal Catalog into the Knowledge Catalog, which aggregates business context from third-party platforms like Salesforce, SAP, ServiceNow, and Workday, then uses hybrid search with access-control-aware retrieval so agents only act on data they are authorized to see.
- The new Google Cloud Data Agent Kit (Preview) drops into existing developer environments like VS Code, Gemini CLI, and Claude Code, automatically selecting frameworks like dbt, Spark, or Airflow and generating production-ready code, with three specialized agents for data engineering, data science, and database observability now available at various GA and Preview stages.
- Google is expanding MCP support across BigQuery, Spanner, AlloyDB, Cloud SQL, and Looker, using existing IAM policies and VPC Service Controls to govern agent interactions rather than requiring separate security configurations.
- The cross-cloud lakehouse now supports bi-directional federation with Databricks Unity Catalog, Snowflake Polaris, and AWS Glue Data Catalog using the open Iceberg REST Catalog standard, and Spanner Omni (Preview) extends the Spanner engine to run on-premises or across other clouds for the first time.
- On the performance side, Google is citing up to 2x price-performance improvement for Apache Spark via Lightning Engine, up to 34% cost reduction for BigQuery autoscaling workloads, sub-millisecond Bigtable reads via a new in-memory tier, and up to 10 terabytes per second throughput with Managed Lustre, though specific pricing details were not disclosed in the announcement.
- Cloud Storage Rapid is now generally available in two forms: Rapid Bucket, which uses Google’s internal Colossus system to deliver over 15 TB/s bandwidth and sub-millisecond latency, and Rapid Cache, which provides 2.5 TB/s aggregate read throughput for existing buckets with no code changes. The headline numbers for AI training are checkpoint writes 3.2x faster and restores 5x faster compared to traditional object storage.
- Google Cloud Managed Lustre now delivers up to 10 TB/s throughput, a 10x increase from last year, and adds a new Dynamic tier priced at $0.06 per GB per month that serves data from persistent disk rather than object-based caching to avoid performance degradation under load.
- Smart Storage adds automated metadata annotation directly in Cloud Storage, so objects get labels, extracted entities, and compliance signals attached at write time without custom pipelines. A new Cloud Storage MCP server lets AI agents read, write, and analyze Cloud Storage data using the standard Model Context Protocol, which reduces the need for separate retrieval layers.
- Storage Intelligence, already used by 70% of Google Cloud’s largest customers managing over 50 billion objects each, gets zero-configuration dashboards that surface cost anomalies and integrate Security Command Center’s data governance signals with no setup required, plus enhanced batch operations supporting multi-bucket actions on billions of objects at once.
- The ecosystem additions include NetApp Volumes Flex Unified, supporting both block and file protocols on the same storage pool with ONTAP API compatibility, Filestore for GKE scaling down to 100 GiB shares, and Google Cloud Backup and DR gaining agentic AI capabilities to autonomously audit and remediate backup coverage gaps with new GA support for AlloyDB and Filestore.
Introducing Virgo Network megascale data center fabric
- Google introduced Virgo Network, a specialized scale-out data center fabric designed for AI workloads, built on a flat two-layer non-blocking topology that reduces network tiers and latency compared to traditional data center architectures. It underpins the AI Hypercomputer platform and connects up to 134,000 TPU chips with up to 47 petabits per second of non-blocking bisection bandwidth in a single fabric.
- The architecture separates east-west accelerator traffic (handled by Virgo) from north-south storage and compute traffic (handled by the existing Jupiter network), allowing each layer to evolve independently without system-wide disruptions. This decoupling also means bandwidth dedicated to accelerator-to-accelerator communication is non-blocking and not competing with general data center traffic.
- Virgo delivers 4x the bandwidth per accelerator and 40% lower unloaded fabric latency compared to the previous generation, which matters specifically for latency-sensitive inference workloads and large synchronized training jobs where a single slow node can degrade the entire cluster.
- Reliability at this scale is addressed through independent switching planes for fault isolation, sub-millisecond telemetry for observability, and automated straggler and hang detection to minimize training job interruptions. Google frames this around maximizing “goodput,” meaning the useful work completed relative to total time, rather than just raw throughput.
- No pricing details were provided in the announcement, as Virgo Network is infrastructure-level and costs would surface through TPU and AI Hypercomputer product pricing rather than as a standalone purchasable service.
What’s new for Google Cloud databases at Next ’26
- Google announced Spanner Omni, a downloadable edition of Spanner that runs outside of Google Cloud, including on-premises data centers, other clouds, and edge environments. This gives organizations using Spanner’s distributed database capabilities more deployment flexibility without being locked into a single cloud region or provider.
- AlloyDB received notable vector search improvements, scaling to 10 billion vectors using Google’s ScaNN index and delivering up to 6 times faster vector queries compared to standard PostgreSQL HNSW indexes. The addition of native BM25 support, coming soon, enables hybrid search combining vector retrieval with full-text search in a single database.
- Managed remote MCP servers are now generally available for AlloyDB, Bigtable, Cloud SQL, Firestore, and Spanner, with preview support for Memorystore, Datastream, and Oracle Database at Google Cloud. This removes the operational burden of self-hosting Model Context Protocol infrastructure for teams building AI agents that need secure, reliable access to enterprise data.
- The lakehouse integration announcements bridge the gap between transactional and analytical workloads, with AlloyDB now able to query live BigQuery and Iceberg tables directly from the PostgreSQL data plane without data movement. Datastream also now supports continuous replication from AlloyDB to Iceberg tables, which is useful for real-time ML feature engineering pipelines.
- Bigtable is adding a new in-memory tier with sub-millisecond read latency as part of a new Enterprise Plus edition, and Memorystore for Valkey 9.0 is now generally available with a managed migration path from self-managed Redis. Both updates reflect Google’s push to offer managed caching and low-latency storage options with enterprise security features like ACLs and token-based authentication.
- Spanner Omni is a downloadable version of Google’s Spanner database now in preview, allowing deployment on-premises, across clouds, on Kubernetes clusters, or even a laptop, rather than being limited to Google Cloud infrastructure. The developer edition is available for free download today at the link in the show notes, with a commercial edition requiring direct contact with Google.
- On the technical side, Google had to replace two core Spanner dependencies to make this work. Colossus, Google’s proprietary distributed file system, was replaced with a software abstraction layer that writes to local file systems, and TrueTime’s atomic clock and GPS-based synchronization was replaced with a software-based alternative that still provides error-bounded time synchronization.
- Internal benchmarks show Spanner Omni can process millions of queries per second across petabytes of data in a single regional deployment, and it supports the full multimodal feature set, including SQL, graph, key-value, full-text search, vector search, and columnar analytics.
- Three primary use cases are emerging from early adopters: hybrid failover, where managed Spanner in Google Cloud serves as primary and Spanner Omni handles disaster recovery on-premises; a write-once-run-anywhere approach for ISVs and SaaS providers; and on-premises modernization for organizations with regulatory or data sovereignty requirements that prevent full cloud adoption.
- Pricing for the commercial edition is not publicly listed yet, so organizations interested in production use will need to engage Google directly at cloud.google.com/consulting/spanner-omni to discuss terms.
TPU 8t and TPU 8i technical deep dive
- Google’s eighth-generation TPUs split into two specialized chips: TPU 8t for large-scale pre-training and TPU 8i for inference and reasoning workloads. This specialization reflects a recognition that training and serving have distinct hardware bottlenecks that a single chip design cannot optimally address.
- TPU 8t introduces native FP4 support, SparseCore for embedding lookups, and a new Virgo Network fabric that can link over 134,000 chips with 47 petabits per second of non-blocking bandwidth. Combined with TPUDirect Storage and Managed Lustre 10T, Google claims 10x faster storage access compared to seventh-generation Ironwood TPUs.
- TPU 8i uses a new Boardfly network topology inspired by Dragonfly principles, reducing chip-to-chip communication from 16 hops to 7 hops in a 1,024-chip pod. This 56% reduction in network diameter directly benefits Mixture-of-Experts and reasoning models that require frequent all-to-all communication patterns.
- On performance-per-dollar, Google claims TPU 8t delivers 2.7x improvement over Ironwood for training, while TPU 8i delivers 80% improvement for low-latency inference on large MoE models. Both chips also deliver up to 2x better performance-per-watt, which matters for customers managing energy costs at scale.
- The software stack supports JAX, native PyTorch (currently in preview), Keras, and vLLM, with XLA handling hardware-specific translation transparently. Customers interested in access can submit an interest form at cloud.google.com/resources/tpu-interest, though pricing details have not been publicly disclosed.
Introducing Spend Caps and AI cost visibility at Next ’26
- Google Cloud announced Spend Caps in private preview, allowing FinOps and DevOps managers to set hard budget limits at the project level for services including AI Studio, Gemini Agent Platform, Cloud Run, Cloud Run Functions, and Maps. Unlike traditional budget alerts, Spend Caps automatically pause API traffic when a budget threshold is reached while leaving underlying resources intact, addressing the risk of runaway AI training jobs or unoptimized models draining budgets quickly.
- A new FinOps Explainability Agent, built on Gemini and accessible through Google Cloud Billing, autonomously analyzes AI cost drivers and answers natural language queries such as breaking down spend by API key or comparing input versus output token costs across specific Gemini models. This addresses the challenge of AI costs blending into general infrastructure spend, making ROI attribution more straightforward.
- Google reported that since launching Gemini Cloud Assist for FinOps, cost reporting adoption increased 75% and time spent on cost analysis decreased 18%, providing some baseline context for the value customers are seeing from AI-assisted billing tools.
- Two additional private previews were announced alongside Spend Caps: enhanced billing account hierarchies that aggregate spend across multiple billing accounts, including Other Eligible Services, and contract commitment reporting that shows burndown progress within Enterprise Agreements. Both features target larger organizations managing complex commercial arrangements with Google Cloud.
- Spend Caps are currently in private preview with a signup form available, and no specific pricing details were provided for the new FinOps tooling beyond its availability in the Google Cloud Billing console.
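The difference between a budget alert and a Spend Cap is enforcement: traffic stops at the threshold instead of a notification going out. A minimal Python sketch of that admit-or-pause behavior (event names and costs are made up; the real feature pauses API traffic at the project level, not per-call like this toy):

```python
def apply_spend_cap(events, cap):
    """Toy spend cap: admit API calls until cumulative cost would exceed the
    cap, then pause traffic while leaving underlying resources untouched."""
    total, admitted, paused = 0.0, [], False
    for name, cost in events:
        if paused or total + cost > cap:
            paused = True          # cap reached: new traffic is rejected
            continue
        total += cost
        admitted.append(name)
    return admitted, total, paused

events = [("train-job", 40.0), ("inference", 30.0), ("batch-eval", 50.0)]
print(apply_spend_cap(events, cap=80.0))
# → (['train-job', 'inference'], 70.0, True)
```

Note the runaway-job property the feature is aimed at: the 50-unit job that would blow through the cap is refused entirely, rather than billed and alerted on after the fact.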
Next ’26: Redefining security for the AI era with Google Cloud and Wiz
- Google Cloud announced three new security agents in Google Security Operations at Next ’26: a Threat Hunting agent, a Detection Engineering agent, and a Third-Party Context agent, all in preview. The existing Triage and Investigation agent has already processed over 5 million alerts, reducing the typical 30-minute manual analysis to 60 seconds.
- Wiz, now fully part of Google Cloud, is expanding its AI-Application Protection Platform to cover new agent studios, including AWS Agentcore, Microsoft Azure Copilot Studio, and Salesforce Agentforce, plus Databricks. New capabilities include inline AI security hooks for IDEs, agent-based remediation via Wiz Skills, and an AI Bill of Materials to inventory shadow AI tools across an environment.
- Google Cloud is introducing Agent Identity and Agent Gateway as part of the Gemini Enterprise Agent Platform, giving AI agents unique identities with scoped permissions and enforcing policy on all agent-to-agent and agent-to-tool traffic. Model Armor now integrates with Agent Gateway, LangChain, and Firebase to provide runtime protection against prompt injection and data leakage without code changes.
- On the data security side, Confidential Computing support is coming to G4 VMs with NVIDIA RTX PRO 6000 Blackwell GPUs and C4 VMs with Intel TDX, both in preview. KMS is also adding quantum-safe key imports in preview, addressing organizations starting to plan for post-quantum cryptography requirements.
- ReCAPTCHA is being rebranded and expanded into Google Cloud Fraud Defense, now generally available, with agent-specific capabilities for distinguishing bots, humans, and AI agents coming in preview. Chrome Enterprise is adding shadow AI reporting and AI-aware extension threat detection to help organizations manage unsanctioned AI tool usage at the browser level.
Looker updates for agentic BI at Next ’26
- Google announced Looker BI Agents at Cloud Next, introducing Dashboard Agents and Agentic Workflows that go beyond static answers to trigger downstream business actions, all grounded in the Looker semantic layer and existing enterprise governance frameworks.
- Several features moved to GA, including Embedded Conversational Analytics, Visualization Assistant, Self-service Explores with CSV and Excel blending, and CI/CD pipeline support, giving teams more production-ready options without waiting on preview limitations.
- The new MCP integration adds a managed MCP server native to Looker, and a VS Code extension introduces a LookML AI Agent that translates natural language descriptions into production-ready LookML code, reducing the technical barrier for model authoring.
- Knowledge Catalog integration in preview allows Looker to transform metadata into a semantic graph, which is positioned as a way to reduce AI hallucinations by giving agents the context needed to complete tasks autonomously.
- Pricing details were not disclosed in the announcement, so teams evaluating these features should check cloud.google.com/looker directly, particularly for the preview features, which may have different availability or cost structures once they reach GA.
Next ‘26: Announcing new partner-supported workflows for Google Security Operations
- Google Security Operations is expanding its partner ecosystem with 13 new integrations announced at Next ’26, bringing the total vendor count to over 300. The new partners span data ingestion, automated response, and bi-directional API workflows, covering gaps in areas like SAP logs, VMware ESXi threats, and application-layer attacks.
- Three distinct integration patterns are supported: data feed integrations that pre-map telemetry to Google’s Unified Data Model schema, response integrations that automate alert triage and case management, and bi-directional API workflows that let partner platforms pull Chronicle detections without requiring analysts to switch consoles.
- Notable technical additions include Synqly Mesh offering bi-directional normalization between UDM and the Open Cybersecurity Schema Framework (OCSF), and Contrast Security streaming verified runtime attack telemetry to surface confirmed application exploits as cases correlated with WAF and EDR signals.
- AI-assisted triage shows up across multiple integrations, with Torq applying agentic AI to filter detections and autonomously execute response actions like endpoint isolation, and Prophet Security using natural language threat hunting with bidirectional sync back to Google Security Operations.
- Vendors interested in joining the ecosystem can download the Google Security Operations Build Partner Guide and request a development environment through the Google Cloud Security Tech Partners team. Pricing for individual integrations is not specified in the announcement and would vary by partner.
The new Gemini Enterprise: one platform for agent development
- Google rebranded and expanded Vertex AI into Gemini Enterprise Agent Platform, consolidating model access, agent development, governance, and deployment tooling into a single system aimed at enterprise-scale agent management.
- The platform introduces Agent Identity, which assigns each agent a unique cryptographic ID for auditability, alongside Agent Gateway for securing agent-to-agent communications and Model Armor for protection against prompt injection and data leakage.
- A new Memory Bank and Memory Profiles feature gives agents persistent long-term context across sessions, allowing them to retain user preferences and historical interactions rather than starting fresh each time.
- The Gemini Enterprise app adds a no-code Agent Designer for non-technical users, a centralized Inbox for monitoring long-running agents, and a Projects workspace that preserves team context as a persistent company asset rather than individual chat history.
- The partner ecosystem integration brings agents from Adobe, Salesforce, ServiceNow, Workday, and others directly into the in-app Agent Gallery, with Google Cloud validation for security and interoperability before deployment. Pricing details were not disclosed in the announcement, so listeners should check cloud.google.com/ai for current pricing information.
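The Memory Bank idea from the bullet above is easy to picture as code: agent context that survives the end of a session because it round-trips through durable storage rather than living in the chat transcript. A toy sketch (the class, file-backed store, and keys are all our illustration, not the platform's API):

```python
import json, os, tempfile

class MemoryBank:
    """Toy persistent agent memory: preferences survive across 'sessions'
    by round-tripping through a JSON file (a stand-in for managed storage)."""
    def __init__(self, path):
        self.path = path

    def remember(self, user, key, value):
        data = self._load()
        data.setdefault(user, {})[key] = value
        with open(self.path, "w") as f:
            json.dump(data, f)

    def recall(self, user):
        return self._load().get(user, {})

    def _load(self):
        if not os.path.exists(self.path):
            return {}
        with open(self.path) as f:
            return json.load(f)

path = os.path.join(tempfile.mkdtemp(), "memory.json")
session1 = MemoryBank(path)
session1.remember("ryan", "format", "bullet summaries")

session2 = MemoryBank(path)          # a brand-new session, same user
print(session2.recall("ryan"))       # → {'format': 'bullet summaries'}
```

That's the "not starting fresh each time" claim in miniature: the second session object knows nothing except where the memory lives, and still recovers the preference.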
What’s new in Gemini Enterprise
- Google is expanding Gemini Enterprise with long-running agents that can autonomously execute multi-step workflows for hours or days, handling tasks like financial reconciliation or sales prospecting without constant human supervision. This is managed through a new Inbox command center that categorizes agent activity into actionable groups.
- The Enhanced Agent Designer lets non-technical users build agents using natural language or a visual interface, with reusable Skills that codify specific workflows and human-in-the-loop checkpoints for review and approval at critical steps.
- Governance is built into the platform at no additional cost through three key controls: Agent Identity for unique digital IDs and least-privilege access, Agent Registry for IT-managed agent catalogs, and Agent Gateway for centralized network policies and protection against risks like prompt injection.
- Projects and Canvas introduce team-level collaboration by creating shared workspaces where humans and agents co-create together, with cross-platform support spanning Google Workspace, Microsoft 365, and OneDrive, plus the ability to export directly to Microsoft Office formats.
- The new Agent Marketplace integrates into the existing Agent Gallery, allowing organizations to browse and deploy third-party agents from partners like Accenture, Oracle, and ServiceNow, while BYO-MCP support lets admins connect custom or third-party business tools without writing code. New features will roll out over the coming months, and pricing details are available at cloud.google.com/gemini-enterprise.
Introducing Google Cloud Fraud Defense, the next evolution of reCAPTCHA
- Google Cloud Fraud Defense is the rebranded and expanded version of reCAPTCHA, now positioned as a broader trust platform that handles not just bot detection but also AI agent verification and multi-stage fraud across entire user journeys. Existing reCAPTCHA customers are automatically migrated with no action required and no pricing changes.
- The platform introduces an agentic policy engine that lets businesses allow or block traffic based on risk scores, automation types, and agent identity, addressing the growing reality that AI agents are being used to complete end-to-end transactions on behalf of users.
- A notable new mitigation tool is a QR code-based challenge designed to require human presence when suspicious agent activity is detected, replacing traditional CAPTCHA puzzles with a method intended to make automated fraud economically impractical rather than just technically difficult.
- Google cites a 51% average reduction in account takeover for customers using the unified trust model, and the platform currently protects over 14 million domains globally, including 50% of Fortune 100 companies, giving it broad signal coverage that individual site data cannot replicate.
- The platform integrates with emerging standards like Web Bot Auth and SPIFFE for agent identity verification, which is worth watching for teams building or securing agentic workflows since standardized agent identity is still an evolving area across the industry.
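An "agentic policy engine" that routes traffic by risk score and automation type can be sketched as an ordered rule list with a safe fallback. This Python toy is our guess at the shape of such a policy, not Fraud Defense's actual rule language; the risk thresholds, `kind` values, and `qr_challenge` action are illustrative:

```python
def evaluate(request, policies, default="challenge"):
    # First matching rule wins; unmatched traffic gets a fallback action
    # (e.g. a human-presence challenge rather than a hard block).
    for rule in policies:
        if (request["risk"] >= rule.get("min_risk", 0.0)
                and rule.get("kind") in (None, request["kind"])):
            return rule["action"]
    return default

policies = [
    {"min_risk": 0.9, "action": "block"},                          # very risky, any kind
    {"kind": "agent", "min_risk": 0.5, "action": "qr_challenge"},  # risky AI agents
    {"kind": "human", "min_risk": 0.0, "action": "allow"},
]
print(evaluate({"kind": "agent", "risk": 0.6}, policies))  # → qr_challenge
print(evaluate({"kind": "human", "risk": 0.1}, policies))  # → allow
```

The interesting design point is that "agent" is a first-class traffic kind alongside "human" and "bot", so a legitimate purchasing agent can be challenged for human presence instead of being flatly blocked.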
What’s new for Cloud Run at Next ’26
- Cloud Run is adding support for NVIDIA RTX PRO 6000 Blackwell GPUs, now generally available, allowing teams to serve models with 70 billion or more parameters without managing underlying infrastructure, including automatic scale-to-zero when idle to avoid unnecessary GPU costs.
- Google AI Studio now supports full-stack app deployment directly to Cloud Run with a single click, combining server-side code, Firestore, and user authentication in a generally available workflow aimed at lowering the barrier for new developers.
- A new Cloud Run MCP server is now generally available, giving developers and AI agents a standardized way to deploy and manage applications programmatically, which fits into the broader push toward agentic workflows.
- Cloud Run is introducing individual instances as a primitive resource, separate from services or jobs, allowing teams to run long-running background agents more directly, though this feature is currently in preview with select customers only.
- Billing caps are coming soon, letting teams set a monthly spend ceiling after which Cloud Run resources are deactivated, which addresses a common concern for teams running unpredictable or experimental workloads on pay-per-use infrastructure.
- GKE Agent Sandbox launches as a new isolated execution environment for running untrusted AI agent code, using gVisor kernel isolation to support 300 sandboxes per second at sub-second latency, with up to 30% better price-performance on Axion processors compared to other cloud providers.
- GKE hypercluster enters private GA, enabling a single Kubernetes-conformant control plane to manage up to one million chips across 256,000 nodes spanning multiple Google Cloud regions, reducing the operational burden of managing hundreds of disconnected clusters for large AI training workloads.
- Inference performance improvements include ML-driven Predictive Latency Boost in GKE Inference Gateway, reducing time-to-first-token latency by up to 70%, plus automatic KV Cache storage tiering that delivered over 40% TTFT reduction when offloading to RAM and nearly 70% throughput improvement when offloading to Local SSD for long-context workloads.
- New reinforcement learning capabilities in preview include an RL Scheduler to address straggler effects, an RL Sandbox for millisecond-scale kernel-level isolation during reward evaluation, and out-of-the-box observability dashboards, targeting the GPU and TPU idle time that occurs between RL pipeline steps.
- Intent-based autoscaling adds native custom metrics support to the Horizontal Pod Autoscaler, reducing autoscaling reaction time from 25 seconds to 5 seconds while eliminating dependencies on external monitoring stacks that could cause autoscaling failures if they go down.
- Google announced eighth-generation TPUs at Cloud Next, split into two specialized chips: TPU 8t for training (delivering nearly 3x higher compute performance than prior generation, with 121 exaflops in a single superpod) and TPU 8i for inference (offering 80% better performance per dollar with 5x lower on-chip latency). This is the first time Google has offered distinct TPU chips optimized for different workload types rather than a single general-purpose design.
- The Virgo Network is a new data center fabric with 4x the bandwidth of previous generations, capable of connecting 134,000 TPUs in a single data center or over one million TPUs across multiple sites into a unified training cluster. Google is also making it available for NVIDIA-based A5X instances, supporting up to 960,000 GPUs across multiple sites.
- Storage improvements include Google Cloud Managed Lustre now delivering 10 TB/s of bandwidth (10x improvement over last year) with 80 petabytes of capacity, plus a new Rapid Buckets feature on Cloud Storage offering sub-millisecond latency and 20 million operations per second to keep accelerator utilization at 95% or higher during training checkpoints.
- GKE received notable orchestration updates targeting agentic workloads, including node startup times 4x faster, pod startup reduced by up to 80%, and an updated Inference Gateway using ML-driven routing that cuts time-to-first-token latency by more than 70% without manual tuning.
- Native PyTorch support for TPUs (called TorchTPU) is now in preview, joining existing JAX and vLLM support, which reduces friction for teams who want to run existing PyTorch models on TPU hardware without significant code changes. Pricing for these new offerings has not yet been publicly detailed, with availability described as coming soon.
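For context on the intent-based autoscaling item: the core math the Horizontal Pod Autoscaler runs is the standard Kubernetes formula, `desired = ceil(currentReplicas × currentMetric / targetMetric)`, with a tolerance band (10% by default) to avoid flapping. Native custom metrics support changes where the metric comes from, not this formula. A quick sketch of that baseline algorithm (this is stock Kubernetes behavior, not Google's new scheduler):

```python
import math

def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Kubernetes HPA core formula: scale by the ratio of observed metric to
    target, but hold steady while the ratio is inside the tolerance band."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas            # close enough: don't flap
    return math.ceil(current_replicas * ratio)

print(desired_replicas(4, current_metric=200, target_metric=100))  # → 8 (scale out)
print(desired_replicas(4, current_metric=105, target_metric=100))  # → 4 (within tolerance)
print(desired_replicas(10, current_metric=50, target_metric=100))  # → 5 (scale in)
```

The 25-second-to-5-second improvement Google cites is about how quickly `current_metric` reaches this loop, which is why removing the external monitoring-stack hop matters.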
Azure
1:30:36 Optimize object storage costs automatically with smart tier—now generally available
- Azure Smart Tier for Blob and Data Lake Storage is now generally available, automatically moving objects between hot, cool, and cold tiers based on actual access patterns.
- Data inactive for 30 days shifts to cool, then cold after another 60 days, and immediately returns to hot upon re-access with no retrieval or early deletion charges.
- The feature eliminates the need to manually configure and maintain lifecycle rules, which is particularly useful for organizations managing large analytics workloads, telemetry data, or data lakes with unpredictable access patterns.
- During preview, over 50% of smart-tier-managed capacity automatically shifted to cooler tiers.
- Pricing includes standard hot, cool, and cold capacity rates with no tier transition fees, but a per-object monthly monitoring fee applies to objects managed by the smart tier.
- Objects smaller than 128 KiB stay in hot tier permanently and do not incur the monitoring fee, so workloads with many small files should factor that into cost planning.
- Setup requires a storage account with zonal redundancy and is available via the Azure portal or API, either at account creation or by switching an existing account’s default tier to smart. Legacy account types like GPv1 and page or append blobs are not supported.
- Smart tier is available now in nearly all zonal public cloud regions, with broader regional coverage and updated Storage SDK support planned in upcoming releases. More details and pricing are at azure.microsoft.com/en-us/pricing/details/storage/blobs.
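The tiering policy in the bullets above reduces to a simple decision rule, which this Python sketch encodes for cost-planning back-of-envelope work (the function is our summary of the announced behavior, not Microsoft's implementation; re-access resets the idle clock, which is modeled here as days-since-last-access):

```python
def smart_tier(days_since_last_access, size_kib):
    """Sketch of the announced policy: objects under 128 KiB stay hot;
    otherwise 30 idle days -> cool, 90 idle days (30 + another 60) -> cold,
    and any access resets the object to hot."""
    if size_kib < 128:
        return "hot"                 # small objects are never auto-tiered
    if days_since_last_access >= 90:
        return "cold"
    if days_since_last_access >= 30:
        return "cool"
    return "hot"

print(smart_tier(0, 1024))    # → hot (just accessed)
print(smart_tier(45, 1024))   # → cool
print(smart_tier(120, 1024))  # → cold
print(smart_tier(120, 64))    # → hot (under 128 KiB, no monitoring fee either)
```

The small-object carve-out is the cost gotcha worth modeling: a data lake full of sub-128 KiB files gets neither tiering savings nor monitoring charges, so its bill looks like plain hot storage.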
1:31:21 📢 Justin – “Thanks, you finally got what Amazon’s had for a while.”
1:38:37 What’s new in Microsoft Entra – March 2026
- Microsoft Entra ID is adding synced passkeys, passkey profiles, and phishing-resistant MFA support for Linux SSO, giving organizations more options to move away from passwords while meeting compliance requirements for stronger authentication.
- Starting June 1, 2026, Entra Connect Sync and Cloud Sync will block hard-match operations for users with assigned Entra roles, closing a potential attack path where on-premises AD attribute manipulation could be used to take over privileged cloud accounts.
- Admins should review their hybrid sync configurations before that date.
- The Microsoft Authenticator app now includes jailbreak and root detection for Android, with a phased rollout moving from warning to blocking to wipe mode, meaning users on non-compliant devices will eventually lose access to Entra credentials entirely.
- Agent management is consolidating under Agent 365 as the single control plane. The existing Entra admin center Agent registry and collections blades retire May 1, 2026, and the current registry Graph API is being deprecated and replaced, so any agents registered through the old API will need to be re-registered.
- Entra ID Governance added several notable features this quarter, including SCIM 2.0 API support, delegated workflow management in Lifecycle Workflows, and a new billing meter for guest users, which organizations relying on governance features for external identities should review for potential cost impact.
- Why June 1st? Turn this on today!
1:34:17 New in Azure SRE Agent: Log Analytics and Application Insights Connectors
- Azure SRE Agent now supports Log Analytics and Application Insights as native connectors, allowing the agent to run KQL queries directly against workspaces and App Insights resources during incident investigations, replacing the previous approach of shelling out to Azure CLI commands. (REALLY? Bombastic side eye.)
- Setup is simplified compared to the manual RBAC approach: selecting a resource from the dropdown automatically grants the agent’s managed identity Log Analytics Reader and Monitoring Reader on the target resource group, with a manual entry fallback if resource discovery fails.
- The feature is backed by the Azure MCP Server using the monitor namespace, giving the agent read-only tools like monitor_workspace_log_query and monitor_table_list, with no ability to modify alerts, retention settings, or workspace configuration.
- Practical use cases include AKS cluster investigations where the agent can automatically query ContainerLog, KubeEvents, and application traces across multiple connected workspaces to surface errors and failure patterns without manual intervention.
- The connectors are currently behind an early access flag under Settings > Basics, though Azure SRE Agent itself is generally available.
- Pricing is not detailed in the announcement, so listeners should check sre.azure.com/docs for current cost information.
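For a sense of what those read-only `monitor_workspace_log_query` calls look like in practice, here is a minimal sketch that builds the kind of KQL an AKS error investigation might run. The table and column names come from the standard `ContainerLogV2` Log Analytics schema; the query itself is illustrative, not the agent's actual query.

```python
# Sketch of a KQL query an AKS incident investigation might run.
# ContainerLogV2 / PodNamespace / LogMessage are standard Log Analytics
# schema for AKS container logs; the query shape here is an assumption.

def container_error_query(namespace: str, lookback: str = "1h",
                          limit: int = 50) -> str:
    """Build a KQL query surfacing recent error lines for one namespace."""
    return "\n".join([
        "ContainerLogV2",
        f"| where TimeGenerated > ago({lookback})",
        f'| where PodNamespace == "{namespace}"',
        '| where LogMessage contains "error" or LogMessage contains "exception"',
        "| project TimeGenerated, ContainerName, LogMessage",
        "| order by TimeGenerated desc",
        f"| take {limit}",
    ])
```

The point of the connector is that the agent can run queries like this directly against every connected workspace, rather than spawning `az monitor` CLI calls per investigation.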
1:35:14 📢 Justin – “So they REALLY want you to burn tokens.”
1:35:41 Azure Key Vault HSM Platform One Retirement: What Purview BYOK Customers Need to Know
- Azure Key Vault is retiring its legacy HSM Platform One on September 15, 2028, and customers using Microsoft Purview Information Protection with Bring Your Own Key (BYOK) will need to migrate their tenant root keys to the modern FIPS 140-2 Level 3 certified HSM platform before that date or risk losing encryption and decryption capabilities.
- The migration is not straightforward because Azure Key Vault does not support exporting keys once imported, meaning customers must re-import their original on-premises key material into a new vault, which can be a lengthy process if that original key material is no longer readily accessible.
- Microsoft is recommending customers start planning now, despite the 2028 deadline, particularly because coordinating across security, compliance, and HSM teams to recover or regenerate lost key material can take considerable time.
- The practical steps involve confirming whether your tenant key sits on the legacy HSM platform, creating a new Key Vault on the modern platform, and updating your Purview configuration to reference the new vault, with Microsoft support available for customers who no longer have access to the original key material.
- This announcement is most relevant to enterprise customers in regulated industries who have adopted BYOK for compliance reasons, and they should review the updated guidance at the Microsoft Learn documentation for tenant root key management to understand prerequisites and supported migration paths.
1:36:19 📢 Matt – “The thing is, Microsoft does give you a decent amount of time to do stuff, but what’s always fun is if you buy a three-year reservation you’re stuck with it, and you have to deal with returning it right now, because otherwise you’d have negative time…”
After Show
1:38:11 Allbirds shares soar 580% after pivot from shoes to AI
- Allbirds announced a $50 million deal to rebrand as NewBird AI, shifting its business model from footwear to GPU compute infrastructure and on-demand cloud services built for AI workloads.
- The company’s stated rationale is a supply gap in AI compute capacity, with plans to purchase GPUs and offer them as on-demand cloud resources to businesses that cannot access sufficient computing power through existing providers.
- Analysts are skeptical, with one branding consultant describing the move as using the company’s existing stock market shell for an unrelated business rather than a genuine operational pivot.
- The 580% share surge on a press release, despite no demonstrated product or AI-related revenue, has led retail analysts to categorize this as a meme stock situation driven by AI sentiment rather than fundamentals.
- For cloud podcast listeners, this story is a useful data point on how GPU scarcity narratives are influencing capital markets, and raises questions about the credibility of new entrants claiming to address AI compute shortages without established infrastructure or track records.
Closing
And that is the week in the cloud! Visit theCloudPod.net, the home of the Cloud Pod, where you can join our newsletter and Slack team, send feedback, or ask questions, or tweet at us with the hashtag #theCloudPod
