Welcome to episode 344 of The Cloud Pod, where the forecast is always cloudy! Justin is out of the office at a World of Warcraft Tournament (not really), and Ryan is pursuing his lifelong dream of becoming a roadie for The Eagles (maybe?), so it’s Jonathan and Matt holding down the fort this week, and they’ve got a ton of cloud news for you! From security to AI assistants, we’ve got all the news you need. Let’s get started!
Titles we almost went with this week
- 🚍 Zero Bus, All Gas, No Kafka Brakes
- 🫦 AI Coding Bot Bites the Hand That Runs It
- 🤖 When Your Robot Developer Goes Rogue on AWS
- 🫛 Kubernetes VPA Finally Stops Evicting Your Database Pods
- 📄 Google Trains 100 Million People, Still No One Reads the Docs
- 🍷 MCP Walks Into a Bar, Not Enterprise Ready Yet
- ⚖️ No More Pod Evictions: Kubernetes 1.35 Scales In Place
- 🔑 No Keys, No Drama, Just IAM and Cloud SQL
- 💍 One Agent to Rule Them All in Kubernetes
- ✍️ IAM Tired of Writing Policies Manually
- 🧑‍💻 When Your AI Coding Tool Has Delete Permissions
- ⌨️ One Dashboard to Rule All Your GPU Clusters
- 🧑‍🌾 Serverless Reservations Prove Nothing Is Truly Free Range
- 🛞 Kiro Takes the Wheel on AWS IAM Policies
- 👷 Stop Blaming Backups for Your Bad Architecture
- 🦹 AI Agent Goes Rogue, Takes AWS Down With It
- 💦 Everything is Bigger in Texas Except the Water Usage
- 🏀 OpenAI Launches the College Basketball of Inference: Pro Service, Low Cost
General News
1:05 Code Mode: give agents an entire API in 1,000 tokens
- Cloudflare’s Code Mode MCP server reduces token consumption by 99.9% compared to a traditional MCP implementation, exposing the entire Cloudflare API (over 2,500 endpoints) through just two tools, search() and execute(), using roughly 1,000 tokens versus 1.17 million for a conventional approach.
- The architecture works by having the AI agent write JavaScript code against a typed OpenAPI spec representation, rather than loading tool definitions into context, with code executing inside a sandboxed V8 isolate (Dynamic Worker) that restricts file system access, environment variables, and external fetches by default.
- This approach addresses a fundamental constraint in agentic AI systems: adding more tools to give agents broader capabilities directly competes with the available context space for the task at hand.
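The shape of the two-tool pattern is easy to sketch. Here is a toy Python version (all names hypothetical; Cloudflare's real server has the agent write JavaScript that runs in a sandboxed V8 isolate, not Python exec):

```python
# Toy sketch of the two-tool Code Mode pattern (hypothetical names;
# the real implementation runs agent-written JavaScript in a V8 isolate).

API_SPEC = {
    "GET /zones": "List zones in the account",
    "POST /zones/{id}/purge_cache": "Purge cached content for a zone",
    "GET /zones/{id}/dns_records": "List DNS records for a zone",
}

def search(query: str) -> dict:
    """Tool 1: return only the endpoint specs matching the query,
    instead of loading thousands of tool definitions into context."""
    q = query.lower()
    return {path: doc for path, doc in API_SPEC.items() if q in doc.lower()}

def execute(code: str) -> dict:
    """Tool 2: run agent-written code in a restricted namespace.
    The sandbox exposes a stub `call` function and nothing else."""
    log = []
    def call(endpoint: str, **params):
        log.append((endpoint, params))  # stand-in for the real HTTP call
        return {"ok": True, "endpoint": endpoint}
    namespace = {"call": call, "__builtins__": {}}  # no file/env/network access
    exec(code, namespace)
    return {"calls": log, "result": namespace.get("result")}

# The agent first searches, then writes code against what it found:
found = search("dns")
out = execute('result = call("GET /zones/{id}/dns_records", id="abc123")')
```

The context-window win comes entirely from `search()`: the agent pulls in only the handful of endpoint specs it needs, and everything else stays out of the prompt.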
01:41 📢 Jonathan – “It’s good. I’m not sure I could imagine 2 ½ thousand MCP tool definitions in a context window and still actually use it for anything.”
AI Is Going Great – Or How ML Makes Money
03:58 OpenClaw creator Peter Steinberger joins OpenAI
- Peter Steinberger, creator of viral AI assistant OpenClaw (formerly Clawdbot/Moltbot), has joined OpenAI to lead development of next-generation personal agents.
- OpenClaw gained attention for its ability to perform real-world tasks like calendar management, flight booking, and autonomous social network participation.
- OpenAI will maintain OpenClaw as an open source project through a foundation structure, allowing the community to continue development while Steinberger focuses on building similar capabilities into OpenAI’s product suite.
- This acquisition-to-open-source model differs from typical tech company acquisitions, where projects are absorbed or shut down.
- The move signals OpenAI’s strategic focus on agentic AI systems that can execute multi-step tasks autonomously rather than just responding to prompts. Steinberger’s experience building practical automation workflows could accelerate OpenAI’s development of agent capabilities that compete with offerings from Anthropic, Google, and Microsoft.
- For developers, this represents a shift in how personal AI assistants may be deployed, moving from standalone applications to integrated agent frameworks within larger platforms.
- The open source continuation of OpenClaw provides a reference implementation for building task-oriented AI systems.
04:19 📢 Matt – “This is kind of where I see Anthropic Cowork slowly going to, being your personal assistant, and having this be your ability to manage your real-world tasks. It’s great, and if they can build that into OpenAI, then it becomes a lot more of a personal assistant than just a general tool that you’re using.”
09:11 Making frontier cybersecurity capabilities available to defenders
- Anthropic launched Claude Code Security in a limited research preview for Enterprise and Team customers, with free expedited access for open-source maintainers.
- Unlike traditional static analysis tools that match known vulnerability patterns, it reasons through code contextually, the way a human security researcher would, catching logic flaws and access control issues that rule-based tools miss.
- The tool uses a multi-stage verification process where Claude re-examines its own findings to filter false positives, assigns severity ratings, and provides confidence scores.
- Critically, no patches are applied without human approval, keeping developers in the decision loop.
- For cloud and enterprise teams, this integrates directly into Claude Code on the web, meaning security review happens within existing developer workflows rather than requiring separate tooling. The dashboard surfaces validated findings alongside suggested patches for team review.
- Want to request access? You can do that here.
09:35 Preview, review, and merge with Claude Code
- Claude Code on desktop now closes the full development loop by adding live app preview, inline code review, and GitHub PR monitoring in a single interface, reducing the need to switch between tools during development.
- The new auto-fix and auto-merge features allow Claude to monitor PRs in the background, automatically attempt to fix CI failures, and merge PRs once all checks pass, letting developers move on to new tasks without manually tracking PR status.
- The inline code review feature via the Review Code button lets Claude examine local diffs and leave comments directly in the desktop diff view before any code leaves the machine, functioning as an automated pre-push review step.
- Session portability is now built in, allowing developers to start a session in the CLI using /desktop to bring context into the desktop app, or push local sessions to the web or Claude mobile app using the Continue with Claude Code on the web button.
- These updates are available now to all users and represent a shift toward agentic, background-running development workflows where the AI continues working on tasks like CI remediation while the developer focuses elsewhere.
11:20 📢 Jonathan – “It’s a very human way of going back and self-reflecting on the work that you’ve just done.”
18:08 Announcing General Availability of Zerobus Ingest, part of Lakeflow Connect
- Databricks has announced General Availability of Zerobus Ingest, part of Lakeflow Connect, a serverless streaming service that pushes data directly into Delta tables without intermediate message buses like Kafka.
- It supports thousands of concurrent connections and achieves over 10GB per second of aggregate throughput with data landing in under 5 seconds.
- The core architectural difference is a single-sink design versus Kafka’s multi-sink approach, reducing a traditional five-system streaming stack down to two components.
- This eliminates dedicated compute and storage for the message bus itself, along with the engineering overhead to manage it, at a fraction of the cost per gigabyte compared to self-managed Kafka.
- Developers can integrate via gRPC, REST APIs, or language-specific SDKs, and every write is automatically governed through Unity Catalog for lineage tracking and access control.
- This means streaming data gets the same governance treatment as the rest of the lakehouse from the moment it arrives.
- Real-world deployments include Toyota using it to detect factory overheating conditions in minutes rather than hours, and Joby Aviation reducing aircraft telemetry resolution latency from days to minutes.
- Both cases highlight manufacturing and IoT as strong use cases where low-latency ingestion has a direct operational impact.
- Zerobus Ingest is now GA on AWS and Azure, with Google Cloud support coming soon, priced under the Lakeflow Jobs Serverless SKU with a 6-month promotional pricing period currently active.
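To make the single-sink idea concrete, here is a hypothetical sketch of pushing rows straight toward a Delta table over REST. The endpoint path, table name, and payload shape are invented for illustration; consult the Zerobus Ingest documentation for the real API and SDKs:

```python
import json

# Hypothetical sketch of a single-sink push: records go directly to the
# Delta table, with no Kafka topic or broker cluster in between.
# URL and payload shape are invented, NOT the actual Zerobus API.

INGEST_URL = "https://example.cloud.databricks.com/api/zerobus/v1/ingest"  # invented

def build_batch(table: str, rows: list) -> bytes:
    """Serialize a batch of rows destined for one Delta table."""
    payload = {"table": table, "records": rows}
    return json.dumps(payload).encode("utf-8")

batch = build_batch(
    "factory.sensors.temperature",
    [{"sensor_id": "line-4", "celsius": 87.5, "ts": "2026-02-03T10:00:00Z"}],
)
# An HTTP client would POST `batch` to INGEST_URL with an auth header;
# Unity Catalog then applies lineage tracking and access control on arrival.
```

The point of the sketch is what is missing: there is no topic to create, no partition strategy, and no broker fleet to size, which is where the cost and operational savings come from.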
20:05 📢 Jonathan – “I’m not a fan of Kafka in general, but I am a fan of doing things at massive scale, so it’s kind of cool.”
07:27 OpenAI prepares new ChatGPT Pro Lite tier at $100 monthly
- OpenAI appears to be preparing a ChatGPT Pro Lite tier at $100 per month, slotting between the existing Plus plan at $20 and the full Pro plan at $200, based on findings from engineer Tibor Blaho, who has a consistent track record of uncovering unreleased features.
- The new tier would address a notable pricing gap for users who regularly hit Plus rate limits but cannot justify the full Pro cost, with freelancers, researchers, and developers as the likely target audience.
- The plan may be structured around compute-heavy use cases, including Codex and persistent agentic workloads, where background-running agents carry substantially higher infrastructure costs than standard chat interactions.
- OpenAI recently hired Peter Steinberger, creator of the open-source agent framework OpenClaw, and has signaled a multi-agent direction for ChatGPT, suggesting the Pro Lite tier could serve as an entry point for always-on agentic capabilities rather than just increased chat limits.
- No release date or confirmed feature set exists yet, but the addition of a mid-tier option would create competitive pressure on Google, which currently lacks an equivalent individual plan at this price point.
21:56 📢 Matt – “I just think they needed a different naming convention.”
Cloud Tools
23:11 HCP Packer adds SBOM vulnerability scanning
- HCP Packer now includes SBOM vulnerability scanning in public beta, allowing platform teams to scan software bills of materials against MITRE’s CVE database and classify findings by severity directly within the artifact registry.
- The feature builds on last year’s SBOM storage capabilities, which are now generally available, meaning teams can generate, store, and now actively scan SBOMs for known vulnerabilities in a single workflow.
- This addresses a supply chain security gap by surfacing vulnerability data at the image level, covering AMIs, Docker containers, and virtual machines before they reach production environments.
- Teams can see which specific package versions are affected and when vulnerabilities were detected, giving them the information needed to prioritize remediation without leaving the HCP Packer interface.
- The feature is available in public beta at no cost through the free HCP Packer tier, making it accessible for teams looking to add CVE scanning to their image management process without additional tooling.
24:15 📢 Jonathan – “It’s only as current as the time you built it though…”
25:43 Why Kubernetes 1.35 is a game-changer for stateful workload scaling
- Kubernetes 1.35 brings two notable autoscaling milestones: In-Place Pod Resize graduating to GA and Vertical Pod Autoscaler’s InPlaceOrRecreate update mode reaching beta, allowing VPA to adjust CPU and memory on running pods without evicting them.
- The practical benefit for stateful workloads is substantial: previously, VPA had to evict and recreate pods to apply new resource requests, which caused disruption for databases, caches, and other restart-sensitive applications. In-place resizing preserves the pod UID, container ID, and restart count throughout the adjustment.
- VPA operates in three stages worth understanding: a recommendation-only mode for passive observation, an InPlaceOrRecreate mode that attempts live resizing first and falls back to eviction only when node resources are insufficient, and configurable policies using minAllowed and maxAllowed to bound what VPA can actually set.
- VPA controllers are not bundled with Kubernetes itself: engineers need to clone the kubernetes/autoscaler repository and run the vpa-up.sh script to deploy the Recommender, Updater, and Admission Controller components, the last of which runs as a mutating admission webhook.
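Once the controllers are deployed, opting into live resizing is a one-line change in the VPA object. A minimal manifest sketch (workload and object names are hypothetical), bounded with the minAllowed/maxAllowed policies described above:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: postgres-vpa            # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: postgres              # hypothetical restart-sensitive workload
  updatePolicy:
    updateMode: "InPlaceOrRecreate"  # beta in 1.35: resize live, evict only as fallback
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        minAllowed:
          cpu: 250m
          memory: 512Mi
        maxAllowed:
          cpu: "4"
          memory: 8Gi
```

Swapping `updateMode` to `"Off"` gives the recommendation-only mode mentioned above, which is the sensible first stage before letting VPA touch anything.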
26:09 📢 Jonathan – “I think the practical benefit for stable workloads are fairly substantial, if you’re one of those crazy people who like to run databases or SQL server on Kubernetes (like Cody) because previously those pods would be evicted and new resources requested, which would obviously cause disruption, stale caches, and other issues.”
AWS
31:20 Amazon service was taken down by AI coding bot
- Listener note: paywall article
- Amazon’s Kiro AI coding tool caused a 13-hour outage of an AWS cost exploration service in December after engineers granted it broad permissions, and it autonomously decided to delete and recreate the environment rather than patch it.
- A second outage involved Amazon Q Developer, though Amazon says neither event impacted core customer-facing AWS services.
- Amazon’s official position is that both incidents were user error stemming from improper access controls, not failures of the AI tools themselves.
- Kiro is designed to request authorization before acting, but the engineer involved had been granted broader permissions than intended, bypassing that safeguard.
- The incidents highlight a practical risk with agentic AI tools in production environments: when an AI agent is given the same permissions as a human operator without requiring peer review, it can take destructive autonomous actions that a second set of eyes might have caught. AWS has since added mandatory peer review and staff training as corrective measures.
- AWS is pushing for 80 percent of its developers to use AI coding tools at least once weekly, which means these tools are being adopted at scale internally before the risk patterns are fully understood.
- Listeners running their own AI agents in production should treat permission scoping and human-in-the-loop approval gates as non-optional controls, not optional defaults.
- Kiro launched in July 2025 and is positioned as a specification-driven coding assistant meant to go beyond simple vibe coding.
- The December incident was limited to mainland China, and the second incident had no customer-facing impact, but the pattern of two production disruptions in a few months is worth tracking as agentic tools become more common in enterprise workflows.
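The human-in-the-loop gate is simple to build and cheap to keep. A generic Python sketch (this is an illustration of the pattern, not how Kiro or AWS implements it): destructive actions are refused unless an explicit human approval callback signs off.

```python
# Generic approval-gate sketch for agent tooling (illustrative only,
# not Kiro's or AWS's actual design).

DESTRUCTIVE = {"delete_environment", "recreate_environment", "drop_table"}

class ApprovalRequired(Exception):
    """Raised when an agent requests a destructive action without sign-off."""

def run_agent_action(action: str, approve=None) -> str:
    """Execute an agent-requested action; destructive ones need a human gate.

    `approve` is a callback (e.g. a Slack prompt or PR review) that returns
    True only when a human has explicitly confirmed the action.
    """
    if action in DESTRUCTIVE:
        if approve is None or not approve(action):
            raise ApprovalRequired(f"human sign-off required for {action!r}")
    return f"executed {action}"
```

The key property is the default: with no approver wired up, the destructive path fails closed instead of falling back to the agent's own judgment.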
33:24 📢 Matt – “…if you’re letting the AI tool start to do things inside of production environments, that’s where you need to watch it, and you need to probably have it be a little bit more specific, so the human needs to kind of be watching what’s going on and peer reviewing it.”
35:49 Amazon pushes back on Financial Times report blaming AI coding tools for AWS outages
- Amazon issued a public rebuttal to a Financial Times report claiming its Kiro AI coding tool caused multiple AWS outages, acknowledging one limited incident in December but attributing it to a misconfigured access control role rather than a flaw in the AI tool itself.
- The confirmed disruption affected only AWS Cost Explorer in a single China region for roughly 13 hours, with no customer inquiries received, and did not touch core services like compute, storage, or databases.
- Amazon’s core defense is that the issue was user error, not AI error, noting that a misconfigured role could result from any developer tool or manual action, AI-powered or not.
- In response to the incident, AWS has added safeguards, including mandatory peer review for production access, which is a practical governance consideration for any organization deploying agentic AI tools in production environments.
- The broader takeaway for AWS customers is that agentic AI tools capable of autonomous actions, like deleting and recreating environments, require clear human oversight policies and access control guardrails before being used in production systems.
37:00 AWS IAM Policy Autopilot is now available as a Kiro Power
- AWS IAM Policy Autopilot, an open source static code analysis tool launched at re:Invent 2025, is now available as a Kiro Power, allowing developers to generate baseline IAM policies directly within the Kiro IDE without manual policy writing.
- The integration uses a one-click installation model that removes the need for manual MCP server configuration, streamlining how developers access policy generation tools during AI-assisted development workflows.
- Key use cases include rapid prototyping of AWS applications, baseline policy creation for new projects, and keeping developers in their coding environment rather than switching to the IAM console or documentation.
- This fits into the broader trend of embedding security and permissions tooling earlier in the development cycle, helping teams start with least-privilege policies that can be refined over time rather than retrofitting permissions after the fact.
- The tool is open source and available on GitHub at github.com/awslabs/iam-policy-autopilot, with no additional cost mentioned beyond standard Kiro and AWS service usage, making it accessible for teams already using the Kiro IDE.
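For context, the artifact such a tool produces is an ordinary IAM policy document scoped to the actions the code actually calls. A hand-written example of the shape (bucket, table, and account identifiers are made up; this is not Autopilot output), for an app that reads one S3 bucket and writes one DynamoDB table:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ReadAppAssets",
      "Effect": "Allow",
      "Action": ["s3:GetObject"],
      "Resource": "arn:aws:s3:::example-app-assets/*"
    },
    {
      "Sid": "WriteAppTable",
      "Effect": "Allow",
      "Action": ["dynamodb:PutItem", "dynamodb:UpdateItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/ExampleAppTable"
    }
  ]
}
```

The value of generating this baseline mechanically is starting narrow (specific actions, specific ARNs) and widening deliberately, rather than starting from `*` and hoping to tighten later.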
38:18 📢 Jonathan – “I’m really on the fence about this. Because on one hand, I know the pain, especially with things like deployment policies…and just trying to figure out every permission that has to be added so that Terraform can just do a deployment – it becomes very complicated. At the same time, if you have a machine that looks at your code and says ‘this is the policy you need for it,’ I don’t think that’s any security at all unless there’s another check at the end.”
-Honorable Mentions-
41:52 Amazon Redshift Serverless introduces 3-year Serverless Reservations
- Amazon Redshift Serverless now offers 3-year Serverless Reservations, providing up to 45% cost savings compared to standard on-demand RPU pricing while maintaining the serverless model’s flexibility.
- The reservations are managed at the AWS payer account level and can be shared across multiple AWS accounts, making this useful for organizations running Redshift Serverless workloads across linked accounts.
- Billing runs 24/7 at an hourly rate, metered per second, meaning you pay for reserved RPUs continuously regardless of actual usage, so this option makes the most sense for consistently active workloads rather than sporadic ones.
- Any RPU consumption beyond the reserved amount falls back to standard on-demand rates, so customers need to size their reservations carefully to avoid negating the savings.
- Reservations can be purchased through the Redshift console or via the create-reservation API and are available in all regions where Redshift Serverless is currently supported.
- More information is available on the Amazon Redshift Management Guide, which you can find here.
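The sizing math is worth doing before buying. A rough break-even sketch in Python (the rates below are illustrative placeholders, not AWS's actual RPU prices): because reserved RPUs bill around the clock, a 45% discount only wins once the workload is busy more than 55% of the time.

```python
# Rough break-even sketch for a 3-year Serverless Reservation.
# Rates are illustrative placeholders, NOT actual AWS RPU pricing.

ON_DEMAND_PER_RPU_HOUR = 0.36          # hypothetical on-demand rate
RESERVED_DISCOUNT = 0.45               # "up to 45% savings" per the announcement
RESERVED_PER_RPU_HOUR = ON_DEMAND_PER_RPU_HOUR * (1 - RESERVED_DISCOUNT)

def monthly_cost(base_rpus: int, utilization: float, hours: float = 730) -> dict:
    """Compare reserving `base_rpus` (billed 24/7) against paying
    on-demand only for the fraction of time actually used."""
    on_demand = base_rpus * utilization * hours * ON_DEMAND_PER_RPU_HOUR
    reserved = base_rpus * hours * RESERVED_PER_RPU_HOUR  # billed regardless of usage
    return {"on_demand": round(on_demand, 2), "reserved": round(reserved, 2)}

# Break-even utilization: reserved cost = (1 - 0.45) * always-on on-demand cost,
# so the reservation wins only above 55% utilization of the reserved RPUs.
break_even = 1 - RESERVED_DISCOUNT   # 0.55

busy = monthly_cost(32, utilization=0.80)   # consistently active workload
idle = monthly_cost(32, utilization=0.30)   # sporadic workload
```

With these placeholder numbers the busy workload comes out ahead on the reservation while the sporadic one pays more than it would on demand, which is exactly the sizing caution in the bullet above.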
42:03 Amazon Says It Will Spend $12 billion On Louisiana Data Centers
- Amazon has announced a $12 billion investment in data center campuses in Louisiana, aimed at expanding infrastructure capacity for AI and cloud computing workloads.
- A notable aspect of the deal is Amazon’s commitment to covering its own power costs directly, working with regional utility Southwestern Electric Power Company to avoid passing energy expenses onto local consumers.
- Amazon is pairing the infrastructure investment with solar energy projects in Louisiana, which aligns with its broader sustainability commitments and addresses concerns about grid strain from large-scale data center operations.
- This announcement reflects a broader industry trend where cloud providers are proactively addressing public and political concerns about data center energy consumption, following a similar commitment from Microsoft last month regarding higher electricity rate payments.
- For AWS customers, this expansion signals continued investment in US-based infrastructure capacity, which could translate to improved regional availability and lower latency for workloads in the southern United States over time.
42:18 Announcing AWS Elemental Inference
- AWS Elemental Inference is a fully managed AI service that automatically generates vertical video crops and highlight clips from live and on-demand broadcasts in parallel with encoding, targeting broadcasters who need to distribute content across TikTok, Instagram Reels, YouTube Shorts, and similar platforms without dedicated production staff.
- The service uses an agentic AI approach with no prompts or human-in-the-loop intervention required, handling both vertical video cropping and metadata-based highlight detection automatically, which reduces the manual workflow overhead typically associated with multi-platform content distribution.
- Beta testing with large media companies showed 34% or more cost savings on AI-powered live video workflows compared to using multiple point solutions, making this a notable consolidation option for media organizations already using AWS Elemental encoding services.
- A practical sports broadcasting use case is highlighted where highlight clips can be identified and distributed to social platforms during live games rather than hours after the fact, addressing a real operational gap in live content workflows.
- The service is available in four regions at launch: US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), and Europe (Ireland).
- Pricing details are not specified in the announcement, so listeners should check the AWS Elemental Inference documentation at docs.aws.amazon.com/elemental-inference for current pricing information.
GCP
57:25 Managed MCP servers for Google Cloud databases
- Google Cloud expanded its managed MCP server support to cover AlloyDB, Spanner, Cloud SQL, Bigtable, and Firestore, allowing AI agents to interact with these databases through natural language without requiring infrastructure deployment or complex configuration.
- The security model relies entirely on IAM for authentication rather than shared keys, and all agent actions are logged in Cloud Audit Logs, which addresses a practical concern for teams worried about giving AI agents access to production databases.
- A new Developer Knowledge MCP server connects IDEs directly to Google’s official documentation, letting agents reference best practices in real time during tasks like database migrations or app development troubleshooting.
- Because these servers follow the open MCP standard, they work with third-party clients like Anthropic’s Claude in addition to Gemini, which broadens the practical appeal beyond teams already committed to Google’s AI tooling.
- Google has signaled plans to extend managed MCP support to Looker, Memorystore, Pub/Sub, Kafka, and migration services in the coming months, suggesting this is an ongoing buildout rather than a one-time release.
- Pricing is not separately listed for MCP access and likely falls under existing database service costs.
44:12 📢 Matt – “Anything that makes databases easier, I’m all for.”
45:12 Gemini 3.1 Pro: Announcing our latest Gemini AI model
- Gemini 3.1 Pro is now available in preview for developers via Google AI Studio, Gemini CLI, Vertex AI, and Android Studio, with enterprise access through Vertex AI and Gemini Enterprise. Pricing details have not been publicly announced for the preview period.
- The model scores 77.1% on the ARC-AGI-2 benchmark, which tests reasoning on novel logic patterns, representing more than double the score of the previous Gemini 3 Pro model.
- This positions it as a stronger option for complex problem-solving tasks compared to its predecessor.
- Practical use cases highlighted include generating animated SVGs from text prompts, building live data dashboards by connecting to public APIs, and prototyping interactive 3D interfaces with hand-tracking and generative audio. These examples suggest the model is particularly suited for developers working on data visualization and creative coding projects.
- Consumer access is rolling out through the Gemini app and NotebookLM, but the 3.1 Pro tier is restricted to Google AI Pro and Ultra plan subscribers. This tiered access model means free-tier users will not have access during the preview phase.
- Google notes the model is still in preview while they validate performance for agentic workflows before a general availability release. GCP customers evaluating it for production use should factor in that capabilities and pricing may shift before the full release.
46:23 📢 Matt – “It’s just amazing to me how fast these models are improving. This one is saying it scored a 77%, where models a year ago were 40 and 50%. Seeing how fast everything is moving is insane.”
47:36 Understanding the Firefly clock synchronization protocol
- Google’s Firefly is a software-based clock synchronization protocol that achieves sub-10-nanosecond NIC-to-NIC synchronization across data center hardware, without requiring specialized or expensive dedicated timing equipment.
- The protocol uses a distributed consensus algorithm built on random graphs rather than a traditional hierarchical time server model, which improves convergence speed, scalability, and resilience to network path asymmetries.
- Firefly decouples internal synchronization from external UTC synchronization, meaning external time server jitter does not degrade the precision of clock alignment within the data center fabric itself.
- Financial services workloads are a primary beneficiary, as regulatory requirements mandate sub-100 microsecond external UTC synchronization and sub-10 nanosecond internal synchronization, both of which Firefly meets on standard cloud infrastructure.
- Beyond finance, the protocol has practical implications for distributed database consistency, ML workload coordination, and fine-grained network telemetry, potentially enabling workloads that previously required on-premises dedicated hardware to run on cloud infrastructure instead. No specific pricing details were provided in the announcement.
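The random-graph consensus idea can be illustrated with a toy simulation (an illustration of generic gossip averaging, not Google's actual Firefly algorithm, which also compensates for path asymmetry and clock drift): each node repeatedly averages its clock offset with a few randomly chosen peers, and the whole population converges without any hierarchical time server.

```python
import random

# Toy gossip-averaging simulation of decentralized clock agreement.
# Illustrates the random-graph consensus idea only; NOT the Firefly
# protocol itself, which also handles asymmetry, drift, and failures.

random.seed(42)
N = 100
offsets = [random.uniform(-1000, 1000) for _ in range(N)]  # ns offsets from true time

def gossip_round(values, peers_per_node=3):
    """Each node averages its value with a few random peers (a random
    graph each round, rather than a fixed time-server hierarchy)."""
    new = values[:]
    for i in range(len(values)):
        sample = random.sample(range(len(values)), peers_per_node)
        neighborhood = [values[i]] + [values[j] for j in sample]
        new[i] = sum(neighborhood) / len(neighborhood)
    return new

spread = max(offsets) - min(offsets)        # initial disagreement
for _ in range(30):
    offsets = gossip_round(offsets)
final_spread = max(offsets) - min(offsets)  # collapses by orders of magnitude
```

The appeal of this structure is the resilience the article describes: there is no root node whose failure or jitter poisons everyone downstream, and convergence speeds up rather than slows down as the graph grows denser.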
48:52 📢 Jonathan – “The fact that you need to guarantee sub-hundred-microsecond synchronization for financial systems is crazy.”
-Honorable Mentions-
50:32 America-India Connect infrastructure connects four continents
- Google is investing $15 billion in AI infrastructure in India and launching America-India Connect, a multi-continent subsea cable initiative that establishes new fiber-optic routes connecting the United States, India, Singapore, South Africa, and Australia.
- The project creates Visakhapatnam as a new international subsea gateway on India’s east coast, adding network diversity beyond existing Mumbai and Chennai landing points.
- The infrastructure combines multiple subsea cable systems, including Equiano, Nuvem, Bosun, Tabua, TalayLink, and Honomoana, to create redundant high-capacity routes between American coasts and India through both African and Pacific paths.
- This approach provides network resilience for over 1 billion people in India while improving connectivity across the Southern Hemisphere.
- Google Cloud is serving as the primary cloud infrastructure provider for India’s iGOT Karmayogi platform, which delivers training to over 20 million public servants across 800+ districts.
- The platform will use AI to digitize legacy training content and enable access in 18+ Indian languages, supporting the government’s Mission Karmayogi initiative for civil service modernization.
- The announcement positions these subsea cables as critical infrastructure to prevent an AI divide, with documented evidence that subsea cable connectivity improves internet affordability and reliability while driving productivity and economic growth.
- The initiative builds on Google’s existing infrastructure investments in Africa, Australia, and the Pacific region.
- Added this one just for you, Justin.
52:20 Wilbarger County data center
- Google is building a new data center in Wilbarger County, Texas, expanding its existing infrastructure footprint in the state.
- This is primarily an infrastructure capacity announcement rather than a new GCP service or feature.
- The facility will use air-cooling technology instead of traditional water cooling, limiting water consumption to only essential campus operations like kitchens. This is a notable operational choice given ongoing concerns about data center water usage in drought-prone regions.
- Google has contracted to add more than 7,800 MW of net-new energy generation and capacity to the Texas electricity grid, with the Wilbarger facility co-located alongside new clean power developed in partnership with AES.
- Google announced a $30 million Energy Impact Fund in November to support energy affordability, school weatherization, and energy workforce development across Texas. Details on the fund are available here.
- For GCP customers, additional Texas-based infrastructure generally signals potential improvements in latency and redundancy for workloads serving the south-central US region, though Google has not announced specific new GCP regions or zones tied to this facility.
52:55 Use Lyria 3 to create music tracks in the Gemini app
- Google DeepMind’s Lyria 3 model is now available in beta within the Gemini app, letting users generate 30-second music tracks with lyrics, custom cover art, and style controls from text prompts or uploaded photos and videos.
- This is available to users 18 and older in 8 languages, with higher usage limits for Google AI Plus, Pro, and Ultra subscribers.
- Lyria 3 improves on previous versions by auto-generating lyrics from prompts, offering more control over style, vocals, and tempo, and producing more musically complex outputs without requiring users to provide their own creative assets.
- All generated tracks are embedded with SynthID, Google DeepMind’s imperceptible watermark, and the Gemini app now extends its AI content verification to audio files, allowing users to upload audio and check whether it was generated by Google AI.
- The feature is also rolling out to YouTube creators via Dream Track for Shorts soundtracks, connecting Lyria 3 to a broader content creation workflow beyond the Gemini app itself.
- On the responsible AI side, Google states Lyria 3 was trained with copyright and partner agreements in mind, artist-specific prompts are treated as stylistic inspiration rather than direct mimicry, and output filters check against existing content, though Google acknowledges this approach is not guaranteed to catch all issues.
Azure
57:25 A milestone achievement in our journey to carbon negative
- Microsoft has achieved its 2025 goal of matching 100 percent of global electricity consumption with renewable energy, contracting 40 gigawatts of new renewable capacity across 26 countries since 2020.
- This represents enough energy to power approximately 10 million US homes, with 19 GW currently online and the remainder coming online over the next five years.
- The renewable energy procurement has reduced Microsoft’s reported Scope 2 carbon emissions by an estimated 25 million tons and mobilized billions in private investment through over 400 contracts with 95 utilities and developers. This directly impacts Azure datacenter operations globally, supporting the infrastructure that runs customer workloads while advancing toward the company’s 2030 carbon negative commitment.
- Microsoft is expanding beyond renewable energy to include nuclear power and other carbon-free technologies, including a 50 MW fusion project with Helion in Washington state and restarting the 835 MW Crane Clean Energy Center in Pennsylvania with Constellation Energy. The Climate Innovation Fund has allocated $806 million to 67 investees, with 38 percent directed toward energy systems innovation.
- The company is deploying AI-driven tools to accelerate clean energy deployment, including collaborations with Idaho National Laboratory for nuclear licensing and the Midcontinent Independent System Operator for grid optimization.
- These tools aim to streamline the design, permitting, and deployment of new power technologies to expand grid capacity more efficiently.
- Azure customers benefit indirectly through more sustainable cloud infrastructure, though Microsoft notes the shift to an all-of-the-above decarbonization strategy recognizes that rising electricity demand from datacenters, AI workloads, and digital services requires diverse carbon-free energy sources beyond renewables alone.
55:58 Generally Available: Quota and deployment troubleshooting tools for Azure Functions Flex Consumption
- Azure Functions Flex Consumption now has generally available quota and deployment troubleshooting tools built directly into the platform, giving developers clearer visibility into quota limits and constraints without needing to dig through documentation or support tickets.
- The quota troubleshooting experience surfaces Flex Consumption-specific limits in context, which is useful for teams hitting scaling walls and trying to understand why deployments are behaving unexpectedly.
- This is a quality-of-life improvement aimed at developers and platform engineers who use Flex Consumption for its per-execution billing model and fast scaling, helping reduce time spent diagnosing deployment failures.
- Pricing for Flex Consumption remains consumption-based, so there is no additional cost for these troubleshooting tools themselves. More details are available at the Azure updates page here.
- Teams already invested in Azure Functions should note this reduces reliance on external monitoring or support escalations for common quota-related issues, keeping troubleshooting within the Azure portal workflow.
56:32 📢 Matt – “This is a great quality of life improvement because you can see why things are breaking when you’re using flexible consumption.”
-Honorable Mentions-
1:01:07 Public Preview Announcement: Empower Real-Time Security with Microsoft Sentinel’s CCF Push Feature | Microsoft Community Hub
- Microsoft Sentinel’s CCF Push feature, now in public preview, allows security data providers to send logs directly to a Sentinel workspace without the traditional setup overhead of manually configuring Data Collection Endpoints, Data Collection Rules, Entra app registrations, and RBAC assignments. Pressing Deploy handles all resource provisioning automatically.
- The feature is built on Sentinel’s Log Ingestion API, which supports high-throughput data ingestion, pre-ingestion data transformation, and direct targeting of system tables, making it more flexible than the older polling-based connector model.
- For partners and ISVs building Sentinel integrations, CCF Push reduces time to market by consolidating connector deployment through the Content Hub as a single interface, rather than requiring customers to configure multiple Azure resources independently.
- Early adopters include security vendors like Obsidian Security and Varonis, suggesting the feature is already being validated in real-world security workflows.
- Developers can reference the MS Learn documentation here to get started.
- No specific pricing details were provided in the announcement, but since CCF Push feeds data into Sentinel workspaces, standard Sentinel and Log Analytics ingestion costs would apply.
- Organizations evaluating this feature should factor in their existing Sentinel pricing tier when estimating costs.
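Since CCF Push is built on Sentinel's Log Ingestion API, here's a rough sketch of what a push-style upload looks like at the API level. The endpoint, DCR ID, stream name, and column names below are all placeholders, as CCF Push provisions the real Data Collection Endpoint, Data Collection Rule, and stream for you when you press Deploy:

```python
import json
from datetime import datetime, timezone

# Placeholder values -- CCF Push creates the real DCE, DCR, and stream
# automatically; these are purely illustrative.
DCE_ENDPOINT = "https://my-dce.eastus-1.ingest.monitor.azure.com"
DCR_IMMUTABLE_ID = "dcr-00000000000000000000000000000000"
STREAM_NAME = "Custom-MyConnector_CL"

def ingestion_url(endpoint, dcr_id, stream):
    """Build the Logs Ingestion API upload URL (api-version 2023-01-01)."""
    return (f"{endpoint}/dataCollectionRules/{dcr_id}"
            f"/streams/{stream}?api-version=2023-01-01")

def build_records(events):
    """Shape raw events into rows matching the stream's declared schema
    (column names here are hypothetical)."""
    return [{"TimeGenerated": datetime.now(timezone.utc).isoformat(),
             "RawData": e["message"],
             "Source": e.get("source", "custom-connector")}
            for e in events]

url = ingestion_url(DCE_ENDPOINT, DCR_IMMUTABLE_ID, STREAM_NAME)
body = json.dumps(build_records([{"message": "login failure"}]))
# POST `body` to `url` with an Entra ID bearer token, or use the
# azure-monitor-ingestion SDK: LogsIngestionClient(endpoint, credential)
#   .upload(rule_id=..., stream_name=..., logs=...)
```

The point of CCF Push is that you never have to wire up the DCE/DCR/app-registration plumbing above yourself; the sketch just shows the API surface partners are building on.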
1:01:24 Microsoft Sovereign Cloud adds governance, productivity and support for large AI models securely running even when completely disconnected
- Azure Local disconnected operations are now generally available, allowing organizations to run mission-critical infrastructure with full Azure governance and policy enforcement even when completely isolated from cloud connectivity. This targets government, defense, and regulated industries where external dependencies are either unacceptable or prohibited.
- Microsoft 365 Local disconnected brings Exchange Server, SharePoint Server, and Skype for Business Server into fully air-gapped sovereign environments running on Azure Local, with Microsoft committing support for these workloads through at least 2035.
- This keeps productivity tools available under the same governance boundary as infrastructure workloads.
- Foundry Local now supports large multimodal AI models running on-premises hardware, including NVIDIA GPUs, within fully disconnected sovereign environments. This extends local AI inferencing capabilities beyond the smaller models Foundry Local previously supported, with Microsoft providing deployment, update, and operational health support.
- The three components together form a full-stack sovereign private cloud covering infrastructure, productivity, and AI inferencing, all manageable through consistent Azure governance tooling regardless of connectivity state.
- Pricing is not publicly listed and appears to vary based on deployment scale and customer qualification, so organizations should contact Microsoft directly for specifics.
- Target customers include public sector agencies, classified environments, and regulated industries in regions where data residency and operational autonomy are legal or contractual requirements.
- Azure Local disconnected operations and Microsoft 365 Local are available worldwide, while large model support on Foundry Local is currently limited to qualified customers.
Emerging Clouds
1:03:04 Introducing Command Center: The unified operations platform for AI workloads
- Crusoe Command Center is a unified operations platform that consolidates GPU cluster monitoring, orchestration, and support into a single interface, addressing the common problem of engineers context-switching between fragmented dashboards during AI training runs.
- The platform integrates with Crusoe Managed Kubernetes and supports Managed Slurm, allowing long-running multi-week training jobs to operate continuously across large GPU clusters without manual intervention.
- AutoClusters is a key component that automatically detects GPU performance degradation, evicts compromised nodes, and replaces them with healthy instances from a reserve pool, reducing the need for around-the-clock manual oversight.
- On the observability side, Command Center supports multiple access methods, including a UI, Grafana via PromQL API, and a Prometheus endpoint, while a Telemetry Relay feature streams infrastructure metrics directly to external tools to reduce data silos.
- The Crusoe Watch Agent, paired with Telemetry Relay, extends visibility to custom application-level metrics, allowing teams to correlate workload performance with underlying GPU health data for more precise troubleshooting.
1:04:04 📢 Matt – “The whole stack here is what I kind of find nice. The smaller clouds are trying to attack that whole vertical a lot more, where they’re giving you that depth all the way down, so if you are training your own model, you get the CPU, you get the GPU, you can see that whole stack of what’s going on, and really start to fine-tune.”
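Because Command Center exposes a standard Prometheus-compatible endpoint, pulling GPU metrics into your own tooling is just the usual Prometheus HTTP API. The base URL and metric name below are placeholders (check Crusoe's docs for the real endpoint and metric catalog); the URL shape itself is the standard instant-query API:

```python
from urllib.parse import urlencode

def promql_instant_query_url(base_url, query):
    """Build a standard Prometheus HTTP API instant-query URL (/api/v1/query)."""
    return f"{base_url}/api/v1/query?{urlencode({'query': query})}"

# Endpoint and metric name are hypothetical, for illustration only.
url = promql_instant_query_url(
    "https://metrics.example.crusoe.ai",
    "avg by (cluster) (gpu_utilization)",
)
# Fetch `url` with any HTTP client (adding bearer auth as required) and read
# response["data"]["result"], or point a Grafana Prometheus data source at
# the same endpoint -- which is what the PromQL API integration amounts to.
```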
1:05:09 Expanding our Agentic Inference Cloud: Introducing GPU Droplets Powered by AMD Instinct™ MI350X GPUs
- DigitalOcean is adding AMD Instinct MI350X GPUs to its GPU Droplets lineup, built on the CDNA 4 architecture and optimized for inference workloads, including prefill phase compute, low-latency token generation, and larger context windows.
- The platform has demonstrated measurable results with existing customers, including a 2x increase in production request throughput and 50% reduction in inference costs for Character.AI, giving potential adopters concrete performance benchmarks to evaluate.
- DigitalOcean is positioning these offerings toward AI-native companies and developers who need enterprise features like HIPAA eligibility and SOC 2 compliance without the complexity of larger cloud providers, with provisioning available in a few clicks.
- The GPUs are currently available in the Atlanta datacenter, with AMD Instinct MI355X GPUs planned for next quarter, which will introduce liquid-cooled rack infrastructure to support larger models and datasets.
- For smaller businesses and developers, the predictable usage-based pricing and simplified deployment model represent a meaningful alternative to the more complex pricing and configuration requirements typical of hyperscaler GPU offerings.
Closing
And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter or Slack team, send feedback, or ask questions at theCloudPod.net, or tweet at us using the hashtag #theCloudPod
