348: Compliance Theater Now Available as a Subscription

April 1, 2026 01:10:59

Welcome to episode 348 of The Cloud Pod, where the weather is always cloudy! Justin, Ryan, and Matt are in the studio this week to bring you all the latest news in AI and Cloud, including Stryker's troubles, AWS' birthday, Bedrock Agents, and Claude Code – plus so much more. Let's get started!

Titles we almost went with this week

  • 🧨 SOC 2 It to Me Delve Fires Back 
  • 🐚 Shell Yeah Bedrock Agents Just Got Command Line Powers
  • 🪭 When Your SOC 2 Report Is Just Fan Fiction
  • 🚶 uv, Ruff, and ty Walk Into an OpenAI Acquisition
  • 🐟 Hash Field Expiration Is Here, and It’s No Redis Herring
  • 🪙 Stop Paying Full Price for Tokens You Already Bought
  • 📃 Fake It Till You Audit It
  • ⌛ Cache Me If You Can CNCF Sandbox Edition
  • 🛼 Microsoft Learns Consent Matters in Copilot Rollout
  • 🦨 Microsoft’s Stinky Cloud Gets Federal Seal of Approval
  • 🥊 When Your Audit Trail Leads to a Blog Fight
  • 🏓 Ping Your AI Agent on Discord Like a Millennial
  • 💸 Twenty Years of AWS and the Bill Never Stops
  • 📦 The LLM hack that feels a lot like Node Shift Left Package issues
  • 👷 Claude Code Auto Mode Lets AI Work Unsupervised
  • 👶 Stop Babysitting Your AI Claude Code Goes Solo
  • 🔑 Auto Mode Gives Claude Code the Keys to the Car
  • ☕ Java comes to the coffee shop with AI

General News 

01:21 Customer Updates: Stryker Network Disruption 

  • Stryker confirmed a cyberattack on March 11, 2026, that disrupted their internal Microsoft corporate environment, affecting order processing, manufacturing, and shipping, but notably not their connected medical devices or cloud-hosted products.
  • The attack vector was specific to Stryker’s Microsoft environment, which meant products running on AWS (Vocera Edge, Vocera Ease) and Google Cloud Platform (care.ai) were architecturally isolated and unaffected, demonstrating a practical benefit of multi-cloud separation.
  • Stryker explicitly stated this was not ransomware or malware, and government agencies, including CISA, FBI, and the White House National Cyber Director, were engaged, with domain seizures linked to threat actors already executed.
  • The incident highlights how healthcare organizations can architect medical device and cloud product infrastructure to be independent of corporate IT environments, as every product from Mako to SurgiCount to LIFEPAK operated normally due to network segmentation.
  • Real-world patient impact was limited but present, with some personalized implant cases rescheduled due to shipping delays, underscoring that even contained corporate IT incidents can have downstream effects on physical supply chains.

02:30 📢 Justin – “HugOps to the entire Stryker team; I couldn’t imagine having to rebuild my entire Windows estate at a company the size of Stryker in the middle of trying to do business and everything else.” 

05:00 Federal cyber experts called Microsoft’s cloud a “pile of shit,” and approved it anyway

  • FedRAMP authorized Microsoft’s Government Community Cloud High despite internal reviewers finding insufficient security documentation, issuing an unusual “buyer beware” notice to agencies considering the product. 
  • This raises questions about the integrity of the federal cloud authorization process when commercial pressures intersect with security evaluations.
  • The GCC High offering is specifically designed to handle some of the US government’s most sensitive data, making the documentation gaps particularly consequential, given that Microsoft had already been linked to two significant federal breaches involving Russian and Chinese state actors.
  • The core technical concern was Microsoft’s inability to adequately document how data is protected as it moves between servers within their cloud infrastructure, leaving reviewers unable to assess the system’s overall security posture with confidence.
  • For cloud practitioners and federal agencies, this situation highlights the risk of relying on vendor-provided security documentation without independent verification, especially for high-sensitivity workloads where compliance approval does not necessarily equal verified security.
  • The outcome has broader implications for FedRAMP’s credibility as a security benchmark, since agencies selecting cloud providers often treat authorization as a meaningful security signal rather than a conditional or incomplete endorsement.

06:00 📢 Ryan – “If you can’t adequately explain how basic things like encryption and security controls are handled in your environment, that’s not good, right? Because while it’s not completely indicative of a security problem, it’s highly suspect.” 

06:51 Delve – Fake Compliance as a Service – Part I 

  • A detailed investigation alleges that Delve, a compliance automation platform, fabricates audit evidence, including board meeting records and test results, then uses Indian certification mills operating through US shell entities to rubber-stamp reports rather than conduct independent verification.
  • The core technical concern is that Delve reportedly generates identical audit reports across all clients, meaning the auditor independence required by AICPA and ISO standards is structurally violated since Delve itself is effectively acting as both platform and auditor.
  • Companies using Delve for HIPAA or GDPR compliance may face significant regulatory exposure, as the article claims the platform skips major framework requirements while telling clients they have achieved 100% compliance, potentially creating criminal liability under HIPAA and fines up to 4% of global revenue under GDPR.
  • The investigation highlights a broader issue in the compliance automation space where AI and automation claims may not reflect actual product capabilities, with the article describing Delve as essentially a template pack with a SaaS wrapper rather than a genuinely automated compliance tool.
  • For cloud-focused companies evaluating compliance platforms, this case underscores the importance of verifying auditor independence credentials, requesting evidence of actual testing procedures, and understanding whether a platform produces genuinely customized documentation or pre-populated templates adopted with minimal review.
  • Interested in reading the leaked spreadsheet? Find those here, and the leaked documents here.

08:47 📢 Ryan – “I’m not a big fan of checkbox security and having that around just for compliance purposes. But this is really a misrepresentation. You look at things and it’s certified by Delve; it’s not certified by these other companies. And the specifics they listed in the report are crazy. This is not cool. It’s just generated; it’s not even real in the slightest.”

11:37 Response to Misleading Claims 

  • Delve is a SOC 2 compliance automation platform serving over 1,700 customers, and this response addresses a Substack post making claims about the legitimacy of its audit processes. 
  • The core distinction Delve makes is that it automates evidence collection and provides templates, while independent licensed auditors retain sole authority to issue final reports.
  • The debate touches on a broader industry practice where compliance platforms provide standardized control sets based on AICPA and ISO frameworks, meaning structural overlap across reports is expected rather than evidence of fraud. 
  • This is worth discussing because buyers of compliance software often do not fully understand where the platform ends and the auditor begins.
  • Delve claims 120+ automated integrations, which is a notable gap from the 14 cited in the original criticism, and speaks to how quickly compliance tooling has evolved in the cloud ecosystem. 
  • For cloud-native companies pursuing SOC 2, the depth of integrations directly affects how much manual evidence collection is required.
  • The use of pre-filled templates for board minutes and policies is standard practice across compliance platforms, but it raises a legitimate question about whether customers treat these as starting points or simply submit them unchanged. 
  • This is a real risk area for organizations where compliance becomes a checkbox exercise rather than a genuine security posture.
  • The competitive compliance automation market, which includes players like Vanta and Drata, means disputes like this are likely to continue as vendors differentiate on auditor quality, automation depth, and pricing. 
  • Listeners evaluating compliance tools should independently verify auditor accreditation regardless of which platform they use.

13:08 📢 Ryan – “I would argue the use of pre-filled templates is common…prefilled and directly copied templates between companies.”

19:04 Supply Chain Attack in litellm 1.82.8 on PyPI

  • Litellm versions 1.82.7 and 1.82.8 on PyPI were found to contain a malicious .pth file that executes automatically on every Python process startup, with no corresponding release on the official GitHub repository, indicating the PyPI account was likely compromised.
  • The malware follows a three-stage attack pattern: collecting SSH keys, cloud credentials, .env files, and Kubernetes configs; encrypting and exfiltrating them to a domain unrelated to legitimate litellm infrastructure; then attempting persistent backdoor installation via systemd and privileged Kubernetes pod creation.
  • The attack was discovered because a bug in the malware caused an exponential fork bomb through recursive .pth file triggering, which crashed the host machine and made the compromise visible rather than silent.
  • Any developer or CI/CD pipeline that pulled litellm as a transitive dependency after March 24, 2026, should treat all credentials on that machine as compromised and rotate SSH keys, cloud provider tokens, API keys, and database passwords immediately.
  • This incident highlights the risk of supply chain attacks through transitive dependencies, where a package you never directly installed can introduce malicious code into your environment, making dependency auditing and package integrity verification important practices for cloud-connected development workflows.
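The mechanism abused here is worth understanding: Python’s `site` module processes every `.pth` file in site-packages at interpreter startup, and any line that begins with `import` is executed as code. As a rough audit sketch (this helper is ours, not from the incident report), you can flag executable `.pth` lines for manual review:

```python
import site
from pathlib import Path

# Audit sketch (illustrative, not from the advisory): .pth files in
# site-packages are processed at every interpreter startup, and lines that
# begin with "import " are executed as code -- the hook the malicious
# litellm release abused. This flags such lines for human review; note that
# some legitimate packages (e.g. setuptools' _distutils_hack) also use it.
def find_executable_pth_lines(site_dirs=None):
    dirs = site_dirs or (site.getsitepackages() + [site.getusersitepackages()])
    suspicious = []
    for d in dirs:
        for pth in Path(d).glob("*.pth"):
            for lineno, line in enumerate(pth.read_text(errors="replace").splitlines(), 1):
                if line.startswith("import ") or line.startswith("import\t"):
                    suspicious.append((str(pth), lineno, line.strip()))
    return suspicious

# Review every executable .pth line on this interpreter's path:
for path, lineno, line in find_executable_pth_lines():
    print(f"{path}:{lineno}: {line}")
```

Anything this surfaces that you can’t attribute to a package you trust deserves the same credential-rotation treatment described above.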

21:21 📢 Justin – “Yeah… that’s bad too.” 

LiteLLM SOC 2

KUBECON EU

23:24 GKE and OSS innovation at KubeCon EU 2026

  • GKE Autopilot is no longer a cluster-level decision made at creation time. Standard clusters can now enable Autopilot compute classes on a per-workload basis, removing the need to create entirely new clusters when workload requirements change.
  • Google is open-sourcing the GKE Cluster Autoscaler, one of the core infrastructure provisioning components, with the goal of making it available to the broader Kubernetes community as a vendor-neutral tool.
  • llm-d, a Kubernetes-native distributed inference framework built with Red Hat and NVIDIA, has been accepted as a CNCF Sandbox project. It addresses inference-aware traffic management, multi-node replica orchestration, and KV cache offloading in a hardware-agnostic way.
  • Google released an open-source DRA driver for TPUs, coordinated with NVIDIA, which is donating its own DRA driver, establishing Dynamic Resource Allocation as a shared standard for describing specialized hardware across Kubernetes workloads.
  • TPU support is coming to Ray v2.55 with backing from both Google and Anyscale, and a new Ray History Server in alpha allows users to debug completed or terminated RayJobs using persisted logs, state, and metrics through the Ray Dashboard on GKE.

24:29 📢 Ryan – “It’s super nice of them to open source that, because it does seem like a very powerful thing to use. I love the idea of having individual workloads on a cluster and being able to delegate to managed and unmanaged… it’s kind of neat.”  

24:49 llm-d officially a CNCF Sandbox project

  • llm-d has been accepted as a CNCF Sandbox project, with Google Cloud as a founding contributor alongside Red Hat, IBM Research, CoreWeave, and NVIDIA. 
  • The project aims to extend Kubernetes for LLM inference workloads under an open-source model with no vendor lock-in, available at llm-d.ai.
  • The core technical contribution is model-aware request routing through the llm-d Endpoint Picker, which considers KV-cache hit rates, in-flight requests, and queue depth to direct traffic to optimal backends. 
  • In production testing on Vertex AI, this approach reduced Time-to-First-Token latency by over 35% for coding workloads and improved P95 tail latency by 52% for bursty chat workloads.
  • A notable outcome of the routing intelligence was doubling Vertex AI’s prefix cache hit rate from 35% to 70%, which directly reduces re-computation overhead and lowers cost-per-token for high-volume inference deployments.
  • Google leads development of the Kubernetes LeaderWorkerSet API, which llm-d uses to orchestrate prefill and decode disaggregation across independently scalable pods, supporting both TPU and GPU fleets at scale.
  • Google has also extended vLLM natively for Cloud TPUs with a unified PyTorch and JAX backend, delivering up to 5x throughput gains over the initial release. Pricing for running llm-d workloads depends on underlying GKE and accelerator costs, which vary by instance type and region.
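The routing idea above can be sketched in a few lines. To be clear, the field names and weights below are hypothetical and this is not llm-d’s actual Endpoint Picker logic, just the shape of the decision it describes: score each replica on expected KV-cache reuse, in-flight load, and queue depth, then send the request to the best-scoring backend.

```python
from dataclasses import dataclass

# Illustrative sketch only -- not llm-d's implementation. The Endpoint
# Picker described above weighs cache reuse against current load; here we
# model that as a simple linear score with made-up weights.
@dataclass
class Backend:
    name: str
    kv_cache_hit_rate: float  # expected prefix-cache reuse for this request, 0..1
    in_flight: int            # requests currently being served
    queue_depth: int          # requests waiting to be served

def pick_endpoint(backends, w_cache=1.0, w_load=0.05, w_queue=0.05):
    # Higher expected cache reuse is rewarded; load and queueing are penalties.
    def score(b):
        return w_cache * b.kv_cache_hit_rate - w_load * b.in_flight - w_queue * b.queue_depth
    return max(backends, key=score)

pool = [
    Backend("replica-a", kv_cache_hit_rate=0.7, in_flight=8, queue_depth=2),
    Backend("replica-b", kv_cache_hit_rate=0.1, in_flight=1, queue_depth=0),
]
print(pick_endpoint(pool).name)  # replica-a: cache reuse outweighs its higher load
```

This is why the prefix-cache hit rate doubled in Vertex AI’s testing: requests that share a prefix keep landing on the replica that already holds that prefix’s KV cache, instead of being spread round-robin.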

26:21 What’s new with Microsoft in open-source and Kubernetes at KubeCon + CloudNativeCon Europe 2026 

  • Dynamic Resource Allocation has reached general availability in Kubernetes, and Microsoft’s DRANet now includes upstream support for Azure RDMA NICs, meaning GPU-to-NIC topology alignment is handled at the scheduler level rather than through manual configuration. 
    • This matters for teams running distributed training workloads where network topology directly affects performance.
  • AI Runway is a new open-source project under the KAITO umbrella that provides a common Kubernetes API for inference workloads, with a web interface, HuggingFace model discovery, GPU memory fit indicators, and real-time cost estimates. 
    • It supports multiple runtimes, including NVIDIA Dynamo and KubeRay, giving platform teams a single control plane for model deployments without requiring end users to know Kubernetes.
  • AKS networking gets several notable updates, including Azure Kubernetes Application Network for identity-aware mTLS and traffic telemetry without a full service mesh, WireGuard encryption at the node level via Cilium, and Pod CIDR expansion that lets clusters grow IP ranges in place rather than requiring a full rebuild. 
    • Pricing for Advanced Container Networking Services features like Cilium mTLS is not specified in the announcement.
  • On the observability side, AKS now surfaces GPU utilization directly into managed Prometheus and Grafana, closing a monitoring gap that previously required manual exporter configuration.
    • A new agentic container networking interface also lets operators run natural-language diagnostic queries against live telemetry, reducing time to identify network issues.
  • Blue-green agent pool upgrades and agent pool rollback are now available in AKS, letting teams provision a parallel node pool with the new configuration, validate it, and revert to the previous Kubernetes version and node image if problems appear. 
  • AKS Desktop also reached general availability, giving developers a local environment that mirrors production AKS configuration.

27:42 📢 Ryan – “And if you’ve ever debugged an issue on Kubernetes, then you know that there’s logs everywhere that you have to go and review and correlate across each other, so having an agent that can go and look across all those places and diagnose issues is fantastic.” 

AI Is Going Great – Or How ML Makes Money 

28:22 Project SnowWork: The easiest way for business users to get work done

  • Snowflake announced Project SnowWork in Research Preview, an agentic AI platform targeting business users in finance, sales, marketing, and operations who need to complete multi-step data workflows without writing code or relying on technical teams.
  • The platform differentiates itself from general AI assistants by grounding outputs in an organization’s existing Snowflake data and automatically enforcing existing RBAC and governance policies, meaning users only see data they are already authorized to access.
  • Project SnowWork ships with pre-built persona profiles for specific business functions, so a finance user gets workflows tuned to FP&A KPIs and close narratives while a sales user gets pipeline risk summaries, rather than a one-size-fits-all interface.
  • Practical use cases highlighted include compressing financial close storytelling from days to a single workflow and replacing manual pipeline rollups with automated executive briefs, which gives listeners a concrete sense of the time savings being targeted.
  • Access is currently limited to a select group of customers in a collaborative research preview, so this is not a general availability release, and organizations interested in early access would need to engage directly with Snowflake.

27:42 📢 Ryan – “I do like the idea of bringing AI to the data rather than the data to the AI, which is a common problem, especially in enterprise platforms. I worry a little bit; the RBAC and authorization in Snowflake is very complex, and I wonder if people are actually going through and defining those in a way that would be proper segmentation? But I guess, you know, they have access to it today, they just have to know how to query it.”

30:10 OpenAI to acquire Astral 

  • OpenAI is acquiring Astral, the company behind three widely adopted Python developer tools: uv for dependency and environment management, Ruff for linting and formatting, and ty for type safety enforcement. 
  • The Astral team will join the Codex team after the deal closes, pending regulatory approval.
  • Codex has reached over 2 million weekly active users, with 3x user growth and 5x usage increase since the start of 2025. This acquisition appears aimed at deepening Codex’s ability to operate across the full Python development lifecycle rather than just generating code snippets.
  • The stated goal is to move Codex toward participating in complete development workflows, including planning changes, modifying codebases, running tools, verifying results, and maintaining software over time. Integrating Astral’s tooling directly into that workflow gives Codex agents access to infrastructure developers already use daily.
  • OpenAI has committed to continuing support for Astral’s open source projects after closing, which matters to the Python community given how widely these tools are already embedded in developer workflows. Developers using uv or Ruff should not expect immediate disruption to those projects.
  • For cloud and platform teams, this signals a trend toward AI coding agents that are tightly coupled with language-specific toolchains rather than operating as generic code generators, which could influence how development environments and CI/CD pipelines are structured going forward.

30:47 📢 Justin – “I don’t know why they needed to buy the company to do all this, it is open source already.” 

32:50 Anthropic just shipped an OpenClaw killer called Claude Code Channels, letting you message it over Telegram and Discord 

  • Anthropic released Claude Code Channels in version 2.1.80, enabling developers to connect their Claude Code sessions to Telegram and Discord bots, shifting from a synchronous chat model to an asynchronous, persistent agent that can work autonomously and notify users when tasks are completed.
  • The feature is built on Anthropic’s open-source Model Context Protocol, which acts as a standardized bridge between Claude Code and external messaging platforms. 
  • The setup uses the Bun JavaScript runtime to run a polling service that injects incoming messages as session events, allowing Claude to execute code, run tests, and reply back through the messaging app.
  • Practically, this eliminates the need for developers to maintain dedicated hardware like a Mac Mini running open-source agent frameworks 24/7, since Claude Code itself now handles session persistence when run in a background terminal or on a VPS.
  • The plugin architecture is open, with official Telegram and Discord connectors hosted on GitHub under Anthropic repositories, meaning the community can build additional connectors for platforms like Slack or WhatsApp without waiting for Anthropic to ship them natively.
  • The feature remains tied to Anthropic’s commercial subscriptions (Pro, Max, and Enterprise), so while the MCP layer is open, the underlying Claude model and Claude Code harness are proprietary, which is an important cost and vendor-lock consideration for teams evaluating this against self-hosted alternatives.

33:50 📢 Justin – “I tried to use this, and it didn’t work for me, but I didn’t have enough time to test it; I had too many Claude sessions going, and I needed to kill all of them and update properly to the 2.1.80 version. But I am curious to play with it a little more.”

35:34 Put Claude to work on your computer 

  • Anthropic has launched computer use capabilities in Claude Cowork and Claude Code, now in research preview for Pro and Max subscribers on macOS. Claude can directly control a browser, mouse, keyboard, and screen to complete tasks when no direct connector exists, with no setup required.
  • The feature follows a tool priority hierarchy, reaching for service connectors like Slack or Google Calendar first, then falling back to direct computer control. Claude requests explicit permission before accessing new applications and can be stopped at any point.
  • Anthropic has built in prompt injection safeguards by scanning model activations during computer use sessions. They acknowledge that the capability is still early and recommend users avoid sensitive data and start with trusted applications only.
  • Dispatch, released alongside this update, enables a continuous conversation thread between mobile and desktop, letting users assign tasks from their phone and pick up completed work on their computer. 
    • Use cases include automated morning email checks, scheduled metric pulls, and triggering Claude Code sessions for pull requests.
  • The combination of Dispatch and computer use means Claude can execute multi-step workflows on a desktop while the user is away, such as making IDE changes, running tests, and submitting a PR. 
  • Current limitations include macOS-only support, slower execution compared to direct integrations, and occasional need for retries on complex tasks.

36:28 📢 Ryan – “I didn’t know this was macOS only, because I was going to put it on my Linux server so I could get compute that wasn’t my laptop.” 

38:32 Auto mode for Claude Code

  • Anthropic launched auto mode for Claude Code in research preview for Team plan users, with Enterprise and API access coming soon. It works with both Claude Sonnet 4.6 and Opus 4.6, offering a middle ground between the default conservative permission prompts and the risky dangerously-skip-permissions flag.
  • The core mechanism is a classifier that reviews each tool call before execution, automatically blocking potentially destructive actions like mass file deletion, sensitive data exfiltration, or malicious code execution, while letting safe actions proceed without interruption.
  • This directly addresses a practical developer workflow problem: Claude Code’s default mode requires frequent human approvals that prevent truly unattended long-running tasks, and auto mode allows developers to kick off extended jobs without babysitting the process.
  • Anthropic is transparent about the limitations, noting the classifier may still allow some risky actions when user intent is ambiguous, and may occasionally block benign ones. They continue to recommend using it in isolated environments rather than treating it as a fully safe alternative.
  • There is a small performance tradeoff to be aware of, as auto mode adds some overhead to token consumption, cost, and latency per tool call due to the classifier running before each action.
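The gating pattern above is straightforward to picture. Anthropic’s actual classifier is a model, not a rule list, so the snippet below is purely a toy stand-in for the idea: review each proposed tool call before it executes, block anything that matches a destructive pattern, and let everything else run without a human prompt.

```python
import re

# Toy stand-in for the auto-mode idea described above. Anthropic's real
# classifier is model-based; these hand-written patterns exist only to show
# the review-before-execute flow, not what Claude Code actually checks.
DENY_PATTERNS = [
    r"rm\s+-rf\s+/",         # mass deletion from the filesystem root
    r"curl\s+[^|]*\|\s*sh",  # piping a remote script straight into a shell
    r"\.ssh/id_",            # touching private SSH key material
]

def review_tool_call(command: str) -> bool:
    """Return True if the command may run unattended, False to block it."""
    return not any(re.search(p, command) for p in DENY_PATTERNS)

print(review_tool_call("pytest -q"))                    # True: proceeds silently
print(review_tool_call("rm -rf / --no-preserve-root"))  # False: blocked
```

The tradeoffs Anthropic calls out map directly onto this shape: a reviewer that runs before every tool call adds latency and cost per call, and any classifier, rules or model, will sometimes block benign actions or pass risky ones when intent is ambiguous.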

AWS

41:21 Amazon Bedrock AgentCore Runtime now supports shell command execution

  • Amazon Bedrock AgentCore Runtime now includes InvokeAgentRuntimeCommand, an API that lets developers execute shell commands directly inside a running agent session, streaming output in real time over HTTP/2 and returning exit codes without custom container logic.
  • The practical benefit here is that AI agents frequently need to run deterministic operations like tests, dependency installs, or git commands alongside LLM reasoning, and previously, developers had to build all that process management themselves inside their containers.
  • Commands run in the same container, filesystem, and environment as the agent session and can execute concurrently with agent invocations without blocking, which simplifies architectures for coding agents, CI/CD automation, and similar workflows.
  • The feature is available across 14 AWS regions, including major US, European, and Asia Pacific locations, giving teams broad geographic coverage for latency-sensitive or data-residency-constrained workloads.
  • Pricing details are not specified in the announcement, so teams evaluating this should check the AgentCore Runtime pricing page directly before building cost models around heavy command execution workloads.

42:11 📢 Ryan – “I do get the advantages of this. Most of my use cases in GitHub Copilot or Claude Code, it’s running shell to do lots of things, especially executing tests, and so for CI/CD-type workflows, you couldn’t do anything without it. I’m really curious how teams were working around this; people that were previously using AgentCore, because I bet that is ugly. But yeah, it’s going to be dangerous.”

42:56 Amazon Inspector expands agentless EC2 scanning and introduces Windows KB-based findings

  • Amazon Inspector now supports agentless EC2 scanning for a broader range of software, including WordPress, Apache HTTP Server, Python packages, and Ruby gems, plus Windows OS vulnerabilities, with no configuration changes required for existing customers.
  • The new Windows KB-based findings consolidate multiple CVEs addressed by a single Microsoft patch into one finding, surfacing the highest CVSS score, EPSS score, and exploit availability, which reduces noise and makes remediation more straightforward.
  • All existing CVE-based Windows OS findings will automatically transition to KB-based findings, meaning security teams will see fewer duplicate alerts and can map findings directly to specific Microsoft patches via included KB article links.
  • The agentless approach lowers the operational overhead for security teams managing large EC2 fleets, particularly in environments where installing and maintaining agents is restricted or impractical.
  • Both capabilities are available across all AWS Regions where Amazon Inspector is currently offered, and pricing follows the existing Inspector model based on instance scanning volume, so customers should review the Inspector pricing page for current rates.

43:33 📢 Justin – “I’m actually shocked this wasn’t already there, because CVE is really just the generic way that you would find these, but typically they’re always linked to a Knowledge Base article, which then typically links you to the patch, so I don’t know how people got from the CVE to the patch without this before, other than maybe the CVE mentions the KB articles.”

22:53 Amazon ECR now supports pull-through cache for Chainguard

  • Amazon ECR pull-through cache now supports Chainguard as an upstream registry source, allowing customers to automatically sync Chainguard container images into ECR without building custom synchronization workflows.
  • Chainguard images are known for their minimal attack surface and security-focused builds, so pairing them with ECR’s native image scanning and lifecycle policies gives teams a more integrated security posture for their container supply chain.
  • The practical benefit here is operational simplicity: teams using Chainguard images at scale no longer need separate tooling to keep images current, as ECR handles the sync automatically and frequently.
  • Cached Chainguard images inherit standard ECR capabilities, including lifecycle policies for cost management and image scanning, which means customers get consistent governance across both their own images and upstream Chainguard images.
  • The feature is available in all AWS regions where ECR pull-through cache is supported, and pricing follows standard ECR storage and data transfer rates with no additional charge specific to the Chainguard integration. Full details are in the ECR pull-through cache documentation here.

46:22 📢 Matt – “It’s massive, but checks a box for your security team, right, that doesn’t want to understand how containers work. Just use this one, and you won’t have to worry about it. It’s like, but I can install anything I want on it. So is it actually going to help?”

47:57 AWS at 20*: Inside the rise of Amazon’s cloud empire, and what’s at stake in the AI era

  • AWS turns 20 this month, growing from 10 cents per compute hour in 2006 to nearly $129 billion in annual revenue, which would place it in the Fortune 500 top 40 as a standalone company. 
  • The article traces how S3 and EC2 established the pay-per-use primitive model that directly undercut Oracle-style licensing and reshaped enterprise IT economics.
  • Bedrock has become the fastest-growing service in AWS history, surpassing 100,000 customers and generating multi-billion dollar revenue with 60% quarter-over-quarter spending growth. AWS built it as a multi-model platform rather than pushing a single in-house option, following the same pattern it used with CPUs and GPUs by offering AMD, Intel, Graviton, Nvidia, and Trainium alongside each other.
  • Project Rainier, an AI compute cluster powered by over 500,000 Trainium2 chips in Indiana, represents AWS attempting to reduce dependence on Nvidia by building its own silicon stack from chip to data center. 
  • The OpenAI partnership, worth up to $100 billion in cloud commitments over eight years, brings OpenAI workloads onto Trainium chips, making it the second major AI lab after Anthropic to commit to Amazon’s custom silicon.
  • AWS still leads cloud revenue at over $116 billion annually, but Azure at $75 billion and Google Cloud at $50 billion annual run rates show the gap narrowing, particularly in AI workloads. 
  • Corey Quinn’s Cisco analogy is worth discussing: AWS could remain profitable and essential while becoming less central to where AI innovation actually happens.
  • Jassy has publicly projected AWS could reach $600 billion in annual revenue by 2036 with AI as the driver, backing that with $200 billion in capital expenditure planned for this year alone, which would consume nearly all of Amazon’s operating cash flow.
  • Happy Birthday 

49:37 AWS MCP Server (Preview) now with enhanced monitoring and semantic search capability

  • AWS MCP Server in preview now automatically publishes metrics to CloudWatch under the AWS-MCP namespace at no additional cost, covering invocation counts, success rates, client errors, server errors, and throttling for individual tools like the AWS API caller and Agent SOP retriever.
  • Agent SOPs are pre-built, tested workflows that guide AI assistants through complex multi-step AWS tasks, and the documentation search tool now uses semantic similarity so agents can discover the right SOP through natural language queries rather than exact keyword matching.
  • The CloudWatch integration addresses a previous gap where customers had no visibility into agent-driven changes, enabling teams to track usage patterns, identify permission issues, and configure alarms when error rates exceed defined thresholds.
  • The service is currently available only in US East (N. Virginia) in preview, which is worth noting for teams with data residency requirements or those operating primarily in other regions.
  • For listeners building AI-assisted infrastructure automation, this update provides a practical observability layer for MCP-based agents, which is increasingly relevant as teams adopt AI assistants for AWS operations tasks.
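The alarm use case in the bullets above is straightforward to wire up with boto3 once the AWS-MCP metrics are flowing. Here is a minimal sketch; note that the metric name ServerErrors and the ToolName dimension are illustrative assumptions, so confirm the actual metric and dimension names in the AWS MCP Server documentation before using this.

```python
# Alarm parameters for the AWS-MCP CloudWatch namespace described above.
# NOTE: the metric name "ServerErrors" and the "ToolName" dimension are
# assumptions for illustration -- check the AWS MCP Server docs for the
# real names before relying on them.
alarm_params = {
    "AlarmName": "mcp-server-error-rate",
    "Namespace": "AWS-MCP",
    "MetricName": "ServerErrors",
    "Dimensions": [{"Name": "ToolName", "Value": "aws-api-caller"}],
    "Statistic": "Sum",
    "Period": 300,               # evaluate over 5-minute windows
    "EvaluationPeriods": 3,      # require 3 consecutive breaching windows
    "Threshold": 10.0,           # alarm past 10 server errors per window
    "ComparisonOperator": "GreaterThanThreshold",
    "TreatMissingData": "notBreaching",
}

# Requires boto3 and AWS credentials; uncomment to actually create the alarm:
# import boto3
# cloudwatch = boto3.client("cloudwatch")
# cloudwatch.put_metric_alarm(**alarm_params)
print(alarm_params["AlarmName"])
```

The same pattern works for throttling or client-error metrics; only the MetricName and Threshold change.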

50:26 📢 Ryan – “Why did everything go offline? Now you can find out!” 

GCP

50:59 Cloud SQL read pools support autoscaling 

  • Cloud SQL read pools, now generally available for Enterprise Plus edition, let you provision up to 20 read replicas behind a single load-balanced endpoint for MySQL and PostgreSQL, removing the need to manually manage multiple replicas or reconfigure applications when nodes are added or removed.
  • The new autoscaling feature dynamically adjusts node count based on CPU utilization or database connection thresholds, with users defining minimum and maximum node counts so the pool scales within those bounds automatically during traffic fluctuations.
  • Pools with two or more nodes are backed by a 99.99% availability SLA that covers maintenance downtime, and configuration changes like VM type or database flag updates are applied across all nodes with near-zero downtime.
  • From a cost perspective, autoscaling helps avoid over-provisioning by scaling in during low-traffic periods, meaning you pay only for nodes actively in use rather than maintaining a fixed fleet sized for peak load.
  • Retail and other industries with variable workloads are a natural fit, and teams can get started via gcloud CLI, Terraform, or the REST API, with a 30-day free trial available at cloud.google.com/sql for hands-on access to Enterprise Plus features.
  • Want to sign up for a free trial of Cloud SQL? You can do that here
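The autoscaling behavior described above is easy to picture as a bounded scaling rule: derive a target node count from observed CPU utilization, then clamp it to the configured minimum and maximum. This is a back-of-the-envelope sketch of that idea in Python, not Google's actual algorithm; the 60% target utilization is a made-up parameter.

```python
import math

def target_nodes(current_nodes: int, cpu_util: float,
                 target_util: float = 0.6,
                 min_nodes: int = 1, max_nodes: int = 20) -> int:
    """Rough sketch of a CPU-driven scaling rule (not Cloud SQL's algorithm).

    Size the pool so that, if load stays constant, per-node CPU utilization
    lands near target_util, clamped to [min_nodes, max_nodes] -- mirroring
    the user-defined bounds the feature exposes (max 20 nodes per pool).
    """
    desired = math.ceil(current_nodes * cpu_util / target_util)
    return max(min_nodes, min(max_nodes, desired))

# A 4-node pool running hot at 90% CPU grows toward 6 nodes...
assert target_nodes(4, 0.90) == 6
# ...while an idle 8-node pool at 10% CPU shrinks toward 2.
assert target_nodes(8, 0.10) == 2
```

The min/max clamp is what makes this safe to run unattended: scale-in can never drop below your floor, and a traffic spike can never scale costs past your ceiling.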

52:29 📢 Matt – “The feature here I actually like is that it autoscales reads… nothing I’ve seen will do auto scaling on the reads for SQL and scale it out horizontally in that way. Like, even Aurora, if you’re on the normal one, you build a read replica, you have to build each read replica, and then either route or round robin to those ones. So if it’s actually going to do automatic adding and removing based on capacity needs, that’s a pretty nice feature because it can save you a lot of money.” 

53:35 Design UI using AI with Stitch from Google Labs

  • Google Labs has evolved Stitch (stitch.withgoogle.com) into an AI-native design canvas that converts natural language descriptions into high-fidelity UI designs, targeting both professional designers and non-designers who want to move from concept to prototype quickly.
  • The updated tool introduces an infinite canvas, a design agent that reasons across a project’s full history, and an Agent Manager for running multiple design directions in parallel, which addresses a common pain point of managing divergent design explorations.
  • DESIGN.md is a notable addition that lets users extract and export design systems as an agent-friendly markdown file, making it easier to apply consistent design rules across projects or share them with other tools without starting from scratch each time.
  • Stitch connects to developer workflows through an MCP server and SDK, with export options to AI Studio and Antigravity, positioning it as a handoff layer between design and development rather than a standalone tool.
  • Pricing details are not specified in the announcement, so listeners interested in using Stitch for production workflows should check the documentation at stitch.withgoogle.com for current access and cost information.

Stitch Example

55:20 📢 Ryan – “I was developing something for my family, and it looks like you would expect, and so I can’t wait to try this out. And it was really impressive how fast, and how little feedback you gave it.” 

Azure

56:12 Microsoft at NVIDIA GTC: New solutions for Microsoft Foundry, Azure AI infrastructure and Physical AI 

  • Microsoft Foundry Agent Service and Observability in Foundry Control Plane are now generally available, giving enterprise teams a unified platform to build, deploy, and monitor AI agents with end-to-end visibility into agent behavior across tools, data, and workflows.
  • Azure is the first hyperscale cloud to power on NVIDIA Vera Rubin NVL72 systems in its labs, with rollout planned to liquid-cooled datacenters over the coming months, following deployment of hundreds of thousands of Grace Blackwell GPUs in under a year. 
  • This positions Azure as a target platform for inference-heavy and reasoning-based workloads at scale.
  • NVIDIA Nemotron models are now available through Microsoft Foundry, and the Fireworks AI integration allows customers to fine-tune open-weight models into low-latency deployments that can be distributed to the edge. 
  • Pricing for these models is not specified in the announcement and would vary based on usage.
  • Microsoft is extending NVIDIA Vera Rubin platform support to Azure Local, allowing organizations in sovereign and regulated environments to run next-generation AI workloads while maintaining Azure-consistent governance through Azure Arc and Foundry Local.
  • A new Physical AI Toolchain, available via a public GitHub repository, integrates NVIDIA Physical AI Data Factory with Azure services, enabling developers to build robotics and physical AI workflows that connect physical assets, simulation environments, and cloud training into repeatable enterprise pipelines.

57:38 📢 Justin – “Skynet is VERY excited.” 

59:06 Microsoft 365 pauses Copilot creep after admins cry foul

  • Microsoft has paused the automatic deployment of the Microsoft 365 Copilot app to desktop users, halting a rollout that had already slipped twice from its original October 2025 target date. 
  • The pause has no specified end date, and existing installations remain unaffected.
  • The core admin complaint was that the opt-out default model increased IT workload by forcing organizations to set policies on Microsoft’s timeline rather than their own. Admins who want to proceed with deployment can still do so manually through other available methods.
  • European Economic Area customers were already excluded from this rollout, likely reflecting ongoing regulatory considerations around default software installations in that region.
  • This pause aligns with broader reported changes to Microsoft’s approach of embedding Copilot across Windows 11 surfaces, suggesting some recalibration of how aggressively the assistant is pushed to end users. 
  • For IT decision-makers, the key takeaway is that centralized control over AI tool deployment remains a practical concern, and Microsoft’s willingness to halt the rollout signals that enterprise admin feedback carries weight in deployment decisions.

59:45 📢 Justin – “Don’t force your IT people to do things. That’s not good. They’re already overworked and stressed.” 

1:00:46 Advancing agentic AI with Microsoft databases across a unified data estate 

  • Microsoft announced a savings plan for databases at SQLCon 2026, offering up to 35% savings versus pay-as-you-go pricing on a one-year hourly spend commitment, automatically applied across eligible Azure database services, including Azure SQL.
  • GitHub Copilot is now generally available in SQL Server Management Studio 22, bringing chat and T-SQL code assistance directly into SSMS for developers and DBAs who already use Copilot in Visual Studio and VS Code.
  • Azure SQL Database Hyperscale gained new public preview features, including a SQL MCP Server for connecting SQL data to AI agents, larger 160 and 192 vCore options, and enhanced vector indexes with full insert, update, and delete support requiring no code changes.
  • SQL database in Fabric reached general availability for several enterprise security features, including SQL Auditing, Customer-Managed Keys, and Dynamic Data Masking, with workspace-level Private Link in preview, targeting customers with strict governance and compliance requirements.
  • Microsoft introduced the Database Hub in Fabric, now in early access, providing a single management plane across Azure SQL, Cosmos DB, PostgreSQL, MySQL, and Arc-enabled SQL Server, with agent-assisted monitoring that surfaces estate-wide signals and recommended actions. 
  • Interested in signing up for Database Hub? You can do that here
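To make the "up to 35%" savings-plan figure concrete, here is a quick back-of-the-envelope calculation; the $2/hour pay-as-you-go rate is a made-up number purely for illustration.

```python
# Hypothetical numbers for illustration only.
payg_hourly = 2.00          # pay-as-you-go rate in $/hour (made up)
discount = 0.35             # "up to 35%" savings from the 1-year plan
hours_per_year = 24 * 365

committed_hourly = payg_hourly * (1 - discount)
annual_savings = (payg_hourly - committed_hourly) * hours_per_year

print(f"${committed_hourly:.2f}/hr committed vs ${payg_hourly:.2f}/hr PAYG")
print(f"~${annual_savings:,.0f} saved per year at this rate")
```

At a steady $2/hour, the one-year commitment works out to roughly $6,100 in annual savings; the real discount depends on which eligible database services your hourly spend lands on.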

1:01:37 📢 Matt – “There’s a lot of ‘things’ in this blog post; the biggest one for me is the savings plan for databases… It’s just built in there now. It really means you can get those savings; you don’t have to commit or be a hyperscaler.” 

1:03:28 Generally Available: Versionless key support for transparent data encryption in Azure SQL Database 

  • Azure SQL Database now supports versionless keys for transparent data encryption, meaning customers can point to a key in Azure Key Vault without pinning to a specific version, and the database will automatically use the latest key version as it rotates.
  • This reduces operational overhead for teams managing customer-managed keys, eliminating the manual step of updating TDE configurations each time a key is rotated in Azure Key Vault or Managed HSM.
  • The practical benefit is improved reliability around key rotation workflows, since a missed version update could previously cause access disruptions to encrypted databases, a real risk in regulated industries with frequent rotation policies.
  • This feature is generally available and integrates with existing Azure Key Vault and Managed HSM setups, so customers already using bring-your-own-key TDE can adopt versionless references without rebuilding their encryption architecture.
  • No additional cost is associated with this feature beyond standard Azure Key Vault or Managed HSM pricing, making it a straightforward operational improvement for any Azure SQL Database customer using customer-managed keys.
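The difference comes down to the Key Vault key identifier you hand to TDE: a versioned URI pins one specific key version, while dropping the trailing version segment lets the database follow rotation automatically. A small sketch of that URI distinction (the vault and key names are made up):

```python
# Hypothetical Key Vault identifiers for a TDE protector key.
versioned = "https://contoso-kv.vault.azure.net/keys/tde-key/0123456789abcdef0123456789abcdef"
versionless = "https://contoso-kv.vault.azure.net/keys/tde-key"

def is_versionless(key_id: str) -> bool:
    """A Key Vault key URI has the form .../keys/<name>[/<version>]; three
    path segments after the host mean a pinned version is present."""
    path = key_id.split("//", 1)[1].split("/", 1)[1]  # strip scheme + host
    return len(path.split("/")) == 2  # just "keys/<name>"

assert not is_versionless(versioned)   # pinned: must be updated on each rotation
assert is_versionless(versionless)     # always resolves to the latest version
```

With the versionless form, rotating the key in Key Vault or Managed HSM requires no corresponding TDE configuration change, which is exactly the manual step this feature eliminates.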

1:04:10 📢 Justin – “There’s no additional cost for this, and thank god, because this is the dumbest feature I’ve ever heard of in my entire life. Why does it not just do it automatically?”  

1:06:24 Microsoft Launches Azure Skills Plugin to Give AI Coding Agents Real Azure Expertise

  • Microsoft released the Azure Skills Plugin, available at aka.ms/azure-plugin, which bundles over 19 curated Azure workflow skills, the Azure MCP Server with 200+ tools across 40+ Azure services, and the Foundry MCP Server into a single install for AI coding agents. 
  • The goal is to move agents beyond generic code suggestions toward actual Azure deployment actions like provisioning, cost optimization, and live diagnostics.
  • The skills layer is the core differentiator here, encoding decision trees and sequencing logic for real Azure workflows rather than simple prompt snippets. Key skills include azure-prepare for generating infrastructure code, azure-validate for pre-flight checks, azure-deploy for orchestrating through the Azure Developer CLI, and azure-diagnostics for troubleshooting using logs and KQL queries.
  • The plugin is designed to be portable across agent hosts, including GitHub Copilot in VS Code, Copilot CLI, and Claude Code, with configuration handled automatically through a .mcp.json file and a .github/plugins/azure-skills folder. Teams using multiple agent tools do not need to maintain separate configurations for each.
  • Microsoft is explicit that this setup requires real credentials and real Azure resources, recommending least-privilege access, explicit tool approvals, and skills sourced only from trusted repositories. This positions the agent as a supervised collaborator rather than an autonomous actor, which is a practical consideration for teams evaluating security posture.
  • Prerequisites include Node.js 18 or later, Azure CLI authenticated via az login, and optionally the Azure Developer CLI for deployment workflows. No specific pricing is listed for the plugin itself, though costs will vary based on the underlying Azure services and resources the agent provisions during use.

1:07:48 📢 Matt – “I’m actually most excited for the KQL feature because writing KQL is like writing SQL, but harder, but also I’m terrible at both, so don’t judge that one statement. But if I can live, just tell it to search the logs in a certain way, because right now I just have this terrible workflow of Claude – this is what I’m looking for in KQL. Copy-paste, take the screenshot, put it back over here, copy-paste, and iterate through this very slow cycle. So if I can have it understand KQL, so much better.” 

1:08:56 Azure DevOps Remote MCP Server Lands in Microsoft Foundry, Giving AI Agents Direct Access to Your DevOps Data

  • Microsoft launched the Azure DevOps Remote MCP Server in public preview on March 17, followed by its integration into Microsoft Foundry two days later. 
  • The server gives AI agents a hosted, authenticated connection to Azure DevOps data, including work items, pull requests, pipelines, repos, and wikis via a single URL endpoint at mcp.dev.azure.com.
  • Authentication runs entirely through Microsoft Entra, meaning organizations apply their existing identity policies, conditional access rules, and permission boundaries to agent access without building separate integrations. Notably, only Entra-backed Azure DevOps organizations are supported, leaving MSA-backed and on-premises deployments without this option for now.
  • Two access control headers stand out for enterprise use: X-MCP-Readonly restricts agents to read-only operations, and X-MCP-Toolsets lets teams scope which tool categories an agent can access. This shifts the governance conversation from whether agents should touch DevOps data to defining the specific conditions under which they can.
  • The Foundry integration connects Azure DevOps data to Foundry’s full agent development lifecycle, including model access, orchestration, evaluation, and deployment. Teams can add the server through the Foundry tool catalog and control which specific operations each agent is permitted to perform.
  • Current limitations worth noting include client support restricted to Visual Studio and VS Code without extra setup, while Claude Desktop, GitHub Copilot CLI, and ChatGPT require additional OAuth configuration in Entra before connecting. Microsoft has also indicated plans to eventually archive the local MCP Server in favor of this remote version, so teams on the local server should begin evaluating migration. No separate pricing has been announced beyond standard Azure DevOps and Foundry costs. 
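As a sketch of how those two headers might look in practice, here is a hypothetical MCP client configuration; the exact schema varies by client, the URL path and toolset names are illustrative assumptions, and only the endpoint hostname and the two header names come from the announcement.

```json
{
  "servers": {
    "azure-devops": {
      "type": "http",
      "url": "https://mcp.dev.azure.com/<your-organization>",
      "headers": {
        "X-MCP-Readonly": "true",
        "X-MCP-Toolsets": "workitems,pullrequests"
      }
    }
  }
}
```

Setting X-MCP-Readonly keeps the agent from mutating anything, and X-MCP-Toolsets narrows it to the listed tool categories, so a misbehaving agent is bounded by configuration rather than trust.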

Oracle

1:11:13 Oracle Releases Java 26 

  • Java 26 ships 10 JDK Enhancement Proposals covering AI integration, cryptography, and language simplification, including HTTP/3 support in the HTTP Client API and a fourth preview of primitive types in pattern matching. None of these are final features yet, with several JEPs still in preview or incubator status after multiple rounds.
  • The Ahead-of-Time Object Caching feature from Project Leyden is worth noting as it extends startup time improvements to work with any garbage collector, including ZGC, which addresses a practical pain point for cloud-native Java deployments where cold start latency matters.
  • Oracle is launching the Java Verified Portfolio, a bundled support offering covering JavaFX, Helidon, and the VS Code Java extension, included free for Java SE subscribers and OCI customers running Java workloads. For everyone else, pricing is not explicitly stated beyond noting that many components remain free for a wide range of use cases.
  • The Applet API removal in JEP 504 is notable mainly as a cleanup item, having been deprecated since JDK 17, and signals Oracle is willing to break legacy compatibility when features have been sufficiently warned about over multiple release cycles.
  • Helidon is being proposed as an OpenJDK project and aligned to the Java release cadence, which tightens Oracle’s control over the microservices framework ecosystem while keeping it open source, a pattern Oracle has used with other technologies in its portfolio.

1:11:21 📢 Justin – “They brought AI to Java, and all is going to be lost.” 

1:12:21 Oracle Unveils AI Database Agentic Innovations for Business Data

  • Oracle announced a bundle of agentic AI capabilities for Oracle AI Database at its AI World Tour in London, centered on keeping AI workloads closer to the data rather than moving data to external AI systems. 
  • The headline additions include the Autonomous AI Vector Database in limited availability on free and low-cost developer tiers, a Private Agent Factory for no-code agent building, and a Unified Memory Core for storing agent context across multiple data types in a single engine.
  • The security angle is notable here. Oracle Deep Data Security and the Private AI Services Container are positioned to address prompt injection and data leakage risks by enforcing least-privilege access at the database layer rather than in application code, which is a practical concern for enterprises deploying agents against sensitive business data.
  • Oracle Trusted Answer Search takes a conservative approach to reducing hallucinations by matching user questions to pre-built reports via vector search rather than letting an LLM answer directly, which trades flexibility for determinism and may suit regulated industries but limits open-ended query use cases.
  • The open standards additions, specifically Vectors on Ice for Apache Iceberg support and an Autonomous AI Database MCP Server, are worth noting because they reduce some of the lock-in concerns that typically follow Oracle announcements, though customers still need to be running Oracle AI Database to benefit.
  • Pricing details are sparse in the announcement. The Autonomous AI Vector Database is available through the Oracle Cloud free tier or a low-cost developer tier, with a one-click upgrade path to full Autonomous AI Database, but Oracle has not published specific per-unit costs for the new agentic capabilities.

Matt's Forest

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
