316: Microsoft’s New AI Agent Has Trust Issues (With Software)

Welcome to episode 316 of The Cloud Pod, where the forecast is always cloudy! This week we’ve got earnings (with sound effects, obviously) as well as news from DeepSeek, DocumentDB, and DigitalOcean, plus a bunch of GPU news. Justin and Matt are here to lead you through all of it, so let’s get started! 

Titles we almost went with this week:

  • 🧌Lake Sentinel: The Security Data Monster Nobody Asked For
  • 🪪Certificate Authority Issues: When Your Free Lunch Gets a Security Audit
  • 🔕Slash and Learn: Gemini Gets Command-ing
  • ⚓DigitalOcean Drops Anchor in AI Waters with Gradient Platform
  • 😭The Three Stages of Azure Grief: Development, Preview, and Launch
  • 🔷E for Enormous: Azure’s New VM Sizes Are Anything But Virtual
  • 📲SRE You Later: Azure’s AI Agent Takes Over Your On-Call Duties
  • 🧑‍🔬Site Reliability Engineer? More Like AI Reliability Engineer
  • 💾Azure Disks Get Elastic Waistbands
  • 🕴️Agent Smith Would Be Proud: Google’s Multi-Agent Matrix Gets Real
  • 🧨C4 Yourself: Google Explodes Into GA with Intel’s Latest Silicon
  • 💰The Cost is Right: GCP Edition
  • 🪙Penny for Your Cloud Thoughts: 💸Google’s Budget-Friendly Update
  • 🍽️DocumentDB Goes on a Diet: Now Available in Serverless Size
  • 👩‍❤️‍👨MongoDB Compatibility Gets the AWS Serverless Treatment
  • 🧑‍💻No Server? No Problem: DocumentDB Joins the Serverless Party
  • 🚣Stream Big or Go Home: Lambda’s 10x Payload Boost
  • 🎏Lambda Response Streaming: Because Size Matters
  • 🛍️GPT Goes Open Source Shopping
  • GPT’s Open Source Awakening
  • 🦠When Your Antivirus Needs an Antivirus: Enter Project Ire
  • 🧑‍💻The Opus Among Us: Anthropic’s Coding Assistant Gets an Upgrade
  • 🚣Serverless is becoming serverful in streaming responses

General News 

02:08 It’s Earnings Time! (INSERT AWESOME SOUND EFFECTS HERE) 

02:16 Alphabet beats earnings expectations, raises spending forecast

  • Google Cloud revenue hit $13.62 billion, up 32% year-over-year, with OpenAI now using Google’s infrastructure for ChatGPT, signaling growing enterprise confidence in Google’s AI infrastructure capabilities.
  • Alphabet is raising its 2025 capital expenditure forecast from $75 billion to $85 billion, driven by cloud and AI demand, with plans to increase spending further in 2026 as it competes for AI workloads.
  • AI Overviews now serves 2 billion monthly users across 200+ countries, while the Gemini app reached 450 million monthly active users, demonstrating Google’s scale in deploying AI services globally.
  • The $10 billion increase in planned capital spending reflects the infrastructure arms race among cloud providers to capture AI workloads, which require significant compute and specialized hardware investments.
  • Google’s cloud growth rate of 32% outpaces its overall revenue growth of 14%, indicating the strategic importance of cloud services as traditional search and advertising face increased AI competition.

03:55 📢 Justin – “I don’t know what it takes to actually run one of these large models at like ultimate scale that like a ChatGPT needs or Anthropic, but I have to imagine it’s just thousands and thousands of GPUs just working nonstop.”

04:31 Microsoft (MSFT) Q4 earnings report 2025

  • Microsoft reported Q4 fiscal 2025 earnings with revenue of $76.44 billion, up 18% year-over-year and beating expectations, marking the fastest growth in over three years.
  • Azure revenue grew 39% in Q4, significantly exceeding analyst expectations of 34-35%, with Microsoft disclosing for the first time that Azure and cloud services exceeded $75 billion in annual revenue for fiscal 2025.
  • Microsoft’s AI investments are showing returns with 100 million monthly active users across Copilot products, driving higher revenue per user for Microsoft 365 commercial cloud products.
  • Capital expenditures reached $24.2 billion for the quarter, up 27% year-over-year, as Microsoft continues aggressive data center buildout for AI workloads alongside peers like Alphabet ($85B annual) and Meta ($66-72B annual).
  • Microsoft’s market cap crossed $4 trillion in after-hours trading, becoming only the second company, after Nvidia, to reach this milestone, driven by strong cloud and AI momentum.

06:33 Amazon earnings key takeaways: AI, cloud growth, tariffs

  • Things weren’t quite as great for Amazon…
  • Amazon’s capital expenditure could reach $118 billion in 2025, up from the previous $100 billion forecast, with spending primarily focused on AI infrastructure alongside competitors Meta ($66-72B) and Alphabet ($85B).
  • AWS revenue grew 18% year-over-year, trailing Microsoft Azure’s 39% and Google Cloud’s 32% growth rates, though AWS maintains a significantly larger market share with the second player at approximately 65% of AWS’s size.
  • Amazon’s generative AI initiatives are generating multiple billions in annualized revenue for AWS, with potential monetization through services like Alexa+ at $19.99/month or free for Prime members.
  • Despite initial concerns about tariffs impacting costs, Amazon reported 11% growth in online store sales and 12% increase in items sold, with no significant price increases or demand reduction observed.
  • The company expects Q3 revenue growth of up to 13%, suggesting tariffs have been absorbed by suppliers and customers, though uncertainty remains with the U.S.-China trade agreement deadline on August 12.

08:08 📢 Justin – “They’re not there yet. And they, they haven’t been there for a while, which is the concerning part. And I don’t know, you know – I haven’t really heard much about Nova since they launched.  They talk a lot about their Anthropic partnership, which makes sense. But I don’t feel like they have the swagger in AI that the others do.”

AI Is Going Great – or How ML Makes Its Money 

11:23 Gemini 2.5: Deep Think is now rolling out

  • Google’s Gemini 2.5 Deep Think uses parallel thinking techniques and extended inference time to solve complex problems, now available to Google AI Ultra subscribers in the Gemini app with a fixed daily prompt limit.
  • The model achieves state-of-the-art performance on LiveCodeBench V6 and Humanity’s Last Exam benchmarks, with a variation reaching gold-medal standard at the International Mathematical Olympiad, though the consumer version trades some capability for faster response times.
  • Deep Think excels at iterative development tasks like web development, scientific research, and algorithmic coding problems that require careful consideration of tradeoffs and time complexity.
  • The technology uses novel reinforcement learning techniques to improve problem-solving over time and automatically integrates with tools like code execution and Google Search for enhanced functionality.
  • Google plans to release Deep Think via the Gemini API to trusted testers in the coming weeks, signaling potential enterprise and developer applications for complex reasoning tasks in cloud environments.

13:02 📢 Justin – “…these deep thinking models are the most fun to play with, because you know, you don’t need it right away, but you want to go plan out a weekend in Paris, or I want you to go compare these three companies’ products based on public data and Reddit posts and things like that. And it goes, it does all this research, then it comes back with suggestions. That’s kind of fun. The more in-depth it is, the better it is, in my opinion. So the deep thinking stuff is kind of the coolest, like heavy-duty research stuff.”

14:17 Introducing gpt-oss

  • OpenAI is releasing gpt-oss-120b and gpt-oss-20b, new open-weight language models that deliver strong real-world performance at low cost. 
  • Both are available under the flexible Apache 2.0 license; the models perform strongly on reasoning tasks, demonstrate strong tool-use capabilities, and are optimized for efficient deployment on consumer hardware. 
  • The gpt-oss-120b model achieves near-parity with OpenAI o4-mini on core reasoning benchmarks while running efficiently on a single 80 GB GPU. 
  • The gpt-oss-20b model delivers similar results to OpenAI o3-mini on common benchmarks and can run on edge devices with just 16 GB of memory, making it ideal for on-device use cases, local inference, or rapid iteration without costly infrastructure (see the sketch after this list). 
  • Both models are also compatible with the Responses API and are designed for agentic workflows, with exceptional instruction following, tool use (like web search or Python code execution), and reasoning capabilities. 
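
Because these models speak the same APIs as OpenAI’s hosted ones, trying gpt-oss-20b locally is mostly a matter of pointing a client at wherever you serve it. Here’s a minimal sketch assuming an OpenAI-compatible server (vLLM and Ollama both expose one) is already running on localhost; the base URL, port, and model name are assumptions to adapt, not part of the announcement.

```python
from openai import OpenAI

# Assumes gpt-oss-20b is already being served behind a local
# OpenAI-compatible endpoint (e.g. vLLM or Ollama); base_url, port,
# and the exact model name are assumptions to adapt to your setup.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-locally")

resp = client.chat.completions.create(
    model="gpt-oss-20b",
    messages=[{"role": "user", "content": "Summarize the Apache 2.0 license in two sentences."}],
)
print(resp.choices[0].message.content)
```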

15:30 📢 Matt – “I’m still stuck on the 16 gigabytes of memory on your video card. I still remember, I bought my first video card, it had 256 megabytes. It was a high-end video card. And now I’m like, God, these things got so much bigger and faster. Okay, I’m officially old.”

16:43 Project Ire autonomously identifies malware at scale – Microsoft Research

  • Microsoft Research developed Project Ire, an autonomous AI agent that reverse engineers software files to determine if they’re malicious, achieving 0.98 precision and 0.83 recall on Windows driver datasets. The system uses LLMs combined with decompilers, binary analysis tools, and memory sandboxes to analyze code without human assistance.
  • The technology addresses a significant cloud security challenge where Microsoft Defender scans over 1 billion devices monthly, requiring manual review of suspicious files by experts who face burnout and alert fatigue. Project Ire automates this gold-standard malware classification process at scale.
  • The system creates an auditable “chain of evidence” for each analysis, using tools like angr and Ghidra to reconstruct control flow graphs and identify malicious behaviors like process termination, code injection, and command-and-control communication. It was the first reverse engineer at Microsoft (human or machine) to author a conviction case for blocking an APT malware sample.
  • In real-world testing on 4,000 hard-target files that couldn’t be classified by other automated systems, Project Ire achieved 0.89 precision with only 4% false positives, demonstrating potential for deployment alongside human analysts. 
  • The prototype will be integrated into Microsoft Defender as Binary Analyzer for threat detection.
  • This development represents a practical application of agentic AI in cybersecurity, building on the same foundation as GraphRAG and Microsoft Discovery, with future goals to detect novel malware directly in memory at cloud scale.
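
As a point of reference for the tooling mentioned above, here’s a minimal sketch of the kind of control-flow-graph recovery angr performs. This is purely illustrative of the technique, not Project Ire’s actual pipeline, and the sample path is a placeholder.

```python
import angr  # pip install angr

# Load a binary without auto-loading shared libraries; the path is a placeholder.
proj = angr.Project("suspicious_sample.exe", auto_load_libs=False)

# Recover a control flow graph: the kind of artifact the post says
# Project Ire reconstructs before reasoning about behavior.
cfg = proj.analyses.CFGFast()
print(f"{len(cfg.graph.nodes())} basic blocks, {len(cfg.graph.edges())} edges")

# List recovered functions for further inspection.
for func in cfg.kb.functions.values():
    print(hex(func.addr), func.name)
```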

19:15 📢 Justin – “I can think of all the things that can make us more efficient at and more productive with, and it’s like wow, that’s a great use case… it just takes away all of the noise.” 

27:22 Claude Opus 4.1 \ Anthropic

  • Claude Opus 4.1 achieves 74.5% on the SWE-bench Verified coding benchmark, with GitHub reporting notable improvements in multi-file code refactoring and Rakuten praising its precision in debugging large codebases without introducing bugs.
  • The model is available across major cloud platforms, including Amazon Bedrock and Google Cloud’s Vertex AI, at the same pricing as Opus 4, making it accessible for enterprise cloud deployments.
  • Opus 4.1 uses a hybrid reasoning approach with extended thinking capabilities up to 64K tokens for complex benchmarks, while maintaining simpler scaffolding for coding tasks using just bash and file editing tools.
  • Windsurf reports the upgrade delivers a one standard deviation improvement over Opus 4 on their junior developer benchmark, comparable to the performance leap between Sonnet 3.7 and Sonnet 4.
  • For cloud developers, the immediate upgrade path is straightforward – simply switch to claude-opus-4-1-20250805 via the API, with no pricing changes or major integration modifications required.
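
To show how small that upgrade really is, here’s a minimal sketch using Anthropic’s Python SDK; it assumes ANTHROPIC_API_KEY is set in the environment, and the prompt is a placeholder.

```python
from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-opus-4-1-20250805",  # the only change from an Opus 4 call
    max_tokens=1024,
    messages=[{"role": "user", "content": "Refactor this function for readability: ..."}],
)
print(message.content[0].text)
```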

AWS

29:09 Announcing general availability of Amazon EC2 G6f instances with fractional GPUs – AWS

  • AWS launches G6f instances with fractional GPU capabilities, offering 1/8, 1/4, and 1/2 GPU partitions powered by NVIDIA L4 Tensor Core GPUs, enabling customers to right-size workloads and reduce costs compared to full GPU instances.
  • The instances target graphics workloads, including remote workstations for media production, CAD engineering, ML research, and game streaming, with configurations ranging from 3-12 GB GPU memory paired with AMD EPYC processors.
  • This represents AWS’s first GPU partitioning offering, addressing the common challenge of GPU underutilization where workloads don’t require full GPU resources but previously had no smaller options.
  • Available across 11 regions with On-Demand, Spot, and Savings Plan pricing options, requiring NVIDIA GRID driver 18.4+ and supporting Amazon DCV for remote desktop access.
  • The fractional approach could significantly reduce costs for organizations running multiple smaller GPU workloads that previously required dedicated full GPU instances, particularly beneficial for development, testing, and lighter production workloads.
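
Launching a fractional size looks like any other EC2 launch; a minimal boto3 sketch follows. The AMI ID is a placeholder, and mapping g6f.large to the 1/8-GPU partition is our assumption; check the G6f documentation for the actual size-to-fraction table.

```python
import boto3

ec2 = boto3.client("ec2")

# Launch a fractional-GPU instance. Mapping g6f.large to the 1/8-L4
# partition is an assumption; confirm sizes in the G6f documentation.
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder: an AMI with NVIDIA GRID driver 18.4+
    InstanceType="g6f.large",
    MinCount=1,
    MaxCount=1,
)
```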

30:15 📢 Matt – “The fractional GPUs are an interesting concept; most people probably don’t need a massive GPU… so if you’re just doing one-off things or you need it for a specific project, then you can get that small usage.”

31:07 Amazon DocumentDB Serverless is now available | AWS News Blog 

  • Amazon DocumentDB Serverless automatically scales compute and memory using DocumentDB Capacity Units (DCUs), where each DCU provides approximately 2 GiB of memory plus corresponding CPU and networking resources, with a capacity range of 0.5-256 DCUs.
  • The service offers up to 90% cost savings compared to provisioning for peak capacity and charges a flat rate per second of DCU usage, making it cost-effective for variable workloads, multi-tenant environments, and mixed read/write scenarios.
  • Existing DocumentDB clusters can add serverless instances without data migration by simply changing the instance type, requiring DocumentDB version 5.0 or higher, with the ability to mix provisioned and serverless instances in the same cluster.
  • Key use cases include handling traffic spikes for promotional events, managing individual database capacity across multi-tenant SaaS applications, and building agentic AI applications that leverage DocumentDB’s built-in vector search capabilities.
  • The service maintains all standard DocumentDB features, including MongoDB-compatible APIs, read replicas, Performance Insights, and AWS service integrations, while automatically tracking CPU, memory, and network utilization to scale without disrupting availability.
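
Since serverless capacity is configured at the cluster level and instances opt in by instance class, converting an existing cluster looks roughly like the boto3 sketch below. Identifiers are placeholders, and the Aurora-style ServerlessV2ScalingConfiguration parameter is our reading of the DocumentDB API rather than something quoted from the announcement.

```python
import boto3

docdb = boto3.client("docdb")

# Set the cluster-wide DCU scaling range (the service supports 0.5-256 DCUs).
docdb.modify_db_cluster(
    DBClusterIdentifier="my-docdb-cluster",  # placeholder
    ServerlessV2ScalingConfiguration={"MinCapacity": 0.5, "MaxCapacity": 64},
    ApplyImmediately=True,
)

# Add a serverless instance alongside existing provisioned ones; "changing
# the instance type" means using the db.serverless instance class.
docdb.create_db_instance(
    DBClusterIdentifier="my-docdb-cluster",
    DBInstanceIdentifier="my-docdb-serverless-1",  # placeholder
    DBInstanceClass="db.serverless",
    Engine="docdb",
)
```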

33:04 📢 Justin – “I mean, the one thing about the DCU model – and I see it a bunch of places, because I’ve been doing a lot more serverless with Valkey, and this DCU model comes up a lot. I actually just moved the CloudPod database to serverless Aurora for MySQL. And so I’ve been getting a little more exposed to the whole, whatever that one’s called; something like DCU as well. And it’s a little bit opaque. I definitely don’t love it as a model, but it is so much cheaper.” 

35:18 Introducing Amazon Application Recovery Controller Region switch: A multi-Region application recovery service | AWS News Blog

  • Amazon Application Recovery Controller (ARC) Region switch provides automated orchestration for multi-Region application failover, addressing enterprise concerns about untested recovery procedures and unknown dependencies during Regional outages.
  • The service supports nine execution block types, including EC2 Auto Scaling, Aurora Global Database failover, Route 53 health checks, and EKS/ECS resource scaling, enabling coordinated recovery across compute, database, and DNS services.
  • Region switch uses a Regional data plane architecture where recovery plans execute from the target Region, eliminating dependencies on the impacted Region and providing more resilient recovery operations.
  • Continuous validation runs every 30 minutes to check resource configurations and IAM permissions.
  • The service costs $70 per month per plan supporting up to 100 execution blocks or 25 child plans.
  • Organizations can balance cost and reliability by configuring standby resource percentages, though actual capacity depends on Regional availability at recovery time, making regular testing essential for confidence in disaster recovery strategies.

36:23 📢 Matt – “I like the note here: ‘to facilitate the best possible outcomes, we recommend you regularly test your recovery plans and maintain appropriate service quotas in your standby region’ because the amount of times I’ve seen people try to do DR testing and then they hit a service quota limit is comical at this point.” 

38:42 AWS Lambda response streaming now supports 200 MB response payloads – AWS

  • AWS Lambda response streaming now supports 200 MB response payloads, a 10x increase from the previous 20 MB limit, enabling direct processing of larger datasets without compression or S3 intermediary steps.
  • This enhancement targets latency-sensitive applications like real-time AI chat interfaces and mobile apps where time to first byte directly impacts user experience and engagement metrics.
  • The expanded payload capacity opens new use cases, including streaming image-heavy PDFs, music files, and real-time processing of larger datasets directly through Lambda functions.
  • Response streaming is available on Node.js managed runtimes and custom runtimes across all AWS regions where the feature is supported, with the 200 MB limit now set as default.
  • This update reduces architectural complexity by eliminating workarounds previously required for payloads exceeding 20 MB, potentially lowering costs associated with S3 storage and data transfer fees.
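
On the client side, a streamed response is consumed chunk-by-chunk rather than buffered. A minimal boto3 sketch, with a hypothetical function name and payload, might look like this:

```python
import boto3

lam = boto3.client("lambda")

# Invoke a response-streaming function and process chunks as they arrive,
# instead of buffering the full (now up to 200 MB) payload.
resp = lam.invoke_with_response_stream(
    FunctionName="my-streaming-function",  # hypothetical
    Payload=b'{"query": "large report"}',
)

received = bytearray()
for event in resp["EventStream"]:
    if "PayloadChunk" in event:
        received.extend(event["PayloadChunk"]["Payload"])  # or process incrementally
    elif "InvokeComplete" in event:
        break
print(f"received {len(received)} bytes")
```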

GCP

40:26 Gemini CLI: Custom slash commands | Google Cloud Blog

  • Gemini CLI now supports custom slash commands through .toml files and Model Context Protocol (MCP) prompts, allowing developers to create reusable prompts for common workflows like code reviews or planning tasks. This brings GitHub Copilot-style command functionality to Google’s AI assistant in the terminal.
  • Commands can be scoped at the user level (available across all projects) or the project level (checked into Git repos), with namespacing support through directory structures. The implementation uses minimal configuration requirements – just a prompt field – making it accessible for quick adoption (see the sample command file after this list).
  • The MCP integration enables Gemini CLI to automatically expose prompts from configured MCP servers as slash commands, supporting both named and positional arguments. This positions Google to leverage the growing ecosystem of MCP-compatible tools and services.
  • Key use cases include automating code reviews, generating implementation plans, and standardizing team workflows through shared command libraries. The shell command execution feature (!{…}) allows integration with existing CLI tools and scripts.
  • While this is a developer productivity tool rather than a cloud service, it strengthens Google’s developer ecosystem play against GitHub Copilot and Amazon Q Developer. The feature is available now with a simple npm update, requiring only a Gemini API key to get started.
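
Here’s a minimal sketch of what such a command file might look like. Per the bullets above, prompt is the only required field and !{…} is the shell-execution syntax; the file path, the description field, and the exact git invocation are assumptions to adapt.

```toml
# .gemini/commands/review.toml  (project-scoped, so it can be checked into Git)
# Defines a /review slash command; only `prompt` is required.
description = "Review the current diff for bugs and style issues"

prompt = """
Review the following diff for correctness, security issues, and style problems:

!{git diff HEAD}
"""
```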

37:18 📢 Matt – “I still like the VS Code plugin, and making it interact more that way. I find that a little bit better from the little bit I’ve played with Claude Code, but recently I’ve been talking to people who say Claude Code has gotten better since the initial release, so I have to go back and play with it and see.” 

42:40 Agent2Agent protocol (A2A) is getting an upgrade | Google Cloud Blog

  • Google releases A2A protocol version 0.3 with gRPC support, security card signing, and Python SDK improvements, positioning it as an open standard for multi-agent AI systems that can communicate across different platforms and vendors.
  • The protocol now has native support in Google’s Agent Development Kit (ADK) and offers three deployment paths: managed Agent Engine, serverless Cloud Run, or full control with GKE, giving developers flexibility in how they scale their agent systems.
  • Over 150 organizations, including Adobe, ServiceNow, and Twilio, are adopting A2A, with real implementations like Tyson Foods and Gordon Food Service using collaborative agents to share supply chain data and reduce friction in their operations.
  • Google is launching an AI Agent Marketplace where partners can sell A2A-enabled agents directly to customers, while Agentspace provides a governed environment for users to access these agents with enterprise security controls.
  • The protocol was contributed to the Linux Foundation in June 2025, making it a vendor-neutral standard that could become the HTTP of agent-to-agent communication, though adoption will depend on whether competitors embrace an open approach.
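
For a feel of what cross-vendor agent discovery involves, here’s a rough sketch of an A2A Agent Card, the document an agent publishes so other agents know what it can do and how to reach it. Both the agent and the field names are illustrative approximations of the public spec, not a verified schema; consult the A2A documentation for the authoritative shape.

```python
# A rough, illustrative A2A Agent Card as a Python dict. Field names
# approximate the public spec; the agent, URL, and skill are invented.
agent_card = {
    "name": "supply-chain-agent",
    "description": "Shares inventory and shipment status with partner agents",
    "url": "https://agents.example.com/a2a",
    "version": "0.3.0",
    "capabilities": {"streaming": True, "pushNotifications": False},
    "skills": [
        {
            "id": "shipment-status",
            "name": "Shipment status",
            "description": "Look up the status of a shipment by order ID",
        }
    ],
}
```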

44:18 📢 Justin – “Agent to Agent is basically how you make MCP to MCP work in the cloud.” 

44:38 C4 VMs based on Intel 6th Gen Xeon Granite Rapids now GA | Google Cloud Blog

  • Google launches C4 VMs on Intel Xeon 6 processors (Granite Rapids) with up to 30% better general compute performance and 60% improvement for ML recommendation workloads compared to the previous generation, making them the first major cloud provider to offer Xeon 6.
  • New C4 shapes include Titanium Local SSD variants delivering 7.2M max read IOPS (3x higher than comparable offerings from other hyperscalers) and 35% lower access latency, targeting high-performance databases, big data processing, and media rendering workloads.
  • C4 bare metal instances provide direct CPU/memory access for commercial hypervisors and SAP workloads, achieving 132,600 aSAPs – the highest of any comparable machine – with 35% performance improvement over C3 bare metal.
  • The expanded C4 series maintains existing CUD discounts and integrations with managed instance groups and GKE custom compute classes, available in 19 zones with shapes ranging from 4 to 288 vCPUs.
  • Key use cases include AI inference with FP16-trained models using Intel AMX-FP16, financial services requiring microsecond-level latency improvements, and visual effects rendering with reported 50% speedups over N2D instances.

46:24 Announcing Cloud Hub Optimization and Cost Explorer for developers | Google Cloud Blog

  • Google launches Cloud Hub Optimization and Cost Explorer in public preview, providing application-centric cost visibility across multiple projects without additional charges, addressing the challenge of tracking expenses for applications that span dozens of GCP projects.
  • The tools integrate Cloud Billing cost data with Cloud Monitoring utilization metrics to surface underutilized resources like GKE clusters with idle GPUs, showing average vCPU utilization at the project level to identify optimization candidates.
  • Unlike traditional cost dashboards that show aggregate Compute Engine costs, Cost Explorer breaks down spending by specific products, including GKE clusters, Persistent Disks, and Cloud Load Balancing for more granular cost attribution.
  • Built on AppHub Applications framework, the solution reorganizes cloud resources around applications rather than projects, competing with AWS Cost Explorer and Azure Cost Management by focusing on application-level cost optimization.
  • MLB’s Principal Cloud Architect reports that the tools help monitor costs across tens of business units and hundreds of developers, with particular value for organizations shifting left on cloud cost management.

47:26 📢 Justin – “And if you’ve ever used the Google Cloud Optimization Hub and Cost Explorer previously, you’d know they’re hot garbage. So this was a very appreciated announcement at Google Next.” 

Azure

49:10 Introducing Microsoft Sentinel data lake | Microsoft Community Hub

  • Microsoft Sentinel data lake enters public preview as a fully managed security data lake built directly into Sentinel, allowing organizations to store all security data in one place with cost-effective long-term retention while eliminating the need to build custom data architectures.
  • The service integrates with 350+ existing Sentinel connectors including Microsoft 365, Defender, Azure, AWS, and GCP sources, storing data in open formats that support both Kusto queries and Python notebooks through a new Visual Studio Code extension for advanced analytics.
  • Pricing separates data ingestion/storage from analytics consumption, enabling customers to store high-volume, low-fidelity logs like network traffic cost-effectively in the data lake tier while automatically mirroring critical analytics-tier data to the lake at no extra charge.
  • Key differentiator from AWS Security Lake is the native integration with Microsoft’s security ecosystem and managed compute environment – security teams can run scheduled analytics jobs and retroactive threat intelligence matching without managing infrastructure.
  • Target use cases include forensics analysis, compliance reporting, tracking slow attacks over extended timeframes, and running ML-based anomaly detection on historical data, with results easily promoted back to the analytics tier for investigation.

51:40 📢 Matt – “Kusto is their proprietary time series database. So, all of Azure metrics. And you can even pay for the service and leverage it yourself as Azure Data Explorer.” 

38:01 Announcing General Availability of Azure E128 & E192 Sizes in the Esv6 and Edsv6-series VM Families | Microsoft Community Hub

  • Azure launches E128 and E192 VM sizes with up to 192 vCPUs and 1832 GiB RAM, targeting enterprise workloads like SAP HANA, large SQL databases, and in-memory analytics. 
  • These new sizes use Intel’s 5th Gen Xeon Platinum processors and deliver 30% better performance than the previous Ev5-series.
  • The VMs feature Azure Boost technology providing 400K IOPS and 12 GB/s storage throughput with 200 Gbps network bandwidth, plus NVMe interface delivering 3X improvement in local storage IOPS. This positions them competitively against AWS’s memory-optimized instances like X2iezn and GCP’s M3 series.
  • Intel Total Memory Encryption (TME) provides hardware-based memory encryption for enhanced security, addressing enterprise concerns about data protection in multi-tenant environments. The isolated VM option (E128i and E192i) offers dedicated physical hosts for compliance-sensitive workloads.
  • Currently available in 14 regions including major markets like East US, West Europe, and Japan East, with expansion planned for 2025. Pricing follows standard Azure VM models with both diskful (Edsv6) and diskless (Esv6) options to optimize costs based on storage needs.
  • These sizes specifically target customers running memory-intensive applications who need to scale beyond traditional VM limits without moving to specialized services. The combination of high memory capacity, enhanced networking, and improved storage performance makes them suitable for consolidating multiple workloads.

56:12 Announcing a flexible, predictable billing model for Azure SRE Agent | Microsoft Community Hub

  • Azure SRE Agent is a pre-built AI tool for root cause analysis and incident response that uses machine learning to analyze logs and metrics, helping site reliability engineers focus on higher-value tasks while reducing operational costs and improving uptime.
  • The billing model introduces Azure Agent Units (AAU) as a standardized metric across all Azure agents, with a fixed baseline cost of 4 AAU per hour ($0.40/hour) for continuous monitoring plus 0.25 AAU per second for active incident response tasks.
  • As part of Microsoft’s Agentic DevOps strategy, SRE Agent represents a shift toward AI-native cloud operations where intelligent agents handle routine tasks automatically, competing with AWS DevOps Guru and Google Cloud’s Operations suite.
  • The dual-flow architecture keeps the agent always learning from normal behavior patterns while ready to activate AI components instantly when anomalies are detected, providing 24/7 intelligent monitoring without manual intervention.
  • Target customers include organizations managing complex cloud workloads who want predictable operational costs – the usage-based pricing means you only pay for active incident response time beyond the baseline monitoring fee.
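
To make that pricing concrete, here’s a back-of-the-envelope sketch. It assumes a flat $0.10 per AAU (implied by 4 AAU/hour = $0.40/hour) and a 730-hour month; treat it as rough math, not a rate card.

```python
# Back-of-the-envelope Azure SRE Agent costs. The $0.10/AAU rate is implied
# by the stated 4 AAU/hour = $0.40/hour baseline, not an official price.
PRICE_PER_AAU = 0.40 / 4  # $0.10
HOURS_PER_MONTH = 730

baseline_monthly = 4 * HOURS_PER_MONTH * PRICE_PER_AAU  # always-on monitoring
incident_10_min = 0.25 * 10 * 60 * PRICE_PER_AAU        # one 10-minute active response

print(f"Baseline monitoring:    ~${baseline_monthly:.0f}/month")  # ~$292
print(f"One 10-minute incident: ~${incident_10_min:.2f}")         # ~$15
```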

57:25 📢 Matt – “I really want to play with this. I’m a little terrified of what the cost is going to be.” 

59:02 Generally Available: Live Resize for Premium SSD v2 and Ultra NVMe Disks

  • Azure’s Live Resize feature for Premium SSD v2 and Ultra NVMe disks enables storage capacity expansion without downtime, addressing a common pain point where disk resizing traditionally required VM restarts and application disruption.
  • Hasn’t Amazon had this forever? 👀
  • This positions Azure competitively against AWS EBS volume modifications and GCP persistent disk resizing, though Azure’s implementation specifically targets their high-performance disk tiers used for latency-sensitive workloads like databases and analytics.
  • The feature supports cost optimization by allowing customers to start with smaller disk sizes and scale up only when needed, avoiding overprovisioning costs that can add thousands of dollars monthly for enterprise workloads.
  • Target use cases include production databases, real-time analytics platforms, and high-transaction applications where both performance consistency and zero-downtime operations are critical requirements.
  • Implementation requires no code changes and works through standard Azure portal, CLI, or API commands, making it accessible for both manual operations and automated infrastructure-as-code deployments.
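
A minimal sketch of an in-place grow with the Azure Python SDK follows. The subscription ID, resource group, and disk name are placeholders, and passing the patch as a plain dict is an assumption about the SDK’s tolerance (a DiskUpdate model also works); note that disks can only grow, never shrink.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Placeholders throughout; grows a Premium SSD v2 / Ultra disk in place
# while it stays attached to a running VM.
client = ComputeManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.disks.begin_update(
    "my-resource-group",
    "my-ultra-disk",
    {"disk_size_gb": 2048},  # grow only; shrinking is not supported
)
disk = poller.result()
print(disk.disk_size_gb)
```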

1:00:03 📢 Justin – “I’m just mad this didn’t exist until today.” 

1:01:20 Generally Available: Agentless multi-disk crash consistent backup for Azure VMs

  • Azure Backup now supports agentless multi-disk crash consistent backups for VMs in general availability, eliminating the need to install backup agents or extensions on virtual machines while maintaining data consistency across multiple disks.
  • This feature addresses a common pain point for enterprises running multi-disk applications like databases where crash consistency across all disks is critical for successful recovery, competing directly with AWS’s EBS snapshots and GCP’s persistent disk snapshots.
  • The agentless approach reduces VM overhead and simplifies backup management by leveraging Azure’s infrastructure-level capabilities rather than guest OS agents, making it particularly valuable for locked-down or legacy systems where agent installation is problematic.
  • Target use cases include SQL Server, Oracle databases, and other multi-disk applications where maintaining write-order consistency across volumes is essential, with pricing following standard Azure Backup rates based on protected instance size.
  • This positions Azure Backup closer to feature parity with native hypervisor-level backup solutions while maintaining cloud-native scalability and integration with Azure Recovery Services vault for centralized management.

1:01:56 📢 Justin – “I’ll tell you – if you are running this on SQL Server or Oracle, things like ACID compliance are very, very important, and you need to test the crap out of this, because my experience has been that if you are not quiescing the data to the disk, it doesn’t matter if you snapshotted all the partitions together – you are still going to have a bad time.” 

Other Clouds 

1:04:18 Introducing Gradient: DigitalOcean’s Unified AI Cloud | DigitalOcean

  • DigitalOcean is consolidating its AI offerings under a new unified platform called Gradient, combining GPU infrastructure, agent development tools, and pre-built AI applications into a single integrated experience for developers.
  • The platform includes three main components: Infrastructure (GPU compute for training and inference), Platform (tools for building intelligent agents with upcoming Model Context Protocol support), and Applications (pre-built agents for common use cases).
  • DigitalOcean is expanding GPU options with AMD Instinct MI325X available this week and NVIDIA H200s coming next month, providing more choice and flexibility for different AI workload requirements.
  • Existing DigitalOcean AI users won’t need to change anything as all current projects and APIs will continue working, with the rebrand focused on improving organization and documentation.
  • The platform targets digital native enterprises looking to build AI applications from prototype to production without managing complex infrastructure, competing with larger cloud providers in the AI space.

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
