312: Azure Firewall Finally Learns to Spell (FQDN Edition)


Welcome to episode 312 of The Cloud Pod, where your hosts, Matt, Ryan, and Justin, are here to bring you all the latest in Cloud and AI news. We’ve got security news, updates from PostgreSQL, Azure Firewall, and BlobNFS, plus TWO Cloud Journey stories for you!

Thanks for joining us this week in the cloud!  

Titles we almost went with this week:

  • 💔Git Happens: Why Your Database Pipeline Keeps Breaking
  • 💘PostgreSQL and Chill: Azure’s New Storage Options for Database Romance
  • 🧑‍🤝‍🧑NVMe, Myself, and PostgreSQL
  • 🎨Canvas and Effect: AWS Paints a New Picture for E-commerce
  • 🪖Oracle’s $30 Billion Stargate: The AI Infrastructure Wars Begin
  • 🤝Larry’s Last Laugh: Oracle Lands OpenAI’s Mega Deal
  • 🛋️AI Will See You Now (Couch Not Included)
  • 💣Purview and Present Danger: Microsoft’s AI Security SDK Goes Live
  • 🐦The Purview from Up Here: Microsoft’s Bird’s Eye View on AI Data Security
  • 🌉Building Bridges: Azure’s Two-Way Street to Active Directory
  • 📛Domain Names: Not Just for Browsers Anymore
  • 🏃FUSE or Lose: Azure’s BlobNFS Gets a Speed Boost
  • 💌When Larry Met Andy: An Exadata Love Story
  • 🐕‍🦺Bing There, Done That: Azure’s New Research Assistant
  • 🕺The Search is Over: Azure AI Foundry Finds Its Research Groove
  • 🧠Memory Lane: Where AI Agents Go to Remember Things
  • 🐘Elephants Never Forget, and Now Neither Do Google’s Agents
  • 🧰Z3 or Not Z3: That is the Storage Question
  • 🧑‍🚒Local SSD Hero: A New Hope for I/O Intensive Workloads
  • 📜Azure’s Certificate of Insecurity
  • 🔑KeyVault’s Keys Left Under the Doormat
  • 📨When Your Cloud Provider Accidentally CCs the Hackers

AI Is Going Great – Or How ML Makes Money 

03:09 RYAN DOES A THING FOR SECURING AI WORKLOADS

  • Ryan was recently invited to Google’s San Francisco offices as part of a small group of security professionals who spent hands-on time with Google’s security offerings, learning how to secure AI workloads. 
  • AI – and how to secure it – is a hot topic right now, and working directly with the Google development team was genuinely insightful, especially seeing the various levels of protection they put in place in dummy applications. 
  • Ryan was especially interested in the back-end logic executed by those applications. 

05:32  📢 Ryan – “I was impressed, because how we’re thinking about AI is still evolving, and how we’re protecting it is going to be changing rapidly – and having real-world examples really helped flesh out how their AI services are integrated into a security ecosystem. It was pretty impressive. And it’s something that’s near and dear; I’ve been working on trying to roll out Google Agentspace and different AI workloads, trying to get involved and make sure we’re getting visibility into all the different ones. It was really helpful to think about it in those contexts.”

10:13 OpenAI secures $30bn cloud deal with Oracle

  • OpenAI signed a $30 billion annual cloud computing agreement with Oracle for 4.5GW of capacity, making it one of the largest AI cloud deals to date, and nearly triple Oracle’s current $10.3 billion annual data center infrastructure revenue.
  • The deal represents a major expansion of the Stargate data center initiative, a $500 billion joint venture between OpenAI, SoftBank, Oracle, and Abu Dhabi’s MGX fund aimed at building AI infrastructure across multiple US states, including Texas, Michigan, and Ohio.
  • Oracle plans to purchase 400,000 Nvidia GB200 chips for approximately $40 billion to power the Abilene, Texas facility, positioning itself to compete directly with AWS and Microsoft in the AI cloud infrastructure market.
  • The 4.5GW capacity represents about 25% of the current US operational data center capacity, highlighting the substantial infrastructure requirements for training and running advanced AI models at scale.
  • This partnership signals a shift in the cloud landscape, where traditional database companies like Oracle are becoming critical infrastructure providers for AI workloads, potentially disrupting the current cloud provider hierarchy.

04:09 Google announces new AI tools for mental health research and treatment

  • Google is developing AI tools specifically for mental health research and treatment, though the article appears to be a survey page rather than containing actual content about the tools themselves.
  • Without the article content, we can note that AI applications in mental health typically involve natural language processing for therapy chatbots, pattern recognition for symptom tracking, and predictive analytics for treatment outcomes.
  • Cloud infrastructure would be essential for these tools to handle sensitive health data processing, ensure HIPAA compliance, and scale to support healthcare providers and researchers.
  • Mental health AI tools often integrate with existing cloud-based electronic health record systems and require robust security measures for patient data protection.
  • The development signals Google’s continued expansion into healthcare AI applications, following their work in medical imaging and clinical decision support systems.
  • We’re not really sure how we feel about sharing our deepest, darkest secrets. The machines won’t use any of that against us, right?
  • Interested in the article Ryan talked about? https://www.washingtonpost.com/technology/2025/05/31/ai-chatbots-user-influence-attention-chatgpt/

AWS

20:06 Amazon Nova Canvas update: Virtual try-on and style options now available | AWS News Blog

  • Amazon Nova Canvas adds virtual try-on capability, allowing users to combine two images – like placing clothing on a person or furniture in a room – using AI-powered image generation with three masking modes (garment, prompt, or custom image masks).
  • Eight new pre-trained style options simplify consistent image generation across different artistic styles, including 3D animated family film, photorealism, graphic novel, and midcentury retro, eliminating complex prompt engineering.
  • The feature targets e-commerce retailers who can integrate virtual try-on to help customers visualize products before purchase, potentially reducing returns and improving conversion rates.
  • Available immediately in US East (N. Virginia), Asia Pacific (Tokyo), and Europe (Ireland) regions with standard Amazon Bedrock pricing, requiring images under 4.1M pixels (2048×2048 max).
  • Integration requires minimal code changes using the existing Bedrock Runtime invoke API with new taskType parameters, making it accessible for developers already using Nova Canvas without model migration – there’s a rough sketch of what that request looks like below.

21:09 📢 Matt – “Amazon is going to have a field day with this.” 
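
For the curious, here’s roughly what that “minimal code change” looks like: a virtual try-on request through the existing Bedrock Runtime invoke API in Python/boto3. The taskType and virtualTryOnParams field names below are our reading of the announcement, and the image filenames are placeholders, so treat this as a sketch and check the Nova Canvas docs before copying it anywhere.

```python
import base64
import json

import boto3

# Sketch of a Nova Canvas virtual try-on call via the existing Bedrock
# Runtime invoke API. Field names are our reading of the announcement;
# verify against the Nova Canvas documentation.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

def image_b64(path: str) -> str:
    """Base64-encode a local image, as the API expects."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

body = {
    "taskType": "VIRTUAL_TRY_ON",  # the new task type from this release
    "virtualTryOnParams": {
        "sourceImage": image_b64("person.png"),     # the person or room
        "referenceImage": image_b64("jacket.png"),  # the product to place
        "maskType": "GARMENT",  # one of the three masking modes
    },
}

response = client.invoke_model(
    modelId="amazon.nova-canvas-v1:0",
    body=json.dumps(body),
)

# Generated images come back base64-encoded in the response body.
result = json.loads(response["body"].read())
with open("try_on_result.png", "wb") as f:
    f.write(base64.b64decode(result["images"][0]))
```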

22:20 Introducing Oracle Database@AWS for simplified Oracle Exadata migrations to the AWS Cloud

  • Oracle Database@AWS enables direct migration of Oracle Exadata and RAC workloads to AWS with minimal changes, providing a third option beyond self-managed EC2 or RDS for Oracle. 
  • This addresses a significant gap for enterprises locked into Oracle’s high-end database features.
  • The service runs Oracle infrastructure within AWS data centers, integrating with native AWS services like VPC, IAM, CloudWatch, and S3 for backups while maintaining Oracle’s management plane. 
  • Customers get unified billing through AWS Marketplace that counts toward AWS commitments.
  • Zero-ETL integration with Amazon Redshift eliminates cross-network data transfer costs for analytics workloads, while S3 backup support provides eleven nines of durability. 
  • The service supports both traditional Exadata VM clusters and fully managed Autonomous Database options.
  • Currently available in US East and US West regions, with expansion planned to 20 AWS regions globally. 
  • Pricing is set by Oracle through AWS Marketplace private offers (So prepare to spend all your $$$) and requires coordination between AWS and Oracle sales teams for activation.
  • VM cluster creation takes up to 6 hours and requires navigating between AWS and OCI consoles for full database management. Oof. 
  • The service maintains compliance with major standards including SOC, HIPAA, and PCI DSS.

23:37  📢 Ryan – “…there’s a ton of advantages when you think about the integration – the zero-ETL with Redshift is a pretty prominent example. If you’re in the Amazon ecosystem and you’re utilizing those services, this is going to be great. But if somehow you’re limited to the Oracle database products, it’s such a hard place to be between those two things. And so I like this for the customers this will fit, but it does seem a little clunky.”

GCP

25:54 Google Cloud Managed Lustre for AI HPC | Google Cloud Blog

  • Google Cloud Managed Lustre is now GA with four performance tiers ranging from 125 MB/s to 1000 MB/s per TiB, scaling up to 8 PB of storage capacity, powered by DDN’s EXAScaler technology for high-performance parallel file system needs in AI/ML workloads.
  • The service addresses critical AI infrastructure bottlenecks by providing POSIX-compliant storage with sub-millisecond read latency, enabling efficient GPU/TPU utilization for model training, checkpointing, and high-throughput inference tasks that require rapid access to petabyte-scale datasets.
  • Pricing starts at $0.14 per TiB-hour for the 125 MB/s tier up to $0.70 per TiB-hour for the 1000 MB/s tier, positioning it competitively against AWS FSx for Lustre while offering native integration with GKE and TPUs across multiple Google Cloud regions. (Back-of-the-envelope cost math below.)
  • The partnership with DDN brings enterprise-grade Lustre expertise to Google Cloud’s managed services portfolio, filling a gap for customers who need proven HPC storage solutions without the operational overhead of self-managing Lustre clusters. (Say that 6 times fast.) 
  • Key use cases extend beyond AI to traditional HPC workloads like genomic sequencing and climate modeling, with NVIDIA endorsing it as part of their AI platform on Google Cloud for organizations requiring high-performance storage at scale.

27:13  📢 Matt – “I’m still always impressed by how cheap storage is on these services.” 
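
To put some numbers behind that, here’s the monthly cost of a 100 TiB namespace at the two published endpoints of the pricing range; the in-between tiers land somewhere in the middle.

```python
# Back-of-the-envelope monthly cost for a 100 TiB Managed Lustre
# namespace at the two published price points ($/TiB-hour).
HOURS_PER_MONTH = 730
CAPACITY_TIB = 100

for tier, rate in [("125 MB/s per TiB", 0.14), ("1000 MB/s per TiB", 0.70)]:
    monthly = CAPACITY_TIB * rate * HOURS_PER_MONTH
    print(f"{tier}: ${monthly:,.0f}/month")

# 125 MB/s per TiB:  $10,220/month
# 1000 MB/s per TiB: $51,100/month
```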

29:49 Vertex AI Memory Bank in public preview | Google Cloud Blog

  • Vertex AI Memory Bank enables agents to maintain persistent memory across conversations, storing user preferences and context beyond single sessions, addressing the common limitation where agents treat every interaction as new and ask repetitive questions.
  • The service uses Gemini models to automatically extract, consolidate, and update memories from conversation history, handling contradictions intelligently while providing similarity search for relevant context retrieval, based on Google Research’s ACL 2025 accepted method for topic-based agent memory. (A toy sketch of the pattern follows this list.)
  • Memory Bank integrates with Agent Development Kit (ADK) and Agent Engine Sessions, with support for third-party frameworks like LangGraph and CrewAI – developers can start with a Gmail account and API key through express mode registration before upgrading to full GCP projects.
  • This positions Google competitively against AWS Bedrock’s conversation memory and Azure’s similar offerings, though Google’s implementation emphasizes automatic memory extraction and intelligent consolidation rather than simple conversation storage.
  • Key use cases include personalized retail assistants, customer service agents that remember past issues, and any application requiring multi-session context, with the service available in public preview at standard Vertex AI pricing tiers.
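
The real service does all of this with Gemini models and vector search, but the extract-consolidate-retrieve loop is easy to picture with a toy. This sketch is entirely ours – none of the names below come from the Vertex AI SDK – it just shows the shape of the workflow the bullets describe.

```python
from dataclasses import dataclass, field

# Toy version of the extract/consolidate/retrieve loop described above.
# Nothing here comes from the Vertex AI SDK.

@dataclass
class MemoryBank:
    memories: dict = field(default_factory=dict)  # topic -> latest fact

    def consolidate(self, topic: str, fact: str) -> None:
        # Newer facts about a topic replace older ones: the simplest
        # possible take on "handling contradictions intelligently."
        self.memories[topic] = fact

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        # Word overlap stands in for embedding similarity search.
        q = set(query.lower().split())
        def score(item: tuple) -> int:
            topic, fact = item
            return len(q & set(f"{topic} {fact}".lower().split()))
        ranked = sorted(self.memories.items(), key=score, reverse=True)
        return [fact for _, fact in ranked[:k]]

bank = MemoryBank()
bank.consolidate("size", "User wears a size medium")
bank.consolidate("shipping", "User prefers overnight shipping")
bank.consolidate("shipping", "User switched to standard shipping")  # update

print(bank.retrieve("which shipping option does the user want"))
# ['User switched to standard shipping', 'User wears a size medium']
```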

31:35 Expanded Z3 VM portfolio for I/O intensive workloads | Google Cloud Blog

  • Do you love burning a lot of money? Have we got news for you! Google is expanding its Z3 storage-optimized VM family with 9 new instances offering 3–18 TiB of local SSD capacity, plus a bare metal option with 72 TiB, targeting I/O-intensive workloads like databases and analytics. 
  • The new Titanium SSDs deliver up to 36 GiB/s read throughput and 9M IOPS, with 35% lower latency than previous generation local SSDs.
  • Z3 introduces two VM types: standard SSD (200 GiB SSD per vCPU) for OLAP and SQL databases, and high SSD (400 GiB SSD per vCPU) for distributed databases and streaming. 
  • The bare metal instance provides direct CPU access for specialized workloads requiring custom hypervisors or specific licensing needs.
  • Enhanced maintenance features include advanced notice for planned maintenance, live migration support for VMs with 18 TiB or less local SSD, and in-place upgrades that preserve data for larger instances. 
  • This addresses a common pain point for stateful workloads requiring local storage.
  • Z3 integrates with Google’s Hyperdisk for network-attached storage, supporting up to 350K IOPS per VM and 500K IOPS for bare metal instances. AlloyDB will leverage Z3 as its foundation, using local SSDs as cache to hold datasets 25x larger than memory with near-memory performance.
  • Early adopters report significant performance gains: OP Labs saw 30-50% reduction in p99 latencies for blockchain nodes, Tenderly achieved 40% read latency improvement, and Shopify selected Z3 as their platform for performance-sensitive storage systems.

34:06 📢 Ryan – “They’ve put in so much development in Google Hyperdisk and making that a service, but everything that’s over a network is going to have a higher latency than a local SSD, and so it’s kind of funny to see these ginormous boxes.” 

Azure

35:33 Running high-performance PostgreSQL on Azure Kubernetes Service | Microsoft Azure Blog

  • Azure now offers two PostgreSQL deployment options on AKS, including Azure Container Storage with local NVMe for performance-critical workloads, achieving up to 26,000 TPS with sub-millisecond latency, and Premium SSD v2 for cost-optimized deployments with flexible IOPS/throughput scaling up to 80,000 IOPS per volume.
  • The CloudNativePG operator integration provides automated failover, built-in replication, and native Azure Blob Storage backup capabilities, addressing the complexity of running stateful workloads on Kubernetes that has historically pushed enterprises toward managed database services.
  • Benchmark results show local NVMe delivers 14,812 TPS at 4.3ms latency on Standard_L16s_v3 VMs, while Premium SSD v2 achieves 8,600 TPS at 7.4ms latency on Standard_D16ds_v5, with the NVMe option costing approximately $1,382/month versus $348/month for Premium SSD v2. (Price/performance math after Ryan’s quote below.)
  • This positions AKS competitively against AWS EKS and GCP GKE for database workloads, particularly as PostgreSQL now powers 36% of all Kubernetes database deployments according to the 2025 Kubernetes in the Wild report, up 6 points since 2022.
  • Target customers include organizations running payment systems, gaming backends, multi-tenant SaaS platforms, and real-time analytics that need either maximum performance or flexible scaling, with Azure Container Storage also supporting Redis, MongoDB, and Kafka workloads beyond PostgreSQL.

34:06 📢 Ryan – “I bristle at all the numbers because they’re comparing it to managed services, and it’s a cost. But you’re also not counting the cost of the three people minimum that it’s going to take to support your Kubernetes cluster… there’s just a lot of advantages that you’re giving up in order to run it locally and to have direct access to that layer.” 
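
Ryan’s point about hidden staffing costs aside, the published benchmark numbers alone make for an interesting price/performance comparison:

```python
# Price/performance from the published benchmarks: local NVMe wins on
# raw TPS and latency, Premium SSD v2 wins on cost per transaction.
options = {
    "Local NVMe (Standard_L16s_v3)":      (14_812, 1_382),
    "Premium SSD v2 (Standard_D16ds_v5)": (8_600, 348),
}
for name, (tps, usd_month) in options.items():
    print(f"{name}: ${usd_month / (tps / 1_000):,.0f}/month per 1,000 TPS")

# Local NVMe (Standard_L16s_v3):      $93/month per 1,000 TPS
# Premium SSD v2 (Standard_D16ds_v5): $40/month per 1,000 TPS
```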

43:17 Announcing General Availability of Microsoft Purview SDK and APIs | Microsoft Community Hub

  • Microsoft Purview SDK and APIs are now generally available, enabling developers to embed enterprise-grade data security and compliance controls directly into custom GenAI applications and agents, addressing critical concerns around data leakage, unauthorized access, and regulatory compliance.
  • The SDK provides three key security capabilities: preventing data oversharing by inheriting labels from source data, protecting against data leaks with built-in safeguards, and governing AI runtime data through auditing, data lifecycle management, eDiscovery, and communication compliance. (A toy illustration of label inheritance follows below.)
  • This positions Microsoft competitively against AWS and GCP by offering native integration with Microsoft 365 Copilot-level security features, allowing developers to focus on core product development while Purview handles the complex compliance and governance requirements enterprises demand.
  • Target customers include ISVs and enterprises building custom AI applications that need to meet strict data governance requirements, particularly in regulated industries where data security and compliance are non-negotiable for adoption.
  • The SDK works across any platform and AI model, not just Azure, making it a flexible solution for multi-cloud environments while leveraging Microsoft’s existing Purview data governance infrastructure that many enterprises already use.

44:48 📢 Matt – “They’re definitely pushing Purview and a lot of the features of it recently – or maybe it’s just people I’ve been talking to – but it’s something that’s been coming up more and more. I think they’re just doing a push to make it a larger service to be used, not just in the corporate IT space, but in the software dev world… You can build in these controls that will help along the way.”
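
The “inherit labels from source data” idea is the easiest of the three capabilities to picture. Here’s a toy version – the label names and ordering are ours for illustration, not Purview’s taxonomy – showing the gist: a GenAI response carries the most restrictive sensitivity label of anything that fed into it.

```python
# Toy label inheritance: a response is gated by the most restrictive
# sensitivity label among its sources. Label names and ordering are
# ours, not Purview's.
ORDER = ["Public", "General", "Confidential", "Highly Confidential"]

def inherited_label(source_labels: list[str]) -> str:
    # Most restrictive label wins.
    return max(source_labels, key=ORDER.index)

sources = ["General", "Highly Confidential", "Public"]
print(inherited_label(sources))  # Highly Confidential -> gate the response
```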

48:35 Generally Available: Two-Way Forest Trusts for Microsoft Entra Domain Services

  • Do you love old features repackaged into new features? Us too. 
  • Two-way forest trusts between Microsoft Entra Domain Services and on-premises Active Directory enable bidirectional authentication and resource access, addressing a key limitation where only one-way trusts were previously supported.
  • This feature allows organizations to maintain their existing on-premises AD infrastructure while extending authentication capabilities to cloud resources, reducing the need for complex identity federation or migration projects.
  • The general availability release positions Azure more competitively against AWS Managed Microsoft AD, which has supported two-way trusts since launch, closing a notable feature gap in Azure’s managed directory services.
  • Primary use cases include hybrid cloud deployments where applications in Azure need to authenticate users from on-premises domains and vice versa, particularly beneficial for enterprises with regulatory requirements to maintain on-premises identity systems.
  • Organizations should evaluate the additional network connectivity requirements and potential latency impacts when implementing forest trusts across hybrid environments, as authentication traffic will traverse between cloud and on-premises infrastructure.

49:47 📢 Justin – “Thank goodness this is finally here. This is actually a pain point I’m familiar with from the day job. The ability to connect your Entra ID to your local authorization domain is a big problem, and so not having this ability actually caused a lot of weird edge cases and extra hoops that now Ryan won’t have to solve.” 

54:44 Generally Available: FQDN Filtering in DNAT rules in Azure Firewall

  • Azure Firewall now supports FQDN filtering in DNAT rules, allowing administrators to route inbound traffic to backend resources using domain names instead of static IP addresses, which simplifies management when backend IPs change frequently – see the conceptual sketch below.
  • This feature addresses a common pain point where organizations had to manually update firewall rules whenever backend server IPs changed, particularly useful for scenarios with dynamic infrastructure or when using services with rotating IP addresses.
  • The implementation brings Azure Firewall closer to feature parity with AWS Network Firewall and Google Cloud Armor, both of which have supported domain-based filtering for inbound traffic rules for some time.
  • Target use cases include load balancing to backend pools with changing IPs, routing to containerized applications, and managing multi-region deployments where IP addresses may vary across availability zones.
  • Organizations should note that FQDN resolution adds a slight processing overhead and DNS lookup time to DNAT operations, though Microsoft hasn’t published specific latency metrics for this generally available feature.

56:49 📢 Ryan – “The fact that routing traffic by anything other than IP address on the backend wasn’t possible until now is crazy to me.” 
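
Conceptually, the feature boils down to the firewall storing a name and re-resolving it, instead of freezing an IP into the rule. This little sketch (with a placeholder hostname, and standard-library DNS resolution standing in for whatever Azure Firewall actually does internally) shows why rotating backend IPs stop being a rule-maintenance problem:

```python
import socket
import time

# Toy model of an FQDN-based DNAT rule: store a name, re-resolve it
# periodically, forward to whatever it currently points at. The
# hostname is a placeholder; Azure Firewall does its own resolution
# using its configured DNS settings.
DNAT_RULE = {"fqdn": "backend.example.com", "port": 443}

def current_target(rule: dict) -> str:
    # This lookup is the "slight processing overhead" the last bullet
    # point mentions.
    infos = socket.getaddrinfo(
        rule["fqdn"], rule["port"], proto=socket.IPPROTO_TCP
    )
    return infos[0][4][0]  # first resolved address

for _ in range(3):
    print("DNAT currently forwards to", current_target(DNAT_RULE))
    time.sleep(60)  # re-resolution interval; the backend IP can change
```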

58:14 Accelerating BlobNFS throughput & scale with FUSE for superior performance

  • Azure’s updated AZNFS 3.0 introduces FUSE-based performance enhancements to BlobNFS, delivering up to 5 times faster single-file reads and 3 times faster writes compared to native Linux NFS clients. This addresses performance bottlenecks for HPC, AI/ML, and backup workloads that require high-throughput access to blob storage via NFS protocol.
  • The update increases TCP connection support from 16 to 256, enabling workloads to saturate VM network bandwidth with just 4 parallel operations. 
  • This brings Azure’s NFS blob access performance closer to AWS EFS and GCP Filestore for demanding enterprise workloads.
  • Key technical improvements include support for files up to 5TB (previously limited to 3TB), removal of the 16-group user limitation, and enhanced metadata operations with 3MB directory queries. These changes particularly benefit EDA and CAD workloads that process large simulation files and extensive file metadata.
  • While BlobFuse offers Azure Entra ID authentication and public endpoint access, BlobNFS still requires virtual network connectivity and lacks native Azure AD integration. Organizations must weigh protocol requirements against security needs when choosing between the two mounting options.
  • The preview requires registration and targets customers running Linux-based HPC clusters, AI training pipelines, and legacy applications requiring POSIX compliance. Installation involves the AZNFS mount helper package available on GitHub, with no additional Azure costs beyond standard blob storage pricing.

1:00:42 Introducing Deep Research in Azure AI Foundry Agent Service | Microsoft Azure Blog

  • Azure AI Foundry introduces Deep Research as an API/SDK service that automates web-scale research using OpenAI’s o3-deep-research model, enabling developers to build agents that can analyze and synthesize information from across the web with full source citations and audit trails.
  • The service integrates with Azure’s enterprise ecosystem through Logic Apps, Azure Functions, and other Foundry Agent Service connectors, allowing research to be embedded as a reusable component in multi-step workflows rather than just a standalone chat interface.
  • Pricing starts at $10 per 1M input tokens and $40 per 1M output tokens for the o3-deep-research model, with additional charges for Bing Search grounding and GPT models used for query clarification, positioning this as a premium enterprise offering. Because everyone is using Bing Search for their grounding needs, right? (Cost math below.) 
  • The architecture provides transparency through documented reasoning paths and source citations, addressing enterprise governance requirements for regulated industries where AI decision-making needs to be fully auditable.

1:01:39 📢 Ryan – “It is truly evil to do a four times cost increase for the output that you’re not in control of.” 
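
For a sense of scale on that pricing – and on Ryan’s point that output tokens cost four times what input tokens do – here’s a quick calculator. The token counts are invented but plausible for a task that pulls in a lot of web context and writes a long report; Bing grounding and the clarification GPT calls bill separately.

```python
# Rough cost of one deep-research run at the quoted o3-deep-research
# rates (USD per 1M tokens). Token counts below are made up.
INPUT_PER_M, OUTPUT_PER_M = 10.00, 40.00

def run_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

print(f"${run_cost(input_tokens=400_000, output_tokens=60_000):.2f}")  # $6.40
```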

1:03:00 Azure MCP Exploited: Maliciously Leaking Users’ KeyVault Secrets to Attackers

  • Researchers discovered a critical vulnerability in Azure’s Managed Certificate Provider (MCP) that allowed attackers to extract KeyVault secrets by exploiting certificate validation flaws in the authentication process.
  • The vulnerability stemmed from MCP’s improper handling of certificate chains, enabling malicious actors to forge certificates that appeared legitimate to Azure’s authentication system and gain unauthorized access to sensitive KeyVault data.
  • Microsoft has since patched the vulnerability, but the incident highlights ongoing security challenges in cloud certificate management systems and the need for robust certificate validation mechanisms across all cloud providers.
  • Organizations using Azure KeyVault should audit their access logs and rotate any potentially exposed secrets, as the vulnerability could have been exploited without leaving obvious traces in standard monitoring systems.
  • This discovery follows a pattern of certificate-related vulnerabilities across major cloud platforms, emphasizing that even mature cloud services require continuous security scrutiny and that customers should implement defense-in-depth strategies rather than relying solely on platform security.
  • Nice job Azure. Ryan is extra impressed. 

1:05:21 📢 Justin – “I have to say that the more I’ve learned about MCPs, the more I’ve played with them, the more that I have created them and seeing what gets created, MCPs scare me. In production, in areas where data is sensitive and I need to be concerned about it, I don’t know that I would trust an AI generated MCP not to have this problem.”

Cloud Journey 

1:11:07 Database DevOps: Fix Git Before It Breaks Production

  • Database deployments often fail due to poor Git branching strategies, particularly the common practice of maintaining separate branches for each environment (dev, qa, prod) which leads to merge conflicts, configuration drift, and manual patching becoming routine problems.
  • Trunk-based development with context-driven deployments offers a more scalable solution by storing all database changelogs in a single branch and using Liquibase contexts or metadata to control where changes are applied, eliminating duplication and conflicts. (See the toy model after this list.)
  • Database changes require different handling than stateless applications because they involve persistent state, sequential dependencies, and irreversible operations, making proper version control and GitOps practices essential for safe deployments.
  • Harness Database DevOps currently supports Liquibase for change management and enables referencing changelogs for any supported database from a single CI/CD pipeline, with plans to add Flyway support in the future.
  • Automation capabilities including drift detection, automated rollbacks, and compliance checks are critical for production-grade database DevOps, ensuring consistency and traceability while reducing manual overhead and risk.
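
The trunk-based pattern is easier to see in miniature. Real Liquibase does this with a context attribute on each changeset plus a context filter at deploy time; this toy (changeset IDs invented by us) just shows the logic that replaces per-environment branches:

```python
# One changelog on one branch; contexts decide where each changeset
# runs. Changeset IDs are invented for illustration.
CHANGELOG = [
    {"id": "001-create-orders-table", "contexts": {"dev", "qa", "prod"}},
    {"id": "002-seed-test-fixtures",  "contexts": {"dev", "qa"}},
    {"id": "003-add-orders-index",    "contexts": {"dev", "qa", "prod"}},
]

def changes_for(env: str) -> list[str]:
    # The environment filter replaces per-environment branches and the
    # merge conflicts and drift they breed.
    return [c["id"] for c in CHANGELOG if env in c["contexts"]]

print(changes_for("prod"))  # ['001-create-orders-table', '003-add-orders-index']
```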

1:03:00 TDD: The Missing Protocol for Effective AI Assisted Software Development | 8th Light

  • This article from 8th Light makes a compelling case that Test-Driven Development, or TDD, is the missing piece for making AI coding assistants actually useful in real-world development. The core insight is that we’ve been treating LLMs like they’re human developers who understand context and intent, when really they need structured, explicit instructions – and TDD provides exactly that framework by forcing us to break down problems into small, testable pieces.
  • The timing of this is particularly relevant for cloud developers because we’re seeing tools like GitHub Copilot, Amazon CodeWhisperer, and Google’s Duet AI becoming deeply integrated into cloud development workflows. 
  • But without a proper protocol for communicating with these tools, developers are getting frustrated when the AI generates code that looks good but doesn’t actually work or meet their requirements.
  • What’s clever about using TDD as a communication protocol is that it solves multiple problems at once – you’re not just getting better AI-generated code, you’re also ensuring your code has proper test coverage, which is critical for cloud applications where reliability and scalability matter. 
  • The article shows how writing test descriptions first gives the AI clear boundaries and expectations, similar to how you’d define infrastructure requirements before deploying to the cloud.
  • The practical workflow they outline is really straightforward – you write descriptive test cases covering your requirements, implement one seed test to establish patterns, then let the AI generate the remaining tests and implementation code (sketched below). This approach would work particularly well for cloud microservices where you need consistent patterns across multiple services and APIs.
  • For businesses adopting AI coding assistants, this could be a game-changer in terms of productivity and code quality. 
  • Instead of developers spending hours debugging AI-generated code that missed critical edge cases, they’re using AI to handle the repetitive implementation work while maintaining high standards through automated testing.
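
Here’s a minimal sketch of that workflow in Python/pytest – the discount-code example is ours, not 8th Light’s. Steps one and two are the human’s job: descriptive test names and one fully worked seed test. Step three is what you’d hand to the assistant.

```python
import pytest

def apply_discount(total: float, code: str) -> float:
    """In the workflow, this implementation is what the AI writes last."""
    rates = {"SAVE10": 0.10, "SAVE25": 0.25}
    if code not in rates:
        raise ValueError(f"unknown discount code: {code}")
    return round(total * (1 - rates[code]), 2)

# Steps 1-2 (human): descriptive test names plus one fully worked seed
# test that establishes the conventions.
def test_save10_reduces_total_by_ten_percent():
    assert apply_discount(100.0, "SAVE10") == 90.0

# Step 3 (assistant): the remaining cases, each an explicit, executable
# boundary for the generated implementation.
def test_save25_reduces_total_by_twenty_five_percent():
    assert apply_discount(80.0, "SAVE25") == 60.0

def test_unknown_code_raises_value_error():
    with pytest.raises(ValueError):
        apply_discount(50.0, "BOGUS")
```

Every generated line has a test to satisfy, which is exactly the kind of structured, explicit instruction the article argues LLMs need.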

Closing

And that is the week in the cloud! Visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback, or ask questions at theCloudPod.net or tweet at us with the hashtag #theCloudPod
