
Welcome to episode 308 of The Cloud Pod – where the forecast is always cloudy! Justin, Matt and Ryan are in the house today to tell us all about the latest and greatest from FinOps and SnowFlake conferences, plus updates from Security Command Center, OpenAI, and even a new AWS Region. All this and more, today in the cloud!
Titles we almost went with this week:
- ❄️I Left My Wallet at FinOps X, But Found Savings at Snowflake Summit
- 🐚Snowflake City Lights, FinOps by the Sea
- ⛰️The Two Summits: A Tale of FinOps and Snowflakes
- 🌨️Crunchy on the Outside, Snowflake on the Inside
- 🏙️AWS Taipei: Because Sometimes You Need Your Data Closer Than Your Night Market
- 🏴AWS Plants Its Flag in Taipei: The 37th Time’s the Charm
- 🪚AWS Slashes GPU Prices Faster Than a CUDA Kernel
- ✍️Two Writers Walk Into a Database… And Both Succeed
- 🪟AWS Network Firewall: Now With Windows!
- 🤐The VPN Connection That Keeps Its Secrets
- 💬Transform and Roll Out: Pub/Sub’s New Single Message Feature
- 📠SAP Happens: Google’s New M4 VMs Handle It Better
- 📝Total Recall: Google’s 6TB Memory Machines
- 🧠The M4trix Has You (And Your In-Memory Databases)
- ☁️DeepSeek and You Shall Find… on Google Cloud
- 🎩Four Score and Seven Vulnerabilities Ago – mk
- 🥏The Fantastic Four Security Features
- 🎛️MCP: Model Context Protocol or Master Control Program from Tron?
- 🛞No SQL? No Problem! AI Takes the Wheel
- 🧼Injection Rejection: How Azure Keeps Your Prompts Clean
General News
05:09 FinOps X 2025 Cloud Announcements: AI Agents and Increased FOCUS™ Support
- All major cloud providers announced expanded support for FOCUS (FinOps Open Cost and Usage Specification) 1.0, with AWS already in general availability and Google Cloud launching a BigQuery export in private preview.
- This signals an industry-wide standardization of cloud cost reporting formats.
- AWS introduced AI-powered cost optimization through Amazon Q Developer integration with Cost Optimization Hub, enabling automated recommendations across millions of resources with detailed explanations and action plans for cost reduction.
- Microsoft Azure launched AI agents for application modernization that can reduce migration efforts from months to hours by automating code assessment and remediation across thousands of files, while also introducing flexible PTU reservations that work across multiple AI models.
- Google Cloud unveiled FinOps Hub 2.0 with Gemini-powered waste detection that identifies underutilized resources (like VMs at 5% usage) and provides AI-generated optimization recommendations for Kubernetes, Cloud Run, and Cloud SQL services.
- Oracle Cloud Infrastructure added carbon emissions reporting with hourly power-based calculations and GHGP compliance, plus new cost anomaly detection and rules-based cost allocation features for improved financial governance.
06:11 📢 Justin – “I mean, if I’m modernizing my application, typically it’s off .NET and Azure, but ok…”
07:20 Broadcom reboots CloudHealth with enhancements to broaden FinOps use – SiliconANGLE
- Broadcom has redesigned CloudHealth with AI-powered features including Intelligent Assist for natural language queries and Smart Summary for explaining billing changes, marking the platform’s most significant update since its 2012 launch.
- The update addresses a key FinOps challenge by making cloud cost data accessible to non-technical teams through plain English interfaces, instead of requiring SQL knowledge, as 44% of FinOps teams were created within the past year according to the FinOps Foundation.
- CloudHealth processes 10 petabytes of cost and usage data daily across 22,000 customers, with the new AI features tested for over six months to ensure accuracy in recommendations for users managing millions in cloud spending.
- Smart Summary analyzes billing data to explain cost changes down to unit price level in plain English, condensing billions of lines of cost data into a few hundred actionable lines.
- The redesign aims to shift cost optimization visibility earlier in the application lifecycle by extending access beyond centralized FinOps teams to engineering and other departments involved in cloud infrastructure decisions.
08:42 📢 Justin – “I’m glad to see CloudHealth getting some love. I thought it was just going to die inside of the Broadcom behemoth.”
AI Is Going Great – Or How ML Makes Money
SnowFlake Summit
12:57 Democratizing Enterprise AI: Snowflake’s New AI Capabilities Accelerate Data-Driven Innovation
- Snowflake introduces Snowflake Intelligence and Cortex Agents to enable natural language querying of structured and unstructured data, allowing business users to ask questions in plain English and receive governed answers without SQL knowledge or dashboards
- Cortex AISQL brings AI capabilities directly into SQL syntax, enabling analysts to extract metadata, classify sentiment, and process documents, images and other formats with 30-70% performance improvements over traditional pipelines.
- The platform now includes AI Observability tools for monitoring generative AI applications, access to models from OpenAI, Anthropic, Meta and Mistral within Snowflake’s security perimeter, and provisioned throughput for dedicated inference capacity.
- New ML capabilities include a Data Science Agent that uses Anthropic models to automatically generate ML pipelines from natural language prompts, distributed training APIs, and support for serving models from Hugging Face with one-click deployment.
- All AI and ML features operate within Snowflake’s unified governance framework with role-based access control, usage tracking, and budget enforcement, eliminating the need for separate infrastructure management.
14:00 Experience AI-Powered Analytics and Migrations at Warp Speed with Snowflake’s Latest Innovations
- Snowflake‘s SnowConvert AI now supports automated migrations from Greenplum, Netezza, Postgres, BigQuery, Sybase, and Microsoft Synapse, with AI-powered code verification and data validation to reduce migration complexity and timelines.
- Cortex AISQL enables SQL-based analysis of both structured and unstructured data (text, images, audio) in a single query, allowing data analysts to perform AI analytics without specialized expertise or external integrations.
- Standard Warehouse Generation 2 delivers 2.1x faster performance for core analytics workloads and 4.4x faster Delete, Update, and Merge operations, while new Adaptive Compute automatically selects optimal cluster sizes and routing without manual configuration.
- Iceberg performance improvements include 2.4x faster analytics on externally managed tables through search optimization, query acceleration, automatic compaction, and enhanced pruning capabilities for selective queries.
- Semantic Views provide a unified business metrics layer accessible through Cortex Analyst, Snowflake Intelligence, BI tools, or direct SQL queries, ensuring consistent results across different interfaces and partner integrations.
15:52 📢 Ryan – “…we’ve moved into running infrastructure not being sort of the first principle of a lot of businesses, and now it seems like sort of hosting data and databases and large data warehouses is sort of going that route too, which I think makes sense.”
An Even Easier-to-Use and More Trusted Platform from Snowflake
- Snowflake introduces Adaptive Compute in private preview, which automatically selects cluster sizes, number of clusters, and auto-suspend durations without user configuration. This service delivers 2.1x faster performance through Gen2 warehouses and optimizes costs by intelligently routing queries to right-sized clusters across a shared compute pool.
- The platform adds comprehensive FinOps capabilities including cost-based anomaly detection, tag-based budgets, and joins the FinOps Foundation as a Premier Enterprise Member.
- These tools help organizations track spending spikes, set resource limits by tags, and align with industry best practices for cloud cost management.
- Horizon Catalog now federates across Apache Iceberg REST catalogs through Catalog-linked Databases, enabling unified governance across external data sources.
- The addition of AI-powered Copilot for Horizon Catalog allows natural language queries for governance and metadata discovery tasks.
- New security features include anomaly detection using AI models, leaked password protection that disables compromised credentials found on the dark web, and bad IP blocking. Workload Identity Federation removes the need for long-lived credentials while passkey support adds modern authentication methods.
- Snowflake announces PostgreSQL support through Snowflake Postgres (in development) and expands Unistore to Azure with Hybrid Tables.
- This allows organizations to run transactional and analytical workloads on the same platform with unified governance and security.
Introducing Even Easier-to-Use Snowflake Adaptive Compute with Better Price/Performance
- Snowflake’s Adaptive Compute automatically selects cluster sizes, number of clusters, and auto-suspend/resume settings, eliminating manual infrastructure decisions while maintaining familiar billing models and FinOps tools.
- Standard Warehouse Generation 2 delivers 2.1x faster performance for core analytics workloads compared to the previous generation, with upgraded hardware and performance enhancements now generally available.
- Converting existing warehouses to Adaptive Warehouses requires only a simple alter command with no downtime, preserving warehouse names, policies, and permissions to minimize disruption to production workloads.
- All Adaptive Warehouses in an account share a common resource pool, optimizing efficiency through intelligent query routing to right-sized clusters without user intervention.
- Pfizer reports successful consolidation of multiple warehouses across different workloads during private preview, highlighting reduced management overhead while maintaining budget controls.
Snowflake Intelligence: Talk to Your Data, Unlock Real Business Insights
- Snowflake Intelligence introduces a natural language interface at ai.snowflake.com that allows business users to query both structured and unstructured data through conversational AI, eliminating the need for SQL knowledge or waiting for data team support.
- The platform’s Deep Research Agent for Analytics goes beyond simple data retrieval to analyze complex business questions and uncover the “why” behind trends, while maintaining Snowflake’s existing security and governance controls automatically.
- Integration with third-party applications like Salesforce, Zendesk, and Slack provides a unified view across business systems, and Cortex Knowledge Extensions add external data sources like Stack Overflow and The Associated Press for enriched insights.
- The service enables direct action from insights, allowing users to trigger workflows, send notifications, or update records in other systems directly from the conversational interface, reducing the time from insight to action.
- Early adopter WHOOP reports their analytics teams can now focus on strategic work rather than manual data retrieval tasks, demonstrating the potential for organizations to democratize data access while maintaining enterprise security standards.
Cortex AISQL: Reimagining SQL into AI Query Language for Multimodal Data
- Snowflake Cortex AISQL brings AI capabilities directly into SQL, allowing analysts to process text, images, and audio data using familiar SQL commands like AI_FILTER, AI_AGG, and AI_CLASSIFY without needing separate AI tools or specialized skills.
- The new FILE data type enables direct referencing of multimodal data within Snowflake tables, eliminating the need for separate processing systems and allowing complex queries that combine structured and unstructured data analysis in a single workflow.
- Performance optimizations deliver up to 70% query runtime reduction for operations like FILTER and JOIN compared to manual implementations, achieved by running AI functions inside Snowflake’s core query engine with intelligent model selection.
- Real-world applications include financial services automating corporate action processing from news feeds, retailers detecting product quality issues from customer reviews, and healthcare researchers correlating clinical notes with patient records for new treatment insights.
- The public preview makes AI-powered data analysis accessible to SQL analysts without requiring data science expertise, transforming weeks of custom development into straightforward SQL queries that can be modified in minutes.
17:45 Delivering the Most Enterprise-Ready Postgres, Built for Snowflake
- Snowflake is acquiring Crunchy Data to create Snowflake Postgres, bringing enterprise-grade security, compliance, and operational standards to PostgreSQL within the Snowflake platform.
- This addresses the gap between developer preference for Postgres and enterprise requirements for production workloads.
- The acquisition targets organizations that need advanced security features like customer-managed encryption keys and compliance certifications for regulated industries. Crunchy Data brings proven expertise in enterprise Postgres deployments across cloud, Kubernetes, and on-premise environments.
- Snowflake Postgres will enable developers to run existing Postgres applications on Snowflake without code rewrites while gaining access to built-in connection pooling, performance metrics, and logging support. This consolidates transactional and analytical workloads in a single platform.
- The offering compliments Snowflake’s existing Unistore solution by providing native Postgres compatibility for transactional applications. Early customers like Blue Yonder and Landing AI see opportunities to simplify their application stacks and accelerate AI development.
- This move positions Snowflake to capture more enterprise workloads by eliminating the need for separate database management while maintaining full Postgres compatibility. The acquisition is expected to close imminently pending standard closing conditions.
19:24 📢 Ryan – “If the data set is presented as a single data source that I can run analytical and transactional workloads against, that would be amazing value to develop on and to simplify the application architecture. So that would be super cool.”
20:33 Exclusive: OpenAI taps Google in unprecedented cloud deal despite AI rivalry, sources say | Reuters
- OpenAI is adding Google Cloud’s infrastructure to its compute resources despite being direct competitors in AI, marking a shift from its exclusive reliance on Microsoft Azure for data center infrastructure since January 2025.
- The deal centers on Google’s tensor processing units (TPUs) which were historically reserved for internal use but are now being offered to external customers including Apple, Anthropic, and Safe Superintelligence.
- OpenAI’s compute demands are driven by both training large language models and running inference at scale, with the company reporting $10 billion in annualized revenue as of June 2025.
- This partnership adds to OpenAI’s infrastructure diversification strategy including the $500 billion Stargate project with SoftBank and Oracle, plus billions in compute contracts with CoreWeave.
- For cloud providers, the deal demonstrates how AI workloads are reshaping competitive dynamics – Google Cloud generated $43 billion in 2024 revenue and positions itself as a neutral compute provider despite competing directly with customers through DeepMind.
21:55 📢 Matt – “It also is probably the first true multi-cloud workload that there is out there that they can train across multiple clouds. And if they do it right, they can, in theory, actually leverage spot markets and things like that, which will be interesting to see how they destroy spot markets real fast when they start training everything.”
24:11 Magistral | Mistral AI
- Mistral AI released Magistral, their first reasoning model available in two versions: Magistral Small (24B parameters, open source under Apache 2.0) and Magistral Medium (enterprise version), with the Medium version scoring 73.6% on AIME2024 benchmarks and 90% with majority voting.
- The model introduces transparent, traceable reasoning chains that work natively across multiple languages including English, French, Spanish, German, Italian, Arabic, Russian, and Simplified Chinese, making it suitable for global enterprise deployments requiring auditable AI decisions.
- Magistral Medium achieves 10x faster token throughput than competitors through Flash Answers in Le Chat, enabling real-time reasoning for cloud-based applications in regulated industries, software development, and data engineering workflows.
- Enterprise availability includes deployment options on Amazon SageMaker with upcoming support for IBM WatsonX, Azure AI, and Google Cloud Marketplace, positioning it as a multi-cloud solution for businesses needing domain-specific reasoning capabilities.
- The open-source Magistral Small enables developers to build custom reasoning applications, with the community already creating specialized models like ether0 for chemistry and DeepHermes 3, expanding the ecosystem of thinking language models.
25:19 📢 Matt – “The multiple languages Day 1, and the quantity of languages has always impressed me. It’s not like all Latin based languages; but getting Russian and Chinese in there Day 1. They’re different alphabets and completely different speech patterns…and having all of them at once impressed me.”
AWS
26:52 Now open – AWS Asia Pacific (Taipei) Region | AWS News Blog
- AWS launches its 37th global region in Taipei (ap-east-2) with three availability zones, marking the 15th region in Asia Pacific and bringing the total to 117 availability zones worldwide.
- This addresses data residency requirements for Taiwan’s regulated industries including finance and healthcare.
- The region builds on AWS’s decade-long presence in Taiwan which includes two CloudFront edge locations, three Direct Connect locations, AWS Outposts support, and a Local Zone in Taipei for single-digit millisecond latency applications.
- Major Taiwan enterprises are already leveraging AWS including Cathay Financial Holdings for compliance-focused cloud environments, Gamania Group’s Vyin AI platform for celebrity digital identities, and Chunghwa Telecom using Amazon Bedrock for generative AI applications.
- AWS has trained over 200,000 people in Taiwan through AWS Academy, AWS Educate, and AWS Skill Builder programs, supporting the local ecosystem that includes 4 AWS Heroes, 17 Community Builders, and Premier Partners like eCloudvalley and Nextlink Technology.
- The region supports Taiwan’s 2050 net-zero emissions goal with customers like Ace Energy achieving 65% steam consumption reduction and Taiwan Power Company implementing smart grid technologies with drones and robotics for infrastructure management.
32:18 Introducing AWS API models and publicly available resources for AWS API definitions | AWS News Blog
- AWS is now publishing Smithy API models daily to Maven Central and GitHub, providing developers with definitive, up-to-date sources of AWS service interface definitions and behaviors that have been used internally since 2018 to generate AWS SDKs and CLI tools.
- Developers can use these models to generate custom SDKs for unsupported languages, build server stubs for testing, create developer tools like IAM policy generators, or even generate Model Context Protocol (MCP) server configurations for AI agents.
- The repository structure organizes models by service SDK ID and version, with each model containing detailed API contracts including operations, protocols, authentication methods, request/response types, and comprehensive documentation with examples.
- This release enables developers to build purpose-built integrations without waiting for official SDK support, particularly valuable for niche programming languages or specialized use cases where existing SDKs don’t meet specific requirements.
- The models are available at no cost through the GitHub repository and Maven Central, with Smithy CLI and build tools providing immediate access to code generation capabilities.
38:36 Announcing up to 45% price reduction for Amazon EC2 NVIDIA GPU-accelerated instances | AWS News Blog
- AWS is reducing prices by up to 45% for NVIDIA GPU-accelerated EC2 instances including P4 (P4d/P4de) and P5 (P5/P5en) families, with On-Demand pricing effective June 1 and Savings Plans pricing after June 4, addressing the industry-wide GPU shortage that has driven up costs for AI workloads.
- The price cuts apply across all regions where these instances are available, with AWS expanding at-scale On-Demand capacity to additional regions including Asia Pacific, Europe, and South America, making GPU resources more accessible for distributed AI training and inference workloads.
- AWS is now offering the new P6-B200 instances powered by NVIDIA Blackwell GPUs through Savings Plans for large-scale deployments, previously only available through EC2 Capacity Blocks, providing customers with more flexible purchasing options for next-generation GPU compute.
- Customers can choose between EC2 Instance Savings Plans for the lowest prices on specific instance families in a region, or Compute Savings Plans for maximum flexibility across instance types and regions, with both 1-year and 3-year commitment options.
- This pricing reduction represents AWS passing operational efficiencies from scale back to customers, making advanced GPU computing more economically viable for generative AI applications, employee productivity tools, and customer experience improvements.
40:02 📢 Ryan- “I took issue with the way that this blog post was written and was just squinting all the way through it because like, it feels like the shortages are lightening up, and so they can offer this – which I like, right, because they are really passing down that savings – and you know, maybe it’s extra capacity. But I don’t think so. I think it’s because the capacity is available that they can, you know, via supply and demand lower the prices for it.”
42:06 Announcing open sourcing pgactive: active-active replication extension for PostgreSQL – AWS
- AWS open sourced pgactive, a PostgreSQL extension that enables asynchronous active-active replication between database instances, allowing multiple writers across different regions to maintain data consistency and availability.
- The extension builds on PostgreSQL 16’s bidirectional replication features, simplifying management of active-active scenarios for use cases like regional failover, geographic data distribution, and zero-downtime migrations between instances.
- This addresses a common PostgreSQL limitation where traditional replication only supports single-writer architectures, making it difficult to achieve true multi-region active deployments without complex third-party solutions.
- Organizations can now implement disaster recovery strategies with multiple active database instances, reducing recovery time objectives (RTO) and enabling seamless traffic switching during maintenance or outages.
- The open source release on GitHub allows community collaboration on improving PostgreSQL’s active-active capabilities while providing AWS customers with a supported path for multi-writer database architectures without vendor lock-in.
43:49 📢 Justin – “It’s also interesting that they announced this just after Snowflake announced the purchase of CrunchyData – which I believe also offered an active-active solution; as well as there are a couple other commercial versions that you can buy for lots of money. So interesting as well on that part.”
45:59 AWS Network Firewall launches new monitoring dashboard – AWS
- AWS Network Firewall now includes a built-in monitoring dashboard that provides visibility into network traffic patterns, including top traffic flows, TLS SNI, and HTTP Host headers without additional charges beyond standard CloudWatch and Athena costs.
- The dashboard helps identify long-lived TCP connections and failed TCP handshakes, making it easier to troubleshoot network issues and spot potential security concerns that previously required manual log analysis.
- This addresses a common pain point where customers had to build custom dashboards or use third-party tools to visualize Network Firewall data, now providing out-of-the-box insights for faster incident response.
- Setup requires enabling Flow logs and Alert logs in Network Firewall, then activating the monitoring dashboard – a straightforward process that immediately provides actionable network intelligence.
- Available in all AWS Network Firewall regions, this feature strengthens AWS’s network security observability story alongside services like VPC Flow Logs and Traffic Mirroring.
47:09 📢 Matt – “I feel like 50% of the time I get it (Athena) to work, and the other 50% of the time I just swear at it and walk away.”
50:04 AWS Site-to-Site VPN introduces three new capabilities for enhanced security – AWS
- AWS Site-to-Site VPN now integrates with Secrets Manager to automatically redact pre-shared keys in API responses, displaying only the ARN instead of exposing sensitive credentials.
- The new GetActiveVpnTunnelStatus API eliminates the need to enable VPN logs just to track negotiated security parameters like IKE version, DH groups, and encryption algorithms, reducing operational overhead.
- AWS added a recommended parameter to the GetVpnConnectionDeviceSampleConfiguration API that automatically configures best-practice security settings including IKE v2, DH group 20, SHA-384, and AES-GCM-256.
- These security enhancements come at no additional cost and address common VPN configuration challenges where customers often struggle with selecting appropriate cryptographic parameters or accidentally expose PSKs in logs.
- The features are available in all commercial AWS regions except Europe (Milan – we’re not sure who you ticked off), making it easier for enterprises to maintain secure hybrid connectivity without manual security configuration complexity.
- The only thing we have to say here is THANK YOU.
GCP
53:08 Pub/Sub single message transforms | Google Cloud Blog
- Google Pub/Sub now supports JavaScript User-Defined Functions (UDFs) for in-stream message transformations, eliminating the need for separate services like Dataflow or Cloud Run for simple data modifications.
- This reduces latency and operational overhead for common tasks like format conversion, PII redaction, and data filtering.
- The feature allows up to five JavaScript transforms per topic or subscription, with transformations happening directly within Pub/Sub before message persistence or delivery.
- This positions GCP competitively against AWS EventBridge’s input transformers and Azure Service Bus’s message enrichment capabilities.
- Key use cases include data masking for compliance, format conversion for multi-system integration, and enhanced filtering based on message content rather than just attributes. Industries handling sensitive data like healthcare and finance will benefit from built-in PII redaction capabilities.
- The service integrates seamlessly with existing Pub/Sub features like Import Topics and Export Subscriptions, continuing Google’s strategy of simplifying streaming architectures. Additional transforms including schema validation and AI inference are planned for future releases.
- Available now in GA through the Google Cloud console and gcloud CLI with standard Pub/Sub pricing applying to transformed messages.
- The JavaScript runtime limitations and performance characteristics aren’t specified, which may be important for latency-sensitive applications.
54:19 📢 Ryan – “…the fact that this happens before persistence layer is key, right? Because it’s difficult to undo anything you introduce once that happens. so be careful. Test well.”
55:34 M4 VMs are designed for memory-intensive workloads like SAP | Google Cloud Blog
- Google Cloud launches M4 VMs with up to 224 vCPUs and 6TB of DDR5 memory, targeting memory-intensive workloads like SAP HANA and SQL Server with 66% better price-performance than the previous M3 generation and full SAP certification across all shapes.
- Built on Intel’s 5th gen Xeon processors (Emerald Rapids), M4 offers two memory-to-vCPU ratios (13.3:1 and 26.6:1) and delivers up to 2.25x more SAPs compared to M3, making it the first memory-optimized instance among hyperscalers to use these processors.
- M4 leverages Google’s Titanium offload technology for 200 Gb/s networking bandwidth and integrates with Hyperdisk storage supporting up to 500K IOPS and 10,000 MiB/s throughput, with dynamic tuning capabilities and storage pooling for cost optimization.
- The instances are backed by a 99.95% Single Instance SLA and support hitless upgrades and live migration for minimal disruption during maintenance, with initial availability in five regions (us-east4, europe-west4, europe-west3, us-central1).
- M4 completes Google’s memory-optimized portfolio alongside X4 (up to 32TB memory), positioning GCP competitively for large-scale in-memory databases and analytics workloads with both on-demand and committed use discount pricing options.
1:00:32 Deploying Llama4 and DeepSeek on AI Hypercomputer | Google Cloud Blog
- Google Cloud releases optimized deployment recipes for Meta’s Llama4 (Scout 17B-16E and Maverick 17B-128E) and DeepSeek’s V3/R1 models on AI Hypercomputer, providing step-by-step guides for running these open-source LLMs on Trillium TPUs and A3 Mega/Ultra GPUs.
- The recipes leverage JetStream for TPU inference and vLLM/SGLang for GPU deployments, with Pathways enabling multi-host serving across TPU slices – the same system Google uses internally for Gemini model training and serving.
- MaxText now includes architectural innovations from DeepSeek like Multi-Head Latent Attention, MoE Shared/Routed Experts, and YARN RoPE embeddings, allowing developers to experiment with these newer model architectures on Google Cloud infrastructure.
- These deployment options target enterprises needing to run large open models on-premises or in their own cloud environments, competing directly with AWS SageMaker and Azure ML’s model hosting capabilities while leveraging Google’s TPU advantage.
- The GitHub recipes provide complete workflows including model weight downloads, checkpoint conversion, server deployment, and benchmarking scripts, reducing the typical deployment complexity from days to hours for these multi-billion parameter models.
1:01:23 📢 Matt – “I think you’re making up half these words.”
1:02:23 Understanding updates to BigQuery workload management | Google Cloud Blog
- BigQuery introduces reservation fairness and predictability features that allow organizations to set absolute maximum slot consumption limits and distribute idle capacity equally across reservations rather than projects, providing more granular control over resource allocation and costs in Enterprise editions.
- The new runtime reservation specification feature enables users to override default reservation assignments via CLI, UI, SQL, or API at query execution time, with role-based access controls for improved security and flexibility in multi-team environments.
- Autoscaler improvements deliver 50-slot increment granularity (down from 100), near-instant scale up, and faster scale down capabilities, allowing more responsive resource adjustments to workload demands compared to previous iterations.
- Reservation labels now integrate with Cloud Billing data for the Analysis Slots Attribution SKU, enabling detailed cost tracking and optimization by workload or team, addressing a common enterprise requirement for chargeback and showback scenarios.
- These updates position BigQuery’s workload management closer to dedicated resource pools found in Snowflake’s multi-cluster warehouses or AWS Redshift’s workload management queues, but with more dynamic allocation options suited for variable analytics workloads.
1:03:31 📢 Justin – “If you’re going to use reservation fairness and you’re not going to honor the project boundary, I will cut you – Ryan – when you take my BigQuery slots.”
1:07:16 Enhancing protection: 4 new Security Command Center capabilities | Google Cloud Blog
- Security Command Center now offers agentless vulnerability scanning for Compute Engine and GKE at no additional charge, eliminating the need to deploy and manage scanning agents on each asset while providing coverage even for unauthorized VMs provisioned by adversaries.
- Container image vulnerability scanning is now integrated through Artifact Analysis, with scans included at no extra cost for SCC Enterprise customers when images are deployed to GKE, Cloud Run, or App Engine, consolidating security findings in one dashboard.
- Cloud Run threat detection introduces 16 specialized detectors that analyze serverless deployments for malicious activities, including behavioral analysis, NLP-powered code analysis, and control plane monitoring – capabilities not available in third-party products.
- SCC automatically detects connections to known malicious IPs by analyzing internal network traffic without requiring customers to purchase, ingest, and analyze VPC Flow Logs separately, unlike third-party security tools that charge extra for this capability.
- All four capabilities leverage Google’s first-party access to infrastructure data and Google Threat Intelligence, providing deeper visibility than API-based third-party tools while respecting data residency boundaries established by customers.
1:10:31 New MCP integrations to Google Cloud Databases | Google Cloud Blog
- Google’s MCP Toolbox now enables AI coding assistants like Claude Code, Cursor, and Windsurf to directly query and modify Google Cloud databases including Cloud SQL, AlloyDB, Spanner, and BigQuery through natural language commands in your IDE.
- Developers can skip writing complex SQL queries and instead use plain English to explore database schemas, create tables, modify structures, and generate test data – tasks that previously took hours or days can now be completed in minutes.
- The tool implements Anthropic’s Model Context Protocol (MCP), an emerging open standard that replaces fragmented custom integrations between AI systems and data sources with a unified protocol approach.
- This positions Google competitively against AWS CodeWhisperer and GitHub Copilot by offering deeper database integration capabilities, though those services don’t yet support direct database manipulation through natural language.
- Key use cases include onboarding new developers, rapid prototyping, schema refactoring, and automated test generation – particularly valuable for e-commerce, SaaS, and enterprise applications with complex data models.
1:12:33 Datadog integrates Google Cloud AI | Google Cloud Blog
- Datadog now monitors Google’s Vertex AI Agent Engine through its new AI Agents Console, providing unified visibility into autonomous agents’ actions, permissions, and business impact across third-party and Google-orchestrated agents.
- The integration covers the full AI stack on Google Cloud: application layer (AI agents), model layer (Gemini and Vertex AI LLMs with auto-instrumentation), infrastructure layer (Cloud TPU monitoring), and data layer (expanded BigQuery monitoring for cost optimization).
- Datadog has implemented Google Cloud’s Active Metrics APIs to reduce monitoring costs by only calling APIs when new data exists, complementing their Private Service Connect support to minimize data transfer expenses.
- The expanded BigQuery monitoring helps teams identify top spenders, slow queries, and failed jobs while flagging data quality issues – addressing a key pain point for organizations using BigQuery for AI data insights.
- Customers can purchase Datadog directly through Google Cloud Marketplace with deployment in minutes, making it straightforward for GCP users to add comprehensive AI observability to their existing infrastructure.
1:13:52 📢 Justin – “Datadog only has some of the responsibility. A lot of it is because of all of these managed monitoring solutions, it’s what you send to it. And they’re just charging by ingestion rates. And so if you’re in control of your data, your spend is not going crazy big.”
1:15:27 Introducing Google Cloud Serverless for Apache Spark in BigQuery | Google Cloud Blog
- Google Cloud Serverless for Apache Spark is now generally available within BigQuery, eliminating cluster management overhead and charging only for job runtime rather than idle infrastructure.
- This integration provides a unified developer experience in BigQuery Studio with seamless interoperability between Spark and BigQuery SQL engines on the same data.
- The service includes Lightning Engine (in Preview) which delivers up to 3.6x faster query performance through vectorized execution and intelligent caching. Pre-packaged ML libraries like PyTorch and Transformers come standard with Google-certified Spark images, plus GPU acceleration support for distributed AI workloads.
- BigLake metastore enables Spark and BigQuery to operate on a single copy of data whether in BigQuery managed tables or open formats like Apache Iceberg and Delta Lake. All data access is unified through the BigQuery Storage Read API with no additional cost for reads from serverless Spark jobs.
- BigQuery spend-based CUDs now apply to serverless Spark usage, and the service supports full OSS compatibility with existing Spark code across Python, Java, Scala, and R. Enterprise features include job isolation, CMEK encryption, custom org policies, and end-user credential support for data access traceability.
- Gemini-powered features include PySpark code generation with data context awareness and Cloud Assist for troubleshooting recommendations (both in Preview).
- The service integrates with BigQuery Pipelines and Schedules for orchestration, plus supports Apache Airflow/Cloud Composer operators for deployment.
Azure
1:17:56 Enhance AI security with Azure Prompt Shields and Azure AI Content Safety | Microsoft Azure Blog
- Azure Prompt Shields provides real-time protection against prompt injection attacks, which OWASP identifies as the top threat to LLMs, by analyzing inputs to detect both direct jailbreak attempts and indirect attacks embedded in documents or emails.
- The service integrates directly with Azure OpenAI content filters and Azure AI Foundry, offering contextual awareness to reduce false positives and a new Spotlighting capability that distinguishes between trusted and untrusted inputs in generative AI applications.
- Microsoft Defender now integrates with Azure AI Foundry to surface AI security recommendations and runtime threat alerts directly in the development environment, helping developers identify prompt injection risks early in the development process.
- Enterprise customers like AXA and Wrtn Technologies are using Prompt Shields to secure their AI deployments, with AXA preventing prompt injection in their Secure GPT solution and Wrtn leveraging customizable content filters for their Korean AI companion platform.
- Azure OpenAI customers can enable Prompt Shields through built-in content filters while Azure AI Content Safety customers can activate it for non-OpenAI models.
1:19:12 📢 Ryan – “These types of tools are invaluable, right? AI is such a changing landscape, if you’re writing an AI app or taking inputs from a customer…responsible AI is built into all the larger models, but if you’re trying to use a custom model..having this is super key to protecting yourself.”
1:21:27 Announcing Azure Command Launcher for Java | Microsoft Community Hub
- Microsoft introduces jaz, a JVM launcher that automatically optimizes Java applications for Azure cloud environments by detecting container limits and selecting appropriate heap sizing, garbage collection, and diagnostic settings without manual configuration.
- The tool addresses a significant problem where over 30% of developers deploy Java workloads with default OpenJDK settings that are too conservative for cloud environments, leading to underutilized resources and higher operational costs.
- Currently in private preview for Linux containers using Microsoft Build of OpenJDK and Eclipse Temurin (Java 8), jaz simplifies deployment by replacing complex JAVA_OPTS configurations with a single command: jaz -jar myapp.jar.
- The roadmap includes AppCDS support for improved startup times, future Project Leyden integration, and continuous tuning capabilities with Prometheus telemetry sharing, positioning it as a cloud-native alternative to manual JVM tuning or tools like Paketo Buildpacks.
- Target users include developers deploying Spring Boot, Quarkus, or Micronaut microservices on Azure Container Apps, AKS, Azure Red Hat OpenShift, or Azure VMs who want better performance without deep JVM expertise.
1:23:56 📢 Matt – “It just feels like these things should be things out of the box at this point. And then you could tweak them if you want to override them, not default to 128 or 256. And then you’re like, I have a 20 terabyte RAM system. Why am I using 250 megabytes? Hey, by the way, the AI that earlier from FinOps will tell you to scale down, which would be good for you.”
Cloud Journey
1:25:17 The coming downfall of the cloud FinOps tools market and who falls first
- Blog Author: Will Kelly
- The FinOps tools market is heading for a massive shakeout by 2027, with native cloud provider tools like AWS Cost Explorer and Azure Cost Management finally catching up to third-party vendors by offering free, built-in features like tagging enforcement, anomaly detection, and savings plan recommendations that used to be the bread and butter of standalone FinOps platforms.
- AI is fundamentally changing the game by automating what FinOps vendors used to charge premium prices for – instead of manually reviewing cost anomalies or building reservation coverage charts, AI can now generate and execute optimization plans in real-time, making dashboard-only tools look like expensive relics from a bygone era.
- The article calls out specific vendors who are in trouble, including Kion’s desperate pivot to partner with ProsperOps for Kubernetes visibility after years of chasing SEO and compliance messaging instead of focusing on actual cost optimization, and Apptio Cloudability, which despite IBM’s backing, remains bloated and tied to legacy enterprise reporting models.
- There’s a brutal reality check for vendors disguising managed services as SaaS platforms – companies like CloudKeeper that promise “guaranteed savings” but are really just offshored analysts preparing manual reports behind a sleek UI, charging enterprise SaaS prices for what amounts to templated spreadsheets and consulting work.
- The lack of deep cloud provider alignment is becoming a death sentence for FinOps vendors, as enterprises increasingly want tools that integrate directly with their CSP contracts, procurement flows, and Enterprise Discount Programs – if you’re not in the AWS, Azure, or GCP marketplaces with proper billing integration, you’re essentially invisible to enterprise buyers.
- By 2027, the author predicts only full-stack automation platforms that embed into CI/CD pipelines, Kubernetes orchestration, and finance workflows will survive, while dashboard-only tools, fake SaaS platforms, and vendors who confused blog traffic for product-market fit will be consolidated, acqui-hired, or simply shut down.
- The market saturation has reached a breaking point where every vendor pitches the same “visibility, optimization, savings” story, and budget-conscious buyers are exhausted by the sameness – there’s simply no room left for “just another dashboard” in an increasingly commoditized market.
- This consolidation might actually be good for customers who are tired of paying for expensive tools that generate pretty charts but don’t actually reduce their cloud bills – the survivors will be forced to deliver real, automated value rather than just insights and recommendations that require manual implementation.
Closing
And that is the week in the cloud! Visit our website, the home of the Cloud Pod where you can join our newsletter, slack team, send feedback or ask questions at theCloud Pod.net or tweet at us with hashtag #theCloudPod