259: If Only All My Disasters Could Be Managed

Welcome to episode 259 of the Cloud Pod podcast – where the forecast is always cloudy! This week your hosts Justin, Matthew, Jonathan, and Ryan (yes, all 4!) are covering A LOT of information – you’re going to want to sit down for this one. This week’s agenda includes unnecessary Magic Quadrants, SecOps, Dataflux updates, CNAME chain struggles, and an intro to Phi-3 – plus so much more!

Titles we almost went with this week:

⚛️GKE Config Sync or the Auto Outage for K8 Feature
🌋If only all my disasters could be managed
🪆The Cloud Pod builds a Rag Doll
🤕Understanding Dataflux has given me reflux
🤖Oracle continuing the trend of adding AI to everything even databases
💸A new way to burn your money on the cloud which isn’t even your fault
🏅Google Gets a Magic Quadrant Participation Trophy
🏆We’re All Winners to Magic Quadrant
😡Don’t be a giant DNAME

A big thanks to this week’s sponsor:

Big thanks to Sonrai Security for sponsoring today’s podcast

Check out Sonrai Security’s new Cloud Permission Firewall. Just for our listeners, enjoy a 14 day trial at https://sonrai.co/cloudpod

General News

00:33 Dropbox dropped the ball on security, hemorrhaging customer and third-party info

Dropbox has revealed a major attack on its systems that saw customers’ personal information accessed by unknown and unauthorized entities.
The attack, detailed in a regulatory filing, impacted Dropbox Sign, a service that supports e-signatures similar to Docusign.
The threat actor had accessed data related to all users of Dropbox Sign, such as emails and usernames, in addition to general account settings.
For a subset of users, the threat actor accessed phone numbers, hashed passwords and certain authentication information such as API keys, OAuth tokens and multi-factor authentication.
To make things *extra* worse – if you never had an account but received a signed document, your email and name have also been exposed. Good times.
Want to read the official announcement? You can find it here.

03:06 📢 Jonathan- “It’s unfortunate that it was compromised. It was their acquisition, wasn’t it – ‘HelloSign’ that actually had the defect, not their main product at least.”

05:44 VMware Cloud on AWS – here today, here tomorrow

Last week at recording time, Matt mentioned the rumors on Twitter that Broadcom was terminating VMware Cloud on AWS.
Hock Tan, President and CEO of Broadcom, wrote a blog post letting you know that VMware Cloud on AWS is here today, and here tomorrow.
He says the reports contending that the offering would be going away are false, and that they have caused unnecessary concern for loyal customers who have used the service for years. He quotes Winston Churchill (which is an interesting choice) and then goes on to report that the service is alive, available, and continues to support customers’ strategic business initiatives.
What’s *really* going on is that VMware Cloud on AWS is no longer directly sold by AWS or its channel partners.
- “It’s that simple” means that if you previously purchased VMware Cloud on AWS from AWS, you will now work with Broadcom or an authorized Broadcom reseller to renew your subscription and expand your environment.
- Customers with active one- or three-year subscriptions with monthly payments that were purchased from AWS will continue to be invoiced until the end of their term.

07:38 📢 Justin – “So basically what was happening on Friday was that people were getting wind that Amazon was going to be able to resell VMware. And people were panicking about that. And yeah, right. So if you didn’t get that deal done before this happened, sorry, you’re now negotiating with Broadcom directly.”

AI Is Going Great (Or, How ML Makes All Its Money)

08:14 Better See and Control Your Snowflake Spend with the Cost Management Interface, Now Generally Available

Snowflake is dedicated to providing customers with intuitive solutions that streamline their operations and drive success.
To help customers, they are introducing updates to the Cost Management Interface, making managing Snowflake spend easier at the org level and accessible to more roles.
You can tap into cost data at multiple levels, from the organization’s view to individual teams.
The latest enhancements provide visibility into your spend at the organization and account levels, ensuring you have the insights needed to make informed decisions and take proactive measures.
Organization Overview gives you spend summary, contract overview and account spend summary data.
New features to account overview include monitoring account spend, forecasting spending, identifying top areas by spending, and optimizing spend.

10:59 📢 Jonathan – “Yeah, at least they have budgets though. They can enforce spending limits per account or group of people. So you can stop a row gap from going off and spending millions of dollars over a weekend doing things you shouldn’t be doing.”

AWS

11:40 Stop the CNAME chain struggle: Simplified management with Route 53 Resolver DNS Firewall

You can now configure the DNS firewall to automatically trust all domains in a resolution chain (such as a CNAME or DNAME chain).
The DNS firewall allows you to control and monitor the domains that your application can query. However, this causes some issues when your app uses AWS services.
For example: you query alexa.amazon.com, but that’s a CNAME for pitangui.amazon.com, which is a CNAME to tp.5fd53c724-frontier.amazon.com, which is a CNAME to d1wg1w6p5q855.cloudfront.net, with only the cloudfront address resolving to an IP, 3.162.42.28.
As a firewall admin you might have been tempted to just put in *.amazon.com but that would then fail because it’s cloudfront.net. Worse, the DNS CNAME is controlled by the service and the chain might change at any time, forcing you to manually maintain the list of rules and authorized domains.
A new parameter added to the UpdateFirewallRule API and the AWS Management Console lets you configure the DNS firewall so that it follows and automatically trusts all the domains in a CNAME or DNAME chain.
This makes it simpler by just entering your application query domain. You can turn this on specific to a rule, so you don’t need it on for everything.
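To make the failure mode concrete, here is a minimal sketch in Python of the chain from the example above. It is a toy model of an allow-list, not how Route 53 Resolver is implemented; the `is_allowed` helper and its `trust_chain` flag are hypothetical names made up for illustration.

```python
from fnmatch import fnmatch

# CNAME chain from the example above; the last name is the one that resolves to an IP.
CNAME_CHAIN = {
    "alexa.amazon.com": "pitangui.amazon.com",
    "pitangui.amazon.com": "tp.5fd53c724-frontier.amazon.com",
    "tp.5fd53c724-frontier.amazon.com": "d1wg1w6p5q855.cloudfront.net",
}


def resolution_chain(name):
    """Return every name visited while following CNAMEs starting at `name`."""
    chain = [name]
    while chain[-1] in CNAME_CHAIN:
        chain.append(CNAME_CHAIN[chain[-1]])
    return chain


def is_allowed(name, allow_patterns, trust_chain):
    """Toy allow-list check (hypothetical helper, not an AWS API).

    With trust_chain=False every name in the chain must match a pattern.
    With trust_chain=True only the name the application queried is checked.
    """
    chain = resolution_chain(name)
    to_check = chain[:1] if trust_chain else chain
    return all(any(fnmatch(n, p) for p in allow_patterns) for n in to_check)


print(resolution_chain("alexa.amazon.com"))
print(is_allowed("alexa.amazon.com", ["*.amazon.com"], trust_chain=False))
print(is_allowed("alexa.amazon.com", ["*.amazon.com"], trust_chain=True))
```

Without chain trust, the `*.amazon.com` rule fails on the final cloudfront.net hop; with it, the rule only has to match the domain the application actually asked for.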

14:15 📢 Ryan – “I can’t imagine this not coming up during a beta test or early adopter test. Like this is a very common, you know, Amazon workload is, is going to see, you’d think they’d hit this day one with that testing. It’s crazy.”

15:55 📢 Jonathan – “DNAMES, it’s a way of mapping subdomains into parts of other domains. So you could map…let me think of an example. You can map multiple subdomains into a different namespace, effectively.”

17:36 Amazon EC2 simplifies visibility into your active AMIs

You can now find out when your AMI was last used to launch an EC2 instance by describing your AMI, enabling you to efficiently filter and track your active AMIs.
Want to see the documentation? Find it here.
THANK YOU!
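As a rough sketch of how you might use this for cleanup, the Python below filters a list of image records by their last launch time. The sample records and the `LastLaunchedTime` key are assumptions about what a describe call returns, and `stale_amis` is a hypothetical helper; check the EC2 documentation for the real response shape before relying on it.

```python
from datetime import datetime, timedelta, timezone


def parse_time(value):
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_amis(images, now, max_idle_days=90):
    """Return IDs of images not launched within `max_idle_days`.

    Images with no recorded launch time are treated as stale (a design choice).
    """
    cutoff = now - timedelta(days=max_idle_days)
    stale = []
    for image in images:
        last = image.get("LastLaunchedTime")  # assumed field name
        if last is None or parse_time(last) < cutoff:
            stale.append(image["ImageId"])
    return stale


# Made-up sample data standing in for a describe-images response.
SAMPLE_IMAGES = [
    {"ImageId": "ami-aaa", "LastLaunchedTime": "2024-05-01T12:00:00Z"},
    {"ImageId": "ami-bbb", "LastLaunchedTime": "2023-01-15T08:30:00Z"},
    {"ImageId": "ami-ccc"},
]

now = datetime(2024, 5, 10, tzinfo=timezone.utc)
print(stale_amis(SAMPLE_IMAGES, now))
```

Anything this flags is a candidate for review, not automatic deregistration.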

17:49 Amazon EC2 now protects your AMIs from accidental deregistration

You can also prevent AMIs from accidental deregistration by marking them as protected. A protected AMI cannot be deregistered until you explicitly disable deregistration protection.
Find the blog post here.
Also Thank you.

19:07 Build RAG applications with MongoDB Atlas, now available in Knowledge Bases for Amazon Bedrock

You can now use MongoDB Atlas as a vector store in Knowledge Bases (KB) for Amazon Bedrock. With this integration, you can build RAG (Retrieval Augmented Generation) solutions to securely connect your organization’s private data sources to FMs in Amazon Bedrock.
This integration adds to the list of vector stores supported by KB for Bedrock, including Aurora PostgreSQL-Compatible Edition, vector engine for OpenSearch Serverless, Pinecone and Redis Enterprise Cloud.
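If RAG is new to you, the core loop is small enough to sketch in plain Python. This is a toy in-memory stand-in for a vector store, not the MongoDB Atlas or Bedrock API; the documents, the three-dimensional "embeddings", and the `retrieve` and `build_prompt` helpers are all made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical private documents with made-up embeddings.
DOCS = [
    {"text": "Refund policy: 30 days", "embedding": [1.0, 0.0, 0.0]},
    {"text": "Office hours: 9 to 5", "embedding": [0.0, 1.0, 0.0]},
    {"text": "Refunds need a receipt", "embedding": [0.9, 0.1, 0.0]},
]


def retrieve(query_embedding, docs, k=2):
    """Return the text of the k documents most similar to the query."""
    ranked = sorted(
        docs,
        key=lambda d: cosine(query_embedding, d["embedding"]),
        reverse=True,
    )
    return [d["text"] for d in ranked[:k]]


def build_prompt(question, context):
    """Augment the question with retrieved context before calling a model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {question}"


context = retrieve([1.0, 0.0, 0.0], DOCS)
print(build_prompt("How do refunds work?", context))
```

A managed vector store replaces the `sorted` call with an indexed similarity search, and a real embedding model replaces the hand-written vectors.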

19:46 📢 Jonathan – “I had a chat with the Mongo sales guy not that long ago about this actually. It’s pretty cool. I don’t, yeah, it’s definitely an OS2 feature. I don’t think, you know, it’s, it’s if you want a vectored engine, I don’t think MongoDB will be your first choice if you weren’t already using it, but it’s a great, it’s a great additional feature if you’ve already got it in the stack.”

20:12 Introducing file commit history in Amazon CodeCatalyst

I would have just assumed this feature existed, but apparently you can now see file commit history in Amazon CodeCatalyst. Customers can now view a file’s git commit history. This helps you plan, code, build, test, and deploy applications on AWS.

21:11 AWS CodePipeline supports stage level manual and automated rollback

CodePipeline V2 type pipelines now support stage level rollback to help customers to confidently deploy changes to their production environment.
When a pipeline execution fails in a stage due to any action(s) failing, customers can quickly get that stage to a known good state by rolling back to a previously successful pipeline execution in that stage.

21:29 📢 Justin – “Now, if only it was really that easy of just rolling back a stage like no big deal, like, oh yeah, I rolled back. That assumes, of course, a lot of assumptions about your application… If it’s a static web application, yes, 100 % accurate. If this is a DB deployment, 100 % inaccurate and do not do this without understanding the risks to your business.”

22:52 How an empty S3 bucket can make your AWS bill explode

JeffBarr Twitter

JeffBarr Twitter update #2

Maciej Pocwierz wrote a fun Medium post that starts with a question: if you create an empty, private AWS S3 bucket, what should it cost you the next morning?
Maciej did basically this: he created an S3 bucket and uploaded some files to test a new document indexing system. Two days later, he checked his AWS billing page to make sure he was still in the free tier, only to find that he wasn’t. The bill was over $1,300, with the billing console showing nearly 100,000,000 S3 PUT requests executed in a single day.
He didn’t know where the requests were coming from, since he hadn’t enabled S3 logging or CloudTrail. Once he enabled CloudTrail logs, he saw thousands of write requests originating from multiple accounts or from entirely outside of AWS.
It turns out that an open source tool had a default configuration that stored its backups in S3, and as a placeholder for the bucket name it used the same name Maciej had chosen.
So a ton of systems were attempting to store data in his bucket, and S3 was charging him for those unauthorized incoming requests. Worse, if you don’t specify a bucket region, AWS redirects your bucket request from US-EAST-1 to the actual bucket region, and you get to pay for that too.
He went further and decided to let the bucket accept public writes for 30 seconds and received 10GB of backup data.
He notified the maintainers of the open source tool, and they fixed their default configuration. He also notified the AWS security team and reported the customers’ data he found in the bucket. AWS canceled the bill.
Jeff Barr publicly acknowledged this issue on Twitter, and has voiced that AWS agrees customers should not have to pay for unauthorized requests that they did not initiate and are going to fix it.
Today, we learned that they are working hard on it and that it will cover a range of HTTP 3xx/4xx status codes, including all of the ones mentioned in Maciej’s article. They hope to share more details later this week.
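For a sense of scale, here is some back-of-the-envelope arithmetic in Python. The price of $0.005 per 1,000 PUT requests is an assumption (it is not stated in the post, so check the current S3 pricing page), and the calculation covers request charges only.

```python
# Assumed list price in USD per 1,000 S3 Standard PUT requests (not from the post).
PUT_PRICE_PER_1000 = 0.005


def put_request_cost(requests, price_per_1000=PUT_PRICE_PER_1000):
    """Return the request charge in USD for a number of PUT requests."""
    return requests / 1000 * price_per_1000


one_day = put_request_cost(100_000_000)
print(f"100M PUT requests in one day: ${one_day:,.2f}")
```

Under that assumption, a single day of this traffic comes to about $500 in request charges alone, which gives a feel for how a bill like the one described builds up within a couple of days.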

25:55 📢 Ryan – “I was more impressed with Amazon’s reaction to this in terms of like, you know, like they haven’t fixed it. Apparently this is not a new issue. It’s been reported before, but just the amount of attention that’s got and how quickly there was a response. And then now, you know, a follow -up with, with an, you know, next coming week, sort of ETA, which is, I thought, was pretty impressive given the timescale that we’re talking about.”

GCP

28:26 Auto-upgrades for Config Sync in GKE Enterprise now in preview

Config Sync, Google Cloud’s fully managed GitOps offering for GKE, lets cluster operators and platform administrators deploy configurations and applications from a source of truth.
Today they are announcing a new auto-upgrade feature in preview, letting you automatically upgrade Config Sync versions and oversee the lifecycle of Config Sync Components.
Auto-upgrade is an opt-in feature available for new and existing Config Sync installations.
Benefits:
- Low maintenance overhead
- Maintained supportability
- Enforced reliability
Auto-upgrades match the GKE release channels: Rapid, Regular, and Stable.

29:12 📢 Ryan – “I wish, I mean, I still go back to like, I wish Kubernetes was simple enough where this wasn’t as big of a deal. Like it should be able to auto upgrade between versions and, and that shouldn’t break everything, but it does. It breaks everything. I’ve seen it. I don’t understand why it breaks everything when you update Kubernetes. It’s frustrating.”

29:49 📢 Justin – “I mean, the problem is there’s so much complexity in Kubernetes and so much deprecation of old legacy APIs right now that I just don’t feel like the API is that stable. So breaking changes is just the nature of the beast.”

30:26 Google is a Leader in the 2024 Gartner® Magic Quadrant™ for Cloud AI Developer Services

The Cloud AI Developer Services Magic Quadrant is out, and of course I’m sure everyone wants to be on it this time!
Surprisingly, there are four companies in the Leaders quadrant, with Amazon rated highest on ability to execute but with a less complete vision.
Microsoft has the most complete vision but a poorer ability to execute, and Google sits below both AWS and Microsoft, to the left of Microsoft and to the right of Amazon. The fourth, barely holding on to the Leaders quadrant, is IBM.
Gartner defines Cloud AI Developer Services as cloud-hosted or containerized services and products that enable software developers who are not data science experts to use AI models via APIs, SDKs, or applications.
Must-have features: tabular services, language services, vision services.
Standard features: automated data prep, automated feature engineering and model building, model management/operationalization, responsible AI, natural language understanding, speech to text, text to speech, natural language generation, translation, image recognition, video AI, ML-enabled OCR, image/video generation, AI code assistance.
AWS
- Strengths
  - Geographic Strategy
  - Vertical/industry strategy
  - Overall viability
- Cautions
  - Marketing execution
  - Market understanding
  - Innovation
Google
- Strengths
  - Product
  - Market Responsiveness
  - Overall Viability
- Cautions
  - Customer Experience
  - Vertical/industry strategy
  - Marketing execution
Microsoft
- Strengths
  - Product
  - Geographic Strategy
  - Overall Viability
- Cautions
  - Market Understanding
  - Marketing Execution
  - Innovation

32:28 📢 Jonathan – “I wonder why Amazon lacked complete vision, honestly. I guess it depends, I mean, from what perspective are they reporting on this? Because, you know, in my mind, I think what Amazon has done is very smart. They have all the tools to use any model you want, and they didn’t pay a cent in building their own models. You know, Mesa paid for Llama, Anthropic paid for Claude. There’s a whole bunch of models you can use on Amazon. Plus, they do have the vision services to do with the natural language services, things like that. But they didn’t pay any money.”

37:13 Introducing Dataflux Dataset for Cloud Storage to accelerate PyTorch AI training

Google is launching a PyTorch Dataset abstraction, the Dataflux Dataset, for accelerating data loading from GCS.
Dataflux provides up to 3.5x faster training compared to fsspec, with small files.

37:38 Maintain business continuity across regions with BigQuery managed disaster recovery

Out of the box with BigQuery you get an industry-leading 99.99% uptime SLA for availability within a single geographic region.
Full redundancy across two datacenters within a single region is included with every BigQuery dataset you create and is managed transparently.
If you need enhanced redundancy across large geographic regions, Google is now introducing managed disaster recovery for BigQuery.
This feature, now in preview, offers automated failover of compute and storage and a new cross-regional SLA tailored for business-critical workloads.
This feature enables you to ensure business continuity in the unlikely event of a total regional infrastructure outage.
Managed DR also provides failover configurations for capacity reservations, so you can manage query and storage failover behavior. This is all part of BigQuery Enterprise Plus edition.
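To put the 99.99% single-region figure above in perspective, this small Python sketch converts an availability percentage into a downtime budget. It is plain arithmetic over a 30-day month (an assumption); how Google actually measures downtime is defined by the SLA terms, not by this calculation.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Return the downtime budget in minutes for an availability SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


for sla in (99.9, 99.99):
    minutes = allowed_downtime_minutes(sla)
    print(f"{sla}% over 30 days allows about {minutes:.2f} minutes of downtime")
```

At 99.99% that budget is a little over four minutes a month, which is why a full regional outage needs a separate cross-region plan.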

38:53 📢 Matthew – “I like the ability to give Google more money with capacity reservations in your DR region so that when the first region fails and everyone goes and launches in the DR region, you still have your reservation capacity.”

39:29 📢 Justin – “What I want is the cloud providers to provide transparency of like, what’s the spot market percentage in a given data center? Because if the spot market is, you know, equivalent of like 30 or 40% of the workload in that region, those people are all dead in DR. So we’re taking their capacity and I don’t think I’m too worried about it, but, you know, there’s some transparency that the cloud providers could provide, but then they’ll just sell you this guaranteed capacity at an upcharge.”

41:33 Introducing Google Threat Intelligence: Actionable threat intelligence at Google scale

It’s RSA this week, so Google has two announcements in the infosec space.
First up they announce Google Threat Intelligence, a new offering that combines the unmatched depth of their Mandiant front line expertise, the global reach of the VirusTotal community, and the breadth of visibility only Google can deliver, based on billions of signals across devices and emails. Google Threat Intelligence includes Gemini in Threat Intelligence, the AI powered agent that provides conversational search across their vast repository of threat intelligence, enabling customers to gain insights and protect them from threats faster than before.
Key Features:
- Google threat insights: Google protects 4 billion devices and 1.5 billion email accounts and blocks 100 million daily phishing attempts. This provides Google with a vast sensor array and a unique perspective on internet and email-borne threats that allows it to connect the dots back to attack campaigns.
- Frontline intelligence: Mandiant’s elite incident responders and security consultants dissect attacker tactics and techniques, using their experience to help customers defend against sophisticated and relentless threat actors worldwide in over 1,100 investigations annually.
- Human-curated threat intelligence: Mandiant’s global threat experts meticulously monitor threat actor groups for activity and changes in their behavior to contextualize ongoing investigations and provide the insights you need to respond.
- Crowdsourced threat intelligence: VirusTotal’s global community of over 1 million users continuously contributes potential threat indicators, including files and URLs, to offer real-time insight into emerging attacks.
- Open-source threat intelligence: Google uses open-source threat intelligence to enrich its knowledge base with current discoveries from the security community.

42:29 Introducing Google Security Operations: Intel-driven, AI-powered SecOps

Google Security Operations is getting additional AI capabilities and the update is designed to reduce the do-it-yourself complexity of SecOps and enhance the productivity of the entire SOC.
To help reduce manual processes and provide better security outcomes for their customers, Google Security Operations includes a rich set of curated detections with new ones:
- Cloud detections can address serverless threats, crypto mining incidents across Google Cloud, all Google Cloud and Security Command Center Enterprise findings, anomalous user behavior rules, machine learning-generated lists of prioritized endpoint alerts (based on factors such as user and entity context), and baseline coverage for AWS including identity, compute, data services, and secret management. Google has also added detections based on learnings from the Mandiant Managed Defense team. Detections are now available in Google Security Operations Enterprise and Enterprise Plus packages.
- Frontline threat detections can provide coverage for recently-detected methodologies, and are based on threat actor tactics, techniques and procedures (TTPs), including from nation-states and newly-detected malware families. New threats discovered by Mandiant’s elite team, including during incident response engagements, are then made available as detections. These are now available in the Google Security Operations Enterprise Plus package.

43:33 📢 Justin – “I think anything we can help security people with is a win. So I don’t know all the threat intelligence, it sounds like threat noise in a lot of ways, because when you win with too many signals, it’s just all noise at some point, and yes, it could be valid, like your dark web monitoring, Ryan. But it also could just be noise, because I’m like, I don’t know who’s data got hacked to get my email address this time. It’s only the 15th this week, so who knows?”

Azure

44:59 Azure Governance Update – Management Groups

Beginning last week, Azure started enabling the root management group for tenants that have not enabled it yet.
Using Azure management groups is a best practice when applying Azure Policy, and having the root group pre-enabled reduces the initial setup work needed to follow that practice.
This is being done to provide a governance scope above subscriptions to manage policies and compliance for those subscriptions efficiently.
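The idea of a scope above subscriptions can be sketched with a toy hierarchy in Python. The scope names, the policy names, and the `effective_policies` helper are all hypothetical; this models inheritance in general and is not the Azure API.

```python
# Hypothetical hierarchy: parent of each scope (None marks the root management group).
PARENT = {
    "root-mg": None,
    "prod-mg": "root-mg",
    "dev-mg": "root-mg",
    "sub-payments": "prod-mg",
    "sub-sandbox": "dev-mg",
}

# Hypothetical policy assignments made directly at each scope.
ASSIGNED = {
    "root-mg": {"require-tags"},
    "prod-mg": {"deny-public-ip"},
}


def effective_policies(scope):
    """Collect policies assigned at `scope` and at every ancestor above it."""
    policies = set()
    while scope is not None:
        policies |= ASSIGNED.get(scope, set())
        scope = PARENT[scope]
    return policies


for sub in ("sub-payments", "sub-sandbox"):
    print(sub, sorted(effective_policies(sub)))
```

One assignment at the root reaches every subscription, which is the setup work the pre-enabled root group saves you from repeating per subscription.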

45:37 📢 Matthew – “Essentially in the past, when you have your organization structure, there was no top level. So if you wanted to apply a policy to everything, you had to apply to all the subfolders. This was one of those things that over time was just, Hey, best practices, you just set this up. And now this is just Microsoft saying, here you go. We’re setting it up for you.”

47:30 Azure Virtual Network Manager user-defined route (UDR) management now in public preview

User-Defined Route (UDR) management in Azure Virtual Network Manager is now in public preview.
This feature enables you to describe your desired routing behavior in Azure Virtual Network Manager by defining and applying routing rules to multiple subnets and virtual networks without manually configuring the route tables for each subnet.

48:50 Introducing Phi-3: Redefining what’s possible with SLMs

Microsoft is excited to introduce Phi-3, a family of open AI models developed by Microsoft. Phi-3 models are the most capable and cost-effective small language models (SLMs) available, outperforming models of the same size and next size up across a variety of language, reasoning, coding and math benchmarks. This release expands the selection of high-quality models for customers, offering more practical choices as they compose and build generative AI applications.
Phi-3-mini, a 3.8B language model, is available on Microsoft Azure AI Studio, Hugging Face, and Ollama.
Phi-3-mini is available in two context-length variants — 4k and 128k tokens.
Two additional models, Phi-3-small (7B) and Phi-3-medium (14B), will be available in the next few weeks.

50:23 📢 Jonathan – “As soon as you start training models to beat the benchmarks, they cheat, you know, and it doesn’t become meaningful anymore. I think asking a, you know, you see questions, plenty of questions online, like, you know, apart from Europe, which are the concerns that begin with an A? Like obviously Europe doesn’t begin with an A, but many models just gloss over that, ignore the error in the question and answer the questions the best they can. And so… I think things like that are the real tests to catch these models out. Also some funny stuff.”

51:36 Prioritizing security above all else

After the shellacking Microsoft took over the Exchange hack by foreign states, the company has taken quite a bit of time to respond.
Satya Nadella addressed it at the earnings call last week, and has published a blog post as well.
Satya’s note starts out with an edict: “Prioritizing security above all else.”
Microsoft runs on trust, and their success depends on earning and maintaining it. We have a unique opportunity and responsibility to build the most secure and trusted platform that the world innovates upon.
Satya says they launched their Secure Future Initiative (SFI) with this responsibility in mind, bringing together every part of the company to advance cybersecurity protection across both new products and legacy infrastructure.
Going forward, they will commit the entirety of the organization to SFI as they double down on the initiative with an approach grounded in three core principles:
- Secure by Design: security comes first when designing any product or service
- Secure by Default: Security protections are enabled and enforced by default, require no extra effort, and are not optional
- Secure Operations: Security controls and monitoring will continuously be improved to meet current and future threats.
These principles will govern every facet of their SFI pillars: Protect identities and secrets, protect tenants and isolate production systems, protect networks, protect engineering systems, monitor and detect threats, and accelerate response and remediation.
In addition, we will instill accountability by basing part of the compensation of the senior leadership team on our progress towards meeting our security plans and milestones.
We must approach this challenge with both technical and operational rigor, and with a focus on continuous improvement.
Security is a team sport, and accelerating SFI isn’t just job number one for our security teams, it’s everyone’s top priority and our customers’ greatest need.
If you’re faced with the tradeoff between security and another priority, your answer is clear: Do Security. In some cases, this will mean prioritizing security above other things we do, such as releasing new features or providing ongoing support for legacy systems.

56:23 📢 Matthew – “The product teams don’t always consider that. Product managers don’t always consider a feature. They need the next shiny thing out there. So where do they end up sitting and does the product team and does then Microsoft get dinged on their next quarterly earning of, hey, last time you released 50 features and this time you released 40 features. What happened? Oh, well, we were fixing all of our security holes. Well, it’s not really a good story either.”

General Availability: Microsoft Azure now available from new cloud region in Mexico

The first cloud region in Mexico is now available with Azure Availability Zones. It provides organizations across the globe with access to scalable, highly available, and resilient Microsoft Cloud services, and confirms Microsoft’s commitment to promoting digital transformation and sustainable innovation in the country.

Oracle

57:12 Announcing Oracle Database 23ai : General Availability

Oracle has announced the general availability (GA) of Oracle Database 23ai.
Over the last four years, Oracle Database Development has worked hard to deliver the next long-term support release of the Oracle Database, with a focus on AI and developer productivity.
Given the focus on AI in this release of the database, Oracle decided to change the database’s name from Oracle Database 23c to Oracle Database 23ai.
The three key focus areas:
- AI for Data
- Dev for Data
- Mission Critical for Data

58:05 📢 Jonathan – “AI for data, AI for developers and AI for more money.”

Closing

And that is the week in the cloud! Go check out our sponsor, Sonrai, and get your 14 day free trial. Also visit our website, the home of the Cloud Pod, where you can join our newsletter, Slack team, send feedback or ask questions at theCloudPod.net or tweet at us with hashtag #theCloudPod.
