OT: Potentially largest IT outage in history today

Bulldog Bruce · Jul 19, 2024

And people want self driving cars and AI??? Are you nucking futs?

Trojanbulldog19 · Jul 19, 2024

Pretty much every IT department today.

The Cooterpoot · Jul 20, 2024

Was it as bad as the Scott's Tissue ad that sucks the life out of this site when it shows up?

QuadrupleOption · Jul 20, 2024

DeeEE! said:
Expect to see more of these types of events in the future. With the push to cloud for all SaaS apps and cloud hosting, we are an interconnected global economy now. Everything relates to everything else, and one failure causes a domino sequence.

The only way to combat these types of failures is to have better planning, testing, backup plans, etc. While not realistic, the best way to plan is to have a fleet of offline devices at all times, have a copy of all data offline (air-gapped), and have multiple vendors for different products. Have AT&T and Verizon, have Mac and Windows OS. Have multiple EDR solutions, etc.

QA is bad in many cases, but most (if not all) of the big-boy SaaS operations run containerized systems that are easily spun up/down to handle extreme demand, are geo-redundant, and can be rolled back quickly if a software update goes wrong.

This issue was due to poor QA pushing out a patch that disabled individual laptops. I assume most IT shops had their servers up and running quickly after this occurrence.

Shmuley · Jul 20, 2024

Crowdstrike collapses Microsoft. Milkshake machine at McD’s starts working. Coincidence?

thatsbaseball · Jul 20, 2024

Shmuley said:
Crowdstrike collapses Microsoft. Milkshake machine at McD’s starts working. Coincidence?

But did they get the rest of the order right?

MSUGUY · Jul 21, 2024

Are the effects of the Crowdstrike problem limited now or is this the beginning of something more serious?

My job is canceled tomorrow because my clients are stuck out of the country with no flights. I have friends who can't return from the Caribbean due to no flights.

Willow Grove Dawg · Jul 21, 2024

I flew JAN to ATL Friday & returned Saturday on Delta
Friday flight scheduled for 6:07 AM departed Jackson at 9:30 AM.
Saturday return was scheduled for 1:15 PM and departed about 4:00 PM
I was lucky because I did not have any connections. I thought Delta managed the situation as well as possible given the circumstances because they had very little information available to them especially Friday morning.
Atlanta Hartsfield was a complete disaster both days though. I could not imagine the number of people in the airport. There weren't any hotel rooms or rental cars available, so there were literally thousands of people sleeping in the airport with flights delayed by days if not cancelled. The walkways between the terminals looked like homeless camps.

Boom Boom · Jul 21, 2024

DeeEE! said:
Expect to see more of these types of events in the future. With the push to cloud for all SaaS apps and cloud hosting, we are an interconnected global economy now. Everything relates to everything else, and one failure causes a domino sequence.

The only way to combat these types of failures is to have better planning, testing, backup plans, etc. While not realistic, the best way to plan is to have a fleet of offline devices at all times, have a copy of all data offline (air-gapped), and have multiple vendors for different products. Have AT&T and Verizon, have Mac and Windows OS. Have multiple EDR solutions, etc.

I will admit this one is a new one no one has seen before. The main issue with this event was that it required boots on the ground for physical endpoints. This wasn't a situation that was isolated to a single organization, like a typical ransomware event where you could bring in an IR firm as reinforcements.

You can rest assured that our adversaries (China, Russia, Iran, and North Korea) have taken note. The best way to have the biggest impact is to infiltrate the "supply chain". An example of this was back when SolarWinds was compromised via updates a few years back. You get hired as a developer and gain trust in the software development process, you get the access you need and learn the ropes of the approval processes. You learn the culture and determine the checks and balances, then you slip in a little code over time and have it deployed.

While this wasn't a compromise, it was similar in that a single piece of software used globally by many organizations was impacted.

Imagine having the ability to remotely "kill switch" all devices (Windows, Nest, iPhone, etc.)

One day this will occur, and when it does all hell will break loose.

The problem is ample QA hurts margins, so corp America hates it. Maybe they're not as bad about it as manufacturing in America has gotten. Yet.

That's more info on SolarWinds than I've ever seen publicly reported. It's like the media isn't allowed to talk about it....

00Dawg · Jul 21, 2024

QuadrupleOption said:
QA is bad in many cases, but most (if not all) of the big-boy SaaS operations run containerized systems that are easily spun up/down to handle extreme demand, are geo-redundant, and can be rolled back quickly if a software update goes wrong.

This issue was due to poor QA pushing out a patch that disabled individual laptops. I assume most IT shops had their servers up and running quickly after this occurrence.

It disabled any computer using Crowdstrike that was powered by any of several editions of Microsoft OSs, including all the major PC and server editions still in use, at least back through Server 2008.
Took us about 12 hours to get things 100% running again, although we were never 100% down because not all of our servers' Crowdstrike installs had updated by the time Crowdstrike pulled the update down. I still have at least one team member whose laptop will have to be reimaged or replaced, and this is a guy with two decades of IT experience.

Meanwhile, someone from UAB computer forensics went on the local news here and said Crowdstrike fixed the issue by sending out another update. That was definitely incorrect. Any computer that got the update required some kind of intervention to run again, be that by rolling back to an earlier backup or by manual deletion of the file causing the issue; impacted computers couldn't get online to receive another update.

MSUGUY · Jul 21, 2024

Boom Boom said:
The problem is ample QA hurts margins, so corp America hates it. Maybe they're not as bad about it as manufacturing in America has gotten. Yet.

That's more info on SolarWinds than I've ever seen publicly reported. It's like the media isn't allowed to talk about it....

I think this is what happened to MGM last year. They refused to pay for IT security upgrades and eventually got hit with a ransomware attack, which they opted not to pay. They shut the whole system down and started from scratch.

onewoof · Jul 22, 2024

The fact that most of the world runs on Microsoft Windows... says a lot

Anon1704414204 · Jul 22, 2024

I find it amazing that for some 10K years the horse was the fastest mode of transportation till some 175 yrs ago the steam locomotive showed up. Since then we've gone to the moon and now AI when only a little while back the Pony Express was cutting edge. Makes me think of Elton John's song "Country Comforts" ...."Down at the well they got a new machine... Foreman says it cuts manpower by 15...oh but that ain't natural Old Clay would say cuz he's a horse drawn man until his dying day."
