
Five Outages in One Day: What GitHub's February Meltdown Teaches Us About Resilience

resilience, GitHub, CI/CD

Today, GitHub went down. Not once — five times. If your engineering team spent the day refreshing status pages, cursing at failed CI/CD pipelines, and frantically Slacking each other about whether that production deploy actually shipped… you're not alone. But the real story isn't that GitHub had a bad day. The real story is what it exposed about how we build systems in 2026.

The Incident

Starting at 08:15 UTC, GitHub experienced cascading failures across nearly every service. Here's the timeline:

  • 08:15–11:26 — Pull Requests, Webhooks, Issues, Actions, and Git Operations hit intermittent timeouts. A faulty infrastructure component was identified and failed over. ~1% of requests affected.
  • 10:01–12:12 — Copilot Coding Agent degraded.
  • 14:17–15:46 — Actions run-start delays hit ~4% of users due to a pipeline bottleneck.
  • 15:54–19:29 — Notification delivery delays escalated to an average latency of 1 hour 20 minutes.
  • 16:19–17:40 — Major incident. Pull Requests, Issues, Actions, Git Operations, Webhooks, and Pages all degraded. Intermittent errors across the platform.
  • 19:01–20:09 — Second major incident. Actions, Copilot, Issues, Git, Webhooks, Pages, Pull Requests, Packages, and Codespaces all impacted.

Five separate incidents. Two service-wide. Twelve hours of instability on a platform that most of the software industry treats as critical infrastructure.

Context matters: GitHub is currently mid-migration from its legacy datacenter to Azure, a process that started in October 2025. As engineers on Hacker News noted, this half-migrated state is causing instability. One observed that GitHub is "down to a single 9" across all services — a remarkable statement for a platform that wants to be your entire development ecosystem.

The Blast Radius

A GitHub outage in 2026 isn't just "we can't push code for a few hours." The blast radius is enormous because we've built massive dependency chains on top of it:

  • CI/CD pipelines stopped. Teams using GitHub Actions watched builds queue, fail, or silently never trigger. Production deploys never fired. Engineers had to "watch like a hawk" to catch the gaps.
  • Webhooks delayed up to 2.5 hours. Every downstream system triggered by webhooks — Slack notifications, deployment orchestrators, monitoring alerts — went silent or stale (see the fallback-polling sketch after this list).
  • Go builds failed far beyond any one team. Go's module system pulls dependencies from GitHub; when GitHub is down and a module isn't already in your local cache or the module proxy, go build fails. Not just your project: every project resolving the same paths.
  • Pages deployments stalled. Documentation updates, marketing sites, and developer portals sat in limbo. We experienced this firsthand.
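
Webhook delivery can lag long after the API itself is answering again, so downstream systems need a reconciliation path that doesn't wait for events. Here's a minimal fallback-polling sketch in Python; the repository name, the token, and the assumption that last_webhook_at is maintained by your real webhook handler are all placeholders, not anything GitHub prescribes.

```python
# Minimal fallback-polling sketch: if webhooks have gone quiet, reconcile by
# asking the REST API directly for recently merged pull requests.
# OWNER/REPO and the token are placeholders; last_webhook_at is assumed to be
# updated by your actual webhook handler.
import time

import requests

OWNER, REPO = "your-org", "your-repo"
HEADERS = {"Authorization": "Bearer <token>", "Accept": "application/vnd.github+json"}
STALE_AFTER = 15 * 60          # fall back after 15 minutes of webhook silence

last_webhook_at = time.time()  # your webhook handler should update this

def recently_merged_prs():
    """List recently updated pull requests that have actually been merged."""
    resp = requests.get(
        f"https://api.github.com/repos/{OWNER}/{REPO}/pulls",
        params={"state": "closed", "sort": "updated", "direction": "desc", "per_page": 20},
        headers=HEADERS,
        timeout=10,
    )
    resp.raise_for_status()
    return [pr for pr in resp.json() if pr.get("merged_at")]

while True:
    if time.time() - last_webhook_at > STALE_AFTER:
        for pr in recently_merged_prs():
            # Hand each merge to whatever your webhook would normally trigger.
            print("reconciling merge:", pr["number"], pr["title"])
    time.sleep(60)
```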

One service provider's bad day became everyone's bad day. That's not a GitHub problem. That's an architecture problem.

The AI Dependency Problem

Here's where it gets interesting — and where 2026 diverges sharply from outages past.

GitHub Copilot Coding Agent was down during two of the incidents. But Copilot wasn't the only AI tool affected. Every AI coding assistant that depends on GitHub for repository context, file access, or code search was impaired. Claude Code, Codex, and similar tools that clone or read GitHub repos couldn't function normally.

Think about the dependency chain: AI coding tool → GitHub API → Azure infrastructure. Three layers of abstraction, each a single point of failure. When any link breaks, the entire chain goes dark.

As engineers on Hacker News observed: "It's extra galling that they advertise all the new buzzword-laden AI pipeline features while the regular website and actions fail constantly." Copilot, it turns out, has the worst uptime record of all GitHub services.

We're building increasingly powerful AI-assisted development workflows on top of infrastructure that can't reliably serve pull requests. That should make every engineering leader uncomfortable.

The Compliance Trap

This is the part that keeps me up at night — and it should keep you up too.

Many organizations use GitHub Pull Requests as their SOC 2 change management evidence. PR reviews, approval workflows, and merge records form the audit trail that demonstrates your change management controls are operating effectively.

When GitHub is down, you literally cannot produce compliance evidence. You can't create PRs. You can't approve them. You can't merge them. Your SOC 2 control is inoperable — not because of anything you did wrong, but because you outsourced a critical compliance control to a third party.
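
One practical hedge is to snapshot the PR records you rely on as evidence, on a schedule, somewhere outside GitHub. Here's a minimal sketch, assuming a read-only token; the repository name and PR numbers are placeholders.

```python
# Minimal evidence-export sketch: copy PR approval records to local JSON so the
# change-management audit trail survives an outage. OWNER/REPO, the token, and
# the PR numbers are placeholders; run on a schedule and store the output
# somewhere that isn't GitHub.
import json

import requests

OWNER, REPO = "your-org", "your-repo"
HEADERS = {"Authorization": "Bearer <token>", "Accept": "application/vnd.github+json"}

def export_pr_evidence(pr_number: int) -> dict:
    base = f"https://api.github.com/repos/{OWNER}/{REPO}/pulls/{pr_number}"
    pr = requests.get(base, headers=HEADERS, timeout=10).json()
    reviews = requests.get(f"{base}/reviews", headers=HEADERS, timeout=10).json()
    return {
        "number": pr["number"],
        "title": pr["title"],
        "merged_at": pr.get("merged_at"),
        "approvals": [
            {"reviewer": r["user"]["login"], "submitted_at": r["submitted_at"]}
            for r in reviews
            if r["state"] == "APPROVED"
        ],
    }

with open("pr-evidence.json", "w") as f:
    json.dump([export_pr_evidence(n) for n in (101, 102)], f, indent=2)
```

That doesn't make the control operable during an outage, but it does mean the evidence you already generated isn't hostage to someone else's status page.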

It gets worse. I've reviewed compliance policies at dozens of companies that require GitHub by name. Not "a version control system with peer review capabilities" — specifically GitHub. As one engineer noted: "Every product vendor has a wet dream: to have their product explicitly named in corporate policies."

When you name a specific tool in your compliance framework, you've created an undocumented single point of failure. Your SOC 2 auditor isn't evaluating GitHub's uptime. They're evaluating your controls. And if your control depends on a service that had five outages in one day, that's your problem.

The fix is straightforward: compliance frameworks should specify controls, not tools. Your policy should say "all code changes require peer review documented in an auditable system" — not "all code changes require an approved GitHub Pull Request."

What You Should Do

Five things every engineering and security leader should do this week:

1. Map Your GitHub Dependencies

Draw the full dependency graph. Every system that calls a GitHub API or depends on a webhook. Every CI/CD pipeline. Every deployment trigger. Every compliance workflow. You'll be surprised — or horrified — by how deep it goes.
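
You don't need a dedicated tool to get started; even a rough scan of a checkout for direct GitHub references will surface most of the graph. A minimal sketch (the patterns are illustrative, not exhaustive):

```python
# Minimal dependency-scan sketch: walk a checkout and flag files that reference
# GitHub directly. The patterns below are illustrative; extend them for your
# own stack (deploy configs, IaC, scripts that shell out to gh, etc.).
import os
import re

PATTERNS = {
    "GitHub API call": re.compile(r"api\.github\.com"),
    "github.com reference": re.compile(r"\bgithub\.com\b"),
}

def scan(root="."):
    for dirpath, _, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Any workflow file is, by definition, a GitHub Actions dependency.
            if "/.github/workflows/" in path.replace(os.sep, "/"):
                yield path, "GitHub Actions workflow"
                continue
            try:
                with open(path, errors="ignore") as f:
                    text = f.read()
            except OSError:
                continue
            for label, pattern in PATTERNS.items():
                if pattern.search(text):
                    yield path, label
                    break

for path, label in scan():
    print(f"{label}: {path}")
```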

2. Build Deployment Bypass Procedures

You need a documented, tested procedure to deploy to production when GitHub is unavailable. Not abandoning your normal workflow — a break-glass procedure for shipping a critical hotfix when Actions is down. If you don't have one, today proved you need one.
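
What that looks like depends entirely on your pipeline, but the spirit is simple: the same steps Actions runs should exist in a script a human can run from a trusted machine. A deliberately over-simplified sketch, assuming your workflow just builds an image and calls an internal deploy API; the registry, URL, and token are placeholders for your own setup.

```python
# Break-glass deploy sketch. Everything here (image name, deploy URL, token) is
# a placeholder; the point is that the steps live in a reviewed, tested script
# an engineer can run locally when GitHub Actions is down.
import subprocess

import requests

IMAGE = "registry.example.com/app:hotfix"                   # placeholder tag
DEPLOY_URL = "https://deploy.internal.example.com/rollout"  # placeholder API
DEPLOY_TOKEN = "<from your secrets manager>"

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Build and push the image from a local (or mirrored) checkout.
run(["docker", "build", "-t", IMAGE, "."])
run(["docker", "push", IMAGE])

# 2. Trigger the rollout directly, skipping the GitHub-hosted pipeline.
resp = requests.post(
    DEPLOY_URL,
    json={"image": IMAGE, "reason": "break-glass: GitHub outage"},
    headers={"Authorization": f"Bearer {DEPLOY_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
print("rollout accepted")
```

Document who may run it, when, and how the change gets reviewed after the fact; otherwise the bypass becomes the workflow.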

3. Decouple Your CI/CD

Consider running critical pipelines on self-hosted runners or alternative CI systems (GitLab CI, CircleCI, Buildkite) that can operate independently. At minimum, mirror your repositories so builds can proceed when github.com is unreachable.
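
Mirroring is the cheapest piece of this. A minimal sketch, assuming a secondary Git host you control; the remote URLs are placeholders, and you'd run it on a schedule so the mirror stays close to current.

```python
# Minimal repository-mirroring sketch. The source and destination URLs are
# placeholders; a bare --mirror clone tracks every ref (branches, tags, notes),
# and push --mirror replays them to the secondary host.
import os
import subprocess

def sync(source: str, destination: str, workdir: str):
    if not os.path.isdir(workdir):
        subprocess.run(["git", "clone", "--mirror", source, workdir], check=True)
    else:
        # Refresh the existing mirror, dropping refs deleted upstream.
        subprocess.run(["git", "-C", workdir, "remote", "update", "--prune"], check=True)
    subprocess.run(["git", "-C", workdir, "push", "--mirror", destination], check=True)

sync(
    "git@github.com:your-org/app.git",                # placeholder source
    "git@git.internal.example.com:your-org/app.git",  # placeholder mirror
    "app.git",
)
```

Point your CI fallback and your break-glass script at the mirror, and a github.com outage stops being a build outage.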

4. Fix Your Compliance Language

Review every compliance policy that names a specific vendor. Replace tool names with control descriptions. "GitHub Pull Request" becomes "peer-reviewed code change with documented approval." This gives you flexibility to maintain compliance continuity during outages — and vendor transitions.

5. Assess Your AI Tool Resilience

If your team depends on AI coding assistants — and in 2026, most do — understand what happens when those tools lose access to your codebase. Can your developers still be productive? Treat AI tool outages like any other dependency failure and plan accordingly.

The Bigger Picture

Today's GitHub outage is a symptom of a broader problem: we've built an entire industry on tightly coupled systems and then acted surprised when a single failure cascades everywhere. GitHub is critical infrastructure, but it's operated as a commercial SaaS product with SaaS-level reliability guarantees.

The companies that weathered today well are the ones that already treated GitHub as a dependency to be managed, not an assumption to be made. They had fallbacks. They had bypass procedures. They had compliance language that survived a vendor outage.

The rest learned an expensive lesson.

If today's outage exposed gaps in your resilience or compliance posture, book a free strategy call and let's review your dependency map together — before the next incident.

Peter Hallen is a fractional CISO and SOC 2 compliance consultant who helps growth-stage companies build security programs that actually work. He has spent 25+ years making systems more resilient — and has strong opinions about vendor lock-in.

resilience, GitHub, CI/CD, SOC 2, compliance, AI dependency, infrastructure, outage
