Google

Borg, Omega, and Kubernetes

research.google.com

Gist

1.

Google built three container-management systems over a decade, then open-sourced the third while the first, with every flaw catalogued in this paper, remains its primary container manager. The lesson that changed infrastructure forever: stop managing machines, start managing applications.

Logic

2.

Three systems, one bloodline — each born from the last one's scars

  • Borg unified two earlier systems (Babysitter for services, Global Work Queue for batch) into Google's first container manager — before Linux cgroups even existed
  • Omega rebuilt Borg's architecture from scratch with a Paxos-based store and optimistic concurrency control, then Google folded its best innovations back into Borg
  • Kubernetes carried both systems' lessons into open source, wrapping state access in a versioned REST API; Borg, flaws and all, remains Google's primary system at scale
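
The optimistic concurrency control mentioned for Omega's store can be illustrated with a toy version-checked store. This is a minimal sketch, not Omega's actual design: the names VersionedStore, ConflictError, and claim_machine are hypothetical, and the store is a plain in-memory dictionary rather than a Paxos-backed one.

```python
class ConflictError(Exception):
    """Raised when a write is based on a stale version."""


class VersionedStore:
    """Toy in-memory store (hypothetical); every key carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        # Unknown keys read as version 0 with no value.
        return self._data.get(key, (0, None))

    def write(self, key, expected_version, value):
        current_version, _ = self.read(key)
        if current_version != expected_version:
            # Another writer committed first; the caller must re-read and retry.
            raise ConflictError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._data[key] = (current_version + 1, value)
        return current_version + 1


def claim_machine(store, machine, task):
    """Read, decide, write; on conflict, re-read and decide again."""
    while True:
        version, owner = store.read(machine)
        if owner is not None:
            return False  # machine already taken
        try:
            store.write(machine, version, task)
            return True
        except ConflictError:
            continue  # lost the race; loop and re-read
```

No writer ever takes a lock: each one proceeds as if it will win and pays the retry cost only in the occasional conflict, which is what lets several schedulers share one store.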

3.

Containers shifted the "primary key" of the data center from machine to application

  • Hermetic container images encapsulate nearly all dependencies; the only external dependency is the Linux kernel syscall interface — Google runs its entire fleet on a handful of OS versions maintained by a small staff
  • When the container is the application, monitoring, logs, load balancing, and failure attribution all key on application identity, not machine identity — no more SSH-ing in to run top
  • The pod model (Borg's "alloc," Kubernetes' "pod") nests containers together: main app in one, log rotation or click-log offloading in siblings — different teams, shared resources, independent failure domains

4.

Orchestration is day one; the ecosystem is the real product

  • Borg spawned naming (BNS), master election (Chubby), autoscaling, rollout tools, and monitoring — each with idiosyncratic APIs, conventions, and integration depth that made deployment progressively harder
  • Kubernetes imposed a uniform three-field structure on every object: ObjectMetadata (name, UID, version, labels), Spec (desired state), Status (observed state) — learn one, learn all
  • Decoupling works: the replication controller ensures N pods exist; the autoscaler adjusts N without knowing how pods are created. Each component does one thing, and each reconciliation loop drives observed state toward desired state, so a controller that fails or restarts simply picks up where it left off
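
The metadata / spec / status split and the reconciliation loop can be sketched in a few lines. This is an illustrative toy under stated assumptions, not Kubernetes code: ReplicaSet, reconcile, and autoscale are hypothetical names, pods are plain strings in a list, and the load numbers are made up.

```python
import itertools
import math
from dataclasses import dataclass, field

_pod_ids = itertools.count()  # source of unique pod name suffixes


@dataclass
class ReplicaSet:
    """Toy object with the three-field shape: metadata, spec, status."""
    metadata: dict                              # e.g. {"name": "web"}
    spec: dict                                  # desired state, e.g. {"replicas": 3}
    status: dict = field(default_factory=dict)  # observed state


def reconcile(obj, pods):
    """One pass of the loop: compare desired with observed, then act."""
    desired = obj.spec["replicas"]
    while len(pods) < desired:
        pods.append(f"{obj.metadata['name']}-{next(_pod_ids)}")
    while len(pods) > desired:
        pods.pop()
    obj.status["replicas"] = len(pods)


def autoscale(obj, total_load, capacity_per_pod):
    """Only edits the desired count; it never creates or deletes pods."""
    obj.spec["replicas"] = max(1, math.ceil(total_load / capacity_per_pod))
```

Because reconcile acts only on what it observes, it needs no memory of earlier passes: calling it after a pod disappears, after autoscale changes the spec, or after a restart produces the same convergence.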

5.

Four mistakes in Borg and Omega became four Kubernetes design rules

  • Borg shared host IPs and assigned ports at scheduling time, which forced a replacement for DNS, port-aware URLs, and rewrites of every tool that expected plain IP addresses; Kubernetes allocates one IP per pod, restoring well-known ports like 80
  • Borg numbered tasks in a compact zero-indexed vector, creating holes on exit and data unavailability during rolling upgrades of range-sharded services; Kubernetes replaced indexes with dynamic key-value labels and label selectors
  • Borg's jobs owned their tasks absolutely — delete the job, delete the tasks, no way to quarantine a misbehaving instance; Kubernetes uses label selectors for loose ownership, enabling orphan-and-adopt patterns for live debugging
  • Omega exposed raw store state to all clients, pushing all semantics into client libraries; Kubernetes routes everything through a centralized API server that enforces validation, versioning, and policy while keeping components decoupled
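
Label selectors and the quarantine pattern can be shown with equality-based matching over dictionaries. This is a minimal sketch: the pod names, label keys, and the two selectors are invented for illustration, and real Kubernetes selectors also support set-based expressions that are omitted here.

```python
def matches(selector, labels):
    """Equality-based selector: every selector key/value must appear in labels."""
    return all(labels.get(key) == value for key, value in selector.items())


def select(selector, pods):
    """Names of the pods whose labels satisfy the selector, sorted."""
    return sorted(name for name, labels in pods.items() if matches(selector, labels))


# Hypothetical pods and selectors.
pods = {
    "pod-a": {"app": "web", "serving": "true"},
    "pod-b": {"app": "web", "serving": "true"},
    "pod-c": {"app": "db", "serving": "true"},
}
service_selector = {"app": "web", "serving": "true"}  # load balancer targets
controller_selector = {"app": "web"}                  # lifecycle owner

# Quarantine pod-b: drop the label the service selects on.
del pods["pod-b"]["serving"]

serving_now = select(service_selector, pods)
owned_now = select(controller_selector, pods)
```

After the label is removed, pod-b no longer matches the service selector but still exists and still matches the controller selector, so it stops receiving traffic while staying up for debugging. With a controller selector that also required the removed label, the pod would be orphaned instead and the controller would start a replacement.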

6.

Configuration is the tar pit — ten years of brainpower and it's still unsolved

  • Config becomes the catch-all for everything the container system doesn't do: boilerplate reduction, parameter validation, image versioning, release management, package workarounds
  • Configuration systems tend to invent a DSL that eventually becomes Turing-complete, moving computation from a real language into one with weaker tooling such as debuggers and unit test frameworks
  • The authors' prescription: accept programmatic configuration, keep data in JSON or YAML, and do all computation in a real language — the same separation Angular enforces between markup and JavaScript
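
The separation of data from computation can be sketched as ordinary code that reads plain data and emits plain JSON. This is an illustration under assumptions, not a tool from the paper: BASE, render, validate, and the field names are all hypothetical.

```python
import copy
import json

# Data: a plain structure that could equally live in a JSON or YAML file.
BASE = {
    "image": "example/web:1.4.2",
    "replicas": 2,
    "flags": {"log_level": "info"},
}


def validate(config):
    """Checks written as normal code, so they can be unit tested and debugged."""
    if config["replicas"] < 1:
        raise ValueError("replicas must be at least 1")
    if ":" not in config["image"]:
        raise ValueError("image must pin a version tag")


def render(base, env):
    """Computation: derive a per-environment config and emit data-only JSON."""
    config = copy.deepcopy(base)  # never mutate the shared base
    config["name"] = f"web-{env}"
    if env == "prod":
        config["replicas"] = base["replicas"] * 5
        config["flags"]["log_level"] = "warning"
    validate(config)
    return json.dumps(config, indent=2, sort_keys=True)
```

The output of render contains no logic, so whatever consumes it needs only a JSON parser, while the loops, conditionals, and validation stay in a language with a debugger and a test runner.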

7.

Dependency management: the problem nobody has cracked

  • Standing up a service means standing up monitoring, storage, and CI/CD — plus registering as a consumer, passing auth and billing across transitive dependencies
  • Almost no system captures or exposes this dependency information; manual tracking rots instantly and automated tracing can't distinguish "must go to that instance" from "any instance would do"
  • The proposed fix (require declared dependencies and refuse undeclared access, like compiler imports in Bazel) has not reached any mainstream container manager because the perceived complexity is too high; "doing so remains an open challenge"
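
The declare-then-enforce idea can be sketched as a registry that refuses connections an application did not declare. This is a hypothetical illustration only: ServiceRegistry, UndeclaredDependency, and the service names are invented, and no mainstream container manager is claimed to work this way.

```python
class UndeclaredDependency(Exception):
    """Raised when an app reaches for a service it never declared."""


class ServiceRegistry:
    """Toy registry (hypothetical): apps declare dependencies up front."""

    def __init__(self):
        self._declared = {}  # app name -> frozenset of service names

    def register(self, app, depends_on):
        self._declared[app] = frozenset(depends_on)

    def connect(self, app, service):
        # Enforcement: access outside the declared set is refused.
        if service not in self._declared.get(app, frozenset()):
            raise UndeclaredDependency(f"{app} did not declare {service}")
        return f"{app}->{service}"

    def transitive(self, app):
        """Everything reachable through declared dependencies."""
        seen = set()
        stack = list(self._declared.get(app, ()))
        while stack:
            service = stack.pop()
            if service in seen:
                continue
            seen.add(service)
            stack.extend(self._declared.get(service, ()))
        return seen
```

Once declarations are mandatory, the transitive set is computable from data the infrastructure already holds, which is the precondition for automating setup, authentication, and connectivity on the application's behalf.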

Counter-Argument

8.

The system with every flaw in this paper is still Google's primary container manager, and that's the fact nobody addresses

  • Borg — shared ports, numbered task vectors, rigid job ownership, monolithic Borgmaster — remains Google's primary container system because of "its scale, breadth of features, and extreme robustness." Every lesson here is retrospective, not proven at Borg-scale in Kubernetes.
  • Omega's principled architecture was supposed to replace Borg's monolith; instead its innovations were absorbed back into Borg. The monolith won. The clean design became a patch set.
  • The article provides zero performance data, zero utilization numbers, zero quantitative comparisons between the three systems. Every claimed benefit — application-oriented infrastructure, choreography over orchestration, IP-per-pod — is argued from design aesthetics, not measured outcomes.

Steelman

9.

Google didn't open-source a container system — it open-sourced a decade of operational scar tissue

  • Both the thesis and the counter-argument assume the question is which architecture is technically superior. That's the wrong question. The real event is that five engineers who built Google's infrastructure published every mistake they made and shipped a free system encoding the fixes.
  • Before Kubernetes, running containers at scale required reinventing every lesson in this paper from scratch. After Kubernetes, a three-person startup deploys with the same patterns Google took ten years and three systems to discover. The baseline shifted permanently.
  • Borg still running at Google doesn't disprove the lessons — it proves they're hard-won. The value isn't in replacing Borg; it's in making sure nobody else has to build Borg first to learn what the authors already know.
