7 Ways Docker’s Virtual Agent Fleet Revolutionizes CI/CD and Testing

At Docker, the Coding Agent Sandboxes team has built a virtual workforce called the Fleet: a team of seven AI agent roles that autonomously test the product, triage issues, write release notes, and fix bugs. The agents run inside microVM-based sandboxes provided by the sbx tool, so they can operate with full autonomy without ever touching the host system. This article walks through seven key aspects of the Fleet, from its local-first design to its CI integration, and shows how it changes the way the team approaches developer productivity.

1. What Is the Fleet?

The Fleet is a virtual team of seven AI agent personas, each with a distinct role. Together they test the product, triage issues, post release notes, and fix bugs. The agents run autonomously in CI, each one driven by a Claude Code skill. The foundation is Docker’s sbx (Coding Agent Sandboxes) tool, which provides secure, microVM-based isolation. Inside each sandbox, an agent gets its own Docker daemon, network, and filesystem, so it can act with full autonomy without interacting with the host system. The Fleet took shape over a couple of weeks of iterative development, and it inverts the traditional CI model: instead of only running fixed scripts, the pipeline now hosts agents that make decisions.

Source: www.docker.com

2. Claude Code Skills: Roles, Not Scripts

Each agent is defined by a Claude Code skill: a markdown file that gives the agent a persona, responsibilities, and a set of allowed tools. A script dictates steps, whereas a skill describes a role: “You are the build engineer; here’s what you know and how you make decisions.” The distinction matters because agents need judgment, not just instructions. When a test fails unexpectedly, a script stops; a role investigates. The same skill file works identically on a developer’s laptop and in CI, which eliminates the usual translation layer between local and remote execution.
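
To make the role-versus-script distinction concrete, here is a minimal Python sketch that reads a skill file into a persona plus a tool list. It assumes a YAML-style frontmatter block with `name`, `description`, and `allowed-tools` fields followed by a free-form role description. The file contents and the `Skill` and `parse_skill` names are hypothetical, and the parser handles only flat `key: value` lines rather than full YAML.

```python
from dataclasses import dataclass, field

# Hypothetical skill file: the frontmatter fields and the body text are
# illustrative assumptions, not the Fleet's actual /cli-tester skill.
SKILL_MD = """\
---
name: cli-tester
description: Exploratory tester for the CLI
allowed-tools: Bash, Read, Grep
---
You are the exploratory tester. Build the binaries, exercise the CLI,
and when something fails unexpectedly, investigate before reporting.
"""


@dataclass
class Skill:
    name: str
    description: str
    allowed_tools: list[str] = field(default_factory=list)
    persona: str = ""  # free-form role description the agent reasons over


def parse_skill(text: str) -> Skill:
    """Split a skill file into flat key: value frontmatter and a prose body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with a '---' frontmatter block")
    meta: dict[str, str] = {}
    end = None
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            end = i  # closing delimiter of the frontmatter
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    if end is None:
        raise ValueError("unterminated frontmatter block")
    tools = [t.strip() for t in meta.get("allowed-tools", "").split(",") if t.strip()]
    return Skill(
        name=meta["name"],
        description=meta.get("description", ""),
        allowed_tools=tools,
        persona="\n".join(lines[end + 1:]).strip(),
    )


skill = parse_skill(SKILL_MD)
print(skill.name, skill.allowed_tools)
```

The body is prose the agent interprets, not a list of steps to execute in order; that is what lets the same file drive judgment rather than a fixed sequence.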

3. The Local-First, CI-Second Philosophy

The team’s core design principle is that every skill runs on your machine first. The /cli-tester skill (the Fleet’s exploratory tester), for example, was developed locally: developers invoked it from their terminal, watched it build binaries, exercise CLI commands, find issues, and report them, and tweaked the skill file until it behaved correctly. This avoids the painful commit-push-wait-read-logs cycle of debugging agents that only run in CI. Iteration time drops from minutes to seconds, because you watch the agent reason in real time and can correct a misunderstanding as soon as it appears.

4. One Skill, Two Runtimes

CI is just another runtime for the same skill. The /cli-tester that runs nightly on macOS, Linux, and Windows runners is the exact same skill file developers use in their terminals. The CI workflow sets up the environment, checks out the code, and invokes the skill, with no separate CI version and no translation layer. Because the skill is identical, behavior validated locally carries over to CI, and the same agent can be pointed at any platform without modification. In effect, a skill is plug-and-play across runtimes.
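
The "no translation layer" idea can be sketched as a run plan in which the setup varies by runtime but the skill invocation does not. This minimal Python sketch assumes headless invocation through the Claude Code CLI's `-p` (print) flag with the slash-command name as the prompt; the `plan_run` helper and the CI setup step are hypothetical, and the code only builds command lists rather than executing them.

```python
# Assumption: the skill is run headlessly through the Claude Code CLI's
# print mode (`claude -p`), with the slash-command name as the prompt.
SKILL = "/cli-tester"


def skill_command(skill: str) -> list[str]:
    """The skill invocation: identical no matter where it runs."""
    return ["claude", "-p", skill]


def plan_run(runtime: str, skill: str = SKILL, ref: str = "main") -> list[list[str]]:
    """Return the ordered commands for a runtime; only the setup differs.

    The commands are built, not executed, so the plan can be inspected.
    """
    if runtime == "local":
        setup: list[list[str]] = []  # a developer's checkout is already in place
    elif runtime == "ci":
        # Hypothetical CI setup step: check out the ref under test.
        setup = [["git", "checkout", ref]]
    else:
        raise ValueError(f"unknown runtime: {runtime!r}")
    return setup + [skill_command(skill)]


for runtime in ("local", "ci"):
    print(runtime, plan_run(runtime))
```

Keeping the invocation in a single function makes the design choice explicit: any divergence between local and CI behavior has to come from the environment, never from a second copy of the skill.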

5. The /cli-tester Role: Autonomous Exploratory Testing

One of the Fleet’s key agents, /cli-tester, is an exploratory tester that exercises Docker’s CLI commands. It builds binaries, runs sequences of commands, and monitors the output for anomalies. When it finds an issue, it reports it without human intervention. The agent runs nightly across multiple platforms and upgrade paths, catching regressions that traditional unit tests might miss. Because it is autonomous, it can explore edge cases and unexpected behaviors, and it gives the team daily visibility into what shipped. It is like having a dedicated QA engineer who runs a full exploratory pass every night.
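
The run-and-flag loop at the heart of exploratory testing can be reduced to a minimal Python sketch. It substitutes a fixed heuristic (a non-zero exit code or any output on stderr) for the agent's judgment, and it uses Python one-liners as stand-ins for real CLI commands; `explore` and `Finding` are hypothetical names, not part of the Fleet.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Finding:
    command: list[str]
    returncode: int
    stderr: str


def explore(commands: list[list[str]], timeout: float = 30.0) -> list[Finding]:
    """Run each command in order and record the ones that look anomalous.

    'Anomalous' is a simple stand-in here: a non-zero exit code or
    anything written to stderr. A real agent would apply judgment.
    """
    findings: list[Finding] = []
    for cmd in commands:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0 or proc.stderr.strip():
            findings.append(Finding(cmd, proc.returncode, proc.stderr.strip()))
    return findings


# Stand-in "CLI" commands that use the Python interpreter so the sketch runs anywhere.
session = [
    [sys.executable, "-c", "print('version 1.0')"],
    [sys.executable, "-c", "import sys; sys.exit(2)"],
    [sys.executable, "-c", "import sys; sys.stderr.write('warning: deprecated flag')"],
]
for f in explore(session):
    print(f.returncode, f.stderr or "<no stderr>")
```

A real exploratory agent would choose the next command based on what it just observed; the fixed `session` list only shows the detect-and-report half of that loop.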

6. Investigation Over Halting

Traditional CI scripts halt on failure: a test fails, and the pipeline stops until a human intervenes. The Fleet’s agents, built on roles, investigate instead. When a test fails unexpectedly, the agent analyzes the logs, checks system state, and triages the issue; it can even attempt a fix. The agent might discover that a flaky network condition caused the failure and retry, or it might confirm a genuine bug and file a detailed report. Separating transient failures from real ones reduces false positives and saves developer time, and the result is fewer manual interventions and faster resolution cycles.
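
The retry-or-report decision can be sketched in Python. This is a minimal illustration, assuming each step returns a pass flag plus its log, and using a hypothetical list of string markers to decide what counts as transient. The Fleet's agents reason over logs and system state instead, so treat `TRANSIENT_MARKERS` and `run_with_investigation` as placeholders.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical markers for failures worth retrying; a real agent would
# inspect logs and system state rather than match strings.
TRANSIENT_MARKERS = ("connection reset", "timed out", "temporary failure")


@dataclass
class Outcome:
    status: str  # "passed", "flaky", or "bug"
    attempts: int
    log: str = ""


def run_with_investigation(
    step: Callable[[], tuple[bool, str]], max_retries: int = 2
) -> Outcome:
    """Run a step; on failure, retry only if the log looks transient."""
    attempts = 0
    while True:
        attempts += 1
        ok, log = step()
        if ok:
            # Passing after a retry means the earlier failure was flaky.
            return Outcome("passed" if attempts == 1 else "flaky", attempts, log)
        transient = any(m in log.lower() for m in TRANSIENT_MARKERS)
        if not transient or attempts > max_retries:
            # Not transient, or retries exhausted: report it as a real bug.
            return Outcome("bug", attempts, log)


def make_step(results: list[tuple[bool, str]]) -> Callable[[], tuple[bool, str]]:
    """Build a fake step that replays canned (ok, log) results in order."""
    it = iter(results)
    return lambda: next(it)


print(run_with_investigation(make_step([(False, "Connection reset by peer"), (True, "ok")])))
print(run_with_investigation(make_step([(False, "AssertionError: expected exit code 0")])))
```

The key design choice is that a failure is classified before anything is decided: only failures that look transient earn a retry, and everything else goes straight to a report with its log attached.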

7. Scaling Maintenance Without Burnout

Maintaining a growing codebase across multiple platforms, upgrade paths, and issue backlogs can overwhelm a small team. The Fleet takes on these tasks autonomously, triaging issues, posting release notes, and fixing bugs, so developers are not worn down by routine upkeep. The agents work in parallel across macOS, Linux, and Windows and provide daily visibility into the state of the product. The team can focus on high-value work while the Fleet handles repetitive maintenance. Because every agent works from the same skill definition, its output stays consistent: every release note is produced the same way, triaged issues are categorized the same way, and bug fixes follow established patterns. It is CI/CD evolution at its finest.
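
Consistent categorization is easiest to guarantee when the agent's triage output is validated against a fixed schema before it is posted. The Python sketch below assumes the triage agent emits a JSON-like dict; the category names, the fields, and the `parse_triage` helper are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    # Hypothetical label set, chosen only for illustration.
    BUG = "bug"
    REGRESSION = "regression"
    FEATURE_REQUEST = "feature-request"
    QUESTION = "question"


@dataclass(frozen=True)
class Triage:
    issue: int
    category: Category
    platforms: tuple[str, ...]
    summary: str


def parse_triage(raw: dict) -> Triage:
    """Validate an agent's triage output; reject anything uncategorized."""
    try:
        category = Category(raw["category"])
    except (KeyError, ValueError) as exc:
        raise ValueError(f"issue {raw.get('issue')}: missing or unknown category") from exc
    return Triage(
        issue=int(raw["issue"]),
        category=category,
        platforms=tuple(raw.get("platforms", ())),
        summary=str(raw.get("summary", "")).strip(),
    )


ticket = parse_triage(
    {"issue": 101, "category": "regression", "platforms": ["windows"], "summary": "upgrade fails "}
)
print(ticket.category.name, ticket.platforms, ticket.summary)
```

Rejecting malformed output at this boundary means an agent that drifts from the expected labels fails loudly instead of silently posting an uncategorized issue.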

Conclusion

Docker’s virtual agent Fleet points to a shift in how teams approach CI/CD and software maintenance. By combining secure sandboxing with Claude Code skills that define roles rather than scripts, the team has built a system that is both powerful and flexible. The local-first design speeds up development, and because the same skills run unchanged in CI, they scale reliably across platforms. As autonomous agents become more capable, this approach could become a standard way for development teams to augment their capacity: not by hiring more people, but by equipping the people they already have with virtual teammates that never sleep.
