Capy and Boost.Cobalt: A Comparison
Both libraries use C++20 coroutines for asynchronous programming. The differences begin with the foundation.
Cobalt is a coroutine layer built on Boost.Asio. It adds coroutine syntax — promise, task, generator — on top of Asio’s existing I/O infrastructure. Asio is not coroutines-only. It supports callbacks, futures, and coroutines equally. Cobalt inherits this foundation. It can add coroutine types on top, but it cannot change what lies beneath.
Capy is a coroutine-native I/O foundation designed from the ground up. The design started from the ideal use case and worked backward to the implementation. The concept hierarchy, the type-erased wrappers, the allocator model — these fell out naturally from use-case-first design, without compromise.
The Dimovian Ideal
An I/O library should make the implementation completely invisible to its consumers. Public headers declare the interface — types, functions, contracts. All platform-specific machinery lives in the translation unit. No implementation detail leaks into the consumer’s code.
Capy achieves the Dimovian Ideal. The proof is in example/asio/.
The Header
api/capy_streams.hpp is the public interface. It contains zero Asio includes:
```cpp
#include <boost/capy/ex/execution_context.hpp>
#include <boost/capy/ex/executor_ref.hpp>
#include <boost/capy/io/any_stream.hpp>
#include <utility>

namespace boost { namespace asio { class io_context; } }

class asio_context : public capy::execution_context
{
    struct impl;
    impl* impl_;

public:
    using executor_type = capy::executor_ref;

    asio_context();
    ~asio_context();

    boost::asio::io_context& context() noexcept;
    executor_type get_executor() noexcept;
    void run();
};

std::pair<capy::any_stream, capy::any_stream>
make_stream_pair(asio_context& ctx);
```
Asio appears only as a forward declaration. The context uses pimpl. The factory returns capy::any_stream — a type-erased stream that hides the concrete socket type entirely.
The Translation Unit
api/capy_streams.cpp is where every Asio header lives. The concrete asio_socket wraps tcp::socket. The concrete asio_executor wraps io_context::executor_type. All of it is invisible to consumers of the header.
The Algorithm Code
any_stream.cpp demonstrates the result. It includes api/capy_streams.hpp and Capy headers. No Asio headers. None.
```cpp
capy::task<>
writer(capy::any_stream& stream, std::size_t total)
{
    char buf[128];
    std::memset(buf, 'X', sizeof(buf));
    std::size_t written = 0;
    while(written < total)
    {
        std::size_t chunk = (std::min)(sizeof(buf), total - written);
        auto [ec, n] = co_await stream.write_some(
            capy::make_buffer(buf, chunk));
        if(ec)
            co_return;
        written += n;
    }
}
```
```cpp
capy::task<>
reader(capy::any_stream& stream, std::size_t total)
{
    char buf[128];
    std::size_t read_total = 0;
    while(read_total < total)
    {
        auto [ec, n] = co_await stream.read_some(
            capy::make_buffer(buf));
        if(ec)
            co_return;
        read_total += n;
    }
}
```
writer() and reader() operate on capy::any_stream&. They don’t know what I/O backend produced the stream. They never need to know.
What Cobalt Does Instead
Cobalt’s cobalt::io namespace provides wrappers around Asio I/O objects. These wrappers expose concrete Asio types through their interfaces. A cobalt::io::steady_timer is an asio::basic_waitable_timer. A cobalt::io::socket is an asio::basic_stream_socket. The wrappers preserve direct access to the underlying Asio types.
Consumers of Cobalt I/O objects must include Asio headers. The backend remains part of the public interface.
Relink Without Recompile
A library written against Capy’s type-erased streams can be relinked against entirely different stream implementations. TCP today. QUIC tomorrow. A test mock in CI. The polymorphism is the same as what templated Asio code achieves — except the library does not need a recompile. The binary is the interface. Drop in a new .so or .dll that implements the stream contract, relink, and behavior changes.
Templates can achieve this by type-erasing every customization point. The cost makes it impractical.
| Aspect | Capy | Cobalt |
|---|---|---|
| Backend includes in header | None (forward declaration only) | Required |
| Implementation hiding | Pimpl + type-erased returns | Concrete Asio types exposed |
| Algorithm code depends on backend | No | Yes |
| Relink without recompile | Yes | No |
| ABI stability across implementations | Yes | No |
Stream Concepts
Capy defines seven coroutine-only stream concepts. Cobalt inherits Asio's AsyncReadStream and AsyncWriteStream, hybrid concepts that support callbacks, futures, and coroutines alike. Cobalt's cobalt::io wrappers simplify the API, and Cobalt also defines its own stream abstractions (write_stream, read_stream, stream) as abstract base classes, a distinct approach from Capy's concept-based hierarchy. Even so, those wrappers include full Asio headers. See Write Stream Design for a detailed comparison of the two approaches.
Capy’s concepts form a refinement hierarchy that emerged naturally from use-case-first design:
```
ReadStream              WriteStream
(partial reads)         (partial writes)
      |                       |
      v                       v
ReadSource              WriteSink
(complete reads)        (complete writes + EOF)
      |                       |
      v                       v
BufferSource            BufferSink
(zero-copy pull)        (zero-copy prepare/commit)
```
BufferSource and BufferSink implement callee-owns-buffers I/O. The source provides buffers; the caller processes them in place. No copies. Memory-mapped files, hardware DMA buffers, and kernel-provided memory all work naturally through this pattern.
| Concept | Capy | Cobalt |
|---|---|---|
| ReadStream | Yes | Via Asio (hybrid) |
| WriteStream | Yes | Via Asio (hybrid) |
| Stream | Yes | Via Asio (hybrid) |
| ReadSource | Yes | No |
| WriteSink | Yes | No |
| BufferSource | Yes | No |
| BufferSink | Yes | No |
Type-Erased Streams
Traditional approaches to type erasure in Asio focus on the lowest-level elements: the completion handler, the executor, the allocator. This is not the right layer. Type-erasing these individually adds overhead at every customization point while still leaving the stream type concrete and visible.
Capy type-erases the stream itself. This is possible because coroutines provide structural type erasure — the continuation is always a handle, not a template parameter. When the library is coroutines-only, one virtual call per I/O operation is the total cost. The completion handler, executor, and allocator do not need individual erasure because they are not part of the stream’s operation signature.
Cobalt defines stream abstractions (write_stream, read_stream, stream) as abstract base classes in cobalt/io/stream.hpp, taking a different approach from Capy’s concept + type-erased wrapper model. See Write Stream Design for a side-by-side analysis.
The wrappers compose. any_buffer_source also satisfies ReadSource — natively if the wrapped type supports both, synthesized otherwise. any_buffer_sink also satisfies WriteSink. You pick the abstraction level you need.
```
Concept                  Type-Erased Wrapper
--------------------+------------------------
ReadStream     ----->  any_read_stream
WriteStream    ----->  any_write_stream
Stream         ----->  any_stream
ReadSource     ----->  any_read_source
WriteSink      ----->  any_write_sink
BufferSource   ----->  any_buffer_source  ----> also satisfies any_read_source
BufferSink     ----->  any_buffer_sink    ----> also satisfies any_write_sink
```
This is how the Dimovian Ideal is mechanically achieved.
| Type-Erased Wrapper | Capy | Cobalt |
|---|---|---|
| any_read_stream | Yes | No |
| any_write_stream | Yes | No |
| any_stream | Yes | No |
| any_read_source | Yes | No |
| any_write_sink | Yes | No |
| any_buffer_source | Yes | No |
| any_buffer_sink | Yes | No |
Mock Streams and Testability
When algorithms operate on type-erased interfaces, testing becomes deterministic. Capy provides mock implementations for every stream concept. Cobalt defines stream abstractions as abstract base classes but does not provide mock implementations for testing. See Write Stream Design for a comparison of the two stream designs.
Capy’s mock types:
- test::read_stream, test::write_stream — partial I/O mocks
- test::stream — connected pair for bidirectional testing
- test::read_source, test::write_sink — complete I/O mocks
- test::buffer_source, test::buffer_sink — zero-copy mocks
test::fuse injects errors systematically at every I/O operation point. test::run_blocking executes coroutines synchronously for deterministic unit tests. max_read_size and max_write_size simulate chunked delivery. expect() validates written data.
Tests run without sockets or network access, eliminating non-determinism.
| Testing Feature | Capy | Cobalt |
|---|---|---|
| test::read_stream | Yes | No |
| test::write_stream | Yes | No |
| test::stream | Yes | No |
| test::read_source | Yes | No |
| test::write_sink | Yes | No |
| test::buffer_source | Yes | No |
| test::buffer_sink | Yes | No |
| Error injection (test::fuse) | Yes | No |
| Synchronous execution (test::run_blocking) | Yes | No |
| Chunked delivery simulation | Yes | No |
| Data validation (expect()) | Yes | No |
Threading Model
Cobalt is single-threaded by design. One executor per thread. Channels are restricted to a single thread — Cobalt’s own documentation states: "Channels can be used to exchange data between different coroutines on a single thread." Primitives cannot be shared between threads.
Capy supports multi-threaded execution. thread_pool distributes work across threads. strand serializes execution without blocking OS threads. The Executor concept is open — implement your own.
| Threading | Capy | Cobalt |
|---|---|---|
| Multi-threaded execution | Yes (thread_pool) | No |
| Serialized execution | Yes (strand) | Single-threaded only |
| Executor model | Concept-based (open) | Single-threaded (closed) |
| Cross-thread channels | Yes | No |
| Primitives shareable across threads | Yes | No |
Context Propagation
Cobalt stores executor context in thread-local variables. Coroutines access it via this_coro::executor. This works on a single thread with a single executor; the design is scoped to exactly that configuration.
Capy introduces the IoAwaitable protocol and uses it for context propagation. When you co_await, the caller passes its execution environment to the child structurally:
```cpp
auto await_suspend(std::coroutine_handle<> h, io_env const* env);
```
No thread-local state. No ambient context. The executor and stop token flow forward through the call chain via the io_env parameter.
| Context Propagation | Capy | Cobalt |
|---|---|---|
| Mechanism | Structural (io_env parameter) | Thread-local variables |
| Works with strands | Yes | No |
| Works with multiple executors | Yes | No |
| Stop token delivery | Structural (io_env) | Via Asio cancellation slots |
Cancellation
Both libraries propagate cancellation automatically through coroutine chains. Both support OS-level cancellation of pending I/O operations (CancelIoEx on Windows, IORING_OP_ASYNC_CANCEL on Linux).
Capy uses std::stop_token, propagated via the IoAwaitable protocol’s io_env parameter. The token flows forward structurally alongside the executor.
Cobalt uses Asio’s cancellation_signal and cancellation_slot. Propagation is wired automatically in await_suspend via forward_cancellation. this_coro::cancellation_state provides filtering control over which cancellation types pass through.
| Cancellation | Capy | Cobalt |
|---|---|---|
| Token type | std::stop_token | asio::cancellation_signal / cancellation_slot |
| Propagation | Automatic (io_env) | Automatic (slot/signal wiring) |
| Filtering | Application-level | this_coro::cancellation_state |
| OS-level cancellation | Yes (via Corosio) | Yes (via Asio) |
Buffer Sequences
Capy adopts Asio’s buffer sequence model — ConstBufferSequence, MutableBufferSequence — because it works. Capy’s buffer types are fully compatible with Asio’s. You can pass Capy buffers to Asio operations and vice versa, seamlessly. Then Capy extends the model with additional types and algorithms, while still achieving the Dimovian Ideal — none of this requires exposing Asio headers to consumers.
Cobalt does not provide buffer sequence types or dynamic buffer support. Users who need these features use Asio’s types directly, inheriting the DynamicBuffer_v1/DynamicBuffer_v2 split.
Capy has one DynamicBuffer concept. The v1/v2 split in Asio exists because of a fundamental ownership problem: when an async operation takes a buffer by value and completes via callback, who owns the buffer? The original design had flaws, and the fix created two incompatible versions. By going coroutines-only, Capy avoids this entirely. The coroutine frame owns the buffer. Parameters have their lifetimes extended by the suspended frame, and the awaitable lives in the frame alongside them. There is no decay-copy, no ownership transfer, no ambiguity. One concept is sufficient.
| Buffer Feature | Capy | Cobalt |
|---|---|---|
| ConstBufferSequence | Yes | Via Asio |
| MutableBufferSequence | Yes | Via Asio |
| DynamicBuffer | Unified | None (use Asio directly) |
| Byte-level trimming | Yes | No |
Allocator Control
Cobalt sets up a thread-local PMR resource via cobalt::main or cobalt::thread. All coroutines on that thread share it. Every awaitable embeds a fixed SBO buffer:
```cpp
// cobalt/op.hpp
struct awaitable : awaitable_base
{
    char buffer[BOOST_COBALT_SBO_BUFFER_SIZE]; // default: 4096
    detail::sbo_resource resource{buffer, sizeof(buffer)};
};
```
If the buffer is exhausted, allocations fall back to the upstream PMR resource or operator new. The buffer size is a compile-time constant. Changing it requires recompiling the library.
Capy leaves these decisions to the user. run_async(executor, allocator)(my_task()) sets the allocator before the task is created. The task’s operator new reads it from thread-local storage. This is a small, flexible customization point that permits usage patterns the authors did not anticipate: per-connection arenas, bounded pools, tracking allocators, per-tenant memory budgets. The allocation strategy is a deployment decision, not a library decision.
recycling_memory_resource provides zero-overhead recycling after warmup. Memory isolated per connection. Reclaimed instantly on disconnect.
| Allocator Control | Capy | Cobalt |
|---|---|---|
| Granularity | Per-task | Per-thread |
| Allocation model | Forward-flow | Thread-local PMR |
| Per-connection arenas | Yes | No |
| Recycling allocator | recycling_memory_resource | No |
| Custom allocator support | Yes (per-task) | Global setup only |
| Deterministic freeing | Yes | Non-deterministic on MSVC |
Execution/Platform Separation
Cobalt is coupled to Asio’s io_context. The execution model and the platform abstractions are one thing.
Capy separates them. The execution model — executors, cancellation, allocation — lives in Capy. Platform abstractions live in Corosio, a companion library that provides native TCP sockets, acceptors, TLS streams, timers, DNS resolution, and signal handling — all built on Capy’s IoAwaitable protocol with native IOCP and epoll backends. You can test Capy’s execution model without a network stack. You can swap the I/O backend without changing your application code.
| Architecture | Capy | Cobalt |
|---|---|---|
| Execution model | Capy (independent) | Coupled to io_context |
| Platform abstractions | Corosio (separate library) | Asio (same dependency) |
| Testable without I/O backend | Yes | No |
| Swappable backends | Yes | No |
Coroutine Overhead
To measure the overhead that coroutines add to a real workload, an experimental JSON serializer drives output through a chain of co_await calls instead of direct function calls. Each JSON value type — null, bool, integer, double, string, array, object — is handled by a coroutine that writes fragments through a write_sink. The baseline is boost::json::serialize, a highly optimized non-coroutine implementation.
The input is numbers.json from the Boost.JSON benchmark suite. Results are best-of-four runs, Clang 20, -O3, Windows x64:
| Serializer | Time | vs baseline |
|---|---|---|
| boost::json::serialize (baseline) | 317 us | 1.0x |
| capy::task | 537 us | 1.69x |
| cobalt::promise | 1,361 us | 4.29x |
| cobalt::task | 26,079 us | 82.3x |
Capy’s coroutine-driven serializer runs at 1.69x the baseline. Cobalt’s promise is 4.29x. Cobalt’s task is 82x.
The Capy implementation:
```cpp
namespace {

template<class WS> task<> serialize(json::value const& v, WS& ws);

template<class WS> task<> write(std::nullptr_t, WS& ws) {
    co_await ws.write("null", 4);
}

template<class WS> task<> write(bool v, WS& ws) {
    if(v) co_await ws.write("true", 4);
    else co_await ws.write("false", 5);
}

template<class WS> task<> write(std::int64_t v, WS& ws) {
    char buf[32];
    auto r = std::to_chars(buf, buf + sizeof(buf), v);
    co_await ws.write(buf, r.ptr - buf);
}

template<class WS> task<> write(std::uint64_t v, WS& ws) {
    char buf[32];
    auto r = std::to_chars(buf, buf + sizeof(buf), v);
    co_await ws.write(buf, r.ptr - buf);
}

template<class WS> task<> write(double v, WS& ws) {
    char buf[32];
    auto r = std::to_chars(buf, buf + sizeof(buf), v);
    co_await ws.write(buf, r.ptr - buf);
}

template<class WS> task<> write(std::string_view v, WS& ws) {
    co_await ws.write("\"", 1);
    co_await ws.write(v.data(), v.size());
    co_await ws.write("\"", 1);
}

template<class WS> task<> write(json::array const& v, WS& ws) {
    co_await ws.write("[", 1);
    bool first = true;
    for(auto const& x : v) {
        if(!first) co_await ws.write(",", 1);
        first = false;
        co_await serialize(x, ws);
    }
    co_await ws.write("]", 1);
}

template<class WS> task<> write(json::object const& v, WS& ws) {
    co_await ws.write("{", 1);
    bool first = true;
    for(auto const& x : v) {
        if(!first) co_await ws.write(",", 1);
        first = false;
        co_await write(x.key(), ws);
        co_await ws.write(":", 1);
        co_await serialize(x.value(), ws);
    }
    co_await ws.write("}", 1);
}

template<class WS> task<> serialize(json::value const& v, WS& ws) {
    return visit([&](auto const& v) { return write(v, ws); }, v);
}

struct write_sink {
    std::string r;
    task<> write(void const* p, std::size_t n) {
        r.append(static_cast<char const*>(p), n);
        co_return;
    }
};

} // namespace

std::string serialize_capy_task(json::value const& jv) {
    write_sink ws;
    capy::test::run_blocking()(serialize(jv, ws));
    return std::move(ws.r);
}
```
Every co_await ws.write(…) call creates a coroutine frame, suspends, resumes, and destroys. This is the worst case for coroutine overhead — many tiny operations that complete synchronously. In a real application where I/O operations take microseconds or milliseconds, the coroutine machinery becomes negligible.
Summary
| Feature | Capy | Cobalt |
|---|---|---|
| Design methodology | Use-case-first, coroutines-only | Coroutine layer on hybrid Asio |
| Implementation hiding | Dimovian Ideal achieved | Backend types exposed |
| Stream concepts | 7 coroutine-only (refinement hierarchy) | Asio's (hybrid) |
| Type-erased streams | 7 wrappers | None |
| Mock streams | 7 mock types + test::fuse | None |
| Threading | Multi-threaded (thread_pool, strand) | Single-threaded |
| Context propagation | Structural (io_env) | Thread-local |
| Cancellation | std::stop_token | asio::cancellation_signal |
| Buffer sequences | Extended, unified | None (use Asio directly) |
| Allocator control | Per-task, forward-flow | Per-thread, global setup |
| Execution/platform | Separated | Coupled |
| Relink without recompile | Yes | No |