
Mgraph: A collaborative architecture for AGI.

Vectors, streams, trees, graphs, tensors, documents. Redis speed, git-style commits, thread-safe, zero-copy, distributed across networks using wire-optimised deltas.

Jordan Rancie, 1 February 2026

The problem of complex data coordination

The data structures that matter most in modern applications are also the most complex, and that complexity adds cost at just about every layer of the stack — code, infrastructure, state synchronisation, communication with and between users.

A knowledge graph. A vector embedding. A real-time stream of events. A tensor representing a model's learned state. A nested document with access control at every level. These are not exotic — they are the building blocks of every AI application, every collaborative tool, every system that needs to understand relationships rather than just store records. And every infrastructure layer in the stack treats them as second-class citizens.

Redis stores strings. Postgres stores rows. Kafka carries bytes it cannot see inside. If your data is a graph, you serialise it into something flat, ship it across a boundary, and rebuild the graph on the other side. The operations that define the data type — add an edge, traverse a path, update a weight — exist in your application code but not in your infrastructure. Every boundary is a round trip through a translation layer that does not necessarily need to be there.

This is particularly relevant for collaboration, and it is amplified by emerging AI systems. Structure must not only be captured but communicated efficiently and quickly across boundaries, while remaining responsive to new requirements for ephemeral and shared structures — agent working memory, model state shared between processes, knowledge graphs that evolve mid-conversation.

This is an environment where the humble database does not quite reach. It is a level of exchange that a Redis or Valkey cache cannot quite meet.

How we solve it for AGI

It is my opinion that a coordination primitive for complex data types is required as AI systems become more interconnected and adaptive towards whatever AGI looks like in the future.

mgraph is my approach to this problem. An engine designed to capture and coordinate complex data structures across traditional boundaries — first the threads between CPUs and GPUs, and then between network boundaries. Capture and share a text or binary stream, coordinate on small tensors in real-time, operate asynchronously on a shared knowledge graph. Collaboration, solved as a primitive.

mgraph is a Rust engine for atomic, transactional, high-speed operations on complex data types. Graphs, vectors, tensors, streams, documents, tables, trees — stored as themselves, operated on natively, and when shared across boundaries, communicated in terms the type understands. It started as an embeddable library for thread-safe structured data. It was then extended to a server. Then to a distributed network. The same engine at every scale.

What complex data types?

If the premise is that infrastructure should understand the data it holds, the first question is: what data? Not rows. Not strings. Not opaque byte arrays with a key attached.

A knowledge graph is nodes and edges with weights and direction. A vector embedding is a point in high-dimensional space. A model's learned state is a tensor. A conversation is an ordered stream. A user profile is a nested document.

mgraph treats each as a first-class citizen. A graph is stored as a graph. Operations on it — add a node, add an edge, traverse a path — are native to the type. A stream is stored as a stream. Append is a native operation, not a write to a key that happens to contain a list.

Data Type   Example Operation              What a change looks like
Graph       add_edge(A, B, weight: 0.7)    Edge A→B added with weight 0.7
Vector      add_vector([0.1, 0.3, ...])    Vector appended at position N
Stream      append(b"hello")               Entry appended at sequence 42
Tree        add_node(parent: 3, child: 7)  Node 7 attached under node 3
Document    set_field("price", 29.99)      Field "price" changed to 29.99
Tensor      update_slice([2,3], value)     Slice [2,3] updated

The library

At its core, mgraph is a library. A package you embed in your application. No server, no network, no infrastructure — just an engine for atomic operations on complex data types within your process.

The first boundary it crosses is threads. In a multithreaded application, concurrent access to structured data is where most complexity lives. Read and write coordination is handled at the engine level — type-aware operations, with transactional guarantees, safe across threads.

use mgraph::core::Engine;
use mgraph::datatypes::DocKind;
 
let mut engine = Engine::new();
 
// First-party types — graphs, vectors, streams, trees.
// DocKind is an open u8 newtype: third-party types register
// their own kind and ops without modifying the engine.
let graph  = engine.create_doc(DocKind::GRAPH,  false).unwrap();
let vector = engine.create_doc(DocKind::VECTOR, true).unwrap();
let stream = engine.create_doc(DocKind::STREAM, false).unwrap();
let tree   = engine.create_doc(DocKind::TREE,   false).unwrap();
// Graph: add nodes and edges — the operation IS the delta
let doc = engine.get_doc(graph).unwrap();
let mut g = doc.as_graph().unwrap();
g.add_node(1);
g.add_edge(1, 2);
 
// Stream: append entries — ordered, typed, native
let doc = engine.get_doc(stream).unwrap();
let mut s = doc.as_stream().unwrap();
s.append(b"hello");
s.append(b"world");

Transactions follow a git-style model. Branches can be checked out to isolate read and write operations. A commit is an atomic transaction. A merge consolidates branches back into one history, as you would expect of any version-aware data store.

Conflicts, resolutions, and failure strategies are determined by the document and transaction — fast-forward, merge, rebase.
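The branch/commit/fast-forward flow can be illustrated with a toy model. Everything below — the `Branch` type and its methods — is invented for the sketch, not mgraph's actual API:

```rust
// Toy model of git-style document transactions. All names here are
// hypothetical illustrations, not mgraph's real API.
#[derive(Clone, Debug, PartialEq)]
struct Branch {
    ops: Vec<String>, // committed operations, in order
}

impl Branch {
    // Checkout: an isolated copy for reads and writes.
    fn checkout(&self) -> Branch {
        self.clone()
    }

    // Commit: atomically record one operation.
    fn commit(&mut self, op: &str) {
        self.ops.push(op.to_string());
    }

    /// Fast-forward merge: succeeds only when `self` is a strict
    /// prefix of `other` — no divergent history to reconcile.
    fn fast_forward(&mut self, other: &Branch) -> bool {
        if other.ops.starts_with(&self.ops) {
            self.ops = other.ops.clone();
            true
        } else {
            false // divergent — a real engine would merge or rebase
        }
    }
}

fn main() {
    let mut main_branch = Branch { ops: vec!["add_node(1)".into()] };
    let mut work = main_branch.checkout();
    work.commit("add_edge(1, 2)");

    // main_branch has not moved, so the merge is a fast-forward.
    assert!(main_branch.fast_forward(&work));
    assert_eq!(main_branch.ops.len(), 2);
    println!("merged: {:?}", main_branch.ops);
}
```

The non-fast-forward branch of `fast_forward` is where the document's configured strategy — merge or rebase — would take over.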

Note: The roadmap ahead will see the delta operations themselves tied to a Merkle chain for cryptographic proof of lineage. Currently, conflict resolution is handled in software.

Capturing operations in the delta wire format

This is where deltas come in. Inside the engine, an operation like add_edge(1, 2) is both the mutation and the record of what changed. Over the wire, that same operation is encoded as a delta — a compact binary frame describing exactly what happened. The relay reads the header to know where it goes.

  Example: add_edge(1, 2) on a graph document

  ┌─── Fixed Header (routing) ──┬─── Payload (operation) ──┐
  │                              │                         │
  │  op     seq     len          │  source  target  weight │
  │  1b     4b      2b           │  4b      4b      4b     │
  │  ──     ────    ───          │  ──────  ──────  ────── │
  │  0x02   0042    0012         │  0001    0002    3f80.. │
  │                              │                         │
  └──────────────────────────────┴─────────────────────────┘
            │                                 │
  Relay reads this.                Relay never touches this.
  Routes without                   Applied at destination by
  deserialising payload.           the receiving engine.

  Document identity is established at the stream level
  (QUIC stream or TCP handshake), not per-delta.

The frame format is deliberately minimal. A receiving node applies the delta to its local engine without deserialising the full document state — and, for many atomic payloads, without deserialising the delta payload at all. The cost of communicating a change scales with the size of the diff.
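Following the field widths in the diagram above, a minimal encoder for that frame can be sketched in plain Rust. Byte order is an assumption here — the `3f80..` rendering of weight 1.0 in the diagram suggests big-endian:

```rust
/// Sketch of an add_edge delta frame, using the field widths from the
/// diagram: 1-byte op, 4-byte seq, 2-byte payload length, then a
/// 12-byte payload of source, target, and weight.
/// Big-endian byte order is an assumption for this sketch.
fn encode_add_edge(seq: u32, source: u32, target: u32, weight: f32) -> Vec<u8> {
    let mut frame = Vec::with_capacity(19);
    frame.push(0x02);                              // op: add_edge
    frame.extend_from_slice(&seq.to_be_bytes());   // 4-byte sequence
    frame.extend_from_slice(&12u16.to_be_bytes()); // 2-byte payload length
    frame.extend_from_slice(&source.to_be_bytes());
    frame.extend_from_slice(&target.to_be_bytes());
    frame.extend_from_slice(&weight.to_be_bytes());
    frame
}

fn main() {
    let frame = encode_add_edge(42, 1, 2, 1.0);
    assert_eq!(frame.len(), 19);                    // 7-byte header + 12-byte payload
    assert_eq!(frame[0], 0x02);                     // relay routes on this
    assert_eq!(&frame[15..19], &1.0f32.to_be_bytes()); // 3f 80 00 00
    println!("frame: {:02x?}", frame);
}
```

Note how the header is everything a relay needs: the payload bytes after offset 7 are opaque until they reach a destination engine.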

Cross the wire

Wrap the engine in a server and the same operations are now available over the network.

A single-threaded wrapper gives you a Redis-speed structured data server — get, set, and transactional operations on typed documents, not string blobs. A multi-threaded wrapper gives you high-throughput concurrent access across connections — the same atomic guarantees, now serving many clients.

use std::sync::{Arc, Mutex};
use mgraph::transport::{serve_tcp_local, TcpRoute};
 
let engine = Arc::new(Mutex::new(engine));
let server = serve_tcp_local(
    Arc::clone(&engine),
    TcpRoute::Doc(stream),  // route all frames to the stream document
).unwrap();
 
println!("listening on {}", server.local_addr());
// Inbound delta frames apply directly to the engine.
// Same operations, same types, now over the network.

Subscriptions, distribution and the mesh

The engine handles the thread boundary. A server handles the network boundary. A mesh handles many.

mgraph's network is delta first, data representation second. The source of truth can always be replayed from a stream of delta operations. The network layer propagates deltas across a mesh of connected nodes using the QUIC protocol. A node could be another relay, another app on a device, or a user in a web browser — mgraph is deployable in all these architectures.
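The replay property is easy to sketch: a document is a fold over its delta log. The toy below (invented types, not mgraph's API) rebuilds a graph's edge set from an ordered list of deltas:

```rust
use std::collections::HashSet;

// Toy delta-log replay: the ordered deltas ARE the source of truth,
// and materialised state is a fold over them. Types here are invented
// for illustration, not mgraph's API.
#[derive(Clone, Copy)]
enum Delta {
    AddEdge(u32, u32),
    RemoveEdge(u32, u32),
}

fn replay(log: &[Delta]) -> HashSet<(u32, u32)> {
    let mut edges = HashSet::new();
    for d in log {
        match *d {
            Delta::AddEdge(a, b) => { edges.insert((a, b)); }
            Delta::RemoveEdge(a, b) => { edges.remove(&(a, b)); }
        }
    }
    edges
}

fn main() {
    let log = [
        Delta::AddEdge(1, 2),
        Delta::AddEdge(2, 3),
        Delta::RemoveEdge(1, 2),
    ];
    let state = replay(&log);
    assert!(state.contains(&(2, 3)));
    assert!(!state.contains(&(1, 2)));
    // Any node that replays the same log arrives at the same state.
    assert_eq!(replay(&log), state);
}
```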

Each user-document pair gets its own QUIC stream. Why QUIC? Ordering and backpressure are available out of the box, per user and per document, not per connection. A burst of updates on one document cannot stall deltas flowing to another. Each user connection multiplexes across the streams they subscribe to, with independent flow control per stream.

There are really two code paths in the architecture.

The hot path — subscriptions: a relay node receives a delta frame, reads the fixed header, and forwards it to subscribers. Zero. Copy. Microseconds. It never deserialises the payload.

The processing path — how the delta is applied to state: when capacity is available, accumulated deltas are applied in order, producing a materialised snapshot for new subscribers or queries.

One path is fast because it is network facing. The other is intelligent because it is type aware.
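The hot-path contract — route on the header, never parse the payload — can be sketched against the frame layout shown earlier (field widths from the diagram; big-endian is an assumption of this sketch):

```rust
/// The header fields a relay reads: 1-byte op, 4-byte seq, 2-byte
/// payload length. Sketch only — layout assumed from the frame
/// diagram; the payload is forwarded as a borrowed slice, unparsed.
struct Header {
    op: u8,
    seq: u32,
    len: u16,
}

fn route(frame: &[u8]) -> Option<(Header, &[u8])> {
    if frame.len() < 7 {
        return None;
    }
    let header = Header {
        op: frame[0],
        seq: u32::from_be_bytes(frame[1..5].try_into().ok()?),
        len: u16::from_be_bytes(frame[5..7].try_into().ok()?),
    };
    // The payload is a borrowed slice of the inbound buffer: zero-copy.
    let payload = frame.get(7..7 + header.len as usize)?;
    Some((header, payload))
}

fn main() {
    // An add_edge frame: op 0x02, seq 42, len 3, then an opaque payload.
    let frame = [0x02, 0, 0, 0, 42, 0, 3, 0xAA, 0xBB, 0xCC];
    let (h, payload) = route(&frame).unwrap();
    assert_eq!((h.op, h.seq, h.len), (0x02, 42, 3));
    assert_eq!(payload, &[0xAA, 0xBB, 0xCC]);
}
```

Applying the payload to typed state is the processing path's job; the relay only ever touches the first seven bytes.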

Not every node needs every document. Selective propagation means nodes subscribe to the data they care about. A relay in Sydney holds Australian documents. A relay in Frankfurt holds European documents. A browser client subscribes to the documents it displays. The network carries only what is needed.
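Selective propagation is, at its core, a subscription table: forward a document's deltas only to the nodes that asked for that document. A minimal sketch (all names invented, not mgraph's API):

```rust
use std::collections::{HashMap, HashSet};

// Toy subscription table for selective propagation. A relay forwards a
// document's deltas only to that document's subscribers; everyone else
// never sees the traffic. Names are invented for this sketch.
#[derive(Default)]
struct Relay {
    subs: HashMap<u64, HashSet<&'static str>>, // doc_id -> subscriber names
}

impl Relay {
    fn subscribe(&mut self, doc: u64, node: &'static str) {
        self.subs.entry(doc).or_default().insert(node);
    }

    /// Who receives a delta for `doc`? Sorted for deterministic output.
    fn fanout(&self, doc: u64) -> Vec<&'static str> {
        let mut out: Vec<_> = self
            .subs
            .get(&doc)
            .map(|s| s.iter().copied().collect())
            .unwrap_or_default();
        out.sort();
        out
    }
}

fn main() {
    let mut relay = Relay::default();
    relay.subscribe(1, "syd.relay");      // an Australian document
    relay.subscribe(2, "fra.relay");      // a European document
    relay.subscribe(2, "browser-client"); // a client displaying doc 2

    assert_eq!(relay.fanout(1), vec!["syd.relay"]);
    assert_eq!(relay.fanout(2), vec!["browser-client", "fra.relay"]);
    assert!(relay.fanout(99).is_empty()); // nobody subscribed, nothing sent
}
```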

The same engine binary runs at every tier. Datacenter relay, browser via WASM, mobile native SDK, Apple Silicon edge node — all apply the same deltas, maintain the same typed state. The engine is architecture and deployment agnostic.

// A relay node in the mesh.
// Accepts deltas, applies locally, propagates to subscribers.
let node = MeshNode::new(engine)
    .bind("0.0.0.0:4400")
    .connect_to(&["syd.relay:4400", "fra.relay:4400"])
    .await?;
 
// Subscribe to a document — deltas arrive as they happen
let sub = node.subscribe(doc_id).await?;
while let Some(delta) = sub.next().await {
    // Hot path: delta forwarded zero-copy from origin.
    // Processing path: applied to local engine when capacity allows.
    println!("{}: {} bytes", delta.seq, delta.len());
}
 
// Publish a change — propagates to all subscribers in the mesh
node.publish(doc_id, &AppendEntry { data: b"hello".to_vec() }).await?;

The delta is the same bytes whether it travels between threads, across a local socket, or across the globe. Build once, send anywhere, apply identically.

Speed and throughput

The network layer has been designed at just about every level for throughput: zero-copy from network to document, zero-copy for delta propagation, deltas designed to capture atomic data operations rather than datasets.

We are midway through a larger benchmarking milestone that evaluates mgraph as the embedded engine for a wire-to-compute pipeline (CPU/GPU, unified memory architecture) — here is the plan. We will keep you posted on the results.

The benchmark targets are relatively simple:

  • Redis-speed for delta create_doc operations
  • Transactional database speed for atomic diff/patch operations
  • Global propagation speeds comparable to leading CDNs

Why now

AI systems are becoming increasingly interconnected. Agents coordinate across devices, users, and teams. Models update shared state mid-inference. Memory and knowledge graphs evolve in real time as users interact with them. The data structures these systems require in the moment — trees, embeddings, tensors, graphs, streams — need to move between processes, devices, users, and continents in a way that is agile, fast, and carries lineage guarantees.

A centralised database can handle this today, but not easily, and not within milliseconds of an agent deciding what it needs and creating it. Traditional databases are not equipped to handle what is coming next. The volume, the velocity, and the structural complexity of AI-native data coordination are heading somewhere that origin-and-query architectures cannot follow.

At the same time, real-time collaboration is no longer a feature — it is a baseline expectation. Live editing. Shared state. Multiplayer by default. The infrastructure that supports this today is either centralised (Firestore, Supabase) or limited to text and JSON. Neither handles the breadth of data types that modern applications actually use.

What's ahead

mgraph is in active development. The transport layer is functional over TCP. QUIC streams, the relay mesh, and the managed network are on the roadmap.

Near-term: complete the benchmarking milestone, prove the speed claims, publish the results. Medium-term: relay mesh over QUIC, selective propagation, authority delegation. Longer-term: the managed network, msearch integration for distributed search and inference collocated with your data.

If this is interesting to you — follow along. The benchmarks will be public. The conversation is ongoing.
