Internal · Benchmarks Plan · Apr 2026
mgraph · msearch

A zero-boundary database architecture.

From wire to GPU on Apple Silicon.
Treating data flows as a compute primitive.

Jordan Rancie · Principal
Opening · the two technologies

Two technologies. One architecture.

mgraph concept art
mgraph

A data engine for complex data structures — trees, graphs, streams, tensors, tables, vectors.

Structure-aware. Zero-copy from network to compute. mgraph is data collaboration as a primitive.

msearch concept art
msearch

Vector and relational database built for Apple Silicon’s unified memory.

Treats the GPU as primary compute. CPU and GPU on shared memory for a previously unseen class of operations.

Opening · the combined architecture

A direct line from wire to GPU.

Together, they provide a direct line from wire to GPU compute that was previously impossible.

Both are industry firsts. A structure-aware zero-copy data engine, and a database that treats the Apple Silicon GPU as primary compute. No existing system provides this path.

Pipeline comparison: conventional path versus mgraph zero-copy path from network buffer to GPU compute
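To make the wire-to-GPU path concrete, here is a minimal sketch of the kind of handoff unified memory allows: a page-aligned receive buffer is wrapped as a GPU-visible MTLBuffer with no copy in between. It uses only POSIX allocation and Metal's makeBuffer(bytesNoCopy:); it illustrates the idea, not the mgraph or msearch implementation, and the buffer sizes and socket line are placeholders.

```swift
import Foundation
import Metal

// Minimal sketch: receive bytes into a page-aligned region, then expose that
// same memory to the GPU without a copy. On Apple Silicon's unified memory,
// makeBuffer(bytesNoCopy:) wraps existing pages rather than duplicating them.
// Illustrative only — not the mgraph/msearch implementation.

let pageSize = Int(getpagesize())
let capacity = 1 << 20                               // 1 MiB receive region (page-multiple)

// Page-aligned allocation, required by makeBuffer(bytesNoCopy:).
var raw: UnsafeMutableRawPointer?
guard posix_memalign(&raw, pageSize, capacity) == 0, let wireBuffer = raw else {
    fatalError("allocation failed")
}

// Fill the region from the wire (any POSIX socket descriptor would do here).
// let received = recv(socketFD, wireBuffer, capacity, 0)

// Hand the same pages to the GPU: no serialisation, no staging copy.
guard let device = MTLCreateSystemDefaultDevice(),
      let gpuView = device.makeBuffer(bytesNoCopy: wireBuffer,
                                      length: capacity,
                                      options: .storageModeShared,
                                      deallocator: { ptr, _ in free(ptr) })
else { fatalError("Metal unavailable") }

// `gpuView` can now be bound to a compute encoder, while the CPU keeps reading
// the identical bytes through `wireBuffer` — one allocation, two views.
print("GPU-visible buffer of \(gpuView.length) bytes, zero copies")
```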
Benchmarks plan

Measured in three parts.

Each leg benches the architectural claim it rests on.

Three legs of the benchmark plan: data leg, GPU layer, and parallel surface — overlapping spans across one shared substrate
I · DATA LEG

Zero serialisation from network to compute.

Structured payloads moved without translation at the boundary.

II · GPU LAYER

The first database built for Apple Silicon’s GPU.

On-device vector retrieval on hardware every developer already owns.

III · PARALLEL SURFACE

Coordinated CPU+GPU execution on shared memory.

A property unique to Apple’s unified memory architecture.

PART I · 01

The data leg.

Structure-aware, zero-copy delta propagation at message-broker throughput — with database-grade atomic guarantees across complex types.

Part I · Claim

Broker throughput.
Structured data.
Atomic ops.

Delta propagation at Redis speeds, with database-like atomic commits on complex data types. No production system combines all of these today.
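To pin down what "atomic commits on complex data types" means in this leg, the sketch below shows the shape of the three-part commit workload: a table insert, a vector update, and a stream append landing in one atomic commit. Every type and method name here (Graph, Transaction, insert, update, append) is a hypothetical placeholder for illustration, not mgraph's API.

```swift
import Foundation

// Hypothetical sketch of the three-part commit the data leg benchmarks:
// a table insert, a vector update, and a stream append, all-or-nothing.
// Graph, Transaction, insert, update, append are placeholders — not mgraph's API.

struct Row   { let id: Int; let payload: [String: String] }
struct Delta { let vectorID: Int; let values: [Float] }
struct Event { let topic: String; let bytes: [UInt8] }

protocol Transaction {
    func insert(into table: String, row: Row) throws
    func update(vector delta: Delta) throws
    func append(to stream: String, event: Event) throws
}

protocol Graph {
    // Either every operation in `body` becomes visible, or none do.
    func commit(_ body: (Transaction) throws -> Void) throws
}

func threePartCommit(on graph: Graph) throws {
    try graph.commit { tx in
        try tx.insert(into: "orders", row: Row(id: 42, payload: ["sku": "A-1"]))
        try tx.update(vector: Delta(vectorID: 42, values: [0.12, 0.98, 0.33]))
        try tx.append(to: "order-events", event: Event(topic: "created", bytes: [0x01]))
    }
}

// The Postgres/DuckDB baselines run the same three writes inside one SQL
// transaction, with the vector and event serialised into JSON or BLOB columns.
```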

Part I · Positioning

Fast or structured. Today you pick one.

Throughput versus semantic atomicity across today’s systems.

Brokers give up structure for speed. Databases pay for structure in latency. The upper-right corner is empty — that’s what we bench against.

Positioning chart: throughput versus semantic atomicity across today's systems — brokers (Kafka, Redis, MQTT) bottom-right, databases (Postgres, DuckDB, MySQL) upper-left, graph/search (Neo4j, Elasticsearch) middle, mgraph alone in the upper-right
Part I · What we bench

Throughput against brokers. Atomics against databases.

Throughput. Against: Redis · Kafka. Measured: ops per second, latency percentiles, payload scaling (harness sketch below).
Atomics. Against: Postgres · DuckDB. Measured: table insert, vector update, stream append in a single commit. Baselines must serialise structured payloads into JSON or BLOB columns; mgraph operates on them natively.
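The throughput row reduces to sampling per-operation latency and reporting percentiles alongside ops per second. Below is a minimal harness sketch in plain Swift; the `bench` helper and the `operation` closure are placeholders standing in for one delta propagation against mgraph or a baseline, not part of any existing tool.

```swift
import Foundation

// Minimal harness sketch for the throughput row: run an operation N times,
// record per-op latency, report ops/sec and P50/P99. The `operation` closure
// is a stand-in for one delta propagation against mgraph or a baseline.

struct BenchResult {
    let opsPerSecond: Double
    let p50: Double   // seconds
    let p99: Double   // seconds
}

func bench(iterations: Int, operation: () -> Void) -> BenchResult {
    var samples: [Double] = []
    samples.reserveCapacity(iterations)

    let wallStart = DispatchTime.now()
    for _ in 0..<iterations {
        let t0 = DispatchTime.now()
        operation()
        let t1 = DispatchTime.now()
        samples.append(Double(t1.uptimeNanoseconds - t0.uptimeNanoseconds) / 1e9)
    }
    let wallSeconds = Double(DispatchTime.now().uptimeNanoseconds
                             - wallStart.uptimeNanoseconds) / 1e9

    samples.sort()
    let p = { (q: Double) in samples[min(Int(q * Double(samples.count)), samples.count - 1)] }
    return BenchResult(opsPerSecond: Double(iterations) / wallSeconds,
                       p50: p(0.50), p99: p(0.99))
}

// Example: bench a no-op payload copy to sanity-check the harness itself.
let result = bench(iterations: 10_000) { _ = Array(repeating: UInt8(0), count: 1_024) }
print(String(format: "%.0f ops/s · p50 %.3f ms · p99 %.3f ms",
             result.opsPerSecond, result.p50 * 1e3, result.p99 * 1e3))
```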
PART II · 02

The GPU layer.

The first vector database to leverage Apple Silicon’s GPU for retrieval — on-device, zero-dependency, on hardware every developer already owns.

Part II · Claim

On-device vector retrieval on the Apple Silicon GPU.

Every Mac, iPad, and iPhone ships with a capable GPU. Today’s vector databases either leave it idle or route to a cloud service. msearch puts the full GPU in the developer’s hands.
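As a sense of what "on-device vector retrieval on the GPU" looks like, here is an illustrative sketch: brute-force inner-product scoring of a query against a small random corpus, run as a Metal compute kernel over shared (unified) memory. The `score` kernel, corpus size, and dimensions are assumptions for the example; this is not msearch's index or kernel.

```swift
import Foundation
import Metal

// Illustrative sketch only: brute-force inner-product scoring on the Apple
// Silicon GPU over unified memory. Not msearch's index or implementation.

let kernelSource = """
#include <metal_stdlib>
using namespace metal;

kernel void score(device const float *vectors [[buffer(0)]],
                  device const float *query   [[buffer(1)]],
                  device float       *scores  [[buffer(2)]],
                  constant uint      &dim     [[buffer(3)]],
                  uint id [[thread_position_in_grid]])
{
    float acc = 0.0f;
    for (uint d = 0; d < dim; d++) {
        acc += vectors[id * dim + d] * query[d];
    }
    scores[id] = acc;
}
"""

let dim = 128, count = 10_000
let vectors = (0..<(dim * count)).map { _ in Float.random(in: -1...1) }
let query   = (0..<dim).map { _ in Float.random(in: -1...1) }

guard let device = MTLCreateSystemDefaultDevice(),
      let queue = device.makeCommandQueue() else { fatalError("Metal unavailable") }

let library  = try! device.makeLibrary(source: kernelSource, options: nil)
let pipeline = try! device.makeComputePipelineState(function: library.makeFunction(name: "score")!)

// .storageModeShared: the same pages are visible to CPU and GPU — no upload step.
let floatSize = MemoryLayout<Float>.stride
let vecBuf   = device.makeBuffer(bytes: vectors, length: vectors.count * floatSize, options: .storageModeShared)!
let qBuf     = device.makeBuffer(bytes: query,   length: query.count * floatSize,   options: .storageModeShared)!
let scoreBuf = device.makeBuffer(length: count * floatSize, options: .storageModeShared)!
var d32      = UInt32(dim)

let cmd = queue.makeCommandBuffer()!
let enc = cmd.makeComputeCommandEncoder()!
enc.setComputePipelineState(pipeline)
enc.setBuffer(vecBuf,   offset: 0, index: 0)
enc.setBuffer(qBuf,     offset: 0, index: 1)
enc.setBuffer(scoreBuf, offset: 0, index: 2)
enc.setBytes(&d32, length: MemoryLayout<UInt32>.size, index: 3)
enc.dispatchThreads(MTLSize(width: count, height: 1, depth: 1),
                    threadsPerThreadgroup: MTLSize(width: 256, height: 1, depth: 1))
enc.endEncoding()
cmd.commit()
cmd.waitUntilCompleted()

// CPU reads the scores straight out of the shared buffer — no copy back.
let scores = scoreBuf.contents().bindMemory(to: Float.self, capacity: count)
let best = (0..<count).max { scores[$0] < scores[$1] }!
print("best match: vector \(best), score \(scores[best])")
```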

Part II · What we bench

Latency, throughput, and energy per query.

Single-query latency. Against: pgvector · Chroma · Qdrant. Measured: P50/P99 latency, recall at target accuracy (recall@k, defined below), cold-start behaviour. Identical Apple Silicon hardware for all engines.
Throughput at scale. Against: pgvector · Chroma · Qdrant. Measured: QPS, latency under load, index-build throughput. Index sizes approaching unified memory capacity.
Energy per query. Against: Milvus on NVIDIA. Measured: joules per query at matched recall and latency. The NVIDIA baseline is cloud-provisioned.
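"Recall at target accuracy" in the rows above is recall@k: the fraction of the exact top-k neighbours that the approximate result also returns. A small self-contained definition follows; the example IDs are invented for illustration.

```swift
import Foundation

// Recall@k: the fraction of the exact top-k neighbours that the approximate
// (GPU) result also returned. Engine-agnostic.

func recallAtK(exact: [Int], approximate: [Int], k: Int) -> Double {
    let truth = Set(exact.prefix(k))
    let hits  = approximate.prefix(k).filter { truth.contains($0) }.count
    return Double(hits) / Double(k)
}

// Example: 8 of the true top-10 IDs were returned, so recall@10 = 0.8.
let exactIDs  = [3, 7, 12, 19, 22, 31, 40, 48, 55, 61]
let approxIDs = [3, 7, 12, 19, 22, 31, 40, 48, 90, 91]
print(recallAtK(exact: exactIDs, approximate: approxIDs, k: 10))   // 0.8
```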
PART III · 03

The parallel query surface.

CPU and GPU, coordinated on shared memory. A class of database operations available only on Apple Silicon.

Part III · Claim

Each processor on the regime it’s best at. Shared memory between them.

GPUs excel at massively parallel work; CPUs excel at sequential, branch-heavy work. By coordinating their operations in shared memory, we deliver performance that no system outside Apple Silicon can match.

Part III · GPU-over-CPU on database primitives

GPU against the CPU baseline.

Against Postgres and DuckDB across latency and throughput, at varying data sizes.

Part III · Coordinated CPU + GPU

Each processor in its lane.
Shared state in unified memory.

Five workloads where the handoff cost disappears because there is no handoff.
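One of the five, the two-phase top-k listed under Week 3, as a minimal sketch: phase one is a GPU scoring kernel writing one float per candidate into a shared MTLBuffer (as in the Part II sketch); phase two is CPU selection reading the same buffer, so no device-to-host copy sits between the phases. The `topK` helper is illustrative, not the engine's implementation; only a shared buffer of scores is assumed.

```swift
import Foundation
import Metal

// Two-phase top-k sketch. Phase 1 (GPU): a scoring kernel writes one float per
// candidate into a .storageModeShared MTLBuffer. Phase 2 (CPU): select the k
// best by reading that same buffer directly — on unified memory there is no
// device-to-host copy between the phases.

func topK(from scoreBuffer: MTLBuffer, count: Int, k: Int) -> [(id: Int, score: Float)] {
    let scores = scoreBuffer.contents().bindMemory(to: Float.self, capacity: count)

    // Small bounded selection: keep the current k best, sorted ascending by score.
    var best: [(id: Int, score: Float)] = []
    best.reserveCapacity(k + 1)
    for id in 0..<count {
        let s = scores[id]
        if best.count < k || s > best[0].score {
            let at = best.firstIndex { $0.score >= s } ?? best.count
            best.insert((id, s), at: at)
            if best.count > k { best.removeFirst() }
        }
    }
    return Array(best.reversed())          // highest score first
}

// Usage, given the shared `scoreBuf` produced by the GPU pass:
// let winners = topK(from: scoreBuf, count: 10_000, k: 10)
// winners[0] is the best candidate — no blit, no staging buffer, no handoff.
```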

Schedule

Four weeks.

One week per leg. One week to write.

Week 1
The Data Leg
Throughput against Redis and Kafka. Atomics against Postgres and DuckDB on the three-part commit workload.
Week 2
The GPU Layer
Single-query latency and throughput at scale against pgvector, Chroma, and Qdrant. Energy per query against Milvus on NVIDIA.
Week 3
The Parallel Surface
Five bench groups: hash join, two-phase top-k, nested loop with vector scan, HNSW reindex with concurrent traversal, streaming aggregation with windows.
Week 4
The Paper
Intro, methods, results, discussion. Figures from weeks 1–3 folded in. Submission-ready draft.
End of plan

Benchmarks published.