Project · Apple Silicon Vector Database
An embeddable and standalone vector database designed to utilise the full GPU capabilities of your Apple device
Every Mac, iPhone, and iPad ships with a GPU — potentially dozens of cores designed for massively parallel computation. No vector database on the market uses them.
Local-first vector search at GPU speed — 5–45× over CPU — on hardware your users already own.
Every vector database on the market was shaped by a single architectural constraint: CPU or GPU — not both. Data lives in one memory pool or the other. Moving it between them means copying across a bus — PCIe — with hard bandwidth and latency costs. For the small, frequent operations that define real-time search, the transfer takes longer than the computation. The result is an industry built around choosing one processor or the other.
Apple Silicon removes the wall. In unified memory, CPU and GPU read the same bytes at the same addresses. No copy. No transfer. No bus. Both processors, same data, same cycle.
We're building a vector database for that hardware. CPU and GPU read the same index, the same cache. Distance kernels and algebraic operations run on the GPU via Metal compute shaders. Graph traversal and filtering stay on the CPU where branching is cheaper. Both lanes execute concurrently across the same data, coordinated without copies.
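To make the division of labour concrete: the GPU's share of a query is a data-parallel map of one distance function over many stored vectors, all read from the same unified-memory buffer the CPU wrote. The C++ below is a minimal single-threaded sketch of that per-vector work; the names (`squared_l2`, `brute_force_search`) are illustrative, not the product's API, and the shipped kernel is a Metal compute shader rather than a loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Squared Euclidean distance between a query and one stored vector.
// On the GPU this body runs once per thread, one thread per stored
// vector, every thread reading the same shared-memory index.
float squared_l2(const std::vector<float>& q, const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < q.size(); ++i) {
        float d = q[i] - v[i];
        acc += d * d;
    }
    return acc;
}

// CPU-side stand-in for the GPU dispatch: score every vector, then rank.
// The engine would issue this as one Metal compute dispatch, not a loop.
std::vector<std::size_t> brute_force_search(
        const std::vector<float>& query,
        const std::vector<std::vector<float>>& index,
        std::size_t k) {
    std::vector<std::size_t> ids(index.size());
    std::iota(ids.begin(), ids.end(), 0);
    std::sort(ids.begin(), ids.end(), [&](std::size_t a, std::size_t b) {
        return squared_l2(query, index[a]) < squared_l2(query, index[b]);
    });
    ids.resize(std::min(k, ids.size()));
    return ids;
}
```

Because the per-vector body has no branches and no shared state, it parallelises trivially: one GPU thread per stored vector, while the CPU concurrently walks the graph index.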
Available as a standalone application for developers and power users — with a UI, a REST API, and the ability to benchmark on your own hardware — and as an embeddable SDK for app developers: a compiled engine that ships inside your iOS, iPadOS, or macOS app, giving it on-device vector search without a cloud dependency.
CPU and GPU read and write the same vector index in unified memory. No mirrored structures, no host↔device copies, no synchronisation tax.
Distance, index scans, and algebraic predicates run as Metal compute shaders dispatched directly over shared memory. HNSW traversal and filtering stay on CPU. Both lanes execute concurrently.
Queries are directed acyclic graphs. A simple search is scan-then-rank. A hybrid search fans out — vector similarity on GPU, filtering on CPU — and converges at a fusion node.
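A hypothetical sketch of such a plan, assuming nothing about the engine's real types: builder functions for four node kinds and a toy single-threaded interpreter. In the real engine the scan lane would be a GPU dispatch and the filter lane would run concurrently on the CPU; here both lanes simply converge at the fusion node.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <numeric>
#include <set>
#include <vector>

// Illustrative plan nodes, not the engine's real types.
struct PlanNode {
    enum Kind { Scan, Filter, Fuse, Rank } kind;
    std::vector<float> query;                    // Scan: GPU similarity lane
    std::function<bool(std::size_t)> predicate;  // Filter: CPU metadata lane
    std::shared_ptr<PlanNode> left, right;       // children (Rank uses left)
    std::size_t k = 0;                           // Rank: results to keep
};
using Plan = std::shared_ptr<PlanNode>;

Plan scan(std::vector<float> q) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Scan; n->query = std::move(q); return n;
}
Plan filter(std::function<bool(std::size_t)> p) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Filter; n->predicate = std::move(p); return n;
}
Plan fuse(Plan a, Plan b) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Fuse; n->left = a; n->right = b; return n;
}
Plan rank(Plan child, std::size_t k) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Rank; n->left = child; n->k = k; return n;
}

std::vector<std::size_t> run(const Plan& n,
                             const std::vector<std::vector<float>>& index) {
    switch (n->kind) {
    case PlanNode::Scan: {  // score every vector, order ids by distance
        std::vector<std::size_t> ids(index.size());
        std::iota(ids.begin(), ids.end(), 0);
        auto dist = [&](std::size_t i) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < n->query.size(); ++j) {
                float d = n->query[j] - index[i][j];
                acc += d * d;
            }
            return acc;
        };
        std::sort(ids.begin(), ids.end(),
                  [&](std::size_t a, std::size_t b) { return dist(a) < dist(b); });
        return ids;
    }
    case PlanNode::Filter: {  // ids that pass the metadata predicate
        std::vector<std::size_t> ids;
        for (std::size_t i = 0; i < index.size(); ++i)
            if (n->predicate(i)) ids.push_back(i);
        return ids;
    }
    case PlanNode::Fuse: {  // convergence: left's order, right's membership
        auto a = run(n->left, index);
        auto b = run(n->right, index);
        std::set<std::size_t> allowed(b.begin(), b.end());
        std::vector<std::size_t> out;
        for (auto id : a) if (allowed.count(id)) out.push_back(id);
        return out;
    }
    case PlanNode::Rank: {  // keep the top-k
        auto ids = run(n->left, index);
        ids.resize(std::min(n->k, ids.size()));
        return ids;
    }
    }
    return {};
}
```

So `rank(scan(q), k)` is the simple scan-then-rank plan, and `rank(fuse(scan(q), filter(p)), k)` is the hybrid fan-out that converges at the fusion node.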
Data stays in place from query to result. No intermediate copies, no serialisation between stages. UMA gives both processors one address space; the software pipeline never duplicates the data.
The entire retrieval path runs on the user's device. No cloud routing, no per-query cost, no data leaving the machine. Privacy and latency improve in the same move.
We reply within two business days. If a call would be faster, book a thirty-minute conversation.