Project · Apple Silicon Vector Database

The vector DB built for unified memory.

An embeddable and standalone vector database designed to use the full GPU capability of your Apple device.

Every Mac, iPhone, and iPad ships with a GPU — often dozens of cores designed for massively parallel computation. No vector database on the market uses them.

Local-first vector search at GPU speed — 5–45× faster than the CPU path — on hardware your users already own.

01 The thesis

Every vector database on the market was shaped by a single architectural constraint: CPU or GPU — not both. Data lives in one memory pool or the other. Moving it between them means copying across a bus — PCIe — with hard bandwidth and latency costs. For the small, frequent operations that define real-time search, the transfer takes longer than the computation. The result is an industry built around choosing one processor or the other.
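A back-of-envelope calculation makes the cost concrete. The corpus size and the bandwidth figure (roughly 32 GB/s one way for PCIe 4.0 x16) are assumptions for illustration, not measurements from this project:

```rust
// Back-of-envelope: why host↔device transfers dominate small, frequent queries.
// Assumed link speed: ~32 GB/s one way (PCIe 4.0 x16). Illustrative only.

/// Milliseconds to move `bytes` over a link of `gb_per_s` (decimal GB).
fn transfer_ms(bytes: u64, gb_per_s: f64) -> f64 {
    bytes as f64 / 1e9 / gb_per_s * 1e3
}

fn main() {
    let vectors = 1_000_000u64;
    let dim = 768u64;
    let bytes = vectors * dim * 4; // f32 components
    println!("index size: {:.2} GB", bytes as f64 / 1e9); // 3.07 GB
    println!("one-way copy at 32 GB/s: {:.0} ms", transfer_ms(bytes, 32.0)); // 96 ms
}
```

Nearly a hundred milliseconds just to move the data once — before any distance is computed.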

Apple Silicon removes the wall. In unified memory, CPU and GPU read the same bytes at the same addresses. No copy. No transfer. No bus. Both processors, same data, same cycle.

We're building a vector database for that hardware. CPU and GPU read the same index, the same cache. Distance kernels and algebraic operations run on the GPU via Metal compute shaders. Graph traversal and filtering stay on the CPU where branching is cheaper. Both lanes execute concurrently across the same data, coordinated without copies.

Available as a standalone application for developers and power users — with a UI, a REST API, and the ability to benchmark on your own hardware. And as an embeddable SDK for app developers — a compiled engine that ships inside your iOS, iPadOS, or macOS app, giving it on-device vector search without a cloud dependency.

02 How it's built

The architecture in one read.

01

Shared index

CPU and GPU read and write the same vector index in unified memory. No mirrored structures, no host↔device copies, no synchronisation tax.

02

GPU execution

Distance, index scans, and algebraic predicates run as Metal compute shaders dispatched directly over shared memory. HNSW traversal and filtering stay on CPU. Both lanes execute concurrently.
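A minimal sketch of the two-lane idea, with a second thread standing in for the GPU; in the real engine the distance lane is dispatched as a Metal compute shader, and every name here is illustrative:

```rust
// Toy sketch: two lanes as concurrent readers of one shared buffer.
// A second thread stands in for the GPU here; all names are illustrative.

fn l2_sq(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let dim = 4;
    // One allocation; both lanes borrow it, nothing is copied.
    let index: Vec<f32> = vec![
        0.0, 0.0, 0.0, 0.0, // id 0
        1.0, 0.0, 0.0, 0.0, // id 1
        9.0, 9.0, 9.0, 9.0, // id 2
    ];
    let tags = ["a", "b", "a"]; // per-vector metadata for the filter lane
    let query = [1.0f32, 0.0, 0.0, 0.0];

    let (distances, allowed) = std::thread::scope(|s| {
        // "GPU" lane: brute-force distances over the shared index.
        let dist = s.spawn(|| {
            index.chunks_exact(dim).map(|v| l2_sq(v, &query)).collect::<Vec<f32>>()
        });
        // "CPU" lane: metadata filtering runs concurrently over the same data.
        let filt = s.spawn(|| tags.iter().map(|t| *t == "a").collect::<Vec<bool>>());
        (dist.join().unwrap(), filt.join().unwrap())
    });

    // Keep the closest vector that passes the filter.
    let mut best: Option<(usize, f32)> = None;
    for (i, (&d, &ok)) in distances.iter().zip(&allowed).enumerate() {
        if ok && best.map_or(true, |(_, bd)| d < bd) {
            best = Some((i, d));
        }
    }
    println!("best match: {:?}", best); // id 1 is closer but filtered out; id 0 wins
}
```

Scoped threads make the borrow explicit: both lanes read the same allocation, and neither ever owns a copy.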

03

DAG queries

Queries are directed acyclic graphs. A simple search is scan-then-rank. A hybrid search fans out — vector similarity on GPU, filtering on CPU — and converges at a fusion node.
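A toy version of such a plan, with hypothetical node names; the engine's actual plan format is not shown here:

```rust
// Toy sketch of a query as a DAG: two independent branches converge
// at a fusion node. Node names are hypothetical.

#[derive(Debug)]
enum Node {
    VectorScan { query: Vec<f32> },       // branch that would run on GPU
    MetadataFilter { tag: &'static str }, // branch that stays on CPU
    Fuse { k: usize },                    // convergence point
}

fn l2(a: &[f32], b: &[f32]) -> f32 {
    a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
}

fn main() {
    let dim = 2;
    let index = vec![0.0f32, 0.0, 3.0, 4.0, 1.0, 0.0];
    let tags = ["a", "b", "a"];

    // Nodes 0 and 1 are independent branches; node 2 depends on both.
    let plan = vec![
        Node::VectorScan { query: vec![1.0, 0.0] },
        Node::MetadataFilter { tag: "a" },
        Node::Fuse { k: 1 },
    ];

    let (mut scores, mut mask, mut result) = (vec![], vec![], vec![]);
    for node in &plan {
        match node {
            Node::VectorScan { query } => {
                scores = index.chunks_exact(dim).map(|v| l2(v, query)).collect();
            }
            Node::MetadataFilter { tag } => {
                mask = tags.iter().map(|t| t == tag).collect();
            }
            Node::Fuse { k } => {
                let mut ids: Vec<usize> =
                    (0..scores.len()).filter(|&i| mask[i]).collect();
                ids.sort_by(|&a, &b| scores[a].total_cmp(&scores[b]));
                result = ids.into_iter().take(*k).collect();
            }
        }
    }
    println!("fused top-k ids: {:?}", result); // closest "a"-tagged vector
}
```

Because the first two nodes share no edge, a scheduler is free to run them on different processors at the same time — which is exactly the fan-out described above.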

04

Zero-copy pipeline

Data stays in place from query to result. No intermediate copies, no serialisation between stages. Unified memory shares the address space; the software pipeline never duplicates the data.
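The discipline can be sketched as stages that borrow one buffer and hand forward only candidate ids; the stage names are illustrative:

```rust
// Toy sketch of a zero-copy pipeline: every stage borrows the same buffer,
// and only (id, score) pairs move between stages — never the vectors.
// Stage names are illustrative.

fn scan(index: &[f32], dim: usize, query: &[f32]) -> Vec<(usize, f32)> {
    index
        .chunks_exact(dim)
        .enumerate()
        .map(|(i, v)| {
            let d = v.iter().zip(query).map(|(x, y)| (x - y) * (x - y)).sum();
            (i, d)
        })
        .collect()
}

fn rank(mut candidates: Vec<(usize, f32)>, k: usize) -> Vec<usize> {
    // Sorting ids and scores, not vector data.
    candidates.sort_by(|a, b| a.1.total_cmp(&b.1));
    candidates.into_iter().take(k).map(|(i, _)| i).collect()
}

fn main() {
    let dim = 2;
    let index = vec![0.0f32, 1.0, 1.0, 0.0, 0.9, 0.1];
    let query = [1.0f32, 0.0];

    // The same allocation flows through both stages by reference.
    let top = rank(scan(&index, dim, &query), 2);
    println!("top-2 ids: {:?}", top);
}
```

The vector data is read exactly where it lives; what travels between stages is bookkeeping, not payload.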

05

Local-first

The entire retrieval path runs on the user's device. No cloud routing, no per-query cost, no data leaving the machine. Privacy and latency improve in the same move.

03 Status
Status: Active development
Since: 2025
Stack: Rust · Metal · UMA
Ships as: standalone app + embeddable SDK

04 Roadmap

Where it goes from here.

2025
foundation
  • Shared-memory vector index on Apple Silicon
    CPU + GPU both reading the same bytes; Metal compute shaders for distance and index operations.
    developed
  • Hybrid retrieval — vector + keyword
    BM25 and vector recall composed inside one query plan.
    developed
  • msearch integration
    First-class operator inside msearch pipelines — embed and search in the same execution plan.
    developed
2026
adoption
  • Standalone app + embeddable SDK
    A standalone application with UI and REST API, and a compiled xcframework for iOS, iPadOS, and macOS apps.
    in flight
  • DAG query engine
    Queries as dependency graphs with parallel branch execution across CPU and GPU.
    in flight
  • App Store availability
    Public release of the standalone application.
    in flight
  • MCP server
    Model Context Protocol integration for local tool-use and agent workflows.
    next
  • Memory-mapped persistence
    Larger-than-RAM indexes that keep the hot path zero-copy.
    next
2027
horizon
  • CUDA mirror runtime
    A CUDA backend that respects the same shared-memory contract on non-Apple GPUs.
    next
  • On-device rerankers
    Lightweight rerank models that run in the same memory plane as the index.
    next
Contact

Tell us what you're working on.

We reply within two business days. If a call would be faster, book a thirty-minute conversation.

We don't share your details. Replies come from a real person, not a CRM.