Project · Apple Silicon Vector Database
An embeddable and standalone vector database designed to utilise the full GPU capabilities of your Apple device
Every Mac, iPhone, and iPad ships with a GPU — potentially dozens of cores designed for massively parallel computation. No vector database on the market uses them.
Local-first vector search at GPU speed — 5–45× over CPU — on hardware your users already own.
Every vector database on the market was shaped by a single architectural constraint: CPU or GPU — not both. Data lives in one memory pool or the other. Moving it between them means copying across a bus — PCIe — with hard bandwidth and latency costs. For the small, frequent operations that define real-time search, the transfer takes longer than the computation. The result is an industry built around choosing one processor or the other.
Apple Silicon removes the wall. In unified memory, CPU and GPU read the same bytes at the same addresses. No copy. No transfer. No bus. Both processors, same data, same cycle.
We're building a vector database for that hardware. CPU and GPU read the same index, the same cache. Distance kernels and algebraic operations run on the GPU via Metal compute shaders. Graph traversal and filtering stay on the CPU where branching is cheaper. Both lanes execute concurrently across the same data, coordinated without copies.
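To make the division of labour concrete: the GPU's share of a query is a data-parallel map of one distance function over many stored vectors, all read from the same unified-memory buffer the CPU wrote. The C++ below is a minimal single-threaded sketch of that per-vector work; the names (`squared_l2`, `brute_force_search`) are illustrative, not the product's API, and the shipped kernel is a Metal compute shader rather than a loop.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Squared Euclidean distance between a query and one stored vector.
// On the GPU this body runs once per thread, one thread per stored
// vector, every thread reading the same shared-memory index.
float squared_l2(const std::vector<float>& q, const std::vector<float>& v) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < q.size(); ++i) {
        float d = q[i] - v[i];
        acc += d * d;
    }
    return acc;
}

// CPU-side stand-in for the GPU dispatch: score every vector, then rank.
// The engine would issue this as one Metal compute dispatch, not a loop.
std::vector<std::size_t> brute_force_search(
        const std::vector<float>& query,
        const std::vector<std::vector<float>>& index,
        std::size_t k) {
    std::vector<std::size_t> ids(index.size());
    std::iota(ids.begin(), ids.end(), 0);
    std::sort(ids.begin(), ids.end(), [&](std::size_t a, std::size_t b) {
        return squared_l2(query, index[a]) < squared_l2(query, index[b]);
    });
    ids.resize(std::min(k, ids.size()));
    return ids;
}
```

Because the per-vector body has no branches and no shared state, it parallelises trivially: one GPU thread per stored vector, while the CPU concurrently walks the graph index.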
Available as a standalone application for developers and power users — with a UI, a REST API, and the ability to benchmark on your own hardware — and as an embeddable SDK for app developers: a compiled engine that ships inside your iOS, iPadOS, or macOS app, giving it on-device vector search without a cloud dependency.
CPU and GPU read and write the same vector index in unified memory. No mirrored structures, no host↔device copies, no synchronisation tax.
Distance, index scans, and algebraic predicates run as Metal compute shaders dispatched directly over shared memory. HNSW traversal and filtering stay on CPU. Both lanes execute concurrently.
Queries are directed acyclic graphs. A simple search is scan-then-rank. A hybrid search fans out — vector similarity on GPU, filtering on CPU — and converges at a fusion node.
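A hypothetical sketch of such a plan, assuming nothing about the engine's real types: builder functions for four node kinds and a toy single-threaded interpreter. In the real engine the scan lane would be a GPU dispatch and the filter lane would run concurrently on the CPU; here both lanes simply converge at the fusion node.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <memory>
#include <numeric>
#include <set>
#include <vector>

// Illustrative plan nodes, not the engine's real types.
struct PlanNode {
    enum Kind { Scan, Filter, Fuse, Rank } kind;
    std::vector<float> query;                    // Scan: GPU similarity lane
    std::function<bool(std::size_t)> predicate;  // Filter: CPU metadata lane
    std::shared_ptr<PlanNode> left, right;       // children (Rank uses left)
    std::size_t k = 0;                           // Rank: results to keep
};
using Plan = std::shared_ptr<PlanNode>;

Plan scan(std::vector<float> q) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Scan; n->query = std::move(q); return n;
}
Plan filter(std::function<bool(std::size_t)> p) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Filter; n->predicate = std::move(p); return n;
}
Plan fuse(Plan a, Plan b) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Fuse; n->left = a; n->right = b; return n;
}
Plan rank(Plan child, std::size_t k) {
    auto n = std::make_shared<PlanNode>();
    n->kind = PlanNode::Rank; n->left = child; n->k = k; return n;
}

std::vector<std::size_t> run(const Plan& n,
                             const std::vector<std::vector<float>>& index) {
    switch (n->kind) {
    case PlanNode::Scan: {  // score every vector, order ids by distance
        std::vector<std::size_t> ids(index.size());
        std::iota(ids.begin(), ids.end(), 0);
        auto dist = [&](std::size_t i) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < n->query.size(); ++j) {
                float d = n->query[j] - index[i][j];
                acc += d * d;
            }
            return acc;
        };
        std::sort(ids.begin(), ids.end(),
                  [&](std::size_t a, std::size_t b) { return dist(a) < dist(b); });
        return ids;
    }
    case PlanNode::Filter: {  // ids that pass the metadata predicate
        std::vector<std::size_t> ids;
        for (std::size_t i = 0; i < index.size(); ++i)
            if (n->predicate(i)) ids.push_back(i);
        return ids;
    }
    case PlanNode::Fuse: {  // convergence: left's order, right's membership
        auto a = run(n->left, index);
        auto b = run(n->right, index);
        std::set<std::size_t> allowed(b.begin(), b.end());
        std::vector<std::size_t> out;
        for (auto id : a) if (allowed.count(id)) out.push_back(id);
        return out;
    }
    case PlanNode::Rank: {  // keep the top-k
        auto ids = run(n->left, index);
        ids.resize(std::min(n->k, ids.size()));
        return ids;
    }
    }
    return {};
}
```

So `rank(scan(q), k)` is the simple scan-then-rank plan, and `rank(fuse(scan(q), filter(p)), k)` is the hybrid fan-out that converges at the fusion node.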
Data stays in place from query to result. No intermediate copies, no serialisation between stages. UMA gives both processors one address space; the software pipeline never duplicates the data.
The entire retrieval path runs on the user's device. No cloud routing, no per-query cost, no data leaving the machine. Privacy and latency improve in the same move.
We reply within two business days. If a call would be faster, book a thirty-minute conversation.