Project · msearch
an intelligence runtime — composable pipelines for search, inference, training, and agents
A DAG execution engine for composable intelligence pipelines. Search, infer, traverse, train, coordinate — composed into a single execution plan, parallel across CPU and GPU, zero-copy throughout. Profiled for Apple Silicon's unified memory, deployable on any server.
Intelligence pipelines, treated as a first-class runtime.
The dominant pattern for AI applications today is to chain separate services — search here, inference there, coordination somewhere else — across the network. As pipelines become more adaptive and agents compose their own plans at runtime, that architecture becomes the bottleneck.
msearch flips that. Pipelines are composed as a directed graph that the engine executes, built up from common intelligence primitives: vector search, inference, training, reasoning, indexing. Execution runs across GPU and CPU cores in parallel, each operation consuming the compute framework best suited to it.
The result is a class of application that hasn't really existed before: a dynamic, on-device intelligence runtime, with latency budgets measured in milliseconds and privacy guarantees that hold because the data never leaves the device.
Pipelines are graphs of operators, not chained service calls. Search, infer, traverse, train, coordinate — composed and optimised together as one execution plan.
Data stays in place from query to result. No intermediate copies, no serialisation between stages. The pipeline never duplicates what's already in memory.
Operators choose where they run. Graph traversal stays on CPU; distance kernels and inference lift onto GPU. Both lanes coordinate through shared memory without copies.
On unified memory, the zero-copy architecture extends all the way to GPU compute. No transfer tax between processors. A single binary that also deploys on any server.
mgraph carries structured state across environments. msearch turns that structured state into pipelines that produce structured output. The two are co-designed.
We reply within two business days. If a call would be faster, book a thirty-minute conversation.