We released Marqo 1.0 on Jul 24, 2023, as one of the first end-to-end vector search platforms. Marqo is more than just a vector database. Standard vector databases focus on enabling fast similarity search over embeddings generated by you, often using an approximate nearest neighbour (ANN) algorithm. Marqo is more than that. We refer to Marqo as a documents-in-documents-out solution: the Marqo indexing API accepts documents, not embeddings, and the search API takes plain text (or an image URL) and returns document search results. While Marqo lets you get up and running with state-of-the-art semantic information retrieval in minutes, you can also work directly with embeddings, run your own custom machine learning models and much more.
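To make this concrete, here is a minimal sketch of the documents-in-documents-out flow using the Python client (the index name, fields and endpoint are illustrative, not from the original post):

```python
import marqo

mq = marqo.Client("http://localhost:8882")  # assumed local Marqo instance
mq.create_index("movies")

# Documents in: Marqo generates the embeddings at indexing time
mq.index("movies").add_documents(
    [{"Title": "Inception", "Description": "A thief infiltrates dreams to steal secrets."}],
    tensor_fields=["Description"],
)

# Documents out: a plain-text query returns ranked documents, not raw vectors
results = mq.index("movies").search(q="films about dreams")
```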
We are incredibly proud of what our users have been able to achieve with Marqo 1. However, we identified a critical use case in serving e-commerce workloads, which demanded a level of performance that Marqo 1's original architecture, relying on OpenSearch as a vector store, was not ideally suited for. To address these challenges and fulfil our vision, we decided to make a number of architectural changes and release Marqo 2.
Many of the pain points our users dealt with were related to Marqo 1’s backend vector store, OpenSearch. As a consequence of OpenSearch's already constrained performance, we were also unable to improve search recall as much as we wanted, as doing so required higher HNSW `efSearch` values, which would in turn increase search latency.
For Marqo 2, we looked at a number of open source and proprietary vector databases, including Milvus, Vespa, OpenSearch (AWS managed), Weaviate, Redis, and Qdrant. Published benchmarks frequently fail to answer many questions that are critical when choosing a vector database for high-throughput production vector search at very large scale.
Our internal benchmarks revealed Vespa as the optimal choice, satisfying all of our criteria, including high performance. For example, with 50M vectors, Vespa had a P50 latency of 16 ms vs 140 ms for Milvus on infrastructure of identical cost. Vespa is highly configurable, allowing for precise control over various aspects. It offers customisation of the underlying data structures, as well as the choice between memory and disk storage at the field level. Additionally, Vespa supports multi-phase ranking, providing further flexibility in search result optimisation. A proof of concept demonstrated that this configurability was invaluable, reducing Marqo's latency by more than half and increasing its throughput by up to 2x compared to Marqo 1.
In the following sections, we will delve into the enhancements introduced in Marqo 2, followed by a comprehensive benchmark comparing the recall and performance of Marqo 1 and Marqo 2 on Marqo Cloud. We will then provide a detailed exploration of the design and implementation of Marqo 2, and conclude with a glimpse into what the future holds for this innovative vector search solution.
For Marqo 2.0, our focus has been on delivering everything Marqo has to offer with much higher performance, scalability and reliability. In our benchmarks, we have seen a 2x improvement in throughput (QPS), a 2x reduction in P50 latency and a 2.3x reduction in P99 latency, while pushing recall up to 99%. In the context of search, recall is a metric that compares the number of relevant documents retrieved by an approximate search method to the total number of relevant documents identified by a k-nearest neighbour (k-NN) search, which serves as the ground truth.
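Formally, if \(R_{\text{ANN}}\) is the top-\(k\) approximate result set and \(R_{\text{kNN}}\) is the exact top-\(k\) ground truth, then:

$$\text{recall@}k = \frac{\lvert R_{\text{ANN}} \cap R_{\text{kNN}} \rvert}{k}$$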
From a functionality point of view, perhaps the most significant change in Marqo 2 is the choice of index type. When creating a Marqo 2 index, you can choose between an unstructured index (default) or a structured index. An unstructured index behaves almost identically to Marqo 1, inferring document structure and data types as new documents are indexed. A structured index, however, requires all document fields to be defined at index creation time, together with other properties that are determined by how the data will be queried. This information helps Marqo design the underlying schema to be as space-efficient and performant as possible, and provides stricter validation when adding documents or searching. We will talk more about index types in the Deep dive section of this post.
Other new features in Marqo 2 are:
- Configurable `efSearch`: When searching with Marqo 2, the `efSearch` parameter of the HNSW retrieval phase is configurable, so you can get the best recall. The `efSearch` parameter controls the trade-off between search accuracy and search speed.
- Exact nearest neighbour search: Exact (brute-force) k-NN search is available by setting the `approximate` parameter to `false`. This means Marqo is all you need to calculate recall and fine-tune your index configuration.
- New distance metrics: Support for the `euclidean`, `angular`, `dotproduct`, `prenormalized-angular`, and `hamming` distance metrics. L1 and Linf distance metrics are no longer supported.

Based on our experiments, a default value of 2000 was selected for the `efSearch` parameter. This value was determined to be the point of diminishing returns, where the gradient of the recall curve with respect to `efSearch` starts to flatten out. In other words, increasing `efSearch` beyond 2000 yields only marginal improvements in recall, making 2000 an optimal default for balancing accuracy and performance.
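As a sketch of how these parameters are passed at query time (the index name and query are hypothetical; `efSearch` and `approximate` are the Marqo 2 search parameters described above):

```python
import requests

# Search against an assumed local Marqo instance via the REST API
response = requests.post(
    "http://localhost:8882/indexes/my-index/search",
    json={
        "q": "modern skyscrapers",
        "limit": 10,
        "efSearch": 2000,      # higher values improve recall at the cost of latency
        "approximate": False,  # exact k-NN; use True (the default) for HNSW search
    },
)
print(response.json()["hits"])
```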
We used 10 million documents from the DBPedia dataset to compare Marqo 1 and Marqo 2 in terms of recall and search performance on Marqo Cloud. 2x marqo.performance shards and 1 replica were used for storage, with 5x marqo.GPU nodes for inference. For embeddings, Wikipedia article titles were used with the `hf/e5-base-v2` model (768 dimensions). Queries were picked randomly from a set of 1,000 options, e.g., "modern skyscrapers" and "photographing historic battlefield". We used Marqo 2's exact nearest neighbour search to obtain reference results for recall@10 calculation. Recall@k for a query is defined as the percentage of ANN search results that are also returned by an exact k-NN search, when searching with a limit of k.
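As a minimal sketch, recall@k can be computed from the two result sets like so (assuming Marqo's standard `_id` field on each hit):

```python
def recall_at_k(ann_hits: list, knn_hits: list, k: int = 10) -> float:
    """Fraction of approximate (ANN) results that also appear in the exact k-NN results."""
    ann_ids = {hit["_id"] for hit in ann_hits[:k]}
    knn_ids = {hit["_id"] for hit in knn_hits[:k]}
    return len(ann_ids & knn_ids) / k
```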
Many vector database benchmarks concentrate exclusively on nearest neighbour search. However, these results may not accurately predict real-world performance, as serving vector search involves much more than just nearest neighbour search: a typical search request also includes query preprocessing and inference to compute the query embedding before retrieval, and ranking and document lookup afterwards. The numbers in this benchmark represent this entire vector search process, including full-context inference of 512 tokens to compute the query embedding.
Using the default configuration of Marqo 2 (i.e., `efSearch` not specified in the search request), we obtained the following results:
| Metric | Marqo 1 | Marqo 2 |
|---|---|---|
| Recall@10 | 0.81 | 0.97 |
| Recall standard deviation | 0.16 | 0.05 |
| QPS | 147.8 | 157.7 |
| Avg. latency (ms) | 171.98 | 72.11 |
| P50 latency (ms) | 170 | 70 |
| P99 latency (ms) | 250 | 140 |
As shown in the table above, Marqo 2 offers not just a 20% improvement in recall, but also a much lower recall standard deviation. The significant improvement to search result quality is achieved in addition to, not in spite of, an increase in QPS and a major reduction of search latency, with a 59% reduction in P50 latency and a 44% reduction in P99 latency.
While a low median latency is a must for any production system, maintaining an acceptably low spread across different queries (i.e., low P90 and P99 latency) is often the biggest challenge. Specific combinations of query and filter string are often responsible for occasional slow search results, and such combinations tend to be impossible to predict in advance. To assess how Marqo 2 performs in such cases, we indexed an additional synthetic field with the value `tag_70` for 70% of the documents, `tag_20` for 20% of the documents and random values for the rest. As the tables below demonstrate, the latency gap between Marqo 1 and Marqo 2 grows even larger when searching with these filter values.
Filtering for `tag_70`:

| Metric | Marqo 1 | Marqo 2 | Marqo 2 Improvement |
|---|---|---|---|
| Avg. latency (ms) | 222.34 | 85.72 | +61.4% |
| P50 latency (ms) | 220 | 83 | +62% |
| P99 latency (ms) | 300 | 150 | +50% |
Filtering for `tag_20`:

| Metric | Marqo 1 | Marqo 2 | Marqo 2 Improvement |
|---|---|---|---|
| Avg. latency (ms) | 292.72 | 106.77 | +60% |
| P50 latency (ms) | 290 | 100 | +65% |
| P99 latency (ms) | 400 | 180 | +55% |
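For reference, a filtered search of the kind benchmarked above looks like this in the Python client (the index and field names are hypothetical; only the `tag_70` value comes from the benchmark description):

```python
# Filtered tensor search: ANN retrieval combined with a filter condition
results = mq.index("dbpedia-articles").search(
    q="modern skyscrapers",
    filter_string="tags:tag_70",  # Marqo filter DSL: field:value
    limit=10,
)
```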
At the core of Marqo 2 are structured indexes. A structured index requires all fields to be defined at index creation time (adding fields to an existing index will be supported in the future). A field definition consists of a name, a type (e.g., text, bool, int) and zero or more features. Possible field features are `lexical_search`, `filter` and `score_modifier`.
This predefined configuration allows Marqo to design the underlying Vespa schema in the most optimal manner as the index configuration tells Marqo both what will be stored and, more importantly, how the stored data will be queried. The latter is determined by field features and affects the data structures used for a field.
- The `lexical_search` feature makes a text or text array field available for lexical search by creating an inverted index enabled for BM25 ranking.
- The `filter` feature creates an in-memory B-tree index that enables fast exact and range filtering for the field. The index is further optimised for filtering by storing a bit vector posting list.
- The `score_modifier` feature makes a numeric field available as a score modifier at search time by adding it to the score modifiers tensor.

Index metadata is stored in a system schema, `marqo__settings` by default, so that Marqo instances remain stateless. Depending on its features, the value of a field can be stored more than once. For example, a field with both the `lexical_search` and `filter` features will create both an inverted index and a B-tree index. The value of a `filter` field is also stored in memory. For these fields, Marqo avoids disk access when reading documents by accessing only the in-memory copy of the field.
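A sketch of what creating a structured index with these field features can look like in the Python client (the index name, model and fields are illustrative; check the Marqo documentation for the exact parameters of your version):

```python
import marqo

mq = marqo.Client("http://localhost:8882")

mq.create_index(
    "products",
    type="structured",
    model="hf/e5-base-v2",
    all_fields=[
        {"name": "title", "type": "text", "features": ["lexical_search"]},
        {"name": "brand", "type": "text", "features": ["lexical_search", "filter"]},
        {"name": "price", "type": "float", "features": ["filter", "score_modifier"]},
        {"name": "in_stock", "type": "bool", "features": ["filter"]},
    ],
    tensor_fields=["title"],  # fields to embed and index for tensor search
)
```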
Structured indexes deliver the most robust and performant experience that Marqo has to offer, but they require upfront planning about data schema and query design. To offer the ease of use seen in Marqo 1, we introduced unstructured indexes. These allow users to experiment with Marqo and explore new data without making all decisions in advance. Unlike many other vector databases that necessitate defining a schema beforehand, Marqo's unstructured indexes enable rapid prototyping and vector search with just a few lines of code, without the constraints of a predefined schema.
An unstructured index takes only a single line of code to create:
```python
mq.create_index("my-first-index")
```
Once created, unstructured indexes can index and search semi-structured data, including data type inference and dynamic field recognition. All text fields are lexically searchable and filterable (up to a configurable max length limit) and all numeric fields are filterable and available as score modifiers. The search API is almost identical to structured indexes, but with some modest space efficiency and performance considerations (more on this later). Under the hood though, unstructured indexes are implemented very differently.
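For instance, here is a sketch of indexing heterogeneous documents into the unstructured index created above (field names and values are illustrative):

```python
# Field types are inferred on the fly; documents need not share a schema
mq.index("my-first-index").add_documents(
    [
        {"title": "Trail running shoes", "price": 120.0, "in_stock": True},
        {"title": "Waterproof hiking jacket", "brand": "Acme", "rating": 4.5},
    ],
    tensor_fields=["title"],
)

results = mq.index("my-first-index").search(q="gear for hiking in the rain")
```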
As noted before, a Marqo index is backed by a structured Vespa schema. In order to index semi-structured data, we need to be able to handle new fields that appear in a document on the fly and efficiently. One of the ideas we considered was storing all fields in a single `<string, string>` dictionary. We would need to maintain a mapping from field names to data types. This mapping would need to be strongly consistent and support serialisable updates, which would have considerable performance implications. Furthermore, storing numeric values as strings is wasteful and would increase disk and memory costs significantly.

Performance at scale was front of mind here. We decided on a generic, yet flexible Vespa schema design for unstructured indexes that did not require complex concurrency management, but also did not sacrifice performance. The Vespa schema backing an unstructured index comprises the following data structures:
- A `map<string, string>` storing all string fields, available for lexical search using an inverted index.
- A `map<string, string>` for strings longer than `filterStringMaxLength`, stored on disk and used only for Marqo document reconstruction.
- A `map<string, string>` for strings shorter than or equal to `filterStringMaxLength`, stored in an in-memory B-tree index for fast filtering.
- An `array<string>` holding a flattened list made up of all string arrays in the document. Values are prefixed before being stored so that each array’s items are easily identifiable and the original array fields can be recovered. Backed by an in-memory B-tree index for fast filtering.
- In-memory maps for numeric fields, enabling log(n) time range and exact filtering.
- A `map<string, byte>` for boolean fields, indexed in-memory for fast filtering.
- An `array<string>` containing all tensor field chunks, prefixed with field name and flattened.
- A single HNSW graph storing the embeddings of all tensor fields, enabling log(n) time similarity search.

To enable the flexibility that comes with unstructured indexes, we had to make some minor efficiency and functionality concessions. The most notable of these is how embeddings are stored when you have more than one tensor field. Structured indexes create an HNSW graph for each tensor field. Unstructured indexes, on the other hand, store all embeddings from all tensor fields in a single HNSW graph. This has implications both in terms of functionality and recall.
In terms of functionality, this difference means we were not able to implement searchable attributes with unstructured indexes. The searchable attributes feature enables the user to search only a subset of an index’s tensor fields. With a structured index, you can choose to search only one or more specific tensor fields. Behind the scenes, Marqo carries out retrieval on each tensor field’s HNSW graph, and ranks the union of the results to compute the final result set. With an unstructured index, this is not possible as there is only one HNSW graph to search.
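For example, a sketch of restricting tensor search to specific fields on a structured index (reusing the hypothetical `products` index from earlier):

```python
# Only the HNSW graphs of the listed tensor fields are searched
results = mq.index("products").search(
    q="lightweight laptop",
    searchable_attributes=["title"],
)
```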
Another consequence of this difference in the underlying HNSW data structure is its potential effect on recall. When searching more than one tensor field with a fixed value of `efSearch`, a structured index will retrieve and rank a larger number of documents. In such scenarios, expect to need a higher `efSearch` with an unstructured index than with a structured index to achieve your desired rate of recall.
Unstructured indexes can also be less memory efficient, though this depends on your data. An unstructured index will store all numeric values, as well as strings shorter than a configurable threshold (`filterStringMaxLength`), in memory. So if your documents contain many numeric values or short strings that you are not planning to use in filter strings, a structured index can be much more memory efficient, as you can simply omit the `filter` feature for these fields to keep them out of memory.
We are proud of the features and major performance improvements we have delivered with Marqo 2. In the next few months, we will be releasing a host of new, exciting features built on top of the foundation that Marqo 2 has laid, driven by the needs and feedback of our customers, including the open-source community. Some of these are high throughput partial updates, inference caching and more sophisticated result ranking options. Keep an eye out for what’s to come!
For more information and to get started with Marqo, check out the Getting Started Guide. Explore Marqo Cloud for a managed Marqo offering.