February 18, 2026

How to dramatically improve search speed in Marqo

Search speed is a revenue lever in ecommerce. Even small increases in latency can create friction, reduce engagement, and lower conversion rates, especially on mobile. Modern retailers need search that is not only intelligent, but also fast enough to support real time product discovery at scale.

Marqo is an AI native ecommerce search and product discovery platform designed to deliver low latency, high relevance search across large and complex product catalogs. Marqo trains a dedicated AI model per retailer, enabling deeper catalog understanding while maintaining the performance required for high traffic ecommerce experiences.

In this post, we will walk through practical ways to dramatically improve search speed in Marqo, including configuration choices and deployment optimizations that help retailers deliver faster discovery and better shopper experiences.

Why Search Speed Matters for Ecommerce

Fast search improves product discovery: it reduces bounce rates and helps shoppers find relevant products quickly. High performance search is especially important for retailers with large catalogs, frequent inventory updates, and high traffic volumes, where every delay costs conversion and revenue.

Decrease the number of searchable_attributes

The number of searchable_attributes in your query directly affects its speed. By default, every attribute is treated as a searchable attribute.

In Marqo, tensor indexes are stored in memory as a collection of HNSW indexes. Each attribute of a document has its own HNSW index.

Depending on the hardware you are using, having many searchable_attributes can cause some attributes' indexes to be swapped out of memory to make room for others at query time, which can significantly slow down search.

For the full search API documentation (including the REST interface), see the Marqo search docs.

The following query only searches the “title” attribute of the documents in the index.
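As a sketch, the request below restricts the search to "title" only. The index name and query text are illustrative, the Python client call is shown in a comment, and the REST-style field names are assumptions based on Marqo's search API docs:

```python
# With the Python client, this would be roughly:
#   mq.index("my-index").search(q="blue dress", searchable_attributes=["title"])
# Below we build the equivalent REST-style request body.
body = {
    "q": "blue dress",
    "searchableAttributes": ["title"],  # only the "title" HNSW index is searched
}
```

Limiting the query to one attribute means only one HNSW index per document field needs to be resident in memory for that search.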

We also recommend that you set any fields that you don't want to apply tensor search to as non_tensor_fields at indexing time, to improve indexing throughput and minimise disk usage. You can still apply filtering and lexical search to non-tensor-fields.
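A minimal sketch of this at indexing time, assuming illustrative document fields and the non_tensor_fields parameter from Marqo's add_documents API (REST field name assumed):

```python
# Python client equivalent (hypothetical index and fields):
#   mq.index("my-index").add_documents(docs, non_tensor_fields=["price", "sku"])
docs = [
    {"_id": "1", "title": "Blue linen dress", "price": 59.0, "sku": "BL-001"},
]
request = {
    "documents": docs,
    # "price" and "sku" stay filterable and lexically searchable,
    # but no tensor embeddings are computed or stored for them.
    "nonTensorFields": ["price", "sku"],
}
```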

Decrease the number of attributes_to_retrieve

By default, Marqo will retrieve all attributes of any documents that match the query.

Therefore, if you have attributes that contain long passages of text, we recommend excluding them from attributes_to_retrieve unless you truly need them: the volume of data transferred can have a significant impact on the time taken to complete the search.

Note that even if you don't return an attribute, it can still be searched, and the relevant component will be returned in the highlights (which are included by default); see the search documentation for details.

You can combine searchable_attributes and attributes_to_retrieve to return different attributes from the ones you are searching:
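For example, the sketch below searches over "title" only but returns just the fields needed to render a result tile. Field names are illustrative; the REST-style keys are assumed from Marqo's search API docs:

```python
# Python client equivalent:
#   mq.index("my-index").search(
#       q="running shoes",
#       searchable_attributes=["title"],
#       attributes_to_retrieve=["title", "price", "image_url"],
#   )
body = {
    "q": "running shoes",
    "searchableAttributes": ["title"],                       # what is searched
    "attributesToRetrieve": ["title", "price", "image_url"],  # what is returned
}
```

Long-text attributes such as a full description can still be searched this way without being transferred back in every response.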

Co-locate your resources in the same region as Marqo

Marqo Cloud is developing multi-region capability, but at the time of writing, Marqo Cloud services are located only in North Virginia (us-east-1 on AWS).

Network latency is significantly decreased when you locate your resources either in us-east-1 itself, or in a Google or Azure data centre close by.

Avoid requesting more than limit=10 results; where possible, use pagination instead

Marqo performs significantly faster when you limit the number of results requested in any one call. Pagination is available from Marqo 0.0.11 onwards.

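A small sketch of paging through results ten at a time, assuming the limit and offset search parameters from Marqo's pagination docs (index name and query are illustrative):

```python
# Python client equivalents:
#   mq.index("my-index").search(q="sneakers", limit=10, offset=0)   # first page
#   mq.index("my-index").search(q="sneakers", limit=10, offset=10)  # second page
def page_params(page: int, page_size: int = 10) -> dict:
    """Build limit/offset search parameters for a given zero-based page."""
    return {"limit": page_size, "offset": page * page_size}

first_page = page_params(0)   # {"limit": 10, "offset": 0}
second_page = page_params(1)  # {"limit": 10, "offset": 10}
```

Keeping limit small and paging on demand means each request only retrieves (and transfers) the results a shopper is actually looking at.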

Use rerankers with caution

Using larger rerankers can significantly improve results and provide benefits like improved image highlighting. However, the reranker must also compute over all returned results. Therefore the latency increase from a reranker is as follows:

total additional latency from reranker = (time for a single reranker inference) × len(searchable_attributes) × limit

We recommend consulting the reranking documentation before using a reranker, and testing the latency on your own data before integrating it into an application.

You can integrate a reranker at search time:
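As a hedged sketch: "owl/ViT-B/32" is one of the rerankers mentioned in Marqo's reranking docs (useful for image highlighting), but treat the exact model name and the REST field name below as assumptions and check the docs for your version:

```python
# Python client equivalent (parameter name per the reranking docs):
#   mq.index("my-index").search(
#       q="red handbag",
#       searchable_attributes=["title"],
#       reranker="owl/ViT-B/32",
#   )
body = {
    "q": "red handbag",
    "searchableAttributes": ["title"],
    "reRanker": "owl/ViT-B/32",  # REST field name assumed
}
```

Note how this interacts with the latency formula above: keeping searchable_attributes and limit small bounds the number of reranker inference calls per query.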

Consider using smaller or more performant models

When choosing a model in Marqo, remember to consult our models reference. Search speed is impacted by the model used for inference.

You can set a specific model when creating the index as follows:
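A minimal sketch, assuming the index-settings shape and a model name from Marqo's models reference (verify both against the version you are running):

```python
# Python client equivalent:
#   mq.create_index("my-index", model="onnx/all_datasets_v4_MiniLM-L6")
index_settings = {
    "index_defaults": {
        # A smaller ONNX text model; faster inference than larger CLIP-style
        # models, at some cost in relevance. See the models reference.
        "model": "onnx/all_datasets_v4_MiniLM-L6",
    }
}
```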

Use a GPU machine to run Marqo

When using Marqo, using a GPU will provide the lowest latency in the majority of cases, particularly when using Marqo with images. Note that in order to start Marqo on a GPU machine the starting command is slightly different (detailed instructions here):

Once NVIDIA Docker is installed (as per the above link), run the following to start Marqo, noting the --gpus all flag:
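A hedged sketch of the GPU start command based on Marqo's getting-started docs; check the linked instructions for the exact flags and image tag for your version:

```shell
# Remove any previous container, then start Marqo with GPU access enabled.
docker rm -f marqo
docker run --name marqo --gpus all -p 8882:8882 marqoai/marqo:latest
```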

Then, to take advantage of the GPU speed-up, you need to specify cuda as the device when searching. Note that for text, depending on the model, you may find the CPU faster at search time, in which case you can keep the default device="cpu".
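A minimal sketch, assuming the device parameter from Marqo's search API (index name and query are illustrative):

```python
# Python client equivalent:
#   mq.index("my-index").search(q="summer dress", device="cuda")
# For text models where CPU inference is faster, simply omit the parameter
# or pass device="cpu" (the default).
params = {"device": "cuda"}  # sent with the search request
```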

Summing up

Search speed is not just a technical metric. In ecommerce, it directly impacts product discovery performance, shopper satisfaction, and conversion rates. A fast search experience keeps shoppers engaged, supports higher session depth, and increases the likelihood of purchase.

Marqo is built for ecommerce at scale, combining AI native relevance with the low latency performance required for modern retail. By optimizing Marqo for your catalog size and traffic patterns, you can deliver faster search experiences and unlock stronger revenue outcomes from product discovery.

If you want to see how Marqo delivers fast, catalog-trained search and discovery in production ecommerce environments, book a demo.

Ready to explore better search?

Marqo drives more relevant results, smoother discovery, and higher conversions from day one.

Talk to a Search Expert