Research shows that 80% of consumers abandon websites due to ineffective search experiences, and 79% are more likely to purchase when search results are on target. These figures emphasize the importance of robust search systems—especially in the fashion industry.
Traditional search engines often fall short for many tasks, especially when they depend solely on keyword searches or text-based data. This limitation becomes even more apparent in fields like fashion, where effectively combining textual and visual information is crucial. Marqo provides an elegant solution by seamlessly integrating advanced multimodal search capabilities, combining text and visuals to deliver precise, context-aware results. With Marqo you can combine the best of embedding/vector search and keyword search into a powerful hybrid search system. Marqo’s state-of-the-art fashion embedding models ensure that shoppers find exactly what they’re looking for. This blog will guide you through implementing Marqo within your search systems for state-of-the-art results.
Shoppers searching for fashion items face unique challenges that traditional search engines struggle to address. Product searches require a deep understanding of both visual and textual inputs, as well as the nuances of personal style, material preferences, and intent. Here’s why most search systems fail—and how Marqo solves these issues.
We’ll now take a look at how you can build a personalized fashion search engine with Marqo. We’ll start by collecting and preparing product and historical search data, then guide you through creating an index, adding documents, and using advanced search techniques like exact match boosters and revenue modifiers to enhance relevance and drive conversions.
To build an effective search system, you need two key types of data: product and historical search. The latter can be obtained using search logs. In this section, we’ll discuss and demonstrate these datasets.
Product data contains static, descriptive attributes about each item in the catalogue. This includes fields such as product images, titles, descriptions, categories, and stock status. Unlike search data, product data remains independent of user interactions or sales performance but is essential for providing context and filtering search results.
Here’s an example of product data:
image_url,_id,product_name,category,cost,in_stock
https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/16066811_1.jpeg,16066811_1,black skinny cotton twill cargo pants,pants,17.6,True
https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/20373492_0.jpeg,20373492_0,gray easy cargo pants,pants,25.66,True
https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/2307458_0.jpeg,2307458_0,black lace sheath dress,dresses,18.37,True
https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/2502265_0.jpeg,2502265_0,black cropped straight-leg pants red grain de poudre wool,pants,19.43,True
For this example, we use a ~15k subset of the Marqo/fashion200k dataset. To simulate a real-world scenario, we’ve added fields like product cost and stock status, which are crucial for filtering and ranking search results. You can find this dataset on our GitHub.
Search logs capture event-level interactions between users and the search system. These logs record each action (e.g., view, click, add to cart, purchase) that a user performs after submitting a query. This data is essential for understanding user behavior and generating metrics that can be used to improve search relevance.
Here’s an example of raw search log data:
query,_id,action,days_ago_action_performed
green candy dress,91055827_0,click,30
green candy dress,91055827_0,click,28
green candy dress,71246356_0,purchased,1
green candy dress,71246356_0,purchased,1
Each row represents a single interaction for a given query and product. However, to make this data useful for search optimization, it needs to be processed and aggregated. By grouping the raw event data by query-item pairs, we can derive key metrics such as total purchases, add-to-cart counts, total clicks, and revenue. This aggregated data, combined with product data, forms the basis of historical search data.
Historical data helps optimise search relevance by highlighting popular products for specific queries. For given queries, we are able to collect information about what products are returned in the search and how well these products perform. We can take our search log data from above and create our historical data. For information and instructions on how you can convert your search logs into historical data, see our GitHub.
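As a rough illustration of the aggregation step described above, the sketch below groups raw log rows by query-item pair using only the Python standard library. It is not the processing code from our GitHub. The `add_to_cart` action label, the `PRICES` lookup (values taken from the price column of the historical data shown later in this post), and the rule that an N-day revenue window counts purchases made within the last N days are all assumptions made for this example.

```python
import csv
import io
from collections import defaultdict

# Raw search-log rows in the same shape as the example above.
RAW_LOGS = """query,_id,action,days_ago_action_performed
green candy dress,91055827_0,click,30
green candy dress,91055827_0,click,28
green candy dress,71246356_0,purchased,1
green candy dress,71246356_0,purchased,1
"""

# Hypothetical price lookup; in practice this would come from the product data.
PRICES = {"91055827_0": 19.81, "71246356_0": 20.75}

# Assumed revenue windows: (window length in days, output field name).
WINDOWS = ((1, "one_day_revenue"), (3, "three_day_revenue"), (5, "five_day_revenue"))


def aggregate_logs(log_csv, prices):
    """Group event rows by (query, _id) and derive per-pair metrics."""
    stats = defaultdict(lambda: {
        "total_purchases": 0,
        "add_to_cart_count": 0,
        "total_click_count": 0,
        "one_day_revenue": 0.0,
        "three_day_revenue": 0.0,
        "five_day_revenue": 0.0,
    })
    for row in csv.DictReader(io.StringIO(log_csv)):
        entry = stats[(row["query"], row["_id"])]
        action = row["action"]
        days_ago = int(row["days_ago_action_performed"])
        if action == "click":
            entry["total_click_count"] += 1
        elif action == "add_to_cart":  # assumed label for add-to-cart events
            entry["add_to_cart_count"] += 1
        elif action == "purchased":
            entry["total_purchases"] += 1
            # Assumption: a purchase counts towards every window it falls inside.
            for window, field in WINDOWS:
                if days_ago <= window:
                    entry[field] += prices[row["_id"]]
    return dict(stats)


historical = aggregate_logs(RAW_LOGS, PRICES)
for (query, item_id), metrics in historical.items():
    print(query, item_id, metrics)
```

Each key of `historical` is one query-item pair, which corresponds to one row of the historical search data.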
Here is an example of historical search data.
query,_id,total_purchases,add_to_cart_count,total_click_count,one_day_revenue,three_day_revenue,five_day_revenue
green candy dress,71246356_0,17,120,250,145.25,249.0,352.75
green candy dress,91055827_0,6,60,120,59.43,118.86,118.86
green candy dress,91256064_0,2,40,80,36.74,36.74,36.74
green candy pant culottes,89986394_0,65,65,155,1400.13,1400.13,1936.35
We see four rows covering two different queries, the product items that were returned for each query, and how popular they are (popularity here being based on purchases, add-to-cart events, and clicks). For example, when a user searched for “green candy dress”, the item “green short dress” (id: 71246356_0) received 17 total purchases. This is a popular item for this query, so if a user were to search this query again, featuring this item is likely to perform well. We will see later how we can leverage this historical data for improved search and revenue with Marqo.
Now that we have discussed the types of data needed for successful search, we can begin building with Marqo. First, we need to install the Marqo Python client:
pip install marqo
Next, we define our Marqo API Key and set up our Marqo client. This establishes a connection to the Marqo API. To find your API Key, visit this article.
from marqo import Client
# Replace with your actual Marqo API key
api_key = "INPUT_YOUR_MARQO_API_KEY_HERE"
# Initialize the Marqo client with the API URL and your API key
mq = Client(url="https://api.marqo.ai", api_key=api_key)
We now define the settings for our Marqo index. This is where we specify what embedding model we’re going to use. Embedding models are machine learning models that transform data, such as text or images, into dense vector representations, enabling efficient and accurate similarity search.
At Marqo, we created a general-purpose, state-of-the-art fashion embedding model that you can use out of the box: Marqo/marqo-fashionSigLIP. This model was trained using Marqo’s novel framework, Generalized Contrastive Learning (GCL), to optimize over seven fashion-specific aspects: descriptions, titles, colors, details, categories, keywords, and materials.
While our general model is a great starting point, fine-tuning on actual data from your specific use case can significantly improve search accuracy and relevance. Fine-tuning allows the model to learn nuances unique to your data, resulting in better understanding and representation of your products. Fine-tuning also scales well to billions of products [1].
If you want to fine-tune on your own data, you can leverage Marqtune—Marqo’s fine-tuning solution designed to make this process straightforward. Marqtune helps you enhance your embedding models by optimizing them for specific tasks, enabling even better retrieval performance for your application.
Let’s take a look at how we can use Marqo’s state-of-the-art fashion embedding model when creating an index:
# Define the name for your Marqo index
index_name = "fashion-product-search"
# Define the index settings
settings = {
    "treatUrlsAndPointersAsImages": True,  # Indicates that URLs or pointers in the data should be treated as image inputs
    "model": "Marqo/marqo-fashionSigLIP",  # Specifies the embedding model to be used for indexing and querying
    "normalizeEmbeddings": True,  # Enables normalization of embeddings for better similarity calculations
    "inferenceType": "marqo.GPU",  # Specifies GPU-based inference for faster computations
    "numberOfShards": 1,  # Sets the number of shards for the index (affects distributed storage)
    "numberOfReplicas": 0,  # Specifies the number of replicas for the index (data redundancy)
    "numberOfInferences": 1,  # Sets the number of inference nodes serving the index
    "storageClass": "marqo.basic",  # Specifies the storage class for the index (basic is a standard option)
}
# Create the Marqo index with the specified settings
mq.create_index(index_name=index_name, settings_dict=settings)
It will take a few minutes to create your index. While your index is being created, the Marqo Cloud console will populate. The console will display a ‘Ready’ status when your index has been successfully created.
Now that we have created our Marqo index, we can begin adding documents to it. As mentioned, we have two types of data: product and historical search. We will first take a look at how we can combine these two and then how to add these documents to our Marqo index.
We can take our historical data and update our product data to include this information. We do this so that any product that has positively impacted revenue for a given query carries that information in its document. For example, let’s take the following item from our product catalogue.
Let’s look at how this item is structured in our product data. We can see all the key information (name, category, cost, etc.), all of which are important features in a search engine. Now let’s see if this item has any historical search data associated with it.
image_url,_id,product_name,category,cost,in_stock
https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/91256064_0.jpeg,91256064_0,green sheath dress,dresses,18.37,True
From our historical search data, we see that this item performed well from a revenue and click perspective for the query “green candy dress”. It received 80 total clicks and generated revenue in the last 5 days.
query,_id,total_purchases,add_to_cart_count,total_click_count,one_day_revenue,three_day_revenue,five_day_revenue
green candy dress,91256064_0,2,40,80,36.74,36.74,36.74
Given its popularity for this specific query, we can update this document with this information, i.e. by adding "one_day_revenue_modifiers.green_candy_dress": 36.74. Marqo supports maps of score modifiers, a flexible data structure we can use to add arbitrary score boosting for documents. We can use modifiers at query time by inserting the user-submitted query into the modifier field name: if a document has a matching key, it gets boosted; if it doesn’t, the modifier adds 0 to the score and has no impact. Including these modifier fields in the documents allows these items to be weighted for better search results.
Another crucial aspect of a successful search engine is ensuring that if a user’s query exactly matches the product name (and/or other attributes like the title, the product name in another language, etc.), that product appears at the top of the search feed. For example, if a user searches for “gray guipure cape draped dress” and there is a product in the catalogue with that exact name or title, this item should be the top returned result. This can be achieved with a field called exact_match_boosters, which is used to rank items that match the exact query. The exact_match_boosters field maps the product_name to a large number which we can use to boost it in search.
Given the two CSV datasets, we can combine them into a single CSV dataset. Joining the historical search data with the original product data gives rows like the following:
query,_id,total_purchases,add_to_cart_count,total_click_count,one_day_revenue,three_day_revenue,five_day_revenue,image_url,product_name,category,cost,in_stock
$58 multicolor pants compare,89199404_0,13,99,105,64.14,277.94,277.94,https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/89199404_0.jpeg,multicolor woven pants compare $58,pants,21.38,True
1 dress shoulder tank,73633821_1,26,49,194,503.62,503.62,503.62,https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/73633821_1.jpeg,black 1 shoulder tank dress,dresses,19.37,True
1 dress tank black,73633821_1,12,53,63,154.96,213.07,232.44,https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/73633821_1.jpeg,black 1 shoulder tank dress,dresses,19.37,True
1 midi gray slip,91270795_0,12,42,79,91.45,164.61,219.48,https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/91270795_0.jpeg,gray 2 1 cami slip midi dress,dresses,18.29,True
1 shoulder tank black,73633821_1,22,52,120,348.66,348.66,426.14,https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/73633821_1.jpeg,black 1 shoulder tank dress,dresses,19.37,True
1 x orange 2,90992177_0,69,94,182,731.12,731.12,1327.56,https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/90992177_0.jpeg,orange x paris 2 1 dress band belt,dresses,19.24,True
This can then be converted to JSON format ready to be loaded into Marqo. Each product item will look as follows:
{
    "image_url": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/91256064_0.jpeg",
    "_id": "91256064_0",
    "product_name": "green sheath dress",
    "category": "dresses",
    "cost": 18.37,
    "in_stock": true,
    "exact_match_boosters": {
        "green_sheath_dress": 1000
    },
    "one_day_revenue_modifiers": {
        "green_candy_dress": 36.74
    },
    "three_day_revenue_modifiers": {
        "green_candy_dress": 36.74
    },
    "five_day_revenue_modifiers": {
        "green_candy_dress": 55.11
    }
}
This is then repeated for all entries in the historical search data that match IDs in the product catalogue data. Please note, our GitHub repository contains all the code for data processing, creating modifiers, and performing searches.
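To make the document structure concrete, here is a minimal sketch of how product rows and historical rows could be merged into documents with exact match boosters and revenue modifiers. It is an illustration only, not the code from our GitHub. The helper names `to_key` and `build_documents` are hypothetical, and the key normalisation (lowercase, spaces replaced with underscores) is an assumption inferred from the example document above; real product names containing punctuation may need extra handling.

```python
def to_key(text):
    """Assumed key normalisation: lowercase with spaces replaced by underscores."""
    return text.strip().lower().replace(" ", "_")


def build_documents(products, historical_rows, exact_match_value=1000):
    """Attach exact match boosters and per-query revenue modifiers to each product."""
    docs = {}
    for product in products:
        doc = dict(product)
        # The product's own name maps to a large constant for exact-match boosting.
        doc["exact_match_boosters"] = {to_key(product["product_name"]): exact_match_value}
        docs[product["_id"]] = doc

    for row in historical_rows:
        doc = docs.get(row["_id"])
        if doc is None:
            continue  # Skip historical rows with no matching product
        query_key = to_key(row["query"])
        for window in ("one_day", "three_day", "five_day"):
            modifiers = doc.setdefault(f"{window}_revenue_modifiers", {})
            modifiers[query_key] = row[f"{window}_revenue"]
    return list(docs.values())


products = [{
    "image_url": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/fashion/fashion200k/91256064_0.jpeg",
    "_id": "91256064_0",
    "product_name": "green sheath dress",
    "category": "dresses",
    "cost": 18.37,
    "in_stock": True,
}]
historical_rows = [{
    "query": "green candy dress",
    "_id": "91256064_0",
    "one_day_revenue": 36.74,
    "three_day_revenue": 36.74,
    "five_day_revenue": 55.11,
}]

documents = build_documents(products, historical_rows)
print(documents[0])
```

A product that appears under several queries simply accumulates one key per query inside each modifier map.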
We can now add these updated documents to our Marqo index. A ready-made JSON file is available on our GitHub so you can get up and running quickly, but feel free to substitute your own search and product data here.
import json

json_file_path = "./data_processing/data/complete_data.json"  # Path to the JSON file containing the document data
with open(json_file_path, "r") as file:  # Open the JSON file in read mode
    documents = json.load(file)  # Load the contents of the file into a Python object (list or dict)

# Add the loaded documents to the specified Marqo index with customized configurations
res = mq.index(index_name).add_documents(
    documents,  # List of documents to be indexed
    client_batch_size=64,  # Number of documents to process and upload in each batch
    mappings={  # Custom field mapping for multimodal combination
        "image_title_multimodal": {  # Field name to be created
            "type": "multimodal_combination",  # Specifies that the field is a combination of text and image
            "weights": {  # Weights assigned to each component in the combination
                "product_name": 0.1,  # Low weight given to product name (text)
                "image_url": 0.9,  # High weight given to image URL (visual content)
            },
        }
    },
    tensor_fields=["image_title_multimodal"],  # Specifies that this field should be processed as tensors
    use_existing_tensors=True,  # Reuse existing tensors if they already exist for these documents
)
The product data is first loaded from a JSON file and then added to the index in batches for efficient processing. A key feature here is the use of a multimodal_combination field, which blends text data (product_name) and visual data (image_url) using assigned weights. By giving more weight to the image data, we ensure that the visual aspect of a product plays a dominant role in search queries. Additionally, the use of existing tensors, when available, speeds up the process by avoiding redundant computation. This approach helps create a rich, multimodal index that can provide more accurate and visually relevant search results for users.
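To build intuition for what the weights do, the toy sketch below combines a text vector and an image vector with the same 0.1 / 0.9 weights and then normalises the result. This is a conceptual illustration under our own assumptions (two-dimensional made-up vectors, a plain weighted sum followed by L2 normalisation); it is not Marqo's internal implementation, and `combine_embeddings` is a hypothetical helper.

```python
import math


def combine_embeddings(vectors, weights):
    """Weighted sum of per-field vectors, followed by L2 normalisation."""
    dim = len(next(iter(vectors.values())))
    combined = [0.0] * dim
    for field, weight in weights.items():
        for i, value in enumerate(vectors[field]):
            combined[i] += weight * value
    norm = math.sqrt(sum(v * v for v in combined))
    return [v / norm for v in combined] if norm else combined


# Made-up 2D "embeddings": the text points along axis 0, the image along axis 1.
toy_vectors = {"product_name": [1.0, 0.0], "image_url": [0.0, 1.0]}
weights = {"product_name": 0.1, "image_url": 0.9}

combined = combine_embeddings(toy_vectors, weights)
print(combined)  # The second (image) component dominates the combined vector
```

With these weights the combined vector points almost entirely in the image direction, which is the effect we want when visual similarity should drive the ranking.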
Our dataset is ~15k items which you can expect to take roughly 10 minutes to add to your index. However, you can start searching while documents are being added. Let’s now take a look at how you can perform searches with Marqo.
We will now take a look at searching over our index and explore 4 different search approaches. Please note, we have an easily deployable UI which allows you to simply select between these 4 search methods and observe the results. To do this, visit the GitHub repository.
We will start with a very basic search. We perform a vector search for the query “green candy dress”, return the top 20 results (limit=20), and filter the results to make sure that only items in stock are returned.
query = "green candy dress"
res = mq.index(index_name).search(
    query,
    limit=20,
    filter_string="in_stock:(true)"
)
Note, you can also filter other items in here such as available regions. These are the results:
We see immediately that the Marqo/marqo-fashionSigLIP embedding model is surfacing relevant and accurate search results, showing items that closely match the search query. This is great, but let’s look at how we can improve this search even further.
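If you want to inspect results in code rather than in a UI, the search response is a dictionary whose "hits" list holds one entry per result, including its "_id" and "_score". The helper below, `summarize_hits`, is a hypothetical convenience function, and `sample_response` is a made-up response used only so the snippet runs on its own; real scores will differ.

```python
def summarize_hits(response, fields=("product_name", "cost")):
    """Flatten a search response into a ranked list of small dictionaries."""
    rows = []
    for rank, hit in enumerate(response.get("hits", []), start=1):
        row = {"rank": rank, "_id": hit.get("_id"), "_score": hit.get("_score")}
        for field in fields:
            row[field] = hit.get(field)
        rows.append(row)
    return rows


# Made-up response for illustration only; the score is not a real result.
sample_response = {
    "hits": [
        {
            "_id": "91256064_0",
            "_score": 0.5,
            "product_name": "green sheath dress",
            "cost": 18.37,
        },
    ]
}

for row in summarize_hits(sample_response):
    print(row)
```

In practice you would pass the `res` object returned by the search call instead of `sample_response`.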
We’ve seen the results from a tensor search, but we can extend this to combine both tensor and lexical search methods with hybrid search. We specify the search method, search_method="HYBRID", as well as the hybrid parameters. Alpha is the linear weight given to the tensor ranking in the reciprocal rank fusion (RRF) score, with the remainder going to the lexical ranking; at alpha=0.5, this balances both lexical and tensor search. rrfK is the smoothing factor for RRF: the higher rrfK, the lower the contribution of RRF to the ranking. The product’s name is used as the attribute for lexical search. The rest of the settings are the same as before, with the addition of attributes_to_retrieve, which specifies which fields should be included in the search results.
query = "green candy dress"
res = mq.index(index_name).search(  # Perform a search on the specified Marqo index
    query,  # The search query provided by the user
    search_method="HYBRID",  # Use a hybrid search method combining lexical and tensor-based search
    limit=20,  # Limit the number of search results returned to 20
    attributes_to_retrieve=[  # Specify the fields to be included in the search results
        "product_name",  # Name of the product
        "image_url",  # URL of the product image
        "cost",  # Cost or price of the product
        # add any other items you want to retrieve
    ],
    hybrid_parameters={  # Configure parameters for the hybrid search
        "alpha": 0.5,  # Balance factor between lexical and tensor-based relevance (0.5 means equal weight)
        "rrfK": 60,  # Reciprocal Rank Fusion parameter to control blending of results
        "searchableAttributesLexical": ["product_name"],  # Specify which attributes to use for lexical search
    },
    show_highlights=False,  # Disable highlights in the search results
    filter_string="in_stock:(true)",  # Apply a filter to only return products that are in stock
)
With hybrid search, we expect more lexical based matches to appear in our searches which is exactly what we see — the top three items returned include “Green Candy Dress” and “Green Candy Off Shoulder Dress”.
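To see how alpha and rrfK interact, here is a simplified sketch of weighted reciprocal rank fusion. It assumes ranks start at 1 and that each list contributes weight / (rrfK + rank) to a document's fused score; Marqo's internal implementation may differ in these details. The function name `weighted_rrf` and the document IDs are hypothetical.

```python
def weighted_rrf(tensor_ranking, lexical_ranking, alpha=0.5, rrf_k=60):
    """Fuse two ranked lists of document IDs into one list of (id, score) pairs."""
    scores = {}
    # The tensor ranking contributes with weight alpha.
    for rank, doc_id in enumerate(tensor_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + alpha / (rrf_k + rank)
    # The lexical ranking contributes with the remaining weight (1 - alpha).
    for rank, doc_id in enumerate(lexical_ranking, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + (1 - alpha) / (rrf_k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from the two retrieval methods.
tensor_ranking = ["doc_a", "doc_b", "doc_c"]
lexical_ranking = ["doc_b", "doc_d", "doc_a"]

fused = weighted_rrf(tensor_ranking, lexical_ranking, alpha=0.5, rrf_k=60)
for doc_id, score in fused:
    print(doc_id, round(score, 6))
```

Documents that appear in both lists collect a contribution from each, so in this toy example doc_b (second in tensor, first in lexical) ends up ahead of doc_a (first in tensor, third in lexical). Because rrf_k sits in the denominator, a larger value shrinks the score gap between neighbouring ranks.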
The next question is, can we improve our search so that if someone inputs a query that matches a product name exactly, this will rank as the top result?
Hybrid search allows for the flexibility of both lexical and visual-based searches. However, for more general searches, it may not rank exact matches first. Take the query “green candy dress”, as searched above: the top result is a “Green Short Dress”, while the second result is the product whose name exactly matches the query. Here we can leverage the exact_match_boosters field to ensure any matching items are returned as the top result. Let’s take a look at how we can do this with modifiers in Marqo.
All of our search settings remain the same as our previous hybrid search, but this time we also specify scoreModifiersTensor and scoreModifiersLexical, which boost the scores for documents that have exact matches in the specified fields. This ensures that results with a perfect match to the query are ranked higher. In this example, if the query matches the product name of any document, that document will be boosted higher in the search results.
query = "green candy dress"
# Key used to look up modifiers; assumes document keys replace spaces with underscores, e.g. "green_candy_dress"
query_key = query.replace(" ", "_")
res = mq.index(index_name).search(  # Perform a search on the specified Marqo index
    query,  # The search query provided by the user
    search_method="HYBRID",  # Use a hybrid search method combining lexical and tensor-based search
    limit=20,  # Limit the number of search results returned to 20
    attributes_to_retrieve=[  # Specify the fields to be included in the search results
        "product_name",  # Name of the product
        "image_url",  # URL of the product image
        "cost",  # Cost or price of the product
        # add any other items you want to retrieve
    ],
    hybrid_parameters={  # Configure parameters for the hybrid search
        "alpha": 0.5,  # Balance factor between lexical and tensor-based relevance (0.5 means equal weight)
        "rrfK": 60,  # Reciprocal Rank Fusion parameter to control blending of results
        "scoreModifiersTensor": {  # Modifications to tensor-based scores
            "add_to_score": [  # Add a boost to the tensor score for exact matches
                {
                    "field_name": f"exact_match_boosters.{query_key}",  # Field to boost based on exact match
                    "weight": 1000,  # Large weight to prioritize exact matches
                },
            ]
        },
        "scoreModifiersLexical": {  # Modifications to lexical-based scores
            "add_to_score": [  # Add a boost to the lexical score for exact matches
                {
                    "field_name": f"exact_match_boosters.{query_key}",  # Field to boost based on exact match
                    "weight": 1000,  # Large weight to prioritize exact matches
                },
            ]
        },
        "searchableAttributesLexical": ["product_name"],  # Specify which attributes to use for lexical search
    },
    show_highlights=False,  # Disable highlights in the search results
    filter_string="in_stock:(true)",  # Apply a filter to only return products that are in stock
)
Let’s take a look at what this has done to our search results for the query “green candy dress”:
We can clearly see the ‘Green Candy Dress’ ranking top as desired. This was a great introduction to score modifiers in Marqo but they can be further leveraged — this time, taking advantage of historical data like revenue or clicks.
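The effect of an add_to_score modifier can be sketched in a few lines: each modifier adds weight × field value to the score when the document has the field, and adds nothing otherwise. The snippet below is a simplified model of that behaviour, not Marqo's implementation. The helper names, the base score of 0.016, and the two example documents are assumptions made for illustration.

```python
def lookup_modifier(document, field_name):
    """Resolve a 'map_name.key' field; a missing map or key contributes 0."""
    map_name, _, key = field_name.partition(".")
    value = document.get(map_name, {}).get(key)
    return value if isinstance(value, (int, float)) else 0.0


def apply_add_to_score(base_score, document, modifiers):
    """Add weight * field value to the base score for every modifier."""
    boost = sum(m["weight"] * lookup_modifier(document, m["field_name"]) for m in modifiers)
    return base_score + boost


query = "green candy dress"
query_key = query.replace(" ", "_")  # assumed key normalisation
modifiers = [{"field_name": f"exact_match_boosters.{query_key}", "weight": 1000}]

# Hypothetical documents: one whose name matches the query exactly, one that does not.
exact_doc = {
    "product_name": "green candy dress",
    "exact_match_boosters": {"green_candy_dress": 1000},
}
other_doc = {
    "product_name": "green sheath dress",
    "exact_match_boosters": {"green_sheath_dress": 1000},
}

base_score = 0.016  # made-up relevance score before modifiers
boosted_exact = apply_add_to_score(base_score, exact_doc, modifiers)
boosted_other = apply_add_to_score(base_score, other_doc, modifiers)
print(boosted_exact, boosted_other)
```

Only the document with the matching key receives the boost, which is large enough to dominate any ordinary relevance score; the other document keeps its original score.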
As mentioned earlier, we can use revenue values as modifiers. The process is the same as above, just with some additional fields. For this example, we use the revenue from the historical query data as well as revenue information from the product data itself.
First, let’s take a look at what our hybrid search with exact boost modifiers looks like for our previous query “green candy dress”.
We can now use our historic data to ensure that popular items are surfaced closer to the top. Here's some historical data for this query:
query,_id,total_purchases,add_to_cart_count,total_click_count,one_day_revenue,three_day_revenue,five_day_revenue,product_name,price,in_stock
green candy dress,91256064_0,20,40,80,36.74,36.74,55.11,green sheath dress,18.37,True
green candy dress,91055827_0,35,60,120,59.43,118.86,138.67,green long dress,19.81,True
green candy dress,71246356_0,100,120,250,145.25,249.0,352.75,green open-back dress,20.75,True
blue flowery dress,36432259_0,35,60,87,45.54,113.85,159.39,blue floral sleeveless v-neck dress,22.77,True
We see from our historical data that for the query “green candy dress”, the items 91256064_0 and 71246356_0 performed well amongst users: these items generated revenue and clicks. These items look as follows:
We can use modifiers in search to boost these popular items. Our search settings are exactly the same as before, but this time we include modifiers for one, three, and five day revenue for a given query. That is, if an item performs well (from a revenue perspective in this case), we boost this item. The weights for revenue modifiers were derived through hyperparameter optimization.
query = "green candy dress"
# Key used to look up modifiers; assumes document keys replace spaces with underscores, e.g. "green_candy_dress"
query_key = query.replace(" ", "_")
res = mq.index(index_name).search(
    query,
    search_method="HYBRID",
    limit=50,
    attributes_to_retrieve=["product_name", "image_url", "cost"],
    filter_string="in_stock:(true)",
    show_highlights=False,
    hybrid_parameters={
        "alpha": 0.5,
        "rrfK": 60,
        "scoreModifiersTensor": {
            "add_to_score": [
                {"field_name": f"exact_match_boosters.{query_key}", "weight": 1000},
                {"field_name": f"one_day_revenue_modifiers.{query_key}", "weight": 0.000005},
                {"field_name": f"three_day_revenue_modifiers.{query_key}", "weight": 0.0000016},
                {"field_name": f"five_day_revenue_modifiers.{query_key}", "weight": 0.000001},
            ]
        },
        "scoreModifiersLexical": {
            "add_to_score": [
                {"field_name": f"exact_match_boosters.{query_key}", "weight": 1000},
                {"field_name": f"one_day_revenue_modifiers.{query_key}", "weight": 1},
                {"field_name": f"three_day_revenue_modifiers.{query_key}", "weight": 0.3},
                {"field_name": f"five_day_revenue_modifiers.{query_key}", "weight": 0.2},
            ]
        },
        "searchableAttributesLexical": ["product_name"],
    },
)
With these modifiers in place, we expect the items shown above to rank more highly when searching for “green candy dress”. Let’s look at the results:
Both of these items are now in the top 5 search results, with the “Green Open-Back Dress” ranking higher than the “Green Sheath Dress”. This is what we expect, as this item generated more revenue for this particular query. This example used revenue, but the same approach can be extended to any type of data.
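As a quick sanity check on this ordering, the sketch below works through the arithmetic of the lexical revenue modifiers for the two items, using the revenue figures from the historical data above and the lexical weights from the search call (1, 0.3, and 0.2). It only computes the added boost and leaves out the underlying relevance scores; the helper name `revenue_boost` is hypothetical.

```python
# Lexical weights taken from the scoreModifiersLexical settings above.
LEXICAL_WEIGHTS = {
    "one_day_revenue": 1,
    "three_day_revenue": 0.3,
    "five_day_revenue": 0.2,
}

# Revenue figures for the query "green candy dress" from the historical data.
revenue_by_item = {
    "71246356_0": {"one_day_revenue": 145.25, "three_day_revenue": 249.0, "five_day_revenue": 352.75},
    "91256064_0": {"one_day_revenue": 36.74, "three_day_revenue": 36.74, "five_day_revenue": 55.11},
}


def revenue_boost(revenues, weights):
    """Sum of weight * revenue over the one, three, and five day windows."""
    return sum(weights[field] * revenues[field] for field in weights)


boosts = {item_id: revenue_boost(rev, LEXICAL_WEIGHTS) for item_id, rev in revenue_by_item.items()}
for item_id, boost in boosts.items():
    print(item_id, round(boost, 3))
```

The open-back dress (71246356_0) receives a boost of about 290.5 against about 58.8 for the sheath dress (91256064_0), which is consistent with the ordering seen in the results.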
We’ve performed four different search methods and now, we can evaluate and compare the performance of these search methods against a dataset by generating comparative reports.
This graph shows how well the search strategies rank the most relevant results at the top. Mean Average Precision (MAP) tells us, on average, how often the top-ranked items are what users actually want.
The blue line (tensor) performs poorly throughout, meaning it struggles to consistently rank relevant items at the top. The orange, green, and red lines (hybrid methods) perform significantly better. Among them, the red line (hybrid with exact boosters and modifiers) performs best, suggesting that this search method is optimal when wanting to rank relevant items at the top.
NDCG is all about ranking quality, with extra emphasis on getting the most relevant results near the top of the list. A higher NDCG score means users see the best matches without needing to scroll too far.
Again, the blue line (tensor) performs significantly worse. If you’re building a search system where users need quick, high-quality results (like when searching for products), the hybrid approach with boosters and modifiers is the best option.
Recall measures how many of the relevant items the search strategy finds overall. A higher recall score means users are seeing more of the items that match their query.
The blue line (tensor) starts low and only improves gradually as K increases (i.e., when users look at a lot of results). The red line (hybrid_with_exact_boosters_and_modifiers) hits nearly perfect recall very quickly, meaning it retrieves almost all relevant items early on.
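For readers who want to compute these metrics themselves, here is a minimal sketch of average precision, NDCG, and recall at K for a single query. These follow common textbook definitions and may differ in detail from the evaluation code behind the graphs; MAP is the mean of the average precision values over all evaluated queries. The product IDs and relevance labels are made up.

```python
import math


def average_precision_at_k(ranked_ids, relevant_ids, k):
    """Average of the precision values at each rank where a relevant item appears."""
    hits, precision_sum = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    denominator = min(len(relevant_ids), k)
    return precision_sum / denominator if denominator else 0.0


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant items that appear in the top k results."""
    if not relevant_ids:
        return 0.0
    found = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return found / len(relevant_ids)


def ndcg_at_k(ranked_ids, gains, k):
    """Discounted cumulative gain of the ranking divided by that of the ideal ranking."""
    dcg = sum(
        gains.get(doc_id, 0.0) / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
    )
    ideal_gains = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(gain / math.log2(rank + 1) for rank, gain in enumerate(ideal_gains, start=1))
    return dcg / idcg if idcg else 0.0


# Made-up ranking for one query: p1 and p3 are the relevant products.
ranked = ["p1", "p2", "p3", "p4"]
relevant = {"p1", "p3"}
gains = {"p1": 1.0, "p3": 1.0}

ap = average_precision_at_k(ranked, relevant, k=4)
recall_2 = recall_at_k(ranked, relevant, k=2)
recall_4 = recall_at_k(ranked, relevant, k=4)
ndcg = ndcg_at_k(ranked, gains, k=4)
print(ap, recall_2, recall_4, ndcg)
```

Moving p3 up to second place would raise both the average precision and the NDCG to 1.0, which is the kind of improvement the boosters and modifiers aim for.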
From this analysis, if you’re building a search experience where ranking quality and relevance matter, hybrid search with boosters and modifiers is the best option. This approach ensures users find exactly what they’re looking for, faster and more accurately.
In this blog, we demonstrated how Marqo can be configured to handle fashion-specific data, integrate historical search information, and apply advanced techniques like exact match boosters and revenue modifiers. The result? A smarter, context-aware search engine that evolves with user behaviour, ensuring that the most relevant products are always in front of your customers.
To get the exact same modifiers and configurations using the same historic data, see our GitHub which contains further information on how to obtain this.
As mentioned earlier in the article, the best search performance comes from a model fine-tuned on your data. For more information on how Marqo can help you achieve this as well as a fully state-of-the-art search system, book a demo or contact us.