All of the code in this article can be run in Google Colab here. Let's dive in!
In this article, we'll be using the end-to-end vector search engine, Marqo. First, we need to install this with pip:
pip install marqo
Next, we need to initialize the Marqo client, which lets us create indexes and add documents to them. To obtain your API key, see this article.
import marqo
# Set up Marqo Client
api_key = "API_KEY" # To find your API Key, see https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)
For information on how to set up Marqo locally, see our GitHub.
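Hardcoding an API key in a notebook makes it easy to leak. The sketch below reads it from an environment variable instead. The variable names `MARQO_API_KEY` and `MARQO_URL` are our own hypothetical choice, not something Marqo requires. If no key is set, it falls back to a local instance at `http://localhost:8882`, the port Marqo's Docker image listens on by default.

```python
import os

# Hypothetical environment-variable names; Marqo does not require these.
API_KEY_VAR = "MARQO_API_KEY"
URL_VAR = "MARQO_URL"

CLOUD_URL = "https://api.marqo.ai"
LOCAL_URL = "http://localhost:8882"  # default port for a local Marqo container


def marqo_connection_settings(env=None):
    """Return (url, api_key) to pass to marqo.Client.

    With an API key present, default to Marqo Cloud; otherwise assume a
    local instance that needs no key. MARQO_URL overrides the URL either way.
    """
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_VAR)
    if api_key:
        return env.get(URL_VAR, CLOUD_URL), api_key
    return env.get(URL_VAR, LOCAL_URL), None
```

Assuming `marqo.Client` accepts `api_key=None` for a local instance, usage would be `url, api_key = marqo_connection_settings()` followed by `mq = marqo.Client(url, api_key=api_key)`.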
Let’s set up the configuration for our index. This configuration specifies the index type, the embedding model, and the settings for handling image URLs and inference. We will be using marqo-ecommerce-embeddings-L, Marqo's state-of-the-art embedding model for ecommerce. For more information, see our blog post.
settings = {
    "type": "unstructured",  # Set the index type as unstructured data
    "model": "Marqo/marqo-ecommerce-embeddings-L",  # Name of the embedding model for this index
    "modelProperties": {
        "name": "hf-hub:Marqo/marqo-ecommerce-embeddings-L",  # Name of the model on Hugging Face Hub
        "dimensions": 1024,  # Embedding dimensionality of the L model
        "type": "open_clip",  # Model type, using OpenCLIP architecture
    },
    "treatUrlsAndPointersAsImages": True,  # Treat URL fields as images to download and embed
    "inferenceType": "marqo.CPU.large",  # Use Marqo's large CPU inference instance
}
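With `treatUrlsAndPointersAsImages` enabled, string fields that look like URLs are loaded as images rather than embedded as text. To make that concrete for our three fields, the sketch below approximates the decision as "an http(s) URL with a host". This is an illustrative assumption only, not Marqo's actual detection logic; the setting's name suggests it also covers other pointers such as file paths.

```python
from urllib.parse import urlparse


def looks_like_image_pointer(value):
    """Rough stand-in for the URL check: an http(s) URL with a host."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def classify_fields(document):
    """Label each field as 'image' or 'text' under the approximation above."""
    return {
        field: "image" if looks_like_image_pointer(value) else "text"
        for field, value in document.items()
    }


doc = {
    "image_url": "https://example.com/images/coffee-machine.jpg",
    "query": "coffee machine",
    "title": "Stainless Steel Espresso Maker",
}
print(classify_fields(doc))  # {'image_url': 'image', 'query': 'text', 'title': 'text'}
```

Under this approximation, only `image_url` is embedded as an image, while `query` and `title` go through the text side of the model.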
With the settings defined, the next key line of code creates our index with Marqo.
index_name = "marqo-ecommerce-l"  # Specify the name of the index

try:
    mq.index(index_name).delete()  # Delete the existing index if it already exists to avoid conflicts
except Exception:
    pass  # If the index does not exist, skip deletion

mq.create_index(index_name, settings_dict=settings)  # Create a new Marqo index with the specified settings
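The try/except above swallows any error from the delete call, not only "index not found", so an authentication or network problem would pass silently until `create_index` fails. A minimal sketch of an explicit existence check, assuming `get_indexes()` returns a dict shaped like `{"results": [{"indexName": ...}]}` (older client versions may use `index_name`, so both keys are checked). `FakeClient` is a hypothetical stand-in so the logic can be tried offline.

```python
def index_exists(client, index_name):
    """Return True if index_name appears in the client's list of indexes."""
    # Assumption: get_indexes() returns {"results": [{"indexName": ...}, ...]}.
    results = client.get_indexes().get("results", [])
    names = {entry.get("indexName") or entry.get("index_name") for entry in results}
    return index_name in names


def recreate_index(client, index_name, settings):
    """Delete index_name if it already exists, then create it from settings."""
    if index_exists(client, index_name):
        client.delete_index(index_name)
    return client.create_index(index_name, settings_dict=settings)


class FakeClient:
    """Hypothetical in-memory stand-in for marqo.Client, for trying this offline."""

    def __init__(self, existing=()):
        self.indexes = set(existing)
        self.calls = []

    def get_indexes(self):
        return {"results": [{"indexName": name} for name in sorted(self.indexes)]}

    def delete_index(self, name):
        self.calls.append(("delete", name))
        self.indexes.discard(name)

    def create_index(self, name, settings_dict=None):
        self.calls.append(("create", name))
        self.indexes.add(name)
        return {"acknowledged": True}


fake = FakeClient(existing=["marqo-ecommerce-l"])
recreate_index(fake, "marqo-ecommerce-l", {"type": "unstructured"})
print(fake.calls)  # [('delete', 'marqo-ecommerce-l'), ('create', 'marqo-ecommerce-l')]
```

With the real client, the call would be `recreate_index(mq, index_name, settings)`.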
Key Line 4: Add Documents to the Index
With the index set up, it’s time to add our product data. We will be using a 100k-product subset of the Marqo-GS-10M dataset. You can access this CSV in our GitHub.
We’ll start by loading our product data:
import pandas as pd # Import the pandas library for data manipulation
path_to_data = "data/marqo-gs_100k.csv" # Define the path to the CSV file containing product data
df = pd.read_csv(path_to_data) # Load the product data from the CSV file into a pandas DataFrame
Next, we convert the data into a format suitable for Marqo:
documents = [
{"image_url": image, "query": query, "title": title}
for image, query, title in zip(df["image"], df["query"], df["title"])
]
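The list comprehension above passes every row through unchanged. If some rows in the CSV have empty cells, pandas reads them as NaN, which would then be sent to Marqo as a float rather than a string. A minimal sketch, assuming you would rather drop incomplete rows than upload them:

```python
import math

FIELDS = ("image_url", "query", "title")


def is_missing(value):
    """True for None, NaN (how pandas represents empty CSV cells), or blank strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def to_documents(images, queries, titles):
    """Build Marqo documents, skipping any row with a missing field."""
    documents = []
    for row in zip(images, queries, titles):
        if any(is_missing(value) for value in row):
            continue  # incomplete row: nothing reliable to embed
        documents.append(dict(zip(FIELDS, (str(value) for value in row))))
    return documents
```

Used with the DataFrame loaded earlier: `documents = to_documents(df["image"], df["query"], df["title"])`.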
Finally, we add our next key line of code using the add_documents function. We’ll also apply a custom mapping that weights each field according to its importance in the search process.
batch_size = 64  # Define the batch size for uploading documents

for i in range(0, len(documents), batch_size):  # Loop through documents in batches
    batch = documents[i:i + batch_size]  # Select a batch of documents
    mq.index(index_name).add_documents(  # Add the batch of documents to the Marqo index
        batch,
        client_batch_size=batch_size,  # Set the batch size for the client
        mappings={
            "image_title_multimodal": {  # Define a multimodal field combining image, title, and query
                "type": "multimodal_combination",  # Set the field type as multimodal
                "weights": {"title": 0.1, "query": 0.1, "image_url": 0.8},  # Assign weights to each field
            }
        },
        tensor_fields=["image_title_multimodal"],  # Specify fields for tensor generation
    )
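The `multimodal_combination` mapping produces one vector for `image_title_multimodal` from its child fields. A conceptual sketch, assuming the combined vector is the weighted sum of each field's embedding followed by L2 normalization; check the Marqo documentation for the exact behaviour. The 3-dimensional vectors are made-up stand-ins for real 1024-dimensional embeddings, and the weights are the ones from the mapping above.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else list(vector)


def combine_fields(field_vectors, weights):
    """Weighted sum of per-field vectors, then L2-normalized."""
    dim = len(next(iter(field_vectors.values())))
    combined = [0.0] * dim
    for field, weight in weights.items():
        for i, x in enumerate(field_vectors[field]):
            combined[i] += weight * x
    return l2_normalize(combined)


# Toy 3-d vectors standing in for real 1024-d embeddings.
field_vectors = {
    "image_url": [1.0, 0.0, 0.0],
    "title": [0.0, 1.0, 0.0],
    "query": [0.0, 0.0, 1.0],
}
weights = {"title": 0.1, "query": 0.1, "image_url": 0.8}
combined = combine_fields(field_vectors, weights)
print([round(x, 3) for x in combined])  # [0.985, 0.123, 0.123]
```

With `image_url` weighted at 0.8, the combined vector sits close to the image embedding, so visual similarity drives most of the ranking while the title and query text nudge it.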
Key Line 5: Search with Marqo
Searching with Marqo only involves one key line of code. For this tutorial, we'll add some extra code to build a more user-friendly search UI with Gradio. This interface lets users enter a query, specify what they want more or less of, and view the top results returned by Marqo.
import io  # Import io for handling byte streams

import requests  # Import the requests library for handling HTTP requests
from PIL import Image  # Import PIL's Image module for image processing


def search_marqo(query, themes, negatives):
    query_weights = {query: 1.0}  # Assign a weight of 1.0 to the main query
    if themes:
        query_weights[themes] = 0.75  # Apply a positive weight to emphasize additional themes
    if negatives:
        query_weights[negatives] = -1.1  # Apply a negative weight to de-emphasize certain themes

    # Perform search on the Marqo index
    res = mq.index(index_name).search(query_weights, limit=10)  # Limit results to top 10

    # Process results to prepare for display
    products = []
    for hit in res['hits']:
        image_url = hit.get('image_url')  # Get the image URL from the search hit
        title = hit.get('title', 'No Title')  # Get the product title, default to 'No Title' if missing
        if not image_url:
            continue  # Skip hits without an image to display
        # Retrieve image from the provided URL (with a timeout so one slow host can't hang the UI)
        response = requests.get(image_url, timeout=10)
        image = Image.open(io.BytesIO(response.content))  # Open the image from the response content
        # Append the image and its caption to the results list for display
        products.append((image, title))
    return products  # Return the list of processed products for display
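To see why a negative weight produces "less of", here is a toy sketch. It assumes the weighted query is turned into one vector by summing each term's embedding times its weight, and that results are ranked by cosine similarity; Marqo's internal details may differ. The 2-dimensional embeddings and product names are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def weighted_query_vector(query_weights, embed):
    """Sum each query term's embedding scaled by its weight."""
    dim = len(next(iter(embed.values())))
    vector = [0.0] * dim
    for term, weight in query_weights.items():
        for i, x in enumerate(embed[term]):
            vector[i] += weight * x
    return vector


def rank(query_weights, embed, products):
    """Product names sorted by similarity to the weighted query, best first."""
    q = weighted_query_vector(query_weights, embed)
    return sorted(products, key=lambda name: cosine(q, products[name]), reverse=True)


# Toy embeddings: axis 0 ~ "coffee machine", axis 1 ~ "buttons".
embed = {"coffee machine": [1.0, 0.3], "buttons": [0.0, 1.0]}
products = {
    "machine with many buttons": [0.9, 0.4],
    "minimal lever machine": [0.95, 0.05],
}
print(rank({"coffee machine": 1.0}, embed, products))
# ['machine with many buttons', 'minimal lever machine']
print(rank({"coffee machine": 1.0, "buttons": -1.1}, embed, products))
# ['minimal lever machine', 'machine with many buttons']
```

Subtracting 1.1 times the "buttons" direction pulls the query vector away from button-heavy products, which flips the ranking. The 0.75 weight used for "More of..." works the same way in the opposite direction.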
With the search function ready, we can now build the interface.
import gradio as gr

# Gradio Blocks Interface
with gr.Blocks(css=".orange-button { background-color: orange; color: black; }") as interface:
    gr.Markdown("# Multimodal Ecommerce Search with Marqo")
    with gr.Row():
        query_input = gr.Textbox(placeholder="Coffee machine", label="Search Query")
        themes_input = gr.Textbox(placeholder="Silver", label="More of...")
        negatives_input = gr.Textbox(placeholder="Buttons", label="Less of...")
    search_button = gr.Button("Submit", elem_classes="orange-button")
    results_gallery = gr.Gallery(label="Top 10 Results", columns=4)

    # Set up button click functionality
    search_button.click(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)

# Launch the app
interface.launch()
When launched, users can enter a main search query, add themes for more refinement, and even specify themes to avoid.
If you follow the steps in this guide, you will create an index with CPU large inference and a basic storage shard. This index costs $0.38 per hour. When you are done with the index, you can delete it with the following code:
mq.delete_index(index_name)
If you do not delete your index you will continue to be charged for it.
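To put the hourly rate in perspective, here is a quick estimate of what an index left running would cost. It assumes the $0.38/hour rate quoted above applies continuously with no other charges; actual billing may differ.

```python
HOURLY_COST_USD = 0.38  # CPU large inference + basic storage shard, as quoted above


def index_cost(hours, hourly_cost=HOURLY_COST_USD):
    """Estimated charge in USD for keeping the index running for `hours` hours."""
    return round(hours * hourly_cost, 2)


print(index_cost(24))       # one day: 9.12
print(index_cost(24 * 30))  # a 30-day month: 273.6
```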
In just 5 key lines of code, you’ve set up an ecommerce image search engine powered by Marqo! It embeds product images and text with marqo-ecommerce-embeddings-L, lets users steer results with "more of" and "less of" terms, and presents the top matches in a Gradio interface, ready to adapt to your own product catalog.