How to Implement Ecommerce Search in 5 Key Lines of Code

December 10, 2024

Discover how to create a powerful ecommerce image search application with minimal effort! This article walks you through using Marqo's advanced embeddings to set up a search engine in just 5 key lines of code, ready for deployment.

All of the code in this article can be run in Google Colab here. Let's dive in!

Key Line 1: Install Marqo

In this article, we'll be using the end-to-end vector search engine, Marqo. First, we need to install this with pip:


pip install marqo

Key Line 2: Set Up Marqo Client

Next, we will need to initialize the Marqo Client. This will allow us to create and add documents to an index. To obtain your API Key, see this article.


import marqo

# Set up Marqo Client 
api_key = "API_KEY"   # To find your API Key, see https://www.marqo.ai/blog/finding-my-marqo-api-key
mq = marqo.Client("https://api.marqo.ai", api_key=api_key)
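
Hardcoding the key is fine for a quick notebook run, but a minimal sketch like the one below reads it from an environment variable instead. The variable name `MARQO_API_KEY` and the helper `load_api_key` are assumptions made for illustration; they are not part of Marqo's client.

```python
import os


def load_api_key(env_var: str = "MARQO_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention, not one Marqo requires.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your Marqo API key")
    return api_key
```

The returned value can then be passed as `api_key` to `marqo.Client` in place of the literal string.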

For information on how to set up Marqo locally, see our GitHub.

Key Line 3: Create Marqo Index

Let’s set up the configuration for our index. This configuration will specify the type of index, the embedding model to use, and any specific settings for handling image URLs and inference. We will be using marqo-ecommerce-embeddings-L. This is Marqo's state-of-the-art embedding model for ecommerce. For more information, see our blog post.


settings = {
    "type": "unstructured",  # Set the index type as unstructured data
    "model": "Marqo/marqo-ecommerce-embeddings-L",  # Use Marqo's ecommerce embedding model (L variant)
    "modelProperties": {
        "name": "hf-hub:Marqo/marqo-ecommerce-embeddings-L",  # Name of the model on Hugging Face Hub
        "dimensions": 1024,  # Size of the embedding vectors produced by the L model
        "type": "open_clip"  # Model type, using OpenCLIP architecture
    },
    "treatUrlsAndPointersAsImages": True,  # Enable image URLs as image sources
    "inferenceType": "marqo.CPU.large",  # Specify the inference type using Marqo's large CPU instance
}
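
If you expect to try more than one model, the same configuration can be produced by a small helper. The sketch below is hypothetical: `build_settings` and `MODEL_DIMENSIONS` are names invented here, and the entry for the smaller B model (768 dimensions) is an assumption you should check against its model card before use.

```python
# Model names and embedding sizes; the "B" entry is an assumption to verify
# against the model card before use.
MODEL_DIMENSIONS = {
    "Marqo/marqo-ecommerce-embeddings-B": 768,
    "Marqo/marqo-ecommerce-embeddings-L": 1024,
}


def build_settings(model: str, inference_type: str = "marqo.CPU.large") -> dict:
    """Build index settings with the same shape as the dictionary above."""
    if model not in MODEL_DIMENSIONS:
        raise ValueError(f"Unknown model: {model}")
    return {
        "type": "unstructured",
        "model": model,
        "modelProperties": {
            "name": f"hf-hub:{model}",
            "dimensions": MODEL_DIMENSIONS[model],
            "type": "open_clip",
        },
        "treatUrlsAndPointersAsImages": True,
        "inferenceType": inference_type,
    }


settings_l = build_settings("Marqo/marqo-ecommerce-embeddings-L")
```

Here `settings_l` matches the `settings` dictionary defined by hand in this section.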

With the settings defined, our next key line of code creates the index in Marqo.


index_name = "marqo-ecommerce-l"  # Specify the name of the index

try:
    mq.index(index_name).delete()  # Delete the existing index if it already exists to avoid conflicts
except Exception:
    pass  # If the index does not exist, skip deletion

mq.create_index(index_name, settings_dict=settings)  # Create a new Marqo index with the specified settings

Key Line 4: Add Documents to the Index

With the index set up, it’s time to add our product data. We will be using a 100k dataset that is a subset of the Marqo-GS-10M dataset. You can access this CSV in our GitHub.

We’ll start by loading our product data:


import pandas as pd  # Import the pandas library for data manipulation

path_to_data = "data/marqo-gs_100k.csv"  # Define the path to the CSV file containing product data
df = pd.read_csv(path_to_data)  # Load the product data from the CSV file into a pandas DataFrame

Next, we convert the data into a format suitable for Marqo:


documents = [
    {"image_url": image, "query": query, "title": title}
    for image, query, title in zip(df["image"], df["query"], df["title"])
]
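
CSV exports often contain rows with a missing image URL or title, and pandas represents those gaps as NaN. As a hedged sketch, the hypothetical helpers below (`is_missing` and `clean_documents`, not part of Marqo or pandas) drop such documents before they are sent to the index. The sample rows and `example.com` URLs are made up.

```python
import math


def is_missing(value) -> bool:
    """True for None, NaN, and empty or whitespace-only strings."""
    if value is None:
        return True
    if isinstance(value, float) and math.isnan(value):
        return True
    return isinstance(value, str) and not value.strip()


def clean_documents(documents: list[dict]) -> list[dict]:
    """Keep only documents in which every field has a usable value."""
    return [doc for doc in documents if not any(is_missing(v) for v in doc.values())]


sample = [
    {"image_url": "https://example.com/a.jpg", "query": "mug", "title": "Blue mug"},
    {"image_url": float("nan"), "query": "mug", "title": "No image"},
    {"image_url": "https://example.com/c.jpg", "query": "", "title": "No query"},
]
print(len(clean_documents(sample)))  # prints 1
```

In the tutorial flow this would be applied as `documents = clean_documents(documents)` before uploading.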

Finally, we add our next key line of code using the add_documents function. We’ll also apply a custom mapping that combines the image, title, and query of each product into a single multimodal field, weighting each one by its importance in the search process.


batch_size = 64  # Define the batch size for uploading documents

for i in range(0, len(documents), batch_size):  # Loop through documents in batches
    batch = documents[i:i + batch_size]  # Select a batch of documents

    mq.index(index_name).add_documents(  # Add the batch of documents to the Marqo index
        batch,
        client_batch_size=batch_size,  # Set the batch size for the client
        mappings={
            "image_title_multimodal": {  # Define a multimodal field combining image, title, and query
                "type": "multimodal_combination",  # Set the field type as multimodal
                "weights": {"title": 0.1, "query": 0.1, "image_url": 0.8},  # Assign weights to each field
            }
        },
        tensor_fields=["image_title_multimodal"],  # Specify fields for tensor generation
    )
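
To build intuition for what the weights do, the sketch below combines three toy vectors as a weighted sum and scales the result to unit length. This is an illustration only, under the assumption that a multimodal combination behaves like a weighted blend of per-field embeddings. It is not Marqo's internal implementation, and the two-dimensional vectors are made up.

```python
import math


def combine(vectors: dict, weights: dict) -> list:
    """Weighted sum of same-length vectors, scaled to unit length."""
    dims = len(next(iter(vectors.values())))
    combined = [0.0] * dims
    for field, weight in weights.items():
        for i, value in enumerate(vectors[field]):
            combined[i] += weight * value
    norm = math.sqrt(sum(x * x for x in combined))
    return [x / norm for x in combined] if norm else combined


# Toy embeddings: the two text fields point one way, the image another
toy_vectors = {"title": [1.0, 0.0], "query": [1.0, 0.0], "image_url": [0.0, 1.0]}
weights = {"title": 0.1, "query": 0.1, "image_url": 0.8}
blended = combine(toy_vectors, weights)
```

With these weights, `blended` points mostly along the image direction, which is the intent of giving `image_url` a weight of 0.8.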

Key Line 5: Search with Marqo

Searching with Marqo only involves one key line of code. For the purpose of this tutorial, we'll add some additional code to make the search UI more user-friendly using Gradio. This interface will allow users to enter a query, specify if they want more or less of something in their query, and view the top results returned by Marqo.


import requests  # Import the requests library for handling HTTP requests
import io  # Import io for handling byte streams
from PIL import Image  # Import PIL's Image module for image processing

def search_marqo(query, themes, negatives):
    query_weights = {query: 1.0}  # Assign a weight of 1.0 to the main query
    if themes:
        query_weights[themes] = 0.75  # Apply a positive weight to emphasize additional themes
    if negatives:
        query_weights[negatives] = -1.1  # Apply a negative weight to de-emphasize certain themes

    # Perform search on the Marqo index
    res = mq.index(index_name).search(query_weights, limit=10)  # Limit results to top 10

    # Process results to prepare for display
    products = []
    for hit in res['hits']:
        image_url = hit.get('image_url')  # Get the image URL from the search hit
        title = hit.get('title', 'No Title')  # Get the product title, default to 'No Title' if missing
        if not image_url:
            continue  # Skip hits that have no image to display

        # Retrieve the image from the provided URL, skipping images that fail to load
        try:
            response = requests.get(image_url, timeout=10)  # Make a request to the image URL
            response.raise_for_status()  # Treat HTTP error responses as failures
            image = Image.open(io.BytesIO(response.content))  # Open the image from the response content
        except (requests.RequestException, OSError):
            continue

        products.append((image, title))  # Append the image and its caption to the results list

    return products  # Return the list of processed products for display
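
The weighted query is an ordinary dictionary that maps each phrase to its weight. The standalone sketch below isolates that step so you can inspect it without calling Marqo. `build_query_weights` is a hypothetical helper that mirrors the weights used in `search_marqo` and additionally ignores inputs containing only whitespace.

```python
def build_query_weights(query: str, themes: str = "", negatives: str = "") -> dict:
    """Mirror the weighting used in search_marqo, ignoring blank inputs."""
    query_weights = {query.strip(): 1.0}
    if themes.strip():
        query_weights[themes.strip()] = 0.75  # Pull results toward this theme
    if negatives.strip():
        query_weights[negatives.strip()] = -1.1  # Push results away from this theme
    return query_weights


print(build_query_weights("coffee machine", "silver", "buttons"))
# {'coffee machine': 1.0, 'silver': 0.75, 'buttons': -1.1}
```

A negative weight steers results away from a phrase, so this query favors silver coffee machines and penalizes ones that match "buttons".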

With the search function ready, we can now build the interface.


import gradio as gr

# Gradio Blocks Interface
with gr.Blocks(css=".orange-button { background-color: orange; color: black; }") as interface:
    gr.Markdown("Multimodal Ecommerce Search with Marqo")
    with gr.Row():
        query_input = gr.Textbox(placeholder="Coffee machine", label="Search Query")
        themes_input = gr.Textbox(placeholder="Silver", label="More of...")
        negatives_input = gr.Textbox(placeholder="Buttons", label="Less of...")

    search_button = gr.Button("Submit", elem_classes="orange-button")
    results_gallery = gr.Gallery(label="Top 10 Results", columns=4)

    # Set up button click functionality
    search_button.click(fn=search_marqo, inputs=[query_input, themes_input, negatives_input], outputs=results_gallery)

# Launch the app
interface.launch()

Once the app launches, users can enter a main search query, add themes to refine it, and specify themes to avoid.

Step 6: Clean Up

If you follow the steps in this guide, you will create an index with CPU large inference and a basic storage shard. This index costs $0.38 per hour. When you are done with the index, you can delete it with the following code:


mq.delete_index(index_name)

If you do not delete your index, you will continue to be charged for it.
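
To see how quickly a forgotten index adds up, the small sketch below multiplies out the hourly rate quoted in this section. The helper `index_cost` is hypothetical, and the rate is simply the figure stated here; pricing may change, so treat the result as an estimate.

```python
HOURLY_RATE_USD = 0.38  # Rate quoted above; check current pricing before relying on it


def index_cost(hours: float, hourly_rate: float = HOURLY_RATE_USD) -> float:
    """Estimated cost in USD of keeping the index running for `hours` hours."""
    return round(hours * hourly_rate, 2)


print(index_cost(24))       # one day
print(index_cost(24 * 30))  # a 30-day month
```

At the quoted rate, that is $9.12 for a day and $273.60 for 30 days.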

Conclusion

In just 5 key lines of code, you’ve set up an ecommerce image search engine powered by Marqo! This solution is scalable, efficient, and user-friendly, making it ideal for ecommerce applications of all sizes. Whether refining queries or delivering top-notch results, your search application is ready to provide a seamless user experience.