In the age of big data and artificial intelligence, the ability to efficiently search through large datasets has become increasingly important. Traditional keyword-based search methods often fall short in providing accurate and relevant results, especially in complex scenarios involving natural language processing and multimedia data. This is where vector search comes into play.
Before diving in, if you need help, guidance, or want to ask questions, join our Community and a member of the Marqo team will be there to help.
1. What is Vector Search?
Vector search, also known as similarity search or vector similarity search, is a method that involves representing data points (such as text, images, or other types of documents) as vectors in a multi-dimensional space. Each vector captures the semantic meaning of the data, allowing for more accurate and context-aware search results. Instead of matching keywords, vector search calculates the similarity between vectors to find the most relevant results.
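To make the idea concrete, here is a minimal sketch of similarity ranking in plain Python. The three-dimensional vectors are made-up toy values (real embeddings come from a model and usually have hundreds of dimensions), and `cosine_similarity` and `rank` are illustrative helper names, not part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank(query_vec, doc_vecs):
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings, invented for illustration only.
docs = {
    "space documentary": [0.9, 0.1, 0.0],
    "cooking show": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of a query about rockets

ranking = rank(query, docs)
for doc_id, score in ranking:
    print(doc_id, round(score, 3))
```

A vector search engine does the same thing at scale: it embeds the query, compares it against stored vectors, and returns the closest matches first.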
2. Why Use Vector Search?
- Improved Relevance: Vector search can capture the context and nuances of natural language, leading to more accurate search results.
- Multimodal Search: It enables searching across different types of data (e.g., text and images) simultaneously, making it versatile for various applications.
- Scalability: Vector search is designed to handle large-scale datasets efficiently, making it suitable for big data applications.
- Flexibility: It allows for complex queries that can incorporate multiple factors and weights, providing more sophisticated search capabilities.
3. Let’s Build! A Simple Search Demo
For this article, we will be using Marqo, an end-to-end vector search engine. Marqo is easy to implement (it takes only a few lines of code to set up) and handles much of the complexity for you, including embedding generation.
1. Set Up and Installation
We’ll start by downloading and installing Marqo. If you have any issues setting up Marqo, visit our Slack Community and send us your issue on the ‘get-help’ channel, where we’ll be there to help!
- Marqo requires Docker. To install Docker, go to the Docker Docs and follow the instructions for your operating system.
- Once Docker is installed, you can use it to run Marqo. First, open the Docker application, then head to your terminal to pull the Marqo image and start the container.
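As a sketch of this step, the commands below follow the pattern commonly used to run Marqo locally. The `latest` tag, the container name `marqo`, and the port mapping `8882:8882` are assumptions here, so check the current Marqo documentation before relying on them.

```shell
# Pull the Marqo image (the tag is an assumption; pin a version if you prefer).
docker pull marqoai/marqo:latest

# Remove any earlier container with the same name, then start a new one
# and expose it on local port 8882.
docker rm -f marqo
docker run --name marqo -it -p 8882:8882 marqoai/marqo:latest
```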
During this step, Docker first pulls from marqoai/marqo and sets up a vector store. Next, the Marqo artefacts begin downloading. This can take a little while, as everything needed to begin searching is downloaded. Once everything is set up successfully, you’ll be greeted with a welcome message.
That’s it. It really is as easy as that! Now we’re ready to use Marqo. It’s important that you keep your terminal open while we begin programming, because Marqo is running in it.
2. Start Searching!
While Docker is running, we can use Marqo as we would any other Python library. We’ll begin with a simple example where we create an index and perform searches on movie descriptions.
Let’s first install the Marqo Python client in our terminal with pip (pip install marqo).
Now we’re ready to write our first vector search system!
Open a Python script and begin by importing Marqo and creating a client that points at the running Marqo server.
This step sets up the client to interact with the Marqo API, allowing us to perform operations such as creating indexes, adding documents, and searching.
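A minimal sketch of the client setup follows. The URL assumes Marqo is listening locally on port 8882, and `make_client` is a helper name chosen for this example. The import sits inside the function only so the snippet can be loaded before the package is installed.

```python
def make_client(url="http://localhost:8882"):
    """Create a Marqo client for the server at `url`.

    The default URL is an assumption: it matches a container that
    publishes Marqo on local port 8882.
    """
    import marqo  # requires `pip install marqo`

    return marqo.Client(url=url)
```

Calling `mq = make_client()` gives the client object that the rest of the walkthrough passes around as `mq`.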
Before we create a new index, it's good practice to delete any existing index with the same name to avoid conflicts. In this example, that means deleting "movies-index" if it already exists.
This ensures that we start with a clean slate every time we run our script.
We then create the index, which is where Marqo stores and manages the documents we'll be working with.
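The delete-then-create pattern can be sketched as a single helper. Here `mq` is assumed to be a `marqo.Client`, `recreate_index` is a name made up for this example, and the index is created with Marqo's default settings.

```python
def recreate_index(mq, index_name="movies-index"):
    """Drop `index_name` if it exists, then create it fresh.

    `mq` is assumed to be a marqo.Client connected to a running server.
    """
    try:
        mq.delete_index(index_name)
    except Exception:
        # Deleting an index that does not exist raises an error; for a
        # clean-slate script that case is safe to ignore. The catch is
        # deliberately broad, so narrow it if you want stricter handling.
        pass
    return mq.create_index(index_name)
```

Because a missing index is ignored, the script can be rerun as many times as you like without manual cleanup.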
Now, we add some movie descriptions to our index. These descriptions will be vectorized and stored in the index, making them searchable. We specify a 'Title' and 'Description' for each movie.
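Here is a sketch of the indexing step. The four titles are the ones discussed in this article, but the descriptions were written for this example and are not the original text. `mq` is again assumed to be a `marqo.Client`, and `tensor_fields` tells Marqo which fields to turn into vectors.

```python
# Illustrative documents: titles follow the article, descriptions are
# placeholders written for this sketch.
MOVIES = [
    {"Title": "Interstellar",
     "Description": "Astronauts travel through a wormhole in search of a new home for humanity."},
    {"Title": "The Martian",
     "Description": "A stranded astronaut must survive alone on Mars until rescue arrives."},
    {"Title": "Inception",
     "Description": "A thief enters people's dreams to steal and plant ideas."},
    {"Title": "Shrek",
     "Description": "An ogre sets out to rescue a princess and reclaim his swamp."},
]


def add_movies(mq, index_name="movies-index", documents=None):
    """Embed the 'Description' field of each document and store it in the index."""
    if documents is None:
        documents = MOVIES
    return mq.index(index_name).add_documents(
        documents,
        tensor_fields=["Description"],  # only this field is vectorized
    )
```

'Title' is stored alongside each document and returned with search results, but it is not embedded in this sketch.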
With our index populated with movie descriptions, we can now perform a search query. Let's search for a movie related to space exploration.
This query searches the descriptions in our index for content related to space exploration.
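The search call can be sketched as below. `search_movies` is a helper name chosen for this example, `mq` is assumed to be a `marqo.Client`, and the `limit` default is an arbitrary choice.

```python
def search_movies(mq, query, index_name="movies-index", limit=10):
    """Run a vector search and return the list of hits, best match first."""
    response = mq.index(index_name).search(q=query, limit=limit)
    return response["hits"]
```

For the query in this article, that would be `search_movies(mq, "Which movie is about space exploration?")`.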
Finally, we print out the search results, including the title, description, and the relevance score for each movie that matches the query.
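Printing the results only needs plain Python. The sketch below assumes each hit is a dictionary carrying the fields we indexed plus a `_score` key. The sample hit is a hypothetical placeholder, not real output.

```python
def format_hits(hits):
    """Turn search hits into printable lines with title, description, and score."""
    lines = []
    for hit in hits:
        lines.append(
            f"Title: {hit['Title']} | "
            f"Description: {hit['Description']} | "
            f"Score: {hit['_score']:.3f}"
        )
    return lines


# Hypothetical hit, shaped like an entry in a search response's "hits" list.
sample_hits = [
    {"Title": "Example Movie", "Description": "A placeholder description.", "_score": 0.5},
]
for line in format_hits(sample_hits):
    print(line)
```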
Let’s look at the outputs:
Interstellar has the highest relevance score (0.817), making it the best match for the query "Which movie is about space exploration?". The Martian follows closely with a score of 0.808. Inception and Shrek score lower (0.798 and 0.762, respectively), indicating they are less relevant to the space exploration theme. The scores show how closely each movie's description aligns with the search query. Because the values sit close together, it is the ranking, more than the absolute numbers, that identifies the most pertinent results.
Awesome! Now that we’ve seen how to get started with a simple search demo in Marqo, let’s look at searching over different types of data!
4. Multimodal Search - Searching Images
We’ll now walk through a practical example of using the Marqo library for multimodal indexing. We'll create an index that can handle both text and image data, add a document to the index, and perform a search.
As with the previous example, we'll import the Marqo library and create a Marqo client.
As with our previous example, before we create the index, it's important to delete any index with the same name that may already exist.
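A sketch of the index setup for images follows. The index name `multimodal-index` is hypothetical, and the OpenCLIP model name is an assumption about which image-capable model to use. `treat_urls_and_pointers_as_images=True` asks Marqo to download and embed image URLs instead of treating them as plain text.

```python
def recreate_multimodal_index(
    mq,
    index_name="multimodal-index",  # hypothetical name for this sketch
    model="open_clip/ViT-B-32/laion2b_s34b_b79k",  # assumed CLIP-style model
):
    """Delete any old index with this name, then create one that embeds images."""
    try:
        mq.delete_index(index_name)
    except Exception:
        # The index may simply not exist yet; that is fine here.
        pass
    return mq.create_index(
        index_name,
        treat_urls_and_pointers_as_images=True,
        model=model,
    )
```

A CLIP-style model maps text and images into the same vector space, which is what lets a text query match an image.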
Now we'll add a document to our index. This document includes an image of a hippopotamus and a description. The image URL is treated as a tensor field.
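Adding the document might look like the sketch below. The field names `My_Image` and `Description`, the `_id` value, and the description text are all chosen for illustration, and the image URL is passed in by the caller rather than hard-coded.

```python
def add_hippo_document(mq, image_url, index_name="multimodal-index"):
    """Index one document whose image field is embedded as a tensor field."""
    document = {
        "My_Image": image_url,  # URL of the hippopotamus image
        "Description": "The hippopotamus is a large semiaquatic mammal "
                       "native to sub-Saharan Africa.",
        "_id": "hippo-facts",  # hypothetical document id
    }
    return mq.index(index_name).add_documents(
        [document],
        tensor_fields=["My_Image"],  # embed the image, store the description as-is
    )
```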
Finally, we can perform a search on our index. We'll search for the term "animal" and print the results.
After running the search query for the term "animal", we receive a response containing the matching document.
Let's break down what the output means. It provides detailed information about the documents that match your search query, including their descriptions, image URLs, relevance scores, and more. By understanding this output, you can gain insight into how your data is being indexed and retrieved, allowing you to refine your search and improve the relevance of your results.
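As a rough guide to reading a response, the sketch below pulls out the parts that usually matter. The key names (`hits`, `_id`, `_score`, `query`, `processingTimeMs`) reflect the typical shape of a Marqo search response but should be treated as assumptions. The `example` dictionary is hand-written with placeholder values, not real output.

```python
def describe_response(response):
    """Summarise a search response: the query, timing, and one entry per hit."""
    summary = {
        "query": response.get("query"),
        "processing_time_ms": response.get("processingTimeMs"),
        "hits": [],
    }
    for hit in response.get("hits", []):
        summary["hits"].append({
            "id": hit.get("_id"),
            "score": hit.get("_score"),
            # Keys without a leading underscore are the fields we indexed.
            "fields": {k: v for k, v in hit.items() if not k.startswith("_")},
        })
    return summary


# Hypothetical response with placeholder values, for illustration only.
example = {
    "hits": [
        {
            "_id": "hippo-facts",
            "_score": 0.5,
            "_highlights": [],
            "Description": "A placeholder description.",
            "My_Image": "https://example.com/hippo.jpg",
        }
    ],
    "query": "animal",
    "limit": 10,
    "offset": 0,
    "processingTimeMs": 12,
}
print(describe_response(example))
```

Separating your own fields from the underscore-prefixed metadata makes it easier to see which stored values came back and how strongly each document matched.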
5. Conclusion
In this article, we've walked through the steps of setting up a Marqo client, creating an index, adding documents, and performing a search query. This process allows us to efficiently search through content using vector search. Marqo makes it straightforward to implement powerful search capabilities in your applications.
If you want to see what else Marqo is capable of, visit the Marqo documentation.
6. Code
https://github.com/marqo-ai/fine-tuning-embedding-models-course