Marqo's new release allows you to search across audio, video, image, and text content.
Note, this is available in Marqo open source and will be coming to Marqo Cloud very soon!
In an increasingly multimedia-centric world, the ability to search across various content/data types is crucial. Whether you're managing a vast library of educational videos, a repository of podcasts, or a collection of marketing materials, Marqo's new capabilities ensure that you can find exactly what you need, when you need it.
There are endless possibilities with audio and video search, but below are some of the most common use cases.
This section will show you how to get set up on Marqo Cloud in 5 simple steps.
First, we need to install Marqo with pip:
pip install marqo
Next, we will need to initialize the Marqo Client. This will allow us to create and add documents to an index. To obtain your API Key, see this article.
import marqo
api_key = "put_your_api_key_here"
mq = marqo.Client(url='https://api.marqo.ai', api_key=api_key)
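If you prefer not to hard-code your API key, you can read it from an environment variable instead. This is a minimal sketch; the variable name MARQO_API_KEY is just an illustrative choice:
import os
import marqo
# Read the API key from an environment variable (variable name is illustrative)
api_key = os.environ.get("MARQO_API_KEY")
mq = marqo.Client(url='https://api.marqo.ai', api_key=api_key)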
Let’s set up the configuration for our index. Here, we specify the LanguageBind/Video_V1.5_FT_Audio_FT_Image model, which allows us to create an index that can handle video, audio, image, and text files. We also specify marqo.GPU as the inference type and configure other basic settings.
# Define settings for the index
settings = {
"type": "unstructured", # Unstructured data allows flexible input types
"vectorNumericType": "float", # Use floating-point numbers for vector embeddings
"model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image", # Model to handle text, audio, video, and images
"normalizeEmbeddings": True, # Normalize embeddings to ensure comparability
"treatUrlsAndPointersAsMedia": True, # Treat URLs as media files
"treatUrlsAndPointersAsImages": True, # Specifically treat certain URLs as images
"audioPreprocessing": {"splitLength": 10, "splitOverlap": 5}, # Split audio into 10-second chunks with 5-second overlap
"videoPreprocessing": {"splitLength": 20, "splitOverlap": 5}, # Split video into 20-second chunks with 5-second overlap
"inferenceType": "marqo.GPU", # Specify inference type
}
# Create a new index with the specified settings
mq.create_index("audio-and-video-search", settings_dict=settings)
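Index creation on Marqo Cloud can take a few minutes. Once the index is ready, a quick sanity check is to read its settings back; this is a minimal sketch using the client's get_settings method:
# Fetch the index settings to confirm the index was created as configured
index_settings = mq.index("audio-and-video-search").get_settings()
print(index_settings["model"])  # Expected to be LanguageBind/Video_V1.5_FT_Audio_FT_Image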
We now add our audio, video, and image documents to our index, one of each type. The video file is a recording of our co-founder, Jesse Clark, giving a presentation at Google HQ. The audio file is blues music. The image file is our fashion-CLIP model logo: a hippo with a hat. These URLs are public, so feel free to inspect them for yourself.
mq.index("audio-and-video-search").add_documents(
documents=[
# Add an audio file (blues music)
{"audio_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-audio.mp3", "_id": "id1"},
# Add a video file (public speaking)
{"video_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4", "_id": "id2"},
# Add an image (Marqo logo which is a hippo)
{"image_field": "https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-image.png", "_id": "id3"},
# Add more documents here if needed
],
tensor_fields=['audio_field', 'video_field', 'image_field'] # Specify which fields should be embedded
)
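Once the documents are added, you can confirm they made it into the index. This is a minimal sketch using the client's get_stats method; the exact fields in the response (such as numberOfDocuments) may vary between versions:
# Check how many documents and vectors the index now contains
stats = mq.index("audio-and-video-search").get_stats()
print(stats)  # Expect a document count of 3 for the three documents above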
Let's search over this index. We will use the 'public speaking' query as this description matches our video file well.
# Search the index for a query related to public speaking
res = mq.index("audio-and-video-search").search("public speaking")
print(res['hits'][0]) # Print the top hit (should relate to the video of public speaking)
After performing our search, we obtain the output:
{'_id': 'id2', 'video_field': 'https://marqo-tutorial-public.s3.us-west-2.amazonaws.com/example-video.mp4', '_highlights': [{'video_field': '[0.8858670000000011, 20.885867]'}], '_score': 0.5409741804365457}
From this, we can clearly see the video file is our top result.
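To see how the other documents rank against the same query, you can print every hit and cap the number of results with the limit parameter:
# Return at most 3 results and print the id and score of each hit
res = mq.index("audio-and-video-search").search("public speaking", limit=3)
for hit in res['hits']:
    print(hit['_id'], hit['_score'])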
We encourage you to add more video, audio, image, and text documents to your index and experiment with different queries.
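One variation worth trying is a weighted query, where the query is a dictionary of terms and weights; negative weights steer results away from a concept. A minimal sketch:
# Favour presentation-style content while steering away from music
res = mq.index("audio-and-video-search").search({"public speaking": 1.0, "blues music": -0.5})
print(res['hits'][0]['_id'])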
If you follow the steps in this guide, you will create an index with GPU inference and a basic storage shard. When you are done with the index, you can delete it with the following code:
mq.delete_index("audio-and-video-search")
If you do not delete your index, you will continue to be charged for it.
You've seen how to get set up on Marqo Cloud. This section will explain how to run Marqo locally through Docker.
Marqo requires Docker. If you haven’t already, install Docker. Then, navigate to your terminal and input the following to run Marqo:
docker rm -f marqo
docker pull marqoai/marqo:latest
docker run --name marqo -it -p 8882:8882 \
-e MARQO_MODELS_TO_PRELOAD="[]" \
-e MARQO_MAX_CUDA_MODEL_MEMORY=16 \
-e MARQO_MAX_CPU_MODEL_MEMORY=16 marqoai/marqo:latest
Note that we configure some environment variables here. MARQO_MODELS_TO_PRELOAD is set to [] so that no models are loaded automatically at startup; we will load our audio/video models later. In addition, MARQO_MAX_CUDA_MODEL_MEMORY and MARQO_MAX_CPU_MODEL_MEMORY have a default value of 4, so we increase them to 16 to allow for larger models and more demanding workloads.
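If your machine has an NVIDIA GPU and the NVIDIA Container Toolkit installed, you can optionally expose the GPU to the container by adding the --gpus all flag to the same command:
docker run --name marqo -it -p 8882:8882 --gpus all \
-e MARQO_MODELS_TO_PRELOAD="[]" \
-e MARQO_MAX_CUDA_MODEL_MEMORY=16 \
-e MARQO_MAX_CPU_MODEL_MEMORY=16 marqoai/marqo:latest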
Now, we install marqo, the Python client for interacting with the Marqo server running in Docker:
pip install marqo
That’s it! Now we’re ready to create an index and begin searching over audio and video files.
First, create a new Python script and input the following. Here, we import marqo and set up the client:
import marqo
# Set up the marqo client
mq = marqo.Client("http://localhost:8882")
Next, we specify our settings for this index.
settings = {
"type": "unstructured", # Type of index
"vectorNumericType": "float", # Numeric type for vector encoding
"model": "LanguageBind/Video_V1.5_FT_Audio_FT_Image", # The model to use to vectorise doc content
"normalizeEmbeddings": True, # Normalize the embeddings to have unit length
"treatUrlsAndPointersAsMedia": True, # Fetch images, videos and audio from pointers
"treatUrlsAndPointersAsImages": True, # Fetch image from pointers
"audioPreprocessing": {"splitLength": 10, "splitOverlap": 3}, # The audio preprocessing object
"videoPreprocessing": {"splitLength": 10, "splitOverlap": 3} # The video preprocessing object
}
Here, we are using the LanguageBind/Video_V1.5_FT_Audio_FT_Image model. For more information on this model, see the model card here. For more information on the additional inputs featured here, visit our documentation.
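To build intuition for splitLength and splitOverlap, here is a purely illustrative sketch of how a 30-second clip would be cut into 10-second chunks with 3 seconds of overlap (Marqo does this chunking internally; the exact boundary handling may differ):
# Illustrative only: compute chunk boundaries for a 30-second clip
split_length, split_overlap, duration = 10, 3, 30
step = split_length - split_overlap  # each chunk starts 7 seconds after the previous one
chunks = []
start = 0
while start < duration:
    chunks.append((start, min(start + split_length, duration)))
    start += step
print(chunks)  # [(0, 10), (7, 17), (14, 24), (21, 30), (28, 30)]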
Now, we create our index and add documents to it. Below, we’ve included an audio example, but you can also specify video or image files/URLs here.
# Create your marqo index
resp = mq.create_index("my-index", settings_dict=settings)
# Add documents to your index
res = mq.index("my-index").add_documents(
documents = [
# Add an audio file of music
{"audio_field": "https://dn720302.ca.archive.org/0/items/cocktail-jazz-coffee/01.%20Relaxing%20Jazz%20Coffee.mp3", "_id": "id1"},
# Or add a video file of a movie
# {"video_field": "https://ia800103.us.archive.org/27/items/electricsheep-flock-248-22500-1/00248%3D22801%3D20924%3D20930.mp4", "_id": "id2"},
# Or add an image
# {"image_field": "https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_realistic.png", "_id": "id3"},
# Add more documents here
],
tensor_fields=['audio_field']
)
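The add_documents response stored in res reports whether anything failed to index, which is worth checking before you search. A minimal sketch, assuming the response contains an errors flag and an items list:
# Check the add_documents response for per-document failures
if res.get("errors"):
    for item in res.get("items", []):
        print(item)  # Inspect any document that failed to index
else:
    print("All documents indexed successfully")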
3. Search!
Now we can perform a text search over this data.
# Search for jazz music
res = mq.index("my-index").search("jazz music")
# Print the top hit
print(res['hits'][0])
This returns:
{'audio_field': 'https://dn720302.ca.archive.org/0/items/cocktail-jazz-coffee/01.%20Relaxing%20Jazz%20Coffee.mp3', '_id': 'id1', '_highlights': [{'audio_field': '[156.034286, 166.034286]'}], '_score': 0.5701236134891678}
We see that the top hit for our query returns the jazz music audio file we added to our index.
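Because treatUrlsAndPointersAsImages is enabled, you can also experiment with using an image URL as the query itself, so the search is driven by image content rather than text. A minimal sketch, using the hippo image referenced (commented out) above:
# Use an image URL as the query; the URL is embedded as an image rather than as text
image_query = "https://raw.githubusercontent.com/marqo-ai/marqo-api-tests/mainline/assets/ai_hippo_realistic.png"
res = mq.index("my-index").search(image_query)
print(res['hits'][0])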
Audio and video search in Marqo really is as easy as that!
We recommend the following when using audio and video search with Marqo:
This major milestone wouldn’t have been possible without our incredible community offering suggestions, feedback and ideas. Join our growing community to share your experiences, ask questions, and collaborate: