Getting Started

Getting Started with Marqtune: A Complete Guide

This article explains how to get up and running with Marqtune, the embedding model training platform.

What is Marqtune?

Marqtune is an embedding model training platform designed to power search systems, improving relevance based on business-specific requirements. Marqtune harnesses the power of its novel training framework, Generalized Contrastive Learning (GCL). With GCL, Marqtune fine-tunes embedding models to rank results based on both semantic relevance and a ranking system defined by search teams.

How Do I Get Access to Marqtune?

First, sign up to Marqo Cloud if you haven't already, then select 'Marqtune' from the left-hand navigation and request access.

How Do I Fine-Tune With Marqtune?

This section will walk you through the process of fine-tuning a pre-trained model using a multi-modal training dataset. We will then evaluate the performance of the fine-tuned model and compare it with an equivalent evaluation of the pre-trained model to demonstrate an improvement in performance. This tuned model can subsequently be used in a Marqo index to provide more relevant results for queries.

By completing the steps in this walkthrough you will learn how to use the Marqtune Python client to:

  1. Set up datasets in Marqtune
  2. Fine-tune a pre-trained model with a training dataset
  3. Evaluate models with an evaluation dataset
  4. Download a fine-tuned model

The code in this walkthrough can be found on GitHub here or in Google Colab here.

1. Set Up and Installation

To use Marqtune you will need:

  • Python (3.11+)
  • A Marqo API key with access to Marqtune. To obtain one, sign up to Marqo Cloud, navigate to the API Keys section, and create a key. For more information on obtaining your Marqo API key, see this article.
  • The Marqtune Python client, which can be installed with pip install marqtune

Let's first install the Marqtune Python client:


pip install marqtune

We also recommend setting up a Python virtual environment and using IPython to run the code interactively, though you can just as easily run the following code in a single Python script.

2. Initializing the Client

We now make the necessary imports and set up the Marqtune Python client. Note that all the Python snippets in this article are designed to be copied and pasted unchanged (though you are encouraged to experiment, of course); the api_key value below is the only exception, as you must replace it with your own Marqo API key. To find your Marqo API key, follow this article.


from marqtune.client import Client
from marqtune.enums import DatasetType, ModelType, InstanceType
from urllib.request import urlopen
import gzip
import json
import uuid

# suffix is used just to make the dataset and model names unique
suffix = str(uuid.uuid4())[:8]
print(f"Using suffix={suffix} for this walkthrough")

# Change this to your API Key:
api_key = ""

# Creating Marqtune client
marqtune_client = Client(url="https://marqtune.marqo.ai", api_key=api_key)

Note that the datasets and other resources generated in this walkthrough can be viewed in the Marqtune UI, which will look as follows:

Figure 1: The Marqtune UI

The Marqtune UI will become populated when we create datasets, models and evaluations. Let’s look at how we can do that now.

3. Dataset Creation

We will now create two datasets: one for training and another for evaluation. The datasets will be sourced from a couple of CSV files containing shopping data generated from a subset of Marqo-GS-10M, which is described in more detail in our open-source GCL repository.

Both CSV files have the same format. The first is larger (100,000 rows) and will be used for training a model; the second is smaller (25,000 rows) and will be used for model evaluation.

The datasets are multi-modal, consisting of both text and images. The images are represented by URLs, from which Marqtune downloads them.
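
For reference, each row in these files pairs a query with a product title, an image URL, and a relevance score. A row might look something like this (illustrative values only, not actual rows from the dataset):


query,title,image,score
"espresso machine","Stainless Steel Espresso Maker","https://example.com/images/12345.jpg",7
"running shoes","Lightweight Trail Running Shoe","https://example.com/images/67890.jpg",3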

Let’s begin by downloading these data files locally:


print("Downloading data files:")

# The base path for the datasets
base_path = (
    "https://marqo-gcl-public.s3.us-west-2.amazonaws.com/marqtune_test/datasets/v1"
)
# The names of the training and evaluation data files
training_data = "gs_100k_training.csv"
eval_data = "gs_25k_eval.csv"

# Download training and evaluation data
open(training_data, "w").write(
    gzip.open(urlopen(f"{base_path}/{training_data}.gz"), "rb").read().decode("utf-8")
)
open(eval_data, "w").write(
    gzip.open(urlopen(f"{base_path}/{eval_data}.gz"), "rb").read().decode("utf-8")
)

We now want to create datasets in Marqtune. To do this, we need to identify the columns in the CSVs as well as their types by defining a data schema. We will reuse the same data schema for both the training and evaluation datasets, though this is not strictly necessary. It's important to ensure all columns in your CSV are specified in your data schema (a quick way to check this is sketched after the schema below).


data_schema = {
    "query": "text",
    "title": "text",
    "image": "image_pointer",
    "score": "score",
}
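
As an optional sanity check before uploading (this is our own helper, not part of the Marqtune client), you can confirm that every column in the CSV header appears in the data schema:


import csv

# Optional helper: verify that every column in the CSV header is covered
# by the data schema defined above.
with open(training_data, newline="") as f:
    header = next(csv.reader(f))
missing = set(header) - set(data_schema)
assert not missing, f"Columns missing from data_schema: {missing}"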

After defining the data schema we can create the two datasets. Note that creating a dataset takes a few minutes as it involves a few steps:

  1. The CSV file has to be uploaded
  2. Some simple validations have to pass (e.g. the data schema needs to be validated against each row in the CSV input)
  3. The URLs in the image_pointer columns are used to download the image files to the dataset

# Create the training dataset.
training_dataset_name = f"{training_data}-{suffix}"
print(f"Creating training dataset ({training_dataset_name}):")
training_dataset = marqtune_client.create_dataset(
    dataset_name=training_dataset_name,
    file_path=training_data,
    dataset_type=DatasetType.TRAINING,
    data_schema=data_schema,
    query_columns=["query"],
    result_columns=["title", "image"],
    # setting wait_for_completion=True will make this a blocking call and will also print logs interactively
    wait_for_completion=True,
)

# Similarly we create the Evaluation dataset.
eval_dataset_name = f"{eval_data}-{suffix}"
print(f"Creating evaluation dataset ({eval_dataset_name}):")
eval_dataset = marqtune_client.create_dataset(
    dataset_name=eval_dataset_name,
    file_path=eval_data,
    dataset_type=DatasetType.EVALUATION,
    data_schema=data_schema,
    query_columns=["query"],
    result_columns=["title", "image"],
    wait_for_completion=True,
)

Let’s take a look at the Marqtune UI now that we’ve created our datasets.

Figure 2: The datasets table in the Marqtune UI becomes populated when creating datasets.

We can see each dataset's name, ID, type, and status. While a dataset is being created its status reads 'Creating'; once creation finishes, this changes to 'Ready', as can be seen in the image above.

4. Model Tuning

Now we're ready to train a model. To do so, we define a few training hyperparameters. In this example we've set some parameters that work well with the sample dataset but you are encouraged to experiment with these values for your own datasets.

In our example we've chosen ViT-B-32 with the laion2b_s34b_b79k checkpoint as the base pre-trained OpenCLIP model, which is a good model to start with as it gives us good performance with low latency and memory usage. We have previously published a guide to help you choose the right model for your use case.


# Set up training hyperparameters:
training_params = {
    "leftKeys": ["query"],
    "leftWeights": [1],
    "rightKeys": ["image", "title"],
    "rightWeights": [0.9, 0.1],
    "weightKey": "score",
    "epochs": 5,
}

base_model = "ViT-B-32"
base_checkpoint = "laion2b_s34b_b79k"

model_name = f"{training_data}-model-{suffix}"
print(f"Training a new model ({model_name}):")
tuned_model = marqtune_client.train_model(
    dataset_id=training_dataset.dataset_id,
    model_name=f"{training_data}-model-{suffix}",
    instance_type=InstanceType.BASIC,
    base_model=f"Marqo/{base_model}.{base_checkpoint}",
    hyperparameters=training_params,
    wait_for_completion=True,
)

The training_params dictionary is used to define the training hyperparameters. We've chosen a minimal set of hyperparameters to get you started; primarily, the left/right keys define the columns in the input CSV that we're training on. You can experiment with these parameters yourself; refer to the Training Parameters documentation for details on these and other parameters available for training.
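
To build some intuition for how these parameters fit together (a rough conceptual sketch only; Marqtune's internal implementation may differ), the left side of each training pair is the query embedding, the right side is a weighted combination of the image and title embeddings, and the score column (weightKey) weights how strongly each pair contributes to the loss:


import numpy as np

# Conceptual sketch only; not Marqtune's internals. Stand-in embeddings:
rng = np.random.default_rng(0)
query_emb, image_emb, title_emb = (rng.random(512) for _ in range(3))

left = 1.0 * query_emb                     # leftKeys=["query"], leftWeights=[1]
right = 0.9 * image_emb + 0.1 * title_emb  # rightKeys=["image", "title"], rightWeights=[0.9, 0.1]

# Normalize and compare; the "score" column (weightKey) weights this
# query-result pair's contribution to the contrastive loss.
left = left / np.linalg.norm(left)
right = right / np.linalg.norm(right)
print(f"query-result similarity: {float(left @ right):.3f}")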

You may choose to run the training faster on more powerful hardware by specifying instance_type=InstanceType.PERFORMANCE.

It's also worth noting that once training has been successfully kicked off in Marqtune, it will continue until completion no matter what happens to your local client session. On start, the logs will show the new model ID that identifies your model. Copy this ID so that if your local console disconnects during training, you can resume the rest of this guide after loading the completed model, as shown below:
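

# If your local session disconnects during training, reload the completed
# model by its ID (printed in the training logs) and carry on from there:
tuned_model = marqtune_client.model("<model id>")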

Again, in the UI we can observe the model's name, ID, base model, and status.

Figure 3: The models table in the Marqtune UI becomes populated when we begin training our model.

While training is in progress, we can take a look at the logs. Simply click on the model and you will be taken to a view like the following:

Figure 4: Logs for the model that we are training in the Marqtune UI.

Note that the logs contain information about the training process. Here’s an example:


1721298452795 2024-07-18 10:27:32,795 - INFO - 2024-07-18T10:27:22.521705828Z 2024-07-18,10:27:22 | INFO | Train Epoch: 0 [   256/100000 (0%)] Data (t): 1.996 Batch (t): 6.086, 42.0608/s, 42.0608/s/gpu LR: 0.000000 Logit Scale: 100.003, Logit Bias: 0.000, Txt_img_0_0_loss: 1.5323 (1.5323) Txt_txt_0_1_loss: 2.2189 (2.2189) Weighted_mean_loss: 1.1285 (1.1285) Loss: 1.6030 (1.6030)
1721298602882 2024-07-18 10:30:02,882 - INFO - 2024-07-18T10:29:57.309371399Z 2024-07-18,10:29:57 | INFO | Train Epoch: 0 [ 25856/100000 (26%)] Data (t): 0.742 Batch (t): 1.548, 160.529/s, 160.529/s/gpu LR: 0.000005 Logit Scale: 99.984, Logit Bias: 0.000, Txt_img_0_0_loss: 0.89163 (1.2119) Txt_txt_0_1_loss: 0.40258 (1.3107) Weighted_mean_loss: 0.61996 (0.87425) Loss: 0.70145 (1.1522)
1721298762976 2024-07-18 10:32:42,975 - INFO - 2024-07-18T10:32:34.148183677Z 2024-07-18,10:32:34 | INFO | Train Epoch: 0 [ 51456/100000 (52%)] Data (t): 0.766 Batch (t): 1.568, 161.327/s, 161.327/s/gpu LR: 0.000010 Logit Scale: 99.969, Logit Bias: 0.000, Txt_img_0_0_loss: 0.89120 (1.1050) Txt_txt_0_1_loss: 0.32536 (0.98227) Weighted_mean_loss: 0.66933 (0.80594) Loss: 0.69427 (0.99957)
1721298913063 2024-07-18 10:35:13,063 - INFO - 2024-07-18T10:35:10.632618682Z 2024-07-18,10:35:10 | INFO | Train Epoch: 0 [ 77056/100000 (77%)] Data (t): 0.762 Batch (t): 1.565, 167.016/s, 167.016/s/gpu LR: 0.000015 Logit Scale: 99.943, Logit Bias: 0.000, Txt_img_0_0_loss: 1.0737 (1.0972) Txt_txt_0_1_loss: 0.36037 (0.82679) Weighted_mean_loss: 0.74134 (0.78979) Loss: 0.81226 (0.95274)
1721299053145 2024-07-18 10:37:33,145 - INFO - 2024-07-18T10:37:29.910381986Z 2024-07-18,10:37:29 | INFO | Train Epoch: 0 [ 99840/100000 (100%)] Data (t): 0.762 Batch (t): 1.565, 165.425/s, 165.425/s/gpu LR: 0.000019 Logit Scale: 99.911, Logit Bias: 0.000, Txt_img_0_0_loss: 0.97796 (1.0733) Txt_txt_0_1_loss: 0.27951 (0.71734) Weighted_mean_loss: 0.60969 (0.75377) Loss: 0.71128 (0.90445)

Here we see information about the epoch, data and batch timings, learning rate, logit scale, logit bias, text-image loss, text-text loss, weighted mean loss, and total loss.

5. Evaluation

Once we've successfully tuned the model, we will want to quantify its performance against the baseline set by the original base model. To do this, we can have Marqtune run an evaluation on the original base model with the evaluation dataset to establish a baseline, and then run a subsequent evaluation with the same dataset on the last checkpoint generated by our freshly tuned model.

Finally, we will print out the results of each evaluation which should show the tuned model returning better performance numbers than the base model.


eval_params = {
    "leftKeys": ["query"],
    "leftWeights": [1],
    "rightKeys": ["image", "title"],
    "rightWeights": [0.9, 0.1],
    "weightKey": "score",
}

print("Evaluating the base model:")
base_model_eval = marqtune_client.evaluate(
    dataset_id=eval_dataset.dataset_id,
    model=f"Marqo/{base_model}.{base_checkpoint}",
    hyperparameters=eval_params,
    wait_for_completion=True,
)

print("Evaluating the tuned model:")
tuned_model_id = tuned_model.model_id
tuned_checkpoint = tuned_model.describe()["checkpoints"][-1]
tuned_model_eval = marqtune_client.evaluate(
    dataset_id=eval_dataset.dataset_id,
    model=f"{tuned_model_id}/{tuned_checkpoint}",
    hyperparameters=eval_params,
    wait_for_completion=True,
)


# Convenience function to inspect evaluation logs and extract the results
def print_eval_results(description, evaluation):
    results = next(
        (
            json.loads(log["message"][index:].replace("'", '"'))
            for log in evaluation.logs()[-10:]
            if (index := log["message"].find("{'mAP@1000': ")) != -1
        ),
        None,
    )
    print(description)
    print(json.dumps(results, indent=4))


print_eval_results("Evaluation results from base model:", base_model_eval)
print_eval_results("Evaluation results from tuned model:", tuned_model_eval)

Again, we've chosen a minimal set of hyperparameters for the evaluation tasks, and you can read about these in the Evaluation Parameters documentation.

Due to the inherent stochasticity of training and evaluation, the results you see will likely differ from our measurements, but you should see improvements similar to those below (higher numbers are better):

Figure 5: Results from our fine-tuning with Marqtune.

Picking out one of the above metrics, NDCG@10 (Normalized Discounted Cumulative Gain, a measure of the model's ranking and retrieval quality that compares its top 10 retrievals with the ground truth), we can see that our tuned model performed better than the base model. Similarly, the other metrics show consistent improvements. Refer to our blog post on Generalised Contrastive Learning for Multimodal Retrieval and Ranking for more information, as well as an explanation of each of the metrics above.
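
For intuition, here is a minimal sketch of how NDCG@k can be computed (our own illustration; Marqtune computes these metrics for you during evaluation):


import math

# relevances: the ground-truth scores of the top-k results, in the order
# the model returned them.
def dcg(relevances):
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg_at_k(relevances, k=10):
    ideal = dcg(sorted(relevances, reverse=True)[:k])
    return dcg(relevances[:k]) / ideal if ideal > 0 else 0.0

# Example: the model's top-5 retrievals have these ground-truth relevances.
print(ndcg_at_k([3, 2, 3, 0, 1], k=5))  # ~0.97: close to the ideal ordering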

Let’s take a look at what we can expect from the Marqtune UI. When both evaluations are complete, you will see the following:

Figure 6: The evaluations table in the Marqtune UI becomes populated when we begin evaluations.

We can click on either evaluation to view its logs, which show metrics such as Recall@1 through Recall@1000, among others.
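
The same logs are also available programmatically via the logs() method we used in print_eval_results above:


# Print the tail of the tuned model's evaluation logs:
for log in tuned_model_eval.logs()[-20:]:
    print(log["message"])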

Figure 7: Logs for the evaluations in the Marqtune UI.

6. Download and Cleanup

At this point, you can download the model to your local disk:


tuned_model.download()

This will download the model locally as a checkpoint (.pt) file. From here you can choose to create a Marqo index with this custom model.
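
As a quick local sanity check (a sketch assuming you have the open_clip_torch package installed; replace the placeholder path with your downloaded .pt file), you can load the checkpoint directly into OpenCLIP:


import open_clip

# Load the downloaded checkpoint into an OpenCLIP ViT-B-32 model.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="<path to downloaded .pt file>"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")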

Finally, you can optionally clean up the resources you generated:


training_dataset.delete()
eval_dataset.delete()
tuned_model.delete()
base_model_eval.delete()
tuned_model_eval.delete()

Conclusion

This article has guided you through the process of fine-tuning a base OpenCLIP model using a multi-modal training dataset with Marqtune. We evaluated the performance of the newly fine-tuned model and found significant improvements compared to the base model. Marqtune can be used to fine-tune a variety of different models. Try it yourself, today!

Code

GitHub

Google Colab

Ishaaq Chandy
Senior Principal Engineer at Marqo