In a vector database, embedding size plays a crucial role in determining the operating cost. While smaller embeddings can lead to higher efficiency and lower cost, they also offer less granularity. It would be beneficial for a vector database to provide flexible embedding sizes catering to different end-users. In this blog post, we discuss a technique that allows such flexibility.
Matryoshka Representation Learning (MRL) is a technique that allows flexible embedding sizes with minimal model adjustments. Once a model generates a fixed-size embedding for a sample, the first several dimensions are reused to form separate, smaller embeddings. For example, if the original embedding has 512 dimensions, we can take the first 256, 128, and 64 dimensions to create three smaller embeddings. This group of user-selected dimensions (i.e., {512, 256, 128, 64}) is referred to as the dimension set.
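To make the nesting concrete, here is a minimal sketch (illustrative code, not from any actual MRL or GCL codebase) that slices a full embedding into its Matryoshka sub-embeddings, assuming embeddings are L2-normalized for cosine similarity:

```python
import numpy as np

# Hypothetical full embedding of size 512 (random here, for illustration only).
rng = np.random.default_rng(0)
full = rng.standard_normal(512)

# An example dimension set: the original size plus three nested sub-sizes.
dimension_set = [512, 256, 128, 64]

# Each Matryoshka embedding is simply the first d dimensions of the full
# embedding, re-normalized so it can be compared with cosine similarity.
matryoshka = {}
for d in dimension_set:
    sub = full[:d]
    matryoshka[d] = sub / np.linalg.norm(sub)

# Before normalization, smaller embeddings are prefixes of larger ones.
assert np.allclose(full[:64], full[:128][:64])
```

Because each sub-embedding is just a prefix, a vector database can store the full embedding once and serve any size in the dimension set by truncation.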
When using MRL, a model needs to be trained with losses computed on each sub-dimension embedding. By doing so, the model learns to condense the most important information into the smaller dimensions. In this blog, we trained Generalized Contrastive Learning (GCL), our model that extends CLIP to allow multiple representations per sample, with MRL on a subset of GS-Marqo-10M. For more information on GCL and the dataset, refer to our blog.
For each dimension in the dimension set, we extract the Matryoshka representation for image and text features. These representations are used to calculate a loss for each sub-dimension. The final loss is a weighted sum of these losses. The weights, referred to as relative importance scales, can be either ones or predetermined weights.
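The weighted loss described above can be sketched as follows. This is an illustrative NumPy implementation, not the actual GCL training code: `info_nce` stands in for the (more elaborate) GCL contrastive loss, and the dimension set, temperature, and batch size are example values:

```python
import numpy as np

def info_nce(img, txt, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of paired embeddings."""
    img = img / np.linalg.norm(img, axis=1, keepdims=True)
    txt = txt / np.linalg.norm(txt, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    labels = np.arange(len(img))  # matching pairs lie on the diagonal

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    return 0.5 * (xent(logits) + xent(logits.T))

def mrl_loss(img, txt, dimension_set=(512, 256, 128, 64), scales=None):
    """Weighted sum of per-sub-dimension losses (MRL).

    `scales` are the relative importance scales; by default all ones.
    """
    if scales is None:
        scales = [1.0] * len(dimension_set)
    total = 0.0
    for d, w in zip(dimension_set, scales):
        # Truncate both modalities to the first d dimensions and score them.
        total += w * info_nce(img[:, :d], txt[:, :d])
    return total

rng = np.random.default_rng(0)
img = rng.standard_normal((8, 512))  # dummy image features
txt = rng.standard_normal((8, 512))  # dummy text features
loss = mrl_loss(img, txt)
```

Setting all scales to one treats every sub-dimension equally; predetermined weights instead let you emphasize the sizes you expect end-users to rely on most.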
We compared the retrieval and ranking performance of GCL with and without MRL across in-domain, novel query, novel document, and zero-shot splits. We measure the relative reduction in normalized discounted cumulative gain (nDCG) as the embedding size decreases, with the original embedding's performance serving as the 100% baseline. GCL trained with MRL maintains nDCG across the different splits, while GCL without MRL suffers a significant drop. This demonstrates that embedding sizes can be reduced with minimal impact on performance when the model is trained with MRL.
A crucial question is whether GCL trained with MRL performs as well as the original GCL without MRL at the full embedding size. This would confirm that a single MRL-trained model can be used even without reducing the embedding size, offering the flexibility to either reduce or keep the original size. Otherwise, separate models would need to be trained, limiting MRL's practical use. Our findings show that GCL with and without MRL perform similarly at the original embedding size.
Important notes: MRL is known to encourage faster model convergence, which makes it misleading to compare GCL with and without MRL by merely adding the MRL losses during training. In our experiments, we decreased the number of epochs and modified the relative importance scales to establish a setting where both variants show similar in-domain performance. Therefore, the results should not be interpreted as MRL damaging in-domain performance while enhancing novel query, novel document, and zero-shot performance; rather, they indicate that both achieve comparable performance.
There are crucial hyperparameters, such as the dimension set and relative importance scales, as well as architecture choices to consider. Here are some observations on them:
Please note that Method #2 bypasses the MRL truncation process, so it is only loosely connected to MRL. Despite this, we have included it as a straightforward baseline for comparison. Method #1 shows advantages in the novel query split but underperforms in the other splits. In contrast, Method #2 performs best in the novel document and zero-shot splits at smaller embedding sizes. The decision to include a projection linear layer is not clear-cut and may be worth experimenting with, depending on the specific use case.
The original paper introduced "adaptive retrieval" for retrieval tasks. This method uses the smaller dimensions to retrieve document candidates, which are then reranked using the larger dimensions. In this blog, however, we only considered a first-stage retrieval system and did not apply adaptive retrieval. Implementing it could potentially improve performance further.
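Adaptive retrieval can be sketched as a two-stage pipeline. The function below is a hypothetical NumPy illustration under our own assumptions (the function name, `small_dim`, and `shortlist` parameters are ours, not from the paper): score all documents cheaply with a small embedding prefix, then rerank only a shortlist with the full embedding.

```python
import numpy as np

def adaptive_retrieve(query, docs, small_dim=64, shortlist=100, top_k=10):
    """Two-stage retrieval: shortlist with a small prefix, rerank with full dims."""
    def normalize(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    # Stage 1: cheap cosine scoring using only the first `small_dim` dimensions.
    coarse = normalize(docs[:, :small_dim]) @ normalize(query[:small_dim])
    candidates = np.argsort(-coarse)[:shortlist]

    # Stage 2: rerank the shortlist with the full-size embeddings.
    fine = normalize(docs[candidates]) @ normalize(query)
    return candidates[np.argsort(-fine)[:top_k]]

rng = np.random.default_rng(0)
docs = rng.standard_normal((1000, 512))
query = docs[42] + 0.01 * rng.standard_normal(512)  # near-duplicate of doc 42
result = adaptive_retrieve(query, docs)
```

Most of the cost is paid at `small_dim` over the whole corpus, while the expensive full-dimension comparison touches only the shortlist, which is how adaptive retrieval recovers accuracy without scanning every document at full size.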
In this blog, we analyzed the effectiveness of MRL for a CLIP-based model, GCL, in multimodal retrieval and ranking. Our findings indicate that training the model with MRL can mitigate performance degradation when the embedding size decreases. We also examined the impact of changes in hyperparameters and the model's architecture on its performance. Overall, MRL could enable users to select the embedding sizes for the same model without incurring additional computational cost. However, careful consideration and thorough experimentation are crucial to optimize the model's performance.