Mean Reciprocal Rank (MRR) is a metric that measures the effectiveness of a ranking system by focusing on the position of the first relevant result in the ranked list returned by the model. It provides insight into how efficiently a system retrieves the top relevant item for a query, which is especially useful in applications like search engines, recommendation systems, and question-answering systems.
To calculate MRR, you first need to compute the reciprocal rank for each query. This is defined as:
$$ \text{Reciprocal Rank} = \frac{1}{\text{rank of the first relevant result}} $$
The MRR is the average of the reciprocal ranks across all queries:
$$ \text{MRR} = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \frac{1}{\text{rank}_i} $$
where \(Q\) is the set of queries, \(|Q|\) is the total number of queries, and \(\text{rank}_i\) is the rank position of the first relevant result for the \(i\)-th query.
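The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names and the document IDs are hypothetical, and scoring a query with no relevant result as 0 is a common convention that is assumed here rather than stated in the text.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1 / rank of the first relevant item in `ranked`."""
    for position, item in enumerate(ranked, start=1):  # ranks are 1-based
        if item in relevant:
            return 1.0 / position
    return 0.0  # assumption: no relevant item found contributes 0


def mean_reciprocal_rank(
    results: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal ranks over all queries."""
    if len(results) != len(relevants) or not results:
        raise ValueError("need one non-empty, equally long entry per query")
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(results, relevants))
    return total / len(results)


# Hypothetical data: one ranked list and one set of relevant IDs per query.
results = [["d3", "d1", "d7"], ["d2", "d9"], ["d5", "d6", "d8", "d4"]]
relevants = [{"d1"}, {"d2"}, {"d4"}]
print(round(mean_reciprocal_rank(results, relevants), 3))  # 0.583
```

Only the first relevant hit in each list matters, so any further relevant items lower in a ranking do not change the score.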
Let’s say we have three queries whose first relevant results appear at ranks 2, 1, and 4, giving reciprocal ranks of \(1/2 = 0.5\), \(1/1 = 1\), and \(1/4 = 0.25\).
Now, the MRR is the mean of these reciprocal ranks:
$$ \text{MRR} = \frac{1}{3}\big( 0.5 + 1 + 0.25\big) = \frac{1.75}{3} \approx 0.583 $$
In this example, an MRR of \(0.583\) suggests that the first relevant item typically appears near the top of the list, between the first and second positions (\(1/0.583 \approx 1.7\)).
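As a quick check of the arithmetic, the snippet below recomputes the example from its ranks. The ranks 2, 1, and 4 are the ones implied by the reciprocal ranks 0.5, 1, and 0.25; the variable names are illustrative only. It also prints \(1/\text{MRR}\), which is the harmonic mean of the ranks, next to the ordinary mean rank, to show that the two readings of "average position" differ.

```python
ranks = [2, 1, 4]  # implied by the reciprocal ranks 0.5, 1 and 0.25

reciprocal_ranks = [1 / r for r in ranks]
mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)

implied_rank = 1 / mrr               # harmonic mean of the ranks
mean_rank = sum(ranks) / len(ranks)  # arithmetic mean of the ranks

print(f"MRR       = {mrr:.3f}")           # MRR       = 0.583
print(f"1 / MRR   = {implied_rank:.2f}")  # 1 / MRR   = 1.71
print(f"mean rank = {mean_rank:.2f}")     # mean rank = 2.33
```

Because reciprocal ranks reward top positions disproportionately, the implied rank (about 1.7) sits closer to the top than the arithmetic mean rank (about 2.3).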
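When working with MRR, you might see it specified with an “@” symbol followed by a number, like MRR@10 or MRR@100. The number is a cutoff: only that many top results are examined for each query. For example, MRR@10 looks for the first relevant result within the top 10 positions, and MRR@100 within the top 100. A query whose first relevant result falls beyond the cutoff typically contributes a reciprocal rank of 0.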
These metrics provide insight into how quickly a system presents a relevant result to users within a fixed number of top positions in the results list.
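A cutoff variant can be sketched as follows. The function names and document IDs are hypothetical, and the sketch assumes the usual convention that a query with no relevant result inside the top \(k\) scores 0.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank_at_k(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: int
) -> float:
    """Reciprocal rank of the first relevant item within the top k results."""
    if k < 1:
        raise ValueError("k must be a positive integer")
    for position, item in enumerate(ranked[:k], start=1):  # inspect top k only
        if item in relevant:
            return 1.0 / position
    return 0.0  # assumption: a miss within the top k contributes 0


def mrr_at_k(
    results: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Mean of the cutoff reciprocal ranks over all queries."""
    if len(results) != len(relevants) or not results:
        raise ValueError("need one non-empty, equally long entry per query")
    total = sum(
        reciprocal_rank_at_k(r, rel, k) for r, rel in zip(results, relevants)
    )
    return total / len(results)


# Hypothetical data: the third query's first relevant item sits at rank 4.
results = [["d3", "d1", "d7"], ["d2", "d9"], ["d5", "d6", "d8", "d4"]]
relevants = [{"d1"}, {"d2"}, {"d4"}]
print(mrr_at_k(results, relevants, k=3))             # 0.5
print(round(mrr_at_k(results, relevants, k=10), 3))  # 0.583
```

With \(k = 3\) the third query is counted as a miss, so the score drops from about 0.583 to 0.5; a smaller cutoff can only lower or preserve the score, never raise it.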
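MRR is particularly useful for systems where users care more about the first relevant result than about later ones. Such scenarios include search engines, question-answering systems, and recommendation systems, where a user often acts on the first good result.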
MRR is a powerful, easy-to-understand metric that provides insight into a model’s ability to rank the first relevant item at the top. While it has limitations (most notably, it ignores every relevant result after the first), its simplicity makes it an excellent starting point for evaluating search and recommendation systems where the relevance of the first result is crucial. By tracking MRR, you can better understand your model’s strengths and identify areas for improvement, ultimately leading to better user satisfaction in your applications.