NDCG stands for Normalized Discounted Cumulative Gain. It’s a metric that evaluates the quality of a ranked list returned by a search engine or recommendation algorithm, rewarding rankings that place the most relevant items highest. This metric is particularly useful when the goal is to ensure that the most relevant items appear as close to the top of the list as possible, so that the most useful information is quickly accessible to the user.
NDCG is given by the following formula:
$$ NDCG@K = \frac{DCG@K}{IDCG@K} $$
This formula contains Discounted Cumulative Gain (DCG) and Ideal Discounted Cumulative Gain (IDCG). Let’s discuss these two components.
DCG is given by:
$$ DCG@K = \sum^{K}_{i=1}\frac{\text{relevance score of the item at }i}{\log_2(i+1)} $$
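This formula contains two key components: the relevance score of the item at position \(i\), which is the gain, and the logarithmic discount \(\log_2(i+1)\), which reduces the contribution of items that appear further down the list.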
If you have a list of relevance scores \([3, 2, 3, 0, 1]\) and want to calculate DCG@5:
$$ DCG@5 = \frac{3}{\log_2(1+1)} + \frac{2}{\log_2(2+1)} + \frac{3}{\log_2(3+1)} + \frac{0}{\log_2(4+1)} + \frac{1}{\log_2(5+1)} $$
Calculating each term gives \(3 + 1.262 + 1.5 + 0 + 0.387 \approx 6.149\), a cumulative score that reflects both relevance and position sensitivity for the top 5 results.
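The DCG calculation above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation; the function name `dcg_at_k` is a hypothetical helper chosen for this example.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the first k relevance scores."""
    # Positions start at 1 so the discount matches log2(i + 1) in the formula.
    return sum(
        rel / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

scores = [3, 2, 3, 0, 1]
print(round(dcg_at_k(scores, 5), 3))  # 6.149
```

Note that the item at position 1 is divided by \(\log_2(2) = 1\), so the top result is never discounted.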
IDCG is the theoretical maximum DCG that you can achieve for a specific list of results. It is calculated by sorting the items into an ideal sequence, with the most relevant results at the top of the list, and then applying the DCG calculation.
The formula for this can be written as:
$$ IDCG@K = \sum^{K}_{i=1}\frac{\text{relevance score of the ideal item at }i}{\log_2(i+1)} $$
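Again, this formula contains the same two components: a relevance score and a logarithmic discount. The only difference is that the relevance scores are taken from the ideal ordering, sorted from most to least relevant.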
If you’re calculating IDCG@5 with relevance scores sorted in ideal order as \([3, 3, 2, 1, 0]\) then:
$$ IDCG@5 = \frac{3}{\log_2(1+1)} + \frac{3}{\log_2(2+1)} + \frac{2}{\log_2(3+1)} + \frac{1}{\log_2(4+1)} + \frac{0}{\log_2(5+1)} $$
This ideal DCG score provides a benchmark for normalizing DCG to calculate NDCG, allowing you to measure how close a ranked list is to the ideal.
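As a rough sketch, IDCG can be computed by sorting the relevance scores in descending order and reusing the DCG calculation. The helper names `dcg_at_k` and `idcg_at_k` are hypothetical, and the sketch assumes the ideal ordering is built from the same scores that appear in the ranked list.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the first k relevance scores."""
    return sum(
        rel / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

def idcg_at_k(relevances, k):
    """DCG of the same scores after sorting them from most to least relevant."""
    ideal_order = sorted(relevances, reverse=True)
    return dcg_at_k(ideal_order, k)

print(round(idcg_at_k([3, 2, 3, 0, 1], 5), 3))  # 6.323
```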
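Now that we've established the two key components of NDCG, let's take a look at an example. Imagine a search query where you return a ranked list of items with the following relevance scores: \([3, 2, 3, 0, 1]\). Let’s say we’re evaluating NDCG@5. We follow three steps. First, calculate DCG@5 for the list as returned, which gives approximately 6.149. Second, sort the scores into the ideal order \([3, 3, 2, 1, 0]\) and calculate IDCG@5, which gives approximately 6.323. Third, divide the two:
$$ NDCG@5 = \frac{DCG@5}{IDCG@5} \approx \frac{6.149}{6.323} \approx 0.972 $$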
The resulting NDCG@5 score would indicate how well the system performed compared to an ideal ordering within the first five items. This process can be repeated for different positions (e.g., NDCG@10 or NDCG@100) to gain a broader view of ranking quality at varying list depths.
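The full calculation for this example might look like the following in Python. This is a sketch under two assumptions: the ideal ordering is derived from the returned list itself (in practice it may be built from all judged items for the query), and a list with no relevant items is given a score of 0.0 to avoid dividing by zero. The function names are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the first k relevance scores."""
    return sum(
        rel / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """DCG of the ranking divided by the DCG of its ideal reordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal == 0:
        # Assumption: no relevant items at all, so report 0.0.
        return 0.0
    return dcg_at_k(relevances, k) / ideal

print(round(ndcg_at_k([3, 2, 3, 0, 1], 5), 3))  # 0.972
```

A score of 1.0 means the returned order already matches the ideal order within the first K positions.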
When working with NDCG, you'll often see it specified with an “@” symbol followed by a number (e.g., NDCG@10, NDCG@100). This number indicates the depth of the result list being evaluated. For instance, NDCG@10 evaluates only the top 10 results, while NDCG@100 evaluates the top 100.
These metrics give insight into how well the system ranks relevant results within specific ranges of the results list.
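To illustrate how the cutoff changes the score, the sketch below evaluates the same ranked list at several depths. The helper names are hypothetical, and the ideal ordering is again assumed to come from the returned list itself.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted Cumulative Gain over the first k relevance scores."""
    return sum(
        rel / math.log2(i + 1)
        for i, rel in enumerate(relevances[:k], start=1)
    )

def ndcg_at_k(relevances, k):
    """NDCG truncated at depth k; 0.0 when there is nothing relevant."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

ranked = [3, 2, 3, 0, 1]
for k in (1, 3, 5):
    # Only the first k positions contribute to each score.
    print(f"NDCG@{k} = {ndcg_at_k(ranked, k):.3f}")
```

Here NDCG@1 is a perfect 1.0 because the top result is one of the most relevant items, while deeper cutoffs expose the misordering at positions 2 and 3.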
NDCG is a powerful metric for understanding the relevance and quality of ordered results, making it a staple for evaluating search engines and recommendation systems. By focusing on both the relevance of items and their positions, NDCG provides a nuanced picture of how well a system meets user needs.
To understand how Marqo can help improve the relevance and quality of your results, book a demo with our team today.