
The Emergence of Two-Tower Models in Recommendation Systems

Chapter 1: Understanding Recommender Systems

Recommender systems are among the most widespread applications of machine learning. Yet the ranking models that underpin them often suffer from various biases, which can significantly diminish the quality of the recommendations they produce. Developing unbiased rankers, a problem known as Unbiased Learning to Rank (ULTR), remains a crucial and far-from-solved area of ML research.

In this discussion, we will delve into a specific modeling technique that has recently enabled the industry to manage biases effectively, resulting in vastly improved recommender systems: the two-tower model. In this architecture, one tower is dedicated to learning relevance, while the other, a shallower tower, focuses on understanding biases.

The concept of two-tower models has likely been in use within the industry for several years, but it was formally introduced to the broader ML community through Huawei’s 2019 PAL paper.

Section 1.1: The PAL Paper and Position Bias

Huawei's PAL paper (short for "position-bias aware learning framework") addresses the issue of position bias, specifically within the context of the Huawei app store. Position bias refers to the phenomenon where users tend to click on items that are ranked higher, which can skew the results of recommender systems. This bias can arise from various factors, such as users' time constraints or their implicit trust in the ranking algorithms.

To illustrate this bias, consider the following plot from Huawei’s research:

Demonstration of position bias in ranking models

Position bias complicates our ability to determine whether a user clicked on an item because it was genuinely the most relevant option or simply due to its higher position in the ranking. The goal of recommender systems is to optimize for the former, not the latter.

The PAL paper proposes a solution to this issue by decomposing the learning problem into two components:

p(click|x,position) = p(click|seen,x) * p(seen|position)

Here, x represents the feature vector, while seen indicates whether the user has viewed the impression. In this model, seen is influenced solely by the item's position, although other variables can also be incorporated.
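To see what this factorization implies, here is a toy numeric sketch (all probabilities are hypothetical, chosen only for illustration): even items of identical relevance collect very different observed click rates once examination probability decays with rank.

```python
# Toy illustration of the factorization (all numbers are hypothetical):
# p(click | x, position) = p(click | seen, x) * p(seen | position)

# Examination probability decays with rank: users rarely see low positions.
p_seen = {1: 0.95, 2: 0.70, 3: 0.45, 4: 0.25}

# Four equally relevant items, one shown at each of positions 1-4.
p_click_given_seen = 0.30

for pos in sorted(p_seen):
    observed_ctr = p_click_given_seen * p_seen[pos]
    print(f"position {pos}: observed CTR = {observed_ctr:.3f}")
# Identical items collect very different click rates purely from position.
```

A model trained naively on these observed clicks would conclude the item at position 1 is the most relevant, which is exactly the confound PAL is designed to remove.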

From this framework, a two-tower model can be constructed, with each tower outputting one of the probabilities mentioned above.

Framework of the two-tower model demonstrating neural networks

The towers are essentially neural networks: a shallow network for the bias tower, which processes a limited number of features, and a deeper network for the engagement tower, which handles a more extensive set of features and their interactions. Importantly, during inference, when position data is unavailable, only the engagement tower is utilized, contrasting with the training phase where both towers operate.
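The training-versus-inference asymmetry can be sketched in a few lines of plain Python. This is a minimal toy, not the paper's actual architecture: the tower sizes, random weights, and function names are all illustrative, and no training loop is shown.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Engagement tower: a tiny one-hidden-layer MLP over the feature vector x.
# (Sizes and weights are illustrative, not the paper's architecture.)
W1 = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
w2 = [random.gauss(0, 1) for _ in range(8)]

def engagement_tower(x):
    hidden = [max(0.0, sum(xi * W1[i][j] for i, xi in enumerate(x)))
              for j in range(8)]                            # ReLU hidden layer
    return sigmoid(sum(h * w for h, w in zip(hidden, w2)))  # p(click | seen, x)

# Bias tower: a shallow model over the position alone.
pos_logit = [random.gauss(0, 1) for _ in range(10)]

def bias_tower(position):
    return sigmoid(pos_logit[position])                     # p(seen | position)

x = [random.gauss(0, 1) for _ in range(16)]

# Training: the towers' outputs are multiplied to model p(click | x, position).
p_train = engagement_tower(x) * bias_tower(position=2)

# Inference: the item's position is not yet known, so only the engagement
# tower is scored.
p_serve = engagement_tower(x)
```

Note that because the bias tower's output is a probability below 1, the training-time product is always smaller than the serving-time relevance score.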

Does this approach yield results? Absolutely. The authors of PAL developed two variants of the DeepFM ranking model—one incorporating PAL and another using a more basic method of treating item position. Their online A/B tests revealed that the PAL model enhanced both click-through rates and conversion rates by approximately 25%, a significant improvement.

The PAL study demonstrated that positions could serve as inputs for ranking models, but they should be processed through a dedicated tower rather than the main model. This principle was subsequently included as Rule 36 in Google's "Rules of ML."

Chapter 2: YouTube's Additive Two-Tower Model

The "Watch Next" paper released by YouTube around the same period as PAL also aimed to mitigate biases in recommender systems through a two-tower model. However, unlike PAL, YouTube employed an additive two-tower model rather than a multiplicative one.

To understand this approach, consider the factorization of the learning objective again:

p(click|x,position) = p(click|seen,x) * p(seen|position)

Since each probability can be written as the sigmoid of a logit, we can move into logit space and model the combined score as a sum of logits rather than a product of probabilities:

logit(click|x,position) = logit(click|seen,x) + logit(seen|position)

This represents the additive two-tower model. One notable advancement in YouTube's approach is the inclusion of various features beyond just position in the shallow tower. For instance, user device type can also be factored in, as different devices may exhibit different patterns of position bias—smaller screens on mobile devices may lead to more pronounced biases.

To ensure that the shallow tower effectively utilizes these diverse features, the authors incorporated a Dropout layer with a 10% dropout probability on the position feature. This step is crucial, as it prevents the model from overly depending on the position feature alone and encourages learning of other potential biases.
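The additive combination and the feature-level dropout can be sketched as follows. This is a simplified illustration with hypothetical lookup-table values; in the actual paper both towers are learned networks, and the position term would be a trained embedding.

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Shallow-tower inputs, sketched as lookup tables (values are hypothetical).
pos_logit    = [0.9, 0.5, 0.2, 0.1]             # bias contribution per rank
device_logit = {"mobile": 0.4, "desktop": 0.1}  # device-dependent bias

def shallow_tower_logit(position, device, training=False):
    pos_term = pos_logit[position]
    # Drop the position feature with probability 0.1 during training so the
    # tower cannot rely on position alone and must learn other bias sources.
    if training and random.random() < 0.1:
        pos_term = 0.0
    return pos_term + device_logit[device]

def predict(relevance_logit, position, device, training=False):
    # Additive two-tower model: the two logits are summed, then squashed.
    return sigmoid(relevance_logit + shallow_tower_logit(position, device, training))

p = predict(relevance_logit=1.2, position=0, device="mobile", training=True)
```

With the dropout disabled at evaluation time, the prediction is deterministic, and a higher-ranked position always yields a larger predicted click probability for the same relevance logit.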

Through A/B testing, the authors demonstrated that the addition of the shallow tower increased their engagement metric by 0.24%.

Section 2.1: Disentangling Relevance and Bias

The assumption that the two towers in ULTR can learn independently during model training is not entirely accurate, as highlighted in a recent Google paper titled "Towards Disentangling Relevance and Bias in Unbiased Learning to Rank."

Consider a thought experiment: if a perfect relevance model exists—one that can accurately predict clicks—then, upon retraining with new data, the bias tower would only need to map positions to clicks, rendering the relevance tower ineffective.

This scenario illustrates the confounding effect of the relevance tower on the bias tower. To mitigate this issue, the authors propose adding a Dropout layer to the bias logit, nudging the model to depend more on the relevance tower.

The rationale behind this strategy is that by randomly dropping the bias logit, the model is encouraged to prioritize the relevance tower rather than merely learning the historical mapping of positions to relevance. This concept is reminiscent of YouTube's Watch Next approach, though here, the entire bias logit is dropped rather than just the position feature.
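A minimal sketch of logit-level dropout, assuming standard inverted-dropout scaling so the expected logit is unchanged during training (the scaling detail is an assumption of this sketch, not something the blog post specifies):

```python
import math
import random

random.seed(0)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def combined_logit(relevance_logit, bias_logit, p_drop=0.5, training=True):
    # With probability p_drop the *entire* bias logit is zeroed during
    # training, forcing the gradient through the relevance tower. Inverted
    # dropout scaling keeps the expected logit unchanged.
    # (A sketch of the idea, not the paper's exact formulation.)
    if training:
        if random.random() < p_drop:
            bias_logit = 0.0
        else:
            bias_logit = bias_logit / (1.0 - p_drop)
    return relevance_logit + bias_logit

p_train = sigmoid(combined_logit(1.5, 0.8, training=True))
p_serve = sigmoid(1.5)  # serving uses the relevance tower alone
```

Contrast this with the Watch Next approach above: there, only the position feature inside the shallow tower is dropped; here, the shallow tower's entire output is dropped.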

For optimal results, the dropout probability for the logit should be substantial: with a 50% dropout rate, the authors observed a 1% improvement in click NDCG over PAL, utilizing production data from the Chrome Web Store, a compelling validation of this straightforward technique.

Conclusion

In summary, the two-tower model is a robust approach for developing unbiased ranking models, where one tower focuses on relevance and the other on position bias (and potentially other biases). The outputs from these towers can be combined either through multiplication (as shown in Huawei’s PAL) or addition (as utilized in YouTube’s Watch Next).

The input to the bias tower typically includes the item's position, but additional features that may introduce bias, such as user device type, can also be integrated. Utilizing Dropout techniques helps prevent over-reliance on historical positions, thereby enhancing model generalization.

This is merely the beginning. Given the pivotal role of recommender systems in the tech industry and the gains two-tower modeling has already delivered, this research area is likely to see further advances in the years ahead.

If you found this information valuable and wish to stay updated on the latest developments in machine learning technologies, consider subscribing.
