Building a smarter home feed
The home feed should be a reflection of what each user cares about. Content is sourced from inputs such as people and boards the user follows, interests, and recommendations. To ensure we maintain fast, reliable and personalized home feeds, we built the smart feed with the following design values in mind:
1. Different sources of Pins should be mixed together at different rates.
2. Some Pins should be selectively dropped or deferred until a later time. Some sources may produce Pins of poor quality for a user, so instead of showing everything available immediately, we can be selective about what to show and what to hold back for a future session.
3. Pins should be arranged best-first rather than newest-first. For some sources, newer Pins are intuitively better, while for others, newness is less important.
The architecture behind smart feed
We moved from our previous time-ordered home feed system to a more flexible one. The core feature of the smart feed architecture is its separation of available but unseen content from content that’s already been presented to the user. We use our knowledge of what the user hasn’t yet seen when deciding how the feed evolves over time.
Smart feed is a composition of three independent services, each of which has a specific role in the construction of a home feed.
The role of the smart feed worker
The smart feed worker is the first to process Pins and has two primary responsibilities: to accept incoming Pins and assign each a score proportional to its quality or value to the receiving user, and to store these scored Pins for later consumption.
Essentially, the worker manages Pins as they become newly available, such as those from the repins of the people the user follows. Pins have varying value to the receiving user, so the worker is tasked with deciding the magnitude of their subjective quality.
Incoming Pins are currently obtained from three separate sources: repins made by followed users, related Pins, and Pins from followed interests. Each is scored by the worker and then inserted into a pool for that particular type of Pin. Each pool is a priority queue sorted on score and belongs to a single user. Newly added Pins mix with those added before, so the highest quality Pins remain accessible at the front of the queue over time.
Pools can be implemented in a variety of ways so long as the priority queue requirement is met. We choose to do this by exploiting the key-based sorting of HBase. Each key is a combination of user, score and Pin such that, for any user, we may scan a list of available Pins according to their score. Newly added triples will be inserted at their appropriate location to maintain the score order. This combination of user, score, and Pin into a key value can be used to create a priority queue in other storage systems aside from HBase, a property we may use in the future depending on evolving storage requirements.
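The key layout described above can be sketched in a few lines of Python. This is an illustration only, not our production schema: the field widths, the integer score range, the score inversion (so that an ascending key scan returns the highest score first), and the names `make_key`, `SortedKeyStore` and `top_pins` are all assumptions made for the example. The in-memory store simply stands in for any system that keeps keys in sorted order.

```python
import bisect
import struct

# Assumption: scores are integers in [0, MAX_SCORE].
MAX_SCORE = 2**32 - 1


def make_key(user_id: int, score: int, pin_id: int) -> bytes:
    """Pack (user, score, Pin) so byte order is: user asc, score desc, Pin asc."""
    inverted = MAX_SCORE - score  # higher scores get smaller byte values
    return struct.pack(">QIQ", user_id, inverted, pin_id)


def parse_key(key: bytes):
    """Recover (user_id, score, pin_id) from a packed key."""
    user_id, inverted, pin_id = struct.unpack(">QIQ", key)
    return user_id, MAX_SCORE - inverted, pin_id


class SortedKeyStore:
    """Hypothetical in-memory stand-in for a store that keeps keys sorted."""

    def __init__(self):
        self._keys = []

    def put(self, key: bytes) -> None:
        # Insert at the position that preserves sorted order; skip duplicates.
        i = bisect.bisect_left(self._keys, key)
        if i == len(self._keys) or self._keys[i] != key:
            self._keys.insert(i, key)

    def scan_prefix(self, prefix: bytes, limit: int):
        # Start at the first key >= prefix and read while the prefix matches.
        i = bisect.bisect_left(self._keys, prefix)
        out = []
        while i < len(self._keys) and len(out) < limit:
            if not self._keys[i].startswith(prefix):
                break
            out.append(self._keys[i])
            i += 1
        return out


def top_pins(store: SortedKeyStore, user_id: int, n: int):
    """Return up to n (score, pin_id) pairs for a user, best first."""
    prefix = struct.pack(">Q", user_id)
    return [parse_key(k)[1:] for k in store.scan_prefix(prefix, n)]


store = SortedKeyStore()
store.put(make_key(1, 10, 100))
store.put(make_key(1, 50, 101))
store.put(make_key(1, 30, 102))
store.put(make_key(2, 99, 200))
best_for_user_1 = top_pins(store, 1, 2)
```

Because the user is the leading component of the key, a prefix scan never crosses into another user's Pins, and because the score is the second component, the scan order is the priority order.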
Smart feed content generator
Distinct from the smart feed worker, the smart feed content generator is concerned primarily with defining what “new” means in the context of a home feed. When a user accesses the home feed, we ask the content generator for new Pins since their last visit. The generator decides the quantity, composition, and arrangement of new Pins to return in response to this request.
The content generator assembles available Pins into chunks for consumption by the user as part of their home feed. The generator is free to choose any arrangement based on a variety of input signals, and may elect to use some or all of the Pins available in the pools. Pins that are selected for inclusion in a chunk are thereafter removed from the pools so they cannot be returned as part of subsequent chunks.
The content generator is generally free to perform any rearrangements it likes, but is bound to the priority queue nature of the pools. When the generator asks for n Pins from a pool, it’ll get the n highest scoring (i.e., best) Pins available. Therefore, the generator doesn’t need to concern itself with finding the best available content, but instead with how the best available content should be presented.
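As a rough sketch of this division of labor, the example below models each pool as a heap and builds a chunk by taking a fixed quota from each pool and interleaving the results. The per-source quotas and the round-robin arrangement are assumptions chosen for illustration; the real generator may use any signals and any arrangement. The names `Pool` and `generate_chunk` are hypothetical.

```python
import heapq


class Pool:
    """One user's priority queue for a single source of Pins."""

    def __init__(self):
        self._heap = []  # entries are (-score, pin_id) so the best pops first

    def add(self, pin_id: str, score: float) -> None:
        heapq.heappush(self._heap, (-score, pin_id))

    def take(self, n: int):
        """Remove and return up to n (pin_id, score) pairs, best first."""
        out = []
        while self._heap and len(out) < n:
            neg_score, pin_id = heapq.heappop(self._heap)
            out.append((pin_id, -neg_score))
        return out

    def __len__(self) -> int:
        return len(self._heap)


def generate_chunk(pools: dict, quotas: dict) -> list:
    """Take quotas[name] Pins from each pool, then interleave round-robin."""
    taken = {name: pools[name].take(q) for name, q in quotas.items()}
    chunk = []
    while any(taken.values()):
        for name in quotas:
            if taken[name]:
                pin_id, _score = taken[name].pop(0)
                chunk.append(pin_id)
    return chunk


follow, related = Pool(), Pool()
for pin_id, score in [("a", 0.9), ("b", 0.5), ("c", 0.7)]:
    follow.add(pin_id, score)
for pin_id, score in [("x", 0.8), ("y", 0.2)]:
    related.add(pin_id, score)

pools = {"follow": follow, "related": related}
chunk = generate_chunk(pools, {"follow": 2, "related": 1})
```

Pins that made it into the chunk are gone from the pools, while the lower scoring ones ("b" and "y" here) stay behind and may surface in a later chunk or be outranked by newer arrivals.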
Smart feed service
In addition to providing high availability of the home feed, the smart feed service is responsible for combining new Pins returned by the content generator with those that previously appeared in the home feed. We can separate these into the chunk returned by the content generator and the materialized feed managed by the smart feed service.
The materialized feed represents a frozen view of the feed as it was the last time the user viewed it. To the materialized Pins we add the Pins from the content generator in the chunk. The service makes no decisions about order; instead, it adds the Pins in exactly the order given by the chunk. Because it has a fairly low rate of reading and writing, the materialized feed is likely to suffer from fewer availability events. In addition, feeds can be trimmed to restrict them to a maximum size. The need for less storage means we can easily increase the availability and reliability of the materialized feed through replication and the use of faster storage hardware.
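A minimal sketch of this merge step follows. It assumes that new Pins are placed ahead of the previously materialized ones and that trimming drops the oldest entries from the end; neither detail is specified above, and the function name `merge_chunk` is hypothetical.

```python
def merge_chunk(materialized: list, chunk: list, max_size: int) -> list:
    """Add a chunk to the materialized feed without reordering either part.

    Assumption: the chunk goes in front, and trimming removes the tail.
    """
    merged = list(chunk) + list(materialized)
    return merged[:max_size]


previous_feed = ["p3", "p2", "p1"]
new_chunk = ["p5", "p4"]
feed = merge_chunk(previous_feed, new_chunk, max_size=4)
```

The service never inspects scores here: whatever order the content generator chose for the chunk is preserved as is.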
The smart feed service relies on the content generator to provide new Pins. If the generator experiences a degradation in performance, the service can gracefully handle the loss of its availability. In the event the content generator encounters an exception while generating a chunk, or if it simply takes too long to produce one, the smart feed service will return the content contained in the materialized feed. In this instance, the feed will appear to the end user as unchanged from last time. Future feed views will produce chunks as large as, or larger than, the last so that eventually the user will see new Pins.
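The fallback behavior can be illustrated with a small sketch built on Python's standard `concurrent.futures` module. The thread pool, the timeout value, and the names `serve_feed`, `slow_generator` and `failing_generator` are assumptions for this example and say nothing about how the service is actually implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

_executor = ThreadPoolExecutor(max_workers=4)


def serve_feed(materialized: list, generate_chunk, timeout_s: float) -> list:
    """Return new Pins plus the materialized feed, or just the latter on failure."""
    future = _executor.submit(generate_chunk)
    try:
        chunk = future.result(timeout=timeout_s)
    except FutureTimeout:
        # The generator took too long: serve the feed unchanged.
        future.cancel()
        return list(materialized)
    except Exception:
        # The generator raised: serve the feed unchanged.
        return list(materialized)
    return list(chunk) + list(materialized)


def healthy_generator():
    return ["p5", "p4"]


def slow_generator():
    time.sleep(0.5)
    return ["p5", "p4"]


def failing_generator():
    raise RuntimeError("content generator unavailable")


saved = ["p3", "p2", "p1"]
fresh = serve_feed(saved, healthy_generator, timeout_s=1.0)
after_timeout = serve_feed(saved, slow_generator, timeout_s=0.05)
after_error = serve_feed(saved, failing_generator, timeout_s=1.0)
```

In both failure cases the caller gets back exactly the saved feed, which is what lets the home feed stay available while the content generator recovers.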
By moving to smart feed, we achieved the goals of a highly flexible architecture and better control over the composition of home feeds. The home feed is now powered by three separate services, each with a well-defined role in its production and distribution. The individual services can be altered or replaced with components that serve the same general purpose. The use of pools to buffer Pins according to their quality allows us a greater amount of control over the composition of home feeds.
Continuing with this project, we intend to better model users’ preferences with respect to Pins in their home feeds. The accuracy of our recommendations varies considerably across our user base, and we would benefit from using preference information gathered from recent interactions with the home feed. Knowledge of personal preference will also help us order home feeds so the Pins of most value can be discovered with the least amount of effort.
If you’re interested in tackling challenges and making improvements like this, join our team!
Chris Pinchak is a software engineer at Pinterest.
Acknowledgements: This technology was built in collaboration with Dan Feng, Dmitry Chechik, Raghavendra Prabhu, Jeremy Carroll, Xun Liu, Varun Sharma, Joe Lau, Yuchen Liu, Tian-Ying Chang, and Yun Park. This team, as well as people from across the company, helped make this project a reality with their technical insights and invaluable feedback.