The official Pinterest engineering blog.

A lot goes on in the backend when a person clicks the Pin It button. Thumbnails of all sizes are generated, the board thumbnail is updated, and a Pin is fanned out to those who follow the Pinner or the board. We also evaluate if a Pin should be added to a category feed, check for spam, index for search, and so on.

These jobs are critically important but don’t all need to happen before we can acknowledge success back to the user. This is where an asynchronous job execution system comes in, where we need to enqueue one or more jobs to execute these actions at a later time and rest assured they will eventually be executed. Another use case is when a large batch of jobs needs to be scheduled and executed with retries for resiliency toward temporary backend system unavailability, such as a workflow to generate and send emails to millions of Pinners each week. Here’s a look at how we developed an asynchronous job execution system in-house, which we call PinLater.

Evaluating options

We had originally implemented a solution based on Pyres for this purpose; however, it had several limitations:

  • Job execution was best effort, i.e. there was no success acknowledgement (ACK) mechanism.
  • There was a lack of visibility into the status of individual job types, since all jobs were lumped together into a single set of nine priority queues.
  • The system wasn’t entirely configurable or manageable, e.g. no ability to throttle job execution or configure retries.
  • It was tied to Redis as the storage backend, and only worked for jobs written in Python, both restrictions we could no longer accept.
  • It didn’t have built-in support for scheduled execution of jobs at a specific time in the future, a feature that some of our jobs needed.

We looked at a few other open source queue or publish/subscribe system implementations, but none provided the minimum feature set we needed, such as time-based scheduling with priorities and reliable ACKs, or could properly scale. Amazon Simple Queue Service (SQS) would likely meet many of our requirements, but for such a critical piece of infrastructure, we wanted to operate it ourselves and extend the feature set as needed, which is why we developed PinLater.

Designing for execution of asynchronous jobs

In building PinLater, we kept the following design points in mind:

  • PinLater is a Thrift service to manage scheduling and execution of asynchronous jobs. Its API surface consists of three core actions: enqueue, dequeue and ACK.
  • PinLater is agnostic to the details of a job. From its point of view, the job body is just an opaque sequence of bytes. Each job is associated with a queue and a priority level, as well as a timestamp called run_after that defines the minimum time at which the job is eligible to run (by default, jobs are eligible to run immediately, but this can be overridden to be a time in the future).
  • When a job is enqueued, PinLater sends it to a backend store to keep track of it. When a dequeue request comes in, it satisfies the request by returning the highest priority jobs that are eligible to run at that time, based on run_after timestamps. Typically there are one or more worker pools associated with each PinLater cluster, which are responsible for executing jobs belonging to some subset of queues in that cluster. Workers continuously grab jobs, execute them and then reply to PinLater with a positive or negative ACK, depending on whether the execution succeeded or failed.
  • In our use of PinLater, each job type maps 1:1 to a specific queue. The interpretation of the job body is a contract between the enqueuing client(s) and the worker pool responsible for that queue. This 1:1 mapping isn’t mandated by PinLater, but we have found it to be operationally very useful in terms of managing jobs and having good visibility into their states.
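
The three-call surface can be sketched with an in-memory toy (class and method names here are illustrative, not the real Thrift interface; in this sketch a lower priority number means higher priority):

```python
import time
import uuid

class PinLaterSketch:
    """Toy in-memory model of PinLater's enqueue/dequeue/ACK surface."""

    def __init__(self):
        self.jobs = {}  # job_id -> job dict

    def enqueue(self, queue, body, priority=1, run_after=None):
        # The job body is an opaque sequence of bytes; PinLater never inspects it.
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = {
            "queue": queue,
            "body": body,
            "priority": priority,
            # By default jobs are eligible to run immediately.
            "run_after": run_after if run_after is not None else time.time(),
            "state": "PENDING",
        }
        return job_id

    def dequeue(self, queue, limit=1):
        # Return the highest-priority jobs whose run_after time has passed.
        now = time.time()
        eligible = [
            (jid, j) for jid, j in self.jobs.items()
            if j["queue"] == queue and j["state"] == "PENDING"
            and j["run_after"] <= now
        ]
        eligible.sort(key=lambda item: (item[1]["priority"], item[1]["run_after"]))
        claimed = eligible[:limit]
        for jid, j in claimed:
            j["state"] = "RUNNING"
        return [(jid, j["body"]) for jid, j in claimed]

    def ack(self, job_id, success):
        self.jobs[job_id]["state"] = "SUCCEEDED" if success else "FAILED"
```

A worker pool would then loop dequeue, execute, ACK against this surface.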

Job state machine

A newly enqueued job starts in state PENDING. When it becomes eligible for execution (based on priority and its run_after timestamp), it can be dequeued by a worker, at which point its state changes to RUNNING.

If the worker completed the execution successfully, it will send a success ACK back, and the job will move to a terminal SUCCEEDED state. Succeeded jobs are retained in PinLater for diagnostics purposes for a short period of time (usually a day) and then garbage collected.

If the job execution failed, the worker will send a failure ACK back, at which point PinLater will check if the job has any retries available. If so, it will move the job back to PENDING. If not, the job goes into a terminal FAILED state. Failed jobs stay around in PinLater for diagnostics purposes (and potentially manual retries) for a few days. When a job is first enqueued, a numAttemptsAllowed parameter is set to control how many retries are allowed. PinLater allows the worker to optionally specify a delay when it sends a failure ACK. This delay can be used to implement arbitrary retry policies per job, e.g. constant delay retry, exponential backoff, or a combination thereof.

If a job was dequeued (claimed) by a worker and it didn’t send back an ACK within a few minutes, PinLater considers the job lost and treats it as a failure. At this point, it will automatically move the job to PENDING or FAILED state depending on whether retries are available.
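
The ACK and claim-timeout transitions can be sketched as follows (a sketch only: field names like attempts_remaining are invented, not PinLater's actual schema):

```python
import time

def handle_ack(job, success, retry_delay=0.0, now=None):
    """Apply the state transitions described above to a job dict."""
    now = time.time() if now is None else now
    if success:
        job["state"] = "SUCCEEDED"   # terminal; garbage collected after a day
    elif job["attempts_remaining"] > 0:
        # Failure with retries left: back to PENDING, optionally delayed
        # by the worker-specified retry delay.
        job["attempts_remaining"] -= 1
        job["run_after"] = now + retry_delay
        job["state"] = "PENDING"
    else:
        job["state"] = "FAILED"      # terminal; kept a few days for diagnostics
    return job

def handle_claim_timeout(job, now=None):
    """A claimed job that never ACKed is treated as a failed attempt."""
    return handle_ack(job, success=False, now=now)
```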

The garbage collection of terminal jobs as well as the claim timeout handling is done by a scheduled executor within the PinLater thrift server. This executor also logs statistics for each run, as well as exports metrics for longer term analysis.

PinLater’s Python worker framework

In addition to the PinLater service, we provide a Python worker framework that implements the PinLater dequeue/ACK protocol and manages execution of python jobs. Adding a new job involves a few lines of configuration to tell the system which PinLater cluster the job should run in, which queue it should use, and any custom job configuration (e.g. retry policy, number of execution attempts). After this step, the engineer can focus on implementing the job logic itself.
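
A job definition under this framework might look like the following (the framework's real configuration surface isn't public, so the base class and attribute names here are hypothetical):

```python
class PinLaterJob:
    """Minimal stand-in for the worker framework's job base class."""
    cluster = None
    queue = None
    num_attempts_allowed = 3

    def run(self, body):
        raise NotImplementedError

class GenerateThumbnailsJob(PinLaterJob):
    # A few lines of configuration...
    cluster = "pinlater-media"        # which PinLater cluster to run in
    queue = "generate_thumbnails"     # 1:1 job-type-to-queue mapping
    num_attempts_allowed = 5          # custom retry configuration

    # ...then the engineer focuses on the job logic itself.
    def run(self, body):
        pin_id = body.decode()
        return "thumbnails for %s" % pin_id
```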

While the Python framework has enabled smooth transition of jobs from the earlier system and continues to support the vast majority of new jobs, some of our clients have implemented PinLater workers in other languages like Java and C++. PinLater’s job agnostic design and simple Thrift protocol have made this relatively straightforward to do.

Implementation details

The PinLater Thrift server is written in Java and leverages Twitter’s Finagle RPC framework. We currently provide two storage backends: MySQL and Redis. MySQL is used for relatively low throughput use cases and those that schedule jobs over long periods and thus can benefit from storing jobs on disk rather than purely in memory. Redis is used for high throughput job queues that are normally drained in real time.

MySQL was chosen for the disk-backed backend since it provides the transactional querying capability needed to implement a scheduled job queue. As one might expect, lock contention is an issue, and we use several strategies to mitigate it, including a separate table for each priority level, use of UPDATE … LIMIT instead of SELECT FOR UPDATE for the dequeue selection query, and carefully tuned schemas and secondary indexes to fit this type of workload.
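
The claim-then-read dequeue pattern can be sketched with Python's built-in sqlite3 as a stand-in (the schema is invented for illustration; MySQL supports UPDATE … LIMIT directly, which stock SQLite does not, so this sketch claims rows via an id subquery instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs_p1 (          -- one table per priority level
        id INTEGER PRIMARY KEY,
        state TEXT NOT NULL DEFAULT 'PENDING',
        run_after REAL NOT NULL,
        body BLOB
    )""")
# Secondary index tuned for the dequeue selection query.
conn.execute("CREATE INDEX pending_idx ON jobs_p1 (state, run_after)")
conn.execute("INSERT INTO jobs_p1 (run_after, body) VALUES (0, ?)", (b"pin:1",))
conn.execute("INSERT INTO jobs_p1 (run_after, body) VALUES (5, ?)", (b"pin:2",))

def dequeue(limit, now):
    # Claim rows with a single UPDATE rather than SELECT FOR UPDATE,
    # which reduces lock contention under concurrent dequeuers.
    conn.execute(
        """UPDATE jobs_p1 SET state = 'RUNNING'
           WHERE id IN (SELECT id FROM jobs_p1
                        WHERE state = 'PENDING' AND run_after <= ?
                        ORDER BY run_after LIMIT ?)""",
        (now, limit))
    return conn.execute(
        "SELECT id, body FROM jobs_p1 WHERE state = 'RUNNING'").fetchall()

claimed = dequeue(limit=1, now=1.0)  # only pin:1 is eligible yet
```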

Redis was chosen for the in-memory backend due to the sophisticated support it has for data structures like sorted sets. Being single threaded, lock contention is not an issue with Redis, but we did have to implement optimizations to make this workload efficient, including the use of Lua scripting to reduce unnecessary round trips.
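
The sorted-set selection logic can be illustrated in pure Python (a stand-in only; the real backend keeps the set in Redis, scored by run_after, and pops eligible members atomically in a single Lua script to avoid extra round trips):

```python
import heapq

class SortedSetQueue:
    """Pure-Python stand-in for a Redis sorted set keyed by run_after."""

    def __init__(self):
        self._heap = []  # (score, job_id) pairs, lowest score first

    def add(self, job_id, run_after):
        heapq.heappush(self._heap, (run_after, job_id))

    def pop_eligible(self, now, limit=1):
        # In production this read-and-remove would be one atomic Lua call.
        popped = []
        while self._heap and len(popped) < limit and self._heap[0][0] <= now:
            _, job_id = heapq.heappop(self._heap)
            popped.append(job_id)
        return popped

q = SortedSetQueue()
q.add("job-a", run_after=10.0)
q.add("job-b", run_after=5.0)
print(q.pop_eligible(now=7.0))  # → ['job-b']; job-a is not yet eligible
```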

Horizontal scaling is provided by sharding the backend stores across a number of servers. Both backend implementations use a “free” sharding scheme (shards are chosen at random when enqueueing jobs). This makes adding new shards trivial and ensures well balanced load across shards. We implement a shard health monitor that keeps track of the health of each individual shard and pulls out of rotation shards that are misbehaving either due to machine failure, network issues or even deadlock (in the case of MySQL). This monitor has proven invaluable in automatically handling operational issues that could otherwise result in high error rates and paging an on-call operator.
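
A minimal sketch of the enqueue-time shard choice, assuming the health monitor maintains a simple set of healthy shard names (the real monitor's interface is certainly richer):

```python
import random

def pick_shard(shards, healthy):
    """'Free' sharding: choose uniformly at random among healthy shards."""
    candidates = [s for s in shards if s in healthy]
    if not candidates:
        raise RuntimeError("no healthy shards available")
    return random.choice(candidates)

shards = ["shard-001", "shard-002", "shard-003"]
# The health monitor has pulled shard-002 out of rotation.
healthy = {"shard-001", "shard-003"}
```

Because shards are interchangeable at enqueue time, adding a new shard is just a matter of appending it to the list and marking it healthy.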

Production experience

PinLater has been in use in production for months now, and our legacy Pyres based system was fully deprecated in Q1 2014. PinLater runs hundreds of job types at aggregate processing rates of over 100,000 jobs per second. These jobs vary significantly across multiple parameters, including running time, frequency, CPU versus network intensity, job body size, programming language, online versus offline enqueueing, and needing near real time execution instead of being scheduled hours in advance. It would be fair to say nearly every action taken on Pinterest or notification sent relies on PinLater at some level. The service has grown to be one of Pinterest’s most mission critical and widely used pieces of infrastructure.

Our operational model for PinLater is to deploy independent clusters for each engineering team or logical groupings of jobs. There are currently around 10 clusters, including one dedicated for testing and another for ad hoc one-off jobs. The cluster-per-team model allows better job isolation and, most importantly, allows each team to configure alerting thresholds and other operational parameters as appropriate for their use case. Nearly every operational issue that arises with PinLater tends to be job specific or due to availability incidents with one of our backend services. Thus having alerts handled directly by the teams owning the jobs usually leads to faster resolution.

Observability and manageability

One of the biggest pain points of our legacy job queuing system was that it was hard to manage and operate. As a result, when designing PinLater, we paid considerable attention to how we could improve on that aspect.

Like every service at Pinterest, PinLater exports a number of useful stats about the health of the service that we incorporate into operational dashboards and graphs. In addition, PinLater has a cluster status dashboard that provides a quick snapshot of how the cluster is doing.

PinLater also provides two features that have greatly helped improve manageability: per-queue rate limiting and configurable retry policies. Per-queue rate limiting allows an operator to limit the dequeue rate on any queue in the system, or even stop dequeues completely, which can help alleviate load quickly on a struggling backend system, or prevent a slow high priority job from starving other jobs. Support for configurable retry policies allows deployment of a policy that’s appropriate to each use case. Our default policy allows 10 retries, with the first five using linear delay, and the rest using exponential backoff. This policy allows the system to recover automatically from most types of sustained backend failures and outages. Job owners can configure arbitrary other policies as suitable to their use case as well.
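
The default policy above can be sketched as a delay schedule (the 60-second base delay is our assumption for illustration, not Pinterest's actual setting):

```python
def default_retry_delay(retry_number, base=60.0):
    """Delay before retry N: ten retries total, the first five with
    linear delay, the rest with exponential backoff."""
    if retry_number <= 5:
        return base * retry_number              # linear: 1x, 2x, ... 5x
    return base * 5 * 2 ** (retry_number - 5)   # exponential from there

delays = [default_retry_delay(n) for n in range(1, 11)]
```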

We hope to open source PinLater this year. Stay tuned!

Want an opportunity to build and own large scale systems like this? We’re hiring!

Raghavendra Prabhu is a software engineer at Pinterest.

Acknowledgements: The core contributors to PinLater were Raghavendra Prabhu, Kevin Lo, Jiacheng Hong and Cole Rottweiler. A number of engineers across the company provided useful feedback, either directly about the design or indirectly through their usage, that was invaluable in improving the service.


As part of an ongoing series, engineers will share a bit of what life is like at Pinterest. Here, Engineering Manager Makinde Adeagbo talks about his early years as an engineer, recent projects, and how he spends his time outside of work.

How did you get involved with CS?

I first started programming on my graphing calculator in middle school—just simple games or programs to solve math equations. Later on in high school, I got hooked on building games in C++. It was a great feeling—all you needed was a computer and determination…with that, the sky’s the limit.

How would you describe Pinterest’s engineering culture?

We GO! If you have an idea, go build and show it to people. The best way to end a discussion is to put the working app in someone’s hand and show that it’s possible.

What’s your favorite Pinterest moment?

Alongside a team, I launched Place Pins in November 2013. We had an event at the office to show off the result of lots of hard work by engineers, designers, and others from across the company. The launch went smoothly and we were able to get some sleep after many long nights.

How do you use Pinterest? What are your favorite things to Pin?

I Pin quite a few DIY projects. A recent one was a unique mix of a coding challenge and wood glue to make some nice looking coasters.

How do you spend your time outside of work?

I’m a runner, and have been since elementary school. Over the years I’ve progressed from sprinting to endurance running. It’s a great way to relax and reflect on the day. All I need is some open road and my running shoes.

What’s your latest interest?

I’ve recently started learning about free soloing, a form of free climbing where the climber forgoes ropes and harnesses. It’s spectacular to watch. There’s also deep water soloing, which involves climbing cliffs over bodies of water so falling off is fun, and you can just climb back on the cliffs.

Fun fact?

I’ve been known to jump over counter tops from a standstill.

Interested in working with engineers like Makinde? Join us!


We launched Place Pins a little over six months ago, and in that time we’ve been gathering feedback from Pinners and making product updates along the way, such as adding thumbnails of the place image on maps and the ability to filter searches by Place Boards. The newest feature is a faster, smarter search for Web and iOS that makes it easier to add a Place Pin to the map.

There are now more than one billion travel Pins on Pinterest, more than 300 unique countries and territories are represented in the system, and more than four million Place Boards have been created by Pinners.

Here’s the story of how the Place Pins team built the latest search update.

Supercharging place search

People have been mapping Pins for all types of travel plans, such as trips to Australia, places to watch the World Cup, cycling trips, a European motorcycle adventure, best running spots, and local guides and daycations.

Even with the growth in usage of Place Pins, we knew we needed to make the place search experience more intuitive. In the beginning, the place search interface was based on two distinct inputs: one for the place’s name (the “what”) and another for the search’s geospatial constraint (the “where”). We supported searching within a named city, within the bounds of the current map view, and globally around the world. While powerful, this interface proved non-intuitive for many Pinners. Our research showed Pinners were often providing both the “what” and the “where” in the first input box, just like they do when using our site-wide search interface. With that in mind, we set out to build a more natural place search interface based on just a single text input field.

The result is our one-box place search interface:

We start by attempting to identify any geographic names found within the query string. This step is powered by Twofishes, an open source geocoder written by our friends at Foursquare. Twofishes tokenizes the query string and uses a Geonames-based index to identify named geographic features. These interpretations are ranked based on properties such as geographic bounds, population, and overall data quality.

This process breaks down the original query string into two parts: one that defines the “what”, and one that defines the “where”. It also lets us discard any extraneous connector words like “in” and “near”. For example, given the query string “city hall in san francisco”, the top-ranked interpretation would return “city hall” as the “what” and “san francisco” as the “where” while completely dropping the connector word “in”.
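
The splitting step can be illustrated with a toy gazetteer (Twofishes is a separate geocoding service; this hard-coded stand-in only shows the shape of its output):

```python
GAZETTEER = {"san francisco", "chicago", "boston"}
CONNECTORS = {"in", "near"}

def split_query(query):
    """Return (what, where) by matching the longest gazetteer suffix."""
    tokens = query.lower().split()
    for i in range(len(tokens)):
        candidate = " ".join(tokens[i:])
        if candidate in GAZETTEER:
            # Drop extraneous connector words like "in" and "near".
            what = [t for t in tokens[:i] if t not in CONNECTORS]
            return " ".join(what), candidate
    return query.lower(), None  # no geographic feature identified

print(split_query("city hall in san francisco"))  # → ('city hall', 'san francisco')
```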

Some geographic names are ambiguous, in which case Twofishes returns multiple possible interpretations. By default, we use the top-ranked result, but we also provide a user interface affordance that allows Pinners to easily switch between the alternatives.

Configuring place search

We use the result of the query splitting pass to configure our place search. Foursquare is our primary place data provider, and Foursquare venue search requests can be parameterized to search globally or within a set of geospatial constraints.

A single query can produce multiple venue search requests. Continuing with our example, we would issue one search for “city hall” within the bounds of “san francisco”, as well as a global search for the entire original query string “city hall san francisco”. This approach helps us find places that have geographic names in their place names, like “Boston Market” and “Pizza Chicago”.

We experimented with performing a third search for the full query string within the bounds of the geographic feature (“city hall san francisco” near “san francisco”), but in practice that didn’t yield significantly different results from those returned by the other two searches.

If we don’t identify a geographic feature (e.g. “the white house”), we only issue the global search request.
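
The request fan-out described above can be sketched as follows (the request dicts are illustrative, not Foursquare's actual API parameters):

```python
def build_search_requests(query, what, where):
    """Build the venue searches for a split query."""
    # Always search globally for the full original query string, which
    # catches places with geographic names in them ("Boston Market").
    requests = [{"query": query, "scope": "global"}]
    if where is not None:
        # Search for the "what" within the bounds of the "where".
        requests.append({"query": what, "scope": "local", "near": where})
    return requests

reqs = build_search_requests("city hall san francisco", "city hall", "san francisco")
```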

Blending and ranking results

We gather the results of those multiple search requests and blend them into a single ranked list. This is an important step because Pinners will judge the quality of our place search results based on what’s included in this list and whether their intended place appears near the top. Our current approach takes the top three “global” results, adds the top seven unique “local” results, and then promotes some items closer to the top (based on attributes like venue categorization).
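
A sketch of that blending heuristic, assuming each result is a dict with an id and a venue category (the fields and promotion rule are simplified stand-ins for the real ranking):

```python
def blend(global_results, local_results, promoted_categories=()):
    """Top three global results, then up to seven unique local results,
    with promoted venue categories bubbled toward the top."""
    blended = list(global_results[:3])
    seen = {r["id"] for r in blended}
    for r in local_results:
        if len(blended) >= 10:
            break
        if r["id"] not in seen:
            blended.append(r)
            seen.add(r["id"])
    # Stable sort: promoted categories first, original order otherwise.
    blended.sort(key=lambda r: r["category"] not in promoted_categories)
    return blended

global_results = [{"id": 1, "category": "civic"}, {"id": 2, "category": "food"}]
local_results = [{"id": 2, "category": "food"}, {"id": 3, "category": "civic"}]
out = blend(global_results, local_results, promoted_categories={"civic"})
```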

More to come

In early tests, the new one-box place search interface has been well received by Pinners, and Place Pin creation is higher than ever. The updated place search is now available in the Pinterest iOS app and on our website, and it will make its appearance in our Android app soon.

One-box place search was built by engineers Jon Parise, Connor Montgomery (web) and Yash Nelapati (iOS), and Product Designer Rob Mason, with Product Manager Michael Yamartino.

If you’re interested in working on search and discovery projects like this, join us!

Jon Parise is an engineer at Pinterest.


The security of Pinners is one of our highest priorities, and to keep Pinterest safe, we have teams dedicated to solving issues and fixing bugs. We even host internal fix-a-thons where employees across the company search for bugs so we can patch them before they affect Pinners.

Even with these precautions, bugs get into code. Over the years, we’ve worked with external researchers and security experts who’ve alerted us to bugs. Starting today, we’re formalizing a bug bounty program with Bugcrowd and updating our responsible disclosure, which means we can tap into the more than 9,000 security researchers on the Bugcrowd platform. We hope these updates will allow us to learn more from the security community and respond faster to Whitehats.

This is just the first step. As we gather feedback from the community, we have plans to turn the bug bounty into a paid program, so we can reward experts for their efforts with cash. In the meantime, Whitehats can register, report and get kudos using Bugcrowd. We anticipate a much more efficient disclosure process as a result, and an even stronger and bug-free environment for Pinners!

Paul Moreno is a security engineer at Pinterest.


Marc Andreessen famously said that for startups, “the only thing that matters is getting to product/market fit.” Product/market fit means providing enough value to enough people that the startup can flourish. We believe the key to sustainable growth is putting Pinners first, and finding ways to increase the value people get from Pinterest. That could mean improving the experience for existing Pinners, more effectively communicating the benefit of Pinterest to new users, or improving content for less engaged people. With tens of millions of Pinners, though, it can be a challenge to understand if we’re reaching our goals.

We measure success with four techniques: user state transitions, Xd28s, cohort heat maps, and conversion funnels. This post covers how to understand these different types of metrics and how we use them to identify problem areas and inform our strategy and decision-making on the Growth Team.

Understanding gains and losses with user state transitions

The metric: For this metric we use a simple model with three states to understand the growth of our service: monthly active users (MAUs), dormant Pinners, and new Pinners who just joined. The chart monitors the number of people who move from one state to another on a daily basis. The sum of the four different transitions yields our Net MAU line, which shows the total number of additional MAUs we added that week.

Possible user state transitions are:

  • New signup: When a new person joins Pinterest
  • New -> Dormant: When a new Pinner doesn’t use Pinterest in the 28 days following sign up
  • MAU -> Dormant: Pinner was an MAU, but didn’t use Pinterest for 28 days.
  • Dormant -> MAU: Pinner used Pinterest after having been inactive for 28+ days.
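
The four transitions above can be sketched as a classification over activity dates (a toy model; the real pipeline's inputs are certainly richer than this):

```python
from datetime import date, timedelta

WINDOW = timedelta(days=28)

def classify_transition(signup_date, prev_last_active, active_today, today):
    """Label the transition (if any) a Pinner makes on `today`.
    prev_last_active is the most recent active day before today."""
    if signup_date == today:
        return "new_signup"
    if active_today and today - prev_last_active > WINDOW:
        return "dormant_to_mau"   # active again after 28+ days away
    if not active_today and today - prev_last_active == WINDOW:
        # Crossed the 28-day inactivity threshold today.
        if prev_last_active == signup_date:
            return "new_to_dormant"   # never came back after signing up
        return "mau_to_dormant"
    return None  # no state change today
```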

How we use it: This is one of the most important graphs for the Growth team because it tells us where to focus. By looking at where we’re losing Pinners, and where we’re gaining them, we can decide where to concentrate our efforts to deliver maximum impact. For instance, if we see an increase in the number of new Pinners transitioning to dormant, we know to focus our efforts on better communicating Pinterest’s value in the new user experience during the person’s first week.

Monitoring engagement through Xd28s

The metric: Xd28s are the number of Pinners who have used Pinterest X days in the past 28 days. For instance, 4d28s+ are the number of users that used Pinterest 4 or more days during the past 28.

How we use it: There are many ways people can use Pinterest, so there’s no one specific thing Pinners do to gain value. We use Xd28s as a proxy for the amount of value a person is getting from the service. We segment Pinners into three major categories: 14d28s+ are core Pinners who are deriving a lot of value; 4d28s+ are casual Pinners getting some value; and anyone below 4d28 is a marginal Pinner who’s likely at risk of churning because they’re not receiving much value. By monitoring the ratio between the different groups, we can determine how much value people are getting and see how it changes over time. If one of the less desirable segments (such as marginal or casual users) begins to grow, we can focus on understanding why that’s happening and determine what we can do to fix it.
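
The segmentation reduces to a simple threshold function over active-day counts:

```python
from collections import Counter

def segment_by_xd28(active_days):
    """Map a Pinner's active-day count over the past 28 days to the
    engagement segments described above."""
    if active_days >= 14:
        return "core"      # 14d28+: deriving a lot of value
    if active_days >= 4:
        return "casual"    # 4d28+: getting some value
    return "marginal"      # below 4d28: at risk of churning

# Made-up sample of per-Pinner active-day counts.
counts = Counter(segment_by_xd28(d) for d in [28, 15, 6, 4, 2, 0])
```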

Tracking new user retention with cohort heat maps

The metric: The cohort heat map shows the activity level for new Pinners, where red represents high activity and blue represents low activity. The columns along the x-axis represent the day the person joined, and the rows along the y-axis represent the number of days since they joined. The coloring of a specific square in the graph represents what percentage of Pinners who joined on day X were subsequently active on day Y.

How we use it: The foundation for sustainable growth is retaining users. We use graphs like this to see how our new user retention curve changes over time. When the red and yellow extend further up a column, retention is improving. If the blue and green areas begin to decrease, a retention or new user activation problem has been introduced. In the mock example above, something happened around 2013-04-01 that hurt retention. This graph becomes especially powerful when segmented by gender or locale, which allows for easy identification of segments of the user base where retention can be improved. We can then monitor over time to see if retention is indeed improving.

Understanding Pinner interactions using conversion funnels

The metric: For multi-step flows, conversion funnels measure how many Pinners get to each step of the flow.

How we use it: We use conversion funnels for monitoring landing pages and sharing, invitation, and signup flows. By understanding how people are interacting with the feature and seeing where users are dropping off, we know where to focus our efforts on improving the flow. Sometimes the fix is functional: If someone tries to send a Pin to a friend, but can’t find the friend they are looking for, we can improve the friend recommendations or our typeahead logic. However, Pinners can also drop off in the flow because they don’t understand the value and don’t have enough motivation. At this point, we collaborate with the design team on creative ways to communicate that value. A great example is our current sign up walls on iOS and web, where we show use cases to communicate how people use Pinterest.
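
Per-step conversion is straightforward to compute from step counts (the flow and numbers below are made up for illustration):

```python
def funnel_conversion(step_counts):
    """Per-step conversion rates: of the Pinners who reached step i,
    the fraction that also reached step i + 1."""
    return [cur / prev if prev else 0.0
            for prev, cur in zip(step_counts, step_counts[1:])]

# e.g. landing page -> signup form -> confirmed account
rates = funnel_conversion([1000, 400, 300])
```

A sharp drop at one step tells us exactly where in the flow to focus.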

Putting Pinners first

As you can see, fixing retention issues can be as simple as reminding users what they may be missing out on, or as complicated as rethinking the user experience for a segment of the user base. For us, it always starts and ends with ensuring a great experience for new and existing Pinners. If challenges like this interest you, the Pinterest Growth team is hiring!

John Egan is an engineer on the Growth team.


The core value of Pinterest is to help people find the things they care about, by connecting them to Pins and people that relate to their interests. We’re building a service that’s powered by people, and supercharged with technology.

The interest graph - the connections that make up the Pinterest index - creates bridges between Pins, boards, and Pinners. It’s our job to build a system that helps people to collect the things they love, and connect them to communities of engaged people who share similar interests and can help them discover more. From categories like travel, fitness, and humor, to more niche areas like vintage motorcycles, craft beer, or Japanese architecture, we’re building a visual discovery tool for all interests.

The interests platform is built to support this vision. Specifically, it’s responsible for producing high quality data on interests, interest relationships, and their association with Pins, boards, and Pinners.

Figure 1: Feedback loop between machine intelligence and human curation

In contrast with conventional methods of generating such data, which rely primarily on machine learning and data mining techniques, our system relies heavily on human curation. The ultimate goal is to build a system that’s both machine and human powered, creating a feedback mechanism by which human curated data helps drive improvements in our machine algorithms, and vice versa.

Figure 2: System components

Raw input to the system includes existing data about Pins, boards, Pinners, and search queries, as well as explicit human curation signals about interests. With this data, we’re able to construct a continuously evolving interest dictionary, which provides the foundation to support other key components, such as interest feeds, interest recommendations, and related interests.

Generating the interest dictionary

From a technology standpoint, interests are text strings that represent entities for which a group of Pinners might have a shared passion.

We generated an initial collection of interests by extracting frequently occurring n-grams from Pin and board descriptions, as well as board titles, and filtering these n-grams using custom built grammars. While this approach provided a high coverage set of interests, we found many terms to be malformed phrases. For instance, we would extract phrases such as “lamborghini yellow” instead of “yellow lamborghini”. This proved problematic because we wanted interest terms to represent how Pinners would describe them, and so, we employed a variety of methods to eliminate malformed interests terms.
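
The extraction step can be sketched as an n-gram frequency count (the custom grammar filtering described above is omitted here):

```python
from collections import Counter

def frequent_ngrams(descriptions, n=2, min_count=2):
    """Count n-grams across Pin/board descriptions; the frequent ones
    become interest candidates."""
    counts = Counter()
    for text in descriptions:
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return [g for g, c in counts.items() if c >= min_count]

descriptions = [
    "yellow lamborghini at the show",
    "my dream yellow lamborghini",
    "craft beer flight",
]
print(frequent_ngrams(descriptions))  # → ['yellow lamborghini']
```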

We first compared terms with repeated search queries performed by a group of Pinners over a few months. Intuitively, this criterion matches well with the notion that an interest should be an entity for which a group of Pinners are passionate.

Later we filtered the candidate set through public domain ontologies like Wikipedia titles. These ontologies were primarily used to validate proper nouns as opposed to common phrases, as all available ontologies represented only a subset of possible interests. This is especially true for Pinterest, where Pinners themselves curate special interests like “mid century modern style.”

Finally, we also maintain an internal blacklist to filter abusive words and x-rated terms as well as Pinterest specific stop words, like “love”. This filtering is especially important to interest terms which might be recommended to millions of users.

We arrived at a fair-quality collection of interests following the above algorithmic approaches. To understand the quality of our efforts, we gave a 50,000-term subset of our collection to a third-party vendor that used crowdsourcing to rate our data. To be rigorous, we composed a set of four criteria by which raters would evaluate candidate interest terms:

- Is it English?

- Is it a grammatically valid phrase?

- Is it a standalone concept?

- Is it a proper name?

The crowdsourced ratings were interesting, if somewhat expected. There was a low rate of agreement among raters, with especially high discrepancy in determining whether an interest term represented a “standalone concept.” Despite the ambiguity, we were able to confirm that 80% of the collection generated using the above algorithms satisfied our interest criteria.

This type of effort, however, is not easy to scale. The real solution is to allow Pinners to provide both implicit and explicit signals to help us determine the validity of an interest. Implicit signals include behaviors like clicking and viewing, while explicit signals include asking Pinners to specifically provide information (which can be actions like a thumbs up/thumbs down, starring, or skipping recommendations).

To capture all the signals used for defining the collection of terms, we built a dictionary that stores all the data associated with each interest, including invalid interests and the reason why each is invalid. This service plays a key role in human curation by aggregating signals from different people. On top of this dictionary service, we can build reviewing systems at different levels.

Identifying Pinner interests

With the Interests dictionary, we can associate Pins, boards, and Pinners with representative interests. One of the initial ways we experimented with this was launching a preview of a page where Pinners can explore their interests.

Figure 3: Exploring interests

In order to match interests to Pinners, we need to aggregate all the information related to a person’s interests. At its core, our system recommends interests based upon Pins with which a Pinner interacts. Every Pin on Pinterest has been collected and given context by someone who thinks it’s important, and in doing so, is helping other people discover great content. Each individual Pin is an incredibly rich source of data. As discussed in a previous blog post on discovery data model, one Pin often has multiple copies — different people may Pin it from different sources, and the same Pin can be repinned multiple times. During this process, each Pin accumulates numerous unique textual descriptions, which allow us to connect Pins with interest terms with high precision.

However, this conceptually simple process requires non-trivial engineering effort to scale to the number of Pins and Pinners the service has today. The data processing pipeline (managed by Pinball) comprises more than 35 Hadoop jobs, and runs periodically to update the user-interest mapping to capture users’ latest interest information.

The initial feedback on the explore interests page has been positive, proving the capabilities of our system. We’ll continue testing different ways of exposing a person’s interests and related content, based on implicit signals, as well as explicit signals (such as the ability to create custom categories of interests).

Calculating related interests

Related interests are an important way of enabling the ability to browse interests and discover new ones. To compute related interests, we simply combine the co-occurrence relationship for interests computed at Pin and board levels.

Figure 4: Computing related interests

The quality of the related interests is surprisingly high given the simplicity of the algorithm. We attribute this effect to the cleanness of Pinterest data. Text data on Pins tend to be very concise, and contain less noise than other types of data, like web pages. Also, related interests calculation already makes use of boards, which are heavily curated by people (vs. machines) in regards to organizing related content. We find that utilizing the co-occurrence of interest terms at the level of both Pins and boards provides the best tradeoff between achieving high precision as well as recall when computing the related interests.
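The co-occurrence approach described above can be sketched in a few lines. This is a minimal illustration using board-level input only (the same counting applies at the Pin level); the function name and sample data are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

def related_interests(boards):
    """Rank related interests by co-occurrence count.

    `boards` is a list of sets, each holding the interest terms
    attached to one board. Returns, for each term, its neighbors
    sorted by how often they co-occur with it.
    """
    cooccur = defaultdict(lambda: defaultdict(int))
    for terms in boards:
        # Count every pair of terms appearing on the same board.
        for a, b in combinations(sorted(terms), 2):
            cooccur[a][b] += 1
            cooccur[b][a] += 1
    return {
        term: sorted(neighbors, key=neighbors.get, reverse=True)
        for term, neighbors in cooccur.items()
    }

boards = [
    {"baking", "cake decorating", "desserts"},
    {"baking", "desserts"},
    {"travel", "desserts"},
]
print(related_interests(boards)["desserts"][0])  # "baking"
```

In practice the counts from Pin-level and board-level co-occurrence would be combined, which is where the precision/recall tradeoff mentioned above comes in.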

One of the initial ways we began showing people related content was through related Pins. When you Pin an object, you’ll see a recommendation for a related board with that same Pin so you can explore similar objects. Additionally, if you scroll beneath a Pin, you’ll see Pins from other people who’ve also Pinned that original object. At this point, 90% of all Pins have related Pins, and we’ve seen 20% growth in engagement with related Pins in the last six months.

Powering interest feeds

Interest feeds provide Pinners with a continuous feed of Pins that are highly related. Our feeds are populated using a variety of sources, including search and our annotation pipeline. A key property of the feed is flow. Only feeds with decent flow can attract Pinners to come back repeatedly, thereby maintaining high engagement. To optimize our feeds, we utilize a number of real-time indexing and retrieval systems, including real-time search, real-time annotating, and human curation for some of the interests.

To ensure overall quality, we need to guarantee quality from every source. For that purpose, we measure the engagement of Pins from each source and address quality issues accordingly.

Figure 5: How interest feeds are generated

More to come

Accurately capturing Pinner interests and interest relationships, and making this data understandable and actionable for tens of millions of people (collecting tens of billions of Pins), is not only an engineering challenge, but also a product design one. We’re just at the beginning, as we continue to improve the data and design ways to empower people to provide feedback, allowing us to build a hybrid system that combines machine and human curation to power discovery. The results of these efforts will be reflected in future product releases.

If you’re interested in building new ways of helping people discover the things they care about, join our team!

Acknowledgements: The core team members for the interests backend platform are Ningning Hu, Leon Lin, Ryan Shih and Yuan Wei. Many other folks from other parts of the company, especially the discovery team and the infrastructure teams, have provided very useful feedback and help along the way to make the ongoing project successful.

Ningning Hu is an engineer at Pinterest.


One of the most exciting aspects of working with Pinterest data is its opportunity to connect people with things and ideas they’re interested in. We know that interests change over time, and even day to day. What you’re interested in on Sunday morning when you want an awesome pancake recipe may not align exactly with the travel plans you’re dreaming up on Saturday.

Since one of our goals is to help Pinners find the content that inspires them at any moment, we’re constantly asking ourselves how we can help people discover the things they care about by making the right recommendations at the optimal time. Our answer lies in the data infrastructure we’ve built.

Digging into Pin Trends

We recently looked at aggregate data to see which categories peak throughout the week and which interests were most popular among Pinners at various times.

What we found is that TGIF is real. People start the week off motivated and Pinning mostly to fitness boards on Mondays, technology is popular on Tuesdays, and inspirational quotes see a spike on Wednesdays as people work through hump day. Fashion is big on Thursdays, while people are ready for a laugh on Friday and humor Pins spike. Over the weekend, travel is the top category on Saturday, and the week closes out on Sunday with food and craft ideas.

Improving discovery with context

As new content is created on Pinterest, we can identify the context behind a Pin based on a mix of signals, such as the board in which the Pin was added. Just knowing when an individual Pin is created might not give us too much information on its own, but because hundreds of others may have saved a similar Pin, we can deduce what that Pin is about. With a timestamp for that action, we can track how popular different categories of Pins are at different times of day or across the days of the week.
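As a rough sketch of this kind of aggregation, Pin actions with timestamps can be bucketed by day of week and category. The helper and sample data below are illustrative, not our production pipeline:

```python
from collections import Counter
from datetime import datetime

def category_by_weekday(actions):
    """Count Pin actions per (weekday, category) bucket.

    `actions` is an iterable of (category, unix_timestamp) pairs.
    """
    counts = Counter()
    for category, ts in actions:
        weekday = datetime.utcfromtimestamp(ts).strftime("%A")
        counts[(weekday, category)] += 1
    return counts

# Illustrative sample: two fitness Pins on a Monday, one humor Pin
# on the following Friday.
actions = [
    ("fitness", 1398038400),  # Mon, Apr 21 2014 00:00 UTC
    ("fitness", 1398042000),
    ("humor", 1398384000),    # Fri, Apr 25 2014 00:00 UTC
]
counts = category_by_weekday(actions)
```

Run over billions of actions (in practice, as a Hadoop job rather than in memory), buckets like these are what surface patterns such as fitness peaking on Mondays.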

We can go a level deeper by looking at the context of an action, such as if it was discovered in home feed, category feed, or search. We can use this information to make the product easier to navigate, as well as to build a more relevant recommendation engine.

Using these different sources, we analyzed Pinners’ propensity to engage with different topics by time of day, day of week, and month of the year. Learn more about these Pin trends on our Pinner Blog. If you’re interested in digging into this type of data, join our team!

Andrea Burbank is a data engineer at Pinterest.


Thanks to everyone who came to our Engineering Tech Talks last week at the Pinterest HQ in San Francisco, where we covered:

Mobile & Growth

Scaling user education on mobile, and a deep dive into the NUX using the Experience Framework, with engineers Dannie Chu and Wendy Lu

Monetization & Data

The open sourcing of Pinterest Secor, and a look at zero data loss log persistence services, with engineer Pawel Garbacki

Developing & Shipping Code at Pinterest

The tools and technologies we use to build and deploy confidently, with engineers Chris Danford and Jeremy Stanley

For those who couldn’t make the talks, or would like a refresher, we’ve posted the slides.

Pinterest Engineering Tech Talks - 4/29/14 by Pinterest_Eng

You can always find more information from Pinterest Engineering right on this blog, or on our Facebook Page, where we’ll keep you posted on future tech talks.


As we build products to eventually power Promoted Pins, it’s vital to maintain a no-fail reliable data infrastructure. Today we’re open sourcing Secor, a zero data loss log persistence service whose initial use case was to save logs produced by our monetization pipeline.

Secor persists Kafka logs to long-term storage such as Amazon S3. It’s not affected by S3’s weak eventual consistency model, incurs no data loss, scales horizontally, and optionally partitions data based on date.

Big data at Pinterest

Pinterest is a data-driven company, and at any point in time we track thousands of metrics derived from hundreds of log types. We collect petabytes of data and add tens of terabytes per day.

There are hundreds of Hadoop jobs slicing the data across multiple dimensions to produce reports that track our business metrics and generate derived aggregates that feed into our serving infrastructure.

Pinterest logging pipeline

Our data logging center of gravity is a Kafka cluster. Kafka introduces abstractions that simplify collecting logs, but is only capable of streaming data to local disks and therefore isn’t suitable as a long-term data store.

Logs are stored on Amazon S3, and while it’s a highly reliable and scalable storage solution, S3 comes with the possibility of eventual consistency, meaning there are no guarantees for when uploaded files will become visible to readers. S3 also has non-monotonic properties that may cause files to “disappear” and reappear moments later.

A service you can rely on

Project Secor was born from the need to persist messages logged to Kafka to S3 for long-term storage. Data lost or corrupted at this stage isn’t recoverable, so the greatest design objective for Secor is data integrity.

Mechanisms built into Secor ensure that as long as Kafka doesn’t drop messages before Secor can extract them (due to an aggressive retention policy, for example), every single message will be persisted to S3.

Offset lag tracking health metrics

No-reads principle

Secor works around the limitations of the eventual consistency model by adhering to a principle that it never reads back anything it wrote to S3. It relies on the Kafka consumer offset management protocol to keep track of what’s been uploaded to S3. Kafka stores the underlying metadata in ZooKeeper, while metadata commit points are controlled by Secor and occur at a very low frequency of roughly one update per Kafka partition per hour.

The fact that metadata is stored separately from the data introduces a potential complication of keeping two stores in sync. Secor addresses this issue by enforcing that data is updated before the metadata and by using deterministic S3 paths. Any inconsistency caused by a successful update of the data followed by a failed commit to the metadata store will auto-resolve itself during subsequent state updates.
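The ordering described above (data first, metadata second, over deterministic paths) might be sketched as follows. This is a simplified illustration, not Secor’s actual code; `upload_to_s3` and `commit_offset` are hypothetical stand-ins for the real S3 and Kafka consumer calls:

```python
def persist_batch(topic, partition, start_offset, messages,
                  upload_to_s3, commit_offset):
    """Persist a batch of Kafka messages, data before metadata.

    The S3 path is a deterministic function of (topic, partition,
    start_offset), so re-running the same batch overwrites the same
    key: a failed offset commit self-heals on the next attempt.
    """
    path = "s3://logs/%s/%d/%020d" % (topic, partition, start_offset)
    upload_to_s3(path, messages)                        # 1. data first
    commit_offset(topic, partition,
                  start_offset + len(messages))         # 2. metadata second
    return path
```

Because the writer never needs to read the uploaded object back, S3’s eventual consistency never enters the picture.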

The Benefits of Secor

In addition to guaranteeing data integrity, Secor comes with a number of functional features:

  • Load Distribution: It can be distributed across multiple machines.
  • Horizontal Scalability: Scaling the system out to handle more load is as easy as starting extra processes. Reducing the resource footprint can be achieved by killing any of the running processes. Neither ramping up nor down has any impact on data consistency.
  • Output Partitioning: It parses incoming messages and puts them under partitioned S3 paths to enable direct import into systems like Hive.
  • Configurable Upload Policies: Commit points controlling when data is persisted in S3 are configured through size-based and time-based policies (e.g., upload data when local buffer reaches size of 100MB and at least once per hour).
  • Monitoring: Metrics tracking various performance properties are exposed through Ostrich and optionally exported to OpenTSDB.
  • Customizability: External log message parser may be loaded by updating the configuration.
  • Qubole Interface: It connects to Qubole to add finalized output partitions to Hive tables.
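For example, a size- and time-based upload policy like the one described above might look like the following sketch; the class, thresholds, and names are illustrative, not Secor’s actual configuration API:

```python
import time

class UploadPolicy:
    """Decide when a local buffer should be flushed to S3.

    Upload when the buffer reaches max_bytes (default 100MB) or when
    max_age_secs (default one hour) has passed since the last upload,
    whichever comes first.
    """

    def __init__(self, max_bytes=100 * 1024 * 1024, max_age_secs=3600):
        self.max_bytes = max_bytes
        self.max_age_secs = max_age_secs

    def should_upload(self, buffered_bytes, last_upload_ts, now=None):
        now = now if now is not None else time.time()
        return (buffered_bytes >= self.max_bytes or
                now - last_upload_ts >= self.max_age_secs)
```

The time-based trigger bounds how stale the data on S3 can get, while the size-based trigger keeps individual uploads from growing unboundedly.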

Try it out with us!

The architecture of Secor is flexible enough to fit into various environments and its code has been hardened by production use since the initial rollout in February 2014. You can now access the Secor source code and design details for your own use. If you have any questions or comments, you can reach us at secor-users@googlegroups.com.

If you’re interested in tackling big data challenges like this, join us!

Pawel Garbacki is a software engineer at Pinterest.


As we focus on building a great user experience for the tens of millions of existing Pinners, it’s equally important to engage and retain new Pinners through the new user experience (NUX).

We recently rebuilt our new user experience and created a new framework to power it. Through the process, we determined the best content to show that would educate without overwhelming. Here you’ll learn how we arrived at a NUX that performs significantly better than the previous experience across all of our core engagement metrics.

Rethinking NUX from the ground up

We started by conducting qualitative and quantitative research to better understand new Pinners. The user experience research team interviewed a group of inactive Pinners to understand major pain points, while the data team analyzed a large sample of existing Pinners and determined the core set of actions that would increase the likelihood of retaining a new person joining the site.

After looking at the insights and iterating on dozens of versions, we gathered new learnings about retaining new Pinners:

Demonstrate a simple value proposition that clearly shows off utility. A Pin is our primary value proposition so we immediately educate the person about how Pins work, and their value.

Actualize the value proposition immediately. Searching and discovering Pins is a core feature, so immediately following the Pin step, we give education on how to find and save interesting Pins.

Educate new Pinners at their own pace. The previous Pin and Search steps are mandatory for new Pinners because we’ve found they lead to increased long term engagement. However, if the Pinner doesn’t seem to get it the first time we’ll gently re-educate them on subsequent visits. For example, if he or she still hasn’t saved a Pin on their second visit, we’ll provide reeducation, and conduct the same process for board creation, following, and other features.

Encourage immediate action. Understanding what it means to Pin early substantially increases the likelihood of retaining the new Pinner. He or she will get a simplified experience where Pinning is highlighted and other advanced features are hidden, until they save their first Pin. We call this the First Pin Experience (more on that below).

After becoming active for the first time and saving a Pin, the Pinner will graduate to a richer experience.

The need for a framework

The updated NUX is a multi-session experience that differs based on Pinner state such as what they’ve done, how long they’ve seen an experience, etc. Therefore we needed a system that could control what the Pinner experiences based on those variables. We also needed a way to easily run experiments to test different NUX steps, messaging, and educational units.

Similar experiences already existed, such as new feature tutorials and education. We realized each was powered by its own standalone but similar logic, so we created an Experience Framework to build NUX and power both new and existing experiences.

You can think of an experience as any feature on Pinterest, each of which requires logic to determine when it needs to be shown, persistence logic for when it’s dismissed/completed, and logging (i.e., impressions vs. completions).

Here’s how the logic was laid out in our client and backend:

Here’s how the logic looks with the Experience Framework:

Each experience is configured in one place, the client delegates display logic to the backend, and persistence, logging and experimentation is all powered by the framework.

Boiling down to solutions

The Experience Framework answers one simple question: what experiences should a Pinner see on a given view within the app?

Every time the client renders a view, the Experience Framework will tell the client what experience the Pinner should see. For example, when rendering the home page, the Experience Framework tells the client whether to show a specific step in NUX, a tutorial, or a feed of Pins. How the decision is made is opaque to the client.

The decision engine

The core of the framework is the decision engine, responsible for determining the experience a Pinner should see by considering configured and eligible instances for all potential views. The best experience is then decided upon based on static configuration (start date, seconds_to_expire, max_display_count, etc.), the Pinner’s state (such as number of Pins created, level of engagement, and features experienced), the experience state (enabled, expired, view count, etc.), the client type, experiment group, and many other properties. These decision parameters allow us to build complicated experiences like our First Pin Experience.
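A minimal sketch of such a decision step might look like the following. The field names and eligibility checks are hypothetical, not the framework’s real schema:

```python
def decide(placement_experiences, user_state, now):
    """Return the first eligible experience for a placement.

    `placement_experiences` is assumed to be in priority order;
    falling through every check means the client shows the default
    experience.
    """
    for exp in placement_experiences:
        if not exp["enabled"]:
            continue
        if now > exp["start_ts"] + exp["seconds_to_expire"]:
            continue  # expired
        if exp["display_count"] >= exp["max_display_count"]:
            continue  # shown too many times already
        if not exp["handler"](user_state):
            continue  # handler decides eligibility, e.g. "no Pins yet"
        return exp["name"]
    return None  # fall back to the default experience

first_pin = {
    "name": "WEB_FIRST_PIN_EXP", "enabled": True,
    "start_ts": 0, "seconds_to_expire": 60 * 60 * 24,
    "display_count": 0, "max_display_count": 10,
    "handler": lambda u: u["num_pins"] == 0,
}
print(decide([first_pin], {"num_pins": 0}, now=3600))  # WEB_FIRST_PIN_EXP
```

The real engine also folds in experiment groups and client type, but the shape is the same: filter to eligible instances, then pick the best one.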

To recap, the First Pin Experience is shown to new Pinners who’ve never saved a Pin, and it lasts for no longer than 24 hours. The configuration is simple: set the seconds_to_expire to 24 hours and write a handler that will ensure it’s only enabled for new users with no Pins. In this experience we’ve also configured an experiment to further test whether 24 hours is really the best duration for this experience.

Experience.WEB_FIRST_PIN_EXP: {
    'description': 'First Pin Experience.',
    'start_date': '2013-11-01',
    'seconds_to_expire': 60*60*24,
    'handler': autobahn.FirstPin,
    'experiment': {
        'name': 'first_pin_duration',
        'groups': {
            '1_day': {'seconds_to_expire': 60*60*24, },
            '2_days': {'seconds_to_expire': 60*60*24*2, },
            '7_days': {'seconds_to_expire': 60*60*24*7, }
        }
    }
}

We then associate the experience with a unique placement in the client (web home page).

Placement.WEB_HOME_TAKEOVER: {
    'experiences': [
        Experience.WEB_MANDATORY_AUTOBAHN,
        Experience.WEB_FIRST_PIN_EXP,
        Experience.WEB_FIRST_PIN_USER_ED,
        Experience.WEB_YOUR_BOARDS_USER_ED,
        Experience.WEB_FAST_FOLLOW_USER_ED,
        Experience.WEB_FIND_FRIENDS_USER_ED
    ],
    'cooldown': SESSION_LENGTH  # 2 hour cooldown - session length
}

There could be many eligible experiences on the home page placement. The framework resolves this by guaranteeing only one experience can be shown on that view.

Be fast or fail fast

The Experience Framework is the gatekeeper in determining what experience a Pinner should see, so at its peak it sees about 50,000 decision requests per second, and growing. Since in some cases the view needs to synchronously call the Experience Framework before rendering, it needs to be fast, or at the very least fail fast.

The major bottleneck in our case is I/O, i.e., accessing persistent user and experience state data. We addressed this by:

  • Storing all our data in an HBase cluster that’s highly optimized for retrieving state data
  • Minimizing the number of calls to HBase
  • Making use of gevent

We also applied many of our past learnings from optimizing HBase for fast online reads. As a result, we achieved an upper90 (90th percentile) latency of 30-40ms.

Even with fast response times, it’s important to avoid making any unnecessary backend calls. To achieve this, our clients periodically pull down and cache all displayable experiences, using that cache to decide and render an experience whenever possible. However, keeping this “state of the world” cache up to date can be tricky.

As a last resort, if the decision engine takes too long to respond, we fail fast. This means in the worst case the Pinner will experience the default user experience, which is not ideal, but allows for the Pinner to continue using Pinterest.

Today the Experience Framework powers the majority of experiences on our website. The framework is also steadily powering experiences on our mobile apps, which is exciting because it enables us to dynamically render experiences as we run experiments without pushing a new release. You can expect to see better and improved experiences coming to a Pinterest app near you.

If problems like this interest you, the Pinterest Growth Team is hiring product-minded hackers to help billions of users worldwide discover the things they love and inspire them to go do those things.

Daniel Chu is an engineer at Pinterest.
