The official Pinterest engineering blog.

As part of the Q&A with Pinterns series, Pinterest interns share their experiences working on projects and features with our engineers. Here, 2014 Summer Pinterns Lucas and Nicole talk about what they’ve learned and built over the past few months.

What did you focus on this summer?

Nicole: As an intern on the Growth team, I worked on acquisition and activation; my projects focused on increasing signups as well as app installs. I created an app install banner on logout, implemented a full redesign of the pinterest.com unauthenticated landing page, and worked on a more streamlined signup flow throughout the rest of the unauth pages.

Lucas: During my internship at Pinterest, my work was mostly focused on our international growth. My first project was to create a brand-new admin system for Pinterest’s community managers to curate and contact some of our influential Pinners (Pinfluencers). This effort allowed our team to easily whitelist Pinners from different countries, and I could then use those lists, along with curated Pinfluencer metadata, to algorithmically recommend local Pinfluencer accounts to Pinners with similar interests. Those Pinfluencers are now featured in three areas of the website: the NUX, category pages, and a homefeed carousel. The carousel was one of my favorite projects, since I got to work on it all the way from selecting the Pinfluencers on our backend to implementing the design and making it responsive.

Describe the team you worked with. What is the culture like at Pinterest?

Nicole: I worked on the Growth team where I met the most incredible people this summer. It was a diverse team—everyone brought something really unique to the table and as a result I learned even more. My mentor was especially motivating and supportive, giving me the resources to get my work done as well as encouraging me to take more ownership of my projects and come up with new ideas.

I’ve never felt more comfortable and welcomed in a workplace than I did at Pinterest; the culture is very collaborative. Everyone I talked to, both on and off my team, was always willing to stop what they were doing to answer a question. Additionally, everyone is authentic and cares so much about the product.

Lucas: This summer I worked concurrently on two teams: Growth and International. I had the privilege of working with talented and passionate people who recognize each other’s skills and use their unique expertise to work together (knit) and make Pinterest a better product every day.

Besides getting exceptional advice from my mentor, project manager and teammates, I also got help and learned a lot from people on other teams such as Web, Interests, Writing, Design, and Recruiting. Pinterest has this amazing culture of knitting to solve challenges as well and as fast as possible, and that habit not only made me a better and faster developer, but also taught me new skills and, more than anything, helped me make many new friends.

Pinterest feels like family, and people know and care about each other. One of my favorite things about Pinterest is how crafty and creative our team is and how much support you get from Pinterest when you have a DIY craving. We have DIY stations in the office and display our crafts all around. This summer, for instance, some Pinterest friends and I decided to build a Pinterest logo out of Rubik’s cubes. Three days later, our Workplace team had already ordered and delivered 60 Rubik’s cubes to us, and we now display the logo in our office.

What kind of impact did you have?

Nicole: One of the best parts about being on the Growth team is that I can see the direct impact of all of my work on the company’s revenue, which is unique as an intern. I saw all the numbers and know how many people downloaded the app because of my upsell, and how many additional signups we got from the redesign of the signup flow. As much of an impact as I had on Pinterest, without a doubt Pinterest had even more of an impact on me. I was so lucky to work with the people I worked with and learned more than I ever thought I would.

Lucas: It was really validating to see how much impact my work at Pinterest had on other people’s lives. By creating an admin system for Pinfluencer management, for instance, I allowed our community managers to work several times faster and whitelist thousands of Pinfluencers throughout the summer. By working closely with them, I was also able to identify opportunities to improve my work and develop or re-tailor features to optimize their work.

On the Pinners side, I was able to run several A/B experiments and try different features and revamps to our product in order to improve Pinner experience. Those experiments were never limited to my main project and my mentor and project manager always gave me the freedom of working on my own ideas for Pinterest and taking ownership of my projects.

What surprised you about this internship?

Nicole: Definitely the biggest surprise for me was how much responsibility I had and how much I was able to grow as a person throughout the internship. I came in expecting to do a lot of cool work but that was pretty much it. I had no idea that I would be so challenged, not only from an engineering perspective but also to take complete ownership of my work and to push myself to accomplish more during my 12 weeks.

Lucas: I was really surprised and delighted to see how much code I wrote and how much I learned during my internship. Pinterest challenged me to think critically about problems and develop scalable and maintainable solutions to several projects that I could take ownership of and carry from the first design to the final product.

What advice do you have for fellow CS students?

Nicole: Work at Pinterest!! Besides that though, when looking for internships I would definitely say to find a place where you can make an impact and work in a challenging environment. In school, don’t ever think that something is too difficult, because that becomes self-fulfilling. The most important thing is believing in what you can accomplish and then working hard to get there. Don’t be afraid to fail either; it’s one of the best ways to learn.

Lucas: When looking for a company to work at, look for a place where you will be challenged to work on interesting and creative problems and that has a culture that makes you happy. I found that at Pinterest and if you think that Pinterest could be that for you too, be sure to apply!


As the community of Pinners grows, so does the number of businesses on Pinterest. Earlier this week, we revamped our analytics tool to help businesses better understand their audiences and how their organic content is performing.

This second version of our analytics product offers new features such as profile and audience analytics, user country and metro segmentation, app segmentation, more comprehensive event types (such as impressions, clicks, and repins), and Pin It button analysis for off-network and on-network usage.

The analytics architecture

There are four major components of the Analytics architecture:

  • A scalable and reliable Hadoop MapReduce batch data processing pipeline that creates a new dataset every day.
  • Robust HBase data storage that serves data with minimal latency.
  • A set of API endpoints that supports fetching data programmatically and enforces access control.
  • A web application powered by an interactive UI for business users to consume data.

The raw data comes from two sources:

  • Event-based data comes from Kafka and is logged onto AWS S3.
  • Object-based data comes from the production databases.

The pipeline is managed by Pinball, an internal workflow management tool.

Understanding the analytics data

Analytics provides three different datasets to the user, namely domain analytics, profile analytics and audience analytics.

  • Domain analytics contains usage data about Pins or boards with Pins that link to a business domain.
  • Profile analytics contains usage data about Pins or boards that belong to a business user.
  • Audience analytics contains data about users or followers that interact with domain/profile content.

Pinterest analytics relies heavily on different types of user interactions, including Pin impressions, click-throughs, repins, likes and creates. We provide aggregated counts for each type of event at the Pin and board level, as well as data for the Pins that are most engaging or rank highest in search results according to proprietary algorithms.

Batch data processing pipeline

To provide an accurate dataset for analytics, we built a Hadoop MapReduce data pipeline. We process tens of terabytes of data each day, so it’s important to ensure the pipeline is both scalable and reliable.

The MapReduce pipeline starts to process data as soon as the data is available. It’s triggered by condition jobs that periodically check whether the data is available on S3. We split the pipeline into about 100 jobs, so if some jobs fail unexpectedly, other independent jobs can continue processing without interruption.
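As a rough illustration of such a condition check (not Pinball’s actual interface), the sketch below polls S3 for a day’s partition with boto3; the bucket name, prefix layout, and _SUCCESS-marker convention are assumptions made for the example.

```python
import time

import boto3

s3 = boto3.client("s3")

def data_is_ready(bucket, prefix):
    """Return True once the day's logs (marked by a _SUCCESS file) exist on S3."""
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix)
    return any(obj["Key"].endswith("_SUCCESS") for obj in resp.get("Contents", []))

def wait_for_data(bucket, prefix, poll_seconds=300):
    """Block downstream jobs until the input partition is available."""
    while not data_is_ready(bucket, prefix):
        time.sleep(poll_seconds)

# Hypothetical usage for one day's Kafka logs landed on S3:
# wait_for_data("pinterest-event-logs", "events/dt=2014-09-15/")
```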

There are four different stages of the pipeline, and jobs within the same stage are generally independent and can run at the same time.

Stage 1 extracts object items, including Pins, boards and users. We schematize the data and keep only the necessary fields, since the data is heavily reused. We then extract the events associated with those Pins, boards and users. All later stages depend on those events, so we made this phase highly parallel and process each event type separately.

Stage 2 aggregates all events and users at the domain and profile level; these metrics then power the daily metrics graphs.

Stage 3 is top item extraction, where we find the top items for each profile and domain based on either event counts or proprietary ranking criteria.

Stage 4 persists all data into HBase.

Tune up the processing pipeline

Since our traffic is increasing and the pipeline has a 19-hour ETA, we put a lot of effort into making it fast and reliable. All of the data pipeline jobs are written in Hive and Cascading, and we used a few tricks to improve performance.

1. Optimize dependencies and increase parallelism. The general guideline for our jobs is to keep each one as simple as possible. We store a lot of temporary intermediate results to increase data reusability. This also helps us recover from data processing failures, since we can resume the workflow at these checkpoints.

2. Minimize data size. Disk I/O is usually the bottleneck of processing jobs, so it’s very important to minimize the data size with shared data extraction jobs, and keep only the necessary data fields.

3. Avoid sorting the dataset. Query clauses such as ORDER BY and DISTRIBUTE BY are heavyweight. When processing extremely large datasets, chances are you only want a limited number of top results. A better approach is to use CLUSTER BY to cluster the keys together and use a priority queue to keep only the top results (a sketch of this pattern follows these tips).

4. Data pruning. For profile and domain level aggregated events and users, the dataset is usually heavily skewed: some profiles and domains own far more data than others. This makes it difficult for MapReduce to evenly distribute data onto reducers and hurts query performance. A workaround is to study the distribution of the dataset and prune data entries that have little chance of appearing in the final results.

5. Avoid or optimize joins. Think carefully about the trade off between logging more data and spending more processing time on data joins. When a join cannot be prevented, take advantage of map join and bucket join.

6. Take advantage of UDFs. UDFs offer great value in increasing productivity and are a scalable way of sharing query logic.
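To make tip 3 concrete, here is the same keep-only-the-top-results pattern in plain Python; the field names and the top-100 cutoff are made up, and a real implementation would live in a Hive UDF or Cascading operation rather than a standalone script.

```python
import heapq
from collections import defaultdict

TOP_N = 100

def top_pins_per_domain(rows, n=TOP_N):
    """rows: iterable of (domain, pin_id, click_count), clustered by domain.

    A fixed-size min-heap per domain keeps memory bounded and avoids a global
    sort; the heap root is always the weakest of the current top n.
    """
    heaps = defaultdict(list)
    for domain, pin_id, clicks in rows:
        heap = heaps[domain]
        if len(heap) < n:
            heapq.heappush(heap, (clicks, pin_id))
        elif clicks > heap[0][0]:
            heapq.heapreplace(heap, (clicks, pin_id))
    # Emit the results best-first for each domain.
    return {domain: sorted(heap, reverse=True) for domain, heap in heaps.items()}
```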

Scalable data store and serving

From the analytics data pipeline, we create half a terabyte of data each day and keep most of the data around for a month. Our storage backend is powered by HBase, which integrates well with our Hive and Cascading processing pipeline. HBase is a key/value store, and we design our storage schema as follows:

  • Row key: userparam_platform_aggregation_date
  • Column family: metrics/topitems
  • Column: metric names

Our schema design has a few advantages. Since the date is the last element in the key, all of the data shown in a metric graph is stored consecutively, and that locality enables us to quickly fetch it with a single scan command.

We pre-aggregate data at a few levels: daily, weekly, biweekly and monthly, so we never need to aggregate any data in real time. We also pre-aggregate all available app types and store the aggregated results as separate entries. We never need to aggregate across multiple userparams, so we keep the userparam as the first element in the key; this splits the data evenly across all region servers and keeps the load well balanced. We only have one column family per table because, for the analytics data, it’s hard to ensure that data sizes are similar across different column families.
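The sketch below shows how such a row key and scan range might be constructed; the exact field encodings are illustrative assumptions, not the production format.

```python
def analytics_row_key(userparam, platform, aggregation, date):
    """Build the row key: userparam_platform_aggregation_date.

    Keeping the userparam first spreads users across region servers, while
    keeping the date last makes one metric's daily points adjacent on disk.
    """
    return "{}_{}_{}_{}".format(userparam, platform, aggregation, date)

def scan_range(userparam, platform, aggregation, start_date, end_date):
    """Start/stop rows for a single HBase scan covering a date range."""
    start = analytics_row_key(userparam, platform, aggregation, start_date)
    # '~' sorts after the date digits, so the exclusive stop row still
    # includes the end date itself.
    stop = analytics_row_key(userparam, platform, aggregation, end_date) + "~"
    return start, stop

# Example: a month of daily web metrics for one business profile.
start_row, stop_row = scan_range("profile123", "web", "daily",
                                 "2014-08-01", "2014-08-31")
```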

Surfacing data in the web application

The analytics web application is powered by API endpoints. Users with access rights can fetch the same data and do analysis on their own. The web application has rich UI components to help users dig in and discover insights.

You can check out the finished product now. If you’re interested in working on new monetization engineering challenges like this one, join us!

Tongbo Huang is a software engineer at Pinterest.

Acknowledgements: The revamped analytics was built in collaboration with Long Cheng, Tracy Chou, Huayang Guo, Mark Cho, Raymond Xiang, Tianying Chang, Michael Ortali, Chris Danford, and Jason Costa along with the rest of the monetization team. Additionally, a number of engineers across the company provided helpful feedback.


As part of the Discover Pinterest series, engineers from across disciplines share insights at tech talks and meetups at our San Francisco headquarters. Here, Pinterest growth engineer John Egan shares a wrap up from a talk on data-driven growth. To stay updated on our events, check out the event page to see what’s new each month.

Interest in growth as a discipline has grown over the years, with two branches becoming especially prominent: product-driven growth, and marketing-driven growth powered by channels such as advertising and inbound marketing. Most of our growth at Pinterest is product-driven, with a small team of a dozen engineers having a big impact.

We’re constantly asking ourselves how we can improve the experience for existing Pinners, more effectively communicate the uses of Pinterest to new users, and improve content for unengaged users. We’re always running experiments and iterating on what we’ve accomplished so far, but we’re always looking for new ideas.


At the last Discover Pinterest talk, we teamed with fellow growth engineers to discuss ideas and best practices. I shared insights on our growth engineering experiences at Pinterest and how data plays into our quarterly road-mapping process. Airbnb’s Gustaf Alstromer, Yozio founder Lei Sun, Pinterest growth engineering manager Dannie Chu, and Pinterest product manager Cat Lee had a panel discussion about related technical challenges and data analytics.

You can find the slides from the talk on our Discover Pinterest Slideshare account.

John Egan is a growth engineer at Pinterest.

Acknowledgements: Thanks to Pinterest’s Dannie Chu and Cat Lee, as well as Gustaf Alstromer from Airbnb and Lei Sun from Yozio for participating.


The home feed should be a reflection of what each user cares about. Content is sourced from inputs such as people and boards the user follows, interests, and recommendations. To ensure we maintain fast, reliable and personalized home feeds, we built the smart feed with the following design values in mind:

1. Different sources of Pins should be mixed together at different rates.

2. Some Pins should be selectively dropped or deferred until a later time. Some sources may produce Pins of poor quality for a user, so instead of showing everything available immediately, we can be selective about what to show and what to hold back for a future session.

3. Pins should be arranged in the order of best-first rather than newest-first. For some sources, newer Pins are intuitively better, while for others, newness is less important.

The architecture behind smart feed

We shifted away from our previously time-ordered home feed system and onto a more flexible one. The core feature of the smart feed architecture is its separation of available, but unseen, content and content that’s already been presented to the user. We leverage knowledge of what the user hasn’t yet seen to our advantage when deciding how the feed evolves over time.

Smart feed is a composition of three independent services, each of which has a specific role in the construction of a home feed.

The role of the smart feed worker

The smart feed worker is the first to process Pins and has two primary responsibilities: to accept incoming Pins and assign some score proportional to their quality or value to the receiving user, and to remember these scored Pins in some storage for later consumption.

Essentially, the worker manages Pins as they become newly available, such as those from the repins of the people the user follows. Pins have varying value to the receiving user, so the worker is tasked with deciding the magnitude of their subjective quality.

Incoming Pins are currently obtained from three separate sources: repins made by followed users, related Pins, and Pins from followed interests. Each is scored by the worker and then inserted into a pool for that particular type of pin. Each pool is a priority queue sorted on score and belongs to a single user. Newly added Pins mix with those added before, allowing the highest quality Pins to be accessible over time at the front of the queue.

Pools can be implemented in a variety of ways so long as the priority queue requirement is met. We choose to do this by exploiting the key-based sorting of HBase. Each key is a combination of user, score and Pin such that, for any user, we may scan a list of available Pins according to their score. Newly added triples will be inserted at their appropriate location to maintain the score order. This combination of user, score, and Pin into a key value can be used to create a priority queue in other storage systems aside from HBase, a property we may use in the future depending on evolving storage requirements.
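The sketch below shows one way such a (user, score, Pin) key could be laid out so that an ordinary ascending scan returns the best Pins first; the fixed-width fields and the inverted-score trick are illustrative assumptions rather than the production encoding.

```python
MAX_SCORE = 10 ** 9  # assumed upper bound on integer scores

def pool_row_key(user_id, score, pin_id):
    """Key layout: user | inverted score | pin.

    Inverting the score means ascending lexicographic key order puts the
    highest-scoring Pins first, so a prefix scan on the user behaves like
    a priority queue.
    """
    inverted = MAX_SCORE - int(score)
    return "{:012d}|{:010d}|{}".format(int(user_id), inverted, pin_id)

def best_pins(sorted_keys, user_id, n):
    """Simulate scanning the first n rows under the user's key prefix."""
    prefix = "{:012d}|".format(int(user_id))
    return [k for k in sorted_keys if k.startswith(prefix)][:n]

keys = sorted(pool_row_key(42, score, pin)
              for score, pin in [(870, "pinA"), (120, "pinB"), (990, "pinC")])
top = best_pins(keys, 42, 2)
print([k.split("|")[-1] for k in top])  # ['pinC', 'pinA']
```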

Smart feed content generator

Distinct from the smart feed worker, the smart feed content generator is concerned primarily with defining what “new” means in the context of a home feed. When a user accesses the home feed, we ask the content generator for new Pins since their last visit. The generator decides the quantity, composition, and arrangement of new Pins to return in response to this request.

The content generator assembles available Pins into chunks for consumption by the user as part of their home feed. The generator is free to choose any arrangement based on a variety of input signals, and may elect to use some or all of the Pins available in the pools. Pins that are selected for inclusion in a chunk are thereafter removed from the pools so they cannot be returned as part of subsequent chunks.

The content generator is generally free to perform any rearrangements it likes, but is bound to the priority queue nature of the pools. When the generator asks for n pins from a pool, it’ll get the n highest scoring (i.e., best) Pins available. Therefore, the generator doesn’t need to concern itself with finding the best available content, but instead with how the best available content should be presented.

Smart feed service

In addition to providing high availability of the home feed, the smart feed service is responsible for combining new Pins returned by the content generator with those that previously appeared in the home feed. We can separate these into the chunk returned by the content generator and the materialized feed managed by the smart feed service.

The materialized feed represents a frozen view of the feed as it was the last time the user viewed it. To the materialized Pins we add the Pins from the content generator in the chunk. The service makes no decisions about order, instead it adds the Pins in exactly the order given by the chunk. Because it has a fairly low rate of reading and writing, the materialized feed is likely to suffer from fewer availability events. In addition, feeds can be trimmed to restrict them to a maximum size. The need for less storage means we can easily increase the availability and reliability of the materialized feed through replication and the use of faster storage hardware.

The smart feed service relies on the content generator to provide new Pins. If the generator experiences a degradation in performance, the service can gracefully handle the loss of its availability. In the event the content generator encounters an exception while generating a chunk, or if it simply takes too long to produce one, the smart feed service will return the content contained in the materialized feed. In this instance, the feed will appear to the end user as unchanged from last time. Future feed views will produce chunks as large as, or larger than, the last so that eventually the user will see new Pins.
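A rough sketch of that fallback path follows; the content-generator client, materialized-feed store, and timeout value are hypothetical names used only for illustration, and whether the chunk is prepended or appended is an assumption.

```python
def build_home_feed(user_id, materialized_feed, content_generator,
                    timeout_seconds=0.5):
    """Return the feed to render: a new chunk, if available, plus the frozen feed.

    If the content generator raises or takes too long, fall back to the
    materialized feed alone, so the user simply sees an unchanged feed.
    """
    try:
        chunk = content_generator.get_chunk(user_id, timeout=timeout_seconds)
    except Exception:
        chunk = []  # degrade gracefully; a larger chunk will arrive next time

    # New Pins keep exactly the order the generator chose (assumed to go first).
    feed = chunk + materialized_feed.get(user_id)
    materialized_feed.save(user_id, feed)
    return feed
```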

Outcomes

By moving to smart feed, we achieved the goals of a highly flexible architecture and better control over the composition of home feeds. The home feed is now powered by three separate services, each with a well-defined role in its production and distribution. The individual services can be altered or replaced with components that serve the same general purpose. The use of pools to buffer Pins according to their quality allows us a greater amount of control over the composition of home feeds.

Continuing with this project, we intend to better model users’ preferences with respect to Pins in their home feeds. The quality of our recommendations varies considerably across our user base, and we would benefit from using preference information gathered from recent interactions with the home feed. Knowledge of personal preference will also help us order home feeds so the Pins of most value can be discovered with the least amount of effort.

If you’re interested in tackling challenges and making improvements like this, join our team!

Chris Pinchak is a software engineer at Pinterest.

Acknowledgements: This technology was built in collaboration with Dan Feng, Dmitry Chechik, Raghavendra Prabhu, Jeremy Carroll, Xun Liu, Varun Sharma, Joe Lau, Yuchen Liu, Tian-Ying Chang, and Yun Park. This team, as well as people from across the company, helped make this project a reality with their technical insights and invaluable feedback.


As part of an ongoing series, engineers share a bit of what life is like at Pinterest. Here, engineer Tilde Pier talks about pivoting her career, internationalizing Pinterest’s front-end, and her awesome hairdos!

How did you get involved with Computer Science?

I wanted to be a programmer when I was little, but began studying social justice in college and ended up in Human Resources. I then started dating an engineer, and she taught me a little Python and gave me the courage to quit my job and change careers.

What are you working on right now?

I’m working on a smarter system for handling translations. We want to maintain the ability for engineers to move fast, while also ensuring that international Pinners have a consistent experience with those in the U.S.

How would you describe Pinterest’s engineering culture?

Everyone is warm, kind, and thoughtful. We all care deeply about doing right by our users. There’s an inherent tension between code quality and actually shipping things, and we maintain a measured balance between the two. Oh, and there are a lot of high fivers!

How do you use Pinterest? What are your favorite things to save and discover on Pinterest?

Right now I’m planning my next tattoo. Most of the incredible watercolor tattoo artists I’m finding on Pinterest are halfway around the world — not sure if that’s a blessing or a curse!

Also, I’ve been vegan for over a decade, so I enjoy collecting delicious veg-friendly places using Place Pins.

How do you spend your time outside of work?

I’m a painter, and create abstract geometric cityscapes that couldn’t exist in three dimensional reality. In July, I had my first solo exhibition.

What’s your latest interest?

At the beginning of the year, I started taking static trapeze classes (circus aerials). Last month, I did my first legit pull up from straight arms!

Fun fact?

I get a lot of compliments from 6-year-old girls about my hair.


The massive volume of discovery data that powers Pinterest and enables people to save Pins, create boards and follow other users is generated through daily Hadoop jobs. Managed by Pinball, these jobs are organized into indexing workflows that output dozens of terabytes of data daily.

Previously, the only statistics collected were each job’s native Hadoop counters, which presented a few problems. For one, they’re expensive, because each counter change has to synchronize with the Hadoop job tracker. Additionally, counter history isn’t preserved, so it’s impossible to trace trends and spot unexpected changes in the statistics, which are often strong indicators of data corruption or code bugs.

To address these issues, we designed a new statistics collection engine to power the indexing workflows, which has allowed us to develop real-time alerts to protect our data, and can be used to collect useful statistics for ongoing data analysis.

Architecting the statistics collection engine

The statistics collection engine collects statistics for each job at run time; these are then merged with the native Hadoop counters and stored to S3 immediately after the job completes. On job completion we trigger alerts, copy the data to our database, and build further alerting and data analysis on top of the database. The statistics data is structured and small compared to a Hadoop job’s output, so a MySQL database serves our applications very well in practice.

Designing the engine

We kept the following design values in mind when building the statistics collection engine.

Lightweight: To avoid expensive communication between the Hadoop job tracker and the tasks, we implemented the statistics collection engine at the per-shard level. In our initial effort, the engine was implemented inside the Hadoop reducer. Since each reducer has its own shard, the collection of statistics is local to the reducer task, and the reducer doesn’t have to communicate with the job tracker.

Our job output is stored on AWS S3, where each reducer takes one partition. On completion, a reducer uses its partition number to store its own statistics. Once all the reducers finish, we aggregate the total statistics by scanning through the per-shard statistics. At the end, we merge the statistics with the native Hadoop counters to get the final result. We store both intermediate and final statistics in Thrift format.

Simple: We implemented the statistics collection engine as a singleton class. As the reducer runs within one process, it simply grabs the singleton in the preparation phase. During the merging phase, the reducer calls the exposed APIs to update the statistics.

We support multiple types of statistics, including counters, sums, top/bottom items and histograms, each of which was chosen because the final results can be derived by merging their per-shard intermediate results. The corresponding APIs are listed below. Note that the “name” is unique within a reducer process and is used to identify a specific metric. For example, we use “PageRank” in a pagerank reducer to identify all the pagerank-related information.

1. void incrCount(String name) and void incrCount(String name, long count) are used to increase a metric count, where the first one is a special case of the second, with count = 1L.

2. void addSum(String name, double value) is used to add up the values of a specific metric.

3. void addToHistogram(String name, double value) uses a histogram bucketizer and adds the value to the histogram counts of the metric.

4. void addItem(String name, Object object, double value) adds an item into two priority queues for the metric. The queues keep fixed numbers of top and bottom items ranked by their values. The object’s name string is used to identify the item in the queue.

To keep it simple to use, we also set defaults for the statistics APIs so the APIs above can be used without customizing any parameters. By default, a log bucketizer is used for histograms and the size of the top/bottom queues is set to 100. However, a programmer can implement a new bucketizer and expose it for research purposes, or change the top/bottom queue sizes, with a few extra lines.
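To make the merge property concrete, here is a simplified sketch of aggregating per-shard intermediate results in Python; the shard layout is illustrative and not the actual Thrift schema.

```python
import heapq

TOP_K = 100

def merge_shard_stats(shards):
    """Each shard dict holds per-metric 'counts', 'sums' and 'top_items'.

    Counts and sums simply add, and top-item queues merge and re-truncate,
    which is what makes per-shard collection safe to aggregate after all of
    the reducers finish.
    """
    merged = {"counts": {}, "sums": {}, "top_items": {}}
    for shard in shards:
        for name, count in shard.get("counts", {}).items():
            merged["counts"][name] = merged["counts"].get(name, 0) + count
        for name, total in shard.get("sums", {}).items():
            merged["sums"][name] = merged["sums"].get(name, 0.0) + total
        for name, items in shard.get("top_items", {}).items():
            pool = merged["top_items"].get(name, []) + items
            merged["top_items"][name] = heapq.nlargest(TOP_K, pool, key=lambda x: x[1])
    return merged

shard0 = {"counts": {"PageRank.users": 120}, "top_items": {"PageRank": [("userA", 0.90)]}}
shard1 = {"counts": {"PageRank.users": 80}, "top_items": {"PageRank": [("userB", 0.95)]}}
print(merge_shard_stats([shard0, shard1])["counts"]["PageRank.users"])  # 200
```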

Alert and data analysis applications

We mainly use the collected statistics for two types of applications: alerting and data analysis.

Alerts: We’ve implemented two types of alerts based on absolute value and on temporal estimation.

1) Alert based on absolute value: Any number that’s below a preset threshold will trigger an alert. One example is the USER_WITH_INTERESTS metric in the UserJoins calculation, where we could clearly see a dip due to a bug on 12/23.

2) Alert based on temporal estimation: In this case, we use recent data to predict the current value, and any number that’s off track will trigger an alert. One example is the S3 output size check for a job: the output size of one job changed significantly around 07/22, which turned out to be due to corrupted input data.

We also have two degrees of alert: warning and fatal. In the former case, we only send a warning email to the job owner. In the latter, we stop the job immediately and alert the owner to investigate. Since the alerts are most useful for detecting sudden changes, programmers tend to tolerate false positives in practice, simply because debugging corrupted data and bugs after the fact is far more expensive.
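A rough sketch of the two checks is below; the thresholds, the moving-average prediction, and the warning/fatal mapping are illustrative assumptions.

```python
def absolute_value_alert(value, threshold):
    """Fatal if a metric falls below a preset floor (e.g. USER_WITH_INTERESTS)."""
    return "fatal" if value < threshold else None

def temporal_estimation_alert(history, value, tolerance=0.25):
    """Warn when today's value strays too far from a moving-average prediction.

    history holds recent daily values, oldest first; the 25% band is an
    arbitrary choice for this sketch.
    """
    recent = history[-7:]
    if not recent:
        return None
    predicted = sum(recent) / float(len(recent))
    if abs(value - predicted) > tolerance * predicted:
        return "warning"
    return None

print(absolute_value_alert(value=10, threshold=1000))        # fatal
print(temporal_estimation_alert([100, 102, 98, 101], 160))   # warning
```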

Data analysis: Almost all of the statistics we collect are useful for data analysis. For example, after the UserJoins calculation job is done, we can collect the user-related board and Pin counters. In another example involving top/bottom items, the pagerank job collects the top-ranked users and boards from the joins and makes them available for post-analysis at each iteration. As a third example, we collect histograms of social ranks in the PinJoins calculation job for research and experimentation; at each iteration, we tune the social rank parameters and get the results on the fly.

Looking ahead

The success of the statistics collection framework enables us to manage the indexing workflows in an entirely new way. There are still many things we want to explore in the future.

  • We may use an extra Hadoop job to do the final aggregation to improve performance if the per-shard statistics data becomes too big.
  • We may create more sophisticated statistical models for better alerting and analysis.
  • To reduce false positives, we may allow the programmer to initialize the workflow with some parameters on data or code changes.
  • We may also use the statistics to tune the scheduling of the jobs in the workflows to achieve optimum cluster usage.

If you’re interested in tackling these possibilities with us, join our team!

Xinding Sun is a software engineer at Pinterest.

Acknowledgements: The framework presented here is a collaboration with Tao Tao. We want to thank the whole Discovery team for actively applying it to their Hadoop jobs and workflows and giving invaluable feedback. Thanks to Hui Xu for the suggestions and technical insights.


We have hundreds of A/B experiments running at any time as we launch new products and improve the quality of the user experience. To power these experiments and make timely, data-driven decisions, we built a system to scale the process.

Building the experimentation framework

We set out with three main goals for the framework:

  • Extensible and generic enough so that we could easily extract the most valuable information from experiment data to support different kinds of cohort/segmented analysis.
  • Scalable to process and store a large amount of data generated by an ever-increasing number of experiments, users and metrics to track.
  • Compute and serve metrics via dashboards in batch and real time with high quality, providing the appropriate statistical test modules.

Each component of the framework needed to be independently and easily scalable, and included:

  • Kafka for the log transport layer
  • Hadoop for running the MapReduce jobs
  • Pinball for orchestrating the MapReduce workflows
  • Storm for real-time experiment group validation
  • HBase & MySQL to power the dashboard backend
  • Redshift for interactive analysis

Front-end messages are logged to Kafka by our API and application servers. We have batch processing and real-time processing pipelines to process the experiment data. For batch processing, after the daily raw logs land on S3, we start our nightly experiment workflow to figure out experiment user groups and experiment metrics. We use our in-house workflow management system, Pinball, to manage the dependencies of all these MapReduce jobs.

The final output is inserted into HBase to serve the experiment dashboard. We also load the output data to Redshift for ad-hoc analysis. For real-time experiment data processing, we use Storm to tail Kafka and process data in real-time and insert metrics into MySQL, so we could identify group allocation problems and send out real-time alerts and metrics.

Lightweight logging

In logging, the top-line goal is to make the process as lightweight as possible. There is no persistence layer involved in experiment group allocation, so we minimize the latency and load on our production services. We left all of the complicated metric computations to offline data processing.

To determine a user’s experiment (feature) group, whenever a request comes through to the application server, we hash the experiment id and the user id (or request id) to assign the user to the right bucket based on the group size configuration, such that the same user is always allocated to the same experiment group for a consistent user experience. For each request, we log a Thrift message recording this group allocation.

We also have separate logs for all user actions and context. In later steps, we join these logs against this user experiment activation log to determine the performance of each experiment group and the corresponding activity metrics. Since this is a large-scale and time-consuming computation, we leverage the Hadoop framework to do it.
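To illustrate the group-assignment step described above (this is not the Thrift log format, which isn’t reproduced here), a minimal deterministic-bucketing sketch might look like the following; the MD5-based hashing and the group configuration format are assumptions.

```python
import hashlib

def assign_group(experiment_id, user_id, groups):
    """groups: list of (group_name, percent) pairs whose percentages sum to at most 100.

    Hashing (experiment_id, user_id) into a 0-99 bucket keeps a user in the
    same group on every request, and different experiments hash independently.
    """
    digest = hashlib.md5("{}:{}".format(experiment_id, user_id).encode()).hexdigest()
    bucket = int(digest, 16) % 100
    lower = 0
    for name, percent in groups:
        if lower <= bucket < lower + percent:
            return name
        lower += percent
    return None  # user is not enrolled in this experiment

print(assign_group("new_signup_flow", 12345,
                   [("control", 10), ("enabled", 10)]))
```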

Batch-processing MapReduce workflow

The MapReduce workflow starts to process experiment data nightly, when the data of the previous day is copied over from Kafka. At this time, all the raw log requests are transformed into meaningful experiment results and in-depth analysis. To populate experiment data for the dashboard, we have around 50 jobs running to do all the calculations and transformations.

To support ‘days-in’ analysis for separating out the novelty effect, the experiment workflow starts by figuring out the first day each user joined the experiment; that information, together with the user’s other attributes (such as gender and country segmentations), is then joined with the raw actions log (event/context). Later, the aggregator job aggregates these activity metrics per experiment group, per segmentation, per days-in. The segment rollup job then generates both fully denormalized and aggregated data and inserts it into HBase.

To account for the novelty decay effect of features, for each user in the experiment, we first figure out the earliest date when the user was first seen in the experiment group. Then, for each experiment group, we join experiment users with their subsequent actions on a particular day, so that user activities are grouped by experiment group and by the number of days the user has been in the experiment.

To support segmented analysis of a feature’s impact on a particular group, e.g. iPhone activity of male active users located in the United Kingdom, we categorize users into different fully denormalized attribute groups (cohorts). The final output of fully segmented metrics is inserted into HBase, exploiting the fast and scalable write/read access provided by HBase. Previously, we were storing this data in MySQL, which had high ingestion latencies (on the order of hours). By moving to HBase we were able to slash the ingestion latency to a few minutes, and reads also became much faster.

Scalable serving of experiment metrics

After the MapReduce workflow computation, we end up with around 20TB of experiment metrics data every six months. To make that data accessible with sub-second latency, we built a service to serve experiment metrics data using HBase as the backend. HBase is essentially a key/value store, where each key consists of the following parts: row key, column family, column and timestamp. We store our experiment metrics, such as the number of active users and the number of actions, with the following schema.

The row key is constructed from multiple components, which we classify as non-aggregates, cohorts and ranges (descriptions below). One intricacy is that sometimes we need aggregated metrics across certain segments (e.g. metrics of the experiment across all countries). To support that, we do aggregations over different segment cohorts based on the fully denormalized data and generate extra rows for every potential rolled-up result.

Columns in HBase are grouped into column families, and we group metrics of different categories into different column families. For example, all user active/action metrics are grouped together in the ‘t’ column family, all metrics segmented by app type in the ‘a’ column family, all funnel/context metrics in the ‘c’ column family, and so on. Each column family contains metrics such as the total number of users in the group and the number of active users/actions across all the different events and metrics.

Non-aggregate: These values are constant and never need to be pre-aggregated. So we put them as prefix for prefix filtering.

Cohorts: These dimensions will need to be looked up in aggregate. Extra rolled up rows will be generated for each potential combination with other cohorts. We pre-aggregate the values and generate keys where the value is rolled up and represented by ‘ALL’.

Ranges: This section of the key is for the convenience of range scan. We will need to make range requests on these values.
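A sketch of how the non-aggregate, cohort and range sections might combine into row keys, including the extra ‘ALL’ rollup rows, is shown below; the particular fields, separator and ordering are illustrative.

```python
from itertools import product

def experiment_row_keys(experiment, group, cohorts, date):
    """Generate the fully segmented key plus every 'ALL' rollup variant.

    experiment/group form the non-aggregate prefix, cohorts is an ordered
    list of (segment_name, value) pairs, and date is the range suffix.
    """
    choices = [(value, "ALL") for _name, value in cohorts]
    keys = []
    for mask in product(*choices):
        keys.append("|".join([experiment, group] + list(mask) + [date]))
    return keys

for key in experiment_row_keys("new_signup_flow", "enabled",
                               [("country", "GB"), ("gender", "male")],
                               "2014-09-15"):
    print(key)
# new_signup_flow|enabled|GB|male|2014-09-15
# new_signup_flow|enabled|GB|ALL|2014-09-15
# new_signup_flow|enabled|ALL|male|2014-09-15
# new_signup_flow|enabled|ALL|ALL|2014-09-15
```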

Real-time data processing

In addition to batch processing, we also wanted to achieve real-time data processing. For example, to improve the success rate of experiments, we needed to figure out experiment group allocations in real-time once the experiment configuration was pushed out to production. We used Storm to tail Kafka and compute aggregated metrics in real-time to provide crucial stats.

Powering data driven decisions

We rendered experiment results through an experiment dashboard, which consists of many useful tools for making the results easily accessible.

Cohort analysis: From the dashboard, the experiment owner can compare metrics from different user groups, such as country, gender, user state and app type. It’s also possible to filter by the time when users joined the experiment.

Days-in analysis: User actions that happened on the same day-in are grouped together and compared, so results are broken down by the number of days since users joined the experiment. Days-in analysis helps us understand user behavior under the impact of novelty decay.

Statistical significance: To provide guidance on the significance of differences across experiment groups, we used an unpaired t-test for the num_active and num_actions metrics. The test relies on the assumption that the sample data is independently and identically distributed, but not on the underlying sample distribution. The null hypothesis is that the mean values of the control and treatment groups are the same. Given the null hypothesis, if the probability of our observation is extremely small, we can reject the null hypothesis at a given significance level. In the analytics dashboard, if the p-value is below the 0.05 significance level, we mark the result in blue/orange.

Group validation: To help experiments launch successfully, we also support an automated group allocation test to ensure that users in each segment are allocated as expected. For the group allocation validation test, we use Pearson’s chi-square test. A red-glow box reminds experiment owners of group allocation anomalies.
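A minimal sketch of both checks using SciPy follows; the sample numbers are made up, and the real dashboard computes these from the aggregated metrics in HBase rather than raw lists.

```python
from scipy import stats

ALPHA = 0.05

def metric_significant(control, treatment):
    """Unpaired t-test on a per-user metric such as num_actions.

    Null hypothesis: the control and treatment means are equal; a p-value
    below ALPHA gets highlighted on the dashboard.
    """
    _t, p_value = stats.ttest_ind(control, treatment)
    return p_value < ALPHA, p_value

def allocation_ok(observed_counts, group_percents):
    """Pearson's chi-square check that users landed in groups as configured."""
    total = sum(observed_counts)
    expected = [total * pct / 100.0 for pct in group_percents]
    _chi2, p_value = stats.chisquare(observed_counts, f_exp=expected)
    return p_value >= ALPHA, p_value  # a low p-value flags an allocation anomaly

print(metric_significant([3, 5, 4, 6, 5, 4], [6, 7, 5, 8, 7, 6]))
print(allocation_ok([10050, 9950], [50, 50]))  # a roughly even split looks fine
```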

This experimentation framework powers all kinds of data-driven decisions at Pinterest, and has helped us understand hundreds of A/B experiment results and decide whether to launch new features. If challenges like this interest you, join us on the data engineering team!

Chunyan Wang is a software engineer at Pinterest.

Acknowledgements: The core contributors to the A/B experimentation analytics framework were Chunyan Wang, Joshua Inkenbrandt, Anastassia Kornilova, Andrea Burbank, Mohammad Shahangian, Stephanie Rogers and Krishna Gade along with the rest of the data engineering team. Additionally, a number of engineers across the company provided helpful feedback.


A core part of Pinterest is giving people the ability to discover content related to their interests. Many times, a Pin can be discovered when people share it with one another, through features like Group Boards and messages.

To make the experience of finding a person as seamless as possible, we recently rebuilt our user typeahead. The goal was to quickly surface the most relevant contacts for the user given what we know about that person’s social connections and following activity on Pinterest.

The legacy typeahead was often slow and unresponsive, and limited the number of contacts a user could store. Additionally, all of the logic for the typeahead layer resided in our internal API. We set out to not only make the typeahead faster, but to also split it into a separate service that could be deployed and maintained independently of our API and web application. Building our typeahead in line with our service-oriented architecture would improve testability, ease the process of adding and deploying new features, and make our code more maintainable.

Developing the Contact Book signal

To surface the most relevant contacts, we leveraged the Pinterest following graph, and social graphs from Pinners’ social networks, such as Twitter, Facebook, and Gmail. Still, many users don’t link their social networks to Pinterest, so additional signals based on mobile contacts were used.

Asking for permission to access contacts on mobile can be tricky if intentions aren’t clear. Simply showing users a native dialog can result in a confusing user experience and a low acceptance rate. To build a seamless permissions experience, we applied learnings from other companies, such as the fact that the best way to ask for permissions is to explain the value of connecting upfront, before showing the native permissions dialog.

We also leveraged the Experience Framework for our mobile permissions flow, which allowed us to swap out the text of the flow without changing client code. It also meant we could easily experiment with different flows to further optimize our success rate.

Building the new backend

We built a separate backend service to store the data for the new user typeahead. The new system consists of three major components:

  • A modifiable, real-time online “index” for fast access of any user’s contacts from different sources.
  • A Thrift server (which we call Contacts Service) exposing interfaces to manage this index. This allows clients to update the index and look up top contacts using prefix-matching.
  • A set of PinLater tasks that keeps the index in sync with the contacts sources using the Thrift interface.

We chose HBase as the storage solution for the contacts index because it met all of our requirements:

  • Speed: The primary requirement of the typeahead is to look up names quickly. If names are sorted, the operation is simply a binary search using the prefix to locate the starting position, followed by a scan until we have enough results. This is a typical HBase scan operation. Our performance data shows that a scan of 20 rows in HBase takes less than two milliseconds, fast enough to meet our requirements.
  • Scalability: HBase is horizontally scalable, and adding more machines to a cluster is easy with minimal manual intervention, and gives linear throughput increase.
  • Fault tolerance: HBase supports auto failover. If a box dies, the HBase cluster moves the data to other live servers and the whole process takes a few minutes to finish with no change required on client side.
  • Writable: We chose to maintain an updatable index instead of a two-layer system involving a base index and a fresh index, to simplify implementation and maintenance. HBase provides us with great write performance.

Supporting millions of contacts for one user

We support and aggregate contacts from various sources for each user in the Contacts Service, which makes it extensible if we want to add new sources and flexible if we want to query contacts from certain sources. In most cases, the number of contacts from each source is under two hundred. However, some Pinners have millions of followers, so to tackle this challenge, we use two kinds of schemas to store contacts: a wide schema and a tall schema.

Wide schema: This is the default schema, which is expected to fit most sources. Contacts in one source for one user are stored together in one row, and each name token is stored as the prefix of the column name. With our own implementation of ColumnPaginationFilter (which provides a feature similar to Scan, but within one row), we are able to batch the GET requests to all sources in the wide schema into one RPC to do a prefix lookup.

Tall schema: This schema is specifically designed for sources (e.g. Pinterest followers) with a potentially large number of contacts. The wide schema can’t support this use case because data in one row can’t be split across regions in HBase. In the tall schema, for each user we store contacts from one source in nearby rows, where the name token is part of the row key. Then, for each source, a single Scan request can perform the prefix lookup among the contacts from that source.
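A simplified sketch of the prefix lookup the tall schema enables is shown below: keys combine user, source, name token and contact, are kept sorted, and a lookup is a binary search to the prefix followed by a short scan. The key layout is illustrative, and a sorted Python list stands in for HBase.

```python
import bisect

def tall_row_key(user_id, source, name_token, contact_id=""):
    """Row key layout for the tall schema: user | source | token | contact."""
    return "{}|{}|{}|{}".format(user_id, source, name_token.lower(), contact_id)

def prefix_lookup(sorted_keys, user_id, source, query_prefix, limit=10):
    """Binary-search to the first key matching the prefix, then scan forward."""
    prefix = "{}|{}|{}".format(user_id, source, query_prefix.lower())
    start = bisect.bisect_left(sorted_keys, prefix)
    results = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix) or len(results) >= limit:
            break
        results.append(key.rsplit("|", 1)[-1])  # the contact id
    return results

index = sorted([
    tall_row_key(42, "follower", "joan", "user_233"),
    tall_row_key(42, "follower", "john", "user_901"),
    tall_row_key(42, "follower", "smith", "user_901"),
])
print(prefix_lookup(index, 42, "follower", "Jo"))  # ['user_233', 'user_901']
```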

Ranking and de-duping contacts

As we provide contact lookups from different sources, it’s important that we display the most relevant contact at the top. It could also be annoying if we returned the same contact multiple times because it appears as a contact in several of your sources, and we found that accurate de-duping logic to filter out such duplicates was highly beneficial to the typeahead experience. To improve on these two areas, we rely on many of our existing services to provide real-time data access and enable the Contacts Service to return de-duped contacts in a pre-defined ranking order.

  • Ranking: We have a configurable ranking order based on the different sources. For example, mutual following is a strong signal of relevancy, so we always want to boost mutual-follower contacts to the top. This ranking order is configurable and can easily be replaced with another one.
  • De-duping: The Contacts Service talks to the Friend Service to map social network ids (including Facebook, Twitter and G+) to Pinterest ids, and to the Data Service (with our own positive and negative caching) to map emails to Pinterest ids. With this information, we can easily tell whether contacts are actually the same person based on their Pinterest id.

In order to store the data for efficient lookups, we tokenize each contact’s name and store each token with a reference to the original contact. We use the same algorithm to tokenize both the contact’s full name and the query string, which guarantees consistent results. We use this tokenization to help score each match based on the proximity of the terms within the contact name and query string. For example, searching for “John Smith” should yield “John Smith” as a stronger match (higher in the results) than “John Michael Smith.”
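A sketch of that tokenize-and-score idea follows; the scoring function (the fraction of name tokens that the query accounts for) is a made-up stand-in for the real ranking.

```python
import re

def tokenize(text):
    """Lowercase a name or query and split it into word tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def match_score(contact_name, query):
    """Score a contact against the query; tighter matches score higher.

    'John Smith' beats 'John Michael Smith' for the query 'John Smith'
    because it has no unmatched tokens left over.
    """
    name_tokens = tokenize(contact_name)
    query_tokens = tokenize(query)
    matched = sum(any(n.startswith(q) for n in name_tokens) for q in query_tokens)
    if matched < len(query_tokens):
        return 0.0  # every query token must match something
    return matched / float(len(name_tokens))

for name in ["John Smith", "John Michael Smith", "Joan Smithers"]:
    print(name, match_score(name, "John Smith"))
# John Smith 1.0, John Michael Smith ~0.67, Joan Smithers 0.0
```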

How to enable real-time results?

We considered running an update job daily, to refresh the index with all of the new connections, but decided real-time results were far more useful. For example, when you connect your Facebook account to your Pinterest account, you should be able to send pins to your Facebook friends almost immediately.

In order to have real-time results, we hooked in updates to the new backend service every time a user’s contact information changes. There are a few places this can happen:

  • Pinterest name change: When a Pinner changes their name, it’s reflected in all of their contact entries. That way, others can search based on the new name, and the old name will no longer yield this contact as a result. It can get tricky for those with millions of connections for a particular source, though, so we wrote a chained PinLater task that uses pagination to fan out the updates without overloading the system.
  • Connecting / updating social network source: When users connect their Pinterest account to Facebook, Twitter, etc., we want them to be able to send Pins, so we created a new PinLater task that looks up their new connections and updates the backend Contacts Service accordingly. We also hooked these updates into the existing social network refresh tasks.
  • Updating follower relationships: When a Pinterest user follows or unfollows another Pinterest user, those changes should be reflected immediately.

Getting all that data loaded

Since we wrote a new backend service for the new user typeahead, we had to populate it somehow. The initial set of data was not small by any means.

To upload the initial data, we wrote a task to upload all of a user’s connections for each supported source type. We ran this task for every Pinterest user, which took about three days to complete, even with a dedicated PinLater cluster.

Timing was critical. We needed the real-time update logic in place before doing the initial upload so the corresponding updates would be applied on top of the base set of data. So first we added the real-time update logic, and then we uploaded all of the data. Any actions thereafter that changed a user’s contacts updated the backend automatically.

Introducing a faster typeahead

The new user typeahead is markedly faster than the original implementation, with a server-side p99 of 25ms. When we A/B tested our new typeahead implementation against the old version, we found that message sends and Pinner interactions increased significantly.

The next improvement is to compute second-order connections to expand the breadth of the typeahead. Stay tuned!

Devin Finzer, Jiacheng Hong, and Kelsey Stemmler are software engineers at Pinterest.

Acknowledgements: Dannie Chu, Xun Liu, Varun Sharma and Eusden Shing


Every person sees each page of Pinterest differently, based on factors such as their interests and who they follow. Each pageload is computed anew by our back-end servers, with JavaScript and XHR powering client-side rendering for interactive content and infinite scrolling. This process provides Pinners with unique experiences, but at the scale of our operations, it can create friction.

Here I’ll discuss the atomic deploy system, a solution to some of the challenges that occur during deployment, and a path to successful and ongoing deployments.

Introducing the atomic deploy system

When a Pinner first visits the site, the backend server instructs the web browser to load a particular JavaScript bundle, and ensures the bundle matches the version of the backend software running on that server. So far, everything’s in lock-step, but when we deploy an update, we create a problem for ourselves.

We deploy software at Pinterest in waves, taking 10% of our servers offline in a batch, replacing the software, and putting them back into service. This allows us to have continuous service during our deployments, but it also means we’re running a mixed fleet of old- and new-version servers.

The atomic deploy system is our answer, which grew out of our desire to balance rapid innovation with a consistent, seamless user experience. We aim to develop our technology as quickly as possible, so this system was designed to avoid the intricate dance of backward-compatible updates. At the same time, we wanted to avoid the jarring experience of a page reloading by itself (potentially even losing the user’s context), or forcing the user to click on something to force the site to reload itself.

We hadn’t heard of other cases of doing deploys this way, which meant it could be a terrible idea, or a great one. We set out to find out for ourselves.

Managing “flip flops”

Say, for example, you visit the website on a Monday morning. You’re likely to view a page generated by a version of our front-end software, version “A”. If you hit reload, you’ll get pages generated by version “A”. Furthermore, we use XHR, so when you interact with the web app, you’ll be served by dozens of requests in the background. All of these requests are powered by version “A.”

Later, you might wish to deploy “B”. Our standard model for deployments is to roll through our fleet 10-15% at a time, slowly converting each web server from serving “A” to serving “B”.

Now a request has a chance of being served by “A” or “B” with no guarantees. With standard page-reloads this is not a problem, but much of Pinterest is XHR-based, meaning only part of a page will reload when a link is clicked. Our web framework can detect when it’s expecting a certain version and it gets something unexpected, in which case it’ll often force a reload.

For example, if you go to www.pinterest.com and it’s served by “A” and you click a Pin, and the XHR is served by “B”, you’ll get a page reload. At that point you might click on another Pin, which might be served by a mismatched version, causing another reload. In fact, you can’t escape the chance of reloads until the deploy is complete, which we call the “flip-flopping effect”. In this case, rather than browsing Pinterest smoothly with nice clean interactions, you’ll get a number of full-page reloads.

Our architecture changes

When you visit the site, you talk to a load balancer, which chooses a Varnish front-end, which in turn talks to our web front-ends, which used to run nine Python processes. Each of these processes serves the exact same version on any given web front-end.

What we really wanted was a system that would only force a user to reload at most once during a deploy. The best way to ensure this was to ensure atomicity, meaning if we’re running version “A” and you’re deploying “B”, we flip a switch to version “B” and all users are on “B.”

We decided the best way to achieve this was to support serving two different versions of Pinterest and have Varnish intelligently decide which version to use. We created beefier web front-end boxes (c3.8xlarges instead of c1.xlarges), which could not only handle more load but easily run 64 Python processes, where half serve the current version of Pinterest and the other half serve the previous one. The new and old versions sit behind nginx, with a unique port for each version of the site being served. For example, port 8000 might serve version “A” on one host, and port 8001 might serve version “B”.

Varnish will happily serve either the current version of the site or the previous version if you specify which you want (presumably you wouldn’t, but our JavaScript framework would). If you make a request without specifying a version, you’ll get the current version of the site. Varnish routes to the right host/port serving the desired version.
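The routing decision can be pictured with the sketch below, written in Python for illustration rather than in VCL; the version-to-port mapping and the idea of a version pin supplied by the JavaScript framework are assumptions about the mechanism, not its actual configuration.

```python
# Hypothetical mapping maintained via ZooKeeper: version -> backend port.
SERVING = {"A": 8000, "B": 8001}
DEFAULT_VERSION = "B"

def pick_backend(requested_version):
    """Route to the port serving the requested version, if it's still served.

    The JavaScript framework pins its XHRs to the version that rendered the
    page; any request without a pin gets the current default version.
    """
    version = requested_version if requested_version in SERVING else DEFAULT_VERSION
    return version, SERVING[version]

print(pick_backend("A"))   # ('A', 8000) -- an already loaded page keeps its version
print(pick_backend(None))  # ('B', 8001) -- fresh page loads get the default
```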

Coordination and deploys

In order to inform Varnish what it should do, we developed a series of barriers, which tell Varnish what version to serve and when. Additionally, we created “server sets” in ZooKeeper that let Varnish know which upstream nginx servers are serving.

Let’s imagine a steady state where “A” is our previous version, “B” is our current version. Users can reach either version “A” or “B”, and within a page load, they will always stay on either “A” or “B” and not switch unless they reload their browser. If they reload their browser they will get version “B”.

If we decide to roll out version C we do the following:

  • Through ZooKeeper we tell Varnish to no longer serve version “A”.
  • Varnish responds when it’s no longer serving version “A”.
  • We roll through our web fleet and uninstall “A”, installing “C” in its place.
  • When all of the web fleet has “C” available, we let Varnish know that it’s OK to serve.
  • Varnish responds when all of the Varnish nodes can serve “C”.
  • We switch the default version from “B” to “C”.

By using these barriers, it’s not until the second step that people who were on “A” are forced onto “B”. At step 6 we allow new users to be on “C” by default, while users who were on “B” stay on “B” until the next deploy.

A look at the findings

The absolute values are redacted, but you can see the relative effect. Note the dips correspond with weekends, which is when we tend not to deploy our web app. In mid-April, we switched completely to the new atomic deploy system.

We found that the new atomic deployments reduced annoyances for Pinners and contributed to an overall improved user experience. This ultimately means that deploys are stealthier, and we can reasonably do more deploys throughout the day or as the business might require.

Nick Taylor is a software engineer at Pinterest.

Acknowledgements: Jeremy Stanley and Dave Dash, whose contributions helped make this technology a reality.


Last week we hosted a Tech Talk on Pinterest’s logging infrastructure and Apache Mesos, with engineers Roger Wang of Pinterest, Connor Doyle of Mesosphere, and Bernardo Gomez Palacio of Guavus. You can find a recap of the discussion below, and slides on SlideShare.

Logging Infrastructure at Pinterest (Roger Wang)

Roger joined Pinterest as a software engineer on the data engineering team last year, after working in engineering at Amazon, Google, and Microsoft. Over the years, he’s had the opportunity to work on persistence services, search features and infrastructure. For the talk, Roger shared insights into Pinterest’s new high performance logging agent, Singer, which plays a crucial role in our data architecture. Singer is currently in production to ship logs from application hosts to Kafka, and we plan to open-source it soon.

Deploying Docker Containers at Scale with Mesos & Marathon (Connor Doyle)

Connor joined Mesosphere as an engineer after most recently working at Gensler and One Orange Software. His current work focuses on Marathon, an Apache Mesos framework, which was the topic of his presentation. Connor started with a high-level overview of Mesos before diving into new features of Marathon, including deploying Docker containers at scale.

Scaling Big Data with Hadoop & Mesos (Bernardo Gomez Palacio)

Bernardo joined Guavus from HashGo and IGN Entertainment. With a previous focus on backend APIs and infrastructure, he’s now working closer to big data. His presentation focused on using Mesos as a resource management layer for Spark and Hadoop clusters. He also shared case studies from personal experience managing Mesos clusters for these two frameworks, as well as comparisons between Mesos and YARN.

Interested in keeping an eye out for future events? Follow us on Facebook and LinkedIn.

Krishna Gade is a software engineer at Pinterest.
