Pinterest engineering blog

  • Aug 16, 2016

Tracker: Ingesting MySQL data at scale - Part 2

Rob Wultsch

Rob Wultsch is a database engineer on the SRE team.

In Part 1 we discussed our existing architecture for ingesting MySQL data, called Tracker, including its wins and challenges, and outlined the new architecture with a focus on the Hadoop side. Here we’ll focus on the implementation details on the MySQL side. The tool that uploads data to S3 has been open-sourced as part of the Pinterest MySQL Utils.

  • Aug 11, 2016

Tracker: Ingesting MySQL data at scale - Part 1

Henry Cai

Henry is a software engineer on the Data Eng team.

At Pinterest we’re building the world’s most comprehensive discovery engine, and part of achieving a highly personalized, relevant and fast service is running thousands of jobs on our Hadoop/Spark cluster. To supply data for these computations, we need to ingest a large volume of raw data from online data sources such as MySQL, Kafka and Redis. We’ve previously covered our logging pipeline and moving Kafka data onto S3.

  • Mar 11, 2015

Open-sourcing Pinball

Pawel Garbacki, Mao Ye, Changshu Liu and Jooseong Kim

Pawel is a software engineer on the Monetization team. Mao, Changshu and Jooseong are software engineers on the Data team.

As we continue to build in a fast and dynamic environment, we need a workflow manager that’s flexible and can keep up with our data processing needs. After trying a few options, we decided to build one in-house. Today we’re open-sourcing Pinball, which is designed to accommodate the needs of a wide range of data processing pipelines composed of jobs ranging from simple shell scripts to elaborate Hadoop workloads.

  • Feb 18, 2015

Real-time analytics at Pinterest

Krishna Gade

Krishna is an engineering manager on the Data team.

As thousands of people gather in the Bay Area this week for Strata + Hadoop World, we wanted to share how data-driven decision making is in our company DNA.

  • Aug 22, 2014

Hadoop statistics collection and applications

Xinding Sun

Xinding is a software engineer at Pinterest.

The massive volume of discovery data that powers Pinterest and enables people to save Pins, create boards and follow other users is generated through daily Hadoop jobs. Managed by Pinball, these jobs are organized into indexing workflows that output dozens of terabytes of data daily.
