Pinterest engineering blog

Aug 16, 2016

Tracker: Ingesting MySQL data at scale - Part 2

Rob Wultsch

Rob Wultsch is a database engineer on the SRE team.

In Part 1 we discussed Tracker, our existing architecture for ingesting MySQL data, covering its wins and challenges and outlining the new architecture with a focus on the Hadoop side. Here we’ll focus on the implementation details on the MySQL side. The uploader that moves the data to S3 has been open-sourced as part of the Pinterest MySQL Utils.
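
To make the flow concrete, here’s a minimal sketch of the general pattern such an uploader follows: stream a table dump through compression and up to S3. This is not the actual Pinterest MySQL Utils code; the host, database, table and bucket names are hypothetical, and it assumes mysqldump, gzip and boto3 are available.

    # Hypothetical sketch: dump one MySQL table, gzip it, upload to S3.
    # Not the Pinterest MySQL Utils implementation; all names are made up.
    import subprocess
    import boto3

    def upload_table_to_s3(host, db, table, bucket, key):
        """Stream a mysqldump of one table through gzip and up to S3."""
        dump = subprocess.Popen(
            ["mysqldump", "-h", host, "--single-transaction", db, table],
            stdout=subprocess.PIPE,
        )
        gz = subprocess.Popen(
            ["gzip", "-c"], stdin=dump.stdout, stdout=subprocess.PIPE,
        )
        dump.stdout.close()  # let mysqldump get SIGPIPE if gzip exits early
        # upload_fileobj streams from the pipe, so the dump never has to
        # fit in memory or on local disk.
        boto3.client("s3").upload_fileobj(gz.stdout, bucket, key)
        gz.wait()
        dump.wait()

    upload_table_to_s3("db-host", "pins_db", "pins",
                       "example-bucket", "tracker/pins.sql.gz")

Streaming the dump end to end avoids staging multi-gigabyte files locally, which matters at the scale the post describes.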

Aug 11, 2016

Tracker: Ingesting MySQL data at scale - Part 1

Henry Cai

Henry is a software engineer on the Data Eng team.

At Pinterest we’re building the world’s most comprehensive discovery engine, and part of delivering a highly personalized, relevant and fast service is running thousands of jobs on our Hadoop/Spark cluster. To feed those jobs, we need to ingest a large volume of raw data from online data sources such as MySQL, Kafka and Redis. We’ve previously covered our logging pipeline and how we move Kafka data onto S3.

Apr 10, 2015

Learn to stop using shiny new things and love MySQL

A good portion of the startups I meet and advise want to use the newest, hottest technology to build something that’s cool, but not technologically groundbreaking. I have yet to meet a startup building a time machine, teleporter or quantum social network that would actually require some amazing new tech. They have awesome new ideas with down-to-earth technical requirements, so I keep wondering why they choose this shiny (and risky) new stuff when all they need is a good ol’ trustworthy database.

Mar 11, 2015

Open-sourcing Pinball

Pawel Garbacki, Mao Ye, Changshu Liu and Jooseong Kim

Pawel is a software engineer on the Monetization team. Mao, Changshu and Jooseong are software engineers on the Data team.

As we continue to build in a fast and dynamic environment, we need a workflow manager that’s flexible and can keep up with our data processing needs. After trying a few options, we decided to build one in-house. Today we’re open-sourcing Pinball, which is designed to accommodate the needs of a wide range of data processing pipelines composed of jobs ranging from simple shell scripts to elaborate Hadoop workloads.
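
To illustrate the core idea behind a workflow manager like Pinball, here’s a hypothetical sketch of dependency-ordered job execution. It does not use Pinball’s actual API; the job names and commands are invented, and a real workflow manager adds retries, scheduling and distributed workers on top of this.

    # Hypothetical sketch of dependency-ordered job execution, the core
    # idea behind a workflow manager. NOT Pinball's actual API.
    import subprocess
    from graphlib import TopologicalSorter  # Python 3.9+

    # A workflow as a DAG: each job lists the jobs it depends on.
    workflow = {
        "extract_pins": [],                 # e.g. a simple shell script
        "aggregate":    ["extract_pins"],   # e.g. a Hadoop job
        "load_report":  ["aggregate"],
    }

    commands = {
        "extract_pins": ["echo", "dumping pins table"],
        "aggregate":    ["echo", "running aggregation"],
        "load_report":  ["echo", "loading report"],
    }

    # Run each job only after everything it depends on has succeeded.
    for job in TopologicalSorter(workflow).static_order():
        print("running", job)
        subprocess.run(commands[job], check=True)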
