“If you can’t measure it, you can’t improve it” — Peter Drucker
Peter Drucker, the father of modern management, was right: a business needs metrics to understand whether it's growing toward success or falling into the abyss. 📚
The same philosophy can be applied in product data science. Once a product is built and launched, creating metrics that define success is a must.
Almost every major company has metrics for its core products and services. Google, Netflix, and Facebook have primary and secondary KPIs that enable executives to make long-term decisions and analysts (i.e. data scientists and product analysts) to run experiments and build models that move the needle incrementally.
By the end of the guide, you will have learned about:
- Mission Statement
- Metric Types (North Star, Driver and Guardrail)
- Formulating a Metric
- Frameworks (AARRR, HEART, GAME)
- YouTube Case Study
The lessons discussed in this guide will help you cultivate product data science skills and prepare for interviews.
Real quick, a shameless plug here :) Just want to briefly introduce myself. I am Dan, an ex-data scientist at Google and PayPal with 5.5 years of experience. I founded datainterview.com to help candidates remove the frustration from data scientist and MLE interview prep.
If you want to accelerate your prep, check out my prep course, which includes courses, coaching, and a Slack group that have helped candidates land jobs at top companies, including Google, Facebook, Amazon, and more!
Make sure to check out datainterview.com! 😊
Every company has a mission statement that defines its purpose to the world. Google’s mission statement is “to organize the world’s information and make it universally accessible and useful.” Facebook’s mission statement is “to give people the power to build community and bring the world closer together.”
This is the single most important statement that influences a company’s vision, strategy, execution, and success. From executives to ICs, their day-to-day work centers on supporting the company’s mission.
Understanding the mission statement is the first step to defining the most important metric, the North Star metric, as discussed next.
To find a company's mission statement, a simple Google search will do.
A company’s mission statement lays the groundwork for a North Star metric — the key performance indicator of a company’s success. This is the single most important metric that corporate executives and investors use to gauge whether the company is heading in the right direction.
Every area of a company, from marketing to product to engineering, channels its efforts into advancing the North Star metric.
Let's take a look at the recipe for a North Star metric:
- Measures the long-term growth of a company
- Improves user experience
- Supports the company’s bottom line
A North Star metric sets a target that a company pursues in the long term, at least a year ahead. In shareholder and company-wide meetings, a CEO might say: "We aim to increase our North Star by 10% by next year."
The metric gauges the value a company provides to its users. For instance, Spotify is in the business of streaming songs, podcasts, and videos. A North Star metric could be hours streamed per month.
By focusing on hours streamed, Spotify can execute plans that improve the user experience and thereby increase ad revenue and retention among premium users.
One could argue that the North Star metric should be revenue. Although revenue keeps the lights on, it overlooks the long-term value provided to users. For instance, Spotify could reap short-term gains by bombarding users with ads. In the long term, however, usage and revenue would suffer as users churn because of the distraction ads add to the listening experience.
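To make the Spotify example concrete, here is a minimal sketch of how a monthly "hours streamed" North Star could be aggregated from raw playback events. The event log and its numbers are hypothetical, invented for illustration:

```python
from collections import defaultdict

# Hypothetical playback events: (user_id, month, minutes_streamed)
events = [
    ("u1", "2023-01", 120),
    ("u2", "2023-01", 300),
    ("u1", "2023-02", 90),
]

def hours_streamed_per_month(events):
    """Total hours streamed per calendar month."""
    minutes_by_month = defaultdict(int)
    for _user_id, month, minutes in events:
        minutes_by_month[month] += minutes
    return {month: mins / 60 for month, mins in minutes_by_month.items()}

print(hours_streamed_per_month(events))  # {'2023-01': 7.0, '2023-02': 1.5}
```

In practice this aggregation would run over billions of events in a data warehouse, but the metric definition stays the same: a total of a user action, bucketed by time.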
Consider additional examples of North Star metrics:
The North Star metric is a long-term metric that helps executives and investors gauge whether a business is moving in the right direction. Driver metrics are short-term metrics that align with the North Star in a hierarchy:
Given the granularity of a driver metric, product managers, data scientists and product analysts will use it in hypothesis testing and machine learning models to improve product quality.
For instance, suppose Facebook's North Star is hours spent on the platform. The platform offers various features: News Feed, Friends, Comments, Notifications, Marketplace, and so on. Each feature has its own driver metric that measures the core value offered to users and aligns with the North Star metric:
The qualities of a great driver metric are:
- Aligned with the North Star — Is it meaningfully and statistically correlated with the North Star metric?
- Actionable — Does it inform and influence key product decisions?
- Sensitive — Is it sensitive enough to measure core actions?
Suppose that a Facebook PM saw that active engagement time on News Feed increased by 5% with statistical significance in an experiment. The change should be launched, right? Well, not so fast. Before making a launch decision, the guardrail metrics must be carefully assessed.
Guardrail metrics are secondary metrics that safeguard the overall product experience and monetization beyond the primary (a.k.a. driver) metric being tested.
For instance, in the News Feed example, guardrail metrics could be ad revenue and app performance. If News Feed engagement increases at the cost of decreased ad revenue or degraded app performance, the PM may not be ready to launch the feature yet. She may want to diagnose the root cause and propose a redesign that improves the primary metric without harming the guardrails.
Note that guardrail metrics can be classified into two types:
- Business — Metrics tied to the user experience other than the primary metric. Often, there are trade-offs between the primary metric and business guardrails. For instance, a search team may want to improve ad revenue without hurting a guardrail that protects user engagement (e.g. queries). Another example is the sign-up security of an application: an experiment may show that a sign-up flow with fewer steps increases the sign-up rate, but the increase should not come with more bad actors entering the site.
- Internal Validity — Metrics that monitor an app's performance and bugs, which can degrade the user experience in the long term, such as page load time and error counts. In addition, statistical checks such as the sample ratio mismatch can serve as guardrail metrics.
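The sample ratio mismatch check mentioned above is usually a chi-square goodness-of-fit test on the observed traffic split. A minimal sketch with hypothetical arm sizes, assuming an intended 50/50 split:

```python
def sample_ratio_mismatch(control_n, treatment_n, expected_ratio=0.5):
    """Chi-square goodness-of-fit statistic for the traffic split (df = 1).

    Values above ~3.84 (p < 0.05) suggest the randomization is broken,
    so experiment results should not be trusted until it is diagnosed.
    """
    total = control_n + treatment_n
    expected_control = total * expected_ratio
    expected_treatment = total * (1 - expected_ratio)
    return ((control_n - expected_control) ** 2 / expected_control
            + (treatment_n - expected_treatment) ** 2 / expected_treatment)

print(sample_ratio_mismatch(50_000, 50_300))  # small -> split looks fine
print(sample_ratio_mismatch(50_000, 53_000))  # large -> investigate before reading results
```

The exact arm sizes here are invented; in practice the counts come from the experiment's assignment logs.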
You’ve learned about the North Star, driver, and guardrail metrics. Now let’s put on a product data scientist hat and formulate a metric. Here are the steps:
1. Identify an action to measure:
- Count (e.g. clicks, page views, visits, downloads)
- Time (e.g. minutes per session)
- Value (e.g. revenue, units purchased, ads clicked)
2. Choose the unit of analysis:
- Per session (e.g. minutes per session)
- Per user (e.g. clicks per user)
- Per page (e.g. revenue per page)
- Per time (e.g. views per month)
3. Choose a statistical function
- Average (e.g. average minutes per session)
- Total (e.g. total revenue per month)
- Count (e.g. click count per week)
- Median (e.g. median revenue per user)
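Putting the three steps together — action (time spent), unit of analysis (per session), statistical function (average) — yields a metric like "average minutes per session." A minimal sketch over a hypothetical session log:

```python
from statistics import mean

# Hypothetical session log: (session_id, minutes)
sessions = [("s1", 12.0), ("s2", 8.0), ("s3", 25.0), ("s4", 5.0)]

# Action: time spent; unit of analysis: per session; function: average
avg_minutes_per_session = mean(minutes for _, minutes in sessions)
print(avg_minutes_per_session)  # 12.5
```

Swapping any one of the three ingredients (e.g. median instead of average, per user instead of per session) yields a different metric with different sensitivity to outliers.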
Now, let’s briefly apply the learnings in a sample product case problem.
Suppose that you are a product data scientist at Spotify, and you are asked to design a primary metric for an experiment that improves song recommendations in a playlist. What would your primary metric be?
Consider the following options:
- Songs clicked
- Songs added
- Songs added per user
- The average number of songs added per user
Option 1, songs clicked, misses the purpose of a playlist recommendation, which is to help users find songs to add to their playlist. Merely clicking a song is not representative of finding a relevant song. Hence, a click is not the most meaningful action to use.
Option 2, songs added, is slightly better than songs clicked as it measures the intended purpose of a playlist recommendation. However, the metric lacks granularity based on a unit of analysis and statistical function.
Option 3, songs added per user, is definitely better than the first two, but its calculation is ambiguous: is it the total or the average number of songs added per user?
Option 4, the average number of songs added per user, is the best, given that it contains all three properties required to make a meaningful metric: action, unit of analysis, and statistical function.
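The winning metric from the case above can be computed directly from "song added" events. The event data below is hypothetical, invented to illustrate the calculation:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical "song added" events: (user_id, song_id)
events = [("u1", "a"), ("u1", "b"), ("u2", "c"),
          ("u3", "d"), ("u3", "e"), ("u3", "f")]

def avg_songs_added_per_user(events):
    """Action: song added; unit of analysis: per user; function: average."""
    adds_per_user = defaultdict(int)
    for user_id, _song_id in events:
        adds_per_user[user_id] += 1
    return mean(adds_per_user.values())

print(avg_songs_added_per_user(events))  # (2 + 1 + 3) / 3 = 2.0
```

In an experiment, this metric would be computed separately for the control and treatment arms and compared with a significance test.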
The AARRR metric framework, created by venture capitalist Dave McClure, is perhaps one of the most widely used product metric frameworks across startups and tech companies. This is one framework to know as a product data scientist and interview candidate. As illustrated below, it represents a funnel of the customer journey from the beginning (acquisition) to the end (revenue).
In projects or interviews that involve measuring the success or funnel of a product, try the AARRR framework. In fact, a great way to cultivate your product sense is to choose a product you enjoy using and derive metrics for it with the framework.
Another metric framework data scientists can use is HEART, created by Google's research team. Like AARRR, HEART covers some elements of a funnel (though it excludes acquisition, referral, and revenue). While AARRR focuses on the sales funnel from beginning to end, HEART focuses on the overall user experience of a product, as reflected in the Happiness and Task Success dimensions below:
Happiness — A set of metrics that are attitudinal in nature. How happy and satisfied are customers with your product? These levels are often measured using surveys.
Engagement — The user's level of engagement with the product. For instance, on Google Photos, an engagement metric could be the number of photos uploaded per day.
Adoption — How many visitors start using a product during a time period. For instance, how many users signed up in the last seven days?
Retention — Tracks active users over a time period. Metrics could be daily active users (DAU) and monthly active users (MAU).
Task Success — A measure of frictionless user experience. For instance, how long does it take for a user to complete a task without any breakages or bad user experience?
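The Retention dimension above is often summarized as "stickiness," the ratio of DAU to MAU. A minimal sketch over a hypothetical activity log:

```python
# Hypothetical activity log: (user_id, date) pairs
activity = [
    ("u1", "2023-03-01"), ("u1", "2023-03-02"), ("u2", "2023-03-01"),
    ("u3", "2023-03-15"), ("u1", "2023-03-15"),
]

def dau(activity, date):
    """Distinct users active on a given day."""
    return len({u for u, d in activity if d == date})

def mau(activity, month):
    """Distinct users active in a given month (dates as 'YYYY-MM-DD')."""
    return len({u for u, d in activity if d.startswith(month)})

stickiness = dau(activity, "2023-03-15") / mau(activity, "2023-03")
print(stickiness)  # 2 active on the 15th / 3 active in March ≈ 0.67
```

A stickiness near 1.0 means monthly users come back almost every day; a low value means most monthly users visit only occasionally.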
While AARRR and HEART can help you brainstorm metrics to measure the overall health of a product, in interviews you will often be asked to measure a product with a specific purpose in mind. In such situations, the GAME framework can guide your response.
GAME is a 4-step process (Goals, Actions, Metrics, Evaluation) that can help you answer questions like:
- How would you measure the success of the Stories launch on Instagram?
- How would you measure user churn on Uber?
- How would you measure success on Airbnb?
Goals — define the user and business goals of the product
To know how to measure, you have to start with what you want to measure. Starting your analysis top-down is a surefire way to demonstrate product knowledge and brainstorm potential metrics.
Start by deconstructing the product or feature you are asked to measure with respect to user and business goals.
User goal — What is the value offered to the user? How does the user interact with the product or feature to achieve a task? What is the user experience?
Business goal — What is the business goal of the product or feature? What is the benefit produced from active engagement among users?
User and business contexts serve as the groundwork for the next step in GAME — actions.
Actions — create a qualitative list of user actions
The purpose of this step is to define key actions that users take within a product. You can start by listing actions across the different stages of a user journey. You can leverage frameworks such as HEART or AARRR to flesh out the actions.
Metrics — convert the qualitative actions into quantifiable metrics
If the actions step is a qualitative list of actions, the metrics step is the quantitative measures that count those actions. For each of the metrics you listed, you can use the three-step procedure — identify an action, set the unit of analysis, apply a statistical function — discussed in the section, “Formulating a Metric.”
Evaluate — consider the pros/cons and provide a recommendation
The last step involves evaluating the pros and cons of each metric you listed, then choosing one or more that best align with the user and business goals. Ask yourself key questions:
- How does the metric address the question asked?
- What is the limitation of the metric?
- What is the final recommendation on which metric to use?
Do you want more data scientist content like this for interviews?
Check out datainterview.com for courses and coaching services that have helped candidates land $200K+ data scientist and MLE roles at top tech companies, including Facebook, Amazon, Apple, Netflix, and Google.
The flagship product, the monthly subscription course (updated every month), contains the following core features:
- Case in Point
- AB Testing Course
- Mock Interview Videos
- Question Bank
- SQL Drills
- Slack Study Group
Say hello to Dan @ datainterview.com
Here’s a demonstration of applying the GAME framework in answering a product-case question: “How would you measure success on YouTube?”
Here are additional resources that can be helpful for your prep :)