#4 Proxy Metrics

How to define a metric to prove or disprove your hypotheses and measure progress

Jul 12, 2019

Monthly retention was the metric we used at Netflix to evaluate overall product quality. This high-level product engagement metric improved significantly over twenty years. In the early days, about 10% of members canceled each month. In 2005, the monthly cancellation rate was around 4.5%. Today, it’s close to 2%.

Using retention as a metric for all projects isn’t feasible, however. It’s a hard metric to move, and proving a retention improvement requires large-scale A/B tests. Lower-level metrics — proxy metrics — are easier and faster to move than high-level engagement metrics. Ideally, moving a proxy will improve the high-level metric (e.g., retention for Netflix), demonstrating a correlation between the two. Later, you can prove causation via an A/B test.
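A rough sample-size estimate helps show why a retention test has to be large while a proxy test can be small. The sketch below uses the standard normal-approximation formula for comparing two proportions. The helper name `sample_size_per_arm`, the significance and power settings, and the size of each lift are assumptions chosen for illustration, not figures from Netflix.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_arm(p_control, p_test, alpha=0.05, power=0.8):
    """Approximate members needed per test cell to detect p_control -> p_test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)           # desired statistical power
    variance = p_control * (1 - p_control) + p_test * (1 - p_test)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p_test - p_control) ** 2)


# Assumed lifts: half a point of retention vs. five points of a proxy metric.
retention_n = sample_size_per_arm(0.88, 0.885)
proxy_n = sample_size_per_arm(0.70, 0.75)
print(f"retention test: {retention_n} members per cell")
print(f"proxy test:     {proxy_n} members per cell")
```

Under these assumptions, the retention test needs many times more members per cell than the proxy test, which is one reason proxies are faster to work with.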

Today’s “Movie Display Page” is very simple. Start playing, or add “Badlands” to your list. It’s all about the movie or TV show — the interface doesn’t stand in the way of the film.

How do you measure “simple?”

One of our hypotheses was that a more straightforward member experience would improve retention. But how do you measure “simple?” And how do you demonstrate that it improves retention?

We began by exploring customer service data. Why do members call or email Netflix with questions or complaints? What links do they click on when they visit the help pages? Where do customers get confused? Over time, we focused on new members, because the many potential customers at the top of the sign-up funnel represented a substantial business opportunity.

We talked to new members in one-on-one sessions and focus groups. We asked a small group of customers to write a journal describing their weekly Netflix activity. Last, we looked at existing data for the new member sign-up flow and their first few weeks with the service.

One point of confusion: our early DVD-by-mail service required customers to create an ordered list of the movies we would send them, yet some new members never added videos to their Netflix “Queue.” They chose a plan, entered their credit card information, and asked, “Now what?” The step of adding at least three titles to their queue confused many of them.

We needed to simplify the sign-up process and make it easier for customers to create a list of movies. Eventually, we executed a series of “day one” projects focused on eliminating steps, reducing cognitive overhead, and clarifying how the service worked.

The proxy metric we devised was “the percentage of new members who add at least three titles to their queue during their first session.” When we first looked at the data, 70% of new members added at least three titles to their queue during their first session. After a series of fast-paced experiments, we increased this percentage to 90% by the end of the year.

Over the same period, we drove month-one retention from 88% to 90% — retention and our “simple” metric moved together. However, we chose not to take the time to execute a large-scale A/B test because we were confident that the more straightforward experience improved retention.

The right proxy metric

Proxy metrics are stand-ins for your high-level engagement metric, the one that defines your product’s overall quality. First, you seek a correlation between your high-level and proxy metrics. Later, you work to prove causation.

Here’s a simple model to define proxy metrics:

Percentage of (members/new customers/returning customers) who do at least (the minimum threshold for user action) by (X period in time).
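The template translates fairly directly into a calculation over an event log. The sketch below is one possible shape, not how Netflix computed it: the `Event` record, the `proxy_metric` function, the `"add_to_queue"` action name, and the one-hour window standing in for a "first session" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    member_id: str
    action: str
    at: datetime


def proxy_metric(signup_times, events, action, min_count, window):
    """Percent of members with at least min_count actions within window of sign-up."""
    if not signup_times:
        return 0.0
    counts = {member: 0 for member in signup_times}
    for e in events:
        start = signup_times.get(e.member_id)
        # Count only the chosen action, and only inside the member's window.
        if start is not None and e.action == action and start <= e.at < start + window:
            counts[e.member_id] += 1
    qualified = sum(1 for c in counts.values() if c >= min_count)
    return 100.0 * qualified / len(signup_times)


# Three hypothetical new members; only "a" adds three titles in the window.
t0 = datetime(2005, 1, 1)
signups = {"a": t0, "b": t0, "c": t0}
events = [Event("a", "add_to_queue", t0 + timedelta(minutes=m)) for m in (1, 2, 3)]
events.append(Event("b", "add_to_queue", t0 + timedelta(minutes=5)))

pct = proxy_metric(signups, events, "add_to_queue", 3, timedelta(hours=1))
print(f"{pct:.1f}% of new members added at least three titles")
```

The three slots in the template map to the three arguments that vary: which members are in `signup_times`, the `min_count` threshold, and the `window`.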

Some examples of proxies for retention at Netflix:

The percentage of members who add at least one member to their “Friends” list within six months. The Netflix Friends feature launched with 1% of members using it, grew to 5% over three years, and then Netflix killed the feature. The assumption was that the Friends proxy metric needed to surpass 20% to achieve a meaningful retention improvement.
Percent of members who stream at least 15 minutes of video monthly. At the launch of streaming in 2007, this metric was 5%. Today, it’s north of 90%. We chose fifteen minutes because this was the smallest value increment: the shortest TV episode was fifteen minutes. (I’m sure Netflix measures a similar proxy today, but at a variety of much higher “hurdles,” likely the percent of members who watch at least 10/20/30/40 hours a month.)
Percent of members who add at least six DVDs to their queue monthly. The merchandising team’s job was to make it easy for members to find and add movies to their lists. Initially, the metric was 70%. Over time, we moved it to 90%.
The percentage of new members who rate at least 50 movies in their first six weeks with the service. This metric was our proxy for our personalization efforts. The theory was that if customers were willing to rate movies, they valued the movie recommendations Netflix provided. Over a few years, we drove this metric from the low single digits into the high twenties.
Percent of first-choice DVDs delivered to members the next day in the mail. One of the early insights about our DVD-by-mail service was that providing the first-choice DVD the next day was critical. At first measurement, the metric was 70%. We drove it to 90% by setting up fifty automated DVD delivery hubs throughout the US. We also integrated the inventory data from each delivery hub with the merchandising system, so we only merchandised titles available in a member’s local shipping center.

As you evaluate potential metrics, make sure the proxy:

Is measurable. You can find, collect, and measure the data. Ideally, you can assess the metric in an A/B test, and the metric helps answer the question, “Should we launch this feature or not?” In evaluating a new product strategy, ask yourself, “In an A/B test, what metric would we use to make a go/no-go decision?”
Is moveable. You can affect the metric through changes to the product experience.
Is not an average. The danger of averages is that you may move the metric by inspiring a small subset of customers to do a lot more of something. However, this may not affect enough members to improve the overall product experience.
Correlates with your high-level engagement metric. For Netflix, successful proxy metrics and retention moved together. Long-term, you hope to prove causation via a large-scale A/B test.
Specifies new vs. existing customers. As Netflix grew, we learned to focus our efforts on new members. We needed to optimize for new members to become a sizeable worldwide service. We would test features with new members and then roll them out to all members based on positive results. Existing members sometimes noticed the change and complained about it, but they rarely canceled. (Occasionally, if we believed there was a real risk of hurting retention, we ran an A/B test with existing members, too.)
Is not gameable. One product manager focused on customer service. His job was to make it easy for members to help themselves so they did not call our customer service team via our 800 number. The metric that defined his role was “contacts per 1,000 customers,” and the goal was to lower it below 20 contacts per 1,000 customers. But the product manager quickly discovered he could game the metric by hiding the 800 number. Consequently, we revised the proxy: “Contacts/1,000 members with the 800 number available within two clicks.”
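The point about averages is easy to see with a small example. In the sketch below, the viewing hours for ten members are hypothetical, and `pct_at_least` is an illustrative helper. The 0.25-hour threshold mirrors the fifteen-minute streaming hurdle mentioned above.

```python
from statistics import mean


def pct_at_least(values, threshold):
    """Percent of members whose value meets or exceeds the threshold."""
    return 100.0 * sum(v >= threshold for v in values) / len(values)


# Hypothetical hours streamed per member in a month (ten members).
before = [0, 0, 0, 0, 0, 0, 1, 1, 2, 6]
# Only the heaviest viewer changes behavior.
after = [0, 0, 0, 0, 0, 0, 1, 1, 2, 36]

avg_before, avg_after = mean(before), mean(after)
pct_before = pct_at_least(before, 0.25)  # 0.25 hours = 15 minutes
pct_after = pct_at_least(after, 0.25)

print(f"average hours: {avg_before} -> {avg_after}")
print(f"percent streaming 15+ minutes: {pct_before} -> {pct_after}")
```

The average quadruples because one member binge-watches, while the share of members who clear the fifteen-minute hurdle does not change at all. A threshold-based percentage only moves when more members change their behavior.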

A big surprise at Netflix: we made decisions quickly, but isolating the right proxy metric sometimes took six months. It took time to capture the data, discover if we could move the metric, and see if there was causation between the proxy and retention. Given a trade-off between speed and finding the right metric, we chose the right metric. It’s costly to have a team focused on the wrong metric.

Eventually, each of the product managers on my team could measure their performance through one or two proxy metrics that contributed to improving monthly retention.

Product Strategy Exercise (#6)

Identify your high-level engagement metric, which is equivalent to Netflix’s monthly retention. Review your work from the last essay (The Strategy/Metric/Tactic Lockup) and re-evaluate your proxy metric for each high-level strategy against “The Right Proxy Metric” outlined above.

The following essay outlines an alternative approach to defining your product strategy:

Essay #5: Working Top-down and Bottom-up

Enjoy!

Gib

Gibson Biddle

www.gibsonbiddle.com

PS. November 2024 Update: Sign up for my new 3-hour virtual “Product Strategy Workshop” on Maven. (Monthly cohorts from 9 am to 12 pm PT.)

PPS. NEW! Check out my cohort-based “Product Strategy Workshop” on Maven.

PPPS. Here’s an index of all the articles in this series:

Intro: How to Define Your Product Strategy
#1 “The DHM Model.”
#2 “From DHM to Product Strategy”
#3 “The Strategy/Metric/Tactic Lockup”
#4 “Proxy Metrics”
#5 “Working Bottom-up”
#6 “A Product Strategy for Each Swimlane”
#7 “The Product Roadmap”
#8 “The GLEe Model”
#9 “The GEM Model”
#10 “How to Run A Quarterly Product Strategy Meeting”
#11 “A Case Study: Netflix 2020”
#12 “A Startup Case Study: Chegg”
#13 “TLDR: Summary of the Product Strategy Frameworks”
