Case Study

How The Zebra Uses Sumatra To Create Personalized Shopping Experiences

Austin, TX

The Zebra is the nation's leading insurance comparison site for car insurance, homeowner's insurance, and more. Its online quote comparison tool helps consumers find policies with the coverage, service level, and pricing to suit their unique needs.

In April 2021, The Zebra raised a $150M Series D round, with expansion plans to, among other things, “create personalized experiences, including results informed by machine learning.”

Personalizing the experience and education is critical. That's where machine learning and artificial intelligence comes in. You can't build one product to serve everyone's needs.

Meetesh Karia, CTO @ The Zebra

Executive Summary

With Sumatra, The Zebra's data science team gained real-time access to customer behavior data that was previously only actionable offline, from their data warehouse. By building their ML features and analytics with Sumatra, they were able to deliver more responsive applications. As a result, their customers received online shopping and support experiences better tailored to their immediate needs.




Cost Calculator

Wanted to improve ML model accuracy by adding features that refresh automatically
Migrated features from Snowflake to Sumatra for real-time updates
Reduced feature staleness from >24 hours to milliseconds




Personalization as a Service

Wanted to deliver an internal personalization platform to Product teams
Used Sumatra to feed real-time engagement signals to the online ML models
Onboarded two successful use cases that adapt in real time




360 Customer View

Wanted to reduce support agent wait time when accessing recent customer history
Migrated agent portal queries from Snowflake to Sumatra
Reduced application latency from 15 seconds to 0.5 seconds

The Problem

Prior to working with Sumatra, The Zebra's engineering team had instrumented the site to collect interactions like page views, form submissions, and clicks. This event data was being journaled to their Snowflake data warehouse for offline analysis.

A Data Warehouse isn't enough

The Zebra's data science team was deriving value from batch analytics in Snowflake and had recognized opportunities to operationalize their analytics by making behavioral features available in real time.

The key challenge the team needed to solve was how to join and aggregate event data on-the-fly to make fresh features available immediately for low-latency inference. Further, they needed an agile and accessible solution that could support ongoing development of new features by The Zebra's data scientists.

Build or buy?

When considering the build-vs-buy decision, the team recognized that building a platform in-house would mean a longer time-to-value and would force the lean team to sacrifice work with more direct business value.

Evaluating the feature stores available on the market, the team found that these products required a heavy lift to deploy and operate. Not only would they require the engineering team to deploy, size, and manage new, unfamiliar infrastructure like Spark and Flink, they would also introduce significant overhead for The Zebra's IT team.

The Solution

When the data team was introduced to Sumatra, it became obvious that they had found a better way.

A Streaming Semantic Layer

By directing their Kinesis event stream at Sumatra, the engineers were able to start processing website events in real time. The data transformation and business logic that previously incurred the delay of going through the data warehouse could now be carried out instantly, making fresh semantic signals available to downstream applications such as analytics, customer service portals, and, of course, machine learning models.

Figure: Sumatra transforms raw behavioral events into meaningful signals as they occur.

Declarative Feature Engineering

In Sumatra, The Zebra's data scientists could now self-sufficiently declare new stateful features for their models. With a few lines of code, they could compute everything from engagement signals like click counts per session to target encodings like average premium rates by zip code over the last X days.

Using Sumatra's declarative feature language, Scowl, the data scientists could join events together in real time, transform data with common math and logic operations, and compute windowed aggregates to generate new complex features—all without thinking about pipelines, jobs, or infrastructure.

Figure: Scowl code calculating a continuous 90-day running average.
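
To make the idea of a windowed aggregate concrete, here is a minimal Python sketch of a per-key running average over a trailing time window. It is not Scowl and does not reflect how Sumatra computes features internally; the class name `WindowedAverage`, its methods, and the sample zip code and premiums are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class WindowedAverage:
    """Running average of a value per key over a trailing time window.

    Hypothetical illustration only; not part of any Sumatra API.
    """

    def __init__(self, window: timedelta):
        self.window = window
        self._events = defaultdict(deque)  # key -> deque of (timestamp, value)
        self._sums = defaultdict(float)    # key -> sum of values still in window

    def _evict(self, key, now):
        # Drop events that are `window` or more older than `now`.
        events = self._events[key]
        while events and events[0][0] <= now - self.window:
            _, old_value = events.popleft()
            self._sums[key] -= old_value

    def update(self, key, timestamp, value):
        """Record one event. Timestamps are assumed non-decreasing per key."""
        self._events[key].append((timestamp, value))
        self._sums[key] += value
        self._evict(key, timestamp)

    def get(self, key, now):
        """Average of values for `key` in (now - window, now], or None if empty."""
        self._evict(key, now)
        events = self._events[key]
        return self._sums[key] / len(events) if events else None


# Example: average premium by zip code over the last 90 days (made-up numbers).
avg = WindowedAverage(timedelta(days=90))
t0 = datetime(2023, 1, 1)
avg.update("78701", t0, 100.0)
avg.update("78701", t0 + timedelta(days=30), 140.0)
print(avg.get("78701", t0 + timedelta(days=60)))  # 120.0
```

A declarative feature language hides this bookkeeping (event storage, eviction, and per-key state) behind a single expression, which is the convenience the paragraph above describes.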

For model-training and exploratory data analysis, the data scientists no longer had to maintain separate offline feature definitions in SQL. Instead, they could now use Sumatra's Python SDK to backfill all features historically and pull the result into their Python-based data science tools. By defining their features only once, they could avoid the up-front cost of reimplementation and the ongoing cost of validating online-offline consistency.
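
The "define once" idea can be sketched in plain Python as follows. This does not use Sumatra's Python SDK, whose API is not shown in this case study; `ClicksPerSession`, `backfill`, and the event field names (`ts`, `type`, `session_id`) are assumptions made for the example. The point is only that one feature definition drives both the historical replay and the live path, so the two cannot drift apart.

```python
from collections import defaultdict


class ClicksPerSession:
    """A single feature definition: running click count per session.

    Hypothetical illustration only; not part of any Sumatra API.
    """

    def __init__(self):
        self._counts = defaultdict(int)

    def apply(self, event):
        # Update state from the event, then return the feature value as of it.
        if event["type"] == "click":
            self._counts[event["session_id"]] += 1
        return self._counts[event["session_id"]]


def backfill(feature_factory, events):
    """Replay historical events in timestamp order with a fresh feature state.

    Returns the feature value as of each event, in timestamp order.
    """
    feature = feature_factory()
    ordered = sorted(events, key=lambda e: e["ts"])
    return [feature.apply(e) for e in ordered]


history = [
    {"ts": 2, "type": "click", "session_id": "s1"},
    {"ts": 1, "type": "page_view", "session_id": "s1"},
    {"ts": 3, "type": "click", "session_id": "s1"},
    {"ts": 4, "type": "click", "session_id": "s2"},
]

# Offline: training rows come from replaying history through the definition.
offline_values = backfill(ClicksPerSession, history)

# Online: the same definition is applied to events as they arrive.
live = ClicksPerSession()
online_values = [live.apply(e) for e in sorted(history, key=lambda e: e["ts"])]
```

Because both paths call the same `apply` logic, the offline and online values agree by construction rather than by after-the-fact validation.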

Small Infrastructure Footprint

After a successful pilot integration with Sumatra's hosted SaaS, The Zebra migrated to a private cloud deployment to meet their security and data-privacy requirements. To deploy or upgrade the platform, an engineer runs a Sumatra-provided Terraform script to configure the necessary serverless components (e.g., Lambda, Kinesis, and DynamoDB) and install upgraded platform binaries. There are no EC2 instances or Spark clusters to manage.

Since launch, the team has deployed about a dozen platform upgrades with no downtime and nearly zero operational overhead.

The Results

In the past few months, The Zebra's data science team has launched several key personalization initiatives built on top of the Sumatra platform.

The platform filled a critical technology gap, helping the data science team to deliver on several key ML initiatives. The Sumatra partnership is a core part of The Zebra's ongoing AI/ML strategy.

Claire Look, VP of Data @ The Zebra

Accurate Cost Estimates

The Zebra provides an ML-powered Car Insurance Calculator on its website to help prospective buyers quickly estimate their premiums based on a few key factors like driver age and vehicle year. Some of the most valuable model features are target encodings: rate averages over the past X days, broken down by zip code, city, state, etc.

Figure: Car Insurance Calculator. An ML model uses Sumatra-generated features to estimate a price range based on the latest quotes from carriers.

Prior to Sumatra, these features were computed as data warehouse queries, run at model-training time, then embedded into the model artifact itself. This led to large model artifacts, stale features, and frequent retraining requirements.

Using the Scowl language, the data scientists were able to define and test the first version of their features in about 20 minutes. Sumatra's developer tools made it easy to try many variants of the features, with different smoothing parameters, and to iterate on those features quickly.
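
For readers unfamiliar with smoothed target encodings, the following Python sketch shows one common formulation: each category's average is blended with the global average, weighted by a pseudo-count. The source does not say which smoothing scheme The Zebra used, so the formula, the function name `target_encode`, and the sample premiums are assumptions for illustration.

```python
from collections import defaultdict


def target_encode(rows, smoothing=10.0):
    """Smoothed mean of the target per category.

    rows: iterable of (category, target) pairs.
    smoothing: pseudo-count m; each encoding is
        (category_sum + m * global_mean) / (category_count + m)
    so sparse categories are pulled toward the global mean.
    Returns (encodings, global_mean); ({}, None) if rows is empty.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, target in rows:
        sums[category] += target
        counts[category] += 1

    total = sum(counts.values())
    if total == 0:
        return {}, None

    global_mean = sum(sums.values()) / total
    encodings = {
        category: (sums[category] + smoothing * global_mean)
        / (counts[category] + smoothing)
        for category in sums
    }
    return encodings, global_mean


# Made-up quotes: (zip code, premium).
quotes = [("78701", 100.0), ("78701", 200.0), ("10001", 300.0)]

raw, overall = target_encode(quotes, smoothing=0.0)       # plain per-zip averages
smoothed, _ = target_encode(quotes, smoothing=1.0)        # pulled toward overall
```

With `smoothing=0.0` the encoding is the plain per-category average; larger values trade responsiveness for stability in categories with few observations, which is the kind of parameter the team could vary while iterating.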

Moving the target encodings to Sumatra, where the features are computed on-the-fly from the most recent data, provided multiple benefits. The model became smaller and simpler. More importantly, the model could now produce accurate estimates without the need for frequent retraining and deployment.

Personalization as a Service

A key part of The Zebra's personalization strategy is to enable internal Product teams to quickly launch and analyze live experiments. To this end, the team built a service that would allow different versions of website components to be presented to different subpopulations. To overcome shortcomings of traditional A/B testing, the team employed the Contextual Bandits algorithm, which adapts online, based on real-time engagement feedback.

Contextual Bandits
The Contextual Bandits algorithm adapts which users see which website variants based on real-time click and impression signals

The Zebra used the Sumatra platform to compute these engagement signals in real time from each user's clickstream behavior. Further, they used Sumatra to attribute those signals back to the original experimental group assignments in order to provide the algorithm with the necessary training feedback. Finally, Sumatra's historical replay capabilities support daily batch retraining of the Contextual Bandits models.
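
The feedback loop described above (assign a variant, observe engagement, attribute it back to the assignment) can be sketched with a deliberately simple epsilon-greedy bandit keyed by a discrete context. This is a stand-in technique, not necessarily the algorithm The Zebra runs in production; the class `EpsilonGreedyBandit`, its method names, and the context labels are hypothetical.

```python
import random
from collections import defaultdict


class EpsilonGreedyBandit:
    """Per-context epsilon-greedy bandit over website variants.

    Hypothetical illustration only. Untried variants are treated
    optimistically so each one is shown at least once per context.
    """

    def __init__(self, variants, epsilon=0.1, seed=None):
        self.variants = list(variants)
        self.epsilon = epsilon
        self._rng = random.Random(seed)
        self._impressions = defaultdict(int)  # (context, variant) -> count
        self._clicks = defaultdict(int)       # (context, variant) -> count
        self._assignments = {}                # assignment_id -> (context, variant)

    def _click_rate(self, context, variant):
        shown = self._impressions[(context, variant)]
        if shown == 0:
            return float("inf")  # optimistic: try every variant once
        return self._clicks[(context, variant)] / shown

    def assign(self, assignment_id, context):
        """Pick a variant for this context and record the impression."""
        if self._rng.random() < self.epsilon:
            variant = self._rng.choice(self.variants)  # explore
        else:
            variant = max(
                self.variants, key=lambda v: self._click_rate(context, v)
            )  # exploit the best observed click rate
        self._assignments[assignment_id] = (context, variant)
        self._impressions[(context, variant)] += 1
        return variant

    def record_click(self, assignment_id):
        """Attribute a click back to its original assignment.

        Returns False if the assignment is unknown.
        """
        key = self._assignments.get(assignment_id)
        if key is None:
            return False
        self._clicks[key] += 1
        return True


# epsilon=0.0 disables random exploration, which makes this walk-through
# deterministic; a real deployment would keep some exploration.
bandit = EpsilonGreedyBandit(["A", "B"], epsilon=0.0)
first = bandit.assign("a1", "mobile")   # untried variants: first in list wins
second = bandit.assign("a2", "mobile")  # the remaining untried variant
bandit.record_click("a2")               # engagement attributed to "a2"
third = bandit.assign("a3", "mobile")   # now favors the clicked variant
```

The `_assignments` map plays the role of attribution: a click event arriving later only needs the assignment identifier to update the right context and variant.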

As a result, the team has been able to build a platform that truly adapts in real time. So far, they have onboarded two impactful use cases with more in the works.

360 Customer View

Finally, delivering extraordinary customer service is central to The Zebra's brand. Support agents are better able to serve callers when they have fresh context about what quotes the customer has recently seen and how they have engaged with the site.

Prior to Sumatra, the agents' case management tool assembled the user's site-navigation context by querying offline event data in Snowflake. These queries took upwards of 15 seconds to execute, leaving the agent with a long wait before they could start engaging.

The data team was able to quickly stand up a service to fetch the features they needed from Sumatra's GraphQL API and make fresh behavior data instantly available in the support agent tools. Query time was reduced from 15 seconds to 0.5 seconds. Further, their infrastructure footprint was reduced by shutting down cloud components that were no longer necessary.

What's Next

With Sumatra, The Zebra's data team was empowered to self-sufficiently deploy fast, fresh, stateful services to serve ML and analytics use cases across the organization.

The Zebra has big plans to keep enhancing the personalization of the customer experience and is partnering with Sumatra on those initiatives.