Navigating the Data Landscape: Quality Over Quantity
When I first started my career as a Product Manager, my boss said to me, “What action can I take with this data?” I asked, “What do you mean?” He replied, “If I can’t take an action that gets us closer to our target goals, then this data is a vanity metric, meaning it has no purpose. Don’t show me data just for the sake of showing data; show me something that I can act on.”
This really resonated with me because it solidified why we use data and highlighted the difference between data and insights. Insights drive you towards confidence in a goal, whereas data can simply be numbers organized in a chart with no intrinsic meaning. I have witnessed many instances where data was used incorrectly or was structured in a way that obscured the insights. It's easy to fall into this trap, so it's essential to identify and correct such issues before making decisions. Here are the top culprits to watch for so you can catch these pitfalls before it's too late:
Goals and objectives
Clearly outlining your goals and objectives is crucial because it informs you about what you are measuring, and, more importantly, what you are NOT measuring. In a startup or a small team setting, there often isn't enough time to analyze everything, and data might not be available everywhere. Hence, tough decisions on what metrics to track are inevitable. It's vital to ensure that your goals are measurable and can be assessed within the project's timeframe.
Good goals and objectives are typically SMART: Specific, Measurable, Achievable, Relevant, and Time-bound. Here are examples across various contexts:
Business & Product Management
Goal: Increase our market share in the smartwatch industry.
Objective 1: Launch a new smartwatch model with health tracking features by Q2 2023.
Objective 2: Increase advertising spend on social media platforms targeting health enthusiasts by 15% in the next 6 months.
Objective 3: Partner with three major fitness influencers for product promotions by end of Q3 2023.
Goal: Enhance user engagement on our mobile application.
Objective 1: Introduce two new interactive features based on user feedback by the end of Q1 2023.
Objective 2: Decrease app loading time by 30% in the next 8 months.
Objective 3: Increase monthly active users by 20% over the next year.
Marketing
Goal: Improve brand visibility online.
Objective 1: Increase monthly website traffic by 25% over the next year.
Objective 2: Boost social media followers by 15% in the next 6 months.
Objective 3: Launch a content marketing campaign with bi-weekly blog posts related to industry trends.
Goal: Increase sales through email marketing.
Objective 1: Achieve a 20% open rate for monthly newsletters by Q3 2023.
Objective 2: Improve click-through rate by 5% in the next 4 months.
Objective 3: Segment the email list into three primary customer personas and tailor content for each by the end of the next quarter.
Hypothesis testing
Every feature rollout should be accompanied by a hypothesis. This keeps your efforts accountable, as you can validate or debunk your assumptions. The expected outcome must be quantifiable, and the data you gather should provide detailed insights for further experimentation. (A minimal code sketch for tracking hypotheses follows the examples below.)
Use the following structure: IF [specific change is made], THEN [expected outcome], BECAUSE [rationale based on data/research/observation].
Example Scenarios & Hypotheses using the Framework:
Feature Improvement Hypothesis:
Context: Users are dropping off at the checkout page possibly due to its complex design.
Hypothesis: IF we simplify the checkout process by reducing the number of steps from 5 to 3, THEN we will see a 15% increase in successful checkouts BECAUSE a streamlined process reduces friction for users.
New Feature Implementation Hypothesis:
Context: Gamers in a multiplayer game have requested a way to communicate with their team members.
Hypothesis: IF we introduce a voice chat feature within the game, THEN we will observe a 10% increase in game session durations BECAUSE players will be more engaged and coordinated with their teams.
Performance Optimization Hypothesis:
Context: Slow loading time for a mobile app's homepage.
Hypothesis: IF we reduce the homepage load time by optimizing image sizes and employing caching, THEN we'll see a 20% reduction in user bounce rates BECAUSE users tend to leave apps that don’t load quickly.
Engagement Hypothesis:
Context: Users of a social media app are not engaging with the newly introduced "stories" feature.
Hypothesis: IF we add a tutorial highlighting the benefits and usage of the "stories" feature on the first app launch, THEN there will be a 25% increase in stories creation within the first week of use BECAUSE users are more likely to engage with features they understand.
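To put the framework into practice, it can help to record each hypothesis in a structured form and score it against observed results once the experiment ends. Here is a minimal Python sketch using the checkout example above; the `Hypothesis` class, its field names, and the baseline numbers are illustrative assumptions, not a prescribed format:

```python
from dataclasses import dataclass

@dataclass
class Hypothesis:
    change: str             # IF: the specific change being made
    expected_outcome: str   # THEN: the quantified result we expect
    rationale: str          # BECAUSE: the data/research behind it
    target_lift: float      # expected relative improvement, e.g. 0.15 = 15%

    def evaluate(self, baseline: float, observed: float) -> str:
        """Compare the observed lift against the target lift."""
        lift = (observed - baseline) / baseline
        verdict = "validated" if lift >= self.target_lift else "not validated"
        return f"Observed lift {lift:.1%} vs target {self.target_lift:.1%}: {verdict}"

checkout = Hypothesis(
    change="Reduce checkout steps from 5 to 3",
    expected_outcome="15% increase in successful checkouts",
    rationale="A streamlined process reduces friction for users",
    target_lift=0.15,
)

# Hypothetical numbers: 40% checkout completion before, 47% after the change.
print(checkout.evaluate(baseline=0.40, observed=0.47))
```

Writing hypotheses down this way keeps the BECAUSE clause visible next to the numbers, so the rationale gets reviewed along with the result.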
Cohorting and data segmentation
Data analysis should begin by listing the questions you aim to answer before diving into cohorting and segmentation. Neglecting this step can lead to confirmation bias or to false correlations, since correlation does not imply causation. For instance, we once observed that users who connected their accounts via Facebook had higher engagement than those who didn't. But was the act of connecting the driver of engagement, or were users who connected already more invested in the game? Relying on that one metric alone couldn't answer the question. To guide your segmentation, here are some effective and not-so-effective cohorts I've employed:
Good Cohorts:
Sign-Up Date: Group users by the date or week they signed up. This cohort helps you understand how changes or updates to onboarding impact user retention or engagement over time (see the sketch after this list).
Feature Usage: Group users based on their interaction with a specific feature. This cohort can help determine if a feature increases long-term user engagement or if users lose interest after initial use.
Acquisition Channel: Group users based on where they came from (e.g., Facebook ads, Google search, email campaigns). This helps in understanding which channels produce the most engaged or valuable users.
Pricing Plans: For products with multiple pricing tiers, group users based on the plan they've chosen. This can help in understanding usage patterns and value perception among different pricing tiers.
Event-Based: Group users who participated in a special event or promotion. This helps in measuring the long-term impact of such events on user behavior.
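As a concrete illustration of the sign-up-date cohort, here is a minimal pandas sketch that groups users by sign-up week and computes week-over-week retention. The tiny inline event log and its column names (`user_id`, `signup_date`, `event_date`) are assumptions standing in for your own data:

```python
import pandas as pd

# Assumed event log: one row per user activity event.
events = pd.DataFrame({
    "user_id":     [1, 1, 2, 2, 3, 3, 3],
    "signup_date": pd.to_datetime(["2023-01-02"] * 2 + ["2023-01-03"] * 2 + ["2023-01-10"] * 3),
    "event_date":  pd.to_datetime(["2023-01-02", "2023-01-09", "2023-01-03",
                                   "2023-01-04", "2023-01-10", "2023-01-17", "2023-01-24"]),
})

# Cohort = the week a user signed up; age = whole weeks since signup.
events["cohort_week"] = events["signup_date"].dt.to_period("W")
events["weeks_since_signup"] = (events["event_date"] - events["signup_date"]).dt.days // 7

# Count distinct active users per cohort per week of age...
retention = (
    events.groupby(["cohort_week", "weeks_since_signup"])["user_id"]
    .nunique()
    .unstack(fill_value=0)
)

# ...and normalize by each cohort's starting size to get retention rates.
print(retention.div(retention[0], axis=0))
```

Each row of the output is a cohort; reading across a row shows how that cohort's activity decays, and comparing rows shows whether newer cohorts retain better than older ones.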
Bad Cohorts:
Randomly Assigned: Grouping users randomly doesn't provide any meaningful insights as there's no shared characteristic or experience among the users in the cohort.
Too Broad: Grouping users as "All Mobile Users" or "All Desktop Users" without further segmentation might be too general to provide actionable insights, especially if you have a diverse user base with different behaviors on each platform.
Time Zone without Context: Grouping users solely by time zone can be pointless unless combined with other factors, like usage patterns at specific hours, which might be crucial for a real-time multiplayer game.
Non-Actionable Traits: Grouping users based on attributes that you can't take action upon, like "Users who like blue," unless blue has a direct relation to a product feature or offering.
Overlapping Cohorts: Grouping users in a way that they fit into multiple cohorts for the same analysis can dilute insights. For example, if you're looking at the effect of a new tutorial, don't group by both "saw tutorial" and "signed up in the last week" if the tutorial was only shown to those who signed up in the last week.
Sample size and statistical significance
Sample size and statistical significance are recurring topics for me, especially in startups or smaller companies with limited user bases. This circles back to the importance of hypothesis testing. Startups should prioritize impactful hypotheses since their outcomes can significantly direct the company's path. While larger companies might face optimization challenges, their expansive datasets and sample sizes generally ease these concerns. Regardless of company size, product managers should focus on identifying major opportunities, rather than getting bogged down in minutiae. Test ambitious objectives at any scale. Diverging slightly, here’s a brief guide on statistical significance:
Statistical Significance:
This is a way to measure whether an outcome (like positive feedback on your new feature) reflects a real effect (the feature is genuinely good) or just happened by chance.
Think of it as flipping a coin. If you get heads 6 times out of 10, you might wonder if the coin is biased. But if you get heads 6,000 times out of 10,000, you'd be pretty sure something's up.
In product terms, if 6 out of 10 users love your new feature, that's promising but not conclusive. If 6,000 out of 10,000 love it, you're on to something.
Why it matters: As a product manager, you make decisions based on data. Statistical significance helps ensure you're making decisions based on real effects, not random chance. Before you roll out a feature to everyone, you want to be sure the positive feedback you've received isn't just a fluke.
There are a few free calculators out there, or you can build your own with a handful of formulas. Some tools, like Optimizely, have significance testing built in.
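If you do want to roll your own, the coin-flip example above takes only a few lines. A minimal sketch using scipy, assuming a null hypothesis that the true rate is 50%:

```python
from scipy.stats import binomtest

# Null hypothesis: the true rate is 50% (a fair coin / indifferent users).
for successes, trials in [(6, 10), (6000, 10000)]:
    result = binomtest(successes, trials, p=0.5)
    print(f"{successes}/{trials} -> p-value = {result.pvalue:.4g}")
```

The 6-out-of-10 case yields a p-value around 0.75, meaning the result is easily explained by chance; the 6,000-out-of-10,000 case yields a p-value that is effectively zero, which matches the intuition described above.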
Data quality and data definition
Data quality refers to the accuracy and reliability of the ingested data. Low-quality data might include internal user interactions, duplicate or erroneous entries, or issues arising from system integrations. Many problems can degrade accuracy, from pipelines fragmenting where systems hand data to one another to regional privacy regulations limiting what can be collected. Recognizing and annotating these discrepancies is crucial.
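Some of these checks are easy to automate. Here is a minimal pandas sketch; the inline data, column names, and internal email domain are placeholder assumptions:

```python
import pandas as pd

# Assumed raw event export; columns and values are placeholders.
events = pd.DataFrame({
    "user_id":   [1, 1, 2, 3],
    "email":     ["a@gmail.com", "a@gmail.com", "qa@yourcompany.com", "b@yahoo.com"],
    "timestamp": ["2023-05-01 10:00", "2023-05-01 10:00", "2023-05-02 11:30", "not-a-date"],
})

# Drop exact duplicate rows (e.g., from a pipeline replaying events).
events = events.drop_duplicates()

# Exclude internal users so employee testing doesn't inflate metrics.
events = events[~events["email"].str.endswith("@yourcompany.com", na=False)]

# Flag erroneous entries rather than silently dropping them.
parsed = pd.to_datetime(events["timestamp"], errors="coerce")
print(f"{parsed.isna().sum()} rows with missing or unparseable timestamps")
```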
Data definition, on the other hand, centers on the calculations behind reports, the date ranges used in analyses, and the events that trigger data collection. Ambiguous definitions can yield misleading results, while clear metric definitions help analysts generate accurate reports. This process also ties back to the questions you're seeking answers to, guiding analysts on what data to retrieve, since there are usually several valid ways to compute the same metric.
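To see how much definitions matter, consider "monthly active users": counting any event versus counting only core actions gives very different numbers for the same raw data. A minimal sketch with an assumed, hand-rolled events table:

```python
import pandas as pd

# Assumed events for one month; event names are placeholders.
events = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 4],
    "event":   ["app_open", "purchase", "app_open",
                "app_open", "level_complete", "app_open"],
})

# Definition A: "active" = triggered any event at all this month.
mau_any_event = events["user_id"].nunique()

# Definition B: "active" = performed a core action, not just opening the app.
core_actions = {"purchase", "level_complete"}
mau_core = events.loc[events["event"].isin(core_actions), "user_id"].nunique()

print(f"MAU (any event): {mau_any_event}")    # 4
print(f"MAU (core action): {mau_core}")       # 2
```

Neither definition is wrong; what matters is that everyone producing or reading the report agrees on which one is in use.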