What is the "A/B Test Design Checklist Before You Hit Run" prompt?

This prompt asks the model to review an A/B test design before launch and flag the mistakes that usually surface afterward. Its core advice: write down what 'success' means BEFORE you launch. Not 'p < 0.05', but the actual business-level decision rule, for example: 'If primary lifts by ≥X% with no guardrail break, we ship.' Pre-committed rules survive the temptation to peek. The prompt targets ChatGPT (GPT-4) and lives in the Analysis & Research category on mycopyprompt.
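
A pre-committed rule of this kind can be written down as a small function before launch, so the decision is mechanical once results arrive. This is a minimal sketch in Python. The function name, the percentage inputs, and the wording of the non-ship outcomes are illustrative assumptions, not part of the prompt.

```python
def ship_decision(primary_lift_pct, min_lift_pct, guardrails_broken):
    """Apply a decision rule that was fixed before the test launched.

    primary_lift_pct  -- measured relative lift on the primary metric, in percent
    min_lift_pct      -- the pre-committed threshold X, in percent
    guardrails_broken -- names of guardrail metrics that got worse (empty if none)
    """
    if guardrails_broken:
        # Assumption: any guardrail break blocks shipping, whatever the lift.
        return "do not ship: guardrail break (" + ", ".join(guardrails_broken) + ")"
    if primary_lift_pct >= min_lift_pct:
        return "ship"
    return "do not ship: lift below threshold"


print(ship_decision(4.0, 3.0, []))               # lift clears the bar, no breaks
print(ship_decision(4.0, 3.0, ["refund rate"]))  # lift clears the bar, guardrail broke
```

The sketch deliberately leaves out the significance test itself; it only encodes the business rule that sits on top of it.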

What AI model is this prompt for?

This prompt is written for ChatGPT (GPT-4). It's a text/chat prompt: paste it into ChatGPT (or a compatible LLM such as Claude) to get the expected output.

How do I use this prompt?

1. Click the Copy button on this page to copy the full prompt.
2. Open ChatGPT (GPT-4).
3. Paste the prompt into a new conversation.
4. Replace any {placeholders} with your specifics, then send.

Most prompts produce the right output on the first try; complex ones may need one or two iterations.

Is this prompt free to use?

Yes. Every prompt on mycopyprompt is free forever, with no paywall and no signup wall for browsing or copying. You can use it for personal or commercial work; just don't redistribute the entire mycopyprompt library.

Can I modify the prompt?

Absolutely — most prompts are templates. Look for {placeholders} (curly braces) and swap them with your own values. You can also reword sections, add constraints, or chain it with other prompts.

What kind of output does this produce?

See the "Sample output" section below: it is a real example of what ChatGPT (GPT-4) returns when this prompt runs. Your output will vary in wording but should follow the same structure and depth.

A/B Test Design Checklist Before You Hit Run — ChatGPT (GPT-4) prompt

The Prompt (1,662 chars)

Review my A/B test design before I run it. Catch the mistakes that show up post-launch.

HYPOTHESIS in 1 sentence: {if_X_then_Y_because_Z}
CONTROL (current experience): {what_users_see_today}
VARIANT (proposed change): {what_you're_testing}
PRIMARY METRIC: {one metric — exact name + how it's calculated}
SECONDARY METRICS to monitor: {2-4 metrics}
GUARDRAIL METRICS (must not get worse): {2-3 metrics — e.g. revenue, complaint rate, conversion downstream}
WHO IS IN THE TEST: {user_segment + traffic_volume + how_many_days_to_significance}
MINIMUM DETECTABLE EFFECT (MDE): {percent + power + significance level}
DURATION: {planned_days}
TREATMENT ASSIGNMENT: {how_users_get_bucketed}
KNOWN CONFOUNDS: {seasonality, marketing campaigns, releases overlapping}

FOR EACH OF THESE, CALL OUT WHETHER MY DESIGN IS SOLID OR HAS A FLAW:
1. **Hypothesis quality** — is it falsifiable, specific, mechanism-stated?
2. **Metric choice** — is the primary metric the right one? (Common bug: clicking 'CTR' instead of downstream conversion.)
3. **Sample size + power** — calculate if my MDE is realistic for my traffic + duration.
4. **Assignment unit** — should it be user, session, account? (Common bug: session-level when behavior is user-level.)
5. **Randomization** — any selection bias I might miss?
6. **Novelty / primacy effects** — duration handle these or not?
7. **Guardrails** — are mine well chosen?
8. **What's the analysis plan** before launch — what do I do if primary moves but a guardrail breaks?
9. **The 'why didn't this work' hypothesis** — what's the null outcome's most likely explanation?

End with a single recommendation: ship-as-is, modify, or kill.

Sample output

Hypothesis: If we move the 'Add to Cart' button above the fold on PDP, then conversion-to-purchase increases, because users on mobile currently scroll past the button.
Control: button below product description (~600px down on mobile).
Variant: sticky 'Add to Cart' at top of viewport.
Primary metric: PDP-to-purchase conversion.
Secondary: time-on-page, scroll depth, search exit rate.
Guardrails: revenue per visitor (RPV), refund rate (30-day).
Users: US mobile traffic only, 50/50 split.
MDE: 3% relative lift.
Duration: 14 days.
Assignment: user-level, cookie-based.
Confounds: Black Friday is 9 days away.

1. Hypothesis quality: GOOD with one flag. Falsifiable ✓, specific ✓, mechanism stated ✓. The hidden assumption is 'users don't see the button → they don't add to cart.' Worth confirming with heatmap data first. If the button is visible but users still scroll past for other reasons (price, reviews, comparing), the variant won't help.

2. Metric choice: GOOD but check counting. PDP-to-purchase is the right outcome (not 'add-to-cart-rate' alone — that's the easy trap). Verify your conversion is measured per-USER, not per-PDP-view. A user who views the same PDP 4 times pre-purchase shouldn't count 4 conversions.
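
The per-user versus per-view distinction in item 2 can be illustrated with a few lines of Python. This is a minimal sketch under assumed names: the `(user_id, event)` tuple layout and the event labels `"pdp_view"` and `"purchase"` are hypothetical, not taken from any particular analytics tool.

```python
def conversion_rates(events):
    """Return (per_view_rate, per_user_rate) for a list of (user_id, event) tuples."""
    views = 0
    purchases = 0
    viewers = set()  # distinct users who saw the PDP at least once
    buyers = set()   # distinct users who purchased at least once
    for user_id, event in events:
        if event == "pdp_view":
            views += 1
            viewers.add(user_id)
        elif event == "purchase":
            purchases += 1
            buyers.add(user_id)
    if not viewers:
        return 0.0, 0.0
    per_view = purchases / views
    # Count each user once, however many times they viewed the page.
    per_user = len(buyers & viewers) / len(viewers)
    return per_view, per_user


# One user views the PDP four times and buys once; a second user views once and leaves.
events = [("u1", "pdp_view")] * 4 + [("u1", "purchase"), ("u2", "pdp_view")]
per_view, per_user = conversion_rates(events)
print(per_view, per_user)  # 0.2 0.5
```

The same purchase yields a 20% rate per view and a 50% rate per user, so the two definitions should not be mixed between arms or between the test and the historical baseline.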

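3. Sample size + power: REVISIT. With a 3% relative MDE, 80% power, and a 5% significance level (95% confidence), on a base conversion of ~3% (typical PDP→purchase), the standard two-proportion approximation gives roughly 570,000 users per arm. If your US mobile traffic is 5,000 PDP-viewers/day per arm, 14 days yields only about 70,000 per arm, which is far too short; at that volume the test can detect a relative lift of roughly 9% at best, and that still assumes zero hiccups.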
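
The per-arm sample size in item 3 can be sanity-checked with the standard normal-approximation formula for comparing two proportions. This is a minimal sketch, assuming a two-sided test at α = 0.05, 80% power, and the ~3% baseline quoted above. The function name `users_per_arm` is hypothetical, and this is not necessarily the calculation the model performed.

```python
import math
from statistics import NormalDist


def users_per_arm(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per arm to detect a relative lift in a conversion rate.

    Uses n = (z_{1-alpha/2} + z_{power})^2 * [p1(1-p1) + p2(1-p2)] / (p2 - p1)^2
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


print(users_per_arm(0.03, 0.03))  # 3% relative lift on a 3% baseline
print(users_per_arm(0.03, 0.10))  # 10% relative lift on a 3% baseline
```

Under these assumptions a 3% relative lift on a 3% baseline needs on the order of 570,000 users per arm, while about 53,000 per arm is enough only for a lift near 10%. At 5,000 users per day per arm, 14 days gives 70,000 per arm.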

Parameters

Max Tokens: 3000
Temperature: 0.4

Text generation parameters
