Review my A/B test design before I run it. Catch the mistakes that show up post-launch.
HYPOTHESIS in 1 sentence: {if_X_then_Y_because_Z}
CONTROL (current experience): {what_users_see_today}
VARIANT (proposed change): {what_you're_testing}
PRIMARY METRIC: {one metric — exact name + how it's calculated}
SECONDARY METRICS to monitor: {2-4 metrics}
GUARDRAIL METRICS (must not get worse): {2-3 metrics — e.g. revenue, complaint rate, conversion downstream}
WHO IS IN THE TEST: {user_segment + traffic_volume + how_many_days_to_significance}
MINIMUM DETECTABLE EFFECT (MDE): {percent + power + significance level}
DURATION: {planned_days}
TREATMENT ASSIGNMENT: {how_users_get_bucketed}
KNOWN CONFOUNDS: {seasonality, marketing campaigns, releases overlapping}
FOR EACH OF THESE, CALL OUT WHETHER MY DESIGN IS SOLID OR HAS A FLAW:
1. **Hypothesis quality** — is it falsifiable, specific, mechanism-stated?
2. **Metric choice** — is the primary metric the right one? (Common bug: clicking 'CTR' instead of downstream conversion.)
3. **Sample size + power** — calculate if my MDE is realistic for my traffic + duration.
4. **Assignment unit** — should it be user, session, account? (Common bug: session-level when behavior is user-level.)
5. **Randomization** — any selection bias I might miss?
6. **Novelty / primacy effects** — duration handle these or not?
7. **Guardrails** — are mine well chosen?
8. **What's the analysis plan** before launch — what do I do if primary moves but a guardrail breaks?
9. **The 'why didn't this work' hypothesis** — what's the null outcome's most likely explanation?
End with a single recommendation: ship-as-is, modify, or kill.0 copies·0 saves·5 views
240 words·1,662 chars
Sample output
Hypothesis: If we move the 'Add to Cart' button above the fold on PDP, then conversion-to-purchase increases, because users on mobile currently scroll past the button. Control: button below product description (~600px down on mobile). Variant: sticky 'Add to Cart' at top of viewport. Primary metric: PDP-to-purchase conversion. Secondary: time-on-page, scroll depth, search exit rate. Guardrails: revenue per visitor (RPV), refund rate (30-day). Users: US mobile traffic only, 50/50 split. MDE: 3% relative lift. Duration: 14 days. Assignment: user-level, cookie-based. Confounds: Black Friday is 9 days away.
1. Hypothesis quality: GOOD with one flag. Falsifiable ✓, specific ✓, mechanism stated ✓. The hidden assumption is 'users don't see the button → they don't add to cart.' Worth confirming with heatmap data first. If the button is visible but users still scroll past for other reasons (price, reviews, comparing), the variant won't help.
2. Metric choice: GOOD but check counting. PDP-to-purchase is the right outcome (not 'add-to-cart-rate' alone — that's the easy trap). Verify your conversion is measured per-USER, not per-PDP-view. A user who views the same PDP 4 times pre-purchase shouldn't count 4 conversions.
3. Sample size + power: REVISIT. With 3% MDE, 80% power, 95% significance, on a base conversion of ~3% (typical PDP→purchase), you need roughly 50,000 users per arm. If your US mobile traffic is 5,000 PDP-viewers/day per arm, 14 days is too short (you'd need ~10 days just for sample, and 14 days assumes zero hiccups).
Parameters
Text generation parameters
FAQ
Write down what 'success' means BEFORE you launch. Not 'p < 0.05' — the actual business-level decision rule. 'If primary lifts by ≥X% with no guardrail break, we ship.' Pre-committed rules survive the temptation to peek. The prompt targets ChatGPT (GPT-4) and lives in the Analysis & Research category on mycopyprompt.
This prompt is written for ChatGPT (GPT-4). It's a text/chat prompt — paste it into ChatGPT (GPT-4) (or compatible LLMs like Claude or GPT-4) to get the expected output.
1. Click the Copy button on this page to copy the full prompt. 2. Open ChatGPT (GPT-4). 3. Paste the prompt into a new conversation. 4. Replace any {placeholders} with your specifics, then send. Most prompts produce the right output on the first try; complex ones may need 1-2 iterations.
Yes — every prompt on mycopyprompt is free forever. No paywall, no signup wall for browsing or copying. You can use it for personal or commercial work, just don't redistribute the entire mycopyprompt library.
Absolutely — most prompts are templates. Look for {placeholders} (curly braces) and swap them with your own values. You can also reword sections, add constraints, or chain it with other prompts.
See the "Sample output" panel above — that's a real example of what ChatGPT (GPT-4) returns when this prompt runs. Your output will vary in wording but should follow the same structure and depth.
Similar prompts
Matched against every prompt in the catalog by title.
4. Assignment unit: USER-LEVEL is correct here. Good. Session-level would have leaked users between arms across sessions.
5. Randomization: FLAG. 'US mobile traffic only' — confirm randomization is independent of the segment-filtering. If you filter US-mobile after bucketing, you're fine. If you bucket only inside US-mobile, also fine. But if your bucketing function uses a hash of device characteristics and somehow correlates with iOS vs Android (a real bug we've seen), the arms become skewed by OS. Run a Sample Ratio Mismatch (SRM) check after launch: expected 50/50, anything off by more than 0.5% is suspicious.
6. Novelty / primacy: MEDIUM RISK. 14 days is short to dissipate novelty effects on a UI change. Power users may engage more with anything new in week 1, hiding the effect. Plan to ignore the first 3 days of results and analyze days 4-21.
7. Guardrails: GOOD but one missing. RPV ✓ catches 'add-to-cart inflation without revenue lift' (the classic 'we got more clicks but smaller baskets'). Refund rate ✓ catches accidental purchases. Missing: 30-day repurchase rate. If sticky button leads to impulse buys, 30-day repurchase / churn may worsen even if first-touch revenue is up.
8. Analysis plan if primary moves but guardrail breaks: Don't just look at p-values. Ask: how big is the guardrail movement relative to the primary win? Win 3% conversion, lose 1% RPV → conversion gains were on lower-AOV products → ship-with-modification (maybe restrict sticky button to high-AOV PDPs). Win 3% conversion, lose 8% RPV → kill.
9. The 'why didn't this work' hypothesis: Most likely null explanation: users who don't see the button now ALSO don't see other items they'd ultimately convert on. Moving the button up makes the first PDP convert better but cannibalizes subsequent PDPs. To check this, segment by sessions with 1 PDP view vs sessions with 3+ PDP views.
10. Confounds — BLACK FRIDAY: This is your biggest issue. Don't run an A/B test in the 14 days before Black Friday. Conversion baselines are abnormal; user mix shifts; results won't generalize.
Recommendation: MODIFY before shipping.