I inherited a messy spreadsheet. Help me clean it up systematically — not just 'remove duplicates'.
WHAT IT IS: {sales_log / contact_list / inventory / financial_records / survey_responses}
ROW COUNT: {approx}
COLUMN COUNT + WHAT THEY ARE: {paste_headers}
WHO MAINTAINED IT: {one_person_with_their_own_system / multiple_people / unknown}
KNOWN PROBLEMS (your top 5): {paste}
GOAL — what you'll do with it after: {report / dashboard / migrate_to_a_real_system / hand_off}
TIME I HAVE: {30min / 2_hours / a_full_day}
DELIVER a step-by-step cleanup plan:
1. **Triage** — what to look at first. 3 columns / patterns that hide the worst data quality issues.
2. **The 'unmix' step** — separating data types that have been crammed into one column (e.g. 'John Smith, Acme, NYC' in one cell).
3. **The 'normalize' step** — making variations of the same thing match (e.g. 'NYC' / 'New York' / 'new york').
4. **The 'deduplicate' step** — but smarter than removing exact duplicates (handles 'similar but not identical' rows).
5. **The 'audit' step** — formulas / scripts to flag remaining anomalies (impossible dates, outlier values, broken email formats).
6. **The 'lock' step** — preventing the mess from coming back (data validation, named ranges, freeze headers).
7. **A 1-line summary** of what the cleaned sheet should look like, so you can sanity-check.0 copies·0 saves·1 views
211 words·1,351 chars
Sample output
What it is: sales contact list, 4,200 rows, columns are: Name (sometimes 'firstName lastName', sometimes 'lastName, firstName'), Company, Email (some have multiple separated by ';'), Phone (some have country code, some don't), Last Contact (mixed text + dates + 'never' string), Notes (free text). Multiple people maintained it. Top issues: duplicate contacts, inconsistent company names, unparseable dates, multiple emails in one cell. Goal: migrate to HubSpot. Time: 2 hours.
1. Triage — look at these 3 columns first:
Open each as a Pivot Table grouped by themselves — instantly shows you how many variants exist.
2. Unmix step:
Name column: split using =TEXTSPLIT(A2, " ") if you know it's first-last consistently. If mixed format ('Smith, John' vs 'John Smith'), use this:
Parameters
Text generation parameters
FAQ
Always work on a COPY. The amount of irreversible damage you can do with 'remove duplicates' on the original is shocking. Copy first, then clean. The prompt targets ChatGPT (GPT-4) and lives in the Coding & Development category on mycopyprompt.
This prompt is written for ChatGPT (GPT-4). It's a text/chat prompt — paste it into ChatGPT (GPT-4) (or compatible LLMs like Claude or GPT-4) to get the expected output.
1. Click the Copy button on this page to copy the full prompt. 2. Open ChatGPT (GPT-4). 3. Paste the prompt into a new conversation. 4. Replace any {placeholders} with your specifics, then send. Most prompts produce the right output on the first try; complex ones may need 1-2 iterations.
Yes — every prompt on mycopyprompt is free forever. No paywall, no signup wall for browsing or copying. You can use it for personal or commercial work, just don't redistribute the entire mycopyprompt library.
Absolutely — most prompts are templates. Look for {placeholders} (curly braces) and swap them with your own values. You can also reword sections, add constraints, or chain it with other prompts.
See the "Sample output" panel above — that's a real example of what ChatGPT (GPT-4) returns when this prompt runs. Your output will vary in wording but should follow the same structure and depth.
Similar prompts
Matched against every prompt in the catalog by title.
=IF(ISNUMBER(SEARCH(",", A2)), TRIM(MID(A2, SEARCH(",", A2)+1, 100)) & " " & TRIM(LEFT(A2, SEARCH(",", A2)-1)), A2)
(Normalizes 'Smith, John' → 'John Smith'.)
Email column with multiple values: split into rows, not columns. Use Sheets =TEXTSPLIT(C2, ";") or 'Power Query > Split Column > By delimiter > Then > Unpivot.' Now one contact with 3 emails becomes 3 rows; deduplicate later.
Phone: pull out only digits with =REGEXREPLACE(D2, "[^0-9]", ""). If length is 10 → US format, prepend +1. If 11 starting with 1 → US. If 11+ → international, leave with original country code.
3. Normalize step:
Company names:
=LOWER(TRIM(B2))Cities / States in any free-text columns: same approach.
Last Contact dates:
=DATEVALUE(E2). Wrap in IFERROR(..., E2) so unparseable ones stay readable.4. Deduplicate (smart):
Exact-duplicate Remove Duplicates misses 'John Smith / Acme / johnsmith@acme.com' and 'John smith / acme / JOHNSMITH@ACME.COM'.
Build a match_key column: =LOWER(TRIM(A2)) & "|" & LOWER(TRIM(C2_first_email_only))
Then group / dedupe by match_key. Keep the row with the most-recent 'Last Contact'.
For near-duplicates (typos), use a fuzzy match Add-on or Python's rapidfuzz library if you can export → reimport.
5. Audit anomalies:
=E2 > TODAY() flag column.=ISNUMBER(SEARCH("@", C2)) * (LEN(C2) - LEN(SUBSTITUTE(C2, "@", ""))) = 1 (returns TRUE for exactly one '@').=NOT(ISNUMBER(SEARCH(".", MID(C2, SEARCH("@", C2), 100))))For each anomaly, color the row a different color in conditional formatting. Now visual scan = audit.
6. Lock step (so it doesn't come back):
_changelog where future editors record what they changed and when.7. One-line summary of clean state: 'Each row is one person, one company, one valid email, one normalized phone, one valid Last Contact date or a 'never' flag. No duplicates. Future edits flagged and visible.'
If your cleaned sheet doesn't match this sentence, you're not done.