Help me draft a PhD research statement (2 pages, ~1000 words) that sounds like a real researcher's, not a hopeful undergrad's.
FIELD + SUBFIELD: {e.g. CS / NLP / interpretability}
MY BACKGROUND: {undergrad + any RA experience + papers if any}
A SPECIFIC RESEARCH PROBLEM I'm interested in (not a topic, a problem): {paste}
WHY THIS PROBLEM MATTERS — not 'AI is important' — what's stuck in the field: {real_constraint_or_unsolved_question}
MY ANGLE — what I'd contribute that others aren't already trying: {your_unique_take}
2-3 PROFESSORS I'd want to work with + WHY (specific to their work): {names + papers}
METHODS I'm comfortable with: {experimental / theoretical / formal / empirical}
WHAT I'VE ALREADY DONE in this area: {coursework / RA / paper / side_project}
TARGET PROGRAMS: {list}
FORMAT:
1. **Opening paragraph** — a specific observation from a paper or a real failure mode you've noticed, NOT 'I have always been passionate about AI.'
2. **The problem** — 1-2 paragraphs naming a problem the field is trying to solve, what's been tried, why it's still open.
3. **My angle** — 1 paragraph on the specific approach you'd take and why it's different.
4. **Past work** — 1 paragraph tying your past work to this trajectory. Honest, not inflated.
5. **Fit** — 1 paragraph naming 2-3 professors by name + their work + how you'd extend or contrast.
6. **Closing** — 1 paragraph on the kind of researcher you want to become. Quiet, not grandiose.
NO 'I am excited to apply.' NO 'making the world a better place.' This is for a researcher reading 80 of these.
Sample output
Field: CS / interpretability / mech interp. Background: undergrad at UT, 2 years as an RA in Prof. Liu's lab, 1 first-author paper at the NeurIPS Interpretability workshop on attention-head ablation in vision transformers. Problem: current interpretability tools find features in trained models, but we still can't reliably predict which features will form for a given architecture + dataset. My angle: combine training-dynamics probes with circuit-tracing. Researchers: Chris Olah's work at Anthropic; Anna Yablonsky's work at MIT on developmental interpretability. Methods: experimental + a bit of theoretical. Past: the workshop paper + a 10K-line library for attention probing.
Statement (~1000 words):
In 2023, I helped my advisor's group ablate attention heads in a fine-tuned ViT and measured the drop in classification accuracy. The results matched the literature: most heads can be removed with little loss, while a small set is critical. The follow-up question I couldn't shake was harder. We knew which heads mattered, but we couldn't say why those particular heads, and not others structurally identical to them, had ended up doing the work.
That gap between post-hoc identification and prospective prediction of features is the core of why I want to do a PhD in mechanistic interpretability.
The field has matured along one axis: we are now reasonably good at finding interpretable structures in trained models. Sparse-autoencoder feature catalogs, circuit-tracing, and probing methods have made the inside of large language and vision models meaningfully less opaque than they were three years ago. But the question of why a given training run produces these features and not others remains largely unaddressed. This matters for two reasons. First, alignment-relevant features (deception circuits, sycophancy heads, refusal directions) are exactly the ones we'd most want to predict from training conditions, not just discover after the fact. Second, the science of interpretability is uncomfortably correlational: we describe what is, not what causes it.
My interest is in developmental interpretability: studying the dynamics of feature formation during training, with the goal of building predictive theories of which features form, in which order, and under what data and architectural conditions.
The approach I would take builds on three threads that are currently mostly separate. First, training-dynamics probes, which measure how internal representations evolve checkpoint by checkpoint instead of inspecting only final models. Second, circuit-tracing tools, which today operate on final-state models but can be extended to track circuit emergence. Third, controlled architectural and data interventions: varying one variable at a time while holding the others fixed, and observing how feature formation shifts. Combining these would let us test causal hypotheses about feature emergence rather than only describe finished models.
My past work points in this direction, even though I didn't have the language for it at the time. The workshop paper I led at NeurIPS 2024 ('Critical Heads in Fine-Tuned ViTs: When Does a Head Become Necessary?') found that the critical-head distribution was largely set within the first 8% of fine-tuning epochs and stable thereafter. That suggests what we observe in final models is shaped early and could, in principle, be intervened on. The probing library I built for that paper (now used by three other groups in Prof. Liu's lab) is the kind of tool I'd want to develop further at doctoral scale: instrumentation for studying training dynamics, not just trained models.
The groups I would most want to work with on this are Chris Olah's at Anthropic and Prof. Anna Yablonsky's at MIT. Olah's recent work on sparse-autoencoder features at scale has produced the cleanest feature catalogs we have, but explicitly leaves the question of when these features form to future work. Yablonsky's developmental interpretability program is the most direct ancestor of what I want to do, and the open question her 2024 paper closes on ('are feature formation trajectories continuous, or are there genuine phase transitions?') is exactly the question I'd want my first project to answer for a specific architecture class. I would also be interested in working with Prof. Singh on the theoretical side; his loss-landscape work has implications for feature emergence that the empirical interpretability community has barely engaged with.
My goal as a researcher is to do work that is useful to two audiences at once: alignment researchers who need predictive interpretability for safety, and theoretical ML researchers who care about why training works at all. Those audiences don't currently talk much. The PhD I want to do sits in the space between them, empirical enough to be testable and theoretical enough to be predictive, and I'd like to write papers that both groups feel they need to cite.
I'm under no illusion that this problem can be solved in five years. The trajectory I'm describing would probably take a decade of careful work, of which a PhD is the first chapter. What I'd hope to leave with after five years is one clean predictive result about feature formation in a constrained setting, the tooling to do that work at the next scale, and the methodological habits of someone who knows the difference between a story they like and a result that holds up.
What to revise in your version:
Have your statement read by one current PhD student in the field you're applying to. If they can't tell which specific subfield you're in by paragraph 2, the problem statement is too broad.