Building a Legal Intake Pipeline with GPT-4o

Every solo attorney I've talked to has the same bottleneck: the first 15 minutes after a new client call is just organizing what you were told. Who's involved, what happened, what documents exist, what's still missing. The actual legal work hasn't started yet — you're just trying to turn a paragraph of phone notes into something you can work from.

I built a two-step pipeline that does this automatically. You paste the notes in, it comes back with a structured matter summary and a list of what's still missing. The whole thing runs in under 15 seconds.

Here's how it works.

The pipeline

Two GPT-4o API calls, chained together.

Intake pipeline flow diagram

Step 1 reads the raw intake notes and extracts every identifiable fact: parties, incident date, injuries, damages, documents mentioned, deadlines, anything unclear.

Step 2 takes those extracted facts and maps them to a consistent JSON schema — client, opponent, practice area, liability assessment, special damages, SOL deadline. It also cross-checks the output against a required-fields checklist for the practice area and returns anything missing as a flag list.

That's it. The first call is broad extraction. The second call is structure and validation. Keeping them separate matters — if you try to do both in one prompt, the model rushes the extraction to get to the formatting, and you lose edge cases.

Step 1: Extract facts

The extraction prompt is intentionally open-ended. You want it to surface everything, including things you didn't think to ask for in the intake call.

Step 1 — Extract Prompt

You are a legal intake assistant. Extract every identifiable fact from the following client intake notes.

Return a structured list covering:
- All parties (client, opposing party, witnesses, employers, insurers if mentioned)
- All dates (incident, treatment start/end, any deadlines mentioned by client)
- All injuries or damages described, with any dollar amounts
- Documents the client says they have or have been given
- Any time-sensitive information (statutes of limitations, court dates, response deadlines)
- Anything that seems unclear, contradictory, or that the client was unsure about

Be thorough. Do not infer or assume facts not present in the notes.

Intake notes:
{intake_notes}

The key instruction is the last line: do not infer facts not present in the notes. Without that, the model will fill gaps with plausible-sounding assumptions, which is exactly what you don't want in a legal context.

Step 2: Structure the matter

The second prompt takes the extracted facts and maps them to a schema. It also handles the missing-info check — comparing what's present against what a PI attorney would need before drafting a demand letter.

Step 2 — Structure Prompt

Given the following extracted facts from a legal intake, produce a JSON object with these exact fields:

{
  "client": string,
  "opponent": string,
  "incident_date": "YYYY-MM-DD" or null,
  "practice_area": string,
  "liability_assessment": "Strong" | "Moderate" | "Unclear" | "Weak",
  "medical_costs": number or null,
  "lost_wages": number or null,
  "total_specials": number or null,
  "sol_deadline": "YYYY-MM-DD" or null,
  "missing_info": string[]
}

Rules:
- null if not determinable from facts
- missing_info: everything a PI attorney needs before drafting a demand letter that is absent
- liability_assessment rules: Strong if opposing party was cited, ran a light/stop sign, or witnesses confirm fault; Moderate if disputed or shared; Unclear if genuinely insufficient facts; Weak if client may be at fault
- For sol_deadline: if the state is mentioned and you know the applicable PI statute of limitations, calculate from incident_date — otherwise set null and flag in missing_info. Always flag sol_deadline in missing_info as requiring attorney verification regardless
- Do not invent numbers — only use figures explicitly stated
- Return only valid JSON, no explanation

Extracted facts:
{extracted_facts}

The schema is strict on purpose. Downstream code — whether that's writing to a database or pre-filling a matter management template — needs a predictable shape. Letting the model choose its own structure breaks everything downstream.

Try it

The demo below runs the full pipeline on a mock Torres v. Midland Trucking intake call. Hit Run Pipeline to watch it step through each stage.

▶ Live Pipeline DemoMock data — no API calls

1 Parse client notes

2 Extract key facts

3 Structure matter data

4 Flag missing info

Client Notes Input

Edit the notes and re-run — the structure of what comes back stays consistent even when the input changes. That consistency is what makes it useful: you can build on top of the output reliably.

What it catches that you might miss

The most valuable part isn't the extraction — it's the missing info flags. In a real intake call, you're also managing a person who's stressed and often telling you things out of order. It's easy to forget to ask about insurance limits, employer verification, or whether they've seen other doctors.

The pipeline doesn't forget. If a field isn't in the notes, it flags it. Over a few weeks of using it, I found three common gaps that consistently showed up:

Insurance policy limits — clients rarely know this and don't think to mention it
Prior injuries — relevant for damages disputes, almost never volunteered
Exact treatment end date — matters for calculating future medicals, easy to omit

The flag list becomes a natural follow-up call checklist. You send one email, get the missing pieces, re-run the pipeline, and you have a complete matter record before you've drafted anything.

Limitations to know about

This pipeline works well for straightforward PI and intake-heavy practice areas. It struggles with cases where the facts are legally complex in ways that require judgment — products liability with multiple defendants, cases involving comparative negligence where the client's account is one-sided, anything where the intake notes themselves are adversarial.

For those cases, the extraction is still useful, but treat the liability assessment as a starting point for your own analysis, not a conclusion.

Also: always verify the SOL independently. The pipeline flags it as missing rather than calculating it — jurisdiction-specific exceptions (discovery rule, minority tolling, government claims acts) make automated SOL calculation unreliable without attorney review.

Build it yourself

The two prompts above are the core. Chain them in whatever environment you prefer — a simple Python script, a Zapier automation from your intake form, or a direct OpenAI API integration in your practice management tool.

Run it on your own notes

Paste your intake call notes below. Free to run — 10 times per day, no account needed.

▶ Run on your own notes5 free runs/day · no account needed

Intake notes

1 Extract facts

2 Structure matter

Free · 5 runs/day

Want to talk?

I'm building NileLegal — a product that runs these pipelines automatically, without the API key setup. If you want to see it on your actual case files, have questions about the pipeline, or want to discuss a custom build, book a call.

Book a call See demo