Lesson 05

Capstone: a structured data extractor

Turn free-form text (emails, invoices, messages) into reliable JSON, with tests. The portfolio piece of this course.


What we are building

An async function extract(text) that takes free-form text and resolves to a JS object with structured data. For example:

"Hi, I'm Ana García, my ID is 12345678A and I live at 12 Mayor St, Madrid.
My phone is +34 600 123 456. I want to book for May 15th at 9pm."

…should return:

{
  "name": "Ana García",
  "id": "12345678A",
  "address": "12 Mayor St, Madrid",
  "phone": "+34 600 123 456",
  "bookingDate": "2026-05-15",
  "bookingTime": "21:00"
}

Step 1 — Define the schema

Before the prompt, define the schema in code. That is what makes your system reliable.

import { z } from "zod";

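const BookingSchema = z.object({
  // Every field is nullable: the prompt returns null for missing data.
  name: z.string().nullable(),
  id: z.string().regex(/^\d{8}[A-Z]$/).nullable(),
  address: z.string().nullable(),
  phone: z.string().nullable(),
  bookingDate: z.string().regex(/^\d{4}-\d{2}-\d{2}$/).nullable(),
  bookingTime: z.string().regex(/^\d{2}:\d{2}$/).nullable(),
});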

type Booking = z.infer<typeof BookingSchema>;

If the model goes off-schema, BookingSchema.parse() throws, so the caller can catch the error and retry.
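
To see what "off-schema" looks like in practice, here is a minimal sketch using zod's safeParse(), which reports failures as a result object instead of throwing. MiniSchema and invalidFields are hypothetical names used only for this illustration; they are not part of the lesson's code.

```typescript
import { z } from "zod";

// Hypothetical two-field schema, for illustration only.
const MiniSchema = z.object({
  id: z.string().regex(/^\d{8}[A-Z]$/),
  bookingTime: z.string().regex(/^\d{2}:\d{2}$/),
});

// Returns the dotted paths of the fields that failed validation.
function invalidFields(candidate: unknown): string[] {
  const result = MiniSchema.safeParse(candidate);
  if (result.success) return [];
  return result.error.issues.map((issue) => issue.path.join("."));
}

const good = invalidFields({ id: "12345678A", bookingTime: "21:00" });
const bad = invalidFields({ id: "1234", bookingTime: "9pm" });
console.log(good, bad);
```

Knowing which fields failed can be handy for logging, or for telling the model what to fix on a retry.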

Step 2 — The prompt

Apply everything you have learned:

const SYSTEM = `You are a data extractor. You receive a message and ALWAYS return valid JSON with this shape:

{
  "name": string,
  "id": string (8 digits + 1 uppercase letter),
  "address": string,
  "phone": string (with international prefix),
  "bookingDate": "YYYY-MM-DD",
  "bookingTime": "HH:MM"
}

Rules:
- If a field is not in the message, return it as null.
- If a date has no year, assume ${new Date().getFullYear()}.
- Do NOT include text outside the JSON.
- Do NOT use markdown blocks (no \`\`\`).`;

const EXAMPLES = `
Message: "I'm Luis Pérez (ID 87654321B). Book for June 1st at 2pm."
{ "name": "Luis Pérez", "id": "87654321B", "address": null, "phone": null, "bookingDate": "2026-06-01", "bookingTime": "14:00" }
`;

Step 3 — The function

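import Anthropic from "@anthropic-ai/sdk";

// Reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();

async function extract(text: string): Promise<Booking> {
  const res = await client.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 500,
    system: SYSTEM,
    messages: [
      { role: "user", content: `${EXAMPLES}\n\nMessage: "${text}"` },
      // Prefill: the reply continues from this opening brace.
      { role: "assistant", content: "{" },
    ],
  });

  // content is a union of block types; only text blocks carry .text.
  const block = res.content[0];
  if (!block || block.type !== "text") throw new Error("Expected a text block");

  const raw = "{" + block.text; // restore the prefilled brace
  const json = JSON.parse(raw);
  return BookingSchema.parse(json); // throws ZodError if off-schema
}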

Two key details:

  1. Assistant prefill with { — forces the model to start with JSON, no intro text.
  2. BookingSchema.parse() — if the model goes off-schema, it throws an exception you can catch and retry.
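
The second detail mentions catching and retrying, but the function above does not retry on its own. One possible way to add that is a small generic wrapper, sketched below. withRetry is a hypothetical helper, not something provided by zod or the SDK, and the default of three attempts is an arbitrary assumption.

```typescript
// Hypothetical helper: runs an async task up to maxAttempts times and
// returns the first successful result.
async function withRetry<T>(task: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      // JSON.parse or schema validation failed; remember why and try again.
      lastError = err;
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}
```

With this in place, a call would look like withRetry(() => extract(text)). Keep the attempt count low, since each retry is another paid model call.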

Step 4 — Tests

No tests, no production. Tests turn magic into engineering.

import { describe, it, expect } from "vitest";

describe("extract", () => {
  it("extracts all fields when present", async () => {
    const r = await extract(
      "I'm Ana García (12345678A), 12 Mayor St Madrid, +34 600 123 456. Booking 15/05 9pm."
    );
    expect(r.name).toBe("Ana García");
    expect(r.id).toBe("12345678A");
    expect(r.bookingDate).toBe("2026-05-15");
  });

  it("returns null for missing fields", async () => {
    const r = await extract("I'm Luis. I want to book on June 1st.");
    expect(r.name).toBe("Luis");
    expect(r.id).toBeNull();
  });

  it("normalizes time to HH:MM", async () => {
    const r = await extract("Book for 9 in the evening.");
    expect(r.bookingTime).toBe("21:00");
  });
});
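
The tests above call the live model, so they need an API key and their results can vary between runs. One option, sketched here under assumptions, is to move the JSON reconstruction step into a pure function that can be tested offline with fixed strings. restorePrefilledJson is a hypothetical name, and trimming text after the last closing brace is an extra safeguard that the lesson's extract() does not perform.

```typescript
// Hypothetical helper: rebuilds the JSON value from a completion that was
// prefilled with "{" and therefore starts right after the opening brace.
function restorePrefilledJson(tail: string): unknown {
  const raw = "{" + tail;
  // Drop anything after the last closing brace (e.g. a stray trailing note).
  const end = raw.lastIndexOf("}");
  if (end === -1) throw new SyntaxError("No closing brace in model output");
  return JSON.parse(raw.slice(0, end + 1));
}

// Fixed inputs, no network: these behave the same on every run.
const clean = restorePrefilledJson(' "name": "Luis", "id": null }');
const noisy = restorePrefilledJson(' "name": "Ana" }\nHope that helps!');
console.log(clean, noisy);
```

The live tests are still worth keeping, because only they tell you whether the prompt itself works.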

Step 5 — Deploy

Drop your function into a serverless endpoint (Cloudflare Workers, Vercel, Deno Deploy) and you have a real extraction API.
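
As a rough sketch of what such an endpoint could look like, the handler below uses only the Web-standard Request and Response objects, which the platforms listed above all support. handleExtract, jsonResponse, and the { "text": ... } request shape are assumptions made for this example. The extractor is passed in as a parameter so the handler can be exercised without calling the model.

```typescript
// extractFn stands in for the lesson's extract(); it is injected here.
type ExtractFn = (text: string) => Promise<unknown>;

// Hypothetical helper: serializes a body as a JSON response.
function jsonResponse(body: unknown, status: number): Response {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Hypothetical handler: accepts POST { "text": string } and returns the extraction.
async function handleExtract(request: Request, extractFn: ExtractFn): Promise<Response> {
  if (request.method !== "POST") {
    return jsonResponse({ error: "Use POST" }, 405);
  }
  let payload: unknown;
  try {
    payload = await request.json();
  } catch {
    return jsonResponse({ error: "Body must be JSON" }, 400);
  }
  const text = (payload as { text?: unknown } | null)?.text;
  if (typeof text !== "string" || text.trim() === "") {
    return jsonResponse({ error: 'Expected { "text": string }' }, 400);
  }
  try {
    return jsonResponse(await extractFn(text), 200);
  } catch {
    // Model output failed JSON parsing or schema validation.
    return jsonResponse({ error: "Extraction failed" }, 502);
  }
}
```

Each platform wires a handler like this in slightly differently, so check its documentation for the entry-point shape and for how to store the API key as a secret.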

Estimated cost: under €0.001 per extraction with claude-haiku-4-5.
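
To check an estimate like this against your own traffic, the arithmetic is simple enough to script. In the sketch below every number is an illustrative assumption: the prices per million tokens and the token counts are placeholders, so substitute the current price list and the usage figures the API reports for your real requests.

```typescript
// All numbers below are illustrative assumptions, not quoted prices.
interface Pricing {
  inputPerMillion: number; // currency units per 1M input tokens
  outputPerMillion: number; // currency units per 1M output tokens
}

// Cost of one call: tokens times price, scaled from "per million".
function costPerCall(inputTokens: number, outputTokens: number, p: Pricing): number {
  return (inputTokens * p.inputPerMillion + outputTokens * p.outputPerMillion) / 1_000_000;
}

const assumedPricing: Pricing = { inputPerMillion: 1, outputPerMillion: 5 };

// Assumed size of one extraction: 400 tokens in (prompt, example, message), 100 out.
const perCall = costPerCall(400, 100, assumedPricing);
console.log(perCall.toFixed(4)); // "0.0009"
```

Note that retries multiply this figure, since every failed attempt is billed as well.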

What you have learned

  • Anatomy of a prompt: role, context, instruction, examples, format.
  • Zero-shot vs few-shot, and when each pays off.
  • Chain-of-thought for reasoning.
  • JSON prefill for reliable formatting.
  • Validation with zod for production.

Next step

Want more? The next course is “Build an AI chatbot”, where we ship an assistant with conversational memory and tools. See you there.

