Lesson 02
Conversation memory without a database
The bot has to remember what already happened in the chat. The messages array is the memory; this lesson shows how that works and where it breaks.
The big lie about LLMs
“Claude has no memory, you have to give it one.”
True. Each request is independent. The model remembers nothing between calls. “Memory” is simulated by resending the full history on every request.
The array is the memory
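import Anthropic from "@anthropic-ai/sdk";
// The client reads ANTHROPIC_API_KEY from the environment by default.
const client = new Anthropic();
// The whole conversation so far; the last entry is the new question.
const history: Anthropic.MessageParam[] = [
  { role: "user", content: "My name is Ana." },
  { role: "assistant", content: "Nice to meet you, Ana." },
  { role: "user", content: "What's my name?" },
];
const res = await client.messages.create({
  model: "claude-haiku-4-5",
  max_tokens: 100,
  messages: history,
});
// The reply text arrives in res.content[0], e.g. "Your name is Ana."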
If you don’t send the first two messages on the third call, the model will not know your name. Period.
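One way to make the resend explicit is a single-turn helper that appends the user message, sends the whole array, and appends the reply. The sketch below is illustrative only: `takeTurn` and the injected `complete` callback are hypothetical names, with `complete` standing in for a thin wrapper around `client.messages.create`.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Hypothetical stand-in for a wrapper around client.messages.create:
// it receives the FULL history and resolves to the reply text.
type Complete = (messages: Msg[]) => Promise<string>;

// Runs one turn and returns a new history without mutating the old one.
async function takeTurn(
  history: Msg[],
  userText: string,
  complete: Complete,
): Promise<Msg[]> {
  const withUser: Msg[] = [...history, { role: "user", content: userText }];
  const reply = await complete(withUser); // the model sees every earlier message
  const next: Msg[] = [...withUser, { role: "assistant", content: reply }];
  return next;
}
```

Because the callback is injected, the helper can be exercised with a fake model in tests, and the array it returns is exactly what has to be sent on the following turn.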
On the front: React state
"use client";
import { useState } from "react";

type Msg = { role: "user" | "assistant"; content: string };

export default function Chat() {
  const [messages, setMessages] = useState<Msg[]>([]);
  const [input, setInput] = useState("");
  const [loading, setLoading] = useState(false);

  async function send() {
    // Ignore empty input and double submits while a request is in flight.
    if (!input.trim() || loading) return;
    const userMsg: Msg = { role: "user", content: input };
    const next = [...messages, userMsg];
    setMessages(next);
    setInput("");
    setLoading(true);
    try {
      // Send the whole history, not just the new message.
      const res = await fetch("/api/chat", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ messages: next }),
      });
      if (!res.ok) throw new Error(`Request failed: ${res.status}`);
      const data = await res.json();
      setMessages([...next, { role: "assistant", content: data.text }]);
    } catch (err) {
      console.error(err);
    } finally {
      // Always clear the indicator, even when the request fails.
      setLoading(false);
    }
  }

  return (
    <div className="max-w-2xl mx-auto p-6">
      <div className="space-y-4 mb-6">
        {messages.map((m, i) => (
          <div
            key={i}
            className={`p-3 rounded-lg ${
              m.role === "user" ? "bg-blue-100 ml-12" : "bg-gray-100 mr-12"
            }`}
          >
            <p className="text-xs text-gray-500 mb-1">{m.role}</p>
            <p>{m.content}</p>
          </div>
        ))}
        {loading && <p className="text-gray-400">Typing…</p>}
      </div>
      <div className="flex gap-2">
        <input
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === "Enter" && send()}
          className="flex-1 border rounded-lg px-3 py-2"
          placeholder="Type..."
        />
        <button onClick={send} className="bg-blue-600 text-white px-4 rounded-lg">
          Send
        </button>
      </div>
    </div>
  );
}
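Because the component posts the whole history, the `/api/chat` handler receives the conversation as untrusted JSON. The lesson does not show that handler; what follows is a minimal sketch of a check it could run before forwarding `messages` to the model. `parseMessages` is a hypothetical helper, not part of any SDK.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Hypothetical helper: validates the parsed request body and returns the
// history, or throws if the shape is not what the chat component sends.
function parseMessages(body: unknown): Msg[] {
  if (typeof body !== "object" || body === null) {
    throw new Error("Body must be a JSON object");
  }
  const raw = (body as { messages?: unknown }).messages;
  if (!Array.isArray(raw) || raw.length === 0) {
    throw new Error("messages must be a non-empty array");
  }
  return raw.map((m: unknown, i: number): Msg => {
    if (typeof m !== "object" || m === null) {
      throw new Error(`messages[${i}] must be an object`);
    }
    const { role, content } = m as { role?: unknown; content?: unknown };
    if (role !== "user" && role !== "assistant") {
      throw new Error(`messages[${i}].role must be "user" or "assistant"`);
    }
    if (typeof content !== "string" || content.trim() === "") {
      throw new Error(`messages[${i}].content must be a non-empty string`);
    }
    return { role, content };
  });
}
```

Rejecting malformed input up front keeps a buggy or tampered client from producing an opaque API error further down.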
The problem: the history grows unbounded
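Each turn adds tokens, and every request resends all of them as input. By turn 50, each new request pays again for the previous 49 exchanges, and a long enough chat eventually exceeds the model's context window.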
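To put numbers on it: on turn k the request carries the 2(k - 1) earlier messages plus the new user message, so 2k - 1 messages, and over n turns those sends add up to n². The sketch below works that out under a loud assumption: every message costs the same number of tokens (`tokensPerMessage`, a made-up average), and system prompts and output tokens are ignored.

```typescript
// Input tokens billed across a whole conversation when the full history
// is resent on every turn. Assumes a uniform, made-up message size.
function cumulativeInputTokens(turns: number, tokensPerMessage: number): number {
  let total = 0;
  for (let k = 1; k <= turns; k++) {
    const messagesSent = 2 * k - 1; // (k - 1) earlier exchanges + the new user message
    total += messagesSent * tokensPerMessage;
  }
  return total;
}

console.log(cumulativeInputTokens(50, 100)); // 250000
```

Under the same assumption, the final request alone carries 99 messages, or 9,900 input tokens, so the cumulative bill is about 25 times the size of the conversation itself.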
Three strategies:
1. Sliding window
const recent = history.slice(-10);
Simple and brutal: anything older than the last 10 messages is forgotten. Usually enough for casual chats.
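A bare `slice(-10)` can leave an assistant message at the front of the window, which the Messages API may reject, since it expects the conversation to open with a user turn. A minimal sketch that trims the window forward to the first user message; `slidingWindow` and `maxMessages` are illustrative names.

```typescript
type Msg = { role: "user" | "assistant"; content: string };

// Keeps at most `maxMessages` recent messages and drops any leading
// assistant messages so the window starts with a user turn.
function slidingWindow(history: Msg[], maxMessages = 10): Msg[] {
  if (maxMessages <= 0) return []; // slice(-0) would return the whole array
  const recent = history.slice(-maxMessages);
  const firstUser = recent.findIndex((m) => m.role === "user");
  return firstUser === -1 ? [] : recent.slice(firstUser);
}
```

The window may come back one message shorter than requested; that is the price of keeping the roles in a valid order.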
2. Summary + window
When the chat reaches a threshold (20 messages in the code below), ask the model to summarize the older ones, replace them with the summary, and keep the most recent messages (the last 10) verbatim.
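// Replaces everything but the last 10 messages with a model-written summary.
async function compressIfNeeded(messages: Msg[]): Promise<Msg[]> {
  if (messages.length < 20) return messages; // short chats pass through untouched
  const toCompress = messages.slice(0, -10);
  const summary = await client.messages.create({
    model: "claude-haiku-4-5",
    max_tokens: 300,
    messages: [{
      role: "user",
      content: `Summarize this conversation in 5 sentences:\n${JSON.stringify(toCompress)}`,
    }],
  });
  // content is a list of blocks; only text blocks carry a .text field.
  const first = summary.content[0];
  const summaryText = first?.type === "text" ? first.text : "";
  return [
    { role: "user", content: "Previous summary: " + summaryText },
    ...messages.slice(-10),
  ];
}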
3. RAG (at scale)
If memory spans thousands of messages, store them in a vector store (Postgres+pgvector, Pinecone, etc.) and retrieve only what’s relevant per turn. Out of scope here, but it is the pro solution.
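Although the full setup is out of scope, the retrieval step itself is small. The sketch below uses an in-memory array as a stand-in for a vector store and assumes each stored message already has an embedding computed elsewhere; `Stored`, `cosine`, and `topK` are hypothetical names, and a real store would run this search for you.

```typescript
type Stored = { text: string; embedding: number[] };

// Cosine similarity between two vectors; 0 if either has zero length.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    na += x * x;
    nb += y * y;
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Returns the k stored messages most similar to the query embedding.
function topK(query: number[], store: Stored[], k: number): Stored[] {
  return store
    .map((item) => ({ item, score: cosine(query, item.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((s) => s.item);
}
```

Only the returned messages would be added to the request for that turn, so the prompt size stays roughly constant however long the stored history gets.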
Up next
Tool use: we teach the bot to call real APIs when it lacks information.