What I learned from Anthropic's "Building with the Claude API" course
I decided to take this course in order to get a high-level understanding of all the features that Claude had to offer, and a better understanding of how these services worked under the hood, as well as patterns and best practices when interacting with them over an API.
I figured a lot of this would be transferable to other popular LLM services like OpenAI and Gemini, but since I'm leveraging Claude more in my day-to-day work (especially Claude Code) right now, this seemed like the right course to focus on.
I thought I'd share some of the more interesting and useful things I learned from this course, and the topics that weren't covered, but that I'm excited to explore further.
What multi-turn messages are and how they work
I didn't realize that multi-turn messages (the "back-and-forth" messaging you do in a normal conversation) require sending the entire message history up to that point for the AI to have context.
I had assumed that because I needed to create an account with Anthropic, Claude would track message history on its end via a conversation ID, loading it from disk or cache whenever new messages were sent—regardless of whether I was using the web interface or API.
Given this behaviour, I now know that storing message history is something I'd need to consider in the design of any AI-powered application I build that takes user input and is expected to behave like a conversation (i.e. a chatbot).
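To make this concrete, here's a minimal sketch of a two-turn exchange where the client owns the history, using Python and the anthropic SDK (the model name is just an illustrative placeholder):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# The Messages API is stateless: every request carries the full conversation so far.
messages = [{"role": "user", "content": "What's a git worktree?"}]

first = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    messages=messages,
)

# Append the assistant's reply plus the follow-up question, then send the whole
# history again so Claude has the context of the first turn.
messages.append({"role": "assistant", "content": first.content})
messages.append({"role": "user", "content": "How is that different from a branch?"})

second = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=1024,
    messages=messages,
)
print(second.content[0].text)
```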
Citations
I became a big fan of Perplexity primarily because it provides citations for its answers out-of-the-box, and I'm glad to see that Claude makes it easy to enable citations in its responses when making requests over the API (docs here for those interested).
While I haven't had reason to use it yet, there's at least one side project I'd like to build in the coming weeks where citations will need to be leveraged frequently, and I'm keen to see how well this works in practice.
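For reference, this is roughly what enabling citations looks like based on my reading of the docs; it's a sketch rather than something I've run yet, and the model name is a placeholder:

```python
import anthropic

client = anthropic.Anthropic()

# A document content block with citations enabled; answers that draw on it
# should then include citation metadata pointing back to spans of the document.
response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "document",
                "source": {
                    "type": "text",
                    "media_type": "text/plain",
                    "data": "The grass is green. The sky is blue.",
                },
                "title": "Facts",
                "citations": {"enabled": True},
            },
            {"type": "text", "text": "What colour is the grass?"},
        ],
    }],
)

# Text blocks in the reply carry a `citations` list where they cite the document.
for block in response.content:
    if block.type == "text":
        print(block.text, getattr(block, "citations", None))
```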
Automated testing on effectiveness of prompts
The course went into a bit of detail on a basic "prompt evaluation": a pipeline for testing the effectiveness of a prompt.
For those unfamiliar, the short explanation is that these pipelines consist of:
- a dataset to test your model with (this can be AI-generated with something like Haiku); and
- a "grader" (which can be one of 'code', 'AI model', or 'human').
In practice, there's likely going to be a hybrid of "graders" that you use depending on the application functionality being tested.
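To give a concrete sense of the shape of one of these pipelines, here's a minimal sketch of my own (not code from the course); the model names are illustrative placeholders, and it combines a cheap code grader with a model grader:

```python
import anthropic

client = anthropic.Anthropic()

PROMPT_UNDER_TEST = "Summarise the following support ticket in one sentence:\n\n{ticket}"

# A tiny hand-written dataset; in practice this could be generated with Haiku.
dataset = [
    {"ticket": "My invoice for March was charged twice.", "must_mention": "invoice"},
    {"ticket": "The app crashes whenever I open settings.", "must_mention": "crash"},
]

def run_prompt(ticket: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",  # model under test (placeholder name)
        max_tokens=200,
        messages=[{"role": "user", "content": PROMPT_UNDER_TEST.format(ticket=ticket)}],
    )
    return response.content[0].text

def code_grader(output: str, must_mention: str) -> bool:
    # Cheap deterministic check.
    return must_mention.lower() in output.lower()

def model_grader(output: str, ticket: str) -> bool:
    # Ask a lightweight model to judge the output; constrain it to PASS/FAIL.
    verdict = client.messages.create(
        model="claude-3-5-haiku-20241022",  # lightweight grader (placeholder name)
        max_tokens=5,
        messages=[{
            "role": "user",
            "content": f"Ticket:\n{ticket}\n\nSummary:\n{output}\n\n"
                       "Is the summary accurate and a single sentence? Answer PASS or FAIL.",
        }],
    )
    return "PASS" in verdict.content[0].text.upper()

results = []
for case in dataset:
    output = run_prompt(case["ticket"])
    results.append(code_grader(output, case["must_mention"]) and model_grader(output, case["ticket"]))

print(f"Pass rate: {sum(results)}/{len(results)}")
```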
That being said, one of my open questions here after learning about this topic is around the cost of running a pipeline like this.
I'm envisioning something very similar to our continuous deployment pipelines today with dozens, hundreds, or even thousands of test runs a day. Even if it's only a small subset of tests involving an AI model grader, and that model is a lightweight one like Claude's Haiku, OpenAI's GPT-5 nano, or Gemini's 2.5 Flash-Lite, I can see this adding up to quite the chunk of change to run.
Maybe, besides compute costs, this is one of the reasons AI applications charge so much? 😅
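Here's the kind of back-of-envelope maths I have in mind; every number below is a made-up placeholder, so substitute your own run counts and the current per-million-token pricing for whichever grader model you use:

```python
# Back-of-envelope estimate for model-graded evals in CI (placeholder numbers only).
runs_per_day = 200               # pipeline runs per day
graded_cases_per_run = 50        # eval cases that call a model grader
tokens_per_grade = 1_500         # input + output tokens per grading call
price_per_million_tokens = 1.00  # USD; placeholder, not a real price

daily_cost = (runs_per_day * graded_cases_per_run * tokens_per_grade
              / 1_000_000 * price_per_million_tokens)
print(f"~${daily_cost:.2f}/day, ~${daily_cost * 30:.0f}/month")
# With these placeholder numbers: ~$15.00/day, ~$450/month
```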
For those interested in exploring more in this space, there's a Changelog episode that I'd listened to a few weeks ago that may be of interest to you:

Redacted thinking
This is a safety feature of Claude's "extended thinking": if Claude's internal reasoning for a given message is flagged by Anthropic's safety systems, a content block with the type redacted_thinking is returned in its place:
```json
{
  "content": [
    {
      "type": "thinking",
      "thinking": "Let me analyze this step by step...",
      "signature": "WaUjzkypQ2mUEVM36O2TxuC06KN8xyfbJwyem2dw3URve/op91XWHOEBLLqIOMfFG/UvLEczmEsUjavL...."
    },
    {
      "type": "redacted_thinking",
      "data": "EmwKAhgBEgy3va3pzix/LafPsn4aDFIT2Xlxh0L5L8rLVyIwxtE3rAFBa8cr3qpPkNRj2YfWXGmKDxH4mPnZ5sQ7vB9URj2pLmN3kF8/dW5hR7xJ0aP1oLs9yTcMnKVf2wRpEGjH9XZaBt4UvDcPrQ..."
    },
    {
      "type": "text",
      "text": "Based on my analysis..."
    }
  ]
}
```

Code example snippet taken from https://platform.claude.com/docs/en/build-with-claude/extended-thinking#thinking-redaction
I thought this was a cool approach to allow Claude to keep its context in multi-turn conversations while not leaking unsafe or sensitive information out to users.
I'm also curious whether companies offering AI-as-a-service who aren't the big LLM providers (e.g. Google, OpenAI, etc.) adopt this pattern too: hiding their own sensitive data from users in the same way, even when what they need to redact differs from what Anthropic, OpenAI, etc. would redact.
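For my own reference, here's a rough sketch based on my reading of the extended thinking docs (model name and token budget are illustrative) of enabling extended thinking and then carrying the returned blocks, redacted or not, into the next turn unmodified:

```python
import anthropic

client = anthropic.Anthropic()

question = "Plan a migration from REST to gRPC."

first = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder model name
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[{"role": "user", "content": question}],
)

# first.content may contain `thinking`, `redacted_thinking`, and `text` blocks.
# To keep the context in the next turn, the prior assistant content is passed
# back as-is, including any redacted blocks, without modifying them.
follow_up = client.messages.create(
    model="claude-sonnet-4-20250514",
    max_tokens=4096,
    thinking={"type": "enabled", "budget_tokens": 2048},
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": first.content},
        {"role": "user", "content": "What's the riskiest step in that plan?"},
    ],
)
```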
Use of XML tags to more clearly delineate different sections
I had intuitively been delineating sections when interacting with Claude Code by writing code blocks with backticks (as if I were writing in GitHub, a Markdown file, or Slack), or by writing lines like "BEGIN EXAMPLE"/"END EXAMPLE" at the start and end of blocks of text representing examples that I wanted the AI to consider.
But now that this course has pointed out that XML tags can be used to delineate examples and different types of data within a prompt, XML makes a lot of sense. It also reads much nicer than my previously somewhat ad-hoc approach.
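As a quick illustration of what I mean (the tag names here are arbitrary choices of my own; the point is the clear delineation):

```python
# A hypothetical prompt that uses XML tags to separate instructions,
# the data to operate on, and a worked example.
report = "..."   # the text being summarised
example = "..."  # an example of the desired output style

prompt = f"""Summarise the report below in three bullet points.

<report>
{report}
</report>

<example>
{example}
</example>
"""
```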
Claude Code parallelization using git worktrees
Git worktrees have always been a cool feature to me because they let you have multiple git branches checked out and in progress at the same time in your local environment.
But seeing it paired with Claude Code is wild in the best way. This looks like it's going to be a ridiculous effort multiplier because it allows multiple tasks to run in parallel without affecting your main feature branch.
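In practice, the workflow looks something like the following; the directory and branch names are made up for illustration:

```
# Create a separate worktree (and branch) per task, so each Claude Code session
# works in its own checkout without touching the main one.
git worktree add ../myapp-docs -b update-docs
git worktree add ../myapp-login-fix -b fix-login-bug

# Run `claude` inside each directory (e.g. one terminal tab per worktree),
# then tidy up once the branches are merged.
git worktree list
git worktree remove ../myapp-docs
```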
Despite how powerful this can be, I'm likely only going to leverage this approach for "minor" tasks at the moment. There's a lot of benefit in not delegating the deep thinking on a complex problem, and in exploring the codebase yourself to see what solutions are possible and what effort they'd involve; not to mention that we're all responsible for the code we ship at the end of the day, AI-generated or not.
What I'd likely use this approach for, to give a few examples:
- bug fixes that are well understood by me;
- writing documentation or initial test cases;
- trying to understand more about the codebase and using Claude Code to ask questions/learn more about the structure of the project for a bug fix or feature; and
- getting Claude Code to review code changes I've made and suggest improvements based on my guidelines for what I want it to check and best practices within the project that are outlined in the project's documentation.
Topics I'm excited to explore further
Agents
Although the course covered agents at a high level, it didn't go into detail about what test evaluation frameworks for agents could look like, or how to leverage agents effectively in parallel.
I was hoping the course would've covered this, but I suspect an entire course could be dedicated to agents given how much there appears to be here.
In the meantime, there's a GitHub repository of "Claude Cookbooks" that appears to include some interesting learnings on this topic, so that'll be my next stop.
Vector databases
There was an introduction to vector databases in the context of talking about RAG (retrieval augmented generation), and I'm keen to explore this a bit further.
Some of the things I'm looking forward to learning more about are:
- the process for generating embeddings prior to storing them in a vector database; and
- whether folks re-generate embeddings every so often as embedding models improve, and what the migration process looks like right now for that.