Give Your AI Coding Assistant Something to Work With: A Case Study
Over my winter break, I'd been noodling over a new side project idea related to Ontario politics.
I used to enjoy following what goes on, but these days looking at news articles and reading the email campaigns from my local politicians feels a bit click-bait-y and stress-inducing.
I wanted a tool that:
- allowed me to stay informed about my provincial politics by letting me know what bills were moving through Parliament when it meets; and
- told me what each bill is about, along with its implications and impacts.
Which brings me to my new side project - a tool that gives a quick summary of a bill and its impacts and implications via a weekly newsletter that I'm calling Trillium!
I went into this project with a few learning goals as well:
- finding the limits of the more sophisticated AI models when it comes to building software - how much structure do they need in order to generate code that's as close to "production-ready" as possible from a prompt;
- knowing where the optimal "sweet spot" is of leaning on AI without compromising my understanding of the code base and long-term memory of how everything behaves within the system; and
- leveraging AI in a different context than just a coding assistant.
  - I eventually want to try building a pipeline for evaluating the results of an AI-generated summary.
I recently completed the portion of the project that determines if any new bills have been introduced or have updates, and while I haven't reached the point where I'm building the summarizer that requires the evaluation pipeline, I've come away with some learnings about the first two points.
The biggest learning: a pre-existing, solid foundation of good architecture and development patterns gives you significantly better results from AI coding assistants than prompting with a blank project.
That is, having good initial data models, services, project structure, etc. to start with can lead to much better results from your AI coding assistant than starting from an entirely blank canvas.
I was able to see this difference by doing two versions of the project - the first a prototype with no structure provided (including no CLAUDE.md or AGENTS.md files), and the second with some initial models, services, etc. in place before I started using the AI to help generate code.
Attempt #1 - The Prototype
Initially I wanted to try and build the bill change detection logic on a Cloudflare Worker, and leverage Cloudflare KV to store the data.
Aside from a Worker that I built a while back to learn a bit of Rust, I didn't have much exposure to building on the Cloudflare platform, and thought this would be a good test case for building something with a completely blank canvas, and leaning really heavily on the AI.
What I prompted the AI with (in this case Claude Opus 4.5) was:
- what I wanted the system to do in terms of overall behaviour;
- what I wanted the system to be built with - Cloudflare Python Workers using Cloudflare KV for storage;
- expected inputs and outputs like URLs, what format the data was expected to be in on the website hosting the bills (here if you're interested), and priorities of the project such as minimal/no cost;
- various edge cases in "business logic" that I expected; and
- some pseudo-code data models and how they related to each other (e.g. a `Bill` has a string `title`, an `updated_at` datetime, and has `BillActivities`).
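For a sense of what that last bullet looked like, here's a rough approximation of the pseudo-code models. This is illustrative only - not the exact text I gave the AI:

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class BillActivity:
    description: str      # e.g. "First Reading", "Royal Assent"
    occurred_on: datetime


@dataclass
class Bill:
    title: str
    updated_at: datetime
    activities: list[BillActivity] = field(default_factory=list)
```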
The results
I did a line-by-line code review of the output and found the following on my first pass:
- variables, imports, and functions that would never be accessed or used (i.e. dead code);
- obvious logic errors;
- fragility in logic that had significant assumptions baked in;
  - For instance, it assumed that the list of activities for a bill (reached second reading, received royal assent, etc.) was always in the correct order on the web page. This assumption had downstream effects on the change detection logic that the AI had created (see the sketch below).
- use of hard-coded strings when an enum would've been better; and
  - A small detail, but one that matters for maintainability and data integrity.
- use of the normal `wrangler` tool instead of `py-wrangler` from Cloudflare, despite me having specified that this was a Cloudflare Worker written in Python.
  - Additionally, the AI followed what appeared to be the JavaScript patterns for building a Cloudflare Worker instead of the Python patterns. I suspect this is likely due to there being more examples of JavaScript Workers out there and Python Workers still being in beta.
  - An example of this was that the AI wrote code exporting the `scheduled` function for a cron job as an independent function, instead of first declaring a `Default` class that inherits from `WorkerEntryPoint` and then adding `scheduled` as a method of that class, as shown in the Cloudflare docs.
These all add up to bugs, security issues, and spaghetti code that's difficult to maintain, so this was a failure in terms of being "production-ready right out of the gate". However, it worked enough that I got a better understanding of the problem itself.
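To give a concrete sense of the ordering assumption above, here's a minimal sketch of how the change detection could avoid trusting the order activities appear in on the page, by keying off the activity's own date instead. `BillActivity` and its fields are simplified stand-ins, not the code the AI generated or Trillium's actual models.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class BillActivity:
    # Simplified stand-in for an activity scraped from the OLA site.
    description: str   # e.g. "Second Reading"
    occurred_on: date


def latest_activity(activities: list[BillActivity]) -> BillActivity | None:
    """Return the most recent activity without relying on the scraped page order."""
    if not activities:
        return None
    return max(activities, key=lambda activity: activity.occurred_on)
```

Sorting (or taking the max) by the activity's date means a reordered table on the OLA site can't silently throw off the change detection.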
At this point, I also made the decision to rewrite the application as a FastAPI app.
For one thing, there were size limits on the free tier of Cloudflare Workers, and the initial implementation didn't fit within them. I may have been able to deploy it if I paid for the Workers plan, but I didn't want to spend the money just yet given this is a hobby project.
The other factor in my decision was that I realized that, if I were to do this project entirely on the Cloudflare platform, I'd need to create a monorepo of various Cloudflare Workers to tie all the functionality I wanted to have together.
I didn't want to spend time wrangling monorepo tooling at this time, and so opted for the simpler approach of building with FastAPI.
Now that I had the working prototype, I considered this first experiment complete, and moved to the second, where I'd provide some initial structure before leveraging an AI coding assistant for any help.
Attempt #2 - Prototype, but with some initial models, services, and structure provided
For the initial project contents, I built:
- some initial data models, like what a `Bill` looked like, and another model that tracks the activity on a bill;
- the beginnings of the main scheduled task that would initiate the fetching and processing of the data on the Ontario Legislative Assembly's (OLA) website;
- a couple of the services that would be leveraged by the task; and
- some utility functions that handled some of the more complicated parts of the change detection logic.
When I started prompting the AI to generate services to handle additional parts of the business logic, it tended to copy the patterns and coding style of the services I had already built, which was great.
It did make an interesting decision around what to do with setting up a database connection though, which I hadn't yet implemented when I started pulling the AI in.
It believed that it needed to create both a global variable holding a database connection for use in scheduled tasks, and another database connection for incoming HTTP requests that would live on `app.state` - an object available in FastAPI that persists state across requests.
This initially led to code that had to ensure both connections were cleaned up when the app shut down - a little too much for a simple app like this, which really only needs one database connection.
I ultimately modified it so that, in FastAPI's lifespan method, the database connection created there is passed into the function responsible for registering tasks, so task definitions can accept and use that connection (shown below). That left me with only one connection to worry about in the project.
```python
@asynccontextmanager
async def lifespan(app: FastAPI):
    app.state.engine = create_db_engine()
    create_db_and_tables(app.state.engine)
    register_tasks(scheduler, app.state.engine)
    scheduler.start()
    yield
    scheduler.shutdown()
    app.state.engine.dispose()
```
In main.py
```python
def register_tasks(scheduler: AsyncIOScheduler, engine: Engine) -> None:
    scheduler.add_job(
        fetch_current_bills_from_ontario_legislative_assembly,
        trigger="cron",
        day_of_week="fri",
        hour=10,
        minute=0,
        args=[engine],
    )


def fetch_current_bills_from_ontario_legislative_assembly(engine: Engine) -> None:
    ...
```
In tasks.py
I also intentionally left in a subtle error where the processing of bills and their associated activities wasn't entirely idempotent, and could leave things in an inconsistent state if something interrupted the task midway through.
While the AI didn't find the issue when I asked it to do a code review in a general sense (i.e.: "please review this code"), when I prompted the AI to specifically check that the operation was entirely idempotent, it did find the issue.
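For context, an idempotent version of that processing step looks roughly like the sketch below: each bill is looked up and either inserted or updated inside a single transaction, so re-running the task after an interruption converges to the same state. The model and field names here are illustrative, not Trillium's actual schema.

```python
from datetime import datetime

from sqlalchemy import Engine
from sqlmodel import Field, Session, SQLModel, select


class Bill(SQLModel, table=True):
    # Illustrative model only - Trillium's real models carry more detail.
    id: int | None = Field(default=None, primary_key=True)
    bill_number: str = Field(index=True)
    title: str
    updated_at: datetime


def upsert_bill(engine: Engine, bill_number: str, title: str, updated_at: datetime) -> None:
    """Insert the bill if it's new, otherwise update it, all in one transaction."""
    with Session(engine) as session:
        existing = session.exec(
            select(Bill).where(Bill.bill_number == bill_number)
        ).first()
        if existing is None:
            session.add(Bill(bill_number=bill_number, title=title, updated_at=updated_at))
        else:
            existing.title = title
            existing.updated_at = updated_at
            session.add(existing)
        session.commit()
```

Re-running `upsert_bill` with the same data is a no-op, which is what makes a partially completed run safe to retry.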
Wrapping up
To reiterate, having a good initial structure in your project with good examples of data models, services, etc. can increase the likelihood that AI will boost your productivity instead of causing you chaos and stress.
While I didn't test these out in the context of Trillium, using CLAUDE.md, AGENTS.md, or Serena AI's memory feature in your project can also give the project some initial structure.
As for my other learning goal - knowing where the optimal "sweet spot" is of leaning on AI without compromising my understanding of the code base:
I liked that I was able to generate a working prototype relatively quickly from a blank canvas, but I genuinely couldn't remember the finer details and edge cases despite spending a few hours carefully reviewing the generated code.
Additionally, I would've spent far more time trying to get that generated code into a "production-ready" state compared to throwing it away and starting again from scratch.
Some other learnings from this first part of the project:
- Serena is a great tool for improving your token efficiency once it's configured; and
- Claude Code doesn't have access to chats and project context created via the Claude web UI (at least when I ran into this in December).
  - I worked around this by prompting Claude in the web UI, within a chat that was part of the project, to create an "effective prompt" for Claude Code given the context of the project and the desired behaviour of the system I wanted.
In terms of my plans for next steps - the OLA will be meeting again in about a month and I look forward to testing this application as bills are introduced and updated on a weekly basis.
I also plan on beginning research into what building the functionality summarizing individual bills is going to look like, and what's out there for creating a prompt evaluation pipeline.
If anyone likes the sound of this project and has thoughts/suggestions/ideas for ways to make this newsletter awesome, leave a comment below, or feel free to drop me a line at hey@ericapisani.dev.