Thoughts on Coding Agents and Creative Agency
Reflections on building personal software through AI coding ("codegen") tools, from a Chrome extension to interactive websites, and what I've observed as someone without a traditional software development background.
I recently began dedicating a significant portion of my time to developing software with codegen tools. As a product leader with a social science and humanities background, I set out to test the limits of these tools by building 100% through prompting. A week into this journey, the term 'vibe coding' was coined and a community of such explorations has blossomed. For me, the experience has been expansive.
- I am building small pieces of software that feel beautiful, personal and helpful.
- I am exploring ways to connect the dots for powerful experiences even with quite limited resources (e.g., in a Chrome Extension vs. an OS).
- I am waking up each day excited to see how the latest model, tooling or protocol will change what's possible for me.
I wanted to share a few reflections so far — on what I’ve made, practices for effective building, and challenges I’ve observed with these tools in their present form.
I end with early thoughts on the evolution of generative code for people everywhere.
My background
I studied Comparative Literature and Global Affairs at Yale. I focused on statecraft, postcolonial theory, and political philosophy (and spent zero days learning to code). My work examined how narrative, policy, and networks shape the social fabric in a globalized world. After graduating, driven by the same macro forces, I turned to investing and technology. I began at D.E. Shaw, then LinkedIn, and now Google. I've worked with SQL and Python for data analysis but only recently began building software myself.
My Tech Stack
My reflections are all based on my experience with Windsurf, with websites hosted by Netlify. I experimented with Lovable, Bolt, and Cursor when starting out as well. I found the agentic IDEs the most intuitive, and I liked that they would give me the most long-term flexibility. Windsurf was more agent-first, which is what I was interested in, so I chose it over Cursor.
I use Netlify to host all of my websites, which allows me to deploy directly from the IDE.
Four Projects and Some Takeaways
I've worked on four essentially 100% generated personal projects so far, each with its own quirks and lessons.
Interactive Personal Portfolio
In the span of about seven days, I created this interactive personal portfolio.
My goal was to reimagine my existing portfolio – a static set of digital art and writing – into an interactive playscape that encouraged curiosity and discovery.
I aimed to do this through things like: new UX interactions requiring discovery for simple tasks (eg: the bouncing circle for navigation), and unconventional use of established paradigms to nudge exploration (eg: the shopping metaphor to encourage engagement with questions, images and poems).
[Demo videos: Shop, View Cart, Checkout, Print Receipt]
Most Fun
Leaning on the model to increase efficiency while testing out how 'aligned' our taste and creativity were.
For example, requesting:
- "Five different fortune teller personalities and random selection among them for our poem generation prompt"
- "50 questions touching on the viewer's current state, experience in the gallery, and childhood to gain information for their poem, using personal data similar to that in my own writing"
Most Frustrating
Formatting the 'receipt' for checkout.
It was challenging to identify which DOM elements were causing extra white space.
The model also consistently failed to dynamically adjust the html2canvas capture height based on poem length, due to poor sequencing of the poem container's height calculation. Being unseasoned, I took a while to figure this out.
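For anyone who hits the same wall: the shape of the fix is to render the poem first, let layout settle, and only then measure the container and hand that height to html2canvas. A minimal sketch under those assumptions (the element IDs and function name are hypothetical, not the portfolio's actual code):

```javascript
// Sketch: render the poem, wait for layout, then measure and capture.
// Assumes html2canvas is loaded; '#receipt' and '#poem' are hypothetical IDs.
async function captureReceipt(poemText) {
  const receipt = document.getElementById('receipt');
  const poem = document.getElementById('poem');

  poem.textContent = poemText;                 // 1. render the poem first
  await new Promise(requestAnimationFrame);    // 2. let layout settle

  const height = receipt.scrollHeight;         // 3. measure AFTER layout
  const canvas = await html2canvas(receipt, {
    height,                                    // capture the full dynamic height
    windowHeight: height,                      // avoid clipping to the viewport
  });
  return canvas.toDataURL('image/png');
}
```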
More Demos and Details
Chrome Extension for In-Flow Understanding
My latest exploration has been a Chrome Extension for In-Flow Understanding.
I am often reading things and pasting them into Claude or ChatGPT to understand a key concept, etc. Tired of pasting between tabs, I thought it would be wonderful to be able to right-click to understand.
I created an initial Q&A MVP with the current page as context in under 60 minutes. I've been expanding it since then (persistent cache with contextual RAG, audio modes, customizable UI, etc.).
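For a sense of scale, the core of the right-click flow is only a few lines of Manifest V3 background script. A rough sketch, with a hypothetical menu id and a placeholder where the model call would go (not the extension's actual code):

```javascript
// background.js (Manifest V3 service worker), illustrative sketch only.
// Requires the "contextMenus" permission in manifest.json.
chrome.runtime.onInstalled.addListener(() => {
  chrome.contextMenus.create({
    id: 'explain-selection',          // hypothetical menu id
    title: 'Explain "%s"',            // %s becomes the selected text
    contexts: ['selection'],
  });
});

chrome.contextMenus.onClicked.addListener((info, tab) => {
  if (info.menuItemId !== 'explain-selection') return;
  // Hand the selection (plus page context) to the model call of your choice;
  // that part is omitted here.
  console.log('Explain:', info.selectionText, 'from', tab && tab.url);
});
```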
Most Fun
A growing excitement around pushing the limits of what's possible by chaining together resources across constrained environments. For example, how far can I transform the browsing experience through an extension alone?
Most Frustrating
Overcomplicated code requiring refactors, and a comedy of errors :). For example, the Listen to Page feature went from 90% of the way there to days of debugging, due to an overcomplicated, inefficient approach to audio processing and, in turn, worsening dependency management as the codebase grew.
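For contrast, the simplest possible version of "listen" can lean on the browser's built-in Web Speech API. This is an illustrative alternative, not the extension's actual implementation, but it shows how small the core idea can be:

```javascript
// Illustrative alternative using the built-in Web Speech API,
// not the extension's actual audio implementation.
function listenTo(text) {
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.rate = 1.0;                    // normal speaking rate
  window.speechSynthesis.cancel();         // stop anything already playing
  window.speechSynthesis.speak(utterance); // queue the new utterance
}

// Example: read the current selection aloud.
listenTo(window.getSelection().toString() || 'Nothing selected.');
```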
Valentine's Website
In a few hours, I created this Valentine's website for my husband.
[Demo videos: Enter Quest, Unlock Memory Bank, Explore Memories, Decrypt Activities]
Most Fun
Unleashing the model's "creativity" by aligning on a design 'vibe' up front and then allowing significant freedom in implementation throughout the site.
Most Frustrating
Again, formatting. I had some challenges formatting the gallery image that varied significantly from the others (the last one).
Brand-Inspired Website
In a matter of minutes, I made this very simple "Balenciaga vibe" website for my sister to deliver her Valentine's present – she is studying fashion in Paris, hence the theme.
Most Fun
Sheer speed. I created this simply to make my Venmo gift of a 'cafe date' a little more delightful. It took just a few high-level prompts; it was actually faster than using a canned tool like Squarespace, and more personal.
Most Frustrating
Styling limitations. My original vision was a quaint, old-fashioned Parisian cafe vibe, but to no avail. I tried 'old-fashioned French cafe', 'sketched Parisian cityscape', and 'old-fashioned postcard', all of which were way too cheesy. I descoped to the minimalist haute-fashion vibe with a clear brand reference instead.
Practical notes for effective building
Newcomers to building software should be prepared to get into the weeds even when building through prompting. Areas where I've invested time:
- Refining few-shot prompting to minimize error-prone iteration. I've found the best 'few-shot' prompts include broad context on my goals (so the model can take creative liberties aligned with the overarching objective), outline key features, and specify the UI vibe. Then I coax the model to 'keep going' until we have a working prototype. I've also found that asking the model to "follow best practices to [e.g., implement a cart feature]" is useful.
For example, sharing "This is going to be a Valentine's day website for my husband who loves retro tech and is Japanese" gave the model a clear direction for design leeway.
- Co-deliberating on approach before implementing. This includes examining the model's code to acquire new vocabulary for precise prompting, and asking questions that guide the model toward good decisions.
For example, I frequently ask questions like: "Is this the simplest way?" "What are some other options, and the pros & cons?" And for Git commands, often: "What will that command do?" or "That's the wrong command (sic)".
- Building a testing and debugging toolset. The models excel at writing their own tests and logs, and at walking you through where and how to monitor them. I've also found custom debugging interfaces useful, especially when richer visual feedback is helpful; see the sketch after this list.
For my interactive portfolio, I built a custom tool to track user responses and poem views. When prompted to generate a debugging interface, the model dreamed up a tiny, animated terminal-like tool that served my exact purpose: check question capture and check poem capture.
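To make that last point concrete, a debugging overlay in this spirit can be remarkably small. A rough sketch of a terminal-like panel (illustrative only, not the tool the model actually generated):

```javascript
// Tiny terminal-style debug overlay (illustrative, not the generated original).
// Call debugLog() wherever responses or poem views are captured.
const panel = document.createElement('pre');
Object.assign(panel.style, {
  position: 'fixed', bottom: '1rem', right: '1rem', width: '320px',
  maxHeight: '40vh', overflowY: 'auto', padding: '0.5rem', margin: '0',
  background: '#111', color: '#0f0', font: '12px monospace', zIndex: '9999',
});
document.body.appendChild(panel);

function debugLog(category, detail) {
  panel.textContent += `[${new Date().toLocaleTimeString()}] ${category}: ${detail}\n`;
  panel.scrollTop = panel.scrollHeight;    // keep the newest entries in view
}

// Hypothetical capture points:
debugLog('question', 'Q12 answered');
debugLog('poem', 'poem #3 viewed');
```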
Current limitations and mitigations
While using codegen to build something like the Valentine's Website is quite straightforward, there are fundamental limitations - relevant to everyone, not just newcomers - that make more complex projects trickier.
Here's my stack-rank of these limitations by level of pain and how I mitigate them.
- Code Quality: There are still challenges with producing error-free, efficient code. Even functional solutions are often overcomplicated, as the models always seem to want to do more. Recent improvements like automatic lint-error correction and bringing local processes into context (so the model can read console logs) help, but overcomplication, structural mistakes, and inefficient architecture choices remain even in the latest models.
As a mitigation, for simple changes I remain vigilant during execution and intervene with corrections (for example, "we already have a function for that, use it"). For more complex features or changes, I ask the model to explain its plan before executing. The model returns a thorough description and code snippets; I then ask follow-up questions, using this as an opportunity to proactively identify errors and interrogate the approach, ensuring we make efficient, robust design choices.
- Context Limitations: There are also challenges with context retrieval, especially as the codebase grows, given context window limits. Large refactors or changes with cross-cutting dependencies can become tricky to get right. Context limitations, paired with the models' tendency to overcomplicate solutions, can produce a downward spiral that becomes hard to vibe-climb out of. I've encountered this with both Sonnet 3.5 and 3.7. Unfortunately, as of publishing I have not been able to gather data on Gemini 2.5 Pro (1M context) due to rate limiting.
As a mitigation, I ask the model to carefully consider the entire codebase (in chunks) and form a methodical plan before implementing. The model usually iterates through key files and aims to identify dependencies before proposing a solution. In my experience this gets ~80% of the way there; the final 20% requires iteration.
- Error Loops: The model repeatedly generates self-identified 'bad code' and attempts to correct it, regenerating its answer for a given prompt until finally timing out.
As a mitigation, I start a new chat with fresh context.
- Extraneous Changes: The model introduces changes unrelated to what you've asked for, sometimes as extreme as restyling large portions of the UI in response to a targeted request ("lower the scroll anchor for the receipt"). I encountered this frequently with Sonnet 3.5 while developing the interactive portfolio.
As a mitigation, when requesting targeted changes I ask the model to "focus specifically on this request and make no unrelated changes".
Note: With newer models, the frequency of Error Loops and Extraneous Changes seems to have decreased. SWAG estimate: I encounter them 5x less often with Sonnet 3.7 vs 3.5. A likely confounding variable is improvement in my own prompting.
Wrapping up
On “generative tools” and mass consumer appeal
I see high potential for people from a range of backgrounds to begin building, perhaps without even realizing it.
I imagine this taking shape in emergent and ephemeral ways: small tools generated on demand to suit a specific personal use case, with a look and feel made personal by the surrounding context, like my little debugging tools or my personal Chrome Extension.
What if anyone could summon personal programs like these? Maybe the best personal 'agent' is the one that gives me simple yet powerful agency over what it can do, and how.
I believe even the average consumer (heuristics: my parents in Montana, my GenZ sister) would find sticky delight and usefulness in the ability to generate their own tools - shaping function and form - just as folks have enjoyed generative images, answers, and text.
We could call these personal programs 'declarative agents' or, more aptly, 'generative tools'. I cautiously imagine them as possibly the first truly revolutionary, useful consumer application of generative AI.
Generative tools would be both personal and empowering. Built for me, “by” me, on demand. Customizable to my taste. Philosophically, ushering in a future where automation enables command over one's tasks rather than separation from them.
In this way, generative tools might be a small, regular reminder of one’s own agency - a critical component of what gives people a sense of meaning - by demonstrating in small and consistent ways: with your ideas and actions you can shape your reality.
The technical foundations feel nearly there. Still, there is a gap in commercial demand: a maker mindset shift is needed, one where people dream a little more ("wouldn't it be nice if I could ____"). How might we bridge this gap through what we build for people now, beginning to elevate tool user to maker?