Founder Playbook · The Bootstrapped Founder
9 tactics from Michael Taylor
Michael Taylor — Prompt Engineering for Fun & Profit
“you need to get more scientific with something that's going to be in production and that could be as simple as you know just running that prompt like 30 times um and then pasting it into Google doc right um and and then just reviewing like okay like how often does it do bad things”
Run the prompt 100 times asynchronously, then review the failures
Running a prompt once and shipping it is the prompt-engineering equivalent of medieval bloodletting — a lucky hit mistaken for a working system. Before any prompt goes into production, run it 30-100 times asynchronously, dump outputs into a doc, and cluster the failures. Bulk review surfaces real failure modes instead of the one good output that tricked you.
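A minimal sketch of that bulk review loop, assuming an async model client. `call_model` is a hypothetical stub with fake variance so the sketch runs offline, and the `label_output` failure labels are illustrative placeholders, not categories from the episode. A semaphore caps concurrency so 100 runs don't trip rate limits.

```python
import asyncio
from collections import Counter


# Hypothetical stand-in for a real model client. Replace with your provider's
# async API call; the fake variance below exists only so the sketch runs offline.
async def call_model(prompt: str, run_id: int) -> str:
    await asyncio.sleep(0)  # yield control, as a real network call would
    if run_id % 7 == 0:
        return "I'm sorry, I can't help with that."
    if run_id % 11 == 0:
        return ""
    return f"Run {run_id}: a usable answer to {prompt!r}"


def label_output(text: str) -> str:
    """Crude, hypothetical failure labels; a real review grows its own list."""
    if not text.strip():
        return "empty"
    if text.lower().startswith(("i'm sorry", "i can't")):
        return "refusal"
    return "ok"


async def run_batch(prompt: str, n: int = 30, concurrency: int = 10) -> list[str]:
    """Fire the same prompt n times concurrently, keeping results in run order."""
    sem = asyncio.Semaphore(concurrency)  # cap in-flight requests (rate limits)

    async def one(run_id: int) -> str:
        async with sem:
            return await call_model(prompt, run_id)

    return list(await asyncio.gather(*(one(i) for i in range(n))))


def format_review(outputs: list[str]) -> str:
    """Plain-text dump to paste into a shared doc, each run tagged with its label."""
    return "\n\n".join(
        f"--- run {i} [{label_output(out)}] ---\n{out}" for i, out in enumerate(outputs)
    )


if __name__ == "__main__":
    outputs = asyncio.run(run_batch("Summarize this episode in three bullets", n=100))
    print(Counter(label_output(o) for o in outputs))  # how often does it do bad things?
    print(format_review(outputs)[:300])
```

The label tally gives the failure rate at a glance. The pasted dump is where you cluster the failure modes the labels don't catch yet.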
“if you keep going on trial and error I think that's when you get to these incantations because uh you I maybe I think after you've been working on that problem for too long and you seen too many versions of the same thing you start to get weird and uh you know that's when you start casting spells”
Trial-and-error past the working point turns into casting spells
There's a window where iterating on a prompt is productive — after that, it turns into superstition. Tweaking the same prompt for hours produces longer, weirder text that feels powerful but is just correlation mistaken for causation. Get to 'working okay' fast, then switch out of trial-and-error into systematic A/B testing across 100 runs per variant.
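One way to make the A/B switch concrete is sketched below. It assumes you already have a pass/fail check per output, and `generate` and `check` are hypothetical hooks. The pooled two-proportion z-test is a standard statistical choice added here for illustration; the episode does not prescribe a specific test.

```python
import math
from typing import Callable


def run_variant(generate: Callable[[], str], check: Callable[[str], bool], n: int = 100) -> list[bool]:
    """Run one prompt variant n times; generate() and check() are hypothetical hooks."""
    return [check(generate()) for _ in range(n)]


def pass_rate(results: list[bool]) -> float:
    return sum(results) / len(results)


def two_proportion_z(a: list[bool], b: list[bool]) -> float:
    """Pooled two-proportion z-score for pass-rate(B) minus pass-rate(A)."""
    pooled = (sum(a) + sum(b)) / (len(a) + len(b))
    se = math.sqrt(pooled * (1 - pooled) * (1 / len(a) + 1 / len(b)))
    return 0.0 if se == 0 else (pass_rate(b) - pass_rate(a)) / se


def compare(a: list[bool], b: list[bool], z_threshold: float = 1.96) -> str:
    """|z| >= 1.96 is roughly a 95% two-sided test; below that, call it a tie."""
    z = two_proportion_z(a, b)
    if abs(z) < z_threshold:
        return f"tie (A={pass_rate(a):.0%}, B={pass_rate(b):.0%}, z={z:.2f})"
    winner = "B" if z > 0 else "A"
    return f"{winner} wins (A={pass_rate(a):.0%}, B={pass_rate(b):.0%}, z={z:.2f})"
```

With illustrative counts of 80/100 passes for A and 90/100 for B, `compare` returns "B wins (A=80%, B=90%, z=1.98)", only just past the threshold. A ten-point gap on 100 runs per variant is borderline. Smaller gaps need more runs before one incantation can be declared better than another.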
“you have to kind of approach it more like um a researcher studying human behavior in a way you know you have to kind of um kind of observe what's happening in different um extreme cases um and then uh and then and then you start to form a pattern”
Treat the model like a tribe to observe, not code to debug
Models aren't deterministic functions; they refuse, hallucinate, and drift for no apparent reason. Working with them is closer to anthropology than software engineering: observe edge cases, log refusals, note the percentage of weird outputs, and build a mental model over many trials. The black box doesn't yield to debugging, only to patient observation.
“you would never hire a uh human employee and then not give them a brief on the type of tasks that you want to do right you wouldn't hire an agency and say um you you make up the marketing campaign I don't care you know you would say okay here's the brief here's the kind of thing I'm looking for”
Prompt principles are just manager onboarding: brief, format, examples
The same instincts that make a good manager make a good prompter: give clear direction, specify the format (JSON, bullets, paragraph count), supply examples of past good work, inject knowledge the model doesn't have, and state opinions the average person wouldn't. These principles survived the GPT-3 to GPT-4 jump and will likely outlive GPT-5.
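The onboarding checklist maps onto a simple prompt template. This is a minimal sketch in which the `Brief` fields mirror the principles above (direction, format, examples, knowledge, opinions). The field names and section labels are illustrative assumptions, not a format from the episode or any library.

```python
from dataclasses import dataclass, field


@dataclass
class Brief:
    """The same brief you would hand a new hire; all field names are illustrative."""
    task: str                                          # clear direction
    output_format: str                                 # e.g. "JSON with keys title, summary"
    examples: list[str] = field(default_factory=list)  # past good work to imitate
    context: list[str] = field(default_factory=list)   # knowledge the model lacks
    opinions: list[str] = field(default_factory=list)  # stances the average person wouldn't take


def build_prompt(brief: Brief) -> str:
    """Assemble the brief into one prompt string, skipping empty sections."""
    sections = [f"Task: {brief.task}", f"Format: {brief.output_format}"]
    if brief.context:
        sections.append("Background:\n" + "\n".join(f"- {c}" for c in brief.context))
    if brief.opinions:
        sections.append("Our opinions:\n" + "\n".join(f"- {o}" for o in brief.opinions))
    for i, example in enumerate(brief.examples, start=1):
        sections.append(f"Example {i}:\n{example}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    print(build_prompt(Brief(
        task="Write a tweet announcing a new podcast episode.",
        output_format="One tweet, under 280 characters, no hashtags.",
        examples=["New episode: why we stopped chasing MRR and started chasing retention."],
        opinions=["Specific numbers beat adjectives."],
    )))
```

Keeping the brief as structured fields rather than one hand-edited blob makes it easy to A/B a single principle, such as adding or dropping examples, without touching the rest.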
“now I'm writing the stuff where I'm like I'm going out and finding something new about the world and then I'm becoming the training data for the next version of ChatGPT uh you know think you want yeah if if you can if you can be in the training data um more than you're using the training data then I think that's a good balance”
Be in the training data, not just a user of it
Generic AI-generated content is boilerplate the next model will produce for free. The durable founder edge is primary research — running experiments, collecting fresh data, finding the holes in what ChatGPT confidently asserts, and writing up what only direct observation could uncover. That output becomes the corpus future models lean on, a much stronger position than competing with them on their home turf.
“I feel like people are going to end up doing a lot more primary research being an indie hacker will be much more about carving off a specific niche of all the problems left in the world and actually going and running experiments to figure it out”
Indie hackers win on the experiments nobody else bothers to run
As AI commoditizes general knowledge, audience advantage comes from picking a specific niche and being the one running real experiments in it. Generic content gets ignored; experiment-backed niche content gets cited. Pick 1-2 narrow founder problems and publish raw experiment logs with numbers — the audience compounds around the person who runs the tests nobody else does.
“I don't want it to say yes to something that is a no but it's fine for me to for it to say no to something that that might be a yes I don't care about the the false negatives that those are acceptable because I have like 20,000 podcast episodes coming in in a day”
Pick the asymmetric error tolerance before tuning the prompt
When the input firehose is big, optimize the prompt for precision, not recall. A wrong 'yes' burns user trust on every notification; a missed 'yes' is invisible. Encode that asymmetry directly into the eval scoring before tuning: penalize false positives ~10x more than false negatives, then chase that number. Arvid hit 99.8% on a podcast classifier doing exactly this.
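A minimal sketch of that scoring rule, assuming you have a labeled set of yes/no examples. The 10:1 weighting follows the ~10x figure above. The function name and the convention that precision is 1.0 when the prompt never says 'yes' are illustrative choices.

```python
def asymmetric_score(preds: list[bool], labels: list[bool],
                     fp_weight: float = 10.0, fn_weight: float = 1.0) -> dict:
    """Weighted error cost (lower is better): a wrong 'yes' costs fp_weight, a missed 'yes' fn_weight."""
    tp = sum(p and y for p, y in zip(preds, labels))
    fp = sum(p and not y for p, y in zip(preds, labels))
    fn = sum(not p and y for p, y in zip(preds, labels))
    # No 'yes' calls means no wrong 'yes' calls, so precision defaults to 1.0.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    return {
        "tp": tp, "fp": fp, "fn": fn,
        "precision": precision, "recall": recall,
        "cost": fp_weight * fp + fn_weight * fn,
    }


if __name__ == "__main__":
    labels = [True, True, True] + [False] * 7          # 3 real matches in 10 items
    cautious = [True, False, False] + [False] * 7      # misses two, never wrong
    eager = [True, True, True, True] + [False] * 6     # catches all, one false alarm
    for name, preds in [("cautious", cautious), ("eager", eager)]:
        print(name, asymmetric_score(preds, labels))
```

On this toy set the cautious prompt costs 2 (two misses) and the eager one costs 10 (one false alarm). The cautious prompt wins even though its recall is far lower, which is exactly the trade this tactic asks for.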
“the book is much more in the vein of like an O'Reilly book where it kind of it explains these topics in a comprehensive way whereas the Udemy course is much more of like a quick hit because that's what like what Udemy people want the Udemy course blew up right so it was like way more successful than our our our tech business”
Repackage one body of IP into multiple formats — Udemy out-earned the SaaS
One body of knowledge — prompting principles — was repackaged into a blog post, an O'Reilly book, and a Udemy course. The Udemy course ended up out-earning the underlying SaaS. Audience expansion comes from format, not new content. Different formats serve different buyer intents: cerebral readers buy the book, quick-hit learners buy the video course.
“when you're managing 50 people it's you know there's 30 days in the month so uh it's at least one person's like worst day of the month like every day right there's always someone's bad day happening that day and you you're the one that kind of it filters up to and I kind of feel like that's the same with when I'm managing models”
Manage models like a 50-person workforce — someone's always having a bad day
Stop treating prompt failures as bugs to debug deterministically. Treat them as variance from a workforce — expect a percentage of bad outputs, build review and filtering layers, and don't waste hours chasing the 1-in-100 refusal as if it were reproducible. The model has 'bad days'; the production system needs to filter them, not eliminate them.
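A filtering layer can be as small as the sketch below: retry a job a few times, accept the first output that passes a check, and route anything still failing to a human review queue instead of debugging it. `looks_ok` is a hypothetical acceptance check, and the retry count of 3 is an arbitrary default, not a figure from the episode.

```python
from typing import Callable, Iterable, Optional


def generate_with_filter(generate: Callable[[], str],
                         is_acceptable: Callable[[str], bool],
                         max_attempts: int = 3) -> tuple[Optional[str], int]:
    """Retry through the model's 'bad days'; give up after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        output = generate()
        if is_acceptable(output):
            return output, attempt
    return None, max_attempts  # caller routes this job to human review


def process_batch(jobs: Iterable[tuple[str, Callable[[], str]]],
                  is_acceptable: Callable[[str], bool],
                  max_attempts: int = 3) -> tuple[list[tuple[str, str]], list[str]]:
    """Split jobs into accepted outputs and a human-review queue of job ids."""
    accepted: list[tuple[str, str]] = []
    review_queue: list[str] = []
    for job_id, generate in jobs:
        output, _ = generate_with_filter(generate, is_acceptable, max_attempts)
        if output is None:
            review_queue.append(job_id)
        else:
            accepted.append((job_id, output))
    return accepted, review_queue


def looks_ok(text: str) -> bool:
    """Hypothetical acceptance check: non-empty and not an obvious refusal."""
    return bool(text.strip()) and not text.lower().startswith("i'm sorry")
```

Tracking the size of the review queue over time also gives you the workforce-style baseline: a steady small percentage is normal variance, while a sudden jump is the signal worth investigating.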