The Script Solution: Why I Make LLMs Write Code Instead of Doing the Work

Here's a pattern I keep running into: I ask an LLM to analyze data, and it happily obliges. Then I ask it again with the same data, and I get a different answer. Sound familiar?

After 25 years of building software and leading remote teams, I've learned that consistency and reproducibility aren't just nice-to-haves—they're fundamental to serious software development. But when you're working with Large Language Models, these principles start to break down in interesting ways.

The Problem: LLMs Want to Do Everything

I've been working on several projects recently where I'm dealing with complex data analysis. In one case, I'm orchestrating multiple agents running tests across different datasets for a research paper. In another, I'm generating and validating test data across structured datasets. The common thread? I need consistent, deterministic results that I can reproduce.

The LLM seems eager to help. Give it a task, and it'll jump right in—analyzing data directly, processing information on the fly, generating outputs. And here's the thing: it's actually pretty good at it. Until you need to run it again.

That's when you discover the fundamental issue: LLMs are not deterministic. Ask the same question twice, and you'll likely get two different interpretations, two different analyses, two slightly different outputs. For exploratory work or brainstorming, that's fine. But when you're doing research, building production systems, or need reproducible results? That's a dealbreaker.

The Pattern That Actually Works

What I've found—and this has held up across multiple projects now—is a simple but reliable pattern: Don't let the LLM do the work. Let it write the script that does the work.

Instead of asking the LLM to analyze my data, I ask it to generate a script that analyzes my data. Instead of having it process information directly, I have it create code that processes the information. The LLM becomes a code generator, not a data processor.

The difference is night and day.

Why This Matters

When you have a script, you have something deterministic (provided you pin your dependencies and seed any randomness). Run it once, run it a million times—you'll get the same analysis every single time. The interpretation doesn't drift. The logic doesn't change. The results are reproducible.
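To make the contrast concrete, here's the kind of small, self-contained script I'd ask the LLM to produce instead of pasting the data into a chat window. The data and service names are invented for illustration; the point is that every run yields identical output:

```python
from statistics import mean

# Invented example data: response times (ms) per service -- the kind of
# structured data I'd otherwise hand to the LLM directly.
measurements = {
    "auth": [120, 135, 128, 142],
    "search": [310, 295, 305],
    "checkout": [88, 92, 90, 95, 91],
}

def summarize(data):
    # Sort the keys so report order never depends on insertion order.
    return {svc: round(mean(times), 2) for svc, times in sorted(data.items())}

report = summarize(measurements)
print(report)
```

Trivial, yes—but unlike a chat answer, the rounding, the ordering, and the definition of "average" are all pinned down in code you can diff and rerun.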

This is especially critical when you're working on research or building systems that others will depend on. You need to be able to point to your methodology and say, "Here's exactly what happened, and here's how you can verify it."

The LLM's Natural Tendency

Here's something interesting I've noticed: the LLM almost wants to wrest control back. You'll give it a task to generate a script, and it'll try to do the task itself instead. It's like it defaults to being the executor rather than the architect.

I've had to get explicit in my prompts: "Generate a script for this. I need the script itself, not the results." Because left to its own devices, the LLM will try to be helpful by doing the analysis directly—which completely defeats the purpose.
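In practice this means wrapping every task in boilerplate along these lines. The wording here is just my own phrasing, not a magic incantation, and `ask_llm` stands in for whatever client library you happen to use:

```python
# Preamble that pushes the model into code-generator mode rather than
# executor mode. Adjust the wording to taste -- the explicitness is the point.
SCRIPT_ONLY_PREAMBLE = (
    "Generate a standalone script that performs the task below. "
    "Return the script itself, not the results. Do not run the analysis, "
    "do not summarize the data, and do not include sample output."
)

def script_request(task: str) -> str:
    """Wrap a task description so the LLM is asked for code, not answers."""
    return f"{SCRIPT_ONLY_PREAMBLE}\n\nTask: {task}"

# Usage with a hypothetical client function:
#   script = ask_llm(script_request("Compute per-service mean latency from metrics.csv"))
prompt = script_request("Compute per-service mean latency from metrics.csv")
print(prompt)
```

Keeping the preamble in one place also means that when you find a phrasing that reliably stops the model from "helpfully" doing the analysis itself, every task benefits at once.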

Where This Pattern Shines

This approach has been invaluable for several scenarios I've been working through:

Data Analysis: When you need to analyze structured data consistently across multiple runs, having a script ensures your logic stays constant.

Test Generation and Orchestration: When you're coordinating multiple agents or tests, reproducibility is everything. Scripts give you that.

Research Validation: If you're doing any kind of research work where others need to verify your findings, deterministic scripts are non-negotiable.

Data Validation: When you need to validate data against specific rules repeatedly, a script ensures those rules are applied identically every time.
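As a sketch of that last case, here's what an LLM-generated validation script might look like. The records and rules are made up for illustration, but because the rules live in code rather than in a prompt, they're applied identically on every run:

```python
# Invented example: validate user records against a fixed set of rules.
RECORDS = [
    {"id": 1, "email": "a@example.com", "age": 34},
    {"id": 2, "email": "not-an-email", "age": 28},
    {"id": 3, "email": "c@example.com", "age": -5},
]

# Each rule is a (name, predicate) pair; adding a rule is a one-line diff.
RULES = [
    ("email has @", lambda r: "@" in r["email"]),
    ("age is non-negative", lambda r: r["age"] >= 0),
]

def validate(records):
    """Return a list of (record id, failed rule name) pairs."""
    failures = []
    for record in records:
        for name, check in RULES:
            if not check(record):
                failures.append((record["id"], name))
    return failures

for record_id, rule in validate(RECORDS):
    print(f"record {record_id} failed: {rule}")
```

Ask an LLM to "check this data" twice and you may get two different notions of what counts as valid; run this script twice and you get the same failures in the same order.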

The Builder's Mindset

This is really about thinking like a builder rather than a consumer of AI. Sure, LLMs can do impressive things directly. But when you're building serious software, you need to think about the properties of your system: Is it deterministic? Is it reproducible? Can someone else verify the results?

Using LLMs to generate scripts rather than generate outputs directly gives you the best of both worlds. You get the speed and creativity of AI-assisted development, but you maintain the rigor and reproducibility that professional software development requires.

If you're working with LLMs on complex data tasks and finding yourself frustrated by inconsistent results, try shifting your approach. Stop asking the LLM to do the analysis. Start asking it to write the code that does the analysis. Your future self—and anyone trying to reproduce your work—will thank you.