Kirk Williams
 • 
AI in PPC

The Probabilistic Problem With Autonomous PPC Agents

Date Published: 
June 18, 2026
Last Update: 
June 18, 2026

LLMs Are Probabilistic: Why That Changes Everything for PPC Agents

There's a technical distinction about how large language models actually work that I think is getting lost in most conversations about autonomous agents in PPC. I want to lay it out carefully, because I think it changes the risk calculus considerably for anyone running fully autonomous agents that make live changes in Google Ads accounts. With that said, I greatly respect many of the industry experts I see incorporating agents into their workflows, and I hope this opens up a more direct conversation about the use of autonomous agents in Google Ads and Microsoft Ads accounts.

Caveat: I've given up trying to distinguish between LLMs and AI and have fully accepted the cultural habit of calling LLMs "AI"... so in the rest of this article I'll refer to AI agents, knowing we're not talking about AGI yet.

Key Difference: LLMs Are Probabilistic. Scripts Are Not.

First, I want to think through the key distinctions between two automated ways a Google Ads account is managed: scripts and AI agents.

A Google Ads script executes deterministic logic. You tell it what to do, it does exactly that thing, and you can audit exactly what it did. The same script run a hundred times produces the same output given the same inputs. That's not a limitation of scripts; it's a feature that makes them trustworthy for high-stakes work in live accounts. They analyze, compile, and report on real data without bias or interpretation.

A large language model works differently at a fundamental level. LLMs are probabilistic systems. They're not querying an index or executing a fixed calculation. They generate a response piece by piece, predicting what is statistically likely to come next based on their training data and then sampling from those likelihoods, which means the same prompt given to the same model can, and occasionally does, produce meaningfully different outputs.

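From what I understand, the underlying model doesn't change between runs, but because each response is sampled fresh, in practice it's something like working with a slightly different version of the model each time you run it. This is a massively crucial point... don't miss it!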

So fundamentally, this means that if an agent runs 120 times in your Google Ads account, you're working with 120 slightly different versions of that agent. The 121st run carries real risk that has nothing to do with whether your instructions were clear enough, because the risk isn't primarily in what you've already anticipated, or accounted for in your guidelines (we'll talk about that next).
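To make the script-versus-model distinction concrete, here is a minimal sketch of the two decision styles. It is a toy, not how any real LLM or Google Ads agent is built: the action names, the scores, and the temperature value are all hypothetical, and a real model samples word pieces rather than whole actions. The only point is the contrast between always picking the top-scoring option and sampling from a probability distribution.

```python
import math
import random

# Hypothetical actions and scores, invented purely for illustration.
ACTIONS = ["pause keyword", "lower bid 10%", "leave unchanged"]
SCORES = [2.0, 1.6, 0.5]


def softmax(scores, temperature):
    """Turn raw scores into probabilities; a higher temperature flattens them."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def script_decision(scores):
    """Deterministic, script-style: always returns the highest-scoring action."""
    return ACTIONS[scores.index(max(scores))]


def sampled_decision(scores, temperature, rng):
    """Probabilistic, model-style: draws one action according to its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(ACTIONS, weights=probs, k=1)[0]


rng = random.Random(7)  # seeded only so this demo is repeatable
script_runs = {script_decision(SCORES) for _ in range(120)}
sampled_runs = [sampled_decision(SCORES, 1.0, rng) for _ in range(120)]
counts = {action: sampled_runs.count(action) for action in ACTIONS}

print("script-style outcomes across 120 runs:", script_runs)
print("sampled outcomes across 120 runs:", counts)
```

Across 120 runs the script-style function only ever produces one answer, while the sampled version spreads its answers across the options, including the ones it considers less likely. Lowering the temperature narrows that spread, but as long as sampling is involved, a narrower spread is not the same thing as a guarantee.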

Big Caveat: I want to be careful here because I'm not an AI engineer and I may be missing nuance in how this works technically, so if you identify errors in my thinking, please let me know so I can learn!

But if the general principle is accurate (and multiple people I've spoken with who are deep in this work, along with Claude and Gemini ;), have confirmed that it is), then it has significant implications for how we think about autonomous agents making live changes in accounts.

"I've Given It Clear Guidelines" May Not Mean What You Think It Means

So let's talk more about what to do with a probabilistic engine making changes in your account when it is effectively a different agent each time (per my point above). The most common thing I hear from practitioners deploying agents is some version of "I've given it very specific guidelines, so I know it'll behave consistently." That sounds reassuring, but if the probabilistic nature of LLMs is real (it is), then guidelines are essentially narrowing the road the model drives on, not controlling exactly where it goes within that road. The model is still predicting rather than calculating. You've constrained the space of possible outputs. You haven't made the output deterministic.

But even more importantly in this conversation, you can only create guidelines around what you already know needs a guideline!! This means the real risk is in what you haven't yet realized you should have accounted for, and by definition, you can't write a guideline for something you haven't thought of yet.

It's not what you have already "guidelined" that is the greatest risk, it's what you didn't yet realize you should have "guidelined" where the true risk is in an account.

(read that last sentence again, it's important)

The Audit Problem

Ok, so there is risk (anyone understands that, it's why the idea of guidelines even exists), but let's add another layer of risk to this idea. With a script, you can audit the work. You can see exactly what it did, in what order, and why, because it doesn't think, it calculates. The process is transparent by definition, since its inputs and operations are fixed and verifiable.

Consider something concrete: n-gram analysis on a search term report with 2,000 rows. A Python script reading that data processes every row, and you can verify that it did. An LLM given the same task may process all the data if it fits in the context window, but you can't confirm which rows it weighted heavily, which it skimmed, or where it started approximating. The answer it returns will look exactly the same whether it processed everything carefully or quietly skipped sections. This is the specific scenario I keep coming back to when thinking about autonomous agents in ad accounts, because auditing is actually more important for AI than for scripts, not less, precisely because AI has more freedom within its guidelines.
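For comparison, here is roughly what the script side of that n-gram example could look like. This is a minimal sketch, not a production tool: the column names (search_term, cost, conversions) and the sample rows are assumptions made for illustration, and a real search term report would be loaded from an export rather than typed in.

```python
def word_ngrams(text, n):
    """Split a search term into overlapping n-word phrases."""
    words = text.lower().split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_report(rows, n=2):
    """Aggregate cost and conversions per n-gram, counting every row touched."""
    totals = {}
    rows_processed = 0
    for row in rows:
        rows_processed += 1
        # set() so a phrase repeated inside one search term counts once per row
        for gram in set(word_ngrams(row["search_term"], n)):
            stats = totals.setdefault(
                gram, {"cost": 0.0, "conversions": 0.0, "rows": 0}
            )
            stats["cost"] += row["cost"]
            stats["conversions"] += row["conversions"]
            stats["rows"] += 1
    return totals, rows_processed


# Hypothetical sample rows; the column names are assumptions for this sketch.
sample_rows = [
    {"search_term": "buy red shoes", "cost": 10.0, "conversions": 1.0},
    {"search_term": "red shoes cheap", "cost": 5.0, "conversions": 0.0},
    {"search_term": "shoes", "cost": 2.0, "conversions": 0.0},
]

totals, rows_processed = ngram_report(sample_rows)

# The audit step: the script can prove it touched every row it was given.
assert rows_processed == len(sample_rows)
print(f"rows processed: {rows_processed} of {len(sample_rows)}")
print("red shoes:", totals["red shoes"])
```

The part worth noticing is the row counter. The script doesn't just return an answer, it returns evidence that every row went through the same logic, which is exactly the kind of confirmation you can't get from a model's summary of the same report.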

With an LLM agent, you get an interpreted answer that you are unable to audit without stopping, and asking direct (multiple) questions about the process (which ironically, is the opposite of an autonomous agent btw... but that's the point of this article, right? We're not anti-AI here at ZATO, we're "anti-autonomous agents in their current state" at ZATO... gosh that might be the first time I've uttered that out loud). The autonomous model returns output that looks complete and confident regardless of how it arrived at that output, and it won't tell you when it glossed over something, approximated rather than analyzed, or weighted certain inputs more heavily than others.

Compounding Errors and Silent Failures

Let's continue with the next layer of concern I have with autonomous agents. Scripts tend to fail loudly: when something breaks, you know, because the script throws an error and stops. LLM agents fail quietly, and they do it with the same confident tone they use when they're correct, so you don't actually know anything is wrong... well, not unless you had already realized you needed to manually add a guideline for it.

This becomes particularly concerning in multi-step agent loops, where the output of one step becomes the input for the next. A small error that occurs in step 7 doesn't announce itself. It flows into step 8, step 9, step 10, growing as it compounds, and by the time you notice something is off in the account, the chain of errors may be long enough that diagnosing the original source is genuinely difficult. The guardrail problem is real, and quiet compounding failures are one of the more serious expressions of it. This is where guidelines can help, but refer back to my point above... what we're talking about is infinite potential failure points that you cannot out-guideline.
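A bit of back-of-the-napkin arithmetic shows why long chains worry me. This is a simplified sketch under loud assumptions: the 99% per-step figure is made up, and it treats every step as independently right or wrong, which real agent loops almost certainly are not. It is meant to show the shape of the problem, not to measure any actual agent.

```python
def chain_success_probability(per_step_accuracy, steps):
    """Chance that every step in a chain is right, assuming independent steps."""
    return per_step_accuracy ** steps


# 0.99 is a hypothetical per-step accuracy, not a measured number.
for steps in (1, 10, 50):
    p = chain_success_probability(0.99, steps)
    print(f"{steps:>2} steps: {p:.1%} chance that no step went wrong")
```

Under those assumptions, a step that is right 99% of the time gives you roughly a 90% chance of a clean 10-step chain and only about a 61% chance of a clean 50-step chain. And that is just the odds of an error existing somewhere in the chain; it says nothing about how much damage the error does as it flows into later steps.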

Liability and Responsibility

I've written on this elsewhere, but it is worth noting again: the risk that you miss a key guideline and the agent then makes a devastating change is core to this, so I'll refer you to my article here (AI Agents: Why the Math Still Doesn't Make Sense) to dig in more. The short version is that you (or your agency) are personally on the hook for that agent you have set loose in 200 accounts, and the risk of compounding failure could truly be devastating to you. This doesn't mean you shouldn't use AI, but I think it's a "fear" more consultants/agencies/employees should be pondering when they are tempted to "save some time and do more within the hours of my day by using this autonomous agent". If that agent "lets you do more" for 9 months but then you get fired in month 10 because of a compounding error it took you 10 months to identify... was "saving time" really the best outcome?

Where LLMs Actually Shine in PPC Work

Okay, so with that being said, you may be thinking "gosh, this guy hates AI". That is untrue (okay, I do have concerns with what it's doing more broadly to human creativity and community, but that's for another blog post). Nothing in the above is an argument against using AI in PPC. I use multiple AI tools all day, every day, and find them valuable in ways that have completely changed how I work.

Two examples of many for how I have personally used AI to help me: AI-generated landing pages for clients who were previously unable to afford a full-time designer, and incredible SEO growth on the ZATO blog as I've worked closely with Claude to solve technical SEO problems on our site over time.

But the use cases that play to the model's actual strengths are the ones I keep coming back to: insight generation, surfacing patterns across large data sets that a human analyst might miss, helping a manager think through a problem from angles they hadn't considered, drafting copy variations at scale, structuring analysis frameworks.

These use cases work well because they're not asking the model to be deterministically correct about a specific action in a live account with real budget consequences. They're asking the model to be useful in a generative, exploratory way, where the human reviews the output and decides what to do with it. The model's probabilistic nature is actually an asset in that context because it produces varied, creative, sometimes surprising perspectives that a deterministic system wouldn't generate.

The concern I keep coming back to is specifically the autonomous, make-the-change-without-human-review version of agent use, where we're asking a probabilistic system to do a job that I think requires deterministic accountability. Those are different tools suited for different jobs, and treating them as interchangeable is where I think the current conversation about AI agents in PPC campaign management is getting ahead of itself.

Not Never, Just Not Yet

Can LLMs ever be used as fully autonomous agents in Google Ads? Undoubtedly yes! So I want to ensure my actual position is clearly stated here: I understand we are in a transitional phase of continual learning with AI, and we will undoubtedly get to the point where the various risks are more easily mitigated, and the guidelines are more anticipatory and successful without constant human oversight. As one of my ZATO team members, Chris Reeves, recently put it:

"Ultimately, the big question going forward might just be comparing statistical machine failure against human fatigue failure. Know what I mean? Humans make errors as well (like adding an extra zero to a budget on a late Friday afternoon) so eventually people will just ask who makes fewer errors: the human or the machine?"

Practical Question as a Conclusion

Before deploying an autonomous agent to make live changes in an account, I think the question worth asking isn't "have I given it clear enough guidelines?" It's "what happens when it behaves differently on the 121st run than it did on the first 120, and do I have the visibility to catch that before it causes damage?"

If the answer to that question is yes, and you've built actual monitoring infrastructure rather than just trusting the guidelines you wrote, then the conversation about autonomous agents becomes much more interesting. If the answer is "I'll know because the agent will tell me," that's the part I'd push on, because a probabilistic system that fails confidently and silently is not a reliable self-reporter.
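As one small piece of what "actual monitoring infrastructure" could mean, here is a minimal sketch of a deterministic review gate that sits between an agent's proposed changes and the live account. Everything in it is hypothetical: the ProposedChange shape, the field names, and the 20% limit are invented for illustration, and nothing here calls a real Google Ads API. It also only catches what I thought to check for, so it narrows the guideline problem rather than solving it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProposedChange:
    """A change the agent wants to make (hypothetical shape for this sketch)."""
    campaign: str
    field: str
    old_value: float
    new_value: float


# Hypothetical limits; real values would depend on the account.
MAX_RELATIVE_CHANGE = 0.20
ALLOWED_FIELDS = {"daily_budget", "target_cpa"}


def review(change):
    """Return (approved, reason) using fixed, auditable rules."""
    if change.field not in ALLOWED_FIELDS:
        return False, f"field '{change.field}' is not on the allow-list"
    if change.old_value <= 0:
        return False, "old value is missing or zero, so the change can't be measured"
    relative = abs(change.new_value - change.old_value) / change.old_value
    if relative > MAX_RELATIVE_CHANGE:
        return False, f"change of {relative:.0%} exceeds the {MAX_RELATIVE_CHANGE:.0%} limit"
    return True, "within limits"


def triage(changes):
    """Log a decision for every proposal; anything not approved waits for a human."""
    audit_log = []
    for change in changes:
        approved, reason = review(change)
        audit_log.append({"change": change, "approved": approved, "reason": reason})
    return audit_log


proposals = [
    ProposedChange("Brand", "daily_budget", 100.0, 110.0),
    ProposedChange("Shopping", "daily_budget", 100.0, 1000.0),  # the extra zero
    ProposedChange("Generic", "status", 1.0, 0.0),
]

for entry in triage(proposals):
    verdict = "APPLY" if entry["approved"] else "HOLD FOR REVIEW"
    print(f'{entry["change"].campaign}: {verdict} ({entry["reason"]})')
```

The design choice here is that the gate is a plain script, not another model. The agent can propose whatever it wants, but the decision about what reaches the account, and the log of why, comes from logic you can read and rerun.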

I could be wrong about parts of how the underlying technology works, and I'm genuinely open to being corrected on the technical details. But the practical risk patterns (quiet failures, compounding errors, unauditable outputs) seem worth taking seriously regardless of exactly how the probabilistic mechanism works at the model level.

Kirk Williams
@PPCKirk - Owner & Chief Pondering Officer

Kirk is the owner of ZATO, his Paid Search PPC micro-agency of experts, and has been working in Digital Marketing since 2009. His personal motto (perhaps unhealthily so), is "let's overthink this some more."  He even wrote a book recently on philosophical PPC musings that you can check out here: Ponderings of a PPC Professional.

He has been named one of the Top 25 Most Influential PPCers in the world by PPC Hero (now PPCSurvey) 10 years in a row (2016-2026), has written articles for many industry publications (including Shopify, Moz, PPC Hero, Search Engine Land, and Microsoft), and is a frequent guest on digital marketing podcasts and webinars.

Kirk currently resides in Billings, MT with his wife, six children, books, Trek Bikes, Taylor guitar, and little sleep.

Kirk is an avid "discusser of marketing things" on Twitter, as well as an avid conference speaker, having traveled around the world to talk about Paid Search (especially Shopping Ads).  Kirk has booked speaking engagements in London, Dublin, Sydney, Milan, NYC, Dallas, OKC, Milwaukee, and more and has been recognized through reviews as one of the Top 10 conference presentations on more than one occasion.

You can connect with Kirk on Twitter or LinkedIn.

In 2023, Kirk had the privilege of speaking at TEDx Billings on one of his many passions, Stop the Scale: Redefining Business Success... which is also the title of his latest book, Stop the Scale, available now on Amazon!
