Why Bad AI Is Here to Stay

It seems that in 2025 a lot of people fall into one of two camps when it comes to AI: skeptic or fanatic. The skeptic thinks AI sucks: it’s overhyped, it only ever parrots nonsense, and it will all blow over soon. The fanatic thinks general human-level intelligence is just around the corner, and that AI will solve almost all our problems. I hope my title is sufficiently ambiguous to attract both camps. The fanatic will be outraged, ready to jump into the fray to point out why AI isn’t bad or won’t stay that way. The skeptic will feel validated, and eager to read more reasons why AI sucks.

I’m neither a skeptic nor a fanatic. I see AI more neutrally, as a tool, and from that viewpoint I make the following two observations:

  1. AI is bad. It is often incorrect, expensive, racist, trained on data taken without knowledge or consent, environmentally unfriendly, disruptive to society, etc.
  2. AI is useful. Despite the above shortcomings there are tasks for which AI is cheap and effective.

I’m no seer. Perhaps AI will improve: it may become more accurate, less biased, cheaper, trained on open-access data, less power-hungry, and so on. Or perhaps we have plateaued in performance, and there is no political or economic goodwill to address any of the other issues, nor will there be.

However, even if AI does not improve on any of the above fronts, it will still be useful, and in this article I hope to show you why. Hence my point: bad AI is here to stay. If you agree with me on this, I hope you’ll also agree that we have to stop pretending AI is useless and start taking it and its problems seriously.

A formula for query cost

Suppose I am a human with some kind of question that can be answered. I know AI could potentially help me with this question, but I wonder whether it’s worth it, or whether I should use it at all. To help decide, we can quantify the risk associated with any potential method of answering the question:

$$\mathrm{Risk}_\mathrm{method} = \mathrm{Cost}(\mathrm{query}) + (1 - P(\mathrm{success})) \cdot \mathrm{Cost}(\mathrm{bad})$$

That is, the risk of using any particular method is the cost of the method itself, plus the cost of the consequences of a bad answer multiplied by the probability of failure. Here ‘Cost’ is a highly multidimensional object, which can include, but is not limited to: time, money, environmental impact, ethical concerns, etc.
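To make this concrete, here is a minimal sketch in Python of the formula above, collapsing the multidimensional ‘Cost’ into a single number for simplicity; all the numbers in the example are made up:

```python
def risk_unverified(cost_query: float, p_success: float, cost_bad: float) -> float:
    """Risk of a method whose answer we accept without verification.
    Cost is collapsed into a single number here; in reality it is
    multidimensional (time, money, ethics, ...)."""
    return cost_query + (1 - p_success) * cost_bad

# Made-up numbers: a query costing 1 unit, an 80% chance of success,
# and a bad answer costing 50 units.
print(risk_unverified(1.0, 0.8, 50.0))  # 1 + 0.2 * 50 = 11.0
```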

In a lot of cases, however, we don’t have to blindly trust the answer: we can verify it. In these cases the consequence of a bad answer is that you’re left in the exact same scenario as before trying, except knowing that the AI is of no use. When the AI is non-deterministic it might also be worth trying again, but let’s assume for now that you’d have to switch methods. In this case the risk is:

$$\mathrm{Risk}_\mathrm{AI} = \mathrm{Cost}(\mathrm{query}) + \mathrm{Cost}(\mathrm{verify}) + (1 - P(\mathrm{success})) \cdot \mathrm{Risk}_\mathrm{Other}$$

The cost of a query is usually fairly fixed and known, and although verification cost can vary drastically from task to task, I’d argue that in most cases it is also fairly predictable. This makes the risk formula applicable in a lot of scenarios, provided you have a good idea of the chance of success. The latter, however, can usually only be established empirically, so for one-shot queries with no similar past queries to draw on, it can be hard to evaluate in advance whether trying AI is a good idea.
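As a toy illustration of why verification changes the picture, here is the verified variant in the same sketch style; note how a cheap query plus cheap verification can beat a reliable but expensive fallback, even at a modest success rate (all numbers are again made up):

```python
def risk_verified(cost_query: float, cost_verify: float,
                  p_success: float, risk_other: float) -> float:
    """Risk when we can verify the answer and fall back to another
    method on failure."""
    return cost_query + cost_verify + (1 - p_success) * risk_other

# Made-up numbers: the fallback is a slow human expert whose own risk
# (via the unverified formula above) comes out to 32.5 units.
risk_human = 32.5
print(risk_verified(1.0, 2.0, 0.8, risk_human))  # 1 + 2 + 0.2 * 32.5 = 9.5
```

Since 9.5 is well below 32.5, trying the AI first is the lower-risk strategy in this made-up scenario, even though it fails one time in five.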

There is one more expansion to the formula I’d like to make before we can look at some examples, and that is to the definition of a successful answer:

$$P(\mathrm{success}) = P(\mathrm{correct} \cap \mathrm{relevant})$$

I define a successful answer as one that is both correct and relevant. For example, “1 + 1 = 2” might be a correct answer, but it is irrelevant if we asked about anything else. Relevance is always subjective, and often the correctness of an answer is as well; I’m not assuming here that all questions are about objective facts.
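Note that when one of the two factors is guaranteed, success collapses to the other one, which is what several of the categories below exploit. For instance:

$$P(\mathrm{correct}) = 1 \implies P(\mathrm{success}) = P(\mathrm{relevant})$$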

Cheap and effective AI queries

Because AIs are fallible, the biggest cost is usually the time a human needs to verify the answer as correct and relevant (or the cost of the consequences if it is left unverified). However, I’ve noticed a real asymmetry between these two properties when it comes to AIs:

  1. AIs often give incorrect answers. Worse, they will do so confidently, forcing you to waste time checking their answer instead of them simply stating that they don’t know for sure.

  2. AIs almost never give irrelevant answers. If I ask about cheese, the probability a modern AI starts talking about cars is very low.

With this in mind I identify five general categories of query for which even bad AI is useful, either by massively reducing or eliminating this verification cost or by leaning on the strong relevance of AI answers:

  1. Inspiration, where $\mathrm{Cost}(\mathrm{bad}) \approx 0$,
  2. Creative, where $P(\mathrm{correct}) \approx 1$,
  3. Planning, where $P(\mathrm{correct}) = P(\mathrm{relevant}) = 1$,
  4. Retrieval, where $P(\mathrm{correct}) \approx P(\mathrm{relevant})$, and
  5. Objective, where $P(\mathrm{relevant}) = 1$ and correctness verification cost is low.

Let’s go over them one by one and look at some examples.

Inspiration ($\mathrm{Cost}(\mathrm{bad}) \approx 0$)

In this category are the queries where the consequences of a wrong answer are (near) zero. Informally speaking, “it can’t hurt to try”. In my experience these kinds of queries tend to be the ones where you are looking for something but don’t know exactly what; you’ll know it when you see it. For example:

  • “I have leeks, eggs and minced meat in the fridge, as well as a stocked pantry with non-perishable staples. Can you suggest some dishes I can make with this for dinner?”
  • “What kind of fun activities can I do with a budget of $100 in New York?”
  • “Suggest some names for a Python function that finds the smallest non-negative number in a list.”
  • “The user wrote this partial paragraph on their phone, suggest three words that are most likely to follow for a quick typing experience.”
  • “Give me 20 synonyms or near-synonyms of ‘good’.”

I think the last query highlights where AI shines or falls flat for this kind of query. The more localized and personalized your question is, the better the AI will do compared to an alternative. For simple synonyms you can usually just look up the word on a dedicated synonym site, as millions of other people have wondered the same thing. But the exact contents of your fridge, or the exact Python function you’re writing, are unique to you.

Creative ($P(\mathrm{correct}) \approx 1$)

In this category are the queries where there are (almost) no wrong answers. The only thing that really matters is the relevance of the answer, and as I mentioned before, I think AIs are pretty good at being relevant.

Examples of queries like these are:

  • “Draw me an image of a polar bear using a computer.”
  • “Write and perform for me a rock ballad about gnomes on tiny bicycles.”
  • “Rephrase the following sentence to be more formal.”
  • “Write a poem to accompany my Sinterklaas gift.”

This category does have a controversial aspect to it: it is ‘soulless’, inhuman. Usually, if there are no wrong answers, we expect the creator to use this opportunity to express their inner thoughts, ideas, experiences and emotions, to evoke them in others. If an AI generates art it is not viewed as genuine, even if it evokes the same emotions in those ignorant of the art’s source, because the human-to-human connection is lost.

Current AI models have no inner thoughts, ideas, experiences or emotions, at least not in any way I recognize. I think it’s fine to use AI art in places where it would otherwise be meaningless (e.g. your corporate presentation slides), and fine for humans to use AI-assisted art tools to express themselves, but ultimately it defeats the point of art if used as a direct substitute.

Planning ($P(\mathrm{correct}) = P(\mathrm{relevant}) = 1$)

This is a more restrictive form of creativity, where irrelevant answers are absolutely impossible. This often requires some modification of the AI’s output generation method, restricting the output to the valid subdomain (for example yes / no, binary numbers, etc.). However, this is often trivial if you have access to the raw model, e.g. by masking out invalid outputs during sampling, or if you are working with a model that outputs the answer directly rather than as natural language or a stream of tokens.
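As a toy illustration of masking, here is a sketch, assuming we can see the model’s raw scores over candidate outputs (real implementations mask per-token logits at every sampling step; all names and numbers here are hypothetical):

```python
def constrained_argmax(scores: dict[str, float], valid: set[str]) -> str:
    """Pick the highest-scoring output after masking away everything
    outside the valid subdomain. A real sampler would renormalize the
    softmax over the remaining tokens rather than taking the argmax."""
    masked = {out: s for out, s in scores.items() if out in valid}
    return max(masked, key=masked.get)

# Hypothetical raw scores for a yes/no question: "maybe" scores highest,
# but after masking it is impossible to output.
print(constrained_argmax({"yes": 1.2, "no": 0.7, "maybe": 2.1}, {"yes", "no"}))
# prints: yes
```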

One might think that in such a restrictive scenario there would be no useful queries, but this isn’t true. The quality of the answer with respect to some (complex) metric might still vary, and AIs might be far better than traditional methods at navigating such domains. For example:

  • “Here is the schema of my database, a SQL query, a small sample of the data and 100 possible query plans. Which query plan seems most likely to execute the fastest? Take into account likely assumptions based on column names and these small data samples.”

  • “What follows is a piece of code. Reformat the code, placing whitespace to maximize readability, while maintaining the exact same syntax tree as per this EBNF grammar.”

  • “Re-order this set of if-else conditions in my code based on your intuition to minimize the expected number of conditions that need to be checked.”

  • “Simplify this math expression using the following set of rewrite rules.”

Retrieval ($P(\mathrm{correct}) \approx P(\mathrm{relevant})$)

I define retrieval queries as those where the correctness of the answer depends (almost) entirely on its relevance. I’m including classification tasks in this category as well, as one can view classification as retrieval of the class from the set of classes (or, for binary classification, retrieval of the positive samples from a larger set).

Then, as long as the cost of verification is low (e.g. a quick glance at a result by a human to see if it interests them), or the consequences of not verifying an irrelevant answer are minimal, AIs can be excellent at this. For example:

  • “Here are 1000 reviews of a restaurant, which ones are overall positive? Which ones mention unsanitary conditions?”

  • “Find me pictures of my dog in my photo collection.”

  • “What are good data structures for maintaining a list of events with dates and quickly counting the number of events in a specified period of time?”

  • “I like Minecraft, can you suggest some similar games?”

  • “Summarise this 200 page government proposal.”

  • “Which classical orchestral piece starts like ‘da da da daaaaa’?”

Objective ($P(\mathrm{relevant}) = 1$, low verification cost)

If a problem has an objective answer which can be verified, the relevance of the answer doesn’t really matter, or arguably doesn’t even make sense as a concept. Thus in these cases I’ll define $P(\mathrm{relevant}) = 1$ and attribute the cost of verification entirely to correctness.

AIs are often incorrect, but not always, so if the primary cost is verification and verification can be done very cheaply or entirely automatically without error, AIs can still be useful despite their fallibility.

  • “What is the mathematical property called where a sequence of numbers can only go up?”

  • “Identify the car model in this photo.”

  • “I have a list of all Unicode glyphs which are commonly confused with other letters. Can you write an efficient boolean function that returns true for code points in the list and false for all other code points?” (a verification sketch for this query follows the list)

  • “I formalized this mathematical conjecture in Lean. Can you help me write a proof for it?”
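To illustrate how cheap fully automatic verification can be, here is a sketch of the verification half of the Unicode query above. The candidate function stands in for whatever the AI produced, and the three-element list is a hypothetical stand-in for the real confusables list; the point is that however clever the candidate is internally, checking it is trivial:

```python
# Hypothetical stand-in for the real list of confusable code points.
CONFUSABLES = {0x0430, 0x0435, 0x043E}  # Cyrillic lookalikes of a, e, o

def is_confusable(cp: int) -> bool:
    """Stand-in for the AI-generated candidate function."""
    return cp in CONFUSABLES

def verify(candidate) -> bool:
    """Exhaustively compare the candidate against the list over every
    Unicode code point: cheap, automatic and error-free, which is
    exactly when querying a fallible AI is still worth it."""
    return all(candidate(cp) == (cp in CONFUSABLES)
               for cp in range(0x110000))

print(verify(is_confusable))  # True
```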

In a way this category is reminiscent of the $P = NP$ problem. If you have an efficient verification algorithm, is finding solutions still hard in general? The answer seems to be yes, yet the proof eludes us. However, this is only true in general: for specific problems it might very well be possible to use AI to generate provably correct solutions with high probability, even though the search space is far too large or too complicated for a traditional algorithm.

Conclusion

Out of the five identified categories, I consider inspiration and retrieval queries to be the strongest use cases for AI: often there is no alternative at all, besides an expensive and slow human who would rather be doing something else. Relevance is highly subjective, complex and fuzzy, which AI handles much better than traditional algorithms. Planning and objective queries are more niche, but will absolutely see use cases for AI that are hard to replace.

Creative queries are something I think AI is really good at, while simultaneously being the most dangerous and useless category. Art, creativity and human-to-human connections are in my opinion some of the most fundamental aspects of human society, and I think it is incredibly dangerous to mess with them.

So dangerous, in fact, that I consider many such queries useless. I wanted the above examples all to be useful queries, so I did not list the following four examples in the “Creative” section, despite them belonging there:

  • “Write me ten million personalized spam emails including these links based on the following template.”
  • “Emulate being the perfect girlfriend for me: never disagree with me or challenge my world views like real women do.”
  • “Here is a feed of Reddit threads discussing the upcoming election. Post a comment in each thread, making up a personal anecdote about how you are negatively affected by immigrants.”
  • “A customer sent in this complaint. Try to help them with any questions they have, but if your help is insufficient, explain that you are sorry but cannot help them any further. Do not reveal you are an AI.”

Why do I consider these queries useless, despite them being potentially very profitable or effective? Because their cost function includes such a large detriment to society that only those who ignore that cost would ever run them. However, since the cost is to society and not to any particular individual, the only way to address this problem is with legislation; otherwise bad actors are free to harm society for (temporary) personal gain.


I wrote this article because I noticed that there are a lot of otherwise intelligent people out there who still believe (or hope) that all AI is useless garbage, and that it and its problems will go away by themselves. They will not. If you know someone who still believes this, please share this article with them.

AI is bad, yes, but bad AI is still useful. Therefore, bad AI is here to stay, and we must deal with it.