
Stereotypical Intelligence isn’t Artificial Intelligence

The Turing test was created to distinguish intelligent machines from human thinkers, and the question it raises is nearly as old as computer science itself. Over time, countless variations of the test have focused on one familiar idea: spotting errors, awkwardness, or other signs of imperfection that might reveal a machine behind the answer. That approach is becoming less useful as modern systems grow better at sounding polished and convincing. In this blog, I explore a simple extension to the classic test: instead of asking whether an answer is flawless, I ask whether it is stereotypical. My argument is that this shift reveals a more subtle weakness—one that many of today’s leading AI systems still struggle to hide.

The New Test Harness

The core idea behind this new test harness is straightforward: instead of judging an answer in isolation, the evaluator would run it through multiple AI systems and measure how strongly it reflects the same familiar patterns; see the image below. The assumption is that stereotypical responses are far more characteristic of machine-generated text than of human thinking. A person may occasionally offer a conventional or predictable answer, but it is much harder for a human to reproduce the same kind of patterned response with the consistency that many AI systems display across domains and prompts.
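The harness idea above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the answers from the AI systems are already collected as plain strings, and word-set (Jaccard) overlap stands in for whatever similarity measure a real evaluator would use. The function names `tokens`, `jaccard`, and `stereotypicality` are hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a deliberately crude stand-in for a real text representation."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two word sets, in the range [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def stereotypicality(candidate: str, machine_answers: list[str]) -> float:
    """Mean similarity of the candidate to the answers collected from the AI systems."""
    if not machine_answers:
        raise ValueError("need at least one machine answer to compare against")
    return sum(jaccard(candidate, m) for m in machine_answers) / len(machine_answers)


if __name__ == "__main__":
    # Hypothetical, hand-written stand-ins for answers collected from AI systems.
    collected = [
        "Kidney exchange matching needs the exact optimum",
        "Fleet delivery routing is fine with an approximate optimum",
    ]
    print(stereotypicality("Tic-tac-toe must be played perfectly", collected))
```

A score near 1 would suggest that the candidate answer tracks the machine consensus closely; where to draw the threshold is left open here.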

What makes an answer stereotypical?

To make the idea more concrete, I put the same question to three widely used AI systems: Google Gemini 3.1 Pro Extended, Microsoft Copilot in its Think Deeper mode, and DeepSeek in Expert mode. I chose a textbook-style exercise from Introduction to Algorithms (4th edition, CLRS) because familiar benchmark questions increase the likelihood of getting similarly structured responses. That makes it easier to see whether the systems converge on the same patterns of reasoning, phrasing, and examples. Below, I include Gemini’s answer in full; the responses from the other two engines followed essentially the same pattern.

Question

Suggest a real-world problem in which only the best solution will do. Then come up with one in which “approximately” the best solution is good enough.

Answer from Google Gemini

Here is a breakdown of two real-world scenarios that perfectly illustrate the divide between exact optimization and heuristic approximation.

1. Only the Best Will Do: Kidney Exchange Matching

In healthcare operations research, one of the most high-stakes combinatorial problems is the Kidney Paired Donation (KPD) matching algorithm.

Often, a patient in need of a kidney has a willing donor (like a family member), but their blood or tissue types are incompatible. KPD programs pool these incompatible pairs together. The goal is to find a "swap"—Donor A gives to Patient B, and Donor B gives to Patient A. In modern systems, these swaps can form massive chains involving dozens of people.

Why only the absolute optimal solution will do:

  • The Cost of Approximation is Human Life: If an algorithm is run to find the maximum number of compatible matches, and it settles for an "approximate" solution that is 98% optimal, it has failed. In a pool of thousands of patients, missing the absolute mathematical maximum by even 2% means that dozens of people who could have been matched are left on dialysis.

  • Ethical Imperative: A heuristic (good enough) algorithm might run faster, but speed is irrelevant compared to the stakes. Hospitals will happily dedicate hours or days of supercomputer processing time to run exact Integer Linear Programming solvers to ensure they find the single, flawless sequence that saves the absolute maximum number of lives. Close enough is ethically unacceptable.

2. "Approximately" the Best is Good Enough: Fleet Delivery Routing

If you look at the logistics networks of companies like UPS, FedEx, or Amazon, they face a massive, daily mathematical hurdle known as the Vehicle Routing Problem (a complex variation of the Traveling Salesperson Problem).

A depot might have 100 trucks and 15,000 packages to deliver across a city. The goal is to find the most efficient route for every truck to minimize total fuel consumption and time.

Why an approximate solution is completely fine:

  • Combinatorial Explosion: Calculating the absolute, mathematically perfect route for thousands of stops is an NP-hard problem. Checking every single possible combination of routes would literally take modern supercomputers millions of years.

  • The Margin of Return: A high-quality approximation algorithm (using heuristics) can find a route that is 99% optimal in just a few minutes. The difference between this "good enough" route and the mathematically perfect route might be a collective savings of just 3 miles across the entire fleet. Spending immense computing power to find those 3 miles is a waste of money.

  • Real-World Noise: The mathematically perfect route assumes a sterile, predictable world. In reality, a sudden traffic jam, a closed road, or a driver taking a five-minute detour to grab a coffee instantly destroys the "perfect" math. Because the real world is messy, a highly efficient approximation that can be recalculated on the fly is vastly superior to a fragile, computationally expensive "perfect" solution.

What would a genuinely original answer look like?

Across all three systems, the answers followed almost exactly the same pattern: “best” was linked to life- or mission-critical software, while “approximate” was associated with route planning or other situations where some margin of error is acceptable. These are sensible and technically correct responses, but they are also highly predictable. That predictability is precisely the point. When different AI systems repeatedly converge on the same examples and framing, the result may be accurate, yet it reveals how narrow and formulaic their reasoning can become.

Nonetheless, there is a completely different perspective that has nothing to do with the criticality of a software system.

In solved games such as tic-tac-toe, a program must play perfectly against a knowledgeable opponent, choosing only the best moves. In games like chess or Go, however, moves are selected by heuristics and cannot be proven optimal. There, a good enough move that works is, for practical purposes, the best one.
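To illustrate what playing "only the best moves" means in a solved game, here is a minimal sketch of exhaustive minimax search for tic-tac-toe. The board encoding (a 9-character string with "." for empty cells) and the function names are my own assumptions for illustration. The same exhaustive search is infeasible for chess or Go, which is why programs for those games fall back on heuristics.

```python
from functools import lru_cache
from typing import Optional

# The eight winning lines on a 3x3 board, indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board: str) -> Optional[str]:
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None


def place(board: str, i: int, player: str) -> str:
    """Return a new board with `player` placed on cell i."""
    return board[:i] + player + board[i + 1:]


@lru_cache(maxsize=None)
def value(board: str, player: str) -> int:
    """Exact game value from X's point of view with `player` to move:
    +1 if X wins, 0 for a draw, -1 if O wins, assuming perfect play."""
    w = winner(board)
    if w is not None:
        return 1 if w == "X" else -1
    if "." not in board:
        return 0
    other = "O" if player == "X" else "X"
    outcomes = [value(place(board, i, player), other)
                for i, cell in enumerate(board) if cell == "."]
    return max(outcomes) if player == "X" else min(outcomes)


def best_move(board: str, player: str) -> int:
    """Pick a provably optimal cell for `player` by exhaustive search."""
    other = "O" if player == "X" else "X"
    moves = [i for i, cell in enumerate(board) if cell == "."]

    def score(i: int) -> int:
        return value(place(board, i, player), other)

    return max(moves, key=score) if player == "X" else min(moves, key=score)
```

Calling `value("." * 9, "X")` evaluates the empty board, and `best_move` returns a cell whose optimality is proven by the search rather than estimated.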

Genuinely complex systems exhibit emergent properties, thus making predictions about optimal solutions nearly impossible. For more information, consider the course Understanding Complexity.

Conclusion

This blog highlights one of the most persistent traits of today’s AI systems: their tendency to fall back on stereotypical answers. Finding a remedy for this kind of limitation may be one of the next major challenges in AI. If machine-generated reasoning continues to reinforce the same conventional patterns, it risks flattening creativity and amplifying intellectual uniformity at scale. That is why this issue matters beyond technical evaluation alone. As AI becomes more deeply embedded in daily work and decision-making, human-centered approaches—especially in enterprise settings—will remain essential for preserving judgment, originality, and control.

Addendum

To test whether AI systems could move beyond those familiar patterns, I asked a simple follow-up question: “Can you come up with a more creative (unconventional) answer related to a software system?” The systems did respond with more imaginative examples, and in that sense they passed the challenge. Yet even those answers often carried the same underlying assumptions. For example, the notion of “best” still ultimately circled back to a mission-critical system, showing how difficult it is for these models to break free from deeply embedded conventions. Below is the response from DeepSeek.


Let’s get wonderfully strange. I’ll give you a pair of software systems that sit at opposite ends of the exactness spectrum, wrapped in a speculative, near-future setting.

Only the best solution will do
The Final Signal Interpreter for the “ARK-3” generation ship.  

After 112 years of silence, a faint telemetry stream from humanity’s lost interstellar colony ship is finally captured by a deep-space array. The signal is so degraded that only a single pass of decoding is possible—the original transmission is stored in a volatile buffer on a dying relay satellite with decaying orbit. The software-defined radio system must apply a zero-error reconstruction: a perfect algebraic decoding of a custom Reed–Solomon cascade, combined with a language model that knows the exact 22nd-century ship dialect. If a single bit flips, the message “We are alive and bring a cure for the Neuromoth plague” could become “We are alive and *we bring the plague*.” The difference between exact recovery and 99.999% accuracy is the difference between the home world opening their arms or launching a relativistic kill vehicle. No approximation is acceptable—the software must mathematically prove it has found the globally optimal codeword.

Approximately the best solution is good enough 
The “Phrase Falter” xeno-linguistic tourist implant.  

You’ve just arrived on a bustling exoplanet bazaar where the locals speak a language made of iridescent skin patterns and subsonic throat rumbles. Your neural-augmentation software captures the visual and haptic input in real time and attempts to translate your question “How much for the glowing fruit?” into a light-show on your dermal display. The absolute *best* translation would require a complete model of the vendor’s cultural context, personal idiom, and mood—a combinatorial explosion of nuance. But the implant just spits out a 90% confident approximation: “I offer shiny spherical food-unit reciprocity query, respectfully.” The vendor flares an amused ochre ripple (maybe an insult, maybe a chuckle) and names a price. The near-optimal translation gets you fed, starts a memorable conversation, and leaves you with a hilarious anecdote about accidentally offering to marry a melon. The system didn’t need to be perfect—it just needed to be good enough to turn a potentially silent standoff into a delightful cross-species encounter.
