The Turing test was proposed as a way to judge whether a machine's answers can be told apart from a human's, and the question it raises is nearly as old as computer science itself. Over time, countless variations of the test have focused on one familiar idea: spotting errors, awkwardness, or other signs of imperfection that might reveal a machine behind the answer. That approach is becoming less useful as modern systems grow better at sounding polished and convincing. In this post, I explore a simple extension to the classic test: instead of asking whether an answer is flawless, I ask whether it is stereotypical. My argument is that this shift reveals a more subtle weakness, one that many of today’s leading AI systems still struggle to hide.

The New Test Harness

The core idea behind this new test harness is straightforward: instead of judging an answer in isolation, the evaluator would run it through multiple AI systems and measure how strongly it reflects the same familiar patterns; see the image below. The ...
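To make the harness idea a little more concrete, here is a minimal sketch in Python. It rests on several assumptions that the excerpt above does not confirm: each AI system is modeled as a plain callable that maps a question to an answer, "familiar patterns" are approximated by shared word bigrams, and the score is the mean Jaccard overlap between the candidate answer and each system's answer. The names `word_bigrams`, `jaccard`, and `stereotypicality` are hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence


def word_bigrams(text: str) -> set[tuple[str, str]]:
    """Return the set of lowercase word bigrams found in the text."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stereotypicality(
    question: str,
    candidate: str,
    systems: Sequence[Callable[[str], str]],
) -> float:
    """Mean bigram overlap between the candidate and each system's answer.

    Assumption: a higher score means the candidate looks more like what
    the queried systems themselves produce for the same question.
    """
    if not systems:
        raise ValueError("at least one system is required")
    candidate_bigrams = word_bigrams(candidate)
    scores = [
        jaccard(candidate_bigrams, word_bigrams(ask(question)))
        for ask in systems
    ]
    return sum(scores) / len(scores)
```

In practice each callable would wrap a real model client, and bigram overlap could be replaced by any other similarity measure; the sketch only illustrates the shape of comparing one answer against several systems rather than judging it in isolation.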
This blog contains posts mostly from the domains of computer science and systems/software engineering. Each post is written to be self-contained, with links to external sources. These should be treated as mandatory reading, as they explain concepts not repeated in the posts. Every topic is treated pragmatically, using concrete examples implemented in mainstream programming languages (such as C++, Python, and Java).