AI doesn’t necessarily give better answers when you’re polite to it

Public opinion on being polite to AI shifts almost as often as the latest verdicts on coffee and red wine. Still, a growing number of users add ‘please’ or ‘thank you’ to their prompts, not only out of concern that brusque exchanges might carry over into real life, but from the belief that courtesy leads to better, more productive outcomes from AI.

That assumption circulates among users and researchers alike, and prompt phrasing is actively studied in research circles as a tool for alignment, safety and tone control, even as user habits reinforce and reshape those expectations.

For example, a 2024 study from Japan found that prompt politeness could change how large language models behave. GPT-3.5, GPT-4, PaLM-2 and Claude-2 were tested on English, Chinese and Japanese tasks, with each prompt rewritten at three levels of politeness. The authors of that work observed that blunt or rude phrasing tended to degrade factual accuracy and produce shorter answers, while moderately polite requests yielded clearer explanations and fewer refusals.

Microsoft, likewise, recommends a courteous tone with Copilot, on grounds of performance rather than culture.

However, a new research paper from George Washington University challenges this increasingly popular idea, presenting a mathematical framework that predicts when the output of a large language model will ‘decay’, sliding from coherence into misleading or even dangerous content. In that context, the authors argue that being polite does nothing meaningful to delay or prevent this ‘collapse’.

Tip off

The researchers argue that polite language is generally unrelated to the main topic of a prompt, and therefore does not significantly affect where the model’s attention settles. To support this, they present a detailed formulation of how a single attention head updates its internal orientation as it processes each new token, demonstrating that the model’s behavior is shaped by the cumulative impact of content-bearing tokens.

As a result, the paper holds that polite language has little bearing on when a model’s output begins to deteriorate. What decides the tipping point, it states, is not the presence of socially polite language, but the overall alignment of substantive tokens with good or bad output paths.

An illustration of a simplified attention head generating a sequence from a user prompt. The model starts with good tokens (g) and then hits a tipping point (n*) at which its output flips to bad tokens (b). The prompt’s polite terms (P₁, P₂, etc.) play no role in this shift, supporting the paper’s claim that courtesy has little effect on model behavior. Source: https://arxiv.org/pdf/2504.20980

If true, this result contradicts both popular belief and the implicit logic of instruction tuning, which assumes that prompt phrasing shapes a model’s interpretation of user intent.

Hulkout

The new paper examines how the model’s internal context vector (the evolving compass for token selection) shifts while generation is under way. With each token, this vector is updated directionally, and the next token is selected according to which candidate aligns with it most closely.

When the prompt steers toward well-formed content, the model’s responses remain stable and accurate. Over time, however, this directional pull can reverse, steering the model toward increasingly off-topic, incorrect or internally contradictory output.

The tipping point of this transition (which the authors define mathematically as iteration n*) occurs when the context vector becomes more aligned with ‘bad’ output vectors than with ‘good’ ones. At that stage, each new token pushes the model further along the wrong path, reinforcing an increasingly flawed or misleading pattern of output.

The tipping point n* is calculated by finding the moment at which the model’s internal orientation aligns equally with the good and the bad output types. The geometry of the embedding space, shaped by both the training corpus and the user prompt, determines how quickly this crossover occurs.
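
To make that crossover concrete, here is a schematic restatement in illustrative notation of my own, not the paper’s exact formula: write C(n) for the context vector after n generated tokens, g and b for the ‘good’ and ‘bad’ output directions, and δ for the fixed per-step drift assumed under linear token dynamics. The tipping point is then simply the first step at which alignment with b overtakes alignment with g.

```latex
% Schematic restatement of the crossover condition, in illustrative notation
% (not the paper's exact formula). C(n): context vector after n generated
% tokens; g, b: 'good' and 'bad' output directions; \delta: assumed fixed
% per-step drift under linear token dynamics.
C(n) = C(0) + n\,\delta ,
\qquad
n^{*} = \min\bigl\{\, n : \langle C(n),\, b \rangle \ge \langle C(n),\, g \rangle \,\bigr\} .

% Substituting the linear update shows why the crossover is unavoidable
% once the drift has any net component toward b:
\langle C(0),\, b - g \rangle + n \,\langle \delta,\, b - g \rangle \ge 0
\;\Longrightarrow\;
n^{*} = \Bigl\lceil \frac{\langle C(0),\, g - b \rangle}{\langle \delta,\, b - g \rangle} \Bigr\rceil ,
\qquad \text{provided } \langle \delta,\, b - g \rangle > 0 .
```

On this reading, only tokens with some component along g or b can move n*; anything orthogonal to both, which is how the paper characterizes polite terms, leaves the two inner products, and therefore the crossover, untouched.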

An illustration from the paper of how the tipping point n* arises within the authors’ simplified model. The geometric setup (a) defines the key vectors involved in predicting when output flips from good to bad; in (b) the authors plot those vectors using test parameters; and (c) compares the predicted tipping points with simulated results. The match is exact, supporting the researchers’ claim that collapse is mathematically inevitable once the internal dynamics cross the threshold.

According to the authors, polite terms do not affect the model’s choice between good and bad outputs, because they are not meaningfully related to the prompt’s main subject. Instead, they occupy parts of the model’s internal space that have little bearing on what the model actually decides.

When such terms are added to a prompt, they increase the number of vectors the model takes into account, but not in a way that shifts the attention trajectory. As a result, politeness behaves like statistical noise: present, but inert, leaving the tipping point n* unchanged.
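
A minimal numerical sketch of that ‘inert noise’ claim follows. This is not the paper’s code: the embedding dimension, the drift term and the stand-in vectors are invented for illustration. The point is simply that a politeness vector with no component along the good or bad directions changes neither alignment score, so the simulated tipping point comes out the same with or without it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                    # illustrative embedding dimension

# Stand-ins for the 'good' and 'bad' output directions.
g = rng.normal(size=d); g /= np.linalg.norm(g)
b = rng.normal(size=d); b /= np.linalg.norm(b)

# A 'polite' vector with its span(g, b) component removed, i.e. a token
# carrying no content along the good/bad axis.
b_perp = b - b.dot(g) * g
b_perp /= np.linalg.norm(b_perp)
polite = rng.normal(size=d)
polite -= polite.dot(g) * g + polite.dot(b_perp) * b_perp

ctx0 = 2.1 * g                            # context seeded toward good content
drift = 0.1 * (b - g)                     # assumed net per-step pull toward b

def tipping_point(ctx, steps=100):
    """First step at which alignment with b overtakes alignment with g."""
    c = ctx.copy()
    for n in range(1, steps + 1):
        c += drift
        if c.dot(b) > c.dot(g):
            return n
    return None

print(tipping_point(ctx0))                 # the tipping step (11 with these toy numbers)
print(tipping_point(ctx0 + 5.0 * polite))  # identical: polite has no g or b component
```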

The authors state:

‘[…] our AI’s response depends on the token embeddings provided by the LLM’s training and on the substantive tokens in our prompt – not on whether we have been polite to it or not.’

The model used in the new work is intentionally narrow, focusing on a single attention head with linear token dynamics: a simplified setup in which the internal state is updated through direct vector additions, with no nonlinear transformations or gating.

This simplified setup lets the authors calculate exact results, and provides a clear geometric picture of how and when a model’s output can tip from good to bad. In their tests, the equation they derive to predict that shift is consistent with what the model actually does.
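
For readers who want a feel for that geometric picture, the toy loop below is my own sketch rather than the authors’ model; the dimension, drift size and two-token ‘vocabulary’ are arbitrary. It reproduces the qualitative behavior described: a linearly updated context vector keeps selecting the good token until the accumulated pull crosses the threshold, after which every subsequent choice is bad.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16                       # illustrative embedding dimension

# Two candidate output tokens: one 'good' (g), one 'bad' (b).
g = rng.normal(size=d); g /= np.linalg.norm(g)
b = rng.normal(size=d); b /= np.linalg.norm(b)
vocab = {"g": g, "b": b}

ctx = 3.1 * g                # the prompt initially orients the context toward g
drift = 0.25 * (b - g)       # assumed net pull toward bad content per step

sequence = []
for step in range(1, 21):
    ctx = ctx + drift                                      # linear update: plain vector addition
    choice = max(vocab, key=lambda t: ctx.dot(vocab[t]))   # next token = best-aligned candidate
    sequence.append(choice)

print("".join(sequence))                 # 'ggggggbbbbbbbbbbbbbb': output flips once and stays flipped
print("n* =", sequence.index("b") + 1)   # the step at which the flip occurs
```

Adding politeness vectors orthogonal to g and b, as in the earlier snippet, would leave both dot products, and hence the position of the flip, exactly where they are.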

Chat..?

However, this level of precision only holds because the model is deliberately kept simple. The authors acknowledge that their conclusions will need to be tested on more complex multi-head models such as the Claude and ChatGPT series, but they also believe the theory should remain applicable as the number of attention heads grows.

“The question of what additional phenomena arise as the number of linked attention heads and layers increases is an intriguing one. But transitions within a single attention head will still occur, and may be amplified and/or synchronized by the coupling.”

An illustration of how the predicted tipping point n* varies with how strong the prompt is and whether it leans toward good or bad content. The surface comes from the authors’ approximation formula, and indicates that polite terms, which clearly favor neither side, have little effect on when collapse occurs. The marked value (n* = 10) matches the earlier simulations, supporting the model’s internal logic.

What remains unknown is whether the same mechanism survives the jump to modern transformer architectures. Multi-head attention introduces interactions among specialized heads, which might buffer or mask the kind of tipping behavior described.

The authors acknowledge this complexity, but argue that attention heads are often only loosely coupled, and that the sort of internal collapse they model is more likely to be amplified than suppressed in full-scale systems.

Without empirical testing across model scales, or on production LLMs, the claim remains unverified. However, the mechanism appears to be specified precisely enough to support follow-up research, offering later work a clear opportunity to challenge or confirm the theory at scale.

Sign off

So far, the question of consumer politeness toward LLMs seems to have been approached either from the (pragmatic) standpoint that trained systems may respond more usefully to courteous requests, or from the concern that a graceless, blunt style of communicating with such systems risks spreading, through force of habit, into users’ real social relationships.

Arguably, LLMs have not yet been in wide real-world social use for long enough for the research literature to confirm the latter case; but the new paper does cast some interesting doubt on the benefits of anthropomorphizing AI systems in this way.

A Stanford study from last October (in contrast to the 2024 study cited earlier) addressed the risk of further cheapening language, concluding that ‘rote’ politeness eventually loses its original social meaning.

Since an AI system has no meaningful commitment or intention behind its statements, the study argues, utterances that would seem friendly or genuine coming from a human speaker can be undesirable when they come from an AI.

However, a 2025 survey from Future Publishing found that around 67% of Americans say they are polite to AI chatbots. Most said it was simply ‘the right thing’ to do, while 12% admitted they do it out of caution.

* My conversion of the authors’ inline citations into hyperlinks. To some extent the hyperlinks are arbitrary/illustrative, since at certain points the authors link to a broad range of footnote citations rather than to a specific publication.

First published Wednesday, April 30th, 2025. Revised Wednesday, April 30th, 2025, 15:29:00, for formatting.
