Prompt Politeness Affects LLM Accuracy (2025)

Recorded: May 27, 2026, 8 a.m.

[2510.04950] Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)


Computer Science > Computation and Language

arXiv:2510.04950 (cs)

[Submitted on 6 Oct 2025]
Title: Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy (short paper)
Authors: Om Dobariya, Akhil Kumar

Abstract: The wording of natural language prompts has been shown to influence the performance of large language models (LLMs), yet the role of politeness and tone remains underexplored. In this study, we investigate how varying levels of prompt politeness affect model accuracy on multiple-choice questions. We created a dataset of 50 base questions spanning mathematics, science, and history, each rewritten into five tone variants: Very Polite, Polite, Neutral, Rude, and Very Rude, yielding 250 unique prompts. Using ChatGPT 4o, we evaluated responses across these conditions and applied paired sample t-tests to assess statistical significance. Contrary to expectations, impolite prompts consistently outperformed polite ones, with accuracy ranging from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts. These findings differ from earlier studies that associated rudeness with poorer outcomes, suggesting that newer LLMs may respond differently to tonal variation. Our results highlight the importance of studying pragmatic aspects of prompting and raise broader questions about the social dimensions of human-AI interaction.

Comments: 5 pages, 3 tables; includes Limitations and Ethical Considerations sections; short paper under submission to Findings of ACL 2025

Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Methodology (stat.ME)

Cite as: arXiv:2510.04950 [cs.CL] (or arXiv:2510.04950v1 [cs.CL] for this version)

DOI: https://doi.org/10.48550/arXiv.2510.04950 (arXiv-issued DOI via DataCite)

Submission history: From: Om Dobariya. [v1] Mon, 6 Oct 2025 15:50:39 UTC (337 KB)

Om Dobariya and Akhil Kumar investigated how the politeness and tone of a prompt influence the accuracy of large language models (LLMs), a factor that remains underexplored despite existing research on prompt wording. They measured the effect on multiple-choice questions. The authors built a dataset of 50 base questions drawn from mathematics, science, and history and rewrote each one into five tone variants: Very Polite, Polite, Neutral, Rude, and Very Rude. This yielded 250 unique prompts. They evaluated ChatGPT 4o's responses across the five tone conditions and used paired sample t-tests to assess whether the differences in accuracy were statistically significant.
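The 50 x 5 design described above can be sketched in a few lines of Python. This is only an illustration of how such a prompt grid might be assembled: the tone prefixes, the placeholder questions, and the `build_prompts` helper are all assumptions made for this sketch and are not taken from the paper, whose actual rewordings are not reproduced here.

```python
from itertools import product

# Hypothetical tone prefixes (assumption): the paper's real wording is not shown here.
TONE_PREFIXES = {
    "Very Polite": "Would you be so kind as to answer the following question? ",
    "Polite": "Please answer the following question. ",
    "Neutral": "",
    "Rude": "Just answer this. ",
    "Very Rude": "Answer this, and try not to get it wrong. ",
}


def build_prompts(base_questions, tone_prefixes=TONE_PREFIXES):
    """Return one record per (question, tone) pair."""
    return [
        {"question_id": qid, "tone": tone, "prompt": prefix + text}
        for (qid, text), (tone, prefix) in product(
            enumerate(base_questions), tone_prefixes.items()
        )
    ]


# Placeholder items standing in for the 50 base questions (assumption).
base_questions = [
    f"Placeholder question {i}? A) ... B) ... C) ... D) ..." for i in range(50)
]

prompts = build_prompts(base_questions)
print(len(prompts))  # 50 questions x 5 tones = 250
```

Keeping `question_id` on every record is a design choice for this sketch: it lets responses to the same base question be matched across tones later, which is what a paired comparison needs.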

Contrary to expectations, impolite prompts consistently yielded higher accuracy than polite ones. Accuracy ranged from 80.8% for Very Polite prompts to 84.8% for Very Rude prompts, a gap of 4.0 percentage points. This contrasts with earlier studies that associated rudeness with poorer outcomes, and it suggests that newer LLMs may respond differently to tonal variation. The authors conclude that the pragmatic aspects of prompting deserve study, and they raise broader questions about the social dimensions of human-AI interaction.
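For readers unfamiliar with the paired sample t-test used to compare tone conditions, here is a minimal sketch of the test statistic using only the Python standard library. The `paired_t_statistic` helper and the scores are made up for illustration and are not the paper's data. The sketch also assumes that the paired unit is a matched set of scores for two tones, which the summary above does not specify. In practice a library routine such as `scipy.stats.ttest_rel` would also return the p-value.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired-sample t-test of a vs. b."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    # The test operates on the per-pair differences, not on the two samples separately.
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(n)), n - 1


# Made-up toy scores (number of correct answers out of 50 in five matched runs).
# These are NOT results from the paper.
rude_correct = [45, 40, 35, 45, 40]
polite_correct = [40, 40, 30, 35, 35]

t, df = paired_t_statistic(rude_correct, polite_correct)
print(f"t = {t:.3f}, df = {df}")  # t = 3.162, df = 4
```

A positive `t` here means the first condition scored higher on average across the matched pairs. Whether the difference is significant depends on comparing `t` against a t distribution with `df` degrees of freedom.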