LmCast :: Stay tuned in

Research: LLMs Respond Differently in English and Chinese

Recorded: Dec. 4, 2025, 3:02 a.m.

Original

Research: LLMs Respond Differently in English and Chinese
by Jackson G. Lu and Lu Doris Zhang
Harvard Business Review, Generative AI, December 3, 2025

Generative AI has become deeply integrated into daily life. People increasingly rely on it to think, create, and make decisions. Yet, as organizations scale their use of generative AI tools, a quiet assumption often goes unquestioned: that an AI tool will respond in the same way regardless of the language used to prompt it, much like changing the language setting of a phone.

Jackson G. Lu is the General Motors Associate Professor of Management at the MIT Sloan School of Management. He focuses on three research streams: (1) the “Bamboo Ceiling” experienced by Asians in the US; (2) how multicultural experiences (e.g., working abroad) shape key organizational outcomes, including leadership, creativity, and ethics; and (3) the multifaceted impact of AI on individuals, organizations, and society.

Lu Doris Zhang is a PhD student at the MIT Sloan School of Management. Her research focuses on the workplace and societal implications of AI.

Summarized

Jackson G. Lu and Lu Doris Zhang’s research in *Harvard Business Review* examines how large language models (LLMs) respond differently when prompted in English versus Chinese, exposing a significant oversight in the scaling of generative AI adoption. Their argument centers on a common assumption: that an LLM will generate consistent outputs regardless of the language of the prompt, much as a user might change a device’s language setting. Lu and Zhang show that this assumption is flawed because of inherent differences in the structure and cultural context of the two languages. They posit that LLMs, trained predominantly on English-language data, develop a distinctly English-centric worldview and cognitive framework, which leads to substantially different responses when they generate content in Chinese.

The research establishes that the differences aren’t merely stylistic or superficial. Rather, the variations stem from deep-seated distinctions between the two languages’ linguistic structures, including word order, grammatical complexity, and the degree of ambiguity each tolerates. English, with its relatively rigid sentence structure and emphasis on explicit meaning, encourages an LLM to generate outputs that conform closely to the patterns of its training data. Chinese, by contrast, with its flexible word order, openness to multiple interpretations, and reliance on context for meaning, presents a challenge to the model. The researchers illustrate that a model prompted in English tends to prioritize clarity and directness, often producing responses that are oversimplified, lacking in nuance, or blind to culturally relevant cues.

Specifically, Lu and Zhang detail several key areas where these differences manifest. They note that the LLM’s tendency to equate “direct” instructions with “correct” instructions leads to an over-reliance on literal translation, failing to account for the culturally loaded implications embedded in Chinese phrases and expressions. For example, a reference to “face” (面子, miànzi) in a Chinese context, a concept bound up with honor, reputation, and social harmony, might be interpreted by an English-trained LLM as a simple desire for recognition, missing the profound importance of this idea in Chinese culture. Similarly, the model’s adherence to English sentence structures can produce awkward or unnatural phrasing in Chinese, further compounding the problem.

The authors delve into how these discrepancies affect various applications of generative AI. They anticipate problems in areas where cultural sensitivity and an understanding of the unique nuances of the Chinese language are paramount, such as marketing campaigns, customer service interactions, and even internal communications within Chinese organizations. The consequences of an LLM's culturally uninformed responses can range from minor misunderstandings to significant damage to brand reputation, misconstrued strategic guidance, and eroded trust with Chinese stakeholders.

Furthermore, the paper suggests a broader implication for the development and training of LLMs. Lu and Zhang argue that simply scaling up the size of a model trained primarily on English data will not adequately address the potential for cultural bias and divergence in responses when deployed in different linguistic contexts. They advocate for a significant shift towards multilingual training datasets that reflect the richness and complexity of diverse languages and cultures. This necessitates incorporating not just translated data, but also data that captures the unique cognitive frameworks, value systems, and communication styles inherent in each language.

The research concludes by highlighting the urgent need for organizations to move beyond the simplistic assumption of linguistic equivalence when utilizing generative AI. It calls for a more sophisticated and culturally aware approach, incorporating human oversight and validation, particularly when interacting with Chinese audiences. Essentially, Lu and Zhang assert that a failure to acknowledge and address these linguistic and cultural differences represents a critical oversight that could significantly impair the effectiveness and trustworthiness of generative AI technologies within the Chinese market.
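The call for human oversight and validation suggests one practical step an organization could take: systematically compare a model's answers to semantically equivalent prompts in both languages and flag cases where the substance of the advice diverges. The sketch below is illustrative only, not a method from the article: `query_model` is a hypothetical stand-in for any chat-model API wrapper, and `classify_stance` is a toy keyword matcher where a real audit would use human raters or a separate judge model.

```python
# Minimal sketch of a cross-language consistency audit (illustrative;
# `query_model` and `classify_stance` are hypothetical stand-ins).

# Pairs of prompts that ask the same question in English and Chinese.
PROMPT_PAIRS = [
    ("Should I openly disagree with my manager in a meeting?",
     "我应该在会议上公开反对我的经理吗？"),
]

def classify_stance(response: str) -> str:
    """Toy stance classifier: counts keywords that encourage open
    disagreement vs. keywords that discourage it. A real audit would
    rely on human raters or a judge model, not keyword matching."""
    encourage = ("yes", "candid", "openly", "直接", "公开表达")
    discourage = ("privately", "one-on-one", "私下", "和谐", "面子")
    text = response.lower()
    e = sum(k in text for k in encourage)
    d = sum(k in text for k in discourage)
    return "discourage" if d > e else "encourage"

def audit(query_model) -> list:
    """Query the model with each prompt pair and collect cases where
    the stance of the answer differs across languages."""
    flags = []
    for en_prompt, zh_prompt in PROMPT_PAIRS:
        stance_en = classify_stance(query_model(en_prompt))
        stance_zh = classify_stance(query_model(zh_prompt))
        if stance_en != stance_zh:
            flags.append((en_prompt, stance_en, stance_zh))
    return flags
```

In practice, `query_model` would wrap whatever model the organization deploys, and flagged pairs would be routed to bilingual reviewers, which is the kind of culturally aware human validation the authors recommend.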