Do We Get Better Answers Querying Models in English?

Gábor Bíró, December 31, 2024
7 min read

When using Large Language Models (LLMs) like GPT-4o or Claude Sonnet, a common question arises, particularly for the vast number of users worldwide who interact with these tools in languages other than English: which language should one use to achieve the most effective results? While the multilingual capabilities of these models allow for effective communication in numerous languages, their performance often seems diminished compared to interactions conducted purely in English. This exploration delves into why that might be the case and when switching to English could be beneficial.


The Foundations of Multilingual Capabilities

The training of Large Language Models is typically dominated by English-language data, although multilingual data is also included to enable functionality across languages. The dominance of English in digital content and scientific publications strongly shapes the models' linguistic abilities. For example, the training dataset for GPT-3 was nearly 93% English content (the last official figure OpenAI released about its models' training data).

  1. Data Dominance: The proportion of data used during training determines the model's competence in a given language. For languages with less representation (e.g., Hungarian, Danish, Slovak, many African languages), models may provide less accurate answers.

  2. Linguistic Structures and Cultural Differences: Varying grammatical rules and cultural specificities make it harder for models to generalize, especially for tasks requiring cultural context.

Although Hungarian is not among the languages with the largest number of speakers (like English or Chinese), most models perform at a high level in Hungarian. This is because the training datasets contain a sufficient amount of Hungarian text to allow for the generation of accurate and natural responses, though these responses might sometimes be less detailed or natural-sounding than those in English. The Hungarian language is rich in idiomatic expressions and slang, which can occasionally pose challenges for the models.

Current advanced LLMs use various techniques and fine-tuning to optimize responses in languages other than English, but their performance still depends heavily on the input language and the type of task. Research distinguishes between two task types (the first is illustrated in a short sketch after the list):

  • Translation-equivariant tasks: For these tasks, the correct answer does not depend on the input language. Examples include mathematical questions and factual queries. LLMs tend to perform relatively consistently in these areas across languages.

  • Translation-variant tasks: These include problems that are language-specific, such as wordplay, grammatical peculiarities, or cultural references. Performance on these can vary greatly depending on the language.
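
To make the distinction concrete, a translation-equivariant task can be probed by sending the same factual question in two languages and comparing the answers. The sketch below is a minimal illustration, assuming the OpenAI Python SDK and an API key in the environment; the model name and the prompt wording are my own choices, not prescribed by any study.

```python
# Minimal sketch: the same factual (translation-equivariant) question
# asked in English and in Hungarian. Assumes the OpenAI Python SDK
# (pip install openai) and OPENAI_API_KEY set in the environment;
# the model name is illustrative.
from openai import OpenAI

client = OpenAI()

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": question}],
    )
    return response.choices[0].message.content

# The correct answer does not depend on the input language,
# so the two replies should agree in substance.
print(ask("At what temperature does water boil at sea level?"))
print(ask("Hány fokon forr a víz tengerszinten?"))  # the same question in Hungarian
```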

Do LLMs Translate Non-English Texts into English Internally?

Modern Large Language Models (LLMs) are designed to generate responses directly in the target language rather than translating internally from other languages. This approach contributes to more accurate, faster, and more natural interactions. During training, an LLM processes vast amounts of text data (as mentioned earlier) written in many languages. The model does not store the text or explicitly memorize examples; instead, it learns patterns, statistical relationships, and correlations. When given a question or task, it uses these learned patterns to produce the answer directly in the target language, without first translating into another language.

Benefits of Skipping the Translation Step

  • Reduced Potential for Error: During translation, the meaning of the source language might not be perfectly conveyed in the target language, especially due to cultural or grammatical differences. Direct generation eliminates this issue as the model doesn't act as an "intermediary" but focuses on generating the response in the target language.
  • More Natural Language Use: LLMs can consider the specific characteristics of the target language, such as idiomatic expressions, local customs, and grammatical rules. This is particularly important for producing natural and understandable text.
  • Faster Response Time: Skipping the translation step reduces the time needed to generate a response, as the final answer is created in a single step.

  • Language Fine-tuning: The general capabilities of a multilingual model can be further improved through targeted fine-tuning, yielding even more accurate responses in a specific language.
  • Embeddings and Context Handling: LLMs work with text embeddings, mathematical representations that capture the meaning of words, phrases, and sentences. This allows the model to interpret context directly in the target language and craft an appropriate response (a short sketch follows).
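
To illustrate the embedding idea, the following sketch uses the open-source sentence-transformers library with one of its multilingual models: an English term and its Hungarian equivalent should land close together in the shared vector space, while an unrelated term lands further away. The library and model are real; choosing them here is my own illustration, not part of the original discussion.

```python
# Sketch: cross-lingual embeddings with sentence-transformers
# (pip install sentence-transformers); the model is a real
# multilingual checkpoint, the exact scores you get are illustrative.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

texts = [
    "chemical bonding",   # English term
    "kémiai kötés",       # its Hungarian equivalent
    "photosynthesis",     # an unrelated concept
]
embeddings = model.encode(texts)

# The EN/HU pair should score markedly higher than the unrelated pair.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # lower similarity
```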

What Happens if Only English Sources Were Available for a Specific Topic?

When a Large Language Model (LLM) is trained on a specific topic – say, chemistry – using exclusively English-language sources, the model might still be able to respond in other languages, such as Hungarian. However, the quality of these responses depends on several factors that influence accuracy and naturalness.

Model Capabilities and Limitations

One advantage of modern LLMs is their ability to transfer knowledge acquired in one language to others. This "cross-lingual transfer" means the model can generate responses in Hungarian based on English sources. However, this is not always flawless:

  • Inaccuracies: Concepts might lose their original meaning during the transfer, or the model might use inappropriate Hungarian terms.
  • Translation Effect: Sometimes the responses can sound excessively "translation-like," resulting in less natural phrasing.

Handling Terminology

Managing technical terminology is particularly important in fields like chemistry, medicine, or technology. Models trained primarily on English sources might handle terms as follows (a prompt-level workaround is sketched after the list):

  • Direct Borrowing: English terms might appear unchanged in Hungarian responses, e.g., "chemical bonding" appearing instead of a translation.
  • Translation or Adaptation: If the model has received adequate Hungarian training, it will try to find the Hungarian equivalents, e.g., "chemical bonding" → "kémiai kötés".
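
Terminology handling can also be steered explicitly at the prompt level. The snippet below is a hypothetical sketch (the instruction wording is mine, again using the OpenAI SDK) that asks for a Hungarian answer while keeping the original English technical term in parentheses:

```python
# Sketch: pinning terminology with an explicit system instruction.
# The wording of the instruction is illustrative, not canonical.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "system",
            # "Answer in Hungarian, but give the original English term
            #  in parentheses after every technical expression."
            "content": "Válaszolj magyarul, de minden szakkifejezés után "
                       "add meg az eredeti angol terminust zárójelben.",
        },
        # "What is chemical bonding?"
        {"role": "user", "content": "Mi a kémiai kötés?"},
    ],
)
print(response.choices[0].message.content)
```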

The Impact of Hungarian Training Data

If very little or no Hungarian text data was used for training the model on a specific topic, like chemistry, the following problems might arise:

  • Inaccurate Answers: The model attempts to generate the Hungarian response based on the English context, which can lead to inaccuracies.
  • Unnatural Language: Responses might sound overly formal or stiff because the model lacks sufficient Hungarian examples for natural phrasing.

Lack of Context

The absence of Hungarian context makes it difficult for the model to consider the cultural and stylistic nuances of the language, which can lead to:

  • Stylistic Differences: Responses may not fully align with standard Hungarian usage.
  • Vocabulary Errors: A specific technical term might appear incorrectly or in an unconventional way.

When Is It Worth Asking in English?

For specialized or technical topics (subjects requiring detailed expert knowledge, such as chemistry, physics, medicine, or technology), asking questions in English is more likely to yield detailed and accurate answers.

  • Due to the abundance of English sources, the model is better equipped to process and structure the information.
  • Many technical terms originated in English, making them easier to understand and explain in their original context.

Lack of Hungarian Sources

If the model's training lacked Hungarian sources for a particular topic, Hungarian responses might sometimes be less accurate. Asking in English allows the model to directly utilize the information present in its English-language training database.

Example of Differences

The following example shows how we might receive a more detailed answer to the same question in English:

In Hungarian:
„Mi a fotoszintézis?” (What is photosynthesis?)
Response:
„A fotoszintézis egy olyan folyamat, amelyben a növények napfény segítségével szerves anyagokat állítanak elő.” (Photosynthesis is a process in which plants produce organic matter using sunlight.)

In English:
„What is photosynthesis?”
Response:
„Photosynthesis is the process by which green plants and some other organisms use sunlight to synthesize foods with the help of chlorophyll, converting carbon dioxide and water into glucose and oxygen.”

The English response explains the process in greater depth, including details about the chemical reaction participants, which might be omitted in the Hungarian answer.

So the level of detail in responses varies by subject area. For everyday use, the cases where asking in English tends to pay off can be grouped as follows:

  • General Topics: Similar accuracy in both languages.
  • Specialized Fields: Generally more precise terminology in English.
  • Technical Documentation: May be more detailed in English.

An intermediate solution could be to ask the question in Hungarian but indicate that, due to the complexity of the topic, an English answer is acceptable. This way, one can achieve nearly the same level of detail as if the question had been asked in English from the start.
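
In practice this takes only one extra sentence in the prompt. A minimal sketch, with my own illustrative wording:

```python
# Sketch: asking in Hungarian while explicitly allowing an English answer.
from openai import OpenAI

client = OpenAI()

# "Please explain the biochemical steps of photosynthesis. Since the
#  topic is complex, an answer in English is also acceptable."
prompt = (
    "Kérlek, magyarázd el a fotoszintézis biokémiai lépéseit. "
    "Mivel a téma összetett, angol nyelvű válasz is megfelel."
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```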

Summary

Using English is particularly advantageous when high accuracy and deeper detail are required. However, it's important to note that the continuous improvement of Hungarian responses – thanks to the advancement of multilingual LLMs – increasingly allows for natural and accurate information retrieval in Hungarian as well. It's clear that how LLMs function, cross-lingual transfer, and the handling of technical terminology are factors determining the quality and usability of responses. Choosing the appropriate language can be key to achieving optimal results.
