OpenAI distinguishes the reasons why LLMs hallucinate, locating the problem in statistics and evaluation incentives rather than in missing safeguards.
OpenAI, one of the leading research organisations in artificial intelligence, has published a paper titled 'Why Language Models Hallucinate'. It sets out to explain one of the most pervasive issues in AI: hallucinations in language models.
The paper traces hallucinations to two sources: the statistical inevitability of errors in pre-training and flawed incentives in post-training. Language models, it explains, do not learn absolute truths but probabilities of which token follows which. Optimising for plausibility in this way can produce statements that sound convincing yet are false or contradictory, known as hallucinations.
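The mechanism can be pictured with a toy sketch, which is not taken from the paper: a hand-made probability table stands in for a trained model, and the generator simply picks whatever token the table says is most likely next, with no notion of whether the resulting statement is true. The table, the `next_token` helper, and the example sentence are invented purely for illustration.

```python
# Toy illustration only: a hand-made probability table standing in for a
# trained language model. Real models condition on far longer contexts.
toy_probs = {
    ("The", "Eiffel"): {"Tower": 0.92, "building": 0.05, "bridge": 0.03},
    ("Eiffel", "Tower"): {"is": 0.60, "was": 0.30, "opened": 0.10},
    ("Tower", "is"): {"in": 0.50, "located": 0.30, "closed": 0.20},
}

def next_token(context):
    """Pick the most probable continuation for a two-token context."""
    candidates = toy_probs.get(context, {})
    return max(candidates, key=candidates.get) if candidates else None

tokens = ["The", "Eiffel"]
while (nxt := next_token(tuple(tokens[-2:]))) is not None:
    tokens.append(nxt)

# Prints a fluent continuation chosen purely by probability; nothing in the
# procedure checks whether the statement is actually true.
print(" ".join(tokens))
```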
The publication also fits OpenAI's strategy of building trust. The behaviour of language models has shifted away from cautious answers that emphasize uncertainty towards an authoritative tone in which hallucinations are tolerated and even encouraged. This trend, the paper suggests, is not just a statistical problem but also a regulatory one.
The quality and origin of the training data are another significant factor in hallucinations. Training data comes from publicly accessible repositories such as Wikipedia dumps, forums, blog posts, and large parts of GitHub. The paper warns that these sources can be flawed, outdated, or manipulated, and that such defects feed through into the behaviour of language models.
One example given in the paper is a points system: an answer given above the required confidence threshold earns plus points, an 'I don't know' earns no points, and an answer given below the threshold (assume 90 percent) costs points. Such a rule rewards caution; today's benchmarks, by contrast, encourage language models to give confident, albeit potentially incorrect, answers.
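A minimal sketch of the arithmetic behind such a rule follows, under assumptions the article does not spell out: correct answers earn one point and wrong answers cost nine, a penalty chosen here only so that the break-even confidence lands at exactly 90 percent.

```python
# Sketch of the points system described above. REWARD_CORRECT and
# PENALTY_WRONG are illustrative assumptions; the article only speaks of
# plus and minus points without giving magnitudes.
REWARD_CORRECT = 1.0
PENALTY_WRONG = -9.0   # chosen so that the break-even confidence is 0.9

def score(answered: bool, correct: bool = False) -> float:
    """Points for a single benchmark question."""
    if not answered:                      # "I don't know"
        return 0.0
    return REWARD_CORRECT if correct else PENALTY_WRONG

def expected_score(confidence: float) -> float:
    """Expected points if the model answers and is right with
    probability `confidence`."""
    return confidence * REWARD_CORRECT + (1 - confidence) * PENALTY_WRONG

for p in (0.50, 0.85, 0.90, 0.99):
    print(f"confidence {p:.2f}: expected score {expected_score(p):+.2f}")
# Below 90 percent the expectation is negative, so abstaining (0 points) is
# the better strategy; a binary benchmark that never penalises wrong answers
# makes guessing the optimal strategy instead.
```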
Part of the problem lies with benchmarks: originally a research instrument, they have become a marketing tool. Benchmarks decide which language model is perceived as leading and thereby influence investors, media, and customers. That, in turn, shapes the development strategies of providers and creates a systematic incentive to guess rather than admit uncertainty.
OpenAI proposes a correction called Confidence Targets: a model should only respond if its confidence exceeds a defined threshold, and wrong answers are penalized. The aim is to make language models more cautious and their answers more reliable.
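At inference time, such a confidence target could be gated roughly as follows; the `toy_model` stand-in and its confidence estimate are hypothetical, since the article does not say how that estimate would be obtained in practice.

```python
from typing import Callable, Tuple

def confidence_gated_answer(
    model: Callable[[str], Tuple[str, float]],
    question: str,
    target: float = 0.9,
) -> str:
    """Return the model's answer only if its confidence meets the target,
    otherwise abstain."""
    answer, confidence = model(question)
    return answer if confidence >= target else "I don't know"

# Hypothetical stand-in for a real model call, for demonstration only.
def toy_model(question: str) -> Tuple[str, float]:
    if "capital of France" in question:
        return "Paris", 0.97
    return "a guess", 0.40

print(confidence_gated_answer(toy_model, "What is the capital of France?"))      # Paris
print(confidence_gated_answer(toy_model, "Who won the 1937 village chess cup?")) # I don't know
```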
The paper also highlights the issue of targeted data poisoning, in which deliberately crafted content is placed in training sources to influence the behaviour of later models. This raises concerns about manipulation and underlines the need for robust regulation.
The paper was written in collaboration with researchers from the Georgia Institute of Technology (Georgia Tech). Collaborations with universities, peer review, and mathematical proofs are intended to convey seriousness to the public, especially in light of OpenAI's growing legal challenges and CEO Sam Altman's admission of a possible AI bubble.
In conclusion, OpenAI's paper offers valuable insight into hallucinations in AI. It underscores the need for careful handling of training data, robust regulation, and benchmarks that reward accuracy and honesty rather than confident guessing.