
AI Progress vs Productivity: The Output Gap

A New York lawyer was sanctioned last month after submitting ChatGPT-generated case law, which turned out to be entirely fabricated, to a federal court, a stark illustration of the challenges posed by increasingly sophisticated artificial intelligence tools.

The case, involving a passenger's alleged injuries on an airline flight, highlighted a growing concern among legal professionals and academics: the tendency of generative AI to “hallucinate” – to produce outputs that are false, misleading, or entirely invented. The lawyer, Steven Schwartz, admitted to using the chatbot and apologized to the court, according to reports. Judge Kevin Castel imposed a $5,000 sanction on Schwartz and his firm.

The incident is not isolated. Experts warn that the core function of large language models (LLMs) is not truth-seeking but the generation of plausible-sounding responses. This fundamental characteristic makes verifying AI-generated content both critical and increasingly difficult. According to a research guide from the University of Montana library, the problems contributing to these “hallucinations” are unlikely to be solved, given the nature of LLMs.

The rise of generative AI has prompted institutions to grapple with evaluating the quality of its output. Clarivate, a company providing analytics and information services, notes that traditional quality assurance methods are often inadequate for assessing AI-generated content. The company emphasizes the need for new methods and metrics to ensure trustworthiness and accuracy, particularly in academic settings.

Evaluating AI output requires a critical approach similar to that used for online information, but with additional considerations. The SAIC library guide suggests examining the date of information creation, the authority and potential biases of the source (even if that source is an AI), and the purpose of the generated content.

The West Point library recommends employing fact-checking techniques and the SIFT method – Stop, Investigate the source, Find better coverage, and Trace claims, quotes, and media to their original context – to mitigate the spread of misinformation generated by AI.

Despite the growing awareness of these issues, a clear consensus on best practices for evaluating AI output remains elusive. The Clarivate report, published in May 2025, acknowledges that approaches are “still evolving.” No immediate plans for standardized evaluation protocols have been publicly announced by major academic or legal bodies.
