April 28, 2022

New language-learning algorithms risk reinforcing inequalities, social fragmentation

Credit: Pixabay/CC0 Public Domain

The use of large language models could transform many facets of modern life, including how policymakers assess public sentiment about pending legislation, how patients evaluate their medical care and how scientists could translate research findings across languages.

Yet, new research from the University of Michigan finds that while there's great potential for these machine learning algorithms to benefit society, they likely could reinforce inequalities, tax the environment and place still more power in the hands of tech giants.

Large language models, or LLMs, can recognize, summarize, translate, predict and generate human language on the basis of very large text-based datasets, and are likely to provide the most convincing computer-generated imitation of human language yet.

A report by the Technology Assessment Project at the Science, Technology, and Public Policy (STPP) program at the Gerald R. Ford School of Public Policy raises concerns about the many ways that LLMs can cause profoundly negative outcomes.

The report, "What's in the Chatterbox? Large Language Models, Why They Matter, and What We Should Do About Them," anticipates the transformative social change they could produce:

"Our analysis shows that LLMs could empower communities and democratize knowledge, but right now they are unlikely to achieve this potential. The harms can be mitigated, but not without new rules and regulations about how these technologies are created and used," said STPP director Shobita Parthasarathy, professor of public policy.

The report uses the analogical case study method to analyze LLM development and adoption, by examining the history of similar past technologies—in terms of form, function and impacts—to anticipate the implications of emerging technologies. STPP pioneered this method in previous reports on facial recognition technologies in K-12 schools and vaccine hesitancy.

"Technologies can be implemented widely and then the negative consequences can take years to correct. LLMs present many of the same equity, environmental and access issues we have seen in previous cases," said Johanna Okerlund, STPP postdoctoral fellow and report co-author.

LLMs are much larger than their artificial intelligence predecessors, both in terms of the massive amounts of data developers use to train them and the millions of complex word patterns and associations the models contain. They are more advanced than previous natural language processing efforts because they can complete many types of tasks without being specifically trained for each, which makes any single LLM widely applicable.

Numerous factors create the circumstances for built-in inequity, according to the report.

"LLMs require enormous resources in terms of finances, infrastructure, personnel and computational resources including 360,000 gallons of water a day and immense electricity, infrastructure and rare earth material usage," the report says.

Only a handful of tech companies can afford to build them, and their construction is likely to disproportionately burden already marginalized communities. The authors also say they worry "because LLM design is likely to distort or devalue the needs of marginalized communities … LLMs might actually alienate them further from social institutions."

Researchers also note the vast majority of models are based on texts in English, and, to a lesser extent, Chinese.

"This means that LLMs are unlikely to achieve their translation goals (even to and from English and Chinese) and will be less useful for those who are not English or Chinese dominant," the report says.

One example of the analogical case study method's utility is its examination of how racial bias is already embedded in many medical devices, including the spirometer, which is used to measure lung function: "The technology considers race in its assessment of 'normal' lung function, falsely assuming that Black people naturally have lower lung function than their white counterparts, and making it more difficult for them to access treatment."

"We expect similar scenarios in other domains including criminal justice, housing and education, where biases and discrimination enshrined in historical texts are likely to generate advice that perpetuates inequities in resource allocation," the report says.

"LLMs' thirst for data will jeopardize privacy, and customary methods for establishing informed consent will no longer work.

"Because they collect enormous amounts of data, LLMs will likely be able to triangulate bits of disconnected information about individuals including mental health status or political opinions to develop a full, personalized picture of actual people, their families or communities. In a world with LLMs, the customary method for ethical data collection, individual informed consent, no longer makes sense," the report says, adding that efforts to diversify the datasets can cross into unethical methods of data collection.

LLMs will affect many sectors, but the report dives deeply into one to provide an example: how they will influence scientific research and practice. The authors suggest that academic publishers, which own most research publications, will construct their own LLMs and use them to increase their monopoly power.

Meanwhile, researchers will need to develop standard protocols on how to scrutinize insights generated by LLMs and how to cite output so others can replicate the results. Scientific inquiry will likely shift to finding patterns in big data rather than establishing causal relationships. And scientific evaluation systems relying on LLMs will probably not be able to identify truly novel work, a task that is already quite difficult for human beings.

Given these likely outcomes, the authors suspect scientists will come to distrust LLMs.

The report concludes with a set of recommendations.

The report also outlines specific recommendations for the scientific community and a Developer's Code of Conduct.

"Both LLM and app developers must recognize their public responsibilities and try to maximize the benefits of these technologies while minimizing the risks," the authors wrote.
