August 16, 2024 report

AI researchers introduce an LLM capable of generating text outputs of up to 10,000 words

by Bob Yirka, Tech Xplore

As existing LLMs fail to generate long enough output, AgentWrite adopts a plan-then-write pipeline to obtain a sufficient length output with off-the-shelf LLMs. Credit: *arXiv* (2024). DOI: 10.48550/arxiv.2408.07055

A team of AI researchers at Tsinghua University, working with a colleague from Zhipu AI, has developed a large language model (LLM) called LongWriter that they claim is capable of generating text output of up to 10,000 words. The group has written a paper describing their efforts and new LLM, which is available on the arXiv preprint server.

As LLMs have become mainstream, many users have noticed that the models cannot generate very long answers, such as full books or manuscripts; the current limit appears to be approximately 2,000 words. The researchers suggest this is because the models are all trained on short documents. In their new effort, they found that if an LLM is changed slightly and then trained on much longer documents, it is able to produce longer output.

To test their idea, the research team first trained a 9-billion-parameter LLM using a conventional dataset, in which most documents were less than 2,000 words long. As expected, when queried, it was not able to create texts longer than 2,000 words.

Next, the team built a pipeline they named AgentWrite, which decomposes a long writing task into subtasks so that off-the-shelf LLMs can produce output of sufficient length. They then assembled a dataset they named "LongWriter-6k," which holds 6,000 written documents ranging in length from 2,000 to 32,000 words. Training the LLM on LongWriter-6k increased the length of the documents it could produce to approximately 10,000 words.
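The way AgentWrite breaks work into subtasks can be pictured as a plan-then-write loop: first ask a model for an outline with a target length per section, then write the sections one at a time. The Python sketch below is only an illustration of that idea, not the researchers' code. The `generate` callable, the prompt wording and the "main point | word count" outline format are all assumptions made for this example.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any function that turns a prompt into generated text,
# for example a thin wrapper around an off-the-shelf LLM.
Generate = Callable[[str], str]


def plan(instruction: str, generate: Generate) -> List[Tuple[str, int]]:
    """Step 1: ask the model for an outline, one 'main point | word count' per line."""
    prompt = (
        "Break the following writing task into sections.\n"
        "Return one line per section as: <main point> | <target word count>\n\n"
        f"Task: {instruction}"
    )
    outline = []
    for line in generate(prompt).splitlines():
        point, sep, count = line.rpartition("|")
        digits = "".join(ch for ch in count if ch.isdigit())
        # Skip lines that do not follow the assumed outline format.
        if sep and point.strip() and digits:
            outline.append((point.strip(), int(digits)))
    return outline


def write(instruction: str, outline: List[Tuple[str, int]], generate: Generate) -> str:
    """Step 2: write the sections one at a time, showing the model the text so far."""
    sections: List[str] = []
    for point, words in outline:
        so_far = "\n\n".join(sections)
        prompt = (
            f"Task: {instruction}\n\n"
            f"Text written so far:\n{so_far}\n\n"
            f"Write the next section about '{point}' in roughly {words} words."
        )
        sections.append(generate(prompt).strip())
    return "\n\n".join(sections)


def plan_then_write(instruction: str, generate: Generate) -> str:
    """Run both steps and return the concatenated long-form text."""
    return write(instruction, plan(instruction, generate), generate)
```

Because each call only has to produce one section, no single response needs to exceed the length a model is comfortable with, while the concatenated result can be much longer. To try the sketch, pass any function that accepts a prompt string and returns text as `generate`.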

Credit: Yushi Bai et al

In reviewing the newly produced long documents generated by the LLM, the team found them to be coherent and useable in a variety of contexts. They have posted the open-source code for their model on GitHub—a move that will allow others to build on what the team in China has done. They also posted a video showing LongWriter producing a 10,000-word tourist guide for people traveling in China.

The researchers acknowledge that there are ethical issues that must be considered now that LLMs have been shown to be able to generate entire research papers, books, manuscripts or perhaps even movie scripts.

More information: Yushi Bai et al, LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs, arXiv (2024). DOI: 10.48550/arxiv.2408.07055

GitHub: github.com/THUDM/LongWriter

Journal information: arXiv

Citation: AI researchers introduce an LLM capable of generating text outputs of up to 10,000 words (2024, August 16) retrieved 16 August 2024 from https://techxplore.com/news/2024-08-ai-llm-capable-generating-text.html

