January 31, 2024

Research team launches first-of-its-kind mini AI model with three trillion-token punch

by Singapore University of Technology and Design

TinyLlama, the mini AI model with a three trillion-token punch. Credit: SUTD

It's called TinyLlama and it's taken the research world by storm because of how much power it packs.

Developed by Associate Professor Lu Wei of Singapore University of Technology and Design (SUTD), research assistant Mr. Zhang Peiyuan, and Ph.D. students Mr. Zeng Guangtao and Mr. Wang Tianduo, TinyLlama is a 1.1 billion parameter open-source small language model that has outperformed other open-source models of comparable size across several benchmarks. TinyLlama was pre-trained on a total of three trillion tokens within just four months.

Current large language models (LLMs) such as ChatGPT or Google Bard, developed by large technology firms such as OpenAI or Google, run on thousands or even tens of thousands of graphics processing units (GPUs) and require users to connect online to their massive servers. TinyLlama, in contrast, was trained on just 16 GPUs and takes up only 550MB of Random Access Memory (RAM). In other words, TinyLlama can readily be deployed on mobile devices, enabling everyone to carry a "mini ChatGPT" in their pocket wherever they go.
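To see how a 1.1 billion parameter model could fit in roughly 550MB, the short sketch below estimates the memory needed for the weights alone at several storage precisions. The article does not say which precision the 550MB figure refers to; storing each weight in 4 bits is an assumption made here because it happens to reproduce that number. The function name is illustrative, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_mb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the model weights alone, in decimal megabytes."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e6


PARAMS = 1.1e9  # parameter count reported in the article

# Precisions are assumptions for illustration; the article names none of them.
PRECISIONS = [
    ("32-bit float", 32),
    ("16-bit float", 16),
    ("8-bit", 8),
    ("4-bit", 4),
]

for label, bits in PRECISIONS:
    print(f"{label:>12}: {weight_memory_mb(PARAMS, bits):,.0f} MB")
```

Under these assumptions, only the 4-bit row matches the 550MB quoted above, which suggests the figure describes a compressed (quantized) copy of the model rather than the full-precision weights.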

According to Marktechpost, a California-based Artificial Intelligence news platform with a community of over 1.5 million AI professionals and developers, TinyLlama's performance in common-sense reasoning and problem-solving tasks highlights the potential of smaller models to achieve high performance when trained with a substantial amount of data. It also opens up new possibilities for research and application in natural language processing, especially in scenarios where computational resources are limited.

Said Prof Lu, also the Director of the StatNLP Research Group, which focuses on natural language processing research, "The importance of small language models cannot be understated, and the reason why TinyLlama was specifically created to be open-sourced was that it will democratize language models by allowing smaller tech companies and research labs to build and develop their own models for a variety of applications. As researchers, our plan is to lay the foundations for small language models, with the aim of making significant scientific advancements in the field.

"Smaller tech firms as well as individual researchers and developers are increasingly demanding small language models that require less resources to run. These models, such as TinyLlama, are therefore more feasible for them to build and more optimal for edge devices such as mobile phones. The compactness of such models also allows them to cater to a multitude of applications that demand real-time machine translation without an internet connection. This means that users can access the language model offline. They need not send their personal information to the server when using it, and through the technique called 'fine-tuning,' we are able to improve it further," Prof Lu added.

TinyLlama's innovative approach lies in its construction. It is based on the architecture and tokenizer of Llama 2 and incorporates several state-of-the-art technologies. One such technology is FlashAttention, which enhances computational efficiency. Despite being smaller than some of its predecessors, TinyLlama exhibits exceptional performance in various downstream tasks. It has successfully challenged the notion that larger models are always better, demonstrating that models with fewer parameters can still achieve high levels of effectiveness when trained with extensive and diverse datasets.

With its compact architecture and exceptional performance, TinyLlama can enable end-user applications on mobile devices and serve as a lightweight platform for language model research.

Firms such as leading global consumer internet company Sea Limited and DSO National Laboratories, a national defense research and development organization, have downloaded the TinyLlama source code from GitHub for research purposes.

Dr. Liu Qian, Research Scientist and Team Lead, Natural Language Processing Group at Sea AI Lab, said, "In our language model research projects, we've utilized the TinyLlama project as a nimble and efficient testbed. Its codebase follows a compact and well-organized structure, which allows easy modifications for diverse purposes. With access to several 1B model checkpoints, we swiftly validate hypotheses, obtaining faster feedback compared to the Llama-7b models.

"Notably, TinyLlama's optimization enhancements significantly boost GPU utilization, outperforming the Hugging Face transformers library. This combination of swift prototyping and efficient training positions TinyLlama as a valuable tool, facilitating accelerated iterations in the research community."

TinyLlama is currently available on GitHub, a platform and cloud-based service for developers to store and manage their code. It was the number one trending model on Hugging Face, a platform for hosting AI-related projects, out of over 460,000 models for about a week from 3 January 2024. Plans are underway to further improve TinyLlama.

Provided by Singapore University of Technology and Design

Citation: Research team launches first-of-its-kind mini AI model with three trillion-token punch (2024, January 31) retrieved 17 July 2024 from https://techxplore.com/news/2024-01-team-kind-mini-ai-trillion.html
