May 23, 2023 report

DarkBERT learns language of the Dark Web

by Peter Grad, Tech Xplore

Two types of people thrive in the distant hidden recesses of the Dark Web. One type consists of the good guys: whistleblowers, freedom fighters, journalists, the intelligence community and law enforcement agencies, all generally waging the good fight against power, greed and tyranny.

The other type is made up of the bad guys: criminals, drug gangs, extortionists, weapons dealers, terrorists.

The Dark Web is an active mall where criminals offer a laundry list of illicit digital goods and services: passwords to bank accounts, Social Security numbers and other private data for identity theft, and malware and cyberattack packages that can bring down a company, a town or a country.

"There's a compounding and unraveling chaos that is perpetually in motion in the Dark Web's toxic underbelly," James Scott, a senior fellow at the Institute for Critical Infrastructure Technology, once said.

Researchers at a national research university in South Korea are trying to shine a little more light on that toxic underbelly. Their report, "DarkBERT: A Language Model for the Dark Side of the Internet," appeared this week on the arXiv preprint server.

While the Dark Web comprises barely 5% of the entire internet, it draws roughly 3 million users daily. Cybersecurity Ventures predicts the global cost of cybercrime will top $10 trillion annually by 2025.

To help combat that menace, researchers at the Korea Advanced Institute of Science & Technology have pre-trained a large language model on documents obtained from the Dark Web. They said such a model was needed to make navigating the Dark Web more efficient and to aid those seeking to stem criminal activity.

Researcher Youngjin Jin said his team's language model, named DarkBERT, will "combat the extreme lexical and structural diversity of the Dark Web that may be detrimental to building a proper representation of the domain."

Jin said pre-trained language models, such as the earlier BERT and RoBERTa projects based on Surface Web content (as opposed to Dark Web content), "are not ideal for … extracting useful information, due to the differences in the language used in the two domains."

"Our evaluation results show that the DarkBERT-based classification model outperforms that of known pre-trained language models," Jin said.

The researchers noted three key areas in which DarkBERT proved effective: ransomware leak detection; noteworthy thread detection, in which potentially malicious threads are spotted; and threat keyword inference, which identifies "a set of keywords that are semantically related to threats and drug sales in the Dark Web."

Jin noted that manual review of the voluminous quantities of posts on the Dark Web would require "massive human resources." Automating such analysis would "significantly reduce the workload of security experts," especially with a language model trained in the unique vocabulary of the Dark Web, Jin said.

Law enforcement has made some progress in crushing illegal activity on the Dark Web. The first modern Dark Web marketplace, Silk Road, which made more than a billion dollars in illegal drug sales, was shut down by the FBI and its creator sentenced to life in prison. AlphaBay, which sold hundreds of millions of dollars worth of drugs and hacked data, was shut down by a multinational police effort.

But those efforts were a drop in the bucket. To achieve greater success, law enforcement must better learn the language of the cybercriminals.

DarkBERT appears to be a good step in that direction.

More information: Youngjin Jin et al, DarkBERT: A Language Model for the Dark Side of the Internet, arXiv (2023). DOI: 10.48550/arxiv.2305.08596

Journal information: arXiv

Citation: DarkBERT learns language of the Dark Web (2023, May 23) retrieved 29 June 2024 from https://techxplore.com/news/2023-05-darkbert-language-dark-web.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
