February 6, 2020

Crawling the invisible web genetically

by David Bradley, Inderscience

The World Wide Web has grown immensely since its inception in academia and research in 1991 and its subsequent expansion into the public and commercial domains. Initially, it was a network of hyperlinked pages and other digital resources. Very early on, it became obvious that some resources were so vast that it would make more sense to generate the materials required by individual users dynamically rather than storing every single digital entity as a unique item.

Today, countless websites are dynamic: every visit draws information from a back-end database and presents it to the user on demand. Whereas static pages can easily be spidered by search engines, the database content that drives dynamic websites is inaccessible to conventional crawlers. Even as long ago as 2001, when there were already several terabytes of public, static web data, it was estimated that the "invisible web," or "hidden web," not to be confused with the "dark web," was some 550 times bigger than the visible resources.
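The gap between static and dynamic content can be illustrated with a toy example. The sketch below is purely illustrative: the page paths, the `DATABASE` records, and the `crawl` and `search` functions are all hypothetical and do not model any real site or search engine. It shows a link-following crawler reaching every static page while never seeing the results that only a form query would return.

```python
from collections import deque

# Hypothetical toy site: each static page maps to the links it contains.
STATIC_PAGES = {
    "/": ["/about", "/search"],
    "/about": ["/"],
    "/search": [],  # holds only a query form, no links to any results
}

# Hypothetical back-end records, served only in response to a form query.
DATABASE = {"genetics": "/result?id=1", "crawler": "/result?id=2"}


def crawl(start="/"):
    """Breadth-first crawl that follows links and never submits the form."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for link in STATIC_PAGES.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen


def search(term):
    """What a user would get by typing a term into the form on /search."""
    return DATABASE.get(term)
```

Here `crawl()` finds the three static pages, but the result pages behind `search()` stay out of reach because nothing links to them.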

Writing in the International Journal of Business Intelligence and Data Mining, a team from India describes how they have developed a genetic algorithm-based intelligent multiagent architecture that can extract information from the invisible web. The tools could allow even materials that are purportedly off-limits to conventional search engines to be spidered, scraped, and cataloged for a wide range of applications.
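The article does not give the details of the team's architecture, so the following is not their method. It is only a minimal sketch of how a generic genetic algorithm might search for form queries that surface hidden records, under invented assumptions: the `HIDDEN_DB` records, the `submit_form` stand-in for a search form, the fitness measure (number of distinct records returned), and all parameter values are hypothetical.

```python
import random

# Hypothetical stand-in for a hidden-web database: records are reachable
# only by submitting keywords to a search form, never by following links.
HIDDEN_DB = [
    "genetic algorithm crawler",
    "multiagent architecture design",
    "hidden web database extraction",
    "precision recall evaluation",
    "search engine spider index",
    "dynamic page form query",
    "business intelligence data mining",
    "web crawler form submission",
]
VOCAB = sorted({word for record in HIDDEN_DB for word in record.split()})


def submit_form(keywords):
    """Simulate a search form: indices of records matching any keyword."""
    return {i for i, record in enumerate(HIDDEN_DB)
            if any(k in record.split() for k in keywords)}


def fitness(chromosome):
    """Score a keyword tuple by how many distinct records it surfaces."""
    return len(submit_form(chromosome))


def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def mutate(chromosome, rng, rate=0.2):
    """Replace each gene with a random vocabulary word with probability rate."""
    return tuple(rng.choice(VOCAB) if rng.random() < rate else gene
                 for gene in chromosome)


def evolve(genes=3, pop_size=10, generations=20, seed=0):
    """Evolve keyword tuples, keeping the fitter half each generation."""
    rng = random.Random(seed)
    population = [tuple(rng.sample(VOCAB, genes)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        elite = population[: pop_size // 2]  # survivors carried over unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            children.append(mutate(crossover(parent_a, parent_b, rng), rng))
        population = elite + children
    return max(population, key=fitness)
```

Calling `evolve()` returns the best keyword tuple found. Because the fitter half of each generation survives unchanged, the best score can never fall below that of the initial random population.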

D. Weslin of Bharathiar University and Joshva Devadas of Vellore Institute of Technology describe the details and benefits of their approach in the latest issue of the journal. "The experimental results show that the proposed architecture provides better precision and recall than the existing web crawlers," the team writes.
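Precision and recall are standard retrieval measures: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The sketch below shows the usual definitions. The function name and the sample identifiers are hypothetical and are not data from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for two collections of item identifiers."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant items that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: a crawler returns 4 records, 2 of which are among
# the 3 records that are actually relevant.
p, r = precision_recall(retrieved=[1, 2, 3, 4], relevant=[2, 3, 5])
```

In this made-up case the precision is 2/4 and the recall is 2/3.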

More information: D. Weslin et al. Genetic algorithm-based intelligent multiagent architecture for extracting information from hidden web databases, International Journal of Business Intelligence and Data Mining (2020). DOI: 10.1504/IJBIDM.2020.104740

Provided by Inderscience

Citation: Crawling the invisible web genetically (2020, February 6) retrieved 17 July 2024 from https://techxplore.com/news/2020-02-invisible-web-genetically.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
