June 3, 2021

Let's talk about the elephant in the data

You would not be surprised to see an elephant in the savanna or a plate in your kitchen. Based on your prior experiences and knowledge, you know that is where elephants and plates are often to be found. If you saw a mysterious object in your kitchen, how would you figure out what it was? You would rely on your expectations or prior knowledge. Should a computer approach the problem in the same way? The answer may surprise you. Cold Spring Harbor Laboratory Professor Partha Mitra described how he views problems like these in a "Perspective" in Nature Machine Intelligence. He hopes his insights will help researchers teach computers how to analyze complex systems more effectively.

Mitra thinks it helps to understand the nature of knowledge. Mathematically speaking, many data scientists try to create a model that can "fit an elephant," or a set of complex data points. Mitra asks researchers to consider what philosophical framework would work best for a particular machine learning task: "In philosophical terms, the idea is that there are these two extremes. One, you could say "rationalist," and the other, "empiricist" points of view. And really, it's about the role of prior knowledge or prior assumptions."

Rationalists versus empiricists

A rationalist views the world through the lens of prior knowledge. They expect a plate to be in a kitchen and an elephant in a savanna.

An empiricist analyzes the data exactly as it is presented. When they visit the savanna, they no more expect to see an elephant than they do a plate.

If a rationalist came across this set of data points in the kitchen, they might at first be inclined to view it as a plate. Their prior knowledge states that a plate is likely to be found in a kitchen; it is highly unlikely to find an elephant. They have never seen this situation before, nor have they ever learned that such a situation could occur. Although their result takes in a certain amount of the data, it leaves out other parts. In this case, their methods have produced an incorrect result: a plate.

When an empiricist sees the same data, they will analyze it without regard to whether they are in the savanna or their kitchen. They will piece together an image from as many data points as possible. In this case, their result is a jagged image. It doesn't tell the empiricist if they are looking at an elephant, a plate, or anything else.

Neither the empiricist nor the rationalist is wrong. Both approaches work for various kinds of problems. However, in this case, if there is an elephant in the kitchen, it would pay to figure it out as quickly as possible. A middle ground between purely empirical and purely rationalist approaches may be best. With some prior knowledge of what an elephant looks like, you may notice the trunk and legs. And although the chances of an elephant being in your kitchen are low, it is certainly not impossible. Therefore, you would come to the conclusion that there is indeed an elephant in your kitchen, and you probably should leave—fast.

Predictable but wrong

Data scientists face this sort of problem all the time. They train computers to recognize new objects or patterns. Some machine learning programs may be able to process a lot of information and make many rules to fit the presented data, like the jagged image above. The jagged image might be reproducible when the same rules are applied to another similar data set. But just because the pattern is reproducible, that doesn't mean it accurately represents what is happening (the elephant).

There are historical examples of this dilemma. Two thousand years ago, Ptolemy developed a model of the universe that yielded excellent predictions for the movements of the moon and planets. His model was used successfully for centuries. However, Ptolemy used the wrong prior information: He placed the Earth at the center of the solar system and prioritized the circular motions of celestial objects. Johannes Kepler questioned this view in the 17th century and ultimately rejected Ptolemy's approach, which eventually led to Newton's law of universal gravitation. Although Ptolemy's complex model fit his own observations exceptionally well, it did not accurately represent what was happening. Mitra warns that "if you want to be an extreme empiricist, you really do need a lot of data. We now understand why under certain circumstances, such an approach can, in fact, succeed in a mathematically rigorous setting. Biological brains, on the other hand, are halfway in between. You do learn from experience, but you're not entirely data-driven."

Mitra hopes that data scientists will look to brain circuitry for inspiration when developing next-generation machine learning approaches. Vertebrate brains have circuits of different sizes, including medium-sized (mesoscale) ones. Those circuits are encoded with priors (known information, such as what animals look like, where they are found, or how to escape quickly from a charging elephant). At the same time, your brain is highly flexible, classifying new information and weighing the importance of different priors based on experience—elephants may not belong in a kitchen, but somehow, you have one anyway.

Mitra concludes in his article, "This points to the possibility of a new generation of intelligent machinery based on distributed circuit architectures which incorporate stronger priors, possibly drawing upon the mesoscale circuit architecture of vertebrate brains."

More information: Partha P. Mitra, Fitting elephants in modern machine learning by statistically consistent interpolation, Nature Machine Intelligence (2021). DOI: 10.1038/s42256-021-00345-8

Journal information: Nature Machine Intelligence

Provided by Cold Spring Harbor Laboratory

Citation: Let's talk about the elephant in the data (2021, June 3) retrieved 26 July 2024 from https://techxplore.com/news/2021-06-elephant.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

AI learns to trace neuronal pathways

19 shares

Feedback to editors

Generative AI creates personalized storybooks for the future of child language learning

9 hours ago

Study explores win–win potential of grass-powered energy production

10 hours ago

Novel algorithm for discovering anomalies in data outperforms current software

10 hours ago

Deep learning models can be trained with limited data: New method could reduce errors in computational imaging

11 hours ago

Experts warn against hype for deriving green hydrogen from direct seawater electrolysis

12 hours ago

New microgrids model takes into account a fair design of decentralized energy systems

12 hours ago

Engineers develop magnetic tunnel junction–based device to make AI more energy efficient

13 hours ago

Robot Spot configured to find and stun weeds using a blowtorch

13 hours ago

Magnetic fields help understand light particle splitting for boosting solar cell efficiency

14 hours ago

OpenAI to challenge Google with new search functionality

Jul 25, 2024

Load comments (0)

Let's talk about the elephant in the data

Rationalists versus empiricists

Predictable but wrong

Generative AI creates personalized storybooks for the future of child language learning

Study explores win–win potential of grass-powered energy production

Novel algorithm for discovering anomalies in data outperforms current software

Deep learning models can be trained with limited data: New method could reduce errors in computational imaging

Experts warn against hype for deriving green hydrogen from direct seawater electrolysis

New microgrids model takes into account a fair design of decentralized energy systems

Engineers develop magnetic tunnel junction–based device to make AI more energy efficient

Robot Spot configured to find and stun weeds using a blowtorch

Magnetic fields help understand light particle splitting for boosting solar cell efficiency

OpenAI to challenge Google with new search functionality

AI learns to trace neuronal pathways

Baby elephant dies rejected by family at Swedish zoo

Scent trails could boost elephant conservation

3-week-old elephant dies at Ohio zoo after sudden illness

Restoring balance in machine learning datasets

Baby elephant joins herd at San Diego Zoo Safari Park

Engineers develop magnetic tunnel junction–based device to make AI more energy efficient

Robot Spot configured to find and stun weeds using a blowtorch

Generative AI creates personalized storybooks for the future of child language learning

Novel algorithm for discovering anomalies in data outperforms current software

Deep learning models can be trained with limited data: New method could reduce errors in computational imaging

OpenAI to challenge Google with new search functionality

Phys.org

Medical Xpress

Science X

Let's talk about the elephant in the data

Rationalists versus empiricists

Predictable but wrong

Generative AI creates personalized storybooks for the future of child language learning

Study explores win–win potential of grass-powered energy production

Novel algorithm for discovering anomalies in data outperforms current software

Deep learning models can be trained with limited data: New method could reduce errors in computational imaging

Experts warn against hype for deriving green hydrogen from direct seawater electrolysis

New microgrids model takes into account a fair design of decentralized energy systems

Engineers develop magnetic tunnel junction–based device to make AI more energy efficient

Robot Spot configured to find and stun weeds using a blowtorch

Magnetic fields help understand light particle splitting for boosting solar cell efficiency

OpenAI to challenge Google with new search functionality

Related Stories

AI learns to trace neuronal pathways

Baby elephant dies rejected by family at Swedish zoo

Scent trails could boost elephant conservation

3-week-old elephant dies at Ohio zoo after sudden illness

Restoring balance in machine learning datasets

Baby elephant joins herd at San Diego Zoo Safari Park

Recommended for you

Engineers develop magnetic tunnel junction–based device to make AI more energy efficient

Robot Spot configured to find and stun weeds using a blowtorch

Generative AI creates personalized storybooks for the future of child language learning

Novel algorithm for discovering anomalies in data outperforms current software

Deep learning models can be trained with limited data: New method could reduce errors in computational imaging

OpenAI to challenge Google with new search functionality

Your Privacy