March 21, 2017

A simple statistical trick could help make a ubiquitous model of decision processes more accurate

by Larry Hardesty, Massachusetts Institute of Technology

Markov decision processes are mathematical models used to determine the best courses of action when both current circumstances and future consequences are uncertain. They've had a huge range of applications—in natural-resource management, manufacturing, operations management, robot control, finance, epidemiology, scientific-experiment design, and tennis strategy, just to name a few.

But analyses involving Markov decision processes (MDPs) usually make some simplifying assumptions. In an MDP, a given decision doesn't yield a predictable result. Instead, the MDP uses a probability distribution to describe a range of possible results. Characterizing that distribution requires collection of empirical data, which can be prohibitively time consuming, so analysts usually just make educated guesses. That means, however, that the MDP analysis doesn't guarantee the best decision in all cases.

In the Proceedings of the Conference on Neural Information Processing Systems, published last month, researchers from MIT and Duke University took a step toward putting MDP analysis on more secure footing. They show that, by adopting a simple trick long known in statistics but little applied in machine learning, it's possible to build accurate MDPs while collecting much less empirical data than had previously seemed necessary.

In their conference presentation, the researchers described a simple example in which the standard approach to characterizing probabilities would require the same decision to be performed almost 4 million times in order to yield a reliable MDP.

With the researchers' approach, it would need to be run 167,000 times. That's still a big number—except, perhaps, in the context of a server farm processing millions of web clicks per second, where MDP analysis could help allocate computational resources. In other contexts, the work at least represents a big step in the right direction.

"People are not going to start using something that is so sample-intensive right now," says Jason Pazis, a postdoc at the MIT Laboratory for Information and Decision Systems and first author on the new paper. "We've shown one way to bring the sample complexity down. And hopefully, it's orthogonal to many other ways, so we can combine them."

Unpredictable outcomes

In their paper, the researchers also report running simulations of a robot exploring its environment, in which their approach yielded consistently better results than the existing approach, even with more reasonable sample sizes—nine and 105. Pazis emphasizes, however, that the paper's theoretical results bear only on the number of samples required to build accurate MDPs; they don't prove anything about the relative performance of different algorithms at low sample sizes.

Pazis is joined on the paper by Jonathan How, the Richard Cockburn Maclaurin Professor of Aeronautics and Astronautics at MIT, and by Ronald Parr, a professor of computer science at Duke.

Generally, MDP analysis doesn't require the exact description of a probability distribution; it's usually sufficient to calculate the mean, or average, value of the distribution. In the familiar bell curve of the so-called normal distribution, the mean defines the highest point of the bell.

The trick the researchers' algorithm employs is called the median of means. If you have a bunch of random values, and you're asked to estimate the mean of the probability distribution they're drawn from, the natural way to do it is to average them. But if your sample happens to include some rare but extreme outliers, averaging can give a distorted picture of the true distribution. For instance, if you have a sample of the heights of 10 American men, nine of whom cluster around the true mean of 5 feet 10 inches, but one of whom is a 7-foot-2-inch NBA center, straight averaging will yield a mean that's off by about an inch and a half.

With the median of means, you instead divide your sample into subgroups, take the mean (average) of each of those, and then take the median of the results. The median is the value that falls in the middle, if you arrange your values from lowest to highest.

Value proposition

The goal of MDP analysis is to determine a set of policies—or actions under particular circumstances—that maximize the value of some reward function. In a manufacturing setting, the reward function might measure operational costs against production volume; in robot control, it might measure progress toward the completion of a task.

But a given decision is evaluated according to a much more complex measure called a "value function," which is a probabilistic estimate of the expected reward from not just that decision but every possible decision that could follow.

The researchers showed that, with straight averaging, the number of samples required to estimate the mean of a distribution is proportional to the square of the range of values that the value function can take on. Since that range can be quite large, so is the number of samples. But with the median of means, the number of samples is proportional to the range of a different value, called the Bellman operator, which is usually much narrower. The researchers also showed how to calculate the optimal size of the subsamples in the median-of-means estimate.

"The results in the paper, as with most results of this type, still reflect a large degree of pessimism because they deal with a worst-case analysis, where we give a proof of correctness for the hardest possible environment," says Marc Bellemare, a research scientist at the Google-owned artificial-intelligence company Google DeepMind. "But that kind of analysis doesn't need to carry over to applications. I think Jason's approach, where we allow ourselves to be a little optimistic and say, 'Let's hope the world out there isn't all terrible,' is almost certainly the right way to think about this problem. I'm expecting this kind of approach to be highly useful in practice."

More information: Improving PAC Exploration Using the Median of Means. papers.nips.cc/paper/6577-impr … -median-of-means.pdf

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: A simple statistical trick could help make a ubiquitous model of decision processes more accurate (2017, March 21) retrieved 17 July 2024 from https://techxplore.com/news/2017-03-simple-statistical-ubiquitous-decision-accurate.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.

Explore further

Finding patterns in corrupted data: New model-fitting technique efficient even for data sets with hundreds of variables

19 shares

Feedback to editors

The magnet trick: New invention makes vibrations disappear

1 hour ago

Creating and verifying stable AI-controlled robotic systems in a rigorous and flexible way

2 hours ago

Unlocking the potential of rust: High-efficiency green hydrogen production from hematite

2 hours ago

Scientists bridge the 'valley of death' in carbon capture technologies

2 hours ago

Flexible electronics researchers develop a completely stretchy lithium-ion battery

5 hours ago

A strategy to enhance the stability of perovskite solar cells under reverse bias conditions

6 hours ago

Engineers evaluate cybersecurity risks associated with EV fast-charging equipment

22 hours ago

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Jul 16, 2024

Giving drones wrap-and-grip wings to allow them to land on poles and tree limbs

Jul 16, 2024

Large language models make human-like reasoning mistakes, researchers find

Jul 16, 2024

Load comments (0)

A simple statistical trick could help make a ubiquitous model of decision processes more accurate

The magnet trick: New invention makes vibrations disappear

Creating and verifying stable AI-controlled robotic systems in a rigorous and flexible way

Unlocking the potential of rust: High-efficiency green hydrogen production from hematite

Scientists bridge the 'valley of death' in carbon capture technologies

Flexible electronics researchers develop a completely stretchy lithium-ion battery

A strategy to enhance the stability of perovskite solar cells under reverse bias conditions

Engineers evaluate cybersecurity risks associated with EV fast-charging equipment

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Giving drones wrap-and-grip wings to allow them to land on poles and tree limbs

Large language models make human-like reasoning mistakes, researchers find

Finding patterns in corrupted data: New model-fitting technique efficient even for data sets with hundreds of variables

Helping robots handle uncertainty

Preserving variety in subsets of unmanageably large data sets to aid machine learning

Mathematical theorem finds gerrymandering in PA congressional district maps

New model predicts wind speeds more accurately with three months of data than others do with 12

How computers can learn better

You're just a stick figure to this camera—a new camera to prevent companies from collecting private information

Visual abilities of language models found to be lacking depth

Reasoning skills of large language models are often overestimated, researchers find

A new model to plan and control the movements of humanoids in 3D environments

Researchers introduce generative AI to analyze complex tabular data

Computer scientists develop new and improved camera inspired by the human eye

Phys.org

Medical Xpress

Science X

A simple statistical trick could help make a ubiquitous model of decision processes more accurate

The magnet trick: New invention makes vibrations disappear

Creating and verifying stable AI-controlled robotic systems in a rigorous and flexible way

Unlocking the potential of rust: High-efficiency green hydrogen production from hematite

Scientists bridge the 'valley of death' in carbon capture technologies

Flexible electronics researchers develop a completely stretchy lithium-ion battery

A strategy to enhance the stability of perovskite solar cells under reverse bias conditions

Engineers evaluate cybersecurity risks associated with EV fast-charging equipment

Machine learning framework maps global rooftop growth for sustainable energy and urban planning

Giving drones wrap-and-grip wings to allow them to land on poles and tree limbs

Large language models make human-like reasoning mistakes, researchers find

Related Stories

Finding patterns in corrupted data: New model-fitting technique efficient even for data sets with hundreds of variables

Helping robots handle uncertainty

Preserving variety in subsets of unmanageably large data sets to aid machine learning

Mathematical theorem finds gerrymandering in PA congressional district maps

New model predicts wind speeds more accurately with three months of data than others do with 12

How computers can learn better

Recommended for you

You're just a stick figure to this camera—a new camera to prevent companies from collecting private information

Visual abilities of language models found to be lacking depth

Reasoning skills of large language models are often overestimated, researchers find

A new model to plan and control the movements of humanoids in 3D environments

Researchers introduce generative AI to analyze complex tabular data

Computer scientists develop new and improved camera inspired by the human eye

Your Privacy