September 29, 2017

Bug-repair system learns from example

by Larry Hardesty, Massachusetts Institute of Technology

Anyone who's downloaded an update to a computer program or phone app knows that most commercial software has bugs and security holes that require regular "patching."

Often, those bugs are simple oversights. For example, the program tries to read data that have already been deleted. The patches, too, are often simple—such as a single line of code that verifies that a data object still exists.
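As a rough illustration of that kind of oversight and its fix, here is a minimal Python sketch. The function names and the `cache` dictionary are hypothetical and are not taken from the research; the point is only that the patch amounts to a short existence check.

```python
def read_length_buggy(cache, key):
    # Bug: assumes the entry still exists; raises KeyError if it was deleted.
    return len(cache[key])


def read_length_patched(cache, key):
    # Patch: verify that the data object still exists before reading it.
    if key not in cache:
        return 0
    return len(cache[key])


cache = {"report": b"abc"}
del cache["report"]  # the data is deleted before it is read
print(read_length_patched(cache, "report"))  # the guard returns 0 instead of raising
```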

That simplicity has encouraged computer scientists to explore the possibility of automatic patch generation. Several research groups, including that of Martin Rinard, an MIT professor of electrical engineering and computer science, have developed templates that indicate the general forms that patches tend to take. Algorithms can then use the templates to generate and evaluate a host of candidate patches.

Recently, at the Association for Computing Machinery's Symposium on the Foundations of Software Engineering, Rinard, his student Fan Long, and Peter Amidon of the University of California at San Diego presented a new system that learns its own templates by analyzing successful patches to real software.

Where a hand-coded patch-generation system might feature five or 10 templates, the new system created 85, which makes its set of templates both more diverse and more precise. Its templates are more narrowly tailored to specific types of real-world patches, so it doesn't generate as many useless candidates. In tests, the new system, dubbed Genesis, repaired nearly twice as many bugs as the best-performing hand-coded template system.

Thinning the herd

"You are navigating a tradeoff," says Long, an MIT graduate student in electrical engineering and computer science and first author on the paper. "On one hand, you want to generate enough candidates that the set you're looking through actually contains useful patches. On the other hand, you don't want the set to include so many candidates that you can't search through it."

Every item in the data set on which Genesis was trained includes two blocks of code: the original, buggy code and the patch that repaired it. Genesis begins by constructing pairs of training examples, such that every item in the data set is paired off with every other item.
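The pairing step can be sketched in a few lines of Python. This is an illustration only, not Genesis's code: the `TrainingItem` type and the sample snippets are hypothetical, and the sketch assumes the pairs are unordered, so that n items yield n(n-1)/2 pairs.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class TrainingItem:
    """Hypothetical record: one buggy snippet and the patch that repaired it."""
    buggy: str
    patched: str


def all_pairs(items):
    """Pair every training item with every other item (unordered pairs)."""
    return list(combinations(items, 2))


items = [
    TrainingItem("return obj.size();", "return obj == null ? 0 : obj.size();"),
    TrainingItem("return list.get(i);", "return i < list.size() ? list.get(i) : null;"),
    TrainingItem("x = a / b;", "x = b == 0 ? 0 : a / b;"),
]
pairs = all_pairs(items)
print(len(pairs))  # 3 items give 3 pairs
```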

Genesis then analyzes each pair and creates a generic representation, or draft template, that will enable it to synthesize each of the two patches from its corresponding original. The template may synthesize other, useless candidates, too. But the representation has to be general enough that the successful patches are among the candidates it produces.

Next, Genesis tests each of its draft templates on all the examples in the training set. Each of the templates is based on only two examples, but it might work for several others. Each template is scored on two criteria: the number of errors that it can correct and the number of useless candidates it generates. For instance, a template that generates 10 candidates, four of which patch errors in the training data, might score higher than one that generates 1,000 candidates and five correct patches.
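The article does not give the scoring formula, so the following Python sketch assumes one: each correct patch earns a point, and each useless candidate costs a small penalty. The `TemplateResult` type, the `score` function, and the penalty value of 0.01 are all assumptions chosen to reproduce the example above.

```python
from dataclasses import dataclass


@dataclass
class TemplateResult:
    """Hypothetical summary of one draft template's run over the training set."""
    name: str
    candidates: int  # total candidate patches generated
    correct: int     # candidates that actually repair a training bug


def score(result, useless_penalty=0.01):
    """Reward correct patches and penalize useless candidates (assumed formula)."""
    useless = result.candidates - result.correct
    return result.correct - useless_penalty * useless


narrow = TemplateResult("narrow", candidates=10, correct=4)
broad = TemplateResult("broad", candidates=1000, correct=5)
ranked = sorted([broad, narrow], key=score, reverse=True)
print([r.name for r in ranked])  # narrow ranks first under this assumed penalty
```

With a smaller penalty the broad template would win instead, which is the tradeoff Long describes: the weight decides how much search cost one extra fix is worth.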

On the basis of those scores, Genesis selects the 500 most promising templates. For each of them, it augments the initial two-example training set with each of the other examples in turn, creating a huge set of three-example training sets. For each of those, it then varies the draft template, to produce a still more general template. Then it performs the same evaluation procedure, extracting the 500 most promising templates.
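The select-and-extend loop resembles a beam search, and a toy version can be sketched as follows. Everything here is an assumption made for illustration: `beam_refine`, `generalize`, and `score` are hypothetical names, the "examples" are plain integers, and a "template" is just the interval that spans them. The sketch also re-infers each template from its enlarged example set rather than varying the previous draft, which is a simplification of what the article describes.

```python
from itertools import combinations


def beam_refine(examples, generalize, score, keep=500, target_size=5):
    """Grow example sets one item at a time, keeping the `keep` best per round."""
    # Start with one example set per unordered pair of training examples.
    frontier = [frozenset(p) for p in combinations(examples, 2)]
    while True:
        # Rank each example set by the score of the template inferred from it.
        ranked = sorted(frontier, key=lambda s: score(generalize(s)), reverse=True)
        best = ranked[:keep]
        # Augment each surviving set with each of the other examples in turn.
        grown = {s | {e} for s in best for e in examples if e not in s}
        if not grown or len(best[0]) >= target_size:
            return [generalize(s) for s in best]
        frontier = list(grown)


# Toy stand-ins: a template is the (min, max) interval covering its examples,
# and narrower intervals score higher.
templates = beam_refine(
    examples=[1, 2, 3, 40],
    generalize=lambda s: (min(s), max(s)),
    score=lambda t: -(t[1] - t[0]),
    keep=2,
    target_size=3,
)
print(templates[0])  # (1, 3): the tightest interval over three examples
```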

Covering the bases

After four rounds of this process, each of the 500 top-ranking templates has been trained on five examples. The final winnowing uses slightly different evaluation criteria, ensuring that every error in the training set that can be corrected will be. That is, there may be a template among the final 500 that patches only one bug, earning a comparatively low score in the preceding round of evaluation. But if it's the only template that patches that bug, it will make the final cut.
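One way to picture a coverage-preserving selection is a greedy set cover, sketched below in Python. The article does not say which procedure Genesis uses, so the greedy strategy is an assumption, and `select_covering` and the template names are hypothetical.

```python
def select_covering(fixes):
    """Pick templates until every fixable bug is covered (greedy sketch).

    `fixes` maps a template name to the set of bug ids it can patch.
    """
    uncovered = set().union(*fixes.values())
    chosen = []
    while uncovered:
        # Take the template that patches the most still-uncovered bugs.
        best = max(fixes, key=lambda name: len(fixes[name] & uncovered))
        chosen.append(best)
        uncovered -= fixes[best]
    return chosen


fixes = {
    "null_check": {1, 2, 3},
    "bounds_check": {2, 3},
    "cast_guard": {4},  # patches only one bug, but no other template does
}
print(select_covering(fixes))  # ['null_check', 'cast_guard']
```

Here `cast_guard` survives despite fixing a single bug, while `bounds_check` is dropped because `null_check` already covers everything it fixes.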

In the researchers' experiments, the final winnowing reduced the number of templates from 500 to 85. Genesis works with programs written in the Java programming language, and the MIT researchers compared its performance with that of the best-performing hand-coded Java patch generator. Genesis correctly patched defects in 21 of 49 test cases drawn from 41 open-source programming projects, while the previous system patched 11.

It's possible that more training data and more computational power—to evaluate more candidate templates—could yield still better results. But a system that allows programmers to spend only half as much time trying to repair bugs in their code would be useful nonetheless.

More information: Automatic Inference of Code Transforms for Patch Generation. people.csail.mit.edu/rinard/pa … er/fse17.genesis.pdf

Provided by Massachusetts Institute of Technology

This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.

Citation: Bug-repair system learns from example (2017, September 29) retrieved 26 July 2024 from https://techxplore.com/news/2017-09-bug-repair.html

