August 23, 2018 feature

Researchers compile a new database of executable Python code snippets on GitHub

by Ingrid Fadelli, Tech Xplore

A team of researchers at North Carolina State University has recently carried out an empirical analysis of the executable status of Python code snippets shared on GitHub. Their study, pre-published on arXiv, also presents Gistable, a new database of executable Python code snippets on GitHub's gist system, which could enable reproducible studies in the field of software engineering.

Every day, software developers worldwide create and share code online to demonstrate and outline new programming concepts. GitHub is one of the largest online platforms on which developers can share their code snippets and collaborate on the development of software. Currently, it contains over 300,000 Python snippets and over 4.5 million gists in a variety of programming languages.

While code snippets published online can be very useful, other developers cannot always execute them directly. A snippet might contain parse errors, or it might fail to run in an environment with unmet dependencies.
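As a rough illustration of these two failure modes, the following Python sketch checks whether a snippet parses and which of its imported modules are absent from the current environment. It is not the procedure used in the study; the function name `check_snippet` and the returned dictionary layout are assumptions made for this example.

```python
import ast
import importlib.util


def check_snippet(source: str) -> dict:
    """Report whether a snippet parses and which imports look unmet.

    Hypothetical helper for illustration; not part of Gistable.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        # Parse error: the snippet cannot run at all on this interpreter.
        return {"parses": False, "error": exc.msg, "missing_modules": []}

    # Collect the top-level package name of every absolute import.
    modules = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.add(node.module.split(".")[0])

    # A module with no import spec is an unmet dependency in this environment.
    missing = sorted(m for m in modules if importlib.util.find_spec(m) is None)
    return {"parses": True, "error": None, "missing_modules": missing}
```

A snippet written for Python 2, such as one using `print "hello"`, would be reported as a parse error when this check runs under Python 3, so the result always depends on the interpreter doing the checking.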

To gain a better understanding of how many code snippets hosted on GitHub's gist system are actually executable, researchers at North Carolina State University conducted a thorough evaluation of the executability of publicly available Python scripts hosted on the platform. Their study was aimed at identifying common issues with the execution of code snippets, which could provide valuable insight for further research on automated software configuration management.

In their study, the researchers also presented Gistable, a database and extensible framework built on GitHub's gist system. Gistable contains 10,259 Python code snippets, approximately 5,000 of which come with a Dockerfile that configures and executes them without import errors.
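To give a sense of what such a Dockerfile might look like, here is a minimal Python sketch that renders one from a list of package names. The base image tag, the file name `snippet.py`, and the function `render_dockerfile` are all assumptions for illustration; the Dockerfiles that ship with Gistable may be structured differently.

```python
def render_dockerfile(packages, python_tag="3.11-slim", script="snippet.py"):
    """Render a Dockerfile that installs `packages` and runs `script`.

    Hypothetical helper: the base image and layout are placeholders,
    not the ones used by Gistable.
    """
    lines = [f"FROM python:{python_tag}", "WORKDIR /app"]
    if packages:
        # Install every dependency in one layer; sorted for stable output.
        lines.append("RUN pip install --no-cache-dir " + " ".join(sorted(packages)))
    lines += [f"COPY {script} .", f'CMD ["python", "{script}"]']
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_dockerfile(["requests", "PyYAML"]))
```

The hard part is choosing the package list. The name used in an `import` statement does not always match the name of the package that provides it: `import yaml` is provided by PyYAML, and `import cv2` by opencv-python.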

"Our work on Gistable was motivated as part of a larger project concerning automated configuration of application environments," Eric Horton, one of the researchers who carried out the study, told Tech Xplore. "Given a codebase, such as the snippets studied in Gistable, we want to find a process which can build a sufficient execution environment for them without requiring input from a developer. In order to do this, we first had to step back and answer a couple questions. First, is this a common use case? We needed to establish a baseline for how often existing applications need some sort of non-trivial configuration. Second, when not executable, what type of configuration is needed to enable execution?"

In their study, the researchers found that 75.6 percent of the analyzed Python gists required non-trivial configuration to overcome issues such as missing dependencies, missing configuration files, reliance on a specific operating system, or other environment requirements. In addition, the natural assumption that developers make about resource names when resolving configuration errors turned out to be correct less than half of the time.
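One simple way to observe failures like these is to run a snippet in a separate process and look at the exception that ends it. The sketch below does this in Python; the function `run_and_classify` and its labels are assumptions for this example and do not reproduce the categories used in the paper. Because it executes arbitrary code, untrusted snippets should only be run this way inside an isolated environment such as a container.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_classify(source: str, timeout: float = 10.0) -> str:
    """Run a snippet in a fresh interpreter and label the outcome.

    Hypothetical helper: returns "success", "timeout", "unknown", or the
    name of the exception on the last line of the traceback.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "snippet.py"
        path.write_text(source, encoding="utf-8")
        try:
            proc = subprocess.run(
                [sys.executable, str(path)],
                capture_output=True,
                text=True,
                timeout=timeout,
                cwd=tmp,  # relative file paths resolve inside the temp dir
            )
        except subprocess.TimeoutExpired:
            return "timeout"

    if proc.returncode == 0:
        return "success"

    # The last traceback line looks like "ModuleNotFoundError: No module ...".
    lines = proc.stderr.strip().splitlines()
    last = lines[-1] if lines else ""
    return last.split(":", 1)[0].strip() or "unknown"
```

A missing library would typically surface as `ModuleNotFoundError`, while a missing configuration file would surface as `FileNotFoundError`. Reliance on a specific operating system is harder to detect this way, since it can show up as many different exceptions.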

"We found that around 30 percent of our sample fell into the 'hard to configure' category, with the most common configuration difficulty being dependencies on external libraries," Horton explained. "Our research in the immediate future will focus on techniques for finding and installing these libraries. Afterward, we hope to address other common configuration difficulties discovered as part of Gistable."

Overall, an insufficiently configured environment was the primary factor preventing the Python code snippets from being executable. In some cases, a correct application environment configuration could be recovered automatically, while others required further intervention. In the future, the researchers plan to investigate strategies for configuring environments consistently and effectively.

"I think the most meaningful achievement of this study was our investigation into how developers perform configuration manually," Horton said. "Not only did the responses from participants confirm that this is in many cases a hard problem, but they also helped us categorize things that can make configuration difficult. This is very useful, because it points us at a concrete list of items for future research."

More information: Gistable: Evaluating the Executability of Python Code Snippets on GitHub. arXiv:1808.04919v1 [cs.SE]. arxiv.org/abs/1808.04919

Abstract
Software developers create and share code online to demonstrate programming language concepts and programming tasks. Code snippets can be a useful way to explain and demonstrate a programming concept, but may not always be directly executable. A code snippet can contain parse errors, or fail to execute if the environment contains unmet dependencies.
This paper presents an empirical analysis of the executable status of Python code snippets shared through the GitHub gist system, and the ability of developers familiar with software configuration to correctly configure and run them. We find that 75.6% of gists require non-trivial configuration to overcome missing dependencies, configuration files, reliance on a specific operating system, or some other environment configuration. Our study also suggests the natural assumption developers make about resource names when resolving configuration errors is correct less than half the time.
We also present Gistable, a database and extensible framework built on GitHub's gist system, which provides executable code snippets to enable reproducible studies in software engineering. Gistable contains 10,259 code snippets, approximately 5,000 with a Dockerfile to configure and execute them without import error. Gistable is publicly available at this URL: github.com/gistable/gistable

Citation: Researchers compile a new database of executable Python code snippets on GitHub (2018, August 23) retrieved 29 June 2024 from https://techxplore.com/news/2018-08-database-python-code-snippets-github.html

