August 15, 2018 weblog

DefCon presenters explore programmer de-anonymization, stylistic fingerprints

by Nancy Owano, Tech Xplore

One of the nicer things about higher education: Gaining awareness of the signature styles of authors, painters, musicians even before we are told their names. Well, signature styles are not just confined to the arts.

Two researchers have shown the world their work on stylistic fingerprints and how these can potentially be used to identify programmers from code and binaries.

"Machine Learning Can Uncover Programmers' Identity" was the headline from Fossbytes. The article was talking about Rachel Greenstadt and Aylin Caliskan, who presented their work at DefCon. Greenstadt is an associate professor at Drexel University; Caliskan is an assistant professor of computer science at George Washington University.

"Stylistic fingerprints"? Meaning? Louise Matsakis in Wired looked at something called stylometry—the statistical analysis of linguistic style. She said that "newer research shows that stylometry can also apply to artificial language samples, like code. Software developers, it turns out, leave behind a fingerprint as well."

In this area, anonymous programmers can be identified. Fossbytes summed up the research effort: The researchers tested code submitted by programmers, and the system correctly identified the author 83 percent of the time the algorithm was run.

They explored "programmer de-anonymization" with machine learning. They arrived at the conference ready to show how abstract syntax trees have "stylistic fingerprints," and sleuths can use these fingerprints potentially to identify programmers, from code and binaries. The question comes up: are these algorithms from heaven or from hell? Two sides of the coin.
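To make the idea of a syntax-tree "fingerprint" concrete, here is a minimal sketch in Python. It is not the researchers' feature set, which the article does not describe; it simply assumes that the relative frequency of each abstract syntax tree node type can serve as a crude stylistic signal. The function name ast_fingerprint and the two sample snippets are hypothetical, chosen only for illustration.

```python
import ast
from collections import Counter


def ast_fingerprint(source):
    """Return the relative frequency of each AST node type in `source`.

    Illustrative assumption: node-type frequencies act as a rough
    stand-in for a programmer's stylistic fingerprint.
    """
    tree = ast.parse(source)
    # ast.walk visits every node in the tree, including context
    # markers such as Load and Store.
    counts = Counter(type(node).__name__ for node in ast.walk(tree))
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Two hypothetical snippets that compute the same sum in different styles.
loop_style = "total = 0\nfor x in range(10):\n    total += x\n"
functional_style = "total = sum(x for x in range(10))\n"

fp_loop = ast_fingerprint(loop_style)
fp_func = ast_fingerprint(functional_style)

# Node types used by one style but not the other.
print(sorted(set(fp_loop) ^ set(fp_func)))
```

Both snippets produce the same result at run time, yet their trees differ: one contains a For node and an AugAssign node, the other a GeneratorExp. Differences of this kind are what a stylometric system could pick up on.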

The plus factor, obviously, would be in identifying those authors who plant malware. Negative factor: Coders who like to contribute code anonymously may be put off by this, as noted in Fossbytes. "There are times when programmers would like to remain unknown for legit reasons and getting identified is not always a good thing."

Matsakis also remarked on privacy implications, "especially for the thousands of developers who contribute open source code to the world."

Wired described their exploration as a binary experiment, where Caliskan and other researchers used code samples from Google's annual Code Jam competition. The machine learning algorithm correctly identified a group of 100 individual programmers 96 percent of the time, using eight code samples from each.

Just as interesting, even when the sample size was widened to 600 programmers, "the algorithm still made an accurate identification 83 percent of the time."
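The attribution step can be sketched in the same spirit. The article does not say which classifier the researchers used, so the example below swaps in a deliberately simple technique: each known author gets a profile built from the AST node counts of their samples, and an unknown sample is assigned to the author whose profile is most similar by cosine similarity. The author names, code samples and helper functions are all hypothetical, and a toy with two authors says nothing about the accuracy figures reported above.

```python
import ast
import math
from collections import Counter


def features(source):
    """Count AST node types in a piece of Python source."""
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(source)))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(samples_by_author):
    """Sum the feature counts of every sample per author."""
    profiles = {}
    for author, samples in samples_by_author.items():
        profile = Counter()
        for sample in samples:
            profile.update(features(sample))
        profiles[author] = profile
    return profiles


def attribute(source, profiles):
    """Return the author whose profile is closest to `source`."""
    feats = features(source)
    return max(profiles, key=lambda author: cosine(feats, profiles[author]))


# Hypothetical training data: "alice" favors explicit loops,
# "bob" favors comprehensions and generator expressions.
training = {
    "alice": [
        "out = []\nfor i in range(5):\n    out.append(i * 2)\n",
        "n = 0\nfor c in 'abc':\n    n += 1\n",
    ],
    "bob": [
        "out = [i * 2 for i in range(5)]\n",
        "n = sum(1 for c in 'abc')\n",
    ],
}
profiles = build_profiles(training)

guess_loop = attribute("acc = []\nfor v in range(3):\n    acc.append(v + 1)\n", profiles)
guess_comp = attribute("acc = [v + 1 for v in range(3)]\n", profiles)
print(guess_loop, guess_comp)
```

A realistic system would need far richer features and many more samples per author; the point here is only that identification reduces to comparing a new sample's feature vector against known authors' vectors.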

Cory Doctorow in Boing Boing, meanwhile, mentioned additional insights into programming styles. Doctorow reported that the researchers found experienced developers easier to identify than novice developers. The more skilled you are, the more unique your work apparently becomes.

How so? Doctorow commented that this may be "in part because beginner programmers often copy and paste code solutions from websites like Stack Overflow."

More information: De-anonymizing Programmers from Source Code and Binaries, www.defcon.org/html/defcon-26/ … kers.html#Greenstadt

Citation: DefCon presenters explore programmer de-anonymization, stylistic fingerprints (2018, August 15) retrieved 3 July 2024 from https://techxplore.com/news/2018-08-defcon-explore-programmer-de-anonymization-stylistic.html

This document is subject to copyright. Apart from any fair dealing for the purpose of private study or research, no part may be reproduced without the written permission. The content is provided for information purposes only.
