Predicting Popularity of VS Code Extensions Using a Decision Tree Classifier: A Study of Listing Metadata Features

Luis Eduardo Muñoz Guerrero


Published: Jul 2, 2024

Keywords:

software popularity, marketplace analysis, machine learning, decision trees, VS Code extensions, feature importance, empirical software engineering

Luis Eduardo Muñoz Guerrero, Camilo Eduardo Muñoz Albornoz

Abstract

Visual Studio Code (VS Code) has become the dominant code editor among professional developers, with its popularity driven largely by an extensible marketplace containing tens of thousands of extensions. For developers seeking to publish extensions, understanding which marketplace presentation factors predict adoption is crucial. This paper asks whether simple, observable listing metadata (visible at publication time and controlled entirely by the developer) can effectively predict whether a VS Code extension will achieve substantial adoption, defined as 1,000+ installations.

We conducted an empirical study of 150 VS Code extensions collected from seven distinct marketplace categories during May 2025, extracting four listing metadata features: description length (text presentation effort), screenshot count (visual documentation), GitHub repository link presence (transparency), and tag count (keyword discoverability). We trained a Decision Tree classifier to predict binary popularity outcomes, selecting this model for its interpretability and ease of feature importance analysis. The resulting model achieved 70% accuracy on held-out test data, a 10-percentage-point improvement over the 60% majority-class baseline. Feature importance analysis revealed that description length (normalized importance = 0.45) is the single most important predictor of extension popularity, followed by screenshot count (importance = 0.30). These findings suggest that developers’ effort in crafting detailed, well-documented marketplace listings has a measurable impact on adoption outcomes.
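
The two ideas behind these numbers, the majority-class baseline and impurity-based feature importance, can be illustrated with a minimal Python sketch. It is not the authors' code: the rows and labels are invented toy values, the feature names are hypothetical, Gini impurity is an assumed split criterion (the abstract does not name one), and only the root split of a tree is evaluated rather than a full trained classifier.

```python
from collections import Counter

# Hypothetical names for the four listing features described in the abstract.
FEATURES = ["description_length", "screenshot_count", "has_repo_link", "tag_count"]

# Invented toy rows (NOT the study's data), one tuple per extension.
ROWS = [
    (120, 0, 0, 2), (150, 1, 0, 3), (200, 0, 1, 1), (300, 1, 0, 4),
    (900, 2, 1, 5), (1100, 3, 1, 4), (1300, 1, 0, 6), (250, 3, 1, 2),
    (1000, 0, 1, 3), (180, 2, 0, 5),
]
# Invented labels: 1 = popular (1,000+ installs), 0 = not popular.
LABELS = [0, 0, 0, 0, 1, 1, 1, 0, 1, 0]


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    return Counter(labels).most_common(1)[0][1] / len(labels)


def best_split(rows, labels, feature_idx):
    """Return (threshold, weighted_gini) of the best binary split on one feature.

    Candidate thresholds are midpoints between consecutive distinct values.
    The threshold is None if no split lowers the impurity of the unsplit node.
    """
    values = sorted({row[feature_idx] for row in rows})
    best = (None, gini(labels))
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [y for row, y in zip(rows, labels) if row[feature_idx] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature_idx] > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


def root_impurity_decreases(rows, labels):
    """Impurity decrease each feature would achieve if used for the root split."""
    root = gini(labels)
    return {
        name: root - best_split(rows, labels, idx)[1]
        for idx, name in enumerate(FEATURES)
    }


if __name__ == "__main__":
    print("majority-class baseline:", majority_baseline_accuracy(LABELS))
    for name, decrease in root_impurity_decreases(ROWS, LABELS).items():
        print(f"{name}: impurity decrease {decrease:.3f}")
```

A tree learner picks the feature with the largest impurity decrease at each node. The normalized importances reported above would correspond to such decreases summed over every node of the fitted tree and scaled to sum to one, assuming the usual impurity-based definition.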

Our empirical findings indicate that listing metadata contains genuine predictive signal for extension popularity, contrary to the naive intuition that adoption is driven primarily by the quality of an extension's functionality or by developer reputation, factors that are invisible at publication time. More importantly, our results provide actionable, evidence-based guidance for independent developers publishing extensions: prioritizing description completeness and visual documentation substantially improves predicted adoption likelihood. However, we acknowledge significant limitations: our 150-extension sample represents only 0.25% of the marketplace, the binary popularity threshold (1,000 installs) is somewhat arbitrary, and marketplace dynamics may change over time. This work represents the first machine learning study of VS Code extension popularity from listing metadata and establishes a foundation for future research into developer tool adoption across marketplace ecosystems.

Issue

Vol. 23 No. 01 (2024)

Section

Articles
