Mining API expertise profiles with Partial Program Analysis
Abstract
A developer's API usage expertise can be estimated by analyzing the source code that they have checked in to a software repository. In prior work, we proposed a system for creating a social network of developers centered around the APIs they use, in order to recommend people and projects they might be interested in. Implementing such a system requires analyzing code from the repositories of a large number of projects that use different build systems. Hence, one challenge is to determine the APIs referenced in the code in these repositories without relying on the ability to resolve every project's external dependencies. In this paper, we consider a technique called Partial Program Analysis for resolving type bindings in Java source code in the absence of third-party library binaries. Another important design decision concerns how to associate such API references with the developers who authored them, for example by walking the entire change history or by using blame information. We evaluated these design options on 4 open-source Java projects and found that both Partial Program Analysis and the blame-based approach provide precision greater than 80%. However, using blame instead of the complete change history leads to significant recall loss, in most cases greater than 40%.