In recent years, tools like Git and GitHub have become essential to the daily activities around open source software. These tools also act as data silos, whose contents can be gathered to derive insightful knowledge about a project (e.g., its activity and community). However, collecting this data is often laborious: it requires understanding how to access the data, supporting incremental collection, implementing resume and retry mechanisms, and defining a scalable process that can cope with large projects.
This talk will show how to use Perceval, Graal, and Arthur, three tools under the Linux Foundation's CHAOSS umbrella, to collect project data. Perceval performs automatic, incremental data gathering from many tools related to open source development; Graal provides a generic approach to source code analysis; and Arthur runs Perceval and Graal at scale, handling incremental collection and recovering from possible failures.