Extracting data from product databases – GitHub and Git
JIRA and Gerrit are, to some extent, supplementary to the main product development tools. However, every software development organization uses a source code repository to store its main asset – the source code of the company’s software product. Today, the most widely used tools are the Git version control system and GitHub, a hosting platform built around Git. Source code repositories can be a very useful source of data for machine learning systems – we can extract the source code of the product and analyze it.
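Here is a minimal sketch of pulling data out of a repository. It assumes that `git` is installed and that the repository has already been cloned locally (for example, from GitHub). The code calls `git log` through `subprocess` and turns each commit into a small record with its hash, author, date, and subject line, which an analysis pipeline can then consume. The names `Commit`, `parse_log`, and `read_commits` and the choice of fields are illustrative assumptions. They are not part of any library, and the sketch uses only the Git command line, not a GitHub API.

```python
import subprocess
from dataclasses import dataclass

# Separators emitted by git's %x1f / %x1e placeholders (ASCII unit/record separators).
FIELD_SEP = "\x1f"
RECORD_SEP = "\x1e"
# %H = full commit hash, %an = author name, %ad = author date, %s = subject line.
LOG_FORMAT = "%H%x1f%an%x1f%ad%x1f%s%x1e"


@dataclass
class Commit:
    """One commit as extracted from `git log` (fields chosen for illustration)."""
    sha: str
    author: str
    date: str
    subject: str


def parse_log(raw: str) -> list[Commit]:
    """Split raw `git log` output into Commit records."""
    commits = []
    for record in raw.split(RECORD_SEP):
        # git puts a newline between entries; strip only newlines, because
        # str.strip() with no arguments would also remove the \x1f separators.
        record = record.strip("\n")
        if not record:
            continue
        # maxsplit=3 keeps the subject intact even if it contains the separator.
        sha, author, date, subject = record.split(FIELD_SEP, 3)
        commits.append(Commit(sha, author, date, subject))
    return commits


def read_commits(repo_path: str, max_count: int = 100) -> list[Commit]:
    """Run `git log` in a local clone and return up to max_count commits."""
    result = subprocess.run(
        [
            "git", "-C", repo_path, "log",
            f"--max-count={max_count}",
            "--date=iso-strict",
            f"--pretty=format:{LOG_FORMAT}",
        ],
        capture_output=True,
        encoding="utf-8",
        errors="replace",  # tolerate commit metadata that is not valid UTF-8
        check=True,  # raise CalledProcessError if git fails (e.g., not a repository)
    )
    return parse_log(result.stdout)
```

For example, for a hypothetical clone at `path/to/clone`, `read_commits("path/to/clone", max_count=50)` would return up to 50 commits reachable from `HEAD`, newest first. The ASCII unit and record separator characters serve as delimiters because they are unlikely to occur in author names or commit subjects, so parsing is simpler and safer than splitting on commas or tabs.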
GitHub is a great source of data for machine learning if we use it responsibly. Please remember that source code released as open source by the community is not there for us to profit from. We need to follow the licenses and acknowledge the contributions made by the authors, contributors, and maintainers of the open source community. Regardless of the license, we are always able to analyze...