Ethics behind data from open source systems
Proprietary systems often come with licenses that regulate who owns the data and for what purposes it may be used. For example, code review data from a company usually belongs to that company. By working for the company, employees usually sign over their rights to the data that they generate for it. This is needed in the legal sense because the employees are compensated for that work, usually in the form of salaries.
However, what the employees do not transfer to the company is the right to use their personal data freely. This means that when we work with source systems, such as the Gerrit review system, we should not extract personal information without the permission of the people involved. If we have to execute a query in which this data cannot be masked, we must ensure that the personal data is anonymized as soon as possible and does not leak into the analysis. We must also ensure that such personal data is not made publicly available...
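One possible way to replace personal fields right after extraction is sketched below. It is a minimal illustration, not a prescribed method: the record layout (an "owner" entry with "name" and "email"), the PERSONAL_FIELDS list, and the helper names pseudonymize and scrub_record are all assumptions made for this example. Note also that a keyed hash is strictly speaking pseudonymization; whether it is sufficient depends on the data and on the applicable regulations.

```python
import hashlib
import hmac
import secrets

# Hypothetical list of fields treated as personal data; adjust to the real schema.
PERSONAL_FIELDS = ("name", "email", "username")


def pseudonymize(value: str, key: bytes) -> str:
    """Return a keyed hash of value, so the same person always maps to the same token."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def scrub_record(record: dict, key: bytes) -> dict:
    """Return a copy of record whose owner's personal fields are replaced by tokens."""
    cleaned = dict(record)
    if "owner" in cleaned:
        owner = dict(cleaned["owner"])  # copy, so the input record is not modified
        for field in PERSONAL_FIELDS:
            if field in owner:
                owner[field] = pseudonymize(owner[field], key)
        cleaned["owner"] = owner
    return cleaned


# The key is generated once per extraction run and should not be stored with the data.
key = secrets.token_bytes(32)

# Made-up example record, not real Gerrit output.
raw = {
    "subject": "Fix null check",
    "owner": {"name": "Jane Doe", "email": "jane@example.com"},
}
clean = scrub_record(raw, key)
```

Because the same key yields the same token for the same person, analyses that count reviews per reviewer still work on the scrubbed records, while the names and e-mail addresses themselves never reach the analysis.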