Uses and abuses of machine learning
At its core, machine learning is primarily interested in making sense of complex data. This is a broadly applicable mission, and largely application agnostic. As you might expect, machine learning is used widely. For instance, it has been used to:
Predict the outcomes of elections
Identify and filter spam messages from e-mail
Foresee criminal activity
Automate traffic signals according to road conditions
Produce financial estimates of storms and natural disasters
Examine customer churn
Create auto-piloting planes and auto-driving cars
Identify individuals with the capacity to donate
Target advertising to specific types of consumers
For now, don't worry about exactly how the machines learn to perform these tasks; we will get into the specifics later. Across all of these contexts, however, the process is the same: a machine learning algorithm takes data and identifies patterns that can be used for action. In some cases, the results are so successful that they seem to reach near-legendary status.
One possibly apocryphal tale is of a large retailer in the United States, which employed machine learning to identify expectant mothers for targeted coupon mailings. If mothers-to-be were targeted with substantial discounts, the retailer hoped they would become loyal customers who would then continue to purchase profitable items like diapers, formula, and toys.
By applying machine learning methods to purchase data, the retailer believed it had learned some useful patterns. Certain items, such as prenatal vitamins, lotions, and washcloths, could be used to identify with a high degree of certainty not only whether a woman was pregnant, but also when the baby was due.
After the retailer used this data for a promotional mailing, an angry man contacted the company and demanded to know why his teenage daughter was receiving coupons for maternity items. He was furious that the merchant seemed to be encouraging teenage pregnancy. Later, when a manager called to offer an apology, it was the father who ultimately apologized; after confronting his daughter, he had discovered that she was indeed pregnant.
Whether or not it is completely true, the preceding tale certainly contains an element of truth. Retailers do, in fact, routinely analyze their customers' transaction data. If you've ever used a shopper's loyalty card at your grocer, coffee shop, or another retailer, it is likely that your purchase data is being used for machine learning.
Retailers use machine learning methods for advertising, targeted promotions, inventory management, and the layout of the items in the store. Some retailers have even equipped checkout lanes with devices that print coupons for promotions based on the items in the current transaction. Websites routinely do something similar, serving advertisements based on your web browsing history. Given the data from many individuals, a machine learning algorithm learns typical patterns of behavior that can then be used to make recommendations.
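To make the idea of learning patterns from many shoppers a little more concrete, here is a minimal sketch in Python. It is not the method any particular retailer uses; it simply counts how often pairs of items appear in the same basket and suggests the item most often bought alongside a given one. The baskets, item names, and the count_pairs and recommend helpers are all hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"coffee", "milk"},
    {"bread", "butter", "milk"},
]

def count_pairs(transactions):
    """Count how many baskets contain each unordered pair of items."""
    pair_counts = Counter()
    for basket in transactions:
        # Sorting gives each pair a single canonical (a, b) ordering.
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
    return pair_counts

def recommend(item, pair_counts, top_n=1):
    """Return the items most often bought together with `item`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == item:
            scores[b] += n
        elif b == item:
            scores[a] += n
    return [other for other, _ in scores.most_common(top_n)]

pair_counts = count_pairs(baskets)
print(recommend("bread", pair_counts))
```

With these made-up baskets, bread appears with butter more often than with any other item, so butter is the suggestion. Real systems work with far more data and more careful statistics, but the underlying principle of turning many individual transactions into a reusable pattern is the same.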
Although I am familiar with the machine learning methods working behind the scenes, it still feels a bit like magic when a retailer or website seems to know me better than I know myself. Others may be less thrilled to discover that their data is being used in this manner. Therefore, any person wishing to utilize machine learning or data mining would be remiss not to at least briefly consider the ethical implications of the art.
Ethical considerations
Due to the relative youth of machine learning as a discipline and the speed at which it is progressing, the associated legal issues and social norms are often quite uncertain and constantly in flux. Caution should be exercised when obtaining or analyzing data in order to avoid breaking laws, violating terms of service or data use agreements, or abusing the trust or violating the privacy of customers or the public.
Tip
The informal corporate motto of Google, an organization that collects perhaps more data on individuals than any other, is "don't be evil." This may serve as a reasonable starting point for forming your own ethical guidelines, but it may not be sufficient.
Certain jurisdictions may prevent you from using racial, ethnic, religious, or other protected class data for business reasons, but keep in mind that excluding this data from your analysis may not be enough; machine learning algorithms might inadvertently learn this information independently. For instance, if a certain segment of people generally live in a certain region, buy a certain product, or otherwise behave in a way that uniquely identifies them as a group, some machine learning algorithms can infer the protected information from seemingly innocuous data. In such cases, you may need to fully "de-identify" these people by excluding any potentially identifying data in addition to the protected information.
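The following Python sketch illustrates how a seemingly innocuous field can act as a proxy for protected information. It assumes a tiny, hand-made dataset in which a hypothetical region field is correlated with a hypothetical protected group label; the records, the group names, and the fit_proxy_rule and proxy_accuracy helpers are all invented for illustration. The "model" is nothing more than a majority vote per region, and it is scored on the same records it was built from, so treat the numbers as a demonstration of the effect rather than a realistic estimate.

```python
from collections import Counter, defaultdict

# Hypothetical records of (region, protected_group), invented for illustration.
# In the north most people belong to group_a; in the south most belong to group_b.
records = (
    [("north", "group_a")] * 9 + [("north", "group_b")] * 1
    + [("south", "group_a")] * 2 + [("south", "group_b")] * 8
)

def fit_proxy_rule(rows):
    """Learn the most common protected group for each region."""
    counts = defaultdict(Counter)
    for region, group in rows:
        counts[region][group] += 1
    return {region: c.most_common(1)[0][0] for region, c in counts.items()}

def proxy_accuracy(rows, rule):
    """Share of rows whose protected group is recovered from region alone."""
    hits = sum(1 for region, group in rows if rule[region] == group)
    return hits / len(rows)

rule = fit_proxy_rule(records)
accuracy = proxy_accuracy(records, rule)

# Baseline: always guess the overall most common group, ignoring region.
baseline = Counter(group for _, group in records).most_common(1)[0][1] / len(records)

print(rule)
print(f"accuracy from region alone: {accuracy:.2f}")
print(f"baseline without region:    {baseline:.2f}")
```

In this toy data, guessing the group from region alone is right for 17 of 20 records, compared with 11 of 20 for a guess that ignores region. Dropping the protected column therefore does not remove the information; the region field still carries most of it, which is why de-identification may require removing correlated fields as well.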
Apart from the legal consequences, using data inappropriately may hurt your bottom line. Customers may feel uncomfortable or become spooked if aspects of their lives they consider private are made public. Recently, several high-profile web applications have experienced a mass exodus of users who felt exploited when the applications' terms of service agreements changed and their data was used for purposes beyond what the users had originally agreed upon. The fact that privacy expectations differ by context, by age cohort, and by locale adds complexity to deciding the appropriate use of personal data. It would be wise to consider the cultural implications of your work before you begin your project.
Tip
The fact that you can use data for a particular end does not always mean that you should.