Data Science: Why Ethical Guidelines Are Needed In A Flashy New Field
By Tristan McIntosh, PhD
Data science. It’s the exciting new tool businesses pine for to gain a competitive advantage by predicting the needs and wants of relevant stakeholders before the stakeholders even know what it is that they need or want. Kind of creepy, but also kind of cool. While there is no consensus on the exact definition of data science, we do know that it’s a discipline that involves skillful acquisition, analysis, and interpretation of large amounts of data from multiple sources to answer million-dollar questions.
Data scientists (the people who do the data science) are oftentimes let “off the leash” by upper management to mine the nearly endless ocean of data to find something that gives the company an upper hand in its industry. This unchecked freedom is likely granted because those in upper management seek novel and quick solutions to complex problems, and giving data scientists free rein is an easy, efficient way to achieve those aims. Because of the depth of knowledge and specialization required of data scientists, those in upper management who are not data science experts lack the capacity to monitor and regulate what types of data are used and how they are used.
So, what could possibly go wrong? According to some data scientists, absolutely nothing! In fact, some data scientists hold that ethical review of how data is gathered and analyzed is a burden that gets in the way of the exciting, flashy work that data science has become. However, this mindset will likely come at a steep cost. The average Joe doesn’t fully understand what he is signing up for when he creates a profile on, say, a large and popular social networking site. He is relinquishing a ton of personal information to the stewards of this data: data scientists. Some data scientists may claim that public, online dissemination of personal information is the cost of doing business. But this perspective completely ignores ethical issues surrounding informed consent, privacy, and ownership of personal data.
Data science uses the individual data of millions of people. If left unchecked, using this data to predict behaviors and preferences may have considerable long-term negative effects on society and may unfairly discriminate against certain groups of people. For example, a couple of years ago, some software was developed to predict the likelihood of future criminality, and these predictions ended up being unfairly biased against people of color. It should be easy to see why this is a massive problem. This is not to say that these technologies are evil or that they are the bedrock of a dystopian future. In fact, these technologies can be beneficial and maximize wellbeing and efficiency if approached correctly. However, if these technologies are taken to extremes without any consideration of future outcomes, undue human suffering and injustice may result.
Therefore, it is worthwhile for leaders in organizations and in the data science field to form a set of uniform ethical guidelines and best practices for their data scientists. It’s easy to get caught up in the excitement of this new field with great prospects for fame and recognition. Similarly, it is easy to dehumanize a massive spreadsheet of numbers and algorithms. To reap the greatest benefit and minimize the harm that could result from this ambitious field, it will be imperative for data scientists to consider both the short- and long-term effects of their algorithms on other people, especially those who are not as technologically savvy or as data savvy as they are. Developing standards of practice will help guide data scientists to do no harm while still doing innovative and impactful work.
The Ethics Advantage Team