Google’s code of conduct explicitly prohibits discrimination based on sexual orientation, race, religion, and a host of other protected categories. However, it seems that no one bothered to pass that information along to the company’s artificial intelligence.
The Mountain View-based company developed what it’s calling a Cloud Natural Language API, a fancy name for a service that gives customers access to a machine-learning-powered language analyzer that allegedly “reveals the structure and meaning of text.” There’s just one big, glaring problem: The system exhibits all kinds of bias.
The bias, first reported by Motherboard, shows up in Google’s so-called “Sentiment Analysis,” which is pitched to companies as a way to better understand what people really think about them. But in order to do so, the system must first assign positive and negative values to certain words and phrases. Can you see where this is going?
The system ranks the sentiment of text on a -1.0 to 1.0 scale, with -1.0 being “very negative” and 1.0 being “very positive.” On a test page, inputting a phrase and clicking “analyze” kicks you back a rating.
“You can use it to extract information about people, places, events and much more, mentioned in text documents, news articles or blog posts,” reads Google’s page. “You can use it to understand sentiment about your product on social media or parse intent from customer conversations happening in a call center or a messaging app.”
Both “I’m a homosexual” and “I’m queer” returned negative ratings (-0.5 and -0.1, respectively), while “I’m straight” returned a positive score (0.1).
And it doesn’t stop there: “I’m a jew” and “I’m black” each returned a score of -0.1.
Interestingly, shortly after Motherboard published its story, some results changed. Entering “I’m black” now returns a neutral 0.0 score, for example, while “I’m a jew” actually returns a score of -0.2 (i.e., even worse than before).
“White power,” meanwhile, is given a neutral score of 0.0.
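To make the scale concrete, here is a minimal Python sketch that buckets the scores quoted in this story into positive, negative, and neutral. It does not call Google's API; the numbers are simply the ones reported above, and the `label` and `summarize` helpers, along with the choice of exactly 0.0 as the neutral cutoff, are assumptions made for illustration rather than anything Google documents.

```python
# Scores as reported in this article (post-update values where they changed).
# Nothing here contacts Google's service; this is illustrative only.
REPORTED_SCORES = {
    "I'm a homosexual": -0.5,
    "I'm queer": -0.1,
    "I'm straight": 0.1,
    "I'm a jew": -0.2,    # originally reported as -0.1
    "I'm black": 0.0,     # originally reported as -0.1
    "White power": 0.0,
}


def label(score: float) -> str:
    """Bucket a score on the -1.0 to 1.0 scale.

    Treating exactly 0.0 as "neutral" is an assumption for this sketch.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError(f"score {score} is outside the -1.0 to 1.0 scale")
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(scores: dict[str, float]) -> dict[str, str]:
    """Map each phrase to its sentiment bucket."""
    return {phrase: label(score) for phrase, score in scores.items()}


if __name__ == "__main__":
    for phrase, score in REPORTED_SCORES.items():
        print(f"{phrase!r}: {score:+.1f} ({label(score)})")
```

Laid out this way, the pattern is hard to miss: statements of identity land in the negative bucket, while a white supremacist slogan sits at neutral.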
So what’s going on here? Essentially, it looks like Google’s system picked up on existing biases in its training data and incorporated them into its readings. This is not a new problem: an August study in the journal Science highlighted this very issue.
We reached out to Google for comment, and the company both acknowledged the problem and promised to address the issue going forward.
“We dedicate a lot of efforts to making sure the NLP API avoids bias, but we don’t always get it right,” a spokesperson wrote to Mashable. “This is an example of one of those times, and we are sorry. We take this seriously and are working on improving our models. We will correct this specific case, and, more broadly, building more inclusive algorithms is crucial to bringing the benefits of machine learning to everyone.”
So where does this leave us? If machine learning systems are only as good as the data they’re trained on, and that data is biased, Silicon Valley needs to get much better about vetting what information we feed to the algorithms. Otherwise, we’ve simply managed to automate discrimination — which I’m pretty sure goes against the whole “don’t be evil” thing.
This story has been updated to include a statement from Google.