Debiasing language

Ask the embedding model "father : doctor :: mother : x" and it will answer x = nurse. The query "man : computer programmer :: woman : x" gives x = homemaker. In other words, word embeddings can be dreadfully sexist. This happens because any bias in the articles that make up the Word2vec training corpus is inevitably captured in the geometry of the vector space. "One might have hoped that the Google News embedding would exhibit little gender bias because many of its authors are professional journalists."
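The analogy query above is just vector arithmetic: take the vector for "doctor", subtract "father", add "mother", and return the nearest neighbour by cosine similarity. Here is a minimal sketch using hand-made toy vectors (real Word2vec vectors are 300-dimensional and learned from the Google News corpus; the words and coordinates below are purely illustrative):

```python
import numpy as np

# Toy embedding table. One axis loosely encodes "gender", another
# "medicine" -- just enough structure to make the analogy work.
vecs = {
    "father": np.array([ 1.0, 0.0, 0.0]),
    "mother": np.array([-1.0, 0.0, 0.0]),
    "doctor": np.array([ 0.9, 1.0, 0.0]),
    "nurse":  np.array([-0.9, 1.0, 0.0]),
    "banana": np.array([ 0.0, 0.0, 1.0]),
}

def analogy(a, b, c):
    """Answer 'a : b :: c : ?' via the nearest cosine neighbour of b - a + c."""
    target = vecs[b] - vecs[a] + vecs[c]
    best, best_sim = None, -np.inf
    for word, v in vecs.items():
        if word in (a, b, c):  # exclude the query words themselves
            continue
        sim = target @ v / (np.linalg.norm(target) * np.linalg.norm(v))
        if sim > best_sim:
            best, best_sim = word, sim
    return best

print(analogy("father", "doctor", "mother"))  # -> nurse
```

With real Google News vectors (e.g. loaded through gensim's `KeyedVectors`), the same arithmetic produces the biased answers quoted above.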

If we can identify this bias direction reliably, we can remove the troglodyte voice completely.
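One way to do this, following the "neutralize" step of Bolukbasi et al.'s hard-debiasing approach, is to estimate a gender direction and project it out of each gender-neutral word. A sketch with toy vectors (the paper derives the direction from PCA over several definitional pairs; here a single he/she pair stands in, and the "programmer" vector is hypothetical):

```python
import numpy as np

# Estimate a gender direction from one definitional pair.
he  = np.array([ 1.0, 0.1, 0.0])
she = np.array([-1.0, 0.1, 0.0])
g = he - she
g = g / np.linalg.norm(g)  # unit gender direction

def neutralize(w):
    """Remove the component of w lying along the gender direction g."""
    return w - (w @ g) * g

# A hypothetical biased embedding: it leans toward the "he" side.
programmer = np.array([0.6, 0.2, 0.9])
debiased = neutralize(programmer)
print(debiased @ g)  # ~0: no gender component remains
```

After neutralizing, "programmer" is equidistant from "he" and "she", so gendered analogy queries no longer single it out.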
