Word Embedding Bias

Metrics and debiasing methods for bias (such as gender and race) in word embeddings.

Important

The following paper suggests that current methods have only a superficial effect on the bias in word embeddings:

Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862.

Currently, two methods are supported:

  1. Bolukbasi et al. (2016) bias measure and debiasing - ethically.we.bias
  2. WEAT measure - ethically.we.weat

In addition, some of the standard benchmarks for word embeddings are available, primarily to check the impact of debiasing on embedding performance.

Refer to the Words Embedding demo for a complete usage example.

Bolukbasi Bias Measure and Debiasing

Measuring and adjusting bias in word embeddings, following Bolukbasi et al. (2016).

References:

  • Bolukbasi, T., Chang, K.-W., Zou, J. Y., Saligrama, V., & Kalai, A. T. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. Advances in Neural Information Processing Systems 29 (NIPS 2016).

Usage

>>> from ethically.we import GenderBiasWE
>>> from gensim import downloader
>>> w2v_model = downloader.load('word2vec-google-news-300')
>>> w2v_gender_bias_we = GenderBiasWE(w2v_model)
>>> w2v_gender_bias_we.calc_direct_bias()
0.07307904249481942
>>> w2v_gender_bias_we.debias()
>>> w2v_gender_bias_we.calc_direct_bias()
1.7964246601064155e-09

Types of Bias

Direct Bias

  1. Associations
    Words that are closer to one end (e.g., he) than to the other end (e.g., she). For example, occupational stereotypes (page 7). Calculated by calc_direct_bias().
  2. Analogies
    Analogies of the form he:x::she:y. For example, analogies exhibiting stereotypes (page 7). Generated by generate_analogies().

Indirect Bias

The projection of a neutral word onto a direction defined by two other neutral words is explained, to a large extent, by a shared projection on the bias direction.

Calculated by calc_indirect_bias() and generate_closest_words_indirect_bias(). A combined usage sketch of both bias types follows below.
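
A minimal usage sketch of both bias types, reusing the GenderBiasWE instance from the usage example above (the word choices are illustrative, not prescribed by the API):

>>> # Direct bias: average projection of neutral words (professions by default)
>>> w2v_gender_bias_we.calc_direct_bias()
>>> # Analogies along the gender direction (he:x::she:y)
>>> w2v_gender_bias_we.generate_analogies(n_analogies=10)
>>> # Indirect bias shared between two words through the gender direction
>>> w2v_gender_bias_we.calc_indirect_bias('softball', 'pitcher')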

class ethically.we.bias.BiasWordsEmbedding(model, only_lower=False, verbose=False, identify_direction=False)[source]

Bases: object

Measure and adjust bias in an English word embedding.

Parameters:
  • model – Word embedding model of gensim.models.KeyedVectors
  • only_lower (bool) – Whether the word embedding contains only lowercase words
  • verbose (bool) – Set verbosity
project_on_direction(word)[source]

Project the normalized vector of the word on the direction.

Parameters: word (str) – The word to project
Return float: The projection scalar
calc_projection_data(words)[source]

Calculate the projection, projected and rejected vectors of a list of words.

Parameters: words (list) – List of words
Returns: pandas.DataFrame of the projection, projected and rejected vectors of the words
plot_projection_scores(words, n_extreme=10, ax=None, axis_projection_step=None)[source]

Plot the projection scalar of words on the direction.

Parameters:
  • words (list) – The words to project
  • n_extreme (int or None) – The number of extreme words to show
Returns:

The ax object of the plot
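
For example, with the GenderBiasWE instance from the usage example above (the word list here is illustrative):

>>> ax = w2v_gender_bias_we.plot_projection_scores(
...     ['nurse', 'engineer', 'teacher', 'programmer'])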

plot_dist_projections_on_direction(word_groups, ax=None)[source]

Plot the projection scalars distribution on the direction.

Parameters: word_groups (dict) – The word groups to project
Returns: The ax object of the plot
classmethod plot_bias_across_words_embeddings(words_embedding_bias_dict, words, ax=None, scatter_kwargs=None)[source]

Plot the projections of the same words across two word embeddings.

Parameters:
  • words_embedding_bias_dict (dict) – BiasWordsEmbedding objects as values, and their names as keys.
  • words (list) – Words to be projected.
  • scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter.
Returns:

The ax object of the plot

generate_analogies(n_analogies=100, multiple=False, delta=1.0, restrict_vocab=30000)[source]

Generate analogies based on the bias direction.

x - y ~ direction, or a:x::b:y when a - b ~ direction.

delta controls semantic coherence; the default value of 1 corresponds to an angle <= pi/3.

Parameters:
  • n_analogies (int) – Number of analogies to generate.
  • multiple (bool) – Whether to allow multiple appearances of a word in the analogies.
  • delta (float) – Threshold for semantic similarity. The maximal distance between x and y.
  • restrict_vocab (int) – The vocabulary size to use.
Returns:

pandas.DataFrame of analogies (x, y), their distances, and their cosine similarity scores
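
A brief usage sketch, reusing the instance from the usage example above (the exact columns of the returned DataFrame may differ; consult the library documentation):

>>> analogies_df = w2v_gender_bias_we.generate_analogies(n_analogies=50)
>>> analogies_df.head()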

calc_direct_bias(neutral_words, c=None)[source]

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters:
  • neutral_words (list) – List of neutral words
  • c (float or None) – Strictness of bias measuring
Returns:

The direct bias
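
For reference, Bolukbasi et al. (2016) define the direct bias of a set of neutral words N with respect to a bias direction g as the average of |cos(w, g)|^c over all w in N. A minimal NumPy sketch of this definition, assuming unit-normalized vectors (an illustration, not the library's implementation):

import numpy as np

def direct_bias(neutral_vectors, direction, c=1.0):
    # Average of |cos(w, g)|**c over the neutral words.
    # neutral_vectors: shape (n_words, dim), rows unit-normalized.
    # direction: the bias direction g, unit-normalized, shape (dim,).
    # c: strictness of the bias measure (1 by default, as in the paper).
    cosines = neutral_vectors @ direction  # cosine similarity per word
    return np.mean(np.abs(cosines) ** c)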

calc_indirect_bias(word1, word2)[source]

Calculate the indirect bias between two words.

Based on the amount of shared projection of the words on the direction.

Also called PairBias.

Parameters:
  • word1 (str) – First word
  • word2 (str) – Second word
Returns:

The indirect bias between the two words

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words=None, n_extreme=5)[source]

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters:
  • neutral_positive_end (str) – A word that defines the positive side of the neutral direction.
  • neutral_negative_end (str) – A word that defines the negative side of the neutral direction.
  • words (list) – List of words to project on the neutral direction.
  • n_extreme (int) – The number of most extreme words (positive and negative) to show.
Returns:

pandas.DataFrame of the most extreme words with their projection scores and indirect biases.
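
For example, with the softball-football neutral direction used in the Bolukbasi et al. paper (an illustrative call, reusing the instance from the usage example above):

>>> w2v_gender_bias_we.generate_closest_words_indirect_bias('softball', 'football')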

debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]

Debias the word embedding.

Parameters:
  • method (str) – The method of debiasing.
  • neutral_words (list) – List of neutral words for the neutralize step
  • equality_sets (list) – List of equality sets, for the equalize step. The sets represent the direction.
  • inplace (bool) – Whether to debias the object inplace or return a new one

Warning

After calling debias, all the vectors of the word embedding will be normalized to unit length.
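
A minimal sketch of a non-inplace hard debias, assuming (per the inplace parameter above) that inplace=False returns a new, debiased object:

>>> debiased_we = w2v_gender_bias_we.debias(method='hard', inplace=False)
>>> debiased_we.calc_direct_bias()  # expected to be close to zero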

evaluate_words_embedding(kwargs_word_pairs=None, kwargs_word_analogies=None)[source]

Evaluate word pairs tasks and word analogies tasks.

Parameters:
  • kwargs_word_pairs (dict or None) – Kwargs for the evaluate_word_pairs method.
  • kwargs_word_analogies (dict or None) – Kwargs for the evaluate_word_analogies method.
Returns:

Tuple of pandas.DataFrame for the evaluation results.
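
A usage sketch, based on the return description above:

>>> word_pairs_results, analogies_results = w2v_gender_bias_we.evaluate_words_embedding()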

learn_full_specific_words(seed_specific_words, max_non_specific_examples=None, debug=None)[source]

Learn specific words given a list of seed specific words.

Uses a linear SVM.

Parameters:
  • seed_specific_words (list) – List of seed specific words
  • max_non_specific_examples (int) – The number of non-specific words to sample for training
Returns:

List of learned specific words and the classifier object
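
A usage sketch with the GenderBiasWE subclass documented below, whose seed_specific_words defaults to the Bolukbasi word list:

>>> specific_words, classifier = w2v_gender_bias_we.learn_full_specific_words()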

class ethically.we.bias.GenderBiasWE(model, only_lower=False, verbose=False, identify_direction=True)[source]

Bases: ethically.we.bias.BiasWordsEmbedding

Measure and adjust gender bias in an English word embedding.

Parameters:
  • model – Word embedding model of gensim.models.KeyedVectors
  • only_lower (bool) – Whether the word embedding contains only lowercase words
  • verbose (bool) – Set verbosity
plot_projection_scores(words='professions', n_extreme=10, ax=None, axis_projection_step=None)[source]

Plot the projection scalar of words on the direction.

Parameters:
  • words (list) – The words to project
  • n_extreme (int or None) – The number of extreme words to show
Returns:

The ax object of the plot

plot_dist_projections_on_direction(word_groups='bolukbasi', ax=None)[source]

Plot the projection scalars distribution on the direction.

Parameters: word_groups (dict) – The word groups to project
Returns: The ax object of the plot
classmethod plot_bias_across_words_embeddings(words_embedding_bias_dict, ax=None, scatter_kwargs=None)[source]

Plot the projections of the same words across two word embeddings.

Parameters:
  • words_embedding_bias_dict (dict) – BiasWordsEmbedding objects as values, and their names as keys.
  • scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter.
Returns:

The ax object of the plot

calc_direct_bias(neutral_words='professions', c=None)[source]

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters:
  • neutral_words (list) – List of neutral words
  • c (float or None) – Strictness of bias measuring
Returns:

The direct bias

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words='professions', n_extreme=5)[source]

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters:
  • neutral_positive_end (str) – A word that defines the positive side of the neutral direction.
  • neutral_negative_end (str) – A word that defines the negative side of the neutral direction.
  • words (list) – List of words to project on the neutral direction.
  • n_extreme (int) – The number of most extreme words (positive and negative) to show.
Returns:

pandas.DataFrame of the most extreme words with their projection scores and indirect biases.

debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]

Debias the word embedding.

Parameters:
  • method (str) – The method of debiasing.
  • neutral_words (list) – List of neutral words for the neutralize step
  • equality_sets (list) – List of equality sets, for the equalize step. The sets represent the direction.
  • inplace (bool) – Whether to debias the object inplace or return a new one

Warning

After calling debias, all the vectors of the word embedding will be normalized to unit length.

learn_full_specific_words(seed_specific_words='bolukbasi', max_non_specific_examples=None, debug=None)[source]

Learn specific words given a list of seed specific words.

Uses a linear SVM.

Parameters:
  • seed_specific_words (list) – List of seed specific words
  • max_non_specific_examples (int) – The number of non-specific words to sample for training
Returns:

List of learned specific words and the classifier object

WEAT

Compute the WEAT score of a word embedding.

WEAT (Word Embedding Association Test) is a bias measurement method for word embeddings, inspired by the IAT (Implicit Association Test) for humans. It measures the similarity between two sets of target words (e.g., programmer, engineer, scientist, … and nurse, teacher, librarian, …) and two sets of attribute words (e.g., man, male, … and woman, female, …). A p-value is calculated using a permutation test.
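
For intuition, a minimal NumPy sketch of the WEAT test statistic and effect size as defined by Caliskan et al. (2017); this illustrates the definition and is not the library's implementation (use ethically.we.weat for that):

import numpy as np

def association(w, A, B):
    # s(w, A, B): mean cosine similarity of w with attribute set A
    # minus the mean with attribute set B (all rows unit-normalized,
    # so the dot product equals the cosine similarity)
    return (A @ w).mean() - (B @ w).mean()

def weat(X, Y, A, B):
    # X, Y: matrices of target word vectors; A, B: attribute matrices
    s_X = np.array([association(x, A, B) for x in X])
    s_Y = np.array([association(y, A, B) for y in Y])
    score = s_X.sum() - s_Y.sum()  # the WEAT test statistic
    # Effect size: difference of means over the pooled standard deviation
    effect_size = ((s_X.mean() - s_Y.mean())
                   / np.concatenate([s_X, s_Y]).std(ddof=1))
    return score, effect_size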

Reference:

  • Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics Derived Automatically from Language Corpora Contain Human-like Biases. Science, 356(6334), 183-186.

Important

The effect size and p-value in the WEAT have an entirely different meaning from those reported in IATs (original finding). Refer to the paper for more details.

Stimulus and original finding from:

  • [0, 1, 2]: A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in implicit cognition: the implicit association test., Journal of Personality and Social Psychology 74, 1464 (1998).
  • [3, 4]: M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? a field experiment on labor market discrimination, The American Economic Review 94, 991 (2004).
  • [5, 6, 9]: B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site., Group Dynamics: Theory, Research, and Practice 6, 101 (2002).
  • [7]: B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math≠me., Journal of Personality and Social Psychology 83, 44 (2002).
  • [8]: P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics, Journal of Artificial Intelligence Research 37, 141 (2010).
ethically.we.weat.calc_single_weat(model, first_target, second_target, first_attribute, second_attribute, with_pvalue=True, pvalue_kwargs=None)[source]

Calculate the WEAT result of a word embedding.

Parameters:
  • model – Word embedding model of gensim.models.KeyedVectors
  • first_target (dict) – First target words list and its name
  • second_target (dict) – Second target words list and its name
  • first_attribute (dict) – First attribute words list and its name
  • second_attribute (dict) – Second attribute words list and its name
  • with_pvalue (bool) – Whether to calculate the p-value of the WEAT score (might be computationally expensive)
Returns:

WEAT result (score, effect size, Nt, Na and p-value)
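
A hedged usage sketch; each target/attribute argument is assumed here to be a dict with 'name' and 'words' keys, following the parameter descriptions above (check the ethically documentation for the exact format). The word lists are abbreviated from the Caliskan et al. stimuli:

>>> from ethically.we.weat import calc_single_weat
>>> calc_single_weat(
...     w2v_model,
...     first_target={'name': 'Science',
...                   'words': ['science', 'technology', 'physics', 'chemistry']},
...     second_target={'name': 'Arts',
...                    'words': ['poetry', 'art', 'dance', 'literature']},
...     first_attribute={'name': 'Male',
...                      'words': ['male', 'man', 'boy', 'brother']},
...     second_attribute={'name': 'Female',
...                       'words': ['female', 'woman', 'girl', 'sister']})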

ethically.we.weat.calc_all_weat(model, weat_data='caliskan', filter_by='model', with_original_finding=False, with_pvalue=True, pvalue_kwargs=None)[source]

Calculate the WEAT results of a word embedding on multiple cases.

Note that the effect size and p-value in the WEAT have an entirely different meaning from those reported in IATs (original finding). Refer to the paper for more details.

Parameters:
  • model – Word embedding model of gensim.models.KeyedVectors
  • weat_data (dict) – WEAT cases data
  • filter_by (str) – Whether to filter the word lists by the model (‘model’) or by the remove key in weat_data (‘data’).
  • with_original_finding (bool) – Whether to show the original IAT finding
  • with_pvalue (bool) – Whether to calculate the p-value of the WEAT results (might be computationally expensive)
Returns:

pandas.DataFrame of WEAT results (score, effect size, Nt, Na and p-value)
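
A usage sketch with the built-in Caliskan et al. cases (the default weat_data):

>>> from ethically.we.weat import calc_all_weat
>>> weat_results_df = calc_all_weat(w2v_model, weat_data='caliskan')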

Word Embedding Benchmarks

Evaluate word embeddings with standard benchmarks.

Reference:

Word Pairs Tasks

  1. The WordSimilarity-353 Test Collection http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
  2. Rubenstein, H., and Goodenough, J. 1965. Contextual correlates of synonymy https://www.seas.upenn.edu/~hansens/conceptSim/
  3. Stanford Rare Word (RW) Similarity Dataset https://nlp.stanford.edu/~lmthang/morphoNLM/
  4. The Word Relatedness Mturk-771 Test Collection http://www2.mta.ac.il/~gideon/datasets/mturk_771.html
  5. The MEN Test Collection http://clic.cimec.unitn.it/~elia.bruni/MEN.html
  6. SimLex-999 https://fh295.github.io/simlex.html
  7. TR9856 https://www.research.ibm.com/haifa/dept/vst/files/IBM_Debater_(R)_TR9856.v2.zip

Analogies Tasks

  1. Google Analogies (subset of WordRep) https://code.google.com/archive/p/word2vec/source
  2. MSR - Syntactic Analogies http://research.microsoft.com/en-us/projects/rnn/
ethically.we.benchmark.evaluate_word_pairs(model, kwargs_word_pairs=None)[source]

Evaluate word pairs tasks.

Parameters:
  • model – Word embedding.
  • kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.
Returns:

pandas.DataFrame of evaluation results.

ethically.we.benchmark.evaluate_word_analogies(model, kwargs_word_analogies=None)[source]

Evaluate word analogies tasks.

Parameters:
  • model – Word embedding.
  • kwargs_word_analogies (dict or None) – Kwargs for evaluate_word_analogies method.
Returns:

pandas.DataFrame of evaluation results.

ethically.we.benchmark.evaluate_words_embedding(model, kwargs_word_pairs=None, kwargs_word_analogies=None)[source]

Evaluate word pairs tasks and word analogies tasks.

Parameters:
  • model – Word embedding.
  • kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method.
  • kwargs_word_analogies (dict or None) – Kwargs for evaluate_word_analogies method.
Returns:

Tuple of pandas.DataFrame with the evaluation results.
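
A usage sketch of the module-level benchmark functions, reusing the word2vec model from the usage example above:

>>> from ethically.we.benchmark import evaluate_words_embedding
>>> word_pairs_results, analogies_results = evaluate_words_embedding(w2v_model)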