# Words Embedding Bias

Metrics and debiasing methods for bias (such as gender and race bias) in words embedding.

Important

The following paper suggests that the current methods have only a superficial effect on the bias in words embeddings:

Gonen, H., & Goldberg, Y. (2019). Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them. arXiv preprint arXiv:1903.03862.

Currently, two methods are supported:

1. Bolukbasi et al. (2016) bias measure and debiasing - ethically.we.bias
2. WEAT measure - ethically.we.weat

In addition, some of the standard benchmarks for words embeddings are available, primarily to check the impact of debiasing on the embedding's performance.

Refer to the Words Embedding demo for a complete usage example.

## Bolukbasi Bias Measure and Debiasing

Measuring and adjusting bias in words embedding, based on Bolukbasi et al. (2016).

References:

• Bolukbasi, T., Chang, K.-W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. NIPS 2016. arXiv:1607.06520.

### Usage

>>> from ethically.we import GenderBiasWE
>>> w2v_gender_bias_we = GenderBiasWE(w2v_model)
>>> w2v_gender_bias_we.calc_direct_bias()
0.07307904249481942
>>> w2v_gender_bias_we.debias()
>>> w2v_gender_bias_we.calc_direct_bias()
1.7964246601064155e-09


### Types of Bias

#### Direct Bias

1. Associations
Words that are closer to one end of the direction (e.g., he) than to the other (e.g., she). For example, occupational stereotypes (page 7 of the paper). Calculated by calc_direct_bias().
2. Analogies
Analogies of the form he:x::she:y. For example, analogies exhibiting stereotypes (page 7 of the paper). Generated by generate_analogies().
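The direct-bias measure can be sketched in a few lines of numpy. This is a simplified illustration of the DirectBias_c formula from the paper (the mean of |cos(w, g)|^c over neutral words w, for a unit bias direction g), not the library's implementation; the toy vectors are invented for illustration:

```python
import numpy as np

def direct_bias(neutral_vectors, direction, c=1.0):
    # DirectBias_c = (1 / |N|) * sum over neutral words w of |cos(w, g)|**c,
    # where g is the unit bias direction (Bolukbasi et al., 2016)
    g = direction / np.linalg.norm(direction)
    cosines = [v @ g / np.linalg.norm(v) for v in neutral_vectors]
    return float(np.mean(np.abs(cosines) ** c))

g = np.array([1.0, 0.0])                  # hypothetical bias direction
neutral = [np.array([0.0, 1.0]),          # orthogonal to g: contributes 0
           np.array([1.0, 1.0])]          # 45 degrees from g: |cos| ~ 0.707
print(direct_bias(neutral, g))            # (0 + 0.707...) / 2 ~ 0.354
```

Raising c above 1 (the strictness parameter) discounts words with small projections, so only strongly biased words contribute to the measure.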

#### Indirect Bias

A large portion of the projection of a neutral word onto a direction defined by two other neutral words can be explained by their shared projections onto the bias direction.
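This can be illustrated with a small numpy sketch of the paper's indirect-bias quantity beta(w, v): the share of the similarity w · v that disappears once the bias-direction components are removed. The vectors below are invented toy examples, and this is not the library's implementation:

```python
import numpy as np

def indirect_bias(w, v, g):
    # beta(w, v): share of the similarity w . v that is due to the unit
    # bias direction g (Bolukbasi et al., 2016); inputs are normalized
    w = w / np.linalg.norm(w)
    v = v / np.linalg.norm(v)
    g = g / np.linalg.norm(g)
    w_orth = w - (w @ g) * g    # component orthogonal to the bias direction
    v_orth = v - (v @ g) * g
    orth_sim = (w_orth @ v_orth) / (np.linalg.norm(w_orth) * np.linalg.norm(v_orth))
    return float((w @ v - orth_sim) / (w @ v))

g = np.array([1.0, 0.0, 0.0])             # hypothetical bias direction
w = np.array([1.0, 1.0, 0.0])
v = np.array([1.0, 0.0, 1.0])
print(indirect_bias(w, v, g))             # 1.0: w and v are similar only through g
```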

class ethically.we.bias.BiasWordsEmbedding(model, only_lower=False, verbose=False, identify_direction=False)[source]

Bases: object

Measure and adjust a bias in English words embedding.

Parameters:
- model – Words embedding model of gensim.models.KeyedVectors
- only_lower (bool) – Whether the words embedding contains only lower-case words
- verbose (bool) – Set verbosity
project_on_direction(word)[source]

Project the normalized vector of the word on the direction.

Parameters:
- word (str) – The word to project

Returns: The projection scalar
calc_projection_data(words)[source]

Calculate projection, projected and rejected vectors of a words list.

Parameters:
- words (list) – List of words

Returns: pandas.DataFrame of the projection, projected, and rejected vectors of the words list
plot_projection_scores(words, n_extreme=10, ax=None, axis_projection_step=None)[source]

Plot the projection scalar of words on the direction.

Parameters:
- words (list) – The words to project, or None
- n_extreme (int) – The number of extreme words to show

Returns: The ax object of the plot
plot_dist_projections_on_direction(word_groups, ax=None)[source]

Plot the projection scalars distribution on the direction.

Parameters:
- word_groups (dict) – The groups to project

Returns: The ax object of the plot
classmethod plot_bias_across_words_embeddings(words_embedding_bias_dict, words, ax=None, scatter_kwargs=None)[source]

Plot the projections of the same words from two words embeddings.

Parameters:
- words_embedding_bias_dict (dict) – WordsEmbeddingBias objects as values, and their names as keys
- words (list) – Words to be projected
- scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter

Returns: The ax object of the plot
generate_analogies(n_analogies=100, multiple=False, delta=1.0, restrict_vocab=30000)[source]

Generate analogies based on the bias direction.

x - y ~ direction, or a:x::b:y when a - b ~ direction.

delta is used to enforce semantic coherence. The default value of 1 corresponds to an angle <= pi/3.

Parameters:
- n_analogies (int) – Number of analogies to generate
- multiple (bool) – Whether to allow multiple appearances of a word in the analogies
- delta (float) – Threshold for semantic similarity; the maximal distance between x and y
- restrict_vocab (int) – The vocabulary size to use

Returns: Data Frame of analogies (x, y), their distances, and their cosine similarity scores
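The relation between delta and the angle follows from the chord length between unit vectors, ||x - y|| = 2·sin(theta/2), which equals 1 exactly at theta = pi/3. A quick numeric check (toy vectors, for illustration only):

```python
import numpy as np

# For unit vectors, ||x - y|| = 2 * sin(theta / 2); at theta = pi / 3
# (60 degrees) the distance is exactly 1, hence delta = 1 <=> angle <= pi/3.
theta = np.pi / 3
x = np.array([1.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta)])
print(np.linalg.norm(x - y))              # ~1.0
```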
calc_direct_bias(neutral_words, c=None)[source]

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters:
- neutral_words (list) – List of neutral words
- c (float or None) – Strictness of bias measuring

Returns: The direct bias
calc_indirect_bias(word1, word2)[source]

Calculate the indirect bias between two words.

Based on the amount of shared projection of the words on the direction.

Also called PairBias.

Parameters:
- word1 (str) – First word
- word2 (str) – Second word

Returns: The indirect bias between the two words

generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words=None, n_extreme=5)[source]

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters:
- neutral_positive_end (str) – A word that defines the positive side of the neutral direction
- neutral_negative_end (str) – A word that defines the negative side of the neutral direction
- words (list) – List of words to project on the neutral direction
- n_extreme (int) – The number of the most extreme words (positive and negative) to show

Returns: Data Frame of the most extreme words with their projection scores and indirect biases
debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]

Debias the words embedding.

Parameters:
- method (str) – The method of debiasing
- neutral_words (list) – List of neutral words for the neutralize step
- equality_sets (list) – List of equality sets for the equalize step; the sets represent the direction
- inplace (bool) – Whether to debias the object in place or return a new one

Warning

After calling debias, all the vectors of the words embedding will be normalized to unit length.
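For intuition, the two steps of hard debiasing can be sketched in numpy: neutralize removes the bias-direction component of a word vector, and equalize makes each equality pair symmetric around the bias direction. This is a simplified illustration of the paper's procedure with invented toy vectors, not the library's code:

```python
import numpy as np

def neutralize(w, g):
    # Neutralize step: remove the bias-direction component and re-normalize
    g = g / np.linalg.norm(g)
    w_orth = w - (w @ g) * g
    return w_orth / np.linalg.norm(w_orth)

def equalize(pair, g):
    # Equalize step: center the pair outside the bias direction and keep only
    # equal-and-opposite components along it, so both words end up
    # equidistant from every neutralized word
    g = g / np.linalg.norm(g)
    mu = (pair[0] + pair[1]) / 2
    mu_orth = mu - (mu @ g) * g
    scale = np.sqrt(max(0.0, 1.0 - mu_orth @ mu_orth))
    equalized = []
    for w in pair:
        bias_diff = (w @ g) * g - (mu @ g) * g   # w_B - mu_B
        equalized.append(mu_orth + scale * bias_diff / np.linalg.norm(bias_diff))
    return equalized

g = np.array([1.0, 0.0])                         # hypothetical bias direction
he, she = np.array([0.8, 0.6]), np.array([-0.6, 0.8])
he_eq, she_eq = equalize([he, she], g)           # unit vectors, opposite g-components
neutral = neutralize(np.array([0.3, 0.4]), g)    # now orthogonal to g
```

Both steps return unit-length vectors, which is consistent with the normalization warning above.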

evaluate_words_embedding(kwargs_word_pairs=None, kwargs_word_analogies=None)[source]

Evaluate the words embedding with standard benchmarks.

Parameters:
- kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method
- kwargs_word_analogies (dict or None) – Kwargs for evaluate_word_analogies method

Returns: Tuple of pandas.DataFrame for the evaluation results
learn_full_specific_words(seed_specific_words, max_non_specific_examples=None, debug=None)[source]

Learn specific words given a list of seed specific words.

Uses a linear SVM.

Parameters:
- seed_specific_words (list) – List of seed specific words
- max_non_specific_examples (int) – The number of non-specific words to sample for training

Returns: List of learned specific words and the classifier object
class ethically.we.bias.GenderBiasWE(model, only_lower=False, verbose=False, identify_direction=True)[source]

Measure and adjust the Gender Bias in English Words Embedding.

Parameters:
- model – Words embedding model of gensim.models.KeyedVectors
- only_lower (bool) – Whether the words embedding contains only lower-case words
- verbose (bool) – Set verbosity
plot_projection_scores(words='professions', n_extreme=10, ax=None, axis_projection_step=None)[source]

Plot the projection scalar of words on the direction.

Parameters:
- words (list) – The words to project, or None
- n_extreme (int) – The number of extreme words to show

Returns: The ax object of the plot
plot_dist_projections_on_direction(word_groups='bolukbasi', ax=None)[source]

Plot the projection scalars distribution on the direction.

Parameters:
- word_groups (dict) – The groups to project

Returns: The ax object of the plot
classmethod plot_bias_across_words_embeddings(words_embedding_bias_dict, ax=None, scatter_kwargs=None)[source]

Plot the projections of the same words from two words embeddings.

Parameters:
- words_embedding_bias_dict (dict) – WordsEmbeddingBias objects as values, and their names as keys
- words (list) – Words to be projected
- scatter_kwargs (dict or None) – Kwargs for matplotlib.pylab.scatter

Returns: The ax object of the plot
calc_direct_bias(neutral_words='professions', c=None)[source]

Calculate the direct bias.

Based on the projection of neutral words on the direction.

Parameters:
- neutral_words (list) – List of neutral words
- c (float or None) – Strictness of bias measuring

Returns: The direct bias
generate_closest_words_indirect_bias(neutral_positive_end, neutral_negative_end, words='professions', n_extreme=5)[source]

Generate closest words to a neutral direction and their indirect bias.

The direction of the neutral words is used to find the most extreme words. The indirect bias is calculated between the most extreme words and the closest end.

Parameters:
- neutral_positive_end (str) – A word that defines the positive side of the neutral direction
- neutral_negative_end (str) – A word that defines the negative side of the neutral direction
- words (list) – List of words to project on the neutral direction
- n_extreme (int) – The number of the most extreme words (positive and negative) to show

Returns: Data Frame of the most extreme words with their projection scores and indirect biases
debias(method='hard', neutral_words=None, equality_sets=None, inplace=True)[source]

Debias the words embedding.

Parameters:
- method (str) – The method of debiasing
- neutral_words (list) – List of neutral words for the neutralize step
- equality_sets (list) – List of equality sets for the equalize step; the sets represent the direction
- inplace (bool) – Whether to debias the object in place or return a new one

Warning

After calling debias, all the vectors of the words embedding will be normalized to unit length.

learn_full_specific_words(seed_specific_words='bolukbasi', max_non_specific_examples=None, debug=None)[source]

Learn specific words given a list of seed specific words.

Uses a linear SVM.

Parameters:
- seed_specific_words (list) – List of seed specific words
- max_non_specific_examples (int) – The number of non-specific words to sample for training

Returns: List of learned specific words and the classifier object

## WEAT

Compute the WEAT score of a words embedding.

WEAT is a bias measurement method for words embedding, inspired by the IAT (Implicit Association Test) for humans. It measures the similarity between two sets of target words (e.g., programmer, engineer, scientist, … and nurse, teacher, librarian, …) and two sets of attribute words (e.g., man, male, … and woman, female, …). A p-value is calculated using a permutation test.
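The WEAT statistic, effect size, and permutation-test p-value can be sketched with numpy. This follows the formulas in Caliskan et al. on invented toy vectors; looking up words in an actual embedding model is omitted, and this is not the library's implementation:

```python
import numpy as np
from itertools import combinations

def _cos(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def _assoc(w, A, B):
    # s(w, A, B): mean similarity to attribute set A minus mean similarity to B
    return np.mean([_cos(w, a) for a in A]) - np.mean([_cos(w, b) for b in B])

def weat_score(X, Y, A, B):
    # s(X, Y, A, B) = sum_x s(x, A, B) - sum_y s(y, A, B)
    return sum(_assoc(x, A, B) for x in X) - sum(_assoc(y, A, B) for y in Y)

def weat_effect_size(X, Y, A, B):
    # Difference of mean associations, scaled by the pooled std. deviation
    s = [_assoc(w, A, B) for w in X + Y]
    return (np.mean(s[:len(X)]) - np.mean(s[len(X):])) / np.std(s, ddof=1)

def weat_pvalue(X, Y, A, B):
    # One-sided permutation test: the fraction of equal-size repartitions of
    # X + Y whose statistic is at least the observed one
    observed = weat_score(X, Y, A, B)
    union = X + Y
    scores = [weat_score([union[i] for i in idx],
                         [union[i] for i in range(len(union)) if i not in idx],
                         A, B)
              for idx in combinations(range(len(union)), len(X))]
    return float(np.mean([s >= observed for s in scores]))

# Toy targets/attributes: X leans toward A, Y leans toward B
X = [np.array([1.0, 0.1]), np.array([0.9, 0.0])]
Y = [np.array([0.1, 1.0]), np.array([0.0, 0.9])]
A = [np.array([1.0, 0.0])]
B = [np.array([0.0, 1.0])]
print(weat_score(X, Y, A, B))    # positive: the stereotypical association
print(weat_pvalue(X, Y, A, B))   # 1/6: only the observed split scores this high
```

Enumerating all partitions is exponential in the set sizes, which is why computing the p-value on the real word lists can be computationally expensive, as noted below.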

Reference:

• Caliskan, A., Bryson, J. J., & Narayanan, A. (2017). Semantics derived automatically from language corpora contain human-like biases. Science, 356(6334), 183–186.

Important

The effect size and p-value in the WEAT have an entirely different meaning from those reported in IATs (original findings). Refer to the paper for more details.

Stimulus and original finding from:

• [0, 1, 2]: A. G. Greenwald, D. E. McGhee, J. L. Schwartz, Measuring individual differences in implicit cognition: the implicit association test. Journal of Personality and Social Psychology 74, 1464 (1998).
• [3, 4]: M. Bertrand, S. Mullainathan, Are Emily and Greg more employable than Lakisha and Jamal? A field experiment on labor market discrimination. The American Economic Review 94, 991 (2004).
• [5, 6, 9]: B. A. Nosek, M. Banaji, A. G. Greenwald, Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice 6, 101 (2002).
• [7]: B. A. Nosek, M. R. Banaji, A. G. Greenwald, Math=male, me=female, therefore math≠me. Journal of Personality and Social Psychology 83, 44 (2002).
• [8]: P. D. Turney, P. Pantel, From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37, 141 (2010).
ethically.we.weat.calc_single_weat(model, first_target, second_target, first_attribute, second_attribute, with_pvalue=True, pvalue_kwargs=None)[source]

Calculate the WEAT result of a words embedding.

Parameters:
- model – Words embedding model of gensim.models.KeyedVectors
- first_target (dict) – First target words list and its name
- second_target (dict) – Second target words list and its name
- first_attribute (dict) – First attribute words list and its name
- second_attribute (dict) – Second attribute words list and its name
- with_pvalue (bool) – Whether to calculate the p-value of the WEAT score (might be computationally expensive)

Returns: WEAT result (score, effect size, Nt, Na and p-value)
ethically.we.weat.calc_all_weat(model, weat_data='caliskan', filter_by='model', with_original_finding=False, with_pvalue=True, pvalue_kwargs=None)[source]

Calculate the WEAT results of a words embedding on multiple cases.

Note that the effect size and p-value in the WEAT have an entirely different meaning from those reported in IATs (original findings). Refer to the paper for more details.

Parameters:
- model – Words embedding model of gensim.models.KeyedVectors
- weat_data (dict) – WEAT cases data
- filter_by (str) – Whether to filter the word lists by the model ('model') or by the remove key in weat_data ('data')
- with_original_finding (bool) – Whether to show the original finding
- with_pvalue (bool) – Whether to calculate the p-value of the WEAT results (might be computationally expensive)

Returns: pandas.DataFrame of WEAT results (score, effect size, Nt, Na and p-value)

## Words Embedding Benchmarks

Evaluate words embedding by standard benchmarks.

Reference:

1. The WordSimilarity-353 Test Collection http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
2. Rubenstein, H., and Goodenough, J. 1965. Contextual correlates of synonymy https://www.seas.upenn.edu/~hansens/conceptSim/
3. Stanford Rare Word (RW) Similarity Dataset https://nlp.stanford.edu/~lmthang/morphoNLM/
4. The Word Relatedness Mturk-771 Test Collection http://www2.mta.ac.il/~gideon/datasets/mturk_771.html
5. The MEN Test Collection http://clic.cimec.unitn.it/~elia.bruni/MEN.html
6. SimLex-999 https://fh295.github.io/simlex.html
7. TR9856 https://www.research.ibm.com/haifa/dept/vst/files/IBM_Debater_(R)_TR9856.v2.zip

8. MSR - Syntactic Analogies http://research.microsoft.com/en-us/projects/rnn/
ethically.we.benchmark.evaluate_word_pairs(model, kwargs_word_pairs=None)[source]

Parameters:
- model – Words embedding.
- kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method

Returns: pandas.DataFrame of evaluation results
ethically.we.benchmark.evaluate_word_analogies(model, kwargs_word_analogies=None)[source]

Parameters:
- model – Words embedding.
- kwargs_word_analogies (dict or None) – Kwargs for evaluate_word_analogies method

Returns: pandas.DataFrame of evaluation results
ethically.we.benchmark.evaluate_words_embedding(model, kwargs_word_pairs=None, kwargs_word_analogies=None)[source]

Evaluate the words embedding with standard benchmarks.

Parameters:
- model – Words embedding.
- kwargs_word_pairs (dict or None) – Kwargs for evaluate_word_pairs method
- kwargs_word_analogies (dict or None) – Kwargs for evaluate_word_analogies method

Returns: Tuple of pandas.DataFrame for the evaluation results