Curiosity. Creativity. Courage. Community.
Welcome to the second issue of the CRT-AI Newsletter "The Turing Point". This time we talk about bias in AI. Even though the topic has received quite a lot of attention in recent years, the level of awareness remains low. Knowing what tools a scientist can use to spot, evaluate and mitigate any form of bias is of great importance. For this issue, our team prepared a list of interesting articles, tools and datasets that can help you on your PhD journey.
Do you want to know how biased your mind is? Try taking The Implicit Association Test (IAT).
1) Scroll down and press “I wish to proceed”
2) Find the “Gender – Science” button
3) Do that test
In recent years, the scientific community has started to wrestle with just how much human biases can make their way into artificial intelligence systems, with harmful results.
"Bias in AI occurs when results cannot be generalized widely. We often think of bias resulting from preferences or exclusions in training data, but bias can also be introduced by how data is obtained, how algorithms are designed, and how AI outputs are interpreted.
How does bias get into AI? Everybody thinks of bias in training data – the data used to develop an algorithm before it is tested on the wide world. But this is only the tip of the iceberg," says Dr. Sanjiv M. Narayan from Stanford University.
Our team has prepared several tools and articles about datasets to help you explore and work with bias. You can reach them by clicking the tabs "tools" and "datasets". To truly understand where bias begins, you can try The Implicit Association Test.
Knowing that a problem exists is not enough; one might want to solve it. Below you can find four popular tools that can assist you with evaluating and mitigating bias. To explore a tool further, just click on its title (its GitHub page will open).
A broad-purpose tool that helps surface under-representation and differing representation in visual datasets along three dimensions: object-based, person-based and geography-based (locations). It reports its findings and leaves the judgment of what counts as 'biased' to the user’s expertise, since the notion is contested. The tool works by investigating datasets and their annotations to identify model-agnostic patterns.
An open-source tool that requires minimal coding and helps to probe, visualize and analyze ML systems. It helps practitioners test hypothetical cases, analyze the importance of data features, and visualize model behavior across different models and subsets of input data. The tool supports local and global model understanding and different data and model types. It also helps evaluate performance and fairness. One drawback is that it works only with TensorFlow.
Clarify is integrated with Amazon SageMaker, a fully managed service for building, training and deploying ML models at any scale. Clarify helps with bias detection and feature importance computation across the ML lifecycle: data preparation, model evaluation and post-deployment monitoring. The tool currently supports 11 metrics. It implements the model-agnostic KernelSHAP algorithm for explainability, as it performs favorably compared to LIME.
LiFT is a framework for scalable computation of fairness metrics for large ML systems. It supports 6 fairness metrics. The other publicly available fairness toolkits are designed for single system execution, constraining them to smaller datasets. The tool focuses on providing a framework that is flexible, scalable and easy to integrate into offline workflows rather than on offering a comprehensive set of fairness metrics. It leverages distributed computation where possible.
The introduction of AI accentuates bias phenomena in human civilization, such as racial prejudice and gender discrimination, which have not been eradicated in human society, especially in the digital age. This issue focuses on gender bias in AI, as seen in the papers mentioned in the table, which range from recruiting to advertising suggestions, image search, machine translation, and voice assistant speech recognition, among other topics. As these papers demonstrate, because of the unequal gender balance of the samples included in datasets and the gender imbalance among the developers working on AI research, it is impossible to totally eradicate gender bias in AI in the short term.
Amazon's artificial intelligence program penalized candidates with the word "women" in their resume, sources say. The company has edited the program to be neutral to these terms, but sources said that was no guarantee that the machines would not devise other discriminatory ways of sorting candidates.
Google showed ads to simulated male users that promised large salaries more frequently than to simulated female users. This is the first study to provide statistically significant evidence of discrimination in online advertising when demographic information is supplied via a transparency-control mechanism (i.e., the Ad Settings page).
Gender bias is evident in image search results for a variety of occupations. Findings include the fact that the top 100 results for "Chief Executive" only return 10% women, and images chosen to represent those careers are biased towards male-dominated fields.
Google Translate exhibits a strong tendency toward male defaults. This is particularly true for fields associated with unbalanced gender or stereotypes such as STEM (Science, Technology, Engineering, and Mathematics) jobs. Research provides experimental evidence that Google Translate yields male defaults much more frequently than what would be expected from demographic data.
A White American male has a 92% accuracy rate when it comes to being understood by voice-enabled assistants. A mixed-race American woman only has a 69% chance of being understood, according to the US National Association for Voice-Enabled Assistants (VAAD).
The Gender Shades project evaluated the accuracy of AI-powered gender classification products. All companies perform better on males than females, with an 8.1% - 20.6% difference in error rates. The results were clear when analyzed by intersectional subgroups.
Fancy a collaboration with your CRT-AI mates? You have created that amazing tool but nobody knows about it? You have spent hundreds of hours gathering this incredible database but you are the only one using it? We have your back!
The CRT-AI’s Padlet now provides you with a nice place to advertise your work and find your AI soulmate <3. Whether you are looking for a collaboration or have a nice database to share, you can publish a post with everything you want people to know. Even if that isn’t your case yet, remember to save this Padlet to your favorites; you could be surprised by how inspiring this mood board will soon be! Here we welcome every single one of our members and want everyone to feel comfortable. You can therefore post by yourself or by filling in our special form!
Despite all of them being synthetic voices, default assistants’ voices have one thing in common: they sound feminine. Here, we try to investigate why it is so.
Although these frequencies overlap, women and men tend to use different frequency ranges when they speak. They also have distinct intonation patterns and, more broadly, their speech is tinted with lexical, syntactic and pronunciation differences.
Research shows that pitch tends to differ: typical male voices usually lie between 100 Hz and 200 Hz, while typical female voices lie between 120 Hz and 350 Hz.
Yet, it is interesting to notice the overlap between these two vocal ranges.
Voices also differ in terms of prosody, i.e., the intonation of an utterance follows different patterns depending on the gender of the person (e.g., rising intonation at the end of an assertive sentence is more common in women’s speech).
On top of that, speech is not just a voice, and many sociolinguistic works have highlighted the differences in pronunciation, vocabulary, and syntax between women's and men’s speech.
Tough one. While puberty impacts the vocal tract of men and women, language is socially constructed. Proof of this is that by the age of 4, children have already embraced the linguistic form that matches their gender.
That is the one big question in psycholinguistics, the field that studies language development in humans.
Innate to some extent…
Yes, with puberty, the length of the vocal cords and the shape of the larynx change and grow differently depending on the sex of the individual. These organs determine the fundamental frequency of someone’s speech, thus impacting their pitch (you can read more about this by researching the 'source-filter theory' in speech processing).
…but mostly learnt.
Yet, voice is such a big part of someone’s identity that it necessarily matches some social criteria. It can be trained to match social expectations (see the outrageous quantity of YouTube videos on training a deeper voice). Not only are actors and actresses often able to use different voices, but any individual might change their pitch depending on the context of the interaction and their intention. Thus, even though the distinction between male and female voices becomes clearer during the teenage years, when one’s body undergoes various changes, the social pressure to fit in the box that teenagers experience throughout these years is another aspect that should not be overlooked. Another example of the social impact of gender on voices is that children tend to have distinct voices by age 4 despite still having the exact same vocal tract.
Finally, and as mentioned in the previous paragraph, all the distinctions in vocabulary and intonation between genders are strong supports of the social aspect of speech.
Female voices are preferred by customers, and science overwhelmingly shows that they are more intelligible, even though this has nothing to do with their acoustic properties.
The social construction of the female voice
Higher pitch is usually associated with youth and femininity. It is perceived as more melodious and as uttered by someone calmer, less threatening, and more submissive. Taken to the extreme, however, it might also sound too shrill. On the other hand, lower pitch, associated with masculinity, is perceived as more threatening and determined, but also suaver.
Technology wise: a recipe for success
1. Find the scientific support you need
The literature is quite solid on that topic, and female voices are overwhelmingly reported as being more intelligible. It is hypothesized that this is caused by pronunciation differences between female and male speakers rather than by the higher pitch of female voices. These differences could be overcome by training the assistant to pronounce vowels well.
In fact, higher-pitched voices are even detrimental to people suffering from presbycusis (hearing loss): higher frequencies are the first ones to be affected, making it harder for older people to hear a feminine voice.
2. Please your customers
Yet, the use of stereotypical voices allows a better mental representation of the vocal assistants (AI), which is beneficial to the company: “this is because users prefer simple, easy-to-understand interface, and a stereotype can provide a solid image of computer agent that is more familiar and acceptable to users.”
3. Confirm their biases
Further investigation revealed to what extent prejudices about women were embodied within the technology. By simply engaging in conversation with South Korean vocal assistants (VAs), researchers found that the AIs depicted themselves as beautiful women who would remain nice, available and quiet, even when insulted and sexualised.
They’ve already done it!
A Suitable Neutral Voice? Really?
Here, we question the neutrality of this voice, and argue that it might just move the problem somewhere else and reinforce people’s prejudices. We examined the question of neutrality in linguistics and reached the conclusion that we, speakers of a language, have the power to decide what is neutral… but only within our own linguistic community.
Indeed, people cannot agree on the gender of this voice. In this sense, this voice sounds gender-neutral. Yet, its neutrality might stop just there. Q was developed thanks to recordings of non-binary people. The assumption was that, considering the hugely social aspect of voice, their voices were more likely to be non-gendered. Even though this voice does not perfectly fit in the box of female or male, it is not outside of all boxes either. The prosody, intonation and vocabulary may reproduce patterns creating new prejudices. In linguistics, neutrality usually just means “standardised”. What is standard and what people are used to is considered neutral. Many people who listened to Q were confused and unable to get a mental representation of “who” was speaking. This does not necessarily mean that we should give up on this voice… On the contrary, maybe we should make it more prominent and encourage its use, until it becomes a new standard.
A Global Challenge?
In conclusion, all examples discussed were taken from English-speaking voices. The question of cross-language representation still needs to be addressed. Speech frequencies, prosodic contour, phonological systems, and many more aspects of speech are language-specific. Therefore, these “neutral voices” have to be thought of and created for every language, and every accent. Furthermore, if Q’s methodology were adopted by every linguistic community, non-binary people would have to be recruited in each of them. There may not be a global answer, and perhaps all we can do is to speak up for our own linguistic community and support what sounds good to us, encouraging more representation and fewer biases in voices.
 C. Pernet and P. Belin, “The Role of Pitch and Timbre in Voice Gender Categorization,” Front. Psychol., vol. 3, 2012, Accessed: Jun. 17, 2022. [Online]. Available: https://www.frontiersin.org/article/10.3389/fpsyg.2012.00023
 Anthony Pym, Do women and men use language the same way?, (Jan. 25, 2019). Accessed: Jun. 16, 2022. [Online Video]. Available: https://www.youtube.com/watch?v=Txd93vZQHWU
 J. Holmes and N. Wilson, An introduction to sociolinguistics, Fifth Edition. London ; New York, NY: Routledge, 2017.
 L. Véron, “La voix neutre n’existe pas.” Accessed: Jun. 17, 2022. [Online]. Available: https://www.binge.audio/podcast/parler-comme-jamais/la-voix-neutre-nexiste-pas/
 H. K. Vorperian and R. D. Kent, “Vowel Acoustic Space Development in Children: A Synthesis of Acoustic and Anatomic Data,” J. Speech Lang. Hear. Res. JSLHR, vol. 50, no. 6, pp. 1510–1545, Dec. 2007, doi: 10.1044/1092-4388(2007/104).
 A. Arnold and M. Candea, “Comment étudier l’influence des stéréotypes de genre et de race sur la perception de la parole ?,” Lang. Société, vol. 152, no. 2, pp. 75–96, 2015, doi: 10.3917/ls.152.0075.
 A. Arnold, “La voix genrée, entre idéologies et pratiques – Une étude sociophonétique,” p. 311.
 A. R. Bradlow, G. M. Torretta, and D. B. Pisoni, “Intelligibility of normal speech I: Global and fine-grained acoustic-phonetic talker characteristics,” Speech Commun., vol. 20, no. 3, pp. 255–272, Dec. 1996, doi: 10.1016/S0167-6393(96)00063-5.
 H.-B. Kwon, “Gender difference in speech intelligibility using speech intelligibility tests and acoustic analyses,” J. Adv. Prosthodont., vol. 2, no. 3, pp. 71–76, Sep. 2010, doi: 10.4047/jap.2010.2.3.71.
 G. Hwang, J. Lee, C. Y. Oh, and J. Lee, “It Sounds Like A Woman: Exploring Gender Stereotypes in South Korean Voice Assistants,” in Extended Abstracts of the 2019 CHI Conference on Human Factors in Computing Systems, New York, NY, USA, May 2019, pp. 1–6. doi: 10.1145/3290607.3312915.
Episode 2 of the Turing Point Podcast! CRT-AI PhD students Anais Murat and Sharmi Dev Gupta moderate a lively conversation with Dr. Begum Genc and CRT-AI student Buvana Ganesh about the impacts of gender bias in Artificial Intelligence.
Implicit bias refers to a bias or prejudice that is present and affects our understanding and actions but is not consciously held. Such bias includes any attitudes or stereotypes that influence our decisions and behaviour in an unconscious manner. This type of bias is implicit because we may be unaware that our thoughts and feelings towards people or ideas are not neutral. We can act on the basis of stereotypes or prejudice without meaning to do so. These biases may exist towards ethnic groups, gender, physical abilities, sexual orientation and more. Often implicit biases are observed and absorbed in social, cultural, familial and institutional environments. For example, we may have a tendency to prefer people with types of names that are similar to our own and judge others who have a different type of name. The best way to reduce implicit biases is to become aware of them.
With explicit bias, individuals are aware of their attitudes and beliefs about a person or group. They consciously hold preferences (negative or positive) about certain ideas, perceived groups and people. The key difference between implicit and explicit bias is that one happens at a subconscious level and the other at the conscious level. Explicit bias is relatively easy to identify as it is made obvious through conscious expressions, such as discrimination or hate speech. Examples of explicit bias include racism, sexism, ageism and xenophobia. Since explicit biases are part of deliberate thought, and based on prejudice and stereotypes, they can be changed. This can come from understanding the bias through learning from others, having positive interactions and reading more about the bias.
Data feminism is a framework for thinking about data, informed by intersectional feminist thought. It is characterised by a focus on direct experience and by a commitment to action. Data feminism acknowledges that power is not equally distributed in the world and that data is a form of power for those who have the resources to collect and analyse it. It uncovers how standard practices in data science serve to reinforce existing inequalities, such as in medical care and hiring processes. However, data feminism is also a call to use data science to challenge and change the distribution of power. It questions binary and hierarchical classification systems in order to promote equitable and gender-sensitive data systems. Data feminism isn’t only about women, isn’t only for women and isn’t only about gender. It is about power and working towards a society where everyone benefits equitably from data and technology.
Underpinning automation bias is the human tendency to trust automated systems over our own judgements. A simple example is blindly following GPS when our instinct is that we are not at the correct location. In an increasingly AI-driven and automated world, many decisions are made by computer systems rather than by humans. Automation bias describes this predisposition to depend excessively on automated systems. Over-reliance on automation can lead to incorrect automated information overriding correct decisions. For example, an incorrect word substitution made by autocorrection in an article may be missed by a human editor who trusted the automated system to catch errors. More serious examples come into play when we look at applications of automation in healthcare, such as clinical decision support systems, where consequences of an error can be serious. Training and emphasising user accountability are ways to mitigate automation bias.
Algorithmic fairness aims to understand and prevent bias in machine learning. All models will make incorrect predictions but if these errors systematically disadvantage a particular group of people, that model is considered biased or unfair. For example, an unfair model could offer lower credit limits more often to women than men. While the algorithms themselves, as mathematical functions, are not inherently biased, they are trained on data that reflects historical, social and cultural human biases. Examples of algorithmic bias or unfairness have been found in many fields, including image search results, financial decision-making and predictive policing. As machine learning models are increasingly being used to make important decisions in all aspects of our lives, the consequences of incorrect predictions could be harmful, not just for the individual but for entire groups. Efforts to achieve algorithmic fairness look at ways to ensure model outputs are not correlated with certain sensitive or protected characteristics such as age, race, gender or sexuality.
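To make the credit-limit example concrete, here is a minimal sketch of one common way to quantify such a disparity: comparing the rate of favourable predictions across groups. The data, the group labels and the function names are hypothetical and purely illustrative; real audits typically rely on dedicated fairness toolkits and on more than one metric.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of favourable predictions (1) for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(predictions, groups):
    """Gap between the highest and lowest group selection rates.

    0.0 means every group receives favourable predictions at the same rate.
    """
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data: 1 = higher credit limit offered, 0 = not offered.
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["men", "men", "men", "men", "women", "women", "women", "women"]

rates = selection_rates(preds, groups)
gap = demographic_parity_difference(preds, groups)
print(rates, gap)
```

A gap close to zero does not prove that a model is fair, but a large gap is a useful signal that its outputs are correlated with a protected characteristic and deserve a closer look.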
The meetup was held in the BNY Mellon offices in Dublin on 25 May 2022. Researchers from third-level institutions and guests from industry were in attendance. During the event, the panelists shared their career stories, from where they started to where they currently are. They did not fail to bring in the good, the bad, and the ugly, as the theme of the event was Diversity Driving Innovation. This was a very topical issue and the aim was to increase awareness around women's participation in the IT sector.
Dr Suzanne Little opened with a discussion around her career and research to date and mentioned that she had no intention of becoming a lecturer at the start of her career. She also touched on how she seized opportunities and how her ambitions and curiosity positively impacted her life.
Joanna Murphy also contributed her story of how she switched between jobs and ultimately reached her current position. She stressed the fact that it was not always plain sailing. During the meetup, participants had the opportunity to ask questions about how industries are adapting culturally to include more women in their organizations. Dr Little touched upon the fact that just reaching out to other women to encourage them to enter the field is not enough.
Eoin Lane shared a story about a female colleague in his department who was performing remarkably well and yet left the industry. He asked the panel and audience what factors could affect women's decision making when considering a career change or a move out of the industry.
Emma Nutt is considered the first woman telephone operator, dating back to 1878. Her voice was so well received that she soon became the standard for all other companies. By the 1880s, all telephone operators were women. We now have over a hundred years of female voice recordings that can be used to create new automated voice assistants. Gender bias in voice automation reflects this lack of male voice data as well as accepted assumptions about the female voice.
Developers rely on female voices because of the lack of an equivalent male voice training set. As a result, the first speech assistant was made using a female voice, as it was easier to use existing data than to develop a new dataset. Furthermore, even if a new male voice dataset were created, there is a risk that it might not be as well received as the female version. It was simply more expedient for companies to use existing data known to have general public acceptance, even if this meant perpetuating gender bias.
In Machine Learning (ML), a model makes predictions based on the data provided as a training set. Natural Language Processing (NLP) has enabled ML models to recognise the gender of voices. If the training data is imbalanced and uses more samples from one class, then there will be a bias towards that class. The model can make more accurate predictions for the data it has seen most frequently.
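The effect of class imbalance can be illustrated with a deliberately extreme sketch: a "model" that only learns which label is most frequent in its training set. The labels, the 90/10 split and the helper names below are hypothetical; the point is only that overall accuracy can look good while the under-represented class is never recognised.

```python
from collections import Counter


def majority_class_predictor(train_labels):
    """Build a trivial predictor that always returns the most frequent training label."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda _sample: majority


def per_class_recall(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    total = Counter(y_true)
    correct = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


# Hypothetical imbalanced training set: 90 female-voice samples, 10 male-voice samples.
train_labels = ["female"] * 90 + ["male"] * 10
predict = majority_class_predictor(train_labels)

# Hypothetical test set with the same imbalance.
test_labels = ["female"] * 9 + ["male"] * 1
test_preds = [predict(None) for _ in test_labels]

accuracy = sum(t == p for t, p in zip(test_labels, test_preds)) / len(test_labels)
recall = per_class_recall(test_labels, test_preds)
print(accuracy, recall)
```

Real models are less crude than this baseline, but the same pattern applies: reporting a single accuracy figure can hide poor performance on the class the model has seen least.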
As humans, we may assume that when we communicate, we do so without bias towards a particular group. We interpret different languages and perform specific tasks based on making meaning from the language. When machines are tasked with replicating this process, computers 'recognise' speech, and then a natural language unit, referred to as an interpreter, performs the 'understanding' of words. This process is one of AI's most complex tasks, as the system attempts to generate output that is as 'natural' as human speech. Gender bias appears because of various ambiguities in ML models, including lexical ambiguity. As AI bots might be unable to 'recognise' words correctly, they make predictions based on limited datasets. Any bias in the dataset will be reflected in the predictions made. Recognition errors can also result from pragmatic ambiguity, where words and sentences take on different meanings depending on context.
Researchers at Stanford University found that individuals deal with machines in the same manner they treat humans. A lack of diversity amongst developers has resulted in algorithms with significant bias in AI models, as the models do not reflect the broader population. Due to the nature of the data, AI bots replicate and can reinforce gender assumptions built into the original data, for example, the association of female-voiced assistants with a submissive and malleable nature.
How to Make Automatic Speech Recognition (ASR) Systems Less Biased?
Due to the complexity and richness of human speech, it is unrealistic to believe that bias (in the ML sense) will ever be eliminated completely from Automatic Speech Recognition (ASR) systems. It is reasonable to expect that improvements in AI and ML will take time. After all, as humans, we occasionally have difficulty comprehending speakers of other languages or those with different accents.
What can be done when it is clear that a model is not working well for a particular set of people? There are several approaches to help improve model performance.
One solution involves examining the whole ML pipeline: (i) Dataset, (ii) Training (or the model) and (iii) Results. Within the dataset, a balanced distribution of all subgroups can be ensured when pre-processing the data. During training, one can include fairness constraints such as demographic parity (statistical independence between outcome and demographics), so that the model optimises for both accuracy and fairness. Finally, one can make post-hoc changes to the outputs so that the demographic distribution is balanced. The figure below indicates techniques that can be used to address bias at each stage [6,7].
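As an illustration of the pre-processing stage only, the sketch below computes per-sample weights so that every subgroup contributes equally to training regardless of how many samples it has. This is one simple reweighting heuristic among many, not the specific technique of the cited works, and the subgroup names and counts are hypothetical.

```python
from collections import Counter


def balancing_weights(groups):
    """Weight each sample inversely to the size of its subgroup.

    Every subgroup ends up with the same total weight, and the weights
    sum to the number of samples.
    """
    counts = Counter(groups)
    n_samples = len(groups)
    n_groups = len(counts)
    return [n_samples / (n_groups * counts[g]) for g in groups]


# Hypothetical speech dataset: 6 recordings with one accent, 2 with another.
groups = ["accent_a"] * 6 + ["accent_b"] * 2
weights = balancing_weights(groups)

# Total weight carried by each subgroup after reweighting.
total_per_group = Counter()
for g, w in zip(groups, weights):
    total_per_group[g] += w
print(dict(total_per_group))
```

Such weights can be passed to training procedures that accept per-sample weights. Reweighting does not add new information about the under-represented group, so collecting more varied data remains preferable where possible.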
Within ML, biased data produces biased models: whether it is intentional or unintentional, any data not reflective of the range of potential outcomes will result in bias. Sampling bias can lead to inaccurate models, particularly if they are built using historical datasets which have built-in biases. For example, if a company trains a model to assist in decision-making about promotions and has a poor track record of promoting women, its model will likely make the same biased decisions because of the nature of the training data. Similarly, if a model is trained on speech that was simple for its inventor to collect (from themselves, their family and friends), from people who might all speak with similar accents or inflection, the resulting ASR model may reflect a preference for such voices and may not recognise those with a different tone or accent.
While there is no easy, universal method for identifying and addressing bias in ASR systems, it is important that all data is examined for potential bias before models are developed and deployed. Observe patterns in the data, anticipate the populations affected by the model’s decisions and be aware of what is missing from the dataset.
 Adapt. 2022. Gender Bias in AI: Why Voice Assistants Are Female | Adapt - Adapt. [online] Available at: [Accessed 22 May 2022].
 Robison, C., 2022. How AI bots and voice assistants reinforce gender bias. [online] Brookings. Available at: [Accessed 20 May 2022].
 Harvard Business Review. 2022. Voice Recognition Still Has Significant Race and Gender Biases. [online] Available at: [Accessed 20 May 2022].
 L. Zhang, Y. Wu, and X. Wu, “A causal framework for discovering and removing direct and indirect discrimination,” CoRR, vol. abs/1611.07509, 2016
 Female IBM Researchers are helping AI overcome bias and find its voice | IBM Research Blog. [online] IBM Research Blog. Available at: [Accessed 25 May 2022].
 Mehrabi, N., Morstatter, F., Saxena, N., Lerman, K. and Galstyan, A., 2021. A survey on bias and fairness in machine learning. ACM Computing Surveys (CSUR), 54(6), pp.1-35.
 Yuan, M., Kumar, V., Ahmad, M.A. and Teredesai, A., 2021. Assessing Fairness in Classification Parity of Machine Learning Models in Healthcare. arXiv preprint arXiv:2102.03717.