Performance Analysis of Generative AI in Bias Detection and Mitigation on Text Datasets

Authors

  • Charlotte Charlotte School of Electrical Engineering and Computer Science, University of Ottawa, Canada
  • Grayson Grayson School of Electrical Engineering and Computer Science, University of Ottawa, Canada
  • Matteo Xavier School of Electrical Engineering and Computer Science, University of Ottawa, Canada

Keywords:

Generative Artificial Intelligence, Bias Detection, Bias Mitigation, Fairness in AI, Natural Language Processing (NLP)

Abstract

This study investigates the performance of generative artificial intelligence in detecting and mitigating bias within text datasets, a critical challenge in the development of fair and ethical AI systems. It proposes a comprehensive evaluation framework that integrates bias detection and mitigation, which are often studied separately in the existing literature. The methodology employs multiple text datasets, including social media posts, news articles, and hate speech corpora, to capture diverse forms of bias. Generative models based on transformer architectures, particularly GPT-based and fine-tuned models, are evaluated alongside baseline models. Bias detection is conducted using prompt-based, classifier-based, and lexicon-based approaches, while mitigation strategies include prompt engineering, debiasing algorithms, reinforcement learning from human feedback (RLHF), and data augmentation. Model performance is assessed using classification metrics (accuracy, precision, recall, F1-score), fairness metrics (demographic parity and equal opportunity), and text quality measures (perplexity, coherence, and semantic similarity). The results indicate that all mitigation techniques reduce bias, with RLHF and hybrid approaches proving most effective, lowering bias scores by over 50% while significantly improving fairness metrics. This study contributes to AI fairness research by proposing an integrated evaluation framework and demonstrating that substantial bias reduction is achievable without compromising overall model performance. The findings offer practical guidance for developing more transparent, reliable, and ethically aligned generative AI systems and support their responsible deployment in sensitive domains such as healthcare, finance, and hiring.
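The two fairness metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the group labels, and the toy data are assumptions for exposition, not the paper's published evaluation code.

```python
# Hedged sketch of the two fairness metrics named in the abstract:
# demographic parity difference and equal opportunity difference.
# Function names and the group setup are illustrative assumptions.

def _rate_by_group(values, groups):
    """Mean of binary `values` within each group label."""
    totals, counts = {}, {}
    for v, g in zip(values, groups):
        totals[g] = totals.get(g, 0) + v
        counts[g] = counts.get(g, 0) + 1
    return {g: totals[g] / counts[g] for g in totals}

def demographic_parity_diff(y_pred, groups):
    """Largest gap in positive-prediction rates across groups (0 = parity)."""
    rates = _rate_by_group(y_pred, groups)
    return max(rates.values()) - min(rates.values())

def equal_opportunity_diff(y_true, y_pred, groups):
    """Largest gap in true-positive rates across groups (0 = equal opportunity)."""
    # Restrict to actual positives, then compare per-group TPRs.
    pos = [(p, g) for t, p, g in zip(y_true, y_pred, groups) if t == 1]
    tprs = _rate_by_group([p for p, _ in pos], [g for _, g in pos])
    return max(tprs.values()) - min(tprs.values())
```

For example, if group a receives predictions [1, 1, 0, 0] and group b receives [1, 0, 0, 0], the positive-prediction rates are 0.50 and 0.25, so the demographic parity difference is 0.25; a value of 0 would indicate parity.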

Published

2026-01-30

How to Cite

Charlotte, C., Grayson, G., & Xavier, M. (2026). Performance Analysis of Generative AI in Bias Detection and Mitigation on Text Datasets. Jurnal Teknik Informatika C.I.T Medicom, 17(6), 316–328. Retrieved from https://www.medikom.iocspublisher.org/index.php/JTI/article/view/1514

Section

OPTIMIZATION AND ARTIFICIAL INTELLIGENCE