Re-evaluating the role of personal statements in pediatric residency admissions in the era of artificial intelligence: Comparing faculty ratings of human and AI-generated statements

Curry, Brittany; Kirpalani, Amrit; Remington, Mia; Van Hooren, Tamara; Shen, Ye; Peebles, Erin R

DOI: https://doi.org/10.36834/cmej.81345

Abstract

Background: Personal statements play a large role in pediatric residency applications, providing insight into candidates’ motivations, experiences, and fit for the program. With the advent of large language models (LLMs) such as Chat Generative Pre-trained Transformer (ChatGPT), concerns have arisen about how their use may affect the authenticity of the statements used to evaluate candidates. This study investigates the efficacy and perceived authenticity of LLM-generated personal statements compared with human-written statements in residency applications.

Methods: We conducted a blinded study comparing 30 ChatGPT-generated personal statements with 30 human-written statements. Four pediatric faculty raters assessed each statement using a standardized 10-point rubric. We analyzed the data using linear mixed-effects models, a chi-square sensitivity analysis, an evaluation of rater accuracy in identifying statement origin, and an assessment of score consistency among raters using intraclass correlation coefficients (ICCs).

Results: There was no significant difference in mean scores between AI-generated and human-written statements. Raters identified the source of a statement (AI or human) with only 59% accuracy. Scores differed considerably between raters, as indicated by negative ICCs.

Conclusions: AI-generated statements were rated similarly to human-written statements and were indistinguishable to reviewers, highlighting the sophistication of LLMs and the challenge of detecting their use. Furthermore, scores varied substantially between reviewers. As AI is increasingly used in application processes, it is imperative to examine its implications for the overall evaluation of applicants.
