Abstract
Background: Personal statements play a large role in pediatric residency applications, providing insight into candidates’ motivations, experiences, and fit for the program. With the availability of large language models (LLMs) such as Chat Generative Pre-trained Transformer (ChatGPT), concerns have arisen about how their use may affect the authenticity of the statements used to evaluate candidates. This study investigates the efficacy and perceived authenticity of LLM-generated personal statements compared with human-written statements in residency applications.
Methods: We conducted a blinded study comparing 30 ChatGPT-generated personal statements with 30 human-written statements. Four pediatric faculty raters assessed each statement using a standardized 10-point rubric. We analyzed the scores using linear mixed-effects models and a chi-square sensitivity analysis, evaluated rater accuracy in identifying statement origin, and assessed the consistency of scores among raters using intraclass correlation coefficients (ICCs).
Results: There was no significant difference in mean scores between AI-generated and human-written statements. Raters identified the source of a statement (AI or human) with only 59% accuracy. Scores differed considerably between raters, as indicated by negative ICCs.
Conclusions: AI-generated statements were rated similarly to human-written statements and were difficult for reviewers to distinguish from them, highlighting the sophistication of these LLMs and the challenge of detecting their use. Furthermore, scores varied substantially between reviewers. As AI becomes increasingly used in application processes, it is imperative to examine its implications for the overall evaluation of applicants.
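To illustrate how an ICC can come out negative, the following is a minimal sketch of a one-way random-effects ICC computed from a statements-by-raters score table. The abstract does not state which ICC form was used, so the one-way form, the function name `icc_oneway`, and the example scores are all assumptions made for illustration; none of the numbers are study data.

```python
from statistics import mean


def icc_oneway(scores):
    """One-way random-effects ICC for single ratings (an assumed form, not
    necessarily the one used in the study).

    `scores` is a list of rows, one row per statement and one column per rater.
    """
    n = len(scores)        # number of statements
    k = len(scores[0])     # number of raters
    grand_mean = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]

    # Between-statement mean square: how much statement averages differ.
    ms_between = k * sum((m - grand_mean) ** 2 for m in row_means) / (n - 1)
    # Within-statement mean square: how much raters differ on the same statement.
    ms_within = sum(
        (x - m) ** 2 for row, m in zip(scores, row_means) for x in row
    ) / (n * (k - 1))

    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Hypothetical scores: raters largely agree on each statement.
agreeing = [[2, 2, 3, 2], [5, 5, 5, 6], [8, 9, 8, 8]]
# Hypothetical scores: every statement has the same average, but raters
# disagree widely on each one.
disagreeing = [[2, 8, 5, 5], [8, 2, 5, 5], [5, 5, 2, 8]]

icc_agree = icc_oneway(agreeing)
icc_disagree = icc_oneway(disagreeing)
print(f"agreeing raters:    ICC = {icc_agree:.2f}")
print(f"disagreeing raters: ICC = {icc_disagree:.2f}")
```

In this form the estimate turns negative whenever raters differ more on the same statement than statement averages differ from one another, which is one way to read the negative ICCs reported above.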

