Abstracts
Abstract
In this paper, we examine how methods for evaluating corpora on the basis of their technical terms can be used to characterise technical documents used as textual materials in translation training. Technical documents are one of the standard types of textual material used in translation training courses, and choosing suitable materials for learners is an important issue. Technical terms play an essential role in such documents, so assessing how terms are used in them would help translation teachers choose relevant documents as training materials. As corpus-characterisation methods, we used self-referring measurement of the occurrence of terminology and measurement of the characteristic semantic scale of terms. To examine the practical applicability of these methods to assessing technical documents, we prepared 12 short English texts, ranging from 300 to 1,150 words in length, from six domains: law, medicine, politics, physics, technology and philosophy (two texts per domain). We manually extracted terms from each text and, using those terms, evaluated the nature and status of the textual materials. The analysis shows that, even for short texts, the corpus-characterisation methods we used provide useful insights for assessing textual materials.
Keywords:
- specialised documents,
- translation training,
- conceptual coverage,
- semantic scale,
- quantitative measures
Résumé
Dans cet article, nous examinons les méthodes d’évaluation des corpus spécialisés fondées sur l’analyse des termes en vue de la caractérisation des documents techniques destinés à une utilisation comme matériau pédagogique dans l’enseignement de la traduction. Les documents techniques sont le type de matériau textuel le plus utilisé dans les cours de traduction et le choix du matériel approprié pour les apprenants est d’un intérêt majeur. Dans les documents techniques, les termes techniques jouent un rôle essentiel. L’évaluation de la façon dont les termes sont utilisés dans ces documents aiderait les enseignants de traduction à choisir les documents appropriés pour soutenir la formation en traduction. La méthode de caractérisation des corpus que nous proposons combine deux techniques : l’évaluation auto-référentielle de l’apparition des termes dans le corpus et la mesure de l’échelle sémantique qui caractérise ces termes. Pour examiner l’applicabilité pratique de ces techniques pour l’évaluation des documents spécialisés en termes de difficultés pour la traduction, nous avons sélectionné 12 textes anglais courts dans six domaines différents : le droit, la médecine, la politique, la physique, les technologies et la philosophie. Chaque domaine est représenté par deux textes dont les longueurs variaient de 300 à 1150 mots. Nous avons ensuite procédé à l’extraction manuelle des termes depuis le corpus et, enfin, en nous fondant sur l’analyse de ces termes, nous avons évalué la nature et le statut des documents textuels composant le corpus. L’analyse montre que, même pour les textes courts, les techniques de caractérisation du corpus que nous avons adoptées offrent des informations utiles pour l’évaluation des difficultés traductionnelles que peuvent présenter les documents spécialisés lors de leur utilisation comme matériau pédagogique en classe.
Mots-clés :
- documents spécialisés,
- formation en traduction,
- couverture conceptuelle,
- échelle sémantique,
- mesures quantitatives
Resumen
En este trabajo se examina cómo se pueden utilizar los métodos de evaluación de los corpus especializados basándose en el análisis de los términos técnicos para caracterizar los documentos técnicos que se emplean como materiales textuales en la formación de traductores. Los documentos técnicos son uno de los materiales textuales más utilizados en los cursos de formación en traducción y la selección de materiales adecuados es un tema importante. En los documentos técnicos, los términos técnicos desempeñan un papel esencial. Por lo tanto, evaluar cómo se utilizan los términos en estos documentos ayudaría al profesorado en traducción a elegir los documentos pertinentes como material de aprendizaje. Como métodos de caracterización de corpus, utilizamos la evaluación autorreferencial de la aparición de los términos y la medida de la escala semántica que caracteriza los términos. Para examinar la aplicabilidad de estos métodos a la evaluación de documentos técnicos, se seleccionaron 12 textos cortos en inglés pertenecientes a seis ámbitos: derecho, medicina, política, física, tecnología y filosofía (se eligieron dos textos para cada dominio), textos cuya extensión variaba entre 300 y 1150 palabras. Hemos extraído manualmente los términos de cada texto y, basándonos en el análisis de dichos términos, evaluamos la naturaleza y el estatuto de los materiales textuales. El análisis demuestra que, incluso para textos cortos, los métodos de caracterización de corpus que hemos adoptado dan una información útil para evaluar las dificultades traductológicas que puedan presentar los textos especializados utilizados como soportes textuales didácticos.
Palabras clave:
- documentos especializados,
- formación de traductores,
- alcances conceptuales,
- escala semántica,
- medidas cuantitativas