ECTS credits: 5
ECTS hours (Rules/Reports): Tutorials: 5; Expository classes: 15; Interactive classes: 20; Total: 40
Languages of instruction: German, English
Type: Ordinary subject Master’s Degree RD 1393/2007 - 822/2021
Departments: External department linked to the degrees
Areas: External area, M.U. Erasmus Mundus in Lexicography (2nd ed.)
Center: Faculty of Philology
Call: First Semester
Teaching: With teaching
Enrolment: Enrollable
- Training students to work with computer tools for linguistic data processing.
- Giving students skills to design and implement basic tools to automatically extract lexicographic information from texts.
This course introduces basic programming methods in scripting languages (e.g. R, Python) for creating lexicographic resources. More precisely, it focuses on the automatic extraction of collocations and lexical relations.
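As an illustration of the kind of collocation extraction described above, the following minimal Python sketch ranks adjacent word pairs by pointwise mutual information (PMI), a widely used association measure. The function names, the naive regular-expression tokenizer, the `min_count` threshold, and the toy corpus are all assumptions made for this example; they are not taken from the course materials.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def pmi_collocations(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI).

    Only bigrams seen at least `min_count` times are scored, because PMI
    tends to overrate very rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_tokens = len(tokens)
    n_bigrams = max(n_tokens - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_tokens
        p_w2 = unigrams[w2] / n_tokens
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    # Highest-scoring candidates first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy corpus, invented for this illustration only.
sample = (
    "She made a strong tea and he made strong coffee. "
    "Strong tea is common, but heavy rain is more common than strong rain. "
    "Heavy rain fell while the strong tea cooled."
)

ranked = pmi_collocations(tokenize(sample))
for pair, score in ranked:
    print(pair, round(score, 2))
```

On a real corpus, the candidate list would normally be filtered further (for example by part of speech) and checked manually before any pair is recorded as a collocation in a dictionary entry.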
1. Introduction to natural language processing with R
1.1. Basic tasks: tokenization and sentence splitting
1.2. Lemmatization and Part of Speech Tagging
1.3. Named Entity Recognition
2. Quantitative-empirical methods in lexicography
2.1. Introduction: Empirical research methods
2.2. Methodologies: Advantages & Shortcomings
3. Data visualization and analysis
3.1. Introduction to visualization in R
3.2. Descriptive & inferential statistics
3.3. Data visualization
4. Collaborative lexicography
4.1. Basics of collaborative work
4.2. Crowdsourced collaborative lexicography: the Wiktionary project
4.3. Some tools for collaborative lexicography
Abel, Andrea & Meyer, Christian M. (2013). “The dynamics outside the paper: user contributions to online dictionaries”, in Iztok Kosem / Jelena Kallas / Polona Gantar / Simon Krek / Margit Langemets / Maria Tuulik, eds., Electronic lexicography in the 21st century: thinking outside the paper: proceedings of the eLex 2013 conference, 17–19 October 2013, Tallinn, Estonia. Ljubljana / Tallinn: Institute for Applied Slovene Studies / Institute of the Estonian Language, pp. 179–194. Available at: <http://eki.ee/elex2013/proceedings/eLex2013_13_Abel+Meyer.pdf>
Arnold, T., & Tilton, L. (2015). Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text (1st ed.). Springer International Publishing AG.
Evert, Stefan (2008). “Corpora and collocations”. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, article 58, pages 1212-1248. Mouton de Gruyter, Berlin.
Grefenstette, Gregory (1994). Explorations in Automatic Thesaurus Discovery. Kluwer Academic Publishers, Norwell, MA, USA.
Jockers, Matthew L. & Thalken, Rosamond (2020). Text Analysis with R: For Students of Literature. Cham: Springer.
Mel’chuk, Igor (1998). “Collocations and Lexical Functions”. In A.P. Cowie (ed.): Phraseology. Theory, Analysis, and Applications, Oxford: Clarendon Press, 23-53.
Meyer, Christian M. / Gurevych, Iryna (2012a): “Wiktionary: a new rival for expert-built lexicons? Exploring the possibilities of collaborative lexicography”, in Sylviane Granger / Magali Paquot, eds., Electronic Lexicography. Oxford: Oxford University Press, pp. 259–291.
Müller-Spitzer, Carolin / Wolfer, Sascha / Koplenig, Alexander (2015): “Observing online dictionary users: studies using Wiktionary log files”, International Journal of Lexicography, 28/1, pp. 1–26.
Padó, Sebastian & Lapata, Mirella (2007). “Dependency-based construction of semantic space models”. Computational Linguistics. 33 (2): 161–199.
Sahlgren, Magnus (2008). “The Distributional Hypothesis”. Rivista di Linguistica. 20(1): 33–53.
Sweigart, Al (2015). Automate the Boring Stuff with Python: Practical Programming for Total Beginners, No Starch Press.
Wolfer, Sascha / Müller-Spitzer, Carolin (2016). “How Many People Constitute a Crowd and What Do They Do? Quantitative Analyses of Revisions in the English and German Wiktionary Editions”. Lexikos. 26: 347-371.
Wu, Winston / Yarowsky, David (2020). “Wiktionary normalization of translations and morphological information”. In Donia Scott / Nuria Bel / Chengqing Zong, eds., Proceedings of the 28th International Conference on Computational Linguistics, Barcelona: International Committee on Computational Linguistics, pp. 4683-4692.
(Additional references may be suggested during the module.)
Knowledge or contents: Con03, Con05, Con06
Skills or abilities: H/D05, H/D06, H/D07, H/D09
Competencies: Comp02, Comp03, Comp08
- Lectures led by the professors, conveying knowledge to students and open to discussion.
- Lab sessions inside and outside the classroom, following a collaborative methodology.
- Tasks previously assigned as individual work outside the classroom will be analyzed and discussed in class.
1. First opportunity: Completion and submission of the tasks for each module, plus active participation: 100%.
2. Second opportunity: The same criteria as for the first opportunity will be applied.
Students granted special permission by the Faculty authorities not to attend classes regularly must submit a final paper, which will account for 100% of the final grade.
Academic misconduct (cheating or plagiarism in exercises or tests) will be penalized in accordance with the University regulations on student assessment (“Normativa de avaliación do rendemento académico dos estudantes e de revisión de cualificacións”).
In-person attendance amounts to 35 hours, to which students' individual work must be added.
- Students are advised to build on the basic skills previously acquired in Introduction to Computer Science and Natural Language Processing.
- Students are expected to prepare before class and to review the material afterwards.
In this subject, students will apply methodologies studied in "Resources and tools with lexicographic application: use and design I".