ECTS credits: 5
ECTS hours: Tutorials: 5 | Expository classes: 15 | Interactive classes: 20 | Total: 40
Languages of instruction: German, English
Type: Ordinary subject Master’s Degree RD 1393/2007 - 822/2021
Departments: External department linked to the degrees
Areas: Área externa M.U Erasmus Mundus en Lexicografía (2ªed)
Center: Faculty of Philology
Call: Second Semester
Teaching: With teaching
Enrolment: Enrollable | 1st year (Yes)
Students should be able:
• to formulate their corpus requirements for a lexicographic project and specify the design of a representative corpus;
• to compile such a corpus from Web pages or other sources;
• to annotate the corpus with linguistic information using automatic natural language processing tools;
• to search the corpus with regular expressions and more complex queries based on lexico-grammatical patterns;
• to apply quantitative techniques such as collocation or keyword analysis and interpret the results appropriately;
• to communicate the results of their work to fellow students;
• to lead academic discussions about technical and methodological aspects of corpus-based research; and
• to document and archive corpus data and analysis results.
Foundations of corpus linguistics
• Principles and methods of corpus analysis
• Applications of corpus data in lexicography
• Types of corpora, overview of existing corpora
• Corpus design, representativity, data sources, metadata
Corpus compilation
• Building corpora from online data: Web scraping etc.
• Boilerplate removal, normalization, metadata extraction
• Representation and exchange formats
• Online and stand-alone tools for Web corpus compilation
• Automatic linguistic annotation (POS, lemma, NER, parsing, …)
• Online and stand-alone tools for linguistic annotation
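As an illustration of a representation format, the following minimal sketch writes a document in a one-token-per-line ("vertical") layout with metadata stored as attributes of an enclosing `<text>` element, similar in spirit to the input expected by corpus tools such as CWB. The function name `to_vertical`, the attribute names `id` and `url`, and the naive regular-expression tokenizer are assumptions made for this example, not part of the course materials.

```python
import re
from xml.sax.saxutils import quoteattr


def to_vertical(doc_id, source_url, text):
    """Return a one-token-per-line representation of `text`.

    Metadata is stored as attributes of an enclosing <text> element.
    The tokenizer is deliberately naive: runs of word characters and
    single punctuation marks each become one token.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # quoteattr() adds the surrounding quotes and escapes special characters.
    lines = [f"<text id={quoteattr(doc_id)} url={quoteattr(source_url)}>"]
    lines.extend(tokens)
    lines.append("</text>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical document identifier and URL, for illustration only.
    print(to_vertical("doc1", "https://example.org/a", "Hello, world!"))
```

A real pipeline would add further tab-separated columns per token (for example part of speech and lemma) produced by an annotation tool.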
Searching corpora
• Regular expressions
• Character encodings and the Unicode standard
• CQP query language for lexico-grammatical patterns
• Practical exercises with SketchEngine and CQPweb
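The interaction between regular expressions and Unicode can be sketched in a few lines of Python. The example below assumes a tiny in-memory sample instead of a real corpus; the helper names `normalize` and `find_matches` are hypothetical. It shows why text is usually normalized (here to NFC) before searching: a decomposed "o" plus combining diaeresis would otherwise not match a precomposed "ö" in the pattern.

```python
import re
import unicodedata


def normalize(text):
    """Normalize to Unicode NFC so that equivalent sequences compare equal."""
    return unicodedata.normalize("NFC", text)


def find_matches(pattern, lines):
    """Return (line index, matched string) pairs for a case-insensitive search."""
    rx = re.compile(normalize(pattern), re.IGNORECASE)
    hits = []
    for i, line in enumerate(lines):
        for m in rx.finditer(normalize(line)):
            hits.append((i, m.group(0)))
    return hits


sample = [
    "Das Wo\u0308rterbuch liegt auf dem Tisch.",  # decomposed: o + U+0308
    "Viele W\u00f6rterb\u00fccher sind online.",  # precomposed characters
]

# Matches singular and plural forms of "Wörterbuch".
hits = find_matches("\\bw\u00f6rterb[u\u00fc]ch\\w*", sample)

if __name__ == "__main__":
    for line_no, form in hits:
        print(line_no, form)
```

Without the normalization step, the first sample line would yield no match, although it looks identical on screen.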
Quantitative analysis
• Frequency lists and metadata distribution
• Collocations and word sketches
• Keyword analysis
• Lexicographic interpretation of results
• Foundations of statistical inference
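To make the idea of keyword analysis concrete, the following sketch ranks words by a log-likelihood keyness score, using the simplified two-term form that compares observed with expected frequencies in a target and a reference corpus. The function names, the toy token lists, and the decision to report only words that are relatively more frequent in the target corpus are assumptions for illustration; tools such as SketchEngine may use different measures.

```python
import math
from collections import Counter


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Two-term log-likelihood score for one word in two corpora."""
    total = freq_target + freq_ref
    combined = size_target + size_ref
    expected_target = size_target * total / combined
    expected_ref = size_ref * total / combined
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def keywords(target_tokens, ref_tokens, top_n=5):
    """Return the top_n words overrepresented in the target corpus."""
    target, ref = Counter(target_tokens), Counter(ref_tokens)
    n_target, n_ref = len(target_tokens), len(ref_tokens)
    scored = []
    for word, f_target in target.items():
        f_ref = ref[word]
        if f_target / n_target > f_ref / n_ref:
            scored.append((word, log_likelihood(f_target, n_target, f_ref, n_ref)))
    # Sort by descending score, then alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:top_n]


# Toy data, invented for this example.
target_tokens = "the lemma and the headword and the lemma".split()
ref_tokens = "the cat and the dog and the bird".split()
top = keywords(target_tokens, ref_tokens)

if __name__ == "__main__":
    for word, score in top:
        print(f"{word}\t{score:.2f}")
```

A word with the same relative frequency in both corpora receives a score of 0; higher scores indicate a stronger difference, which still needs lexicographic interpretation.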
Reproducibility
• Research methodology and documentation
• Data management, sustainability of corpus resources
HSK 5.4, Ch. XVIII + XIX
Knowledge or contents: Con05, Con06, Con07, Con10
Abilities or skills: H/D01, H/D05, H/D07, H/D03
Competencies: Comp04, Comp03, Comp09
Block seminar (date and duration to be announced)
The teachers choose one of the following (option b recommended):
a) 90-minute final exam on the contents of the seminar
or
b) presentation of class project plus a short paper (ca. 10 pages)
or
c) longer paper (15-20 pages)
Second opportunity:
The assessment at the second opportunity will be based on the same criteria.
For students who are officially exempt from attendance, the assessment system will be the same as for the rest.
Academic misconduct (cheating, plagiarism in exercises or tests) will be penalized according to the University regulations on student assessment (“Normativa de avaliación do rendemento académico dos estudantes e de revisión de cualificacións”).
Attendance: max. 35
Requirements for participation: Students must obtain 25 ECTS in the first semester.
Elective module in the second semester.
Language: German and/or English