ECTS credits: 3
ECTS hours (regulations/degree report): Tutorial hours: 1; Expository classes: 10; Interactive classes: 11; Total: 22
Languages of instruction: English
Type: Ordinary subject Master’s Degree RD 1393/2007 - 822/2021
Departments: Electronics and Computing, Spanish Language and Literature, Theory of Literature and General Linguistics
Areas: Computer Science and Artificial Intelligence, General Linguistics
Center: Higher Technical Engineering School
Call: Second Semester
Teaching: Taught course (with classes)
Enrolment: Enrollable | 1st year (Yes)
Provide theoretical knowledge that allows an in-depth study of linguistic models, such as language models and models of distributional semantics.
Link language modeling and model types to different tasks within the area of language technologies and natural language processing.
Evaluate different aspects of language models.
Provide practical knowledge to train language models and use them in different natural language processing tasks.
1. Language models:
1.1. N-gram-based language models.
1.2. Neural language models.
2. Distributional semantics models:
2.1. Linguistic hypothesis about distributional meaning.
2.2. Classic models of distributional semantics.
2.3. Neural models representing static meaning (word embeddings).
2.4. Neural models representing dynamic-contextual meaning.
2.5. Compositional models.
3. Sequence labeling:
3.1. Use and fine-tuning of models for sequence labeling.
4. Text-to-text models.
Baroni, Marco, Raffaella Bernardi & Roberto Zamparelli (2014). “Frege in space: A program for compositional distributional semantics.” Linguistic Issues in Language Technology 9(6): 5–110.
Baroni, Marco, Georgiana Dinu & Germán Kruszewski (2014). “Don’t count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors.” In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 238–247, Baltimore, Maryland. Association for Computational Linguistics.
Church, Kenneth Ward, Zeyu Chen & Yanjun Ma (2021). “Emerging trends: A gentle introduction to fine-tuning.” Natural Language Engineering, 27: 763–778.
Devlin, Jacob, Ming-Wei Chang, Kenton Lee & Kristina Toutanova (2019). “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
Erk, Katrin (2012). “Vector space models of word meaning and phrase meaning: A survey.” Language and Linguistics Compass 6(10): 635–653.
Hirschberg, Julia & Christopher D. Manning (2015). “Advances in natural language processing.” Science 349(6245): 261–266.
Howard, Jeremy & Sebastian Ruder (2018). “Universal Language Model Fine-tuning for Text Classification.” In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 328–339, Melbourne, Australia. Association for Computational Linguistics.
Jurafsky, Daniel & James H. Martin (2021). “N-gram Language Models.” Speech and Language Processing, Chapter 3. https://web.stanford.edu/~jurafsky/slp3/
Jurafsky, Daniel & James H. Martin (2021). “Vector Semantics and Embeddings.” Speech and Language Processing, Chapter 6. https://web.stanford.edu/~jurafsky/slp3/
Jurafsky, Daniel & James H. Martin (2021). “Neural Networks and Neural Language Models.” Speech and Language Processing, Chapter 7. https://web.stanford.edu/~jurafsky/slp3/
Jurafsky, Daniel & James H. Martin (2021). “Sequence Labeling for Parts of Speech and Named Entities.” Speech and Language Processing, Chapter 8. https://web.stanford.edu/~jurafsky/slp3/
Lenci, Alessandro (2018). “Distributional Models of Word Meaning.” Annual Review of Linguistics 4: 151–171.
Linzen, Tal (2016). “Issues in evaluating semantic spaces using word analogies.” In Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP, pp. 13–18, Berlin, Germany. Association for Computational Linguistics.
Mikolov, Tomas, Wen-tau Yih & Geoffrey Zweig (2013). “Linguistic Regularities in Continuous Space Word Representations.” In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 746–751, Atlanta, Georgia. Association for Computational Linguistics.
Pilehvar, Mohammad Taher & Jose Camacho-Collados (2021). Embeddings in Natural Language Processing: Theory and Advances in Vector Representations of Meaning. Morgan & Claypool (Synthesis Lectures on Human Language Technologies, volume 47).
Basic and general skills:
CG1 - Maintain and extend well-founded theoretical approaches that enable the introduction and exploration of new and advanced technologies in the field of Artificial Intelligence.
CG3 - Look for and select useful and necessary information to solve complex problems, by making use of the bibliographic sources in the area.
CG4 - Produce, adequately and with some originality, written compositions or reasoned arguments; draft plans, work projects and scientific articles; and formulate reasonable hypotheses in the field.
CB6 - Understand knowledge that provides a basis or opportunity to be original in the development and / or application of ideas, usually in a research context.
CB7 - Students are able to apply their acquired knowledge and problem-solving skills in new or unfamiliar environments within broader (or multidisciplinary) contexts related to their area of study.
Transversal competencies:
CT7 - Develop the ability to work in interdisciplinary or transdisciplinary teams and to present proposals that contribute to sustainable development from an environmental, economic, political and social point of view.
CT8 - Appreciate the importance of research, innovation and technological development in the socio-economic and cultural progress of society.
Specific skills:
CE1 - Comprehension and mastery of lexical, syntactic and semantic processing techniques in natural languages.
CE2 - Understanding and mastery of the fundamentals and techniques of processing of linked, structured and unstructured documents, and the representation of their content.
CE3 - Understanding and knowledge of the techniques of representation and processing of knowledge through ontologies, graphs and RDF, as well as the tools associated with them.
The following teaching methodology is used:
- Lecture/theoretical session: teachers present a topic to students in order to convey a body of information with a defined scope.
- Laboratory practicals: the teachers present students with one or more practical problems that require understanding and applying the theoretical and practical contents of the syllabus. Students may work on the problems individually or in teams. These activities may require autonomous work, guided by the teacher of the subject.
- Project-based learning: students are given practical projects that account for a significant share of the time they devote to the subject. Because of the scope of the work involved, students need to apply management skills as well as technical skills.
- Mentoring: the teachers meet with students in individual mentoring sessions devoted to study guidance and to resolving questions about the contents, assignments and activities of the subject.
The Virtual Campus will be used for the distribution of materials, as well as guides and tutorials for carrying out the necessary activities.
E1: Final exam: 45%
E2: Evaluation of practical works: 50%
E3: Continuous monitoring: 5%
Each student must obtain at least 40% of the maximum mark in each of parts E1 and E2, and in any case the sum of the three parts (E1, E2 and E3) must be greater than 5 to pass the subject. If a student fails to meet either of these requirements, the final grade will be determined by the lowest score obtained.
A student who does not reach the minimum in either part (E1 or E2) will have a second opportunity in which only that part has to be completed again.
No marks are carried over between academic years.
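As a rough illustration of the rule above, the following Python sketch computes a final grade from the three parts. It is not an official calculator. It assumes that each part is marked on a 0-10 scale, that the "sum of the three parts" is the weighted sum using the percentages listed for E1, E2 and E3, and that "the lowest score obtained" refers to the lower of the E1 and E2 marks. The names `final_grade`, `WEIGHTS`, `MAX_MARK` and `PASS_MARK` are hypothetical.

```python
from __future__ import annotations

# Weights taken from the syllabus: final exam 45%, practical works 50%,
# continuous monitoring 5%.
WEIGHTS = {"E1": 0.45, "E2": 0.50, "E3": 0.05}
MIN_FRACTION = 0.40  # minimum share of the maximum mark required in E1 and E2
MAX_MARK = 10.0      # ASSUMPTION: every part is marked on a 0-10 scale
PASS_MARK = 5.0      # the syllabus says the sum must be "greater than 5"


def final_grade(e1: float, e2: float, e3: float) -> tuple[float, bool]:
    """Return (grade, passed) under the assumptions stated above."""
    marks = {"E1": e1, "E2": e2, "E3": e3}
    weighted = sum(WEIGHTS[part] * marks[part] for part in WEIGHTS)
    minimum = MIN_FRACTION * MAX_MARK
    if e1 < minimum or e2 < minimum:
        # ASSUMPTION: "lowest score obtained" is read as the lower of E1 and E2.
        return min(e1, e2), False
    # Strict comparison, following the wording "greater than 5" literally.
    return weighted, weighted > PASS_MARK


if __name__ == "__main__":
    print(final_grade(6.0, 7.0, 10.0))   # both minimums met
    print(final_grade(3.0, 10.0, 10.0))  # E1 below the 40% minimum
```

Under this reading, a high mark in the practical works cannot compensate for an exam mark below 4 out of 10, which is the point of the per-part minimum.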
Practical works must be submitted within the timeframe established in the virtual campus, and will follow the specifications outlined for both presentation and defense.
Students who submit all the compulsory practical works or take the exam in the official evaluation period will have the status of “Presented”.
In the case of fraudulent conduct in practical works or tests, the Regulations for the evaluation of students' academic performance and the review of qualifications will be applied. Under the corresponding regulations on plagiarism, total or partial copying of any practical or theoretical exercise will result in failing both opportunities of the course, with a grade of 0.0 in each.
The temporal distribution of the course is as follows:
Distribution of three ECTS credits:
- Theoretical sessions: 10 (on-site hours) + 10 (off-site hours) = Total 20 hours
- Practical laboratory sessions: 5 (on-site hours) + 15 (off-site hours) = Total 20 hours
- Problem-based learning sessions: 6 (on-site hours) + 29 (off-site hours) = Total 35 hours
Total: 21 (on-site hours) + 54 (off-site hours) = Total 75 hours
It is important to become fluent with some of the tools presented in the course and to build routines for using them. For this reason, students are advised to repeat and extend on their own, at home, the exercises carried out in the interactive sessions.
Marcos Garcia Gonzalez (Coordinator)
- Department: Spanish Language and Literature, Theory of Literature and General Linguistics
- Area: General Linguistics
- E-mail: marcos.garcia.gonzalez [at] usc.gal
- Category: Researcher: Ramón y Cajal
Pablo Gamallo Otero
- Department: Spanish Language and Literature, Theory of Literature and General Linguistics
- Area: General Linguistics
- Phone: 881816426
- E-mail: pablo.gamallo [at] usc.gal
- Category: Professor: University Professor
Alejandro Catala Bolos
- Department: Electronics and Computing
- Area: Computer Science and Artificial Intelligence
- E-mail: alejandro.catala [at] usc.es
- Category: Permanent Lecturer (Profesor/a Permanente Laboral)
Tuesday | Group | Language | Room |
---|---|---|---|
17:00-18:30 | Group /CLE_01 | English | IA.12 |
18:30-20:00 | Group /CLIL_01 | English | IA.12 |

Date and time | Group | Room |
---|---|---|
05.30.2025 16:00-20:00 | Group /CLIL_01 | IA.12 |
05.30.2025 16:00-20:00 | Group /CLE_01 | IA.12 |
07.04.2025 16:00-20:00 | Group /CLE_01 | IA.01 |
07.04.2025 16:00-20:00 | Group /CLIL_01 | IA.01 |