ECTS credits: 6
ECTS hours (Rules/Memories): Student's work: 108; Tutorials: 1; Expository classes: 21; Interactive classes: 20; Total: 150
Languages used: Spanish, Galician, English
Type: Ordinary subject Master’s Degree RD 1393/2007 - 822/2021
Center: Higher Technical Engineering School
Call: Second Semester
Teaching: With teaching
Enrolment: Enrollable | 1st year (Yes)
The increasing amount of information available through the Internet calls for the efficient processing of large amounts of data. This has led to the development of new storage and processing techniques to deal with huge amounts of data, namely Big Data techniques, that naturally adapt to distributed systems.
The main goal of this subject is to learn suitable processing techniques for large amounts of information in the Big Data world, particularly using the Hadoop ecosystem, and compare these techniques with the traditional ones employed in HPC environments. This will allow the student to select the optimal tools to solve a particular problem.
1. Introduction to Data Engineering
1.1 HPC vs Big Data: similarities and differences in data management.
1.2 Hardware and Software Technologies for High Performance Data Engineering
1.3 Data Engineering in HPC infrastructures vs. Cloud environments
2. Data Engineering phases
2.1 Modeling (Formats, Compression, Designing Schemas)
2.2 Intake (Periodicity, Transformations, Tools)
2.3 Storage (HDFS and NoSQL DBs, HBase, MongoDB, Cassandra)
2.4 Processing (Batch, Real-Time)
2.5 Orchestration
2.6 Analysis (SQL, Machine Learning, Graphs, UI)
2.7 Governance
2.8 Integration with BI (Visualization)
3. Introduction to Data Analytics
3.1 Exploratory Data Analytics
3.2 Introduction to Machine Learning
4. Use cases
4.1 Applications to Internet of Things (Smart environments and Industry 4.0)
4.2 Applications to sciences and engineering
Basic bibliography
- T. White, "Hadoop: The Definitive Guide", 4th Edition, O'Reilly, 2015
- W. McKinney, "Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython", 2nd Edition, O'Reilly, 2017
Additional bibliography
- A. Holmes, "Hadoop in Practice", 2nd Edition, Manning, 2014
- The student will be capable of installing, configuring, and managing the basic software for massive data processing.
- The student will be capable of coding massive data processing applications using domain-specific languages.
- The student will learn about Data Engineering tools (for Intake/Storage/Processing/Visualization).
- The student will acquire the skills to search for, select and manage Big Data-related resources (bibliography, software, etc.).
Skills
- Basic: CG1, CG3, CG5, CB6, CB7.
- General: CT1, CT4.
- Specific: CE1, CE2
- Programmed instruction through educational materials specially designed for autonomous and asynchronous learning, with strong emphasis on references to the documentary sources used for the different contents.
- Development of practical assignments in an autonomous way with supervision by the subject instructors.
- Development of assignments, in which the students have to apply the knowledge acquired in order to solve different problems in an autonomous way.
- Directed discussion. Guidance to solve individual / group assignments, problem solving and continuous evaluation activities.
- Follow-up support: orientation for the development of the assignments, resolution of doubts, etc.
Non-face-to-face training activities and their relation to the competences of the degree:
- Reading didactic material, viewing videos and querying of multimedia material: CB6, CE1, CE2, CG1, CT4
- Development of practical assignments in an autonomous way with supervision by the instructors: CB10, CB6, CG3, CG5
- Development of academically directed assignments: CB6, CB7, CG3, CE1, CE2
- Directed discussion: CG1, CT1, CT4
- Follow-up support in non face-to-face modality: CB6, CB7
Laboratory practice. Grading the assignments submitted by students: 50%
Supervised projects. Grading the supervised projects submitted by students: 50%
Not graded: Students who do not submit any practical exercise or guided project will not be graded.
Second opportunity (June/July): Students may submit laboratory practices or supervised projects not previously presented, or submit improved versions of previously presented practices/projects.
In the case of fraudulent performance of exercises or tests, the regulations of the Normativa de avaliación do rendemento académico dos estudantes e de revisión de cualificacións will be applied.
In application of the Normativa da ETSE sobre plaxio (approved by the ETSE Council on 12/19/2019), the total or partial copying of any exercise will mean failure in both opportunities of the course, with a grade of 0.0 in both cases.
- Reading didactic material, viewing videos and querying of multimedia material: 0h face-to-face + 18h autonomous work (total 18h)
- Development of practical assignments in an autonomous way with supervision by the instructors: 0h face-to-face + 80h autonomous work (total 80h)
- Directed Discussion: 3h face-to-face + 3h autonomous work (total 6h)
- Follow-up support in the non face-to-face modality: 1h face-to-face + 0h autonomous work (total 1h)
- Development of assignments: 0h face-to-face + 45h autonomous work (total 45h)
TOTAL: 4h face-to-face + 146h autonomous work, for a total of 150h
Due to the large practical component of the subject, it is advisable to keep up to date with the laboratory practices and guided projects throughout the semester.
The course makes intensive use of online communication tools: video calls, chats, etc. In-person classes will be recorded for later viewing. An online learning management system will be used for distributing notes, creating forums, etc.
The software tools used in this course are generally open-source or have free licenses for students.
The subject will be taught in English.