Data Science Certificate Program

Curriculum (Winter Term 2018/19)

Book an optional R crash course, Casalicchio (tba)

Basic knowledge of R is highly recommended for the data science certificate program. For more information, see the website of the Munich R Courses.

Day 1 (09. November 2018): First steps in Data Analysis, Casalicchio

The aim of this course is to give an overview of different data analysis methods, including data visualization, summary statistics and data aggregation, which allow data scientists to gain insights into their data.

At the end of the course, participants should be able to apply the methods learned to their own fields of work and data, and to present results in an easily understandable way.

  • Measurement Scales (Nominal, Ordinal, Interval, Ratio)
  • Descriptive Statistics 
  • Univariate Data Analysis
  • Multivariate Data Analysis
  • Use Case of Exploratory Data Analysis
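As a small illustration of the univariate summaries and aggregations covered on this day, here is a sketch in Python (the program itself works in R; the sample data and region names are invented):

```python
import statistics

# Hypothetical sample: monthly sales figures (ratio scale)
sales = [12.5, 14.1, 13.7, 15.0, 12.9, 14.4, 13.2, 16.8]

# Univariate summary statistics
print("mean:  ", statistics.mean(sales))
print("median:", statistics.median(sales))
print("stdev: ", statistics.stdev(sales))

# A simple aggregation: the same figures grouped by region (nominal scale)
by_region = {"north": [12.5, 14.1, 13.7], "south": [15.0, 12.9, 14.4, 13.2, 16.8]}
totals = {region: sum(values) for region, values in by_region.items()}
print(totals)
```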

Day 2 (16. November 2018): Statistical Foundations, Kauermann

Statistical reasoning is an essential step in data analytics. Statistics makes it possible to quantify the information in data and to distinguish random variation from relevant, significant effects. This is based on probability statements and statistical reasoning, and it includes quantifying uncertainty, be it with confidence intervals or through Bayesian reasoning. The principal ideas are extended to regression models, which are presented in a general form that allows for arbitrary data formats. This day provides the basic foundation for statistical reasoning and inference in data science.

  • Principles of Statistical Reasoning
  • Bayesian Statistics/Statistical Tests
  • Multiple testing/Model Selection
  • Linear Regressions
  • Generalized Regression / Quantile Regression
  • Use Case of Exploratory Data Analysis and Regression 
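The simplest of the regression models covered here, ordinary least squares for a single predictor, can be written in a few lines. This is an illustrative Python sketch with made-up data (the course itself works in R):

```python
# Ordinary least squares for y = a + b*x, solved in closed form.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.0, 6.2, 7.9, 10.1]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Slope and intercept from the normal equations
b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
a = mean_y - b * mean_x

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```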

Day 3 (23. November 2018): Causality and Causation & Visualization

Data today are seldom perfect: they contain missing or erroneous entries, and above all, one needs to ask whether the data at hand allow the question posed to be answered. This step is often overlooked, but it is crucial in order to draw the right conclusions from data. For instance, recorded sales data usually do not allow one to estimate price elasticity. This half day provides an introduction to statistical concepts for dealing with deficient, missing and/or erroneous data. It also touches on the core ideas of causality.

Causality and Causation, Kauermann

  • Principles of Causation
  • Error in Variables
  • Missing Data
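To make the missing-data problem concrete, here is a sketch in Python (not course code; the sample is invented) contrasting two naive strategies, listwise deletion and mean imputation:

```python
# Two naive strategies for missing values on a made-up sample.
data = [4.2, None, 5.1, 3.8, None, 4.9]

# Listwise deletion: drop the missing entries entirely
complete = [x for x in data if x is not None]
observed_mean = sum(complete) / len(complete)
print("deletion mean:", observed_mean)

# Mean imputation: replace missing entries by the observed mean
imputed = [x if x is not None else observed_mean for x in data]
print("imputed mean:", sum(imputed) / len(imputed))
# Note: imputation preserves the mean but shrinks the variance,
# one reason statistically sounder approaches are needed.
```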

Visualization is a powerful tool for analysing, exploring and understanding complex data sets. This part focuses on the theory behind data visualization and introduces different concepts for various types of data.

After an introduction to the basics of the topic, the different types of visualization and their advantages/disadvantages will be discussed. This is followed by an overview of modern visualization technologies as well as application examples.

Visualization, Wiedemann

  • Introduction and Background
  • Visualization techniques
  • Virtual Reality and Mixed Reality

Day 4 (30. November 2018): Data Management, Kröger

The first part of this day provides an introduction to state-of-the-art techniques in data management, particularly relational databases (SQL), data warehousing and a brief overview of technologies for big data beyond SQL. Participants will gain theoretical as well as hands-on experience in these topics.

  • Relational Databases (SQL)
  • Data Warehouses and BI
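A typical relational query of the kind covered in the SQL session can be sketched with Python's built-in SQLite driver (table and column names here are invented for illustration):

```python
import sqlite3

# In-memory relational database with one toy table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0)],
)

# A typical aggregation query: total sales per region
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('north', 180.0), ('south', 80.0)]
conn.close()
```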

Day 5 (07. December 2018): Predictive Modelling 1, Bischl

Supervised machine learning, in particular by means of non-linear, non-parametric methods, has become a central part of modern data analysis in order to uncover complex patterns and relationships in data. During this training day, participants are introduced to decision trees and ensemble techniques like random forests and gradient boosting, as these methods offer a very attractive trade-off between complexity, predictive power and interpretability. Proper model evaluation through resampling techniques (e.g. cross-validation) is a central topic. The day concludes with a session on feature selection and a practical case study, including hints on data preprocessing and best practices. All methods are introduced through practical examples and demos in R, so that participants can apply them directly.

  • Intro to ML (Machine Learning)
  • Trees and Forests
  • Resampling and model evaluation
  • Gradient boosting
  • Variable selection
  • Use case study of ML
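The resampling idea at the heart of model evaluation can be sketched in a few lines. The following Python illustration (the course itself uses R) runs k-fold cross-validation with a deliberately trivial "model" that predicts the training-set mean:

```python
import statistics

def k_fold_indices(n, k):
    """Yield (train, test) index lists for k roughly equal folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        yield train, test
        start += size

y = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
errors = []
for train, test in k_fold_indices(len(y), k=4):
    prediction = statistics.mean(y[i] for i in train)   # "fit" on training folds
    errors += [(y[i] - prediction) ** 2 for i in test]  # evaluate held-out fold
print("CV mean squared error:", statistics.mean(errors))
```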

Day 6 (14. December 2018): Predictive Modelling 2 & Deep Learning, Bischl

This second course day on supervised learning focusses on more advanced topics. During the first half of the day, participants are introduced to the main concepts of modern deep learning techniques, including their optimization, convolutional neural networks for image data, and autoencoders. Practical examples will use either the keras or mxnet toolbox via R.

During the second half of the day, imbalanced data situations, preprocessing and pipeline configuration will be tackled. All methods are introduced through practical examples and demos in R, so that participants can apply them directly.

  • Deep Learning 1: Intro and network structure
  • Deep Learning 2: Optimization and demos
  • Deep Learning 3: CNNs and AutoEncoder
  • Imbalanced data and ROC (Receiver-Operating Characteristic)
  • Preprocessing and feature generation
  • Parameter tuning and pipeline configuration
  • Use Case Study of Deep Learning with discussion
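As an illustration of the ROC topic, the area under the ROC curve can be computed directly via its rank-sum formulation. This is a Python sketch with an invented, deliberately imbalanced sample (not course code):

```python
def auc(labels, scores):
    """Probability that a random positive is scored above a random negative;
    ties count one half. This equals the area under the ROC curve."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 0, 0, 0, 0, 1, 1]           # imbalanced: 6 negatives, 2 positives
scores = [0.1, 0.2, 0.15, 0.3, 0.6, 0.25, 0.8, 0.55]
print("AUC:", auc(labels, scores))
```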

Day 7 (11. January 2019): Unsupervised Methods, Kröger

Unsupervised learning methods including frequent pattern mining, clustering, and anomaly/outlier detection are essential tools for applications where no information on the data is known a priori. This training day gives an introduction to mining frequent itemsets and association rules as well as to finding clusters and outliers without prior knowledge such as training data. Participants will get an overview of different models and algorithms with a discussion of strengths, weaknesses and best practices. All methods are introduced through practical examples and demos using an open source data mining framework, so that participants can experience them in action and play with different parameterizations.

  • Introduction & Frequent Itemset Mining
  • Clustering
  • Outlier Detection
  • Evaluation of Unsupervised Methods
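The core counting step of frequent itemset mining can be sketched in plain Python (the course uses an open-source mining framework; the transactions and support threshold here are invented):

```python
from itertools import combinations

# Toy market-basket data
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
min_support = 2  # absolute support threshold

# Count the support of every itemset up to size 2 and keep the frequent ones
items = sorted(set().union(*transactions))
frequent = {}
for size in (1, 2):
    for candidate in combinations(items, size):
        support = sum(1 for t in transactions if set(candidate) <= t)
        if support >= min_support:
            frequent[candidate] = support
print(frequent)
```

Real miners such as Apriori prune the candidate space instead of enumerating it, using the fact that every subset of a frequent itemset must itself be frequent.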

Day 8 (18. January 2019): Tools and Concepts for Large Data Sets, gentschen Felde

This course day focuses on tools and concepts for handling and working with large data sets. During the first half of the day, participants are introduced to the main concepts of designing data-intensive applications, with a special focus on parallel and high-performance computing (HPC). As cloud computing becomes more and more relevant in data science, basic concepts and storage models for cloud computing are also covered. A theoretical introduction to Hadoop and Flink/Spark leads into the second part of the day, during which practical assignments deepen the understanding and use of both MapReduce with Hadoop and the capabilities of Flink/Spark.

  • Introduction: Designing Data-Intensive Applications
  • HPC (High Performance Computing) and parallel computing
  • Cloud Computing 
  • Hadoop and Flink/Spark
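The MapReduce pattern behind Hadoop can be simulated in-process to show its three phases; this is an illustrative Python sketch (a real cluster distributes the same phases across machines):

```python
from collections import defaultdict

documents = ["big data tools", "big data big clusters"]

# Map phase: emit (word, 1) pairs
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle phase: group values by key
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: sum the counts per key
counts = {word: sum(values) for word, values in grouped.items()}
print(counts)
```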

Day 9 (25. January 2019): Data Privacy, Security & Visualization

This course day comprises two main topics and is thus split into two distinct parts. The first half of the day focuses on data privacy and security. After basic terms of information security have been defined, three technical sessions follow. First, the fundamentals of cryptography are reviewed; then a more in-depth treatment of the anonymization and pseudonymization of data sets is followed by a special focus on homomorphic encryption.

Data Privacy and Security, gentschen Felde

  • Cryptography
  • Anonymization, Pseudonymization
  • Homomorphic Encryption
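One common building block of pseudonymization is a keyed hash: identifiers are replaced by stable pseudonyms that can only be linked back with the secret key. This is an illustrative Python sketch (key and record are invented, and real deployments need careful key management):

```python
import hashlib
import hmac

secret_key = b"replace-with-a-securely-stored-key"  # placeholder key

def pseudonymize(identifier: str) -> str:
    """Map an identifier to a stable pseudonym via HMAC-SHA-256."""
    return hmac.new(secret_key, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Alice Example", "age": 34}
record["name"] = pseudonymize(record["name"])
print(record)  # the same input always maps to the same pseudonym
```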

The second part of the day focuses on data preparation for visualisation. Before a data set is visualized, a major step is to extract the right information and clean the data. This ranges from removing unnecessary information, through adding extra attributes (e.g. color or vector data), to data simplification. Commonly used open-source tools for these tasks will be introduced in a hands-on session.

Visualization, Wiedemann 

  • Data filtering with Paraview
  • Data preparation with MeshLab
  • Tour V2C

Day 10 (01. February 2019): Exam

  • Day of the Exam (written and oral)