Third BIGMATH PhD Course

Processing Big Data

From 9 to 15 October 2019

Schedule for all the course days: lecture from 10:00 to 12:00; practical session from 13:00 to 16:00.


At IST - Instituto Superior Técnico, Alameda campus, North Tower, Room 4.12, Lisbon, Portugal


Each day, the course will be divided into two parts: one dedicated to sharing knowledge and the other to experimenting with the methodologies on real data. Please bring a laptop with your favorite data analysis language ready (Python, R, Julia or even MATLAB). Example code in the hands-on sessions will be in any of the first three languages, running in Jupyter notebooks, so if you want to run it, it is advisable to have a fully operational setup of Jupyter + Python 3 + Julia, and R + RStudio.
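As a rough way to check the setup before the first session, the following minimal Python sketch looks for the relevant command-line tools on the PATH. The command names `jupyter`, `julia` and `R` are assumptions about a typical installation and may differ on your platform; RStudio is a desktop application and is not checked here.

```python
import shutil
import sys

# Assumed command names for a typical installation; adjust for your platform.
REQUIRED_TOOLS = ["jupyter", "julia", "R"]


def check_setup(tools=REQUIRED_TOOLS):
    """Map each tool name to its location on the PATH, or None if not found."""
    return {tool: shutil.which(tool) for tool in tools}


def python_is_3():
    """Return True if this script is running under Python 3."""
    return sys.version_info.major == 3


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    status = "ok" if python_is_3() else "Python 3 required"
    print(f"Python {version}: {status}")
    for tool, path in check_setup().items():
        print(f"{tool}: {path if path else 'not found on PATH'}")
```

A tool reported as not found may still be installed but simply missing from the PATH, so treat the output as a hint rather than a definitive diagnosis.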

Evaluation will consist of a homework assignment and a small data project.

Day 1: The data science process

  1. Exploratory Data Analysis;
  2. Generalization of a learned hypothesis;
  3. Limitations of predictive modeling;
  4. Causality and correlation;
  5. Hands-on session.

Day 2: Probabilistic perspective on machine learning for large datasets

  1. The learning problem;
  2. Probabilistic graphical models;
  3. Large scale learning on PGMs: variational inference;
  4. Hands-on session.

Day 3: Unsupervised learning for Big Data

  1. Dimensionality reduction for massive datasets:
    1. PCA;
    2. Matrix sketching;
    3. Nonlinear methods.
  2. Clustering and extreme clustering;
  3. Hands-on session.

Day 4: Supervised learning for Big Data

  1. Classification, regression;
  2. Linear models;
  3. Nonlinear models;
  4. Hands-on session.

Day 5: Learning from real world data

  1. Imbalanced learning;
  2. Coping with missing data;
  3. Regularization for noisy data;
  4. Hands-on session.