PCA and Matrix Completion

Introduction to Statistical Learning - PISE

Aldo Solari

Ca’ Foscari University of Venice

This unit will cover the following topics:

  • Principal Components Analysis
  • Matrix Completion

Unsupervised Learning

  • Most of this course focuses on supervised learning methods such as regression
  • In supervised learning:
    • We observe features (X_1, X_2, …, X_p) and a response variable (Y)
    • Goal: predict (Y) using (X_1, X_2, …, X_p)
  • In unsupervised learning:
    • We observe only features (X_1, X_2, …, X_p)
    • No response variable (Y) is available
    • Goal is not prediction, but discovering structure in the data

Principal Components Analysis

  • PCA produces a low-dimensional representation of a dataset

  • It finds a sequence of linear combinations of the variables that have maximal variance and are mutually uncorrelated

  • Apart from producing derived variables for use in supervised learning problems, PCA also serves as a tool for data visualization
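
The points above can be illustrated with a minimal NumPy sketch. It is not the course's own code: the synthetic data, the sizes `n` and `p`, and all variable names are assumptions made for illustration. The sketch centers the data, obtains the components from a singular value decomposition, keeps the first two as a low-dimensional representation, and computes the quantities needed to check that the components have decreasing variance and are mutually uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): n = 200 observations, p = 5 features
n, p = 200, 5
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))

# Center each column so that every feature has mean zero
Xc = X - X.mean(axis=0)

# Thin SVD: the rows of Vt are the loading vectors, ordered by decreasing variance
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Scores: the derived variables Z_1, ..., Z_p (one column per component)
Z = Xc @ Vt.T

# Low-dimensional representation: keep only the first two components
Z2 = Z[:, :2]

# Sample variance of each component and sample covariance matrix of the scores
variances = Z.var(axis=0, ddof=1)
cov_Z = np.cov(Z, rowvar=False)
```

The two columns of `Z2` could be drawn as a scatter plot for visualization. The off-diagonal entries of `cov_Z` are zero up to floating-point error, which is the sense in which the components are uncorrelated.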

Principal Components Analysis: details

  • The first principal component of features (X_1, X_2, …, X_p) is the normalized linear combination \[ Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \dots + \phi_{p1} X_p \] that has the largest variance

  • Normalization constraint: \[ \sum_{j=1}^{p} \phi_{j1}^2 = 1 \]

  • The coefficients \(\phi_{11}, \dots, \phi_{p1}\) are called the loadings

  • These form the principal component loading vector: \[ \phi_1 = (\phi_{11}, \phi_{21}, \dots, \phi_{p1})^T \]

  • The constraint prevents the loadings from being made arbitrarily large in absolute value, which would otherwise make the variance of the linear combination arbitrarily large
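
As a hedged illustration of the definition above, the following NumPy sketch computes the first loading vector as the leading eigenvector of the sample covariance matrix, which is one standard way to solve the variance maximization problem. The synthetic data, the scaling constant `c`, and all variable names are assumptions, not part of the course material. The sketch also compares the variance of the first component with that of random unit-norm directions and shows what happens when the normalization constraint is dropped.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic centered data (an assumption): n = 500 observations, p = 4 features
n, p = 500, 4
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
Xc = X - X.mean(axis=0)

# Sample covariance matrix of the features
S = np.cov(Xc, rowvar=False)

# eigh returns eigenvalues in ascending order, so the last eigenvector
# is the first principal component loading vector
eigvals, eigvecs = np.linalg.eigh(S)
phi1 = eigvecs[:, -1]

# Normalization constraint: the squared loadings sum to one
norm_sq = np.sum(phi1 ** 2)

# First principal component scores and their sample variance
z1 = Xc @ phi1
var_z1 = z1.var(ddof=1)

# Variances along 1000 random unit-norm directions, for comparison with var_z1
directions = rng.normal(size=(1000, p))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
random_vars = (Xc @ directions.T).var(axis=0, ddof=1)

# Without the constraint: scaling the loadings by c multiplies the variance by c^2
c = 10.0
var_scaled = (Xc @ (c * phi1)).var(ddof=1)
```

Here `var_z1` equals the largest eigenvalue of `S`, no entry of `random_vars` exceeds it, and `var_scaled` is `c**2` times `var_z1`, which is why the maximization is only meaningful under the unit-norm constraint.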

Required readings from the textbook and course materials

  • Chapter 3: Linear Regression
    • 3.1 Simple Linear Regression
      • 3.1.1 Estimating the Coefficients
      • 3.1.3 Assessing the Accuracy of the Model
    • 3.2 Multiple Linear Regression
      • 3.2.1 Estimating the Regression Coefficients
    • 3.3 Other Considerations in the Regression Model
      • 3.3.1 Qualitative Predictors
  • Chapter 6: Linear Model Selection and Regularization
    • 6.1 Subset Selection
      • 6.1.1 Best Subset Selection
      • 6.1.2 Stepwise Selection
      • 6.1.3 Choosing the Optimal Model

Required readings from the textbook and course materials

Video SL 3.1 Simple Linear Regression - 13:02

Video SL 3.3 Multiple Linear Regression - 15:38

Video SL 3.4 Some Important Questions - 14:52

Video SL 3.5 Extensions of the Linear Model - 14:17

Video SL 6.1 Introduction and Best Subset Selection - 13:45

Video SL 6.3 Backward Stepwise Selection - 5:27

Video SL 6.4 Estimating Test Error - 14:07

Video SL 6.5 Validation and Cross Validation - 8:44