Functional PCA

Last updated: 2019-10-08

Checks: 2 0

Knit directory: SMF/

This reproducible R Markdown analysis was created with workflowr (version 1.4.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.

R Markdown file: up-to-date

Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.

Repository version: a47d047

Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility. The version displayed above was the version of the Git repository at the time these results were generated.

Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:


Ignored files:
    Ignored:    .DS_Store
    Ignored:    .Rhistory
    Ignored:    .Rproj.user/
    Ignored:    analysis/.DS_Store

Unstaged changes:
    Deleted:    analysis/SmoothMFpoisson.Rmd

Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.

These are the previous versions of the R Markdown and HTML files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view them.

File	Version	Author	Date	Message
html	ab207f4	Dongyue Xie	2019-07-23	Build site.
Rmd	29c1207	Dongyue Xie	2019-07-23	wflow_publish(“analysis/fpca.Rmd”)
html	c69cc9c	Dongyue Xie	2019-07-23	Build site.
Rmd	57856fd	Dongyue Xie	2019-07-23	wflow_publish(“analysis/fpca.Rmd”)

Introduction

Let \(x\) be \(p-\)dimensional random vector. For multivariate data, PCA finds weights \(\beta\) s.t. \(\beta^Tx\) has maximum variance. For functional data, \(\beta\) and \(x\) are functions \(\beta(s)\) and \(x(s)\), the summations become integration

\[\begin{equation} f = \int \beta(s)x(s)ds \end{equation}\]

Let \(\beta_1\) denote the first weight function, which is chosen to maximize \(N^{-1}\sum_i(\int \beta_1x_i)^2\) subject to \(\int\beta_1(s)^2ds=1\).

Another way to define PCA is to find a set of K orthonormal functions s.t. the expansion of each curve in terms of basis functions approximates the curve as closely as possible. The expansion can be written as \[\begin{equation} \hat{x}_i(t) = \sum_{k=1}^K f_{ik}\beta_k(t). \end{equation}\]

A measure of fitting is the integrated squared error \[\begin{equation} ||x_i-\hat{x}_i||^2 = \int [x(s) - \hat{x}(s)]^2 ds \end{equation}\]

Computational methods

Suppose we have a set of \(N\) curves \(x_i\) and column centered. Let \(v(s,t)\) be the sample covariance function of observed data.

Discretizing the functions

Discretize the \(x_i\) to a fine grid of \(n\) equally spaced values \(s_j\). Then we work with \(N\times n\) matrix \(X\).

Basis function expansion of the functions

To reduce the eigenequation \(\int v(s,t)\beta(t)dt = \rho\beta(s)\) to discrete or matrix form , we can express each \(x_i\) as a linear combination of known basis functions \(\phi_k\). Then each function can be written as \[\begin{equation} x_i(t) = \sum_{k=1}^K c_{ik}\phi_k(t), \end{equation}\] or in matrix form \[\begin{equation} x=C\phi, \end{equation}\] where \(C\) is a \(N\times K\) matrix, \(\phi\) is a vector-valued function to have component \(\phi_1,...,\phi_K\).

The covariance matrix is then \[\begin{equation} v(s,t) = N^{-1}\phi(s)^TC^TC\phi(t). \end{equation}\]

Define the order \(K\) symmetric matrix \(W\) as \[\begin{equation} w_{k_1,k_2} = \int \phi_{k_1}\phi_{k_2}. \end{equation}\]

Suppose an eigenfunction \(\beta\) has expansion \[\begin{equation} \beta(s) = \phi(s)^Tb. \end{equation}\]

Then we have \[\begin{equation} \int v(s,t)\beta(t)dt = \phi(s)^TN^{-1}C^TCWb = \rho\phi(s)^Tb. \end{equation}\]

The above equation should hold for all \(s\), so \[\begin{equation} N^{-1}C^TCWb = \rho b \end{equation}\]

Regularized principal components analysis

Smoothing approach

Suppose \(\beta\) is the leading principal component, we usually penalize the roughness of \(\beta\), \(PEN_2(\beta) = \int \beta^{''}(t)^2dt\). The penalized sample variance is given by \[\begin{equation} \frac{var \int \beta x_i}{||\beta||^2+\lambda*PEN_2(\beta)}. \end{equation}\]

In practice

Suppose that \(\phi_\nu\) is a suitable basis for the space of smooth functions S on \([0,1]\), for example B-splines. We can expand \(x(s)\) as \(x(s) = \sum_\nu c_\nux_\nu(s) = c'\phi(s)\). Let \(c_i\) be the vector of coefficients of the data function \(x_i(s)\) in the basis \(\phi_\nu\). Let \(V\) be the covariance matrix of the vectors \(c_i\). Define \(J\) to be the matrix \(\int\phi\phi'\) whose elements are \(\int \phi_j\phi_k\) and \(K\) the matrix whose elements are are \(\int D^2phi_jD^2\phi_k\), then the penalized sample variance is \[\begin{equation} PCAPSV = \frac{y'JVJy}{y'Jy+\lambda y'Ky} \end{equation}\]

Alternative approaches

Instead of carrying out our smoothing step within the PCA, we smooth the data first, and then carry out an unsmoothed PCA. This approach to func- tional PCA was taken by Besse and Ramsay (1986), Ramsay and Dalzell (1991) and Besse, Cardot and Ferraty (1997).