Exercise on PARAFAC2 for difficult data

PARAFAC2 offers quite distinct possibilities for handling data that cannot be modeled in a meaningful way by ordinary PARAFAC models. Often the trilinearity assumptions in PARAFAC are too strict and not adequate for describing the data. In those situations, PARAFAC2 can handle the data if the trilinearity is a problem in one mode.

 

When does PARAFAC fail?

In order to see that ordinary PARAFAC can actually handle quite severe deviations from trilinearity, a small simulated example will be used. The amino acid fluorescence data will be used, only the emission mode will be shifted from excitation to excitation to make the data non-trilinear.

 

load claus

X = permute(X,[2 1 3]);

 

The data are rearranged with the permute command so that emission (the changing mode) is the first mode and excitation (the mode across which changes occur) is the last mode. Then the emission spectra are shifted:

 

for i=1:61

  Xs(:,:,i) = X(1+round(i/4):170+round(i/4),:,i);

end

 

To see the changes, first plot e.g. sample 1 in the unshifted situation. As this is a pure sample, it is close to rank one and all emission spectra have the same shape

 

subplot(2,1,1)

xx = squeeze(X(:,1,:)); % Sample 1

xx = xx*diag(sum(xx.^2).^(-.5)); % Normalize spectra

plot(xx),axis tight

title('Raw data - no shifts')

 

Then plot the shifted version of this sample to see the difference

 

subplot(2,1,2)

xx = squeeze(Xs(:,1,:)); % Sample 1

xx = xx*diag(sum(xx.^2).^(-.5)); % Normalize spectra

plot(xx),axis tight

title('Raw data - with shifts')

 

As can be seen, the data are now far from being trilinear and hence PARAFAC can definitely not model the data to the extent that the unshifted data can be modeled.

 

 

Fit the PARAFAC model and compare the loadings with those of the unshifted model.

 

m = parafac(Xs,3);

 

It is impressive and maybe even surprising that the parameters turn out so well even with such severe shifts (non-trilinearities) in the data. In this case, there is a little ‘cheating’ involved because the regularity of the shifts makes life a little bit easier for PARAFAC. Still, the result is important because it clearly shows that meaningful results can be obtained also when PARAFAC is only approximately correct.

 

Using PARAFAC2

When the shifts are more irregularly spaced, then PARAFAC will have more difficulties and then PARAFAC2 may come in handy. Make a modified data array according to the below

 

for i=1:61

  if i<30 % Don’t shift the first 30 excitations

    Xs(:,:,i) = X(1:170,:,i);

  else

    % but shift the last 31 excitations

    Xs(:,:,i) = X(1+round(i/3):170+round(i/3),:,i);

  end

end

 

Plot the data to see the modifications

 

subplot(2,1,1)

xx = squeeze(X(:,1,:)); % Sample 1

xx = xx*diag(sum(xx.^2).^(-.5)); % Normalize spectra

plot(xx),axis tight

title('Raw data - no shifts')

subplot(2,1,2)

xx = squeeze(Xs(:,1,:)); % Sample 1

xx = xx*diag(sum(xx.^2).^(-.5)); % Normalize spectra

plot(xx),axis tight

title('Raw data - with shifts')

 

Now fit both a PARAFAC and a PARAFAC2 model to these data and plot the parameters to see the differences:

 

mpf1 = parafac(Xs,3);

[A,H,C,P]=parafac2(permute(Xs,[2 1 3]),3);

 

subplot(3,2,1)

for i = 1:10

  plot(P{i}*H),hold on

end

hold off

axis tight

title('PARAFAC2 solution','fontweight','bold')

 

subplot(3,2,3)

plot(A)

axis tight

 

subplot(3,2,5)

plot(C)

axis tight

 

 

 

subplot(3,2,2)

plot(mpf1{1})

axis tight

 

title('PARAFAC1 solution','fontweight','bold')

 

subplot(3,2,4)

plot(mpf1{2})

axis tight

 

subplot(3,2,6)

plot(mpf1{3})

axis tight

 

Apart from minor issues, the PARAFAC2 results are clearly preferred to the ordinary PARAFAC results. Why are there several similar emission loadings in PARAFC2? How many are there?

 

In fact, only the first ten versions of the PARAFAC2 emission loadings are plotted. Try to plot all the 61 versions. Now the emission plot is much more cluttered. Is that a problem?