Starting and Ending Regularization

Hi,

I think the advantage of regularization is to prevent degeneration of the covariance matrix, so why do you use the annealing approach?

Is the reason for the starting value to increase the entropy, thereby increasing the update range of the gradient descent method?

Also, I don’t understand

“Starting regularization pushes all samples to the mean and hides the underlying “unoptimized” covariance structure.”

which is mentioned in the document.

Since we are adding regularization values to the diagonal of the covariance matrix, aren’t we moving away “from” the mean shape?

Thank you for your help.

The decaying regularization approach allows the model to focus initially on the larger modes of variation and not be dominated by the smaller modes of variation. As the optimization proceeds, it can then focus on the smaller modes of variation.
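To make that concrete, here is a rough sketch of one way to read it, not the library's actual code. It assumes the entropy term behaves like log det(Σ + αI), whose derivative with respect to each covariance eigenvalue λ_i is 1 / (λ_i + α). The geometric decay schedule, the function names, and the example eigenvalues are all hypothetical and chosen only for illustration.

```python
import numpy as np

def annealed_regularization(start, end, iteration, total_iterations):
    """Hypothetical geometric decay of the regularization from start to end."""
    t = iteration / max(total_iterations - 1, 1)
    return start * (end / start) ** t

def entropy_gradient_weights(eigenvalues, alpha):
    """Derivative of sum(log(lambda_i + alpha)) with respect to each lambda_i."""
    return 1.0 / (np.asarray(eigenvalues, dtype=float) + alpha)

# Made-up spectrum: one large, one medium, and one very small mode of variation.
eigenvalues = np.array([100.0, 10.0, 0.01])

for iteration in (0, 50, 99):
    alpha = annealed_regularization(1000.0, 1.0, iteration, 100)
    weights = entropy_gradient_weights(eigenvalues, alpha)
    # Normalized weights show how strongly each mode is penalized per unit variance.
    print(iteration, round(alpha, 3), weights / weights.sum())
```

With a large α the per-mode weights are nearly equal, so the update is roughly the deviation from the mean scaled by 1/α: samples are pulled toward the mean, and the modes with the largest variance account for most of the movement. As α decays, the weight on the smallest mode grows until it dominates, which is when the optimization starts working on the fine-scale variation.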

See more information here:
