Hi,
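As I understand it, the purpose of the regularization is to keep the covariance matrix from becoming degenerate (singular). If so, why do you anneal it, starting from a large value and gradually decreasing it, instead of using a fixed value?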
Is the large starting value meant to increase the entropy, and thereby widen the range of the gradient-descent updates?
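For concreteness, here is a minimal sketch of how I picture the annealing. It assumes the regularized covariance is Σ + λI and uses a 2×2 toy covariance and a geometric decay from the starting value to the ending value. The schedule and the helper names (`geometric_schedule`, `gaussian_entropy_2d`) are my own hypothetical choices, not taken from the documentation. All it shows is that a larger λ gives a larger Gaussian entropy ½·log((2πe)^2 · det(Σ + λI)).

```python
import math

def geometric_schedule(start, end, num_iters):
    """Hypothetical schedule: decay lambda geometrically from `start` to `end`."""
    if num_iters == 1:
        return [start]
    ratio = (end / start) ** (1.0 / (num_iters - 1))
    return [start * ratio ** k for k in range(num_iters)]

def gaussian_entropy_2d(cov, lam):
    """Differential entropy of a 2-D Gaussian with covariance cov + lam * I."""
    a, b = cov[0][0] + lam, cov[0][1]
    c, d = cov[1][0], cov[1][1] + lam
    det = a * d - b * c  # > 0 for lam > 0 whenever cov is positive semidefinite
    return 0.5 * math.log((2 * math.pi * math.e) ** 2 * det)

# Degenerate (rank-1) sample covariance: with lam = 0, det = 0 and the entropy is undefined.
cov = [[1.0, 1.0], [1.0, 1.0]]

# Annealing: the entropy of the regularized model shrinks as lambda decays.
for lam in geometric_schedule(10.0, 0.01, 4):
    print(f"lambda={lam:.3g}  entropy={gaussian_entropy_2d(cov, lam):.3f}")
```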
Also, I don’t understand this sentence from the documentation:
“Starting regularization pushes all samples to the mean and hides the underlying ‘unoptimized’ covariance structure.”
Since we are adding the regularization value to the diagonal entries of the covariance matrix, aren’t we moving away from the mean shape rather than toward it?
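To make the second question concrete, here is a toy 2-D sketch (pure Python, hypothetical helper names) of what adding λ to the diagonal does. It assumes that deviations from the mean are weighted through (Σ + λI)^-1, i.e. a Mahalanobis-type distance. I am not sure that this is exactly what the optimizer uses. The sample covariance is rank 1, so it cannot be inverted at all without regularization. The printed ratio compares a deviation along the learned direction with one across it. It is close to 1 for large λ, where both directions are weighted almost equally, and close to 0 for small λ, where the covariance structure dominates.

```python
def regularize(cov, lam):
    """Return cov + lam * I (lam added to every diagonal entry)."""
    n = len(cov)
    return [[cov[i][j] + (lam if i == j else 0.0) for j in range(n)] for i in range(n)]

def inverse_2x2(m):
    """Closed-form inverse of a 2x2 matrix; raises if it is singular."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]

def mahalanobis_sq(dev, cov, lam):
    """Squared distance dev^T (cov + lam*I)^-1 dev for a 2-D deviation from the mean."""
    inv = inverse_2x2(regularize(cov, lam))
    return sum(dev[i] * inv[i][j] * dev[j] for i in range(2) for j in range(2))

# Rank-1 sample covariance: all variation lies along (1, 1), none along (1, -1).
cov = [[1.0, 1.0], [1.0, 1.0]]
along, across = [1.0, 1.0], [1.0, -1.0]

# Without regularization the covariance cannot be inverted.
try:
    inverse_2x2(cov)
except ValueError as err:
    print("lambda=0:", err)

for lam in (100.0, 1.0, 0.01):
    # Ratio near 1: both directions weighted alike (structure hidden).
    # Ratio near 0: deviations along the learned direction are cheap (structure visible).
    r = mahalanobis_sq(along, cov, lam) / mahalanobis_sq(across, cov, lam)
    print(f"lambda={lam:g}  along/across={r:.3f}")
```

Note that `regularize` only changes the covariance estimate. It does not move any sample, so the mean itself stays where it is.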
Thank you for your help.