Why is MSE loss preferable over e.g. cosine distance?

Hi,

Why is MSE loss preferable over, e.g., cosine distance when it comes to training a downstream classifier?

Regards,
Riccardo

Hey Riccardo, it isn’t necessarily preferable. If you can share more detail on the training context and the type of input data, maybe we can help more.
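As a quick illustration of why it depends on the context: MSE penalizes differences in magnitude as well as direction, while cosine distance only compares direction. Here is a minimal plain-Python sketch. The vectors are made up for illustration, and it isn't tied to any particular framework's loss implementation:

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def cosine_distance(a, b):
    """1 - cosine similarity: compares direction only, ignores magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical example: same direction, twice the magnitude.
target = [1.0, 2.0, 3.0]
scaled = [2.0, 4.0, 6.0]

print(mse(target, scaled))              # 14/3, roughly 4.667
print(cosine_distance(target, scaled))  # approximately 0
```

So a prediction that points the right way but has the wrong scale is penalized by MSE and essentially ignored by cosine distance. Which behavior you want depends on whether the magnitude of the target vectors carries information the downstream classifier needs.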