Model Concepts

Bias-Variance Tradeoff

The best predictive model is one that generalizes well: it gives accurate predictions on new, previously unseen data. Two concepts frame this goal: bias and variance.

High bias results from underfitting the model. It usually stems from erroneous assumptions, which make the model too general (too simple) to capture the underlying pattern.

High variance results from overfitting the model: it predicts the training dataset very accurately, but performs poorly on new, unseen datasets. This is because it fits even the slightest noise in the data.

The best model, with the highest accuracy on unseen data, lies in the middle ground between the two.

(Figure from Andrew Ng's lecture)

Regularization

Regularization is an important concept for some supervised models. It prevents overfitting by constraining the model, thus lowering its complexity.

In regression, regularization is controlled by the hyperparameter alpha: a higher alpha means more regularization and a simpler model.
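The effect of alpha is easy to see with ridge regression, which has a closed-form solution. The sketch below uses a synthetic dataset (the coefficients and sizes are illustrative) and shows that the coefficient vector shrinks as alpha grows:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy regression data (true coefficients chosen for illustration).
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=50)

def ridge_coefficients(X, y, alpha):
    # Closed-form ridge solution: w = (X^T X + alpha * I)^{-1} X^T y
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

norms = {a: np.linalg.norm(ridge_coefficients(X, y, a)) for a in (0.0, 1.0, 100.0)}
# Higher alpha shrinks the coefficient vector toward zero: a simpler model.
```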

Regularization  Model   Description
L1              LASSO   Penalizes the sum of the absolute values of the coefficients; can shrink unimportant features' regression coefficients to exactly 0
L2              Ridge   Penalizes the sum of the squared coefficients; shrinks all coefficients toward 0, but not exactly to 0
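The reason L1 can zero out coefficients while L2 only shrinks them comes down to the soft-thresholding operator, which is the proximal step of the L1 penalty. A minimal sketch (the coefficient values are illustrative; with an orthonormal design, the LASSO solution is exactly the soft-thresholded least-squares solution):

```python
import numpy as np

def soft_threshold(w, alpha):
    # Proximal operator of the L1 penalty: values with magnitude below
    # alpha snap to exactly 0; larger values shrink by alpha.
    return np.sign(w) * np.maximum(np.abs(w) - alpha, 0.0)

# Illustrative least-squares coefficients for four features.
ols_coefs = np.array([3.0, -0.2, 0.05, 1.5])
lasso_coefs = soft_threshold(ols_coefs, alpha=0.5)
# lasso_coefs == [2.5, 0.0, 0.0, 1.0]: the two small coefficients
# become exactly 0, so those features drop out of the model.
```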

In classification, regularization is controlled by the hyperparameter C: a lower C means more regularization and a simpler model.

Model                Regularization     Description
SVC                  L2                 -
LinearSVC            L1/L2              Regularization type can be chosen
Logistic Regression  L1/L2/ElasticNet   Regularization type can be chosen
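The C convention can be sketched from scratch: an L2-regularized logistic regression penalizes the loss with a term weighted by 1/C, so a lower C means a heavier penalty. This is a minimal numpy implementation on synthetic data (dataset, learning rate, and step count are all illustrative), not scikit-learn's actual solver:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy binary classification data: the label is a linear rule on two features.
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

def fit_logistic(X, y, C, steps=1000, lr=0.1):
    # Gradient descent on: mean log-loss + (1 / C) * 0.5 * ||w||^2.
    # Smaller C -> heavier L2 penalty (the scikit-learn convention).
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted probabilities
        grad = X.T @ (p - y) / n + w / C   # loss gradient + penalty gradient
        w -= lr * grad
    return w

w_weak = fit_logistic(X, y, C=10.0)   # mild regularization
w_strong = fit_logistic(X, y, C=0.1)  # strong regularization
# The strongly regularized model ends with a smaller-norm weight vector.
```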