Inductive bias

The inductive bias (also known as learning bias) of a learning algorithm is the set of assumptions that the learner uses to predict outputs for inputs it has not encountered.^[1] Inductive bias is anything that makes the algorithm learn one pattern instead of another (e.g., step functions in decision trees instead of the continuous functions of linear regression models). Learning involves searching a space of solutions for one that provides a good explanation of the data. However, in many cases there may be multiple equally appropriate solutions.^[2] An inductive bias allows a learning algorithm to prioritize one solution (or interpretation) over another, independently of the observed data.^[3]
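To make the decision-tree versus linear-regression contrast concrete, here is a minimal Python sketch. It assumes a made-up one-dimensional data set and two deliberately simple learners: a single-split regression tree (`fit_stump`) and an ordinary least-squares line (`fit_line`). Both are hypothetical helpers written for this illustration, not any library's API. Both learners see the same four training points, yet they disagree about an unseen input because each assumes a different shape for the target function.

```python
# Two learners with different inductive biases, fitted to the same training data.

def fit_stump(xs, ys):
    """Fit a depth-1 regression tree (a "stump"): a single step function.
    Assumes at least two distinct x values."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        # Squared error of predicting each side's mean on that side.
        sse = sum((y - left_mean) ** 2 for y in left) + sum((y - right_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return lambda x: slope * x + intercept


xs = [0, 1, 2, 3]  # made-up training inputs
ys = [0, 0, 1, 1]  # made-up training outputs
stump = fit_stump(xs, ys)
line = fit_line(xs, ys)

# Query an input far outside the training range.
print(stump(10))           # 1.0: the step function stays flat
print(round(line(10), 2))  # 3.9: the line keeps extrapolating
```

The training data alone do not make either answer more "correct". The difference comes entirely from the assumptions built into each hypothesis space.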

In machine learning, the aim is to construct algorithms that are able to learn to predict a certain target output. To achieve this, the learning algorithm is presented with some training examples that demonstrate the intended relation between input and output values. The learner is then supposed to approximate the correct output, even for examples that were not shown during training. Without any additional assumptions, this problem cannot be solved, since unseen situations might have an arbitrary output value. The necessary assumptions about the nature of the target function are what the term inductive bias refers to.^[1]^[4]
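The claim that unseen inputs may have arbitrary outputs can be checked by brute force. The sketch below assumes a tiny made-up problem with 3-bit Boolean inputs. It enumerates every possible Boolean function on those inputs (2^8 = 256 of them), keeps the ones consistent with four hypothetical training examples, and counts how the survivors vote on each unseen input. With no further assumptions, exactly half of them predict 1 and half predict 0, so the training data alone cannot decide.

```python
from itertools import product

# Every possible input: all 3-bit tuples (8 in total).
INPUTS = list(product([0, 1], repeat=3))

# Hypothetical training set: four labelled examples, invented for illustration.
TRAINING = {(0, 0, 0): 0, (0, 1, 1): 1, (1, 0, 1): 1, (1, 1, 0): 0}


def consistent_hypotheses(training):
    """Yield every Boolean function on INPUTS (as an input -> output dict)
    that reproduces all training labels. No other assumption is made."""
    for outputs in product([0, 1], repeat=len(INPUTS)):
        h = dict(zip(INPUTS, outputs))
        if all(h[x] == y for x, y in training.items()):
            yield h


hypotheses = list(consistent_hypotheses(TRAINING))  # 2**4 = 16 survivors
unseen = [x for x in INPUTS if x not in TRAINING]

for x in unseen:
    votes_for_one = sum(h[x] for h in hypotheses)
    # Prints "8/16" for every unseen input: the data leave the answer open.
    print(x, f"{votes_for_one}/{len(hypotheses)} consistent hypotheses predict 1")
```

A prediction on the unseen inputs becomes possible only after the hypothesis space is restricted or some hypotheses are preferred over others.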

A classical example of an inductive bias is Occam's razor, assuming that the simplest consistent hypothesis about the target function is actually the best. Here, consistent means that the hypothesis of the learner yields correct outputs for all of the examples that have been given to the algorithm.
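Here is a minimal sketch of an Occam-style bias. It assumes Boolean features and a hypothesis space of conjunctions of literals, and it takes "simplest" to mean "fewest literals". That is only one possible complexity measure; Occam's razor itself does not fix one. The hypothetical `simplest_consistent` helper tries conjunctions in order of increasing length and returns the first one that is consistent with every training example.

```python
from itertools import combinations


def predict(conjunction, x):
    """A literal (i, v) tests "feature i == v"; a conjunction predicts 1
    only if all of its literals hold (the empty conjunction always predicts 1)."""
    return int(all(x[i] == v for i, v in conjunction))


def simplest_consistent(training, n_features):
    """Return the shortest conjunction that classifies every training example correctly."""
    literals = [(i, v) for i in range(n_features) for v in (0, 1)]
    # Occam-style preference: search conjunctions in order of increasing length.
    for size in range(len(literals) + 1):
        for conj in combinations(literals, size):
            if all(predict(conj, x) == y for x, y in training.items()):
                return conj
    return None


# Hypothetical training data over three Boolean features.
training = {(1, 1, 0): 1, (0, 1, 0): 0, (1, 1, 1): 0}

h_simple = simplest_consistent(training, 3)
h_complex = ((0, 1), (1, 1), (2, 0))  # also consistent, but longer

print(h_simple)  # ((0, 1), (2, 0)), i.e. x0 == 1 and x2 == 0
print(predict(h_simple, (1, 0, 0)), predict(h_complex, (1, 0, 0)))  # 1 0
```

Both conjunctions fit the training data, yet they disagree on the unseen input (1, 0, 0). Preferring the shorter one is the inductive bias.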

Approaches to a more formal definition of inductive bias are based on mathematical logic. Here, the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis generated by the learner. However, this strict formalism fails in many practical cases in which the inductive bias can only be given as a rough description (e.g., in the case of artificial neural networks), or not at all.
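The formal requirement can be written in symbols. Let B be the inductive bias, D the training data, h the hypothesis that the learner L outputs on D, and X the instance space (notation chosen here for illustration). The first line below states the requirement described above. The second line is a commonly used per-instance variant: B, together with D and a new instance x, must entail the output L(x, D) that the learner assigns to x.

```latex
\begin{aligned}
  &B \wedge D \vdash h \\
  &\forall x \in X:\; (B \wedge D \wedge x) \vdash L(x, D)
\end{aligned}
```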

[1] Mitchell, T. M. (1980). The need for biases in learning generalizations. CBM-TR 5-110. New Brunswick, New Jersey, USA: Rutgers University. CiteSeerX 10.1.1.19.5466.
[2] Goodman, Nelson (1955). "The new riddle of induction". Fact, Fiction, and Forecast. Harvard University Press. pp. 59–83. ISBN 978-0-674-29071-6.
[3] Mitchell, T. M. (1980). "The need for biases in learning generalizations". Rutgers University Technical Report CBM-TR-117: 184–191.
[4] DesJardins, M.; Gordon, D. F. (1995). "Evaluation and selection of biases in machine learning". Machine Learning, 20 (1–2): 5–22. doi:10.1007/BF00993472.
