Connectionism is an approach to the study of human mental processes and cognition that uses mathematical models known as connectionist networks or artificial neural networks (ANNs).[1]
Connectionism has had many "waves" since its beginnings. The first wave appeared in 1943 with Warren Sturgis McCulloch and Walter Pitts, who sought to understand neural circuitry through a formal, mathematical approach,[2] and continued with Frank Rosenblatt, who published the 1958 paper "The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain" in Psychological Review while working at the Cornell Aeronautical Laboratory.[3] The first wave ended with the 1969 book Perceptrons by Marvin Minsky and Seymour Papert, which examined the limitations of the original perceptron idea and contributed to discouraging major funding agencies in the US from investing in connectionist research.[4] With a few noteworthy exceptions, most connectionist research entered a period of inactivity until the mid-1980s. The term connectionist model was reintroduced in a 1982 paper in the journal Cognitive Science by Jerome Feldman and Dana Ballard.
The second wave blossomed in the late 1980s, following a 1987 book about Parallel Distributed Processing by James L. McClelland, David E. Rumelhart et al., which introduced several improvements to the simple perceptron idea, such as intermediate processing units (arranged in what are now known as "hidden layers") alongside the input and output units, and the use of a sigmoid activation function in place of the old "all-or-nothing" threshold function. Their work built upon that of John Hopfield, a key figure in investigating the mathematical characteristics of sigmoid activation functions.[3] From the late 1980s to the mid-1990s, connectionism took on an almost revolutionary tone when Schneider,[5] Terence Horgan, and John Tienson posed the question of whether connectionism represented a fundamental shift in psychology and in so-called "good old-fashioned AI" (GOFAI).[3] Some advantages of the second-wave connectionist approach included its applicability to a broad array of functions, its structural approximation to biological neurons, its low requirements for innate structure, and its capacity for graceful degradation.[6] Its disadvantages included the difficulty of deciphering how ANNs process information or account for the compositionality of mental representations, and the resulting difficulty of explaining phenomena at a higher level.[7]
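The contrast can be sketched in standard modern notation (the symbols below are illustrative and are not drawn from the cited works). A perceptron-style unit applies an all-or-nothing threshold to a weighted sum of its inputs, whereas a PDP-style unit passes the same weighted sum through a smooth sigmoid:

\[
y = \begin{cases} 1 & \text{if } \mathbf{w}\cdot\mathbf{x} + b > 0,\\ 0 & \text{otherwise,} \end{cases}
\qquad \text{versus} \qquad
y = \sigma(\mathbf{w}\cdot\mathbf{x} + b), \quad \sigma(z) = \frac{1}{1 + e^{-z}}.
\]

Inserting a layer of such sigmoid units between the inputs \(\mathbf{x}\) and the output yields the hidden-layer architecture,

\[
\mathbf{h} = \sigma(W_1 \mathbf{x} + \mathbf{b}_1), \qquad y = \sigma(\mathbf{w}_2 \cdot \mathbf{h} + b_2),
\]

where the differentiability of \(\sigma\), unlike the all-or-nothing threshold, is what allows the intermediate weights \(W_1\) to be adjusted by gradient-based learning.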
The current (third) wave has been marked by advances in deep learning, which have made possible the creation of large language models.[3] The success of deep-learning networks over the past decade has greatly increased the popularity of this approach, but the complexity and scale of such networks have brought with them increased interpretability problems.[8]