How To Understand The Machine Learning Family



In earlier posts I gave a quick-and-dirty overview of Machine Learning and related methodologies: “what most people call machine learning in neuroscience academic texts is known in other research contexts as pattern recognition, data mining, artificial intelligence (AI), computational intelligence (CI), soft computing (SC), or data analytics. All of these refer to more or less the same idea from an application point of view”.

I would like to extend and detail this overview by also considering expert systems, hybrid systems, probabilistic approaches, and modern AI. The first thing to clarify is that these categories do not have hard limits but fuzzy boundaries, so methodologies are mixed everywhere. They have grown in different research communities, i.e. around particular journals and conferences and the researchers who feed them, but nowadays digital communication channels allow these communities to exchange knowledge more than ever. As already mentioned in that past blog post, all of these terms refer to methods for splitting data into groups or categories, with the final goal of making sense of the data, i.e. inferring some knowledge from it.

Concepts and taxonomies

I would say you can establish two axes to distinguish among all these terms: data dimensionality, i.e. the number of data samples and the number of items that describe each sample, and the type of methodology you use to achieve the goals mentioned in the previous paragraph. Since the term data dimensionality might not be clear, here is an easy example of what I mean: if you have a database with entries for five subjects with their name, age, and height, the number of samples is 5 and the number of items is 3. So when it comes to EEG, you can talk of pattern recognition if you extract, for instance, 1 feature per channel and try to classify it; you can talk of data mining if you have data from a clinical experiment with 300 subjects and 32-channel EEG sequences of 5 minutes, from which you extract 1 feature per channel every 4 seconds; and finally you can talk of data analytics if you have a database with all the physiological data (EEG, ECG, and others) of the patients in a hospital [1].
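To make the notion of data dimensionality concrete, here is a minimal sketch in Python that counts samples and items for the two examples above. The subject entries are made up for illustration, and treating each 4-second window of each subject as one sample is my own reading of the clinical example, not the only possible one.

```python
# Toy database: 5 subjects described by name, age, and height.
# The entries themselves are invented for illustration.
subjects = [
    ("Subject A", 34, 168),
    ("Subject B", 27, 181),
    ("Subject C", 45, 175),
    ("Subject D", 52, 160),
    ("Subject E", 39, 172),
]
n_samples = len(subjects)    # number of data samples
n_items = len(subjects[0])   # number of items describing each sample

# Clinical EEG example from the text: 300 subjects, 32 channels,
# 5-minute recordings, 1 feature per channel every 4 seconds.
n_subjects = 300
n_channels = 32
recording_s = 5 * 60
window_s = 4
windows_per_subject = recording_s // window_s

# Assumption: every 4-second window of every subject counts as one sample,
# described by one feature per channel.
eeg_shape = (n_subjects * windows_per_subject, n_channels)

print("toy database:", (n_samples, n_items))
print("clinical EEG feature table:", eeg_shape)
```

Under that assumption the clinical experiment yields a table several thousand times longer than the toy database, which is the kind of jump that moves a problem from pattern recognition towards data mining.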

So, depending on the dimensionality of the data, you will talk of pattern recognition, data mining, or data analytics, from smaller to larger dimensionality. You might add signal processing when you deal with data that have a temporal relationship among samples. Again, the limits among these categories are not hard: practitioners may call the same problem a pattern recognition or a data mining problem depending on their background, even when the data dimensionality is the same. All the other terms are found on the methodology axis. I try to explain my view on them in the following sections.

The grandparents: computational intelligence, artificial intelligence, and probabilistic approaches

Mario Koeppen once told me about the curious history of Artificial Intelligence and Computational Intelligence. Both originated in the 60s, but at that time AI was better at convincing people of its usefulness, so research programs (especially in the US) started to reflect its associated topics and forgot about Computational Intelligence. In the 80s the situation changed. Research programs were looking for more applied artificial intelligence, so they started to support computational intelligence, especially neural network approaches. (Some time later I read this history in a paper or blog post, but since I have not found it by googling I cannot give you the link. If readers could help, it would be great to update the post.)

On the other hand, probabilistic approaches have always been on the crest of the wave. Tradition plays an important role in science, so probabilistic approaches have inherited the fame of statistics. They group probabilistic inference, Bayesian methods, Gaussian processes, and stochastic modeling (to name a few). They have flourished around IEEE Trans. Pattern Analysis and Machine Intelligence, the flagship journal of their research community, and the International Association for Pattern Recognition.

Comparing Computational Intelligence and Artificial Intelligence

I must admit I am not so confident about AI, since I have not worked in the field. My understanding is that it works with logical inference and rule-based systems. The latter are also known as expert systems. These are hard computational inference approaches when you compare them with those grouped under Soft Computing (SC) or Computational Intelligence (CI).

Rumor has it that SC and CI are exactly the same: just two names, one used on the West Coast of the USA and the other on the East Coast. But it is CI that gives its name to the associated research community, represented by the Computational Intelligence Society. CI methodologies are adaptive, tolerant to errors, biologically or cognitively inspired, suitable for parallelization, and treat inference with a bottom-up approach. This last feature goes against the core of classical AI, which is thought to handle inference with a top-down approach.

CI mainly groups Neural Networks (aka Neurocomputing), Fuzzy Computing, and Evolutionary Computation, and covers Probabilistic Computing and Machine Learning. Neural networks are distributed processing systems whose structure and operation simulate those of connectionist models based on the neuronal circuits present in biological brains. Fuzzy Computing was built on the Theory of Fuzzy Sets. Fuzzy sets are a generalization of classical sets in terms of the membership definition, which becomes a real-valued function in contrast to the binary definition of the classical Theory of Sets. Evolutionary Computation results from applying the principles of the biological Theory of Natural Selection to computational search procedures (you can cite my thesis [A. Soria-Frisch (2005) Soft Data Fusion for Computer Vision, PhD Thesis, TU Berlin] if you like any of these three definitions. Available here or on demand).
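The difference between binary and real-valued membership can be sketched in a few lines of Python. This is only an illustration: the set "tall", the function names, the 180 cm threshold, and the linear ramp between 160 cm and 190 cm are all hypothetical choices of mine, and many other membership shapes are used in practice.

```python
def crisp_tall(height_cm: float, threshold_cm: float = 180.0) -> int:
    """Classical set: a person either belongs to 'tall' (1) or does not (0)."""
    return 1 if height_cm >= threshold_cm else 0


def fuzzy_tall(height_cm: float, low_cm: float = 160.0, high_cm: float = 190.0) -> float:
    """Fuzzy set: membership in 'tall' is a real value between 0 and 1.

    Assumed shape: 0 below low_cm, 1 above high_cm, linear ramp in between.
    """
    if height_cm <= low_cm:
        return 0.0
    if height_cm >= high_cm:
        return 1.0
    return (height_cm - low_cm) / (high_cm - low_cm)


for height in (155, 175, 185, 195):
    print(height, crisp_tall(height), round(fuzzy_tall(height), 2))
```

With these assumed parameters, a person of 175 cm is simply "not tall" for the classical set, while the fuzzy set assigns a membership of 0.5, i.e. tall to a certain degree.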

Furthermore, CI does not rely on any single one of these methodologies, but is understood as an alliance of them with the goal of solving a real-world problem.

In the next blog post I will try to explain Machine Learning (the perfect family parent) and the promising family offspring (hybrid systems and modern AI).

