Chapter 1 Introduction: neurons, models and outliers

In the late 1990s, neuroscientists faced a curious problem. They investigated a set of neurons which started to fire when a bar with a certain orientation appeared within their receptive field, i. e. the visual area these neurons were concerned with. However, when this bar extended their receptive field, the firing stopped. This behaviour is called endstopping.

Illustration of endstopping: The shaded rectangle shows the receptive field of the neuron. The black rectangle represents the bar. If the receptive field is grey, the neuron does not fire, if it is orange, the neuron fires.

Figure 1.1: Illustration of endstopping: The shaded rectangle shows the receptive field of the neuron. The black rectangle represents the bar. If the receptive field is grey, the neuron does not fire, if it is orange, the neuron fires.

Different functions of these endstopped neurons had been postulated. Studies had suggested a role for these cells in detecting curvature, line terminations, occlusion, perceptual grouping and illusory contours. (Rao and Ballard 1999) Rao and Ballard’s study presented an alternative explanation for this effect. Their analysis suggested that higher-level neurons whose receptive field resembled the entire area in figure 1.1 predicted the lower-level input. Feedback connections transmitted the error of the transmission. According to Rao and Ballard, the endstopped neurons had the function of error feedback.

They demonstrated this by fitting a hierarchical model with three layers on image patches extracted from five natural photos such that the higher-level layers optimally predicted the lower-level layers. The feedback nodes encoding the difference between the values in the second layer and the values that had been predicted by the third layer exhibited endstopping effects. (Rao and Ballard 1999)

The explanation for neuronal endstopping therefore consists of two parts: firstly, the structure of neuronal networks allow for association of neighbouring pixels which is necessary to detect bars. Secondly, short bars are less usual than longer bars¹. This is the reason why the higher-level neurons predict a longer bar in these cases and receive a higher error feedback signal. Short bars are therefore outliers. Endstopped neurons thus provide a natural example for outlier detection.

Since these investigation, predictive coding, the hypothesis that higher-level neuronal layers attempt to predict lower-level layers, has become popular with respect to different areas of the brain (Friston 2005). Intuitively, this makes sense: if the received input represents usual patterns the information of the higher-level neurons suffices so that lower-level correction is only required if something unusual happens. This allows the brain to build a model of reality as such a model cannot account for all irregularities and retain its descriptive power. Consider, for instance, a chess game where the player attempts to predict his opponent’s actions. It is impossible to consider all possible moves. Instead, the player has to restrict himself to the most probable ones. If the next move has been considered by the player, his opponent behaves in the usual way and the player can rest assured. In contrast, if the player has not considered the next move, he may be alerted. Unusual behavior therefore merits special attention, in particular, if it is impossible to model all paths of behavior.

The statistical implementation of this strategy is given by outlier detection. The examples are manifold:

It is impossible to model all possible methods to commit credit card fraud. Instead, we simply try to detect unusual behavior.
There are many different reasons why data points may deviate from the assumptions of the classical linear model but many of these deviations are observable by looking at different kinds of residual errors.
There are many different reasons why a sensor can be corrupted and not all of them can be considered. However, many of them manifest themselves in unusual patterns.

Outliers can now be defined as those data points that are dissimilar from the other data points (see Aggarwal 2017). The particular outlier detection methods defines what it means for data points to be similar to each other. For instance, with respect to the endstopping neurons, image patches are similar if they can be characterized by the same neuronal model.

This report focuses on methods that generate predictions and assess them in order to judge how usual a certain data point is. Its structure is based on chapter 3 of Aggarwal’s Outlier Analysis (2017). Two alternative popular approaches are probabilistic modeling (Aggarwal 2017, ch. 2) and clustering (Aggarwal 2017, ch. 4). Note that the distinction is not well-defined and there are many methods that can be attributed to different fields (see chapter 2).

Chapter 2 presents a general methodology to assess outlier scores and introduces the Z-plot as a visualization of outlier scores..

Subsequently, chapter 3 introduces outlier detection using linear models and principal component analysis. All these methods are limited by the fact that they can only model linear relationships. This disadvantage is adressed in chapter 4 where nonlinear extensions of these methods are introduced. Specifically, the kernel trick is introduced yielding kernel principal component analysis. Finally, more complex regressions, in particular neural networks, are introduced. Methodological instructions consist of the following parts:

Motivation and intuition: What does it mean for a data point to be usual? What hyperparameters need to be adjusted and how do they affect whether data points are usual?
Implementation in R
Example application: a two-dimensional outlier analysis using different generated datasets should provide an intuition for advantages and disadvantages of every method as it is visually easy to assess
Applications in compression and data correction, two related fields to outlier detection
Pro and Contra

References

Aggarwal, Charu C. 2017. Outlier Analysis. Springer Publishing Company, Incorporated.

Friston, Karl. 2005. “A theory of cortical responses.” Philosophical Transactions of the Royal Society B 360: 815–36. https://doi.org/10.1098/rstb.2005.1622.

Rao, Rajesh P N, and Dana H Ballard. 1999. “Predictive Coding in the Visual Cortex: a Functional Interpretation of Some Extra- classical Receptive-field Effects.” Nature Neuroscience 2 (1). https://doi.org/10.1038/4580.

Rao and Ballard demonstrated this by another investigation (Rao and Ballard 1999).↩