Miklós I, Somodi I, Podani J: Rearrangement of ecological data ... (2005)

Miklós I, Somodi Imelda, Podani János
2005
Rearrangement of ecological data matrices via Markov Chain Monte Carlo simulation.
Ecology 86, 3398-3410.
Összefoglaló: 

Block clustering and seriation reveal the underlying structure of ecological data structures by rearranging the rows and the columns of the data table or data matrix, usually representing species and sample sites, respectively. Classical approaches to this problem rely upon a goodness criterion optimized through an iterative algorithm or utilize a simultaneous classification or ordination of species and sites. The new procedure introduced here does not strive for a single optimal rearrangement. Instead, it generates a series of probability distributions, the Boltzmann distributions, for a large set of solutions that include both the optimal and suboptimal solutions. The procedure is governed by a hypothetical parameter, T, called the ‘‘temperature.’’ For the value of zero, only the best rearrangements (if there are many) have nonzero probabilities. As the value of this parameter increases, less optimal rearrangements become more frequent in the associated series of
distributions. When T approaches infinity, the distribution becomes uniform over all the possible matrix rearrangements. We propose a Markov chain Monte Carlo (MCMC) method which converges reasonably fast to this distribution for any value of T. This chain provides
a sample of matrices that can be characterized via several statistics based on the Boltzmann distribution. The relevance of the method is demonstrated using ecological data. We illustrate how much extra information can be gained from suboptimal solutions that may have biological meaning not revealed by the best solution. Although the objective is to give a distribution of solutions rather than a single optimal solution, the new method can actually outperform heuristic searching algorithms in finding the best arrangement. We provide source code for the MCMC method in the C language, which can be compiled under many operating systems (Windows, Linux/Unix, Macintosh OS) and used in command line mode. The Linux/Unix version operates in interactive mode, it gives a graphical output of the results, and is available as a web server interface in PHP 4.3, which can also be installed on personal computers.

Angol nyelvű összefoglaló: 

Block clustering and seriation reveal the underlying structure of ecological data structures by rearranging the rows and the columns of the data table or data matrix, usually representing species and sample sites, respectively. Classical approaches to this problem rely upon a goodness criterion optimized through an iterative algorithm or utilize a simultaneous classification or ordination of species and sites. The new procedure introduced here does not strive for a single optimal rearrangement. Instead, it generates a series of probability distributions, the Boltzmann distributions, for a large set of solutions that include both the optimal and suboptimal solutions. The procedure is governed by a hypothetical parameter, T, called the ‘‘temperature.’’ For the value of zero, only the best rearrangements (if there are many) have nonzero probabilities. As the value of this parameter increases, less optimal rearrangements become more frequent in the associated series of
distributions. When T approaches infinity, the distribution becomes uniform over all the possible matrix rearrangements. We propose a Markov chain Monte Carlo (MCMC) method which converges reasonably fast to this distribution for any value of T. This chain provides
a sample of matrices that can be characterized via several statistics based on the Boltzmann distribution. The relevance of the method is demonstrated using ecological data. We illustrate how much extra information can be gained from suboptimal solutions that may have biological meaning not revealed by the best solution. Although the objective is to give a distribution of solutions rather than a single optimal solution, the new method can actually outperform heuristic searching algorithms in finding the best arrangement. We provide source code for the MCMC method in the C language, which can be compiled under many operating systems (Windows, Linux/Unix, Macintosh OS) and used in command line mode. The Linux/Unix version operates in interactive mode, it gives a graphical output of the results, and is available as a web server interface in PHP 4.3, which can also be installed on personal computers.