The traditional approach of physics proposes to characterize complex phenomena with simple idealized models, since this facilitates comprehension and the execution of controlled experiments. But, the emerging information era prompted statistical modeling as an important complement to the traditional approach, enabling the synthesization of useful information from large amounts of data. A principled approach to statistical modeling combining Occam’s razor with information theory is the Minimum Description Length (MDL) and the Refined MDL (RMDL) is one of the most important formalizations although, unluckily, it is typically non-tractable. Hence, in this work we introduce a simple extension of the RMDL which is purposely devised to be formally analogous to a fictitious statistical mechanical systems in thermodynamic equilibrium and is significantly more tractable than the original RMDL. The extended RMDL largely benefits from the immense theoretical and practical toolbox of statistical mechanics and, concepts such as phase transitions become crucially important from the statistical modeling point of view. For example, we use the existence of phase transitions to justify an appropriate high-temperature series expansion from where systematic approximations of the extended RMDL can be computed, or, when the extended RMDL is used to model subsets of models, we can identify ordered phases with statistical significant selections of models.
In a second stage of our work, we test the introduced formalism in the practical settings of community detection in complex networks, by combining the extended RMDL with a particular family of models. The resulting formalism becomes, at first order of approximation, essentially equivalent to the well known Girvan-Newman (GN) modularity and the inclusion of the corresponding high-order terms are found to improve the detection performance. Moreover, by applying the extended RMDL formalism to model subsets of models, we can also derive the Zhang-Moore (ZM) Belief Propagation method for community detection, allowing us to interpolate betweens the GN and ZM methods, so we can vary the degree of over- and under-fitting characteristic of these two approaches. Finally, we summarize a list of future work that follows from the present one in both, statistical modeling in general and in community detection in particular.
Statistical mechanical extension of the Refined Minimum Description Length and the problem of community detection in complex networks
Συνεδρία:
Room:
1
Date:
Monday, September 24, 2018 - 12:45 to 13:00