Understanding how cities grow is crucial both for its practical and policy implications, e.g. regarding sustainability, and from a theoretical point of view, for validating theories of urban systems. One particular approach is the Evolutive Urban Theory, which postulates that interactions between cities are the main drivers of their growth. Within this framework, numerous competing models have been introduced for the evolution of systems of cities and applied to diverse case studies, but to the best of our knowledge there is no systematic comparison of performance across models and application cases. This contribution takes a first step towards such a systematic benchmark. We consider simple models that simulate city populations only, but that rest on heterogeneous underlying processes and assumptions. More precisely, we compare (i) the Favaro-Pumain model for the diffusion of innovation; (ii) the Marius multi-model based on economic exchanges; and (iii) a model including flows within abstract physical networks parametrized on elevation data. These models are calibrated on a large-scale harmonized dataset covering the European, former Soviet Union, Chinese, Brazilian, South African, Indian and USA systems of cities over the period 1960-2010. Calibrations are run for different versions of each model with more or fewer parameters, using distributed genetic algorithms on a computation grid through the OpenMOLE model exploration software.
Calibration results show that: (i) for a fixed number of parameters, no model performs markedly better across all urban systems, suggesting the complementarity of the innovation, economic, and network dimensions captured by the different models; and (ii) when adjusting for the number of parameters with an empirical information criterion, additional components generally improve performance for all models, which reveals a high effective dimensionality of urban systems. This work confirms the need for multiple complementary approaches to modeling urban systems and sketches a framework for systematic benchmarks of competing models of complex urban systems.
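To illustrate what adjusting model fit for the number of parameters can look like, the following minimal sketch compares two hypothetical model versions with a standard Akaike Information Criterion under Gaussian errors. This is an assumption made for illustration only: the empirical information criterion used in this work is not specified here and may differ, and the function name, residuals, and parameter counts below are all hypothetical.

```python
import math


def gaussian_aic(squared_errors, n_params):
    """AIC under i.i.d. Gaussian errors, up to an additive constant:
    n * ln(RSS / n) + 2 * k, with n observations and k parameters."""
    n = len(squared_errors)
    rss = sum(squared_errors)
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical squared residuals (illustrative values, not results from the paper)
simple_errors = [0.04, 0.09, 0.01, 0.16, 0.04, 0.09]    # assumed 3-parameter version
extended_errors = [0.01, 0.04, 0.01, 0.09, 0.01, 0.04]  # assumed 4-parameter version

aic_simple = gaussian_aic(simple_errors, n_params=3)
aic_extended = gaussian_aic(extended_errors, n_params=4)

# A negative difference means the gain in fit outweighs the penalty
# for the additional parameter in this toy example.
delta_aic = aic_extended - aic_simple
print(f"AIC simple: {aic_simple:.3f}, extended: {aic_extended:.3f}, delta: {delta_aic:.3f}")
```

In this toy case the extended version keeps the lower criterion value despite its extra parameter, which is the kind of outcome meant by additional components improving performance after adjustment.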