The global financial system is a complex network with many stakeholders and nonlinear feedback interactions among them. A core component of this system is the major stock markets, whose crashes are rare events driven by large-scale collective behavior and accompanied by severe social and economic consequences (Bluedorn et al., 2013; Farmer, 2012). Recognizing the triggers of these events in advance is challenging, given the inherent difficulty of decomposing the global financial system into its individual nodes while simultaneously simulating the relevant transmission channels. In this study we employ hierarchical deep learning models for one-day-ahead prediction of global stock market crashes, where a crash is defined as any realized return below the 1st percentile. To this end, we develop an ensemble method composed of parallel and serially connected machine learning components. Our dataset consists of stock, bond, and currency returns for 39 countries from 1996 to 2017, enriched with autoregressive terms, historical volatilities, and historical transition matrices of crash vs. no-crash states for the various regions. The occurrence of a global crisis event (i.e., the dependent variable) is defined as at least two out of three regional economic centers being in financial turmoil. To reduce the dimensionality of the collected features (i.e., regressors), which exceed 1600, we apply the BORUTA feature selection algorithm in a preprocessing stage. The pipeline of the proposed framework is composed of a series of deep neural networks and an XGBOOST aggregation layer. Initially, we disentangle the global financial system into regional centers (i.e., Asia, America, and Europe) and train algorithms that capture the local specificities of each region. Specifically, we train a hierarchical deep learning network for each region to predict a crisis event in that region.
Subsequently, the regional agents’ predictions are fed into an aggregation layer based on Extreme Gradient Boosting (XGBOOST), a recently introduced machine learning technique, to derive a forecast of an upcoming global crisis event. We follow an end-to-end approach for optimizing the hyperparameters of the proposed ensemble architecture. We train our models using data up to 2010, while the remaining data, spanning 2011 to 2017, is used for out-of-time validation of the models’ forecasting efficacy. The proposed architecture addresses issues such as pattern recognition, time-varying volatility effects, and temporal dependencies. We benchmark our results against a series of standalone techniques, namely Logistic Regression, Support Vector Machines, Neural Networks, CART, Random Forests, XGBOOST, and Deep Neural Networks. Our empirical evidence suggests that deep learning techniques perform better at detecting interactions that are invisible or difficult to disentangle using conventional modeling techniques. Furthermore, the forecasts derived by combining Deep Learning with XGBOOST show superior predictive performance relative to individual models. In particular, the hit rate of the proposed ensemble framework reaches 54% with a low false alarm rate (10%). Our results also suggest that a forecasted crisis in Europe and the VIX are the main determinants for predicting a global stock market crash, whereas a crash in Asia is identified as a less influential explanatory variable. Thus, our work, being essentially an Early Warning System (EWS) for the short-term horizon, can complement the expert judgment of policymakers in their effort to curtail contagion risk and, in extreme cases, even preempt a global crisis.
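
The definitions used above (a crash as a return below the 1st percentile, a global crisis as at least two of three regions in turmoil, and the hit and false alarm rates) can be made concrete with a minimal Python sketch. It is an illustration only, not the implementation used in this study. All function and parameter names are hypothetical. Two choices are assumptions rather than details stated above: the percentile threshold is estimated on the training period only, and the false alarm rate is taken as the share of no-crisis days that were flagged.

```python
import numpy as np


def crash_labels(returns, train_mask, pct=1.0):
    """Flag days whose return lies below the pct-th percentile.

    Assumption: the threshold is estimated on the training rows only,
    so that no information from the validation period leaks into labels.
    """
    returns = np.asarray(returns, dtype=float)
    train_mask = np.asarray(train_mask, dtype=bool)
    threshold = np.percentile(returns[train_mask], pct)
    return returns < threshold


def global_crisis(regional_flags, min_regions=2):
    """Global crisis indicator from regional turmoil flags.

    regional_flags has shape (n_days, n_regions), e.g. columns for
    Asia, America, and Europe. A day counts as a global crisis when at
    least `min_regions` regions are in turmoil.
    """
    regional_flags = np.asarray(regional_flags, dtype=bool)
    return regional_flags.sum(axis=1) >= min_regions


def hit_and_false_alarm(y_true, y_pred):
    """Return (hit rate, false alarm rate) for binary crisis forecasts.

    Hit rate: flagged crisis days / all crisis days.
    False alarm rate (assumed definition): flagged calm days / all calm days.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    hits = np.sum(y_true & y_pred)
    false_alarms = np.sum(~y_true & y_pred)
    # Guard against empty classes to avoid division by zero.
    hit_rate = hits / max(y_true.sum(), 1)
    false_alarm_rate = false_alarms / max((~y_true).sum(), 1)
    return float(hit_rate), float(false_alarm_rate)
```

For one-day-ahead forecasting, the labels produced this way would be shifted by one day relative to the features, so that inputs observed on day t are paired with the crisis indicator of day t+1.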