According to the classical definition, chemical kinetics is the discipline that studies molecular compounds and the changes they undergo when reacting with each other. In the most complex cases, it is convenient to represent reaction kinetics with a reaction network: a bipartite graph in which nodes represent reactions and species, and directed edges represent the participation of a species in the respective reactions. A reaction network can be processed to obtain a system of non-linear differential equations, the so-called zero-dimensional model. The usual questions here concern numerical integration, sensitivity analysis, and the identification of the most important reaction pathways. That said, the question of where the information about the reaction network itself comes from is largely ignored, and it is usually left to experts to construct such a network "by hand", that is, either by performing multiple observations of the chemical system or by inference from theoretical knowledge.
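The step from a reaction network to a zero-dimensional model can be illustrated with a minimal sketch. It assumes mass-action kinetics, a toy reversible reaction A + B <-> C, made-up rate constants, and explicit Euler integration. None of these choices come from the text, and the names `reactions`, `bipartite_edges`, `rhs` and `integrate` are hypothetical.

```python
# Toy network: A + B -> C and C -> A + B.
# Species names and rate constants are illustrative assumptions.
reactions = [
    {"reactants": {"A": 1, "B": 1}, "products": {"C": 1}, "k": 2.0},
    {"reactants": {"C": 1}, "products": {"A": 1, "B": 1}, "k": 0.5},
]


def bipartite_edges(reactions):
    """Directed edges: species -> reaction (consumed), reaction -> species (produced)."""
    edges = []
    for j, rxn in enumerate(reactions):
        node = f"r{j}"
        edges += [(s, node) for s in rxn["reactants"]]
        edges += [(node, s) for s in rxn["products"]]
    return edges


def rhs(conc, reactions):
    """Right-hand side dc/dt of the ODE system, assuming mass-action kinetics."""
    dcdt = {s: 0.0 for s in conc}
    for rxn in reactions:
        rate = rxn["k"]
        for s, nu in rxn["reactants"].items():
            rate *= conc[s] ** nu
        for s, nu in rxn["reactants"].items():
            dcdt[s] -= nu * rate
        for s, nu in rxn["products"].items():
            dcdt[s] += nu * rate
    return dcdt


def integrate(conc, reactions, dt=1e-3, steps=5000):
    """Explicit Euler integration; adequate only for this small, non-stiff example."""
    conc = dict(conc)
    for _ in range(steps):
        d = rhs(conc, reactions)
        for s in conc:
            conc[s] += dt * d[s]
    return conc


edges = bipartite_edges(reactions)
final = integrate({"A": 1.0, "B": 1.0, "C": 0.0}, reactions)
```

Realistic networks are typically stiff, so a production model would likely replace the Euler loop with an implicit solver. The sketch only shows how the bipartite structure determines the equations.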
A graph grammar is a set of rules that transform one subgraph into another, and repeated application of such rules often results in highly complex, yet non-random, network structures. Molecular structures are often abstracted as labelled graphs. Here we devise a theoretical framework that formalises the expert knowledge of organic chemistry as a graph grammar acting upon molecular graphs. This machinery is meant to automatically rediscover large reaction networks starting from a small set of initial compounds. When such a computational procedure is followed in practice, however, one immediately faces two problems: firstly, the space of all combinatorial possibilities for chemical reactions may be very large; secondly, some molecules (known as polymers) have a periodic structure of unlimited size, which essentially means that the reaction network is infinite. We propose a compromise that circumvents the infinite-network issue by using random graphs, so that the number of molecular species is reduced from infinity to the number of unique fragments that contribute to the network.
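The infinite-network problem can be seen in a deliberately simplified sketch of rule-driven network expansion. Here a species is encoded only by its chain length, as a stand-in for a full labelled molecular graph, and a single hypothetical rule `chain_growth` joins two chains. The `max_size` cutoff is a crude truncation used for illustration and is not the random-graph approach proposed here.

```python
def chain_growth(a, b):
    """Hypothetical rule: two linear chains join end to end into one longer chain."""
    return a + b


def expand(seed, rule, max_generations, max_size=None):
    """Repeatedly apply `rule` to all unordered species pairs, collecting the network."""
    species = set(seed)
    reactions = set()
    for _ in range(max_generations):
        new_species = set()
        for a in species:
            for b in species:
                if a > b:
                    continue  # visit each unordered pair once
                product = rule(a, b)
                if max_size is not None and product > max_size:
                    continue  # truncate: ignore products above the cutoff
                reactions.add((a, b, product))
                if product not in species:
                    new_species.add(product)
        if not new_species:
            break  # the network is closed under the rule
        species |= new_species
    return species, reactions


# Without a cutoff the species set doubles in maximum size each generation.
unbounded, _ = expand({1}, chain_growth, max_generations=3)

# With a cutoff the expansion terminates on its own.
bounded, bounded_reactions = expand({1}, chain_growth, max_generations=10, max_size=5)
```

In this toy setting the unbounded expansion never closes, however many generations are allowed, which is the behaviour that motivates replacing individual polymer species with a finite set of fragments.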
A generic working model that recovers large reaction networks is a very valuable tool: in principle, one should be able to discover the scenario of how proteins emerged from simple inorganic compounds, the largest enigma in origin-of-life studies. In this work, however, we demonstrate our methodology on the polymerisation of linseed oil, which is a relatively fast process with good experimental coverage.