Combination of Support Vector Machine and K-Fold cross-validation for prediction of long-term degradation of the compressive strength of marine concrete



Introduction
With the continued growth of infrastructure construction in China, large marine construction projects are becoming increasingly common. In the marine environment, ions such as Mg²⁺, SO₄²⁻ and Cl⁻ act together with temperature to degrade concrete, reducing its strength and causing an irreversible decline in the durability of concrete structures, which is a main concern of researchers. Therefore, predicting and evaluating concrete strength in a marine environment is of great significance for determining the safety of concrete structures. Under practical environmental conditions, however, it is difficult to determine the potentially nonlinear relationships between the properties of concrete and its material composition, which complicates the prediction of the long-term strength of concrete in a marine environment.
Different approaches have been proposed to predict the long-term strength of concrete. Many researchers have developed mathematical formulae using statistical regression analysis to relate time t to Kc (corrosion resistant coefficient for concrete compressive strength) [1][2][3]. However, these approaches do not adequately account for variations in chemical ion concentration and temperature, which limits their applicability. Also, those traditional statistical methods are restricted to predicting long-term strength from a small amount of data, and their prediction accuracy is lower than required. With the development of computer technology, a dataset can be established and analyzed based on the large quantities of data available from public sources. Over the past few decades, many machine-learning technologies have been used to model practical problems. For the performance of concrete, many researchers, such as Vinay Chandwani (2015), have built machine-learning models [4][5][6][7][8][9][10]. But those models usually involve only the material composition of concrete, without considering environmental factors. Therefore, it is necessary to extend the current models by comprehensively considering the effects of both materials and environment. Such an extension requires a large volume of new data, and the selection of model parameters has a great effect on the results. A robust parameter optimization method is therefore important for improving the performance of the prediction model.
In this study, the following points are considered:
(a) An evaluation model was built to estimate the degradation of concrete strength in the marine environment. The time-dependent corrosive ion concentration, temperature and material composition were included in this model.
(b) The optimization method in the model was improved. The new model was compared with two commonly used machine learning models, the artificial neural network and the decision tree.
(c) Several factors (i.e., material factors and environmental factors) were evaluated to determine their influence on the degradation of concrete strength, and the sensitivity of these factors is discussed.
The model proposed in this study can be used as a tool for designing marine concrete structures and for guiding construction in the marine environment.

Data collection and prediction methodology

Data collection
It is well known that the degradation of concrete performance in the marine environment is a complex process. Many studies have investigated this process [3][4][5][6][7][8][9][10], but it is difficult to find a suitable mathematical model to describe it. This study collected a large quantity of data [1, 2, 11–15] to assist in the construction and optimization of a mathematical model that could describe concrete degradation in the marine environment. The main factors considered in this study are as follows:
(a) Initial strength. A high initial design strength of concrete results in a compact structure, which can reduce the ingress of ions and the deterioration of the concrete. Many researchers have studied high strength concrete in marine structures [16].
(b) The dosage of fly ash and slag. Addition of fly ash and slag has been found to be effective in improving the performance of concrete in the marine environment, due to pozzolanic reactions [17].
(c) Environmental ions. Soluble ions such as Mg²⁺, SO₄²⁻ and Cl⁻ are known to be responsible for the deterioration of concrete strength, so these ions are the main ones considered in building the prediction model [18].
(d) Temperature. Temperature could affect the hydration process of concrete. Moreover, temperature also plays an important role during the process of concrete deterioration, e.g., deterioration caused by drying-wetting cycles.
(e) Relative humidity. Relative humidity affects water transport in the pores of the concrete and would normally need to be considered. However, the concrete involved in this study is mainly immersed in seawater, so the relative humidity is always 100%.
(f) Age of concrete service. The service age is one of the most important factors governing how much the strength of concrete deteriorates, so it should also be considered.
In this study, data from 116 samples were used to build the dataset: 80 samples were used for training and the other 36 for testing. There are 8 input factors: the initial strength of concrete (F0), the dosages of fly ash (FA) and slag (SG), the concentrations of Mg²⁺, SO₄²⁻ and Cl⁻, the service age (t) and the temperature (T). The output is the compressive strength of concrete after deterioration. Table 1 shows the ranges of the input and output data.

Prediction methodology
Machine learning techniques have been used to solve difficult problems, e.g., spam categorization [19]. Many machine learning models are available, e.g., the Artificial Neural Network (ANN), Decision Tree (DT) and Support Vector Machine (SVM). In this study, the SVM model was mainly used to explore the deterioration of concrete strength in the marine environment, and a new optimization algorithm was applied to it. The prediction performance of the SVM model was compared with that of other models often used in data mining, e.g., the ANN model and the DT model.
Figure 1 shows the implementation process of the SVM model. With this model, the concrete strength after degradation is determined, taking into account the combined effect of multiple factors. The optimization algorithm is used to choose the parameters of the SVM model. The algorithm is described as follows: In Eq. 1, $\{(x_i, y_i)\}_{i=1}^{n}$ is the training set, where $x_i \in \mathbb{R}^{d}$ is the set of influential factors of the concrete strength of group $i$ and $y_i \in \mathbb{R}$ is the output (i.e., the concrete strength of group $i$ after degradation). These data all come from the training dataset.

II. Choosing kernel function
The Radial Basis Function (RBF) was chosen as the kernel function in this study: According to the structural risk minimization principle, the regression optimization goal could be expressed as: where c is the penalty parameter and g (gamma) is the kernel function parameter.
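For reference, the RBF kernel and the regression objective are commonly written as below. This is the standard textbook form of ε-insensitive support vector regression, assumed here rather than taken from the study; the symbols $w$, $b$, $\phi$, $\xi_i$, $\xi_i^*$ and $\varepsilon$ are introduced only for illustration, while $c$ and $g$ are the penalty and kernel parameters named above.

```latex
K(x_i, x_j) = \exp\left(-g \,\lVert x_i - x_j \rVert^{2}\right)

\min_{w,\,b,\,\xi,\,\xi^*} \quad \frac{1}{2}\lVert w \rVert^{2} + c \sum_{i=1}^{n} \left(\xi_i + \xi_i^*\right)

\text{s.t.} \quad
y_i - \left(w \cdot \phi(x_i) + b\right) \le \varepsilon + \xi_i, \qquad
\left(w \cdot \phi(x_i) + b\right) - y_i \le \varepsilon + \xi_i^*, \qquad
\xi_i,\ \xi_i^* \ge 0
```

In this form a larger $c$ penalizes deviations beyond $\varepsilon$ more heavily, and a larger $g$ makes the kernel more local.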

III. Parameter optimization
The parameters c and g have a great effect on the performance of the prediction model. However, there is no established way to determine them. In most cases, c is left at its default value of 1 or set arbitrarily, and g is left at its default value of 1/k (where k is the number of attributes) or set arbitrarily. In this study, K-fold cross-validation was used to determine the parameters c and g with the following procedure:
1. The collected training dataset was divided into k parts, denoted "train (1)", "train (2)", ..., "train (k - 1)" and "train (k)".
2. In each round of prediction, one part is used as the test dataset, while the remaining k - 1 parts act as the training set. The prediction accuracy averaged over the rounds, VC, is the optimization target.
3. The values of c and g at which VC is best are taken as the best penalty parameter c and the best radial basis kernel parameter g [20].
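The procedure above can be sketched as a grid search over (c, g) scored by K-fold cross-validation. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, the data are a toy sine curve, and `rbf_smoother` is a simple stand-in regressor (not an SVM) so that the sketch runs with the standard library only. In practice the `fit_predict` callable would wrap an actual SVR trainer.

```python
import math
import random


def kfold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def mean_squared_error(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_mse(X, y, k, fit_predict, c, g, seed=0):
    """Average held-out MSE over the k folds for one (c, g) pair."""
    scores = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        preds = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in held_out], c, g)
        scores.append(mean_squared_error([y[i] for i in held_out], preds))
    return sum(scores) / k


def grid_search(X, y, k, fit_predict, exponents):
    """Try every (c, g) = (2**p, 2**q); keep the pair with the lowest CV MSE."""
    best = None
    for p in exponents:
        for q in exponents:
            c, g = 2.0 ** p, 2.0 ** q
            score = cross_val_mse(X, y, k, fit_predict, c, g)
            if best is None or score < best[0]:
                best = (score, c, g)
    return best


def rbf_smoother(X_train, y_train, X_test, c, g):
    """Hypothetical stand-in model (NOT an SVM): RBF-weighted average of the
    training targets. Only g is used; c is accepted to keep the interface."""
    fallback = sum(y_train) / len(y_train)
    preds = []
    for x in X_test:
        weights = [math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, xt)))
                   for xt in X_train]
        total = sum(weights)
        if total > 0.0:
            preds.append(sum(w * t for w, t in zip(weights, y_train)) / total)
        else:
            preds.append(fallback)  # all weights underflowed to zero
    return preds


# Toy data (assumption): one input feature, sine-shaped target.
X = [[i / 10.0] for i in range(40)]
y = [math.sin(x[0]) for x in X]
best_mse, best_c, best_g = grid_search(X, y, k=5, fit_predict=rbf_smoother,
                                       exponents=range(-5, 6))
```

The score here is the mean square error, so lower is better. Because the stand-in ignores c, only the selected g is meaningful in this toy run.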

Figure 2. Parameter optimization
As shown in Figure 2, the search range for c and g in this study was set to $2^{-5}$ to $2^{5}$. The ordinate is the mean square error (MSE). After cross-validation training, the optimal penalty parameter was c = 84.448 and the optimal radial basis kernel parameter was g = 0.18945. A small c produces under-fitting, while a large c causes over-fitting; both situations impair the generalization of the SVM model.

IV. Calculating threshold
With the best c, the best g and the optimal constraints in formulas (3) and (4), the dual problem can be solved to obtain the optimal solution $\alpha^* = (\alpha_1^*, \ldots, \alpha_n^*)$. The threshold $b^*$ was calculated with the following formula ($b^* = 0.5281$):

V. Building decision function
The decision function can be built from $\alpha^*$, $b^*$, g and the selected kernel function [21]: The decision function is the final SVM model, which predicts the concrete strength after degradation. With the testing samples as input, the generalization performance of the SVM model can be tested.
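As an illustration of how such a decision function is evaluated, the sketch below assumes the usual kernel-expansion form f(x) = Σ coef_i · K(sv_i, x) + b with the RBF kernel. The function names, the support vectors and the coefficients are hypothetical; only b = 0.5281 and g = 0.18945 are taken from the text, and `dual_coefs` stands for the combined dual coefficients of each support vector.

```python
import math


def rbf_kernel(x, z, g):
    """RBF kernel K(x, z) = exp(-g * ||x - z||^2)."""
    return math.exp(-g * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_predict(x, support_vectors, dual_coefs, b, g):
    """Evaluate f(x) = sum_i coef_i * K(sv_i, x) + b."""
    return sum(coef * rbf_kernel(sv, x, g)
               for sv, coef in zip(support_vectors, dual_coefs)) + b


# Hypothetical support vectors and coefficients, for illustration only.
support_vectors = [[1.0, 2.0], [3.0, 0.5]]
dual_coefs = [2.0, -1.0]
b_star, g_best = 0.5281, 0.18945  # threshold and kernel parameter from the text

prediction = svm_predict([1.0, 2.0], support_vectors, dual_coefs, b_star, g_best)
```

A query point far from every support vector has kernel values close to zero, so its prediction approaches the threshold b.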

Prediction of training dataset
Eighty training samples were randomly selected as input data, and the SVM model was then used to predict their strength. The predicted strength of concrete after degradation is shown in Figure 3, and the relative deviation of the prediction points can be seen in Figure 3 (b). The relative deviation between the predicted and experimental values is small, mostly within [-10%, 10%]. The average relative error is 3.32%, and the root mean squared error (RMSE) is 1.43. As shown in Figure 3 (c), the correlation coefficient R² reaches 0.96, and the residual analysis shows that the observed and predicted points are strongly correlated, with residuals fluctuating within (-5, 5). These results show that the SVM model fits the training samples satisfactorily. For data beyond the scope of the training dataset, more testing is needed to verify its generalization capability.
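The error measures quoted above can be computed as in the following sketch. The definitions are the common ones and are an assumption about what the study used; in particular, R² is computed here as the coefficient of determination, which may differ slightly from a squared correlation coefficient. The sample values are made up for illustration.

```python
import math


def average_relative_error(y_true, y_pred):
    """Mean of |prediction - observation| / |observation|, in percent."""
    return 100.0 * sum(abs(p - t) / abs(t)
                       for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the output."""
    return math.sqrt(sum((p - t) ** 2
                         for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up strengths (not data from the study).
observed = [40.0, 50.0, 60.0]
predicted = [42.0, 49.0, 60.0]

are = average_relative_error(observed, predicted)
err = rmse(observed, predicted)
r2 = r_squared(observed, predicted)
```

The relative error requires non-zero observed values, which holds for compressive strength data.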

Predicting the trend of concrete degradation
In this study, the experimental data were taken from Wang [13] and S. Islam [14]. The strength of concrete after degradation was predicted by the proposed SVM model. Figure 5 shows the prediction results together with the experimental data. ... optimised, e.g., by using a more suitable algorithm. To solve this problem, the degradation coefficient could be used to estimate the strength of concrete and help determine the durability and service life of concrete [3]. When the degradation coefficient is small, the concrete is likely to fail earlier than in a normal situation. In this study, we adopted the models of U. Schneider [2] and Wang [3] to calculate the degradation coefficient and compared them with the C-SVM model. The resulting degradation coefficients are illustrated in Figure 6.
Figure 6 shows that the trends of Kc (corrosion resistant coefficient for concrete compressive strength) calculated with the U. Schneider model and the C-SVM model are consistent with the experimental results. The U. Schneider model shows the best fit to the experimental results. However, it is described by a mathematical formula with fitting parameters α1, α2 and α3, and fitting these parameters is tedious and very inconvenient for engineering application. Wang's model is a theoretical model that assumes the deterioration of concrete strength does not begin with the process of micro-etching, so its result is not good. In terms of prediction accuracy in practical application, the SVM model is superior to the other calculation models for predicting the strength of concrete after degradation.

Comparison with other machine learning models
Many other machine learning models are available for concrete performance prediction, e.g., the artificial neural network and the decision tree [2, 5, 22–25]. In this study, those two models were established using the same dataset. Figure 7 shows the decision tree after training. Unlike the ANN and SVM, the decision tree consists of many If-Then statements that constitute its logic, so it is considered an explicit white box model. Figure 8 shows the process of training the neural network. This paper adopted a three-layer neural network with 8, 7 and 1 nodes, respectively. The model parameters are set as follows: display interval net.trainParam.show = 10, momentum factor net.trainParam.mc = 0.9, learning rate net.trainParam.lr = 0.01, maximum number of training epochs net.trainParam.epochs = 10000 and error goal net.trainParam.goal = 0.01. As can be seen from Figure 8, the training error falls below the error goal at step 8.
The performance of the different models (i.e., the SVM, ANN and DT models) was evaluated, and the results (MAE, RASE, R² and RES) are shown in Table 3. Table 3 indicates that the optimized SVM model has better prediction performance and smaller residual fluctuation than the artificial neural network (ANN) and the decision tree (DT). The generalization ability of the ANN and DT is similar, but the ANN predicts the training dataset better than the DT. It can be concluded that the accuracy of prediction models should not be judged from training samples alone. The SVM model performs well on both the training dataset and the testing dataset, so it can be used for predicting the strength of degraded concrete in the marine environment.

Sensitivity analysis
Sensitivity analysis shows the significance of the input factors for the predicted results (i.e., the strength of concrete) [26]. Researchers have proposed various methods for sensitivity analysis. Chen [27] used rough-set theory, which divides the influential factors into five levels: low, low-medium, medium, medium-high and high. Liong [28] defined a specific formula for the analysis of sensitivity: where N is the number of data points. For these data, each input parameter is changed by a constant amplitude of 20%. This study applied Liong's approach to calculate the sensitivity of the different factors (F0, FA, SG, Mg²⁺, Cl⁻, SO₄²⁻, T and t). The absolute values of the sensitivity are listed in Table 4.
Slag (SG) has the highest sensitivity among all factors, which means that slag has a significant influence on the concrete strength and its deterioration. The strength of concrete before degradation also has a high sensitivity. This can be explained by the fact that concrete with a higher strength usually has a higher density and lower porosity, so the diffusion of ions into the concrete is more difficult and the deterioration of the concrete is mitigated. Among the ions present in seawater, the sulfate ion appears to have a greater influence on the concrete strength than the chloride ion, which may be explained by the more serious damage induced by sulfate ingress than by chloride ingress. The temperature of sea water has the lowest sensitivity in the degradation of the concrete strength. However, the influence of temperature cannot be ignored, because it can be significant when concrete structures are subjected to wet-dry cycles, which may also lead to a degradation of the strength of concrete.
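A perturbation-based sensitivity measure of this kind can be sketched as follows. The formula used is an assumption: it is the form in which Liong's measure is commonly cited, the average over N data points of (% change in output) / (% change in input), expressed in percent. The function name and the toy model are hypothetical and are not the trained SVM.

```python
def sensitivity(model, samples, factor_index, change=0.20):
    """Average of (% change in output) / (% change in input) over all samples,
    expressed in percent, with one input raised by a constant fraction."""
    ratios = []
    for x in samples:
        perturbed = list(x)
        perturbed[factor_index] = x[factor_index] * (1.0 + change)
        base = model(x)  # assumed non-zero (e.g. a strength prediction)
        pct_change_output = (model(perturbed) - base) / base * 100.0
        pct_change_input = change * 100.0
        ratios.append(pct_change_output / pct_change_input)
    return 100.0 * sum(ratios) / len(ratios)


def toy_model(x):
    """Hypothetical model: output proportional to input 0, independent of input 1."""
    return 3.0 * x[0] + 0.0 * x[1]


samples = [[1.0, 5.0], [2.0, 7.0]]
s_first = sensitivity(toy_model, samples, factor_index=0)   # proportional input
s_second = sensitivity(toy_model, samples, factor_index=1)  # input with no effect
```

An input to which the output is proportional gives a sensitivity of about 100%, and an input with no effect gives 0%. Taking absolute values then allows factors to be ranked regardless of the direction of their effect.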

Conclusion
In this study, an optimized SVM model was proposed to predict the long-term strength degradation of concrete in the marine environment. Eight input factors, covering both material and environmental factors, were considered in the SVM model. It is concluded that the SVM model has good prediction accuracy. The average relative error on the training dataset was 3.32%, and that on the testing dataset was 7.08%. Compared with the artificial neural network and the decision tree, the SVM model was found to be a better predictor.
The SVM model could be used to predict the degradation coefficient and was more efficient and convenient than the U. Schneider formula in practical applications. Therefore, it can be recommended for predicting concrete performance degradation under complicated marine environment conditions. Sensitivity analysis was conducted to show the influence of different factors (i.e., material factors and environmental factors). It was found that the material composition, especially the amount of slag, was the most sensitive factor for the strength of the concrete. The second most significant factor was the strength of concrete before degradation. Among the environmental factors, the diffusion of sulfate ions into the concrete had a major influence on the strength of concrete, and the influence of temperature was insignificant.