For maritime stakeholders seeking to manage and pre-empt incident risk, predicted incident probabilities at ship level have several applications. These include better targeting of ship inspections, improved maritime domain awareness, and better risk exposure assessments for strategic planning and asset allocation. Using a unique and comprehensive global dataset of 1.2 million observations from 2014 to 2020, this study explores 144 machine learning model variants (18 random forest variants for each of 8 incident endpoints of interest), with the aim of enhancing prediction capabilities for maritime applications. A further aim is to determine and highlight the relative importance of the more than 500 covariates evaluated. The results differ by endpoint and, based on a full year of out-of-sample evaluation, confirm that random forest methods improve prediction capabilities. Targeting the 10% of vessels with the highest predicted risk would improve predictions by a factor of 2.7 to 4.9 compared with random selection. Balanced random forests and random forest variants with balanced training outperform regular random forests; the final choice of variant also depends on the aggregation type and on how the predicted probabilities are used in the application area of interest. The most important covariate groups for predicting incident risk relate to beneficial ownership, the safety management company, and the size and age of the vessel, and the importance of these factors is similar across the endpoints considered here.
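The top-10% result can be read as a lift ratio: the incident rate among the targeted vessels divided by the overall incident rate, where 1.0 corresponds to random selection. The sketch below shows that calculation together with the per-tree sampling step commonly used in a balanced random forest, where each tree sees a bootstrap sample of the minority (incident) class plus an equally sized sample drawn with replacement from the majority class. It is a minimal illustration under assumptions: the function names `top_fraction_lift` and `balanced_bootstrap` are hypothetical, and it is not claimed that the study computed lift or balanced its samples in exactly this way.

```python
import random
from typing import Sequence


def top_fraction_lift(probs: Sequence[float], outcomes: Sequence[int],
                      fraction: float = 0.10) -> float:
    """Hypothetical helper: lift from targeting the top `fraction` of vessels.

    Lift = (incident rate among targeted vessels) / (overall incident rate).
    A lift of 1.0 is what random selection achieves on average.
    """
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    total_incidents = sum(outcomes)
    if total_incidents == 0:
        raise ValueError("no incidents in the evaluation data; lift is undefined")
    # Number of vessels to target (at least one).
    n_target = max(1, int(round(fraction * len(probs))))
    # Rank vessels from highest to lowest predicted incident probability.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    hits = sum(outcomes[i] for i in ranked[:n_target])
    targeted_rate = hits / n_target
    base_rate = total_incidents / len(outcomes)
    return targeted_rate / base_rate


def balanced_bootstrap(labels: Sequence[int], rng: random.Random) -> list[int]:
    """Hypothetical helper: row indices for one tree of a balanced random forest.

    Draws a bootstrap sample (with replacement) of the minority class and an
    equally sized sample (with replacement) from the majority class.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    k = len(minority)
    return ([rng.choice(minority) for _ in range(k)]
            + [rng.choice(majority) for _ in range(k)])
```

For example, with 20 vessels whose predicted probabilities are `[i / 20 for i in range(20)]` and incidents at indices 0, 3, 5 and 19, targeting the top 10% selects the two highest-ranked vessels (indices 19 and 18). One of them had an incident, so the targeted rate is 0.5 against a base rate of 0.2, and `top_fraction_lift` returns 2.5.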

Econometric Institute Research Papers
Department of Econometrics

Knapp, S., & van de Velden, M. (2021, December 13). Exploration of machine learning algorithms for maritime risk applications. Econometric Institute Research Papers. Retrieved from