Background: Timely prediction of Rift Valley fever (RVF) epidemics is limited by the scarcity of time-series data, because epidemics last only a short time and recur at intervals of seven to fifteen years. This necessitates anomaly detection systems (ADS) that detect unusual behaviour in simulated data as a cue for temporal prediction of disease epidemics. Objective: To identify unusual patterns within simulated data that can be associated with disease epidemics, and to extend the model's application to epidemic detection.
Methods: The ADS was implemented in MATLAB R2015b to detect patterns in the population dynamics of the Culex pipiens complex and in temperature data associated with RVF epidemics. The data were fitted to estimate the parameters (μ, σ²) of a machine learning (ML) Gaussian distribution model. The threshold ε was then selected as the value giving the best F1 score on a cross-validation set; observations whose estimated probability falls below ε are treated as more likely to be anomalous.
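The procedure described in the Methods can be sketched as follows. This is an illustrative Python sketch, not the MATLAB R2015b implementation used in the study. The function names, the synthetic two-feature data (standing in for mosquito abundance and temperature), and the assumption that features are independent (one variance per feature) are all hypothetical choices made for illustration.

```python
import numpy as np


def estimate_gaussian(X):
    """Estimate per-feature mean and variance (mu, sigma^2) from the data."""
    mu = X.mean(axis=0)
    sigma2 = X.var(axis=0)  # population variance (divides by the sample count)
    return mu, sigma2


def gaussian_density(X, mu, sigma2):
    """Density of each row under independent univariate Gaussians (assumption)."""
    coef = 1.0 / np.sqrt(2.0 * np.pi * sigma2)
    expo = np.exp(-((X - mu) ** 2) / (2.0 * sigma2))
    return np.prod(coef * expo, axis=1)


def select_threshold(y_val, p_val, n_steps=1000):
    """Scan candidate thresholds and keep the one with the best F1 score."""
    best_eps, best_f1 = 0.0, 0.0
    for eps in np.linspace(p_val.min(), p_val.max(), n_steps):
        pred = p_val < eps  # low probability means flagged as anomalous
        tp = np.sum(pred & (y_val == 1))
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        if tp == 0:
            continue  # precision and recall are undefined or zero here
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2.0 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1


# Synthetic stand-in data (hypothetical values, not the study's data).
rng = np.random.default_rng(0)
loc, scale = np.array([20.0, 25.0]), np.array([3.0, 2.0])
X_train = rng.normal(loc, scale, size=(500, 2))

# Cross-validation set: 100 ordinary days plus 5 labelled anomalous days.
X_normal = rng.normal(loc, scale, size=(100, 2))
X_anom = rng.normal(loc + 8.0 * scale, 0.1, size=(5, 2))
X_val = np.vstack([X_normal, X_anom])
y_val = np.concatenate([np.zeros(100, dtype=int), np.ones(5, dtype=int)])

mu, sigma2 = estimate_gaussian(X_train)
p_val = gaussian_density(X_val, mu, sigma2)
best_eps, best_f1 = select_threshold(y_val, p_val)

print("best epsilon:", best_eps)
print("best F1 on the cross-validation set:", best_f1)
```

In this sketch the threshold is chosen only from labelled cross-validation data, so a handful of known epidemic days is enough to calibrate ε even when the training data are almost entirely non-epidemic.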
Results: Preliminary results show a Gaussian fit to the computed probability density function used for anomaly detection. For a small dataset, the best ε found by cross-validation was 8.947268×10⁻³, with a best F1 score on the cross-validation set of 1.2195×10⁻². For a larger dataset, the best ε was 1.189075×10⁻⁵, with a best F1 score on the cross-validation set of 9.52381×10⁻¹. For the daily period from 01 January 1994 to 30 December 1999, the ADS identified 85 days associated with RVF epidemics, suggesting the possibility of epidemic occurrence.
Conclusion: ADS provide an alternative technique for correctly identifying temporal epidemics from only a few true epidemics in simulated data. Further work will focus on applying this model to high-dimensional datasets and comparing the results with other ML algorithms such as support vector machines (SVM) and artificial neural networks (ANN).