A Novel Hybrid Deep Learning-Probabilistic Framework for Real-Time Crash Detection from Monocular Traffic Video
| dc.authorid | 0000-0003-0038-7519 | |
| dc.contributor.author | Erkartal, Resat Bugra | |
| dc.contributor.author | Yilmaz, Atinc | |
| dc.date.accessioned | 2026-01-31T15:09:04Z | |
| dc.date.available | 2026-01-31T15:09:04Z | |
| dc.date.issued | 2025 | |
| dc.department | İstanbul Beykent Üniversitesi | |
| dc.description.abstract | The rapid evolution of autonomous vehicle technologies has amplified the need for crash detection that operates robustly under complex traffic conditions with minimal latency. We propose a hybrid temporal hierarchy that augments a Region-based Convolutional Neural Network (R-CNN) with an adaptive time-variant Kalman filter (with a total-variation prior), a Hidden Markov Model (HMM) for state stabilization, and a lightweight Artificial Neural Network (ANN) for learned temporal refinement, enabling real-time crash detection from monocular video. Evaluated on simulated traffic in CARLA and real-world driving in Istanbul, the full temporal stack achieves the best precision-recall balance, yielding 83.47% F1 offline and 82.57% in real time (corresponding to 94.5% and 91.2% detection accuracy, respectively). Ablations are consistent and interpretable: removing the HMM reduces F1 by 1.85-2.16 percentage points (pp), whereas removing the ANN has a larger impact of 2.94-4.58 pp, indicating that the ANN provides the largest marginal gains, especially under real-time constraints. The transition from offline to real time incurs a modest overall loss (-0.90 pp F1), driven more by recall than precision. Compared to strong single-frame baselines, YOLOv10 attains 82.16% F1 and a real-time Transformer detector reaches 82.41% F1, while our full temporal stack remains slightly ahead in real time and offers a more favorable precision-recall trade-off. Notably, integrating the ANN into the HMM-based pipeline improves accuracy by 2.2%, while the time-variant Kalman configuration reduces detection lag by approximately 0.5 s, an improvement that directly addresses the human reaction time gap. Under identical conditions, the best R-CNN-based configuration yields AP@0.50 ≈ 0.79 with an end-to-end latency of 119 ± 21 ms per frame (≈8-9 FPS). Overall, coupling deep learning with probabilistic reasoning yields additive temporal benefits and advances deployable, camera-only crash detection that is cost-efficient and scalable for intelligent transportation systems. | |
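The adaptive time-variant Kalman stage named in the abstract can be illustrated with a minimal constant-velocity filter smoothing a detected object's centroid over frames. This is a simplified sketch under assumed noise parameters, not the paper's implementation (which adds time-variant gains and a total-variation prior):

```python
import numpy as np

def kalman_smooth_track(zs, dt=1.0, q=1e-2, r=1.0):
    """Smooth 1-D centroid observations with a constant-velocity Kalman filter.

    zs: per-frame position measurements (e.g. a bounding-box centre coordinate).
    q, r: assumed process and measurement noise scales (illustrative values).
    Returns the filtered position estimate for each frame.
    """
    F = np.array([[1.0, dt], [0.0, 1.0]])   # state transition: (position, velocity)
    H = np.array([[1.0, 0.0]])              # we observe position only
    Q = q * np.eye(2)                       # process noise covariance
    R = np.array([[r]])                     # measurement noise covariance
    x = np.array([[zs[0]], [0.0]])          # initial state: first measurement, zero velocity
    P = np.eye(2)                           # initial state covariance
    out = []
    for z in zs:
        # Predict step: propagate state and covariance forward one frame.
        x = F @ x
        P = F @ P @ F.T + Q
        # Update step: correct the prediction with the new measurement.
        y = np.array([[z]]) - H @ x         # innovation
        S = H @ P @ H.T + R                 # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)      # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        out.append(float(x[0, 0]))
    return out
```

On a steadily moving target with alternating measurement noise, the filtered track converges toward the true trajectory; downstream stages (the HMM and ANN in the paper's pipeline) would consume such stabilized tracks rather than raw per-frame detections.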
| dc.identifier.doi | 10.3390/app151910523 | |
| dc.identifier.issn | 2076-3417 | |
| dc.identifier.issue | 19 | |
| dc.identifier.uri | https://doi.org/10.3390/app151910523 | |
| dc.identifier.uri | https://hdl.handle.net/20.500.12662/10810 | |
| dc.identifier.volume | 15 | |
| dc.identifier.wos | WOS:001594651600001 | |
| dc.identifier.wosquality | Q2 | |
| dc.indekslendigikaynak | Web of Science | |
| dc.language.iso | en | |
| dc.publisher | MDPI | |
| dc.relation.ispartof | Applied Sciences-Basel | |
| dc.relation.publicationcategory | Article - International Peer-Reviewed Journal - Institutional Faculty Member | |
| dc.rights | info:eu-repo/semantics/openAccess | |
| dc.snmz | KA_WoS_20260128 | |
| dc.subject | crash detection | |
| dc.subject | real-time systems | |
| dc.subject | R-CNN | |
| dc.subject | Kalman filtering | |
| dc.subject | hidden Markov models | |
| dc.subject | artificial neural networks | |
| dc.subject | monocular vision | |
| dc.subject | intelligent transportation systems | |
| dc.title | A Novel Hybrid Deep Learning-Probabilistic Framework for Real-Time Crash Detection from Monocular Traffic Video | |
| dc.type | Article |