Paper Abstract
Unified evaluation for beach-safety vision
Rip currents are a serious and often under-addressed threat to beach safety and the leading cause of coastal drownings worldwide. They are difficult to detect because of their amorphous structure, their similarity to the background, and large variability in viewpoints and environments. Despite their danger and growing interest in automated detection, existing research remains fragmented across datasets, metrics, models, and approaches, which leaves model performance poorly understood.
We introduce RipBench, a unified benchmark for controlled evaluation of rip current detection. It covers multiple tasks (classification, axis-aligned and oriented object detection, and instance, semantic, and panoptic segmentation) on the same data with standardized splits, allowing direct comparison of model performance across all levels of visual abstraction.
Across 304 videos (303,491 frames) collected from diverse coastlines, our results expose a clear performance gap between coarse recognition and precise spatial understanding: while models achieve near-saturated classification performance, accurate localization is substantially more challenging, and performance varies across tasks.
All tasks are supported by carefully curated annotations and evaluated with both standard and safety-oriented metrics, with a focus on the F2 score, which emphasizes recall in this safety-critical setting. RipBench, along with multiple baseline models per task, will be released to support progress in real-world, multi-task vision for beach safety.
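To make the recall emphasis concrete, the F-beta score is F_beta = (1 + beta^2) * TP / ((1 + beta^2) * TP + beta^2 * FN + FP), and F2 is the case beta = 2, where a false negative (a missed rip current) costs four times as much as a false positive. The sketch below is illustrative only: the function name `f_beta` and the example counts are hypothetical, and it assumes TP/FP/FN have already been computed, since the matching rules RipBench uses for each task are not described here.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 2.0) -> float:
    """F-beta score from detection counts (hypothetical helper, not RipBench code).

    With beta=2 (the F2 score), each false negative is weighted beta^2 = 4 times
    as heavily as a false positive, so recall dominates the score.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    # No predictions and no ground truth at all: define the score as 0.0.
    if denom == 0:
        return 0.0
    return (1 + b2) * tp / denom


# Hypothetical counts for two models on the same split (illustration only).
cautious = f_beta(tp=80, fp=20, fn=20)  # precision 0.80, recall 0.80
eager = f_beta(tp=90, fp=60, fn=10)     # precision 0.60, recall 0.90
print(f"F2 cautious={cautious:.3f} eager={eager:.3f}")
```

With these made-up counts, F1 (beta = 1) prefers the cautious model (0.800 vs 0.720), while F2 prefers the eager one (0.818 vs 0.800), because F2 rewards its higher recall despite the extra false alarms.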
Benchmark Scope
Same data, multiple levels of visual abstraction
RipBench is designed to compare coarse recognition with precise spatial localization under realistic beach-monitoring conditions. Each task uses shared data splits so model behavior can be compared directly across abstraction levels.
Release Status
Materials withheld during review
Dataset downloads, code, pretrained baselines, evaluation scripts, leaderboards, and project links are not listed yet. This page will remain a minimal public placeholder until the review process permits the benchmark to be released.