Abstract
This paper presents CricTAL, a novel framework for Temporal
Activity Localisation (TAL) in cricket batting that
leverages pose-estimated keypoints to segment and classify
the distinct phases of a batting stroke: buildup, execution,
and follow-through. Unlike traditional cricket video analysis
methods, which typically focus on action classification,
broadcast cues, and highlight detection using RGB footage,
CricTAL reframes the problem as a fine-grained, pose-based
temporal activity localisation task, enabling precise
localisation of stroke phases in untrimmed video.
This paper implements and evaluates LSTM, RNN, TCN,
and Transformer-based models under two modelling
strategies: a classification window approach and a
sliding window approach. The models are trained and
tested on our newly released uj-aqa-cricketvision dataset,
which contains annotated stroke-phase labels and pose keypoints
extracted from Test cricket matches. Trained exclusively
on pose data, our best-performing model, a TCN,
achieves a test accuracy of 90.4% and a mean Average Precision
(mAP)@0.5 of 64.45%, demonstrating its effectiveness
in identifying phase boundaries. By introducing a pose-only
TAL formulation to the cricket domain, CricTAL produces
temporally aligned, interpretable phase boundaries that
can directly support downstream Action Quality Assessment
(AQA), where phase-wise scoring demands temporally segmented
input.
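The abstract reports mAP@0.5 without spelling out how it is computed. In temporal activity localisation, the 0.5 conventionally denotes a temporal Intersection-over-Union (tIoU) threshold: a predicted phase segment counts as a true positive only if it overlaps an unmatched ground-truth segment of the same phase with tIoU of at least 0.5. The sketch below shows one common variant under stated assumptions: non-interpolated AP per phase class, greedy matching in descending score order, and segments given as hypothetical `(video_id, start, end[, score])` tuples. The function names and tuple layout are illustrative only, and the evaluation used for CricTAL may differ (for example, by using interpolated AP).

```python
from collections import defaultdict


def temporal_iou(a, b):
    """tIoU of two (start, end) intervals, in frames or seconds."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_thr=0.5):
    """AP for one phase class.

    preds: list of (video_id, start, end, score); gts: list of (video_id, start, end).
    Predictions are processed by descending score; each ground truth is matched at most once.
    """
    if not gts:
        return 0.0
    gt_by_video = defaultdict(list)
    for vid, s, e in gts:
        gt_by_video[vid].append((s, e))
    used = {vid: [False] * len(segs) for vid, segs in gt_by_video.items()}

    tp_flags = []
    for vid, s, e, _score in sorted(preds, key=lambda p: p[3], reverse=True):
        # Find the best-overlapping ground truth in the same video that is still unmatched.
        best_iou, best_j = 0.0, -1
        for j, g in enumerate(gt_by_video.get(vid, [])):
            if used[vid][j]:
                continue
            iou = temporal_iou((s, e), g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_thr:
            used[vid][best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    # Non-interpolated AP: sum of precision at each true positive, divided by #ground truths.
    ap, tp = 0.0, 0
    for rank, flag in enumerate(tp_flags, start=1):
        if flag:
            tp += 1
            ap += tp / rank
    return ap / len(gts)


def mean_ap(preds_by_class, gts_by_class, iou_thr=0.5):
    """Mean of per-class AP over all classes that have ground truth."""
    classes = list(gts_by_class)
    return sum(
        average_precision(preds_by_class.get(c, []), gts_by_class[c], iou_thr)
        for c in classes
    ) / len(classes)


# Toy example with the three stroke phases (hypothetical segments, in frames).
gts = {
    "buildup": [("v1", 0, 10)],
    "execution": [("v1", 10, 20)],
    "follow_through": [("v1", 20, 30)],
}
preds = {
    "buildup": [("v1", 0, 9, 0.9)],                             # tIoU 0.9 -> hit
    "execution": [("v1", 12, 20, 0.8), ("v1", 0, 5, 0.7)],      # tIoU 0.8 -> hit, then a miss
    "follow_through": [("v1", 25, 40, 0.6)],                    # tIoU 0.25 -> miss
}
print(round(mean_ap(preds, gts), 4))  # 0.6667
```

In this toy case, buildup and execution each score an AP of 1.0 and follow-through scores 0.0, so mAP@0.5 is 2/3. Raising `iou_thr` makes the metric stricter about exactly where each phase boundary falls, which is why it complements frame-level accuracy.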