Visual Tracking via Joint Sparse Representation

Given a bounding box defining the object of interest (the target) in the first frame of a video sequence, the goal of a tracker is to determine the object’s bounding box in subsequent frames. In contrast to specific trackers, where the object model is learned offline, general tracking is more challenging because the object is not known in advance and its model must be learned throughout the video sequence. The primary challenges in visual tracking are target appearance change and occlusion; other challenges arise from variations in illumination, scale, and camera motion.

Sparse representation has recently shown appealing results in various computer vision applications. In trackers based on sparse representation, a candidate is generally represented as a linear combination of a few elements (atoms) from a dictionary composed of previously found target images. The coefficients of this representation are used to find the best candidate. Besides handling illumination change and mild pose change, these trackers also attempt to handle occlusion.
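
As a rough illustration of this idea, the following Python sketch codes a candidate over a dictionary of templates with an l1 penalty and scores it by its reconstruction error, which is one common way to use the representation for ranking candidates. It is not taken from the project code: the solver (ISTA), the names `sparse_code` and `reconstruction_error`, the penalty `lam`, and the random toy dictionary are all assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def sparse_code(D, y, lam=0.01, n_iter=500):
    """ISTA for min_x 0.5 * ||y - D x||_2^2 + lam * ||x||_1 (illustrative solver)."""
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ x - y)
        x = soft_threshold(x - step * grad, step * lam)
    return x

def reconstruction_error(D, y, lam=0.01):
    """Squared residual of y after sparse coding over D."""
    x = sparse_code(D, y, lam)
    return float(np.sum((y - D @ x) ** 2))

# Toy data (hypothetical): 10 unit-norm atoms standing in for past target images.
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 10))
D /= np.linalg.norm(D, axis=0)

good = 0.7 * D[:, 2] + 0.3 * D[:, 5]  # candidate built from two atoms
bad = rng.standard_normal(64)         # unrelated candidate
bad /= np.linalg.norm(bad)

x_good = sparse_code(D, good)
err_good = reconstruction_error(D, good)
err_bad = reconstruction_error(D, bad)
print("error of matching candidate: ", err_good)
print("error of unrelated candidate:", err_bad)
```

A candidate that lies close to the span of a few atoms gets a much smaller error than an unrelated one, and its largest coefficient falls on the atom that contributes most to it.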

Figure: Illustration of joint sparse representation.

This paper presents a robust tracking approach that handles challenges such as occlusion and appearance change. The target is partitioned into a number of patches, and the appearance of each patch is modeled using a dictionary composed of the corresponding target patches in previous frames. In each frame, the target is found among a set of candidates generated by a particle filter, via a likelihood measure that is shown to be proportional to the sum of patch-reconstruction errors of each candidate. Since the target’s appearance often changes slowly in a video sequence, it is assumed that the target in the current frame and the best candidates of a small number of previous frames belong to a common subspace. This assumption is imposed using joint sparse representation, which forces the target and the previous best candidates to share a common sparsity pattern. Moreover, an occlusion detection scheme is proposed that uses patch-reconstruction errors and a prior probability of occlusion, extracted from an adaptive Markov chain, to calculate the probability of occlusion per patch. In each frame, occluded patches are excluded when updating the dictionary. Extensive experimental results on several challenging sequences show that the proposed method outperforms state-of-the-art trackers.
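
The common-sparsity-pattern constraint can be sketched with a row-sparse (l2,1) penalty, a common choice for joint sparse coding, where the columns of `Y` hold the current candidate and the previous best candidates. This is a minimal sketch under stated assumptions, not the authors' implementation: the proximal-gradient solver, the names `row_soft_threshold` and `joint_sparse_code`, the value of `lam`, and the toy orthonormal dictionary are hypothetical choices made for illustration.

```python
import numpy as np

def row_soft_threshold(X, t):
    """Proximal operator of t * ||X||_{2,1}: shrink the l2 norm of each row by t."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    scale = np.maximum(1.0 - t / np.maximum(norms, 1e-12), 0.0)
    return X * scale

def joint_sparse_code(D, Y, lam=0.05, n_iter=200):
    """Proximal gradient for min_X 0.5 * ||Y - D X||_F^2 + lam * ||X||_{2,1}.

    Each row of X is either zero for all columns or active for all columns,
    so every column of Y is represented with the same atoms.
    """
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    X = np.zeros((D.shape[1], Y.shape[1]))
    for _ in range(n_iter):
        grad = D.T @ (D @ X - Y)
        X = row_soft_threshold(X - step * grad, step * lam)
    return X

# Toy data (hypothetical): 8 orthonormal atoms, chosen so the result is easy to inspect.
rng = np.random.default_rng(0)
D, _ = np.linalg.qr(rng.standard_normal((32, 8)))

# Three observations (current candidate plus two previous best candidates)
# that use the same two atoms with slowly varying weights.
W = np.zeros((8, 3))
W[1] = [0.8, 0.7, 0.9]
W[4] = [0.4, 0.5, 0.3]
Y = D @ W

X = joint_sparse_code(D, Y)
active_rows = np.flatnonzero(np.linalg.norm(X, axis=1) > 0).tolist()
print("atoms shared by all columns:", active_rows)
```

Because the penalty acts on whole rows of the coefficient matrix, an atom is selected or discarded for all frames at once, which is how the slowly changing appearance assumption enters the optimization.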


Patchwise Joint Sparse Tracking with Occlusion Detection by Ali Zarezade, Hamid R. Rabiee, and A. Soltani-Farani, IEEE Transactions on Image Processing (TIP), vol. 23, issue 10, 2014. arXiv

Collaborating Frames: Temporally Weighted Sparse Representation for Visual Tracking by Ali Soltani-Farani, Hamid R. Rabiee, and Ali Zarezade, IEEE International Conference on Image Processing (ICIP), IEEE, 2014. PAPER

The Visual Object Tracking VOT2013 Challenge Results by Matej Kristan and Ali Zarezade, IEEE International Conference on Computer Vision Workshops, 2013. PAPER

Project code

The MATLAB source code for the trackers proposed in the above publications, along with several state-of-the-art visual tracking methods, is available in the git repository. Please cite the above works if you use this software, and contact the first author in case of any problems.