The random ferns classifier is a variation of the random forest classifier, first introduced in [24] and developed further in [25, 26]. It has been applied to object recognition in [27], image classification in [28], and fast keypoint recognition in [16]. An advantage of the random ferns classifier is that it allows different cues, such as appearance and shape, to be combined effortlessly, as introduced above.
A common method for human detection is to use a sliding-window approach that searches for humans at all possible positions and scales. In this way, the detection problem is transformed into a classification problem: given a candidate patch in an image, the task is to assign it to the most likely class. Let c_i, i = 0, 1, be the set of classes (human or non-human), and let f_j, j = 1, 2, ⋯, N, be the set of binary features computed over the patch to be classified. Formally, we are looking for

ĉ = argmax_{c_i} P(C = c_i | f_1, f_2, ⋯, f_N),
where C refers to the class variable. Our goal is thus to model the posterior probability of the human class given the set of N features, which by Bayes' rule can be expressed as

P(C = c_i | f_1, f_2, ⋯, f_N) = P(f_1, f_2, ⋯, f_N | C = c_i) P(C = c_i) / P(f_1, f_2, ⋯, f_N).
Similarly, an equivalent expression may be written for the non-human class. Removing the term P(f_1, f_2, ⋯, f_N), which is common to all classes, and assuming uniform prior probabilities P(C), the problem reduces to finding

ĉ = argmax_{c_i} P(f_1, f_2, ⋯, f_N | C = c_i).
But learning the joint likelihood distribution over all features is intractable, since for N binary features it would require estimating on the order of 2^N parameters per class. Naive Bayes makes the simplifying assumption that the features are conditionally independent given the class label,

P(f_1, f_2, ⋯, f_N | C = c_i) = ∏_{j=1}^{N} P(f_j | C = c_i).
However, this independence assumption is usually false and tends to grossly underestimate the true posterior probabilities. To make the problem tractable while still accounting for some of these dependencies, a good compromise is to partition the features into M groups of size S = N/M. These groups are what we define as ferns, and we compute the joint probability of the features within each fern. The class-conditional probability then becomes

P(f_1, f_2, ⋯, f_N | C = c_i) ≈ ∏_{k=1}^{M} P(F_k | C = c_i),
where F_k = {f_σ(k,1), f_σ(k,2), ⋯, f_σ(k,S)}, k = 1, 2, ⋯, M, represents the k-th fern and σ(k, j) is a random permutation function. Hence, we follow a semi-naive Bayesian approach by modeling only some of the dependencies between features.
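To make the fern grouping concrete, the sketch below shows one common way to assign features to ferns via a random permutation and to encode each fern's S binary features as a single integer value in [0, 2^S). It is a minimal illustration rather than code from the original work; the NumPy-based helpers make_ferns and fern_values are our own names.

```python
import numpy as np

def make_ferns(num_features, fern_size, seed=0):
    """Randomly permute the feature indices and split them into ferns.

    Returns an (M, S) array whose k-th row holds the indices
    sigma(k, 1), ..., sigma(k, S) of the features assigned to fern k.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_features)        # random permutation sigma
    num_ferns = num_features // fern_size       # M = N / S
    return perm[:num_ferns * fern_size].reshape(num_ferns, fern_size)

def fern_values(binary_features, fern_indices):
    """Encode each fern's S binary features as one integer in [0, 2^S).

    binary_features : (N,) array of 0/1 responses for a single patch.
    fern_indices    : (M, S) index array produced by make_ferns().
    """
    bits = binary_features[fern_indices]        # (M, S) array of 0/1 values
    weights = 1 << np.arange(bits.shape[1])     # 1, 2, 4, ..., 2^(S-1)
    return bits @ weights                       # (M,) integer fern values
```

For example, with N = 300 binary features and S = 10, this would yield M = 30 ferns, each taking one of 2^10 = 1024 possible values.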
Furthermore, the class-conditional probabilities P(F_m | C = c_i) are estimated for each fern F_m and each class c_i during the training phase. For each fern F_m, these terms are written as

P(F_m = k | C = c_i) = N_{k, c_i} / N_{c_i},
where N_{k, c_i} is the number of training samples of class c_i whose fern F_m evaluates to value k, k = 1, 2, ⋯, 2^S, and N_{c_i} is the total number of samples of class c_i. However, when the number of samples is not infinitely large, the count N_{k, c_i}, and hence the estimated probability P(F_m = k | C = c_i), will be zero for some fern values k. To overcome this problem, the estimate is instead taken as

P(F_m = k | C = c_i) = (N_{k, c_i} + N_r) / (N_{c_i} + 2^S · N_r),
where N_r represents a regularization term, which behaves as a uniform Dirichlet prior over the fern values. In the following experiment, the parameter N_r is set to 1.
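As a minimal end-to-end sketch of the training and classification steps described above (again illustrative rather than the authors' implementation, and reusing the hypothetical fern_values() and make_ferns() helpers from the previous sketch), the per-class fern histograms can be accumulated with the Dirichlet regularizer N_r and a patch classified by summing log-probabilities over ferns:

```python
import numpy as np

class RandomFernsClassifier:
    """Semi-naive Bayes classifier over M ferns of S binary features each."""

    def __init__(self, fern_indices, num_classes=2, n_r=1.0):
        self.fern_indices = fern_indices          # (M, S) array from make_ferns()
        self.num_classes = num_classes
        self.n_r = n_r                            # Dirichlet regularizer N_r
        self.num_ferns, fern_size = fern_indices.shape
        self.num_values = 2 ** fern_size          # 2^S possible values per fern
        self.log_prob = None                      # log P(F_m = k | C = c_i)

    def fit(self, features, labels):
        """features: (num_samples, N) 0/1 array, labels: (num_samples,) class ids."""
        # Start every count at N_r, i.e. a uniform Dirichlet prior over fern values.
        counts = np.full((self.num_classes, self.num_ferns, self.num_values), self.n_r)
        for x, c in zip(features, labels):
            values = fern_values(x, self.fern_indices)            # (M,) fern values
            counts[c, np.arange(self.num_ferns), values] += 1.0   # N_{k, c_i} + N_r
        # Normalizing each fern's histogram gives (N_{k,c} + N_r) / (N_c + 2^S N_r).
        self.log_prob = np.log(counts / counts.sum(axis=2, keepdims=True))

    def predict(self, binary_features):
        """Return argmax_{c_i} of sum_m log P(F_m | C = c_i) for one patch."""
        values = fern_values(binary_features, self.fern_indices)
        scores = self.log_prob[:, np.arange(self.num_ferns), values].sum(axis=1)
        return int(np.argmax(scores))
```

In a sliding-window detector, predict() would be evaluated on the binary feature vector extracted from every candidate window position and scale, and windows assigned to the human class would be kept as detections.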