Recall Precision Curves


Thanks in advance for this data set.
Nice work from what I can tell so far.

But I’m a little confused about how you create the recall vs precision curves.
In your documentation it says:

Specifically, we match predictions with the ground truth objects that have the smallest center-distance up to a certain threshold. For a given match threshold we calculate average precision (AP) by integrating the recall vs precision curve for recalls and precisions > 0.1. We finally average over match thresholds of {0.5, 1, 2, 4} meters and compute the mean across classes.

The paper says something similar.

In my understanding you get one precision value and one recall value for one algorithm (with a fixed set of parameters) and one distance threshold during matching.
To get several values you would need a parameter you can tune.
Am I on the wrong track here, or which parameter do you tune?

Take a look at the precision-recall curves on the top left at: (see for more).
Each distance threshold corresponds to one curve, so let’s pick one and ignore the others here.
The parameter to tune in a precision-recall curve is related to the classification score. Let’s call it x. If x = 0.500, then all predictions with a score below that threshold are negatives (for a given class) and all above are positives. From those you compute precision and recall and plot them as one point on the curve. Then you move on to the next threshold x = 0.501, and so on. The result is the curve above. Average Precision is the integral of that curve (although we ignore the low-recall and low-precision regions to avoid outliers).
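To make the threshold sweep concrete, here is a minimal sketch in Python. The scores and match flags are made-up toy data, and the matching of predictions to ground truth (e.g. by center distance) is assumed to have already happened; `pr_curve` is a hypothetical helper name, not part of any devkit.

```python
def pr_curve(scores, matched, n_ground_truth, thresholds):
    """Return one (precision, recall) point per score threshold.

    scores:  confidence score of each prediction
    matched: True if that prediction was matched to a ground-truth object
    """
    points = []
    for t in thresholds:
        # Predictions at or above the threshold count as positives.
        tp = sum(1 for s, ok in zip(scores, matched) if s >= t and ok)
        fp = sum(1 for s, ok in zip(scores, matched) if s >= t and not ok)
        precision = tp / (tp + fp) if tp + fp > 0 else 1.0
        recall = tp / n_ground_truth
        points.append((precision, recall))
    return points

# Toy example: 5 predictions, 4 ground-truth objects.
scores = [0.9, 0.8, 0.7, 0.6, 0.3]
matched = [True, True, False, True, False]
curve = pr_curve(scores, matched, n_ground_truth=4, thresholds=[0.5, 0.75])
```

Each threshold yields one point; sweeping the threshold traces out the whole curve.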

Just to be on the safe side.

By classification score you mean some kind of confidence, right?
In other words, the detection or classification algorithm needs to return a value (from zero to one) expressing how certain it is of a predicted classification?

And you threshold this score to get a binary value (e.g. if an algorithm’s confidence that an instance is a car is 0.6, and your current threshold is 0.5, then you assume the algorithm says “this is a car”).

Thanks for your clarification.

Almost right. Except that the confidence score does not have to be a probability, and it does not even have to be between 0 and 1. The only thing that matters is the ordering of the predictions.

You mean the ordering of the predictions for one instance at one time step?
For example an algorithm has to say for an instance of a car, that it’s most certain that the instance is a car - fairly certain that it is a human - and very uncertain that it is a traffic sign?

And how do you get from that ordering to a classification score you can threshold?

You sort the predictions for all instances.
The classification score determines the ordering.
You pick all possible thresholds, either by moving along a fixed grid or by taking all the score values for which you have predictions.
Please refer to a tutorial on Average Precision for more information.
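The “take all thresholds that you have predictions for” variant can be sketched as follows: sort all predictions across all instances by score, read off one (recall, precision) point per true positive, and sum up the step-wise area. This is a simplified toy version; real benchmarks differ in details (interpolation, ignoring low recall/precision regions), and `average_precision` is a hypothetical name.

```python
def average_precision(scores, matched, n_ground_truth):
    """Step-wise AP with one threshold per prediction score."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap, prev_recall = 0, 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        if matched[i]:
            tp += 1
            recall = tp / n_ground_truth
            precision = tp / rank
            # Area of the step between the previous and current recall.
            ap += (recall - prev_recall) * precision
            prev_recall = recall
    return ap

# Toy example: 4 predictions, 3 ground-truth objects.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [True, False, True, True], 3)
```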