This page contains the experiments and some source code for the approach described in the work Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation.
This section shows the results on the three KITTI test sequences transformed in contrast and gamma. The three sets of parameters used for these transforms are given with the corresponding experiments below.
Darkened sequences simulate dusk conditions. Lowering the contrast makes it more difficult for feature extractors to find corners; still, at these contrast and gamma values crisp shadows remain recognizable. Compared to the standard sequences, the average error on the darkened sequences is 1.39% higher for PCNN, 6.53% higher for SVR-S and 4.99% higher for VISO2. Looking at the single trajectories, the performance drop is larger on sequences 09 and 10. We suppose this is due to their greater depth of field and higher linear speeds, which make these sequences more challenging.
| Seq | VISO2-M Trans [%] | VISO2-M Rot [deg/m] | SVR VO Sparse Trans [%] | SVR VO Sparse Rot [deg/m] | SVR VO Dense Trans [%] | SVR VO Dense Rot [deg/m] | PCNN Trans [%] | PCNN Rot [deg/m] |
|---|---|---|---|---|---|---|---|---|
| 08 | 26.33 | 0.0389 | 41.81 | 0.1114 | 18.06 | 0.0490 | 8.45 | 0.0249 |
| 09 | 13.64 | 0.0357 | 19.88 | 0.0669 | 34.30 | 0.0550 | 11.03 | 0.0338 |
| 10 | 22.74 | 0.0352 | 29.28 | 0.0670 | 25.49 | 0.0646 | 20.03 | 0.0458 |
| Avg | 23.54 | 0.0387 | 35.04 | 0.1005 | 20.34 | 0.0545 | 10.28 | 0.0300 |
Trajectories for baseline and proposed methods on sequences darkened with maximum contrast 0.4 and gamma value 1.5.
Average errors across length and speed on the darkened sequences.
Example of the darkened images for sequences 08, 09 and 10.
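As a rough sketch, the darkening above can be reproduced by a gamma curve followed by a compression of the intensity range. Only the parameter values (maximum contrast 0.4, gamma 1.5) come from the captions; the exact implementation and order of operations used to generate the sequences are assumptions.

```python
import numpy as np

def contrast_gamma(img, c_min=0.0, c_max=0.4, gamma=1.5):
    """Darken an image with intensities in [0, 1]: apply a gamma curve,
    then compress the result into the [c_min, c_max] contrast range.
    Defaults are the 'darkened' parameters (max contrast 0.4, gamma 1.5)."""
    out = np.clip(img, 0.0, 1.0) ** gamma   # gamma > 1 darkens midtones
    return c_min + (c_max - c_min) * out    # limit the output range

# A pure white pixel is mapped to the maximum allowed contrast value (0.4).
print(contrast_gamma(np.array([0.0, 0.5, 1.0])))
```

The night-like "darkened 2" and "lightened" settings below reuse the same two operations with different parameter values.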
Darkened 2 sequences simulate night conditions. At these levels of contrast and gamma the shadows in the images are very dark and many small details are lost. Clearly these transforms are only an approximation of real low-light vision, but they give an insight into how the estimation algorithms behave under a comparable loss of detail. On these sequences we see a stark difference between PCNN and the other methods. SVR-S has the lowest performance, probably because of its very simple Lucas-Kanade sparse feature extraction. Moreover, SVR-D and VISO2 have a translational error that is nearly double PCNN's, and a rotational error that is 30-40% higher.
| Seq | VISO2-M Trans [%] | VISO2-M Rot [deg/m] | SVR VO Sparse Trans [%] | SVR VO Sparse Rot [deg/m] | SVR VO Dense Trans [%] | SVR VO Dense Rot [deg/m] | PCNN Trans [%] | PCNN Rot [deg/m] |
|---|---|---|---|---|---|---|---|---|
| 08 | 37.82 | 0.0493 | 52.96 | 0.1479 | 30.18 | 0.0784 | 14.53 | 0.0366 |
| 09 | 30.18 | 0.0537 | 26.26 | 0.0842 | 23.66 | 0.0773 | 15.82 | 0.0458 |
| 10 | 25.97 | 0.1305 | 38.88 | 0.1024 | 24.36 | 0.0546 | 18.53 | 0.0464 |
| Avg | 35.28 | 0.0610 | 44.61 | 0.1340 | 28.10 | 0.0792 | 15.25 | 0.0413 |
Trajectories for baseline and proposed methods on sequences darkened with maximum contrast 0.6 and gamma value 5.
Average errors across length and speed on the darkened 2 sequences.
Example of the darkened images for sequences 08, 09 and 10.
Lightened sequences simulate high-light conditions through a low gamma correction value. These images also have very low contrast (min 0.2, max 0.7), so they are particularly challenging. The largest performance issues are for VISO2 on sequence 10, where it fails to extract enough features in many frames, so the error is huge. As in the preceding examples, PCNN behaves better than the SVR methods and VISO2.
| Seq | VISO2-M Trans [%] | VISO2-M Rot [deg/m] | SVR VO Sparse Trans [%] | SVR VO Sparse Rot [deg/m] | SVR VO Dense Trans [%] | SVR VO Dense Rot [deg/m] | PCNN Trans [%] | PCNN Rot [deg/m] |
|---|---|---|---|---|---|---|---|---|
| 08 | 32.24 | 0.0378 | 46.51 | 0.1072 | 19.90 | 0.0595 | 10.16 | 0.0294 |
| 09 | 18.71 | 0.0268 | 22.85 | 0.0752 | 24.36 | 0.0491 | 20.08 | 0.0391 |
| 10 | 91.36 | 0.0541 | 43.73 | 0.1294 | 22.47 | 0.0734 | 21.02 | 0.0460 |
| Avg | 36.83 | 0.0380 | 40.45 | 0.1059 | 21.31 | 0.0617 | 13.51 | 0.0343 |
Trajectories for baseline and proposed methods on sequences lightened with minimum contrast 0.2, maximum contrast 0.7 and gamma value 0.2.
Average errors across length and speed on the lightened sequences.
Example of the lightened images for sequences 08, 09 and 10.
This section shows the results on blurred versions of the three KITTI test sequences. The two blur radii used are 3 and 10 pixels.
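The blurring of the sequences can be sketched as a separable mean filter; note that the specific kernel shape used in the experiments (box vs. Gaussian) is an assumption here, only the radii of 3 and 10 pixels come from the text.

```python
import numpy as np

def box_blur(img, radius):
    """Separable box blur: a (2*radius+1)-wide mean filter applied along
    rows and then columns. Edges are zero-padded by np.convolve."""
    k = np.ones(2 * radius + 1) / (2 * radius + 1)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, rows)

# Both experiment settings reuse the same routine with different radii.
small = box_blur(np.random.rand(64, 64), 3)    # radius-3 sequences
large = box_blur(np.random.rand(64, 64), 10)   # radius-10 sequences
```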
These sequences are blurred with a small radius of 3 pixels. The most striking result is that this blur is slightly beneficial to PCNN's translational error, while it is not for the other methods. In detail, PCNN's error decreases by 0.33% with respect to the standard sequences, while VISO2 and SVR-S see increases of +4.99% and +8.07%. The average rotational errors change by +11%, +106% and +2.92% for PCNN, SVR-S and VISO2 respectively, showing that sparse SVR under-performs the most on these sequences.
| Seq | VISO2-M Trans [%] | VISO2-M Rot [deg/m] | SVR VO Sparse Trans [%] | SVR VO Sparse Rot [deg/m] | SVR VO Dense Trans [%] | SVR VO Dense Rot [deg/m] | PCNN Trans [%] | PCNN Rot [deg/m] |
|---|---|---|---|---|---|---|---|---|
| 08 | 23.70 | 0.0431 | 25.23 | 0.0674 | 14.41 | 0.0386 | 7.41 | 0.0229 |
| 09 | 10.99 | 0.0317 | 12.43 | 0.0438 | 21.99 | 0.0332 | 6.74 | 0.0253 |
| 10 | 25.83 | 0.0454 | 20.09 | 0.0463 | 26.74 | 0.0590 | 19.35 | 0.0380 |
| Avg | 23.54 | 0.0387 | 21.88 | 0.0623 | 17.52 | 0.0414 | 8.63 | 0.0262 |
Trajectories for baseline and proposed methods on sequences blurred with radius 3 pixels.
Average errors across length and speed on the radius 3 blurred sequences.
Example of the radius 3 blurred images for sequences 08, 09 and 10.
These sequences are blurred with a radius of 10 pixels. The results show only a slight increase in error for the dense methods, and again PCNN performs best, showing that the features it learns are robust to high levels of blur. However, SVR-S and SVR-D have very similar performance, and both are better than in the radius-3 case. This suggests that something in the SVR approach helps reduce the effects of strong blur. The errors of VISO2 on highly blurred images are more than doubled.
| Seq | VISO2-M Trans [%] | VISO2-M Rot [deg/m] | SVR VO Sparse Trans [%] | SVR VO Sparse Rot [deg/m] | SVR VO Dense Trans [%] | SVR VO Dense Rot [deg/m] | PCNN Trans [%] | PCNN Rot [deg/m] |
|---|---|---|---|---|---|---|---|---|
| 08 | 53.25 | 0.0694 | 20.00 | 0.0493 | 14.81 | 0.0395 | 7.41 | 0.0229 |
| 09 | 38.02 | 0.0593 | 13.87 | 0.0503 | 22.06 | 0.0372 | 11.80 | 0.0350 |
| 10 | 82.37 | 0.2021 | 19.15 | 0.0514 | 26.61 | 0.0621 | 19.87 | 0.0416 |
| Avg | 54.32 | 0.0856 | 18.66 | 0.0519 | 17.96 | 0.0433 | 9.82 | 0.0286 |
Trajectories for baseline and proposed methods on sequences blurred with radius 10 pixels.
Average errors across length and speed on the radius 10 blurred sequences.
Example of the radius 10 blurred images for sequences 08, 09 and 10.
The following links provide the tar.gz archives containing the code we used, the HDF5 datasets with the optical flow images (for the unmodified KITTI images), and the network weights. If you find the code useful for your research, please cite the two related works as:
@ARTICLE{Costante2016,
author={G. Costante and M. Mancini and P. Valigi and T. A. Ciarfuglia},
journal={IEEE Robotics and Automation Letters},
title={Exploring Representation Learning With CNNs for Frame-to-Frame Ego-Motion Estimation},
year={2016},
volume={1},
number={1},
pages={18-25},
doi={10.1109/LRA.2015.2505717},
month={Jan}
}
@ARTICLE{Ciarfuglia2014,
author = "Thomas A. Ciarfuglia and Gabriele Costante and Paolo Valigi and Elisa Ricci",
title = "Evaluation of non-geometric methods for visual odometry",
journal = "Robotics and Autonomous Systems",
volume = "62",
number = "12",
pages = "1717 - 1730",
year = "2014",
issn = "0921-8890",
doi = "10.1016/j.robot.2014.08.001",
}
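Reading the optical-flow HDF5 datasets can be done with h5py. Note that the dataset name (`flow`) and the layout below are illustrative assumptions, not the released files' actual schema; inspect the keys of the real archives first.

```python
import os
import tempfile

import h5py
import numpy as np

# Stand-in file: the real archives ship their own HDF5 datasets.
path = os.path.join(tempfile.mkdtemp(), 'flow_demo.h5')
with h5py.File(path, 'w') as f:
    # Hypothetical layout: N frame pairs, H x W flow fields, 2 channels (u, v).
    f.create_dataset('flow', data=np.zeros((4, 8, 8, 2), dtype=np.float32))

with h5py.File(path, 'r') as f:
    print(list(f.keys()))        # discover the dataset names first
    flow = f['flow'][:]          # load every frame pair into memory
```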