optical flow

This week’s guest blogpost is from Rik Bouwmeester from the Micro Air Vehicle lab, Faculty of Aerospace Engineering at the Delft University of Technology.

Tiny quadcopters like the Crazyflie can be operated in narrow, cluttered environments and in proximity to humans, making them the perfect candidate for search-and-rescue operations, monitoring of crop in a greenhouse, or performing inspections where other flying robots cannot reach. All these applications benefit from autonomy, allowing deployment without proximity to a base station or human operator and permitting swarming behavior.

Achieving autonomous navigation on nano quadcopters is challenging given the highly constrained payload and computational power of the platform. Most attention has been given to monocular solutions; the camera is a lightweight and energy-efficient passive sensor that captures rich information of the environment. One of the most important monocular visual cues is optical flow, which has been exploited on MAVs with higher payload for obstacle avoidance [1], depth estimation [2] and several bio-inspired methods for autonomous navigation [3–7].

Optical flow describes the apparent visual variations caused by relative motion between an observer and their surroundings. This rich visual cue contains tangled information of velocity and depth. However, calculating optical flow is expensive. The field of optical flow estimation is and has been for a couple of years dominated by convolutional neutral networks (CNNs). Despite efforts to find architectures of reduced size and latency [8-10], these methods are still highly computationally expensive, running at several to tens of FPS on modern desktop GPUs and requiring millions of parameters to run, rendering them incompatible with edge hardware.

To this end, we present “NanoFlowNet: Real-Time Dense Optical Flow on a Nano Quadcopter”, submitted to an international robotics conference, which introduces NanoFlowNet, a CNN architecture designed for real-time, fully on-board, dense optical flow estimation on the AI-deck.

CNN architecture

We adopt semantic segmentation CNN STDC-Seg [11] and modify it for optical flow estimation. The resulting CNN architecture may be considered “real-time” on desktop hardware, for deployment on edge devices such as a nano quadcopter the net must be significantly shrunk. We improve the latency of the architecture in three ways.

First, we redesign the key convolutional modules of the architecture, the Short-Term Dense Concatenate (STDC) module. By reordering the operations within the strided variant of the module, we save, depending on the location of the module within the architecture, from over 10% to over 50% of the MAC operations per module, while increasing the number of output filters with large receptive field size. A large receptive field size is desirable for optical flow estimation.

Second, inspired by MobileNets [12], we globally replace ‘regular’ convolutions with depthwise separable convolutions. Depthwise separable convolutions factorize a convolution into a depthwise and pointwise convolution, effectively reducing the calculational expense at a cost in representational capacity.

Third, we reduce the input dimensionality. We train and infer network on grayscale input images, reducing the required on-board memory for storing images by a factor 2/3. Any memory saved on the AI-deck’s L2 memory can be handed to AutoTiler for storing the CNN architecture, speeding up the on-board execution. Requiring more of a speed-up, we run the CNN on-board at a reduced input resolution of 160×112 pixels. Besides the speed-up through saved L2, reducing the input resolution makes all operations throughout the network cheaper. We downscale training data to closely match the target resolution. Both these changes come at a loss of input information. We will miss out on small objects and small displacements that are not captured by the resolution.

To give some intuition of the available memory: Estimating optical flow requires two input images. Storing two color input images at full resolution requires (2 x 324x324x3=) 630 kB. The AI-deck has 512 kB of L2 memory available.

Motion boundary detail guidance

Inspired by STDC-Seg, we guide the training of optical flow with a train-time-only auxiliary task to promote the encoding of spatial information in the early layers. Specifically, we introduce a motion boundary prediction task to the net. The motion boundary ground truth can be found in the optical flow datasets. This improves performance by 0.5 EPE on the MPI Sintel clean (train) benchmark, at zero cost to inference latency.

Performance on MPI Sintel

Given the scaling and conversion to grayscale of input data, our network is not directly comparable with results reported by other works. For comparison, we retrain one of the fastest networks in literature, Flownet2-s [13], on the same data. Given the reduction in resolution, we drop the deepest two layers to maintain a reasonable feature size. We name the model Flownet2-xs.

We benchmark the performance of the architecture on the optical flow dataset MPI Sintel. NanoFlowNet performs better than FlowNet2-xs, despite using less than 10% of the parameters. NanoFlowNet achieves 5.57 FPS on the AI-deck. FlowNet2-xs does not fit on the AI-deck due to the network size. To put the achieved latency of NanoFlowNet in perspective, we execute FlowNet2-xs’ first two convolutions and the final prediction layer on the GAP8. The three-layer architecture achieves 4.96 FPS, which is slower than running the entire NanoFlowNet. On a laptop GPU, the two architectures accomplish similar latency.

MethodMPI Sintel (train) [EPE]Frame rate [FPS]Parameters
CleanFinalGPUGAP8
FlowNet2-xs9.0549.4581501,978,250
NanoFlowNet7.1227.9791415.57170,881
Performance on MPI Sintel (train subset). (Average) end-to-end Point Error (EPE) describes how far off the estimated flow vectors are on average, lower is better.

Obstacle avoidance implementation

We demonstrate the effectiveness of NanoFlowNet by implementing it in a simple, proof-of-concept obstacle avoidance application on an AI-deck equipped Crazyflie. We let the quadcopter fly forward at constant velocity and implement the horizontal balance strategy [14], [15], where the quadcopter balances the optical flow in the left and right half plane by yawing.

We equip a Crazyflie with the Flow deck for positioning only. The total flight platform weighs 34 grams.

We augment the balance strategy by implementing active oscillations (a cyclic up-down movement), resulting in additional optical flow generated across the field of view. This is particularly helpful for avoiding obstacles in the direction of horizontal travel, since no optical flow is generated at the focus of expansion.

The obstacle avoidance implementation is demonstrated in an open and a cluttered environment in ‘the Cyber Zoo’, an indoor flight arena at the faculty of Aerospace Engineering at the Delft University of Technology. The control algorithm is most robust in the open environment, with the quadcopter managing to drain a full battery without crashing. In the cluttered environment, performance is more variable. Especially on occasions where obstacles are close to one another, the quadcopter tends to avoid the first obstacle successfully, only to turn straight into the second and crash into it. Adding a head-on collision detection based on FOE detection and divergence estimation (e.g., [7]) should help avoid obstacles in these cases.

Successful run in a cluttered environment in the Cyber Zoo. The Crazyflie manages to avoid collision until the battery is drained.

All in all, we consider the result a successful demonstration of the optical flow CNN. In future work, we expect to see applications that take more advantage of the resolution of the flow information.

Citation

Bouwmeester, R. J., Paredes-Vallés, F., De Croon, G. C. H. E. (2022). NanoFlowNet: Real-time Dense Optical Flow on a Nano Quadcopter. arXiv. https://doi.org/10.48550/arXiv.2209.06918

References

[1] Gao, P., Zhang, D., Fang, Q., & Jin, S. (2017). Obstacle avoidance for micro quadrotor based on optical flow. Proceedings of the 29th Chinese Control and Decision Conference, CCDC 2017, 4033–4037. https://doi.org/10.1109/CCDC.2017.7979206

[2] Sanket, N. J., Singh, C. D., Ganguly, K., Fermuller, C., & Aloimonos, Y. (2018). GapFlyt: Active vision based minimalist structure-less gap detection for quadrotor flight. IEEE Robotics and Automation Letters, 3(4), 2799–2806. https://doi.org/10.1109/LRA.2018.2843445

[3] Conroy, J., Gremillion, G., Ranganathan, B., & Humbert, J. S. (2009). Implementation of wide-field integration of optic flow for autonomous quadrotor navigation. Autonomous Robots, 27(3), 189–198. https://doi.org/10.1007/s10514-009-9140-0

[4] Zingg, S., Scaramuzza, D., Weiss, S., & Siegwart, R. (2010). MAV navigation through indoor corridors using optical flow. Proceedings – IEEE International Conference on Robotics and Automation, 3361–3368. https://doi.org/10.1109/ROBOT.2010.5509777

[5] De Croon, G. C. H. E. (2016). Monocular distance estimation with optical flow maneuvers and efference copies: A stability-based strategy. Bioinspiration and Biomimetics, 11(1). https://doi.org/10.1088/1748-3190/11/1/016004

[6] Serres, J. R., & Ruffier, F. (2017). Optic flow-based collision-free strategies: From insects to robots. Arthropod Structure and Development, 46(5), 703–717. https://doi.org/10.1016/j.asd.2017.06.003

[7] De Croon, G. C. H. E., De Wagter, C., & Seidl, T. (2021). Enhancing optical-flow-based control by learning visual appearance cues for flying robots. Nature Machine Intelligence, 3(1), 33–41. https://doi.org/10.1038/s42256-020-00279-7

[8] Ranjan, A., & Black, M. J. (2017). Optical flow estimation using a spatial pyramid network. Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, 2720–2729. https://doi.org/10.1109/CVPR.2017.291

[9] Hui, T. W., Tang, X., & Loy, C. C. (2018). LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 8981–8989. https://doi.org/10.1109/CVPR.2018.00936

[10] Sun, D., Yang, X., Liu, M. Y., & Kautz, J. (2017). PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 8934–8943. https://doi.org/10.1109/CVPR.2018.00931

[11] Fan, M., Lai, S., Huang, J., Wei, X., Chai, Z., Luo, J., & Wei, X. (2021). Rethinking BiSeNet For Real-time Semantic Segmentation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 9711–9720. https://doi.org/10.1109/CVPR46437.2021.00959

[12] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. In arXiv. arXiv. http://arxiv.org/abs/1704.04861

[13] Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., & Brox, T. (2017). FlowNet 2.0: Evolution of optical flow estimation with deep networks. Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, 1647–1655. https://doi.org/10.1109/CVPR.2017.179

[14] Souhila, K., & Karim, A. (2007). Optical flow based robot obstacle avoidance. International Journal of Advanced Robotic Systems, 4(1), 2. https://doi.org/10.5772/5715

[15] Cho, G., Kim, J., & Oh, H. (2019). Vision-based obstacle avoidance strategies for MAVs using optical flows in 3-D textured environments. Sensors, 19(11), 2523. https://doi.org/10.3390/s19112523