Author: Rik

Hi everyone,

I’m Rik, and today marks my return to Bitcraze. Some of you might remember me from a couple of years back when I spent a few months here as an intern. Perhaps you’ve even stumbled upon my guest blog post discussing the paper that concluded my master’s degree. I’m thrilled to be back in the fold!

Picture of Rik smiling against the Bitcraze logo

It’s exciting to be back in Sweden. I love to cook, and although traditional Swedish and Dutch cuisine share quite some similarities (a lot of potatoes and gravy), it’s nice to try out new ingredients and foods. Additionally, being close to the great outdoors is a major draw for me. I already went on my first hike and I heard there are some nice bouldering areas not so far from Malmö.

One of the things that initially attracted me to Bitcraze are its close ties with the research community. In fact, it was through my own research as a master’s student that I first encountered the company. I’m eager to deepen these connections further and collaborate with researchers across various disciplines. Combining this with a team of talented colleagues and a vibrant enthusiast community makes it a great opportunity to learn!

I’m particularly drawn to Bitcraze’s unique organizational ethos. The emphasis on self-organization and collective success resonates with me. The idea of continuously shaping our workplace to reflect our values truly excites me.

See you around!

This week’s guest blogpost is from Rik Bouwmeester from the Micro Air Vehicle lab, Faculty of Aerospace Engineering at the Delft University of Technology.

Tiny quadcopters like the Crazyflie can be operated in narrow, cluttered environments and in proximity to humans, making them the perfect candidate for search-and-rescue operations, monitoring of crop in a greenhouse, or performing inspections where other flying robots cannot reach. All these applications benefit from autonomy, allowing deployment without proximity to a base station or human operator and permitting swarming behavior.

Achieving autonomous navigation on nano quadcopters is challenging given the highly constrained payload and computational power of the platform. Most attention has been given to monocular solutions; the camera is a lightweight and energy-efficient passive sensor that captures rich information of the environment. One of the most important monocular visual cues is optical flow, which has been exploited on MAVs with higher payload for obstacle avoidance [1], depth estimation [2] and several bio-inspired methods for autonomous navigation [3–7].

Optical flow describes the apparent visual variations caused by relative motion between an observer and their surroundings. This rich visual cue contains tangled information of velocity and depth. However, calculating optical flow is expensive. The field of optical flow estimation is and has been for a couple of years dominated by convolutional neutral networks (CNNs). Despite efforts to find architectures of reduced size and latency [8-10], these methods are still highly computationally expensive, running at several to tens of FPS on modern desktop GPUs and requiring millions of parameters to run, rendering them incompatible with edge hardware.

To this end, we present “NanoFlowNet: Real-Time Dense Optical Flow on a Nano Quadcopter”, submitted to an international robotics conference, which introduces NanoFlowNet, a CNN architecture designed for real-time, fully on-board, dense optical flow estimation on the AI-deck.

CNN architecture

We adopt semantic segmentation CNN STDC-Seg [11] and modify it for optical flow estimation. The resulting CNN architecture may be considered “real-time” on desktop hardware, for deployment on edge devices such as a nano quadcopter the net must be significantly shrunk. We improve the latency of the architecture in three ways.

First, we redesign the key convolutional modules of the architecture, the Short-Term Dense Concatenate (STDC) module. By reordering the operations within the strided variant of the module, we save, depending on the location of the module within the architecture, from over 10% to over 50% of the MAC operations per module, while increasing the number of output filters with large receptive field size. A large receptive field size is desirable for optical flow estimation.

Second, inspired by MobileNets [12], we globally replace ‘regular’ convolutions with depthwise separable convolutions. Depthwise separable convolutions factorize a convolution into a depthwise and pointwise convolution, effectively reducing the calculational expense at a cost in representational capacity.

Third, we reduce the input dimensionality. We train and infer network on grayscale input images, reducing the required on-board memory for storing images by a factor 2/3. Any memory saved on the AI-deck’s L2 memory can be handed to AutoTiler for storing the CNN architecture, speeding up the on-board execution. Requiring more of a speed-up, we run the CNN on-board at a reduced input resolution of 160×112 pixels. Besides the speed-up through saved L2, reducing the input resolution makes all operations throughout the network cheaper. We downscale training data to closely match the target resolution. Both these changes come at a loss of input information. We will miss out on small objects and small displacements that are not captured by the resolution.

To give some intuition of the available memory: Estimating optical flow requires two input images. Storing two color input images at full resolution requires (2 x 324x324x3=) 630 kB. The AI-deck has 512 kB of L2 memory available.

Motion boundary detail guidance

Inspired by STDC-Seg, we guide the training of optical flow with a train-time-only auxiliary task to promote the encoding of spatial information in the early layers. Specifically, we introduce a motion boundary prediction task to the net. The motion boundary ground truth can be found in the optical flow datasets. This improves performance by 0.5 EPE on the MPI Sintel clean (train) benchmark, at zero cost to inference latency.

Performance on MPI Sintel

Given the scaling and conversion to grayscale of input data, our network is not directly comparable with results reported by other works. For comparison, we retrain one of the fastest networks in literature, Flownet2-s [13], on the same data. Given the reduction in resolution, we drop the deepest two layers to maintain a reasonable feature size. We name the model Flownet2-xs.

We benchmark the performance of the architecture on the optical flow dataset MPI Sintel. NanoFlowNet performs better than FlowNet2-xs, despite using less than 10% of the parameters. NanoFlowNet achieves 5.57 FPS on the AI-deck. FlowNet2-xs does not fit on the AI-deck due to the network size. To put the achieved latency of NanoFlowNet in perspective, we execute FlowNet2-xs’ first two convolutions and the final prediction layer on the GAP8. The three-layer architecture achieves 4.96 FPS, which is slower than running the entire NanoFlowNet. On a laptop GPU, the two architectures accomplish similar latency.

MethodMPI Sintel (train) [EPE]Frame rate [FPS]Parameters
CleanFinalGPUGAP8
FlowNet2-xs9.0549.4581501,978,250
NanoFlowNet7.1227.9791415.57170,881
Performance on MPI Sintel (train subset). (Average) end-to-end Point Error (EPE) describes how far off the estimated flow vectors are on average, lower is better.

Obstacle avoidance implementation

We demonstrate the effectiveness of NanoFlowNet by implementing it in a simple, proof-of-concept obstacle avoidance application on an AI-deck equipped Crazyflie. We let the quadcopter fly forward at constant velocity and implement the horizontal balance strategy [14], [15], where the quadcopter balances the optical flow in the left and right half plane by yawing.

We equip a Crazyflie with the Flow deck for positioning only. The total flight platform weighs 34 grams.

We augment the balance strategy by implementing active oscillations (a cyclic up-down movement), resulting in additional optical flow generated across the field of view. This is particularly helpful for avoiding obstacles in the direction of horizontal travel, since no optical flow is generated at the focus of expansion.

The obstacle avoidance implementation is demonstrated in an open and a cluttered environment in ‘the Cyber Zoo’, an indoor flight arena at the faculty of Aerospace Engineering at the Delft University of Technology. The control algorithm is most robust in the open environment, with the quadcopter managing to drain a full battery without crashing. In the cluttered environment, performance is more variable. Especially on occasions where obstacles are close to one another, the quadcopter tends to avoid the first obstacle successfully, only to turn straight into the second and crash into it. Adding a head-on collision detection based on FOE detection and divergence estimation (e.g., [7]) should help avoid obstacles in these cases.

Successful run in a cluttered environment in the Cyber Zoo. The Crazyflie manages to avoid collision until the battery is drained.

All in all, we consider the result a successful demonstration of the optical flow CNN. In future work, we expect to see applications that take more advantage of the resolution of the flow information.

Citation

Bouwmeester, R. J., Paredes-Vallés, F., De Croon, G. C. H. E. (2022). NanoFlowNet: Real-time Dense Optical Flow on a Nano Quadcopter. arXiv. https://doi.org/10.48550/arXiv.2209.06918

References

[1] Gao, P., Zhang, D., Fang, Q., & Jin, S. (2017). Obstacle avoidance for micro quadrotor based on optical flow. Proceedings of the 29th Chinese Control and Decision Conference, CCDC 2017, 4033–4037. https://doi.org/10.1109/CCDC.2017.7979206

[2] Sanket, N. J., Singh, C. D., Ganguly, K., Fermuller, C., & Aloimonos, Y. (2018). GapFlyt: Active vision based minimalist structure-less gap detection for quadrotor flight. IEEE Robotics and Automation Letters, 3(4), 2799–2806. https://doi.org/10.1109/LRA.2018.2843445

[3] Conroy, J., Gremillion, G., Ranganathan, B., & Humbert, J. S. (2009). Implementation of wide-field integration of optic flow for autonomous quadrotor navigation. Autonomous Robots, 27(3), 189–198. https://doi.org/10.1007/s10514-009-9140-0

[4] Zingg, S., Scaramuzza, D., Weiss, S., & Siegwart, R. (2010). MAV navigation through indoor corridors using optical flow. Proceedings – IEEE International Conference on Robotics and Automation, 3361–3368. https://doi.org/10.1109/ROBOT.2010.5509777

[5] De Croon, G. C. H. E. (2016). Monocular distance estimation with optical flow maneuvers and efference copies: A stability-based strategy. Bioinspiration and Biomimetics, 11(1). https://doi.org/10.1088/1748-3190/11/1/016004

[6] Serres, J. R., & Ruffier, F. (2017). Optic flow-based collision-free strategies: From insects to robots. Arthropod Structure and Development, 46(5), 703–717. https://doi.org/10.1016/j.asd.2017.06.003

[7] De Croon, G. C. H. E., De Wagter, C., & Seidl, T. (2021). Enhancing optical-flow-based control by learning visual appearance cues for flying robots. Nature Machine Intelligence, 3(1), 33–41. https://doi.org/10.1038/s42256-020-00279-7

[8] Ranjan, A., & Black, M. J. (2017). Optical flow estimation using a spatial pyramid network. Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, 2720–2729. https://doi.org/10.1109/CVPR.2017.291

[9] Hui, T. W., Tang, X., & Loy, C. C. (2018). LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 8981–8989. https://doi.org/10.1109/CVPR.2018.00936

[10] Sun, D., Yang, X., Liu, M. Y., & Kautz, J. (2017). PWC-Net: CNNs for Optical Flow Using Pyramid, Warping, and Cost Volume. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 8934–8943. https://doi.org/10.1109/CVPR.2018.00931

[11] Fan, M., Lai, S., Huang, J., Wei, X., Chai, Z., Luo, J., & Wei, X. (2021). Rethinking BiSeNet For Real-time Semantic Segmentation. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 9711–9720. https://doi.org/10.1109/CVPR46437.2021.00959

[12] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., Andreetto, M., & Adam, H. (2017). MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. In arXiv. arXiv. http://arxiv.org/abs/1704.04861

[13] Ilg, E., Mayer, N., Saikia, T., Keuper, M., Dosovitskiy, A., & Brox, T. (2017). FlowNet 2.0: Evolution of optical flow estimation with deep networks. Proceedings – 30th IEEE Conference on Computer Vision and Pattern Recognition, 1647–1655. https://doi.org/10.1109/CVPR.2017.179

[14] Souhila, K., & Karim, A. (2007). Optical flow based robot obstacle avoidance. International Journal of Advanced Robotic Systems, 4(1), 2. https://doi.org/10.5772/5715

[15] Cho, G., Kim, J., & Oh, H. (2019). Vision-based obstacle avoidance strategies for MAVs using optical flows in 3-D textured environments. Sensors, 19(11), 2523. https://doi.org/10.3390/s19112523

Hello everyone, my name is Rik and I’m Bitcraze’s new intern. I took up this internship as part of my MSc studies at the faculty of Aerospace Engineering at the Delft University of Technology. Over the past year I have worked with the AI-deck as part of my thesis work at the MAVLab, and I’m looking to contribute to Bitcraze with my practical knowledge of the platform.

For my MSc thesis I have primarily worked on the design of very small optical flow CNNs, capable of running on the GAP8 SoC, using a neural architecture search (NAS). I have implemented a pipeline on the AI-deck, feeding a stream of camera images into the CNN to determine optical flow on-board. One of the last remaining goals of my thesis work is to design an application which utilizes the resulting dense optical flow. In the meantime, the NAS is ever running to find the best possible architecture.

With my practical knowledge I hope to contribute to making the AI-deck an easier platform to work with. Of course, working at Bitcraze is a great learning opportunity, too. I’m already learning a lot about embedded systems and programming. After several years of studying, it’s great to get my first relevant working experience. And maybe most important of all, so far it has been a lot of fun working at Bitcraze and I expect to have a lot more of it. And yes, the falafel is really good.