When the Crazyflie was created the intended use case was manual flight with one drone. Over the years we have added support for positioning, swarms, autonomous flight and all sorts of nice features, and it has all been built on top of the original code base. Some of the original code is actually untouched after 7-8 years and needless to say, there is a slight worry that we might have taken design decisions back then that will come back and bite us in new use cases. This blog post will outline some of the work we have been doing to handle this problem by setting up an autonomous system where a Crazyflie is continuously flying – the infinite flight project.
The original design essentially assumed one Crazyflie controlled by one computer with one Crazyradio. On the computer, the user ran the Python client and controlled the flight manually with a gamepad. The user might have restarted the Crazyflie before each flight, or at least when changing the battery.
Now fast forward to the current situation where a swarm of Crazyflies might be controlled by multiple radios, each connecting to multiple Crazyflies. The Crazyflies are flying autonomously, perhaps getting their current position from the Lighthouse system, or via radio based on information from a mocap system. Maybe telemetry data is sent back to the ground while commands for trajectories to the Crazyflies go in the other direction. In some systems the Crazyflies use wireless charging to be able to run continuously.
Obviously the current situation is very different from the original design, with new or changed requirements. One is the extended use of the radio, which we have covered in previous blog posts and will not discuss here. This blog post is instead about another important topic: long term stability.
In the original design, the Crazyflie was restarted often, maybe before each flight. This means that the code never ran for very long, so what happens if we use wireless charging and keep the firmware running for days? Will there be a problem? We decided to find out by starting an internal project we called “Infinite flight”. The idea was to set up a system with a Crazyflie with a Qi-charger for wireless charging and a Lighthouse deck for positioning. An app in the Crazyflie takes off, flies a trajectory and lands for recharging when the battery is out. The cycle is then repeated as many times as possible. By doing this, we hoped to find software problems in the firmware that only show up after some time, hardware that wears out over time, and any other issues. Spoiler alert: we have not reached infinity yet, but we have got a bit closer :-)
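The charge, fly, land cycle described above can be pictured as a small state machine. The following is only a sketch of the idea, not the actual app: the state names, the `next_state` function and both voltage thresholds are hypothetical and chosen purely for illustration.

```python
from enum import Enum, auto


class State(Enum):
    CHARGING = auto()
    FLYING = auto()
    LANDING = auto()


# Hypothetical thresholds, chosen only for illustration.
CHARGED_VOLTAGE = 4.1   # volts: consider the battery full at or above this
LOW_VOLTAGE = 3.2       # volts: head back to the pad at or below this


def next_state(state, battery_voltage, on_pad):
    """Return the state that follows `state` for the given readings."""
    if state is State.CHARGING and battery_voltage >= CHARGED_VOLTAGE:
        return State.FLYING      # take off and start the trajectory
    if state is State.FLYING and battery_voltage <= LOW_VOLTAGE:
        return State.LANDING     # battery is out, go back to the pad
    if state is State.LANDING and on_pad:
        return State.CHARGING    # touched down, start recharging
    return state                 # otherwise stay in the current state
```

Calling `next_state` once per control loop iteration with fresh readings keeps the cycle going until something unexpected happens, which is exactly what the test is meant to provoke.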
The setup is fairly straightforward. The firmware is based on the app used in the demo we showed at IROS and ICRA, with some modifications. A ground station computer collects data; it continuously tries to maintain a connection and re-establishes it if it is lost. We log as much as possible to be able to analyse problems and understand what happened. We also added some tools to make it easier to visualize and dig into all the log data. The usual workflow has been to
- Start the Crazyflie and run the app
- Wait for something to go wrong (sometimes days)
- Analyze what happened and figure out if something needs to be changed
- Update and start over again
A surprising number of runs failed fairly quickly, after only a few flights. The reason has usually been some sort of handling error or a problem with the test software, but some failures were caused by bugs or are of more general interest.
We had a problem where the logs from the Crazyflie stopped arriving at the ground computer without any apparent reason. It turned out to be related to the session-less nature of the CRTP protocol: there is no good way to determine whether a session is alive, other than using a timeout. There is a timeout in the Crazyflie firmware (that was not fresh in our memories) that stops logging if no packets are received for a while. The rationale is to avoid having old logs running if a client is disconnected. In our case we lost communication for a short period of time and the firmware simply stopped the logging. The Python lib, on the other hand, had a longer timeout and did not consider the connection lost. The solution we used is to set up the logs again if we don’t receive log data for a while (we fixed an issue in the Python lib related to this).
In the future com stack we plan to have proper session handling which should remove this problem.
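The workaround for the stopped logs, setting them up again when no log data has arrived for a while, can be sketched as a small watchdog on the ground station. This is a minimal illustration under assumptions, not the code we run: `LogWatchdog`, its methods and the `restart` callback are hypothetical names, and the callback stands in for whatever code re-creates the log configurations.

```python
import time


class LogWatchdog:
    """Calls `restart` when no log data has arrived for `timeout_s` seconds."""

    def __init__(self, timeout_s, restart, clock=time.monotonic):
        self._timeout_s = timeout_s
        self._restart = restart      # hypothetical callback that sets up the logs again
        self._clock = clock          # injectable so the class can be tested
        self._last_data = clock()
        self.restart_count = 0

    def on_log_data(self):
        """Call this from the log data callback."""
        self._last_data = self._clock()

    def poll(self):
        """Call periodically; returns True if the logs were restarted."""
        now = self._clock()
        if now - self._last_data < self._timeout_s:
            return False
        self._restart()
        self.restart_count += 1
        self._last_data = now        # give the new log session a full timeout
        return True
```

The timeout should be chosen shorter than the time you can afford to be without data, but long enough that a brief radio dropout that the firmware tolerates does not trigger an unnecessary restart.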
We have been using various flavors of Crazyflies in the tests, including some prototypes. We had some issues with one prototype that we did not understand: it had a hard time hitting the landing pad when landing. It turned out that the STM used in the prototype had been reused from an old Crazyflie and had some weird PID controller settings stored as persistent parameters. With the persistent parameters cleared, it worked as expected.
We have played a bit with tuning the controller, but the default settings are fairly OK and we used them most of the time.
Kalman estimator rate warning
The Kalman estimator issued warnings that the rate was too high (!) after around 9 hours. This turned out to be a bug caused by using a float for the time calculation, which leads to rounding errors once the system has been running for more than 0x2000000 ticks (about 9.3 hours at one tick per millisecond).
Some of the prototypes we used had glitches or irregularities; it is very hard to hand solder PCBs. These problems are specific to an individual piece of hardware and can cause unexpected behavior that takes time to figure out.
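To see why a float breaks down at this point, here is a small illustration of single-precision rounding of tick values. It is not the firmware code: `to_float32` is a helper introduced here to mimic how a C `float` would store the value, and the tick values are chosen only to show the effect.

```python
import struct


def to_float32(value):
    """Round `value` to the nearest single-precision float, like a C `float`."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


THRESHOLD_TICKS = 0x2000000                          # 2**25 ticks, one tick per ms
hours_to_threshold = THRESHOLD_TICKS / 1000 / 3600   # about 9.3 hours

# Below 2**24 every tick count is stored exactly.
exact = to_float32(2**24 - 1)                        # 16777215.0

# From 2**25 upwards neighbouring floats are 4 ticks apart.
t0 = THRESHOLD_TICKS + 4
t1 = THRESHOLD_TICKS + 5
elapsed_int = t1 - t0                                # 1 ms, the real elapsed time
elapsed_f32 = to_float32(t1) - to_float32(t0)        # 0.0, both round to the same float

t2 = THRESHOLD_TICKS + 1
t3 = THRESHOLD_TICKS + 3
stretched_f32 = to_float32(t3) - to_float32(t2)      # 4.0 instead of the real 2 ms
```

Any interval or deadline computed from such rounded values can come out shorter or longer than intended, which is enough to upset a rate check. Keeping the time arithmetic in integer ticks avoids the problem.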
Landing pad edge
The landing/charging pad we use (also used in our demos) generally works fine. It has a “slope” towards the center which helps the Crazyflie slide to the correct position. If the Crazyflie misses the landing by too much though, it ends up on the edge with one leg on the ground, tilted away from the center. In this case it sometimes fails to take off properly and crashes. We solved this by adding a foam pad around the landing pad to “raise” the surrounding floor to the same level as the landing pad and thus reduce the angle.
There is a known bug in the Lighthouse deck that prevents it from receiving data at certain angles. We had some cases where our landing pad was located in a spot where we lost tracking of both the active base stations for some yaw angles. If the Crazyflie happened to land at exactly such a yaw angle, it lost its position while charging and did not know where it was when it was time to take off again. This was solved by moving the landing pad to a different position.
Not yet investigated possible problems
One known possible problem is the system tick counter in FreeRTOS. The counter is a 32-bit unsigned word that is incremented every millisecond. It is used in the firmware as the internal “clock”, for instance to determine how long ago the estimator was updated or when to execute some piece of code the next time. This counter will wrap after 2^32 ms, that is around 7 weeks, and we don’t really know what will happen.
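The arithmetic behind the 7 weeks, and the kind of comparison that can go wrong at the wrap, can be sketched as follows. This only simulates a 32-bit counter and is not taken from the firmware: the function names are hypothetical, and the masked subtraction is a common wrap-safe idiom, not necessarily what any given piece of firmware code does.

```python
TICK_MASK = 0xFFFFFFFF                 # the counter is a 32-bit unsigned word

wrap_ms = 2**32                        # ticks until the counter wraps, one per ms
wrap_days = wrap_ms / 1000 / 86400     # about 49.7 days, roughly 7 weeks


def elapsed_ticks(now, earlier):
    """Ticks from `earlier` to `now`, correct across a single wrap."""
    return (now - earlier) & TICK_MASK


def naive_is_due(now, deadline):
    """Direct comparison, breaks when `deadline` has wrapped but `now` has not."""
    return now >= deadline


def wrap_safe_is_due(now, start, period):
    """True once at least `period` ticks have passed since `start`."""
    return elapsed_ticks(now, start) >= period


start = 0xFFFFFFF0                     # 16 ticks before the wrap
deadline = (start + 100) & TICK_MASK   # wraps around to 84
now = start + 1                        # only 1 tick has passed

fires_early = naive_is_due(now, deadline)          # True, 99 ticks too soon
still_waiting = wrap_safe_is_due(now, start, 100)  # False, as intended
```

Code that stores an absolute deadline and compares it directly with the current tick is the pattern to look for when auditing for wrap problems; code that compares elapsed ticks against a period keeps working across a single wrap.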
Results and conclusions
The longest we managed to keep the system running was 5 days. We had a slow charging cycle and only flew 57 times. In this session, the flight time was very stable between 5:30 and 5:45 for all flights. This test was done with a standard Crazyflie 2.1 with a motor upgrade kit.
The second longest session was 3 days. In this case we used a brushless prototype and pushed the charging very hard. We ended up doing 276 flights, but the battery was pushed beyond its specs: it got too warm and was charged too fast, so it degraded over time and the flight time was down to only 2-3 minutes at the end.
We believe we have fixed most of the long term stability issues, but it is hard to know. There might be bugs lurking in the firmware that only show up under very special conditions. What we do know is that it is possible to fly for 5 days!