Author: Arnaud

I am back from parental leave, during which I tried not to think too much about Crazyflie-related things in order to get a little break. However, while geeking around, I eventually ended up back at the Crazyflie and Crazyradio, designing a new channel-hopping communication protocol. This will likely be the subject of a future blog post, but in the meantime I thought I would write a bit about how the current Crazyflie radio communication works.

Protocol layers

Like many protocols, the Crazyflie communication protocol is layered. This makes it possible to plug different elements in at each level. When it comes to a Crazyflie client talking to a Crazyflie over the radio, the layering looks as follows:

Services are high-level functionalities, like log, which makes it possible to get the values of Crazyflie variables at regular intervals. At this level there is essentially an API, with commands like addLogBlock.

CRTP is a protocol that encapsulates the commands for each service. It multiplexes packets on the link using port numbers, very much like TCP/UDP ports on a network: each service listens and sends packets on a pre-specified port.
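
As an illustration, each CRTP packet starts with a one-byte header that encodes the port (4 bits) and the channel (2 bits). Here is a minimal sketch of packing and unpacking such a header (the two bits in between are reserved link bits, ignored in this sketch):

```python
# Minimal sketch of the one-byte CRTP header: 4 bits of port and
# 2 bits of channel (the 2 bits in between are reserved, ignored here).

def pack_header(port: int, channel: int) -> int:
    assert 0 <= port <= 15 and 0 <= channel <= 3
    return (port << 4) | channel

def unpack_header(header: int):
    return (header >> 4) & 0x0F, header & 0x03

header = pack_header(5, 0)    # port 5 is used by the log service
print(unpack_header(header))  # -> (5, 0)
```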

Finally, the radio link only deals with raw packets. The role of the radio link is to deliver packets from the PC to the Crazyflie and vice-versa. At this level we have many link implementations; the most used are the radio and USB links, but there is also a serial link that uses the Crazyflie serial port.

Radio link

The radio link is currently implemented by the Crazyradio (PA) on the PC side and by the Crazyflie on the other side. The Crazyradio uses a Nordic Semiconductor nRF24LU1 USB/radio chip, and the Crazyflie an nRF51822 MCU/radio. This is important since, while the nRF51 has a quite flexible radio, the nRF24 uses a standalone SPI radio that has most of the packet handling hard-coded to a protocol that Nordic calls Enhanced ShockBurst (ESB).

The ESB protocol handles sending packets and receiving acknowledgements automatically. A packet is sent on a radio channel, using a 5-byte address, and when this packet is received on the other side an acknowledgement is sent back using the same address on the same channel. Both the original packet and the acknowledgement can contain a payload.

To implement a bi-directional radio link, the Crazyradio is the one sending packets, while the Crazyflie continuously listens and, when it receives a packet, sends back an ack. We use the payload capability of both the packets we send and of the acks to implement an uplink (PC->Crazyflie) and a downlink (Crazyflie->PC) data link.

Of course, one problem with this setup is that, while the PC can decide to send a packet at any moment, the Crazyflie needs to wait for the PC to send a packet to have a chance to send one back in an ack. To make sure the Crazyflie has enough opportunities to send packets back, we send packets to the Crazyflie at regular intervals even if there is no data to transmit. This polling makes a continuous downlink possible.

The most important thing to see here is that the radio link gives the upper layer, the CRTP layer, the illusion of a full-duplex link. On the radio side, this is implemented by regularly polling for downlink packets.
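
To make this concrete, here is a minimal sketch of such a polling loop using the low-level Crazyradio driver from the Crazyflie python lib. It assumes a Crazyflie listening on channel 80 at 2 Mbps with the default address; 0xFF is the CRTP null packet, sent purely to poll for downlink data.

```python
# Minimal polling sketch using the low-level Crazyradio driver from
# the Crazyflie python lib (assumes channel 80, 2 Mbps, default address).
from cflib.drivers.crazyradio import Crazyradio

radio = Crazyradio()
radio.set_channel(80)
radio.set_data_rate(radio.DR_2MPS)

for _ in range(100):
    # Send a CRTP "null" packet: we have nothing to say, we are only
    # giving the Crazyflie a chance to answer in the ack payload.
    res = radio.send_packet((0xFF,))
    if res.ack and res.data:
        print('Downlink packet:', list(res.data))

radio.close()
```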

Communicating with multiple Crazyflies

In order to communicate with multiple Crazyflies, we simply send packets to each Crazyflie one after the other. This way, we give each Crazyflie an equal chance to send packets back, and in doing so we divide the available bandwidth between them.

The main advantage of this polling protocol over a more traditional P2P protocol, where the Crazyflies would send whenever they want, is that with polling the Crazyradio is the master, so we can guarantee that we never introduce packet collisions when we communicate.
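
The same driver can sketch the round-robin scheme: the Crazyradio simply switches address between polls. The addresses below are hypothetical examples; each Crazyflie must be configured with its own address on the shared channel.

```python
# Round-robin polling of several Crazyflies with one Crazyradio.
# The addresses are hypothetical examples; each Crazyflie must be
# configured with a unique address on the same channel.
from cflib.drivers.crazyradio import Crazyradio

ADDRESSES = [
    (0xE7, 0xE7, 0xE7, 0xE7, 0x01),
    (0xE7, 0xE7, 0xE7, 0xE7, 0x02),
    (0xE7, 0xE7, 0xE7, 0xE7, 0x03),
]

radio = Crazyradio()
radio.set_channel(80)
radio.set_data_rate(radio.DR_2MPS)

for _ in range(100):
    for address in ADDRESSES:
        radio.set_address(address)
        res = radio.send_packet((0xFF,))  # poll this Crazyflie
        if res.ack and res.data:
            print(address[-1], '->', list(res.data))

radio.close()
```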

Limitations and future

One major limitation of the current protocol is that it communicates on a single channel and requires the user to manually set the channel and address for each Crazyflie. This means that the user has to tinker with parameters to find a good channel and has to manually handle all the addressing.

Another limitation is that the polling is driven over USB, from Python or, in the case of ROS/Crazyswarm, from C++. This adds the USB latency to the equation and complicates the client implementation.

I have been working on defining a new protocol that could be implemented efficiently in a Crazyradio and that would handle addressing and channel hopping. The idea is to get closer to a connection style like Bluetooth Low Energy, where you do not have to care about channels and addresses: you just connect your device. Unlike BLE though, this protocol will be optimized for low latency. Stay tuned, we will likely talk more about it in future posts :-).

We are happy to announce that we have gotten a Crazyflie 2 to fly autonomously using the Lighthouse deck and Lighthouse V2 base-stations. This was a much-requested feature, and while it is not stable and ready to use yet, it is a great milestone toward Lighthouse V2 support.

There exist two incompatible versions of the Lighthouse positioning system. Version 1 was released with the original HTC Vive VR system. In this system, the base-stations use two rotating laser beams that sweep the room, one horizontal and one vertical, plus an omnidirectional synchronization flash, to allow IR light receivers to be located in the room. One limitation of this version is that no more than two base-stations can be used. This is mainly because beam identification uses a TDMA scheme: the base-stations switch on their lasers in dedicated time-slots, one after the other, and adding more time-slots for more base-stations would greatly reduce the update rate of the system.

Lighthouse V2 was released with the HTC Vive Pro headset and is also used by the Valve Index. The big changes are that the laser sweeps now carry modulated data, and that there is only one rotor with two angled slits instead of the two rotors of V1. The V2 sweep data is described as ‘sync on beam’ and contains timing information about how long it has been since the synchronization event (i.e. when the rotor crossed 0 degrees). The sweep data also makes it possible to identify the base-station that transmitted the sweep. This removes the need for an omnidirectional synchronization pulse and allows more than two base-stations to operate at the same time in the same space, since their sweeps can now be identified and timed.

The Lighthouse V2 system is very elegant and scalable. However, actually decoding the signal from the sweeps has taken a lot of time, since it is not documented and we needed to find out what the encoding actually was. There have been efforts on the internet to understand how the system works; the most useful one is this github ticket, which goes from raw data acquisition to fully unlocking the beam encoding.

I have been working on-and-off for a long time on an FPGA design for the lighthouse deck to acquire and decode Lighthouse V2 signals. The main blocking point until now was that I had not been able to reliably acquire a useful signal from the system to allow real-time decoding on the Crazyflie. Added to that, there were some inconsistencies between what we thought the system was doing and what we could gather from the base-stations’ debug console. Recently though, the last piece of the puzzle fell into place: the beam encoding is not Manchester, as we thought, but Bi-phase mark code (BMC, also known as FM1). Once this decoding was used, everything made sense and worked.
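
For reference: BMC guarantees a signal transition at every bit boundary, and a ‘1’ adds an extra transition in the middle of the bit while a ‘0’ does not. A decoder can therefore classify the time between transitions as half or full bit periods. Here is a minimal, idealized sketch of that idea (the real FPGA decoder also has to handle noise, clock tolerance and framing, which this ignores):

```python
# Idealized BMC (bi-phase mark) decoder sketch: classify the gaps
# between signal transitions as full bit periods ('0') or pairs of
# half periods ('1'). bit_period is in the same unit as the gaps.

def bmc_decode(gaps, bit_period):
    bits = []
    half = False  # True if we have seen the first half of a '1'
    for gap in gaps:
        if gap > 0.75 * bit_period:   # full period: a '0'
            bits.append(0)
        else:                         # half period: half of a '1'
            if half:
                bits.append(1)
            half = not half
    return bits

# Gaps for the bit stream 1,0,1,1 with a bit period of 2:
print(bmc_decode([1, 1, 2, 1, 1, 1, 1], 2))  # -> [1, 0, 1, 1]
```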

Added to that, I started using SpinalHDL instead of raw Verilog to write the FPGA design, which allows for much quicker iteration and much less frustration. It also allowed me to easily make the design multi-clock, which is required to decode the BMC signal: the beam decoder runs at 48 MHz, while the rest of the system runs at 24 MHz. This split is required since the FPGA we use in the lighthouse deck is not fast enough to run everything at 48 MHz.

The result is a new FPGA firmware for the lighthouse deck that receives, identifies and decodes Lighthouse V2 sweep signals and sends them over to the Crazyflie. The Crazyflie still has a little pulse packing to do (putting together pulses from a single sweep received on multiple sensors) and can then use the pulse timing information to calculate the azimuth and elevation at which the base-station sees the Crazyflie. This is the same information as we get from Lighthouse V1, so the same algorithm can be used to calculate the Crazyflie position.
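
To sketch that last calculation (with idealized geometry as an assumption, not the exact firmware code): if the single rotor turns with a known period and the two slits are tilted by ±30°, the two sweep timestamps give two rotor angles, whose mean gives the azimuth and whose difference encodes the elevation.

```python
# Idealized sketch of converting Lighthouse V2 sweep times to angles.
# Assumes slits tilted by +/-30 degrees and a perfect rotor; the real
# firmware also applies per-base-station calibration data.
import math

def rotor_angle(t_pulse, t_zero, rotor_period):
    """Rotor angle (rad) at which the beam hit the sensor."""
    return 2 * math.pi * (t_pulse - t_zero) / rotor_period

def azimuth_elevation(a1, a2, tilt=math.radians(30)):
    """From the two slit hit angles a1, a2 (rad) to azimuth/elevation."""
    azimuth = (a1 + a2) / 2
    beta = (a2 - a1) / 2          # half the angle between the two hits
    elevation = math.atan2(math.sin(beta), math.tan(tilt))
    return azimuth, elevation
```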

I hacked a proof of concept together this last fun Friday and it flies!

If anyone is curious, the code for this demo has been uploaded as an out-of-tree driver, and the code for the FPGA part is already in the lighthouse-fpga project. The current Crazyflie code is too incomplete to be usable, but it is a nice starting point if anyone wants to play with Lighthouse V2 and the Crazyflie right away ;-).

As a side note, the Bitcraze team will shrink temporarily as I, Arnaud, go on parental leave until mid-August. I look forward to this new adventure, and I trust that the Lighthouse V2 development and the forum will be in good hands in my absence.

We are currently finishing the production-test design for a couple of expansion decks, and we realized we had never written about it, or about the more general board production process. In this blog post we want to talk a bit about how we test boards in the production phase, taking the forthcoming Active Marker deck as an example.

The active marker deck

When finalizing an electronic board, we send the manufacturer documentation that allows them to manufacture and assemble a, hopefully, functional board. Although we assume that the individual components are in working order, the assembly is not always perfect, so we need to check that everything we have done is actually working. This is the problem the production test solves.

The first thing is to find out what to test, and for that we need a strategy. The strategy we have been using is to test everything we have modified or worked on: for example, we test all the connections we have soldered in the manufacturing process. We normally do not test all the functionality of ready-made modules. Following this strategy, we will usually test all the communication interfaces we have cabled, but not all the functionality of a microcontroller we solder onto the board; the latter is deemed already tested and working by the microcontroller manufacturer. This step usually ends up with an annotated schematic:

Annotated schematic of the ActiveMarker deck

Once we know what to test and roughly how to test it, we specify a test rig that will be able to run the tests automatically. Some tests are generic and applicable to all our boards; for example, we test voltages with a multimeter on every board that has a regulator. Some tests are very board-specific. For the Active Marker deck, for example, we want to test the IR LEDs and the IR detector, so we define a test rig with a reflector that reflects the light from the LEDs back to the detector, and we use the onboard detector to test the LEDs:

Simple block diagram of the test rig for the ActiveMarker Deck

We normally use a Crazyflie in all our test rigs, since it is usually possible to test all the functionality from the deck port. We also try, as much as possible, to integrate the test software into the real software. For the Active Marker deck, this meant adding a 38 kHz modulated output mode to the LEDs in order to emit a signal detectable by the detector, a mode that will make it into the final firmware. Finally, we have test software, running on the test computer, that uses the Crazyflie python lib to talk to the Crazyflie and run the tests. The last step of the test is to write the deck’s One Wire identification memory so that it can be detected by a Crazyflie.
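
To give an idea of what such test software can look like, below is a hedged sketch of one test step built on the Crazyflie python lib. The deck-detection parameter name used here is an assumption for illustration, not necessarily the one from our real test suite.

```python
# Hedged sketch of a production-test step using the Crazyflie python
# lib. The parameter name 'deck.bcActiveMarker' is an assumed example
# of a deck-detection flag, not necessarily the real one.
import cflib.crtp
from cflib.crazyflie.syncCrazyflie import SyncCrazyflie

URI = 'radio://0/80/2M'

def test_deck_detected(scf):
    # Deck detection parameters read '1' when the deck is found.
    value = scf.cf.param.get_value('deck.bcActiveMarker')
    return int(value) == 1

cflib.crtp.init_drivers()
with SyncCrazyflie(URI) as scf:
    print('Deck detection:', 'PASS' if test_deck_detected(scf) else 'FAIL')
```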

Screenshot of the test program for the test engineer

From these specifications, the manufacturer can then build a test rig and start testing boards; boards that do not pass are re-worked until they pass, or discarded.

Test rig for the Multi-ranger expansion deck

What we have learned over our years at Bitcraze is that the testing phase is the most important part of the development process of a PCB. The earlier we start thinking about the production tests during board design, the smoother the final phase of production of our new products will be.

As we talked about in a previous post, it is more than time to implement a higher abstraction layer in the Crazyflie firmware, to make it easy to implement custom automation and programs on top of the flying platform. In this post we will try to explain the state of the art and where we are thinking of heading. This is mostly a request for comments, and we are creating a github ticket to discuss it.

DELFT – Swarm of drones at TU Delft. – PHOTO: GUUS SCHOONEWILLE

The out-of-tree build and P2P API presented in the previous post are a great start: they make it possible to build projects on top of the Crazyflie firmware that can easily be maintained over time, and to communicate directly between Crazyflies without having a PC in the loop. However, we have not yet completely solved or documented the API that can be called by programs written on top of the Crazyflie; this is what the app layer is supposed to provide.

The current plan for the app layer is to make the same functionality that is available in the Python crazyflie-lib API accessible from within the Crazyflie firmware, using similar API calls. This way, we get the possibility of prototyping functionality in python code on a remote machine and, when it is working, easily converting it to an app onboard. This is already implemented, in part, for the log and param APIs as well as for the low-level parts of the commander. It has enabled us to write programs like the multiranger push demo and SGBA from Kimberly’s paper. The API is not yet documented properly, and the function calls do not yet look like the ones on the python lib side, but our intention is to converge the APIs over time.
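
As a reference point, here is roughly what reading log values looks like on the python lib side (the URI and variable names are only illustrative); the intention is that an onboard app can eventually do the same thing with similar calls in C:

```python
# Sketch of log reading with the Crazyflie python lib; the app layer
# aims to offer equivalent calls in C onboard. The URI and variable
# names here are illustrative.
import cflib.crtp
from cflib.crazyflie.log import LogConfig
from cflib.crazyflie.syncCrazyflie import SyncCrazyflie
from cflib.crazyflie.syncLogger import SyncLogger

cflib.crtp.init_drivers()

log_conf = LogConfig(name='Position', period_in_ms=100)
log_conf.add_variable('stateEstimate.x', 'float')
log_conf.add_variable('stateEstimate.y', 'float')

with SyncCrazyflie('radio://0/80/2M') as scf:
    with SyncLogger(scf, log_conf) as logger:
        for _, data, _ in logger:
            print(data)
            break  # one sample is enough for this sketch
```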

We think that having the same level of functionality for log, param and commander within a Crazyflie app as in the python API will already make it possible to implement a lot of onboard programs much more easily than before. If there is anything else you think would be interesting to develop in this field, do not hesitate to drop a comment on this post or in the github issue.

Over time, the scope of the Crazyflie has changed a lot. At first the Crazyflie was “just flying”: the only possible control was attitude (roll, pitch, yaw) and thrust setpoints sent from the radio. Soon after, autonomous flight was investigated, first by implementing a position controller outside the Crazyflie and then, over time, by moving position control onboard and sending position or trajectory setpoints to the Crazyflie. Now that the Crazyflie has good position control, the next step is to implement autonomous behavior, and until now the most practical way has been to do it from code running on an external computer. Similarly to what happened with position control (first off-board, now onboard), it needs to be possible to implement autonomous behavior in the Crazyflie itself. This blog post is about two newly implemented capabilities that make it possible to implement automation in the Crazyflie firmware in an easy and maintainable way, namely the ‘App layer’ and P2P communication.

App layer

The “App layer” is a term we have been using internally at Bitcraze to describe a set of functionalities that make it possible to implement custom code in the Crazyflie. This includes the infrastructure to compile and maintain external code running in the Crazyflie, as well as a set of APIs to control flight and behavior from C code rather than over radio communication.

Last week we implemented the first step of the App layer: the infrastructure part. It is now possible to build the Crazyflie firmware out-of-tree. This means that a project can point to the Crazyflie firmware folder and compile a firmware from the project folder, without touching the Crazyflie firmware folder itself. In practice, it makes it possible to create a git repo implementing custom firmware code, with the Crazyflie firmware as a sub-repo. This makes maintaining custom firmware code much easier than maintaining a branch of the Crazyflie firmware, as was previously required.

A second piece that has been implemented is the app entry-point. It makes it possible to start running code by simply creating an “appMain()” function, which is called from a dedicated FreeRTOS task after the Crazyflie has initialized and started. This should make it much easier to get started.

As an example, we have extracted the multiranger push demo into a standalone git repo. It demonstrates the implementation of autonomous behavior using this new infrastructure.

Peer to Peer communication

The Crazyflie has been used for a lot of research related to swarming; some examples are the Crazyswarm project and the work done at Carnegie Mellon University. However, it is now time to turn it up a notch. On the forum and on the Github repository there have been several requests for direct peer-to-peer communication between Crazyflies. We have now finally found time to work on it and to implement some basic functionality on the NRF and STM sides of the firmware.

Currently, it is possible to send and receive a P2P packet in broadcast mode directly from the STM (see how to do this in the documentation). This enables data to be sent from one Crazyflie to another with a maximum payload of 60 bytes. We were able to stress-test this with our test rig by sending broadcast messages in a round-robin fashion, where a broadcast message was transferred through 10 Crazyflies in 10-20 ms. Even though the current implementation is still very minimal, we were able to fix some existing issues in the radiolink framework.

We will not stop there, as we are hoping to implement a communication system similar to how the CRTP protocol has been implemented. We are getting a lot of help from our active community members, so check out this github issue to stay up-to-date with the current discussion.

Working at Bitcraze means designing electronics as well as writing software for it, but there are also a lot of other things going on, and managing servers is one of them.

Originally, in 2011, we had a virtual server in the cloud (a VPS, as it was called back then). We carefully set up and configured this server and maintained it over the years. This was easy and served us well, but it was also very ‘manual’: all updates and changes were done directly on the production machine.

In 2015, when Kristoffer joined Bitcraze, he completely revamped the server setup. Suddenly we had not only one server but two: one production and one staging. And all the services running on these machines were running in docker containers. This meant that the servers were nothing more than well-configured machines running docker; all the important software and configuration live in docker images and are launched using docker-compose. This is the architecture we are still using today; a great description was written by Kristoffer in a previous blog post.

The docker architecture has worked well, but we still have one major problem: we still have servers to take care of. They are still configured manually, and they need to be kept up to date and happy at all times. We have been wanting to get rid of these servers and “just” run our containers.

Now enter Kubernetes. Kubernetes is a container orchestrator, originally made by Google, that abstracts away the servers and makes it possible to run containers without handling the servers yourself. We had heard of Kubernetes back when we started working with Docker but deemed it overkill: we did not want to set up and handle a cluster of servers ourselves, and back then it was not clear how long Kubernetes would be around. Over time though, Kubernetes has gained support from all the major cloud providers, and there are managed offerings that allow us to not handle any servers ourselves.

We are not there yet, but we have been working over the summer on converting our infrastructure to Kubernetes, and we are very close to deployment. One of the main parts is our internal Jenkins build server, which currently builds and deploys services by ssh-ing into servers. With Kubernetes, this deployment phase will be much simpler and will allow us to update the website without downtime, a welcome functionality now that the documentation has moved from the wiki to the main website.

With lightweight Kubernetes distributions like K3S, there is also an opportunity to run containers in more domains. For instance, we have been talking for a while about making an automated hardware test-bench to test every commit against real hardware. With K3S and a couple of Raspberry Pis, that could be achieved quite easily. This is a subject for future fun Fridays though… :-).

Two weeks ago we posted about the demo we did for our new office move-in party. There have been multiple requests to share the script, but unfortunately it is a hacked old script that would not be useful as an example. So, last week, we made an example that can run a synchronized swarm sequence.

The example has been pushed to the examples folder of the Crazyflie-lib-python project. It is called synchronizedSequence.py. Running this example unmodified with 3 Crazyflies in a positioning system will give you this result. (Like the previous demo, this was done in a lighthouse system.)

One of the key design choices in the example is that it is based on a single control loop that can be synchronized with an outside system: in this example there is a simple sleep of one second between each step of the sequence, but it could, for example, be changed into a midi clock receiver to synchronize the sequence with music.
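
To illustrate the idea, here is a simplified sketch of such a loop (not the actual synchronizedSequence.py code) using position setpoints from the Crazyflie python lib; the coordinates are made up:

```python
# Simplified sketch of a step-synchronized sequence loop (not the
# actual synchronizedSequence.py code). Each tick of the clock moves
# every Crazyflie to its next setpoint; swap time.sleep() for e.g. a
# midi clock receiver to synchronize the sequence with music.
import time

# One entry per step: an (x, y, z) setpoint for each Crazyflie.
SEQUENCE = [
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.5, 1.0), (1.0, 0.5, 1.0)],
    [(0.0, 0.0, 0.2), (1.0, 0.0, 0.2)],
]

def run_sequence(crazyflies):
    for setpoints in SEQUENCE:
        # Low-level setpoints must be streamed continuously, so each
        # step re-sends its setpoints at 10 Hz for one second.
        for _ in range(10):
            for cf, (x, y, z) in zip(crazyflies, setpoints):
                cf.commander.send_position_setpoint(x, y, z, 0.0)
            time.sleep(0.1)  # the synchronization point of the loop
```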

The example was developed with the help of Victor, a student we have hired to help out during the summer. He then played around a little bit to make a more impressive 9-Crazyflie sequence:

I uploaded Victor’s sequence as a github gist, as it can be good for inspiration. One word of warning though: as-is, the sequence contains some vertical movements that are quite aggressive, and the part where the Crazyflies fly directly on top of each other should be considered more of a stress test.

We have recently moved to a new, bigger office. With the summer arriving in Sweden, it was time to organize a small move-in after-work party with friends and family. For the occasion, we wanted to play around with a small swarm of Crazyflies and the new Lighthouse positioning. Time being a scarce resource, we set up the ICRA 2019 demo in the flight lab so that we would be able to fly during the party. We also started looking at the old swarm show that we ran with the LPS a year ago, to see if we could run it with the Lighthouse:

The show was essentially a sequence of setpoints sent from a python script, controlling 9 Crazyflie 2.1s equipped with a Lighthouse deck on top and a LED-ring deck on the bottom, synchronized to music. We set up the Crazyflies in the Lighthouse positioning system and converted the script to use the high-level commander GOTO setpoints. We look forward to trying more advanced control, like trajectories, to make more impressive synchronized flight choreographies in the future, but for now it already looks quite good even with only GOTOs:

While running our ICRA demo, we came across a bug in the Crazyflie python-lib radio handling that limits the number of Crazyflies that can be controlled using one Crazyradio PA. Communication with many Crazyflies is crucial, as flying swarms is becoming an increasingly interesting topic in research and education. So we decided to take the problem in hand and create a radio test-bench:

To make the test-bench, we attached 10 Roadrunner boards to a plank of wood, together with USB switches that can provide enough power to the Roadrunners. We used the Roadrunner because it is mechanically easier to use in this context, and its architecture is identical to the Crazyflie 2.1 when it comes to the radio implementation.

Initially, we will use the test-bench to run test scripts that push the communication to its limits and that consistently exercise the communication stack’s functionality. This should allow us to find bugs and verify that we solve them, as well as to discover and document limitations.

Eventually, we want to connect a Raspberry Pi to the test-bench and run tests for each commit and pull-request to the crazyflie-firmware, crazyflie2-nrf-firmware and crazyflie-lib-python projects. This will guarantee that we do not introduce new limitations into the communication stack. The test-bench will also be very useful when implementing new functionality, like direct Crazyflie-to-Crazyflie P2P communication.

As a final note, the Crazyswarm project is not affected by the Crazyflie-lib bug, since it uses the C++-implemented crazyflie-ros driver. Crazyswarm can hence control more Crazyflies per Crazyradio PA, so it is still the preferred way to fly a swarm, especially when using a motion capture system. Though, with the progress made on LPS and Lighthouse positioning, using the python API directly to run swarms is probably a more lightweight alternative.

Three of us were at ICRA 2019 in Montreal last week, where we met a lot of interesting people and a lot of Crazyflie users. Thanks a lot to everyone that dropped by our booth, and for those who missed it, we are planning on being at IROS 2019 later this year, so we might see you there :-).

We already described our demo in a previous post; now that we have run it, we can report on how it went. We are also updating the ICRA 2019 page with the latest source code and information so that anyone interested can reproduce the demo.

In its final state at the conference, the demo contained 8 Crazyflie 2.1s equipped with the Lighthouse deck and the Qi charger deck. There were 8 3D-printed charging pads on the floor, with Ikea Qi wireless chargers, and two HTC Vive base stations (V1) on tripods. The full system was contained in a cage built from 50 cm-long aluminium tubes and nets.

The full setup of the booth took us about 4 hours: about 3 hours for the cage, 15 minutes for the demo itself, including calibration of the lighthouse base-station geometry, and the rest for fine-tuning. This is by far our best setup time. We still need to prettify the cage a bit and make it easier to install, but we will most likely re-use this system for upcoming conferences.

In this demo we aimed at keeping a Crazyflie in the air at every moment. To do so, we had a computer connected to all 8 Crazyflies, sending one of them the signal to start flying whenever no other Crazyflie was in the air flying a trajectory. The flight was completely autonomous, as we explained in our previous blog post. We set up the Crazyflies to fly 2 cycles and then land, which increased the rate of swaps and so increased the ‘action’, though it also meant that during a swap two Crazyflies were flying. This drained the batteries a bit more than expected, and after about an hour all the Crazyflies were below the take-off threshold, so we had to wait ~30 seconds between flights. Here is a video of it in action:

The demo was very care-free: we had very few crashes, and we mostly restarted the Crazyflies manually to swap batteries and add a bit of power to the swarm. On the last day, we decided to spice it up a little by adding a chair to the cage; by calibrating the chair position and the flight trajectory, we managed to have the Crazyflie partly fly under it. This worked quite well most of the time and showed that the lighthouse positioning is repeatable and copes fairly well with short occlusions in the path. We also found out that even though a single Crazyflie always flies the same trajectory, two different Crazyflies will not: we think differences in propeller stiffness, and the fact that our Mellinger position controller has not been tuned for changing yaw, are the main reasons.

If you want to know more about the demo, or if you want to reproduce it, do not hesitate to visit the ICRA 2019 page, which explains it in more detail and links to the source code of everything, including the 3D-printed parts for the cage and the landing pads.