Who’s Afraid of Butterflies?

This website provides additional experimental results for the paper Who’s Afraid of Butterflies: A Close Examination of the Butterfly Attack. Our evaluation aims to demonstrate that (1) the Butterfly attack does not significantly increase the jitter already present in real systems, and (2) the jitter induced by such an attack has minimal impact on control jitter.

Evaluation on Ground Vehicle

Evaluation Setup

The experimental setup in the original Butterfly attack paper assumes an architecture in which multiple ECU functionalities (e.g., motor control and propulsion-control software) run on the same ECU. To the best of our knowledge, this is a futuristic setting: no existing automotive system has adopted such a design. Furthermore, the original paper does not describe its target software or hardware, so we are unable to reproduce the attack on the same hardware, software, and physical platform. To recreate the experiment in the same spirit as closely as possible, we conduct it on a ground vehicle running ArduRover.

Specifically, the vehicle runs ArduRover 4.0 control software, with all controls operating on a single System-on-Chip (SoC). ArduRover is part of the ArduPilot project and is designed specifically to control ground vehicles. The computing platform is Navio2, a board built on top of the Raspberry Pi 3B that adds sensors and interfaces for external peripherals such as radio control.

The core idea of the attack is to use a message to influence the timing of the entry task. In ArduRover, the analogous input is the radio control message, which is implemented as a MAVLink message. Our experiment therefore uses MAVLink messages as the attack vector for influencing the timing behavior of the entry task.

Evaluation Result

For better visualization, the lines in the figures below are downsampled by a factor of 50 and the markers by a factor of 200.

Task Release Intervals

The attack we implemented is as follows. The entry task is handleMessage(), a non-critical task responsible for processing MAVLink messages. To mimic the message pattern of a CAN bus, we use a ground station to send MAVLink messages to ArduRover periodically. Following the method of the original Butterfly paper, the adversary intentionally jams three out of every four messages to induce jitter in handleMessage(). Two properties of handleMessage() determine the degree of jitter: its period and its computation time. Since the workload of handleMessage() is fixed, we explore the impact at different message frequencies (10 Hz, 100 Hz, and 500 Hz). Note that most existing vehicles generate the heartbeat message at 10 Hz [autoware_git], so the settings under investigation, all at or above this rate, favor the attacker. The CPU time consumed by handleMessage() is 0.00125, 0.11%, and 0.48%, respectively. From the figure, we observe two things. First, attackers can directly manipulate the task release intervals, demonstrating that they can execute a predictable attack. Second, while CPU consumption remains low, it increases linearly with the message frequency. This linear relationship can be attributed to the simple, predictable workload of handling MAVLink messages.
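The effect of this jamming pattern on release intervals can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions, not the experiment's code: messages are sent with a perfectly fixed period, every message that survives jamming releases the entry task immediately, and the attacker lets through only messages whose index satisfies i % 4 == offset. The function names are hypothetical.

```python
# Minimal sketch (assumption): a ground station sends messages at a fixed rate,
# the adversary jams three out of every four, and each surviving message
# immediately releases the entry task (e.g., handleMessage()).


def surviving_arrivals(rate_hz, n_messages, keep_every=4, offset=0):
    """Return send times (ms) of the messages that survive jamming.

    Message i is sent at i * period; only messages with
    i % keep_every == offset get through, so the attacker controls
    the release phase by choosing `offset`.
    """
    period_ms = 1000.0 / rate_hz
    return [i * period_ms for i in range(n_messages) if i % keep_every == offset]


def release_intervals(times_ms):
    """Intervals (ms) between consecutive task releases."""
    return [later - earlier for earlier, later in zip(times_ms, times_ms[1:])]


for rate in (10, 100, 500):
    intervals = release_intervals(surviving_arrivals(rate, n_messages=40))
    print(f"{rate:>3} Hz: nominal {1000 / rate:.1f} ms, "
          f"under jamming {intervals[0]:.1f} ms")
```

Under this pattern the release interval becomes four times the nominal message period (for example, 40 ms instead of 10 ms at 100 Hz), and shifting `offset` shifts the release phase, which is what makes the manipulation predictable.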

Control Output Intervals

In ArduRover, Rover::set_servos() sends the control output at 400 Hz. To assess its sensitivity to the jitter introduced in handleMessage(), we measured its task release intervals in each of our three experimental scenarios.

Under normal conditions, the average interval is 2.49 ms with a variance of 0.0063 ms. For the three different settings, the average intervals are 2.49 ms (variance: 0.0022 ms), 2.49 ms (variance: 0.0042 ms), and 2.48 ms (variance: 0.0081 ms), respectively.
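Statistics like these can be computed from logged release timestamps. The sketch below is an illustration under stated assumptions: timestamps are in milliseconds, the spread is the population variance (in ms²), and the "within one" band is one standard deviation around the mean. The helper name interval_stats and the sample timestamps are hypothetical and not taken from ArduRover.

```python
import statistics


def interval_stats(release_times_ms):
    """Summarize release intervals from a list of release timestamps (ms).

    Returns (mean interval, population variance in ms^2,
    fraction of intervals within one standard deviation of the mean).
    """
    intervals = [b - a for a, b in zip(release_times_ms, release_times_ms[1:])]
    mean = statistics.fmean(intervals)
    variance = statistics.pvariance(intervals, mu=mean)
    std = variance ** 0.5
    within = sum(abs(x - mean) <= std for x in intervals) / len(intervals)
    return mean, variance, within


# Hypothetical timestamps for a nominally 2.5 ms (400 Hz) task with one late release.
mean, variance, within = interval_stats([0.0, 2.5, 5.0, 7.5, 10.5, 12.5])
print(f"mean={mean:.3f} ms, variance={variance:.4f} ms^2, within 1 std={within:.0%}")
```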

Actual Control Error

The figure above shows the average control error in lateral acceleration under attack and baseline conditions. We use lateral acceleration as the representative control state because it directly affects safety; it is also the metric with the largest deviation in our experiments, so it represents the worst case in our empirical study. Under attack, the average error is 0.00237 m/s² with a variance of 0.000037 m/s². In contrast, the baseline error is 0.00223 m/s² with a variance of 0.000031 m/s². This is a 6.3% increase, but it is not large enough to cause control failure. To further assess feasibility, we manually inject a 0.83 ms jitter into the control output task (one third of its 2.5 ms period). We chose this value because the original attack on the automotive system injects a 10 ms jitter into a victim task with a 30 ms period. The injected jitter results in an average error of 0.0040 m/s², which is still not large enough to cause the vehicle to deviate from its original trajectory.
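The injected jitter follows a simple proportional rule: keep the jitter-to-period ratio of the original attack. The sketch below (helper names are hypothetical) reproduces the 0.83 ms value and the 6.3% error increase from the reported numbers. The same rule also yields the 60 ms choice in the drone experiment (15 ms on a 5 ms task, scaled to a 20 ms period) and the 1 ms choice (4 ms on a 10 ms task, scaled to a 2.5 ms period).

```python
def scaled_jitter_ms(orig_jitter_ms, orig_period_ms, target_period_ms):
    """Jitter that keeps the original attack's jitter-to-period ratio."""
    return target_period_ms * orig_jitter_ms / orig_period_ms


def relative_increase_pct(value, baseline):
    """Percentage change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline


# Original automotive attack: 10 ms jitter on a 30 ms task, scaled to the 2.5 ms rover task.
print(f"injected jitter: {scaled_jitter_ms(10, 30, 2.5):.2f} ms")
# Average lateral acceleration error (m/s^2): attack vs. baseline.
print(f"error increase: {relative_increase_pct(0.00237, 0.00223):.1f}%")
```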

Actual Path

The trajectories for executing the same mission with and without attacks are shown in the following figure. These experiments show that the existing design and deployment of the control software are highly resilient to computational jitter. Although the attack is novel, it can be challenging to exploit in some systems.

Evaluation on ArduPilot Drone

Evaluation Setup

The experiments were conducted on a real drone running ArduPilot 4.0 with a Navio2 as the computing unit. We follow the experimental setup of the Butterfly attack paper, jamming three out of every four GPS messages to create jitter in the task run_nav_update. We could not identify the precise vulnerable code section within the drone’s software, but we did find it in the ground station’s code for version Copter-4.0, so we chose this version as our target software.

Evaluation Result

For better visualization, the lines in the figures below are downsampled by a factor of 100 and the markers by a factor of 400.

Task Release Intervals

Without the attack, the average interval is 20.0051 ms, with 99.71% of the intervals lying within one variance (0.2227 ms). Under attack, the average interval is 20.016 ms, with 99.88% of the intervals also lying within one variance (0.6478 ms). The difference from the original study is likely due to the difference in target software. The Butterfly attack assumes the task has an execution time of 4 ms and a period of 5 ms, while in the ArduPilot 4.0 Copter configuration, the task’s average execution time is 56.6231 µs with a period of 20,000 µs.
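One way to see why the two settings behave so differently is to compare processor utilization (execution time divided by period) and per-period slack for the two task models quoted above. This is a back-of-the-envelope sketch that treats each task in isolation; reading slack as room to absorb release jitter is a simplifying assumption, not a full schedulability analysis. The helper names are hypothetical.

```python
def utilization(exec_time_us, period_us):
    """Fraction of each period the task spends executing."""
    return exec_time_us / period_us


def slack_us(exec_time_us, period_us):
    """Idle time left in each period after the task completes."""
    return period_us - exec_time_us


models = {
    "Butterfly attack assumption": (4_000.0, 5_000.0),      # 4 ms every 5 ms
    "ArduPilot 4.0 Copter (measured)": (56.6231, 20_000.0),  # ~56.6 us every 20 ms
}

for name, (c_us, t_us) in models.items():
    print(f"{name}: utilization {utilization(c_us, t_us):.3%}, "
          f"slack {slack_us(c_us, t_us):.1f} us per period")
```

The assumed task is busy 80% of each period with about 1 ms of slack, whereas the measured task is busy roughly 0.28% of the time with almost 20 ms of slack, which is consistent with the much smaller jitter observed on the real platform.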

Control Output Intervals

To assess the feasibility of the second step, we manually inject a 60 ms jitter into the task update_GPS(), since the jitter from step 1 was not large enough on our evaluation platform. We chose 60 ms because it is three times the task’s 20 ms period, consistent with the original Butterfly attack, which uses a 15 ms jitter in the entry task update_GPS() (three times its 5 ms period). The resulting control command output intervals are shown in the following figure. Under normal conditions, the average interval is 2.5080 ms with a variance of 0.4266 ms, and 99.95% of the intervals fall within one variance. In the attack scenario, the average interval is 2.5083 ms with a variance of 0.4449 ms, and 99.95% of the intervals also fall within one variance. We observe no significant difference in the control intervals, which prompted a closer examination of the root cause. It turns out that in ArduPilot 4.0, the control command output is executed before update_GPS() within the main control loop, so the control output has no direct dependency on update_GPS().

Control Errors

How victim-task jitter translates into physical state deviation depends heavily on the robustness of the control algorithm. To better understand the feasibility of this step on a physical platform, we manually inject a 1 ms jitter into the control task fast_loop (since the second step did not produce substantial jitter). We chose 1 ms because fast_loop runs at 400 Hz (a 2.5 ms period) on our target system, which keeps the jitter-to-period ratio consistent with the original attack, where a 4 ms jitter is injected into a task with a 10 ms period. The average errors under normal conditions and under attack are 0.0047 m/s (variance 0.0000149 m/s) and 0.0048 m/s (variance 0.0000128 m/s), respectively. With the injected jitter, the average error is 0.0049 m/s (variance 0.0000134 m/s).
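To put the drone’s error figures in relative terms, the sketch below computes each condition’s change against the normal baseline using the averages reported above. It is plain arithmetic on the reported numbers; the function name and dictionary labels are illustrative.

```python
def relative_change_pct(value, baseline):
    """Percentage change of `value` relative to `baseline`."""
    return 100.0 * (value - baseline) / baseline


baseline_m_s = 0.0047  # average error under normal conditions
conditions = {
    "attack (jammed GPS messages)": 0.0048,
    "injected 1 ms jitter in fast_loop": 0.0049,
}

for label, error_m_s in conditions.items():
    change = relative_change_pct(error_m_s, baseline_m_s)
    print(f"{label}: {error_m_s} m/s ({change:+.1f}% vs. baseline)")
```

In absolute terms, both differences are at most 0.0002 m/s.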

Actual Path

The trajectories for the execution of the same mission with and without attacks are shown as follows: