[Original] Periodic CPU utilization spikes on PREEMPT-RT systems

2024-11-17 18:22:29

Contents
  • Periodic CPU utilization spikes on PREEMPT-RT systems
    • Background
    • Symptoms
    • Reproduction conditions
    • Cause
    • Mitigation

Background

In 2022, while debugging problems on PREEMPT-RT systems, I wrote the previous article, A Short Note on CPU Performance Optimization - Using Flame Graphs to Locate Performance Problems. That article located and resolved only one of the issues; an underlying issue was not written up at the time. That issue has recently drawn wide attention because it appears on almost all PREEMPT-RT products and reduces the CPU throughput available to non-real-time tasks. This article briefly documents it in the hope that it will be helpful.

This article describes only the cause and the conclusions and omits the troubleshooting process.

Symptoms

In some application scenarios on a PREEMPT-RT system, the CPU load of the entire system periodically spikes for hundreds of milliseconds or even seconds, even when no particular application is running. The spike duration and the interval between spikes vary from machine to machine.

Whether you observe with top or pidstat, you can only determine that system CPU usage has spiked; the threads involved vary, and the spike cannot be attributed to any specific thread.
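Because top and pidstat usually sample once per second or slower, a short spike is easy to miss. The following is a minimal sketch of a finer-grained system-wide sampler that reads the aggregate cpu line of /proc/stat. The function names, the 100 ms interval, and the 50% threshold are illustrative assumptions, not part of the original investigation.

```python
import time

def read_cpu_times(path="/proc/stat"):
    """Return the aggregate 'cpu' counters (in clock ticks) as a list of ints."""
    with open(path) as f:
        for line in f:
            if line.startswith("cpu "):
                return [int(x) for x in line.split()[1:]]
    raise RuntimeError("no aggregate cpu line found in " + path)

def busy_fraction(prev, cur):
    """Fraction of CPU time that was not idle between two samples.

    Uses the first eight fields: user, nice, system, idle, iowait, irq,
    softirq, steal. Guest time is already counted inside user/nice.
    """
    deltas = [c - p for p, c in zip(prev[:8], cur[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    idle = deltas[3] + (deltas[4] if len(deltas) > 4 else 0)
    return 1.0 - idle / total

def watch(interval_s=0.1, threshold=0.5, samples=100):
    """Print a timestamped line whenever system-wide load exceeds threshold."""
    prev = read_cpu_times()
    for _ in range(samples):
        time.sleep(interval_s)
        cur = read_cpu_times()
        load = busy_fraction(prev, cur)
        if load > threshold:
            print(f"{time.monotonic():.3f} busy={load:.0%}")
        prev = cur
```

Calling watch() for a few minutes and comparing the printed timestamps gives a rough estimate of the spike interval and duration on a given machine. Like top and pidstat, it only shows that the load spiked, not which thread is responsible.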

Reproduction conditions

Find a machine with good real-time performance, either a PREEMPT-RT system or a xenomai+rtnet system, and create a high-priority real-time task on it. The task periodically sends broadcast frames to the target machine through a raw socket. The period can be 500 µs, 1 ms, or 2 ms, but the send period must be very accurate.
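The structure of such a sender can be sketched as follows. This is only an illustration of the idea (raw AF_PACKET socket, broadcast destination, absolute deadlines so that timing errors do not accumulate). The function names, the EtherType 0x88B5 (an IEEE 802 local experimental value), the payload layout, and the priority are all assumptions. Python cannot deliver the timing accuracy the reproduction requires; a real test would implement the same loop in C on the real-time machine.

```python
import os
import socket
import struct
import time

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_LOCAL_EXPERIMENTAL = 0x88B5  # assumed; any unused EtherType would do

def build_broadcast_frame(src_mac, seq, ethertype=ETHERTYPE_LOCAL_EXPERIMENTAL):
    """Build an Ethernet II broadcast frame carrying a 32-bit sequence number."""
    header = BROADCAST_MAC + src_mac + struct.pack("!H", ethertype)
    payload = struct.pack("!I", seq & 0xFFFFFFFF)
    # Pad to the 60-byte minimum frame size (the NIC appends the 4-byte FCS).
    return (header + payload).ljust(60, b"\x00")

def next_deadline_ns(start_ns, period_ns, n):
    """Absolute deadline of cycle n, derived from the start time so errors do not add up."""
    return start_ns + n * period_ns

def send_periodic(ifname, period_us=1000, count=10000, rt_priority=80):
    """Send `count` broadcast frames on `ifname`, one per period. Requires root."""
    os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(rt_priority))
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    sock.bind((ifname, 0))
    src_mac = sock.getsockname()[4]  # hardware address of the bound interface
    period_ns = period_us * 1000
    start_ns = time.clock_gettime_ns(time.CLOCK_MONOTONIC)
    try:
        for n in range(1, count + 1):
            deadline = next_deadline_ns(start_ns, period_ns, n)
            remaining = deadline - time.clock_gettime_ns(time.CLOCK_MONOTONIC)
            if remaining > 0:
                time.sleep(remaining / 1e9)
            sock.send(build_broadcast_frame(src_mac, n))
    finally:
        sock.close()
```

For example, send_periodic("eth0", period_us=1000) would send one frame per millisecond on a hypothetical interface eth0 while the target machine is observed.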

Cause

This problem is common to PREEMPT-RT; at least the kernels I have worked with so far, from 3.2 to 5.10, all have it. It occurs when the system handles one or more external periodic events, for example: receiving cyclic Ethernet frames sent by a PLC, cyclic IO interrupts triggered by an external FPGA, or the interrupt-driven sending and receiving of Ethernet frames after an EtherCAT master has synchronized to the reference clock of a slave.

The external periodic event (interrupt) is driven by a different clock source than the scheduling clock of the PREEMPT-RT system, so the two clocks drift relative to each other. As a result, the phase of the external event slowly slides across the system's own scheduling events. While the two events are close together, both must be handled back to back, and the resulting frequent context switches cause the CPU spike; the jitter of the system's real-time tasks also increases slightly. This is a consequence of the PREEMPT-RT design, which sacrifices CPU throughput in order to guarantee real-time handling of external events.
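The drift argument can be turned into a rough back-of-the-envelope estimate. The sketch below assumes that both events have the same nominal period and that the clocks differ by a constant number of parts per million; the function names and the numbers in the example (20 ppm, a 10 µs interference window) are illustrative assumptions, not measurements from the article.

```python
def beat_interval_s(period_s, drift_ppm):
    """Time between successive alignments of two nominally equal periodic events.

    The relative phase slides by period_s * drift_ppm * 1e-6 every cycle, so it
    takes period_s / (drift_ppm * 1e-6) seconds to slide through a full period.
    """
    rate = abs(drift_ppm) * 1e-6
    return float("inf") if rate == 0 else period_s / rate

def overlap_duration_s(window_s, drift_ppm):
    """How long the two events stay within +/- window_s of each other per alignment."""
    rate = abs(drift_ppm) * 1e-6
    return float("inf") if rate == 0 else 2 * window_s / rate

# Example with assumed values: 1 ms period, 20 ppm drift, 10 us window.
interval = beat_interval_s(1e-3, 20)      # roughly 50 s between spikes
duration = overlap_duration_s(10e-6, 20)  # roughly 1 s per spike
```

Under these assumptions the spike would recur about every 50 seconds and last about one second, which is the same order of magnitude as the symptoms described earlier. Because crystal tolerances differ between boards, the model also suggests why the interval and duration vary from machine to machine.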

Mitigation

There is no complete solution, but the following mitigations are worth trying:

  • On single-core systems, the system tick cannot be turned off, so there is no solution to this problem;
  • On SMP multicore systems, enable CONFIG_NO_HZ_FULL to reduce the periodic system tick, and set the affinity of the periodic-event interrupt to a CPU that has CONFIG_NO_HZ_FULL enabled and runs no periodic tasks.
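The multicore mitigation can be sketched as a small helper that reads which CPUs run in full-dynticks mode and pins an interrupt to them. It assumes a kernel built with CONFIG_NO_HZ_FULL and booted with a nohz_full= CPU list, and it relies on the standard /sys/devices/system/cpu/nohz_full and /proc/irq/<irq>/smp_affinity_list files. The function names are hypothetical, and the IRQ number must be looked up in /proc/interrupts for the actual device.

```python
def parse_cpu_list(text):
    """Parse a kernel CPU list such as '2-3,5' into a sorted list of ints.

    Stride syntax (for example '0-7:2') is not handled in this sketch.
    """
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part or not part[0].isdigit():
            continue  # empty file or '(null)' when no CPUs are listed
        if "-" in part:
            lo, hi = part.split("-", 1)
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)

def nohz_full_cpus(path="/sys/devices/system/cpu/nohz_full"):
    """Return the CPUs running in full-dynticks mode, or [] if unavailable."""
    try:
        with open(path) as f:
            return parse_cpu_list(f.read())
    except FileNotFoundError:
        return []

def pin_irq(irq, cpus):
    """Restrict interrupt `irq` to the given CPUs. Requires root."""
    with open(f"/proc/irq/{irq}/smp_affinity_list", "w") as f:
        f.write(",".join(str(c) for c in cpus))
```

A typical use would be pin_irq(irq, nohz_full_cpus()[:1]) after checking that the list is not empty. Keeping periodic tasks off that CPU is a separate step that this sketch does not perform.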

For more on the Linux clock subsystem, see the earlier post on this blog, Introduction to the linux time subsystem.

In the next article, we will look at system crashes under network storms caused by the PREEMPT-RT real-time mechanism.