Linux Scheduler Patches Aim To Address Performance Regression Since Last Year



A set of Linux kernel scheduler patches posted today aims to address performance regressions observed since the Linux 6.11 kernel, which was released in September 2024. The patches carry a “request for comments” (RFC) tag, and some of the regressions are tricky and perhaps not completely resolved, but the series looks to be a step in the right direction.

Intel Linux engineer Peter Zijlstra posted the set of five scheduler patches today. He commented on the Linux kernel mailing list:

“So [Chris Mason of Meta] poked me about how they’re having a wee performance drop after around 6.11. He’s extended his schbench tool to mimic the workload in question.



This benchmark wants to stay on a single (large) LLC. Both the machine Chris has (SKL, 20+ cores per LLC) and the machines I ran this on (SKL,SPR 20+ cores) are Intel, AMD has smaller LLC and the problem wasn’t as pronounced there.



Anyway, the patches are stable (finally!, I hope, knock on wood) but in a somewhat rough state. At the very least the last patch is missing ttwu_stat(), still need to figure out how to account it ;-)

Chris, I’m hoping your machine will agree with these numbers; it hasn’t been straight sailing in that regard.”

On the Intel Skylake server, scheduler performance with this workload on Linux 6.15 was around 93% of its pre-6.11 level. On the Intel Xeon Sapphire Rapids server, performance with the same workload is 4 to 5% lower on more recent kernel versions. With the RFC patches posted today, the scheduler regressions appear to be largely resolved.
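The two results above are stated in different forms: one as a share of the earlier performance, the other as a percentage drop. As a rough sketch of how to put them on the same footing, the short Python snippet below converts between the two. The function names are hypothetical, and it assumes "lower" means lower relative to the pre-regression kernel; the only inputs are the figures quoted above.

```python
def regression_percent(relative_performance: float) -> float:
    """Convert performance relative to a baseline (1.0 = parity) into a percent drop."""
    return (1.0 - relative_performance) * 100.0


def relative_performance(regression_pct: float) -> float:
    """Convert a percent drop into performance relative to the baseline."""
    return 1.0 - regression_pct / 100.0


# Skylake: reported as roughly 93% of the pre-6.11 level.
skylake_drop = regression_percent(0.93)

# Sapphire Rapids: reported as 4 to 5% lower (assumed relative to the older kernel).
spr_low = relative_performance(5.0)
spr_high = relative_performance(4.0)

print(f"Skylake: about {skylake_drop:.0f}% below the pre-6.11 level")
print(f"Sapphire Rapids: about {spr_low:.0%} to {spr_high:.0%} of the earlier level")
```

Read this way, the Skylake system shows a drop of about 7%, somewhat larger than the 4 to 5% seen on Sapphire Rapids.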

Linux scheduler regression benchmarks

Those interested in all the details can find them via this RFC patch series.

Chris Mason first raised the Linux 6.11 performance regression earlier this month in this kernel mailing list thread:

“I’ve spent some time trying to track down a regression in a networking benchmark, where it looks like we’re spending roughly 10% more time in new idle balancing than 6.9 did.

I’m not sure if I’ve reproduced that exact regression, but with some changes to schbench, I was able to bisect a regression of some kind down to commits in v6.11.”

Hopefully these regression fixes will all be sorted out and upstreamed soon.


