Python 3.11 is on average 25% faster than 3.10 when compiled with GCC on Ubuntu Linux; depending on the workload, the speedup ranges from 10% to 60%

Is Python 3.11 really the fastest Python? One thing is sure: CPython 3.11 is on average 25% faster than CPython 3.10 when measured with the pyperformance benchmark suite and compiled with GCC on Ubuntu Linux. Depending on the workload, the speedup can range from 10% to 60%. The Python development team announced the improvements and new features in Python 3.11 on October 4. To improve the performance of the Python programming language, Microsoft funded the Faster CPython project.

It is a Microsoft-funded project whose members include Python inventor Guido van Rossum, senior Microsoft software engineer Eric Snow, and Mark Shannon, who is under contract with Microsoft as the project’s technical lead. Python has a long-standing reputation for being slow. “While Python will never match the performance of low-level languages like C, Fortran, or even Java, we’d like it to be competitive with fast implementations of scripting languages, like V8 for JavaScript,” says Mark Shannon.

Python is an interpreted, multi-paradigm, cross-platform programming language. It supports structured, functional, object-oriented, and imperative programming. It has strong dynamic typing, automatic memory management through garbage collection, and an exception handling system; in these respects it resembles Perl, Ruby, Scheme, Smalltalk, and Tcl.

To be efficient, virtual machines for dynamic languages must specialize the code they execute according to the types and values of the running program. This specialization is often associated with just-in-time (JIT) compilers, but it is beneficial even without machine code generation. The Faster CPython project focuses on two major areas of Python: faster startup and faster execution.

Note that specialization improves performance, while adaptation allows the interpreter to adjust quickly when a program’s usage pattern changes, limiting the extra work caused by poor specialization.

Faster startup

Python caches bytecode in the __pycache__ directory to speed up module loading. In version 3.10, running a Python module looked like this:

Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate

In Python 3.11, the core modules essential for starting Python are “frozen”. This means that their code (and bytecode) objects are statically allocated by the interpreter, which reduces the module execution process to this:

Statically allocated code object -> Evaluate
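The cached-bytecode path that frozen modules skip can be observed directly with the standard library. Below is a small illustrative sketch (the module name `demo_mod.py` and helper `bytecode_cache_roundtrip` are made up for this example): it compiles a throwaway source file the way Python populates __pycache__ and checks that the bytecode lands at the path the import machinery expects.

```python
import importlib.util
import pathlib
import py_compile
import tempfile

def bytecode_cache_roundtrip() -> bool:
    """Compile a throwaway module and confirm its bytecode lands where
    Python's __pycache__ machinery expects it (illustrative helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp) / "demo_mod.py"
        src.write_text("ANSWER = 42\n")
        # py_compile performs the "read source -> write marshalled bytecode"
        # step that normally populates __pycache__.
        cache_path = py_compile.compile(str(src))
        # cache_from_source reports where the .pyc for a source file lives.
        expected = importlib.util.cache_from_source(str(src))
        return cache_path == expected and pathlib.Path(cache_path).exists()

print(bytecode_cache_roundtrip())  # True
```

Frozen modules bypass this read-and-unmarshal step entirely, which is where the startup savings come from.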

Interpreter startup is now 10-15% faster in Python 3.11. This has a big impact on short-running Python programs.

Faster execution

Python creates a frame each time it calls a Python function; this frame holds runtime information. The new frame optimizations are as follows:

  • a streamlined frame creation process;
  • avoided memory allocation by generously reusing frame space on the C stack;
  • a slimmed-down internal frame structure that contains only essential information (frames previously carried additional debugging and memory-management information).

Old-style frame objects are now created only when requested by debuggers or by Python introspection functions such as sys._getframe or inspect.currentframe. For most user code, no frame objects are created at all. As a result, almost all Python function calls are significantly sped up; we measured a 3-7% speedup in pyperformance.
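The two introspection functions named above are the ones that force a full frame object to be materialized on demand. A minimal sketch (the function names `where_am_i`, `caller_name`, and `outer` are illustrative):

```python
import inspect
import sys

def where_am_i():
    # Explicitly requesting a frame object; in 3.11 this is the point
    # where an old-style frame object is created on demand.
    frame = inspect.currentframe()
    return frame.f_code.co_name

def caller_name():
    # sys._getframe(1) walks one level up to the caller's frame.
    return sys._getframe(1).f_code.co_name

def outer():
    return caller_name()

print(where_am_i())  # where_am_i
print(outer())       # outer
```

Code that never calls such introspection functions never pays for frame-object creation at all.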

During a Python function call, Python calls an evaluating C function to interpret that function’s code. This effectively limits pure Python recursion to what the C stack can safely hold. In 3.11, when CPython detects Python code calling another Python function, it creates a new frame and “jumps” to the new code inside the new frame. This avoids calling the C interpreter function.

Faster CPython explores optimizations for CPython. As mentioned above, the core team is funded by Microsoft to work on this project full time. Pablo Galindo Salgado is also funded by Bloomberg LP to work on the project part-time. Finally, many contributors are community volunteers.

The Python language is released under a free license close to the BSD license and runs on most computing platforms, from smartphones to servers: Windows, Unix-like systems (notably GNU/Linux and macOS), and even Android and iOS; it can also be translated to target the Java or .NET platforms. It is designed to optimize programmer productivity by offering high-level tools and an easy-to-use syntax.

Most Python function calls no longer consume space on the C stack, which speeds up most of these calls. In simple recursive functions like fibonacci or factorial, a 1.7x speedup has been observed. This also means that recursive functions can recurse much deeper (if the user raises the recursion limit). We measured a 1-3% improvement in pyperformance.
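The deeper-recursion point can be tried directly. A minimal sketch (the recursion limit of 3000 is an arbitrary illustrative choice, and `factorial` is the usual textbook definition):

```python
import sys

def factorial(n):
    # Simple pure-Python recursion of the kind that 3.11's inlined
    # Python-to-Python calls speed up.
    return 1 if n <= 1 else n * factorial(n - 1)

# Raising the Python-level recursion limit; in 3.11, deep pure-Python
# recursion no longer consumes a C stack frame per call, so deeper
# limits become viable.
sys.setrecursionlimit(3000)
print(factorial(10))        # 3628800
print(factorial(2500) > 0)  # True: deep recursion succeeds
```

On earlier versions, raising the limit this way risks exhausting the C stack instead of getting a clean RecursionError.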

“Adding a specializing, adaptive interpreter to CPython will bring significant performance improvements. It is hard to give meaningful figures, because it depends a lot on the benchmarks and on work that has not yet been done. Extensive experimentation suggests speedups of up to 50%. Even if the speed gain were only 25%, it would still be a nice improvement,” says Shannon.

“Specifically, we want to achieve these performance goals with CPython to benefit all Python users, including those who cannot use PyPy or other alternative virtual machines,” he adds. When Devclass spoke with Python Steering Council member and core developer Pablo Galindo about the new Memray memory profiler, he described how the Python team is using Microsoft’s work in version 3.11.

“One of the things we’re doing is making the interpreter faster,” says Pablo Galindo, Python Steering Council member and core developer. “But it’s also going to use a little more memory, just a little bit, because most of these optimizations have some kind of memory cost: we have to store things for later use, or we have an optimized version but sometimes someone needs to request an unoptimized version for debugging, so we have to store both.”

PEP 659: Adaptive Interpreter Specialization

PEP 659 is one of the key elements of the Faster CPython project. The general idea is that although Python is a dynamic language, most code has regions where objects and types rarely change. This concept is known as type stability. At runtime, Python tries to find common patterns and type stability in the running code, then replaces the current operation with a more specialized one.

This specialized operation uses fast paths available only for those use cases/types, which generally perform better than their generic equivalents. It also draws on another concept called inline caching, in which Python caches the results of expensive operations directly in the bytecode. The specializer also combines certain common pairs of instructions into one superinstruction, reducing overhead during execution.

Python only specializes code it sees is “hot” (executed multiple times), which keeps Python from wasting time on code that runs only once. Python can also despecialize when code is too dynamic or when its usage changes. Specialization is attempted periodically, and specialization attempts are not too costly; this allows specialization to adapt to new circumstances.
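The warm-up behaviour described above can be inspected with the standard `dis` module, which on 3.11+ accepts an `adaptive` flag to show specialized instruction names. A hedged sketch (the helper names `add` and `is_specializable_hot`, and the warm-up count of 1000 calls, are illustrative choices):

```python
import dis
import sys

def add(a, b):
    # Type-stable: always called with ints below, so CPython 3.11's
    # adaptive interpreter can specialize the binary-add it contains.
    return a + b

def is_specializable_hot() -> bool:
    """Warm `add` up, then look for its binary-add instruction.
    On 3.11+, dis can show the adaptive (possibly specialized)
    instruction names; older versions get the generic disassembly."""
    for _ in range(1000):
        add(1, 2)
    if sys.version_info >= (3, 11):
        instructions = dis.get_instructions(add, adaptive=True)
    else:
        instructions = dis.get_instructions(add)
    # 3.11 names the generic instruction BINARY_OP (specialized variants
    # share that prefix, e.g. BINARY_OP_ADD_INT); 3.10 used BINARY_ADD.
    return any(i.opname.startswith(("BINARY_OP", "BINARY_ADD"))
               for i in instructions)

print(is_specializable_hot())  # True
```

Running `dis.dis(add, adaptive=True)` after the warm-up loop on 3.11 is a quick way to see whether a specialized variant was actually chosen.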

The pyperformance project aims to be an authoritative source of benchmarks for all Python implementations. The focus is on real-world benchmarks, rather than synthetic benchmarks, using full applications where possible.

Source: Python

And you?

Python 3.11 is said to be on average 25% faster than 3.10: what do you think?

What do you think of Python 3.11?

Do you have experience with Python?

Are you for or against the removal of the Global Interpreter Lock?

See also:

Python 3.11 will improve the location of errors in tracebacks and bring new features

Version 3.2 of the Django framework is available, with automatic discovery of AppConfig, it brings new decorators for the administration module

Django 2.0 is available in stable version, what is new in this version of the Web framework written in Python?

JetBrains supports Django: get a 30% discount for the purchase of an individual PyCharm Professional license and all proceeds will be donated to the Django Foundation
