Brandt Bucher, a CPython core developer at Microsoft who works on improving the performance of the CPython interpreter, has published an implementation of a just-in-time (JIT) compiler for Python. The implementation uses the copy-and-patch technique and is documented in a paper titled "A JIT Compiler for CPython". The publication of the JIT compiler is timed to coincide with Christmas, and the accompanying announcement is written in verse.
The proposed JIT compiler stands out for its fast code generation, ease of maintenance, and tight integration with the interpreter. It automatically converts the interpreter, which is written in C, into a JIT compiler, with no need for separate code generation logic or hand-written assembly. As a result, any bug fix made in the interpreter is automatically reflected in the JIT compiler, because both are produced by the same code generator.
The copy-and-patch method on which the proposed JIT is based relies on the similarity between relocating code when object files are loaded and substituting bytecode in a JIT. Loading an object file involves copying machine code into memory and filling in the addresses of external symbols. In the same way, during program execution the JIT walks through the bytecode instructions generated by the interpreter, copies precompiled machine code templates for them into an executable memory area, and patches in the required values, such as arguments and constants.
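The copy and patch steps described above can be illustrated with a minimal Python sketch. It is a simulation under stated assumptions, not the code from the proposed JIT: the `Stencil` and `Hole` classes, the `copy_and_patch` function, and the two-entry template table are hypothetical names introduced here, and the opcode set is reduced to two simplified instructions. The template bytes are the x86-64 encodings of `movabs rax, imm64` and `ret`, used only as sample data; the buffer is built but never executed.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Hole:
    offset: int   # byte offset of the placeholder inside the template
    operand: str  # hypothetical name of the value to patch in


@dataclass(frozen=True)
class Stencil:
    code: bytes   # precompiled machine code with zeroed placeholders
    holes: tuple  # Hole records describing where to patch


# Hypothetical template table (assumption: two simplified opcodes).
# "48 b8 <imm64>" encodes `movabs rax, imm64`; "c3" encodes `ret`.
STENCILS = {
    "LOAD_CONST": Stencil(bytes.fromhex("48b8") + bytes(8), (Hole(2, "value"),)),
    "RETURN": Stencil(bytes.fromhex("c3"), ()),
}


def copy_and_patch(trace):
    """Build a machine code buffer for a list of (opname, operands) pairs."""
    buf = bytearray()
    for opname, operands in trace:
        stencil = STENCILS[opname]
        base = len(buf)
        buf += stencil.code  # copy: append the template unchanged
        for hole in stencil.holes:  # patch: overwrite each placeholder
            struct.pack_into("<Q", buf, base + hole.offset, operands[hole.operand])
    return bytes(buf)


code = copy_and_patch([("LOAD_CONST", {"value": 42}), ("RETURN", {})])
print(code.hex())  # 48b82a00000000000000c3
```

A real JIT would place the resulting bytes in executable memory and also patch addresses, not only constants; the sketch keeps just the two operations that give the technique its name.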
In the copy-and-patch JIT, LLVM is used at build time to produce an object file in ELF format, which contains machine code templates for the bytecode instructions together with the information needed for data substitution. At run time, the JIT replaces the bytecode instructions generated during interpretation with their machine code representations and at the same time substitutes the data required for the computation. LLVM is required as a dependency only at the build stage; the runtime components have no external dependencies. In total, the implementation needs only about 300 lines of hand-written C code and about 3,000 lines of generated C code.
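The "information for data substitution" stored in an ELF object file takes the form of relocation records. As a rough illustration, the sketch below decodes records in the standard `Elf64_Rela` layout (`r_offset`, `r_info`, `r_addend`) into the offset, symbol index, relocation type, and addend that a build step would need in order to know where each template has to be patched. This is an assumption-laden example, not the build script of the proposed JIT: `parse_rela` is a hypothetical helper and the sample record is constructed by hand rather than read from a real object file.

```python
import struct

# Elf64_Rela layout from the ELF-64 specification:
# r_offset (unsigned 64-bit), r_info (unsigned 64-bit), r_addend (signed 64-bit).
RELA = struct.Struct("<QQq")
R_X86_64_64 = 1  # x86-64 relocation type: absolute 64-bit address


def parse_rela(section: bytes):
    """Yield (offset, symbol_index, reloc_type, addend) for each record."""
    for r_offset, r_info, r_addend in RELA.iter_unpack(section):
        # The upper 32 bits of r_info hold the symbol table index,
        # the lower 32 bits hold the relocation type.
        yield r_offset, r_info >> 32, r_info & 0xFFFFFFFF, r_addend


# Hand-built record (assumption): patch 8 bytes at offset 2 with symbol #5.
sample = RELA.pack(2, (5 << 32) | R_X86_64_64, 0)
holes = list(parse_rela(sample))
print(holes)  # [(2, 5, 1, 0)]
```

Each decoded tuple corresponds to one placeholder in a machine code template: the offset says where to write, and the symbol and type say what kind of value belongs there.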
Compared to traditional LLVM-based JIT tooling