March 13, 2020
From preprocessing to machine code
1. NodeJS to C++ translation
- Connecting library adapters to ensure all C++ libraries used for NodeJS work properly
2. Preprocessing
Our transpiled C++ code gets preprocessed. In this first stage of C++ compilation, the preprocessor goes through our code and evaluates preprocessor directives such as #include, #define and #if. Taking #include as an example, the preprocessor simply takes the contents of the included file and pastes them in place of the #include statement. To see the result of preprocessed C++ code, Visual Studio offers a cool feature that outputs what the C++ compiler preprocesses (files with the .i ending). Taking a closer look at the preprocessed .i files, you’ll see the included code resulting from C++’s #include statements.
The very same happens for the C++ #define functionality. If the code uses a name that was introduced with #define, the preprocessor simply pastes the defined replacement text in its place.
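To make the text substitution more tangible, here is a minimal sketch of #define and #if at work. All macro and function names (BUFFER_SIZE, SQUARE, USE_FAST_PATH and the three functions) are made up for this illustration and do not come from any real transpiler output.

```cpp
// Object-like macro: each later use of BUFFER_SIZE is replaced by the text 64.
#define BUFFER_SIZE 64

// Function-like macro: SQUARE(x) is replaced by ((x) * (x)).
#define SQUARE(x) ((x) * (x))

// Hypothetical feature switch, used only in this sketch.
#define USE_FAST_PATH 1

// After preprocessing the compiler sees: return 64;
constexpr int buffer_size() { return BUFFER_SIZE; }

// After preprocessing the compiler sees: return ((n + 1) * (n + 1));
constexpr int squared_successor(int n) { return SQUARE(n + 1); }

// Conditional compilation: only the branch whose condition holds
// survives preprocessing, the other one never reaches the compiler.
constexpr int path_cost() {
#if USE_FAST_PATH
    return 1;
#else
    return 100;
#endif
}
```

The extra parentheses in SQUARE matter precisely because the preprocessor only pastes text: without them, SQUARE(n + 1) would become n + 1 * n + 1. With MSVC, the /P compiler option writes the preprocessed result to a .i file, so you can compare it with the comments above.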
3. Abstract Syntax Tree (AST)
Based on the preprocessed code, the compiler creates a more usable representation in the form of an abstract syntax tree (AST). An AST is a structured way of representing our code including tons of meta info, e.g. the callee of a function call or the data types involved. To get a better picture of what an AST is, check out astexplorer.net, insert some example code and inspect the result either in the form of a tree (its own AST format) or in the form of a simple, well-known JSON object.
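As a rough illustration of the idea, the sketch below models a tiny AST for arithmetic expressions such as 1 + 5 * 2. The Node struct and the helper functions are hypothetical and heavily simplified. A real compiler AST carries far more meta info (source locations, resolved types and so on).

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical node kinds for this sketch.
enum class Kind { NumberLiteral, BinaryExpression };

struct Node {
    Kind kind = Kind::NumberLiteral;
    int value = 0;                // used by NumberLiteral
    char op = 0;                  // used by BinaryExpression: '+' or '*'
    std::unique_ptr<Node> left;   // children of a BinaryExpression
    std::unique_ptr<Node> right;
};

std::unique_ptr<Node> number(int value) {
    auto node = std::make_unique<Node>();
    node->kind = Kind::NumberLiteral;
    node->value = value;
    return node;
}

std::unique_ptr<Node> binary(char op, std::unique_ptr<Node> left,
                             std::unique_ptr<Node> right) {
    auto node = std::make_unique<Node>();
    node->kind = Kind::BinaryExpression;
    node->op = op;
    node->left = std::move(left);
    node->right = std::move(right);
    return node;
}

// Walks the tree bottom-up and computes the value of the expression.
int evaluate(const Node& node) {
    if (node.kind == Kind::NumberLiteral) return node.value;
    assert(node.left && node.right);  // a binary node needs both children
    const int lhs = evaluate(*node.left);
    const int rhs = evaluate(*node.right);
    return node.op == '*' ? lhs * rhs : lhs + rhs;
}

// Renders the tree in a compact prefix notation, e.g. (* 5 2).
std::string describe(const Node& node) {
    if (node.kind == Kind::NumberLiteral) return std::to_string(node.value);
    return std::string("(") + node.op + " " + describe(*node.left) + " " +
           describe(*node.right) + ")";
}
```

Building binary('+', number(1), binary('*', number(5), number(2))) gives a tree in which the multiplication sits below the addition, so operator precedence is encoded in the structure itself and no longer in the order of characters.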
4. C++ Translation Units
Our C++ code evolves and we come closer to finally getting machine code the computer can understand and execute. As a next step, the compiler creates an object file with the file ending .obj for each translation unit (a preprocessed .cpp file together with everything it includes). If one .cpp file includes other .cpp files, this ends up as one single translation unit. If multiple .cpp files get compiled separately without including each other, this results in multiple translation units and thus multiple .obj files. And here’s the moment we’ve been waiting for: an object file already contains machine code 🎉. It is not yet a runnable program on its own, though: the linker still has to combine the object files into an executable.
To get a better idea of what the machine code does on our computer, use the “Assembly only listing” option Visual Studio provides. This generates compilation output files with a .asm file ending. In those files you’ll find a human-readable version of what the machine code does on your CPU, written as low-level Assembly instructions.
5. Performance maximum (optional)
Most of the time our handwritten NodeJS code (and as a result also the resulting C++) isn’t fully optimized for speed and efficiency. That’s why the C++ compiler offers optimization options to get more performance out of the generated machine code. These optimizations can be things like eliminating unneeded variable declarations when the ‘direct way’ of executing the code is shorter and less resource-consuming. Pre-calculating values that are built from static numbers (e.g. replacing 5 * 2 with a static 10) is another example of such a compiler optimization.
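The two optimizations mentioned above can be sketched as follows. The names kTen, scale_verbose and scale_direct are made up for this example. Note that constexpr is a language feature that lets us check the values at compile time. It is used here for illustration only and is not the optimizer itself. Whether a compiler really performs these rewrites depends on the compiler and its optimization settings.

```cpp
// Both operands are literals, so the product is known before the program runs.
constexpr int kTen = 5 * 2;

// Written the "long" way, with variables that only forward values.
constexpr int scale_verbose(int input) {
    int factor = 5 * 2;           // candidate for pre-calculation (static 10)
    int result = input * factor;  // 'factor' and 'result' are not strictly needed
    return result;
}

// The 'direct way': what an optimizing build may effectively
// reduce scale_verbose to.
constexpr int scale_direct(int input) {
    return input * 10;
}
```

Both functions return the same value for every input, which is exactly the condition under which a compiler is allowed to swap the longer version for the shorter one.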
My name is Manuel Penaloza. I live in Austria and work as a web & software developer building things to enrich the internet and internal business processes. Doing so, I'm a big fan of considering and regularly auditing the aspect of "software has to support business success & goals". Find me on Twitter: @manpenaloza.