Simple Introduction
Based on materials I prepared for a Computer Systems course.
I’ve seen many people claim that Bash syntax is uncomfortable, but I had already gotten used to Bash syntax before encountering such opinions - “I met you before the rumors did”?? 😶🌫️
The topic of the related discussion session was: how does the compiler handle switch-case statements in C? Specifically, under what distribution of case labels (contiguous or non-contiguous, and with what gaps between them) will the compiler generate a jump table? Since we needed to explore different case scenarios, we first needed to generate C source files with different case patterns. This seemed rather mechanical, so we considered using an automated script. Our expectation for this script was that generator -b 10 -s 2 -d dest_dir -f file_name would produce a file with 10 branches (-b), an interval (-s) of 2 between consecutive case constants, in the directory (-d) dest_dir, with the filename (-f) file_name.c. With a single-file generator in hand, we could then write another generator to call this one and create a batch of C source files with varying numbers of branches and intervals.
Command Line Argument Capture
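The most obvious benefit of a command line interface is that you can control a command’s specific behavior through options. For example, cat displays file contents, while cat -n adds line numbers. In a getopts option string, a colon after a letter means that option takes an argument, and a leading colon switches on silent error reporting, so a missing argument shows up as : and an unknown option as ?. To capture command line arguments, we use getopts: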
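while getopts ":b:d:f:s:" opt; do
    case $opt in
        b) branch=$OPTARG ;;    # number of switch branches
        d) dir=$OPTARG ;;       # destination directory
        f) filename=$OPTARG ;;  # output file name
        s) seperate=$OPTARG ;;  # interval between case constants
        :)
            echo "Option -$OPTARG requires an argument." >&2
            exit 1
            ;;
        \?)  # escaped, so it matches a literal ? instead of any single character
            echo "Invalid option: -$OPTARG" >&2
            exit 1
            ;;
    esac
done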
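To try the option parsing in isolation, the loop can be wrapped in a function. The following is only a sketch: parse_args is a hypothetical name, the default values are assumptions made for illustration, and the numeric check is an addition that the original script may not have.

```shell
#!/usr/bin/env bash
# Sketch: parse -b/-s/-d/-f into shell variables.
# parse_args and the defaults below are assumptions, not the original script.
parse_args() {
    local opt OPTIND=1      # reset OPTIND so the function can be called repeatedly
    branch=4 seperate=1 dir=. filename=switch.c   # hypothetical defaults
    while getopts ":b:d:f:s:" opt; do
        case $opt in
            b) branch=$OPTARG ;;
            s) seperate=$OPTARG ;;
            d) dir=$OPTARG ;;
            f) filename=$OPTARG ;;
            :)  echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
            \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
        esac
    done
    # Reject non-numeric values before they reach arithmetic expansion.
    if ! [[ $branch =~ ^[0-9]+$ && $seperate =~ ^[0-9]+$ ]]; then
        echo "-b and -s expect non-negative integers." >&2
        return 1
    fi
}

parse_args -b 10 -s 2 -d dest_dir -f file_name.c
echo "branch=$branch seperate=$seperate target=./$dir/$filename"
```

Returning a status instead of calling exit keeps the function usable from an interactive shell; a script would typically follow the call with || exit 1.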
The Simplest Metaprogramming
Metaprogramming here means using code to generate code. This can be complex, but here we take the simplest approach - code is just a text file, right? So we can simply echo the code text and append it to the target file:
targetpath="./${dir}/${filename}"
# Start the file (overwriting any previous content) with a header comment.
echo -e "/* Created by switch_generator */\n" > "${targetpath}"
# Opening of main() and of the switch statement.
echo -e \
"int main(){\n\n\
    int i = 0, j = 0;\n\
    switch (i) {" >> "${targetpath}"
for (( i = 1; i <= branch; i++ )); do
    value=$(( i * seperate ))  # case constant of the i-th branch
    echo -e \
"        case $value:\n\
            j += $value;\n\
            break;\n" >> "${targetpath}"
done
# Default branch and closing braces.
echo -e \
"        default:\n\
            j += 1000;\n\
            break;\n\
    }\n\
    return 0;\n\
}" >> "${targetpath}"
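The same text generation can also be written with printf and here-documents, which avoids the \n escapes and keeps the C code readable inside the script. This is a sketch of an alternative, not the original generator: emit_switch is a hypothetical helper that takes the branch count and the interval as positional parameters and writes to stdout.

```shell
#!/usr/bin/env bash
# Sketch: emit the switch skeleton with printf and here-documents.
# emit_switch is a hypothetical helper: $1 = branch count, $2 = interval.
emit_switch() {
    local branch=$1 seperate=$2 i value
    printf '/* Created by switch_generator */\n\n'
    printf 'int main(){\n\n    int i = 0, j = 0;\n    switch (i) {\n'
    for (( i = 1; i <= branch; i++ )); do
        value=$(( i * seperate ))
        # Unquoted delimiter: $value is expanded inside the here-document.
        cat <<EOF
        case $value:
            j += $value;
            break;

EOF
    done
    # Quoted delimiter: the text is copied literally, with no expansion.
    cat <<'EOF'
        default:
            j += 1000;
            break;
    }
    return 0;
}
EOF
}

emit_switch 3 2   # three branches with constants 2, 4 and 6
```

Redirecting the output, for example emit_switch 10 2 > "./${dir}/${filename}", gives the same kind of file as the echo version while opening the target only once.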
Source File Pipeline
Raising the level of abstraction, we use another script to call the above script, implementing batch production of source files. The core code is:
for (( branch_num = 1; branch_num <= size; branch_num++ )); do
    filename="${compiler}_branch_${branch_num}"
    # -s is omitted here; add it if switch_generator.sh sets no default interval
    bash ./switch_generator.sh -b "$branch_num" -d "$dir" -f "${filename}.c"
    # "$compiler" -S "./${dir}/${filename}.c" -o "./${dir}/${filename}.s"
    # ...
done
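Once the commented-out compile step has produced .s files, checking them for a jump table can be a single grep. The sketch below rests on an assumption about the assembly output: on x86 with AT&T syntax, dispatch through a table shows up as an indirect jump such as jmp *%rax or jmpq *.LJTI0_0(,%rax,8). The pattern, and the helper names has_jump_table and report_dir, are illustrative and may need adjusting for other targets or compiler versions.

```shell
#!/usr/bin/env bash
# Sketch: report whether an assembly file appears to contain a jump table.
# Assumption: x86 AT&T syntax, where table dispatch is an indirect jump
# (an operand starting with *), optionally prefixed by "notrack".
has_jump_table() {
    grep -Eq '^[[:space:]]*(notrack[[:space:]]+)?jmpq?[[:space:]]+\*' "$1"
}

# Print one line per .s file found in the directory given as $1.
report_dir() {
    local asm
    for asm in "$1"/*.s; do
        [ -e "$asm" ] || continue   # the directory holds no .s files
        if has_jump_table "$asm"; then
            echo "$asm: jump table"
        else
            echo "$asm: compare-and-jump"
        fi
    done
}
```

Calling report_dir "./${dir}" after the batch loop would list every generated file with its verdict. An indirect jump is only a heuristic: the generated programs contain no function pointers, so here it is unlikely to come from anything but the switch.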
Batch compiling these files to assembly and then using grep to check for and report jump tables completed our task. The principles are similar, so I won’t go into further detail. If you’re curious about the answers to the discussion questions:
- Compilers use a jump table when the number of consecutive branches is ≥ 4 (clang) / 5 (gcc); otherwise, they emit a chain of subl and je instructions, that is, a subtraction followed by a conditional jump;
- When the interval between case constants is ≥ 12 (clang) / 10 (gcc), compilers no longer use a jump table, but fall back to subl and je for conditional testing and jumping;
- When the case constants form two consecutive runs with a large gap between them, such as 1,2,…,6 and 101,102,…,106, gcc generates two jump tables (this conclusion comes from my teammate LYT).
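The two-run case in the last point cannot be produced with a single fixed interval, so reproducing it needs a generator that accepts an explicit list of constants. A minimal sketch, assuming a hypothetical helper emit_switch_values that is not part of the original switch_generator script:

```shell
#!/usr/bin/env bash
# Sketch: build the switch from an explicit list of case constants.
# emit_switch_values is a hypothetical helper; each argument becomes one case.
emit_switch_values() {
    local value
    printf 'int main(){\n\n    int i = 0, j = 0;\n    switch (i) {\n'
    for value in "$@"; do
        printf '        case %d:\n            j += %d;\n            break;\n\n' \
            "$value" "$value"
    done
    printf '        default:\n            j += 1000;\n            break;\n'
    printf '    }\n    return 0;\n}\n'
}

# Two runs of consecutive constants separated by a large gap.
emit_switch_values $(seq 1 6) $(seq 101 106)
```

Redirecting the output to a .c file and compiling it with -S gives the assembly to inspect; the unquoted $(seq ...) is deliberate, so that each number becomes its own argument.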