Simple Bash CLI Programs & Simple Metaprogramming

Simple Introduction

Based on materials I prepared for a Computer Systems course.

I’ve seen many people claim that Bash syntax is uncomfortable, but I had already gotten used to Bash syntax before encountering such opinions - “I met you before the rumors did”?? 😶‍🌫️

The related discussion session asked: how does the compiler handle switch-case statements in C? Specifically, for which distributions of case values (contiguous or not, and how widely spaced) does the compiler generate a jump table? To explore different scenarios, we first needed C source files with different case patterns. Writing them by hand is mechanical, so we wrote a script to do it. The goal was that generator -b 10 -s 2 -d dest_dir -f file_name.c would produce a file with 10 branches (-b), a spacing of 2 between case values (-s), in the directory dest_dir (-d), named file_name.c (-f). With a single-file generator in hand, we could then write a second script that calls it to create a batch of C source files with varying numbers of branches and spacings.

Command Line Argument Capture

The most obvious benefit of command line interfaces is that you can control a command’s specific behavior through options. For example, cat displays file contents, while cat -n adds line numbers. To capture command line arguments, we use getopts:

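while getopts ":b:d:f:s:" opt; do
  case $opt in
    b) branch=$OPTARG ;;      # number of switch branches
    d) dir=$OPTARG ;;         # destination directory
    f) filename=$OPTARG ;;    # output file name
    s) seperate=$OPTARG ;;    # spacing between case values
    :)
      echo "Option -$OPTARG requires an argument." >&2
      exit 1
      ;;
    \?)                       # escaped: a bare ? is a glob matching any single character
      echo "Invalid option: -$OPTARG" >&2
      exit 1
      ;;
  esac
done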
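
A slightly fuller sketch of the same idea, wrapped in a function so it can be tried in isolation. The function name parse_args, the default values, and the integer check are assumptions for illustration and are not part of the original script; the variable names dir, filename, and seperate follow the generator code in this post.

```bash
#!/usr/bin/env bash
# Sketch: parse -b/-d/-f/-s with getopts, with defaults and basic validation.
# parse_args and the default values are hypothetical, chosen for illustration.

parse_args() {
    # Defaults (assumed): 4 branches, spacing 1, current directory, switch.c
    branch=4
    seperate=1
    dir="."
    filename="switch.c"

    local opt OPTIND=1   # reset OPTIND so the function can be called repeatedly
    while getopts ":b:d:f:s:" opt; do
        case $opt in
            b) branch=$OPTARG ;;
            d) dir=$OPTARG ;;
            f) filename=$OPTARG ;;
            s) seperate=$OPTARG ;;
            :)  echo "Option -$OPTARG requires an argument." >&2; return 1 ;;
            \?) echo "Invalid option: -$OPTARG" >&2; return 1 ;;
        esac
    done

    # Reject non-numeric counts early, before they reach arithmetic expansion
    case $branch$seperate in
        *[!0-9]*) echo "-b and -s expect non-negative integers." >&2; return 1 ;;
    esac
}

parse_args -b 10 -s 2 -d dest_dir -f file_name.c
echo "branch=$branch seperate=$seperate dir=$dir filename=$filename"
```

The leading colon in the option string puts getopts in silent mode, which is what makes the : and \? branches reachable with the offending option letter in OPTARG.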

The Simplest Metaprogramming

Metaprogramming here simply means using code to generate code. This can get complex, but we take the simplest approach: code is just a text file, right? So we can echo the code text and append it to the target file:

targetpath="./${dir}/${filename}"

# ">" creates the file, truncating any previous content
echo -e "/* Created by switch_generator */\n" > "$targetpath"

echo -e \
"int main(){\n\n\
    int i = 0, j = 0;\n\
    switch (i) {" >> "$targetpath"

# One case per branch; case values are multiples of the spacing
for (( n = 1; n <= branch; n++ )); do
    value=$(( n * seperate ))
    echo -e \
"    case $value:\n\
        j += $value;\n\
        break;\n" >> "$targetpath"
done

echo -e \
"    default:\n\
        j += 1000;\n\
        break;\n\
    }\n\
    return 0;\n\
}" >> "$targetpath"
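
The same file can also be produced by printing to standard output and redirecting once, which avoids repeating >> on every echo. The sketch below uses printf and here-documents instead of echo -e, so it is a variation on the approach above and not the original script; the function name emit_switch and its two positional parameters are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: print a C switch statement to stdout.
# emit_switch is a hypothetical helper; $1 = number of branches, $2 = spacing.

emit_switch() {
    local branch=$1 step=$2 n value

    # Quoted delimiter ('EOF') keeps the C text literal: no expansion happens
    cat <<'EOF'
/* Created by switch_generator */

int main(){

    int i = 0, j = 0;
    switch (i) {
EOF

    # One case per branch; case values are step, 2*step, ..., branch*step
    for (( n = 1; n <= branch; n++ )); do
        value=$(( n * step ))
        printf '    case %d:\n        j += %d;\n        break;\n\n' "$value" "$value"
    done

    cat <<'EOF'
    default:
        j += 1000;
        break;
    }
    return 0;
}
EOF
}

emit_switch 3 2
```

Called as emit_switch 3 2, this prints a switch with the case values 2, 4, and 6. Redirecting the call, for example emit_switch 10 2 > "./${dir}/${filename}", writes the whole file in one step.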

Source File Pipeline

Raising the level of abstraction, we use another script to call the above script, implementing batch production of source files. The core code is:

for (( branch_num = 1; branch_num <= size; branch_num++ )); do
    filename="${compiler}_branch_${branch_num}"
    # -s 1 (assumed here) keeps the case values consecutive
    bash ./switch_generator.sh -b "$branch_num" -s 1 -d "$dir" -f "${filename}.c"
    # "$compiler" -S "./${dir}/${filename}.c" -o "./${dir}/${filename}.s"
    # ...
done
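
To vary the spacing as well as the branch count, the loop can be nested. The following is a dry-run sketch that only prints the generator commands it would run; the function name plan_batch, its parameters, and the "_step_" part of the file name are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: enumerate generator invocations for every (branches, spacing) pair.
# plan_batch and the "_step_" file naming are hypothetical; it only prints
# the commands (a dry run), so nothing is generated or compiled here.

plan_batch() {
    local compiler=$1 dir=$2 max_branch=$3 max_step=$4
    local b s filename

    for (( b = 1; b <= max_branch; b++ )); do
        for (( s = 1; s <= max_step; s++ )); do
            filename="${compiler}_branch_${b}_step_${s}"
            echo "bash ./switch_generator.sh -b $b -s $s -d $dir -f ${filename}.c"
        done
    done
}

plan_batch gcc out 2 3
```

With these arguments the function prints six command lines, one per combination of 1 to 2 branches and spacings 1 to 3.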

Batch-compiling these files to assembly and then using grep to check for and report jump tables completed the task. The principle is the same, so I won’t go into further detail. If you’re curious about the answers to the discussion questions:

Compilers use a jump table when the number of consecutive branches is ≥ 4 (clang) / 5 (gcc); otherwise they emit a chain of subl and je instructions (subtract, then jump if equal);
When the spacing between case constants is ≥ 12 (clang) / 10 (gcc), compilers stop using jump tables and fall back to subl and je for testing and jumping;
When the case values form two consecutive runs with a large gap between them, such as 1, 2, …, 6 and 101, 102, …, 106, gcc generates two jump tables (this conclusion comes from my teammate LYT).
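
For completeness, the grep check on the generated assembly might look roughly like the sketch below. It assumes x86 output in AT&T syntax, where a jump table is dispatched through an indirect jump such as jmp *.L4(,%rax,8) or jmpq *%rax; the helper names has_jump_table and report are hypothetical. This is a heuristic: indirect jumps can appear for other reasons, although that is unlikely in generated files that contain only a single switch.

```bash
#!/usr/bin/env bash
# Sketch: report whether an assembly file appears to contain a jump table.
# Assumption: x86 AT&T syntax, where a jump table is dispatched through an
# indirect jump such as "jmp *.L4(,%rax,8)" or "jmpq *%rax".
# has_jump_table and report are hypothetical helper names.

has_jump_table() {
    # Match "jmp" or "jmpq" followed by whitespace and a "*" operand
    grep -Eq 'jmpq?[[:space:]]+\*' "$1"
}

report() {
    local file
    for file in "$@"; do
        if has_jump_table "$file"; then
            echo "$file: jump table"
        else
            echo "$file: compare-and-branch"
        fi
    done
}
```

Once the .s files exist, a call such as report ./"${dir}"/*.s prints one line per file.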