A Quick Look at the `file` Command

Simple Introduction

Although this post is dated March 20, 2024, I’m actually writing it on February 18, 2025, based on materials from my Computer System discussion course. If you want to know a file’s type, you might use the `file` command. But how does `file` determine file types? More specifically, how does a file tell `file` (or any other program that wants to know) what type it is? Where is that type information stored?

Manuals Are Always Your Friend

The first, second, and third ways to understand a command are usually `whatis`, `man`, and `tldr`.

`man whatis` will tell you that `whatis` actually comes from `man`:

Each manual page has a short description available within it. whatis searches the manual page names and displays the manual page descriptions of any name matched.

`man file` tells you that `file` checks file types like this:

file tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic tests, and language tests. The first test that succeeds causes the file type to be printed.

  1. filesystem tests
    Uses the `stat` system call and inspects its result. `stat` can reveal empty files and the file types defined in <sys/stat.h> (directories, symbolic links, FIFOs, sockets, block and character devices).
  2. magic tests
    Checks whether the beginning of the file contains specific magic bytes. For example, if the first five bytes are the ASCII characters “%PDF-”, the file is identified as a PDF document. If no magic pattern matches, `file` checks whether the content looks like text and, if so, determines its encoding (ASCII, UTF-8, …).
  3. language tests
    Infers the programming language of a text file from keywords; for example, `main`, `struct`, and `printf` suggest that it is a C source file.

To watch the command’s execution more closely, run `strace file foo.bar`. You can also redirect the output to a file and open it in Vim for easier reading and keyword search: `strace file foo.bar &> strace.out; vim strace.out`.

Fooling `file`?

Exploiting the magic test

fake pdf

Exploiting the language test

C source file?

Simple Conclusion

To answer the initial question: `file` determines file types through filesystem tests, magic tests, and language tests. Files “express” their types through the metadata reported by `stat`, magic bytes at the start of the file, their text encoding, or programming-language keywords.

Wish You a Nice Day!