Simple Introduction
Although the date here is March 20, 2024, I’m actually writing this blog on February 18, 2025, based on materials from my Computer System discussion course. If you want to know a file’s type, you might use the file command. But how does file determine file types? More specifically, how do files in the system tell file (or any other program that wants to know their type) what type they are? Where do files store their type information?
Manuals Are Always Your Friend
The first, second, and third ways to understand a command are usually whatis, man, and tldr, in that order.
man whatis will tell you that whatis actually gets its information from man:
Each manual page has a short description available within it. whatis searches the manual page names and displays the manual page descriptions of any name matched.
man file will tell you how file checks file types:
file tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic tests, and language tests. The first test that succeeds causes the file type to be printed.
- filesystem tests: file calls the stat system call and uses its result to determine the file type. This catches empty files and the special file types defined in <sys/stat.h>, such as directories, symbolic links, named pipes, sockets, and device files.
- magic tests: file checks whether the file header contains specific magic bytes. For example, if the first five bytes of the file are the ASCII characters “%PDF-”, the file is identified as a PDF. If no magic pattern matches, file checks whether the content looks like text and, if so, determines its encoding (ASCII, UTF-8, …); content that is not recognized as text is reported as “data”.
- language tests: for text files, file guesses the programming language from keywords, for example inferring that a text file is C source because it contains main, struct, and printf.
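The order of the three stages can be modeled in a few lines of Python. This is a toy sketch under stated assumptions, not how file is implemented: the magic table, the keyword list, the 4096-byte read limit, and the output labels are all hypothetical simplifications (file itself consults a large compiled magic database and uses more careful text heuristics). It only mirrors the order from the man page: filesystem tests first, then magic tests, then a text check, then language tests on text only.

```python
import codecs
import os
import stat
import sys

# Hypothetical mini magic table: (offset, magic bytes, label).
# The real file(1) consults a much larger compiled magic database.
MAGIC_TABLE = [
    (0, b"%PDF-", "PDF document"),
    (0, b"\x7fELF", "ELF executable"),
    (0, b"\x89PNG\r\n\x1a\n", "PNG image data"),
]

# Hypothetical keyword list for the language test (the C example above).
C_KEYWORDS = (b"main", b"struct", b"printf")


def filesystem_test(path):
    """Stage 1: classify from lstat() metadata alone."""
    st = os.lstat(path)  # lstat so symbolic links are reported, not followed
    mode = st.st_mode
    checks = [
        (stat.S_ISDIR, "directory"),
        (stat.S_ISLNK, "symbolic link"),
        (stat.S_ISFIFO, "fifo (named pipe)"),
        (stat.S_ISSOCK, "socket"),
        (stat.S_ISCHR, "character special"),
        (stat.S_ISBLK, "block special"),
    ]
    for predicate, label in checks:
        if predicate(mode):
            return label
    if st.st_size == 0:
        return "empty"
    return None  # regular, non-empty file: fall through to the content tests


def magic_test(data):
    """Stage 2: compare bytes at fixed offsets against the magic table."""
    for offset, magic, label in MAGIC_TABLE:
        if data[offset:offset + len(magic)] == magic:
            return label
    return None


def text_encoding(data):
    """Fallback when no magic matches: does the content look like text?"""
    if b"\x00" in data:
        return None  # NUL bytes: treat as binary
    try:
        data.decode("ascii")
        return "ASCII text"
    except UnicodeDecodeError:
        pass
    try:
        # final=False tolerates a multi-byte character cut off by the read limit
        codecs.getincrementaldecoder("utf-8")().decode(data, final=False)
        return "UTF-8 Unicode text"
    except UnicodeDecodeError:
        return None


def language_test(data):
    """Stage 3: guess a programming language from keywords (text only)."""
    hits = sum(1 for keyword in C_KEYWORDS if keyword in data)
    return "C source" if hits >= 2 else None


def classify(path, limit=4096):
    """Run the stages in order; the first one that succeeds wins."""
    label = filesystem_test(path)
    if label is not None:
        return label
    with open(path, "rb") as f:
        data = f.read(limit)
    label = magic_test(data)
    if label is not None:
        return label
    encoding = text_encoding(data)
    if encoding is None:
        return "data"
    language = language_test(data)
    return f"{language}, {encoding}" if language else encoding


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {classify(p)}")
```

Saved as, say, classify.py (an arbitrary name), it can be run as python classify.py foo.bar and compared against file foo.bar. The early return in each stage is the point of the design: as the man page says, the first test that succeeds decides the answer, which is exactly what the next two sections try to exploit.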
To observe this command’s execution more closely, run strace file foo.bar. You can also redirect the output to a file and open it in Vim, which makes keyword search easier and the output more readable: strace file foo.bar &> strace.out; vim strace.out.
Fooling file?
Exploiting the magic test
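Since the magic test only looks at leading bytes, and it runs before any text check, one way to fool it is to put the PDF magic bytes “%PDF-” in front of ordinary text. A minimal sketch, where make_fake_pdf and the file name fake.pdf are hypothetical:

```python
from pathlib import Path

PDF_MAGIC = b"%PDF-"  # the five header bytes the PDF magic entry looks for


def make_fake_pdf(path, text):
    """Hypothetical helper: write plain ASCII text behind a PDF magic header."""
    data = PDF_MAGIC + text.encode("ascii")
    Path(path).write_bytes(data)
    return data
```

After calling make_fake_pdf("fake.pdf", "hello\n"), compare file fake.pdf with file on a copy that lacks the header. The header should steer the result toward PDF even though the rest is plain text, but the exact wording depends on your file version and magic database.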

Exploiting the language test
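In the same spirit, the language test can be baited with keywords. The sketch below writes ordinary prose but starts its lines with the C keywords mentioned earlier (main, struct, printf). make_fake_c, C_BAIT, and the sample lines are hypothetical, and whether a particular file version actually reports C source depends on its heuristics, so treat this as an experiment to try rather than a guaranteed result.

```python
from pathlib import Path

# Hypothetical bait: the keywords cited above as evidence of C source.
C_BAIT = ["struct", "main", "printf"]


def make_fake_c(path, prose_lines):
    """Hypothetical helper: prefix each prose line with a C keyword, cycling through C_BAIT."""
    lines = [f"{C_BAIT[i % len(C_BAIT)]} {line}" for i, line in enumerate(prose_lines)]
    text = "\n".join(lines) + "\n"
    Path(path).write_text(text, encoding="ascii")
    return text
```

For example, make_fake_c("notes.txt", ["is a word I like", "street is busy", "your name here"]) produces a file that is plainly not a program; running file notes.txt shows whether the keywords alone are enough for your version.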

Simple Conclusion
To answer the initial question: file determines file types through filesystem tests, magic tests, and language tests, in that order. Files “express” their types through their filesystem metadata (as reported by stat), magic bytes in the file header, their text encoding, or programming-language keywords.