How to Use PHP's md5_file() Function with scandir() to Generate File Fingerprints in Bulk

M66 2025-06-15

In real-world development, file fingerprints (file hashes) are commonly used to verify file integrity, detect tampering, or deduplicate files. PHP's built-in md5_file() function returns the MD5 hash of a file. Combined with scandir(), it can generate fingerprints in bulk for every file in a specified directory.

This article explains how to use md5_file() and scandir() together to generate file fingerprints in bulk and provides a complete code example.


1. Function Overview

  • md5_file(string $filename): string|false

    Calculates the MD5 hash of the specified file and returns it as a 32-character hexadecimal string. Returns false if the file cannot be read.

  • scandir(string $directory, int $sorting_order = SCANDIR_SORT_ASCENDING): array|false

    Lists the files and subdirectories in the specified directory and returns an array of their names (names only, without the directory path). Returns false on failure.
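As a quick illustration of the two return values described above, here is a minimal sketch. The temporary file and its content ('hello') are made up for the demo and are not part of the original example.

```php
<?php
// Hypothetical demo file, created in the system temp directory.
$path = tempnam(sys_get_temp_dir(), 'fp_');
file_put_contents($path, 'hello');

// md5_file() hashes the file's contents, so it matches md5() of the same bytes.
$hash = md5_file($path);
var_dump(strlen($hash));            // int(32)
var_dump($hash === md5('hello'));   // bool(true)

// scandir() returns bare names, including the "." and ".." entries.
$entries = scandir(dirname($path));
var_dump(in_array(basename($path), $entries, true)); // bool(true)
var_dump(in_array('..', $entries, true));            // bool(true)

// Clean up the demo file.
unlink($path);
```

Because scandir() returns names without the directory, each name has to be joined with the directory path before it is passed to md5_file().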


2. Implementation Idea

  1. Use scandir() to get all the filenames and directory names in the target directory.

  2. Filter out the special directory entries "." and "..".

  3. Loop through all the files and use md5_file() to get their MD5 fingerprints.

  4. Store the filenames and corresponding MD5 values in an array or output them.


3. Example Code

<?php
$directory = '/path/to/your/files'; // Replace with the path of the directory you want to scan

// Scan directory
$files = scandir($directory);

if ($files === false) {
    die('Directory read failed');
}

$fileHashes = [];

foreach ($files as $file) {
    // Filter out '.' and '..'
    if ($file === '.' || $file === '..') {
        continue;
    }

    // Build the full path; scandir() returns names only
    $filePath = $directory . DIRECTORY_SEPARATOR . $file;

    // Process only files, ignore subdirectories
    if (is_file($filePath)) {
        $hash = md5_file($filePath);
        if ($hash !== false) {
            $fileHashes[$file] = $hash;
        } else {
            $fileHashes[$file] = 'Read failed';
        }
    }
}

// Output results
foreach ($fileHashes as $filename => $md5) {
    echo "Filename: {$filename}, MD5 Fingerprint: {$md5}" . PHP_EOL;
}
?>
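One use of the resulting array is the deduplication mentioned in the introduction. The following is a minimal sketch, not part of the original example: findDuplicates() is a hypothetical helper, the sample data is invented, and the arrow function assumes PHP 7.4 or later. Values that are not 32-character hex strings (such as the 'Read failed' marker) are skipped so they are not reported as duplicates of each other.

```php
<?php
/**
 * Group filenames by hash and keep only hashes shared by two or more files.
 * (Hypothetical helper for illustration.)
 *
 * @param array $fileHashes filename => MD5 hash
 * @return array hash => list of filenames sharing that hash
 */
function findDuplicates(array $fileHashes): array
{
    $groups = [];
    foreach ($fileHashes as $filename => $hash) {
        // Skip placeholders such as 'Read failed'.
        if (!is_string($hash) || !preg_match('/^[0-9a-f]{32}$/', $hash)) {
            continue;
        }
        $groups[$hash][] = $filename;
    }
    // Keep only groups with more than one file.
    return array_filter($groups, fn(array $names): bool => count($names) > 1);
}

// Invented sample data standing in for the $fileHashes array built above.
$sample = [
    'a.txt' => md5('same content'),
    'b.txt' => md5('same content'),
    'c.txt' => md5('other content'),
    'd.txt' => 'Read failed',
];

foreach (findDuplicates($sample) as $hash => $names) {
    echo $hash . ': ' . implode(', ', $names) . PHP_EOL;
}
```

Identical MD5 values strongly suggest identical content, but a byte-by-byte comparison is a reasonable final check before deleting anything.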


4. Notes

  • Use the actual directory path on the server, and make sure the PHP script has permission to read the directory.

  • This example code processes only the first-level files of the specified directory and does not recursively scan subdirectories. If recursion is needed, you can use a recursive function or RecursiveDirectoryIterator.

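  • md5_file() reads the file as a stream rather than loading it into memory all at once, so memory use stays small even for large files; the main cost for large files is read time. If you need more control over reading, for example to report progress, you can compute the hash in chunks with hash_init(), hash_update(), and hash_final().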
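The recursion and chunked-hashing notes can be sketched together as follows. This is one possible approach under stated assumptions, not the article's own implementation: hashFileInChunks() and hashDirectoryRecursive() are hypothetical helper names, and the $algo and $chunkSize parameters (1 MiB by default) are illustrative choices.

```php
<?php
/**
 * Hash one file incrementally, reading $chunkSize bytes at a time.
 * Returns the hex digest, or false if the file cannot be opened or read.
 */
function hashFileInChunks(string $path, string $algo = 'md5', int $chunkSize = 1048576)
{
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        return false;
    }

    $context = hash_init($algo);
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            return false;
        }
        hash_update($context, $chunk);
    }
    fclose($handle);

    return hash_final($context);
}

/**
 * Walk $directory recursively and return [relative path => digest].
 */
function hashDirectoryRecursive(string $directory, string $algo = 'md5'): array
{
    $directory = rtrim($directory, '/\\');
    $iterator = new RecursiveIteratorIterator(
        new RecursiveDirectoryIterator($directory, FilesystemIterator::SKIP_DOTS)
    );

    $hashes = [];
    foreach ($iterator as $fileInfo) {
        if (!$fileInfo->isFile()) {
            continue;
        }
        // Strip the base directory and its trailing separator.
        $relative = substr($fileInfo->getPathname(), strlen($directory) + 1);
        $hashes[$relative] = hashFileInChunks($fileInfo->getPathname(), $algo);
    }
    ksort($hashes);

    return $hashes;
}
```

A call such as hashDirectoryRecursive('/path/to/your/files') returns an array keyed by path relative to that directory; entries whose value is false could not be read.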


5. Extension: Handling Remote Files by URL

Sometimes you need to generate fingerprints for remote files. md5_file() accepts a URL when allow_url_fopen is enabled, but the call may fail if the remote server restricts access. A more reliable approach is to download the file to a local temporary directory first and then calculate the hash.

Example (for illustration):

<?php
$url = 'https://m66.net/example/path/to/file.txt'; // Replace with the URL of the remote file
$tempFile = tempnam(sys_get_temp_dir(), 'tmp_');

// Download remote file to temporary file
$content = file_get_contents($url);
if ($content === false) {
    unlink($tempFile);
    die('Download failed');
}
file_put_contents($tempFile, $content);

// Calculate fingerprint
$md5 = md5_file($tempFile);

// Delete temporary file
unlink($tempFile);

echo "MD5 fingerprint of remote file: {$md5}" . PHP_EOL;
?>
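file_get_contents() holds the whole download in memory before it is written to disk. For larger remote files, a possible variation is to copy the data as a stream. The sketch below assumes a hypothetical helper named md5ViaTempFile(); like the example above, it needs allow_url_fopen to be enabled for remote URLs, and it also accepts a local path.

```php
<?php
/**
 * Copy $source (a URL or local path) to a temporary file as a stream,
 * hash the copy, then delete it.
 * Returns the MD5 hex digest, or false on any failure.
 */
function md5ViaTempFile(string $source)
{
    $in = @fopen($source, 'rb');
    if ($in === false) {
        return false;
    }

    $tempFile = tempnam(sys_get_temp_dir(), 'tmp_');
    if ($tempFile === false) {
        fclose($in);
        return false;
    }

    $out = fopen($tempFile, 'wb');
    if ($out === false) {
        fclose($in);
        unlink($tempFile);
        return false;
    }

    // Copies in internal chunks instead of buffering the whole body.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    fclose($out);

    $md5 = ($copied === false) ? false : md5_file($tempFile);
    unlink($tempFile);

    return $md5;
}
```

Calling md5ViaTempFile($url) with the URL from the example above would return the fingerprint, or false if the download fails.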


Through this article, you can quickly implement bulk file fingerprint generation for files in a directory using PHP, making it easier to manage file integrity and security.