
Notes on using md5_file() in concurrent environments

M66 2025-06-02

In PHP, the md5_file() function calculates the MD5 hash of a given file and is often used to verify file integrity or to identify file content. It is simple to use, but calling md5_file() under high concurrency raises several issues that can cause performance bottlenecks or inaccurate results.

1. File read and write conflict issues

md5_file() essentially reads the file content and calculates its hash value. In a concurrent environment, if multiple processes or threads read and write to the same file at the same time, it may lead to:

  • A reader sees incomplete or partially written data, so the calculated MD5 value is wrong.

  • A reader that requests a lock is blocked while a writer holds it, which reduces concurrency.

Solution:

  • When writing, take an exclusive lock with flock() and release it only after the write is complete, so that concurrent readers never see a half-written file.

  • When reading, take a shared lock before calling md5_file(). flock() locks are advisory, so a reader that skips the lock is not blocked and may still hash partial content.

$filename = '/path/to/file.txt';

// Writer: hold an exclusive lock for the whole write
$file = fopen($filename, 'c+');
if ($file !== false) {
    if (flock($file, LOCK_EX)) {
        ftruncate($file, 0);
        fwrite($file, 'New content');
        fflush($file);          // flush buffered data before releasing the lock
        flock($file, LOCK_UN);
    }
    fclose($file);
}

// Reader: hold a shared lock so that no cooperating writer changes the file while it is hashed
$md5 = null;
$file = fopen($filename, 'r');
if ($file !== false) {
    if (flock($file, LOCK_SH)) {
        $md5 = md5_file($filename);
        flock($file, LOCK_UN);
    }
    fclose($file);
}
echo "File MD5 value: " . ($md5 ?? 'unavailable');
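
The reader above locks one handle while md5_file() opens the file a second time by name. A variation that may be easier to reason about is to hash through the locked handle itself. The following is a minimal sketch under that assumption; md5_file_locked() is a hypothetical helper name, not a PHP built-in, and it relies on the standard hash_init(), hash_update_stream() and hash_final() functions.

```php
<?php
/**
 * Hash a file through the same handle that holds the shared lock.
 * Hypothetical helper for illustration; returns null on any failure.
 */
function md5_file_locked(string $filename): ?string
{
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return null; // file missing or unreadable
    }
    try {
        if (!flock($handle, LOCK_SH)) {
            return null; // shared lock could not be obtained
        }
        $context = hash_init('md5');
        hash_update_stream($context, $handle); // reads from the handle until EOF
        $digest = hash_final($context);
        flock($handle, LOCK_UN);
        return $digest;
    } finally {
        fclose($handle); // closing the handle also releases any lock still held
    }
}
```

The result matches md5_file() for the same content. As with any flock()-based scheme, it only protects against writers that also take the lock.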

2. File caching issues

Caching can cause stale data to be observed shortly after a file changes. Within PHP itself, the results of stat-related functions such as filemtime() and filesize() are cached, so a change check that runs before md5_file() may work from outdated metadata.

Solution:

  • Call clearstatcache() to clear PHP's file status cache before checking the file. It affects cached stat results only; md5_file() reads the file content each time it is called.

clearstatcache(true, $filename);
$md5 = md5_file($filename);

3. Performance bottleneck

md5_file() reads the entire file on every call. If the file is large or the function is called very frequently, this can easily become an I/O bottleneck and degrade overall system performance.

Optimization suggestions:

  • If you only need to know whether the file has changed, make a preliminary check with the modification time from filemtime() and the size from filesize(). Call md5_file() only when one of them has changed.

  • Move hashing into asynchronous jobs or a queue so that the same file is not hashed repeatedly on the request path.

  • For large files, consider computing the hash in chunks or using a faster hashing algorithm.

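$filename = '/path/to/file.txt';

// Values remembered from the previous check (in practice, loaded from a cache or database)
$lastMtime = 0;
$lastFilesize = 0;
$lastMd5 = '';

// Clear the stat cache first so filemtime() and filesize() return current values
clearstatcache(true, $filename);
$currentMtime = filemtime($filename);
$currentFilesize = filesize($filename);

// Recompute the hash only when the metadata has changed
if ($currentMtime !== $lastMtime || $currentFilesize !== $lastFilesize) {
    $lastMd5 = md5_file($filename);
    $lastMtime = $currentMtime;
    $lastFilesize = $currentFilesize;
}

echo "File MD5: " . $lastMd5;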
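
The suggestion about chunked hashing can be sketched as follows. This is an illustrative example, not a definitive implementation: hash_file_chunked() and its parameters are hypothetical names, built on PHP's incremental hashing functions hash_init(), hash_update() and hash_final(). Reading in chunks does not reduce the total I/O, but it gives a long-running worker a point at which to pause, report progress, or give up.

```php
<?php
/**
 * Hash a file chunk by chunk with a selectable algorithm.
 * Hypothetical helper for illustration; returns null if the file cannot be read.
 */
function hash_file_chunked(string $filename, string $algo = 'md5', int $chunkSize = 1048576): ?string
{
    $handle = @fopen($filename, 'rb');
    if ($handle === false) {
        return null;
    }
    $context = hash_init($algo);
    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            fclose($handle);
            return null; // read error
        }
        hash_update($context, $chunk);
        // A worker could sleep, log progress, or check a cancel flag here.
    }
    fclose($handle);
    return hash_final($context);
}
```

With the default algorithm the result equals md5_file(). To try a faster non-cryptographic algorithm, pass a name listed by hash_algos(); for example, 'xxh128' should be available on PHP 8.1 and later, but it is worth checking with in_array('xxh128', hash_algos(), true) before relying on it.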

4. Atomicity guarantees for concurrent reads and writes

In distributed or multi-instance environments, local file locks alone may not guarantee atomic operations across instances. In that case a stronger synchronization mechanism is needed, such as:

  • Distributed locks (for example, lock mechanisms based on Redis or ZooKeeper).

  • A task queue for file processing, designed so that only one process operates on a given file at a time.

5. Things to note in URL scenarios

If the file path is a URL (for example, a remote file), note the following:

  • Network requests can be slow or fail, so handle timeouts and errors explicitly.

  • The content of the remote file may change between requests. md5_file() can read a URL only if allow_url_fopen is enabled in PHP and the remote server permits the file to be read.

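
Because md5_file() takes no timeout argument, one way to handle remote sources is to open the stream yourself with a timeout and hash it incrementally. The sketch below assumes allow_url_fopen is enabled when a URL is passed; md5_of_source() and its default timeout are hypothetical choices for illustration.

```php
<?php
/**
 * Hash a local path or URL with a read timeout.
 * Hypothetical helper for illustration; returns null on failure or timeout.
 */
function md5_of_source(string $source, float $timeoutSeconds = 5.0): ?string
{
    // The 'http' options apply to http:// and https:// sources;
    // they are ignored for local files.
    $context = stream_context_create([
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    $handle = @fopen($source, 'rb', false, $context);
    if ($handle === false) {
        return null; // unreachable, not found, or URL access disabled
    }
    $hash = hash_init('md5');
    hash_update_stream($hash, $handle); // reads until EOF or timeout
    $meta = stream_get_meta_data($handle);
    fclose($handle);
    if (!empty($meta['timed_out'])) {
        return null; // a partial download must not be treated as a valid hash
    }
    return hash_final($hash);
}
```

A null result means the hash could not be obtained, so the caller can retry or fall back instead of comparing against a digest of truncated content.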