
What should I do if I use md5_file() on a large file?

M66 2025-05-28

In PHP, md5_file is a convenient function for calculating the MD5 hash of a file. Using it is simple: just pass it the file path:

 $hash = md5_file('http://m66.net/path/to/largefile.zip');
echo $hash;

However, with very large files, and especially with remote URLs, md5_file offers little control. It takes no stream context parameter, so you cannot set per-call options such as an HTTP timeout (only global settings such as default_socket_timeout apply). Memory is the other common concern with files of this size.


Where does the memory surge come from?

md5_file itself is usually not the cause. PHP's implementation reads the stream in small fixed-size chunks and updates the MD5 state as it goes, so the hash calculation never needs the whole file in memory. Memory surges come from code that buffers the entire file before hashing it, for example md5(file_get_contents($path)), or downloading a remote file into a string first. With files of hundreds of megabytes or a few GB, that buffer alone can easily exceed PHP's memory_limit.
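Whether a given approach buffers the whole file is easy to check empirically by measuring peak memory. The sketch below is a minimal illustration under stated assumptions: a writable temp directory, an 8 MB test file, and a hypothetical helper md5_whole_file() that stands in for the "load everything, then hash" pattern. The exact byte counts depend on your PHP build.

```php
<?php
// Sketch: compare how much md5_file() and a "load the whole file, then md5()"
// approach raise PHP's peak memory usage. Numbers vary by PHP build.

/** Hypothetical helper: hashes a file by holding its entire contents in memory. */
function md5_whole_file(string $path): string
{
    $contents = file_get_contents($path); // the whole file becomes one PHP string
    if ($contents === false) {
        throw new RuntimeException("Cannot read $path");
    }
    return md5($contents);
}

// Build an 8 MB test file in 64 KB pieces so that creating it does not
// itself push the peak memory figure up.
$path = tempnam(sys_get_temp_dir(), 'md5demo');
$out = fopen($path, 'wb');
$piece = str_repeat('0123456789abcdef', 4096); // 16 B x 4096 = 64 KB
for ($i = 0; $i < 128; $i++) {                  // 128 x 64 KB = 8 MB
    fwrite($out, $piece);
}
fclose($out);

// Peak growth caused by md5_file()
$before = memory_get_peak_usage();
$viaMd5File = md5_file($path);
$growthMd5File = memory_get_peak_usage() - $before;

// Peak growth caused by loading the whole file into a string
$before = memory_get_peak_usage();
$viaWholeFile = md5_whole_file($path);
$growthWholeFile = memory_get_peak_usage() - $before;

unlink($path);

printf("md5_file():                  %s  peak +%d bytes\n", $viaMd5File, $growthMd5File);
printf("file_get_contents() + md5(): %s  peak +%d bytes\n", $viaWholeFile, $growthWholeFile);
```

Both lines print the same hash; the interesting part is the peak growth column, which shows which approach actually held the file in memory on your setup.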


A memory-efficient alternative: hashing in chunks

To avoid memory surges, read the file stream in chunks and feed each chunk to an incremental MD5 calculation. Only a small piece of data is held in memory at any time, so memory usage stays very low regardless of file size.
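This works because MD5 can be computed incrementally: feeding the data to hash_update() piece by piece produces exactly the same digest as hashing it in one go, whatever the piece size. A minimal sketch showing this on an in-memory string, using a hypothetical helper md5_in_segments():

```php
<?php
// Sketch: incremental hashing with hash_init()/hash_update()/hash_final()
// yields the same digest as one-shot md5(), regardless of segment size.

/** Hypothetical helper: hash $data by feeding it to MD5 in $chunkSize-byte segments. */
function md5_in_segments(string $data, int $chunkSize): string
{
    $ctx = hash_init('md5');
    foreach (str_split($data, $chunkSize) as $segment) {
        hash_update($ctx, $segment); // MD5 state carries over between calls
    }
    return hash_final($ctx); // lowercase hex string, same format as md5()
}

$data = str_repeat('The quick brown fox jumps over the lazy dog. ', 1000);

echo md5($data), "  one-shot md5()\n";
foreach ([1, 7, 64, 8192] as $size) {
    echo md5_in_segments($data, $size), "  segments of $size bytes\n";
}
```

Every line prints the same hash. A file-based version simply replaces str_split() with repeated fread() calls on an open stream.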

Example implementation:

function md5_file_stream(string $filename): string|false { // PHP 8.0+ (union return type)
    // Stream context: for http:// and https:// URLs, use GET with a 10-second timeout; local files ignore these options
    $context = stream_context_create([
        'http' => ['method' => 'GET', 'timeout' => 10]
    ]);

    // Try opening the file stream
    $fp = fopen($filename, 'rb', false, $context);
    if (!$fp) {
        return false;
    }

    $hashContext = hash_init('md5');

    while (!feof($fp)) {
        // Read at most 8 KB per call to keep memory usage low
        $data = fread($fp, 8192);
        if ($data === false) {
            fclose($fp);
            return false;
        }
        hash_update($hashContext, $data);
    }

    fclose($fp);

    return hash_final($hashContext);
}

// Example of usage
$url = 'http://m66.net/path/to/largefile.zip';
$md5 = md5_file_stream($url);
if ($md5 !== false) {
    echo "File MD5: $md5\n";
} else {
    echo "Failed to read the file or calculate the hash.\n";
}
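PHP's hash extension also provides hash_update_stream(), which runs the read loop internally. The sketch below shows the same idea built on it; md5_file_via_hash_stream() is a hypothetical name, the 10-second timeout mirrors the example above, and remote URLs assume allow_url_fopen is enabled.

```php
<?php
/**
 * Hypothetical variant of md5_file_stream(): hash_update_stream() performs
 * the read loop, so no explicit fread() loop is needed.
 * Returns false if the file or URL cannot be opened.
 */
function md5_file_via_hash_stream(string $filename, float $timeout = 10.0): string|false
{
    // 'http' options apply to http:// and https:// URLs; local files ignore them.
    $context = stream_context_create(['http' => ['method' => 'GET', 'timeout' => $timeout]]);

    $fp = @fopen($filename, 'rb', false, $context); // @: report failure via the return value only
    if ($fp === false) {
        return false;
    }

    try {
        $hashContext = hash_init('md5');
        hash_update_stream($hashContext, $fp); // reads until EOF, returns the byte count
        return hash_final($hashContext);
    } finally {
        fclose($fp); // runs even though the try block returns
    }
}

// Usage with a temporary local file.
$path = tempnam(sys_get_temp_dir(), 'md5demo');
file_put_contents($path, str_repeat("hash me\n", 100000));

$streamed = md5_file_via_hash_stream($path);
$reference = md5_file($path);
unlink($path);

echo $streamed, "\n";
```

One trade-off: hash_update_stream() stops at the first empty or failed read and only returns the number of bytes it consumed, so it cannot distinguish a read error from end of file. The explicit fread() loop above checks for false and reports the failure, which matters most for remote streams.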

Advantages of this approach

  • Memory saving: only a small amount of data is read at a time, so memory usage stays low and stable.

  • Suitable for large files: works for both large local files and remote files (remote URLs require allow_url_fopen to be enabled).

  • Flexible: the chunk size can be adjusted (for example 16 KB or 32 KB) to suit different scenarios.
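Making the chunk size a parameter is a small change. The sketch below assumes a hypothetical name md5_file_chunked() with $chunkSize defaulting to the article's 8192 bytes; the stream_get_meta_data() timeout check is an addition not present in the original function, so that a timed-out remote read is treated as a failure instead of being hashed as if the data had ended.

```php
<?php
/**
 * Hypothetical generalisation of md5_file_stream() with a configurable chunk size.
 * Returns the hex MD5 digest, or false if opening or reading fails.
 */
function md5_file_chunked(string $filename, int $chunkSize = 8192, float $timeout = 10.0): string|false
{
    if ($chunkSize < 1) {
        throw new InvalidArgumentException('Chunk size must be at least 1 byte.');
    }

    $context = stream_context_create(['http' => ['method' => 'GET', 'timeout' => $timeout]]);
    $fp = @fopen($filename, 'rb', false, $context);
    if ($fp === false) {
        return false;
    }

    $hashContext = hash_init('md5');
    try {
        while (!feof($fp)) {
            $data = fread($fp, $chunkSize); // at most $chunkSize bytes held per read
            if ($data === false) {
                return false; // read error
            }
            if ($data === '' && stream_get_meta_data($fp)['timed_out']) {
                return false; // remote side sent nothing within $timeout seconds
            }
            hash_update($hashContext, $data);
        }
    } finally {
        fclose($fp);
    }

    return hash_final($hashContext);
}

// Usage: every chunk size should produce the same digest as md5_file().
$path = tempnam(sys_get_temp_dir(), 'md5demo');
file_put_contents($path, random_bytes(20000));

$expected = md5_file($path);
$results = [];
foreach ([1, 7, 4096, 16384, 32768] as $size) {
    $results[$size] = md5_file_chunked($path, $size);
}
unlink($path);

print_r($results);
```

Larger chunks mean fewer fread() calls at the cost of a slightly larger buffer; the digest does not depend on the choice.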


Summary

When you need the MD5 of a large file, especially a remote one, reading the file in chunks and hashing incrementally is a reliable choice. It keeps memory usage flat no matter how large the file is, lets you control details such as the chunk size and timeout, and keeps the code short and simple.