How to incrementally hash a file using hash_update_stream()

M66 2025-05-31

When processing large files, reading the entire file into memory just to compute a hash consumes a lot of memory and can even exhaust it. To handle large files efficiently, PHP provides the hash_update_stream() function, which lets us hash them incrementally: the hash is updated as the file is read, so the whole file never has to be loaded into memory at once.

What is the hash_update_stream() function?

hash_update_stream() is a PHP function that incrementally updates a hash context with data from a stream. It is similar to hash_update(), but instead of accepting string data it accepts an open stream resource and reads from the stream itself. This lets you process large amounts of data without loading it into memory all at once.
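
To make the difference concrete, here is a minimal sketch (the file name example.bin is a hypothetical placeholder) that computes the same digest both ways: hash_update() is fed a string you read into memory yourself, while hash_update_stream() is handed the stream resource and pulls the bytes itself:

<?php
// A minimal sketch; 'example.bin' is a hypothetical placeholder path.
$path = 'example.bin';

// hash_update(): you read the data yourself, as a string in memory.
$ctx = hash_init('sha256');
hash_update($ctx, file_get_contents($path));  // entire file held in memory
$digest_from_string = hash_final($ctx);

// hash_update_stream(): you hand over the stream; PHP reads from it.
$ctx = hash_init('sha256');
$stream = fopen($path, 'rb');
hash_update_stream($ctx, $stream);  // by default reads the stream to its end
fclose($stream);
$digest_from_stream = hash_final($ctx);

var_dump($digest_from_string === $digest_from_stream);  // bool(true)
?>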

How to use the hash_update_stream() function?

To use the hash_update_stream() function, first open a file stream and create a hash context. You can then let hash_update_stream() pull chunks of the file from the stream and update the hash value step by step.

Here are the basic steps for incrementally hashing a large file with hash_update_stream():

  1. Initialize the hash context:

    First, use the hash_init() function to initialize the hash context. You can choose any supported hashing algorithm, such as sha256 or md5.

    $hash_algorithm = 'sha256';  // Select a hashing algorithm
    $context = hash_init($hash_algorithm);
    
  2. Open the file stream:

    Use the fopen() function to open the file in binary mode ('rb') and get a file stream.

    $file_path = 'path_to_large_file.txt';  // Replace with the path to your large file
    $file_stream = fopen($file_path, 'rb');
    if (!$file_stream) {
        die("Unable to open the file!");
    }
    
  3. Incrementally update hash:

    Use the hash_update_stream() function to pump data from the file stream into the hash context. You pass the stream itself plus a maximum byte count; the function reads up to that many bytes from the stream, updates the hash, and returns the number of bytes it actually consumed.

    while (!feof($file_stream)) {
        hash_update_stream($context, $file_stream, 8192);  // read and hash up to 8 KB per pass
    }
    
  4. Get the final hash value:

    After reading the file, use the hash_final() function to get the final hash value.

    $final_hash = hash_final($context);
    echo "The hash value of the file is: " . $final_hash . PHP_EOL;
    
  5. Close the file stream:

    Finally, don't forget to close the file stream.

    fclose($file_stream);
    

Complete sample code

<?php

// Select a hashing algorithm
$hash_algorithm = 'sha256';
$context = hash_init($hash_algorithm);

// Open the file stream in binary mode
$file_path = 'path_to_large_file.txt';  // Replace with the path to your large file
$file_stream = fopen($file_path, 'rb');
if (!$file_stream) {
    die("Unable to open the file!");
}

// Incrementally update the hash: hash_update_stream() reads directly
// from the stream, up to 8 KB per pass
while (!feof($file_stream)) {
    hash_update_stream($context, $file_stream, 8192);
}

// Get the final hash value
$final_hash = hash_final($context);
echo "The hash value of the file is: " . $final_hash . PHP_EOL;

// Close the file stream
fclose($file_stream);

?>
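
As a quick sanity check, PHP's built-in hash_file() function computes a file's digest in a single call and also streams the file internally, so the two approaches should agree. A small sketch, reusing the same placeholder path as above:

<?php
// Cross-check the streaming result against hash_file()
$path = 'path_to_large_file.txt';  // same placeholder path as above

$context = hash_init('sha256');
$stream = fopen($path, 'rb');
hash_update_stream($context, $stream);  // consume the stream to EOF
fclose($stream);

var_dump(hash_final($context) === hash_file('sha256', $path));  // bool(true)
?>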

Things to note

  1. Memory efficiency: hash_update_stream() saves a great deal of memory when handling large files because it never loads the entire file into memory; it reads the file block by block and updates the hash as it goes.

  2. Read block size: The chunk size (the length argument of hash_update_stream()) can be adjusted to suit your situation. Generally, 8 KB to 64 KB is a reasonable range, and the value can be tuned based on file size and hardware.

  3. Error handling: In real applications, be sure to handle file errors, such as a file that fails to open or a read that is interrupted; a defensive sketch follows this list.
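
Here is one way to put the points above together. This is a sketch, not part of the original article, and the helper name stream_hash() is hypothetical. It loops on hash_update_stream()'s return value (the number of bytes actually consumed) rather than feof(), throws if the file cannot be opened, and guarantees the stream is closed even if hashing fails:

<?php
// Hypothetical helper: stream-hash a file with basic error handling.
function stream_hash(string $path, string $algo = 'sha256', int $chunk = 8192): string
{
    $stream = fopen($path, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Unable to open the file: $path");
    }
    try {
        $context = hash_init($algo);
        // hash_update_stream() returns the number of bytes it actually
        // consumed; 0 means the stream has been exhausted.
        while (hash_update_stream($context, $stream, $chunk) > 0) {
            // keep pumping until EOF
        }
        return hash_final($context);
    } finally {
        fclose($stream);
    }
}

echo stream_hash('path_to_large_file.txt') . PHP_EOL;
?>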

Summary

The hash_update_stream() function is a very practical tool for hashing large files. It lets us compute a hash step by step in a streaming fashion, avoiding loading the entire file into memory at once. With a sensible block size and incremental updates, you can efficiently hash large files even in memory-constrained environments.