How to read files in chunks and calculate hash

M66 2025-06-05

When working with large files, reading the entire file into memory before calculating its hash can use excessive memory and may even crash the program when memory runs out. To avoid this, you can use PHP's hash_update_stream function to read the file in chunks and update the hash incrementally. The example below shows how to read a file in chunks and calculate its hash without loading the whole file at once.

What is hash_update_stream?

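The hash_update_stream function is provided by PHP for updating a hash incrementally from a stream. Unlike hash_update(), which takes the data as a string, hash_update_stream takes an open stream ( resource ) and a maximum length, reads up to that many bytes from the stream itself, adds them to the hash context, and returns the number of bytes actually read. This makes it suitable for processing large files.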
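To make the difference concrete, here is a minimal sketch that hashes the same data both ways. The temporary file and the variable names are assumptions made for the demo; only hash_init, hash_update, hash_update_stream and hash_final are the PHP functions under discussion.

```php
<?php
// Hypothetical sample data written to a temporary file for the demo
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, str_repeat('abc', 10000)); // 30000 bytes

// hash_update(): the caller supplies the bytes as a string
$ctxString = hash_init('sha256');
hash_update($ctxString, file_get_contents($path));
$fromString = hash_final($ctxString);

// hash_update_stream(): the function pulls the bytes from an open stream.
// The third argument caps how many bytes a single call may read.
$ctxStream = hash_init('sha256');
$stream = fopen($path, 'rb');
$bytesRead = hash_update_stream($ctxStream, $stream, 4096); // up to 4096 bytes
$bytesRead += hash_update_stream($ctxStream, $stream);      // remainder (default length -1)
fclose($stream);
$fromStream = hash_final($ctxStream);

unlink($path);

// Both approaches should produce the same digest
var_dump($fromString === $fromStream);
```

The return value of hash_update_stream is the number of bytes consumed, which is handy for progress reporting.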

Basic steps

  1. Open a file stream.

  2. Initialize the hash algorithm using the hash_init function.

  3. Use hash_update_stream to read the file in chunks and calculate the hash value in real time.

  4. Close the file stream when finished and get the final hash value.

Sample code

<?php
// Hash algorithm to use; SHA-256 here
$hashAlgorithm = 'sha256';

// Open the file stream in binary read mode
$filePath = 'path_to_your_large_file'; // Replace with your file path
$fileStream = fopen($filePath, 'rb');
if ($fileStream === false) {
    die("Unable to open the file");
}

// Initialize the incremental hash context
$hashContext = hash_init($hashAlgorithm);

// Set the chunk size, commonly 8KB or 16KB
$chunkSize = 8192; // 8KB

// hash_update_stream() expects the stream resource, not a string:
// it reads up to $chunkSize bytes itself and feeds them into the hash,
// so no separate fread() call is needed
while (!feof($fileStream)) {
    hash_update_stream($hashContext, $fileStream, $chunkSize);
}

// Get the final hash value
$fileHash = hash_final($hashContext);

// Output the hash value of the file
echo "The hash value of the file is: " . $fileHash . "\n";

// Close the file stream
fclose($fileStream);
?>
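The same steps can be wrapped in a reusable function. The sketch below is one possible shape, not a built-in: the name hashFileInChunks and its parameters are hypothetical, and the demo file is a temporary file created only so the result can be cross-checked against PHP's hash_file().

```php
<?php
/**
 * Hash a file in fixed-size chunks.
 * Hypothetical helper for illustration; not part of PHP itself.
 */
function hashFileInChunks(string $filePath, string $algorithm = 'sha256', int $chunkSize = 8192): string
{
    $stream = fopen($filePath, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Unable to open file: $filePath");
    }
    try {
        $context = hash_init($algorithm);
        while (!feof($stream)) {
            // Reads up to $chunkSize bytes from the stream into the hash context
            hash_update_stream($context, $stream, $chunkSize);
        }
        return hash_final($context);
    } finally {
        // Runs whether hashing succeeded or threw
        fclose($stream);
    }
}

// Demo on a temporary file filled with random bytes
$demoPath = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($demoPath, random_bytes(100000));
$chunked = hashFileInChunks($demoPath);
$builtin = hash_file('sha256', $demoPath);
unlink($demoPath);

echo $chunked === $builtin ? "match\n" : "mismatch\n";
```

When no per-chunk control (progress reporting, cancellation, hashing only part of a stream) is needed, hash_file() computes the same digest in a single call.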

Detailed analysis

  1. Open file stream
    Open the file using the fopen function in binary read mode ( rb ). Binary mode ensures the bytes are read exactly as stored, with no line-ending translation on platforms that distinguish text and binary modes, so the hash reflects the file's real contents.

  2. Initialize the hashing algorithm
    hash_init() takes the name of the hash algorithm of your choice (such as sha256 , md5 , etc.) and creates a hash context that is used to calculate the hash value step by step.

  3. Read the file and update the hash
    Each call to hash_update_stream reads up to a fixed number of bytes (e.g. 8KB) from the open stream and adds them to the hash context. Its second argument must be the stream resource, not a string; if you read the data yourself with fread, pass the resulting string to hash_update() instead.

  4. Get the final hash value
    Use the hash_final() function to get the final hash value (a lowercase hexadecimal string by default), then close the file stream.
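If you prefer to read the chunks yourself, for example to count bytes or inspect the data along the way, fread pairs with hash_update() rather than hash_update_stream. A minimal sketch, assuming a temporary demo file and illustrative variable names:

```php
<?php
// Hypothetical demo file: 5000 lines of 20 bytes each
$content = str_repeat("line of sample data\n", 5000);
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, $content);

$stream = fopen($path, 'rb');
$context = hash_init('sha256');
$totalBytes = 0;

while (!feof($stream)) {
    $data = fread($stream, 8192);
    if ($data === false) {
        break; // read error
    }
    // hash_update() takes the chunk as a string
    hash_update($context, $data);
    $totalBytes += strlen($data);
}

fclose($stream);
unlink($path);
$hash = hash_final($context);

echo "Hashed $totalBytes bytes: $hash\n";
```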

Use scenarios

  • Large file hash calculation
    When processing large files (such as files larger than 1GB), loading the whole file into memory at once may be impractical or impossible. Reading the file in chunks and updating the hash as you go keeps memory usage low regardless of file size.

  • File integrity verification
    When you need to be sure that a file has not been altered or corrupted during transmission, comparing its hash with a known-good value is an effective integrity check.
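For the integrity-verification scenario, a sketch might look like the following. The function name verifyFileHash is hypothetical, and the expected digest is computed locally for the demo; in practice it would come from the sender or a published checksum.

```php
<?php
/**
 * Check a file against an expected hex digest.
 * Hypothetical helper for illustration; not part of PHP itself.
 */
function verifyFileHash(string $filePath, string $expectedHex, string $algorithm = 'sha256'): bool
{
    $stream = fopen($filePath, 'rb');
    if ($stream === false) {
        return false;
    }
    $context = hash_init($algorithm);
    while (!feof($stream)) {
        hash_update_stream($context, $stream, 8192);
    }
    fclose($stream);

    // hash_final() returns lowercase hex; hash_equals() compares in constant time
    return hash_equals(strtolower($expectedHex), hash_final($context));
}

// Demo: the file verifies, then fails after one byte is appended
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, 'original contents');
$expected = hash('sha256', 'original contents');

$before = verifyFileHash($path, $expected);
file_put_contents($path, '!', FILE_APPEND);
$after = verifyFileHash($path, $expected);
unlink($path);

var_dump($before, $after);
```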

Code optimization suggestions

  • Dynamically adjust the block size
    The size of the read block can be adjusted according to the system's memory and disk performance. Choosing an appropriate block size can further improve performance, so it is worth measuring a few sizes on your own hardware.

  • Multithreaded processing
    With a standard hash such as SHA-256, each block depends on the state left by the previous one, so the parts of a single file cannot be hashed simultaneously and still produce the same result. Parallel processing helps when you have many files to hash, with each worker handling a different file.
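The block-size suggestion can be explored with a small timing loop. This is only a rough sketch: the 4 MB temporary file and the candidate sizes are arbitrary assumptions, and timings on a small, freshly written file mostly reflect the operating system's cache rather than disk speed.

```php
<?php
// Hypothetical 4 MB demo file
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, random_bytes(4 * 1024 * 1024));

$results = [];
foreach ([4096, 8192, 65536, 1048576] as $chunkSize) {
    $stream = fopen($path, 'rb');
    $context = hash_init('sha256');

    $start = microtime(true);
    while (!feof($stream)) {
        hash_update_stream($context, $stream, $chunkSize);
    }
    $elapsed = microtime(true) - $start;
    fclose($stream);

    // The digest must not depend on the chunk size; only the timing may differ
    $results[$chunkSize] = ['hash' => hash_final($context), 'seconds' => $elapsed];
    printf("%8d bytes per chunk: %.4f s\n", $chunkSize, $elapsed);
}

unlink($path);
```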

Frequently Asked Questions

  1. Files too large, leading to insufficient memory
    When a file is read as a stream and the hash is updated chunk by chunk, the file is never fully loaded into memory, which avoids out-of-memory errors.

  2. File formats not supported by the hash_update_stream function
    The function handles binary data streams, so it can be used with any type of file, including text files, pictures, videos, etc.
