Detect whether large files are modified: Hash records and comparison

M66 2025-05-27

When working with large files, especially when there are many of them and they cannot be loaded into memory in one piece, we often need to determine whether a file has changed. Comparing full file contents directly can consume a lot of memory, especially when the files are huge. Comparing hash values instead is an efficient and widely used way to detect whether a file has been modified.

In PHP, hash_update_stream feeds the contents of an open stream into a hash context incrementally, which makes it possible to detect modifications to large files. This article explains in detail how to use the function to check for changes in file content.

1. What is hash_update_stream?

hash_update_stream is a PHP function that pumps data from an open stream (such as a file handle returned by fopen) into an active hashing context created by hash_init. It lets us process the file piece by piece while computing its hash. Unlike approaches that read the whole file into a string before hashing it, hash_update_stream suits large files because it never holds the entire file in memory at once.
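
A minimal sketch of a single call, assuming a small temporary file created only for the demonstration (the helper name hashWholeStream is hypothetical). When the length argument is omitted, hash_update_stream consumes the stream to its end and returns the number of bytes it added to the context.

```php
<?php
// Hypothetical helper: hash everything that remains in an open stream.
function hashWholeStream($stream, string $algo = 'sha256'): array
{
    $context = hash_init($algo);
    // Length argument omitted: read until the end of the stream.
    $bytes = hash_update_stream($context, $stream);
    return ['bytes' => $bytes, 'hash' => hash_final($context)];
}

// Demonstration on a throwaway temporary file (an assumption for this sketch).
$tmp = tempnam(sys_get_temp_dir(), 'hus');
file_put_contents($tmp, str_repeat('abc', 1000));

$stream = fopen($tmp, 'rb');
$result = hashWholeStream($stream);
fclose($stream);
unlink($tmp);

echo $result['bytes'], " bytes hashed: ", $result['hash'], "\n";
```

The second argument is the stream handle itself, not a string of data that has already been read.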

2. The principle of detecting whether large files have been modified

To detect whether a file has been modified, we usually implement it through the following steps:

  1. Generate a hash value of the file: First, compute an initial hash value for the file with hash_update_stream and save it as the "fingerprint" of the file's original state.

  2. Recompute the hash value when checking: Whenever you need to know whether the file has changed, compute its hash value again and compare it with the saved one.

  3. Determine whether the file is modified: If the new hash value differs from the old one, the file content has changed. If the two match, the content is unchanged.

3. How to calculate file hash value using hash_update_stream

To implement the check described above, we first need to know how to compute a file's hash value with hash_update_stream. Here is a sample:

 <?php
// Set file path
$file = 'path/to/your/largefile.txt';

// Open the file
$stream = fopen($file, 'rb');
if (!$stream) {
    die('Unable to open the file');
}

// Use hash_update_stream to compute the file hash incrementally
$context = hash_init('sha256'); // Other algorithms can be chosen; sha256 is one example
while (!feof($stream)) {
    // Pass the stream handle itself (not a string); read up to 8192 bytes per call
    hash_update_stream($context, $stream, 8192);
}

// Calculate the final hash value
$hash = hash_final($context);
fclose($stream);

// Output the hash value of the file
echo "The hash value of the file is: $hash\n";
?>

In the above code, we create a hash context with hash_init and update it step by step with hash_update_stream. Each call reads at most 8192 bytes directly from the open file stream and adds them to the context, so the script never needs to hold the data in a variable of its own; the loop repeats until the end of the file is reached. Finally, hash_final returns the hash value of the file. Note that hash_update_stream expects the stream handle as its second argument; to hash a string that has already been read with fread, use hash_update instead.
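
As a sanity check, the chunked approach can be wrapped in a function and compared against PHP's built-in hash_file, which returns the digest of a file in one call. This is a sketch, not a definitive implementation: the function name hashLargeFile and its $chunkSize parameter are illustrative assumptions, not part of PHP.

```php
<?php
// Hypothetical helper: stream a file into a hash context in fixed-size chunks.
function hashLargeFile(string $path, string $algo = 'sha256', int $chunkSize = 8192): string
{
    $stream = fopen($path, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Unable to open $path");
    }
    $context = hash_init($algo);
    // Stop as soon as a call adds no bytes, which happens at the end of the file.
    while (hash_update_stream($context, $stream, $chunkSize) > 0) {
        // Nothing to do here: the call in the condition already updated the context.
    }
    fclose($stream);
    return hash_final($context);
}
```

For any readable file at $path, hashLargeFile($path) should equal hash_file('sha256', $path). hash_file is the shorter option when you do not need to control the chunking yourself.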

4. Use hash values to check whether the file is modified

Next, we need to store the hash value of the file so that we can compare it if needed. Here is a sample code to check if the file has been modified:

 <?php
// File path to store the original hash value
$hashFile = 'path/to/your/previous_hash.txt';

// Get the previously saved hash value (if there is one)
$previousHash = file_exists($hashFile) ? file_get_contents($hashFile) : null;

// Get the hash value of the current file
$file = 'path/to/your/largefile.txt';
$stream = fopen($file, 'rb');
if (!$stream) {
    die('Unable to open the file');
}

$context = hash_init('sha256');
while (!feof($stream)) {
    // Pass the stream handle itself (not a string); read up to 8192 bytes per call
    hash_update_stream($context, $stream, 8192);
}

$currentHash = hash_final($context);
fclose($stream);

// If there is a previous hash value, make a comparison
if ($previousHash !== null) {
    if ($previousHash === $currentHash) {
        echo "The file has not been modified.\n";
    } else {
        echo "The file has been modified.\n";
    }
} else {
    echo "No previous hash value was found, so no comparison can be made.\n";
}

// Save the current hash value for the next comparison
file_put_contents($hashFile, $currentHash);
?>

In this code, we first try to read the previously saved hash value from the hash file. We then recompute the hash value of the current file and compare the two. If the hash values match, the file has not changed; if they differ, the file has been modified. Finally, we save the current hash value to the hash file for the next comparison.
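
When many files need tracking, the single hash file generalizes to a small record of path-to-hash pairs. The following is a sketch under stated assumptions: the JSON manifest format, the function names fingerprint and detectChanges, and the status labels 'new', 'modified', and 'unchanged' are all hypothetical choices made for illustration.

```php
<?php
// Hypothetical helper: SHA-256 of a file, streamed so memory use stays small.
function fingerprint(string $path): string
{
    $stream = fopen($path, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Unable to open $path");
    }
    $context = hash_init('sha256');
    while (hash_update_stream($context, $stream, 8192) > 0) {
        // The context is updated by the call in the loop condition.
    }
    fclose($stream);
    return hash_final($context);
}

// Hypothetical helper: compare files against a JSON manifest, then refresh it.
// Returns an array of path => 'new' | 'modified' | 'unchanged'.
function detectChanges(array $paths, string $manifestFile): array
{
    $known = [];
    if (is_file($manifestFile)) {
        // Fall back to an empty record if the manifest is empty or unreadable.
        $known = json_decode(file_get_contents($manifestFile), true) ?: [];
    }
    $status = [];
    foreach ($paths as $path) {
        $current = fingerprint($path);
        if (!isset($known[$path])) {
            $status[$path] = 'new';
        } elseif (hash_equals($known[$path], $current)) {
            $status[$path] = 'unchanged';
        } else {
            $status[$path] = 'modified';
        }
        $known[$path] = $current;
    }
    // Save the refreshed hashes for the next run.
    file_put_contents($manifestFile, json_encode($known, JSON_PRETTY_PRINT));
    return $status;
}
```

Calling detectChanges with a list of paths and a manifest location reports each file's status and updates the stored hashes in one pass, so the next call compares against the state seen this time.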

5. Summary

Through the above steps, we can use hash_update_stream to detect whether a large file has been modified. Because the file is read in small chunks rather than loaded into memory whole, memory usage stays low regardless of file size, which makes the method particularly suitable for very large files.

Through hash value comparison, we can check the integrity of the file content and detect whether the file has been tampered with or has lost data.

I hope this article can help you understand how to use the hash_update_stream function in PHP to detect the modification of large files and improve your file management efficiency in actual development.