During development, you often need to determine whether two files have the same content, for example when handling file uploads, managing file storage, or preventing duplicate content. PHP provides several tools for this, and the hash_update_stream function is a practical one: it calculates a file's hash value incrementally, which makes it well suited to large files and lets you detect duplicates quickly.
hash_update_stream is a built-in PHP function that feeds data from an open stream (such as a file stream) into an active hashing context. Instead of reading the whole file into a string and hashing it in one call, hash_update_stream reads the stream piece by piece and updates the hash as it goes, so a large file never has to be loaded into memory all at once.
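As a minimal sketch of this incremental interface, the snippet below hashes a stream with hash_init, hash_update_stream, and hash_final, then checks that the result matches a one-shot hash() call on the same data. The php://memory stream and the sample data are assumptions chosen only to keep the example self-contained; a file handle returned by fopen would be passed in the same way.

```php
<?php
// Sample data (hypothetical), written to an in-memory stream so the example needs no real file
$data = str_repeat('example content ', 10000);
$stream = fopen('php://memory', 'w+b');
fwrite($stream, $data);
rewind($stream); // move back to the start before hashing

// Incremental hashing: the stream is consumed piece by piece
$context = hash_init('sha256');
$bytesRead = hash_update_stream($context, $stream); // returns the number of bytes consumed
$incremental = hash_final($context);
fclose($stream);

// One-shot hashing of the same data held entirely in a string
$oneShot = hash('sha256', $data);

var_dump($bytesRead === strlen($data)); // bool(true)
var_dump($incremental === $oneShot);    // bool(true)
```

Both approaches produce the same digest; the difference is only in how much of the data has to sit in memory at once.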
To detect duplicates with hash_update_stream, calculate a hash value for each file (for example with MD5 or SHA-256) and compare it with the hash values you have already stored. If the hash values match, the file content is considered a duplicate.
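The compare-by-hash idea can be sketched as a small lookup keyed by digest. In this sketch, `streamHash`, `memoryStream`, `$uploads`, and `$knownHashes` are hypothetical names, and the in-memory streams stand in for real files; in practice a database table would typically play the role of the array.

```php
<?php
// Hypothetical helper: hash everything remaining in an open stream
function streamHash($stream, string $algo = 'sha256'): string
{
    $context = hash_init($algo);
    hash_update_stream($context, $stream);
    return hash_final($context);
}

// Hypothetical helper: an in-memory stream holding $content (stands in for fopen on a real file)
function memoryStream(string $content)
{
    $stream = fopen('php://memory', 'w+b');
    fwrite($stream, $content);
    rewind($stream);
    return $stream;
}

$uploads = [
    'a.txt' => 'hello world',
    'b.txt' => 'something else',
    'c.txt' => 'hello world', // same content as a.txt
];

$knownHashes = []; // digest => name of the first file seen with that content
$results = [];

foreach ($uploads as $name => $content) {
    $stream = memoryStream($content);
    $digest = streamHash($stream);
    fclose($stream);

    if (isset($knownHashes[$digest])) {
        $results[$name] = 'duplicate of ' . $knownHashes[$digest];
    } else {
        $knownHashes[$digest] = $name;
        $results[$name] = 'new';
    }
}

print_r($results);
```

Using the digest as the array key makes each lookup a single isset check rather than a scan over every stored value.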
First, open the file and use hash_update_stream to calculate its hash value incrementally.
<?php
// File path
$filePath = 'path/to/your/file.txt';
// Select a hashing algorithm
$hashAlgo = 'sha256'; // Other algorithms such as md5 or sha1 also work
// Open the file in binary read mode
$file = fopen($filePath, 'rb');
if ($file === false) {
    exit("Unable to open file: $filePath");
}
// Initialize the hash context
$hashContext = hash_init($hashAlgo);
// hash_update_stream takes the open stream itself (not a string) and
// reads it in small chunks internally, so the file is never loaded into memory at once
hash_update_stream($hashContext, $file);
// Calculate the final hash value
$hashValue = hash_final($hashContext);
// Close the file
fclose($file);
echo "The hash value of the file is: $hashValue";
?>
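For reuse, the steps above could be wrapped in a function that reports failures instead of continuing with an invalid handle. The sketch below is one possible shape, not a definitive implementation; the name `hashFileContents` and the temporary file are assumptions for illustration. It also checks the result against PHP's built-in hash_file, which computes the same digest directly from a path.

```php
<?php
// Hypothetical wrapper around hash_init / hash_update_stream / hash_final
function hashFileContents(string $path, string $algo = 'sha256'): string
{
    if (!in_array($algo, hash_algos(), true)) {
        throw new InvalidArgumentException("Unsupported hash algorithm: $algo");
    }
    $handle = @fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Unable to open file: $path");
    }
    try {
        $context = hash_init($algo);
        hash_update_stream($context, $handle); // reads the stream until end of file
        return hash_final($context);
    } finally {
        fclose($handle); // always release the handle
    }
}

// Usage with a temporary file (created only for this example)
$tmpPath = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($tmpPath, str_repeat("line of sample data\n", 5000));

$digest = hashFileContents($tmpPath);
$matches = ($digest === hash_file('sha256', $tmpPath));
var_dump($matches); // bool(true)

unlink($tmpPath);
```

When only a path is available, hash_file is the shorter route; the stream-based version is useful when the data already arrives as an open stream.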
After calculating the file's hash value, compare it with the hash values already recorded in the database or storage system to determine whether the file is a duplicate.
<?php
// Suppose we already have a list of stored hashes.
// These are placeholder values; real SHA-256 hashes are 64 hexadecimal characters long.
$storedHashes = [
    'd2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2', // Stored file hash value
    'a3a3a3a3a3a3a3a3a3a3a3a3a3a3a3a3'
];
// Check whether the calculated hash value is among the stored hash values (strict comparison)
if (in_array($hashValue, $storedHashes, true)) {
echo "Duplicate file content!";
} else {
echo "File content is not duplicated. Ready to upload or store!