Use hash to determine whether the file content is duplicated

M66 2025-05-27

Checking whether two files have the same content is a common requirement during development, for example when handling file uploads, managing file storage, or preventing duplicate content. PHP provides several tools for this, and the hash_update_stream function is a practical one, especially for large files: it computes a file's hash value without reading the whole file into memory, so duplicates can be detected by comparing hash values instead of comparing contents byte by byte.

What is the hash_update_stream function?

hash_update_stream is a built-in PHP function that feeds data from an open stream (such as a file handle returned by fopen) into a hashing context created with hash_init. Instead of loading the entire file into a string and hashing that string, it reads from the stream incrementally and updates the hash as it goes, so memory usage stays small no matter how large the file is. It accepts the hashing context, the stream, and an optional maximum number of bytes to read, and it returns the number of bytes actually added to the hash.
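As a minimal sketch of the call described above, the following hashes a temporary file through hash_update_stream and checks the result against hash_file. The temporary file and the variable names are assumptions made for illustration only.

```php
<?php
// Create a small temporary file so the sketch is self-contained.
$path = tempnam(sys_get_temp_dir(), 'hash');
file_put_contents($path, str_repeat("example data\n", 1000));

$stream  = fopen($path, 'rb');
$context = hash_init('sha256');

// hash_update_stream() takes the open stream itself (not a string) and
// returns the number of bytes it consumed from that stream.
$bytesHashed = hash_update_stream($context, $stream);
$digest      = hash_final($context);
fclose($stream);

// hash_file() hashes the same bytes in one call, so the digests should match.
$sameAsHashFile = ($digest === hash_file('sha256', $path));

unlink($path);
```

With no length argument, the function reads until the end of the stream, so a single call is enough to hash the whole file.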

How to use hash_update_stream to determine whether the file is duplicated?

The usual approach is to calculate a hash value for each file (for example with MD5 or SHA-256) and compare it with the hash values that have already been stored. If the hash values match, the file content is considered a duplicate. SHA-256 is the safer choice when files may come from untrusted sources, because MD5 collisions can be constructed deliberately.
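The overall approach can be sketched as two small helper functions. The names streamHash and haveSameContent are hypothetical and chosen only for this illustration; the file size check is an optional shortcut, not something the approach requires.

```php
<?php
// Hypothetical helper: hash a file's content by streaming it.
function streamHash(string $path, string $algo = 'sha256'): string
{
    $stream = fopen($path, 'rb');
    if ($stream === false) {
        throw new RuntimeException("Cannot open file: $path");
    }

    $context = hash_init($algo);
    hash_update_stream($context, $stream); // reads until end of stream
    fclose($stream);

    return hash_final($context);
}

// Hypothetical helper: two files count as duplicates when their digests match.
function haveSameContent(string $pathA, string $pathB): bool
{
    // Files of different sizes cannot have identical content,
    // so hashing can be skipped in that case.
    if (filesize($pathA) !== filesize($pathB)) {
        return false;
    }

    return streamHash($pathA) === streamHash($pathB);
}
```

A call such as haveSameContent('a.txt', 'b.txt') would then return true only when both files hash to the same digest.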

Step 1: Open the file and calculate the hash value

First, we open the file and pass the open file handle to hash_update_stream, which reads the content block by block and updates the hash value.

 <?php
// File path
$filePath = 'path/to/your/file.txt';

// Select a hashing algorithm
$hashAlgo = 'sha256'; // Other algorithms such as md5 or sha1 also work

// Open the file in binary mode and stop if it cannot be opened
$file = fopen($filePath, 'rb');
if ($file === false) {
    exit("Unable to open file: $filePath");
}

// Initialize the hashing context
$hashContext = hash_init($hashAlgo);

// Feed the stream into the hashing context block by block.
// hash_update_stream() takes the stream handle itself, not a string.
while (!feof($file)) {
    hash_update_stream($hashContext, $file, 8192); // Hash up to 8192 bytes per call
}

// Calculate the final hash value
$hashValue = hash_final($hashContext);

// Close the file
fclose($file);

echo "The hash value of the file is: $hashValue";
?>

Step 2: Compare the hash value with existing file hash values

After calculating the hash value of the file, we can compare it with the hash values already stored in the database or storage system to determine whether the file is a duplicate.
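When many hashes are stored, one option is to key the stored records by digest so that the check becomes a direct lookup. The sketch below assumes a hypothetical in-memory index and a hypothetical findDuplicate helper; the paths and sample digests are made up for illustration, and a real system would typically use an indexed database column instead.

```php
<?php
// Hypothetical in-memory index: digest => path of the file already stored.
// The digests are computed from sample strings purely for illustration.
$storedIndex = [
    hash('sha256', 'first stored file')  => 'uploads/first.txt',
    hash('sha256', 'second stored file') => 'uploads/second.txt',
];

// Hypothetical helper: return the path of the existing copy,
// or null if the digest has not been seen before.
function findDuplicate(array $index, string $digest): ?string
{
    return $index[$digest] ?? null;
}

$incomingDigest = hash('sha256', 'first stored file');
$existingPath   = findDuplicate($storedIndex, $incomingDigest);
```

Returning the path of the existing copy, rather than just true or false, lets the caller reuse the stored file instead of saving a second copy.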

 <?php
// Suppose we already have a list of stored hashes.
// These values are shortened placeholders; real SHA-256 hashes are 64 hexadecimal characters long.
$storedHashes = [
    'd2d2d2d2d2d2d2d2d2d2d2d2d2d2d2d2', // Stored file hash value
    'a3a3a3a3a3a3a3a3a3a3a3a3a3a3a3a3'
];

// Check whether the calculated hash value exists among the stored hash values
// (the third argument enables strict comparison)
if (in_array($hashValue, $storedHashes, true)) {
    echo "Duplicate file content!";
} else {
    echo "File content is not repeated. Ready to upload or store!