
Use array_chunk to cooperate with array_intersect to find duplicates

M66 2025-04-28

In PHP, array operations are an everyday task, and finding duplicate values in an array is a particularly common problem in data processing. PHP provides several functions that can help with this, and array_chunk and array_intersect are two of the more useful ones. This article shows how to combine these two functions to find duplicates in an array, and what to watch out for when working with large data sets.

1. Introduction to array_chunk function

The array_chunk function splits a large array into multiple smaller arrays. It takes two required parameters: the original array and the size of each chunk. It returns an array of chunks; if the elements do not divide evenly, the last chunk holds the remaining elements.

 $input = range(1, 10); // Generate an array of the numbers 1 to 10
$chunks = array_chunk($input, 3); // Split the array into chunks of 3 elements each
print_r($chunks);

Output:

 Array
(
    [0] => Array
        (
            [0] => 1
            [1] => 2
            [2] => 3
        )

    [1] => Array
        (
            [0] => 4
            [1] => 5
            [2] => 6
        )

    [2] => Array
        (
            [0] => 7
            [1] => 8
            [2] => 9
        )

    [3] => Array
        (
            [0] => 10
        )
)

With array_chunk we can split a large array into smaller chunks, which in some cases makes the subsequent processing easier to manage.
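
array_chunk also accepts an optional third parameter, preserve_keys. When it is set to true, each chunk keeps the original keys instead of being reindexed from zero, which is useful when the keys carry meaning (for example, record IDs). A minimal sketch:

 $input = ['a' => 1, 'b' => 2, 'c' => 3, 'd' => 4, 'e' => 5];
$chunks = array_chunk($input, 2, true); // true keeps the original string keys in every chunk
print_r($chunks);

Here the chunks come back as ['a' => 1, 'b' => 2], ['c' => 3, 'd' => 4] and ['e' => 5] rather than being renumbered from 0.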

2. Introduction to array_intersect function

The array_intersect function finds the values that two arrays have in common. It returns an array containing every value of the first array that also appears in the second array, and it preserves the keys of the first array, which is why the keys 2, 3 and 4 show up in the output below.

 $array1 = [1, 2, 3, 4, 5];
$array2 = [3, 4, 5, 6, 7];
$intersection = array_intersect($array1, $array2);
print_r($intersection);

Output:

 Array
(
    [2] => 3
    [3] => 4
    [4] => 5
)

3. Combining array_chunk and array_intersect to find duplicates

To find duplicates in an array, we can use array_chunk to split the array into several small chunks and then use array_intersect to find the elements shared between different chunks. Suppose we have an array with a large amount of data and we want to find the duplicates in it.

Here is a simple example showing how to combine array_chunk and array_intersect to find duplicates:

 // Suppose this is the array we want to process
$array = [1, 2, 3, 4, 5, 3, 6, 7, 8, 9, 10, 3, 2];

// Split the array into smaller chunks
$chunks = array_chunk($array, 3);

// Find duplicates between blocks
$duplicates = [];
for ($i = 0; $i < count($chunks); $i++) {
    for ($j = $i + 1; $j < count($chunks); $j++) {
        $intersection = array_intersect($chunks[$i], $chunks[$j]);
        if (!empty($intersection)) {
            $duplicates = array_merge($duplicates, $intersection);
        }
    }
}

// Output duplicates
$duplicates = array_values(array_unique($duplicates)); // Remove repeated entries and reindex the result
print_r($duplicates);

Output:

 Array
(
    [0] => 3
    [1] => 2
)
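
Note that this chunk-by-chunk comparison only reports values that happen to land in different chunks; a value repeated inside a single chunk (for example [4, 4, 5]) would never appear in any intersection. As a quick cross-check, the sketch below uses array_count_values, which counts every occurrence regardless of chunk boundaries:

 // Cross-check: count every value and keep the ones that occur more than once
$counts = array_count_values($array);
$allDuplicates = array_keys(array_filter($counts, function ($count) {
    return $count > 1;
}));
print_r($allDuplicates); // For the sample array above this prints 2 and 3

Keep in mind that array_count_values only works on integer and string values.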

4. Things to note when processing big data

Although the array_chunk and array_intersect functions work well on small data sets, the following points need to be considered when processing large amounts of data:

  • Memory usage: array_chunk builds a new set of smaller arrays alongside the original, which increases memory usage. If the data set is very large, consider processing it block by block instead of loading all the data into memory at once.

  • Efficiency: array_intersect scans its input arrays on every call, and the nested loop above compares every pair of chunks, which becomes slow as the data set grows. Consider using a more efficient data structure, such as a hash table (an associative array in PHP), to find duplicates.

  • Batch processing: for extremely large data sets, processing the array in batches is more efficient than handling the entire array at once. Generators can be used to avoid loading all the data into memory at the same time (see the sketch after this list).
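
As a rough sketch of the last two points, the example below streams values through a generator and tracks what has already been seen in an associative array (PHP's built-in hash table), so duplicates are found in a single pass without comparing chunks pairwise. The loadValues() generator is a hypothetical stand-in for whatever source the data really comes from, such as a file or a database cursor:

 // Hypothetical data source: yields values one at a time instead of building a full array
function loadValues(): Generator {
    $data = [1, 2, 3, 4, 5, 3, 6, 7, 8, 9, 10, 3, 2];
    foreach ($data as $value) {
        yield $value;
    }
}

$seen = [];        // Values encountered so far (associative array used as a hash set)
$duplicates = [];  // Values that appeared more than once
foreach (loadValues() as $value) {
    if (isset($seen[$value])) {
        $duplicates[$value] = true;
    } else {
        $seen[$value] = true;
    }
}

print_r(array_keys($duplicates)); // Prints 3 and 2 for the sample data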

5. Summary

  • Using array_chunk allows you to split large arrays into small arrays, making it easier to process data.

  • Using array_intersect helps us find the elements that two arrays have in common.

  • When processing big data, special attention should be paid to memory management and efficiency issues. You can consider processing data in batches and using more efficient data structures to optimize performance.

Hopefully the examples in this article help you use array_chunk and array_intersect more confidently to find duplicates in arrays, and to take appropriate measures to keep performance acceptable when the data set is large.