Performance optimization is critical when performing large-scale API crawling tasks. For developers using PHP and cURL, rational use of curl_share_init() can significantly reduce system resource consumption and improve efficiency. This article will introduce in detail the function, usage scenarios and how to correctly implement it in the code.
curl_share_init() is a function provided by the PHP cURL extension to initialize a shared handle (cURL Share Handle). By sharing handles, you can share resources such as DNS cache, SSL session cache, cookies, etc. between multiple cURL easy handles, avoiding repeated connections or repeated resolution of domain names, thereby significantly improving the performance of multiple concurrent requests.
In short, the core function of this function is "resource sharing and reduce duplicate work".
When you need to make hundreds or thousands of HTTP requests at the same time, for example:
Crawl a large REST API database;
Regularly synchronize multiple third-party data sources;
Access the same API endpoint with high concurrency.
If each request establishes a DNS query and SSL handshake separately, this can cause a huge performance waste. Using curl_share_init() can allow these requests to share partial cache, reducing overhead.
Here is a sample code for optimizing large-scale API crawling tasks using curl_share_init() :
<?php
// Initialize the shared handle
$sh = curl_share_init();
// Set the shared resource type,We share here DNS and Cookie
curl_share_setopt($sh, CURLSHOPT_SHARE, CURL_LOCK_DATA_DNS);
curl_share_setopt($sh, CURLSHOPT_SHARE, CURL_LOCK_DATA_COOKIE);
$urls = [
'https://m66.net/api/data1',
'https://m66.net/api/data2',
'https://m66.net/api/data3',
// You can continue to add more API address
];
$multiHandle = curl_multi_init();
$curlHandles = [];
// Initialize each cURL easy Handle
foreach ($urls as $i => $url) {
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// 绑定共享Handle
curl_setopt($ch, CURLOPT_SHARE, $sh);
curl_multi_add_handle($multiHandle, $ch);
$curlHandles[$i] = $ch;
}
// Perform concurrent requests
$running = null;
do {
curl_multi_exec($multiHandle, $running);
curl_multi_select($multiHandle);
} while ($running > 0);
// Get results
foreach ($curlHandles as $i => $ch) {
$response = curl_multi_getcontent($ch);
echo "Response from {$urls[$i]}:\n";
echo $response . "\n";
curl_multi_remove_handle($multiHandle, $ch);
curl_close($ch);
}
// Free up resources
curl_share_close($sh);
curl_multi_close($multiHandle);
Shared resource settings <br> Commonly used to specify shared resource types using curl_share_setopt() include:
CURL_LOCK_DATA_DNS : Shared DNS cache.
CURL_LOCK_DATA_COOKIE : Share cookies.
CURL_LOCK_DATA_SSL_SESSION : Share the SSL session.
Don't share across processes
The shared handle to cURL applies only to the same process. Multi-process crawling requires the use of other mechanisms (such as external caches or databases).
Release resources correctly <br> After the crawling task is over, remember to call curl_share_close() and curl_multi_close( ) to avoid memory leakage
By enabling DNS and SSL session sharing, you can avoid:
Repeat the same domain name (especially for a large number of requests from the same domain);
Repeat the SSL handshake (especially HTTPS requests).
In actual testing, for hundreds of concurrent requests, this optimization can significantly reduce CPU and network load and improve overall throughput.