Use mysqli::get_charset to Prevent Unicode Emoji Storage Failures

M66 2025-05-23

When storing user-generated text in a MySQL database, such as comments, nicknames, or chat messages, a common problem appears: when a user enters an emoji or certain other Unicode symbols, the insert fails. Even when it succeeds, the text may come back garbled or as question marks (?) when it is read and displayed.

This happens because MySQL's utf8 character set (an alias for utf8mb3, still common in older setups) cannot store 4-byte Unicode characters, which include most emoji. Fixing it requires configuration at the database level, and the PHP code must also make sure the client connection uses the right character set. This article shows how to use mysqli::get_charset to check the connection character set and correct it when necessary, so that emoji and other 4-byte characters can be stored.

Problem background

MySQL's utf8 character set stores at most 3 bytes per character, but emoji and other characters outside the Basic Multilingual Plane (code points U+10000 and above) take 4 bytes in UTF-8, so utf8mb4 is required. If any part of the server side (database, table, column, or connection) is not set to utf8mb4, these characters either fail to insert or are truncated or replaced with question marks.
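To make the 3-byte limit concrete, here is a small PHP sketch that prints the UTF-8 byte length of each character in a sample string and picks out the characters in the U+10000 to U+10FFFF range, which are the ones a utf8 (utf8mb3) column cannot hold. The helper name find4ByteChars is hypothetical; the code relies only on core PCRE functions and strlen, which counts bytes rather than characters.

```php
<?php
// Hypothetical helper: return every character that needs 4 bytes in UTF-8,
// i.e. the characters MySQL's 3-byte utf8 (utf8mb3) cannot store.
function find4ByteChars(string $text): array
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8;
    // the class covers the supplementary planes, U+10000 to U+10FFFF.
    if (preg_match_all('/[\x{10000}-\x{10FFFF}]/u', $text, $matches) === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }
    return $matches[0];
}

$sample = "caf\u{00E9} \u{1F600}"; // "café" followed by a grinning-face emoji

// Split into characters (not bytes) and show how many bytes each one uses.
foreach (preg_split('//u', $sample, -1, PREG_SPLIT_NO_EMPTY) as $char) {
    printf("%s => %d byte(s)\n", $char, strlen($char));
}

// Only the emoji is reported: "é" takes 2 bytes, which utf8mb3 can store.
print_r(find4ByteChars($sample));
```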

In practice, three things need to be in place:

  1. The database and its tables use the utf8mb4 character set.

  2. The connection to the database is opened with utf8mb4.

  3. The PHP code verifies that the character set actually in use is correct.
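A complementary check on the PHP side is making sure the strings being sent are valid UTF-8 in the first place, since a correctly configured utf8mb4 connection still cannot store bytes that are not UTF-8. The sketch below uses a hypothetical isValidUtf8() helper built on the fact that preg_match() with the /u modifier returns false for invalid UTF-8; mb_check_encoding($value, 'UTF-8') does the same job if the mbstring extension is installed.

```php
<?php
// Hypothetical helper: true if $value is well-formed UTF-8.
function isValidUtf8(string $value): bool
{
    // With /u, preg_match() returns false when the subject is not valid UTF-8,
    // and 1 when the empty pattern matches a valid string (including "").
    return preg_match('//u', $value) === 1;
}

var_dump(isValidUtf8("Hello \u{1F44B}")); // bool(true)
var_dump(isValidUtf8("\xF0\x9F\x98"));    // bool(false): truncated 4-byte sequence
```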

Check character sets using mysqli::get_charset

PHP's mysqli class provides the get_charset method, which returns an object describing the character set of the current connection (its name in the charset property, plus properties such as collation and max_length). With it, the code can check whether the connection already uses utf8mb4 and, if it does not, switch it with set_charset.

The sample code is as follows:

<?php
// PHP 8.1+ enables mysqli exceptions by default; turn them off so the
// explicit error checks below behave the same on every PHP version.
mysqli_report(MYSQLI_REPORT_OFF);

// Database connection information (replace with your own credentials)
$mysqli = new mysqli('localhost', 'username', 'password', 'database');

// Check whether the connection succeeded
if ($mysqli->connect_errno) {
    die('Connection failed: ' . $mysqli->connect_error);
}

// Check the current connection character set
$charsetInfo = $mysqli->get_charset();
echo 'Current connection character set: ' . $charsetInfo->charset . PHP_EOL;

// If it is not utf8mb4, switch the connection to utf8mb4
if ($charsetInfo->charset !== 'utf8mb4') {
    if (!$mysqli->set_charset('utf8mb4')) {
        die('Failed to set character set: ' . $mysqli->error);
    }
    echo 'The connection character set has been set to utf8mb4' . PHP_EOL;
}

// Example: insert a row containing emoji. The \u{...} escapes keep the
// emoji intact regardless of how this source file itself is encoded.
$stmt = $mysqli->prepare('INSERT INTO messages (content) VALUES (?)');
if ($stmt === false) {
    die('Prepare failed: ' . $mysqli->error);
}
$content = "test Emoji \u{1F600} \u{1F44D}";
$stmt->bind_param('s', $content);

if ($stmt->execute()) {
    echo 'Data inserted successfully!' . PHP_EOL;
} else {
    echo 'Insert failed: ' . $stmt->error . PHP_EOL;
}

$stmt->close();
$mysqli->close();
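The object returned by get_charset carries more than the name. Its max_length property gives the maximum number of bytes per character (4 for utf8mb4, 3 for utf8/utf8mb3), which makes a handy one-line diagnostic. The describeCharset() helper below is a hypothetical sketch; the hand-built stand-in objects let it run without a database, and with a live connection you would pass $mysqli->get_charset() instead.

```php
<?php
// Hypothetical helper: summarize a get_charset() result in one line,
// judging 4-byte support from the max_length property.
function describeCharset(object $info): string
{
    $verdict = $info->max_length >= 4
        ? 'can store 4-byte characters'
        : 'cannot store 4-byte characters';

    return sprintf(
        '%s (%s), up to %d byte(s) per character: %s',
        $info->charset,
        $info->collation,
        $info->max_length,
        $verdict
    );
}

// Stand-ins shaped like get_charset() results, so the sketch runs offline.
$old = (object) ['charset' => 'utf8mb3', 'collation' => 'utf8mb3_general_ci', 'max_length' => 3];
$new = (object) ['charset' => 'utf8mb4', 'collation' => 'utf8mb4_unicode_ci', 'max_length' => 4];

echo describeCharset($old), PHP_EOL;
echo describeCharset($new), PHP_EOL;
```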

Database configuration suggestions

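To fully support emoji and other 4-byte characters, the following server-side settings are needed in addition to the code-level ones:

  1. Character sets for the database, tables, and columns:

     ALTER DATABASE your_database CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
     ALTER TABLE your_table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

     ALTER DATABASE only changes the default for tables created afterwards; the CONVERT TO CHARACTER SET form of ALTER TABLE also converts the existing character columns of your_table.

  2. Server defaults: make sure the [mysqld] section of the MySQL configuration file my.cnf contains the following, then restart the MySQL server:

     character-set-server = utf8mb4
     collation-server = utf8mb4_unicode_ci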
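To confirm that the settings actually took effect, the character-set variables can be read back over the connection. The sketch below uses a hypothetical findNonUtf8mb4Settings() helper that takes a Variable_name => Value map, as produced by SHOW VARIABLES LIKE 'character_set_%', and reports the entries that are not utf8mb4. The sample values are typed in by hand; with a live connection they could come from $mysqli->query("SHOW VARIABLES LIKE 'character_set_%'")->fetch_all(MYSQLI_NUM), turned into a map with array_column($rows, 1, 0). character_set_system and character_set_filesystem are skipped on purpose, since they are normally utf8/utf8mb3 and binary respectively.

```php
<?php
// Variables that should all read utf8mb4 once server and connection are configured.
const CHARSET_VARS_TO_CHECK = [
    'character_set_client',
    'character_set_connection',
    'character_set_results',
    'character_set_database',
    'character_set_server',
];

// Hypothetical helper: return [variable => value] for every checked
// variable that is not utf8mb4 (or is missing from the input).
function findNonUtf8mb4Settings(array $variables): array
{
    $problems = [];
    foreach (CHARSET_VARS_TO_CHECK as $name) {
        $value = $variables[$name] ?? '(missing)';
        if ($value !== 'utf8mb4') {
            $problems[$name] = $value;
        }
    }
    return $problems;
}

// Hand-written sample: the connection is fine, but the database default
// and the server default still need changing.
$sample = [
    'character_set_client'     => 'utf8mb4',
    'character_set_connection' => 'utf8mb4',
    'character_set_results'    => 'utf8mb4',
    'character_set_database'   => 'utf8mb3',
    'character_set_server'     => 'latin1',
    'character_set_system'     => 'utf8mb3',
];
print_r(findNonUtf8mb4Settings($sample));
```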
    

Summary

mysqli::get_charset lets the code check at runtime whether the connection character set is correct and fix it with set_charset when it is not. Combined with utf8mb4 settings for the database, tables, and columns, this resolves the problem of Unicode emoji failing to store, so users can enter emoji in your application without them being lost or garbled.