Problem description
I'm generating an XML file with PHP using DOMDocument and I need to handle Asian characters. I'm pulling data from an MSSQL 2008 server using the pdo_mssql driver, and I apply utf8_encode() to the XML attribute values. Everything works fine as long as there are no special characters.
The server is MS SQL Server 2008 SP3.
The database, table and column collations are all SQL_Latin1_General_CP1_CI_AS.
I'm using PHP 5.2.17.
Here's my PDO object:
$pdo = new PDO("mssql:host=MyServer,1433;dbname=MyDatabase", "user123", "password123");
My query is a basic SELECT.
I know that storing special characters in SQL_Latin1_General_CP1_CI_AS columns isn't great, but ideally I'd like to make it work without changing the collation, because other non-PHP programs already use that column and it works fine for them. In SQL Server Management Studio I can see the Asian characters correctly.
Considering all the details above, how should I process the data?
Recommended answer
I found a way to solve it, so hopefully this will be helpful to someone.
First, SQL_Latin1_General_CP1_CI_AS seems to behave like a strange mix of CP-1252 and a 2-byte Unicode encoding. The basic characters are CP-1252, which is why calling utf8_encode() was all I had to do for them and everything worked. The Asian and other special characters are encoded on 2 bytes, and the PHP pdo_mssql driver seems to dislike varying-length characters: it appears to do a CAST to varchar (instead of nvarchar), and then all the 2-byte characters become question marks ('?').
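The question-mark symptom can be reproduced without a database. The following is only a sketch of what a lossy conversion to code page 1252 does to text, not the driver's actual code. It assumes the mbstring extension is loaded, and `to_cp1252_lossy` is a hypothetical helper name. The input strings are written as UTF-8 byte escapes so that the file's own encoding does not matter.

```php
<?php
// Hypothetical helper: imitate a lossy conversion of UTF-8 text to CP-1252.
// Assumption: the mbstring extension is available.
function to_cp1252_lossy($utf8)
{
    // Replace characters that have no CP-1252 equivalent with '?' (0x3F).
    mb_substitute_character(0x3F);
    return mb_convert_encoding($utf8, 'Windows-1252', 'UTF-8');
}

// "café" in UTF-8: every character exists in CP-1252.
$latin = to_cp1252_lossy("caf\xC3\xA9");
// Two CJK characters (U+4E2D, U+6587) in UTF-8: no CP-1252 equivalent.
$asian = to_cp1252_lossy("\xE4\xB8\xAD\xE6\x96\x87");

// Print the resulting bytes as hex to make the loss visible.
echo bin2hex($latin), "\n";
echo bin2hex($asian), "\n";
```

Once a character has been replaced by '?', no later call to utf8_encode() or iconv() can bring it back, so the original bytes have to reach PHP untouched.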
I fixed it by casting the column to binary in the query and then rebuilding the text in PHP:
SELECT CAST(MY_COLUMN AS VARBINARY(MAX)) FROM MY_TABLE;
In PHP:
// $bin holds the VARBINARY value returned by the query above
// Binary to hexadecimal
$hex = bin2hex($bin);
// And then from hex back to a byte string (two hex digits per byte)
$str = pack('H*', $hex);
// And then from UCS-2LE/SQL_Latin1_General_CP1_CI_AS (that's the column format in the DB) to UTF-8
$str = iconv('UCS-2LE', 'UTF-8', $str);
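To tie this back to the XML generation in the question, here is a minimal sketch of how the converted value could be written into an attribute. The helper names `ucs2le_to_utf8` and `build_item_xml`, the `item` element and the `name` attribute are all made up for illustration, and a hard-coded byte string stands in for a value fetched with the VARBINARY query.

```php
<?php
// Hypothetical helper: convert raw UCS-2LE bytes (as fetched through the
// CAST to VARBINARY) into a UTF-8 string.
function ucs2le_to_utf8($bin)
{
    // An odd byte count cannot consist of whole 2-byte code units.
    if (strlen($bin) % 2 !== 0) {
        throw new InvalidArgumentException('Odd number of bytes in UCS-2LE data');
    }
    return iconv('UCS-2LE', 'UTF-8', $bin);
}

// Hypothetical helper: build a one-element XML document whose "name"
// attribute holds the converted value.
function build_item_xml($bin)
{
    $doc = new DOMDocument('1.0', 'UTF-8');
    $item = $doc->createElement('item');
    // DOMDocument expects UTF-8 input, so utf8_encode() is not applied here.
    $item->setAttribute('name', ucs2le_to_utf8($bin));
    $doc->appendChild($item);
    return $doc->saveXML();
}

// U+4E2D and U+6587 as UCS-2LE bytes, standing in for a database value.
$bin = "\x2D\x4E\x87\x65";
echo build_item_xml($bin);
```

Because the string is already UTF-8 after iconv(), calling utf8_encode() on it as well would double-encode the characters, so in this sketch the conversion happens exactly once.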