Problem description
I am using PHP.
My string looks like this:
This is a string test width ??? and some über+strange characters: _like this?
Question
Is there a way to remove non-alphanumeric characters and replace them with a space? Here are some non-alphanumeric characters:
- -
- +
- :
- _
- ?
I've read many threads about this, but their solutions don't support other languages, like this one:
preg_replace("/[^A-Za-z0-9 ]/", '', $string);
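To make the limitation concrete, here is a small sketch of what that ASCII-only pattern does to accented text. The sample input is an assumption made for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample input (not the exact string from the question).
$string = 'über+strange';

// Without the u modifier the pattern works byte by byte: both bytes of the
// UTF-8 encoded "ü" fall outside A-Za-z0-9 and are deleted, and because the
// replacement is an empty string the "+" vanishes and the words are glued.
$result = preg_replace("/[^A-Za-z0-9 ]/", '', $string);

echo $result, PHP_EOL; // berstrange
```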
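Requirements
- My list of non-alphanumeric characters may not be complete.
- My content contains characters from different languages, such as ???ü. There may be many more.
- Non-alphanumeric characters should be replaced with a space. Otherwise the words would be glued together.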
Recommended answer
You can try this:
preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);
\p{L} stands for all alphabetic characters (whatever the alphabet).
\p{N} stands for numbers.
With the u modifier, the characters of the subject string are treated as Unicode characters.
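A minimal sketch of this pattern in use. The sample input is an assumption loosely modeled on the question's string, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample input for illustration.
$string = 'string-test with über+strange characters: _like this?';

// Replace every run of characters that are neither letters (\p{L})
// nor numbers (\p{N}) with a single space. The u modifier makes PCRE
// read the pattern and the subject as UTF-8.
$clean = preg_replace('~[^\p{L}\p{N}]++~u', ' ', $string);

echo $clean, PHP_EOL; // string test with über strange characters like this
```

Note that the trailing "?" also becomes a space, so you may want to call trim() on the result.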
Or this:
preg_replace('~\P{Xan}++~u', ' ', $string);
\p{Xan} contains Unicode letters and digits.
\P{Xan} contains everything that is not a Unicode letter or digit. (Be careful: it contains whitespace too, which you can preserve with ~[^\p{Xan}\s]++~u)
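The difference between the two variants shows up once the input contains whitespace other than a plain space. A small sketch, with a sample input that is an assumption made for illustration (file saved as UTF-8):

```php
<?php
// Hypothetical sample input containing a tab character.
$string = "año 2024:\tüber_cool";

// \P{Xan} also matches whitespace, so the run ":\t" collapses into one space.
$collapsed = preg_replace('~\P{Xan}++~u', ' ', $string);

// Adding \s to the negated class leaves existing whitespace (the tab) untouched.
$preserved = preg_replace('~[^\p{Xan}\s]++~u', ' ', $string);

var_dump($collapsed); // "año 2024 über cool"
var_dump($preserved); // "año 2024 " followed by a tab, then "über cool"
```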
If you want a more specific set of allowed letters, you must replace \p{L} with ranges from the Unicode table.
Example:
preg_replace('~[^a-zà-??-???\d]++~ui', ' ', $string);
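As a sketch of this idea, the pattern below allows only ASCII letters, the accented Latin-1 letters (à-ö and ø-ÿ), and digits. The exact ranges and the sample input are assumptions chosen for illustration; adjust them to the alphabet you need. The file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample input: French accents are inside the allowed ranges,
// while the Polish "Ż" (U+017B) is not.
$string = 'Crème brûlée 5 & Żubr';

// a-z     ASCII letters (upper case too, thanks to the i modifier)
// à-ö     U+00E0..U+00F6, accented Latin-1 letters
// ø-ÿ     U+00F8..U+00FF, skipping the division sign at U+00F7
// \d      digits
$clean = preg_replace('~[^a-zà-öø-ÿ\d]++~ui', ' ', $string);

echo $clean, PHP_EOL; // Crème brûlée 5 ubr
```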
Why use a possessive quantifier (++) here?
~\P{Xan}+~u will give you the same result as ~\P{Xan}++~u. The difference is that with the first, the engine records each backtracking position (which we don't need), whereas with the second it doesn't (as in an atomic group). The result is a small performance gain.
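A quick way to convince yourself that the three spellings are interchangeable for this task is to compare their output. This sketch only checks that the results are equal; it does not measure the performance difference. The sample input is an assumption, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical sample input for illustration.
$string = 'über+strange: _chars?';

$greedy     = preg_replace('~\P{Xan}+~u', ' ', $string);     // ordinary greedy quantifier
$possessive = preg_replace('~\P{Xan}++~u', ' ', $string);    // possessive quantifier
$atomic     = preg_replace('~(?>\P{Xan}+)~u', ' ', $string); // equivalent atomic group

// All three produce the same string.
var_dump($greedy === $possessive && $possessive === $atomic); // bool(true)
```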
I think it's good practice to use possessive quantifiers and atomic groups whenever possible.
However, the PCRE regex engine automatically makes a quantifier possessive in obvious situations (example: a+b => a++b), unless this has been disabled with the PCRE_NO_AUTO_POSSESS option. (http://www.pcre.org/pcre.txt)
More information about possessive quantifiers and atomic groups here (possessive quantifiers) and here (atomic groups) or here.