Problem description
TL;DR: I have a CMS that stores attachments (opaque files) using the SHA-1 of the file contents as the filename. Given that I already know the SHA-1 hashes of an uploaded file and a stored file match, how can I verify that the two files are really identical? I'd like this to be fast.
Long version:
When a user uploads a new file to the system, I compute the SHA-1 hash of the uploaded file contents and then check whether a file with an identical hash already exists in the storage backend. PHP puts the uploaded file in /tmp before my code gets to run, and I then run sha1sum against the uploaded file to get the SHA-1 hash of its contents. From that hash I compute the fanout and decide the storage directory under the NFS-mounted directory hierarchy. (For example, if the SHA-1 hash of a file's contents is 37aefc1e145992f2cc16fabadcfe23eede5fb094, the permanent file name is /nfs/data/files/37/ae/fc1e145992f2cc16fabadcfe23eede5fb094.) In addition to saving the actual file contents, I INSERT a new row into an SQL database for the user-submitted metadata (e.g. Content-Type, original filename, datestamp, etc.).
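The fanout described above can be sketched roughly as follows. This is only an illustration that assumes the two-level 2+2 character split shown in the example path; the function name storagePathForHash and the $root parameter are hypothetical and do not come from the original system.

```php
<?php
// Hypothetical helper: maps a SHA-1 hex digest to a fanned-out storage path.
// Assumes the layout from the example: <root>/<chars 0-1>/<chars 2-3>/<remaining 36 chars>.
function storagePathForHash(string $sha1, string $root = '/nfs/data/files'): string
{
    // Reject anything that is not a 40-character lowercase hex digest,
    // so that user-influenced input can never escape the storage root.
    if (!preg_match('/^[0-9a-f]{40}$/', $sha1)) {
        throw new InvalidArgumentException('Expected a 40-character lowercase hex SHA-1 digest');
    }

    return sprintf(
        '%s/%s/%s/%s',
        $root,
        substr($sha1, 0, 2),  // first directory level
        substr($sha1, 2, 2),  // second directory level
        substr($sha1, 4)      // file name: the remaining 36 characters
    );
}

echo storagePathForHash('37aefc1e145992f2cc16fabadcfe23eede5fb094'), "\n";
// prints /nfs/data/files/37/ae/fc1e145992f2cc16fabadcfe23eede5fb094
```

The digest itself could come from sha1_file() on the temporary upload, which returns the same lowercase hex string that sha1sum prints.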
The corner case I'm currently working out is the one where a newly uploaded file has a SHA-1 hash that matches the hash of an existing file in the storage backend. I know that the chances of this happening by accident are astronomically low, but I'd like to be sure. (For the deliberate case, see https://shattered.io/)
Given two filenames, $file_a and $file_b, how can I quickly check whether both files have identical contents? Assume the files are too big to be loaded into memory. With Python, I'd use filecmp.cmp(), but PHP does not seem to have anything similar. I know this can be done with fread(), aborting as soon as a non-matching byte is found, but I'd rather not write that code.
Recommended answer
If you already have one SHA-1 sum, you can simply do:
if ($known_sha1 == sha1_file($new_file))
Otherwise:
if (filesize($file_a) == filesize($file_b)
&& md5_file($file_a) == md5_file($file_b)
)
The file size is checked as well, to guard somewhat against a hash collision (which is already very unlikely). MD5 is used here because it is typically faster than the SHA algorithms, although its 128-bit digest is shorter and it is known to be vulnerable to deliberate collisions.
Update:
This is how to compare two files against each other exactly, byte for byte.
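function compareFiles($file_a, $file_b)
{
    // Files of different sizes cannot be identical, so skip reading them.
    if (filesize($file_a) !== filesize($file_b)) {
        return false;
    }

    $chunksize = 4096;
    $fp_a = fopen($file_a, 'rb');
    $fp_b = fopen($file_b, 'rb');
    if ($fp_a === false || $fp_b === false) {
        // Treat a file that cannot be opened as "not identical".
        if ($fp_a !== false) { fclose($fp_a); }
        if ($fp_b !== false) { fclose($fp_b); }
        return false;
    }

    $identical = true;
    while (!feof($fp_a) && !feof($fp_b)) {
        $d_a = fread($fp_a, $chunksize);
        $d_b = fread($fp_b, $chunksize);
        // Stop at the first read error or the first chunk that differs.
        if ($d_a === false || $d_b === false || $d_a !== $d_b) {
            $identical = false;
            break;
        }
    }

    fclose($fp_a);
    fclose($fp_b);
    return $identical;
}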
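One way such a comparison might be wired into the upload flow from the question is sketched below. The function name classifyUpload and the three return labels are hypothetical; the comparison is passed in as a callable, so any function that takes two paths and returns a bool can be used.

```php
<?php
// Hypothetical helper: decides what to do with an upload whose SHA-1 maps to $stored_path.
// $same_contents is any callable (string $a, string $b): bool that compares two files.
function classifyUpload(string $tmp_path, string $stored_path, callable $same_contents): string
{
    if (!file_exists($stored_path)) {
        return 'new';        // nothing is stored under this hash yet
    }
    if ($same_contents($tmp_path, $stored_path)) {
        return 'duplicate';  // identical bytes: the stored file can be reused
    }
    return 'collision';      // same hash, different bytes: needs special handling
}
```

Passing 'compareFiles' as the third argument would make the byte-for-byte check run only when a file with the same hash already exists, so a brand-new upload is read only once, for hashing.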