Problem description
I've been reading up on PHP file upload security, and a few articles recommend renaming the files. For example, the OWASP article Unrestricted File Upload says:
It is recommended to use an algorithm to determine the filenames. For instance, a filename can be a MD5 hash of the name of file plus the date of the day.
If a user uploads a file named Cake Recipe.doc, is there really any reason to rename it to 45706365b7d5b1f35?
If the answer is yes, for whatever reason, then how do you keep track of the original file name and extension?
Recommended answer
To your primary question, whether it is good practice to rename files, the answer is a definite yes, especially if you are creating a form of file repository where users upload files (and filenames) of their choosing, for several reasons:
- Security -- if you have a poorly written application that allows the download of files by name or through direct access (it's horrid, but it happens), it's much harder for a user, malicious or otherwise, to "guess" the names of files.
- Uniqueness -- the likelihood of two different people uploading a file of the same name is very high (e.g. avatar.gif, readme.txt, video.avi, etc.). Using a unique identifier significantly decreases the likelihood that two files will have the same name.
- Versioning -- it is much easier to keep multiple "versions" of a document using unique names. It also avoids the need for additional code that parses a filename in order to change it. A simple example would be renaming document.pdf to document(1).pdf, which becomes more complicated once you stop underestimating users' ability to create horrible names for things.
- Length -- working with known filename lengths is always better than working with unknown filename lengths. I can always know that (my filepath) + (X letters) is a certain length, whereas the length of (my filepath) + (random user filename) is completely unknown.
- OS -- the length issue above can also create problems when attempting to write extremely random or long filenames to a drive. You have to account for special characters, length limits and trimmed filenames (the user may not receive a working file because the extension has been trimmed).
- Execution -- it's easy for the OS to execute a file named .exe, or .php, or (insert other extension). It's hard when there isn't an extension.
- URL encoding -- ensuring the name is URL safe. Cake Recipe.doc is not a URL-safe name, and on some systems (either server or browser side) or in some situations it can cause inconsistencies when the name should be a urlencoded value.
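To make the URL encoding point concrete, here is a small sketch using PHP's built-in rawurlencode() and urlencode(). The stored name in it is a hypothetical example of a generated name (a date prefix plus an md5 hex string), not something prescribed above.

```php
<?php
// Example name taken from the question.
$original = 'Cake Recipe.doc';

// rawurlencode() percent-encodes the space (RFC 3986 style).
$encoded = rawurlencode($original);

// urlencode() uses form encoding, so the same space becomes "+".
$formEncoded = urlencode($original);

// A generated name made only of digits, a-f and "_" (hypothetical format)
// comes out of either function unchanged.
$stored = '20240101_' . md5($original);
$storedIsUrlSafe = (rawurlencode($stored) === $stored)
    && (urlencode($stored) === $stored);

echo $encoded, PHP_EOL;      // Cake%20Recipe.doc
echo $formEncoded, PHP_EOL;  // Cake+Recipe.doc
var_dump($storedIsUrlSafe);  // bool(true)
```

The same original name has two different encoded forms depending on which function a piece of code happens to call, which is the kind of inconsistency a generated name avoids.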
As for storing the information, you would typically do this in a database. This is no different from what you already need, since you must have a way to refer back to the file (who uploaded it, what it is called, occasionally where it is stored, the time of upload, sometimes the size). You are simply adding the actual stored name of the file alongside the name the user gave it.
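A minimal sketch of that bookkeeping might look like the following. The table name uploads, every column name, and both function names are assumptions made for illustration; the answer does not specify a schema. saveUploadRecord() is only defined here, not called, since it needs a real PDO connection.

```php
<?php
/**
 * Build the metadata row for one upload.
 * All keys are hypothetical column names.
 */
function buildUploadRecord(int $userId, string $originalName, string $storedName, int $sizeBytes): array
{
    return [
        'user_id'       => $userId,
        // basename() drops leading directory components from the client-supplied name.
        'original_name' => basename($originalName),
        // Keep the extension separately so it can be restored on download.
        'extension'     => strtolower(pathinfo($originalName, PATHINFO_EXTENSION)),
        'stored_name'   => $storedName,
        'size_bytes'    => $sizeBytes,
        'uploaded_at'   => date('Y-m-d H:i:s'),
    ];
}

/** Insert the record with a prepared statement (hypothetical "uploads" table). */
function saveUploadRecord(PDO $pdo, array $record): void
{
    $sql = 'INSERT INTO uploads
                (user_id, original_name, extension, stored_name, size_bytes, uploaded_at)
            VALUES
                (:user_id, :original_name, :extension, :stored_name, :size_bytes, :uploaded_at)';
    $pdo->prepare($sql)->execute($record);
}

$record = buildUploadRecord(42, 'Cake Recipe.doc', '20240101_' . md5('Cake Recipe.doc'), 24576);
echo $record['original_name'], ' -> ', $record['stored_name'], PHP_EOL;
```

When the file is later served, the application looks up the row by stored name (or by ID) and can present original_name to the user, so the name on disk never has to carry the user's text or extension.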
The OWASP recommendation isn't a bad one -- using the filename and a timestamp (not just the date) would be mostly unique. I take it a step further and include the microtime with the timestamp, and often some other unique bit of information, so that two uploads of the same small file in the same timeframe can't end up with the same name. I also store the date of the upload, which is additional insurance against md5 clashes; these become more probable in systems that store many files over many years. It is incredibly unlikely that you would generate two identical md5s from filename and microtime on the same day. An example would be:
$filename = date('Ymd') . '_' . md5($uploaded_filename . microtime());
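Wrapped in a function, the same idea could be sketched as below. The function name generateStoredName() is hypothetical, and the random_bytes() salt is one possible choice for the "other unique bit of information" mentioned above, not necessarily what the author uses.

```php
<?php
/**
 * Generate a fixed-length stored name: upload date, underscore, md5 hex.
 * The md5 input mixes the uploaded name, microtime() and a random salt
 * (the salt is an assumption added for illustration).
 */
function generateStoredName(string $uploadedFilename): string
{
    $salt = bin2hex(random_bytes(8));
    return date('Ymd') . '_' . md5($uploadedFilename . microtime() . $salt);
}

// Two uploads of the same file name get different stored names.
$a = generateStoredName('Cake Recipe.doc');
$b = generateStoredName('Cake Recipe.doc');

echo $a, PHP_EOL, $b, PHP_EOL;
```

Every result is 41 characters long (8 date digits, an underscore, 32 hex characters), which gives the known-length property from the list above. The md5 here only serves to produce a compact identifier from the inputs.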
My 2 cents.