Problem description
I'm trying to download a binary file using XMLHttpRequest (in a recent WebKit) and base64-encode its contents using these simple functions:
function getBinary(file){
var xhr = new XMLHttpRequest();
xhr.open("GET", file, false);
xhr.overrideMimeType("text/plain; charset=x-user-defined");
xhr.send(null);
return xhr.responseText;
}
function base64encode(binary) {
return btoa(unescape(encodeURIComponent(binary)));
}
var binary = getBinary('http://some.tld/sample.pdf');
var base64encoded = base64encode(binary);
As a side note, everything above is standard JavaScript, including btoa() and encodeURIComponent(): https://developer.mozilla.org/en/DOM/window.btoa
This works pretty smoothly, and I can even decode the base64 contents using JavaScript:
function base64decode(base64) {
return decodeURIComponent(escape(atob(base64)));
}
var decodedBinary = base64decode(base64encoded);
decodedBinary === binary // true
Now I want to decode the base64-encoded contents in Python, which consumes a JSON string to get the base64encoded string value. Naively, this is what I do:
import urllib
import base64
# ... retrieving of base64 encoded string through JSON
base64encoded = "77+9UE5HDQ……………oaCgA="  # renamed so it does not shadow the base64 module
source_contents = urllib.unquote(base64.b64decode(base64encoded))
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()
But the resulting file is invalid; it looks like the operation is messed up by UTF-8, encoding, or something that is still unclear to me.
If I try to decode the UTF-8 contents before writing them to the destination file, an error is raised:
import urllib
import base64
# ... retrieving of base64 encoded string through JSON
base64encoded = "77+9UE5HDQ……………oaCgA="  # renamed so it does not shadow the base64 module
source_contents = urllib.unquote(base64.b64decode(base64encoded)).decode('utf-8')
destination_file = open(destination, 'wb')
destination_file.write(source_contents)
destination_file.close()
$ python test.py
// ...
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 0: ordinal not in range(128)
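One way to see what went wrong is to decode only the first eight characters of the string above, `77+9UE5H`, and look at the raw bytes. This is a minimal Python 3 sketch (the snippets in the question are Python 2); `PNG_SIGNATURE_START` is a name introduced here for illustration.

```python
import base64

# First eight characters of the base64 string shown in the question.
prefix = "77+9UE5H"
raw = base64.b64decode(prefix)

# Every PNG file starts with these four bytes.
PNG_SIGNATURE_START = b"\x89PNG"

print(raw)                             # b'\xef\xbf\xbdPNG'
print(repr(raw.decode("utf-8")))       # '\ufffdPNG' (U+FFFD is the replacement character)
print(raw[:4] == PNG_SIGNATURE_START)  # False
```

The payload is the UTF-8 encoding of a text string rather than the file's bytes, and the leading 0x89 byte appears to have been replaced by U+FFFD before encoding. If so, that byte is already lost, and nothing done on the Python side can restore it.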
As a side note, here's a screenshot of two textual representations of the same file; on the left, the original; on the right, the one created from the base64-decoded string: http://cl.ly/0U3G34110z3c132O2e2x
Is there a known trick to circumvent these encoding problems when recreating the file? How would you achieve this yourself?
Any help or hint is much appreciated :)
Recommended answer
So I'm answering my own question (sorry for that), but I think it might be useful for someone as lost as I was ;)
So you have to use ArrayBuffer: set the responseType property of your XMLHttpRequest instance to arraybuffer to retrieve a native array of bytes, which can then be converted to base64 using the following convenient function (found elsewhere; may its author be blessed):
function base64ArrayBuffer(arrayBuffer) {
var base64 = ''
var encodings = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/'
var bytes = new Uint8Array(arrayBuffer)
var byteLength = bytes.byteLength
var byteRemainder = byteLength % 3
var mainLength = byteLength - byteRemainder
var a, b, c, d
var chunk
// Main loop deals with bytes in chunks of 3
for (var i = 0; i < mainLength; i = i + 3) {
// Combine the three bytes into a single integer
chunk = (bytes[i] << 16) | (bytes[i + 1] << 8) | bytes[i + 2]
// Use bitmasks to extract 6-bit segments from the triplet
a = (chunk & 16515072) >> 18 // 16515072 = (2^6 - 1) << 18
b = (chunk & 258048) >> 12 // 258048 = (2^6 - 1) << 12
c = (chunk & 4032) >> 6 // 4032 = (2^6 - 1) << 6
d = chunk & 63 // 63 = 2^6 - 1
// Convert the raw binary segments to the appropriate ASCII encoding
base64 += encodings[a] + encodings[b] + encodings[c] + encodings[d]
}
// Deal with the remaining bytes and padding
if (byteRemainder == 1) {
chunk = bytes[mainLength]
a = (chunk & 252) >> 2 // 252 = (2^6 - 1) << 2
// Set the 4 least significant bits to zero
b = (chunk & 3) << 4 // 3 = 2^2 - 1
base64 += encodings[a] + encodings[b] + '=='
} else if (byteRemainder == 2) {
chunk = (bytes[mainLength] << 8) | bytes[mainLength + 1]
a = (chunk & 64512) >> 10 // 64512 = (2^6 - 1) << 10
b = (chunk & 1008) >> 4 // 1008 = (2^6 - 1) << 4
// Set the 2 least significant bits to zero
c = (chunk & 15) << 2 // 15 = 2^4 - 1
base64 += encodings[a] + encodings[b] + encodings[c] + '='
}
return base64
}
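To make the bit manipulation in the main loop easier to follow, here is a small Python sketch of the same idea for a single three-byte chunk, checked against the standard `base64` module. `encode_triplet` and `ALPHABET` are illustrative names, not part of the original function.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_triplet(b0, b1, b2):
    """Encode three byte values (0-255) as four base64 characters."""
    # Combine the three bytes into a single 24-bit integer.
    chunk = (b0 << 16) | (b1 << 8) | b2
    # Extract four 6-bit groups, most significant first.
    a = (chunk >> 18) & 63
    b = (chunk >> 12) & 63
    c = (chunk >> 6) & 63
    d = chunk & 63
    return ALPHABET[a] + ALPHABET[b] + ALPHABET[c] + ALPHABET[d]

triplet = b"PNG"
encoded = encode_triplet(*triplet)
print(encoded)  # UE5H
print(encoded == base64.b64encode(triplet).decode("ascii"))  # True
```

Shifting first and masking with 63 afterwards is equivalent to the mask-then-shift form used in the JavaScript version.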
So here's working code:
var xhr = new XMLHttpRequest();
xhr.open('GET', 'http://some.tld/favicon.png', true); // asynchronous: current browsers reject responseType on synchronous requests made from a page
xhr.responseType = 'arraybuffer';
xhr.onload = function(e) {
console.log(base64ArrayBuffer(e.currentTarget.response));
};
xhr.send();
This will log a valid base64-encoded string representing the binary file contents.
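Once the string is a plain base64 encoding of the bytes, the Python side no longer needs `urllib` or any UTF-8 handling. A minimal Python 3 sketch, assuming the JSON payload carries the string under a key named `base64encoded` (the key, the function name `write_base64_file`, and the sample bytes are illustrative assumptions):

```python
import base64
import json
import os
import tempfile

def write_base64_file(json_text, destination, key="base64encoded"):
    """Decode the base64 string stored under `key` and write the bytes to disk."""
    payload = json.loads(json_text)
    data = base64.b64decode(payload[key])
    # Binary mode, so the bytes are written exactly as decoded.
    with open(destination, "wb") as destination_file:
        destination_file.write(data)
    return data

# Round trip with the eight-byte PNG signature standing in for a real file.
original = b"\x89PNG\r\n\x1a\n"
json_text = json.dumps({"base64encoded": base64.b64encode(original).decode("ascii")})
destination = os.path.join(tempfile.mkdtemp(), "sample.png")

written = write_base64_file(json_text, destination)
with open(destination, "rb") as check:
    on_disk = check.read()
print(written == original and on_disk == original)  # True
```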
For older browsers that lack ArrayBuffer and where btoa() fails on some characters, here's another way to get a base64-encoded version of any binary file:
function getBinary(file){
var xhr = new XMLHttpRequest();
xhr.open("GET", file, false);
xhr.overrideMimeType("text/plain; charset=x-user-defined");
xhr.send(null);
return xhr.responseText;
}
function base64Encode(str) {
var CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
var out = "", i = 0, len = str.length, c1, c2, c3;
while (i < len) {
c1 = str.charCodeAt(i++) & 0xff;
if (i == len) {
out += CHARS.charAt(c1 >> 2);
out += CHARS.charAt((c1 & 0x3) << 4);
out += "==";
break;
}
c2 = str.charCodeAt(i++);
if (i == len) {
out += CHARS.charAt(c1 >> 2);
out += CHARS.charAt(((c1 & 0x3)<< 4) | ((c2 & 0xF0) >> 4));
out += CHARS.charAt((c2 & 0xF) << 2);
out += "=";
break;
}
c3 = str.charCodeAt(i++);
out += CHARS.charAt(c1 >> 2);
out += CHARS.charAt(((c1 & 0x3) << 4) | ((c2 & 0xF0) >> 4));
out += CHARS.charAt(((c2 & 0xF) << 2) | ((c3 & 0xC0) >> 6));
out += CHARS.charAt(c3 & 0x3F);
}
return out;
}
console.log(base64Encode(getBinary('http://www.google.fr/images/srpr/logo3w.png')));
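The bit masks in `base64Encode` matter because of how `x-user-defined` text is decoded: bytes 0x80 to 0xFF come back as the code points U+F780 to U+F7FF, so only the low 8 bits of each character carry the original byte. The Python sketch below simulates that mapping, assuming the browser follows it; the helper names are hypothetical. It also shows why UTF-8 encoding such a string, as `encodeURIComponent` does, no longer yields the original bytes.

```python
def to_x_user_defined(data):
    """Mimic decoding bytes as x-user-defined text."""
    return "".join(chr(b) if b < 0x80 else chr(0xF700 + b) for b in data)

def from_x_user_defined(text):
    """Recover the bytes by keeping only the low 8 bits of each character."""
    return bytes(ord(ch) & 0xFF for ch in text)

original = b"\x89PNG\r\n\x1a\n"
response_text = to_x_user_defined(original)

# Masking with 0xFF gives the original bytes back.
recovered = from_x_user_defined(response_text)
print(recovered == original)  # True

# UTF-8 encoding the text turns the single 0x89 byte into three bytes.
as_utf8 = response_text.encode("utf-8")
print(len(original), len(as_utf8))  # 8 10
```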
Hope this helps others as it helped me.