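Background
File names can be almost anything, and the most common problems are garbled Chinese characters and punctuation that is not handled correctly. With a Chinese file name, for example, either the name arrives converted to %xx form or it is cut off at a space in the URL. Likewise, a download file name that contains an ASCII comma or exclamation mark can cause the download request to be processed incorrectly.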
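Cause
When a file is downloaded over HTTP, the server has to set Content-Type and Content-Disposition in the HTTP response headers. The former describes the file type; the latter specifies the name the downloaded file is saved under and the encoding rule that applies to that name.
According to RFC 3986, special characters in a URL are escaped in "%xx" form (a % followed by two hexadecimal digits). The relevant passage reads: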
A percent-encoding mechanism is used to represent a data octet in a component when that octet’s corresponding character is outside the allowed set or is being used as a delimiter of, or within, the component. A percent-encoded octet is encoded as a character triplet, consisting of the percent character “%” followed by the two hexadecimal digits representing that octet’s numeric value. For example, “%20” is the percent-encoding for the binary octet “00100000” (ABNF: %x20), which in US-ASCII corresponds to the space character (SP). Section 2.4 describes when percent-encoding and decoding is applied.
pct-encoded = "%" HEXDIG HEXDIG
The uppercase hexadecimal digits ‘A’ through ‘F’ are equivalent to the lowercase digits ‘a’ through ‘f’, respectively. If two URIs differ only in the case of hexadecimal digits used in percent-encoded octets, they are equivalent. For consistency, URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings.
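To make the quoted rule concrete, here is a minimal sketch of percent-encoding in Java. The class and method names (PercentEncodingDemo, percentEncode) are made up for this illustration, and the sketch assumes the text is first converted to UTF-8 octets, which RFC 3986 itself does not mandate for every component.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any library.
final class PercentEncodingDemo {
    // RFC 3986 "unreserved" characters, which never need escaping.
    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    static String percentEncode(String input) {
        StringBuilder out = new StringBuilder();
        // Assumption: the string is turned into octets using UTF-8.
        for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
            int octet = b & 0xFF;
            if (octet < 0x80 && UNRESERVED.indexOf((char) octet) >= 0) {
                out.append((char) octet);
            } else {
                // "%" followed by two uppercase hexadecimal digits.
                out.append('%').append(String.format("%02X", octet));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(percentEncode("a b,c!.txt"));       // a%20b%2Cc%21.txt
        System.out.println(percentEncode("\u4e2d\u6587.txt")); // %E4%B8%AD%E6%96%87.txt
    }
}
```

The second call encodes a two-character Chinese name (written with \u escapes so the source file stays ASCII). Each character becomes three UTF-8 octets, hence six %xx triplets.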
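ISO-8859-1 is a single-byte encoding that is backward compatible with ASCII. Its code range is 0x00-0xFF: 0x00-0x7F is identical to ASCII, 0x80-0x9F holds control characters, and 0xA0-0xFF holds printable letters and symbols. The HTTP specification (RFC 7230) says the following about the charset of header field values: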
Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.
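The garbling itself is easy to reproduce. The sketch below assumes a sender that writes the raw UTF-8 bytes of a file name into a header and a recipient that reads those bytes as ISO-8859-1; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only illustrates the charset mismatch.
final class HeaderCharsetDemo {
    // What a recipient sees if UTF-8 bytes are interpreted as ISO-8859-1.
    static String misreadAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The misreading is reversible because ISO-8859-1 maps every byte
    // value 0x00-0xFF to exactly one character.
    static String recoverFromLatin1(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    // True if the value can be sent as plain US-ASCII, as the RFC recommends.
    static boolean isAscii(String value) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        String name = "\u4e2d\u6587"; // a two-character Chinese name
        String garbled = misreadAsLatin1(name);
        System.out.println(garbled.length());                        // 6
        System.out.println(recoverFromLatin1(garbled).equals(name)); // true
        System.out.println(isAscii(name));                           // false
    }
}
```

Two characters turn into six, which is the familiar mojibake. A check like isAscii is one way to decide whether a file name needs extra encoding before it goes into a header.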
Solution
1. Specify the encoding in the header. URLEncoder encodes a space as "+", so it has to be replaced with "%20".
response.addHeader(HttpHeaders.CONTENT_DISPOSITION, "attachment;filename*=UTF-8''" + URLEncoder.encode(filename, "UTF-8").replaceAll("\\+", "%20"));
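A slightly fuller variant, offered as a sketch rather than a drop-in replacement: it adds a plain ASCII filename fallback next to filename* for clients that ignore the extended parameter, and it also escapes "*", which URLEncoder leaves untouched. The class ContentDispositionSketch and its methods are hypothetical, and replacing unsupported characters with "_" is an assumption of this example, not a rule from any specification.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper; names and fallback policy are assumptions.
final class ContentDispositionSketch {
    // Percent-encode the name as UTF-8 for the filename* parameter.
    static String encodeExtValue(String filename) {
        try {
            return URLEncoder.encode(filename, "UTF-8")
                    .replace("+", "%20")  // URLEncoder writes spaces as "+"
                    .replace("*", "%2A"); // URLEncoder leaves "*" unescaped
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    // ASCII-only fallback: anything outside printable ASCII, plus the
    // quote and backslash characters, becomes "_".
    static String asciiFallback(String filename) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < filename.length(); i++) {
            char c = filename.charAt(i);
            boolean safe = c >= 0x20 && c < 0x7F && c != '"' && c != '\\';
            sb.append(safe ? c : '_');
        }
        return sb.toString();
    }

    static String build(String filename) {
        return "attachment; filename=\"" + asciiFallback(filename)
                + "\"; filename*=UTF-8''" + encodeExtValue(filename);
    }

    public static void main(String[] args) {
        // attachment; filename="a b,c!.txt"; filename*=UTF-8''a%20b%2Cc%21.txt
        System.out.println(build("a b,c!.txt"));
    }
}
```

The result of build(filename) can be passed to response.addHeader(HttpHeaders.CONTENT_DISPOSITION, ...) in place of the concatenation above. Replacing "+" after encoding is safe because URLEncoder turns a literal "+" in the name into "%2B".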
2. Specify the Tomcat encoding
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" URIEncoding="UTF-8" />
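Why the charset used to decode the URI matters can be shown with the JDK's URLDecoder. This is only an illustration of the effect and not how Tomcat decodes URIs internally; the class name UriDecodingDemo is made up for this example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; URLDecoder stands in for the container here.
final class UriDecodingDemo {
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        String encoded = "%E4%B8%AD%E6%96%87"; // UTF-8 octets of a two-character Chinese name

        // Decoded with the charset the client used: two characters.
        System.out.println(decode(encoded, "UTF-8").length());      // 2

        // Decoded as ISO-8859-1: six unrelated characters, i.e. garbled text.
        System.out.println(decode(encoded, "ISO-8859-1").length()); // 6
    }
}
```

Setting URIEncoding="UTF-8" on the Connector tells Tomcat to take the first path, which matters most on older Tomcat versions whose default was ISO-8859-1.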
3. Add an encoding filter to the application
<filter>
    <filter-name>springUtf8Encoding</filter-name>
    <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
    <init-param>
        <param-name>encoding</param-name>
        <param-value>UTF-8</param-value>
    </init-param>
    <init-param>
        <param-name>forceEncoding</param-name>
        <param-value>true</param-value>
    </init-param>
</filter>
<filter-mapping>
    <filter-name>springUtf8Encoding</filter-name>
    <url-pattern>/*</url-pattern>
</filter-mapping>
References:
blog.robotshell.org/2012/deal-with-http-header-encoding-for-file-download/
borninsummer.com/2016/12/07/http-charset/