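When sending data over HTTP, the length of the body sometimes cannot be known in advance, so there is no value for a Content-Length header. The most direct consequence is that the receiver cannot use Content-Length to learn the length of the message body. How, then, does it know when the sender has finished? HTTP/1.1 introduced the Transfer-Encoding header: when its value is chunked, the message body is transmitted using the chunked encoding.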
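HTTP/1.1 has two headers directly related to encoding: Content-Encoding and Transfer-Encoding. Strictly speaking, only Content-Encoding is an entity header; RFC 2616 classifies Transfer-Encoding as a general header.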
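First, Content-Encoding. This header indicates the encoding that has already been applied to the entity. Content-Encoding is part of the entity identified by the request URL itself. For example, a request for http://host/image.png.gz might come back with Content-Encoding: gzip. Content-Encoding values are case-insensitive. The codings defined by the HTTP/1.1 standard include gzip, compress, deflate, and identity.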
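The request-side counterpart of Content-Encoding is the Accept-Encoding header. It states which codings the user agent (usually the browser) can accept. If a request has no Accept-Encoding header, the server may assume that the user agent accepts any coding.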
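The negotiation described above can be sketched in a few lines of Python. This is a minimal sketch, assuming a server that only knows gzip. The helpers `parse_accept_encoding` and `encode_response_body` are hypothetical: they read the q-values in Accept-Encoding and gzip the entity only when the client accepts it. Aliases such as `x-gzip` and ties between several codings are not modeled.

```python
import gzip


def parse_accept_encoding(header):
    """Return {coding: qvalue} from an Accept-Encoding value (simplified)."""
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()  # coding names are case-insensitive
        if not coding:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs[coding] = q
    return prefs


def encode_response_body(entity, accept_encoding=None):
    """Return (headers, body); gzip the entity only if the client accepts it."""
    if accept_encoding is None:
        # No Accept-Encoding header: the server may assume any coding is
        # acceptable. This sketch chooses gzip in that case.
        use_gzip = True
    else:
        prefs = parse_accept_encoding(accept_encoding)
        use_gzip = prefs.get("gzip", prefs.get("*", 0.0)) > 0
    if use_gzip:
        body = gzip.compress(entity)
        return {"Content-Encoding": "gzip", "Content-Length": str(len(body))}, body
    return {"Content-Length": str(len(entity))}, entity
```

Note that Content-Length here describes the already gzip-encoded entity. Content-Encoding is part of the entity, not a wrapper around it.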
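Now for the main topic, Transfer-Encoding. This header indicates an encoding applied to the entity for purposes such as safe transfer or data compression. Transfer-Encoding differs from Content-Encoding in two ways:
1. Transfer-Encoding exists only during transmission; it is not an inherent property of the entity identified by the request URL.
2. Transfer-Encoding is a "hop-by-hop" header, whereas Content-Encoding is an "end-to-end" header.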
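The hop-by-hop versus end-to-end distinction is easiest to see at a proxy. The following Python sketch uses a hypothetical `headers_to_forward` helper. Before forwarding, it drops the hop-by-hop headers listed in RFC 2616 section 13.5.1, plus any header named in Connection. Content-Encoding passes through untouched. Only headers are handled here; a real proxy would also have to decode the chunked body it received and re-frame it for the next hop.

```python
# Hop-by-hop header names (RFC 2616 section 13.5.1), lower-cased for lookup.
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
}


def headers_to_forward(headers):
    """Return the headers a proxy would pass on to the next hop.

    Hop-by-hop fields, including any listed in the Connection header,
    describe only the current connection and are dropped. End-to-end
    fields such as Content-Encoding are kept unchanged.
    """
    drop = set(HOP_BY_HOP)
    for name, value in headers.items():
        if name.lower() == "connection":
            drop.update(t.strip().lower() for t in value.split(",") if t.strip())
    return {n: v for n, v in headers.items() if n.lower() not in drop}
```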
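An example of its use: for a request to http://host/abc.txt, the server may decide while sending that the file can be gzip-compressed to save bandwidth. The receiver sees Transfer-Encoding: gzip and must decode the body first before it obtains the requested entity.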
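Multiple codings may also be applied to the same entity, so the order of codings in the Transfer-Encoding header matters. The codings are listed in the order in which they were applied, and the recipient decodes them in reverse order. Transfer-Encoding values are likewise case-insensitive. The codings defined by HTTP/1.1 include gzip, compress, deflate, identity, and chunked; RFC 7230 later removed identity.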
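To illustrate the ordering rule, here is a Python sketch with a hypothetical `decode_transfer_codings` helper that undoes the listed codings from last to first. It covers only gzip, deflate (which in HTTP means the zlib format), and identity. Chunked concerns framing rather than compression and is left out of this sketch.

```python
import gzip
import zlib

# Decoders for the compression transfer codings; identity is a no-op.
DECODERS = {
    "gzip": gzip.decompress,
    "deflate": zlib.decompress,  # HTTP "deflate" = zlib-wrapped deflate data
    "identity": lambda data: data,
}


def parse_codings(header_value):
    """Split a Transfer-Encoding value into lower-case coding names."""
    return [c.strip().lower() for c in header_value.split(",") if c.strip()]


def decode_transfer_codings(body, header_value):
    """Undo the listed codings, last-applied first.

    An unknown coding raises KeyError; a server receiving such a request
    would typically answer 501 (Not Implemented).
    """
    for coding in reversed(parse_codings(header_value)):
        body = DECODERS[coding](body)
    return body
```

For example, "Transfer-Encoding: deflate, gzip" means deflate was applied first and gzip second, so gzip is undone first.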
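Transfer-Encoding has one special coding: chunked. It sends the entity as a series of chunks, each prefixed with its length, until a chunk of length 0 marks the end of the transfer. This is particularly useful when the entity length is not known in advance, for example with data generated dynamically from a database. HTTP/1.1 has two requirements here. First, whenever a transfer coding is applied, chunked must be included, unless the message is terminated by closing the connection (which is possible only for responses). Second, chunked must be the last coding applied. Every HTTP/1.1 application must be able to handle the chunked coding.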
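The chunk format itself is simple. Each chunk is its size in hexadecimal, CRLF, the data, and CRLF. A chunk of size 0, optional trailer fields, and an empty line end the body. Below is a minimal Python sketch of an encoder and a decoder; the function names are hypothetical. The decoder ignores chunk extensions (anything after `;` on the size line) and skips trailer fields rather than parsing them.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings with the chunked transfer coding."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # an empty chunk would be read as the last chunk
        out += b"%x\r\n" % len(part)  # chunk size in hex, then CRLF
        out += part + b"\r\n"          # chunk data, then CRLF
    out += b"0\r\n\r\n"                # last chunk, no trailers, final CRLF
    return bytes(out)


def decode_chunked(data):
    """Decode one chunked body; return (entity, bytes after the body)."""
    entity = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)  # drop extensions
        pos = line_end + 2
        if size == 0:
            break
        entity += data[pos:pos + size]
        if data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("chunk data not followed by CRLF")
        pos += size + 2
    # Skip trailer fields up to the empty line that ends the message.
    while True:
        line_end = data.index(b"\r\n", pos)
        if line_end == pos:
            return bytes(entity), data[pos + 2:]
        pos = line_end + 2
```

Because the decoder returns the bytes left after the body, a caller on a persistent connection can go on to parse the next message from them.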
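The request header corresponding to Transfer-Encoding is TE. It indicates which transfer codings the requester is willing to accept in the response. If TE is empty or absent, the only acceptable transfer coding is chunked.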
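The TE rule can be sketched in Python with a hypothetical helper, `acceptable_transfer_codings`, which returns a set of coding names. Pass `None` for a missing header. The `trailers` token in TE is a keyword, not a coding, so it is skipped. Codings with q=0 are treated as refused.

```python
def acceptable_transfer_codings(te_value):
    """Return the transfer codings a client accepts, given its TE header."""
    accepted = {"chunked"}  # always acceptable to an HTTP/1.1 client
    if not te_value:
        return accepted     # empty or absent TE: chunked only
    for item in te_value.split(","):
        parts = [p.strip() for p in item.split(";")]
        coding = parts[0].lower()
        if not coding or coding == "trailers":
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:
            accepted.add(coding)
    return accepted
```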
Another header related to Transfer-Encoding is Trailer, which is tied to the chunked coding; it is not covered in detail here.
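As the name suggests, Content-Length gives the length of the entity being transferred, in bytes. In a response to a HEAD request, it gives the length that would have been sent, although no body is actually sent. Transfer-Encoding takes precedence over Content-Length. Whenever Transfer-Encoding is present with a value other than identity, the chunked coding determines the actual transfer length, unless the message is terminated by closing the connection. In that case, any Content-Length header is ignored.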
On the Length of the HTTP Message Body
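HTTP distinguishes the message body from the entity body. Put simply, when no Transfer-Encoding is applied, the message body is the entity body. Once a Transfer-Encoding is applied, the message body is the encoded entity body: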
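A small Python illustration of this relationship uses a hypothetical `message_body` helper. It either passes the entity through unchanged or, for chunked, wraps it in a single chunk. This makes visible that the entity body and the message body can differ in length.

```python
def message_body(entity, chunked):
    """Return the bytes sent on the wire for an entity body.

    Without a transfer coding, the message body is the entity itself.
    With chunked (sent here as one chunk), it is the encoded entity.
    """
    if not chunked:
        return entity
    body = b""
    if entity:
        body += b"%x\r\n" % len(entity) + entity + b"\r\n"
    return body + b"0\r\n\r\n"  # last chunk plus the empty line ending the body
```

For the 11-byte entity `hello world`, the chunked message body is 21 bytes on the wire.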
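Message body = Transfer-Encoding encode(Entity body)

How is the length of the message body determined? The original HTTP/1.1 specification (RFC 2616) gives the following rules, in order of precedence (a rarely used rule for multipart/byteranges is omitted):
1. If the response status is 1xx, 204, or 304, or the response answers a HEAD request, the message body length is 0.
2. If a Transfer-Encoding other than "identity" is used, the message body length is determined by the "chunked" coding, unless the message is terminated by closing the connection.
3. If a Content-Length header is present, the message length is its value.
4. If the message uses closing of the connection to mark the end of the body, the length is whatever was received before the connection closed. This rule does not apply to message bodies in HTTP requests.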
RFC 7230 replaced RFC 2616. Its detailed rules, from Section 3.3.3, are as follows:
3.3.3. Message Body Length

The length of a message body is determined by one of the following (in order of precedence):

1. Any response to a HEAD request and any response with a 1xx (Informational), 204 (No Content), or 304 (Not Modified) status code is always terminated by the first empty line after the header fields, regardless of the header fields present in the message, and thus cannot contain a message body.

2. Any 2xx (Successful) response to a CONNECT request implies that the connection will become a tunnel immediately after the empty line that concludes the header fields. A client MUST ignore any Content-Length or Transfer-Encoding header fields received in such a message.

3. If a Transfer-Encoding header field is present and the chunked transfer coding (Section 4.1) is the final encoding, the message body length is determined by reading and decoding the chunked data until the transfer coding indicates the data is complete.

   If a Transfer-Encoding header field is present in a response and the chunked transfer coding is not the final encoding, the message body length is determined by reading the connection until it is closed by the server. If a Transfer-Encoding header field is present in a request and the chunked transfer coding is not the final encoding, the message body length cannot be determined reliably; the server MUST respond with the 400 (Bad Request) status code and then close the connection.

   If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length. Such a message might indicate an attempt to perform request smuggling (Section 9.5) or response splitting (Section 9.4) and ought to be handled as an error. A sender MUST remove the received Content-Length field prior to forwarding such a message downstream.

4. If a message is received without Transfer-Encoding and with either multiple Content-Length header fields having differing field-values or a single Content-Length header field having an invalid value, then the message framing is invalid and the recipient MUST treat it as an unrecoverable error. If this is a request message, the server MUST respond with a 400 (Bad Request) status code and then close the connection. If this is a response message received by a proxy, the proxy MUST close the connection to the server, discard the received response, and send a 502 (Bad Gateway) response to the client. If this is a response message received by a user agent, the user agent MUST close the connection to the server and discard the received response.

5. If a valid Content-Length header field is present without Transfer-Encoding, its decimal value defines the expected message body length in octets. If the sender closes the connection or the recipient times out before the indicated number of octets are received, the recipient MUST consider the message to be incomplete and close the connection.

6. If this is a request message and none of the above are true, then the message body length is zero (no message body is present).

7. Otherwise, this is a response message without a declared message body length, so the message body length is determined by the number of octets received prior to the server closing the connection.
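The seven rules condense into a single decision function. The Python sketch below is one reading of RFC 7230 section 3.3.3, not a complete parser. The function name `message_body_length`, its parameters, and the `(kind, value)` return convention are all hypothetical. Headers are passed as a list of (name, value) pairs so that repeated Content-Length fields stay visible. Identical repeated Content-Length values are accepted, which section 3.3.2 permits.

```python
def message_body_length(is_request, headers, request_method=None, status=None,
                        is_proxy=False):
    """Decide how a message body is delimited (after RFC 7230 section 3.3.3).

    headers: list of (name, value) pairs in the order received.
    request_method/status: for a response, the method of the request it
    answers and the response status code.
    Returns one of:
      ("length", n)     body is exactly n octets (n may be 0)
      ("tunnel", None)  2xx response to CONNECT; the connection becomes a tunnel
      ("chunked", None) decode chunked data until the last chunk
      ("close", None)   read until the server closes the connection
      ("error", code)   invalid framing; code is the status to send, or None
                        when a user agent should just close and discard
    """
    def field_values(name):
        return [v for k, v in headers if k.lower() == name]

    # Rule 1: responses that can never carry a body.
    if not is_request and (request_method == "HEAD"
                           or 100 <= status < 200 or status in (204, 304)):
        return ("length", 0)
    # Rule 2: a 2xx response to CONNECT turns the connection into a tunnel.
    if not is_request and request_method == "CONNECT" and 200 <= status < 300:
        return ("tunnel", None)
    # Rule 3: Transfer-Encoding overrides any Content-Length. (The RFC also
    # says a message carrying both ought to be handled as an error; this
    # sketch only applies the override.)
    te_fields = field_values("transfer-encoding")
    if te_fields:
        codings = [c.strip().lower() for v in te_fields for c in v.split(",")
                   if c.strip()]
        if codings and codings[-1] == "chunked":
            return ("chunked", None)
        # chunked is not final: only a response may fall back to close.
        return ("error", 400) if is_request else ("close", None)
    # Rules 4 and 5: Content-Length without Transfer-Encoding.
    cl_values = [v.strip() for field in field_values("content-length")
                 for v in field.split(",")]
    if cl_values:
        value = cl_values[0]
        valid = value != "" and all(ch in "0123456789" for ch in value)
        if not valid or any(v != value for v in cl_values):
            if is_request:
                return ("error", 400)
            return ("error", 502 if is_proxy else None)
        return ("length", int(value))
    # Rule 6: a request matching none of the above has no body.
    if is_request:
        return ("length", 0)
    # Rule 7: otherwise the response body runs until the connection closes.
    return ("close", None)
```

Checking Transfer-Encoding before Content-Length mirrors the precedence order above. This ordering is what prevents a message carrying both headers from being framed two different ways.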