Crisp Reading Notes on Latest Technology Trends and Basics

Document Encodings

HTTP Responses

  • HTTP responses usually carry a header of the form “Content-Type: xxxx; charset=yyy”
    • The charset parameter tells the client which encoding the body uses
  • This requires the web server to either
    • know the encoding up-front,
    • or work it out by reading the beginning of the document
  • An XML document usually begins with an XML declaration that can name the encoding: <?xml version="1.0" encoding="…"?>
  • An HTML document can declare its encoding in a meta tag inside the document itself
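
As a rough illustration of reading the charset out of a Content-Type header, here is a minimal Python sketch. The helper name charset_from_content_type is made up for these notes; it leans on the standard library's email.message.Message, which parses MIME-style parameters.

```python
from email.message import Message


def charset_from_content_type(header_value, default=None):
    """Return the lower-cased charset parameter of a Content-Type value.

    Hypothetical helper for illustration; returns `default` when the
    header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None
```

When the header has no charset, the client has to fall back on the declarations inside the document or on guessing.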

In the Document itself

<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
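
A reader of the document can find this declaration before it knows the encoding, because the markup itself is plain ASCII in most encodings. The sketch below assumes that, and assumes the tag sits in the first 1024 bytes; the function name, the regular expression, and the limit are illustrative choices, not what any particular browser does.

```python
import re

# Matches both <meta charset="..."> and the http-equiv form shown above.
META_CHARSET = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def sniff_meta_charset(raw, limit=1024):
    """Look for a charset declaration in the first `limit` bytes.

    Hypothetical helper: returns the lower-cased charset name, or None
    if no meta declaration is found.
    """
    match = META_CHARSET.search(raw[:limit])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()


page = b'<html><head><meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>'
print(sniff_meta_charset(page))  # utf-8
```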

Browser Inferring the Encoding

  • For many encodings, the browser tries to infer the encoding from the distribution of byte values in the document
  • This applies mainly to the variants of the code-page encodings
    • Each language gets its own set of mappings, and each has its own character distribution in typical documents
    • If the browser gets it wrong, we can change the encoding manually in the browser and read the document again.
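
To make the idea concrete, here is a toy Python sketch of distribution-based guessing. It is not how real browsers detect encodings; the two candidate code pages and the short lists of "common letters" are assumptions picked only for this illustration. The bytes are decoded under each candidate, and the candidate whose decoded text contains the most letters typical for its language wins.

```python
# Assumed, hand-picked frequent letters for each candidate code page.
COMMON_LETTERS = {
    "cp1252": set("etaoinshrdlu"),  # Western European code page, English letters
    "cp1251": set("оеаинтсрвл"),    # Cyrillic code page, Russian letters
}


def guess_code_page(raw):
    """Return the candidate code page whose decoding looks most plausible."""

    def score(encoding):
        text = raw.decode(encoding, errors="replace").lower()
        return sum(ch in COMMON_LETTERS[encoding] for ch in text)

    return max(COMMON_LETTERS, key=score)


russian_bytes = "привет мир".encode("cp1251")
print(guess_code_page(russian_bytes))            # cp1251
print(russian_bytes.decode("cp1252"))            # what a wrong guess looks like
```

The same bytes are valid in both code pages, which is why the guess has to rest on what the decoded text looks like rather than on decoding errors.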

URL Encodings

  • URL encoding is relevant in two places
    • In the URL of the HTTP request (both GET and POST)
    • In the body when posting the contents of a form
  • The encoding for URLs is
    • Convert to UTF-8 first
    • Then replace every non-ASCII byte, and every reserved character used as data, with its %-escaped sequence (% followed by two hex digits)
    • Other characters may also be %-escaped.
  • During form submit, the payload can be encoded as application/x-www-form-urlencoded.
    • This follows the URL encoding rules for the most part; the main difference is that a space is sent as “+”.
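
Both variants are available in Python's standard urllib.parse module, so the rules above can be tried out directly. The sample strings and field names below are made up for illustration.

```python
from urllib.parse import quote, urlencode

# URL encoding: UTF-8 first, then %-escape. safe="" also escapes "/".
path_segment = quote("café au lait", safe="")
print(path_segment)  # caf%C3%A9%20au%20lait

# Form encoding: same idea, but spaces become "+" and fields are joined with "&".
form_body = urlencode({"q": "café au lait", "tag": "a&b"})
print(form_body)     # q=caf%C3%A9+au+lait&tag=a%26b
```

Note how é becomes the two bytes C3 A9 of its UTF-8 form, and how the reserved character & inside a value is escaped so it is not mistaken for a field separator.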

UTF Encodings

UTF-8 has the following interesting features that make it a very good encoding

  1. The first byte of a character has 0 in its first bit or 11 in its first two bits; continuation bytes always begin with 10
    • This makes it easy to resynchronize in the middle of a byte stream
    • For a multi-byte character, the number of bytes occupied is given by the number of contiguous 1s at the start of the first byte.
  2. This makes it easy to skip over a character and move to the next
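    • Also, the byte pattern at the start of a document shows which UTF encoding is in use (the original JSON specification used the pattern of zero bytes in the first four bytes to tell UTF-8, UTF-16 and UTF-32 apart)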
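
The lead-byte rule for UTF-8 can be sketched in a few lines of Python. The function names are made up for these notes, and the code assumes well-formed input; it does not check that the continuation bytes are valid.

```python
def utf8_sequence_length(lead):
    """Number of bytes in the character that starts with byte `lead`."""
    if (lead & 0b10000000) == 0:
        return 1                      # 0xxxxxxx: single-byte (ASCII)
    ones = 0
    mask = 0b10000000
    while lead & mask:                # count the contiguous leading 1s
        ones += 1
        mask >>= 1
    if ones == 1 or ones > 4:
        raise ValueError("not a valid UTF-8 lead byte")
    return ones


def next_boundary(data, i):
    """Resynchronize: skip continuation bytes (10xxxxxx) starting at index i."""
    while i < len(data) and (data[i] & 0b11000000) == 0b10000000:
        i += 1
    return i


def split_characters(data):
    """Split UTF-8 bytes into per-character chunks using only the lead bytes."""
    chunks = []
    i = 0
    while i < len(data):
        n = utf8_sequence_length(data[i])
        chunks.append(data[i:i + n])
        i += n
    return chunks


encoded = "aé€😀".encode("utf-8")
print([len(c) for c in split_characters(encoded)])  # [1, 2, 3, 4]
print(next_boundary(encoded, 2))                     # 3, the start of "€"
```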