2012-09-10

Revision as of 06:44, 10 September 2012

(One intermediate revision not shown.)

Line 68:

Line 68:

http://www.my.site/dir/page.html

http://www.my.site/dir/page.html

-

When a form is sent, using the default GET method, the data values from the form are included in the URL, e.g.:

+

When a form is sent, using the default GET method, the data values from the form are included in the URL e.g.:

http://www.my.site/dir/page.html?name=Durst&age=18

http://www.my.site/dir/page.html?name=Durst&age=18
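As a rough illustration of how form values end up in a GET URL like the one above, here is a minimal Java sketch. The class and method names (`GetUrlSketch`, `buildGetUrl`) are hypothetical and not part of Jetty; the sketch assumes Java 10 or later for `URLEncoder.encode(String, Charset)`, and it only approximates what a browser does when it submits a form.

```java
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only.
final class GetUrlSketch {

    // Appends the form fields to the base URL as a percent-encoded query string,
    // using the given charset to turn non-ASCII characters into bytes.
    static String buildGetUrl(String base, Map<String, String> fields, Charset charset) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            query.add(URLEncoder.encode(field.getKey(), charset)
                    + "=" + URLEncoder.encode(field.getValue(), charset));
        }
        return base + "?" + query;
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Durst");
        fields.put("age", "18");
        // Pure ASCII values look the same whatever charset is used.
        System.out.println(buildGetUrl("http://www.my.site/dir/page.html", fields, StandardCharsets.UTF_8));

        // A non-ASCII value (u-umlaut) is encoded differently depending on the charset.
        fields.put("name", "D\u00fcrst");
        System.out.println(buildGetUrl("http://www.my.site/dir/page.html", fields, StandardCharsets.UTF_8));
        System.out.println(buildGetUrl("http://www.my.site/dir/page.html", fields, StandardCharsets.ISO_8859_1));
    }
}
```

The last two calls show why the receiving server needs to know which charset the client used: the same name becomes `D%C3%BCrst` in UTF-8 but `D%FCrst` in ISO-8859-1.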

Line 94:

Line 94:

If UTF-8 is not correct for your environment, you may use one of two methods to set the charset encoding of the query string in GET requests:

If UTF-8 is not correct for your environment, you may use one of two methods to set the charset encoding of the query string in GET requests:

-

# to change the encoding on a request by request basis, call [http://download.eclipse.org/jetty/stable-7/apidocs/org/eclipse/jetty/server/Request.html#setQueryEncoding%28java.lang.String%29 Request.setQueryEncoding("xxx")] - where "xxx" is replaced with the name of the charset encoding desired - before reading any of the content or params. If you don't wish to use this jetty-specific API, you can instead call [http://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#setAttribute%28java.lang.String,%20java.lang.Object%29 javax.servlet.ServletRequest.setAttribute("org.eclipse.jetty.server.Request.queryEncoding", "xxx")] - where "xxx" is replaced with the name of the charset encoding desired - before reading the content or params. ''NOTE'' that for efficiency reasons, the parameters are parsed only the first time any of the Request.getParameterXXX() methods are called. Therefore, if you are not sure of the encoding used on the request, after using one of the 2 methods explained above for setting the character encoding, retrieve the query string with [http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getQueryString%28%29 javax.servlet.http.HttpServletRequest.getQueryString() and try parsing it yourself (see below "Techniques for working with international characters" for further tips for working out what encoding was used).

+

# to change the encoding on a request by request basis, call [http://download.eclipse.org/jetty/stable-7/apidocs/org/eclipse/jetty/server/Request.html#setQueryEncoding%28java.lang.String%29 Request.setQueryEncoding("xxx")] - where "xxx" is replaced with the name of the charset encoding desired - before reading any of the content or params. If you don't wish to use this jetty-specific API, you can instead call [http://docs.oracle.com/javaee/6/api/javax/servlet/ServletRequest.html#setAttribute%28java.lang.String,%20java.lang.Object%29 javax.servlet.ServletRequest.setAttribute("org.eclipse.jetty.server.Request.queryEncoding", "xxx")] - where "xxx" is replaced with the name of the charset encoding desired - before reading the content or params. ''NOTE'' that for efficiency reasons, the parameters are parsed only the first time any of the Request.getParameterXXX() methods are called. Therefore, if you are not sure of the encoding used on the request, after using one of the 2 methods explained above for setting the character encoding, retrieve the query string with [http://docs.oracle.com/javaee/6/api/javax/servlet/http/HttpServletRequest.html#getQueryString%28%29 javax.servlet.http.HttpServletRequest.getQueryString()] and try parsing it yourself (see below "Techniques for working with international characters" for further tips for working out what encoding was used).

# to change the encoding for every request, set the system property "org.eclipse.jetty.util.UrlEncoding.charset" to the encoding you want to use.

# to change the encoding for every request, set the system property "org.eclipse.jetty.util.UrlEncoding.charset" to the encoding you want to use.
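The suggestion in the first method, to take the raw query string and parse it yourself, could look roughly like the following sketch. It deliberately uses only the JDK so that it runs without a servlet container: the `rawQuery` argument stands in for whatever `HttpServletRequest.getQueryString()` returned. The class and method names are hypothetical, and `URLDecoder.decode(String, Charset)` assumes Java 10 or later.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, for illustration only; not a Jetty API.
final class QueryParseSketch {

    // Decodes a raw (still percent-encoded) query string with the given charset.
    // If a name occurs more than once, the last value wins in this simple sketch.
    // URLDecoder throws IllegalArgumentException on malformed % escapes.
    static Map<String, String> parseQuery(String rawQuery, Charset charset) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, charset), URLDecoder.decode(value, charset));
        }
        return params;
    }

    public static void main(String[] args) {
        // Stand-in for request.getQueryString(); %FC is u-umlaut in ISO-8859-1.
        String rawQuery = "name=D%FCrst&age=18";
        System.out.println(parseQuery(rawQuery, StandardCharsets.ISO_8859_1));
        // Decoding the same bytes as UTF-8 yields a replacement character,
        // which is one hint that the guess at the charset was wrong.
        System.out.println(parseQuery(rawQuery, StandardCharsets.UTF_8));
    }
}
```

Trying more than one candidate charset and checking the result for the replacement character U+FFFD is only a heuristic; it cannot prove which encoding the client actually used.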

+

+

==== International characters in domain names ====

+

+

In May 2010, the first [http://en.wikipedia.org/wiki/Internationalized_domain_name domain names using internationalized characters] were loaded into the DNS root name servers. It is now possible to register domains such as:

+

+

åäö.com

+

+

To accommodate these names, given the restrictions on acceptable characters in hostnames, web browsers and clients must translate internationalized hostnames to an ASCII-only form using an encoding called [http://tools.ietf.org/html/rfc3492 Punycode]. Here's an example:

+

+

The following URL:

+

+

http://www.åäö.com:8080/test/

+

+

would be translated by the browser to:

+

+

http://www.xn--4cab6c.com:8080/test/
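The same translation can be reproduced on the server side with the JDK's `java.net.IDN` class, which implements the older IDNA 2003 rules. The sketch below converts only the host part, since the scheme, port and path are not affected by Punycode. The class and method names are hypothetical, and the non-ASCII characters are written as `\u` escapes so the source file's own encoding does not matter.

```java
import java.net.IDN;

// Hypothetical helper, for illustration only.
final class PunycodeSketch {

    // Converts an internationalized host name to its ASCII-compatible form.
    static String toAsciiHost(String unicodeHost) {
        return IDN.toASCII(unicodeHost);
    }

    public static void main(String[] args) {
        // "www.åäö.com" written with Unicode escapes.
        String unicodeHost = "www.\u00e5\u00e4\u00f6.com";
        String asciiHost = toAsciiHost(unicodeHost);
        System.out.println(asciiHost);
        // The conversion is reversible.
        System.out.println(IDN.toUnicode(asciiHost).equals(unicodeHost));
    }
}
```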

+

+

You can read more about internationalized domain names and Jetty (such as how to configure virtual hosts with internationalized characters) here:

+

[[/Jetty/Howto/Configure_Virtual_Hosts#Configuring_Virtual_Hosts_with_Non-ascii_Characters Virtual Hosts with non-ascii characters]]

===Handling of International characters by browsers===

===Handling of International characters by browsers===

Line 144:

Line 163:

===International characters in Jetty configuration files===

===International characters in Jetty configuration files===

-

Jetty configuration files use XML. If international characters need to be included in these files, the standard XML character encoding method should be used. Where the character has a defined abbreviation (such as &uuml; for u-umlaut), that should be used. In other cases the hexadecimal description of the character's Unicode code value should be used. For example &#x391; defines the Greek capital A letter. Use of the decimal form (&#913;) seems now to be unfashionable in W3C circles.

-

===Future Possibilities===

+

Jetty configuration files use XML. If international characters need to be included in these files, the standard XML character encoding method should be used. Where the character has a defined abbreviation (such as &uuml; for u-umlaut), that should be used. In other cases the hexadecimal description of the character's Unicode code value should be used. For example &#x391; defines the Greek capital A letter. Use of the decimal form (&#913;) seems now to be unfashionable in W3C circles.
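A value that contains international characters can be turned into hexadecimal character references before it is pasted into a configuration file. The following is a minimal sketch under that assumption; the class and method names are hypothetical, and the method only rewrites non-ASCII characters, so it does not escape XML markup characters such as the ampersand or the less-than sign.

```java
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class XmlCharRefSketch {

    // Replaces every non-ASCII code point with a hexadecimal XML character reference.
    static String toHexCharRefs(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase(Locale.ROOT))
                   .append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Greek capital alpha (U+0391) and a name containing u-umlaut (U+00FC).
        System.out.println(toHexCharRefs("\u0391"));
        System.out.println(toHexCharRefs("D\u00fcrst"));
    }
}
```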

-

It is to be hoped that something like the IRI scheme described in the Internet Draft will evolve into a standard that will be adopted by suppliers of web servers and browsers. It will probably also need changes to HTTP and/or the use of internationalised versions of the http and https protocols. As currently drafted, such a scheme would not, on its own, solve the problem of dynamic data derived from form GET submissions. This will require changes to HTML4 or, more likely, extensions to a future version of XHTML.

+

-

The whole area of form data handling may be radically improved if the Xforms program is successful. This has defined an XML-based approach to forms and associated data and event handling and uses Unicode throughout. The Xforms 1.0 specification is currently (August 2002) at 'last call working draft' status, and a number of experimental implementations, some using browser applets or plug-ins, have been announced.

+

For information on how to configure virtual host names that use internationalized domain names in Jetty config files, see [[/Jetty/Howto/Configure_Virtual_Hosts#Configuring_Virtual_Hosts_with_Non-ascii_Characters]]

-

Neither of these likely developments will improve the handling of international characters by today's browsers, so designers of web services for the 'open' market seem likely to have to work within today's constraints for a long time.

-

Anyone interested in the full complexity of handling international characters and languages might like to read the W3C's Character Model (currently a working draft) and follow the W3C's International Activity.

}}

}}
