public final class HttpUrl extends Object
A uniform resource locator (URL) with a scheme of either http or https. Use this class to compose and decompose Internet addresses. For example, this code will compose and print a URL for Google search:
HttpUrl url = new HttpUrl.Builder()
    .scheme("https")
    .host("www.google.com")
    .addPathSegment("search")
    .addQueryParameter("q", "polar bears")
    .build();
System.out.println(url);
which prints:
https://www.google.com/search?q=polar%20bears
As another example, this code prints the human-readable query parameters of a Twitter search:
HttpUrl url = HttpUrl.parse("https://twitter.com/search?q=cute%20%23puppies&f=images");
for (int i = 0, size = url.querySize(); i < size; i++) {
    System.out.println(url.queryParameterName(i) + ": " + url.queryParameterValue(i));
}
which prints:
q: cute #puppies
f: images
In addition to composing URLs from their component parts and decomposing URLs into their
component parts, this class implements relative URL resolution: what address you'd reach by
clicking a relative link on a specified page. For example:
HttpUrl base = HttpUrl.parse("https://www.youtube.com/user/WatchTheDaily/videos");
HttpUrl link = base.resolve("../../watch?v=cbP2N1BQdYc");
System.out.println(link);
which prints:
https://www.youtube.com/watch?v=cbP2N1BQdYc
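For comparison, the same relative-link resolution can be sketched with only the JDK. This is not how HttpUrl implements resolve(); it uses java.net.URI.resolve, and the class and method names below are hypothetical, chosen for illustration.

```java
import java.net.URI;

final class RelativeLinkSketch {
  /** Resolves a relative link against a base address using java.net.URI. */
  static String resolve(String base, String link) {
    // URI.resolve merges the link with the base path and removes ".." segments.
    return URI.create(base).resolve(link).toString();
  }

  public static void main(String[] args) {
    System.out.println(resolve(
        "https://www.youtube.com/user/WatchTheDaily/videos",
        "../../watch?v=cbP2N1BQdYc"));
  }
}
```

Unlike HttpUrl.resolve, this sketch throws IllegalArgumentException on malformed input instead of returning null.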
Sometimes referred to as the protocol, a URL's scheme describes what mechanism should be used to retrieve the resource. Although URLs have many schemes (mailto, file, ftp), this class only supports http and https. Use java.net.URI for URLs with arbitrary schemes.
Username and password are either present, or the empty string "" if absent. This class offers no mechanism to differentiate empty from absent. Neither of these components is popular in practice. Typically HTTP applications use other mechanisms for user identification and authentication.
The host identifies the webserver that serves the URL's resource. It is either a hostname like square.com or localhost, an IPv4 address like 192.168.0.1, or an IPv6 address like ::1.
Usually a webserver is reachable with multiple identifiers: its IP addresses, registered domain names, and even localhost when connecting from the server itself. Each of a webserver's names is a distinct URL and they are not interchangeable. For example, even if http://square.github.io/dagger and http://google.github.io/dagger are served by the same IP address, the two URLs identify different resources.
The port used to connect to the webserver. By default this is 80 for HTTP and 443 for HTTPS. This class never returns -1 for the port: if no port is explicitly specified in the URL then the scheme's default is used.
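The default-port rule can be sketched in a few lines. This is an illustrative sketch built on java.net.URI rather than HttpUrl itself; effectivePort is a hypothetical helper name, and the defaultPort method here only mirrors the behaviour documented for HttpUrl.defaultPort.

```java
import java.net.URI;

final class PortSketch {
  /** Returns 80 for "http", 443 for "https" and -1 for any other scheme. */
  static int defaultPort(String scheme) {
    if (scheme.equals("http")) return 80;
    if (scheme.equals("https")) return 443;
    return -1;
  }

  /** Returns the explicit port if the URL has one, otherwise the scheme's default. */
  static int effectivePort(String url) {
    URI uri = URI.create(url);
    int explicit = uri.getPort(); // java.net.URI reports -1 when no port is written
    return explicit != -1 ? explicit : defaultPort(uri.getScheme());
  }

  public static void main(String[] args) {
    System.out.println(effectivePort("http://host/"));      // 80
    System.out.println(effectivePort("http://host:8000/")); // 8000
    System.out.println(effectivePort("https://host/"));     // 443
  }
}
```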
The path identifies a specific resource on the host. Paths have a hierarchical structure like "/square/okhttp/issues/1486" and decompose into a list of segments like ["square", "okhttp", "issues", "1486"].
This class offers methods to compose and decompose paths by segment. It composes each path from a list of segments by alternating between "/" and the encoded segment. For example the segments ["a", "b"] build "/a/b" and the segments ["a", "b", ""] build "/a/b/".
If a path's last segment is the empty string then the path ends with "/". This class always builds non-empty paths: if the path is omitted it defaults to "/". The default path's segment list is a single empty string: [""].
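The composition rule described above, alternating "/" with each segment, can be sketched as follows. The class and method names are hypothetical, and the sketch assumes the segments are already encoded.

```java
import java.util.Arrays;
import java.util.List;

final class PathSegmentsSketch {
  /** Builds a path by writing "/" before every (already encoded) segment. */
  static String buildPath(List<String> encodedSegments) {
    StringBuilder path = new StringBuilder();
    for (String segment : encodedSegments) {
      path.append('/').append(segment);
    }
    return path.toString();
  }

  public static void main(String[] args) {
    System.out.println(buildPath(Arrays.asList("a", "b")));     // /a/b
    System.out.println(buildPath(Arrays.asList("a", "b", ""))); // /a/b/
    System.out.println(buildPath(Arrays.asList("")));           // /
  }
}
```

Because a trailing empty segment produces the trailing slash, the default segment list [""] yields the default path "/".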
The query is optional: it can be null, empty, or non-empty. For many HTTP URLs the query string is subdivided into a collection of name-value parameters. This class offers methods to set the query as a single string, or as individual name-value parameters. With name-value parameters the values are optional and names may be repeated.
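To make the name-value model concrete, here is a minimal sketch that splits an encoded query into alternating names and values. It is an assumption-laden illustration, not HttpUrl's implementation: the class and method names are hypothetical, and decoding is delegated to java.net.URLDecoder, which is stricter about malformed percent sequences than HttpUrl.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

final class QueryParametersSketch {
  /** Returns alternating names and values; a value is null when its parameter has no '='. */
  static List<String> namesAndValues(String encodedQuery) {
    List<String> result = new ArrayList<>();
    if (encodedQuery == null) return result; // no query at all: zero parameters
    // A limit of -1 keeps empty pieces, so "" yields one parameter with an empty name.
    for (String parameter : encodedQuery.split("&", -1)) {
      int equals = parameter.indexOf('=');
      String name = equals == -1 ? parameter : parameter.substring(0, equals);
      String value = equals == -1 ? null : parameter.substring(equals + 1);
      result.add(decode(name));
      result.add(value == null ? null : decode(value));
    }
    return result;
  }

  private static String decode(String s) {
    try {
      return URLDecoder.decode(s, "UTF-8"); // turns '+' into a space and decodes %XX
    } catch (UnsupportedEncodingException e) {
      throw new AssertionError(e); // UTF-8 is always available
    }
  }

  public static void main(String[] args) {
    System.out.println(namesAndValues("a=apple&k=key+lime")); // [a, apple, k, key lime]
    System.out.println(namesAndValues("a=apple&b"));          // [a, apple, b, null]
  }
}
```

The number of parameters is half the size of the returned list, which is one more than the number of "&" separators whenever a query is present.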
The fragment is optional: it can be null, empty, or non-empty. Unlike host, port, path, and query the fragment is not sent to the webserver: it's private to the client.
Each component must be encoded before it is embedded in the complete URL. As we saw above, the string cute #puppies is encoded as cute%20%23puppies when used as a query parameter value.
Percent encoding replaces a character (like 🍩) with its UTF-8 hex bytes (like %F0%9F%8D%A9). This approach works for whitespace characters, control characters, non-ASCII characters, and characters that already have another meaning in a particular context.
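The mechanics of percent encoding can be sketched as below. This is a deliberately conservative illustration, not HttpUrl's encoder: it assumes that every byte outside the RFC 3986 unreserved set (letters, digits, '-', '.', '_', '~') is escaped, whereas the real encode set depends on the URL component. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class PercentEncodingSketch {
  /** Escapes every UTF-8 byte that is not an unreserved ASCII character as %XX. */
  static String percentEncode(String input) {
    StringBuilder out = new StringBuilder();
    for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
      int c = b & 0xff; // treat the byte as unsigned
      boolean unreserved = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
          || (c >= '0' && c <= '9') || c == '-' || c == '.' || c == '_' || c == '~';
      if (unreserved) {
        out.append((char) c);
      } else {
        out.append(String.format("%%%02X", c)); // e.g. a space becomes %20
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(percentEncode("cute #puppies")); // cute%20%23puppies
    System.out.println(percentEncode("\uD83C\uDF69"));  // the doughnut emoji: %F0%9F%8D%A9
  }
}
```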
Percent encoding is used in every URL component except for the hostname. But the set of characters that need to be encoded is different for each component. For example, the path component must escape all of its ? characters, otherwise they could be interpreted as the start of the URL's query. But within the query and fragment components, the ? character doesn't delimit anything and doesn't need to be escaped.
HttpUrl url = HttpUrl.parse("http://who-let-the-dogs.out").newBuilder()
    .addPathSegment("_Who?_")
    .query("_Who?_")
    .fragment("_Who?_")
    .build();
System.out.println(url);
This prints:
http://who-let-the-dogs.out/_Who%3F_?_Who?_#_Who?_
When parsing URLs that lack percent encoding where it is required, this class will percent encode
the offending characters.
Hostnames have different requirements and use a different encoding scheme, which consists of IDNA mapping and Punycode encoding.
To avoid confusion and discourage phishing attacks, IDNA mapping transforms names to avoid confusing characters. This includes basic case folding: transforming shouting SQUARE.COM into cool and casual square.com. It also handles more exotic characters. For example, the Unicode trademark sign (™) could be confused for the letters "TM" in http://ho™mail.com. To mitigate this, the single character (™) maps to the string (tm). There is a similar policy for all of the 1.1 million Unicode code points. Note that some code points such as "🍩" are not mapped and cannot be used in a hostname.
Punycode converts a Unicode string to an ASCII string to make international domain names work everywhere. For example, "σ" encodes as "xn--4xa". The encoded string is not human readable, but can be used with classes like InetAddress to establish connections.
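The Punycode round trip can be tried with the JDK's java.net.IDN class. Note the assumption: java.net.IDN follows the older IDNA2003 rules (RFC 3490), so its mapping step may differ from the one HttpUrl applies; the sketch only demonstrates the ASCII encoding of a single label. The class name is hypothetical.

```java
import java.net.IDN;

final class PunycodeSketch {
  /** Converts a Unicode hostname to its ASCII-compatible form. */
  static String toAscii(String host) {
    return IDN.toASCII(host);
  }

  /** Converts an ASCII-compatible hostname back to Unicode. */
  static String toUnicode(String host) {
    return IDN.toUnicode(host);
  }

  public static void main(String[] args) {
    System.out.println(toAscii("\u03C3"));    // Greek small letter sigma: xn--4xa
    System.out.println(toUnicode("xn--4xa")); // back to the single sigma character
  }
}
```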
Java includes both java.net.URL and java.net.URI. We offer a new URL model to address problems that the others don't.
Although they have different content, java.net.URL can consider two URLs equal, and the equals() method between them returns true, when their hosts resolve to the same IP address. This makes java.net.URL unusable for many things. It shouldn't be used as a Map key or in a Set. Doing so is both inefficient because equality may require a DNS lookup, and incorrect because unequal URLs may be equal because of how they are hosted.
Two URLs can be semantically identical while java.net.URI disagrees. Both an unnecessary port specification (:80) and an absent trailing slash (/) cause URI to bucket such URLs separately. This harms URI's usefulness in collections. Any application that stores information-per-URL will need to either canonicalize manually, or suffer unnecessary redundancy for such URLs.
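The following sketch shows the bucketing problem and what manual canonicalization might look like. The two URLs are illustrative choices, canonicalize is a hypothetical helper, and the sketch ignores user info and other edge cases that a real canonicalizer would have to handle.

```java
import java.net.URI;

final class UriEqualitySketch {
  /** Drops a redundant default port and supplies "/" for an empty path. */
  static URI canonicalize(URI uri) {
    String scheme = uri.getScheme();
    int port = uri.getPort();
    boolean isDefaultPort = (port == 80 && "http".equals(scheme))
        || (port == 443 && "https".equals(scheme));
    String path = uri.getRawPath();
    if (path == null || path.isEmpty()) path = "/";

    StringBuilder s = new StringBuilder();
    s.append(scheme).append("://").append(uri.getHost());
    if (port != -1 && !isDefaultPort) s.append(':').append(port);
    s.append(path);
    if (uri.getRawQuery() != null) s.append('?').append(uri.getRawQuery());
    if (uri.getRawFragment() != null) s.append('#').append(uri.getRawFragment());
    return URI.create(s.toString());
  }

  public static void main(String[] args) {
    URI a = URI.create("http://host:80/");
    URI b = URI.create("http://host");
    System.out.println(a.equals(b));                             // false
    System.out.println(canonicalize(a).equals(canonicalize(b))); // true
  }
}
```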
Because they don't attempt canonical form, these classes are surprisingly difficult to use securely. Suppose you're building a webservice that checks that incoming paths are prefixed "/static/images/" before serving the corresponding assets from the filesystem.
String attack = "http://example.com/static/images/../../../../../etc/passwd";
System.out.println(new URL(attack).getPath());
System.out.println(new URI(attack).getPath());
System.out.println(HttpUrl.parse(attack).encodedPath());
By not canonicalizing the input paths, java.net.URL and java.net.URI are complicit in directory traversal attacks. Code that checks only the path prefix may suffer! The three statements print:
/static/images/../../../../../etc/passwd
/static/images/../../../../../etc/passwd
/etc/passwd
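A prefix check that first canonicalizes the path can be sketched with the JDK alone. This is an illustration of the idea rather than a complete defence: it relies on java.net.URI.normalize, and the class and method names are hypothetical.

```java
import java.net.URI;

final class StaticPathCheckSketch {
  /** Checks the prefix on the path exactly as written: vulnerable to "../" tricks. */
  static boolean naiveCheck(String url) {
    return URI.create(url).getPath().startsWith("/static/images/");
  }

  /** Removes "." and ".." segments before checking the prefix. */
  static boolean canonicalCheck(String url) {
    return URI.create(url).normalize().getPath().startsWith("/static/images/");
  }

  public static void main(String[] args) {
    String attack = "http://example.com/static/images/../../../../../etc/passwd";
    System.out.println(naiveCheck(attack));     // true: the attack slips through
    System.out.println(canonicalCheck(attack)); // false: rejected after normalization
  }
}
```

Parsing with HttpUrl and checking encodedPath() achieves the same effect, because HttpUrl canonicalizes the path while parsing.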
The java.net.URI class is strict around what URLs it accepts. It rejects URLs like "http://example.com/abc|def" because the '|' character is unsupported. This class is more forgiving: it will automatically percent-encode the '|', yielding "http://example.com/abc%7Cdef". This kind of behavior is consistent with web browsers. HttpUrl prefers consistency with major web browsers over consistency with obsolete specifications.
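The difference in strictness is easy to observe. The sketch below shows java.net.URI rejecting the '|' character, and a hypothetical parseLeniently helper that mimics the forgiving behaviour for this one character only; HttpUrl handles many more cases.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class LenientParseSketch {
  /** Returns true if java.net.URI accepts the input as written. */
  static boolean strictAccepts(String url) {
    try {
      new URI(url);
      return true;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  /** Percent-encodes '|' before parsing; other forbidden characters are not handled. */
  static URI parseLeniently(String url) {
    return URI.create(url.replace("|", "%7C"));
  }

  public static void main(String[] args) {
    String url = "http://example.com/abc|def";
    System.out.println(strictAccepts(url));  // false
    System.out.println(parseLeniently(url)); // http://example.com/abc%7Cdef
  }
}
```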
Neither of the built-in URL models offers direct access to path segments or query parameters. Manually using StringBuilder to assemble these components is cumbersome: do '+' characters get silently replaced with spaces? If a query parameter contains a '&', does that get escaped? By offering methods to read and write individual query parameters directly, application developers are saved from the hassles of encoding and decoding.
The URL (JDK1.0) and URI (Java 1.4) classes predate builders and instead use telescoping constructors. For example, there's no API to compose a URI with a custom port without also providing a query and fragment.
Instances of HttpUrl are well-formed and always have a scheme, host, and path. With java.net.URL it's possible to create an awkward URL like http:/ with scheme and path but no hostname. Building APIs that consume such malformed values is difficult!
This class has a modern API. It avoids punitive checked exceptions: get() throws IllegalArgumentException on invalid input, and parse() returns null if the input is an invalid URL. You can even be explicit about whether each component has been encoded already.
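The two entry points, one returning null and one throwing an unchecked exception, follow a pattern that can be sketched over java.net.URI. The names parseOrNull and getOrThrow are hypothetical, and the validity rule used here (http or https scheme plus a host) is a simplifying assumption, not HttpUrl's full definition of well-formed.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class ParseOrGetSketch {
  /** Returns null rather than throwing when the input is not an http/https URL with a host. */
  static URI parseOrNull(String input) {
    try {
      URI uri = new URI(input);
      String scheme = uri.getScheme();
      boolean web = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
      return web && uri.getHost() != null ? uri : null;
    } catch (URISyntaxException e) {
      return null; // the checked exception is absorbed here
    }
  }

  /** Same parsing, but signals invalid input with an unchecked exception. */
  static URI getOrThrow(String input) {
    URI uri = parseOrNull(input);
    if (uri == null) {
      throw new IllegalArgumentException("Not a well-formed HTTP or HTTPS URL: " + input);
    }
    return uri;
  }

  public static void main(String[] args) {
    System.out.println(parseOrNull("http:/"));              // null: no host
    System.out.println(getOrThrow("https://square.com/"));  // https://square.com/
  }
}
```

Callers that expect bad input, such as user-typed addresses, can branch on null; callers with trusted constants can use the throwing variant without a try block.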
Modifier and Type | Class and Description |
---|---|
static class | HttpUrl.Builder |
Modifier and Type | Method and Description |
---|---|
static int | defaultPort(String scheme) Returns 80 if scheme.equals("http"), 443 if scheme.equals("https") and -1 otherwise. |
String | encodedFragment() Returns this URL's encoded fragment, like "abc" for http://host/#abc. |
String | encodedPassword() Returns the password, or an empty string if none is set. |
String | encodedPath() Returns the entire path of this URL encoded for use in HTTP resource resolution. |
List<String> | encodedPathSegments() Returns a list of encoded path segments like ["a", "b", "c"] for the URL http://host/a/b/c. |
String | encodedQuery() Returns the query of this URL, encoded for use in HTTP resource resolution. |
String | encodedUsername() Returns the username, or an empty string if none is set. |
boolean | equals(Object other) |
String | fragment() Returns this URL's fragment, like "abc" for http://host/#abc. |
static HttpUrl | get(String url) Returns a new HttpUrl representing url. |
static HttpUrl | get(URI uri) |
static HttpUrl | get(URL url) |
int | hashCode() |
String | host() Returns the host address suitable for use with InetAddress.getAllByName(String). |
boolean | isHttps() |
HttpUrl.Builder | newBuilder() |
HttpUrl.Builder | newBuilder(String link) Returns a builder for the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed. |
static HttpUrl | parse(String url) Returns a new HttpUrl representing url if it is a well-formed HTTP or HTTPS URL, or null if it isn't. |
String | password() Returns the decoded password, or an empty string if none is present. |
List<String> | pathSegments() Returns a list of path segments like ["a", "b", "c"] for the URL http://host/a/b/c. |
int | pathSize() Returns the number of segments in this URL's path. |
int | port() Returns the explicitly-specified port if one was provided, or the default port for this URL's scheme. |
String | query() Returns this URL's query, like "abc" for http://host/?abc. |
String | queryParameter(String name) Returns the first query parameter named name decoded using UTF-8, or null if there is no such query parameter. |
String | queryParameterName(int index) Returns the name of the query parameter at index. |
Set<String> | queryParameterNames() Returns the distinct query parameter names in this URL, like ["a", "b"] for http://host/?a=apple&b=banana. |
String | queryParameterValue(int index) Returns the value of the query parameter at index. |
List<String> | queryParameterValues(String name) Returns all values for the query parameter name ordered by their appearance in this URL. |
int | querySize() Returns the number of query parameters in this URL, like 2 for http://host/?a=apple&b=banana. |
String | redact() Returns a string containing this URL with its username, password, query, and fragment stripped, and its path replaced with "/...". |
HttpUrl | resolve(String link) Returns the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed. |
String | scheme() Returns either "http" or "https". |
String | topPrivateDomain() Returns the domain name of this URL's host() that is one level beneath the public suffix by consulting the public suffix list. |
String | toString() |
URI | uri() Returns this URL as a java.net.URI. |
URL | url() Returns this URL as a java.net.URL. |
String | username() Returns the decoded username, or an empty string if none is present. |
public URL url()
Returns this URL as a java.net.URL.
public URI uri()
Returns this URL as a java.net.URI. Because URI is more strict than this class, the returned URI may be semantically different from this URL:
Characters forbidden by URI, like [ and |, will be escaped.
Invalid percent-encoded sequences, like %xx, will be encoded like %25xx.
These differences may have a significant consequence when the URI is interpreted by a webserver. For this reason the URI class and this method should be avoided.
public String scheme()
Returns either "http" or "https".
public boolean isHttps()
public String encodedUsername()
Returns the username, or an empty string if none is set.
URL | encodedUsername() |
---|---|
http://host/ | "" |
http://username@host/ | "username" |
http://username:password@host/ | "username" |
http://a%20b:c%20d@host/ | "a%20b" |
public String username()
Returns the decoded username, or an empty string if none is present.
URL | username() |
---|---|
http://host/ | "" |
http://username@host/ | "username" |
http://username:password@host/ | "username" |
http://a%20b:c%20d@host/ | "a b" |
public String encodedPassword()
Returns the password, or an empty string if none is set.
URL | encodedPassword() |
---|---|
http://host/ | "" |
http://username@host/ | "" |
http://username:password@host/ | "password" |
http://a%20b:c%20d@host/ | "c%20d" |
public String password()
Returns the decoded password, or an empty string if none is present.
URL | password() |
---|---|
http://host/ | "" |
http://username@host/ | "" |
http://username:password@host/ | "password" |
http://a%20b:c%20d@host/ | "c d" |
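public String host()
Returns the host address suitable for use with InetAddress.getAllByName(String). May be:
A regular host name, like android.com.
An IPv4 address, like 127.0.0.1.
An IPv6 address, like ::1. Note that there are no square braces.
An encoded IDN, like xn--n3h.net.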
URL | host() |
---|---|
http://android.com/ | "android.com" |
http://127.0.0.1/ | "127.0.0.1" |
http://[::1]/ | "::1" |
http://xn--n3h.net/ | "xn--n3h.net" |
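public int port()
Returns the explicitly-specified port if one was provided, or the default port for this URL's scheme. For example, this returns 8443 for https://square.com:8443/ and 443 for https://square.com/. The result is in [1..65535].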
URL | port() |
---|---|
http://host/ | 80 |
http://host:8000/ | 8000 |
https://host/ | 443 |
public static int defaultPort(String scheme)
Returns 80 if scheme.equals("http"), 443 if scheme.equals("https") and -1 otherwise.
public int pathSize()
Returns the number of segments in this URL's path, like 3 for http://host/a/b/c. This is always at least 1.
URL | pathSize() |
---|---|
http://host/ | 1 |
http://host/a/b/c | 3 |
http://host/a/b/c/ | 4 |
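public String encodedPath()
Returns the entire path of this URL encoded for use in HTTP resource resolution. The returned path will start with "/".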
URL | encodedPath() |
---|---|
http://host/ | "/" |
http://host/a/b/c | "/a/b/c" |
http://host/a/b%20c/d | "/a/b%20c/d" |
public List<String> encodedPathSegments()
Returns a list of encoded path segments like ["a", "b", "c"] for the URL http://host/a/b/c. This list is never empty though it may contain a single empty string.
URL | encodedPathSegments() |
---|---|
http://host/ | [""] |
http://host/a/b/c | ["a", "b", "c"] |
http://host/a/b%20c/d | ["a", "b%20c", "d"] |
public List<String> pathSegments()
Returns a list of path segments like ["a", "b", "c"] for the URL http://host/a/b/c. This list is never empty though it may contain a single empty string.
URL | pathSegments() |
---|---|
http://host/ | [""] |
http://host/a/b/c | ["a", "b", "c"] |
http://host/a/b%20c/d | ["a", "b c", "d"] |
@Nullable public String encodedQuery()
Returns the query of this URL, encoded for use in HTTP resource resolution.
URL | encodedQuery() |
---|---|
http://host/ | null |
http://host/? | "" |
http://host/?a=apple&k=key+lime | "a=apple&k=key+lime" |
http://host/?a=apple&a=apricot | "a=apple&a=apricot" |
http://host/?a=apple&b | "a=apple&b" |
@Nullable public String query()
Returns this URL's query, like "abc" for http://host/?abc. Most callers should prefer queryParameterName(int) and queryParameterValue(int) because these methods offer direct access to individual query parameters.
URL | query() |
---|---|
http://host/ | null |
http://host/? | "" |
http://host/?a=apple&k=key+lime | "a=apple&k=key lime" |
http://host/?a=apple&a=apricot | "a=apple&a=apricot" |
http://host/?a=apple&b | "a=apple&b" |
public int querySize()
Returns the number of query parameters in this URL, like 2 for http://host/?a=apple&b=banana. If this URL has no query this returns 0. Otherwise it returns one more than the number of "&" separators in the query.
URL | querySize() |
---|---|
http://host/ | 0 |
http://host/? | 1 |
http://host/?a=apple&k=key+lime | 2 |
http://host/?a=apple&a=apricot | 2 |
http://host/?a=apple&b | 2 |
@Nullable public String queryParameter(String name)
Returns the first query parameter named name decoded using UTF-8, or null if there is no such query parameter.
URL | queryParameter("a") |
---|---|
http://host/ | null |
http://host/? | null |
http://host/?a=apple&k=key+lime | "apple" |
http://host/?a=apple&a=apricot | "apple" |
http://host/?a=apple&b | "apple" |
public Set<String> queryParameterNames()
Returns the distinct query parameter names in this URL, like ["a", "b"] for http://host/?a=apple&b=banana. If this URL has no query this returns the empty set.
URL | queryParameterNames() |
---|---|
http://host/ | [] |
http://host/? | [""] |
http://host/?a=apple&k=key+lime | ["a", "k"] |
http://host/?a=apple&a=apricot | ["a"] |
http://host/?a=apple&b | ["a", "b"] |
public List<String> queryParameterValues(String name)
Returns all values for the query parameter name ordered by their appearance in this URL. For example this returns ["banana"] for queryParameterValues("b") on http://host/?a=apple&b=banana.
URL | queryParameterValues("a") | queryParameterValues("b") |
---|---|---|
http://host/ | [] | [] |
http://host/? | [] | [] |
http://host/?a=apple&k=key+lime | ["apple"] | [] |
http://host/?a=apple&a=apricot | ["apple", "apricot"] | [] |
http://host/?a=apple&b | ["apple"] | [null] |
public String queryParameterName(int index)
Returns the name of the query parameter at index. For example this returns "a" for queryParameterName(0) on http://host/?a=apple&b=banana. This throws if index is not less than the query size.
URL | queryParameterName(0) | queryParameterName(1) |
---|---|---|
http://host/ | exception | exception |
http://host/? | "" | exception |
http://host/?a=apple&k=key+lime | "a" | "k" |
http://host/?a=apple&a=apricot | "a" | "a" |
http://host/?a=apple&b | "a" | "b" |
public String queryParameterValue(int index)
Returns the value of the query parameter at index. For example this returns "apple" for queryParameterValue(0) on http://host/?a=apple&b=banana. This throws if index is not less than the query size.
URL | queryParameterValue(0) | queryParameterValue(1) |
---|---|---|
http://host/ | exception | exception |
http://host/? | null | exception |
http://host/?a=apple&k=key+lime | "apple" | "key lime" |
http://host/?a=apple&a=apricot | "apple" | "apricot" |
http://host/?a=apple&b | "apple" | null |
@Nullable public String encodedFragment()
Returns this URL's encoded fragment, like "abc" for http://host/#abc. This returns null if the URL has no fragment.
URL | encodedFragment() |
---|---|
http://host/ | null |
http://host/# | "" |
http://host/#abc | "abc" |
http://host/#abc|def | "abc|def" |
@Nullable public String fragment()
Returns this URL's fragment, like "abc" for http://host/#abc. This returns null if the URL has no fragment.
URL | fragment() |
---|---|
http://host/ | null |
http://host/# | "" |
http://host/#abc | "abc" |
http://host/#abc|def | "abc|def" |
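public String redact()
Returns a string containing this URL with its username, password, query, and fragment stripped, and its path replaced with "/...". For example, redacting "http://username:password@example.com/path" returns "http://example.com/...".
@Nullable public HttpUrl resolve(String link)
Returns the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed.
public HttpUrl.Builder newBuilder()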
@Nullable public HttpUrl.Builder newBuilder(String link)
Returns a builder for the URL that would be retrieved by following link from this URL, or null if the resulting URL is not well-formed.
@Nullable public static HttpUrl parse(String url)
Returns a new HttpUrl representing url if it is a well-formed HTTP or HTTPS URL, or null if it isn't.
public static HttpUrl get(String url)
Returns a new HttpUrl representing url.
Throws: IllegalArgumentException - If url is not a well-formed HTTP or HTTPS URL.
@Nullable public String topPrivateDomain()
Returns the domain name of this URL's host() that is one level beneath the public suffix by consulting the public suffix list. Returns null if this URL's host() is an IP address or is considered a public suffix by the public suffix list.
In general this method should not be used to test whether a domain is valid or routable. Instead, DNS is the recommended source for that information.
URL | topPrivateDomain() |
---|---|
http://google.com | "google.com" |
http://adwords.google.co.uk | "google.co.uk" |
http://square | null |
http://co.uk | null |
http://localhost | null |
http://127.0.0.1 | null |
Copyright © 2019. All rights reserved.