Regexp to parse URL

This is a great solution from the very popular Python library urllib3. One regexp splits a URL into scheme, authority, path, query and fragment:

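import re

_URI_RE = re.compile(
    r"^(?:([a-zA-Z][a-zA-Z0-9+.-]*):)?"  # group 1: scheme
    r"(?://([^\\/?#]*))?"                # group 2: authority
    r"([^?#]*)"                          # group 3: path
    r"(?:\?([^#]*))?"                    # group 4: query
    r"(?:#(.*))?$",                      # group 5: fragment
    re.UNICODE | re.DOTALL,
)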
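
As a rough illustration of how the five capture groups line up with the URL components, here is a minimal sketch. The helper name `split_url` and the sample URLs are made up for this example and are not part of urllib3.

```python
import re

# Same pattern as above, repeated so the snippet runs on its own.
_URI_RE = re.compile(
    r"^(?:([a-zA-Z][a-zA-Z0-9+.-]*):)?"
    r"(?://([^\\/?#]*))?"
    r"([^?#]*)"
    r"(?:\?([^#]*))?"
    r"(?:#(.*))?$",
    re.UNICODE | re.DOTALL,
)


def split_url(url):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Components that are absent from the URL come back as None
    (the path is an empty string when it is missing).
    """
    match = _URI_RE.match(url)
    return match.groups()


print(split_url("https://user@example.com:8080/a/b?x=1#top"))
# ('https', 'user@example.com:8080', '/a/b', 'x=1', 'top')

print(split_url("/just/a/path"))
# (None, None, '/just/a/path', None, None)
```

Note that the authority group still contains the user info and the port, so it needs a second pass to get the hostname.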

Regexp to parse the authority

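To determine the hostname, we need to match the authority (the part between "//" and the path) against another regexp. It captures the optional user info, the host and the optional port: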

_SUBAUTHORITY_PAT = ("^(?:(.*)@)?(%s|%s|%s)(?::([0-9]{0,5}))?$") % (
    _REG_NAME_PAT,
    _IPV4_PAT,
    _IPV6_ADDRZ_PAT,
)
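
To see the three groups in action, here is a minimal sketch under one loud assumption: the real `_IPV6_ADDRZ_PAT` depends on patterns that are not shown in this post, so the code below swaps in a much looser stand-in called `_SIMPLE_IPV6_PAT`. That name and the helper `split_authority` are invented for this example and are not urllib3 API.

```python
import re

_REG_NAME_PAT = r"(?:[^\[\]%:/?#]|%[a-fA-F0-9]{2})*"
_IPV4_PAT = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"

# ASSUMPTION: simplified stand-in for urllib3's _IPV6_ADDRZ_PAT.
# It accepts any bracketed run of hex digits, colons and dots,
# so it is far more permissive than the real pattern.
_SIMPLE_IPV6_PAT = r"\[[0-9A-Fa-f:.]+\]"

_SUBAUTHORITY_PAT = ("^(?:(.*)@)?(%s|%s|%s)(?::([0-9]{0,5}))?$") % (
    _REG_NAME_PAT,
    _IPV4_PAT,
    _SIMPLE_IPV6_PAT,
)
_SUBAUTHORITY_RE = re.compile(_SUBAUTHORITY_PAT)


def split_authority(authority):
    """Hypothetical helper: return (userinfo, host, port) or None if no match."""
    match = _SUBAUTHORITY_RE.match(authority)
    if match is None:
        return None
    return match.groups()


print(split_authority("user:pw@example.com:8080"))
# ('user:pw', 'example.com', '8080')

print(split_authority("[::1]:443"))
# (None, '[::1]', '443')

print(split_authority("host:abc"))
# None (the port must be digits only)
```

The port comes back as a string, and the IPv6 host keeps its square brackets, so a caller would still have to convert or strip them as needed.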

Hostname, IPv4 or IPv6

The regexps that match a DNS hostname, an IPv4 address or an IPv6 address are below (_IPV6_PAT and _ZONE_ID_PAT are not shown in this post):

_REG_NAME_PAT = r"(?:[^\[\]%:/?#]|%[a-fA-F0-9]{2})*"

_IPV4_PAT = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
_IPV4_RE = re.compile("^" + _IPV4_PAT + "$")

_IPV6_ADDRZ_PAT = r"\[" + _IPV6_PAT + r"(?:" + _ZONE_ID_PAT + r")?\]"
_IPV6_ADDRZ_RE = re.compile("^" + _IPV6_ADDRZ_PAT + "$")
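
A quick sketch of what the IPv4 pattern does and does not check. The helper name `looks_like_ipv4` is made up for this example. The pattern only tests the dotted-quad shape, so an out-of-range octet such as 999 still passes.

```python
import re

_IPV4_PAT = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
_IPV4_RE = re.compile("^" + _IPV4_PAT + "$")


def looks_like_ipv4(host):
    """Hypothetical helper: True if host has four dot-separated groups of 1-3 digits.

    Octet ranges (0-255) are NOT validated by this pattern.
    """
    return _IPV4_RE.match(host) is not None


print(looks_like_ipv4("192.168.0.1"))  # True
print(looks_like_ipv4("example.com"))  # False
print(looks_like_ipv4("999.1.1.1"))    # True: shape matches, range is not checked
```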

#url #regex
May 17, 2021
by Ivan Borshchov