Regexp to parse URL


This is a great solution from urllib3, a very popular Python library:

import re

# Groups, in order: scheme, authority, path, query, fragment
_URI_RE = re.compile(
    r"^(?:([a-zA-Z][a-zA-Z0-9+.-]*):)?"
    r"(?://([^\\/?#]*))?"
    r"([^?#]*)"
    r"(?:\?([^#]*))?"
    r"(?:#(.*))?$",
    re.UNICODE | re.DOTALL,
)
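To see what the five groups capture, here is a minimal sketch that applies the pattern above to a sample URL. The helper name `split_url` and the sample URLs are made up for illustration and are not part of urllib3.

```python
import re

# Same pattern as quoted in the post.
_URI_RE = re.compile(
    r"^(?:([a-zA-Z][a-zA-Z0-9+.-]*):)?"
    r"(?://([^\\/?#]*))?"
    r"([^?#]*)"
    r"(?:\?([^#]*))?"
    r"(?:#(.*))?$",
    re.UNICODE | re.DOTALL,
)


def split_url(url):
    """Hypothetical helper: return (scheme, authority, path, query, fragment).

    Parts that are absent from the URL come back as None.
    """
    match = _URI_RE.match(url)
    if match is None:
        raise ValueError("not a URL: %r" % (url,))
    return match.groups()


print(split_url("https://user@example.com:8080/a/b?x=1#top"))
# ('https', 'user@example.com:8080', '/a/b', 'x=1', 'top')

print(split_url("/just/path"))
# (None, None, '/just/path', None, None)
```

Note that the authority comes back as one string (`user@example.com:8080`), so the hostname still has to be separated from the user info and the port.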


To determine the hostname, we need to match the authority (the second group captured above) against another regexp, which splits it into user info, host, and port:

_SUBAUTHORITY_PAT = ("^(?:(.*)@)?(%s|%s|%s)(?::([0-9]{0,5}))?$") % (
    _REG_NAME_PAT,
    _IPV4_PAT,
    _IPV6_ADDRZ_PAT,
)
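The following is a rough sketch of how the authority can be split with this pattern. The reg-name and IPv4 patterns are the ones quoted in this post. The IPv6 part is an assumption: `_SIMPLE_IPV6_PAT` is a deliberately simplified stand-in for `_IPV6_ADDRZ_PAT` that only checks for hex digits, colons, and dots inside brackets. The helper `split_authority` is hypothetical as well.

```python
import re

# Host patterns quoted from the post.
_REG_NAME_PAT = r"(?:[^\[\]%:/?#]|%[a-fA-F0-9]{2})*"
_IPV4_PAT = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"

# ASSUMPTION: simplified stand-in for the bracketed IPv6 pattern.
# It does not validate the address, it only recognizes the bracket form.
_SIMPLE_IPV6_PAT = r"\[[0-9A-Fa-f:.]+\]"

_SUBAUTHORITY_RE = re.compile(
    r"^(?:(.*)@)?(%s|%s|%s)(?::([0-9]{0,5}))?$"
    % (_REG_NAME_PAT, _IPV4_PAT, _SIMPLE_IPV6_PAT)
)


def split_authority(authority):
    """Hypothetical helper: return (userinfo, host, port)."""
    match = _SUBAUTHORITY_RE.match(authority)
    if match is None:
        raise ValueError("invalid authority: %r" % (authority,))
    userinfo, host, port = match.groups()
    # The port group may be missing (None) or empty ("example.com:").
    return userinfo, host, int(port) if port else None


print(split_authority("user@example.com:8080"))  # ('user', 'example.com', 8080)
print(split_authority("[::1]:443"))              # (None, '[::1]', 443)
print(split_authority("example.com"))            # (None, 'example.com', None)
```

The port group allows at most five digits, so an authority such as `example.com:999999` does not match at all and the sketch raises `ValueError`.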


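The regexps that match a DNS hostname, an IPv4 address, or an IPv6 address are below (_IPV6_PAT and _ZONE_ID_PAT are defined elsewhere in urllib3 and are not reproduced here):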

_REG_NAME_PAT = r"(?:[^\[\]%:/?#]|%[a-fA-F0-9]{2})*"

_IPV4_PAT = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
_IPV4_RE = re.compile("^" + _IPV4_PAT + "$")

_IPV6_ADDRZ_PAT = r"\[" + _IPV6_PAT + r"(?:" + _ZONE_ID_PAT + r")?\]"
_IPV6_ADDRZ_RE = re.compile("^" + _IPV6_ADDRZ_PAT + "$")
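One thing worth knowing about `_IPV4_PAT`: it only checks the shape of the address (four dot-separated groups of one to three digits), not that each group is in the range 0-255. A small sketch of the difference follows, using the standard `ipaddress` module for the range check. The helper names `looks_like_ipv4` and `is_valid_ipv4` are invented for this example.

```python
import ipaddress
import re

# IPv4 pattern quoted from the post.
_IPV4_PAT = r"(?:[0-9]{1,3}\.){3}[0-9]{1,3}"
_IPV4_RE = re.compile("^" + _IPV4_PAT + "$")


def looks_like_ipv4(host):
    """Shape check only: four dot-separated groups of 1-3 digits."""
    return _IPV4_RE.match(host) is not None


def is_valid_ipv4(host):
    """Shape check plus a range check done by the standard library."""
    if not looks_like_ipv4(host):
        return False
    try:
        ipaddress.IPv4Address(host)
    except ValueError:
        return False
    return True


print(looks_like_ipv4("192.168.0.1"))      # True
print(looks_like_ipv4("999.999.999.999"))  # True: the regexp checks digits only
print(is_valid_ipv4("999.999.999.999"))    # False: octets are out of range
print(looks_like_ipv4("example.com"))      # False
```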

#url #regex
May 17, 2021
by Ivan Borshchov