Vancouver Web Consultants

July 10, 2009

Validating a URI with PHP

Filed under: PHP. Posted by Nelson @ 4:34 pm

If you find yourself needing to validate a URI, not just checking that it’s well formed with a regular expression like this:

/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i

…but actually verifying that the URI points to a functioning web page, the following code will do the trick. It first makes sure the URL has a scheme and runs it through PHP’s parse_url(), then uses cURL to send a header-only request, following any redirects (10 maximum), and checks that the final response has a 200 status. I’ve tried a number of different methods, but this one seems to work the best. I ran into problems where an OpenDNS wildcard was causing all my bad URIs to return a status 200, so the code also looks for the term “opendns” in the response and returns false if it appears. Here’s the code:

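function validate_url($url) {
	if (empty($url)) { return false; }
	// Prepend a scheme only when none is present (covers http and https).
	$url = preg_match('/^https?:\/\//i', $url) ? $url : 'http://' . $url;
	// Reject anything parse_url() cannot split into at least a host.
	$parts = parse_url($url);
	if ($parts === false || empty($parts['host'])) { return false; }

	$ch = curl_init();
	curl_setopt($ch, CURLOPT_URL, $url);            // request the full URL, not just the host
	curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in the output
	curl_setopt($ch, CURLOPT_NOBODY, true);         // headers only, no response body
	curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the output instead of printing it
	curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
	curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
	$data = curl_exec($ch);
	curl_close($ch);
	if (!$data) { return false; }
	// An OpenDNS wildcard answers bad hosts with a 200, so treat it as a failure.
	if (stristr($data, 'opendns')) { return false; }
	// One status line per hop; the last one belongs to the final response.
	preg_match_all('/HTTP\/\d(?:\.\d)?\s(\d{3})/', $data, $matches);
	$code = end($matches[1]);
	return $code == 200;
}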
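
Because cURL follows redirects with CURLOPT_HEADER switched on, the returned string holds one block of headers per hop, and the status that matters is the last one. Here is a minimal sketch of just that parsing step, run against a made-up header dump so it needs no network access. The helper name last_status_code() and the sample headers are my own, for illustration only; they are not part of the function above.

```php
<?php
// Hypothetical helper: returns the status code of the last response found
// in a raw header dump, or null when no status line is present.
function last_status_code($headers) {
    // Matches "HTTP/1.0 200" and "HTTP/1.1 301", and also "HTTP/2 200".
    if (!preg_match_all('/^HTTP\/\d(?:\.\d)?\s+(\d{3})/m', $headers, $matches)) {
        return null;
    }
    // $matches[1] holds one code per hop; the final response comes last.
    return (int) end($matches[1]);
}

// Made-up header dump: one redirect followed by the final response.
$sample = "HTTP/1.1 301 Moved Permanently\r\n"
        . "Location: http://www.example.com/\r\n"
        . "\r\n"
        . "HTTP/1.1 200 OK\r\n"
        . "Content-Type: text/html\r\n"
        . "\r\n";

var_dump(last_status_code($sample)); // int(200)
```

Anchoring the pattern to the start of a line keeps it from matching something that merely looks like a status line inside a header value.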

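For comparison, the cheaper well-formedness check from the top of the post can be wrapped in a small function of its own. This is only a sketch: the name is_well_formed_url() is hypothetical, and the pattern is the one quoted above, unchanged.

```php
<?php
// Hypothetical helper: true when $url matches the pattern quoted in the post.
// It only checks the shape of the string and never touches the network.
function is_well_formed_url($url) {
    $pattern = '/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i';
    return preg_match($pattern, $url) === 1;
}

var_dump(is_well_formed_url('http://www.example.com/'));    // bool(true)
var_dump(is_well_formed_url('https://example.com:8080/a')); // bool(true)
var_dump(is_well_formed_url('www.example.com'));            // bool(false): no scheme
var_dump(is_well_formed_url('http://localhost/'));          // bool(false): host has no dot
```

Note that the pattern is anchored only at the start, so it accepts anything after the host and optional port. It says nothing about whether the page exists, which is the gap validate_url() fills.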