Extracting URL Data Like Facebook

When you share a link on Facebook, a live preview snippet appears below it. Have you ever tried to code this functionality yourself? This tutorial shows how to do it by extracting data from a URL and previewing it below the textarea, like Facebook does.

Preparation

You are going to create two files (for example, url-extract.html and get-contents.php). Include the jQuery library in your HTML file, set up an empty document.ready function, and then add the basic HTML markup shown below.


<!DOCTYPE html>
<html>
<head>
 <title>URL</title>
 <script type="text/javascript" src="https://code.jquery.com/jquery-1.11.3.min.js"></script>
 <script type="text/javascript">
 $(document).ready(function () {
 });
 </script>
</head>
<body>
<div id="container">
 <h2>Extracting URL Data Like FB</h2>
 <div id="loading"></div>
 <div class="clear"><br></div>
 <textarea id="url"></textarea>
 <div class="clear"><br></div>
 <div id="result"></div>
</div>
</body>
</html>

Getting Contents of URL using AJAX

Using the jQuery .keyup() function, we read the value of the textarea every time a key is released.

Then we filter the URL out of the string using a regular expression. Once we have it, we make an Ajax request to get the contents of the URL.

Because of the browser's same-origin policy, an XMLHttpRequest generally cannot read pages from an external domain (unless that site explicitly allows it), so we have to fetch the contents of the URL with PHP instead.



	$(document).ready(function () {
		$("#url").keyup(function () {
			//selecting string from the textarea
			var content = $(this).val();
			//regular expression to match URL from the string
			var regex = /(\b((https?|ftp|file):\/\/)?[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

			//filtering URLs from the string; match() returns an array, or null if nothing matched
			var eURL = content.match(regex);
			if(eURL){
				//showing loading icon
				if($("#loading").hasClass("loaded") == false){
					$("#loading").html("<img src='loading.gif'>");
				}

				//making an ajax request with the first matched URL, encoded for the query string
				$.ajax({
					url: "get-contents.php?url="+encodeURIComponent(eURL[0]),
					async: true,
					success: function(data) {
						//going to do something
					}
				});
			}
		});
	});
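
Because the protocol is optional in the regular expression above, it can also match plain words that are not URLs. As a rough sketch of a stricter filter (the helper name findFirstUrl and its pattern are assumptions for illustration, not part of the original code), one could require either a protocol, a leading "www.", or a "name.tld" shape:

```javascript
// Hypothetical helper: returns the first URL-like token in the text, or null.
function findFirstUrl(text) {
  // First alternative: explicit protocol or "www." prefix, then anything up to whitespace.
  // Second alternative: a bare host such as "example.com", with an optional path.
  var pattern = /\b(?:(?:https?|ftp):\/\/|www\.)[^\s<>"']+|\b[a-z0-9-]+(?:\.[a-z0-9-]+)*\.[a-z]{2,}(?:\/[^\s<>"']*)?/i;
  var match = text.match(pattern);
  return match ? match[0] : null;
}
```

Inside the keyup handler this could be used in place of content.match(regex), skipping the Ajax request whenever the function returns null.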

get-contents.php

We add "http://" in front of the URL if it has no protocol.

Then we use the file_get_contents() function to fetch the contents of the URL (this requires allow_url_fopen to be enabled in the PHP configuration) and print the output, which is received by the Ajax request.


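<?php
//reading the URL sent by the Ajax request
$gotURL = isset($_GET['url']) ? trim($_GET['url']) : '';

//stopping early if no URL was sent
if ($gotURL === '') {
 http_response_code(400);
 exit;
}

//adding http if URL has no protocol
if (!preg_match("~^(?:f|ht)tps?://~i", $gotURL)) {
 $checkedURL = "http://" . $gotURL;
}else{
 $checkedURL = $gotURL;
}

//fetching content of the URL (false is returned on failure)
$content = @file_get_contents($checkedURL);
if ($content === false) {
 http_response_code(502);
 exit;
}

//printing the fetched content
echo $content;
?>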
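
The protocol check in get-contents.php can also be mirrored in JavaScript, for instance to show a normalized link in the preview. This is only a sketch; ensureProtocol is a hypothetical helper name that does not appear in the original code:

```javascript
// Hypothetical helper: prefixes "http://" when the URL has no protocol.
function ensureProtocol(url) {
  // Same pattern as the PHP check: ftp://, ftps://, http:// or https://
  if (/^(?:f|ht)tps?:\/\//i.test(url)) {
    return url;
  }
  return "http://" + url;
}
```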

Filtering Data From Ajax Response

Inside the Ajax success function, we parse the response with jQuery's $.parseHTML() function and wrap the result in a jQuery object, stored in a variable called "source", so that we can run selectors against it.

Then we convert the matched URL to a string using toString() so that we can call split() on it later.

Now we work out the domain from the URL and select the og:image, title, description and author name from the source, as shown below.


	//creating a base selector with the response data
	var source = $($.parseHTML(data));

	//converting the first matched URL to a string
	var sURL = eURL[0].toString();

	//checking whether the string has any protocol and splitting it by '/'
	var split = sURL.split('/');
	var splitted = (sURL.indexOf("://") > -1) ? split[2] : split[0];

	//removing port no if any
	var portsplit = splitted.split(':');

	//if you want to remove www like facebook
	var wwwsplit = portsplit[0].split('.');
	if(wwwsplit[0] == 'www'){
		wwwsplit.shift();
	}
	//joining the remaining parts, so two-part hosts like example.com also work
	var domain = wwwsplit.join('.');

	//select URL from og:image meta tag
	var imageURL = source.filter('meta[property="og:image"]').attr('content');

	var image = '';
	var bbn = '';
	if(typeof imageURL === "undefined"){
		//if no og:image meta tag found
		bbn = "class = 'bbn'";
	}else if(imageURL.indexOf("://") > -1){
		//if URL has protocol in it
		image = "<img src='"+imageURL+"' id='tImg'>";
	}else{
		//relative path: prefixing it with the domain (computed above)
		image = "<img src='http://"+domain+"/"+imageURL+"' id='tImg'>";
	}

	//getting the title
	var title = source.filter("title").text();
	//getting the meta description (empty string if the tag is missing)
	var desc = source.filter("meta[name=description]").attr("content") || '';

	//selecting author name from anchor tag with author rel
	var hasAuthor = source.find("a[rel=author]").text();
	if(hasAuthor ==''){
		hasAuthor = source.find("link[rel=author]").text();
	}
	//if author name found
	if(hasAuthor != ''){
		var author = "<i></i>by "+hasAuthor;
	}else{
		var author = '';
	}
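
The domain handling above can be packed into a small standalone function, which makes it easier to try with different inputs. The following is a sketch under the same assumptions as the snippet (the name extractDomain is hypothetical), with the host labels joined back together so that two-part hosts such as example.com also work:

```javascript
// Hypothetical helper: returns the host of a URL without port and leading "www".
function extractDomain(url) {
  var parts = url.split("/");
  // With a protocol, the host is the third piece: "http:", "", "host"
  var host = url.indexOf("://") > -1 ? parts[2] : parts[0];
  // Drop the port number, if any
  host = host.split(":")[0];
  // Drop a leading "www" label, like Facebook does
  var labels = host.split(".");
  if (labels[0] === "www") {
    labels.shift();
  }
  return labels.join(".");
}
```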

Showing Filtered Data

Now, after hiding the loading icon, we use the .html() function to show all the filtered values.


//setting a class name for loading div to prevent it from loading once it is loaded
$("#loading").html("").addClass("loaded");
//Insert all the data as HTML inside of #result div
$( "#result" ).html("<div id='thumbnail' "+bbn+"><img src='close.png' width='10px' id='remove'>"+image+"</div><div id='texts'><div id='title'><span>"+title+"</span></div> <div id='desc'><span>"+desc+"</span></div> <div id='meta'><div id='domain'>"+domain+"</div><div id='author'>"+author+"</div><div class='clear'></div></div></div>");
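
Note that the title, description and author come from an external page and are inserted into the markup as-is, so any HTML inside them would be rendered. One possible safeguard, sketched here with a hypothetical escapeHtml helper that is not part of the original code, is to escape those values before building the string:

```javascript
// Hypothetical helper: escapes the characters that are special in HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must run first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```

It could then be applied where the values are concatenated, for example "<span>"+escapeHtml(title)+"</span>".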

That's it. Now you can add custom CSS to style it to your taste. You can also add a way to remove the preview. The final code of "url-extract.html" will look like this:


<!DOCTYPE html>
<html>
<head>
 <title>URL</title>
 <script type="text/javascript" src="https://code.jquery.com/jquery-1.11.3.min.js"></script>
 <script type="text/javascript">
 $(document).ready(function () {
   $("#url").keyup(function () {
     //selecting string from the textarea
     var content = $(this).val();
     //regular expression to match URL from the string
     var regex = /(\b((https?|ftp|file):\/\/)?[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

     //filtering URLs from the string; match() returns an array, or null if nothing matched
     var eURL = content.match(regex);
     if(eURL){

       //checking whether the preview was cleared previously
       if($("#result").hasClass("cleared") == false){

         if($("#loading").hasClass("loaded") == false){
           $("#loading").html("<img src='loading.gif'>");
         }

         //making an ajax request with the first matched URL, encoded for the query string
         $.ajax({
           url: "get-contents.php?url="+encodeURIComponent(eURL[0]),
           async: true,
           success: function(data) {

             //creating a base selector with the response data
             var source = $($.parseHTML(data));

             //converting the first matched URL to a string
             var sURL = eURL[0].toString();

             //checking whether the string has any protocol and splitting it by '/'
             var split = sURL.split('/');
             var splitted = (sURL.indexOf("://") > -1) ? split[2] : split[0];

             //removing port no if any
             var portsplit = splitted.split(':');

             //if you want to remove www like facebook
             var wwwsplit = portsplit[0].split('.');
             if(wwwsplit[0] == 'www'){
               wwwsplit.shift();
             }
             //joining the remaining parts, so two-part hosts like example.com also work
             var domain = wwwsplit.join('.');

             //select URL from og:image meta tag
             var imageURL = source.filter('meta[property="og:image"]').attr('content');

             var image = '';
             var bbn = '';
             if(typeof imageURL === "undefined"){
               //if no og:image meta tag found
               bbn = "class = 'bbn'";
             }else if(imageURL.indexOf("://") > -1){
               //if URL has protocol in it
               image = "<img src='"+imageURL+"' id='tImg'>";
             }else{
               //relative path: prefixing it with the domain (computed above)
               image = "<img src='http://"+domain+"/"+imageURL+"' id='tImg'>";
             }

             //getting the title
             var title = source.filter("title").text();
             //getting the meta description (empty string if the tag is missing)
             var desc = source.filter("meta[name=description]").attr("content") || '';

             //selecting author name from anchor tag with author rel
             var hasAuthor = source.find("a[rel=author]").text();
             if(hasAuthor ==''){
               hasAuthor = source.find("link[rel=author]").text();
             }
             //if author name found
             if(hasAuthor != ''){
               var author = "<i></i>by "+hasAuthor;
             }else{
               var author = '';
             }
             //setting a class name for loading div to prevent it from loading once it is loaded
             $("#loading").html("").addClass("loaded");
             //Insert all the data as HTML inside of #result div
             $( "#result" ).html("<div id='thumbnail' "+bbn+"><img src='close.png' width='10px' id='remove'>"+image+"</div><div id='texts'><div id='title'><span>"+title+"</span></div> <div id='desc'><span>"+desc+"</span></div> <div id='meta'><div id='domain'>"+domain+"</div><div id='author'>"+author+"</div><div class='clear'></div></div></div>");
             //If remove is clicked all URL data will be removed
             $("#remove").click(function () {
               $("#result").html("").addClass("cleared");
             });
           }
         });
       }
     }else{
       //no URL in the textarea: resetting the preview and the loading icon
       $("#result").html("").removeClass("cleared");
       $("#loading").html("").removeClass("loaded");
     }

   });
 });
 </script>

 <style type="text/css">
 #container{
   width: 470px;
   margin: 30px auto;
   font-family: Georgia, 'lucida grande',tahoma,verdana,arial,sans-serif;
 }
 h2{
   float: left;
   margin: 0;
 }
 .clear{
   clear: both;
 }
 .bbn{
   border-bottom: none !important;
 }
 #loading{
   float: right;
   width: 15px;
 }
 #loading img{
   width: 100%;
 }
 #url{
   float: left;
   width: 99%;
 }
 #remove{
   float: right;
   width:10px;
   position: relative;
   z-index: 1;
   cursor: pointer;
 }

 #thumbnail{
   border: 1px solid #cccccc;
   background: #f7f7f7;
 }
 #thumbnail #tImg{
   width: 100%;
   margin-top: -10px;
   position: relative;
   z-index: 0;
 }
 #texts{
   border: 1px solid #cccccc;
   border-top: none;
   padding: 2%;
 }
 #title{
   font-size: 20px;
   font-weight: bold;
   cursor: pointer;
   margin-bottom: 10px;
 }
 #desc{
   cursor: pointer;
   font-size: 14px;
   margin-bottom: 10px;
 }

 #title span:hover, #desc span:hover{
   background: #ffff99;
 }
 #domain{
   margin-right: 5px;
 }
 #author, #domain{
   float: left;
   text-transform: uppercase;
   font-size: 11px;
   color: #9197a3;
 }
 i{
   margin-right: 5px;
   border-left: 1px solid #9197a3;
 }
 </style>
</head>
<body>
<div id="container">
 <h2>URL Data Extraction Like FB</h2><div id="loading"></div>
 <div class="clear"><br></div>
 <textarea id="url"></textarea>
 <div class="clear"><br></div>
 <div id="result"></div>
 <div id="data"></div>
</div>
</body>
</html>

Now you have learnt how to extract URL data like Facebook. This is a simple version, though, and we could add more functionality, such as live editing of the title and description and a thumbnail selection option. For this tutorial we took the image from the og:image meta tag, but some sites don't use it, so how do we get images for the snippet?

I will post another part of this tutorial with more functionality, like in the Facebook timeline.


Author: Anand Roshan

An entrepreneur, programmer and a passionate artist who loves to work independently.
