Regular Expressions, the Faster, Simpler Alternative

Sometime back I came across the following snippet:

function containsBadChars(str){
      if ((str.indexOf("\#") != -1) ||
          (str.indexOf("\&") != -1) ||
          (str.indexOf("\=") != -1) ||
          (str.indexOf("\\") != -1) ||
          (str.indexOf("\?") != -1) ||
          (str.indexOf("\:") != -1) ||
          (str.indexOf("\;") != -1) ||
          (str.indexOf("\'") != -1) ||
          (str.indexOf("\"") != -1) ||
          (str.indexOf("\[") != -1) ||
          (str.indexOf("\]") != -1) ||
          (str.indexOf("\{") != -1) ||
          (str.indexOf("\}") != -1)){
              return true;
          } else{
              return false;
          }
       };

This functions checks the existence of any of the following characters ‘#’, ‘&’, ‘=’, ‘\\’, ‘?’, ‘:’, ‘;’, ‘\”, ‘”‘, ‘[‘, ‘]’, ‘{‘, ‘}’ and returns true if found.

Using Regular Expressions we can modify this function in a more elegant one like this:

function containsBadCharsRegex(str){
      return /[\#\&\=\\\?\:\;\'\"\[\]\{\}]/.test(str);
};

Every time I talk about elegance with programmers they point out the performance gain/loss issue. A valid concern. So I did a performance test. Here is the code I used:

<html>
<head>
<title>Regex Demo</title>
<head>
   
<script>
    //in-elegant but functionally operational function
    function containsBadChars(str){
      if ((str.indexOf("\#") != -1) || 
          (str.indexOf("\&") != -1) || 
          (str.indexOf("\=") != -1) || 
          (str.indexOf("\\") != -1) ||
          (str.indexOf("\?") != -1) || 
          (str.indexOf("\:") != -1) || 
          (str.indexOf("\;") != -1) || 
          (str.indexOf("\'") != -1) || 
          (str.indexOf("\"") != -1) || 
          (str.indexOf("\[") != -1) || 
          (str.indexOf("\]") != -1) || 
          (str.indexOf("\{") != -1) || 
          (str.indexOf("\}") != -1)){
              return true; 
          } else{ 
              return false;
          }
       };
       
    //elegant and functionally operational function   
    function containsBadCharsRegex(str){
      return /[\#\&\=\\\?\:\;\'\"\[\]\{\}]/.test(str);
    };
    
    //List of test characters
    //Last four are valid characters
    //first 13 bad characters
    var testCharacters = ['#', '&', '=', '\\', '?', ':', ';', '\'', '"', '[', ']', '{', '}', '$', 'H', '1', '*']; 
   
    console.log('Testing the long way:');
    console.profile() 
    //check each character
    for(var i in testCharacters){
      //create the test string
      var str = "The quick brown fox quickly jumped over the lazy dog. This is the " + testCharacters[i] + " for testing";
      var isValid = containsBadChars(str);
      //comment for profiling
      console.log('Long Way. Character: ' + testCharacters[i] + ' isValid: ' + isValid);
    }
    console.profileEnd()
    
    console.log('Testing the short(regex) way');
    console.profile() 
    for(var i in testCharacters){
      var str = "The quick brown fox quickly jumped over the lazy dog. This is the " + testCharacters[i] + " for testing";
      var isValid = containsBadCharsRegex(str);
      //comment for profiling
      console.log('Short(Regex) Way. Character: ' + testCharacters[i] + ' isValid: ' + isValid);
    }
    console.profileEnd()
    
</script>
</head>
<body>
</body>
</html>

First we need to make sure that they are functionally identical.

Running the program gives the following console log output:

Testing the long way:
Long Way. Character: # isValid: true
Long Way. Character: & isValid: true
Long Way. Character: = isValid: true
Long Way. Character: \ isValid: true
Long Way. Character: ? isValid: true
Long Way. Character: : isValid: true
Long Way. Character: ; isValid: true
Long Way. Character: ‘ isValid: true
Long Way. Character: ” isValid: true
Long Way. Character: [ isValid: true
Long Way. Character: ] isValid: true
Long Way. Character: { isValid: true
Long Way. Character: } isValid: true
Long Way. Character: $ isValid: false
Long Way. Character: H isValid: false
Long Way. Character: 1 isValid: false
Long Way. Character: * isValid: false

Testing the short(regex) way:
Short(Regex) Way. Character: # isValid: true
Short(Regex) Way. Character: & isValid: true
Short(Regex) Way. Character: = isValid: true
Short(Regex) Way. Character: \ isValid: true
Short(Regex) Way. Character: ? isValid: true
Short(Regex) Way. Character: : isValid: true
Short(Regex) Way. Character: ; isValid: true
Short(Regex) Way. Character: ‘ isValid: true
Short(Regex) Way. Character: ” isValid: true
Short(Regex) Way. Character: [ isValid: true
Short(Regex) Way. Character: ] isValid: true
Short(Regex) Way. Character: { isValid: true
Short(Regex) Way. Character: } isValid: true
Short(Regex) Way. Character: $ isValid: false
Short(Regex) Way. Character: H isValid: false
Short(Regex) Way. Character: 1 isValid: false
Short(Regex) Way. Character: * isValid: false

They are functionally equivalent. Now on to performance.

After commenting out the console.log statements the profiler gives the following output:

Testing the long way:
Profile (0.143ms, 17 calls)
containsBadChars 17 100% 0.143ms 0.143ms 0.008ms 0.002ms 0.067ms

Testing the short(regex) way:
Profile (0.047ms, 17 calls)
containsBadCharsRegex 17 100% 0.047ms 0.047ms 0.003ms 0.002ms 0.006ms

As you can see the Regex way is about 3 times faster.

Happy programming!

jQuery Document Ready…use it wisely!

jQuery has done an excellent job with the document .ready() function which allows us to accurately know when the DOM has been loaded so that we can begin processing our JavaScript code. Prior to this we had to rely on the windows onLoad event which tends to occur much later in the load cycle.

Usually, one ready handler per page or view is used as the document ready handler will parse/execute the code once the DOM has loaded. Having this handler process code not relevant to the document at hand would be wasting resources.

Unfortunately I have noticed that this facility is used freely and loosely, even when sometimes it is not required, under the guise of “just to be on the safe side”. Not using this function properly has an impact on the performance of the application.

Consider a very simple example. We have two JS files test1.js and test2.js which are respectively included in test1.php and test2.php. These files are shown below:

test1.php

<html>
<head>
   <title>Test 1</title>
   
   <script src="jquery-1.4.2.js"></script>   
   <script src="test1.js"></script>
</head>
<body>

<div id="testDiv1">
<?php
$rows = 100;
echo    '<table>';
for($i = 0; $i < $rows; $i++){
	echo "<tr>";
	echo " <td>{$i}</td>";
	echo "</tr>";
}
echo '</table>';
?>
</div>
</body>
</html>

and the JS include file test1.js:

(function($) {
   $(document).ready(function(event) {
      console.log('test1.js - Number of tr elements: ' + $('#testDiv1 table tr').length);    
      
      //assume event bindings
      
      //assume processing code
      
   });
})(jQuery);

When run, the php file prints out a table structure with 100 rows. When the DOM is loaded(the table structure in our case) the ready event fires and prints out a message on the console:
test1.js – Number of tr elements: 100

test2.php and test2.js are similar, except this one prints out the number of ‘td’ elements instead of tr.

test2.php

<html>
<head>
    <title>Test 2</title>
   
   <script src="jquery-1.4.2.js"></script>   
   <script src="test2.js"></script>
</head>
<body>

<div id="testDiv2">
<?php
$rows = 100;
echo    '<table>';
for($i = 0; $i < $rows; $i++){
	echo "<tr>";
	echo " <td>{$i}</td>";
	echo "</tr>";
}
echo '</table>';
?>
</div>
</body>
</html>

and the JS include file test2.js:

(function($) {
   $(document).ready(function(event) {
      console.log('test2.js - Number of td elements: ' + $('#testDiv2 table td').length);    
      
      //assume event bindings
      
      //assume processing code
      
   });
})(jQuery);

When run the console prints out:
test2.js – Number of td elements: 100

As long as each JS file with the ready function is included in its respective host file or view everything works as it should without any overlap or conflict. But life is not so simple.

To reduce page load times it is customary to join many files into one and then minify the resulting file. This takes away the performance degradation due to latency issues as only one file is loaded compared to many.

By the same token it is also advisable that JS code specific to a view or a page be loaded per page while common code be bundled together and minified. This leads to a highly efficient load which does not tax the resources and also keeps the latency issues to a minimum.

Recently I came across an open source project based on the MVC pattern which joined 55 JS files, with each file having its document ready function. This bundle included common and specific files. Let me illustrate to you how much of a performance impact this can make. Let’s join test1.js and test2.js into one file and call it test.js. I have not minified this file for the sake of clarity.

test.js

(function($) {
   $(document).ready(function(event) {
      console.log('test1.js - Number of tr elements: ' + $('#testDiv1 table tr').length);    
      
      //assume event bindings
      
      //assume processing code
      
   });
})(jQuery);

(function($) {
   $(document).ready(function(event) {
      console.log('test2.js - Number of td elements: ' + $('#testDiv2 table td').length);    
      
      //assume event bindings
      
      //assume processing code
      
   });
})(jQuery);

Also let’s modify test1.php to include this file:
test1.php

<html>
<head>
   <title>Test with joined JS file</title>
   
   <script src="jquery-1.4.2.js"></script>   
   <script src="test.js"></script>
</head>
<body>

<div id="testDiv1">
<?php
$rows = 100;
echo    '<table>';
for($i = 0; $i < $rows; $i++){
	echo "<tr>";
	echo " <td>{$i}</td>";
	echo "</tr>";
}
echo '</table>';
?>
</div>
</body>
</html>

If we now run this we get the following on the console:
Console [5]=
test1.js – Number of tr elements: 100

Console [6]=
test2.js – Number of td elements: 0

Notice that there are two document ready handlers in test.js, and both ran. This is because when the DOM loads they cannot distinguish a relevant file from an irrelevant one. The first message is valid as it relates to test1.php but the second is not, as it relates to test2.php. This means the effort put in by the JS engine for the second handler was a waste.

Usually the document ready handler does not contain a simple log message, but a slew of event bindings and other JS code. Some of the files in the aforementioned project had close to a dozen event bindings. Even if we were to assume an average of five event bindings per file, that’s 275 bindings. Take into account event bubbling, live bindings and we are talking serious performance problems.

In such circumstances we cannot rescue everything but we can inject a little saving grace.

Here is the same test1.php and test.js with changes, which I will explain in a minute:

test1.php

<html>
<head>
   <title>Test with joined JS file</title>
   
   <script src="jquery-1.4.2.js"></script>   
   <script src="test.js"></script>
</head>
<!-- Note the new title attribute -->
<body title="test1">

<div id="testDiv1">
<?php
$rows = 100;
echo    '<table>';
for($i = 0; $i < $rows; $i++){
	echo "<tr>";
	echo " <td>{$i}</td>";
	echo "</tr>";
}
echo '</table>';
?>
</div>
</body>
</html>

Notice that in the body tag I have added a title attribute.

test.js

(function($) {
   $(document).ready(function(event) {
      //check if for this handler ***NEW****
      var title = $('body').attr('title');
      if(title !== 'test1'){
         return false;
      };
      console.log('test1.js - Number of tr elements: ' + $('#testDiv1 table tr').length);    
      
      //assume event bindings
      
      //assume processing code
      
   });
})(jQuery);

(function($) {
   $(document).ready(function(event) {
      //check if for this handler ***NEW***
      var title = $('body').attr('title');
      if(title !== 'test2'){
         return false;
      };
      console.log('test2.js - Number of td elements: ' + $('#testDiv2 table td').length);    
      
      //assume event bindings
      
      //assume processing code
      
   });
})(jQuery);

If you run this now you will get the following in the console:
test1.js – Number of tr elements: 100

Let me explain why.

In test.php I have added a title attribute which identifies this file

In the JS file I have added the following code in the document ready handler:
//check if for this handler
var title = $(‘body’).attr(‘title’);
if(title !== ‘test1’){
return false;
};

It only continues to processes the handler code if the title attribute matches “test1”. So the handler connected with test2 never continues which saves resources.

The crux of the idea is to make processing relevant to the document at hand. This is only one way of doing it. Many other ways come to mind and I leave them to your imagination.

I hope this helps you and I look forward to your feedback.

Implementation of a Tree Structure in PHP

Further to my post on implementing a doubly linked list I took on the adventure of implementing a tree structure. I have used my linked list as my base where I use a hash table to store my node or child objects. Also the most important difference being that the next node is now a collection of children nodes. Also it goes without saying that not implementing a recursive iterator for the tree would make the tree implementation incomplete.

We start with the main class, followed by the node class and the iterator. Finally an example on application of the tree to create an unordered ul/li list.

/**
 * JTree
 * 
 * This class implements the Tree structure and is based on linked list using a hash table.
 * Using hash table prevents all possible recursive references and
 * allows for more efficient garbage collection. A particularly sore point in PHP.
 * 
 * I have used my implementation of Doubly Linked list as my base. 
 * You can find more information on it here:
 * https://phptouch.com/2011/03/15/doubly-linked-list-in-php/
 * 
 * I have heavily relied on the following 2 references for their algorithms.
 * Beginning Algorithims by Simon Harris and James Ross. Wrox publishing.
 * Data Structures and Algorithms in Java Fourth Edition by Michael T. Goodrich
 * and Roberto Tamassia. John Wiley & Sons.
 * 
 * *********************LICENSE****************************************
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation, either version 3 of the License, or
 * (at your option) any later version.

 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.

 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 * *********************LICENSE**************************************** 
 * @package JTree
 * @author Jayesh Wadhwani
 * @copyright 2011 Jayesh Wadhwani. 
 * @license  GNU GENERAL PUBLIC LICENSE 3.0
 * @version 1.0
 */
class JTree {
   /**
    * @var UID for the header node 
   */
	private $_head;

   /**
    * @var size of list 
    */
	private $_size;
   
   /**
   * @var hash table to store node objects 
   */
	private $_list = array();

	/**
	 * JTree::__construct()
	 * 
	 * @return
	 */
	public function __construct() {
		$this->_head = $this->createNode('HEAD');
		$this->_size = 0;
	}

    /**
    * JTree::getList()
    * 
    * Retreives the hash table
    * 
    * @return array
    */
	public function getTree() {
		return $this->_list;
	}

   /**
   * JTree::getNode()
   * Given a UID get the node object
   * 
   * @param mixed $uid
   * @return JNode/Boolean
   */
	public function getNode($uid) {
		if(empty($uid)) {
			throw new Exception('A unique id is required.');
		}
		$ret = false;
      //look for the node in the hash table
      //return false if not found
		if(array_key_exists($uid,$this->_list) === true) {
			$ret = $this->_list[$uid];
		}
		return $ret;
	}

   /**
    * JTree::setChild()
    * 
    * This is a helper function. Given a UID for a node
    * set it as the next UID for the node. 
    * 
    * @param mixed $uid
    * @param mixed $childUid
    * @return void
    */
	public function setChild($uid, $childUid) {
		if(empty($uid) || empty($childUid)) {
			throw new Exception('Both a from and a to UIDs are required.');
		}
      //get the node object for this node uid
		$node = $this->getNode($uid);

		if($node !== false) {
			$node->setChild($childUid);
		}
	}

   /**
    * JTree::setParent()
    * 
    * This is a helper function to set the parent uid
    * 
    * @param mixed $uid - UID of the object to be processed on
    * @param mixed $prevUid - put this as next in the above object
    * @return void
    */
	public function setParent($uid, $parentUid) {
		if(empty($uid) || empty($parentUid)) {
			throw new Exception('Both a from and a to UIDs are required.');
		}
		$node = $this->getNode($uid);

		if($node !== false) {
			$node->setParent($parentUid);
		}
	}

	/**
	 * JTree::createNode()
	 * 
    * Create a node, store in hash table
    * return the reference uid
	 * @param mixed $value
	 * @param mixed $uid
	 * @return string $uid
	 */
	public function createNode($value, $uid = null) {
		if(!isset($value)) {
			throw new Exception('A value is required to create a node');
		}

		$node = new JNode($value, $uid);
		$uid = $node->getUid();
		$this->_list[$uid] = $node;
		return $uid;
	}

	/**
	 * JTree::addChild()
	 * 
	 * @param mixed $parentUid
	 * @param mixed $childUid
	 * @return
	 */
	public function addChild($parentUid = null, $childUid) {
		if(empty($childUid)) {
			throw new Exception('A UID for the child is required.');
		}
      //if no parent assume it is the head
		if(empty($parentUid)) {
			$parentUid = $this->_head;
		}
		//parent points to child
		$this->setChild($parentUid, $childUid);

		//child points to parent
		$this->setParent($childUid, $parentUid);

		return $childUid;
	}

	/**
	 * JTree::addFirst()
    * Add the first child right after the head
	 * 
	 * @param mixed $uid
	 * @return void
	 */
	public function addFirst($uid) {
		if(empty($uid)) {
			throw new Exception('A unique ID is required.');
		}
		$this->addChild($this->_head, $uid);
	}

   /**
    * JTree::getChildren()
    * 
    * This is a helper function to get the child node uids given the node uid
    * 
    * @param mixed $uid
    * @return mixed
    */
	public function getChildren($uid) {
		if(empty($uid)) {
			throw new Exception('A unique ID is required.');
		}

		$node = $this->getNode($uid);

		if($node !== false) {
			return $node->getChildren();
		}
	}

   /**
    * JTree::getParent()
    * 
    * This is a helper function to get the 
    * parent node uid
    * 
    * @param mixed $uid
    * @return string $uid
    */
	public function getParent($uid) {
		if(empty($uid)) {
			throw new Exception('A unique ID is required.');
		}
		$ret = false;
      $node = $this->getNode($uid);

		if($node !== false) {
			$ret = $node->getParent();
		}
      return $ret;
	}

	/**
	 * JTree::getValue()
	 * 
	 * @param mixed $uid
	 * @return
	 */
	public function getValue($uid) {
		if(empty($uid)) {
			throw new Exception('A unique ID is required.');
		}

		$node = $this->getNode($uid);
		return $node->getValue();
	}
}

The node class. Note the collection of children.

/**
 * JNode
 * 
 * This is a simple class to construct a node
 * Please note that each node object will be 
 * eventually stored in a hash table where the 
 * hash will be a UID.
 * 
 * Note that in comparison to thee Doubly Linked List implementation
 * the children are now stored in an array
 * 
 * @package JTree   
 * @author Jayesh Wadhwani
 * @copyright Jayesh Wadhwani
 * @version 2011
 */
class JNode {
   /**
    * @var _value for the value field 
   */
	private $_value;
   /**
    * @var _parent uid of the parent node 
   */
	private $_parent;
   /**
    * @var _children collection of uids for the child nodes 
   */
	private $_children = array();
   /**
    * @var _uid for this node 
   */
	private $_uid;

	/**
	 * JNode::__construct()
	 * 
	 * @param mixed $value
	 * @param mixed $uid
	 * @return void
	 */
	public function __construct($value = null,$uid = null) {
		if(!isset($value)) {
			throw new Exception('A value is required to create a node');
		}
		$this->setValue($value);
		$this->setUid($uid);
	}

	/**
	 * JNode::setUid()
	 * 
	 * @param mixed $uid
	 * @return
	 */
	public function setUid($uid = null) {
		//if uid not supplied...generate
		if(empty($uid)) {
			$this->_uid = uniqid();
		} else {
			$this->_uid = $uid;
		}
	}

	/**
	 * JNode::getUid()
	 * 
	 * @return string uid
	 */
	public function getUid() {
		return $this->_uid;
	}

	/**
	 * JNode::setValue()
	 * 
	 * @param mixed $value
	 * @return void
	 */
	public function setValue($value) {
		$this->_value = $value;
	}

	/**
	 * JNode::getValue()
	 * 
	 * @return mixed
	 */
	public function getValue() {
		return $this->_value;
	}

	/**
	 * JNode::getParent()
	 * 
    * gets the uid of the parent node
    * 
	 * @return string uid
	 */
	public function getParent() {
		return $this->_parent;
	}

	/**
	 * JNode::setParent()
	 * 
	 * @param mixed $parent
	 * @return void
	 */
	public function setParent($parent) {
		$this->_parent = $parent;
	}

	/**
	 * JNode::getChildren()
	 * 
	 * @return mixed
	 */
	public function getChildren() {
		return $this->_children;
	}

	/**
	 * JNode::setChild()
	 * 
    * A child node's uid is added to the childrens array
    * 
	 * @param mixed $child
	 * @return void
	 */
	public function setChild($child) {
		if(!empty($child)) {
			$this->_children[] = $child;
		}
	}

	/**
	 * JNode::anyChildren()
	 * 
    * Checks if there are any children 
    * returns ture if it does, false otherwise
    * 
	 * @return bool
	 */
	public function anyChildren() {
		$ret = false;

		if(count($this->_children) > 0) {
			$ret = true;
		}
		return $ret;
	}

	/**
	 * JNode::childrenCount()
	 * 
    * returns the number of children
    * 
	 * @return bool/int
	 */
	public function childrenCount() {
	  $ret = false;
     if(is_array($this->_children)){
      $ret = count($this->_children);
     }
     return $ret;
	}
}

The tree recursive iterator.


/**
 * JTreeIterator
 * 
 * The Tree structure would be incomplete if I did not include a 
 * iterator. There is nothing special about this iterator and its implementation
 * is pretty standard.
 * I have extended the arrayIterator because I am using an array for my hash table.  
 * Note that I have not implemented the next and rewind methods as I do not need to
 * special with these. So the parent(ArrayIterator) methods will be called by default.
 * 
 * @package    
 * @author  Jayesh Wadhwani
 * @copyright Jayesh Wadhwani
 * @version 2011
 */
class JTreeIterator extends ArrayIterator implements RecursiveIterator {
   /**
    * @var _list this is the hash table 
   */
	private $_list = array();
   /**
    * @var _next this is for the children 
   */
	private $_next = array();
   /**
    * @var _position the iterator position 
   */
	private $_position;

	/**
	 * JTreeIterator::__construct()
	 * 
	 * @param mixed $list - the hash table
	 * @param mixed $tree - 
	 * @return JTreeIterator
   */
	public function __construct(array $list, array $tree = null) {
		$this->_list = $list;

		if(is_null($tree)) {
			reset($this->_list);
			$next = current($this->_list);
			$this->_next = $next->getChildren();
		} else {
			$this->_next = $tree;
		}

		parent::__construct($this->_next);
	}

	/**
	 * JTreeIterator::current()
	 * 
	 * @return
   */
	public function current() {
      //get the object uid from the hash table
      //then get the object
		$current = parent::current();
		$nObj = $this->_list[$current];
		return $nObj;
	}

	/**
	 * JTreeIterator::key()
	 * 
	 * @return
	 */
	public function key() {
		$key = parent::key();
		$key = $this->_next[$key];
		return $key;
	}

	/**
	 * JTreeIterator::hasChildren()
	 * 
	 * @return mixed
	 */
	public function hasChildren() {
		$next = $this->_list[$this->key()];
		$next = $next->anyChildren();
		return $next;
	}

	/**
	 * JTreeIterator::getChildren()
	 * 
	 * @return JTreeIterator
	 */
	public function getChildren() {
		$childObj = $this->_list[$this->key()];
		$children = $childObj->getChildren();
		return new JTreeIterator($this->_list, $children);
	}
}

As an example assume that you wanted to create a unordered list based on an array of values
similar to something you would get from a database which typically stores hierarchical data using the ‘The Adjacency List Model’. Further assume the following collection:

$categories = array();
$categories[] = array('id' => 1, 'weather_condition' => 'weather', 'parent_id' => 0);
$categories[] = array('id' => 2, 'weather_condition' => 'Earthquakes', 'parent_id' => 1);
$categories[] = array('id' => 3, 'weather_condition' => 'Major', 'parent_id' => 2);
$categories[] = array('id' => 4, 'weather_condition' => 'Minor', 'parent_id' => 2);
$categories[] = array('id' => 5, 'weather_condition' => 'Fires', 'parent_id' => 1);
$categories[] = array('id' => 6, 'weather_condition' => 'Rain', 'parent_id' => 1);
$categories[] = array('id' => 7, 'weather_condition' => 'Flooding', 'parent_id' => 6);
$categories[] = array('id' => 8, 'weather_condition' => 'Washout', 'parent_id' => 6);
$categories[] = array('id' => 9, 'weather_condition' => 'Hurricanes', 'parent_id' => 1);

To use our iterator we need to extend our iterator off RecursiveIteratorIterator . Customize it to build our ul/li string.

/**
 * JTreeRecursiveIterator
 * 
 * To use a recursive iterator you have to extend of the RecursiveIteratorIterator
 * As an example I have built an unordered list 
 * For detailed information on please see RecursiveIteratorIterator
 * http://us.php.net/manual/en/class.recursiveiteratoriterator.php
 * 
 * @package   JTree
 * @author Jayesh Wadhwani 
 * @copyright Jayesh Wadhwani
 * @license  GNU GENERAL PUBLIC LICENSE 3.0
 * @version 1.0 2011
 */
class JTreeRecursiveIterator extends RecursiveIteratorIterator {
   /**
   * @var _jTree the JTree object 
   */
	private $_jTree;
   /**
   * @var _str string with ul/li string 
   */
	private $_str;

	/**
	 * JTreeRecursiveIterator::__construct()
	 * 
	 * @param mixed $jt - the tree object
	 * @param mixed $iterator - the tree iterator
	 * @param mixed $mode 
	 * @param integer $flags
	 * @return
	 */
	public function __construct(JTree $jt, $iterator, $mode = LEAVES_ONLY, $flags = 0) {

		parent::__construct($iterator, $mode, $flags);
		$this->_jTree = $jt;
		$this->_str = "<ul>\n";
	}

	/**
	 * JTreeRecursiveIterator::endChildren()
	 * Called when end recursing one level.(See manual) 
	 * @return void
	 */
	public function endChildren() {
		parent::endChildren();
		$this->_str .= "</ul></li>\n";
	}

	/**
	 * JTreeRecursiveIterator::callHasChildren()
	 * Called for each element to test whether it has children. (See Manual)
    * 
	 * @return mixed
	 */
	public function callHasChildren() {
		$ret = parent::callHasChildren();
		$value = $this->current()->getValue();

		if($ret === true) {
			$this->_str .= "<li>{$value}<ul>\n";
		} else {
			$this->_str .= "<li>{$value}</li>\n";
		}
		return $ret;
	}

	/**
	 * JTreeRecursiveIterator::__destruct()
	 * On destruction end the list and display.
	 * @return void
	 */
	public function __destruct() {
		$this->_str .= "</ul>\n";
                echo $this->_str;
	}

}

Now that our code is done we can use it.


//create a new tree object
$jt = new JTree();

//iterate building the tree
foreach($categories as $category) {
	$uid = $jt->createNode($category['weather_condition'],$category['id']);
	$parentId = null;

	if(!empty($category['parent_id'])) {
		$parentId = $category['parent_id'];
	}

	$jt->addChild($parentId, $uid);
}

$it = new JTreeRecursiveIterator($jt, new JTreeIterator($jt->getTree()), true);

//iterate to create the ul list
foreach($it as $k => $v) {}


And the output is…

<ul>
   <li>weather
      <ul>
         <li>Earthquakes
            <ul>
               <li>Major</li>
               <li>Minor</li>
           </ul>
        </li>
       <li>Fires</li>
       <li>Rain
          <ul>
            <li>Flooding</li>
            <li>Washout</li>
         </ul>
      </li>
      <li>Hurricanes</li>
    </ul>
  </li>
</ul>

and it looks something like this:

  • weather
    • Earthquakes
      • Major
      • Minor
    • Fires
    • Rain
      • Flooding
      • Washout
    • Hurricanes

Enjoy 🙂
P.S. Based on Chris’s comments I wrote the second part to this post: Implementation of a Tree Structure in PHP – II

Option to select specific tab after removal in jQuery UI.TABS

One of the problems with the remove method of jquery ui.tabs is that after deleting the tab it either selects the tab on the right or if it is the last tab it selects one on the left.

My initial impression was that I could change this behavior by calling the select method in the ‘tabsremove’ event. But that does not work as the default selection takes place before the event is fired as indicated in the source below:

    remove: function( index ) {
                index = this._getIndex( index );
                var o = this.options,
                    $li = this.lis.eq( index ).remove(),
                    $panel = this.panels.eq( index ).remove();
              //DEFAULT SELECTION TAKES PLACE HERE
                // If selected tab was removed focus tab to the right or
                // in case the last tab was removed the tab to the left.
                if ( $li.hasClass( "ui-tabs-selected" ) && this.anchors.length > 1) {
                    this.select( index + ( index + 1 < this.anchors.length ? 1 : -1 ) );
                }
                o.disabled = $.map(
                    $.grep( o.disabled, function(n, i) {
                        return n != index;
                    }),
                    function( n, i ) {
                        return n >= index ? --n : n;
                    });
                this._tabify();
              //EVENT FIRED HERE
                this._trigger( "remove", null, this._ui( $li.find( "a" )[ 0 ], $panel[ 0 ] ) );
                return this;
            },

So that I could better control as to what was selected after tab removal I over-rode the remove function like so:

$.extend( $.ui.tabs.prototype, {
        remove: function( index, nextIndex) {
              var o = this.options;
         
             index = this._getIndex( index );
         
          //if nextIndex passed use it or use use the default which is follows:
          //If selected tab was removed focus tab to the right or
            //in case the last tab was removed the tab to the left.
          if(nextIndex !== undefined){
             nextIndex = this._getIndex(nextIndex);
          }else{
             if ( $li.hasClass( "ui-tabs-selected" ) && this.anchors.length > 1) {
                   nextIndex = index + ( index + 1 < this.anchors.length ? 1 : -1 );
             }
          };
        
            //remove this tab
          $li = this.lis.eq( index ).remove();
            $panel = this.panels.eq( index ).remove();
         
        
          //select the next tab
          this.select(nextIndex);
          //make sure to remove the removed tab index
          //from the disabled list
            o.disabled = $.map(
                $.grep( o.disabled, function(n, i) {
                    return n != index;
                }),
                function( n, i ) {
                    return n >= index ? --n : n;
                });
          this._tabify();
            this._trigger( "remove", null, this._ui( $li.find( "a" )[ 0 ], $panel[ 0 ] ) );
         
            return this;
        }
});

USAGE:
.tabs( “remove” , index, nextIndex )
Example:
$tabs.tab(“remove”, 2, 5); //This would remove tab at index 2 and then select tab at index 5

I hope it helps someone and I look forward to everyone’s feedback.

Also replicated on the jQuery Forum:
http://forum.jquery.com/topic/ui-tabs-remove-method-additional-option-to-select-sepecific-tab-after-removal

Doubly Linked List in PHP

One of the weaknesses of PHP is the lack of Abstract Data Structures. 5.3 brought in a few but nothing very significant. Users of 5.2+ have none. Creating data structures in PHP is also not very simple because PHP’s garbage collection mechanism does not like recursive references. PHP bug 33595 confirms this very eloquently. This also teaches us an important lesson. Never take garbage collection for granted. Always clean up after yourself.

It all boils down to how objects in PHP are referenced. We have object identifiers and not pointers or object hashes. I did get some implementations on the web but all seemed to have forgotten about the GC issue. A large list with thousands of nodes resulted in memory errors.

To decouple my implementation from object identifiers I decided to use hash table to store my node objects. Let me explain.

In a traditional implementation I would have something like the following. A rudimentary node class with few methods shown. Also note that $_previous and $_next are object identifiers. :

class JNode
{
   private $_value;
   private $_previous;
   private $_next;

   public function __construct($value)
   {
      $this->setValue($value);
   }

public function setPrevious(JNode $previous)
   {
      $this->_previous = $previous;
   }

   public function setNext(JNode $next)
   {
      $this->_next = $next;
   }

You could use it like this:

$head = new MyNode('HEAD');
$head->setPrevious(null);

$node1 = new MyNode('NODE-1');
$node1->setPrevious($head);
$head->setNext($node1);

...
...

Rather than using object identifiers why not do it this way. I have modified the class to use UIDs. Also note that in the constructor I have setting up a new UID for the object. Now $_next and $_previous will be UIDs and not object identifiers.:

public function __construct($value)
   {
      $this->setValue($value);
      $this->setUid();
   }

   public function setUid($uid = null)
   {
      //if uid not supplied...generate
      if(empty($uid)) {
         $this->_uid = uniqid();
      } else {
         $this->_uid = $uid;
      }
   }

   public function getUid()
   {
      return $this->_uid;
   }

   public function setValue($value)
   {
      if(empty($value)) {
         throw new Exception('A value is required.');
      }
      $this->_value = $value;
   }

   public function setPrevious($previous)
   {
      if(empty($previous)) {
         throw new Exception('A unique ID is required.');
      }
      $this->_previous = $previous;
   }

   public function setNext($next)
   {
      if(empty($next)) {
         throw new Exception('A unique ID is required.');
      }
      $this->_next = $next;
   }

The use would be according to these line:

//my hashtable
//uid => object 
$myHashTable = array();

$head = new MyNode('HEAD');
$headUid = $head->getUid();
$myHashTable[$headUid] = $head;

$head->setPrevious(null);

$node1 = new MyNode('NODE-1');
$node1Uid = $node1->getUid();
$myHashTable[$node1Uid ] = $node1;

$node1->setPrevious($headUid);
$head->setNext($node1Uid);

...
...

Now rather than using the object identifier as reference to the object we use a UID to refer to the object identifier which refers to the actual object. This way there are no circular or recursive references. Also GC becomes very straightforward. Every list has an iterator. Writing one in this case also becomes very simple. Now that you have seen the concept here is the final implementation.

The final implementation can be found on Github: doubly-linked-list-php: Implementation of a doubly linked list for legacy PHP sites (5.2.X)

Please note that this is a minimum implementation. You could add many more methods such as inserting at an index, getting the size of the list, indexOf. My main intent is to show how to use UIDs as object references. Also note that in the main class I have helper classes to create, add and delete nodes.

In the following example I have included memory use statements to make sure that I clean up properly.

//instantiate the list
$ll = new JDoublyLinkedList();

//First node goes after the head.
$firstNode = $ll->createNode('First Node');
$ll->addFirst($firstNode);

//10 more nodes
for($i = 0; $i < 5; $i++) {
   $node = $ll->createNode('Node-' . $i);
   $ll->addNext($firstNode, $node);
   $firstNode = $node;
}

//and this is how we iterate...iterate
$it = new JLinkedListIterator($ll);

foreach($it as $k => $v) {
   echo '<pre>', $k, ': ', print_r($v, true), '</pre>';
}

unset($ll, $it);

echo 'Peak: ' . number_format(memory_get_peak_usage(), 0, '.', ',') .  " bytes<br>";

echo 'End: ' . number_format(memory_get_usage(), 0, '.', ',') . " bytes<br>";

The output would be something like this:

Initial: 126,360 bytes

4d7f677322b38: JNode Object
(
    [_value:private] => First Node
    [_previous:private] => 4d7f677322b06
    [_next:private] => 4d7f677322b69
    [_uid:private] => 4d7f677322b38
)

4d7f677322b69: JNode Object
(
    [_value:private] => Node-0
    [_previous:private] => 4d7f677322b38
    [_next:private] => 4d7f677322b95
    [_uid:private] => 4d7f677322b69
)

4d7f677322b95: JNode Object
(
    [_value:private] => Node-1
    [_previous:private] => 4d7f677322b69
    [_next:private] => 4d7f677322bbf
    [_uid:private] => 4d7f677322b95
)

4d7f677322bbf: JNode Object
(
    [_value:private] => Node-2
    [_previous:private] => 4d7f677322b95
    [_next:private] => 4d7f677322be8
    [_uid:private] => 4d7f677322bbf
)

4d7f677322be8: JNode Object
(
    [_value:private] => Node-3
    [_previous:private] => 4d7f677322bbf
    [_next:private] => 4d7f677322c12
    [_uid:private] => 4d7f677322be8
)

4d7f677322c12: JNode Object
(
    [_value:private] => Node-4
    [_previous:private] => 4d7f677322be8
    [_next:private] => 4d7f677322b1b
    [_uid:private] => 4d7f677322c12
)

Peak: 175,952 bytes
End: 135,080 bytes

Notice that the cleanup was not bad. If I had to create 1000 nodes then this is what the memory utilization looks like:

Initial: 126,376 bytes
Peak: 803,464 bytes
End: 192,440 bytes

That’s pretty decent in terms of GC but there is still room for improvement.

Multiple replacements with JavaScript Regular Expressions

Recently I came across a piece of code using the replace command. Looking at the code I got the feeling that the author was not very comfortable with Regex replacements. Here is the snippet which I have modified slightly to explain my point.

<!DOCTYPE html>
<html>
<head>
<script src="jquery-1.4.2.js"></script>
<script language = "JavaScript" type = "text/javascript">

  $(document).ready(function (){
      
      var text = [];
      text.push('Suddenly, the wolf appeared beside her. "What are you doing out here, little girl?"\n');
      text.push('the wolf asked in a voice as friendly as he could muster.\n');
      text.push('I\'m on my way to see my Grandma who lives through the forest, near the brook,"  Little Red Riding Hood replied.');
      
      var postdata = [];   
      var re = new RegExp("\n");
      var re1 = new RegExp("[.]", "g");
      var re2 = new RegExp("[,]", "g");
      var re3 = new RegExp("[?]", "g");
      var re4 = new RegExp("[!]", "g");
      var re5 = new RegExp("[:]", "g");
      var re6 = new RegExp("[;]", "g");
      var re7 = new RegExp("[']", "g");
      var re8 = new RegExp('["]', "g");

      $.each(text, function(){
         var content = this.replace(re, ' ');
         content = content.replace(re1, '').replace(re2, '').replace(re3, '').replace(re4, '')
            .replace(re5, '').replace(re6, '').replace(re7, '').replace(re8, '');
         
         postdata.push(content);
      });

      //print it out
      for(var i in postdata){
         console.log('OLD: ' + postdata[i]);
      }
            
  });
</script>
</head>
<body>
<!-- some stuff -->
</body>
</html>

As is obvious all the above code does is replaces the character ‘\n’ with a space and characters dot, comma, question mark, exclamation, colon, semi-colon, single and double quotes with an empty string.

If you were to run this the output would be:
Console [7]=
OLD: Suddenly the wolf appeared beside her What are you doing out here little girl

Console [8]=
OLD: the wolf asked in a voice as friendly as he could muster

Console [9]=
OLD: Im on my way to see my Grandma who lives through the forest near the brook Little Red Riding Hood replied

Fortunately JavaScript’s Regex engine is quite robust and flexible and can pretty much take on any other Regex engines. So the above code can be written as follows:

<!DOCTYPE html>
<html>
<head>
<script src="jquery-1.4.2.js"></script>
<script language = "JavaScript" type = "text/javascript">

  $(document).ready(function (){
      
      var text = [];
      text.push('Suddenly, the wolf appeared beside her. "What are you doing out here, little girl?"\n');
      text.push('the wolf asked in a voice as friendly as he could muster.\n');
      text.push('I\'m on my way to see my Grandma who lives through the forest, near the brook,"  Little Red Riding Hood replied.');

      var postdata = [];
      var re = /['\n']/g;
      var re1 = /[\.\,\?\!\:\;\'\"\']/g;
      
      for(var i in text){
         var content = text[i].replace(re, ' ').replace(re1, '');
         postdata.push(content);
      };

       for(var i in postdata){
         console.log('NEW: ' + postdata[i]);
      }
      
  });
</script>
</head>
<body>
<!-- some stuff -->
</body>
</html>

The output is:

Console [10]=
NEW: Suddenly the wolf appeared beside her What are you doing out here little girl

Console [11]=
NEW: the wolf asked in a voice as friendly as he could muster

Console [12]=
NEW: I m on my way to see my Grandma who lives through the forest near the brook Little Red Riding Hood replied

As you can see it is much more elegant and many times faster. Also notice that instead of using the jQuery $.each iterator I have use the JavaScript for…in. This is much faster than $.each. I recommend that you use this whenever possible.

Detached…not quite!

jQuery 1.4 has a very helpful function ‘detach’ where I can detach a section and keep it away for later use. The best part is that it keeps all my bindings intact. But like anything else you need to watch out for caveats.

Consider the following example:


$(document).ready(function (){           

   //detach and keep the reference
   var $a = $('#div-abcde').detach();

   //try to change the value of one of the detached elements
   $('#abcde').val('5678');
   
   //attach it back
   $('#div-abcd').after($a);
   
   //output
   alert($('#abcde').val());
  });
<form id = "grid-3762" name = "grid" action = "">
     <div id="div-abcd"><input type="text" id="abcd" class="abcd" value="abcd" /> </div>
      <div id="div-abcde"><input type="text" id="abcde" class="abcd" value="abcde" /> </div>
</form>

This works as expected. The statement ‘$(‘#abcde’).val(‘5678′);’ has no effect as this element has been detached. So the output is as expected, that is ‘abcde’.

Now lets do the same example a little differently. Consider this:


$(document).ready(function (){           
    //keep a reference to the element.
    //remember this element wil be part of the detach
    var $r = $('#abcde');
   
   //detach and keep the reference
   var $a = $('#div-abcde').detach();

   //try to change the value using the reference
    $r.val('5678');
   
   //attach it back
   $('#div-abcd').after($a);
   
   //output
   alert($('#abcde').val());
  });

So what we are doing here that we are changing the value of the element by using a reference to the element. This is reference $r to the element.

Now the output is ‘5678’!

So even though it is detached it still maintains its references.

For everything there is a function!

JavaScript as you know is a functional language, meaning that functions are treated as ‘first class’ objects. They can be used as plain old functions, assigned to variables, passed in a parameters, created on the fly, extended and act and work as objects.

A PHP programmer who has been treating JavaScript as a supplementary language finds this variety of function construction and invocation a bit frustrating. To ease the pain here is a short essay.

There are three common ways of defining and invoking a JavaScript function.

The most common is the ‘Named function’ that we all know and love.

    //named or declarative function
    function myFn(someVar){
        alert('myFn executed - ' + someVar); 
   };
   
   //To invoke
    myFn(77);

The output is as expected: myFn executed – 77

Just to whet your appetite this simple function can have properties and methods and can be augmented using the prototype property.

The next method of declaring a function is as an expression or literal. Consider the following example:

//function expression or function literal
   var myFn2 = function(someVar){
      alert('Function executed - ' + someVar); 
   };
//to invoke
   myFn2(88);

The output in this case will be ‘Function executed – 88’

As you can see there is nothing special about this except that the variable myFn2 holds the reference to the function.

The third way of defining and invoking is using the constructor pattern.

//using the constructor
   function myFn3(someVar){
      alert('Function executed - ' + someVar); 
   };
//to invoke
   var myFnInstance = new myFn3(99);

The output in this case will be ‘Function executed – 99’

But, all said and done, these various function incarnations begs us the question…where should we be using each type, how and why?

One of my favorites is to pass a function as a parameter. This is something very powerful and its applications are only restricted by your imagination. Consider the following example:

 //Main function which accepts a function reference as an argument
   var myMainFn = function(someFunc){
      someFunc();
   };
  
   //A simple function expression
   var myFn1 = function(){
      alert('myFn1 executed');
       
   };
   
   myMainFn(myFn1);

The output is ‘myFn1 executed’!

Think about it. A main dispatch function which receives a function reference as an argument. We now have a dynamic function dispatch table.

My next favorite type is ‘Returning Functions’. Consider the following example:

//the main function
 var myReturningFn1 = function(someVar){
      alert(someVar);
      //returning function
      return function(){
         alert(++someVar);
      };
   }
   
   var returnFn = myReturningFn1(111);
   returnFn();

In this function rather than returning a variable it returns a reference to the function. In other words if I were to do this:

myReturningFn1(111);

the output would be ‘111’.

But by doing:
var returnFn = myReturningFn1(111);
we are getting a reference to the return function. And to execute all we do
returnFn() and the output would be 112.

I could have written the function like so:

 var myReturningFn1 = function(someVar){
      alert(someVar);
      
      var returnFn = function(){
         alert(++someVar);
      }; 
      return returnFn; 
   }
   
   var returnFn = myReturningFn1(111);
   returnFn();

However you must realize that the first way of writing is more in vogue.

Getting back to the first way of writing the function an astute reader will have realized that after doing:

var returnFn = myReturningFn1(111);

The function myReturningFn1 has finished and gone out of scope and so has the variable someVar! But when you now do

returnFn() it actaully increments someVar. Which means that even if the outer function has gone out of scope the inner function still has access to its variables. This is called a ‘closure’, one of the most important concepts of a functional language and a very powerful feature of JavaScript. In this case the outer function myReturnFn1 closes over the inner function resulting in a closure.

More than that, one of the most valuable qualities of the returning functions is that you on the outside do not have any access to the variables of the outer function. Only the returning or the inner function does. You may say they are private variables:-)

One typical pattern I love to use in my programs is:

var myReturningFn1 = function(someVar){
      alert(someVar);
      var someObj = {
        init: function(){
         alert('Init! - ' + ++someVar);
        } 
      };
      return someObj;
   }
   
   var returnFn = myReturningFn1(111);
   returnFn.init();

The output in this case is ‘Init! – 112’

This way I have my methods neatly tucked away and my private variable safeguarded from accidental changes.

Finally my next favorite is the ‘immediate functions’. Consider the following:

 (function myImmediateFn(){
      alert('Executed!');
   })();

The output will be ‘Executed!’

The beauty of this is that the function is declared, executed and then destroyed, all in in go. If you wanted to pass in a parameter then we could write this as:

Finally my next favorite is the ‘immediate functions’. Consider the following:

 (function myImmediateFn(someVar){
      alert('Executed! - ' + someVar);
   })(88);

The output will be ‘Executed! – 88’

This pattern is also called the (function(){})(); pattern.

This is used very often in plugins.

(function($) {
  //your plugin code
})(jQuery);

I tend to use this pattern quite a bit when I want my libraries to be initialized on load.

I hope this essay has created an appetite for more functional exploration.

If you plan to go ahead I must recommend these three books by JavaScript gurus!

1. Secrets of the JavaScript Ninja – John Resig
2. JavaScript Patterns – Stoyan Stefanov
3. JavaScript: The Good Parts – Douglas Crockford

A live binding fun fact

jQuery allows event delegation using .live().

When using jQuery there are instances where I have to bind my events to the document root, i.e. $(document). My impression was that ‘live’ should also work with this root. But it does not. Consider the following example.

<script language = "JavaScript" type = "text/javascript">
  $(document).ready(function (){

  $(document).bind('custom_event.am', function(){
      alert('Bind works with document root!');
   });
   
   $(document).trigger('custom_event.am');
  
  $(document).live('custom_event1.am', function(){
      alert('Live does not work with document root!');
   });
   
   $('#am_wrapper').trigger('custom_event1.am');
   
  });
  
</script>

But if we use anything other than the document root, there is no problem. Consider the following example.

<script language = "JavaScript" type = "text/javascript">
  $(document).ready(function (){

  $('#am_wrapper').live('custom_event1.am', function(){
      alert('Live now works!');
   });
   
   $('#am_wrapper').trigger('custom_event1.am');
   
  });
  

</script>
<div id = "am_wrapper">  </div>

To unravel the mystery I delved into the jQuery source code and concluded the following.

Live events are bound to the document root. So when a ‘live’ event fires it bubbles up to the document root. A special handler function first called(‘liveHandler’). Before it executes the custom(your) handler it makes an attempt to gather all the binding for the the said event. To do this it runs this statement:

match = jQuery( event.target ).closest( selectors, event.currentTarget );

This fails because ‘closest’ starts from the current element and moves upwards. As the target and current target both are the document root there is no where to go. As there are no matches(match.length === 0) the function exits without doing much.

Even if it wasn’t for this caveat I am not a big fan of binding my events to the root. Would prefer to reserve a top level element to store all my bindings.

Undefined is a special value not a string!

Recently I came across this statement

 if (val != 'undefined' && val != null && val != 'null') {
//some statements
}

undefined and null are simple types and not literal strings!

alert(null === ‘null’); will return false so will alert(undefined === ‘undefined’);

So the statement should be:

 if (val !== undefined && val !== null) {
//some statements
}