<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-2155973909202515480</id><updated>2011-07-07T14:39:01.574-07:00</updated><category term='PHP'/><category term='tower defense'/><category term='iterator'/><category term='google maps'/><category term='python'/><category term='map reduction'/><category term='C'/><category term='completeness'/><category term='gdbm'/><category term='desktop tower defense'/><category term='map'/><category term='independent set'/><category term='geographic location'/><category term='directory scan'/><category term='monte carlo'/><category term='maximal independent set'/><category term='Graph Library'/><category term='database'/><category term='directory listing'/><title type='text'>Random programming</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://jdhsu.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://jdhsu.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>jd hsu</name><uri>http://www.blogger.com/profile/07390042970137791452</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_gvQh1p7tre8/S0wk8rSPWBI/AAAAAAAAAAM/Q4KbTqG8Tf0/S220/DSC00577.JPG'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>5</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-2155973909202515480.post-4309475910652010677</id><published>2010-01-20T22:47:00.000-08:00</published><updated>2010-02-18T12:54:03.500-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='map'/><category scheme='http://www.blogger.com/atom/ns#' term='independent set'/><category scheme='http://www.blogger.com/atom/ns#' term='map reduction'/><category scheme='http://www.blogger.com/atom/ns#' term='maximal independent set'/><title type='text'>Reducing the Granularity of a Map</title><content type='html'>You've got this database of place coordinates, but find that two or more places are "too" close from each other.&lt;br /&gt;For instance, you want to make sure that no 2 places are closer than &lt;b&gt;d&lt;/b&gt; mile(s) from each other.&lt;br /&gt;The problem is to find the minimum set of nodes to remove in order to satisfy the restriction.&lt;br /&gt;&lt;br /&gt;While it is easy to visualize a solution for a graph of small size, it can quickly get complicated.&lt;br /&gt;Consider these 3 places.&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://4.bp.blogspot.com/_gvQh1p7tre8/S1fpTTfpLNI/AAAAAAAAAA8/3NqN41oW6rw/s1600-h/3places.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"&gt;&lt;img border="0" src="http://4.bp.blogspot.com/_gvQh1p7tre8/S1fpTTfpLNI/AAAAAAAAAA8/3NqN41oW6rw/s320/3places.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;How do you decide which node(s) to remove in order to maintain the restriction that &lt;i&gt;d &amp;gt; 3&lt;/i&gt; for all pairs of points?&lt;br /&gt;There clearly are 3 possibilities:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Remove red point&lt;/li&gt;&lt;li&gt;Remove black and green points&lt;/li&gt;&lt;li&gt;Remove all 3 points&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;The good news is that this is a well known optimization problem: &lt;a href="http://en.wikipedia.org/wiki/Independent_set_problem"&gt;the Independent Set problem&lt;/a&gt;.&lt;br /&gt;The bad news is that it is NP-hard.&lt;br /&gt;A simpler problem that can be solved with a greedy algorithm is finding &lt;i&gt;a&lt;/i&gt; maximal set.&lt;br /&gt;Given two nodes A and B such that distance(A, B) &amp;lt; k, remove the node with the least neighbors.&lt;br /&gt;In this algorithm, the map is simply a graph where there is an edge between two nodes if they are sufficiently close from each other (if the two locations are neighbors).&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: plain"&gt;loop map as place1:&lt;br /&gt;loop map as place2:&lt;br /&gt;  if place1 != place2 and&lt;br /&gt;     distance(place1, place2) &lt; k:&lt;br /&gt;    if neighbors(place1) &gt; neighbors(place2):&lt;br /&gt;      discard(place2)&lt;br /&gt;    else:&lt;br /&gt;      discard(place1)&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Of course, any domain knowledge should be exploited.&lt;br /&gt;For instance, if the map represents the biggest cities in country X, then the population size of each location should be taken into account before discarding a node.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155973909202515480-4309475910652010677?l=jdhsu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdhsu.blogspot.com/feeds/4309475910652010677/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdhsu.blogspot.com/2010/01/reducing-granularity-of-map.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/4309475910652010677'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/4309475910652010677'/><link rel='alternate' type='text/html' href='http://jdhsu.blogspot.com/2010/01/reducing-granularity-of-map.html' title='Reducing the Granularity of a Map'/><author><name>jd hsu</name><uri>http://www.blogger.com/profile/07390042970137791452</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_gvQh1p7tre8/S0wk8rSPWBI/AAAAAAAAAAM/Q4KbTqG8Tf0/S220/DSC00577.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_gvQh1p7tre8/S1fpTTfpLNI/AAAAAAAAAA8/3NqN41oW6rw/s72-c/3places.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2155973909202515480.post-1354745866632252209</id><published>2010-01-20T19:50:00.000-08:00</published><updated>2010-01-20T20:01:56.279-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='database'/><category scheme='http://www.blogger.com/atom/ns#' term='geographic location'/><category scheme='http://www.blogger.com/atom/ns#' term='google maps'/><title type='text'>Geographical Mapping and Google Maps</title><content type='html'>What you want to do: Partition a geographical area, where each area contains elementary entities.&lt;br /&gt;You want to set up a web service where users are allowed to add locations by clicking on a map, say using the Google Maps API.&lt;br /&gt;The problem is that adding all the locations at the top level of a tree will quickly lead to a bloated data structure increasing search time and make it hard to scale.&lt;br /&gt;One approach is to divide the area into partitions, a bit like the United States is divided into states.&lt;br /&gt;States work fine as their union does cover the whole country, but their respective shapes is unnecessarily tortuous.&lt;br /&gt;All we need really is a bunch of possibly overlapping circular areas defined by a center and a radius r.&lt;br /&gt;&lt;br /&gt;&lt;div class="separator" style="clear: both; text-align: center;"&gt;&lt;a href="http://3.bp.blogspot.com/_gvQh1p7tre8/S1fO4TgYI6I/AAAAAAAAAAw/81z5JqFDp5w/s1600-h/usa.jpg" imageanchor="1" style="clear: right; float: right; margin-bottom: 1em; margin-left: 1em;"&gt;&lt;img border="0" src="http://3.bp.blogspot.com/_gvQh1p7tre8/S1fO4TgYI6I/AAAAAAAAAAw/81z5JqFDp5w/s320/usa.jpg" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;It turns out it is relatively easy to find databases of US populated locations with their coordinates.&lt;br /&gt;Simply use these as your centers and pick a radius that will cover the whole territory, depending on the granularity of your database.&lt;br /&gt;That way, users browse by locations and add markers for each location, allowing your data structure to scale.&lt;br /&gt;&lt;br /&gt;A problem that this layout creates is that of marker visibility.&lt;br /&gt;Assuming two towns are rather close from each other and one user adds a place through town A, then all users browsing that area through town B will not see this marker.&lt;br /&gt;A solution is to store all the neighbor locations of a given location.&lt;br /&gt;Given a location L1, a location L2 is a neighbor of L1 iff distance(L1, L2) &lt; 2r, where r is the radius of a location.&lt;br /&gt;Using this schema, town B would be listed as a neighbor of town A and, as such, its list of place markers would be displayed by all its neighbors, including town A.&lt;br /&gt;That way, no matter through which location a user adds a marker, all other users will be able to see it if they browse that area through neighbor locations (locations that would cover the marker).&lt;br /&gt;&lt;br /&gt;What if I select a location on the west coast, then add a marker on the east coast?&lt;br /&gt;This will break the system as this marker will not be visible by any other locations than the original location and its neighbors.&lt;br /&gt;That's why in addition to storing neighbors of a location, users should not be able to add place markers more than r distance from a location's center.&lt;br /&gt;&lt;br /&gt;If all these constraints are respected, you've got a scalable database of places, under the provision that users know the general location that they are looking for. A reasonable assumption.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155973909202515480-1354745866632252209?l=jdhsu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdhsu.blogspot.com/feeds/1354745866632252209/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdhsu.blogspot.com/2010/01/blog-post.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/1354745866632252209'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/1354745866632252209'/><link rel='alternate' type='text/html' href='http://jdhsu.blogspot.com/2010/01/blog-post.html' title='Geographical Mapping and Google Maps'/><author><name>jd hsu</name><uri>http://www.blogger.com/profile/07390042970137791452</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_gvQh1p7tre8/S0wk8rSPWBI/AAAAAAAAAAM/Q4KbTqG8Tf0/S220/DSC00577.JPG'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_gvQh1p7tre8/S1fO4TgYI6I/AAAAAAAAAAw/81z5JqFDp5w/s72-c/usa.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2155973909202515480.post-2497035595363589913</id><published>2010-01-20T03:04:00.000-08:00</published><updated>2010-01-20T21:31:02.838-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='monte carlo'/><category scheme='http://www.blogger.com/atom/ns#' term='tower defense'/><category scheme='http://www.blogger.com/atom/ns#' term='python'/><category scheme='http://www.blogger.com/atom/ns#' term='completeness'/><category scheme='http://www.blogger.com/atom/ns#' term='desktop tower defense'/><title type='text'>Desktop Tower Defense: My Computer Build</title><content type='html'>Remember that &lt;a href="http://www.handdrawngames.com/DesktopTD/Game.asp"&gt;disturbingly addicting flash game from 2 years ago&lt;/a&gt;?&lt;br /&gt;There was two things to consider: how to arrange your turrets (your build), and which turrets to pick.&lt;br /&gt;Can you have your computer produce the perfect layout? &lt;br /&gt;Assuming that you assign a score for each area of the grid based on neighbor blocks and longest distance from start to finish, building a machine that maximizes the setup is &lt;a href="http://en.wikipedia.org/wiki/NP-complete"&gt;NP-complete&lt;/a&gt;.&lt;br /&gt;What about simply maximizing the overall vertical and horizontal distance that creeps have to follow? Still NP-complete.&lt;br /&gt;Let's forget trying to build a deterministic algorithm and fall back on &lt;a href="http://en.wikipedia.org/wiki/Monte_Carlo_method"&gt;Monte Carlo&lt;/a&gt; instead.&lt;br /&gt;The strategy is to randomly place blocks and retain the layouts with longest path from start to finish.&lt;br /&gt;Here is the best layout found by my machine:&lt;br /&gt;&lt;br /&gt;&lt;a href="http://sites.google.com/site/codestench/images-1/Screenshot.jpg"&gt;&lt;img src="http://sites.google.com/site/codestench/images-1/Screenshot.jpg" alt="screenshot" style="width:380px;"/&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;If you look at it closely, you'll see a resemblance with &lt;a href="http://www.youtube.com/watch?v=UR2ux2vLaNo"&gt;what a human would build&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Here's the Python code:&lt;br /&gt;&lt;strong&gt;graph.py&lt;/strong&gt;&lt;br /&gt;&lt;pre class="brush: python"&gt;import sys&lt;br /&gt;import random&lt;br /&gt;import heapq&lt;br /&gt;&lt;br /&gt;class Graph:&lt;br /&gt;  NODE_FREE = 0&lt;br /&gt;  NODE_OCCUPIED = -1&lt;br /&gt;  NODE_V_START = 10&lt;br /&gt;  NODE_V_END = 11&lt;br /&gt;  NODE_H_START = 20&lt;br /&gt;  NODE_H_END = 21&lt;br /&gt;&lt;br /&gt;  visited = []&lt;br /&gt;  &lt;br /&gt;  def __init__(self, width, height):&lt;br /&gt;    self.width = width&lt;br /&gt;    self.height = height&lt;br /&gt;    self.nnodes = width * height + 4 # grid + start and end points&lt;br /&gt;    self.nedges = 0&lt;br /&gt;    self.nodes = []&lt;br /&gt;    self.edges = []&lt;br /&gt;    self.tower_pos = [] # list of nodes&lt;br /&gt;    self.vstart = -1&lt;br /&gt;    self.vend = -1&lt;br /&gt;    self.hstart = -1&lt;br /&gt;    self.hend = -1&lt;br /&gt;&lt;br /&gt;    """ Add the nodes """ &lt;br /&gt;    for _ in range(self.nnodes):&lt;br /&gt;      self.nodes.append(self.NODE_FREE)&lt;br /&gt;      row = []&lt;br /&gt;      for _ in range(self.nnodes):&lt;br /&gt;        row.append(False)&lt;br /&gt;        self.visited.append(False)&lt;br /&gt;      self.edges.append(row)&lt;br /&gt;&lt;br /&gt;    """ Set up the grid """&lt;br /&gt;    active_nodes = self.width * self.height&lt;br /&gt;    for n in range(active_nodes):&lt;br /&gt;      if n % self.width != 0:&lt;br /&gt;        self.add_edge(n, n - 1)&lt;br /&gt;      if n % self.width != self.width - 1:&lt;br /&gt;        self.add_edge(n, n + 1)&lt;br /&gt;      if n - self.width &gt;= 0:&lt;br /&gt;        self.add_edge(n, n - self.width)&lt;br /&gt;      if n + self.width &lt; active_nodes:&lt;br /&gt;        self.add_edge(n, n + self.width)&lt;br /&gt;&lt;br /&gt;    """ &lt;br /&gt;        Define vertical and horizontal starting and ending nodes (4)&lt;br /&gt;        Link start and end nodes to the entry and exit nodes (nodes #10-17, #529-536&lt;br /&gt;        and {208;234;260;286;312;338}, {233;259;285;311;337;363})&lt;br /&gt;    """&lt;br /&gt;&lt;br /&gt;    self.vstart = self.nnodes - 1&lt;br /&gt;    self.vend = self.nnodes - 2&lt;br /&gt;    self.hstart = self.nnodes - 3&lt;br /&gt;    self.hend = self.nnodes - 4&lt;br /&gt;  &lt;br /&gt;    self.nodes[self.vstart] = self.NODE_V_START&lt;br /&gt;    self.nodes[self.vend] = self.NODE_V_END&lt;br /&gt;    self.nodes[self.hstart] = self.NODE_H_START&lt;br /&gt;    self.nodes[self.hend] = self.NODE_H_END&lt;br /&gt;&lt;br /&gt;    self.add_edge(self.vstart, 10)&lt;br /&gt;    self.add_edge(self.vstart, 11)&lt;br /&gt;    self.add_edge(self.vstart, 12)&lt;br /&gt;    self.add_edge(self.vstart, 13)&lt;br /&gt;    self.add_edge(self.vstart, 14)&lt;br /&gt;    self.add_edge(self.vstart, 15)&lt;br /&gt;    self.add_edge(self.vstart, 16)&lt;br /&gt;    self.add_edge(self.vstart, 17)&lt;br /&gt;&lt;br /&gt;    self.add_edge(self.vend, 529)&lt;br /&gt;    self.add_edge(self.vend, 530)&lt;br /&gt;    self.add_edge(self.vend, 531)&lt;br /&gt;    self.add_edge(self.vend, 532)&lt;br /&gt;    self.add_edge(self.vend, 533)&lt;br /&gt;    self.add_edge(self.vend, 534)&lt;br /&gt;    self.add_edge(self.vend, 535)&lt;br /&gt;    self.add_edge(self.vend, 536)&lt;br /&gt;&lt;br /&gt;    self.add_edge(self.hstart, 208)&lt;br /&gt;    self.add_edge(self.hstart, 234)&lt;br /&gt;    self.add_edge(self.hstart, 260)&lt;br /&gt;    self.add_edge(self.hstart, 286)&lt;br /&gt;    self.add_edge(self.hstart, 312)&lt;br /&gt;    self.add_edge(self.hstart, 338)&lt;br /&gt;&lt;br /&gt;    self.add_edge(self.hend, 233)&lt;br /&gt;    self.add_edge(self.hend, 259)&lt;br /&gt;    self.add_edge(self.hend, 285)&lt;br /&gt;    self.add_edge(self.hend, 311)&lt;br /&gt;    self.add_edge(self.hend, 337)&lt;br /&gt;    self.add_edge(self.hend, 363)&lt;br /&gt;&lt;br /&gt;  def display_vars(self):&lt;br /&gt;    print self.nnodes, ": ", self.nodes&lt;br /&gt;    print self.nedges, ": ", self.edges&lt;br /&gt;&lt;br /&gt;  def display(self):&lt;br /&gt;    for row in range(self.height):&lt;br /&gt;      for col in range(self.width):&lt;br /&gt;        if self.NODE_FREE == self.nodes[row * self.width + col]:&lt;br /&gt;          print "-",&lt;br /&gt;        else:&lt;br /&gt;          print "X",&lt;br /&gt;      print&lt;br /&gt;    print&lt;br /&gt;&lt;br /&gt;  def display_edges(self):&lt;br /&gt;    for i in range(self.nnodes):&lt;br /&gt;      print i, ": ",&lt;br /&gt;      for j in range(self.nnodes):&lt;br /&gt;        if True == self.edges[i][j]:&lt;br /&gt;          print j, "(", self.nodes[j], ")", ", ",&lt;br /&gt;      print&lt;br /&gt;&lt;br /&gt;  def reset(self):&lt;br /&gt;    self.tower_pos = []&lt;br /&gt;    for i in range(self.width * self.height):&lt;br /&gt;      self.nodes[i] = self.NODE_FREE&lt;br /&gt;    self.visited = []&lt;br /&gt;    for _ in range(self.nnodes):&lt;br /&gt;      self.visited.append(False)&lt;br /&gt;&lt;br /&gt;  """ Initialize the graph with turrets on the nodes given in the list @tower_pos """&lt;br /&gt;  def restore(self, tower_pos):&lt;br /&gt;    self.reset()&lt;br /&gt;    for i in tower_pos:&lt;br /&gt;      self.place_tower_on_node(i)&lt;br /&gt;&lt;br /&gt;  def add_node(self, val):&lt;br /&gt;    self.nnodes += 1&lt;br /&gt;    self.nodes.append(val)&lt;br /&gt;    row = []&lt;br /&gt;    for _ in range(self.max_nnodes):&lt;br /&gt;      row.append(False)&lt;br /&gt;      self.visited.append(False)&lt;br /&gt;    self.edges.append(row)&lt;br /&gt;    return self.nnodes-1&lt;br /&gt;&lt;br /&gt;  def add_edge(self, src, dst):&lt;br /&gt;    self.edges[src][dst] = True&lt;br /&gt;    self.edges[dst][src] = True&lt;br /&gt;    self.nedges += 1 &lt;br /&gt;&lt;br /&gt;  def del_edge(self, src, dst):&lt;br /&gt;    self.edges[src][dst] = False&lt;br /&gt;    self.edges[dst][src] = False&lt;br /&gt;    self.nedges -= 1&lt;br /&gt;&lt;br /&gt;  def place_tower(self, row, col):&lt;br /&gt;    """ Updates the graph to reflect the drawing of a tower """&lt;br /&gt;    if row &lt; 0 or col &lt; 0 or row &gt;= self.height-1 or col &gt;= self.width-1:&lt;br /&gt;      return False&lt;br /&gt;    node_num = row * self.width + col&lt;br /&gt;    if self.nodes[node_num] != self.NODE_FREE or self.nodes[node_num + 1] != self.NODE_FREE or self.nodes[node_num + self.width] != self.NODE_FREE or self.nodes[node_num + self.width + 1] != self.NODE_FREE:&lt;br /&gt;      return False&lt;br /&gt;    self.nodes[node_num] = self.NODE_OCCUPIED&lt;br /&gt;    self.nodes[node_num + 1] = self.NODE_OCCUPIED&lt;br /&gt;    self.nodes[node_num + self.width] = self.NODE_OCCUPIED&lt;br /&gt;    self.nodes[node_num + self.width + 1] = self.NODE_OCCUPIED&lt;br /&gt;    self.tower_pos.append(node_num)&lt;br /&gt;    return True&lt;br /&gt;&lt;br /&gt;  def place_tower_on_node(self, node):&lt;br /&gt;    """ Updates the graph to reflect the drawing of a tower """&lt;br /&gt;    row = node / self.width&lt;br /&gt;    col = node % self.width&lt;br /&gt;    if row &lt; 0 or col &lt; 0 or row &gt;= self.height-1 or col &gt;= self.width-1:&lt;br /&gt;      return False&lt;br /&gt;    if self.nodes[node] != self.NODE_FREE or self.nodes[node + 1] != self.NODE_FREE or self.nodes[node + self.width] != self.NODE_FREE or self.nodes[node + self.width + 1] != self.NODE_FREE:&lt;br /&gt;      return False&lt;br /&gt;    self.nodes[node] = self.NODE_OCCUPIED&lt;br /&gt;    self.nodes[node + 1] = self.NODE_OCCUPIED&lt;br /&gt;    self.nodes[node + self.width] = self.NODE_OCCUPIED&lt;br /&gt;    self.nodes[node + self.width + 1] = self.NODE_OCCUPIED&lt;br /&gt;    self.tower_pos.append(node)&lt;br /&gt;    return True&lt;br /&gt; &lt;br /&gt;  def dijkstra(self, start, end):&lt;br /&gt;    q = [(0, start)]  # Heap of (cost, path_head)&lt;br /&gt;    visited = set()&lt;br /&gt;    push, pop, add = heapq.heappush, heapq.heappop, visited.add&lt;br /&gt;    while True:&lt;br /&gt;      if 0 == len(q):&lt;br /&gt;        return False&lt;br /&gt;      (cost, v1) = pop(q)&lt;br /&gt;      if v1 not in visited and self.nodes[v1] != self.NODE_OCCUPIED:&lt;br /&gt;        add(v1)&lt;br /&gt;        if v1 == end:&lt;br /&gt;          return cost&lt;br /&gt;        for v2 in range(self.nnodes):&lt;br /&gt;          if True == self.edges[v1][v2]:&lt;br /&gt;            if v2 not in visited:&lt;br /&gt;              push(q, (cost + 1, v2))&lt;br /&gt;&lt;br /&gt;  def cc(self):&lt;br /&gt;    visited = self.visited&lt;br /&gt;    for count in range(self.nnodes):&lt;br /&gt;      if (self.visited[count] == False):&lt;br /&gt;        yield self.dfs(count)&lt;br /&gt;&lt;br /&gt;  def dfs(self, start, target):&lt;br /&gt;    return self.__run_dfs(start, target)&lt;br /&gt;&lt;br /&gt;  def __run_dfs(self, i, target):&lt;br /&gt;    self.visited[i] = True&lt;br /&gt;    if i == target: return True&lt;br /&gt;    for neighbor in range(self.nnodes):&lt;br /&gt;      if True == self.edges[i][neighbor] and False == self.visited[neighbor]:&lt;br /&gt;        self.__run_dfs(neighbor, target)&lt;br /&gt;    return False&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;strong&gt;bestpath.py&lt;/strong&gt;&lt;br /&gt;&lt;pre class="brush: python"&gt;import sys&lt;br /&gt;import random&lt;br /&gt;from dtd_graph import Graph&lt;br /&gt;&lt;br /&gt;def scramble(g):&lt;br /&gt;  best_path = 0&lt;br /&gt;  nodes = []&lt;br /&gt;  &lt;br /&gt;  while True:&lt;br /&gt;    g.reset()&lt;br /&gt;    for _ in range(68):&lt;br /&gt;      node = random.randint(0, g.width * g.height)&lt;br /&gt;      while False == g.place_tower_on_node(node):&lt;br /&gt;        node = random.randint(0, g.width * g.height)&lt;br /&gt;&lt;br /&gt;    vpath = g.dijkstra(g.vstart, g.vend)&lt;br /&gt;    hpath = g.dijkstra(g.hstart, g.hend)&lt;br /&gt;    if False == vpath or False == hpath:&lt;br /&gt;      continue&lt;br /&gt;    if best_path &lt; vpath + hpath:&lt;br /&gt;      best_path = vpath + hpath&lt;br /&gt;      g.display()&lt;br /&gt;      print vpath, hpath&lt;br /&gt;&lt;br /&gt;def run():&lt;br /&gt;  g_width = 26&lt;br /&gt;  g_height = 22&lt;br /&gt;  g = Graph(g_width, g_height)&lt;br /&gt;  #g.display_edges()&lt;br /&gt;  scramble(g)&lt;br /&gt;&lt;br /&gt;def main():&lt;br /&gt;  if len(sys.argv) != 1:&lt;br /&gt;    print "usage:  python", sys.argv[0], "&lt; filename"&lt;br /&gt;    return -1&lt;br /&gt;  else:&lt;br /&gt;    run()&lt;br /&gt;&lt;br /&gt;if __name__ == '__main__':&lt;br /&gt;  sys.exit(main())&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155973909202515480-2497035595363589913?l=jdhsu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdhsu.blogspot.com/feeds/2497035595363589913/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdhsu.blogspot.com/2010/01/desktop-tower-defense-my-computer-build.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/2497035595363589913'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/2497035595363589913'/><link rel='alternate' type='text/html' href='http://jdhsu.blogspot.com/2010/01/desktop-tower-defense-my-computer-build.html' title='Desktop Tower Defense: My Computer Build'/><author><name>jd hsu</name><uri>http://www.blogger.com/profile/07390042970137791452</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_gvQh1p7tre8/S0wk8rSPWBI/AAAAAAAAAAM/Q4KbTqG8Tf0/S220/DSC00577.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2155973909202515480.post-2482732006174252485</id><published>2010-01-17T19:42:00.000-08:00</published><updated>2010-01-17T20:03:11.747-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='C'/><category scheme='http://www.blogger.com/atom/ns#' term='gdbm'/><category scheme='http://www.blogger.com/atom/ns#' term='Graph Library'/><title type='text'>On-Disk C Graph Library</title><content type='html'>This is a small on-disk graph library, which I find very convenient whenever I am dealing with huge graphs.&lt;br /&gt;Back a year or so ago, I could not find any disk-based graph library, which was the motivation for the present code.&lt;br /&gt;It uses gdbm for persistent data storage, meaning that the graph size is only limited by your storage space, but beware, as operations on huge graphs will take a long time.&lt;br /&gt;You should either compile this code with a Linux variant or compatible.&lt;br /&gt;Adjacency lists are kept sorted and the look-up uses a binary search.&lt;br /&gt;I needed to create a Graph that could scale and run a DFS on it, that's the reason why DFS is the only algorithm available to walk the graph.&lt;br /&gt;There is this function &lt;i&gt;bpresearch&lt;/i&gt; taken from Minix.&lt;br /&gt;Everything else is &lt;a href="http://www.gnu.org/licenses/gpl-3.0.html"&gt;GPLv3&lt;/a&gt;.&lt;br /&gt;I've used it on several projects successfully, but it is alpha code so use carefully if at all, and you should probably rewrite &lt;i&gt;bpresearch&lt;/i&gt; if you want to work with a GPLv3 licence.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;graph.h&lt;/strong&gt;:&lt;br /&gt;&lt;pre class="brush: cpp"&gt;#define GRAPH_EDGES_FILE "graph_edges.dbf"&lt;br /&gt;#define GRAPH_VISITED_FILE "graph_visited.dbf"&lt;br /&gt;typedef struct graph{&lt;br /&gt;  unsigned long int nnodes;&lt;br /&gt;  unsigned long int nedges;&lt;br /&gt;  GDBM_FILE edges;&lt;br /&gt;  GDBM_FILE visited;&lt;br /&gt;  size_t size; // size of one data element&lt;br /&gt;  int (*compar)(const void* n1, const void* n2);&lt;br /&gt;} graph_t;&lt;br /&gt;/* Call this function if you want to create a new graph */&lt;br /&gt;int graph_init (graph_t* g_ptr, const char* edges_file, size_t size, int (*compar)(const void* n1, const void* n2));&lt;br /&gt;int graph_restore (graph_t* g_ptr, const char* edges_file, size_t size, int (*compar)(const void* n1, const void* n2));&lt;br /&gt;void graph_close (graph_t* g_ptr);&lt;br /&gt;int graph_node_exists (graph_t* g_ptr, void* n);&lt;br /&gt;int graph_edge_exists (graph_t* g_ptr, void* n1, void* n2);&lt;br /&gt;int graph_add_node (graph_t* g_ptr, void* n);&lt;br /&gt;int graph_add_edge (graph_t* g_ptr, void* n1, void* n2);&lt;br /&gt;int graph_del_node (graph_t* g_ptr, void* n);&lt;br /&gt;int graph_del_edge (graph_t* g_ptr, void* n1, void* n2);&lt;br /&gt;void* graph_first_node (graph_t* g_ptr);&lt;br /&gt;void* graph_next_node (graph_t* g_ptr, void* n);&lt;br /&gt;/* calls @process on every node of the search */&lt;br /&gt;void graph_dfs (graph_t* g_ptr, const void* n, void (*process)(const void* n));&lt;br /&gt;/* calls @process_first on the first node of every component and calls @process_all on every node. The first node is passed to both functions */&lt;br /&gt;void graph_cc (graph_t* g_ptr, void (*process_first)(const void* n), void (*process_all)(const void* n));&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;strong&gt;graph.c&lt;/strong&gt;:&lt;br /&gt;&lt;pre class="brush: cpp"&gt;#include &amp;lt;stdio.h&amp;gt;&lt;br /&gt;#include &amp;lt;stdlib.h&amp;gt;&lt;br /&gt;#include &amp;lt;string.h&amp;gt;&lt;br /&gt;#include &amp;lt;unistd.h&amp;gt;&lt;br /&gt;#include &amp;lt;gdbm.h&amp;gt;&lt;br /&gt;#include "graph.h"&lt;br /&gt;int graph_init (graph_t* g_ptr, const char* edges_file, size_t size, int (*compar)(const void* n1, const void* n2)){&lt;br /&gt;  g_ptr-&gt;nnodes = 0L;&lt;br /&gt;  g_ptr-&gt;nedges = 0L;&lt;br /&gt;  g_ptr-&gt;edges = gdbm_open ((char*)edges_file, 0/* use default */, GDBM_NEWDB|GDBM_SYNC, 0600, 0);&lt;br /&gt;  g_ptr-&gt;visited = gdbm_open (GRAPH_VISITED_FILE, 0/* use default */, GDBM_NEWDB|GDBM_SYNC, 0600, 0);&lt;br /&gt;  if (!g_ptr-&gt;edges || !g_ptr-&gt;visited){&lt;br /&gt;    fprintf (stderr, "(EE) Failed creating graph file(s)\n");&lt;br /&gt;    return (-1);&lt;br /&gt;  }&lt;br /&gt;  g_ptr-&gt;size = size;&lt;br /&gt;  g_ptr-&gt;compar = compar;&lt;br /&gt;  return 0;&lt;br /&gt;}&lt;br /&gt;int graph_restore (graph_t* g_ptr, const char* edges_file, size_t size, int (*compar)(const void* n1, const void* n2)){&lt;br /&gt;  g_ptr-&gt;edges = gdbm_open ((char*)edges_file, 0/* use default */, GDBM_WRITER|GDBM_SYNC, 0600, 0);&lt;br /&gt;  g_ptr-&gt;visited = gdbm_open (GRAPH_VISITED_FILE, 0/* use default */, GDBM_NEWDB|GDBM_SYNC, 0600, 0);&lt;br /&gt;  g_ptr-&gt;size = size;&lt;br /&gt;  if (!g_ptr-&gt;edges || !g_ptr-&gt;visited){&lt;br /&gt;    fprintf (stderr, "(EE) Failed opening graph file(s)\n");&lt;br /&gt;    return (-1);&lt;br /&gt;  }&lt;br /&gt;  unsigned long int nnodes = 0L;&lt;br /&gt;  unsigned long int nedges = 0L;&lt;br /&gt;  datum key = gdbm_firstkey (g_ptr-&gt;edges);&lt;br /&gt;  datum data;&lt;br /&gt;  while (key.dptr){&lt;br /&gt;    data = gdbm_fetch (g_ptr-&gt;edges, key);&lt;br /&gt;    datum nextkey = gdbm_nextkey (g_ptr-&gt;edges, key);&lt;br /&gt;    nnodes += 1;&lt;br /&gt;    nedges += data.dsize / g_ptr-&gt;size;&lt;br /&gt;    free (key.dptr);&lt;br /&gt;    free (data.dptr);&lt;br /&gt;    key = nextkey;&lt;br /&gt;  }&lt;br /&gt;  g_ptr-&gt;nnodes = nnodes;&lt;br /&gt;  g_ptr-&gt;nedges = nedges;&lt;br /&gt;  g_ptr-&gt;size = size;&lt;br /&gt;  g_ptr-&gt;compar = compar;&lt;br /&gt;  return 0;&lt;br /&gt;}&lt;br /&gt;void graph_close (graph_t* g_ptr){&lt;br /&gt;  gdbm_close (g_ptr-&gt;edges);&lt;br /&gt;  gdbm_close (g_ptr-&gt;visited);&lt;br /&gt;  unlink (GRAPH_VISITED_FILE);&lt;br /&gt;}&lt;br /&gt;int graph_node_exists (graph_t* g_ptr, void* n){&lt;br /&gt;  datum key = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  return gdbm_exists (g_ptr-&gt;edges, key);&lt;br /&gt;}&lt;br /&gt;int graph_edge_exists (graph_t* g_ptr, void* n1, void* n2){&lt;br /&gt;  if (!graph_node_exists (g_ptr, n1) || !graph_node_exists (g_ptr, n2))&lt;br /&gt;    return 0;&lt;br /&gt;  int result;&lt;br /&gt;  datum key = {(char*)n1, g_ptr-&gt;size};&lt;br /&gt;  datum data = gdbm_fetch (g_ptr-&gt;edges, key);&lt;br /&gt;  void* edges = (void*)data.dptr;&lt;br /&gt;  size_t len = data.dsize / g_ptr-&gt;size;&lt;br /&gt;  if (bsearch (n2, edges, len, g_ptr-&gt;size, g_ptr-&gt;compar))&lt;br /&gt;    result = 1;&lt;br /&gt;  else&lt;br /&gt;    result = 0;&lt;br /&gt;  if (data.dsize)&lt;br /&gt;    free (edges);&lt;br /&gt;  return result;&lt;br /&gt;}&lt;br /&gt;int graph_add_node (graph_t* g_ptr, void* n){&lt;br /&gt;  datum key = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  datum data = {"", 1};&lt;br /&gt;  int result = gdbm_store (g_ptr-&gt;edges, key, data, GDBM_INSERT);&lt;br /&gt;  if (0 == result)&lt;br /&gt;    g_ptr-&gt;nnodes += 1;&lt;br /&gt;  return result;&lt;br /&gt;}&lt;br /&gt;/*&lt;br /&gt; * Searches an array of nmemb objects, the initial&lt;br /&gt; * member of which is pointed to by base, for a member that would follow the&lt;br /&gt; * object pointed to by key.&lt;br /&gt; * Modified from Minix bsearch&lt;br /&gt; * (c) copyright 1987 by the Vrije Universiteit, Amsterdam, The Netherlands.&lt;br /&gt; * See the copyright notice in the ACK home directory, in the file "Copyright".&lt;br /&gt; * $Header: /opt/proj/minix/cvsroot/src/lib/ansi/bsearch.c,v 1.1.1.1 2005/04/21 14:56:04 beng Exp $&lt;br /&gt; */&lt;br /&gt;static void* bpresearch (const void* key, const void* base, size_t nmemb, size_t size, int (*compar)(const void*, const void*)){&lt;br /&gt;  register const void* mid_point = base;&lt;br /&gt;  int moreflag = 0;&lt;br /&gt;  while (nmemb &gt; 0){&lt;br /&gt;    mid_point = (char*)base + size * (nmemb &gt;&gt; 1);&lt;br /&gt;    if (compar (key, mid_point) &gt; 0){&lt;br /&gt;      base  = (char*)mid_point + size;&lt;br /&gt;      nmemb = (nmemb - 1) &gt;&gt; 1;&lt;br /&gt;      moreflag = 1;&lt;br /&gt;    }else{&lt;br /&gt;      nmemb &gt;&gt;= 1;&lt;br /&gt;      moreflag = 0;&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;  return (void*)((char*)mid_point + moreflag * size);&lt;br /&gt;}&lt;br /&gt;static void insert_sorted (void* dest, const void* key, const void* base, size_t nmemb, size_t size, int (*compar)(const void*, const void*)){&lt;br /&gt;  void* follow = bpresearch (key, base, nmemb, size, compar);&lt;br /&gt;  const char* bse = base;&lt;br /&gt;  char* dst = dest;&lt;br /&gt;  size_t first_half = (char*)follow - bse;&lt;br /&gt;  size_t second_half = (bse + nmemb * size) - (char*)follow;&lt;br /&gt;  memcpy (dst, bse, first_half);&lt;br /&gt;  memcpy (dst + first_half, key, size);&lt;br /&gt;  memcpy (dst + first_half + size, bse + first_half, second_half);&lt;br /&gt;}&lt;br /&gt;int graph_add_edge (graph_t* g_ptr, void* n1, void* n2){&lt;br /&gt;  if (0 == memcmp (n1, n2, g_ptr-&gt;size)&lt;br /&gt;  || !graph_node_exists (g_ptr, n1) &lt;br /&gt;  || !graph_node_exists (g_ptr, n2) &lt;br /&gt;  || graph_edge_exists (g_ptr, n1, n2))&lt;br /&gt;    return -1;&lt;br /&gt;  datum key = {(char*)n1, g_ptr-&gt;size};&lt;br /&gt;  datum data = gdbm_fetch (g_ptr-&gt;edges, key);&lt;br /&gt;  void* edges = (void*)data.dptr;&lt;br /&gt;  size_t new_size = data.dsize + g_ptr-&gt;size;&lt;br /&gt;  char new_edges[new_size];&lt;br /&gt;  insert_sorted (new_edges, n2, edges, data.dsize/g_ptr-&gt;size, g_ptr-&gt;size, g_ptr-&gt;compar);&lt;br /&gt;  datum new_data = {new_edges, new_size};&lt;br /&gt;  if (data.dsize)&lt;br /&gt;    free (edges);&lt;br /&gt;  int result = gdbm_store (g_ptr-&gt;edges, key, new_data, GDBM_REPLACE);&lt;br /&gt;  if (0 == result)&lt;br /&gt;    g_ptr-&gt;nedges += 1;&lt;br /&gt;  return result;&lt;br /&gt;}&lt;br /&gt;int graph_del_node (graph_t* g_ptr, void* n){&lt;br /&gt;  if (!graph_node_exists (g_ptr, n))&lt;br /&gt;    return -1;&lt;br /&gt;  datum key = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  int result = gdbm_delete (g_ptr-&gt;edges, key);&lt;br /&gt;  if (0 == result)&lt;br /&gt;    g_ptr-&gt;nnodes -= 1;&lt;br /&gt;  return result;&lt;br /&gt;}&lt;br /&gt;static void remove_sorted (void* dest, const void* key, const void* base, size_t nmemb, size_t size, int (*compar)(const void*, const void*)){&lt;br /&gt;  void* ptr = bsearch (key, base, nmemb, size, compar);&lt;br /&gt;  if (!ptr) return;&lt;br /&gt;  const char* bse = base;&lt;br /&gt;  char* dst = dest;&lt;br /&gt;  size_t first_half = (char*)ptr - bse;&lt;br /&gt;  size_t second_half = (bse + nmemb * size) - ((char*)ptr + size);&lt;br /&gt;  memcpy (dst, bse, first_half);&lt;br /&gt;  memcpy (dst + first_half, bse + first_half + size, second_half);&lt;br /&gt;}&lt;br /&gt;int graph_del_edge (graph_t* g_ptr, void* n1, void* n2){&lt;br /&gt;  if (!graph_edge_exists (g_ptr, n1, n2))&lt;br /&gt;    return -1;&lt;br /&gt;  datum key = {(char*)n1, g_ptr-&gt;size};&lt;br /&gt;  datum data = gdbm_fetch (g_ptr-&gt;edges, key);&lt;br /&gt;  void* edges = (void*)data.dptr;&lt;br /&gt;  size_t new_size = data.dsize - g_ptr-&gt;size;&lt;br /&gt;  datum new_data;&lt;br /&gt;  if (0 == new_size){&lt;br /&gt;    new_data.dptr = "";&lt;br /&gt;    new_data.dsize = 1;&lt;br /&gt;  }else{&lt;br /&gt;    char new_edges[new_size];&lt;br /&gt;    remove_sorted (new_edges, n2, edges, data.dsize/g_ptr-&gt;size, g_ptr-&gt;size, g_ptr-&gt;compar);&lt;br /&gt;    new_data.dptr = new_edges;&lt;br /&gt;    new_data.dsize = new_size;&lt;br /&gt;  }&lt;br /&gt;  if (data.dsize)&lt;br /&gt;    free (edges);&lt;br /&gt;  int result = gdbm_store (g_ptr-&gt;edges, key, new_data, GDBM_REPLACE);&lt;br /&gt;  if (0 == result)&lt;br /&gt;    g_ptr-&gt;nedges -= 1;&lt;br /&gt;  return result;&lt;br /&gt;}&lt;br /&gt;void* graph_first_node (graph_t* g_ptr){&lt;br /&gt;  datum key = gdbm_firstkey (g_ptr-&gt;edges);&lt;br /&gt;  return (void*)key.dptr;&lt;br /&gt;}&lt;br /&gt;void* graph_next_node (graph_t* g_ptr, void* n){&lt;br /&gt;  datum prevkey = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  datum key = gdbm_nextkey (g_ptr-&gt;edges, prevkey);&lt;br /&gt;  return (void*)key.dptr;&lt;br /&gt;}&lt;br /&gt;static int is_visited (graph_t* g_ptr, const void* n){&lt;br /&gt;  datum key = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  return gdbm_exists (g_ptr-&gt;visited, key);&lt;br /&gt;}&lt;br /&gt;static void mark_visited (graph_t* g_ptr, const void* n){&lt;br /&gt;  datum key = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  datum data = {"", 1};&lt;br /&gt;  gdbm_store (g_ptr-&gt;visited, key, data, GDBM_REPLACE);&lt;br /&gt;}&lt;br /&gt;static void reset_visited (graph_t* g_ptr){&lt;br /&gt;  gdbm_close (g_ptr-&gt;visited);&lt;br /&gt;  g_ptr-&gt;visited = gdbm_open (GRAPH_VISITED_FILE, 0/* use default */, GDBM_NEWDB|GDBM_SYNC, 0600, 0);&lt;br /&gt;}&lt;br /&gt;static void dfs (graph_t* g_ptr, const void* n, void (*process)(const void* n)){&lt;br /&gt;  mark_visited (g_ptr, n);&lt;br /&gt;  process (n);&lt;br /&gt;  datum key = {(char*)n, g_ptr-&gt;size};&lt;br /&gt;  datum data = gdbm_fetch (g_ptr-&gt;edges, key);&lt;br /&gt;  void* edges = (void*)data.dptr;&lt;br /&gt;  size_t len = data.dsize / g_ptr-&gt;size;&lt;br /&gt;  for (int i=0; i&amp;lt;len; i+=1){&lt;br /&gt;    if (!is_visited (g_ptr, edges)){&lt;br /&gt;      dfs (g_ptr, edges, process);&lt;br /&gt;      edges = (char*)edges + g_ptr-&gt;size;&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;  if (data.dptr)&lt;br /&gt;    free (data.dptr);&lt;br /&gt;}&lt;br /&gt;void graph_dfs (graph_t* g_ptr, const void* n, void (*process)(const void* n)){&lt;br /&gt;  reset_visited (g_ptr);&lt;br /&gt;  dfs (g_ptr, n, process);&lt;br /&gt;}&lt;br /&gt;/* Iterate the graph and find connected components&lt;br /&gt;The function @process_first is called on the first node of a connected component - this node can be any node of the component.&lt;br /&gt;@process_all is called on every other node&lt;br /&gt;*/&lt;br /&gt;void graph_cc (graph_t* g_ptr, void (*process_first)(const void* n), void (*process_all)(const void* n)){&lt;br /&gt;  reset_visited (g_ptr);&lt;br /&gt;  void* node = graph_first_node (g_ptr);&lt;br /&gt;  while (node){&lt;br /&gt;    void* nextnode = graph_next_node (g_ptr, node);&lt;br /&gt;    if (!is_visited (g_ptr, node)){&lt;br /&gt;      process_first (node);&lt;br /&gt;      dfs (g_ptr, node, process_all);&lt;br /&gt;    }&lt;br /&gt;    free (node);&lt;br /&gt;    node = nextnode;&lt;br /&gt;  }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155973909202515480-2482732006174252485?l=jdhsu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdhsu.blogspot.com/feeds/2482732006174252485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdhsu.blogspot.com/2010/01/on-disk-c-graph-library.html#comment-form' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/2482732006174252485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/2482732006174252485'/><link rel='alternate' type='text/html' href='http://jdhsu.blogspot.com/2010/01/on-disk-c-graph-library.html' title='On-Disk C Graph Library'/><author><name>jd hsu</name><uri>http://www.blogger.com/profile/07390042970137791452</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_gvQh1p7tre8/S0wk8rSPWBI/AAAAAAAAAAM/Q4KbTqG8Tf0/S220/DSC00577.JPG'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-2155973909202515480.post-5960009476386449817</id><published>2010-01-11T23:38:00.001-08:00</published><updated>2010-01-12T21:20:36.453-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='PHP'/><category scheme='http://www.blogger.com/atom/ns#' term='directory scan'/><category scheme='http://www.blogger.com/atom/ns#' term='iterator'/><category scheme='http://www.blogger.com/atom/ns#' term='directory listing'/><title type='text'>9 Ways to Iterate Over a Directory in PHP</title><content type='html'>Recently, I bent over trying to optimize a straight file lookup in PHP. Like most things PHP, you have a wide variety of choices to iterate over a directory. This post goes over their respective efficiencies and implementations. The results presented here will vary depending on your host environment.&lt;br /&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;/div&gt;&lt;div style="font-family: Arial,Helvetica,sans-serif;"&gt;&lt;b&gt;&lt;span style="font-size: large;"&gt;Problem Description:&lt;/span&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;The problem is to quickly match directory entries to a given string using PHP. The string in question is usually part of a file name or even a regular expression. Some of the code presented here does not allow regular expression matching, so take this comparison with a grain of salt. For the comparison itself, I am using the &lt;i&gt;strpos&lt;/i&gt; function. This simply means that we will be looking for substrings of a filename, which is usually good enough when looking for a file, auto-completing a name...etc &lt;b&gt;&amp;nbsp;&lt;/b&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Approaches:&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Standard scan&lt;/i&gt;&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$dirp = opendir('.');&lt;br /&gt;if($dirp){&lt;br /&gt;  $counter = 0;&lt;br /&gt;  while(FALSE !== ($file = readdir($dirp))){&lt;br /&gt;    if(FALSE !== strpos($file, $query)){&lt;br /&gt;      $counter += 1;&lt;br /&gt;      if($counter &amp;gt; $max_hits)&lt;br /&gt;        break;&lt;br /&gt;    }&lt;br /&gt;  }&lt;br /&gt;  closedir($dirp);&lt;br /&gt;}&amp;nbsp;&lt;/pre&gt;&lt;pre class="brush: php;"&gt;&amp;nbsp;&lt;/pre&gt;&lt;i&gt;Pros&lt;/i&gt;: Scales well&lt;br /&gt;This is the standard way of iterating over a directory.&lt;br /&gt;It uses little memory.&lt;br /&gt;This code is intuitive,&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons&lt;/i&gt;: It is awfully close to the metal&lt;i&gt;.&lt;/i&gt;&lt;br /&gt;The fact that you need to close the directory handle is another inconvenience.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;The glob function&lt;/b&gt;:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$files = glob("*$query*", GLOB_NOSORT);&lt;/pre&gt;&lt;br /&gt;&lt;i&gt;Pros:&lt;/i&gt; Very handy function that allows for compact code.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons:&lt;/i&gt; It does not scale, therefore avoid for anything but toying around, or if you're absolutely certain that your directory contains few files.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;SPL GlobIterator&lt;/b&gt;:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$iterator = new GlobIterator("*$query*", FilesystemIterator::KEY_AS_FILENAME);&lt;br /&gt;$counter = 0;&lt;br /&gt;foreach($iterator as $item){&lt;br /&gt;&amp;nbsp; $counter += 1;&lt;br /&gt;&amp;nbsp; if($counter &amp;gt; $max_hits)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;i&gt;Note:&lt;/i&gt; The GlobIterator class is a PHP 5.3.0 addition to the Standard PHP Library.&lt;br /&gt;&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;br /&gt;&lt;i&gt;Pros:&lt;/i&gt; Scales well. Convenient and efficient.&lt;br /&gt;The code is compact and readable.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons:&lt;/i&gt; Forces a OO approach&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;SPL DirectoryIterator&lt;/b&gt;:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$iterator = new DirectoryIterator(".");&lt;br /&gt;$counter = 0;&lt;br /&gt;if($iterator){&lt;br /&gt;&amp;nbsp; foreach($iterator as $item){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(FALSE !== strpos($item-&amp;gt;getFilename(), $query)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; $counter += 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if($counter &amp;gt; $max_hits)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;i&gt;Note:&lt;/i&gt; This is the SPL scandir.&lt;br /&gt;This class was introduced earlier than GlobIterator.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Pros:&lt;/i&gt; Scales well&lt;br /&gt;Besides being much more convenient to use than a regular scandir, it also gives access to individual &lt;a href="http://www.php.net/manual/en/class.splfileinfo.php"&gt;files information&lt;/a&gt;, provides good caching, and is easier to modify.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons:&lt;/i&gt; Slower than GlobIterator&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Extending SPL DirectoryIterator&lt;/b&gt;:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;class DirectoryReader extends DirectoryIterator{&lt;br /&gt;&amp;nbsp; // constructor.. duh!&lt;br /&gt;&amp;nbsp; function __construct($path, $query){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $this-&amp;gt;query = $query;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; /*** pass the $path off to the parent class constructor ***/&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; parent::__construct($path);&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; /*** members are only valid if they match the query ***/&lt;br /&gt;&amp;nbsp; function valid(){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(parent::valid()){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(FALSE === strpos(parent::getFilename(), $this-&amp;gt;query)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; parent::next();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return $this-&amp;gt;valid();&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; return TRUE;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; return FALSE;&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; private $query = NULL;&lt;br /&gt;} // end class&lt;br /&gt;$iterator = new DirectoryReader(".", $query);&lt;br /&gt;$counter = 0;&lt;br /&gt;while($iterator-&amp;gt;valid()){&lt;br /&gt;&amp;nbsp; $counter += 1;&lt;br /&gt;&amp;nbsp; if($counter &amp;gt; $max_hits)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;&amp;nbsp; $iterator-&amp;gt;next();&lt;br /&gt;}&lt;/pre&gt;&lt;i&gt;Note&lt;/i&gt;: This is the OO way of customizing your iterator.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Pros&lt;/i&gt;: Scales well&lt;br /&gt;Readable, makes use of iterators&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons&lt;/i&gt;: Slow&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Flat Index File&lt;/b&gt;:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$fp = fopen('index.txt', 'r');&lt;br /&gt;$counter = 0;&lt;br /&gt;if($fp){&lt;br /&gt;&amp;nbsp; while($line = fgets($fp)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(FALSE !== strpos($line, $query)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; $counter += 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if($counter &amp;gt; $max_hits)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; fclose($fp);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;i&gt;Note:&lt;/i&gt; This is definitely an optimization trick.&lt;br /&gt;Internally, PHP uses the php_stream* functions throughout, and is eventually bound to issuing a multitude of system calls to navigating the file-system data structure, which is &lt;i&gt;likely&lt;/i&gt;, in many instances, to be slower than reading through a single file.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Pros:&lt;/i&gt; Scales well.&lt;br /&gt;Gets rid of a lot of overhead.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons:&lt;/i&gt; Requires mostly static data as the index needs to be rebuilt on data change.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Forcing Cache on Flat index file&lt;/b&gt;:&lt;/i&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$entries = file('index.txt', FILE_SKIP_EMPTY_LINES);&lt;br /&gt;$counter = 0;&lt;br /&gt;if($entries){&lt;br /&gt;&amp;nbsp; foreach($entries as $entry){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(FALSE !== strpos($entry, $query)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; $counter += 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if($counter &amp;gt; $max_hits)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;} &lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;i&gt;Pros&lt;/i&gt;: Gets rid of a lot of overhead. &lt;br /&gt;&lt;br /&gt;&lt;i&gt;Cons&lt;/i&gt;: Does not scale.&lt;br /&gt;Slower than 6). &lt;br /&gt;Trying to outsmart your compiler is usually not a good idea.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;&lt;b&gt;Flat Index File and PREG&lt;/b&gt;:&lt;/i&gt; &lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;&amp;lt;/code&amp;gt;"&gt;$fp = fopen('index.txt', 'r');&lt;br /&gt;$counter = 0;&lt;br /&gt;if($fp){&lt;br /&gt;&amp;nbsp; while($line = fgets($fp)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(0 !== preg_match("/$query/", $line)){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; $counter += 1;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; if($counter &amp;gt; $max_hits)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; break;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; fclose($fp);&lt;br /&gt;}&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;i&gt;Note&lt;/i&gt;: The difference here is in the matching function.&lt;br /&gt;It gives more flexibility, while keeping a cache of &lt;a href="http://www.php.net/manual/en/intro.pcre.php"&gt;compiled regular expressions&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;&lt;i&gt;Search Tree&lt;/i&gt;&lt;/b&gt;:&lt;br /&gt;Create look-up directory:&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;if(2 !== $argc){&lt;br /&gt;&amp;nbsp; fwrite(STDERR, "Usage: {$argv[0]} &amp;lt;source directory&amp;gt;\n");&lt;br /&gt;&amp;nbsp; exit(1);&lt;br /&gt;}&amp;nbsp;&lt;br /&gt;function process_substr($str, $orig_str){&lt;br /&gt;&amp;nbsp; $base_dir = getcwd();&lt;br /&gt;&amp;nbsp; $chars = preg_split('//', $str, -1, PREG_SPLIT_NO_EMPTY);&lt;br /&gt;&amp;nbsp; $path = implode('/', $chars);&lt;br /&gt;&amp;nbsp; @mkdir($path, 0777, TRUE);&lt;br /&gt;&amp;nbsp; chdir($path);&lt;br /&gt;&amp;nbsp; if(!file_exists('results.dat'))&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $results = array();&lt;br /&gt;&amp;nbsp; else&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $results = unserialize(file_get_contents('results.dat'));&lt;br /&gt;&amp;nbsp; if(!isset($results[$orig_str]))&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $results[$orig_str] = TRUE;&lt;br /&gt;&amp;nbsp; file_put_contents("results.dat", serialize($results));&lt;br /&gt;&amp;nbsp; chdir($base_dir);&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;function substrings($str, $callback, $min_len=3){&lt;br /&gt;&amp;nbsp; $str_len = strlen($str);&lt;br /&gt;&amp;nbsp; for($len = $min_len; $len &amp;lt; $str_len-$min_len; $len += 1){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for($start = 0; $start &amp;lt;= $str_len-$len; $start += 1){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; $callback(substr($str, $start, $len), $str);&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;}&lt;br /&gt;$source_dir = $argv[1];&lt;br /&gt;$dirp = opendir($source_dir);&lt;br /&gt;if($dirp){&lt;br /&gt;&amp;nbsp; while(FALSE !== ($entry = readdir($dirp))){&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; if(strlen($entry) &amp;gt;= 3)&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; substrings($entry, 'process_substr');&lt;br /&gt;&amp;nbsp; }&lt;br /&gt;&amp;nbsp; closedir($dirp);&lt;br /&gt;}else{&lt;br /&gt;&amp;nbsp; fwrite(STDERR, "Cannot open supplied directory\n");&lt;br /&gt;&amp;nbsp; exit(2);&lt;br /&gt;}&lt;/pre&gt;&lt;br /&gt;Look up a string:&lt;br /&gt;&lt;pre class="brush: php;"&gt;$query = 'foo';&lt;br /&gt;$chars = preg_split('//', $query, -1, PREG_SPLIT_NO_EMPTY);&lt;br /&gt;$path = implode('/', $chars);&lt;br /&gt;$counter = count(unserialize(file_get_contents("$path/results.dat")));&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;i&gt;Notes&lt;/i&gt;: In this example, the Search Tree is stored straight to the file system.&lt;br /&gt;There is nothing wrong with this but you have to be very careful since there are some gotchas.&lt;br /&gt;Prefer a DBMS in real implementations.&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Pros&lt;/i&gt;: Very fast.&lt;br /&gt;Scales well.&lt;br /&gt;&lt;i&gt;&amp;nbsp;&lt;/i&gt;&lt;br /&gt;&lt;i&gt;Cons&lt;/i&gt;: Takes a lot of space.&lt;br /&gt;Requires mostly static data.&lt;br /&gt;Takes some effort to set up.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Some Timings:&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Tests are run using PHP 6.0.0 built: Jan 11 2010 19:09:25, Linux and ReiserFS v3 partition&lt;br /&gt;&lt;br /&gt;&lt;b&gt;- For a directory containing&amp;nbsp; 125833 entries&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$max_hits = 1000000;&lt;br /&gt;$query = 'foo';&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Opendir: 23 matches in 0.22206497192383 us&lt;br /&gt;Glob: 23 matches in 0.091760158538818 us&lt;br /&gt;GlobIterator: 23 matches in 0.091748952865601 us&lt;br /&gt;DirectoryReader: 23 matches in 0.58006715774536 us&lt;br /&gt;DirectoryIterator: 23 matches in 0.27896213531494 us&lt;br /&gt;Flat File: 23 matches in 0.12962293624878 us&lt;br /&gt;Cached Flat File: 23 matches in 0.13482689857483 us&lt;br /&gt;Flat File and PREG: 23 matches in 0.30407810211182 us&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$max_hits = 11;&lt;br /&gt;$query = 'foo';&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Opendir: 11 matches in 0.076170921325684 us&lt;br /&gt;Glob: 23 matches in 0.092523097991943 us&lt;br /&gt;GlobIterator: 11 matches in 0.09209680557251 us&lt;br /&gt;DirectoryReader: 11 matches in 0.19307088851929 us&lt;br /&gt;DirectoryIterator: 11 matches in 0.092113971710205 us&lt;br /&gt;Flat File: 11 matches in 0.025264978408813 us&lt;br /&gt;Cached Flat File: 11 matches in 0.079818964004517 us&lt;br /&gt;Flat File and PREG: 11 matches in 0.060965061187744 us&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$max_hits = 11;&lt;br /&gt;$query = 'foo..............................';&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Opendir: 0 matches in 0.17899489402771 us&lt;br /&gt;Glob: 0 matches in 0.10579991340637 us&lt;br /&gt;GlobIterator: 0 matches in 0.10554003715515 us&lt;br /&gt;DirectoryReader: 0 matches in 0.58912897109985 us&lt;br /&gt;DirectoryIterator: 0 matches in 0.25362181663513 us&lt;br /&gt;Flat File: 0 matches in 0.12676692008972 us&lt;br /&gt;Cached Flat File: 0 matches in 0.1414749622345 us&lt;br /&gt;Flat File and PREG: 0 matches in 0.31661105155945 us&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$max_hits = 501;&lt;br /&gt;$query = 'san';&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Opendir: 501 matches in 0.18119215965271 us&lt;br /&gt;Glob: 646 matches in 0.095612049102783 us&lt;br /&gt;GlobIterator: 501 matches in 0.09652304649353 us&lt;br /&gt;DirectoryReader: 501 matches in 0.40601682662964 us&lt;br /&gt;DirectoryIterator: 501 matches in 0.21287703514099 us&lt;br /&gt;Flat File: 501 matches in 0.083136081695557 us&lt;br /&gt;Cached Flat File: 501 matches in 0.1099488735199 us&lt;br /&gt;Flat File and PREG: 501 matches in 0.21364402770996 us&lt;br /&gt;&lt;br /&gt;&lt;b&gt;- For a directory containing 4869 entries&lt;/b&gt;&lt;br /&gt;&lt;br /&gt;&lt;pre class="brush: php;"&gt;$max_hits = 11;&lt;br /&gt;$query = 'foo';&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;Opendir: 12 matches in 0.010089159011841 us&lt;br /&gt;Glob: 14 matches in 0.0035510063171387 us&lt;br /&gt;GlobIterator: 12 matches in 0.0045490264892578 us&lt;br /&gt;DirectoryReader: 12 matches in 0.027838945388794 us&lt;br /&gt;DirectoryIterator: 12 matches in 0.0090799331665039 us&lt;br /&gt;Flat File: 12 matches in 0.0038931369781494 us&lt;br /&gt;Cached Flat File: 12 matches in 0.0040509700775146 us&lt;br /&gt;Flat File and PREG: 12 matches in 0.029109001159668 us&lt;br /&gt;&lt;i&gt;Search Tree&lt;/i&gt;: 14 matches in 0.00045108795166016 us&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Notes:&lt;/b&gt;&lt;br /&gt;Some of the approaches cannot be tweaked (e.g. glob).&lt;br /&gt;Also, using a Search Tree is only possible when dealing with medium to small size data as the look-up tree will take a lot of space.&lt;br /&gt;The Search Tree preformance is only influenced by 1)The size of your alphabet 2)The Depth of the tree, hence the number of hits and query length are irrelevant.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: large;"&gt;&lt;b&gt;Conclusion:&lt;/b&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;If you have no idea what to pick, and if PHP 5 &amp;gt;= 5.3.0, use &lt;i&gt;SPL GlobIterator(3).&lt;/i&gt;&lt;br /&gt;The long answer is that (as it is so often the case) it depends on your requirements: number of hits, length of the input string. In order of preference,&lt;br /&gt;&lt;ul&gt;&lt;li&gt;If your data set is small, mostly static &lt;i&gt;AND&lt;/i&gt; efficiency is a issue, use a &lt;i&gt;Search Tree(9)&lt;/i&gt;&lt;/li&gt;&lt;li&gt;If your data is mostly static but too big to build a look-up tree, and the required number of hits is low, a &lt;i&gt;Flat Index File(6)&lt;/i&gt; is likely to improve lookup time.&lt;/li&gt;&lt;li&gt;In every other cases, and if available, use &lt;i&gt;SPL GlobIterator(3)&lt;/i&gt;&lt;/li&gt;&lt;li&gt;&lt;i&gt;SPL DirectoryIterator(4) &lt;/i&gt;is fine but avoid extending it&lt;/li&gt;&lt;li&gt;&lt;i&gt;Standard Scan(1)&lt;/i&gt; is fine but use it as a last resort&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;b&gt;References:&lt;/b&gt;&lt;br /&gt;&lt;a href="http://php.net/downloads.php"&gt;PHP Source Code&lt;/a&gt; &lt;br /&gt;&lt;a href="http://www.php.net/manual/en/function.scandir.php"&gt;Scandir Function&lt;/a&gt;&lt;br /&gt;&lt;a href="http://php.net/manual/en/book.spl.php"&gt;Standard PHP Library (SPL)&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.php.net/%7Ehelly/php/ext/spl/"&gt;SPL-Standard PHPLibrary&lt;/a&gt;&lt;br /&gt;&lt;a href="http://www.phpro.org/tutorials/Introduction-to-SPL.html"&gt;Introduction to Standard PHP Library (SPL)&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;i&gt;Please don't hesitate to flame on any nonsense that I may have written&lt;/i&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/2155973909202515480-5960009476386449817?l=jdhsu.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://jdhsu.blogspot.com/feeds/5960009476386449817/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://jdhsu.blogspot.com/2010/01/9-ways-to-iterate-over-directory-in-php.html#comment-form' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/5960009476386449817'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/2155973909202515480/posts/default/5960009476386449817'/><link rel='alternate' type='text/html' href='http://jdhsu.blogspot.com/2010/01/9-ways-to-iterate-over-directory-in-php.html' title='9 Ways to Iterate Over a Directory in PHP'/><author><name>jd hsu</name><uri>http://www.blogger.com/profile/07390042970137791452</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='24' height='32' src='http://4.bp.blogspot.com/_gvQh1p7tre8/S0wk8rSPWBI/AAAAAAAAAAM/Q4KbTqG8Tf0/S220/DSC00577.JPG'/></author><thr:total>3</thr:total></entry></feed>
