Putting glob() to the test
In a new NetTuts+ post, Marcus Schumann offers a quick tip: Loop Through Folders with PHP’s Glob().
Are you still using opendir() to loop through folders in PHP? Doesn’t that require a lot of repetitive code everytime you want to search a folder? Luckily, PHP’s glob() is a much smarter solution.
The glob() function is convenient but the solution using the fewest lines of code isn’t always the most efficient — if by efficient you mean fastest.
This came up in a question on Stack Overflow in January. A user asked how best to get a list of files in a directory (excluding “.” and “..” and other subdirectories) and return it as an array. Several readers offered suggestions, and out of curiosity I benchmarked all their alternatives. I ran each method 1,000 times on a directory containing about 400 files. My benchmark results ranged from 12.4 seconds down to 1.2 seconds. That’s a pretty wide spread, so it’s worth paying attention to performance as well as coding convenience. Here are the results, roughly in order from slowest to fastest:
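Before the individual results, here is a minimal sketch of the kind of timing loop such a comparison involves. It is not the script used for the numbers in this post. The benchmark() helper, the scratch directory, and the file names are all assumptions made for illustration, and it uses 100 iterations rather than 1,000 to keep it quick.

```php
<?php
// Hypothetical helper: call $method $iterations times and return elapsed seconds.
function benchmark(callable $method, int $iterations = 1000): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $method();
    }
    return microtime(true) - $start;
}

// Scratch directory with 400 empty files (an assumption standing in for
// the real test directory).
$dir = sys_get_temp_dir() . '/glob_bench_' . uniqid();
mkdir($dir);
for ($i = 0; $i < 400; $i++) {
    touch("$dir/file$i.txt");
}

// A few of the candidate methods, wrapped as closures so they can be timed.
$methods = [
    'glob'        => function () use ($dir) { return glob("$dir/*"); },
    'glob nosort' => function () use ($dir) { return glob("$dir/*", GLOB_NOSORT); },
    'scandir'     => function () use ($dir) { return array_diff(scandir($dir), ['.', '..']); },
];

$timings = [];
foreach ($methods as $name => $method) {
    $timings[$name] = benchmark($method, 100);
    printf("%-12s %.3f sec\n", $name, $timings[$name]);
}

// Clean up the scratch directory.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```

Absolute timings from a loop like this depend heavily on the CPU, filesystem, and operating system, so only the relative ordering on one machine is meaningful.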
The first method was to use glob() to return an array, and then loop over the result to exclude directories. This was the slowest, running in 12.4 seconds.
$files = array();
foreach(glob('*') as $file_or_dir) {
    if( !is_dir($file_or_dir) ) // skip subdirectories (is_dir() would also match . and ..)
    {
        $files[] = $file_or_dir;
    }
}
Next was simply using glob() without filtering directories. This ran in 8.1 seconds.
$files = glob('*');
Using glob() with the optional GLOB_NOSORT argument shows how much impact sorting has on the results. If you don’t need sorted results, it’s worthwhile to say so, because this solution ran in 6.4 seconds — nearly double the performance of the slowest method.
$files = array();
foreach(glob('*', GLOB_NOSORT) as $file_or_dir) {
    if( !is_dir($file_or_dir) ) // skip subdirectories (is_dir() would also match . and ..)
    {
        $files[] = $file_or_dir;
    }
}
The scandir() function is another alternative. This ran in 6.5 seconds.
$files = scandir('.');
$result = array();
foreach ($files as $file)
{
    if (($file == '.') || ($file == '..'))
    {
        continue;
    }
    $result[] = $file;
}
Next, using scandir() with array_diff() to filter out the dot-directories performed slightly better at 6.4 seconds, and it is almost as concise as using glob().
$files = array_diff(scandir('.'), array('.', '..'));
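One detail of the array_diff() one-liner is worth knowing: array_diff() preserves the keys of its first argument, so the result does not start at index 0. The sketch below uses a hard-coded array with made-up file names in place of a real scandir() result, and shows array_values() as one way to reindex.

```php
<?php
// Stand-in for a scandir('.') result; the file names are made up.
$entries = ['.', '..', 'a.txt', 'b.txt'];

// Keys are preserved, so the result is [2 => 'a.txt', 3 => 'b.txt'].
$files = array_diff($entries, ['.', '..']);

// array_values() renumbers from zero: [0 => 'a.txt', 1 => 'b.txt'].
$reindexed = array_values($files);

var_dump(isset($files[0])); // bool(false)
var_dump($reindexed[0]);    // string(5) "a.txt"
```

If the list is only ever used in a foreach loop, the gap in the keys is harmless and the extra array_values() call can be skipped.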
The opendir() method for which Marcus wanted to find an alternative isn’t so shabby. This ran in 5.3 seconds.
$files = array();
$dir = opendir('.');
while(($currentfile = readdir($dir)) !== false)
{
    if( !is_dir($currentfile) )
    {
        $files[] = $currentfile;
    }
}
closedir($dir);
But using glob() in a bare form with GLOB_NOSORT shows that it may have been pretty costly to loop over the results. This ran in 2.2 seconds.
$files = glob('*', GLOB_NOSORT);
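GLOB_NOSORT changes only the order of the result, not its contents. As a small sketch of that, using a scratch directory and made-up file names, the following compares the two calls and shows that the unsorted list can still be sorted later if the order turns out to matter.

```php
<?php
// Scratch directory with three made-up file names, created out of order.
$dir = sys_get_temp_dir() . '/nosort_demo_' . uniqid();
mkdir($dir);
foreach (['c.txt', 'a.txt', 'b.txt'] as $name) {
    touch("$dir/$name");
}

$sorted   = array_map('basename', glob("$dir/*"));
$unsorted = array_map('basename', glob("$dir/*", GLOB_NOSORT));

print_r($sorted);   // alphabetical: a.txt, b.txt, c.txt
print_r($unsorted); // same names, in whatever order the directory returns them

// Sorting afterwards gives the same list as the default call.
sort($unsorted);
$sameNames = ($sorted === $unsorted);

// Clean up the scratch directory.
array_map('unlink', glob("$dir/*"));
rmdir($dir);
```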
Or perhaps is_dir() was the source of the performance problem, because if we use opendir() and filter results by comparing to literal dot-directory names, we get the time down to 1.2 seconds.
$files = array();
$dir = opendir('.');
while(($currentFile = readdir($dir)) !== false)
{
    if ( $currentFile == '.' or $currentFile == '..' )
    {
        continue;
    }
    $files[] = $currentFile;
}
closedir($dir);
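Note that the two opendir() variants are not strictly equivalent: comparing against the literal dot names keeps subdirectories in the list, while is_dir() drops them. The sketch below makes the difference visible. The listEntries() helper, its $filesOnly flag, and the scratch directory are hypothetical names introduced for illustration. It also prefixes the directory path before calling is_dir(), which is needed whenever the directory being read is not the working directory.

```php
<?php
// Hypothetical helper: list a directory with readdir(), skipping "." and ".."
// and, optionally, subdirectories.
function listEntries(string $dir, bool $filesOnly = false): array
{
    $entries = [];
    $handle = opendir($dir);
    if ($handle === false) {
        return $entries;
    }
    while (($name = readdir($handle)) !== false) {
        if ($name === '.' || $name === '..') {
            continue; // plain string comparison, no filesystem call
        }
        // is_dir() needs the full path unless $dir is the working directory.
        if ($filesOnly && is_dir($dir . '/' . $name)) {
            continue;
        }
        $entries[] = $name;
    }
    closedir($handle);
    sort($entries); // readdir() order depends on the filesystem; sorted for display
    return $entries;
}

// Scratch directory with one file and one subdirectory (made-up names).
$dir = sys_get_temp_dir() . '/readdir_demo_' . uniqid();
mkdir($dir);
mkdir("$dir/subdir");
touch("$dir/a.txt");

$all = listEntries($dir);
$filesOnly = listEntries($dir, true);
print_r($all);       // a.txt and subdir
print_r($filesOnly); // a.txt only

// Clean up the scratch directory.
rmdir("$dir/subdir");
unlink("$dir/a.txt");
rmdir($dir);
```

If the directory is known to contain no subdirectories, the cheaper name comparison gives the same list.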
Of course it’s desirable to write concise code, but don’t assume this always equates to fast code. Rapid development and rapid code are independent goals, and you need to decide which has greater priority on a case-by-case basis.
And remember to use GLOB_NOSORT unless you actually need the list of files sorted.
Photo courtesy of Rick Audet. http://www.flickr.com/photos/spine/2425394931/ Released under Creative Commons Attribution licenses.
Responses and Pingbacks
April 28th, 2010 at 5:33 pm
This is good to know! I wonder how glob() compares to SPL’s DirectoryIterator.
April 28th, 2010 at 5:56 pm
I couldn’t resist benchmarking DirectoryIterator. 🙂
$files = array();
foreach (new DirectoryIterator('.') as $item) {
    $currentFile = (string) $item;
    if ($currentFile == '.' or $currentFile == '..') continue;
    $files[] = $currentFile;
}
I iterated 1,000 times over a directory with 1,000 files. Here are my results:
Glob: 6.448 sec.
Opendir: 3.048 sec.
DirectoryIterator: 1.793 sec.
April 28th, 2010 at 7:19 pm
Thanks Bill.
I’ve always wondered about glob()’s performance.
Questions:
Did you try any of the SPL classes?
I assume you were on Linux?
Cheers from sunny Australia.
April 29th, 2010 at 10:40 am
[…] glob function, the subject of a recent post on NETTUTS.com, is the topic of this new post from Bill Karwin on the php|architect website. He focuses on the efficiency of the function over […]
April 29th, 2010 at 11:03 am
I was wondering about DirectoryIterator. Thanks Hector. I had a question about it once, you can find it in the URL I attached to this mail. It’s about how to represent an entire folder-tree as an array, recursively.
http://stackoverflow.com/questions/952263/
April 29th, 2010 at 1:20 pm
Thanks for the additional data point Hector. The DirectoryIterator wasn’t one that was suggested on the original StackOverflow thread in January, but it’s good to see how it compares.
SPL is undervalued, I think because of its neglected documentation.
I ran my tests on a Macbook Pro (Core 2 Duo 2.4GHz) running OS X Leopard 10.5.8.
April 29th, 2010 at 3:14 pm
Awesome comparison Bill! I was going to start using glob() because of the short code, but now I am going to bury myself in the SPL documentation (thx Hector).
Good work Bill on helping build a faster web.
May 18th, 2010 at 9:17 pm
Hi,
I’m interested in how you ran the tests. Did you use PHP as a script on the command line or as a page on a web site? If it was over the web, how (if at all) did you account for network latency, web server overhead, etc.?
When I ran your glob code as a script on 400 files (82K each, generated with dd and urandom), even with a large number of iterations it didn’t come close to the large numbers you’re seeing. In fact, it showed the same numbers as your final 1.2sec opendir() example: they both had sub-second responses.
Cheers,
—
Sam
May 24th, 2010 at 12:45 pm
Hi Sam, thanks for your comment. I ran these tests as a command-line script, not as a web page.
You can see the full source of my test script at StackOverflow: http://stackoverflow.com/questions/2120287/directory-to-array-with-php/2120496#2120496
The test results may vary on your platform, because you have a different CPU, a different filesystem, a different operating system, etc. I ran my tests on a Macbook Pro, running OS X Leopard 10.5.8, CPU is a 2.4GHz Core 2 Duo.
September 19th, 2012 at 9:48 pm
Even faster (and solves .htaccess being listed too):
if($currentFile[0]!=='.')
Btw.: thanks for sharing! Best…
June 6th, 2013 at 12:03 pm
One of the best things about glob is the file search.
glob('./*.jpg');
glob('./filename*');
So are there any benchmarks for this, where we want specific file names?
April 29th, 2020 at 10:33 pm
My scandir is too slow with a million files.