RESOURCES
  • David A. Wheeler → How to handle filenames correctly
  • Stephane Chazelas → Handle filenames safely
  • Stephane Chazelas → Never parse ls’s output
Glob      Meaning                                   e.g.
*         Match any string, of any length           foo
foo*      Match any string beginning with foo       footer
bar?      Match bar followed by exactly one char    bart
[kz]sh    Match sh preceded by k or z               ksh
  • * → Matches zero or more chars
  • ? → Matches exactly one char
  • [...] → Matches one char in a specified set
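
The three pattern characters above can be checked quickly in a scratch directory. This is a minimal sketch: the file names are made up for illustration and mktemp -d is assumed to be available.

```bash
# Scratch directory with hypothetical file names, so no real file is touched
_dir=$(mktemp -d) || exit 1
cd -- "$_dir" || exit 1
touch -- foo footer bar bart ksh zsh bash

_star=( foo* )     # * matches zero or more chars
_mark=( bar? )     # ? matches exactly one char, so bar itself is left out
_set=( [kz]sh )    # [kz] matches one char from the set, so bash is left out

printf '%s\n' "foo*   -> ${_star[*]}" "bar?   -> ${_mark[*]}" "[kz]sh -> ${_set[*]}"

cd -- "$OLDPWD" && rm -rf -- "$_dir"   # cleanup
```

Storing each expansion in an array keeps every match as a separate element, which is also how a for loop would see them.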

Globbing, globs and filename expansion all name the same thing. It happens after word splitting, which means that every pathname produced by globbing is a single word for Bash and undergoes no further word (field) splitting.
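
A small sketch of that ordering, using a hypothetical file name with a space in it. The ls-based loop is included only as a counter-example of what word splitting does to the same name; mktemp -d is assumed to be available.

```bash
_dir=$(mktemp -d) || exit 1
touch -- "$_dir/two words.txt"        # hypothetical file name with a space

# Globbing: the expansion result is never split again
_glob=0
for _file in "$_dir"/*
do
        _glob=$(( _glob + 1 ))
done

# Counter-example: an unquoted command substitution is split on the space
_split=0
for _file in $( ls -- "$_dir" )
do
        _split=$(( _split + 1 ))
done

printf 'glob words: %d, split words: %d\n' "$_glob" "$_split"
rm -rf -- "$_dir"
```

With the default IFS, the glob loop runs once while the command substitution loop runs twice for the same single file.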

for _file in ./*
do
        rm -f -- "$_file"
done

The code above causes the $_file parameter to expand to ./filename instead of filename.

Globbing is the last type of expansion performed during Bash's parsing process (aka the Bash parser)

Be aware that, in the previous example, if the directory has no files, the ./* glob pattern is left unexpanded and the loop runs once with the literal pattern ./*, so rm receives a non-existent file as a non-option argument.
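
The behaviour can be reproduced in an empty scratch directory. A minimal sketch, assuming Bash defaults (nullglob and failglob disabled) and that mktemp -d is available:

```bash
_dir=$(mktemp -d) || exit 1
cd -- "$_dir" || exit 1                 # empty scratch directory
shopt -u nullglob failglob              # Bash defaults, stated explicitly

_seen=
for _file in ./*
do
        _seen=$_file                    # runs once, with the pattern itself
done
printf 'Loop variable -> %s\n' "$_seen"

cd -- "$OLDPWD" && rm -rf -- "$_dir"
```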

To avoid this issue, check that the file exists before executing any command which takes that file as an argument →

POSIX Compliance
for _file in ./*
do
        [ -e "$_file" ] || [ -L "$_file" ] || continue # [ ] Instead of [[ ]]
 
        printf "File -> %s\n" "$_file"
done
Non-Standard Shell Extension → Nullglob
foo()
{
        local -- _file= _ns= # Empty Parameters to avoid previous values
 
        shopt -q nullglob ; _ns=$? # Shell Extension Checking
        (( _ns )) && shopt -s nullglob # Enable if disabled
 
        for _file in ./* # Use ./* instead of *
        do
                printf "File -> %s\n" "$_file"
        done
 
        (( _ns )) && shopt -u nullglob # Disable again if it was disabled initially
}

The non-standard nullglob shell option is more efficient than the POSIX approach, because it requires no file existence check on each iteration (i.e. [ -e "$_file" ])

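Note that globbing is safest in for loops. If it is used as a non-option argument of an external command and the expansion results in a filename list that is too long, the argument list may exceed the system limit (ARG_MAX) and the command fails to run with an "Argument list too long" error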
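
One common workaround, sketched below, is to let a builtin emit the expanded list and let xargs batch it. This is not taken from the original notes: it assumes an xargs that supports -0, and the sample files are hypothetical.

```bash
_dir=$(mktemp -d) || exit 1
cd -- "$_dir" || exit 1
printf 'one\n' > ./a                    # hypothetical sample files
printf 'two\n' > './b c'

# printf is a shell builtin, so the expanded list is not passed through execve;
# xargs then runs cat as many times as needed to stay under the limit
_out=$( printf '%s\0' ./* | xargs -0 cat -- )
printf '%s\n' "$_out"

cd -- "$OLDPWD" && rm -rf -- "$_dir"
```

On a given system, getconf ARG_MAX prints the limit in bytes.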

  • ./ → Prevents a file whose name starts with - from being processed as a command option
  • -- → Denotes the end of option arguments. It should be used as an additional measure, not the only one
  • ./.[!.]* ./..?* or dotglob → Adds hidden files to the glob expansion list
  • [ -e file ] or nullglob → Avoids processing non-existent files
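
The effect of the ./ prefix can be seen with a file whose name looks like an option. A minimal sketch with hypothetical file names, assuming a cat that accepts -n (GNU, BSD and BusyBox do):

```bash
_dir=$(mktemp -d) || exit 1
cd -- "$_dir" || exit 1
printf 'first\n'  > ./-n                # hypothetical file whose name looks like an option
printf 'second\n' > ./zdata

_bare=$( cat * 2>/dev/null )            # -n is parsed as an option, not as a file
_safe=$( cat -- ./* )                   # ./-n can only be a pathname

printf 'bare glob:\n%s\n./ glob:\n%s\n' "$_bare" "$_safe"

cd -- "$OLDPWD" && rm -rf -- "$_dir"
```

Only the ./ version prints the contents of both files; the bare glob silently loses the content of the file named -n.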

But, with the above measures applied, the following case may arise →

# Incorrect, It may cause command hang-up
$ ( shopt -s nullglob ; command -- ./* ./.[!.]* ./..?* )

The command above expects at least one file matching the previous patterns. If there is none, nullglob removes every pattern, the command receives no file argument and it hangs trying to read from the standard input (fd 0)

When globbing expands to no pathname and the nullglob shell option is enabled, add /dev/null as the final non-option argument to prevent the case above

$ ( shopt -s nullglob ; command -- ./* ./.[!.]* ./..?* /dev/null )
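
To observe the difference without a hanging terminal, stdin can be fed from a pipe. A minimal sketch in an empty scratch directory, with cat standing in for the generic command and mktemp -d assumed to be available:

```bash
_dir=$(mktemp -d) || exit 1
cd -- "$_dir" || exit 1                 # empty scratch directory: no pattern matches

# Without /dev/null, cat is left with no file operand and consumes stdin
_without=$( printf 'from stdin\n' |
        ( shopt -s nullglob ; cat -- ./* ./.[!.]* ./..?* ) )

# With /dev/null, cat always has a file operand and never touches stdin
_with=$( printf 'from stdin\n' |
        ( shopt -s nullglob ; cat -- ./* ./.[!.]* ./..?* /dev/null ) )

printf 'without: [%s]  with: [%s]\n' "$_without" "$_with"

cd -- "$OLDPWD" && rm -rf -- "$_dir"
```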

Handling Pathnames and Filenames Correctly

To correctly handle files whose names contain control chars ("\n", "\t"...) or any other processing-sensitive char, there are different approaches that lead to the same place →

  • Globbing → Does not undergo word splitting because it happens after it. So there’s no need to worry about pathnames which contain control chars or other special chars

Note that globbing is useful to get the list of non-hidden files in a specific directory, but it has some limitations to overcome, like empty matches (nullglob), symlinks, non-recursive search (globstar) and hidden files (dotglob)

To deal with the aspects above, it is more practical to use the find binary

  • Find → It has several features to perform advanced and recursive searches on a specific path given certain criteria
Globbing
POSIX Compliant - Non-Hidden Files
for _file in ./* # ./* instead of *
do
        # POSIX → [ ] instead of [[ ]]
        [ -e "$_file" ] || [ -L "$_file" ] || continue
        
        cat -- "$_file" # -- → referring to end of cmd options
done
POSIX Compliant - Including Hidden Files
for _file in ./* ./.[!.]* ./..?* # Non-POSIX Compliant → {*,.[!.]*,..?*} 
do
        [ -e "$_file" ] || [ -L "$_file" ] || continue
 
        cat -- "$_file"
done
Non Standard Bash Extension For Loop
foo()
{
        local -- _file= _ns= # Empty Parameters to avoid previous values
 
        shopt -q nullglob ; _ns=$? # Shell Extension Checking
        (( _ns )) && shopt -s nullglob # Enable if disabled
 
        for _file in ./* ./.[!.]* ./..?* # Use ./ before globs
        do
                printf "File -> %s\n" "$_file"
        done
 
        (( _ns )) && shopt -u nullglob # Disable again if it was disabled initially
}
Non Standard Bash Extension Oneliner (Command Interface)
# nullglob to avoid errors on empty matches
# -- as additional protective measure (End of command option arguments)
# /dev/null as non-option arg to avoid command hang-up on stdin
 
$ ( shopt -s nullglob ; cat -- ./* ./.[!.]* ./..?* /dev/null )
Non Standard Bash Extension Recursive Search (globstar)
foo()
{
        local -- _file= _gs=
        shopt -q globstar ; _gs=$?
 
        (( _gs )) && shopt -s globstar
 
        for _file in ./** # ./** matches recursively with globstar enabled
        do
                printf "File -> %s\n" "$_file"
        done
 
        (( _gs )) && shopt -u globstar
}

By default, this recursive shell search behaves this way →

  • Omits hidden files
  • Prunes dot dirs (does not descend into them)
  • Since Bash v4.3, does not follow symlinks. This prevents infinite loops and duplicated entries

Because of the last point, be careful when using globstar on systems with Bash older than v4.3

Therefore, it is feasible to perform a recursive search with globstar that includes hidden files and avoids errors on empty matches, using the dotglob and nullglob Bash extensions, respectively →

foo()
{
        # globstar, {dot,null}glob Status Checking
 
        for _file in ./** # With above Bash extensions enabled
        do
                # [ -e "$_file" ] || [ -L "$_file" ] if nullglob not enabled
                # Less efficient but POSIX Compliant
                command -- "$_file"
        done
 
        # Return Bash extensions to their initial state
}
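
The skeleton above leaves the option handling as comments. One possible way to fill it in is sketched here; the function name walk_all and the array _off are hypothetical, and printf stands in for the per-file command.

```bash
walk_all()   # hypothetical name
{
        local -- _file= _opt=
        local -a _off=()

        # Remember which of the three options are currently disabled
        for _opt in globstar dotglob nullglob
        do
                shopt -q "$_opt" || _off+=( "$_opt" )
        done
        shopt -s globstar dotglob nullglob

        for _file in ./**
        do
                printf 'File -> %s\n' "$_file"
        done

        # Return the options to their initial state
        (( ${#_off[@]} == 0 )) || shopt -u "${_off[@]}"
}
```

Only the options that were off on entry are switched off again, so a caller that already had any of them enabled keeps its settings.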

However, find offers a more reliable way to do this, as it can apply several advanced filters that globbing cannot

Remember that the globstar recursive search is non-standard and offers poor control over the recursion

Not to mention that, as the number of files handled increases, find becomes the more sensible choice in terms of performance

# foo runs in a subshell so that the parent shell's options are not modified
# Given the function foo →
foo ()
(
    shopt -s globstar dotglob nullglob
    for _file in ./**
    do
        printf "File -> %s\n" "$_file"
    done
)
$ export -f -- foo # Bash child processes inherit the foo function through their env
# Benchmark between foo and 'find .' command over 50k Files
$ hyperfine --shell bash foo --warmup 3 --min-runs 500 -i 'find .'

In the benchmark above, find performs better than the globstar - nullglob - dotglob recursive search on more than 50k files

Note that the performance gap between both widens as the number of files increases, so find seems the more sensible choice for better performance and robustness

find is available on nearly every UNIX system, while globstar is not POSIX compliant, requires Bash v4.0 or later and can run into infinite loops through symlinks before v4.3

FIND

It’s an external binary, not a shell builtin

$ command -V find
find is /usr/bin/find

Take into account that executing any binary leads to the creation of a child process of the shell through system calls like fork or similar (vfork, clone)

This child process’s env is a copy of its parent’s env. Then, the binary is executed inside that child process through the execve syscall

That does not happen with globstar because no binary is required: only shell functionality (builtins, keywords, expansions…) is used

But since find is executed only once or a few times in nearly any context, this does not imply a perceptible performance loss

Default behaviour →

  • Does not omit hidden files. To omit them →
$ find . -name '.' -o -path '*/.*' -prune -o ... # Omit hidden Files and . dir
  • Applies a recursive search. To limit it to current directory (Non-recursive) →
$ find . -maxdepth 1 -name '.' -o ... # Non-recursive and Omit . dir
  • Is passed a directory, and every match in the command’s output begins with that directory. Therefore, errors are unlikely to arise when a filename starts with a -, unlike with globbing
  • The find -exec option allows running commands directly on any matched file. However, if the pathnames are needed back in the shell, several alternatives arise →

Before showing them, take the following into account →

  • Filenames can contain any char except Zero Byte (aka Null Byte \0) and slash /

Hence, since filenames can contain special chars like newlines \n, reading filenames line-by-line with read will fail

Some alternatives, such as the following, also fail because of the problem above →

Incorrect Ways of Handling Filenames
for _file in $( find . -name '.' -o -print )
do
        printf "File -> %s\n" "$_file"
done
$ cat $( find . -name '.' -o -print ) > ./foo
(
        for _file in $( find . -name '.' -o -print )
        do
                printf "File -> %s\n" "$_file"
        done
)
find . -name '.' -o -print | while read _file
do
        printf "File -> %s\n" "$_file"
done
find . -name '.' -o -print | while IFS= read -r _file
do
        printf "File -> %s\n" "$_file"
done
$ find . -name '.' -o -print | xargs cat --

As mentioned in the sections above, a filename can contain any character except the Null Byte \0 and the slash /

Since find processes the matched files and prints them line-by-line (i.e. one file per line), it is not feasible to reprocess them through →

  • read’s default behavior, as read processes an input string up to a newline char \n
  • Command Substitution, as it removes trailing newlines and has to be left unquoted, so it undergoes Word Splitting and Globbing

The latter can be improved by setting IFS to \n and disabling globbing through set -f. However, the same problem remains: it cannot correctly handle filenames with newlines
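
A short sketch of that remaining failure, using two hypothetical files: one with a space in its name and one with a newline. mktemp -d is assumed to be available.

```bash
_dir=$(mktemp -d) || exit 1
touch -- "$_dir/a b" "$_dir/c
d"

# Subshell (command substitution), so IFS and set -f do not leak out
_words=$(
        IFS=$'\n'                       # split on newlines only
        set -f                          # no globbing on the results
        set -- $( find "$_dir" -type f -print )
        printf '%d' "$#"
)
printf 'Files: 2, words after splitting: %s\n' "$_words"
rm -rf -- "$_dir"
```

The name with a space survives as one word, but the name with a newline is still cut in two, so two files yield three words.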

Therefore, processing filenames line-by-line in those ways will fail

A good approach is to separate pathnames with a Null Byte \0 rather than with a newline \n since filenames cannot contain \0

The approach above has been POSIX compliant since Issue 8 (published as POSIX.1-2024), although it has been widely supported for a long time

See Issue 8
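
The difference between both delimiters can be counted directly. A minimal sketch with one hypothetical file whose name contains a newline, assuming a find that supports -print0 and a Bash read with -d:

```bash
_dir=$(mktemp -d) || exit 1
touch -- "$_dir/one
two"                                    # a single file, newline inside its name

_lines=0
while IFS= read -r _file
do
        _lines=$(( _lines + 1 ))
done < <( find "$_dir" -type f -print )

_nul=0
while IFS= read -rd '' _file
do
        _nul=$(( _nul + 1 ))
done < <( find "$_dir" -type f -print0 )

printf 'newline-delimited: %d, NUL-delimited: %d\n' "$_lines" "$_nul"
rm -rf -- "$_dir"
```

Process substitution is used here only so that both counters survive the loops; a pipeline would run each loop in a subshell.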

POSIX Compliant - Find -exec '{}' \;
$ find . -name '.' -o -exec cat -- '{}' \;
POSIX Compliant - Find -exec '{}' +
$ find . -name '.' -o -exec cat -- '{}' +
POSIX Compliant - Find -print0 + Xargs -0
$ find . -name '.' -o -print0 | xargs -0 -I{} cat -- {}

The commands above are recommended when handling each filename takes no more than a few commands

If a more complex filename processing task is required, such as the execution of several commands, the following approaches are the feasible ones →

POSIX Compliant - Find -print0 + IFS= Read -d '' (Pipelined)
find . -name '.' -o -print0 | while IFS= read -rd '' _file
do
        printf "File -> %s\n" "$_file"
done
Non-POSIX Compliant - Find -print0 + IFS= Read -d '' (Process Substitution)
while IFS= read -rd '' _file # read -d → Null Byte delimiter
do
        printf "File -> %s\n" "$_file"
 
done < <( find . -name '.' -o -print0 ) # Process Substitution
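
When the pathnames are needed as a whole list rather than one at a time, the same process substitution can feed an array. This is a sketch under the assumption of Bash 4.4 or later, where mapfile accepts -d; the file names are hypothetical.

```bash
_dir=$(mktemp -d) || exit 1
touch -- "$_dir/plain" "$_dir/with
newline"

# -d '' → NUL delimiter, -t → strip it from each stored element
mapfile -d '' -t _files < <( find "$_dir" -type f -print0 )

printf 'Collected %d pathnames\n' "${#_files[@]}"
for _file in "${_files[@]}"
do
        printf 'File -> %s\n' "$_file"
done
rm -rf -- "$_dir"
```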
POSIX Compliant - Find -print0 + IFS= Read -d '' (FIFOS)
mkfifo -- ./namedpipe # FIFO creation
 
find . -name '.' -o -print0 > ./namedpipe & # Find's output to FIFO in bg
 
while IFS= read -rd '' _file 0<&4 # reads from FD4 rather than stdin (FD0)
do
        printf "File -> %s\n" "$_file"
 
done 4< ./namedpipe # FIFO opened on read mode and assigned to FD4

Cleaner but not POSIX Compliant

foo()
{
        local -- _file= _FIFO=./namedpipe # Scope limited to the function
 
        trap 'rm --force "$_FIFO"' RETURN # FIFO Cleanup
 
        mkfifo -- "$_FIFO" # FIFO Creation
 
        find . -name '.' -o -print0 > "$_FIFO" & # Find writes to the FIFO
 
        while IFS= read -rd '' _file 0<&4 # Read reads from the FIFO (FD4)
        do
                printf "File -> %s\n" "$_file"
 
        done 4< "$_FIFO" # FD 4 opened for reading on the FIFO
 
        return 0
}
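
Since the FIFO above is created inside the directory being searched, find reports ./namedpipe among the results. A possible variation, sketched here, keeps the FIFO in a private temporary directory. The function name list_files is hypothetical, mktemp -d is not POSIX, and the sketch assumes the temporary directory is not located under the current directory.

```bash
list_files()   # hypothetical name
{
        local -- _file= _tmp= _fifo= _pid=

        _tmp=$(mktemp -d) || return 1    # private directory for the FIFO
        _fifo=$_tmp/namedpipe
        mkfifo -- "$_fifo" || { rm -rf -- "$_tmp" ; return 1 ; }

        find . -name '.' -o -print0 > "$_fifo" &   # find writes to the FIFO in bg
        _pid=$!

        while IFS= read -rd '' _file 0<&4
        do
                printf 'File -> %s\n' "$_file"
        done 4< "$_fifo"

        wait "$_pid"                     # reap find before removing its FIFO
        rm -rf -- "$_tmp"
}
```

Cleanup is done inline instead of through a RETURN trap, so no trap stays installed after the function ends.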