IFS stands for Internal (or Input) Field Separator: a set of chars that act as delimiters when Word Splitting is performed on an input string or on the result of an unquoted expansion
By default, IFS holds three chars: a space, a tab and a newline →
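The default behavior can be sketched as follows. This is a minimal illustration, not part of the original text: `show_fields` and `default_ifs_demo` are hypothetical helpers and the sample string is made up. It relies on the fact that an unset IFS splits exactly like the default value.

```shell
# Hypothetical helper: print each argument in brackets so field boundaries are visible
show_fields() {
  printf '[%s]' "$@"
  echo
}

default_ifs_demo() {
  (
    unset -v IFS                               # an unset IFS splits like the default value
    str=$(printf '  one \t two\n\nthree  ')    # sample input, made up for this sketch
    show_fields $str                           # unquoted on purpose
  )
}

default_ifs_demo    # prints [one][two][three]
```

Leading and trailing whitespace is dropped and every run of spaces, tabs and newlines counts as one delimiter, so only three fields remain.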
A similar case arises when IFS contains only non-whitespace chars. Again, nothing is trimmed at the beginning or end of the string and adjacent IFS chars are not consolidated →
When IFS mixes both kinds, a single non-whitespace char plus all the whitespace chars adjacent to it act together as one delimiter (they get consolidated)
INFO
Leading and trailing whitespace in the input string is still trimmed, as long as those whitespace chars are part of IFS’s value
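Both situations can be compared side by side. The sketch below is only illustrative: `show_fields` and `ifs_kinds_demo` are hypothetical names and the sample strings are made up.

```shell
# Hypothetical helper: print each argument in brackets so field boundaries are visible
show_fields() {
  printf '[%s]' "$@"
  echo
}

ifs_kinds_demo() {
  (
    set -f               # keep globbing out of the picture
    IFS=':'              # non-whitespace only: nothing is trimmed or merged
    foo=':a::b'
    show_fields $foo     # [][a][][b]
    IFS=' :'             # a ':' plus its adjacent blanks is one delimiter
    foo='  a : b  '
    show_fields $foo     # [a][b]
  )
}

ifs_kinds_demo
```

In the first call the leading `:` and the doubled `::` each produce an empty field. In the second one the blanks around `:` are absorbed into that single delimiter and the outer blanks are trimmed.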
If two or more non-whitespace IFS chars appear consecutively, each one is treated as a separate delimiter (no consolidation), so an empty Word/Field is created between them →
$ IFS=' :'
$ foo=" ::This is:: a test::"
$ printf "=%s= " $foo
== == =This= =is= == =a= =test= == # Two adjacent : create one Word
CAUTION
Be aware that IFS=' :' modifies IFS’s value globally, which is often not what is desired. Several ways to handle this situation correctly are described below
Unquoted Shell Expansion
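When the shell performs its expansions while processing a command, every unquoted parameter expansion, command substitution and arithmetic expansion undergoes Word Splitting, and the resulting words then undergo Globbing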
INFO
There are different types of shell expansion, and they all occur in a fixed order after the command has been split into tokens
The approach above uses an unquoted command substitution to process the file’s lines. Thus, that expansion undergoes Word Splitting and Globbing
Leading \n chars and leading/trailing/inner blanks and \t chars are treated as delimiters because of IFS’s default value
Because Word Splitting is applied to the command substitution’s output, the inner delimiters are consolidated and the leading/trailing ones are chopped off
The same applies to inner newlines: they are stripped by Word Splitting
As mentioned earlier, trailing newlines are also trimmed from the command substitution’s output
Note that one line (the foo one) has been generated through Globbing because of the * char, which expands to all the non-hidden files in the current directory
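The effects listed above can be reproduced with a small self-contained sketch. Everything in it is an assumption made for illustration: the function names, the temporary directory created with `mktemp -d` (not strictly POSIX, but widely available) and the contents of the sample file.

```shell
# Naive loop: unquoted command substitution, default IFS, globbing enabled
naive_for_lines() {
  for _line in $(cat "$1"); do
    printf 'Line -> %s\n' "$_line"
  done
}

naive_demo() {
  (
    unset -v IFS                               # default splitting
    dir=$(mktemp -d) || exit 1
    cd "$dir" || exit 1
    : > a.txt                                  # extra file for the glob to match
    printf '  first line\n\n*\n\n' > input     # blanks, empty lines and a glob char
    naive_for_lines input
    cd / && rm -rf "$dir"
  )
}

naive_demo
```

The two-word line is split into two iterations, the empty lines vanish, and the lone `*` is replaced by the names of the files in the directory (`a.txt` and `input`).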
The above situation can be improved as follows →
(
  IFS=$'\n'
  set -f
  for _line in $( < ./foo )
  do
    printf "Line -> %s\n" "$_line"
  done
)
IMPORTANT
A subshell is used to prevent global modifications such as the IFS and Globbing ones
IFS is set to \n. Therefore, Word Splitting only acts when a newline char is found
Filename expansion is disabled by set -f, which means that a globbing char (*, ?, [ ]) is not expanded to any file in the CWD
But the above approach is still incorrect →
Empty lines (leading/trailing/inner \n) are removed by Word Splitting since IFS’s value is \n
Remember that adjacent inner whitespace chars are consolidated into a single delimiter. Thus, they do not create any additional words
Command substitution still trims trailing newlines. In fact, that trimming is performed before Word Splitting, since command substitution occurs before the split
The command substitution cannot be quoted, as its output would then be processed only once instead of line-by-line (i.e. it’d be taken as a single word)
Likewise, an empty string cannot be assigned to IFS, since no Word Splitting would take place and, therefore, the same as above would occur (all the output as one single word)
On top of the above problems, everything is executed inside a subshell ( ). Hence, any parameter creation or modification is not reflected in the Parent Shell’s env
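Two of these limitations are easy to observe together. In the sketch below, `subshell_limits_demo` and the `count` variable are hypothetical, the input is generated with printf instead of a file, and IFS is written as a literal newline inside single quotes so that the snippet also runs in shells without ANSI-C Quoting.

```shell
subshell_limits_demo() {
  count=0
  (
    IFS='
'   # IFS is now a single literal newline
    set -f
    for _line in $(printf 'a\n\n\nb\n\n'); do
      count=$((count + 1))      # counts the iterations of the loop
    done
    echo "inside: $count"
  )
  echo "outside: $count"        # the subshell's change is lost
}

subshell_limits_demo
```

The input holds five lines, three of them empty, yet the loop only runs twice. After the subshell exits, `count` is back to its original value.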
This can be modified as follows →
foo(){
  local -- _line                          # Or declare | typeset
  local -- IFS=$'\n'
  local -- _oldSetOptions="$( set +o )"   # Store prior Opts
  set -f                                  # Disable globbing
  for _line in $( < ./foo )
  do
    printf "%s\n" "$_line"
  done
  eval "$_oldSetOptions"                  # Restore initial Opts
}
With the above code, it’s no longer necessary to use a subshell ( )
The scope of the IFS modification is limited to the function thanks to the local shell builtin
The initial Shell Options, like the globbing one, are stored in a parameter. Once everything is done, they’re restored to their prior values through eval
IMPORTANT
Be aware that the above code is not POSIX-compliant for the following reasons →
local, declare and typeset. They are supported by many shells but, strictly, they are not POSIX
ANSI-C Quoting to assign a value to IFS. Bash extension, not POSIX
$( < file ). Bash extension, not POSIX
POSIX-compliant code would look like this →
bar(){
  _line=
  unset -v _savedIFS                             # Stays unset if IFS is unset
  _oldSetOptions=$( set +o )                     # Store Opts
  [ -n "${IFS+set}" ] && _savedIFS=$IFS          # Store Initial IFS
  set -f ; eval "$( printf 'IFS="\n"' )"         # Instead of $'\n'
  for _line in $( cat ./foo )                    # instead of $(< file)
  do
    printf "%s\n" "$_line"
  done
  unset -v -- IFS                                # IFS Cleanup
  [ -n "${_savedIFS+set}" ] && { IFS=$_savedIFS ; unset _savedIFS ; }   # Restore IFS
  eval "$_oldSetOptions"                         # Restore Shell Options
}
Simple parameter assignments are used instead of local
The cat command is used in the command substitution rather than $(< file)
Note that, unlike in the above examples, no expansion is used as input
The file is passed directly as read’s input
By default, read processes its input line-by-line (up to each \n)
In this case, the input string does not undergo Word Splitting since IFS is empty. Therefore, no leading or trailing whitespace chars are trimmed
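The read-based approach described in the last lines can be sketched as follows. `read_lines` is a hypothetical name and the temporary file and its contents are made up for the illustration. This is one common way to write such a loop, not necessarily the exact code the text refers to.

```shell
# Print every line of the file given as $1, exactly as stored
read_lines() {
  while IFS= read -r _line; do      # empty IFS: nothing is trimmed or split
    printf '[%s]\n' "$_line"
  done < "$1"                       # the file is read's input, no expansion involved
}

tmp=$(mktemp) || exit 1
printf '  padded  \n\n*\n' > "$tmp"
read_lines "$tmp"
rm -f "$tmp"
```

The padded line keeps its blanks, the empty line is preserved, and the `*` is printed literally since the quoted "$_line" never undergoes Globbing.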
It’s true that the previous code ties up stdin inside the while loop, since read consumes it. To keep stdin free and still be able to read from it →
while IFS= read -r _line <&4   # read's stdin (FD0) is taken from FD4
do
  printf "%s\n" "$_line"
done 4< ./foo                  # FD4 assigned to ./foo file opened in read mode
What is happening above is that the ./foo file is opened in read mode and that open resource is assigned to file descriptor 4
That file descriptor only refers to the ./foo file within the while loop’s context. Thus, the while loop’s stdin remains free to be used
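To see why a free stdin matters, the sketch below reads one value from stdin for every line of the file. `pair_lines`, the variable names and the sample data are all hypothetical.

```shell
# Pair each line of the file given as $1 with one line taken from stdin
pair_lines() {
  while IFS= read -r _key <&4; do         # file lines arrive on FD 4
    IFS= read -r _value || _value=        # stdin (FD 0) still belongs to the caller
    printf '%s=%s\n' "$_key" "$_value"
  done 4< "$1"
}

keys=$(mktemp) || exit 1
printf 'name\ncolor\n' > "$keys"
printf 'Ada\nteal\n' | pair_lines "$keys"
rm -f "$keys"
```

If the file were redirected to stdin instead, the inner read would consume lines of the file rather than the values supplied by the caller.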
$* Quoted Form
When positional parameters are expanded through Parameter Expansion and the expansion is not quoted, Word Splitting is applied to the resulting string according to IFS’s value
All positional parameters can be expanded with $* or $@, or with their quoted forms, "$*" and "$@", respectively
Let’s create a function that prints the number of Args and each one of them, to better appreciate the following behavior →
foo (){
  printf "Args -> %d |" "$#"     # Number of arguments
  printf " =%s=" "$@" ; echo     # Each argument
}
$ set -- "Ubuntu Focal" "Linux Mint" "Debian Bookworm" # Set Positional Args
The unquoted $* expansion undergoes Word Splitting. Therefore, the split is applied to each positional parameter
Likewise, the unquoted $@ expansion undergoes Word Splitting. As mentioned above repeatedly, not being quoted means that split-glob is performed on that expansion
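The four forms can be compared in one run. In this sketch `count_args` mirrors the counting function shown earlier under a different, hypothetical name, `expansion_forms_demo` is also hypothetical, and only two positional parameters are set to keep the output short.

```shell
# Print the number of arguments followed by each one wrapped in '='
count_args() {
  printf 'Args -> %d |' "$#"
  printf ' =%s=' "$@"
  echo
}

expansion_forms_demo() {
  (
    unset -v IFS                          # default splitting; "$*" joins with a space
    set -f
    set -- "Ubuntu Focal" "Linux Mint"
    count_args $*                         # split into 4 words
    count_args $@                         # split into 4 words
    count_args "$*"                       # 1 word holding everything
    count_args "$@"                       # 2 words, one per parameter
  )
}

expansion_forms_demo
```

Only "$@" hands each positional parameter over unchanged, which is why it is the usual choice when arguments must be forwarded as they were received.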
(
  IFS=$'\n' ; set -f                              # Set IFS to newline and Disable Globs
  for _file in $( find . -name '.' -o -print )    # Omit . directory
  do
    printf "File -> %s\n" "$_file"
  done
)
That is, to perform any action on the system’s files. See Globbing for more information
As explained in the sections at the beginning, any line-by-line File processing via command substitution →
(
  IFS=$'\n' ; set -f              # IFS to \n and Globbing Disabled
  for _line in $(< ./foo )        # Same as $( cat ./foo )
  do
    printf "%s\n" "$_line"
  done
)
Note that all the above examples are executed inside a subshell ( ) to avoid globally modifying IFS’s value and Shell Options such as the globbing one (set -f)
Also, as mentioned several times, any parameter creation or modification inside a subshell is not reflected outside it (i.e. those changes do not apply in the Parent Shell’s env)
To handle that situation, there are several ways to modify IFS and the Shell Options without affecting them globally, while keeping any parameter modification out of child processes
INFO
In the sections below, only a newline \n char is assigned to IFS, which makes things more complicated because of the trailing newline trimming that command substitution performs
Likewise, they take the last code above, the one related to line-by-line File Processing, as the example
The case of array elements or positional arguments returned as a single string through the "$*" expansion is explained in the above sections
Everything related to handling filenames correctly through globbing and the find command is here
Local Shell Builtin + ANSI-C Quoting
Non-POSIX Compliant
foo(){
  local -- _line=
  local -- IFS=$'\n'                      # \n as IFS value through ANSI-C Format
  local -- _oldSetOptions="$( set +o )"   # Store Shell Options
  set -f                                  # Disable Globbing (noglob opt)
  for _line in $( < ./foo )               # Non-POSIX Compliant
  do
    printf "%s\n" "$_line"
  done
  eval "$_oldSetOptions"                  # Restore Shell Options
}
Explanation
The scope of the IFS modification is limited to the function thanks to the local shell builtin
CAUTION
Be aware that the modified IFS retains its value in the loop’s context
A newline \n character is assigned to IFS through ANSI-C Quoting
Shell Options are saved for later restoration with set +o. Therefore, Globbing can be safely disabled with set -f
Command Substitution is performed and the for loop takes its result as input. Each field/word resulting from that unquoted expansion is assigned to the _line parameter, one per iteration
Thus, IFS’s value has to be reduced to just a newline. This way, each line of the expansion’s output is treated as a single word after Word Splitting and _line receives an entire line rather than one word at a time (as would happen if blanks were also part of IFS’s value)
Remember that the above code cannot handle leading/trailing/inner empty lines →
Since IFS’s value is \n, Word Splitting takes that char as a delimiter and removes it
That occurs because any sequence of IFS whitespace chars is consolidated and taken as a single delimiter. Furthermore, adjacent whitespace characters do not form a word
Command Substitution trims any trailing newlines from its output. In fact, this is why the final empty lines are removed, since that expansion happens before Word Splitting
CAUTION
Note that the above code is not POSIX-compliant because of:
Local Shell Builtin → Limits the scope of the IFS modification to the function
ANSI-C Quoting → The way IFS’s value is assigned
$( < ./file ) → Generates the for loop’s input
Those actions can be implemented in a POSIX-compliant way using a simple parameter assignment rather than local
It’s not possible to limit IFS’s scope but, if IFS is set, its initial value can be saved in a variable for later restoration
There are several POSIX-compliant ways to assign a \n to IFS instead of ANSI-C Quoting. See below
Just use $( cat ./file ) rather than $(< ./file )
Take into account that the above code is only correct if some Shell validation runs before it to prevent unintended results, since POSIX-oriented shells such as sh or dash do not support the Bash-specific features above
bar(){
  case $( /bin/ps -p "$PPID" -o comm= ) in
    *bash) return 0 ;;
    *) printf "Not allowed Shell. Try with Bash...\n" 1>&2
       return 1 ;;
  esac
}
This way, the command that launched the script’s Parent Process is extracted through command substitution and evaluated in a case statement
Eval + Printf
POSIX Compliant
foo(){
  _line=
  unset -v _savedIFS                             # Stays unset if IFS is unset
  _oldSetOptions=$( set +o )                     # Save Shell Options
  [ -n "${IFS+set}" ] && _savedIFS=$IFS          # If set, saves IFS's value
  set -f                                         # Disable Globbing
  eval "$( printf 'IFS="\n"' )"                  # Assign \n to IFS as its value
  for _line in $( cat < ./foo )                  # Command Substitution processing
  do
    printf "%s\n" "$_line"
  done
  unset -v -- IFS                                # IFS Cleanup
  [ -n "${_savedIFS+set}" ] && {                 # If IFS was set, restore it
    IFS=$_savedIFS ; unset -- _savedIFS ; }
  eval "$_oldSetOptions"                         # Restore Shell Options
}
Explanation
All the initial Shell Options are stored in a variable for later restoration (i.e. set +o and eval "$_oldSetOptions")
The same applies to IFS, but only if IFS is set. No value can be stored if a parameter is unset
Therefore, when IFS is set, its value is stored for later restoration, regardless of whether that value is an empty string or anything else
A newline is assigned to IFS through eval and printf
CAUTION
This is the correct way to assign a \n to the IFS parameter →
$ eval "$( printf 'IFS="\n"' )"
Note that the following ways are incorrect, for several reasons →
$ IFS=\n # Wrong!
The above code assigns the n character as IFS’s value. The backslash \ does not remain because it’s syntactically interpreted by the shell as an escape character
Remember that any unquoted character preceded by a \ is treated as a literal (i.e. a \ is syntactically meaningful to the shell)
$ IFS='\n' # Wrong!
Like the previous one, this is incorrect because any character inside single quotes is treated as a literal and loses its syntactic meaning, if it has one
Therefore, IFS literally receives \n (a backslash and an n) as its value. Note that it’s not an escape sequence
$ IFS=$( printf '\n' )
Because of printf’s behavior, \n is interpreted as an escape sequence (i.e. a newline) and not as a literal
The problem with the above assignment is that Command Substitution trims any trailing newlines from its output
Thus, because of this deletion, an empty string is assigned to IFS
$ IFS=$( printf '"\n"' )
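Likewise, that does not work either. Nothing is trimmed here, since the last output char is a double quote, but quotes that result from an expansion are not removed by the shell. IFS therefore receives three chars (a double quote, a newline and another double quote), so the " char would act as a delimiter too
All this being said, here comes the right way →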
$ eval "$( printf 'IFS="\n"' )"
Several things happen in the above command:
The Command Substitution’s resulting string is the following → IFS="\n" (where \n is already an actual newline char)
No trailing newline trimming is performed, as the last output char is a double quote " and not a newline \n
Note that, as mentioned above, although the \n sequence is inside quotes, printf interprets it as an escape sequence rather than as a literal
So, on paper, IFS="\n" is a valid assignment to get a newline inside IFS. It only remains for that printf string to be interpreted as a command
That requirement is satisfied by the eval shell builtin, which performs that assignment to IFS
After the line-by-line file processing is done, the IFS parameter is unset. This restores its previous state in case it was initially unset
Regardless of that, if the parameter that stored IFS’s initial value is set, which also means that IFS was set, IFS recovers its previous value. Otherwise, IFS stays unset thanks to the above step
Shell options are restored to their initial values through eval "$_oldSetOptions"
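The IFS assignments compared in the CAUTION block can be verified by dumping the bytes that IFS ends up holding. The sketch below is only a checking aid: `hex_of`, `report_ifs` and `ifs_assignment_demo` are hypothetical helpers, and the hex dump relies on the standard od and tr utilities.

```shell
# Hypothetical helper: hex bytes of a string, e.g. 0a for a lone newline
hex_of() {
  printf '%s' "$1" | od -An -tx1 | tr -d '[:space:]'
}

# Hypothetical helper: print a label plus the bytes currently stored in IFS
report_ifs() {
  _hex=$(hex_of "$IFS")
  printf '%s -> %s\n' "$1" "${_hex:-(empty)}"
}

ifs_assignment_demo() {
  (
    IFS=\n                         ; report_ifs 'bare'            # 6e: just the letter n
    IFS='\n'                       ; report_ifs 'single-quoted'   # 5c6e: backslash + n
    IFS=$( printf '\n' )           ; report_ifs 'cmdsub'          # (empty): newline trimmed
    IFS=$( printf '"\n"' )         ; report_ifs 'quoted-cmdsub'   # 220a22: quote, newline, quote
    eval "$( printf 'IFS="\n"' )"  ; report_ifs 'eval'            # 0a: a lone newline
  )
}

ifs_assignment_demo
```

Only the eval form leaves exactly one byte, 0a, in IFS. The subshell keeps all these experiments away from the caller's IFS.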
Printf + Parameter Expansion
POSIX Compliant
$ IFS=$( printf '\nX' )
$ IFS=${IFS%X}
Explanation
Same as the eval + printf one. Only the IFS assignment is different
A character is placed after the newline \n to prevent the expansion from trimming the trailing newline
Therefore, that char, rather than \n, is the last one in the Command Substitution’s output
After that, the ${IFS%X} Parameter Expansion is used to remove that trailing X char
INFO
Note that it’s not necessary to quote the above Command Substitution since it occurs in a parameter assignment
Scalar parameter assignments, together with case statements, the [[ ]] shell keyword and a few other contexts, do not undergo Word Splitting or Globbing
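A quick way to confirm that the two-step assignment leaves only a newline in IFS is to split a small sample with it. `newline_ifs_demo` and the sample text are hypothetical, used here only as an illustration.

```shell
newline_ifs_demo() {
  (
    IFS=$(printf '\nX')      # the X protects the newline from being trimmed
    IFS=${IFS%X}             # drop the X, the newline stays
    set -f
    set -- $(printf 'one two\nthree\n')     # split on newlines only
    printf '%d fields: [%s] [%s]\n' "$#" "$1" "$2"
  )
}

newline_ifs_demo    # prints 2 fields: [one two] [three]
```

The blank inside the first line no longer splits it, which shows that neither a space nor the X is part of IFS.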
Having seen the above situations, it has to be said that nobody should read a file’s lines with a for loop, since this approach needs to process a Command Substitution’s output
That expansion cannot be quoted, as it would be treated as a single string. Therefore, only one iteration would be done, with that string as the for loop’s parameter value
Thus, Word Splitting and Globbing are performed, together with command substitution trimming trailing newlines from its output
Also, remember that, once the expansion is performed and the above actions have been applied to the output string, the for loop processes each resulting word/field, assigning it to the declared parameter
$ for _line in $(< ./foo) ; do printf "%s\n" "$_line" ; done
As mentioned earlier, this situation can be improved by setting IFS to a newline and disabling globbing with set -f
(
  IFS=$'\n'
  set -f
  for _line in $( < ./foo)
  do
    printf "%s\n" "$_line"
  done
)
With the above code, because IFS is limited to just a newline, a line with blanks between non-whitespace chars is no longer split into several words by Word Splitting and processed separately by the for loop
Globbing characters are not interpreted either. Therefore, no filename expansion is performed and no line is generated for each file matched by that glob pattern
However, the expansion still trims trailing newlines from its output
Likewise, since IFS has \n as its value and newline characters are considered whitespace chars, any consecutive sequence of newlines is consolidated into one single delimiter
In other words, the above situation causes empty lines to be skipped
You cannot possibly preserve blank lines if you are relying on IFS to split on newlines
Moreover, that IFS modification remains in effect in the loop’s context, which means that any unquoted expansion, or any other situation where Word Splitting acts, is processed according to that IFS value
That’s why A FOR LOOP IS NOT A RECOMMENDED WAY TO PROCESS FILE LINES
Once the above actions are performed, that string is assigned to read’s declared parameter
To prevent leading and trailing blanks and tabs from being trimmed (read’s default behavior), IFS is set to an empty string
With that, no splitting, globbing, final \n trimming or empty line deletion is performed
Note that the -r option makes read treat \ chars as literals. Without it, a \ escapes the next character and a \ at the end of a line acts as a line continuation
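The effect of -r is easy to check with a line that contains backslashes. In this sketch `read_r_demo`, the variable names and the sample text are hypothetical, and a quoted here-document is used so the shell itself leaves the backslashes alone.

```shell
read_r_demo() {
  # Without -r, read consumes the backslashes as escape chars
  IFS= read _plain <<'EOF'
path\to\file
EOF
  # With -r, every backslash is kept as a literal char
  IFS= read -r _raw <<'EOF'
path\to\file
EOF
  printf 'without -r: %s\n' "$_plain"
  printf 'with -r: %s\n' "$_raw"
}

read_r_demo
```

The first line printed is `without -r: pathtofile`, while the second one keeps the text intact as `with -r: path\to\file`.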
Note that the above correct way to handle file lines is far more reliable and shorter than this one. The same applies to this one
If a file’s last line does not end with a newline character \n, read processes it but returns false
Since the while loop iterates until read returns false, the remaining line is stored in read’s declared parameter but it’s not processed inside the loop itself
To prevent the above situation, just process that remaining line individually →
The while loop iterates until read returns false. Then, the [[ ]] keyword checks whether there was content after the last \n processed by read. If so, that content (the parameter’s value) is processed
This makes the while loop iterate one last time even if the read command returns false, so that a last line with no trailing newline can be processed
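Both ways of handling a last line without a trailing newline can be sketched as follows. The function names and the sample file are hypothetical, and the portable [ ] test is used here in place of the [[ ]] keyword so that the snippet also runs in plain POSIX shells.

```shell
# Variant 1: finish the loop, then handle whatever read left behind
read_lines_then_check() {
  while IFS= read -r _line; do
    printf '[%s]\n' "$_line"
  done < "$1"
  if [ -n "$_line" ]; then          # content found after the last \n
    printf '[%s]\n' "$_line"
  fi
}

# Variant 2: let the test keep the loop alive for one more iteration
read_lines_or_check() {
  while IFS= read -r _line || [ -n "$_line" ]; do
    printf '[%s]\n' "$_line"
  done < "$1"
}

tmp=$(mktemp) || exit 1
printf 'first\nlast without newline' > "$tmp"
read_lines_then_check "$tmp"
read_lines_or_check "$tmp"
rm -f "$tmp"
```

Both variants print the two lines of the sample file. The second one avoids duplicating the loop body, at the cost of a slightly denser loop condition.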