PRIMARY CATEGORY → BASH
IFS
stands for Internal (or Input) Field Separator: a set of characters that act as delimiters when Word Splitting is performed on an input string or on an unquoted parameter reference
By default, IFS
parameter holds a space, a tab and a newline →
If IFS
is unset, the shell behaves as if it held its default value →
IMPORTANT
In the previous command, the
$foo
parameter expansion is deliberately left unquoted so that Word Splitting (and also Globbing) takes place
No Word Splitting takes place if IFS
is set to an empty string →
INFO
Note that a subshell
( )
is used in the above commands to avoid modifying IFS
globally. Thus, the modification only affects the subshell's environment
The above concepts may be better understood as follows →
IFS
with default value →
IFS
unset →
IFS
with an empty string as value →
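The three cases can be compared side by side with a small sketch. The `count_args` helper and the `foo` value are hypothetical, and the sketch assumes the shell starts with the default IFS. Each variant runs inside a command substitution, so the IFS change stays in a subshell.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

foo='one two  three'

# Default IFS (space, tab, newline): the unquoted expansion is split into 3 words.
default_count=$(count_args $foo)

# IFS unset: the shell behaves as with the default value.
unset_count=$(unset IFS; count_args $foo)

# IFS set to an empty string: no Word Splitting, a single word is passed.
empty_count=$(IFS=; count_args $foo)

echo "$default_count $unset_count $empty_count"   # 3 3 1
```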
When is IFS relevant?
Read
The
read
shell builtin processes its input string in these steps →
- read receives a string as input
- That string undergoes Word Splitting according to the value of IFS
- Every field resulting from that splitting is assigned to one of the parameters passed to read
INFO
read -r
treats the backslash
\
as a literal char. Therefore, no character gets escaped and the input string is kept as is
The last variable receives all the remaining words if the number of resulting fields is greater than the number of parameters passed to read
IMPORTANT
No inner consolidation is performed on the remaining part of the string if the number of fields is greater than the number of
read
's variables
Take into account that leading and trailing whitespace is still trimmed, even in that remaining part
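A minimal sketch of that behaviour, assuming the default IFS and hypothetical variable names: the input yields four fields, but read only gets three variables, so the last one keeps the rest of the line untouched on the inside.

```shell
# Four fields in the input, three variables: "rest" receives everything left over.
read -r first second rest <<'EOF'
  alpha beta gamma   delta
EOF

printf '[%s]\n' "$first" "$second" "$rest"
# [alpha]
# [beta]
# [gamma   delta]   <- inner blanks kept, leading blanks of the line trimmed
```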
If read -a
is specified, the fields produced by splitting the input according to IFS are assigned to consecutive elements of an array
→
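As an illustration (Bash only, since `-a` is not part of POSIX read), the sketch below splits a line on blanks into a hypothetical `fields` array.

```shell
# Bash only: -a stores every field in consecutive elements of the array.
IFS=' ' read -r -a fields <<'EOF'
alpha beta gamma
EOF

echo "${#fields[@]}"    # 3
echo "${fields[1]}"     # beta
```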
Any leading and trailing blanks and tabs
are trimmed from the input string if IFS
contains those characters (e.g. when IFS
has its default value or is unset)
INFO
Note that
$' '
ANSI-C Quoting is used in the input string so that escape sequences are interpreted, such as
\n
or
\t
With plain single quotes, escape sequences are kept as literal characters, so the following is incorrect
' '
→
Likewise, runs of inner blanks and \t
are consolidated into a single delimiter →
However, if IFS
is set to an empty string, field splitting is not performed and neither of the above happens →
- Leading and Trailing blanks and tabs are not stripped out
- The inner ones are not consolidated
A similar case arises when IFS
contains only non-whitespace characters. Again, no leading or trailing trimming and no inner consolidation is performed on IFS
chars →
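A small sketch of the non-whitespace case, using a hypothetical `show_fields` helper that wraps every argument in brackets. With a colon as the only IFS char, the leading delimiter and the doubled delimiter each produce an empty field.

```shell
# Hypothetical helper: prints each argument wrapped in brackets.
show_fields() { for f in "$@"; do printf '[%s]' "$f"; done; echo; }

str=':a::b'

# The subshell created by $( ) keeps the IFS change local.
colon_fields=$(IFS=:; show_fields $str)

echo "$colon_fields"   # [][a][][b]
```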
IMPORTANT
The above situation changes if the value of
IFS
is a mixture of whitespace and non-whitespace chars, e.g.
IFS=' :'
→
A single non-whitespace IFS char plus all the IFS whitespace chars adjacent to it act together as one delimiter, i.e. they get consolidated
INFO
Trimming at the beginning and the end of the input string still happens as long as whitespace chars are part of the value of
IFS
If two or more non-whitespace chars that belong to
IFS
appear consecutively, each one is treated as a separate delimiter (no consolidation) and an empty Word/Field is created between them →
CAUTION
Be aware that
IFS=' :'
modifies the value of IFS
globally, which is often not what is desired. Several ways to handle this situation correctly are described below
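The mixed case can be sketched as follows, again with a hypothetical `show_fields` helper and the assignment confined to a subshell. The blanks around the single colon merge into one delimiter, while the doubled colon still yields an empty field.

```shell
# Hypothetical helper: prints each argument wrapped in brackets.
show_fields() { for f in "$@"; do printf '[%s]' "$f"; done; echo; }

str='  a : b::c  '

# " : " counts as ONE delimiter; "::" counts as two, with an empty field in between.
mixed_fields=$(IFS=' :'; show_fields $str)

echo "$mixed_fields"   # [a][b][][c]
```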
Unquoted Shell Expansion
When the shell performs its expansions while processing a command, every unquoted expansion undergoes Word Splitting and Globbing
INFO
There are different types of shell expansion and they all occur in a set order after the command has been split into tokens
Ordered from the first to the last:
- Brace Expansion. Non-POSIX Compliant →
- Tilde Expansion → Check GNU Bash Manual
Note that a
$
char preceding a string introduces different types of expansion, such as the following ones
- Parameter Expansion →
The parameter may be enclosed in braces to perform specific operations, such as string manipulation, on its value
There are many more forms, nearly all of them related to string manipulation
- Command Substitution → The command is executed inside a subshell
( )
. Any trailing newlines in the command output are trimmed
CAUTION
Be aware that embedded newlines
\n
in the command output may be removed by Word Splitting
- Arithmetic Expansion →
- Process Substitution. Non-POSIX Compliant. The command is executed asynchronously in a subshell
()
. Its input or output appears as a filename
Once the above expansions have occurred, the results of parameter expansion, command substitution and arithmetic expansion that are not quoted
' ', " ", $' '
undergo both Word Splitting and Globbing
Note that Globbing (aka Filename Expansion) occurs after Word Splitting.
Therefore, when Globbing expands to files, each matched filename is taken as a single Word/Field by the shell, since no Word Splitting happens after it
With all the above actions done, Parameter Assignment, Redirections and Quote Removal are performed prior to the command execution
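Because Globbing runs after Word Splitting, a filename containing a blank still arrives as one word. The sketch below checks this in a throwaway directory; the `count_args` helper and the file names are hypothetical.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

tmpdir=$(mktemp -d)
: > "$tmpdir/a b.txt"      # filename with an embedded blank
: > "$tmpdir/c.txt"

# A literal glob: two files match, two words are passed.
glob_count=$(cd "$tmpdir" && count_args *.txt)

# A glob coming from an unquoted expansion: split first (one word), then globbed.
pattern='*.txt'
var_count=$(cd "$tmpdir" && count_args $pattern)

rm -rf "$tmpdir"
echo "$glob_count $var_count"   # 2 2
```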
As mentioned above, Word Splitting splits the string resulting from an expansion into fields or words according to the value of the IFS
parameter
Therefore, the following case undergoes Word Splitting →
While this one does not →
The same happens with other Shell Expansions such as Command Substitution
Be aware that any variable reference inside a command substitution must be quoted even if the command substitution itself is quoted →
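A short sketch of that last point, with a hypothetical `foo` value and the default IFS assumed: quoting the outer command substitution does not protect the inner reference.

```shell
foo='a   b'

# Outer quotes only: $foo is split inside the subshell and echo rejoins with one blank.
unquoted_inner="$(echo $foo)"

# Inner and outer quotes: the value passes through unchanged.
quoted_inner="$(echo "$foo")"

printf '[%s]\n' "$unquoted_inner" "$quoted_inner"
# [a b]
# [a   b]
```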
INFO
In Command Substitution, trailing newlines are stripped from the subshell's output
Likewise, newlines embedded in the output may be removed by Word Splitting if
IFS
is unset, has its default value or explicitly contains a
\n
A situation related to the above that arises repeatedly is the following:
Having a file named
foo
with this content →
IMPORTANT
Note the following notation with
batcat --show-all
:
· → Blank/Whitespace
├──┤ → Tab \t
␊ → Newline \n
The above file's content has several blanks, tabs and newlines, both internal and leading/trailing
If the
foo
file has to be processed line-by-line, the command below is incorrect for several reasons (each field/word is stored in
$_line
per iteration) →
The above approach uses an unquoted command substitution to process the file's lines. Thus, that expansion undergoes Word Splitting and Globbing
Leading
\n
and leading/trailing/inner blanks and
\t
are treated as delimiters because of the default value of
IFS
Because Word Splitting is applied to the command substitution output, the inner ones are consolidated and the leading/trailing ones are chopped off
The same applies to inner newlines: they are removed by Word Splitting
As mentioned earlier, trailing newlines are also trimmed from the command substitution output
Note that there is one line (the
foo
one) that has been generated through Globbing because of the
*
char, which expands to all files in the current directory
The above situation can be improved as follows →
IMPORTANT
A subshell is used to prevent global modifications such as the
IFS
and Globbing ones
The IFS
parameter is set to
\n
. Therefore, Word Splitting only acts when a newline char is found
Filename expansion is disabled by
set -f
, which means that a globbing char (*, ?, [ ]) does not expand to any file in the CWD
But the above approach is still incorrect →
- Empty lines (leading/trailing/inner
\n
) are removed by Word Splitting because the value of IFS
is
\n
Remember that adjacent IFS whitespace chars are consolidated into a single delimiter. Thus, they do not create any additional words
- Command substitution keeps trimming trailing newlines. In fact, that trimming is performed before Word Splitting, since command substitution occurs before the split
The Command Substitution cannot be quoted, as its output would then be processed only once rather than line-by-line (i.e. it would be taken as a single word)
Likewise, an empty string cannot be assigned to the
IFS
parameter since no Word Splitting would take place and, therefore, the same would occur (all output as one single word)
On top of the above problems, everything is executed in a subshell
( )
. Hence, any parameter created or modified there is not reflected in the Parent Shell's environment
This can be modified as follows →
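What such a function might look like is sketched below. This is a Bash-only reconstruction based on the surrounding description, not necessarily the original code; `process_lines`, `old_opts` and the sample file are hypothetical.

```shell
# Bash-only sketch; process_lines and the sample file are hypothetical.
process_lines() {
    local IFS=$'\n'          # newline-only IFS, visible only inside this function
    local old_opts _line
    old_opts=$(set +o)       # remember the current shell options
    set -f                   # disable globbing
    for _line in $(< "$1"); do
        printf '<%s>\n' "$_line"
    done
    eval "$old_opts"         # restore the saved shell options
}

sample=$(mktemp)
printf '  a b\n\n*\nlast\n' > "$sample"
out=$(process_lines "$sample")
rm -f "$sample"

printf '%s\n' "$out"
# <  a b>
# <*>
# <last>      <- the empty line is still lost
```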
With the above code, it is no longer necessary to use a subshell
( )
The scope of the IFS
modification is limited to the function by the
local
shell builtin
The initial Shell Options, such as the globbing one, are stored in a parameter. Once all the work is done, they are restored to their prior values through
eval
IMPORTANT
Be aware that the above code is not POSIX-compliant for the following reasons →
- local, declare and typeset. They are supported by many shells but, strictly speaking, are not POSIX
- ANSI-C Quoting to assign a value to
IFS
. Bash extension, not POSIX
- $( < file ). Bash extension, not POSIX
A POSIX-compliant version would be this one →
- Simple parameter assignments are used instead of local
- The cat command is used inside the command substitution rather than $(< file)
That
IFS
assignment is discussed in the Eval + Printf section
However, the behaviour remains the same, i.e. empty lines are still removed by Word Splitting and Command Substitution, as in the previous examples
Having seen all the incorrect examples, this is the correct way to process an input (e.g. a file) line-by-line →
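A minimal sketch of that loop, using a hypothetical temporary file in place of the real input. Blanks, the empty line and the glob character all survive untouched.

```shell
sample=$(mktemp)
printf '  a b  \n\n*\n' > "$sample"

line_count=0
first_line=
while IFS= read -r _line; do
    line_count=$((line_count + 1))
    [ "$line_count" -eq 1 ] && first_line=$_line
    printf '<%s>\n' "$_line"
done < "$sample"
rm -f "$sample"
# <  a b  >
# <>
# <*>
```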
Note that, unlike the above examples, no expansion is used as input
The file is passed directly as the input of
read
By default,
read
processes its input line-by-line (up to each
\n
)
In this case, the input string does not undergo Word Splitting since
IFS
is empty. Therefore, no leading or trailing whitespace chars are trimmed
It is true that the previous code ties up the stdin of the
while loop
, since read consumes it. To keep stdin free and be able to read from it →
What happens above is that the
./foo
file is opened in read mode and that open resource is assigned to file descriptor 4
That file descriptor only refers to the
./foo
file within the
while loop
context. Thus, the stdin of the
while loop
remains free to be used
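The file descriptor variant can be sketched like this. The file, the variable names and the here-document standing in for interactive stdin are all hypothetical; `<&4` is used because it is POSIX, while Bash also accepts `read -u 4`.

```shell
sample=$(mktemp)
printf 'line one\nline two\n' > "$sample"

_pairs=''
while IFS= read -r _line <&4; do      # file lines come from descriptor 4
    IFS= read -r _answer              # stdin is free: this reads the here-document
    _pairs="${_pairs}${_line}=${_answer};"
done 4< "$sample" <<'EOF'
yes
no
EOF
rm -f "$sample"

echo "$_pairs"   # line one=yes;line two=no;
```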
$* Quoted Form
When positional parameters are expanded and the expansion is not quoted, Word Splitting is applied to the resulting string according to the value of IFS
All positional parameters can be expanded using $*
or $@
expansions and their quoted forms, "$*"
and "$@"
, respectively
Let's create a function that prints the number of arguments and each of them, to better appreciate the following behavior →
- The unquoted
$*
expansion undergoes Word Splitting. Therefore, each positional parameter is split individually
- Likewise, the unquoted
$@
expansion undergoes Word Splitting. As mentioned repeatedly above, not being quoted means that split-and-glob is performed on that expansion
If the above expansions are quoted:
"$*"
→ Expands all positional parameters to a single string in which the parameters are separated by the first character of IFS
"$@"
→ Expands all positional parameters, but, unlike the above one, each positional parameter is treated as a single quoted word
That is, "$@"
is the same as "$1" "$2" "$3" ...
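The four forms can be compared with a small sketch. The `count_args` and `demo` helpers are hypothetical, and the default IFS is assumed except where it is explicitly changed inside a subshell.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

demo() {
    star_unquoted=$(count_args $*)      # each parameter is split again
    at_unquoted=$(count_args $@)        # same as above
    star_quoted=$(count_args "$*")      # one single string
    at_quoted=$(count_args "$@")        # one word per parameter
    joined=$(IFS=,; printf '%s' "$*")   # joined with the first char of IFS
}

demo 'a b' c
echo "$star_unquoted $at_unquoted $star_quoted $at_quoted $joined"   # 3 3 1 2 a b,c
```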
The same occurs with array expansions that extract all array elements →
Ways to set IFS
’s values
There are several ways to assign values to IFS
parameter, both POSIX and Non-POSIX Compliant
Take into account that situations may arise where it is necessary to modify IFS
without affecting its value globally, such as the following:
- Any function that returns/prints array elements as a single string with each element separated by
IFS
’s first value →
- Any filename handling process through command substitution or another expansion →
That is, perform any actions with system files. See Globbing for more information
- As explained in the earlier sections, any line-by-line file processing via command substitution →
Note that all the above examples are executed inside a subshell ( )
to avoid globally modifying IFS
and Shell Options such as the globbing one (set -f
)
Also, as mentioned several times, any parameter created or modified inside a subshell is not reflected outside it (i.e. those changes do not apply to the Parent Shell's environment)
To handle that situation, there are several ways to modify IFS
and Shell Options without affecting them globally, while keeping any parameter modifications visible outside child processes
INFO
In the sections below, only a newline
\n
char is assigned to the IFS
parameter, which is the trickier case because of the trailing newline trimming that command substitution performs
Likewise, they take the last code above, the one related to line-by-line File Processing, as the example
The one related to array elements or positional arguments returned as a single string through the
"$*"
expansion is explained in the sections above
Everything related to handling filenames correctly through globbing and the
find
command is covered separately
Local Shell Builtin + ANSI-C Quoting
Non-POSIX Compliant
Explanation
The scope of the
IFS
modification is limited to the function by the
local
shell builtin
CAUTION
Be aware that the
IFS
modification keeps its value inside the loop context
A newline
\n
character is assigned to IFS
through ANSI-C Quoting
Shell Options are saved for later restoration with
set +o
. Therefore, globbing can then be safely disabled with
set -f
The Command Substitution is performed and the
for loop
takes its result as input. Each field/word resulting from that unquoted expansion is assigned to the _line
parameter on each iteration
Thus, the value of
IFS
has to be set to just a newline. This way each line of the expansion's output is kept as a single field after Word Splitting and _line
receives an entire line rather than a single word (as would happen if blanks were also part of the value of IFS
)
Remember that the above code cannot handle leading/trailing/inner empty lines →
- Since the value of
IFS
is
\n
, Word Splitting takes that char as a delimiter and removes it
This happens because any sequence of IFS whitespace chars is consolidated and taken as a single delimiter. Furthermore, adjacent whitespace characters do not produce an empty word
- Command Substitution trims any trailing newlines from its output. In fact, this is why the final empty lines are removed, since that expansion happens before Word Splitting
CAUTION
Note that the above code is Non-POSIX Compliant due to:
- local Shell Builtin → Limits the scope of the IFS modification to the function
- ANSI-C Quoting → The value of IFS is assigned this way
- $( < ./file ) → Generates the input of the for loop
Those actions can be implemented in a POSIX Compliant way:
- Use a simple parameter assignment rather than
local
It is not possible to limit the scope of
IFS
but, if it is set, its initial value can be saved in a variable for later restoration
- There are several POSIX-Compliant ways to assign a
\n
to IFS
instead of ANSI-C Quoting. See below
- Just use
$( cat ./file )
rather than $(< ./file )
Take into account that the above code is only safe if some Shell validation is performed before it to prevent unintended results,
since POSIX-oriented shells such as sh or dash do not support the above Bash-specific functionality
To do so, the command that generated the script's Parent Process is extracted through command substitution and evaluated in a
case
statement
Eval + Printf
POSIX Compliant
Explanation
All initial Shell Options are stored in a variable for later restoration (i.e.
set +o
and eval "$_oldSetOptions"
)
The same applies to
IFS
, but only if IFS
is set, since no value can be stored if a parameter is unset
Therefore, when it is set, the value of
IFS
is stored for later restoration, regardless of whether that value is an empty string or anything else
A newline is assigned to
IFS
through eval
and printf
CAUTION
This is the correct way to assign a
\n
to the IFS
parameter →
Note that the following ways are incorrect for several reasons →
The above code assigns the
n
character as the value of IFS
. The backslash char
\
does not remain because it is syntactically interpreted as an escape character by the shell
Remember that any character preceded by an unquoted
\
is treated as literal (i.e. the
\
is syntactically meaningful to the shell)
Like the previous one, this is incorrect because any character inside quotes is treated as literal and loses its syntactic meaning, if it has one
Therefore,
IFS
literally receives
\n
as its value. Note that it is not an escape sequence
Due to the behavior of
printf
,
\n
is interpreted as an escape sequence (i.e. a newline) and not as a literal
The problem with the above assignment is that Command Substitution trims any trailing newlines from its output
Thus, because of this removal, an empty string is assigned to
IFS
Likewise, that does not work either. Here, the same applies as with
IFS='\n'
, that is, IFS
literally gets
\n
as its value (not an escape sequence!)
All this being said, here comes the right way →
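The attempts discussed in this caution can be compared in a short sketch. Every assignment runs inside a command substitution so that IFS is never changed in the current shell; the variable names are hypothetical, and the three wrong forms shown are plausible instances of the mistakes described, not necessarily the exact original commands. The last command is the eval plus printf form.

```shell
# Wrong: the unquoted backslash is consumed by the shell, IFS holds "n".
unquoted=$(IFS=\n; printf '%s' "$IFS")

# Wrong: inside quotes the two characters "\" and "n" are kept literally.
single_quoted=$(IFS='\n'; printf '%s' "$IFS")

# Wrong: printf emits a newline, but command substitution trims it (length 0).
cmdsub_len=$(IFS=$(printf '\n'); echo "${#IFS}")

# Right: printf emits IFS="<newline>" and eval runs that text as an assignment.
right_is_newline=$(
    eval "$(printf 'IFS="\n"')"
    nl='
'
    if [ "$IFS" = "$nl" ]; then echo yes; else echo no; fi
)

echo "$unquoted $single_quoted $cmdsub_len $right_is_newline"   # n \n 0 yes
```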
Several things happen in the above command:
- The string resulting from the Command Substitution is the following →
IFS="\n"
No trailing newline trimming is performed, as the last output char is a double quote
"
and not a newline
\n
Note that, as mentioned above, although the
\n
sequence is inside quotes, printf
itself interprets
\n
as an escape sequence rather than as a literal, so the output holds an actual newline between the double quotes
So, on paper,
IFS="\n"
is a valid assignment to get a newline into IFS
. It only remains for that printf
output to be interpreted as a command
That requirement is satisfied by the
eval
shell builtin, which performs the assignment to IFS
After the line-by-line file processing is done, the
IFS
parameter is unset. This restores its previous state in case it was initially unset
Regardless of that, if the parameter that stored the initial value of
IFS
is set, which also means that IFS
was set, IFS
recovers its previous value. Otherwise, IFS
stays unset because of the above step
Shell options are restored to their initial values through
eval "$_oldSetOptions"
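Putting the steps of this section together, a minimal POSIX-oriented sketch could look as follows. It is a reconstruction from the explanation above, not necessarily the original code; `process_lines_posix`, `_oldIFS` and the sample files are hypothetical names.

```shell
# POSIX sketch; process_lines_posix and the variable names are hypothetical.
process_lines_posix() {
    _oldSetOptions=$(set +o)                          # save the shell options
    unset _oldIFS
    if [ -n "${IFS+set}" ]; then _oldIFS=$IFS; fi     # save IFS only if it is set
    eval "$(printf 'IFS="\n"')"                       # IFS now holds a single newline
    set -f                                            # disable globbing
    for _line in $(cat "$1"); do
        printf '<%s>\n' "$_line"
    done
    unset IFS                                         # back to "unset" first...
    if [ -n "${_oldIFS+set}" ]; then IFS=$_oldIFS; fi # ...then restore it if it was set
    eval "$_oldSetOptions"                            # restore the shell options
}

sample=$(mktemp)
printf '  a b\n\n*\nlast\n' > "$sample"
ifs_before=$IFS
process_lines_posix "$sample" > "${sample}.out"       # runs in the current shell
out=$(cat "${sample}.out")
ifs_restored=no
if [ "$IFS" = "$ifs_before" ]; then ifs_restored=yes; fi
rm -f "$sample" "${sample}.out"

printf '%s\n' "$out"
# <  a b>
# <*>
# <last>
```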
Printf + Parameter Expansion
POSIX Compliant
Explanation
Same as the
eval
+ printf
one. Only the IFS
assignment is different
A character is placed after the newline
\n
to prevent the expansion from trimming the trailing newline
Therefore, that char is the last one in the Command Substitution's output rather than
\n
After that, a form of Parameter Expansion is used to remove that trailing char
X
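The assignment itself can be sketched as below. It runs inside a command substitution here only to keep the demo from touching the current shell's IFS; the `count_args` helper and the `lines` value are hypothetical.

```shell
# Hypothetical helper: prints the number of arguments it receives.
count_args() { echo "$#"; }

result=$(
    IFS=$(printf '\nX')      # the "X" protects the newline from being trimmed
    IFS=${IFS%X}             # strip the trailing "X"; only the newline remains
    set -f
    lines='a b
c d'
    # IFS length, then the number of fields the two-line string splits into.
    echo "${#IFS} $(count_args $lines)"
)

echo "$result"   # 1 2
```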
INFO
Note that it is not necessary to quote the above Command Substitution since it occurs in a parameter assignment
Scalar parameter assignments, together with
case
statements, the [[ ]]
shell keyword and a few other contexts, do not undergo Word Splitting and Globbing
They behave like double-quoted contexts, therefore it is not necessary to use double quotes
However, remember: when in doubt, always quote parameter references!
Having seen the above situations, it has to be said that nobody should read a file's lines with a for loop
since that approach has to process the output of a Command Substitution
That expansion cannot be quoted, as it would be treated as a single string. Therefore, only one iteration would be done, with that whole string as the value of the for loop
's parameter
Thus, Word Splitting and Globbing are performed, on top of command substitution trimming the trailing newlines from its output
Also, remember that, once the expansion is performed and the above actions have been applied to the output string, the for loop
processes each resulting word/field, assigning it to the declared parameter
As mentioned earlier, this situation can be improved by setting IFS
to a newline and disabling globbing with the set -f
command
With the above code, because IFS
is limited to just a newline, a line with blanks between non-whitespace chars is no longer split into several words by Word Splitting and the for loop
processing
Globbing characters are not interpreted either; therefore, no filename expansion is performed and no extra item is generated for each file matching a glob pattern
However, the expansion keeps trimming trailing newlines from its output
Likewise, since IFS
has \n
as its value and newline characters are considered whitespace chars, any consecutive sequence of newlines is consolidated into one single delimiter
In other words, the above situation causes empty lines to be skipped
You cannot possibly preserve blank lines if you are relying on IFS to split on newlines
Moreover, that IFS
modification remains in effect in the loop context, which means that any unquoted expansion, or any other situation where Word Splitting acts, is processed according to that IFS
value
That is why a FOR LOOP IS NOT A RECOMMENDED WAY TO PROCESS FILE LINES
Instead of the above, this is the recommended way →
Correct Processing of a File’s Lines
POSIX Compliant
That’s all.
Explanation
No string resulting from an expansion is taken as input
read
, by default, processes its input up to a newline
\n
The input string processed by
read
undergoes Word Splitting but not Globbing
Once those actions are performed, the string is assigned to the parameter passed to
read
To prevent leading and trailing blanks and tabs from being trimmed (the default behavior of read),
IFS
is set to an empty string
With that, no splitting, globbing, trailing
\n
trimming or empty line removal is performed
Note that the
read -r
option makes read
treat
\
chars as literals rather than as escape characters (without it, for instance, a backslash followed by a newline acts as a line continuation)
Note that the above correct way to handle a file's lines is far more reliable and shorter than the for loop based approaches shown earlier
If a file's last line does not end with a newline character \n
, read
still reads it but returns false
Since the while loop
iterates until read
returns false, the remaining line is stored in read
's declared parameter but it is not processed inside the loop itself
To prevent the above situation, just process that remaining line individually →
The while loop
stops when read
returns false, which means that it has reached EOF (possibly after reading a final line that does not end with \n
)
Then, the value of $_line
is checked. If it has content (i.e. it is not an empty string), it is processed
Basically, this processes the last line, which lacks a trailing newline but has still been read by read
The same applies to →
The while loop
iterates until read
returns false; then the [[ ]]
keyword checks whether there was content after the last \n
processed by read
. If so, that content (the parameter's value) is processed
This makes the while loop
iterate one last time even though the read
command returned false, so that the last line with no trailing newline can be processed
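A minimal sketch of that idiom, with a hypothetical sample file whose last line has no trailing newline. The POSIX `[ ]` test is used here; in Bash, `[[ -n $_line ]]` behaves the same way.

```shell
sample=$(mktemp)
printf 'first\nlast without newline' > "$sample"

_count=0
_last=
# read fails on the final unterminated line, but _line is non-empty,
# so the second test lets the loop body run one more time.
while IFS= read -r _line || [ -n "$_line" ]; do
    _count=$((_count + 1))
    _last=$_line
done < "$sample"
rm -f "$sample"

echo "$_count: $_last"   # 2: last without newline
```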
Note that this does not work →
Wrong
As mentioned above,
read
processes the input string up to a
\n
Therefore,
read
reads and stores the remaining string after the last newline but returns false
That makes the
while loop
stop iterating. So
$_line
receives a value that will not be processed
And this does not work either →
Wrong
The whole
while loop
context runs in a subshell
( )
created because a pipeline
|
is used
As mentioned in the earlier sections, any parameter assignment, such as the
read
one, is not reflected in the Parent Shell's environment
Therefore, when the
[[ ]]
check is performed,
$_line
expands to an empty string because it is not set in the parent process's environment
The following would be the correct way to iterate over an input stream when a pipeline is used to feed the first command's output as the input of the while loop
→
Correct
Braces
{ }
group all the commands inside the same shell context (i.e. execution environment)
Therefore,
[[ ]]
sees the _line
parameter assigned by read and returns true
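The grouping can be sketched as follows, with `printf` standing in for any command that writes to the pipeline and a hypothetical `out` variable capturing the result. The POSIX `[ ]` test is used instead of `[[ ]]`; both work in Bash.

```shell
out=$(
    printf 'first\nlast' | {
        while IFS= read -r _line; do
            printf '<%s>\n' "$_line"
        done
        # Same execution environment as the loop: _line is still visible here.
        if [ -n "$_line" ]; then
            printf '<%s>\n' "$_line"
        fi
    }
)

printf '%s\n' "$out"
# <first>
# <last>
```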