Often, when someone needs to split a string on a given character in a shell script, they reach for the “cut” command.

Example:

    string='first item,second,third'

    first="$(echo "$string" | cut -d ',' -f 1)"
    second="$(echo "$string" | cut -d ',' -f 2)"
    third="$(echo "$string" | cut -d ',' -f 3)"

But, as you can see, this is verbose and not very efficient: up to six external programs are executed (three echo and three cut, although echo is usually a shell builtin). Moreover, cut accepts only a single character as delimiter: if you want to split on both “:” and “,”, for instance, you have to run cut several times or switch to another program such as “awk” or “sed”.
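
To make the multiple-delimiter limitation concrete, here is a small sketch of both approaches mentioned above. The variable names second_cut and second_awk are only illustrative.

```shell
#!/bin/sh
string='first,second:third'

# With cut, one invocation per delimiter is needed to isolate "second".
second_cut=$(printf '%s\n' "$string" | cut -d ',' -f 2 | cut -d ':' -f 1)

# awk accepts a regular expression as field separator, so one process is enough.
second_awk=$(printf '%s\n' "$string" | awk -F '[,:]' '{ print $2 }')

printf '%s\n' "$second_cut" "$second_awk"
```

Either way, at least one external process is started for every field you extract.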

Fortunately, the POSIX shell specification offers an easy and efficient alternative: the “Input Field Separators” variable (IFS for short) and shell field splitting, used together with the read command and the here-document syntax:

    string='first item,second,third'

    IFS=',' read -r first second third <<EOF
    $string
    EOF

    # Works even with multiple delimiters!
    string="first,second:third"

    IFS=',:' read -r first second third <<EOF
    $string
    EOF
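
A detail worth checking for yourself: because read is a regular builtin, an assignment written as a prefix on the same command line applies to that one invocation only. The following sketch (old_ifs is just an illustrative name) verifies that IFS in the calling shell is left untouched.

```shell
#!/bin/sh
string='first item,second,third'

# Remember the current IFS so it can be compared afterwards.
old_ifs=$IFS

# The IFS=',' prefix is in effect only while this read runs.
IFS=',' read -r first second third <<EOF
$string
EOF

printf '%s\n' "$first" "$second" "$third"

# IFS in the calling shell still has its previous value.
if [ "$IFS" = "$old_ifs" ]; then
    echo 'IFS unchanged'
fi
```

This is why there is no need to save and restore IFS around the read.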

If you think that the syntax is a bit messy, nothing prevents you from writing a small wrapper:

    # split STRING DELIMITERS VAR_NAME...
    # Note: "local" is not part of POSIX, although most shells support it.
    split()
    {
        local string IFS

        string="$1"
        IFS="$2"
        shift 2
        # The remaining arguments are the names of the variables to fill.
        read -r -- "$@" <<EOF
    $string
    EOF
    }

    string='first,second:third'
    split "$string" ',:' first second third

Yes, that's neat. But there are some things you should be aware of.

First, be aware that the read command splits fields for all but the last variable, which receives the rest of the string. So if you only need the first few fields, you must add an extra variable to absorb the remainder:

    string='first item,second one,third one,fourth one'

    # We want only the first two items.

    split "$string" ',' item1 item2
    # The problem is that “item2” contains “second one,third one,fourth one”.

    # We must provide an additional variable.
    split "$string" ',' item1 item2 garbage
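
The same behavior can be observed with read alone, without the wrapper. In this sketch, item2_greedy and rest are hypothetical names chosen for the illustration.

```shell
#!/bin/sh
string='first item,second one,third one,fourth one'

# Two variables: the last one (item2) swallows everything that is left.
IFS=',' read -r item1 item2 <<EOF
$string
EOF
item2_greedy=$item2
printf 'item2=%s\n' "$item2_greedy"   # item2=second one,third one,fourth one

# Three variables: the extra one (rest) absorbs the remainder.
IFS=',' read -r item1 item2 rest <<EOF
$string
EOF
printf 'item2=%s\n' "$item2"          # item2=second one
```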

The other problem is worse, because I have no solution for it: if you use a space as the delimiter, the shell discards empty fields, since a run of IFS whitespace counts as a single separator:

    split 'foo,,bar,baz' ',' first second third fourth
    # first → “foo”
    # second → “”
    # third → “bar”
    # fourth → “baz”

    # Two spaces between “foo” and “bar”: the empty field is lost.
    split 'foo  bar baz' ' ' first second third fourth
    # first → “foo”
    # second → “bar”
    # third → “baz”
    # fourth → “”
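
This does not change how the shell treats IFS whitespace, but one possible workaround is to translate each space into a non-whitespace character before splitting. The sketch below is only that, a sketch: it assumes the data never contains the ASCII unit separator (octal 037), it costs one external tr process, and the names sep and converted are illustrative.

```shell
#!/bin/sh
string='foo  bar baz'   # two spaces between foo and bar

# Assumption: the unit separator (octal 037) never appears in the data.
sep=$(printf '\037')

# Replace every space with the separator; tr is an external program.
converted=$(printf '%s' "$string" | tr ' ' "$sep")

# A non-whitespace delimiter preserves empty fields.
IFS=$sep read -r first second third fourth <<EOF
$converted
EOF

printf '[%s] [%s] [%s] [%s]\n' "$first" "$second" "$third" "$fourth"
# prints: [foo] [] [bar] [baz]
```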

As usual, the split function is implemented in my small shell library, jfsh.