Bash Script Beautifier (Python)

Discussion

This is the second Bash script beautifier I have written. The first was written in Ruby and has become fairly well known. Since then I have switched from Ruby to Python for those tasks where it's appropriate, and eventually I decided to rewrite the Bash beautifier in Python and clean up some annoying inconsistencies in the process.

Beautifying Bash scripts is not trivial. Bash scripts aren't like C or Java programs — they have a lot of ambiguous syntax, and (shudder) keywords can be used as variables. Years ago, while testing the first version of this program, I encountered this example:
done=3;echo done;done
Same name, but three distinct meanings (sigh): a variable assignment, an argument to echo, and the keyword that closes a loop. The Bash interpreter can sort out this perversity, but I decided not to try to recreate the Bash interpreter just to beautify a script. This means there will be some border cases this Python program won't be able to process. But in tests with many large Linux system Bash scripts, its error-free score was roughly 99%.
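To see why keyword matching alone cannot settle an example like this, here is a minimal sketch that applies the outdent-keyword pattern from the program listing to a few lines. The helper name count_outdent_keywords is an assumption made for illustration; only the regular expression comes from the listing.

```python
import re

# Pattern taken from the program listing: an outdent keyword must be preceded
# by whitespace, the start of the line, or ';' and followed by ';', ')', '|',
# whitespace, or the end of the line.
OUTDENT = re.compile(r'(\s|\A|;)(esac|fi|done|elif)(;|\)|\||\Z|\s)')

def count_outdent_keywords(line):
    """Hypothetical helper: count the outdent keywords the pattern finds."""
    return len(OUTDENT.findall(line))

if __name__ == '__main__':
    for sample in ('done', 'done=3', 'echo done', 'done=3;echo done;done'):
        print('%-24s %d' % (sample, count_outdent_keywords(sample)))
```

The assignment done=3 is ignored, as it should be, but echo done is counted even though done is only an argument there. A pattern cannot tell the two apart without parsing the command, which is the kind of border case described above.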

BeautifyBash has three modes of operation:
If presented with a list of file names:
beautify_bash.py file1.sh file2.sh file3.sh
it processes each file in turn. When beautifying changes a file, it saves the original as a backup (e.g. file1.sh~) and overwrites the file with the beautified replacement.
If given '-' as a command-line argument, it will use stdin as its source and stdout as its sink:
beautify_bash.py - < infile.sh > outfile.sh
If imported as a module, it will behave itself and not execute its main() function:
#!/usr/bin/env python
# -*- coding: utf-8 -*-

from beautify_bash import BeautifyBash

[ ... ]

result,error = BeautifyBash().beautify_string(source)
BeautifyBash handles Bash here-docs very carefully (and there are probably some border cases it doesn't handle). The basic idea is that the originator knew what format he wanted in the here-doc, and a beautifier shouldn't try to outguess him. So BeautifyBash does all it can to pass along the here-doc content unchanged:
if true
then
   
   echo "Before here-doc"
   
   # Insert 2 lines in file, then save.
   #--------Begin here document-----------#
vi $TARGETFILE <<x23LimitStringx23
i
This is line 1 of the example file.
This is line 2 of the example file.
^[
ZZ
x23LimitStringx23
   #----------End here document-----------#
   
   echo "After here-doc"
   
fi
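The pass-through idea can be sketched in a few lines of Python. This is a simplified illustration, not the code from the listing: the helper name mark_here_doc_lines is hypothetical, the sketch skips the quote and comment stripping the listing performs first, and it assumes the terminator stands alone on its line (the listing uses a looser substring search).

```python
import re

# Simplified here-doc detection: "<<" or "<<-", optional whitespace,
# then an optionally quoted word that serves as the terminator.
HERE_DOC_START = re.compile(r'<<-?\s*[\'"]?(\w+)[\'"]?')

def mark_here_doc_lines(lines):
    """Hypothetical helper: return (line, inside_here_doc) pairs.

    Lines flagged True are the ones a beautifier should pass through
    unchanged, from the line that opens the here-doc to its terminator.
    """
    marked = []
    terminator = None  # None means we are not inside a here-doc
    for line in lines:
        if terminator is None:
            match = HERE_DOC_START.search(line)
            if match:
                terminator = match.group(1)
                marked.append((line, True))
            else:
                marked.append((line, False))
        else:
            marked.append((line, True))
            # Assumption: the terminator stands alone on its line.
            if line.strip() == terminator:
                terminator = None
    return marked

if __name__ == '__main__':
    script = [
        'echo "Before here-doc"',
        'vi $TARGETFILE <<x23LimitStringx23',
        'i',
        'ZZ',
        'x23LimitStringx23',
        'echo "After here-doc"',
    ]
    for line, protected in mark_here_doc_lines(script):
        print('%s %s' % ('keep  ' if protected else 'indent', line))
```

Because this sketch does not strip quoted strings first, a "<<" inside a quoted string or an arithmetic shift would be misread as a here-doc, which is one reason the listing pre-processes each line before testing it.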
As written, BeautifyBash can beautify large numbers of Bash scripts when called from ... well, among other things, a Bash script:
#!/bin/sh

for path in `find /path -name '*.sh'`
do
   beautify_bash.py $path
done
As well as the more obvious example:
$ beautify_bash.py *.sh
CAUTION: Because BeautifyBash overwrites all the files submitted to it, this could have disastrous consequences if the files include some of the increasingly common Bash scripts that have appended binary content (a regime where BeautifyBash's behavior is undefined). So please — back up your files, and don't treat BeautifyBash as though it is a harmless utility. That's only true most of the time.
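One way to reduce this risk is to screen files before submitting them. The following is only a heuristic sketch and not part of BeautifyBash: the function names are hypothetical, and it assumes that an appended binary payload contains at least one NUL byte, which is common but not guaranteed.

```python
def contains_nul_byte(data):
    """Heuristic only: a NUL byte almost never appears in a plain-text script."""
    return b'\0' in data

def safe_to_beautify(path):
    """Hypothetical pre-check for a file on disk.

    The whole file is read, not just its first bytes, because a binary
    payload is typically appended after the script text.
    """
    with open(path, 'rb') as f:
        return not contains_nul_byte(f.read())

if __name__ == '__main__':
    plain = b'#!/bin/sh\necho hello\n'
    with_payload = b'#!/bin/sh\necho hello\nexit 0\n\x7fELF\x00\x01\x02'
    print(contains_nul_byte(plain))
    print(contains_nul_byte(with_payload))
```

A file that fails this check should be left alone. A file that passes is not thereby proven safe, so the advice to keep backups still applies.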

Licensing, Source

BeautifyBash is released under the GNU General Public License.

Here is the plain-text source file without line numbers.

Revision History

Version 1.0 04/14/2011. Initial Public Release.

Program Listing

#!/usr/bin/env python
# -*- coding: utf-8 -*-

#**************************************************************************
#   Copyright (C) 2011, Paul Lutus                                        *
#                                                                         *
#   This program is free software; you can redistribute it and/or modify  *
#   it under the terms of the GNU General Public License as published by  *
#   the Free Software Foundation; either version 2 of the License, or     *
#   (at your option) any later version.                                   *
#                                                                         *
#   This program is distributed in the hope that it will be useful,       *
#   but WITHOUT ANY WARRANTY; without even the implied warranty of        *
#   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the         *
#   GNU General Public License for more details.                          *
#                                                                         *
#   You should have received a copy of the GNU General Public License     *
#   along with this program; if not, write to the                         *
#   Free Software Foundation, Inc.,                                       *
#   59 Temple Place - Suite 330, Boston, MA  02111-1307, USA.             *
#**************************************************************************

import re, sys

PVERSION = '1.0'

class BeautifyBash:

  def __init__(self):
    self.tab_str = ' '
    self.tab_size = 2

  def read_file(self,fp):
    with open(fp) as f:
      return f.read()

  def write_file(self,fp,data):
    with open(fp,'w') as f:
      f.write(data)

  # returns (beautified text, error flag); the flag is True when the
  # indent and outdent counts do not balance at the end of the input
  def beautify_string(self,data,path = ''):
    tab = 0
    case_stack = []
    in_here_doc = False
    defer_ext_quote = False
    in_ext_quote = False
    ext_quote_string = ''
    here_string = ''
    output = []
    line = 1
    for record in re.split('\n',data):
      record = record.rstrip()
      stripped_record = record.strip()

      # collapse multiple quotes between ' ... '
      test_record = re.sub(r'\'.*?\'','',stripped_record)
      # collapse multiple quotes between " ... "
      test_record = re.sub(r'".*?"','',test_record)
      # collapse multiple quotes between ` ... `
      test_record = re.sub(r'`.*?`','',test_record)
      # collapse multiple quotes between \` ... ' (weird case)
      test_record = re.sub(r'\\`.*?\'','',test_record)
      # strip out any escaped single characters
      test_record = re.sub(r'\\.','',test_record)
      # remove '#' comments
      test_record = re.sub(r'(\A|\s)(#.*)','',test_record,count=1)
      if(not in_here_doc):
        if(re.search('<<-?',test_record)):
          here_string = re.sub(r'.*<<-?\s*[\'|"]?([_|\w]+)[\'|"]?.*','\\1',stripped_record,count=1)
          in_here_doc = (len(here_string) > 0)
      if(in_here_doc): # pass on with no changes
        output.append(record)
        # now test for here-doc termination string
        if(re.search(here_string,test_record) and not re.search('<<',test_record)):
          in_here_doc = False
      else: # not in here doc
        if(in_ext_quote):
          if(re.search(ext_quote_string,test_record)):
            # provide line after quotes
            test_record = re.sub('.*%s(.*)' % ext_quote_string,'\\1',test_record,count=1)
            in_ext_quote = False
        else: # not in ext quote
          if(re.search(r'(\A|\s)(\'|")',test_record)):
            # apply only after this line has been processed
            defer_ext_quote = True
            ext_quote_string = re.sub('.*([\'"]).*','\\1',test_record,count=1)
            # provide line before quote
            test_record = re.sub('(.*)%s.*' % ext_quote_string,'\\1',test_record,count=1)
        if(in_ext_quote):
          # pass on unchanged
          output.append(record)
        else: # not in ext quote
          # count the tokens that open (inc) and close (outc) a block
          inc = len(re.findall(r'(\s|\A|;)(case|then|do)(;|\Z|\s)',test_record))
          inc += len(re.findall(r'(\{|\(|\[)',test_record))
          outc = len(re.findall(r'(\s|\A|;)(esac|fi|done|elif)(;|\)|\||\Z|\s)',test_record))
          outc += len(re.findall(r'(\}|\)|\])',test_record))
          if(re.search(r'\besac\b',test_record)):
            if(len(case_stack) == 0):
              sys.stderr.write(
                'File %s: error: "esac" before "case" in line %d.\n' % (path,line)
              )
            else:
              outc += case_stack.pop()
          # special handling for bad syntax within case ... esac
          if(len(case_stack) > 0):
            if(re.search(r'\A[^(]*\)',test_record)):
              # avoid overcount
              outc -= 2
              case_stack[-1] += 1
            if(re.search(';;',test_record)):
              outc += 1
              case_stack[-1] -= 1
          # an ad-hoc solution for the "else" keyword
          else_case = (0,-1)[re.search('^(else)',test_record) is not None]
          net = inc - outc
          # outdent before printing this line, indent after printing it
          tab += min(net,0)
          extab = tab + else_case
          extab = max(0,extab)
          output.append((self.tab_str * self.tab_size * extab) + stripped_record)
          tab += max(net,0)
        if(defer_ext_quote):
          in_ext_quote = True
          defer_ext_quote = False
        if(re.search(r'\bcase\b',test_record)):
          case_stack.append(0)
      line += 1
    error = (tab != 0)
    if(error):
      sys.stderr.write('File %s: error: indent/outdent mismatch: %d.\n' % (path,tab))
    return '\n'.join(output), error

  def beautify_file(self,path):
    error = False
    if(path == '-'):
      data = sys.stdin.read()
      result,error = self.beautify_string(data,'(stdin)')
      sys.stdout.write(result)
    else: # named file
      data = self.read_file(path)
      result,error = self.beautify_string(data,path)
      if(data != result):
        # make a backup copy
        self.write_file(path + '~',data)
        self.write_file(path,result)
    return error

  def main(self):
    error = False
    sys.argv.pop(0)
    if(len(sys.argv) < 1):
      sys.stderr.write('usage: shell script filenames or "-" for stdin.\n')
    else:
      for path in sys.argv:
        error |= self.beautify_file(path)
    sys.exit((0,1)[error])

# if not called as a module
if(__name__ == '__main__'):
  BeautifyBash().main()