Varying ways to do multi-line ‘grep’: multi-line pattern matching from the command line

March 21st, 2007

At the end of this article, I’ve pasted the contents of my ~/.vimrc file - this is the file I’m searching in all demonstrations.

You’ll notice I’m searching only one file with cat .vimrc. If you want more grep-like behavior across several files, you may want to do:
find . -type f -exec cat {} \; # recursively output all files, or:
find . -maxdepth 1 -type f -exec cat {} \; # non-recursive
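One caveat with piping several files through cat: every later command sees a single continuous stream, so a match that opens in one file can close in another. The sketch below uses the sed range command introduced in the next section and runs sed once per file instead. The directory, the file names and the BEGIN/END markers are made up for the demo.

```shell
# Hypothetical demo files: the range opens in a.txt but only closes in b.txt.
demo=$(mktemp -d)
printf 'one\nBEGIN\ntwo\n' > "$demo/a.txt"
printf 'three\nEND\nfour\n' > "$demo/b.txt"

# Concatenate first: sed sees one stream, so the range runs across the file boundary.
joined=$(cat "$demo/a.txt" "$demo/b.txt" | sed -n '/BEGIN/,/END/ p')

# One sed process per file: the range cannot leak from one file into the next.
per_file=$(find "$demo" -type f -exec sed -n '/BEGIN/,/END/ p' {} \;)

printf 'joined:\n%s\n\nper file:\n%s\n' "$joined" "$per_file"
```

In the per-file run, the range in a.txt simply runs to the end of that file, and b.txt prints nothing because it never matches BEGIN.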

So, carrying on, let’s say I want to find a multi-line string beginning with ‘smartindent’ and ending with ‘set nohl’:
cat .vimrc | sed -n '/smartindent/,/set nohl/ p'
The -n argument to sed tells it to suppress normal output. Inside sed, we use regular expressions instead of line numbers to delimit the ‘bounds’ for the p a.k.a. ‘print’ command.
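To make the two kinds of bounds concrete, here is a small sketch on a throwaway file (a made-up stand-in for .vimrc, not the real one). It prints the same range once by line number and once by regular expression, and shows what happens when -n is left out.

```shell
# A five-line stand-in for .vimrc (invented for this demo).
sample=$(mktemp)
printf '%s\n' 'set autoindent' 'set smartindent' 'set incsearch' \
    '  set nohlsearch' 'endif' > "$sample"

# Range delimited by line numbers: print lines 2 through 4.
by_number=$(sed -n '2,4 p' "$sample")

# The same range delimited by regular expressions.
by_regex=$(sed -n '/smartindent/,/set nohl/ p' "$sample")

# Without -n, sed echoes every line once and p prints the range a second time,
# so we count 5 + 3 lines of output.
without_n=$(sed '2,4 p' "$sample" | grep -c '')

printf '%s\n' "$by_regex"
```

Both range forms select the same three lines here. The regex form keeps working when lines are added above the match, which is why it is the better fit for grep-like use.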

I get this output:

set smartindent
set incsearch
set backspace=indent,eol,start
set showmode

if &t_Co > 2 || has("gui_running")
  syntax on
  set nohlsearch

You’ll notice that the returned string is multi-line, and it begins and ends at the correct lines. However, I didn’t really want the text before ‘smartindent’ on the first line or after ‘set nohl’ on the last line. That may not matter; in many cases this output is probably sufficient. If I really need the match without the leading or trailing text, I’ll do:
cat .vimrc | sed -n '/smartindent/,/set nohl/ p' | \
sed '1 s/.*\(smartindent.*\)/\1/; $ s/\(.*set nohl\).*/\1/'

Using shell (I use bash) variables, we can make this a bit easier to reuse (albeit a tad uglier):
begin='smartindent'; end='set nohl'; cat .vimrc | \
sed -n '/'"$begin"'/,/'"$end"'/ p' | \
sed '1 s/.*\('"$begin"'.*\)/\1/; $ s/\(.*'"$end"'\).*/\1/'

With either, we get better output than the first:

smartindent
set incsearch
set backspace=indent,eol,start
set showmode

if &t_Co > 2 || has("gui_running")
  syntax on
  set nohl

That’s a pretty good multi-line grep, if I really only care what the beginning and end of the string look like.
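If you use this often, the two sed calls can be wrapped in a shell function. This is only a sketch: the name mlgrep is made up, and the begin and end arguments are pasted into sed as basic regular expressions, so they must not contain an unescaped ‘/’.

```shell
# mlgrep BEGIN END [FILE...]  (hypothetical helper name, not a standard tool)
# The body runs in a subshell, so begin and end do not leak into the caller.
mlgrep() (
    begin=$1
    end=$2
    shift 2
    # First sed: print the line range. Second sed: trim the text before BEGIN
    # on the first line and the text after END on the last line.
    sed -n "/$begin/,/$end/ p" "$@" |
        sed "1 s/.*\($begin.*\)/\1/; \$ s/\(.*$end\).*/\1/"
)

# Demo on a made-up stand-in for .vimrc.
sample=$(mktemp)
printf '%s\n' 'set autoindent' 'set smartindent' 'set incsearch' '' \
    '  syntax on' '  set nohlsearch' '  set hl=l:Visual' > "$sample"

result=$(mlgrep 'smartindent' 'set nohl' "$sample")
printf '%s\n' "$result"
```

Double quotes are used around the sed scripts so the variables expand in place; the only character that then needs escaping for the shell is the ‘$’ that addresses the last line.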

I was confronted with another problem recently. How do you break input into multi-line chunks, and then find out which chunks match a certain pattern anywhere within them? For instance, notice that the .vimrc file is broken into sections separated by blank lines (double line breaks). What if I wanted to find and output only the sections that use the ‘set’ command? For this, I’ll use awk:
cat .vimrc | awk -v RS='\n\n' '{ if ($0 ~ /\<set\>/) print $0 "\n\n" }'
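By setting awk’s built-in ‘RS’ variable to the simple regular expression ‘\n\n’, we’re telling it to use a record separator other than its default ‘\n’. Note that a multi-character RS is an extension (gawk and mawk treat it as a regular expression; POSIX only specifies single-character separators), and the \< and \> word-boundary anchors are GNU extensions as well. Now we have multi-line strings that we can easily compare to a regular expression. Here’s the output: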

set autoindent
set nocompatible
set autoindent
set smartindent
set incsearch
set backspace=indent,eol,start
set showmode

if &t_Co > 2 || has("gui_running")
  syntax on
  set nohlsearch
  set hl=l:Visual
endif
set tabstop=4
set shiftwidth=4

"add < > to list of matchpairs
set matchpairs+=<:>
" show me my bracket match
set showmatch
" show me my matched pair FAST
set matchtime=2
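For awks that lack those extensions, a more portable sketch is awk’s paragraph mode: setting RS to the empty string makes one or more blank lines the record separator. The sample file and the bracket-expression stand-in for \< and \> are assumptions made for this demo, not part of the command above.

```shell
# Made-up stand-in for .vimrc: three sections separated by blank lines.
sample=$(mktemp)
printf '%s\n' 'set autoindent' 'set showmode' '' \
    '" commands to configure code2html' 'let html_mode = "reset"' '' \
    'set tabstop=4' > "$sample"

# RS='' switches awk to paragraph mode (records are blank-line separated).
# ORS='\n\n' puts a blank line back after every printed record.
# The pattern approximates a whole-word match for "set" without \< and \>,
# so the "reset" in the middle section does not count.
sections=$(awk -v RS='' -v ORS='\n\n' \
    '/(^|[^A-Za-z0-9_])set([^A-Za-z0-9_]|$)/' "$sample")

printf '%s\n' "$sections"
```

Only the first and third sections are printed. Because the pattern has no action attached, awk falls back to its default action of printing the whole record.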

Here’s my entire .vimrc file, just for reference:

set autoindent
set nocompatible
set autoindent
set smartindent
set incsearch
set backspace=indent,eol,start
set showmode

if &t_Co > 2 || has("gui_running")
  syntax on
  set nohlsearch
  set hl=l:Visual
endif
set tabstop=4
set shiftwidth=4

"add < > to list of matchpairs
set matchpairs+=<:>
" show me my bracket match
set showmatch
" show me my matched pair FAST
set matchtime=2

" commands to configure code2html operation
let html_use_css = 1
let html_output_xhtml = 1

" make it easier to do windowing stuff
map F ^W

" SVN vim plugin setup
cmap SL SVNLog
cmap NN bd^M:n^M
cmap SVD SVNVimDiff
cmap nnn :bd^M:bd^M:n^M

Greg | Bash, sysadmin

  1. August 13th, 2007 at 18:52 | #1

    If you want to you can use a full regular expression… for which you go to perl, which will let you
    set your record separator (or not set any:)
    -0:
    or none at all…
    -0
    auto loop (which sed and awk do on their own)
    -n
    use a full expression with group matching
    perl -0 -n -e '/(?s)smartindent(.*)nohl/ && print "$1\n"' someData
    Note the:
    (?s)
    it turns on “dotall” so that your new lines are matched…

    Where do I go for documentation on perl’s regular expressions? First to the regular expression documentation for Java, which notes its antecedents… Then pull an O’Reilly book off a shelf.