__ _
-o)/ / (_)__ __ ____ __ Derek Winterstien
/\\ /__/ / _ \/ // /\ \/ / r.o.a.c.h.@.r.o.b.o.t.z...c.o.m
_\_v __/_/_//_/\_,_/ /_/\_\
Creation Date: Thu Apr 22 12:43:36 CDT 2004 current ver 0.10
------------------------------------------------------------------------------
REGULAR EXPRESSIONS- notes collection and general reference including examples
------------------------------------------------------------------------------
This reference applies to vi/vim and grep/egrep for the most part. It is
useful to be familiar with some basic vi conventions. Use CNTRL-V for vi to
accept ASCII control characters such as carriage return. For example, if you
wish to add CR's in an html file at the beginning of every table row tag <tr>
you would type :%s/<tr>/{CNTRL-V}{CR}<tr>/
In {} brackets are key combinations: you hold down the control key and press
v, then release control and press the Enter key. Within vi the control
character is displayed as ^M, so on the terminal screen a command of this
kind would appear as :1,$s/<\/td>/^M<\/td>/
Single character matching is the principle on which vi operates. To match
every 'w' in this document only if the 'w' is the first character in a line of
text, type:
/^w
The slash is the vi search character (refer to the vi command reference); the
caret ^ is the part of the regular expression that indicates the beginning of
the line.
^ Match the beginning of a line
$ Match the end of a line
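The same two anchors can be tried outside vi. What follows is only a minimal
sketch using Python's re module, with a made-up three-line sample text. Note
one difference from vi: Python needs the re.MULTILINE flag before ^ and $
apply to every line instead of to the string as a whole.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "useful notes\nnot so useful\nwait here"

# ^ anchors the match at the beginning of a line.
starts_with_w = re.findall(r"^w", text, flags=re.MULTILINE)

# $ anchors the match at the end of a line.
ends_with_useful = re.findall(r"useful$", text, flags=re.MULTILINE)

print(starts_with_w)     # the 'w' of "wait" only
print(ends_with_useful)  # the 'useful' that ends the second line only
```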
Typing /^useful causes vi to match any occurrence of the string 'useful' only
when it is at the beginning of a line. It is a pattern of single characters,
or single-character patterns.
It is possible to group characters in a set. [ and ] represent a group
pattern with a list of characters inside. For example, /^[abc] will match any
occurrence of the letter 'a', 'b', or 'c' individually and at the beginning of
a line. /^[abc][abc] tells vi to match any two characters that each
individually are a, b, or c and that start at the beginning of a line (such as
the 'ac' in 'accept').
Ranges are also possible. To match any lowercase letter at the beginning of a
line of text type /^[a-z]
To match all digits anywhere in the document type /[0-9]
To match any alphabetic character, upper or lowercase, at the beginning of
every line type /^[a-zA-Z]
[abc]           Is a single-character pattern that matches
                either the letter a, b or c
[ab0-9]         Is a single-character pattern that matches
                either a or b or a digit in the ASCII range
                from zero to nine
[a-zA-Z0-9\-]   This matches a single character that
                is either an upper case or lower case
                letter, a digit or the minus sign.
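To see sets and ranges at work outside the editor, here is a small sketch in
Python's re module. The list of sample lines is invented for illustration,
and re.search stands in for the vi / search command.

```python
import re

# Hypothetical sample lines, invented for this illustration.
lines = ["accept", "Banana", "7 apples", "-dash", "dog", "#tag"]

# /^[abc][abc]: two leading characters, each one of a, b or c.
two_from_set = [s for s in lines if re.search(r"^[abc][abc]", s)]

# /^[a-z]: a lowercase letter at the beginning of the line.
lower_start = [s for s in lines if re.search(r"^[a-z]", s)]

# ^[a-zA-Z0-9\-]: a letter, a digit or the minus sign as first character.
plain_start = [s for s in lines if re.search(r"^[a-zA-Z0-9\-]", s)]

print(two_from_set)
print(lower_start)
print(plain_start)
```

Only "#tag" is left out of the last list, because '#' is not in the set.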
Inverted sets are also possible using a set definition with "[^" instead of
"[". Placed directly after the opening bracket, the ^ no longer means the
beginning of the line; it inverts the set.
[0-9]           Is a single-character pattern that matches
                a digit in the ASCII range from zero to nine.
[^0-9]          Match any single NON-digit character.
[^abc]          Match any single character that is not an
                a, b or c.
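A quick way to check what an inverted set picks up is to list every match in
a short string. This is a sketch in Python's re module; the sample string is
made up.

```python
import re

# Hypothetical sample string, invented for this illustration.
sample = "room 42b"

# [0-9] matches each digit on its own.
digits = re.findall(r"[0-9]", sample)

# [^0-9] matches each character that is NOT a digit, the space included.
non_digits = re.findall(r"[^0-9]", sample)

print(digits)
print(non_digits)
```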
There are special characters such as the '.' dot wildcard and the '*'
multiplier. You may be accustomed to using the * asterisk as a wildcard, but
this is _not_ the case in regular expressions.
.   matches one occurrence of anything except a newline character
*   multiplier: the preceding single-character pattern may occur zero or
    more times
-   indicates a range (inside a [ ] set)
Special characters can be expressed literally by using a backslash. Preceding
a special character with a backslash, such as \. will cause the '.' to be
taken as its literal meaning and not as its reserved function characteristic.
To match lines that have a space as the second character, whatever the first
character is, type the following in vi (the backslash is followed by a space):
/^.\ 
To do the same for lines with any number as the third character type:
/^..[0-9]
Matches for lines that start with anything other than 'a':
/^[^a]
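The three position-based searches above, plus the backslash escape, can be
compared side by side. This is a minimal sketch in Python's re module with
invented sample lines; in Python a plain space needs no backslash.

```python
import re

# Hypothetical sample lines, invented for this illustration.
lines = ["a b c", "ab9cd", "apple", "x.y"]

# A space as the second character (any first character).
space_second = [s for s in lines if re.search(r"^. ", s)]

# /^..[0-9]: a digit as the third character.
digit_third = [s for s in lines if re.search(r"^..[0-9]", s)]

# /^[^a]: the first character is anything other than 'a'.
not_a_first = [s for s in lines if re.search(r"^[^a]", s)]

# \. is a literal dot; a bare . would match any character at all.
literal_dot = [s for s in lines if re.search(r"\.", s)]

print(space_second, digit_third, not_a_first, literal_dot)
```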
Now to get multipliers involved let's take a look at some matches where
anything can be in the middle. Match any line beginning with 'a', followed by
any number of any characters in the middle, and ending at the last 'the' in
the line:
/^a.*the
Notice how the first occurrence of the word 'the' is passed over and the match
continues to the very last occurrence of the string 'the'? A multiplier is
greedy: it will swallow up everything it can, up to the last possible match.
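The greedy behaviour is easy to observe by printing what was actually
matched. A small sketch in Python's re module, using an invented sentence
that contains 'the' twice:

```python
import re

# Hypothetical sample line, invented for this illustration.
line = "a cat saw the dog chase the bird away"

# .* grabs as much as it can, so the match runs to the LAST 'the'.
greedy = re.search(r"^a.*the", line).group(0)

print(greedy)
```

The matched text ends at the second 'the', not the first, and " bird away"
is left outside the match.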
For more complicated search and replace operations, it becomes necessary to
stuff some of the text string or a single character into memory. Parentheses
are a memory construct in regular expressions. What is enclosed in them is
remembered and used later on. In the vi/vim editor the parentheses syntax
must include backslashes.
Memory constructs are not that useful for simple searches such as those we
have demonstrated above. They are, however, absolutely necessary for complex
search and replace operations.
In an html document I have several images. I need to change the extension of
every image that represents an indexed part from 'jpg' to 'gif', and not alter
any other image names. It seems that indexed parts in our example always
start with the string 'gm' and are followed by a sequence of numbers of
variable length, and then concluded with the '.jpg' extension. Example:
Since we do not wish to modify any other image tags in the html document, we
must be careful how we construct our regular expression for pattern matching.
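To make the idea concrete, here is one possible way to express such a
replacement. It is a sketch in Python's re module, not the vi command from
these notes; the markup and file names are invented. Python writes the memory
construct as ( ), where vi/vim needs \( and \).

```python
import re

# Hypothetical markup; the file names are invented for this illustration.
html = '<img src="gm1042.jpg"> <img src="logo.jpg"> <img src="gm7.jpg">'

# (gm[0-9]+) remembers 'gm' plus its digits; \1 puts that text back,
# so only the extension after it changes.
converted = re.sub(r"(gm[0-9]+)\.jpg", r"\1.gif", html)

print(converted)
```

Because the pattern insists on 'gm' followed by digits, logo.jpg is left
untouched.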
:%s/\(
Add a line break where each table row starts and concludes:
:%s/<tr>/{CNTRL-V}{CR}<tr>/g
:%s/<\/tr>/{CNTRL-V}{CR}<\/tr>/g
By default, vi/vim will substitute only the first match on each line. The 'g'
at the end tells vi to substitute every match on the line. In vi the
substitution command :%s/ / /gc is used. The percent refers to the ex range
'whole file' and can be replaced by any appropriate range. E.g. in vim you
type shift-v, mark an area and then use the substitution on that area only. I
won't explain more about vim here as that would be a tutorial of its own. The
'gc' form is the interactive version, which asks for confirmation before each
replacement. The non-interactive form is :%s/ / /g
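The difference the 'g' makes can be imitated in Python's re module, as a
rough sketch on an invented line. Note that the default is the other way
round there: re.sub replaces every match unless a count is given.

```python
import re

# Hypothetical sample line, invented for this illustration.
line = "<td>one</td><td>two</td>"

# Like :s/<td>/<TD>/ without the g flag: only the first match changes.
first_only = re.sub(r"<td>", "<TD>", line, count=1)

# Like :s/<td>/<TD>/g: every match on the line changes.
every_match = re.sub(r"<td>", "<TD>", line)

print(first_only)
print(every_match)
```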
Now to create a line break {CR} for each <td> table cell tag:
:%s/<td>/{CNTRL-V}{CR}<td>/g
:%s/<\/td>/{CNTRL-V}{CR}<\/td>/g
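Starting each table cell on a new line can also be sketched outside vi. The
following uses Python's re module on an invented table row; "\n" plays the
role of the {CR} typed in vi.

```python
import re

# Hypothetical table row, invented for this illustration.
row = "<tr><td>one</td><td>two</td></tr>"

# Put a line break in front of every opening cell tag.
broken = re.sub(r"<td>", "\n<td>", row)

print(broken)
```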
To add some indentation in the html source code:
:%s/<tr>/ <tr>/g
:%s/<\/tr>/ <\/tr>/g
:%s/<\/td>/ <\/td>/g
:%s/<td>/ <td>/g
Change all tags that reference some old graphics directory to use our /images
subdirectory:
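Deleting every tag on a line:
:%s/<.*>//g      # Matches TOO much! It sees "..."
                   as one big /<.*>/ (starts with "<" and ends with ">").
:%s/<.\{-}>//g   # Solves the greediness problem by using the shortest
                   possible match. In vim the non-greedy multiplier is
                   written \{-}; Perl writes the same pattern as <.*?>
:%s/<[^>]*>//g   # Also works.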
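The three variants can be compared on one line of markup. This is a sketch in
Python's re module with an invented sample; Python uses the Perl-style .*?
for the shortest possible match.

```python
import re

# Hypothetical markup, invented for this illustration.
html = "<b>bold</b> and <i>italic</i>"

# Greedy: runs from the first "<" to the last ">", so everything goes.
too_much = re.sub(r"<.*>", "", html)

# Non-greedy: each match stops at the nearest ">".
shortest = re.sub(r"<.*?>", "", html)

# Negated set: anything but ">" between the brackets.
negated = re.sub(r"<[^>]*>", "", html)

print(repr(too_much), repr(shortest), repr(negated))
```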