Ruby/String/grep

From Wiki.crossplatform.ru

Revision as of 17:56, 13 September 2010 by ViGOur
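
The snippets on this page call grep directly on a String. That works only in Ruby 1.8, where String includes Enumerable and iterates line by line; from Ruby 1.9 on, String no longer has a grep method. A minimal sketch of an equivalent for newer Rubies, assuming a hypothetical helper named grep_lines (it is not part of Ruby):

```ruby
# Hypothetical helper: emulates Ruby 1.8's String#grep on newer Rubies.
# String#lines splits the text into lines (each keeps its "\n"),
# and grep keeps only the lines that match the pattern.
def grep_lines(text, pattern)
  text.lines.grep(pattern)
end

opening = "This is a test.\nThis is only a test,\n"
p grep_lines(opening, /only/) # => ["This is only a test,\n"]
```

Calling text.lines.grep(pattern) inline does the same thing without a helper.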

Anchors anchor a pattern to the beginning (^) or end ($) of a line

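# String#grep exists only in Ruby 1.8; on Ruby 1.9+ grep the lines instead
opening = "This is a test.\nAnd so is this,\n"
puts opening.lines.grep(/^This is/) # prints "This is a test."
puts opening.lines.grep(/this,$/)   # prints "And so is this,"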



A pair of square brackets ([]) matches any single character in the brackets

opening = "this is a test\nthus ends the test,\n"
opening.lines.grep(/th[iu]s/) # => ["this is a test\n", "thus ends the test,\n"]
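
Brackets can also hold a range or a negated set. A short sketch with an invented word list (the constant name WORDS is only for illustration):

```ruby
# Sample words are made up for this example.
WORDS = %w[this thus then thaw]

p WORDS.grep(/th[iu]s/) # => ["this", "thus"]   either "i" or "u"
p WORDS.grep(/th[a-e]/) # => ["then", "thaw"]   any letter from "a" to "e"
p WORDS.grep(/th[^iu]/) # => ["then", "thaw"]   any character except "i" or "u"
```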



Braces ({}) specify an exact repetition count, such as \d{3} or \d{4}

phone = "(555)123-4567"
# the optional area code is a group, (...)?; square brackets would form a character class
phone.lines.grep(/(\(\d{3}\))?\d{3}-\d{4}/) # => ["(555)123-4567"]
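
Braces accept ranges as well as exact counts. A small sketch, using invented sample strings, of how {m}, {m,} and {m,n} differ:

```ruby
# Sample data is invented for illustration.
CODES = %w[7 42 555 4567 12345]

# \A and \z anchor the whole string, so only the digit count decides.
p CODES.grep(/\A\d{3}\z/)   # => ["555"]
p CODES.grep(/\A\d{3,}\z/)  # => ["555", "4567", "12345"]
p CODES.grep(/\A\d{2,4}\z/) # => ["42", "555", "4567"]
```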



\d represents a digit; it is the same as using [0-9].

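# Similar to ^, the shortcut \A matches the beginning of a string
opening = "This is a test.\nThis is only a test,\n"
# grep tests each line as its own string, so \A matches at the start of every line
opening.lines.grep(/\AThis is/) # => ["This is a test.\n", "This is only a test,\n"]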
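
Because grep looks at one line at a time, it cannot show how ^ and \A differ. A minimal sketch that matches against the whole string instead, assuming a hypothetical helper named count_matches:

```ruby
OPENING = "This is a test.\nThis is only a test,\n"

# Hypothetical helper: counts how often the pattern matches in the text.
def count_matches(text, pattern)
  text.scan(pattern).size
end

p count_matches(OPENING, /^This/)  # => 2, ^ matches at the start of each line
p count_matches(OPENING, /\AThis/) # => 1, \A matches only at the start of the string
```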



Find a phone number

phone = "(555)123-4567"
phone.lines.grep(/(\(\d\d\d\))?\d\d\d-\d\d\d\d/) # => ["(555)123-4567"]



Grouping uses parentheses to group a subexpression, like this one that contains an alternation

opening = "this is a test\nthus ends the test,\n"
puts opening.lines.grep(/th(i|u)s/) # prints both lines
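
Parentheses also capture the text they match. A short sketch, assuming a hypothetical helper named first_group, that shows which alternative was taken:

```ruby
# Hypothetical helper: returns the text captured by the first group,
# or nil when the pattern does not match at all.
def first_group(text, pattern)
  match = pattern.match(text)
  match && match[1]
end

p first_group("this is a test", /th(i|u)s/)     # => "i"
p first_group("thus ends the test", /th(i|u)s/) # => "u"
p first_group("no match here", /th(i|u)s/)      # => nil
```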



? is a repetition operator

# It indicates zero or one occurrence of the previous pattern.
color = "color colour"
color.lines.grep(/colou?r/) # => ["color colour"]
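
To see exactly which spellings u? admits, the pattern can be anchored and tried against single words. A brief sketch with an invented list (SPELLINGS is an illustrative name):

```ruby
# Sample spellings are made up; only the first two are real words.
SPELLINGS = %w[color colour colouur colr]

# u? allows at most one "u" between "colo" and "r".
p SPELLINGS.grep(/\Acolou?r\z/) # => ["color", "colour"]
```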



Match alternate forms of a pattern using the pipe character (|)

opening = "this is a test\nthis is a test,\n"
puts opening.lines.grep(/th|te/) # prints both lines
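
The pipe binds more loosely than anything next to it, which matters once anchors are involved. A small sketch with invented sample strings:

```ruby
# Sample strings are made up for illustration.
SAMPLES = ["this", "test", "attest", "then test"]

# Without grouping, ^ belongs only to "th": the pattern means (^th) or (te).
p SAMPLES.grep(/^th|te/)   # => ["this", "test", "attest", "then test"]

# With grouping, ^ applies to both alternatives.
p SAMPLES.grep(/^(th|te)/) # => ["this", "test", "then test"]
```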



Match the first line just by using a word in the pattern

opening = "this is a test\nand another one,\n"
p opening.lines.grep(/is/) # => ["this is a test\n"]



Regular Expressions

Pattern             Description
/pattern/options    A pattern in slashes, followed by optional options, one or more of: i for case-insensitive; o to perform #{} interpolation only once; x to ignore whitespace and allow comments; m to let . match newlines as well.
%r!pattern!         General delimited string for a regular expression, where ! can be an arbitrary character.
^                   Matches beginning of line.
$                   Matches end of line.
.                   Matches any character except newline (any character at all with the m option).
\1...\9             Matches nth grouped subexpression.
\10                 Matches nth grouped subexpression if already matched; otherwise, refers to octal representation of a character code.
\n, \r, \t, etc.    Matches character in backslash notation.
\w                  Matches word character; same as [0-9A-Za-z_].
\W                  Matches nonword character; same as [^0-9A-Za-z_].
\s                  Matches whitespace character; same as [ \t\n\r\f].
\S                  Matches nonwhitespace character; same as [^ \t\n\r\f].
\d                  Matches digit; same as [0-9].
\D                  Matches nondigit; same as [^0-9].
\A                  Matches beginning of a string.
\Z                  Matches end of a string, or before newline at the end.
\z                  Matches end of a string.
\b                  Matches word boundary outside [] or backspace (0x08) inside [].
\B                  Matches nonword boundary.
\G                  Matches point where last match finished.
[..]                Matches any single character in brackets, such as [ch].
[^..]               Matches any single character not in brackets.
*                   Matches zero or more of previous regular expressions.
*?                  Matches zero or more of previous regular expressions (nongreedy).
+                   Matches one or more of previous regular expressions.
+?                  Matches one or more of previous regular expressions (nongreedy).
{m}                 Matches exactly m number of previous regular expressions.
{m,}                Matches at least m number of previous regular expressions.
{m,n}               Matches at least m but at most n number of previous regular expressions.
{m,n}?              Matches at least m but at most n number of previous regular expressions (nongreedy).
?                   Matches zero or one of previous regular expression.
|                   Alternation, such as color|colour.
( )                 Groups regular expressions or subexpression, such as col(o|ou)r.
(?#..)              Comment.
(?:..)              Groups without back-references (without remembering matched text).
(?=..)              Positive lookahead: matches at a position followed by the pattern, without consuming it.
(?!..)              Negative lookahead: matches at a position not followed by the pattern.
(?>..)              Matches independent pattern without backtracking.
(?imx)              Toggles i, m, or x options on.
(?-imx)             Toggles i, m, or x options off.
(?imx:..)           Toggles i, m, or x options on within parentheses.
(?-imx:..)          Toggles i, m, or x options off within parentheses.
(?ix-ix: )          Turns on (or off) i and x options within this noncapturing group.
[:alnum:]           class for alphanumeric.
[:alpha:]           class for uppercase and lowercase letters.
[:blank:]           class for space and tab.
[:cntrl:]           class for Control characters.
[:digit:]           class for digits.
[:graph:]           class for printable characters (but not space).
[:lower:]           class for lowercase letter.
[:print:]           class for printable characters (space included).
[:punct:]           class for printable characters (but not space and alphanumeric).
[:space:]           class for whitespace.
[:upper:]           class for uppercase letters.
[:xdigit:]          class for hex digits: A-F, a-f, and 0-9.
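
A few of the table entries are easier to follow with concrete input. The sketch below uses invented sample strings and String#[] with a regexp, which returns the matched text or nil:

```ruby
HTML = "<b>bold</b> and <i>italic</i>"

# Greedy * takes as much as possible; *? stops at the first opportunity.
p HTML[/<.*>/]                     # => "<b>bold</b> and <i>italic</i>"
p HTML[/<.*?>/]                    # => "<b>"

# \1 refers back to whatever the first group captured.
p "hello hello world"[/(\w+) \1/]  # => "hello hello"

# (?=..) checks what follows without consuming it.
p "price: 100 USD"[/\d+(?= USD)/]  # => "100"

# (?!..) requires that the pattern does not follow.
p "foobar foobaz"[/foo(?!bar)\w+/] # => "foobaz"

# POSIX classes are written inside a bracket expression.
p "a1 b2 c3".scan(/[[:digit:]]/)   # => ["1", "2", "3"]

# The i option ignores case.
p "RUBY"[/ruby/i]                  # => "RUBY"
```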



Similar to $, the shortcut \z matches the end of a string, not a line

opening = "This is a test.\nThis is only a test,\n"
# each line keeps its trailing "\n", so chomp it before anchoring with \z
opening.lines.map(&:chomp).grep(/test,\z/) # => ["This is only a test,"]
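
The three end anchors $, \Z and \z behave differently around newlines. A minimal sketch with invented strings; =~ returns the index of the match or nil:

```ruby
LINE  = "only a test,\n"
LINES = "first,\nsecond\n"

p LINE =~ /test,$/        # => 7, $ matches before the newline
p LINE =~ /test,\Z/       # => 7, \Z tolerates one trailing newline
p LINE =~ /test,\z/       # => nil, \z needs the true end of the string
p LINE.chomp =~ /test,\z/ # => 7

p LINES =~ /first,$/      # => 0, "first," ends a line
p LINES =~ /first,\Z/     # => nil, but it does not end the string
```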



The plus sign (+) operator indicates one or more of the previous pattern

phone = "(555)123-4567"
phone.lines.grep(/(\(\d+\))?\d+-\d+/) # => ["(555)123-4567"]
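
Since + accepts any number of digits, it is looser than the brace counts used earlier. A short sketch comparing the two on invented inputs (LOOSE and STRICT are illustrative names):

```ruby
# Sample inputs are made up; only two look like real phone numbers.
PHONES = ["(555)123-4567", "1-2", "123-4567", "(5)1-2"]

LOOSE  = /\A(\(\d+\))?\d+-\d+\z/         # one or more digits in each part
STRICT = /\A(\(\d{3}\))?\d{3}-\d{4}\z/   # exact digit counts

p PHONES.grep(LOOSE)  # => ["(555)123-4567", "1-2", "123-4567", "(5)1-2"]
p PHONES.grep(STRICT) # => ["(555)123-4567", "123-4567"]
```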