Ruby/String/grep
From Wiki.crossplatform.ru
Revision as of 17:10, 26 May 2010
Anchors anchor a pattern to the beginning (^) or end ($) of a line
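opening = "This is a test. \nThis is a test. \n"
# lines.grep stands in for String#grep, which exists only in Ruby 1.8
puts opening.lines.grep(/^This is/)  # line starts with "This is" (patterns are case-sensitive)
puts opening.lines.grep(/test\. $/)  # line ends with "test. "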
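As a minimal sketch of how the two anchors differ, the helper below filters the lines of a string. The name `lines_matching` and the sample log text are hypothetical, introduced only for illustration, and `lines.grep` is assumed here as the equivalent of `String#grep` on Ruby 1.9 and later.

```ruby
# Hypothetical helper: return the lines of +text+ that match +pattern+.
def lines_matching(text, pattern)
  text.lines.grep(pattern)
end

log = "error: disk full\nwarning: low memory\nfatal error\n"

p lines_matching(log, /^error/)  # lines that begin with "error"
p lines_matching(log, /error$/)  # lines that end with "error"
```

The same word is selected on different lines depending only on which anchor is used.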
A pair of square brackets ([]) matches any single character listed in the brackets
opening = "this is a test\nthis is a test,\n"
opening.lines.grep(/th[ai]s/)  # [ai] matches one character, a or i, so "this" matches
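A bracket expression can also hold a range. The following sketch, with a hypothetical helper name `all_matches` and made-up sample text, uses `String#scan` to list every match so the effect of the brackets is visible.

```ruby
# Hypothetical helper: collect every substring of +text+ that matches +pattern+.
def all_matches(text, pattern)
  text.scan(pattern)
end

words = "this thus then that"

p all_matches(words, /th[iu]s/)    # "th", then i or u, then "s"
p all_matches(words, /th[a-e]\w/)  # "th", then any letter from a to e, then a word character
```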
Braces ({}) specify an exact number of repetitions of the previous pattern, such as \d{3} or \d{4}
phone = "(555)123-4567"
# Parentheses, not brackets, group the optional area code
phone.lines.grep(/(\(\d{3}\))?\d{3}-\d{4}/) # => ["(555)123-4567"]
\d represents a digit; it is the same as using [0-9].
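A short sketch of counted repetition with \d, assuming the phone format used above. The predicate name `phone_number?` is hypothetical, and the pattern is anchored with \A and \z so that the whole string must match.

```ruby
# Hypothetical helper: true if +text+ is exactly a phone number such as
# "(555)123-4567" or "123-4567"; the area code in parentheses is optional.
def phone_number?(text)
  !!(text =~ /\A(\(\d{3}\))?\d{3}-\d{4}\z/)
end

p phone_number?("(555)123-4567")
p phone_number?("123-4567")
p phone_number?("12-4567")  # only two digits before the hyphen
```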
# Like ^, the shortcut \A matches at the beginning of a string.
# grep tests each line as a separate string, so here \A matches at the start of every line.
opening = "This is a test. \nThis is a test. This is a test. ,\n"
opening.lines.grep(/\AThis is/)
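The difference between ^ and \A only shows when the pattern is applied to a whole multi-line string rather than line by line. A minimal sketch, with a hypothetical helper `first_match_index` and sample text chosen for illustration:

```ruby
# Hypothetical helper: index of the first match of +pattern+ in +text+, or nil.
def first_match_index(text, pattern)
  text =~ pattern
end

text = "first line\nsecond line\n"

p first_match_index(text, /^second/)   # ^ also matches right after a newline
p first_match_index(text, /\Asecond/)  # \A matches only at the start of the string
p first_match_index(text, /\Afirst/)
```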
Find a phone number
phone = "(555)123-4567"
# Parentheses, not brackets, group the optional area code
phone.lines.grep(/(\(\d\d\d\))?\d\d\d-\d\d\d\d/) # => ["(555)123-4567"]
Grouping uses parentheses to group a subexpression, like this one that contains an alternation
opening = "this is a test\nthis is a test,\n"
puts opening.lines.grep(/t(hi|e)s/)  # matches "this" or the "tes" in "test"
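Besides grouping, parentheses also capture the text they matched. The sketch below, with a hypothetical helper `captured_groups`, shows this through `MatchData#captures` and contrasts it with a non-capturing group (?:..).

```ruby
# Hypothetical helper: the texts captured by the groups of +pattern+ in the
# first match within +text+, or an empty array when there is no match.
def captured_groups(text, pattern)
  m = pattern.match(text)
  m ? m.captures : []
end

p captured_groups("this is a test", /t(hi|e)s/)  # the group captured "hi"
p captured_groups("grey", /gr(a|e)y/)            # the group captured "e"
p captured_groups("grey", /gr(?:a|e)y/)          # (?:..) groups without capturing
```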
? is a repetition operator
It indicates zero or one occurrence of the previous pattern.
color = "color colour"
color.lines.grep(/colou?r/)  # the u is optional, so both spellings match
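To see that ? allows at most one occurrence, the following sketch scans for both spellings. The helper name `color_words` and the misspelled sample word are hypothetical.

```ruby
# Hypothetical helper: every whole word in +text+ spelled "color" or "colour".
def color_words(text)
  text.scan(/\bcolou?r\b/)
end

p color_words("color colour colouur")  # "colouur" has two u's and is skipped
```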
Match alternate forms of a pattern using the pipe character (|)
opening = "this is a test \nthis is a test ,\n"
puts opening.lines.grep(/th|te/)  # matches "th" or "te"
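The pipe has the lowest precedence of all the operators, so anchors bind to a single alternative unless the alternation is grouped. A minimal sketch, with a hypothetical predicate `matches?` and sample words chosen for illustration:

```ruby
# Hypothetical helper: true if +pattern+ matches anywhere in +text+.
def matches?(text, pattern)
  !!(text =~ pattern)
end

# /^cat|dog$/ reads as "starts with cat" OR "ends with dog".
p matches?("catalog", /^cat|dog$/)
# Grouping applies both anchors to each alternative.
p matches?("catalog", /^(cat|dog)$/)
p matches?("dog", /^(cat|dog)$/)
```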
Match lines just by using a word in the pattern
opening = "this is a test\n this is another test,\n"
p opening.lines.grep(/is/)  # both lines contain "is", so both are returned
Regular Expressions
Pattern             Description
/pattern/options    Pattern in slashes, followed by optional options, one or more of: i for case-insensitive; o to interpolate #{...} only once; x to ignore whitespace and allow comments; m to match across multiple lines, treating newlines as normal characters.
%r!pattern!         General delimited regular expression, where ! can be an arbitrary delimiter character.
^                   Matches beginning of line.
$                   Matches end of line.
.                   Matches any character (except newline, unless the m option is set).
\1...\9             Matches nth grouped subexpression.
\10                 Matches nth grouped subexpression if already matched; otherwise, refers to octal representation of a character code.
\n, \r, \t, etc.    Matches character in backslash notation.
\w                  Matches word character; same as [0-9A-Za-z_].
\W                  Matches nonword character; same as [^0-9A-Za-z_].
\s                  Matches whitespace character; same as [ \t\n\r\f].
\S                  Matches nonwhitespace character; same as [^ \t\n\r\f].
\d                  Matches digit; same as [0-9].
\D                  Matches nondigit; same as [^0-9].
\A                  Matches beginning of a string.
\Z                  Matches end of a string, or before newline at the end.
\z                  Matches end of a string.
\b                  Matches word boundary outside [] or backspace (0x08) inside [].
\B                  Matches nonword boundary.
\G                  Matches point where last match finished.
[..]                Matches any single character in brackets, such as [ch].
[^..]               Matches any single character not in brackets.
*                   Matches zero or more of the previous regular expression.
*?                  Matches zero or more of the previous regular expression (nongreedy).
+                   Matches one or more of the previous regular expression.
+?                  Matches one or more of the previous regular expression (nongreedy).
{m}                 Matches exactly m of the previous regular expression.
{m,}                Matches at least m of the previous regular expression.
{m,n}               Matches at least m but at most n of the previous regular expression.
{m,n}?              Matches at least m but at most n of the previous regular expression (nongreedy).
?                   Matches zero or one of the previous regular expression.
|                   Alternation, such as color|colour.
( )                 Groups regular expressions or subexpressions, such as col(o|ou)r.
(?#..)              Comment.
(?:..)              Groups without back-references (without remembering matched text).
(?=..)              Lookahead: matches at a position that is followed by the pattern, without consuming it.
(?!..)              Negative lookahead: matches at a position that is not followed by the pattern.
(?>..)              Matches independent pattern without backtracking.
(?imx)              Toggles i, m, or x options on.
(?-imx)             Toggles i, m, or x options off.
(?imx:..)           Toggles i, m, or x options on within parentheses.
(?-imx:..)          Toggles i, m, or x options off within parentheses.
(?ix-ix: )          Turns on (or off) i and x options within this noncapturing group.

POSIX character classes (used inside a bracket expression, such as [[:digit:]]):
[:alnum:]           Alphanumeric characters.
[:alpha:]           Uppercase and lowercase letters.
[:blank:]           Space and tab.
[:cntrl:]           Control characters.
[:digit:]           Digits.
[:graph:]           Printable characters (but not space).
[:lower:]           Lowercase letters.
[:print:]           Printable characters (space included).
[:punct:]           Punctuation: printable characters other than space and alphanumerics.
[:space:]           Whitespace.
[:upper:]           Uppercase letters.
[:xdigit:]          Hex digits: A-F, a-f, and 0-9.
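A few of the entries above are easier to read in use. The following is only a sketch: the constant names and sample strings are hypothetical, chosen for illustration, and each line exercises one entry from the table.

```ruby
# i option: ignore case.
CASELESS = /ruby/i

# x option: whitespace and comments inside the pattern are ignored.
ISO_DATE = /
  \A (\d{4})    # year
  -  (\d{2})    # month
  -  (\d{2}) \z # day
/x

# %r with ! as the delimiter, so slashes need no escaping.
USR_PATH = %r!\A/usr/local/!

# (?=..) lookahead: digits only when followed by "USD", which is not consumed.
USD_AMOUNT = /\d+(?=USD)/

# POSIX class inside a bracket expression.
LETTERS = /[[:alpha:]]+/

p("RUBY" =~ CASELESS)
p ISO_DATE.match("2010-05-26").captures
p("/usr/local/bin" =~ USR_PATH)
p "total: 100USD"[USD_AMOUNT]
p "abc 123 def".scan(LETTERS)
p "<a><b>"[/<.+>/]   # greedy: takes as much as possible
p "<a><b>"[/<.+?>/]  # nongreedy: stops at the first ">"
```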
Similar to $, the shortcut \z matches the end of a string, not a line
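opening = "This is a test. This is a test. \nThis is a test. This is a test. ,\n"
# grep tests each line with its trailing newline, and \z matches only after that newline
opening.lines.grep(/,\n\z/)  # only the second line ends with a comma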
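The three end anchors can be compared side by side on a whole string. In this sketch the helper name `end_anchor_report` and its hash keys are hypothetical. It reports whether a word is found directly before $, \Z, and \z.

```ruby
# Hypothetical helper: for each end anchor, report whether +word+ is found
# immediately before it somewhere in +text+.
def end_anchor_report(text, word)
  w = Regexp.escape(word)
  {
    :dollar  => !!(text =~ /#{w}$/),   # end of any line
    :upper_z => !!(text =~ /#{w}\Z/),  # end of string, or before a final newline
    :lower_z => !!(text =~ /#{w}\z/)   # very end of string only
  }
end

text = "one\ntwo\n"

p end_anchor_report(text, "one")
p end_anchor_report(text, "two")
```

Only $ finds "one", since it ends a line but not the string, while "two" is found by $ and \Z but not by \z because a newline still follows it.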
The plus sign (+) operator indicates one or more of the previous pattern
phone = "(555)123-4567"
# Parentheses, not brackets, group the optional area code
phone.lines.grep(/(\(\d+\))?\d+-\d+/) # => ["(555)123-4567"]
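As a small sketch of what + matches, the hypothetical helper `digit_runs` below extracts each run of one or more digits. The last two lines contrast + with *, which also accepts zero occurrences.

```ruby
# Hypothetical helper: every run of one or more consecutive digits in +text+.
def digit_runs(text)
  text.scan(/\d+/)
end

p digit_runs("(555)123-4567")

p("abc" =~ /\d+/)  # + needs at least one digit, so there is no match
p("abc" =~ /\d*/)  # * accepts zero digits, so it matches the empty string at index 0
```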