Ruby/String/scan
From Wiki.crossplatform.ru
Revision as of 17:56, 13 September 2010; ViGOur
Accept dashes and apostrophes as parts of words.
class String
  def word_count
    frequencies = Hash.new(0)
    # Hyphens, apostrophes, and word characters all count as part of a word.
    downcase.scan(/[-'\w]+/) { |word| frequencies[word] += 1 }
    frequencies
  end
end
%{"this is a test."}.word_count
Anything that's not whitespace is a word.
class String
  def word_count
    frequencies = Hash.new(0)
    # \S matches any non-whitespace character, so punctuation stays attached.
    downcase.scan(/\S+/) { |word| frequencies[word] += 1 }
    frequencies
  end
end
%{"this is a test."}.word_count
A pretty good heuristic for matching English words.
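class String
  def word_count
    frequencies = Hash.new(0)
    # The pattern has two groups, so the block receives both; only the
    # outer group (the whole word) is counted.
    downcase.scan(/(\w+([-'.]\w+)*)/) { |word, ignore| frequencies[word] += 1 }
    frequencies
  end
end
%{"this is a test."}.word_count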
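The two block parameters in the snippet above follow from how scan treats capture groups: when the pattern contains groups, each match is returned as an array with one entry per group rather than as a plain string. The sketch below illustrates this on an assumed sample sentence; the variable names are hypothetical.

```ruby
# Sample sentence chosen for illustration.
text = "a well-known test. it's fine"

# No groups: each match is a String.
plain = text.scan(/\w+/)

# With groups: each match is an Array holding one entry per group.
grouped = text.scan(/(\w+([-'.]\w+)*)/)
words = grouped.map { |word, _inner| word }

# Non-capturing groups (?:...) keep the grouping but return plain Strings.
direct = text.scan(/\w+(?:[-'.]\w+)*/)

p plain
p grouped
p words
p direct
```

Switching to a non-capturing group avoids the unused second block parameter, at the cost of a slightly denser pattern.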
Count words for a string with quotation marks
class String
  def word_count
    frequencies = Hash.new(0)
    downcase.scan(/\w+/) { |word| frequencies[word] += 1 }
    frequencies
  end
end
%{"I have no shame," I said.}.word_count
Extract numbers from a string
"The car costs $1000 and the cat costs $10".scan(/\d+/) do |x|
  puts x
end
Just like /\w+/, but doesn't consider underscore part of a word.
class String
  def word_count
    frequencies = Hash.new(0)
    # The + is needed so that whole runs of letters and digits are matched.
    downcase.scan(/[0-9A-Za-z]+/) { |word| frequencies[word] += 1 }
    frequencies
  end
end
%{"this is a test."}.word_count
Scan a here document
#!/usr/bin/env ruby
sonnet = <<129
this is a test
this is another test
129
# ^ and $ anchor at the start and end of each line of the here document.
result = sonnet.scan(/^this/)
result << sonnet.scan(/test$/)
puts result
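In Ruby regular expressions, ^ and $ match at the start and end of every line, not only at the ends of the whole string, which is what makes scanning a multi-line text line by line possible. The following is a minimal sketch of that behaviour; the sample text and variable names are illustrative assumptions, not part of the original page.

```ruby
# Sample multi-line text; the content is illustrative only.
text = "this is a test\nthis is another test\n"

# ^ matches at the start of each line: first word of every line.
first_words = text.scan(/^\w+/)

# $ matches at the end of each line: last word of every line.
last_words = text.scan(/\w+$/)

# \A anchors to the start of the whole string instead of each line.
very_first = text.scan(/\A\w+/)

p first_words
p last_words
p very_first
```

Use \A and \z rather than ^ and $ when a match must be tied to the whole string.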
Scan as split
text = "this is a test."
puts "Scan method: #{text.scan(/\w+/).length}"
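The difference between the two approaches is that scan collects what matches the pattern, while split keeps what lies between the separators. A small sketch comparing the two on the same string (the variable names are illustrative):

```ruby
text = "this is a test."

# scan collects runs of word characters, so punctuation is dropped.
scanned = text.scan(/\w+/)

# split with no argument cuts on whitespace and keeps everything else,
# so the trailing period stays attached to the last word.
pieces = text.split

puts "Scan method: #{scanned.length}"
puts "Split method: #{pieces.length}"
p scanned
p pieces
```

Both calls report four words here, but the last element differs: scan yields "test" and split yields "test.".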
Scan for \w+
#!/usr/bin/env ruby
hamlet = "The slings and arrows of outrageous fortune"
hamlet.scan(/\w+/) # => [ "The", "slings", "and", "arrows", "of", "outrageous", "fortune" ]
Scan a string written with hex escapes
french = "\xc3\xa7a va"
french.scan(/./) { |c| puts c }
scan through all the vowels in a string: [aeiou] means "match any of a, e, i, o, or u."
"This is a test".scan(/[aeiou]/) { |x| puts x }
scan(/./u) on a string written with hex escapes
french = "\xc3\xa7a va"
french.scan(/./u) { |c| puts c }
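In Ruby 1.8 the u flag was what made . match a whole UTF-8 character instead of a single byte. From Ruby 1.9 on, strings carry their own encoding, so the distinction is easier to see by comparing characters with bytes. The sketch below assumes a reasonably modern Ruby (2.3 or later) and sets the encoding explicitly so the result does not depend on the source file.

```ruby
# Build the string with an explicit UTF-8 encoding.
french = String.new("\xc3\xa7a va", encoding: "UTF-8")

chars = french.scan(/./)  # one element per character
bytes = french.bytes      # one element per byte

puts chars.length  # 5 characters
puts bytes.length  # 6 bytes: the first character takes two
```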
Specify ranges of characters inside the square brackets
# This scan matches all lowercase letters between a and m.
"This is a test".scan(/[a-m]/) { |x| puts x }
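Character classes can list individual characters, give ranges, or be negated with a leading ^. The following sketch runs several such classes over the same sample string; the variable names are illustrative.

```ruby
text = "This is a test"

vowels     = text.scan(/[aeiou]/)     # any one of the listed characters
first_half = text.scan(/[a-m]/)       # range: lowercase a through m
capitals   = text.scan(/[A-Z]/)       # range: uppercase letters
consonants = text.scan(/[^aeiou\s]/)  # ^ inside the brackets negates the class

p vowels
p first_half
p capitals
p consonants
```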
Splitting Sentences into Words
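class String
  def words
    # A word starts with a word character and may continue with
    # word characters, apostrophes, or hyphens.
    scan(/\w[\w'\-]*/)
  end
end
"This is a test of words' capabilities".words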
Use \d to match any digit; the + that follows \d makes it match as many digits in a row as possible.
# With /\d/ alone, every digit would be matched separately.
"The car costs $1000 and the cat costs $10".scan(/\d+/) do |x|
  puts x
end