Question: How can I quickly generate all permutations of a string in Ruby?

Question

How can I quickly generate all permutations of a string in Ruby?

Answers 5
Added at 2017-01-01 18:01
Tags
Question

I am currently using this function, and the code works exactly as it should.
self.chars.permutation.map(&:join).uniq.group_by(&:chr)

However, once the string is more than 10 characters, it takes a lot of time to generate all permutations. How could I generate permutations quicker?

Answers to

How can I quickly generate all permutations of a string in Ruby?

nr: #1 dodano: 2017-01-01 19:01

Perhaps lazy might be an option. It doesn't need as much memory as generation all permutations before checking for a special condition.

Something like:

'my_string'.chars.permutation.lazy.map(&:join).each do |permutation|    
  puts permutation if dictionary.include?(permutation)
end
nr: #2 dodano: 2017-01-01 19:01

If we look at Permutation we see the number of permutations of an eleven letter word with no letter repeated would be 39,916,800. However for MISSISSIPPI it is 11! / ( 1! * 4! * 4! * 2!) = 34,650. The first is going to take a long time however you do it, but if you can reduce the search space using repeating characters it might become more manageable. The standard permutation method does not remove repeats.

Searching for "ruby permutations without repetition" might turn up some algorithms.

nr: #3 dodano: 2017-01-01 20:01

I am currently using this function, and the code works exactly as it should.
self.chars.permutation.map(&:join).uniq.group_by(&:chr)

However, once the string is more than 10 characters, it takes a lot of time to generate all permutations. How could I generate permutations quicker?

You can't. Well, maybe there are ways of speeding it up a little, but there is really no point: the number of permutations is much too large. For just 25 characters, even if we assume that you can generate one permutation for every CPU cycle, even if we assume that you have a 5GHz CPU, even if we assume that your CPU has 100 cores, even if we assume that the work can be perfectly distributed among those cores, it will still take close to one million years to generate. There's just that many of them.

In short: there is no point in even trying to speed up your algorithm. You need to get away from generating permutations altogether.

nr: #4 dodano: 2017-01-01 21:01

Theory

No need for permutations :

  • Sort the letters in your string
  • Sort the letters in every word in the dictionary
  • Look for same sorted letters
  • Done!

Implementation

class String
  def sorted_letters
    downcase.chars.sort.join
  end
end

class AnagramFinder
  @dict = '/usr/share/dict/american-english'
  class << self
    def get_anagrams(word)
      sorted_dict[word.sorted_letters]
    end

    def all
      sorted_dict.values.select { |anagrams| anagrams.size > 1 }
    end

    def sorted_dict
      @sorted_dict ||= generate_sorted_dict
    end

    private

    def generate_sorted_dict
      File.foreach(@dict).with_object(Hash.new { |h, k| h[k] = [] }) do |word, sorted_dict|
        word.chomp!
        sorted_dict[word.sorted_letters] << word
      end
    end
  end
end

p AnagramFinder.get_anagrams('impressiveness')
#=> ["impressiveness", "permissiveness"]
p AnagramFinder.get_anagrams('castor')
#=> ["Castor", "Castro", "Croats", "actors", "castor", "costar", "scrota"]
p AnagramFinder.all.last(5)
#=> [["wist", "wits"], ["withers", "writhes"], ["woodworm", "wormwood"], ["wriest", "writes"], ["wrist", "writs"]]
p AnagramFinder.all.max_by(&:length)
#=> ["Stael", "Tesla", "least", "slate", "stale", "steal", "tales", "teals"]

This example needed 0.5s on my slowish server, and most of it was spent building the sorted dictionary. Once it is done, the lookup is almost instantaneous.

"impressiveness" has 14 characters, you would need a very long time to generate all the permutations (14! = 87178291200).

nr: #5 dodano: 2017-01-01 21:01

Rather than computing all permutations of each word, a better approach is to first create a hash from the dictionary, whose keys are strings sorted by character and whose values are arrays containing all words in the dictionary which are anagrams of the key. The arrays are empty when the word contains no anagrams in the dictionary (other than itself).

words      = %w| god act bat tar a lion stop |
  #=> ["god", "act", "bat", "tar", "a", "lion", "stop"]
dictionary = %w| cat dog a fowl bat god act lion pig donkey loin post pots
                 spot stop tops| 
  #=> ["cat", "dog", "a", "fowl", "bat", "god", "act", "lion", "pig",
  #    "donkey", "loin", "post", "pots", "spot", "stop", "tops"]

h = dictionary.each_with_object(Hash.new { |h,k| h[k] = [] }) do |w,h|
  h[w.each_char.sort.join] << w
end
  #=> {"act"=>["cat", "act"], "dgo"=>["dog", "god"], "a"=>["a"], "flow"=>["fowl"],
  #    "abt"=>["bat"], "ilno"=>["lion", "loin"], "gip"=>["pig"], "deknoy"=>["donkey"],
  #    "opst"=>["post", "pots", "spot", "stop", "tops"]} 

We can then obtain all the anagrams of each word in words by sorting the word on its characters and seeing whether that is a key in the hash.

words.each_with_object({}) do |w,g|
  key = w.downcase.chars.sort.join
  values = h.key?(key) ? (h[key]-[w]) : []
  g[w] = values
end
  #=> {"god"=>["dog"], "act"=>["cat"], "bat"=>[], "tar"=>[], "a"=>[],
  #    "lion"=>["loin"], "stop"=>["post", "pots", "spot", "tops"]} 
Source Show
◀ Wstecz