Question: How to efficiently slice binary data in Ruby?

Answers 2
Added at 2017-01-02 20:01
Question

After reviewing the SO post "Ruby: Split binary data", I used the following code, which works.

z = 'A' * 1_000_000
z.bytes.each_slice( STREAMING_CHUNK_SIZE ).each do | chunk | 
  c = chunk.pack( 'C*' )
end

However, it is very slow:

Benchmark.realtime do
  ...
=> 0.0983949700021185

That is 98 ms to slice and pack 1 MB of data.

Use Case:
Server receives binary data from an external API, and streams it using socket.write chunk.pack( 'C*' ).
The data is expected to be between 50KB and 5MB, with an average of 500KB.

So, how to efficiently slice binary data in Ruby?

Answers
#1 added: 2017-01-02 23:01

Notes

Your code looks nice and uses the correct Ruby methods and syntax, but it still:

  • creates a huge Array of Integers
  • slices this big Array into multiple Arrays
  • packs those Arrays back into Strings
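
The three allocations listed above can be made visible on a small input. This is only a sketch: the helper name bytes_slice_pack, the 10-byte sample string and the chunk size of 4 are illustrative and do not come from the question.

```ruby
# Hypothetical helper that exposes each intermediate object of the
# bytes / each_slice / pack approach.
def bytes_slice_pack(data, chunk_size)
  bytes  = data.bytes                          # one Integer per byte
  slices = bytes.each_slice(chunk_size).to_a   # the big Array cut into small Arrays
  chunks = slices.map { |s| s.pack('C*') }     # each small Array packed back into a String
  [bytes, slices, chunks]
end

bytes, slices, chunks = bytes_slice_pack('A' * 10, 4)
p bytes.length          # 10
p slices.map(&:length)  # [4, 4, 2]
p chunks                # ["AAAA", "AAAA", "AA"]
```

For a 1 MB input, the first step alone allocates an Array of one million Integers before any chunk exists.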

Alternative

The following code extracts the parts directly from the string, without converting anything :

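def get_binary_chunks(string, size)
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end

(string.bytesize + size - 1) / size is a ceiling division: it avoids missing the last chunk when it is smaller than size. bytesize is used rather than length so that the chunk count matches the bytes that byteslice extracts.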
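
To see how the rounded-up chunk count behaves, here is a small sketch on short ASCII strings. It counts bytes with bytesize so that the count matches what byteslice extracts; the sample inputs and the chunk size of 4 are made up for illustration.

```ruby
def get_binary_chunks(string, size)
  count = (string.bytesize + size - 1) / size   # integer ceiling of bytesize / size
  Array.new(count) { |i| string.byteslice(i * size, size) }
end

p get_binary_chunks('abcdefghij', 4)       # 10 bytes: (10 + 3) / 4 = 3 chunks, last one short
p get_binary_chunks('abcdefgh', 4).length  # exact multiple: (8 + 3) / 4 = 2 chunks
p get_binary_chunks('', 4)                 # empty input: (0 + 3) / 4 = 0 chunks
```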

Performance

With a 500 kB PDF file and chunks of 12345 bytes, Fruity returns:

Running each test 16 times. Test will take about 28 seconds.
_eric_duminil is faster than _b_seven by 380x ± 100.0

get_binary_chunks is also 6x faster than StringIO#each(n) with this example.
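
A comparison along these lines can be reproduced with the standard library's Benchmark module instead of Fruity. The sketch below is only an approximation of the setup: it uses a synthetic 500,000-byte string rather than the PDF, the method names pack_chunks and byteslice_chunks are made up here, and the timings will differ from machine to machine.

```ruby
require 'benchmark'

# The approach from the question: Array of Integers, sliced, packed again.
def pack_chunks(string, size)
  string.bytes.each_slice(size).map { |chunk| chunk.pack('C*') }
end

# The approach from this answer: cut Strings directly out of the String.
def byteslice_chunks(string, size)
  Array.new((string.bytesize + size - 1) / size) { |i| string.byteslice(i * size, size) }
end

data = 'A' * 500_000   # synthetic stand-in for the 500 kB file (assumption)
chunk_size = 12_345    # chunk size quoted above

Benchmark.bm(12) do |x|
  x.report('bytes/pack:') { 5.times { pack_chunks(data, chunk_size) } }
  x.report('byteslice:')  { 5.times { byteslice_chunks(data, chunk_size) } }
end
```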

Further optimization

If you're sure the string is binary (not UTF-8 with multibyte characters like 'ä'), you can use slice instead of byteslice:

def get_binary_chunks(string, size)
  Array.new(((string.length + size - 1) / size)) { |i| string.slice(i * size, size) }
end

which makes the code even faster (about 500x compared to your method).

If you use this code with a Unicode String, the chunks will have size characters but might have more than size bytes.
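
The difference between characters and bytes is easy to check on a short string. A minimal sketch, assuming the default UTF-8 source encoding; the sample text is invented for illustration.

```ruby
text = "\u00E4" * 4                       # "ääää": 4 characters, 8 bytes in UTF-8

p text.length                             # 4
p text.bytesize                           # 8
p text.slice(0, 2).bytesize               # 4: two characters, but four bytes
p text.byteslice(0, 2).bytesize           # 2: exactly two bytes

# Once the string is tagged as binary, slice counts bytes as well.
binary = text.dup.force_encoding(Encoding::BINARY)
p binary.slice(0, 2).bytesize             # 2
```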

Using the chunks directly

Finally, if you're not interested in getting an Array of Strings, you could use the chunks directly :

def send_binary_chunks(socket, string, size)
  ((string.length + size - 1) / size).times do |i|
    socket.write string.slice(i * size, size)
  end
end
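
A quick way to try this without a network connection is to pass any object that responds to write. In the sketch below a StringIO stands in for the socket; the payload and the chunk size of 4 are illustrative only.

```ruby
require 'stringio'

# Same method as above, repeated so the sketch runs on its own.
def send_binary_chunks(socket, string, size)
  ((string.length + size - 1) / size).times do |i|
    socket.write string.slice(i * size, size)
  end
end

fake_socket = StringIO.new      # stand-in for a real socket (assumption)
payload = 'A' * 10

send_binary_chunks(fake_socket, payload, 4)
p fake_socket.string == payload   # everything written, in order
```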

#2 added: 2017-01-02 23:01

Use StringIO#each(n) with a string that has BINARY encoding:

require 'stringio'
string.force_encoding(Encoding::BINARY)
StringIO.new(string).each(size) { |chunk| socket.write(chunk) }

This only allocates each intermediate chunk (a String) just before pushing it to the socket.
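
Here is a self-contained sketch of the same idea, with a StringIO standing in for the socket. The method name stream_chunks is made up, and the string is duplicated first so that the caller's copy keeps its encoding. One caveat, going by the documented each_line behaviour: each(size) treats size as a byte limit per line, so a chunk may end early at a newline byte and be shorter than size. All bytes are still written in order.

```ruby
require 'stringio'

# Hypothetical helper: writes string to socket in pieces of at most size bytes
# and returns the byte size of every piece that was written.
def stream_chunks(socket, string, size)
  binary = string.dup.force_encoding(Encoding::BINARY)  # dup leaves the caller's string untouched
  sizes = []
  StringIO.new(binary).each(size) do |chunk|
    socket.write(chunk)
    sizes << chunk.bytesize
  end
  sizes
end

fake_socket = StringIO.new            # stand-in for a real socket (assumption)
sizes = stream_chunks(fake_socket, 'A' * 10, 4)
p sizes.all? { |n| n <= 4 }           # no chunk exceeds the limit
p fake_socket.string == 'A' * 10      # the full payload arrived
```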
