Question: How to efficiently slice binary data in Ruby?



Answers 2
Added at 2017-01-02 20:01

After reading the SO post "Ruby: Split binary data", I used the following code, which works:

z = 'A' * 1_000_000
z.bytes.each_slice( STREAMING_CHUNK_SIZE ).each do | chunk |
  c = chunk.pack( 'C*' )
end

However, it is very slow:

Benchmark.realtime do
  # the slice-and-pack loop shown above
end
=> 0.0983949700021185

98ms to slice and pack a 1MB file. This is very slow.

Use Case:
Server receives binary data from an external API, and streams it using socket.write chunk.pack( 'C*' ).
The data is expected to be between 50KB and 5MB, with an average of 500KB.

So, how to efficiently slice binary data in Ruby?

Answers

Answer #1, added: 2017-01-02 23:01


Your code looks nice and uses the correct Ruby methods and syntax, but it still:

  • creates a huge Array of Integers
  • slices this big Array into multiple Arrays
  • packs those Arrays back into Strings


The following code extracts the parts directly from the string, without converting anything:

def get_binary_chunks(string, size)
  Array.new(((string.length + size - 1) / size)) { |i| string.byteslice(i * size, size) }
end

(string.length + size - 1) / size rounds the number of chunks up, so the last chunk is not missed when it is smaller than size.
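As a small illustration of the rounding, here is a sketch with a toy 7-byte string and a chunk size of 3 (both values are made up for the example). The method name byte_chunks is hypothetical, and it uses bytesize rather than length for the chunk count. That is my own assumption, not part of the answer: the two are equal for binary data, but bytesize is the count that matches byteslice if the string ever contains multibyte characters.

```ruby
# Sketch: split a string into chunks of at most `size` bytes.
# `byte_chunks` is a hypothetical name used only for this example.
def byte_chunks(string, size)
  # Ceiling division: 7 bytes with size 3 gives (7 + 2) / 3 = 3 chunks.
  count = (string.bytesize + size - 1) / size
  Array.new(count) { |i| string.byteslice(i * size, size) }
end

data = 'abcdefg'.b # toy binary string, 7 bytes
chunks = byte_chunks(data, 3)
p chunks # => ["abc", "def", "g"]
```

Without the rounding, 7 / 3 would give 2 chunks and the trailing "g" would be lost.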


With a 500kB PDF file and chunks of 12345 bytes, Fruity reports:

Running each test 16 times. Test will take about 28 seconds.
_eric_duminil is faster than _b_seven by 380x ± 100.0

get_binary_chunks is also 6 times faster than StringIO#each(n) in this example.
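If you want to check the difference on your own machine without installing Fruity, a rough sketch with the standard Benchmark module could look like this. The synthetic 500kB string and the method names chunks_via_pack and chunks_via_byteslice are assumptions for the example, and the timings will vary with Ruby version and hardware, so no expected numbers are shown.

```ruby
require 'benchmark'

chunk_size = 12_345 # same chunk size as in the measurement above
data = ('A' * 500_000).b # synthetic stand-in for a 500kB file

# Original approach: String -> Array of Integers -> slices -> Strings.
def chunks_via_pack(string, size)
  string.bytes.each_slice(size).map { |chunk| chunk.pack('C*') }
end

# Direct approach: cut byte ranges straight out of the String.
def chunks_via_byteslice(string, size)
  count = (string.bytesize + size - 1) / size
  Array.new(count) { |i| string.byteslice(i * size, size) }
end

slow = Benchmark.realtime { chunks_via_pack(data, chunk_size) }
fast = Benchmark.realtime { chunks_via_byteslice(data, chunk_size) }
puts format('pack: %.5fs, byteslice: %.5fs', slow, fast)
```

A single Benchmark.realtime run is noisy, so treat the printed ratio as an indication only.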

Further optimization

If you're sure the string is binary (not UTF-8 with multibyte characters like 'ä'), you can use slice instead of byteslice:

def get_binary_chunks(string, size)
  Array.new(((string.length + size - 1) / size)) { |i| string.slice(i * size, size) }
end

which makes the code even faster (about 500x compared to your method).

If you use this code with a Unicode String, the chunks will have size characters but might have more than size bytes.
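To make the character/byte difference concrete, here is a small sketch with a three-character UTF-8 string whose first character is 'ä' (written as an escape so the example does not depend on the file encoding). The variable names are made up for the example.

```ruby
text = "\u00E4bc" # UTF-8: 'ä' takes 2 bytes, so 3 characters but 4 bytes

by_chars = text.slice(0, 2)     # first 2 characters
by_bytes = text.byteslice(0, 2) # first 2 bytes

p by_chars.length   # => 2
p by_chars.bytesize # => 3 (more bytes than the requested size)
p by_bytes.bytesize # => 2

# On a binary-encoded copy, characters and bytes coincide,
# so slice and byteslice return the same chunk.
binary = text.b
p binary.slice(0, 2) == binary.byteslice(0, 2) # => true
```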

Using the chunks directly

Finally, if you're not interested in getting an Array of Strings, you could use the chunks directly:

def send_binary_chunks(socket, string, size)
  ((string.length + size - 1) / size).times do |i|
    socket.write string.slice(i * size, size)
  end
end

Answer #2, added: 2017-01-02 23:01

Use StringIO#each(n) with a string that has BINARY encoding:

require 'stringio'
string.force_encoding(Encoding::BINARY)
StringIO.new(string).each(size) { |chunk| socket.write(chunk) }

This only allocates each intermediate chunk just before pushing it to the socket.
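Here is a sketch of the same idea that runs without a network connection. A second StringIO stands in for the socket, and the method name stream_in_chunks is hypothetical. It uses read(size) in a loop rather than each(size): as far as I can tell, each with only a limit still also splits at line separators, so chunks can come out shorter than size when the data contains newline bytes. That does not lose data, but read(size) gives fixed-size chunks.

```ruby
require 'stringio'

# Write `string` to `socket` in pieces of at most `size` bytes.
# `socket` only needs to respond to #write.
def stream_in_chunks(socket, string, size)
  io = StringIO.new(string.b) # .b returns a binary-encoded copy
  # read(size) returns nil at end of input, which ends the loop.
  while (chunk = io.read(size))
    socket.write(chunk)
  end
end

fake_socket = StringIO.new(String.new) # stand-in for a real socket
payload = "line one\nline two\n" * 1_000
stream_in_chunks(fake_socket, payload, 4096)
puts fake_socket.string.bytesize == payload.bytesize # => true
```

Note that string.b copies the input once; calling force_encoding on the original string instead avoids the copy but changes the caller's string.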
