Question: Rails/Ruby... 50mb string performance

Added at 2016-12-19 04:12

Question

I'm trying to send 1mm rows worth of data to the browser.

Unfortunately this is a situation where I can't use pagination.

The actual raw PG query takes only 3 seconds. I'm calling ActiveRecord::Base.connection.exec_query directly, so there's no Rails overhead from instantiating ActiveRecord models... the result is just a raw array.

I was originally calling .to_json on the result set from PG, but found that calling .to_s is something like 100x faster (and because it's a simple array I don't need anything fancier).
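
In other words, something along these lines (a simplified sketch; the table, columns and the render call are placeholders for whatever the real code does):

  # Placeholder query; the real one selects the ~1mm rows in question.
  result = ActiveRecord::Base.connection.exec_query(
    'SELECT id, payload, created_at FROM events'
  )

  # result.rows is a plain Array of Arrays (no ActiveRecord models).
  # Calling .to_s on it is much faster than .to_json, but it still builds
  # the entire ~50mb string in memory before anything is sent.
  body = result.rows.to_s
  render plain: body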

The problem is that the .to_s call now takes about 10 seconds, while the SQL query itself only takes 1-2.

Is there a better way I can send this much data in a response with a quick conversion to a string?

The other option I'm going to look into is using a background job to push this data to S3 and then streaming the raw data file to the browser from S3, but that still doesn't change the fact that I have to convert it to a string before uploading it to S3, and I assume that will be just as slow.

I'm a bit confused, honestly, because it's a lot of records, but it's still only 50mb. I don't get why this requires something like 400mb of memory (my Heroku dynos are marked as "vastly exceeding bandwidth limit"). How come Rails/Ruby/the dyno isn't intelligent about its memory when trying to stream a response this big? It literally just crashes itself...

Answers
nr: #1 added: 2016-12-19 05:12

I'm a bit confused, honestly, because it's a lot of records, but it's still only 50mb. I don't get why this requires something like 400mb of memory (my Heroku dynos are marked as "vastly exceeding bandwidth limit"). How come Rails/Ruby/the dyno isn't intelligent about its memory when trying to stream a response this big? It literally just crashes itself...

Because by default the entire SQL result set has to be loaded into Ruby: every row and every field becomes an object (built by the PostgreSQL adapter), and the result is then rewritten into yet another structure (an array of ActiveRecord models, each of which holds attributes for its fields, at minimum). That's a lot of memory overhead.

If you want to keep memory consumption low, you will want to use a PostgreSQL cursor, for example using the PostgreSQL cursor gem.
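
A minimal sketch of what that might look like, assuming the gem being referred to is postgresql_cursor, which adds cursor-backed iterators such as each_row to ActiveRecord scopes (model and columns here are placeholders):

  # Gemfile: gem 'postgresql_cursor'

  # Iterates the result set through a server-side cursor, fetching rows
  # in batches instead of materialising all 1mm rows in Ruby at once.
  Question.select(:id, :title).each_row do |row|
    # row is a plain Hash of column name => value
    handle(row)
  end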

If you want to speed up the process, avoid ActiveRecord altogether: use the PostgreSQL adapter directly and drive a cursor manually with the DECLARE <cursor> CURSOR FOR <query>, FETCH <rows> FROM <cursor> and CLOSE <cursor> commands, as sketched below.
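
A rough sketch of that manual approach, assuming a raw PG connection obtained from ActiveRecord (cursor name, query and batch size are placeholders):

  raw = ActiveRecord::Base.connection.raw_connection

  raw.exec('BEGIN')  # a plain cursor only lives inside a transaction
  raw.exec('DECLARE big_dump CURSOR FOR SELECT id, payload FROM events')

  loop do
    # Fetch the next batch of rows through the cursor
    result = raw.exec('FETCH 10000 FROM big_dump')
    break if result.ntuples.zero?

    result.each_row do |row|
      # handle one row (e.g. append it to the response or a file)
    end
  end

  raw.exec('CLOSE big_dump')
  raw.exec('COMMIT')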

EDIT: I fail at reading, I just realised you did, indeed, eschew ActiveRecord. :) Cursors will solve your memory problems; if the speed problems stem from memory paging this should help your speed as well.

nr: #2 added: 2016-12-19 05:12

How come Rails/Ruby/the dyno isn't intelligent about its memory when trying to stream a response this big? It literally just crashes itself...

For the same reason a balloon pops when you blow too much air into it!

Not only are you trying to send a lot of data at once, but you're also inadvertently creating a lot of new objects that clutter memory until the garbage collector has time to nuke 'em safely.

The best way to guarantee prompt memory cleanup, prevent the request from timing out on Heroku, and avoid overhead from additional container objects is to use HTTP streaming with a raw Postgres connection.

class DataDumpController < ApplicationController
  include ActionController::Live

  def send_data
    response.headers['Content-Type'] = 'text/event-stream'

    # Check out a connection and get at the raw PG::Connection behind it
    conn = ActiveRecord::Base.connection_pool.checkout
    raw_conn = conn.raw_connection

    # Send your query.
    #
    # Note you can actually send multiple queries here, but you'll
    # need to call "get_result" once for each of them.
    raw_conn.send_query('your query;')
    raw_conn.set_single_row_mode

    # Stream the results from the query to the client. Each row arrives
    # as an array of field values, so join it into a string before writing.
    raw_conn.get_result.stream_each_row do |row|
      response.stream.write(row.join(',') + "\n")
    end
  ensure
    # Always close the response stream and return the connection to the pool
    response.stream.close
    ActiveRecord::Base.connection_pool.checkin(conn) if conn
  end
end

Don't forget to add the appropriate route!

get 'data_dump' => 'data_dump#send_data'

nr: #3 added: 2016-12-19 05:12

I think you can save the result directly from Postgres to a temporary file and send it with Rails.

Example query:

ActiveRecord::Base.connection.execute(%(COPY (SELECT * FROM questions) TO '/tmp/result.txt'))

COPY TO documentation
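
A minimal sketch of the "send it with Rails" half, assuming a hypothetical controller action and that the Rails process can read the file the COPY wrote (COPY ... TO '<file>' writes on the database server, so this only works when the app and database share a filesystem):

  class DataDumpController < ApplicationController
    def send_dump
      # Have Postgres write the result set straight to a file
      # (table name and path are placeholders)
      ActiveRecord::Base.connection.execute(
        %(COPY (SELECT * FROM questions) TO '/tmp/result.txt')
      )

      # Hand the file to the client without building the 50mb string in Ruby
      send_file '/tmp/result.txt', type: 'text/plain', disposition: 'attachment'
    end
  end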
