Kevin Trowbridge

Software Developer. Husband. I ❤ Webpages.

Heroku Necessities — Generate CSV Files in the Background With Delayed_job and Store Them on S3 With Paperclip


I’m trying to get back into technical blogging: I run into interesting situations every day … and I get so much out of other people doing the same thing.

In this case I’m moving a fairly large blog from a custom deployment platform on Engine Yard to Heroku. Heroku enforces a 30-second request timeout, so the web server can’t be used for heavy, long-running tasks like generating a large CSV file.

The solution is to move CSV generation into a background task and store the generated file on Amazon S3. Since in my case the data going into the CSV file is private, I also show how to configure Paperclip so that the generated file can be downloaded only by authenticated users, through short-lived signed URLs.

Here’s a brief (30-second) video showing the UI you can build by following these steps:

Heroku Necessities: generate CSV files in the background with “delayed_job” and store them on S3 with “paperclip” … from Kevin Trowbridge on Vimeo.

The Model: ExportedDataCsv

In my case I have a few large data sets stored in the database that need to be exportable from the system for reporting and administrative tasks. Think of the ‘Users’ table (the full list of users with email addresses, names, and so on) or the ‘Stories’ table (for a blog, every ‘story’ that has ever been written for the site). We’re going to turn the Users table into a CSV file and save it on Amazon S3. Because the file is generated in the background, this is stateful, so we’ll be storing some specific information about the file:

  • What’s its exact name?
  • When was it generated?
  • Is it actively generating right now, or is it available for download?

We’re using Paperclip to handle the mechanics of saving the file to S3, but we’ll need to set up a model in order to configure Paperclip, as well as to store that state.

app/models/exported_data_csv.rb
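class ExportedDataCsv < ActiveRecord::Base
  # Store on S3 over HTTPS. With a private ACL the file can only be fetched through
  # a signed (expiring) URL; "authenticated_read" would let any AWS account holder read it.
  has_attached_file :csv_file, :s3_protocol => 'https', :s3_permissions => :private

  # One record per CSV type, e.g. UsersCsv.instance
  acts_as_singleton

  # A job id means generation is queued or running.
  def generating?
    job_id.present?
  end

  def csv_file_exists?
    csv_file_file_name.present?
  end

  # Called from the controller: queue the background job and remember its id.
  def trigger_csv_generation
    job = Delayed::Job.enqueue GenerateCsvJob.new(:csv_instance => self)
    update_attribute(:job_id, job.id)
  end

  # Runs inside the delayed_job worker.
  def write_csv
    file = Tempfile.new([filename, '.csv'])
    begin
      file.write data_string
      file.rewind # flush buffered writes so Paperclip sees the whole file
      self.csv_file = file
      save
    ensure
      file.close
      file.unlink # deletes the temp file
    end
  end

  protected

  # Kevin says: override me in subclasses ...
  def filename
    'exported_data_csv_'
  end

  def data_string
    ''
  end
end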

Now that you’ve seen it, let’s discuss this model in more detail:

  • has_attached_file is the familiar way of configuring Paperclip. Here it stores the file on S3 over HTTPS with non-public read permissions.

  • acts_as_singleton: I’m only storing a single version of each exported CSV file, so I use the acts_as_singleton gem to make the model a singleton, fetched with UsersCsv.instance and so on.

  • generating? and csv_file_exists? are two methods I use in my view to determine the current state of the CSV file.

  • trigger_csv_generation is called from the controller action. It queues up the background job (which calls write_csv) and records the job’s id in job_id.

  • write_csv does the actual work: it writes the CSV string into a Tempfile, which is then handed off to Paperclip and saved.

Then there are two methods to be overridden in subclasses, filename and data_string … oh yes, did I fail to mention? Since we are generating several distinct types of CSV files, each with its own name and data, I am using Rails’ single table inheritance (STI) to model them as a set of subclasses of ExportedDataCsv.
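Stripped of Rails, the base class is the template-method pattern: write_csv owns the Tempfile mechanics and calls hooks that each subclass overrides. Here is a plain-Ruby sketch of that shape, not the app’s actual code; ExportedCsvSketch, UsersCsvSketch, and the block passed to write_csv (standing in for the Paperclip assignment plus save) are hypothetical.

```ruby
require 'tempfile'

# Hypothetical plain-Ruby stand-in for ExportedDataCsv (no ActiveRecord, no Paperclip).
class ExportedCsvSketch
  # Template method: owns the Tempfile mechanics and calls the hooks below.
  # The block plays the role of `self.csv_file = file; save` in the real model.
  def write_csv
    file = Tempfile.new([filename, '.csv'])
    begin
      file.write(data_string)
      file.rewind # flush buffered writes and read from the start
      yield file
    ensure
      file.close
      file.unlink # delete the temp file
    end
  end

  protected

  # Hooks for subclasses to override.
  def filename
    'exported_data_csv_'
  end

  def data_string
    ''
  end
end

# Hypothetical subclass mirroring UsersCsv, with hard-coded data.
class UsersCsvSketch < ExportedCsvSketch
  protected

  def filename
    'users_'
  end

  def data_string
    "name,email\nAda,ada@example.com\n"
  end
end

# Usage: the block receives the temp file, as Paperclip would.
UsersCsvSketch.new.write_csv do |file|
  puts File.basename(file.path) # users_<random part>.csv
  puts file.read
end
```

Because the hooks are protected, callers can only go through write_csv, and each new export type stays a few lines long.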

CreateExportedDataCsv db migration

Here’s the migration that creates the exported_data_csvs table in the database.

  • The type string column is what makes single table inheritance work.
  • has_attached_file is Paperclip’s migration helper; it adds the csv_file_file_name, csv_file_content_type, csv_file_file_size, and csv_file_updated_at columns.
  • job_id is used to track the delayed_job and make the model’s generating? method work.
  • timestamps will keep track of when the record was created and last updated.

db/migrate/create_exported_data_csv.rb
class CreateExportedDataCsv < ActiveRecord::Migration
  def up
    create_table :exported_data_csvs do |t|
      t.timestamps
      t.has_attached_file :csv_file
      t.string :type
      t.integer :job_id
    end
  end

  def down
    drop_table :exported_data_csvs
  end
end
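With STI, Rails stores the subclass name in the type column and, when it loads a row, instantiates that class. The following is a toy plain-Ruby model of that dispatch, not Active Record’s implementation (which does much more); the StoriesCsv class, the ROWS data, and instantiate_row are hypothetical.

```ruby
# Toy model of STI dispatch; real Active Record does this inside its finders.
class ExportedDataCsv
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end
end

class UsersCsv < ExportedDataCsv; end
class StoriesCsv < ExportedDataCsv; end # hypothetical second export type

# Rows as they might come back from the exported_data_csvs table (made-up data).
ROWS = [
  { id: 1, type: 'UsersCsv', job_id: nil },
  { id: 2, type: 'StoriesCsv', job_id: 42 }
].freeze

# Instantiate the class named in the type column, or the base class if it's blank.
def instantiate_row(row)
  klass = row[:type].nil? ? ExportedDataCsv : Object.const_get(row[:type])
  klass.new(row)
end

# Each row comes back as its own subclass: UsersCsv, then StoriesCsv.
ROWS.each { |row| puts instantiate_row(row).class }

# A subclass "query" only sees its own rows, like WHERE type = 'UsersCsv'.
puts ROWS.count { |row| row[:type] == UsersCsv.name } # prints 1
```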

Subclassed Models

With the previous two files in place, defining a new type of CSV export is trivial:

app/models/users_csv.rb
class UsersCsv < ExportedDataCsv

  protected

  def filename
    'users_'
  end

  def data_string
    User.all.to_comma
  end
end

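data_string just has to return the CSV contents as a string. Here I use the comma gem, whose to_comma method builds that string from a `comma do … end` block declared on the User model. Please see https://github.com/crafterm/comma for more information on generating CSV files in Ruby with it.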
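If you’d rather not add the comma gem, Ruby’s standard csv library can build the same kind of string, and it handles quoting for you. A sketch, assuming users with name and email attributes; the UserRow struct and users_csv_string method are hypothetical stand-ins, and in a real subclass you’d iterate User.find_each instead of an array.

```ruby
require 'csv'

# Hypothetical stand-in for the User model.
UserRow = Struct.new(:name, :email)

# Builds the CSV text that data_string would return.
def users_csv_string(users)
  CSV.generate do |csv|
    csv << ['Name', 'Email'] # header row
    users.each { |u| csv << [u.name, u.email] }
  end
end

users = [UserRow.new('Ada Lovelace', 'ada@example.com'),
         UserRow.new('Smith, John', 'john@example.com')]
puts users_csv_string(users)
# Name,Email
# Ada Lovelace,ada@example.com
# "Smith, John",john@example.com
```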

GenerateCsvJob: The Delayed Job

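We use the now-standard delayed_job gem to hand off the long-running task (the write_csv method in the base model).

Here’s my job file. Rails 3 no longer autoloads lib/ by default, so make sure lib/delayed_jobs is in config.autoload_paths (or require the file) so that the worker can find GenerateCsvJob: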

lib/delayed_jobs/generate_csv_job.rb
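class GenerateCsvJob < Struct.new(:options)
  # delayed_job calls #perform in the worker. The ActiveRecord instance in
  # options is serialized with the job and loaded back from the database here.
  def perform
    csv_instance = options[:csv_instance]
    begin
      csv_instance.write_csv
    ensure
      # Clear job_id even if generation fails, so generating? goes false
      # and the UI doesn't show "Generating CSV ..." forever.
      csv_instance.update_attribute(:job_id, nil)
    end
  end
end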

Credit goes to this Stack Overflow post, which was very helpful to me: http://stackoverflow.com/questions/5582017/polling-with-delayed-job
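The job_id column is a small handshake between the web process and the worker: it is set when the job is queued and cleared in ensure when the job finishes, whether or not generation succeeded. Here is a plain-Ruby sketch of that lifecycle, in which the hypothetical FakeQueue, CsvRecord, and GenerateJobSketch stand in for delayed_job, the model, and GenerateCsvJob.

```ruby
# Hypothetical stand-ins: no database, no delayed_job.
class CsvRecord
  attr_accessor :job_id, :contents

  def generating?
    !job_id.nil?
  end
end

class FakeQueue
  def initialize
    @jobs = []
    @next_id = 0
  end

  # Like Delayed::Job.enqueue: store the job and hand back an id.
  def enqueue(job)
    @next_id += 1
    @jobs << job
    @next_id
  end

  # Like a worker: run each queued job once, swallowing failures
  # (delayed_job would record the error and retry later).
  def work_off
    while (job = @jobs.shift)
      begin
        job.perform
      rescue StandardError
        nil
      end
    end
  end
end

# Mirrors GenerateCsvJob: always clear job_id, even if generation raises.
GenerateJobSketch = Struct.new(:record, :data) do
  def perform
    raise 'generation failed' if data.nil?
    record.contents = data
  ensure
    record.job_id = nil
  end
end

queue = FakeQueue.new
record = CsvRecord.new
record.job_id = queue.enqueue(GenerateJobSketch.new(record, "a,b\n"))
puts record.generating? # true: the UI shows "Generating CSV ..."
queue.work_off
puts record.generating? # false: the download link can be shown
```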

The Controller

The controller is pretty simple … there are two actions:

  1. generate_csv – queue up a new delayed job to generate the CSV file and immediately redirect_to :back.
  2. index – when the CSV format is requested, send the client to the S3 ‘expiring URL’ (valid for only 5 minutes) to download the CSV file, if it exists.

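def index
  respond_to do |format|
    format.html # the regular index page (omitted from this excerpt)
    format.csv do
      csv = UsersCsv.instance
      if !csv.csv_file_exists?
        redirect_to :back, :alert => "No CSV has been generated yet."
      elsif Rails.env[/production|demo/]
        # S3 storage: redirect to a signed URL that stops working after 5 minutes
        redirect_to csv.csv_file.expiring_url(5.minutes)
      else
        # Other environments presumably use Paperclip's local filesystem storage
        send_file csv.csv_file.path, :type => 'text/csv'
      end
    end
  end
end

def generate_csv
  # Only queues the job, so the request returns well within Heroku's 30-second limit
  UsersCsv.instance.trigger_csv_generation
  flash[:notice] = "We're generating your CSV file. Refresh the page in a minute or so to download it."
  redirect_to :back
end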
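The 5-minute link works because the URL carries its own expiry time plus a signature over it, so it can’t be extended or reused later. Paperclip delegates the real signing to the AWS SDK, and S3’s signing scheme is more involved; what follows is only a toy sketch of the idea using HMAC-SHA256 from Ruby’s openssl, with a hypothetical SECRET, path, and query-parameter names.

```ruby
require 'openssl'
require 'uri'

# Hypothetical secret; S3 uses your AWS secret key and its own signing scheme.
SECRET = 'hypothetical-secret-key'

# Sign the path together with the expiry time.
def sign(path, expires_at)
  OpenSSL::HMAC.hexdigest('SHA256', SECRET, "#{path}\n#{expires_at}")
end

# Build a URL that is valid for ttl seconds from now.
def toy_expiring_url(path, ttl, now: Time.now.to_i)
  expires_at = now + ttl
  "#{path}?expires=#{expires_at}&signature=#{sign(path, expires_at)}"
end

# Accept the URL only if it is unexpired and the signature matches.
def valid_url?(url, now: Time.now.to_i)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s).to_h
  expires_at = params['expires'].to_i
  # Use a constant-time comparison in real code to avoid timing leaks.
  now <= expires_at && params['signature'] == sign(uri.path, expires_at)
end

url = toy_expiring_url('/exports/users.csv', 5 * 60, now: 1_000)
puts valid_url?(url, now: 1_200) # true: within 5 minutes
puts valid_url?(url, now: 1_400) # false: expired
puts valid_url?(url.sub('expires=1300', 'expires=9999'), now: 1_400) # false: tampered
```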

routes.rb

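In the routes file we just need to add a custom collection route so the client can reach the generate_csv action we created in the controller. This gives you a generate_csv_users_path helper for POST /users/generate_csv; if your resources sit inside an admin namespace, as the view example below suggests, the helper becomes generate_csv_admin_users_path: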

  resources :users do
    collection do
      post :generate_csv
    end
  end

The last tricky bit … the view

The last tricky piece is the view. While a job is running, we just say so. Otherwise we check whether a CSV has been generated yet … if not, we let the user trigger the generation of a CSV file … if so, we show a link to it, but still let the user regenerate it, since it may be far out of date.

Since we’re building a framework that will allow us to have many different CSV files … we first create an abstracted partial that accepts a few local variables and that we can use all over our site:

app/views/common/_csv_generation_ui.html.erb

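<% if csv_object.generating? %>
  Generating CSV ...
<% else %>
  <% if csv_object.csv_file_exists? %>
    <%# Drop the random Tempfile suffix: "users_<suffix>.csv" becomes "users.csv" %>
    <% shortened_filename = csv_object.csv_file_file_name.slice(/(^.*)_/, 1) + '.csv' %>
    <%= link_to shortened_filename, download_path %>
    Last updated:
    <%# csv_file_updated_at is Paperclip's timestamp for the file itself (updated_at also moves when job_id changes); :viewable is presumably a custom Time::DATE_FORMATS entry %>
    <%= csv_object.csv_file_updated_at.to_s(:viewable) %>
  <% else %>
    No CSV exists.
  <% end %>
  <%# Outside the inner if, so the Generate link also appears when no CSV exists yet %>
  <%= link_to "#{csv_object.csv_file_exists? ? 'Update' : 'Generate'} CSV.", trigger_generation_path, :method => :post %>
<% end %>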

Here’s an example of how to call the partial:

<li>
  <%= link_to 'Users', admin_users_path %>
  <br/>
  Download all:
  <%= render :partial => '/common/csv_generation_ui',
             :locals => {:csv_object => UsersCsv.instance,
                         :trigger_generation_path => generate_csv_admin_users_path,
                         :download_path => admin_users_path(:format => :csv)} %>
</li>
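The partial boils down to three states plus a filename cleanup. Here is that branching as a plain-Ruby function, handy for unit-testing outside a view. csv_ui and its return hash are hypothetical; the no-CSV state offers a Generate link, as the ‘Update’/‘Generate’ ternary in the partial intends, and the filename cleanup assumes the stored name still contains the Tempfile’s underscore.

```ruby
# Hypothetical helper capturing the partial's branching, for testing outside a view.
def csv_ui(generating:, file_name:)
  return { status: 'Generating CSV ...' } if generating

  if file_name.nil? || file_name.empty?
    { status: 'No CSV exists.', action: 'Generate CSV.' }
  else
    # The greedy (.*) keeps everything before the LAST underscore, so the
    # Tempfile's random suffix is dropped: "users_<suffix>.csv" -> "users.csv".
    { link: file_name.slice(/(^.*)_/, 1) + '.csv', action: 'Update CSV.' }
  end
end

p csv_ui(generating: true, file_name: nil)[:status]  # "Generating CSV ..."
p csv_ui(generating: false, file_name: nil)[:action] # "Generate CSV."
p csv_ui(generating: false, file_name: 'users_20120315-1234-abc123.csv')[:link] # "users.csv"
```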

Summary

There are lots of moving parts in this scheme but once you get your head around it all, it’s a pretty straightforward pattern and a variant of this could be used in other situations as well. Enjoy and good luck!
