Gotcha with find_each and find_in_batches

Rails 2.3 added a couple of nice new methods - find_each and find_in_batches. Both methods accomplish the same thing in a slightly different way. Unlike a normal finder these methods grab objects in batches instead of all at once. For instance, if you have 500,000 users, you don't want to do the following:

User.find(:all).each { |user| user.some_method }

The reason is that you just loaded all 500,000 records into memory, and your server is not happy. Instead, you could do:

User.find_each { |user| user.some_method }

By default, the above will only load 1,000 User objects into memory, and your server will thank you. If 1,000 is too big/small for you, use the :batch_size option to change it. The find_in_batches method is similar except that it provides the array to the block instead of one object at a time. For example:

User.find_in_batches do |users|
  users.each { |user| user.some_method }

If you ever used the wonderful will_paginate gem, you are probably familiar with the concept from the paginated_each method that will_paginate provided.

So, what is the problem? The problem is that you have to aware that unlike paginate_each, find_each and find_in_batches work by setting up a with_scope block. Therefore, if you need to do any other finds on that same model, the scope will apply. Usually this only affects relationships, but it isn't hard to forget. Here is an example:

# This is a purely made up example
class User < ActiveRecord::Base
  # There is a last_login_at attribute
  named_scope :recent_login,
              lambda { |*args|
              { :conditions => ["people.last_login_at >= ?", (args.first || 1.week.ago)] } }
  belongs_to :parent, :class_name => "User", :foreign_key => "parent_id"
User.recent_login.find_each do |user|
  parent = user.parent # This will include the recent_login scope.

It's no different than other with_scope issues, but it isn't as obvious. You can get around it by doing:

User.recent_login.find_each do |user|
  # Got use send because with_excusive_scope is protected.
    parent = user.parent # This will include the recent_login scope

Now, go and be kind to your server with find_each and find_in_batches - just remember the scope.