Ruby Background Tasks with Starling – Part 3

In my previous post, I went over the changes I made to Workling to add threading and allow it to run for long periods. Now, we need to deploy and monitor everything. In the Linux world, there are several options, but monit seems to be the most popular. However, I wanted to give god.rb a shot. God.rb is basically a clone of monit written in Ruby. What it’s written in doesn’t really matter to me, but being able to write my config files in Ruby was appealing. That’s one more configuration syntax I don’t have to know.
Installing god.rb is as simple as:

sudo gem install god

The god.rb web site has some decent documentation to help you understand how to build a config file. I tend to build as much of my configuration as I can from ERB templates rather than manually updating the config files for each environment. I build my Apache config file this way as well.

1. Download my god config template to lib/templates/god.conf.erb; it is based on an actual god.conf file. You will want to update it for your paths and to set up the mongrels and the listeners so they don’t run as root (running those as root is not a good idea). My Apache and MySQL scripts already run as a different user.
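
If you would rather write the template from scratch, here is a minimal sketch of what one watch stanza in god.conf.erb might look like. The deploy path, the user and group names, and the pid file location below are placeholder assumptions rather than my actual config; the parts that matter are w.uid/w.gid (so the listener does not run as root) and w.group, which the deploy tasks later in this post target by name. The rails_env tag is filled in by the Capistrano task in step 2.

# One watch from god.conf.erb (a sketch with placeholder paths and users)
RAILS_ROOT = "/var/www/myapp/current"

God.watch do |w|
  w.name     = "workling"
  w.group    = "listeners"   # "god restart listeners" acts on this group
  w.uid      = "deploy"      # don't run the listener as root
  w.gid      = "deploy"
  w.interval = 30.seconds
  w.start    = "RAILS_ENV=<%= rails_env %> #{RAILS_ROOT}/script/workling_client start"
  w.stop     = "RAILS_ENV=<%= rails_env %> #{RAILS_ROOT}/script/workling_client stop"
  w.start_grace = 10.seconds
  w.pid_file = "#{RAILS_ROOT}/log/workling.pid"   # wherever your listener writes its pid
  w.behavior(:clean_pid_file)

  # restart the listener whenever its process goes away
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running  = false
    end
  end
end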

2. Now, add the following to your config/deploy.rb file:

require 'erb'  # make sure ERB is available for the template rendering below

namespace :god do

  desc <<-DESC
  Generate the GOD configuration. We will create the appropriate GOD
  configuration stanza for the application and copy it to:

  /shared/path/god.conf

  Once it's there, somebody with the required permission to manage the
  GOD configuration should somehow incorporate that file and restart
  the GOD server. Each time the runtime configuration (ie number of
  mongrels running) changes, the configuration will have to be manually
  updated to match.
  DESC
  task :config, :roles => :app do
    rails_env = fetch(:rails_env, 'production')
    dispatcher_starting_port = fetch(:dispatcher_starting_port, 8000)
    dispatcher_instances = fetch(:dispatcher_instances, 3)
    dispatcher_ending_port = dispatcher_starting_port + dispatcher_instances - 1
    dispatcher_ports = dispatcher_starting_port..dispatcher_ending_port
    god_conf_path = "#{shared_path}/god.conf"
    god_conf_template = File.join(File.dirname(__FILE__), '..', 'lib', 'templates', 'god.conf.erb')
    begin
      god_conf = ERB.new(File.read(god_conf_template)).result(binding)
      put god_conf, god_conf_path, :mode => 0644
    rescue Exception => e
      abort "An error occurred in the GOD config generation: #{e}"
    end
  end
end
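
The local variables assigned in that task (rails_env, dispatcher_ports, and so on) are exactly what the template sees through binding. As a sketch (not my exact template), the mongrel section of god.conf.erb could loop over the port range like this, assuming the classic mongrel_rails flags, pid files under log/, and the RAILS_ROOT constant from the earlier sketch:

<%# Mongrel watches: rails_env and dispatcher_ports come from the binding %>
<%# passed in by the Capistrano task above. %>
<% dispatcher_ports.each do |port| %>
God.watch do |w|
  w.name     = "mongrel-<%= port %>"
  w.group    = "mongrels"
  w.interval = 30.seconds
  w.start    = "mongrel_rails start -c #{RAILS_ROOT} -e <%= rails_env %> -p <%= port %> -P #{RAILS_ROOT}/log/mongrel.<%= port %>.pid -d"
  w.stop     = "mongrel_rails stop -c #{RAILS_ROOT} -P #{RAILS_ROOT}/log/mongrel.<%= port %>.pid"
  w.pid_file = "#{RAILS_ROOT}/log/mongrel.<%= port %>.pid"
  w.behavior(:clean_pid_file)
  w.keepalive
end
<% end %>

Note that the #{RAILS_ROOT} bits pass through ERB untouched; they get interpolated later, when god loads the generated god.conf as Ruby.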

3. Execute cap god:config with the desired environment, and a new god.conf file will be copied to the shared directory on your server. I’m assuming you have capistrano 2.x here.
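
For example, with plain Capistrano 2 (no multistage), you can take the defaults or override the variables the task fetches with the -S flag:

# render and upload shared/god.conf with the defaults (rails_env=production)
cap god:config

# or override a variable, e.g. for a staging box
cap god:config -S rails_env=staging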

4. On your server, execute: sudo god -c /shared/path/god.conf

That’s it! Now, all your processes will be monitored by god.rb and restarted whenever there is a problem. Hold on though, we’re not completely done. You want god.rb to start up when your server boots, right? We also have to update our deploy process to use god.rb to start/stop our services. Try to use your existing spin or spawn tasks, and god.rb will fight with you.

First, let’s get god.rb set up as a service. I found this script in the god.rb sources, and I tweaked it a bit for my system.

#!/bin/bash
# god       startup script for god (like monit, only Ruby)
# Author: Dave Dupre

# Comments to support chkconfig on CentOS
# chkconfig: - 85 15
# description: god - monitor all my processes

CONF_DIR=/shared/path
LOG_DIR=/var/log
PID_DIR=/var/run/god
BIN_DIR=/usr/local/bin

RETVAL=0

# Go no further if config directory is missing.
[ -d "$CONF_DIR" ] || exit 0

case "$1" in
  start)
    $BIN_DIR/god -P $PID_DIR/god.pid -l $LOG_DIR/god.log -c $CONF_DIR/god.conf
    RETVAL=$?
  ;;
  stop)
    $BIN_DIR/god terminate
    RETVAL=$?
  ;;
  restart)
    $BIN_DIR/god terminate
    $BIN_DIR/god -P $PID_DIR/god.pid -l $LOG_DIR/god.log -c $CONF_DIR/god.conf
    RETVAL=$?
  ;;
  status)
    $BIN_DIR/god status
    RETVAL=$?
  ;;
  *)
  echo "Usage: god {start|stop|restartstatus}"
  exit 1
  ;;
esac

exit $RETVAL

Put the above contents into /etc/init.d/god and make it executable (sudo chmod +x /etc/init.d/god). Lastly, tell your system there is a new service in town. I use CentOS 5, so I run:

sudo chkconfig --add god
sudo chkconfig god on

Now, god.rb will start up whenever your server boots, and you can start/stop/restart it using standard service calls.
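
For example, on CentOS 5 that looks like:

sudo /sbin/service god start
sudo /sbin/service god status
sudo /sbin/service god restart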

The last thing we need to do is update our deployment process to use god.rb to stop/start our processes. Add this to your deploy.rb file:

namespace :deploy do
  [ :stop, :start, :restart ].each do |t|
    desc "#{t.to_s.capitalize} mongrels using god"
    task t, :roles => :app do
      sudo "god #{t.to_s} listeners"
      sudo "god #{t.to_s} mongrels"
    end
  end
end

namespace :starling do
  [ :stop, :start, :restart ].each do |t|
    desc "#{t.to_s.capitalize} starling using god"
    task t, :roles => :app do
      sudo "god #{t.to_s} starlings"
    end
  end
end

namespace :workling do
  [ :stop, :start, :restart ].each do |t|
    desc "#{t.to_s.capitalize} workling using god"
    task t, :roles => :app do
      sudo "god #{t.to_s} listeners"
    end
  end
end
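
With those tasks in place, the usual Capistrano commands drive god directly:

cap deploy:restart     # god restart listeners; god restart mongrels
cap starling:restart   # god restart starlings
cap workling:restart   # god restart listeners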

OK! Now, we’re done. You deploy as you normally would, and you have full control over Starling and the Workling listeners. Notice that there is no mongrel_cluster either: god.rb starts up all the mongrel instances I need and monitors everything, so there is no need for mongrel_cluster anymore.

That’s it for now. You have a system that will process all your background tasks and stay running. The only thing I didn’t set up here is notifications from god.rb when there is a problem. The god.rb config supports several schemes for email notifications. Take a peek at the docs and make sure god can talk to you.



32 responses to “Ruby Background Tasks with Starling – Part 3”

  1. sebastian

    Hi Dave,
    great series! I am searching for a solution for my background-processes at the moment and also tried backgroundrb. But your lightweight solution seems to be a big win regarding memory. The only thing that’s missing is the possibility to get the status of a background job back to the user.

    Do you have any idea how to accomplish this?

  2. Dave

    I haven’t had a need to do it yet, so I can’t comment other than to look at the code. Check out StarlingReturnStore in the Workling code. It looks like a worker sets a value into a starling queue, and the app does a get on the queue to retrieve it. It sure seems like a simple enough system. Just make sure you empty the queue from your app so you get the latest progress.
    If you have memcached already integrated into your app, then that is another way to share data between a worker and your application. Simply generate a key that both the worker and the app know, and then have the worker set values that the app can retrieve. This is very similar to the StarlingReturnStore except the value is stored in memory instead of a queue, and it will always be the latest.

  3. sebastian

    Okay, I’ll give it a try. Thanks!

  4. Harm

    Just wanted to say what an excellent series this was! I’m glad you took us through the entire process including God.

  5. Dave

    My pleasure. I hope to keep more like it coming.

  6. Julien

    Hi Dave,
    You mention that we should change the user… instead of root. Can you detail this?

    Thanks a lot for your great help!

  7. Dave

    If you run your application as root, then it has full access to your server. Generally, you should not have any process that’s available to outside users running as root. It’s also good protection for you as well. For instance, I usually set up grants for MySQL such that my application can’t drop or add tables, can’t truncate tables, and sometimes can’t even do a delete. This way, if I happen to miss an escape someplace, a bad user can’t trash my database. With the process not running as root, you ensure that the user can’t get to areas on your server where they don’t belong.

  8. Julien

    Thanks Dave for this, but my question was more: “how do you have god launching mongrel and the listener with a different user?” sudo -u app … ?

  9. Dave

    Sorry, I misunderstood. God.rb allows you to specify the user and group to execute a watch with. For example, you could add the following to your watch:
    w.uid = 'tom'
    w.gid = 'devs'

    Unfortunately, I had some issues with getting this to work in all cases with mongrel and workling. I really need to figure out why, but for now I use:

    w.start = "su - tom -c \"[insert your start command here]\""

    Please note that I’m starting to wonder about god.rb. It’s leaking memory (10-20Mb per week), so I end up restarting it periodically. I’m tempted to switch to monit because I know that guy will run forever. Apparently, there is a bug in ruby 1.8 that is causing the leak. See http://groups.google.com/group/god-rb/browse_thread/thread/1cca2b7c4a581c2 for more info.

  10. Julien

    I had the same memory problem with god… and I plan to stop using it and switch to monit… Have you switched yet? If so, could you please help us with a few hints on the config files to monitor both workling and starling with Monit?

  11. Dave

    I was going to switch because god was leaking memory, but when I upgraded god to 0.7.6, my memory leak went away. As a result, I stopped investigating making a switch to monit.

  12. Dave

    If you want true fault tolerance, you could still use workling, but you might want to swap out starling for JMS (ActiveMQ with ActiveMessaging works very well). You would have to write some code in workling to handle the queue. In a year of hard running, I have never once had starling go down. The queue is persisted, so even if it did, you would not lose anything in the queue. However, since there are no transactions, you will lose anything that was in the process of being sent or retrieved.
    For the Inquisix jobs that have to run, I use a backing database table to keep the state of each job. With that, I can add a create method on my worker to restart anything that didn’t complete.

    For example:

    # Initialize by restarting any jobs that didn’t complete
    def create
      jobs = ImportJob.find_in_state(:all, :processing) # acts_as_state_machine
      jobs.each { |job| ContactWorker.asynch_import_contacts(:job_id => job.id) }
    end

    It’s not as safe as JMS, but between the persistence of starling and calling create when started up again, it’s pretty close.

    Does this help?

  13. Neil

    Dave, thanks for these posts, they’re great – in fact, they’re so good, I’ve subscribed to your blog.
    I’ve been in the middle of an Engine Yard deployment for a few weeks now – most of the delays have been due to factoring in BackgroundRb (EY use Monit, and they do a whole load of ground work for you regarding deployment recipes etc, BgRb has been causing several issues). So, BackgroundRb is probably just about to be dropped for the two background processes we run; ‘feed item’ distribution (ala Twitter and Facebook home pages) and RSS parsing, which itself creates one type of ‘feed item’ for the distribution worker. The feed item distribution is the only candidate for Starling, the RSS worker is just going to drop back to crontab hitting a module.

    Just yesterday I swapped over BackgroundRb for Starling/Workling (thanks Railscasts) for the feed item distribution and all seems to be working well in development. I also did tons of reading round on Starling. However, the app can’t really afford to lose anything from the queue if something was to go wrong, and this is something I didn’t really understand even after lots of reading round. How can Starling be handled with regards to downtime or when the worker code is changed and the process needs restarting? Does the ‘feed item’ table need to know when feed items have been sent (was thinking along the lines of a ‘is_distributed’ boolean that would be updated true at the end of the worker transaction, and if things go wrong, I kick up a process to send anything that hasn’t been sent) or can we rely on the Starling log to cover this?

    How are you handling potential Starling downtime at Inquisix? Is it still doing the biz?

    p.s. I tried playing with BackgroundJob in place of Starling runner – oddly, it didn’t seem to work, although BJ has been working when I use it outside of Workling. Either way, I’m now a big fan of Workling/Starling.

  14. Neil

    Thanks, Dave, that helps a lot. The intro to ActiveMessaging looks very friendly indeed, I’ll book in time to trial it asap. Have you trialled any of them?
    Although, if you’re happy with Starling, I’m sure I will for the time being. It’s very snappy, at least in comparison to BgRb – and it’s up and running in production without any apparent overhead on the production slice (using New Relic Bronze).

    The RSS parsing worker is now on a daemon (see the Railscast on Custom Daemons) – it was going to be called via crontab (and before that it was on a backgroundrb schedule) but this would’ve loaded the Rails environment every time. Sleeping a daemon avoids that.

    Your :create method looks like the post-downtime tidy-up method I was considering, with which any loose items would be picked up and distributed. Feed_items (polymorphic references) have ‘feeds’ as the has_many_through model (i.e. the reference required for notifying user_id of feed_item_id), and there is a validates_uniqueness_of scoped to user_id and feed_item_id – this means we can try distributing the same feed_item twice and only the ‘feeds’ which weren’t sent will be distributed on a second attempt. But, wouldn’t it be possible to wrap a transaction block around the worker? Quick research suggests transaction blocks were ripped out of AR and moved to a plugin – still, is that not something we could use with Starling?

    Thanks for the insights

  15. Dave

    It’s unclear to me whether starling would handle transactions properly. At its core, starling is nothing more than a memcached front end to Ruby’s Queue class. Starling added persistence using a log file, but I’d have to experiment some to see if it properly rolled things back. One concern would be that rolling the transaction back might affect the log file but not the Queue data. Also, this would be a distributed transaction across 2-3 processes. Normally, RoR is not happy about that.
    I’m happy with starling for my purposes. Between its stability, persistence, and the create method (as well as some protective code), I have a system that has a very good chance of not losing anything.

    For other systems I built, the combination of ActiveMQ and ActiveMessaging is rock solid. It is not uncommon for those to run for years without any problems. ActiveMQ is a real queue that supports all forms of transactional integrity. Using it will require adding a new client, runner, and poller to workling. It would be an interesting addition to workling.

  16. Neil

    Great, maybe it’s worth suggesting ActiveMQ & ActiveMessaging support to the play/type guys.
    Based on your experiences with the two (albeit, presumably on different apps without a fair, scientific, architectural comparison), have you noticed any performance difference between Starling and ActiveMQ/ActiveMessaging?

  17. Dave

    That’s really hard to say given that the two apps in question were so different. I would say that ActiveMQ/ActiveMessaging (JMS) will most likely perform better given that all the code is in Java vs. Ruby (performance being separate from scale). The JMS solution, however, is far more complex and suited for the Enterprise. Note, this is not to say that Starling can’t scale. It will just scale differently and with different issues to contend with. If anything, since Starling instances are completely independent, you can scale horizontally almost at will. That’s what Twitter does with it, and regardless of what you may think about Twitter, they do scale pretty well.
    My experience has been that message flow has never been the issue. It’s always message processing that is the problem. My bias will always be towards a solution like Starling because it’s simpler. Unless I have a need for absolutely guaranteed message delivery or have huge message backups, I don’t see a reason to introduce Java, ActiveMQ, and ActiveMessaging to the mix. I reference huge message backups because Starling is basically a persisted memory queue. If your queue gets too backed up, you will run out of memory. JMS has many modes that allow the queue to grow far larger than available memory if necessary.

    One more thing, Rails is still not able to handle distributed transactions easily, so the absolute guarantee with ActiveMQ is not happening anyway. You will need to write additional code to handle failures rather than relying on transactional rollback like you can in Java and other platforms.

  18. Neil

    Yes, if Starling is working for you, and it’s already proven to be very simple and reliable in my own experience, it’ll do just fine for now. The guys at Engine Yard seem to be happy with it (after getting over some niggling config with Monit).
    I think the team at Twitter have done a fantastic job of late. The app I’m working on has some resemblances to Twitter and its status updates, so that’s another reason to use Starling.

    I’m looking forward to any more Rails & background process related posts.

  19. Dave

    I have a couple more I can add. I’ve been pretty lax of late, but comments tend to get me back going again.

  20. Nanda

    Dave, here’s a scenario I have run into. I have several different profiles (same base code but different DBs), and they all have RAILS_ENV=production. So when I fire off starling on, say, port 15151 and start workling clients from each of the profiles, a task invoked in one profile gets picked up by the listener for another profile. Obviously I am using the same starling for all, so the workling.yml looks exactly the same for all the profiles. Any thoughts on how to make this work?

  21. Dave

    Your problem is that the queue names generated by workling are a combination of class name and method name. The profile is not included. Therefore, if you have the same workling base code running in multiple environments/profiles, they all try to use the same queue names. See class_and_method_routing.rb.
    I suggest you create a new routing class based on ClassAndMethodRouting that includes the profile. For example:

    # returns the routing string, given a class and method.
    def self.queue_for(clazz, method)
      "#{ ENV['PROFILE'] }/#{ clazz.to_s.tableize }/#{ method }".split("/").join("__")
    end

    Now, your queue names will include the environment, and your queues will not overlap. See listen.rb for how to replace the routing class.

  22. Nanda

    Thanks for the suggestion. Incidentally, we tried adding namespace: profilename under memcache_options in workling.yml, and so far it’s been working. We tested by running near-simultaneous tasks in 4/5 profiles at once. Anyway, I will look into your suggestion as well.

  23. Dave

    Very interesting. I have not used the namespace option with memcache, but it sounds like it would do the trick as well.

  24. Nanda

    Dave, one more question: so you had added threaded functionality to the workling base code, right? Is there anything in particular we need to specify in workling.yml or elsewhere to ensure it’s threaded? Basically, even for the same profile (app), say I have campaign_worker.rb, and the task there takes, say, 5-10 mins to run. What’s happening now is that when I invoke a campaign task through the app, it seems to wait if there is another campaign task already running in the workling. Ideally, we would want to fork another thread for each of the tasks. Any thoughts on what we might be missing here? Thanks!

  25. Dave

    The threading model I implemented created a thread per worker class, not per task. I wanted to create a balance between workers not stepping on each other and creating too many threads. In most cases, when you have a job that takes 5-10 minutes to run, you don’t want to have more than one executing at a time anyway. It is often more efficient to run things serially in that case. What I wanted to avoid, however, is the single 10 minute task preventing the hundreds of 100 millisecond tasks from running.

  26. Nanda

    OK, thanks, got the info I needed. In our app, there are a couple of tasks that take 5-10 mins, and there could be situations where users invoke the task (from different pages) in a short span of time. Anyway, we might just have to educate users that wait time may vary. 🙂

  27. Dave

    I have the same thing in my Inquisix app. Users can upload 1000s of contacts, and that process can take a while. Once the file is received, the user is informed that they will get a message/email sent to them when the process is complete.

  28. Jason

    Great post. One question – I don’t really like the idea of giving sudo access to my application account in production, i.e.:
    * it would be nice if your deploy script didn’t use sudo for god commands.

    I wonder if there’s a way to configure god.conf to monitor the application’s ‘current’ (deployed) directory in order to trigger its workling restart? Monitoring ports might work for mongrel but not Phusion Passenger. Any ideas on how to do that?

  29. Dave

    Not sure how to do that with god.rb. What I normally do is give sudo access to the production account but ONLY for the god application. Combine that with no ssh passwords (public/private key auth), and I feel pretty good about things.
    There are other monitoring applications that may do what you’re thinking, but I haven’t spent too much time looking.

  30. Jason

    Wow, my unix sucks. I didn’t realise you could assign sudo privileges for a specific command only – awesome!
    I added:
    myappuser ALL= NOPASSWD: /usr/local/bin/god

    into /etc/sudoers and bingo – god in a capfile – thanks Dave!

  31. Dave

    No problem. Glad I could help.

  32. Replacing Mongrel with Passenger (mod_rails) » Big Dave’s Blog

    […] bother with it since I have a bunch of recipes for Apache and Mongrel (see a previous posts about background tasks with Workling). The other day, I figured I would give it a shot. Now, I think I found my new way to deploy Rails […]
