Introducing Spectie, a behavior-driven-development library for RSpec

Posted by ryan Mon, 02 Nov 2009 03:34:00 GMT

I'm a firm believer in the importance of top-down and behavior-driven development. I often start writing an integration test as the first step to implementing a story. When I started doing Rails development, the expressiveness of Ruby encouraged me to start building a DSL to easily express the way I most often wrote integration tests. In the pre-RSpec days, this was just a subclass of ActionController::IntegrationTest that encapsulated the session management code to simplify authoring tests from the perspective of a single user. As the behavior-driven development idea started taking hold, I adapted the DSL to more closely match those concepts, and finally integrated it with RSpec. The result of this effort was Spectie (rhymes with necktie).

The primary goal of Spectie is to provide a simple, straightforward way for developers to write BDD-style integration tests for their projects in a way that is most natural to them, using existing practices and idioms of the Ruby language.

Here is a simple example of the Spectie syntax in a Rails integration test:

Feature "Compelling Feature" do
  Scenario "As a user, I would like to use a compelling feature" do
    Given :i_have_an_account, :email => ""
    And   :i_have_logged_in

    When  :i_access_a_compelling_feature

    Then  :i_am_presented_with_stunning_results
  end

  def i_have_an_account(options)
    @user = create_user(options[:email])
  end

  def i_have_logged_in
    log_in_as @user
  end

  def i_access_a_compelling_feature
    get compelling_feature_path
    response.should be_success
  end

  def i_am_presented_with_stunning_results
    response.should have_text("Simply stunning!")
  end
end

Spectie is available on GitHub, Gemcutter, and RubyForge. The following should get it installed quickly for most people:

% sudo gem install spectie

For more information on using Spectie, visit

Why not Cucumber or Coulda?

At the time that this is being written, Cucumber is the new hotness in BDD integration testing. My reasons for sticking with Spectie instead of switching to Cucumber like the rest of the world are as follows:

  • Using regular expressions in place of normal Ruby method names seems like a potential maintenance nightmare, above and beyond the usual potential.
  • The layer of indirection that is created in order to write tests in plain text doesn't seem worth the cost of maintenance in most cases.
  • Separating a feature from its "step definitions" seems mostly unnecessary. I like keeping my scenarios and steps in one file until the feature becomes sufficiently big that it warrants extra organizational consideration.

These reasons are more-or-less the same as those given by Evan Light, who recently published Coulda, which is his solution for avoiding the cuke. What sets Spectie apart from Coulda is its reliance on and integration with RSpec. The Spectie 'Feature' statement has the same behavior as an RSpec 'describe' statement, and the 'Scenario' statement is the same as the RSpec 'example' and 'it' statements. By building on RSpec, Spectie can take advantage of the contextual nesting provided by RSpec, and rely on RSpec to provide the BDD-style syntax within what I've been calling a scenario statement (the words after the Given/When/Thens). Coulda is built directly on Test::Unit. I'm a firm believer in code reuse, and RSpec is the de facto standard for writing BDD-style tests. Spectie, then, is a feature-driven skin on top of RSpec for writing BDD-style integration tests. To me, it only makes sense to do things that way; as RSpec evolves, so will Spectie.
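
To make the "plain Ruby methods instead of regular expressions" idea concrete, here is a minimal sketch of how step keywords can dispatch to ordinary methods. It is not Spectie's implementation: the StepKeywords module and the CompellingFeatureScenario class are made-up names for illustration, and the steps only record what they did.

```ruby
# Illustrative only: StepKeywords is a made-up module, not Spectie's code.
module StepKeywords
  # Define Given/When/Then/And so each one calls the method named by its symbol.
  %w[Given When Then And].each do |keyword|
    define_method(keyword) do |step, *args|
      send(step, *args)
    end
  end
end

class CompellingFeatureScenario
  include StepKeywords

  # Runs the steps in order and returns a log of what each step did.
  def run
    @log = []
    Given :i_have_an_account, :email => "user@example.com"
    And   :i_have_logged_in
    When  :i_access_a_compelling_feature
    @log
  end

  def i_have_an_account(options)
    @log << "account for #{options[:email]}"
  end

  def i_have_logged_in
    @log << "logged in"
  end

  def i_access_a_compelling_feature
    @log << "accessed feature"
  end
end
```

Calling CompellingFeatureScenario.new.run walks the steps in order. Because each step is just a method, renaming one is an ordinary refactoring rather than a search through pattern-matched text.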

Rails Plugin for Mimicking SSL requests and responses

Posted by ryan Fri, 14 Nov 2008 23:33:42 GMT

The Short

I've written a plugin for Ruby on Rails that allows you to test SSL-dependent application behavior that is driven by the ssl_requirement plugin without the need to install and configure a web server with SSL.

Learn more

The Long

A while back, I wanted the Selenium tests for a Ruby on Rails app I was working on to cover the SSL requirements and allowances of certain controller actions in the system, as defined using functionality provided by the ssl_requirement plugin. I also wanted this SSL-dependent behavior to occur when I was running the application on my local development machines. I had two options:

  1. Get a web server configured with SSL running on my development machines, as well as on the build server.

  2. Patch the logic used by the system to determine if a request is under SSL or not, as well as the logic for constructing a URL under SSL, so that the system can essentially mimic an SSL request without a server configured for SSL.

Since I had multiple Selenium builds on the build server, setting up an SSL server involved adding a host name to the loopback for each build, so that Apache could switch between virtual hosts for the different server ports. I also occasionally ran web servers on my development machines on ports other than the default 3000, as did everyone else on the team, so that we'd all have to go through the setup process for multiple servers on those machines as well. We would need to do all of this work in order to test application logic that, strictly speaking, didn't even require the use of an actual SSL server. Given that the only thing that I was interested in testing was that the requests to certain actions either redirected or didn't, depending on their SSL requirements, all I really needed was to make the application mimic an SSL request.

Mimicking an SSL request without an SSL server, in conjunction with the ssl_requirement plugin, consisted of patching four things:

  1. ActionController::UrlRewriter#rewrite_url - Provides logic for constructing a URL from options and route parameters

    If provided, the :protocol option normally serves as the part before the :// in the constructed URL.

    The method was patched so that the constructed URL always starts with "http://". If :protocol is equal to "https", this causes an "ssl" key to be added to the query string of the constructed URL, with a value of "1".

  2. ActionController::AbstractRequest#protocol - Provides the protocol used for the request.

    The normal value is one of "http" or "https", depending on whether the request was made under SSL or not.

    The method was patched so that it always returns "http".

  3. ActionController::AbstractRequest#ssl? - Indicates whether or not the request was made under SSL.

    The normal value is determined by checking if request header HTTPS is equal to "on" or HTTP_X_FORWARDED_PROTO is equal to "https".

    The method was patched so that it checks for a query parameter of "ssl" equal to "1".

  4. SslRequirement#ensure_proper_protocol - Used as the before_filter on a controller that includes the ssl_requirement plugin module, which causes the redirection to an SSL or non-SSL URL to occur, depending on the requirements defined by the controller.

    This method was patched so that, instead of replacing the protocol used on the URL with "http" or "https", it either adds or removes the "ssl" query parameter.
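
The core of the four patches above is a simple mapping: "https" becomes an "ssl=1" query parameter on an "http" URL, and the SSL check reads that parameter back. The following is a rough, framework-free sketch of that mapping. The helper names mimicked_url and mimicked_ssl? are hypothetical and are not taken from the plugin, which patches the Rails methods listed above instead.

```ruby
require "uri"

# Hypothetical helpers illustrating the query-parameter approach;
# these names do not come from the plugin.

# Build a URL that always uses http, flagging an "https" request with ssl=1.
def mimicked_url(protocol, host, path, params = {})
  params = params.merge("ssl" => "1") if protocol == "https"
  query = URI.encode_www_form(params)
  url = "http://#{host}#{path}"
  query.empty? ? url : "#{url}?#{query}"
end

# Treat a request as SSL when its query string carries ssl=1.
def mimicked_ssl?(url)
  query = URI.parse(url).query.to_s
  URI.decode_www_form(query).include?(["ssl", "1"])
end
```

With these, mimicked_url("https", "localhost:3000", "/login") yields a plain http URL that mimicked_ssl? still recognizes as an SSL request, which is all the redirect logic needs in order to be tested.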

For more information, installation instructions, and so on, please refer to the plugin directly at:

Enabling/disabling observers for testing

Posted by ryan Thu, 10 Apr 2008 02:53:50 GMT

If you use ActiveRecord observers in your application and are concerned about the isolation of your model unit tests, you probably want some way to disable/enable observers. Unfortunately, Rails doesn't provide an easy way to do this. So, here's some code I threw together a while ago to do just that.

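module ObserverTestHelperMethods
  # Returns the singleton instance of every observer registered with ActiveRecord.
  def observer_instances
    ActiveRecord::Base.observers.collect do |observer|
      observer_klass = \
        if observer.respond_to?(:to_sym)
          # Registered by name: resolve it to the class (assumed, as Rails itself does).
          observer.to_s.camelize.constantize
        elsif observer.respond_to?(:instance)
          # Registered as the observer class itself (assumed).
          observer
        end
      observer_klass.instance
    end
  end

  # Returns the classes watched by the given observer, or by all observers.
  def observed_classes(observer=nil)
    observed = Set.new  # assumed initial value; a Set accepts both operands below
    (observer.nil? ? observer_instances : [observer]).each do |instance|
      observed += (instance.send(:observed_classes) + instance.send(:observed_subclasses))
    end
    observed
  end

  # Maps each observed class to the list of observers attached to it.
  def observed_classes_and_their_observers
    observers_by_observed_class = {}
    observer_instances.each do |observer|
      observed_classes(observer).each do |observed_class|
        observers_by_observed_class[observed_class] ||= []
        observers_by_observed_class[observed_class] << observer
      end
    end
    observers_by_observed_class
  end

  # Detaches every observer, except instances of options[:except].
  def disable_observers(options={})
    except = options[:except]
    observed_classes_and_their_observers.each do |observed_class, observers|
      observers.each do |observer|
        unless observer.class == except
          observed_class.delete_observer(observer)  # assumed: Observable#delete_observer
        end
      end
    end
  end

  # Reattaches every observer, except instances of options[:except].
  def enable_observers(options={})
    except = options[:except]
    observer_instances.each do |observer|
      unless observer.class == except
        observed_classes(observer).each do |observed_class|
          observer.send :add_observer!, observed_class
        end
      end
    end
  end
end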

Include this in a Test::Unit::TestCase or 'include' in your RSpec configuration, whatever rocks your boat. Here's a stupid example:

class SomethingCoolTest < Test::Unit::TestCase
  include ObserverTestHelperMethods

  def setup
    disable_observers
  end

  def teardown
    enable_observers
  end

  def test_without_observers
    # ...
  end
end

When you go to test the behavior of the observer itself, simply disable/enable like the following to disable/enable all observers except the one you're testing:

class DispassionateObserverTest < Test::Unit::TestCase
  include ObserverTestHelperMethods

  def setup
    disable_observers :except => DispassionateObserver
  end

  def teardown
    enable_observers :except => DispassionateObserver
  end

  def test_without_observers_except_dispassionate_observer
    # ...
  end
end

Testing on High: Bottom-up versus Top-down Test-driven Development

Posted by ryan Mon, 19 Nov 2007 02:13:21 GMT

I recently talked to a number of Rails developers about their general approach to testing some new functionality they're about to code. I asked these developers if they found it to be more useful to start testing from the bottom-up or top-down. I suggested to them that, since Rails uses the MVC pattern, it's easy to think of the view, or user interface, as the "top", and the model as the "bottom". Surprisingly, nearly every developer I asked answered that they prefer to start from the bottom, or model, and test upwards. Nearly every one! I expected that I'd get a much more mixed response than I did. In fact, I think that the correct place to start testing is precisely at the highest level possible, to reduce the risk of building software based on incorrect assumptions of how best to solve a user requirement.

Bottom-up Testing

Bottom-up testing implies bottom-up design in TDD. In bottom-up design, a developer would probably consider the high-level objectives and break them up into manageable components that interact with each other to provide the desired functionality. The developer thinks about how each component will be used by its client components, and tests accordingly.

The problem with the bottom-up approach is that it's difficult to really know how a component needs to be used by its clients until the clients are implemented. To consider how the clients will be implemented, the developer must also think about how those clients will be used by their clients. This thought process continues until we reach the summit of our mighty design! Hopefully, when the developer is done pondering, they can write a suite of tests for a component which directly solves the needs of its client components. In my experience, however, this is rarely the case. What really happens is that the lower-level components tend either to do too much, too little, or the right amount in a way that is awkward or complicated to make use of.

The advantage of bottom-up testing is that, since we're starting with the most basic, fundamental components, we guarantee that we'll have some working software fairly quickly. However, since the software being written may not be closely associated with the high-level user requirements, it may not produce results that are necessarily valuable to the user. A simple client could quickly be written which demonstrates how the components work to the user, but that's beside the point unless the application being developed is a simple application. In such a case, the bottom-level components are probably close enough to the top-level ones that there is little risk involved in choosing either the bottom-up or top-down approach.

Unless you're writing a small application, the code is probably going to have to support unforeseen use cases. When this comes as a result of ungrounded assumptions about the software that's already been written, this can mean a lot of rework. I can tell you from experience, once you realize that your lower-level components don't fit the bill for the higher levels in the system, it can be quite a chore to go back and fix, remove, or replace all of that unnecessary or incorrect code.

Top-down Testing

Top-down testing implies top-down design in TDD. Following the top-down approach, the developer will pick the highest level of the system to be tested; that is to say, the part of the system that has the closest correlation to the user requirements. This approach is sometimes referred to as Behavior Driven Development. Whatever it's called, the point is that you test the most critical parts of the application first.

Since software is often written for human users, the most critical parts usually involve the front-end as it relates to the value being provided by the system being developed. When testing from the top-down, the effort is the inverse of bottom-up testing: Instead of spending a lot of time thinking about how the components to be developed will be used by other components to be developed, the focus is on how the user needs to interact with the system. Testing involves proving that the system supports the required usability. For an application with a graphical front-end, this might involve testing for a minimal version of that front-end.

The disadvantage of top-down testing is that you can end up with a lot of stubbed or mocked code that you then have to go back and implement. This means it might take longer before you have software that actually does something besides pass tests. However, there are ways that you can minimize this sort of recursive development problem.

One way to minimize the time between starting development of a feature and demonstrating functionality that is valuable to the user is to focus on a thin slice of the overall architectural pie of the application. For example, there may be a number of views that need to be implemented before the system provides some major piece of functionality. However, the developer can focus on one view at a time, or one part of the view. That way, the number of components that need to be implemented before the system does something useful is small; ideally, one component in each architectural layer that needs to be built out, and oftentimes only a part of the overall functionality of each component.

Another way to minimize the amount of time before the system does something useful is to code a small bit of functionality without worrying about breaking the problem up into classes until you have some tested, working code to analyze. You can then use established methods for refactoring to bring the code to an acceptable level of quality.

The advantage of top-down testing is that you write the code that delivers the most critical functionality first. This generally means starting development at a high level. When the system eventually does something besides pass tests, what it does will provide value to its users. Additionally, because development starts at a high level, the code that is written is based on the current understanding of the problem, and not on assumptions. This guarantees that the tests and code that are written are not superfluous.


The challenge with top-down testing is that you must be highly disciplined to ensure that the code you write is being refactored and is properly evolving into a cohesive domain model for the application. This is compared with bottom-up testing, where you start with the domain model and build your system around it. Either way, you're going to be refactoring code. The difference is in where the time in refactoring is spent. In my experience, when doing bottom-up testing, more time is spent correcting incorrect assumptions about how the domain model will be used than on actually improving code that already works to solve the user requirements. In order to avoid making assumptions about the code being written, it must be written at the level that is closest to providing actual value to the end-user. In so doing, the developer focuses on continuous refinement of code that already provides value, as opposed to speculative design and development.

Selenium Core Bug and TinyMCE Anchor Tags

Posted by ryan Fri, 12 Oct 2007 05:01:18 GMT

Today, I was trying to get Selenium to click an anchor tag that was created with the "link" plugin in a TinyMCE editor. I was able to verify that the link was present with something as simple as verifyElementPresent('link=Link Text'). However, when I tried calling clickAndWait('link=Link Text'), it gave a "Window does not exist" error. A quick Google search yielded the answer: a bug in Selenium Core.

When the TinyMCE "link" plugin creates a link that doesn't open in a new window, it sets the "target" attribute on the anchor tag to "_self". Selenium Core versions prior to 0.8.4 (which hasn't been released yet) don't respond to links with "target" set to "_self".

If you're doing Rails development and using the selenium_on_rails plugin, it uses an old version of Selenium Core (0.7.something) as of this posting. To fix the anchor tag problem, I replaced the contents of the selenium-core directory under vendor/plugins/selenium_on_rails with that of the core directory of the Selenium Core 0.8.3 release, then applied the patch described in the bug spec linked to above. This seems to have fixed the problem.

Hopefully this saves you all some time and muddling.

Failing Quickly When Testing For Performance

Posted by ryan Fri, 24 Nov 2006 05:46:00 GMT

I was working with an algorithm today that I discovered had a bug that caused it to run for an unacceptable amount of time, hogging a lot of system resources in the process. Whenever I find a bug in a piece of code I'm working on, I write a failing unit test for it that defines the correct behavior. For this algorithm, I needed to define what an "acceptable amount of time" was in the test, and then test for that level of performance so that the test results were consistent across multiple computers with possibly differing resource loads and load fluctuations. I also needed to ensure that the test would fail as quickly as possible in the event that the algorithm did not perform as desired.

The method containing the algorithm takes a string parameter such as "1-4, 23, 50-52", specified as user input and representing a range of numbers. It then generates an array of numbers; for the string previously mentioned, the array would contain the numbers 1, 2, 3, 4, 23, 50, 51, and 52. The method also takes an optional parameter for the maximum amount of numbers that would be acceptable for it to generate, since generating an array containing all numbers for a range string like "1-9999999999999" would send the generating system into epileptic fits, complete with bus lines frothing. As you may have guessed, this was where the problem was: The method in question generated all of the numbers in the specified range string, and then it checked to see if the amount of numbers generated exceeded the specified maximum.

I needed to define an acceptable response time for a given maximum size of the generated array of numbers for my test. It seems to me that it should take the same amount of time for the algorithm to complete with a range for 10 numbers with a maximum resulting array size of 5 as it does with a range for 10 million, billion, or squigillion numbers with the same result size. Basically, when the algorithm determines that the given range will exceed the maximum, it should end. The challenge here is that different computers will have different timings to reach the maximum, so a reasonably-accurate system-specific timing expectation needed to be calculated.
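
As an illustration of the behavior described here, the following is a minimal sketch of a range-string expander that checks the limit before generating any numbers. It is not the method from the project: expand_ranges and LimitExceeded are hypothetical names, and raising an error when the limit is exceeded is an assumption about how the failure is reported.

```ruby
# Hypothetical sketch; expand_ranges and LimitExceeded are illustrative names.
class LimitExceeded < StandardError; end

def expand_ranges(range_string, limit = nil)
  # Parse each comma-separated piece into a [first, last] pair without expanding it.
  bounds = range_string.split(",").map do |piece|
    first, last = piece.strip.split("-", 2).map { |n| Integer(n) }
    [first, last || first]
  end

  # Count the numbers arithmetically, so the check costs the same for
  # "1-10" as for "1-9999999999999". Overlapping pieces are counted separately.
  total = bounds.inject(0) { |sum, (first, last)| sum + [last - first + 1, 0].max }
  if limit && total > limit
    raise LimitExceeded, "#{total} numbers requested, limit is #{limit}"
  end

  # Only now is it safe to build the array.
  bounds.flat_map { |first, last| (first..last).to_a }
end
```

Because the count is computed from the endpoints alone, the time taken to reject an oversized range does not depend on how large the range is, which is the property the test in this post is trying to verify.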

For this purpose, I wrote a method that determines the range of acceptable response times for the algorithm, given a desired number count, maximum result size, and the number of sample timings to make, since timings will differ slightly from one invocation to another.

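def acceptable_timing(number_count, result_size_limit, sample_count=10)
  timings = []
  sample_count.times do
    # NumberRangeGenerator is a placeholder name; the post does not name the class under test.
    generator = NumberRangeGenerator.new("1-#{number_count}", result_size_limit)
    start_time = Time.now
    generator.numbers
    end_time = Time.now
    timings << end_time - start_time
  end
  # Anything from zero up to one standard deviation above the mean is acceptable.
  0.0..average(timings) + standard_deviation(timings)
end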
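
The average and standard_deviation helpers called by acceptable_timing are not shown in this post. One possible definition is sketched below, assuming the population standard deviation is what was intended; the originals may have differed.

```ruby
# Arithmetic mean of a list of numbers.
def average(values)
  values.inject(0.0) { |sum, v| sum + v } / values.size
end

# Population standard deviation: the square root of the mean squared
# distance from the average. (Assumed; the post does not say which form was used.)
def standard_deviation(values)
  mean = average(values)
  variance = values.inject(0.0) { |sum, v| sum + (v - mean)**2 } / values.size
  Math.sqrt(variance)
end
```

Adding one standard deviation to the mean gives the upper bound some slack for ordinary run-to-run jitter, without making it so loose that a genuinely slow run slips through.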

The next challenge was testing the numbers method with a range string that represents a large set of numbers, but using the same result_size_limit that was used in the call to acceptable_timing. I decided that a range of 9999999 numbers was sufficiently large to determine that the timing was acceptable; after all, it should take the same amount of time with the same result size limit as if I were to use 100 numbers, right? However, the problem with using a set of 9999999 numbers is that, with the bug, the test will hang for an extremely long time and hog a lot of system resources. We want our tests to fail as fast as possible, and give a useful error message if and when that failure occurs.

To ensure that the test fails fast, I decided to launch a separate thread to call the method under test so that I can stop it as soon as it's determined that it's taking longer than the acceptable amount of time to return.

def completes_within?(threshold, &block)
  start_time = Time.now
  thread = Thread.new(&block)
  while true
    # Past the acceptable window: stop the worker and report failure.
    if !threshold.include?(Time.now - start_time)
      thread.kill
      return false
    end
    # Thread#stop? is true once the thread is dead or sleeping.
    return true if thread.stop?
  end
end

And finally, the test:

def test_numbers_fails_fast_when_result_size_limit_exceeded
  range_size = 9999999
  result_size_limit = 5
  # NumberRangeGenerator is a placeholder name; the post does not name the class under test.
  generator = NumberRangeGenerator.new("1-#{range_size}", result_size_limit)

  acceptable_amount_of_time = acceptable_timing(100, result_size_limit)

  assert_equal true, \
    completes_within?(acceptable_amount_of_time) { generator.numbers }, \
    "Exceeded acceptable time to determine that range of #{range_size} " + \
    "numbers exceeds limit of #{result_size_limit}"
end

I considered using a range size smaller than 9999999 to avoid the threading and make the solution simpler. My reasoning for not doing that is, if I were to pick a smaller number, it would still have to be sufficiently larger than the range size I used to determine the acceptable amount of time for the method under test to return. The larger range size gives me confidence that a failed timing is not just because of a resource spike on the computer running the test, at least if the test is supposed to fail. If I have to pick a large number anyways, it's going to take the test longer to fail, thus violating the idea of fail-fast testing. Therefore, I might as well just abort the method as soon as I know it's going to take too long.

To further improve the reliability of this test, the completes_within? method could be called multiple times and, if a success is ever achieved, the test passes. However, this would make the test run longer, so the choice of whether to use it or not should depend on the variation in resource load that is expected amongst the computers that will be running the tests. If the tests are running on a dedicated machine, this technique probably wouldn't be needed.
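
The retry idea can be sketched as a thin wrapper around completes_within?. To keep the example runnable on its own, completes_within? is restated here in a variant that uses Thread#join with a timeout rather than a polling loop; completes_within_any_attempt? and its attempts parameter are hypothetical names.

```ruby
# Variant of completes_within? built on Thread#join(limit), which returns
# nil if the thread is still running when the limit expires.
def completes_within?(threshold, &block)
  thread = Thread.new(&block)
  finished = thread.join(threshold.end)
  thread.kill unless finished
  !finished.nil?
end

# Hypothetical wrapper: passes as soon as any single attempt is fast enough.
def completes_within_any_attempt?(threshold, attempts = 3, &block)
  attempts.times.any? { completes_within?(threshold, &block) }
end
```

Since any? stops at the first success, a healthy run costs a single attempt; only a slow or failing run pays for all of them, at most attempts times the upper bound of the threshold.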

In order to gain 100% confidence that there will be no false negatives in the test results, the structure of the code could be modified so that it can be determined whether the algorithm is considering the result limit while it generates the numbers, or afterwards, as in the case of the buggy version of the algorithm. The tradeoff here is that a certain amount of the algorithm logic must be externalized so that the necessary assertions can be set up in the test. This makes the algorithm itself less adaptable to change, as some changes could make the test fail inappropriately, since not only would the results be getting tested, but also the way in which the algorithm works.

Defined Classifications for "Mock Objects"

Posted by ryan Sun, 14 May 2006 00:46:00 GMT

Martin Fowler has a good article on his blog about the different kinds of "mock objects" used in unit testing. He uses Gerard Meszaros' word for this classification of object: Test Double. If you've ever been in a discussion about unit tests, you know how easily misunderstandings can result from throwing around terms like "mock", "dummy", or "stub". It's a good idea to have a consistent definition for these things.