{ |one, step, back| }

Problems with Directories   02 Jul 05

Someone reported an interesting problem in Rake, and I thought you might enjoy the problem and its resolution.

The Problem

Consider the following Rakefile (I’ve left out some of the uninteresting parts):

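  task :run

  BUILD_DIR = 'build'
  TARGET_DIR = 'build/copies'

  # Declare the directory task once; Rake creates the directory
  # when a copy needs it.
  directory TARGET_DIR

  FileList['src/*'].each do |src|
    target = File.join(TARGET_DIR, File.basename(src))
    # Each copy depends on its source file and on the target directory.
    file target => [src, TARGET_DIR] do
      cp src, target
      sleep 3   # artificially slow down the copy
    end
    task :run => target
  end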

Assume the src directory has a lot of files, and that it takes a while to copy them to the build directory (I artificially slowed down the copy by including a sleep command). Also assume for this first run, the build directory has not been created yet.

The first time you run “rake run”, you will see …

  cp src/foo1 build/copies/foo1
  cp src/foo2 build/copies/foo2
  cp src/foo3 build/copies/foo3
  ...

and so on for each of the copies.

Now run “rake run” again. You shouldn’t see any copies because the source files have all been copied at this point and there is no more work to do … but instead you will see (if you are running rake 0.5.3 or earlier) a number of duplicate copies being performed.

Analysis

Why are those extra copies performed? Each target file build/copies/fooN is dependent upon the source file src/fooN (so it gets updated when the source changes) and the target directory (so the directory is created by the time the file is copied).

The first time task run is invoked, it populates the target directory with the copies. Each new file created there updates the timestamp of the target directory, so by the end of the run the directory is newer than the files that were copied into it early on. The next time run is invoked, those earliest target files are out of date with respect to the timestamp on the directory. Rake thinks it needs to update them, hence the extra copies.
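You can observe the underlying filesystem behavior without Rake. The stand-alone Ruby sketch below assumes a POSIX-style filesystem, where creating a new entry in a directory updates that directory's modification time. The file and directory names are made up for the illustration. The sketch backdates an early copy and its directory, then copies a second file into the same directory.

```ruby
require 'tmpdir'
require 'fileutils'

result = Dir.mktmpdir do |root|
  dir = File.join(root, 'copies')
  FileUtils.mkdir_p(dir)

  # First copy, made "an hour ago": backdate both the file and the directory.
  first = File.join(dir, 'foo1')
  File.write(first, 'data')
  past = Time.now - 3600
  File.utime(past, past, first)
  File.utime(past, past, dir)

  # A later copy adds a new entry to the directory, which updates its mtime.
  File.write(File.join(dir, 'foo2'), 'data')

  # Is the directory now newer than the earlier copy?
  File.mtime(dir) > File.mtime(first)
end

puts result  # prints "true": foo1 now looks out of date relative to its directory
```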

A Solution

Rake already supports two kinds of tasks. Task objects always run when invoked and are useful for defining simple jobs that need to be performed every time. FileTasks are different in that they only run if (1) the file they are associated with does not exist, or (2) the timestamp of any prerequisite is newer than that of the target file.
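The file-task rule fits in a few lines. The function `file_task_needed?` below is hypothetical. It mirrors the description above rather than Rake's actual source, and it takes prerequisite timestamps directly instead of looking up prerequisite tasks. The timestamps are made-up values chosen to mimic the copy scenario.

```ruby
# Hypothetical file-task rule: run if the target is missing, or if any
# prerequisite timestamp is newer than the target's own timestamp.
def file_task_needed?(target_exists, target_time, prereq_times)
  return true unless target_exists
  prereq_times.any? { |t| t > target_time }
end

src_time  = Time.at(1_000_000)  # source, unchanged since long ago
copy_time = Time.at(1_000_060)  # when build/copies/foo1 was written
dir_time  = Time.at(1_000_120)  # directory touched by a later copy

p file_task_needed?(false, nil, [])                          # true: target missing
p file_task_needed?(true, copy_time, [src_time])             # false: source is older
p file_task_needed?(true, copy_time, [src_time, dir_time])   # true: directory is newer
```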

What we need for directories is a task that runs when a file needs to be created but (1) doesn’t trigger on timestamps and (2) returns a timestamp that is earlier than the timestamp of any file that depends upon it.

It turns out this is fairly easy in Rake. Tasks define two methods, needed? and timestamp. The first is easy … only return true if the file doesn’t exist.

  def needed?
    ! File.exist?(name)
  end
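The needed? rule can be tried outside Rake. The class below is a hypothetical stand-in, not Rake's own directory task class. It models only the needed? method shown above, and its paths are invented for the example. Once the directory exists, copying files into it (and thereby bumping its timestamp) never makes the task needed again.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a directory-creation task; only needed? is modeled.
class DirCreationSketch
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Needed only when the directory does not exist yet; timestamps are ignored.
  def needed?
    ! File.exist?(name)
  end
end

states = Dir.mktmpdir do |root|
  task = DirCreationSketch.new(File.join(root, 'build', 'copies'))
  before = task.needed?                               # directory missing
  FileUtils.mkdir_p(task.name)
  after_mkdir = task.needed?                          # directory now exists
  File.write(File.join(task.name, 'foo1'), 'data')
  after_copy = task.needed?                           # newer mtime, still not needed
  [before, after_mkdir, after_copy]
end

p states  # [true, false, false]
```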

Handling timestamp was a bit more interesting. What is the earliest possible timestamp? I played around with Time.mktime to find the earliest time it could encode, but gave up after a bit. Even if I had found it, the answer would be implementation dependent. What I wanted was an object that would report itself as less than any timestamp.

Class EarlyTime

The first pass at an EarlyTime object was simple:

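  require 'singleton'

  module Rake
    class EarlyTime
      include Comparable
      include Singleton

      # Always answer "less than", so an EarlyTime sorts before anything.
      def <=>(other)
        -1
      end
    end
  end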

We made the comparison operator (<=>) always return -1. This means that the EarlyTime object will claim to be smaller than any other object. Including Comparable makes sure all the comparison operators are properly defined (based on <=>). Including Singleton makes sure there is only one copy of the early time (we only need one … really, this is one of the few times I’ve used Singleton).
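The class can be exercised on its own. This sketch assumes the `module Rake` wrapper so the name matches the `Rake::EarlyTime` used in the Time patch, and it needs `require 'singleton'`. It only tests comparisons with the early time on the left, which do not require any cooperation from Time.

```ruby
require 'singleton'

module Rake
  class EarlyTime
    include Comparable
    include Singleton

    def <=>(other)
      -1
    end
  end
end

early = Rake::EarlyTime.instance

# Comparable derives <, >, etc. from <=>, so the early time is "less than" anything.
p early < Time.now      # true
p early < 0             # true
p early > Time.at(0)    # false

# Singleton: instance always returns the same object, and new is private.
p early.equal?(Rake::EarlyTime.instance)   # true
begin
  Rake::EarlyTime.new
rescue NoMethodError
  p :new_is_private
end
```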

This works for early_time < time, but how do we handle time < early_time? The Time class doesn’t know about EarlyTime, so it won’t return the right result (in fact it will choke on the value).

We just need to teach Time about the new class:

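  class Time
    # Keep the original comparison available under a new name.
    alias pre_early_time_compare :<=>

    def <=>(other)
      if Rake::EarlyTime === other
        # Let the early time decide, then flip the sign: a <=> b == -(b <=> a).
        - other.<=>(self)
      else
        pre_early_time_compare(other)
      end
    end
  end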

We create an alias for the existing comparison operator <=>. Then we redefine <=> to check for an EarlyTime value. If other is an early time, we redispatch the comparison to the early time value and reverse the sign of the result. If other is not an EarlyTime, then we invoke the old behavior through the alias we created.
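Putting the pieces together, here is a stand-alone sketch. It repeats the two definitions, then uses a hypothetical `stale?` helper (a simplified version of the file-task rule, not Rake's actual code) to check an early copy against its prerequisites. The timestamps are made-up values. When the directory prerequisite reports the early time instead of its real modification time, the spurious rebuild goes away.

```ruby
require 'singleton'

module Rake
  class EarlyTime
    include Comparable
    include Singleton

    def <=>(other)
      -1
    end
  end
end

class Time
  alias pre_early_time_compare :<=>

  def <=>(other)
    if Rake::EarlyTime === other
      - other.<=>(self)
    else
      pre_early_time_compare(other)
    end
  end
end

early = Rake::EarlyTime.instance

# Hypothetical staleness rule: rebuild if any prerequisite is newer than the target.
def stale?(target_time, prereq_times)
  prereq_times.any? { |t| t > target_time }
end

src_time  = Time.at(1_000_000)  # unchanged source
copy_time = Time.at(1_000_060)  # an early copy
dir_mtime = Time.at(1_000_120)  # directory touched by a later copy

p Time.now > early                          # true: Time now understands EarlyTime
p stale?(copy_time, [src_time, dir_mtime])  # true: the old, spurious rebuild
p stale?(copy_time, [src_time, early])      # false: directory reports the early time
```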

Beta Rake

If you want to try the new version of Rake, I’ve uploaded a beta version to my betagems site. You can get it via:

  gem install rake --source http://onestepback.org/betagems

Version 0.5.4.3 is the latest beta version. Once I get a little time on it, I’ll make a 0.5.5 release.

Thanks

Thanks to Martin Fowler for pointing out this problem and correctly deducing the reason behind the problem.



Formatted: 06-Jan-09 22:49
Feedback: jim@weirichhouse.org