Someone reported an interesting problem in Rake, and I thought you
might enjoy the problem and its resolution.
The Problem
Consider the following Rakefile (I’ve left out some of the uninteresting parts):
    task :run

    BUILD_DIR = 'build'
    TARGET_DIR = 'build/copies'

    FileList['src/*'].each do |src|
      directory TARGET_DIR
      target = File.join TARGET_DIR, File.basename(src)
      file target => [src, TARGET_DIR] do
        cp src, target
        sleep 3
      end
      task :run => target
    end
Assume the src directory has a lot of files, and that it takes a
while to copy them to the build directory (I artificially slowed down
the copy by including a sleep command). Also assume for this first
run, the build directory has not been created yet.
Run “rake run” once, then run it again. The second time you shouldn’t see any copies
because the source files have all been copied at this point and there
is no more work to do … but instead you will see (if you are running
rake 0.5.3 or earlier) a number of duplicate copies being performed.
Analysis
Why are those extra copies performed? Each target file
build/copies/fooN is dependent upon the source file src/fooN
(so it gets updated when the source changes) and the target directory
(so the directory is created by the time the file is copied).
The first time task run is invoked, it populates the target
directory with each copy, and in doing so updates the time stamp of
the target directory. The next time run is invoked, the earliest
target files are out of date with respect to the timestamp on the
directory. Rake thinks it needs to update the targets, hence the
extra copies.
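The timestamp effect can be reproduced outside of Rake. The following is a minimal sketch, assuming a filesystem that updates a directory's modification time when an entry is added (typical on POSIX systems). The file names foo1 and foo2 and the temporary directory are made up for the illustration.

```ruby
require 'tmpdir'
require 'fileutils'

stale_before = stale_after = nil

Dir.mktmpdir do |root|
  dir   = File.join(root, 'copies')
  first = File.join(dir, 'foo1')
  FileUtils.mkdir_p(dir)
  FileUtils.touch(first)

  # Pretend the directory and the first copy were both written a minute ago.
  past = Time.now - 60
  File.utime(past, past, first, dir)
  stale_before = File.mtime(dir) > File.mtime(first)

  # Copying in a second file modifies the directory and bumps its mtime,
  # so the first copy now looks older than one of its prerequisites.
  FileUtils.touch(File.join(dir, 'foo2'))
  stale_after = File.mtime(dir) > File.mtime(first)
end

puts "first copy stale before second copy: #{stale_before}, after: #{stale_after}"
```

The first copy only looks out of date after the second file lands in the directory, which is the situation the earliest targets are in on the second run.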
A Solution
Rake already supports two kinds of tasks. Task objects always run
when invoked and are useful for defining simple jobs that need to be
performed whenever invoked. FileTasks are different in that they are
only invoked if (1) the file they are associated with does not
exist, or (2) the time stamp of any prerequisite is newer than that of
the target file.
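Those two conditions can be sketched as a small predicate. This is only an illustration of the rule, not Rake's actual code: file_task_needed? is a hypothetical helper, and the files live in a throwaway temporary directory.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical helper mirroring the two FileTask conditions.
# It is an illustration, not Rake's implementation.
def file_task_needed?(target, prerequisites)
  return true unless File.exist?(target)                       # (1) target is missing
  target_time = File.mtime(target)
  prerequisites.any? { |pre| File.mtime(pre) > target_time }   # (2) a prerequisite is newer
end

missing = fresh = stale = nil

Dir.mktmpdir do |root|
  src    = File.join(root, 'src.txt')
  target = File.join(root, 'target.txt')
  FileUtils.touch(src)

  missing = file_task_needed?(target, [src])   # target has not been built yet

  FileUtils.touch(target)
  older = Time.now - 120
  File.utime(older, older, src)                # source is older than the target
  fresh = file_task_needed?(target, [src])

  newer = Time.now + 120
  File.utime(newer, newer, src)                # source changed after the build
  stale = file_task_needed?(target, [src])
end

puts "missing target: #{missing}, up to date: #{fresh}, source changed: #{stale}"
```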
What we need for directories is a task that runs when the directory
needs to be created but (1) doesn’t trigger on timestamps and
(2) returns a timestamp that is earlier than the time stamp of any
file that depends upon it.
It turns out this is fairly easy in Rake. Tasks define two methods,
needed? and timestamp. The first is easy … only return
true if the file doesn’t exist.
    def needed?
      ! File.exist?(name)
    end
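To see how that rule behaves, here is a small sketch. DirectoryTaskSketch is a hypothetical stand-in that models only a name and the needed? rule; it is not the class Rake uses.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for a directory task. Only the name and the
# existence-based needed? rule are modeled here.
class DirectoryTaskSketch
  attr_reader :name

  def initialize(name)
    @name = name
  end

  # Needed only while the directory is missing; timestamps are ignored.
  def needed?
    !File.exist?(name)
  end
end

before = after_mkdir = after_copy = nil

Dir.mktmpdir do |root|
  task = DirectoryTaskSketch.new(File.join(root, 'build', 'copies'))

  before = task.needed?                          # directory does not exist yet
  FileUtils.mkdir_p(task.name)
  after_mkdir = task.needed?                     # directory now exists
  FileUtils.touch(File.join(task.name, 'foo1'))
  after_copy = task.needed?                      # new contents do not matter
end

puts [before, after_mkdir, after_copy].inspect   # prints [true, false, false]
```

Adding files to the directory changes its timestamp but never makes the task needed again, which is point (1) above.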
Handling timestamp was a bit more interesting. What is the earliest
possible time stamp? I played around with Time.mktime to find the
earliest possible time stamp it could encode, but gave up after a bit.
Even if I found it, it would be an implementation dependent issue. I
wanted an object that would report it is less than any timestamp.
Class EarlyTime
The first pass at an EarlyTime object was simple:
    require 'singleton'

    class EarlyTime
      include Comparable
      include Singleton

      # Always claim to be smaller than whatever we are compared with.
      def <=>(other)
        -1
      end
    end
We made the comparison operator (<=>) always return -1. This
means that an object of EarlyTime will claim to be smaller than any
other object. The Comparable inclusion makes sure all the
comparison operators are properly defined (based on <=>).
The Singleton inclusion makes sure there is only one copy of the early
time (we only need one … really, this is one of the few times I’ve used
Singleton).
This works for early_time < time, but how do we handle
time < early_time? The Time class doesn’t know about
EarlyTime, so it won’t return the right result (in fact it will
choke on the value).
We just need to teach Time about the new class:
    class Time
      alias pre_early_time_compare :<=>

      def <=>(other)
        if Rake::EarlyTime === other
          - other.<=>(self)
        else
          pre_early_time_compare(other)
        end
      end
    end
We create an alias for the existing comparison operator <=>.
Then we redefine <=> to check for an EarlyTime value. If
other is an early time, we redispatch the comparison to the early
time value and reverse the sign of the result. If other is not an
EarlyTime, then we invoke the old behavior through the alias we
created.
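Putting the two pieces together, comparisons work in both directions. The sketch below is self-contained and makes a few assumptions: the Demo module is a hypothetical namespace standing in for Rake (so it cannot collide with an installed Rake), and demo_original_compare is a made-up alias name. As in the approach above, it reopens Time for the whole process.

```ruby
require 'singleton'

module Demo   # hypothetical namespace, standing in for Rake
  class EarlyTime
    include Comparable
    include Singleton

    # Always claim to be smaller than the other object.
    def <=>(other)
      -1
    end
  end
end

class Time
  # Keep the original comparison reachable under a made-up alias name.
  alias demo_original_compare <=>

  def <=>(other)
    if Demo::EarlyTime === other
      -(other <=> self)              # ask the early time, then flip the sign
    else
      demo_original_compare(other)   # ordinary Time comparison
    end
  end
end

early = Demo::EarlyTime.instance
now   = Time.now

early_is_less   = early < now                                # true
now_is_greater  = now > early                                # true
now_is_less     = now < early                                # false
early_is_oldest = [now, early, now - 60].min.equal?(early)   # true

puts early_is_less, now_is_greater, now_is_less, early_is_oldest
```

Because the early time always sorts first, a file that lists the directory as a prerequisite never looks out of date on account of the directory.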
Beta Rake
If you want to try the new version of Rake, I’ve uploaded a beta
version to my betagems site. You can get it via: