Wednesday, February 16, 2011

Buildbot: Command, Control & Communication (Part 2)

Last time we talked about the fact that we use Buildbot to automate tasks. This time we’ll dive into our actual Buildbot config. We won’t cover every last bit of it; rather, we’ll cover enough to give you a good feel for how the pieces come together.

And though the pieces do indeed come together, it’s all a little rough. This config has evolved, um, organically. It’s been quite a while since we last refactored.

Caveat emptor.

We run Buildbot 0.7.12 on our server. The Buildbot server’s config is defined in a file named master.cfg. The first steps in master.cfg initialize various top-level properties—the name of the project, for example, and how long the server should hang on to build status and logs.


c = BuildmasterConfig = {}

#
# PROJECT IDENTITY
#

c['projectName'] = "Dazzit!"
c['projectURL'] = "http://server.example.com/"

c['buildbotURL'] = "http://server.example.com:8010/"

#
# PROJECT PROPERTIES
#

c['buildHorizon'] = 100
c['eventHorizon'] = 50
c['logHorizon'] = 25

c['buildCacheSize'] = 15

We run our Buildbot web server on port 8010, and we allow “builds” (aka tasks) to be “forced”; that’s how we manually trigger tasks like downloading data. We also tell Buildbot to send us an email when a build fails, but only on the first failure after a build that didn’t fail (mode="problem"), not every time it fails.


#
# PROJECT STATUS
#

c['status'] = []

from buildbot.status import html

c['status'].append(
    html.WebStatus(http_port=8010, allowForce=True)
)

from buildbot.status import mail

c['status'].append(
    mail.MailNotifier(
        fromaddr="buildbot@example.com",
        extraRecipients=["developers@example.com"],
        sendToInterestedUsers=False,
        mode="problem"
    )
)
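To pin down what mode="problem" means, here’s a small sketch of the decision rule in plain Python. It’s our own simplification, not Buildbot’s code: should_notify is a hypothetical helper, results are reduced to "success" and "failure", and only three of MailNotifier’s modes are modeled.

```python
def should_notify(mode, result, previous_result):
    """Decide whether a finished build should trigger an email.

    mode            -- "all", "failing" or "problem"
    result          -- "success" or "failure" for the build that just finished
    previous_result -- result of the builder's previous build, or None
    """
    if mode == "all":
        return True
    if mode == "failing":
        return result == "failure"  # mail on every failure
    if mode == "problem":
        # mail only on the transition into failure
        return result == "failure" and previous_result != "failure"
    raise ValueError("unsupported mode: %r" % (mode,))


previous = None
for result in ["success", "failure", "failure", "success", "failure"]:
    print(result, should_notify("problem", result, previous))
    previous = result
# success False / failure True / failure False / success False / failure True
```

The second consecutive failure stays quiet, which is the behavior we want from a noisy nightly job.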

We use Perforce for version control. Buildbot has some built-in support for Perforce via its P4Source and P4 classes. Here we define a couple of globals—our p4 host name & port and our p4 user name—that we’ll be passing in to those classes later on.


#
# PROJECT PERFORCE
#

_p4port = "p4.example.com:1666"
_p4user = "buildbot"

Now we define the slaves. First we define the port (8015) on which the server should listen for slaves; then we define the slaves themselves. (We’ve only listed one slave here since all of the slave definitions look similar.)


#
# BUILD SLAVES
#

from buildbot.buildslave import BuildSlave

c['slavePortnum'] = 8015

c['slaves'] = [
    BuildSlave(
        "bot-ubuntu1004-main",
        "XXXXXXXXXX",
        properties={
            "osclass": "linux",
            "os": "ubuntu1004"
        }
    )
]

Our Buildbot server polls Perforce for changes every 600 seconds, looking for any changelists submitted since the last time it checked. It parses the paths of the files listed in those changelists to determine which branches they relate to.

Our three branches (MAIN, STAGE and LIVE) live in the Perforce repository under //Dazzit/Web/MAIN, //Dazzit/Web/STAGE and //Dazzit/Web/LIVE respectively. As you can see, the branch name always comes immediately after the base, //Dazzit/Web/. Buildbot comes with a routine, get_simple_split, that does just what we need in this kind of setup. (If you have a more complicated setup, check out the source for get_simple_split; it deals with some corner cases not covered by the example in the docs.) Any changes outside of the base, //Dazzit/Web/, are ignored, which in our case is exactly what we want. (By the way, if you’re using Buildbot 0.7.12, make sure the base path you pass to both P4Source and P4 has a trailing slash; otherwise you’ll run into a rather subtle bug.)


#
# BUILD SOURCES
#

from buildbot.changes.p4poller import P4Source, get_simple_split
s = P4Source(
    p4bin='/usr/local/bin/p4',
    p4port=_p4port,
    p4user=_p4user,
    p4base='//Dazzit/Web/',
    split_file=get_simple_split,
    pollinterval=600
)
c['change_source'] = s
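Here’s a rough re-implementation of the splitting idea, for illustration only. It assumes (as we understand P4Source’s behavior) that the poller strips p4base from each depot path and hands the remainder to the split function; simple_split and classify are hypothetical names, not Buildbot’s source. The last call shows one way a missing trailing slash can go wrong with this kind of prefix stripping; it isn’t necessarily the exact 0.7.12 bug.

```python
def simple_split(branchfile):
    """Treat the first path component as the branch name (hypothetical)."""
    if "/" not in branchfile:
        return None, None
    branch, path = branchfile.split("/", 1)
    return branch, path


def classify(p4base, depot_path):
    """Return (branch, file) for a depot path, or None if outside p4base."""
    if not depot_path.startswith(p4base):
        return None  # changes outside the base are ignored
    return simple_split(depot_path[len(p4base):])


print(classify("//Dazzit/Web/", "//Dazzit/Web/MAIN/dazzit/Makefile"))
# ('MAIN', 'dazzit/Makefile')
print(classify("//Dazzit/Web/", "//Other/Project/README"))
# None
print(classify("//Dazzit/Web", "//Dazzit/Web/MAIN/dazzit/Makefile"))
# ('', 'MAIN/dazzit/Makefile')  <- no trailing slash: empty branch name
```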

Now we can define some schedulers that will trigger builds.

The first kind of scheduler (which we’ve commented out below) triggers a build after the source has changed and there have been no further changes for 300 seconds. We don’t use that kind of scheduler ourselves, hence the commenting. The problem is that we have a number of scheduled tasks, and those tasks use the software produced by the software builder. We don’t want the software builder to replace software that’s currently in use, so we don’t rebuild automatically whenever the source changes. (Buildbot has mutual exclusion locks that builders can use to coordinate activities, but we’d just as soon avoid the complication.)
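For that commented-out scheduler, treeStableTimer=300 means every new change restarts a 300-second countdown, so a burst of commits produces a single build. A toy model of the behavior (fire_times is a hypothetical helper; real scheduling also involves branch filtering and other details we ignore here):

```python
def fire_times(change_times, tree_stable_timer=300):
    """Return when builds fire for the given change timestamps (in seconds).

    Every change restarts the countdown; a build fires once the tree has
    been quiet for tree_stable_timer seconds.
    """
    fires = []
    deadline = None
    for t in sorted(change_times):
        if deadline is not None and t >= deadline:
            fires.append(deadline)        # the tree went quiet: build fired
        deadline = t + tree_stable_timer  # this change (re)starts the countdown
    if deadline is not None:
        fires.append(deadline)
    return fires


# Three changes in quick succession, then one much later.
print(fire_times([0, 120, 250, 2000]))  # [550, 2300]
```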

The second kind of scheduler is a “nightly” scheduler that fires at a particular time, optionally restricted to particular days and/or months. In this case (the software build for MAIN) it fires every night at 8:00 pm. Furthermore, it triggers a build only if Buildbot detected a related change in Perforce (onlyIfChanged=True).

It works roughly like this. The poller defined above wakes up periodically to see whether there have been any changes in Perforce. If there have, the poller parses the paths in each changelist looking for a branch name. If it finds one, it tries to match it against the branch names mentioned in the schedulers. (The scheduler below references the MAIN branch.) Because this scheduler is on a timer and has onlyIfChanged set, it doesn’t build right away; the poller and scheduler work together to record the fact that a change occurred. If a change has indeed occurred by the time 8:00 pm rolls around, a build is triggered.

Building only when changes are detected doesn’t matter much for MAIN, but in STAGE and LIVE it gives us relatively stable build timestamps. The build timestamps are embedded in the binaries we produce, and they’re useful when we’re trying to track down problems.

We have quite a few nightly schedulers—for building software, downloading data, processing data, etc. (Only the software build schedulers use the onlyIfChanged flag.)


#
# BUILD SCHEDULERS
#

from buildbot.scheduler import Scheduler, Nightly, Triggerable

c['schedulers'] = []

#
# MAIN
#

# c['schedulers'].append(
#     Scheduler(
#         name="MAIN",
#         branch="MAIN",
#         builderNames=[
#             "MAIN-dazzit-meta"
#         ],
#         treeStableTimer=300
#     )
# )
c['schedulers'].append(
    Nightly(
        name="MAIN-dazzit-overnight",
        branch="MAIN",
        builderNames=[
            "MAIN-dazzit-meta"
        ],
        hour=20,
        minute=00,
        onlyIfChanged=True
    )
)
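The interplay between the poller and a nightly scheduler with onlyIfChanged=True can be modeled in a few lines. This is a sketch under simplifying assumptions: next_fire and NightlyModel are hypothetical, times are naive local datetimes, and changelists are plain numbers.

```python
from datetime import datetime, timedelta


def next_fire(now, hour=20, minute=0):
    """Next time (strictly after now) at which a daily hour:minute timer fires."""
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate


class NightlyModel:
    """Toy model of a nightly scheduler with onlyIfChanged=True."""

    def __init__(self, branch):
        self.branch = branch
        self.pending = []  # changes seen since the last firing

    def add_change(self, branch, changelist):
        # The poller reports (branch, changelist); only our branch counts.
        if branch == self.branch:
            self.pending.append(changelist)

    def fire(self):
        """Called at hour:minute. Returns the changes to build, or None to skip."""
        if not self.pending:
            return None
        changes, self.pending = self.pending, []
        return changes


print(next_fire(datetime(2011, 2, 16, 14, 30)))  # 2011-02-16 20:00:00
print(next_fire(datetime(2011, 2, 16, 20, 0)))   # 2011-02-17 20:00:00

s = NightlyModel("MAIN")
s.add_change("STAGE", 1001)  # ignored: different branch
print(s.fire())              # None -> no build tonight
s.add_change("MAIN", 1002)
s.add_change("MAIN", 1003)
print(s.fire())              # [1002, 1003] -> build
```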

Now we’re getting close to the interesting part: the builder definitions. There’s still a bit of bookkeeping in the way, though. First, we need to import various methods and classes that the builders will need. Second, we need to define a few utility routines for the builders. (The utility routines are one of the things we’d restructure if we ever refactored this config.)


#
# BUILDERS
#

from buildbot.process import factory
from buildbot.process.properties import WithProperties

from buildbot.steps.source import P4
from buildbot.steps.shell import \
    ShellCommand, SetProperty, Compile, Test
from buildbot.steps.trigger import Trigger

def repo_dir(args):
    if args['os'].startswith('win'):
        return '\\\\nas.example.com\\dazzit\\%(branch)s' % args
    else:
        return '/nas/dazzit/%(branch)s' % args

def work_dir(args):
    if args['os'].startswith('win'):
        return '%CD%'
    else:
        return '`pwd`'

def root_dir(args):
    if args['os'].startswith('win'):
        return '%(repo_dir)s/dazzit-win-current' % args
    elif args['os'].startswith('macosx'):
        return '%(repo_dir)s/dazzit-mac-current' % args
    else:
        return '%(repo_dir)s/dazzit-linux-current' % args

Builders are defined in three parts. First, there are builder factories that define the actual work to be done. Second, there are builder builders (!) that (among other things) instantiate the factories. Third, there are parameterized calls to the builder builders. Those calls return the builder objects we actually use.

Whew.
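Stripped of Buildbot itself, the three-part pattern looks something like this. example_factory and example_builder are hypothetical, and plain lists and dicts stand in for BuildFactory and Triggerable so the sketch runs anywhere:

```python
def example_factory(**args):
    # Part 1: the factory describes the work (here, just a list of commands).
    return [["make", "VERBOSE=1", "clean"], ["make", "VERBOSE=1"]]


def example_builder(c, **args):
    # Part 2: the "builder builder" derives names from args, instantiates the
    # factory, and registers a scheduler so other builders can trigger it.
    b = {
        "category": args["branch"],
        "name": "%(branch)s-example-%(os)s" % args,
        "factory": example_factory(**args),
        "slavename": args["bot"],
        "builddir": "%(branch)s-example-%(os)s" % args,
    }
    c["schedulers"].append({"triggerable": b["name"]})
    return b


# Part 3: parameterized calls produce the builders we actually use.
c = {"schedulers": []}
c["builders"] = [
    example_builder(c, bot="bot-ubuntu1004-main", os="ubuntu1004", branch="MAIN"),
    example_builder(c, bot="bot-winxp", os="winxp", branch="MAIN"),
]
print([b["name"] for b in c["builders"]])
# ['MAIN-example-ubuntu1004', 'MAIN-example-winxp']
```

Passing c into each builder builder lets it register whatever schedulers its builder needs as a side effect.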

The first builder we’ll look at is the one that promotes from one stream (e.g. MAIN) to the next (e.g. STAGE). While the essence of what the builder does is pretty straightforward, some of the implementation details make this our most complicated builder definition. (Not very complicated, just a little.)

Let’s start with the essence, then. First we use Buildbot’s P4 class to bring our workspace up to date. Strictly speaking, that isn’t necessary, but as a side effect the P4 class will create a clientspec if one doesn’t already exist. (Clientspecs define Perforce workspaces.) That’s something of a convenience measure: by getting Buildbot to define our clientspecs on demand, we have one less thing to worry about when defining new bots.

Once the workspace is up to date (and the clientspec is guaranteed to exist), we copy up. In Perforce we do that by a) integrating changes from source to destination and then b) resolving conflicts by favoring the source over the destination. After that’s done, we submit the changelist. Buildbot’s P4 class only handles workspace updates, so we have to perform all of these steps ourselves by calling the p4 executable directly.

The complication in all of this arises from the fact that we don’t want to resolve or submit changes that don’t exist. In other words, the fact that someone forces a promotion doesn’t mean that there’s actually anything to promote. We want the promotion task to halt successfully when that happens rather than failing with an error.

To that end we do two things. First, we capture the output of the p4 opened command in a Buildbot property. Second, we test the value of that property in the following steps and skip them if the value indicates that no files were actually opened by the integration. (No doubt there’s a more elegant way to perform those tests.) The property is necessary because (like make) Buildbot doesn’t maintain shell state across ShellCommand steps; each step runs as a separate process. (We use a Buildbot property rather than some kind of shell mechanism because all of this has to be able to run on any of the platforms we support, and different platforms run different shells.)

This whole description is full of Perforce-isms that will be of little interest to non-Perforce users—but the concept of making steps conditional on previous steps may well be.
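The conditional-step idea is independent of Perforce. Here is a toy step runner in plain Python that mimics the pattern: one step stores its output in a property, and later steps carry a condition that reads it, much like doStepIf. run_steps, make_steps and files_opened are hypothetical, and the non-empty p4 opened line is made-up sample text.

```python
NOT_OPENED = "File(s) not opened on this client."


def files_opened(props):
    """Condition for resolve/submit: False when `p4 opened` reported nothing."""
    return not props.get("opened", "").startswith(NOT_OPENED)


def run_steps(steps, props):
    """Run (name, action, condition) steps in order; return the names that ran.

    A step whose condition returns False is skipped, not failed. An action
    may return a (property, value) pair, which is stored in props so later
    conditions can see it (like SetProperty).
    """
    ran = []
    for name, action, condition in steps:
        if condition is not None and not condition(props):
            continue
        result = action(props)
        if result is not None:
            key, value = result
            props[key] = value
        ran.append(name)
    return ran


def make_steps(opened_output):
    """The promote sequence, with no-op actions standing in for p4 calls."""
    return [
        ("integrate", lambda props: None, None),
        ("opened", lambda props: ("opened", opened_output), None),
        ("resolve", lambda props: None, files_opened),
        ("submit", lambda props: None, files_opened),
    ]


print(run_steps(make_steps(NOT_OPENED), {}))
# ['integrate', 'opened']
print(run_steps(make_steps("//Dazzit/Web/STAGE/example.c#3 - integrate default change (text)"), {}))
# ['integrate', 'opened', 'resolve', 'submit']
```

A skipped step is not a failure, which is exactly what we want when there’s nothing to promote.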


#
# Promote Builder
#

def promote_factory(**args):
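    # The P4 step's default client name is "buildbot_%(slave)s_%(builder)s";
    # _p4client must produce the same name so the raw p4 commands below
    # use the client that the P4 step creates.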

    _p4client = "buildbot_%(bot)s_%(branch)s-promote" % args
    _p4branch = "%(branch_donor)s-%(branch)s" % args

    f = factory.BuildFactory()
    # run this for the side-effect of creating the client
    # (which we then immediately make use of).
    f.addStep(P4(
        "//Dazzit/Web/",
        p4port=_p4port,
        p4user=_p4user,
        defaultBranch="%(branch)s" % args,
        mode="update")
    )
    f.addStep(ShellCommand(
        description="integrating",
        descriptionDone="integrate",
        command=["p4", "-p", _p4port, "-u", _p4user, "-c", _p4client, "integ", "-b", _p4branch, "-d", "-i", "-t"],
        haltOnFailure=True)
    )
    f.addStep(SetProperty(
        command=["p4", "-p", _p4port, "-u", _p4user, "-c", _p4client, "opened", "-m", "100"], property="opened")
    )
    f.addStep(ShellCommand(
        description="resolving",
        descriptionDone="resolve",
        command=["p4", "-p", _p4port, "-u", _p4user, "-c", _p4client, "resolve", "-at"],
        doStepIf=lambda step: not step.build.getProperty("opened").startswith("File(s) not opened on this client."),
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="submitting",
        descriptionDone="submit",
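        # argv list, not a shell string: inner quotes would end up verbatim
        # in the changelist description, so none are used here.
        command=["p4", "-p", _p4port, "-u", _p4user, "-c", _p4client, "submit", "-d", "Integrate %(branch_donor)s to %(branch)s." % args],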
        doStepIf=lambda step: not step.build.getProperty("opened").startswith("File(s) not opened on this client."),
        haltOnFailure=True)
    )
    return f

def promote_builder(c,**args):
    b = {
        'category': args['branch'],
        'name': "%(branch)s-promote" % args,
        'factory': promote_factory(**args),
        'slavename': "%(bot)s" % args,
        'builddir': "%(branch)s-promote" % args
    }
    return b
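One subtlety in the promote factory: the raw p4 commands pass -c _p4client, so that name has to match the client the P4 step created. A quick check, assuming the P4 step fills its default template (quoted in the comment in promote_factory) with the slave name and the builder name, and using hypothetical arguments for a MAIN-to-STAGE promotion:

```python
# Default client template of the P4 step, as quoted in promote_factory's comment.
P4_DEFAULT_CLIENT = "buildbot_%(slave)s_%(builder)s"


def promote_names(**args):
    """Return (builder name, client name) exactly as the promote code builds them."""
    builder_name = "%(branch)s-promote" % args               # promote_builder's 'name'
    p4client = "buildbot_%(bot)s_%(branch)s-promote" % args  # promote_factory's _p4client
    return builder_name, p4client


# Hypothetical arguments for a MAIN -> STAGE promotion on the main Linux bot.
args = dict(bot="bot-ubuntu1004-main", branch="STAGE", branch_donor="MAIN")
builder_name, p4client = promote_names(**args)

# Assumption: the P4 step fills its template with the slave and builder names.
auto_client = P4_DEFAULT_CLIENT % {"slave": args["bot"], "builder": builder_name}

print(p4client)                 # buildbot_bot-ubuntu1004-main_STAGE-promote
print(auto_client == p4client)  # True
```

The match holds only as long as the builder name and _p4client are derived from the same args; renaming one without the other would point the p4 commands at a client that doesn’t exist.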

This next builder is responsible for building the software. It’s pretty simple: we bring our Perforce workspace up-to-date and then call make with various targets (like make clean and make test).

The one thing to note is that the builder builder (what else to call it…?) creates a “triggerable” scheduler. That allows us to call this builder as a subroutine from other builders—as you’ll see later on.


#
# Dazzit Builder
#

def dazzit_factory(**args):
    f = factory.BuildFactory()
    f.addStep(
        P4(
            "//Dazzit/Web/",
            p4port=_p4port,
            p4user=_p4user,
            defaultBranch="%(branch)s" % args,
            mode="update"
        )
    )
    f.addStep(ShellCommand(
        description="cleaning",
        descriptionDone="clean",
        workdir="build/dazzit",
        command=["make","VERBOSE=1","clean"],
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="making",
        descriptionDone="make",
        workdir="build/dazzit",
        command=["make","VERBOSE=1"],
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="creating current",
        descriptionDone="create current",
        workdir="build/dazzit",
        command=["make","VERBOSE=1","create-current"],
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="testing",
        descriptionDone="test",
        workdir="build/dazzit",
        command=["make","VERBOSE=1","test"],
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="pushing current",
        descriptionDone="push current",
        workdir="build/dazzit",
        command=["make","VERBOSE=1","push-current"],
        haltOnFailure=True)
    )
    return f

def dazzit_builder(c,**args):
    b = {
        'category': args['branch'],
        'name': "%(branch)s-dazzit-%(os)s" % args,
        'factory': dazzit_factory(**args),
        'slavename': "%(bot)s" % args,
        'builddir': "%(branch)s-dazzit-%(os)s" % args
    }
    c['schedulers'].append(
        Triggerable(
            name=b["name"],
            builderNames=[b["name"]]
        )
    )
    return b

The next builder is the one that triggers data downloads. It uses our dflow executable to perform the work. (We’ll be describing dflow in future posts.) There are two steps: first, download the data; second, push the data to our repository. Notice that the work is done locally; the data is pushed to the repository only if that work succeeds. If the work doesn’t succeed, the build fails (haltOnFailure=True).


#
# DSD Builder
#

def dsd_factory(**args):
    args['repo_dir'] = repo_dir(args)
    args['root_dir'] = root_dir(args)
    args['work_dir'] = work_dir(args)

    f = factory.BuildFactory()
    f.addStep(ShellCommand(
        description="running",
        descriptionDone="run",
        command="%(root_dir)s/bin/dflow -l debug -w %(work_dir)s run dsd %(catalog)s using %(exe)s" % args,
        timeout=3600,
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="pushing",
        descriptionDone="push",
        command="%(root_dir)s/bin/dflow -l debug -w %(work_dir)s push dsd %(catalog)s from work to repo" % args,
        timeout=1800)
    )
    return f

def dsd_builder(c,**args):
    b = {
        'category': args['branch'],
        'name': "%(branch)s-dsd-%(catalog)s" % args,
        'factory': dsd_factory(**args),
        'slavename': "%(bot)s" % args,
        'builddir': "%(branch)s-dsd-%(catalog)s" % args
    }
    c['schedulers'].append(
        Triggerable(
            name=b["name"],
            builderNames=[b["name"]]
        )
    )
    return b
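To see what the %-templating actually hands the slave, here is the first dsd command expanded for a Linux and a Windows bot. repo_dir, work_dir and root_dir are copied from the config above; dsd_run_command is a hypothetical wrapper that repeats dsd_factory’s string formatting, and the arguments are illustrative:

```python
# repo_dir, work_dir and root_dir are copied unchanged from the config above.
def repo_dir(args):
    if args['os'].startswith('win'):
        return '\\\\nas.example.com\\dazzit\\%(branch)s' % args
    else:
        return '/nas/dazzit/%(branch)s' % args

def work_dir(args):
    if args['os'].startswith('win'):
        return '%CD%'
    else:
        return '`pwd`'

def root_dir(args):
    if args['os'].startswith('win'):
        return '%(repo_dir)s/dazzit-win-current' % args
    elif args['os'].startswith('macosx'):
        return '%(repo_dir)s/dazzit-mac-current' % args
    else:
        return '%(repo_dir)s/dazzit-linux-current' % args

def dsd_run_command(**args):
    """Hypothetical wrapper: the string dsd_factory builds for its "run" step."""
    args['repo_dir'] = repo_dir(args)
    args['root_dir'] = root_dir(args)  # needs repo_dir, so order matters
    args['work_dir'] = work_dir(args)
    return "%(root_dir)s/bin/dflow -l debug -w %(work_dir)s run dsd %(catalog)s using %(exe)s" % args

print(dsd_run_command(os="ubuntu1004", branch="MAIN", catalog="com", exe="dazzit"))
# /nas/dazzit/MAIN/dazzit-linux-current/bin/dflow -l debug -w `pwd` run dsd com using dazzit
print(dsd_run_command(os="winxp", branch="MAIN", catalog="com", exe="dazzit"))
# \\nas.example.com\dazzit\MAIN/dazzit-win-current/bin/dflow -l debug -w %CD% run dsd com using dazzit
```

Because the command is a single string rather than an argv list, the slave runs it through a shell (/bin/sh on Linux, cmd.exe on Windows), and that shell is what expands `pwd` or %CD% into the build’s working directory.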

Each night, after building software and after downloading and processing data, we package the two together and start a web server with the new package. Here’s what that task looks like: it stops the old server, deletes the old package, copies in the new package, and starts the new server. (These steps are clearly Linux-specific.) dsrv, by the way, is our libevent-based web server.


#
# WWW Builder
#

def www_factory(**args):
    args['repo_dir'] = repo_dir(args)
    args['root_dir'] = root_dir(args)
    args['work_dir'] = work_dir(args)

    f = factory.BuildFactory()
    f.addStep(ShellCommand(
        description="stopping",
        descriptionDone="stopped",
        command="/etc/init.d/dsrv stop",
        timeout=300,
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="removing",
        descriptionDone="removed",
        command="rm -Rf /opt/dazzit",
        timeout=300,
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="copying",
        descriptionDone="copied",
        command="cp -Rv %(repo_dir)s/dps-%(catalog)s-linux-current/dazzit /opt" % args,
        timeout=300,
        haltOnFailure=True)
    )
    f.addStep(ShellCommand(
        description="starting",
        descriptionDone="started",
        command="/etc/init.d/dsrv start",
        timeout=300,
        haltOnFailure=True)
    )
    return f

def www_builder(c,**args):
    b = {
        'category': args['branch'],
        'name': "%(branch)s-www-%(catalog)s" % args,
        'factory': www_factory(**args),
        'slavename': "%(bot)s" % args,
        'builddir': "%(branch)s-www-%(catalog)s" % args
    }
    c['schedulers'].append(
        Triggerable(
            name=b["name"],
            builderNames=[b["name"]]
        )
    )
    return b
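For readers who think in Python rather than shell, here is an analogue of the four www steps using shutil. It’s a sketch, not what we run: redeploy is hypothetical, stop and start are caller-supplied callables standing in for /etc/init.d/dsrv stop/start, and the demo works in a temporary directory instead of /opt.

```python
import shutil
import tempfile
from pathlib import Path


def redeploy(package_dir, install_dir, stop, start):
    """Stop the server, replace install_dir with package_dir, restart it.

    Any exception propagates immediately, so later steps are skipped,
    much like haltOnFailure=True on each www step.
    """
    stop()                                          # /etc/init.d/dsrv stop
    shutil.rmtree(install_dir, ignore_errors=True)  # rm -Rf /opt/dazzit
    shutil.copytree(package_dir, install_dir)       # cp -Rv .../dazzit /opt
    start()                                         # /etc/init.d/dsrv start


# Demo in a scratch directory (hypothetical layout).
events = []
with tempfile.TemporaryDirectory() as tmp:
    package = Path(tmp, "dps-com-linux-current", "dazzit")
    package.mkdir(parents=True)
    (package / "index.html").write_text("new")
    installed = Path(tmp, "opt", "dazzit")
    installed.mkdir(parents=True)
    (installed / "stale.html").write_text("old")

    redeploy(package, installed,
             stop=lambda: events.append("stop"),
             start=lambda: events.append("start"))
    print(sorted(p.name for p in installed.iterdir()))  # ['index.html']
print(events)  # ['stop', 'start']
```

As with the Buildbot version, a failure partway through stops the sequence, so a failed copy leaves the server stopped rather than serving a half-copied package.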

In our first Buildbot post we talked about meta tasks: tasks that call other tasks. Meta tasks exist mostly for convenience. They allow us to trigger a set of tasks by triggering just one, while still giving us direct access to sub-tasks when necessary. They also give us a single task to schedule (rather than a bunch of separate tasks).

In this example we’re defining a builder that triggers software builds on each of our platforms. If you look at the nightly scheduler described above, you’ll see that this is the builder it schedules. And if you look at the definition of the builder below, you’ll see that it runs the platform builds in parallel (waitForFinish=False).


#
# Dazzit Meta Builder
#

def dazzit_meta_factory(**args):
    args['repo_dir'] = repo_dir(args)
    args['root_dir'] = root_dir(args)
    args['work_dir'] = work_dir(args)

    f = factory.BuildFactory()
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dazzit-ubuntu1004" % args],
        waitForFinish=False,
        haltOnFailure=False)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dazzit-macosx5" % args],
        waitForFinish=False,
        haltOnFailure=False)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dazzit-winxp" % args],
        waitForFinish=False,
        haltOnFailure=False)
    )
    return f

def dazzit_meta_builder(c,**args):
    b = {
        'category': args['branch'],
        'name': "%(branch)s-dazzit-meta" % args,
        'factory': dazzit_meta_factory(**args),
        'slavename': "%(bot)s" % args,
        'builddir': "%(branch)s-dazzit-meta" % args
    }
    return b
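A thread-based toy model of waitForFinish=False, for intuition only. Buildbot actually queues builds through the triggered schedulers and runs them on slaves; trigger, platform_build and the gate event are hypothetical. The meta build gets through all of its triggers while every platform build is still running:

```python
import threading


def trigger(name, work, wait_for_finish, results):
    """Start work() for `name` in a thread; block only if wait_for_finish."""
    def run():
        results[name] = work()
    thread = threading.Thread(target=run)
    thread.start()
    if wait_for_finish:
        thread.join()
    return thread


gate = threading.Event()  # holds the platform builds open until released


def platform_build():
    gate.wait()           # stands in for a long compile
    return "success"


results = {}
threads = [trigger("MAIN-dazzit-" + os_name, platform_build, False, results)
           for os_name in ("ubuntu1004", "macosx5", "winxp")]

# All three triggers have returned, yet every platform build is still running.
still_running = sum(t.is_alive() for t in threads)
print("meta done; %d builds still running" % still_running)  # meta done; 3 builds still running

gate.set()                # let the platform builds finish
for t in threads:
    t.join()
print(sorted(results))
```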

Here’s another example of a meta builder. This one defines all of the main steps for our com catalog, the catalog we use at http://www.dazzit.com. We’ll describe the steps in upcoming posts, but for now note that we create a package (via dps) for each platform and then (using the www builder described earlier) launch a new web server with one of those packages. Unlike the software meta builder, this one runs tasks sequentially (waitForFinish=True) and stops at the first failure (haltOnFailure=True).


#
# DDS/DCS/DPS Com Meta Builder
#

def dds_com_meta_factory(**args):
    args['exe']     = "dazzit"
    args['catalog'] = "com"

    args['repo_dir'] = repo_dir(args)
    args['root_dir'] = root_dir(args)
    args['work_dir'] = work_dir(args)

    f = factory.BuildFactory()
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dd-%(catalog)s" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dds-%(catalog)s" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dcs-%(catalog)s" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dps-%(catalog)s-linux" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dps-%(catalog)s-mac" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-dps-%(catalog)s-win" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    f.addStep(Trigger(
        schedulerNames=["%(branch)s-www-%(catalog)s" % args],
        waitForFinish=True,
        haltOnFailure=True)
    )
    return f

def dds_com_meta_builder(c,**args):
    b = {
        'category': args['branch'],
        'name': "%(branch)s-dds-com-meta" % args,
        'factory': dds_com_meta_factory(**args),
        'slavename': "%(bot)s" % args,
        'builddir': "%(branch)s-dds-com-meta" % args
    }
    c['schedulers'].append(
        Triggerable(
            name=b["name"],
            builderNames=[b["name"]]
        )
    )
    return b
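And the sequential case, with waitForFinish=True and haltOnFailure=True on every Trigger step. run_sequential is a hypothetical helper, and the scheduler names follow dds_com_meta_factory’s templates; in this made-up run the Windows packaging build fails, so the www step never runs:

```python
def run_sequential(scheduler_names, run_build):
    """Trigger each scheduler in order, waiting for each build to finish.

    Models waitForFinish=True plus haltOnFailure=True: the first failure
    stops the chain and becomes the meta build's result.
    """
    completed = []
    for name in scheduler_names:
        result = run_build(name)  # synchronous: we wait for it
        completed.append((name, result))
        if result != "success":
            return "failure", completed
    return "success", completed


params = {"branch": "MAIN", "catalog": "com"}
chain = [t % params for t in (
    "%(branch)s-dd-%(catalog)s",
    "%(branch)s-dds-%(catalog)s",
    "%(branch)s-dcs-%(catalog)s",
    "%(branch)s-dps-%(catalog)s-linux",
    "%(branch)s-dps-%(catalog)s-mac",
    "%(branch)s-dps-%(catalog)s-win",
    "%(branch)s-www-%(catalog)s",
)]

# Pretend the Windows packaging build fails.
outcome, done = run_sequential(chain, lambda n: "failure" if n.endswith("-win") else "success")
print(outcome)       # failure
print(done[-1][0])   # MAIN-dps-com-win
print(len(done))     # 6 -> the www step never ran
```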

The last step is to create a collection of instantiated builders. Each instantiated builder is represented by a column in the Buildbot web server. The columns will appear in the same order as the builders are defined. We take advantage of that fact to create a logical left-to-right progression of tasks in the web server. (The first four builders below map to the screenshot we included in the prior post.)


#
# Builders
#

c['builders'] = [

    #
    # MAIN
    #

    # Dazzit

    dazzit_meta_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN"),

    dazzit_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN"),
    dazzit_builder(c,bot="bot-macosx5",os="macosx5",branch="MAIN"),
    dazzit_builder(c,bot="bot-winxp",os="winxp",branch="MAIN"),

    ...

    # DDS/DCS/DSS/DPS Com

    dds_com_meta_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN"),

    dd_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com",exe="dazzit"),
    dds_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com",exe="dazzit"),
    dcs_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com",exe="dazzit"),
    dps_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com",exe="dazzit",package_os="linux"),
    dps_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com",exe="dazzit",package_os="mac"),
    dps_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com",exe="dazzit",package_os="win"),
    www_builder(c,bot="bot-ubuntu1004-main",os="ubuntu1004",branch="MAIN",catalog="com"),

    ...

]

Just imagine that with a lot more lines and you’ll get the picture.

There’s nothing particularly unusual about our Buildbot config. What’s unusual—and hopefully interesting—is that we use Buildbot to provide complete command & control of non-build tasks. The fact that it all works—and works quite well—is a credit to the Buildbot development team.