Hourly Maintenance

hourly_maint.php is run at the top of each hour. It reloads the shared memory segments that determine which creatives are assigned to which sections and how many of each to run.

Step-by-step through hourly_maint.php

This section will illustrate the various tasks performed by this script.

Obtain the semaphore

Immediately upon launching, the script obtains a semaphore (key 0x4f415300, which is the ASCII bytes "OAS" followed by a zero byte) to prevent data from being written to or read from the shared memory segments while we're performing our calculations. We want to hold this semaphore for as short a period as possible, since all ad delivery is paused while we're holding it.
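As a quick check of how the key maps to characters, the four bytes can be decoded directly. This is a minimal Python illustration only; hourly_maint.php itself is a PHP script, and the variable names here are made up for the example.

```python
# Decode the semaphore key into its four bytes, most significant first.
SEM_KEY = 0x4F415300

key_bytes = SEM_KEY.to_bytes(4, byteorder="big")
print(key_bytes)  # b'OAS\x00'
```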

Record the last hour's traffic

All of the impressions and clickthroughs are committed back to the database. The ImpressionsDelivered and ClickthroughsDelivered fields are updated in the Creatives table. Later, this information will be propagated up to the level of the campaigns, but for now, since we want to operate as fast as possible while holding the semaphore, we just record things at the creative level.

Call daily_maint.php if appropriate

If we're called during hour 0 (or if the force_daily command-line option is specified), we'll call the daily maintenance script, which will rebuild the DailyTargets table.

Build the assignment table

The script goes through all the campaign and creative section assignments and propagates them down through the section tree. It processes them in this order:
  1. Campaign Includes
  2. Creative Includes
  3. Campaign Excludes
  4. Creative Excludes
  5. Campaign Exclusives
  6. Creative Exclusives
In this way, creative assignments always override those of their parent campaigns. Also, exclusive assignments override all others.

A large table is constructed. For each section S in the section tree, there is a hash. The keys to this hash are creative dimensions ('120x90', '468x60', etc.). The value associated with each key D is an array of CreativeIDs for all the creatives of dimensions D which are running on section S. The delivery engine will use this table to quickly acquire a list of all candidate creatives for a given section.
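The shape of this table can be sketched as a nested mapping. The following Python is an illustration under assumptions: the section names, CreativeIDs, and the candidates() helper are hypothetical, and the input rows stand for whatever remains after the include/exclude/exclusive rules have been applied.

```python
from collections import defaultdict

# Hypothetical input: (section, dimensions, creative_id) rows that remain
# after the assignment rules have been applied.
assignments = [
    ("news", "468x60", 101),
    ("news", "468x60", 102),
    ("news", "120x90", 103),
    ("sports", "468x60", 101),
]

# section -> dimensions -> list of CreativeIDs
table = defaultdict(lambda: defaultdict(list))
for section, dims, creative_id in assignments:
    table[section][dims].append(creative_id)

def candidates(section, dims):
    """Return the candidate CreativeIDs for one section and ad size."""
    return table.get(section, {}).get(dims, [])

print(candidates("news", "468x60"))   # [101, 102]
print(candidates("sports", "120x90"))  # []
```

A lookup is then two hash accesses, which is why the delivery engine can fetch its candidate list quickly.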

Compute hourly targets

For each entry in the DailyTargets table, the script looks at how many hours are left in the day, how much traffic is slated to run during those hours (from the values you've defined in the Traffic Shaping interface), and what share of that remaining traffic falls in the current hour. This is done on a per-creative level, since each creative can run at different hours during the day.

Example: suppose it is hour 22. You have said that you get on average 10,000 pageviews in hour 22 and 5,000 pageviews in hour 23. You have two active creatives. Creative 1 runs all hours of the day. Creative 2 runs from hour 10 (10 a.m.) through hour 22 (10 p.m.). Each still needs 600 impressions before the end of the day. Creative 1 will be scheduled for

600 * 10,000 / (10,000 + 5,000) = 400 impressions

Creative 2, which does not run in hour 23, will be scheduled for

600 * 10,000 / (10,000) = 600 impressions
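The same arithmetic can be written as a small function. This is a Python sketch under stated assumptions, not the script's actual code: the function name and arguments are hypothetical, it assumes the run starts at the top of the hour, and rounding to the nearest whole impression is a choice made for the example.

```python
def hourly_target(remaining, traffic, run_hours, hour):
    """Share of `remaining` impressions to schedule in `hour`.

    traffic:   expected pageviews for each hour of the day (index 0-23)
    run_hours: set of hours in which the creative is allowed to run
    """
    if hour not in run_hours:
        return 0
    # Expected traffic in the hours left today in which this creative runs.
    left = sum(traffic[h] for h in range(hour, 24) if h in run_hours)
    if left == 0:
        return 0
    return round(remaining * traffic[hour] / left)

traffic = [0] * 24
traffic[22], traffic[23] = 10_000, 5_000

print(hourly_target(600, traffic, set(range(24)), 22))      # 400
print(hourly_target(600, traffic, set(range(10, 23)), 22))  # 600
```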

Creatives which do not have specific impression targets will be entered into the HourlyTargets shared memory if they are slated to run during the given hour.

For each active creative, four values are stored in the HourlyTargets shared memory segment: the hourly impression target, the weight of the creative, the number of impressions remaining, and the number of clicks delivered.

For creatives with impression targets, the number of impressions remaining is equal to the hourly impression target when the table is loaded. As impressions are delivered, this number decrements until it hits 0, at which point no more impressions are served.

For creatives without impression targets, the number of impressions remaining is 0 when the table is loaded. As impressions are delivered, this number goes negative.
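The two counting behaviors can be sketched as follows. This Python is illustrative only: the HourlyEntry class, the load_entry() and serve() helpers, and the has_target flag are hypothetical, since this section does not say how the real segment distinguishes creatives with and without targets.

```python
from dataclasses import dataclass

@dataclass
class HourlyEntry:
    target: int       # hourly impression target (0 when there is no target)
    weight: int       # weight of the creative
    remaining: int    # impressions remaining
    clicks: int = 0   # clicks delivered
    has_target: bool = True  # hypothetical flag, not one of the four stored values

def load_entry(weight, target=None):
    """Build the entry as it looks right after the hourly reload."""
    if target is None:
        return HourlyEntry(0, weight, 0, has_target=False)
    return HourlyEntry(target, weight, target)

def serve(entry):
    """Count one impression; return False if the creative is used up."""
    if entry.has_target and entry.remaining <= 0:
        return False
    entry.remaining -= 1
    return True

capped = load_entry(weight=1, target=2)
print([serve(capped) for _ in range(3)], capped.remaining)  # [True, True, False] 0

open_ended = load_entry(weight=1)
print([serve(open_ended) for _ in range(3)], open_ended.remaining)  # [True, True, True] -3
```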

Release the semaphore

We've done all the really tricky stuff, so it's now safe to allow delivery to resume.

Process log file

This step happens only if no options were specified on the command line. Presumably, if you're calling the script by hand, you're specifying options, and in that case you don't want to process the logs: we try to process them only once per hour to reduce the number of entries in the HourlyStats table.

The log file is slurped up and each entry is tallied. The result is that for each creative/section combination, a record is saved in the HourlyStats table. Impressions, clicks, impression errors, and click errors are stored in the table. Reports are generated on the fly from these records.
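The tallying step amounts to counting events per creative/section pair. A minimal Python sketch, assuming the log has already been parsed into tuples; the tuple layout, the event names, and the tally() helper are hypothetical, because the log format is not described here.

```python
from collections import Counter

# Hypothetical parsed log entries: (creative_id, section, event).
entries = [
    (101, "news", "impression"),
    (101, "news", "impression"),
    (101, "news", "click"),
    (102, "sports", "impression_error"),
]

def tally(entries):
    """Count events for each (creative_id, section) combination."""
    stats = {}
    for creative_id, section, event in entries:
        stats.setdefault((creative_id, section), Counter())[event] += 1
    return stats

stats = tally(entries)
print(stats[(101, "news")]["impression"])  # 2
print(stats[(101, "news")]["click"])       # 1
```

Each key of the result corresponds to one record for the hour.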

In addition to the fairly detailed HourlyStats table, there is a table called CampaignDailyStats, which contains one record per campaign per day. This table exists for speed in generating invoices and the revenue report. You can't slice and dice the numbers as you can with HourlyStats, but it is a much more compact representation.

The contents of the log file are appended to the file YYYY/MM-DD.log in the directory specified by the LogDir preference. During hour 0, yesterday's log file is gzipped. Note that you can clean these files up at will; they are only there for your reference.
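The archive file name can be built from the date like this. A Python sketch only: the archive_path() helper and the example directory are hypothetical, and zero-padded month and day are an assumption read from the YYYY/MM-DD pattern.

```python
from datetime import date
from pathlib import Path

def archive_path(log_dir, day):
    """Return <log_dir>/YYYY/MM-DD.log for the given date."""
    return Path(log_dir) / f"{day:%Y}" / f"{day:%m-%d}.log"

# The directory below is only an example value for the LogDir preference.
print(archive_path("/var/log/oasis", date(2004, 3, 9)).as_posix())
# /var/log/oasis/2004/03-09.log
```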

Update campaigns

Now we propagate the impressions and clicks recorded for each creative up to their parent campaigns.
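In effect this is a sum grouped by campaign. A small Python sketch with made-up rows; the row layout and the roll_up() helper are hypothetical and say nothing about the actual database schema.

```python
from collections import defaultdict

# Hypothetical rows: (creative_id, campaign_id, impressions, clicks).
creatives = [
    (101, 1, 400, 3),
    (102, 1, 600, 5),
    (103, 2, 250, 0),
]

def roll_up(rows):
    """Sum creative-level counts into per-campaign (impressions, clicks)."""
    totals = defaultdict(lambda: [0, 0])
    for _creative_id, campaign_id, impressions, clicks in rows:
        totals[campaign_id][0] += impressions
        totals[campaign_id][1] += clicks
    return {cid: tuple(t) for cid, t in totals.items()}

print(roll_up(creatives))  # {1: (1000, 8), 2: (250, 0)}
```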

Mark campaigns/creatives complete

Finally, we look for any campaigns or creatives which have either met their impression targets or have passed their end dates. At this point, we set the Status of those campaigns or creatives to "Complete".

Inventory Simulation

If it's currently hour 0 (and we're being run as a regular cron job, not manually), project available inventory and run a simulation. This is a compute-intensive job. It can take a couple of hours, depending on some of your preferences.

Adding hourly_maint.php to the crontab

Add this script to the Web server user's crontab. Do not run this script as anybody but that user. Running it as another user creates shared memory segments that are not owned by the Web server user, which interferes with normal operation of OASIS because it will not be able to clear those segments.

The job should run at the top of every hour, like so:

0 * * * * /path/to/oasis/mgmt/hourly_maint.php > /dev/null

Calling hourly_maint.php from the command line

Under certain circumstances, you can run this script from the command line. Do not run it as any other user but the one whose crontab runs the script normally (preferably your Web server user). You should call it in one of these ways:

/path/to/oasis/mgmt/hourly_maint.php start
This will force the running of daily_maint.php, and it will reload the hourly targets. It will not process the logs or attempt to record any traffic from the previous hour. Use this to start up OASIS after a reboot (and before you start your Web server).

/path/to/oasis/mgmt/hourly_maint.php force_daily
This is the same as "start", but it also records the last hour's traffic.

/path/to/oasis/mgmt/hourly_maint.php reload
This is basically the same as calling hourly_maint.php with no arguments, but it won't call the daily maintenance script (even if it is run during hour 0), and it won't run the inventory simulation (even if it is run during hour 0). It will record the last hour's traffic, process the logs, and reload the hourly targets.

/path/to/oasis/mgmt/hourly_maint.php stop
This method of invocation will run the script and record the last hour's traffic. It then clears shared memory and processes the log. It does not attempt to calculate any targets for the next hour. In fact, if oasisi.php runs after the shutdown process, it will not be able to deliver anything.

You should run this immediately after shutting down your Web server. Note that you can safely stop your Web server at any time without doing anything to OASIS. But if you want to reboot the server, you should run 'stop' to record the last hour's data to disk first.

/path/to/oasis/mgmt/hourly_maint.php resume
This method of invocation will not attempt to process logs or record the last hour's traffic. It will not try to clear shared memory. All it will do is reload shared memory. Use this after you've run 'stop', before restarting your Web server.

The difference between this method and the 'start' method is that 'start' will force a recalculation of daily targets. If you've been running and you stop the server only to decide to restart it, you can use 'resume'.
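For reference, the per-mode descriptions above can be collected into one lookup table. This Python sketch is only a summary of what this section states for each invocation; the step names are made up for the example, and steps that the text does not mention for a mode are simply left out.

```python
# Steps each invocation performs, as described for each mode in this section.
# "@hour0" marks steps the plain cron run performs only during hour 0.
MODES = {
    "(no argument)": ["record_traffic", "daily_maint@hour0", "load_targets",
                      "process_logs", "simulation@hour0"],
    "start":         ["daily_maint", "load_targets"],
    "force_daily":   ["record_traffic", "daily_maint", "load_targets"],
    "reload":        ["record_traffic", "load_targets", "process_logs"],
    "stop":          ["record_traffic", "clear_shared_memory", "process_logs"],
    "resume":        ["load_targets"],
}

for mode, steps in MODES.items():
    print(f"{mode:<14} {', '.join(steps)}")
```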