Batch Processing - Development with the Force.com Platform: Building Business Applications in the Cloud, Third Edition (2014)

9. Batch Processing

You’ve learned two ways you can process database records within the Force.com platform: triggers and Visualforce controllers. Each has its own set of platform-imposed limitations, such as how many records can be created at one time. As you accumulate tens of thousands of records or more in your database, you might need to process more records than permitted by the governor limits applying to triggers and controllers.

Although Salesforce has simplified and incrementally relaxed governor limits in recent Force.com releases, triggers and Visualforce controllers are fundamentally not suited to processing large amounts of data in a multitenant environment. They are driven by user interaction, and must be limited to provide good performance to all users. The Force.com platform carefully controls its resources to maintain high performance for all, so resource-intensive tasks such as processing millions of records must be planned and executed over time, balanced with the demands of other customers.

Batch processing makes this possible, and Batch Apex is the Force.com feature that enables batch processing on the platform. With Batch Apex, data-intensive tasks are taken offline, detached from user interactions, the exact timing of their execution determined by Salesforce itself. In return for relinquishing some control, you, the developer, receive the ability to process orders of magnitude more records than you can in triggers and controllers.

In this chapter, you will learn how to use Batch Apex to create, update, and delete millions of records at a time. It is divided into five sections:

• Introduction to Batch Apex: Learn the concepts and terminology of Batch Apex, what it can do, and when you should and should not use it.

• Getting started with Batch Apex: Walk through a simple example of Batch Apex. Develop the code, run it, and monitor its execution.

• Testing Batch Apex: Like any other Apex code, proper test coverage is required. Learn how to kick off Batch Apex jobs within test code.

• Scheduling Batch Apex: Although Salesforce has the final say on when Batch Apex is run, you can schedule jobs to run using a built-in scheduler. Learn how to use the scheduling user interface and achieve finer-grained control in Apex code.

• Sample application: Enhance the Services Manager application by creating a scheduled batch process to identify missing timecards.


Note

The code listings in this chapter are available in a GitHub Gist at http://goo.gl/Iw8XT.


Introduction to Batch Apex

Prior to the availability of Batch Apex, the only options for processing data exceeding the governor limits of triggers and controllers were tricky workarounds to shift work off the platform. For example, you might have hundreds of thousands of records spanning multiple Lookup relationships to be summarized, deduplicated, cleansed, or otherwise modified en masse algorithmically. You could use the Web Services API to interact with the Force.com data from outside of Force.com itself, or you could use JavaScript to process batches of data inside the Web browser. These approaches are usually slow and brittle, requiring lots of code and exposing you to data quality problems over time due to gaps in error handling and recovery. Batch Apex allows you to keep the large, data-intensive processing tasks within the platform, taking advantage of its close proximity to the data and transactional integrity to create secure, reliable processes without the limits of normal, interactive Apex code. This section introduces you to concepts and guidelines for using Batch Apex to prepare you for hands-on work.

Batch Apex Concepts

Batch Apex is an execution framework that splits a large data set into subsets and provides them to ordinary Apex programs that you develop, which continue to operate within their usual governor limits. This means with some minor rework to make your code operate as Batch Apex, you can process data volumes that would otherwise be prohibited within the platform. By helping Salesforce break up your processing task, you are permitted to run it within its platform.

A few key concepts in Batch Apex are used throughout this chapter:

• Scope: The scope is the set of records that a Batch Apex process operates on. It can consist of 1 record or up to 50 million records. Scope is usually expressed as a SOQL statement, which is contained in a Query Locator, a system object that is blessedly exempt from the normal governor limits on SOQL. If your scope is too complex to be specified in a single SOQL statement, then writing Apex code to generate the scope (called an iterable scope) programmatically is also possible. Unfortunately, using Apex code dramatically reduces the number of records that can be processed because it is subject to the standard governor limit on records returned by a SOQL statement.

• Batch job: A batch job is a Batch Apex program that has been submitted for execution. It is the runtime manifestation of your code, running asynchronously within the Force.com platform. Because batch jobs run in the background and can take many hours to complete their work, Salesforce provides a user interface for listing batch jobs and their statuses, and to allow individual jobs to be canceled. This job information is also available as a standard object in the database. Although the batch job is not the atomic unit of work within Batch Apex, it is the only platform-provided level at which you have control over a batch process.

• Transaction: Each batch job consists of transactions, which are the governor limit-friendly units of work you’re familiar with from triggers and Visualforce controllers. By default, a transaction is up to 200 records. You can adjust this in code, downward or upward to a maximum of 2,000 records for a QueryLocator scope (with no upper limit for an iterable scope). When a batch job starts, the scope is split into a series of transactions. Each transaction is then processed by your Apex code and committed to the database independently. Although the same block of your code is being called upon to process potentially thousands of transactions, the transactions themselves are normally stateless. None of the variables within your code are saved between invocations unless you explicitly designate your Batch Apex code as stateful when it is developed. Salesforce doesn’t provide information on whether your transactions are run in parallel or serially, nor how they are ordered. Observationally, transactions seem to run serially, in order based on scope.

In the remainder of this section, these concepts are applied to take you one step closer to writing your own Batch Apex.

Understanding the Batchable Interface

To make your Apex code run as a batch, you must sign a contract with the platform. This contract takes the form of an interface called Batchable that must be implemented by your code. It requires that you structure your processing logic into the following three methods:

• start: The start method is concerned with the scope of work, the raw set of records to be processed in the batch. When a batch is submitted to Salesforce for processing, the first thing it does is invoke your start method. Your job here is to return a QueryLocator or an Iterable that describes the scope of the batch job.

• execute: After calling the start method, Force.com has the means to access all the records you’ve requested that it operate on. It then splits these records into sets of up to 200 records by default and invokes your execute method repeatedly, once for each set of records. At this point, your code can perform the substance of the batch operation, typically inserting, updating, or deleting records. Each invocation of execute is a separate transaction. If an uncaught exception occurs in a transaction, no further transactions are processed and the entire batch job is stopped.


Caution

Transactions that complete successfully are never rolled back. So, an error in a transaction stops the batch, but transactions executed up to that point remain in the database. Thinking of an overall Batch Apex job as transactional is dangerous, because this is not its default behavior. Additionally, you cannot use savepoints to achieve a single pseudotransaction across the entire batch job. If you must achieve jobwide rollback, this can be implemented in the form of a compensating batch job that reverses the actions of the failed job.


• finish: The finish method is invoked once at the end of a batch job. The job ends when all transactions in the scope have been processed successfully, or if processing has failed. Regardless of success or failure, finish is called. There is no requirement to do anything special in the method. You can leave the method body empty if no postprocessing is needed. It simply provides an opportunity for you to receive a notification that processing is complete. You could use this information to clean up any working state or notify the user via email that the batch job is complete.

With this initial walk-through of the Batchable interface, you can begin to apply it to your own trigger or Visualforce controller code. If you find a process that is a candidate to run as a batch, think about how it can be restructured to conform to this interface and thus take advantage of Batch Apex.

Applications of Batch Apex

Like any feature of Force.com, Batch Apex works best when you apply it to an appropriate use case that meshes well with its unique capabilities. The following list provides some guidelines when evaluating Batch Apex for your project:

• Single database object: Batch Apex is optimized to source its data from a single, “tall” (containing many records) database object. It cannot read data from other sources, such as callouts to Web services. If the records you need to process span many database objects that cannot be reached via parent-child or child-parent relationships from a single database object, you should proceed carefully. You will need to develop separate Batch Apex code for every database object. Although this is doable and you can share code between them, it creates maintenance headaches and quickly exposes you to the limitation of five active batch jobs per organization.

• Simple scope of work: Although Batch Apex allows the use of custom code to provide it with the records to process, it is most powerful when the scope of work is expressed in a single SOQL statement. Do some work up front to ensure that the source of data for your batch can be summed up in that single SOQL statement.

• Minimal shared state: The best design for a Batch Apex process is one where every unit of work is independent, meaning it does not require information to be shared with other units of work. Although creating stateful Batch Apex is possible, it is a less mature feature and more difficult to debug than its stateless counterpart. If you need shared state to be maintained across units of work, try to use the database itself rather than variables in your Apex class.

• Limited transactionality: If your batch process is a single, all-or-nothing transaction, Batch Apex is only going to get you halfway there. You will need to write extra code to compensate for failures and roll back the database to its original state.

• Not time-critical: Salesforce provides no hard guarantees about when Batch Apex is executed or its performance. If you have an application that has time-based requirements such that users will be prevented from doing their jobs if a batch does not run or complete by a specific time, Batch Apex might not be a good fit. A better fit is a process that must run within a time window on the order of hours rather than minutes.

These guidelines might seem stifling at first glance, but Batch Apex actually enables an impressive breadth of interesting applications to be developed that were previously impossible with other forms of Apex.

Getting Started with Batch Apex

You don’t need an elaborate use case or huge data volumes to get started with Batch Apex. This section walks you through the development of a simple Batch Apex class that writes debug log entries as it runs. The class is submitted for execution using the Force.com IDE and monitored in the administrative Web user interface. Two more versions of the Batch Apex class are developed: one to demonstrate stateful processing and the other an iterable scope. The section concludes with a description of important Batch Apex limits.

Developing a Batch Apex Class

Although the class in Listing 9.1 performs no useful work, it leaves a trail of its activity in the debug log. This is helpful in understanding how Force.com handles your batch-enabled code. It also illustrates the basic elements of a Batch Apex class, listed next:

• The class must implement the Database.Batchable interface. This is a parameterized interface, so you also need to provide a type name. Use SObject for batches with a QueryLocator scope, or any database object type for an Iterable scope.

• The class must be global. This is a requirement of Batch Apex classes.

Listing 9.1 Sample Batch Apex Code


global class Listing9_1 implements Database.Batchable<SObject> {
  global Database.QueryLocator start(Database.BatchableContext context) {
    System.debug('start');
    return Database.getQueryLocator(
      [SELECT Name FROM Project__c ORDER BY Name]);
  }
  global void execute(Database.BatchableContext context,
    List<SObject> scope) {
    System.debug('execute');
    for (SObject rec : scope) {
      Project__c p = (Project__c)rec;
      System.debug('Project: ' + p.Name);
    }
  }
  global void finish(Database.BatchableContext context) {
    System.debug('finish');
  }
}


Before actually running the code in the next subsection, review these implementation details:

• The start method defines the scope by returning a QueryLocator object constructed from an in-line SOQL statement. The SOQL statement returns all Project records in ascending order by the Name field. The SOQL statement can use parameters (prefaced with a colon) like any in-line SOQL in Apex code. Relationship queries are acceptable, but aggregate queries are not allowed. You can also pass a SOQL string into the getQueryLocator method, which allows the scope of the batch to be specified with dynamic SOQL.

• The execute method is called once per transaction with a unique group of up to 200 records (the default) from the scope. The records are provided in the scope argument.

• The finish method is called when all transactions have completed processing, or the batch job has been interrupted for any reason.

• The BatchableContext object argument in all three methods contains a method for obtaining the unique identifier of the current batch job, getJobId. This identifier can be used to look up additional information about the batch job in the standard database object AsyncApexJob. You can also pass this identifier to the System.abortJob method to stop processing of the batch job.
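
As an illustration of the last point, the following is a minimal sketch of a finish method that uses the job identifier to read the job's own AsyncApexJob record. The class name JobSummaryBatch is hypothetical and not part of the chapter's listings, and the selected fields are only one reasonable choice.

```apex
// Sketch only: JobSummaryBatch is a hypothetical class name.
global class JobSummaryBatch implements Database.Batchable<SObject> {
  global Database.QueryLocator start(Database.BatchableContext context) {
    return Database.getQueryLocator(
      [SELECT Name FROM Project__c ORDER BY Name]);
  }
  global void execute(Database.BatchableContext context,
    List<SObject> scope) {
    // No work is performed here; the focus is the finish method.
  }
  global void finish(Database.BatchableContext context) {
    // Look up this job's record using the identifier from the context.
    AsyncApexJob job = [SELECT Status, NumberOfErrors,
      JobItemsProcessed, TotalJobItems
      FROM AsyncApexJob
      WHERE Id = :context.getJobId()];
    System.debug('Job ' + job.Id + ' status ' + job.Status
      + ': ' + job.JobItemsProcessed + ' of ' + job.TotalJobItems
      + ' transactions, ' + job.NumberOfErrors + ' errors');
  }
}
```

The same query could feed an email notification instead of a debug statement.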

Working with Batch Apex Jobs

Batch Apex can be executed from a Visualforce page, scheduled to run automatically at specific times, or kicked off from within a trigger. But the easiest way to experiment with it is in the Execute Anonymous view in the Force.com IDE.

First, enable debug logging for your user in the Administration Setup area; select Monitoring, Debug Logs; and add your user to the list of monitored users by clicking the New button. This is no different than debugging any Apex class. Using the Execute Anonymous view, enter the code in Listing 9.2 and execute it. The batch is submitted and its unique job identifier displayed in the results box.

Listing 9.2 Running Sample Batch Apex Code


Listing9_1 batch = new Listing9_1();
Id jobId = Database.executeBatch(batch);
System.debug('Started Batch Apex job: ' + jobId);


The executeBatch method of the Database class does the work here. It queues the batch job for processing when Force.com is ready to do so. This could be in seconds or minutes; it is not specified. The Listing9_1 sample class is very simple, but in many cases you would need to pass arguments, either in the constructor or via setter methods, to adjust the behavior of a batch process. This is no different from any Apex class.
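
To make the point about arguments concrete, here is a minimal sketch of a batch class that receives its scope as a SOQL string in the constructor. The class name ConfigurableBatch and its query field are hypothetical, introduced only for illustration.

```apex
// Sketch only: ConfigurableBatch is a hypothetical class name.
global class ConfigurableBatch implements Database.Batchable<SObject> {
  // Set once at construction, before the job is submitted.
  private final String query;

  global ConfigurableBatch(String query) {
    this.query = query;
  }
  global Database.QueryLocator start(Database.BatchableContext context) {
    // Dynamic SOQL: the scope is whatever the caller passed in.
    return Database.getQueryLocator(query);
  }
  global void execute(Database.BatchableContext context,
    List<SObject> scope) {
    System.debug('execute: ' + scope.size() + ' records');
  }
  global void finish(Database.BatchableContext context) {
    System.debug('finish');
  }
}
```

From the Execute Anonymous view, such a class could be submitted with Database.executeBatch(new ConfigurableBatch('SELECT Name FROM Project__c ORDER BY Name')).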

To start a batch in response to a button click or other user interface action, apply the code shown in Listing 9.2 within a Visualforce custom controller or controller extension class. Now that you have submitted your batch job, it’s time to monitor its progress. In your Web browser, go to the Administration Setup area and select Monitoring, Apex Jobs. This page, shown in Figure 9.1, allows you to manage all the batch jobs in your Force.com organization.

Figure 9.1 Apex Jobs user interface

The single Listing9_1 job you executed should be visible. By this time, it is most likely in the Completed status, having few records to process. If Force.com is very busy, you might see a status of Queued. This means the job has not been started yet. A status value of Processing indicates the job is currently being executed by the platform. If a user interrupts the job by clicking the Abort link on this page, the job status becomes Aborted. A job with a Failed status means an uncaught exception was thrown during its execution. If you scroll to the right, you can also see the Apex Job Id, which should match the one returned by the Database.executeBatch method.

Take a closer look at the values in the Total Batches and Batches Processed columns. To avoid confusion, disregard the word Batches here. Total Batches is the number of transactions needed to complete the batch job. It is equal to the number of records returned by the start method divided by the scope (which defaults to 200), rounded up. The Batches Processed column contains the number of times the execute method of your Batch Apex class was invoked so far. As the processing proceeds, you should see it increment until it is equal to the Total Batches value. For example, if you have 200 or fewer Project records in your database, you should see a 1 in both columns when the batch is complete. If you have between 201 and 400 records, you should see 2 instead. If you have 1,500 records and the system is processing the 300th record, you should see a value of 8 in Total Batches and 1 in Batches Processed. All the information on the page is also accessible programmatically, contained in the standard object named AsyncApexJob.
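
The arithmetic and the programmatic access described above can be sketched in the Execute Anonymous view as follows. The variable names are illustrative, and the LIMIT of five rows is an arbitrary choice. TotalJobItems and JobItemsProcessed are the AsyncApexJob fields behind the Total Batches and Batches Processed columns.

```apex
// Expected number of transactions: record count divided by scope, rounded up.
Integer recordCount = 1500;  // example value from the text
Integer scopeSize = 200;     // default scope
Integer totalBatches = (recordCount + scopeSize - 1) / scopeSize;
System.debug('Expected Total Batches: ' + totalBatches);  // 8

// Read the most recent batch jobs from the AsyncApexJob standard object.
for (AsyncApexJob job : [SELECT Id, Status, TotalJobItems,
    JobItemsProcessed, NumberOfErrors
    FROM AsyncApexJob
    WHERE JobType = 'BatchApex'
    ORDER BY CreatedDate DESC LIMIT 5]) {
  System.debug(job.Id + ' ' + job.Status + ' '
    + job.JobItemsProcessed + '/' + job.TotalJobItems
    + ' errors: ' + job.NumberOfErrors);
}
```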

You have seen the batch job run its course. Proceed back to the Debug Logs page. Here you can review the job’s execution in detail, thanks to the System.debug statements throughout the code. Figure 9.2 is an example of what you might see there.

Figure 9.2 Debug logs from sample Batch Apex code

Four separate logs each cover a different aspect of the batch execution. Each is described next in the order they are executed, although this might not be the order shown on the Debug Logs page:

1. Results of evaluating the code in the Execute Anonymous view.

2. Invocation of the start method to prepare the data set for the batch.

3. Results of running the execute method, where the batch job performs its work on the subsets of the data.

4. All the transactions have been processed, so the finish method is called to allow postprocessing to occur.

These results are somewhat interesting, but appreciating what the batch is doing is hard without more data. You could add 200 more Project records, or you can simply adjust the scope to process fewer records per transaction. Listing 9.3 is an example of doing just that, passing the number 2 in as the scope, the second argument of the Database.executeBatch method. This indicates to Force.com that you want a maximum of two records per transaction in the batch job.

Listing 9.3 Running Sample Batch Apex Code with Scope Argument


Listing9_1 batch = new Listing9_1();
Id jobId = Database.executeBatch(batch, 2);
System.debug('Started Batch Apex job: ' + jobId);


After running this code in the Execute Anonymous view, return to the debug logs. You should now see two additional logs in the execute phase, for a total of three transactions of two records each. The three transactions are needed to process the six Project records.

Using Stateful Batch Apex

Batch Apex is stateless by default. That means for each execution of your execute method, you receive a fresh copy of your object. All fields of the class, static and instance, are initialized. If your batch process needs information that is shared across transactions, one approach is to make the Batch Apex class itself stateful by implementing the Stateful interface. This instructs Force.com to preserve the values of your instance variables between transactions. Static variables are not preserved and are reset for each transaction.

To try a simple example of stateful Batch Apex, create a new Apex class with the code in Listing 9.4.

Listing 9.4 Stateful Batch Apex Sample


global class Listing9_4
  implements Database.Batchable<SObject>, Database.Stateful {
  Integer count = 0;
  global Database.QueryLocator start(Database.BatchableContext context) {
    System.debug('start: ' + count);
    return Database.getQueryLocator(
      [SELECT Name FROM Project__c ORDER BY Name]);
  }
  global void execute(Database.BatchableContext context,
    List<SObject> scope) {
    System.debug('execute: ' + count);
    for (SObject rec : scope) {
      Project__c p = (Project__c)rec;
      System.debug('Project ' + count + ': ' + p.Name);
      count++;
    }
  }
  global void finish(Database.BatchableContext context) {
    System.debug('finish: ' + count);
  }
}


Take a moment to examine the differences between this class and the original, stateless version. Implementing the interface Database.Stateful is the primary change. The other changes are simply to provide proof in the debug log that the value of the count variable is indeed preserved between transactions.

Run the modified class with a scope of two records and examine the debug log. Although the log entries might not be ordered in any discernible way, you can see all the Project records have been visited by the batch process. Assuming you have six Project records in your database, you should see a total of six new debug log entries: one to begin the batch, one for the start method, three entries’ worth of transactions (of two records each), and one for the finish method.

Notice the value of the count variable throughout the debug output. It begins at 0 in the first transaction, increments by two as Project records are processed, and begins at 2 in the second transaction. Without implementing Database.Stateful, the count variable would remain between 0 and 2 for every transaction. The value of the count variable is 6 when the finish method is reached.

Using an Iterable Batch Scope

All of the sample code so far has used a QueryLocator object to define the scope of its batch. This enables up to 50 million records to be processed by the batch job, but requires that the scope be defined entirely using a single SOQL statement. This can be too limiting for some batch processing tasks, so the iterable batch scope is offered as an alternative.

The iterable scope allows custom Apex code to determine which records are processed in the batch. For example, you could use an iterable scope to filter the records using criteria that are too complex to be expressed in SOQL. The downside of the iterable approach is that standard SOQL limits apply. This means you can process a maximum of 50,000 records in your batch job, a dramatic reduction from the 50 million record limit of a QueryLocator object.

To develop a batch with iterable scope, you must first write code to provide data to the batch. There are two parts to this task:

• Implement the Iterator interface: The Iterator is a class for navigating a collection of elements. It navigates in a single direction, from beginning to end. It requires that you implement two methods: hasNext and next. The hasNext method returns true if additional elements are left to navigate to, false when the end of the collection has been reached. The next method returns the next element in the collection. Iterator classes must be global.

• Implement the Iterable interface: Think of this class as a wrapper or locator object that directs the caller to an Iterator. It requires a single global method to be implemented, called iterator, which returns an Iterator object. Like Iterator, classes implementing Iterable must be global.

You could write two separate classes, one to implement each interface. Or you can implement both interfaces in a single class, the approach taken in the code in Listing 9.5.

Listing 9.5 Project Iterator


global class ProjectIterable
  implements Iterator<Project__c>, Iterable<Project__c> {
  List<Project__c> projects { get; set; }
  Integer i;
  public ProjectIterable() {
    projects = [SELECT Name FROM Project__c ORDER BY Name];
    i = 0;
  }
  // True while elements remain to be visited.
  global Boolean hasNext() {
    return i < projects.size();
  }
  // Returns the current element and advances the position.
  global Project__c next() {
    i++;
    return projects[i-1];
  }
  // The class serves as its own Iterator.
  global Iterator<Project__c> iterator() {
    return this;
  }
}


With the implementation of the Iterable class ready for use, examine the code in Listing 9.6. It is very similar to the first Batch Apex example. The only notable differences are that the parameterized type has been changed from SObject to Project__c, and the start method now returns the Iterable class developed in Listing 9.5.

Listing 9.6 Iterable Batch Apex Sample


global class Listing9_6
  implements Database.Batchable<Project__c> {
  global Iterable<Project__c> start(Database.BatchableContext context) {
    System.debug('start');
    return new ProjectIterable();
  }
  global void execute(Database.BatchableContext context,
    List<Project__c> scope) {
    System.debug('execute');
    for (Project__c rec : scope) {
      System.debug('Project: ' + rec.Name);
    }
  }
  global void finish(Database.BatchableContext context) {
    System.debug('finish');
  }
}


Turn on the debug log for your user and run the Listing9_6 job. Examine the logs and see for yourself that you’ve accomplished the same work as the Listing9_1 code using an iterable scope instead of a QueryLocator object.
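
As a variation, the sketch below builds the iterable scope by filtering records in Apex code, the kind of criterion that may be awkward to express in SOQL. It relies on the Apex List type being usable wherever an Iterable is expected, so no custom Iterator class is written. The class name FilteredProjectBatch and the filter rule (names containing a digit) are hypothetical, chosen only for illustration.

```apex
// Sketch only: FilteredProjectBatch and its filter rule are hypothetical.
global class FilteredProjectBatch
  implements Database.Batchable<Project__c> {
  global Iterable<Project__c> start(Database.BatchableContext context) {
    List<Project__c> selected = new List<Project__c>();
    // Standard SOQL limits apply to this query, unlike a QueryLocator.
    for (Project__c p : [SELECT Name FROM Project__c ORDER BY Name]) {
      // Illustrative criterion: keep projects whose name contains a digit.
      if (Pattern.matches('.*\\d.*', p.Name)) {
        selected.add(p);
      }
    }
    // A List can be returned directly as the iterable scope.
    return selected;
  }
  global void execute(Database.BatchableContext context,
    List<Project__c> scope) {
    for (Project__c rec : scope) {
      System.debug('Project: ' + rec.Name);
    }
  }
  global void finish(Database.BatchableContext context) {
    System.debug('finish');
  }
}
```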

Limits of Batch Apex

You must keep in mind several important limits of Batch Apex:

• Future methods are not allowed anywhere in Batch Apex.

• Batch jobs are always run as the system user, so they have permission to read and write all data in the organization.

• The maximum heap size in Batch Apex is 12MB.

• Calling out to external systems using the HTTP object or webservice methods is limited to one for each invocation of start, execute, and finish. To enable your batch process to call out, make sure the code implements the Database.AllowsCallouts interface in addition to the standard Database.Batchable interface.

• Transactions (the execute method) run under the same governor limits as any Apex code. If you have intensive work to do in your execute method and worry about exceeding the governor limits when presented with the default 200 records per transaction, reduce the number of records using the optional scope parameter of the Database.executeBatch method.

• The maximum number of queued or active batch jobs within an entire Salesforce organization is five. Attempting to run another job beyond the five raises a runtime error. For this reason, you should tightly control the number of batch jobs that are submitted. For example, submitting a batch from a trigger is generally a bad idea if you can avoid it. In a trigger, you can quickly exceed the maximum number of batch jobs.
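
One way to respect the five-job limit is to count the queued and active jobs before submitting another. The following is a minimal sketch for the Execute Anonymous view, not a complete safeguard: another job could be submitted between the count and the call to executeBatch. The set of status values treated as active is an assumption.

```apex
// Count batch jobs that are waiting to run or currently running.
Integer activeJobs = [SELECT COUNT() FROM AsyncApexJob
  WHERE JobType = 'BatchApex'
  AND Status IN ('Queued', 'Preparing', 'Processing')];

if (activeJobs < 5) {
  Id jobId = Database.executeBatch(new Listing9_1());
  System.debug('Started Batch Apex job: ' + jobId);
} else {
  // Submitting now would raise a runtime error, so skip it.
  System.debug('Batch not submitted: ' + activeJobs
    + ' jobs already queued or active');
}
```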

Testing Batch Apex

Batch Apex can be tested like any Apex code, although you are limited to a single transaction’s worth of data (one invocation of the execute method). A batch job started within a test runs synchronously, and does not count against the organization’s limit of five batch jobs.

The class in Listing 9.7 tests the Batch Apex example from Listing 9.1 and achieves 100% test coverage. The annotation IsTest(SeeAllData=true) allows the test to access the data in the organization rather than requiring it to create its own test data. Alternatively, you could modify the code to omit the annotation and insert a few Project records to serve as test data.

Listing 9.7 Batch Apex Test


@IsTest(SeeAllData=true)
public with sharing class Listing9_7 {
  public static testmethod void testBatch() {
    Test.startTest();
    Listing9_1 batch = new Listing9_1();
    ID jobId = Database.executeBatch(batch);
    Test.stopTest();
  }
}


The test method simply executes the batch with the same syntax as you have used in the Execute Anonymous view. The batch execution is bookended with the startTest and stopTest methods. This ensures that the batch job is run synchronously and is finished at the stopTest method. This enables you to make assertions (System.assert) to verify that the batch performed the correct operations on your data.
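
The alternative mentioned earlier, a test that creates its own data and then makes assertions, might look like the following sketch. The class name Listing9_1Test is hypothetical, and the test assumes that a Project record can be inserted with only its Name populated, which depends on how the object is configured in your organization.

```apex
// Sketch only: Listing9_1Test is a hypothetical class name.
@IsTest
private class Listing9_1Test {
  static testmethod void testBatchCompletes() {
    // Assumption: Project__c has no other required fields.
    List<Project__c> projects = new List<Project__c>();
    for (Integer i = 0; i < 3; i++) {
      projects.add(new Project__c(Name = 'Test Project ' + i));
    }
    insert projects;

    Test.startTest();
    Id jobId = Database.executeBatch(new Listing9_1());
    Test.stopTest();  // the batch has finished by this point

    // Verify the outcome through the job's AsyncApexJob record.
    AsyncApexJob job = [SELECT Status, NumberOfErrors
      FROM AsyncApexJob WHERE Id = :jobId];
    System.assertEquals('Completed', job.Status);
    System.assertEquals(0, job.NumberOfErrors);
  }
}
```

Because Listing9_1 only writes debug output, the assertions check the job record. A batch that modifies data would normally assert on the modified records instead.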

Scheduling Batch Apex

Along with Batch Apex, Salesforce added a scheduler to the Force.com platform. This enables any Apex code, not just Batch Apex, to be scheduled to run asynchronously at regular time intervals. Prior to the introduction of this feature, developers had to resort to off-platform workarounds, such as invoking a Force.com Web service from an external system capable of scheduling jobs.

This section describes how to prepare your code for scheduling and how to schedule it from Apex and the administrative user interface.

Developing Schedulable Code

An Apex class that can be scheduled by Force.com must implement the Schedulable interface. The interface requires a single method, execute, which the platform calls when the scheduled time arrives. Code that is executed by the scheduler runs as the system user, so sharing rules or other access controls are not enforced. At most, ten classes can be scheduled at one time.

The class in Listing 9.8 enables the Batch Apex example from Listing 9.1 to be schedulable. It does this by implementing the Schedulable interface, which has a single method: execute. Although you could implement this interface directly on your batch class, the best practice recommended by Salesforce is to create a separate Schedulable class.

Listing 9.8 Schedulable Batch Apex


global class Listing9_8 implements Schedulable {
  global void execute(SchedulableContext sc) {
    Listing9_1 batch = new Listing9_1();
    Database.executeBatch(batch);
  }
}


Scheduling Batch Apex Jobs

To schedule a job using the user interface, go to the App Setup area and click Develop, Apex Classes. Click the Schedule Apex button. In Figure 9.3, the Listing9_8 class has been configured to run Saturday mornings at 11:00 a.m. between 7/10/2013 and 8/10/2013.

Figure 9.3 Schedule Apex user interface

To view and cancel scheduled jobs, go to the Administration Setup area and click Monitoring, Scheduled Jobs. This is shown in Figure 9.4 with the previously scheduled job. At this point, you can click Manage to edit the schedule, or Del to cancel it.

Figure 9.4 All Scheduled Jobs user interface

The same management of scheduled jobs available in the user interface can be automated using Apex code, as described next:

• Create a scheduled job: Use the System.schedule method to schedule a new job. This method requires three arguments: the name of the job, the schedule expression, and an instance of the class to schedule. The schedule expression is a string in crontab-like format. This format is a space-delimited list of the following arguments: seconds, minutes, hours, day of month, month, day of week, and year (optional). Each argument is a value specifying when the job is to run in the relevant units. All arguments except seconds and minutes permit multiple values, ranges, wildcards, and increments. For example, the schedule expression 0 0 8 ? * MON-FRI schedules the job for weekdays at 8:00 a.m. The 8 indicates the eighth hour, the question mark leaves day of month unspecified, the asterisk indicates all months, and the day of week is Monday through Friday. The time zone of the user scheduling the job is used to calculate the schedule.


Note

For a full reference to schedule expressions, refer to the Force.com Apex Code Developer’s Guide section on the subject, available at http://www.salesforce.com/us/developer/docs/apexcode/index_Left.htm#StartTopic=Content/apex_scheduler.htm.


• View a scheduled job: To get attributes of a scheduled job, such as when it will next be executed, query the standard object CronTrigger. It includes useful fields such as NextFireTime and PreviousFireTime, as well as StartTime and EndTime, which span the period from the time the scheduled job was created to its last occurrence as specified by the schedule expression.

• Delete a scheduled job: The System.abortJob method deletes scheduled jobs. It requires a single argument, the identifier of the job. This is the value returned by System.schedule when the job was created, and it is also returned by the SchedulableContext getTriggerId method. It can also be obtained from the Id field of a CronTrigger record.

• Modify a scheduled job: The standard object CronTrigger is read-only, so to modify a job, you must delete it first and then re-create it.
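
The four operations in this list can be sketched together in the Execute Anonymous view. Treat this as a minimal sketch rather than a definitive implementation: the job names 'Weekday Batch' and 'Weekday Batch (9 a.m.)' are hypothetical, and the code assumes the Listing9_8 class is already saved in your organization.

```apex
// Create: System.schedule returns the identifier of the new scheduled job.
// 'Weekday Batch' is a hypothetical job name chosen for illustration.
String jobId = System.schedule(
  'Weekday Batch', '0 0 8 ? * MON-FRI', new Listing9_8());

// View: CronTrigger holds the schedule expression and fire times of the job.
CronTrigger job = [
  SELECT Id, CronExpression, State, NextFireTime, PreviousFireTime,
    TimesTriggered
  FROM CronTrigger
  WHERE Id = :jobId
];
System.debug('Next run: ' + job.NextFireTime);

// Modify: CronTrigger is read-only, so abort the job and schedule it again.
// A different (also hypothetical) name is used for the replacement job.
System.abortJob(job.Id);
String newJobId = System.schedule(
  'Weekday Batch (9 a.m.)', '0 0 9 ? * MON-FRI', new Listing9_8());

// Delete: cancel the replacement job so the sketch leaves nothing scheduled.
System.abortJob(newJobId);
```

Keeping the identifier returned by System.schedule avoids a later lookup of the job by name when you need to inspect or cancel it.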

The code in Listing 9.9 can be executed in the Execute Anonymous view to schedule the Listing9_8 class to run monthly on the first day of every month at 1:00 a.m. in the user’s time zone. You can verify this by examining the scheduled job in the user interface or querying the CronTrigger object.

Listing 9.9 Sample Code to Schedule Batch Apex


System.schedule('Scheduled Test', '0 0 1 1 * ?', new Listing9_8());



Caution

After an Apex class is scheduled, its code cannot be modified until all of its scheduled jobs are deleted.


Sample Application: Missing Timecard Report

A common application of Batch Apex is to distill a large number of records down to a smaller, more digestible set of records that contain actionable information. In the Services Manager sample application, consultants enter timecards against assignments, specifying their daily hours for a weekly period. When consultants fail to enter their timecards in a timely manner, this can impact the business in many ways: Customers cannot be invoiced, and the budget of billable hours can be overrun without warning. With a large number of timecards, consultants, and projects, manually searching the database to identify missing timecards isn’t feasible. This information needs to be extracted from the raw data.

The management users of the Services Manager have requested a tool that enables them to proactively identify missing timecards. They would like to see a list of the time periods and the assignments that have no timecard so that they can work with the consultants to get their time reported. This information could later be used as the basis of a custom user interface, report or dashboard component, or automated email notifications to the consultants.

This section walks through the implementation of the missing timecard report. It consists of the following steps:

1. Create a custom object to store the list of missing timecards.

2. Develop a Batch Apex class to calculate the missing timecard information.

3. Run through a simple test case to make sure the code works as expected.

Creating the Custom Object

Your Services Manager users in supervisory positions have asked to see missing timecards of their consultants. Specifically, they want the dates of missing timecards, the offending consultants, and their assigned projects. Two fields are necessary to provide the requested information: the assignment, which automatically includes the resource and project as references, and the week ending date that lacks a timecard for the assignment.

Create a new custom object to store this information, naming it Missing Timecard. Add a lookup field to Assignment and a Date field named Week_Ending__c to mirror the field of the same name in the Timecard object. Create a custom tab for this object as well. When you’re done, the Missing Timecard object definition should resemble Figure 9.5.

Figure 9.5 Missing timecard custom object definition
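
To illustrate how the assignment reference carries the related information, the following sketch lists Missing Timecard records together with the name of each assignment. It assumes the lookup field to Assignment has the API name Assignment__c, which gives the relationship name Assignment__r. Further traversal to the resource or project depends on field names in your Assignment object and is not shown.

```apex
// Assumption: the lookup to Assignment is named Assignment__c, so the
// parent record is reachable through the Assignment__r relationship.
List<Missing_Timecard__c> rows = [
  SELECT Week_Ending__c, Assignment__r.Name
  FROM Missing_Timecard__c
  ORDER BY Week_Ending__c
];
for (Missing_Timecard__c row : rows) {
  System.debug(row.Assignment__r.Name +
    ' has no timecard for week ending ' + row.Week_Ending__c);
}
```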

Developing the Batch Apex Class

A good design approach for Batch Apex is to consider the input schema, output schema, and the most direct algorithm to transform input to output. You’ve already designed the output schema based on what the users want to see: the Missing Timecard object. That leaves the input and the algorithm to be designed.

Consider the algorithm first, which drives the input. The algorithm loops through assignments that are not in Tentative or Closed status. It builds a list of Week Ending dates of valid timecards (in Submitted or Approved status) in the same project as the assignment. It then cycles through the weeks between the start and end dates of the assignment, up to the current day. If a week ending date is not found in the list of timecard Week Ending dates, it is considered missing and its assignment and date are added to the Missing Timecard object.
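
The week-cycling step of the algorithm can be sketched in isolation. The class WeekEndingSketch and its method weekEndings are hypothetical names used only for illustration, and the sketch assumes a locale in which toStartOfWeek returns a Sunday, as it does in the United States. Because it declares a static method, save it as its own class rather than running it in the Execute Anonymous view.

```apex
// Hypothetical helper for illustration only.
public class WeekEndingSketch {
  // Returns the week ending dates starting with the week that contains
  // startDate, stopping before the week that contains endDate and before
  // the current day.
  public static List<Date> weekEndings(Date startDate, Date endDate) {
    // toStartOfWeek depends on the user's locale. Adding 6 days reaches
    // a Saturday only when the week starts on Sunday (assumed here).
    Date first = startDate.toStartOfWeek().addDays(6);
    Date last = endDate.toStartOfWeek().addDays(6);
    List<Date> result = new List<Date>();
    for (Date d = first; d < last && d < Date.today(); d = d.addDays(7)) {
      result.add(d);
    }
    return result;
  }
}
```

For an assignment running from 4/1/2015 to 4/30/2015, a user in a Sunday-first locale gets four dates: 4/4/2015, 4/11/2015, 4/18/2015, and 4/25/2015. Any of these dates without a matching timecard is reported as missing.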

With the algorithm nailed down, move on to the input. The key to a concise, maintainable Batch Apex class is formulating the right SOQL query to provide the input records. Most of the effort is in finding the optimal SObject to base the query on. If you pick the wrong SObject, you could be forced to augment the input in your execute method, resulting in more queries, this time subject to SOQL governor limits.

It is clear from the algorithm that the batch input must include Assignment records and corresponding Timecard records. But Assignment and Timecard each model a separate many-to-many relationship, and they have no direct relationship with each other.

Although basing the query on the Assignment or Timecard objects might be tempting, this leads to a weak design. For example, if you query the assignments in the start method and then augment this with Timecard records in the execute method, you need to build dynamic SOQL to optimize the second query given the input Assignment records. This is a sure sign that you should continue to iterate on the design.

When you switch tracks and design the batch around the Project object, life becomes easier. From Project, you have access to Timecard and Assignment records at the same time. The code in Listing 9.10 implements the missing timecard feature with a query on Project as the input.

Listing 9.10 MissingTimecardBatch


global class MissingTimecardBatch
  implements Database.Batchable<SObject> {
  global Database.QueryLocator start(Database.BatchableContext context) {
    return Database.getQueryLocator([ SELECT Name, Type__c,
      (SELECT Name, Start_Date__c, End_Date__c
        FROM Assignments__r WHERE Status__c NOT IN ('Tentative', 'Closed')),
      (SELECT Status__c, Week_Ending__c
        FROM Timecards__r
        WHERE Status__c IN ('Submitted', 'Approved'))
      FROM Project__c
    ]);
  }
  global void execute(Database.BatchableContext context,
    List<SObject> scope) {
    List<Missing_Timecard__c> missing = new List<Missing_Timecard__c>();
    for (SObject rec : scope) {
      Project__c proj = (Project__c)rec;
      Set<Date> timecards = new Set<Date>();
      if (proj.Assignments__r != null) {
        for (Assignment__c assign : proj.Assignments__r) {
          if (proj.Timecards__r != null) {
            for (Timecard__c timecard : proj.Timecards__r) {
              timecards.add(timecard.Week_Ending__c);
            }
          }
          /** Timecards are logged weekly, so the Week_Ending__c field is always
           * a Saturday. We need to convert an assignment, which can contain an
           * arbitrary start and end date, into a start and end period expressed
           * only in terms of Saturdays. To do this, we use the toStartOfWeek
           * method on the Date object, and then add 6 days to reach a Saturday.
           */
          Date startWeekEnding =
            assign.Start_Date__c.toStartOfWeek().addDays(6);
          Date endWeekEnding =
            assign.End_Date__c.toStartOfWeek().addDays(6);
          Integer weeks = 0;
          while (startWeekEnding.addDays(weeks * 7) < endWeekEnding) {
            Date d = startWeekEnding.addDays(weeks * 7);
            if (d >= Date.today()) {
              break;
            }
            if (!timecards.contains(d)) {
              missing.add(new Missing_Timecard__c(
                Assignment__c = assign.Id,
                Week_Ending__c = d));
            }
            weeks++;
          }
        }
      }
    }
    insert missing;
  }
  global void finish(Database.BatchableContext context) {
  }
}


Testing the Missing Timecard Feature

To achieve adequate test coverage, add unit tests to the Batch Apex class that create assignments and timecards in various combinations, kick off the batch, and then query the Missing Timecard object and verify the presence of the correct data.
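
One such unit test might look like the following sketch. It relies on several assumptions that you should check against your own schema: that the lookup from Assignment and Timecard to Project is named Project__c, that the Project Name field is writable, and that no other fields are required to insert these records. The class and method names are hypothetical. Calling Test.stopTest forces the batch job started after Test.startTest to finish before the assertions run.

```apex
@isTest
private class MissingTimecardBatchTest {
  @isTest
  static void reportsWeeksWithoutTimecards() {
    // Assumption: these records need no fields beyond the ones set here.
    Project__c proj = new Project__c(Name = 'Test Project');
    insert proj;
    Assignment__c assign = new Assignment__c(
      Project__c = proj.Id,  // assumed name of the lookup to Project
      Status__c = 'Scheduled',
      Start_Date__c = Date.newInstance(2015, 4, 1),
      End_Date__c = Date.newInstance(2015, 4, 30));
    insert assign;

    // Cover the second week of the assignment with an approved timecard.
    Date covered = assign.Start_Date__c.toStartOfWeek().addDays(13);
    insert new Timecard__c(
      Project__c = proj.Id,  // assumed name of the lookup to Project
      Status__c = 'Approved',
      Week_Ending__c = covered);

    Test.startTest();
    Database.executeBatch(new MissingTimecardBatch());
    Test.stopTest();  // the batch job completes here

    List<Missing_Timecard__c> results = [
      SELECT Week_Ending__c
      FROM Missing_Timecard__c
      WHERE Assignment__c = :assign.Id
    ];
    // Four weeks fall within the assignment and one has a timecard.
    System.assertEquals(3, results.size());
    for (Missing_Timecard__c result : results) {
      System.assertNotEquals(covered, result.Week_Ending__c);
    }
  }
}
```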

You can also test informally from the user interface and the Execute Anonymous view in the Force.com IDE. For example, create an Assignment record for the GenePoint project, starting 4/1/2015 and ending 4/30/2015 for Rose Gonzalez, and set its status to Scheduled. Enter a timecard for her for week ending 4/11/2015 on the GenePoint project, and set its status to Approved. Now run the MissingTimecardBatch from Force.com using the code in Listing 9.11.

Listing 9.11 Running MissingTimecardBatch


Database.executeBatch(new MissingTimecardBatch());


Check the Apex Jobs page to monitor the progress of your batch job. When it’s done, visit the Missing Timecard tab. You should see three Missing Timecard records for the GenePoint assignment, with the dates 4/4/2015, 4/18/2015, and 4/25/2015. The 4/11/2015 date is not included because a valid Timecard record exists for it.
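
The progress of a batch job can also be read in Apex by querying the standard object AsyncApexJob. The following is a minimal sketch for the Execute Anonymous view. Use it in place of Listing 9.11 rather than in addition to it, because it starts the batch job itself and a second run would produce duplicate Missing Timecard records.

```apex
// Start the batch and keep the job identifier that executeBatch returns.
Id jobId = Database.executeBatch(new MissingTimecardBatch());

// Read the current state of the job from AsyncApexJob.
AsyncApexJob job = [
  SELECT Status, JobItemsProcessed, TotalJobItems, NumberOfErrors
  FROM AsyncApexJob
  WHERE Id = :jobId
];
System.debug(job.Status + ': ' + job.JobItemsProcessed + ' of ' +
  job.TotalJobItems + ' batches processed, ' +
  job.NumberOfErrors + ' errors');
```

Immediately after submission, the job has usually not started processing yet. To check on it later, note the job identifier and rerun only the query with that value.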

To try some more test scenarios, first clear the Missing Timecard records so you don’t have to sift through duplicates. The code in Listing 9.12 is an easy way to do so, and you can run it from the Execute Anonymous view.

Listing 9.12 Reset Results of MissingTimecardBatch


delete [ SELECT Id FROM Missing_Timecard__c ];


Summary

Batch processing in Force.com enables you to query and modify data in volumes that would otherwise be prohibited by governor limits. In this chapter, you’ve learned how to develop, test, and schedule batch jobs, and applied batch processing to the real-world problem of identifying missing database records.

When using Batch Apex in your own applications, consider these key points:

• Batch Apex is optimized for tasks with inputs that can be expressed in a single SOQL statement and that do not require all-or-nothing transactional behavior.

• With its limit of five active batch jobs per organization, one input data set per job, and a lack of precise control over actual execution time, Batch Apex is the nuclear option of Force.com data processing: powerful, but challenging to build and subject to proliferation problems. Use it sparingly, when all other options are exhausted. If triggers or Visualforce controllers can do the same job given expected data volumes, consider them first.

• You can implement the Schedulable interface to run any Apex code at regular time intervals, not just Batch Apex. Schedules can be managed via the administrative user interface and in Apex code.