

PART 3


USING OPTIMIZATION—PRACTICAL ESSENTIALS

7


Numerical Essentials

7.1 Overview

In this chapter, you will be exposed to some of the most important practical issues in optimization. Interestingly, most optimization books and courses leave students to discover many of these issues by chance, so a broad set of important practical topics goes unaddressed. We are referring here to numerical conditioning issues, which strongly influence the success of computational optimization.

We will not engage in highly mathematical and theoretical issues of numerical optimization. Instead, we will learn about the tangible and practical issues that directly affect our success in formulating and solving optimization problems. Fortunately, for most of this chapter, all we need is the knowledge of first-year college mathematics.

Keeping these issues in mind will often make the difference between success and failure. That is: (i) easily applying optimization successfully, or (ii) experiencing great frustration in trying, unsuccessfully, to obtain an adequate solution. In addition, we will learn how to control the resulting accuracy of our results, a critical issue in practice. The topics explored here include: numerical conditioning, scaling, finite differences, automatic differentiation, termination criteria, and sensitivities of optimal solutions, as well as examples that illustrate how to handle these issues in practice (Refs. [1, 2, 3]).

7.2 Numerical Conditioning—Algorithms, Matrices and Optimization Problems

We begin by asking the following questions: What is numerical conditioning? And how does it relate to optimization? As we have learned, optimization depends on the numerical evaluation of the performance of the system being optimized. This performance evaluation typically involves coding, simulation, or software-based mathematical analysis, often also involving matrix manipulations (e.g., involving Hessians). For example, linear programming involves extensive matrix manipulations. Therefore, understanding the numerical properties of the matrices and of the algorithms used in optimization codes is important. From a practical point of view, we can think of a numerically well-conditioned problem or matrix as one that lends itself to easy numerical computation. Conversely, we can think of a numerically ill-conditioned problem or matrix as one that lends itself to difficult numerical computation. A well-conditioned problem is also said to be well-posed. Numerical conditioning can also refer to the property of a particular algorithm, as you will see shortly. A well-conditioned algorithm is likely to converge relatively easily, while an ill-conditioned algorithm may converge after many more iterations or may not converge at all. In other words, numerical conditioning may refer to the facility with which an algorithm converges. Numerical conditioning may also have practical consequences on the quality or accuracy of the resulting solution.

The immediate questions that come to mind at this point are: How do we know if an algorithm, a matrix, or an optimization problem is well-conditioned or not? How do we quantify numerical conditioning? Well, there is some good news:

(i) Algorithms: Since this book is primarily concerned with the practical aspects of formulating and applying optimization, we will not be learning about how to make algorithms well-conditioned. This is an advanced topic of numerical computation. Fortunately, if we use reputable optimization codes, the algorithmic numerical conditioning properties are usually adequate.

(ii) Matrices: For our purpose, the numerical conditioning of a matrix is quantified by the condition number. For symmetric matrices (e.g., Hessians (Eq. 2.42)), the condition number is the square of the ratio of the highest to the lowest eigenvalue. The case of non-symmetric matrices involves singular values, which is an advanced topic that does not directly concern us. A condition number with an order-of-magnitude of one is desirable, while a much higher condition number is a concern. The numerical properties of matrices play an important role in optimization algorithms. Sometimes an optimization run that does not converge will report that the "Hessian is ill-conditioned" (see Sec. 2.5.4 for the definition of the Hessian). The strong relevance of a function's Hessian becomes fully evident in our study of the more advanced aspects of optimization presented in Part IV of this book; specifically, Chapters 12 and 13.

(iii) Optimization Problems: Posing our problems well is critically important. Two theoretically equivalent problems can have radically different numerical properties, as described later. Fortunately, we can generally deal with problem conditioning through proper scaling, which we will study in Sec. 7.3.

7.2.1 Reasons Why the Optimization Process Sometimes Fails

There are many reasons why optimization runs sometimes fail to converge to an adequate solution. The following are the prevailing ones that you should keep in mind:

1. The problem has a coding bug - a software/programming error;

2. The problem is ill-conditioned - poorly scaled;

3. The problem is incorrectly formulated (e.g., missing a constraint);

4. The problem posed does not reflect a physical design that is realistic (you can't fool mother nature!);

5. The algorithm used is not appropriate, or not sufficiently robust, for the problem at hand.

In the event of non-convergence, or of convergence to a solution that is not to one's liking, the above items should be explored - roughly in the order presented. Next, we briefly comment on each of the above items. (1) Regarding coding errors, one simply needs to employ the debugging strategy of personal choice. This issue concerns computer coding in general, and is not exclusive to optimization. (2) The problem of scaling is addressed in detail later in this chapter. (3) As far as the formulation of the optimization problem is concerned, the material in this book is of direct help; in particular, the previous chapter on multiobjective optimization. Proper formulation is also an issue of common sense. Failing to include a constraint, for example, could yield a design that is not desirable. (4) The problem of seeking an unrealistic design is one that should be carefully examined. Often, relaxing the constraints will allow the search process to explore physically feasible designs. Another possible cause for unknowingly seeking an unrealistic design is the inappropriateness of the objective function (e.g., wrong weights in the weighted sum approach). As a final example, we could be trying to design a small table to support an elephant in a way that is impossible. All the modeling equations and optimization formulation issues might be seemingly fine, but we are simply asking for the impossible. (5) The final item presented concerns the appropriateness of the algorithm. For example: (i) the algorithm might not be sufficiently robust for problems of poor numerical conditioning; (ii) the algorithm might not be appropriate for problems of large dimensions; (iii) the algorithm might be limited to solving specific types of problems: continuous, discrete, or integer variables; (iv) the algorithm might not work well for noisy (i.e., non-smooth) objective functions or constraints.

Before we present the approaches to address numerical conditioning issues in optimization, it is important that we first learn about certain numerical problems that can occur independently of the optimization process. Specifically, we find that the matrices themselves can be problematic, and so can the way that they are used in a given algorithm. These two issues are addressed next.

7.2.2 Exposing Numerical Conditioning Issues—Algorithms and Matrices

Through simple examples, we illustrate how we must concern ourselves with matrices and algorithm issues, in addition to those directly related to optimization. We provide an example of how numerical conditioning issues can affect us in a seemingly simple case. This telling example will sensitize us to the critical nature of numerical issues. For the sake of simplicity of presentation, we only use a 3 × 3 matrix. We also use a numerical computation that is simple and readily understood. In practice, matrices are much larger and computations are much more complex. In spite of the simplicity of the present case, the numerical difficulties presented are quite serious.

Consider Matrix A, which depends on α, given by

A =

(7.1)

The three eigenvalues of Matrix A can be evaluated as: 1/α, 1, and α. As a result, the condition number of A (the square of the ratio of the highest to the lowest eigenvalue, since A is symmetric) is 1/α^4 when α ≤ 1 and α^4 when α ≥ 1. Therefore, we should expect that for a very low or a very high value of α, we may experience numerical difficulties, particularly when the algorithm within which it is being used is not well conditioned.
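To make this concrete, the following MATLAB sketch builds a symmetric 3 × 3 matrix with the stated eigenvalues and examines its conditioning. Since the printed entries of Eq. 7.1 are not reproduced here, the matrix below is only an illustrative stand-in with the same eigenvalues; note also that MATLAB's cond returns the eigenvalue ratio itself, whose square is the measure used in this section.

alpha = 1e-3;                          % try very small or very large values
[Q,~] = qr(randn(3));                  % random orthogonal matrix
A = Q*diag([1/alpha, 1, alpha])*Q';    % symmetric; eigenvalues {1e3, 1, 1e-3}
cond(A)                                % eigenvalue ratio: 1/alpha^2 = 1e6
cond(A)^2                              % squared ratio used here: 1/alpha^4 = 1e12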

To explore the numerical properties of Matrix A, let n be any positive integer, and consider the expressions

(7.2)

(7.3)

where I is the identity matrix. Using elementary linear algebra, we can indeed verify that both A1 and A2 are identically equal to the 3 × 3 zero matrix. We can further write the scalar equations

\[ \|A_1\| = 0 \]   (7.4)

\[ \|A_2\| = 0 \]   (7.5)

where we use the maximum norm defined as ||M|| = max{|m_ij|}, with m_ij denoting the ij-th entry of Matrix M.

We make the important observation that the zero answers in Eqs. 7.4 and 7.5 are exact only from a theoretical standpoint. When we compute A1 and A2 using a computer, we immediately observe that the numerical results depart markedly from the theoretical answers. To illustrate this important point, we present Table 7.1, where the incorrect answers are in bold face. In this table, we vary the parameter α in Matrix A, as well as the power n in the A1 and A2 expressions.

Table 7.1. Ill-Conditioned Matrices and Algorithms

We further make four specific observations: (1) Even for high values of the condition number Cn, the quantity A1 is evaluated accurately. In fact, A1 is evaluated accurately for all the cases presented in Table 7.1. (2) Even for values of the condition number that are less than 100 (α = 0.4), the quantity A2 is unacceptably inaccurate for n = 50. (3) Even though A1 and A2 are theoretically both equal to zero, the computation of A1 is more numerically robust than that of A2. Finally, (4) in general, the stability of the algorithm and the condition numbers of the matrices involved can greatly impact the accuracy of the solutions obtained. For example, in optimization, how we pose a constraint (numerically) can impact the success of the optimization. Next, we expose the critical need for scaling in the following example.
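A hedged MATLAB sketch of this type of experiment follows. Since the printed forms of A1 and A2 (Eqs. 7.2 and 7.3) are not reproduced here, we use two representative expressions that are both theoretically zero: one evaluated in a numerically benign order, and one evaluated through enormous intermediate entries.

alpha = 0.4;  n = 50;                  % condition number below 100, per the text
[Q,~] = qr(randn(3));
A  = Q*diag([1/alpha, 1, alpha])*Q';   % symmetric, with the eigenvalues of Eq. 7.1
I3 = eye(3);
A1 = (A/A)^n - I3;                     % A/A is computed stably (nearly I) before powering
A2 = A^n*inv(A)^n - I3;                % huge intermediate entries amplify roundoff
max(abs(A1(:)))                        % near machine precision: an "accurate zero"
max(abs(A2(:)))                        % can be wildly nonzero for n = 50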

7.2.3 Exposing Numerical Conditioning Issues—Optimization Problems

We provide a simple example of an optimization problem for which proper scaling is essential. We begin by stating the general optimization problem formulation as follows.

PROB-7.2-GOPF: General Optimization Problem Formulation

\[ \min_{x} \; f(x) \]   (7.6)

subject to

\[ g(x) \leq 0 \]   (7.7)

\[ h(x) = 0 \]   (7.8)

\[ x_{lb} \leq x \leq x_{ub} \]   (7.9)

Next, we consider the following seemingly trivial optimization problem given by

(7.10)

subject to

(7.11)

(7.12)

Using the techniques presented later in this chapter on scaling, we find that the solution to this problem is x = 10^-6 × {3.083, 1.541}. The important message here is that, without proper scaling, the solution produced by MATLAB could be deemed incorrect. Specifically, MATLAB converged to the solution x = 10^-6 × {4.948, 2.474}, which has strongly inaccurate values of x. (We note that different MATLAB settings may lead to different, equally erroneous, answers.) The danger here is that there is no indication that we are dealing with an incorrect solution. This simple example points to the importance of using various strategies to increase our confidence in the solutions obtained by optimization codes. Indeed, one of the more important ways to increase confidence is to implement proper scaling, as mentioned earlier in this section and presented in the next section.

7.3 Scaling and Tolerances for Design Variables, Constraints and Objective Functions

This section deals directly and explicitly with the all-important subject of scaling in optimization. We all understand that it is important to learn how to formulate an optimization problem in particular cases, which may range from finance to engineering, and to learn useful theoretical aspects of optimization. However, unless we also pay special attention to the numerical issues of scaling, we may have serious trouble in practice. Fortunately, the information that is presented in this section provides the essence of what we need to know in practice.

Specifically, in practice, we need to know about scaling for (i) design variables, (ii) objective functions, and (iii) constraints. Depending on the problem, one, two, or all three of the above items may be critical. What do we mean by "critical"? We mean that with scaling, we may obtain faster convergence to the correct minimum. Without scaling, we may experience non-convergence or premature termination at a poor design, and not even realize that this is the case.

For convenience, we discuss scaling in the context of the MATLAB optimization routine fmincon, for which the optimization problem is posed as follows.

PROB-7.2-MATLAB: fmincon Optimization Model

\[ \min_{x} \; f(x) \]   (7.13)

subject to

\[ c(x) \leq 0 \]   (7.14)

\[ ceq(x) = 0 \]   (7.15)

\[ A\,x \leq b \]   (7.16)

\[ Aeq\,x = beq \]   (7.17)

\[ LB \leq x \leq UB \]   (7.18)

where (i) A and Aeq are matrices that define the left-hand-sides of the linear constraints, (ii) b and beq are vectors that define the right-hand-sides of the linear constraints, (iii) c(x) and ceq(x) are vectors of nonlinear constraints, and (iv) LB and UB denote bounds on the design variables. Equation 7.18 defines what are called the side constraints. It is useful to note the similarities between PROB-7.2-GOPF and PROB-7.2-MATLAB. The latter provides the flexibility of differentiating between linear and nonlinear constraints. Doing so can provide significant numerical efficiencies. That is, it is numerically efficient to put the linear constraints in Eqs. 7.16 and 7.17, even though linear constraints can also be put in Eqs. 7.14 and 7.15; however, the reverse is not true. This MATLAB notation is the one used in Figs. 5.4 and 7.1. The former has no scaling, while the latter does.
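For reference, the generic fmincon call lines up with Eqs. 7.13 to 7.18 as follows, where fun and nonlcon denote the user-supplied objective and nonlinear-constraint modules:

[xopt,fopt] = fmincon(@fun,x0,A,b,Aeq,beq,LB,UB,@nonlcon)
% fun:      returns f(x)                    (Eq. 7.13)
% A, b:     linear inequalities A*x <= b    (Eq. 7.16)
% Aeq, beq: linear equalities Aeq*x = beq   (Eq. 7.17)
% LB, UB:   side constraints LB <= x <= UB  (Eq. 7.18)
% nonlcon:  returns [c(x), ceq(x)]          (Eqs. 7.14 and 7.15)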

Figure 7.1. Optimization Flow Diagram—with Scaling

Figure 7.1 illustrates the optimization process using fmincon (or any code that conforms with the PROB-7.2-MATLAB problem definition). The general scaling process is discussed with the aid of Fig. 7.1. Scaling is discussed in terms of its three generic components: (i) design variables, (ii) objective functions, and (iii) constraints. Upon examining Fig. 7.1, we make the following important observations: (i) The variables that immediately enter and leave the optimization routine (fmincon) are scaled variables, and (ii) these variables are immediately unscaled if/when they are used in the main file, the objective function, or the constraint function modules. Stated differently, the variables are scaled as they enter fmincon, and they are unscaled as they leave fmincon for use anywhere else. The variables are scaled as they enter fmincon because the optimization routine needs to use variables in the desirable numerical ranges to operate properly. However, the variables must regain their unscaled values when they are used in the actual model modules. Figure 7.1 depicts both the unscaling processes (triangle) and the scaling processes (circle). Note that in this chapter, ( )^s denotes the scaled version of the variable ( ), except in the cases of α^s, β^s, and γ^s, which are prescribed scaling constants.

In Fig. 7.1, we describe where scaling takes place for the design variables. We note that, in a similar fashion, scaling also takes place for the quantities c(x), ceq(x) and f(x), as shown in Fig. 7.1. The quantity I_s in Fig. 7.1 represents the set of scaling parameters that are used for the scaling and unscaling of various quantities. This set includes α^s, β^s, and γ^s, which are all defined in this section. Next, we discuss the important issue of numerical accuracy representation, which will be followed by the development of scaling approaches.

7.3.1 Understanding the Accuracy of the Reported Results

An important aspect of our work is to understand (i) the accuracy of the results we obtain, and (ii) how to report these results. These issues entail two related components. The first involves understanding and controlling the accuracy of the results produced by the optimization code using scaling. The second involves understanding the inherent accuracy of the physics-based models used in the optimization. We discuss each next.

Accuracy of Results of Optimization Code:

As we will learn in this section, we have the ability to control the accuracy of the results that the optimization code produces. We can do so with scaling, in conjunction with prescribing certain parameter settings in the optimization code. As we examine the numbers that the optimization code produces, we keep in mind that the computer has in its memory many more digits than we report for that number. It is up to us to decide/understand how many digits of the number are significant (i.e., meaningful).

As we report the numbers, we may choose to either put zeros in place of the meaningless digits or use exponent notation. For example, we may use 12,345,000, or 1.2345 × 10^7, for five significant digits (see Table 7.2), even though the first 8 digits in the computer memory might be 12,345,678. Note that as we decided to report five significant digits of the computer results, we did so simply because we determined that the optimization converged to within five significant digits. This convergence does not tell us that these five significant digits are physically meaningful. Next, we discuss this physical aspect in greater detail.
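As a small illustration, zeroing the meaningless digits and printing five significant digits in MATLAB might look as follows (using the value discussed above):

x  = 12345678;             % digits beyond the 5th are deemed meaningless
xr = floor(x/10^3)*10^3    % 12,345,000: meaningless digits replaced by zeros
sprintf('%.4e', xr)        % '1.2345e+07': exponent notation, five significant digits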

Table 7.2. Numerical Accuracy Definitions

Generic Number     DA       NSD
0.012,3            10^-4    3
100.012,345,6      10^-7    10
123,000            10^3     3

DA: Decimal Accuracy
NSD: No. of Significant Digits of Accuracy

Accuracy of Results with the Physics-Based Model:

In addition to the accuracy of the optimization code results discussed above, we must also concern ourselves with the accuracy of the physics-based models being used. For example, if we obtain the optimal value of the deflection of a beam as 1.234,56 × 10^-6, perhaps only 3 or 4 of these digits might be physically meaningful, even though all 6 might be significant in the optimization code convergence.

If we wish to know how many digits are physically meaningful, we need to ask the structural engineer who developed the model. He or she might tell us that the deflection model that we are using only provides an accuracy of approximately four significant digits. In this case, we would only report to the outside world the number 1.234 × 10^-6. We conclude by noting that if (i) we report n_r significant digits of accuracy, (ii) the physics-based model produces n_pm significant digits of accuracy, and (iii) the optimization code converges to within n_op significant digits of accuracy, then it is advisable to use n_r such that

\[ n_r \leq \min(n_{pm},\, n_{op}) \]   (7.19)

where n_r need not be the same for each resulting number (e.g., design variable). That is, the accuracy for deflection might be different from the accuracy for stress in a given model. Next, we present the scaling approach (the how) for design variables, objective functions, and behavioral constraints.

7.3.2 Design Variable Scaling—Order of Magnitude (DV-1)

Design variable scaling is often desirable anytime the order of magnitude of the design variables is much higher or much lower than one. Conversely, when the actual/unscaled values of the design variables are of the desirable order of magnitude, no scaling is typically necessary. As illustrated in Fig. 7.1, the scaling of the design variables takes place in three places: (i) the main file, (ii) the objective function evaluation, and (iii) the constraint evaluations. Although there are sophisticated approaches for scaling, in most practical cases, we can simply multiply each design variable by a constant that will (i) bring it close to 1, and/or (ii) address some tolerance or accuracy issues to be discussed later.

To bring the magnitude of the design variable close to "one," we use the following trivial form of design variable scaling

\[ x_i^s = \alpha_i^s\, x_i, \quad i = 1, \ldots, n_x \]   (7.20)

where n_x is the number of design variables, and α_i^s is a constant that is chosen to make the quantity α_i^s x_i approximately equal to one (or on the order of one). Therefore, Eq. 7.20 represents the scaling process that takes place in the circles of Fig. 7.1 that pertain to the design variables. Conversely, the unscaling process takes the form x_i = x_i^s/α_i^s in the triangles. The upper and lower bounds of the design variables (in the side constraints) can be scaled as

\[ LB_i^s = \alpha_i^s\, LB_i, \quad UB_i^s = \alpha_i^s\, UB_i \]   (7.21)

Note that in the case where the initial design variable is several orders of magnitude different from the final or optimal value, it may be necessary to perform more than one optimization run, each with an updated scaling (i.e., each time using the final design variable value of one run as the initial value of the next run).

Let’s further clarify. Say we have x_3-initial = 12,345.67, but x_3-optimal = 0.012,345 (much lower magnitude). In this case, to perform the optimization, we determine the scaling factor using x_3-initial, which has a higher magnitude. This scaling factor might not yield an accurate answer for x_3-optimal, which has a significantly lower magnitude than x_3-initial. To increase confidence in our results, we determine a new scaling factor using the possibly inaccurate results obtained for x_3 from the first optimization run. We then perform a second optimization run using the updated scaling factor. This second optimization run will generally yield more accurate results.
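A minimal sketch of this two-run procedure follows; the numbers are illustrative, and runScaledOpt is a hypothetical wrapper around the scaled fmincon call described in this section.

x3_initial = 12345.67;              % initial value: high magnitude
alpha1 = 1/abs(x3_initial);         % first scaling factor, from the initial value
% x3_run1 = runScaledOpt(alpha1);   % first run: possibly inaccurate result
x3_run1 = 0.012;                    % say the first run returns this value
alpha2 = 1/abs(x3_run1);            % updated scaling factor, from the run-1 result
% x3_run2 = runScaledOpt(alpha2);   % second run: generally more accurate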

7.3.3 Design Variable Scaling—Tolerance Definition (DV-2)

Before we begin, let us clarify our terminology pertaining to accuracy. Specifically, we recall the definitions of Decimal Accuracy (DA) and Number of Significant Digits (NSD). We observe the definitions of these terms by inspecting Table 7.2, where three generic numbers are examined.

Next, we proceed with the development approach to scaling design variables. As previously discussed, the scaling of the design variables only needs to take into account their orders of magnitude. The proper order of magnitude of the design variables will promote improved convergence of the algorithms. However, having the proper order of magnitude alone will not address the all important issue of the desired accuracy of each variable. To explain this issue, let us consider two scenarios:

Scenario 1
If we are optimizing the thickness of a sheet of paper with an order of magnitude of 10^-3 (say, 0.001,234,567), we might need the decimal accuracy (or tolerance) of the pertinent design variable to be 10^-6, yielding the number 0.001,234, which has 4 significant digits.

Scenario 2
Similarly, if we are optimizing the annual revenue of a company with an order of magnitude of 10^9 (say, $1,234,567,891.234), we might need the decimal accuracy (or tolerance) of the pertinent design variable to be 10^3, yielding the number $1,234,567,000, which has 7 significant digits.

7.3.4 Design Variable Scaling—Optimization Code Decimal
Accuracy Setting (DV-3)

As we see, the number of significant digits of a number depends on its decimal accuracy, as generated by the code. All codes, in fact, provide us with the means to specify this decimal accuracy. In the case of MATLAB, the optimization parameter setting called TolX governs the decimal accuracy of the entries of the design variable vector x. The default value of TolX is 10^-6. Accordingly, in the case of Scenario 1 above, the nominal result would be 0.001,234; however, in the case of Scenario 2 above, the nominal result would be $1,234,567,891.234,000. At this point, we make two important observations. In Scenario 1, the code needs to produce only four significant digits, which poses no particular difficulty. In Scenario 2, however, the code needs to produce 16 significant digits of accuracy, which is problematic. In the latter case, the code will either not converge to this many significant digits, or it may take an unduly long time to do so.

7.3.5 Design Variable Scaling—Combining Order of Magnitude and
Desired Tolerance (DV-4)

We discuss the order of magnitude and the desired tolerance of design variables in terms of the resulting number of significant digits of accuracy. We again discuss these concerns with the aid of Scenarios 1 and 2.

In Scenario 1, if we are satisfied with four correct digits as presented above, all is well. If, instead, we would prefer five significant digits of accuracy, we can perform scaling and let α_i^s = 10 (Eq. 7.20) for the pertinent variable. In this case, the code now works with x_i^s = 0.012,345,67; and since TolX = 10^-6, the code will yield the result x_i^s = 0.012,345, which has five significant digits of accuracy, as desired. Recall that we need to unscale using the relation x_i = x_i^s/α_i^s, yielding the unscaled/correct value x_i = 0.001,234,5, which has the same number of significant digits of accuracy (i.e., five). We note that an alternative approach for obtaining five significant digits (alternative to scaling) is to let TolX = 10^-7, which would again yield x_i = 0.001,234,5. However, this approach is not generally advised, since changing TolX in MATLAB affects all the design variables, and not just x_i. In addition, significantly altering the default tolerance values of any code may result in unpredictable consequences on the operation of the code.

In Scenario 2, if we would like to have 7, rather than 16, significant digits of accuracy, we can simply scale and let α_i^s = 10^-9, leading to x_i^s = 1.234,567,891,234 and yielding the final value x_i^s = 1.234,567, which has 6 decimal digits of accuracy (TolX = 10^-6) and 7 significant digits of accuracy, as desired. Again, recall that we need to unscale using the relation x_i = x_i^s/α_i^s, yielding the unscaled/correct value x_i = $1,234,567,000, which has the same number of significant digits (i.e., seven).

7.3.6 Design Variable Scaling—Setting Scaling Parameters (DV-5)

As noted above, the magnitude of the design variable x_i has a direct impact: (i) on the success of the optimization (i.e., convergence), (ii) on the computation time, and (iii) on the accuracy of the final result (e.g., tolerance). We note that the magnitude that is most important is that of the design variable at the end of the optimization, where the desired tolerance is meaningful. To bring the final/optimal design variable to an order of magnitude of one, we need to divide the design variable by a number that is approximately equal to its optimal value. As such, the user could specify his or her best guess as to what the typical optimal value of x_i might be. When we do so, the optimal value will have approximately six decimal digits of accuracy (TolX = 10^-6), and seven significant digits of accuracy, since the optimal value is approximately on the order of one. Specifically, we let

\[ \alpha_i^s = \frac{10^{\,n_{dvi}}}{x_{i\text{-}typical}} \]   (7.22)

in Eq. 7.20 (with TolX = 10^-6), where n_dvi denotes an exponent parameter setting for the i-th design variable. Using Eq. 7.22 is expected to yield approximately 7 + n_dvi significant digits of accuracy. If, instead, we let α_i^s = 10^n_dvi, we will simply have increased the accuracy of the i-th design variable by n_dvi digits compared to its accuracy without scaling. When n_dvi is negative, we will have decreased the accuracy by |n_dvi| digits. Finally, we note that the absolute value of n_dvi should generally be no greater than three, to avoid requesting excessive accuracy.

In the case of MATLAB, the designer is allowed to specify the typical values of the design variables of the vector x by prescribing the vector TypicalX. Using the vector TypicalX and the scalar TolX, one is expected to obtain the desired number of significant digits of accuracy. Therefore, one has the option of setting the above two MATLAB parameters, or to simply set a scaling parameter for each design variable. The latter approach offers more flexibility.
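In MATLAB syntax, these two settings are prescribed as follows (the typical values shown are placeholders):

options = optimset('TypicalX', [1e-7, 1e-7], 'TolX', 1e-6);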

As we examine the development presented above, a word of caution is in order. When we predict a given number of significant digits, our prediction is only approximate. We also note that different codes behave differently, and different versions of the same code will often behave differently. However, the general ideas presented in this section embody the important message that we need to know.

We conclude with two important notes. In the case where a given initial or final design variable is nearly zero, it may be unnecessary to perform scaling for that design variable. Moreover, in the case where the design variables are nearly of the same order of magnitude (and different from one), with similar tolerances, it may be acceptable to use a single value of the constant α for all design variables.

7.3.7 Objective Function Scaling

Most optimization codes are designed to perform well when the objective function value is on the order of one. While many codes offer significantly more flexibility, it is nevertheless safer to let the objective functions take on the desired order of magnitude through appropriate scaling. For values that lie significantly outside of this range, various aspects of the optimization algorithm become ill-conditioned. The finite difference derivatives, for example, take on undesirably large or small magnitudes.

For the purpose of optimization, the scaling of the objective function only needs to be performed after its evaluation in the objective function module, before it is passed to fmincon. This scaling is illustrated in the circular functional block after the objective function evaluation in Fig. 7.1. The mathematical expression for this scaling can take the form

\[ f^s(x) = \beta^s f(x) \]   (7.23)

where β^s is a constant chosen so as to bring f(x) within the desirable range. At the conclusion of the optimization run, fmincon returns the optimal value of the objective function. We recall that this is the scaled value. Therefore, before we use it, we must unscale it in the main file (the unscaling is not shown in Fig. 7.1) by simply using the equation f = f^s/β^s.
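A minimal sketch of such an objective function module follows; the argument list matches the one introduced later in Sec. 7.3.9, and the quadratic model is only a placeholder for the actual f(x).

function fs = objfun(xs, alphas, betas, gammas)
    x  = xs./alphas;          % unscale the incoming design variables
    f  = x(1)^2 + x(2)^2;     % placeholder model: replace with the actual f(x)
    fs = betas*f;             % scale (Eq. 7.23) before returning to fmincon
end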

Much of the discussion regarding accuracy in the earlier subsections for the design variables also applies in the case of the objective function, and will not be repeated here in detail. The number of significant digits of accuracy of the objective function is obtained with the aid of scaling, in conjunction with the MATLAB setting called TolFun. The default value of TolFun is 10^-6. TolFun represents the decimal accuracy of the objective function f(x).

In a manner similar to the discussion above, we obtain the desired accuracy of the objective function by scaling with

\[ \beta^s = \frac{10^{\,n_{obj}}}{f_{typical}} \]   (7.24)

(TolFun = 10^-6), where n_obj denotes an exponent parameter setting, which is expected to yield approximately 7 + n_obj significant digits of accuracy. The constant f_typical represents a number that is of the order of magnitude of the optimal value of the objective function. If, instead, we let β^s = 10^n_obj, we will simply have increased the accuracy of the objective function by n_obj digits when compared to its accuracy without scaling. When n_obj is negative, we will have decreased the accuracy by |n_obj| digits. Again, we note that the absolute value of n_obj should generally be no more than three, to avoid requesting excessive accuracy. In the case where the optimal value of the objective function is near zero, scaling may be unnecessary.

We conclude this discussion on objective function scaling by noting that some codes set the accuracy of the objective function by prescribing the number of significant digits (say, 6), rather than the decimal accuracy (say, 10-6) at convergence. For example, if the optimal value of the objective function is fopt=1,234.123,456,789, the former case will yield fopt=1,234.12, while the latter case will be fopt=1,234.123,456 (without scaling). On the other hand, if we have fopt=0.000,001,234,567, the former case will yield fopt=0.000,001,234,56, while the latter case will be fopt=0.000,001 (without scaling). We can see the relative advantages of each approach. However, with scaling, both approaches can be made to yield the same answer.

7.3.8 Behavioral Constraints Scaling

Behavioral constraints, in some sense, describe aspects of the system behavior. They do so in the form of imposed constraints that can be equalities or inequalities. In PROB-7.2-MATLAB, the behavioral constraints are expressed by Equations 7.14 to 7.17. As stated earlier, the first two constraints are nonlinear, while the last two constraints are linear. Equation 7.18 concerns the design variables (i.e., side constraints), and is addressed earlier under design variable scaling. In keeping with the scope of this book, we will only explicitly discuss the scaling of nonlinear behavioral constraints. The essence of the following discussion on nonlinear behavioral constraints also applies to linear behavioral constraints. However, linear constraints also entail other complex numerical conditioning issues that are often automatically addressed in the linear programming algorithm, and do not require our attention here. Next, we address nonlinear equality constraints. The same techniques apply to inequality constraints. Consider the vector of constraints

\[ ceq(x) = 0 \]   (7.25)

In MATLAB, as in most optimization codes, the above constraint is satisfied to within a given decimal accuracy. In MATLAB, the pertinent tolerance parameter is called TolCon, and its default value is TolCon = 10^-6. Specifically, in satisfying the constraints in Eq. 7.25, the optimization algorithm will stop after the relation

\[ |ceq(x)_i| \leq \text{TolCon}, \quad i = 1, \ldots, n_{ec} \]   (7.26)

is satisfied, where n_ec denotes the number of nonlinear equality constraints. Consequently, for example, ceq(x)_i = 10^-7 will be considered numerically acceptable. However, this situation might be physically unacceptable. Consider the following two scenarios.

Scenario 1

Consider the constraint 2x1^2 + x2 = 0. Assume that the normal/good values of the design variables are on the order of 10^-7, say, x1 = x2 = 2.5 × 10^-7. By inspection, we see that these values of x do not satisfy the constraint (x2 is not equal to -2x1^2). In this case, we have 2x1^2 + x2 = 2.500,001,25 × 10^-7, and Eq. 7.26 yields 2.500,001,25 × 10^-7 ≤ 10^-6, which numerically satisfies the constraint. This is in disagreement with our view that the constraint is not satisfied.

To correct this situation, we multiply the left- and right-hand-sides of the constraint by 10^6, which yields (2x1^2 + x2) × 10^6 = 0. We let ceq(x)_1 = (2x1^2 + x2) × 10^6. Substituting the values of x in ceq(x)_1, Eq. 7.26 yields 2.500,001,25 × 10^-1 ≤ 10^-6, which is clearly not satisfied - in agreement with our view. The important message here is that, by multiplying the constraint by a constant, we can enforce satisfaction of the constraint within the accuracy that we desire.

Scenario 2

Consider the constraint x1 = 2x2, which yields ceq_1 = x1 - 2x2. Assume that the design variables are on the order of 10^6, and the optimization yields x1 = 2,222,222.33 and x2 = 1,111,111.11. For practical purposes, we might consider these values of x to satisfy the constraint. However, Eq. 7.26 yields 0.11 ≤ 10^-6, which is clearly not satisfied - in disagreement with our view.

We could satisfy the constraint by increasing TolCon to be greater than 0.11, but such a high value of TolCon will likely compromise the operation of the optimization code and adversely affect the other constraints. Alternatively, we could try to force the code to generate a very high number of digits of accuracy, say x = {2,222,222.222,222,2; 1,111,111.111,111,1}, but this approach would require too many iterations and often will not converge.

To correct this situation, we follow a path similar to the previous approach. We multiply the left- and right-hand-sides of the constraint by 10^-6, yielding (x1 - 2x2) × 10^-6 = 0. We let ceq(x)_1 = (x1 - 2x2) × 10^-6. Substituting the values x = {2,222,222.33; 1,111,111.11} in ceq(x)_1, Eq. 7.26 yields 0.11 × 10^-6 ≤ 1.0 × 10^-6, which is clearly satisfied - in agreement with our view. The message here is the same as above; namely, that by multiplying the constraint by a constant, we can enforce the satisfaction of the constraint within the accuracy that we desire. In doing so, we save unnecessary computation and we increase the likelihood of successful convergence.
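For illustration, Scenario 2 written as a scaled MATLAB constraint module might look as follows (with gammas = 10^-6; the argument list matches the one used in Sec. 7.3.9).

function [c, ceq] = nonlcon(xs, alphas, betas, gammas)
    x   = xs./alphas;                % unscale the design variables
    c   = [];                        % no inequality constraints in this example
    ceq = gammas*(x(1) - 2*x(2));    % scaled form of the constraint x1 - 2x2 = 0
end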

Constraint Scaling Approach: The above discussion leads us to a simple approach to scaling equality constraints. We simply let

\[ ceq(x)_i^s = \gamma_i^s\, ceq(x)_i, \quad i = 1, \ldots, n_{ec} \]   (7.27)

where ceq(x)_i denotes the i-th entry of ceq(x), ceq(x)_i^s is the scaled value of ceq(x)_i, γ_i^s is a constant that is used to increase or decrease the degree of constraint satisfaction, and n_ec denotes the number of equality constraints. At this point, an important question arises: What is a good value for γ_i^s? As we can see from the above discussion, the answer depends on (i) the accuracy to which we need the constraint (Eq. 7.25) to be satisfied, and (ii) the value of the constraint-tolerance setting of the optimization code that we are using (TolCon in the case of MATLAB). We can address this question in various ways.

First, the most direct answer to this question is to simply let γ_i^s = 10^n or 10^-n if we wish to increase or decrease the accuracy of the constraint satisfaction, respectively. The greater the integer n, the greater the increase or decrease in constraint satisfaction.

The next approach involves a pseudo-normalization scheme. We let

\[ \gamma_i^s = \frac{10^{\,n_{eqi}}}{ceq_{i\text{-}nominal}} \]   (7.28)

where ceq_i-nominal is a nominal value of ceq_i, and n_eqi denotes an exponent parameter setting for the i-th constraint. We note the similarities between Eqs. 7.22, 7.24 and 7.28. However, while values for x_typical and f_typical can generally be estimated, the typical/nominal value of ceq_i is known and is zero. Therefore, we cannot divide by it in a fashion similar to Eqs. 7.22 and 7.24. In the present case, we use what we call ceq_nominal. To determine good values of ceq_nominal, we are guided by the constraint Scenarios 1 and 2 above, and present the following three representative cases:

(7.29)

(7.30)

(7.31)

where x_2-typical is a typical value of x2, used to make one term in ceq(x)_i be close to "one." If we let n_eqi = 0 in Eq. 7.28, in each of the three cases above, the value chosen for ceq_i-nominal will be such that one term in ceq_i^s will be on the order of one. Since "one" is much larger than TolCon, the remaining terms in ceq(x)_i will attempt to cancel the "one" term as much as possible, thereby promoting a meaningful satisfaction of the constraint. In the cases where we wish to further increase or decrease constraint satisfaction, we can simply choose a positive or negative integer for n_eqi in Eq. 7.28, as previously discussed.

We again note that all scaling functions above are presented as simple multiplicative constants - i.e., proportional functions. These comments conclude the presentation of the scaling approach. Next, we provide useful information regarding practical MATLAB implementation, followed by some insightful examples.

7.3.9 Setting MATLAB Optimization Options and Scaling Parameters: Syntax

In our approach to scaling, we discussed the need to sometimes change certain optimization settings. Here, we provide the syntax for changing them in MATLAB. We can change them in the following ways:

Several at once:

options = optimset('TolFun', 1e-8, 'TolX', 1e-7, 'TolCon', 1e-6)

In this case, several settings are used simultaneously and assigned to the set called "options," which will be used in the optimization function call fmincon.

One at a time:

options1 = optimset
options2 = optimset(options1, 'TolFun', 1e-8)
options3 = optimset(options2, 'TolX', 1e-7)
options = optimset(options3, 'TolCon', 1e-6)

To obtain more thorough information regarding the MATLAB settings parameters, we can type, at the MATLAB prompt, "help optimset." The options used are: Display, TolX, TolFun, TolCon, DerivativeCheck, Diagnostics, FunValCheck, GradObj, GradConstr, Hessian, MaxFunEvals, MaxIter, DiffMinChange, DiffMaxChange, LargeScale, MaxPCGIter, PrecondBandWidth, TolPCG, TypicalX, HessMult, HessPattern, PlotFcns, and OutputFcn.

Fortunately, we rarely have to deal with most of these parameters. To obtain general information regarding the optimization settings, we can type, at the MATLAB prompt, "help optimoptions."

The left-hand-sides of any of the above optimset calls define a set of "options" that can be used in the fmincon call, which will be used during optimization as follows.

[xopt,fopt] = fmincon('fun',x0,A,b,Aeq,beq,LB,UB,'nonlcon',options)

In addition to setting the “options" parameters as performed above, we also need to define the scaling parameters and pass them to the various routines. In Fig. 7.1, these scaling parameters are defined by Is. Specifically, the scaling parameters αs (alphas), βs (betas), and γs (gammas) can be defined in the main calling function and passed to fmincon, the constraint function nonlcon.m, and the objective function objfun.m. The commands are:

[xopts,fopts] = fmincon('objfun',x0s,A,b,Aeq,beq,LB,UB,'nonlcon', ...
options, alphas, betas, gammas);

[Cs,Ceqs] = nonlcon(xs, alphas, betas, gammas);

fs = objfun(xs, alphas, betas, gammas);
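We note that passing extra parameters after the options argument follows older MATLAB conventions; in current MATLAB releases, the same effect is obtained with anonymous function handles:

obj = @(xs) objfun(xs, alphas, betas, gammas);
con = @(xs) nonlcon(xs, alphas, betas, gammas);
[xopts,fopts] = fmincon(obj, x0s, A, b, Aeq, beq, LB, UB, con, options);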

This presentation of the MATLAB syntax concludes our discussion of scaling approaches. Next, we provide some simple examples. A larger example is presented in Sec. 7.7.

7.3.10 Simple Scaling Examples

Here we perform the scaling for some simple examples.

Example 1

Consider the optimization problem of Eqs. 7.10 to 7.12. As previously stated, the MATLAB answer with scaling is x = 10^-6 × {3.083, 1.541} and f = 1.724 × 10^-12, while the MATLAB answer without scaling is x = 10^-6 × {4.948, 2.474} and f = 6.0714 × 10^-12. This is a case where scaling is indeed required or, possibly, the tolerance parameters must be changed. We previously discussed why scaling is generally the preferable approach.

Let us scale the design variables, the objective function, and the constraint:

(i) From Eqs. 7.20 and 7.22, and observing the orders of magnitude of the design variables, we let n_dvi = 0 and x_i-typical = 10^-7, with (i = 1, 2).

(ii) From Eqs. 7.23 and 7.24, and observing the orders of magnitude, we let n_obj = 0 and f_typical = 10^-12.

(iii) Similarly, according to Eqs. 7.27 and 7.28, we let n_eqi = 0 and ceq_nominal = 10^-6, using x_2-typical, according to Eq. 7.31.

Using these scaling parameters, MATLAB indeed yields the accurate answers above.

We note that, in some cases, we may need to perform some preliminary investigation in order to obtain the pertinent orders of magnitude of the various quantities needed to implement the scaling. Finally, we note that the answers we obtain should not be sensitive to the scaling parameters we used. That is, if we moderately increase the scaling, we should obtain similarly more accurate answers. If our answers change significantly, then our scaling might be inadequate.

Example 2

Consider a slightly modified version of the optimization problem of Eq. 7.10 used in Example 1, where the objective function includes a constant term. From optimization theory, the design variables of the modified problem should not change from their original values; only the objective function should change. Let us examine the resulting pitfalls. The modified version reads

(7.32)

subject to

(7.33)

(7.34)

Without scaling, the MATLAB answers are x = 10^-6 × {4.948, 2.474} and f = 1.112 × 10^-8, which can be deemed incorrect. Note the low orders of magnitude of the various quantities. We also note that this problem poses numerical issues, in part, because the objective function is fairly flat near the minimum. We implement scaling in accordance with our previous discussion:

(i) For the design variables (Eqs. 7.20 and 7.22), we let n_dvi = 0 and x_i-typical = 10^-7 (i = 1, 2).

(ii) For the objective function (Eqs. 7.23 and 7.24), we let n_obj = 0 and f_typical = 10^-8.

(iii) For the equality constraint (Eqs. 7.27 and 7.28), we let n_eqi = 0 and ceq_nominal = 10^-6.

This scaling yields a MATLAB answer of x = 10^-6 × {1.200, 0.600} and f = 1.1116 × 10^-8, which is deemed incorrect. We know that it is incorrect because, in this particular case, we know the correct answer from above. It is important to note that we could have easily been fooled into thinking that, since we implemented scaling, our answer is correct. The way we deal with this is to make sure we develop confidence in any answer we obtain through optimization, a topic we discuss in Sec. 7.6.3. In the case of scaling, what we have done thus far is apply guidelines. We used n_obj = 0 in accordance with the guidelines provided. We need to explore whether changing n_obj (or changing any of the other scaling parameters) is necessary. Indeed, if we let n_obj = 2, MATLAB yields x = 10^-6 × {3.0828, 1.541} and f = 1.111 × 10^-8, which is correct.

As a final comment, we wish to emphasize that the material of this section is a comprehensive presentation of information that we would learn over perhaps years of trial and error in optimization, which, to our knowledge, is not provided in any other textbook in this form. What we have provided is a set of general guidelines and insights into the numerical processes that will be very helpful to you. We have not given strict predictions of computational outcomes as it might well be impossible to do so. In fact, we should sometimes expect the unexpected. Final numerical results depend on too many unknown parameters for us to make predictions with any kind of certainty. However, we have provided you with a set of useful tools and understandings that should be of critical assistance in practice. An example of a larger scaling problem is presented in Section 7.7.

7.4 Finite Difference

This section presents finite difference and the important role it plays in computational optimization. The first subsection introduces the fundamentals of finite difference and the second presents its pertinent accuracy issues.

7.4.1 Fundamentals of Finite Difference

A particular class of optimization algorithms is known as gradient-based algorithms. This class of algorithms uses the gradient of the objective function and of the constraints in the search for the optimal solution. As discussed in Sec. 2.5.4, the gradient of a scalar-valued function is a vector of the partial derivatives of the function with respect to each of the design variables. Specifically, the gradient of a scalar function f(x), where x is an n-dimensional column vector, is given by

\[ \nabla f(x) = \left[ \frac{\partial f(x)}{\partial x_1} \;\; \frac{\partial f(x)}{\partial x_2} \;\; \cdots \;\; \frac{\partial f(x)}{\partial x_n} \right]^T \]   (7.35)

The gradient vector is used not only to govern the search, but also to help decide when the optimum is reached and the search terminated. When the optimum is reached, we say that the optimality condition is satisfied.

Generally, the value of the objective function is evaluated using a complex computer code. As such, we do not have an explicit analytical expression for the objective function that can be used to evaluate its gradient. As a result, an adequate approximation is used instead. This approximation is referred to as a finite difference derivative, as opposed to an analytical derivative [4]. The finite difference derivative of f(x) at a point x0 is given by

\[ \left. \frac{\partial f}{\partial x_i} \right|_{x^0} \approx \frac{\Delta f^0}{\Delta x_i} \]   (7.36)

with

\[ x = x^0 + \Delta x_i\, e_i \]   (7.37)

where Δx_i is a small deviation of x_i about x_i^0 (e_i denoting the i-th unit vector), and Δf^0 is the corresponding variation in f(x). The quantity Δf^0/Δx_i can be evaluated using three typical approaches: (i) forward difference, (ii) backward difference, or (iii) central difference. We express each as follows:

Forward Difference

\[ \left. \frac{\partial f}{\partial x_i} \right|_{x^0} \approx \frac{f(x^0 + \Delta x_i\, e_i) - f(x^0)}{\Delta x_i} \]   (7.38)

Backward Difference

\[ \left. \frac{\partial f}{\partial x_i} \right|_{x^0} \approx \frac{f(x^0) - f(x^0 - \Delta x_i\, e_i)}{\Delta x_i} \]   (7.39)

Central Difference

\[ \left. \frac{\partial f}{\partial x_i} \right|_{x^0} \approx \frac{f(x^0 + \Delta x_i\, e_i) - f(x^0 - \Delta x_i\, e_i)}{2\,\Delta x_i} \]   (7.40)

Further, we note that the finite difference approximation entails an error that can be partially explained by the equation

\[ f(x^0 + \Delta x_i\, e_i) = f(x^0) + \left. \frac{\partial f}{\partial x_i} \right|_{x^0} \Delta x_i \,+ \propto\!(\Delta x_i)^2 + HOT \]   (7.41)

where ∝(Δx_i)^2 is a term proportional to (Δx_i)^2 that is ignored by the finite difference approximation above, and HOT represents additional Higher Order Terms (proportional to (Δx_i)^n_h, n_h > 2) that are also ignored. The smaller the magnitude of Δx_i, the more negligible the ignored terms become, and the more accurate the finite difference, at least in theory. However, as we will see later in this section, there is a limit to the acceptable smallness of Δx_i in practice.

It is interesting to think of the three finite difference evaluation options, both in mathematical terms and in geometrical/graphical terms. Equations 7.38, 7.39, and 7.40 provide the expressions for the mathematical evaluations of the forward, backward, or central difference, respectively. Similarly, Figs. 7.2 (a), (b), and (c) provide the respective graphical interpretations of these finite differences, in the case where x is a scalar.

Figure 7.2. Graphical Representation of Finite Difference Approximation

We make the following observations:

(i) The solid line represents the tangent line at the point x^0. The slope of the tangent line is exactly the derivative of f(x) at the point x^0.

(ii) The dashed line represents the so-called secant line. The slope of the secant line is equal to the finite difference value.

(iii) As Δx tends to zero, the secant line converges to the tangent line, and the finite difference (Eqs. 7.38, 7.39, or 7.40) converges to the gradient (Eq. 7.35). However, as we will see shortly, excessively small values of Δx pose some numerical difficulties.

(iv) Number of Function Calls - Objective Function: When x is a scalar, the gradient is also a scalar (or a one-dimensional vector). In this case, the forward, backward, and central difference evaluations require two function calls each (see Eqs. 7.38, 7.39, and 7.40). In the case where the vector x has dimension n_x, the finite difference approximation of the gradient requires n_x + 1 function evaluations for forward and backward difference (i.e., one evaluation at x^0, and n_x evaluations obtained after deviating each of the n_x entries of the vector x). Note that the case of central difference requires 2n_x function evaluations. In this latter case, there is no evaluation at x^0; we instead deviate each variable forward and backward (see Eq. 7.40).

(v) Number of Function Calls - Constraints: When we have constraints, and we also use a finite difference approximation for the gradients of the constraints, the number of constraint function evaluations may be large. For example, if we have n_eq constraints and we use forward difference, the finite difference evaluations will require n_eq(n_x + 1) constraint function evaluations.

(vi) The central difference option generally yields more accurate answers (see Fig. 7.2), but also requires more function evaluations, as discussed above. (A minimal implementation sketch follows this list.)
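The sketch below ties these formulas together: a minimal MATLAB implementation of a finite difference gradient, where f is a function handle, x0 the (column-vector) evaluation point, and dx the step size.

function g = fdgradient(f, x0, dx, scheme)
    % Finite difference gradient of f at x0 with step dx
    nx = numel(x0);
    g  = zeros(nx,1);
    for i = 1:nx
        e = zeros(nx,1);  e(i) = dx;     % deviate the i-th variable only
        if strcmp(scheme,'forward')      % Eq. 7.38: nx+1 calls if f(x0) is cached
            g(i) = (f(x0 + e) - f(x0))/dx;
        else                             % 'central' (Eq. 7.40): 2*nx calls
            g(i) = (f(x0 + e) - f(x0 - e))/(2*dx);
        end
    end
end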

This discussion leads us to the all-important topic of the accuracy of the finite difference approximation.

7.4.2 Accuracy of Finite Difference Approximation

The success of all gradient-based optimization algorithms, as their names suggest, strongly depends on the accuracy of the evaluated gradient vector. An important question at this point is: How accurate is this finite difference approximation? This is a critical question, since gradient-based optimization is one of the most popular approaches in practice.

Assume that the function f(x) has n_sda significant digits of accuracy. As a rule of thumb, the number of digits of accuracy of the first derivatives drops by half (n_sda/2). For the second derivatives, it drops by another half (n_sda/4). Please keep in mind that this is indeed a rule of thumb. In practical cases, the situation could be much worse or much better. The resulting finite difference values may become useless in the optimization algorithm, and result in serious convergence difficulties.

Optimizing with Experimental Data

An important practical situation of interest occurs when experimental data is used for the objective function or the constraint functions. Such data may have low accuracy (say, six digits). In this case, the first derivative may only have three digits of accuracy, and the second derivative might be practically unusable in an optimization algorithm.

In these cases, it may be useful to first develop so-called response surfaces of the resulting data. Once obtained, it might be more reliable to optimize using these surfaces, which are essentially a best-fit of the data available. At a basic level, the situation is straightforward; that is, (i) form a best-fit function of the data (make the best fit as smooth as possible) and (ii) optimize using this best fit function. This approach works quite well. The details of this topic are beyond the scope of this book. References [5] and [6] offer representative works in the area. The first is a fundamental book on response surface methodology. The second provides response surface information from the perspective of design of experiments, within one concise chapter.

How to Impact Finite Difference Accuracy

We can impact finite difference accuracy in three basic ways. Fortunately, most optimization codes perform well in promoting the maximum accuracy of the obtained results. However, there is much that we can also do. The three basic approaches are as follows.

(1) Adequate Scaling: Adequate scaling (as previously discussed) will address: (i) the magnitude of the design variables, (ii) the magnitude of the objective functions, (iii) the magnitude of the constraints, and (iv) the pertinent setting parameters of the optimization codes. When these issues are addressed, finite difference will tend to perform more effectively.

(2) Forward, Backward, or Central Difference: As previously discussed, forward and backward differences provide similar accuracies, while central difference provides greater accuracy. However, the central difference is more computationally intensive. Depending on the computational labor involved in evaluating the objective functions and constraints, we may decide to choose one option vs. another.

(3) The Magnitude of Δx: This last consideration is the most critical. Too small or too large a magnitude will result in excessive inaccuracies. Let us consider each scenario.

(i) Too large a magnitude of Δx makes the secant line too distinct from the tangent line. Their corresponding slopes become too dissimilar (i.e., the finite difference and the gradient become too different). This situation is readily seen in Fig. 7.2.

(ii) Too small a magnitude of Δx also makes the corresponding Δf too inaccurate. Let us explain. Assume that the function f(x) possesses eight significant digits of accuracy. (This means that all digits beyond the 8th are useless.) Let f(x^0) = 1.234,567,812,34, where the first eight (bold) digits are accurate/significant and the last four digits are useless/incorrect. Let us consider two cases.

CASE 1: Assume that Δx is so small that f(x^0 + Δx) = 1.234,567,825,67. In the case of forward difference, Δf^0 = f(x^0 + Δx) - f(x^0). The key observation here is that, using the above numbers, we obtain Δf^0 = 0.000,000,013,33. However, only the bold digits are meaningful, leaving no accuracy at all. This evaluation of the finite difference will have no digit of accuracy. This loss of accuracy is referred to as a cancelation error. Cancelation errors may occur when we subtract two numbers that have similar magnitudes and an insufficient number of digits of accuracy.

CASE 2: Assume that Δx is now sufficiently large so that f(x0 + Δx) = 1.234,688,925,57, where the change in f(x) takes place in the last four significant digits, leaving the first four unchanged. In this case, Δf0 = 0.000,121,113,23. We have four significant digits of accuracy in the finite difference computation. This is half the number of digits of accuracy in f(x), which reflects a good compromise between too large or too small a value of Δx.
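The trade-off between these two cases is easy to reproduce. The sketch below uses the function of Table 7.3, whose exact derivative is 16x^3 + 6x^2 - 1/x^2, and sweeps Δx from large to small at an arbitrarily chosen point:

f    = @(x) 4*x.^4 + 2*x.^3 + 1./x;        % the function of Table 7.3
dfdx = @(x) 16*x.^3 + 6*x.^2 - 1./x.^2;    % exact derivative, used as benchmark
x0 = 1.5;                                  % arbitrary evaluation point
for dx = 10.^(-(2:2:14))
    fd = (f(x0 + dx) - f(x0))/dx;          % forward difference (Eq. 7.38)
    fprintf('dx = %.0e   error = %.3e\n', dx, abs(fd - dfdx(x0)));
end
% The error first shrinks with dx, then grows once cancelation errors dominate.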

Selection of Finite Difference Method

When using the MATLAB Optimization Toolbox, one can select how to evaluate the finite difference derivatives. The pertinent option is set up using the command optimset. For the function fmincon, two options are available: 'forward' (the default) and 'central' (about the center). The command to select the central difference method is shown as follows.

options = optimset('FinDiffType','central')

Example of Finite Difference Accuracy

Table 7.3 provides a specific example of the number of significant digits of accuracy available in a given finite difference case. We perform forward and central finite difference. The results are self-explanatory. We note that MATLAB was used, and that it provides 16 significant digits of accuracy by default. The finite difference is evaluated for two values of x. We make the following observations:

Table 7.3. Example of Finite Difference Accuracy (f(x) = 4x⁴ + 2x³ + 1/x)

(1)The 3rd column reports 11 (of the available 16) significant digits of accuracy for the gradient, evaluated at full MATLAB accuracy. These results are used as a benchmark to determine finite difference accuracy.

(2)For very small or very large values of Δx, the number of Digits of Accuracy (DoA) is expectedly lower. In fact, for two cases, we have no accuracy at all (Number of DoA = 0).

(3)For mid-range values of Δx, the accuracy is generally the highest.

(4)Central difference is predictably more accurate than forward difference.

(5)Generally, the finite difference yields approximately six to eight digits of accuracy. This is somewhat less than one-half of 16, the accuracy of f(x) in MATLAB.

(6)These trends are shown graphically in Fig. 7.3, where the horizontal axis depicts −log Δx (smaller values of Δx toward the right and larger values toward the left).

Figure 7.3. Finite Difference—Number of Digits of Accuracy

We conclude with the following important message: (i) When Δx is too small, we have large errors because f(x) has limited accuracy (16 DoA in MATLAB). (ii) When Δx is too large, we have large errors because the secant line departs from the tangent line. (iii) The greatest accuracy we can generally obtain for the finite difference is approximately one-half the accuracy of the function f(x). With this understanding of the practical limitations of finite difference, we now turn to another option that is largely immune to these numerical issues: Automatic Differentiation.
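These trends can be readily reproduced. The following hedged sketch estimates the digits of accuracy (DoA) of the forward and central differences for the function of Table 7.3, evaluated here at the assumed point x = 1; the precise numbers will vary with x and with the MATLAB version.

% Digits of accuracy of forward and central differences for
% f(x) = 4x^4 + 2x^3 + 1/x, measured against the analytical gradient.
f    = @(x) 4*x.^4 + 2*x.^3 + 1./x;
dfdx = @(x) 16*x.^3 + 6*x.^2 - 1./x.^2;   % analytical benchmark
x0    = 1;
exact = dfdx(x0);
for k = 1:15
    dx  = 10^(-k);
    fwd = (f(x0 + dx) - f(x0)) / dx;              % forward difference
    ctr = (f(x0 + dx) - f(x0 - dx)) / (2*dx);     % central difference
    doaF = -log10(abs((fwd - exact)/exact));      % approximate DoA
    doaC = -log10(abs((ctr - exact)/exact));
    fprintf('dx = 1e-%02d   DoA forward: %5.1f   DoA central: %5.1f\n', ...
            k, doaF, doaC);
end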

7.5 Automatic Differentiation

As discussed in Sec. 7.4 above, optimization codes that are gradient-based need the differentiation of the objective function and of the constraints with respect to the design variables to allow them to march toward the optimum. In most cases, we simply let the code evaluate this differentiation using finite difference. In simple cases, we could manually write a code that evaluates these differentiations analytically, as opposed to letting the code use finite difference. Let us refer to this option as Analytical Differentiation. We could then provide the analytical differentiation code to MATLAB, or to any other optimization software that we are using (if it has the capability to use that code). In more complicated cases, it is generally much easier and more practical to let the optimization code use finite difference to obtain the needed differentiations. However, as we saw in Sec. 7.4, finite difference can be computationally intensive, as well as numerically ill-conditioned, leading to potential complications.

We note important tradeoffs between analytical differentiation and finite difference:

Analytical Differentiation

(i)It is highly efficient, computationally.

(ii)It promotes faster and more certain convergence, and is numerically stable.

(iii)It is more difficult and time consuming to implement, in practice, for problems of realistic complexity.

(iv)It is generally not implemented in optimization codes and must be explicitly implemented by the user.

Finite Difference

(i)It is computationally intensive.

(ii)It generally converges more slowly than the analytical differentiation approach, and is more likely to lead to numerical instabilities.

(iii)It is much easier to implement in practice.

(iv)It is generally an integral component of the gradient-based optimization codes, requiring no user coding.

Automatic Differentiation

Interestingly, there is yet another approach that offers us the best of both worlds. It is the Automatic Differentiation approach. Specifically, it offers the benefits of analytical differentiation and finite difference without their respective complications. It essentially employs analytical differentiation. However, it does not require the user to develop the analytical expressions for the derivatives manually. These derivatives are automatically developed by another software - the Automatic Differentiation software. We explain in detail with the help of Fig. 7.4.

Figure 7.4. Automatic Differentiation vs. Finite Difference Approximation

In Fig. 7.4, we explain and contrast Automatic Differentiation and Finite Difference. For the sake of presentational simplicity, we only discuss the differentiation of the objective function. We note that the very same discussion applies to the differentiation of the constraints. Figure 7.4 is divided into four blocks: (A), (B), (C), and (D). Blocks (A) and (C) describe the finite difference option, and Blocks (B) and (D) describe the automatic differentiation option. In addition, Blocks (A) and (B) describe the work/development that must be done by the engineer before optimization can proceed, while Blocks (C) and (D) depict the software modules that are required for the optimization to proceed. Let us describe each option.

Finite Difference

Block (A) describes a familiar situation: we must have a software routine available that evaluates the objective function, and we provide this routine to the optimization code for the optimization to proceed.

Block (C) describes a situation where (i) the objective function evaluation code is used by the optimization code, and (ii) the finite difference operation is performed within the optimization code. The remainder is self-explanatory.

Automatic Differentiation

Block (B) explains the preparatory work that is required before optimization can proceed. An Automatic Differentiation (AD) software is used as depicted in the figure. The function evaluation routine is available to us. This code becomes the input to the AD software. The AD software then uses a series of chain rule applications throughout the objective function code, and generates another code as its output. This output code is specifically designed to generate the gradient vector, given a current value of x. In summary, at the end of the work of Block (B), we have available to us two codes: one that evaluates the objective function, given x, and one that generates the gradient of f(x), given x. At this point, it is critically important to keep in mind that this gradient is evaluated analytically and not through finite difference. We also note that the generated gradient code sometimes provides two outputs: (i) the gradient and (ii) the objective function. In this case, the original objective function evaluation code is no longer needed. (In Fig. 7.4, these objective functions and gradients are generated by two codes.)
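To make Block (B) concrete, the following hedged sketch shows, for the function of Table 7.3, the kind of gradient code that chain rule propagation produces. Each intermediate variable carries its value together with its derivative; the variable names are purely illustrative, and real AD tools generate such code automatically from the function source.

% Forward-mode chain rule propagation for f(x) = 4x^4 + 2x^3 + 1/x.
x  = 1.5;                        % current value of x
dx = 1.0;                        % seed derivative: d(x)/dx = 1
t1 = x^4;    dt1 = 4*x^3*dx;     % derivative of x^4
t2 = x^3;    dt2 = 3*x^2*dx;     % derivative of x^3
t3 = 1/x;    dt3 = -dx/x^2;      % derivative of 1/x
fval = 4*t1 + 2*t2 + t3;         % objective function value
gval = 4*dt1 + 2*dt2 + dt3;      % analytically exact gradient
fprintf('f = %.12g, df/dx = %.12g\n', fval, gval);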

Block (D) describes the optimization process when no finite difference takes place. Instead, the gradient is evaluated during the optimization process using the function generated by the Automatic Differentiation Software in Block (B). The well-known automatic differentiation software called ADIFOR may be used, which we describe next.

ADIFOR (Automatic DIfferentiation in FORtran) was developed by the Mathematics and Computer Science Division at Argonne National Laboratory and the Center for Research on Parallel Computation at Rice University. ADIFOR is a software tool that performs automatic differentiation for Fortran 77 programs. Given a Fortran 77 source code and a user's specification of the dependent and independent variables, ADIFOR will generate a code that computes the partial derivatives of all of the specified dependent variables with respect to all of the specified independent variables. ADIFOR also outputs the objective function value. Details on the background and implementation of ADIFOR are provided in Refs. [7, 8].

As the C programming language increased in popularity, an analog of ADIFOR was developed. ADIC is a tool that implements automatic differentiation on ANSI C code. Reference [9] describes the architecture of ADIC and how to use it, with the help of an example. In Ref. [10], an enhancement for ADIFOR and ADIC that computes Hessians is discussed.

7.6 Other Important Numerical and Computational Issues

In this section, we discuss several useful numerical issues that do not readily fall under the previously discussed topics. These are:

1.Sensitivity of optimum solution [11],

2.Termination of the optimization run - criteria and causes,

3.Level of confidence in the optimal results obtained,

4.Computational burden and problem dimension,

5.Additional numerical pitfalls.

7.6.1 Sensitivity of Optimal Solutions in Nonlinear Programming

Sensitivity to System Parameters

Understanding how sensitive our optimum solution is to small changes in various system parameters (that are not design variables) is generally important. This information provides us with the means to potentially further improve our design by reconsidering the values of these parameters. The simplest way to assess this sensitivity is to run the optimization code with different values of the potentially sensitive parameters, and to observe the resulting optimal solution (a minimal sketch follows this paragraph). Alternatively, an indication of the sensitivity can be obtained by simply evaluating the objective function and the constraints for different values of these parameters near their nominal values. It is important to note that undue sensitivity to a system parameter may indicate pitfalls in the physical design, which should be carefully examined. This aspect of sensitivity is an area of continuing research in the community, and will not be further addressed here.
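A hedged sketch of such a parameter sweep is given below, where myObjective, x0, lb, and ub are hypothetical problem definitions and p is the system parameter under study.

% Re-solve the problem for perturbed parameter values and observe
% how the optimal objective value responds.
pNom  = 10;                                 % illustrative nominal value
pVals = pNom * [0.95 1.00 1.05];            % +/- 5 percent perturbations
fStar = zeros(size(pVals));
for k = 1:numel(pVals)
    obj = @(x) myObjective(x, pVals(k));    % parameter enters the model
    [~, fStar(k)] = fmincon(obj, x0, [], [], [], [], lb, ub);
    fprintf('p = %6.2f   f* = %g\n', pVals(k), fStar(k));
end
% A large relative change in fStar signals undue parameter sensitivity.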

In the case of linear programming, there are also important practical sensitivity issues that should be considered. These issues are discussed in Sec. 11.7.4.

Sensitivity to Design Variables

The more urgent aspect of the sensitivity of the optimal solution is that with respect to the design variables. The generated optimum solution can be highly sensitive to the design variables in a way that indicates the presence of numerical issues, and the solution obtained might not be adequate or mathematically optimal. Extreme sensitivity to design variables may also indicate that the problem is not well posed, or may point to pitfalls in the physical design, as in the case of the system parameters discussed above. Stated differently, the sensitivity might reflect a purely computational issue that has no bearing on the actual design, or it might indicate pitfalls in the actual design. In the case where the problem is of a numerical nature, we can simply employ the techniques presented in Sec. 7.3. If the computational issues persist after proper scaling, then a consideration of the physical design may be advisable.

Sensitivity to Weights in the Objective Function

Undue sensitivity to the weights in the objective function may also be a sign of complications that should be explored. Careful scaling of the objectives and of the design variables will generally be helpful. In the case where we are dealing with the weights in the weighted sum approach, our physical insight into the meaning of the weights can be helpful, and we may reconsider our stated preferences in forming the objective function. However, the mere knowledge of the sensitivity to these weights may give us insight into the nature of our optimal solution with respect to our preferences. In the case where the weights are part of more complex nonlinear objective functions, the issues are more involved.

7.6.2 Optimization Termination Criteria and Optimization Termination Causes

Optimization codes terminate the convergence process for numerous reasons. Some are desirable and generally indicate a successful outcome. Others indicate non-convergence or convergence to a non-optimal solution. Yet other termination causes call for further investigation. The information on page 159 (regarding possible reasons for the optimization process to fail) is directly relevant to our present discussion. Let us consider various termination scenarios.

Scaling

As previously discussed, scaling is arguably the most important step we can take to maximize the likelihood of convergence to the optimum solution. Therefore, the wealth of pertinent information provided in Sec. 7.3 should be exploited to the fullest. A poorly scaled problem will usually force the optimization code to terminate at a non-optimal solution, while the designer might not even be aware of this serious situation.

Optimization Settings

All good optimization codes provide the user with the ability to influence the conditions under which the optimization process will terminate. To influence the code, the designer has at his/her disposal various settings. Different codes have different versions of these settings. Pertinent MATLAB settings include:

1.The maximum number of function evaluations (MaxFunEvals). Its default value is 100*(number of variables).

2.The maximum number of iterations (MaxIter). The default is 400.

Even though you may not have a complete understanding of these settings, increasing them might be helpful when MATLAB informs us that the code terminated because these maximum values have been reached, as illustrated below. Please see the end of Sec. 7.3 to determine how to change these settings. Please note that the default values of these settings may change from one version of MATLAB to another.
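A minimal hedged sketch follows; the limit values are illustrative, and myObj, myCon, x0, lb, and ub are hypothetical.

% Raise the termination limits before re-running the optimization.
options = optimset('MaxIter', 2000, 'MaxFunEvals', 5000);
[xopt, fopt, exitflag] = fmincon(@myObj, x0, [], [], [], [], lb, ub, @myCon, options);
% An exitflag of 0 indicates that one of these maximums was reached.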

7.6.3 Developing Confidence in Optimization Results

Upon obtaining a final solution from an optimization run, it is important to take steps to develop appropriate confidence in the validity of the answer. These steps may include:

1.Being satisfied that the issues raised in the previous discussion in Sec. 7.2.1 on page 159, regarding why the optimization process sometimes fails, have been addressed.

2.Implementing multiple starts with very different initial points to avoid or identify local minima, particularly in the case of gradient-based algorithms (a minimal multi-start sketch follows this list). It may be necessary to update the scaling for each different start.

3.Critically examining the physical validity/feasibility of the optimum solution. Common sense alone will often point to potential issues.
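The multi-start idea can be sketched as follows, where myObj, myCon, lb, and ub are hypothetical problem definitions.

% Run fmincon from several random starting points and keep the best
% converged result.
rng(0);                                      % reproducible random starts
nStarts = 10;
bestF = inf;  bestX = [];
for k = 1:nStarts
    x0 = lb + rand(size(lb)).*(ub - lb);     % random point within bounds
    [x, fval, flag] = fmincon(@myObj, x0, [], [], [], [], lb, ub, @myCon);
    if flag > 0 && fval < bestF              % keep best converged run
        bestF = fval;  bestX = x;
    end
end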

7.6.4 Problem Dimension and Computational Burden

The problem size or dimension, which can be represented by the number of design variables, is directly related to its computational intensity (see Sec. 15.3). The fidelity of the models employed is also a determining factor for the computational resources needed. In addition, the more computationally intensive the problem is, the more likely it is to involve numerical issues. Therefore, whenever possible, we should explore strategies to reduce the problem dimension [12]. Design variable linking is one avenue to this end (see Sec. 15.3.1); a minimal linking sketch follows below. The proper fidelity of the models that should be used is an important continuing research activity in the community, and should be considered in any serious system optimization effort.
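A hedged sketch of design variable linking, in which fullObjective and z0 are hypothetical: suppose symmetry requires x1 = x2, so we optimize a reduced two-dimensional vector z in place of the three-dimensional x.

% Linking matrix T expands the reduced vector z to the full vector x.
T = [1 0;
     1 0;
     0 1];                          % enforces x1 = x2
obj = @(z) fullObjective(T*z);      % the optimizer sees only z
[zOpt, fOpt] = fmincon(obj, z0, [], []);
xOpt = T * zOpt;                    % recover the full design vector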

7.6.5 Additional Numerical Pitfalls

Additional potential numerical pitfalls are briefly mentioned here:

1.Some constraints may be violated temporarily during the optimization process and become satisfied by its end. This is perfectly fine. However, some others may cause the optimization algorithm to misbehave or fail. These include constraints whose violation leads to invalid results in the given context. Examples are:

•The violation of a constraint that the outer diameter of a pipe must be larger than its inner diameter, resulting in a non-physical design.

•The violation of a constraint that the argument of a square root must be positive, potentially resulting in an invalid (complex-valued) function evaluation.

2.Division by a quantity that vanishes.

3.Taking the square root of a negative quantity.

4.Having the diameter of a bar become negative.

5.Having a mass take on a negative value.

6.Numerical cancelation, as discussed in the context of finite difference (see Sec. 7.4).

7.Matrices becoming singular to working precision.

8.Starting the optimization too close to its optimum, which can cause convergence problems.
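Several of these pitfalls (e.g., Items 2 through 5) can be mitigated by guarding the analysis code against non-physical inputs. A hedged sketch follows; the variable roles and the penalty value are illustrative only.

function f = guardedObjective(x)
% Guard the evaluation against division by zero and negative sqrt arguments.
arg = x(1) - x(2);                 % e.g., outer minus inner diameter
if arg <= 0 || x(1) <= 0
    % Return a large penalized value to steer the optimizer back
    % toward the physically meaningful region.
    f = 1e6 * (1 + abs(arg));
else
    f = sqrt(arg) + 1/x(1);        % safe to evaluate here
end
end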

7.7 Larger Scaling Example: Universal Motor Problem

7.7.1 Universal Motor Problem Definition

Let us consider the design of a universal motor [13, 14]. A universal motor is simply an electric motor that functions on both AC and DC supplies to produce a torque. The ratio of the power output to the power input of the motor gives a measure of the motor efficiency, and it is known that increasing the windings in the field and armature of the motor will increase the efficiency. However, this increase in windings is associated with an increase in the mass of the motor. Since these motors are widely used in kitchen appliances, a motor having a large mass is undesirable. The universal motor is to be optimally designed in an effort to maximize its efficiency and minimize its mass.

Our universal motor problem is governed by the following design variables:

1.Number of turns of wire on the armature, Na,

2.Number of turns of wire on the field, Nf,

3.Cross sectional area of the wire on the field, Awf,

4.Cross sectional area of the wire on the armature, Awa,

5.Electric current, I,

6.Outer radius of the stator, ro,

7.Thickness of the stator, t, and

8.Stack length, L.

Finding the values of mass, M, power, P, efficiency, η, torque, T, and magnetizing intensity, H, of the motor requires the design variables as inputs. No equations are given to you to compute these quantities. However, you are given a MATLAB function, umotor.m, on the book website (www.cambridge.org/Messac), that allows you to enter all the design variables in a row vector (let us call it x) to obtain the resulting mass, M, power, P, efficiency, η, torque, T, and magnetizing intensity, H, of the motor, as shown in Fig. 7.5. Thus

x = [Na Nf Awf Awa I ro t L]

Figure 7.5. Input and Output for the Program umotor.m

We will use the outputs of umotor.m (see the book website www.cambridge.org/Messac) in our objective function and constraints. Anytime you need the values of mass, M, power, P, efficiency, η, torque, T, and magnetizing intensity, H, of the motor, you should call the function umotor.m.

The side limits on the design variables are as follows:

1.100 ≤ Na ≤ 1,500 turns

2.1 ≤ Nf ≤ 500 turns

3.0.01 × 10⁻⁶ ≤ Awf ≤ 1.0 × 10⁻⁶ m²

4.0.01 × 10⁻⁶ ≤ Awa ≤ 1.0 × 10⁻⁶ m²

5.0.1 ≤ I ≤ 6 Amps

6.0.01 ≤ ro ≤ 0.10 m

7.0.0005 ≤ t ≤ 0.10 m

8.0.000566 ≤ L ≤ 0.10 m

In addition to the design variables bounds, we have additional constraints presented in Table 7.4.

Table 7.4. Constraints for Universal Motor

Constraint                        Value
Magnetizing intensity, H          H < 5,000
Feasible geometry                 ro > t
Power of each motor, P            P = 300 W
Efficiency of each motor, η       η ≥ 0.15
Mass of each motor, M             M ≤ 2.0 kg
Torque, T                         T = 0.25 Nm

Your task is to obtain the Pareto frontier for the two objectives, mass and efficiency. Use a set of 100 evenly spaced weights, and use the initial guess of X0 = (LB + UB)/2 for all Pareto points. Prepare a MATLAB code that generates the Pareto frontier for this problem. You will realize that fmincon is unable to converge for some Pareto points. If you use the given assumptions regarding the initial guess and weights, you will obtain several solutions that are not Pareto, as seen in Fig. 7.6(a).

Figure 7.6. Pareto Frontier

This difficulty occurs because the design variables are of widely different orders of magnitude, thereby causing numerical difficulties for fmincon. In this exercise, you learn a method to overcome the numerical issues caused by poorly scaled design variables, and obtain a good representation of the Pareto frontier.
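For reference, a hedged sketch of the weighted-sum sweep (without scaling) is given below. The wrapper functions massOf, effOf, and motorCon are hypothetical helpers built around umotor.m, which is assumed here to return M, P, η, T, and H in that order.

% Weighted-sum sweep over 100 weights, using the common initial guess.
LB = [100 1 0.01e-6 0.01e-6 0.1 0.01 0.0005 0.000566];
UB = [1500 500 1.0e-6 1.0e-6 6 0.10 0.10 0.10];
X0 = (LB + UB)/2;
w  = linspace(0, 1, 100);
front = zeros(numel(w), 2);
for k = 1:numel(w)
    obj = @(x) w(k)*massOf(x) + (1 - w(k))*(1 - effOf(x));
    x = fmincon(obj, X0, [], [], [], [], LB, UB, @motorCon);
    front(k,:) = [massOf(x), effOf(x)];
end
plot(front(:,1), front(:,2), 'o'); xlabel('Mass (kg)'); ylabel('Efficiency');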

7.7.2 Design Variable Scaling

One effective yet simple approach to overcome the above discussed numerical issues is to scale each design variable in fmincon so that it is on the order of one. The scaled design variables are used only for numerical purposes, so that fmincon will converge. We need to make sure that the original (or unscaled) design variables are used to compute the objective function and constraint values.

The scaled design variables used by the fmincon optimizer will need to be unscaled to their original magnitudes when executing the objective function and constraint functions. Note that the function umotor.m needs the original design variables to yield meaningful results for mass, M, power, P, efficiency, η, torque, T, and magnetizing intensity, H, of the motor. Figure 7.6(b) provides the correct Pareto frontier that results from scaling. Figure 7.6 illustrates side-by-side Pareto frontier plots with and without scaling, which demonstrates the dramatic impact that scaling can have. The procedure for implementing the scaling of the design variables in the MATLAB code is shown in Fig. 7.7.

Figure 7.7. Scaling of Design Variables

In addition to scaling and unscaling in the objective and constraint files, we need to do the following. In the main file, reset the upper and lower bounds to correspond with our scaled design variables. You may also wish to specify an initial point that lies within the lower and upper bounds of the scaled design variables.
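A hedged sketch of this scale/unscale pattern is shown below. The scale factors and the weighted-sum form are illustrative; umotor.m is again assumed to return M, P, η, T, and H in that order. In the main file, the bounds are rescaled correspondingly (e.g., LB ./ scale and UB ./ scale).

function f = objScaled(xs, w1, w2)
% xs is the O(1) vector seen by fmincon; unscale it before analysis.
scale = [1500 500 1e-6 1e-6 6 0.1 0.1 0.1];   % illustrative factors
x = xs .* scale;                   % recover physical design variables
[M, P, eta, T, H] = umotor(x);     % umotor.m needs unscaled inputs
f = w1*(M/2.0) + w2*(1 - eta);     % minimize mass, maximize efficiency
end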

7.8 Summary

This chapter provided an analytical, intuitive, and computational discussion of the major numerical issues encountered in the practical implementation of computational optimization. Effective methods that address these issues were also presented. These methods included numerical conditioning of algorithms, matrices, and the optimization problem; followed by methods used to scale and to set the tolerances for different optimization parameters (e.g., variables, objective functions, and constraint functions). The example provided at the end illustrated the application of such scaling techniques. This chapter also provided a description of the primary methods used to estimate function gradients (e.g., finite differences). The chapter concluded with a discussion of other important numerical considerations for the successful application of optimization, such as termination criteria and solution sensitivity. Importantly, these issues can collectively make or break optimization in practice!

7.9 Problems

Warm-up Problems

7.1Give five reasons why we need to pay attention to numerical issues in our engineering activities, which include design, analysis, and optimization. Not all five should be from the book.

7.2What does it mean for a problem to be (i) well-posed, and (ii) ill-conditioned? Explain each in approximately 200 words.

7.3Provide three important reasons why optimization codes sometimes do not work well. Explain each in detail in approximately 300 words.

7.4Evaluate A1 in Eq. 7.2, while A is given by Eq. 7.1. Prove that Eq. 7.2 indeed holds true for any positive integer n.

7.5Let α = 0.5 in Eq. 7.1. Evaluate A1 in Eq. 7.2 with n = 2. Perform this calculation without a computer, and show all your intermediate steps. This problem will also help you recall elementary matrix algebra. Please do a quick review if some related concepts have become a bit rusty in your mind.

7.6How can you evaluate the condition number of a symmetric matrix; and how does this number relate to its numerical conditioning properties?

7.7Use MATLAB to solve the optimization problem defined by Eqs. 7.10, 7.11, and 7.12. Do so without implementing any scaling. Compare the solution you obtained with the solution x = 10⁻⁶ × {3.083, 1.541}. Which solution yields a lower value of the objective function? Explain.

7.8Consider the general optimization problem statement defined by Eqs. 7.13 through 7.18. (a) Provide the mathematical equation for a constraint that can be interchangeably used either in Eq. 7.15 or Eq. 7.17. Explain. Is it computationally more desirable to use this constraint in Eq. 7.15 or Eq. 7.17? Explain. (b) Provide the mathematical equation for a constraint that can be used in Eq. 7.15, but not in Eq. 7.17. Explain.

Intermediate Problems

7.9For Matrix A given in Eq. 7.1, generate Table 7.1, adding a column for n = 100. Please keep in mind that you may not obtain numbers that are exactly equal to those in the table, but you should be able to observe that the conclusions are the same. Submit your clearly organized MATLAB codes.

7.10For Matrix A given by

A =

(7.42)

generate Table 7.1, adding a column for n = 100. Discuss any interesting observations that you can make from the results. Submit your clearly organized MATLAB codes.

7.11The two bar truss shown in Fig. 7.8 is to be designed so that: (1) it is as light as possible, and (2) the vertical deflection at node A is as low as possible.

Figure 7.8. Schematic of Two-Bar Truss

The dimensions H and D are the design variables. The following parameters are given:

The bounds on H and D are given as:

The vertical deflection at node A is given as follows:

Assume that the cross sectional area of the beam is πDT, and use this expression for your analysis. Answer the following questions:

1. What are the most important failure modes in this problem?

2. Clearly write down the mathematical expressions for your constraints.

3. Formulate the bi-objective optimization problem, and obtain the Pareto frontier.

7.12Consider the tubular column shown in Fig. 7.9.

Figure 7.9. Tubular Column

1.Design the column to support the load P at minimum mass, m, and minimum cost, C, by choosing optimal values for the outer radius, rout, and the inner radius, rin. Ensure that your column avoids buckling: P should be less than the critical buckling load, Pcr. In addition, ensure that the load-induced stress, σ, does not exceed the maximum stress, σmax, of the material. Use the expressions given below to aid in your design optimization.

2.Plot the Pareto frontier for this problem. Ensure you obtain a complete Pareto frontier.

7.13MATLAB code for universal motor with and without scaling:

1.Write a MATLAB code to generate the Pareto frontier of the universal motor problem discussed in this chapter. Do so without design variable scaling. In other words, reproduce the results shown in Fig. 7.6(a). Use the initial guess of X0 = (LB + UB)/2 for all Pareto points, where LB and UB are the original design variable bounds. Note that you may not get the exact identical points, but the conclusion will be clear.

2.Now, scale the design variables as described previously (refer to Fig. 7.7). Use the initial point specified in Fig. 7.7 for all Pareto points. Generate the Pareto frontier using scaled design variables. In other words, reproduce the results shown in Fig. 7.6(b).

7.14Consider the following nonlinear programming (NLP) problem:

subject to

1.Use the default settings for the function fmincon in the MATLAB Optimization Toolbox to obtain an optimal solution, x*, to this problem. Was your solution process able to converge?

2.Make the following changes to your code and the various search options in the MATLAB Optimization Toolbox to try and obtain the optimal solution.

a)Make sure that the medium scale SQP algorithm, with Quasi-Newton update and line-search, is the selected solver in fmincon. Although you may not yet know these algorithms, you can proceed with this problem. The details of these algorithms are presented later in the book.

b)Set the solution point tolerance, function tolerance, and constraint tolerance in fmincon to 10⁻⁷.

c)Specify the initial guess as x0 = [1 1 1]^T.

d)Make sure that the inequality constraint g2 is treated as a linear constraint by fmincon.

e)Solve the NLP by using a finite difference approximation of the gradients of the objective function and constraints first. Then, resolve the problem by providing explicit expressions for the gradients of the objective function and constraints (ask fmincon to check for the gradients before running the optimization in this latter case).

3.Record the optimal solution for the design variables and the objective function. Ensure that your problem converges.

7.15Optimize the simple I-beam shown in Fig. 7.10. The symmetrical I-beam cross section is also shown.

Figure 7.10. Schematic of Simply Supported I-Beam

Find the optimal cross section dimensions that minimize the structural mass and minimize the mid-span static deflection. As shown in Fig. 7.10, the I-beam is simply supported at both ends and is subjected to an applied concentrated load, P. The final design must be able to sustain a maximum bending stress of 160 MPa, and the beam dimensions must be within the acceptable limits given below.

Acceptable Beam Dimensions: 0.10 m ≤ x1 ≤ 0.80 m; 0.10 m ≤ x2 ≤ 0.50 m; 0.009 m ≤ x3 ≤ 0.05 m; 0.009 m ≤ x4 ≤ 0.05 m.

Constant Numerical Values: P = 600 kN; L = 2 m; E = 200 GPa; ρ = 7,800 kg/m³.

1.Provide a clear optimization problem statement.

2.Use the weighted sum method to complete the following table (Table 7.5). W1 denotes the weight for the mass objective, and W2 denotes the weight for the deflection objective. (Hint: Your code should cater to any scaling problems.)

3.Plot the Pareto frontier for this problem.

4.Show the points obtained in No. 2 on your plot.

Table 7.5. Optimum Design Results for Different Objective Weights

W1      W2      Mass (M)    Deflection (δ)    x1    x2    x3    x4
0       1
0.25    0.75
0.5     0.5
0.75    0.25
1       0

(The remaining entries are to be completed with your results.)

7.16Problem 6.10 of the Multiobjective chapter presents numerical challenges that can be addressed with proper scaling of the design variables, objective functions, and/or constraints. First solve Parts 1 through 6 of that problem using appropriate scaling. Comment on the behavior of the Pareto frontier with and without scaling.

Advanced Problems

7.17Consider the task of designing a cantilever beam with a rectangular cross section subjected to the load P, as shown in Fig. 7.11. We are interested in finding the cross sectional dimensions, b and w, that can safely support the load P. At the same time, we would like the beam to: (1) be as light as possible and (2) have the least possible deflection (see Fig. 7.11).

Figure 7.11. Schematic of Cantilever Beam

The following information is given to you:

1. P = 600 kN,

2.L = 2 m,

3.Young’s Modulus E of the beam material = 200 GPa,

4.Density of the beam material = 7,800 kg/m³,

5.Maximum allowable stress of the beam material = 160 MPa,

6.Maximum and minimum acceptable values for b and w are: bmin = 0.1 m, bmax = 0.8 m, wmin = 0.1 m, and wmax = 0.5 m.

7.The expression for the deflection of a cantilever beam at any Section X - X

where x is the length of the beam measured from the fixed end (see Fig. 7.11),

8.The expression for the bending stress at the Section X - X is given as

Answer the following questions:

1.At what value of x does the maximum deflection occur? At what value of x does the maximum stress occur?

2.What are your design objectives in this problem? Write down the mathematical expressions for the maximum deflection and the mass of the beam.

3.What are your design variables in this problem?

4.Write down the mathematical expressions of the behavioral and side constraints for this problem.

5.Clearly write down your multiobjective problem formulation.

6.Obtain the Pareto frontier of the above problem using fmincon.

7.18Answer the following questions:

(i)Discuss two reasons why the MATLAB parameter TypicalX can be helpful to use in an optimization problem.

(ii)Discuss two reasons why the MATLAB tolerance parameter TolX can be helpful to use in an optimization problem.

(iii)Develop an optimization problem for which you do not obtain the optimum solution without scaling the design variables.

(iv)For the problem you just developed in Part (iii), use the scaling techniques presented in this chapter to obtain the optimum.

(v)Now, instead of scaling the design variables, obtain the optimum solution simply by using the parameters TypicalX and TolX. The size of the problem is up to you. Note that much of the learning will take place in the process of generating the problem.

BIBLIOGRAPHY OF CHAPTER 7

[1]E. Cheney and D. Kincaid. Numerical Mathematics and Computing. Cengage Learning, 2012.

[2]T. Sauer. Numerical Analysis. Pearson Education, 2012.

[3]M. Grasselli and D. Pelinovsky. Numerical Mathematics. Jones and Bartlett Publishers, 2008.

[4]M. Nuruzzaman. Finite Difference Fundamentals in MATLAB. Createspace Independent Pub., 2013.

[5]R. H. Myers, D. C. Montgomery, and C. M. Anderson-Cook. Response Surface Methodology: Process and Product Optimization Using Designed Experiments. John Wiley and Sons, 3rd edition, 2009.

[6]D. C. Montgomery. Design and Analysis of Experiments. Wiley, 8th edition, 2012.

[7]C. H. Bischof, A. Carle, G. F. Corliss, A. Griewank, and P. Hovland. ADIFOR: Generating derivative code from Fortran programs. Scientific Programming, 1:11-29, 1992.

[8]C. H. Bischof, A. Carle, P. Khademi, and A. Mauer. ADIFOR 2.0: Automatic differentiation of Fortran 77 programs. IEEE Computational Science and Engineering, 3:18-32, 1996.

[9]C. H. Bischof, L. Roh, and A. J. Mauer-Oats. ADIC: An extensible automatic differentiation tool for ANSI-C. Software: Practice and Experience, 27(12):1427-1456, 1997.

[10]J. Abate, C. H. Bischof, L. Roh, and A. Carle. Algorithms and design for a second-order automatic differentiation module. In International Symposium on Symbolic and Algebraic Computation, pages 149-155, 1997.

[11]J. R. R. A. Martins, P. Sturdza, and J. J. Alonso. The complex-step derivative approximation. ACM Transactions on Mathematical Software (TOMS), 29(3):245-262, 2003.

[12]P. M. Zadeh, V. V. Toropov, and A. S. Wood. Metamodel-based collaborative optimization framework. Structural and Multidisciplinary Optimization, 38(2):103-115, 2009.

[13]T. W. Simpson, J. R. A. Maier, and F. Mistree. Product platform design: Method and application. Research in Engineering Design, 13(1):2-22, 2001.

[14]S. J. Chapman. Electric Machinery and Power System Fundamentals. McGraw-Hill, 2002.