Optimization in Practice with MATLAB for Engineering Students and Professionals (2015)
PART 4
GOING DEEPER: INSIDE THE CODES AND THEORETICAL ASPECTS
12
Nonlinear Programming with No Constraints
12.1 Overview
Nonlinear programming is a technique used to solve optimization problems that involves nonlinear mathematical functions (e.g., objective functions or constraints, or both). The methods presented in this chapter are capable of solving unconstrained optimization problems with unimodal objectives. The procedures used for these methods are illustrated via minimization problems. Maximization problems can be converted to minimization problems by multiplying the objective functions by 1.
Unconstrained nonlinear programming has many practical applications. Many real life design problems have nonlinear objectives due to the nonlinearities in nature. Some design problems do not have constraints, or the constraints are negligible. Constrained optimization problems can be reformulated as unconstrained problems. The methods used to solve unconstrained nonlinear problems are powerful tools for solving optimization problems.
12.2 Necessary and Sufficient Conditions
Chapter 2 provides mathematical knowledge concerning the first and second derivatives of functions. Local optima can be determined by their derivatives. This chapter provides the conditions for local minima. These conditions involve the gradients (first derivative vectors) and Hessian matrices (second derivative matrices) for the objective functions. These conditions are the foundation for several of the algorithms described in this chapter (see Refs. [3, 1]).
Assume a function, f(x), is continuous and has continuous first and second derivatives. The function can be expressed as a Taylor expansion.
(12.1) 
Neglecting the higher order terms, O_{3}(Δx), the objective function change can be expressed as
(12.2) 
If x^{*}is a local minimum, any other points in its neighborhood should produce a greater objective value, which can be expressed as
(12.3) 
From Eq. 12.2, in order for the sign of Δf(x) to be known (e.g., positive for a minimum at x^{*}) for arbitrary values of Δx, the first derivative of f(x) should be zero. Otherwise, Δf(x) can be forced to be positive or negative by changing the sign of Δx, and Eq. 12.3 cannot be satisfied. Therefore, a local minimum, x^{*}, should satisfy the following firstorder necessary conditions.
Theorem (FirstOrder Necessary Conditions)
If x^{*}is a local minimum of f(x) and f(x) is continuously differentiable in an open neighborhood of x^{*}, then the gradient ∇f(x^{*}) = 0.
Since ∇f(x^{*}) = 0 at the local minimum, Eq. 12.2 becomes
(12.4) 
It should satisfy Δf(x) ≥ 0 since x^{*}is a local minimum. Then, ∇^{2}f(x^{*}) should be positive semidefinite, which is stated as the secondorder necessary conditions as follows.
Theorem (SecondOrder Necessary Conditions)
If x^{*}is a local minimum of f(x) and ∇^{2}f(x) exists and is continuous in an open neighborhood of x^{*}, then ∇f(x^{*}) = 0 and ∇^{2}f(x^{*}) is positive semidefinite.
The secondorder necessary conditions do not guarantee a local minimum. At x = 0, the function, f(x) = x^{3}, satisfies the firstorder and secondorder necessary conditions. However, x = 0 is not a local minimum of f(x) = x^{3}.
If the Hessian of f(x) is positive definite at the point where ∇f(x) = 0, then
(12.5) 
Equation 12.5 guarantees the point, x^{*}, is a strict local minimum. It is stated as the secondorder sufficient condition. The secondorder sufficient conditions are stronger than the secondorder necessary conditions.
Theorem (SecondOrder Sufficient Conditions)
Given f(x), if ∇^{2}f(x) is continuous in an open neighborhood of x^{*} and ∇f(x^{*}) = 0; if ∇^{2}f(x^{*}) is positive definite, then x^{*}is a strict local minimum of f(x).
12.3 Single Variable Optimization
Some unconstrained nonlinear optimization problems may only have a single variable. In this section, the methods for solving single variable nonlinear optimization problems are presented. We parenthetically note that the Bisection and the Golden search methods require bounds which, technically, are constraints. However, these bounds need not have physical meanings, and can be arbitrarily chosen for the sole purpose of implementing these algorithms.
12.3.1 Interval Reduction Methods
Bisection
The interval reduction methods are applicable to single variable nonlinear optimization problems whose variables have lower bounds and upper bounds.
The Bisection method (also known as the Interval Halving method) finds an extreme point in a bounded region for a unimodal single variable function. The function and its first derivative are assumed continuous. The Bisectionmethod successively halves the interval, and decides in which one of the two half intervals the extreme point exists. The Bisection method uses the first derivative of the function to determine in which half the extreme point lies.Figure 12.1 illustrates how to update the interval according to the sign of the first derivative. Suppose the variable x of a unimodal convex function f(x) is inside the interval [a,b]. The function f(x) and its derivative f′(x) are continuous over this interval. The first derivatives at the two ends satisfy the condition f′(a)f′(b) < 0. This condition implies that there is a minimum in the interval [a,b] (with 0 derivative). The Bisection method procedure forfinding the minimum is outlined as follows.
Figure 12.1. Interval Updates of the Bisection Method
1.Specify the convergence tolerance for an interval of length > 0. Specify the convergence tolerance of the gradient γ > 0. Set the iteration number, k, to 0. The bounds are l_{0} = a and r_{0} = b.
2.If r_{k}  l_{k} < , stop. The midpoint x_{k} = is taken as the minimum, x^{*}, and the corresponding function value f(x^{*}) is the optimal solution.
3.Evaluate the gradient of the function at the midpoint f′(x_{k}) = f′().
4.If f′(x_{k}) < γ, stop. The corresponsing x_{k} is considered the minimum, x^{*}, and the corresponding function value, f(x^{*}), is the optimal solution.
5.Evaluate the derivative f′(x_{k}). If it is positive, let l_{k}_{+1} = l_{k} and r_{k}_{+1} = x_{k}. If it is negative, let l_{k}_{+1} = x_{k} and r_{k}_{+1} = r_{k}.
6.k = k + 1. Go to Step 2.
Example: The following optimization problem is used to illustrate the Bisection method.

Figure 12.2 presents the function curve in the defined region.
Figure 12.2. Plot of the Objective Function
The convergence tolerance of the interval length, , is set as 0.001. The convergence tolerance of the gradient at the midpoints, γ, is set as 0.01. Table 12.1 provides the end points, a_{k} and b_{k}, and the derivatives at each midpoint. The Bisection method stops after 10 iterations. At the 10^{th} iteration, the distance between the two points, 0.0098, is larger than the convergence tolerance of the interval length, . The absolute value of the derivative at the midpoint is lessthan the convergence tolerance of the gradient at the midpoint, γ. It is determined that the optimal value is at the midpoint 3.9990. The corresponding function value is f(x^{*}) = 15.0000.
Table 12.1. The End Points and the Derivatives at Each Midpoint




k 
a_{k} 
b_{k} 
Derivative 


1 
0.0000 
5.0000 
15.0000 
2 
2.5000 
5.0000 
2.5000 
3 
3.7500 
5.0000 
3.7500 
4 
3.7500 
4.3750 
0.6250 
5 
3.7500 
4.0625 
0.9375 
6 
3.9063 
4.0625 
0.1563 
7 
3.9844 
4.0625 
0.2344 
8 
3.9844 
4.0234 
0.0391 
9 
3.9844 
4.0039 
0.0586 
10 
3.9941 
4.0039 
0.0098 



Golden Section Search
The Golden Section Search is a method to minimize or maximize a unimodal function of one variable. The algorithm maintains the function values for triples of points whose distances form a golden ratio. This method does not require information about the derivatives. If the minimum is known to exist inside a region of the variable, this method successively narrows the range of function values inside which the minimum exists.
In mathematics, two quantities are in the golden ratio if the ratio of the sum of the quantities to the larger quantity is equal to the ratio of the larger quantity to the smaller quantity. Figure 12.3 illustrates the geometric relationshipthat defines the golden ratio. The total length is l_{ac}, the larger segment is l_{ab}, and the smaller segment is l_{bc}. From Fig. 12.3, the golden ratio can be expressed as
Figure 12.3. Line Segments Divided According to the Golden Ratio
(12.7) 
Solving Eq. 12.7, the value of φ is
(12.8) 
The value of its conjugate is
(12.9) 
Suppose the variable of a unimodal function f(x) is inside the interval [a,b], and f(x) is continuous over this interval. The minimum exists inside this interval. The steps of the Golden Section Search are illustrated as follows.
1.Select > 0 as the convergence tolerance of the interval. Set the iteration number, k, to 0. The bounds are a_{0} = a and b_{0} = b. Set l_{0} = b_{0}  τ(b_{0}  a_{0}) and r_{0} = a_{0} + τ(b_{0}  a_{0}).
2.If b_{k}  a_{k} < , stop. The midpoint, x_{k} = , is considered the optimal value, x^{*}; and the corresponding function value, f(x^{*}), is the optimal solution.
3.If f(l_{k}) > f(r_{k}), the parameters are updated as shown in the f(l_{k}) > f(r_{k}) case in Fig. 12.4.
Figure 12.4. Interval Updates of the Golden Section Search
4.If f(l_{k}) < f(r_{k}), the parameters are updated as the f(l_{k}) < f(r_{k}) case in Fig. 12.4.
5.k = k +1, go to Step 2.
Inside each iteration, depending on f(l_{k}) > f(r_{k}) or f(l_{k}) < f(r_{k}), the objective function is evaluated at different new points. The two bounds of the interval are updated and the interval becomes shorter. At Step 2 of each iteration, the bounds are [a_{k},b_{k}]. If f(l_{k}) < f(r_{k}), the right bound, b_{k}, is updated in Step 4. The length of the new interval is
(12.10) 
The length of the interval is reduced by a factor of τ at each iteration. The length reduction is similar to when f(l_{k}) > f(r_{k}).
The following observations are noteworthy. The Golden Section Search method requires one evaluation of the objective function at each iteration to determine the interval where the minimum lies; while the Bisection method requires one evaluation of the first derivative at each iteration.
Example: The same optimization example used for the Bisection method is again used to illustrate the Golden Section Search. The convergence tolerance rate, , is set as 0.1 for this example, which is different from the previousexample for the Bisection method.

Table 12.2 provides the four points, a_{k}, b_{k}, l_{k}, and r_{k}, and their function values. The Golden Section Search stops after 10 iterations. At the 10^{th} iteration, the values of the objective function are compared at two points, l_{10} = 4.00and r_{10} = 4.03. It is determined that the optimal value is between the points, 3.95 and 4.03. The distance between the two points is less than the convergence tolerance, . Finally, the point x^{*} = (3.95+4.03)∕2 = 3.99 is considered theoptimal value, and the corresponding function value is f(x^{*}) = 15.00.
Table 12.2. Endpoints, Golden Section Points and Function Values at Each Iteration




k 
a_{k} 
b_{k} 
f(a_{k}_{ }) 
f(b_{k}_{ }) 
l_{k} 
r_{k} 
f(l_{k}_{ }) 
f(r_{k}_{ }) 


1 
0.00 
10.00 
95.00 
195.00 
3.82 
6.18 
15.16 
38.77 
2 
0.00 
6.18 
95.00 
38.77 
2.36 
3.82 
28.44 
15.16 
3 
2.36 
6.18 
28.44 
38.77 
3.82 
4.72 
15.16 
17.60 
4 
2.36 
4.72 
28.44 
17.60 
3.26 
3.82 
17.72 
15.16 
5 
3.26 
4.72 
17.72 
17.60 
3.82 
4.16 
15.16 
15.13 
6 
3.82 
4.72 
15.16 
17.60 
4.16 
4.38 
15.13 
15.71 
7 
3.82 
4.38 
15.16 
15.71 
4.03 
4.16 
15.01 
15.13 
8 
3.82 
4.16 
15.16 
15.13 
3.95 
4.03 
15.01 
15.01 
9 
3.95 
4.16 
15.01 
15.13 
4.03 
4.08 
15.01 
15.03 
10 
3.95 
4.08 
15.01 
15.03 
4.00 
4.03 
15.00 
15.01 



12.3.2 Polynomial Approximations: Quadratic Approximation
If we assume an objective function is unimodal and continuous inside an interval, then it can be approximated by a polynomial. A is the simplest interpolation of the objective function (see Ref. [3]). The quadratic approximationmethod consists of a sequence of reducing intervals, and iterative approximations in the reduced intervals. As the approximation approaches the actual minimum, its accuracy increases. When the error between the approximation and the actual function is less than a predefined tolerance, terminate the iteration.
A quadratic approximating function can be constructed given three consecutive points, x_{1}, x_{2}, and x_{3}, and their corresponding function values, f(x_{1}), f(x_{2}), and f(x_{3}). The quadratic function is expressed as follows.
(12.12) 
At the three points, x_{1}, x_{2}, and x_{3}, (x) = f(x). The three constants, c_{0}, c_{1}, and c_{2}, can be determined as follows.
At the point x_{1}, since f(x_{1}) = (x_{1}) = c_{0}, the first constant is c_{0} = f(x_{1}). At the point x_{2}, f(x_{2}) = (x_{2}) = c_{0}+c_{1}(x_{2}x_{1}). The second constant, c_{1}, can be evaluated as follows.
(12.13) 
At the point x_{3}, f(x_{3}) = (x_{3}) = c_{0}+c_{1}(x_{3}x_{1})+c_{2}(x_{3}x_{1})(x_{3}x_{2}). The third constant, c_{2}, is obtained from the following equation.
(12.14) 
Using the above quadratic approximation, the minimum point, x^{*}, should satisfy the firstorder necessary conditions.
(12.15) 
Then, the minimum point, x^{*}, can be expressed as
(12.16) 
The quadratic approximation can be implemented in successively reduced intervals. A successive quadratic approximation method is outlined as follows.
1.Define the tolerance of the function successivevalues difference, γ > 0, and the tolerance of the variable, > 0. Set Δx. Set the iteration number, k, to 1.
2.Set the initial point, x_{1}^{1}. Compute x_{2}^{1} = x_{ }_{1}^{1} + Δx. Evaluate f(x_{1}^{1}) and f(x_{ }_{2}^{1}). If f(x_{1}^{1}) > f(x_{2}^{1}), then x_{ }_{3}^{1} = x_{2}^{1} + Δx. If f(x_{1}^{1}) < f(x_{ }_{2}^{1}), then x_{3}^{1} = x_{1}^{1}Δx. Evaluate f(x_{3}^{1}).
3.Compare the function values at the three points, x_{1}^{k}, x_{2}^{k}, and x_{ }_{3}^{k}. Find the minimum, f_{min}^{k} = min{f(x_{ }_{1}^{k}),f(x_{ }_{2}^{k}),f(x_{ }_{3}^{k})}, and the corresponding point, x_{min}^{k}.
4.Using the three points, x_{1}^{k}, x_{ }_{2}^{k}, and x_{3}^{k}, construct a quadratic approximation. Compute the minimum point, x_{k}^{*}, using Eq. 12.16. Evaluate f(x_{k}^{*}).
5.If f(x_{k}^{*})  f_{min}^{k} < γ and x_{k}^{*}x_{ }_{min}^{k} < , take x_{k}^{*} as the minimum. Terminate the iteration.
6.Take the current best point x_{k}^{*} and the two points bracketing it as the three points for the next quadratic approximation. k = k + 1. Go to Step 3.
Example: Use the quadratic approximation method to solve the following optimization problem. The variable, x, is inside (1.001, 10).
(12.17) 
Define the tolerance of the function successivevalues difference as 0.01f(x_{k}^{*}) and the tolerance of the variable as 0.05x_{k}^{*}. Set Δx = 1. Set the initial point, x_{1}^{1} = 2.
Define δf^{k} and δx^{k} as
(12.18) 
(12.19) 
The results for all the iterations are listed in Table 12.3.
Table 12.3. The Iterations for the Quadratic Approximation Method
At the end of the second iteration, the errors satisfy the tolerances. The iterations are terminated. The optimal point is 2.65 and the optimal function value is 12.57
12.4 Multivariable Optimization
Most practical unconstrained nonlinear optimization problems have multiple design variables. In this section, the methods for solving multivariable nonlinear optimization problems are illustrated.
12.4.1 ZerothOrder Methods
Simplex Search Method
The Simplex Search method is a directsearch method that does not require function derivatives (see Ref. [4]). Although the name of this method includes the word simplex, it has no relationship to the Simplex method of linear programming.
In geometry, a Simplex is a triangle in twodimensional space and a tetrahedron in threedimensional space. In ndimensional space, it is an ndimensional polytope. The main idea of the Simplex method is that, at each iteration, based on the comparison of function values at all the vertices, one vertex is projected through the centroid of the other vertices at a suitable distance. At the beginning, the Simplex Search method sets up a regular Simplex in the space of the variables. The objective function is evaluated at each vertex. For minimization problems, the vertex with the highest function value is the one that is reflected through the centroid of the other vertices to generate a new point. The point and the remaining vertices construct the next Simplex.
At the beginning of the algorithm, it requires the first Simplex to be generated given a base point, x^{0}, and an appropriate scale factor, α. Assuming the variable vector has N dimensions, two increments, δ_{1} and δ_{2}, are defined by the following two expressions.
(12.20) 
(12.21) 
Using the two increments, the j^{th} dimension for the i^{th} vertex can be calculated using the following expression.
(12.22) 
At each iteration, the vertex with the highest function value is selected as x_{high}. It is reflected through the centroid of the remaining points. The centroid is evaluated by the following expression.
(12.23) 
The line passing through x_{high} and x_{centroid} is given by the following expression.
(12.24) 
The selection of λ can yield the desired point on the line. If λ = 0, it yields the point, x_{high}. If λ = 1, the result is the point, x_{centroid}. In order to generate a symmetric reflection, λ is set as 2. The reflected point, x_{reflected}, is evaluated as
(12.25) 
The reflection process is illustrated in Fig. 12.5. The point, x^{2}, has the highest function value, and it is reflected through the centroid of x^{1} and x^{3}.
Figure 12.5. Reflection of the Vertex with the Highest Function Value
During the optimization, the following two situations may occur.
1.At the current iteration, the vertex with the highest function value is the reflected point generated in the last iteration. In this situation, choose instead the vertex with the next highest function value and generate a reflected point.
2.The iterations can cycle between two or more Simplexes. If a vertex remains in the Simplex for more than M iterations, set up a new Simplex with the lowest point as the base point and reduce the size of the Simplex. Thenumber of the dimension for the variable is N. The value of M can be estimated by M = 1.65N + 0.05N^{2} and M is rounded to the nearest integer.
The Simplex Search method is terminated when the size of the Simplex is sufficiently small or other termination criteria are met. The vertex with the lowest function value is taken as the minimum.
The Simplex construction and point reflection are illustrated in the following example.
Example: Use the Simplex Search method to minimize the following function.
(12.26) 
The number of the dimensions is 2. Each Simplex has 3 vertices. Take x^{0} = [1, 1]^{T} as the starting point, and set α = 2. The increments, δ_{1} and δ_{2}, can be evaluated as follows.
(12.27) 
(12.28) 
The coordinates of the other two vertices are calculated as follows.
(12.29) 
(12.30) 
The function values at the three points, x^{0}, x^{1}, and x^{2}, are as follows.
(12.31) 
(12.32) 
(12.33) 
Since the vertex, x^{0}, has the highest function value, it is reflected to form a new Simplex. The centroid of the two remaining points, x^{1} and x^{2}, is calculated as follows.
(12.34) 
The reflected point, x_{reflected}, is calculated as follows.
(12.35) 
At the point, x_{reflected}, f(x_{reflected}) = 2.3027. The vertex with the highest function value in this new Simplex is x^{1}. The algorithm will continue by reflecting the x^{1}, and the iterations proceed following the reflection rule until the optimal point, [2, 3]^{T}, is found.
Pattern Search Methods
The pattern search methods are a family of derivativefree numerical optimization methods. They can be applied to not only the optimization of continuous differentiable functions, but also to the optimization of discrete or nondifferentiable functions. The pattern search methods look along certain specified directions, and evaluate the objective function at a given step length along each of these directions. These points form a frame around the currentiteration. Depending on whether any point within the pattern has a lower objective function value than the current point, the frame shrinks or expands in the next iteration. The search stops after a minimum pattern size is reached.
An important decision in the pattern search methods is to choose the search direction set, D_{k}, at each iteration k. A key condition is that at least one direction in this set should give a descent direction for the objective function whenever the current point is not a stationary point. If a search direction, p, satisfies the following inequality, it can be a direction of descent for the objective function.
(12.36) 
where δ is a positive constant.
If the search direction, p, satisfies Eq. 12.36 at each iteration, it is a descent direction. Choose the search direction set, D_{k}, so that at least one direction, p ∈ D_{k}, will satisfy cos θ > δ, regardless of the value of ∇f_{k}.
A second condition on D_{k} is that the lengths of the vectors in this set are all similar. The diameter of the frame formed by this set is captured adequately by the step length. This condition can be expressed as
(12.37) 
where β_{min} and β_{max} are positive constants.
For the pattern search methods in ndimensional space, examples of sets D_{k} that satisfy the above two conditions in Eqs. 12.36 and 12.37 include the coordinate direction set
(12.38) 
and the set of n+1 vectors defined by
(12.39) 
where e = (1, 1,..., 1)^{T}.
Another important decision in the pattern search method is how to choose a sufficient decrease function ρ(l). The sufficient decrease function is used to determine the convergence of the results. It is a function of the step length and its domain is [0,∞). It is an increasing function of l and the function values are positive. As l approaches 0, the limit of ρ(t)∕t is 0. An appropriate choice of the sufficient decrease function is Mt^{3∕2}, where M is some positiveconstant. If the decrease of the objective function value is less than the value of the sufficient decrease function, the pattern search is converged to the minimum.
At each iteration, the search direction set is D_{k} and the step length is l_{k}. The frame that consists of the points at the next iteration is x_{k} + l_{k}p_{k} for all p_{k} ∈ D_{k}. When one of the points in the frame yields a significant decrease of the objective function, it becomes the next searching point, x_{k}_{+1}, and the next step length, l_{k}_{+1}, is increased. If none of the points in the frame has a significantly smaller function value than f_{k}, the next step length, l_{k}_{+1}, will be reduced.The procedure of the pattern search algorithm is illustrated as follows.
1.Define the sufficient decrease function ρ(l). Choose convergence tolerance l_{tol}, contraction parameter θ_{max}, and search direction set D_{k}. Choose initial point x_{0}, initial step length l_{0}, and initial direction set D_{0}. Set the iterationnumber, k, to 1.
2.Evaluate the objective function value f(x_{k}).
3.If the step length is l_{k} ≤ l_{tol}, take x_{k} as the optimal point, x^{*}. Stop.
4.If f(x_{k}+l_{k}p_{k}) < f(x_{k})  ρ(l_{k}) for some p_{k} ∈ D_{k}, set x_{k}_{+1} as x_{k}+l_{k}p_{k} for some p_{k} ∈ D_{k}, and increase the step length for the next iteration.
5.If f(x_{k}+l_{k}p_{k}) ≥ f(x_{k})  ρ(l_{k}) for all p_{k} ∈ D_{k}, use the same point x_{k} as the point x_{k}_{+1} for the next iteration. Reduce the step length for the next iteration, l_{k}_{+1}. Set l_{k}_{+1} as θ_{k}l_{k}, where 0 < θ_{k} ≤ θ_{max} < 1.
6.Set k = k + 1. Go to Step 2.
MATLAB has the function, patternsearch, that finds the minimum of a function using the pattern search method. It can handle optimization problems with nonlinear, linear, and bound constraints, and does not require functions to be differentiable or continuous. An example is provided below to illustrate how to solve a nonlinear optimization problem without constraints using patternsearch.
Example: Use the pattern search method to solve the following threedimensional quadratic problem.
(12.40) 
(12.41) 
(12.42) 
The sufficient decrease function is set as ρ(l) = 10l^{3∕2}. The contraction factor is 0.5. The convergence tolerance is set as 0.001. The contraction factor, θ_{max}, is set as 0.5. The initial step length, l_{0}, is set as 0.5. The starting point, x_{0}, is [0, 0, 0]^{T}. The search directions set, D_{k}, is the coordinate direction set, which is as follows.
(12.43) 
At the initial point, x_{0}, the function value, f(x_{0}), is 0. At the first iteration, the function values at x_{0} +l_{0}p_{k}, (k = 1,..., 6) are 9, 10.5, 10.5, 7, 7.5, and 5.5. The value of sufficient decrease function is as follows.
(12.44) 
For the six points x_{0} + l_{0}p_{k}, three of them (k = 4, 5, 6) satisfy the equation, f(x_{k}+l_{k}p_{k}) < f(x_{k})(l_{k}). Set the point, x_{0} + l_{0}p_{5}, as x_{1}, and increase the step length. The iterations continue. The minimum point is obtained as follows.
(12.45) 
The optimal function value f(x^{*}) = 35.9.
This problem can also be solved by MATLAB function patternsearch. The result is the same. The main file is as follows.
clc
clear
X0 = [0; 0; 0];
[x,fopt]=patternsearch(@obj_fun,X0)
The file for the objective function is as follows.
function f = obj_fun(x)
Q = [2 0 0; 0 3 0; 0 0 5];
c = [8; 9; 8];
f = 0.5*x’*Q*xc’*x;
12.4.2 FirstOrder Methods
Steepest Descent
The steepest descent method is a firstorder optimization algorithm. In each iteration, it takes the negative direction of the gradient as the descent direction of the objective function. This method takes steps along the descentdirection to find a local minimum. The procedure is as follows.
1.Choose a starting point x_{0}. Set k = 0.
2.Check the conditions of f(x_{k}). If x_{k} is the minimum, stop.
3.Calculate the descent direction, p_{k}, as follows.
(12.46) 
4.Evaluate the step length, α_{k}. The step length, α_{k}, should be an acceptable solution to the following minimization problem.
(12.47) 
5.Update the value of the variable, x_{k}_{+1}, using the descent direction and the step length.
(12.48) 
6.Set k = k+1 and go to Step 2.
The steepest descent method is the simplest Newtontype method for nonlinear optimization. The search direction is the opposite of the gradient. It does not require the computation of second derivatives, and it does not requirematrix storage. The computational cost per iteration is lower compared to other Newtontype methods. However, this method is very inefficient at solving most problems. It converges only at a linear rate. This means it requires more iterations. As a result, although the computational cost per iteration is low, the overall costs are high. However, the steepest descent method is theoretically useful in proving the convergence of other methods; and it provides a lower bound on the performance of other algorithms.
The procedure to solve an unconstrained nonlinear problem using the steepest descent method is illustrated with the following example. Only the first two iterations are illustrated in this example.
Example: Use the steepest descent method to solve the following threedimensional quadratic problem.
(12.49) 
(12.50) 
(12.51) 
In this problem, the steepest descent direction is
(12.52) 
The step length used for exact line search x_{k}_{+1} = x_{k} + α_{k}p_{k} is
(12.53) 
Choose an initial point x_{0} = (0, 0, 0)^{T}. At x_{0},
(12.54) 
(12.55) 
and
(12.56) 
Since ∇f(x_{0}) > 0, go to the next iteration. The step length is α_{0} = 0.1875. The next point is
(12.57) 
At point x_{1}, the function value, the gradient, and the norm of the gradient are as follows.
(12.58) 
(12.59) 
and
(12.60) 
The next step length is α_{1} = 0.1715 and the new point is
(12.61) 
At point x_{2}, the function value, the gradient, and the norm of the gradient are as follows.
(12.62) 
(12.63) 
and
(12.64) 
The above are the first two iterations to solve this problem. It takes 36 iterations before the norm of the gradient is less than 0.001. The optimal solution is
(12.65) 
The corresponding optimal function value is f(x^{*}) = 0.65.
Conjugate Gradient Methods
In addition to the Gaussian elimination method, iterative methods are also a valuable tool for solving linear equations in the form of Eq. 12.66. One of the many iterative methods is the conjugate gradient method, which is designed to solve Eq. 12.66 when the matrix A is symmetric and positive definite.
(12.66) 
Equation 12.67 is an optimization problem that has a quadratic objective function.

The matrix A is symmetric and positive definite. The firstorder necessary conditions require the gradient of f(x) to be equal to zero. The gradient can be expressed as Eq. 12.68.
(12.68) 
Since the Hessian matrix, ∇^{2}f(x) = A, is positive definite, the sufficient conditions for a local minimum are satisfied. This optimization problem is equivalent to the problem of linear equations as shown by Eq. 12.66.
If a set of vectors, p_{i}, satisfies Eq. 12.69, the vectors are said to be conjugate with respect to the matrix A. Any set of conjugate vectors is also linearly independent.
(12.69) 
The residual of Eq. 12.68 is
(12.70) 
In Eq. 12.67, if the matrix A is diagonal, the contours of the function f(x) are ellipses. The axes of the ellipses are aligned with coordinate directions. The conjugate directions are along the coordinate directions. Figure 12.6 illustratesthe conjugate directions in a two dimensional space. The minimum of the function can be found by performing successive onedimensional minimizations along the conjugate directions.
Figure 12.6. Conjugate Directions for a Diagonal Matrix A
In Eq. 12.67, if Matrix A is nondiagonal, the contours of the function f(x) are also ellipses. However, the axes of the ellipses are not aligned with the coordinate directions. The conjugate directions are not along the coordinate directions. Figure 12.7 shows the conjugate directions for a nondiagonal matrix in a two dimensional space.
Figure 12.7. Conjugate Directions for a Nondiagonal Matrix A
The conjugate gradient methods used to solve the quadratic optimization problem (Eq. 12.67) exist in different versions of the algorithm. The steps for one of these versions are illustrated as follows.
1.Set an initial guess x_{0}. The residual is r_{0} = Ax_{0}  b. Set a vector with the same dimension of the conjugate vector, p_{0} = r_{0}. Specify the convergence tolerance, > 0. Initialize the iteration counter, i = 0.
2.Check the value of the residual r_{i}. If r_{i} < , stop.
3.Set α_{i} = r_{i}^{T}r_{ }_{i}∕p_{i}^{T}Ap_{ }_{i}.
4.Evaluate the next point, x_{i}_{+1} = x_{i} + α_{i}p_{i}.
5.Evaluate the residual, r_{i}_{+1} = r_{i} + α_{i}Ap_{i}.
6.Set β_{i}_{+1} = r_{i}_{+1}^{T}r_{ }_{i}_{+1}∕r_{i}^{T}r_{ }_{i}.
7.Set the conjugate vector, p_{i}_{+1} = r_{i}_{+1}+β_{i}_{+1}p_{i}.
8.Set i = i + 1. Go to Step 2.
Example: Use the above algorithm for conjugate gradient methods to solve the following threedimensional quadratic problem.
(12.71) 
(12.72) 
(12.73) 
The convergence tolerance, , is set as 0.01. Choose an initial point x_{0} = (0, 0, 0)^{T}. At this point, the residual is r_{0} = (8, 9, 8)^{T}. This point is not optimal. The algorithm performs several iterations in search of the optimal solution.Table 12.4 provides the results for all the iterations.
Table 12.4. The Iterations for the Conjugate Gradient Method
At the initial point x_{0}, r_{0} = 14.46. In the first iteration, r_{1} = 5.24. In the second iteration, r_{2} = 1.22. In the third iteration, r_{3} = 0. Since r_{3} ≤ , the point x_{3} = (4.00,3.00,1.60)^{T} is the optimal solution to this example, and the optimal function value is f(x_{3}) = 35.9.
Different versions of the conjugate gradient method can be applied to quadratic optimization problems and general unconstrained nonlinear optimization problems. The FletcherReeves method extends the conjugate gradient method to nonlinear optimization problems by making two changes. First, the onedimensional minimum along each conjugate vector is identified by a line search. Second, the residual is replaced by the gradient of the nonlinear function. The steps of the FletcherReeves method are as follows.
1.Set an initial guess x_{0}. At x_{0}, evaluate the function, f(x_{0}), and its gradient, ∇f(x_{0}). Set p_{0} = ∇f(x_{0}). Specify the convergence tolerance of the residual, > 0. Set the iteration number, i, to 0.
2.Check the norm of the gradient ∇f(x_{i}). If ∇f(x_{i}) < , stop.
3.Use a line search along the direction of the conjugate vector to determine the minimum x_{i}_{+1} = x_{i} + α_{i}p_{i}.
4.Evaluate the gradient ∇f(x_{i}_{+1}).
5.Set β_{i}_{+1} = ∇f(x_{i}_{+1})^{T}∇f(x_{ }_{i}_{+1})∕∇f(x_{i})^{T}∇f(x_{ }_{i}).
6.Set the conjugate vector p_{i}_{+1} = ∇f(x_{i}_{+1}) + β_{i}_{+1}p_{i}.
7.Set i = i + 1. Go to Step 2.
12.4.3 SecondOrder Methods
Newton Method
Newton’s method approximates the function f(x) with a quadratic function at each iteration. The approximated quadratic function is minimized exactly, and a descent direction is evaluated.
In each iteration, the function is approximated as follows.
According to the firstorder necessary condition, the first derivative of (x) should satisfy ∇ (x) = 0. This condition can be expressed as
(12.75) 
Solving the above equation,
(12.76) 
Assuming the inverse of the Hessian exists, then
(12.77) 
The second term in Eq. 12.77 is used as the reasonable direction to estimate x_{k}_{+1} in the next iteration.
(12.78) 
The descent direction, p_{k}, can be expressed as
(12.79) 
If [∇^{2}f(x_{ }_{k})]^{1} exists, p_{k} can be evaluated using Eq. 12.79. The other way to evaluate the descent direction, p_{k}, is by solving the following equation.
(12.80) 
Equation 12.80 can be solved by Gaussian Elimination, Cholesky Factorization, or other suitable methods.
The expression of the descent direction is simple in Newton’s method. However, Newton’s method has certain shortcomings. (1) Newton’s method does not always find a minimum, and might only find a stationary (not minimum) point. (2) It converges only if one starts sufficiently near an optimal solution. (3) The computational cost to evaluate the inverse of the Hessian is high, especially for highdimensional optimization problems.
The steps to solve an unconstrained nonlinear problem are illustrated with the following example.
Example: Use Newton’s method to minimize the following function.
(12.81) 
The gradient of f(x) is as follows.
(12.82) 
The Hessian of f(x) is as follows.
(12.83) 
The inverse of ∇^{2}f(x) is as follows
(12.84) 
Choose an initial point, x_{0} = (10, 10)^{T}. The next point, x_{1}, is evaluated as follows.
(12.85) 
The point x_{1} = (0, 0)^{T} is the optimal solution for this example, and the optimal function value is f(x_{1}) = 0.
QuasiNewton Methods
QuasiNewton methods are among the most widely used methods for nonlinear optimization. When the Hessian is difficult to compute, quasiNewton methods are effective for solving these kinds of problems. There are differentquasiNewton methods, but they are all based on approximations of the Hessian by another matrix to reduce the computational cost. In Sec. 12.4.3, Newton’s method estimates the search direction, p_{k}, that satisfies the following equation.
(12.86) 
QuasiNewton methods use an approximation of the Hessian, B_{k}, to obtain the search direction, p_{k}. The search direction is obtained by solving
(12.87) 
For onedimensional nonlinear optimization problems, quasiNewton methods are generalizations of the secant method. The secant method approximates the Hessian using
(12.88) 
where f′(x_{k}) can be approximated as
(12.89) 
For multidimensional nonlinear optimization problems, the above condition is rewritten as
(12.90) 
From the above Eq. 12.90, the condition for the approximation matrix B_{k} is derived, which can be expressed as
(12.91) 
The matrix B_{k} has n^{2} entries and the secant condition only has a set of n equations. This condition is insufficient to define B_{k} uniquely. The matrix B_{k} does not always have a unique solution for the secant condition.
There are several ways to approximate and update the matrix B_{k}. Define two vectors, s_{k} and y_{k}, as follows.
s_{k} 
= x_{k}_{+1}  x_{k} 
(12.92) 
y_{k} 
= ∇f(x_{k}_{+1}) ∇f(x_{k}) 
(12.93) 
As one of the approximation formulae, the symmetric rankone update formula can be expressed as
(12.94) 
The B_{k}’s updated by Eq. 12.94 are symmetric. However, they are not necessarily positive definite.
As one of the most widely used formulae, the BFGS update formula can be expressed as
(12.95) 
Upon selecting the appropriate update formula of B_{k}, the quasiNewton algorithm is illustrated as follows.
1.Choose a starting point x_{0}. Choose an initial Hessian approximation B_{0}, which can be the diagonal matrix, I. Set k = 0.
2.If x_{k} is optimal, stop.
3.Determine the search direction, p_{k}, by solving B_{k}p_{k} = ∇f(x_{k}).
4.Find the step length, α_{k}, by a line search to minimize f(x_{k}+ α_{k}p_{k}).
5.Set x_{k}_{+1} = x_{k} + α_{k}p_{k}.
6.Update B_{k}_{+1} using the selected update formula.
7.Set k = k + 1 and go to Step 2.
The steps to solve an unconstrained nonlinear problem are illustrated with the following example.
Example: Use a quasiNewton method with the symmetric rankone B_{k} update formula to solve the following threedimensional quadratic problem.
(12.96) 
(12.97) 
(12.98) 
Choose an initial point x_{0} = (0, 0, 0)^{T}. The initial guess is B_{0} = I. At this point, x_{0}, ∇f(x_{0}) = c = 14.4568, so this point is not optimal.
From B_{0}p = ∇f(x_{0}), the first search direction is derived as
(12.99) 
The line search formula gives α_{0} = 0.3025. The new estimated point is
(12.100) 
The gradient at the new point x_{1} is
(12.101) 
To update B_{1}, first evaluate s_{0} and y_{0}.
(12.102) 
(12.103) 
(12.104) 
At this new point x_{1}, ∇f(x_{1}) = 5.2423. It is not optimal. The search direction is
(12.105) 
The line search step length is α_{1} = 0.3471. At the new point, x_{2}, the new estimates of the solution, the gradient, and the Hessian are
(12.106) 
(12.107) 
(12.108) 
At the point x_{2}, ∇f(x_{2}) = 0.8397. It is not optimal. The new search direction is
(12.109) 
and the step length α_{2} = 0.4145. The line search yields the point
(12.110) 
At point x_{3},
(12.111) 
The point x_{3} is the optimal solution to this example, and the optimal function value is f(x_{3}) = 35.9.
12.5 Comparison of Computational Issues in the Algorithms
12.5.1 Rate of Convergence
The optimization algorithms discussed in this chapter are iterative methods, and are used to evaluate a sequence of approximate solutions. The rate of convergence is a measure of efficiency, which describes how quickly the estimates of the solution approach the exact solution. Efficient algorithms reduce the computational cost. Assume a sequence of points x_{k} converges to a solution x^{*}. The error at the k^{th} iteration can be defined as
(12.112) 
As the sequence approaches the solution x^{*}, the limit of e_{ }_{k} is 0.
If Eq. 12.113 holds, the sequence x_{k} is said to converge to x^{*} with rate r and rate constant C. The constant C < ∞.
(12.113) 
For a sequence of errors, when r = 1, the convergence is said to be linear, as Eq. 12.114 indicates. Note that if 0 < C < 1, then the norm of the error is reduced at every iteration. If C > 1, then the sequence diverges.
(12.114) 
If r = 1 and C = 0, the rate of the convergence is said to be superlinear. For any r > 1, if
(12.115) 
then Eq. 12.116 holds. Any convergence with r > 1 has a superlinear rate of convergence.
(12.116) 
When r > 2, the convergence is called quadratic. The rates of convergence for some optimization algorithms in this chapter are listed in Table 12.5.
Table 12.5. Comparison of Optimization Methods for Unconstrained Nonlinear Problems




Method 
Converge 
Convergence 
Computation 
Problem 
Proof 
Rate 
Cost 
Scale 



Bisection 
yes 
linear 
very high 
1dimensional 
Golden Section 
yes 
linear 
high 
1dimensional 
Quadratic Appro 
yes 
linear 
very high 
1dimensional 
Pattern Search 
yes 
high 
small 

Steepest Descent 
yes 
linear 
high 
medium 
Conjugate Grad 
yes 
linearquadratic 
medium high 
large 
Newton 
no 
quadratic 
medium  high 
smallmedium 
QuasiNewton 
yes 
superlinear 
low  medium 
small  mid 



12.5.2 Line Search Methods
In the above sections, at each iteration, the steepest descent method, the conjugate gradient method, the Newton method and the quasiNewton methods first evaluate a search direction p_{k}, and then do a line search along the direction to determine an appropriate step length, α_{k}. The iteration is given by
(12.117) 
Most line search algorithms require the search direction, p_{k}, to be a descent direction, and it should satisfy
(12.118) 
The step length, α_{k}, should generate a substantive reduction of f(x) along the descent direction. The minimum of Eq. 12.119 is chosen as a function of α_{k}.
(12.119) 
In general, it is not computationally practical to determine that exact minimum. Some practical strategies perform an inexact line search to identify a step length with adequate reduction in f(x). These strategies reduce computational cost.
A popular inexact line search condition requires that α_{k} give a sufficient decrease in the objective function, f(x), as measured by the following inequality:
(12.120) 
where the constant c_{1} ∈ (0, 1). This inequality is called the Armijo condition.
The sufficientdecrease condition is not, by itself, adequate to ensure that the algorithm makes reasonable progress. To rule out unacceptable short steps, the curvature condition is introduced as the following inequality.
(12.121) 
where the constant c_{2} ∈ (c_{1}, 1). The sufficientdecrease condition and the curvature condition are collectively known as the Wolfe conditions.
12.5.3 Comparison of Different Methods
The optimization methods presented in this chapter include the Bisection method, the Golden Section Search method, the quadratic approximation method, the pattern search methods, the steepest descent method, the conjugategradient methods, the Newton method, and the quasiNewton methods.
All of these methods have applications in different fields. Unconstrained nonlinear problems can be classified as small, mid, or large scale problems. Computers used to run the codes of the algorithms can have different computational capacity. The appropriate method can be selected, depending on the application, the computation capability, and the storage requirements. Note that computation capacity, in this regard, and storage requirements have become less critical issues in recent years.
Table 12.5 provides a comparison of the different methods. The comparison includes the following properties: convergence guarantee, convergence rate, computation cost, and problem scale. It is important to keep in mind that the overall computational cost is a combination of (i) the number of iterations, and (ii) the cost of each iteration. While a method may converge in fewer iterations, it might require more costly computation at each iteration. For example,the Newton method may often converge with fewer iterations than the Conjugate Gradient method. However, at each iteration, Conjugate Gradient only requires gradient computation, while Newton requires a very costly Hessiancomputation.
The Conjugate Gradient method may exhibit linear to quadratic convergence, depending on the particular implementation. As standalone algorithms, Bisection, Golden Section, and Quadratic Approximation methods are capableof performing only univariate optimization or line search. However, they can be applied as techniques to perform line search along a direction of improvement (direction vector) within the context of multivariate optimization. We note that some of the above comments are somewhat subjective, and that much depends on the particular problem and the computing resources at hand.
This chapter provides the reader with sufficient information to understand the basics of unconstrained nonlinear programming. This background makes it possible to explore more theoretical aspects of the topic [5].
12.6 Summary
Since most engineering design problems tend to be nonlinear in nature, the topic of nonlinear programming (NLP) is of paramount importance in learning optimization. This chapter introduced the basics of unconstrained NLP. The chapter began with a description of the necessary and sufficient conditions for optimality in unconstrained NLP problems. This description was followed by the introduction of the major techniques/algorithms used for solving univartiate and multivariate NLP problems. Specifically, two major classes of univariate methods were presented: (i) interval reduction methods and (ii) polynomial approximation methods. This was followed by a description of thethree major classes of multivariate unconstrained NLP methods. These are: the Zeroth order methods (e.g., line search and pattern search), the first order methods (e.g., steepest descent and conjugate gradient), and the second ordermethods (e.g., Newton’s and Quasi Newton methods). The chapter concluded with a comparative discussion of the rate of convergence and the computational cost of these different classes of methods for solving unconstrained NLP problems.
12.7 Problems
Warmup Problems
12.1Consider the function:
1.Calculate the Hessian matrix of the function.
2.Calculate the Hessian at (i)[1, 1], (ii)[2, 2], and (iii)[0, 0].
3.Use optimality conditions to classify the points in Question (2) as minimum, maximum, or saddle points.
Intermediate Problems
12.2Consider the following two dimensional function:
1.Calculate the gradient vector of the above function by hand.
2.Calculate the gradient at the following points by hand: (i)[1, 1], (ii)[2, 2], and (iii)[0, 0].
3.Write a small program in MATLAB that evaluates the gradient at each point in a twodimensional grid in the space 5 ≤ x_{1} ≤ 5. Choose an appropriate grid spacing (at least 100 points).
4.Find a way in MATLAB to plot each gradient vector of Question (3) as a small arrow (pointing in the correct direction) at its x_{1}, x_{2} location; that is, if you evaluated 1, 000 gradient vectors in Question 3, then your plot should contain 1, 000 arrows. In addition, plot the contours of f on top of the gradient plot.
5.Interpret your plot and comment on the likely location of the minimum point by observing the gradient vectors and the contours of f.
12.3Consider the function:
1.Write a program to estimate the minimum of the function using the steepest descent method. Use the starting point [10, 1].
2.Compare your optimum results with the results of fmincon. How many function evaluations does your code need? How many function evaluations does fmincon need?
3.On the contour plot of the function, plot the sequence of points obtained during the optimization iterations using your steepest descent code.
Advanced Problems
12.4Consider the following function:
We wish to find the minimum of the above function in the given range of x.
1.Using the following listed methods, estimate the minimum of the above function using up to three iterations. Solve Parts (a) through (c) by hand, and please show your complete work.
(a)Interval Halving Method (i.e., Bisection method)
(b)NewtonRaphson Method using x_{0} = 1 as the starting point
(c)Golden Section Method
(d)Plot the function f(x) using MATLAB, and identify the optimum solution in the range of x. For Parts (a) through (c), calculate the percentage error between the optimum x value and the value of x after three iterations.Present your results in a tabular form, as shown below, for methods in Parts (a), (b), and (c).
2.Write a MATLAB code for the NewtonRaphson method that evaluates the minimum of the function in Problem 1. Use an appropriate stopping criterion.
3.Verify your results using fmincon. Make sure your code gives you the same solution as that given by fmincon. You might have to change your stopping criterion.
4.How many function evaluations does your code need? How many function evaluations does fmincon need? Use the same starting point for your code and fmincon.
Table 12.6. Results Summary for Given Search Methods




Iteration 
(a) 
(b) 
(c) 








x 
f(x) 
x 
f(x) 
x 
f(x) 



1 

2 

3 




12.5Consider the following function:
We want to find the minimum of the above function in the given range of x.
1.Using the following methods, estimate the minimum of the above function by implementing up to three iterations. Solve Parts (a) through (c) by hand, and please show your complete work.
(a)Bisection Method
(b)Secant Method
(c)Successive quadratic estimation Present your results in a tabular form, as shown below, for methods in Parts (a), (b), and (c).
2.Write a MATLAB code for the above three methods for the given function. Use an appropriate stopping criterion.
Table 12.7. Results Summary for Given Search Methods




Iteration 
(a) 
(b) 
(c) 








x 
f(x) 
x 
f(x) 
x 
f(x) 



1 

2 

3 




BIBLIOGRAPHY OF CHAPTER 12
[1]M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley and Sons, 3rd edition, 2013.
[2]A. Ruszczynski. Nonlinear Optimization. Princeton University Press, 2011.
[3]B. Zwicknagl and R. Schaback. Interpolation and approximation in taylor spaces. Journal of Approximation Theory, 171:6583, 2013.
[4]A. R. Conn, K. Scheinberg, and L. N. Vicente. Introduction to Derivativefree Optimization. SIAM, 2009.
[5]R. J. Vanderbei. Linear Programing: Foundations and Extensions. Springer, 2014.