
Optimization in Practice with MATLAB for Engineering Students and Professionals (2015)

PART 1


HELPFUL PRELIMINARIES

2


Mathematical Preliminaries

2.1 Overview

This chapter presents some of the basic mathematics needed for learning and using optimization. Specifically, it reviews the basics of linear algebra (e.g., vectors, matrices, and eigenvalues) and calculus. For many readers, this chapter will be a quick review of well known material; for others, it will be an opportunity to learn the minimum mathematics required for the material presented in this book.

2.2 Vectors and Geometry

A scalar is a quantity that is determined by its magnitude and sign. For example, length, temperature, and speed are scalars. A vector is a quantity that is determined by both its magnitude and its direction; it is a directed line segment. Velocity and force are examples of vectors.

A vector can be expressed in terms of its components in a Cartesian coordinate system as a = a1i + a2j + a3k, where i, j, and k denote the unit vectors along the X, Y and Z axes. The magnitude of this vector is given as ||a|| = √(a1² + a2² + a3²).

The length of a vector x is also called the norm (or Euclidean norm) of the vector, denoted by ||x||. The position vector of a given point A : (x, y, z) is the vector from the origin of the axes to the point A.

2.2.1 Dot Product

The dot product or inner product of two vectors yields a scalar. Mathematically, the dot product of two vectors a and b, denoted a · b, is the product of their lengths times the cosine of the angle θ between them.

a · b = ||a|| ||b|| cos θ

(2.1)

In terms of vector components, the dot product of two vectors a = a1i + a2j + a3k and b = b1i + b2j + b3k is given as

a · b = a1b1 + a2b2 + a3b3

(2.2)

Two vectors are said to be orthogonal when their dot product is equal to zero (θ = 90°).
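The dot product is easy to verify numerically. The MATLAB sketch below uses two illustrative vectors (the components are arbitrary) and computes a · b both from Eq. 2.2 and with the built-in dot function, then recovers the angle θ from Eq. 2.1.

a = [2 3 4];                      % a = 2i + 3j + 4k (illustrative components)
b = [-1 2 -3];                    % b = -i + 2j - 3k
sum(a.*b)                         % Eq. 2.2: a1*b1 + a2*b2 + a3*b3
dot(a, b)                         % MATLAB's built-in dot product (same value)
acosd(dot(a,b)/(norm(a)*norm(b))) % angle theta from Eq. 2.1, in degrees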

2.2.2 Equation of a Line

The slope-intercept form of the equation of a line that has a slope of m and y-intercept c is

y = mx + c

(2.3)

The point-slope form of the equation of a line that passes through a point P : (x1, y1) with a slope m is

(y − y1) = m(x − x1)

(2.4)

The two-point form of the equation of a line that passes through two points P1 : (x1, y1) and P2 : (x2, y2) is

(y − y1)(x2 − x1) = (x − x1)(y2 − y1)

(2.5)

The two-intercept form of a line with x intercept a and y intercept b is given as

x/a + y/b = 1

(2.6)

2.2.3 Equation of a Plane

The equation of a plane with a normal vector n = [a, b, c]^T passing through the point (x0, y0, z0) is given as

n · (x − x0) = 0

(2.7)

where x = (x, y, z) is any point on the plane. Expanding the dot product in Eq. 2.7, the general form of the equation of a plane is

ax + by + cz + d = 0

(2.8)

where d = −ax0 − by0 − cz0. The plane specified in this form has its X, Y, and Z intercepts at p = −d/a, q = −d/b, and r = −d/c, respectively.

Figure 2.1. Equation of a Plane

In the intercept form, a plane passing through the points (a, 0, 0), (0,b, 0), and (0, 0,c) can be given as

x/a + y/b + z/c = 1

(2.9)

The plane passing through the two points (x1, y1, z1) and (x2, y2, z2), and parallel to the direction [a, b, c]^T, is given by the following equation.

det([x − x1, y − y1, z − z1;  x2 − x1, y2 − y1, z2 − z1;  a, b, c]) = 0

(2.10)

The plane that passes through the three points (x1,y1,z1), (x2,y2,z2), and (x3,y3,z3) is given by,

det([x − x1, y − y1, z − z1;  x2 − x1, y2 − y1, z2 − z1;  x3 − x1, y3 − y1, z3 − z1]) = 0

(2.11)

In the above two equations, det(A) denotes the determinant of any given matrix A.
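In MATLAB, one convenient way to build the plane through three points is to form a normal vector with a cross product, which is equivalent to expanding the determinant in Eq. 2.11. The sketch below is a minimal illustration using the points of Problem 2.8.

P1 = [2 0 0];  P2 = [0 2 0];  P3 = [0 0 2];  % points from Problem 2.8
n  = cross(P2 - P1, P3 - P1);                % normal vector [a b c]
d  = -dot(n, P1);                            % from a*x0 + b*y0 + c*z0 + d = 0
fprintf('%gx + %gy + %gz + %g = 0\n', n(1), n(2), n(3), d)
[dot(n,P1)+d, dot(n,P2)+d, dot(n,P3)+d]      % all three points satisfy the plane: zeros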

2.3 Basic Linear Algebra

Linear algebra can be defined as the study of the theory and application of linear systems of equations and linear transformations. Linear algebra uses matrices and their properties in a systematic way, often to solve physically meaningful problems (Ref. [1]).

2.3.1 Preliminary Definitions

A matrix is a rectangular array of numbers, symbols, or functions, which are called the elements of the matrix. The following matrices are presented to help discuss some basic matrix definitions.

[a11 a12 a13; a21 a22 a23]   (order 2 × 3)

(2.12)

[a11 a12; a21 a22]   (order 2 × 2)

(2.13)

[a11; a21; a31; a41]   (order 4 × 1)

(2.14)

[a11 a12 a13 a14]   (order 1 × 4)

(2.15)

[b11 b12; b21 b22]   (order 2 × 2)

(2.16)

[1 0 0; 0 1 0; 0 0 1]   (order 3 × 3)

(2.17)

[a11 a12 a13 a14; 0 a22 a23 a24; 0 0 a33 a34; 0 0 0 a44]   (order 4 × 4)

(2.18)

[a11 0 0 0; a21 a22 0 0; a31 a32 a33 0; a41 a42 a43 a44]   (order 4 × 4)

(2.19)

The first example provided in Eq. 2.12 has two rows (horizontal lines) and three columns (vertical lines). The order of this matrix is 2 × 3 since it has two rows and three columns. Matrices where the number of rows is not equal to the number of columns are called Rectangular matrices. The second example provided in Eq. 2.13 has two rows and two columns. Matrices where the number of rows is equal to the number of columns are called Square matrices. The third example found in Eq. 2.14 has only one column. These are called Column matrices. The fourth example in Eq. 2.15 has only one row. These are called Row matrices. A general notation for an m × n matrix A is

A = [a11 a12 ... a1n; a21 a22 ... a2n; ... ; am1 am2 ... amn]

Any element in a matrix can be represented using the double subscript notation aij, where i denotes the row and j the column in which the element is located. When m = n, the matrix is said to be square. The diagonal containing the elements a11, a22, ..., ann is called the principal diagonal of the matrix. The matrix given by Eq. 2.17, where all the elements in the principal diagonal are ones and the rest of the elements are zeros, is called the Identity Matrix. The matrix in the form shown in Eq. 2.18, where all the elements below the principal diagonal are zeros, is called an Upper Triangular matrix. A Lower Triangular matrix is shown in Eq. 2.19, where all the elements above the principal diagonal are zeros.

2.3.2 Matrix Operations

Addition

The addition of two matrices is possible only if they are of the same order. If two matrices A and B have the same order, then A + B can be obtained by adding the corresponding elements of A and B. Consider the following example.

Example 1:

(2.20)

and

(2.21)

then,

(2.22)

Subtraction

Similar to the addition operation, two matrices can be subtracted only if they have the same order. For the two matrices A and B given by Eqs. 2.20 and 2.21, A − B can be obtained by subtracting the corresponding elements as,

Example 2:

Scalar Multiplication

The product of any matrix with a scalar can be obtained by multiplying each element of that matrix with that scalar. For example, for matrix A in Eq. 2.20, 4A can be obtained by simply multiplying each element of A by 4.

Example 3:

4A =
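The element-wise nature of these three operations is easy to see in MATLAB. The sketch below uses two illustrative 2 × 2 matrices (not the A and B of Eqs. 2.20-2.21).

A = [1 2; 3 4];   % illustrative matrices (not those of Eqs. 2.20-2.21)
B = [5 6; 7 8];
A + B             % corresponding elements are added
A - B             % corresponding elements are subtracted
4*A               % every element of A is multiplied by 4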

Matrix Multiplication

Unlike matrix addition, matrix multiplication is not obtained by multiplying the corresponding elements of two matrices. Consider two matrices C and D. These two matrices can be multiplied to yield CD only if the number of columns in C is equal to the number of rows in D. If C is an m × n matrix and D is an n × p matrix, then the product of C and D has an order of m × p. For example, consider the following:

Example 4:

and

then

In addition, consider

Note that CD ≠ DC. The following are some important properties of matrix multiplication.

1. AB ≠ BA (in general)

2. AI = IA = A

3. AB = 0 does not necessarily imply A = 0 or B = 0

4. k(AB) = (kA)B = A(kB), where k is a scalar

5. A(BC) = (AB)C

6. (A + B)C = AC + BC

7. A(B + C) = AB + AC
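The following MATLAB sketch, with two illustrative matrices, shows the dimension rule for a product and the fact that matrix multiplication is generally not commutative.

C = [1 2 3; 4 5 6];       % a 2 x 3 matrix (illustrative)
D = [1 0; 0 1; 2 2];      % a 3 x 2 matrix (illustrative)
CD = C*D                  % 2 x 2 result: (m x n)*(n x p) gives m x p
DC = D*C                  % 3 x 3 result, so CD and DC are not even the same size
k = 3;
isequal(k*(C*D), (k*C)*D) % property 4: returns logical 1 (true)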

Transpose

The transpose of a matrix is obtained by interchanging its rows and columns. Consider the following example.

Example 5:

Some important properties of the transpose of a matrix follow:

1. If A^T = A, then A is called a Symmetric Matrix. If A^T = −A, then A is called a Skew Symmetric Matrix.

2. (A^T)^T = A

3. (A + B)^T = A^T + B^T

4. (AB)^T = B^T A^T
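Transpose property 4 can be checked quickly in MATLAB, where the ' operator transposes a real matrix; the sketch below uses the matrices of Problem 2.9.

A = [1 1; 2 1];           % matrices from Problem 2.9
B = [0 1; 3 2];
lhs = (A*B)'              % transpose of the product
rhs = B'*A'               % product of the transposes, in reverse order
isequal(lhs, rhs)         % property 4: returns logical 1 (true)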

2.3.3 Determinants

The general form of the determinant of an n × n matrix A is given by the cofactor expansion along any row i:

det(A) = ai1 Ci1 + ai2 Ci2 + ... + ain Cin

where Cij is called the Cofactor of A, and Mij is the corresponding minor. The minor determinant Mij is the determinant of the sub-matrix obtained by deleting the ith row and the jth column of the matrix. A cofactor is a signed minor determinant. Specifically, Cij = Mij when i + j is even and Cij = −Mij when i + j is odd. For example, let us compute the determinant of a 2 × 2 matrix.

Example 6:

The following example demonstrates computation of the determinant of a 3 × 3 matrix.

Example 7:

The following are some important properties of determinants of (square) matrices.

1. det(A^T) = det(A)

2. det(I) = 1

3. If two rows or columns of a matrix are identical, then det(A) = 0.

4. If all entries of a row or column are all zeros, then det(A) = 0.

5. If B is obtained by interchanging any two rows or columns of A, then det(B) = −det(A)

6. If a row or column of a matrix is a linear combination of two or more rows or columns, respectively, then det(A) = 0.

7. det(AB) = det(A)det(B)

8. det(cA) = c^n det(A), where c is a scalar and n is the order of A.

9. If the determinant of a matrix is zero, the matrix is said to be singular. Otherwise, the matrix is said to be non-singular.

10. If a matrix is upper triangular or lower triangular, the determinant of that matrix is simply equal to the product of the elements on the principal diagonal.
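In MATLAB, det evaluates a determinant directly. The sketch below applies it to the matrix of Problem 2.13 and checks property 10 on an illustrative upper triangular matrix.

A = [1 0 2; 0 3 4; 2 1 3];    % 3 x 3 matrix from Problem 2.13
det(A)                        % cofactor expansion: 1*(9-4) - 0 + 2*(0-6) = -7
U = [2 1 4; 0 3 5; 0 0 6];    % an illustrative upper triangular matrix
[det(U), prod(diag(U))]       % property 10: both equal 2*3*6 = 36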

2.3.4 Inverse

For a square matrix A, if there exists a matrix B such that AB = BA = I, then B is the inverse of A. A general definition of the inverse is given as

A^(-1) = [Adjoint(A)] / det(A)

(2.23)

where Adjoint(A) = C^T, and C is the matrix of the co-factors of A. For a 2 × 2 matrix A = [a b; c d], the above equation reduces to the following simple formula:

A^(-1) = (1/(ad − bc)) [d −b; −c a]

It follows from Eq. 2.23 that the inverse of a matrix exists if and only if the determinant is nonzero (i.e., the matrix is non-singular). The inverse of a matrix can also be computed using the Gauss Jordan method. The following example illustrates how to compute the inverse of a 2 × 2 matrix using its determinant.

Example 8:

The following properties of matrix inversion (assuming the associated matrices are invertible) are also important to know.

1. AA^(-1) = I

2. (AB)^(-1) = B^(-1)A^(-1)

3. (A^T)^(-1) = (A^(-1))^T
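MATLAB's inv function computes the inverse of a non-singular matrix. The sketch below, with illustrative 2 × 2 matrices, compares inv against the 2 × 2 formula above and checks property 2.

A = [4 7; 2 6];                  % illustrative non-singular 2 x 2 matrix
Ainv = inv(A)                    % MATLAB's inverse
byFormula = [6 -7; -2 4]/det(A)  % 2 x 2 formula: [d -b; -c a]/(ad - bc)
A*Ainv                           % numerically the 2 x 2 identity matrix
B = [1 2; 3 5];
norm(inv(A*B) - inv(B)*inv(A))   % property 2: essentially zero (round-off only)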

2.3.5 Eigenvalues

The Eigenvalue problem is defined as follows:

For an n × n matrix, find all scalars λ such that the equation

Ax = λx

(2.24)

has a nonzero solution x. Such a scalar λ is called the eigenvalue of A, and any nonzero n × 1 vector, x, satisfying Eq. 2.24 is called the eigenvector corresponding to λ.

Given a matrix A, the eigenvalues of A, denoted by λ, can be computed by finding the roots of the equation |A − λI| = 0. This equation, |A − λI| = 0, is called the characteristic equation. The following example illustrates how to find the eigenvalues of a 2 × 2 matrix.

Example 9

A = [−5 2; 2 −2]

The characteristic equation of A can be written as |A − λI| = 0, where λ is the eigenvalue we want to compute:

|A − λI| = det([−5 − λ, 2; 2, −2 − λ]) = (−5 − λ)(−2 − λ) − (2)(2) = λ² + 7λ + 6 = 0

The above quadratic equation is the characteristic equation of A. Recall that for a quadratic equation given by ax² + bx + c = 0, the roots are given as x = (−b ± √(b² − 4ac)) / (2a). The eigenvalues of A are the roots of λ² + 7λ + 6 = 0, which are found to be λ1 = −1 and λ2 = −6.

2.3.6 Eigenvectors

Given a matrix A, consider the equation (A − λI)x = 0, where λ is an eigenvalue of A. The nonzero values of x that satisfy the above equation for each eigenvalue are called the Eigenvectors of A. We will learn how to compute eigenvectors with the help of an example. Consider the matrix A in Example 9.

Example 10: Note that it is important to compute eigenvalues before we compute eigenvectors. Previously, in Example 9, we obtained the eigenvalues of A as λ1 = −1 and λ2 = −6. For the first eigenvalue λ1 = −1, we find values of x such that (A − λ1I)x = 0, or (A − (−1)I)x = 0.

Expanding the above matrix equation, we obtain

−4x1 + 2x2 = 0
2x1 − x2 = 0

In order to find the eigenvector corresponding to λ1 = −1, we need to find the values of x1 and x2 that satisfy the above two equations. By inspection, we find that x1 = 1 and x2 = 2 satisfy the above equations. Thus, the vector [1, 2]^T is an eigenvector corresponding to the eigenvalue λ1 = −1.

After finding a particular x as an eigenvector for a given eigenvalue, note that a multiple of x, for example kx, will also be an eigenvector for that given eigenvalue. For instance, for λ1 = −1, [1/2, 1]^T is also a valid eigenvector. Using the procedure outlined above, we can compute the eigenvector for the eigenvalue λ2 = −6 to be [2, −1]^T.

Note that there could be several linearly independent eigenvectors possible for a single eigenvalue. The maximum number of linearly independent eigenvectors corresponding to an eigenvalue of λ is called the geometric multiplicity of λ.
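As a numerical check of Examples 9 and 10, the sketch below applies MATLAB's eig function to A. Note that eig scales each eigenvector to unit length, so the returned columns are simply multiples of [1, 2]^T and [2, −1]^T.

A = [-5 2; 2 -2];       % matrix with eigenvalues -1 and -6 (Examples 9 and 10)
[V, D] = eig(A);        % columns of V are eigenvectors, diag(D) holds eigenvalues
diag(D)                 % returns -6 and -1 (ordering may differ from the hand solution)
V                       % unit-length multiples of [2 -1]' and [1 2]'
A*V - V*D               % numerically zero: A*x = lambda*x holds column by column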

2.3.7 Positive Definiteness

A matrix is said to be positive definite if all its eigenvalues are positive. A matrix is said to be positive semidefinite if all its eigenvalues are non-negative (that is, zero is allowed).

A matrix is said to be negative definite if all its eigenvalues are negative. A matrix is said to be negative semidefinite if all its eigenvalues are non-positive (that is, zero is allowed).
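Following these definitions, a matrix can be classified in MATLAB by inspecting the signs of its eigenvalues; the sketch below does this for an illustrative symmetric matrix (this is also the idea behind Problem 2.23).

A = [2 -1 0; -1 2 -1; 0 -1 2];   % an illustrative symmetric matrix
lam = eig(A);                    % eigenvalues: 2 - sqrt(2), 2, 2 + sqrt(2)
if all(lam > 0)
    disp('positive definite')
elseif all(lam >= 0)
    disp('positive semidefinite')
elseif all(lam < 0)
    disp('negative definite')
elseif all(lam <= 0)
    disp('negative semidefinite')
else
    disp('indefinite')
end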

2.4 Basic Calculus: Types of Functions, Derivative, Integration and Taylor Series

You can find a mathematical definition of a function in numerous books. Instead of using mathematical language, we define a function on a lighter note. Think about a vending machine that takes in money and gives you soda or coffee. Functions behave similarly. They generally take in one or more numbers and output a number. For example, consider a simple trigonometric function f (x) = sin(x). When we input x = π/2, we obtain f (x) = 1. Thus, the sine function maps the input π/2 to the output, which is 1. Try typing sin(pi/2) in MATLAB. Most optimization books provide some useful mathematical background (Refs. [2, 3]). Note that our simplistic discussion here avoided many important issues, such as invertibility.

2.4.1 Types of Functions

In this section, we will introduce different types of functions commonly encountered in optimization.

Continuous Functions

A function f (x) is continuous over an interval [a,b] if the following conditions are satisfied for any point p (a ≤ p ≤ b).

1. f (x) is defined on the interval [a,b].

2. The limit of f (x), as x tends to p, should exist.

3. The limit of f (x) as x tends to p is equal to f (p). In other words, the limit should be equal to the value of the function evaluated at x = p.

The first plot in Fig. 2.2 illustrates a continuous function. For every value of x, there exists a unique value of f (x). In simple terms, we note that if a function can be traced without lifting the hand, it is continuous.

Figure 2.2. Type of Functions

Discontinuous Functions

Functions that do not satisfy the three conditions stated in the previous subsection are called discontinuous functions. As illustrated in the second plot in Fig. 2.2, the function jumps at x = a between two values, f (a)− and f (a)+, so the limit of f (x) as x tends to a does not exist and the conditions above are violated. A discontinuous function cannot be traced from one end to the other without lifting the hand. Try it on the discontinuous function shown in Fig. 2.2.

Discrete Function

Discrete functions are not defined for all values of x within the given interval; rather, they are defined at particular values of x. As seen in the third plot in Fig. 2.2, function f (x) is defined for discrete values of x. Some optimization algorithms approximate the discrete function using a continuous function.

Monotonically Increasing Function

A function f (x) is said to be monotonically increasing if it satisfies the following property. For two points x = a and x = b, such that b > a, if f (b) ≥ f (a), then the function is monotonically increasing. Monotonically increasing functions are shown by A and B in Fig. 2.3.

Figure 2.3. Monotonic Functions

A function is called Strictly increasing if f (b) > f (a), as represented by A in Fig. 2.3.

Monotonically Decreasing Function

A function f (x) is monotonically decreasing if it satisfies the following property. For two points x = a and x = b, such that b > a, if f (b) ≤ f (a), then the function is monotonically decreasing. Monotonically decreasing functions are shown by C and D in Fig. 2.3.

A function is called Strictly decreasing if, for any b>a, f (b)<f (a); as represented by C in Fig. 2.3.

Unimodal Functions

Unimodal functions have a single optimum in a given interval (other than end-points), which may be either a maximum or a minimum. A unimodal function that has a minimum value at x = p is represented by A in Fig 2.4. We note that f (x) is strictly decreasing for x < p and strictly increasing for x > p. In Fig. 2.4, B represents a special type of unimodal function. Unimodal functions need not be continuous. Function C in Fig. 2.4 is a discrete unimodal function.

Figure 2.4. Unimodal and Multimodal Functions

Multimodal Functions

Unlike the previous functions, multimodal functions have multiple maxima or minima. One such multimodal function is represented by D in Fig. 2.4, which has two minima.

Convex Functions

A convex function is one for which the line joining any two points on the function lies entirely on or above the function defined between these two points. The left plot in Fig. 2.5 displays a convex function. As shown in Fig. 2.5, a and b are any two points in the domain of a function f (x). The line joining their function values is entirely on or above the function defined between a and b.

Figure 2.5. Convex and Concave Functions

If f (x) is twice differentiable (that is, you can find its second derivative) in [a,b], then a necessary and sufficient condition for it to be convex on that interval is that the second derivative satisfies f′′(x) ≥ 0 for all x in [a,b].

Concave Functions

A function f (x) is said to be concave on an interval [a,b] if the function −f (x) is convex in that interval. Thus, the definition of concave functions is exactly opposite that of convex functions. The right plot in Fig. 2.5 illustrates a concave function. As shown in Fig. 2.5, the line joining the function values of a and b lies entirely on or below the function defined between a and b.

2.4.2 Limits of Functions

Consider a function f (x), and let the function f (x) have a limit L when x tends to a value x0. In simple language, the function f (x) reaches a value L as x approaches the value x0. Mathematically, it is denoted by

lim(x→x0) f (x) = L

(2.25)

Example 2.1: Let the function be f (x) = ((a + x)² − a²)/x. The limit of this function as x tends to 0 can be determined by first expanding the numerator:

f (x) = (a² + 2ax + x² − a²)/x = 2a + x

To find the limit of f (x) as x tends to 0, simply substitute x = 0 in the above expression, and obtain

lim(x→0) f (x) = 2a

2.4.3 Derivative

The derivative of a function f (x) at a certain point x is a measure of the rate at which that function is changing as the variable x changes. That is, a derivative represents the rate of change of a function with respect to the input variable, or the derivative is the computation of the instantaneous slope of f (x) at point x. Mathematically, the derivative is represented as

f′(x) = lim(h→0) [f (x + h) − f (x)] / h

(2.26)

where h is an infinitesimally small deviation in x.
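Equation 2.26 also suggests a simple numerical approximation: use a small but finite h. The MATLAB sketch below compares such a forward-difference estimate with the exact derivative of sin(x); the step size and evaluation point are illustrative choices.

f  = @(x) sin(x);                    % function to differentiate
x0 = pi/3;                           % illustrative point of interest
h  = 1e-6;                           % small, illustrative step size
dfApprox = (f(x0 + h) - f(x0))/h;    % Eq. 2.26 with a finite h
dfExact  = cos(x0);                  % exact derivative of sin(x)
fprintf('approx = %.8f, exact = %.8f\n', dfApprox, dfExact)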

2.4.4 Partial Derivative

For a function of several variables, it is often useful to examine the variation of the function with respect to one of the variables while all the other variables remain fixed. This is the purpose of a partial derivative: the derivative of a function of multiple variables taken while all but the variable of interest are held fixed. Consider a function f (x,y) of two variables, x and y. The partial derivative of this function with respect to x is defined as the derivative of the function when y is held constant.

∂f/∂x = lim(h→0) [f (x + h, y) − f (x, y)] / h

(2.27)

Similarly, the partial derivative of the function f (x,y) with respect to y is defined as the derivative of the function when x is held constant.

∂f/∂y = lim(h→0) [f (x, y + h) − f (x, y)] / h

(2.28)

2.4.5 Indefinite Integration

The previous subsection presented the derivative of a function. Let us assume that the derivative of a function f (x) is F(x). It is possible to recover f (x), within a constant, by integrating F(x), as follows

f (x) = ∫ F(x)dx + C

(2.29)

where C is the integration constant. For example, if F(x) = x² + 2, then

f (x) = ∫ (x² + 2)dx + C = x³/3 + 2x + C

(2.30)

2.4.6 Definite Integration

Definite integration is the integration of a function, F(x), over a particular range of the variable x. It is denoted by

∫ₐᵇ F(x) dx

(2.31)

where a and b are the limits of integration. Let f (x) be the indefinite integral of F(x). The definite integral is determined as follows

∫ₐᵇ F(x) dx = f (b) − f (a)

(2.32)

For example, if F(x) = x² + 2, then from Eq. 2.30 we have f (x) = x³/3 + 2x + C. Now let us evaluate the definite integral over the interval from x = 1 to x = 2.

∫₁² (x² + 2) dx = [x³/3 + 2x]₁²

(2.33)

= (2³/3 + 2(2)) − (1³/3 + 2(1))

(2.34)

= 20/3 − 7/3 = 13/3

(2.35)
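The same definite integral can be checked numerically in MATLAB with the integral function; the result should match 13/3 from Eq. 2.35.

F = @(x) x.^2 + 2;                   % integrand, vectorized with .^
I = integral(F, 1, 2);               % numerical quadrature over [1, 2]
fprintf('integral = %.6f, exact 13/3 = %.6f\n', I, 13/3)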

2.4.7 Taylor Series

A Taylor series [4] provides an approach to approximate a function. Although it generally requires an infinite number of terms to obtain the exact value of the function, in practice only a small number of terms is required for an adequate approximation. The approximation takes the form of a polynomial. The approximation is accurate around a chosen point, and becomes progressively inaccurate as we move away from that point. For a single variable function, f (x), the Taylor series expansion about a point x = x0 is given by

f (x) = f (x0) + f′(x0)(x − x0) + (f′′(x0)/2!)(x − x0)² + (f′′′(x0)/3!)(x − x0)³ + ...

(2.36)

where ′ represents the derivative. To increase the accuracy of the function approximation, we increase the number of terms included in the Taylor series. At some point, including too many terms will become impractical. In many cases, the Taylor series progresses as far as the 2nd derivative term. This version of the approximation is referred to as a 2nd order Taylor series expansion. It is given as

f (x) ≈ f (x0) + f′(x0)(x − x0) + (1/2)f′′(x0)(x − x0)²

(2.37)

When higher order terms (HOT) are excluded, it is called a “Taylor series approximation.” Let’s evaluate the 2nd order Taylor series for f (x) = sin(x) at x = π/2. We have f′(x) = cos(x) and f′′(x) = − sin(x). Therefore, the 2nd order Taylor series is given by

f (x) ≈ sin(π/2) + cos(π/2)(x − π/2) − (1/2) sin(π/2)(x − π/2)²

(2.38)

Since cos(π/2) = 0 and sin(π/2) = 1, we find

f (x) ≈ 1 − (1/2)(x − π/2)²

(2.39)
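The MATLAB sketch below evaluates the 2nd order approximation of Eq. 2.39 at a few illustrative points near the expansion point and compares it with sin(x).

x0 = pi/2;                              % expansion point
taylor2 = @(x) 1 - 0.5*(x - x0).^2;     % 2nd order approximation, Eq. 2.39
x = x0 + [-0.5 -0.1 0 0.1 0.5];         % illustrative points near x0
[sin(x); taylor2(x)]                    % rows agree near x0, drift apart farther away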

2.5 Optimization Basics: Single-Variable Optimality Conditions, Gradient, Hessian

A general optimization problem refers to finding the maximum or the minimum value of a function. Some examples of optimization problems include maximizing the mileage of a car or minimizing the failure rate of a product.

Consider a continuous single variable function f (x). Optimization theory gives us the tools to find a “good” value of x that corresponds to a “good” value of f (x). In many problems, the choice of the values of x is constrained in the sense that a candidate value of x must satisfy some conditions. Such optimization problems are called Constrained Optimization problems, which will be studied later. Unconstrained Optimization problems, on the other hand, involve no constraints on the values of x. In this section, we present some basic concepts of unconstrained optimization. We discuss the basic conditions of optimality for single variable functions. We also present the two basic entities that play a central role in multi-variable optimization: gradients and Hessians.

Definitions

Consider the function f (x) in a set S. We define global and local minima as follows:

1. The function f (x) is said to be at a point of global minimum, x* ∈ S, if f (x*) ≤ f (x) for all x ∈ S.

2. The function f (x) is said to be at a point of local minimum, x* ∈ S, if f (x*) ≤ f (x) for all x within an infinitesimally small distance from x*. That is, there exists ε > 0 such that for all x satisfying |x − x*| < ε, f (x*) ≤ f (x).

The concept of global and local minima is illustrated in Fig. 2.6. The word “optimum” refers to a maximum or a minimum.

Figure 2.6. Local and Global Minima

2.5.1 Necessary Conditions for Local Optimum

Assuming that the first and second derivatives of the function f (x) exist, the necessary conditions for x* to be a local minimum of the function f (x) on an interval (a,b) are:

1. df/dx = 0 at x = x*

2. d²f/dx² ≥ 0 at x = x*

The necessary conditions for x* to be a local maximum of the function f (x) on an interval (a,b) are:

1. df/dx = 0 at x = x*

2. d²f/dx² ≤ 0 at x = x*

It is important to understand that the above stated conditions are necessary, but not sufficient. This means that if the above conditions are not satisfied, x* will not be a local minimum or maximum. On the other hand, if the above conditions are satisfied, it does not guarantee that x* is a local minimum or maximum.

2.5.2 Stationary Points and Inflection Points

A stationary point is a point x* that satisfies the following equation:

df/dx = 0 at x = x*

An inflection point may or may not be a stationary point. An inflection point is one where the curvature of the curve (second derivative) changes sign from positive to negative, or vice versa. That point is not necessarily a minimum or a maximum.

2.5.3 Sufficient Conditions for Local Optima

Consider a point x* at which the first derivative of f (x) is equal to zero, and let the order of the first nonzero higher derivative be n. The following are the sufficient conditions for x* to be a local optimum.

1. If n is odd, then x* is an inflection point.

2. If n is even, then x* is a local optimum. In addition,

(a) If the value of that derivative of f (x) at x* is positive, then the point x* is a local minimum.

(b) If the value of that derivative of f (x) at x* is negative, then the point x* is a local maximum.

The procedure to find the maximum of a function is illustrated using the following example.

Example 10: In this example, we find the maximum value of a function given by f (x) = −x³ + 3x² + 9x + 10 in the interval −2 ≤ x ≤ 4.

First, the stationary points are determined by solving df/dx = −3x² + 6x + 9 = 0.

Dividing through by −3 gives x² − 2x − 3 = 0. Using the formula to compute the roots of a quadratic equation, the roots of the first derivative are found from x = (2 ± √(4 + 12)) / 2 = (2 ± 4)/2. This process yields two solutions: x = 3 and x = −1 as the two stationary points, which are in the interval −2 ≤ x ≤ 4. The function values at these stationary points are evaluated to determine which of these points may correspond to a global maximum.

Evaluating f (x) at x = 3,−1,−2 and 4 yields the function values as 37, 5, 12, and 30, respectively. Therefore, x = 3 corresponds to the maximum of the function in the interval −2 ≤ x ≤ 4.
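This result can be verified numerically in MATLAB. The function fminbnd minimizes a single-variable function on an interval, so the maximum of f is obtained by minimizing −f, as sketched below.

f = @(x) -x.^3 + 3*x.^2 + 9*x + 10;            % function from Example 10
[xStar, negF] = fminbnd(@(x) -f(x), -2, 4);    % maximize f by minimizing -f
fprintf('x* = %.4f, f(x*) = %.4f\n', xStar, -negF)   % approximately 3 and 37
f([-2 -1 3 4])                                 % endpoint/stationary values: 12 5 37 30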

2.5.4 Gradient and Hessian of a Function

In the previous subsection, we have only considered single variable functions; that is, f (x), where x is a single variable. In practice, however, x could represent the two dimensions of a rectangular backyard. In this case, we could let x be a vector with two entries, a two-dimensional vector: x = {x1, x2}, where x1 = Length and x2 = Width. Therefore, in general, we can consider multi-variable functions f (x), where x is an n-dimensional vector. In these cases, the first and the second derivatives of the function are more complicated. They are respectively referred to as the gradient and the Hessian of the function. The gradient is an n × 1 vector and the Hessian is an n × n matrix.

Gradient of a Function:

Given a function f (x), its gradient is given by

∇f (x) = [∂f/∂x1, ∂f/∂x2, ..., ∂f/∂xn]^T

(2.40)

For example, for a given function f (x1, x2) of two variables, the gradient is given by

(2.41)

Recall that when the partial derivative of f is taken with respect to x1 (that is, ∂f/∂x1), we treat x1 as the variable and x2 as a constant.

Hessian of a Function:

The Hessian of a function f (x) is given by the following matrix of second partial derivatives

H(x) = [∂²f/∂x1², ∂²f/∂x1∂x2, ..., ∂²f/∂x1∂xn;  ∂²f/∂x2∂x1, ∂²f/∂x2², ..., ∂²f/∂x2∂xn;  ... ;  ∂²f/∂xn∂x1, ∂²f/∂xn∂x2, ..., ∂²f/∂xn²]

(2.42)

and when the above second derivatives exist and are continuous, the Hessian is symmetric. That is, the terms below the diagonal are the same as those above the diagonal. Therefore, we can either forgo evaluating the lower or upper triangular terms, or we can evaluate all the terms and verify that the resulting matrix is indeed symmetric.

For the above function, the Hessian is given by

(2.43)

leading to the symmetric matrix

(2.44)
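As a numerical illustration, the sketch below forms the gradient and Hessian by central finite differences for a hypothetical two-variable function (the function, point, and step size are assumptions made purely for illustration, not the example of Eqs. 2.41-2.44).

f  = @(x) x(1)^2*x(2) + 3*x(2)^2;   % hypothetical function, for illustration only
x0 = [1; 2];                        % point at which to evaluate
h  = 1e-4;                          % finite-difference step
n  = numel(x0);  E = eye(n);
g  = zeros(n,1);  H = zeros(n,n);
for i = 1:n
    g(i) = (f(x0 + h*E(:,i)) - f(x0 - h*E(:,i)))/(2*h);   % i-th gradient component
    for j = 1:n                                           % (i,j) Hessian entry
        H(i,j) = (f(x0 + h*E(:,i) + h*E(:,j)) - f(x0 + h*E(:,i) - h*E(:,j)) ...
                - f(x0 - h*E(:,i) + h*E(:,j)) + f(x0 - h*E(:,i) - h*E(:,j)))/(4*h^2);
    end
end
g    % exact gradient at (1, 2) is [2*x1*x2; x1^2 + 6*x2] = [4; 13]
H    % exact Hessian at (1, 2) is [2*x2, 2*x1; 2*x1, 6] = [4 2; 2 6]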

2.6 Summary

Quantitative optimization is founded on an understanding of important mathematical concepts, including calculus, geometry, and matrix algebra. This chapter hence provided a summary of the important mathematical concepts that are needed for learning and practicing optimization. Specifically, it provided a brief introduction to vectors, Euclidean geometry (e.g., the equation of a plane), matrix properties and operations, and differential and integral calculus (e.g., function continuity, partial derivatives, and the Taylor series). The chapter ended with an introduction to single-variable optimality conditions, and to the gradient and the Hessian of a function and how they are used to determine (or search for) optimum points. These topics will be developed further in later chapters.

2.7 Problems

Warm-up Problems

2.1A vector a starts from the point P1 : (0,−1, 1) and ends at the point P2 : (3, 4,−1). Find the length of a.

2.2You are given three vectors: a = 2i + 3j + 4k, b = −i + 2j − 3k, and c = 2i + 3j − 4k. Compute (1) a + b, (2) b + a, (3) 3a + 3b, (4) 3(a + b), (5) (a + b) + c, and (6) a + (b + c). Next, (7) What do you observe in the results that you obtain for Parts (1) and (2), Parts (3) and (4), and Parts (5) and (6)? (8) What are these properties called?

2.3 Given two vectors a = 2i + 3j + 4k and b = −i + 2j − 3k, compute (a) a · b and (b) b · a. Next, (c) What property of the dot product of vectors is observed in this example? (d) What is the angle between a and b?

2.4 Given three vectors: a = −i + 2j − 5k, b = 3i − j + 4k, and c = 2i + 3j − 4k. Compute (a) a · (b + c), (b) a · (b − c), (c) a · b + a · c, and (d) a · b − a · c. What properties of the dot product of vectors are observed in the results of Parts (a) through (d)?

2.5Let a force acting on a particle be given by the vector f = −2i + j. As a result of this force, the particle moves from the point A : (2, 4,−5) to B : (2, 4,−7). (a) Compute the work done on the particle. (b) How can you explain the results you obtain? Interpret the results obtained in Part (a) from a Statics point of view.

2.6You are given three vectors a = 2i + j, b = 4k, and c = −i + 2j. Find the angles between (1) a and b, (2) b and c, and (3) a and c. Next, (4) what can you say about the vectors a, b, and c based on these results?

2.7Find the equation of the line passing through the points (−2, 2) and (3, 4). What is the slope of the line joining these two points?

2.8Find the equation of a plane passing through three points P1 : (2, 0, 0), P2 : (0, 2, 0) and P3 : (0, 0, 2) using the three point formula, and verify the equation you obtain using the intercept formula for the equation of the plane. Show your work for both methods.

2.9 Let A=[1 1;2 1], B=[0 1;3 2]. Do the following problems by hand and verify using MATLAB. (a) (AB)^T and (b) B^T A^T. Turn in your hand written results, and a print out of the results at the MATLAB Command Window. What do you notice from your results?

2.10 Let A=[1 0 2;0 3 4;2 1 3] and B=[0 2 4;2 3 4;5 1 3]. Do the following problems by hand and verify using MATLAB. Turn in your hand written results and a print out of the results at the MATLAB Command Window. If you think any of the operations below cannot be performed, explain why. (a) A^T, (b) (A^T)^T, (c) (A + B)^T, and (d) A^T + B^T (e) What matrix properties do you observe in these results?

2.11 Given A=[0 2 -1;-2 0 -4;1 4 0]. Find A^T by hand and using MATLAB. Turn in your hand written results and a print out of the results at the MATLAB Command Window. What special property do you observe about A? What are these kinds of matrices called?

2.12With the help of an example of your choice, prove that the det(A) where A is upper or lower triangular is equal to the product of elements in the principal diagonal. Take one example each of the upper and lower triangular kinds. Turn in your hand written results.

2.13Find the determinant of A=[1 0 2;0 3 4;2 1 3] by hand, and verify your results using MATLAB.

2.14Find the inverse of a 3 × 3 Identity matrix by hand. What do you observe?

2.15 Compute the eigenvalues and eigenvectors of A=[1 2;3 2] by hand. Find the command in MATLAB that can be used to compute eigenvalues and eigenvectors, and verify your results using MATLAB. Turn in printouts of the Command Window.

2.16Find the eigenvalues of A=[3 2 1;2 2 1;1 1 1] by hand and using MATLAB. What can you say about the definiteness of the matrix? Give reasons. (You can use MATLAB to solve for the cubic equation.) Use MATLAB to compute the eigenvectors of A. Turn in your hand written calculations and the printouts of the Command Window.

2.17Determine the following limits.

(a)

(b)

(c)

2.18Determine if the function is convex or concave.

(a) f (x) = e^x

(b) f (x) = x log(x)

(c) f (x) = 1/x²

2.19Determine the derivatives of the following functions. (This is an opportunity to test or review your calculus).

(a) f (x) = (x² + x + 2)(x² + 2)

(b) f (x) = x² + x + x⁻¹ + x⁻²

(c) f (x) = x log(x)

(d) f (x) = sin(x)

(e) f (x) = cos(x)

(f) f (x) = tan(x)

2.20Determine the partial derivatives with respect to x of the following functions. (This is an opportunity to test or review your calculus).

(a) f (x, y) = (y² + x + 2)(x² + 2)

(b) f (x, y) = x² + x + y⁻¹ + y⁻²

(c) f (x, y) = x log(y)

(d) f (x, y) = cos(y)

(e) f (x, y) =

(f) f (x, y) = e^x tan(x y)

2.21Solve the following problems.

(a)Determine the 1st and 2nd order Taylor series for f (x) = sin(x) about x = 0. Using MATLAB, plot the following functions for x ranging from −1 to 1: (i). f (x) = sin(x), (ii) the 1st order Taylor series approximation, and (iii) the 2nd order Taylor series approximation. Comment on the accuracies of these two Taylor series approximations.

(b)Find

(c)Find

(d)Find

2.22X is a design variable vector. Find the Hessian of the following functions.

(a)

(b) f (X) = (sin x1 + cos x2)^N, where N is an integer

(c)f (X) = x1 ln x2 + x2 ln x1, at (x1,x2) = (1, 1)

2.23Write a MATLAB program to generate a 5 × 5 matrix of random real numbers between A and B, where [A,B] = [−5, 500]. Show whether the matrix is positive definite or positive semi-definite or negative definite or negative semi-definite or indefinite. Print the matrix and the MATLAB M-file.

2.24Consider the following function and do the following (by hand):

(a)What are the gradient and Hessian of f (x)?

(b)What are the stationary point(s) of f (x)?

2.25Consider the following function and do the following (by hand):

(a)What are the gradient and Hessian of f (x)?

(b)What are the stationary point(s) of f (x)?

Intermediate Problems

2.26Let f (x) = sin(x) be a function that you are interested in optimizing. Answer the following questions.

(a)What are the necessary conditions for a solution to be an optimum of f (x)?

(b)Using the necessary conditions obtained in (a), and considering the interval 0 ≤ x ≤ 2π, obtain the stationary point(s).

(c)Confirm whether the above point(s) are inflection points, maxima, or minima. If they are maximum (or minimum) points, are they global maximum (or minimum) in the given interval?

(d)Plot the function sin(x) over the interval 0 ≤ x ≤ 2π. Show all the stationary points on it, and label them appropriately (maximum, minimum, or inflection).

2.27 Consider the single variable function f (x) = e^(−ax²), where a is a constant. This function is often used as a “radial basis function” for function approximation.

(a)Is the point x = 0 a stationary point for (i) a > 0, and (ii) a < 0. What happens if a = 0? Is x = 0 still a stationary point?

(b)If x = 0 is a stationary point, classify it as a minimum, maximum, or an inflection point for (i) a > 0, (ii) a < 0, and (iii) a = 0.

(c) Prepare a plot of f (x) for a = 1, a = 2, and a = 3. Plot all three curves on the same figure. By observing the plot, do you think f (x) = e^(−ax²), a > 0 has a global minimum? If so, what is the value of x and f (x) at the minimum?

BIBLIOGRAPHY OF CHAPTER 2

[1]J. Snyman. Practical Mathematical Optimization: An Introduction to Basic Optimization Theory and Classical and New Gradient-Based Algorithms. Springer, 2005.

[2]E. K. Chong and S. H. Zak. An Introduction to Optimization. John Wiley and Sons, 2014.

[3]D. Z. Du, P. Pardalos, and W. Wu. Mathematical Theory of Optimization. Springer, 2010.

[4]A. Ravindran, G. V. Reklaitis, and K. M. Ragsdell. Engineering Optimization: Methods and Applications. John Wiley and Sons, 2006.