Optimization in Practice with MATLAB for Engineering Students and Professionals (2015)
MORE ADVANCED TOPICS IN OPTIMIZATION
Modeling Complex Systems: Surrogate Modeling and Design Space Reduction
In this chapter, we introduce efficient mathematical approaches to model the behavior of complex systems in the context of optimizing those systems. Specifically, complex systems often involve a large number of design parameters (to be tuned), and the optimization of these systems often demands large scale computational simulations or experiments to quantify the system behavior. The resulting high dimensionality of the design space, the prohibitive computation time (or expense), or the lack of mathematical models present important challenges to the quantitative optimization of these complex systems. This chapter introduces traditional and contemporary approaches, such as design variable linking and surrogate modeling, to address the modeling challenges encountered in solving complex optimization problems.
Section 15.2 discusses the generic challenges in complex optimization problems. The impact of problem dimension is addressed in Sec. 15.3, where design variable linking and design of experiments are introduced. Section 15.4 presents surrogate modeling, where the discussion includes: the process, the polynomial response surface methodology, the radial basis function method, the Kriging method, and the artificial neural network method. The chapter concludes with a summary provided in Sec. 15.5.
15.2 Modeling Challenges in Complex Optimization Problems
Modeling is one of the primary activities in optimization. Leveraging computational tools and models, as opposed to purely depending on experiments, can be considered the way forward in the area of system optimization. The design of complex systems, such as aircraft, cars, and smart-grid networks, is increasingly performed using simulation-based design and analysis tools such as Finite Element Analysis (FEA) and Computational Fluid Dynamics (CFD). The incorporation of these tools has dramatically transformed modern engineering design and optimization approaches.
These changes, however, are not free of challenges. For example, modern aerospace systems present significantly complex design requirements, where design objectives generally involve improving performance, reducing costs, and maximizing safety. Another example is the optimization of a vehicle structure to absorb crash energy, maintain adequate passenger space, and control the crash deceleration pulse during collisions. The collision process is an extremely complex multibody dynamic process.
The relationship between the objectives and the design variables in complex systems is generally not governed by functions of simple analytical form. The use of complex computational or simulation-based models to design these systems may demand unreasonably high computational costs. You will find that the techniques presented in this chapter are paramount to overcoming the computational barrier and efficiently exploring optimal designs for complex systems.
It is also important to remember that, as discussed in Chapter 4, inappropriate modeling can result in slow convergence, everlasting computation, or unrealistic solutions. In the context of modeling, practical optimization problems are faced with three major challenges:
1. Design-space dimensionality: Complex problems may involve a very large number of variables. For example, nearly 300 variables are involved in designing a vehicle powertrain system, and more than 1,000 design variables are involved in designing large scale wind farms or systems with multiple physical domains. Models that are formulated to operate only on the entire (high-dimensional) variable space in these cases may pose challenging-to-prohibitive numerical burdens on the optimization process.
2. Computational expense: As we learned in the previous chapters, the behavior of complex physical systems is often modeled using computational simulations (e.g., FEA or CFD), which are basically the computer-based implementation of complex (often multi-step) mathematical models. However, these simulations can take hours (even days) to run. Thus, their usage could become computationally prohibitive in the context of optimization, when simulation models have to be executed thousands (or tens of thousands) of times. For example, if a large eddy simulation (which is a CFD model) is used to model the flow inside a wind farm, it may demand approximately 600 million CPU-hours to optimize the configuration of a 25-turbine wind farm.
3. Lack of mathematical models: Alternatively, for certain systems or problems, there may not exist any mathematical models that define the underlying physics/behavior of the system (or the dynamics of the problem). This could be due to a lack of a clear understanding of the underlying physics or of the relationship between the design parameters and the criteria functions. For example, without a physics-based model to represent the constitutive relationship of the Ti-25V-15Cr-0.2Si alloy during the hot deformation process, empirical models (based on experimental data) are used.
One class of approaches to address the first challenge (i.e., design-space dimensionality) is design space reduction or design variable linking. Design variable linking is an approach that seeks to reduce the design dimensionality of a problem by developing relationships among the variables, thereby yielding models that quantify the system behavior in terms of a smaller (likely more tractable) set of variables than initially defined. These approaches are discussed in the following section. In addition, the following section introduces another aspect of model development, sampling (or design of experiments), which is also directly related to the dimensionality of the design space.
The other two challenges are generally addressed by a class of mathematical functions that have different names, based on the research community with which you are communicating (e.g., mathematicians, material scientists, or aerospace engineers). The process of developing/using this class of mathematical functions is known by such names as metamodeling, surrogate modeling, response surface methodology, and function fitting. Essentially, they represent an approach to mathematical modeling that does not exploit the understanding of the underlying physics of the system behavior being modeled. An introductory discussion of this class of models, known as surrogate models, is provided in Sec. 15.4.
15.3 Impact of Problem Dimension
In this section, you will be introduced to approaches that can reduce the dimension of design optimization problems, which can be considered an important aspect of modeling and problem formulation. Subsequently, we will explore other implications of the design space dimensionality. Specifically, we will learn how to design or plan a set of computational/physical experiments to analyze and model the behavior of the system (that is being optimized), and how this design of experiments is closely related to problem dimensionality.
15.3.1 Design Variable Linking
In certain optimization problems, when the number of design variables is large, it is possible to reduce the dimensionality through design variable linking. Equality constraints are used to define this linking. The value of one design variable can be expressed in terms of one or more design variables. The behavior of symmetric components can be identical under certain circumstances, which may offer opportunities to reduce the number of independent design variables. This reduction of the number of design variables is generally intended to reduce the computational cost of optimization.
Assume that an optimization problem is defined by the n-dimensional design variable vector x, where n is deemed large and computationally undesirable. It may be possible to redefine the optimization problem in terms of a new m-dimensional design variable vector x̄, where m < n, through a linear transformation defined by the constant matrix T. This transformation can be expressed as

x_{n×1} = T_{n×m} x̄_{m×1}

For example, the reduction from 3 to 2 design variables can take the form

x_{3×1} = T_{3×2} x̄_{2×1}
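This linear transformation can be sketched numerically as follows (shown in Python for illustration; the matrix T below is a hypothetical example in which symmetry forces x3 = x1, so three variables reduce to two):

```python
import numpy as np

# Hypothetical linking: x1 = xbar1, x2 = xbar2, x3 = xbar1 (symmetry)
T = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 0.0]])   # T is n x m = 3 x 2

xbar = np.array([0.4, 0.7])  # reduced (m-dimensional) design vector
x = T @ xbar                 # recovered full (n-dimensional) design vector
print(x)                     # [0.4 0.7 0.4]
```

The optimizer then operates only on the two entries of x̄, while the full vector x is recovered whenever the system model must be evaluated.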
Let's look at an example where design variable linking is used.
Example: Buckling is a common failure mode when a structure is subjected to high compressive stresses. A vertical aluminum column is pinned at both ends, as shown in Fig. 15.1. The height of the column is l = 0.5 m. Its cross section is depicted in Fig. 15.2. The value of h is a quarter of b. The column is subjected to a force, P = 35.5 N, on its top. The Young's modulus of aluminum is 69 GPa at room temperature (21°C). Only buckling failure is considered in this optimization problem, where the objective is to minimize the volume of the column.
Figure 15.1. A Column Subject to Buckling
Figure 15.2. Rectangular Cross Section
The area moment of inertia of a rectangular cross section is given by

I_0 = b h³ / 12
The critical force that the column can sustain is given by Euler's buckling formula

P = π² E I_0 / (K l)²

where P is the critical force, E is the modulus of elasticity, I_0 is the area moment of inertia, l is the unsupported length of the column, and K is the column effective length factor. For a column where both ends are pinned, the factor K is equal to 1.
The optimization problem is formulated as

min_{b, h}  V = b h l
subject to
π² E I_0 / (K l)² ≥ P
h = b/4      (15.5)

where I_0 = b h³ / 12.
The above formulation of the optimization problem involves two design variables. Using design variable linking, we can reduce the number of design variables to one. Constraint (15.5) is used to eliminate h from the formulation. Therefore, the optimization problem is reduced to

min_b  V = (b²/4) l
subject to
π² E b⁴ / (768 (K l)²) ≥ P
The optimal solution to the above optimization problem is b = 0.01 m and h = 0.0025 m. The volume is 1.25×10⁻⁵ m³. Thus, in this problem, design variable linking is used to reduce the number of design variables from two to one, thereby simplifying the optimization problem and likely saving computational costs.
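The reported optimum can be checked with a few lines of arithmetic (a Python sketch of the example's numbers):

```python
import math

# Check of the column-buckling example.
E = 69e9      # Young's modulus of aluminum, Pa
l = 0.5       # column length, m
K = 1.0       # effective length factor (pinned-pinned)
P = 35.5      # applied compressive load, N

b, h = 0.01, 0.0025          # reported optimum, with h = b/4
I0 = b * h**3 / 12.0         # area moment of inertia of the rectangle
P_cr = math.pi**2 * E * I0 / (K * l)**2   # Euler critical load
V = b * h * l                # column volume

print(P_cr)  # ~35.5 N: the buckling constraint is active at the optimum
print(V)     # 1.25e-05 m^3, matching the reported volume
```

The buckling constraint being active (P_cr ≈ P) is expected: any further reduction of b would violate it.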
15.3.2 Design of Experiments
Design of experiments (DoE) techniques were originally developed to study (and empirically model) the behavior of systems through physical experiments (e.g., in experimental chemistry). DoE can be defined as the design of a controlled information-gathering exercise for a system or phenomenon where variation is known to be present. DoE techniques have existed in some form since the late 1700s, and DoE is considered a discipline with very broad applications across the different natural and social sciences, and engineering.
The primary objective of DoE is to determine multiple combinations of the controlled parameters (or conditions) at which the experiments will be conducted. In mathematical terms, each combination of controlled parameters can be considered a sample. As such, DoE can also be perceived as a process of generating parameter/condition samples to conduct controlled experiments. Although controlled experiments traditionally referred only to physical experiments, in modern times, they include both physical and computational experiments (i.e., simulation-based experiments).
Once experiments have been performed for each planned sample condition, the data acquired can be used to (i) investigate a theoretical hypothesis, (ii) analyze the system behavior (e.g., sensitivity analysis), and/or (iii) develop empirical or surrogate models to represent the relationship between different system parameters. In this chapter, we will primarily focus on the third application of DoE, where the objective is to understand and model the relationship between “the input parameters (that comprised the planned sample set)” and “the output parameters of interest (whose values were acquired through the experiment).”
DoE techniques have a large influence on the accuracy of the surrogate model developed from the resulting data. To develop effective surrogate models, it is necessary to acquire adequate information about the underlying system. Assuming no prior knowledge of the system behavior, the typical DoE strategy is to generate a distribution of sample points throughout the design space in a uniform fashion. There are several techniques available to distribute the sample points to provide adequate coverage of the design space without any particular variable bias. These techniques include Factorial design, Central Composite design, Latin Hypercube design, and the Sobol sequence.
The most straightforward approach to uniform sampling is the factorial design method. In this technique, the range of each design variable is divided into different levels between the upper and lower limits of the design space. In a full factorial design, sample points are located at all the combinations of the different levels of all the design variables. Figure 15.3 illustrates a three-level factorial design in a three-variable space. In high-dimensional problems, the full factorial design approach may be cost/time prohibitive. For example, in a 10-dimensional problem, even a 2-level full factorial design would require as many as 2^10 = 1,024 sample points. If we assume that each experiment takes only 1 hr, approximately 42 days would be required to run through the entire sample set.
Figure 15.3. A 3-Level and 3-Dimensional Full Factorial Design (27 Points)
In these situations of excessive resource requirements, only a fraction of the sample points can be used for conducting experiments. Designs that use a fraction of the full factorial design sample points are called fractional factorial designs. Central composite design (CCD) combines full or fractional factorial designs with additional points to allow the fitting of full quadratic polynomial models. The MATLAB functions available to perform full factorial design and fractional factorial design are fullfact and fracfact, respectively. A simple two-level factorial design can be performed using the function ff2n as follows.
% To generate a 2-level factorial design with 3 factors
dFF2 = ff2n(3)
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1
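An equivalent design can be generated in other environments as well; the following Python sketch enumerates the same 2^3 = 8 combinations that ff2n(3) returns:

```python
from itertools import product

# A two-level full factorial design with 3 factors, equivalent in spirit
# to MATLAB's ff2n(3): every combination of the two levels per factor.
levels = [0, 1]
design = list(product(levels, repeat=3))
for row in design:
    print(row)
# 8 rows, from (0, 0, 0) through (1, 1, 1)
```

The combinatorial growth is immediate here: with n factors the loop emits 2^n rows, which is exactly why full factorial designs become prohibitive in high dimensions.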
In a Latin Hypercube design (LHD) or Latin Hypercube Sampling (LHS), the range of each design variable is divided into n non-overlapping intervals with equal probability. A sample point is then located randomly on each interval of every design variable. Consider a simple example where we wish to generate a LHD of size n = 10 for two design variables. The ten intervals of the variables x1 and x2 in that case are presented in Fig. 15.4(a). The next step is to randomly select specific values for x1 and x2 in each of their ten intervals, as shown in Fig. 15.4(b).
Figure 15.4. Latin Hypercube Sampling (LHS) with 10 Points
MATLAB provides a set of built-in functions to perform different types of Latin Hypercube Sampling (LHS). The most straightforward implementation of LHS can be performed using lhsdesign. Direct implementation of this function generates a sample set in the range [0, 1] for each variable. A two-variable implementation is shown below.
% To generate and plot a LHS set for two variables
x = lhsdesign(50,2);
plot(x(:,1),x(:,2),'o')
The plot of the sample set generated by using the MATLAB LHS function is provided in Fig. 15.5. It is noted from Fig. 15.5 that the LHS provides uniform yet non-deterministic coverage of the multi-dimensional design space. Note that using such methods as LHS for sample generation is also helpful when creating the initial population of candidate designs in evolutionary algorithms, as we will see in a later chapter in this book.
Figure 15.5. A LHS for a 2-Variable Problem, Generated Using MATLAB
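The interval-wise construction described above can be sketched as follows (a minimal Python illustration; MATLAB's lhsdesign provides the same functionality with additional refinement options):

```python
import numpy as np

def lhs(n_samples, n_vars, rng=np.random.default_rng(0)):
    """Minimal Latin Hypercube Sampling: place one random sample in each
    of n equal-probability intervals per variable, then shuffle the
    interval order independently for each variable."""
    x = np.empty((n_samples, n_vars))
    for j in range(n_vars):
        # interval k spans [k/n, (k+1)/n); draw one point inside each
        points = (rng.random(n_samples) + np.arange(n_samples)) / n_samples
        x[:, j] = rng.permutation(points)
    return x

x = lhs(10, 2)
# Every variable has exactly one sample in each of its ten intervals:
print(np.sort(np.floor(x * 10), axis=0))  # 0..9 in each column
```

The defining LHS property is visible in the last line: projecting the samples onto any single variable recovers exactly one point per interval, which is what gives LHS its uniform one-dimensional coverage.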
In the next section, we show that DoE or effective sampling is the first step in the development of surrogate models. This is because surrogate models are trained by using (i) the input data generated by the DoE and (ii) the output data generated by the experiments conducted under that DoE. Thus, the greater the number of samples or the denser the coverage of the design space, the greater is the expected accuracy of the surrogate models trained with these samples.
Unfortunately, using a generously large number of samples contradicts the very purpose of developing surrogate models, which is to avoid the unreasonable cost of running too many expensive computational/physical experiments. To develop accurate surrogate models, performing a comprehensive set of simulations or experiments is desirable, but often unreasonable. As we learned from the discussion of the different DoE methods, the size of the sample set is closely related to the dimension of the problem. Richard Bellman coined the term “curse of dimensionality” to describe the rapid increase of the sample size resulting from an increase in the number of variables. The DoE techniques that you learned in this section are uniquely helpful in planning effective sample sets for the surrogate modeling of systems of different design dimensions.
15.4 Surrogate Modeling
The need to quantify the economic and engineering performance of complex systems often demands highly complex and computationally/time-wise expensive simulations and experiments. The direct use of these computational simulations or experiments in optimization could be anywhere from challenging to prohibitive. Surrogate models are one of the most popular methodologies to deal with this issue. They provide a significantly less expensive and often more tractable alternative toward model-based optimization.
Surrogate modeling is concerned with the construction of purely mathematical models to estimate system performance or, in other words, to define relationships between specific system inputs and outputs. Over the past couple of decades, function estimation methods and approximation-based optimization have progressed remarkably. Surrogate models are being extensively used in the analysis and optimization of complex systems or in the solution of complex problems. Surrogate modeling techniques have been used for a variety of applications, from multidisciplinary design optimization to the reduction of analysis time and the improvement of the tractability of complex analysis codes [1, 18]. Figure 15.6 illustrates the diverse applicability of surrogate modeling.
Figure 15.6. Applications of Surrogate Modeling (Graphics Courtesy SUMO)
The general surrogate modeling problem can be stated as follows: “Given a set of data points x_i ∈ R^m, i = 1, …, n_p, and the corresponding function values, f(x_i), obtain a global approximation function, f̂(x), that adequately represents the original/actual functional relationship over a given design domain.” In this section, you are provided an introduction to the overall surrogate modeling process, and a brief description of the major surrogate modeling methods.
15.4.1 Surrogate Modeling Process
The process of surrogate modeling generally involves three stages: (i) design of experiments (DoE); (ii) construction or training of the surrogate model; and (iii) model validation.
We have already learned about DoE and sampling in the previous section. However, it is important to realize that in some cases, it might not be practically feasible to control the DoE. This is especially true in practice, when the source of experimental data is significantly different from where the surrogate model is being constructed. For example, data from the literature, historical data from previously reported experiments or measurements, or data from commercial sources may need to be used to construct a surrogate model. In these cases, the user constructing the model does not have any direct control over the data generation. For example, Zhang et al. used data from the National Renewable Energy Laboratory to construct a wind farm cost model, where the levelized cost of wind farms was represented as a function of turbine features and reported labor costs.
The sample points used to construct the surrogate models are generally called training points, while the construction of the surrogate models is often called model training. Once the training points have been defined (through DoE or other sources), the next step is to select an appropriate surrogate model or functional form. An introduction to four major surrogate models is provided in the following subsections. These models are: (i) polynomial response surfaces, (ii) radial basis functions, (iii) Kriging, and (iv) artificial neural networks. We take this opportunity to note that the quantification of surrogate modeling errors is an important current research area (see Refs. [21, 22]).
Once the surrogate model has been constructed, the final step is to evaluate the performance or expected accuracy of the surrogate model. The two most popular measures of model error are the root mean squared error (RMSE) and the maximum absolute error (MAE). The RMSE is a global error measure that provides an understanding of the model's accuracy over the entire design domain, while the MAE provides an understanding of the maximum local deviation of the model from the actual function. The most prominent approaches used to estimate these error measures are: (i) split sample, (ii) cross-validation, and (iii) bootstrapping.
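These two error measures can be sketched as follows (a Python illustration; the surrogate predictions f_hat below are made-up numbers, not from the text):

```python
import numpy as np

# True responses at a set of test points, and hypothetical surrogate
# predictions at the same points (illustrative values only).
f     = np.array([4.2, 7.1, 14.8, 11.1])
f_hat = np.array([3.8, 10.2, 11.3, 11.9])

rmse = np.sqrt(np.mean((f - f_hat)**2))  # global error over the domain
mae  = np.max(np.abs(f - f_hat))         # worst local deviation
print(rmse, mae)
```

Note that a model can have a modest RMSE yet a large MAE, which is why both measures are typically reported together.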
In the split sample strategy, the sample data is divided into training and test points. The former is used to construct the surrogate, while the latter is used to test the performance of the surrogate. The cross-validation technique operates through the following five steps:
1. Splits the sample points randomly into q (approximately) equal subsets;
2. Removes each of these subsets in turn (one at a time);
3. Trains an intermediate surrogate model on the remaining q − 1 subsets;
4. Computes the error of the intermediate surrogate using the omitted subset; and
5. Once each one of the q subsets has been used as the omitted subset, the q sets of errors evaluated therein are generally aggregated to yield a global error measure.
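The five steps above can be sketched as follows (a Python illustration with made-up data, using a quadratic polynomial as the intermediate surrogate):

```python
import numpy as np

# q-fold cross-validation of a quadratic polynomial surrogate.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 12)
y = np.sin(2 * np.pi * x)            # hypothetical "expensive" responses

q = 4
idx = rng.permutation(len(x))
folds = np.array_split(idx, q)       # step 1: q roughly equal subsets

errors = []
for fold in folds:                   # step 2: omit each subset in turn
    train = np.setdiff1d(idx, fold)
    coeffs = np.polyfit(x[train], y[train], 2)   # step 3: train surrogate
    resid = np.polyval(coeffs, x[fold]) - y[fold]
    errors.extend(resid**2)          # step 4: error on the omitted subset

cv_rmse = np.sqrt(np.mean(errors))   # step 5: aggregate into one measure
print(cv_rmse)
```

Because every sample serves once as a test point, cross-validation uses the data more economically than a single train/test split, at the cost of training the surrogate q times.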
The bootstrapping approach generates m subsamples from the sample points. Each subsample is a random sample, with replacement, from the full sample. Different variants of the bootstrapping approach can be used for (i) model identification, and (ii) identifying confidence intervals for surrogate models.
The four major surrogate models are discussed next.
15.4.2 Polynomial Response Surface Methodology
The Polynomial Response Surface (PRS) methodology is motivated by the Taylor series expansion. A Taylor series generally requires an infinite number of terms to obtain the exact value of a real function; a PRS truncates this expansion, taking the form of a polynomial. The number of terms included in a PRS depends on the desired accuracy. We may seek a zeroth-order (constant), a first-order (linear), a second-order (quadratic), or an even higher-order approximation. The approximation is accurate in the neighborhood of a chosen point, and becomes progressively inaccurate as we move away from that point. Depending on the form of the real function, the availability of training data, and the user's accuracy requirement, PRS may be selected as a linear, second-order, or higher-order polynomial of the vector of the design variables. Low-order polynomials, defined in a small region of the variable space, are generally used in practice when PRS is the model of choice.
The Kth-order PRS of a single variable x has the following form:

f̂(x) = a_0 + a_1 x + a_2 x² + ⋯ + a_K x^K

where a_0, a_1, …, a_K are arbitrary coefficients to be determined by training.
In n-dimensional space, the variable x has n components: x_j, where j = 1, …, n. The linear PRS of the n-dimensional variable x has the following form:

f̂(x) = a_0 + Σ_{j=1}^{n} a_j x_j

where the generic a_j are arbitrary coefficients to be determined by training.
The most popular PRS is the 2nd-order PRS, or the Quadratic Response Surface (QRS). The QRS of an n-dimensional variable x has the following form:

f̂(x) = a_0 + Σ_{j=1}^{n} a_j x_j + Σ_{j=1}^{n} Σ_{i=j}^{n} a_{ji} x_j x_i

where the generic a_j and a_{ji} are arbitrary coefficients to be determined by training.
The generalized kth-order PRS of an n-dimensional variable x can be represented as

f̂(x) = a_0 + Σ_{j1} a_{j1} x_{j1} + Σ_{j1} Σ_{j2} a_{j1 j2} x_{j1} x_{j2} + ⋯ + Σ_{j1} ⋯ Σ_{jk} a_{j1 j2 ⋯ jk} x_{j1} x_{j2} ⋯ x_{jk}

where the generic a_{j1} to a_{j1 j2 ⋯ jk} are arbitrary coefficients to be determined by training.
The PRS methodology is frequently used in regression analysis, where the number of training points is generally greater than the number of unknown coefficients, a_i. Only lower-order PRS are used in practice. Thus, the resulting PRS does not necessarily pass through the training sample data (i.e., does not necessarily have zero error at the training points). This is why PRS and other regression functions are often called approximation functions. One of the approaches used to evaluate the unknown coefficients (a) is the least squares method. The least squares method solves a regression as an optimization problem: the solution minimizes the sum of the squares of the errors between the real function values and the corresponding estimated PRS values, where the summation is taken over all the training points. The following example illustrates how to estimate the parameters of a QRS for a one-dimensional problem using the least squares method.
Example: The following four training points and their corresponding real function values are given: f(x1 = 2) = 4.2, f(x2 = 3.5) = 7.1, f(x3 = 4) = 14.8, and f(x4 = 5) = 11.1. Using the least squares method, fit a quadratic response surface (QRS), f̂(x), of the single variable x. Find the maximum value of the fitted function, which approximates the maximum value of the real function.
The second-order response surface of one variable is given as

f̂(x) = a_0 + a_1 x + a_2 x²
The parameters, a_0, a_1, and a_2, need to be determined using the least squares method. To do so, the following unconstrained optimization problem is formulated:

min_{a_0, a_1, a_2}  Σ_{i=1}^{4} [f(x_i) − (a_0 + a_1 x_i + a_2 x_i²)]²
The solution to the optimization problem is a_0 = −12.10, a_1 = 10.06, and a_2 = −1.05. The root mean squared error is estimated to be 2.38.
To find the maximum value of the fitted function, we need to solve the following optimization problem:

max_x  f̂(x) = a_0 + a_1 x + a_2 x²
The optimal point is x = 4.78, and the corresponding maximum function value is 11.93.
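The least squares fit in this example can be reproduced as follows (a Python sketch; np.linalg.lstsq solves the same minimization as the formulation above):

```python
import numpy as np

# Least-squares fit of the quadratic response surface from the example.
x = np.array([2.0, 3.5, 4.0, 5.0])
y = np.array([4.2, 7.1, 14.8, 11.1])

A = np.column_stack([np.ones_like(x), x, x**2])  # [1, x, x^2] basis
(a0, a1, a2), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a0, a1, a2)            # approx -12.10, 10.06, -1.05

x_star = -a1 / (2 * a2)      # stationary point of the fitted quadratic
print(x_star)                # approx 4.78, the reported maximizer
```

Since a_2 < 0, the fitted quadratic is concave and its stationary point is indeed the maximum.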
The process of fitting a PRS for practical optimization problems is the same as described in the example. First, the least squares method or other methods are used to fit an appropriate PRS to the training data. The accuracy of the PRS is generally reported using the RMSE.
For a one-dimensional problem, the built-in MATLAB function, polyfit, can also be used to fit a PRS of a desired order. The process of fitting a QRS for the above example using polyfit is shown below.
% To fit a 1-D QRS to a set of four training points
x = [2 3.5 4 5];
y = [4.2 7.1 14.8 11.1];
P = polyfit(x,y,2)
-1.0533 10.0627 -12.1013
Note that the coefficients of the PRS, as estimated and displayed by polyfit, appear from the highest-order term to the lowest-order term (i.e., from a_2 to a_0). It is also observed that the QRS trained using polyfit matches that trained using the least squares method.
15.4.3 Radial Basis Function Method
The idea of using Radial Basis Functions (RBFs) as approximation functions was first proposed by Hardy in 1971, where he used multiquadric RBFs to fit irregular topographical data. Since then, RBFs have been used for numerous applications that require a global representation of multidimensional scattered data [25, 26, 27].
The Radial Basis Function (RBF) method expresses surrogate models as linear combinations of a particular type of basis function, ψ(r), where each constituent basis function is defined in terms of the Euclidean distance, r, between a training point and the point of evaluation. The Euclidean distance can be expressed as

r = ‖x − x_i‖

where x_i is the ith training point, and x is the point of evaluation.
The commonly used non-parametric basis functions are:
1. Linear: ψ(r) = r,
2. Cubic: ψ(r) = r³, and
3. Thin plate spline: ψ(r) = r² ln r.
The commonly used parametric basis functions are:
1. Gaussian: ψ(r) = e^{−r²/(2δ²)},
2. Multiquadric: ψ(r) = (r² + δ²)^{1/2}, and
3. Inverse multiquadric: ψ(r) = (r² + δ²)^{−1/2}.
The RBF model is then expressed as a linear combination of the basis functions across all the training points, x_i ∈ R^n, i = 1, …, m, as given by

f̂(x) = Σ_{i=1}^{m} w_i ψ(‖x − x_i‖)

where the w_i are the generic weights of the basis functions (to be determined by training). The standard process of estimating the weights, w_i, is described below.
The weights, w_i, are evaluated using all the training points x_i and their corresponding function values f(x_i). In this evaluation, Ψ is used to represent the matrix of the basis function values at the training points, as given by

Ψ_{ij} = ψ(‖x_i − x_j‖),  i, j = 1, …, m

The vector of the weights is W = [w_1 ⋯ w_m]^T, and the vector Y = [f(x_1) ⋯ f(x_m)]^T contains the function values at the training points. The weights w_i are then evaluated by solving the following matrix equation:

ΨW = Y
It is important to keep in mind that RBFs are essentially interpolating functions (i.e., a trained RBF model will pass through all the training points). This property enables RBFs to represent highly nonlinear data, which is otherwise often challenging to accomplish using low-order PRSs.
The training data for the example in Sec. 15.4.2 is used to illustrate the process of constructing a RBF model.
Example: The training points and their corresponding real function values are f(x1 = 2) = 4.2, f(x2 = 3.5) = 7.1, f(x3 = 4) = 14.8, and f(x4 = 5) = 11.1. Fit a RBF model f̂(x) of the single variable x. Find the maximum value of the fitted function in the range [2, 5], which will give the approximated maximum value of the real function.
In this example, the multiquadric basis function, ψ(r) = (r² + δ²)^{1/2}, is used. The parameter δ is set to 0.9.
The matrix Ψ of the basis function values at the training points, and the vectors Y and W, are given below.

Ψ = [0.9000 1.7493 2.1932 3.1321
     1.7493 0.9000 1.0296 1.7493
     2.1932 1.0296 0.9000 1.3454
     3.1321 1.7493 1.3454 0.9000]

Y = [4.2 7.1 14.8 11.1]^T,  W = [w_1 w_2 w_3 w_4]^T
By solving the following matrix equation for the weights
ΨW = Y
we obtain w1 = -4.947, w2 = 66.437, w3 = -82.098, and w4 = 23.144.
To find the maximum of the RBF function, we solve the following optimization problem:

max_x  f̂(x),  subject to  2 ≤ x ≤ 5
The optimal point is found to be x = 4.28, and the corresponding maximum function value is 16.29.
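The computation in this example can be reproduced as follows (a Python sketch of the multiquadric RBF training):

```python
import numpy as np

# Multiquadric RBF fit to the example's training data.
x = np.array([2.0, 3.5, 4.0, 5.0])
y = np.array([4.2, 7.1, 14.8, 11.1])
delta = 0.9

r = np.abs(x[:, None] - x[None, :])          # pairwise distances
Psi = np.sqrt(r**2 + delta**2)               # multiquadric basis matrix
w = np.linalg.solve(Psi, y)                  # solve Psi w = y
print(w)   # approx [-4.947, 66.437, -82.098, 23.144]

def f_hat(xq):
    """Evaluate the trained RBF surrogate at query point(s) xq."""
    rq = np.abs(np.atleast_1d(xq)[:, None] - x[None, :])
    return np.sqrt(rq**2 + delta**2) @ w

print(f_hat(4.28))  # approx 16.29, the reported maximum on [2, 5]
```

Evaluating f_hat at the training points returns the training values exactly, illustrating the interpolating property noted above.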
Figure 15.7 shows the plot of the fitted function in the range [0, 7]. The fitted function is unimodal in the range [2, 5]. However, it is multimodal when we consider the entire range from 0 to 7. As an interpolation method, RBFs are highly capable of representing such multimodal data. On the negative side, however, as interpolating functions, the accuracy of RBFs is highly unreliable beyond the range of the training data.
Figure 15.7. Fitted Function
Although the optimal points obtained using the PRS model and the RBF model are similar (x = 4.78 vs. x = 4.28), the corresponding maximum function values are meaningfully different (11.93 vs. 16.29). This observation indicates that the choice of surrogate model can have an important impact on the optimal solutions obtained.
There exists an advanced version of RBF, called the Extended Radial Basis Functions or E-RBF. The E-RBF is essentially a combination of radial and non-radial basis functions, where non-radial basis functions (N-RBFs) are defined in terms of the individual coordinates of generic points x, relative to a given data point x_i, in each dimension separately. Further description of the E-RBF surrogate model can be found in the paper by Mullur and Messac.
15.4.4 Kriging Method
Another popular surrogate model is Kriging, which is most commonly (but not exclusively) used as an interpolating model. Kriging [29, 30] is an approach to approximate irregular data. The Kriging approximation function consists of two components: (i) a global trend function, and (ii) a deviation function representing the departure from the trend function. The trend function is generally a polynomial (e.g., constant, linear, or quadratic). The general form of the Kriging surrogate model is given by:
f̂(x) = G(x) + Z(x)

where f̂(x) is the approximation of the unknown function of interest, G(x) is the known trend (usually polynomial) function, and Z(x) is the realization of a stochastic process with mean equal to zero and nonzero covariance. The (i, j)th element of the covariance matrix of Z(x) is given as

Cov[Z(x_i), Z(x_j)] = σ_z² R_ij

where R_ij is the correlation function between the ith and the jth data points, and σ_z² is the process variance. Further description of the Kriging model can be found in the literature.
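A minimal sketch of the Kriging predictor is shown below (in Python). The constant trend G(x) = μ, the Gaussian correlation function, the value θ = 1, and the reuse of the earlier example's training data are all illustrative assumptions, not choices made in the text:

```python
import numpy as np

# Ordinary Kriging sketch: constant trend G(x) = mu and Gaussian
# correlation R_ij = exp(-theta * (x_i - x_j)^2). Illustrative values.
x = np.array([2.0, 3.5, 4.0, 5.0])
y = np.array([4.2, 7.1, 14.8, 11.1])
theta = 1.0

R = np.exp(-theta * (x[:, None] - x[None, :])**2)   # correlation matrix
ones = np.ones_like(y)
Ri_y = np.linalg.solve(R, y)
Ri_1 = np.linalg.solve(R, ones)
mu = (ones @ Ri_y) / (ones @ Ri_1)                  # generalized LS mean

def predict(xq):
    """Kriging predictor: trend mu plus correlated departure Z(x)."""
    r = np.exp(-theta * (xq - x)**2)                # correlations to data
    return mu + r @ np.linalg.solve(R, y - mu * ones)

# The predictor interpolates: predict(x_i) returns f(x_i) exactly.
print(predict(4.0))   # 14.8
```

Unlike the fixed basis functions of the RBF method, the correlation parameter θ is normally estimated from the data (e.g., by maximum likelihood); it is fixed here only to keep the sketch short.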
15.4.5 Artificial Neural Networks (ANN)
A more recent and increasingly popular choice of surrogate model is the Artificial Neural Network (ANN). A neural network generally contains an input layer, one or more hidden layers, and an output layer (Ref. ). Figure 15.8 shows a typical three-layer feedforward neural network. An ANN is developed by defining the following three types of parameters:
Figure 15.8. A Generic Topology of Neural Networks
1. The interconnection pattern between different layers of neurons;
2. The learning process for updating the weights of the interconnections; and
3. The activation function that converts a neuron’s weighted input to its output activation.
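The three ingredients above can be made concrete with a minimal forward pass. This is a hedged Python sketch (not MATLAB's nnstart toolbox): the interconnection pattern is a fully connected three-layer network, the weights are set randomly in place of an actual learning process, and tanh serves as the activation function; all sizes and values are illustrative assumptions:

```python
import numpy as np

# Fixed random weights stand in for a trained network (illustrative only).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 4))   # input layer -> hidden layer interconnections
b1 = np.zeros(4)
W2 = rng.normal(size=(4, 1))   # hidden layer -> output layer interconnections
b2 = np.zeros(1)

def forward(x):
    """One forward pass: each neuron's weighted input through a tanh activation."""
    h = np.tanh(x @ W1 + b1)   # hidden-layer activations
    return h @ W2 + b2         # linear output neuron

y = forward(np.array([[0.5, -1.0]]))
print(y.shape)  # (1, 1): one sample, one output
```

In a real ANN surrogate, the learning process would adjust W1, b1, W2, b2 to fit the training data, which is where the user-prescribed choices discussed next come into play.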
One of the drawbacks associated with neural networks for function approximation is the fairly large number of parameters that need to be prescribed by the user, thereby demanding adequate user experience in implementing ANN. These prescribed parameters include the number of neurons, the number of layers, the type of activation function, and the optimization algorithm used to train the network. In addition, the training process generally needs to be supervised in order to avoid “over-fitting”. MATLAB provides a dedicated toolbox for ANN, which can be started using the command nnstart (Ref. ).
15.5 Summary
This chapter provided a brief overview of the modeling issues encountered in the process of optimizing complex and/or large dimensional systems. The chapter introduced methods for reducing the dimension of the design space in the course of modeling (e.g., design variable linking), and methods for addressing empirical modeling and analysis in the design space (e.g., design of experiments). This was followed by the development of a special class of mathematical models that provide an approximation of the system behavior, by leveraging an affordable and carefully designed set of expensive experiments (simulation-based or physical experiments). These models are known as surrogate models, of which four major types were presented. They are used to provide quick, computationally benign approximations of complex system behavior for the purpose of optimizing the system. The methodologies and modeling approaches provided in this chapter are crucial in overcoming the barrier of computational expense and complexity in designing current and next generation complex systems, such as aircraft, ground vehicles, and wind turbines.
PROBLEMS
15.1 The 9 training points are (2,3), (4,3), (6,3), (2,5), (4,5), (6,5), (2,7), (4,7), and (6,7). Their corresponding function values are 12.12, 5.97, 11.98, 8.04, 2.18, 7.97, 12.03, 5.97, and 12. Use the training data to fit a second-order polynomial response surface of two variables. Find the minimum value of the fitted function.
The second-order response surface has the following form:
ŷ(x) = a0 + a1 x(1) + a2 x(2) + a11 (x(1))^2 + a22 (x(2))^2 + a12 x(1) x(2).
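As a sketch of how such a surface is fitted by least squares, the following Python fragment sets up the six-column basis above and recovers known coefficients from synthetic data (the data here are illustrative, deliberately not the training points of the problem, so the exercise itself is left to the reader):

```python
import numpy as np

def prs_basis(x1, x2):
    """Columns: 1, x(1), x(2), x(1)^2, x(2)^2, x(1)x(2)."""
    return np.column_stack([np.ones_like(x1), x1, x2, x1**2, x2**2, x1 * x2])

# A 3x3 grid of sample points (synthetic, not the problem data)
x1, x2 = np.meshgrid(np.arange(2, 7, 2.0), np.arange(3, 8, 2.0))
x1, x2 = x1.ravel(), x2.ravel()

# Generate responses from assumed "true" coefficients, then fit
true_a = np.array([1.0, -2.0, 0.5, 0.3, 0.1, -0.2])
y = prs_basis(x1, x2) @ true_a

a, *_ = np.linalg.lstsq(prs_basis(x1, x2), y, rcond=None)
print(np.allclose(a, true_a))  # True: 9 points over-determine the 6 coefficients
```

With the coefficients in hand, the minimum of the fitted quadratic can be found analytically or with any unconstrained optimizer.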
15.2 For Problem 15.1, fit RBF functions using the multiquadric basis ψ(r) = (r^2 + δ^2)^(1/2). The value of δ can be set to 0.01. Find the minimum point in the region spanned by the training points. Plot the figure of the fitted function in a region slightly larger than that surrounded by the training points.
15.3 The 9 training points are (2,3), (4,3), (6,3), (2,5), (4,5), (6,5), (2,7), (4,7), and (6,7). Their corresponding function values are -12.12, -5.97, -11.98, -8.04, -2.18, -7.97, -12.03, -5.97, and -12. Fit RBF functions using the multiquadric basis ψ(r) = (r^2 + δ^2)^(1/2). The value of δ can be set to 0.01. Find the maximum point in the region spanned by the training points. Plot the figure of the fitted function in a region slightly larger than that surrounded by the training points.
15.4 The aluminum cantilever beam shown in Fig. 15.9 is subjected to a tip force, P = 35.5 N. The length of the beam is l = 0.5 m. Its cross section is shown in Fig. 15.10. The Young’s modulus of aluminum is 69 GPa at room temperature (21°C). Minimize the volume of the beam. Reduce the minimization problem using design variable linking.
Figure 15.9. Cantilever Beam Subject to a Tip Force
Figure 15.10. Rectangular Cross Section
BIBLIOGRAPHY OF CHAPTER 15
N. V. Queipo, R. T. Haftka, W. Shyy, T. Goel, R. Vaidyanathan, and P. K. Tucker. Surrogate-based analysis and optimization. Progress in Aerospace Sciences, 41(1):1-28, January 2005.
R. N. Cadete, J. P. Dias, and M. S. Pereira. Optimization in vehicle crashworthiness design using surrogate models. In 6th World Congresses of Structural and Multidisciplinary Optimization, Rio de Janeiro, Brazil, June 2005.
H. S. Chung and J. J. Alonso. Using gradients to construct coKriging approximation models for high-dimensional design optimization problems. In 40th AIAA Aerospace Sciences Meeting and Exhibit, Reno, NV, USA, January 2002.
O. Sigmund. Design of multiphysics actuators using topology optimization - Part I: One-Material structures. Computer Methods in Applied Mechanics and Engineering, 190(49-50):6577-6604, 2001.
J. Annoni, P. Seiler, K. Johnson, P. J. Fleming, and P. Gebraad. Evaluating wake models for wind farm control. In American Control Conference, Portland, OR, USA, June 2014.
S. Chowdhury, J. Zhang, A. Messac, and L. Castillo. Optimizing the arrangement and the selection of turbines for wind farms subject to varying wind conditions. Renewable Energy, 52:273-282, 2013.
Y. Han, W. Zeng, Y. Zhao, X. Zhang, Y. Sun, and X. Ma. Modeling of constitutive relationship of Ti-25V-15Cr-0.2Si alloy during hot deformation process by fuzzy-neural network. Materials and Design, 31(9):4380-4385, 2010.
S. Banerjee, B. P. Carlin, and A. E. Gelfand. Hierarchical Modeling and Analysis for Spatial Data. Chapman and Hall/CRC, 2nd edition, 2014.
S. S. Rao. Engineering Optimization: Theory and Practice. John Wiley and Sons, 4th edition, 2009.
G. E. P. Box and N. R. Draper. Empirical Model-Building and Response Surfaces. John Wiley and Sons, 1987.
P. M. Dunn. James Lind (1716-94) of Edinburgh and the treatment of scurvy. Archives of Disease in Childhood-Fetal and Neonatal Edition, 76(1):F64-F65, 1997.
G. E. P. Box, J. S. Hunter, and W. G. Hunter. Statistics for Experimenters. John Wiley and Sons, 2nd edition, 2005.
D. C. Montgomery. Design and Analysis of Experiments. Wiley, 8th edition, 2012.
M. D. McKay, R. J. Beckman, and W. J. Conover. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics, 21(2):239-245, 1979.
T. J. Santner, B. J. Williams, and W. I. Notz. The design and analysis of computer experiments. Springer, 2003.
A. I. J. Forrester, A. Sóbester, and A. J. Keane. Engineering Design via Surrogate Modelling: A Practical Guide. John Wiley and Sons, 2008.
R. E. Bellman. Adaptive Control Processes: A Guided Tour. Princeton University Press, 1961.
G. Wang and S. Shan. Review of metamodeling techniques in support of engineering design optimization. Journal of Mechanical Design, 129(4):370-380, 2007.
J. Zhang, S. Chowdhury, A. Messac, and L. Castillo. A response surface-based cost model for wind farm design. Energy Policy, 42:538-550, 2012.
M. Goldberg. Jobs and Economic Development Impact (JEDI) Model. National Renewable Energy Laboratory, Golden, Colorado, US, October 2009.
J. Zhang, S. Chowdhury, A. Mehmani, and A. Messac. Characterizing uncertainty attributable to surrogate models. Journal of Mechanical Design, 136(3):031004, 2014.
A. Mehmani, S. Chowdhury, J. Zhang, W. Tong, and A. Messac. Quantifying regional error in surrogates by modeling its relationship with sample density. In 54th AIAA/ASME/ASCE/AHS/ASC Structures, Structural Dynamics and Materials Conference, Boston, MA, USA, 2013.
A. Ravindran, G. V. Reklaitis, and K. M. Ragsdell. Engineering Optimization: Methods and Applications. John Wiley and Sons, 2006.
R. L. Hardy. Multiquadric equations of topography and other irregular surfaces. Journal of Geophysical Research, 76(8):1905-1915, 1971.
R. Jin, W. Chen, and T. W. Simpson. Comparative studies of metamodelling techniques under multiple modelling criteria. Structural and Multidisciplinary Optimization, 23(1):1-13, 2001.
J. B. Cherrie, R. K. Beatson, and G. N. Newsam. Fast evaluation of radial basis functions: Methods for generalized multiquadrics in R^n. SIAM Journal on Scientific Computing, 23(5):1549-1571, 2002.
M. F. Hussain, R. R. Barton, and S. B. Joshi. Metamodeling: Radial basis functions, versus polynomials. European Journal of Operational Research, 138(1):142-154, 2002.
A. A. Mullur and A. Messac. Extended radial basis functions: More flexible and effective metamodeling. AIAA Journal, 43(6):1306-1315, 2005.
A. A. Giunta and L. T. Watson. A comparison of approximation modeling techniques: Polynomial versus interpolating models. In 7th AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization, St. Louis, MO, USA, September 1998.
S. Sakata, F. Ashida, and M. Zako. Structural optimization using Kriging approximation. Computer Methods in Applied Mechanics and Engineering, 192(7-8):923-939, 2003.
N. Cressie and C. K. Wikle. Statistics for Spatio-Temporal Data. John Wiley and Sons, 2011.
J. Han and M. Kamber. Data Mining: Concepts and Techniques. Morgan Kaufmann, 3rd edition, 2011.
A. A. Mullur. A New Response Surface Paradigm for Computationally Efficient Multiobjective Optimization. PhD thesis, Mechanical Engineering, Rensselaer Polytechnic Institute, Troy, NY, December 2005.
The MathWorks, Inc. MATLAB neural network toolbox. www.mathworks.com/products/neural-network/, 2014.