Institut Clément Ader, UMR CNRS 5312, INSA Toulouse, 3 rue Caroline Aigle, 31400 Toulouse, France

2

Institut Clément Ader, UMR CNRS 5312, Université Paul Sabatier (UPS), 3 rue Caroline Aigle, 31400 Toulouse, France

Abstract This paper presents a method for constructing optimal designs of experiments (DoE) intended for building surrogate models in terms of dimensionless (or non-dimensional) variables. To increase the fidelity of the model obtained by regression, the DoE must cover the dimensionless space as evenly as possible. However, generating the data for the regression still requires a DoE for the physical variables, because the simulations are carried out in terms of them. Thus, there are two spaces, each needing a DoE. Since the dimensionless space always has fewer dimensions than the physical one, the relation between the two spaces is not bijective. Moreover, each space usually has its own domain constraints, which makes the relation between the constrained domains non-surjective. It is therefore impossible to design the DoE in one space and then automatically generate the corresponding DoE in the other space while satisfying the constraints of both spaces. The solution proposed in the paper casts the computation of the DoE as an optimization problem formulated in terms of a space-filling criterion (maximizing the minimum distance between neighboring points). An approach is proposed for solving this optimization problem efficiently in a two-step procedure. The method is particularly well suited for building surrogates in terms of dimensionless variables spanning several orders of magnitude (e.g. power laws). The paper also proposes two variations of the method: one for when more control is needed over the number of levels of each non-dimensional variable, and another for when a good distribution of the DoE is desired in logarithmic scale. The DoE construction method is illustrated on three case studies. A purely numerical case illustrates each step of the method, and two further case studies, one mechanical and one thermal, illustrate the results in different configurations and highlight different practical aspects.

Keywords Optimal space filling, optimization with space transformation, dimensional analysis, non-dimensional variables, surrogate modelling.

1 Introduction to surrogate models and dimensional analysis

With the continuous increase in processing power and the diversification of computer simulation software, ranging from finite element to multiphysics system simulation, the systems being simulated today are rapidly growing in complexity. Whereas in the past the aim of simulation was the verification of already designed systems and components, today it is used at almost every level of system design [1], such as performance analysis [2], preliminary design [3], real-time simulation [4], etc. However, although for these tasks simulation conveniently replaces traditional physical experiments, for complex systems it may still be very demanding in time and computing effort. Usually the space to be covered by simulations is very wide (due to many varying parameters) or the time to run a single simulation is very long (e.g. CFD). Accordingly, surrogate models are often computed based on a limited number of numerical experiments with different input parameters. The design of experiments (DoE) thus remains very important in the age of computer experiments [5]. When dealing with complex systems or components, metamodeling is often used for their macroscopic modeling. In engineering, metamodels, also called surrogate models, are mathematical relations intended to replace heavy detailed models such as finite element models or complex lumped parameter models, which usually involve many differential-algebraic equations (DAE). Their purpose is to replace computationally intensive models by approximate but very light models. This is especially useful for tasks that require repeated simulations with different sets of parameters, e.g. optimization routines. In contrast to model reduction techniques, which seek to obtain a light model by mathematical manipulation of the detailed physical equations [6]-[8], metamodeling adjusts the parameters of the light model so that its response best fits the simulation results obtained from the detailed model [9] - [12]. Some common surrogate models are polynomial response surfaces, radial basis functions, kriging, artificial neural networks, etc. The inputs of these models are called the design parameters, and the output(s) are the variable(s) of interest that depend on the input parameter(s). In order to reduce the number of input variables of surrogate models, several researchers proposed to use the Vaschy-Buckingham $\pi$ theorem [13] - [18] to construct the surrogate model in terms of non-dimensional (or dimensionless) parameters $\pi$ characterizing the system.
This theorem states that a physical relation involving $n$ relevant physical variables, $x_1, x_2, \dots, x_n$, can be rewritten in terms of a set of $m = n - k$ dimensionless variables (also called dimensionless numbers) $\pi_1, \pi_2, \dots, \pi_m$ constructed from the original physical variables [19] - [21]. Here $k$ is the number of independent physical units and the $\pi_l$ are groupings of the input variables $x_k$ raised to particular powers $a_{lk}$: $\pi_l = x_1^{a_{l1}} x_2^{a_{l2}} \dots x_n^{a_{ln}}$. Consequently, the dimensionless numbers do not have physical units. Thus, besides reducing the size of the model, this strategy may enhance the robustness of the metamodel (with fewer inputs it may be easier to obtain a robust model), reduce the size of the DoE required for metamodel construction, and add physical insight to the metamodel. The fields that have made the most use of dimensionless numbers are probably fluid dynamics and heat transfer, although they are also used, to a lesser extent, in many other physical domains. These fields gave birth to some well-known dimensionless numbers such as the Reynolds number, Rayleigh number, Nusselt number, Prandtl number, etc. So far, the types of metamodels used in conjunction with dimensionless variables include polynomials [13], [18], sums of power laws [22] and variable power laws [23], [24]. Nevertheless, in many engineering domains (e.g. thermal and aircraft engineering) [25], chapter 5 in [27], one of the most widely used model forms for semi-empirical laws is the product of power laws. Scaling laws, which are often used in engineering [15], [28], are also examples of power laws. In this case the regression of the numerical parameters of the model is carried out in logarithmic scale, because the model then becomes linear in its parameters. Another reason to perform the regression in the logarithmic domain is that, in usual engineering applications, the dimensionless numbers often vary by several orders of magnitude.
We will thus also consider the special case of DoEs in logarithmic scale in the context of dimensionless variables.
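As a worked illustration of why the regression becomes linear in logarithmic scale, consider a product of power laws in two dimensionless variables. The symbols $c$, $a_1$, $a_2$ and the output name $\pi_0$ are notational assumptions introduced here for illustration only, and all quantities are assumed strictly positive:

```latex
\pi_0 = c\,\pi_1^{a_1}\,\pi_2^{a_2}
\quad\Longrightarrow\quad
\log \pi_0 = \log c + a_1 \log \pi_1 + a_2 \log \pi_2
```

The right-hand side is linear in the unknowns $\log c$, $a_1$ and $a_2$, so these can be estimated by ordinary least squares on the logarithms of the data.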

1.1 The challenge of Design of Experiments with design space transformation

Constructing DoE plans for a model expressed in terms of dimensionless numbers is more challenging than for a model expressed in terms of physical variables. Since the metamodel is expressed in terms of the dimensionless variables $\pi_l$ instead of the physical ones $x_k$, its construction requires a DoE on the dimensionless variables and not on the physical ones. In order to increase the accuracy of the metamodel, the DoE in the dimensionless space should satisfy some distribution properties (e.g. space-filling). A straightforward solution would be to apply a classical DoE having such distribution properties directly to the dimensionless variables. Unfortunately, such a DoE cannot be used as it stands, because only a limited number of industrial simulation tools can solve a problem stated directly in dimensionless variables. Numerical simulations of physical systems almost always employ the physical variables. This means that one should construct a DoE in the dimensionless space and then calculate the corresponding DoE for the physical variables in order to simulate the system. However, this cannot be implemented as such, for two reasons. The first difficulty lies in the fact that there are fewer dimensionless variables than physical variables. Therefore, for a given DoE in the dimensionless space, the DoE in the physical space is not unique. Although this may be considered a non-issue (one can just pick a solution among the many available), the second difficulty is that, once domain limitations (constraints) are considered, it is very easy to obtain infeasible solutions. To illustrate the challenge, let us consider the problem of finding the bending stiffness of a rectangular beam for geometries where the geometrical assumptions of beam theory are not satisfied. Note that this problem has a trivial analytical solution, but we use it here only to illustrate the challenges involved. The stiffness of the beam, $K$ (unit [N/m]), depends on four physical variables: its geometrical dimensions, namely length $L$ (unit [m]), width $W$ (unit [m]) and height $H$ (unit [m]), and the Young's modulus of the material, $E$ (unit [N/m$^2$]). Consider that we are interested in finding the relation that approximates the stiffness of a beam whose dimensions vary within a given domain, i.e. $L \in [L_{\min}, L_{\max}]$, $W \in [W_{\min}, W_{\max}]$ and $H \in [H_{\min}, H_{\max}]$. Additionally, we look only at beams whose aspect ratios vary within a given domain, i.e. $H/W \in [\min, \max]$ and similarly for $L/W$. According to the Vaschy-Buckingham theorem, the problem can be expressed in dimensionless space as $\pi_0 = f(\pi_1, \pi_2)$ with $\pi_0 = \frac{K}{E W}$, $\pi_1 = \frac{H}{W}$ and $\pi_2 = \frac{L}{W}$. Thus, in order to find the function $f$ one should construct a DoE for the dimensionless variables $\pi_1$ and $\pi_2$. However, in order to gather the data on which the regression will be carried out, a DoE for the physical variables $H$, $W$ and $L$ is also necessary. The straightforward solution would be to design the DoE for the dimensionless variables $\pi_1$ and $\pi_2$, which are bounded by their min/max limits, and then compute the corresponding DoE in the physical space, defined by the variables $H$, $W$ and $L$, for the simulation needs. If the function $f$ is a power law, which is often the case when multiple orders of magnitude of the dimensionless parameters must be covered, then the DoE should be constructed in logarithmic scale. This makes it possible to compute the parameters of the model by linear regression. Transforming the dimensionless numbers to the logarithmic scale gives:

$$\log \pi_1 = \log H - \log W \tag{1}$$

and

$$\log \pi_2 = \log L - \log W \tag{2}$$

In order to illustrate the problem of infeasible solutions that can be obtained with this approach, let us propagate the constraints of the physical domain into the dimensionless space. First, the min/max limits for $\pi_1$ due to the min/max limits of the physical variables can be obtained from eq. (1) as:

$$\min \log \pi_1 = \log H_{\min} - \log W_{\max} \tag{3a}$$

$$\max \log \pi_1 = \log H_{\max} - \log W_{\min} \tag{3b}$$

The same goes for the limits of $\pi_2$:

$$\min \log \pi_2 = \log L_{\min} - \log W_{\max} \tag{4a}$$

$$\max \log \pi_2 = \log L_{\max} - \log W_{\min} \tag{4b}$$

Additionally, since $\pi_1$ and $\pi_2$ are coupled by the width of the beam, $W$, there may also be some coupling between the limits of $\pi_1$ and $\pi_2$. This coupling can be found by subtracting eq. (1) from eq. (2), which gives the upper and lower frontiers of $\log \pi_2$ as functions of $\pi_1$. The upper frontier is:

$$\log \pi_2 = \log \pi_1 + \log L_{\max} - \log H_{\min} \tag{5a}$$

and the lower frontier is:

$$\log \pi_2 = \log \pi_1 + \log L_{\min} - \log H_{\max} \tag{5b}$$

Note that the frontiers of $\pi_2$ are not constant when $\pi_1$ varies within its own min/max limits. The same holds true for the dimensionless variable $\pi_1$. The feasible domain in the dimensionless space is thus the intersection of the domain bounded by eqs. (3a) - (5b) with the one bounded by the min-max limits of the dimensionless numbers $\pi_1$ and $\pi_2$, as shown in Figure 1. Therefore, designing a DoE in the dimensionless space bounded only by min-max constraints on the $\pi$ variables may give some points that do not satisfy the bounds of the physical variables $L$, $H$ and $W$. This may lead to unrealistic physical configurations, or to difficulties in simulation, which may provide unreliable data for the regression process.

[Figure 1 shows the physical domain (axes $\log L$ and $\log H$, with the limits on the physical variables) and the dimensionless domain (axes $\log \pi_1$ and $\log \pi_2$, with the limits on the dimensionless variables). The boundary lines are labeled with equation numbers (3) and (4), and the domain of the study is marked.]

Figure 1. Propagation of the constraints from the physical to the dimensionless space and vice versa. The numbers in parentheses refer to the equations describing the lines.
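The constraint propagation of eqs. (3a) - (5b) can be sketched in a few lines for the beam example. This is a minimal illustration only: the numeric bounds and the function names (`log_limits`, `log_pi2_frontiers`, `is_reachable`) are hypothetical and not taken from the paper, base-10 logarithms are an arbitrary choice, and the width symbol $W$ is assumed.

```python
import math

# Hypothetical bounds in metres for the beam example (not from the paper).
BOUNDS = {"H": (0.01, 0.1), "W": (0.01, 0.1), "L": (0.1, 1.0)}

def log_limits(bounds):
    """Propagated min/max limits of log(pi1) and log(pi2), eqs. (3a)-(4b)."""
    (h_min, h_max), (w_min, w_max) = bounds["H"], bounds["W"]
    l_min, l_max = bounds["L"]
    pi1 = (math.log10(h_min) - math.log10(w_max),
           math.log10(h_max) - math.log10(w_min))
    pi2 = (math.log10(l_min) - math.log10(w_max),
           math.log10(l_max) - math.log10(w_min))
    return pi1, pi2

def log_pi2_frontiers(log_pi1, bounds):
    """Lower and upper frontiers of log(pi2) for a given log(pi1), eqs. (5b), (5a)."""
    (h_min, h_max), (l_min, l_max) = bounds["H"], bounds["L"]
    lower = log_pi1 + math.log10(l_min) - math.log10(h_max)
    upper = log_pi1 + math.log10(l_max) - math.log10(h_min)
    return lower, upper

def is_reachable(log_pi1, log_pi2, bounds, tol=1e-12):
    """True if some (H, W, L) inside the physical bounds maps to this point."""
    (p1_lo, p1_hi), (p2_lo, p2_hi) = log_limits(bounds)
    f_lo, f_hi = log_pi2_frontiers(log_pi1, bounds)
    return (p1_lo - tol <= log_pi1 <= p1_hi + tol
            and p2_lo - tol <= log_pi2 <= p2_hi + tol
            and f_lo - tol <= log_pi2 <= f_hi + tol)

# A point inside the min/max box of both pi variables can still be infeasible:
inside = is_reachable(0.0, 1.0, BOUNDS)   # pi1 = 1, pi2 = 10
outside = is_reachable(1.0, 0.5, BOUNDS)  # pi1 = 10 forces W = W_min, so L < L_min
```

With these assumed bounds the second point lies within the propagated min/max limits of both $\pi_1$ and $\pi_2$ but violates the lower frontier (5b), which is exactly the situation described above.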

The other way around is also conceivable, i.e. designing a DoE in the physical domain and then computing the corresponding one in the dimensionless space. Since there are fewer dimensionless variables than physical variables, converting the DoE from the physical space to the dimensionless one is straightforward (there is a unique solution). However, designing a DoE in the physical domain bounded only by the min-max limits of the physical variables brings the same problem as in the previous case, i.e. violation of the constraints in the dimensionless space, as illustrated in Figure 1. Additionally, the corresponding DoE in the dimensionless space will not have the distribution properties needed to achieve a good estimation of the function $f(\pi_1, \pi_2)$. The third possibility would be to design the DoE in the dimensionless space by considering both the min-max bounds on the $\pi_l$ variables and the constraints propagated from the bounds of the physical variables, such as eqs. (3a) - (5b). However, when the number of physical and dimensionless variables is large, propagating the constraints analytically from one domain to the other may be time consuming or even prohibitive. Moreover, in the general case, the physical variables can be bounded by nonlinear constraints, which can be impossible to propagate analytically to the dimensionless space. On top of that, the non-uniqueness of the DoE in the physical space remains unsolved.

1.2 Aim and organization of the paper

Many methods are currently available for constructing a DoE in a physical design space with different distribution properties, as will be reviewed in the next section. However, to the best of our knowledge, there is no available approach that computes a DoE for two spaces, physical and dimensionless, while considering domain constraints in both spaces. This paper therefore aims to propose a solution to this problem, which arises in today's engineering practice. The rest of this article is organized as follows. In section 2 we give an overview of some of the common approaches for constructing designs of experiments. In section 3 we first provide the general formulation for constructing optimal space-filling DoEs in non-dimensional space. Then we provide an efficient approach for solving the optimization problem involved in this formulation. We also provide variants of the proposed method for cases where the user needs to specify the number of levels for each non-dimensional variable and for cases where the DoE needs to be constructed in logarithmic scale. In section 4 we provide three application case studies for the proposed approach. Finally, we provide concluding remarks in section 5.

2 Overview of Design of Experiments

Many techniques are available for constructing a design of experiments. We give here a brief overview of some commonly used techniques, then focus on space-filling designs, which are of particular interest for the present work. The full factorial design is among the most common and intuitive techniques for building a design of experiments. It consists in dividing each variable (or factor) into n levels, then constructing a point for every possible combination of the variables' levels. The result can be seen as a regular grid (a hypercube beyond three factors) over the design domain. One of the advantages of this technique is that it samples all the corners of the design domain. The main drawback of full factorial designs in the present situation is that the feasible domain of the study is most often not a hypercube, because of the constraints from the physical domain, as illustrated in Figure 1. Fractional factorial designs have been developed to address the curse of dimensionality associated with full factorial designs, by considering only a fraction (or subset) of the full factorial design. Accordingly, they lead to smaller experimental designs and are quite efficient for fitting a first-order polynomial response surface approximation to the data. However, they are problematic when higher-order polynomials including interaction terms are sought. To address the construction of experimental designs for higher-order polynomials, central composite designs (CCD) [29] and Box-Behnken designs [30] have been developed. Central composite designs consist of fractional factorial designs to which center and face points have been added to allow better estimation of the second-order (quadratic and interaction) effects. Box-Behnken designs are based on incomplete block designs. These types of DoE are better suited to physical experiments, where measurement noise and repeatability must be dealt with.
Since noise and repeatability are not an issue for computer experiments, optimal designs have been developed in this context. These designs typically either optimize the quality of the statistical inference for a given model structure (for example a linear or polynomial model) or seek a distribution of points that covers the design space as well as possible [31]. For a detailed review of "X"-optimality criteria for polynomial response surfaces the reader is referred to [32]. In the context of fitting a surrogate model in terms of dimensionless parameters we are mainly interested in having good space-filling properties and in being able to control the density of points in each variable. Accordingly, we briefly review some classical techniques for space-filling designs.
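The full factorial construction described above (every combination of the levels of every variable) can be sketched with the standard library. The function name `full_factorial` and the example bounds are hypothetical, and evenly spaced levels are an assumption made for this illustration.

```python
import itertools

def full_factorial(bounds, levels):
    """Return every combination of evenly spaced levels.

    bounds: list of (lower, upper) pairs, one per variable.
    levels: list of level counts (each >= 2), one per variable.
    """
    axes = []
    for (lower, upper), n_levels in zip(bounds, levels):
        step = (upper - lower) / (n_levels - 1)
        axes.append([lower + i * step for i in range(n_levels)])
    # The Cartesian product of the axes is the regular grid of the design.
    return [tuple(point) for point in itertools.product(*axes)]

# Two variables with 3 and 2 levels give 3 * 2 = 6 points.
design = full_factorial([(0.0, 1.0), (10.0, 20.0)], [3, 2])
```

The number of points is the product of the level counts, which is why the size of such a design grows quickly with the number of variables.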

The space-filling design problem is well known and has been extensively studied in the applied mathematics community under the name sphere packing problem [33], [34]. It can be formulated as follows. Let $d_{ij}$ be the Euclidean distance between any two points $x^i$ and $x^j$:

$$d_{ij} = \lVert x^i - x^j \rVert_2, \quad \forall i, j \in \{1, \dots, N\} \tag{6}$$

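The sphere packing problem consists in maximizing the minimum distance between any two points, as shown in eq. (7). Note that this is a non-convex optimization problem.

$$\begin{aligned} \underset{x}{\text{Maximize}} \quad & \min_{i<j} \left[ d_{ij} \right] \\ \text{subject to} \quad & x_{k,\min} \le x_k^i \le x_{k,\max}, \quad \forall i \in \{1, \dots, N\},\ \forall k \in \{1, \dots, n\} \end{aligned} \tag{7}$$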
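The cost function of eq. (7) is simply the smallest pairwise distance of the design. A minimal sketch of its evaluation follows; the function name `min_pairwise_distance` and the two example designs are hypothetical.

```python
import itertools
import math

def min_pairwise_distance(points):
    """Smallest Euclidean distance between any two points of a design."""
    return min(math.dist(p, q) for p, q in itertools.combinations(points, 2))

# Two 4-point designs in the unit square: a regular grid and a clustered one.
grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0)]

grid_score = min_pairwise_distance(grid)
clustered_score = min_pairwise_distance(clustered)
```

The grid scores higher than the clustered design, which is the sense in which the max-min criterion rewards space filling.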

Alternative formulations have also been considered in order to reduce the complexity of the underlying optimization problem. Morris and Mitchell [35], for example, proposed to minimize the $\Phi_p$ function defined in eq. (8), where $p$ is a positive integer. For large $p$ the $\Phi_p$ criterion is equivalent to the max-min criterion defined in eq. (7).

$$\Phi_p = \left( \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}^{-p} \right)^{1/p} \tag{8}$$
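The criterion of eq. (8) can be evaluated as in the sketch below. The function name `phi_p`, the default `p=50` and the example designs are assumptions made for illustration. For large $p$ the sum is dominated by the closest pair, so the value approaches the reciprocal of the minimum distance.

```python
import itertools
import math

def phi_p(points, p=50):
    """Phi_p criterion of eq. (8); smaller values mean better space filling."""
    total = sum(math.dist(a, b) ** (-p)
                for a, b in itertools.combinations(points, 2))
    return total ** (1.0 / p)

grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0)]

phi_grid = phi_p(grid)            # close to 1 / 1.0
phi_clustered = phi_p(clustered)  # close to 1 / 0.1
```

Unlike the minimum of eq. (7), this function is smooth in the point coordinates as long as no two points coincide, which is one reason it can be easier to handle in an optimizer.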

Another formulation is based on minimizing the Shannon entropy of the points in the experimental design, as proposed by Shewry and Wynn [36]. The criterion can be formulated as in [37]:

$$\text{Minimize} \quad -\log \lvert R \rvert \tag{9}$$

with:

$$R_{ij} = \exp\left( -\sum_{k=1}^{n} \theta_k \left( x_k^i - x_k^j \right)^2 \right) \tag{10}$$

where $R_{ij}$ are the correlation coefficients and $R$ is the correlation matrix between the points of the experimental design. Audze and Eglais [38] proposed a potential energy formulation for obtaining space-filling designs based on a physical analogy. They consider a system of points with unit mass exerting a repulsive force on each other. The points reach equilibrium when their potential energy is minimal. The potential energy can then be expressed as:

$$U = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \frac{1}{d_{ij}^2} \tag{11}$$
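The potential energy of eq. (11) is straightforward to evaluate; a minimal sketch follows. The function name `audze_eglais_energy` and the example designs are hypothetical, and the function assumes that no two points coincide (otherwise a division by zero occurs).

```python
import itertools

def audze_eglais_energy(points):
    """Potential energy of eq. (11): sum of inverse squared pairwise distances."""
    energy = 0.0
    for p, q in itertools.combinations(points, 2):
        squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
        energy += 1.0 / squared_distance
    return energy

grid = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
clustered = [(0.0, 0.0), (0.1, 0.0), (1.0, 0.0), (1.0, 1.0)]

energy_grid = audze_eglais_energy(grid)
energy_clustered = audze_eglais_energy(clustered)
```

For the unit-square grid the four sides contribute 1 each and the two diagonals contribute 1/2 each, giving an energy of 5; the clustered design has a much higher energy because of its closest pair.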

These different formulations can be applied to obtain space-filling designs. A further refinement consists in imposing a user-provided number of levels for each variable. A popular approach for achieving both space-filling properties and a controlled number of levels is the optimal Latin hypercube design [39], [40]. Some of the reasons for its popularity are discussed in [41]. A Latin hypercube design consists in defining a grid over the design space with $N$ levels for each variable. A random sampling is carried out so as to obtain only one point per level for each variable. Due to the random nature of the sampling, Latin hypercube designs are not necessarily space-filling, but they can easily be made so by adding an optimization step based on one of the previous space-filling criteria. This leads to so-called optimal Latin hypercube designs, which can be constructed either through optimization [42], [43] or geometric construction [44]. In the context of constructing a DoE in terms of non-dimensional variables there is an additional issue that needs to be handled: constrained design domains. Unlike in traditional DoE, where only lower and upper bounds are typically considered for each variable, the constraints in the dimensionless space may be more complex. Compared to the literature on general experimental design, relatively few works have investigated constrained design of experiments. Petelet et al. [45] propose to achieve a constrained DoE through permutations on initial Latin hypercube designs, but the approach does not guarantee space-filling properties. Fuerle and Sienz [46] propose a constrained optimization formulation which they solve by genetic algorithms. Hofwing and Strömberg [47] propose a hybrid formulation solved by a combination of a genetic algorithm and sequential linear programming. Mysakova et al. [48] propose, for convex constraints, a formulation based on Delaunay triangulation. These approaches for constrained DoE all handle a single design space. For constructing dimensionless surrogate models two design spaces are needed, the dimensional (or physical) design space and the dimensionless one, and constraints can be present in each of them. Accordingly, none of the previous approaches is directly applicable to constructing a DoE for dimensionless spaces while keeping the correspondence between the dimensional and dimensionless points of the DoE. The next sections provide a methodology for constructing such DoEs that are also space-filling and that allow control over the density of points in each variable.
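The random Latin hypercube sampling mentioned above (one point per level for each variable) can be sketched as follows. This is the plain random variant on the unit hypercube, without the subsequent space-filling optimization step; the function name `latin_hypercube` and its parameters are hypothetical.

```python
import random

def latin_hypercube(n_points, n_vars, rng=None):
    """Random Latin hypercube on [0, 1)^n_vars.

    Each variable is split into n_points equal cells and every cell of
    every variable receives exactly one point.
    """
    rng = rng or random.Random()
    columns = []
    for _ in range(n_vars):
        cells = list(range(n_points))
        rng.shuffle(cells)  # random pairing of cells across variables
        columns.append([(cell + rng.random()) / n_points for cell in cells])
    return [tuple(column[i] for column in columns) for i in range(n_points)]

design = latin_hypercube(5, 2, random.Random(0))
```

Scaling each coordinate to the actual variable bounds, and then permuting the cells to improve one of the criteria of eqs. (7), (8) or (11), would lead towards an optimal Latin hypercube design.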

3 The proposed method

This section presents the proposed method to construct a DoE for the dimensionless and physical spaces. As illustrated in section 1.1, the main drawbacks of classical methods for constructing the DoE in the dimensionless space and the corresponding DoE in the physical space are:

- a classical DoE constructed in the physical space may lead to a poor distribution of the corresponding DoE in the dimensionless space;
- it may be difficult to propagate the constraints from the physical space to the dimensionless space;
- a DoE in the dimensionless space has a non-unique solution for the corresponding physical space.

In order to overcome these drawbacks we propose to design the DoE in the physical space but its construction should be based on the optimization of a distribution criterion in the corresponding dimensionless space. Thus, the problem is defined as an optimization problem where the manipulated variables are the physical variables and the cost function is the distribution criterion of the corresponding DoE in the dimensionless space. Constraints on both the physical and non-dimensional variables are considered in the optimization formulation. In this way it is not necessary to propagate the constraints from the physical space to the dimensionless one and there is no problem with the non-uniqueness of the DoE in the physical space. The desired distribution criterion for the DoE was chosen to be the optimal space filling criterion. The considered cost function to be maximized is thus the minimal distance between any two points. The mathematical formulation of this problem is described in the following section.

3.1 General optimization problem

Let us consider a set of $n$ physical variables, denoted $x_k$, $\forall k \in \{1, \dots, n\}$, that form $m$ dimensionless variables, denoted $\pi_l$, $\forall l \in \{1, \dots, m\}$. Note that according to the Vaschy-Buckingham theorem $m < n$. The relations between the physical and dimensionless variables are:

$$\pi_l = f_l(x_1, \dots, x_n), \quad \forall l \in \{1, \dots, m\} \tag{12}$$

This means that the physical space has $n$ variables whereas the dimensionless space has $m$ variables. Designing a DoE containing $N$ experiments is equivalent to placing $N$ points in both spaces, each point having the coordinates $X^i = (x_1^i, x_2^i, \dots, x_n^i)$, $\forall i \in \{1, \dots, N\}$, in the physical space and $\Pi^i = (\pi_1^i, \pi_2^i, \dots, \pi_m^i)$, $\forall i \in \{1, \dots, N\}$, in the corresponding dimensionless space. Note that here the superscript $i$ is an index and not a power coefficient. The physical and dimensionless variables may be bounded by upper and lower bounds:

$$\begin{aligned} x_{k,\min} \le x_k \le x_{k,\max}, \quad & \forall k \in \{1, \dots, n\} \\ \pi_{l,\min} \le \pi_l \le \pi_{l,\max}, \quad & \forall l \in \{1, \dots, m\} \end{aligned} \tag{13}$$

In order to ensure that only realistic configurations are simulated, the physical variables may be subject to $r$ additional inequality constraints, expressed as:

$$g_q(x_1, \dots, x_n) \le 0, \quad \forall q \in \{1, \dots, r\} \tag{14}$$

Note that if, on top of the bounds on the non-dimensional variables (eq. (13)), there are also non-linear constraints, these can always be transformed into non-linear constraints in terms of the physical variables. The formulation can thus handle non-linear constraints in terms of both dimensional and non-dimensional variables. The distance $d$ between two points $\Pi^i$ and $\Pi^j$ of the dimensionless space is defined as the 2-norm (Euclidean distance):

$$d_{ij} = \lVert \Pi^i - \Pi^j \rVert_2, \quad \forall i, j \in \{1, \dots, N\},\ i < j \tag{15}$$

with $\Pi^i$, $i \in \{1, \dots, N\}$, being vectors of length $m$ whose elements are the coordinates of the points in the dimensionless space, $\Pi^i = (\pi_1^i, \dots, \pi_m^i)$. These points can also be expressed in terms of the physical variables, by introducing eq. (12) in eq. (15), which gives:

$$\Pi^i = \left( f_1(x_1^i, \dots, x_n^i), \dots, f_m(x_1^i, \dots, x_n^i) \right) \tag{16}$$

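Considering the above notations, the DoE can be computed by solving an optimization problem formulated as:

$$\begin{aligned} \underset{x_1^1, \dots, x_n^1, \dots, x_1^N, \dots, x_n^N}{\text{Maximize}} \quad & \min \left[ d_{ij} \right], && \forall i, j \in \{1, \dots, N\},\ i < j \\ \text{subject to} \quad & x_{k,\min} \le x_k^i \le x_{k,\max}, && \forall k \in \{1, \dots, n\},\ \forall i \in \{1, \dots, N\} \\ & \pi_{l,\min} \le f_l(x_1^i, \dots, x_n^i) \le \pi_{l,\max}, && \forall l \in \{1, \dots, m\},\ \forall i \in \{1, \dots, N\} \\ & g_q(x_1^i, \dots, x_n^i) \le 0, && \forall q \in \{1, \dots, r\},\ \forall i \in \{1, \dots, N\} \end{aligned} \tag{17}$$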

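with $d_{ij}$ defined as in eq. (15), where the points $\Pi^i$ are defined in eq. (16). In order to solve this optimization problem, whose cost is the minimum over multiple functions $d_{ij}$, it can be transformed into an equivalent problem with a single cost function $z$:

$$\begin{aligned} \underset{z,\, x_1^1, \dots, x_n^1, \dots, x_1^N, \dots, x_n^N}{\text{Maximize}} \quad & z \\ \text{subject to} \quad & z \le d_{ij}, && \forall i, j \in \{1, \dots, N\},\ i < j \\ & x_{k,\min} \le x_k^i \le x_{k,\max}, && \forall k \in \{1, \dots, n\},\ \forall i \in \{1, \dots, N\} \\ & \pi_{l,\min} \le f_l(x_1^i, \dots, x_n^i) \le \pi_{l,\max}, && \forall l \in \{1, \dots, m\},\ \forall i \in \{1, \dots, N\} \\ & g_q(x_1^i, \dots, x_n^i) \le 0, && \forall q \in \{1, \dots, r\},\ \forall i \in \{1, \dots, N\} \end{aligned} \tag{18}$$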

By discarding the variable $z$ from the solution of the optimization problem (18) one obtains the DoE for the physical space and, using the transformations from eq. (12), the corresponding DoE in the dimensionless space. In this paper we use for optimization the max-min approach introduced in eq. (7) and implemented in Matlab according to [49]. On all our test problems this implementation was very efficient. For problems for which this implementation proves too expensive, the optimization can always be reformulated in terms of the cost function of eq. (8). Although the cost function is linear, the constraints $z \le d_{ij}$ are nonconvex. This implies that even if the functions $f_l$ and $g_q$ are convex, the entire optimization problem is not convex. Therefore, a global optimization algorithm is advisable for solving the optimization problem (18). When addressing nonconvex optimization problems, the initial guess of the solution is of particular importance in order to increase the chance of finding the global optimum. Therefore, in the following section we propose a method to rapidly generate an initial guess.
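To make the formulation concrete, the sketch below evaluates the cost of problem (17) for the beam example of section 1.1 and improves a feasible design with a naive random-perturbation hill climb. This is not the Matlab max-min solver of [49]; the hill climb is a deliberately simple stand-in chosen so that the example needs only the standard library. All numeric bounds, the starting design and the function names (`to_pi`, `feasible`, `objective`, `improve`) are hypothetical, and the distance is taken in linear rather than logarithmic scale.

```python
import itertools
import math
import random

X_BOUNDS = [(0.01, 0.1), (0.01, 0.1), (0.1, 1.0)]  # assumed bounds on H, W, L
PI_BOUNDS = [(0.5, 5.0), (2.0, 50.0)]              # assumed bounds on pi1, pi2

def to_pi(x):
    """Map a physical point (H, W, L) to (pi1, pi2) = (H/W, L/W), cf. eq. (12)."""
    height, width, length = x
    return (height / width, length / width)

def feasible(x):
    """Bounds of eq. (13) checked in both spaces for one physical point."""
    in_x = all(lo <= v <= hi for v, (lo, hi) in zip(x, X_BOUNDS))
    in_pi = all(lo <= v <= hi for v, (lo, hi) in zip(to_pi(x), PI_BOUNDS))
    return in_x and in_pi

def objective(design):
    """Minimum pairwise distance in the dimensionless space, eqs. (15)-(17)."""
    pis = [to_pi(x) for x in design]
    return min(math.dist(a, b) for a, b in itertools.combinations(pis, 2))

def improve(design, iterations=2000, step=0.1, seed=0):
    """Move one physical point at a time; keep only feasible improvements."""
    rng = random.Random(seed)
    best = [tuple(x) for x in design]
    best_value = objective(best)
    for _ in range(iterations):
        i = rng.randrange(len(best))
        candidate = tuple(v * math.exp(rng.uniform(-step, step)) for v in best[i])
        if not feasible(candidate):
            continue
        trial = best[:i] + [candidate] + best[i + 1:]
        value = objective(trial)
        if value > best_value:
            best, best_value = trial, value
    return best, best_value

start = [(0.05, 0.05, 0.5), (0.05, 0.04, 0.5), (0.04, 0.05, 0.4), (0.06, 0.05, 0.6)]
optimized, optimized_value = improve(start)
```

The manipulated variables are the physical ones while the cost is measured in the dimensionless space, which is the key idea of the formulation. A local search of this kind can stall in a poor local optimum, which illustrates why the quality of the starting design matters.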

3.2 Computation of the initial guess of the solution

If convex optimization algorithms are used on nonconvex optimization problems, an initial guess that is close to the global optimum considerably increases the chance of finding the latter. Moreover, a good initial guess decreases the number of iterations of the optimization algorithm. Therefore, instead of solving the optimization problem (18) from an arbitrary initial solution, we propose to construct an initial guess that approaches the desired distribution of the DoE. However, for efficiency reasons, the computational effort to generate the initial guess should be just a small fraction of the optimization effort. Thus the proposed approach seeks to construct first a classical DoE that satisfies the constraints from both spaces but without relying on optimization. The initial guess will certainly not have the best space-filling properties, but the idea is to come as close as possible to such a distribution by using classical DoEs that are not based on optimization. The proposed construction of an initial guess that satisfies the bounds and constraints in both the physical and dimensionless spaces follows the three steps described hereafter. First, a random DoE containing $N_i$ points (experiments) is generated in the physical space, with $N_i$ much higher than the number of points that are needed (typically we chose $N_i = 20000$). This DoE should satisfy only the bounds and constraints imposed on the physical space; the constraints propagated from the dimensionless space are not considered at this point. The corresponding DoE in the dimensionless space is built by applying the transformation from eq. (12). At this stage the number of points greatly exceeds the required number of experiments; it will be reduced later. The purpose of the high number of points is to fill the dimensionless design space densely.
At this step the obtained DoE in the dimensionless space satisfies only the constraints from the physical space. In the following, the sets of points of this DoE are denoted by $X_{-1}$ and $\Pi_{-1}$ for the physical and dimensionless space, respectively. In the second step another DoE having the following features is computed only for the dimensionless space:

1. the number of points should be equal to the desired number of experiments or should correspond to the number of levels on each variable (also called factors in the literature),
2. the DoE should be constrained only by the bounds of the dimensionless variables,
3. the method used to generate the DoE should allow imposing the number of levels on each variable (factor),
4. the generated DoE should have a good space-filling distribution,
5. in order to be time efficient the method should not rely on optimization.

One of the simplest DoEs that satisfies most of the above features is the full factorial (FF) design. It provides a distribution that fills the design space very evenly, it can easily handle a different number of levels on each variable, it does not use optimization, and it is naturally constrained by min-max bounds. That is why the FF design is considered for this step. Note however that if a full factorial is not practical due to the high dimension of the dimensionless space, then an optimal Latin hypercube design could also be used. In the following the set of points of this DoE is denoted by $\Pi_{FF}$. It should be noted however that with the FF design an inconsistency with the desired size of the DoE may occur. If the size of the DoE is specified by the number of levels on each variable, the FF design naturally handles this specification. However, if the size of the DoE is specified by the number of points to be placed, there may be situations where this number is not achievable with the FF design. The number of points generated by the FF design is given by $N_{FF} = n_{\pi_1} n_{\pi_2} \dots n_{\pi_m}$, where $n_{\pi_l}$ is the number of desired levels on the variable $\pi_l$. Thus, it can happen that no combination of levels gives the desired number of points $N$. In this case the proposed algorithm generates the smallest possible number of experiments $N_{FF}$ above the desired size $N$, i.e. the smallest $N_{FF}$ which satisfies $N_{FF} > N$.
Thus, the user can either accept a higher number of experiments or discard the extra-points in the next step. At this moment there are two DoE in the dimensionless space: a very dense one (Ξ β1) that satisfies the constraints of the physical space and a small one (Ξ πΉπΉ ) that has the desired size (except when the remark just above holds), has reasonably good space filling properties and fulfils the constraints from the dimensionless space. Therefore in the third step an βintersectionβ of the two DoE is performed in order to obtain the one that satisfies the constraints form both, physical and dimensionless spaces. This βintersectionβ is performed as follows: for each point from the set Ξ πΉπΉ the closest point from the set Ξ β1 is selected. In the following, the set of points obtained after this βintersectionβ is denoted by Ξ 0 for the dimensionless space and π0 for the corresponding physical space. In the interior domain of Ξ 0 the distribution of the selected points will be almost the same as the one given by the set Ξ πΉπΉ , i.e. distribution of a FF design, whereas on the frontier of the domain of Ξ 0 the points may be more closely spaced. Thus the set of the selected points Ξ 0 will serve as the initial guess to be used for solving the optimization problem (18). The task of the optimization will be to redistribute these points in order to enhance the space filling criterion, i.e. maximize the minimal distance between the points. In case the desired number of points π cannot be equal to the number naturally obtained by FF design, ππΉπΉ , there will be ππΉπΉ β π points that should be discarded from the set Ξ πΉπΉ . The choice is then to discard the points that are the most distant from the set Ξ β1.
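The three steps of the initial guess design described above can be sketched in a few lines. This is only a minimal illustration, not the authors' implementation: it assumes the bounds and the two ratios of the numerical case study of Section 4.1, works directly in log10 coordinates, uses a reduced dense set of 2000 points, and the function names `full_factorial` and `nearest_point_selection` are hypothetical.

```python
import itertools
import math
import random


def full_factorial(bounds, levels):
    """All combinations of evenly spaced levels within [lo, hi] on each axis."""
    axes = []
    for (lo, hi), n in zip(bounds, levels):
        step = (hi - lo) / (n - 1)  # assumes at least two levels per axis
        axes.append([lo + i * step for i in range(n)])
    return list(itertools.product(*axes))


def nearest_point_selection(pi_ff, pi_dense):
    """For each FF point, return the index of the closest dense point."""
    return [
        min(range(len(pi_dense)), key=lambda k: math.dist(p, pi_dense[k]))
        for p in pi_ff
    ]


# Step 1: dense random sample of the physical space, in log10 coordinates.
# Assumed bounds: x1, x3 in [10, 100] and x2 in [1, 1000].
rng = random.Random(0)
log_bounds = [(1.0, 2.0), (0.0, 3.0), (1.0, 2.0)]
x_dense = [tuple(rng.uniform(lo, hi) for lo, hi in log_bounds) for _ in range(2000)]

# Propagation to the dimensionless space:
# log pi1 = log x1 - log x2 and log pi2 = log x2 - log x3.
pi_dense = [(x[0] - x[1], x[1] - x[2]) for x in x_dense]

# Step 2: 7 x 7 full factorial design between the extremes of the dense set.
bounds = [(min(p[i] for p in pi_dense), max(p[i] for p in pi_dense)) for i in range(2)]
pi_ff = full_factorial(bounds, [7, 7])

# Step 3: "intersection", i.e. nearest dense point for each FF point.
idx = nearest_point_selection(pi_ff, pi_dense)
pi_0 = [pi_dense[k] for k in idx]  # initial guess, dimensionless space
x_0 = [x_dense[k] for k in idx]    # corresponding physical points
```

Because every selected point comes from the dense set, the physical constraints hold by construction. Note that this plain nearest-neighbour rule may pick the same dense point for two neighbouring FF points near the frontier; such duplicates would have to be handled separately.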

3.3 Handling of multiple levels during the optimization

As indicated in the previous sections, it is sometimes desirable to have different densities along particular axes of the DoE. This requirement is easily implemented in the computation of the initial guess by imposing the number of levels for each axis of the FF design. However, after the optimization, the DoE will most likely have a different distribution of the levels. In order to maximize the distances between the points, the optimization will take less computational effort to arrange the points in more columns (or rows) along the axis with the longer domain than to squeeze them along the axis with the smaller domain. Indeed, optimization algorithms always set tolerances on the cost function and a maximum number of evaluations. Slightly changing the coordinates of the points along the direction with a narrow domain improves the cost function by a very small amount, which may often be less than the tolerance value. Accordingly, it pays more to change the coordinates along the direction with a larger domain. Therefore, the number of levels after the optimization mainly depends on the ratios of the lengths of the feasible intervals of the axes.

It is certainly possible to decrease the tolerance on the cost function and to increase the maximal number of evaluations. However, in this case the optimization takes much more computational effort. To overcome this problem, we propose another solution that does not increase the computational effort of the optimization: rescale all the axes of the dimensionless space as a function of the number of levels required for each dimensionless variable, run the optimization on the rescaled axes, and then scale the solution back to recover the original coordinates. Consider that $n_{\pi_i}$ levels are required on the axis corresponding to the dimensionless variable $\pi_i$. The length of the feasible interval of the axis $\pi_i$ can be computed as:

$$l_i = \max[\pi_i^1, \dots, \pi_i^N] - \min[\pi_i^1, \dots, \pi_i^N], \quad \forall i \in \{1, \dots, m\} \qquad (19)$$

where $\pi_i^k$ is the coordinate $i$ of the point $\pi^k$ from the initial guess set $\Pi_0$. The scaling coefficient $s_i$ can be defined as:

$$s_i = \frac{n_{\pi_i} - 1}{l_i}, \quad i \in \{1, \dots, m\} \qquad (20)$$

Subtracting 1 from $n_{\pi_i}$ in the above formula converts the number of levels in the interval $l_i$ into the number of segments in this interval. All the axes of the dimensionless space have to be scaled by the coefficients $s_i$ before starting the optimization procedure. After this scaling, the new length of the feasible interval of each axis equals the number of desired segments on that axis, i.e. the number of desired levels minus one, so that adjacent levels are one unit apart on every axis.

To illustrate the scaling, a two-dimensional example is given in Figure 2. For convenience, consider that the feasible domain of the dimensionless space is only bounded by min/max limits on each axis, e.g. $\pi_1 \in [2, 6]$ and $\pi_2 \in [1, 2]$. Suppose it is desired to have a DoE with at least two levels on the $\pi_1$ axis and at least three levels on the $\pi_2$ axis. The initial distribution of the six points is equivalent to the one depicted in Figure 2 (a). The feasible interval of the $\pi_1$ axis is $l_1 = 4$ and that of the $\pi_2$ axis is $l_2 = 1$. During the optimization, in order to maximize the minimal distance between the points, they will be distributed in several columns along the $\pi_1$ axis, regardless of the coordinate $\pi_2$, as shown in Figure 2 (b). The sensitivity of the cost function with respect to the $\pi_2$ axis is much smaller than with respect to $\pi_1$. On the other hand, rescaling the axes with the coefficients calculated by eq. (20) ($s_1 = 0.25$ and $s_2 = 2$) gives the initial guess depicted in Figure 2 (c). In this case the distance between adjacent levels is the same on both axes, and thus the sensitivity of the cost function is equal with respect to both axes.

[Figure 2: three panels (a), (b) and (c). Panels (a) and (b) use the original axes, $\pi_1 \in [2, 6]$ and $\pi_2 \in [1, 2]$; panel (c) uses the scaled axes, $\pi_1' \in [0.5, 1.5]$ and $\pi_2' \in [2, 4]$.]

FIGURE 2. AXES SCALING TO CONTROL THE NUMBER OF LEVELS ON EACH AXIS DURING THE OPTIMIZATION. (A) INITIAL DISTRIBUTION; (B) DISTRIBUTION AFTER OPTIMIZATION WITHOUT AXIS SCALING; (C) INITIAL AND OPTIMIZED DISTRIBUTION WITH AXES SCALING.
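The scaling of eqs. (19) and (20) can be checked numerically on the two-dimensional example of Figure 2. The sketch below is only illustrative; the function name `axis_scaling` is hypothetical and the six-point initial guess is assumed to be the 2 x 3 full factorial design on $[2, 6] \times [1, 2]$.

```python
def axis_scaling(points, levels):
    """Return the interval lengths (eq. 19) and scaling coefficients (eq. 20)."""
    dims = range(len(levels))
    lengths = [max(p[i] for p in points) - min(p[i] for p in points) for i in dims]
    coeffs = [(levels[i] - 1) / lengths[i] for i in dims]
    return lengths, coeffs


# Six-point initial guess: 2 levels on pi1 in [2, 6], 3 levels on pi2 in [1, 2].
initial_guess = [(p1, p2) for p1 in (2.0, 6.0) for p2 in (1.0, 1.5, 2.0)]

lengths, coeffs = axis_scaling(initial_guess, levels=[2, 3])
scaled = [tuple(c * v for c, v in zip(coeffs, p)) for p in initial_guess]

print(lengths, coeffs)  # [4.0, 1.0] [0.25, 2.0]
```

On the scaled axes the points lie at $\pi_1' \in \{0.5, 1.5\}$ and $\pi_2' \in \{2, 3, 4\}$, so adjacent levels are one unit apart in both directions. After the optimization, dividing each coordinate by its coefficient recovers the original scale.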

3.4 Improvement of the distribution in physical space

The proposed approach aims at a good distribution of the DoE in the dimensionless space. Nevertheless, a good distribution of the DoE in the physical space may help to identify errors in the construction of the dimensionless numbers. Obtaining good coverage in both the physical and the dimensionless spaces turns the computation of the DoE into a multi-objective optimization problem. Changing the formulation of the optimization problem (18) to consider a multi-objective cost function is, however, nontrivial: additional decisions are involved, such as setting weighting coefficients or choosing a solution from the Pareto front. Moreover, the optimization process itself becomes heavier. Therefore, in the following we propose a simpler solution, which is not necessarily optimal but which increases the chance of getting better results with minimal effort.

The proposed solution deals with the choice of the initial guess $X_0$. As explained in the first section, multiple combinations of physical coordinates (points in the physical space) give the same coordinate (point) in the dimensionless space. Thus, during the first step of the initial guess design, when a large number of points is generated, there will certainly be points that are very distant from each other in the physical space but close to each other in the dimensionless space. In the light of this observation, the idea is to choose an initial guess that is close to the FF design in the dimensionless space, but that at the same time maximizes the spread of the corresponding points in the physical space. One way to achieve this is to adapt the nearest-point selection of the initial guess design as follows. For each point of the set $\Pi_{FF}$, a group of the three closest points of the set $\Pi_{-1}$ is selected, instead of only the closest one. Then, from each group of three, the point that gives the best distribution of the corresponding points in the physical space is retained. Although the distribution in the physical space is not guaranteed to be optimal after the optimization, starting from a more uniform initial guess is expected to lead to a better space-filling design in the physical space.
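One possible reading of this selection rule is sketched below. The text does not specify how "the best distribution" in the physical space is measured, so the sketch assumes a greedy criterion: among the three shortlisted candidates, keep the one whose physical coordinates are farthest from the points already selected. The function name `select_with_physical_spread` and the toy data are hypothetical.

```python
import math


def select_with_physical_spread(pi_ff, pi_dense, x_dense, n_candidates=3):
    """Greedy variant of the nearest-point selection.

    For each FF point, the n_candidates closest dense points (in the
    dimensionless space) are shortlisted, and the candidate whose physical
    coordinates are farthest from the already selected points is kept.
    """
    selected = []
    for p in pi_ff:
        order = sorted(range(len(pi_dense)), key=lambda k: math.dist(p, pi_dense[k]))
        shortlist = [k for k in order if k not in selected][:n_candidates]
        if not selected:
            selected.append(shortlist[0])  # first point: plain nearest neighbour
            continue
        best = max(
            shortlist,
            key=lambda k: min(math.dist(x_dense[k], x_dense[j]) for j in selected),
        )
        selected.append(best)
    return selected


# Toy data with pi = x1 / x2: several physical points share the same pi value.
x_dense = [(1.0, 1.0), (2.0, 2.0), (4.0, 4.0), (1.0, 2.0), (2.0, 4.0)]
pi_dense = [(x1 / x2,) for x1, x2 in x_dense]
pi_ff = [(1.0,), (0.5,)]

chosen = select_with_physical_spread(pi_ff, pi_dense, x_dense)
print(chosen)  # [0, 4]
```

In the toy data, the points $(1, 2)$ and $(2, 4)$ both map to $\pi = 0.5$; the second is retained because it lies farther from the already selected point $(1, 1)$ in the physical space.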

3.5 Adaptation of the proposed method for power laws

As pointed out in the introduction, dimensionless variables are almost always monomials, i.e. products of powers of physical variables. When the dimensionless domain covers several orders of magnitude, power laws turn out to be well suited for approximating the response of the system. This section therefore provides a particular adaptation of the proposed method for power laws. Consider a power law to be identified from the DoE, expressed as a product of dimensionless numbers, each raised to a power $a_i$:

$$\pi_0 = k \prod_{i=1}^{m} \pi_i^{a_i} \qquad (21)$$

with $k$ being a constant. The parameters to be identified for such functions are the exponents $a_i$ and the constant $k$. In this case a logarithmic transformation is very useful because it transforms the nonlinear regression problem into a linear one:

$$\log \pi_0 = \log k + \sum_{i=1}^{m} a_i \log \pi_i \qquad (22)$$

In this case it is better to have a DoE with a good distribution in the logarithmic scale rather than in the linear scale. This means that the manipulated variables in the physical and in the dimensionless space are logarithmic, which implies that expression (12) becomes linear. This may reduce the computational effort for solving the optimization problem (18). Thus, simply working with the logarithms of the physical and dimensionless variables instead of the variables themselves may significantly improve the efficiency of solving the optimization problem. Finally, the method for constructing the DoE described in this section is summarized by the flowchart in Figure 3.
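The linearization of eq. (22) means that the constant and the exponents of a power law can be identified by ordinary least squares on the logarithms. The sketch below illustrates this on synthetic data; it is not taken from the paper. The symbols `k` and `exponents`, the function names and the test function $\pi_0 = 2\,\pi_1^{0.5}\,\pi_2^{-1.5}$ are assumptions, and the normal equations are solved with a small hand-written Gaussian elimination to keep the example dependency-free.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_power_law(pi_inputs, pi0):
    """Least-squares fit of log10(pi0) = log10(k) + sum_i a_i * log10(pi_i)."""
    rows = [[1.0] + [math.log10(v) for v in p] for p in pi_inputs]
    y = [math.log10(v) for v in pi0]
    n = len(rows[0])
    # Normal equations (A^T A) theta = A^T y
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(n)]
    theta = solve_linear(ata, aty)
    return 10.0 ** theta[0], theta[1:]


# Synthetic check: samples evenly spread in log scale over two decades.
samples = [(p1, p2) for p1 in (0.1, 1.0, 10.0) for p2 in (0.1, 1.0, 10.0)]
response = [2.0 * p1 ** 0.5 * p2 ** -1.5 for p1, p2 in samples]

k, exponents = fit_power_law(samples, response)
```

With samples evenly spread in the logarithmic scale, the normal equations are well conditioned, which is one practical reason for preferring a DoE that is well distributed in log space.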

Initial guess generation
1. Randomly choose a large number of points in the physical domain -> set $X_{-1}$
2. Propagate the set $X_{-1}$ into the dimensionless domain -> set $\Pi_{-1}$
3. Generate an FF DoE in the dimensionless domain -> set $\Pi_{FF}$
4. Select from the set $\Pi_{-1}$ the $N$ nearest points to the set $\Pi_{FF}$ -> set $\Pi_0$

Initial guess optimization
5. Take the logarithm of the dimensionless axes and rescale them, if necessary, with the coefficients computed by eq. (20)
6. Formulate the optimization problem with eq. (18)
7. Solve the optimization problem

FIGURE 3. FLOWCHART OF THE PROPOSED METHOD

The time needed to compute a DoE with the proposed method is mainly driven by the size of the DoE, because increasing the number of points rapidly increases the number of distances $d_{kj}$ to be evaluated during the optimization. Some orders of magnitude observed in our tests, carried out in Matlab on a standard PC (2.3 GHz quad-core CPU and 16 GB of RAM), are: 10 s for a DoE containing 10 points, 1 min for a DoE containing 50 points and 30 min for a DoE containing 100 points. Keep in mind that one of the purposes of building surrogates using dimensionless variables is to have a smaller design space: it rarely exceeds four or five dimensions or 100 design points. For expensive simulation models, the cost of constructing the DoE is thus negligible compared with the cost of carrying out the simulations at the DoE points.
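As a rough indication of this growth, the number of pairwise distances between $N$ points (one per pair $k < j$) can be written as follows, where $n_d$ is a symbol introduced here only for illustration:

```latex
\[
  n_d(N) = \binom{N}{2} = \frac{N(N-1)}{2},
  \qquad
  n_d(10) = 45, \quad n_d(50) = 1225, \quad n_d(100) = 4950 .
\]
```

The number of distance constraints thus grows quadratically with the size of the DoE, while the number of decision variables grows only linearly.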

4 Case studies

This section presents three case studies that illustrate the use of the proposed method and the results it provides. First, a purely numerical example is presented, which illustrates the manipulations of the proposed method and its different variations. In the next two case studies, a DoE is constructed for real-world applications in the electro-mechanical actuation domain [50], aimed at estimating the mechanical stiffness of a structural element and the thermal exchange coefficient of a cylinder.

4.1 Numerical case study

Let us first consider the simple analytical example discussed in the introduction, which involves three independent physical variables, denoted $x_i$, $i \in \{1, 2, 3\}$, and two dimensionless variables, denoted $\pi_1$ and $\pi_2$. The relations between the variables are:

$$\pi_1 = \frac{x_1}{x_2} \quad \text{and} \quad \pi_2 = \frac{x_2}{x_3} \qquad (23)$$

The physical variables may represent the dimensions of a cuboid component, and the dimensionless numbers the aspect ratios of this component. Let us first consider only bounds on the physical variables:

$$10 \le x_i \le 100, \; i \in \{1, 3\} \quad \text{and} \quad 1 \le x_2 \le 1000 \qquad (24)$$

It is considered that the sought function of the dimensionless numbers is a power law, which requires a good distribution of the DoE in the logarithmic scale. Therefore, in the following all the plots are presented in logarithmic scale in order to illustrate the quality of the distribution at each step of the method. It is also considered that the required size of the DoE is 50 points. An initial guess is generated first, after which the optimization is performed in order to fulfill the space-filling criterion.

In the first step of the initial guess design, a random DoE containing a very large number of points (here 20 000) is generated for the physical space. Its only requirement is to provide good coverage of the entire physical domain. Applying the transformations of eq. (23) to the obtained set gives the corresponding DoE in the dimensionless space, which is shown in Figure 4. This figure clearly illustrates how the constraints of the physical space (eq. (24)) are propagated to the dimensionless space (diagonal borders).

FIGURE 4. DOE IN DIMENSIONLESS SPACE AT THE FIRST ITERATION FOR THE DESIGN OF THE INITIAL GUESS (SET $\Pi_{-1}$)

At the second step of the initial guess design, an FF DoE is generated in the dimensionless space. The bounds on each dimensionless axis are calculated as the extremes of the obtained set $\Pi_{-1}$ plotted in Figure 4. These are:

$$0.01 \le \pi_i \le 100, \quad i \in \{1, 2\} \qquad (25)$$

The obtained FF DoE, the set $\Pi_{FF}$, is illustrated by circles in Figure 5 (left). It has seven levels on each factor, thus giving 49 points, which is very close to the desired number of experiments, i.e. 50. At this point, the FF design has no equivalent DoE in the physical space, and it also violates the constraints of the physical space (eq. (24)).
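The bounds of eq. (25) and the diagonal borders of Figure 4 can be verified with a short calculation. The sketch below is an illustration derived here from eqs. (23) and (24), not code from the paper: a point $(\pi_1, \pi_2)$ is reachable only if some $x_2$ satisfies the three bound constraints at once, with $x_1 = \pi_1 x_2$ and $x_3 = x_2 / \pi_2$. The function names are hypothetical.

```python
def ratio_bounds(num, den):
    """Extremes of num / den for positive intervals num = (lo, hi), den = (lo, hi)."""
    return num[0] / den[1], num[1] / den[0]


def is_reachable(pi1, pi2):
    """True if some x2 in [1, 1000] gives x1 = pi1 * x2 and x3 = x2 / pi2 in [10, 100]."""
    lo = max(1.0, 10.0 / pi1, 10.0 * pi2)
    hi = min(1000.0, 100.0 / pi1, 100.0 * pi2)
    return lo <= hi


x1 = x3 = (10.0, 100.0)
x2 = (1.0, 1000.0)
pi1_bounds = ratio_bounds(x1, x2)  # (0.01, 100.0)
pi2_bounds = ratio_bounds(x2, x3)  # (0.01, 100.0)

print(is_reachable(1.0, 1.0))      # True: centre of the domain
print(is_reachable(100.0, 100.0))  # False: corner of the FF design
```

The corner points of the FF design are unreachable because $\pi_1 \pi_2 = x_1 / x_3$ must remain between 0.1 and 10, which is what produces the diagonal borders in the log-log plot.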

[Figure 5: two log-log plots of $\pi_2$ versus $\pi_1$, both axes ranging from $10^{-2}$ to $10^{2}$. Left panel legend: Full Factorial Design, Nearest point selection. Right panel legend: Initial DoE, Optimized DoE.]

FIGURE 5. (LEFT) STEPS 2 AND 3 OF THE COMPUTATION OF THE INITIAL GUESS. (RIGHT) INITIAL SOLUTION AND FINAL, OPTIMIZED, DOE

The last step of the initial guess computation is to choose, from the set illustrated in Figure 4, the points closest to the FF design of Figure 5 (left). The points obtained at this step are shown as red stars in Figure 5 (left). As can be seen, in the interior of the dimensionless domain the selected points are very close to the FF design and therefore well distributed. However, on the frontier of the domain the points are more closely spaced, which gives a non-optimal overall space filling. If the number of experiments for the DoE (here 50 points) is mandatory, then at this stage an additional point can be picked at random from the set $\Pi_{-1}$ generated in step 1, in order to reach a total of 50 experiments.

The next step is to optimize the space-filling criterion starting from the initial guess computed above. The optimization problem is formulated according to eq. (18), where the last two inequality constraints, i.e. the bounds $\pi_{i,\min} \le \pi_i(x_1^k, \dots, x_n^k) \le \pi_{i,\max}$ on the dimensionless variables and the additional constraint on $(x_1^k, \dots, x_n^k)$, do not exist in this problem. Since we are working in logarithmic scale, the manipulated variables are the logarithms of the physical variables. Consequently, the constraints of eq. (24) have to be transformed into log space in order to be used in the optimization. The results of the optimization are given in Figure 5 (right). The optimization effectively redistributed the points of the initial guess so as to give a good space filling of the design space while satisfying the constraints of the physical domain.

Since evaluating the quality of a distribution from its graphical representation may be subjective, and is even impossible above three dimensions, we defined a numerical distribution indicator. To compute this indicator, the first step is to calculate, for each point, the distance to its nearest neighbor. The distribution indicator, $Q$, is then defined as the ratio between the standard deviation of all these distances and their mean. Thus, a value close to one (or above) indicates that the distances between neighboring points are very different, which corresponds to an uneven distribution, whereas a small value indicates that the points are spaced more uniformly. Moreover, since this indicator is not the criterion used in the optimization problem, it provides a more objective evaluation of the distribution.

The distribution indicator calculated for the optimized DoE in Figure 5 (right) is $Q \approx 1.7 \times 10^{-9}$, which indicates a very good distribution. By comparison, this indicator is $Q \approx 0.7$ for the initial guess, which is also plotted in Figure 5 (right). Here, the indicator reveals an uneven distribution, which is clearly observable between the points lying along the frontier and those in the interior of the domain.

Consider now the same example as above, but with an additional constraint on a dimensionless variable:

$$1 \le \pi_1 \le 10 \qquad (26)$$

Additionally, suppose that at least ten levels are needed on the variable $\pi_1$, based on a priori knowledge that the model is highly nonlinear with respect to this variable. If the number of experiments is the same as above, about 50 points, the appropriate initial full-factorial design has 10 levels on $\pi_1$ and 5 levels on $\pi_2$. Applying the same steps as above gives the DoE before and after optimization shown in Figure 6 (left). The initial design satisfies the requirement of 10 levels on $\pi_1$ but is not optimally distributed ($Q \approx 0.42$). After the optimization, the distribution is more even ($Q \approx 1.5 \times 10^{-6}$), but only about 6 levels on $\pi_1$ can be clearly distinguished. Note that visually the points do not seem to be well distributed. This is only an illusion due to the different scales of the two axes: three decades are represented on the $\pi_2$ axis and only one decade on the $\pi_1$ axis. To recover the desired number of levels during the optimization, the scaling computed by eq. (20) is applied to the dimensionless space. This gives the DoE depicted in Figure 6 (right), where the 10 levels can be clearly identified on the $\pi_1$ axis. The proposed method can thus effectively handle the desired number of levels on each axis, independently of the constraints imposed on the dimensional or dimensionless variables.
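The distribution indicator $Q$ defined above is straightforward to compute. The following sketch assumes the population standard deviation (the text does not say whether the population or the sample estimate is used) and points given in the coordinates in which the distribution is assessed, e.g. log10 coordinates; the function name `distribution_indicator` and the two test sets are hypothetical.

```python
import math
import statistics


def distribution_indicator(points):
    """Ratio std / mean of the nearest-neighbour distances of a point set."""
    nearest = []
    for k, p in enumerate(points):
        nearest.append(min(math.dist(p, q) for j, q in enumerate(points) if j != k))
    return statistics.pstdev(nearest) / statistics.mean(nearest)


# A regular grid: every point has the same nearest-neighbour distance.
grid = [(float(i), float(j)) for i in range(4) for j in range(4)]

# An uneven set: two nearly coincident points and three isolated ones.
clustered = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (9.0, 0.0), (9.0, 4.0)]

q_grid = distribution_indicator(grid)
q_clustered = distribution_indicator(clustered)
```

For the regular grid all nearest-neighbour distances are equal and the indicator is exactly zero, whereas the clustered set gives a markedly larger value. The double loop evaluates all pairs of points, which is inexpensive for the DoE sizes considered here.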

[Figure 6: two log-log plots of $\pi_2$ (from $10^{-2}$ to $10^{1}$) versus $\pi_1$ (from $10^{0}$ to $10^{1}$). Legend of both panels: Initial DoE, Optimized DoE.]

FIGURE 6. DOE WITH MIN-MAX BOUNDS ON $\pi_1$: (LEFT) THE NUMBER OF LEVELS ON $\pi_1$ IS NOT SATISFIED; (RIGHT) THE NUMBER OF LEVELS ON $\pi_1$ IS SATISFIED

4.2 Mechanical example

In the following, the proposed method is applied to a real-world example which, compared with the previous one, includes additional constraints on the dimensional variables. The case under study is a structural element (a connecting rod) designed to link the rotor of an electrical machine to a mechanical structure. The geometrical configuration of this component is illustrated in Figure 7. The aim is to design $N$ experiments in order to evaluate the maximum Von Mises stress of this component using the dimensionless approach. This DoE should be used to construct a surrogate model of the stress, which could be used during a preliminary design stage. The maximum Von Mises stress $\sigma_v$ of the rod depends on its geometrical parameters $D_1$, $D_2$, $e_1$, $e_2$, $L_a$ and $e_a$, and on the torque $T$ applied on the ring on the motor's side, of diameter $D_1$. From practical considerations, it is assumed that the thicknesses $e_1$ and $e_2$ are set to 25% of the diameters $D_1$ and $D_2$, respectively. The problem can be transformed into a dimensionless representation as:

$$\pi_0 = f(\pi_1, \pi_2, \pi_3) \quad \text{with} \quad \pi_0 = \frac{\sigma_v L_a^3}{T}, \quad \pi_1 = \frac{D_1}{L_a}, \quad \pi_2 = \frac{D_2}{L_a} \quad \text{and} \quad \pi_3 = \frac{e_a}{L_a} \qquad (27)$$

[Figure 7: drawing of the connecting rod showing the dimensions $D_1$, $D_2$, $e_1$, $e_2$, $e_a$ and $L_a$.]

FIGURE 7. GEOMETRICAL CONFIGURATION OF THE CONNECTING ROD

TABLE 1. DOMAIN OF DEFINITION OF PHYSICAL VARIABLES FOR THE MECHANICAL EXAMPLE

Variable   Unit   Range
$D_1$      mm     10 – 50
$D_2$      mm     10 – 50
$e_a$      mm     5 – 30
$L_a$      mm     150 – 300

The DoE to be defined concerns the dimensionless variables $\pi_1$, $\pi_2$ and $\pi_3$ and the physical variables $D_1$, $D_2$, $L_a$ and $e_a$. Given the number of dimensionless variables, a DoE of 27 experiments is considered. The domains of variation of the physical variables are given in Table 1. Moreover, in order to keep the design realistic, the following additional constraints are imposed on the physical variables:

$$D_1 + D_2 + e_1 + e_2 < 0.5\,L_a \qquad (28)$$

$$3 > \frac{D_1}{D_2} > \frac{1}{3} \qquad (29)$$

Eq. (28) constrains the DoE to avoid the superposition of the two holes (of diameters $D_1$ and $D_2$) while imposing a minimal amount of material in between, whereas eq. (29) avoids unreasonable geometrical shapes of the rod. For this example, there are no additional constraints on the dimensionless numbers. As in the previous example, it is considered that a good distribution of the DoE in the dimensionless space is needed in the logarithmic domain. For this application, the optimization problem (18) becomes:

$$
\begin{aligned}
\underset{z,\,D_1^k,\,D_2^k,\,e_a^k,\,L_a^k}{\text{Maximize}} \quad & z, && \forall k \in \{1, \dots, 27\} \\
\text{Subject to} \quad & z - d_{kj} \le 0, && \forall k, j \in \{1, \dots, 27\},\; k < j \\
& 10 \le D_1^k \le 50 \\
& 10 \le D_2^k \le 50 \\
& 5 \le e_a^k \le 30 \\
& 150 \le L_a^k \le 300 \\
& D_1^k + D_2^k + e_1^k + e_2^k - 0.5\,L_a^k < 0 \\
& D_1^k - 3 D_2^k < 0 \\
& D_2^k - 3 D_1^k < 0
\end{aligned}
\qquad (30)
$$

with $d_{kj}$ defined as in eq. (15), where $\pi^k = \left( \log_{10} \frac{D_1^k}{L_a^k},\; \log_{10} \frac{D_2^k}{L_a^k},\; \log_{10} \frac{e_a^k}{L_a^k} \right)$, $\forall k \in \{1, \dots, 27\}$. As can be seen, the first constraint, involving the Euclidean distance, is nonlinear. After applying the proposed procedure to compute an initial guess and solving the optimization problem (30), the initial and optimized DoE obtained are plotted in Figure 8. For a three-dimensional space it is difficult to assess the quality of the distribution visually, and it is impossible for higher dimensions. That is why the previously defined quality indicator $Q$ is mainly used in this case. In this example, the indicator is $Q \approx 0.32$ for the initial guess and $Q \approx 1.8 \times 10^{-2}$ for the optimized DoE, which indicates a clear improvement over the initial guess.
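The constraints of the mechanical example can be gathered in a small feasibility check, together with the mapping to the logarithmic dimensionless coordinates used in the distance computation. This is only an illustrative sketch: the argument names `d1`, `d2`, `e_a` and `l_a` stand for the physical variables of Table 1 (in mm) and are naming assumptions, as are the function names.

```python
import math


def is_feasible(d1, d2, e_a, l_a):
    """Check the bounds of Table 1 and constraints (28)-(29) for one configuration."""
    e1, e2 = 0.25 * d1, 0.25 * d2  # thicknesses fixed at 25 % of the diameters
    in_bounds = (
        10 <= d1 <= 50 and 10 <= d2 <= 50 and 5 <= e_a <= 30 and 150 <= l_a <= 300
    )
    no_overlap = d1 + d2 + e1 + e2 < 0.5 * l_a  # eq. (28)
    sane_ratio = 1 / 3 < d1 / d2 < 3            # eq. (29)
    return in_bounds and no_overlap and sane_ratio


def to_log_pi(d1, d2, e_a, l_a):
    """Coordinates of one configuration in the logarithmic dimensionless space."""
    return (math.log10(d1 / l_a), math.log10(d2 / l_a), math.log10(e_a / l_a))


print(is_feasible(20, 20, 10, 200))  # True
print(is_feasible(50, 50, 10, 200))  # False: the two holes would overlap
print(is_feasible(10, 40, 10, 300))  # False: diameter ratio below 1/3
```

Such a check is also a convenient way of counting how many points of a classical design, e.g. an LHS design on the physical variables, remain usable once the constraints are applied.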

Since the final purpose of any DoE is to improve the quality of the regression model, we compared the accuracy of the models obtained by using the proposed DoE and an optimal LHS design on the physical variables (which is a classical DoE). The projection of the LHS design in the dimensionless space is depicted in Figure 8; its distribution quality coefficient is $Q \approx 0.53$. Despite its good distribution in the physical space, this DoE does not conserve this property in the dimensionless space. Moreover, only 24 points out of 27 satisfy the constraints of eqs. (28) and (29). The two DoE were first used to construct two surrogate models, and the quality of the obtained surrogates was compared in terms of maximum error, both on the DoE and on a separate validation set of 60 points (in order to test interpolation and extrapolation capabilities). Note that for this problem the choice of the surrogate type is not trivial. We first tested polynomial models, but they were entirely unsuitable for this study: the prediction error exceeded 1000% on the validation set for both DoE. We then chose a constant power law model as the surrogate type. The maximal error at the DoE points was 9% using the LHS design and 14% using the proposed DoE. However, when the models were tested on the validation set, the one built on the LHS design exhibited 18% maximal error, whereas the model built on the proposed DoE exhibited only 6%. These errors are relatively high for both DoE, which indicates that the surrogate type is still not appropriate for this problem. Therefore, we turned to variable power law models, in which the exponents of $\pi_1$, $\pi_2$ and $\pi_3$ are themselves functions of the logarithms of the dimensionless variables, as proposed in [23]. By doing so, the maximal regression error dropped to 3% for both the optimal LHS and the proposed DoE. However, when the models were tested on the validation set, the one built on the LHS design exhibited 15% maximal error, whereas the model built on the proposed DoE remained at 3%. These results confirm the intuitive assessment that a DoE with a good distribution indicator $Q$ leads to more accurate surrogate models over the entire domain (i.e. including validation points that were not used for the surrogate construction). Table 2 gives an overview of the comparison results.

[Figure 8: three-dimensional log-scale plot in the ($\pi_1$, $\pi_2$, $\pi_3$) space. Legend: LHS on physical variables, Initial DoE, Optimized DoE.]

FIGURE 8. DOE FOR THE MECHANICAL CASE: INITIAL GUESS AND OPTIMIZED

TABLE 2. OVERVIEW OF MODEL ACCURACY FOR THE MECHANICAL CASE STUDY WHEN USING THE PROPOSED AND CLASSICAL DOE

Columns: (a) size of the DoE satisfying the constraints / desired size of the DoE; (b) maximal relative prediction error at the points of the DoE that served for surrogate construction; (c) maximal relative prediction error at the points of the validation set.

DoE        Surrogate model        (a)      (b)    (c)
LHS        Constant power law     24/27    9%     18%
LHS        Variable power law     24/27    3%     15%
Proposed   Constant power law     27/27    14%    6%
Proposed   Variable power law     27/27    3%     3%

4.3 Thermal example

For this last example, let us consider the problem of assessing the convective heat transfer coefficient of a cylinder. This problem is typical for the selection, at a preliminary design stage, of an electrical motor whose continuous torque depends on its thermal balance. This example highlights the effects of constraint propagation between the two spaces, which may generate problems if the DoE is constructed in one space only (either dimensional or dimensionless), as explained in the introduction. The mean convective heat transfer coefficient $\bar{h}$ of a cylinder of length $L$, diameter $D$ and elevation of the skin temperature above the ambient temperature $\Delta\theta$ can be expressed in terms of dimensionless variables as:

$$Nu = f(\pi, Pr, Gr) \qquad (31)$$

with $Nu = \frac{\bar{h} D}{\lambda}$ the Nusselt number, $\pi = \frac{L}{D}$ the aspect ratio of the cylinder, $Pr = \frac{\mu C_p}{\lambda}$ the Prandtl number, and $Gr = \frac{g \beta \rho^2 \Delta\theta D^3}{\mu^2}$ the Grashof number. In this problem the parameters $\rho$, $\mu$, $C_p$, $\beta$ and $\lambda$ represent the physical properties of air, which for convenience are considered constant here (although they vary slightly with temperature); $g$ is the gravitational acceleration. The $Pr$ number is therefore constant as well, and it is not considered for the DoE. Thus, for the DoE there are two dimensionless variables, $\pi$ and $Gr$, which depend on three physical variables, i.e. $L$, $D$ and $\Delta\theta$. The problem of modeling the thermal convection around a cylinder is often encountered in the literature, and it is usually modeled by a constant power law [26].

Let us consider that we want to explore configurations with physical variables varying within the domains presented in Table 3. Additionally, it is desired to restrict the DoE to realistic shapes of a motor, whose aspect ratio is bounded as $\pi_{min} < \pi < \pi_{max}$. For convenience, these limits are taken as $\pi_{min} = 0.5$ and $\pi_{max} = 3$. If one built a full-factorial DoE in the physical domain, the resulting DoE in the dimensionless domain would be the one in Figure 9. The number of points in this example is intentionally higher than necessary for regression, in order to better illustrate the problem of constraint propagation. As can be seen, the points are poorly distributed in the dimensionless space, which may negatively impact the quality of the surrogate model estimated from eq. (31). Moreover, half of the points are situated outside the considered limits of $\pi$. Although having points outside the domain of interest may be seen as a non-issue, there is a drawback. Considering that the model in eq. (31) is a response surface model, the regression process minimizes the estimation error with respect to every point of Figure 9. This leads to a compromise between the estimation errors at the points within the domain of interest of $\pi$ and those at the points outside this domain. Consequently, besides spending unnecessary resources on the simulation of unrealistic configurations, this compromise may reduce the fidelity of the model inside the domain of interest.

There is yet another danger when using this DoE. For Grashof numbers lower than $10^8$ the flow is laminar, whereas for $Gr > 10^9$ the flow is turbulent (in between, the regime is transitional). Therefore, in order to obtain consistent simulation results, different finite element models are needed for each situation (laminar or turbulent flow). Since the DoE often has a significant number of configurations to be simulated, the common practice is to launch them in batch mode. In this case a single parametric model is used, in which some of the physical variables are varied according to the DoE. As can be seen from Figure 9, the problem is that a single model is then used for different flow regimes. Thus, in the best case the simulation will fail for some configurations, and in the worst case the simulation results obtained will be wrong.

Given the problems highlighted for a DoE constructed in the physical space, the other possibility is to build a DoE in the dimensionless space and then calculate an equivalent DoE in the physical space. Here, however, two other problems arise: (1) there is no lower bound for the Grashof number, and (2) the points placed outside the limits induced by the constraints of the physical domain are unfeasible. Consequently, in order to satisfy the constraints from both the physical and the dimensionless domains, the proposed solution solves a constrained optimization problem which considers dimensional and non-dimensional constraints simultaneously.
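The mixing of flow regimes in a full-factorial design on the physical variables can be illustrated numerically. The sketch below is not from the paper: the air properties are rough textbook values near ambient conditions, assumed here only for illustration, the three levels per variable are arbitrary, and the function names are hypothetical. The Grashof number uses its standard definition and the regime thresholds are those quoted in the text.

```python
import itertools

# Assumed, approximate properties of air near ambient conditions (illustrative only).
G = 9.81        # gravitational acceleration, m/s^2
RHO = 1.2       # density, kg/m^3
MU = 1.8e-5     # dynamic viscosity, Pa.s
BETA = 1 / 300  # thermal expansion coefficient, 1/K


def grashof(d, delta_theta):
    """Grashof number based on the diameter d (m) and temperature elevation (K)."""
    return G * BETA * RHO**2 * delta_theta * d**3 / MU**2


def regime(gr):
    """Flow regime according to the thresholds quoted in the text."""
    if gr < 1e8:
        return "laminar"
    if gr > 1e9:
        return "turbulent"
    return "intermediate"


# Full-factorial design on the physical variables, within the ranges of Table 3.
levels_l = [0.1, 0.3, 0.5]
levels_d = [0.1, 0.55, 1.0]
levels_dt = [50.0, 75.0, 100.0]

report = []
for l, d, dt in itertools.product(levels_l, levels_d, levels_dt):
    inside = 0.5 <= l / d <= 3.0  # aspect-ratio limits of the domain of interest
    report.append((l / d, grashof(d, dt), inside, regime(grashof(d, dt))))

regimes_found = {row[3] for row in report}
n_inside = sum(row[2] for row in report)
```

Because $Gr$ scales with $D^3$, a tenfold range on the diameter alone spreads the design over three decades of Grashof number, which is why a single batch of simulations can end up covering several flow regimes under these assumed properties.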

TABLE 3. DOMAIN OF DEFINITION OF PHYSICAL VARIABLES FOR THE THERMAL EXAMPLE

Variable         Unit   Range
$L$              m      0.1 – 0.5
$D$              m      0.1 – 1
$\Delta\theta$   K      50 – 100
$L/D$            –      0.5 – 3

[Figure 9: log-log plot of $Gr$ (from $10^{6}$ to $10^{10}$) versus $\pi$ (from $10^{-1}$ to $10^{1}$), showing the propagation of the constraints on the physical variables, the limits $\pi = 0.5$ and $\pi = 3$, and the line $Gr = 10^{8}$.]

FIGURE 9. DOE IN THE DIMENSIONLESS SPACE CORRESPONDING TO A FULL-FACTORIAL DOE IN PHYSICAL SPACE

Applying the proposed method to this application, the optimization problem (18) becomes:

$$
\begin{aligned}
\underset{z,\,L_k,\,D_k,\,\Delta\theta_k}{\text{Maximize}} \quad & z, && \forall k \in \{1, \dots, 9\} \\
\text{Subject to} \quad & z - d_{kj} \le 0, && \forall k, j \in \{1, \dots, 9\},\; k < j \\
\text{Constraints on physical variables} \quad & 0.1 \le L_k \le 0.5 \\
& 0.1 \le D_k \le 1 \\
& 50 \le \Delta\theta_k \le 100 \\
\text{Constraints on dimensionless variables} \quad & L_k - 3 D_k \le 0 \\
& 0.5\,D_k - L_k \le 0 \\
& g \beta \rho^2 \Delta\theta_k D_k^3 - 10^9 \mu^2 \le 0
\end{aligned}
\qquad (32)
$$

with $d_{kj}$ defined as in eq. (15), where $\pi^k = \left( \log_{10} \frac{L_k}{D_k},\; \log_{10} \frac{g \beta \rho^2 \Delta\theta_k D_k^3}{\mu^2} \right)$, $\forall k \in \{1, \dots, 9\}$. Note that in this example the index $k$ is written as a subscript rather than as an exponent, in order to avoid confusion with $D^3$ in the definition of the Grashof number. After applying the proposed procedure to compute an initial guess and solving the optimization problem (32), the initial and optimized DoE obtained are plotted in Figure 10. In this example, the distribution indicator is $Q \approx 0.33$ for the initial guess and $Q \approx 7 \times 10^{-2}$ for the optimized DoE, which indicates a clear improvement over the initial guess. By construction, the DoE obtained satisfies the constraints of both the physical and the dimensionless spaces.

[Figure 10: log-log plot of $Gr$ (from $10^{6}$ to $10^{8}$) versus $\pi$. Legend: LHS on physical variables, Initial DoE, Optimized DoE.]

FIGURE 10. DOE FOR THE THERMAL EXAMPLE: INITIAL GUESS AND OPTIMIZED

As in the previous example, we tested the impact of the proposed DoE on the model accuracy by comparison with an LHS design on the physical variables. The projection of the LHS design in the dimensionless space is illustrated in Figure 10. Its distribution quality coefficient is $Q \approx 1.1$, and it can be seen that only a small fraction of the domain of interest is covered. This time, only 4 points out of 9 satisfy the constraints on the dimensionless space represented in Figure 9. As stated before, a constant power law is well suited for this component. Accordingly, we considered a surrogate of the form $\pi_0 = k\,\pi^{a}\,Gr^{b}$, where $\pi_0$ stands for the Nusselt number. An overview of the errors obtained by the models built on these two DoE is given in Table 4. The conclusions are similar to those of the previous case: (1) the proposed DoE makes it possible to control the size of the DoE without losing points that violate the constraints of either domain, and (2) the model built using the proposed DoE is more accurate over the entire domain than the one built on a classical DoE that does not provide a good distribution in the dimensionless space.

TABLE 4: OVERVIEW OF MODEL ACCURACY FOR THE THERMAL CASE STUDY WHEN USING PROPOSED AND CLASSICAL DOE

| | Model using LHS DoE | Model using proposed DoE |
|---|---|---|
| Size of the DoE satisfying the constraints / desired size of DoE | 4/9 | 9/9 |
| Maximal relative prediction error at the points of the DoE that served for surrogate construction | 1% | 4% |
| Maximal relative prediction error at the points of the validation set | 10% | 5% |
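The first row of the table counts how many points of a design built on the physical variables remain admissible once projected onto the dimensionless space. The following sketch illustrates that bookkeeping with a basic Latin hypercube written in NumPy. All bounds, fluid properties and function names are hypothetical, and the count it prints is not meant to match the 4/9 reported above.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """Basic LHS on [0, 1)^d: one sample per stratum along every axis."""
    strata = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (strata + rng.random((n, d))) / n

def count_feasible(points, lower, upper):
    """Count points inside the box [lower, upper]; also return the mask."""
    inside = np.all((points >= lower) & (points <= upper), axis=1)
    return int(inside.sum()), inside

rng = np.random.default_rng(0)
n = 9
unit = latin_hypercube(n, 3, rng)

# Hypothetical physical bounds for L [m], D [m] and delta_theta [K].
phys_lo = np.array([0.01, 0.01, 5.0])
phys_hi = np.array([1.0, 0.1, 50.0])
L, D, dtheta = (phys_lo + unit * (phys_hi - phys_lo)).T

# Assumed air-like properties (SI units).
g, beta, rho, mu = 9.81, 3.4e-3, 1.2, 1.8e-5
grashof = g * beta * rho**2 * dtheta * D**3 / mu**2
pts = np.column_stack([np.log10(L / D), np.log10(grashof)])

# Hypothetical box constraints in the dimensionless (log10) space.
pi_lo = np.array([0.0, 6.0])
pi_hi = np.array([1.0, 8.0])
n_ok, mask = count_feasible(pts, pi_lo, pi_hi)
print(f"{n_ok}/{n} points satisfy the dimensionless constraints")
```

Points rejected by the mask would have to be discarded, which is why a design built only on the physical variables cannot guarantee the final size of the DoE.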

5 Conclusions

This paper first introduced the problem of constructing a DoE for applications where the sought model takes dimensionless variables as inputs. It was shown that building a DoE in one domain (physical or dimensionless) and computing the corresponding DoE in the other domain is often problematic because of the constraints from both spaces. The paper proposed a solution to this problem by formulating the computation of the DoE as an optimization problem and providing an approach for solving it efficiently. The proposed method gives: (1) a DoE for the dimensionless space that optimally fills the dimensionless domain and satisfies the constraints from both spaces, and (2) the corresponding DoE in the physical domain, which should be used to set up the simulations. The optimality criterion used during the optimization is the maximization of the minimal Euclidean distance between any two points of the DoE. To assess the distribution of the DoE obtained after the optimization, a quantitative indicator was proposed. This indicator is especially useful when the DoE has three or more dimensions, where it is difficult or even impossible to assess the quality of the distribution visually. The proposed DoE is particularly relevant for constructing surrogates in terms of dimensionless variables that span several orders of magnitude (e.g. power laws). Additionally, two variants of the method were outlined for situations when (1) the logarithmic scale is better suited for the DoE and (2) the user needs to control the number of levels on one or each axis (factor). The application of the method was illustrated on three examples: a purely numerical one and two real-world cases.
The numerical example served to illustrate the proposed method step-by-step whereas the next two examples highlighted the situations that can be encountered in engineering applications and the results that can be obtained by the proposed method.