Critical assessment of models for predicting the Ms temperature of steels.

T. Sourmail*, C. Garcia-Mateo**

Department of Materials Science and Metallurgy, University of Cambridge
Pembroke Street, Cambridge CB2 3QZ, U.K.
* corresponding author, email:
** CENIM, National Center for Metallurgical Research
Av. Gregorio del Amo, 8, 28040 MADRID, Spain
Computational Materials Science, in press


Different approaches to predicting the Ms temperatures of steels are reviewed and discussed with the objective of summarising the main characteristics, advantages and difficulties of each method, mostly from a practical point of view. Empirical methods, and methods based on thermodynamics are then assessed against published data.

Keywords: martensite, thermodynamics, bayesian neural networks, linear regression


The martensite start temperature, Ms, is defined as the highest temperature at which austenite transforms to martensite. This transformation is relatively insensitive to prior thermal history during cooling, or to the austenite grain size [1]. It is therefore reasonably easy to predict the Ms temperature quantitatively, at least for a given category of steels. This has long been done using linear regression.

This method, however, has limitations, and over the past few years a number of authors have focussed on creating models of wider applicability. Two categories of techniques have prevailed: those based on thermodynamics [2,3,4,5,6] and others that are fully empirical [1,7].

This review attempts to summarize the different methods and assess them against published data.

Non adaptative regression

This section presents a brief review of the various attempts made at modelling the compositional dependency of Ms using linear regression or similar methods. We have classified these as non-adaptative because the `shape' of the function is pre-determined by the authors rather than adapted to the data. In contrast, neural network methods, as discussed later, are adaptative functions.

The different equations proposed for Ms have been summarised in table 1.

Table 1: Different formulae for the estimation of the Ms temperature in steels.
Reference   Ms / K (all compositions in wt%)
[8]         772 - 316.7C - 33.3Mn - 11.1Si - 27.8Cr - 16.7Ni - 11.1Mo - 11.1W
[9]         811 - 361C - 38.9Mn - 38.9Cr - 19.4Ni - 27.8Mo
[10]        772 - 300C - 33.3Mn - 11.1Si - 22.2Cr - 16.7Ni - 11.1Mo
[11]        834.2 - 473.9C - 33Mn - 16.7Cr - 16.7Ni - 21.2Mo
[12]        812 - 423C - 30.4Mn - 12.1Cr - 17.7Ni - 7.5Mo
[12]        785 - 453C - 16.9Ni - 15Cr - 9.5Mo + 217(C)^2 - 71.5(C)(Mn) - 67.6(C)(Cr)

The small modification proposed by Kung and Rayment [13] has not been presented. These authors added additional terms (+10Co-7.5Si) in formulae where these elements were not present.
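As a minimal sketch of how such formulae are applied, the following Python fragment evaluates two of the linear expressions of table 1 for a given composition. The dictionary keys, the function name and the example steel are illustrative assumptions; only the coefficients are taken from the table.

```python
# Coefficients transcribed from table 1 (Ms in K, compositions in wt%).
# Only two rows are shown; the dictionary keys are illustrative labels.
LINEAR_MS_FORMULAE = {
    "ref_9": (811.0, {"C": -361.0, "Mn": -38.9, "Cr": -38.9, "Ni": -19.4, "Mo": -27.8}),
    "ref_12_linear": (812.0, {"C": -423.0, "Mn": -30.4, "Cr": -12.1, "Ni": -17.7, "Mo": -7.5}),
}


def linear_ms(formula, composition_wt):
    """Return Ms in K; elements that the formula does not contain are ignored."""
    intercept, coefficients = LINEAR_MS_FORMULAE[formula]
    return intercept + sum(
        coeff * composition_wt.get(element, 0.0)
        for element, coeff in coefficients.items()
    )


# Hypothetical low-alloy steel, in wt% (not taken from the databases discussed here).
steel = {"C": 0.4, "Mn": 1.0, "Cr": 0.5}
ms_ref12 = linear_ms("ref_12_linear", steel)
ms_ref9 = linear_ms("ref_9", steel)
```

Note that an element absent from a formula silently contributes nothing, which is one practical reason why these expressions should not be used outside the composition range for which they were derived.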

Regardless of the exact formulae used, these approaches, which are still in use [14,7], usually have a limited range of applicability. As pointed out by Andrews [12], `these formulae are likely to depend on the range of variation of alloy elements'.

Their existence was justified at a time when computing power was limited. However, more rigorous data analysis methods no longer suffer from these limitations. It was therefore decided not to assess the existing formulae; such an assessment can be found in the literature [13,1].

Predicting Ms from thermodynamics


The development of a thermodynamic framework to describe the nucleation of martensite [15] laid the foundations for thermodynamic models predicting the Ms temperature from the composition of the steel. In an early practical implementation, Bhadeshia [16,17] estimated the driving force for martensite formation, at Ms, in plain carbon steels. This resulted in a function of the carbon content describing the Gibbs energy that must be available to form martensite (further referred to as the critical driving force $\Delta G_{c}$). This function was in turn applied to predict the Ms temperature of low alloy steels with satisfactory agreement.

In order to obtain a model of wider applicability, Ghosh and Olson proposed a model to describe the composition dependency of the critical driving force [2,3,5,6], including the effect of both interstitial and substitutional solutes. In this model, martensite transformation occurs when embryos of martensite, which are defects bounded by interfacial dislocations, can grow against the lattice friction experienced by these dislocations. The fault energy of the martensite embryos is mainly dependent on the driving force $\Delta G^{\gamma\rightarrow\alpha}=G_{\alpha}-G_{\gamma}$, and must exceed the interfacial frictional work for nucleation to occur. The compositional dependency occurs because solid solution hardening affects this frictional work. Ghosh and Olson therefore modelled the critical driving force for nucleation as a function of composition, as follows:

\begin{displaymath}
-\Delta G_{c}=K_{1}+W_{\mu}(X_{i})+W_{\mathrm{th}}(X_{i},T)
\end{displaymath} (1)

where $\Delta G_{c}$ is the critical driving force for nucleation, $K_{1}$ a constant, and $W_{\mu}$ and $W_{\mathrm{th}}$ the athermal and thermal components of the frictional work, the latter being negligible for temperatures greater than $\sim 300$ K. Both are functions of the composition ($X_{i}$ indicates the mole fraction of element $i$) and possibly temperature. The model is defined by fitting the athermal and thermal components of the frictional work to simple functions of the composition. This process is illustrated in figure 1.
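Since the regression formulae of table 1 are written in wt% while equation 1 uses mole fractions, a conversion is needed in practice. The sketch below assumes that iron makes up the balance of the composition; the function name is illustrative and the atomic masses are rounded standard values.

```python
# Standard atomic masses in g/mol (rounded); extend as needed.
ATOMIC_MASS = {
    "Fe": 55.845, "C": 12.011, "N": 14.007, "Mn": 54.938, "Si": 28.086,
    "Cr": 51.996, "Ni": 58.693, "Mo": 95.95, "V": 50.942, "Co": 58.933,
    "W": 183.84, "Nb": 92.906,
}


def mole_fractions(composition_wt):
    """Convert solute contents in wt% to mole fractions, with Fe as the balance."""
    wt = dict(composition_wt)
    wt["Fe"] = 100.0 - sum(composition_wt.values())
    moles = {element: w / ATOMIC_MASS[element] for element, w in wt.items()}
    total = sum(moles.values())
    return {element: n / total for element, n in moles.items()}


# Hypothetical Fe-0.4C-1.0Mn (wt%) steel.
x = mole_fractions({"C": 0.4, "Mn": 1.0})
```

Because carbon is much lighter than iron, its mole fraction is several times larger than its mass fraction, which matters when comparing coefficients expressed in the two unit systems.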

Figure 1: Schematic illustration of the method used by Ghosh and Olson. First, experimental data are collected for steels of known compositions. Using thermodynamic databases such as SGTE SSOL (as in [2]), $\Delta G^{\gamma\rightarrow\alpha}$ is estimated for each composition at its Ms temperature, where this driving force is equal to the critical driving force. Once sufficient data have been collected, a function can be fitted to describe the composition dependency of $\Delta G_{c}$. In principle, the method only requires examination of binary systems (or, in cases of experimental difficulties with the binary systems, ternary ones), but the establishment of the superposition law sometimes requires data for multicomponent alloys. To predict the Ms temperature for a steel of known composition, the critical driving force is estimated from the composition using the empirical formula previously derived. Using the same thermodynamic database as in the first step, a computer program can search for the temperature at which $\Delta G^{\gamma\rightarrow\alpha}$=$\Delta G_{c}$. This is the Ms temperature.

In binary steels, and in the simple case where the thermal component is neglected, the composition dependency of $\Delta G_{c}$ is expressed by an empirical formula of the kind:

\begin{displaymath}
-\Delta G_{c}\ ({\mathrm{J/mol}})=K_{1}+K_{\mathrm{C}}X_{\mathrm{C}}^{1/2}
\end{displaymath} (2)

where $K_{1}$ and $K_{\mathrm{C}}$ are constants, and $X_{\mathrm{C}}$ is, in this example, the mole fraction of carbon. This can be physically justified as the solid solution strengthening effect is traditionally believed to depend on the square root of the concentration. Ghosh and Olson [2,3] have obtained values of $K_{i}$ for a variety of binary systems Fe-X, or Fe-Ni-X when stability problems occurred. The extension to multicomponent alloys is made by adopting a superposition law to describe the combined effect of the different solutes on $\Delta G_{c}$.

The `pythagorean' superposition law chosen by Ghosh and Olson is written:

\begin{displaymath}
-\Delta G_{c}\ ({\mathrm{J/mol}}) =K_{1}+
\sqrt{\sum_{i} \left( K_{i}X_{i}^{1/2} \right)^{2}}+
\sqrt{\sum_{j} \left( K_{j}X_{j}^{1/2} \right)^{2}}
\end{displaymath} (3)

where the elements in the first and second sums might be made up by two different subsets of all the elements. In principle the establishment of these subsets can be made purely on the basis of the different values of $K_{i}$ (elements of similar effect fall in the same groups), which implies that the method does not require knowledge of Ms for multicomponent systems. However, in the first model [2,3], examination of Fe-Cr-C alloys led to a modification of the superposition law for this system. Similarly, in their improved model [5,6], the subsets established for the superposition law are difficult to justify a priori, and the best justification for the authors' choice lies in the success of predicting the Ms of multicomponent alloys. This method therefore still relies, although to a small extent, on the knowledge of Ms for multicomponent alloys.

The functioning of the model in making predictions is illustrated in figure 1. For simplicity, the case where $\Delta G_{c}$ is independent of the temperature (as in [2,3]) is illustrated. There is no additional difficulty for the case where $\Delta G_{c}$ is a function of temperature (as in [5,6]).
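The search for the temperature at which the driving force equals the critical driving force can be sketched as a simple bisection. In the fragment below the driving force is a hypothetical linear function of temperature standing in for the database calculation (it is not derived from SGTE SSOL or any other database), and the function names and default interval are assumptions made for illustration.

```python
def find_ms(driving_force, critical_driving_force, t_low=0.0, t_high=1000.0, tol=0.01):
    """Bisect for T (K) where driving_force(T) equals critical_driving_force (J/mol).

    Returns None when the difference does not change sign on [t_low, t_high].
    """
    def excess(temperature):
        return driving_force(temperature) - critical_driving_force

    f_low, f_high = excess(t_low), excess(t_high)
    if f_low * f_high > 0.0:
        return None
    while t_high - t_low > tol:
        t_mid = 0.5 * (t_low + t_high)
        f_mid = excess(t_mid)
        if f_low * f_mid <= 0.0:
            # The root lies in the lower half of the interval.
            t_high = t_mid
        else:
            t_low, f_low = t_mid, f_mid
    return 0.5 * (t_low + t_high)


def toy_driving_force(temperature):
    """Hypothetical G_alpha - G_gamma in J/mol; NOT taken from any database."""
    return -7000.0 + 7.0 * temperature


ms = find_ms(toy_driving_force, critical_driving_force=-1500.0)
no_solution = find_ms(toy_driving_force, critical_driving_force=-8000.0)
```

In a real implementation `driving_force` would wrap a call to the thermodynamic software, and a temperature-dependent $\Delta G_{c}$ would simply be moved inside `excess`. The `None` return covers the situation in which no temperature in the interval satisfies the criterion.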

Ghosh and Olson [2,3] provided the parameters for a number of elements (C, N, Mn, Si, Cr, Nb, V, Ti, Mo, Cu, W, Al, Ni, Co). These were derived using data on binary alloys, and the superposition law described above (equation 3) was validated on a few multicomponent steels.

Later, Cool and Bhadeshia [4] argued that the parameters published by Ghosh and Olson were not suitable for predicting the Ms temperature of ferritic power plant steels. These typically contain up to 12 additions, with some elements in relatively large concentration. However, upon further examination, it appears that these authors used a linear superposition law:

\begin{displaymath}
-\Delta G_{c}\ ({\mathrm{J/mol}})=K_{1}+\sum_{i} K_{i}X_{i}^{1/2}
\end{displaymath} (4)

with the parameters published in [2] for a superposition as described in equation 3. As illustrated in figure 2, when the correct superposition law is used, Ghosh and Olson's model gives the more satisfactory agreement.

Figure 2: Comparison between predicted and measured Ms using the data published in [18,19]. Left, using the corrected parameters proposed in [4], with the superposition law as in equation 3; right, the original parameters from Ghosh and Olson [2,3], same superposition law.

Given that the error has been propagated in the literature, it seems worth indicating that the formula:

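\begin{displaymath}
\begin{array}{rcl}
-\Delta G_{c}\ ({\mathrm{J/mol}}) & = & K_{1} + 4009X_{\mathrm{C}}^{1/2} + 1879X_{\mathrm{Si}}^{1/2} + 172X_{\mathrm{Ni}}^{1/2} + 1418X_{\mathrm{Mo}}^{1/2} + 1868X_{\mathrm{Cr}}^{1/2} \\
 & & + 1618X_{\mathrm{V}}^{1/2} + 752X_{\mathrm{W}}^{1/2} + 1653X_{\mathrm{Nb}}^{1/2} + 3097X_{\mathrm{N}}^{1/2} - 352X_{\mathrm{Co}}^{1/2}
\end{array}
\end{displaymath} (5)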

reproduced as such in a number of publications [7,20], is wrongly attributed to Ghosh and Olson [4], who in fact proposed:
\begin{displaymath}
\begin{array}{rcl}
-\Delta G_{c}\ ({\mathrm{J/mol}}) & = & K_{1} + \sqrt{(4009X_{\mathrm{C}}^{1/2})^{2} + (3097X_{\mathrm{N}}^{1/2})^{2}} + \sqrt{S} - 352X_{\mathrm{Co}}^{1/2} \\
S & = & (1879X_{\mathrm{Si}}^{1/2})^{2} + (172X_{\mathrm{Ni}}^{1/2})^{2} + (1418X_{\mathrm{Mo}}^{1/2})^{2} \\
 & & + (1868X_{\mathrm{Cr}}^{1/2})^{2} + (1618X_{\mathrm{V}}^{1/2})^{2} + (752X_{\mathrm{W}}^{1/2})^{2} + (1653X_{\mathrm{Nb}}^{1/2})^{2}
\end{array}
\end{displaymath} (6)
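The difference between the two expressions can be made concrete with a short numerical sketch. The coefficients below are those of equations 5 and 6; the value of $K_{1}$ is not given in the text, so it is left as a required argument and the value used in the example is purely hypothetical, as are the function names and the example composition.

```python
import math

# Coefficients K_i (J/mol) as listed in equations 5 and 6.
K = {"C": 4009.0, "N": 3097.0, "Si": 1879.0, "Ni": 172.0, "Mo": 1418.0,
     "Cr": 1868.0, "V": 1618.0, "W": 752.0, "Nb": 1653.0, "Co": -352.0}
INTERSTITIAL = ("C", "N")
SUBSTITUTIONAL = ("Si", "Ni", "Mo", "Cr", "V", "W", "Nb")


def term(element, x):
    """K_i * X_i^(1/2) for one element; absent elements contribute zero."""
    return K[element] * math.sqrt(x.get(element, 0.0))


def minus_dgc_linear(x, k1):
    """Equation 5: linear sum of all terms."""
    return k1 + sum(term(element, x) for element in K)


def minus_dgc_pythagorean(x, k1):
    """Equation 6: root-sum-square within each subset, cobalt added linearly."""
    interstitial = math.sqrt(sum(term(el, x) ** 2 for el in INTERSTITIAL))
    substitutional = math.sqrt(sum(term(el, x) ** 2 for el in SUBSTITUTIONAL))
    return k1 + interstitial + substitutional + term("Co", x)


# Hypothetical mole fractions and a hypothetical K1 (J/mol), for illustration only.
x_alloy = {"C": 0.01, "Cr": 0.04, "Mo": 0.01}
linear = minus_dgc_linear(x_alloy, k1=1000.0)
pythagorean = minus_dgc_pythagorean(x_alloy, k1=1000.0)
```

For a binary alloy the two laws coincide. As soon as two solutes of the same subset are present, the linear sum gives a larger critical driving force than the root-sum-square, and therefore a lower predicted Ms when used with parameters fitted for equation 6.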

More recently, Ghosh and Olson have refined this approach by taking into account the composition and temperature dependency of the austenite shear modulus and expressing the different components of the critical driving force as functions of this modulus [5,6]. This does not modify the fundamental features of the method and will therefore not be discussed in detail.

Advantages, difficulties and limitations

This approach allows a much wider range of applicability than linear regression. Furthermore, the physical basis suggests that it should extrapolate relatively safely unless the mechanisms taken into account change significantly with composition, or the empirical thermodynamic data behave badly in extrapolation. It also allows separation of the effect of alloying additions on phase stability from their influence on the frictional work.

In this model, $\Delta G_{c}$ is model-dependent in the sense that it is implicitly linked with the thermodynamic database that has been used during the derivation of the function expressing its compositional dependency. This becomes a problem if different databases are used in deriving the criterion and in making predictions (or more exactly, if the different databases describe similar systems differently). With the increasing number of thermodynamic databases available, this problem cannot be neglected. In addition, the accuracy of the model may be limited by that of the underlying thermodynamic database.

In their first model [2,3], Ghosh and Olson used the SGTE SSOL database [21] to derive the expression of $\Delta G_{c}$, but modified parameters for a number of systems. Unfortunately, the details of these modifications were left unpublished. It appears from [5] that parameters for the Fe-Ni and Fe-Ni-C systems were significantly changed. Because these parameters were not published, we used the standard SGTE SSOL database in our evaluation. Not surprisingly, this resulted in very poor predictions on the high Ni alloys (section 5.3).

The model also explicitly limits itself to solid solutions, implying that the influence of precipitates or grain size cannot be accounted for. This does not exclude, of course, accounting for solute depletion due to precipitation by performing a prior equilibrium calculation and using the austenite composition rather than the bulk one as an input (as done by Ghosh and Olson in [5]), but this method can as easily be used with, for example, linear regression and is not an integral part of the model. Although the effects of precipitates and grain size are not expected to be large, an important correlated problem is that of solute depletion by precipitates. For example, most data for the influence of vanadium have been derived using `pure' samples, with very low carbon content. In commercial steels, a significant amount of vanadium (or Nb, Ti) will have precipitated during the austenitisation, therefore leaving an austenite of lower carbon and vanadium content than that of the bulk. As discussed later, this was also clearly visible in the assessment of the model.

Finally, making predictions requires access to expensive thermodynamic calculation software and databases.

Neural network modelling


Neural networks, in the present context, essentially refer to non-linear multiple regression tools using adaptative functions. The following section will not detail the technique (see for example [22,23,24]), but presents the fundamental differences between these methods and empirical methods such as those introduced in the first section. The typical structure of a neural network is presented in figure 3.

Figure 3: The typical structure of a neural network as used for non-linear multiple regressions. The first layer is made up by the inputs (1, ..., $x_{i}$), the second by so-called `hidden units' and the last one is the output.

The hidden-units (the second-layer in figure 3) take as input a weighted sum of the inputs and return its hyperbolic tangent:

\begin{displaymath}
z_{j}=\tanh\left(\sum_{i} w_{ji}x_{i}\right)
\end{displaymath} (7)

The third-layer combines these outputs using a linear superposition:
\begin{displaymath}
y=\sum_{j} \omega_{j} z_{j}
\end{displaymath} (8)

where the $w_{ji}$ and $\omega_{j}$ are often referred to as the weights defining the network. `Training' the network implies identifying an optimal set of weights, given some data for which the output is known. This is similar in principle to identifying the slope and intercept of the best fit line in a linear regression.
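Equations 7 and 8 translate directly into a few lines of code. The following sketch evaluates the output of a network for a single input vector; the function name and the numerical weights are hypothetical and serve only to show the structure of the calculation.

```python
import math


def network_output(inputs, hidden_weights, output_weights):
    """Evaluate equations 7 and 8 for one input vector.

    hidden_weights[j][i] is w_ji; output_weights[j] is omega_j.
    """
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, inputs)))
        for row in hidden_weights
    ]
    return sum(omega * z for omega, z in zip(output_weights, hidden))


# Hypothetical weights for a network with two inputs and two hidden units.
# The first input is fixed at 1 so that its weights act as biases.
y = network_output(
    inputs=[1.0, 0.5],
    hidden_weights=[[0.0, 0.0], [1.0, 2.0]],
    output_weights=[3.0, 1.0],
)
```

With these weights the first hidden unit returns zero and the output reduces to the hyperbolic tangent of the weighted sum seen by the second unit. Training consists of adjusting the entries of `hidden_weights` and `output_weights`, which is not shown here.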

The fundamental difference between this type of regression and methods introduced earlier is that neural networks correspond to adaptative functions. In traditional methods, the author fixes the form of the equation (for example, a second degree polynomial), and identifies the parameters that lead to optimal fitting of the observed data. Even in the few cases where the authors take the trouble to assess more than one function (for example, to determine whether a second or third degree polynomial is most appropriate), the extent to which the function is adapted to the data is very limited.

With neural networks however, the complexity of the function is mainly controlled by the weights themselves, so that the optimisation includes a determination of the most suitable shape for the function. This flexibility is not without a drawback: overfitting is the cause of most problems in neural network modelling. Overfitting occurs when an overly complex function is chosen, so that the noise, rather than the trend in the data, is fitted by the function. One method widely applied to limit overfitting is to perform the optimisation on only one part of the data, then use the second part to determine which level of complexity best fits the data. This is illustrated in figure 4.

Figure 4: The problem of overfitting with neural networks can be avoided if only part of the data is used to optimise the network (here the filled circles). At this stage, the best solution appears to be that which goes through all the filled circles. When using the second part of the dataset (crosses), however, it becomes obvious that this solution is strongly overfitted and that the real trend is better captured by a simpler model.
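The principle of using a second part of the data to select the level of complexity can be illustrated without a neural network. In the sketch below, which is a deliberately simplified stand-in, a straight line plays the role of the simple model and a nearest-neighbour lookup plays the role of an overly flexible model that passes through every training point. The data are synthetic and all names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a prediction function."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
    intercept = y_mean - slope * x_mean
    return lambda x: intercept + slope * x


def fit_nearest(xs, ys):
    """Maximally flexible stand-in: return the y of the closest training x."""
    return lambda x: min(zip(xs, ys), key=lambda pair: abs(pair[0] - x))[1]


def mean_abs_error(model, xs, ys):
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


# Synthetic data: trend y = 2x, with noise of +/-1 on the training points only.
train_x, train_y = [0.0, 2.0, 4.0, 6.0, 8.0], [1.0, 3.0, 9.0, 11.0, 17.0]
valid_x, valid_y = [0.9, 2.9, 4.9, 6.9], [1.8, 5.8, 9.8, 13.8]

simple, flexible = fit_line(train_x, train_y), fit_nearest(train_x, train_y)
# For each model: (error on the training part, error on the held-out part).
errors = {
    name: (mean_abs_error(m, train_x, train_y), mean_abs_error(m, valid_x, valid_y))
    for name, m in (("simple", simple), ("flexible", flexible))
}
```

Judged on the training part alone, the flexible model appears perfect because it reproduces the noise exactly; judged on the held-out part, the straight line is clearly better. The same comparison, applied to networks with increasing numbers of hidden units, is what allows a suitable complexity to be selected.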

Bayesian/classical framework

There are two ways to understand regression, whether in the context of neural networks or of linear, polynomial, etc. regression.

The first and still most often encountered consists in defining an error function and minimising it by adjusting the parameters. We will refer to this method as the classical method.

Bayesian probabilities offer a far more interesting approach by which the final model not only encompasses the knowledge present in the data, but also an estimation of the uncertainty on this knowledge.

Rather than identifying optimum parameters, an optimum probability distribution of parameter values is fitted to the data. In regions of space where data are sparse, this distribution will be wide, indicating that a number of solutions could fit the problem with similar probabilities. If a large amount of data is available, this distribution will be narrow, indicating that one shape of function is significantly more probable than any other.

Because it can be quantified, the uncertainty on the determination of the network parameters can be translated into an uncertainty on the prediction. This is illustrated in figure 5.

Figure 5: Illustration of the possibilities offered by Bayesian neural networks: the prediction can be accompanied by an error bar related to the uncertainty of fitting. When data are sparse, the uncertainty of fitting is larger than in regions with sufficient data.

Whether for linear regression or neural networks, a Bayesian approach should always be preferred because it allows predictions to be accompanied by an indication of the uncertainty. Because of the flexibility of the method, this is particularly important in neural network modelling.
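The way in which an error bar follows from a distribution of parameters is easiest to see in the linear case. The sketch below uses Bayesian straight-line regression with a Gaussian prior and fixed hyperparameters, which is a much simpler technique than the Bayesian neural networks discussed here; the values of `alpha` and `beta`, the function name and the data positions are all hypothetical.

```python
import math


def predictive_std(xs, x_new, alpha=0.01, beta=25.0):
    """Error bar (one standard deviation) for a Bayesian straight-line fit.

    alpha: prior precision of the two weights (hypothetical value).
    beta:  noise precision, 1 / sigma_noise**2 (hypothetical value).
    """
    # A = alpha * I + beta * Phi^T Phi, with basis functions phi(x) = (1, x).
    a11 = alpha + beta * len(xs)
    a12 = beta * sum(xs)
    a22 = alpha + beta * sum(x * x for x in xs)
    det = a11 * a22 - a12 * a12
    # phi^T A^-1 phi, using the closed-form inverse of a 2x2 matrix.
    weight_term = (a22 - 2.0 * a12 * x_new + a11 * x_new * x_new) / det
    return math.sqrt(1.0 / beta + weight_term)


# Training inputs clustered between 0 and 2 (hypothetical positions).
data_x = [0.0, 1.0, 2.0]
bar_inside = predictive_std(data_x, 1.0)
bar_outside = predictive_std(data_x, 10.0)
```

With fixed hyperparameters the error bar depends only on where the training inputs lie, not on the measured outputs: it is close to the noise level inside the range covered by the data and grows rapidly away from it. In the neural network case the same idea applies, with the width of the weight distribution obtained numerically.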


Using the classical method Vermeulen et al. [1] built a neural network model for the Ms temperature of steels in the range of composition given in table 2. More recently, Capdevilla et al. [7] built a network using the bayesian approach, on a much wider range of compositions. This model being built on a superset of the database used to train the model by Vermeulen et al., the assessment to follow did not include the latter.

Table 2: Range of the database used for the models created by Vermeulen et al. and Capdevilla et al. All compositions in wt%.
        Vermeulen et al. [1]    Capdevilla et al. [7]
Elt.    Min.    Max.            Min.    Max.
C       0.05    0.7             0       1.62
Si      0.20    0.25            0       3.40
Mn      0.08    2.0             0       3.76
Cr      0       1.40            0       17.98
Mo      0       0.75            0       5.10
Ni      0       0.25            0       27.20
V       0       0.25            0       4.55
Co                              0       30.00
Al                              0       1.10
W                               0       13.00
Cu                              0       0.98
Nb                              0       0.23
Ti                              0       0.18
B                               0       0.006
N                               0       0.06

Although the predictions made in [1] are accompanied by error bars, these correspond to the average error of the model over the entire training database, which can be interpreted as a level of noise but does not carry any indication as to the uncertainty of the predictions.

Advantages, difficulties and limitations

Neural network modelling is not always perceived as a satisfactory method because of its purely empirical nature.

However, even in the thermodynamically based approaches, empirical equations lie at the heart of the method. The form of the function used by Ghosh and Olson to represent $\Delta G_{c}$ (equation 2) was adopted on physical grounds, as the solid solution strengthening effect scales with the square root of concentration. However, the final superposition laws (in particular the grouping of elements in subsets) adopted in both [2] and [5] are not strictly derived from considerations of the solution strengthening effect and are thus better justified by later validation than a priori. That is to say, they are at least partly empirical. That there is no necessary link between the relative strength of elements and their grouping in subsets is made clear by the fact that different subsets were used in references [2] and [5]. In both studies, however, the relative effects of different elements are similar.

There is therefore no obvious reason to trust extrapolations using such models more than any other empirical method. Furthermore, this approach relies heavily on the CALPHAD [25] method to estimate the thermodynamic properties of complex systems. In the CALPHAD method, the extension of simple thermodynamic models (for example, regular solutions) to multicomponent systems and more complex behaviours is mostly empirical in nature, and there is once again no reason to trust their ability to extrapolate well. Interestingly, Stan et al. recently proposed to improve on the CALPHAD limitation by using a bayesian framework [26].

The flexibility of the neural networks avoids the use of a pre-determined type of function. If a Bayesian approach is used, the technique offers the unique advantage that the level of certainty can be assessed by the user without the need to know all the details of the model derivation. To assess the validity of a prediction made using thermodynamic models, one must not only be aware of the limits of the composition range of the data used in deriving the $\Delta G_{c}$ function, but also of those of the thermodynamic database used to link temperatures and driving forces.

Finally, these models are available as self-contained programs which are freely distributed on the world wide web [27,28].

Assessment against published data

Implementation of the thermodynamic models

To assess the models, a computer program was interfaced with the thermodynamic calculation software MTDATA [29]. Both the original method of Ghosh and Olson [2,3] and the revised method [5,6] were implemented. As mentioned earlier, Ghosh and Olson relied, in the first case, on the SGTE SSOL database, slightly modified for Fe-Ni and Fe-Ni-C to derive the function $\Delta G_{c}$.

In the second case, however, significant modifications were made, resulting in a separate thermodynamic database that the authors named kMART. Unfortunately, as this database is not available, the SGTE SSOL database was used in both cases. When using the more recent model [5,6], agreement was significantly worse than with the earlier one. Clearly, this disagreement is not related to the quality of the model but to the fact that different thermodynamic databases were used.

The driving force for martensite formation can be simply calculated as $G_{\alpha}-G_{\gamma}$. However, for low temperature or high carbon steels, it is important to account for the ordering of carbon in martensite [30,31]. The driving force for ordering can be calculated following Fisher [32]:

\begin{displaymath}
\Delta G_{z}\ ({\mathrm{J/mol}})=
2.127\times10^{5}\,y_{C}^{2}\,z^{2} +
2.77\, y_{C}\, T\, \phi
\end{displaymath} (9)

where $\phi= \left[ 2(1-z)\ln{(1-z)} + (1+2z) \ln{(1+2z)} \right]$, $z$ is Zener's order parameter and $y_{C}$ is the fraction of interstitial sites occupied by carbon, given by $N_{C}/N_{S}$, where $N_{S}$ is the sum of the mole fractions of all the substitutional elements. The value of the ordering parameter itself depends on the ratio $T/T_{c}$, where $T_{c}$ is the critical temperature at which ordering takes place, estimated by $28080\, y_{C}$ [32]. The tabulated values of $z$ as a function of $T_{c}/T$ provided by Fisher were used. Ordering only significantly influences the transformation at low temperatures and high carbon content.
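Equation 9 is straightforward to evaluate once the order parameter is known. The sketch below transcribes it as written above; the tabulated relation between $z$ and $T_{c}/T$ from Fisher is not reproduced here, so $z$ is simply passed in as an argument, and the function names are illustrative.

```python
import math


def ordering_energy(y_c, z, temperature):
    """Equation 9, in J/mol, transcribed as written in the text."""
    # (1 - z) * ln(1 - z) tends to 0 as z tends to 1.
    first = 0.0 if z >= 1.0 else 2.0 * (1.0 - z) * math.log(1.0 - z)
    phi = first + (1.0 + 2.0 * z) * math.log(1.0 + 2.0 * z)
    return 2.127e5 * y_c**2 * z**2 + 2.77 * y_c * temperature * phi


def critical_ordering_temperature(y_c):
    """Tc in K, estimated as 28080 * y_C."""
    return 28080.0 * y_c


# Hypothetical site fraction of carbon.
y_carbon = 0.05
t_c = critical_ordering_temperature(y_carbon)
dg_disordered = ordering_energy(y_carbon, z=0.0, temperature=300.0)
```

For a fully disordered solution ($z=0$) both terms vanish, so that the correction only matters when $T$ is low relative to $T_{c}$, that is for low temperatures or high carbon contents.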

Agreement between experimental and predicted Ms was systematically better using the original method by Ghosh and Olson [2,3] and accounting for ordering. In the following, the Ms predicted using the thermodynamic model therefore refers to results obtained with this particular method.


Two databases were used for the assessment. The first one, referred to as database A, contains all the data used by Capdevilla et al. for production of the model described in [7].

A second database (further referred to as database B) was built by the present authors using published data [2,33,34,35,36,37,38,39,40,41,11,42,9,43,8,44] which appeared not to have been used in the work of Capdevilla et al., but had been used by Ghosh and Olson to derive $\Delta G_{c}$.


Predictions were made on database A and B using both the neural network and the thermodynamic methods. To estimate the overall performance, the average of the absolute values of the errors (further denoted $\overline{\varepsilon}$) was used, together with the standard deviation ($\sigma_{err}$).
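The two indicators can be computed as in the following sketch. It is assumed here that the standard deviation is taken over the signed errors and that the population form is used; the text does not specify either point. The function name and the numbers in the example are hypothetical.

```python
import statistics


def error_summary(predicted, measured):
    """Return (mean absolute error, standard deviation of the signed errors)."""
    errors = [p - m for p, m in zip(predicted, measured)]
    mean_abs = statistics.fmean([abs(e) for e in errors])
    # Assumption: population standard deviation of the signed errors.
    spread = statistics.pstdev(errors)
    return mean_abs, spread


# Hypothetical predicted and measured Ms values, in K.
mean_abs, spread = error_summary([500.0, 600.0, 700.0], [510.0, 590.0, 720.0])
```

A standard deviation much larger than the mean absolute error indicates that the average is dominated by a small number of very poor predictions rather than by a uniform scatter.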

Using database A

When comparing predictions and experimental values in database A (figure 6), the neural network performed significantly better with $\overline{\varepsilon}$=25 ($\sigma_{err}=34$), while the thermodynamic method gave $\overline{\varepsilon}$=37 ($\sigma_{err}=70$).

Figure 6: The performance of (I) the model by Capdevilla et al. [7], and (II) the model by Ghosh and Olson [2] on database A. Predictions made following (II) are shown as zero when the driving force never exceeds the critical driving force.

In both cases, the datapoints for which the prediction differed by more than 20% from the database value (`outliers') were investigated. The composition giving the worst prediction (measured 400 K, predicted 800 K) was particularly worrying as it was, in the case of the neural network model, not accompanied by a large error bar. However, this point turned out to be a mistake in the database (Fe-0.04C-0.08Mn wt% giving an Ms temperature of 400 K) so that the predicted value was actually correct. Few other points were wrongly predicted by more than 20% by the neural network model; as reported separately [45], a number of them were found to be erroneous entries in the database.
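The selection of outliers can be sketched as a simple filter on the relative error. It is assumed here that the error is normalised by the database (measured) value. The first record reproduces the Fe-0.04C-0.08Mn entry quoted above; the second is hypothetical, as are the field and function names.

```python
def relative_outliers(records, threshold=0.20):
    """Return records whose |predicted - measured| / measured exceeds threshold."""
    return [
        r for r in records
        if abs(r["predicted"] - r["measured"]) / r["measured"] > threshold
    ]


records = [
    # Entry discussed in the text: database value 400 K, predicted 800 K.
    {"steel": "Fe-0.04C-0.08Mn", "measured": 400.0, "predicted": 800.0},
    # Hypothetical well-predicted entry.
    {"steel": "example", "measured": 600.0, "predicted": 630.0},
]
outliers = relative_outliers(records)
```

As the example of the Fe-0.04C-0.08Mn entry shows, a flagged point is not necessarily a failure of the model, and each one has to be checked against its original source.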

The limitations of the thermodynamic model became obvious upon examination of the `outliers'. As for the neural network model, the Fe-0.04C-0.08Mn steel appears wrongly predicted because of a mistake in the database. All of the high-Ni (Fe-Ni with more than 20 wt% Ni) steels were outliers. As explained earlier, this was expected because the database used to derive the critical driving force was a modified version of the SGTE SSOL, while the unmodified SGTE SSOL had to be used in this study.

Other outliers included most of the steels with vanadium, niobium or titanium additions and/or significant amounts of carbon, a few examples of which are reproduced in table 3.

Table 3: A selection of compositions giving large errors when using the model proposed by Ghosh and Olson. The first prediction is made using the bulk composition, the second using the austenite composition having allowed for carbide formation.
Composition / wt%                                                                               Ms / K
C     Mn    Si    Cr     Ni    Mo    V     Co    Al    W      Cu    Nb    Ti    B     N         Ms    Predicted(1)  Predicted(2)
1.62  0.4   0.48  12.44  0     0.8   0.83  0     0     0      0     0     0     0     0         498   109           405
1.42  2.16  1.62  2.57   5.35  1.29  0.7   0     0.08  8.88   0.98  0.23  0.18  0.01  0.05      769   10.5          -
1.42  0.43  0.38  4.42   0     0.7   4.55  4.97  0     12.99  0     0     0     0     0         513   216           568
0.95  0.24  0.28  4.64   0     4.8   2.45  0     0     7.12   0     0     0     0     0         473   375           -

This is again not surprising given that the model explicitly limits itself to solid solutions, while most of these steels will have carbides or nitrides remaining after austenitising.

To verify whether a better prediction could be obtained, the equilibrium constitution of these outliers was estimated using MT-DATA [46], and the SGTE SSOL and substances databases. Phases allowed were cementite, mixed carbides (M23C6, M6C, etc), tungsten carbide, niobium carbide, titanium carbide and vanadium carbide. The austenitisation temperature was taken as 1373 K (the value was not provided in the sources). As can be seen from the examples in table 3, this sometimes improved the predictions, but not systematically. One reason might be that, although the solute content of the austenite should be more realistic after this procedure, the model does not account for precipitates which may also have an effect.

In some cases, no temperature could be found that satisfied the thermodynamic criterion when the new compositions were used. This is probably due to the poor assessment in the SGTE SSOL databases of the effect of either high Ni content or high W contents sometimes present.

While this assessment outlines clearly the limitations of the thermodynamic model, it does not represent a good test of the validity of the neural network model by Capdevilla et al. since database A was used to train this model.

Using database B

Results obtained using the database created by the present authors are presented in figure 7. A global comparison gives $\overline{\varepsilon}$=210 ($\sigma_{err}=501$) for the neural network and $\overline{\varepsilon}$=116 ($\sigma_{err}=156$) for the thermodynamic model. This is essentially caused by a few `wild' predictions from the neural network model, whose output is not bounded, while the thermodynamic model is by design limited to errors of 1000 K, which is the width of the interval in which the program searches for $\Delta G^{\gamma\rightarrow\alpha}$=$\Delta G_{c}$. These `wild' predictions were accompanied by very large error bars and therefore should not be considered as `dangerous'.

Figure 7: Comparison between predictions and experimental values using (I) and (II) the neural network model due to Capdevilla et al. [7], and (III) the thermodynamic approach of Ghosh and Olson [2,3], using database B. Neural network predictions which were accompanied by an error bar larger than 200 K are plotted separately for clarity (II).

To incorporate the existence of the error bars in the comparison, the predictions made with the neural network software were divided into two subsets, depending on whether the error bar accompanying the prediction was smaller (subset I) or greater (subset II) than $\pm 200$ K. It must be emphasized here that this does not involve a comparison with the experimental data as yet, only the use of the uncertainty estimation described in section 4.2.
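This division uses only the error bars and can be sketched as follows. The records, field names and function names are hypothetical, and a prediction whose error bar is exactly 200 K is arbitrarily placed in subset I.

```python
def split_by_error_bar(predictions, limit=200.0):
    """Split predictions into (subset I, subset II) using the error bar only."""
    subset_1 = [p for p in predictions if p["error_bar"] <= limit]
    subset_2 = [p for p in predictions if p["error_bar"] > limit]
    return subset_1, subset_2


def mean_abs_error(subset):
    return sum(abs(p["predicted"] - p["measured"]) for p in subset) / len(subset)


# Hypothetical neural network predictions, all values in K.
predictions = [
    {"measured": 600.0, "predicted": 620.0, "error_bar": 30.0},
    {"measured": 500.0, "predicted": 460.0, "error_bar": 50.0},
    {"measured": 400.0, "predicted": -2000.0, "error_bar": 900.0},
]
subset_1, subset_2 = split_by_error_bar(predictions)
mae_1, mae_2 = mean_abs_error(subset_1), mean_abs_error(subset_2)
```

The measured values enter only in the second step, when the error of each subset is evaluated, so the split itself is something a user could perform on a steel for which no measurement exists.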

Using subset I and the neural network, a value of $\overline{\varepsilon}$=40 ($\sigma_{err}=39$) was obtained; on the same subset the thermodynamic method gives $\overline{\varepsilon}$=102 ($\sigma_{err}=151$). On subset II, the neural network gave $\overline{\varepsilon}$=1055 ($\sigma_{err}=800$), while, for the thermodynamic method, we obtained $\overline{\varepsilon}$=187 ($\sigma_{err}=160$). Interestingly, the thermodynamic model also performed significantly worse on subset II. In the context of predictions, the behaviour of the Bayesian neural network is clearly the most appropriate, as both methods wrongly predict a number of points, but only the former is able to warn the user about the reliability of the prediction.

As for database A, the entries which led to a relative error of more than 20% were further examined. The outliers were similar for both models, being the high-Ni steels (20 wt% and more), and the high nitrogen steels. It has been explained earlier that the thermodynamic model is expected to perform poorly on high-Ni steels. The neural network was trained on a database including a few Fe-30Ni-C alloys, but no other high-Ni data. Consequently, new data for Fe-Ni-C alloys were well predicted, but data on Fe-Ni-X alloys (where X is Mo, V, etc. but not carbon) were not.

Similarly, the neural network failed to predict correctly the Ms of high-nitrogen steels, because the database used for training only contained nitrogen contents significantly below the solubility limit. The data introduced in database B included nitrogen contents in large excess of this limit. The neural network method allows, in principle, the modelling of influences which depend on the actual value of the input parameter, so that re-training the model on both the low and high nitrogen contents should improve it significantly. The thermodynamic model is based on the assumption that elements are in solid solution, so that the failure to predict correctly the Ms of high-nitrogen steels using the bulk composition is not surprising.


A fully empirical method was compared to the thermodynamic approach to estimate the Ms temperature of steels.

The thermodynamic method provides satisfactory results, as long as it is used within boundaries compatible with the fundamental assumptions upon which it was built. In particular, calculations should not be attempted where additions exceed the solubility limit.

This method is particularly interesting because it allows the influence of alloying elements on phase stability to be treated separately from their effect on the propagation of the semi-coherent interface. However, as explained earlier, the link between the solid solution strengthening effect of elements and their effect on martensitic nucleation is not strongly supported by the analysis of Ghosh and Olson.

Although the fully empirical approach does not allow the different roles of alloying additions to be separated, it is able to incorporate any effect these might have, whether constant or dependent on their own concentration, as long as the knowledge is present in the database.

The neural network method was found to perform at least as well as the thermodynamic approach (on database B, inasmuch as predicting -2000 K or 0 K for an actual Ms of 400 K is equally useless). Nevertheless, a number of improvements could be proposed, and the authors have trained a new model whose performance was significantly improved [45].

Because of the risk of wild predictions, neural network methods should not be relied on unless a bayesian framework is used.


All the software and databases used in this study are available on the world-wide-web. Neural network calculations can also be made online at


The authors are grateful to Prof. Fray for the provision of laboratory facilities, to Prof. Bhadeshia for helpful discussions, to NPL for the provision of MTDATA, and to Neuromat for the provision of the Model Manager.


W. G. Vermeulen, P. F. Morris, A. P. de Weijer, and S. van der Zwaag.
Ironmaking and Steelmaking, 23:433-437, 1996.

G. Ghosh and G. B. Olson.
Acta Mat., 42:3361-3370, 1994.

G. Ghosh and G. B. Olson.
Acta Mat., 42:3371-3379, 1994.

T. Cool and H. K. D. H. Bhadeshia.
Mater. Sci. Techn., 12:40-44, 1996.

G. Ghosh and G. B. Olson.
J. Phase Eq., 22(3):199-207, 2001.

G. Ghosh and G. B. Olson.
Acta Mat., 50:2655-2675, 2002.

C. Capdevila, F. G. Caballero, and C. García de Andrés.
ISIJ Int., 42:894-902, 2002.

P. Payson and C. H. Savage.
Trans. ASM, 33:261-281, 1944.

R. A. Grange and H. M. Stewart.
Trans. AIME, 167:467-494, 1945.

A. E. Nehrenberg.
Trans. AIME, 167:494-501, 1945.

W. Steven and A. G. Haynes.
JISI, 183:349-359, 1956.

K. W. Andrews.
JISI, 203:721-727, 1965.

C. Y. Kung and J. J. Rayment.
Metall. Trans. A, 13:328-331, 1982.

J. Wang, P. J. van der Wolk, and S. van der Zwaag.
ISIJ Int., 39:1038-1046, 1999.

L. Kaufman and M. Cohen.
Prog. Metal Sci., 7:165-246, 1965.

H. K. D. H. Bhadeshia.
Metal Sci., 15:175-177, 1981.

H. K. D. H. Bhadeshia.
Metal Sci., 15:178-180, 1981.

T. Cool., 1996.

T. Cool and H. K. D. H. Bhadeshia ., 1996.

C. Capdevila, F. G. Caballero, and C. García de Andrés.
Mater. Sci. Techn., 19:581-586, 2003.

Scientific Group Thermodata Europe., 1983.

D. J. C. MacKay.
Information Theory, Inference, and Learning Algorithms.
Cambridge University Press, Cambridge, 2003.

H. K. D. H. Bhadeshia.
ISIJ Int., 39:966-979, 1999.

T. Sourmail, H. K. D. H. Bhadeshia, and D. J. C. MacKay.
Mater. Sci. Techn., 18:655-663, 2002.

N. Saunders and A. P. Miodownik.
Calphad, Calculation of Phase Diagrams, A Comprehensive Guide.
Pergamon Press, Oxford, 1998.

M. Stan and B. J. Reardon.
CALPHAD, 27:319-323, 2003.

P. van der Wolk., 1996.

C. Capdevila., 1996.

National Physical Laboratory, Teddington, Middlesex, U.K., 1989.

C. Zener.
Trans. AIME, 167:513, 1946.

C. Zener.
Trans. AIME, 167:550, 1946.

J. C. Fisher.
Metals Trans., 185:688-690, 1949.

A. B. Greninger.
Trans. ASM, 30:1-26, 1942.

T. G. Digges.
Trans. ASM, 28:575-600, 1940.

T. Bell and W. S. Owen.
JISI, 205:1777-1786, 1967.

K. Ishida and T. Nishizawa.
Trans. JIM, 15:218-224, 1974.

M. Oka and H. Okamoto.
Metall. Trans. A, 19:447-452, 1988.

J. S. Pascover and S. V. Radcliffe.
Trans. AIME, 242:673-682, 1968.

R. B. G. Yeo.
Trans. AIME, 227:884-890, 1963.

A. S. Sastri and D. R. F. West.
JISI, 203:138-145, 1965.

U. R. Lenel and B. R. Knott.
Metall. Trans. A, 18:767-775, 1987.

R. H. Goodenow and R. F. Hehemann.
Trans. AIME, 233:1777-1786, 1965.

M. M. Rao and P. G. Winchell.
Trans. AIME, 239:956-960, 1967.

E. S. Rowland and S. R. Lyle.
Trans. ASM, 37:27-47, 1946.

T. Sourmail and C. Garcia-Mateo.
Comp. Mater. Sci.
this volume.

National Physical Laboratory, Teddington, Middlesex, U.K., 1989.
