Keywords: martensite, thermodynamics, bayesian neural networks, linear regression
This method, however, has limitations, and over the past few years a number of authors have focussed on creating models of wider applicability. Two categories of techniques have prevailed: those based on thermodynamics [2,3,4,5,6] and fully empirical ones [1,7].
This review attempts to summarize the different methods and assess them against published data.
This section presents a brief review of the various attempts made at modelling the compositional dependency of M_{s} using linear regression or similar methods. We have classified these as non-adaptive because the `shape' of the function is predetermined by the authors rather than adapted to the data. In contrast, neural network methods, as discussed later, are adaptive functions.
The different equations proposed for M_{s} have been summarised in table 1.

Regardless of the exact formulae used, these approaches, which are still in use [14,7], usually have a limited range of applicability. As pointed out by Andrews [12], `these formulae are likely to depend on the range of variation of alloy elements'.
Their existence was justified at a time when computing power was limited. However, more rigorous data analysis methods no longer suffer from these limitations. It was decided therefore not to assess the existing formulae; such an assessment can be found in the literature [13,1].
In order to obtain a model of wider applicability, Ghosh and Olson proposed a model to describe the composition dependency of the critical driving force [2,3,5,6], including the effect of both interstitial and substitutional solutes. In this model, martensite transformation occurs when embryos of martensite, which are defects bounded by interfacial dislocations, can grow against the lattice friction experienced by these dislocations. The fault energy of the martensite embryos is mainly dependent on the driving force, and must exceed the interfacial frictional work for nucleation to occur. The compositional dependency occurs because solid solution hardening affects this frictional work. Ghosh and Olson therefore modelled the critical driving force for nucleation as a function of composition, as follows:
In binary steels, and in the simple case where the thermal component is neglected, the composition dependency of the critical driving force is expressed by an empirical formula of the kind:
The `pythagorean' superposition law chosen by Ghosh and Olson is written:
The functioning of the model in making predictions is illustrated in figure 1. For simplicity, the case where the critical driving force is independent of temperature (as in [2,3]) is illustrated. There is no additional difficulty when it is a function of temperature (as in [5,6]).
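The prediction procedure amounts to a root search: the predicted M_{s} is the temperature at which the available driving force reaches the critical value. A minimal sketch follows, in which the free-energy function `delta_g` and all numerical values are purely illustrative; a real implementation would obtain the driving force from a thermodynamic database such as SGTE SSOL.

```python
def delta_g(T):
    """Chemical driving force for the gamma -> alpha' transformation
    (J/mol), here a hypothetical linear function of temperature T (K)."""
    return 3500.0 - 5.0 * T

# Composition-dependent critical driving force (illustrative value, J/mol)
DELTA_G_CRIT = 1400.0

def predict_ms(lo=0.0, hi=1000.0, tol=0.01):
    """Bisect for the temperature at which the driving force equals
    the critical value, i.e. delta_g(Ms) = DELTA_G_CRIT."""
    f = lambda T: delta_g(T) - DELTA_G_CRIT
    if f(lo) * f(hi) > 0:
        return None  # no solution within the search interval
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

print(round(predict_ms()))  # -> 420
```

The bounded search interval mirrors the behaviour discussed later for the implemented program: when no temperature in the interval satisfies the criterion, no prediction is returned.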
Ghosh and Olson [2,3] provided the parameters for a number of elements (C, N, Mn, Si, Cr, Nb, V, Ti, Mo, Cu, W, Al, Ni, Co). These were derived using data on binary alloys, and the superposition law described above (equation 3) was validated on a few multicomponent steels.
Later, Cool and Bhadeshia [4] argued that the parameters
published by Ghosh and Olson were not suitable for predicting the M_{s}
temperature of ferritic power plant steels. These typically contain up to 12
additions, with some elements in relatively large concentration. However, upon
further examination, it appears that these authors used a linear superposition
law:
Given that the error has been propagated in the literature, it seems worth indicating that the formula:

\Delta G_{crit} = K_{1} + 4009X_{C}^{1/2} + 1879X_{Si}^{1/2} + 172X_{Ni}^{1/2} + 1418X_{Mo}^{1/2} + 1868X_{Cr}^{1/2} + 1618X_{V}^{1/2} + 752X_{W}^{1/2} + 1653X_{Nb}^{1/2} + 3097X_{N}^{1/2} - 352X_{Co}^{1/2}   (5)

should read, with the `pythagorean' superposition law:

\Delta G_{crit} = K_{1} + 4009X_{C}^{1/2} + S^{1/2} + 3097X_{N}^{1/2} - 352X_{Co}^{1/2}

where

S = (1879X_{Si}^{1/2})^{2} + (172X_{Ni}^{1/2})^{2} + (1418X_{Mo}^{1/2})^{2} + (1868X_{Cr}^{1/2})^{2} + (1618X_{V}^{1/2})^{2} + (752X_{W}^{1/2})^{2} + (1653X_{Nb}^{1/2})^{2}   (6)
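The practical difference between the two superposition laws can be illustrated with a short sketch using the coefficients of equations (5) and (6); the example composition is hypothetical. For more than one solute, the linear sum always exceeds the pythagorean one.

```python
import math

# Coefficients K_i (per square root of mole fraction) from equations (5)/(6)
K = {'Si': 1879, 'Ni': 172, 'Mo': 1418, 'Cr': 1868,
     'V': 1618, 'W': 752, 'Nb': 1653}

def linear_superposition(x):
    """Simple sum of the K_i * X_i^{1/2} contributions."""
    return sum(K[e] * math.sqrt(xi) for e, xi in x.items())

def pythagorean_superposition(x):
    """Square root of the sum of squared contributions, as in equation (6)."""
    return math.sqrt(sum((K[e] * math.sqrt(xi)) ** 2 for e, xi in x.items()))

# Hypothetical austenite composition, in mole fractions:
x = {'Cr': 0.02, 'Mo': 0.005, 'Ni': 0.003}
print(linear_superposition(x) > pythagorean_superposition(x))  # -> True
```

For a single solute the two laws coincide, which is why the error is only revealed in multicomponent steels.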
More recently, Ghosh and Olson have refined this approach by taking into account the composition and temperature dependency of the austenite shear modulus and expressing the different components of the critical driving force as functions of this modulus [5,6]. This does not modify the fundamental features of the method and will therefore not be discussed in detail.
In this model, the critical driving force is model-dependent in the sense that it is implicitly linked to the thermodynamic database used during the derivation of the function expressing its compositional dependency. This becomes a problem if different databases are used in deriving the criterion and in making predictions (or more exactly, if the different databases describe similar systems differently). With the increasing number of thermodynamic databases available, this problem cannot be neglected. In addition, the accuracy of the model may be limited by that of the underlying thermodynamic database.
In their first model [2,3], Ghosh and Olson used the SGTE SSOL database [21] to derive the expression of the critical driving force, but modified parameters for a number of systems. Unfortunately, the details of these modifications are left unpublished. It appears from [5] that parameters for the Fe-Ni and Fe-Ni-C systems were significantly changed. Because these parameters were not published, we used the standard SGTE SSOL database in our evaluation. Not surprisingly, this resulted in very poor predictions for the high-Ni alloys (section 5.3).
The model also explicitly limits itself to solid solutions, implying that the influence of precipitates or grain size cannot be accounted for. This does not exclude, of course, accounting for solute depletion due to precipitation by performing a prior equilibrium calculation and using the austenite composition rather than the bulk one as an input (as done by Ghosh and Olson in [5]), but this method can as easily be used with, for example, linear regression and is not an integral part of the model. Although the effects of precipitates and grain size are not expected to be large, an important correlated problem is that of solute depletion by precipitates. For example, most data for the influence of vanadium have been derived using `pure' samples, with very low carbon content. In commercial steels, a significant amount of vanadium (or Nb, Ti) will have precipitated during the austenitisation, therefore leaving an austenite of lower carbon and vanadium content than that of the bulk. As discussed later, this was also clearly visible in the assessment of the model.
Finally, making predictions requires access to expensive thermodynamic calculation software and databases.
Neural networks, in the present context, essentially refer to non-linear multiple regression tools using adaptive functions. The following section will not detail the technique (see for example [22,23,24]), but presents the fundamental differences between these methods and empirical methods such as those introduced in the first section. The typical structure of a neural network is presented in figure 3.
The hidden units (the second layer in figure 3) take as input a weighted sum of the inputs and return its hyperbolic tangent:

h_{i} = \tanh\left( \sum_{j} w_{ij} x_{j} + \theta_{i} \right)

where the w_{ij} are the weights and the \theta_{i} the biases.
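The hidden-unit computation just described can be sketched in a few lines; the weights, biases and inputs below are illustrative only, since in practice they are fitted to data during training.

```python
import math

def hidden_layer(inputs, weights, biases):
    """Each hidden unit returns tanh(weighted sum of inputs + bias).
    weights[i] is the weight vector of hidden unit i."""
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def network_output(hidden, out_weights, out_bias):
    """The network output is a linear combination of the hidden-unit
    activations plus a bias."""
    return sum(w * h for w, h in zip(out_weights, hidden)) + out_bias

# Two inputs, two hidden units, illustrative parameter values:
h = hidden_layer([0.5, -1.0], [[1.0, 0.5], [-0.3, 0.8]], [0.0, 0.1])
y = network_output(h, [2.0, -1.0], 0.5)
```

The number of hidden units, together with the magnitude of the weights, controls the complexity of the function, which is the point developed below.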
The fundamental difference between this type of regression and the methods introduced earlier is that neural networks correspond to adaptive functions. In traditional methods, the author fixes the form of the equation (for example, a second degree polynomial), and identifies the parameters that lead to optimal fitting of the observed data. Even in the few cases where the authors take the trouble to assess more than one function (for example, to determine whether a second or third degree polynomial is most appropriate), the extent to which the function is adapted to the data is very limited.
With neural networks however, the complexity of the function is mainly controlled by the weights themselves, so that the optimisation includes a determination of the most suitable shape for the function. This flexibility is not without a drawback: overfitting is the cause of most problems in neural network modelling. Overfitting occurs when an overly complex function is chosen, so that the noise, rather than the trend in the data, is fitted by the function. One method widely applied to limit overfitting is to perform the optimisation on only one part of the data, then use the second part to determine which level of complexity best fits the data. This is illustrated in figure 4.
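The complexity-control procedure just described can be sketched with polynomial regression standing in for the network (all data below are synthetic): the candidate functions are fitted on one part of the data, and the complexity retained is the one giving the lowest error on the held-out part.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic trend, split into a fitting set and a
# held-out validation set, as described above.
x = rng.uniform(-1, 1, 60)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(0, 0.2, 60)
x_fit, y_fit = x[:40], y[:40]
x_val, y_val = x[40:], y[40:]

def val_error(deg):
    """Fit a polynomial of the given degree on the fitting set and
    return its mean squared error on the validation set."""
    coeffs = np.polyfit(x_fit, y_fit, deg)
    return np.mean((np.polyval(coeffs, x_val) - y_val) ** 2)

# An overly simple model underfits; an overly complex one fits the
# noise of the fitting set and generalises poorly to the held-out set.
errors = {deg: val_error(deg) for deg in (1, 2, 8)}
best_degree = min(errors, key=errors.get)
```

The same principle applies to neural networks, with the number of hidden units and the magnitude of the weights playing the role of the polynomial degree.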
The first and still most often encountered approach consists in defining an error function and minimising it by adjusting the parameters. We will refer to this as the classical method.
Bayesian probabilities offer a far more interesting approach by which the final model not only encompasses the knowledge present in the data, but also an estimation of the uncertainty on this knowledge.
Rather than identifying optimum parameters, an optimum probability distribution of parameter values is fitted to the data. In regions of the input space where data are sparse, this distribution will be wide, indicating that a number of solutions could fit the problem with similar probabilities. If a large amount of data is available, this distribution will be narrow, indicating that one shape of function is significantly more probable than any other.
Because it can be quantified, the uncertainty on the determination of the network parameters can be translated into an uncertainty on the prediction. This is illustrated in figure 5.
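This behaviour can be illustrated with a minimal sketch of Bayesian linear regression with a Gaussian prior and known noise precision (all numerical values illustrative): the standard deviation of the predictive distribution grows away from the region covered by the data.

```python
import numpy as np

# Bayesian fit of y = w0 + w1*x. alpha is the prior precision on the
# weights and beta the (assumed known) noise precision.
alpha, beta = 1.0, 25.0
x_obs = np.array([0.1, 0.2, 0.3, 0.4])  # data clustered at low x
y_obs = 1.0 + 2.0 * x_obs

Phi = np.column_stack([np.ones_like(x_obs), x_obs])
S = np.linalg.inv(alpha * np.eye(2) + beta * Phi.T @ Phi)  # posterior covariance
m = beta * S @ Phi.T @ y_obs                               # posterior mean

def predictive_mean(x):
    """Mean of the predictive distribution at x."""
    return np.array([1.0, x]) @ m

def predictive_std(x):
    """Standard deviation of the predictive distribution at x:
    noise variance plus the weight-uncertainty contribution."""
    phi = np.array([1.0, x])
    return np.sqrt(1.0 / beta + phi @ S @ phi)

# The error bar is small inside the data range and large outside it:
print(predictive_std(0.25) < predictive_std(2.0))  # -> True
```

The same construction carries over to neural networks, where the posterior distribution is over the network weights rather than two regression coefficients.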
Whether for linear regression or neural networks, a Bayesian approach should always be preferred because it allows predictions to be accompanied by an indication of their uncertainty. Because of the flexibility of the method, this is particularly important in neural network modelling.
Using the classical method, Vermeulen et al. [1] built a neural network model for the M_{s} temperature of steels in the range of composition given in table 2. More recently, Capdevilla et al. [7] built a network using the Bayesian approach, on a much wider range of compositions. Since this model was built on a superset of the database used to train the model by Vermeulen et al., the assessment to follow did not include the latter.

Although the predictions made in [1] are accompanied by error bars, these correspond to the average error of the model over the entire training database; they can be interpreted as a level of noise, but carry no indication of the uncertainty of individual predictions.
Neural network modelling is not always perceived as a satisfactory method because of its purely empirical nature.
However, even in the thermodynamically based approaches, empirical equations lie at the heart of the method. The form of the function used by Ghosh and Olson to represent the critical driving force (equation 2) was adopted on physical grounds, as the solid solution strengthening effect scales with the square root of concentration. However, the final superposition laws (in particular the grouping of elements in subsets) adopted in both [2] and [5] are not strictly derived from considerations of the solution strengthening effect, and are thus better justified by later validation than a priori. That is to say, they are at least partly empirical. That there is no necessary link between the relative strength of elements and their grouping in subsets is made clear by the fact that different subsets were used in references [2] and [5]. In both studies, however, the relative effects of different elements are similar.
There is therefore no obvious reason to trust extrapolations using such models more than any other empirical method. Furthermore, this approach relies heavily on the CALPHAD [25] method to estimate the thermodynamic properties of complex systems. In the CALPHAD method, the extension of simple thermodynamic models (for example, regular solutions) to multicomponent systems and more complex behaviours is mostly empirical in nature, and there is once again no reason to trust their ability to extrapolate well. Interestingly, Stan et al. recently proposed to improve on the CALPHAD limitation by using a bayesian framework [26].
The flexibility of neural networks removes the need for a predetermined type of function. If a Bayesian approach is used, the technique offers the unique advantage that the level of certainty can be assessed by the user without the need to know all the details of the model derivation. To assess the validity of a prediction made using thermodynamic models, one must be aware not only of the limits of the composition range of the data used in deriving the function, but also of those of the thermodynamic database used to link temperatures and driving forces.
Finally, these models are available as self-contained programs which are freely distributed on the world wide web [27,28].
To assess the models, a computer program was interfaced with the thermodynamic calculation software MTDATA [29]. Both the original method of Ghosh and Olson [2,3] and the revised method [5,6] were implemented. As mentioned earlier, Ghosh and Olson relied, in the first case, on the SGTE SSOL database, slightly modified for Fe-Ni and Fe-Ni-C, to derive the critical driving force function. In the second case, however, significant modifications were made, resulting in a separate thermodynamic database that the authors named kMART. Unfortunately, as this database is not available, the SGTE SSOL database was used in both cases. When using the more recent model [5,6], agreement was significantly worse than with the earlier one. Clearly, this disagreement is not related to the quality of the model but to the fact that different thermodynamic databases were used.
The driving force for martensite formation can be calculated simply as the difference in chemical free energy between austenite and ferrite of identical composition. However, for low-temperature or high-carbon steels, it is important to account for the ordering of carbon in martensite [30,31]. The driving force for ordering can be calculated following Fisher [32].
Agreement between experimental and predicted M_{s} was systematically better using the original method of Ghosh and Olson [2,3] and accounting for ordering. In the following, the M_{s} predicted using the thermodynamic model therefore refers to results obtained with this particular method.
A second database (further referred to as database B) was built by the present authors using published data [2,33,34,35,36,37,38,39,40,41,11,42,9,43,8,44] which appeared not to have been used in the work of Capdevilla et al., but had been used by Ghosh and Olson to derive the critical driving force.
Predictions were made on databases A and B using both the neural network and the thermodynamic methods. To estimate the overall performance, the average of the absolute values of the errors was used, together with their standard deviation.
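These two statistics can be computed as follows; the helper name and the example values are illustrative only.

```python
import math

def error_statistics(predicted, measured):
    """Return the average absolute error and the standard deviation of
    the errors, the two overall performance measures used here."""
    errors = [p - m for p, m in zip(predicted, measured)]
    mean = sum(errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    sd = math.sqrt(sum((e - mean) ** 2 for e in errors) / len(errors))
    return mae, sd

# Hypothetical predicted and measured Ms values, in K:
mae, sd = error_statistics([410.0, 655.0, 498.0], [400.0, 650.0, 510.0])
```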
When comparing predictions and experimental values in database A (figure 6), the neural network performed significantly better, with an average absolute error of 25 K, while the thermodynamic method gave 37 K.
In both cases, the data points for which the prediction differed from the database value by more than 20% (`outliers') were investigated. The composition giving the worst prediction (measured 400 K, predicted 800 K) was particularly worrying as it was, in the case of the neural network model, not accompanied by a large error bar. However, this point turned out to be a mistake in the database (Fe-0.04C-0.08Mn wt% giving an M_{s} temperature of 400 K), so that the predicted value was actually correct. Few other points were wrongly predicted by more than 20% by the neural network model; as reported separately [45], a number of them were found to be erroneous entries in the database.
The limitations of the thermodynamic model became obvious upon examination of the `outliers'. As for the neural network model, the Fe-0.04C-0.08Mn steel appears wrongly predicted because of a mistake in the database. All of the high-Ni (Fe-Ni with more than 20 wt% Ni) steels were outliers. As explained earlier, this was expected because the database used to derive the critical driving force was a modified version of the SGTE SSOL, while the unmodified SGTE SSOL had to be used in this study.
Other outliers included most of the steels with vanadium, niobium or titanium additions and/or significant amounts of carbon, a few examples of which are reproduced in table 3.
This is again not surprising given that the model explicitly limits itself to solid solutions, while most of these steels will have carbides or nitrides remaining after austenitising.
To verify whether a better prediction could be obtained, the equilibrium constitution of these outliers was estimated using MTDATA [46], and the SGTE SSOL and substances databases. Phases allowed were cementite, mixed carbides (M_{23}C_{6}, M_{6}C, etc), tungsten carbide, niobium carbide, titanium carbide and vanadium carbide. The austenitisation temperature was taken as 1373 K (the value was not provided in the sources). As can be seen from the examples in table 3, this sometimes improved the predictions, but not systematically. One reason might be that, although the solute content of the austenite should be more realistic after this procedure, the model does not account for precipitates which may also have an effect.
In some cases, no temperature could be found that satisfied the thermodynamic criterion when the new compositions were used. This is probably due to the poor assessment, in the SGTE SSOL database, of the effect of the high Ni or high W contents sometimes present.
While this assessment outlines clearly the limitations of the thermodynamic model, it does not represent a good test of the validity of the neural network model by Capdevilla et al. since database A was used to train this model.
Results obtained using the database created by the present authors are presented in figure 7. A global comparison gives an average absolute error of 210 K for the neural network and 116 K for the thermodynamic model. This is essentially caused by a few `wild' predictions from the neural network model, whose output is not bounded, while the thermodynamic model is by design limited to errors of at most 1000 K, the width of the interval in which the program searches for the temperature satisfying the criterion. These `wild' predictions were accompanied by very large error bars and therefore should not be considered `dangerous'.
To incorporate the existence of the error bars in the comparison, the predictions made with the neural network software were divided into two subsets, depending on whether the error bar accompanying the prediction was smaller (subset I) or larger (subset II) than a threshold value in K. It must be emphasised that this does not yet involve any comparison with the experimental data, only the use of the uncertainty estimation described in section 4.2.
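The subsetting uses only the model's own error bars, never the measured values, which a short sketch makes explicit (the field names and the threshold are hypothetical):

```python
def split_by_error_bar(predictions, threshold):
    """Divide predictions into subset I (error bar below the threshold)
    and subset II (error bar at or above it). Only the model's own
    uncertainty estimate is consulted, not the experimental data."""
    subset_i = [p for p in predictions if p['error_bar'] < threshold]
    subset_ii = [p for p in predictions if p['error_bar'] >= threshold]
    return subset_i, subset_ii

# Illustrative predictions (Ms and error bar in K):
preds = [{'Ms': 650.0, 'error_bar': 20.0},
         {'Ms': 420.0, 'error_bar': 150.0}]
subset_i, subset_ii = split_by_error_bar(preds, 50.0)
```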
Using subset I, the neural network gave an average absolute error of 40 K; on the same subset, the thermodynamic method gave 102 K. On subset II, the neural network gave 1055 K, while the thermodynamic method gave 187 K. Interestingly, the thermodynamic model also performed significantly worse on subset II. In the context of predictions, the behaviour of the Bayesian neural network is clearly the most appropriate: both methods wrongly predict a number of points, but only the former is able to warn the user about the reliability of the prediction.
As for database A, the entries which led to relative errors of more than 20% were further examined. The outliers were similar for both models, being the high-Ni steels (20 wt% and more) and the high-nitrogen steels. It has been explained earlier that the thermodynamic model is expected to perform poorly on high-Ni steels. The neural network was trained on a database including a few Fe-30Ni-C alloys, but no other high-Ni data. Consequently, new data for Fe-Ni-C alloys were well predicted, but data on Fe-Ni-X alloys (where X is Mo, V, etc., but not carbon) were not.
Similarly, the neural network failed to predict correctly the M_{s} of high-nitrogen steels, because the database used for training only contained nitrogen contents significantly below the solubility limit, while the data introduced in database B included nitrogen contents largely in excess of it. The neural network method makes it possible, in principle, to model influences which depend on the actual value of the input parameter, so that retraining the model on both the low and high nitrogen contents should improve it significantly. The thermodynamic model is based on the assumption that elements are in solid solution, so its failure to predict correctly the M_{s} of high-nitrogen steels using the bulk composition is not surprising.
The thermodynamic method provides satisfactory results, as long as it is used within boundaries compatible with the fundamental assumptions upon which it was built. That is to say, one must be careful not to attempt calculations where additions are beyond the solubility limit.
This method is particularly interesting as it allows the influence of alloying elements on phase stability to be treated separately from their effect on the propagation of the semi-coherent interface. However, as explained earlier, the link between the solid solution strengthening effect of elements and their effect on martensite nucleation is not strongly supported by the analysis of Ghosh and Olson.
Although the fully empirical approach does not allow the separation of the different roles of alloying additions, it is able to incorporate any effect these might have, whether constant or dependent on their own concentration, as long as the knowledge is somehow present in the database.
The neural network method was found to perform at least as well as the thermodynamic approach (on database B, in as much as predicting 2000 K or 0 K for an actual M_{s} of 400 K is equally useless), but a number of improvements could nevertheless be proposed, and the authors have trained a new model whose performance is significantly improved [45].
Because of the risk of wild predictions, neural network methods should not be relied upon unless a Bayesian framework is used.