Phase ID Discussion Paper 7

Dear Colleagues,

     This email contains PhaseID Discussion Paper #7.  The
previous discussion paper was the first draft of our report to
the IUCr Commission on Crystallographic Nomenclature (CCN) that
was circulated just before Christmas.  Rather than circulate
another complete draft of the report, most of which would be
repetitive, I focus here on the items that need to be resolved
before the final draft report is prepared for discussion and

     Can you please respond by MARCH 26, after which I will put
together a final draft of our report to the IUCr Commission on
Crystallographic Nomenclature.

     The Discussion Paper follows.

                    David Brown

I.David Brown, Professor Emeritus of Physics
Brockhouse Institute for Materials Research
McMaster University, Hamilton, Ontario
Canada L8S 4M1
Tel: +905 525 9140 ext 24710
Fax: +905 521 2773
email: idbrown@mcmaster.ca

                 DISCUSSION PAPER # 7

  This document contains the following sections:

1. Comments received on the first draft of the report
2. Construction of the IUPAC chemical identifier (IChI)
3. Proposed additions to IChI for phase identification
4. Incorporation of the phase identifier into the IUCr-CCN Phase
              Transition Symbol
5. The use of the phase identifier in databases

I received comments from Pierre Villars and Sidney Abrahams in
response to the first draft of our report.  These are included
here, organized by topic, with my (IDB) response as necessary.

  I can fully agree with your chapter:  6. Recommendations
Layer 5. State of matter: gas, liquid, crystal etc.
Layer 6. The space group number
Layer 7. Wyckoff Sequence

      I am in full agreement with your proposal that a unique
comprehensive identifier for each chemical compound be formed by
adding the crystal phase identifier to the IChI chemical
identifier. In reading your first draft, however, it is striking
that no mention is made of the proposed method(s) of implementing
such a system, possibly because they seem obvious to the

IDB response:
  The reason this information was missing in the first draft
report is two-fold.  I first wanted to get agreement on the
principles before going into technical details, and secondly I
did not at the time have a clear picture of the actual structure
of IChI, since there is not yet a document giving a formal
description.  This detail has now been added and is given below.

    May I start by suggesting WG members may find the July 2002
presentation on preliminary thinking about IChI to be of
interest, as given at:


although the summary in your Section 5 shows that considerable
progress has been made since that conference. However, members
may well wish to see further details of the results agreed upon
during the IChI workshop at NIST in November 2003. Are these
expected to become available soon?

IDB Response:
  I have not had any word about when a report will be
forthcoming on the November meeting.  However, details of IChI
relevant to our work are given below.

    I agree with your proposal to add three crystallographic
layers to the four IChI chemical layers. The choice between
single and multiple letter codes depends upon the answers given
to the questions above [these questions can be found in section
1.5 below, IDB].  I also agree with use of the space group number
for layer 6 and, if necessary, with the Wyckoff multiplicity and
letters in layer 7.  I doubt if use of the Bravais symbol in the
identifier would be of value.

No, they [the Bravais symbol and reduced cell] are not needed.
There are quite a few cases with the same composition, the same
space group number and the same Wyckoff Sequence (after
standardization with STIDY and COMPARE). Niggli's reduced cell
does not help to distinguish such cases further. One possibility
is to add to each point-set its Atomic Environment Types (AETs,
i.e. coordination polyhedra); see e.g. the references:
- J.L.C. Daams et al., J Alloys and Compounds, 1997, 252,110-142
- J.L.C. Daams et al., J Alloys and Compounds, 1994, 215,1-34
- J.L.C. Daams et al., J Alloys and Compounds, 1993, 197,177-196
- J.L.C. Daams et al., J Alloys and Compounds, 1993, 197,243-269
- J.L.C. Daams et al., J Alloys and Compounds, 1992, 182,1-33

IDB comment:
  At this point I don't think it is necessary to define the
AET as an additional layer but Pierre may disagree.  If during
the course of use ambiguities are found, the working group could
reconvene to discuss the need for a further layer.

  Since there were no other objections to the use of three
additional layers, I am assuming that everyone else is in
agreement and we can move on to the next step (see below).

DAVID BROWN (earlier comment):
  The information given in the IUCr-CCN Phase Transition
Nomenclature includes:
1. the common symbol used to identify this phase (e.g., alpha,
II, etc.),
2. the temperature (and pressure) range in which it is stable,
3. the Hermann-Mauguin symbol and number of the space group (more
than one space group may be given, or the Bravais symbol may be
given if the space group is not known),
4. Z, the number of formula units in the conventional unit cell
(though the formula unit is not defined within the symbol),
5. the ferroic properties and
6. the structure type.

   Addition of the comprehensive IChI identifier in a new field,
probably the leading field, in the CCN phase nomenclature [see
Acta Cryst. (2001). A57, 614-626 and Acta Cryst. (1998). A54,
1028-1033] would be appropriate in database compilations. It
would probably be inappropriate elsewhere.

  If the structure type assignment is properly done (after
standardization with STIDY and COMPARE), and each prototype is
defined by a unique combination of Space Group Number/Wyckoff
Sequence/AETs, then everything is included in item 6, the
structure type.

IDB response:
  Pierre is referring to the formal structure type as defined
in the Pauling file.  The structure type defined as item 6 in the
CCN nomenclature can be any description chosen by the user and is
therefore not suitable for machine searching.  A new field for
IChI would be better, but whether it should appear first or last
in the sequence, and what format it should have, are questions
best addressed by those who were on the working group that
defined this symbol.  See the proposed discussion for our report
given in section 4 below.

  Yes, it [the phase identifier as proposed in the first
draft] is acceptable. The Pauling File has already included this

  However, a number of questions are likely to arise in
reading our Report and I suggest it would be of value to our
readers if it contained a section that addressed these and
related issues so that our recommendations are set in their
fullest context.  These issues include the following:
  1. Once a unique identifier system has been agreed, must it
be reduced to a single algorithm to avoid the introduction of
variant identifiers?
  2. If the latter is the case, then would it be advantageous
to state or merely refer to the algorithm? 
  3. Must each database adapt the algorithm to match its
specific contents or is that the responsibility of the user?
To the extent possible, the new section should respond to these
and similar questions.

IDB response:
  These questions are now addressed in Section 5 below which
is a draft section for our report.

     In the example of a material with a single crystal form,
OsI3, I note it is not listed in the ICSD. Perhaps a better
choice should be made?

IDB response:
  This was taken from the Pauling file.  I was looking for a
non-trivial example of a compound in which only one phase was
known.  This is not simple to find.  The Pauling file allowed me
to perform that kind of search but there are not many examples -
even NaCl is known in two phases under different conditions.

  With the acceptance of the idea of introducing additional
layers into IChI to identify crystallographic phases, we are now
ready to discuss how this should be implemented.  I first give a
description of IChI as it now exists, before making
recommendations for the structure of the additional layers.

The following is an example of an IChI: the slash, /, is used to
separate the layers.


The following is an explanation of the above IChI as far as I can
figure it out.  The important items are the first three or four
which are easy to understand - the remainder in this example deal
with a description of the stereochemistry and will not frequently
be used.  Most if not all of the work done in developing IChI has
focused on organic molecules and resolving isomers, tautomers and
enantiomers.  The connectivity of infinite structures has not yet
been addressed.  This should not present a problem for devising
an IChI for phase identification because if the composition and
the space group are given (the two essential layers for any phase
identification), the connectivity is not usually needed.

1.00Beta/                           # Version of IChI
C6H9N3O3/                           # Sum formula
CT:7-4(10)1-2(5(8)11)3(1)6(9)12/    # Basic connectivity
H:1-3H,(H2,7,10)(H2,8,11)(H2,9,12)/ # Hydrogen connectivity
SC:1-,2-,3-/                        # Stereocenters, sp3
I:(1D)/                             # Isotopes (H1 is deuterium)
SC:m/                               # ?
is:0/                               # Inverted stereo (absolute
                                      stereo only)
ST:abs                              # Abs (absolute), rel
                                      (relative) or rac (racemic)

All but the first two items (which are required) are introduced
by one of the tags listed below:

"CT:";  /* connectivity */
"H:";   /* H-atoms */
"C:";   /* charge */
"DB:";  /* double bond stereo */
"SC:";  /* stereo centers sp3 */
"is:";  /* mark sp3 inverted stereo */
"SR:";  /* mark sp3 racemic stereo */
"ST:";  /* abs, rel, rac */
"I:";   /* isotopic atoms */
"fH:";  /* fixed H -- first item in non-taut */
"N:";   /* orig. at numbers in canonical order */
"NT:";  /* non-tautomeric orig. at numbers */
        /* in canonical order -- first item */
        /* in non-tautomeric aux info */
"E:";   /* atoms equivalence */
"tE:";  /* tautomeric groups equivalence */
"iC:";  /* inverted (stereo) Centers */
"iN:";  /* inverted sp3 stereo orig. atom */
        /*     numbers in canonical order */
"NI:";  /* isotopic orig. at numbers in */
        /* canonical order */
        /* first item in isotopic aux info */
"TR:";  /* transposition of components in */
        /* non-tautomeric representation */
"CRV:"; /* charges, radical, valence*/
"XYZ:"; /* xyz-coordinates */

An XML version of IChI has also been defined, but this is a
straightforward coding of the text version described above.  It
is highly verbose and somewhat opaque.  It is designed for
computers and is best left for computers to read.  CIF versions
of IChI would use the canonical form shown above.
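
The canonical text form can be split mechanically into its layers.
The following is a minimal sketch in Python, not part of any IChI
software: the function name parse_ichi is hypothetical, and the test
string is simply the layers listed above joined with slashes, which
is an assumption about how the complete identifier would look.

```python
def parse_ichi(identifier):
    """Split a slash-separated IChI string into version, formula and tagged layers."""
    parts = [p for p in identifier.strip().split("/") if p]
    if len(parts) < 2:
        raise ValueError("an IChI needs at least a version and a sum formula")
    version, formula = parts[0], parts[1]
    # A list of (tag, value) pairs is used rather than a dict because a tag
    # can occur more than once (SC appears twice in the example above).
    layers = []
    for part in parts[2:]:
        tag, sep, value = part.partition(":")
        if not sep:
            raise ValueError(f"layer without a tag: {part!r}")
        layers.append((tag, value))
    return version, formula, layers


# Assumed full string: the layers from the listing above, concatenated.
example = ("1.00Beta/C6H9N3O3/CT:7-4(10)1-2(5(8)11)3(1)6(9)12/"
           "H:1-3H,(H2,7,10)(H2,8,11)(H2,9,12)/SC:1-,2-,3-/I:(1D)/SC:m/is:0/ST:abs")
version, formula, layers = parse_ichi(example)
```

The first two items are returned separately because they carry no
tag; everything else is kept in the order in which it appears.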

I assume that the chemical element symbols are case sensitive so
as to distinguish between CO and Co, but it may be that the
current testing of IChI has not extended to element symbols
composed of two letters.

The remaining text in this document is designed to be part of the
final report.  Comments that are not part of the report are
indicated by text enclosed between ******* strings.

The following is a list of additional tags required for phase
identification expressed in the form of an IChI.  These would be
used in conjunction with existing IChI tags, in particular the
IChI version number and the composition:

"PH:"  /* phase or state of matter. Allowed values are: */
       /* gas, liq, amp, sol, xtl, lxl, qxl */
"SG:"  /* Space group number, integers between 1 and 230 */
"WS:"  /* Wyckoff sequence, any lower case letter */
       /* or & (for alpha) */

COMPOSITION: The composition layer in IChI for a crystalline
phase must give the contents of the formula unit of the crystal.
This is a unit in general no smaller than the crystallographic
asymmetric unit and no larger than the primitive unit cell.  It
is NOT the same as the formula of the molecule of interest unless
the molecule is the only component of the crystal.  Other
components, including solvents of crystallization, must be
explicitly included.  Wherever possible the formula unit is
chosen so that the multipliers of the elements are integers with
no common divisor, but this is not always possible.  In cases
where one or more of the multipliers is non-integral, the size of
the formula unit is indeterminate and only the relative
multipliers are meaningful.  Testing should be carried out in
this case by normalizing the multipliers, e.g., by converting the
largest multiplier to 1.00 and the others in proportion.  When
non-integral multipliers are encountered, searches should include
a tolerance factor to allow for experimental uncertainties or to
retrieve related compounds of the same phase having a similar but
not identical composition.  The tolerance should be large enough
to recognize that phase identifiers that include trace elements
are equivalent to identifiers in which the trace elements have
been omitted either because they were not determined or because
they were not considered to be important.
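
The normalization and tolerance test described above might be
sketched as follows.  This is an illustration only: the function
names, the flat (bracket-free) formula syntax and the tolerance of
0.02 are all assumptions, not values proposed by the working group.

```python
import re


def parse_formula(formula):
    """Return {element: multiplier} for a flat sum formula such as 'TiO2' or 'Fe0.95O'.

    Element symbols are case sensitive, so 'CO' and 'Co' are different.
    """
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    return counts


def normalize(counts):
    """Scale the multipliers so that the largest one becomes 1.00."""
    largest = max(counts.values())
    return {element: n / largest for element, n in counts.items()}


def same_composition(a, b, tol=0.02):
    """Compare two formulas after normalization, within an assumed tolerance.

    An element missing from one formula is treated as having multiplier 0,
    so a trace element below the tolerance does not prevent a match.
    """
    na, nb = normalize(parse_formula(a)), normalize(parse_formula(b))
    return all(abs(na.get(el, 0.0) - nb.get(el, 0.0)) <= tol
               for el in set(na) | set(nb))
```

With these definitions same_composition("TiO2", "Ti2O4") and
same_composition("Fe0.95O", "Fe0.94O") both succeed, while
same_composition("TiO2", "TiO") fails.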

PH:  This layer gives the phase or state of matter.  Seven flags
are defined. Others could be formally added to this list if a
need is demonstrated.

          gas  gas
          liq  liquid
          amp  amorphous 
          sol  solid of unknown form
          xtl  crystal
          lxl  liquid crystal
          qxl  quasi-crystal

Only if the value of PH is 'xtl' will the following two layers be

SG:  This is a number between 1 and 230 inclusive, being the
number of the space group of the crystal as given in
International Tables for Crystallography Vol A.  The following
space group pairs are identical except for their chirality:
76=78, 91=95, 92=96, 144=145, 151=153, 152=154, 169=170, 171=172,
178=179, 180=181, 212=213.  The chirality is often not determined and is
only significant if the crystal contains a chiral molecule.
Since molecular chirality is already described elsewhere in IChI,
only the lower space group number of each pair should be used.
However, one of the forbidden numbers may be inadvertently used
and software should be prepared to convert it to its legal
equivalent.  There are many cases where the true space group is
not known, or the structure is incommensurate.  Different
approximate space groups might be assigned by different workers,
in which case a valid match would be missed, but there seems
little that can be done to overcome this situation.
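
The screening for forbidden space group numbers could be done with
a small lookup table.  The sketch below is an assumption about how
software might do this; the names are hypothetical.  The table holds
the eleven enantiomorphic pairs of International Tables Vol. A
(including 151/153 and 152/154), each mapped to its lower number.

```python
# Higher-numbered member of each enantiomorphic pair -> lower-numbered member.
ENANTIOMORPH_TO_LOWER = {
    78: 76, 95: 91, 96: 92, 145: 144, 153: 151, 154: 152,
    170: 169, 172: 171, 179: 178, 181: 180, 213: 212,
}


def canonical_space_group(number):
    """Return the space group number to be used in the SG layer."""
    if isinstance(number, bool) or not isinstance(number, int):
        raise ValueError("space group number must be an integer")
    if not 1 <= number <= 230:
        raise ValueError("space group number must lie between 1 and 230")
    # Numbers that are not in the table are already legal and pass through.
    return ENANTIOMORPH_TO_LOWER.get(number, number)
```

For example, canonical_space_group(78) gives 76, while
canonical_space_group(136) is returned unchanged.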

WS:  The Wyckoff sequence is an alphabetic list of the Wyckoff
symbols (letters) of the occupied special positions, with each
letter followed by the number of crystallographically distinct
atoms that occupy the site if this number is different from 1.
International Tables for Crystallography Vol. A lists the Wyckoff
letters for all special positions, that is, all sites having a
crystallographically distinct site symmetry.  Before determining
the Wyckoff sequence, the structure must be normalized according
to the algorithm used in the program STRUCTURE TIDY, details of
which are given in Parthe, E., Gelato, L.M. (1984). Acta
Crystallogr. A40, 169-183; Parthe, E., Gelato, L.M. (1985). Acta
Crystallogr. A41, 142-151; and Gelato, L.M., Parthe, E. (1987).
J. Appl. Crystallogr. 20, 139-143.  The allowed letters in this
layer include all the lower case letters (as defined in the ASCII
coding) and the character '&' representing the Greek letter alpha
which appears in space group 47.
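
Once the structure has been standardized, forming the Wyckoff
sequence is a matter of counting letters.  The sketch below assumes
that the input is a list with one Wyckoff letter per
crystallographically distinct atom, already taken from a structure
normalized by STRUCTURE TIDY (the normalization itself is not
reproduced here).  Placing '&' after 'z' in the ordering is also an
assumption, as is the function name.

```python
from collections import Counter

ALLOWED_LETTERS = set("abcdefghijklmnopqrstuvwxyz&")


def wyckoff_sequence(site_letters):
    """Build the value of the WS layer from the letters of the occupied sites."""
    counts = Counter(site_letters)
    illegal = set(counts) - ALLOWED_LETTERS
    if illegal:
        raise ValueError(f"illegal Wyckoff letters: {sorted(illegal)}")
    # Alphabetic order, with '&' (alpha) sorted after 'z' by assumption.
    ordered = sorted(counts, key=lambda letter: (letter == "&", letter))
    # The count is written only when it differs from 1.
    return "".join(letter + (str(counts[letter]) if counts[letter] != 1 else "")
                   for letter in ordered)
```

A hypothetical structure with one atom on site a and two distinct
atoms on site f would give wyckoff_sequence(["f", "a", "f"]) == "af2".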

Rutile    IChIversio-x/TiO2/PH:xtl/SG:136/WS:af2
******* We could use some additional examples, particularly of
organic crystals ************
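
As an illustration of how the three new layers fit together, the
following sketch assembles an identifier from its parts.  The
function name is hypothetical, and the rule that SG and WS are only
accepted when PH is 'xtl' is my reading of the PH description above.
The version string is the same placeholder as in the rutile example.

```python
PHASE_FLAGS = {"gas", "liq", "amp", "sol", "xtl", "lxl", "qxl"}


def phase_identifier(version, formula, phase, space_group=None, wyckoff=None):
    """Join version, composition and the PH, SG and WS layers with slashes."""
    if phase not in PHASE_FLAGS:
        raise ValueError(f"unknown PH flag: {phase!r}")
    layers = [version, formula, f"PH:{phase}"]
    if phase != "xtl":
        if space_group is not None or wyckoff is not None:
            raise ValueError("SG and WS layers are only allowed when PH is 'xtl'")
        return "/".join(layers)
    if space_group is not None:
        if not 1 <= space_group <= 230:
            raise ValueError("space group number must lie between 1 and 230")
        layers.append(f"SG:{space_group}")
    if wyckoff is not None:
        layers.append(f"WS:{wyckoff}")
    return "/".join(layers)


rutile = phase_identifier("IChIversio-x", "TiO2", "xtl", 136, "af2")
```

The variable rutile then holds the string given in the example
above; a call such as phase_identifier("IChIversio-x", "H2O", "liq")
stops after the PH layer.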

  The IUCr-CCN Phase Transition Symbol [Acta Cryst. (2001).
A57, 614-626 and Acta Cryst. (1998). A54, 1028-1033] is composed
of six fields defined as follows:

1. the common symbol used to identify this phase (e.g., alpha,
II, etc.),
2. the temperature (and pressure) range in which it is stable,
3. the Hermann-Mauguin symbol and number of the space group (more
than one space group may be given, or the Bravais symbol may be
given if the space group is not known),
4. Z, the number of formula units in the conventional unit cell
(though the formula unit is not defined within the symbol),
5. the ferroic properties and
6. the structure type.

The formats of the fields in this symbol are not tightly
structured and may contain non-ASCII characters as the symbol was
not intended for computer use.  Given the different purposes and
structure of the IUCr-CCN Phase Transition Symbol and IChI, it is
arguable whether any purpose is served by incorporating the IChI
Phase Identifier into the IUCr-CCN Phase Transition Symbol.
However, the complete IChI Phase Identifier symbol could be
included as one or more additional fields.  Because the
IUCr-CCN Phase Transition Symbol and the canonical form of the
IChI Phase Identifier both use slashes as field separators, the
IChI Phase Identifier must either be incorporated as a series of
different fields, or the slash separator in the IChI Phase
Identifier must be converted to some other symbol. 
******** Members of the present working group who also served on
the CCN Phase Transition Symbol Working Group are invited to
suggest the best way in which this could be done in the spirit of
the original symbol.  Otherwise we can leave the matter
unresolved in our final report *******************
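
If the second option (converting the slash) were chosen, the
conversion would need to be reversible.  The sketch below uses '|'
as the replacement character; this choice is purely hypothetical and
is not a recommendation, and the function names are invented for
the illustration.

```python
def embed_in_field(identifier, separator="|"):
    """Replace the IChI slash separator so the identifier fits in one field."""
    if separator == "/":
        raise ValueError("the replacement separator must differ from '/'")
    if separator in identifier:
        # Otherwise the original layers could not be recovered unambiguously.
        raise ValueError("separator already occurs in the identifier")
    return identifier.replace("/", separator)


def restore_from_field(field, separator="|"):
    """Recover the canonical slash-separated form."""
    return field.replace(separator, "/")


field = embed_in_field("IChIversio-x/TiO2/PH:xtl/SG:136/WS:af2")
```

Here field contains no slash and restore_from_field(field) returns
the original identifier.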

Since the IChI Phase Identifier is parsable, each of the layers
can be reformatted in any way that suits the needs of a
particular database.  Most crystallographic databases will
already have fields containing the sum formula and the space
group number, and adding a field for the Wyckoff sequence should
present no difficulty.  The 'state of matter' field, PH, would
not need to be present since it must have the value 'xtl' if the
phase is in a crystallographic database. ******** Is this true
for the Protein Data Bank? ********  Software designed to search
the database for examples of a target phase would need to extract
from the database the information identified in this and other
IChI documents.  Even if the IChIs are given in their canonical
form, they must still be parsed and compared layer by layer,
since two different identifiers may not contain the same number
of layers, or the search may not be carried out at its full depth
if, for example, chirality or isotopic content were not

All the proposed fields can be searched by looking for identical
bit sequences, except for the SG field which should be screened
for illegal numbers, and the composition field in cases where
non-integral multipliers are given.  In the latter case, the
composition must be normalized as discussed above and compared
with a predetermined tolerance.
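
A layer-by-layer comparison of two phase identifiers might look like
the sketch below.  It is an illustration under stated assumptions:
the function names are hypothetical, the version is not compared,
the composition is compared as an exact string (the tolerance test
for non-integral multipliers is left out for brevity), and a layer
that is absent from either identifier is skipped so that the search
runs at the depth both identifiers share.

```python
# Higher-numbered member of each enantiomorphic pair -> lower-numbered member.
ENANTIOMORPH_TO_LOWER = {
    78: 76, 95: 91, 96: 92, 145: 144, 153: 151, 154: 152,
    170: 169, 172: 171, 179: 178, 181: 180, 213: 212,
}


def split_layers(identifier):
    """Return a dict of layers; a repeated tag keeps only its last value."""
    parts = identifier.split("/")
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        tag, _, value = part.partition(":")
        layers[tag] = value
    return layers


def canonical_sg(value):
    """Screen an SG value for a forbidden number and return its legal equivalent."""
    number = int(value)
    return ENANTIOMORPH_TO_LOWER.get(number, number)


def phases_match(a, b, tags=("PH", "SG", "WS")):
    """Compare composition and the phase layers present in both identifiers."""
    la, lb = split_layers(a), split_layers(b)
    if la["formula"] != lb["formula"]:
        return False
    for tag in tags:
        if tag not in la or tag not in lb:
            continue  # layer missing on one side: compare at reduced depth
        va, vb = la[tag], lb[tag]
        if tag == "SG":
            va, vb = canonical_sg(va), canonical_sg(vb)
        if va != vb:
            return False
    return True
```

With this sketch an identifier that stops at the SG layer still
matches one that also carries a WS layer, and SG:78 matches SG:76.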

-----end of file-------end of file-----------end of file-----

phase-identifiers mailing list


Copyright © International Union of Crystallography
