
further proposal

Dear Colleagues,

     Thank you, Pierre, for your description of the phase
identifier used in the PAULING project.  You have obviously had
some experience with the problem and we would do well to build on
that experience.  In this email I comment on your suggestions,
make a new proposal and test the proposal on the phases in the
Pb-Sb-S system.

     You outline 5 fields in your symbol which are similar to the
six fields proposed in our earlier email.  In the following text
I include your descriptions as indented paragraphs.

     1.   We introduced and used during the last 7 years the following
     'Phase  Identifier', covering non-organic ordered single
     phase materials:
          i) Chemical System (alphabetically sorted), e.g. Al-Ge-Pb

This is a form of giving the composition, but without specifying
the relative abundances of the different elements.  One way in
which this could be made more specific would be to order the
elements in decreasing order of their frequency in the chemical
formula.  Where two elements have the same frequency the ordering
would be alphabetic.  Here are some examples:

     Formula             Abundance           Alphabetically
                         ordered             ordered
     Na2SO4              O-Na-S              Na-O-S
     Na2SO4 doped with K O-Na-S-K            K-Na-O-S
     VO                  O-V                 O-V
     V2O3, VO2, V2O5     O-V                 O-V
     Mg3Al2Si3O12        O-Mg-Si-Al          Al-Mg-O-Si
     Mg3Al2(SiO4)3       O-Mg-Si-Al          Al-Mg-O-Si
     POCl3               Cl-O-P              Cl-O-P

Giving them in abundance order would increase the information
content and would allow searches for close matches as well as
exact matches.  Possible close matches would include: a) the same
elements present in any order; b) ignoring minor constituents by
matching only the elements in the shorter of the two strings (in
the list above O-Na-S-K could be considered a match with O-Na-S,
which would be difficult to do using alphabetic ordering);
c) interchanging two adjacent elements, useful when a given phase
has a range of compositions.
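
As a rough illustration, the abundance ordering and the three kinds
of close match could be sketched in Python as follows.  The function
names are hypothetical, and the parser is an assumption that only
handles flat formulae without brackets (so Mg3Al2Si3O12 but not
Mg3Al2(SiO4)3).

```python
import re

def parse_flat_formula(formula):
    """Parse a bracket-free formula such as 'Na2SO4' into {element: count}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    return counts

def chemical_system(counts):
    """Elements in decreasing abundance; ties are broken alphabetically."""
    return "-".join(sorted(counts, key=lambda el: (-counts[el], el)))

def same_elements(a, b):
    """Close match (a): the same elements present in any order."""
    return set(a.split("-")) == set(b.split("-"))

def prefix_match(a, b):
    """Close match (b): compare only as many elements as the shorter string has."""
    x, y = a.split("-"), b.split("-")
    n = min(len(x), len(y))
    return x[:n] == y[:n]

def adjacent_swap_match(a, b):
    """Close match (c): identical, or differing by one swap of adjacent elements."""
    x, y = a.split("-"), b.split("-")
    if len(x) != len(y):
        return False
    if x == y:
        return True
    for i in range(len(x) - 1):
        if x[:i] + [x[i + 1], x[i]] + x[i + 2:] == y:
            return True
    return False
```

With these definitions chemical_system(parse_flat_formula("Na2SO4"))
gives O-Na-S, and prefix_match("O-Na-S-K", "O-Na-S") is true whereas
the alphabetic strings K-Na-O-S and Na-O-S do not match.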

     2.   ii) Structure Type (using the standardization program STIDY
     and the concept  used in Gmelin's TYPIX HBs), e.g.  CsCl
     type,  Al4Ba type

This could be seen as an extension of item b in the earlier
proposal, which gives the phase type (liquid, amorphous, etc.),
except that 'xtl' (crystal) could be replaced with one of the
standard structure types where applicable.  An enumeration list
of allowed values for this field would be needed.  Clearly not
all structures can be classified into a particular structure type
and perhaps a value, say 'oxtl' (for other crystal type), would
be needed for a crystal that cannot be assigned to one of the
enumerated structure types.  'xtl' would continue to apply to any
crystalline compound and would match any of the structure types
as well as 'oxtl'.
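
A minimal sketch of this matching rule is given below.  It assumes
that every value other than the non-crystalline codes is either
'xtl', 'oxtl' or a structure type name; the code 'amor' for amorphous
material and the function names are hypothetical, since no
enumeration list has been agreed.

```python
# Codes for non-crystalline phases.  'liq' is used in this proposal;
# 'amor' is a hypothetical placeholder for amorphous material.
NON_CRYSTALLINE = {"liq", "amor"}

def is_crystalline(phase_type):
    """Anything that is not a non-crystalline code counts as crystalline."""
    return phase_type not in NON_CRYSTALLINE

def phase_types_match(a, b):
    """'xtl' matches any crystalline value; everything else must agree exactly."""
    if a == b:
        return True
    if a == "xtl":
        return is_crystalline(b)
    if b == "xtl":
        return is_crystalline(a)
    return False
```

Under these assumptions 'xtl' matches 'NaCl' and 'oxtl', but 'NaCl'
does not match 'oxtl' or 'CsCl', and 'xtl' does not match 'liq'.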

     3.   iii) Pearson Symbol (using Pearson's definition, but
     replacing A, B, C by S  (side-centered))
          iv) Space group number

The earlier proposal used the crystal system, the space group
number and the atom count in the unit cell as separate items
equivalent to the Pearson symbol.  The lattice centring is
redundant if the space group number is given.  I omitted this
item because of the danger that it can be misassigned.  For
example the lattice centring symbol of space group 2 is P even if
the author describes the space group as I-1, but someone not
fully familiar with space group theory might be tempted to assign
a value of I.  The danger also arises if the identifier is
assigned by an unsophisticated computer program.

You have chosen to use S for single side-centred cells (and this
is logical) but we should follow the current International Tables
convention which is to use E.  I agree that listing the number of
occupied sites, i.e., treating them all as if their occupancy is
1.00, is better than using the actual cell contents as it should
always be an unambiguous integer.  When calculating the number of
occupied sites we will need to decide whether to adopt the
hexagonal or rhombohedral setting for rhombohedral space groups.
Good arguments can be given for either choice.  My suggestion is
to use the hexagonal cell as this corresponds to a choice of an R
centred cell.

     4.   v) Formula + modification as unique name within a chemical
     system (to make  it computer friendly we gave each
     combination within a chemical system a number)

In the earlier draft proposal the unique phase identifier of the
kind described above was described as an 'external' identifier
because it has to be assigned by an external agency, while
internal identifiers are those that can be assigned from
properties internal to the material.  The choice of the names
'external' and 'internal' was, perhaps, unfortunate as you might
regard an internal identifier as one used internally within a
data center.  The difficulty in using such an arbitrary
identifier is finding an agency willing to maintain a registry of
such numbers in the public domain.  In the first instance we were
interested in seeing if we could manage without such an arbitrary
identifier.  Otherwise we would have to find an agency (or
agencies) willing to maintain the phase identifier list.

     5.   The most ambiguous item is v), but it was necessary to
     introduce it.  There we use a standardized way to sort the
     chemical elements, and the most often occurring groups
     [SO4], ....

I agree.  Items i to iv will give many false matches because they
are not sufficient to identify a phase uniquely.

Special symbols for complex ions such as SO4 must be used with
care.  A complex such as S2O7 might be described as
O(SO3)2 where S2O7 and SO3 might both be named complexes.  My
preference would be to avoid any structural chemical
interpretation as these depend so much on the approach of the
person making the description (see the two alternative formulae
for garnet given in the table above).  Such a scheme can be made
to work within a data center where there is someone to enforce
conformity in ambiguous cases, but our identifier must follow
rules that permit of only one possible construction. Perhaps
something like the sum formula could be used.  My earlier
proposal was to use a reduced sum formula, recognizing that the
formula does not change by multiplying all the abundances by the
same factor, e.g., HgCl and Hg2Cl2. The solution might be to
allow any sum formula to be used but require the searching
algorithm to match only the relative abundances with some user-
determined tolerance allowed (see the example below).  This was
the intent of the earlier complex set of rules for writing the
chemical formula.
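
One possible form of such a matching rule is sketched below, assuming
that the formula has already been parsed into element counts and that
the tolerance is applied to the atomic fraction of each element.  The
function names and the default tolerance are illustrative only.

```python
def atomic_fractions(counts):
    """Convert {element: count} into {element: fraction of all atoms}."""
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}

def formulas_match(a, b, tolerance=0.01):
    """True if both formulae contain the same elements and every atomic
    fraction agrees to within the user-chosen tolerance."""
    if set(a) != set(b):
        return False
    fa, fb = atomic_fractions(a), atomic_fractions(b)
    return all(abs(fa[el] - fb[el]) <= tolerance for el in fa)

# HgCl and Hg2Cl2 have identical relative abundances.
print(formulas_match({"Hg": 1, "Cl": 1}, {"Hg": 2, "Cl": 2}))

# Two reported boulangerite compositions: the result depends on the tolerance.
ideal = {"Pb": 5, "S": 11, "Sb": 4}
refined = {"Pb": 4.82, "S": 11, "Sb": 4.11}
print(formulas_match(ideal, refined, tolerance=0.01))
print(formulas_match(ideal, refined, tolerance=0.001))
```

Because only fractions are compared, no reduction of the sum formula
to smallest integers is needed before searching.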

Writing the formula for organic crystals presents its own
problems.  Would someone from CCDC or elsewhere with experience
in organic structures like to comment?

     6.   p.s.  I have attached a file with the chemical element
     (functional group)  sorted used to formulate the unique
     formula (please install first the MPDS  font to see the
     numbers as subscripts)

This file ended up appended to the email distributed from the
list-server rather than attached.  It is best to send items to
the list-server as text included in the main message, or arrange
some other means of distributing non-ASCII files (e.g. from a web
site).  Perhaps we should not worry about these details until we
have the broader framework defined.

     7.   For cases where the information i)-iv) is not known, we
     replaced it by a  '*'.

This agrees with the earlier proposal that not all items need to
be included.

     8.   As a help for the PAULING FILE editors we have created an
     'internal'  DISTINCT PHASES TABLE', which contains the
     following information: i)-v), and additional info like: Dm,
     Tm, color, common name, info about  T/p- stability, info
     about chemical property, .....  I agree with you that the
     additional info are not good as 'Phase  Identifier', but
     they helped us many times to add to an e.g. physical
     property entry the 'Phase Identifier'  (as very often the
     structure type is  not mentioned, but they write e.g. a
     green cubic phase,....). With this  described approach we
     were able to give to all entries a 'Phase  Identifier'. At
     present the PAULING FILE contains already about 100,000
     structure/diffraction entries, about 65,000 physical
     property entries and  about 20,000 phase diagrams, so our
     experience is already based on many  practical examples.

     9.   e) CAS number: Is very bad, this has nothing to do with a
     phase. In PAULING  FILE we have in average 15 publications
     dealing with the same phase.

Agreed that the CAS number is tricky at the best of times and I
don't favour this as a primary key, but some people might use it
and it could be helpful for organic compounds where it is already
widely used and, at least for a pure molecular compound, is
unambiguous.  For other cases it can be ambiguous.  The CAS
number of CuSO4 might be used to refer to the solid CuSO4.5(H2O)
which (I assume) has its own CAS number.  Thus a correct match
might be discarded because the two keys used different CAS
numbers.  In any case we have no control over how they are
assigned and used or even whether these numbers would be
available in the public domain.

     10.  As  unique document name I would recommend: CODEN, year,
     volume, (part), first  page, last page  (part only given if
     no volume given).

I agree, but it is not particularly relevant to the assignment of
a phase identifier unless we intend to include a bibliography as
part of the key.  I do not see any value in including the
bibliographic reference.


Combining Pierre's suggestions with our earlier proposal, I
recommend the following refined list of items for the identifier.
These are ordered with the most important elements first, though
the example given below suggests this may not be the best order.

1: Chemical system: gives the elements present ordered in
decreasing order of abundance.  Alphabetical ordering is used for
elements with the same abundance.

2: Phase type, including the structure type if known.

3: Space group number (conventions needed for enantiomorphic
pairs e.g. P41 and P43).

4: Crystal system (redundant if the space group is known but
useful if it is not and therefore it should always be given even
when the space group is known).

5: Lattice centring in the standard setting (like the crystal
system this is redundant if the space group is known).

6: Number of occupied sites in the conventional unit cell defined
by 5.  This is an integer and is not necessarily the same as the
number of atoms in the unit cell.  It differs from the definition
in the Pearson symbol.  (What should we choose as the
conventional cell of a rhombohedral crystal?)

Items 4, 5 and 6 can be concatenated into a Pearson-like symbol.

7: Chemical sum formula (starting with C and H and then
alphabetically ordered).  Only the relative element abundances are
significant.

8: Mineral name

9: Colour (useful if known for otherwise poorly characterized
materials.  Not a primary key as some materials come in a variety
of colours.)
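
The concatenation of items 4, 5 and 6 into a Pearson-like symbol
might look like the sketch below.  It assumes the usual Pearson
letters for the crystal systems and writes '*' for any item that is
not known; the function name is hypothetical.

```python
# Usual Pearson letters for the crystal systems ('a' = anorthic/triclinic).
SYSTEM_LETTER = {
    "triclinic": "a",
    "monoclinic": "m",
    "orthorhombic": "o",
    "tetragonal": "t",
    "trigonal": "h",
    "hexagonal": "h",
    "cubic": "c",
}

def pearson_like(crystal_system=None, centring=None, occupied_sites=None):
    """Concatenate crystal system, lattice centring and number of occupied
    sites, substituting '*' for anything that is unknown."""
    system = SYSTEM_LETTER[crystal_system] if crystal_system else "*"
    lattice = centring if centring else "*"
    sites = str(occupied_sites) if occupied_sites is not None else "*"
    return system + lattice + sites

print(pearson_like("cubic", "F", 8))       # cF8
print(pearson_like("orthorhombic", "P"))   # oP*
```

Note that the third part counts occupied sites, so the result can
differ from the conventional Pearson symbol for the same structure.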

I have chosen the PbS - Sb2S3 system as an example.  In this case
the phase depends on composition rather than temperature or
pressure.  It contains a variety of different phases, but
some of the more ephemeral intergrowth phases are not included in
the list, and some of those listed may only be stable in the
presence of impurities which I have not noted.  Note that the
space group is not as good an indicator as one might expect
because it is easily misassigned and there is no easy way of
spotting closely related space groups, e.g., Pbn21 (33) and Pbnm
(62), though group-subgroup relations could be built into a
sophisticated search algorithm.  Of course there is a difficulty
in allowing a match between closely related space groups since
many phase transitions involve the loss of a single symmetry
element, so the distinction between 33 and 62 may be highly
significant.  The chemical formula comes out looking quite good
as an identifier in this example.  For convenience I have given
the ratio of Sb to total cation (as calculated from the formula)
on the right.  The examples are mostly taken from the ICSD (but
see also Acta Cryst. (1994) B50, 524-538 and references there).
To fit each ID onto a single text line I have concatenated the
components above, separating them with =.

#    Proposed ID                                       Sb/(Sb+Pb)
1    S-Pb-Sb=liq
2    S-Sb-Pb=liq

3    Pb-S=NaCl=225=cF8=Pb-S=Galena                     0.00

4    S-Pb-Sb=oxtl=19=oP96=Pb7-S13-Sb4=*                0.36

5    S-Pb-Sb=oxtl=62=oP*=Pb3-S6-Sb2=*                  0.40

6    S-Pb-Sb=oxtl=62=oP80=Pb5-S11-Sb4=Boulangerite     0.44
7    S-Pb-Sb=oxtl=14=mP160=Pb5-S11-Sb4=Boulangerite    0.44
8    S-Pb-Sb=oxtl=62=oP84=Pb4.82-S11-Sb4.11=Boulangerite  0.46
9    S-Pb-Sb=oxtl=62=oP96=Pb9-S22-Sb9=Boulangerite     0.50

10   S-Pb-Sb=oxtl=15=mC152=Pb9-S21-Sb8=Semseyite       0.47

11   S-Pb-Sb=oxtl=62=oP36=Pb2-S5-Sb2=*                 0.50

12   S-Pb-Sb=oxtl=55=oP38=Pb4-S11-Sb4=*                0.50

13   S-Sb-Pb=oxtl=15=mC136=Pb7-S19-Sb8=Heteromorphite  0.53

14   S-Sb-Pb=oxtl=2=aP50=Pb5-S14-Sb6=*                 0.55

15   S-Sb-Pb=oxtl=12=mP46=Pb4-S13-Sb6=Robinsonite      0.60
16   S-Sb-Pb=oxtl=1=aP46=Pb4-S13-Sb6=Robinsonite       0.60

17   S-Sb-Pb=oxtl=15=mC120=Pb5-S17-Sb8=Plagionite      0.62

18   S-Sb-Pb=oxtl=173=hP76=Pb18-S81-Sb42=Zinkenite     0.70
19   S-Sb-Pb=oxtl=173=hP72=Pb1.6-S7-Sb3.4=Zinkenite    0.68

20   S-Sb-Pb=oxtl=15=mC104=Pb3-S15-Sb8=Fueloeppite     0.73

21   S-Sb=Sb2S3=62=oP20=S3-Sb2=Stibnite                1.0
22   S-Sb=Sb2S3=15=mC104=S15-Sb9.8=Stibnite            1.0
23   S-Sb=Sb2S3=47=oP40=S3-Sb2=Stibnite                1.0
24   S-Sb=Sb2S3=31=oP20=S3-Sb2=Stibnite                1.0
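
For anyone wishing to experiment with the list above, the following
sketch splits an ID at the = signs and recomputes the Sb/(Sb+Pb)
column from the formula field.  The field names and function names
are hypothetical labels for the components in the order used here.

```python
import re

FIELD_NAMES = ["system", "phase_type", "space_group",
               "pearson", "formula", "mineral"]

def parse_id(identifier):
    """Split an ID at '=' and label the fields in order.
    Short IDs (e.g. liquids) simply yield fewer fields."""
    return dict(zip(FIELD_NAMES, identifier.split("=")))

def parse_formula_field(field):
    """Turn 'Pb5-S11-Sb4' into {'Pb': 5.0, 'S': 11.0, 'Sb': 4.0}."""
    counts = {}
    for token in field.split("-"):
        m = re.fullmatch(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", token)
        if m is None:
            raise ValueError("unrecognised formula token: " + token)
        counts[m.group(1)] = float(m.group(2)) if m.group(2) else 1.0
    return counts

def sb_fraction(identifier):
    """Ratio of Sb to total cation (Sb + Pb) from the formula field."""
    counts = parse_formula_field(parse_id(identifier)["formula"])
    sb, pb = counts.get("Sb", 0.0), counts.get("Pb", 0.0)
    return sb / (sb + pb)

entry = "S-Pb-Sb=oxtl=62=oP80=Pb5-S11-Sb4=Boulangerite"
print(parse_id(entry)["space_group"])   # 62
print(round(sb_fraction(entry), 2))     # 0.44
```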

(The number following the phase number in the following list is
the number of different structure determinations reported in the
#         Comment
1         In the liquid phases the symbol indicates which metal
          is the more abundant
3  (11)
4  (1)
5         This phase is not well characterized
6  (2)    Note the different space groups and compositions
          reported for Boulangerite
7  (1)
8  (1)
9  (1)
10 (1)
11 (2)
12 (1)    This composition is not electroneutral
13 (1)
14 (1)    Pearson symbol given as aI100 in ICSD
15 (1)    Pearson symbol given as mI92 in ICSD
16 (1)    Wrong assignment of space group in this structure
17 (1)
18 (1)    Pearson symbol given as hP71 in ICSD
19 (1)    Zinkenite assigned different compositions and site
20 (4)
21 (5)
22 (1)    Pearson symbol given as mC99 in ICSD may correspond to
          actual cell contents
23 (1)    Probably incorrect space group
24 (1)    Probably incorrect space group

1. Although the same space groups appear in different phases, no
two phases with the same space group have the same number of
occupied atomic sites.

2. The chemical formula is a better distinguisher of phases than
the space group because of the number of wrong space group
assignments.  Along with the space group error go errors in
the crystal system and lattice type.

3. If the formula is known, the chemical system is redundant and
we need not include it.  If the chemical system is known but the
formula is not, the abundances in the formula could be replaced
by *.  Do we need the chemical system?

Your comments and suggestions are welcome.

                    Best wishes


Dr. I. David Brown,  Professor Emeritus
Brockhouse Institute for Materials Research,
McMaster University, Hamilton, Ontario, Canada
Tel: 1-(905)-525-9140 ext 24710
Fax: 1-(905)-521-2773


Copyright © International Union of Crystallography
