
Discussion #6

Dear Colleagues,

Please read this paragraph before filing this email
    Can you please respond to this email by Dec. 6 indicating
whether you agree with the recommendations in Section 3.  If
these meet general approval I will circulate a draft final report
for further discussion.

    You are a member of the PhaseID working group of the IUCr
Commission on Crystallographic Nomenclature.  This group was
established in response to a request from the IUPAC project to
develop a chemical identifier (IUPAC Chemical Identifier, IChI),
a character string that would uniquely identify a chemical
compound including, if required, its crystalline phase.  In
earlier discussion we explored the problems associated with
generating such an identifier, working independently of the IChI
group.  Fortunately our group and the IChI working group have
come to very similar conclusions.  You will recently have
received my report on the IChI workshop I attended at NIST in
Washington earlier this month.  This gave me a chance to discover
what they have been doing and to see how we can draft our
recommendations to mesh with theirs.

    IChI version 1.0 is close to being released and shows many
parallels to our own work, meaning that we can adopt the IChI
conventions for describing the chemistry and only need to decide
how best to identify the crystallographic phase.  As a result we
should be able to draft our recommendations relatively quickly.

1. A description of IChI.
2. A summary of our previous discussions on a phase identifier.
3. A proposal for a crystallographic phase identifier that can be
incorporated into IChI.

1. A description of IChI

According to the current proposals, IChI will consist of a
variable-length character string comprising a number of
components or 'layers'.  The layers are arranged hierarchically
with the top layer being always present and with each layer
describing some property of the compound. Since the purpose of
IChI is only to identify the compound, not to describe it, the
deeper layers are only used if the higher layers fail to give a
unique identification.  Therefore, even if the information used
in the deeper layers is available, it will only be included in
IChI if it is needed for identification.  For example, CH4 and
NaCl are uniquely defined by their formulae and therefore no
further layers are needed (except possibly to identify their
state of matter or crystalline phase if this is relevant).
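
As an illustration only (this is not part of the IChI proposal), the
rule that deeper layers are included only when needed could be
sketched in Python as below.  The function name and the layer strings
are hypothetical placeholders, not real IChI syntax, and the sketch
assumes that each compound is held as an ordered list of layer strings.

```python
from typing import List, Sequence


def minimal_layers(target: Sequence[str],
                   others: Sequence[Sequence[str]]) -> List[str]:
    """Return the shortest leading run of layers that no other compound shares."""
    for depth in range(1, len(target) + 1):
        prefix = list(target[:depth])
        # The prefix identifies the target if no other compound starts the same way.
        if all(list(other[:depth]) != prefix for other in others):
            return prefix
    # Not unique even with every layer; return everything that is available.
    return list(target)


# Placeholder layer strings (hypothetical, not real IChI syntax).
ethanol = ["C2H6O", "graph-A"]
dimethyl_ether = ["C2H6O", "graph-B"]
methane = ["CH4"]

print(minimal_layers(methane, [ethanol, dimethyl_ether]))  # formula alone suffices
print(minimal_layers(ethanol, [methane, dimethyl_ether]))  # second layer is needed
```

The top layer is always returned, matching the statement that it is
always present.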

1.1.1 Layer 1
consists of the chemical sum formula.  For many compounds this
layer will be sufficient to define the compound uniquely.

1.1.2 Layer 2a
contains the graph of the bonded structure.  All H atoms and
bonds to metal atoms and cations are ignored in this layer.  Bond
orders and formal charges are ignored in all layers. This graph
indicates which atoms are generally agreed to be bonded because
they are neighbours, but omits features that depend on particular
chemical interpretations.  It is a robust description because we
know where the atoms are, even when we disagree over how the
electrons are distributed.

1.1.3 Layer 2b
contains information about the connectivity of the fixed H atoms,
those whose location does not depend on conditions such as pH.
These are typically H bonded to C.

1.1.4 Layer 2c
contains information about the connectivity of labile H atoms,
those whose location depends on the conditions (tautomeric H, H
bonds etc.).  This layer would be omitted if, e.g., either
tautomer would satisfy the search.

1.1.5 Layer 2d
indicates which atoms the metal or cation bonds to.  Although
such bonds will always be present if metals or cations are
present, this layer is only needed if the compound has isomers
with different metal or cation coordination.  This layer will be
rarely used.

1.1.6 Layer 3
contains information about stereocenters and is only needed if
the chirality of a stereoisomer needs to be specified.

1.1.7 Layer 4
contains information about isotopic substitutions. It will not
often be used.

1.2 Some General Comments
Since the purpose of IChI is to differentiate between compounds,
not to give a full chemical description, it is only necessary to
give as many layers as are required for a unique identification.
For many compounds (particularly inorganic) the top layer will
suffice.  In special cases, such as an isotopically substituted
isomer of a transition-metal complex with chiral ligands, a deep
search may be needed, requiring a match in all layers.  Even when
all layers are included, a particular search may be restricted to
the top layers if, for example, any form of the target compound
is to be retrieved, including tautomers and those that are
isotopically enriched.
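
A search restricted to the top layers might look like the following
sketch.  It assumes, purely for illustration, that every stored
identifier is an ordered list of layer strings; the function name
`search` and all of the layer values are hypothetical.

```python
from typing import List, Optional, Sequence


def search(database: Sequence[Sequence[str]],
           query: Sequence[str],
           depth: Optional[int] = None) -> List[List[str]]:
    """Return the entries that agree with the query in the first `depth` layers.

    With depth=None every layer of the query must match (a deep search).
    """
    n = len(query) if depth is None else depth
    wanted = list(query[:n])
    return [list(entry) for entry in database if list(entry[:n]) == wanted]


# Hypothetical entries: formula, graph, stereo layer, isotope layer.
plain = ["C2H6O", "graph-A", "none", "none"]
labelled = ["C2H6O", "graph-A", "none", "D1"]
isomer = ["C2H6O", "graph-B", "none", "none"]
database = [plain, labelled, isomer]

print(search(database, labelled, depth=2))  # any isotopic form of the compound
print(search(database, labelled))           # the labelled form only
```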

Version 1.0 of IChI will not handle polymers, cluster compounds,
metallic materials or disordered structures.  It is hoped that
these will be included in later versions.

2. A summary of our previous discussions on a phase identifier

Our earlier discussion suggested that a phase identifier would
need to comprise several components, which are briefly described
below.  CIF names are shown for those items already defined in
the CIF dictionaries.

2.1. Chemical components
These items provide the chemical characterization of the phase
and should now be replaced by the respective IChI layers
described above.  They are included here to provide continuity
with our earlier discussions.  It needs to be pointed out that if
IChI is to specify the crystallographic phase, the chemical
description must be that of the whole crystal, including, for
example, solvent of crystallization, even though the main
interest may be in only one component (molecule) in the crystal.

2.1.1 The chemical formula  (CIF name: _chemical_formula_sum)
This corresponds to the top layer in IChI.  It describes the
formula unit of the crystal, which may include several distinct
moieties.  The element counts need not be integers and only their
relative values are important since the formula unit may be
chosen arbitrarily to be as small as the crystallographic
asymmetric unit or as large as (or larger than) the unit cell.
The absolute values of the element counts in the formula are
therefore not (in general) meaningful and this needs to be taken
into account in search software.  Alternatively the formula could
be normalized, but it is not obvious how this could be done
without giving unrealistic formulae for compounds that are not
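
The point that only the relative element counts matter can be
illustrated with a small comparison routine.  This is a sketch under
stated assumptions: the formula is taken to be a space-separated list
of element symbols with optional counts, without parentheses, and the
function names and the tolerance are hypothetical choices.

```python
import math
import re
from typing import Dict

# Element symbol followed by an optional integer or decimal count.
_TOKEN = re.compile(r"([A-Z][a-z]?)\s*(\d+(?:\.\d+)?)?")


def parse_formula(text: str) -> Dict[str, float]:
    """Parse a formula such as 'C2 H6 O' into element counts."""
    counts: Dict[str, float] = {}
    for symbol, number in _TOKEN.findall(text):
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    return counts


def same_composition(a: str, b: str, rel_tol: float = 1e-6) -> bool:
    """Compare two formulae by the ratios of their element counts."""
    ca, cb = parse_formula(a), parse_formula(b)
    if ca.keys() != cb.keys():
        return False
    total_a, total_b = sum(ca.values()), sum(cb.values())
    # Compare atomic fractions so the choice of formula unit drops out.
    return all(math.isclose(ca[e] / total_a, cb[e] / total_b, rel_tol=rel_tol)
               for e in ca)


print(same_composition("Na Cl", "Na4 Cl4"))
print(same_composition("C2 H6 O", "C2 H4 O"))
```

Comparing fractions avoids having to choose a normalized formula unit.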

2.1.2 The state of matter  
This level of discrimination is not included in the first version
of IChI.  This component records whether the compound is a gas,
liquid or solid.  We defined the following flags:
    gas    gas phase
    liq    liquid phase
    sol    solid phase of unknown form
    xtl    crystalline
    qxl    quasi-crystal
    amp    amorphous solid
    lxl    liquid crystal or other anomalous quasi-liquid phase
The inclusion of the crystallographic components defined below
will only be meaningful if this flag is set to xtl.

2.1.3 The number of C atoms with 0, 1, 2 and 3 attached H atoms.
The information in this item serves the same function as that
given in the second layer of IChI.  It is designed to distinguish
between different isomers of organic crystals.

2.1.4 Mineral name      (CIF name _chemical_name_mineral)
This is the mineral name assigned by the International
Mineralogical Association.  It should only be used for natural
minerals, not for artificially produced mineral analogues.  This
could be useful for minerals if the rules are rigidly followed,
but the imprecise use of mineral names by non-mineralogists may
invalidate this key.  There is currently no layer in IChI that
gives this information.

2.2. Crystallographic components
These components are the ones that should form the basis of our

2.2.1 Space group number in International Tables for
Crystallography         (CIF name _space_group_IT_number)
This component consists of a number between 1 and 230 that
uniquely identifies the space group type (if known).  This number
is found in International Tables for Crystallography Vol A and is
independent of the space group setting used.  The only ambiguity
occurs for space groups such as P41 (076) and P43 (078) that are
identical except for their chirality which is more appropriately
treated in the IChI stereochemistry layer.  Since the chirality
is frequently not determined, and for inorganic compounds is
mostly meaningless, the following pairs of space groups are
treated as equivalent for present purposes.  In general only the
first number of each pair should be used, but search algorithms
should treat the two members of a pair as equivalent in case the
second number is used: 076=078, 091=095, 092=096, 144=145, 151=153,
152=154, 169=170, 171=172,
178=179, 180=181, 212=213.  A sophisticated search might look for
selected sub- or super-groups of the target space group in case
the space group has been misassigned, which can easily happen in
some classes of inorganic materials.
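
The equivalencing of chiral pairs could be implemented as a simple
lookup, sketched below.  The function names are hypothetical.  The
table holds the eleven enantiomorphic space-group pairs of
International Tables Vol. A, in which the trigonal pairs are 151/153
and 152/154.

```python
# Enantiomorphic pairs; the first number of each pair is the preferred one.
_PAIRS = (
    (76, 78), (91, 95), (92, 96), (144, 145), (151, 153), (152, 154),
    (169, 170), (171, 172), (178, 179), (180, 181), (212, 213),
)
_PREFERRED = {second: first for first, second in _PAIRS}


def canonical_space_group(number: int) -> int:
    """Map a space group number to the preferred member of its chiral pair."""
    if not 1 <= number <= 230:
        raise ValueError(f"space group number out of range: {number}")
    return _PREFERRED.get(number, number)


def same_space_group_type(a: int, b: int) -> bool:
    """Compare two space group numbers, ignoring chirality."""
    return canonical_space_group(a) == canonical_space_group(b)


print(canonical_space_group(78))
print(same_space_group_type(212, 213))
```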

2.2.2 Bravais symbol    (CIF name _crystal_Bravais_type)
In the event that the space group is not known, its Bravais
symbol may be known and could be a useful component of an

2.2.3 Wyckoff sequence   
If the space group and the location of the atoms in the unit cell
are known, the Wyckoff sequence indicates how many of each of the
high-symmetry crystallographic special positions are occupied.
This is particularly useful for inorganic compounds, but would
not be needed for most organic compounds where crystallographic
special positions are occupied infrequently.  An example of a
Wyckoff sequence is 'a d i6', indicating that the a, d and i sites
are occupied and that the last of these, presumably the general
position with site symmetry C1, is occupied by six
crystallographically independent atoms.
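
A Wyckoff sequence written in this form can be turned into occupation
counts with a few lines of Python.  This is only a sketch: it assumes
space-separated tokens, each a single lowercase Wyckoff letter with an
optional multiplier, and the function name is hypothetical.

```python
import re
from typing import Dict

# One Wyckoff letter followed by an optional count, e.g. 'a' or 'i6'.
_SITE = re.compile(r"([a-z])(\d*)")


def parse_wyckoff_sequence(text: str) -> Dict[str, int]:
    """Parse a sequence such as 'a d i6' into {letter: number of occupied sets}."""
    counts: Dict[str, int] = {}
    for token in text.split():
        match = _SITE.fullmatch(token)
        if match is None:
            raise ValueError(f"unrecognized Wyckoff token: {token!r}")
        letter, number = match.groups()
        counts[letter] = counts.get(letter, 0) + (int(number) if number else 1)
    return counts


print(parse_wyckoff_sequence("a d i6"))
```

Comparing the parsed dictionaries rather than the raw strings makes
the match independent of the order in which the sites are listed.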

2.2.4 Reduced cell   
The Niggli reduced cell has been recommended as a good key for
identifying identical materials, but it is subject to
experimental uncertainty and so will rarely give an exact match.
A tolerance should be specified for this match.
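
A tolerance-based comparison of two reduced cells might be sketched as
follows.  The cells are assumed to have been reduced already (the
reduction itself is not shown), and the names and the default
tolerances (1% on lengths, 0.5 degrees on angles) are arbitrary
assumptions chosen for illustration, not recommended values.

```python
import math
from typing import NamedTuple


class Cell(NamedTuple):
    """Reduced cell: lengths in angstroms, angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float


def cells_match(p: Cell, q: Cell,
                length_tol: float = 0.01, angle_tol: float = 0.5) -> bool:
    """Compare two reduced cells within a relative length and absolute angle tolerance."""
    lengths_ok = all(math.isclose(x, y, rel_tol=length_tol)
                     for x, y in zip(p[:3], q[:3]))
    angles_ok = all(abs(x - y) <= angle_tol for x, y in zip(p[3:], q[3:]))
    return lengths_ok and angles_ok


# Hypothetical cell parameters.
first = Cell(5.64, 5.64, 5.64, 90.0, 90.0, 90.0)
second = Cell(5.65, 5.63, 5.64, 90.1, 89.9, 90.0)
print(cells_match(first, second))
```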

2.3. Simple identifiers
These are numbers or codes assigned by different databases that
might be useful components of a phase identifier.  They have been
designed by the different databases for their own purposes and
are not necessarily sufficient to uniquely characterize a
particular phase.  Even if one of these worked perfectly, the
group owning it would probably not be willing to generate
identifiers for third parties on demand.

2.3.1 Chemical Abstracts Service number     
                (CIF name _database_code_CAS)

2.3.2 Cambridge Structural Database refcode 
                (CIF name _database_code_CSD)

2.3.3 Inorganic Crystal Structure Database collection number
                (CIF name _database_code_ICSD)

2.3.4 National Institute of Standards and Technology code
                    (CIF name: _database_code_NBS)

2.3.5 Protein Data Bank code      
                (CIF name _database_code_PDB)

2.3.6 Powder Diffraction File code
                 (CIF name _database_code_PDF)

2.3.7 Pauling File code         

The following two codes indicate the structure type.  They would
only be useful for high-symmetry crystals, since there may be
ambiguities between structure types when the symmetry is low.

2.3.8 Pauling File type code

2.3.9 Strukturbericht type code

2.4. Conditions of Characterization
For crystallographic phases it might be useful to indicate the
conditions of temperature and pressure under which the material
was characterized or prepared.

2.4.1 Temperature     (CIF names: _diffrn_ambient_temperature

2.4.2 Pressure        (CIF names: _diffrn_ambient_pressure

3. A proposal for a crystallographic phase identifier that can be
incorporated into IChI

The chemical characterization is well handled by the proposed
IChI symbol.  The only additional items that might be of interest
to IChI for characterizing the crystal phase are
    state flag (gas, liquid, crystal etc.)
    Space group number
    Wyckoff sequence
    Reduced cell

If the state of the compound is important, the state flag would
need to be set to indicate the state of matter adopted by
the compound.  If this is set to 'crystal' then the crystal phase
could be identified by giving the space group number.  For the
majority of cases this should be sufficient as there are very few
materials that have two phases belonging to the same space group,
but where these do occur, the Wyckoff sequence or reduced cell
should be able to distinguish them.  The Wyckoff sequence depends
on a correct assignment of the space group, but the reduced cell
would serve well in cases where the space group was misassigned
or unknown.
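
The selection logic proposed above could be sketched as follows.  This
is an illustration only, not a format proposal: the class and function
names are hypothetical, the state flags are those listed in 2.1.2, and
the output is a plain list of components, not an encoded string.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Phase:
    state: str                                  # e.g. 'gas', 'liq', 'xtl'
    space_group: Optional[int] = None           # 1..230 if known
    wyckoff: Optional[str] = None               # e.g. 'a d i6'
    reduced_cell: Optional[Tuple[float, ...]] = None


def phase_components(target: Phase, known: Sequence[Phase]) -> List[object]:
    """List the phase components needed to tell `target` from `known` phases."""
    parts: List[object] = [target.state]
    if target.state != "xtl":
        return parts                            # crystallographic items not meaningful
    if target.space_group is None:
        # Space group unknown: fall back on the reduced cell if available.
        if target.reduced_cell is not None:
            parts.append(target.reduced_cell)
        return parts
    parts.append(target.space_group)
    clash = any(p.state == "xtl" and p.space_group == target.space_group
                for p in known)
    if clash:
        # Two phases share a space group: add a further discriminator.
        extra = target.wyckoff if target.wyckoff is not None else target.reduced_cell
        if extra is not None:
            parts.append(extra)
    return parts


print(phase_components(Phase("xtl", 62), [Phase("xtl", 225)]))
```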

If there is a consensus within the PhaseID group by the deadline
set at the beginning of this email, I will prepare the draft of a
formal report to the IUCr Nomenclature Commission.  We should be
able to approve this without too much delay.  When approved by
the Commission it would be sent to the IUPAC IChI project to be
included as part of the IChI symbol.

                    Best wishes


I. David Brown, Professor Emeritus of Physics
Brockhouse Institute for Materials Research
McMaster University, Hamilton, Ontario
Canada L8S 4M1
Tel: +905 525 9140 ext 24710
Fax: +905 521 2773
email: idbrown@mcmaster.ca

phase-identifiers mailing list


Copyright © International Union of Crystallography
