(17) Intellectual Property Rights, new DDL, matters arising

Dear Colleagues

Please excuse the delay since the last circular. CIF matters have not exactly
slipped my mind!

Agreement
---------
(12)A8.1   Comment lines are not required to be preserved by CIF parsing
           software.
There were no objections to this.

Pending Agreement
-----------------
(12)A4.2    The treatment of the dictionary introductory sections seems to
have gone on forever. I propose that we adopt the uncontested statements
of (12)A4.2, i.e.

       "Introductory sections of the Dictionary should follow the same file
        syntax as data name definition sections, with the following
        conventions: _type is "null"; and _category is
        "dictionary_definition". The _definition is a free text field
        describing the general characteristics of category xxxx."

The following sentence from the earlier proposal is disputed:

                    "the data block name takes the form data_xxxx_[], where
        the square brackets may contain an identifier of the dictionary, if
        it is not the Core; _name is likewise '_xxxx_[]';"

I had planned to call a vote on this, and the alternatives that people have
put forward; but I am in the end convinced by Syd's statements in (4) that
"the extension _intro etc. [is not] critical for most applications", and 
believe that we should consider the form of this as a matter of style. Paula
and I will use the [] notation in the Core and mm dictionaries; I hope that
Brian will follow suit in the powder one and use [pd]; but it is the use of
_type null and _category dictionary_definition that really matters.

Gluttons for further punishment are invited to disagree.

Call for Agreement
------------------
A10.6    David has asked for a call for agreement on the character set to be
         used for data names, along the lines of "suppose that any character
         is allowed, but only use alphanumerics". To my mind, this isn't
         strong enough for use within the Standard. I am heedful of Paula's
         remark in (14) that "just because we have license to inflict horrors
         ... doesn't mean that we ought to do it". Even so, without good
         reason to exclude particular characters, I can envisage that a use
         might well be found for any of the full ASCII character set. I am
         therefore proposing the statement that "ANY printable ASCII
         character may be used in a data name; only the leading underscore
         character has a special significance". This means that implementors
         must check that they can handle all printable characters in a
         dataname on an equal footing. Note, in particular, that the square
         bracket characters we are proposing for introductory sections do
         not in themselves have any special significance (other data names
         may contain square brackets): it is the associated _type and
         _category values that matter here.
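
As a rough illustration of what "handling all printable characters on an equal
footing" asks of implementors, here is a minimal sketch of a data name check.
The function name is hypothetical, and two points are assumptions rather than
part of the proposed statement: blanks are excluded (a data name is a
whitespace-delimited token), and at least one character must follow the
underscore.

```python
def is_valid_data_name(name: str) -> bool:
    """Check a token against the proposed rule: a leading underscore followed
    by printable ASCII characters, none of which is treated specially."""
    # Assumption: a bare "_" is not a usable data name.
    if len(name) < 2 or not name.startswith("_"):
        return False
    # Assumption: "printable" here means '!' (33) to '~' (126), blanks excluded.
    return all(33 <= ord(ch) <= 126 for ch in name[1:])


print(is_valid_data_name("_atom_site_label"))  # True
print(is_valid_data_name("_xxxx_[]"))          # True: brackets are ordinary characters
print(is_valid_data_name("_bad name"))         # False: embedded blank
```

Note that the check gives the square brackets no special treatment; under the
proposal an introductory section is recognised from its _type and _category
values, not from its name.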


Review
------
David has posted the following summary of recent discussions, which is useful
for reminding us where we are:

D> 4.1 (restraints) and 4.2 (introduction) are still active in spite of an
D> attempt to resolve them. 

But see my latest attempt on 4.2 above.

D> 4.3 (extension dictionaries) and 4.4 (enumeration synonyms) have been
D> closed with agreements. 
D> 4.5 (invalid cif data) and 4.6 (matters arising) are now closed, no
D> agreement being needed. 
D> 5.1 (comcifs procedures) is closed with an agreement 
D> 8.1 (treatment of comments) will close shortly with a confirmed agreement 

Now formally closed.

D> 12.1 (schedule) seems to be accepted, if somewhat reluctantly.  No formal 
D> agreement is needed.  See my comments above. 
D> 15.1 (standard item name prefixes) is out for agreement.  Only Howard objects
D> but George is all in favour - neither of these strictly are voting members
D> but in any case their votes tend to cancel each other out.  This agreement
D> can be confirmed shortly unless there are any other strong objections.
D> 10.1 (star files) is beyond comcifs jurisdiction and is closed. 
D> 10.2 (privileged characters) seems to have died without resolution.  None is
D> needed unless we wish to reserve special characters (? and .) 
D> 10.6 (restricted character set for item names) seems to have died without
D> resolution but I propose that we work on the assumption that the character
D> set is restricted, on the grounds that this will give us more flexibility
D> in the future without unduly restricting us in the present.  I would like
D> to see a formal agreement to this proposition.  Past mistakes can be dealt
D> with separately if this seems to be desirable.
D>
D> All other discussions are still open.

=========================================== New topic

D17.1 Intellectual Property Rights
----------------------------------
As you are aware, the STAR File process is the subject of a Patent
application within the UK (and I think also in the broader framework of
European Community legislation). The application is in the names of the IUCr
and S. R. Hall. The IUCr holds the copyright on the CIF specifications and
Dictionary. Within the last fortnight, the UK Patent Examiner has dropped his
objections to the application, and it therefore seems likely (and remarkable!)
that the patent will be granted.

There has been some concern among attendees at the Tarrytown workshop that
software developers of CIF-based applications need to know the position of
the IUCr regarding the development of external software products that make
use of CIF. I have sent the following message to Phil Bourne for posting to
the Tarrytown discussion lists. It is a statement of policy that was drafted
during the Union's negotiations with the ICDD. We propose to check with the
patent attorney that it remains an acceptable form of words, and use it as
the basis of a formal statement of policy that the Union should publish in
its journals and other public places. Clearly, it is important that COMCIFS
should be aware of the Union's policy on this matter. Further, it seems
entirely proper that COMCIFS should be able to advise the Executive Committee
on the development or modification of this policy, so I am using this
platform as an opportunity for you to contribute any relevant thoughts you
may have on this. I do not anticipate much volume of discussion on this
topic, however.

> Phil
> 
> While Syd was here last week we talked about the IUCr policy on intellectual
> property rights in the CIF/STAR arena. We shall publish a definitive 
> statement of policy (probably under the aegis of COMCIFS) in the near
> future, when we have had a legal eye cast over the document. In the
> meantime, I can let you have a copy of the statement that has been used in the
> past while discussing the use of CIF by other organizations. You are welcome
> to post it to the Tarrytown group, and other interested parties, so long as
> you retain the caveat:
> 
> "The attached statement is a draft of the IUCr policy on its STAR and CIF
> file formats. It will be superseded by an official statement that will be
> published in Union journals and other suitable places. This current draft
> is not to be regarded as authoritative. However, it is unlikely that the
> final version will differ significantly from this, and so it has been
> released for the information of software developers who have an interest
> in the issues involved. Developers who are concerned over any aspect of
> this statement are invited to contact the Executive Secretary of the IUCr
> for clarification or further discussion."
> 
> With best wishes
> Brian
> ==============================================================================
>             The IUCr Policy on the Use of the
>           Crystallographic Information File (CIF)
> 
> The Crystallographic Information File (Hall, Allen & Brown, 1991)
> is, as of January 1992, the recommended method for submitting
> publications to Acta Crystallographica Section C. The International
> Union of Crystallography holds the Copyright on the CIF, and has
> applied for a Patent on the STAR File syntax which is the basis for
> the CIF format.
> 
> It is a principal objective of the IUCr to promote the use of CIF
> for the exchange and storage of scientific data. The IUCr's
> sponsorship of the CIF development was motivated by its
> responsibility to its scientific journals, which set the standards
> in crystallographic publishing and are the primary sources of its
> funds. The IUCr intends that CIFs will be used increasingly for
> electronic submission of manuscripts to these journals in future.
> The IUCr recognises that, if the CIF and the STAR File are to be
> adopted as a means for universal data exchange, the syntax of these
> files must be strictly and uniformly adhered to. Even small
> deviations from the syntax would ultimately cause the demise of the
> universal file concept. Through its Copyrights and Patent the IUCr
> has taken the steps needed to ensure strict conformance with this
> syntax.
> 
> The IUCr policy on the use of the CIF and STAR File processes is
> as follows:
> 
>    1 CIFs and STAR Files may be generated, stored or transmitted,
>      without permission or charge, provided their purpose is not
>      specifically for profit or commercial gain, and provided that
>      the published syntax is strictly adhered to.
>      
>    2 Computer software may be developed for use with CIFs or STAR
>      files, without permission or charge, provided it is distributed
>      in the public domain. This condition also applies to software
>      for which a charge is made, provided that its primary function
>      is for use with files that satisfy condition 1 and that it is
>      distributed as a minor component of a larger package of
>      software.
>      
>    3 Permission will be granted for the use of CIFs and STAR Files
>      for specific commercial purposes (such as databases or network
>      exchange processes), and for the distribution of commercial
>      CIF/STAR software, on written application to the IUCr Executive
>      Secretary, 2 Abbey Square, Chester CH1 2HU, England. The nature
>      and the term of the licences granted will be determined by the
>      IUCr Executive and Finance Committees.
> 
> In summary, the IUCr wishes to promote the use of the STAR File
> concepts as a standard universal data file. It will insist on strict
> compliance with the published syntax for all applications. To assist
> with this compliance, the IUCr provides public domain software for
> checking the logical integrity of a CIF, and for validating the data
> name definitions contained within a CIF. Detailed information on
> this software, and the associated dictionaries, may be obtained from
> the IUCr Office at 5 Abbey Square, Chester CH1 2HU, England.


D17.2 Revised Dictionary Definition Language
--------------------------------------------
Syd Hall has been visiting England, and discussing STAR/CIF/MIF/DDL issues
at great length with Tony Cook, Frank Allen, Peter Murray-Rust and myself.
He has been working on a revised DDL which is intended to meet the various 
requirements that have been brought to his attention in this forum and by
others. I had hoped to be able to circulate these revisions with this
message, but I haven't yet received the most recent version - if it comes
before Christmas I shall distribute it to everyone as a stocking filler. The
changes that have been effected close off many of our more general STAR
discussions. Note that Syd is open to discussion on individual points of
detail (more usefully in direct correspondence than through this list), but
he does not envisage a need for further substantial change. His base level for
these particular developments is to ensure that the needs of both MIF and CIF
are met in a compatible fashion.

For your information, the major changes implemented are as follows:

(1) It is formally affirmed in the definition of _category that the _category
string must be identical for all items in a list, but there may be more than
one list with the same _category value [see (10)10.5]. Syd accepts that Brian
T. isn't happy with this, but believes it really is the best solution to meet
the needs of all the database and informatics folk he has spoken with. 

(2) _esd and _esd_default are dropped. Their role will be transferred to a
new data name, _type_conditions.

(3) _type_conditions allows refinements to the basic allowed data types.
Examples are "esd" for experimental values, or "seq" for a sequence or range
of values (in MIF query applications, enumerated values may span a range).
See some further comments on this below.

(4) _include_file affords a syntax that will permit CIF-style information to
be inserted in the data stream from an external source: this will satisfy at
least some of Paula's requirements in handling external reference files.

Please hold any comments on this until you have received and read the revised
DDL dictionary, which I shall forward when it arrives.
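
Pending the revised DDL, the rule in item (1) can be pictured with a small
sketch. The lookup table and function name below are hypothetical; a real
validator would build the name-to-category mapping from the dictionary itself.

```python
# Hypothetical name -> _category lookup, as a dictionary reader might build it.
CATEGORY_OF = {
    "_atom_site_label": "atom_site",
    "_atom_site_fract_x": "atom_site",
    "_cell_length_a": "cell",
}


def loop_category(loop_names, category_of=CATEGORY_OF):
    """Return the single _category shared by every item in one list,
    or raise ValueError if the list mixes categories."""
    categories = {category_of[name] for name in loop_names}
    if len(categories) != 1:
        raise ValueError(f"list mixes categories: {sorted(categories)}")
    return next(iter(categories))


print(loop_category(["_atom_site_label", "_atom_site_fract_x"]))  # atom_site
```

The check is applied to each list separately, so two lists in the same file
may both return "atom_site" without conflict, which is the second half of
item (1).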

=========================================== Open issues

D4.1 Restraints
---------------
P> I am listening to the complaints about the way I propose to handle
P> restraints - amongst them:  1) the definitions are in _enumeration_detail 
P> instead of in a proper definition field; 2) the units are not explicit;
P> and 3) the assumption that enumeration lists are dynamic. I am sure that
P> there are many others.  But consider the consequence of the alternative.
P> 
P> The current mechanism of handling restraints leads to a loop with the 
P> following form - exactly the table that is usually published with a structure
P> refined via Prolsq.
P>  
P> loop_
P> _refine_ls_restr_type
P> _refine_ls_restr_target
P> _refine_ls_restr_model
P> _refine_ls_restr_number
P> _refine_ls_restr_criterion
P> _refine_ls_restr_rejects
P>  'bond_d'           0.020  0.018  1654  '> 2\s'  22
P>  'angle_d'          0.030  0.038  2246  '> 2\s'  139
P>  'planar_d'         0.040  0.043  498   '> 2\s'  21
P>  'planar'           0.020  0.015  270   '> 2\s'  1
P>  'chiral'           0.150  0.177  278   '> 2\s'  2
P>  'singtor_nbd'      0.500  0.216  582   '> 2\s'  0
P>  'multtor_nbd'      0.500  0.207  419   '> 2\s'  0
P>  'xyhbond_nbd'      0.500  0.245  149   '> 2\s'  0
P>  'planar_tor'       3.0    2.6    203   '> 2\s'  9
P>  'staggered_tor'    15.0   17.4   298   '> 2\s'  31
P>  'orthonormal_tor'  20.0   18.1   12    '> 2\s'  1
P> 
P> The alternative proposal (specific data names for each type of restraint)
P> would look like this:
P> 
P> _refine_ls_restr_bond_d_target         0.020
P> _refine_ls_restr_bond_d_model          0.018
P> _refine_ls_restr_bond_d_number          1654
P> _refine_ls_restr_bond_d_criterion     '> 2\s'
P> _refine_ls_restr_bond_d_reject            22
P> _refine_ls_restr_angle_d_target        0.030
P> _refine_ls_restr_angle_d_model         0.038
P> _refine_ls_restr_angle_d_number         2246
P> _refine_ls_restr_angle_d_criterion    '> 2\s'
P> _refine_ls_restr_angle_d_reject          139
P> _refine_ls_restr_planar_d_target       0.040
P> _refine_ls_restr_planar_d_model        0.043
P> _refine_ls_restr_planar_d_number         498
P> _refine_ls_restr_planar_d_criterion   '> 2\s'
P> _refine_ls_restr_planar_d_reject          21
P> 
P> And on and on and on.  This is not out of the question, but it sure isn't 
P> elegant, either.  And if we decide that we must go with this mechanism in 
P> this case, we are going to have to do the same thing in other places where
P> we used the same mechanism to avoid introducing 300 data names. 
P> 
P> In summary, I understand the basis for the objections to the way we are 
P> doing things now, but consider the cost if we really decide to do these
P> things more rigorously.
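
To put a number on Paula's "on and on and on", the following sketch expands
loop rows into the specific-name form. It is an illustration only: the
function name is hypothetical, only the first three rows are reproduced, and
the suffixes are taken from the loop column names (so "rejects" rather than
the "reject" used in the listing above).

```python
COLUMNS = ["type", "target", "model", "number", "criterion", "rejects"]

# First three rows of the restraints loop, kept as strings.
ROWS = [
    ("bond_d", "0.020", "0.018", "1654", "> 2\\s", "22"),
    ("angle_d", "0.030", "0.038", "2246", "> 2\\s", "139"),
    ("planar_d", "0.040", "0.043", "498", "> 2\\s", "21"),
]


def flatten(rows, columns=COLUMNS, prefix="_refine_ls_restr"):
    """Expand each loop row into one specific data name per non-key column."""
    items = {}
    for row in rows:
        restraint = row[0]  # the _refine_ls_restr_type value keys the row
        for column, value in zip(columns[1:], row[1:]):
            items[f"{prefix}_{restraint}_{column}"] = value
    return items


flat = flatten(ROWS)
print(len(COLUMNS))  # 6 data names suffice for the loop form
print(len(flat))     # 15 data names for just three restraint types
```

Applied to all eleven rows of the table, the same expansion needs 55 data
names where the loop form needs six, and every new restraint type adds five
more definitions to the dictionary.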

D10.3 Global data assignments
-----------------------------
P> I quote Brian here - "It is, of course, possible to modify ciftex to
P> recognise global_'s and print the default information in each entry:  but
P> the argument is surely that all CIF software that might need to access
P> dictionaries would need to be modified in just such a way to interpret
P> global_'s;  and is the pay off worth the effort?"
P> 
P> I say no - as I have said before. I just don't see that the marginal benefit
P> of making the dictionary slightly shorter (which is absolutely the only 
P> benefit that I can see) justifies the programming effort that will be
P> required to deal with using global_.

I discussed this issue with Syd while he was in Chester, and he sticks by his
position that CIF dictionaries should be considered as having a standard STAR
syntax, which permits the use of global_. This is motivated by the desire to
use the same dictionary mechanism in MIF applications. It is, of course,
possible to choose not to *use* global_ in CIF dictionaries (even though it
is formally permitted); but since the current drafts already have it in,
perhaps we should just go along with this. For dictionary applications,
proper handling of global_ is not a big problem, though it does mean that
concatenation of dictionaries is not automatically permitted (see earlier
discussions). I hope Syd will now modify CYCLOPS to permit opening of
multiple dictionaries :-).
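
A simplified sketch of why concatenation becomes unsafe once global_ is in
play. It assumes the usual STAR reading that values given in a global_ block
act as defaults for every data block that follows it in the same file. The
tuple representation and the function name are hypothetical and stand in for
a real parser's output.

```python
def resolve(blocks):
    """Return {block_name: items}, with values from preceding global_ blocks
    filled in as defaults for each later data block."""
    defaults = {}
    resolved = {}
    for kind, name, items in blocks:
        if kind == "global":
            defaults.update(items)  # later globals override earlier ones
        else:
            resolved[name] = {**defaults, **items}  # explicit values win
    return resolved


# Dictionary A sets a global default; dictionary B was written without one.
dict_a = [
    ("global", None, {"_type": "numb"}),
    ("data", "cell_length_a", {"_name": "_cell_length_a"}),
]
dict_b = [
    ("data", "atom_site_label", {"_name": "_atom_site_label"}),
]

alone = resolve(dict_b)
joined = resolve(dict_a + dict_b)
print("_type" in alone["atom_site_label"])  # False
print(joined["atom_site_label"]["_type"])   # numb, inherited from dictionary A
```

Read on its own, dictionary B leaves _type unset for its entry; appended to
dictionary A, the same entry silently picks up A's default. Opening each
dictionary separately keeps the defaults of one out of the other.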

(16)A15.1 Standard program prefixes
-----------------------------------
P> I agree with Howard here - the purpose of this committee is to preside over 
P> the official CIF dictionary.  We should make every effort to provide 
P> definitions for science that is common to all applications and leave program 
P> specific input parameters to the programmers.  I am perfectly happy to have 
P> George go ahead and create a series of SHELX specific data names for things 
P> that are really too specific to his program to be of interest to the rest of 
P> the world, but the intent of *local* data names is that they are *local* - 
P> i.e., they never leave George's lab.

Not entirely so. An author may invent a local data name for a quantity which
he nevertheless wishes to see published. By special arrangement with the
author, we can permit ciftex to "see" that local data name. It is not
intended that the particular data name be propagated beyond this particular
arrangement, but its domain of influence has certainly propagated out beyond
the originator's laboratory.

P>                                      If George needs data names that will be
P> provided to the world at large (and I would think that the treatment of 
P> restraints on hydrogen atoms falls into that category) we should work with
P> him to develop official data names for this purpose.
P> 
P> I had thought that the ground rules are that users can create whatever they 
P> want locally, but that those local extensions would be completely invisible
P> in the public arena (that is, parsing software simply ignores any data
P> names that do not conform to the officially mandated dictionaries). I think
P> this is how it should be - we cannot possibly hope to maintain order over
P> all of the local extensions that are going to occur.  In particular, I think
P> it madness to sanction a _local category - this is a guarantee of clashes
P> because everyone will be using the same data names to mean different things.

I agree. Given that CIF is intended as a data exchange standard, you will
wish to exchange CIFs. You cannot use a _local_ data name in a CIF unless you
know that that particular file has never left your jurisdiction, for if it
has gone out and come back, how do you know that the _local_ is still your
_local_ and not someone else's? The registry of assigned prefixes affords a
designated namespace for users who need to reserve their own data names.
COMCIFS has no jurisdiction over what they define within that namespace, but
it will prevent collision of data names.

P> Stated once more, rather than hand out categories to programmers, I would
P> rather see us work with the programmers to provide non-program specific data 
P> names for the items that they would like to see archived.

(15)D15.1 New types
-------------------
P> I think there should be a date data type, as well as a boolean (yes and no) 
P> one.  And my understanding was that the current data types are 'numb' and 
P> 'char' (which Brian listed) and 'null' (which Brian missed out - null being 
P> for the intro sections).

As Horace would say, "bonus dormitat Homerus" (even good Homer nods).

In his revision to the DDL, Syd is proposing to have a new item (probably
called "_type_conditions") that will allow type extensions. An interesting
consequence of this is that the DDL terms _esd and _esd_default will be
dropped altogether. A dataname that defines an experimentally determinable
value may have _type "numb" and _type_conditions "esd"; this will instruct a
parser to allow values of the form mmmm(nn), and interpret the (nn) in the
appropriate way. This has certain advantages - validation software written
for us under the older DDL would reject a file as invalid if an atomic
coordinate did not possess an esd, even when the atom site was on a special
position or if the site coordinates were fixed. This was because "_esd yes"
was a mandatory condition; whereas the use of _type_conditions will act more
as a description of allowed values. The basic _type values are retained as
numb, char and (yes) null.
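
A minimal sketch of how a parser might treat a value whose definition carries
_type "numb" and _type_conditions "esd". The function name is hypothetical,
exponent notation is deliberately not handled, and the reading of (nn) as
applying to the last quoted digits of the value is the customary convention
rather than anything fixed by the revised DDL, which has not yet been
circulated.

```python
import re
from decimal import Decimal

# A plain decimal number, optionally followed by digits in parentheses.
_NUMB_ESD = re.compile(r"^([+-]?\d*\.?\d+)(?:\((\d+)\))?$")


def parse_numb(text):
    """Split 'mmmm(nn)' into (value, esd); esd is None when no bracket is given."""
    match = _NUMB_ESD.match(text)
    if match is None:
        raise ValueError(f"not a numb value: {text!r}")
    value = Decimal(match.group(1))
    if match.group(2) is None:
        return value, None  # allowed: e.g. a fixed or special-position coordinate
    # The bracketed digits are in units of the last decimal place of the value.
    exponent = value.as_tuple().exponent  # -3 for 1.567
    esd = Decimal(int(match.group(2))).scaleb(exponent)
    return value, esd


print(parse_numb("1.567(2)"))  # (Decimal('1.567'), Decimal('0.002'))
print(parse_numb("2.10(5)"))   # (Decimal('2.10'), Decimal('0.05'))
print(parse_numb("0.25"))      # (Decimal('0.25'), None)
```

The third call is the case that the older "_esd yes" validation rejected: the
bracket is permitted by the type condition but not demanded.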

D16.1  s.u. versus e.s.d.
-------------------------
D> I heartily concur with Howard.  I was in favour of this before the
D> international standard came out.  In fact I have had real problems with
D> esd.  In a recent paper I edited the author had a statement like: 'the P-O
D> bond length is 1.567(2) and an average Cu-O bond length of 2.10(5) A.'  It
D> is clear that the first bracket contained an s.u. and the second an
D> estimated standard deviation of the population of different Cu-O bond
D> lengths present in the compound.  I was puzzled how to ask him to
D> distinguish between e.s.d's and estimated standard deviations!  The sooner
D> we make this change the better.

=============================================================================
May I take this opportunity to wish you all a very happy Christmas, and every
success in the year to come.

With best regards
Brian