From toprea@salud.unm.edu Thu Nov 15 10:30:29 2007
Date: Thu, 15 Nov 2007 10:29:50 -0700
From: Tudor Oprea <toprea@salud.unm.edu>
To: Sara Pollock <saran@amath.washington.edu>, Sara Pollock <sarapollock@mac.com>,
 Evangelos A. Coutsias <vageli@math.unm.edu>, Michael Wester <wester@math.unm.edu>
Subject: Fwd: Revision Requested for Manuscript ID ci-2007-003412

We have a "leg in", we just need to address the referee comments. All
five of them!
Reviewers 1 & 4 seem difficult to please. We have to address GENG
and MOLGEN.
Also, it's not important to merge the 2 papers.
 
Tudor

>>> On 11/13/2007 at 8:00 AM, in message
<363737561.1194966012881.JavaMail.wladmin@tss1be0011>,
<wendy@warr.com> wrote:
13-Nov-2007

Dear Prof. Oprea:

Manuscript ID: ci-2007-003412
Title: "Topological Analysis of Molecular Scaffolds I: Enumeration of
Ring Topologies"
Author(s): Pollock, Sara; Coutsias, Evangelos; Wester, Michael; Oprea,
Tudor

Enclosed are the reviews for your paper. The reviewers have found
significant problems that require attention. Please make revisions
and respond well to the reviewers' points. The resubmitted paper and
your reply to their criticisms may be sent back to the reviewers for
further comment.

To speed up the review process and avoid un-submission of your
revised manuscript, please be sure that your revised manuscript
adheres to ACS format, especially references. For your convenience I
have also enclosed a list of reference examples for JCIM.

On your manuscript you need to use journal abbreviations, add
accessed date to web citations, and correct the fonts and journal
abbreviation in ref 3.

Please ensure we have the signed copyright form for your manuscript.
The copyright form is available at
http://pubs.acs.org/instruct/copyright.pdf; please complete, sign and
fax it or e-mail it to our office.

Please submit a TOC graphic.

Sincerely,


Wendy A. Warr
Associate Editor
Journal of Chemical Information and Modeling

Wendy Warr & Associates
6 Berwick Court
Holmes Chapel
Cheshire CW4 7HZ
ENGLAND
Telephone number: +44(0)-1477-533-837
Fax number:       +44(0)-1477-533-837
Email address:    wendy@warr.com


------------------------------------
To revise your manuscript, log into ACS Paragon Plus
http://paragonplus.acs.org/login and select "My Authoring Activity",
where you will find your manuscript title listed under "Manuscripts
with Decisions."  Under "Actions," click on "Create a Revision." 
Your manuscript number has been appended to denote a revision.


When submitting your revised manuscript, you will be able to respond
to the comments made by the reviewer(s) in the text box provided or
by attaching a file containing your response letter.

IMPORTANT:  Your original files are available to you when you upload
your revised manuscript.  Please delete any redundant files before
completing the submission.
------------------------------------


Reviewer(s)' Comments to Author:
Reviewer: 1

Recommendation: Publish after major revisions noted.

Comments:
The paper by Pollock et al. is a theoretical study enumerating all
possible topologies up to 8-rings using a specifically designed
algorithm. The graph generation part might not be novel, but the
analysis of the graph families is interesting. I recommend
publication after extending the graph analysis part and responding to
the comments below.

1. The paper should be checked by a mathematician for the correctness
of the argumentation.
2. In the introduction, the authors lay out a definition of a
"scaffold" graph, but also explain that this formulation is
equivalent to two previous ones where the term "molecular framework"
was used (page 2, bottom). Please just use the term "molecular
framework", which is better. 

++ We prefer the term scaffold topology to emphasize its theoretical,
chemistry-free nature. At this level, the objects contain only topological
information, as defined in the paper. As such, these objects have zero
chemistry information content.
++

3. Why was the problem not simply solved by taking all graphs from the
exhaustive enumeration available in the GENG program? It's perhaps
even possible to enumerate extended graphs directly with GENG. In
this case the graph generation part is pointless. In any event McKay
and his GENG program must be cited and its non-utilization justified
carefully.
\bibitem{nauty}
McKay,~B.~D. \textit{nauty {U}ser's {G}uide, Version 2.4;} Department of Computer
  Science, Australian National University:  Canberra, Australia, 2007.

====================
++ nauty offers no canonical labeling for graphs with multiple edges
(which we can generate by allowing 2-nodes, then prune, classify, compare)
====================

4. The bibliography is extremely short, this cannot be.

++ we will add unnecessary references to please the referees
++
\bibitem{Fink2005}
Fink,~T.;\ \ Bruggesser,~H.;\ \ Reymond,~J.-L.  {V}irtual {E}xploration of the
  {S}mall-{M}olecule {C}hemical {U}niverse below 160 {D}altons.
  \textit{Angew. Chem. Int. Ed.} \textbf{2005,} \textsl{44,} 1504--1508.

\bibitem{deLaet2000}
de~Laet,~A.;\ \ Hehenkamp,~J.~J.~J.;\ \ Wife,~R.~L.  {F}inding {D}rug
  {C}andidates in {V}irtual and {L}ost/{E}merging {C}hemistry.   \textit{J.
  Heterocyclic Chem.} \textbf{2000,} \textsl{37,} 669--674.

\bibitem{Hehenkamp2000}
Hehenkamp,~J.~J.~J.;\ \ de~Laet,~R.~C.;\ \ Parlevliet,~F.~J.;\ \
  Verheij,~H.~J.;\ \ Wife,~R.~L.  {N}avigating the real and virtual chemical
  worlds.   In  \textit{Proceedings of the 2000 Chemical Information
  Conference}; Collier,~H.,\ \ Ed.;  Infonortics: Annecy, France, 2000.

\bibitem{Ivanciuc1993}
Ivanciuc,~O.;\ \ Balaban,~T.-S.;\ \ Balaban,~A.~T.  {D}esign of topological
  indices. {P}art 4. {R}eciprocal distance matrix, related local vertex
  invariants and topological indices.   \textit{J. Math. Chem.} \textbf{1993,}
  \textsl{12,} 309--318.
\bibitem{Filip1987}
Filip,~P.~A.;\ \ Balaban,~T.-S.;\ \ Balaban,~A.~T.  {A} new approach for
  devising local graph invariants: {D}erived topological indices with low
  degeneracy and good correlation ability.   \textit{J. Math. Chem.}
  \textbf{1987,} \textsl{1,} 61--83.

\bibitem{Mekenyan1988}
Mekenyan,~O.;\ \ Bonchev,~D.;\ \ Balaban,~A.  {T}opological indices for
  molecular fragments and new graph invariants.   \textit{J. Math. Chem.}
  \textbf{1988,} \textsl{2,} 347--375.


\bibitem{Cvetkovic1997}
Cvetkovi\'c,~D.;\ \ Rowlinson,~P.;\ \ Simi\'c,~S. \textit{{E}igenspaces of
  {G}raphs;} Cambridge University Press: Cambridge, 1997.
\bibitem{Trinajstic1988}
Trinajsti\'c,~N.  {T}he characteristic polynomial of a chemical graph.
  \textit{J. Math. Chem.} \textbf{1988,} \textsl{2,} 197--215.
\bibitem{Trinajstic1991}
Trinajsti\'c,~N.;\ \ Nikoli\'c,~S.;\ \ Knop,~J.~V.;\ \ M{\"u}ller,~W.~R.;\ \
  Szymanski,~K. \textit{{C}omputational {C}hemical {G}raph {T}heory:
  {C}haracterization, {E}numeration and {G}eneration of {C}hemical {S}tructures
  by {C}omputer {M}ethods;} Ellis Horwood: New York, 1991.
\bibitem{Lipkus2001}
Lipkus,~A.~H.  {E}xploring {C}hemical {R}ings in a {S}imple
  {T}opological-{D}escriptor {S}pace.   \textit{J. Chem. Inf. Comput. Sci.}
  \textbf{2001,} \textsl{41,} 430--438.


5. The node-splitting rule shown in figure 2 is correct for the 1st
entry only in cases where the four ends are different, which is not
always the case. The figure is clear for the 3 last entries, the text
to the figure is obscure.

++ Sara, check this ++

6. I did not understand the concept of the return index, except that
it addresses the problem of graph comparison. There should be a
simple reference to this index.

++ we cannot reference the return index. Sara to improve its definition
++
====================
The return index was used to produce a canonical labeling of multigraphs
with up to 8 rings. nauty (at least version ##, which we believe to be the
latest public version) does not offer canonical
labeling for multigraphs (see documentation of LABELG). Although we
have only established this labeling to be discriminating for
scaffolds with up to 8 rings, this allows us to label such
scaffolds for fast subsequent comparisons.
====================

7. I am surprised by the heading "results and conclusion"... which
indicates the shortcoming of the paper: the real data is in the
analysis of the scaffold collection generated, not in the algorithm
to build it. Table 1 and figure 8 show an interesting insight, which
deserves to be moved into a genuine "results" section and
extended. It's very sad that the authors have not come up with a good
idea to explain the peak behavior of topologies in figure 8.

++ Sara & Vageli
++

8. I am surprised to see no mention of graph planarity and graph
automorphism group analysis in the paper, which would help categorize
the graphs better. Not to mention analyzing which graphs are
physically realizable in molecules of a certain size. The peak
behavior might be related to simple combinatorics and the fact that
graphs with only one type of node must be overall more symmetrical.

++ Sara & Vageli
++

===================
----WE HAVE DETERMINED PLANARITY (USING NAUTY): SEE NEW TABLES
(10% OF ALL TOPOLOGIES GENERATED ARE NON-PLANAR, AS COMPARED TO
EXACTLY 44 MOLECULES (12 topologies) IN THE ENTIRE merged database)
===================

Please rate the quality of the science reported in this paper (10 -
High Importance / 1 - Low Importance): 6

Please rate the overall importance of this paper to the field of
chemical information or modeling (10 - High Importance / 1 - Low
Importance): 7


Reviewer: 2

Recommendation: Publish after minor revisions noted.

Comments:
In this paper the authors present an algorithm for generating all
possible ring topologies up to eight rings. The paper is timely, and
it follows decades of previous efforts to identify
equivalent substructures and to count substructures.
1) The subject matter is quite rich in computational chemistry and
the introduction should do a better job of placing the work in the
context of other methods and algorithms described to identify and
count substructures.

++ See our reply to Reviewers 4 & 5
++

2) A comparison with at least the Ullmann graph isomorphism algorithm,
among others, is needed (speed, efficiency, what are the
advantages, etc.).

++ Mike & Vageli to trace(y) Ullman
++
========================
Please see the answer to ref.#?. We were not after a better graph isomorphism
algorithm; rather, we needed a canonical labeling of our topology library
to allow for fast searches. Our return-index algorithm has limitations
(which we are very careful to note), but we did prove its adequacy for
scaffolds with up to 8 rings. It is reasonably fast (although, of course,
much slower than n log n), and once the library structures are labeled they
can be searched directly; each search costs only k n^3 (where n is the
size of a new topology and k is the size of the subclass associated
with the molecule in our library, which is much smaller than the library itself).
========================

Please rate the quality of the science reported in this paper (10 -
High Importance / 1 - Low Importance):

Please rate the overall importance of this paper to the field of
chemical information or modeling (10 - High Importance / 1 - Low
Importance):


Reviewer: 3

Recommendation: Publish after major revisions noted.

Comments:
The study of rings in the chemical universe and of the ring composition of
available chemical databases is topical and of interest to current
chemical informatics.
The communication consists of 2 parts. The first, shorter part is a
description of the methodology; the 2nd part applies it to the
study of several chemical databases. I do not have any
critical points concerning the actual study. To make the paper more
readable and more useful for the general chemoinformatics community,
however, I suggest the following modification:
Since the two parts are closely related and refer to each other, I
strongly recommend merging the two manuscripts into one
communication. The first theoretical part may be somewhat shortened,
and the more technical part (which is too technical for the general reader of
JCIM) should be moved to the supporting information section.

Some additional points:

I am missing some citations relevant to this topic, for example:

Lipkus, A. (2001), 'Exploring chemical rings in a simple
topological-descriptor space.', J Chem Inf Comput Sci. 41, 430 - 438.
================
referred to in 2nd paper already
================
or
Xu, Y. & Johnson, M. (2002), 'Using Molecular Equivalence Numbers To
Visually Explore Structural Features that Distinguish Chemical
Libraries', J. Chem. Inf. Comput. Sci. 42, 912-926.
================
add?
================

The PubChem database currently contains 10.9 million unique compounds
(the PubChem command all[filt]). The authors listed 11.6 million in
Nov. 2006. How is this possible?
=====================
answered in second paper, use here also: substances vs. compounds
=====================

The form of the citations does not conform to the JCIM format.
=====================
MJW fixed 2nd paper, need to do also here
=====================


Please rate the quality of the science reported in this paper (10 -
High Importance / 1 - Low Importance): 6

Please rate the overall importance of this paper to the field of
chemical information or modeling (10 - High Importance / 1 - Low
Importance): 7


Reviewer: 4

Recommendation: Publish after major revisions noted.

Comments:
This first paper in a series of two presents an algorithm to
enumerate molecular scaffolds/molecular frameworks. After defining
scaffolds (molecular graphs where nodes of degree 1 and 2 are removed
recursively), the algorithm is presented. The paper is a method paper
and does not follow the usual format method/result/discussion. I have
several concerns that the authors may want to address prior to
publication.
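
The scaffold definition summarized above can be made concrete with a short
sketch. It is not taken from the manuscript: the function name
scaffold_topology is hypothetical, the molecular graph is assumed to be given
as a list of undirected (u, v) edges, and "removing" a degree-2 node is read
here as contracting it (joining its two edges into one), which is an
assumption about the intended definition.

```python
from collections import Counter

def scaffold_topology(edges):
    """Reduce an undirected multigraph, given as (u, v) pairs, to its scaffold.

    Assumption: degree-1 nodes are stripped and degree-2 nodes are
    contracted, repeatedly, until only nodes of degree >= 3 remain
    (or a lone loop, for a single ring).
    """
    edges = [tuple(e) for e in edges]
    while True:
        deg = Counter()
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1  # a loop (u == v) contributes 2 to its node
        # Step 1: strip a terminal (degree-1) node together with its edge.
        leaf = next((x for x, d in deg.items() if d == 1), None)
        if leaf is not None:
            edges = [e for e in edges if leaf not in e]
            continue
        # Step 2: contract a degree-2 node that joins two distinct edges.
        # A node whose only edge is a loop is an isolated ring and is kept.
        mid = next((x for x, d in deg.items()
                    if d == 2 and (x, x) not in edges), None)
        if mid is None:
            return edges
        ends = [v if u == mid else u for u, v in edges if mid in (u, v)]
        edges = [e for e in edges if mid not in e] + [(ends[0], ends[1])]
```

Under these assumptions a single ring with a substituent collapses to one
node carrying a loop, and two fused rings collapse to two nodes joined by
three parallel edges, which is why multiple edges and loops have to be
representable in the enumeration.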

1) The authors have done a poor job in evaluating their contribution
within the molecular graph enumeration literature. Since the 1960s
many papers and reviews have been published, and molecular generator
codes such as MOLGEN can be downloaded from the internet. Yet the authors
cite only two papers, one of which is a presentation (Reference #2).
===============
???
===============

2) Reference #1 cited by the authors makes use of an algorithm
developed by B. McKay named GENG (part of the Nauty package, which
can be freely downloaded at http://cs.anu.edu.au/~bdm/nauty/). That
algorithm can generate all connected graphs with a specified degree
range (could be 3 to 4) up to 32 vertices. To generate graphs with
edge multiplicity another code (named MULTIG) from the same package
can also be downloaded. Could these codes (that are well documented
and written by a recognized graph theory expert) be used in the
present context?
========================
WE CAN USE GENG TO GENERATE MULTI-EDGES if we allow nodes of degree 2,
THEN PRUNE, BUT THE COMPARISON PROBLEM REMAINS, GIVEN THE LIMITATIONS OF labelg
RE. MULTIEDGES: see nauty documentation, as discussed above.
For MULTIG:
  we could proceed as follows:
  in GENG, set the degree bounds d:D = 3:4 and run over all
  combinations of n (nodes) and e (edges) compatible with the
  requirement of at most 8 rings. This would generate all topologies
  without multiple edges or loops. We could then use MULTIG to generate
  all nonisomorphic multigraphs based on these. However, neither
  program has the capability of including loops (see documentation of
  MULTIG). Although we could have proceeded in this way and then
  addressed the problem of adding loops on our own, building on
  the structures generated by GENG/MULTIG, we felt that our own
  special-purpose scheme of generating scaffolds directly was
  no harder---and it had the added advantage for us that we have
  a self-contained, simple algorithm. It would be possible to
  create the same database using GENG/MULTIG+loop addition, but
  the question of canonically labeling the database would still
  need to be addressed.
   Of course we could have taken better advantage of nauty, but
  we did not.
========================
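
As a rough illustration of which (n, e) combinations are "compatible with
the requirement of at most 8 rings", the sketch below applies the handshake
lemma to a connected multigraph in which every node has degree 3 or 4
(loops counted twice). The function name scaffold_sizes and the tuple layout
are hypothetical, and the ring count is taken to be the cyclomatic number
e - n + 1; both are assumptions, not statements about the manuscript. The
single-ring case (one node carrying a loop) has no node of degree 3 or 4 and
is left out.

```python
def scaffold_sizes(max_rings=8):
    """List (rings, n3, n4, nodes, edges) for graphs with node degrees 3 or 4.

    n3 and n4 are the numbers of degree-3 and degree-4 nodes.
    """
    rows = []
    for rings in range(2, max_rings + 1):
        # Handshake lemma: 2*e = 3*n3 + 4*n4.  Combined with
        # rings = e - n + 1 and n = n3 + n4 this gives
        # n3 + 2*n4 = 2*(rings - 1).
        total = 2 * (rings - 1)
        for n4 in range(total // 2 + 1):
            n3 = total - 2 * n4
            nodes = n3 + n4
            edges = (3 * n3 + 4 * n4) // 2
            rows.append((rings, n3, n4, nodes, edges))
    return rows
```

Under these assumptions the largest case for 8 rings has 14 nodes and 21
edges, so the parameter sweep is small. Not every listed combination need be
realizable without multiple edges or loops, which is where a multigraph
stage and a separate loop-adding step would come in.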

3) I understand the proposed enumeration method produces isomorphic
graphs, which have to be removed. However, I do not understand why
the authors have developed a specific algorithm (based on the
return-index), which they admit is not correct above eight rings.
Many correct algorithms to detect isomorphism exist, and again the
Nauty package (cf. URL given above) contains one of them, which is
fast and can process graphs and multigraphs.

========================
---As limited as it is, the return index allows us a "canonical"
labeling of multigraphs up to 8 rings (PROOF??? see answer to Q4).
The first pass required an exhaustive comparison (which we could have
performed using nauty, admittedly!), but now that we have it, we can use the
return index to canonically label graphs and speed up comparisons
between topologies in our database and any arbitrary topology.
========================

4) The authors state in the abstract their enumeration is exhaustive,
I did not see any proof of this claim in the manuscript.

========================
HERE WE NEED TO CAREFULLY DESCRIBE THE STEPS (GENERATION, LABELING,
COMPARISON, HOW WE PROVED THAT THE LABELING IS UNIQUE)
********* still to be done (last serious point remaining)
          but we know we are right, so it is a matter of saying it
          correctly!
========================


5) The authors state that the return index algorithm runs in cubic
time, could they elaborate on this? In addition, what is the time
complexity of enumerating scaffolds? Is it polynomial time per
output?
===============
need the argument from the paper explained more clearly
(i.e. cost of matrix multiply by a matrix of 0's and 1's)
===============
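
One way the cubic bound could be argued is sketched below. The definition of
the return index is not given in these notes, so the sketch assumes,
hypothetically, a label built from closed-walk counts, i.e. the diagonal
entries of the powers A, A^2, ..., A^n of the adjacency matrix. The name
closed_walk_label is invented for illustration and is not the paper's
algorithm. The point is only the cost: multiplying by a 0/1 (or
small-integer) matrix with at most 4 entries per row takes O(n^2) additions,
so n such products take O(n^3).

```python
def closed_walk_label(n, edges):
    """Sorted per-node closed-walk counts for walk lengths 1..n.

    Nodes are 0..n-1; edges are (u, v) pairs, repeated for multiple edges.
    Convention (an assumption): a loop is stored once at its node.
    """
    nbrs = [[] for _ in range(n)]
    for u, v in edges:
        nbrs[u].append(v)
        if u != v:
            nbrs[v].append(u)
    # power[i][j] = number of walks of the current length from i to j.
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    counts = [[] for _ in range(n)]
    for _ in range(n):
        nxt = [[0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if power[i][j]:
                    # At most 4 neighbours per node in a scaffold, so this
                    # whole product costs O(n^2) additions.
                    for k in nbrs[j]:
                        nxt[i][k] += power[i][j]
        power = nxt
        for i in range(n):
            counts[i].append(power[i][i])
    # Sorting removes the dependence on how the nodes were numbered.
    return tuple(sorted(tuple(c) for c in counts))
```

Such a label is invariant under renumbering of the nodes, but invariance
alone does not make it discriminating; that would still have to be
established separately, as the notes on exhaustive comparison indicate.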





Please rate the quality of the science reported in this paper (10 -
High Importance / 1 - Low Importance): 4

Please rate the overall importance of this paper to the field of
chemical information or modeling (10 - High Importance / 1 - Low
Importance): 7


Reviewer: 5

Recommendation: Publish after minor revisions noted.

Comments:
Why was the "Return Index" method developed instead of using existing
methods such as the Nauty algorithm?

Why do you assert the method is efficient?  Consider comparing the
time complexity of the Return Index method to Luks' graph
isomorphism algorithm with Zemlyachenko's trick, which runs in time
2^O(SQRT(n*log(n))) (see
http://qwiki.caltech.edu/wiki/Complexity_Garden, Section Graph
Isomorphism).

=========================
We have a canonical label---we are interested in comparing a given
graph to our entire database; once we have labeled each topology
in the database, we just need to extract the index of the new graph
and search for it in the database.
=========================
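
The label-then-search workflow described in this note can be sketched as a
dictionary keyed by label. Everything named here is hypothetical: build_index
and lookup are illustrative helpers, and degree_label (the sorted degree
sequence) is only a weak stand-in for whatever label is actually used.

```python
from collections import defaultdict

def degree_label(n, edges):
    """Stand-in label: the sorted degree sequence (loops count twice)."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return tuple(sorted(deg))

def build_index(library, label=degree_label):
    """Label every library topology once; map label -> list of names."""
    index = defaultdict(list)
    for name, (n, edges) in library.items():
        index[label(n, edges)].append(name)
    return index

def lookup(index, n, edges, label=degree_label):
    """Return the names of library topologies sharing the query's label."""
    return index.get(label(n, edges), [])
```

With a label that is discriminating, each bucket holds a single topology and
the lookup is the whole search. With a weak label such as the degree
sequence, a bucket can hold several topologies (for two rings, the
three-parallel-edge graph and the two-loops-plus-bridge graph share the
sequence (3, 3)), and a direct comparison within the bucket is still needed.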

In Figure 2 (i), the depiction of the 4-node constructed in 3 ways is
odd.  Consider distinguishing the nodes.
=================
to do
=================

In Figure 3 (3), the dotted line loop raises a question in the
reader's mind.  Is it different from a solid line loop?

In Figure 4, an illustration showing construction of (3,4,0) from
(2,2,0) would be helpful.

In Figure 5, reproducing the adjacency matrix and block structure
from the information presented is straightforward, however it may be
difficult for the reader to reproduce the Return Index matrix. 
Consider showing your practice of constructing R.

In Figure 6, how do you distinguish graphs (a) and (b), given the
identical return indices and block structures of (a) and (b)?


Please rate the quality of the science reported in this paper (10 -
High Importance / 1 - Low Importance): 5

Please rate the overall importance of this paper to the field of
chemical information or modeling (10 - High Importance / 1 - Low
Importance): 5

