From toprea@salud.unm.edu Thu Nov 15 10:30:29 2007
Date: Thu, 15 Nov 2007 10:29:50 -0700
From: Tudor Oprea
To: Sara Pollock, Sara Pollock, Evangelos A. Coutsias, Michael Wester
Subject: Fwd: Revision Requested for Manuscript ID ci-2007-003412

We have a "leg in"; we just need to address the referee comments. All five of them! Reviewers 1 & 4 seem difficult to please. We have to address GENG and MOLGEN. Also, it's not important to merge the 2 papers.

Tudor

>>> On 11/13/2007 at 8:00 AM, in message <363737561.1194966012881.JavaMail.wladmin@tss1be0011>, wrote:

13-Nov-2007

Dear Prof. Oprea:

Manuscript ID: ci-2007-003412
Title: "Topological Analysis of Molecular Scaffolds I: Enumeration of Ring Topologies"
Author(s): Pollock, Sara; Coutsias, Evangelos; Wester, Michael; Oprea, Tudor

Enclosed are the reviews for your paper. The reviewers have found significant problems that require attention. Please make revisions and respond well to the reviewers' points. The resubmitted paper and your reply to their criticisms may be sent back to the reviewers for further comment.

To speed up the review process and avoid un-submission of your revised manuscript, please be sure that your revised manuscript adheres to ACS format, especially references. For your convenience I have also enclosed a list of reference examples for JCIM. On your manuscript you need to use journal abbreviations, add accessed date to web citations, and correct the fonts and journal abbreviation in ref 3.

Please ensure we have the signed copyright form for your manuscript. The copyright form is available at http://pubs.acs.org/instruct/copyright.pdf; please complete, sign and fax it or e-mail it to our office.

Please submit a TOC graphic.

Sincerely,

Wendy A. Warr
Associate Editor
Journal of Chemical Information and Modeling
Wendy Warr & Associates
6 Berwick Court
Holmes Chapel
Cheshire CW4 7HZ
ENGLAND
Telephone number: +44(0)-1477-533-837
Fax number: +44(0)-1477-533-837
Email address: wendy@warr.com

------------------------------------

To revise your manuscript, log into ACS Paragon Plus http://paragonplus.acs.org/login and select "My Authoring Activity", where you will find your manuscript title listed under "Manuscripts with Decisions." Under "Actions," click on "Create a Revision." Your manuscript number has been appended to denote a revision.

When submitting your revised manuscript, you will be able to respond to the comments made by the reviewer(s) in the text box provided or by attaching a file containing your response letter.

IMPORTANT: Your original files are available to you when you upload your revised manuscript. Please delete any redundant files before completing the submission.

------------------------------------

Reviewer(s)' Comments to Author:

Reviewer: 1

Recommendation: Publish after major revisions noted.

Comments:
The paper by Pollock et al. is a theoretical study enumerating all possible topologies up to 8-rings using a specifically designed algorithm. The graph generation part might not be novel, but the analysis of the graph families is interesting. I recommend publication after extending the graph analysis part and responding to the comments below.

1. The paper should be checked by a mathematician for the correctness of the argumentation.

2. In the introduction, the authors lay out a definition of a "scaffold" graph, but also explain that this formulation is equivalent to two previous ones where the term "molecular framework" was used (page 2, bottom). Please just use the term "molecular framework", which is better.

++ We prefer the term scaffold topology to emphasize its theoretical, chemistry-free nature. At this level, the objects contain only topological information, as defined in the paper.
As such, these objects have zero chemistry information content. ++

3. Why was the problem not simply solved by taking all graphs from the exhaustive enumeration available in the GENG program? It's perhaps even possible to enumerate extended graphs directly with GENG. In this case the graph generation part is pointless. In any event McKay and his GENG program must be cited and its non-utilization justified carefully.

\bibitem{nauty}
nauty {U}ser's {G}uide, Version 2.4, McKay,~B.~D. Department of Computer Science, Australian National University: Canberra, Australia, 2007.

====================
No canonical labeling for multiple edges in nauty (the multi-edges themselves we can generate by allowing 2-nodes, then prune, classify, compare)
====================

4. The bibliography is extremely short, this cannot be.

++ we will add unnecessary references to please the referees ++

\bibitem{Fink2005}
Fink,~T.;\ \ Bruggesser,~H.;\ \ Reymond,~J.-L. {V}irtual {E}xploration of the {S}mall-{M}olecule {C}hemical {U}niverse below 160 {D}altons. \textit{Angew. Chem. Int. Ed.} \textbf{2005,} \textsl{44,} 1504--1508.

\bibitem{deLaet2000}
de~Laet,~A.;\ \ Hehenkamp,~J. J.~J.;\ \ Wife,~R.~L. {F}inding {D}rug {C}andidates in {V}irtual and {L}ost/{E}merging {C}hemistry. \textit{J. Heterocyclic Chem.} \textbf{2000,} \textsl{37,} 669--674.

\bibitem{Hehenkamp2000}
Hehenkamp,~J.~J.~J.;\ \ de~Laet,~R.~C.;\ \ Parlevliet,~F.~J.;\ \ Verheij,~H.~J.;\ \ Wife,~R.~L. {N}avigating the real and virtual chemical worlds. In \textit{Proceedings of the 2000 Chemical Information Conference}; Collier,~H.,\ \ Ed.; Infonortics: Annecy, France, 2000.

\bibitem{Ivanciuc1993}
Ivanciuc,~O.;\ \ Balaban,~T.-S.;\ \ Balaban,~A.~T. {D}esign of topological indices. {P}art 4. {R}eciprocal distance matrix, related local vertex invariants and topological indices. \textit{J. Math. Chem.} \textbf{1993,} \textsl{12,} 309--318.

\bibitem{Filip1987}
Filip,~P.~A.;\ \ Balaban,~T.-S.;\ \ Balaban,~A.~T. {A} new approach for devising local graph invariants: {D}erived topological indices with low degeneracy and good correlation ability. \textit{J. Math. Chem.} \textbf{1987,} \textsl{1,} 61--83.

\bibitem{Mekenyan1988}
Mekenyan,~O.;\ \ Bonchev,~D.;\ \ Balaban,~A. {T}opological indices for molecular fragments and new graph invariants. \textit{J. Math. Chem.} \textbf{1988,} \textsl{2,} 347--375.

\bibitem{Cvetkovic1997}
Cvetkovi\'c,~D.;\ \ Rowlinson,~P.;\ \ Simi\'c,~S. \textit{{E}igenspaces of {G}raphs;} Cambridge University Press: Cambridge, 1997.

\bibitem{Trinajstic1988}
Trinajsti\'c,~N. {T}he characteristic polynomial of a chemical graph. \textit{J. Math. Chem.} \textbf{1988,} \textsl{2,} 197--215.

\bibitem{Trinajstic1991}
Trinajsti\'c,~N.;\ \ Nikoli\'c,~S.;\ \ Knop,~J.~V.;\ \ M{\"u}ller,~W.~R.;\ \ Szymanski,~K. \textit{{C}omputational {C}hemical {G}raph {T}heory: {C}haracterization, {E}numeration and {G}eneration of {C}hemical {S}tructures by {C}omputer {M}ethods;} Ellis Horwood: New York, 1991.

\bibitem{Lipkus2001}
Lipkus,~A.~H. {E}xploring {C}hemical {R}ings in a {S}imple {T}opological-{D}escriptor {S}pace. \textit{J. Chem. Inf. Comput. Sci.} \textbf{2001,} \textsl{41,} 430--438.

5. The node-splitting rule shown in figure 2 is correct for the 1st entry only in cases where the four ends are different, which is not always the case. The figure is clear for the 3 last entries, the text to the figure is obscure.

++ Sara, check this ++

6. I did not understand the concept of the return index, except that it addresses the problem of graph comparison. There should be a simple reference to this index.

++ we cannot reference the return index. Sara to improve its definition ++

====================
The return index was used to produce a canonical labeling of multigraphs with up to 8 rings. nauty (at least version ##, which we believe to be the latest public version) does not offer the capability of a canonical labeling for multigraphs (see documentation of LABELG).
Although we have only established this labeling to be discriminating for scaffolds with up to 8 rings, this allows us to label such scaffolds for fast subsequent comparisons.
====================

7. I am surprised by the heading "results and conclusion"... which indicates the shortcoming of the paper: the real data is in the analysis of the scaffold collection generated, not in the algorithm to build it. Table 1 and figure 8 show an interesting insight, which deserves to be moved into an authentic "results" section, and extended. It's very sad that the authors have not come up with a good idea to explain the peak behavior of topologies in figure 8.

++ Sara & Vageli ++

8. I am surprised to see no mention of graph planarity and graph automorphism group analysis in the paper, which would help categorize the graphs better. Not to mention analyzing which graphs are physically realizable in molecules of a certain size. The peak behavior might be related to simple combinatorics and the fact that graphs with only one type of node must be overall more symmetrical.

++ Sara & Vageli ++

===================
WE HAVE DETERMINED PLANARITY (USING NAUTY): SEE NEW TABLES (10% OF ALL TOPOLOGIES GENERATED ARE NON-PLANAR, AS COMPARED TO EXACTLY 44 MOLECULES (12 topologies) IN THE ENTIRE merged database)
===================

Please rate the quality of the science reported in this paper (10 - High Importance / 1 - Low Importance): 6
Please rate the overall importance of this paper to the field of chemical information or modeling (10 - High Importance / 1 - Low Importance): 7

Reviewer: 2

Recommendation: Publish after minor revisions noted.

Comments:
In this paper the authors present an algorithm for generating all possible ring topologies up to eight rings. The paper is timely, and it is along the lines of previous effort through decades to identify equivalent substructures and substructure counting.
1) The subject matter is quite rich in computational chemistry and the introduction should do a better job of placing the work in the context of other methods and algorithms described to identify and count substructures.

++ See our reply to Reviewers 4 & 5 ++

2) Comparison with, at least, the Ullmann graph isomorphism algorithm, among others, is needed (speed, efficiency, what are the advantages, etc.).

++ Mike & Vageli to trace(y) Ullman ++

========================
Please see answer to ref. #?. We were not after a better graph isomorphism algorithm; rather, we needed a canonical labeling of our topology library to allow for fast searches. Our return-index algorithm has limitations (which we are very careful to note), but we did prove its adequacy for scaffolds with up to 8 rings. It is reasonably fast (although, of course, much slower than n log n), but once the library structures are labeled they can be searched directly, and each search only costs kn^3 (where n is the size of a new topology and k is the size of the subclass associated to the molecule in our library, of size much smaller than the library itself).
========================

Please rate the quality of the science reported in this paper (10 - High Importance / 1 - Low Importance):
Please rate the overall importance of this paper to the field of chemical information or modeling (10 - High Importance / 1 - Low Importance):

Reviewer: 3

Recommendation: Publish after major revisions noted.

Comments:
The study of rings in chemical universe and ring composition of available chemical databases is actual and of interest to current chemical informatics. The communication consists of 2 parts. The first, shorter part is description of methodology and in the 2nd part application to the study of several chemical databases follows. I do not have any critical points concerning the actual study.
To make the paper more readable and more useful for general chemoinformatics community, however, I suggest the following modification:

Since the two parts are closely related and refer to each other, I strongly recommend to merge the two manuscripts into one communication. The first theoretical part may be somehow shortened, the more technical part (which is too technical for general reader of JCIM) should be moved to the supporting information section.

Some additional points:

I am missing some citations relevant to this topic, for example:
Lipkus, A. (2001), 'Exploring chemical rings in a simple topological-descriptor space.', J Chem Inf Comput Sci. 41, 430 - 438.

================
referred to in 2nd paper already
================

or
Xu, Y. & Johnson, M. (2002), 'Using Molecular Equivalence Numbers To Visually Explore Structural Features that Distinguish Chemical Libraries', J. Chem. Inf. Comput. Sci. 42, 912-926.

================
add?
================

The PubChem database contains currently 10.9 million unique compounds (the PubChem command all[filt]). The author listed 11.6 million in Nov. 2006. How is this possible?

=====================
answered in second paper, use here also: substances vs. compounds
=====================

The form of the citations does not conform to the JCIM format.

=====================
MJW fixed 2nd paper, need to do also here
=====================

Please rate the quality of the science reported in this paper (10 - High Importance / 1 - Low Importance): 6
Please rate the overall importance of this paper to the field of chemical information or modeling (10 - High Importance / 1 - Low Importance): 7

Reviewer: 4

Recommendation: Publish after major revisions noted.

Comments:
This first paper in a series of two presents an algorithm to enumerate molecular scaffolds/molecular frameworks. After defining scaffolds (molecular graphs where nodes of degree 1 and 2 are removed recursively), the algorithm is presented.
The paper is a method paper and does not follow the usual format method/result/discussion. I have several concerns that the authors may want to address prior to publication.

1) The authors have done a poor job in evaluating their contribution within the molecular graph enumeration literature. Since the 1960s many papers and reviews have been published and molecular generator codes such as MOLGEN can be downloaded from internet. Yet the authors cite only two papers, one of which is a presentation (Reference #2).

===============
???
===============

2) Reference #1 cited by the authors makes use of an algorithm developed by B. McKay named GENG (part of the Nauty package, which can be freely downloaded at http://cs.anu.edu.au/~bdm/nauty/). That algorithm can generate all connected graphs with a specified degree range (could be 3 to 4) up to 32 vertices. To generate graphs with edge multiplicity another code (named MULTIG) from the same package can also be downloaded. Could these codes (that are well documented and written by a recognized graph theory expert) be used in the present context?

========================
WE CAN USE GENG TO GENERATE MULTI-EDGES if we allow nodes of degree 2, THEN PRUNE, BUT THE COMPARISON PROBLEM REMAINS, GIVEN THE LIMITATIONS OF labelg RE. MULTI-EDGES: see nauty documentation, as discussed above.

For MULTIG, we could proceed as follows: in GENG, set n (nodes), e (edges) and the degree bounds d:D = 3:4 to every combination compatible with the requirement of at most 8 rings. This would generate all topologies without multiple edges or loops. We could then use MULTIG to generate all nonisomorphic multigraphs based on these. However, neither has the capability of including loops (see documentation of MULTIG).
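The (n, e) combinations mentioned above can be listed mechanically. The following is only a sketch under stated assumptions, not something taken from the manuscript: it assumes that "rings" means the cyclomatic number e - n + 1 of a connected graph, and it applies necessary conditions only (handshake-lemma degree bounds and the edge count of a complete graph), so some listed pairs may turn out to admit no graph at all. The function names are hypothetical, and the geng flags shown (-c, -d, -D) should be checked against the nauty User's Guide before use.

```python
def geng_parameter_grid(max_rings=8, dmin=3, dmax=4):
    """Return (n, e, rings) triples passing simple necessary conditions.

    Assumes dmin >= 3 and that rings = e - n + 1 (connected graph).
    """
    grid = []
    # dmin*n <= 2e = 2(n + rings - 1) implies n <= 2*(rings - 1) when dmin >= 3,
    # so 2*max_rings is a safe upper limit for the search over n.
    for n in range(dmin + 1, 2 * max_rings + 1):
        for rings in range(1, max_rings + 1):
            e = n + rings - 1                          # edges of a connected graph
            degree_ok = dmin * n <= 2 * e <= dmax * n  # handshake-lemma bounds
            simple_ok = 2 * e <= n * (n - 1)           # at most a complete graph
            if degree_ok and simple_ok:
                grid.append((n, e, rings))
    return grid


def geng_command(n, e, dmin=3, dmax=4):
    """Format one geng call (flags assumed; verify against the nauty guide)."""
    return f"geng -c -d{dmin} -D{dmax} {n} {e}"


for n, e, rings in geng_parameter_grid():
    print(f"{rings} rings: {geng_command(n, e)}")
```

Under these assumptions the smallest admissible case is n = 4, e = 6 (the complete graph on four nodes, 3 rings), and no pair needs more than 14 nodes. Multi-edges and loops are outside this grid, as the note above says.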
Although we could have proceeded in this way and then addressed the problem of adding loops on our own, building on the structures generated by GENG/MULTIG, we felt that our own special-purpose scheme of generating scaffolds directly was no harder, and it had the added advantage for us that we have a self-contained, simple algorithm. It would be possible to create the same database using GENG/MULTIG plus loop addition, but the question of canonically labeling the database would still need to be addressed. Of course we could have taken better advantage of nauty, but we did not.
========================

3) I understand the proposed enumeration method produces isomorphic graphs, which have to be removed. However, I do not understand why the authors have developed a specific algorithm (based on the return-index), which they admit is not correct above eight rings. Many correct algorithms to detect isomorphism exist, and again the Nauty package (cf. URL given above) contains one of them, which is fast and can process graphs and multigraphs.

========================
As limited as it is, the return index allows us a "canonical" labeling of multigraphs up to 8 rings (PROOF??? see answer to Q4). The first pass required an exhaustive comparison (which we could have performed using nauty, admittedly!), but once we had it, we could use the return index to canonically label graphs and speed up comparisons between topologies in our database and any arbitrary topology.
========================

4) The authors state in the abstract that their enumeration is exhaustive; I did not see any proof of this claim in the manuscript.

========================
HERE WE NEED TO CAREFULLY DESCRIBE THE STEPS (GENERATION, LABELING, COMPARISON, HOW WE PROVED THAT THE LABELING IS UNIQUE)
********* still to be done (last serious point remaining) but we know we are right, so it is a matter of saying it correctly!
========================

5) The authors state that the return index algorithm runs in cubic time; could they elaborate on this? In addition, what is the time complexity of enumerating scaffolds? Is it polynomial time per output?

===============
need the argument from the paper explained more clearly (i.e. cost of matrix multiply by a matrix of 0's and 1's)
===============

Please rate the quality of the science reported in this paper (10 - High Importance / 1 - Low Importance): 4
Please rate the overall importance of this paper to the field of chemical information or modeling (10 - High Importance / 1 - Low Importance): 7

Reviewer: 5

Recommendation: Publish after minor revisions noted.

Comments:
Why was the "Return Index" method developed instead of using existing methods such as the Nauty algorithm? Why do you assert the method is efficient? Consider comparing the time complexity of the Return Index method to Luks' graph isomorphism algorithm with Zemlyachenko's trick, whose running time is 2^O(SQRT(n*log(n))) (see http://qwiki.caltech.edu/wiki/Complexity_Garden, Section Graph Isomorphism).

=========================
We have a canonical label. We are interested in comparing a given graph to our entire database; once we have labeled each topology in the database, we just need to extract the index of the new graph and search for it in the database.
=========================

In Figure 2 (i), the depiction of the 4-node constructed in 3 ways is odd. Consider distinguishing the nodes.

=================
to do
=================

In Figure 3 (3), the dotted line loop raises a question in the reader's mind. Is it different than a solid line loop?

In Figure 4, an illustration showing construction of (3,4,0) from (2,2,0) would be helpful.

In Figure 5, reproducing the adjacency matrix and block structure from the information presented is straightforward, however it may be difficult for the reader to reproduce the Return Index matrix. Consider showing your practice of constructing R.
In Figure 6, how do you distinguish graphs (a) and (b), given the identical return indices and block structures of (a) and (b)?

Please rate the quality of the science reported in this paper (10 - High Importance / 1 - Low Importance): 5
Please rate the overall importance of this paper to the field of chemical information or modeling (10 - High Importance / 1 - Low Importance): 5
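The label-then-look-up workflow described in the reply to Reviewer 5 (label every topology in the database once, then find a new graph by its label) can be illustrated on very small multigraphs. This is a minimal sketch and deliberately does NOT use the return index, whose definition is not given in this message: it swaps in a brute-force canonical form (the lexicographically smallest adjacency matrix over all node relabelings), which is exact but costs n! permutations and is only usable for a handful of nodes. All names (`adjacency`, `canonical_key`, `build_library`, `lookup`) and the convention that a loop adds 1 to a diagonal entry are assumptions made for illustration.

```python
from itertools import permutations


def adjacency(n, edges):
    """Adjacency matrix of a multigraph; a loop (v, v) adds 1 to a[v][v]."""
    a = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            a[u][u] += 1
        else:
            a[u][v] += 1
            a[v][u] += 1
    return a


def canonical_key(n, edges):
    """Smallest upper-triangle reading of the matrix over all relabelings."""
    a = adjacency(n, edges)
    best = None
    for p in permutations(range(n)):
        key = tuple(a[p[i]][p[j]] for i in range(n) for j in range(i, n))
        if best is None or key < best:
            best = key
    return (n, best)


def build_library(named_graphs):
    """Label each library topology once: {canonical key: name}."""
    return {canonical_key(n, edges): name
            for name, (n, edges) in named_graphs.items()}


def lookup(library, n, edges):
    """Label the query graph and search for its key; None if absent."""
    return library.get(canonical_key(n, edges))


library = build_library({
    "theta": (2, [(0, 1), (0, 1), (0, 1)]),      # two nodes, triple edge
    "dumbbell": (2, [(0, 0), (0, 1), (1, 1)]),   # two loops joined by an edge
})
print(lookup(library, 2, [(1, 0), (1, 0), (0, 1)]))
```

Once the library is built, each query costs one labeling plus one dictionary search, which is the point made in the reply; only the labeling function differs from the one used in the manuscript.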