From vageli@math.unm.edu Fri Feb 15 11:14:13 2002 Date: Thu, 14 Feb 2002 15:49:29 -0700 (MST) From: Evangelos A. Coutsias To: Chaok Seok Cc: Evangelos A. Coutsias Subject: minimizations etc. ----------------------------------------------------------------- Hi Chaok, I meant to ask you, how was your trip to Korea? Relaxing and happy I am sure! I hope your daughter was not too sad from missing her grandparents afterwards. At last I can say I am over a very difficult time. Unfortunately I was not able to save my marriage, but Cecilia and myself were at least able to part as friends, perhaps even close friends. Our divorce became final last December and now I can concentrate on healing. Since this process began in January '01 it has been difficult to do more than take care of survival... I have been slowly rethinking our work and especially the ideas around CGU and other genetic algorithms for seeding the search, and I wanted to share some of my confused thoughts with you. Last winter it became clear to me that CGU per se is not quite well defined in generalized coordinates: a large number of terms in the energy function are periodic wrt. internal coordinates (and the whole energy function is also periodic in terms of the torsion angles), so that one is forced to think in terms of cartesians when discussing seeding; however, since the whole molecule configuration is also invariant with respect to a 3d rotation, one must even be vey careful how one seeds there. But, even then, this wastes one of the main advantages of using internal dof., namely the huge reduction in dimensionality. It also wastes a lot of calculation on minimizing over ensembles of configurations that are chosen inefficiently. I have some arguments to support the above claims, but the main (simple!) idea here is that one should explore minimizations from topologically inequivalent initial configurations, paying special attention to sidechain minimizations. If we fix the backbone and consider minimizations over sidechains as a step in a hierarchical minimization scheme, then we clearly are faced with periodic coordinates. Yet, some aspect of the CGU concept must survive (I think I have some arguments to support this too!). The original idea should be applicable, but only sufficiently close to the (postulated) energy minimum one should expect that things become convex. For deformations of small pieces of the chain, we should suffer a small energy penalty if some obstruction forms to prevent achieving the native state etc., so that it is reasonable to expect that "nearby" minima are above the "global" minimum, progressively increasing for greater distortions. So, not too far from the global minimum we should be observing a CGU type lower bound for the distribution of minima. Seeding in internal variable space may make it easier to sample different topologies, while seeding in point space is too wasteful, and I believe one can implement diagnostics on the CGU scheme to prove that. Some integer-programing technique that knows about topologically (sterically?) distinct configurations ought to be incorporated in the seeding algorithm, in the least. Thoughts along these lines made me decide last winter to develop a very general asynchronous parallel scheme to fire-off energy calculations (what I sent you, more or less). The plan was to use it with any genetic algorithm we choose: for the energy calculations we take a point in some high-dimensional space (initial configuration) and return another point (local minimum). In general, we have little knowledge of what lies nearby the minimization path. I thought that we could interfere creatively at this stage, with many possible strategies. Hence the model I proposed, with our scheme invoking the CGU routine (and not the other way around!). The thought was to use that to fire off lots of parallel computations, sharing among themselve invariant pieces of the same initial configuration. This way, we could explore the labyrinth in the neighborhood of any local minimum (trap) to discover any possible exits to nearby lower minima. However, if one is experimenting with a specific algorithm (eg. CGU) and wants an energy function, then their algorithm can handle the process management just as well. That is to say, I believe we should keep developing our own overall parallel scheme. I will be able to do some programming also now! My main thoughts (in the little time I could give to creative thinking these past months) have been in two directions: (1) Is there a natural hierarchical structuring of the computation of the energy function and its derivatives (Jacobian, Hessian)? This thought comes from an amateurish idea of mine to model the minimization calculation after the multigrid approach of solving large, complicated elliptic equations. These also depend on a maximum (or minimum) principle, and are based on minimizing hierarchically (some iteration is also involved) over coarser and coarser grids. This is where I see the main advantage of having developed a structured approach to the geometry and energy computation. Certain types of movements allow the reuse of large parts of the Hessian from previous steps eg., with no new computation. To avoid unnecessary duplication and/or communication, all minimizations that can reuse these pieces should be performed on the same processor. I think that one can very efficiently parallelize energy computations, since one could compute several nearby configurations at once this way, getting a huge reduction in the overall cost. This could be done for instance for sidechain minimizations, given a backbone (just an example, many similar piecewise minimizations can be employed, usually as parts of an iteration scheme). This also applies to Jacobian computations. To implement this idea we would need to interfere with some of the workings of the nonlinear-programming package we use with the minimization or index each and every d.o.f. so we can fix them in any order and combination we want. Also, analytical Jacobians and Hessians should replace the numerical ones-normally too expensive, not so if we consider reduced-dof deformations for the minimization). (2) The other idea, which I hope can be relevant regardless of the genetic algorithm adopted, is to develop a sensitivity score for various parameters/structures. I believe that certain types of native structures are very sensitive to the values of some energy parameters, insensitive to others. This might vary over classes of structures. One ought to account for the different sensitivities, else the process becomes too noisy. Say, only use structures sufficiently sensitive to some given parameter, to optimize the value of that parameter, and form compatible structure classes for each parameter based on some sensitivity score of the structure for a particular parameter. Perhaps do same for pairs of parameters,... Unfortunately the parameters enter nonlinearly, so the sensitivity needs to be defined as a function depending on all the parameters! Hopeless, unless we are close to the "correct" values for at least some of the parameters, preferably all but a few!. The idea again is not new, it is I think used in mathematical economics, where you have huge functions depending on large numbers of parameters, and these parameters are things like "price elasticity" and "demand rigidity" and whatever other unknown coefficients modelers include in their analogues of an energy (profit, penalty etc) function. Finally, was the structure of the protein you included at its native or just some minimum? I will continue studying your code, and I will be in touch, do the same! Best, Vageli On Thu, 14 Feb 2002, Chaok Seok wrote: > Hi Vageli, > > Thank you for letting me know what has been going on. I think I know why > the minimization program crashed when you tried different proteins. The > 'reconstruct' variable should be set to 1 before the 'initialize_protein' > subroutine is called in general cases. reconstruct=0 is fine for the > particular pdb file I gave you. If reconstruct=1, 'initialize_protein' > calls two subroutines 'cartesian2internal' and 'internal2cartesian'. What > this effectively does is to reconstruct the protein conformation from the > internal coordinates (angles and bonds) calculated from the cartesian > coordinates in the given pdb file, and also by using default angles (given > in the topology_geo.in file) for atoms (such as hydrogen atoms in > x-ray structures) not present in the pdb file. I add missing atoms > this way because we need to know the coordinates of all the atoms in order > to compute the energy. (The energy function I use is EEF1, and its vacuum > energy is CHARMM19 polar hydrogen force field. EEF1 has additional > solvation free energy. In fact, in CHARMM19, only polar hydrogens, > hydrogens attached to electronegative atoms such as oxygen or nitrogen, > are explicit, and hydrogens attached to carbon are only implicitly taken > into account by the effective van der Waals radii and other parameters.) > Anyway, the pdb file I gave you has all the polar hydrogen atom > coordinates, so there is no need for reconstruction, and I just > set reconstruct=0. If your input structure does not have all the atoms > needed to compute the energy, then reconstruct should be 1. > > As you have noticed, 1cmr01_min.pdb has a structure that is already energy > minimized. The original pdb file (1cmr.pdb) has about 10 structures > (determined by NMR). I picked the first structure in the original pdb, and > minimized energy starting from it. I picked 1cmr because it is relatively > small and has 3 disulfide bonds, as you noticed. > > Frankly, I am not sure what is the issue in performing local deformations > efficiently. I am sorry that I don't remember David Baker's talk at all. > Certainly I need to read more, though. I guess efficient local deformation > may be better than some random deformation in Metropolis Monte Carlo > simulations, but I feel that since protein is so compact, even local > deformation may not be very useful in Metropolis Monte Carlo. Kevin > Cahill tested his local deformation method on a very very simple energy > model using Metropolis Monte Carlo, and showed there is a big improvement, > but I doubt whether it will still work well for more realistic energy > model. There is also Monte Carlo Minimization that combines Monte Carlo > and local minimization. I don't know whether new local deformation methods > can improve efficiency for MCM. According to my experiments, Kevin > Cahill's local deformation is not more efficient than random deformation > in use with MCM. Also, using using SVD with MCM was not efficient > according to work of John and a rotation student. I feel that because MCM > involves local minimization, local deformation may not need to be very > subtle since it will be changed by minimization after all, and coordinated > local moves may restrict the conformational search space and worsen the > performance. > > Here are a few references for MCM and local deformation if you are > interested: > > @Article{li87:_monte_carlo, > author = {Z. Li and H. A. Scheraga}, > title = {{M}onte {C}arlo-minimization approach to the > multiple-minima problem in protein folding}, > journal = PNAS, > year = 1987, > volume = 84, > pages = {6611-15} > } > > @Article{baysal97:_effic, > author = {Baysal, C. and Meirovitch, H.}, > title = {Efficiency of the local torsional deformations method > for identifying the > stable structures of cyclic molecules}, > journal = {J. Phys. Chem. A}, > year = 1997, > volume = 101, > pages = 2185 > } > > Thank you and good luck, > > Chaok > > > Subject: Re: Happy New Year! Hi Chaok, I did not get the pdf file, only a note. But I can ask Kevin for a copy. I have heard of this work but maybe I should pay it more attention! I got the idea for doing this from Baker's talk at the CASP last year. As he was minimizing energy for his 9-mers, he did not try to make them fit with their neighbors in any way. I felt that the scheme would be similar to multigrid schemes for solving complicated systems of PDEs, and the key problem there is fitting the pieces, and the precise method of fitting things can mean the success or failure of a scheme. I realized that having the geometry expressed as a product of quaternions (and a sum of products) provided a very simple way to manipulate protein-dependent functions: since each q enters linearly in the expressions, 1st and 2nd derivatives are easy to write and can be computed quite efficiently, since they all invlolve the same pieces. Localizing deformations with our formulation is easy to write, even to do the "2nd order corrections" that I saw mentioned. The trick is to include the second derivative info to the formula I sent you and find the zero by some efficient form of a Newton iteration. Anyway, it will be easier to program this than to explain it, I hope! I look forward to seeing the present state of the code, it feels great to be thinking about protein geometry again! Thanks, Vageli From vageli@math.unm.edu Fri Feb 15 11:17:03 2002 Date: Thu, 07 Feb 2002 16:19:58 -0700 From: vageli@math.unm.edu To: chaok@maxwell.ucsf.edu Cc: vageli@math.unm.edu Subject: Happy New Year! Hi Chaok! I hope all is well with you and your family. I think it is Chinese New Year soon and I wanted to give you my best wishes. I am finally planning a trip to SF, hoping that despites all the time that passed I can pick up where I left last year. A lot happened since I left (I am sure the same is true over there!). Perhaps Ken has mentioned that I will be in SF from March 8-17. That was the best time for him (and it falls on my Spring break, so I will only miss one class) so I hope this is a good time for you as well. I am managing to get my thought back to work after the many distractions of last Fall. I was trying to familiarize myself again with our code, but I have so many versions I am not sure which one incorporates your final corrections. Some time ago you had written that you had corrected some aspects relating to proline, as well as adding terminal atoms etc. Also you were going to change the energy calculation re. torsion angles etc following the CHARMM philosophy. If I could have a copy of the latest version you have, then I could try to understand it all before I come in March. I have been working on the idea I mentioned to you last Spring, i.e. introducing localized deformations into the chain as a way of escaping false minima. I think I have a good algorithm for doing that (at least in principle), but I wanted to try and build it around your latest code. The idea is quite simple: assume the backbone quaternions are numbered consecutively, q1,q2,...,qm; let the total quaternions be Qn=q1*q2*...*qn. I want a move that fixes QN1, QN2, with N2=N1+L; L is the length of a (small) segment of the backbone that will be perturbed, subject to the constraint that QN1, QN2 remain fixed (to be more precise, RN1 and RN2 must also remain fixed; this just adds a few more constraints, but the analysis is similar, so let's leave it out of the discussion!). Define q(N1+k):=p(k) (then p(L)=q(N2)) Since QN2 = QN1*p(1)*...*p(L) let's assume that all the intermediate quaternions, p(k) are perturbed slightly: p(k) -->p(k) + d(k), d(k) small. Then, QN2 = QN1*(p(1)+d(1))*...*(p(L)+d(L)) = QN1*p(1)*...*p(L) and for small d(k) expand keeping only the linear terms. This gives a rank=4 constraint on the d(k); it's null vectors are all the allowable deformations that leave the two ends of the chain fixed (if we also include the position vectors, then we get 6 conditions which are again linear for small deformations). There are of course some complications with this method (may need to include quadratic terms, in which case one must iterate to find "good" deformations of somewhat larger amplitude), but its main advantage is the simplicity of the formulas, and the fact that it is easy to evaluate the resulting expressions directly from quantities available in the code. I have been trying to think of a way to use these localized deformations as a means of applying a multi-scale approach to energy minimization (i.e. relax different parts of the chain in stages, and vary the order of doing so, to try different combinations so as to avoid traps). I think such an algorithm (together with a good topology-based method of constructing inequivalent configurations to minimize from--such as the motion planning ideas) could be a big improvement over the wholesale minimizations over the entire chain. At the first stage, one looks for a consistent local deformation (i.e. one that leaves QN1, QN2 unchanged) that involves the least energy penalty. For example, if the molecule has a contact between two distant residues, consider pairwise deformations of the chain near each contact site etc. I need to play with this once I have implemented the deformation scheme, but I think it offers many possibilities for control. As an idea it is not new, but the simple formulas for performing consistent deformations are new I believe. This way, one can implement such deformations without worrying about fixing up the rest of the molecule... In any case, I hope you are still interested in this collaboration, even though I abandoned our project for so long. I am now ready to return to it, and at last life is not working against me!... Best, Vageli PS: I know that John has been developing his algorithm in C(++?) and has incorporated some of our ideas, but I still want to use a F95 program, because I find it much simpler to work on and develop. In any case, as I told Ken, I am sure you folks have been moving now for the last few months and I am afraid I may be left behind, but hopefully I can catch up quickly when I visit, especially with a little help! From chaok@zimm.ucsf.edu Fri Feb 15 11:17:28 2002 Date: Thu, 7 Feb 2002 17:04:32 -0800 (PST) From: Chaok Seok To: vageli@math.unm.edu Subject: Re: Happy New Year! Hi Vageli, It is really nice to hear from you again. Happy new year to you, too. We don't have much progress, so don't worry about catching up things around here. I will try to send you a package of my code tomorrow. I think I need to clean up some junk and irrelevant things from the code. In fact, your local deformation idea sounds similar to what I heard recently. I have attached a pdf file to this email. It is what Kevin Cahill at UNM (the same university as you) sent to Ken. I am trying to see how useful it is for Monte Carlo Minimization, but my preliminary results are not very optimistic. I guess you help us more than we help you usually. I will email you again soon. Best, Chaok [ Part 2, "Cahill.pdf" Application/PDF 1.5KB. ] [ Unable to print this part. ] From chaok@zimm.ucsf.edu Fri Feb 15 11:17:40 2002 Date: Fri, 8 Feb 2002 12:47:40 -0800 (PST) From: Chaok Seok To: Evangelos A. Coutsias Subject: code Hi Vageli, I put some files in my UNM account, so you may copy them. They are in ~chaok/vageli/. The file eef1.tar contains bin and src directories. I cleaned up my codes a bit, and the copy I give you can do energy evaluation, local energy minimization both in internal and cartesian space, internal<->cartesian conversions, and so on. I hope the code is understandable. Please let me know if you have any questions. I also put the pdf file for Cahill's work in case you have not gotten that yet. Bye now, Chaok From chaok@zimm.ucsf.edu Fri Feb 15 11:18:09 2002 Date: Sat, 9 Feb 2002 06:53:55 -0800 (PST) From: Chaok Seok To: Evangelos A. Coutsias Subject: Re: code Sorry, I did not know that. Now I chmod, and you must be able to read. Thanks, Chaok On Sat, 9 Feb 2002, Evangelos A. Coutsias wrote: > Chaok, thanks for doing it! Now, could you please also make the directory > readable? > Thanks, > Vageli > From chaok@zimm.ucsf.edu Fri Feb 15 11:18:17 2002 Date: Sat, 9 Feb 2002 10:42:30 -0800 (PST) From: Chaok Seok To: Evangelos A. Coutsias Subject: Re: code Sorry. It was a mistake. I made a change. I hope it works now. On Sat, 9 Feb 2002, Evangelos A. Coutsias wrote: > Hi, it is still inaccessible; you need to chmod 755 to make > the directory accessible, then chmod 644 /* to make the files in > it world-readable. > Vageli > PS: there was a break-in earlier this year and the systems guys > set the defaults for everyone to be 700 for directories and 600 for files, > so one must explicitly change permissions to make things readable by > others! > > On Sat, 9 Feb 2002, Chaok Seok wrote: > > > > > Sorry, I did not know that. Now I chmod, and you must be able to read. > > Thanks, > > > > Chaok > > > > On Sat, 9 Feb 2002, Evangelos A. Coutsias wrote: > > > > > Chaok, thanks for doing it! Now, could you please also make the directory > > > readable? > > > Thanks, > > > Vageli > > > > > > > > > >