From vageli@math.unm.edu Fri Feb 15 11:14:13 2002
Date: Thu, 14 Feb 2002 15:49:29 -0700 (MST)
From: Evangelos A. Coutsias <vageli@math.unm.edu>
To: Chaok Seok <chaok@zimm.ucsf.edu>
Cc: Evangelos A. Coutsias <vageli@math.unm.edu>
Subject: minimizations etc.

-----------------------------------------------------------------
Hi Chaok,  
 I meant to ask you, how was your trip to Korea? Relaxing and happy
I am sure! I hope your daughter was not too sad from missing her 
grandparents afterwards.

 At last I can say I am over a very difficult time. Unfortunately
I was not able to save my marriage, but Cecilia and myself were
at least able to part as friends, perhaps even close friends. 
Our divorce became final last December and now I can concentrate
on healing. Since this process began in January '01 it has been
difficult to do more than take care of survival...

 I have been slowly rethinking our work and especially the ideas
around CGU and other genetic algorithms for seeding the search,
and I wanted to share some of my confused thoughts with you. 
Last winter it became clear to me that CGU per se is not quite
well defined in generalized coordinates: a large number of terms
in the energy function are periodic wrt. internal coordinates
(and the whole energy function is also periodic in terms of the
torsion angles), so that one is forced to think in terms of 
cartesians when discussing seeding; however, since the whole molecule
configuration is also invariant with respect to a 3d rotation, one
must even be vey careful how one seeds there. But, even then, this wastes 
one of the main advantages of using internal dof., namely the huge 
reduction in dimensionality. It also wastes a lot of calculation on
minimizing over ensembles of configurations that are chosen
inefficiently.  I have some arguments to support the above 
claims, but the main (simple!) idea here is that one should 
explore minimizations from topologically inequivalent initial 
configurations, paying special attention to sidechain 
minimizations. If we fix the backbone and consider minimizations
over sidechains as a step in a hierarchical minimization scheme,
then we clearly are faced with periodic coordinates. Yet, some
aspect of the CGU concept must survive (I think I have some
arguments to support this too!). The original idea should be
applicable, but only sufficiently close to the (postulated)
energy minimum one should expect that things become convex. 
For deformations of small pieces of the chain, we should suffer
a small energy penalty if some obstruction forms to prevent
achieving the native state etc., so that it is reasonable
to expect that "nearby" minima are above the "global" minimum,
progressively increasing for greater distortions. So, not too
far from the global minimum we should  be observing a CGU type
lower bound for the distribution of minima. Seeding in 
internal variable space may make it easier to sample different 
topologies, while seeding in point space is too wasteful, and 
I believe one can implement diagnostics on the CGU scheme to 
prove that. Some integer-programing technique that knows about 
topologically (sterically?) distinct configurations ought to be 
incorporated in the seeding algorithm, in the least.

 Thoughts along these lines made me decide last winter to 
develop a very general asynchronous parallel scheme to fire-off
energy calculations (what I sent you, more or less). 
The plan was to use it with any genetic algorithm we choose: 
for the energy calculations we take a point
in some high-dimensional space (initial configuration) and return
another point (local minimum). In general, we have little knowledge
of what lies nearby the minimization path. I thought that
we could interfere creatively at this stage, with many possible
strategies. Hence the model I proposed, with our scheme invoking
the CGU routine (and not the other way around!). The thought
was to use that to fire off lots of parallel computations, 
sharing among themselve invariant pieces of the same initial
configuration. This way, we could explore the labyrinth in
the neighborhood of any local minimum (trap) to
discover any possible exits to nearby lower minima.

 However, if one is experimenting with a specific algorithm
(eg. CGU) and wants an energy function, then their algorithm can
handle the process management just as well. That is to say, 
I believe we should keep developing our own overall parallel scheme.
I will be able to do some programming also now!

 My main thoughts (in the little time I could give to creative 
thinking these past months) have been in two directions:

(1) Is there a natural hierarchical structuring of the computation
 of the energy  function and its derivatives (Jacobian, Hessian)? 
This thought comes from an amateurish idea of mine to model 
the minimization calculation after the multigrid approach of 
solving large, complicated elliptic equations. These also depend 
on a maximum (or minimum) principle, and are based on minimizing 
hierarchically (some iteration is also involved) over coarser and 
coarser grids. 
 This is where I see the main advantage of having developed
a structured approach to the geometry and energy computation. 
Certain types of movements allow the reuse of large parts 
of the Hessian from previous steps eg., with no new computation. 
To avoid unnecessary duplication and/or communication, all 
minimizations that can reuse these pieces should be performed 
on the same processor. I think that one can very efficiently
parallelize energy computations, since one could compute several 
nearby configurations at once this way, getting a huge reduction
in the overall cost. This could be done for instance for sidechain 
minimizations, given a backbone (just an example, many similar
piecewise minimizations can be employed, usually as parts of an
iteration scheme). This also applies to Jacobian computations.
To implement this idea we would need to interfere with some of
the workings of the nonlinear-programming package we use with
the minimization or index each and every d.o.f. so we can fix
them in any order and combination we want. Also, analytical
Jacobians and Hessians should replace the numerical ones-normally
too expensive, not so if we consider reduced-dof deformations for
the minimization).

 (2) The other idea, which I hope can be relevant regardless 
of the genetic algorithm adopted, is to develop a sensitivity 
score for various parameters/structures. I believe that certain
types of native structures are very sensitive to the values of
some energy parameters, insensitive to others. This might vary
over classes of structures. One ought to account for the different
sensitivities, else the process becomes too noisy. Say, only use
structures sufficiently sensitive to some given parameter, to 
optimize the value of that parameter, and form compatible
structure classes for each parameter based on some sensitivity
score of the structure for a particular parameter. 
Perhaps do same for pairs of parameters,... Unfortunately
the parameters enter nonlinearly, so the sensitivity needs to be
defined as a function depending on all the parameters! Hopeless,
unless we are close to the "correct" values for at least some of
the parameters, preferably all but a few!. The idea again is not new,
it is I think used in mathematical economics, where you have huge
functions depending on large numbers of parameters, and these
parameters are things like "price elasticity" and "demand rigidity"
and whatever other unknown coefficients modelers include in their
analogues of an energy (profit, penalty etc) function. 

 Finally, was the structure of the protein you included at its native
or just some minimum? I will continue studying your code, and I will
be in touch, do the same!
 Best,
Vageli





On Thu, 14 Feb 2002, Chaok Seok wrote:

> Hi Vageli,
> 
> Thank you for letting me know what has been going on. I think I know why
> the minimization program crashed when you tried different proteins. The
> 'reconstruct' variable should be set to 1 before the 'initialize_protein'
> subroutine is called in general cases.  reconstruct=0 is fine for the
> particular pdb file I gave you. If reconstruct=1, 'initialize_protein'
> calls two subroutines 'cartesian2internal' and 'internal2cartesian'. What
> this effectively does is to reconstruct the protein conformation from the
> internal coordinates (angles and bonds) calculated from the cartesian
> coordinates in the given pdb file, and also by using default angles (given
> in the topology_geo.in file) for atoms (such as hydrogen atoms in
> x-ray structures) not present in the pdb file. I add missing atoms
> this way because we need to know the coordinates of all the atoms in order
> to compute the energy. (The energy function I use is EEF1, and its vacuum
> energy is CHARMM19 polar hydrogen force field. EEF1 has additional
> solvation free energy. In fact, in CHARMM19, only polar hydrogens,
> hydrogens attached to electronegative atoms such as oxygen or nitrogen,
> are explicit, and hydrogens attached to carbon are only implicitly taken
> into account by the effective van der Waals radii and other parameters.)
> Anyway, the pdb file I gave you has all the polar hydrogen atom
> coordinates, so there is no need for reconstruction, and I just
> set reconstruct=0. If your input structure does not have all the atoms
> needed to compute the energy, then reconstruct should be 1.
> 
> As you have noticed, 1cmr01_min.pdb has a structure that is already energy
> minimized. The original pdb file (1cmr.pdb) has about 10 structures
> (determined by NMR). I picked the first structure in the original pdb, and
> minimized energy starting from it. I picked 1cmr because it is relatively
> small and has 3 disulfide bonds, as you noticed.
> 
> Frankly, I am not sure what is the issue in performing local deformations
> efficiently. I am sorry that I don't remember David Baker's talk at all.
> Certainly I need to read more, though. I guess efficient local deformation
> may be better than some random deformation in Metropolis Monte Carlo
> simulations, but I feel that since protein is so compact, even local
> deformation may not be very useful in Metropolis Monte Carlo.  Kevin
> Cahill tested his local deformation method on a very very simple energy
> model using Metropolis Monte Carlo, and showed there is a big improvement,
> but I doubt whether it will still work well for more realistic energy
> model. There is also Monte Carlo Minimization that combines Monte Carlo
> and local minimization. I don't know whether new local deformation methods
> can improve efficiency for MCM. According to my experiments, Kevin
> Cahill's local deformation is not more efficient than random deformation
> in use with MCM. Also, using using SVD with MCM was not efficient
> according to work of John and a rotation student. I feel that because MCM
> involves local minimization, local deformation may not need to be very
> subtle since it will be changed by minimization after all, and coordinated
> local moves may restrict the conformational search space and worsen the
> performance.
> 
> Here are a few references for MCM and local deformation if you are
> interested:
> 
> @Article{li87:_monte_carlo,
>   author =       {Z. Li and H. A. Scheraga},
>   title =        {{M}onte {C}arlo-minimization approach to the
>                   multiple-minima problem in protein folding},
>   journal =      PNAS,
>   year =         1987,
>   volume =       84,
>   pages =        {6611-15}
> }
> 
> @Article{baysal97:_effic,
>   author =       {Baysal, C. and Meirovitch, H.},
>   title =        {Efficiency of the local torsional deformations method
> for identifying the
>                   stable structures of cyclic molecules},
>   journal =      {J. Phys. Chem. A},
>   year =         1997,
>   volume =       101,
>   pages =        2185
> }
> 
> Thank you and good luck,
> 
> Chaok
> 
> 
> 


Subject: Re: Happy New Year!

Hi Chaok, I did not get the pdf file, only a note. But I can ask Kevin for
a copy. I have heard of this work but maybe I should pay it more
attention! I got the idea for doing this from Baker's talk at the CASP
last year. As he was minimizing energy for his 9-mers, he did not try
to make them fit with their neighbors in any way. I felt that the
scheme would be similar to multigrid schemes for solving complicated
systems of PDEs, and the key problem there is fitting the pieces, and
the precise method of fitting things can mean the success or failure
of a scheme. I realized that having the geometry expressed as a
product of quaternions (and a sum of products) provided a very simple
way to manipulate protein-dependent functions: since each q enters
linearly in the expressions, 1st and 2nd derivatives are easy to write 
and can be computed quite efficiently, since they all invlolve the
same pieces. Localizing deformations with our formulation  is 
easy to write, even to do the
"2nd order corrections" that I saw mentioned. The trick is to include the
second derivative info to the formula I sent you and find the zero by some
efficient form of a Newton iteration. Anyway, it will be easier to program
this than to explain it, I hope! I look forward to seeing the present
state of the code, it feels great to be thinking about protein geometry
again!
 Thanks,
Vageli



From vageli@math.unm.edu Fri Feb 15 11:17:03 2002
Date: Thu, 07 Feb 2002 16:19:58 -0700
From: vageli@math.unm.edu
To: chaok@maxwell.ucsf.edu
Cc: vageli@math.unm.edu
Subject: Happy New Year!

Hi Chaok!
    I hope all is well with you and your family.
I think it is Chinese New Year soon and I wanted to 
give you my best wishes.
 I am finally planning a trip to SF, hoping that despites all 
the time that passed I can pick up where I left last year. 
A lot happened since I left (I am sure the same is true
over there!). 
 Perhaps Ken has mentioned that I will be in SF from 
March 8-17. That was the best time for him (and it 
falls on my Spring break, so I will only miss one class)
so I hope this is a good time for you as well. 
 I am managing to get my thought back to work
after the many distractions of last Fall. I was trying to
familiarize myself again with our code, but I have so many
versions I am not sure which one incorporates your final
corrections. Some time ago you had written that you had 
corrected some aspects relating to proline, as well as 
adding terminal atoms etc. Also you were going to change
the energy calculation re. torsion angles etc following
the CHARMM philosophy. 
 If I could have a copy of the latest version you have,
then I could try to understand it all before I come in
March. I have been working on the idea I mentioned to
you last Spring, i.e. introducing localized deformations
into the chain as a way of escaping false minima. I think
I have a good algorithm for doing that (at least in
principle), but I wanted to try and build it around your
latest code. The idea is quite simple: assume
the backbone quaternions are numbered consecutively,
q1,q2,...,qm; let the total quaternions be Qn=q1*q2*...*qn.
I want a move that fixes QN1, QN2, with N2=N1+L; L is the 
length of a (small) segment of the backbone that will be
perturbed, subject to the constraint that QN1, QN2 remain
fixed (to be more precise, RN1 and RN2 must also remain
fixed; this just adds a few more constraints, but the 
analysis is similar, so let's leave it out of the 
discussion!). Define q(N1+k):=p(k) (then p(L)=q(N2))
 Since QN2 = QN1*p(1)*...*p(L) let's assume that
all the intermediate quaternions, p(k) are perturbed
slightly:  p(k) -->p(k) + d(k), d(k) small. Then,
QN2 = QN1*(p(1)+d(1))*...*(p(L)+d(L)) = QN1*p(1)*...*p(L)
and for small d(k) expand keeping only the linear terms.
This gives a rank=4 constraint on the d(k); it's null
vectors are all the allowable deformations that leave
the two ends of the chain fixed (if we also include the
position vectors, then we get 6 conditions which are
again linear for small deformations). 
 There are of course some complications with this
method (may need to include quadratic terms, in which
case one must iterate to find "good" deformations of
somewhat larger amplitude), but its main advantage is
the simplicity of the formulas, and the fact that 
it is easy to evaluate the resulting expressions 
directly from quantities available in the code.
I have been trying to think of a way to use these
localized deformations as a means of applying a
multi-scale approach to energy minimization (i.e.
relax different parts of the chain in stages, and
vary the order of doing so, to try different
combinations so as to avoid traps). I think such
an algorithm (together with a good topology-based
method of constructing inequivalent configurations
to minimize from--such as the motion planning
ideas) could be a big improvement over the
wholesale minimizations over the entire chain.
 At the first stage, one looks for a consistent
local deformation (i.e. one that leaves QN1, QN2
unchanged) that involves the least energy penalty.
For example, if the molecule has a contact between
two distant residues, consider pairwise deformations 
of the chain near each contact site etc.
 I need to play with this once I have implemented
the deformation scheme, but I think it offers many
possibilities for control. As an idea it is not
new, but the simple formulas for performing consistent 
deformations are new I believe. This way, one can
implement such deformations without worrying about
fixing up the rest of the molecule...
 In any case, I hope you are still interested in this
collaboration, even though I abandoned our project for
so long. I am now ready to return to it, and at last
life is not working against me!...
 Best,
Vageli
PS: I know that John has been developing his algorithm in C(++?)
and has incorporated some of our ideas, but I still want to
use a F95 program, because I find it much simpler to
work on and develop.  In any case, as I told Ken, I am sure
you folks have been moving now for the last few months and
I am afraid I may be left behind, but hopefully I can catch up
quickly when I visit, especially with a little help!

From chaok@zimm.ucsf.edu Fri Feb 15 11:17:28 2002
Date: Thu, 7 Feb 2002 17:04:32 -0800 (PST)
From: Chaok Seok <chaok@zimm.ucsf.edu>
To: vageli@math.unm.edu
Subject: Re: Happy New Year!

Hi Vageli,

It is really nice to hear from you again. Happy new year to you, too. We
don't have much progress, so don't worry about catching up things around
here. I will try to send you a package of my code tomorrow. I think I
need to clean up some junk and irrelevant things from the code.

In fact, your local deformation idea sounds similar to what I heard
recently. I have attached a pdf file to this email. It is what Kevin
Cahill at UNM (the same university as you) sent to Ken. I am trying to see
how useful it is for Monte Carlo Minimization, but my preliminary results
are not very optimistic.

I guess you help us more than we help you usually. I will email you again
soon. Best,

Chaok

    [ Part 2, "Cahill.pdf"  Application/PDF  1.5KB. ]     [ Unable to print this part. ]


From chaok@zimm.ucsf.edu Fri Feb 15 11:17:40 2002
Date: Fri, 8 Feb 2002 12:47:40 -0800 (PST)
From: Chaok Seok <chaok@zimm.ucsf.edu>
To: Evangelos A. Coutsias <vageli@math.unm.edu>
Subject: code

Hi Vageli,

I put some files in my UNM account, so you may copy them. They are in
~chaok/vageli/. The file eef1.tar contains bin and src directories. I
cleaned up my codes a bit, and the copy I give you can do energy
evaluation, local energy minimization both in internal and cartesian
space, internal<->cartesian conversions, and so on. I hope the code is
understandable. Please let me know if you have any questions.
I also put the pdf file for Cahill's work in case you have not gotten that
yet. Bye now,

Chaok


From chaok@zimm.ucsf.edu Fri Feb 15 11:18:09 2002
Date: Sat, 9 Feb 2002 06:53:55 -0800 (PST)
From: Chaok Seok <chaok@zimm.ucsf.edu>
To: Evangelos A. Coutsias <vageli@math.unm.edu>
Subject: Re: code


Sorry, I did not know that. Now I chmod, and you must be able to read.
Thanks,

Chaok

On Sat, 9 Feb 2002, Evangelos A. Coutsias wrote:

> Chaok, thanks for doing it! Now, could you please also make the directory
> readable?
>  Thanks,
> Vageli
>



From chaok@zimm.ucsf.edu Fri Feb 15 11:18:17 2002
Date: Sat, 9 Feb 2002 10:42:30 -0800 (PST)
From: Chaok Seok <chaok@zimm.ucsf.edu>
To: Evangelos A. Coutsias <vageli@math.unm.edu>
Subject: Re: code


Sorry. It was a mistake. I made a change. I hope it works now.

On Sat, 9 Feb 2002, Evangelos A. Coutsias wrote:

> Hi, it is still inaccessible; you need to chmod 755 <dir> to make
> the directory accessible, then chmod 644 <dir>/* to make the files in
> it world-readable.
>  Vageli
> PS: there was a break-in earlier this year and the systems guys
> set the defaults for everyone to be 700 for directories and 600 for files,
> so one must explicitly change permissions to make things readable by
> others!
>
> On Sat, 9 Feb 2002, Chaok Seok wrote:
>
> >
> > Sorry, I did not know that. Now I chmod, and you must be able to read.
> > Thanks,
> >
> > Chaok
> >
> > On Sat, 9 Feb 2002, Evangelos A. Coutsias wrote:
> >
> > > Chaok, thanks for doing it! Now, could you please also make the directory
> > > readable?
> > >  Thanks,
> > > Vageli
> > >
> >
> >
> >
>