Wednesday, May 6, 2020

Two algorithms from the '60s


The story goes that Michael (Mike) Powell was supposed to give a seminar at the University of Leeds on derivative-free optimization one day in the early 60s, probably 1963. Back in November 1959, William (Bill) Davidon had released a technical report on what he called the “variable metric method”. He was a physicist at the Argonne National Laboratory who had been using gradient (and coordinate) descent methods and came to invent quasi-Newton methods. Powell had got his hands on the report in 1962, and so, it seems, had Colin Reeves, a chemist working on molecular structure problems who was also the PhD advisor of Roger Fletcher at Leeds. Reeves gave the report to Fletcher, who implemented the method, as did Powell. After giving his talk, Powell was surprised to find out that Reeves already knew the method and had implemented it. Powell and Fletcher decided to work together on the method, and out came the first paper in 1963 on what was later called the Davidon-Fletcher-Powell (DFP) formula. The irony is that Davidon's own paper was rejected for publication and finally appeared only in 1991, in the first issue of the SIAM Journal on Optimization.
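For the curious, the DFP formula itself is a rank-two update of an approximation H to the inverse Hessian, built from the latest step and the corresponding change in the gradient. A minimal numpy sketch (my notation, not Davidon's; s is the step and y the gradient difference):

```python
import numpy as np

def dfp_update(H, s, y):
    # DFP rank-two update of the inverse Hessian approximation H,
    # where s = x_new - x_old and y = grad_new - grad_old.
    Hy = H @ y
    return H + np.outer(s, s) / np.dot(y, s) - np.outer(Hy, Hy) / np.dot(y, Hy)
```

The update keeps H symmetric and positive definite as long as the curvature condition s·y > 0 holds, which is what a decent line search is there to guarantee.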

The DFP work later led to the renowned BFGS method and the whole class of quasi-Newton algorithms. The funny thing about BFGS is that the acronym does not come from the co-authors of a single paper: the method was published independently in four separate papers, by Broyden, Fletcher, Goldfarb and Shanno. The matrix-free, limited-memory version of BFGS (L-BFGS), due to Jorge Nocedal, later became a very powerful and efficient method. Overall, DFP, BFGS and similar methods brought significant improvements to the state of numerical optimization at the time, which consisted mostly of steepest descent and Newton’s method.
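For comparison, the BFGS update to the inverse Hessian approximation uses the same ingredients as DFP but arranges the rank-two correction differently; here is a sketch in the same spirit as the one above (again my notation, not taken from any of the four papers):

```python
import numpy as np

def bfgs_update(H, s, y):
    # BFGS update of the inverse Hessian approximation H,
    # where s = x_new - x_old and y = grad_new - grad_old.
    rho = 1.0 / np.dot(y, s)
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ H @ V.T + rho * np.outer(s, s)
```

L-BFGS avoids storing H altogether: it keeps only a handful of recent (s, y) pairs and reconstructs the product of H with the gradient on the fly, which is what makes it attractive for large problems.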

One has to keep in mind that numerical analysis at that time was still in its infancy and that software libraries were only beginning to be written. Every university had its own clunky computer that filled entire rooms or even floors, and not even Fortran was yet a standard. For instance, Davidon used block diagrams in his report and Fletcher coded in Algol. Some people used assembly, others early forms of compilers. These people were pioneers of numerical optimization, and many of them were not mathematicians per se. Colin Reeves started as a quantum chemist at Cambridge working on molecular orbitals and later became a computer scientist. Fletcher is described by chemists as a computer scientist who left molecular simulations for a career in optimization; but after all, that effectively made him an applied mathematician until the end. Powell was for most of his career a mathematician at the Atomic Energy Research Establishment at Harwell, close to Oxford; he later moved to Cambridge as a professor. Davidon was a physicist (although Fletcher, too, majored in physics at Cambridge) who later became known as a peace activist, anti-Vietnam-war protester and the mastermind behind the 1971 burglary of an FBI office. Powell, on the other hand, was interested only in the mathematics and not the physics: he admitted that he could not follow Davidon’s arguments for the variable metric, which were inspired by general relativity and differential geometry.

Soon after the DFP work, Fletcher went on to develop the nonlinear conjugate gradient (NCG) method for unconstrained optimization. He claims that the idea was given to him by Reeves, who realized that the line search procedure could be reused inside the then already classic CG solver for linear systems. This resulted in the Fletcher-Reeves method. In fact, Fletcher is of the opinion that he was simply handed two good ideas by other people. In this way, he helped develop two methods that have become cornerstones of numerical optimization, although in a later interview he shared his doubts about NCG being a reliable method for large-scale problems. At the time, though, NCG and DFP/BFGS were pretty much the only efficient methods available in numerical libraries, computational quantum chemistry suites and probably other places too. And only then followed augmented Lagrangian, SQP, interior point methods and all that.
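As for the Fletcher-Reeves method itself, the iteration is short: each new search direction is the negative gradient plus a multiple of the previous direction, with the multiplier given by a ratio of squared gradient norms. A rough sketch, with a crude Armijo backtracking line search standing in for the real thing (placeholder code of mine, not Fletcher and Reeves' actual procedure):

```python
import numpy as np

def fletcher_reeves(f, grad, x, n_iter=100, tol=1e-8):
    # Nonlinear CG with the Fletcher-Reeves beta. The backtracking
    # Armijo line search below is a simplistic stand-in; Fletcher and
    # Reeves used a more careful procedure.
    def line_search(x, d, g, t=1.0, c=1e-4):
        while f(x + t * d) > f(x) + c * t * np.dot(g, d) and t > 1e-12:
            t *= 0.5
        return t

    g = grad(x)
    d = -g
    for _ in range(n_iter):
        if np.linalg.norm(g) < tol:
            break
        x = x + line_search(x, d, g) * d
        g_new = grad(x)
        beta = np.dot(g_new, g_new) / np.dot(g, g)  # the Fletcher-Reeves ratio
        d = -g_new + beta * d
        g = g_new
    return x
```

With exact line searches on a convex quadratic this reduces to the classic linear CG iteration, which is essentially the connection Reeves spotted; for instance, fletcher_reeves(lambda x: x @ x, lambda x: 2 * x, np.ones(5)) walks straight to the origin.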

Fletcher was a man who relished practical problems and considered them the drivers of the field of numerical optimization. He considered himself a physicist rather than a mathematician until the end. In an interview he gave in 2015, a year before he passed away, he said the following: “I think for us in numerical analysis – in optimization in particular – because that’s the purpose of it: it’s to optimize; it’s not to produce pure mathematics... in my view, differing from other people’s view. But it’s to solve problems. And we just keep solving the same old problems – getting another 5 % here and another 5 % there: it’s a bit sterile.” I think these are very nice and true words coming from one of the giants of optimization.