Friday, March 19, 2021

The many faces of projection

What I want to talk about, in fact, is the Jacobian matrix of the constraint function. All these things are probably clear to people with strong backgrounds in linear algebra and differential geometry, but for me many of them were revelations of sorts.

All right, so maybe the first thing to start with is the fact that the Jacobian matrix $\mathbf{J}$ of a function $g$ is the analog of the gradient operator or the normal to a surface. And this is why for many people $\mathbf{J}$ is synonymous with the constraint direction(s). The normal to the constraint manifold is transversal to the space of motion, hence it lies along the constrained directions that can't move. Think of a pendulum: the constraint manifold is a circle and the normal to it is along the radius, which is exactly the pendulum's (constrained) direction.

But some math books also call $\mathbf{J}$ the differential (in coordinates). And this takes me to a confusion I always made in the beginning: for a single constraint, should $\mathbf{J}$ be a row or a column vector? It seemed other people weren't sure about it either, judging by their notation. But I stuck with it being a row vector, as that was the majority view. And now it makes sense, because it is a covector (it has the components of a differential, which is a 1-form; same thing). Hence, $\mathbf{J} \equiv dg$ and $\mathbf{J}^T \equiv \nabla g$.

I still don't fully grasp these concepts and that's why I don't want to talk formally about them. But I do have to stress the fact that the configuration space of our mechanical system (where all the generalized coordinates reside) is not really $\mathbb{R}^n$, but a space with a metric (a curved space) given by the mass matrix $\mathbf{M}$ (i.e., the kinetic energy). It basically tells you that not all bodies are equal: they are weighted by their mass information. If this weren't the case, then vectors and covectors in $\mathbb{R}^n$ would be the same thing. But they're not! If for velocity (tangent) vectors $\mathbf{v}$ the norm is given by the well known kinetic energy ($\tfrac{1}{2}\mathbf{v}^T \mathbf{Mv}$), then for covectors the matrix is the inverse of $\mathbf{M}$. Think about momenta $\mathbf{p}$: the kinetic energy is the same, but now we have to divide by the mass rather than multiply ($\tfrac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}\mathbf{p}$). And you have to multiply velocities by mass to get momenta, $\mathbf{p} = \mathbf{Mv}$, and the other way around. Some other covectors are forces (as they act directly on the momenta) and surface normals; they inhabit a place called the cotangent space.
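As a sanity check of this vector/covector asymmetry, here is a tiny NumPy sketch (with made-up masses) showing that the two kinetic-energy formulas agree once velocities are converted to momenta via $\mathbf{p} = \mathbf{Mv}$:

```python
import numpy as np

# Hypothetical 2-DOF system: masses 2 kg and 3 kg.
M = np.diag([2.0, 3.0])          # mass matrix = metric on velocity (tangent) vectors
v = np.array([1.0, -0.5])        # a velocity vector

p = M @ v                        # the corresponding covector: momenta, p = M v

T_vel = 0.5 * v @ M @ v                  # kinetic energy from velocities (metric M)
T_mom = 0.5 * p @ np.linalg.inv(M) @ p   # same energy from momenta (metric M^{-1})

assert np.isclose(T_vel, T_mom)
```

Same number, two metrics: multiply by $\mathbf{M}$ on the tangent side, divide by it on the cotangent side.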

Now let's fall back from geometry to linear algebra. Locally, $\mathbf{J}$ is a linearization of $g$ and maps the tangent space of the configuration space to the tangent space of the "constraint" space $\mathbb{R}^m$. So it's a linear transform, thus a rectangular matrix, and it follows all the rules of linear algebra. Ideally, it's full rank and fat (rather than thin), i.e., $m < n$, in order for the system to be solvable. The image of $\mathbf{J}$ is the whole of $\mathbb{R}^m$, which can be viewed as just a collection of numbers, just as for one constraint you get only one number. The meaning of these numbers comes from the analogy with the normal vector, in fact any vector for that matter. They are the magnitudes or components of the projection of a vector $\mathbf{v}$ on the "normal" $\mathbf{J}$, so $\mathbf{Jv}$ is just that. For one constraint it is a number; in general it is an $m$-dimensional vector because the "normal" $\mathbf{J}$ spans an $m$-dimensional space. You can think of taking each row of $\mathbf{J}$ and dotting it with $\mathbf{v}$ to get the projection length. And in fact, from linear algebra we know that the row space of $\mathbf{J}$ is formed by vectors $\mathbf{J}^T\mathbf{y}$, so the rows of $\mathbf{J}$ span the "normal" or "constraint" space of dimension $m$. This space is orthogonal to the tangent space of the constraint manifold, which is precisely the null space of $\mathbf{J}$, i.e., vectors satisfying $\mathbf{Jv}=0$.

What does this tell us about $\mathbf{J}$? That it takes any vector $\mathbf{v}$ and converts it into its projection components. What about the reverse direction? It will only reconstruct the "normal" part of those vectors. So you lose some information, but this is fine because you obtain the projected vector $\mathbf{J}^T\mathbf{Jv}$ (almost, stay tuned). If you think about it, this is what you do when you project on a unit vector $\mathbf{n}$: $\mathbf{nn}^T\mathbf{v}$. So you may say that $\mathbf{J}^T\mathbf{J}$ is a projection operator. Note that if $\mathbf{v}$ is already in the row space of $\mathbf{J}$ no information is lost and projection leaves it unchanged. Some people call $\mathbf{Jv}$ a "projection" and $\mathbf{J}^T\mathbf{v}$ an "un-projection", but this is not really true, as they do not satisfy the definition of a projection operator; but they do part of the job, so we may call these matrices something like "projectors" (when in fact they define the space we are projecting upon).

But this is not the whole truth. In 3D you also have to divide by the norm squared $\mathbf{n}^T\mathbf{n}$ if you're dealing with a non-unit vector. And you have to do the same in $n$-dimensional space. Linear algebra and least-squares problems give you the formula $\mathbf{A}^{-1}\mathbf{Jv}$ for computing the actual projection lengths, where $\mathbf{A}=\mathbf{JJ}^T$. Here $\mathbf{A}$ plays the role of the norm squared of $\mathbf{J}$ that you divide by. Finally, the projection that takes these lengths back to configuration space is $\mathbf{P}=\mathbf{J}^T\mathbf{A}^{-1}\mathbf{J}$, and this is a genuine projection operator. Of course, if you're lucky and $\mathbf{A}$ is the identity then the previous paragraph still works, but that's rarely the case.
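Here is a small NumPy sketch of this construction, with a random full-rank $\mathbf{J}$ of my own choosing, verifying that the operator is idempotent (a genuine projection) and that its complement lands in the null space of $\mathbf{J}$:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((2, 5))       # 2 constraints, 5 DOFs: full rank and "fat"
v = rng.standard_normal(5)            # an arbitrary vector to project

A = J @ J.T                           # the "norm squared" of J
lengths = np.linalg.solve(A, J @ v)   # projection lengths, A^{-1} J v
P = J.T @ np.linalg.solve(A, J)       # P = J^T A^{-1} J, projection onto the row space

assert np.allclose(P @ P, P)              # idempotent: P is a genuine projection
assert np.allclose(J @ (v - P @ v), 0)    # the complement lies in the null space of J
```

Note that in practice one solves with $\mathbf{A}$ rather than forming $\mathbf{P}$ explicitly, exactly as the next paragraph says.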

The projection $\mathbf{P}$, or rather its complement $\mathbf{I}-\mathbf{P}$, is usually called the "canonical" projection onto the constraint manifold. And you may recognize the formula from the analytical solution of linearized velocity-level constrained dynamics. In practice you usually do not invert $\mathbf{A}$, but rather solve a linear system where the unknowns are the projection lengths. But $\mathbf{A}$ is not really the same, is it?

So, what is the meaning behind $\mathbf{A}$ anyway? Some people call it the reduced or effective mass, others call it the Delassus operator, others compliance and so on. I think I hinted already at it: $\mathbf{A}$ determines the metric of the "constraint" space. In fact, it's the inverse $\mathbf{A}^{-1}$ that is the true effective mass, the one seen by the degrees of freedom along the constraint directions, and helps define the kinetic energy there. (I picture it as the transformation of the mass matrix $\mathbf{M}$ into "constraint" space). And $\mathbf{A}$ is actually the metric of the cotangent "constraint" space. This makes sense as $\mathbf{J}$ spans the cotangent space of the constraint manifold (in configuration space) and there the metric is $\mathbf{M}^{-1}$, so our "norm squared" becomes $\mathbf{A}=\mathbf{JM}^{-1}\mathbf{J}^T$. This is exactly the matrix that appears in constrained dynamics.
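To connect this to constrained dynamics, here is a minimal sketch (random masses and Jacobian, chosen by me for illustration) that builds $\mathbf{A}=\mathbf{JM}^{-1}\mathbf{J}^T$, solves for the multipliers and checks that the corrected velocity satisfies the velocity-level constraints:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
M = np.diag(rng.uniform(1.0, 5.0, n))   # mass matrix (the configuration-space metric)
J = rng.standard_normal((m, n))         # constraint Jacobian (full rank)
v = rng.standard_normal(n)              # unconstrained velocity

Minv = np.linalg.inv(M)
A = J @ Minv @ J.T                      # Delassus operator; A^{-1} is the effective mass
lam = np.linalg.solve(A, -J @ v)        # Lagrange multipliers (constraint impulses)
v_proj = v + Minv @ J.T @ lam           # M-orthogonal projection onto the null space of J

assert np.allclose(J @ v_proj, 0)       # constraints satisfied at the velocity level
```

The same $\mathbf{A}=\mathbf{JM}^{-1}\mathbf{J}^T$ is the matrix from the post: the metric of the cotangent "constraint" space.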

I want to end by noting the fun fact that by deriving a projection operator (pretty similar to what you would do in 3D), we actually solved for the Lagrange multipliers in constrained dynamics. In fact, projection is an optimization problem: find the shortest distance from a point to a surface. And the closest point happens to be on the perpendicular to the surface, one like our "normal" $\mathbf{J}$. Pretty much what we talked about. So, deriving the projector $\mathbf{P}$ amounts to solving the projection (or least squares) problem. Or it does at least for velocity vectors. The general case for projecting a point in configuration space on the constraint manifold is a bit trickier as it's a nonlinear problem, but it can be broken down into such linear steps. But that's another story...

To summarize, in order to solve the (linearized) constrained dynamics problem one simply needs to project velocities into the tangent space, which is the null space of $\mathbf{J}$. This is done by projecting them onto the space spanned by the columns of $\mathbf{J}^T$ and then subtracting that projection.

Monday, January 4, 2021

Mixed FEM and constrained dynamics

I recently got a paper accepted, which in itself was an achievement and the culmination of a long period of work, hardship and even failures. But the idea I want to bring forward in this post is that the paper only scratches the surface of the implications it can have and could be developed much further. Such is the way of academic publishing: the results you or I deem important are not always the ones the academic community considers worth publishing. And I won't go so far as to say that my results are revolutionary. I'm merely saying that they are interesting and that they may settle a decades-long debate in engineering, graphics and simulation circles.

But let me tell you more before I get to that. The mixed formulation of the finite element method (FEM) is a technique that has been around since at least the 1970s. Its main purpose was to solve incompressible problems in fluid flow and elasticity. This was motivated by a problem called locking that still affects our simulations. As a simple example, locking happens when you simulate a cantilever bar bending under some load: although you know it should deflect by quite a bit, the numerical results tell you the deflection is zero or very close to zero. But let's not worry about locking; even if avoiding it may seem like the main result, I argue that it isn't.

The main achievement of mixed FEM is its beauty and unifying power. So what is it? I will make an analogy with springs. The opposite of mixed FEM is irreducible FEM, which I call standard FEM, because it's used all over the place. Spring forces (like stresses in a finite element) are given by a constitutive law such as Hooke's law: the elongation times a spring constant. These forces are then plugged into some equilibrium conditions or Newton's second law to obtain a simulation of the system. The main takeaway here is that you always need a constitutive law of some sort, besides the laws of statics and dynamics, to completely determine the behavior of a mechanical system.

What if we don't substitute the constitutive law into the equations of mechanics as we usually do? We get the mixed formulation of springs, of FEM and of everything else. We just get (roughly) double the number of equations to solve, but the problem is essentially the same. In fact, it can be argued that this formulation is the more fundamental one, and that the irreducible form is the one that has been reduced from it (and cannot be reduced any further). But why wouldn't you substitute and reduce the equations? Apparently, no reason at all.
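To make the spring analogy concrete, here is a toy example of my own (a single mass hanging from a spring; all numbers invented) solved both in irreducible form and in mixed form, where the spring force is kept as a separate unknown alongside the position:

```python
import numpy as np

# Hypothetical 1D setup: mass m hanging from a spring (stiffness k, rest length l0).
k, l0, m, g = 100.0, 1.0, 2.0, 9.8

# Irreducible (standard) form: substitute Hooke's law into equilibrium,
# k (x - l0) = m g, and solve for the single unknown x.
x_irr = l0 + m * g / k

# Mixed form: keep the spring force f as its own unknown and solve both
# equations together: the constitutive law and the equilibrium condition.
A = np.array([[k, -1.0],     # k x - f = k l0   (constitutive law: f = k (x - l0))
              [0.0, 1.0]])   # f = m g          (equilibrium)
b = np.array([k * l0, m * g])
x_mix, f_mix = np.linalg.solve(A, b)

assert np.isclose(x_irr, x_mix)   # same physics, twice the unknowns
```

Same answer either way; the mixed system is just the un-substituted version.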

This takes us to a long-lasting debate about choosing reduced coordinates over maximal coordinates. That is, if a system is constrained in any way (like a pendulum attached to a point) then it makes complete sense to consider that some degrees of freedom were removed by the constraints and use only the remaining ones. Or not... The view of constrained dynamics is that you should still use all the initial degrees of freedom of the pendulum and add extra equations and variables for each constraint. Understandably, a lot of people don't like this, especially in their computer implementations. But it just so happens that this is the best way to simulate constrained dynamics when you don't know from the start what the constraints will be (i.e., when modeling an arbitrary system).

On the other hand, constraints are only ideal and do not really exist in nature; in the end, they are just really strong forces. So there is a middle way: use all the degrees of freedom and model constraints using very, very strong forces (like the ones in springs). This keeps the number of variables and equations in the middle. It is also the standard or irreducible form.

But something quite bad happens when the spring constants grow very large. In theory they should tend to infinity so that the constraints are exactly fulfilled, which is not possible in practice. The numerics become ill-conditioned and, on top of that, extra problems appear, like locking. And these won't go away by just throwing more number-crunching power at them. You need to roll back to the reducible, or mixed, form of the equations: the one that has both constraint equations and equations of motion. You need to make the problem larger in order to make it solvable. And here comes the big revelation: the constitutive laws in the mixed formulation are just soft constraints in constrained dynamics.

And of course people knew about it; at least many do, or have a strong intuition about it. But the rest act as if they don't. There are actually two camps in solving contacts and other topics, which divide the world into penalty vs. constraint formulations, when they are in fact two sides of the same coin. In their defense, the constraint camp does assume that their constraints are infinitely hard, which is not possible when using finite spring constants or material parameters (and despite being ill-conditioned, their problems are still solvable, even if by a bit of cheating). But the mixed formulation is just the place where these two different views of the world meet. In technical terms it is called a saddle point problem, and it keeps popping up whenever you do simulation (see Strang's textbooks for example). And optimization people have known it for a long time: they call it the KKT optimality conditions. For them the degrees of freedom of the system are the primal variables and the tensions inside the springs are dual variables or Lagrange multipliers.
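Here is a hedged sketch of such a saddle point (KKT) system, for a made-up linearized static problem with one soft constraint of compliance $\epsilon = 1/k$. The point it illustrates: the mixed system stays solvable even at zero compliance (a hard constraint), a limit the pure penalty form can only approach:

```python
import numpy as np

# Invented problem data: a 2-DOF "mass" matrix, one linear constraint J x = c,
# and an external force. Purely illustrative numbers.
M = np.diag([1.0, 2.0])
J = np.array([[1.0, 1.0]])
c = np.array([1.0])
f_ext = np.array([0.0, -9.8])

def solve_saddle(eps):
    # Mixed / KKT saddle point system, with compliance eps = 1/k:
    #   [ M   J^T  ] [ x   ]   [ f_ext ]
    #   [ J  -eps*I] [ lam ] = [ c     ]
    # Eliminating lam recovers the penalty form  M x + k J^T (J x - c) = f_ext.
    K = np.block([[M, J.T], [J, -eps * np.eye(1)]])
    rhs = np.concatenate([f_ext, c])
    sol = np.linalg.solve(K, rhs)
    return sol[:2], sol[2:]

x_soft, _ = solve_saddle(1e-2)   # soft constraint (finite stiffness k = 100)
x_hard, _ = solve_saddle(0.0)    # hard constraint: eps = 0 is still solvable

assert np.allclose(J @ x_hard, c)             # hard constraint satisfied exactly
assert np.linalg.norm(J @ x_soft - c) > 0.0   # soft constraint leaves a residual
```

The dual variable `lam` is the constraint force (the spring tension), which is exactly the Lagrange multiplier of the optimization view.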

My personal breakthrough was to show that any simulation problem, no matter how complex (including PDEs), can be cast in such a mixed form involving both the equations of motion and the soft constraints (aka constitutive laws). If any of the constraints needs to be hard, we just put a zero value next to it and everything still works. So we can build a single unified solver for all mechanical phenomena, including, say, FEM and contacts, within the same framework. Of course, that doesn't mean it will be optimal; but sometimes elegance and generality beat a few optimizations here and there.

And, OK, you would argue, this was already known. Yes, yet nobody used that knowledge. For example, you can build a whole multibody simulator by just treating every constitutive law as a soft constraint and then using a single constraint solver like XPBD (or a more accurate one, I don't care). Or you can use very stiff springs (the penalty approach), as long as you find a nicely stable way to solve the problem. And in the end it may not even matter, as optimization theory already uses techniques that rely on both facets of the problem, like the augmented Lagrangian method. You add a bit of penalty, a bit of constraints, and you get to solve your problem efficiently.

So what's left to do? I would say a lot. We need to build these unified multibody simulators using the mixed framework and see how they perform. What is the best approach for solving them? Sometimes, you have to pay a bigger price or be a bit slower than the state of the art to reach this degree of generality. Also, pay attention to traps like locking, which become visible in the mixed formulation but require deep changes at the discretization level in order to be fixed. Overall, I would say it gives us a much higher vantage point on things. What else? Maybe people using implicit integration of stiff springs will stop claiming that they are doing a very different thing from people using constraints. Or claiming you cannot simulate some fancy nonlinear dynamics using constraints (which you can, by using the three-field mixed formulation). Or constraint people realizing that regularization is good from time to time, that relaxation solvers like Gauss-Seidel are in fact already doing it, and that not every problem is a nail. And that at some point you can write a solver that takes the best of both worlds. For example, one that tackles high stiffness through constraint solvers and large mass ratios through primal solvers (at the same time, in a single pass, inside one monolithic hybrid solver).

Wednesday, December 23, 2020

The force accumulation pattern

Game physics has resisted parallelization over the years and especially so on the GPU. One reason is that it is not embarrassingly parallel like vertex and fragment shaders. Another reason is that the code becomes hairy as soon as you start optimizing it for SIMD or multi-threading. My goal is to keep the code both easy to read and as fast as possible. Also, I would like it to run both on CPU cores and GPU (not necessarily at the same time). This is usually done through abstractions that do all the optimizations behind the curtains. Starting with this blog post, I will try to identify some of these abstractions that appear often in physics simulation and see how they can be exploited.

I will start with some very simple examples applied to masses connected through springs: explicit integration and gradient descent. The former is used for dynamic simulation, while the latter is the most rudimentary form of static analysis, although it can be used for implicit integration too. In fact, they are almost the same thing mathematically speaking. There are also differences, so I will focus on the computational similarities only.

Gradient descent:
$ p_{i+1} = p_i + \alpha f $

Verlet integration:
$ p_{i+1} = 2 p_i - p_{i-1} + \frac{h^2}{m} f $

By $f$ I denoted the total force acting on the system, which is the negative gradient of the total energy; hence the plus sign in both formulas. Leaving external forces (like gravity) aside, the total force then consists of internal forces only (that is, forces due to springs). Therefore, we first calculate the spring forces, then we distribute them to the masses and finally update the positions using one of the formulas above. Here’s the picture:
This pattern will come up over and over again, so let’s analyze it. The first and last steps are embarrassingly parallel - essentially map operations. What differentiates them though is that the first one operates on the springs (edges), whereas the last one operates on the particles (nodes/vertices). For this reason, the first step also includes a gather operation, i.e. it reads the positions of the two end-points:
p1 = p[edge(j, 1)]
p2 = p[edge(j, 2)]
This is just an indirect load operation, which can be done in parallel. (It could also be accelerated by using some fancy SIMD gather instructions.) After that we just compute the length of the spring and subtract the rest length - let’s call this value ɸ. I call ɸ the error function or the constraint function. For springs, it is simply the elongation. If we divide it by the rest length then it’s called the strain. We multiply ɸ by the spring constant k (or whatever nonlinear spring law we have) in order to obtain the spring force magnitudes λ (or tensions). Here’s the pseudocode:
d = p2 - p1
l = len(d)
n = d / l
ɸ = l - l0
λ = -k * ɸ
Now comes the hardest part - step 2. (If you don’t want to do it in parallel, then it’s not hard at all.) You simply compute the force along the spring t = λ * n and distribute it to the nodes according to the following rule: if the spring leaves the node you add the force, otherwise you subtract it. You do this using a force accumulator: you reset it to zero for each node and then add/subtract to it as described. You can do this per-edge as a continuation of the code above:
f[edge(j, 1)] -= t
f[edge(j, 2)] += t
This is analogous to Kirchhoff’s law for currents entering and leaving a circuit node (more on this another time). But the code looks a lot like the gather operation in reverse - it’s really a scatter operation (or a mixture thereof). More than that, it’s a scatter-add operation. We could also call it a reduction, but that term applies more to accumulating into one or two variables, like the norm of a vector or the total energy of the system. And, done naively, this won’t work in parallel.
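As a concrete (if toy) version of the whole pattern, here is a NumPy sketch for a small chain of springs; `np.add.at` plays the role of the scatter-add primitive, and all the numbers are made up:

```python
import numpy as np

# A chain of 3 particles connected by 2 springs (invented positions/rest lengths).
p = np.array([[0.0, 0.0], [1.2, 0.0], [2.1, 0.0]])   # positions
edges = np.array([[0, 1], [1, 2]])                    # springs as node-index pairs
l0 = np.array([1.0, 1.0])                             # rest lengths
k = 10.0                                              # spring constant

# Step 1 (per-edge map + gather): compute spring tensions.
p1, p2 = p[edges[:, 0]], p[edges[:, 1]]
d = p2 - p1
l = np.linalg.norm(d, axis=1)
n = d / l[:, None]              # unit spring directions
phi = l - l0                    # constraint/error function (elongation)
lam = -k * phi                  # force magnitudes (tensions)
t = lam[:, None] * n            # force vectors along the springs

# Step 2 (scatter-add): accumulate forces on the nodes; np.add.at handles
# repeated indices correctly, unlike f[idx] += t.
f = np.zeros_like(p)
np.add.at(f, edges[:, 0], -t)
np.add.at(f, edges[:, 1], t)

# Internal forces must cancel: the net force on the whole chain is zero.
assert np.allclose(f.sum(axis=0), 0)
```

Step 3 would then update `p` per-node with one of the two formulas above (gradient descent or Verlet).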

There are several ways of attacking the scatter-add. Essentially, they all work on the underlying graph of springs and masses. What we need is the node-edge incidence table: for each node, store up front every incident edge and its orientation (plus or minus). Then go through each node in parallel and simply add/subtract (depending on the sign) all the incident spring forces. That was easy! This is the simplest method I could find, without using locks, critical sections or atomic operations. The drawbacks are that you need to store extra memory for the incidence table (which is a bit redundant) and the spring forces, and update the incidence info if the graph changes (e.g., for contacts). Oh, and it might thrash the cache. Still, I don’t have a better solution for writing it from scratch.

But the good thing is that we have identified the scatter-add primitive and we can abstract it and optimize it later. In fact, some libraries offer it; for instance, I used torch-scatter for PyTorch. And to let you in on a secret, you can even do it with any sparse linear algebra library out there. You just need to build the difference matrix A with -1 and 1 non-zero values on each row (corresponding to each spring) and then multiply its transpose with the spring forces: $f = A^T t$. This matrix is equivalent to the incidence info and you might even end up using less memory. I honestly don’t know if it’s faster or slower than the hand-crafted version; I’m just glad we have options. Additionally, some libraries offer a SIMD version of scatter write, but then you would have to write the scatter-add yourself.
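For illustration, here is a sketch of that sparse-matrix trick using SciPy (assuming SciPy is available; the edge list and per-spring force vectors are made-up values):

```python
import numpy as np
from scipy.sparse import csr_matrix

# The scatter-add as a sparse mat-vec: row j of the difference matrix A has
# -1 at edge(j, 1) and +1 at edge(j, 2), so that f = A^T t matches
# f[edge(j, 1)] -= t; f[edge(j, 2)] += t.
edges = np.array([[0, 1], [1, 2]])                 # 2 springs over 3 nodes
t = np.array([[-2.0, 0.0], [1.0, 0.0]])           # per-spring force vectors (invented)

m, n = len(edges), 3
rows = np.repeat(np.arange(m), 2)                  # each edge fills 2 entries
cols = edges.ravel()
vals = np.tile([-1.0, 1.0], m)
A = csr_matrix((vals, (rows, cols)), shape=(m, n))

f = A.T @ t                                        # accumulated nodal forces

# Internal forces still cancel out overall.
assert np.allclose(np.asarray(f).sum(axis=0), 0)
```

As a bonus, the same matrix gives you the gather step too: the edge difference vectors are just $d = A\,p$.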
Let’s go back to the figure above and change it to make it more general. The point is that you can reuse it for all kinds of different computational codes, and not only simulation - for instance, computing the vertex normals of a mesh. By local quantities I mean anything that can be defined over an edge in the graph or a generalization thereof (a stencil or hyper-edge), like a triangle or tetrahedron. The best examples are finite elements: you can implement a very simple FEM simulator with only these concepts. Finite differences should work too. And you can also use the same flow to implement a Jacobi-style constraint solver. The step of computing and assembling global gradient vectors gets called many times over in advanced solvers; this is why it’s crucial that it is as optimized as possible.

Wednesday, May 6, 2020

Two algorithms from the 60's


The story goes that Michael (Mike) Powell was supposed to give a seminar at the University of Leeds on derivative-free optimization one day in the early 60s, probably 1963. In November 1959, William (Bill) Davidon had just released a technical report on what he called the “variable metric method”. He was a physicist at the Argonne National Laboratory who was using gradient (and coordinate) descent methods and came to invent quasi-Newton methods. Powell had got his hands on the report in 1962, and so, it seems, had Colin Reeves, a chemist working on molecular structure problems and also the PhD advisor of Roger Fletcher at Leeds. Reeves gave the report to Fletcher, who implemented the method, as did Powell. After giving his talk, Powell was surprised to find out that Reeves already knew the method and had implemented it. They decided to work together and out came the first paper in 1963 on what was later called the Davidon-Fletcher-Powell (DFP) formula. The irony is that Davidon's report was rejected for publication, and his paper was finally published only in 1991, in the first issue of the SIAM Journal on Optimization.

The DFP work later led to the renowned BFGS method and the whole class of quasi-Newton algorithms. The funny thing about BFGS is that it was not a single paper with the acronym coming from the authors' initials; it was actually four separate papers published independently. The matrix-free or limited-memory version of BFGS (L-BFGS) later became a very powerful and efficient method (due to Jorge Nocedal). Overall, DFP, BFGS and similar methods brought significant improvements to the state of numerical optimization at a time when it consisted mostly of steepest descent and Newton’s method.

One has to keep in mind that numerical analysis at that time was still in its infancy and that software libraries were only just being written. Every university had its own clunky computer that filled entire rooms or even floors, and not even Fortran was yet standard. For instance, Davidon used block diagrams in his report and Fletcher coded in Algol; some people used assembly, others early forms of compilers. These people were pioneers of numerical optimization, and many of them were not mathematicians per se. Colin Reeves started as a quantum chemist at Cambridge working on molecular orbitals and later became a computer scientist. Fletcher is described by chemists as a computer scientist who left molecular simulations for a career in optimization; but after all, he was an applied mathematician until the end. Powell was for most of his career a mathematician working at the Atomic Energy Research Establishment at Harwell, close to Oxford; he later moved to Cambridge as a professor. Davidon was a physicist (although Fletcher majored in physics at Cambridge too) who later became known as a peace activist, anti-Vietnam-war protester and mastermind behind the FBI office burglary in 1971. Powell, on the other hand, was interested only in the mathematics and not the physics, as he admitted that he could not follow Davidon’s arguments on the variable metric, inspired by general relativity and differential geometry.

Soon after the DFP work, Fletcher went on to develop the nonlinear conjugate gradient (CG) solver for unconstrained nonlinear optimization. He claims that the idea was given to him by Reeves, who realized that the line search procedure could be reused inside the then-classic CG solver for linear systems. This resulted in the Fletcher-Reeves method. In fact, Fletcher is of the opinion that he was simply given two good ideas by other people. Either way, Fletcher helped develop two methods that have become cornerstones of numerical optimization, though in a later interview he shared his doubts about nonlinear CG being a reliable method for large-scale problems. But at that time, nonlinear CG and DFP/BFGS were pretty much the only efficient methods available in numerical libraries and computational quantum chemistry software suites, and probably other places too. And then followed augmented Lagrangian, SQP, interior point methods and all that.

Fletcher was a man who relished practical problems and considered them the drivers of the field of numerical optimization. He considered himself a physicist rather than a mathematician until the end. In an interview he gave in 2015, a year before passing away, he said the following: “I think for us in numerical analysis – in optimization in particular – because that’s the purpose of it: it’s to optimize; it’s not to produce pure mathematics... in my view, differing from other people’s view. But it’s to solve problems. And we just keep solving the same old problems – getting another 5 % here and another 5 % there: it’s a bit sterile.” I think these are very fitting and true words coming from one of the giants of optimization.


Monday, August 26, 2019

Quaternion constraints

This post is mainly a distillation of the information given by Claude Lacoursière in his thesis and in a chapter of the Game Physics Pearls book. It is mostly the result of trying to implement these quaternion-based constraints in a game physics engine. I will limit myself to the two examples that seem most practical and are also part of the understanding process: the totally locked rotational constraint and the hinge joint. I will leave it to you to investigate more complex rotational joint types. My only goal here is to simplify the original texts a bit and give some direct results, ready for implementation.

Let's start from a completely locked quaternion-based rotational constraint:
\begin{equation}
g(q,p)=\mathbb{P}p^*q = G(p)q = 0,
\end{equation}
where $p$ represents the desired relative orientation and $q=p_1^*p_2$ is the relative quaternion between the two bodies. By $p_i$ we mean the constraint frame orientation quaternion in world space relative to body $i$. The projection matrix $\mathbb{P}$ extracts the vector part from a quaternion: $q_v = \mathbb{P}q$ and $q=(q_s, q_v)$.

$G$ is a helper matrix used to convert quaternion operations to matrix operations. We will be using two such matrices:
\begin{equation}
E(q) =
\begin{bmatrix}
-q_v & S(q)
\end{bmatrix},
\end{equation}
\begin{equation}
G(q) =
\begin{bmatrix}
-q_v & T(q)
\end{bmatrix}.
\end{equation}
The 3-by-3 matrices $S$ and $T$ are transposes of each other, so we need only the latter:
\begin{equation}
\label{tmatrix}
T(q) = q_s I_3 + [q_v]^{\times},
\end{equation}
where the skew-symmetric matrix of a vector $v=(x,y,z)$ is:
\begin{equation}
    [v]^{\times}=
    \begin{bmatrix}
    0 & v.z & -v.y \\
    -v.z & 0 & v.x \\
    v.y & -v.x & 0
    \end{bmatrix}.
\end{equation}
Note that we are using a convention different than the one used by Lacoursiere, i.e. the transpose of his matrix (or with a flipped sign as for any skew-symmetric matrix $A$ we have $A^T=-A$).

In general, the time derivative of the constraint function (i.e. the velocity error) is of the form:
\begin{equation}
\label{velconstr}
\dot{g}=J_1 \omega_1 + J_2 \omega_2,
\end{equation}
where $J_1$ and $J_2$ are the angular Jacobians corresponding to the two bodies and we denote by $\omega$ their angular velocities. Note that I have considered that the constraint function is not time dependent; so make sure the constraint frame is not changing in time, otherwise you will have to include the time derivative of $g$ too.

I will now give the formula for the Jacobian. Keeping in mind that $J_2 = -J_1$ we have:
\begin{equation}
    J_1 = -\tfrac{1}{2}G(p_1 p) E^T(p_2).
\end{equation}
Using the desired target relative quaternion $p$ is a good thing when you want to set up a joint motor, or to set up a locked constraint without actually providing a constraint frame (by setting $p$ to the initial relative orientation between the bodies, i.e. $p=q_1^*q_2$ using the body orientations at the time the constraint is set up). However, for most practical purposes (like implementing a hinge) you will require a constraint frame to be defined, and then all that is required is for $p$ to be the identity quaternion (and thus $q$ too):
\begin{equation}
g(p)=\mathbb{P}q=q_v=0.
\end{equation}
That simply means that the two constraint frames expressed in the two body local frames should overlap (for a locked joint at least). We only need three constraint equations because there are only three rotational degrees of freedom. This is also why we use unit quaternions (as the $q_s=1$ constraint becomes implicit).

After getting rid of $p$, we can further simplify the Jacobian so that it  only consists of 3 by 3 matrices:
\begin{equation}
    J_1 = -\tfrac{1}{2}[q_{1,v} q_{2,v}^T + T(q_1)T(q_2)].
\end{equation}
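As a sketch of how this might look in code (Python/NumPy, with function names of my own, following the conventions above):

```python
import numpy as np

def skew(v):
    # Skew-symmetric matrix [v]^x, using the post's (transposed) convention.
    x, y, z = v
    return np.array([[0.0,   z,  -y],
                     [-z,  0.0,   x],
                     [ y,   -x, 0.0]])

def T(q):
    # T(q) = q_s I_3 + [q_v]^x, with the quaternion stored as q = (q_s, q_v).
    return q[0] * np.eye(3) + skew(q[1:])

def locked_jacobian(q1, q2):
    # Simplified Jacobian of the locked joint (identity target p):
    #   J_1 = -1/2 (q_{1,v} q_{2,v}^T + T(q_1) T(q_2)),   J_2 = -J_1,
    # where q1, q2 are the body orientation quaternions.
    J1 = -0.5 * (np.outer(q1[1:], q2[1:]) + T(q1) @ T(q2))
    return J1, -J1

# Sanity check: for identity orientations, J_1 reduces to -I/2.
q_id = np.array([1.0, 0.0, 0.0, 0.0])
J1, J2 = locked_jacobian(q_id, q_id)
```

For a hinge, you would then pick the two columns of $J_1^T$ corresponding to the axes $a$ and $b$, as described below.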
A hinge constraint is defined as follows:
\begin{equation}
    g(q) =
    \begin{bmatrix}
    a^T \\
    b^T
    \end{bmatrix}
    \mathbb{P}q = P_{hinge}q=
    \begin{bmatrix}
    a \cdot q_v \\
    b \cdot q_v
    \end{bmatrix},
\end{equation}
where $a$ and $b$ are two perpendicular unit vectors in the constraint frame - their cross product will be the hinge axis. Most of the time we will choose them to be two of the coordinate system axes.

For solving the constraint we define two vectors $u$ and $v$ as the columns of $J_1^T$ corresponding to the coordinate axes chosen as $a$ and $b$. Then the velocity error becomes:
\begin{equation}
    \eta=
    \begin{bmatrix}
    u^T \\
    v^T
    \end{bmatrix}
    (\omega_1 - \omega_2).
\end{equation}
If we define the symmetric matrix:
\begin{equation}
    \tilde{A}=
    \begin{bmatrix}
    u^TAu & u^TAv \\
    v^TAu & v^TAv
    \end{bmatrix},
\end{equation}
then the hinge constraint reduces to solving the 2 by 2 linear system:
\begin{equation}
    \tilde{A}\lambda + \eta + \tfrac{\beta}{h}g(q)=0.
\end{equation}
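To make the reduction concrete, here is a minimal sketch of the 2 by 2 solve, assuming $A$ is the 3 by 3 effective inverse-mass matrix of the relative angular motion; all names are illustrative:

```python
import numpy as np

def solve_hinge(A, u, v, omega_rel, g, beta, h):
    """Solve the reduced 2x2 hinge system  A~ lam + eta + (beta/h) g = 0.

    A         : 3x3 effective inverse-mass matrix (assumed given)
    u, v      : columns of J1^T picked for the hinge axes a and b
    omega_rel : relative angular velocity omega_1 - omega_2
    g         : 2-vector of position errors (a . q_v, b . q_v)
    """
    B = np.stack([u, v])                 # 2x3, rows u^T and v^T
    A_tilde = B @ A @ B.T                # the symmetric 2x2 matrix
    eta = B @ omega_rel                  # velocity error
    lam = np.linalg.solve(A_tilde, -(eta + (beta / h) * g))
    return lam
```

The resulting $\lambda$ is then applied back as angular impulses through $J_1^T$ and $J_2^T$ as usual.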

Friday, January 19, 2018

Transversal stability of constraints

Constraint solvers often become unstable, especially when they work at the velocity level. The root cause is that they are explicit solvers, so both SHAKE and velocity time stepping (VTS) can suffer from similar problems. Interestingly enough, SHAKE outright explodes when stressed enough, while VTS manifests what is known as "jittering" (under the same conditions). So far, nobody has really pinned down what jittering is or how to measure it. In what follows I will attempt to formulate jittering mathematically as oscillations transversal to the constraint directions. This fact was gradually hinted at by a series of papers: [Servin et al. 2011], [Kaufman et al. 2014], [Tournier et al. 2015], [Andrews et al. 2017]. What I'm trying to do is show that it is formally true, at least for bilateral constraints.

Another important thing to note is that instabilities don't come from the constraints themselves, but from how constraint forces are distributed to the other constraints and how these constraints are geometrically related to each other. This is the essence of the geometric stiffness concept: constraint forces and constraint orientations can act as stiff forces (e.g. angular springs) in the transversal directions. Take as an example a thread made of particles and massless rigid constraints between them: the constraint forces that act along the constraints to restrict half of the degrees of freedom (DOFs) also act on the rest of the DOFs (i.e. the angles of the multi-pendulum) and couple them together (as in the chaotic behavior of the double pendulum).

Consider we are given a dynamical system with $n$ degrees of freedom and $m$ constraints. I will consider the case $m<n$ here, and leave the over-constrained case for future treatment. The constraints are given implicitly by $m$ constraint functions: $\psi(q)=0$. In fact, these functions measure the displacement along the constraint directions, so they can be regarded as coordinates.

The trick I will use is a coordinate change from $q$ to another set of coordinates $(\psi, \theta)$. We already know $m$ of the coordinates, given by $\psi$, and we need to pull the remaining $n-m$ coordinates $\theta$ from somewhere. This is quite simple to do for a thread (multi-pendulum), where we have closed formulas for all the coordinate changes, but not so straightforward in the general case. It turns out we don't need to know these nonlinear transformation formulas, but only their Jacobians. This approach is inspired by an older way of solving constraints, known as the local coordinates approach. The main reference used here will be [Potra and Rheinboldt 1991] (On the Numerical Solution of Euler-Lagrange Equations).

The Jacobian of $\psi(q)$ is denoted by $J=\nabla_q \psi(q)$ and is usually easy to compute and necessary for constraint solving anyway. This is not true for $L$, the Jacobian of $\theta(q)$. But [Potra and Rheinboldt 1991] show that $L$ exists locally and can be computed numerically (through a QR factorization of $J$). Moreover, it can be chosen so that $L^TL=1$ (i.e. orthonormal). Once we have $L$ (even if only locally), we can start looking at the equations of motion and what they look like in $\theta$-space (i.e. the transversal directions). So let's write the equations of motion in maximal coordinates $q$, with only constraints and no other forces:
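As a side note, here's one numerical way to extract such an $L$, as a sketch: take a full QR factorization of $J^T$ and transpose its trailing columns, which span the null space of $J$, so the rows of $L$ are orthonormal and $JL^T=0$ (numpy used for illustration):

```python
import numpy as np

def transversal_jacobian(J):
    """Build L, an orthonormal basis for the directions transversal to
    the constraints, from a full QR factorization of J^T.

    J is m x n with m < n. The first m columns of Q span the range of
    J^T; the last n - m columns span the null space of J, so the
    returned L is (n - m) x n with orthonormal rows and J L^T = 0."""
    m, n = J.shape
    Q, _ = np.linalg.qr(J.T, mode="complete")  # Q is n x n orthogonal
    return Q[:, m:].T                          # rows: transversal directions
```

Keep in mind this is only a local basis, just like in the post: $L$ has to be recomputed as $q$ moves.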
\begin{equation}
M\ddot{q} = J^T \lambda.
\end{equation}
Move the mass matrix to the right-hand side and multiply on the left by $L$:
\begin{equation}
L\ddot{q} = L M^{-1} J^T \lambda.
\end{equation}
We do this because we know that $L=\nabla_q \theta(q)$, so $\dot{\theta}=L\dot{q}$. Differentiating once more gives $\ddot{\theta}=L\ddot{q} + \dot{L}\dot{q}$. Therefore:
\begin{equation}
\ddot{\theta} = L M^{-1} J^T \lambda + \dot{L}\dot{q}.
\end{equation}
I will call the last term a gyroscopic term, as it only appears because of the nonlinear change of coordinates and I will ignore it from now on, because I know it's not the culprit for instability. But I do know that the term  $L M^{-1} J^T \lambda$ is the one that's stiff. Most of this insight came (to me and others) from the first two papers mentioned in the first paragraph. So, in order to quantify the stiffness, we need to linearize this term (keep in mind we are already in a restricted vicinity of $q$ because of $L$). So let's differentiate the matrix $A = L M^{-1} J^T$ with respect to $\theta$:
\begin{equation}
\frac{\partial A}{\partial \theta}=\frac{\partial L}{\partial \theta} M^{-1} J^T + L M^{-1} \frac{\partial J^T}{\partial \theta}.
\end{equation}
But here's the catch: the transversal directions $L$ should not depend on the transversal coordinates $\theta$, but only on the other coordinates $\psi$. A similar choice of constraint directions $J$ can be made so that they only depend on $\theta$. What I cannot prove right now is that such a choice for both $J$ and $L$ always exists; however, I will assume it does and continue on that basis - namely, that $\frac{\partial L}{\partial \theta}=0$. Another thing we want to do is express as many terms as possible in terms of $\theta$, so we note that:
\begin{equation}
\frac{\partial J^T}{\partial \theta}= L^T \frac{\partial^2 \psi}{\partial \theta^2}.
\end{equation}
We can now finish our linearization, and after dumping the gyroscopic term, we get the following approximate local equation:
\begin{equation}
\ddot{\theta} = C + LM^{-1}L^T \left( \frac{\partial^2 \psi}{\partial \theta^2}\lambda \right) \theta = C + EK\theta,
\end{equation}
where $C$ is a constant term, $E$ is the inverse effective mass in transversal space and $K$ is the stiffness matrix as seen by the free degrees of freedom $\theta$. What we would like to show now is that there is a relation between this matrix $K=\frac{\partial^2 \psi}{\partial \theta^2}\lambda$ and the geometric stiffness $\tilde K = \frac{\partial^2 \psi}{\partial q^2}\lambda$. Given that $\frac{\partial^2 \psi}{\partial \psi^2} = \frac{\partial^2 \psi}{\partial \theta \partial \psi} = 0$, I was able to show that $\tilde K = L^T K L$. I may have to go again through the calculations, but it makes sense that the matrix $K$ is "pushed forward" in $q$-space and vice-versa.

Now, let's try to bring the last equation from transversal space back to $q$-space. Note that we linearized and approximated the original equations, so what we are bringing back is basically just a portion of all the forces - but most importantly the stiff ones. This is why we also leave out the constant term. Therefore, we multiply the equation on the left by $L^T$:
\begin{equation}
L^T\ddot{\theta} = L^T LM^{-1}L^T K \theta.
\end{equation}
We can now use the fact that $L^TL=1$:
\begin{equation}
\ddot{q} = M^{-1}L^T K \theta(q).
\end{equation}
We need again to linearize this equation (around a point $q^*$), so that stiffness shows its face:
\begin{equation}
\ddot{q} = M^{-1}L^T K L (q - q^*) + M^{-1}f^*.
\end{equation}
It is now easy to see that:
\begin{equation}
M\ddot{q} = \tilde K (q - q^*) + f^*.
\end{equation}
Again, we should ignore the constant terms and notice the fact that the stiffness matrix of the transverse forces in $q$-space is just $\tilde K$ - the geometric stiffness. Of course, this is just a linearization of the real force, but it is what matters locally, especially when doing numerical integration. Linear stability analysis is all we have, so the values of the $\tilde K$ matrix are very important for stability analysis in $q$-space (as the values of $K$ are in $\theta$-space). If the highest eigenvalue $\omega^2$ of $M^{-1}\tilde K$ does not satisfy $\omega < 2/h$, then the simulation becomes unstable. This is an idea that has already been proposed in [Andrews et al. 2017], but I was just not sure if it was true. Now that I've shown it is, it's time to focus on the modal analysis of $M^{-1}\tilde K$ as the proper way of studying stability in constrained based dynamics.
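For illustration, the stability criterion translates into just a few lines of code. This is a sketch, with the geometric stiffness $\tilde K$ assumed already assembled:

```python
import numpy as np

def max_stable_timestep(M, K_geom):
    """Largest step size h satisfying omega < 2 / h, where omega^2 is
    the biggest eigenvalue of M^{-1} K_geom (the geometric stiffness)."""
    eigvals = np.linalg.eigvals(np.linalg.solve(M, K_geom))
    omega = np.sqrt(np.max(np.abs(eigvals)))   # highest natural frequency
    return 2.0 / omega
```

If the simulation step $h$ exceeds this value, explicit integration of the transversal part is expected to blow up.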

Tuesday, January 9, 2018

Implicit velocity projection


During the time I spent looking closer at PBD and what it really is, I learned one thing clearly: PBD is nothing else than a form of implicit integration (backward Euler) of the constraint forces. Goldenthal calls it Implicit Constraint Directions (ICD); he called his optimized version Fast Projection (FP). FP solves a linear system at every nonlinear iteration, while PBD runs just one Gauss-Seidel (or Jacobi) step per iteration, resulting in what is known as Nonlinear Gauss-Seidel (NGS).

The other thing I learned is that PBD is so stable precisely because it is fully implicit. This is why you can do cloth with PBD but not with velocity based methods, which I'll denote from here on as velocity time stepping (VTS) - articulated systems can get very unstable with VTS. The short explanation is that VTS uses the constraint directions at the beginning of the frame (i.e. explicit directions) and not the ones at the end (implicit directions). Therefore a part of the system is integrated explicitly (the part transversal to constraints), and it is well known that explicit integrators can easily become unstable and even blow up. The good part is that VTS is very similar to SHAKE (pretty much a linearized version of it), which is symplectic. This means energy gets conserved better in the transverse directions - I will detail these things in another post.

Unfortunately, conserving energy is not always what we want. Stability often comes for free with implicit integrators because they dissipate energy artificially. This may be bad for science projects, but not for games, graphics and real-time simulation. Therefore I've been looking for a way to make VTS stable without fully going the PBD way. The differences between the two come largely from what we are solving for: VTS solves for velocity constraints, PBD for position constraints. So how can we keep the VTS way without altering it too much, and still get the stability of PBD? The answer: integrate using implicit constraint directions, but keep the velocity constraints.

Let's throw in some equations for this case:
\begin{align}
Mv^{l+1} &= Mv^l +hf_{ext} + hJ(x^{l+1})^T \lambda \\
x^{l+1} &= x^l + hv^{l+1} \\
0 &= \tfrac{\beta}{h} c(x^l) + J(x^l)v^{l+1}
\end{align}

We can express this as a system in the unknowns $v$ and $\lambda$:
\begin{align}
Mv &= Mv^l +hf_{ext} + hJ(x^l + hv)^T \lambda \\
0 &= \tfrac{\beta}{h} c(x^l) + J(x^l)v = G(v)
\end{align}

While the second equation is linear, the first one is not, and we need to apply Newton's method to it. The Jacobian of the whole system then becomes:
\begin{equation}
\begin{bmatrix}
M-h^2 H(x)\lambda & -h J(x)^T \\
J(x^l) & 0
\end{bmatrix},
\end{equation}
where $x = x^l + hv$. We can proceed just like in the case of FP and ignore the geometric stiffness term $K=H\lambda$, where $H=\nabla J = \nabla^2 c$, and also set the residual of the first equation to always be 0. We then get the following linear system:
\begin{equation}
\begin{bmatrix}
M & -h J(x_0)^T \\
J(x^l) & 0
\end{bmatrix}
\begin{pmatrix}
\Delta v_1 \\ \Delta \lambda_1
\end{pmatrix} =
\begin{pmatrix}
0 \\ -G(v_0)
\end{pmatrix}
\end{equation}

We can now directly take a Schur complement and obtain the usual constraint solver equation:
\begin{equation}
hJM^{-1}J^T\Delta\lambda_1 + G(v_0) = 0,
\end{equation}
where by approximation we can take $J$ to be the same on both sides, i.e. $J(x_0)$. Then the change in velocity is $\Delta v_1 = hM^{-1}J^T\Delta\lambda_1$ and the update step is given by:
\begin{align}
v_1 &= v_0 + \Delta v_1, \\
x_1 &= x^l + h v_1.
\end{align}
If you iterate this scheme you get implicit VTS, which is way more stable than plain VTS. I've only tested it on a 2D thread/cable made of distance constraints and haven't gotten to make measurements yet, but it looks just like PBD. The algorithm still looks a lot like VTS - there are two main differences though:
  1. at every iteration, after you compute the new velocities, you compute new positions too, and
  2. at every iteration you recompute the Jacobians of the constraints based on the new positions.
Otherwise, everything stays the same. You could use an exact solver for the Lagrange multipliers, or better, just use one Gauss-Seidel step, which basically results in NGS. The only trick is not to set $\beta$ too close to 1, or it explodes (why, I don't know yet).
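Here's a minimal sketch of the whole scheme for a 2D chain of distance constraints. All names are made up and it cuts corners (one Gauss-Seidel sweep, no warm starting), but it follows the two differences above - positions and constraint directions are refreshed every iteration:

```python
import numpy as np

def implicit_vts_step(x, v, inv_mass, pairs, rest, h,
                      beta=0.5, iters=10, gravity=(0.0, -9.81)):
    """One frame of implicit VTS on 2D distance constraints.

    x, v     : (n, 2) positions and velocities
    inv_mass : (n,) inverse masses (0 = pinned particle)
    pairs    : list of (i, j) particle index pairs
    rest     : rest length of each constraint
    """
    x0 = x.copy()
    g = np.asarray(gravity)
    v = v + h * g * (inv_mass > 0.0)[:, None]        # external forces
    for _ in range(iters):
        xi = x0 + h * v                               # implicit position estimate
        for (i, j), d0 in zip(pairs, rest):
            n = xi[i] - xi[j]
            dist = np.linalg.norm(n)
            if dist < 1e-12:
                continue
            n = n / dist                              # implicit constraint direction
            c = np.linalg.norm(x0[i] - x0[j]) - d0    # position error at x^l
            G = (beta / h) * c + n @ (v[i] - v[j])    # velocity constraint
            w = inv_mass[i] + inv_mass[j]
            if w == 0.0:
                continue
            dv = -G / w                               # one Gauss-Seidel step
            v[i] += inv_mass[i] * dv * n
            v[j] -= inv_mass[j] * dv * n
            xi = x0 + h * v                           # difference 1: new positions
        # difference 2: directions are recomputed from xi on the next sweep
    return x0 + h * v, v
```

Plain VTS falls out of the same code by computing $n$ once from $x^l$ and never refreshing `xi` inside the loop.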

Here's how it looks for some extreme initial velocity conditions:

The same scenario, using VTS.

And for completeness, the PBD case.