Emissaire

Wednesday, January 4, 2023

Hello integrator

I thought many times about posting some kind of tutorial for physics simulation. And my mind always goes back to the basics: the integrator. Actually the most simple integrator that works: Symplectic Euler. But how do you implement it best if I were to start over (or from scratch)?

This is how I would do it today:

$v += h g + hM^{-1}f$

$x += h v$

The main idea is that I would be using vector notation just like in Matlab. The main quantities are the positions $x$ and the velocities $v$. Now there is the question how do you actually store these positions: either as an array of vector3 (or vector2) or store first the x coordinates, then the y and so on. Or you can store them as the rectangular matrix where the rows are the individual coordinates and the columns the vectors. In my experience, the first option is the best, as most operations are done on vector3 corresponding to each particle. The problems start appearing when you want to switch from 3D to 2D or from particles to rigid bodies. In fact you can have any number of degrees of freedom (DOFs) per body and in total, so it's up to the vector abstraction code to deal with such irregularities (the "views" from NumPy and PyTorch may be a way to go, but not sure how expensive they are).

But in the example above we're just dealing with regular n-dimensional vectors. The only hiccup is the broadcast of the gravitational acceleration $g$ to the bigger vector $v$. But assuming that we are dealing with a list of vector3's (and leaving rigid bodies aside) this should be rather easy. Same goes for scaling by a number, in this case the time-step $h$. Multiplying a general vector by a matrix should also be doable, although in this case we know it is a diagonal matrix and we should probably do it by a pair-wise multiplication instead (would need a special operator notation). (When doing FEM with non-lumped masses, $M$ is no longer diagonal.)

The whole idea here is to avoid for loops and leave the vector abstraction layer do it for you. It's simply more elegant and easier to write if you ask me. And if the vectors are very, large as it usuallys happens, resort to parallel fors or some other parallelization technique (GPU is a bit trickier to abstract as plain C++ code, but would be nice too). For the time being, I expect all these linear algebra operations to be executed on the spot (by order of precedence of operators). Some kind of lazy evaluation (a la Eigen) would be nice, or even fully storing the graph of operations (a la PyTorch) for later execution on GPU or auto-differentiation would be even nicer. But let's stay on planet Earth for the moment and stick with an in-house implementation of these operation before we consider other linear algebra or tensor libraries.

And I did implement this in C++ in a pretty straightforward manner:

void Update()
{
   const real h = real(0.16);
   vec2 gravity(0, 10);
   vel += mask<vec2>(h * gravity, invMass) + h * invMass * forces;
   pos += h * vel;

}

Above, invMass is another vector containing the diagonal entries of $M^{-1}$. But as you can see, this is not the end of the story: we still have the problem of "boundary conditions". Usually, when dealing with particle systems we only deal with one type of boundary condition (or constraint): fixed particle in space. In some cases this particle can also have a prescribed motion in which case we call it kinematic rather than fixed. But in either case, such particles are not to be integrated. This results in a simple algorithm where we check to see if the particle is fixed or kinematic and skip it in the integrator. The convention is usually that such particles have infinite mass (they can't be moved by any forces) and thus zero inverse mass. A lot of code out there (especially constraint solvers) is relying on this convention and storing only the inverse masses for particles and exploiting the zero value whenever needed (e.g., a constraint linked to fixed anchor in space or coming from animation). But as you can imagine, this makes it working with the actual masses harder. However, let's indulge ourselves in this convention and continue.

There are a few ways I identified in which we can treat boundary conditions (Dirichlet ones mostly):

group all the fixed particles at the beginning of the vector, store an offset to know where the dynamic particles begin and only integrator from there on;
use a "mask" operation (like the ones used in SIMD) to select between the integrator result and another value, most probably zero;
store a list of indices in the big vector to mark the dynamic particles (or fixed ones, respectively) and then use that list to access a sub-set of the vector (a la Matlab or NumPy);
eliminate those fixed particles altogether from the global DOFs vector (through "direct substitution"), but then you have to store them somewhere else and take them into account when computing (spring) forces.

The first option is something that I used extensively in my FEM code (where a node is equivalent to a particle). I also used option 3 when porting the same FEM code to Python (either NumPy or PyTorch) and it's basically inspired from the Matlab way of doing things. That doesn't mean you can't use the other options for a Matlab or Python simulator. It's clear that here I chose option 2, as I think now it's a better choice, at least for the Symplectic Euler code. It pretty much corresponds to the old way of putting an if inside a for loop, although the actual implementation of the abstraction may vary. Finally, option 4 is something I did in some of my constraint solvers, but the downside is that then you have to differentiate between your constraint types: particle-world vs particle-particle. In the end, it's a lot about personal tastes and how you like to organize your code. Which is the faster one is hard to say, probably 1 or 4, but such options are quite limiting too. For example, you can't dynamically set particles to be kinematic or dynamic without some complicated reshuffling or re-labeling of edge forces. Hence the mask option for now, which can probably be implemented some SSE instructions.

Finally, the hardest part in this code is not even contained in it: it's computing the actual forces $f$ acting on each particle. I already talked about this in my force accumulation pattern post, and I intend to do it one more time in a future post, as there is not enough space left in this one. But I will mention that a lot of my current motivation is coming from the Simit language. Even if it looks more like GPU code (kernels/shaders) and it's not always vectorized like Python or Matlab code, it has an important feature: reductions (like the force accumulation) are abstracted and automated by the language. The "map" step as it is called in Simit allows the user to perform Matlab-like linear algebra operations after the vectors and matrices have been assembled (i.e., vectorized code). And these could be either an implicit integrator of some sort involving a linear solver (or nonlinear optimization) or an explicit one as above. In the end, this is what I would like to achieve in my own C++ code too, or even use Simit directly if possible.

The end goal is to understand all the steps and abstractions necessary in writing simulation code and paving the way to a pipeline design like in graphics, while leaving enough flexibility (and ease of coding) to implement completely different algorithms (integrators, solvers, etc.). I'm not sure if a domain specific language (DSL) is the way to go here or if writing everything in C++ once for multiple backends is better (although not sure how to target GPU from there: SyCL hints in that direction, also PyTorch is an example of success too).

Wednesday, December 14, 2022

Maya plugin for intersection

I spent some time lately trying to write a Maya plugin that would find the intersection of two surfaces and help then the user "fix" that intersection, i.e., deform at least one of the surfaces so that they intersect no more. Here is the video of what I managed to do so far.

It is by far not complete, but I decided to call it a day for now until I feel like getting back to it. It is also very simple - right now it only works with oriented surfaces, as I'm using the triangle normals to figure the interior and the exterior of the intersection. And this is pretty much what it does, it gives you the intersection points and the list of interior points on each mesh. Then you can use either of these lists to select the intersecting points on one of the meshes and then do something with them, like move them or shrink wrap to the other surface (which is what I do in the video above).

As for Maya, it proved to be much harder than I thought. Respect to all the people that are working with the Maya API, both in C++ and MEL (or Python). This is some pretty hard stuff to learn. Therefore, my solution isn't final yet. All I managed to do is create a locator node with some custom drawing (based on an example) and just plot the intersection points. The outputs of my node are: the number of intersection points, the list of interior points for mesh 1 and 2, respectively, and each of the two mesh patches involved in the intersection (the intersecting triangles and the interior ones). Then I wrote a MEL procedure to take the interior point list output (an int array) and use it to make a vertex selection. In my opinion, this is pretty useful, even if I'm replicating the work, as the boolean operation in Maya doesn't offer this option. Then you can do what you want with it.

For the spheres case, a shrink wrap using vertex normals seemed to do the job (after some tweaking of the parameters). what the shrink wrap does is choose a direction (or one per vertex) and shoot rays from each vertex (raycast) until they intersect the other surface - those will be the target positions. But moving the vertices along the normals also works, as well as using a bit of soft selection (but not what I expected, really). What the shrink wrap does is choose a direction (or one per vertex) and shoot rays from each vertex (raycast) until they intersect the other surface - those will be the target positions. Of course, there are a million ways this could go wrong for more complex scenarios, e.g., triangle inversions and self intersections. I wrote my own shrink-wrap-like deformer to better control the process. The key is to only advance by incremental steps before self-intersections occur; and even if it does occur it can be reversed by some smoothing operation. As you can see in the example below in some cases you need to first smooth (a lot) the mesh selection before starting the shrink wrap deformation.

I need to play a bit more with the deformation direction field: average it somehow, use rays coming back from the other mesh, etc. But I still think vertex normals are essential for the direction of deformation, as they are also the gradient of the volume of intersection (which we would like to minimize down to zero).

Wednesday, April 20, 2022

Projective constraints

This post just needed to be done, to mark again something that was known for some time and that uses once again the concept of projection. I am calling "projective constraints" what Baraff and Witkin called conjugate gradient (CG) filters in their break of the century seminal paper on cloth. I took the name from the SOFA framework documentation where it appears in a slightly different (but simpler) form, sign that this concept was previously known (which is almost always the case). In fact, a very similar form of CG appears in the book by Nocedal and Wright on numerical optimization.

I will try to be brief and acknowledge my main source as being the excellent tutorial on deformable body simulation from Pixar written by David Eberly and Theodore Kim. They are the ones to extract an important idea from a 2015 paper on multigrid cloth simulation by Tamstorf et al. And that idea is that you don't necessarily need CG to filter (or project) constraints, but you can do it with any linear system solver of choice as long you transform your system into the right space. This was an idea that was explored before by Ascher, Boxermann and Ye, but they all focused on the idea of a reduced system (of a smaller and even varying size). The pre-filtered preconditioned CG (PPCG) method, as it was later called, got one thing right: the correct form of the transformed system that was well-conditioned and works great in practice. In retrospective this looks moderately easy to derive, but somehow no one managed to derive it correctly before.

Hopefully, I will not mess up equations too much and add to the confusion, but here are the main lines of reasoning. You start with a linear system $\mathbf{Ax}=\mathbf{b}$. Then, you have a projection matrix $\mathbf{S}$ that takes you from the full configuration space to a smaller one where motion is restricted by some constraints. The letter $\mathbf{S}$ is chosen historically and also to differentiate from its orthogonal complement $\mathbf{P} = \mathbf{I} - \mathbf{S}$ which takes you onto the constraint space as was described on this blog before. Keep in mind also that orthogonal projection matrices are symmetric.

By the rules in which matrices and vectors transform under a change of basis, we can move our linear system into the "free" or "unconstrained" space: $\mathbf{SASx}=\mathbf{Sb}$. Or at least it would seem so. The more correct form is the one from the PPCG paper:

$(\mathbf{SAS} + \mathbf{I} - \mathbf{S}) \mathbf{y}=\mathbf{S}(\mathbf{b} - \mathbf{Az})$.

We assumed $\mathbf{z}$ is a prescribed solution for the constrained dimensions of the system. This means the final solution is a mixture between these and the free degrees of freedom: $\mathbf{x} = \mathbf{y} + \mathbf{z}$. Hence, $\mathbf{y} = \mathbf{x} - \mathbf{z}$. You need to be a bit extra careful to project $\mathbf{z}$ by $\mathbf{I} - \mathbf{S}$ whenever you are adding it to $\mathbf{x}$ so that you don't add contributions from the wrong dimensions, but if we consider it from the start as being in the right space we shouldn't worry about it. Finally, we are solving for $\mathbf{y}$ which is in the same space as $\mathbf{x}$ but it simply doesn't move in the constrained directions. We took care of it not happening by modifying the system matrix. Moreover, the system is perfectly solvable as the $\mathbf{I} - \mathbf{S}$ term guarantees no rank deficiency (and one can use a direct solver too).

I admit I don't fully grasp every little reasoning and subtlety in the derivation. And it's always easy to get yourself confused between one projection matrix and its complement. I swapped them by mistake many times and it takes a bit of practice and study of the original papers to get it right, but it's doable.

The only thing that I want to draw attention to is this remark from the PPCG paper: "However, the above method easily generalizes to any kind of equality constraint, including the face-vertex and edge-edge constraints used in other collision response algorithms." Indeed, this is a step beyond the original CG filters, but it begs the question how to do it, as it's not directly obvious. And this is where the constraints projection matrix comes in handy and the relation $\mathbf{S} = \mathbf{I} - \mathbf{P}$. Now, in our previous discussions and the constrained dynamics literature, this matrix involves the inversion of the Delassus operator (or Schur complement matrix). But, here, we can consider the metric of the configuration space as being the identity, as all the authors above have considered, and we get a simpler form: $\mathbf{P} = \mathbf{J}^T(\mathbf{JJ}^T)^{-1}\mathbf{J}$, where $\mathbf{J}$ is the Jacobian of the constraint functions.

And immediately we notice that we may run into problems if $\mathbf{JJ}^T$ is rank deficient (i.e., the constraints are redundant). But this can be solved by performing a QR or SVD decomposition of $\mathbf{J}$ in order to compute the required inverse. Moreover, in my brief experiments I found that taking only some of the first singular values is enough to make it work. There is still much work to be done here, but it shows in principle that this method is feasible in practice. And if the (approximate) factorization of $\mathbf{J}$ is done fast enough it can outcompete the Gauss-Seidel solver for the Schur complement system usually employed for constraints. (I think there is a paper by Jernej Barbic that did it some time ago too.) But I'm not sure if in the end it's really worth it (in terms of speed or solving inequality constraints).

All I care about is the beauty of the connections. And I'm just glad that I know now how to better understand CG filters and how to (potentially) extend them to arbitrary constraints. (The only paper that I know of that goes on a similar path is "An Implicit Frictional Contact Solver for Adaptive Cloth Simulation" by Li et al.) Who knows? Maybe in the future we will find out that solving constraints this way is more robust, and cures all the problems of dual relaxation solvers like slow convergence to zero constraint error or large mass ratios.

But should we use the proper mass matrix in the projection matrix? And how do we find the prescribed $\mathbf{z}$ values? Isn't that a full projection step that we do to solve constraints? And how are the free degrees of freedom interacting with the constraints? (It seems we are freezing the constraint solutions when in fact there should be an equilibrium between them and the rest of the system.) I think these are question that have to be answered in the wider scope of coupling elasto-dynamics and constraints (which may still offer some surprise new solvers over the next years). And projective constraints may play an important role in solving this problem, or at least an alternative.

Tuesday, January 25, 2022

Are constraint forces ever infinite?

This post was triggered by a section in the Lanczos book which talks about the physical meaning of the Lagrange multipliers. It is basically the argument that constraint forces are in fact the gradients of a very strong potential. And the usual picture is that this potential has a minimum exactly on the constraint curve and hence a zero force there just as expected. One such example of a potential is a typical inter-molecular force (e.g., van der Waals, Lennard-Jones, Morse) which has an equilibrium point at a certain distance from the molecule, corresponding to a stable point of the chemical bond. Molecules can vibrate around these points (like in a crystal) resulting in a thermal motion that is usually reduced away, leaving only the macroscopic degrees of freedom (DOF). The microscopic DOF oscillations have a reasonably sized amplitude and usually a very high frequency due to the stiff forces.

One first question that comes to mind when dealing with constrained Lagrangian mechanics is why are the Lagrange multipliers non-zero but the constraint is always satisfied $\phi(x)=0$? We know how to compute them algebraically, but where do they come from? And the standard answer is that they are the limiting case of a very strong potential as described above. But what does it mean very strong? Is the force becoming infinite or just the stiffness, i.e., the second derivative value at the equilibrium point?

A partial answer comes from the fact that if initial conditions are compatible with the constraint then the constraint forces have the duty to cancel out external and centrifugal forces that act perpendicular to the constraint direction (i.e., along the normal to the constraint manifold). This is in accord with d'Alembert's principle and the Newtonian view where there is always a reaction force keeping the particles in equilibrium along the constrained direction. This simplifies the problem to one of statics. But what is the potential we are trying to minimize?

Apparently, there should be three potentials summed up: one for inertial terms, another for external fields like gravity and finally the constraints. But if the last one is very strong then the others shouldn't matter anymore, should they? Also, the arguments should not be mixed: we either start from the axiomatic principles of d'Alembert and Lagrange and derive equations with reduced DOF due to constraints; or we start with the maximal DOF, consider very strong potentials and take the limiting case. And at the end ask ourselves if they are the same.

Lanczos and Arnold argue that they are indeed the same. Other authors go a bit deeper and say that the convergence to the Lagrangian equations of motion is only in a weak or averaged sense. Some go as far as saying that the Lagrange equations on a constraint manifold are not even correct: either from a numerical standpoint (the integrator cannot achieve the same results as the reduced DOF solution) or theoretical (the convergence solution depends on many aspects of the real strong potential function).

For small displacements, it is enough to approximate the potential function around the equilibrium point. Thus, the the first order them (the force) is zero and only the second order term survives, resulting always in a quadratic potential. Hence, the ubiquitous analogy with a very strong linear spring. (Of course, with some positive definiteness assumptions.) But what justifies us taking the small displacement limit? Is it the a priori assumption that a constraint should always be satisfied? Or is it a more practical assumption that somehow the physical system will never oscillate too far from the equilibrium point? That is, the initial conditions are either compatible with the constraint or very close to it. The reasoning is a bit circular, because we would never idealize a very strong interaction force with a constraint, unless all the conditions in practice were met for such an idealization (e.g., a rigid body). And this would involve in my view some kind of energy dissipation through which all the particles would fall into their places (condense) and stay put on the constraint manifold.

But there would still be a microscopic vibrating motion, that we can either ignore or not. In the latter case, in molecular dynamics simulations it amounts to thermodynamical properties. Usually, great effort is put into correctly simulating this highly oscillatory motion and preserving the energy. But in mechanical engineering, this motion is totally ignored. It is not clear though what is assumed about this motion? Does it still happen as a perpetual oscillation and the resulting motion is an averaged motion? Or that our numerical method hits only those points where the constraint is satisfied at either position or velocity level? This does happen in practice a lot. Or is this vibrational energy totally damped out?

The view that Lanczos takes is that the microscopic deviation always happen, it cannot be prevented. And that is why there is always a non-zero constraint force (magnitude), i.e., the Lagrange multiplier. So, at a microscopic level, the constraint is never fully satisfied, except at some sample points or on average. This would answer our first question. (This is also the view behind constraint regularization, soft constraints and mixed formulations.) But then there are many authors who claim that an ideal constraint exists only if infinite forces are allowed. So the second question is: are constraint forces bounded? Again, the Lagrangian formalism answers simply by giving us a formula which can never produce infinite values. But the limiting potential theory could, as the derivative of the potential can grow towards infinity.

In my view, the answer is simple. If the constraint force is just a spring force $f = -k \phi(x)$ then taking the limit $k \rightarrow \infty$ then the force would become infinite. But remember that we also considered the deviation $\phi$ to be very small, i.e. $\phi \rightarrow 0$. Or, more often, the compliance $\epsilon = 1 / k$ is used instead, and so we get the ratio of two very small numbers; they both tend towards zero, but their ratio is finite. (Although, this probably doesn't qualify as a proof.) But is the small deviation hypothesis a sensible one?

In practice with numerical methods we can always get configurations where the constraint is badly violated. And what about the real physical world? If we raise the temperature high enough, the bonds will vibrate wildly, not only fast, but with high amplitude. Of course, at some point they would break and the constraint idealization would not make sense anymore. So, even if $k$ is not small we can see visible macroscopic violations of the constraints. But would that matter in an averaged sense? (If we consider a rigid body as the stiff limit of an elastic body, impacts or rigidity itself can be seen as transient dynamic phenomena that dissipate inside the bodies. The energy gets transformed into microscopic vibrations or heat that flows to a cold sink.)

I don't know how these cases should be considered. All I know, is that for the problems at hand in multibody systems, the motion outside the constraint plane does not concern us and it can be either oscillatory or damped, it doesn't matter. Or does it (given all the numerical drift)? And this is why projection methods are often preferred to penalty methods, because the latter usually oscillate visibly (unless carefully tuned and damped). And in my view we should always use projection because it's the only method that can handle incompatible initial conditions or position and velocity drift. But what is the physical limiting phenomenon behind projection? I hope to answer this question in the future, for now it is only a discussion. I think that such a phenomenon should clearly contain damping or impulsive components, but the details are not yet clear.

To conclude, I think whenever you read about infinite forces you should take it with a grain of salt. It's probably that the stiffness of the potential is infinite, but the constraint force should be bounded. Maybe only if you are considering impulsive motion (non-smooth) you can consider the projection of the velocity as the result of an infinite force integrated over a small time (Dirac delta function). But when we are talking about a strong harmonic potential and constraint regularization, there cannot be such forces. The actual motion is in fact highly oscillatory and the constraint is only satisfied on average.

Sunday, April 11, 2021

The merits of projecting twice

I started a couple of years ago writing a post called "The trouble with PBD contacts", but never finished. This was based on my practical experience of modeling contact. I hope I will finish that post one day, once I learn all the factors causing all that trouble. But long story short, PBD contacts are pretty crap and usually quite violent. As Stewart puts it: "the resulting discretization behaves as if it had a "random" coefficient of restitution when impacts occur". And the same applies to stabilized velocity contact constraints. In practice, this manifests as objects popping out sometimes at high speed. I call them "violent contacts". Bottom line: the velocity at impact should always be zero and this should be our constraint (unless we want to model non-zero restitution, but we can add that afterwards). If we do that, then all our contacts will be just fine and no explosions will appear. But what about the penetration error? I'm sure you will ask that as nobody likes objects on screen clipping each other. Stay tuned.

I will focus here mostly on what I call position-based methods, which is different from position-based dynamics (PBD), but it includes it. There are generally two types of such methods: explicit and implicit, pertaining of course to the type of integrator used. But, more importantly, is that it also makes the distinction between explicit and implicit constraint directions. Meaning, when is the direction of the constraint forces computed: at the beginning or the end of the time-step? In the first category I put SHAKE and RATTLE which are the darlings of molecular dynamics. They are explicit symplectic integrators of constrained systems. Hence, they can easily blow up or end up without a solution for large time steps, but people live with it as long as energy is conserved. On the other hand, implicit methods tend to be much more stable, as by construction they dissipate energy (and people live with it). This is why they are used so much in graphics and gaming. In this category I put the so-called "projection" methods, where you integrate up to a candidate position (without constraints) and then project the positions back onto the constraint manifold. They are also known as step-and-project (SAP). Here's where I put PBD and "fast projection" too, as they are approximate solvers for SAP. In fact, I will often say PBD instead of SAP, unless I make it clear that I'm referring to the nonlinear Gauss-Seidel projection popularized by Nvidia.

I will not go into the nitty gritty details of equations, but there is one important point I want to make. Neither SHAKE nor PBD preserve the hidden constraint that velocities along constraint directions should be zero (the normal component to the constraint manifold or the first derivative of the constraint equation). And I will prove it with a plot of the velocity constraint drift for the case of a simple pendulum.

RATTLE is the only scheme that calculates the correct velocity. And it does it at the price of an extra projection step at the end for the velocity. Given that the position constraint (explicit) projection at the beginning of RATTLE is nonlinear, and the velocity projection is always linear, it seems like a small price to pay. It's only one linear system to solve, compared to several if using Newton for the positions. So, here's what I did, I took RATTLE and I replaced the first part with PBD. And the results were great in terms of velocity. I called it PROJ2 for lack of a better name and because I project twice: first the positions and then the velocities.

But why should we bother? SHAKE and PBD work perfectly well the way they are. Indeed, if you are dealing with equality constraints. But if you are dealing with contacts (and friction) velocity starts to matter more. That high velocity error in the plot will become the impact reflected velocity. This is not an issue with equality constraints, as no matter what velocity they have along the constraint direction, the bodies will stay pinned to the constrained manifold (up to a solver tolerance). So, this is why I modified RATTLE like I did. It's basically the modification you need to do to PBD in order to make it explosion free for contacts.

I have not not tried PROJ2 on practical contacts yet, but I did something similar in the past. I projected the velocities before running PBD and everything was great. If you think about it, it's the same thing, if you move the last step of RATTLE at the beginning of the next frame. So, I know it works! It made contacts much smoother. The only reluctance I had, like everybody else, was that it was more expensive. And now I know the answer: it doesn't matter! It's worth the price if your contacts are fully inelastic and not exploding. And you also get rid of that stabilization term...

Think about it: first project velocities and then positions. Does it sound familiar? The latter step is sometimes called "post-projection": where you correct the positions after you solve the problem at velocity level in order to remove any penetration. This technique is usually attributed to a paper by Cline and Pai, but ultimately it's just projection. Usually the problem is linearized, but when dealing with contact, the normals are usually considered constant throughout the solve, so it is linear. Of course, it would be much better if you used PBD instead, but the border between post-projection and PBD is quite gray to be honest anyway.

One more thing, what's wrong with stabilization? Nothing, if you are writing a rigid body simulator where you don't care so much about small penetrations. And then you can use a small stabilization factor and clamp the velocity to avoid explosions. But if you care about having zero penetrations and correct normal velocities, you should use the post-projection step. In fact, this has been a debate for years in the community, and for that reason Box2D uses post-projection, while most other engines uses stabilization or a mixture of the two. Now, I tend to agree with Box2D. By the way, if you care about that little analogy between implicit springs and the stabilization factor and the "constraint force mixing" factor, I would say forget it - I think it's pretty much wrong anyway (or at least not rigorous).

In the past, I feared that post-projection dissipates artificially too much energy. But this phase plot for the pendulum made me feel better.

In fact, it preserves energy much better than PBD. So, it has two merits: it reduces dissipation for PBD and solves penetration problem for velocity level methods. From now on I think this is what I'm going to use as a contact solver (and constraints in general), no matter how expensive it is.

Friday, March 19, 2021

The many faces of projection

What I want to talk about in fact is the Jacobian matrix of the constraint function. All these things that I want to mention are probably clear to people with strong backgrounds in linear algebra and differential geometry. But for me they were revelations of sort many of them.

All right, so maybe the first thing to start with is the fact the Jacobian matrix $\mathbf{J}$ of a function $g$ is the analog of the gradient operator or the normal to a surface. And this is why for many people $\mathbf{J}$ is synonymous to the constraint direction(s). Because the normal to the constraint manifold is transversal to the space of motion, hence along the constrained directions that can't move. Think of a pendulum: the constraint manifold is a circle and the normal to it is along the radius, which is exactly the pendulum's (constrained) direction.

But some math books also call $\mathbf{J}$ the differential (in coordinates). And this takes me to a confusion I always made in the beginning: for a single constraint, should $\mathbf{J}$ be a row or column vector? And it seemed other people weren't sure about it either in their notation. But I stuck with it being a row vector, as that was the majority view. And now it makes sense, because it is a covector (mainly because it has the components of a differential which is a 1-form, same thing). Hence, $\mathbf{J} \equiv dg$ and $\mathbf{J}^T \equiv \nabla g$.

I still don't fully grasp the concepts and that's why I don't want to talk formally about these things. But I do have to stress the fact that the configuration space of our mechanical system (where all the generalized coordinates reside) is not really $\mathbb{R}^n$, but one that has a metric (a curved space) given by the mass matrix $\mathbf{M}$ (i.e., the kinetic energy). It basically tells you that not all bodies are equal but weighed by their mass information. If this weren't the case, then vectors and covectors in $\mathbb{R}^n$ would be the same thing. But they're not! If for velocity (tangent) vectors $\mathbf{v}$ the norm is given by the well known kinetic energy ($\tfrac{1}{2}\mathbf{v}^T \mathbf{Mv}$), then for covectors the matrix is now the inverse of $\mathbf{M}$. Think about momenta $\mathbf{p}$, the kinetic energy is the same, but now we have to divide by the mass rather than multiply ($\tfrac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}\mathbf{p}$). And you have to multiply velocities by mass to get momenta $\mathbf{p} = \mathbf{Mv}$ and the other way around. Some other covectors are forces (as they work directly on the momenta) and surface normals; and they inhabit a place called the cotangent space.

Now let's fall back from geometry to linear algebra. Locally, $\mathbf{J}$ is a linearization of $g$ and maps the tangent space of the configuration space to the tangent space of the "constraint" space $\mathbb{R}^m$. So, it's a linear transform, thus a rectangular matrix. And it follows all the rules of linear algebra. Ideally, it's full rank and fat (rather than thin, i.e., $m <n$) in order for the system to be solvable. The image of $\mathbf{J}$ is the whole of $\mathbb{R}^m$, which can be viewed as just a collection of numbers, just as for one constraint you get only one number. The meaning of these numbers comes from the analogy with the normal vector, in fact any vector for that matter. They are the magnitudes or components of the projection of a vector $\mathbf{v}$ on the "normal" $\mathbf{J}$, so $\mathbf{Jv}$ is just that. For one constraint it is a number, in general it is a $m$-dimensional vector because the "normal" $\mathbf{J}$ spans an $m$-dimensional space. You can think of taking each row of $\mathbf{J}$ and dotting it with $\mathbf{v}$ to get the projection length. And in fact, from linear algebra we know that the row space of $\mathbf{J}$ is formed by vectors $\mathbf{J}^T\mathbf{y}$, so we know that the rows of $\mathbf{J}$ span the "normal" or "constraint" space of dimension $m$. This space is orthogonal to the tangent space of the constraint manifold, which is precisely the null space of $\mathbf{J}$, i.e., vectors satisfying $\mathbf{Jv}=0$.

What does this tell us about $\mathbf{J}$? That it takes any vector $\mathbf{v}$ and converts it into its projection components. What about the reverse direction? It will only reconstruct the "normal" part of those vectors. So, you lose some information, but this is fine because you obtain the projected vector $\mathbf{J}^T\mathbf{Jv}$ (almost, stay tuned). If you think about it, this is what you do when you project on a unit vector $\mathbf{n}$: $\mathbf{nn}^T\mathbf{v}$. So you may say that $\mathbf{J}^T\mathbf{Jv}$ is a projection operator. Note that if $\mathbf{v}$ is already in the row space of $\mathbf{J}$ no information is lost and projection leaves it unchanged. Some people call $\mathbf{Jv}$ a "projection" and $\mathbf{J}^T\mathbf{v}$ an "un-projection", but this is not really true, as they do not satisfy the definition of a projection operator; but they do part of the job, so we may call these matrices something like "projectors" (when in fact they define the space we are projecting upon).

But this not the whole truth. In 3D you also have to divide by the norm squared $\mathbf{n}^T\mathbf{n}$ if we're dealing with a non-unit vector. And you have to do the same in $n$-dimensional space. Linear algebra and least square problems give you the formula $\mathbf{A}^{-1}\mathbf{Jv}$ for computing the actual projection lengths, where $\mathbf{A}=\mathbf{JJ}^T$. Here $\mathbf{A}$ plays the role of the norm squared of $\mathbf{J}$ that you divide by. Finally, the projection that takes these lengths back to configuration space is $\mathbf{P}=\mathbf{JA}^{-1}\mathbf{J}^T$ and this is a genuine projection operator. Of course, if you're lucky and $\mathbf{A}$ is the identity then the previous paragraph still works, but that's rarely the case.

The projection $\mathbf{P}$ or rather its complement $\mathbf{I}-\mathbf{P}$ is usually called the "canonical" projection on the constraint manifold. And you may recognize the formula from the analytical solution of linearized velocity-level constrained dynamics. In practice you usually do not invert $\mathbf{A}$, but rather solve a linear system where the unknowns are the projection lengths. But $\mathbf{A}$ is not really the same, is it?

So, what is the meaning behind $\mathbf{A}$ anyway? Some people call it the reduced or effective mass, others call it the Delassus operator, others compliance and so on. I think I hinted already at it: $\mathbf{A}$ determines the metric of the "constraint" space. In fact, it's the inverse $\mathbf{A}^{-1}$ that is the true effective mass, the one seen by the degrees of freedom along the constraint directions, and helps define the kinetic energy there. (I picture it as the transformation of the mass matrix $\mathbf{M}$ into "constraint" space). And $\mathbf{A}$ is actually the metric of the cotangent "constraint" space. This makes sense as $\mathbf{J}$ spans the cotangent space of the constraint manifold (in configuration space) and there the metric is $\mathbf{M}^{-1}$, so our "norm squared" becomes $\mathbf{A}=\mathbf{JM}^{-1}\mathbf{J}^T$. This is exactly the matrix that appears in constrained dynamics.

I want to end by noting the fun fact that by deriving a projection operator (pretty similar to what you would do in 3D), we actually solved for the Lagrange multipliers in constrained dynamics. In fact, projection is an optimization problem: find the shortest distance from a point to a surface. And the closest point happens to be on the perpendicular to the surface, one like our "normal" $\mathbf{J}$. Pretty much what we talked about. So, deriving the projector $\mathbf{P}$ amounts to solving the projection (or least squares) problem. Or it does at least for velocity vectors. The general case for projecting a point in configuration space on the constraint manifold is a bit trickier as it's a nonlinear problem, but it can be broken down into such linear steps. But that's another story...

To summarize, in order to solve the (linearized) constrained dynamics problem one simply needs to project velocities into the tangent space which is the null space of $\mathbf{J}$. This is done by projecting them onto the space spanned by the columns of $\mathbf{J}^T$ and then taking the orthogonal complement of that projection.

Monday, January 4, 2021

Mixed FEM and constrained dynamics

I recently got a paper accepted, which in itself was an achievement and culmination of a long period of work, hardship and even failures. But the idea I want to bring forward in this post is that the paper only scratches the surface of the implications it can have and could be developed further. Such is the way of academic publishing: not always the results I or you deem important are considered results worth publishing by the academic community. And I won't go that far into saying that my results were revolutionary. I'm merely saying that they are interesting and they may settle a decades-long debate in the engineering, graphics and simulation circles.

But let me tell you more before I get to that. The mixed formulation of the finite element method (FEM) is a technique that has been around at least from the 70's. Its main purpose was to solve incompressible problems in fluid flow and elasticity. This was due to a problem called locking that is still affecting our simulations. As a simple example, locking happens when you simulate a cantilever bar bending under some load; and although you know it should deflect by quite a bit, the numerical results tell you the deflection is zero or very close to zero. But let's not worry about locking, as even if avoiding it may seem like the main result, I argue that it isn't.

The main achievement of mixed FEM is its beauty and unifying power. So what is it? I will make an analogy with springs. The opposite of mixed FEM is irreducible FEM, or I call it standard FEM, because it's used all over the place. Spring forces (like stresses in a finite element) are given by a constitutive law like Hooke's law: the elongation times a spring constant. These forces are then plugged in some equilibrium conditions or Newton's second law to obtain a simulation of the system. The main takeaway here is that you always need a constitutive law of some sorts besides the laws of statics and dynamics to completely determine the behavior of a mechanical system.

What if we don't substitute the constitutive law into the equations of mechanics as we usually do? We get the mixed formulation of springs, and FEM and everything else. We just get (roughly) double the number of equations that we need to solve, but the problem is essentially the same. In fact, it can be argued this formulation is "more" fundamental, and the irreducible form is one that has been reduced from it (and cannot be reduced any further). But why wouldn't you substitute and reduce the equations? Apparently, no reason at all.

This takes us a long lasting debate between choosing reduced coordinates over maximal coordinates. That is, if a system is constrained in any way (like a pendulum attached to a point) then it makes complete sense to consider that some degrees of freedom were removed by the constraints and use only the remaining ones. Or not... The view of constrained dynamics is that you should still use all the initial degrees of freedom of the pendulum and add extra equations and variables for each constraint. Understandingly, a lot of people don't like this, especially in their computer implementations. But it just so happens that this is the best way to simulate constrained dynamics when you don't know from the start what the constraints will be (i.e. model an arbitrary system).

On the other hands, constraints are only ideal and do not really exist in nature. They are just really strong forces in the end. So, there is a middle way: use all the degrees of freedom and model constraints using very very strong forces (like the ones in springs). This keeps the number of variables and equations in the middle. It is also the standard or irreducible form.

But something quite bad happens when the spring constants grow very large. In theory they should tend to infinity so that the constraints are exactly fulfilled. And this is not possible in practice. Numerics become ill-conditioned and, on top of it, extra problems appear like locking. And these won't go away by just putting more power into the number crunching. You need to roll back to the reducible or mixed form of the equations. Or the one that had both constraint equations and equations of motion. You need to make the problem larger in order to make it solvable. And here comes the big revelation: the constitutive laws in the mixed formulation are just soft constraints in constrained dynamics.

And of course people knew about it. At least many do or have a strong intuition about it. But the rest act as if they don't. There are actually two camps in solving contacts and other topics which divide the world into penalty vs. constraint formulations. When they are in fact two sides of the same coin. In their defense, the constraint camp does assume that their constraints are infinitely hard which is not possible when using finite spring constants or material parameters (and despite being ill-conditioned their problems are still solvable, even if by a bit of cheating). But the mixed formulation is just a place where these two different views of the world meet. In technical terms it is called a saddle point problem and keeps popping up whenever you do simulation (see Strang's textbooks for example). And optimization people have known it for a long a time: they call it the KKT optimality conditions. For them the degrees of freedom of the system are the primal variables and the tensions inside the springs are dual variables or Lagrange multipliers.

My personal breakthrough was to show that any simulation problem, no matter how complex (including PDEs), can be formulated as such a mixed formulation involving both the equations of motion and the soft constraints (aka constitutive laws). If any of the constraints needs to be hard, we just put a zero value next to it and everything still works. So, we can build a single unified solver for all mechanical phenomena, including say FEM and contacts, by using the same framework. Of course, that doesn't mean it will be optimal; but sometimes elegance and generalization is better than some optimizations here and there.

And, ok, you would argue, this was still known. Yes, yet nobody used that knowledge. For example, you can build a whole multibody simulator by just treating every constitutive law used as a soft constraint and then use a single constraint solver like XPBD (or a more accurate one, I don't care). Or you can use very stiff springs (the penalty approach) as long as you find a nice stable way to solve the problem. And in the end it may not even matter as optimization theory is already using techniques that rely on both facets of the problem, like the augmented Lagrangian method. You add a bit of penalty, a bit of constraints and you get to solve your problem efficiently.

So what's left to do? I would say a lot. We need to build these unified multibody simulators using the mixed framework and see how they perform. What is the best approach for solving them? Sometimes, you have to pay a bigger price or be a bit slower than the state of the art to reach this degree of generality. Also, pay attention to traps like locking, which become visible in the mixed formulation, but require deep changes at the discretization level in order to be fixed. Overall, I would say it gives us a much higher vantage point on things. What else? Maybe people using implicit integration of stiff springs will stop claiming that they are doing a very different thing from people using constraints. Or claiming you can not simulate some fancy nonlinear dynamics using constraints (which you can by using the three field mixed formulation). Or constraint people realizing that regularization is good from time to time, that relaxation solvers like Gauss-Seidel are in fact already doing it and that not every problem is a nail. And that at some point you can write a solver that takes the best of both worlds. For example, one that tackles high stiffness through constraint solvers and large mass ratios through primal solvers (at the same time, in a single pass, inside one monolithic hybrid solver).