Emissaire

Thursday, January 26, 2023

Laplacian smoothing and friends

In its simplest form, it amounts to replacing a point by the average of its neighbors. For a mesh, a neighborhood can be a flexible concept, but we can reduce it to being the one-ring of each vertex, i.e., the vertices connected through an edge to the current vertex. This is valid for both triangle and quad meshes or any polygonal faces. For quad meshes, we can probably extend it to contain vertices that lie opposite (along the diagonal of the quad) to the current vertex. This is often done for cloth meshes to define stretch springs - in which case we don't really have a mesh anymore, but rather a graph. Note that this extension to quads (and cloth) is not really a second order neighborhood (vertices connected through 2 edges), but rather a subset of it.

As you can see, already the notion of a neighborhood is rich enough for a discussion. I guess we can also apply the same concept in a mesh-free context (like in SPH), where a neighborhood is defined by a radius around the current vertex. The next tricky part are the weights of each vertex in the neighborhood. In my experiments, using the "umbrella" Laplacian operator (e.g. [-1 2 1]) seems to be just fine for many cases and I'll stick with it for simplicity.

In this blog post, I will try to look at Laplacian smoothing and its various facets and try to learn something new out of it. Mainly, how is it many things at the same time? A low pass filter, a membrane simulation, a 2D diffusion process, etc. And how are they connected? Definitely, it's not going to be an exhaustive answer.

Let's start with the math:

$p'_i = p_i + \lambda L p_i = (I + \lambda L) p_i = p_i + \lambda \Delta p_i$

The new position $p'$ is the old position plus a delta term which is obtained by applying the Laplacian $L$ operator (which is in fact a matrix) to the current position $p$. But this operator depends also on the neigboring vertices of $p$. So, in the end it's a sum of the form

$\Delta p = \sum_j w_j (p_j - p_i)$

where the $w$ terms are called the weights (like in a weighted average or a convolution).

I won't talk here much about the shrinking effect of Laplacian smoothing. The most common solutions are to use the Taubin method (also known as the "lambda | mu" method) or trying to preserve the volume somehow.

Image filters

Laplacian filters are used in image processing to detect edges or sharpen the image. Usually, the convolution kernels are not "normalized" and look something like [-1 2 -1]. My guess is this is because they want to exaggerate its effect, as the image colors are clamped anyway between 0 and 255. Another important distinction is that the signed is reversed from the one used for smoothing curves and meshes. From my experiments on images and meshes so far, my conclusion is that a negative L will sharpen the field values and a positive one will smooth them.

Another interesting thing I noticed is that the resulting averaging kernel of the form (e.g. [0.5 0 0.5]) obtained after adding the Laplacian (with a full step) does not include the original value anymore. The result is still a smoothing, hence a blurring of the image, but very different from a classic Gaussian low pass filter (or sinc, or cubic B-spline for that matter). However, Taubin argues that the averaging is still a Gaussian weighting, but maybe the difference is that one works per nodes (for images) and the other per edges (for meshes). Still, it's pretty clear they are both low pass filters, so it's probably just different ways of doing it, with different derivations.

Clearly, the biggest distinction is that images are represented on a regular grid. If we were to represent an image as a scalar field broken down into triangular elements the situation would be more similar. It would be like working on the uv space of the mesh. But an image also does not have intrinsic curvature, but that is taken into account by the Laplace-Beltrami operator (the contangent weights). But if we use uniform weights, the analogy is pretty clear, even more if we used quad meshes, as then we would have something very close to a grid, at least topologically. By the way, there are two flavours of 3x3 Laplacian convolution kernels in image processing: one uses only direct neighbors and the other also the diagonal ones (as discussed above).

Mesh smoothing

The Laplacian is arising from minimizing some kind of membrane (or thin plate) energy. This in turn is usually the squared norm of some strain tensor, which also approximates total curvature. Given this objective, it is clear to see that the end result is a surface without curvature or at least the minimal curvature everywhere that still preserves the topology of the mesh. Given no other constraints, the minimization may also shrink the mesh in the process.

Desbrun et al. show that smoothing is in fact a diffusion PDE (time-dependent) integrated on a curved mesh surface. (It's like a temperature becoming uniform everywhere on the mesh; or some spiky hairs on it, like the normal vectors.) But the mesh is already a space discretization, so in the end we only need to integrate an ODE in time. This can be done either using explicit Euler resulting in the classic repeatead application of the Laplacian operator. The method using implicit Euler is called "implicit fairing" and requires the solving of a linear system, as it is usually the case with implicit integration.

Graph Laplacian

A more general way to define the Laplacian operator is to do it on any graph, not only a manifold mesh. And in words, this is a matrix that has the degree (of incidence) of each vertex on the diagonal and then -1 for each connected vertex index and everything else is zero. This can be summarized by the equation

$L = D-A$,

where $D$ is the degree matrix and $A$ the adjacency matrix (1 for each neighboring vertex). But a more interesting way is to use the incidence matrix $B$, which encodes edges in pairs of the form [1 -1]. And this results in a symmetric form factorization:

$L = B B^T$

A mesh is a special case of a graph. So, this tells us that the above Laplacian will always be a symmetric matrix (under normal assumptions) that in turn can come from some kind of "difference" rectangular matrix multiplied with itself transposed. Which makes sens given that tn vectory analysis the Laplacian is the "del" operator or the gradient dotted with itself: $\Delta = \nabla\cdot \nabla$ (sometimes denoted $\nabla^2).

Cloth simulation

Smoothing is equivalent to having a spring on each edge (according to Desbrun et al.), but a special type of spring with zero rest length. This is related to the minimization structure described above. It also explains the shrinking.

We could then build a cloth simulator that reproduces the same behavior as Laplacian smoothing. This could be done using the Baraff-Witkin method (or any other implicit scheme). In this case, the method should be equivalent to "implicit fairing". But it can also be done with an explicit integration step.

Given that a zero-rest length spring doesn't really have a direction, but it's more like a 3 DOF constraint, it is also fully linear. It is equivalent to 3 independent springs acting on each axis. This makes the forces (thus the Laplacian) a linear operator acting on the positions. Thus the stiffness matrix (gradient of force, Hessian of energy) is the Laplacian matrix itself. All the pieces fit!

A first extension that we could think of is to implement a smoothing algorithm based on cloth simulation that doesn't target zero rest length, but only a fraction of the rest length. Of course, it will make things more complicated. For one, the Laplacian operator will no longer be what it used to be, as it's being replaced by the stiffness matrix and nonlinear spring forces. But in my mind it would offer more control over the smoothing process. And the reasoning also goes the other way around: cloth simulation is essentially a mesh smoother. Add on top of that bending springs or volume preservation and you get a powerful geometric tool.

The fact that cloth simulation is a "smoother" is even more clear when doing explicit integration or constraint solving. There is a pattern for accumulating forces in almost any solver, that I call the Jacobi fashion. And this uses the neighborhood of each vertex or incident constraints/springs. You could do it in two passes: one over edges, compute some edge quantities and then sum them up for each vertex (either using the incidence info or not). Or you could do it in one go for each vertex (and its one ring neighborhood), which starts looking a lot like mesh smoothing or a convolution kernel. And in the end a Jacobi constraint solver (or Gauss-Seidel) is just a smoother. In fact relaxation solvers are sometimes called smoothers in the context of multigrid. And there are actual results out there in fluid simulation where Jacobi acts as recursive convolution kernels - they call them Jacobi-Poisson filters.

And to go a bit into details, we already know from the context of implicit cloth simulation that the Laplacian operator corresponds to the stifness matrix $K=k J^T J + K_g$, where $k$ is the springs stiffness (considered the same for all springs), $J$ is the Jacobian of the constraint functions (used by the springs too) and $K_g$ is something that I will call geometric stiffness and ignore here. Here $J$ stands for an equivalent (transposed) of the adjacency matrix $B$ - it's actually an equivalent with extra orientation in space. $J$ also stands for the Jacobian matrix, which is just a generalized and transposed gradient.

Now, if we take a projection operation, which is at the heart of any constraint solver, you get

$p' = (I - J^T (JJ^T)^{-1}J) p$.

This again gives us a glimpse of an equivalent Laplacian operator. But even more, this is equivalent to solving a linear system

$(JJ^T)x = b$

which in fact is what is done in practice for finding out the Lagrange multipliers, here denoted by $x$. And the matrix of this system is clearly a Laplacian of some graph itself and the whole linear system corresponds to a discrete form of a Poisson equation. I posit that this Laplacian is nothing else but the one coming from the dual graph of the constraints/springs. And it is something that I've been glimpsing throughout the years. It's not especially hard if you look closely at a PBD or impulse solver (especially Jacobi ones). But my current realization, which I've been guessing all this time, is that a constraint solver is really a smoothing operation of some field (velocity, position) in the end.

However, in the end, smoothing and cloth simulation are still distinct processes. Maybe they are both more or less equivalent to each other and they both act as spatial filltes. It is also said that the "contangent Laplacian" can be derived from FEM, thus it cannot come from springs alone, but rather triangle elements. So, it's just different flavors of "Laplacians" that appear in mass-spring or constraint-based simulations. Also, smoothing seems to act more on the curvature part of the surface, thus bending forces or constraints may play a bigger role on the side of cloth simulation. In fact, usually they are all super-posed on each other (stretching, shearing, bending) resulting in a global stiffnes matrix or Laplacian. So, there are a lot of unanswered questions in my head and experiments to be done.

Bibliography

Desbrun et al., "Implicit Fairing of Irregular Meshes using Diffusion and Curvature Flow", 1999

Wednesday, January 4, 2023

Hello integrator

I thought many times about posting some kind of tutorial for physics simulation. And my mind always goes back to the basics: the integrator. Actually the most simple integrator that works: Symplectic Euler. But how do you implement it best if I were to start over (or from scratch)?

This is how I would do it today:

$v += h g + hM^{-1}f$

$x += h v$

The main idea is that I would be using vector notation just like in Matlab. The main quantities are the positions $x$ and the velocities $v$. Now there is the question how do you actually store these positions: either as an array of vector3 (or vector2) or store first the x coordinates, then the y and so on. Or you can store them as the rectangular matrix where the rows are the individual coordinates and the columns the vectors. In my experience, the first option is the best, as most operations are done on vector3 corresponding to each particle. The problems start appearing when you want to switch from 3D to 2D or from particles to rigid bodies. In fact you can have any number of degrees of freedom (DOFs) per body and in total, so it's up to the vector abstraction code to deal with such irregularities (the "views" from NumPy and PyTorch may be a way to go, but not sure how expensive they are).

But in the example above we're just dealing with regular n-dimensional vectors. The only hiccup is the broadcast of the gravitational acceleration $g$ to the bigger vector $v$. But assuming that we are dealing with a list of vector3's (and leaving rigid bodies aside) this should be rather easy. Same goes for scaling by a number, in this case the time-step $h$. Multiplying a general vector by a matrix should also be doable, although in this case we know it is a diagonal matrix and we should probably do it by a pair-wise multiplication instead (would need a special operator notation). (When doing FEM with non-lumped masses, $M$ is no longer diagonal.)

The whole idea here is to avoid for loops and leave the vector abstraction layer do it for you. It's simply more elegant and easier to write if you ask me. And if the vectors are very, large as it usuallys happens, resort to parallel fors or some other parallelization technique (GPU is a bit trickier to abstract as plain C++ code, but would be nice too). For the time being, I expect all these linear algebra operations to be executed on the spot (by order of precedence of operators). Some kind of lazy evaluation (a la Eigen) would be nice, or even fully storing the graph of operations (a la PyTorch) for later execution on GPU or auto-differentiation would be even nicer. But let's stay on planet Earth for the moment and stick with an in-house implementation of these operation before we consider other linear algebra or tensor libraries.

And I did implement this in C++ in a pretty straightforward manner:

void Update()
{
   const real h = real(0.16);
   vec2 gravity(0, 10);
   vel += mask<vec2>(h * gravity, invMass) + h * invMass * forces;
   pos += h * vel;

}

Above, invMass is another vector containing the diagonal entries of $M^{-1}$. But as you can see, this is not the end of the story: we still have the problem of "boundary conditions". Usually, when dealing with particle systems we only deal with one type of boundary condition (or constraint): fixed particle in space. In some cases this particle can also have a prescribed motion in which case we call it kinematic rather than fixed. But in either case, such particles are not to be integrated. This results in a simple algorithm where we check to see if the particle is fixed or kinematic and skip it in the integrator. The convention is usually that such particles have infinite mass (they can't be moved by any forces) and thus zero inverse mass. A lot of code out there (especially constraint solvers) is relying on this convention and storing only the inverse masses for particles and exploiting the zero value whenever needed (e.g., a constraint linked to fixed anchor in space or coming from animation). But as you can imagine, this makes it working with the actual masses harder. However, let's indulge ourselves in this convention and continue.

There are a few ways I identified in which we can treat boundary conditions (Dirichlet ones mostly):

group all the fixed particles at the beginning of the vector, store an offset to know where the dynamic particles begin and only integrator from there on;
use a "mask" operation (like the ones used in SIMD) to select between the integrator result and another value, most probably zero;
store a list of indices in the big vector to mark the dynamic particles (or fixed ones, respectively) and then use that list to access a sub-set of the vector (a la Matlab or NumPy);
eliminate those fixed particles altogether from the global DOFs vector (through "direct substitution"), but then you have to store them somewhere else and take them into account when computing (spring) forces.

The first option is something that I used extensively in my FEM code (where a node is equivalent to a particle). I also used option 3 when porting the same FEM code to Python (either NumPy or PyTorch) and it's basically inspired from the Matlab way of doing things. That doesn't mean you can't use the other options for a Matlab or Python simulator. It's clear that here I chose option 2, as I think now it's a better choice, at least for the Symplectic Euler code. It pretty much corresponds to the old way of putting an if inside a for loop, although the actual implementation of the abstraction may vary. Finally, option 4 is something I did in some of my constraint solvers, but the downside is that then you have to differentiate between your constraint types: particle-world vs particle-particle. In the end, it's a lot about personal tastes and how you like to organize your code. Which is the faster one is hard to say, probably 1 or 4, but such options are quite limiting too. For example, you can't dynamically set particles to be kinematic or dynamic without some complicated reshuffling or re-labeling of edge forces. Hence the mask option for now, which can probably be implemented some SSE instructions.

Finally, the hardest part in this code is not even contained in it: it's computing the actual forces $f$ acting on each particle. I already talked about this in my force accumulation pattern post, and I intend to do it one more time in a future post, as there is not enough space left in this one. But I will mention that a lot of my current motivation is coming from the Simit language. Even if it looks more like GPU code (kernels/shaders) and it's not always vectorized like Python or Matlab code, it has an important feature: reductions (like the force accumulation) are abstracted and automated by the language. The "map" step as it is called in Simit allows the user to perform Matlab-like linear algebra operations after the vectors and matrices have been assembled (i.e., vectorized code). And these could be either an implicit integrator of some sort involving a linear solver (or nonlinear optimization) or an explicit one as above. In the end, this is what I would like to achieve in my own C++ code too, or even use Simit directly if possible.

The end goal is to understand all the steps and abstractions necessary in writing simulation code and paving the way to a pipeline design like in graphics, while leaving enough flexibility (and ease of coding) to implement completely different algorithms (integrators, solvers, etc.). I'm not sure if a domain specific language (DSL) is the way to go here or if writing everything in C++ once for multiple backends is better (although not sure how to target GPU from there: SyCL hints in that direction, also PyTorch is an example of success too).

Wednesday, December 14, 2022

Maya plugin for intersection

I spent some time lately trying to write a Maya plugin that would find the intersection of two surfaces and help then the user "fix" that intersection, i.e., deform at least one of the surfaces so that they intersect no more. Here is the video of what I managed to do so far.

It is by far not complete, but I decided to call it a day for now until I feel like getting back to it. It is also very simple - right now it only works with oriented surfaces, as I'm using the triangle normals to figure the interior and the exterior of the intersection. And this is pretty much what it does, it gives you the intersection points and the list of interior points on each mesh. Then you can use either of these lists to select the intersecting points on one of the meshes and then do something with them, like move them or shrink wrap to the other surface (which is what I do in the video above).

As for Maya, it proved to be much harder than I thought. Respect to all the people that are working with the Maya API, both in C++ and MEL (or Python). This is some pretty hard stuff to learn. Therefore, my solution isn't final yet. All I managed to do is create a locator node with some custom drawing (based on an example) and just plot the intersection points. The outputs of my node are: the number of intersection points, the list of interior points for mesh 1 and 2, respectively, and each of the two mesh patches involved in the intersection (the intersecting triangles and the interior ones). Then I wrote a MEL procedure to take the interior point list output (an int array) and use it to make a vertex selection. In my opinion, this is pretty useful, even if I'm replicating the work, as the boolean operation in Maya doesn't offer this option. Then you can do what you want with it.

For the spheres case, a shrink wrap using vertex normals seemed to do the job (after some tweaking of the parameters). what the shrink wrap does is choose a direction (or one per vertex) and shoot rays from each vertex (raycast) until they intersect the other surface - those will be the target positions. But moving the vertices along the normals also works, as well as using a bit of soft selection (but not what I expected, really). What the shrink wrap does is choose a direction (or one per vertex) and shoot rays from each vertex (raycast) until they intersect the other surface - those will be the target positions. Of course, there are a million ways this could go wrong for more complex scenarios, e.g., triangle inversions and self intersections. I wrote my own shrink-wrap-like deformer to better control the process. The key is to only advance by incremental steps before self-intersections occur; and even if it does occur it can be reversed by some smoothing operation. As you can see in the example below in some cases you need to first smooth (a lot) the mesh selection before starting the shrink wrap deformation.

I need to play a bit more with the deformation direction field: average it somehow, use rays coming back from the other mesh, etc. But I still think vertex normals are essential for the direction of deformation, as they are also the gradient of the volume of intersection (which we would like to minimize down to zero).

Wednesday, April 20, 2022

Projective constraints

This post just needed to be done, to mark again something that was known for some time and that uses once again the concept of projection. I am calling "projective constraints" what Baraff and Witkin called conjugate gradient (CG) filters in their break of the century seminal paper on cloth. I took the name from the SOFA framework documentation where it appears in a slightly different (but simpler) form, sign that this concept was previously known (which is almost always the case). In fact, a very similar form of CG appears in the book by Nocedal and Wright on numerical optimization.

I will try to be brief and acknowledge my main source as being the excellent tutorial on deformable body simulation from Pixar written by David Eberly and Theodore Kim. They are the ones to extract an important idea from a 2015 paper on multigrid cloth simulation by Tamstorf et al. And that idea is that you don't necessarily need CG to filter (or project) constraints, but you can do it with any linear system solver of choice as long you transform your system into the right space. This was an idea that was explored before by Ascher, Boxermann and Ye, but they all focused on the idea of a reduced system (of a smaller and even varying size). The pre-filtered preconditioned CG (PPCG) method, as it was later called, got one thing right: the correct form of the transformed system that was well-conditioned and works great in practice. In retrospective this looks moderately easy to derive, but somehow no one managed to derive it correctly before.

Hopefully, I will not mess up equations too much and add to the confusion, but here are the main lines of reasoning. You start with a linear system $\mathbf{Ax}=\mathbf{b}$. Then, you have a projection matrix $\mathbf{S}$ that takes you from the full configuration space to a smaller one where motion is restricted by some constraints. The letter $\mathbf{S}$ is chosen historically and also to differentiate from its orthogonal complement $\mathbf{P} = \mathbf{I} - \mathbf{S}$ which takes you onto the constraint space as was described on this blog before. Keep in mind also that orthogonal projection matrices are symmetric.

By the rules in which matrices and vectors transform under a change of basis, we can move our linear system into the "free" or "unconstrained" space: $\mathbf{SASx}=\mathbf{Sb}$. Or at least it would seem so. The more correct form is the one from the PPCG paper:

$(\mathbf{SAS} + \mathbf{I} - \mathbf{S}) \mathbf{y}=\mathbf{S}(\mathbf{b} - \mathbf{Az})$.

We assumed $\mathbf{z}$ is a prescribed solution for the constrained dimensions of the system. This means the final solution is a mixture between these and the free degrees of freedom: $\mathbf{x} = \mathbf{y} + \mathbf{z}$. Hence, $\mathbf{y} = \mathbf{x} - \mathbf{z}$. You need to be a bit extra careful to project $\mathbf{z}$ by $\mathbf{I} - \mathbf{S}$ whenever you are adding it to $\mathbf{x}$ so that you don't add contributions from the wrong dimensions, but if we consider it from the start as being in the right space we shouldn't worry about it. Finally, we are solving for $\mathbf{y}$ which is in the same space as $\mathbf{x}$ but it simply doesn't move in the constrained directions. We took care of it not happening by modifying the system matrix. Moreover, the system is perfectly solvable as the $\mathbf{I} - \mathbf{S}$ term guarantees no rank deficiency (and one can use a direct solver too).

I admit I don't fully grasp every little reasoning and subtlety in the derivation. And it's always easy to get yourself confused between one projection matrix and its complement. I swapped them by mistake many times and it takes a bit of practice and study of the original papers to get it right, but it's doable.

The only thing that I want to draw attention to is this remark from the PPCG paper: "However, the above method easily generalizes to any kind of equality constraint, including the face-vertex and edge-edge constraints used in other collision response algorithms." Indeed, this is a step beyond the original CG filters, but it begs the question how to do it, as it's not directly obvious. And this is where the constraints projection matrix comes in handy and the relation $\mathbf{S} = \mathbf{I} - \mathbf{P}$. Now, in our previous discussions and the constrained dynamics literature, this matrix involves the inversion of the Delassus operator (or Schur complement matrix). But, here, we can consider the metric of the configuration space as being the identity, as all the authors above have considered, and we get a simpler form: $\mathbf{P} = \mathbf{J}^T(\mathbf{JJ}^T)^{-1}\mathbf{J}$, where $\mathbf{J}$ is the Jacobian of the constraint functions.

And immediately we notice that we may run into problems if $\mathbf{JJ}^T$ is rank deficient (i.e., the constraints are redundant). But this can be solved by performing a QR or SVD decomposition of $\mathbf{J}$ in order to compute the required inverse. Moreover, in my brief experiments I found that taking only some of the first singular values is enough to make it work. There is still much work to be done here, but it shows in principle that this method is feasible in practice. And if the (approximate) factorization of $\mathbf{J}$ is done fast enough it can outcompete the Gauss-Seidel solver for the Schur complement system usually employed for constraints. (I think there is a paper by Jernej Barbic that did it some time ago too.) But I'm not sure if in the end it's really worth it (in terms of speed or solving inequality constraints).

All I care about is the beauty of the connections. And I'm just glad that I know now how to better understand CG filters and how to (potentially) extend them to arbitrary constraints. (The only paper that I know of that goes on a similar path is "An Implicit Frictional Contact Solver for Adaptive Cloth Simulation" by Li et al.) Who knows? Maybe in the future we will find out that solving constraints this way is more robust, and cures all the problems of dual relaxation solvers like slow convergence to zero constraint error or large mass ratios.

But should we use the proper mass matrix in the projection matrix? And how do we find the prescribed $\mathbf{z}$ values? Isn't that a full projection step that we do to solve constraints? And how are the free degrees of freedom interacting with the constraints? (It seems we are freezing the constraint solutions when in fact there should be an equilibrium between them and the rest of the system.) I think these are question that have to be answered in the wider scope of coupling elasto-dynamics and constraints (which may still offer some surprise new solvers over the next years). And projective constraints may play an important role in solving this problem, or at least an alternative.

Tuesday, January 25, 2022

Are constraint forces ever infinite?

This post was triggered by a section in the Lanczos book which talks about the physical meaning of the Lagrange multipliers. It is basically the argument that constraint forces are in fact the gradients of a very strong potential. And the usual picture is that this potential has a minimum exactly on the constraint curve and hence a zero force there just as expected. One such example of a potential is a typical inter-molecular force (e.g., van der Waals, Lennard-Jones, Morse) which has an equilibrium point at a certain distance from the molecule, corresponding to a stable point of the chemical bond. Molecules can vibrate around these points (like in a crystal) resulting in a thermal motion that is usually reduced away, leaving only the macroscopic degrees of freedom (DOF). The microscopic DOF oscillations have a reasonably sized amplitude and usually a very high frequency due to the stiff forces.

One first question that comes to mind when dealing with constrained Lagrangian mechanics is why are the Lagrange multipliers non-zero but the constraint is always satisfied $\phi(x)=0$? We know how to compute them algebraically, but where do they come from? And the standard answer is that they are the limiting case of a very strong potential as described above. But what does it mean very strong? Is the force becoming infinite or just the stiffness, i.e., the second derivative value at the equilibrium point?

A partial answer comes from the fact that if initial conditions are compatible with the constraint then the constraint forces have the duty to cancel out external and centrifugal forces that act perpendicular to the constraint direction (i.e., along the normal to the constraint manifold). This is in accord with d'Alembert's principle and the Newtonian view where there is always a reaction force keeping the particles in equilibrium along the constrained direction. This simplifies the problem to one of statics. But what is the potential we are trying to minimize?

Apparently, there should be three potentials summed up: one for inertial terms, another for external fields like gravity and finally the constraints. But if the last one is very strong then the others shouldn't matter anymore, should they? Also, the arguments should not be mixed: we either start from the axiomatic principles of d'Alembert and Lagrange and derive equations with reduced DOF due to constraints; or we start with the maximal DOF, consider very strong potentials and take the limiting case. And at the end ask ourselves if they are the same.

Lanczos and Arnold argue that they are indeed the same. Other authors go a bit deeper and say that the convergence to the Lagrangian equations of motion is only in a weak or averaged sense. Some go as far as saying that the Lagrange equations on a constraint manifold are not even correct: either from a numerical standpoint (the integrator cannot achieve the same results as the reduced DOF solution) or theoretical (the convergence solution depends on many aspects of the real strong potential function).

For small displacements, it is enough to approximate the potential function around the equilibrium point. Thus, the the first order them (the force) is zero and only the second order term survives, resulting always in a quadratic potential. Hence, the ubiquitous analogy with a very strong linear spring. (Of course, with some positive definiteness assumptions.) But what justifies us taking the small displacement limit? Is it the a priori assumption that a constraint should always be satisfied? Or is it a more practical assumption that somehow the physical system will never oscillate too far from the equilibrium point? That is, the initial conditions are either compatible with the constraint or very close to it. The reasoning is a bit circular, because we would never idealize a very strong interaction force with a constraint, unless all the conditions in practice were met for such an idealization (e.g., a rigid body). And this would involve in my view some kind of energy dissipation through which all the particles would fall into their places (condense) and stay put on the constraint manifold.

But there would still be a microscopic vibrating motion, that we can either ignore or not. In the latter case, in molecular dynamics simulations it amounts to thermodynamical properties. Usually, great effort is put into correctly simulating this highly oscillatory motion and preserving the energy. But in mechanical engineering, this motion is totally ignored. It is not clear though what is assumed about this motion? Does it still happen as a perpetual oscillation and the resulting motion is an averaged motion? Or that our numerical method hits only those points where the constraint is satisfied at either position or velocity level? This does happen in practice a lot. Or is this vibrational energy totally damped out?

The view that Lanczos takes is that the microscopic deviation always happen, it cannot be prevented. And that is why there is always a non-zero constraint force (magnitude), i.e., the Lagrange multiplier. So, at a microscopic level, the constraint is never fully satisfied, except at some sample points or on average. This would answer our first question. (This is also the view behind constraint regularization, soft constraints and mixed formulations.) But then there are many authors who claim that an ideal constraint exists only if infinite forces are allowed. So the second question is: are constraint forces bounded? Again, the Lagrangian formalism answers simply by giving us a formula which can never produce infinite values. But the limiting potential theory could, as the derivative of the potential can grow towards infinity.

In my view, the answer is simple. If the constraint force is just a spring force $f = -k \phi(x)$ then taking the limit $k \rightarrow \infty$ then the force would become infinite. But remember that we also considered the deviation $\phi$ to be very small, i.e. $\phi \rightarrow 0$. Or, more often, the compliance $\epsilon = 1 / k$ is used instead, and so we get the ratio of two very small numbers; they both tend towards zero, but their ratio is finite. (Although, this probably doesn't qualify as a proof.) But is the small deviation hypothesis a sensible one?

In practice with numerical methods we can always get configurations where the constraint is badly violated. And what about the real physical world? If we raise the temperature high enough, the bonds will vibrate wildly, not only fast, but with high amplitude. Of course, at some point they would break and the constraint idealization would not make sense anymore. So, even if $k$ is not small we can see visible macroscopic violations of the constraints. But would that matter in an averaged sense? (If we consider a rigid body as the stiff limit of an elastic body, impacts or rigidity itself can be seen as transient dynamic phenomena that dissipate inside the bodies. The energy gets transformed into microscopic vibrations or heat that flows to a cold sink.)

I don't know how these cases should be considered. All I know, is that for the problems at hand in multibody systems, the motion outside the constraint plane does not concern us and it can be either oscillatory or damped, it doesn't matter. Or does it (given all the numerical drift)? And this is why projection methods are often preferred to penalty methods, because the latter usually oscillate visibly (unless carefully tuned and damped). And in my view we should always use projection because it's the only method that can handle incompatible initial conditions or position and velocity drift. But what is the physical limiting phenomenon behind projection? I hope to answer this question in the future, for now it is only a discussion. I think that such a phenomenon should clearly contain damping or impulsive components, but the details are not yet clear.

To conclude, I think whenever you read about infinite forces you should take it with a grain of salt. It's probably that the stiffness of the potential is infinite, but the constraint force should be bounded. Maybe only if you are considering impulsive motion (non-smooth) you can consider the projection of the velocity as the result of an infinite force integrated over a small time (Dirac delta function). But when we are talking about a strong harmonic potential and constraint regularization, there cannot be such forces. The actual motion is in fact highly oscillatory and the constraint is only satisfied on average.

Sunday, April 11, 2021

The merits of projecting twice

I started a couple of years ago writing a post called "The trouble with PBD contacts", but never finished. This was based on my practical experience of modeling contact. I hope I will finish that post one day, once I learn all the factors causing all that trouble. But long story short, PBD contacts are pretty crap and usually quite violent. As Stewart puts it: "the resulting discretization behaves as if it had a "random" coefficient of restitution when impacts occur". And the same applies to stabilized velocity contact constraints. In practice, this manifests as objects popping out sometimes at high speed. I call them "violent contacts". Bottom line: the velocity at impact should always be zero and this should be our constraint (unless we want to model non-zero restitution, but we can add that afterwards). If we do that, then all our contacts will be just fine and no explosions will appear. But what about the penetration error? I'm sure you will ask that as nobody likes objects on screen clipping each other. Stay tuned.

I will focus here mostly on what I call position-based methods, which is different from position-based dynamics (PBD), but it includes it. There are generally two types of such methods: explicit and implicit, pertaining of course to the type of integrator used. But, more importantly, is that it also makes the distinction between explicit and implicit constraint directions. Meaning, when is the direction of the constraint forces computed: at the beginning or the end of the time-step? In the first category I put SHAKE and RATTLE which are the darlings of molecular dynamics. They are explicit symplectic integrators of constrained systems. Hence, they can easily blow up or end up without a solution for large time steps, but people live with it as long as energy is conserved. On the other hand, implicit methods tend to be much more stable, as by construction they dissipate energy (and people live with it). This is why they are used so much in graphics and gaming. In this category I put the so-called "projection" methods, where you integrate up to a candidate position (without constraints) and then project the positions back onto the constraint manifold. They are also known as step-and-project (SAP). Here's where I put PBD and "fast projection" too, as they are approximate solvers for SAP. In fact, I will often say PBD instead of SAP, unless I make it clear that I'm referring to the nonlinear Gauss-Seidel projection popularized by Nvidia.

I will not go into the nitty gritty details of equations, but there is one important point I want to make. Neither SHAKE nor PBD preserve the hidden constraint that velocities along constraint directions should be zero (the normal component to the constraint manifold or the first derivative of the constraint equation). And I will prove it with a plot of the velocity constraint drift for the case of a simple pendulum.

RATTLE is the only scheme that calculates the correct velocity. And it does it at the price of an extra projection step at the end for the velocity. Given that the position constraint (explicit) projection at the beginning of RATTLE is nonlinear, and the velocity projection is always linear, it seems like a small price to pay. It's only one linear system to solve, compared to several if using Newton for the positions. So, here's what I did, I took RATTLE and I replaced the first part with PBD. And the results were great in terms of velocity. I called it PROJ2 for lack of a better name and because I project twice: first the positions and then the velocities.

But why should we bother? SHAKE and PBD work perfectly well the way they are. Indeed, if you are dealing with equality constraints. But if you are dealing with contacts (and friction) velocity starts to matter more. That high velocity error in the plot will become the impact reflected velocity. This is not an issue with equality constraints, as no matter what velocity they have along the constraint direction, the bodies will stay pinned to the constrained manifold (up to a solver tolerance). So, this is why I modified RATTLE like I did. It's basically the modification you need to do to PBD in order to make it explosion free for contacts.

I have not not tried PROJ2 on practical contacts yet, but I did something similar in the past. I projected the velocities before running PBD and everything was great. If you think about it, it's the same thing, if you move the last step of RATTLE at the beginning of the next frame. So, I know it works! It made contacts much smoother. The only reluctance I had, like everybody else, was that it was more expensive. And now I know the answer: it doesn't matter! It's worth the price if your contacts are fully inelastic and not exploding. And you also get rid of that stabilization term...

Think about it: first project velocities and then positions. Does it sound familiar? The latter step is sometimes called "post-projection": where you correct the positions after you solve the problem at velocity level in order to remove any penetration. This technique is usually attributed to a paper by Cline and Pai, but ultimately it's just projection. Usually the problem is linearized, but when dealing with contact, the normals are usually considered constant throughout the solve, so it is linear. Of course, it would be much better if you used PBD instead, but the border between post-projection and PBD is quite gray to be honest anyway.

One more thing, what's wrong with stabilization? Nothing, if you are writing a rigid body simulator where you don't care so much about small penetrations. And then you can use a small stabilization factor and clamp the velocity to avoid explosions. But if you care about having zero penetrations and correct normal velocities, you should use the post-projection step. In fact, this has been a debate for years in the community, and for that reason Box2D uses post-projection, while most other engines uses stabilization or a mixture of the two. Now, I tend to agree with Box2D. By the way, if you care about that little analogy between implicit springs and the stabilization factor and the "constraint force mixing" factor, I would say forget it - I think it's pretty much wrong anyway (or at least not rigorous).

In the past, I feared that post-projection dissipates artificially too much energy. But this phase plot for the pendulum made me feel better.

In fact, it preserves energy much better than PBD. So, it has two merits: it reduces dissipation for PBD and solves penetration problem for velocity level methods. From now on I think this is what I'm going to use as a contact solver (and constraints in general), no matter how expensive it is.

Friday, March 19, 2021

The many faces of projection

What I want to talk about in fact is the Jacobian matrix of the constraint function. All these things that I want to mention are probably clear to people with strong backgrounds in linear algebra and differential geometry. But for me they were revelations of sort many of them.

All right, so maybe the first thing to start with is the fact the Jacobian matrix $\mathbf{J}$ of a function $g$ is the analog of the gradient operator or the normal to a surface. And this is why for many people $\mathbf{J}$ is synonymous to the constraint direction(s). Because the normal to the constraint manifold is transversal to the space of motion, hence along the constrained directions that can't move. Think of a pendulum: the constraint manifold is a circle and the normal to it is along the radius, which is exactly the pendulum's (constrained) direction.

But some math books also call $\mathbf{J}$ the differential (in coordinates). And this takes me to a confusion I always made in the beginning: for a single constraint, should $\mathbf{J}$ be a row or column vector? And it seemed other people weren't sure about it either in their notation. But I stuck with it being a row vector, as that was the majority view. And now it makes sense, because it is a covector (mainly because it has the components of a differential which is a 1-form, same thing). Hence, $\mathbf{J} \equiv dg$ and $\mathbf{J}^T \equiv \nabla g$.

I still don't fully grasp the concepts and that's why I don't want to talk formally about these things. But I do have to stress the fact that the configuration space of our mechanical system (where all the generalized coordinates reside) is not really $\mathbb{R}^n$, but one that has a metric (a curved space) given by the mass matrix $\mathbf{M}$ (i.e., the kinetic energy). It basically tells you that not all bodies are equal but weighed by their mass information. If this weren't the case, then vectors and covectors in $\mathbb{R}^n$ would be the same thing. But they're not! If for velocity (tangent) vectors $\mathbf{v}$ the norm is given by the well known kinetic energy ($\tfrac{1}{2}\mathbf{v}^T \mathbf{Mv}$), then for covectors the matrix is now the inverse of $\mathbf{M}$. Think about momenta $\mathbf{p}$, the kinetic energy is the same, but now we have to divide by the mass rather than multiply ($\tfrac{1}{2}\mathbf{p}^T \mathbf{M}^{-1}\mathbf{p}$). And you have to multiply velocities by mass to get momenta $\mathbf{p} = \mathbf{Mv}$ and the other way around. Some other covectors are forces (as they work directly on the momenta) and surface normals; and they inhabit a place called the cotangent space.

Now let's fall back from geometry to linear algebra. Locally, $\mathbf{J}$ is a linearization of $g$ and maps the tangent space of the configuration space to the tangent space of the "constraint" space $\mathbb{R}^m$. So, it's a linear transform, thus a rectangular matrix. And it follows all the rules of linear algebra. Ideally, it's full rank and fat (rather than thin, i.e., $m <n$) in order for the system to be solvable. The image of $\mathbf{J}$ is the whole of $\mathbb{R}^m$, which can be viewed as just a collection of numbers, just as for one constraint you get only one number. The meaning of these numbers comes from the analogy with the normal vector, in fact any vector for that matter. They are the magnitudes or components of the projection of a vector $\mathbf{v}$ on the "normal" $\mathbf{J}$, so $\mathbf{Jv}$ is just that. For one constraint it is a number, in general it is a $m$-dimensional vector because the "normal" $\mathbf{J}$ spans an $m$-dimensional space. You can think of taking each row of $\mathbf{J}$ and dotting it with $\mathbf{v}$ to get the projection length. And in fact, from linear algebra we know that the row space of $\mathbf{J}$ is formed by vectors $\mathbf{J}^T\mathbf{y}$, so we know that the rows of $\mathbf{J}$ span the "normal" or "constraint" space of dimension $m$. This space is orthogonal to the tangent space of the constraint manifold, which is precisely the null space of $\mathbf{J}$, i.e., vectors satisfying $\mathbf{Jv}=0$.

What does this tell us about $\mathbf{J}$? That it takes any vector $\mathbf{v}$ and converts it into its projection components. What about the reverse direction? It will only reconstruct the "normal" part of those vectors. So, you lose some information, but this is fine because you obtain the projected vector $\mathbf{J}^T\mathbf{Jv}$ (almost, stay tuned). If you think about it, this is what you do when you project on a unit vector $\mathbf{n}$: $\mathbf{nn}^T\mathbf{v}$. So you may say that $\mathbf{J}^T\mathbf{Jv}$ is a projection operator. Note that if $\mathbf{v}$ is already in the row space of $\mathbf{J}$ no information is lost and projection leaves it unchanged. Some people call $\mathbf{Jv}$ a "projection" and $\mathbf{J}^T\mathbf{v}$ an "un-projection", but this is not really true, as they do not satisfy the definition of a projection operator; but they do part of the job, so we may call these matrices something like "projectors" (when in fact they define the space we are projecting upon).

But this not the whole truth. In 3D you also have to divide by the norm squared $\mathbf{n}^T\mathbf{n}$ if we're dealing with a non-unit vector. And you have to do the same in $n$-dimensional space. Linear algebra and least square problems give you the formula $\mathbf{A}^{-1}\mathbf{Jv}$ for computing the actual projection lengths, where $\mathbf{A}=\mathbf{JJ}^T$. Here $\mathbf{A}$ plays the role of the norm squared of $\mathbf{J}$ that you divide by. Finally, the projection that takes these lengths back to configuration space is $\mathbf{P}=\mathbf{JA}^{-1}\mathbf{J}^T$ and this is a genuine projection operator. Of course, if you're lucky and $\mathbf{A}$ is the identity then the previous paragraph still works, but that's rarely the case.

The projection $\mathbf{P}$ or rather its complement $\mathbf{I}-\mathbf{P}$ is usually called the "canonical" projection on the constraint manifold. And you may recognize the formula from the analytical solution of linearized velocity-level constrained dynamics. In practice you usually do not invert $\mathbf{A}$, but rather solve a linear system where the unknowns are the projection lengths. But $\mathbf{A}$ is not really the same, is it?

So, what is the meaning behind $\mathbf{A}$ anyway? Some people call it the reduced or effective mass, others call it the Delassus operator, others compliance and so on. I think I hinted already at it: $\mathbf{A}$ determines the metric of the "constraint" space. In fact, it's the inverse $\mathbf{A}^{-1}$ that is the true effective mass, the one seen by the degrees of freedom along the constraint directions, and helps define the kinetic energy there. (I picture it as the transformation of the mass matrix $\mathbf{M}$ into "constraint" space). And $\mathbf{A}$ is actually the metric of the cotangent "constraint" space. This makes sense as $\mathbf{J}$ spans the cotangent space of the constraint manifold (in configuration space) and there the metric is $\mathbf{M}^{-1}$, so our "norm squared" becomes $\mathbf{A}=\mathbf{JM}^{-1}\mathbf{J}^T$. This is exactly the matrix that appears in constrained dynamics.

I want to end by noting the fun fact that by deriving a projection operator (pretty similar to what you would do in 3D), we actually solved for the Lagrange multipliers in constrained dynamics. In fact, projection is an optimization problem: find the shortest distance from a point to a surface. And the closest point happens to be on the perpendicular to the surface, one like our "normal" $\mathbf{J}$. Pretty much what we talked about. So, deriving the projector $\mathbf{P}$ amounts to solving the projection (or least squares) problem. Or it does at least for velocity vectors. The general case for projecting a point in configuration space on the constraint manifold is a bit trickier as it's a nonlinear problem, but it can be broken down into such linear steps. But that's another story...

To summarize, in order to solve the (linearized) constrained dynamics problem one simply needs to project velocities into the tangent space which is the null space of $\mathbf{J}$. This is done by projecting them onto the space spanned by the columns of $\mathbf{J}^T$ and then taking the orthogonal complement of that projection.