# Classical optimizers: PSO and BO

This protocol is extracted from the research article:
Training of quantum circuits on a hybrid quantum computer

## Procedure

We explored two different classical optimizers in this study: PSO and BO.

PSO is a gradient-free optimization method inspired by the social behavior of some animals. Each particle represents a candidate solution and moves within the solution space according to its current performance and the performance of the swarm. Three hyperparameters control the dynamics of the swarm: a cognition coefficient c1, a social coefficient c2, and an inertia coefficient w (24).

Concretely, each particle consists of a position vector $\theta_i$ and a velocity vector $v_i$. At iteration $t$ of the algorithm, the velocity of particle $i$ along coordinate $d$ is updated as

$$v_{i,d}(t+1) = w\,v_{i,d}(t) + c_1\,r_{1,d}(t)\,\left[p_{i,d}(t) - \theta_{i,d}(t)\right] + c_2\,r_{2,d}(t)\,\left[g_d(t) - \theta_{i,d}(t)\right] \tag{1}$$

where $r_{1,d}(t)$ and $r_{2,d}(t)$ are random numbers sampled from the uniform distribution on $[0,1]$ for every dimension and every iteration, $p_i(t)$ is particle $i$'s best position so far, and $g(t)$ is the swarm's best position. The position is then updated as

$$\theta_i(t+1) = \theta_i(t) + v_i(t) \tag{2}$$
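The two update rules above can be sketched as a single vectorized step; the function name and array layout below are illustrative, not from the original study.

```python
import numpy as np

def pso_step(theta, v, p_best, g_best, w=0.5, c1=1.0, c2=1.0, rng=None):
    """One PSO iteration: Eq. 1 (velocity update) followed by Eq. 2 (position update).

    theta, v, p_best : arrays of shape (n_particles, n_dims)
    g_best           : array of shape (n_dims,), the swarm's best position
    """
    rng = rng if rng is not None else np.random.default_rng()
    # Fresh U[0, 1] draws for every particle, dimension, and iteration.
    r1 = rng.uniform(size=theta.shape)
    r2 = rng.uniform(size=theta.shape)
    v_new = w * v + c1 * r1 * (p_best - theta) + c2 * r2 * (g_best - theta)
    return theta + v_new, v_new
```

Note that `g_best` is broadcast across particles, matching the single swarm-wide best position $g(t)$ in Eq. 1.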

In our problem, each particle corresponds to a point in the parameter space of the quantum circuit. For example, in the fully connected circuit with two layers, each particle is an instance of the 14 parameters. Recall, however, that the parameters are angles and therefore periodic; we customized the PSO updates above to exploit this. In Eq. 1, $p_{i,d}(t)$ and $\theta_{i,d}(t)$ can be thought of as two points on a circle. Instead of the standard displacement $p_{i,d}(t) - \theta_{i,d}(t)$, we used the angular displacement, that is, the signed length of the minor arc on the unit circle. We applied the same definition of displacement to the swarm's best position $g_d(t)$. Last, in Eq. 2, we made sure to always express angles using their principal values.
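A minimal sketch of the angular displacement and principal-value wrapping described above (helper names are ours; the wrapping range $[-\pi, \pi)$ is one conventional choice of principal values):

```python
import numpy as np

def angular_displacement(target, theta):
    """Signed length of the minor arc from theta to target on the unit circle,
    wrapped to [-pi, pi). Replaces the plain difference (target - theta)."""
    return (target - theta + np.pi) % (2 * np.pi) - np.pi

def principal_value(theta):
    """Wrap angles to their principal values in [-pi, pi)."""
    return (theta + np.pi) % (2 * np.pi) - np.pi
```

With these helpers, the plain differences in Eq. 1 become `angular_displacement(p_best, theta)` and `angular_displacement(g_best, theta)`, and Eq. 2 becomes `principal_value(theta + v_new)`.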

In our experiments, we set the number of particles to twice the number of parameters of the circuit. Position and velocity vectors of each particle were initialized from the uniform distribution. For the coefficients, we used c1 = c2 = 1 and w = 0.5.
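The setup just described can be written out as follows; the article does not state the range of the uniform initialization, so the range $[-\pi, \pi)$ below is our assumption.

```python
import numpy as np

n_params = 14                 # e.g., fully connected circuit with two layers
n_particles = 2 * n_params    # twice the number of circuit parameters
rng = np.random.default_rng()

# Uniform initialization of positions and velocities
# (the range [-pi, pi) is an assumption, not stated in the article).
theta = rng.uniform(-np.pi, np.pi, size=(n_particles, n_params))
v = rng.uniform(-np.pi, np.pi, size=(n_particles, n_params))

w, c1, c2 = 0.5, 1.0, 1.0     # coefficients used in the study
```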

BO is a powerful global optimization paradigm. It is best suited to finding optima of multimodal objective functions that are expensive to evaluate. Two main features characterize the BO process: a surrogate model and an acquisition function.

The surrogate model is a nonparametric model of the objective function. At each iteration, it is updated using the points sampled in parameter space. The package used in this study is OPTaaS by Mind Foundry, which implements the surrogate model as Gaussian process regression (36). The Gaussian process is characterized by a kernel (or correlation function); we used a Matérn 5/2 kernel, as it provides the most flexibility.
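OPTaaS is proprietary, but the underlying idea can be illustrated with a bare-bones Gaussian process using the Matérn 5/2 kernel; everything below is a generic textbook sketch, not the OPTaaS implementation.

```python
import numpy as np

def matern52(x1, x2, lengthscale=1.0):
    """Matern 5/2 correlation: (1 + s + s^2/3) * exp(-s), with s = sqrt(5)*r."""
    r = np.linalg.norm(np.asarray(x1) - np.asarray(x2)) / lengthscale
    s = np.sqrt(5.0) * r
    return (1.0 + s + s ** 2 / 3.0) * np.exp(-s)

def gp_posterior(X, y, x_star, noise=1e-8):
    """Posterior mean and variance of a zero-mean GP surrogate at x_star,
    given evaluated points X with objective values y."""
    n = len(X)
    K = np.array([[matern52(a, b) for b in X] for a in X]) + noise * np.eye(n)
    k = np.array([matern52(a, x_star) for a in X])
    alpha = np.linalg.solve(K, np.asarray(y, dtype=float))
    mean = k @ alpha
    var = matern52(x_star, x_star) - k @ np.linalg.solve(K, k)
    return mean, var
```

At an already-evaluated point the posterior mean reproduces the observed value and the variance collapses toward zero, which is what makes the surrogate useful for guiding the search.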

The acquisition function is computed from the surrogate model and is used to select the points to evaluate during the optimization. It trades off exploration against exploitation: a point has a high acquisition value if, according to the surrogate model, the cost function is expected to give a notable improvement over historically sampled points, or if the uncertainty at that point is high. We used the simple and well-known Expected Improvement acquisition function (37).
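For a Gaussian surrogate, Expected Improvement has a standard closed form; the sketch below assumes minimization and takes the surrogate's posterior mean and standard deviation at a candidate point.

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mean, std, best_so_far):
    """EI for minimization: E[max(best_so_far - f, 0)] with f ~ N(mean, std^2).

    Large when the predicted value beats the incumbent (exploitation)
    or when the predictive std is large (exploration).
    """
    if std <= 0.0:
        return max(best_so_far - mean, 0.0)
    z = (best_so_far - mean) / std
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))    # standard normal CDF at z
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)  # standard normal PDF at z
    return (best_so_far - mean) * cdf + std * pdf
```

The two terms mirror the trade-off described above: the first rewards a mean prediction below the incumbent, the second rewards high predictive uncertainty.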

In our case, OPTaaS also leverages the cyclic symmetry of the angles by embedding the parameter space into a metric space with the appropriate topology, effectively allowing the Gaussian process surrogate model to be placed over a hypertorus rather than a hypercube. This greatly alleviates the so-called curse of dimensionality (38) and allows for much more efficient use of samples of the objective function.
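One simple way to realize such an embedding (the details of the OPTaaS implementation are not public, so this is only an illustration) is to map each angle to its cosine and sine, so that Euclidean distance in the embedded space respects the cyclic topology:

```python
import numpy as np

def torus_embed(theta):
    """Map a vector of d angles to a point in R^{2d} on a hypertorus:
    theta and theta + 2*pi embed to the same point, so a kernel computed
    on embedded distances automatically respects periodicity."""
    theta = np.asarray(theta, dtype=float)
    return np.concatenate([np.cos(theta), np.sin(theta)])
```

Angles that are close on the circle but far apart as raw numbers (e.g., $0.1$ and $2\pi - 0.1$) end up close in the embedded space, so the surrogate model correctly treats them as similar parameter settings.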

Adequately optimizing the acquisition function at each iteration is key in BO. OPTaaS puts considerable computational resources toward this nonconvex optimization problem.

There are two major reasons why BO outperforms PSO in our specific case. First, PSO spends a significant amount of computational resources exploring trajectories far from the optimum, whereas BO mitigates this through the acquisition function. Second, maintaining the surrogate model makes much better use of the information gathered during the historical exploration of the parameter space.
