diff --git a/_sass/thesis.scss b/_sass/thesis.scss index 5fc5322..7fcd559 100644 --- a/_sass/thesis.scss +++ b/_sass/thesis.scss @@ -88,5 +88,11 @@ nav.overall-table-of-contents > ul { // Page header div#page-header { + //make the header sticky, I don't really like how this looks but it's fun to play with + // position: sticky; + // top: 0px; + // background: white; + // z-index: 10; + // width: 100%; p { margin-block-end: 0px;} } \ No newline at end of file diff --git a/_thesis/1_Introduction/1_Intro.html b/_thesis/1_Introduction/1_Intro.html index a6efefb..2f5a923 100644 --- a/_thesis/1_Introduction/1_Intro.html +++ b/_thesis/1_Introduction/1_Intro.html @@ -595,8 +595,7 @@ to the Falikov-Kimball Model, the Kitaev Honeycomb Model, disorder and localisation. Then Chapter 3 introduces and studies the Long Range Falikov-Kimball Model in one dimension while Chapter 4 focusses on the Amorphous Kitaev Model.

-

Next Chapter: 2 +

Next Chapter: 2 Background

diff --git a/_thesis/2_Background/2.1_FK_Model.html b/_thesis/2_Background/2.1_FK_Model.html index e2c5928..cf3dc3b 100644 --- a/_thesis/2_Background/2.1_FK_Model.html +++ b/_thesis/2_Background/2.1_FK_Model.html @@ -138,7 +138,7 @@ href="#ref-gruberFalicovKimballModel2005" role="doc-biblioref">9]. The absence of a hopping term for the heavy electrons means they do not need the factor of \(\epsilon_i\). See appendix A.1 +href="../6_Appendices/A.1_Particle_Hole_Symmetry-Copy1.html#particle-hole-symmetry">A.1 for a full derivation of the PH symmetry.

@@ -419,9 +419,8 @@ j|^{-\alpha} S_i S_j\) as the exponent of the interaction \(\alpha\) is varied.
-

Next Section: The -Kitaev Honeycomb Model

+

Next Section: The Kitaev +Honeycomb Model

diff --git a/_thesis/2_Background/2.2_HKM_Model.html b/_thesis/2_Background/2.2_HKM_Model.html index f510c47..8007b18 100644 --- a/_thesis/2_Background/2.2_HKM_Model.html +++ b/_thesis/2_Background/2.2_HKM_Model.html @@ -140,8 +140,7 @@ role="doc-biblioref">1]

Phase Diagram

-

Next Section: Disorder +

Next Section: Disorder and Localisation

diff --git a/_thesis/2_Background/2.3_Disorder.html b/_thesis/2_Background/2.3_Disorder.html index 8b2820e..c29b041 100644 --- a/_thesis/2_Background/2.3_Disorder.html +++ b/_thesis/2_Background/2.3_Disorder.html @@ -174,8 +174,8 @@ timescales to the infinite limit.

-link to the Kitaev Model

-link to the physics of amorphous systems

Next Chapter: 3 -The Long Range Falikov-Kimball Model

+href="../3_Long_Range_Falikov_Kimball/3.1_LRFK_Model.html">3 The Long +Range Falikov-Kimball Model

diff --git a/_thesis/3_Long_Range_Falikov_Kimball/3.1_LRFK_Model.html b/_thesis/3_Long_Range_Falikov_Kimball/3.1_LRFK_Model.html index 46036c8..8cfa62b 100644 --- a/_thesis/3_Long_Range_Falikov_Kimball/3.1_LRFK_Model.html +++ b/_thesis/3_Long_Range_Falikov_Kimball/3.1_LRFK_Model.html @@ -272,7 +272,7 @@ href="#ref-fukuiOrderNClusterMonte2009" role="doc-biblioref">30]. We only consider even system sizes given that odd system sizes are not commensurate with a CDW state.

Next Section: Methods

+href="../3_Long_Range_Falikov_Kimball/3.2_LRFK_Methods.html">Methods

Bibliography

diff --git a/_thesis/3_Long_Range_Falikov_Kimball/3.2_LRFK_Methods.html b/_thesis/3_Long_Range_Falikov_Kimball/3.2_LRFK_Methods.html index 56fb8f0..50f3598 100644 --- a/_thesis/3_Long_Range_Falikov_Kimball/3.2_LRFK_Methods.html +++ b/_thesis/3_Long_Range_Falikov_Kimball/3.2_LRFK_Methods.html @@ -29,58 +29,20 @@ image: @@ -118,58 +70,20 @@ Trick @@ -198,76 +102,171 @@ Trick --> +

Methods

-
-

Markov Chain Monte Carlo

+
+

Thermodynamics of the LRFK Model

+

The results for the phase diagram were obtained with a classical +Markov Chain Monte Carlo (MCMC) method which we discuss in the +following. It allows us to solve our long-range FK model efficiently, +yielding unbiased estimates of thermal expectation values and linking it +to disorder physics in a translationally invariant setting.

+

Since the spin configurations are classical, the Hamiltonian can be +split into a classical spin part \(H_s\) and an operator valued part \(H_c\).

+

\[\begin{aligned} +H_s& = - \frac{U}{2}\sum_i S_i + \sum_{i, j}^{N} J_{ij} S_i S_j \\ +H_c& = \sum_i U S_i c^\dagger_{i}c_{i} -t(c^\dagger_{i}c_{i+1} + +c^\dagger_{i+1}c_{i}) \end{aligned}\]</p>

+

The partition function can then be written as a sum over spin +configurations, \(\vec{S} = (S_0, +S_1...S_{N-1})\):

+

\[\begin{aligned} +\mathcal{Z} = \mathrm{Tr} e^{-\beta H}= \sum_{\vec{S}} e^{-\beta H_s} +\mathrm{Tr}_c e^{-\beta H_c} .\end{aligned}\]

+

The contribution of \(H_c\) to the +grand canonical partition function can be obtained by performing the sum +over eigenstate occupation numbers giving \(-\beta F_c[\vec{S}] = \sum_k \ln{(1 + e^{- \beta +\epsilon_k})}\) where \({\epsilon_k[\vec{S}]}\) are the eigenvalues +of the matrix representation of \(H_c\) +determined through exact diagonalisation. This gives a partition +function containing a classical energy which corresponds to the +long-range interaction of the spins, and a free energy which corresponds +to the quantum subsystem. \[\begin{aligned} +\mathcal{Z} = \sum_{\vec{S}} e^{-\beta H_S[\vec{S}] - \beta +F_c[\vec{S}]} = \sum_{\vec{S}} e^{-\beta +E[\vec{S}]}\end{aligned}\]

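The decomposition above translates directly into code. Below is a minimal numpy sketch of evaluating \(E[\vec{S}] = H_s[\vec{S}] + F_c[\vec{S}]\) for a single spin configuration, assuming nearest-neighbour hopping with open boundaries and a power-law coupling \(J_{ij} = J|i-j|^{-\alpha}\) summed once per pair; the function name and default parameter values are illustrative only.

```python
import numpy as np

def total_energy(S, U=5.0, J=5.0, alpha=1.25, t=1.0, beta=1.0):
    """E[S] = H_s[S] + F_c[S] for one classical spin configuration S of +/-1 values."""
    N = len(S)
    # Classical part: on-site term plus long-range Ising coupling, each pair counted once.
    i, j = np.triu_indices(N, k=1)
    J_ij = J * np.abs(i - j).astype(float) ** -alpha
    H_s = -0.5 * U * S.sum() + np.sum(J_ij * S[i] * S[j])
    # Quantum part: exact diagonalisation of H_c, a tridiagonal hopping matrix
    # with the spin-dependent potential U*S_i on the diagonal (open boundaries).
    H_c = np.diag(U * S) - t * (np.eye(N, k=1) + np.eye(N, k=-1))
    eps = np.linalg.eigvalsh(H_c)
    # -beta F_c = sum_k ln(1 + e^{-beta eps_k}), evaluated in a numerically stable way.
    F_c = -np.sum(np.logaddexp(0.0, -beta * eps)) / beta
    return H_s + F_c
```

Each MCMC step then only ever needs the difference of this quantity between the current and proposed configurations.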
-
-

Sampling

-

Markov Chain Monte Carlo (MCMC) is a useful method whenever we have a -probability distribution that we want to sample from but there is not -direct sampling way to do so.

-

In almost any computer simulation the ultimate source of randomness -is a stream of (close to) uniform, uncorrelated bits generated from a -pseudo random number generator. A direct sampling method takes such a -source and outputs uncorrelated samples from the target distribution. -The fact they’re uncorrelated is key as we’ll see later. Examples of -direct sampling methods range from the trivial: take n random bits to -generate integers uniformly between 0 and \(2^n\) to more complex methods such as -inverse transform sampling and rejection sampling  [1].

-

In physics the distribution we usually want to sample from is the -Boltzmann probability over states of the system \(S\): \[ -\begin{aligned} -p(S) &= \frac{1}{\mathcal{Z}} e^{-\beta H(S)} \\ -\end{aligned} -\] where \(\mathcal{Z} = \sum_S -e^{-\beta H(S)}\) is the normalisation factor and ubiquitous -partition function. In principle we could directly sample from this, for -a discrete system there are finitely many choices. We could calculate -the probability of each one and assign each a region of the unit -interval which we could then sample uniformly from.

-

However if we actually try to do this we will run into two problems, -we can’t calculate \(\mathcal{Z}\) for -any reasonably sized systems because the state space grows exponentially -with system size. Even if we could calculate \(\mathcal{Z}\), sampling from an -exponentially large number of options quickly become tricky. This kind -of problem happens in many other disciplines too, particularly when -fitting statistical models using Bayesian inference  [2].

-
-
-

Markov Chains

-

So what can we do? Well it turns out that if we’re willing to give up -in the requirement that the samples be uncorrelated then we can use MCMC -instead.

-

MCMC defines a weighted random walk over the states \((S_0, S_1, S_2, ...)\), such that in the -long time limit, states are visited according to their probability \(p(S)\) +

Markov Chain Monte Carlo and Emergent Disorder

+

Classical MCMC defines a weighted random walk over the spin states +\((\vec{S}_0, \vec{S}_1, \vec{S}_2, +...)\), such that the likelihood of visiting a particular state +converges to its Boltzmann probability \(p(\vec{S}) = \mathcal{Z}^{-1} e^{-\beta +E}\)  [315].

-

In a physics context this lets us evaluate any observable with a mean -over the states visited by the walk. 3]. Hence, any observable can be +estimated as a mean over the states visited by the walk, \[\begin{aligned} -\langle O \rangle & = \sum_{S} p(S) \langle O \rangle_{S} = \sum_{i -= 0}^{M} \expval{O}_{S_i} + \mathcal{O}(\tfrac{1}{\sqrt{M}})\\ +\label{eq:thermal_expectation} +\langle O \rangle & = \sum_{\vec{S}} p(\vec{S}) \langle O +\rangle_{\vec{S}}\\ + & = \sum_{i = 0}^{M} \langle O\rangle_{\vec{S}_i} +\pm \mathcal{O}(\tfrac{1}{\sqrt{M}}) +\end{aligned}\] where the former sum runs over the entire state +space while the latter runs over all the states visited by a particular +MCMC run.</p>

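As a concrete illustration of the \(\mathcal{O}(1/\sqrt{M})\) error quoted above, a small helper (assuming, optimistically, uncorrelated samples; autocorrelation inflates the true error) for averaging an observable over the visited states:

```python
import numpy as np

def mcmc_estimate(samples):
    """Mean of an observable over M MCMC samples plus its naive standard error.

    The s/sqrt(M) error estimate assumes uncorrelated samples; correlated
    samples effectively contain fewer independent measurements.
    """
    samples = np.asarray(samples, dtype=float)
    return samples.mean(), samples.std(ddof=1) / np.sqrt(len(samples))
```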
+

\[\begin{aligned} +\langle O \rangle_{\vec{S}}& = \sum_{\nu} n_F(\epsilon_{\nu}) +\langle O \rangle_{\nu} \end{aligned}\]</p>

+

Where \(\nu\) runs over the +eigenstates of \(H_c\) for a particular +spin configuration and \(n_F(\epsilon) = +\left(e^{\beta\epsilon} + 1\right)^{-1}\) is the Fermi +function.</p>

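For instance, the per-configuration expectation of the site occupations \(\langle n_i \rangle_{\vec{S}}\) follows directly from the eigendecomposition of \(H_c\). A sketch, using the standard Fermi function convention \(n_F(\epsilon) = (e^{\beta\epsilon}+1)^{-1}\) at zero chemical potential:

```python
import numpy as np

def occupations(H_c, beta):
    """<n_i> = sum_nu n_F(eps_nu) |psi_nu(i)|^2 for one spin configuration."""
    eps, psi = np.linalg.eigh(H_c)           # columns of psi are the eigenvectors
    n_F = 1.0 / (np.exp(beta * eps) + 1.0)   # Fermi factor for each eigenvalue
    return (np.abs(psi) ** 2) @ n_F          # weight each eigenstate's density
```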
The choice of the transition function for MCMC is under-determined as one only needs to satisfy a set of balance conditions for which there are many solutions  [6].

+role="doc-biblioref">4]. Here, we incorporate a modification +to the standard Metropolis-Hastings algorithm  [5] gleaned from Krauth  [6]. Let us first recall the standard +algorithm which decomposes the transition probability into \(\mathcal{T}(a \to b) = p(a \to b)\mathcal{A}(a \to +b)\). Here, \(p\) is the +proposal distribution that we can directly sample from while \(\mathcal{A}\) is the acceptance +probability. The standard Metropolis-Hastings choice is \[\mathcal{A}(a \to b) = \min\left(1, \frac{p(b\to +a)}{p(a\to b)} e^{-\beta \Delta E}\right)\;,\] with \(\Delta E = E_b - E_a\). The walk then +proceeds by sampling a state \(b\) from +\(p\) and moving to \(b\) with probability \(\mathcal{A}(a \to b)\). The latter +operation is typically implemented by performing a transition if a +uniform random sample from the unit interval is less than \(\mathcal{A}(a \to b)\) and otherwise +repeating the current state as the next step in the random walk. The +proposal distribution is often symmetric so does not appear in \(\mathcal{A}\). Here, we generate proposals by +flipping a small number of sites of the current state at random, which +is indeed symmetric.</p>

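The standard algorithm just recalled can be summarised in a few lines. A sketch of a single Metropolis-Hastings step with the symmetric spin-flip proposal; the `energy` callable, standing in for the full \(H_s + F_c\) evaluation, is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(S, energy, beta, n_flips=1):
    """One Metropolis-Hastings step with a symmetric spin-flip proposal."""
    proposal = S.copy()
    sites = rng.choice(len(S), size=n_flips, replace=False)
    proposal[sites] *= -1                  # flip a few sites at random
    dE = energy(proposal) - energy(S)
    if rng.random() < np.exp(-beta * dE):  # accept with probability min(1, e^{-beta dE})
        return proposal
    return S                               # reject: repeat the current state
```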
+

In our computations  [7] we +employ a modification of the algorithm which is based on the observation +that the free energy of the FK system is composed of a +classical part which is much quicker to compute than the quantum part. +Hence, we can obtain a computational speedup by first considering the +classical energy difference \(\Delta H_s\) and rejecting the transition +at this stage if it is too large. We compute the quantum energy difference +\(\Delta F_c\) only if the proposal survives this first test. +We then perform a second rejection sampling step based upon +it. This corresponds to two nested comparisons, with the majority of the +work occurring only if the first test passes, giving the acceptance +function \[\mathcal{A}(a \to b) = +\min\left(1, e^{-\beta \Delta H_s}\right)\min\left(1, e^{-\beta \Delta +F_c}\right)\;.\]</p>

+

See Appendix [app:balance] for a proof that this +satisfies the detailed balance condition.

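The two nested comparisons can be sketched as follows; `quantum_diff` is a hypothetical callable wrapping the expensive exact-diagonalisation step, invoked only when the cheap classical test passes:

```python
import numpy as np

rng = np.random.default_rng(1)

def two_step_accept(beta, dH_s, quantum_diff):
    """A(a->b) = min(1, e^{-beta dH_s}) * min(1, e^{-beta dF_c}), evaluated lazily."""
    if rng.random() >= np.exp(-beta * dH_s):
        return False                  # rejected classically: diagonalisation skipped
    dF_c = quantum_diff()             # expensive step, only reached on survival
    return bool(rng.random() < np.exp(-beta * dF_c))
```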
+

For the model parameters used in Fig. 1, we find +that with our new scheme the matrix diagonalisation is skipped around +30% of the time at \(T = 2.5\) and up +to 80% at \(T = 1.5\). We observe that +for \(N = 50\), the matrix +diagonalisation, if it occurs, occupies around 60% of the total +computation time for a single step. This rises to 90% at N = 300 and +further increases for larger N. We therefore get the greatest speedup +for large system sizes at low temperature where many prospective +transitions are rejected at the classical stage and the matrix +computation takes up the greatest fraction of the total computation +time. The upshot is that we find a speedup of up to a factor of 10 at +the cost of very little extra algorithmic complexity.

+

Our two-step method should be distinguished from the more common +method for speeding up MCMC which is to add asymmetry +to the proposal distribution to make it as similar as possible to \(\min\left(1, e^{-\beta \Delta E}\right)\). +This reduces the number of rejected states, which brings the algorithm +closer in efficiency to a direct sampling method. However it comes at +the expense of requiring a way to directly sample from this complex +distribution, a problem which MCMC was employed to solve in +the first place. For example, recent work trains restricted Boltzmann +machines (RBMs) to generate samples for the proposal distribution of the +FK model  [8]. The RBMs are chosen as a +parametrisation of the proposal distribution that can be efficiently +sampled from while offering sufficient flexibility that they can be +adjusted to match the target distribution. Our proposed method is +considerably simpler and does not require training while still reaping +some of the benefits of reduced computation.

Application to the FK Model

@@ -302,206 +301,67 @@ to the quantum subsystem. \[\begin{aligned} \mathcal{Z} = \sum_{\vec{S}} e^{-\beta H_S[\vec{S}] - \beta F_c[\vec{S}]} = \sum_{\vec{S}} e^{-\beta E[\vec{S}]} \end{aligned}\]

-
-

Markov Chain Monte Carlo

-

Markov Chain Monte Carlo (MCMC) is a technique for evaluating thermal -expectation values \(\expval{O}\) with -respect to some physical system defined by a set of states \(\{x: x \in S\}\) and a free energy \(F(x)\)  [ +

+

Two Step Trick

+

Here, we incorporate a modification to the standard +Metropolis-Hastings algorithm  [5] gleaned from Krauth  [7]. The thermal expectation value is -defined via a Boltzmann weighted sum over the entire states: \[ -\begin{aligned} - \expval{O} &= \frac{1}{\mathcal{Z}} \sum_{x \in S} O(x) P(x) \\ - P(x) &= \frac{1}{\mathcal{Z}} e^{-\beta F(x)} \\ - \mathcal{Z} &= \sum_{x \in S} e^{-\beta F(x)} -\end{aligned} -\]

-

When the state space is too large to evaluate this sum directly, MCMC -defines a stochastic algorithm which generates a random walk \(\{x_0\ldots x_i\ldots x_N\}\) whose -distribution \(p(x_i)\) approaches a -target distribution \(P(x)\) in the -large N limit.

-

\[\lim_{i\to\infty} p(x_i) = -P(x)\]

-

In this case the target distribution will be the thermal one \(P(x) \rightarrow \mathcal{Z}^{-1} e^{-\beta -F(x)}\). The major benefit of the method being that as long as -one can express the desired \(P(x)\) up -to a multiplicative constant, MCMC can be applied. Since \(e^{-\beta F(x)}\) is relatively easy to -evaluate, MCMC provides a useful method for finite temperature -physics.

-

Once the random walk has been carried out for many steps, the -expectation values of \(O\) can be -estimated from the MCMC samples: \[ - \expval{O} = \sum_{i = 0}^{N} O(x_i) + -\mathcal{O}(\frac{1}{\sqrt{N}}) -\] The the samples in the random walk are correlated so the -samples effectively contain less information than \(N\) independent samples would. As a -consequence the variance is larger than the \(\expval{O^2} - \expval{O}^2\) form it would -have if the estimates were uncorrelated. Methods of estimating the true -variance of \(\expval{O}\) and decided -how many steps are needed will be considered later.

-
-
-
-

The Metropolis-Hasting Algorithm

-

Markov chains are defined by a transition function $(x_{i+1} x_i) $ -giving the probability that a chain in state \(x_i\) at time \(i\) will transition to a state \(x_{i+1}\). Since we must transition -somewhere at each step, this comes with the normalisation condition that -\(\sum\limits_x \mathcal{T}(x' \rightarrow -x) = 1\).

-

If we define an ensemble of Markov chains and consider the -distributions we get a sequence of distributions \(\{p_0(x), p_1(x), p_2(x)\ldots\}\) with -\[\begin{aligned} -p_{i+1}(x) &= \sum_{x' \in S} p_i(x') \mathcal{T}(x' -\rightarrow x) -\end{aligned}\] \(p_o(x)\) might -be a delta function on one particular starting state which would then be -smoothed out by the transition function repeatedly.

-

As we’d like to draw samples from the target distribution \(P(x)\) the trick is to choose $(x_{i+1} -x_i) $ such that :

-

\[\begin{aligned} -P(x) &= \sum_{x'} P(x') \mathcal{T}(x' \rightarrow x) -\end{aligned} -\] In other words the MCMC dynamics defined by \(\mathcal{T}\) must be constructed to have -the target distribution as their only fixed point. This condition is -called the global balance condition. Along with some more technical -considerations such as ergodicity which won’t be considered here, global -balance suffices to ensure that a MCMC method is correct.

-

A sufficient but not necessary condition for global balance to hold -is detailed balance:

-

\[ -P(x) \mathcal{T}(x \rightarrow x') = P(x') \mathcal{T}(x' -\rightarrow x) -\] % In practice most algorithms are constructed to satisfy -detailed balance though there are arguments that relaxing the condition -can lead to faster algorithms  [8].

-

The goal of MCMC is then to choose \(\mathcal{T}\) so that it has the desired -thermal distribution \(P(x)\) as its -fixed point and that it converges quickly onto it. This boils down to -requiring that the matrix representation of \(T_{ij} = \mathcal{T}(x_i \to x_j)\) has an -eigenvector equal to \(P_i = P(x_i)\) -with eigenvalue 1 and all other eigenvalues with magnitude less than -one. The convergence time depends on the magnitude of the second largest -eigenvalue.

-
-
-

Metropolis-Hastings

-

In order to actually choose new states according to \(\mathcal{T}\) one chooses states from a -proposal distribution \(q(x_i \to -x')\) that can be directly sampled from. For instance, this -might mean flipping a single random spin in a spin chain, in which case -\(q(x'\to x_i)\) is the uniform -distribution on states reachable by one spin flip from \(x_i\). The proposal \(x'\) is then accepted or rejected with -an acceptance probability \(\mathcal{A}(x'\to x_{i+1})\), if the -proposal is rejected then \(x_{i+1} = -x_{i}\). Now \(\mathcal{T}(x\to x') -= q(x\to x')\mathcal{A}(x \to x')\).

-

The Metropolis-Hasting algorithm is a slight extension of the -original Metropolis algorithm that allows for non-symmetric proposal -distributions $q(xx’) q(x’x) $. It can be derived starting from detailed -balance  [7]: \[\begin{aligned} -P(x)\mathcal{T}(x \to x') &= P(x')\mathcal{T}(x' \to x) -\\ -P(x)q(x \to x')\mathcal{A}(x \to x') &= P(x')q(x' -\to x)\mathcal{A}(x' \to x) \\ -\label{eq:db2} \frac{\mathcal{A}(x \to x')}{\mathcal{A}(x' \to -x)} &= \frac{P(x')q(x' \to x)}{P(x)q(x \to x')} = f(x, -x')\\ -\end{aligned} -\] % The Metropolis-Hastings algorithm is the choice: \[ -\begin{aligned} -\label{eq:mh} -\mathcal{A}(x \to x') = \min\left(1, f(x,x')\right) -\end{aligned} -\] % Noting that \(f(x,x') = -1/f(x',x)\), Eq. \(\ref{eq:mh}\) can be seen to satisfy Eq. -\(\ref{eq:db2}\) by considering the two -cases \(f(x,x') > 1\) and \(f(x,x') < 1\).

-

By choosing the proposal distribution such that \(f(x,x')\) is as close as possible to -one, the rate of rejections can be reduced and the algorithm sped -up.

-
-
-

Convergence, Auto-correlation and Binning

-

%Thinning, burn in, multiple runs

-
-
-

Proposal Distributions

-

In a MCMC method a key property is the proportion of the time that -proposals are accepted, the acceptance rate. If this rate is too low the -random walk is trying to take overly large steps in energy space which -problematic because it means very few new samples will be generated. If -it is too high it implies the steps are too small, a problem because -then the walk will take longer to explore the state space and the -samples will be highly correlated. Ideal values for the acceptance rate -can be calculated under certain assumptions  [9]. Here we monitor the acceptance rate -and if it is too high we re-run the MCMC with a modified proposal -distribution that has a chance to propose moves that flip multiple sites -at a time.

-

In addition we exploit the particle-hole symmetry of the problem by -occasionally proposing a flip of the entire state. This works because -near half-filling, flipping the occupations of all the sites will -produce a state at or near the energy of the current one.

-
-
-

Perturbation MCMC

-

The matrix diagonalisation is the most computationally expensive step -of the process, a speed up can be obtained by modifying the proposal -distribution to depend on the classical part of the energy, a trick -gleaned from Ref.  [7]: \[ -\begin{aligned} -q(k \to k') &= \min\left(1, e^{\beta (H^{k'} - H^k)}\right) -\\ -\mathcal{A}(k \to k') &= \min\left(1, e^{\beta(F^{k'}- -F^k)}\right) -\end{aligned}\] % This allows the method to reject some states -without performing the diagonalisation at no cost to the accuracy of the -MCMC method.

-

An extension of this idea is to try to define a classical model with -a similar free energy dependence on the classical state as the full -quantum, Ref. 6].

+

In our computations  [7] we +employ a modification of the algorithm which is based on the observation +that the free energy of the FK system is composed of a classical part +which is much quicker to compute than the quantum part. Hence, we can +obtain a computational speedup by first considering the +classical energy difference \(\Delta +H_s\) and rejecting the transition at this stage if it is too large. We +compute the quantum energy difference \(\Delta F_c\) only if the proposal survives this first test. +We then perform a second rejection sampling step based upon it. This +corresponds to two nested comparisons, with the majority of the work +occurring only if the first test passes, giving the acceptance function \[\mathcal{A}(a \to b) = \min\left(1, e^{-\beta +\Delta H_s}\right)\min\left(1, e^{-\beta \Delta +F_c}\right)\;.\]</p>

+

For the model parameters \(U=2/5, T = 1.5 / +2.5, J = 5,\;\alpha = 1.25\), we find that with our new scheme +the matrix diagonalisation is skipped around 30% of the time at \(T = 2.5\) and up to 80% at \(T = 1.5\). We observe that for \(N = 50\), the matrix diagonalisation, if it +occurs, occupies around 60% of the total computation time for a single +step. This rises to 90% at N = 300 and further increases for larger N. +We therefore get the greatest speedup for large system sizes at low +temperature where many prospective transitions are rejected at the +classical stage and the matrix computation takes up the greatest +fraction of the total computation time. The upshot is that we find a +speedup of up to a factor of 10 at the cost of very little extra +algorithmic complexity.

+

Our two-step method should be distinguished from the more common +method for speeding up MCMC which is to add asymmetry to the proposal +distribution to make it as similar as possible to \(\min\left(1, e^{-\beta \Delta E}\right)\). +This reduces the number of rejected states, which brings the algorithm +closer in efficiency to a direct sampling method. However it comes at +the expense of requiring a way to directly sample from this complex +distribution, a problem which MCMC was employed to solve in the first +place. For example, recent work trains restricted Boltzmann machines +(RBMs) to generate samples for the proposal distribution of the FK +model  [10] does this with restricted Boltzmann -machines whose form is very similar to a classical spin model.

+role="doc-biblioref">8]. The RBMs are chosen as a +parametrisation of the proposal distribution that can be efficiently +sampled from while offering sufficient flexibility that they can be +adjusted to match the target distribution. Our proposed method is +considerably simpler and does not require training while still reaping +some of the benefits of reduced computation.

Scaling

@@ -529,429 +389,10 @@ phase transition. If multiple such curves are plotted for different system sizes, a crossing indicates the location of a critical point  [11,9,musialMonteCarloSimulations2002?].

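The crossing construction can be made concrete: sample the scaling quantity (e.g. a Binder-type cumulant) on a common temperature grid for two system sizes and interpolate the sign change of their difference. A sketch with illustrative names; it assumes a single crossing inside the sampled window:

```python
import numpy as np

def crossing_temperature(T, B_small, B_large):
    """Locate the crossing of two finite-size curves by linear interpolation."""
    T = np.asarray(T, dtype=float)
    d = np.asarray(B_small, dtype=float) - np.asarray(B_large, dtype=float)
    # index of the first interval where the difference changes sign
    k = np.flatnonzero(np.sign(d[:-1]) != np.sign(d[1:]))[0]
    return T[k] - d[k] * (T[k + 1] - T[k]) / (d[k + 1] - d[k])
```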
-
-

Markov Chain Monte-Carlo in Practice

-
-

Quick Intro to MCMC

-

The main paper relies on extensively to evaluate thermal expectation -values within the model by walking over states of the classical spin -system \(S_i\). For a classical system, -the thermal expectation value of some operator \(O\) is defined by a Boltzmann weighted sum -over the classical state space: \[\being{aligned} - \tex{O} &= \frac{1}{\mathcal{Z}} \sum_{\s \in S} O(x) P(x) \\ - P(x) &= \frac{1}{\mathcal{Z}} e^{-\beta F(x)} \\ - \mathcal{Z} &= \sum_{\s \in S} e^{-\beta F(x)} -\end{aligned}\] While for a quantum system these sums are -replaced by equivalent traces. The obvious approach to evaluate these -sums numerically would be to directly loop over all the classical states -in the system and perform the sum. But we all know know why this isn’t -feasible: the state space is too large! Indeed even if we could do it, -it would still be computationally wasteful since at low temperatures the -sums are dominated by low energy excitations about the ground states of -the system. Even worse, in our case we must fully solve the fermionic -system via exact diagonalisation for each classical state in the sum, a -very expensive operation!~\footnote{The effort involved in exact -diagonalisation scales like \(N^2\) for -systems with a tri-diagonal matrix representation (open boundary -conditions and nearest neighbour hopping) and like \(N^3\) for a generic matrix  [12,13].

-

c

-

MCMC sidesteps these issues by defining a random walk that focuses on -the states with the greatest Boltzmann weight. At low temperatures this -means we need only visit a few low energy states to make good estimates -while at high temperatures the weights become uniform so a small number -of samples distributed across the state space suffice. However we will -see that the method is not without difficulties of its own.

-

%MCMC from an ensemble point of view In implementation can be boiled -down to choosing a transition function $(_{t} _t+1) $ where \(\s\) are vectors representing classical -spin configurations. We start in some initial state \(\s_0\) and then repeatedly jump to new -states according to the probabilities given by \(\mathcal{T}\). This defines a set of random -walks \(\{\s_0\ldots \s_i\ldots -\s_N\}\). Fig.~\(\ref{fig:single}\) shows this in practice: -we have a (rather small) ensemble of \(M = -2\) walkers starting at the same point in state space and then -spreading outwards by flipping spins along the way.

-

In pseudo-code one could write the MCMC simulation for a single -walker as:

-
current_state = initial_state
-
-for i in range(N_steps):
-    new_state = sample_T(current_state) 
-    states[i] = current_state
-

Where the sample_T function here produces a state with -probability determined by the current_state and the -transition function \(\mathcal{T}\).

-

If we ran many such walkers in parallel we could then approximate the -distribution \(p_t(\s; \s_0)\) which -tells us where the walkers are likely to be after they’ve evolved for -\(t\) steps from an initial state \(\s_0\). We need to carefully choose \(\mathcal{T}\) such that after a large -number of steps \(k\) (the convergence -time) the probability \(p_t(\s;\s_0)\) -approaches the thermal distribution \(P(\s; -\beta) = \mathcal{Z}^{-1} e^{-\beta F(\s)}\). This turns out to -be quite easy to achieve using the Metropolis-Hasting algorithm.

-
-
-

Convergence Time

-

Considering \(p(\s)\) as a vector -\(\vec{p}\) whose jth entry is the -probability of the jth state \(p_j = -p(\s_j)\), and writing \(\mathcal{T}\) as the matrix with entries -\(T_{ij} = \mathcal{T}(\s_j \rightarrow -\s_i)\) we can write the update rule for the ensemble probability -as: \[\vec{p}_{t+1} = \mathcal{T} \vec{p}_t -\implies \vec{p}_{t} = \mathcal{T}^t \vec{p}_0\] where \(\vec{p}_0\) is vector which is one on the -starting state and zero everywhere else. Since all states must -transition to somewhere with probability one: \(\sum_i T_{ij} = 1\).

-

Matrices that satisfy this are called stochastic matrices exactly -because they model these kinds of Markov processes. It can be shown that -they have real eigenvalues, and ordering them by magnitude, that \(\lambda_0 = 1\) and \(0 < \lambda_{i\neq0} < 1\). -%https://en.wikipedia.org/wiki/Stochastic_matrix

-

Assuming \(\mathcal{T}\) has been -chosen correctly, its single eigenvector with eigenvalue 1 will be the -thermal distribution so repeated application of the transition function -eventually leads there, while memory of the initial conditions decays -exponentially with a convergence time \(k\) determined by \(\lambda_1\). In practice this means that -one throws away the data from the beginning of the random walk in order -reduce the dependence on the initial conditions and be close enough to -the target distribution.

-
-
-

Auto-correlation Time

-
-
- - -
-
-

At this stage one might think we’re done. We can indeed draw -independent samples from \(P(\s; -\beta)\) by starting from some arbitrary initial state and doing -\(k\) steps to arrive at a sample. -However a key insight is that after the convergence time, every state -generated is a sample from \(P(\s; -\beta)\)! They are not, however, independent samples. In -Fig.~\(\ref{fig:raw}\) it is already -clear that the samples of the order parameter m have some -auto-correlation because only a few spins are flipped each step but even -when the number of spins flipped per step is increased, Fig.~\(\ref{fig:m_autocorr}\) shows that it can be -an important effect near the phase transition. Let’s define the -auto-correlation time \(\tau(O)\) -informally as the number of MCMC samples of some observable O that are -statistically equal to one independent sample or equivalently as the -number of MCMC steps after which the samples are correlated below some -cutoff, see  [14] for a more rigorous definition -involving a sum over the auto-correlation function. The auto-correlation -time is generally shorter than the convergence time so it therefore -makes sense from an efficiency standpoint to run a single walker for -many MCMC steps rather than to run a huge ensemble for \(k\) steps each.

-

Once the random walk has been carried out for many steps, the -expectation values of \(O\) can be -estimated from the MCMC samples \(\s_i\): \[ - \tex{O} = \sum_{i = 0}^{N} O(\s_i) + \mathcal{O}(\frac{1}{\sqrt{N}}) -\] The the samples are correlated so the N of them effectively -contains less information than \(N\) -independent samples would, in fact roughly \(N/\tau\) effective samples. As a -consequence the variance is larger than the \(\qex{O^2} - \qex{O}^2\) form it would have -if the estimates were uncorrelated. There are many methods in the -literature for estimating the true variance of \(\qex{O}\) and deciding how many steps are -needed but my approach has been to run a small number of parallel -chains, which are independent, in order to estimate the statistical -error produced. This is a slightly less computationally efficient -because it requires throwing away those \(k\) steps generated before convergence -multiple times but it is a conceptually simple workaround.

-

In summary, to do efficient simulations we want to reduce both the -convergence time and the auto-correlation time as much as possible. In -order to explain how, we need to introduce the Metropolis-Hasting (MH) -algorithm and how it gives an explicit form for the transition -function.

-
-
-

The Metropolis-Hastings Algorithm

-

MH breaks up the transition function into a proposal distribution -\(q(\s \to \s')\) and an acceptance -function \(\mathcal{A}(\s \to -\s')\). \(q\) needs to be -something that we can directly sample from, and in our case generally -takes the form of flipping some number of spins in \(\s\), i.e if we’re flipping a single random -spin in the spin chain, \(q(\s \to -\s')\) is the uniform distribution on states reachable by one -spin flip from \(\s\). This also gives -the nice symmetry property that \(q(\s \to -\s') = q(\s' \to \s)\).

-

The proposal \(\s'\) is then -accepted or rejected with an acceptance probability \(\mathcal{A}(\s \to \s')\), if the -proposal is rejected then \(\s_{i+1} = -\s_{i}\). Hence:

-

\[\mathcal{T}(x\to x') = q(x\to -x')\mathcal{A}(x \to x')\]

-

When the proposal distribution is symmetric as ours is, it cancels -out in the expression for the acceptance function and the -Metropolis-Hastings algorithm is simply the choice: \[ \mathcal{A}(x \to x') = \min\left(1, -e^{-\beta\;\Delta F}\right)\] Where \(F\) is the overall free energy of the -system, including both the quantum and classical sector.

-

To implement the acceptance function in practice we pick a random -number in the unit interval and accept if it is less than \(e^{-\beta\;\Delta F}\):

-
current_state = initial_state
-
-for i in range(N_steps):
-    new_state = proposal(current_state)
-    df = free_energy_change(current_state, new_state, parameters)
-
-    if uniform(0,1) < exp(-beta * df):
-        current_state = new_state
-        
-    states[i] = current_state
-

This has the effect of always accepting proposed states that are -lower in energy and sometimes accepting those that are higher in energy -than the current state.
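As a concrete, runnable illustration of this acceptance rule, here is a sketch for a classical 1D Ising chain (a stand-in model chosen because its energy difference is cheap to compute; this is not the FK code itself):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_ising(n_sites=50, n_steps=20000, beta=1.0):
    # Classical 1D Ising chain with J = 1 and periodic boundaries,
    # E = -sum_i s_i s_{i+1}.
    spins = rng.choice([-1, 1], size=n_sites)
    E = -np.sum(spins * np.roll(spins, 1))
    energies = np.empty(n_steps)
    for i in range(n_steps):
        site = rng.integers(n_sites)
        # Only the two bonds touching `site` change when it flips.
        dE = 2 * spins[site] * (spins[site - 1] + spins[(site + 1) % n_sites])
        # Downhill moves (dE < 0) are always accepted, uphill ones
        # with probability e^{-beta dE}.
        if rng.random() < np.exp(-beta * dE):
            spins[site] *= -1
            E += dE
        energies[i] = E
    return energies

energies = metropolis_ising()
print(energies[-5000:].mean())  # near the exact value -N tanh(beta)
```

Note that when `dE` is negative, `exp(-beta * dE) > 1` and the comparison always succeeds, reproducing the "always accept downhill" behaviour described above.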

-
-
-
-

Two Step Trick

-

Here, we incorporate a modification to the standard -Metropolis-Hastings algorithm  [15] gleaned from Krauth  [7].

-

In our computations  [16] we employ a modification of the algorithm based on the observation that the free energy of the FK system is composed of a classical part, which is quick to compute, and a quantum part, which is not. Hence, we can obtain a computational speedup by first considering the classical energy difference \(\Delta H_c\) and rejecting the transition outright if it is too high. Only if this first test passes do we compute the quantum free energy difference \(\Delta F_q\) and perform a second rejection sampling step based upon it. This corresponds to two nested comparisons, with the majority of the work only occurring if the first test passes, and has the acceptance function \[\mathcal{A}(a \to b) = \min\left(1, e^{-\beta \Delta H_c}\right)\min\left(1, e^{-\beta \Delta F_q}\right)\;.\]

-

For the model parameters used in Fig. 2, we find that with our new scheme the matrix diagonalisation is skipped around 30% of the time at \(T = 2.5\) and up to 80% at \(T = 1.5\). We observe that for \(N = 50\), the matrix diagonalisation, if it occurs, occupies around 60% of the total computation time for a single step. This rises to 90% at \(N = 300\) and further increases for larger \(N\). We therefore get the greatest speedup for large system sizes at low temperature, where many prospective transitions are rejected at the classical stage and the matrix computation takes up the greatest fraction of the total computation time. The upshot is that we find a speedup of up to a factor of 10 at the cost of very little extra algorithmic complexity.

-

Our two-step method should be distinguished from the more common -method for speeding up MCMC which is to add asymmetry to the proposal -distribution to make it as similar as possible to \(\min\left(1, e^{-\beta \Delta E}\right)\). -This reduces the number of rejected states, which brings the algorithm -closer in efficiency to a direct sampling method. However it comes at -the expense of requiring a way to directly sample from this complex -distribution, a problem which MCMC was employed to solve in the first -place. For example, recent work trains restricted Boltzmann machines -(RBMs) to generate samples for the proposal distribution of the FK -model  [10]. The RBMs are chosen as a -parametrisation of the proposal distribution that can be efficiently -sampled from while offering sufficient flexibility that they can be -adjusted to match the target distribution. Our proposed method is -considerably simpler and does not require training while still reaping -some of the benefits of reduced computation.

-
-
-

Detailed Balance for the two step method

-

Given an MCMC algorithm with target distribution \(\pi(a)\) and transition function \(\mathcal{T}\), the detailed balance condition is sufficient (along with some technical constraints  [5]) to guarantee that in the long time limit the algorithm produces samples from \(\pi\): \[\pi(a)\mathcal{T}(a \to b) = \pi(b)\mathcal{T}(b \to a)\]

-

In pseudo-code, our two step method corresponds to two nested -comparisons with the majority of the work only occurring if the first -test passes:

-
current_state = initial_state
-
-for i in range(N_steps):
-  new_state = proposal(current_state)
-
-  c_dE = classical_energy_change(
-                               current_state,
-                               new_state)
-  if uniform(0,1) < exp(-beta * c_dE):
-    q_dF = quantum_free_energy_change(
-                                current_state,
-                                new_state)
-    if uniform(0,1) < exp(- beta * q_dF):
-      current_state = new_state
-
-  states[i] = current_state
-

Defining \(r_c = e^{-\beta \Delta H_c}\) and \(r_q = e^{-\beta \Delta F_q}\), where \(\Delta\) denotes the change from state \(a\) to state \(b\), the ratio of target probabilities is \(\pi(b)/\pi(a) = r_c r_q\). This method has \(\mathcal{T}(a\to b) = q(a\to b)\mathcal{A}(a \to b)\) with a symmetric proposal \(q(a \to b) = q(b \to a)\) and \(\mathcal{A} = \min\left(1, r_c\right) \min\left(1, r_q\right)\)

-

Substituting this into the detailed balance equation gives: \[\mathcal{T}(a \to b)/\mathcal{T}(b \to a) = -\pi(b)/\pi(a) = r_c r_q\]

-

Taking the LHS and substituting in our transition function: \[\begin{aligned} -\mathcal{T}(a \to b)/\mathcal{T}(b \to a) = \frac{\min\left(1, -r_c\right) \min\left(1, r_q\right)}{ \min\left(1, 1/r_c\right) -\min\left(1, 1/r_q\right)}\end{aligned}\]

-

which simplifies to \(r_c r_q\) as -\(\min(1,r)/\min(1,1/r) = r\) for \(r > 0\).
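The identity can also be checked numerically; a quick sanity check of the two-step acceptance function (an illustrative script, not part of the thesis code):

```python
import random

def acceptance(r_c, r_q):
    # Two-step acceptance with Boltzmann ratios r = e^{-beta * delta}.
    return min(1, r_c) * min(1, r_q)

random.seed(0)
for _ in range(1000):
    r_c = random.lognormvariate(0, 2)   # random positive ratios
    r_q = random.lognormvariate(0, 2)
    forward = acceptance(r_c, r_q)
    backward = acceptance(1 / r_c, 1 / r_q)   # the reverse move inverts the ratios
    # Detailed balance: T(a->b)/T(b->a) must equal pi(b)/pi(a) = r_c * r_q.
    assert abs(forward / backward - r_c * r_q) <= 1e-9 * r_c * r_q
print("detailed balance holds")
```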

-
-

Two Step Trick

-

Our method already relies heavily on the split between the classical and quantum sector to derive a sign problem free MCMC algorithm, but it turns out that there is a further trick we can play with it. The free energy term is the sum of an easy to compute classical energy and a more expensive quantum free energy, so we can split the acceptance function into two in such a way as to avoid having to compute the full exact diagonalisation some of the time:

-

-
current_state = initial_state
-
-for i in range(N_steps):
-    new_state = proposal(current_state)
-
-    df_classical = classical_free_energy_change(current_state, new_state, parameters)
-    if uniform(0,1) < exp(-beta * df_classical):
-        df_quantum = quantum_free_energy_change(current_state, new_state, parameters)
-
-        if uniform(0,1) < exp(-beta * df_quantum):
-            current_state = new_state
-
-    states[i] = current_state
-    
-
-
-

Tuning the proposal distribution

-
-
- - -
-
-

Now we can discuss how to minimise the auto-correlations. The general principle is that one must balance the proposal distribution between two extremes. Choose overly small steps, like flipping only a single spin, and the acceptance rate will be high because \(\Delta F\) will usually be small, but each state will be very similar to the previous one and the auto-correlations will be high too, making sampling inefficient. On the other hand, overly large steps, like randomising a large portion of the spins each step, will result in very frequent rejections, especially at low temperatures.

-

I evaluated a few different proposal distributions for use with the -FK model.

-
  1. Flipping a single random site.
  2. Flipping \(N\) random sites, for some fixed \(N\).
  3. Choosing \(n\) from Uniform(1, \(N\)) and then flipping \(n\) random sites, for some fixed \(N\).
  4. Attempting to tune the proposal distribution for each parameter regime.

From Figure~\(\ref{fig:comparison}\) we see that even at moderately high temperatures \(T > T_c\) flipping one or two sites is the best choice. However, for some simulations at very high temperature flipping more spins is warranted. Tuning the proposal distribution automatically seems unlikely to yield enough benefit to justify the additional complexity it would require.
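The auto-correlation time that this balancing act targets can be estimated directly from the samples; a rough sketch using the simple truncated-sum estimator (an AR(1) series of known correlation time stands in for MCMC output, and the cutoff `max_lag` is an illustrative choice):

```python
import numpy as np

def autocorrelation(x, max_lag=100):
    # Normalised auto-correlation function of a 1D series of samples.
    x = np.asarray(x, dtype=float) - np.mean(x)
    var = np.var(x)
    return np.array([np.mean(x[:len(x) - t] * x[t:]) / var
                     for t in range(max_lag)])

def integrated_time(x, max_lag=100):
    # Truncated-sum estimate: tau = 1 + 2 * sum_t rho(t).
    rho = autocorrelation(x, max_lag)
    return 1 + 2 * np.sum(rho[1:])

# AR(1) test series whose integrated time is (1 + phi)/(1 - phi) = 19.
rng = np.random.default_rng(2)
phi = 0.9
x = np.empty(100_000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = phi * x[i - 1] + rng.normal()

tau_est = integrated_time(x)
print(tau_est)
```

Comparing such estimates across proposal distributions is one way to make the single-flip versus many-flip trade-off quantitative.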

-
-

Diagnostics of Localisation

@@ -961,7 +402,7 @@ function \(\psi_i = \psi(x_i), \sum_i \abs{\psi_i}^2 = 1\) as its fourth moment  [17]: \[ +role="doc-biblioref">10]: \[ P^{-1} = \sum_i \abs{\psi_i}^4 \] % It acts as a measure of the portion of space occupied by the wave function. For localised states it will be independent of system @@ -975,7 +416,7 @@ P(L) \goeslike L^{d*} 0\). In this work we take use an energy resolved IPR  [18]: \[ +role="doc-biblioref">11]: \[ DOS(\omega) = \sum_n \delta(\omega - \epsilon_n) IPR(\omega) = DOS(\omega)^{-1} \sum_{n,i} \delta(\omega - \epsilon_n) \abs{\psi_{n,i}}^4 @@ -984,73 +425,10 @@ wavefunction corresponding to the energy \(\epsilon_n\) at the ith site. In practice we bin the energies and IPRs into a fine energy grid and use Lorentzian smoothing if necessary.

-

Dimensionality can be both a blessing and a curse. In section ?? I’ll discuss the fact that statistical physics can be somewhat boring in one dimension, where most simple models have no phase transitions. This chapter is motivated by the converse problem: high-dimensional spaces can sometimes be just too much.

-

While there are many problems with high dimensions, my favourite -being that there are no stable gravitational orbits in 4D and above, the -specific issue we’ll focus on here is that it’s very hard to compute -integrals over high dimensional spaces.

-

The standard methods for numerical integration in 1, 2 and 3 dimensions mostly work in the same way REFERENCE. You evaluate the integrand at a grid of points, define an interpolating function over those points that is easy to integrate, and then integrate the interpolating function. For a fixed grid spacing \(d\) on a finite domain of integration we find that we need to evaluate \(\propto (1/d)^D\) points, which scales exponentially with the dimension \(D\)!
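To make the scaling concrete, a toy count (assuming a unit hypercube and an illustrative spacing of d = 0.01):

```python
# Grid points needed for spacing d = 0.01 on the unit hypercube [0, 1]^D.
d = 0.01
points_per_axis = int(1 / d)  # 100

for D in (1, 2, 3, 6, 10):
    print(f"D = {D:2d}: {points_per_axis ** D:.3e} evaluations")
# Already at D = 10 the grid has 10^20 points, hopeless to evaluate.
```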

-

In statistical physics the main integral that one would love to be -able to evaluate is of course the partition function.

-

\[Z = \int \mathrm{d}s\; e^{-\beta F(s)}\]

-

And as this is condensed matter theory, we will mainly be looking at -quantum models in which our states are discrete occupation numbers of -single particle energy states. For a spin model with just two states per -site and N sites we therefore end up with \(2^N\) possible states of the system.

-

If the domain of integration is bounded, we can discretise it: taking a discrete space with \(M\) dimensions, each taking \(N\) distinct values, gives \(N^M\) points in total.

-

Topics still to cover: detailed and global balance equations; mixing times; cluster updates and critical slowing down; effective sample size.

-
-
-
-

Markov Chain Monte-Carlo

-

Dimensionality can be both a blessing and a curse. In section ?? I’ll discuss the fact that statistical physics can be somewhat boring in one dimension, where most simple models have no phase transitions. This chapter is motivated by the converse problem: high-dimensional spaces can sometimes be just too much.

-

While there are many problems with high dimensions 1, the one that concerns us here is that it’s very hard to compute integrals over high-dimensional spaces. If we take a discrete space with \(M\) dimensions, each taking \(N\) distinct values, we have \(N^M\) states to sum over.

-

For a classical system, the thermal expectation value of some operator \(O\) is defined by a Boltzmann-weighted sum over the classical state space: \[\begin{aligned}
 \tex{O} &= \sum_{\s \in S} O(\s) P(\s) \\
 P(\s) &= \frac{1}{\Z} e^{-\beta F(\s)} \\
 \Z &= \sum_{\s \in S} e^{-\beta F(\s)}\end{aligned}\] While for a quantum system these sums are replaced by equivalent traces. The obvious approach to evaluating these sums numerically would be to loop directly over all the classical states of the system and perform the sum. But we all know why this isn’t feasible: the state space is too large! Indeed, even if we could do it, it would still be computationally wasteful, since at low temperatures the sums are dominated by low energy excitations about the ground states of the system. Even worse, in our case we must fully solve the fermionic system via exact diagonalisation for each classical state in the sum, a very expensive operation! 2
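The low-temperature dominance is easy to see in a toy enumeration (a 10-site open Ising chain with hypothetical parameters): at small \(\beta\) the Boltzmann weight is spread over all \(2^{10}\) states, while at large \(\beta\) a handful of low-energy states carry nearly all of it.

```python
import itertools
import math

N = 10

def H(spins):
    # Open-boundary Ising chain energy with J = 1.
    return -sum(a * b for a, b in zip(spins, spins[1:]))

states = list(itertools.product([-1, 1], repeat=N))

fractions = {}
for beta in (0.1, 2.0):
    weights = sorted((math.exp(-beta * H(s)) for s in states), reverse=True)
    # Fraction of the total weight carried by the 32 most likely
    # of the 1024 states.
    fractions[beta] = sum(weights[:32]) / sum(weights)
    print(beta, fractions[beta])
```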

-

Where the sample_T function here produces a state with probability determined by the current_state and the -transition function \(\T\).

+transition function \(\mathcal{T}\).

If we ran many such walkers in parallel we could then approximate the distribution \(p_t(\s; \s_0)\) which tells us where the walkers are likely to be after they’ve evolved for \(t\) steps from an initial state \(\s_0\). We need to carefully choose \(\T\) such that after a large number of -steps \(k\) (the convergence time) the -probability \(p_t(\s;\s_0)\) approaches -the thermal distribution \(P(\s; \beta) = -\Z^{-1} e^{-\beta F(\s)}\). This turns out to be quite easy to -achieve using the Metropolis-Hasting algorithm.

+class="math inline">\(\mathcal{T}\) such that after a large +number of steps \(k\) (the convergence +time) the probability \(p_t(\s;\s_0)\) +approaches the thermal distribution \(P(\s; +\beta) = \mathcal{Z}^{-1} e^{-\beta F(\s)}\). This turns out to +be quite easy to achieve using the Metropolis-Hasting algorithm.

-
+
+

Convergence Time

Considering \(p(\s)\) as a vector \(\vec{p}\) whose jth entry is the probability of the jth state \(p_j = -p(\s_j)\), and writing \(\T\) as -the matrix with entries \(T_{ij} = \T(\s_j -\rightarrow \s_i)\) we can write the update rule for the ensemble -probability as: \[\vec{p}_{t+1} = \T -\vec{p}_t \implies \vec{p}_{t} = \T^t \vec{p}_0\] where , and writing \(\mathcal{T}\) as the matrix with entries +\(T_{ij} = \mathcal{T}(\s_j \rightarrow +\s_i)\) we can write the update rule for the ensemble probability +as: \[\vec{p}_{t+1} = \mathcal{T} \vec{p}_t +\implies \vec{p}_{t} = \mathcal{T}^t \vec{p}_0\] where \(\vec{p}_0\) is vector which is one on the starting state and zero everywhere else. Since all states must transition to somewhere with probability one: \(\lambda_0 = 1\) and \(0 < \lambda_{i\neq0} < 1\). Assuming -\(\T\) has been chosen correctly, its -single eigenvector with eigenvalue 1 will be the thermal distribution 3 so repeated application of the -transition function eventually leads there, while memory of the initial -conditions decays exponentially with a convergence time \(\mathcal{T}\) has been chosen +correctly, its single eigenvector with eigenvalue 1 will be the thermal +distribution [^3] so repeated application of the transition function +eventually leads there, while memory of the initial conditions decays +exponentially with a convergence time \(k\) determined by \(\lambda_1\). In practice this means that one throws away the data from the beginning of the random walk in order reduce the dependence on the initial conditions and be close enough to the target distribution.

-
+

Auto-correlation Time

-
-
-

The Metropolis-Hastings Algorithm

-

MH breaks up the transition function into a proposal distribution \(q(\s \to \s')\) and an acceptance function \(\A(\s \to \s')\). \(q\) needs to be something that we can directly sample from, and in our case it generally takes the form of flipping some number of spins in \(\s\); i.e. if we’re flipping a single random spin in the spin chain, \(q(\s \to \s')\) is the uniform distribution on states reachable by one spin flip from \(\s\). This also gives the nice symmetry property that \(q(\s \to \s') = q(\s' \to \s)\).

-

The proposal \(\s'\) is then -accepted or rejected with an acceptance probability \(\A(\s \to \s')\), if the proposal is -rejected then \(\s_{i+1} = \s_{i}\). -Hence:

-

\[\T(x\to x') = q(x\to x')\A(x \to -x')\]

-

When the proposal distribution is symmetric, as ours is, it cancels out in the expression for the acceptance function and the Metropolis-Hastings algorithm is simply the choice: \[\A(x \to x') = \min\left(1, e^{-\beta\;\Delta F}\right)\] where \(F\) is the overall free energy of the system, including both the quantum and classical sectors.

-

To implement the acceptance function in practice we pick a random -number in the unit interval and accept if it is less than \(e^{-\beta\;\Delta F}\):

-
current_state = initial_state
-
-for i in range(N_steps):
-    new_state = proposal(current_state)
-    df = free_energy_change(current_state, new_state, parameters)
-
-    if uniform(0,1) < exp(-beta * df):
-        current_state = new_state
-        
-    states[i] = current_state
-

This has the effect of always accepting proposed states that are -lower in energy and sometimes accepting those that are higher in energy -than the current state.

-
-
-

Choosing the proposal distribution

-

Simulations showing how the auto-correlation of the order parameter depends on the proposal distribution used, at different temperatures. At \(T = 1.5 < T_c\) a single spin flip is likely the best choice, while at the higher temperature \(T = 2.5 > T_c\) flipping two sites, or a mixture of flipping one and two sites, is likely better.

-

Now we can discuss how to minimise the auto-correlations. The general principle is that one must balance the proposal distribution between two extremes. Choose overly small steps, like flipping only a single spin, and the acceptance rate will be high because \(\Delta F\) will usually be small, but each state will be very similar to the previous one and the auto-correlations will be high too, making sampling inefficient. On the other hand, overly large steps, like randomising a large portion of the spins each step, will result in very frequent rejections, especially at low temperatures.

-

I evaluated a few different proposal distributions for use with the -FK model.

-
  1. Flipping a single random site.
  2. Flipping \(N\) random sites, for some fixed \(N\).
  3. Choosing \(n\) from Uniform(1, \(N\)) and then flipping \(n\) random sites, for some fixed \(N\).
  4. Attempting to tune the proposal distribution for each parameter regime.

From Figure 4 we see that even at moderately high temperatures \(T > T_c\) flipping one or two sites is the best choice. However, for some simulations at very high temperature flipping more spins is warranted. Tuning the proposal distribution automatically seems unlikely to yield enough benefit to justify the additional complexity it would require.

-
-
-

Two Step Trick

-

Our method already relies heavily on the split between the classical and quantum sector to derive a sign problem free MCMC algorithm, but it turns out that there is a further trick we can play with it. The free energy term is the sum of an easy to compute classical energy and a more expensive quantum free energy, so we can split the acceptance function into two in such a way as to avoid having to compute the full exact diagonalisation some of the time:

-
current_state = initial_state
-
-for i in range(N_steps):
-    new_state = proposal(current_state)
-
-    df_classical = classical_free_energy_change(current_state, new_state, parameters)
-    if uniform(0,1) < exp(-beta * df_classical):
-        df_quantum = quantum_free_energy_change(current_state, new_state, parameters)
-
-        if uniform(0,1) < exp(-beta * df_quantum):
-            current_state = new_state
-
-    states[i] = current_state
-    
+

Next Section: Results

Bibliography

-
-
[1]
L. -Devroye, Random -Sampling, in Non-Uniform Random Variate Generation, -edited by L. Devroye (Springer, New York, NY, 1986), pp. 611–641.
-
-
-
[2]
O. -A. Martin, R. Kumar, and J. Lao, Bayesian Modeling and Computation -in Python (Boca Raton, 2021).
-
-
[3]
K. +
[1]
K. Binder and D. W. Heermann, Guide to Practical Work with the Monte Carlo Method, in Monte Carlo Simulation @@ -1372,7 +629,7 @@ W. Heermann (Springer Berlin Heidelberg, Berlin, Heidelberg, 1988), pp.
-
[4]
J. +
[2]
-
[5]
U. +
[3]
U. Wolff, Monte Carlo Errors with Less Errors, Computer Physics Communications 156, 143 (2004).
-
[6]
F. +
[4]
F. P. Kelly, Reversibility and Stochastic Networks / F.P. Kelly, SERBIULA (Sistema Librum 2.0) 76, (1981).
+
+
[5]
+
-
[7]
W. +
[6]
W. Krauth, Introduction To Monte Carlo Algorithms, in Advances in Computer Simulation: Lectures Held at the Eötvös Summer School in Budapest, Hungary, 16–20 July 1996 (Springer-Verlag, Berlin Heidelberg, 1998).
-
-
[8]
S. -C. Kapfer and W. Krauth, Sampling from a -Polytope and Hard-Disk Monte Carlo, J. Phys.: Conf. Ser. -454, 012031 (2013).
-
-
-
[9]
G. -O. Roberts, A. Gelman, and W. R. Gilks, Weak Convergence and -Optimal Scaling of Random Walk Metropolis Algorithms, Ann. -Appl. Probab. 7, 110 (1997).
+
[7]
-
[10]
L. +
[8]
L. Huang and L. Wang, Accelerated Monte Carlo Simulations with Restricted Boltzmann Machines, Phys. @@ -1427,48 +681,14 @@ Rev. B 95, 035105 (2017).
-
[11]
K. +
[9]
K. Binder, Finite Size Scaling Analysis of Ising Model Block Distribution Functions, Z. Physik B - Condensed Matter 43, 119 (1981).
-
-
[12]
G. -Bolch, S. Greiner, H. de Meer, and K. S. Trivedi, Queueing Networks -and Markov Chains: Modeling and Performance Evaluation with Computer -Science Applications (John Wiley & Sons, 2006).
-
-
-
[13]
R. -A. Usmani, Inversion of a -Tridiagonal Jacobi Matrix, Linear Algebra and Its Applications -212-213, 413 (1994).
-
-
-
[14]
W. -Krauth, Introduction -To Monte Carlo Algorithms, arXiv:cond-Mat/9612186 (1996).
-
-
-
[15]
-
-
-
[16]
-
-
[17]
B. +
[10]
B. Kramer and A. MacKinnon, Localization: Theory and Experiment, Rep. Prog. Phys. 56, 1469 @@ -1476,7 +696,7 @@ and Experiment, Rep. Prog. Phys. 56, 1469
-
[18]
P. +
[11]
P. W. Anderson, Absence of Diffusion in Certain Random Lattices, Phys. Rev. 109, 1492 @@ -1484,40 +704,6 @@ Certain Random Lattices, Phys. Rev. 109, 1492
-
-
-
  1. my favourite being that there are no stable gravitational orbits in 4D and above↩︎
  2. The effort involved in exact diagonalisation scales like \(N^2\) for systems with a tri-diagonal matrix representation (open boundary conditions and nearest neighbour hopping) and like \(N^3\) for a generic matrix  [12,13].↩︎
  3. or, in the general case, any desired distribution. MCMC has found a lot of use in sampling from the complicated distributions that arise when taking a Bayesian approach to statistics.↩︎
  4. or equivalently as the number of MCMC steps after which the samples are correlated below some cutoff, see  [14] for a more rigorous definition involving a sum over the auto-correlation function.↩︎
-
diff --git a/_thesis/3_Long_Range_Falikov_Kimball/3.3_LRFK_Results.html b/_thesis/3_Long_Range_Falikov_Kimball/3.3_LRFK_Results.html index 438c43f..fd3872e 100644 --- a/_thesis/3_Long_Range_Falikov_Kimball/3.3_LRFK_Results.html +++ b/_thesis/3_Long_Range_Falikov_Kimball/3.3_LRFK_Results.html @@ -470,8 +470,8 @@ H_{\mathrm{DM}} = & \;U \sum_{i} (-1)^i \; d_i \;(c^\dag_{i}c_{i} -

Next Chapter: 4 -The Amorphous Kitaev Model

+href="../4_Amorphous_Kitaev_Model/4.1.2_AMK_Model.html">4 The Amorphous +Kitaev Model

Bibliography

diff --git a/_thesis/4_Amorphous_Kitaev_Model/4.1.2_AMK_Model.html b/_thesis/4_Amorphous_Kitaev_Model/4.1.2_AMK_Model.html index 4887d94..5525928 100644 --- a/_thesis/4_Amorphous_Kitaev_Model/4.1.2_AMK_Model.html +++ b/_thesis/4_Amorphous_Kitaev_Model/4.1.2_AMK_Model.html @@ -1070,8 +1070,7 @@ role="doc-biblioref">16,kitaevFaulttolerantQuantumComputation2003?].

Next Section: The -Model

+href="../4_Amorphous_Kitaev_Model/4.1_AMK_Model.html">The Model

diff --git a/_thesis/4_Amorphous_Kitaev_Model/4.1_AMK_Model.html b/_thesis/4_Amorphous_Kitaev_Model/4.1_AMK_Model.html index f886cf0..d4d72f8 100644 --- a/_thesis/4_Amorphous_Kitaev_Model/4.1_AMK_Model.html +++ b/_thesis/4_Amorphous_Kitaev_Model/4.1_AMK_Model.html @@ -769,7 +769,7 @@ anyway, an arbitrary pairing of the unpaired \(b^\alpha\) operators could be performed. </i,j></i,j>

Next Section: Methods

+href="../4_Amorphous_Kitaev_Model/4.2_AMK_Methods.html">Methods

diff --git a/_thesis/4_Amorphous_Kitaev_Model/4.2_AMK_Methods.html b/_thesis/4_Amorphous_Kitaev_Model/4.2_AMK_Methods.html index e903f18..90b6a37 100644 --- a/_thesis/4_Amorphous_Kitaev_Model/4.2_AMK_Methods.html +++ b/_thesis/4_Amorphous_Kitaev_Model/4.2_AMK_Methods.html @@ -541,7 +541,7 @@ system.

Discuss link between Chern number and Anyonic Statistics

Next Section: Results

+href="../4_Amorphous_Kitaev_Model/4.3_AMK_Results.html">Results

diff --git a/_thesis/4_Amorphous_Kitaev_Model/4.3_AMK_Results.html b/_thesis/4_Amorphous_Kitaev_Model/4.3_AMK_Results.html index 9e7afb0..dde0675 100644 --- a/_thesis/4_Amorphous_Kitaev_Model/4.3_AMK_Results.html +++ b/_thesis/4_Amorphous_Kitaev_Model/4.3_AMK_Results.html @@ -665,8 +665,8 @@ href="#ref-Wu2009" role="doc-biblioref">47]

quantum many body phases, albeit with material candidates aplenty. We expect our exact chiral amorphous spin liquid to find many generalisations to realistic amorphous quantum magnets and beyond.

-

Next Chapter: 5 Conclusion

+

Next Chapter: 5 +Conclusion

diff --git a/_thesis/5_Conclusion/5_Conclusion.html b/_thesis/5_Conclusion/5_Conclusion.html index d806d0a..2ee1c5f 100644 --- a/_thesis/5_Conclusion/5_Conclusion.html +++ b/_thesis/5_Conclusion/5_Conclusion.html @@ -27,6 +27,14 @@ image:

Bibliography

diff --git a/_thesis/6_Appendices/A.2_Markov_Chain_Monte_Carlo.html b/_thesis/6_Appendices/A.2_Markov_Chain_Monte_Carlo.html index a3c9d29..f56f227 100644 --- a/_thesis/6_Appendices/A.2_Markov_Chain_Monte_Carlo.html +++ b/_thesis/6_Appendices/A.2_Markov_Chain_Monte_Carlo.html @@ -30,10 +30,61 @@ image:
  • Markov Chain Monte Carlo
  • +
  • [app:balance] DETAILED BALANCE
  • +
  • Bibliography
  • {% endcapture %} @@ -49,102 +100,883 @@ model
  • Markov Chain Monte Carlo
  • +
  • [app:balance] DETAILED BALANCE
  • +
  • Bibliography
  • --> -

    Markov Chain Monte Carlo

    -
    -

    Applying MCMC to the FK model

    -

MCMC can be applied to sample over the classical degrees of freedom of the model. We take the full Hamiltonian and split it into a classical and a quantum part: \[\begin{aligned}
 H_{\mathrm{FK}} &= -\sum_{<ij>} c^\dagger_{i}c_{j} + U \sum_{i} (c^\dagger_{i}c_{i} - 1/2)( n_i - 1/2) \\
 &+ \sum_{ij} J_{ij} (n_i - 1/2) (n_j - 1/2) - \mu \sum_i (c^\dagger_{i}c_{i} + n_i)\\
 H_q &= -\sum_{<ij>} c^\dagger_{i}c_{j} + \sum_{i} \left(U(n_i - 1/2) - \mu\right) c^\dagger_{i}c_{i}\\
 H_c &= -\sum_i \left(\mu n_i + \frac{U}{2}(n_i - 1/2)\right) + \sum_{ij}J_{ij}(n_i - 1/2)(n_j - 1/2)

Markov Chain Monte Carlo (MCMC) is a useful method whenever we have a probability distribution that we want to sample from but no direct way to do so.

    +
    +

    Direct Random Sampling

    +

In almost any computer simulation the ultimate source of randomness is a stream of (close to) uniform, uncorrelated bits generated by a pseudo-random number generator. A direct sampling method takes such a source and outputs uncorrelated samples from the target distribution. The fact that they’re uncorrelated is key, as we’ll see later. Examples of direct sampling methods range from the trivial (take \(n\) random bits to generate integers uniformly between 0 and \(2^n - 1\)) to more complex methods such as inverse transform sampling and rejection sampling  [1].
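As an example, inverse transform sampling maps uniform random numbers through the inverse CDF of the target distribution; a textbook sketch for the exponential distribution:

```python
import math
import random

random.seed(0)

def sample_exponential(lam):
    # Inverse transform sampling: if U ~ Uniform(0, 1), then
    # -ln(1 - U)/lam has CDF 1 - e^{-lam x}, i.e. Exponential(lam).
    u = random.random()
    return -math.log(1 - u) / lam

samples = [sample_exponential(2.0) for _ in range(100_000)]
mean_estimate = sum(samples) / len(samples)
print(mean_estimate)  # close to the true mean 1/lam = 0.5
```

Each call consumes fresh uniform bits, so successive samples are genuinely uncorrelated, exactly the property MCMC gives up.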

    +

    In physics the distribution we usually want to sample from is the +Boltzmann probability over states of the system \(S\): \[ +\begin{aligned} +p(S) &= \frac{1}{\mathcal{Z}} e^{-\beta H(S)} \\ \end{aligned} -\]

    -

There are \(2^N\) possible ion configurations \(\{ n_i \}\); we define \(n^k_i\) to be the occupation of the ith site of the kth configuration. The quantum part of the free energy can then be defined through the quantum partition function \(Z^k\) associated with each ionic state \(n^k_i\): \[\begin{aligned}
F^k &= -1/\beta \ln{Z^k} \\
\end{aligned}\] % Such that the overall partition function is: \[\begin{aligned}
\mathcal{Z} &= \sum_k e^{- \beta H^k} Z^k \\
&= \sum_k e^{-\beta (H^k + F^k)} \\
\end{aligned}\] % Because fermions are limited to occupation numbers of 0 or 1, \(Z^k\) simplifies nicely. If \(m^j_i = \{0,1\}\) is defined as the occupation of the level with energy \(\epsilon^k_i\), then the partition function is a sum over all the occupation states labelled by j: \[\begin{aligned}
Z^k &= \Tr e^{-\beta H^k_q} = \sum_j e^{-\beta \sum_i m^j_i \epsilon^k_i}\\
 &= \sum_j \prod_i e^{- \beta m^j_i \epsilon^k_i} = \prod_i \sum_{m_i = 0,1} e^{- \beta m_i \epsilon^k_i}\\
 &= \prod_i (1 + e^{- \beta \epsilon^k_i})\\
F^k &= -1/\beta \sum_i \ln{(1 + e^{- \beta \epsilon^k_i})}
\end{aligned}\] % Observables can then be calculated from the partition function, for example the occupation numbers:

    -

    \[\begin{aligned} -\tex{N} &= \frac{1}{\beta} \frac{1}{Z} \frac{\partial Z}{\partial -\mu} = - \frac{\partial F}{\partial \mu}\\ - &= \frac{1}{\beta} \frac{1}{Z} \frac{\partial}{\partial \mu} -\sum_k e^{-\beta (H^k + F^k)}\\ - &= 1/Z \sum_k (N^k_{\mathrm{ion}} + N^k_{\mathrm{electron}}) -e^{-\beta (H^k + F^k)}\\ -\end{aligned}\] % with the definitions:

    -

\[\begin{aligned}
N^k_{\mathrm{ion}} &= - \frac{\partial H^k}{\partial \mu} = \sum_i n^k_i\\
N^k_{\mathrm{electron}} &= - \frac{\partial F^k}{\partial \mu} = \sum_i \left(1 + e^{\beta \epsilon^k_i}\right)^{-1}\\
\end{aligned}\] % The MCMC algorithm consists of performing a random walk over the states \(\{ n^k_i \}\). In the simplest case the proposal distribution corresponds to flipping a random site from occupied to unoccupied or vice versa; since this proposal is symmetric the acceptance function becomes: \[\begin{aligned}
P(k) &= \mathcal{Z}^{-1} e^{-\beta(H^k + F^k)} \\
\mathcal{A}(k \to k') &= \min\left(1, \frac{P(k')}{P(k)}\right) = \min\left(1, e^{\beta(H^{k} + F^{k}) - \beta(H^{k'} + F^{k'})}\right)
\end{aligned}\] % At each step \(F^k\) is calculated by diagonalising the tri-diagonal matrix representation of \(H_q\) with open boundary conditions. Observables are simply averages over their value at each step of the random walk. The full spectrum and eigenbasis are too large to save to disk, so running averages of key observables are usually taken as the walk progresses.
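A sketch of this per-configuration computation (illustrative only: it assumes hopping t = 1 and the parameter choices U = 2, mu = 0, beta = 2, and is not the actual thesis code):

```python
import numpy as np

def quantum_free_energy(ion_config, U=2.0, mu=0.0, beta=2.0, t=1.0):
    """F^k = -(1/beta) sum_i ln(1 + e^{-beta eps_i}) for one ion configuration."""
    n = np.asarray(ion_config, dtype=float)
    N = len(n)
    # Tridiagonal H_q: nearest-neighbour hopping plus the on-site
    # potential U(n_i - 1/2) - mu, open boundary conditions.
    H_q = (np.diag(U * (n - 0.5) - mu)
           - t * np.diag(np.ones(N - 1), 1)
           - t * np.diag(np.ones(N - 1), -1))
    eps = np.linalg.eigvalsh(H_q)
    F = -np.sum(np.log1p(np.exp(-beta * eps))) / beta
    # Electron number from the Fermi factors at the same eigenvalues.
    N_electron = np.sum(1.0 / (1.0 + np.exp(beta * eps)))
    return F, N_electron

# Charge-density-wave ion configuration at half filling.
F, Ne = quantum_free_energy([1, 0, 1, 0, 1, 0, 1, 0])
print(F, Ne)  # by particle-hole symmetry Ne is exactly N/2 = 4 here
```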

    -
    -

    -

    Next Section: Lattice -Generation

    +\] where \(\mathcal{Z} = \sum_S +e^{-\beta H(S)}\) is the normalisation factor and ubiquitous +partition function. In principle we could directly sample from this, for +a discrete system there are finitely many choices. We could calculate +the probability of each one and assign each a region of the unit +interval which we could then sample uniformly from.

    +

However if we actually try to do this we run into two problems. First, we can’t calculate \(\mathcal{Z}\) for any reasonably sized system because the state space grows exponentially with system size. Second, even if we could calculate \(\mathcal{Z}\), sampling from an exponentially large number of options quickly becomes tricky. This kind of problem arises in many other disciplines too, particularly when fitting statistical models using Bayesian inference  [2].
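For a toy system the direct construction is actually possible; a sketch for a tiny open Ising chain (hypothetical parameters; the \(2^N\)-entry weight table is exactly what becomes impossible at realistic sizes):

```python
import itertools
import math
import random

random.seed(0)
N, beta = 8, 1.0

def H(spins):
    # Open-boundary Ising chain energy with J = 1.
    return -sum(a * b for a, b in zip(spins, spins[1:]))

# The full 2^N table of states and Boltzmann weights -- this is the
# object that cannot be built for large N.
states = list(itertools.product([-1, 1], repeat=N))
weights = [math.exp(-beta * H(s)) for s in states]

# random.choices implements the "slice of the unit interval" idea.
sample = random.choices(states, weights=weights, k=10_000)
mean_E = sum(H(s) for s in sample) / len(sample)
print(mean_E)  # close to the exact -(N-1) tanh(beta) ≈ -5.33
```

These samples are independent and identically distributed, so no burn-in or auto-correlation analysis is needed; the price is the exponential table.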

    +
    +

    MCMC Sampling

    +

So what can we do? Well, it turns out that if we’re willing to give up the requirement that the samples be uncorrelated, then we can use MCMC instead.

    +

    MCMC defines a weighted random walk over the states \((S_0, S_1, S_2, ...)\), such that in the +long time limit, states are visited according to their probability \(p(S)\) [35].   [6]

    +

    \[\lim_{i\to\infty} p(S_i) = +P(S)\]

    +

In a physics context this lets us evaluate any observable as a mean over the states visited by the walk: \[\begin{aligned}
\langle O \rangle & = \sum_{S} p(S) \langle O \rangle_{S} = \frac{1}{M}\sum_{i = 1}^{M} \langle O\rangle_{S_i} + \mathcal{O}(\tfrac{1}{\sqrt{M}})\\
\end{aligned}\]

    +

The samples in the random walk are correlated, so they effectively contain less information than \(N\) independent samples would. As a consequence the variance is larger than the \(\langle O^2 \rangle - \langle O\rangle^2\) form it would have if the estimates were uncorrelated. Methods of estimating the true variance of \(\langle O \rangle\) and deciding how many steps are needed will be considered later.

    +
    +
    +

    Implementation of MCMC

    +

In implementation, MCMC boils down to choosing a transition function \(\mathcal{T}(S_{t} \to S_{t+1})\), where the \(S\) are vectors representing classical spin configurations. We start in some initial state \(S_0\) and then repeatedly jump to new states according to the probabilities given by \(\mathcal{T}\). This defines a set of random walks \(\{S_0 \ldots S_i \ldots S_N\}\).

    +

    In pseudo-code one could write the MCMC simulation for a single +walker as:

    +
current_state = initial_state

for i in range(N_steps):
    # jump directly to the next state according to T, then record it
    current_state = sample_T(current_state)
    states[i] = current_state

    Where the sample_T function samples directly from the +transition function \(\mathcal{T}\).

    +

If we run many such walkers in parallel we can then approximate the distribution \(p_t(S; S_0)\), which tells us where the walkers are likely to be after they have evolved for \(t\) steps from an initial state \(S_0\). We need to choose \(\mathcal{T}\) carefully such that the probability \(p_t(S; S_0)\) approaches the distribution of interest, in this case the thermal distribution \(P(S; \beta) = \mathcal{Z}^{-1} e^{-\beta F(S)}\).

    +
    +
    +

    Global and Detailed balance equations

    +

We can quite easily write down the properties that \(\mathcal{T}\) must have in order to yield the correct target distribution. Since we must transition somewhere at each step, we first have the normalisation condition \[\sum\limits_S \mathcal{T}(S' \rightarrow S) = 1.\]

    +

    Second, let us move to an ensemble view, where rather than individual +walkers and states, we think about the probability distribution of many +walkers at each step. If we start all the walkers in the same place the +initial distribution will be a delta function and as we step the walkers +will wander around, giving us a sequence of probability distributions +\(\{p_0(S), p_1(S), p_2(S)\ldots\}\). +For discrete spaces we can write the action of the transition function +on \(p_i\) as a matrix equation

    +

    \[\begin{aligned} +p_{i+1}(S) &= \sum_{S' \in \{S\}} p_i(S') \mathcal{T}(S' +\rightarrow S) +\end{aligned}\]

    +

    This equation is essentially just stating that total probability mass +is conserved as our walkers flow around the state space.
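To make this concrete, here is a small sketch of the ensemble update rule, using an invented three-state transition matrix (the numbers are for illustration only): a column-stochastic matrix acts on a delta-function initial distribution, and the total probability mass is conserved at every step.

```python
import numpy as np

# A made-up 3-state transition matrix: T[i, j] = probability of jumping
# from state j to state i, so each column sums to one.
T = np.array([
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.1, 0.8],
])

# Start all walkers in state 0: a delta-function distribution.
p = np.array([1.0, 0.0, 0.0])

for _ in range(1000):
    p = T @ p                      # p_{i+1}(S) = sum_{S'} p_i(S') T(S' -> S)
    assert np.isclose(p.sum(), 1)  # probability mass is conserved

print(p)  # the long-time distribution, a fixed point of T
```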

    +

    In order that \(p_i\) converges to +our target distribution \(p\) in the +long time limit, we need the target distribution to be a fixed point of +the transition function

    +

\[\begin{aligned}
P(S) &= \sum_{S'} P(S') \mathcal{T}(S' \rightarrow S)
\end{aligned}
\] Along with some more technical considerations, such as ergodicity, which won't be considered here, global balance suffices to ensure that an MCMC method is correct  [7].

    +

    A sufficient but not necessary condition for global balance to hold +is called detailed balance:

    +

    \[ +P(S) \mathcal{T}(S \rightarrow S') = P(S') \mathcal{T}(S' +\rightarrow S) +\]

    +

    In practice most algorithms are constructed to satisfy detailed +rather than global balance, though there are arguments that the relaxed +requirements of global balance can lead to faster algorithms  [8].
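As a quick numerical illustration (the three state energies below are invented), one can construct a Metropolis-style transition matrix for a tiny system and check that it satisfies detailed balance, and hence that the target Boltzmann distribution is a fixed point:

```python
import numpy as np

beta = 1.0
E = np.array([0.0, 0.5, 1.2])          # made-up energies of three states
P = np.exp(-beta * E)
P /= P.sum()                            # target Boltzmann distribution

n = len(E)
T = np.zeros((n, n))                    # T[b, a] = probability of a -> b
for a in range(n):
    for b in range(n):
        if a != b:
            # propose uniformly among the other states,
            # accept with probability min(1, P_b / P_a)
            T[b, a] = (1 / (n - 1)) * min(1.0, P[b] / P[a])
    T[a, a] = 1.0 - T[:, a].sum()       # rejected proposals stay put

# Detailed balance: P(a) T(a -> b) == P(b) T(b -> a) for every pair
for a in range(n):
    for b in range(n):
        assert np.isclose(P[a] * T[b, a], P[b] * T[a, b])

# Global balance follows: the target distribution is a fixed point of T
assert np.allclose(T @ P, P)
```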

    +

The goal of MCMC is then to choose \(\mathcal{T}\) so that it has the desired thermal distribution \(P(S)\) as its fixed point and converges quickly onto it. This boils down to requiring that the matrix \(T_{ij} = \mathcal{T}(S_i \to S_j)\) has an eigenvector with entries \(P_i = P(S_i)\) and eigenvalue 1, with all other eigenvalues of magnitude less than one. The convergence time depends on the magnitude of the second largest eigenvalue.

    +

    The choice of the transition function for MCMC is under-determined as +one only needs to satisfy a set of balance conditions for which there +are many solutions  [7]. The standard choice that satisfies +these requirements is called the Metropolis-Hastings algorithm.

    +
    +
    +

    The Metropolis-Hastings Algorithm

    +

The Metropolis-Hastings algorithm breaks the transition function into a proposal distribution \(q(S \to S')\) and an acceptance function \(\mathcal{A}(S \to S')\). \(q\) must be a distribution we can directly sample from, and in many cases takes the form of flipping some number of spins in \(S\); i.e., if we're flipping a single random spin in the spin chain, \(q(S \to S')\) is the uniform distribution on states reachable by one spin flip from \(S\). This also gives the symmetry property \(q(S \to S') = q(S' \to S)\).

    +

The proposal \(S'\) is then accepted or rejected with an acceptance probability \(\mathcal{A}(S \to S')\); if the proposal is rejected then \(S_{i+1} = S_{i}\). Hence:

    +

    \[\mathcal{T}(S\to S') = q(S\to +S')\mathcal{A}(S \to S')\]

    +

The Metropolis-Hastings algorithm is a slight extension of the original Metropolis algorithm which allows for non-symmetric proposal distributions \(q(S \to S') \neq q(S' \to S)\). It can be derived starting from detailed balance  [6]:

    +

    \[ +P(S)\mathcal{T}(S \to S') = P(S')\mathcal{T}(S' \to S) +\]

    +

    inserting the proposal and acceptance function

    +

    \[ +P(S)q(S \to S')\mathcal{A}(S \to S') = P(S')q(S' \to +S)\mathcal{A}(S' \to S) +\]

    +

    rearranging gives us a condition on the acceptance function in terms +of the target distribution and the proposal distribution which can be +thought of as inputs to the algorithm

    +

    \[ +\frac{\mathcal{A}(S \to S')}{\mathcal{A}(S' \to S)} = +\frac{P(S')q(S' \to S)}{P(S)q(S \to S')} = f(S, S') +\]

    +

    The Metropolis-Hastings algorithm is the choice

    +

    \[ +\begin{aligned} +\label{eq:mh} +\mathcal{A}(S \to S') = \min\left(1, f(S,S')\right) +\end{aligned} +\] for the acceptance function. The proposal distribution is left +as a free choice.

    +

Noting that \(f(S,S') = 1/f(S',S)\), we can see that the MH algorithm satisfies detailed balance by considering the two cases \(f(S,S') > 1\) and \(f(S,S') < 1\).

    +

    By choosing the proposal distribution such that \(f(S,S')\) is as close as possible to +one, the rate of rejections can be reduced and the algorithm sped up. +This can be challenging though, as getting \(f(S,S')\) close to 1 would imply that +we can directly sample from a distribution very close to the target +distribution. As MCMC is usually applied to problems for which there is +virtually no hope of sampling directly from the target distribution, +it’s rare that one can do so approximately.

    +

    When the proposal distribution is symmetric as ours is, it cancels +out in the expression for the acceptance function and the +Metropolis-Hastings algorithm is simply the choice

    +

    \[\mathcal{A}(S \to S') = \min\left(1, +e^{-\beta\;\Delta F}\right)\]

    +

    where \(F\) is the overall free +energy of the system, including both the quantum and classical +sector.

    +
    +
    +

    Implementation of the MH Algorithm

    +

    To implement the acceptance function in practice we pick a random +number in the unit interval and accept if it is less than \(e^{-\beta\;\Delta F}\):

    +
    current_state = initial_state
    +
    +for i in range(N_steps):
    +    new_state = proposal(current_state)
    +    df = free_energy_change(current_state, new_state, parameters)
    +
    +    if uniform(0,1) < exp(-beta * df):
    +        current_state = new_state
    +        
    +    states[i] = current_state
    +
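The pseudo-code above can be made runnable. The following is a self-contained sketch that substitutes a classical 1D Ising chain for the full quantum-classical free energy; the Ising stand-in and all parameter values are my own illustrative choices, not the model studied in this thesis.

```python
import numpy as np

rng = np.random.default_rng(0)

def energy(spins, J=1.0):
    # classical 1D Ising energy with periodic boundaries
    return -J * np.sum(spins * np.roll(spins, 1))

def mh_chain(n_sites=50, n_steps=10_000, beta=1.0):
    state = rng.choice([-1, 1], size=n_sites)
    states = np.empty((n_steps, n_sites), dtype=int)
    for i in range(n_steps):
        # symmetric proposal: flip one random spin
        site = rng.integers(n_sites)
        new = state.copy()
        new[site] *= -1
        dE = energy(new) - energy(state)
        # Metropolis acceptance: always take downhill moves,
        # sometimes take uphill ones
        if rng.uniform() < np.exp(-beta * dE):
            state = new
        states[i] = state
    return states

states = mh_chain()
# at beta = 1 the chain orders locally: the mean energy per site
# of the late samples is well below zero
print(np.mean([energy(s) for s in states[-5000:]]) / 50)
```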
    +
    +

    This has the effect of always accepting proposed states that are +lower in energy and sometimes accepting those that are higher in energy +than the current state.

    +
    +
    +
    +

    Two Step Trick

    +

Our method already relies heavily on the split between the classical and quantum sectors to derive a sign-problem-free MCMC algorithm, but it turns out that there is a further trick we can play. The free energy is the sum of an easy-to-compute classical energy and a more expensive quantum free energy, so we can split the acceptance function in two in such a way as to avoid having to compute the full exact diagonalisation some of the time:

    +
current_state = initial_state

for i in range(N_steps):
    new_state = proposal(current_state)

    # cheap classical test first
    df_classical = classical_free_energy_change(current_state, new_state, parameters)
    if uniform(0,1) < exp(-beta * df_classical):
        # only now pay for the expensive diagonalisation
        df_quantum = quantum_free_energy_change(current_state, new_state, parameters)

        if uniform(0,1) < exp(-beta * df_quantum):
            current_state = new_state

    states[i] = current_state
    +
    +
    +

    Detailed Balance for the two step method

    +

Given an MCMC algorithm with target distribution \(\pi(a)\) and transition function \(\mathcal{T}\), the detailed balance condition is sufficient (along with some technical constraints  [5]) to guarantee that in the long time limit the algorithm produces samples from \(\pi\). \[\pi(a)\mathcal{T}(a \to b) = \pi(b)\mathcal{T}(b \to a)\]

    +

    In pseudo-code, our two step method corresponds to two nested +comparisons with the majority of the work only occurring if the first +test passes:

    +
current_state = initial_state

for i in range(N_steps):
  new_state = proposal(current_state)

  c_dE = classical_energy_change(current_state, new_state)
  if uniform(0,1) < exp(-beta * c_dE):
    q_dF = quantum_free_energy_change(current_state, new_state)
    if uniform(0,1) < exp(-beta * q_dF):
      current_state = new_state

  states[i] = current_state
    +

Defining \(r_c = e^{-\beta \Delta H_c}\) and \(r_q = e^{-\beta \Delta F_q}\) for the changes from \(a\) to \(b\), the ratio of target probabilities is \(\pi(b)/\pi(a) = r_c r_q\). This method has \(\mathcal{T}(a\to b) = q(a\to b)\mathcal{A}(a \to b)\) with symmetric \(q(a \to b) = q(b \to a)\) and \(\mathcal{A} = \min\left(1, r_c\right) \min\left(1, r_q\right)\).

    +

    Substituting this into the detailed balance equation gives: \[\mathcal{T}(a \to b)/\mathcal{T}(b \to a) = +\pi(b)/\pi(a) = r_c r_q\]

    +

    Taking the LHS and substituting in our transition function: \[\begin{aligned} +\mathcal{T}(a \to b)/\mathcal{T}(b \to a) = \frac{\min\left(1, +r_c\right) \min\left(1, r_q\right)}{ \min\left(1, 1/r_c\right) +\min\left(1, 1/r_q\right)}\end{aligned}\]

    +

    which simplifies to \(r_c r_q\) as +\(\min(1,r)/\min(1,1/r) = r\) for \(r > 0\).
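The identity \(\min(1,r)/\min(1,1/r) = r\) is also easy to verify numerically with a throwaway check:

```python
import random

random.seed(0)
for _ in range(1000):
    r = random.uniform(1e-6, 1e6)
    # for r > 1 this is r / 1; for r < 1 it is r / 1 again after cancelling
    lhs = min(1, r) / min(1, 1 / r)
    assert abs(lhs - r) / r < 1e-12
```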

    +
    +

    +
    +
    +

    Auto-correlation Time

    +
    +
    +
    +

At this stage one might think we're done: we can indeed draw independent samples from \(P(S; \beta)\) by starting from some arbitrary initial state and doing \(k\) steps to arrive at a sample. A key insight, however, is that after the convergence time, every state generated is a sample from \(P(S; \beta)\)! They are not, though, independent samples. In Fig. ?? it is already clear that the samples of the order parameter \(m\) have some auto-correlation, because only a few spins are flipped each step, but even when the number of spins flipped per step is increased, Fig. autocorrelation shows that it can be an important effect near the phase transition. Let's define the auto-correlation time \(\tau(O)\) informally as the number of MCMC samples of some observable \(O\) that are statistically equal to one independent sample, or equivalently as the number of MCMC steps after which the samples are correlated below some cut-off; see  [9] for a more rigorous definition involving a sum over the auto-correlation function. The auto-correlation time is generally shorter than the convergence time, so it makes sense from an efficiency standpoint to run a single walker for many MCMC steps rather than to run a huge ensemble for \(k\) steps each.

    +

Once the random walk has been carried out for many steps, the expectation value of \(O\) can be estimated from the MCMC samples \(S_i\): \[
 \langle O \rangle = \frac{1}{N}\sum_{i = 1}^{N} O(S_i) + \mathcal{O}(\tfrac{1}{\sqrt{N}})
\]

    +

The samples are correlated, so the \(N\) of them effectively contain less information than \(N\) independent samples would; in fact there are roughly \(N/\tau\) effective samples. As a consequence the variance is larger than the \(\langle O^2\rangle - \langle O\rangle^2\) form it would have if the estimates were uncorrelated. There are many methods in the literature for estimating the true variance of \(\langle O\rangle\) and deciding how many steps are needed, but my approach has been to run a small number of parallel chains, which are independent, in order to estimate the statistical error produced. This is slightly less computationally efficient because it requires throwing away the \(k\) steps generated before convergence multiple times, but it is a conceptually simple workaround.
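To illustrate the effective-sample count, the following sketch estimates \(\tau\) by summing the empirical auto-correlation function up to a cut-off; the AR(1) surrogate series stands in for correlated MCMC output and is my own invention, not data from this thesis.

```python
import numpy as np

def autocorrelation(x, lag):
    # normalised empirical auto-correlation at a given lag
    x = np.asarray(x) - np.mean(x)
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

def integrated_autocorrelation_time(x, max_lag=500, cutoff=0.05):
    # tau = 1 + 2 * sum of auto-correlations, truncated at a cut-off
    tau = 1.0
    for lag in range(1, max_lag):
        rho = autocorrelation(x, lag)
        if rho < cutoff:
            break
        tau += 2.0 * rho
    return tau

rng = np.random.default_rng(1)

# independent samples: roughly one effective sample per sample
iid = rng.normal(size=50_000)

# an AR(1) series mimics correlated MCMC output; phi = 0.9 gives
# a theoretical integrated time of (1 + phi) / (1 - phi) = 19
phi, ar = 0.9, np.empty(50_000)
ar[0] = 0.0
for t in range(1, len(ar)):
    ar[t] = phi * ar[t - 1] + rng.normal()

print(integrated_autocorrelation_time(iid))  # close to 1
print(integrated_autocorrelation_time(ar))   # much larger than 1
```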

    +

In summary, to do efficient simulations we want to reduce both the convergence time and the auto-correlation time as much as possible. In order to explain how, we need to introduce the Metropolis-Hastings (MH) algorithm and how it gives an explicit form for the transition function.

    +
    +
    +

    Tuning the proposal distribution

    +
    +
    +
    +

Now we can discuss how to minimise the auto-correlations. The general principle is that one must balance the proposal distribution between two extremes. Choose overly small steps, like flipping only a single spin, and the acceptance rate will be high because \(\Delta F\) will usually be small, but each state will be very similar to the previous one and the auto-correlations will be high too, making sampling inefficient. On the other hand, overly large steps, like randomising a large portion of the spins each step, will result in very frequent rejections, especially at low temperatures.

    +

    I evaluated a few different proposal distributions for use with the +FK model.

    +
1. Flipping a single random site.
2. Flipping \(N\) random sites, for some fixed \(N\).
3. Choosing \(n\) from Uniform(1, \(N\)) and then flipping \(n\) sites, for some fixed \(N\).
4. Attempting to tune the proposal distribution for each parameter regime.

From Figure \(\ref{fig:comparison}\) we see that even at moderately high temperatures \(T > T_c\), flipping one or two sites is the best choice. However, for some simulations at very high temperature flipping more spins is warranted. Tuning the proposal distribution automatically seems like it would not yield enough benefit to justify the additional complexity.

    +
    +
    +
    +

    Proposal Distributions

    +

In an MCMC method a key property is the proportion of the time that proposals are accepted: the acceptance rate. If this rate is too low, the random walk is trying to take overly large steps in energy space, which is problematic because very few new samples will be generated. If it is too high, it implies the steps are too small, which is a problem because the walk will take longer to explore the state space and the samples will be highly correlated. Ideal values for the acceptance rate can be calculated under certain assumptions  [10]. Here we monitor the acceptance rate and, if it is too high, re-run the MCMC with a modified proposal distribution that has a chance to propose moves that flip multiple sites at a time.

    +

    In addition we exploit the particle-hole symmetry of the problem by +occasionally proposing a flip of the entire state. This works because +near half-filling, flipping the occupations of all the sites will +produce a state at or near the energy of the current one.
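A sketch of such a mixture proposal is below; the function name, the two-site flip, and the 10% weight on the global particle-hole move are illustrative assumptions rather than the exact values used here.

```python
import numpy as np

rng = np.random.default_rng(0)

def proposal(state, n_flips=2, p_global=0.1):
    """Propose a new occupation configuration from a mixture:
    with probability p_global flip every site (particle-hole move),
    otherwise flip a few random sites. Both branches are symmetric,
    so q(S -> S') = q(S' -> S) still holds."""
    new = state.copy()
    if rng.uniform() < p_global:
        new = 1 - new                      # global particle-hole flip
    else:
        sites = rng.choice(len(state), size=n_flips, replace=False)
        new[sites] = 1 - new[sites]        # local flips
    return new

state = rng.integers(0, 2, size=20)
new = proposal(state)
# a proposal never changes the system size, only occupations
assert new.shape == state.shape and set(np.unique(new)) <= {0, 1}
```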

    +
    +
    +

    Choosing the proposal distribution

    +

Simulations showing how the autocorrelation of the order parameter depends on the proposal distribution used at different temperatures. We see that at \(T = 1.5 < T_c\) a single spin flip is likely the best choice, while at the high temperature \(T = 2.5 > T_c\) flipping two sites, or a mixture of flipping two and one sites, is likely a better choice.

    +


    +
    +
    +

    Perturbation MCMC

    +

The matrix diagonalisation is the most computationally expensive step of the process. A speed-up can be obtained by modifying the proposal distribution to depend on the classical part of the energy, a trick gleaned from Ref.  [6]: \[
\begin{aligned}
q(k \to k') &= \min\left(1, e^{-\beta (H_c^{k'} - H_c^k)}\right) \\
\mathcal{A}(k \to k') &= \min\left(1, e^{-\beta(F_q^{k'}- F_q^k)}\right)
\end{aligned}\] This allows the method to reject some states without performing the diagonalisation, at no cost to the accuracy of the MCMC method.

    +

An extension of this idea is to try to define a classical model with a similar free energy dependence on the classical state as the full quantum model; Ref.  [11] does this with restricted Boltzmann machines, whose form is very similar to a classical spin model.

    +
    +

    Convergence Time

    +

Considering \(p(S)\) as a vector \(\vec{p}\) whose \(j\)th entry is the probability of the \(j\)th state, \(p_j = p(S_j)\), and writing \(\mathcal{T}\) as the matrix with entries \(T_{ij} = \mathcal{T}(S_j \rightarrow S_i)\), we can write the update rule for the ensemble probability as \[\vec{p}_{t+1} = \mathcal{T} \vec{p}_t \implies \vec{p}_{t} = \mathcal{T}^t \vec{p}_0\] where \(\vec{p}_0\) is a vector which is one on the starting state and zero everywhere else. Since all states must transition to somewhere with probability one, \(\sum_i T_{ij} = 1\).

    +

Matrices that satisfy this are called stochastic matrices, exactly because they model these kinds of Markov processes. Their eigenvalues all have magnitude at most one; for an ergodic chain satisfying detailed balance they are real, and ordering them by magnitude, \(\lambda_0 = 1\) while \(|\lambda_{i\neq0}| < 1\).

    +

Assuming \(\mathcal{T}\) has been chosen correctly, its single eigenvector with eigenvalue 1 will be the thermal distribution, so repeated application of the transition function eventually leads there, while memory of the initial conditions decays exponentially with a convergence time \(k\) determined by \(\lambda_1\). In practice this means that one throws away the data from the beginning of the random walk in order to reduce the dependence on the initial conditions and be close enough to the target distribution.
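This exponential decay can be seen directly by diagonalising a small stochastic matrix (the matrix below is invented for illustration): the distance to the fixed point shrinks by a factor of \(\lambda_1\) per step.

```python
import numpy as np

# a made-up column-stochastic matrix: each column sums to one
T = np.array([
    [0.5, 0.2, 0.1],
    [0.3, 0.6, 0.3],
    [0.2, 0.2, 0.6],
])

eigvals, eigvecs = np.linalg.eig(T)
order = np.argsort(-np.abs(eigvals))
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pi = np.real(eigvecs[:, 0])
pi /= pi.sum()                       # stationary distribution, eigenvalue 1
lam1 = np.abs(eigvals[1])            # second eigenvalue sets convergence

p = np.array([1.0, 0.0, 0.0])        # delta-function start
dists = []
for _ in range(30):
    p = T @ p
    dists.append(np.linalg.norm(p - pi))

# after transients, the distance shrinks by ~lam1 each step
ratios = np.array(dists[20:]) / np.array(dists[19:-1])
print(lam1, ratios.mean())
```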

    +
    +
    +
    +
    +


    +
    +
    +
    +

    Bibliography

    +
    +
    +
    [1]
    L. +Devroye, Random +Sampling, in Non-Uniform Random Variate Generation, +edited by L. Devroye (Springer, New York, NY, 1986), pp. 611–641.
    +
    +
    +
    [2]
O. A. Martin, R. Kumar, and J. Lao, Bayesian Modeling and Computation in Python (Chapman and Hall/CRC, Boca Raton, 2021).
    +
    +
    +
    [3]
    K. +Binder and D. W. Heermann, Guide to Practical +Work with the Monte Carlo Method, in Monte Carlo Simulation +in Statistical Physics: An Introduction, edited by K. Binder and D. +W. Heermann (Springer Berlin Heidelberg, Berlin, Heidelberg, 1988), pp. +68–112.
    +
    +
    +
    [4]
    J. +Kertesz and I. Kondor, editors, Advances in Computer +Simulation: Lectures Held at the Eötvös Summer School in Budapest, +Hungary, 16–20 July 1996 (Springer-Verlag, Berlin Heidelberg, +1998).
    +
    +
    +
    [5]
    U. +Wolff, Monte +Carlo Errors with Less Errors, Computer Physics Communications +156, 143 (2004).
    +
    +
    +
    [6]
    W. +Krauth, Introduction To +Monte Carlo Algorithms, in Advances in Computer Simulation: +Lectures Held at the Eötvös Summer School in Budapest, Hungary, 16–20 +July 1996 (Springer-Verlag, Berlin Heidelberg, 1998).
    +
    +
    +
    [7]
F. P. Kelly, Reversibility and Stochastic Networks (Wiley, Chichester, 1979).
    +
    +
    +
    [8]
    S. +C. Kapfer and W. Krauth, Sampling from a +Polytope and Hard-Disk Monte Carlo, J. Phys.: Conf. Ser. +454, 012031 (2013).
    +
    +
    +
    [9]
    W. +Krauth, Introduction +To Monte Carlo Algorithms, arXiv:cond-Mat/9612186 (1996).
    +
    +
    +
    [10]
    G. +O. Roberts, A. Gelman, and W. R. Gilks, Weak Convergence and +Optimal Scaling of Random Walk Metropolis Algorithms, Ann. +Appl. Probab. 7, 110 (1997).
    +
    +
    +
    [11]
    L. +Huang and L. Wang, Accelerated Monte +Carlo Simulations with Restricted Boltzmann Machines, Phys. +Rev. B 95, 035105 (2017).
    +
    +
    diff --git a/_thesis/6_Appendices/A.3_Lattice_Generation.html b/_thesis/6_Appendices/A.3_Lattice_Generation.html index 704c3aa..e89d427 100644 --- a/_thesis/6_Appendices/A.3_Lattice_Generation.html +++ b/_thesis/6_Appendices/A.3_Lattice_Generation.html @@ -57,7 +57,7 @@ Generation

    Next Section: Lattice +href="../6_Appendices/A.4_Lattice_Colouring.html">Lattice Colouring

    diff --git a/_thesis/toc.html b/_thesis/toc.html index f6e68a8..238d775 100644 --- a/_thesis/toc.html +++ b/_thesis/toc.html @@ -1,35 +1,43 @@