diff --git a/_sass/thesis.scss b/_sass/thesis.scss index 5fc5322..7fcd559 100644 --- a/_sass/thesis.scss +++ b/_sass/thesis.scss @@ -88,5 +88,11 @@ nav.overall-table-of-contents > ul { // Page header div#page-header { + //make the header sticky, I don't really like how this looks but it's fun to play with + // position: sticky; + // top: 0px; + // background: white; + // z-index: 10; + // width: 100%; p { margin-block-end: 0px;} } \ No newline at end of file diff --git a/_thesis/1_Introduction/1_Intro.html b/_thesis/1_Introduction/1_Intro.html index a6efefb..2f5a923 100644 --- a/_thesis/1_Introduction/1_Intro.html +++ b/_thesis/1_Introduction/1_Intro.html @@ -595,8 +595,7 @@ to the Falikov-Kimball Model, the Kitaev Honeycomb Model, disorder and localisation. Then Chapter 3 introduces and studies the Long Range Falikov-Kimball Model in one dimension while Chapter 4 focusses on the Amorphous Kitaev Model.
-Next Chapter: 2
+ Next Chapter: 2
Background
Next Section: The -Kitaev Honeycomb Model
+Next Section: The Kitaev +Honeycomb Model
Next Section: Disorder
+ Next Section: Disorder
and Localisation
-link to the Kitaev Model
-link to the physics of amorphous systems
Next Chapter: 3 -The Long Range Falikov-Kimball Model
+href="../3_Long_Range_Falikov_Kimball/3.1_LRFK_Model.html">3 The Long +Range Falikov-Kimball ModelNext Section: Methods
+href="../3_Long_Range_Falikov_Kimball/3.2_LRFK_Methods.html">Methods3 The Long Range Falikov-Kimball Model
+The results for the phase diagram were obtained with a classical +Markov Chain Monte Carlo (MCMC) method which we discuss in the +following. It allows us to solve our long-range FK model efficiently, +yielding unbiased estimates of thermal expectation values and linking it +to disorder physics in a translationally invariant setting.
+Since the spin configurations are classical, the Hamiltonian can be +split into a classical spin part \(H_s\) and an operator valued part \(H_c\).
+\[\begin{aligned} +H_s& = - \frac{U}{2}S_i + \sum_{i, j}^{N} J_{ij} S_i S_j \\ +H_c& = \sum_i U S_i c^\dagger_{i}c_{i} -t(c^\dagger_{i}c_{i+1} + +c^\dagger_{i+1}c_{i}) \end{aligned}\]
+The partition function can then be written as a sum over spin +configurations, \(\vec{S} = (S_0, +S_1...S_{N-1})\):
+\[\begin{aligned} +\mathcal{Z} = \mathrm{Tr} e^{-\beta H}= \sum_{\vec{S}} e^{-\beta H_s} +\mathrm{Tr}_c e^{-\beta H_c} .\end{aligned}\]
+The contribution of \(H_c\) to the +grand canonical partition function can be obtained by performing the sum +over eigenstate occupation numbers giving \(-\beta F_c[\vec{S}] = \sum_k \ln{(1 + e^{- \beta +\epsilon_k})}\) where \({\epsilon_k[\vec{S}]}\) are the eigenvalues +of the matrix representation of \(H_c\) +determined through exact diagonalisation. This gives a partition +function containing a classical energy which corresponds to the +long-range interaction of the spins, and a free energy which corresponds +to the quantum subsystem. \[\begin{aligned} +\mathcal{Z} = \sum_{\vec{S}} e^{-\beta H_S[\vec{S}] - \beta +F_c[\vec{S}]} = \sum_{\vec{S}} e^{-\beta +E[\vec{S}]}\end{aligned}\]
Markov Chain Monte Carlo (MCMC) is a useful method whenever we have a -probability distribution that we want to sample from but there is not -direct sampling way to do so.
-In almost any computer simulation the ultimate source of randomness -is a stream of (close to) uniform, uncorrelated bits generated from a -pseudo random number generator. A direct sampling method takes such a -source and outputs uncorrelated samples from the target distribution. -The fact they’re uncorrelated is key as we’ll see later. Examples of -direct sampling methods range from the trivial: take n random bits to -generate integers uniformly between 0 and \(2^n\) to more complex methods such as -inverse transform sampling and rejection sampling [1].
-In physics the distribution we usually want to sample from is the -Boltzmann probability over states of the system \(S\): \[ -\begin{aligned} -p(S) &= \frac{1}{\mathcal{Z}} e^{-\beta H(S)} \\ -\end{aligned} -\] where \(\mathcal{Z} = \sum_S -e^{-\beta H(S)}\) is the normalisation factor and ubiquitous -partition function. In principle we could directly sample from this, for -a discrete system there are finitely many choices. We could calculate -the probability of each one and assign each a region of the unit -interval which we could then sample uniformly from.
-However if we actually try to do this we will run into two problems, -we can’t calculate \(\mathcal{Z}\) for -any reasonably sized systems because the state space grows exponentially -with system size. Even if we could calculate \(\mathcal{Z}\), sampling from an -exponentially large number of options quickly become tricky. This kind -of problem happens in many other disciplines too, particularly when -fitting statistical models using Bayesian inference [2].
-So what can we do? Well it turns out that if we’re willing to give up -in the requirement that the samples be uncorrelated then we can use MCMC -instead.
-MCMC defines a weighted random walk over the states \((S_0, S_1, S_2, ...)\), such that in the
-long time limit, states are visited according to their probability \(p(S)\).
+ Classical MCMC defines a weighted random walk over the spin states
+\((\vec{S}_0, \vec{S}_1, \vec{S}_2,
+...)\), such that the likelihood of visiting a particular state
+converges to its Boltzmann probability \(p(\vec{S}) = \mathcal{Z}^{-1} e^{-\beta
+E}\) [3–1–5]. In a physics context this lets us evaluate any observable with a mean
-over the states visited by the walk. 3]. Hence, any observable can be
+estimated as a mean over the states visited by the walk, \[\begin{aligned}
-\langle O \rangle & = \sum_{S} p(S) \langle O \rangle_{S} = \sum_{i
-= 0}^{M} \expval{O}_{S_i} + \mathcal{O}(\tfrac{1}{\sqrt{M}})\\
+\label{eq:thermal_expectation}
+\langle O \rangle & = \sum_{\vec{S}} p(\vec{S}) \langle O
+\rangle_{\vec{S}}\\
+ & = \sum_{i = 0}^{M} \langle O\rangle_{\vec{S}_i}
+\pm \mathcal{O}(\tfrac{1}{\sqrt{M}})
+\end{aligned}\] where the former sum runs over the entire state
+space while the later runs over all the state visited by a particular
+MCMC run. \[\begin{aligned}
+\langle O \rangle_{\vec{S}}& = \sum_{\nu} n_F(\epsilon_{\nu})
+\langle O \rangle{\nu}
\end{aligned}\] Where \(\nu\) runs over the
+eigenstates of \(H_c\) for a particular
+spin configuration and \(n_F(\epsilon) =
+\left(e^{-\beta\epsilon} + 1\right)^{-1}\) is the Fermi
+function. The choice of the transition function for MCMC is under-determined as
one only needs to satisfy a set of balance conditions for which there
are many solutions [6].Markov Chain Monte Carlo and Emergent Disorder
+
In our computations [7] we +employ a modification of the algorithm which is based on the observation +that the free energy of the FK system is composed of a +classical part which is much quicker to compute than the quantum part. +Hence, we can obtain a computational speedup by first considering the +value of the classical energy difference \(\Delta H_s\) and rejecting the transition +if the former is too high. We only compute the quantum energy difference +\(\Delta F_c\) if the transition is +accepted. We then perform a second rejection sampling step based upon +it. This corresponds to two nested comparisons with the majority of the +work only occurring if the first test passes and has the acceptance +function \[\mathcal{A}(a \to b) = +\min\left(1, e^{-\beta \Delta H_s}\right)\min\left(1, e^{-\beta \Delta +F_c}\right)\;.\]
+See Appendix [app:balance] for a proof that this +satisfies the detailed balance condition.
+For the model parameters used in Fig. 1, we find +that with our new scheme the matrix diagonalisation is skipped around +30% of the time at \(T = 2.5\) and up +to 80% at \(T = 1.5\). We observe that +for \(N = 50\), the matrix +diagonalisation, if it occurs, occupies around 60% of the total +computation time for a single step. This rises to 90% at N = 300 and +further increases for larger N. We therefore get the greatest speedup +for large system sizes at low temperature where many prospective +transitions are rejected at the classical stage and the matrix +computation takes up the greatest fraction of the total computation +time. The upshot is that we find a speedup of up to a factor of 10 at +the cost of very little extra algorithmic complexity.
+Our two-step method should be distinguished from the more common +method for speeding up MCMC which is to add asymmetry +to the proposal distribution to make it as similar as possible to \(\min\left(1, e^{-\beta \Delta E}\right)\). +This reduces the number of rejected states, which brings the algorithm +closer in efficiency to a direct sampling method. However it comes at +the expense of requiring a way to directly sample from this complex +distribution, a problem which MCMC was employed to solve in +the first place. For example, recent work trains restricted Boltzmann +machines (RBMs) to generate samples for the proposal distribution of the +FK model [8]. The RBMs are chosen as a +parametrisation of the proposal distribution that can be efficiently +sampled from while offering sufficient flexibility that they can be +adjusted to match the target distribution. Our proposed method is +considerably simpler and does not require training while still reaping +some of the benefits of reduced computation.
Markov Chain Monte Carlo (MCMC) is a technique for evaluating thermal
-expectation values \(\expval{O}\) with
-respect to some physical system defined by a set of states \(\{x: x \in S\}\) and a free energy \(F(x)\) [
+ Here, we incorporate a modification to the standard
+Metropolis-Hastings algorithm [5] gleaned from Krauth [7]. The thermal expectation value is
-defined via a Boltzmann weighted sum over the entire states: \[
-\begin{aligned}
- \expval{O} &= \frac{1}{\mathcal{Z}} \sum_{x \in S} O(x) P(x) \\
- P(x) &= \frac{1}{\mathcal{Z}} e^{-\beta F(x)} \\
- \mathcal{Z} &= \sum_{x \in S} e^{-\beta F(x)}
-\end{aligned}
-\] When the state space is too large to evaluate this sum directly, MCMC
-defines a stochastic algorithm which generates a random walk \(\{x_0\ldots x_i\ldots x_N\}\) whose
-distribution \(p(x_i)\) approaches a
-target distribution \(P(x)\) in the
-large N limit. \[\lim_{i\to\infty} p(x_i) =
-P(x)\] In this case the target distribution will be the thermal one \(P(x) \rightarrow \mathcal{Z}^{-1} e^{-\beta
-F(x)}\). The major benefit of the method being that as long as
-one can express the desired \(P(x)\) up
-to a multiplicative constant, MCMC can be applied. Since \(e^{-\beta F(x)}\) is relatively easy to
-evaluate, MCMC provides a useful method for finite temperature
-physics. Once the random walk has been carried out for many steps, the
-expectation values of \(O\) can be
-estimated from the MCMC samples: \[
- \expval{O} = \sum_{i = 0}^{N} O(x_i) +
-\mathcal{O}(\frac{1}{\sqrt{N}})
-\] The the samples in the random walk are correlated so the
-samples effectively contain less information than \(N\) independent samples would. As a
-consequence the variance is larger than the \(\expval{O^2} - \expval{O}^2\) form it would
-have if the estimates were uncorrelated. Methods of estimating the true
-variance of \(\expval{O}\) and decided
-how many steps are needed will be considered later.Two Step Trick
+
Markov chains are defined by a transition function $(x_{i+1} x_i) $ -giving the probability that a chain in state \(x_i\) at time \(i\) will transition to a state \(x_{i+1}\). Since we must transition -somewhere at each step, this comes with the normalisation condition that -\(\sum\limits_x \mathcal{T}(x' \rightarrow -x) = 1\).
-If we define an ensemble of Markov chains and consider the -distributions we get a sequence of distributions \(\{p_0(x), p_1(x), p_2(x)\ldots\}\) with -\[\begin{aligned} -p_{i+1}(x) &= \sum_{x' \in S} p_i(x') \mathcal{T}(x' -\rightarrow x) -\end{aligned}\] \(p_o(x)\) might -be a delta function on one particular starting state which would then be -smoothed out by the transition function repeatedly.
-As we’d like to draw samples from the target distribution \(P(x)\) the trick is to choose $(x_{i+1} -x_i) $ such that :
-\[\begin{aligned} -P(x) &= \sum_{x'} P(x') \mathcal{T}(x' \rightarrow x) -\end{aligned} -\] In other words the MCMC dynamics defined by \(\mathcal{T}\) must be constructed to have -the target distribution as their only fixed point. This condition is -called the global balance condition. Along with some more technical -considerations such as ergodicity which won’t be considered here, global -balance suffices to ensure that a MCMC method is correct.
-A sufficient but not necessary condition for global balance to hold -is detailed balance:
-\[ -P(x) \mathcal{T}(x \rightarrow x') = P(x') \mathcal{T}(x' -\rightarrow x) -\] % In practice most algorithms are constructed to satisfy -detailed balance though there are arguments that relaxing the condition -can lead to faster algorithms [8].
-The goal of MCMC is then to choose \(\mathcal{T}\) so that it has the desired -thermal distribution \(P(x)\) as its -fixed point and that it converges quickly onto it. This boils down to -requiring that the matrix representation of \(T_{ij} = \mathcal{T}(x_i \to x_j)\) has an -eigenvector equal to \(P_i = P(x_i)\) -with eigenvalue 1 and all other eigenvalues with magnitude less than -one. The convergence time depends on the magnitude of the second largest -eigenvalue.
-In order to actually choose new states according to \(\mathcal{T}\) one chooses states from a -proposal distribution \(q(x_i \to -x')\) that can be directly sampled from. For instance, this -might mean flipping a single random spin in a spin chain, in which case -\(q(x'\to x_i)\) is the uniform -distribution on states reachable by one spin flip from \(x_i\). The proposal \(x'\) is then accepted or rejected with -an acceptance probability \(\mathcal{A}(x'\to x_{i+1})\), if the -proposal is rejected then \(x_{i+1} = -x_{i}\). Now \(\mathcal{T}(x\to x') -= q(x\to x')\mathcal{A}(x \to x')\).
-The Metropolis-Hasting algorithm is a slight extension of the -original Metropolis algorithm that allows for non-symmetric proposal -distributions $q(xx’) q(x’x) $. It can be derived starting from detailed -balance [7]: \[\begin{aligned} -P(x)\mathcal{T}(x \to x') &= P(x')\mathcal{T}(x' \to x) -\\ -P(x)q(x \to x')\mathcal{A}(x \to x') &= P(x')q(x' -\to x)\mathcal{A}(x' \to x) \\ -\label{eq:db2} \frac{\mathcal{A}(x \to x')}{\mathcal{A}(x' \to -x)} &= \frac{P(x')q(x' \to x)}{P(x)q(x \to x')} = f(x, -x')\\ -\end{aligned} -\] % The Metropolis-Hastings algorithm is the choice: \[ -\begin{aligned} -\label{eq:mh} -\mathcal{A}(x \to x') = \min\left(1, f(x,x')\right) -\end{aligned} -\] % Noting that \(f(x,x') = -1/f(x',x)\), Eq. \(\ref{eq:mh}\) can be seen to satisfy Eq. -\(\ref{eq:db2}\) by considering the two -cases \(f(x,x') > 1\) and \(f(x,x') < 1\).
-By choosing the proposal distribution such that \(f(x,x')\) is as close as possible to -one, the rate of rejections can be reduced and the algorithm sped -up.
-%Thinning, burn in, multiple runs
-In a MCMC method a key property is the proportion of the time that -proposals are accepted, the acceptance rate. If this rate is too low the -random walk is trying to take overly large steps in energy space which -problematic because it means very few new samples will be generated. If -it is too high it implies the steps are too small, a problem because -then the walk will take longer to explore the state space and the -samples will be highly correlated. Ideal values for the acceptance rate -can be calculated under certain assumptions [9]. Here we monitor the acceptance rate -and if it is too high we re-run the MCMC with a modified proposal -distribution that has a chance to propose moves that flip multiple sites -at a time.
-In addition we exploit the particle-hole symmetry of the problem by -occasionally proposing a flip of the entire state. This works because -near half-filling, flipping the occupations of all the sites will -produce a state at or near the energy of the current one.
-The matrix diagonalisation is the most computationally expensive step -of the process, a speed up can be obtained by modifying the proposal -distribution to depend on the classical part of the energy, a trick -gleaned from Ref. [7]: \[ -\begin{aligned} -q(k \to k') &= \min\left(1, e^{\beta (H^{k'} - H^k)}\right) -\\ -\mathcal{A}(k \to k') &= \min\left(1, e^{\beta(F^{k'}- -F^k)}\right) -\end{aligned}\] % This allows the method to reject some states -without performing the diagonalisation at no cost to the accuracy of the -MCMC method.
-An extension of this idea is to try to define a classical model with -a similar free energy dependence on the classical state as the full -quantum, Ref. 6].
+In our computations [7] we +employ a modification of the algorithm which is based on the observation +that the free energy of the FK system is composed of a classical part +which is much quicker to compute than the quantum part. Hence, we can +obtain a computational speedup by first considering the value of the +classical energy difference \(\Delta +H_s\) and rejecting the transition if the former is too high. We +only compute the quantum energy difference \(\Delta F_c\) if the transition is accepted. +We then perform a second rejection sampling step based upon it. This +corresponds to two nested comparisons with the majority of the work only +occurring if the first test passes and has the acceptance function \[\mathcal{A}(a \to b) = \min\left(1, e^{-\beta +\Delta H_s}\right)\min\left(1, e^{-\beta \Delta +F_c}\right)\;.\]
+For the model parameters \(U=2/5, T = 1.5 / +2.5, J = 5,\;\alpha = 1.25\), we find that with our new scheme +the matrix diagonalisation is skipped around 30% of the time at \(T = 2.5\) and up to 80% at \(T = 1.5\). We observe that for \(N = 50\), the matrix diagonalisation, if it +occurs, occupies around 60% of the total computation time for a single +step. This rises to 90% at N = 300 and further increases for larger N. +We therefore get the greatest speedup for large system sizes at low +temperature where many prospective transitions are rejected at the +classical stage and the matrix computation takes up the greatest +fraction of the total computation time. The upshot is that we find a +speedup of up to a factor of 10 at the cost of very little extra +algorithmic complexity.
+Our two-step method should be distinguished from the more common +method for speeding up MCMC which is to add asymmetry to the proposal +distribution to make it as similar as possible to \(\min\left(1, e^{-\beta \Delta E}\right)\). +This reduces the number of rejected states, which brings the algorithm +closer in efficiency to a direct sampling method. However it comes at +the expense of requiring a way to directly sample from this complex +distribution, a problem which MCMC was employed to solve in the first +place. For example, recent work trains restricted Boltzmann machines +(RBMs) to generate samples for the proposal distribution of the FK +model [10] does this with restricted Boltzmann -machines whose form is very similar to a classical spin model.
+role="doc-biblioref">8]. The RBMs are chosen as a +parametrisation of the proposal distribution that can be efficiently +sampled from while offering sufficient flexibility that they can be +adjusted to match the target distribution. Our proposed method is +considerably simpler and does not require training while still reaping +some of the benefits of reduced computation.The main paper relies on extensively to evaluate thermal expectation -values within the model by walking over states of the classical spin -system \(S_i\). For a classical system, -the thermal expectation value of some operator \(O\) is defined by a Boltzmann weighted sum -over the classical state space: \[\being{aligned} - \tex{O} &= \frac{1}{\mathcal{Z}} \sum_{\s \in S} O(x) P(x) \\ - P(x) &= \frac{1}{\mathcal{Z}} e^{-\beta F(x)} \\ - \mathcal{Z} &= \sum_{\s \in S} e^{-\beta F(x)} -\end{aligned}\] While for a quantum system these sums are -replaced by equivalent traces. The obvious approach to evaluate these -sums numerically would be to directly loop over all the classical states -in the system and perform the sum. But we all know know why this isn’t -feasible: the state space is too large! Indeed even if we could do it, -it would still be computationally wasteful since at low temperatures the -sums are dominated by low energy excitations about the ground states of -the system. Even worse, in our case we must fully solve the fermionic -system via exact diagonalisation for each classical state in the sum, a -very expensive operation!~\footnote{The effort involved in exact -diagonalisation scales like \(N^2\) for -systems with a tri-diagonal matrix representation (open boundary -conditions and nearest neighbour hopping) and like \(N^3\) for a generic matrix [12,13].
-c
-MCMC sidesteps these issues by defining a random walk that focuses on -the states with the greatest Boltzmann weight. At low temperatures this -means we need only visit a few low energy states to make good estimates -while at high temperatures the weights become uniform so a small number -of samples distributed across the state space suffice. However we will -see that the method is not without difficulties of its own.
-%MCMC from an ensemble point of view In implementation can be boiled -down to choosing a transition function $(_{t} _t+1) $ where \(\s\) are vectors representing classical -spin configurations. We start in some initial state \(\s_0\) and then repeatedly jump to new -states according to the probabilities given by \(\mathcal{T}\). This defines a set of random -walks \(\{\s_0\ldots \s_i\ldots -\s_N\}\). Fig.~\(\ref{fig:single}\) shows this in practice: -we have a (rather small) ensemble of \(M = -2\) walkers starting at the same point in state space and then -spreading outwards by flipping spins along the way.
-In pseudo-code one could write the MCMC simulation for a single -walker as:
-= initial_state
- current_state
-for i in range(N_steps):
-= sample_T(current_state)
- new_state = current_state states[i]
Where the sample_T
function here produces a state with
-probability determined by the current_state
and the
-transition function \(\mathcal{T}\).
If we ran many such walkers in parallel we could then approximate the -distribution \(p_t(\s; \s_0)\) which -tells us where the walkers are likely to be after they’ve evolved for -\(t\) steps from an initial state \(\s_0\). We need to carefully choose \(\mathcal{T}\) such that after a large -number of steps \(k\) (the convergence -time) the probability \(p_t(\s;\s_0)\) -approaches the thermal distribution \(P(\s; -\beta) = \mathcal{Z}^{-1} e^{-\beta F(\s)}\). This turns out to -be quite easy to achieve using the Metropolis-Hasting algorithm.
-Considering \(p(\s)\) as a vector -\(\vec{p}\) whose jth entry is the -probability of the jth state \(p_j = -p(\s_j)\), and writing \(\mathcal{T}\) as the matrix with entries -\(T_{ij} = \mathcal{T}(\s_j \rightarrow -\s_i)\) we can write the update rule for the ensemble probability -as: \[\vec{p}_{t+1} = \mathcal{T} \vec{p}_t -\implies \vec{p}_{t} = \mathcal{T}^t \vec{p}_0\] where \(\vec{p}_0\) is vector which is one on the -starting state and zero everywhere else. Since all states must -transition to somewhere with probability one: \(\sum_i T_{ij} = 1\).
-Matrices that satisfy this are called stochastic matrices exactly -because they model these kinds of Markov processes. It can be shown that -they have real eigenvalues, and ordering them by magnitude, that \(\lambda_0 = 1\) and \(0 < \lambda_{i\neq0} < 1\). -%https://en.wikipedia.org/wiki/Stochastic_matrix
-Assuming \(\mathcal{T}\) has been -chosen correctly, its single eigenvector with eigenvalue 1 will be the -thermal distribution so repeated application of the transition function -eventually leads there, while memory of the initial conditions decays -exponentially with a convergence time \(k\) determined by \(\lambda_1\). In practice this means that -one throws away the data from the beginning of the random walk in order -reduce the dependence on the initial conditions and be close enough to -the target distribution.
-At this stage one might think we’re done. We can indeed draw -independent samples from \(P(\s; -\beta)\) by starting from some arbitrary initial state and doing -\(k\) steps to arrive at a sample. -However a key insight is that after the convergence time, every state -generated is a sample from \(P(\s; -\beta)\)! They are not, however, independent samples. In -Fig.~\(\ref{fig:raw}\) it is already -clear that the samples of the order parameter m have some -auto-correlation because only a few spins are flipped each step but even -when the number of spins flipped per step is increased, Fig.~\(\ref{fig:m_autocorr}\) shows that it can be -an important effect near the phase transition. Let’s define the -auto-correlation time \(\tau(O)\) -informally as the number of MCMC samples of some observable O that are -statistically equal to one independent sample or equivalently as the -number of MCMC steps after which the samples are correlated below some -cutoff, see [14] for a more rigorous definition -involving a sum over the auto-correlation function. The auto-correlation -time is generally shorter than the convergence time so it therefore -makes sense from an efficiency standpoint to run a single walker for -many MCMC steps rather than to run a huge ensemble for \(k\) steps each.
-Once the random walk has been carried out for many steps, the -expectation values of \(O\) can be -estimated from the MCMC samples \(\s_i\): \[ - \tex{O} = \sum_{i = 0}^{N} O(\s_i) + \mathcal{O}(\frac{1}{\sqrt{N}}) -\] The the samples are correlated so the N of them effectively -contains less information than \(N\) -independent samples would, in fact roughly \(N/\tau\) effective samples. As a -consequence the variance is larger than the \(\qex{O^2} - \qex{O}^2\) form it would have -if the estimates were uncorrelated. There are many methods in the -literature for estimating the true variance of \(\qex{O}\) and deciding how many steps are -needed but my approach has been to run a small number of parallel -chains, which are independent, in order to estimate the statistical -error produced. This is a slightly less computationally efficient -because it requires throwing away those \(k\) steps generated before convergence -multiple times but it is a conceptually simple workaround.
-In summary, to do efficient simulations we want to reduce both the -convergence time and the auto-correlation time as much as possible. In -order to explain how, we need to introduce the Metropolis-Hasting (MH) -algorithm and how it gives an explicit form for the transition -function.
-MH breaks up the transition function into a proposal distribution -\(q(\s \to \s')\) and an acceptance -function \(\mathcal{A}(\s \to -\s')\). \(q\) needs to be -something that we can directly sample from, and in our case generally -takes the form of flipping some number of spins in \(\s\), i.e if we’re flipping a single random -spin in the spin chain, \(q(\s \to -\s')\) is the uniform distribution on states reachable by one -spin flip from \(\s\). This also gives -the nice symmetry property that \(q(\s \to -\s') = q(\s' \to \s)\).
-The proposal \(\s'\) is then -accepted or rejected with an acceptance probability \(\mathcal{A}(\s \to \s')\), if the -proposal is rejected then \(\s_{i+1} = -\s_{i}\). Hence:
-\[\mathcal{T}(x\to x') = q(x\to -x')\mathcal{A}(x \to x')\]
-When the proposal distribution is symmetric as ours is, it cancels -out in the expression for the acceptance function and the -Metropolis-Hastings algorithm is simply the choice: \[ \mathcal{A}(x \to x') = \min\left(1, -e^{-\beta\;\Delta F}\right)\] Where \(F\) is the overall free energy of the -system, including both the quantum and classical sector.
-To implement the acceptance function in practice we pick a random -number in the unit interval and accept if it is less than \(e^{-\beta\;\Delta F}\):
-= initial_state
- current_state
-for i in range(N_steps):
-= proposal(current_state)
- new_state = free_energy_change(current_state, new_state, parameters)
- df
-if uniform(0,1) < exp(-beta * df):
- = new_state
- current_state
- = current_state states[i]
This has the effect of always accepting proposed states that are -lower in energy and sometimes accepting those that are higher in energy -than the current state.
-Here, we incorporate a modification to the standard -Metropolis-Hastings algorithm [15] gleaned from Krauth [7].
-In our computations [16] we -employ a modification of the algorithm which is based on the observation -that the free energy of the FK system is composed of a classical part -which is much quicker to compute than the quantum part. Hence, we can -obtain a computational speedup by first considering the value of the -classical energy difference \(\Delta -H_s\) and rejecting the transition if the former is too high. We -only compute the quantum energy difference \(\Delta F_c\) if the transition is accepted. -We then perform a second rejection sampling step based upon it. This -corresponds to two nested comparisons with the majority of the work only -occurring if the first test passes and has the acceptance function \[\mathcal{A}(a \to b) = \min\left(1, e^{-\beta -\Delta H_s}\right)\min\left(1, e^{-\beta \Delta -F_c}\right)\;.\]
-For the model parameters used in Fig. 2, we find -that with our new scheme the matrix diagonalisation is skipped around -30% of the time at \(T = 2.5\) and up -to 80% at \(T = 1.5\). We observe that -for \(N = 50\), the matrix -diagonalisation, if it occurs, occupies around 60% of the total -computation time for a single step. This rises to 90% at N = 300 and -further increases for larger N. We therefore get the greatest speedup -for large system sizes at low temperature where many prospective -transitions are rejected at the classical stage and the matrix -computation takes up the greatest fraction of the total computation -time. The upshot is that we find a speedup of up to a factor of 10 at -the cost of very little extra algorithmic complexity.
-Our two-step method should be distinguished from the more common -method for speeding up MCMC which is to add asymmetry to the proposal -distribution to make it as similar as possible to \(\min\left(1, e^{-\beta \Delta E}\right)\). -This reduces the number of rejected states, which brings the algorithm -closer in efficiency to a direct sampling method. However it comes at -the expense of requiring a way to directly sample from this complex -distribution, a problem which MCMC was employed to solve in the first -place. For example, recent work trains restricted Boltzmann machines -(RBMs) to generate samples for the proposal distribution of the FK -model [10]. The RBMs are chosen as a -parametrisation of the proposal distribution that can be efficiently -sampled from while offering sufficient flexibility that they can be -adjusted to match the target distribution. Our proposed method is -considerably simpler and does not require training while still reaping -some of the benefits of reduced computation.
-Given a MCMC algorithm with target distribution \(\pi(a)\) and transition function \(\mathcal{T}\) the detailed balance -condition is sufficient (along with some technical constraints [5]) to guarantee that in the long time -limit the algorithm produces samples from \(\pi\). \[\pi(a)\mathcal{T}(a \to b) = \pi(b)\mathcal{T}(b -\to a)\]
-In pseudo-code, our two step method corresponds to two nested -comparisons with the majority of the work only occurring if the first -test passes:
-= initial_state
- current_state
-for i in range(N_steps):
-= proposal(current_state)
- new_state
-= classical_energy_change(
- c_dE
- current_state,
- new_state)if uniform(0,1) < exp(-beta * c_dE):
- = quantum_free_energy_change(
- q_dF
- current_state,
- new_state)if uniform(0,1) < exp(- beta * q_dF):
- = new_state
- current_state
-= current_state states[i]
Defining \(r_c = e^{-\beta H_c}\) -and \(r_q = e^{-\beta F_q}\) our target -distribution is \(\pi(a) = r_c r_q\). -This method has \(\mathcal{T}(a\to b) = q(a\to -b)\mathcal{A}(a \to b)\) with symmetric \(p(a \to b) = \pi(b \to a)\) and \(\mathcal{A} = \min\left(1, r_c\right) \min\left(1, -r_q\right)\)
-Substituting this into the detailed balance equation gives: \[\mathcal{T}(a \to b)/\mathcal{T}(b \to a) = -\pi(b)/\pi(a) = r_c r_q\]
-Taking the LHS and substituting in our transition function: \[\begin{aligned} -\mathcal{T}(a \to b)/\mathcal{T}(b \to a) = \frac{\min\left(1, -r_c\right) \min\left(1, r_q\right)}{ \min\left(1, 1/r_c\right) -\min\left(1, 1/r_q\right)}\end{aligned}\]
-which simplifies to \(r_c r_q\) as -\(\min(1,r)/\min(1,1/r) = r\) for \(r > 0\).
-Our method already relies heavily on the split between the classical -and quantum sector to derive a sign problem free MCMC algorithm but it -turns out that there is a further trick we can play with it. The free -energy term is the sum of an easy to compute classical energy and a more -expensive quantum free energy, we can split the acceptance function into -two in such as way as to avoid having to compute the full exact -diagonalisation some of the time:
-
-= initial_state
- current_state
-for i in range(N_steps):
-= proposal(current_state)
- new_state
-= classical_free_energy_change(current_state, new_state, parameters)
- df_classical if exp(-beta * df_classical) < uniform(0,1):
- = quantum_free_energy(current_state, new_state, parameters)
- f_quantum
- if exp(- beta * df_quantum) < uniform(0,1):
- = new_state
- current_state
- = current_state
- states[i]
Now we can discuss how to minimise the auto-correlations. The general -principle is that one must balance the proposal distribution between two -extremes. Choose overlay small steps, like flipping only a single spin -and the acceptance rate will be high because \(\Delta F\) will usually be small, but each -state will be very similar to the previous and the auto-correlations -will be high too, making sampling inefficient. On the other hand, -overlay large steps, like randomising a large portion of the spins each -step, will result in very frequent rejections, especially at low -temperatures.
-I evaluated a few different proposal distributions for use with the -FK model.
-Fro Figure~\(\ref{fig:comparison}\) -we see that even at moderately high temperatures \(T > T_c\) flipping one or two sites is -the best choice. However for some simulations at very high temperature -flipping more spins is warranted. Tuning the proposal distribution -automatically seems like something that would not yield enough benefit -for the additional complexity it would require.
-Dimensionality can be both a blessing and a curse. In section ?? I’ll -discuss the fact that statistical physics can be somewhat boring in one -dimension where most simple models have no phase transitions. This -chapter is motivated by the the converse problem, high dimensional -spaces can sometimes be just too much.
-While there are many problems with high dimensions, my favourite -being that there are no stable gravitational orbits in 4D and above, the -specific issue we’ll focus on here is that it’s very hard to compute -integrals over high dimensional spaces.
-The standard methods for numerical integration in 1,2 and 3 -dimensions mostly work in the same way REFERENCE. You evaluate the -integrand at a grid of points, define an interpolating function over the -points that’s easy to integrate and then integrate the function. For a -fixed grid spacing \(d\) on a finite -domain of integration we’ll find that we need to evaluate \(\propto (1/d)^D\) points, which scales -exponentially with dimension!
-In statistical physics the main integral that one would love to be -able to evaluate is of course the partition function.
-\[Z = \int ds e^{-\beta F}\]
-And as this is condensed matter theory, we will mainly be looking at -quantum models in which our states are discrete occupation numbers of -single particle energy states. For a spin model with just two states per -site and N sites we therefore end up with \(2^N\) possible states of the system.
-domain of integration is bounded we can cif we take a discrete space -with \(M\) dimensions each taking \(N\) distinct values.
-Detailed and Global balance equation Mixing times Cluster updates and -Critical slowing down Effective Sample Size
-Dimensionality can be both a blessing and a curse. In I’ll discuss -the fact that statistical physics can be somewhat boring in one -dimension where most simple models have no phase transitions. This -chapter is motivated by the the converse problem, high dimensional -spaces can sometimes be just too much.
-While there are many problems with high dimensions 1 -it’s very hard to compute integrals over high dimensional spaces. If we -take a discrete space with \(M\) -dimensions each taking \(N\) distinct -values.
-For a classical system, the thermal expectation value of some -operator \(O\) is defined by a -Boltzmann weighted sum over the classical state space: \[\begin{aligned} - \tex{O} &= \frac{1}{\Z} \sum_{\s \in S} O(x) P(x) \\ - P(x) &= \frac{1}{\Z} e^{-\beta F(x)} \\ - \Z &= \sum_{\s \in S} e^{-\beta F(x)}\end{aligned}\] -While for a quantum system these sums are replaced by equivalent traces. -The obvious approach to evaluate these sums numerically would be to -directly loop over all the classical states in the system and perform -the sum. But we all know know why this isn’t feasible: the state space -is too large! Indeed even if we could do it, it would still be -computationally wasteful since at low temperatures the sums are -dominated by low energy excitations about the ground states of the -system. Even worse, in our case we must fully solve the fermionic system -via exact diagonalisation for each classical state in the sum, a very -expensive operation! 2
Where the sample_T
function here produces a state with
probability determined by the current_state
and the
-transition function \(\T\).
If we ran many such walkers in parallel we could then approximate the distribution \(p_t(\s; \s_0)\) which tells us where the walkers are likely to be after they’ve evolved for \(t\) steps from an initial state \(\s_0\). We need to carefully choose \(\T\) such that after a large number of -steps \(k\) (the convergence time) the -probability \(p_t(\s;\s_0)\) approaches -the thermal distribution \(P(\s; \beta) = -\Z^{-1} e^{-\beta F(\s)}\). This turns out to be quite easy to -achieve using the Metropolis-Hasting algorithm.
+class="math inline">\(\mathcal{T}\) such that after a large +number of steps \(k\) (the convergence +time) the probability \(p_t(\s;\s_0)\) +approaches the thermal distribution \(P(\s; +\beta) = \mathcal{Z}^{-1} e^{-\beta F(\s)}\). This turns out to +be quite easy to achieve using the Metropolis-Hasting algorithm.Considering \(p(\s)\) as a vector \(\vec{p}\) whose jth entry is the probability of the jth state \(p_j = -p(\s_j)\), and writing \(\T\) as -the matrix with entries \(T_{ij} = \T(\s_j -\rightarrow \s_i)\) we can write the update rule for the ensemble -probability as: \[\vec{p}_{t+1} = \T -\vec{p}_t \implies \vec{p}_{t} = \T^t \vec{p}_0\] where , and writing \(\mathcal{T}\) as the matrix with entries +\(T_{ij} = \mathcal{T}(\s_j \rightarrow +\s_i)\) we can write the update rule for the ensemble probability +as: \[\vec{p}_{t+1} = \mathcal{T} \vec{p}_t +\implies \vec{p}_{t} = \mathcal{T}^t \vec{p}_0\] where \(\vec{p}_0\) is vector which is one on the starting state and zero everywhere else. Since all states must transition to somewhere with probability one: \(\lambda_0 = 1\) and \(0 < \lambda_{i\neq0} < 1\). Assuming -\(\T\) has been chosen correctly, its -single eigenvector with eigenvalue 1 will be the thermal distribution 3 so repeated application of the -transition function eventually leads there, while memory of the initial -conditions decays exponentially with a convergence time \(\mathcal{T}\) has been chosen +correctly, its single eigenvector with eigenvalue 1 will be the thermal +distribution [^3] so repeated application of the transition function +eventually leads there, while memory of the initial conditions decays +exponentially with a convergence time \(k\) determined by \(\lambda_1\). In practice this means that one throws away the data from the beginning of the random walk in order reduce the dependence on the initial conditions and be close enough to the target distribution.