ReCoDE_MCMCFF/docs/learning/05 Adding Functionality.ipynb
gnikit f5e7e816dd
Apply suggestions from code review
Co-authored-by: Jeremy Cohen <jcohen02@users.noreply.github.com>
2022-07-20 00:46:37 +01:00

284 lines
9.8 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "5ac56056-ca33-4f13-8e36-564b94144c1e",
"metadata": {
"tags": []
},
"source": [
"<h1 align=\"center\">Markov Chain Monte Carlo for fun and profit</h1>\n",
"<h1 align=\"center\"> 🎲 ⛓️ 👉 🧪 </h1>"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "eb5d773e-4cc0-48ae-bb71-7ece7ab5f936",
"metadata": {},
"outputs": [],
"source": [
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from numba import jit\n",
"\n",
"# This loads some custom styles for matplotlib\n",
"import json, matplotlib\n",
"\n",
"with open(\"../_static/matplotlibrc.json\") as f:\n",
" matplotlib.rcParams.update(json.load(f))\n",
"\n",
"np.random.seed(\n",
" 42\n",
") # This makes our random numbers reproducible when the notebook is rerun in order"
]
},
{
"cell_type": "markdown",
"id": "486f066c-f027-44e8-8937-8636a52f32fb",
"metadata": {},
"source": [
"# Adding Functionality\n",
"\n",
"The main thing we want to be able to do is to take measurements. The code, as I have written it, doesn't really allow that because it only returns the final state in the chain. Let's say we have a measurement called `average_color(state)` that we want to average over the whole chain. We could just stick that inside our definition of `mcmc`, but we know that we will likely make other measurements too, and we don't want to keep writing new versions of our core functionality!\n",
"\n",
"## Exercise 1\n",
"Have a think about how you would implement this and what options you have."
]
},
{
"cell_type": "markdown",
"id": "c28b0a86-28f8-426f-9013-70e962f02256",
"metadata": {},
"source": [
"## Solution 1\n",
"So I chatted with my mentors on this project on how to best do this, and we came up with a few ideas:\n",
"\n",
"### Option 1: Just save all the states and return them\n",
"\n",
"The problem with this is the states are very big, and we don't want to waste all that memory. For an `NxN` state that uses 8-bit integers (the smallest we can use in NumPy) `1000` samples would already use `2.5GB` (2.5 gigabytes) of memory! We will see later that we'd really like to be able to go a bit bigger than `50x50` and `1000` samples!\n",
"\n",
"### Option 2: Pass in a function to make measurements\n",
"```python\n",
"\n",
"def mcmc(initial_state, steps, T, measurement, energy=energy):\n",
" ...\n",
"\n",
" current_state = initial_state.copy()\n",
" E = energy(current_state)\n",
" for i in range(steps):\n",
" measurements[i] = measurement(state)\n",
" ...\n",
"\n",
" return measurements\n",
"```\n",
"\n",
"This could work, but it limits how we can store measurements and what shape and type they can be. What if we want to store our measurements in a NumPy array? Or what if your measurement itself is a vector or an object that can't easily be stored in a NumPy array? We would have to think carefully about what functionality we want."
]
},
{
"cell_type": "markdown",
"id": "c7c9575f-2450-4298-a507-90f0c1b9b284",
"metadata": {
"tags": []
},
"source": [
"### Option 3: Use Inheritance\n",
"```python\n",
"# This class would define the basic functionality of performing MCMC\n",
"class MCMCSampler(object):\n",
" def run(self, initial_state, steps, T):\n",
" ...\n",
" for i in range(steps):\n",
" self.measurement(state)\n",
"\n",
" \n",
"# This class would inherit from it and just implement the measurement\n",
"class AverageColorSampler(MCMCSampler):\n",
" measurements = np.zeros(10)\n",
" index = 0\n",
" \n",
" def measurement(self, state):\n",
" self.measurements[self.index] = some_function(state)\n",
" self.index += 1\n",
" \n",
"color_sampler = AverageColorSampler(...)\n",
"measurements = color_sampler.run(...)\n",
"```\n",
"\n",
"This would definitely work, but I personally am not a huge fan of object-oriented programming, so I'm going to skip this option!"
]
},
{
"cell_type": "markdown",
"id": "7d05d25d-c9ba-406d-9977-0ca4aeb430a7",
"metadata": {},
"source": [
"## Option 4: Use a generator\n",
"This is the approach I ended up settling on, we will use [python generator function](https://peps.python.org/pep-0255/). While you may not have come across generator functions before, you almost certainly will have come across generators, `range(n)` is a generator, `(i for i in [1,2,3])` is a generator. Generator functions are a way to build your own generators, by way of example here is range implemented as a generator function:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "5b17c054-230f-4188-98a6-51fd8fe5b437",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(<generator object my_range at 0x7fcaff25da50>, [0, 1, 2, 3, 4, 5, 6, 7, 8, 9])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def my_range(n):\n",
" \"Behaves like the builtin range function of one argument\"\n",
" i = 0\n",
" while i < n:\n",
" yield i # sends i out to whatever function called us\n",
" i += 1\n",
" return # let's python know that we have nothing else to give\n",
"\n",
"\n",
"my_range(10), list(my_range(10))"
]
},
{
"cell_type": "markdown",
"id": "b74fadbe-80c2-4a20-b651-0e47188b005a",
"metadata": {},
"source": [
"This requires only a very small change to our `mcmc` function, and suddenly we can do whatever we like with the states! While we're at it, I'm going to add an argument `stepsize` that allows us to only sample the state every `stepsize` MCMC steps. You'll see why we would want to set this to a value greater than 1 in a moment."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "f73d6335-6514-45b1-9128-d72122d8b0b7",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"18.6 ms ± 645 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"13.3 ms ± 1.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)\n",
"1x slowdown!\n"
]
}
],
"source": [
"from MCFF.ising_model import energy, all_up_state\n",
"from MCFF.mcmc import mcmc_original\n",
"\n",
"\n",
"@jit(nopython=True, nogil=True)\n",
"def mcmc_generator(initial_state, steps, T, stepsize=1000, energy=energy):\n",
" N, M = initial_state.shape\n",
" assert N == M\n",
"\n",
" current_state = initial_state.copy()\n",
" E = energy(current_state)\n",
" for _ in range(steps):\n",
" for _ in range(stepsize):\n",
" i, j = np.random.randint(N), np.random.randint(N)\n",
"\n",
" # modify the state a little, here we just flip a random pixel\n",
" current_state[i, j] *= -1\n",
" new_E = energy(current_state)\n",
"\n",
" if (new_E < E) or np.exp(-(new_E - E) / T) > np.random.random():\n",
" E = new_E\n",
" else:\n",
" current_state[i, j] *= -1 # reject the change we made\n",
" yield current_state.copy()\n",
" return\n",
"\n",
"\n",
"N_steps = 1000\n",
"stepsize = 1\n",
"initial_state = all_up_state(20)\n",
"without_yield = %timeit -o mcmc_original(initial_state, steps = N_steps, T = 5)\n",
"with_yield = %timeit -o [np.mean(s) for s in mcmc_generator(initial_state, T = 5, steps = N_steps, stepsize = 1)]\n",
"print(f\"{with_yield.best / without_yield.best:.0f}x slowdown!\")"
]
},
{
"cell_type": "markdown",
"id": "132bf8d1-d341-494a-9881-689605a104b7",
"metadata": {},
"source": [
"Fun fact: if you replace `yield current_state.copy()` with `yield current_state` your python kernel will crash when you run the code. I believe this is a bug in Numba that related to how pointers to numpy arrays work but let's not worry too much about it. \n",
"\n",
"We take a factor of two slowdown, but that doesn't seem so much to pay for the fact we can now sample the state at every single step rather than just the last."
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "b4e3c207-bc11-483a-aaf6-2daebb3194d8",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Author: Thomas Hodson\n",
"\n",
"Github username: T_Hodson\n",
"\n",
"Last updated: Mon Jul 18 2022\n",
"\n",
"Python implementation: CPython\n",
"Python version : 3.9.12\n",
"IPython version : 8.4.0\n",
"\n",
"Git hash: 03657e08835fdf23a808f59baa6c6a9ad684ee55\n",
"\n",
"Git repo: https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF.git\n",
"\n",
"Git branch: main\n",
"\n",
"json : 2.0.9\n",
"numpy : 1.21.5\n",
"matplotlib: 3.5.1\n",
"\n",
"Watermark: 2.3.1\n",
"\n"
]
}
],
"source": [
"%load_ext watermark\n",
"%watermark -n -u -v -iv -w -g -r -b -a \"Thomas Hodson\" -gu \"T_Hodson\""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:recode]",
"language": "python",
"name": "conda-env-recode-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}