ReCoDE_MCMCFF/docs/learning/04 Testing.ipynb
2022-07-17 16:43:57 +01:00

480 lines
33 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "69cf54db-101a-4cd3-8e68-1b7ba93078eb",
"metadata": {},
"source": [
"<h1 align=\"center\">Markov Chain Monte Carlo for fun and profit</h1>\n",
"<h1 align=\"center\"> 🎲 ⛓️ 👉 🧪 </h1>\n",
"\n",
"# Testing"
]
},
{
"cell_type": "markdown",
"id": "7bfb6308-24ec-474d-adbc-b60797d58c29",
"metadata": {
"tags": []
},
"source": [
"Further reading on Testing:\n",
"- [The official Pytest docs](https://docs.pytest.org/en/7.1.x/getting-started.html)\n",
"- [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/testing.html)\n",
"- [Essential Software Engineering for Researchers](https://imperialcollegelondon.github.io/grad_school_software_engineering_course/l2-01-testing_overview/index.html)\n",
"\n",
"Ok we can finally start writing and running some tests!\n",
"\n",
"I copied some of the initial tests that we did in chapter 1 into `test_energy.py` installed pytest into my development environment with `pip install pytest`. If you're using conda you need to use `conda install pytest` and now I can run the `pytest` command in the ReCoDE_MCFF directory. Pytest will automatically discover our tests and run them, to do this it relies on their being python files with functions named `test_\\*` which it will run.\n",
"\n",
"If that doesn't work and complains it can't find MCFF, try `python -m pytest`, this asks python to find a module and run it, which can be useful to ensure you're running pytest inside the correct environment. I ran into this problem because I used `pip install pytest` into a conda environment when I should have done `conda install pytest`.\n",
"\n",
"But hopefully you can get it working and get a lovely testy output! I've embedded a little video of this below but if it doesn't load, check out the [link](https://asciinema.org/a/498583)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "d4082e07-c51f-46ba-9a5e-bf45c2c319ba",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div style=\"max-width:700px;margin:auto;\"><script id=\"asciicast-498583\" src=\"https://asciinema.org/a/498583.js\" async></script></div>\n"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%%html \n",
"<div style=\"max-width:700px;margin:auto;\"><script id=\"asciicast-498583\" src=\"https://asciinema.org/a/498583.js\" async></script></div>"
]
},
{
"cell_type": "markdown",
"id": "a28abe71-196e-41c9-85e3-8429871dd63e",
"metadata": {},
"source": [
"We can also add a few lines to `pyproject.toml` to tell pytest where to find our tests:\n",
"```toml\n",
"[tool.pytest.ini_options]\n",
"minversion = \"6.0\"\n",
"testpaths = [\n",
" \"tests\",\n",
"]\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "27a83dc3-eaa5-4b40-b61c-0cd969fa8049",
"metadata": {},
"source": [
"## Basic Testing with Pytest\n",
"\n",
"Take a look at `tests/test_energy.py`. You can see that I've done some imports, setup some test states and then defined two testing functions:\n",
"\n",
"```python\n",
"# Note that only functions whose name begins with test_ get run by pytest\n",
"def E_prediction_all_the_same(L): \n",
" \"The exact energy in for the case where all spins are up or down\"\n",
" return -(4*(L - 2)**2 + 12*(L-2) + 8)\n",
"\n",
"def test_exact_energies():\n",
" for state in [all_up, all_down]:\n",
" L = state.shape[0]\n",
" assert energy(state) == E_prediction_all_the_same(L)\n",
"```\n",
"\n",
"I will defer to external resources for a full discussion of the philosphy of testing but I generally think of tests as an aid to my future debugging. If I make a change that breaks something then I want my tests to catch that and to make it clear what has broken. As such I generally put tests that check very basic properties of my code early on in the file and then follow them with tests that probe more subtle things or more obscure edges cases.\n",
"\n",
"`test_exact_energies` checks that the energies of our exact states come out as we calculated they should in chapter 1. This is testing a very limited space of the possible inputs to `energy` so we'd like to find some way to be more confident that our implementation is correct.\n",
"\n",
"One was is to test multiple independant implementations against one another: `test_energy_implementations` checks our numpy implementation against our numba one. This should catch implementation bugs because it's unlikely we will make the same such error in both implementations. \n",
"\n",
"```python\n",
"def test_energy_implementations():\n",
" for state in states:\n",
" assert np.allclose(energy(state), energy_numpy(state))\n",
"```\n",
"\n",
"However if we have made some logical errors in how we've defined the energy, that error will likely appear in both implememtations and thus won't be caught by this. \n",
"\n",
"Generally what we will do now, is that as we write more code or add new functionality we will add tests to check that functionality."
]
},
{
"cell_type": "markdown",
"id": "eeb05f7e-913d-4d9e-9580-9c840e06d410",
"metadata": {},
"source": [
"## Coverage Testing\n",
"\n",
"A useful little trick for testing, are tools like pytest-cov that can measure *coverage*, that is, the amount of your code base that is activate by your tests. Unfortunatley Numba does not play super well with pytest-cov so we have to turn off numba to generate the test report using an environment variable.\n",
"\n",
"```bash\n",
"(recode) tom@TomsLaptop ReCoDE_MCMCFF % pip install pytest-cov # install the coverage testing plugin\n",
"(recode) tom@TomsLaptop ReCoDE_MCMCFF % NUMBA_DISABLE_JIT=1 pytest --cov=MCFF --cov-report=term\n",
"\n",
"================================================== test session starts ==================================================\n",
"platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0\n",
"rootdir: /Users/tom/git/ReCoDE_MCMCFF\n",
"plugins: hypothesis-6.46.10, cov-3.0.0\n",
"collected 3 items \n",
"\n",
"code/tests/test_energy.py .. [ 66%]\n",
"code/tests/test_energy_using_hypothesis.py . [100%]\n",
"\n",
"---------- coverage: platform darwin, python 3.9.12-final-0 ----------\n",
"Name Stmts Miss Cover\n",
"--------------------------------------------------\n",
"code/src/MCFF/__init__.py 0 0 100%\n",
"code/src/MCFF/ising_model.py 22 3 86%\n",
"code/src/MCFF/mcmc.py 14 14 0%\n",
"--------------------------------------------------\n",
"TOTAL 36 17 53%\n",
"\n",
"\n",
"=================================================== 3 passed in 1.89s ===================================================\n",
"```\n",
"\n",
"Ok so this is telling us that we currently test 86% of the lines in ising_model.py. We can also change `--cov-report=html` to get a really nice html output which shows which parts of your code aren't being run.\n",
"\n",
"A warning though, testing 100% of your lines of code doesn't mean it's correct, you need to think carefully about the data you test on, try to pick the hardest examples you can think of! What edge cases might there be that would break your code? Zero, empty strings and empty arrays are classic examples."
]
},
{
"cell_type": "markdown",
"id": "d70a8934-a58d-4aa6-afca-35fee23bf851",
"metadata": {},
"source": [
"## Advanced Testing Methods: Property Based Testing\n",
"\n",
"I won't do into huge detail here but I thought it would be nice to make you aware of a nice library called `Hypothesis` that helps with this problem of finding edge cases. `Hypothesis` gives you tools to generate randomised inputs to functions, so as long as you can come up with some way to verify the output is correct or has the correct _properties_ (or just that the code doens't throw and error!) then this can be a powerful method of testing. \n",
"\n",
"\n",
"Take a look in `test_energy_using_hypothesis.py`\n",
"```python\n",
"from hypothesis.extra import numpy as hnp\n",
"\n",
"@given(hnp.arrays(dtype = int,\n",
" shape = hnp.array_shapes(min_dims = 2, max_dims = 2),\n",
" elements = st.sampled_from([1, -1])))\n",
"def test_generated_states(state):\n",
" assert np.allclose(energy(state), energy_numpy(state))\n",
"```\n",
"You tell Hypothesis how to generate the test data, in this case we use some numpy specifc code to generate 2 dimensional arrays with `dtype = int` and entries randomly sampled from `[1, -1]`. We use the same trick as before of checking two implementations against one another."
]
},
{
"cell_type": "markdown",
"id": "128cc7f1-8c9c-44f9-8c6a-ec16acb6fc68",
"metadata": {
"tags": []
},
"source": [
"## Testing Stochastic Code\n",
"\n",
"We have a interesting problem here, most testing assumes that for the same inputs we will always get the same outputs but our MCMC sampler is a stochastic algorithm. So how can we test it? I can see three mains routes we can take:\n",
"\n",
"- Fix the seed of the random number generator to make it deterministic\n",
"- Do statistical tests on the output \n",
"- Use property based testing (see above)\n",
"\n",
"### Fixed Seeds\n",
"The random number generators we typically use are really pseudo-random number generators: given a value called a seed they generate a deterministic pattern that looks for most purposes like a random sequence. Typically the seed is determined by something that is _more random_ such as a physical random number generator. However if we fix the seed we can create reproducabile plots and test our code more easily!"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "c28d257e-5466-4cf0-9381-46f1cbdeaf8e",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"(array([ 0.55326059, 0.21760061, -0.05798999, -2.31893609, 0.43149417,\n",
" -2.12627978, 0.90992122, 0.60596557, 0.83005665, 0.82769834]),\n",
" array([-0.57820285, -0.65570117, 1.60871517, -0.83666294, 2.03363763,\n",
" 0.44904314, 0.31099544, -0.85810422, -0.87923828, 0.96426779]))"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"seed = [\n",
" 2937053738,\n",
" 1783364611,\n",
" 3145507090,\n",
"] # generated once with rng.integers(2**63, size = 3) and then saved\n",
"\n",
"# New Style\n",
"# numba doesn't yet support this so I haven't used it in our code\n",
"# but if you aren't using numba then you should get used to this new style)\n",
"from numpy.random import default_rng\n",
"\n",
"rng = default_rng(seed=23)\n",
"vals = rng.standard_normal(10)\n",
"\n",
"# Old style\n",
"from numpy import random\n",
"\n",
"random.seed(seed)\n",
"vals2 = random.standard_normal(10)\n",
"\n",
"vals, vals2 # note that the two styles do no give the same results"
]
},
{
"cell_type": "markdown",
"id": "fb281250-0f08-43a8-bcb2-4b9e2c262cd9",
"metadata": {},
"source": [
"However this has a major drawback, if we want this to work we must always generate the same random numbers in the same order and use them in the same way if we want the output to be the same. This is a problem because we might want to make a change to our MCMC sampler in a way that changes the way it call the rng but still want to compare it to the previous version. In this case we have to use statistical tests instead.\n",
"\n",
"### Statistical Tests\n",
"If we want to verify that two different implementations of our algorithm agree or that the output matches our expectations, we can use something like a t-test to check our samples. Now this gets complicated very fast but bear with me for this simple example:"
]
},
{
"cell_type": "code",
"execution_count": 116,
"id": "64cd4fd1-2a22-41ec-820f-5f5cac4a68b3",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Over 10 samples we got an average color of -0.00253 +/- 0.100466\n",
"That's -0.00253 +/- 3971%\n",
"\n",
"The standard error of the mean is about 91%\n",
"\n",
"The p value when comparing that with 0 is 0.26\n",
"p > 0.1 : True\n",
"\n",
"\n"
]
},
{
"data": {
"text/plain": [
"<matplotlib.legend.Legend at 0x7f8ac6ad04f0>"
]
},
"execution_count": 116,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjUuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/YYfK9AAAACXBIWXMAAAsTAAALEwEAmpwYAAAf7klEQVR4nO3dfXRV9b3n8fc3T4TnpwQIhBIsiIIgcBFttRoqCqKF3rl1wLYOFmfRTrWta25nxHtra73DlM5qXZ3W8d6yei3MrVekVzvQ2rGlXBirUrhQEUUeixEiSEJ4BkMe+M4f2dIAYedsOCe/5PB5rZV1zt7nt8/+5Cz4ZGdnn98xd0dERLJLTugAIiKSfip3EZEspHIXEclCKncRkSykchcRyUJ5oQMAFBUVeVlZWegYkkE1NTUA9O3bN3CS1G3btg2AESNGBE6Suo74OsvF27BhwwF3L27psXZR7mVlZaxfvz50DMmgRYsWAXDfffcFzZFEeXk5p06dYunSpQwePDh0nJR0xNdZLp6ZvXuhx3RaRiTGli1buPfee0PHEElM5S4SY8iQIXzjG98IHUMksZTK3cx6mdm/mNlWM9tiZh8zsz5mtsLMdkS3vZuNf8TMdprZNjObkrn4IpnVu3dvJk+eHDqGSGKpnnP/n8BL7v4ZMysAugB/A6x09wVmNg+YBzxsZiOBWcAoYCDwOzO70t0bM5BfJKNqa2vZtWsXV1xxRegoHV59fT2VlZXU1taGjtLhFBYWUlpaSn5+fsrbtFruZtYDuBm4D8Dd64A6M5sBlEfDFgOrgYeBGcASdz8FvGNmO4GJwJqUU4m0E1u3bmXOnDmsXr06dJQOr7Kyku7du1NWVoaZhY7TYbg7NTU1VFZWMnTo0JS3S+W0zBVANfBTM3vdzH5iZl2B/u6+L9r5PqBfNH4QsKfZ9pXRurOY2VwzW29m66urq1MOLNKWysrK+Pa3vx06Rlaora2lb9++KvaEzIy+ffsm/o0nlXLPA8YDf+/u44ATNJ2CuWCWFtadN/Wkuy909wnuPqG4uMXLNEWC69WrF7fcckvoGFlDxX5xLuZ1S6XcK4FKd18bLf8LTWW/38xKoh2XAFXNxje/KLgU2Js4mUg7cPLkyTNvZhLpSFotd3d/H9hjZh++Te9W4G1gOTA7WjcbWBbdXw7MMrNOZjYUGA6sS2tqkTayfft2vvjFL4aOIWn00ksvMWLECIYNG8aCBQtCx8mYVK+W+QrwTHSlzC7gCzT9YFhqZvcDu4G7Adx9s5ktpekHQAPwgK6UkY7qiiuuYGvpXZTNe7HN912x4M4232e2a2xs5IEHHmDFihWUlpZy3XXXMX36dEaOHBk6WtqlVO7uvhGY0MJDt15g/Hxg/sXHEmkfevTowcHSq0PHkDRZt24dw4YNO3Np66xZs1i2bFlWlrveoSoS48SJE9RVV4SOkZXKy8vPzIVTX19PeXk5P/vZz4Cmv3WUl5fz3HPPAXDkyBHKy8t54YUXADhw4ADl5eX88pe/BOD9999PaZ/vvffeWfMElZaW8t5776XrW2pXVO4iMXbs2MHBFf8QOoakSUufGZ2tV/C0i1khRdqrj370o+y95vOhY2Sl5m8My8/PP2u5S5cuZy337NnzrOWioqKzlgcMGJDSPktLS9mz589vw6msrGTgwIFJo3cIOnIXidG9e3c6lVwZOoakyXXXXceOHTt45513qKurY8mSJUyfPj10rIzQkbtIjOPHj1O3fxcF/TW3TDbIy8vjySefZMqUKTQ2NjJnzhxGjRoVOlZGqNxFYuzcuZPaqoUM+Gz2Xg99uZk2bRrTpk0LHSPjVO4iMYYNG8be0bNbHyjSzuicu0iMbt266ZSMdEgqd5EYx44d49S+7aFjiCSmcheJ8ac//YlDq54OHUMkMZ1zF4kxfPhw3htzX+gYIonpyF0kRteuXSkoLgsdQyQxlbtIjKNHj1JbuSV0DEkTM+Pee+89s9zQ0EBxcTF33XVXwFSZoXIXibFr1y4Ov7w4dAxJk65du/LWW2/xwQcfALBixQoGDTrvU0CzgspdJMaVV15J3ykPho4haXTHHXfw4otN8/M/++yz3HPPPWceO3HiBHPmzOG6665j3LhxLFvW9BlEFRUVfOITn2D8+PGMHz+e1157DWiaH6e8vJzPfOYzXHXVVXzuc59rcXKyEPQHVZEYXbp0Ib9vaegYWeehhx5i48aNaX3OsWPH8oMf/KDVcbNmzeLxxx/nrrvuYtOmTcyZM4ff//73AMyfP59PfvKTPP300xw+fJiJEycyefJk+vXrx4oVKygsLGTHjh3cc889rF+/HoDXX3+dzZs3M3DgQG688UZeffVVbrrpprR+bxdD5S4S4/Dhw9TufpPCj4wOHUXSZMyYMVRUVPDss8+eNw3Bb3/7W5YvX873vvc9AGpra9m9ezcDBw7kwQcfZOPGjeTm5rJ9+5/f+zBx4kRKS5sOAMaOHUtFRYXKXaS9q6iooPbgM5pbJs1SOcLOpOnTp/P1r3+d1atXU1NTc2a9u/P8888zYsSIs8Y/9thj9O/fnzfeeIPTp09TWFh45rFOnTqduZ+bm0tDQ0Pmv4EU6Jy7SIyrrrqKvtMeCh1D0mzOnDl885vfZPTos38jmzJlCj/60Y/OnDd//fXXgaZPgiopKSEnJ4d/+qd/orGx/X8stMpdJEZhYSH5vVL7IAjpOEpLS/na17523vpHH32U+vp6xowZwzXXXMOjjz4KwJe//GUWL17MDTfcwPbt2+natWtbR05Mp2VEYhw6dIgPKjbSuWxs6CiSBsePHz9vXXl5OeXl5QB07tyZH//4x+eNGT58OJs2bTqz/J3vfOe8bQGefPLJ9Aa+BDpyF4nx7rvvcuS1JaFjiCSmI3eRGFdffTV7xn0xdAyRxHTkLhKjU6dO5PUoDh1DJDGVu0iMgwcP8sGuDaFjiCSmcheJsXv3bo784eehY4gkllK5m1mFmb1pZhvNbH20ro+ZrTCzHdFt72bjHzGznWa2zcymZCq8SKaNHDmS4ukPh44hkliSI/dJ7j7W3SdEy/OAle4+HFgZLWNmI4FZwChgKvCUmeWmMbNImykoKCC3W+/WB0qHMH/+fEaNGsWYMWMYO3Ysa9euBZreMXvy5MkWt1m0aBEPPnj+5HGLFi2iuLiYsWPHnvl6++23M5o/iUu5WmYGUB7dXwysBh6O1i9x91PAO2a2E5gIrLmEfYkEUVNTw8mda+ky7PrQUeQSrVmzhl/96lf88Y9/pFOnThw4cIC6ujqgqdw///nP06VLl0TPOXPmzNhr2xsbG8nNzb3g8oU0NDSQl3dpFzOmeuTuwG/NbIOZzY3W9Xf3fQDRbb9o/SBgT7NtK6N1ZzGzuWa23szWV1dXX1x6kQzbs2cPR9f9InQMSYN9+/ZRVFR0Zi6YoqIiBg4cyA9/+EP27t3LpEmTmDRpEgA//elPufLKK7nlllt49dVXE+1n9erVTJo0ic9+9rOMHj36vOXa2lq+8IUvMHr0aMaNG8eqVauApt8E7r77bj71qU9x++23X/L3m+qPhhvdfa+Z9QNWmNnWmLHWwrrzJjh294XAQoAJEya0jwmQRc4xatQodv/FA6FjZJ2XXnqJ999/P63POWDAAKZOnXrBx2+//XYef/xxrrzySiZPnszMmTO55ZZb+OpXv8oTTzzBqlWrKCoqYt++fXzrW99iw4YN9OzZk0mTJjFu3LgWn/O5557jlVdeObO8Zk3TCYp169bx1ltvMXToUFavXn3W8ve//30A3nzzTbZu3crtt99+ZpbJNWvWsGnTJvr06XPJr0dKR+7uvje6rQJ+QdNplv1mVgIQ3VZFwyuBwc02LwX2XnJSkQDy8/PJ7dIzdAxJg27durFhwwYWLlxIcXExM2fOZNGiReeNW7t2LeXl5RQXF1NQUMDMmTMv+JwzZ85k48aNZ746d+4MNE0DPHTo0DPjmi+/8sorZz7q76qrrmLIkCFnyv22225LS7FDCkfuZtYVyHH3Y9H924HHgeXAbGBBdLss2mQ58M9m9gQwEBgOrEtLWpE2duDAAU5ue40uIz4eOkpWiTvCzqTc3Nwz88GMHj2axYsXc9999503zqylExCpO3disebLcZ/UlM4JyVI5cu8PvGJmb9BU0i+6+0s0lfptZrYDuC1axt03A0uBt4GXgAfcvf3PjynSgsrKSo5uWB46hqTBtm3b2LFjx5nljRs3MmTIEAC6d+/OsWPHALj++uvPzPNeX1/Pz3+e3vc53HzzzTzzzDMAbN++nd27d583f3w6tHrk7u67gGtbWF8D3HqBbeYD8y85nUhg11xzDbuv+2roGJIGx48f5ytf+QqHDx8mLy+PYcOGsXDhQgDmzp3LHXfcQUlJCatWreKxxx7jYx/7GCUlJYwfP/6C87efe879qaeeajXHl7/8Zb70pS8xevRo8vLyWLRo0Vkf+JEu1h4+zHXChAn+4ecRSnb68NxmS78Ct1cfTuVaccN/CbL/igV3Jt6mPb/OW7Zs4eqrrw4do8Nq6fUzsw3N3nt0Fk0/IBKjqqqKE1teDh1DJDGVu0iMvXv3cuz1X4eOIZKY5nMXiTFmzBgqrjv/49jk4rj7JV+Jcjm6mNPnOnIXiZGTk0NOfmHrA6VVhYWF1NTUXFRRXc7cnZqaGgoLk/071JG7SIz9+/dzfPMquo2aFDpKh1daWkplZSWabiS5wsJCSktLE22jcheJsW/fPmpP/Eblngb5+flnvWtTMkvlLhLj2muvpWLifw4dQyQxlbt0CGXzXmzzfb6/qwaAAR/TfxPpePQHVZEYjScOcfzN34WOIZKYyl0khspdOiqVu0iMgn5XMOCzC0LHEElM5S4ikoVU7iIxGo8f5NjGl0LHEElM5S4So/HkEU5u/X3oGCKJqdxFYhT0G0r/WfpoAul4VO4iIllI5S4So/H4QY79se3fQCVyqVTuIjEaPzjKyZ36fHfpeFTuIjEKisvo/++/HTqGSGIqdxGRLKRyF4nReKyGo+uXhY4hkpjKXSTG6drj1L77RugYIomp3EVi5BcPod9ffTN0DJHEVO4iIllI5S4So+HoAY6sfSF0DJHEUi53M8s1s9fN7FfRch8zW2FmO6Lb3s3GPmJmO81sm5lNyURwkbbgdSep27s1dAyRxJIcuX8N2NJseR6w0t2HAyujZcxsJDALGAVMBZ4ys9z0xBVpW/lFH6H4L/8mdAyRxFIqdzMrBe4EftJs9QxgcXR/MfDpZuuXuPspd38H2AlMTEtaERFJSapH7j8A/itwutm6/u6+DyC67RetHwTsaTauMlp3FjOba2brzWx9dXV10twibaLhaDVH/vDz0DFEEmu13M3sLqDK3Tek+JzWwjo/b4X7Qnef4O4TiouLU3xqkbbldbXU7d8VOoZIYnkpjLkRmG5m04BCoIeZ/QzYb2Yl7r7PzEqAqmh8JTC42falwN50hhZpK/lFgyme8XDoGCKJtXrk7u6PuHupu5fR9IfSf3X3zwPLgdnRsNnAh+/RXg7MMrNOZjYUGA5oWj0RkTZ0Kde5LwBuM7MdwG3RMu6+GVgKvA28BDzg7o2XGlQkhIYjVRx+9dnQMUQSS+W0zBnuvhpYHd2vAW69wLj5gD6bTDo8bzhFw8H3QscQSUzvUBWJkd93MEWf+nroGCKJqdxFRLKQyl0kRsOR/Rz+/c9CxxBJTOUuEsMb6mk4eiB0DJHEVO4iMfL7llJ050OhY4gkpnIXEclCKneRGA2H3+fQ/1sUOoZIYip3kRh+upHTHxwLHUMkMZW7SIz8PoPoO/UroWOIJKZyFxHJQip3kRgNh9/n0L/+Y+gYIomp3EViuJ/mdENd6BgiiancRWLk9x5I39v/U+gYIomp3EVEspDKXSRGw6F9HPzdwtAxRBJTuYuIZCGVu0iMvN4l9Jk8N3QMkcRU7iIiWUjlLhKj/tBean7796FjiCSmcheJYZZDTl5B6BgiiancRWLk9RpA70/eHzqGSGIqdxGRLKRyF4lRf/A9al76UegYIomp3EViWE4uOZ27h44hkpjKXSRGXq8B9L7lvtAxRBJTuYuIZKFWy93MCs1snZm9YWabzezb0fo+ZrbCzHZEt72bbfOIme00s21mNiWT34BIJtXXVHLgxR+EjiGSWCpH7qeAT7r7tcBYYKqZ3QDMA1a6+3BgZbSMmY0EZgGjgKnAU2aWm4HsIhlnefnk9SgKHUMksVbL3Zscjxbzoy8HZgCLo/WLgU9H92cAS9z9lLu/A+wEJqYztEhbyevZn16f+HzoGCKJpXTO3cxyzWwjUAWscPe1QH933wcQ3faLhg8C9jTbvDJad+5zzjWz9Wa2vrq6+hK+BREROVdK5e7uje4+FigFJprZNTHDraWnaOE5F7r7BHefUFxcnFJYkbZWX7OHA7/8XugYIoklulrG3Q8Dq2k6l77fzEoAotuqaFglMLjZZqXA3ksNKhKC5XUir895v3iKtHt5rQ0ws2Kg3t0Pm1lnYDLwXWA5MBtYEN0uizZZDvyzmT0BDASGA+sykF0k4/J69qPXjfcE2XfZvBcTbzO1oAaAxy5i2+YqFtx5SdtLeK2WO1ACLI6ueMkBlrr7r8xsDbDUzO4HdgN3A7j7ZjNbCrwNNAAPuHtjZuKLiEhLWi13d98EjGthfQ1w6wW2mQ/Mv+R0IoHVH9hD9bLvUjzj4dBRRBLRO1RFYlhBIQX9rwgdQySxVE7LiJxxMeeBIX3ngttaXo9iet5wd+gYIonpyF1EJAup3EVi1B/YTfUv/nvoGCKJqdxFYlhBFwoGXhU6hkhiKneRGHk9iuh5/b8LHUMkMZW7iEgWUrmLxKivfpeq5x8PHUMkMZW7SIycwm4UDrk2dAyRxFTuIjFyu/elx4QZoWOIJKZyFxHJQip3kRh11RXsX/qt0DFEElO5i8TI7dyDLsP0KZHS8ajcRWLkdutD9/Ga21w6HpW7iEgWUrmLxKireof9S/42dAyRxFTuIjFyu/Sky1WfCB1DJDGVu0iM3G596D52augYIomp3EVEspDKXSRGXdUu3v/neaFjiCSmcheJkdu1N91GTw4dQyQxlbtIDJW7dFQqd5E47nhjQ+gUIomp3EVi1FW/w/7nvhE6hkhiKneRGLld+9Dt2imhY4gkpnIXiZHbtRfdRk0KHUMksVbL3cwGm9kqM9tiZpvN7GvR+j5mtsLMdkS3vZtt84iZ7TSzbWamwx7puPw0p+trQ6cQSSyVI/cG4K/d/WrgBuABMxsJzANWuvtwYGW0TPTYLGAUMBV4ysxyMxFeJNPqqiuo+vljoWOIJNZqubv7Pnf/Y3T/GLAFGATMABZHwxYDn47uzwCWuPspd38H2AloQmzpkHK79aX7uGmhY4gkluicu5mVAeOAtUB/d98HTT8AgH7RsEHAnmabVUbrzn2uuWa23szWV1dXX0R0kczL7dKTrlffHDqGSGIpl7uZdQOeBx5y96NxQ1tY5+etcF/o7hPcfUJxcXGqMUTa1unTnD51InQKkcRSKnczy6ep2J9x9xei1fvNrCR6vASoitZXAoObbV4K7E1PXJG2VXeggqrn/y50DJHEUrlaxoB/BLa4+xPNHloOzI7uzwaWNVs/y8w6mdlQYDiwLn2RRdpOXrcievzF9NAxRBLLS2HMjcC9wJtmtjFa9zfAAmCpmd0P7AbuBnD3zWa2FHibpittHnD3xnQHF2kLOV160GXEx0PHEEms1XJ391do+Tw6wK0X2GY+MP8Scom0C366kcaTR8jt0jN0FJFE9A5VkRj1B96l+v98J3QMkcRU7iIx8roX02PiX4aOIZKYyl0kRk7n7nQZdn3oGCKJqdxFYnhjA43HD4WOIZKYyl0kRn3NbqqXfzd0DJHEVO4iMfJ69KPnDXeHjiGSmMpdJEZOYTc6X/EXoWOIJKZyF4nhjfU0HNXEdtLxqNxFYtTX7OHAr74fOoZIYqlMPyDtTNm8F0NHuGzk9ehHz4/PCh1DJDEduYvEyCnsRueysaFjiCSmcheJ4Q111B9+P3QMkcRU7iIx6g9WUvPrH4SOIZKYyl0kRl7P/vS66XOhY4gkpnIXiZHTqSuFHxkdOoZIYrpaRiSG15+ivqaS/L6loaO0qVBXZFUsuDPIfrORjtxFYtQfeo+a3zwZOoZIYip3kRh5PQfQ6+bZrQ8UaWdU7iIxcjp1obD06tAxRBJTuYvE8Ppa6qorQscQSUzlLhKj/tBeDq74h9AxRBJTuYvEyOtVQu9Jc0LHEElM5S4SI6egM51KrgwdQyQxlbtIDK+rpW7/rtAxRBJTuYvEqD+8l4MrF4aOIZKYyl0kRn6vgfS5dW7oGCKJtVruZva0mVWZ2VvN1vUxsxVmtiO67d3ssUfMbKeZbTOzKZkKLtIWrKCQgv5XhI4hklgqR+6LgKnnrJsHrHT34cDKaBkzGwnMAkZF2zxlZrlpSyvSxk7XfcCpfdtDxxBJrNVyd/eXgYPnrJ4BLI7uLwY+3Wz9Enc/5e7vADuBiemJKtL2Gg7v49Cqp0PHEEnsYs+593f3fQDRbb9o/SBgT7NxldE6kQ4pv/dA+tz2pdAxRBJL9x9UrYV13uJAs7lmtt7M1ldXV6c5hkh6WH4hBcVloWOIJHax5b7fzEoAotuqaH0lMLjZuFJgb0tP4O4L3X2Cu08oLi6+yBgimXX61ElqK7eEjiGS2MWW+3Lgw3lQZwPLmq2fZWadzGwoMBxYd2kRRcJpOPI+h19e3PpAkXam1U9iMrNngXKgyMwqgW8BC4ClZnY/sBu4G8DdN5vZUuBtoAF4wN0bM5RdJOPyew+i75QHQ8cQSazVcnf3ey7w0K0XGD8fmH8poUTaC8vvdNl9xJ5kB71DVSTG6VMnqN39ZugYIomp3EViNBzZz+FXngkdQyQxlbtIjPw+pfSd9lDoGCKJqdxFYlheAfm9BoSOIZKYyl0kxuna43xQsTF0DJHEVO4iMRqOVnHktSWhY4gkpnIXiZHfdzBFd/116BgiiancRWJYbj55PTQ9hnQ8KneRGKdrj/PBrg2hY4gkpnIXidFwtIojf/h56BgiiancRWLk9/0IxdMfDh1DJDGVu0gMy80jt1vv1geKtDMqd5EYpz84xsmda0PHEElM5S4So+FYNUfX/SJ0DJHEVO4iMfKLhlD86UdCxxBJTOUuEsNycsnt0jN0DJHEVO4iMU6fPMrJba+FjiGSWKufxCQXVjbvxdARJMMajh/g6IbldBnx8dBRRBLRkbtIjIKiMvr91aOhY4gkpnIXiZOTQ06nrqFTiCSm0zIiMRpPHuHElpfpevXNoaNcFkKd6qxYcGeQ/WaSjtxFYjQer+HY678OHUMkMZW7SIyC4jL63f1Y6BgiiancReJYDjn5haFTiCSmcheJ0XjiMMc3rwodQyQxlbtIjMYTBzn+xm9CxxBJLCuultGbiSRTCoqH0n/mfwsdQySxjB25m9lUM9tmZjvNbF6m9iOSUWZYblYcA8llJiP/as0sF/hfwG1AJfBvZrbc3d/OxP5EMqXxxCGOv/k7uo2eHDqKZFDI3/4zdY19po7cJwI73X2Xu9cBS4AZGdqXSMZ8WO4iHY25e/qf1OwzwFR3/4/R8r3A9e7+YLMxc4G50eIIYFvagyRTBBwInCEpZW4bHS1zR8sLynyxhrh7cUsPZOpkorWw7qyfIu6+EFiYof0nZmbr3X1C6BxJKHPb6GiZO1peUOZMyNRpmUpgcLPlUmBvhvYlIiLnyFS5/xsw3MyGmlkBMAtYnqF9iYjIOTJyWsbdG8zsQeA3QC7wtLtvzsS+0qjdnCJKQJnbRkfL3NHygjKnXUb+oCoiImFp+gERkSykchcRyUKXbbmbWR8zW2FmO6Lb3i2MKTSzdWb2hpltNrNvh8jaLE8qmQeb2Soz2xJl/lqIrM3ytJo5Gve0mVWZ2VttnTHaf+x0Gdbkh9Hjm8xsfIic52RqLfNVZrbGzE6Z2ddDZDxXCpk/F72+m8zsNTO7NkTOczK1lnlGlHejma03s5tC5DyPu1+WX8D/AOZF9+cB321hjAHdovv5wFrghnaeuQQYH93vDmwHRrbnzNFjNwPjgbcCZMwF/gRcARQAb5z7mgHTgP8b/Zu4AVgb6jVNkLkfcB0wH/h6yLwJMn8c6B3dv6ODvM7d+PPfL8cAW0O/1u5++R650zQdwuLo/mLg0+cO8CbHo8X86CvkX6BTybzP3f8Y3T8GbAEGtVXAFrSaGcDdXwYOtlGmc6UyXcYM4H9H/yb+APQys5K2DtpMq5ndvcrd/w2oDxGwBalkfs3dD0WLf6DpPTIhpZL5uEfNDnQlbEeccTmXe3933wdNhUjTUc55zCzXzDYCVcAKd1/bdhHPk1LmD5lZGTCOpt84QkmUOZBBwJ5my5Wc/wMxlTFtqb3lSUXSzPfT9NtSSCllNrO/NLOtwIvAnDbKFiur5zI1s98BA1p46G9TfQ53bwTGmlkv4Bdmdo27Z+y8cDoyR8/TDXgeeMjdj6YjW8y+0pI5oFany0hxTFtqb3lSkXJmM5tEU7mHPn+dUmZ3/wVN/XAz8HdA8GlEs7rc3f2CL7CZ7TezEnffF/16XdXKcx02s9XAVCBj5Z6OzGaWT1OxP+PuL2Qo6hnpfJ0DSWW6jPY2pUZ7y5OKlDKb2RjgJ8Ad7l7TRtkuJNHr7O4vm9lHzazI3YNOKnY5n5ZZDsyO7s8Glp07wMyKoyN2zKwzTT+Nt7ZVwBakktmAfwS2uPsTbZjtQlrN3A6kMl3GcuA/RFfN3AAc+fB0UyAdcYqPVjOb2UeAF4B73X17gIznSiXzsOj/HdFVVAVA6B9Kl/XVMn2BlcCO6LZPtH4g8Gv/81++Xwc20XS0/s0OkPkmmn5t3ARsjL6mtefM0fKzwD6a/vhXCdzfxjmn0XRl0Z+Av43WfQn4UnTfaPoAmj8BbwITQv5bSDHzgOi1PAocju73aOeZfwIcavZvd30HeJ0fBjZHedcAN4XO7O6afkBEJBtdzqdlRESylspdRCQLqdxFRLKQyl1EJAup3EVEspDKXUQkC6ncRUSy0P8HTTeUmd1xe6QAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"import numpy as np\n",
"from scipy import stats\n",
"from MCFF.mcmc import mcmc_generator\n",
"from matplotlib import pyplot as plt\n",
"\n",
"\n",
"### The measurement we will make ###\n",
"def average_color(state):\n",
" return np.mean(state)\n",
"\n",
"\n",
"### Simulation Inputs ###\n",
"N = 10 # Use an NxN system\n",
"T = 1000 # What temperatures to use\n",
"steps = 2000 # How many times to sample the state\n",
"stepsize = N**2 # How many individual monte carlo flips to do in between each sample\n",
"initial_state = np.ones(shape=(N, N)) # the initial state to use\n",
"\n",
"### Simulation Code ###\n",
"average_color_data = np.array(\n",
" [\n",
" average_color(s)\n",
" for s in mcmc_generator(initial_state, steps=steps, stepsize=stepsize, T=T)\n",
" ]\n",
")\n",
"\n",
"\n",
"mean, std_dev = np.mean(average_color_data), np.std(average_color_data)\n",
"std_error = std_dev / np.sqrt(steps)\n",
"\n",
"p_value = stats.ttest_1samp(average_color_data, 0).pvalue\n",
"print(\n",
" f\"\"\"\n",
"Over {N} samples we got an average color of {mean:g} +/- {std_dev:g}\n",
"That's {mean:g} +/- {std_dev/abs(mean)*100:.0f}%\n",
"\n",
"The standard error of the mean is about {std_err/abs(mean)*100:.0f}%\n",
"\n",
"The p value when comparing that with 0 is {p_value:.2g}\n",
"p > 0.1 : {p_value > 0.1}\n",
"\n",
"\"\"\"\n",
")\n",
"f, ax = plt.subplots()\n",
"ax.hist(average_color_data)\n",
"ax.axvline(0, linestyle=\"dotted\", color=\"k\", label=\"0\")\n",
"ax.axvline(mean, color=\"k\", label=\"Mean\")\n",
"ax.axvline(mean + std_dev, color=\"grey\", label=\"Std Error\")\n",
"ax.axvline(mean - std_dev, color=\"grey\")\n",
"ax.legend()"
]
},
{
"cell_type": "markdown",
"id": "1a9adfd8-bb59-4c4f-a51b-170f7c1daf86",
"metadata": {},
"source": [
"There are three things happening here:\n",
"\n",
"1. We are taking 2000 samples of a random variable X, those samples has some mean $m$ and standard deviation $\\sigma_X$, the mean is the center of mass of the above histogram and the standard deviation is a measure of how wide it is.\n",
"\n",
"2. However what we actually want to do is ask \"How close is the mean to 0?\", to answer that we need to know how much we expect the mean to vary by when we rerun the calculation. Turns the mean of N samples of a variable X then the mean varies by \n",
" $$\\sigma_m = \\sigma_X / \\sqrt{N}$$\n",
" this is usually called the standard error of the mean.\n",
"\n",
"3. Each time we run this code, we estimate the mean and the standard error of the mean, when it comes out to be a lot more than 100% then our t-test is very confident that the data is consistent with the true mean being 0. However when it's less than 100% we get a smaller p_value and this is saying that we should suspect that maybe the mean is not 0 after all.\n",
"\n",
"<img style=\"max-width:700px;margin:auto;\" src = \"https://imgs.xkcd.com/comics/p_values.png\" alt = \"An xkcd comic with a diagram of p values, saying that small ones are highly significant and giving humorous excuses for why larger ones are still intersting\">"
]
},
{
"cell_type": "markdown",
"id": "0d6622b6-6598-4b61-84a8-2a3907315599",
"metadata": {},
"source": [
"So to do our test, we check that the p value is less than some arbitrary cutoff such as 0.1 or 0.01. This test should usually pass if the mean is in fact close to zero and it should fail if the mean is not zero. However due to random variation it can also fail randomly.\n",
"\n",
"This is just one of those things that you can't really do anything about. Incidentally this can be used in reverse to generate fake \"highly significant\" scientific results in a practice called p-hacking. As usual XKCD has [explained this](https://xkcd.com/882/) better than I ever could."
]
},
{
"cell_type": "markdown",
"id": "041557a7-1965-4bea-9e61-4d0d6df335e8",
"metadata": {},
"source": [
"## Test Driven Development\n",
"\n",
"I won't talk about TDD much here but it's likely a term you will hear at some point. It essentially referes to the practice of writing tests as part of your process of writing code. Rather than writing all your code and then writing tests for them. You could instead write some or all of your tests upfront and then write code that passes them. \n",
"\n",
"This can be an incredibly prodctive way to work, it forces you think about the structure and interface of your software before you start writing it. It aslo gives you nice incremental goals that you can tick off once each test starts to pass, gamification maybe?"
]
},
{
"cell_type": "markdown",
"id": "91fa6842-f214-4fd3-bbe5-0f20d7d0d2cc",
"metadata": {},
"source": [
"## Autoformaters\n",
"Further reading on the topic of autoformatters:\n",
"- [The Turing Way](https://the-turing-way.netlify.app/reproducible-research/code-quality/code-quality-style.html)\n",
"- [Essential Software Engineering for Researchers](https://imperialcollegelondon.github.io/grad_school_software_engineering_course/l1-02-tools-II/index.html)\n",
"\n",
"While we're doing things that will help keep our code clean and tidy in the future, I would recommend installing a code formatter like `black`. This is a program that enforces a particular formatting style on your code by simply doing it for you. At first this sounds a bit weird, but it has a few benefits:\n",
"\n",
"- It makes git diffs as small as possible because formatting changes never show up\n",
"- It means you never have to discuss with your collaborators about code formatting, something which can waste a lot of time!\n",
"\n",
"Here I will show you how to setup `black` as a pre-commit hook, this means it runs before you commit anything to git, which is probably the best time to do it. We'll use a helper tool called [pre-commit](https://pre-commit.com/).\n",
"\n",
"```bash\n",
"pip install pre-commit\n",
"pre-commit sample-config >> .pre-commit-config.yaml # Generate an initial config\n",
"```\n",
"Now we add some additional lines to the `.pre-commit-config.yaml` config file to setup black:\n",
"```yaml\n",
"- repo: https://github.com/psf/black\n",
" rev: 21.12b0\n",
" hooks:\n",
" - id: black\n",
" - id: black-jupyter\n",
"```\n",
"And finally `pre-commit install` will make this run every time you commit to git. It's worth running it manually once the first time to check it works: `pre-commit run --all-files`. Running this I immediatly got a cryptic error that, on googling, turned out to be that something broke in version 21.12b0 of `21.12b0`. Running `precommit autoupdate` fixed this for me by updated `black` to a later version. Running `pre-commit run --all-files` a second time now gives me:\n",
"```bash\n",
"(recode) tom@TomsLaptop ReCoDE_MCMCFF % pre-commit run --all-files\n",
"trim trailing whitespace.................................................Passed\n",
"fix end of files.........................................................Passed\n",
"check yaml...........................................(no files to check)Skipped\n",
"check for added large files..............................................Passed\n",
"black....................................................................Passed\n",
"(recode) tom@TomsLaptop ReCoDE_MCMCFF % \n",
"```\n",
"\n",
"Now whenever you commit anything, `black` will autoformat it before it actually gets commited. I'll test this for myself by putting\n",
"```python\n",
"def ugly_litte_one_liner(a,b,c): return \" \".join([a,b,c/2. +3])\n",
"```\n",
"in a code cell below and we'll see how `black` formats it. The only gotcha here is that you have to reload jupyter notebooks from disk in order to see the changes that `black` makes."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "24874a21-b9b3-4c38-9a79-1a2b4bce88a9",
"metadata": {},
"outputs": [],
"source": [
"def ugly_litte_one_liner(a, b, c):\n",
" return \" \".join([a, b, c / 2.0 + 3])"
]
},
{
"cell_type": "markdown",
"id": "68cccdcc-b82e-4dec-bfc1-54072db8d762",
"metadata": {},
"source": [
"Finally, be aware that if you try to commit code with incorrect syntax then black will just error and prevent it, this is probably a good thing but there may be the occasional time where that's a problem."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:recode]",
"language": "python",
"name": "conda-env-recode-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}