update learning materials

2025-06-26 08:51:16 +02:00 · 2022-06-09 09:55:49 +02:00 · 2022-06-09 09:55:49 +02:00 · da28951623
commit da28951623
parent 615558cb38
4 changed files with 157 additions and 68 deletions
--- a/README.md
+++ b/README.md
@@ -26,12 +26,13 @@ Rather this project is primarily designed to showcase the tools and practices av
 1. [A short introduction][intro] 
 1. [Organising code and python packaging][packaging]
 1. [Testing your code][testing]
-1. Python packages and environments: Pip, Conda, setup.py and all that. 
+1. Planning the project, MVPs, Premature Optimisation
 1. Planning out a larger software project
 1. Using Jupyter Notebooks during development
 1. Documentation
 1. Reproducibility of software outputs
 1. Citing software in a publication: CITATION.cff
+1. Managing an open source project, issues, milestones

 ## How to use this repository

--- a/Introduction.ipynb
+++ b/Introduction.ipynb
@@ -16,6 +16,16 @@
    "\n",
    "It's strongly encouraged that you follow along this notebook in an enviroment where you can run the cells yourself and change them. You can either clone this git repository and run the cells in a python environment on your local machine, or if you for some reason can't do that (because you're an a phone or tablet for instance) you can instead open this notebook in [binder](link)\n",
    "\n",
+    "I would also suggest you set up a python environment just for this. You can use your preferred method to do this, but I will recommend `conda` because it's both what I currently use and what is recommended by Imperial: LINK \n",
+    "\n",
+    "```bash\n",
+    "#make a new conda environment named recode with python 3.9\n",
+    "conda create --name recode python=3.9\n",
+    "\n",
+    "#activate the environment\n",
+    "conda activate recode\n",
+    "```\n",
+    "\n",
    "## The Problem\n",
    "\n",
    "So without further ado lets talk about the problem we'll be working on, you don't necessaryily need to understand the full details of this to learn the important lessons but I will give a quick summary here. We want to simulate a physical model called the **Ising model**, which is famous in physics because it's about the simplest thing you can come up with that displays a phase transition, a special kind of shift between two different behaviours."
--- a/learning/02
+++ b/learning/02
@@ -95,12 +95,10 @@
    "\n",
    "`pyproject.toml` and `setup.cfg` are the current way to describe the metadat about a python package like how it should be installed and who the author is etc, but typically you just copy the standard layouts and build from there. The empty `__init__.py` file flags that this folder is a python module.\n",
    "\n",
-    "*NB* The [General Python Packaging advice](https://packaging.python.org/en/latest/tutorials/packaging-projects/) says to use `requires = [\"setuptools>=42\"]` but this did not work on my system, I founded removing the version restriction on setuptools seemed to fix the problem. Don't be afraid to google it if you're having problems.\n",
-    "\n",
    "pyproject.toml:\n",
    "```\n",
    "[build-system]\n",
-    "requires = [\"setuptools\"]\n",
+    "requires = [\"setuptools>=4.2\"]\n",
    "build-backend = \"setuptools.build_meta\"\n",
    "```\n",
    "\n",
@@ -147,68 +145,6 @@
    "```\n",
    "The dot means we should install MCFF from the current directory and `--editable` means to do it as an editable package so that we can edit the files in MCFF and not have to reinstall. This is really useful for development."
   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "553ac586-39c4-4a3e-bf54-af89f46b8ba3",
-   "metadata": {},
-   "source": [
-    "## Testing\n",
-    "Ok we can finally start writing and running some tests! Check out the [pytest website](https://docs.pytest.org/en/7.1.x/getting-started.html) for a tutorial on how to write tests in pytest and head over to the [Turing Way](https://the-turing-way.netlify.app/reproducible-research/testing.html) for a great introduction to testing in general. \n",
-    "\n",
-    "I copied some of the initial tests that we did in chapter 1 into `test_energy.py` installed pytest into my development environemnt with `pip install pytest` and now I can run\n",
-    "```sh\n",
-    "python -m pytest\n",
-    "```\n",
-    "And get a lovely output!"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "id": "17186d0c-982c-45e4-b88f-c87d73b8d1a7",
-   "metadata": {},
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<script id=\"asciicast-498583\" src=\"https://asciinema.org/a/498583.js\" async></script>\n"
-      ],
-      "text/plain": [
-       "<IPython.core.display.HTML object>"
-      ]
-     },
-     "metadata": {},
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "%%html\n",
-    "<script id=\"asciicast-498583\" src=\"https://asciinema.org/a/498583.js\" async></script>"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "d8ab0ff0-e916-4903-9a2d-c85ad2d6a713",
-   "metadata": {},
-   "source": [
-    "[![asciicast](https://asciinema.org/a/498583.svg)](https://asciinema.org/a/498583)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "id": "a43c40eb-127e-4caf-8abf-b62931539130",
-   "metadata": {},
-   "source": [
-    "https://docs.pytest.org/en/6.2.x/goodpractices.html\n",
-    "\n",
-    "Now make sure you have pytest installed in this environment and then run it:\n",
-    "```\n",
-    "pip install pytest\n",
-    "python -m pytest\n",
-    "```\n",
-    "If you just run `pytest` you may run into issues where the you run pytest in the wrong python environment and it will complain to you that it can't find MCFF."
-   ]
  }
 ],
 "metadata": {
--- a/Testing.ipynb
+++ b/Testing.ipynb
@@ -8,13 +8,155 @@
    "<h1 align=\"center\">Markov Chain Monte Carlo for fun and profit</h1>\n",
    "<h1 align=\"center\"> 🎲 ⛓️ 👉 🧪 </h1>\n",
    "\n",
-    "## Testing"
+    "# Chapter 3: Testing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7bfb6308-24ec-474d-adbc-b60797d58c29",
+   "metadata": {
+    "tags": []
+   },
+   "source": [
+    "Ok we can finally start writing and running some tests! Check out the [pytest website](https://docs.pytest.org/en/7.1.x/getting-started.html) for a tutorial on how to write tests in pytest and head over to the [Turing Way](https://the-turing-way.netlify.app/reproducible-research/testing.html) for a great introduction to testing in general. \n",
+    "\n",
+    "I copied some of the initial tests that we did in chapter 1 into `test_energy.py` and installed pytest into my development environment with `pip install pytest` (if you're using conda, use `conda install pytest` instead). Now I can run the `pytest` command in the ReCoDE_MCMCFF directory. Pytest will automatically discover our tests and run them. To do this it relies on there being python files named `test_*.py` or `*_test.py` containing functions whose names start with `test_`.\n",
+    "\n",
+    "If that doesn't work and pytest complains that it can't find MCFF, try `python -m pytest`. This asks python to find the pytest module and run it, which helps ensure you're running pytest inside the correct environment. I ran into this problem because I ran `pip install pytest` inside a conda environment when I should have used `conda install pytest`.\n",
+    "\n",
+    "But hopefully you can get it working and get a lovely testy output! I've embedded a little video of this below but if it doesn't load, check out the [link](https://asciinema.org/a/498583)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "d4082e07-c51f-46ba-9a5e-bf45c2c319ba",
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<script id=\"asciicast-498583\" src=\"https://asciinema.org/a/498583.js\" async></script>\n"
+      ],
+      "text/plain": [
+       "<IPython.core.display.HTML object>"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    }
+   ],
+   "source": [
+    "%%html\n",
+    "<script id=\"asciicast-498583\" src=\"https://asciinema.org/a/498583.js\" async></script>"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "27a83dc3-eaa5-4b40-b61c-0cd969fa8049",
+   "metadata": {},
+   "source": [
+    "## Basic Testing with Pytest\n",
+    "\n",
+    "Take a look at `tests/test_energy.py`. You can see that I've done some imports, set up some test states and then defined two testing functions:\n",
+    "\n",
+    "```python\n",
+    "# Note that only functions whose name begins with test_ get run by pytest\n",
+    "def E_prediction_all_the_same(L): \n",
+    "    \"The exact energy for the case where all spins are up or down\"\n",
+    "    return -(4*(L - 2)**2 + 12*(L-2) + 8) / L**2\n",
+    "\n",
+    "def test_exact_energies():\n",
+    "    for state in [all_up, all_down]:\n",
+    "        L = state.shape[0]\n",
+    "        assert energy(state) == E_prediction_all_the_same(L)\n",
+    "```\n",
+    "\n",
+    "I will defer to external resources for a full discussion of the philosophy of testing, but I generally think of tests as an aid to my future debugging. If I make a change that breaks something then I want my tests to catch that and to make it clear what has broken. As such I generally put tests that check very basic properties of my code early on in the file and then follow them with tests that probe more subtle things or more obscure edge cases.\n",
+    "\n",
+    "`test_exact_energies` checks that the energies of our exact states come out as we calculated they should in chapter 1. This is testing a very limited space of the possible inputs to `energy` so we'd like to find some way to be more confident that our implementation is correct.\n",
+    "\n",
+    "One way is to test multiple independent implementations against one another: `test_energy_implementations` checks our numpy implementation against our numba one. This should catch implementation bugs because it's unlikely we will make the same error in both implementations. \n",
+    "\n",
+    "```python\n",
+    "def test_energy_implementations():\n",
+    "    for state in states:\n",
+    "        assert np.allclose(energy(state), energy_numpy(state))\n",
+    "```\n",
+    "\n",
+    "However, if we have made a logical error in how we've defined the energy, that error will likely appear in both implementations and thus won't be caught by this. \n",
+    "\n",
+    "From here on, as we write more code or add new functionality, we will add tests to check that functionality."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "eeb05f7e-913d-4d9e-9580-9c840e06d410",
+   "metadata": {},
+   "source": [
+    "## Coverage Testing\n",
+    "\n",
+    "A useful little trick for testing is to use tools like pytest-cov that measure *coverage*, that is, the fraction of your code base that is run by your tests. Unfortunately Numba does not play super well with pytest-cov, so we have to turn off Numba's JIT compilation using an environment variable to generate the coverage report.\n",
+    "\n",
+    "```bash\n",
+    "(recode) tom@TomsLaptop ReCoDE_MCMCFF % pip install pytest-cov # install the coverage testing plugin\n",
+    "(recode) tom@TomsLaptop ReCoDE_MCMCFF % NUMBA_DISABLE_JIT=1 pytest --cov=MCFF --cov-report=term\n",
+    "\n",
+    "================================================== test session starts ==================================================\n",
+    "platform darwin -- Python 3.9.12, pytest-7.1.1, pluggy-1.0.0\n",
+    "rootdir: /Users/tom/git/ReCoDE_MCMCFF\n",
+    "plugins: hypothesis-6.46.10, cov-3.0.0\n",
+    "collected 3 items                                                                                                       \n",
+    "\n",
+    "code/tests/test_energy.py ..                                                                                      [ 66%]\n",
+    "code/tests/test_energy_using_hypothesis.py .                                                                      [100%]\n",
+    "\n",
+    "---------- coverage: platform darwin, python 3.9.12-final-0 ----------\n",
+    "Name                           Stmts   Miss  Cover\n",
+    "--------------------------------------------------\n",
+    "code/src/MCFF/__init__.py          0      0   100%\n",
+    "code/src/MCFF/ising_model.py      22      3    86%\n",
+    "code/src/MCFF/mcmc.py             14     14     0%\n",
+    "--------------------------------------------------\n",
+    "TOTAL                             36     17    53%\n",
+    "\n",
+    "\n",
+    "=================================================== 3 passed in 1.89s ===================================================\n",
+    "```\n",
+    "\n",
+    "Ok so this is telling us that we currently test 86% of the lines in ising_model.py and none of the lines in mcmc.py. We can also pass `--cov-report=html` instead to get a really nice html output which shows which parts of your code aren't being run.\n",
+    "\n",
+    "A warning though: testing 100% of your lines of code doesn't mean the code is correct. You need to think carefully about the data you test on, so try to pick the hardest examples you can think of! What edge cases might there be that would break your code? Zero, empty strings and empty arrays are classic examples."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "d70a8934-a58d-4aa6-afca-35fee23bf851",
+   "metadata": {},
+   "source": [
+    "## Advanced Testing Methods: Hypothesis\n",
+    "\n",
+    "\n",
+    "I won't go into huge detail here but I thought it would be nice to make you aware of a library called `Hypothesis` that helps with this problem of finding edge cases. `Hypothesis` gives you tools to generate randomised inputs to functions, so as long as you can come up with some way to verify the output is correct (or just that the code doesn't throw an error!) then this can be a powerful method of testing. \n",
+    "\n",
+    "\n",
+    "Take a look in `test_energy_using_hypothesis.py`\n",
+    "```python\n",
+    "from hypothesis.extra import numpy as hnp\n",
+    "\n",
+    "@given(hnp.arrays(dtype = int,\n",
+    "                 shape = hnp.array_shapes(min_dims = 2, max_dims = 2),\n",
+    "                 elements = st.sampled_from([1, -1])))\n",
+    "def test_generated_states(state):\n",
+    "    assert np.allclose(energy(state), energy_numpy(state))\n",
+    "```\n",
+    "You tell Hypothesis how to generate the test data. In this case we use some numpy-specific code to generate two-dimensional arrays with `dtype = int` and entries randomly sampled from `[1, -1]`. We use the same trick as before of checking two implementations against one another."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
-   "id": "58bef986-9d69-4ef6-8a64-9eaf29c3424e",
+   "id": "21270ceb-f5b7-496b-a530-2def9f70b89f",
   "metadata": {},
   "outputs": [],
   "source": []