mirror of https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF.git
synced 2025-06-26 08:51:16 +02:00

Merge pull request #5 from ImperialCollegeLondon/fix/update-build-sections

fix: multiple changes in the Jupyter notebooks

commit eccba11833

README.md (80 lines changed)
@@ -16,35 +16,64 @@
</a>
</p>

## Description

This is an exemplar project designed to showcase best practices in developing scientific software as part of the ReCoDE Project at Imperial College London.

**You do not need to know or care about Markov Chain Monte Carlo for this to be useful to you.**

Rather this project is primarily designed to showcase the tools and practices available to you when developing scientific software projects. Maybe you are a PhD student just starting, or a researcher just about to embark on a larger scale software project - there should be something interesting here for you.

## Table of contents
## Learning Outcomes

1. [Introduction](docs/learning/01%20Introduction.ipynb)
1. [Packaging It Up](docs/learning/02%20Packaging%20It%20Up.ipynb)
1. [Writing a Markov Chain Monte Carlo Sampler](docs/learning/03%20Writing%20a%20Markov%20Chain%20Monte%20Carlo%20Sampler.ipynb)
1. [Testing](docs/learning/04%20Testing.ipynb)
1. [Adding Functionality](docs/learning/05%20Adding%20Functionality.ipynb)
1. [Speeding It Up](docs/learning/06%20Speeding%20It%20Up.ipynb)
1. [Producing Research Outputs](docs/learning/07%20Producing%20Research%20Outputs.ipynb)
1. [Doing Reproducible Science](docs/learning/08%20Doing%20Reproducible%20Science.ipynb)
1. [Adding Documentation](docs/learning/09%20Adding%20Documentation.ipynb)
- Creating virtual environments using Anaconda
- Plotting data using Matplotlib
- Improving code performance with `numba` and Just-in-time compilation
- Packaging Python projects into modules
- Writing a simple Monte Carlo simulation using `numba` and `numpy`
- Using Test Driven Development (TDD) to test your code
- Creating unittests with `pytest`
- Calculating the `coverage` of your codebase
- Visualising coarse and detailed views of the `coverage` in your codebase
- Creating property-based tests with `hypothesis`
- Creating regression tests
- Using autoformatters like `black` and other development tools
- Improving performance using `generators` and `yield`
- Making a reproducible Python environment using Anaconda
- Documenting your code using `sphinx`
- Writing docstrings using a standardised format

## How to use this repository
## Requirements

### Academic

Entry level researcher with basic knowledge of Python.

**Complementary Resources to the exemplar:**

- [The Turing Way](https://the-turing-way.netlify.app/) has tons of great resources on the topics discussed here.
- [Intermediate Research Software Development in Python](https://carpentries-incubator.github.io/python-intermediate-development/index.html)

### System

| Program | Version |
| ---------------------------------------------------------- | ------- |
| [Python](https://www.python.org/downloads/) | >= 3.7 |
| [Anaconda](https://www.anaconda.com/products/distribution) | >= 4.1 |

## Getting Started

Take a look at the table of contents below and see if there are any topics that might be useful to you. The actual code lives in `src` and the documentation in `docs/learning` in the form of Jupyter notebooks.

When you're ready to dive in you have three options:
When you're ready to dive in you have 4 options:

### 1. Launch them in Binder (easiest but a bit slow)
### 1. Launch the notebooks in Binder

[](https://mybinder.org/v2/gh/ImperialCollegeLondon/ReCoDE_MCMCFF/HEAD?urlpath=lab%2Ftree%2Fdocs%2Flearning%2F01%20Introduction.ipynb)

### 2. Clone the repo and run the Jupyter notebooks locally. (Faster but requires you have python/jupyter installed)
_NOTE: Performance might be a bit slow_.

### 2. Clone the repo and run the Jupyter notebooks locally

```bash
git clone https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF mcmc
@@ -53,9 +82,18 @@ pip install .[dev]
jupyter lab
```

### 3. View them non-interactively in GitHub via the links in the table of contents
_NOTE: Better performance but requires you have Python and Jupyter installed_.

## The map
### 3. View the Jupyter notebooks non-interactively via the online documentation

You can read all the Jupyter notebooks online and non-interactively in the official **[Documentation](https://recode-mcmcff.readthedocs.io/)**.

### 4. View the Jupyter notebooks non-interactively on GitHub

Click [here](https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF/tree/main/docs/learning)
to view the individual Jupyter notebooks.

## Project Structure

```bash
.
@@ -76,13 +114,3 @@ jupyter lab
│
└── tests # automated tests for the code
```

## External Resources

- [The Turing Way](https://the-turing-way.netlify.app/) has tons of great resources on the topics discussed here.
- [Intermediate Research Software Development in Python](https://carpentries-incubator.github.io/python-intermediate-development/index.html)

[tdd]: learning/01%20Introduction.ipynb
[intro]: learning/01%20Introduction.ipynb
[packaging]: learning/02%20Packaging%20it%20up.ipynb
[testing]: learning/02%20Testing.ipynb
docs/conf.py (13 lines changed)
@@ -34,9 +34,22 @@ release = "1.0"
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "sphinx.ext.mathjax",
    "myst_nb",
]

# Setup the myst_nb extension for LaTeX equations rendering
myst_enable_extensions = [
    "amsmath",
    "colon_fence",
    "deflist",
    "dollarmath",
    "html_image",
]
myst_dmath_allow_labels = True
myst_dmath_double_inline = True
myst_update_mathjax = True

# Tell myst_nb not to execute the notebooks
nb_execution_mode = "off"
@@ -3,9 +3,7 @@ Imperial College London ReCoDE : Monte Carlo for Fun

This is an exemplar project designed to showcase best practices in developing scientific software as part of the ReCoDE Project at Imperial College London. These docs have been generated automatically with sphinx.

You can find the source code and main landing page for this project on `GitHub <https://github.com/TomHodson/ReCoDE_MCMCFF>`_

There is a `Jupyter notebook <https://github.com/TomHodson/ReCoDE_MCMCFF>`_ detailing how this page was generated in there.
You can find the source code and main landing page for this project on `GitHub <https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF>`_

.. toctree::
   :maxdepth: 1
@@ -13,9 +11,8 @@ There is a `Jupyter notebook <https://github.com/TomHodson/ReCoDE_MCMCFF>`_ deta
   :caption: Table of Contents:

   quickstart
   api_docs
   learning/*

   api_docs


Indices and tables
@@ -12,20 +12,27 @@
"\n",
"# Introduction\n",
"\n",
"Hello and welcome to the documentation for MCMCFF! These notebooks will guide you through the process of writing a medium sized scientific software project, discussing the decision and tradeoffs made along the way.\n",
"Hello and welcome to the documentation for MCMCFF! These notebooks will guide you through the process of writing a medium-sized scientific software project, discussing the decision and trade-offs made along the way.\n",
"\n",
"## Setting up your environment\n",
"\n",
"It's strongly encouraged that you follow along this notebook in an enviroment where you can run the cells yourself and change them. You can either clone this git repository and run the cells in a python environment on your local machine, or if you for some reason can't do that (because you're an a phone or tablet for instance) you can instead open this notebook in [binder](https://mybinder.org/v2/gh/TomHodson/ReCoDE_MCMCFF/HEAD)\n",
"It's **strongly encouraged** that you follow along this series of notebooks in an environment where you can run the cells yourself and change them. You can either clone this git repository and run the cells in a Python environment on your local machine,\n",
"\n",
"I would also suggest you setup a python environment just for this. You can use your preferred method to do this, but I will recomend `conda` because it's both what I currently use and what is recommeded by Imperial.\n",
"```bash\n",
"git clone https://github.com/ImperialCollegeLondon/ReCoDE_MCMCFF mcmc\n",
"cd mcmc\n",
"```\n",
"\n",
"or if for some reason you can't do that (because you are on a phone or tablet for instance) you can instead open this notebook in [binder](https://mybinder.org/v2/gh/TomHodson/ReCoDE_MCMCFF/HEAD)\n",
"\n",
"I would also suggest you set up a Python environment just for this project. You can use your preferred method to do this, but I will recommend `Anaconda` because it's both what I currently use and what is recommended by Imperial.\n",
"\n",
"```bash\n",
"#make a new conda environment from the specification in environment.yml\n",
"conda env create --file environment.yml\n",
"\n",
"#activate the environment\n",
"conda activate recode\n",
"conda activate mcmc\n",
"```\n",
"\n",
"If you'd prefer to keep this environment nicely stored away in this repository, you can save in a folder called env by doing\n",
@@ -44,7 +51,7 @@
"\n",
"## The Problem\n",
"\n",
"So without further ado lets talk about the problem we'll be working on, you don't necessaryily need to understand the full details of this to learn the important lessons but I will give a quick summary here. We want to simulate a physical model called the **Ising model**, which is famous in physics because it's about the simplest thing you can come up with that displays a phase transition, a special kind of shift between two different behaviours."
"So without further ado lets talk about the problem we'll be working on, you don't necessarily need to understand the full details of this to learn the important lessons, but I will give a quick summary here. We want to simulate a physical model called the **Ising model**, which is famous in physics because it's about the simplest thing you can come up with that displays a phase transition, a special kind of shift between two different behaviours."
]
},
{
@@ -73,7 +80,7 @@
"\n",
"np.random.seed(\n",
" 42\n",
") # This makes our random numbers reproducable when the notebook is rerun in order"
") # This makes our random numbers reproducible when the notebook is rerun in order"
]
},
{
@@ -81,7 +88,7 @@
"id": "e52245f1-8ecc-45f1-8d52-337916b0ce7c",
"metadata": {},
"source": [
"We're going to be working with arrays of numbers so it will make sense to work with `Numpy` and we'll also want to plot things, the standard choice for this is `matplotlib`, though there are other options, `pandas` and `plotly` being notable ones.\n",
"We're going to be working with arrays of numbers, so it will make sense to work with `NumPy`, and we'll also want to plot things, the standard choice for this is `Matplotlib`, though there are other options, `pandas` and `Plotly` being notable ones.\n",
"\n",
"Let me quickly plot something to aid the imagination:"
]
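The plotting cell itself is suppressed in this diff, but a minimal sketch of the kind of cell being described might look like the following (the variable name `state` and the 50x50 size are illustrative assumptions, not taken from the notebook):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)

# A random grid of +1/-1 "spins"; black and white pixels for the two camps
state = rng.choice([-1, 1], size=(50, 50))

fig, ax = plt.subplots()
ax.imshow(state, cmap="gray")
ax.set_axis_off()
```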
@@ -122,15 +129,15 @@
"id": "9a919be9-2737-4d79-9607-4daf3b457364",
"metadata": {},
"source": [
"In my head, the Ising model is basically all about peer pressure. You're a tiny creature and you live in a little world where you can only be one of two things, up/down, left/right, in/out doesn't matter. \n",
"In my head, the Ising model is basically all about peer pressure. You're a tiny creature, and you live in a little world where you can only be one of two things, up/down, left/right, in/out doesn't matter. \n",
"\n",
"But what *does matter* is that you're doing the same thing as you're neighbours. We're going to visualise this with images like the above, representing the two different camps, though at the moment what I've plotted is random, there's no peer pressure going on yet.\n",
"\n",
"The way that a physicist would quantify this peer pressure is to assign a number to each state, lower numbers meaning more of the little creatures are doing the same thing as their neighbours. We'll call this the Energy, because physicists always call things Energy, that's just what we do.\n",
"\n",
"To calculate the energy what we're gonna do is look at all the pixels/creatures, and for each one, we look at the four neighbours to the N/E/S/W, everytime we find a neighbour that agrees, we'll subtract 1 from our total and every time we find neighbours that disagree we'll add 1 to our total. Creatures at the edges will simply have fewer neighbours to worry about. \n",
"To calculate the energy what we're going to do is look at all the pixels/creatures, and for each one, we look at the four neighbours to the N/E/S/W, every time we find a neighbour that agrees, we'll subtract 1 from our total and every time we find neighbours that disagree we'll add 1 to our total. Creatures at the edges will simply have fewer neighbours to worry about. \n",
"\n",
"I'll show you what the equation for this looks like, but don't worry to much about it, the word description should be enough to write some code. If we assign the ith creature the label $s_i = \\pm1$ then the energy is \n",
"I'll show you what the equation for this looks like, but don't worry too much about it, the word description should be enough to write some code. If we assign the ith creature the label $s_i = \\pm1$ then the energy is \n",
"$$E = \\sum_{(i,j)} s_i s_j$$\n",
"\n",
"Ok let's do some little tests, let's make the all up, all down and random state and see if we can compute their energies."
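The word description above translates directly into a nested loop. This is a hedged reconstruction rather than the notebook's exact cell (the function name `energy` is an assumption), but it is a faithful instance of the rule described — each of a creature's N/E/S/W neighbours contributes -1 if aligned and +1 if not, so each bond is counted from both sides:

```python
import numpy as np

def energy(state):
    """Per the word description: for every creature, check its N/E/S/W
    neighbours; subtract 1 per agreeing neighbour, add 1 per disagreeing one."""
    E = 0
    N, M = state.shape
    for i in range(N):
        for j in range(M):
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                # Creatures at the edges simply have fewer neighbours
                if 0 <= ni < N and 0 <= nj < M:
                    E -= state[i, j] * state[ni, nj]
    return E
```

For a 3x3 all-up state there are 24 ordered neighbour pairs, all agreeing, so the energy is -24; a checkerboard, where every neighbour disagrees, gives +24.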
@@ -389,25 +396,25 @@
"source": [
"### Making it a little faster\n",
"\n",
"This project is not intended to focus on optimising for performance but it is worth putting a little effort into making this function faster so that we can run experiments more quickly later.\n",
"This project is not intended to focus on optimising for performance, but it is worth putting a little effort into making this function faster so that we can run experiments more quickly later.\n",
"\n",
"The main thing that slows us down here is that we've written a 'tight loop' in pure python, the energy function is just a loop over the fundamental operation:\n",
"```python\n",
"E -= state[i,j] * state[i+di, j]\n",
"```\n",
"which in theoy only requires a few memory load operations, a multiply, an add and a store back to memory (give or take). However because Python is such a dynamic language, it will have to do extra things like check the type and methods of `state` and `E`, invoke their array access methods `object.__get__`, etc etc. We call this extra work overhead.\n",
"which in theory only requires a few memory load operations, a multiply, an add and a store back to memory (give or take). However, because Python is such a dynamic language, it will have to do extra things like check the type and methods of `state` and `E`, invoke their array access methods `object.__get__`, etc. We call this extra work overhead.\n",
"\n",
"In most cases the ratio of overhead to actual computation is not too bad, but here because the fundamental computation is so simple it's likely the overhead accounts for much more of the overal time.\n",
"In most cases the ratio of overhead to actual computation is not too bad, but here because the fundamental computation is so simple it's likely the overhead accounts for much more of the overall time.\n",
"\n",
"In scientific python like this there are usually two main options for reducing the overhead:\n",
"\n",
"#### Using Arrays\n",
"One way is we work with arrays of numbers and operations defined over those arrays such as `sum`, `product` etc. `Numpy` is the canonical example of this in Python but many machine learning libraries are essentually doing a similar thing. We rely on the library to implement the operations efficiently and try to chain those operations together to achieve what we want. This imposes some limitations on the way we can write our code.\n",
"One way is we work with arrays of numbers and operations defined over those arrays such as `sum`, `product` etc. `NumPy` is the canonical example of this in Python, but many machine learning libraries are essentially doing a similar thing. We rely on the library to implement the operations efficiently and try to chain those operations together to achieve what we want. This imposes some limitations on the way we can write our code.\n",
"\n",
"#### Using Compilation\n",
"The alternative is that we convert our Python code into a more efficient form that incurs less overhead. This requires a compilation or transpilation step and imposes a different set of constraints on the code.\n",
"\n",
"It's a little tricky to decide which of the two approaches will work best for a given problem. My advice would be to have some familiarity with both but ultimatly to use what makes your development experience the best, since you'll likely spend more time writing the code than you will waiting for it to run!"
"It's a little tricky to decide which of the two approaches will work best for a given problem. My advice would be to have some familiarity with both but ultimately to use what makes your development experience the best, since you'll likely spend more time writing the code than you will waiting for it to run!"
]
},
{
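As an illustrative sketch of the "Using Arrays" route (my own version, not code from the notebook): the neighbour loop can be replaced by multiplying the grid against shifted copies of itself, so NumPy's compiled code does the tight loop. The compilation route would instead wrap the pure-Python loop in `numba`'s `@njit` decorator.

```python
import numpy as np

def energy_numpy(state):
    # Multiply each spin by its right-hand and downward neighbour and sum.
    # Each bond is counted once this way, so double the total to match the
    # per-creature N/E/S/W convention used earlier.
    horizontal = np.sum(state[:, :-1] * state[:, 1:])
    vertical = np.sum(state[:-1, :] * state[1:, :])
    return -2 * (horizontal + vertical)
```

The limitation the text mentions is visible here: the code had to be rethought in terms of whole-array slices rather than a per-creature loop.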
@@ -613,7 +620,7 @@
"## Conclusion\n",
"So far we've discussed the problem we want to solve, written a little code, tested it a bit and made some speed improvements.\n",
"\n",
"In the next notebook we will package the code up into a little python package, this is has two big benefits to use: \n",
"In the next notebook we will package the code up into a little python package, this has two big benefits when using the code: \n",
"1. I won't have to redefine the energy function we just wrote in the next notebook \n",
"1. It will help with testing and documenting our code later"
]
@@ -30,9 +30,9 @@
"- [Packaging for pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html)\n",
"\n",
"\n",
"Before we can do any testing, it is best practice to structure and then package your code up as a python project up. You don't have to do it like this but but it carrys with it the benefit that many only tutorial _expect_ you to do it like this and generally you want to reduce friction for yourself later. \n",
"Before we can do any testing, it is best practice to structure and then package your code up as a python project. You don't have to do it like this, but it carries with it the benefit that many other tutorials _expect_ you to do it like this, and generally you want to reduce friction for yourself later. \n",
"\n",
"Like all things progamming, there are many opinions about how python projects should be structured, as I write this the structure of this repository is this: (This is the lightly edited output of the `tree` command if you're interested) \n",
"Like all things programming, there are many opinions about how python projects should be structured, as I write this the structure of this repository is this: (This is the lightly edited output of the `tree` command if you're interested) \n",
"```bash\n",
".\n",
"├── CITATION.cff # This file describes how to cite the work contained in this repository.\n",
@@ -41,7 +41,7 @@
"├── docs\n",
"│ ├── ... #Files to do with making the documentation\n",
"│ └── learning\n",
"│ └── #The Jupyer notebooks that form the main body of this project\n",
"│ └── #The Jupyter notebooks that form the main body of this project\n",
"│\n",
"├── pyproject.toml # Machine readable information about the MCFF package\n",
"├── readthedocs.yml # Tells readthedocs.com how to build the documentation\n",
@@ -53,15 +53,15 @@
"└── tests # automated tests for the code\n",
"```\n",
"\n",
"It's looks pretty intimidating! But let's quickly go through it, at the top level of most projects you'll find on Github and elsewhere you'll find files to do with the project as a whole:\n",
"It looks pretty intimidating! But let's quickly go through it: at the top level of most projects you'll find on GitHub (and elsewhere) there are a group of files that describe the project as a whole or provide key project information - not all projects will have all of these files and, indeed, there a variety of other files that you may also see so this is an example of some of the more important files:\n",
"- `README.md` - An intro to the project\n",
"- `LICENSE` - The software license that governs this project, there are a few standard ones people use.\n",
"- `environment.yaml` (or both) this list what python packages the project needs in a standard format\n",
"- `environment.yml` (or alternatives) - this lists what Python packages the project needs in a standard format (other languages have equivalents).\n",
"- `CITATION.cff` This is the new standard way to describe how a work should be cited, v useful for academic software.\n",
"\n",
"Then below that you will usually have directories breaking the project up into main categories, here I have `code/` and `learning/` but it would be more typical to have what is in `code` at the top level.\n",
"Then below that you will usually have directories breaking the project up into main categories, here I have `src/` and `docs/learning/`.\n",
"\n",
"Inside `code/` we have a standard python package directory structure.\n",
"Inside `src/` we have a standard Python package directory structure.\n",
"\n",
"## Packaging\n",
"There are a few things going on here, our actual code lives in `MCFF/` which is wrapped up inside a `src` folder, the `src` thing is a convention related to pytests, check [Packaging for pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html) if you want the gory details.\n",
@@ -76,24 +76,16 @@
"`pyproject.toml` and `setup.cfg` are the current way to describe the metadata about a python package like how it should be installed and who the author is etc, but typically you just copy the standard layouts and build from there. The empty `__init__.py` file flags that this folder is a python module.\n",
"\n",
"pyproject.toml:\n",
"```\n",
"```toml\n",
"[build-system]\n",
"requires = [\"setuptools>=4.2\"]\n",
"build-backend = \"setuptools.build_meta\"\n",
"```\n",
"\n",
"requirements.txt\n",
"```\n",
"ipykernel\n",
"numpy\n",
"scipy\n",
"matplotlib\n",
"numba\n",
"```\n",
"`ipykernel` is there because it lets you run the envronment in a jupyter notebook easily. \n",
"\n",
"setup.cfg\n",
"```\n",
"```ini\n",
"[metadata]\n",
"name = MCFF\n",
"version = 0.0.1\n",
"author = Tom Hodson\n",
"author_email = tch14@ic.ac.uk\n",
@@ -112,26 +104,29 @@
"packages = find:\n",
"python_requires = >=3.6\n",
"install_requires =\n",
" numpy == 1.21 \n",
" scipy == 1.7\n",
" matplotlib == 3.5\n",
" numba == 0.55\n",
" ipykernel == 6.9 # Allows this conda environment to show up automatically in Jupyter Lab\n",
" watermark == 2.3 # Generates a summary of package version for use inside Jupyter Notebooks\n",
" numpy == 1.21\n",
" scipy == 1.7\n",
" matplotlib == 3.5\n",
" numba == 0.55\n",
"\n",
"[options.extras_require]\n",
"dev =\n",
" pytest == 7.1 # Testing\n",
" pytest-cov == 3.0 # For Coverage testing\n",
" hypothesis == 6.29 # Property based testing\n",
" pre-commit == 2.20\n",
" jupyterlab == 3.4.3\n",
" ipykernel == 6.9 # Allows this conda environment to show up automatically in Jupyter Lab\n",
" watermark == 2.3 # Generates a summary of package version for use inside Jupyter Notebooks\n",
"\n",
"docs =\n",
" sphinx == 5.0.0\n",
" myst-nb == 0.16.0\n",
"\n",
"[options.packages.find]\n",
"where = src\n",
"dev = \n",
" pytest == 7.1 # Testing\n",
" pytest-cov == 3.0 # For Coverage testing\n",
" hypothesis == 6.29 # Property based testing\n",
" pre-commit == 2.20\n",
" \n",
"docs = \n",
" sphinx == 5.0 # For building the documentation\n",
" myst-nb == 0.16 \n",
"```\n",
"Phew, that was a lot. Python packaging has been evolving a lot over the years and the consequence is there is a lot of out of date advice and there are many other ways to do this. You're best bet to figure out what the current best practice is is to consult offical sources like python.org"
"Phew, that was a lot. Python packaging has been evolving a lot over the years and the consequence is there is a lot of out of date advice and there are many other ways to do this. You're best bet to figure out what the current best practice is to consult official sources like python.org."
]
},
{
@@ -139,11 +134,11 @@
"id": "cef1ba97-db03-45ce-b428-a027133eabc9",
"metadata": {},
"source": [
"Once all that is setup, cd to the `code/` folder and install the module using:\n",
"Once all that is set up, from the top level of the project you can run:\n",
"```bash\n",
"pip install --editable \".[dev,docs]\"\n",
"```\n",
"The dot means we should install MCFF from the current directory and `--editable` means to do it as an editable package so that we can edit the files in MCFF and not have to reinstall. This is really useful for development. `[dev,docs]` means we also want to install the packages that are needed to do development of this repository and to build the documentation, boths those things will become relevant later!"
"The dot means we should install MCFF from the current directory and `--editable` means to do it as an editable package so that we can edit the files in MCFF and not have to reinstall. This is really useful for development. `[dev,docs]` means we also want to install the packages that are needed to do development of this repository and to build the documentation, both those things will become relevant later!"
]
},
{
@@ -30,7 +30,7 @@
"\n",
"np.random.seed(\n",
" 42\n",
") # This makes our random numbers reproducable when the notebook is rerun in order"
") # This makes our random numbers reproducible when the notebook is rerun in order"
]
},
{
@@ -48,14 +48,14 @@
"1. We've also got some nice tests running that give us some confidence the code is right.\n",
"\n",
"\n",
"There isn't that much more work to do Markov Chain Monte Carlo. I won't go into the details of how MCMC works but put very simply MCMC lets us calculate thermal averages of a physical system at some temperature. For example, the physical system might be \"[$10^{23}$][wa] H20 molecules in a box\" and the thermal average we want is \"Are they organised like a solid or a liquid?\". We can ask that question at different temperatures and we will get different answers.\n",
"There isn't that much more work to do Markov Chain Monte Carlo. I won't go into the details of how MCMC works but put very simply MCMC lets us calculate thermal averages of a physical system at some temperature. For example, the physical system might be \"[$10^{23}$][wa] H20 molecules in a box\" and the thermal average we want is \"Are they organised like a solid or a liquid?\". We can ask that question at different temperatures, and we will get different answers.\n",
"\n",
"\n",
"For our Ising model the equivalent question would be what's the average color of this system? At high temperatures we expect the pixels to be random and average out ot grey, while at low temperatures they will all be either black or while.\n",
"For our Ising model the equivalent question would be what's the average color of this system? At high temperatures we expect the pixels to be random and average out grey, while at low temperatures they will all be either black or white.\n",
"\n",
"What happens in between? This question is pretty hard to answer using maths, it can be done for the 2D Ising model but for anything more complicated it's pretty much impossible. This is where MCMC comes in.\n",
"\n",
"MCMC is a numerical method that lets us calculate these thermal averages. MCMC is essentially a description of how to probalistically step from one state of the system to another. \n",
"MCMC is a numerical method that lets us calculate these thermal averages. MCMC is essentially a description of how to probabilistically step from one state of the system to another. \n",
"\n",
"If we perform these steps many times we get a (Markov) chain of states. The great thing about this chain is that if we average a measurement over it, such as looking at the average proportion of white pixels, the answer we get will be close to the real answer for this system and will converge closer and closer to the true answer as we extend the chain. \n",
"\n",
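The stepping procedure described above can be sketched as a minimal Metropolis-style sampler. This is an illustrative reconstruction, not the notebook's implementation: the function name `mcmc_chain`, the single-spin-flip move, and the energy convention (each N/E/S/W bond counted from both sides) are my assumptions.

```python
import numpy as np

def mcmc_chain(initial_state, steps, T, rng):
    """Probabilistically step from state to state, recording the chain."""
    state = initial_state.copy()
    N, M = state.shape
    chain = []
    for _ in range(steps):
        # Propose flipping one randomly chosen spin
        i, j = rng.integers(N), rng.integers(M)
        # Energy change: each of the spin's bonds reverses sign (bonds are
        # counted from both sides, hence the factor of 4 per neighbour)
        dE = 0
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if 0 <= ni < N and 0 <= nj < M:
                dE += 4 * state[i, j] * state[ni, nj]
        # Accept moves that lower the energy, others with probability exp(-dE/T)
        if dE <= 0 or rng.random() < np.exp(-dE / T):
            state[i, j] *= -1
        chain.append(state.copy())
    return chain
```

Averaging a measurement (say the mean spin, i.e. the average colour) over the returned chain then approximates the thermal average at temperature `T`.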
@@ -139,9 +139,9 @@
"id": "5d1874d4-4585-49ed-bc6f-b11c22231669",
"metadata": {},
"source": [
"These images give a flavour of why physicists find this model useful, it gives window into how thermal noise and spontaneous order interact. At low temperatures the energy cost of being different from your neighbours is the most important thing, while at high temperatures, it doesn't matter and you really just do your own thing.\n",
"These images give a flavour of why physicists find this model useful, it gives a window into how thermal noise and spontaneous order interact. At low temperatures the energy cost of being different from your neighbours is the most important thing, while at high temperatures, it doesn't matter, and you really just do your own thing.\n",
"\n",
"There's a special point somewhere in the middle called the critical point $T_c$ where all sorts of cool things happen, but my favourite is that for large system sizes you get a kind of fractal behaviour which I will demonstrate more once we've sped this code up and can simulate larger systems in a reasonable time. You can kinda see it for 50x50 systesm at T = 5 but not really clearly."
"There's a special point somewhere in the middle called the critical point $T_c$ where all sorts of cool things happen, but my favourite is that for large system sizes you get a kind of fractal behaviour which I will demonstrate more once we've sped this code up and can simulate larger systems in a reasonable time. You can kinda see it for a 50x50 system at T = 5 but not really clearly."
]
},
{
@@ -160,7 +160,7 @@
 "I've already missed at least one devastating bug in this code, and there are almost certainly more! Before we start adding too much new code we should think about how to increase our confidence that the individual components are working correctly. It's very easy to build a huge project out of hundreds of functions, realise there's a bug and then struggle to find the source of that bug. If we test our components individually and thoroughly, we can avoid some of that pain.\n",
 "\n",
 "**Performance**\n",
-"Performance only matters in so far as it limits what we can do. And there is a real danger that trying to optimise for performance too early or in the wrong places will just lead to complexity that makes the code harder to read, harder to write and more likely to contain bugs. However I do want to show you the fractal states at the critical point, and I can't currently generate those images in a reasonable time, so some optimisation will happen!"
+"Performance only matters in so far as it limits what we can do. And there is a real danger that trying to optimise for performance too early or in the wrong places will just lead to complexity that makes the code harder to read, harder to write and more likely to contain bugs. However, I do want to show you the fractal states at the critical point, and I can't currently generate those images in a reasonable time, so some optimisation will happen!"
 ]
 },
 {
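The thermal averages referred to in this hunk can be computed exactly for very small systems, which is a useful cross-check for an MCMC implementation. Below is an illustrative sketch, not code from the repository: it assumes a 1D chain of spins with open ends, the energy convention $E = -\sum_i s_i s_{i+1}$, and $k_B = 1$.

```python
import itertools

import numpy as np


def energy(state):
    # Energy of a 1D chain with open ends: -sum over neighbouring pairs.
    state = np.asarray(state)
    return -np.sum(state[:-1] * state[1:])


def thermal_average(observable, n=4, T=2.0):
    # Brute force: sum over all 2**n states with Boltzmann weights exp(-E/T).
    states = list(itertools.product([-1, 1], repeat=n))
    weights = np.array([np.exp(-energy(s) / T) for s in states])
    values = np.array([observable(s) for s in states])
    return np.sum(weights * values) / np.sum(weights)


# Average magnetisation vanishes by up/down symmetry...
m = thermal_average(lambda s: np.mean(s))
# ...but the nearest-neighbour correlation stays positive.
c = thermal_average(lambda s: s[0] * s[1])
```

Because every state appears with its Boltzmann weight, up/down-flipped pairs of states cancel exactly in the magnetisation, while aligned neighbours are favoured at any finite temperature.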
File diff suppressed because one or more lines are too long
@@ -30,7 +30,7 @@
 "\n",
 "np.random.seed(\n",
 " 42\n",
-") # This makes our random numbers reproducable when the notebook is rerun in order"
+") # This makes our random numbers reproducible when the notebook is rerun in order"
 ]
 },
 {
@@ -40,7 +40,7 @@
 "source": [
 "# Adding Functionality\n",
 "\n",
-"The main thing we want to be able to do is to take measurements, the code as I have writting it doesn't really allow that because it only returns the final state in the chain. Let's say we have a measurement called `average_color(state)` that we want to average over the whole chain. We could just stick that inside our definition of `mcmc` but we know that we will likely make other measurements too and we don't want to keep writing new versions of our core functionality!\n",
+"The main thing we want to be able to do is to take measurements. The code, as I have written it, doesn't really allow that because it only returns the final state in the chain. Let's say we have a measurement called `average_color(state)` that we want to average over the whole chain. We could just stick that inside our definition of `mcmc`, but we know that we will likely make other measurements too, and we don't want to keep writing new versions of our core functionality!\n",
 "\n",
 "## Exercise 1\n",
 "Have a think about how you would implement this and what options you have."
@@ -52,11 +52,11 @@
 "metadata": {},
 "source": [
 "## Solution 1\n",
-"So I chatted with my mentors on this project on how to best do this and we came up with a few ideas:\n",
+"So I chatted with my mentors on this project on how to best do this, and we came up with a few ideas:\n",
 "\n",
 "### Option 1: Just save all the states and return them\n",
 "\n",
-"The problem with this is the states are very big and we don't want to waste all that memory. For an NxN state that uses 8 bit integers (the smallest we can use in numpy) 1000 samples would already use 2.5Gb of memory! We will see later that we'd really like to be able to go a bit bigger than 50x50 and 1000 samples!\n",
+"The problem with this is the states are very big, and we don't want to waste all that memory. For an `NxN` state that uses 8-bit integers (the smallest we can use in NumPy) `1000` samples would already use `2.5GB` (2.5 gigabytes) of memory! We will see later that we'd really like to be able to go a bit bigger than `50x50` and `1000` samples!\n",
 "\n",
 "### Option 2: Pass in a function to make measurements\n",
 "```python\n",
@@ -73,7 +73,7 @@
 " return measurements\n",
 "```\n",
 "\n",
-"This could work but it limits how we can store measurements and what shape and type they can be. What if we want to store our measurements in a numpy array? Or what if your measurement itself is a vector or and object that can't easily be stored in a numpy array? We would have to think carefully about what functionality we want."
+"This could work, but it limits how we can store measurements and what shape and type they can be. What if we want to store our measurements in a NumPy array? Or what if your measurement itself is a vector or an object that can't easily be stored in a NumPy array? We would have to think carefully about what functionality we want."
 ]
 },
 {
@@ -106,7 +106,7 @@
 "measurements = color_sampler.run(...)\n",
 "```\n",
 "\n",
-"This would definitely work but I personally am not a huge fan of object oriented programming so I'm gonna skip this option!"
+"This would definitely work, but I personally am not a huge fan of object-oriented programming, so I'm going to skip this option!"
 ]
 },
 {
@@ -153,7 +153,7 @@
 "id": "b74fadbe-80c2-4a20-b651-0e47188b005a",
 "metadata": {},
 "source": [
-"This requires only a very small change to our mcmc function and suddenly we can do whatever we like with the states! While we're at it I'm going to add an aditional argument `stepsize` that allows us to only sample the state every `stepsize` MCMC steps. You'll see why we would want to set this to value greater than 1 in a moment."
+"This requires only a very small change to our `mcmc` function, and suddenly we can do whatever we like with the states! While we're at it, I'm going to add an argument `stepsize` that allows us to only sample the state every `stepsize` MCMC steps. You'll see why we would want to set this to a value greater than 1 in a moment."
 ]
 },
 {
@@ -215,7 +215,7 @@
 "source": [
 "Fun fact: if you replace `yield current_state.copy()` with `yield current_state` your python kernel will crash when you run the code. I believe this is a bug in Numba that related to how pointers to numpy arrays work but let's not worry too much about it. \n",
 "\n",
-"We take a factor of two slowdown but that doesn't seem so much to pay for the fact we can now sample the state at every single step rather than just the last."
+"We take a factor of two slowdown, but that doesn't seem so much to pay for the fact we can now sample the state at every single step rather than just the last."
 ]
 },
 {
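The generator-based design discussed in the hunks above (yield a copy of the state every `stepsize` flip attempts, and let the caller decide what to measure) can be sketched roughly as follows. This is an illustrative reconstruction, not the repository's actual `mcmc` function: the signature, the periodic boundary conditions, and the Metropolis acceptance rule at temperature `T` are all assumptions made for the example.

```python
import numpy as np


def mcmc_generator(initial_state, steps, T=5.0, stepsize=1, rng=None):
    """Yield a copy of the state every `stepsize` Monte Carlo flip attempts."""
    rng = np.random.default_rng(0) if rng is None else rng
    state = initial_state.copy()
    N = state.shape[0]
    for _ in range(steps):
        for _ in range(stepsize):
            i, j = rng.integers(0, N, size=2)
            # The energy change from flipping one pixel depends only on its
            # four neighbours (periodic boundaries assumed here).
            neighbours = (
                state[(i + 1) % N, j] + state[(i - 1) % N, j]
                + state[i, (j + 1) % N] + state[i, (j - 1) % N]
            )
            dE = 2 * state[i, j] * neighbours
            # Metropolis acceptance rule
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                state[i, j] *= -1
        yield state.copy()  # yield a copy, so callers can safely keep old states


def average_color(state):
    return state.mean()


# Average a measurement over the chain without ever storing every state.
initial = np.ones((10, 10), dtype=np.int8)
colors = [average_color(s) for s in mcmc_generator(initial, steps=5, stepsize=100)]
```

The caller is free to store the measurements however it likes (a list, a NumPy array, a running mean), which is exactly the flexibility the passed-in-function and class-based options struggled with.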
@@ -30,7 +30,7 @@
 "\n",
 "np.random.seed(\n",
 " 42\n",
-") # This makes our random numbers reproducable but only when the notebook is run once in order"
+") # This makes our random numbers reproducible but only when the notebook is run once in order"
 ]
 },
 {
@@ -40,7 +40,7 @@
 "source": [
 "# Speeding It Up\n",
 "\n",
-"In order to show you a really big system will still need to make the code a bit faster. Right now we calculate the energy of each state, flip a pixel and then calculate the energy again. It turns out that you can actually directly calculate the energy change instead of doing this subtraction. Let's do this is a sort of test driven decelopment fashion: we want to write a function that when given a state and a pixel to flip, returns how much the energy goes up by (negative if down) upon performing the flip.\n",
+"In order to show you a really big system, we will still need to make the code a bit faster. Right now we calculate the energy of each state, flip a pixel, and then calculate the energy again. It turns out that you can actually directly calculate the energy change instead of doing this subtraction. Let's do this in a sort of test-driven development fashion: we want to write a function that when given a state and a pixel to flip, returns how much the energy goes up by (negative if down) upon performing the flip.\n",
 "\n",
 "I'll first write a slow version of this using the code we already have, and then use that to validate our faster version:"
 ]
@@ -69,7 +69,7 @@
 "id": "7b16f42a-0178-4753-9e9d-2f78aed40509",
 "metadata": {},
 "source": [
-"Now if you stare at the definition of energy long enough, you can convince yourself that the energy change when you flip one pixel only depends on the four surounding pixels in a simple way:"
+"Now if you stare at the definition of energy long enough, you can convince yourself that the energy change when you flip one pixel only depends on the four surrounding pixels in a simple way:"
 ]
 },
 {
@@ -160,7 +160,7 @@
 "id": "e6ecbc7c-530f-494b-aa31-0a118a104328",
 "metadata": {},
 "source": [
-"Ok great! And this function is much much faster because it only has to look at four pixels rather than all $N^2$ of them!"
+"Ok great! And this function is much, much faster because it only has to look at four pixels rather than all $N^2$ of them!"
 ]
 },
 {
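The test-driven idea in the hunks above, validating a fast local energy difference against the slow recompute-everything version, can be sketched like this. The energy convention ($E = -\sum$ of neighbouring products) and the open boundary conditions are assumptions made for the sketch; the repository's own definitions may differ.

```python
import numpy as np


def energy(state):
    # Total energy: minus the sum over nearest-neighbour pairs (open boundaries).
    return -np.sum(state[:-1, :] * state[1:, :]) - np.sum(state[:, :-1] * state[:, 1:])


def energy_difference_slow(state, i, j):
    # The obvious-but-slow way: flip the pixel, recompute everything, subtract.
    flipped = state.copy()
    flipped[i, j] *= -1
    return energy(flipped) - energy(state)


def energy_difference_fast(state, i, j):
    # Only the bonds touching (i, j) change, so dE = 2 * s_ij * (sum of neighbours).
    N, M = state.shape
    neighbours = 0
    if i > 0:
        neighbours += state[i - 1, j]
    if i < N - 1:
        neighbours += state[i + 1, j]
    if j > 0:
        neighbours += state[i, j - 1]
    if j < M - 1:
        neighbours += state[i, j + 1]
    return 2 * state[i, j] * neighbours


# Validate the fast version against the slow one on a random state.
rng = np.random.default_rng(42)
state = rng.choice(np.array([-1, 1]), size=(10, 10))
agree = all(
    energy_difference_fast(state, i, j) == energy_difference_slow(state, i, j)
    for i in range(10)
    for j in range(10)
)
```

The slow version only exists to check the fast one; once they agree on many random states, the fast version can replace the subtraction inside the sampler.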
@@ -30,7 +30,7 @@
 "\n",
 "np.random.seed(\n",
 " 42\n",
-") # This makes our random numbers reproducable when the notebook is rerun in order"
+") # This makes our random numbers reproducible when the notebook is rerun in order"
 ]
 },
 {
@@ -40,7 +40,7 @@
 "source": [
 "# Producing Research Outputs\n",
 "\n",
-"So now that we have the ability to simulate our system lets do a little exploration. First let's take three temperatures at each we'll do 10 runs and see how the systems evolve. I'll also tack on a little histogram at the right hand side of where the systens spent their time."
+"So now that we have the ability to simulate our system let's do a little exploration. First let's take three temperatures. For each we'll do `10` runs and see how the systems evolve. I'll also tack on a little histogram at the right-hand side showing where the systems spent their time."
 ]
 },
 {
@@ -63,7 +63,7 @@
 "steps = 200 # How many times to sample the state\n",
 "stepsize = N**2 # How many individual monte carlo flips to do in between each sample\n",
 "N_repeats = 10 # How many times to repeat each run at fixed temperature\n",
-"initial_state = np.ones(shape=(N, N)) # the intial state to use\n",
+"initial_state = np.ones(shape=(N, N)) # the initial state to use\n",
 "flips = (\n",
 " np.arange(steps) * stepsize\n",
 ") # Use this to plot the data in terms of individual flip attemps\n",
@@ -138,9 +138,9 @@
 "source": [
 "There are a few key takeaways about MCMC in this plot:\n",
 "\n",
-"- It takes a while for MCMC to 'settle in', you can see that for T = 10 the natural state is somewhere around c = 0, which takes about 2000 steps to reach from the initial state with c = 1. In general when doing MCMC we want to throw away some of the values at the beginging because they're too affected by the initial state.\n",
-"- At High and Low temperatures the we basically just get small fluctuations about an average value\n",
-"- At intermediate temperature the fluctuations occur on much longer time scales! Because the systems can only move a little bit each timestep, it means that the measurements we are making are *correlated* with themselves at previous times. The result of this is that if we use MCMC to draw N samples, we don't get as much information as if we had drawn samples from an uncorrelated variable (like a die roll for instance)."
+"- It takes a while for MCMC to 'settle in', you can see that for T = 10 the natural state is somewhere around c = 0, which takes about 2000 steps to reach from the initial state with c = 1. In general when doing MCMC we want to throw away some values at the beginning because they're affected too much by the initial state.\n",
+"- At High and Low temperatures we basically just get small fluctuations around an average value\n",
+"- At intermediate temperatures the fluctuations occur on much longer time scales! Because the systems can only move a little each timestep, it means that the measurements we are making are *correlated* with themselves at previous times. The result of this is that if we use MCMC to draw N samples, we don't get as much information as if we had drawn samples from an uncorrelated variable (like a die roll for instance)."
 ]
 },
 {
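The first takeaway above, throwing away early samples that still remember the initial state, is easy to demonstrate on a toy measurement series. The numbers below are invented purely for illustration:

```python
import numpy as np

# A made-up measurement series: the first 2000 samples drift from the
# initial value c = 1 down to the equilibrium value c = 0; the remaining
# samples sit at equilibrium (with no fluctuations, to keep it obvious).
measurements = np.concatenate([np.linspace(1.0, 0.0, 2000), np.zeros(8000)])

burn_in = 2000  # how many early samples to discard
biased = measurements.mean()              # polluted by the settling-in period
estimate = measurements[burn_in:].mean()  # burn-in discarded
```

Averaging the full series drags the estimate towards the initial state, while discarding the burn-in recovers the equilibrium value.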
@@ -15,15 +15,15 @@
 "metadata": {},
 "source": [
 "# Doing Reproducible Science\n",
-"Further Reading on this software reproducability: [The Turing Way: Guide to producing reproducable research](https://the-turing-way.netlify.app/reproducible-research/reproducible-research.html)\n",
+"Further reading on the reproducibility of software outputs: [The Turing Way: Guide to producing reproducible research](https://the-turing-way.netlify.app/reproducible-research/reproducible-research.html)\n",
 "\n",
-"In the last chapter we made a nice littel graph, let's imagine we wanted to include that in a paper and we want other researchers to be able to understand and reproduce how it was generated.\n",
+"In the last chapter we made a nice little graph, let's imagine we wanted to include that in a paper, and we want other researchers to be able to understand and reproduce how it was generated.\n",
 "\n",
-"There are many aspects to this but I'll list what I think is relevant here:\n",
+"There are many aspects to this, but I'll list what I think is relevant here:\n",
 "1. We have some code that generates the data and some code that uses it to plot the output, let's split that into two python files.\n",
-"2. Our code has external dependencies on numpy and matplotlib, in the future those packages could change their behaviour in a way that breaks the code, so lets record what version our code is compatible with.\n",
+"2. Our code has external dependencies on `numpy` and `matplotlib`. In the future those packages could change their behaviour in a way that breaks (our changes the output of) our code, so let's record what version our code is compatible with.\n",
 "3. We also have an internal dependency on other code in this MCFF repository, that could also change so let's record the git hash of the commit where the code works for posterity.\n",
-"4. The data generating process is random so we'll fix the seed as discussed in the testing section to make it reproducable.\n",
+"4. The data generation process is random, so we'll fix the seed as discussed in the testing section to make it reproducible.\n",
 "\n"
 ]
 },
@@ -36,27 +36,41 @@
 "\n",
 "There are many ways to specify the versions of your python packages but the two most common are with a `requirements.txt` or with an `environment.yml`.\n",
 "\n",
-"`requirements.txt` is quite simple and just lists packages and versions i.e `numpy==1.21`that can be installed with `pip install -r requirements.txt`, more details of the format [here][requirements]. The problem with requirements.txt is that it can only tell you about software installable with pip, which is often not enough.\n",
+"`requirements.txt` is quite simple and just lists packages and versions i.e. `numpy==1.21`that can be installed with `pip install -r requirements.txt`, more details of the format [here][requirements]. The problem with requirements.txt is that it can only tell you about software installable with pip, which is often not enough.\n",
 "\n",
 "Consequently, many people now use `environment.yml` files, which work with conda. There's a great intro to all this [here][conda-intro] which I will not reproduce here. The gist of it is that we end up with a file that looks like this:\n",
-"```\n",
+"```yaml\n",
 "#contents of environment.yml\n",
-"name: recode\n",
+"name: mcmc\n",
 "\n",
-"channels: # tells us what conda channels we need to look in\n",
+"channels:\n",
 " - defaults\n",
 " - conda-forge\n",
 "\n",
 "dependencies:\n",
 " - python=3.9\n",
-" - pytest=7.1\n",
-" - pytest-cov=3.0\n",
-" - ipykernel=6.9\n",
 "\n",
 " # Core packages\n",
 " - numpy=1.21\n",
 " - scipy=1.7\n",
 " - matplotlib=3.5\n",
 " - numba=0.55\n",
-" - pre-commit\n",
+" - ipykernel=6.9 # Allows this conda environment to show up automatically in Jupyter Lab\n",
+" - watermark=2.3 # Generates a summary of package version for use inside Jupyter Notebooks\n",
+"\n",
+" # Testing\n",
+" - pytest=7.1 # Testing\n",
+" - pytest-cov=3.0 # For Coverage testing\n",
+" - hypothesis=6.29 # Property based testing\n",
+"\n",
+" # Development\n",
+" - pre-commit=2.20 # For running black and other tools before commits\n",
+"\n",
+" # Documentation\n",
+" - sphinx=5.0 # For building the documentation\n",
+" - myst-nb=0.16 # Allows sphinx to include Jupyter Notebooks\n",
 "\n",
 " # Installing MCFF itself\n",
 " - pip=21.2\n",
 " - pip:\n",
 " - --editable . #install MCFF from the local repository using pip and do it in editable mode\n",
@@ -112,8 +126,9 @@
 "id": "07c8092c-dd23-470c-adc6-002ccd8e44d0",
 "metadata": {},
 "source": [
-"So I also output `conda env export`, this is annoying because it also gives you dependencies. While the versions of dependencies coudl potentially be important we usually draw the line at just listing the version of directly required packages. So what I usually do is to take the above output and then use the the output of `conda env export` to set the version numbers, leaving out the number because that indicates non-breaking changes\n",
-"```\n",
+"So I also output `conda env export`, this is annoying because it also gives you dependencies. While the versions of dependencies could potentially be important we usually draw the line at just listing the version of directly required packages. So what I usually do is to take the above output and then use the output of `conda env export` to set the version numbers, leaving out the number because that indicates non-breaking changes\n",
+"\n",
+"```yaml\n",
 "#output of conda env export\n",
 "name: recode\n",
 "channels:\n",
@@ -137,13 +152,13 @@
 "id": "fc858ba3-49db-47e3-89c8-a9945b61a8fb",
 "metadata": {},
 "source": [
-"## Spliting the code into files\n",
+"## Splitting the code into files\n",
 "\n",
 "To avoid you having to go away and find the files, I'll just put them here. Let's start with the file that generates the data. I'll give it what I hope is an informative name and a shebang so that we can run it with `./generate_montecarlo_walkers.py` (after doing `chmod +x generate_montecarlo_walkers.py` just once).\n",
 "\n",
-"I'll set the seed using a large pregenerated seed, you've likely seen me use 42 in some places but that's not really best practive because it might not be entropy to reliably seed the generator.\n",
+"I'll set the seed using a large pregenerated seed, you've likely seen me use `42` in some places, but that's not really best practice because it might not provide good enough entropy to reliably seed the generator.\n",
 "\n",
-"I've also added some code that get's the commit hash of MCFF and saves it into the data file along with the date. This helps us keep track of the generated data too."
+"I've also added some code that gets the commit hash of MCFF and saves it into the data file along with the date. This helps us keep track of the generated data too."
 ]
 },
 {
@@ -240,7 +255,7 @@
 "\n",
 "np.random.seed(\n",
 " seed\n",
-") # This makes our random numbers reproducable when the notebook is rerun in order\n",
+") # This makes our random numbers reproducible when the notebook is rerun in order\n",
 "\n",
 "### The measurement we will make ###\n",
 "def average_color(state):\n",
@@ -253,7 +268,7 @@
 "steps = 200 # How many times to sample the state\n",
 "stepsize = N**2 # How many individual monte carlo flips to do in between each sample\n",
 "N_repeats = 10 # How many times to repeat each run at fixed temperature\n",
-"initial_state = np.ones(shape=(N, N)) # the intial state to use\n",
+"initial_state = np.ones(shape=(N, N)) # the initial state to use\n",
 "flips = (\n",
 " np.arange(steps) * stepsize\n",
 ") # Use this to plot the data in terms of individual flip attemps\n",
@@ -372,11 +387,11 @@
 "source": [
 "## Citations and DOIs\n",
 "\n",
-"Now that we have a nicely reproducable plot, let's share it with the world. The easiest way is probably to put your code in a hosted git repository like Github or Gitlab. \n",
+"Now that we have a nicely reproducible plot, let's share it with the world. The easiest way is probably to put your code in a hosted git repository like GitHub or GitLab. \n",
 "\n",
-"Next, let's mint a shiny Digital Object Identifier (DOI) for the repository, using something like [Zenodo](https://zenodo.org/). These services archive a snapshot of the repository and assign a DOI to that snapshot, this is realy useful for citing a particular version of the software. \n",
+"Next, let's mint a shiny Digital Object Identifier (DOI) for the repository, using something like [Zenodo](https://zenodo.org/). These services archive a snapshot of the repository and assign a DOI to that snapshot, this is really useful for citing a particular version of the software, e.g. in a publication (and helping to ensure that published results are reproducible by others). \n",
 "\n",
-"Finally let's add a `citation.cff` file to the root of the repository, this makes it easier for people who might cite this software to generate a good citation for it. We can add the zenodo DOI to it too. You can read more about `citation.cff` files [here](https://citation-file-format.github.io/) and there is a convenient generator tool [here](https://citation-file-format.github.io/cff-initializer-javascript/)."
+"Finally, let's add a `CITATION.cff` file to the root of the repository, this makes it easier for people who want to cite this software to generate a good citation for it. We can add the Zenodo DOI to it too. You can read more about `CITATION.cff` files [here](https://citation-file-format.github.io/) and there is a convenient generator tool [here](https://citation-file-format.github.io/cff-initializer-javascript/)."
 ]
 },
 {
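Recording the commit hash and the date alongside generated data, as described above, might look something like the sketch below; the function name, metadata layout, and seed value are all hypothetical, not taken from the repository.

```python
import datetime
import subprocess


def code_commit_hash():
    # Ask git for the current commit; fall back gracefully when git or the
    # repository is unavailable (e.g. running from a downloaded archive).
    try:
        return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


seed = 0x9E3779B97F4A7C15  # a hypothetical large pregenerated seed, not 42!

# Provenance to save into the data file next to the results themselves.
metadata = {
    "commit": code_commit_hash(),
    "date": datetime.datetime.now().isoformat(),
    "seed": seed,
}
```

With the commit hash, date, and seed stored next to the data, anyone (including future you) can check out the exact code that produced a given file and rerun it.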
@@ -30,7 +30,7 @@
 "source": [
 "We'll use sphinx along with a couple plugins: [autodoc][autodoc] allows us to generate documentation automatically from the docstrings in our source code, while [napoleon][napoleon] allows us to use [NUMPYDOC][numpydoc] and Google formats for the docstrings in addition to [reStructuredText][rst]\n",
 "\n",
-"What this means is that we'll be able to write documentation directly into the source code and it will get rendered into a nice website. This helps keep the documentation up to date beause it's right there next to the code!\n",
+"What this means is that we'll be able to write documentation directly into the source code and it will get rendered into a nice website. This helps keep the documentation up to date because it's right there next to the code and the web-based documentation will get automatically re-generated every time the documentation files are updated!\n",
 "\n",
 "[autodoc]: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html\n",
 "[napoleon]: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html\n",
@@ -95,8 +95,10 @@
 "We add the extensions by adding this to `conf.py` too:\n",
 "```python\n",
 "extensions = [\n",
-" 'sphinx.ext.autodoc',\n",
-" 'sphinx.ext.napoleon',\n",
+" 'sphinx.ext.autodoc', # for generating documentation from the docstrings in our code\n",
+" 'sphinx.ext.napoleon', # for parsing Numpy and Google stye docstrings\n",
+" 'sphinx.ext.mathjax', # for equation rendering\n",
+"\n",
 "]\n",
 "```"
 ]
@@ -131,7 +133,7 @@
 " return True\n",
 "```\n",
 "\n",
-"I normally just copy paste this and go from there but there's a full description [here](https://numpydoc.readthedocs.io/en/latest/format.html). You can also check out the docstrings in MCFF"
+"I normally just copy and paste this and go from there, but there's a full description [here](https://numpydoc.readthedocs.io/en/latest/format.html). You can also check out the docstrings in MCFF."
 ]
 },
 {
@@ -142,7 +144,7 @@
 "## Making the function declarations a bit nicer\n",
 "Longer function names in the generated documentation currently generate with no line break, I found a fix for that buried [inside a bug report on sphinx](https://github.com/sphinx-doc/sphinx/issues/1514#issuecomment-742703082) \n",
 "\n",
-"It involves adding some custom css and an extra line to `conf.py`:\n",
+"It involves adding some custom CSS and an extra line to `conf.py`:\n",
 "```python\n",
 "html_css_files = [\n",
 " 'custom.css',\n",
@@ -155,7 +157,7 @@
 "id": "c771a57d-c802-429a-b051-1bd0364b9317",
 "metadata": {},
 "source": [
-"Finally we add a `readthedocs.yaml` file (which you can copy from the root of this repo) to tell readthedocs how to build our documentation. https://docs.readthedocs.io/en/stable/config-file/v2.html#packages"
+"Finally, we add a `readthedocs.yaml` file (which you can copy from the root of this repo) to tell readthedocs how to build our documentation. https://docs.readthedocs.io/en/stable/config-file/v2.html#packages"
 ]
 },
 {
@@ -173,7 +175,7 @@
 "source": [
 "### Documentation Ideas\n",
 "\n",
-"Readthedocs can be a bit tricky to setup, it is also possible to use Github pages to acomplish something similar. Another idea is to include some simple copyable code snippets in a quickstart guide. This lets people get up and running your code more quickly than is they need to read the API docs to understand how to interact with your module."
+"Readthedocs can be a bit tricky to set up, it is also possible to use [GitHub Pages](https://pages.github.com/) to accomplish something similar. Another idea is to include some simple copyable code snippets in a quickstart guide. This lets people get up and running with your code more quickly than if they need to read the API documentation to understand how to interact with your module."
 ]
 },
 {
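A minimal example of the numpydoc docstring format mentioned above, written for a hypothetical helper rather than one of MCFF's actual functions:

```python
import numpy as np


def flipped(state, i, j):
    """Return a copy of `state` with the pixel at (i, j) flipped.

    Parameters
    ----------
    state : numpy.ndarray
        The current spin configuration, with entries +1 or -1.
    i, j : int
        Row and column index of the pixel to flip.

    Returns
    -------
    numpy.ndarray
        A new array with the chosen pixel negated; the input is unchanged.
    """
    new = state.copy()
    new[i, j] *= -1
    return new
```

Napoleon parses the `Parameters` and `Returns` sections into the same reStructuredText that autodoc renders, so docstrings written in this style appear nicely formatted on the generated site.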
@@ -1,4 +1,4 @@
-name: recode
+name: mcmc
 
 channels:
 - defaults