Apply suggestions from code review

Co-authored-by: Jeremy Cohen <jcohen02@users.noreply.github.com>
gnikit 2022-07-20 00:33:32 +01:00
parent 27453fcf95
commit f5e7e816dd
No known key found for this signature in database
GPG Key ID: E9A03930196133F0
10 changed files with 45 additions and 55 deletions

View File

@ -67,7 +67,7 @@ Take a look at the table of contents below and see if there are any topics that
When you're ready to dive in you have 4 options:
### 1. Launch them in Binder
### 1. Launch the notebooks in Binder
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/ImperialCollegeLondon/ReCoDE_MCMCFF/HEAD?urlpath=lab%2Ftree%2Fdocs%2Flearning%2F01%20Introduction.ipynb)
@ -82,7 +82,7 @@ pip install .[dev]
jupyter lab
```
_NOTE: Better performance but requires you have python and Jupyter installed_.
_NOTE: Better performance but requires you have Python and Jupyter installed_.
### 3. View the Jupyter notebooks non-interactively via the online documentation

View File

@ -25,7 +25,7 @@
"\n",
"or if for some reason you can't do that (because you are on a phone or tablet for instance) you can instead open this notebook in [binder](https://mybinder.org/v2/gh/TomHodson/ReCoDE_MCMCFF/HEAD)\n",
"\n",
"I would also suggest you set up a Python environment just for this. You can use your preferred method to do this, but I will recommend `Anaconda` because it's both what I currently use and what is recommended by Imperial.\n",
"I would also suggest you set up a Python environment just for this project. You can use your preferred method to do this, but I will recommend `Anaconda` because it's both what I currently use and what is recommended by Imperial.\n",
"\n",
"```bash\n",
"#make a new conda environment from the specification in environment.yml\n",
@ -88,7 +88,7 @@
"id": "e52245f1-8ecc-45f1-8d52-337916b0ce7c",
"metadata": {},
"source": [
"We're going to be working with arrays of numbers, so it will make sense to work with `Numpy`, and we'll also want to plot things, the standard choice for this is `matplotlib`, though there are other options, `pandas` and `plotly` being notable ones.\n",
"We're going to be working with arrays of numbers, so it will make sense to work with `NumPy`, and we'll also want to plot things, the standard choice for this is `Matplotlib`, though there are other options, `pandas` and `Plotly` being notable ones.\n",
"\n",
"Let me quickly plot something to aid the imagination:"
]
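For a concrete picture of the kind of quick plot meant here, a minimal sketch assuming only NumPy and Matplotlib are installed; the grid size, seed, and colour map are arbitrary illustrative choices, not the notebook's actual figure:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # arbitrary seed, purely illustrative

# A random 50x50 grid of +1/-1 "spins", drawn as black and white pixels
state = rng.choice(np.array([1, -1], dtype=np.int8), size=(50, 50))

plt.imshow(state, cmap="gray")
plt.title("A random Ising state")
plt.show()
```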
@ -409,7 +409,7 @@
"In scientific python like this there are usually two main options for reducing the overhead:\n",
"\n",
"#### Using Arrays\n",
"One way is we work with arrays of numbers and operations defined over those arrays such as `sum`, `product` etc. `Numpy` is the canonical example of this in Python, but many machine learning libraries are essentially doing a similar thing. We rely on the library to implement the operations efficiently and try to chain those operations together to achieve what we want. This imposes some limitations on the way we can write our code.\n",
"One way is we work with arrays of numbers and operations defined over those arrays such as `sum`, `product` etc. `NumPy` is the canonical example of this in Python, but many machine learning libraries are essentially doing a similar thing. We rely on the library to implement the operations efficiently and try to chain those operations together to achieve what we want. This imposes some limitations on the way we can write our code.\n",
"\n",
"#### Using Compilation\n",
"The alternative is that we convert our Python code into a more efficient form that incurs less overhead. This requires a compilation or transpilation step and imposes a different set of constraints on the code.\n",
@ -620,7 +620,7 @@
"## Conclusion\n",
"So far we've discussed the problem we want to solve, written a little code, tested it a bit and made some speed improvements.\n",
"\n",
"In the next notebook we will package the code up into a little python package, this has two big benefits to use: \n",
"In the next notebook we will package the code up into a little python package, this has two big benefits when using the code: \n",
"1. I won't have to redefine the energy function we just wrote in the next notebook \n",
"1. It will help with testing and documenting our code later"
]

View File

@ -30,7 +30,7 @@
"- [Packaging for pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html)\n",
"\n",
"\n",
"Before we can do any testing, it is best practice to structure and then package your code up as a python project up. You don't have to do it like this, but it carries with it the benefit that many other tutorials _expect_ you to do it like this, and generally you want to reduce friction for yourself later. \n",
"Before we can do any testing, it is best practice to structure and then package your code up as a python project. You don't have to do it like this, but it carries with it the benefit that many other tutorials _expect_ you to do it like this, and generally you want to reduce friction for yourself later. \n",
"\n",
"Like all things programming, there are many opinions about how python projects should be structured, as I write this the structure of this repository is this: (This is the lightly edited output of the `tree` command if you're interested) \n",
"```bash\n",
@ -53,15 +53,15 @@
"└── tests # automated tests for the code\n",
"```\n",
"\n",
"It's looks pretty intimidating! But let's quickly go through it, at the top level of most projects you'll find on GitHub, and elsewhere you'll find files to do with the project as a whole:\n",
"It looks pretty intimidating! But let's quickly go through it: at the top level of most projects you'll find on GitHub (and elsewhere) there are a group of files that describe the project as a whole or provide key project information - not all projects will have all of these files and, indeed, there a variety of other files that you may also see so this is an example of some of the more important files:\n",
"- `README.md` - An intro to the project\n",
"- `LICENSE` - The software license that governs this project, there are a few standard ones people use.\n",
"- `environment.yml` (or both) this list what python packages the project needs in a standard format\n",
"- `environment.yml` (or alternatives) - this lists what Python packages the project needs in a standard format (other languages have equivalents).\n",
"- `CITATION.cff` This is the new standard way to describe how a work should be cited, v useful for academic software.\n",
"\n",
"Then below that you will usually have directories breaking the project up into main categories, here I have `src/` and `docs/learning/`.\n",
"\n",
"Inside `src/` we have a standard python package directory structure.\n",
"Inside `src/` we have a standard Python package directory structure.\n",
"\n",
"## Packaging\n",
"There are a few things going on here, our actual code lives in `MCFF/` which is wrapped up inside a `src` folder, the `src` thing is a convention related to pytests, check [Packaging for pytest](https://docs.pytest.org/en/6.2.x/goodpractices.html) if you want the gory details.\n",
@ -190,9 +190,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 ('venv': venv)",
"display_name": "Python [conda env:recode]",
"language": "python",
"name": "python3"
"name": "conda-env-recode-py"
},
"language_info": {
"codemirror_mode": {
@ -204,12 +204,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "f5403acae4671aac0ae5a29dd5903d33d0105a9e9d4148f755d3321f5023d387"
}
"version": "3.9.12"
}
},
"nbformat": 4,

View File

@ -51,7 +51,7 @@
"There isn't that much more work to do Markov Chain Monte Carlo. I won't go into the details of how MCMC works but put very simply MCMC lets us calculate thermal averages of a physical system at some temperature. For example, the physical system might be \"[$10^{23}$][wa] H20 molecules in a box\" and the thermal average we want is \"Are they organised like a solid or a liquid?\". We can ask that question at different temperatures, and we will get different answers.\n",
"\n",
"\n",
"For our Ising model the equivalent question would be what's the average color of this system? At high temperatures we expect the pixels to be random and average out out grey, while at low temperatures they will all be either black or while.\n",
"For our Ising model the equivalent question would be what's the average color of this system? At high temperatures we expect the pixels to be random and average out grey, while at low temperatures they will all be either black or white.\n",
"\n",
"What happens in between? This question is pretty hard to answer using maths, it can be done for the 2D Ising model but for anything more complicated it's pretty much impossible. This is where MCMC comes in.\n",
"\n",
@ -139,9 +139,9 @@
"id": "5d1874d4-4585-49ed-bc6f-b11c22231669",
"metadata": {},
"source": [
"These images give a flavour of why physicists find this model useful, it gives window into how thermal noise and spontaneous order interact. At low temperatures the energy cost of being different from your neighbours is the most important thing, while at high temperatures, it doesn't matter, and you really just do your own thing.\n",
"These images give a flavour of why physicists find this model useful, it gives a window into how thermal noise and spontaneous order interact. At low temperatures the energy cost of being different from your neighbours is the most important thing, while at high temperatures, it doesn't matter, and you really just do your own thing.\n",
"\n",
"There's a special point somewhere in the middle called the critical point $T_c$ where all sorts of cool things happen, but my favourite is that for large system sizes you get a kind of fractal behaviour which I will demonstrate more once we've sped this code up and can simulate larger systems in a reasonable time. You can kinda see it for 50x50 system at T = 5 but not really clearly."
"There's a special point somewhere in the middle called the critical point $T_c$ where all sorts of cool things happen, but my favourite is that for large system sizes you get a kind of fractal behaviour which I will demonstrate more once we've sped this code up and can simulate larger systems in a reasonable time. You can kinda see it for a 50x50 system at T = 5 but not really clearly."
]
},
{
@ -206,9 +206,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.10 ('venv': venv)",
"display_name": "Python [conda env:recode]",
"language": "python",
"name": "python3"
"name": "conda-env-recode-py"
},
"language_info": {
"codemirror_mode": {
@ -220,12 +220,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
},
"vscode": {
"interpreter": {
"hash": "f5403acae4671aac0ae5a29dd5903d33d0105a9e9d4148f755d3321f5023d387"
}
"version": "3.9.12"
}
},
"nbformat": 4,

View File

@ -25,7 +25,7 @@
"\n",
"Ok we can finally start writing and running some tests!\n",
"\n",
"I copied some of the initial tests that we did in chapter 1 into `test_energy.py` installed pytest into my development environment with `pip install pytest`. If you're using conda you need to use `conda install pytest`, and now I can run the `pytest` command in the `mcmc` directory. Pytest will automatically discover our tests and run them, to do this it relies on their being python files with functions named `test_\\*` which it will run.\n",
"I copied some of the initial tests that we did in chapter 1 into `test_energy.py` and installed pytest into my development environment with `pip install pytest`. If you're using conda you need to use `conda install pytest`. I can now run the `pytest` command in the `mcmc` directory. Pytest will automatically discover our tests and run them. To do this it relies on there being Python files with functions named `test_\\*` which it will run. It's also a widely used convention to begin the name of Python files containing tests with `test_`\n",
"\n",
"If that doesn't work and complains it can't find MCFF, try `python -m pytest`, this asks python to find a module and run it, which can be useful to ensure you're running pytest inside the correct environment. I ran into this problem because I used `pip install pytest` into a conda environment when I should have done `conda install pytest`.\n",
"\n",
@ -92,7 +92,7 @@
" assert energy(state) == E_prediction_all_the_same(L)\n",
"```\n",
"\n",
"I will defer to external resources for a full discussion of the philosophy of testing, but I generally think of tests as an aid to my future debugging. If I make a change that breaks something then I want my tests to catch that and to make it clear what has broken. As such I generally put tests that check very basic properties of my code early on in the file and then follow them with tests that probe more subtle things or more obscure edges cases.\n",
"I will defer to external resources for a full discussion of the philosophy of testing, but I generally think of tests as an aid to my future debugging. If I make a change that breaks something then I want my tests to catch that and to make it clear what has broken. As such I generally put tests that check very basic properties of my code early on in the file and then follow them with tests that probe more subtle things or more obscure edge cases.\n",
"\n",
"`test_exact_energies` checks that the energies of our exact states come out as we calculated they should in chapter 1. This is testing a very limited space of the possible inputs to `energy` so we'd like to find some way to be more confident that our implementation is correct.\n",
"\n",
@ -116,7 +116,7 @@
"source": [
"## Coverage Testing\n",
"\n",
"A useful little trick for testing, are tools like pytest-cov that can measure *coverage*, that is, the amount of your code base that is activated by your tests. Unfortunately Numba does not play super well with pytest-cov, so we have to turn off numba to generate the test report using an environment variable.\n",
"A useful aspect of testing is *coverage*. This involves looking at how much of your code is actually "covered" by the tests you've written. That is, which individual lines of your code are actually being run by your tests. Tools like `pytest-cov` can measure _coverage_. Unfortunately Numba does not play super well with `pytest-cov`, so we have to turn off Numba using an environment variable so that we can run `pytest-cov` and generate the "test report".\n",
"\n",
"```bash\n",
"(recode) tom@TomsLaptop ReCoDE_MCMCFF % pip install pytest-cov # install the coverage testing plugin\n",
@ -144,7 +144,7 @@
"=================================================== 3 passed in 1.89s ===================================================\n",
"```\n",
"\n",
"Ok so this is telling us that we currently test 86% of the lines in ising_model.py. We can also change `--cov-report=html` to get a really nice `html` output which shows which parts of your code aren't being run.\n",
"Ok so this is telling us that we currently test 86% of the lines in ising_model.py. We can also change `--cov-report=html` to get a really nice `html` output which shows which parts of our code aren't being run.\n",
"\n",
"A warning though, testing 100% of your lines of code doesn't mean it's correct, you need to think carefully about the data you test on, try to pick the hardest examples you can think of! What edge cases might there be that would break your code? Zero, empty strings and empty arrays are classic examples."
]
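For instance, edge-case tests along those lines might look like this sketch; whether `energy` should return `0` for a single pixel or an empty grid (rather than raise) is a design decision, so the expected values here are assumptions:

```python
import numpy as np
from MCFF.ising_model import energy  # assumed import path

def test_single_pixel():
    # A 1x1 state has no neighbouring pairs, so assume zero interaction energy
    assert energy(np.array([[1]], dtype=np.int8)) == 0

def test_empty_state():
    # Assumed behaviour: an empty grid also has zero energy
    assert energy(np.zeros((0, 0), dtype=np.int8)) == 0
```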
@ -156,7 +156,7 @@
"source": [
"## Advanced Testing Methods: Property Based Testing\n",
"\n",
"I won't do into huge detail here, but I thought it would be nice to make you aware of a nice library called `Hypothesis` that helps with this problem of finding edge cases. `Hypothesis` gives you tools to generate randomised inputs to functions, so as long as you can come up with some way to verify the output is correct or has the correct _properties_ (or just that the code doesn't throw and error!) then this can be a powerful method of testing. \n",
"I won't go into huge detail here, but I thought it would be nice to make you aware of a nice library called [`Hypothesis`](https://hypothesis.readthedocs.io) that helps with this problem of finding edge cases. `Hypothesis` gives you tools to generate randomised inputs to functions, so as long as you can come up with some way to verify the output is correct or has the correct _properties_ (or just that the code doesn't throw and error!) then this can be a powerful method of testing. \n",
"\n",
"\n",
"Take a look in `test_energy_using_hypothesis.py`\n",
@ -169,7 +169,7 @@
"def test_generated_states(state):\n",
" assert np.allclose(energy(state), energy_numpy(state))\n",
"```\n",
"You tell Hypothesis how to generate the test data, in this case we use some numpy specific code to generate 2 dimensional arrays with `dtype = int` and entries randomly sampled from `[1, -1]`. We use the same trick as before of checking two implementations against one another."
"You tell Hypothesis how to generate the test data. In this case we use some NumPy specific code to generate 2 dimensional arrays with `dtype = int` and entries randomly sampled from `[1, -1]`. We use the same trick as before of checking two implementations against one another."
]
},
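The decorator that generates those arrays might look roughly like this; the strategy details and import path are assumptions, so the real `test_energy_using_hypothesis.py` may differ:

```python
import numpy as np
from hypothesis import given
from hypothesis.extra.numpy import arrays, array_shapes
from hypothesis.strategies import sampled_from

from MCFF.ising_model import energy, energy_numpy  # assumed import path

# 2D integer arrays with entries drawn from {1, -1} and sides up to 20
states = arrays(
    dtype=int,
    shape=array_shapes(min_dims=2, max_dims=2, min_side=1, max_side=20),
    elements=sampled_from([1, -1]),
)

@given(states)
def test_generated_states(state):
    assert np.allclose(energy(state), energy_numpy(state))
```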
{
@ -181,7 +181,7 @@
"source": [
"## Testing Stochastic Code\n",
"\n",
"We have an interesting problem here, most testing assumes that for the same inputs we will always get the same outputs but our MCMC sampler is a stochastic algorithm. So how can we test it? I can see three mains routes we can take:\n",
"We have an interesting problem here, most testing assumes that for the same inputs we will always get the same outputs but our MCMC sampler is a stochastic algorithm. So how can we test it? I can see three main routes we can take:\n",
"\n",
"- Fix the seed of the random number generator to make it deterministic\n",
"- Do statistical tests on the output \n",
@ -383,7 +383,7 @@
"source": [
"## Test Driven Development\n",
"\n",
"I won't talk about TDD much here, but it's likely a term you will hear at some point. It essentially refers to the practice of writing tests as part of your process of writing code. Rather than writing all your code and then writing tests for them. You could instead write some or all of your tests upfront and then write code that passes them. \n",
"I won't talk much about Test Driven Development, or TDD, here, but it's likely a term you will hear at some point. It essentially refers to the practice of writing tests as part of your process of writing code. Rather than writing all your code and then writing tests for them. You could instead write some or all of your tests upfront, describing the expected bahviour of code that doesn't yet exist, and then write the necessary code so that your tests pass. \n",
"\n",
"This can be an incredibly productive way to work, it forces you think about the structure and interface of your software before you start writing it. It also gives you nice incremental goals that you can tick off once each test starts to pass, gamification maybe?"
]
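As a toy example of that workflow, using the `average_color(state)` measurement that appears in the next chapter: write the test first, watch it fail, then write the minimal code that makes it pass (the implementation here is my sketch, not the notebook's):

```python
import numpy as np

# Step 1: write the test before the function exists, and watch it fail
def test_average_color():
    assert average_color(np.ones((5, 5))) == 1.0
    assert average_color(-np.ones((5, 5))) == -1.0
    assert average_color(np.array([[1, -1], [-1, 1]])) == 0.0

# Step 2: the minimal implementation that makes the test pass
def average_color(state):
    return np.mean(state)
```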
@ -417,7 +417,7 @@
" - id: black\n",
" - id: black-jupyter\n",
"```\n",
"And finally `pre-commit install` will make this run every time you commit to git. It's worth running it manually once the first time to check it works: `pre-commit run --all-files`. Running this I immediately got a cryptic error that, on googling, turned out to be that something broke in version 21.12b0 of `21.12b0`. Running `precommit autoupdate` fixed this for me by updated `black` to a later version. Running `pre-commit run --all-files` a second time now gives me:\n",
"And finally `pre-commit install` will make this run every time you commit to git. It's worth running it manually once the first time to check it works: `pre-commit run --all-files`. Running this I immediately got a cryptic error that, on googling, turned out to be that something broke in version 21.12b0 of `black`. Running `precommit autoupdate` fixed this for me by updating `black` to a later version. Running `pre-commit run --all-files` a second time now gives me:\n",
"```bash\n",
"(recode) tom@TomsLaptop ReCoDE_MCMCFF % pre-commit run --all-files\n",
"trim trailing whitespace.................................................Passed\n",

View File

@ -40,7 +40,7 @@
"source": [
"# Adding Functionality\n",
"\n",
"The main thing we want to be able to do is to take measurements, the code as I have writing it doesn't really allow that because it only returns the final state in the chain. Let's say we have a measurement called `average_color(state)` that we want to average over the whole chain. We could just stick that inside our definition of `mcmc`, but we know that we will likely make other measurements too, and we don't want to keep writing new versions of our core functionality!\n",
"The main thing we want to be able to do is to take measurements. The code, as I have written it, doesn't really allow that because it only returns the final state in the chain. Let's say we have a measurement called `average_color(state)` that we want to average over the whole chain. We could just stick that inside our definition of `mcmc`, but we know that we will likely make other measurements too, and we don't want to keep writing new versions of our core functionality!\n",
"\n",
"## Exercise 1\n",
"Have a think about how you would implement this and what options you have."
@ -56,7 +56,7 @@
"\n",
"### Option 1: Just save all the states and return them\n",
"\n",
"The problem with this is the states are very big, and we don't want to waste all that memory. For an `NxN` state that uses 8-bit integers (the smallest we can use in numpy) `1000` samples would already use `2.5Gb` of memory! We will see later that we'd really like to be able to go a bit bigger than `50x50` and `1000` samples!\n",
"The problem with this is the states are very big, and we don't want to waste all that memory. For an `NxN` state that uses 8-bit integers (the smallest we can use in NumPy) `1000` samples would already use `2.5GB` (2.5 gigabytes) of memory! We will see later that we'd really like to be able to go a bit bigger than `50x50` and `1000` samples!\n",
"\n",
"### Option 2: Pass in a function to make measurements\n",
"```python\n",
@ -73,7 +73,7 @@
" return measurements\n",
"```\n",
"\n",
"This could work, but it limits how we can store measurements and what shape and type they can be. What if we want to store our measurements in a numpy array? Or what if your measurement itself is a vector or and object that can't easily be stored in a numpy array? We would have to think carefully about what functionality we want."
"This could work, but it limits how we can store measurements and what shape and type they can be. What if we want to store our measurements in a NumPy array? Or what if your measurement itself is a vector or an object that can't easily be stored in a NumPy array? We would have to think carefully about what functionality we want."
]
},
{
@ -153,7 +153,7 @@
"id": "b74fadbe-80c2-4a20-b651-0e47188b005a",
"metadata": {},
"source": [
"This requires only a very small change to our `mcmc` function, and suddenly we can do whatever we like with the states! While we're at it, I'm going to add an argument `stepsize` that allows us to only sample the state every `stepsize` MCMC steps. You'll see why we would want to set this to value greater than 1 in a moment."
"This requires only a very small change to our `mcmc` function, and suddenly we can do whatever we like with the states! While we're at it, I'm going to add an argument `stepsize` that allows us to only sample the state every `stepsize` MCMC steps. You'll see why we would want to set this to a value greater than 1 in a moment."
]
},
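A minimal sketch of that change, with the `stepsize` argument included; `energy_difference` stands in for whatever update rule the real sampler uses and is an assumption here:

```python
import numpy as np

def mcmc_generator(initial_state, steps, T, stepsize=1, rng=None):
    """Yield a snapshot of the state every `stepsize` MCMC steps."""
    rng = rng if rng is not None else np.random.default_rng()
    state = initial_state.copy()
    N, M = state.shape
    for _ in range(steps):
        for _ in range(stepsize):
            i, j = rng.integers(N), rng.integers(M)
            dE = energy_difference(state, i, j)  # assumed helper, see the next chapter
            if dE <= 0 or rng.random() < np.exp(-dE / T):
                state[i, j] *= -1
        yield state.copy()  # copy, so the caller can safely keep the snapshot

# Measurements now live entirely outside the sampler, e.g.:
# colors = [average_color(s) for s in mcmc_generator(initial, steps=100, T=5, stepsize=10)]
```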
{

View File

@ -40,7 +40,7 @@
"source": [
"# Speeding It Up\n",
"\n",
"In order to show you a really big system will still need to make the code a bit faster. Right now we calculate the energy of each state, flip a pixel and then calculate the energy again. It turns out that you can actually directly calculate the energy change instead of doing this subtraction. Let's do this is a sort of test driven development fashion: we want to write a function that when given a state and a pixel to flip, returns how much the energy goes up by (negative if down) upon performing the flip.\n",
"In order to show you a really big system, we will still need to make the code a bit faster. Right now we calculate the energy of each state, flip a pixel, and then calculate the energy again. It turns out that you can actually directly calculate the energy change instead of doing this subtraction. Let's do this in a sort of test-driven development fashion: we want to write a function that when given a state and a pixel to flip, returns how much the energy goes up by (negative if down) upon performing the flip.\n",
"\n",
"I'll first write a slow version of this using the code we already have, and then use that to validate our faster version:"
]
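The slow reference version might be as simple as this sketch, reusing the `energy` function from earlier chapters (the argument convention is an assumption):

```python
from MCFF.ising_model import energy  # assumed import path

def energy_difference_slow(state, i, j):
    """Energy change from flipping pixel (i, j): E(after) - E(before)."""
    E_before = energy(state)
    state[i, j] *= -1        # flip
    E_after = energy(state)
    state[i, j] *= -1        # flip back, leaving the state as we found it
    return E_after - E_before
```

The fast version can then be checked against this one over many random states and pixels.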

View File

@ -40,7 +40,7 @@
"source": [
"# Producing Research Outputs\n",
"\n",
"So now that we have the ability to simulate our system let's do a little exploration. First let's take three temperatures at each we'll do `10` runs and see how the systems evolve. I'll also tack on a little histogram at the right-hand side of where the systems spent their time."
"So now that we have the ability to simulate our system let's do a little exploration. First let's take three temperatures. For each we'll do `10` runs and see how the systems evolve. I'll also tack on a little histogram at the right-hand side showing where the systems spent their time."
]
},
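The layout being described might be set up like this sketch; the temperatures, panel proportions, and the `run_mcmc_and_measure` helper are all assumptions standing in for the notebook's actual code:

```python
import numpy as np
import matplotlib.pyplot as plt

temperatures = [1, 4, 10]  # illustrative low / intermediate / high values
fig, axes = plt.subplots(
    len(temperatures), 2, figsize=(10, 6), sharey=True,
    gridspec_kw={"width_ratios": [4, 1]},  # wide time trace, narrow histogram
)
for (trace_ax, hist_ax), T in zip(axes, temperatures):
    for _ in range(10):  # 10 independent runs at each temperature
        colors = run_mcmc_and_measure(T)  # assumed helper: average_color per step
        trace_ax.plot(colors, alpha=0.5)
        hist_ax.hist(colors, orientation="horizontal", alpha=0.3)
    trace_ax.set_ylabel(f"T = {T}")
plt.show()
```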
{
@ -138,9 +138,9 @@
"source": [
"There are a few key takeaways about MCMC in this plot:\n",
"\n",
"- It takes a while for MCMC to 'settle in', you can see that for T = 10 the natural state is somewhere around c = 0, which takes about 2000 steps to reach from the initial state with c = 1. In general when doing MCMC we want to throw away some values at the beginning because they're too affected by the initial state.\n",
"- At High and Low temperatures we basically just get small fluctuations about an average value\n",
"- At intermediate temperature the fluctuations occur on much longer time scales! Because the systems can only move a little each timestep, it means that the measurements we are making are *correlated* with themselves at previous times. The result of this is that if we use MCMC to draw N samples, we don't get as much information as if we had drawn samples from an uncorrelated variable (like a die roll for instance)."
"- It takes a while for MCMC to 'settle in', you can see that for T = 10 the natural state is somewhere around c = 0, which takes about 2000 steps to reach from the initial state with c = 1. In general when doing MCMC we want to throw away some values at the beginning because they're affected too much by the initial state.\n",
"- At High and Low temperatures we basically just get small fluctuations around an average value\n",
"- At intermediate temperatures the fluctuations occur on much longer time scales! Because the systems can only move a little each timestep, it means that the measurements we are making are *correlated* with themselves at previous times. The result of this is that if we use MCMC to draw N samples, we don't get as much information as if we had drawn samples from an uncorrelated variable (like a die roll for instance)."
]
},
{

View File

@ -15,15 +15,15 @@
"metadata": {},
"source": [
"# Doing Reproducible Science\n",
"Further Reading on this software reproducibility: [The Turing Way: Guide to producing reproducible research](https://the-turing-way.netlify.app/reproducible-research/reproducible-research.html)\n",
"Further reading on the reproducibility of software outputs: [The Turing Way: Guide to producing reproducible research](https://the-turing-way.netlify.app/reproducible-research/reproducible-research.html)\n",
"\n",
"In the last chapter we made a nice little graph, let's imagine we wanted to include that in a paper, and we want other researchers to be able to understand and reproduce how it was generated.\n",
"\n",
"There are many aspects to this, but I'll list what I think is relevant here:\n",
"1. We have some code that generates the data and some code that uses it to plot the output, let's split that into two python files.\n",
"2. Our code has external dependencies on numpy and matplotlib, in the future those packages could change their behaviour in a way that breaks the code, so let's record what version our code is compatible with.\n",
"2. Our code has external dependencies on `numpy` and `matplotlib`. In the future those packages could change their behaviour in a way that breaks (our changes the output of) our code, so let's record what version our code is compatible with.\n",
"3. We also have an internal dependency on other code in this MCFF repository, that could also change so let's record the git hash of the commit where the code works for posterity.\n",
"4. The data generating process is random, so we'll fix the seed as discussed in the testing section to make it reproducible.\n",
"4. The data generation process is random, so we'll fix the seed as discussed in the testing section to make it reproducible.\n",
"\n"
]
},
@ -156,7 +156,7 @@
"\n",
"To avoid you having to go away and find the files, I'll just put them here. Let's start with the file that generates the data. I'll give it what I hope is an informative name and a shebang so that we can run it with `./generate_montecarlo_walkers.py` (after doing `chmod +x generate_montecarlo_walkers.py` just once).\n",
"\n",
"I'll set the seed using a large pregenerated seed, you've likely seen me use `42` in some places, but that's not really best practice because it might not be entropy to reliably seed the generator.\n",
"I'll set the seed using a large pregenerated seed, you've likely seen me use `42` in some places, but that's not really best practice because it might not provide good enough entropy to reliably seed the generator.\n",
"\n",
"I've also added some code that gets the commit hash of MCFF and saves it into the data file along with the date. This helps us keep track of the generated data too."
]
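Those two pieces might look roughly like this; the seed value is just an example of a large pregenerated one, and the `git rev-parse` call assumes the script runs from inside the repository:

```python
import datetime
import subprocess
import numpy as np

# A large pregenerated seed (e.g. printed once by `np.random.SeedSequence().entropy`
# and pasted in), rather than a small human-chosen number like 42
seed = 0x2F4A8C1D9E6B35718394B2C5D6E7F801
rng = np.random.default_rng(seed)

# Record which version of the code produced this data, and when
commit = subprocess.run(
    ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
).stdout.strip()
metadata = {"seed": seed, "commit": commit, "date": datetime.datetime.now().isoformat()}
```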
@ -389,9 +389,9 @@
"\n",
"Now that we have a nicely reproducible plot, let's share it with the world. The easiest way is probably to put your code in a hosted git repository like GitHub or GitLab. \n",
"\n",
"Next, let's mint a shiny Digital Object Identifier (DOI) for the repository, using something like [Zenodo](https://zenodo.org/). These services archive a snapshot of the repository and assign a DOI to that snapshot, this is really useful for citing a particular version of the software. \n",
"Next, let's mint a shiny Digital Object Identifier (DOI) for the repository, using something like [Zenodo](https://zenodo.org/). These services archive a snapshot of the repository and assign a DOI to that snapshot, this is really useful for citing a particular version of the software, e.g. in a publication (and helping to ensure that published results are reproducible by others). \n",
"\n",
"Finally, let's add a `citation.cff` file to the root of the repository, this makes it easier for people who might cite this software to generate a good citation for it. We can add the Zenodo DOI to it too. You can read more about `citation.cff` files [here](https://citation-file-format.github.io/) and there is a convenient generator tool [here](https://citation-file-format.github.io/cff-initializer-javascript/)."
"Finally, let's add a `CITATION.cff` file to the root of the repository, this makes it easier for people who want to cite this software to generate a good citation for it. We can add the Zenodo DOI to it too. You can read more about `CITATION.cff` files [here](https://citation-file-format.github.io/) and there is a convenient generator tool [here](https://citation-file-format.github.io/cff-initializer-javascript/)."
]
},
{

View File

@ -30,7 +30,7 @@
"source": [
"We'll use sphinx along with a couple plugins: [autodoc][autodoc] allows us to generate documentation automatically from the docstrings in our source code, while [napoleon][napoleon] allows us to use [NUMPYDOC][numpydoc] and Google formats for the docstrings in addition to [reStructuredText][rst]\n",
"\n",
"What this means is that we'll be able to write documentation directly into the source code, and it will get rendered into a nice website. This helps keep the documentation up to date because it's right there next to the code!\n",
"What this means is that we'll be able to write documentation directly into the source code and it will get rendered into a nice website. This helps keep the documentation up to date because it's right there next to the code and the web-based documentation will get automatically re-generated every time the documentation files are updated!\n",
"\n",
"[autodoc]: https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html\n",
"[napoleon]: https://www.sphinx-doc.org/en/master/usage/extensions/napoleon.html\n",
@ -175,7 +175,7 @@
"source": [
"### Documentation Ideas\n",
"\n",
"Readthedocs can be a bit tricky to set up, it is also possible to use GitHub pages to accomplish something similar. Another idea is to include some simple copyable code snippets in a quickstart guide. This lets people get up and running your code more quickly than is they need to read the API docs to understand how to interact with your module."
"Readthedocs can be a bit tricky to set up, it is also possible to use [GitHub Pages](https://pages.github.com/) to accomplish something similar. Another idea is to include some simple copyable code snippets in a quickstart guide. This lets people get up and running with your code more quickly than if they need to read the API documentation to understand how to interact with your module."
]
},
{