Best practices
From Wilson et al. 2014 on “best practices for scientific computing”:
- Write programs for people, not computers
  - A program should not require its readers to hold more than a handful of facts in memory at once
  - Make names consistent, distinctive, and meaningful
  - Make code style and formatting consistent
- Let the computer do the work
  - Make the computer repeat tasks
  - Save recent commands in a file for re-use
  - Use a build tool to automate workflows
- Make incremental changes
  - Work in small steps with frequent feedback and course correction
  - Use a version control system
  - Put everything that has been created manually in version control
- Don’t repeat yourself (or others)
  - Every piece of data must have a single authoritative representation in the system
  - Modularize code rather than copying and pasting
  - Re-use code instead of rewriting it
- Plan for mistakes
  - Add assertions to programs to check their operation
  - Use an off-the-shelf unit testing library
  - Turn bugs into test cases
  - Use a symbolic debugger [interactive program inspector]
- Optimize software only after it works correctly
  - Use a profiler to identify bottlenecks
  - Write code in the highest-level language possible
- Document design and purpose, not mechanics
  - Document interfaces and reasons, not implementations
  - Refactor code in preference to explaining how it works
  - Embed the documentation for a piece of software in that software [plus documentation generator]
- Collaborate
  - Use pre-merge code reviews
  - Use pair programming when bringing someone new up to speed and when tackling particularly tricky problems
  - Use an issue tracking tool
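Several of the practices above (meaningful names, assertions, documentation embedded in the code, an off-the-shelf testing tool) fit in a few lines. The sketch below is a hypothetical illustration, not taken from Wilson et al.: `gc_content` is an invented example function, and it relies on Python's standard-library `doctest` module, which runs the examples written in a docstring as tests, so the documentation cannot silently drift out of date.

```python
import doctest


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence.

    The examples below are both documentation and tests (run by doctest):

    >>> gc_content("GGCC")
    1.0
    >>> gc_content("ATGC")
    0.5
    >>> gc_content("atgc")  # lowercase input is accepted
    0.5
    """
    # Assertion: fail loudly on empty input instead of dividing by zero.
    assert len(seq) > 0, "sequence must not be empty"
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return gc / len(seq)


if __name__ == "__main__":
    # Run every example embedded in this module's docstrings.
    print(doctest.testmod())
```

One caveat on the design: Python skips `assert` statements when started with the `-O` flag, so assertions are best treated as checks on the program's own operation during development, not as a replacement for validating user input.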
Python example (from Bioinformatics Data Skills)
```python
EPS = 0.00001  # a small number to use when comparing floating-point values


def add(x, y):
    """Add two things together."""
    return x + y


def test_add():
    """Test that the add() function works for a variety of numeric types."""
    assert add(2, 3) == 5
    assert add(-2, 3) == 1
    assert add(-1, -1) == -2
    assert abs(add(2.4, 0.1) - 2.5) < EPS
```
Ask yourself: which best practices are shown here? (Hint: many more than one.)
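The `test_add()` function is named so that a test runner such as pytest can discover it by its `test_` prefix. As an illustrative alternative (not from the book), the same checks can be written with Python's standard-library `unittest`, with `math.isclose` (default relative tolerance 1e-9) or `assertAlmostEqual` replacing the hand-rolled `EPS` tolerance. The class name `TestAdd` is hypothetical, and `add` is repeated so the block runs on its own.

```python
import math
import unittest


def add(x, y):
    """Add two things together."""
    return x + y


class TestAdd(unittest.TestCase):
    """The same checks as test_add(), written with the standard-library unittest."""

    def test_integers(self):
        self.assertEqual(add(2, 3), 5)
        self.assertEqual(add(-2, 3), 1)
        self.assertEqual(add(-1, -1), -2)

    def test_floats(self):
        # Compare floats with a tolerance rather than with ==.
        self.assertTrue(math.isclose(add(2.4, 0.1), 2.5))
        # Why a tolerance matters: 0.1 + 0.2 is not exactly 0.3 in binary floating point...
        self.assertNotEqual(add(0.1, 0.2), 0.3)
        # ...but it is equal to 7 decimal places, which is what assertAlmostEqual checks.
        self.assertAlmostEqual(add(0.1, 0.2), 0.3)
```

Saved as, say, `test_add.py` (a hypothetical filename), this runs with `python -m unittest test_add`; pytest also collects `unittest.TestCase` classes, so either runner works.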
Good enough practices in scientific computing
A broader follow-up aimed at all researchers: Wilson et al. 2017 cover data management, programming, collaborating with colleagues, organizing projects, tracking work, and writing manuscripts.
From their summary, I want to emphasize the following points:
- Project organization
  - Put each project in its own directory, which is named after the project.
  - Put text documents associated with the project in the `doc` directory.
  - Put raw data and metadata in a `data` directory and files generated during cleanup and analysis in a `results` directory.
  - Put project source code in the `src` directory.
  - Put external scripts or compiled programs in the `bin` directory.
  - Name all files to reflect their content or function.
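This layout is easy to create by hand, but in the spirit of "let the computer do the work" it can also be scripted so that every project starts the same way. A minimal sketch using Python's standard-library `pathlib`; the function name `make_project` and the project name `my-project` are hypothetical, and Wilson et al. do not prescribe any particular script.

```python
import tempfile
from pathlib import Path

# Sub-directories from the project-organization recommendations above.
SUBDIRS = ("doc", "data", "results", "src", "bin")


def make_project(parent, name):
    """Create <parent>/<name>/ with the standard sub-directories and return its path.

    Safe to re-run: directories that already exist are left untouched.
    """
    root = Path(parent) / name
    for sub in SUBDIRS:
        # parents=True also creates root itself; exist_ok=True makes the call idempotent.
        (root / sub).mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    # Demonstrate in a throwaway directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        project = make_project(tmp, "my-project")
        print(sorted(p.name for p in project.iterdir()))
        # → ['bin', 'data', 'doc', 'results', 'src']
```

Keeping the list of sub-directories in one constant follows the "single authoritative representation" advice: changing the layout means editing one line.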