SLIDE 1
Open-source without headaches
Edwin Dalmaijer
20 November 2018
@esdalmaijer
SLIDE 3 Wait, isn’t open source a Good Thing?
- To science, open source is unequivocally good
- Tools are free and open to public scrutiny
Examples: PyGaze, Psychtoolbox, EEGLAB, SPM
SLIDE 6 So what about those headaches?
- To a scientist, open-source is a distraction
- Publishing open code requires additional time and effort
- Open code is not rewarded in systematic ways
- More open code => fewer papers => lower grant chances
- But what if you publish a paper on your code?
- Unlike a paper, software requires continued effort
- Unlike a paper’s author list, development teams change as people join and leave
Psychophysics Toolbox (Matlab): Brainard (1997), 11578 citations; Kleiner et al. (2007), 1954 citations
PsychoPy (Python): Peirce (2007, 2009), 2471 citations; Peirce et al., under review!
SLIDE 7 So what about those headaches?
- But doesn’t your toolbox get you exposure?
- Important for early career, but doesn’t get you fellowships
- How many PIs are ‘methods people’?
SLIDE 8
Does it really take up that much time?
“What kind of data quality would be achievable with my webcam? (See attached image of my face.)”
“When I try to run your Python script in OpenSesame / Unity / [other non-Python tool], it doesn’t work!”
“Your code didn’t work, what should I do?”
“Hi, I need help with this other software you didn’t develop!”
SLIDE 9 Does it really take up that much time?
- Continuous work on development
- Bug fixes, new features, dependencies change
- Continuous work on support
- If people use your tools, they’ll ask questions
- Communities are hard to build and require a critical mass that most science projects just don’t have
SLIDE 10 Is supporting open developers important?
Kelle Cruz, AstroPy May 2018
- Three omnipresent packages
- About 90 million downloads
- Estimated cost over $21 million
- Just 15 active maintainers!
SLIDE 13 We need to reward software contributions
- Science relies on crucial open software
- Without these, most of us couldn’t do our jobs
- The current system punishes developers
- Matthew effect: less time for papers => fewer grants
- Low pay, even lower job security
- We need to adjust academic reward structures
- Citations to associated papers are not enough
- More stable positions for open-source developers?
- Include software overhead in grants?
SLIDE 14
Post-soapbox usefulness
SLIDE 15 Two types of code among researchers
- Script: analysis pipeline
- Usually written in one long file
- Pretty specific to one project
- Usually not particularly useful to other people
- analysis_final2-October 2018.m
- Libraries: set of more general functions
- Importable to scripts from a central place
- Combine functions for particular purposes
- Tend to be useful to other people
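A minimal sketch of the library side of this distinction, assuming a module saved as eeg_functions.py (the file name comes from the example project below; the baseline_correct function is hypothetical). A script would use it with `from eeg_functions import baseline_correct`; the `if __name__ == "__main__":` guard keeps the demo line from running on import.

```python
"""eeg_functions.py: a hypothetical library module with reusable functions."""

from statistics import mean


def baseline_correct(samples, n_baseline):
    """Subtracts the mean of the first n_baseline samples from all samples.

    samples    -- list of floats (one channel of data)
    n_baseline -- int, number of samples that form the baseline
    """
    # Refuse baselines that are empty or longer than the data
    if not 0 < n_baseline <= len(samples):
        raise ValueError("n_baseline must be between 1 and len(samples)")
    baseline = mean(samples[:n_baseline])
    # Return a new list; the caller's data is left untouched
    return [s - baseline for s in samples]


# Only runs when the file is executed directly, not when it is imported
if __name__ == "__main__":
    print(baseline_correct([1.0, 3.0, 5.0, 7.0], n_baseline=2))
```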
SLIDE 16 What do you hate in other people’s code?
- No README
- No docstrings
- Unhelpful commenting
- Unclear variable names
- All files reference each other
SLIDE 17 What is a good open-source project?
- Clearly documented
- README, function descriptions, and EXCESSIVE comments
- Sensible structure
- File structure and folders neatly organised
- Sensible file names
- Easy to find and to download
- For example through GitHub, GitLab, BitBucket, or OSF
- Not dependent on hidden code
- Sensible dependencies; don’t rely on obscure home-brewed code
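What “clearly documented” can look like for a single function, as a sketch: the function, its parameters, and the docstring layout are assumptions, not a required standard. The example line in the docstring doubles as a test through Python’s built-in doctest module.

```python
import doctest


def mean_reaction_time(rts, min_rt=0.1):
    """Computes the mean reaction time, ignoring anticipatory responses.

    Arguments
    rts    -- list of reaction times in seconds (floats)
    min_rt -- float, reaction times below this value (in seconds) are
              treated as anticipations and excluded (default = 0.1)

    Returns
    The mean of the remaining reaction times as a float, or None if no
    reaction times remain after exclusion.

    Example
    >>> mean_reaction_time([0.05, 0.25, 0.75])
    0.5
    """
    # Keep only the responses that are slow enough to be genuine
    valid = [rt for rt in rts if rt >= min_rt]
    # Avoid a ZeroDivisionError when everything was excluded
    if not valid:
        return None
    # Average the remaining reaction times
    return sum(valid) / len(valid)


if __name__ == "__main__":
    # Runs the docstring example and reports any mismatch
    doctest.testmod()
```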
SLIDE 18 Start with a sensible folder structure
- 2018 Super Amazing Study
  - analysis
    - data
      - pp01.tsv
      - pp01.cnt
      - ...
    - analysis_script_v3.py
    - eeg_functions.py
    - motion_tracking.py
  - experiment
    - constants.py
    - experiment_v4.py
    - custom_functions.py
  - literature
  - writing
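Setting this structure up by hand is quick, but a small helper makes every new project start the same way. A sketch using pathlib, assuming the folder names shown on this slide; create_project is a hypothetical helper, not part of any toolbox.

```python
from pathlib import Path

# Folder layout from the example project; adapt the names to your own study
FOLDERS = [
    "analysis/data",
    "experiment",
    "literature",
    "writing",
]


def create_project(root):
    """Creates the project folder and its sub-folders if they don't exist yet.

    root -- str or Path, location of the new project folder

    Returns the Path of the project folder.
    """
    root = Path(root)
    for folder in FOLDERS:
        # parents=True also creates "analysis" when making "analysis/data";
        # exist_ok=True means re-running the function does no harm
        (root / folder).mkdir(parents=True, exist_ok=True)
    return root
```

Calling `create_project("2018 Super Amazing Study")` creates the folders inside the current working directory.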
SLIDE 22 Add a README to every project
- 2018 Super Amazing Study
  - README.md
  - analysis
    - data
      - pp01.tsv
      - pp01.cnt
      - ...
    - analysis_script_v3.py
    - eeg_functions.py
    - motion_tracking.py
  - experiment
    - constants.py
    - experiment_v4.py
    - custom_functions.py
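A sketch of a starter README.md written from Python, assuming the section headings below (they are suggestions, not a fixed format); write_readme is a hypothetical helper that refuses to overwrite an existing README.

```python
from pathlib import Path

# Hypothetical section headings; a README has no fixed format, so adjust freely
README_TEMPLATE = """# {title}

## About
{description}

## Folder structure
- analysis: analysis scripts and helper functions; raw data in analysis/data
- experiment: experiment code and constants

## How to run
Describe which software versions are needed and which script to run first.

## Contact
{contact}
"""


def write_readme(root, title, description, contact):
    """Writes a starter README.md into the project folder.

    Leaves an existing README.md untouched; returns its Path either way.
    """
    path = Path(root) / "README.md"
    if not path.exists():
        text = README_TEMPLATE.format(
            title=title, description=description, contact=contact)
        path.write_text(text, encoding="utf-8")
    return path
```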
SLIDE 23
Creating a new repository on GitHub
SLIDE 27 Open folder in terminal / command prompt
cd "/home/documents/2018 Super Amazing Study"
SLIDE 28
Initialise a Git repository
git init
SLIDE 29
Add all current files to the repository
git add .
SLIDE 30
Commit the added files, ready to be uploaded
git commit -m "first commit"
SLIDE 31
Connect the GitHub repository
git remote add origin https://github.com/esdalmaijer/2018_Super_Amazing_Study.git
SLIDE 32
Upload committed files to GitHub repo!
git push origin master
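The same five steps can be scripted from Python with the standard subprocess module, as a sketch; first_upload_commands and run_commands are hypothetical helpers. The branch defaults to master as on the slides; newer Git and GitHub setups often name it main instead, in which case pass branch="main".

```python
import subprocess


def first_upload_commands(remote_url, message="first commit", branch="master"):
    """Returns the git commands for a first upload as argument lists."""
    return [
        ["git", "init"],                                 # initialise a repository
        ["git", "add", "."],                             # add all current files
        ["git", "commit", "-m", message],                # commit the added files
        ["git", "remote", "add", "origin", remote_url],  # connect the GitHub repo
        ["git", "push", "origin", branch],               # upload the commits
    ]


def run_commands(commands, cwd):
    """Runs each command in the project folder, stopping at the first failure."""
    for command in commands:
        # check=True raises subprocess.CalledProcessError if git reports an error
        subprocess.run(command, cwd=cwd, check=True)
```

For example: `run_commands(first_upload_commands("https://github.com/esdalmaijer/2018_Super_Amazing_Study.git"), cwd="/home/documents/2018 Super Amazing Study")`.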
SLIDE 33 Edit, add, commit, push; repeat!
Change something in your files...
git add .
git commit -m "description"
git push origin master
SLIDE 34 Edit, add, commit, push; repeat!
Then run the magic words!
git add .
git commit -m "description"
git push origin master
SLIDE 35 GitHub Desktop has a GUI instead
- Some people don’t like the command line
- Everyone has their preferences; don’t be embarrassed
- GitHub Desktop is a graphical alternative
- Available on Windows and on OS X
SLIDE 37
Principles of Object-Oriented Programming
Class (blueprint) Instance (realised object)
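A minimal sketch of the blueprint/instance distinction, using a hypothetical Participant class: the class is written once, and each instance created from it holds its own data.

```python
class Participant:
    """Blueprint for a participant; every instance gets its own data."""

    def __init__(self, code):
        # Attributes set here belong to this particular instance
        self.code = code
        self.responses = []

    def add_response(self, rt):
        """Stores one reaction time (in seconds) for this participant."""
        self.responses.append(rt)


# One class (blueprint), two instances (realised objects)
pp01 = Participant("pp01")
pp02 = Participant("pp02")
pp01.add_response(0.42)

print(len(pp01.responses), len(pp02.responses))  # prints "1 0"
```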
SLIDE 41 Principles of Object-Oriented Programming
- Hide specific implementation in functions
- Classes and functions do their own thing
- Internal variables don’t need to be exposed
- Functions return required output
- Compartmentalise where possible
- General functions can be reused more easily!
- Stuff your functions in a library
- Import it when you need it; leave it alone when you don’t
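A sketch of hiding an implementation behind a small interface, using a hypothetical RunningMean class. Callers only use add() and mean; the internal counters could later be swapped for, say, a stored list without changing any code that uses the class.

```python
class RunningMean:
    """Keeps track of the mean of incoming values without storing them all.

    The leading underscore marks attributes as internal by convention;
    Python does not enforce this.
    """

    def __init__(self):
        # Internal state: nobody outside the class needs to touch these
        self._n = 0
        self._total = 0.0

    def add(self, value):
        """Adds one value."""
        self._n += 1
        self._total += value

    @property
    def mean(self):
        """The mean of all values added so far, or None if there are none."""
        if self._n == 0:
            return None
        return self._total / self._n
```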
SLIDE 42
Class definition example
class Car:
    def __init__(self, colour, engine):
        """Initialises the car.

        colour – tuple with 8-bit ints indicating the colour
        engine – instance of the Engine class
        """
        # Define the number of wheels
        self.n_wheels = 4
        # Store the colour and engine passed to this instance
        self.colour = colour
        self.engine = engine
SLIDE 43
Function definition example
def sum_numbers(numbers):
    """Computes the sum of passed numbers

    numbers – list of floats
    """
    # Named sum_numbers so it doesn't shadow Python's built-in sum()
    # Start at 0.
    s = 0.0
    # Loop through the numbers
    for num in numbers:
        # Add the current number to total
        s += num
    # Return the sum
    return s
SLIDE 44 Best open-source practices
- Start straight away!
- Don’t wait until publication to get organised
- Sensible structure of files and code
- Structure folders in an organised way
- Compartmentalise code where possible
- Re-using code? NO COPY-PASTING! Write a function!
- Sharing is easy via GitHub
- Or others: GitLab, BitBucket, Open Science Framework, etc
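A sketch of replacing copy-pasted code with a function, assuming made-up reaction times for two hypothetical conditions: the averaging logic lives in one place, so a fix (such as the empty-list check) applies to every condition at once.

```python
def condition_mean(values):
    """Returns the mean of a list of numbers (raises ValueError if empty)."""
    if not values:
        raise ValueError("cannot average an empty list")
    return sum(values) / len(values)


# Hypothetical reaction times (in seconds) for two conditions
rts = {
    "congruent": [0.25, 0.5, 0.75],
    "incongruent": [0.5, 0.75, 1.0],
}

# One function call per condition instead of a pasted copy per condition
means = {condition: condition_mean(values) for condition, values in rts.items()}
print(means)
```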
SLIDE 45
Useful resources
@esdalmaijer