Managing Code and Software for Applied Data Science Projects
Overview
1 Introduction
2 General project management
2.1 Definitions
2.2 Identification
3 Approaches to building code
3.1 The single developer model
3.2 The collaborative model
3.2.1 The fundamental problem of project management
3.2.2 Coordination and planning
3.2.3 However! Your planning process needs to be responsive to emerging needs and discoveries
4 Concrete project planning
4.1 Governance
4.2 Project checklist
4.3 Herding your cats
4.4 Scheduling
4.4.1 Evidence-based scheduling
4.4.2 Some comments on evidence-based scheduling
4.4.3 An aside about "methodologies"
4.4.4 An aside about boiling the ocean
5 Development workflows in the abstract
5.1 Choosing a language is choosing an ecosystem
5.1.1 Language features
5.1.2 When is a language ready?
5.2 Co-dependence and feedback between tools and methods
5.3 Tool evaluation
6 Concrete development workflow and tools
6.1 Your development process should be repeatable
6.2 Testing and Validation
6.2.1 The metaphysics of integration/system testing
6.3 Version Control
6.3.1 Version control in practice
6.4 Issue Tracking
6.4.1 Key features
6.4.2 Many options
6.4.3 Demo
6.5 Dependency management and environment management by language
6.5.1 Python
6.5.2 R
6.5.3 SQL
6.5.4 Parallel concerns for other languages
6.5.5 When does it make sense to use containers?
6.6 Deployment
6.7 How do we know when we're done?
7 Documentation
7.1 Documentation should describe what you actually do
7.2 Documentation workflow
8 Discussion
8.1 Sample project
9 Reflection
10 Coda: the cloud
11 References
12 Additional resources
9 Reflection
What is one step you can implement almost immediately for an existing project?