TDD katas for Data Scientists
Whenever I talk about TDD to data science and data engineering folks, they say either that TDD does not work for algorithmic work, or “I’ve tried FizzBuzz and it is not fun” [^1]. I can understand. Most katas in places like kata-log are oriented more towards CS/SW devs.
So, I’ve decided to come up with some katas that might be more interesting for DS folks. Here they are.
ModPlus
ModPlus is a simple hashing algorithm. It takes a positive integer X, splits the binary representation of X into chunks of a given length, and returns a combinatorial sum of those chunks.
For example, modplus(12, chunk_length=3) should return 6.
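The post does not spell out what "combinatorial sum" means, so the following is only one possible reading, chosen because it reproduces the example above. It assumes the binary string is chunked from the most significant bit, that the last chunk may be shorter than `chunk_length`, and that the chunks are combined by adding their integer values. The helper names `to_binary` and `split_chunks` are hypothetical; they are here to show the "split into smaller problems" idea, not to prescribe a solution.

```python
from typing import List


def to_binary(x: int) -> str:
    """Return the binary digits of a positive integer, without the '0b' prefix."""
    if x <= 0:
        raise ValueError("x must be a positive integer")
    return bin(x)[2:]


def split_chunks(bits: str, chunk_length: int) -> List[str]:
    """Split a bit string into chunks, starting from the most significant bit.

    Assumption: the last chunk is allowed to be shorter than chunk_length.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be positive")
    return [bits[i:i + chunk_length] for i in range(0, len(bits), chunk_length)]


def modplus(x: int, chunk_length: int) -> int:
    """Assumed reading of ModPlus: add up the integer value of every chunk."""
    chunks = split_chunks(to_binary(x), chunk_length)
    return sum(int(chunk, 2) for chunk in chunks)


# 12 is '1100' in binary; chunks of length 3 are '110' and '0'; 6 + 0 = 6
print(modplus(12, chunk_length=3))
```

In a TDD session each helper would get its own failing test first, for example `split_chunks("1100", 3) == ["110", "0"]`, before `modplus` is wired together.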
Variation: ModPlusDot. Instead of a combinatorial sum, use a combination of binary AND and OR when combining the chunks.
Simple enough for an algorithmic problem. Skills to practice: splitting a problem into smaller problems, naming things, simplifying by extraction, refactoring while in green, making it work first.
Practical application: Luhn algorithm.
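For reference, here is a minimal sketch of the Luhn checksum, which has the same "split into pieces, transform, add up" shape as ModPlus. The function name `luhn_is_valid` is hypothetical, and ignoring non-digit characters such as spaces is a convenience assumption of this sketch.

```python
def luhn_is_valid(number: str) -> bool:
    """Check a digit string against the Luhn checksum."""
    # Assumption: non-digit characters (spaces, dashes) are simply ignored.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_is_valid("79927398713"))
```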
2D vector
Imagine you are working for Khan Academy (KA). KA wants to modernize their basic math classes, especially the display. Implement a Cartesian 2D Vector class that supports scalar multiplication, addition with another Vector, norm (magnitude), and the dot product of two vectors (see vector).
This is more of a design kata, but it still requires implementing a couple of formulas.
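To give an idea of where this kata might end up, here is one possible shape for the class, not the intended answer. The design choices are assumptions of this sketch: an immutable dataclass, `+` for addition, `*` for scalar multiplication, and `dot` and `norm` as methods. Other designs, such as operator overloading for the dot product or free functions, are just as valid, and weighing them is the point of the kata.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    """Immutable Cartesian 2D vector (a sketch of one possible design)."""
    x: float
    y: float

    def __add__(self, other: "Vector") -> "Vector":
        # Component-wise addition
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar: float) -> "Vector":
        # Scalar multiplication: vector * scalar
        return Vector(self.x * scalar, self.y * scalar)

    # Allow scalar * vector as well
    __rmul__ = __mul__

    def dot(self, other: "Vector") -> float:
        # Dot product: x1 * x2 + y1 * y2
        return self.x * other.x + self.y * other.y

    def norm(self) -> float:
        # Magnitude: sqrt(x^2 + y^2)
        return math.hypot(self.x, self.y)


v = Vector(3, 4)
print(v.norm())             # magnitude of (3, 4)
print(v + Vector(1, 1))     # addition
print(2 * v)                # scalar multiplication
print(v.dot(Vector(1, 2)))  # dot product
```

The dataclass generates `__eq__`, which keeps assertions such as `Vector(1, 2) + Vector(3, 4) == Vector(4, 6)` short.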
Other potential katas
- [Clock24](https://github.com/vanzaj/tdd-exercises/clock24)