Biased Variance & a Model of Problem Solving
- Matt Humbert
- Apr 1, 2024
- 6 min read
Biased Variance is a cheeky wordplay on the bias-variance tradeoff in statistics and machine learning that emerges when a model's predictions are generalized beyond its training data. My career in data science and experience as an educator have certainly driven my own biases.
I presume, for example, that there is a high noise-to-signal ratio in what's available to early-stage data professionals: lots of training courses focused on algorithms and coding, but little that addresses what they actually want (landing a job and advancing their careers). This discrepancy has motivated me to offer coaching and mentoring services for data scientists.
But I wonder: is my bias low enough for your expectations?
It's occurred to me that narrowly constraining data science applications to data scientists alone produces high variance. This is because the solution overfits the data and cannot generalize to other domains. At the same time, ignoring the data altogether and forcing a simpler model produces high bias, which is not helpful either.
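As a toy illustration of the tradeoff described above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the sine-wave "truth," the noise level, and the two hypothetical models (`fit_mean`, which ignores the inputs entirely, and `fit_nearest`, which copies the closest training point). It refits each model on many noisy training sets and estimates the squared bias and the variance of its predictions.

```python
import math
import random


def true_signal(x):
    """The (assumed) underlying pattern the models are trying to learn."""
    return math.sin(2 * math.pi * x)


def fit_mean(xs, ys):
    """Overly simple model: ignores x and always predicts the average y."""
    avg = sum(ys) / len(ys)
    return lambda x: avg


def fit_nearest(xs, ys):
    """Overly flexible model: copies the y of the closest training point."""
    def predict(x):
        i = min(range(len(xs)), key=lambda k: abs(xs[k] - x))
        return ys[i]
    return predict


def bias_variance(fit, n_train=20, n_sets=200, noise=0.3, seed=0):
    """Estimate average squared bias and variance over many training sets."""
    rng = random.Random(seed)
    train_x = [i / (n_train - 1) for i in range(n_train)]
    test_x = [(j + 0.5) / 50 for j in range(50)]

    # Refit the model on a fresh noisy sample each time and record predictions.
    preds = []
    for _ in range(n_sets):
        train_y = [true_signal(x) + rng.gauss(0, noise) for x in train_x]
        model = fit(train_x, train_y)
        preds.append([model(x) for x in test_x])

    bias_sq = 0.0
    variance = 0.0
    for j, x in enumerate(test_x):
        column = [p[j] for p in preds]
        mean_pred = sum(column) / n_sets
        bias_sq += (mean_pred - true_signal(x)) ** 2
        variance += sum((p - mean_pred) ** 2 for p in column) / n_sets
    return bias_sq / len(test_x), variance / len(test_x)


if __name__ == "__main__":
    for name, fit in [("mean", fit_mean), ("nearest", fit_nearest)]:
        b, v = bias_variance(fit)
        print(f"{name:8s} bias^2={b:.3f} variance={v:.3f}")
```

In this setup the mean model should show the larger squared bias and the nearest-point model the larger variance, which mirrors the "too general" versus "too specific" tension in the text. The exact numbers depend on the assumed noise level and seed.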
What about using a general framework for solving problems and identifying the areas where domain-specific data or knowledge is required? That's where a coach or mentor can really help.
Most data scientists are familiar with Cross Industry Standard Process for Data Mining, or CRISP-DM, which breaks down nearly every analysis project into six steps:
1. Business understanding – What does the business need?
2. Data understanding – What data do we have/need? Is it clean?
3. Data preparation – How do we organize the data for modeling?
4. Modeling – What modeling techniques should we apply?
5. Evaluation – Which model best meets the business objectives?
6. Deployment – How do stakeholders access the results?
Unless you're a specific type of nerd, one glance at this list will cross your eyes and might tempt you to search for a pre-packaged guide on how someone else did something you want to do yourself. What if you adapted the CRISP-DM framework to guide your curiosity instead?
What if you thought about business problems like a scientist?
It might look something like this:
1. Understanding – What is the problem?
2. Exploration – What do we need to know or figure out?
3. Mental Model – What does the explanation look like?
4. Evaluation – How well did that model work to solve our problem?
5. Execute – What action might we take to align with this model?
For an aspiring data professional looking to land a job, the process might look something like this:
1. Understanding – You want a job.
2. Exploration – You need the required skills and background, including job-specific skills.
3. Mental Model – Some combination of academic credentials and technical and communications skills will land you a job.
4. Evaluation – Compare this model to other online job-seeking research and advice.
5. Execute – Apply for some jobs until you land one.
Admittedly, this guide ignores the input data and is too general: high bias. What if I swung the pendulum in the opposite direction? I'd have an overly prescriptive guide that data scientists might appreciate; I've done so here as a free download. Of course, it might not resonate across industries. That's an example of high variance, or overfitting a model to a subset of data.
A teacher, coach, mentor, or expert often provides the most value, regardless of industry. Experts have the experience and knowledge you lack, and they can also help you quickly obtain the information you need to resolve your problem.
Think about something unrelated to a job search, like training in the gym. You want to improve your bench press strength (problem statement). You find information online or from talking to your fellow gym rats (exploration) regarding the concept of "progressive overload," a fancy exercise term for lifting heavier weight or doing more reps each training session (mental model). You look up more information in the form of an exercise literature review, and the findings suggest that progressive overload achieves the goal of building strength (evaluation). Finally, you enter the gym and try it out for yourself (execution).
That's a good model if you're a beginner. However, what happens if you're an experienced lifter and start plateauing? A generalized set of inputs will do you no good because you already know that information. You need something more specific. That's where the information required at each step changes. Progressive overload is still the model, but technique, training regimens, mental cues, effort, equipment, and whether you're gaining healthy weight start to matter quite a bit at a certain point. You have to refine the original scope of your problem statement, getting more granular as your progress starts to slow. A coach or trainer is incredibly valuable in these situations because they'll save you time and frustration and get you to your goals faster than you could on your own. However, it is possible to work through this iterative process of clarifying the problem statement and the input required at each step by yourself. Many people do so, and you can too; you'd need to believe in your ability to improve and continuously ask more intelligent questions.
Asking the right questions helps you acquire knowledge, wisdom, and skill. Continuous curiosity and experimentation are crucial to achieving the optimal bias/variance tradeoff in real-world settings.
Children are naturally curious and openly explore their surroundings, attempting to make sense of their environments; by doing so, they build a mental model of the world. When we are babies, crying usually means a caregiver will arrive and resolve our problems, whether it be a soiled diaper, hunger, or the need for a nap. As we get older and start to talk and think independently, our understanding of how to define problems and how best to solve them gets more sophisticated. Crying might get the attention of a parent or teacher, but it isn't going to solve a homework assignment. We must use what we've learned from experience to work through any new challenges. Children naturally transcend and include their prior stages of development and learning as they continuously improve their mental models.
We fully grown adults can do the same. Determining how to ask the appropriate questions and progressively refining them until we get problem-specific answers is the essence of real growth. Sometimes, that process is challenging because the problem statement is quite abstract, and our knowledge limits us. Other times, the information required is domain-specific and complex, requiring years of prerequisites to understand a problem's basics. Indeed, we can't even begin tackling an audacious project like photographing a black hole without first understanding the nature of mathematical singularities, general relativity, wavelengths of photons as a property of their energy, and how much light would be required to capture such an image.
But inherently, curiosity is at the core.
A hunger to observe, learn, understand, question, and ultimately try something new is the very reason humans moved from hunter-gatherer tribes to the colossus of civilization we know today. To embrace our curiosity and try new things, despite not knowing if it'll work, is the very essence of being human.
My mission with Biased Variance is all about connection: connecting people with each other, with ideas, and with themselves. That includes thinking about general data science principles as useful tools that can be applied elsewhere. You don't need to practice data science or comprehensively understand everything for it to be helpful.
I intend to demystify data science concepts for the curious, less for their technical aspects than as a way of approaching the world and its problems. I also hope to connect data scientists with broader extensions of the technical skills and foundational knowledge we already possess. The hope is that you take something valuable back to your life and work and apply it in a way that makes a tiny difference in your perspective or approach to what matters to you. Growing and becoming more well-rounded is part of what it means to transcend and include.
Check out the home page for more details on 1:1 data science career mentorship and coaching!
Click to subscribe to emails from Biased Variance and receive a copy of the 2024 Biased Variance Data Science Career Prep Guide.