In an increasing number of domains, machine learning models are being held to a higher standard. Predictions alone are no longer enough: companies can now be held responsible for any spurious predictions their models produce. With this shift, model explainability has arguably taken priority over predictive power. Metrics such as accuracy and R² have taken a back seat, while the ability to explain model predictions has grown more and more important. We’ve looked at several ways to explain your models and gain a better understanding of how they work. …
Most models never make it to production. We previously looked at deploying TensorFlow models using TensorFlow Serving. Once that process is complete, we may think our work is done. In reality, we’ve just started a new journey: managing our model’s lifecycle and making sure it stays up to date and effective.
Like most things in software, models need continuous development and improvement. Managing a model once it is deployed is a task that is often overlooked. Here we’ll look at ways to do this effectively and make our model pipelines more efficient.
You’ve been slaving away for countless hours trying to get your model just right. You’ve diligently cleaned your data, painstakingly engineered features, and tuned your hyperparameters to the best of your ability. Everything has finally fallen into place, and you’re now ready to present your model to the world. There’s only one problem: your model lies trapped on your local machine with no access to the outside world.
Such is the fate of most machine learning models. In fact, around 87% of them never make it into production. A disproportionate amount of resources (not to mention hype)…
When working on production-level code, testing becomes as important as the code itself. If you’ve ever contributed to any notable software projects, you’ll know that even minor changes require you to write tests to make sure your code doesn’t break. In another article, we looked at the basics of testing your code using the unittest library. Here we will look at the other widely used Python testing library, pytest. Working with the pytest library can quickly get confusing and at times even un-Pythonic. …
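One thing that makes pytest approachable is that tests are plain functions using plain `assert` statements. As a minimal sketch (the `add` function and test names here are hypothetical, not from the article):

```python
# test_math_utils.py -- a minimal pytest-style test file.
# pytest discovers files named test_*.py and runs functions named test_*.

def add(a, b):
    """Toy function under test."""
    return a + b

def test_add_positive():
    # pytest uses bare assert statements -- no special assertion methods needed
    assert add(2, 3) == 5

def test_add_negative():
    assert add(-1, -1) == -2
```

Running `pytest` in the same directory would collect and execute both tests; on failure, pytest rewrites the assertion to show the mismatched values.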
Not all features are created equal. Some will have a large effect on your model’s predictions while others will not. In a previous article, we looked at using partial dependence to see how certain features affect predictions. Determining which features yield the most predictive power is another crucial step in the model building process. In this article, we’ll look at a few ways to figure out which features are most likely to make an impact.
We’ll be using this mobile price classification dataset from Kaggle to illustrate our examples. After loading in the dataset…
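One common starting point is a tree ensemble’s built-in importances. The sketch below uses a synthetic stand-in for the Kaggle data (the column names `ram`, `battery_power`, `px_width`, and the target `price_range` mirror that dataset, but the values here are generated, not loaded from it):

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the mobile price data; in practice df would come
# from pd.read_csv on the Kaggle file.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "ram": rng.integers(256, 4000, 500),
    "battery_power": rng.integers(500, 2000, 500),
    "px_width": rng.integers(500, 2000, 500),
})
# Make the target depend only on ram, so its importance should dominate
df["price_range"] = (df["ram"] > 2000).astype(int)

X, y = df.drop(columns="price_range"), df["price_range"]
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Pair each feature with its importance and sort, highest first
importances = sorted(zip(X.columns, model.feature_importances_),
                     key=lambda t: t[1], reverse=True)
for name, score in importances:
    print(f"{name}: {score:.3f}")
```

The importances sum to 1, so each score can be read as a relative share of the model’s total split-quality improvement.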
With all the complexity involved in developing machine learning models, it comes as no surprise that some of them just don’t translate well into plain English. Inputs go in, answers come out, and no one knows exactly how the model arrived at its conclusion. This can create a disconnect, or a lack of transparency, between members working on the same team. As the prevalence of machine learning has increased in recent years, this lack of explainability when using complex models has only grown. …
Why test at all?
A commonly overlooked aspect of data science is properly testing your code. This generally means making sure it works as intended and is free of major bugs. When working on smaller, isolated coding projects and analyses, writing tests may not be as important and may, at times, be skipped entirely. As a project grows in size and complexity, more users start to interact with it. Everything will still work fine… until it doesn’t.
This is where testing comes in. Writing tests is essential to be able to maintain clean and usable code. It may not be the…
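To make this concrete, here is a small sketch of what a test can catch: both the helper function and its edge case are hypothetical, but the pattern (assert the happy path, assert the failure mode) is the core of any test suite. This uses `unittest` from the standard library:

```python
import unittest

def normalize(values):
    """Hypothetical helper: scale a list of numbers so it sums to 1."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize an all-zero list")
    return [v / total for v in values]

class TestNormalize(unittest.TestCase):
    def test_sums_to_one(self):
        # Happy path: result should always sum to 1
        self.assertAlmostEqual(sum(normalize([1, 2, 3])), 1.0)

    def test_rejects_zero_total(self):
        # Edge case: an all-zero input cannot be normalized
        with self.assertRaises(ValueError):
            normalize([0, 0])

if __name__ == "__main__":
    # exit=False keeps the snippet friendly to interactive sessions
    unittest.main(exit=False, argv=["normalize_tests"])
```

Without the second test, a refactor that silently returns `[nan, nan]` for zero input would slip through unnoticed.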
Real-world datasets come in all shapes and sizes. At some point, you will come across a dataset with imbalanced target classes. What exactly does this mean? Let’s take a look at an example from Kaggle. This dataset contains details of credit card clients and defaults on their payments.
Our target variable here is default.payment.next.month, a binary variable with class 0 if a client did not default and 1 if they did. First, we’ll check how many entries there are for each of these classes.
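With pandas, counting entries per class is a one-liner via `value_counts`. The DataFrame below is a small synthetic stand-in (in the article, it would come from reading the Kaggle CSV), but the column name matches the dataset:

```python
import pandas as pd

# Synthetic stand-in for the credit card default data: 9 non-defaults, 3 defaults
df = pd.DataFrame({"default.payment.next.month": [0] * 9 + [1] * 3})

# Absolute counts per class
counts = df["default.payment.next.month"].value_counts()
print(counts)

# normalize=True gives proportions instead, making the imbalance obvious
proportions = df["default.payment.next.month"].value_counts(normalize=True)
print(proportions)
```

Seeing the proportions (here 0.75 vs. 0.25) is usually the clearer signal of how skewed the classes are.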