Issues With Machine Learning in Software Development


To learn about the current and future state of machine learning (ML) in software development, we gathered insights from IT professionals from 16 solution providers. We asked, "What are the most common issues you see when using machine learning in the SDLC?" Here's what we learned:


Data Quality

  • The most common issue when using ML is poor data quality. The adage is true: garbage in, garbage out. It is essential to have good-quality data to produce quality ML algorithms and models. To get high-quality data, you must implement data evaluation, integration, exploration, and governance techniques prior to developing ML models (the first sketch after this list shows some basic pre-training checks). 
  • ML is only as good as the data you provide it, and you need a lot of data. The accuracy of ML is driven by the quality of that data. Other common problems are lacking a data science team and not designing the product in a way that lends itself to data science. 
  • 1) Integrating models into the application and spinning up the infrastructure for them. 2) Debugging: people don’t know how to retrace the performance of the model. 3) Deterioration of model performance over time. People don’t think about data upfront: do I have the right data to solve the problem and to create a model? 
  • Common issues include a lack of good, clean data, difficulty applying the correct learning algorithms, black-box approaches, bias in training data and algorithms, etc. Another issue we see is model maintenance. Traditional, hand-coded software becomes more and more stable over time: as you detect bugs, you are able to make tweaks to fix it and make it better. With ML being optimized toward outcomes, self-running, and dependent on the underlying data process, there can be model degradation that leads to less optimal outcomes. Assuming ML will work faultlessly in production is a mistake, and we need to be laser-focused on monitoring ML performance post-deployment as well (the second sketch after this list shows a simple drift check).
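
To make the "garbage in, garbage out" point concrete, here is a minimal sketch of pre-training data checks in Python. The dataset, column names, and thresholds are hypothetical; real data evaluation and governance go well beyond this.

```python
# Minimal pre-training quality checks on a hypothetical "transactions.csv" dataset.
import pandas as pd

def validate(df: pd.DataFrame) -> list:
    """Collect basic data-quality problems before any model is trained."""
    problems = []
    if df.duplicated().any():
        problems.append(f"{df.duplicated().sum()} duplicate rows")
    null_rates = df.isna().mean()
    for col, rate in null_rates[null_rates > 0.05].items():
        problems.append(f"column '{col}' is {rate:.0%} null")
    if "amount" in df.columns and (df["amount"] < 0).any():
        problems.append("negative values in 'amount'")
    return problems

df = pd.read_csv("transactions.csv")  # hypothetical input file
issues = validate(df)
if issues:
    raise ValueError("Fix the data before training: " + "; ".join(issues))
```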
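The post-deployment monitoring the last respondent describes can start as small as a distribution check on incoming features. A minimal sketch, assuming baseline and live samples are saved as NumPy arrays (the file names are hypothetical):

```python
# Flag model-input drift by comparing a live feature sample against the training baseline.
import numpy as np
from scipy.stats import ks_2samp

def drifted(train_values, live_values, alpha: float = 0.01) -> bool:
    """Return True if the two samples are unlikely to come from the same distribution."""
    _statistic, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

baseline = np.load("feature_amount_train.npy")   # captured at training time
recent = np.load("feature_amount_last_24h.npy")  # sampled from production traffic
if drifted(baseline, recent):
    print("Input distribution has shifted; investigate the data or schedule retraining.")
```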

Transparency

  • The most common issue I find is the lack of model transparency. It is often very difficult to make definitive statements about how well a model is going to generalize in new environments. You often have to ask, “What are the modes of failure, and how do we fix them?”
  • ML is a black box for most people. Developers like to go through the code to figure out how things work. Customers who instrument code with tracing before and after ML decision making can observe program flow around those functions and learn to trust them (a minimal tracing wrapper is sketched after this list). Are decisions made in a deterministic way? Machine-based tools can also mess with code, for example automated coding tools like Kite injecting tracking code; treat machine-generated code like any other code and audit it as part of the process. 
  • As with any AI/ML deployment, the “one-size-fits-all” notion does not apply, and there is no magical “out of the box” solution. Specific products and scenarios will require specialized supervision and custom fine-tuning of tools and techniques. Additionally, when ML models use unsupervised and closed-loop techniques, the goal is that the tooling will auto-detect and self-correct. However, we have found AI/ML models can be biased. Sometimes the system may be overly conservative in trying to optimize for error handling and error correction, in which case the performance of the product can take a hit. The tendency for certain conservative algorithms to over-correct on specific aspects of the SDLC is an area where organizations will need better supervision.
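
The tracing idea above can be as simple as a wrapper that logs what goes into and comes out of each model call. A minimal sketch, assuming a scikit-learn-style model and a flat feature dictionary (both hypothetical):

```python
# Log the inputs, output, and latency of an ML decision so it can be audited later.
import json
import logging
import time
import uuid

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ml_trace")

def traced_predict(model, features: dict):
    trace_id = uuid.uuid4().hex
    log.info("before_predict %s", json.dumps({"trace_id": trace_id, "features": features}))
    start = time.perf_counter()
    decision = model.predict([list(features.values())])[0]  # assumes a scikit-learn-style API
    latency_ms = (time.perf_counter() - start) * 1000
    log.info("after_predict %s", json.dumps(
        {"trace_id": trace_id, "decision": str(decision), "latency_ms": round(latency_ms, 2)}))
    return decision
```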

Manpower

  • Having data, and being able to use it in a way that does not introduce bias into the model. Organizations need to change how they think about software development and how they collect and use data, and to make sure they have enough of the right skillsets in the organization. More software developers are coming out of school with ML knowledge; give them the opportunity to plan and prototype ideas. 
  • When you use a tool based on ML, you have to take into account the accuracy of the tool and weigh the trust you put in it against the effort required if you miss something. When you are using a technology based on statistics, a problem can take a long time, perhaps two weeks, to detect and fix. It requires training and dealing with a black box. Building software with ML takes manpower and time to train, and retaining talent is a challenge. How do you test something that has statistical elements in it? You need to take different approaches to testing products with AI (a threshold-based test is sketched after this list). 
  • This is still a new space. There are always innovators with the skills to pick up these new technologies and techniques and create value, but companies using ML have to do a lot of self-help because the ecosystem is not built out; you will need to figure out how to get the work done and get value. Talent is a big issue. The second is training data sets: we need good training data to teach the model, and the value is in the training data sets over time. The third is data availability and the amount of time it takes to get a data set. It takes a Fortune 500 company one month to get a data set to a data scientist. That's a lot of inefficiency, and it hurts the speed of innovation.
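
Testing a statistical component usually means asserting aggregate quality on a fixed, version-controlled holdout set rather than exact outputs. A minimal sketch in pytest style; the file paths, model artifact, and accuracy floor are all hypothetical:

```python
# Regression test for a model: fail the build if accuracy on a fixed holdout set drops.
import joblib
import numpy as np

ACCURACY_FLOOR = 0.92  # hypothetical threshold agreed with the product team

def test_model_meets_accuracy_floor():
    X_holdout = np.load("tests/data/holdout_features.npy")
    y_holdout = np.load("tests/data/holdout_labels.npy")
    model = joblib.load("models/current_model.joblib")
    accuracy = float((model.predict(X_holdout) == y_holdout).mean())
    assert accuracy >= ACCURACY_FLOOR, f"accuracy regressed to {accuracy:.3f}"
```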

Other

  • The most common issue by far with ML is people using it where it doesn’t belong. Every time there’s some new innovation in ML, you see overzealous engineers trying to use it where it’s not really necessary. This used to happen a lot with deep learning and neural networks. Just because you can solve a problem with complex ML doesn’t mean you should.
  • We have to constantly explain that things not possible 20 years ago are now possible. You have to gain trust, try it, and see that it works.
  • If you have not done this before, it requires a lot of preparation. You pull historical data to train the model, but then you need a different preparation step on the deployment side. This is a major issue typical implementations run into. The solution is tooling that manages both sides of the equation (the first sketch after this list shows one way to share the preparation step). 
  • Traceability and reproduction of results are two main issues. For example, an experiment will have results for one scenario, and as things change during the experimentation process it becomes harder to reproduce the same results. Version control around the specific data used, the specific model, and its parameters and hyperparameters is critical when mapping an experiment to its results (the second sketch after this list shows one way to record run metadata). Often organizations are running different models on different data with constantly updated parameters, which inhibits accurate and effective performance monitoring. Focusing on the wrong metrics and over-engineering the solution are also problems when leveraging machine learning in the software development lifecycle. The best approach we’ve found is to simplify a need to its most basic construct and evaluate performance and metrics before further applying ML.
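
One common way to keep the training-side and deployment-side preparation steps from diverging is to fit the preprocessing inside the model pipeline and ship both as a single artifact. A minimal sketch with scikit-learn; the file names and choice of model are hypothetical:

```python
# Fit preprocessing and model together so serving replays the exact training preparation.
import joblib
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Training side: the scaler is fitted as part of the pipeline and saved with the model.
train = pd.read_csv("historical_training_data.csv")
pipeline = make_pipeline(StandardScaler(), LogisticRegression())
pipeline.fit(train.drop(columns=["label"]), train["label"])
joblib.dump(pipeline, "model_with_preprocessing.joblib")

# Deployment side: loading the artifact applies the same preparation step automatically.
served = joblib.load("model_with_preprocessing.joblib")
predictions = served.predict(pd.read_csv("incoming_batch.csv"))
```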
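Traceability starts with recording, for every run, exactly which data, parameters, and metrics went together. Dedicated tools such as MLflow or DVC cover this ground in practice; the sketch below only shows the idea, with hypothetical file names and parameters:

```python
# Append one record per experiment run so results can be mapped back to their inputs.
import hashlib
import json
import time

def fingerprint(path: str) -> str:
    """Hash the training file so a result is tied to the exact data that produced it."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:12]

run_record = {
    "run_id": time.strftime("%Y%m%dT%H%M%S"),
    "data_file": "historical_training_data.csv",
    "data_sha256": fingerprint("historical_training_data.csv"),
    "params": {"model": "LogisticRegression", "C": 1.0, "max_iter": 200},
    "metrics": {"accuracy": 0.93},  # filled in after evaluation
}
with open("runs.jsonl", "a") as f:
    f.write(json.dumps(run_record) + "\n")
```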

Here’s who we heard from: