Ten Things to Know About Your AI Solutions

By George Chamoun

The quality of an AI solution is frequently thought of in terms of its predictive performance. If we can get good, generalizable results on validation and test datasets, then we have satisfied the requirements for deploying an AI system, right? Not necessarily. Recently, the bar for AI solutions has been raised to include a broader and more holistic set of desirable characteristics that must be well understood before we’re ready to bring them to life.

These characteristics stem from the knowledge that AI is already becoming, and will continue to become, a more powerful and pervasive agent in our everyday lives. Things that seemed like mere science fiction a few years ago are now everyday phenomena, and governments and organizations are already working to understand how to prepare the workforce (and society) for major disruptions resulting from increasingly powerful AI technology.

As Data Scientists, we have an important role to play in this dynamic. To say the least, our responsibilities have changed. Now, we must take center stage in the entire ecosystem of AI development. There are questions that we are equipped to answer that will provide immense value to our organizations, helping ensure that the AI solutions we contribute to have a positive societal impact.

In this blog, we go over 10 of the most important things to know about your AI solutions. Having these answers readily available, and in a constant state of review and revision, will prepare you to build responsible and successful AI solutions and play your part in creating a future where AI is trusted and not feared.

1. The Detailed Use Case

Understanding and defining a clear use case for your AI solution is absolutely essential. Most other ‘must-know’ information will stem from the use case; if you lack a clear understanding of it, you will not be able to answer questions about users, risks, potential value, and more. You will also have a harder time selecting high-quality, meaningful metrics, KPIs, and other quantitative measures, all essential elements of a successful strategy.

When documenting your use case, questions to ask include: How, and for what purpose, is the AI solution used? Is it replacing an existing process, or is it doing something that wasn’t previously possible? What does the output look like: is it text, a classification, a probability? How does it run, how is it used, and is it part of a larger product offering?
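
One lightweight way to keep these answers documented and reviewable is to capture them in a structured, version-controlled record. Below is a minimal sketch in Python; the field names and example values are illustrative assumptions, not a standard:

```python
from dataclasses import dataclass

@dataclass
class UseCaseRecord:
    """A living record of the use case, revisited as the solution evolves."""
    name: str
    purpose: str                     # how and for what purpose the solution is used
    replaces_existing_process: bool  # or enables something previously impossible
    output_type: str                 # e.g. "text", "classification", "probability"
    runtime_context: str             # how it runs; part of a larger product offering?
    last_reviewed: str               # keep this fresh: the use case is never "done"

# Hypothetical example, loosely based on the hiring scenario discussed below
record = UseCaseRecord(
    name="resume-matcher",
    purpose="Rank candidate resumes against open job requisitions",
    replaces_existing_process=True,
    output_type="probability",
    runtime_context="Batch scoring inside a larger recruiting platform",
    last_reviewed="2024-06-01",
)
```

Checking a record like this into version control alongside the model code makes review and revision part of the normal development workflow rather than an afterthought.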

2. The Intended Users and Impacted Parties

With a well-defined use case, determining the intended users should be fairly straightforward. However, it is worth taking a bit of extra time to perform an exercise to make sure you don’t leave any potential users out.

Answering the user question is helpful because, as you learn more about who might use your tool, you simultaneously gain a more complete understanding of your use case. This is why it is important to go beyond just the intended users and take a deep dive into who the impacted parties of the AI’s decisions will be. Many times, it is the impacted parties, not the users, who carry the most weight when making choices about our AI solution. When performing your exercise, do not be afraid to trace multiple layers out to see how AI decisions impact people.

A great example of why we need to look at AI users and impacted parties across multiple layers of separation exists in AI hiring. The job seeker uploads their resume, and an AI system then performs matching for recruiters or employers. The job seeker is not using the AI in this case, but they are impacted by the decisions it makes. If someone is denied a job based on an AI system’s output, they are affected, and so is their family. If this person is unable to find work due to systematic and universal algorithmic bias, they may face financial hardship. Downstream effects could include defaults on loan payments, which impact the banks that underwrote the loans.

From this example, we can see how adverse impacts stemming from AI can affect multiple parties. The deeper we look, the clearer it becomes that the impact is larger than we initially thought. Having this understanding is imperative when making critical decisions about how to train, test, and deploy your AI system.

3. The Do’s and Don’ts

Pulling another layer back from the use case, we must dive deep into how this tool should, and should not, be used. This means defining a clear set of intended and unintended uses of the AI system. These are helpful not only for limiting liability for your organization, but also for ensuring that users and impacted parties of your AI solution are protected from adverse outcomes due to unintentional misuse or deliberate abuse.

4. The Anticipated Value, Outcomes, and Risks

Understanding the anticipated value, expected outcomes, and risks of your AI solutions will bring transparency and accountability to your decisions. Having these items documented will give your team the ability to make informed decisions when faced with difficult choices. Furthermore, the process of defining these items will generally involve a cross-functional team, including members of other business units or users whose perspectives and backgrounds differ from those of the core team. Bringing diversity into the decision process will have a positive impact on your organization as a whole.

It is extremely important to be as honest as possible with yourselves during this stage, especially because discussions around ROI and value can create internal tensions for high-value, high-risk solutions, and external-facing conversations about risks, biases, and adverse impacts (including discrimination) can be extremely uncomfortable for many. However, you cannot let this deter you.

Imagine, for example, a scenario in which financial rewards are weighed against adverse impacts on various vulnerable populations within the United States. These types of scenarios have come up time and time again in credit rating, insurance, and loan approval AI systems. It is not science fiction; it is reality. As hard as it may be, it is important as a Data Scientist that you participate in these discussions so that you can stand up for what is right and have a positive impact on your organization and the parties impacted by your AI solution.

5. Fairness Metrics, KPIs, and Performance Metrics

Leveraging metrics and KPIs can guide how we measure success, bias, and failure. In principle, metrics and KPIs can bring alignment, accountability, and transparency to goals. However, in practice, KPIs need to be developed carefully to ensure that they are indeed helping the organization deliver on its longer-term strategic roadmap. Poorly conceived metrics can result in increased risks, wasted effort, and undesired outcomes.

As Data Scientists, this is our time to shine. We can use our broad understanding of statistics and map it to our knowledge of the AI use case (including the users and impacted parties, the anticipated value, outcomes, and risks) to develop high-quality metrics and KPIs that correlate with our desired outcomes. Having these KPIs defined and tracked as early as possible will help increase alignment and team morale around building and delivering a successful and responsible AI solution.
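
As a concrete illustration, one commonly used fairness check is demographic parity: comparing positive-prediction rates across groups defined by a protected attribute. A minimal sketch with pandas; the column names and data are made up for the example:

```python
import pandas as pd

def demographic_parity_gap(df: pd.DataFrame, group_col: str, pred_col: str) -> float:
    """Difference between the highest and lowest positive-prediction
    rates across groups; 0.0 means perfectly equal selection rates."""
    rates = df.groupby(group_col)[pred_col].mean()
    return float(rates.max() - rates.min())

# Hypothetical scored data: 1 = positive decision (e.g. resume advanced)
scores = pd.DataFrame({
    "group": ["A", "A", "B", "B", "B"],
    "prediction": [1, 0, 0, 0, 1],
})
print(demographic_parity_gap(scores, "group", "prediction"))  # 0.5 - 1/3 ≈ 0.167
```

Demographic parity is only one of several fairness definitions, and the right one depends on the use case, users, and impacted parties identified earlier; what matters is picking a definition deliberately and tracking it from the start.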

6. Data Lineage and Governance

Garbage in, garbage out, as they say. Without high-quality data, it is impossible to build a useful model. Beyond spending time cleaning data to improve model performance and generalizability, taking the time to document the data’s lineage and perform data governance will ensure lasting success for your organization. Data maturity is an organizational asset and is worth the time invested in it. The subject of data governance is broad enough to fill a series of blogs, so in the interest of time, here are a few non-exhaustive aspects to consider:

  • The data is ethically and legally sourced, including consent, copyright, etc.
  • The data is high quality and representative.
  • The data is sufficiently diverse in terms of training examples.
  • The data will be available when applying our models in production.
  • Privacy of individuals included in the data is sufficiently protected.

These data governance suggestions are just the start of what should be a major focus of all data-enabled organizations.
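
To make a check like “the data is representative” concrete, one lightweight approach is to compare group proportions in the training data against a reference population. A sketch under assumed reference figures; real reference distributions and tolerances should come from your governance process:

```python
import pandas as pd

def representation_report(df: pd.DataFrame, col: str, reference: dict,
                          tolerance: float = 0.05) -> None:
    """Flag groups whose share of the training data deviates from the
    reference population by more than `tolerance` (absolute difference)."""
    observed = df[col].value_counts(normalize=True)
    for group, expected in reference.items():
        share = observed.get(group, 0.0)
        status = "OK" if abs(share - expected) <= tolerance else "UNDER/OVER-REPRESENTED"
        print(f"{group}: observed={share:.2f} expected={expected:.2f} -> {status}")

# Hypothetical training rows vs. an assumed population split
train = pd.DataFrame({"region": ["north"] * 80 + ["south"] * 20})
representation_report(train, "region", {"north": 0.6, "south": 0.4})
```

Run regularly (not just once at project kickoff), a report like this also doubles as lineage documentation: it records what the data looked like at each point in time.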

7. Experimentation Process and Methods

Today’s frontier models are increasingly complex: trained on vast amounts of data and built with sophisticated algorithms. Desired model outputs have become more open-ended, using concepts such as few-shot and one-shot learning to produce high-quality multi-modal responses. Needless to say, the state-of-the-art systems in which models are developed, trained, tested, and deployed have uncountable degrees of freedom.

A whole set of technologies, such as vector databases and transformer libraries, that were confined to the cutting-edge research realm just a few years ago are now must-have parts of an enterprise AI technology stack. Concepts such as fine-tuning, retrieval-augmented generation (RAG), and attention are now part of everyday conversations within the enterprise about how to best leverage pre-trained, generalizable models.

Given this highly fluid technical ecosystem, it is imperative to have a solid and well-documented plan for how models are trained and tested to deliver the best outputs for your use case. This is no longer a point of academic discussion but rather one of survival. In this day and age, no one individual or team can keep up with the pace of innovation or control an entire process internally. There are just too many sources of bias, risk, and data contamination; being as transparent as possible is nearly the only cost-effective way to ensure quality.
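
In practice, a well-documented experimentation plan starts with recording, for every training run, exactly what was done and on what data. A minimal, framework-agnostic sketch; dedicated experiment trackers provide far more, and the fields here are illustrative:

```python
import hashlib
import json
import time
from pathlib import Path

def log_run(config: dict, data_path: str, metrics: dict, out_dir: str = "runs") -> Path:
    """Persist the config, a fingerprint of the training data, and the
    resulting metrics so any run can be audited or reproduced later."""
    data_hash = hashlib.sha256(Path(data_path).read_bytes()).hexdigest()[:12]
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "config": config,          # hyperparameters, random seed, model version
        "data_sha256": data_hash,  # detects silent changes to the training data
        "metrics": metrics,
    }
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / f"run_{record['timestamp'].replace(':', '-')}.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```

Even this bare-bones version answers the questions that matter most when something goes wrong later: which data, which settings, and which results belong to the model in question.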

8. Final Testing and Model Selection Process

After experimentation is done, it’s time to decide which version of the model will be used in production. Having clear, well-defined, and objective model selection criteria is the last layer of defense before putting a model out in the wild. This is your last chance to limit adverse impacts, so take it very seriously. If you have taken the time to define, document, and understand items 1-7, put everything you know into this step to ensure your hard work does not go to waste.
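
One way to make selection criteria objective is to encode them as explicit release gates that every candidate model must pass, combining the performance and fairness thresholds defined earlier. The thresholds and metric names below are placeholders, not recommendations:

```python
def passes_release_gates(metrics: dict) -> bool:
    """Return True only if the candidate clears every pre-agreed gate.
    Thresholds should come from items 1-7, decided before final testing,
    not from whatever the best candidate happens to score."""
    gates = {
        "test_accuracy":          metrics["test_accuracy"] >= 0.90,
        "demographic_parity_gap": metrics["demographic_parity_gap"] <= 0.05,
        "latency_p95_ms":         metrics["latency_p95_ms"] <= 200,
    }
    for name, ok in gates.items():
        print(f"{name}: {'PASS' if ok else 'FAIL'}")
    return all(gates.values())

candidate = {"test_accuracy": 0.93, "demographic_parity_gap": 0.08, "latency_p95_ms": 150}
print(passes_release_gates(candidate))  # False: the fairness gate fails
```

The point of the gate structure is that a model which excels on one criterion cannot buy its way past a failure on another.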

9. Model Deployment and Change Control Process

When deploying any software, not just AI, having a deployment and change control process helps reduce the risks of software bugs, unintended consequences, and adverse impacts. This is especially true for AI, given the number of unknowns. Having the ability to deploy new models to select parts of your user base, roll back easily, and correct errors is the only way your organization will be able to guarantee that you are meeting your standards for quality and responsible AI.
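
A common pattern for deploying new models to select parts of your user base is a canary rollout: route a small, deterministic slice of traffic to the new model while keeping the old one as an instant fallback. A simplified sketch; the model names and fraction are illustrative:

```python
import hashlib

CANARY_FRACTION = 0.05  # start small; widen only as quality metrics hold up

def route_model(user_id: str) -> str:
    """Deterministically assign each user to the canary or stable model,
    so the same user always sees the same version during the rollout."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "model_v2_canary" if bucket < CANARY_FRACTION * 100 else "model_v1_stable"

# Rolling back is then a one-line change: set CANARY_FRACTION = 0.0
print(route_model("user-42"))
```

Hashing the user ID, rather than picking randomly per request, keeps each user’s experience consistent and makes any issues easier to trace back to the canary population.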

10. Post-Deployment Monitoring and Retirement Process

When a model is deployed, its journey has just begun. Every experienced Data Scientist knows that production is a whole different ball game. Many AI solutions with excellent performance in testing and validation don’t generalize to production data. This could be due to false assumptions about what data is actually available in production, poor collection of training and validation data (see #6), or a variety of other unknown or undetectable reasons.

Furthermore, having a good understanding of when covariate shift is affecting model performance, and when you are just seeing natural cycles in the data (e.g., seasonality, economic trends), will ensure that you don’t retire models too early or too late. There are various tools and tests you can use on a regular basis to evaluate real-time model performance and covariate shift.
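
For a numeric feature, one widely used test for covariate shift is the two-sample Kolmogorov-Smirnov test, comparing the training distribution against a recent production window. A sketch with SciPy on synthetic data:

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)  # reference distribution
prod_feature = rng.normal(loc=0.3, scale=1.0, size=1_000)   # recent production window

stat, p_value = ks_2samp(train_feature, prod_feature)
if p_value < 0.01:
    print(f"Possible covariate shift (KS={stat:.3f}, p={p_value:.2g}) - investigate")
else:
    print("No significant shift detected in this window")
```

Note that a significant result on its own does not distinguish true drift from seasonality; comparing the production window against the same window from a prior cycle helps separate the two.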

Additionally, when (not if) your models exhibit biases or provide users with undesired responses on certain subsets of input, it is important to have a plan for remediating these errors. This could mean instituting a manual process as a backup for when your AI fails, or figuring out how to disable certain types of output. Putting software guardrails in place, such as LLM guardrail tools, firewalls, or custom-built input/response classifiers, will help ensure your algorithms behave as expected. Manually checking a certain percentage of responses and ensuring user-based error reporting is in place will increase the likelihood that you catch and fix issues with your models in a timely fashion.
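
A custom-built response check can be as simple as a wrapper that screens outputs before they reach the user, falls back to a safe reply on failure, and samples a small percentage of traffic for manual review. A toy sketch; the blocklist here stands in for a real response classifier:

```python
import random

BLOCKLIST = ("ssn", "credit card")  # stand-in for a trained response classifier
REVIEW_SAMPLE_RATE = 0.02           # fraction of responses sent for manual review

def guarded_response(model_output: str, review_queue: list) -> str:
    """Screen a model response; fall back when the check fails and
    sample a small percentage of traffic for human spot-checks."""
    if any(term in model_output.lower() for term in BLOCKLIST):
        return "Sorry, I can't help with that."  # safe fallback response
    if random.random() < REVIEW_SAMPLE_RATE:
        review_queue.append(model_output)        # queued for human review
    return model_output

queue: list = []
print(guarded_response("Your SSN is ...", queue))  # triggers the fallback
```

In a real system the check would be a proper classifier and the review queue a ticketing or labeling workflow, but the structure, screen, fall back, and sample for humans, stays the same.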

Conclusion

AI is a disruptive technology with the potential to bestow unprecedented progress upon humanity. However, as with other disruptive advancements like the printing press, electricity, and the internet, there are extreme risks of regression if the technology is not used responsibly.

Today, the AI-enabled world is moving faster than most of us can handle. Content, both high-quality and low-quality, is being generated faster than we can consume it. Every day, it can feel like a challenge not to be swept away by the wave of new ‘information’ being generated and published. It’s tempting to rush to stay ahead at all costs. However, history tells us this will not pay off in the long run. Progress in innovation and human knowledge is not guaranteed, and skimming along the surface will not provide the advancements we desire in the long term.

As Data Scientists, we are equipped to handle these unprecedented times because we sit at the center of all AI-related interactions within our organizations. If we take the extra time to understand our AI solutions inside and out, we will be better prepared to build AI that upholds ethical principles, performs well, and brings value to our organizations, our users, and all impacted parties.

Championing these 10 must-know attributes for our AI solutions is just the starting point of creating an organizational culture that will breed AI success and responsibility. No solution is too small, and it’s never too late to start. Now is the time to make the commitment to help our organizations reach even greater levels of understanding so that we can leverage AI to have a positive impact on society. 
