Long before we found ourselves in the current AI frenzy, analytics was already doing something we now blame AI algorithms for. Analytics was biased. Analytics discriminated. And it was public knowledge then as well.
Hold your horses before you conclude that what analytics was doing was despicable. Bias and discrimination need not always be bad. The very nature of analytics forces it to judge the data. In most cases, it needs to flag some category as undesirable, something to be avoided or eliminated. The problem arises when that undesirable category is based on aspects like race.
Consider optimization problems, which have been around since long before the AI hype swamped us. In the optimization modeling world, there is a saying:
“The Model will sell its grandma for a penny.”
This means that if a model’s objective is cost minimization, it will do anything it can to minimize cost. It will favor one decision over another even if doing so saves a single penny. You control this crazy behavior through constraints and the like to ensure realistic results. But this is an example of bias. In this case, the bias is not harmful to the organization and can be managed through other methods.
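As a concrete illustration, here is a minimal sketch of that saying, using entirely hypothetical numbers and scipy’s linear programming solver. Without a constraint, the model routes all purchasing to whichever supplier is a penny cheaper; a reliability constraint reins it in.

```python
# A minimal sketch (hypothetical numbers) of "the model will sell its
# grandma for a penny": a cost-minimizing LP sources everything from the
# cheaper supplier until a constraint forces realistic behavior.
from scipy.optimize import linprog

cost = [1.00, 1.01]              # supplier A is one cent cheaper per unit
A_eq, b_eq = [[1, 1]], [100]     # must buy 100 units in total

# Unconstrained: the model buys everything from A to save a penny per unit.
naive = linprog(c=cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2)
print(naive.x)                   # -> [100., 0.]

# Add a reliability constraint (A = 0.60, B = 0.99, average >= 0.90),
# written as -0.60*xA - 0.99*xB <= -90 for linprog's <= convention.
guarded = linprog(c=cost, A_ub=[[-0.60, -0.99]], b_ub=[-90],
                  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 2)
print(guarded.x)                 # -> roughly [23.1, 76.9]
```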
But bias has not always been this benign. Has any credit scoring agency, since its inception, ever considered someone’s socioeconomic circumstances before downgrading their credit score? Has that practice been questioned with the same zeal and enthusiasm? The way these algorithms are designed, eliminating the variables that introduce bias is currently not possible. Recalibrating them to make them fairer, however, is not impossible.
How do you think credit card APRs and mortgage rates are calculated? While race may not be an explicit variable, it still gets baked into these algorithms indirectly through proxies such as income and zip code. Yet without incorporating those variables, the lending organization increases its risk exposure.
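A hedged illustration of that leakage, on synthetic data with made-up coefficients: the model below never sees the protected attribute, yet because zip-code risk and income are correlated with it, the model’s risk estimates still split by group.

```python
# Synthetic demonstration of a proxy variable: `group` is never a feature,
# but zip-level risk and income carry its signal into the predictions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                   # protected attribute, never used as a feature
zip_risk = group * 0.8 + rng.normal(0, 0.5, n)  # zip-level risk correlated with group
income = rng.normal(50 - 10 * group, 10, n)     # income also correlated with group
default = (zip_risk - 0.03 * income + rng.normal(0, 1, n) > 0).astype(int)

X = np.column_stack([zip_risk, income])
model = LogisticRegression().fit(X, default)
p = model.predict_proba(X)[:, 1]

# The model never saw `group`, but its risk estimates differ by it anyway.
print(f"mean predicted risk, group 0: {p[group == 0].mean():.2f}")
print(f"mean predicted risk, group 1: {p[group == 1].mean():.2f}")
```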
If the NSA were to design an algorithm to identify potential lone wolves plotting a terror attack on US soil, what do you think would be one of the variables leveraged to zero in on the suspect pool? Unfortunate, but reality.
So why this heightened attention on bias and ethics for AI algorithms? Of course, to get more coverage; anything associated with AI gets more attention. But I have rarely seen perspectives on how the bias embedded in the data these algorithms are trained on can be minimized (full elimination is not realistic). The bias has always been embedded in the input data, irrespective of whether the algorithm was AI-based.
Design remains a key tool for controlling bias and risk. However, as highlighted above, some types of bias are embedded in the training data. These biases can be corrected through design, but beyond a point the correction comes at the expense of objectives like risk minimization or profit maximization.
Data blinding is a design approach that can work, but only when it eliminates the bias without hampering the efficacy of the algorithm itself. In simple terms, the blinding approach evaluates an algorithm by hiding from it the variables that introduce the bias. If performance does not suffer, the biased variable can be excluded.
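A minimal sketch of that blinding test, on synthetic data with hypothetical feature names: train once with the suspect variable and once blinded to it, and compare cross-validated accuracy. If the gap is within noise, the variable was adding bias, not signal.

```python
# Blinding test sketch: compare model performance with and without the
# variable suspected of introducing bias (all data here is synthetic).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 5_000
sensitive = rng.integers(0, 2, n)      # variable suspected of introducing bias
legit = rng.normal(0, 1, (n, 3))       # legitimate predictors
y = (legit @ [1.0, -0.5, 0.8] + rng.normal(0, 1, n) > 0).astype(int)

full = np.column_stack([legit, sensitive])
blinded = legit                        # same data, sensitive column removed

acc_full = cross_val_score(LogisticRegression(), full, y, cv=5).mean()
acc_blind = cross_val_score(LogisticRegression(), blinded, y, cv=5).mean()

print(f"with sensitive variable:   {acc_full:.3f}")
print(f"blinded to sensitive var.: {acc_blind:.3f}")
```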
But what if performance does suffer? There is a way around that as well: you can synthesize a certain percentage of the input data in a specific way. That will still hurt predictive power somewhat, but it helps reduce the bias.
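One plausible recipe for that synthesis (an assumption on my part, not a prescribed method): shuffle the proxy column across a random fraction of rows. This keeps the column’s overall distribution intact while weakening its correlation with the protected group, trading a little predictive power for less bias.

```python
# Partial-synthesis sketch: permute a proxy column for `frac` of the rows.
# The function name and recipe are illustrative, not a standard API.
import numpy as np

def partially_synthesize(X, proxy_col, frac=0.2, seed=0):
    """Shuffle `proxy_col` across a random `frac` of rows, preserving the
    column's marginal distribution while breaking its link to the group
    for those rows."""
    rng = np.random.default_rng(seed)
    X_syn = X.copy()
    idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
    X_syn[idx, proxy_col] = rng.permutation(X_syn[idx, proxy_col])
    return X_syn
```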
AI algorithms are not the only biased ones; we have been using biased algorithms for a while now. Rather than trying to cash in on the AI wave by harping on bias in AI algorithms, we need to recognize the fundamental aspects that lead to biased input data across all analytics algorithms, and then work genuinely towards minimizing or eliminating those biases.

