While Artificial Intelligence and Machine Learning-powered systems are becoming an essential part of the decision-making process, policymakers are still concerned about the biases these systems might encode. For NBFCs, it has become of utmost importance to make their machine learning models bias-free, as biased systems could cost them both customers and revenue.
Lendingkart, a provider of MSME lending in India, is on a similar road to making its systems and models fair.
“When I joined here, everybody had a sense of what the model was doing. They knew that if a customer had more bounced cheques in the bank statement, he would get a higher risk rating. So interpretation was available, but it was anecdotal. Since we are a growing business, for every rupee that I don’t lend, the revenue team comes back and questions why that happened,” said Abhishek Singh, Chief Analytics Officer at Lendingkart.
A slightly higher risk rating meant lower exposure to that particular customer. The revenue and business teams at Lendingkart would come back to the IT teams and question why a particular borrower was rated B and not A, and therefore approved for a certain amount.
Interpreting the data was easy because the company could see each acceptance or rejection, but explainability was a challenge.
“What we have done over a period of time is create a highly explainable ML model. We have transparently created features to ensure that we are able to fully explain which components are driving the lending decision. The second thing we have done is assign codes to the risk rating of a particular customer. So we know that this particular score is because variables 1, 2, and 3 have these values,” Singh explained.
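The approach Singh describes, tracing a score back to the variables that produced it, can be sketched with a simple additive model. A minimal illustration, assuming a linear risk model; the feature names and weights below are hypothetical, not Lendingkart's actual model:

```python
# Sketch of tracing a risk score back to the variables that produced it,
# assuming an additive (linear) model. Feature names and weights are
# hypothetical illustrations, not Lendingkart's actual model.

FEATURE_WEIGHTS = {
    "bounced_cheques": 0.8,       # more bounced cheques -> higher risk
    "avg_monthly_balance": -0.3,  # healthier balance -> lower risk
    "years_in_business": -0.2,    # longer track record -> lower risk
}

def risk_score_with_reasons(features, top_n=2):
    """Return the raw risk score and the top_n features pushing it up."""
    contributions = {
        name: FEATURE_WEIGHTS[name] * value
        for name, value in features.items()
        if name in FEATURE_WEIGHTS
    }
    score = sum(contributions.values())
    # The codes attached to a score are the features contributing most
    # towards higher risk, in descending order of contribution
    reasons = sorted(contributions, key=contributions.get, reverse=True)[:top_n]
    return score, reasons

score, reasons = risk_score_with_reasons(
    {"bounced_cheques": 3, "avg_monthly_balance": 2.0, "years_in_business": 5}
)
```

Because each feature's contribution is additive, the same breakdown that produces the score also answers "why was this borrower rated B and not A", which is the explainability the business teams were asking for.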
Singh believes that the foundation of a fair ML system is consistency. The stability of the model has to be crystal clear to have predictable outcomes. Biases in machine learning systems could cost a company entire new segments of customers, and bias can also creep in as the business expands: the moment bias enters the decision-making process, the new incoming population starts to suffer.
Explaining with an example, Singh said, “In the initial period, the model receives inputs for a particular ‘x’ variable, on which it is trained to provide an output. As the business expands, there will be more inputs from variables ‘y’ and ‘z’ and less from ‘x’, which might result in rejections or negative results. The business then misses the upcoming ‘y’ and ‘z’ customer profiles, which are equally worthy borrowers as ‘x’, but the model might fail to capture them due to its bias towards ‘x’.”
Lendingkart is taking the following steps to make its systems bias-free:
1. Monitoring the data and systems regularly
Once Lendingkart has produced an output, either an acceptance or a rejection for a particular borrower, it continues to monitor the results. “We use the data which is coming from the credit bureau to look at both acceptances and rejections to figure out if there are any false positives or false negatives that the model has identified. If so, how do I continue to keep improving the performance to ensure that I do not miss out on customers whom I should have approved, and at the same time do not approve those who perform badly in my books or any other books? So it needs continuous review using data which is available through the credit bureau,” Singh explained.
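The review loop Singh describes amounts to tallying four decision outcomes against the borrower's later performance. A minimal sketch, where "positive" means the model approved the borrower; the `approved` and `defaulted` field names are hypothetical, not a real bureau schema:

```python
# Sketch of reviewing past lending decisions against later outcomes
# observed in credit bureau data. Field names ('approved', 'defaulted')
# are hypothetical, not an actual bureau schema.

def review_decisions(records):
    """Tally decision outcomes; 'positive' = the model approved."""
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for r in records:
        if r["approved"] and r["defaulted"]:
            counts["false_positive"] += 1   # approved, but performed badly
        elif not r["approved"] and not r["defaulted"]:
            counts["false_negative"] += 1   # rejected, but would have performed well
        elif r["approved"]:
            counts["true_positive"] += 1    # approved and performed well
        else:
            counts["true_negative"] += 1    # rejected and did perform badly
    return counts

sample = [
    {"approved": True, "defaulted": False},   # good approval
    {"approved": True, "defaulted": True},    # approved, performed badly
    {"approved": False, "defaulted": False},  # missed a worthy borrower
    {"approved": False, "defaulted": True},   # correct rejection
]
counts = review_decisions(sample)
```

Rising false-negative counts flag the missed-worthy-borrower problem Singh mentions, while rising false positives flag borrowers who should not have been approved.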
2. Actively looking at bias
Singh said that whenever the company created a model, it made sure to eliminate all kinds of bias.
Citing an example, he said, “Back in 2016, when we had launched the first model, we were only giving 12-month loans. Cut to 2018, when we started giving 24-month or 36-month loans. That meant there was an inherent bias in my performance data availability, because the entire performance data was available only for 10-12 months of tenure.”
To make the systems fair and bias-free, the organisation took a call and went to the credit bureau to figure out the performance of 24-36-month loans and tried to build it back into its model.
“But more importantly, as I started sourcing this particular population of 24-36 months on a continuous basis, we tried to evaluate whether the model using the bureau data was making the same evaluations. We kept on tweaking the model, and the latest model is now able to properly predict across all tenures and is performing well,” Singh added.
3. Matching the characteristics of training data to incoming data
The distribution of characteristics in the training data should not be too different from that of the incoming population. The moment the incoming population starts deviating from the basic characteristics of the training data, problems arise, and it is high time to modify and tweak the model.
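One common way to quantify this kind of deviation is the Population Stability Index (PSI), which compares the binned distribution of a feature at training time with the incoming population. A minimal sketch; the example bins and the rule-of-thumb thresholds are illustrative, not Lendingkart's actual monitoring setup:

```python
import math

# Sketch of checking whether incoming data has drifted from the training
# distribution, using the Population Stability Index (PSI) over binned
# feature values. Bins and thresholds here are illustrative assumptions.

def psi(train_counts, incoming_counts):
    """PSI between two binned distributions (bins in the same order)."""
    t_total = sum(train_counts)
    i_total = sum(incoming_counts)
    value = 0.0
    for t, i in zip(train_counts, incoming_counts):
        # A small floor avoids division by zero for empty bins
        p = max(t / t_total, 1e-6)
        q = max(i / i_total, 1e-6)
        value += (q - p) * math.log(q / p)
    return value

# Rule of thumb often quoted for PSI: < 0.1 stable, 0.1-0.25 moderate
# shift, > 0.25 significant shift warranting a model review.
drift = psi([40, 30, 20, 10], [10, 20, 30, 40])
```

When a monitored feature's PSI crosses the review threshold, the incoming population has deviated from the training distribution, which is exactly the signal that it is time to retrain or tweak the model.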