The Role of Data Science in Short-Term Lending

A significant portion of the U.S. population needs short-term credit. In fact, short-term loans, often called “payday loans,” are used by over ten million households every year. What drives this need? According to a 2012 study from The Pew Charitable Trusts, 69% of borrowers use these loans simply to manage recurring expenses. An additional 16% use short-term credit to help manage unexpected or emergency expenses, and only 8% use it to buy something special.

The overwhelming demand for short-term loans is not driven by luxury spending; it represents a very clear need for many people. And yet, there are not many options for people in this position. In the absence of payday loans, borrowers are more likely to cut back on expenses (81%), delay paying bills (62%), borrow from friends or family (57%) or sell personal possessions (57%) than they are to seek a short-term loan from a bank or credit union.

These choices are at least in part the result of a lack of fast, short-term credit available at traditional financial institutions. Products have been slow to materialize because banks and credit unions fear that providing short-term loans will damage their reputation and drain resources. They also fear that changing regulations may be hard to manage. The most prominent fear, however, appears to be the financial uncertainty associated with the unknown performance of these loans.

For those institutions that can overcome these fears, there is the very real potential to not only help a disadvantaged sector of their communities, but also build a new, sustainable business. Minimizing many of the most cited risks associated with short-term lending requires embracing the principles and practice of data science. For many institutions, confidence in loan performance is the critical factor underpinning all the others, and this is exactly what data science and deep analytics can provide the growing world of short-term lending programs.

The Challenges of Heuristics and the Need for Data Science

Why is data science at the core of this solution?  Because in its absence, short-term lending programs can run into a very serious set of limitations. Based upon ten years of experience in this field, the Washington State Employee Credit Union developed successful practices founded upon data science and deep analytics.  When those techniques are applied, many of the fears that surround short-term lending can be converted to increased confidence and performance.

Inability to understand performance fluctuations:

Most short-term lending programs rely on some kind of heuristic approach. Heuristic models are, by definition, based on intuition. To be fair, a heuristic approach is often the most useful place to start because (a) it is often sufficient to serve immediate needs, and (b) it allows a process to begin and data to be collected. A heuristic model, however, cannot be expected to provide optimal results. It is not based on a rigorous approach to understanding the relationship between program lending outcomes and the available data.

Without a scientific approach, it can be very difficult to determine cause and effect when program performance changes. An analytical approach relies on mathematically and statistically analyzing the relationship between program lending outcomes and data. When loan performance changes, well-defined levers can be brought to bear to manage performance in the short term while, over the longer term, data can be collected and systematically analyzed to understand the events. The statistical models allow the investigator to analyze the impact of a change in one factor at a time. This is indispensable when managing a loan program where many key variables are changing at the same time.

Inability to optimize:

Optimization implies reaching a desired goal efficiently or at the lowest possible cost.  It also implies pushing the boundaries of what is possible. In terms of a payday lending program, optimization might mean adjusting the level of predicted risk of default to increase desired lending volume at the lowest possible rate of default or charge-offs.
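
As a rough illustration of that kind of adjustment, the sketch below picks the risk cutoff that approves the most loans while keeping the observed default rate under a cap. The function name `best_cutoff`, the sample history and the 20% cap are all hypothetical, and the predicted default risks are assumed to come from some existing model.

```python
from typing import List, Optional, Tuple

def best_cutoff(
    scored_loans: List[Tuple[float, bool]],
    max_default_rate: float,
) -> Optional[Tuple[float, int, float]]:
    """Return (cutoff, approved_count, default_rate) for the cutoff that
    approves the most loans while keeping the observed default rate at or
    below max_default_rate. Returns None if no cutoff satisfies the cap.

    Each item in scored_loans is (predicted_default_risk, defaulted).
    A loan is treated as approved when its predicted risk <= cutoff.
    """
    best = None
    for cutoff in sorted({risk for risk, _ in scored_loans}):
        outcomes = [defaulted for risk, defaulted in scored_loans if risk <= cutoff]
        default_rate = sum(outcomes) / len(outcomes)
        if default_rate > max_default_rate:
            continue
        # Prefer more approvals; break ties with the lower default rate.
        if best is None or (len(outcomes), -default_rate) > (best[1], -best[2]):
            best = (cutoff, len(outcomes), default_rate)
    return best

# Hypothetical history: (predicted risk of default, whether the loan defaulted).
history = [
    (0.05, False), (0.10, False), (0.15, False), (0.20, True),
    (0.25, False), (0.30, False), (0.40, True), (0.55, True),
]

result = best_cutoff(history, max_default_rate=0.20)
print(result)
```

A production program would evaluate cutoffs on held-out loans rather than on the same history used to build the model, and would likely weigh loan size and revenue as well as counts.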

Whereas heuristic models cannot be expected to optimize a lending portfolio, the analytical model is well suited to computationally optimize for a stated goal through a process of implementation, observation, and adjustment. Intuition is an important part of any model development process, but in a heuristic model the variables deemed important and the weights attached to them are based only on educated guesses. Such an approach has obvious limitations.

Difficulty efficiently leveraging data in underwriting:

Efficiently leveraging data is simply the idea that the data is used in a way that maximizes the useful information extracted from it.  A heuristic model makes it difficult to leverage data properly in the loan origination process.

The relationship between the key measures of performance and the data is likely to be ever changing.  Models tend to depreciate over time, and they need to be refreshed or updated to include the most recent trend data.  With a heuristic model, it is difficult to know what those adjustments should be and when they are necessary. The result is a process of relatively uninformed guessing.

With a data science approach, the data itself suggests what the adjustments should look like and takes some of the guesswork out of model upkeep. The process is still iterative, but iteration to an appropriately weighted set of variables is much faster.
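
One simple way the data can signal that a refresh is due is to compare what the model predicted with what actually happened on recent loans. The sketch below is a minimal example under that assumption; the function name `needs_refresh`, the 5-point tolerance and the sample loans are hypothetical, and real programs typically use more detailed drift checks.

```python
from typing import List, Tuple

def needs_refresh(recent_loans: List[Tuple[float, bool]], tolerance: float = 0.05) -> bool:
    """Flag the model for a refresh when the average predicted default
    probability and the observed default rate on recent loans differ by
    more than `tolerance`.

    Each item in recent_loans is (predicted_default_probability, defaulted).
    """
    count = len(recent_loans)
    predicted_rate = sum(probability for probability, _ in recent_loans) / count
    observed_rate = sum(1 for _, defaulted in recent_loans if defaulted) / count
    return abs(observed_rate - predicted_rate) > tolerance

# Hypothetical recent loans: the model predicted a 10% default risk for each.
stable_period = [(0.10, False)] * 9 + [(0.10, True)]       # 10% observed
drifted_period = [(0.10, False)] * 7 + [(0.10, True)] * 3  # 30% observed

print(needs_refresh(stable_period))
print(needs_refresh(drifted_period))
```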

Building a Three-Stage Analytical Solution

Short-term lending success is based on overcoming these challenges and building an analytical model that is reliable, tested and provides an institution the tools it needs to effectively manage and improve its program over time.  That model can be achieved by understanding and executing three critical stages.

Stage 1:  Moving from a heuristic to an analytical model.

Moving from a heuristic model to an analytical model means committing to scientifically and systematically learning to use the data to make loan portfolio decisions while carefully moving away from the trial-and-error approach.

It must be emphasized that moving from a tried approach to a brand new one requires careful implementation to mitigate risks. Data and statistics are about measuring and limiting the role of uncertainty in decision-making. However, uncertainty is never fully eliminated. If a heuristic model has achieved an acceptable historic record and an analytical model is to be used for the first time, the approach should involve a careful rollout with constant monitoring and assessment of the analytical model. The heuristic model can be used as a fallback until the performance of the new model matches and then exceeds the performance of the heuristic model. This should be a relatively rapid process.
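
A careful rollout of this kind can be sketched as routing only a share of applicants to the analytical model and reverting everyone to the heuristic model if monitoring raises a concern. Everything named below (`assigned_to_new_model`, `decide`, the `rollout_share` and `new_model_healthy` parameters) is a hypothetical illustration rather than a description of any particular institution's system.

```python
import hashlib
from typing import Tuple

def assigned_to_new_model(applicant_id: str, rollout_share: float) -> bool:
    """Deterministically place roughly `rollout_share` of applicants on the
    analytical model, so the same applicant always gets the same routing."""
    digest = hashlib.sha256(applicant_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    return bucket < rollout_share

def decide(
    applicant_id: str,
    heuristic_decision: bool,
    analytical_decision: bool,
    rollout_share: float,
    new_model_healthy: bool,
) -> Tuple[str, bool]:
    """Return (model_used, approve). The heuristic model is the fallback
    whenever monitoring marks the analytical model as unhealthy."""
    if new_model_healthy and assigned_to_new_model(applicant_id, rollout_share):
        return ("analytical", analytical_decision)
    return ("heuristic", heuristic_decision)

# Hypothetical applicant for whom the two models disagree.
print(decide("applicant-001", True, False, rollout_share=0.10, new_model_healthy=True))
```

Raising `rollout_share` step by step as the analytical model proves itself is one way to keep the transition both careful and reasonably quick.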

While the original heuristic model is a useful fallback during this phase, the greatest benefit of the heuristic model is the data it generates. During this phase of the process, data on key variables, such as the default rate and variables that are deemed to be good predictors of default, are collected from the period over which the heuristic model has been used.

Once a sufficient amount of data is generated, the benefits of moving to a data-science-based approach can be realized. The collected variables are used to estimate an appropriate statistical model. The results allow the calculation of the best possible estimate of the probability, or chance, that a borrower will not default.
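
The article does not say which statistical model is used. As an assumption for illustration only, the sketch below fits a logistic regression, a common choice for estimating a probability of repayment, using plain gradient descent. The two predictor variables (years with current employer, number of prior late payments) and all of the data are hypothetical.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_repayment(weights: Sequence[float], features: Sequence[float]) -> float:
    """Estimated probability of repayment; weights[0] is the intercept."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def fit_logistic(
    rows: List[Sequence[float]],
    repaid: List[int],
    learning_rate: float = 0.1,
    epochs: int = 2000,
) -> List[float]:
    """Fit logistic regression weights by batch gradient descent."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for features, label in zip(rows, repaid):
            error = predict_repayment(weights, features) - label
            gradient[0] += error
            for j, value in enumerate(features):
                gradient[j + 1] += error * value
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, gradient)]
    return weights

# Hypothetical history gathered while the heuristic model was in use:
# [years with current employer, number of prior late payments]
rows = [
    [0.5, 3], [1.0, 2], [1.5, 3], [2.0, 1], [2.5, 2],
    [3.0, 0], [4.0, 1], [5.0, 0], [1.0, 0], [4.0, 3],
]
repaid = [0, 0, 0, 1, 0, 1, 1, 1, 0, 0]  # 1 = repaid, 0 = defaulted

weights = fit_logistic(rows, repaid)
print(predict_repayment(weights, [5.0, 0]))  # long tenure, no late payments
print(predict_repayment(weights, [0.5, 3]))  # short tenure, several late payments
```

In practice a tested statistics library and far more data would be used; the point here is only that the fitted weights turn an applicant's data into a probability.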

The resulting model is then applied to a set of data held out for the purpose of evaluating the model’s performance. This is data that the model has not “seen.” The analysis defines a tradeoff between approving borrowers who, if given a loan, will repay it (true positives) and approving borrowers who, if given a loan, will not repay it (false positives). This tradeoff is used to construct a Receiver Operating Characteristic (ROC) curve. If a probability of repayment of, for example, 80% is used in determining loan approval (any applicant with an 80% or greater chance of repayment is approved), then the true positive rate (the share of eventual repayers who are approved) and the false positive rate (the share of eventual defaulters who are approved) can be calculated. The same is true for any chosen probability-of-repayment cutoff. It is this information that allows the model to be optimized for various goals. For example, it is common to choose the cutoff that maximizes the difference between the true positive rate and the false positive rate; that maximum difference is known as the K-S (Kolmogorov-Smirnov) statistic.
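
The calculation described above can be sketched in a few lines. The function names `roc_points` and `ks_point` and the held-out sample are hypothetical; the definitions of the true and false positive rates follow the paragraph above.

```python
from typing import List, Tuple

def roc_points(probabilities: List[float], repaid: List[bool]) -> List[Tuple[float, float, float]]:
    """For each candidate cutoff, return (cutoff, true_positive_rate,
    false_positive_rate). An applicant is approved when the predicted
    probability of repayment is at or above the cutoff."""
    total_repaid = sum(1 for r in repaid if r)
    total_defaulted = len(repaid) - total_repaid
    points = []
    for cutoff in sorted(set(probabilities), reverse=True):
        approved = [r for p, r in zip(probabilities, repaid) if p >= cutoff]
        true_positives = sum(1 for r in approved if r)
        false_positives = len(approved) - true_positives
        points.append((cutoff, true_positives / total_repaid, false_positives / total_defaulted))
    return points

def ks_point(points: List[Tuple[float, float, float]]) -> Tuple[float, float, float]:
    """Return the ROC point where (true positive rate - false positive rate)
    is largest; that largest gap is the K-S statistic."""
    return max(points, key=lambda point: point[1] - point[2])

# Hypothetical held-out loans: predicted probability of repayment and outcome.
probabilities = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40]
repaid = [True, True, True, False, True, True, False, False]

points = roc_points(probabilities, repaid)
cutoff, tpr, fpr = ks_point(points)
print(f"cutoff={cutoff:.2f} TPR={tpr:.3f} FPR={fpr:.3f} K-S={tpr - fpr:.3f}")
```

Plotting the false positive rate against the true positive rate for every cutoff gives the ROC curve itself. A lender with a different goal, such as a hard cap on defaults, would pick a different point on the same curve.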

Stage 2:  Moving from an initial analytical model to an analytical infrastructure that allows for continuous model adjustment and improvement.

Implementing a data science model requires more than collecting data and estimating a good model.  Models go through a continuous cycle of development, testing, implementation, depreciation and updating.  New processes, hardware and software are required to move from an initial set of model estimates found in the development phase through the implementation and the ultimate refreshment of the model with new variables or coefficients.  These processes, hardware and software comprise the analytical infrastructure.

Legacy infrastructure typically exists as a result of the heuristic model. However, it is likely that additional pieces of software will be needed, for example to translate model coefficients into a form that is easy to apply in applicant scoring, or to run multiple models at the same time and compare results. If new metrics are used in evaluating models, data engineering work must support the efforts to implement a data science infrastructure. Without an adequate infrastructure to support it, the data science model cannot be put into production.

Stage 3:  Using the analytical infrastructure to manage an optimized portfolio, allowing for diagnosis and improved performance over time.

Once the analytical model is developed and implemented within the analytical infrastructure, it is put into production and used for scoring credit and approving loans. Key metrics of loan portfolio performance are developed and monitored, and goals are set relative to those metrics to define optimal portfolio performance. These metrics are a function of many factors, some within the institution’s control and some outside of it. Diagnosing problems with performance through clearly defined metrics, and knowing what levers can be used to influence those metrics, allows for improvement of performance over time.
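
A minimal sketch of that monitoring step is shown below: compute a couple of portfolio metrics and list the ones that fall outside their goal ranges. The metric names, goal ranges and figures are hypothetical examples, not recommended targets.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class PortfolioSnapshot:
    applications: int
    approvals: int
    loans_matured: int
    charge_offs: int

def compute_metrics(snapshot: PortfolioSnapshot) -> Dict[str, float]:
    """Turn raw counts for a reporting period into portfolio metrics."""
    return {
        "approval_rate": snapshot.approvals / snapshot.applications,
        "charge_off_rate": snapshot.charge_offs / snapshot.loans_matured,
    }

def off_target(metrics: Dict[str, float], goals: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return the names of metrics that fall outside their (low, high) goal range."""
    return [
        name
        for name, (low, high) in goals.items()
        if not low <= metrics[name] <= high
    ]

# Hypothetical goals and one month of portfolio data.
goals = {"approval_rate": (0.40, 1.00), "charge_off_rate": (0.00, 0.08)}
month = PortfolioSnapshot(applications=1000, approvals=450, loans_matured=400, charge_offs=44)

metrics = compute_metrics(month)
print(metrics)
print(off_target(metrics, goals))
```

A flagged metric is the starting point for diagnosis; which lever to pull, such as the approval cutoff, depends on what the underlying data shows.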

Before Getting Started

Before any organization can begin implementing a data science program, whether as a tool for developing loan products or as a means of instrumenting and measuring business performance, there are a host of activities and tasks that must be performed in preparation.

A new data science program requires a lot of data as raw material. The systems that currently produce and use that data are often located throughout different functional departments and groups within a financial institution. This means sorting out where the most useful data is being produced, and figuring out how to bring it together in a useful form. The only way to do that is to build a strong cross-organizational team. In addition to identifying people with relevant knowledge and skills across different domains, time and resources need to be re-allocated for these new activities.

This new cross-organizational team needs to develop a common language, new relationships and new workflows. The terminology, nomenclature, workflows, pacing and skills that make team members successful in their current roles may also cause unexpected problems. Even simple words that seem obvious, shared and common have proven to convey different meanings in different business contexts.

Current systems will require the development of custom code to support the data science program. All that valuable data is produced (or not) by systems that were created to perform very different types of tasks, typically operational functions. On the surface, extracting it may sound fairly straightforward. Unfortunately, it is not. Identifying, evaluating, assessing and preparing that data is a specialized and challenging process that typically must be performed by experienced data science personnel.

New data science programs generally require a breadth and depth of technical talent that your organization likely does not possess. Data science is an emerging frontier, and the only organizations that currently have access to the appropriate skill sets are those already actively engaged in data science. There are other challenges to be faced along the path to preparation, but gaining access to the right people and skills is likely to be one of the greatest.

The broad buckets listed above – the formation of your internal team from existing departments, supplementing that team with qualified data science specialists to direct the creation and operation of new systems to acquire and apply the relevant data sets – speak to the planning and execution elements of your new program.  But any new program is going to impact current business in profound ways that are hard to anticipate from the outset. To be effective, businesses will have to change their approach, strategy, and likely their culture as well.

Next Steps

No matter how good the existing team is, once the project is underway, new information about your business, people or capabilities can challenge any organization’s plans. As a result, the executive team will need to buy in completely to the value and the vision. Adopting a short-term lending program, and the data science needed to execute it effectively, requires work and new thinking. But the good news is that there is a lot of support to help your financial institution take advantage of the new revenue streams, new customers and new populations that you can help serve. Consider the following three steps as a starting point:

  1. Find a partner with experience.  There are firms with experience in this very area that have been delivering programs like this for years.  Talk to them and learn from their mistakes.
  2. Find a data science resource.  As mentioned, most organizations entering this area do not have all the expertise they need on staff.  There are firms that specialize in providing sophisticated data science support in the financial industry.
  3. Select a technology that already incorporates data science.  Seek out technology platforms that already have deep analytics built in.  More than anything, this can ease the move from a heuristic model to a rigorous analytical approach.

Short-term lending provides a set of compelling opportunities for many financial institutions. A large population that remains either under-served, or badly served, by current providers desperately needs these products. Realizing this opportunity while minimizing risk means taking on a new challenge: data science, which carries its own risks. But with a clear understanding of the benefits, barriers and best practices, financial institutions can use data science to power an entirely new line of business and continue to evolve and meet the needs of their communities.