inSOLVEncy

Project Info

Dmitry's Angels thumbnail

Team Members


4 members with unpublished profiles.

Project Description


Using ML to identify which individuals are likely to be non-compliant in personal insolvency by creating a compliance risk model and visualizing the results.


Data Story


Our project inSOLVEnt takes a multifaceted approach to a multifaceted problem: we created not only a risk model for non-compliance in personal insolvency, but also visualisations and infographics showing the common factors that lead to negative insolvency outcomes.

We utilised the non-compliance personal insolvency data to first identify cases of non-compliance versus compliance. We streamlined this data using other data sources such as Regional Statistics, the ATO GovHack 2018 statistics, and ANZSCO occupation and regional classifications.

Once we had a clean data set, we ran it through TensorFlow to identify a model. We tried neural networks first, but they overfitted and did not generalise in our tests. We decided to simplify by using linear regression, which worked well. Out of 250,000 records we misidentified only 5, an accuracy of 99.998%.
When training our model, we first separated the compliance records from the non-compliance records. Each group was then randomly split into 80% training and 20% validation data. As non-compliance events were the minority, this method ensured that our training subsets were balanced.
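The per-class split described above can be sketched roughly as follows. This is a minimal illustration and not the team's actual code: the function name `split_per_class`, the `non_compliant` field, and the toy records are all hypothetical. Splitting each class separately keeps the share of non-compliance events the same in the training and validation subsets; it does not equalise the class counts.

```python
import random


def split_per_class(records, label_of, train_frac=0.8, seed=0):
    """Split each class separately so both subsets keep the same class mix.

    `records` is any list, `label_of` maps a record to its class label.
    All names here are hypothetical, for illustration only.
    """
    rng = random.Random(seed)

    # Group the records by class label (compliant vs non-compliant).
    by_class = {}
    for rec in records:
        by_class.setdefault(label_of(rec), []).append(rec)

    train, valid = [], []
    for group in by_class.values():
        shuffled = group[:]          # copy so the input is not mutated
        rng.shuffle(shuffled)        # random order within the class
        cut = round(len(shuffled) * train_frac)
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid


# Toy data (made up): 1,000 records, 1 in 10 flagged as non-compliant.
records = [{"id": i, "non_compliant": i % 10 == 0} for i in range(1000)]
train, valid = split_per_class(records, lambda r: r["non_compliant"])

train_flagged = sum(r["non_compliant"] for r in train)
valid_flagged = sum(r["non_compliant"] for r in valid)
print(len(train), len(valid), train_flagged, valid_flagged)
```

With the toy data, the minority class makes up 10% of both subsets, which a single random split over all records would only achieve approximately.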

We delved further into these results by isolating Gold Coast data using SA3 data sets. We retrained our model using the same method. Our validation accuracy for the Gold Coast data was 100%. The Gold Coast results were consistent with the national model, reinforcing the robustness of our solution.
We trained both models using mean absolute error rather than mean squared error, because mean squared error amplifies outliers.
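To illustrate why mean squared error amplifies outliers, here is a small sketch comparing the two metrics on made-up predictions. The numbers are invented for the example and are not taken from the project data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |error|: each error contributes in proportion to its size."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of error squared: large errors dominate the total."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: every prediction is off by 1, except one outlier off by 11.
y_true = [0.0, 0.0, 0.0, 0.0, 0.0]
no_outlier = [1.0, 1.0, 1.0, 1.0, 1.0]
one_outlier = [1.0, 1.0, 1.0, 1.0, 11.0]

mae_ratio = (mean_absolute_error(y_true, one_outlier)
             / mean_absolute_error(y_true, no_outlier))
mse_ratio = (mean_squared_error(y_true, one_outlier)
             / mean_squared_error(y_true, no_outlier))

print(mae_ratio)  # 3.0: the outlier triples the mean absolute error
print(mse_ratio)  # 25.0: the same outlier multiplies the mean squared error by 25
```

A model trained to minimise the squared error would therefore be pulled much harder towards fitting the single outlier than one trained on the absolute error.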

Our model is able to predict non-compliance events with a high degree of accuracy. This risk model can be used by regulatory bodies to target audit and compliance services, and by individuals and corporate entities to self-identify their compliance risk.


Evidence of Work

Video

Homepage

High-Res Image

Team DataSets

Australian Statistical Geography Standard (ASGS)

Description of Use The information in this data set was used to isolate insolvency specific to SA3 regions.

Data Set

Gold Coast Streets and Suburbs

Description of Use The information in this data set was used to isolate insolvency information relevant to the Gold Coast area.

Data Set

Address and zone data

Description of Use The information from this data set was used to isolate insolvency cases relevant to the Gold Coast area.

Data Set

ANZSCO -- Australian and New Zealand Standard Classification of Occupations, 2013, Version 1.2

Description of Use The information from this data set was used to streamline occupation codes for the creation of our compliance model and visualization of insolvency in Australia.

Data Set

Gov Hack 2018

Description of Use This data was used to contextualize who Australian taxpayers are and inform our visualization of insolvency in Australia.

Data Set

Regional statistics

Description of Use The information from this data set was used in creating our visualization of insolvency in Australia.

Data Set

Non-compliance in personal insolvencies

Description of Use Information from this data set was used to create our compliance model and in our visualization of insolvency in Australia.

Data Set

Challenges

The Friendly ATO

How can the ATO use artificial intelligence or machine learning to better understand and develop ways to engage with our clients?

Go to Challenge | 15 teams have entered this challenge.

Bounty: Is seeing truely believing?

How can we tell a story with visualisations, that speaks the truest representation of our data?

Go to Challenge | 28 teams have entered this challenge.

Bounty: Visualise the Numbers

How can people better view data on GovCMS in visuals?

Go to Challenge | 10 teams have entered this challenge.

Bounty: Mix and Mashup

How can we combine the uncombinable?

Go to Challenge | 61 teams have entered this challenge.

Best use of Gold Coast Data

Best use of Gold Coast Data

Go to Challenge | 13 teams have entered this challenge.

Out of the Box - New take on data for regional development

Use an existing data set outside its normal context to both display and encourage innovative solutions to regional problems and promote and foster regional economic development.

Go to Challenge | 11 teams have entered this challenge.

To bankruptcy or not to bankruptcy, keeping the process real.

Helping predict non-compliance in the personal insolvency system. How can Artificial Intelligence and Machine Learning assist us in the future?

Go to Challenge | 13 teams have entered this challenge.

Government Services Challenge

How might we better understand citizens' transaction preference and behaviours to make Queensland Government services easier to use?

Go to Challenge | 9 teams have entered this challenge.