Building a Risk Map of New Zealand's Roads

Introduction

In this workflow we will use Conode to quickly surface insights from a public road traffic event database. We will cover how Conode automatically reveals outliers and hidden patterns across a dataset's many dimensions, and how it accelerates exploratory data analysis so that we can build a personalised risk dashboard within minutes.

Overview of Data

The dataset that we will be loading and exploring today is NZTA Waka Kotahi’s Crash Analysis System dataset, which tracks traffic incidents across New Zealand. It contains details such as the types of vehicles involved, local environment features, and weather conditions during each event.

The dataset comes in the form of a large, sparse, tabular .csv file which can be overwhelming and challenging to query without code and a clear direction.
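To make "sparse" concrete, here is a minimal sketch, outside of Conode, that measures how empty each column of a CSV is. The sample rows are made up for illustration, and apart from "crashSeverity" the column names are hypothetical stand-ins rather than confirmed fields of the dataset.

```python
import csv
import io

# Tiny illustrative sample. The values are made up, and only "crashSeverity"
# is a column name taken from this workflow; the others are hypothetical.
SAMPLE = """crashSeverity,weatherA,pedestrian,bicycle
Minor Crash,Fine,,
Non-Injury Crash,Light rain,,
Serious Crash,Fine,1,
"""


def missing_fraction(csv_text):
    """Return {column: fraction of rows where the cell is empty}."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {
        col: sum(1 for r in rows if not (r[col] or "").strip()) / len(rows)
        for col in rows[0]
    }


fractions = missing_fraction(SAMPLE)

# List the emptiest columns first.
for col, frac in sorted(fractions.items(), key=lambda kv: -kv[1]):
    print(f"{col:15s} {frac:.0%} empty")
```

On a real export, the same function could be pointed at the text of the downloaded file to see which columns are mostly empty before deciding what to explore.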

Let's use Conode to quickly make sense of this dataset.

From Upload to Dashboard

Data can be imported into Conode using the API, by connecting to a database, or by uploading a file from your local drive. During upload, all your data is converted automatically into a knowledge graph which is then presented back to you in the form of a simplified dashboard containing views on the data that Conode thinks might be of interest to you.

Reviewing Geospatial Coverage

Conode recognised the spatial dimensions in the data and opened a map view showing its geospatial distribution. For example, by selecting the event nodes located in the North Island and checking the node count (bottom LHS of the view), we immediately find that the vast majority of incidents take place in the North Island rather than the South Island.


You can toggle the Google Maps-style background on and off using the Globe icon in the toolbar.


Verifying an Anomaly

During upload, Conode searches for fields in the taxonomy that hold unusual distributions and presents them to you. In this case, a histogram of the "Longitude" field has been opened.


Over on the LHS of the view, we can see some events that have much lower values of Longitude than the rest. Why might this be?


By selecting individual nodes and opening the Inspect agent from the RHS drawer, we can read through all the details logged for each of these collision events. A feature related to their location which pops up is "Chatham Islands Territory".

Having recognised the spatial dimensions, a bolstering agent that works in the background during data upload has assigned a Google Maps link to every collision record. Once a record is selected, we can use the Open URL button from the context menu to jump over to Google Maps and check out the environment in which the collision took place.
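How the agent builds its links is not described here. As a rough sketch of the idea, a link for a latitude/longitude pair can be formed with the public Google Maps search URL format. The function name and the sample coordinates are illustrative assumptions.

```python
from urllib.parse import urlencode


def maps_link(lat, lon):
    """Build a Google Maps search URL for a latitude/longitude pair."""
    query = urlencode({"api": 1, "query": f"{lat},{lon}"})
    return f"https://www.google.com/maps/search/?{query}"


# Approximate, illustrative coordinates on the Chatham Islands.
print(maps_link(-43.95, -176.56))
```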


They did indeed take place on the Chatham Islands, so this anomaly is likely a product of the coordinate conversion from NZMG to latitude/longitude: the islands lie east of the 180° meridian, so their longitudes come out negative while mainland values are positive. At this point, we can either update their coordinate values by moving them in the view, or simply exclude them by removing them from our histogram and geospatial map views.

Conode helped surface outliers, which we can quickly validate and then clean or remove from our investigation.
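One plausible reading of the low Longitude values, stated as an assumption rather than something the dataset confirms: the Chatham Islands sit east of the 180° meridian, so their longitudes are negative while mainland longitudes are positive. The sketch below flags such records and shows how wrapping longitudes onto a 0 to 360 range would bring them back next to the mainland. The coordinates are illustrative, not taken from the dataset.

```python
def wrap_longitude(lon):
    """Map a longitude in [-180, 180] onto [0, 360)."""
    return lon % 360


# Illustrative values: three mainland points and one Chatham Islands point.
longitudes = [174.76, 172.64, 170.50, -176.56]

# Records that would sit far to the left in a histogram of raw longitude.
outliers = [lon for lon in longitudes if lon < 0]

# After wrapping, every value falls within a narrow band.
wrapped = [wrap_longitude(lon) for lon in longitudes]

print(outliers)
print([round(lon, 2) for lon in wrapped])
```

Whether to wrap the values or drop the records depends on the analysis. Wrapping keeps the events available for a map view, while removing them keeps the histogram focused on the mainland.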

Backtracking Outlier to Source

Let's now take a look at this embedding view which was created for us during import...

First, remind me, what actually is an embedding again?

Embeddings are a common AI tool that cluster records based on their similarity according to the features that define them. In this case, all the event features we see in the "Home" view were used to cluster the collisions such that similar collision types will be located close together in the embedding.

Embeddings are a great tool for working with high-dimensional data: having only just uploaded the dataset, we already have a feel for its size, shape, and even underlying patterns that are not easily noticeable in tabular form.
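The embedding method Conode uses is not stated here. To illustrate only the underlying notion of "similarity according to features", the sketch below one-hot encodes a few made-up collision records and compares them with cosine similarity. It is a similarity measure, not an embedding algorithm; an embedding would go one step further and place records with high similarity close together in a 2D view. All feature names and values are hypothetical.

```python
import math

# Illustrative records; feature names and values are made up.
events = {
    "A": {"weather": "Fine", "light": "Dark", "pedestrian": "yes"},
    "B": {"weather": "Fine", "light": "Dark", "pedestrian": "no"},
    "C": {"weather": "Heavy rain", "light": "Bright sun", "pedestrian": "no"},
}


def tokens(record):
    """Turn a record into a set of 'feature=value' tokens."""
    return {f"{key}={value}" for key, value in record.items()}


# Every distinct feature/value pair becomes one vector dimension.
vocabulary = sorted(set().union(*(tokens(r) for r in events.values())))


def one_hot(record):
    """Encode a record as a 0/1 vector over the vocabulary."""
    present = tokens(record)
    return [1.0 if token in present else 0.0 for token in vocabulary]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


vectors = {name: one_hot(record) for name, record in events.items()}
sim_ab = cosine(vectors["A"], vectors["B"])  # two of three features shared
sim_ac = cosine(vectors["A"], vectors["C"])  # no features shared

print(f"A vs B: {sim_ab:.2f}")
print(f"A vs C: {sim_ac:.2f}")
```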

Querying the Data

We can quickly answer a few questions of the dataset...

Where do collisions involving pedestrians take place?

We can use Conode to quickly select subsets of our data. If we are interested in the events that involved a pedestrian hazard, simply click on the “Pedestrian” feature in the Home view then Select -> Successors. All the collisions that were connected to the pedestrian node are now highlighted, giving us immediate access to the spatial distribution.
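Conceptually, Select -> Successors follows the edges leaving a node. The sketch below models that on a small directed graph stored as a dictionary. The node names, and the assumption that edges point from a feature node to its collision events, are illustrative and not a description of Conode's internal data model.

```python
# Directed edges from feature nodes to the events linked to them.
# Node names and the edge direction are assumptions for illustration.
edges = {
    "Pedestrian": ["event-2", "event-5"],
    "Bicycle": ["event-3"],
    "Light rain": ["event-2", "event-3", "event-4"],
}


def successors(graph, node):
    """Return the set of nodes directly reachable from `node`."""
    return set(graph.get(node, ()))


selected = successors(edges, "Pedestrian")
print(sorted(selected))
```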

What is the weather distribution across these collision events?

Select the "Weather A" feature from our Home view, followed by New View -> Barplot.

Immediately we can see that more collisions take place in light rain than heavy rain, and by selecting each group of nodes we can see where across the country such weather events are most likely.
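A bar plot of a categorical feature is a count per category. As a minimal sketch of that idea outside Conode, the snippet below tallies a handful of made-up weather labels and prints a text bar for each. The counts are illustrative and do not reflect the real dataset.

```python
from collections import Counter

# Made-up weather labels, one per collision event.
weather = ["Fine", "Light rain", "Fine", "Heavy rain", "Light rain", "Fine"]

counts = Counter(weather)

# Print the most frequent category first, with one '#' per event.
for label, n in counts.most_common():
    print(f"{label:12s} {'#' * n} {n}")
```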

What are the highest risk areas of New Zealand?

We can colour the whole view by node density using Colour By -> Density. We find that the densest region is of course Auckland in the North Island, which makes sense with respect to the country's population distribution.

But Conode is flexible in that every tool can be applied to any subset of the data you care about. So if we select the events that took place in Auckland, use Select -> Inverse, and then reapply the density colouring, the view is coloured by density again, but this time with the Auckland region excluded.

Red hotspots pop up elsewhere around the country. We can quickly find out what differentiates these data points from the rest by highlighting the dense regions and checking the Describer tool. In this example, the Describer tells us that these collision events take place in Christchurch city, in the Canterbury region.
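How Conode computes density is not described here. As a stand-in, the sketch below bins points into a coarse latitude/longitude grid and counts events per cell, then repeats the count on the inverse of a selection, expressed as a set difference. The points, identifiers, and grid size are illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative (lat, lon) points; not real crash records.
points = {
    "e1": (-36.85, 174.76), "e2": (-36.86, 174.77), "e3": (-36.84, 174.75),
    "e4": (-43.53, 172.63), "e5": (-43.54, 172.64),
    "e6": (-41.29, 174.78),
}


def cell(lat, lon, size=0.5):
    """Snap a coordinate to a square grid cell of `size` degrees."""
    return (math.floor(lat / size), math.floor(lon / size))


def density(ids):
    """Count how many of the given events fall in each grid cell."""
    return Counter(cell(*points[i]) for i in ids)


all_ids = set(points)
auckland = {"e1", "e2", "e3"}   # stand-in for a manual selection
inverse = all_ids - auckland    # Select -> Inverse as a set difference

densest_all = density(all_ids).most_common(1)[0]
densest_rest = density(inverse).most_common(1)[0]

print("densest cell overall:", densest_all)
print("densest cell excluding the selection:", densest_rest)
```

Recomputing the density on the remaining subset is what lets a second-tier hotspot stand out once the dominant one is excluded.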

Personalising the Dashboard

Since the dataset we are examining is essentially a log of vehicle collisions across New Zealand, let's transform it into a risk map. We will first categorise the incidents by the severity of the collision.

Search for the "crashSeverity" feature in the Home view
Select the "crashSeverity" node, then use New View -> Bar Plot to open the distribution of the dataset across severity classes in a new view
Update the node size and colour of each node group to differentiate each class of event
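The styling step above amounts to a lookup from severity class to visual attributes. A minimal sketch of such a mapping follows. The severity labels are an assumption about the values stored in "crashSeverity", and the colours and sizes are arbitrary choices rather than Conode defaults.

```python
# Severity labels are an assumption about the values in "crashSeverity";
# colours and sizes are arbitrary illustrative choices.
STYLE = {
    "Fatal Crash": {"colour": "red", "size": 12},
    "Serious Crash": {"colour": "orange", "size": 8},
    "Minor Crash": {"colour": "yellow", "size": 5},
    "Non-Injury Crash": {"colour": "grey", "size": 3},
}

# Fallback for missing or unexpected labels.
DEFAULT = {"colour": "lightgrey", "size": 2}


def style_for(severity):
    """Look up the display style for a severity label."""
    return STYLE.get(severity, DEFAULT)


for label in ["Fatal Crash", "Minor Crash", "Unknown"]:
    print(label, style_for(label))
```

Giving the most severe class the largest, most saturated marker keeps rare but important events visible when zoomed out.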

We have now personalised our dashboard to display road risk across New Zealand. Let's zoom and pan through the view to identify patterns of risk at the local region or road level.

Beyond Auckland, which we know is a hotspot in general, we can see a few highways that have suffered a noticeably large number of fatal collisions, for example State Highway 1 in the Kapiti Coast region.

Sharing Insights

Having identified a local area with high road risk, we may want to make a note of this insight. To do so, group the collisions that took place here under a new dedicated header: select the data points, create a new node, and assign it a meaningful label. Following this, there are a few options for exporting and sharing the insight:

Share the graph with your team or individual email addresses using the Share tool located in the top RHS of your screen
Share just this spatial view by right-clicking on the view title and selecting Copy View URL
Export the view to a csv by selecting all nodes in the spatial view and then using Export -> Selected subgraph as .csv
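The layout of the file Conode exports is not shown here. As a generic sketch of what exporting a selection to CSV involves, the snippet below writes only the selected records to an in-memory buffer. The record fields, identifiers, and values are hypothetical.

```python
import csv
import io

# Hypothetical records; field names and values are illustrative.
records = [
    {"id": "e1", "crashSeverity": "Fatal Crash", "region": "Kapiti Coast"},
    {"id": "e2", "crashSeverity": "Minor Crash", "region": "Auckland"},
    {"id": "e3", "crashSeverity": "Fatal Crash", "region": "Kapiti Coast"},
]
selected_ids = {"e1", "e3"}


def export_csv(rows, ids, out):
    """Write the rows whose id is in `ids` to `out`; return the row count."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0]))
    writer.writeheader()
    written = 0
    for row in rows:
        if row["id"] in ids:
            writer.writerow(row)
            written += 1
    return written


buffer = io.StringIO()
count = export_csv(records, selected_ids, buffer)
print(f"wrote {count} rows")
print(buffer.getvalue())
```

Swapping the buffer for a file opened with `newline=""` would write the same content to disk.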

In a matter of minutes, we have identified a region of high risk, recorded this insight, and shared it with a colleague.

Summary

Within seconds of uploading the NZTA Crash Analysis dataset to Conode, we received a fully interactive dashboard showing the spatial and taxonomic coverage of road risk in New Zealand, as well as outliers and patterns hidden across the dataset's many dimensions. From here we:

Developed an understanding of our data's distribution across taxonomy features.
Investigated patterns surfaced by the tool and traced clusters to their origins.
Addressed outliers with contextual assistance provided by the tool.
Identified collision classes exhibiting unexpected properties.
Formulated data inquiries and personalised views based on relevant metrics, such as severity.
Pinpointed high-risk regions for further investigation or collaboration with colleagues.

In the subsequent workflow, we'll delve deeper by integrating additional diverse data sources into our risk mapping analysis.


Last update: 2025-01-06
