
Fusing and Comparing Heterogeneous Transport Datasets

Introduction

In Building a Risk Map of New Zealand's Roads, we uploaded and analysed Waka Kotahi's Crash Analysis System dataset, uncovering hidden patterns and outliers, then queried the data to perform a preliminary, high-level assessment of road risk in New Zealand.

In this workflow we will build on our risk map by incorporating two more datasets that give us further insight into the risk faced on New Zealand's roads. We will see how Edge can be used to fuse and clean these datasets, extract features from them, and then compare them using no-code graphical queries and helpful AI agents.

  • Perhaps you are a local authority hoping to allocate your available resources for infrastructure improvements more effectively, so as to minimise the risk of collisions on your roads.
  • Alternatively, you may work in the autonomous vehicle (AV) industry and be looking for a fast but comprehensive way to analyse the possible hazards along a route on which you plan to deploy your AV.

Overview of Data

The Crash Analysis System dataset considered in the previous workflow gave us a good overview of the risk space covered by collisions documented by the NZ Police. However, the space of near misses (scenarios that almost led to a collision) is vast and should also be considered when building such a risk map.

While in Auckland, the dRISK team made some notes and took a few photos and videos of potentially high-risk road scenarios we observed. Having explored the public accident database, we now wish to incorporate these heterogeneous datasets into our risk map.

Crash Analysis System (CAS)

Large accident report database, detailing road traffic collisions that took place across the country. A 10k sample is considered in this workflow.

Media Collection

While images and videos can be uploaded to Edge directly, we had already packaged this media into a table containing the geospatial coordinates where each scenario took place, so we'll add this table to our risk map.

Notes Taken

While strolling the streets of Auckland, we jotted down descriptions of hazardous moments we observed on the road, alongside the rough location at which we observed these scenarios.

Jump straight into Edge using the link below, then follow along as we fuse, clean, extract, and compare these three datasets.

Open the Three Datasets in Edge

The only view in our graph at this point is the "Home" view, which contains the taxonomy of each dataset (see starting views). Our goal is to bring these three heterogeneous datasets into one risk map, so we will first need to fuse them geospatially.


Extracting Geospatial Coordinates from Written Address

Reading through the taxonomy of our three datasets, you might notice that the CAS and Media Collection datasets contain "latitude" and "longitude" features, while Notes Taken contains only "Address". To fuse these three datasets geospatially, we first need to extract geospatial coordinates from these addresses. Such feature extraction is easy in Edge: we simply use the dedicated agent in the right-hand side (RHS) drawer.

  1. Select the "Address" feature from the Home view
  2. Open the "Extract Features" agent from the RHS drawer
  3. Hit "Extract Coordinates"

Views opened by geospatial coordinate extraction agent

The agent will calculate the latitude and longitude for each data point, then open the following two views:

  • "Extracted Geospatial Coordinates" contains the two new features, alongside the data points they describe
  • "Spatial view longitude vs latitude" displays the data points on the spatial map
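Edge's coordinate extraction agent does all of this for you. For readers curious about what such a step involves outside Edge, here is a minimal sketch that geocodes addresses against a small lookup table and attaches "latitude" and "longitude" features to each note. The gazetteer, its placeholder coordinates and the function names are all hypothetical. A real pipeline would call a geocoding service instead.

```python
from typing import Optional

# Hypothetical gazetteer standing in for a real geocoding service.
# The coordinates are illustrative placeholders in central Auckland.
GAZETTEER = {
    "queen street, auckland": (-36.8485, 174.7633),
    "karangahape road, auckland": (-36.8580, 174.7570),
}


def extract_coordinates(address: str) -> Optional[tuple[float, float]]:
    """Return (latitude, longitude) for an address, or None if it is unknown."""
    key = " ".join(address.lower().split())  # normalise case and whitespace
    return GAZETTEER.get(key)


def add_coordinate_features(notes: list[dict]) -> list[dict]:
    """Attach "latitude"/"longitude" features to every note that can be geocoded."""
    located = []
    for note in notes:
        coords = extract_coordinates(note["Address"])
        if coords is None:
            continue  # skip notes whose address could not be resolved
        lat, lon = coords
        located.append({**note, "latitude": lat, "longitude": lon})
    return located
```

Notes whose address cannot be resolved are dropped here. A production version might instead keep them and flag them for manual review.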

Open Graph with Extracted Coordinates in Edge

As the coordinate feature nodes are all we need at this stage, we ensure they're stored in our "Home" view and close these two views for now.


Auto-Fusing

Now that we have geospatial coordinates for all three of our datasets, we can fuse them using the Auto-Fusion agent in the RHS drawer.

  1. Select all nodes in the Home view
  2. Open the Auto-Fusion agent from the RHS drawer
  3. Hit "Auto-fuse"

Now that our three pairs of "latitude" and "longitude" features have been merged into a single pair, we can create a map view of the fused graph by selecting these nodes and choosing New View -> Scatter Plot (see creating scatter plots).
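Conceptually, fusion here means that features sharing a name across datasets become a single shared feature. The minimal sketch below assumes that rule, matching feature names after trimming whitespace and lower-casing them, and tags each record with its source dataset. Edge's actual Auto-Fusion matching criteria are not described in this workflow, and the function and field names are hypothetical.

```python
def normalise(name: str) -> str:
    """Assumed matching rule: names equal after trimming and lower-casing fuse."""
    return name.strip().lower()


def fuse(datasets: dict[str, list[dict]], shared=("latitude", "longitude")) -> list[dict]:
    """Concatenate records from several datasets onto one shared set of features."""
    fused = []
    for source, rows in datasets.items():
        for row in rows:
            record = {normalise(k): v for k, v in row.items()}
            missing = [f for f in shared if f not in record]
            if missing:
                raise KeyError(f"{source} record is missing {missing}")
            record["source"] = source  # overwrites any existing "source" column
            fused.append(record)
    return fused


def feature_names(fused: list[dict]) -> set[str]:
    """All distinct feature names present in the fused records."""
    names = set()
    for record in fused:
        names.update(record)
    return names
```

After fusion, "Latitude" and "latitude " from different sources collapse into the single feature "latitude". This mirrors the three pairs of coordinate features becoming one pair.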

By working directly on our graph alongside AI agents, we have transformed and fused our three datasets in just a few clicks and a few seconds!

Jump straight to this fused graph via the following link: Open Fused Graph in Edge


Exploring Fused Graph

We now have all the datasets that could tell the story of risk on the same risk map, and can work with them all in one place:

  • Select the media nodes (large, blue nodes) and press the shortcut w to watch the videos and view the images.
  • Select the written notes (large, pink nodes) and press . to open the node properties so that you can read the scenario description.
  • Open the Inspect agent while selecting any of the nodes representing accident reports (small, black nodes) to browse all the details collected in the incident reports. Note that node size here represents the number of fatalities recorded for each incident, as a suggested measure of risk. You can resize and recolour nodes as you wish by editing their properties.

Edge enables you to interact directly with all of your unstructured, structured, and mixed-media data sources, from a single fused view.


Cluster Datasets

Let's identify the high-risk hotspots around the country. To do so, we simply select all the data points plotted on the map, open the Clustering agent from the RHS drawer, and enter the number of clusters we wish to consider.

Running clustering algorithms across heterogeneous datasets takes a single button click in Edge.

In this example, we split the risk map into 20 clusters, create a bar plot to compare their relative sizes, use Colour by Group to visualise the clustering across the map, and then focus on the top 5 by removing the rest from the view.
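This workflow does not state which algorithm Edge's Clustering agent uses. As an illustration only, the sketch below runs plain k-means (Lloyd's algorithm) on latitude/longitude pairs and then ranks clusters by size, as the bar plot does. The equirectangular scaling of longitude, the seeded initialisation and all function names are assumptions.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means on (latitude, longitude) pairs; returns one label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Equirectangular approximation: scale longitude by cos(mean latitude) so a
    # degree of longitude and a degree of latitude span similar distances.
    mean_lat = math.radians(sum(lat for lat, _ in points) / len(points))
    xy = [(lon * math.cos(mean_lat), lat) for lat, lon in points]

    rng = random.Random(seed)  # seeded so results are reproducible
    centres = rng.sample(xy, k)  # initial centres drawn from the data
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        new_labels = [
            min(range(k), key=lambda c: (x - centres[c][0]) ** 2 + (y - centres[c][1]) ** 2)
            for x, y in xy
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(xy, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = (
                    sum(px for px, _ in members) / len(members),
                    sum(py for _, py in members) / len(members),
                )
    return labels


def top_clusters(labels, n=5):
    """Cluster labels ordered by size (largest first), truncated to n."""
    return [label for label, _ in Counter(labels).most_common(n)]
```

For the steps above, you would call kmeans(points, 20) and keep the labels returned by top_clusters(labels, 5). k-means is only one option. Density-based methods such as DBSCAN avoid fixing the number of clusters up front.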

Take a look at the clustered graph in Edge: Explore Clustered Spatial View in Edge

Exporting and cleaning up our spatial view

Before proceeding, let's save our work so far and then clean up the nodes.

To export our "Top Hotspots Barplot" view:

  1. Select all nodes in the barplot view
  2. Navigate from Export -> Selected subgraph as .csv to export our node groupings into a .csv table
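Outside Edge, a comparable export can be produced with Python's standard csv module. This sketch assumes each exported node is a dict with hypothetical node_id, cluster and source fields. The columns Edge actually writes may differ.

```python
import csv
import io


def groupings_to_csv(nodes, fields=("node_id", "cluster", "source")):
    """Serialise node groupings as CSV text; keys not listed in `fields` are dropped."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(fields), extrasaction="ignore")
    writer.writeheader()
    writer.writerows(nodes)
    return buffer.getvalue()


def csv_to_groupings(text):
    """Parse the CSV text back into a list of dicts (all values come back as strings)."""
    return list(csv.DictReader(io.StringIO(text)))
```

To save the result to disk, open the file with newline="" as the csv module recommends, for example: with open("top_hotspots.csv", "w", newline="") as f: f.write(groupings_to_csv(nodes)). The filename is illustrative.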

Note that you can also share the view directly with a colleague, share the full graph, or simply edit the appearance further and take a screenshot for a presentation! See How to export and share graphs.

To clean up the appearance of our map view, we remove the Cluster nodes by searching (cmd-f or ctrl-f) for "Cluster" and then choosing Remove from view in the right-click context menu. We also revert the node colouring by selecting all the nodes and choosing black from the colour palette on the right-hand side of the top menu bar.


Extract Features from Free Text

The nodes representing Media and Notes have descriptive labels containing important information about road risk that we may want to extract. To do so, we can use the in-app LLM agent to ask natural-language questions of this data.

  1. Select the nodes with descriptive labels you wish to ask questions of
  2. Open the "Extract Features" agent from the RHS agents drawer
  3. Enter a prompt in the LLM text box

The agent will run and then open a new view titled "Extracted Features", containing the new feature nodes alongside the data points that contain each feature. Explore this view and cross-highlight with the map view to identify patterns in the hazards observed at each recorded event.
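The Extract Features agent answers your prompt with an LLM. To make the shape of its output concrete, the sketch below swaps in a much simpler technique, regular-expression keyword matching. It groups node ids by extracted feature, much like the "Extracted Features" view. The hazard vocabulary, patterns and node ids are hypothetical, and keyword matching will miss paraphrases that an LLM would catch.

```python
import re

# Hypothetical hazard vocabulary: feature name -> regular expression.
HAZARD_PATTERNS = {
    "cones": r"\bcones?\b",
    "animals": r"\b(animals?|dogs?|cats?|birds?|sheep)\b",
    "e-scooters": r"\be-?scooters?\b",
}


def extract_hazards(label: str) -> set[str]:
    """Return the hazard features whose pattern occurs in a node's label."""
    text = label.lower()
    return {feature for feature, pattern in HAZARD_PATTERNS.items() if re.search(pattern, text)}


def extracted_features_view(nodes: dict[str, str]) -> dict[str, list[str]]:
    """Map each feature to the ids of the nodes whose labels mention it."""
    view = {feature: [] for feature in HAZARD_PATTERNS}
    for node_id, label in nodes.items():
        for feature in extract_hazards(label):
            view[feature].append(node_id)
    return view
```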

Extract features from your unstructured data sources in seconds and with no code: simply ask questions of your data in natural language, and the in-app agents will do the hard work for you.


Comparing the Datasets

With our fused graph and newly extracted feature nodes, we can now explore and compare these sources of road risk. The goal is to uncover patterns of hazardous behaviour that could inform an autonomous vehicle route risk assessment, or data-driven decisions about allocating resources for road infrastructure improvements across Auckland.

Open Views in Edge

Across these written observations and media recordings, instances featuring cones and animals are notable, yet they are absent from the Crash Analysis System (CAS) records for Auckland. Consequently, the risk space depicted in the near-miss datasets differs from that represented in the collision database.

Furthermore, the locations where scenarios involving e-scooters were observed show minimal overlap with the locations of collisions reported in police records (CAS).
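One way to quantify this kind of overlap is to measure what fraction of the observed near-miss locations lie within some radius of a recorded collision. The sketch below assumes great-circle (haversine) distance on a spherical Earth and an arbitrary 200 m radius. Both choices, and the function names, are illustrative; this is not how Edge compares datasets.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical model)


def haversine_m(a, b):
    """Great-circle distance in metres between two (latitude, longitude) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def overlap_fraction(observed, collisions, radius_m=200.0):
    """Fraction of observed points within radius_m of at least one collision."""
    if not observed:
        return 0.0
    near = sum(
        1 for p in observed if any(haversine_m(p, c) <= radius_m for c in collisions)
    )
    return near / len(observed)
```

A low overlap_fraction for the e-scooter observations against the CAS points would express "minimal overlap" as a number. The brute-force pairwise loop is fine for a few thousand points. Larger datasets would call for a spatial index.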

These insights emphasise the importance of considering more than just the publicly available accident databases when performing a comprehensive risk assessment.


Summary

In this workflow we have covered how Edge can be used to extract and transform features across heterogeneous datasets so that they can be fused and quickly analysed in one place. Using AI agents and no-code queries directly on the graph, we uncovered valuable insights that would not have been visible had each dataset been considered alone.

We hope Edge will enable more people across New Zealand to make faster, fully informed, data-driven decisions from heterogeneous road-risk datasets, helping to accelerate New Zealand’s Road to Zero strategy.


Last update: 2024-06-07