Fusing and Comparing Heterogeneous Transport Datasets
Introduction
In Building a Risk Map of New Zealand's Roads, we uploaded and analysed Waka Kotahi's Crash Analysis System dataset, uncovering hidden patterns and outliers, then queried the data to perform a preliminary, high-level assessment of road risk in New Zealand.
In this workflow we will build on our risk map by incorporating two more datasets that give us further insight into the risks faced on New Zealand's roads. We will see how conode can be used to fuse and clean such datasets, extract features from them, and then compare them with no-code graphical queries and helpful AI agents.
- Perhaps you are a local authority hoping to better allocate the resources you have on hand for infrastructure improvements, so that you can minimise the risk of collisions occurring on your roads.
- Alternatively, you may work in the autonomous vehicle (AV) industry and be looking for a fast but comprehensive way to analyse the possible hazards along a route on which you plan to deploy your AV.
Overview of Data
The CAS dataset considered in the previous workflow gave us a good overview of the risk space covered by collisions documented by the NZ Police. However, the space of near misses (scenarios that almost led to a collision) is vast and should also be considered when building such a risk map.
While in Auckland, we made some notes and took a few photos and videos of potentially high-risk road scenarios we observed. Having explored the public accident database, we now wish to incorporate these heterogeneous datasets into our risk map.
Crash Analysis System (CAS)
A large database of accident reports detailing road traffic collisions that took place across the country. A 10k-record sample is used in this workflow.
Media Collection
While images and videos can be uploaded to conode directly, we had already packaged this media into a table containing the geospatial coordinates at which each scenario took place, so we'll add that table to our risk map.
Notes Taken
While strolling the streets of Auckland, we jotted down descriptions of hazardous moments we observed on the road, along with the approximate location of each scenario.
Jump straight into conode using the link below, then follow along as we fuse, clean, extract, and compare these three datasets.
Open the Three Datasets in conode
The only view in our graph at this point is the "Home" view, which contains the taxonomy of each dataset (see starting views). Our goal is to bring these three heterogeneous datasets into one risk map, so we will first need to fuse them geospatially.
Extracting Geospatial Coordinates from Written Addresses
Reading through the taxonomy of our three datasets, you might notice that the CAS and Media Collection datasets contain "latitude" and "longitude" features, while Notes Taken contains only "Address". In order to fuse these three datasets geospatially, we will first need to extract geospatial coordinates from these addresses. Such feature extraction is easy in conode: we can simply use the dedicated agent in the right-hand side (RHS) drawer.
- Select the "Address" feature from the Home view
- Open the "Extract Features" agent from the RHS drawer
- Hit "Extract Coordinates"
Views opened by geospatial coordinate extraction agent
The agent will calculate the latitude and longitude for each data point, then open the following two views:
- "Extracted Geospatial Coordinates" contains the two new features, alongside the data points they describe
- "Spatial view longitude vs latitude" displays the data points on the spatial map
Open Graph with Extracted Coordinates in conode
As the coordinate feature nodes are all we need at this stage, we ensure they're stored in our "Home" view and close these two views for now.
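For readers who want to see the shape of this step outside conode, here is a minimal sketch of address-to-coordinate extraction in Python. It is not conode's implementation: the geocoder is a hypothetical callable you supply (for example, a wrapper around whichever geocoding service you use), and the "Address" key simply mirrors the feature name in Notes Taken. Like the agent's "Extracted Geospatial Coordinates" view, each output row keeps the original features alongside the two new ones.

```python
from typing import Callable, Optional

# Hypothetical geocoder signature: takes an address string and returns
# (latitude, longitude), or None when the address cannot be resolved.
Geocoder = Callable[[str], Optional[tuple[float, float]]]


def extract_coordinates(notes: list[dict], geocode: Geocoder) -> list[dict]:
    """Return a copy of each note with 'latitude' and 'longitude' features added."""
    enriched = []
    for note in notes:
        result = geocode(note["Address"])
        row = dict(note)  # keep the original features alongside the new ones
        if result is None:
            # Keep unresolved addresses visible rather than silently dropping them.
            row["latitude"], row["longitude"] = None, None
        else:
            row["latitude"], row["longitude"] = result
        enriched.append(row)
    return enriched
```

Passing the geocoder in as a parameter keeps the sketch testable with a simple lookup table and lets you swap in a real service without touching the extraction logic.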
Auto-Fusing
Now that we have geospatial coordinates for all three of our datasets, we can fuse them using the Auto-Fusion agent in the RHS drawer.
- Select all nodes in the Home view
- Open the Auto-Fusion agent from the RHS drawer
- Hit "Auto-fuse"
Now that our three pairs of "latitude" and "longitude" features have been merged into a single pair, we can create a map view of our fused graph by selecting the nodes and hitting New View -> Scatter Plot (see creating scatter plots).
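Conceptually, the fusion above places every record from every source onto one shared pair of coordinate features. The sketch below shows that idea as a plain table union in Python; it is an illustration under stated assumptions, not how the Auto-Fusion agent works internally. It assumes each source is a list of dictionaries already using "latitude" and "longitude" keys, and the "source" column is a hypothetical tag added so you can still tell the datasets apart (much as node colour does on the map).

```python
def fuse_geospatially(sources, lat_key="latitude", lon_key="longitude"):
    """Union several tables onto one shared latitude/longitude pair."""
    fused = []
    for source_name, rows in sources.items():
        for row in rows:
            lat, lon = row.get(lat_key), row.get(lon_key)
            if lat is None or lon is None:
                continue  # a record without coordinates cannot be placed on the map
            # Carry over every other feature unchanged so nothing is lost in fusion.
            fused_row = {k: v for k, v in row.items() if k not in (lat_key, lon_key)}
            fused_row.update(source=source_name, latitude=float(lat), longitude=float(lon))
            fused.append(fused_row)
    return fused
```

Records whose coordinates could not be extracted are skipped here; in practice you may prefer to collect them separately and fix their addresses.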
By working directly on our graph alongside AI agents, we have managed to transform and fuse our three datasets in just a few clicks and a few seconds!
Jump straight to this fused graph via the following link: Open Fused Graph in conode
Exploring the Fused Graph
All of the datasets that could tell the story of risk are now present on the same risk map, so we can work with them in one place:
- Select the media nodes (large, blue nodes) and hit the shortcut "w" to watch the videos and images.
- Select the written notes (large, pink nodes) and hit "." to open the node properties, where you can read through each scenario description.
- Open the Inspect agent while selecting any of the nodes representing accident reports (small, black nodes) to browse through all the details collected in the incident reports.
Note that node size here represents the number of fatalities recorded for each incident, as a suggested measure of risk. You can resize and recolour nodes as you wish by editing the node properties.
conode enables you to interact directly with all of your unstructured, structured, and mixed-media data sources, from a single fused view.
Cluster Datasets
Let's identify the high-risk hotspots around the country. To do so, we simply select all the data points plotted on the map, open the Clustering agent from the RHS drawer, and enter the number of clusters we wish to consider.
Running clustering algorithms across heterogeneous datasets is as simple as a single button click in conode
In this example, we split the risk map into 20 clusters, create a barplot to compare their relative sizes, use Colour by Group to visualise the clusters across the map, then focus on the top 5 clusters by removing the rest from the view.
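To make the clustering step concrete, here is a minimal k-means sketch in plain Python over (latitude, longitude) pairs, followed by a helper that ranks clusters by size the way the barplot does. Two assumptions apply: we do not know which algorithm conode's Clustering agent uses, so k-means is shown here as one common choice; and "top 5" is taken to mean the five largest clusters. Euclidean distance on raw degrees is a simplification that is tolerable within a single city but distorts distances over larger areas.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on (latitude, longitude) pairs; returns a cluster label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels


def top_clusters(labels, n=5):
    """Rank clusters by member count (as the barplot does) and keep the n largest."""
    return [label for label, _ in Counter(labels).most_common(n)]
```

Usage mirrors the workflow: `labels = kmeans(points, 20)` followed by `top_clusters(labels, 5)`.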
Take a look at the clustered graph in conode: Explore Clustered Spatial View in conode
Exporting and Cleaning Up Our Spatial View
Before proceeding, let's save our work so far and tidy up the nodes.
To export our "Top Hotspots Barplot" view:
- Select all nodes in the barplot view
- Navigate to Export -> Selected subgraph as .csv to export our node groupings into a .csv table
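Once exported, the table can be picked up by other tools. As a small sketch, the Python below reads an exported CSV and counts how many nodes fall in each group, which reproduces the barplot's numbers outside conode. The column name "Group" is an assumption for illustration; check the header row of your own export and pass the matching name.

```python
import csv
import io
from collections import Counter


def count_nodes_per_group(csv_text, group_column="Group"):
    """Count how many exported nodes fall into each group (cluster)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[group_column] for row in reader)
```

For a file on disk, read it with `open(path, newline="")` and pass its contents as `csv_text`.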
Note that you can also share the view directly with a colleague, share the full graph, or simply edit its appearance further and take a screenshot to add to a presentation. See How to export and share graphs.
To clean up the appearance of our map view, we remove the Cluster nodes by first searching (cmd-f or ctrl-f) for "Cluster", then using Remove from view, located in the right-click context menu. We also revert the node colouring by selecting all the nodes and choosing black from the colour palette, located on the right-hand side of the top menu bar.
Extract Features from Free Text
The nodes which represent Media and Notes have descriptive labels, containing important information about road risk that we may want to extract. To do so, we can use the in-app LLM agent to ask natural language questions of these data.
- Select the nodes with descriptive labels you wish to ask questions of
- Open the "Extract Features" agent from the RHS agents drawer
- Enter a prompt in the LLM text box
The agent will run, then open a new view titled "Extracted Features" containing the new feature nodes alongside the data points that contained each feature. Explore and cross-highlight with the map view to identify patterns in hazards observed at each recorded event.
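The agent relies on an LLM and a free-form prompt, so its output cannot be reproduced exactly in a few lines. As a rough, clearly simplified stand-in, the Python below uses keyword matching to tag descriptive labels with hazard features and then groups node IDs under each feature, similar in shape to the "Extracted Features" view. The hazard vocabulary is hypothetical, chosen only to echo the hazards discussed in this workflow; an LLM will catch paraphrases that a fixed keyword list misses.

```python
import re

# Hypothetical hazard vocabulary; a keyword list is a crude substitute for an LLM prompt.
HAZARD_KEYWORDS = {
    "cones": ["cone", "cones", "roadworks"],
    "animals": ["dog", "cat", "bird", "sheep", "animal"],
    "e-scooter": ["e-scooter", "scooter"],
}


def extract_hazards(label, vocabulary=HAZARD_KEYWORDS):
    """Return the sorted hazard features whose keywords appear in a node label."""
    # Tokenise into lowercase words, keeping hyphenated words such as "e-scooter" whole.
    words = set(re.findall(r"[a-z]+(?:-[a-z]+)*", label.lower()))
    return sorted(h for h, keys in vocabulary.items() if words & set(keys))


def group_by_feature(labels):
    """Map each extracted feature to the IDs of the nodes whose labels contain it."""
    grouped = {}
    for node_id, label in labels.items():
        for hazard in extract_hazards(label):
            grouped.setdefault(hazard, []).append(node_id)
    return grouped
```

The grouped output can then be cross-referenced with the map, in the same spirit as cross-highlighting the agent's view with the spatial view.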
Extract features from your unstructured data sources in seconds and with no code: simply ask questions of your data in natural language and the in-app agents will do the hard work for you
Comparing the Datasets
With our fused graph and newly extracted feature nodes, we can now explore and compare these sources of road risk. The goal is to uncover patterns of hazardous behaviour that could inform an autonomous vehicle route risk assessment, or a data-driven decision about resource allocation for road infrastructure improvements across Auckland.
Across the written observations and media recordings, scenarios involving cones and animals stand out, yet they are absent from the Crash Analysis System (CAS) records in Auckland. Consequently, the risk space depicted in the near-miss datasets differs from that represented in the collision database.
Furthermore, the locations where scenarios involving e-scooters were observed demonstrate minimal overlap with areas reported in police records (CAS collisions).
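One way to put a number on "minimal overlap" is to measure what fraction of the observed e-scooter scenarios lie within some radius of at least one recorded collision. The Python below sketches this with the haversine great-circle distance. The 0.5 km radius is an arbitrary assumption for illustration, and the brute-force pairwise check is fine for a few thousand points but would need a spatial index for much larger datasets.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def overlap_fraction(observations, collisions, radius_km=0.5):
    """Share of observations lying within radius_km of at least one recorded collision."""
    if not observations:
        return 0.0
    near = sum(
        1 for obs in observations
        if any(haversine_km(obs, c) <= radius_km for c in collisions)
    )
    return near / len(observations)
```

A low fraction supports the observation above; varying `radius_km` shows how sensitive that conclusion is to what you count as "the same place".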
These insights emphasise the importance of considering more than just the publicly available accident databases when performing a comprehensive risk assessment.
Summary
In this workflow we have covered how conode can be used to extract and transform features from heterogeneous datasets so that they can be fused and quickly analysed in one place. Using AI agents and no-code queries directly on the graph, we uncovered valuable insights that would not have been visible had each dataset been considered alone.
We hope conode can enable more individuals across New Zealand to make faster, fully informed, data-driven decisions from heterogeneous datasets related to road risk, helping to accelerate New Zealand's Road to Zero strategy.