Fusing Unstructured Data to Inform Data Standards of AVs

Introduction

In this workflow we aim to take unstructured accident reports and extract references to objects of interest. We achieve this by exploring, extracting, refining, and fusing datasets with no-code graphical queries and helpful AI agents.

Value

While we hope that anyone will find this useful, this particular case study will be especially relevant to individuals in the transport industry.

You could be at a local transport authority, looking to identify where high-risk road infrastructure is located in order to minimise the risk of accidents.
You might work in the autonomous vehicles space and want a fast, comprehensible way to anticipate objects that might cause a collision.

Try with us!

Jump straight into Conode and follow along as we explore, extract, and fuse the datasets to focus on our objects of interest. If you are a new user, please follow the simple directions to create an account first, then feel free to use Conode to your heart's content!

Build your graph

Overview of Data

The 3 datasets we worked with are the Tesla Deaths dataset, the US fatal crash reports from the Fatality Analysis Reporting System (FARS), and the California Department of Motor Vehicles (DMV) dataset.

Tesla Deaths dataset

Contains information about deaths that involve Tesla cars. It details where each incident took place, the person(s) who died, a description of the incident, the news outlets that reported it, etc.
Data source.

FARS dataset

Contains information about fatal crashes in the US. It details the crash, the vehicles, location, the people involved, etc.
Data source.

DMV dataset

Contains California-specific information about drivers, licensing, vehicle registration, etc.
Data source.

Our Process

1. Upload Data

We first upload the Tesla Deaths dataset into Conode, which automatically generates a graph of features and a table to give us an overview of the data.

2. Extract Full Address & Use Geocode Agent

We can see that the data contains a state and country column, but not the exact location where the accident occurred, so we use the Conode AI Extract Agent to obtain the address from other nodes that might contain the information we require.

Once the address feature has been extracted, we use it as input to the Geocode agent, which extracts the longitude and latitude so that the data can be plotted in a map view.

Views opened by the Geocode agent

The agent will calculate the latitude and longitude for each data point, then open the following two views:

Extracted Coordinates: contains the two new features alongside the data points they describe
Spatial view longitude vs latitude: displays the data points on the spatial map
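
For readers who want to see the idea in code, the following is a minimal Python sketch of what a geocoding step conceptually produces: each record gains a latitude and a longitude feature. It is not Conode's implementation. The add_coordinates function, the TOY_GAZETTEER lookup, and the field names are all hypothetical, and the coordinates are rough illustrative values; a real workflow would call a geocoding service instead of a dictionary.

```python
from typing import Optional, Tuple

# Hypothetical lookup table standing in for a real geocoding service.
# Coordinates are rough, illustrative values only.
TOY_GAZETTEER = {
    "mountain view, ca": (37.39, -122.08),
    "fort lauderdale, fl": (26.12, -80.14),
}


def toy_geocode(address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a known address, else None."""
    return TOY_GAZETTEER.get(address.strip().lower())


def add_coordinates(records, geocode=toy_geocode, address_field="address"):
    """Return copies of the records with 'latitude' and 'longitude' added.

    Records whose address cannot be resolved get None for both features.
    """
    enriched = []
    for record in records:
        coords = geocode(record.get(address_field, ""))
        lat, lon = coords if coords else (None, None)
        enriched.append({**record, "latitude": lat, "longitude": lon})
    return enriched


if __name__ == "__main__":
    rows = [
        {"id": 1, "address": "Mountain View, CA"},
        {"id": 2, "address": "Unknown Road"},
    ]
    for row in add_coordinates(rows):
        print(row)
```

Keeping unresolved records with empty coordinates, rather than dropping them, mirrors the idea that the new features sit alongside the data points they describe.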

Understand more about building maps here.

Open view of extracted coordinates in Conode

3. Extract Road Infrastructure and Dynamic Objects

We are interested in the road infrastructure involved in accidents, so we ask the Conode AI Extract Agent to find references to it in the description data.

Extract objects of interest from your unstructured data in a matter of seconds and with no code! Simply use natural language to type in your request and the extract agent does the work for you.
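
To make the idea of extraction concrete, here is a minimal Python sketch that pulls object mentions out of free-text descriptions using plain keyword matching. This is a deliberately simple stand-in, not how the Conode Extract Agent works internally; the vocabulary, the extract_objects function, and the sample description are hypothetical.

```python
import re

# Hypothetical vocabulary of objects of interest; extend it to suit your data.
VOCABULARY = ["guardrail", "curb", "traffic light", "pedestrian", "truck", "tree"]


def extract_objects(description, vocabulary=VOCABULARY):
    """Return vocabulary terms mentioned in the description, in vocabulary order.

    Matching is case-insensitive, respects word boundaries, and accepts a
    simple plural 's' (so 'trees' matches 'tree').
    """
    found = []
    for term in vocabulary:
        pattern = r"\b" + re.escape(term) + r"s?\b"
        if re.search(pattern, description, flags=re.IGNORECASE):
            found.append(term)
    return found


if __name__ == "__main__":
    # Hypothetical description text, for illustration only.
    print(extract_objects("Tesla veers off road and hits two trees, then a guardrail"))
```

Keyword matching misses synonyms and paraphrases that a natural-language request can capture, which is the gap an AI extraction agent is meant to close.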

We’re also interested in whether these objects are static or dynamic in nature, so we follow the same process to obtain those features. With our spatial view, we can easily see exactly where on the map the accidents involving specific dynamic objects took place.

Open view of extracted road infrastructures and dynamic objects in Conode

4. Adding FARS dataset

Adding another dataset to your current graph is easy in Conode. In this step we import the FARS accidents dataset, and the application automatically produces relevant views, in this case a table and a spatial view.

5. Exploring the data with a map

We can explore the FARS dataset simply by looking at its road infrastructure features mapped on the spatial view.

Open view of FARS maps view in Conode

6. Extract Static & Dynamic Features from FARS

As the FARS dataset already has a feature that contains road infrastructure, we only have to ask the Extract Agent to classify its values into static or dynamic groups and see this reflected on the map. A quick bar plot can also show us how frequently accidents involving a certain static or dynamic object occur.
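
As an illustration of this classify-then-count step, the Python sketch below tags object labels as static or dynamic and tallies how often each tag appears. The OBJECT_NATURE mapping, the function names, and the labels are hypothetical and hand-written here; in Conode the grouping comes from the agent rather than from a fixed dictionary.

```python
from collections import Counter

# Hypothetical assignment of object labels to a static or dynamic tag.
OBJECT_NATURE = {
    "guardrail": "static",
    "curb": "static",
    "tree": "static",
    "pedestrian": "dynamic",
    "cyclist": "dynamic",
    "truck": "dynamic",
}


def classify(label):
    """Return 'static', 'dynamic', or 'unknown' for an object label."""
    return OBJECT_NATURE.get(label.strip().lower(), "unknown")


def frequency_by_nature(labels):
    """Count how many accident records fall under each tag."""
    return Counter(classify(label) for label in labels)


def text_barplot(counts):
    """Render counts as a crude text bar chart, most frequent first."""
    return "\n".join(f"{tag:<8} {'#' * n} {n}" for tag, n in counts.most_common())


if __name__ == "__main__":
    sample = ["Curb", "tree", "pedestrian", "boulder"]
    print(text_barplot(frequency_by_nature(sample)))
```

Labels that are missing from the mapping are counted as unknown so that gaps in the classification stay visible instead of being silently dropped.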

Open view of extracted objects from FARS dataset in Conode

7. Adding DMV dataset

We repeat step 4 to import our third dataset, the California-specific DMV dataset, and step 2 to extract the location from the address feature.

8. Extract Features from DMV dataset

Repeat step 3 to pull out our objects of interest: road infrastructure and static vs dynamic objects. The spatial map is useful for visually identifying where clusters of, for example, accidents involving a curb occur. Additionally, coloring our nodes helps to quickly differentiate whether an accident involved a static or a dynamic object.
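
The visual search for clusters can also be approximated numerically. The Python sketch below bins coordinates into a coarse grid and reports the cells that contain several accidents involving a given object. It is only one simple way to surface clusters, not something Conode is stated to do; the function names, the record fields, and the default cell size are assumptions.

```python
import math
from collections import Counter


def grid_cell(lat, lon, cell_deg=0.1):
    """Map a coordinate to the integer index of its grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def hotspot_cells(records, obj, cell_deg=0.1, min_count=2):
    """Return {cell: count} for cells holding at least min_count records
    whose 'object' field equals obj."""
    counts = Counter(
        grid_cell(r["latitude"], r["longitude"], cell_deg)
        for r in records
        if r.get("object") == obj
    )
    return {cell: n for cell, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    records = [
        {"latitude": 37.2, "longitude": -122.4, "object": "curb"},
        {"latitude": 37.8, "longitude": -122.1, "object": "curb"},
        {"latitude": 34.0, "longitude": -118.2, "object": "curb"},
    ]
    print(hotspot_cells(records, "curb", cell_deg=1.0))
```

A fixed grid is crude because a cluster that straddles a cell boundary is split in two, so the result is best treated as a quick cross-check on what the map shows.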

Open view of extracted objects from DMV dataset in Conode

9. Merge all 3 datasets

To obtain a high-level view of all the data we have so far, we can combine them into a single spatial view simply by copying and pasting the nodes.

Merging datapoints from different datasets on a map view is that simple!
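
In code terms, merging for a shared map amounts to stacking the records from each dataset and remembering where each one came from. The Python sketch below shows that idea under stated assumptions: the merge_for_map function, the shared field names, and the source tag are hypothetical and are not Conode features.

```python
def merge_for_map(datasets, shared_fields=("latitude", "longitude", "object", "nature")):
    """Combine records from several datasets into one list for a single map.

    `datasets` maps a dataset name to its list of records. Each merged
    record keeps only the shared fields and gains a 'source' field.
    Records without coordinates are skipped because they cannot be plotted.
    """
    merged = []
    for source, records in datasets.items():
        for record in records:
            if record.get("latitude") is None or record.get("longitude") is None:
                continue
            row = {field: record.get(field) for field in shared_fields}
            row["source"] = source
            merged.append(row)
    return merged


if __name__ == "__main__":
    # Hypothetical records, for illustration only.
    combined = merge_for_map({
        "tesla": [{"latitude": 37.39, "longitude": -122.08, "object": "barrier"}],
        "fars": [{"latitude": 26.12, "longitude": -80.14, "object": "tree", "nature": "static"}],
    })
    for row in combined:
        print(row)
```

Tagging each row with its source keeps the datasets distinguishable after they share one view, for example when coloring points by origin.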

Open view of merged datasets in Conode

10. Fuse Taxonomies of Features

We’ve extracted references to road infrastructure and their static or dynamic nature from all 3 datasets. To fuse them into a single taxonomy, we again employ the Extract Agent to create representative groups and segment them by the static or dynamic tag. What we’re presented with is a graph that shows how all the objects we extracted can be generally grouped.

Once again, by using the AI agent, we have managed to fuse objects of interest from all 3 datasets into a single view with only a few clicks.
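
The shape of a fused taxonomy can be sketched in a few lines of Python: labels that differ between datasets are mapped onto shared groups, and the groups are then split by the static or dynamic tag. The CANONICAL mapping, the group names, and the sample labels below are hypothetical and hand-written; in the workflow above the groups are proposed by the agent, not by a fixed table.

```python
from collections import defaultdict

# Hypothetical mapping from dataset-specific labels to (group, nature).
CANONICAL = {
    "guard rail": ("barrier", "static"),
    "guardrail face": ("barrier", "static"),
    "concrete barrier": ("barrier", "static"),
    "curb": ("road edge", "static"),
    "kerb": ("road edge", "static"),
    "pedestrian": ("vulnerable road user", "dynamic"),
    "pedalcyclist": ("vulnerable road user", "dynamic"),
    "semi truck": ("vehicle", "dynamic"),
}


def fuse_taxonomy(labels_by_dataset, canonical=CANONICAL):
    """Build {nature: {group: sorted raw labels}} across all datasets.

    Labels missing from the mapping land under ('unmapped', 'unknown') so
    that nothing is silently dropped.
    """
    fused = defaultdict(lambda: defaultdict(set))
    for labels in labels_by_dataset.values():
        for raw in labels:
            key = raw.strip().lower()
            group, nature = canonical.get(key, ("unmapped", "unknown"))
            fused[nature][group].add(key)
    return {
        nature: {group: sorted(members) for group, members in groups.items()}
        for nature, groups in fused.items()
    }


if __name__ == "__main__":
    taxonomy = fuse_taxonomy({
        "tesla": ["Guard rail", "pedestrian"],
        "fars": ["Guardrail Face", "Curb"],
        "dmv": ["kerb", "pothole"],
    })
    for nature, groups in taxonomy.items():
        print(nature, groups)
```

The nested result mirrors a taxonomy graph: the static or dynamic tag at the top, representative groups beneath it, and the original labels from each dataset as leaves.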

Open view of fused taxonomies in Conode

Summary

In this workflow we have covered how Conode can be used to extract features from unstructured datasets and fuse them easily to visualise all the information in a single view. Using the AI Geocode agent and the no-code feature extractor to act directly on the graph, we were able to produce observable views within an hour.

This can serve as a starting point for conversations with data developers about how to take this knowledge and translate it into policy making. With a fused, data-rich graph of key extracted features, users will be able to analyse potential locations of road risk and the type and nature of the infrastructure involved.

Last update: 2025-01-03