Exploring Heterogeneous Data from Across the Globe
The Dataset
The following workflow uses data from the Autonomous Vehicle Sandbox, which should be available in the "Example Graphs" section of your homepage. You can also access a copy of the graph by selecting the following link:
Open Autonomous Vehicle Sandbox in conode →
The AV sandbox dataset contains a fusion of structured and unstructured datasets, static video data from accident hotspots, dash-cam videos, media articles, and simulated recreations of risky events to map out the tricky events and phenomena autonomous vehicles face on the road.
Navigate through the landscape of risk
Let's begin our exploration with a high-level overview of this dataset, using the following spatial view: "Example Data Sources - Spatial View".
- Open the spatial view using the Navigation drawer on the LHS of your screen
- Zoom using your mouse wheel or the up/down motion of two fingers on your trackpad
- Pan within a view by right-clicking and dragging, or by holding down the space bar and moving your mouse around in the view
- Hover your mouse over nodes to read their labels
- Count how many traffic incidents took place in different regions by highlighting groups of nodes and observing the count displayed on the bottom LHS of your screen.
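Highlighting a rectangular group of nodes and reading the count is equivalent to counting points inside a bounding box. A minimal Python sketch, assuming each incident carries a latitude and longitude; the `incidents` mapping, its coordinates and the `count_in_box` helper are hypothetical illustrations, not part of conode:

```python
# Hypothetical incidents with made-up (latitude, longitude) positions.
incidents = {
    "scenario_1": (51.5, -0.1),
    "scenario_2": (53.5, -2.2),
    "scenario_3": (38.6, -90.2),
}


def count_in_box(points, lat_min, lat_max, lon_min, lon_max):
    """Count the points whose coordinates fall inside the bounding box."""
    return sum(
        1
        for lat, lon in points.values()
        if lat_min <= lat <= lat_max and lon_min <= lon <= lon_max
    )


# Rough, illustrative box drawn around the British Isles.
print(count_in_box(incidents, 49.0, 61.0, -11.0, 2.0))
```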
Read, watch and explore each data source
Key details about these traffic incidents, such as their location, data type and source, are stored in a taxonomy like the one in the view "Example Data Sources - Taxonomy View".
What is a Taxonomy View?
The nodes in this taxonomy represent scenario features such as details of the local environment or type of entities involved in the incident, and connect via an edge to the individual scenarios which contain that feature. For example, the bus hazard node will point to all traffic incidents which involved a bus. We dub our taxonomy nodes “tags” if they are categorical (e.g. junction type) and “metrics” if they are continuous (e.g. the speed of a vehicle).
We use such taxonomies to organise and analyse our database. For example, if we want to find out which of our data points represent scenarios that involved a certain feature, we need only follow these two steps:
- Highlight the taxonomy node of interest
- Press the down-arrow (↓) key on your keyboard (or the Select Successors button within the Node(s) menu) to select the successors of your taxonomy node.
All the traffic incidents which contain your chosen annotation will now be highlighted. We will now use these two actions to dive into examples of each data source in our database.
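Selecting successors amounts to a lookup on a directed graph in which each annotation points to the scenarios that contain it. A minimal Python sketch, assuming the taxonomy is held as a mapping from annotation to scenarios; the `edges` dictionary, the scenario identifiers and the `select_successors` helper are hypothetical stand-ins for illustration, not conode's API:

```python
# Hypothetical toy taxonomy: each annotation node points to the scenarios
# that contain that feature. Node names and scenario IDs are made up.
edges = {
    "bus hazard": {"scenario_1", "scenario_4"},
    "pedestrian hazard": {"scenario_2", "scenario_4"},
}


def select_successors(graph, node):
    """Return the set of nodes that `node` points to (empty if it has none)."""
    return set(graph.get(node, set()))


highlighted = select_successors(edges, "bus hazard")
print(sorted(highlighted))
```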
Within the "Example Data Sources - Taxonomy View" you will find the Data Source taxonomy category, which identifies the origin of the data for each real-world traffic scenario in the spatial view. For example, all blue-purple nodes represent video from high-risk locations, green nodes contain written descriptions of incidents, and black nodes represent collisions reported in structured accident databases. We can use the actions outlined above to find out where each type of data is located across the globe, and then take a look at specific examples.
Tabular Databases
Written reports
Videos
Finding out everything there is to know about each data point
By reading node labels, watching their videos, and checking out their URLs, we get a holistic and context-rich understanding of each individual traffic incident. But we can go one step further in finding out everything there is to know about our data points by gathering all their connected annotation nodes:
- Create a new view
- Copy and paste any node of interest into the new view
- Turn on the labels.
- Select the node, and gather all Predecessors using right-click -> Get -> Predecessors union. All features of your chosen collision will now appear in the view.
- Lay out the nodes using the force-directed or horizontal layout button. You can now see all features which were included in the original data source, as well as those automatically extracted using dRISK’s NLP or CV pipeline.
Alternatively, simply highlight your nodes of interest from the spatial view, and in the Select menu bar press Select Predecessors. All relevant scenario features will now be highlighted in the taxonomy view. You can move these aside and/or turn on all their labels for a quick overview.
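Gathering predecessors is the reverse lookup: for a set of scenarios, collect every annotation that points to at least one of them. A minimal Python sketch under the same assumption of an annotation-to-scenario mapping; the `edges` data and the `predecessors_union` helper are hypothetical and do not reflect how conode implements the action:

```python
# Hypothetical toy taxonomy: annotation -> scenarios containing that feature.
edges = {
    "bus hazard": {"scenario_1", "scenario_4"},
    "x intersection": {"scenario_2", "scenario_4"},
    "video": {"scenario_1"},
}


def predecessors_union(graph, selected):
    """Return every node with an edge into at least one selected node."""
    selected = set(selected)
    return {source for source, targets in graph.items() if targets & selected}


# All features attached to a single scenario of interest.
print(sorted(predecessors_union(edges, {"scenario_4"})))
```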
Query the database
We have now seen that using a geospatial view alongside a taxonomy can give a full picture of our database while maintaining traceability to each data point's source. By exploring the content of each node and its connected annotations, we can understand more deeply exactly what it represents. Now let’s use conode to quickly answer some questions about this dataset.
Where did collisions involving pedestrians at crossroads take place?
To answer this, we will grab the pedestrian hazard and x intersection annotations, then select the intersection of their successors:
- Locate the annotations of interest using the search tool (cmd-f on Mac, ctrl-f on Windows) or by panning and zooming
- Highlight the annotations by selecting one followed by the other while holding down shift
- Navigate to the Select menu at the top of your screen, then press Select Successors Intersection
The scenarios which took place at a crossroads and involved a pedestrian hazard will now be highlighted in your spatial view. You can read the total number of such incidents in the node count at the bottom LHS of the view border.
We now know how many instances of this particular scenario type we have in our database, and where exactly they took place. You can use the above steps to locate any subset of your dataset based on annotations you care about. For example, if you wished to filter by the incidents which originated from videos, or written reports, you can simply include that annotation in your search (Step 2).
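The successors-intersection query can be pictured as a set intersection over the scenarios each selected annotation points to. A minimal Python sketch, assuming an annotation-to-scenario mapping; the `edges` data and the `successors_intersection` helper are hypothetical illustrations rather than conode's own implementation:

```python
# Hypothetical toy taxonomy: annotation -> scenarios containing that feature.
edges = {
    "pedestrian hazard": {"scenario_2", "scenario_4", "scenario_7"},
    "x intersection": {"scenario_4", "scenario_5", "scenario_7"},
    "video": {"scenario_4"},
}


def successors_intersection(graph, annotations):
    """Return the scenarios that every one of the given annotations points to."""
    successor_sets = [set(graph.get(name, set())) for name in annotations]
    if not successor_sets:
        return set()
    return set.intersection(*successor_sets)


matches = successors_intersection(edges, ["pedestrian hazard", "x intersection"])
print(len(matches), sorted(matches))
```

Adding a further annotation to the list, such as a data source, narrows the result in the same way as including it in the selection.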
At which type of junction do most collisions occur in Missouri?
- Create a new view for our investigation and relabel it accordingly
- Gather all the incidents which took place in Missouri using a node label search, and paste them into our new view
- Highlight, copy and paste all the junction type annotations from the "Example Data Sources - Taxonomy View" into our new view
- We now have everything we need, so we can close the Taxonomy and Spatial views. Remember you can always open them again using the Navigation Drawer on the LHS of your screen.
- To organise the scenario nodes by junction type, we can use the Horizontal layout button from the View menu
- Highlight everything in your view (cmd-a on a Mac, ctrl-a on Windows) and select the Bar Plot button in the Tools menu
It is clear that the majority of collisions did not take place at a junction. There were 116 in total from this subset of 2018 FARS, and you can see their exact locations on the map by highlighting each scenario node.
If you wish to return to this bar plot, you can now give the view a more memorable name and close it. Alternatively, delete these views using the Delete View button in the View menu.
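The bar plot is, in effect, a count of scenarios per junction type. A minimal Python sketch of that tally, assuming each scenario is linked to exactly one junction type annotation; the `junction_of` mapping, its scenario IDs and junction names, and the `junction_counts` helper are made up for illustration and are not the FARS figures quoted above:

```python
from collections import Counter

# Hypothetical scenario -> junction type links (values are illustrative only).
junction_of = {
    "m1": "not at junction",
    "m2": "not at junction",
    "m3": "x intersection",
    "m4": "t junction",
    "m5": "not at junction",
}


def junction_counts(links):
    """Count how many scenarios are attached to each junction type."""
    return Counter(links.values())


counts = junction_counts(junction_of)
# Print a simple text bar chart, most common junction type first.
for junction, n in counts.most_common():
    print(f"{junction:<16} {'#' * n} {n}")
```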
How many collisions involving 3+ vehicles took place in the UK?
- Create a new view for our investigation and relabel it accordingly
- Gather all the incidents which took place in the UK by highlighting them in the spatial view, and paste them into our new view
- Lay out the data points according to how many vehicles were involved in the incident, by assigning the vehicle_count metric as the y axis
- Highlight all those with y-values of 3+
Using the above steps we quickly discovered that 413 of the scenarios recorded in this subset of 2018 STATS19 involved 3 or more vehicles. We could repeat the above instead using the number of casualties, or any other metric of interest.
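Highlighting every node with a y-value of 3 or more is a threshold filter on a metric. A minimal Python sketch, assuming the vehicle_count metric is available as one number per scenario; the scenario IDs, the values and the `at_least` helper are hypothetical and unrelated to the STATS19 count quoted above:

```python
# Hypothetical vehicle_count metric values, one per scenario (made up).
vehicle_count = {"uk_1": 1, "uk_2": 3, "uk_3": 2, "uk_4": 5}


def at_least(metric, threshold):
    """Return the scenarios whose metric value is at or above the threshold."""
    return {scenario for scenario, value in metric.items() if value >= threshold}


multi_vehicle = at_least(vehicle_count, 3)
print(len(multi_vehicle), sorted(multi_vehicle))
```

The same helper works for any other numeric metric, such as a casualty count, by passing a different mapping.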