
Exploring Heterogeneous Data from Across the Globe

The Dataset

The following workflow uses data from the Autonomous Vehicle Sandbox, which should be available in the "Example Graphs" section of your homepage. You can also open a copy of the graph using the following link:

Open Autonomous Vehicle Sandbox in Edge →

The AV Sandbox dataset fuses structured and unstructured data: static video from accident hotspots, dash-cam videos, media articles, and simulated recreations of risky events. Together, these map out the tricky events and phenomena that autonomous vehicles face on the road.

Let's begin our exploration with a high-level overview of this dataset, using the spatial view "Example Data Sources - Spatial View".

  1. Open the spatial view using the Navigation drawer on the left-hand side (LHS) of your screen
  2. Zoom using your mouse wheel or an up/down two-finger motion on your trackpad
  3. Pan within a view by right-clicking and dragging, or by holding down the space bar while moving your mouse around the view
  4. Hover your mouse over nodes to read their labels
  5. Count how many traffic incidents took place in different regions by highlighting groups of nodes and reading the node count displayed at the bottom left of your screen.
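
Highlighting a region and reading the node count amounts to counting the scenarios that fall inside the area you selected. The following is a minimal sketch, assuming each scenario node carries a latitude and longitude and that the selection is a simple latitude/longitude bounding box. The `Scenario` record, `count_in_box` helper and sample points are hypothetical and are not part of Edge.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    # Hypothetical scenario record: an ID plus the coordinates used to place it on the map
    node_id: str
    lat: float
    lon: float


def count_in_box(scenarios, lat_min, lat_max, lon_min, lon_max):
    """Count scenarios whose coordinates fall inside a lat/lon bounding box (edges inclusive)."""
    return sum(
        1
        for s in scenarios
        if lat_min <= s.lat <= lat_max and lon_min <= s.lon <= lon_max
    )


# Illustrative points only, not taken from the AV Sandbox
scenarios = [
    Scenario("a", 51.5, -0.1),   # London
    Scenario("b", 53.5, -2.2),   # Manchester
    Scenario("c", 38.6, -90.2),  # St. Louis
]

# A rough box around Great Britain
print(count_in_box(scenarios, 49.9, 58.7, -8.2, 1.8))  # prints 2
```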

Read, watch and explore each data source

Key details about these traffic incidents, such as their location, data type and source, are stored in a taxonomy, such as the one in the view "Example Data Sources - Taxonomy View".

What is a Taxonomy View?

The nodes in this taxonomy represent scenario features, such as details of the local environment or the type of entities involved in the incident. Each taxonomy node connects via an edge to the individual scenarios that contain that feature. For example, the bus hazard node points to all traffic incidents that involved a bus. We call our taxonomy nodes “tags” if they are categorical (e.g. junction type) and “metrics” if they are continuous (e.g. the speed of a vehicle).
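
To make the tag/metric distinction concrete, here is one possible in-memory model of such a taxonomy. It is a sketch under stated assumptions, not Edge's internal representation: categorical tags are stored as edges from a tag node to the scenarios that contain it, and continuous metrics as a numeric value per scenario. All names and values (`tag_edges`, `metric_values`, `incident_1`, and so on) are hypothetical.

```python
from collections import defaultdict

# Hypothetical model of a taxonomy; names and values are illustrative only.
# Categorical "tags": each tag node has edges pointing to the scenarios that contain the feature.
tag_edges: dict[str, set[str]] = defaultdict(set)
# Continuous "metrics": each metric maps a scenario to a numeric value.
metric_values: dict[str, dict[str, float]] = defaultdict(dict)


def add_tag(tag: str, scenario: str) -> None:
    """Add an edge from a categorical tag node to a scenario node."""
    tag_edges[tag].add(scenario)


def add_metric(metric: str, scenario: str, value: float) -> None:
    """Attach a continuous metric value to a scenario."""
    metric_values[metric][scenario] = value


add_tag("bus hazard", "incident_1")
add_tag("bus hazard", "incident_3")
add_tag("x intersection", "incident_1")
add_metric("vehicle speed", "incident_1", 42.0)

print(sorted(tag_edges["bus hazard"]))  # ['incident_1', 'incident_3']
```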

We use such taxonomies to organise and analyse our database. For example, to find out which data points represent scenarios involving a certain feature, we need only follow two steps:

  1. Highlight the taxonomy node of interest
  2. Press the down-arrow key on your keyboard (or the Select Successors button in the Node(s) menu) to select the successors of your taxonomy node.
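
Conceptually, selecting successors is a one-hop traversal along a node's outgoing edges. The sketch below models the graph as a plain Python dictionary of node to out-neighbours; the node names are hypothetical, and the union over several highlighted nodes is an assumption about how the selection combines, not a description of Edge's implementation.

```python
# Hypothetical directed graph: each key is a node, each value the set of nodes its edges point to.
# Taxonomy nodes point to the scenarios that contain their feature; scenarios have no out-edges here.
graph = {
    "bus hazard": {"incident_1", "incident_3"},
    "pedestrian hazard": {"incident_2", "incident_3"},
    "incident_1": set(),
    "incident_2": set(),
    "incident_3": set(),
}


def select_successors(graph, nodes):
    """Return every node reached by following one outgoing edge from any node in `nodes`."""
    selected = set()
    for node in nodes:
        selected |= graph.get(node, set())
    return selected


print(sorted(select_successors(graph, {"bus hazard"})))  # ['incident_1', 'incident_3']
```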

All the traffic incidents that contain your chosen annotation will now be highlighted. We will now use these two actions to dive into examples of each data source in our database.

Within the "Example Data Sources - Taxonomy View" you will find the Data Source taxonomy category, which identifies the origin of the data for each real-world traffic scenario in the spatial view. For example, all blue-purple nodes represent videos from high-risk locations, green nodes contain written descriptions of incidents, and black nodes represent collisions reported in structured accident databases. We can use the actions outlined above to find out where each type of data comes from across the globe, and then take a look at specific examples.

Tabular Databases

Written reports

Videos

Finding out everything there is to know about each data point

By reading node labels, watching their videos, and checking out their URLs, we get a holistic and context-rich understanding of each individual traffic incident. We can go one step further in finding out everything there is to know about our data points by gathering all their connected annotation nodes:

  1. Create a new view
  2. Copy and paste any node of interest into the new view
  3. Turn on the labels
  4. Select the node and gather all its predecessors using right-click -> Get -> Predecessors union. All features of your chosen collision will now appear in the view
  5. Lay out the nodes using the force-directed or horizontal layout button. You can now see all features that were included in the original data source, as well as those automatically extracted by dRISK’s NLP or CV pipelines.

Alternatively, highlight your nodes of interest in the spatial view and press Select Predecessors in the Select menu. All relevant scenario features will now be highlighted in the taxonomy view. You can move these aside and/or turn on their labels for a quick overview.
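
Gathering a scenario's features is the reverse of selecting successors: we collect every node with an edge pointing into the chosen scenarios. The sketch below assumes taxonomy edges stored as annotation to scenario sets; `predecessors_union` and all node names are hypothetical illustrations, not Edge functions.

```python
# Hypothetical taxonomy edges: annotation node -> scenario nodes that carry that feature.
edges = {
    "bus hazard": {"incident_1"},
    "x intersection": {"incident_1", "incident_2"},
    "source: written report": {"incident_2"},
    "source: video": {"incident_1"},
}


def predecessors_union(edges, targets):
    """Return every node with an edge pointing into at least one node in `targets`."""
    return {src for src, dsts in edges.items() if dsts & targets}


print(sorted(predecessors_union(edges, {"incident_1"})))
# ['bus hazard', 'source: video', 'x intersection']
```

Because the result is a union, highlighting several scenarios returns every feature that appears on at least one of them.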

Query the database

We have now seen that using a geospatial view alongside a taxonomy gives a full picture of our database while maintaining traceability to each data point's source. By exploring each node's content and connected annotations, we can understand more deeply what it represents. Now let’s use Edge to quickly answer some questions about this dataset.

Where did collisions involving pedestrians at crossroads take place?

To answer this, we will grab the pedestrian hazard and x intersection annotations, then select the intersection of their successors:

  1. Locate the annotations of interest using the search tool (cmd-f on Mac, ctrl-f on Windows) or by panning and zooming
  2. Highlight both annotations by selecting one, then the other while holding down shift
  3. Navigate to the Select menu at the top of your screen, then press Select Successors Intersection
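
Select Successors Intersection keeps only the scenarios reachable from every highlighted annotation, i.e. a set intersection of their successor sets. The following sketch assumes the same annotation-to-scenario edge model; the edge data and the `successors_intersection` helper are hypothetical.

```python
# Hypothetical taxonomy edges: annotation node -> scenarios containing that feature.
edges = {
    "pedestrian hazard": {"incident_1", "incident_2", "incident_4"},
    "x intersection": {"incident_2", "incident_3", "incident_4"},
    "source: video": {"incident_4", "incident_5"},
}


def successors_intersection(edges, annotations):
    """Return scenarios that are successors of every annotation in `annotations`."""
    successor_sets = [edges.get(a, set()) for a in annotations]
    if not successor_sets:
        return set()
    # set.intersection(s1, s2, ...) returns a new set, so `edges` is never mutated
    return set.intersection(*successor_sets)


hits = successors_intersection(edges, ["pedestrian hazard", "x intersection"])
print(sorted(hits), len(hits))  # ['incident_2', 'incident_4'] 2

# Adding a data-source annotation narrows the subset further
print(sorted(successors_intersection(edges, ["pedestrian hazard", "x intersection", "source: video"])))
# ['incident_4']
```

The length of the result plays the role of the node count shown in the view border.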

The scenarios that took place at a crossroad intersection and involved a pedestrian hazard will now be highlighted in your spatial view. You can read the total number of such incidents from the node count at the bottom left of the view border.

We now know how many instances of this particular scenario type are in our database, and exactly where they took place. You can use the above steps to locate any subset of your dataset based on the annotations you care about. For example, to filter for incidents that originated from videos or written reports, simply include that annotation in your selection (Step 2).

At which type of junction do most collisions occur in Missouri?

  1. Create a new view for our investigation and relabel it accordingly
  2. Gather all the incidents which took place in Missouri using a node label search, and paste them into our new view
  3. Highlight, copy and paste all the junction type annotations from the "Example Data Sources - Taxonomy View" into our new view
  4. We now have everything we need, so we can close the Taxonomy and Spatial views. Remember you can always open them again using the Navigation drawer on the LHS of your screen.
  5. To organise the scenario nodes by junction type, use the Horizontal layout button in the View menu
  6. Highlight everything in your view (cmd-a on Mac, ctrl-a on Windows) and select the Bar Plot button in the Tools menu
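
The bar plot is essentially a count of scenarios per junction-type tag over the subset found by the label search. The sketch below assumes scenario labels that mention the state and one junction-type tag per scenario; the labels, tags and helper functions are hypothetical, and the numbers are illustrative rather than FARS results.

```python
from collections import Counter

# Hypothetical scenario labels and junction-type tags; real labels may be formatted differently.
scenario_labels = {
    "s1": "FARS 2018 collision, Missouri",
    "s2": "FARS 2018 collision, Missouri",
    "s3": "FARS 2018 collision, Missouri",
    "s4": "FARS 2018 collision, Kansas",
}
junction_type = {  # junction-type tag assigned to each scenario
    "s1": "not at junction",
    "s2": "not at junction",
    "s3": "T intersection",
    "s4": "not at junction",
}


def label_search(labels, text):
    """Return scenario IDs whose label contains `text` (case-insensitive)."""
    return {sid for sid, label in labels.items() if text.lower() in label.lower()}


def junction_counts(scenario_ids, junction_type):
    """Count how many of the given scenarios carry each junction-type tag (the bar heights)."""
    return Counter(junction_type[sid] for sid in scenario_ids if sid in junction_type)


missouri = label_search(scenario_labels, "Missouri")
counts = junction_counts(missouri, junction_type)
print(counts.most_common(1))  # [('not at junction', 2)]
```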

The bar plot shows that the majority of collisions did not take place at a junction. There were 116 in total in this subset of 2018 FARS data, and you can see their exact locations on the map by highlighting each scenario node.

If you wish to return to this bar plot later, give the view a more memorable name before closing it. Alternatively, delete these views using the Delete View button in the View menu.

How many collisions involving 3+ vehicles took place in the UK?

  1. Create a new view for our investigation and relabel it accordingly
  2. Gather all the incidents which took place in the UK by highlighting them in the spatial view, and paste them into our new view
  3. Lay out the data points according to how many vehicles were involved in each incident, by assigning the vehicle_count metric to the y axis
  4. Highlight all those with y-values of 3 or more
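
Laying out nodes with vehicle_count on the y axis and highlighting everything at 3 or above is equivalent to a threshold filter on that metric. The sketch below assumes the metric is available as a mapping from scenario ID to value; the IDs, values and `at_least` helper are hypothetical, and the same filter would apply to a casualty count or any other metric.

```python
# Hypothetical vehicle_count metric values for scenarios gathered from the UK (illustrative only).
vehicle_count = {
    "uk_1": 1,
    "uk_2": 2,
    "uk_3": 3,
    "uk_4": 5,
}


def at_least(metric, threshold):
    """Return scenario IDs whose metric value is greater than or equal to `threshold`."""
    return {sid for sid, value in metric.items() if value >= threshold}


multi_vehicle = at_least(vehicle_count, 3)
print(sorted(multi_vehicle), len(multi_vehicle))  # ['uk_3', 'uk_4'] 2
```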

Using the above steps, we quickly discover that 413 of the scenarios recorded in this subset of 2018 STATS19 data involved 3 or more vehicles. We could repeat this using the number of casualties, or any other metric of interest.


Last update: 2024-06-07