Extract features

Feature Extraction Agent

The feature extraction agent mines information from text data, either guided by a prompt or by looking for anything interesting. It can be used to featurize unstructured datasets (e.g. categorizing the genre of movie reviews) or to perform general text-processing tasks (e.g. converting between US state names and codes).

The agent can be used in two modes. The On selected button extracts features from the labels of all selected nodes (or their successors if only one is selected). The On features button treats the selected nodes as features and extracts features from their data nodes, using the header values as data. This mode can be used to combine multiple features, for example to combine street name, number and city into a full address.
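To make the shape of the On features mode concrete, here is a minimal sketch of combining several per-row feature values into one new feature. The agent itself relies on a language model guided by your prompt; the function `combine_features`, the header names and the template below are hypothetical and only stand in for that step.

```python
def combine_features(rows, template):
    """Build one combined value per data row from its feature values.

    rows: list of dicts mapping a feature header to that row's value.
    template: format string naming the headers to combine (hypothetical).
    """
    return [template.format(**row) for row in rows]


# Illustrative rows: each dict is one data node with three feature values.
rows = [
    {"number": "221B", "street": "Baker Street", "city": "London"},
    {"number": "10", "street": "Downing Street", "city": "London"},
]
addresses = combine_features(rows, "{number} {street}, {city}")
print(addresses)
```

A fixed template only works when every row follows the same pattern; a prompt-driven agent is useful precisely when the inputs are less regular than this.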

The resulting view shows the extracted features alongside your chosen input nodes. This currently uses OpenAI's GPT-4 model as a wrapper around dRISK Edge.

Example 1: Using the On selected mode to extract movie genres from descriptive plot overviews

Example 2: Using the On features button to combine address components.

Geospatial Coordinate Extraction

If your data contains address information, the geocoding agent can extract the corresponding latitude and longitude coordinates for each address, which makes it possible to visualize your dataset geospatially.

Select the nodes whose labels contain the address, then select Extract Coordinates from the Feature Extraction menu in the agents drawer. The agent will search for the most representative latitude and longitude coordinates for each address and annotate the nodes accordingly. Alternatively, if there is an address feature node, switch to the As features mode and select that single feature node. Each unique address will be geocoded and the coordinates applied to the data nodes.
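The idea of geocoding each unique address once and then applying the result to every matching data node can be sketched as follows. This is an illustration under assumptions, not the agent's implementation: `geocode_unique` and `fake_geocode` are hypothetical names, and the lookup table with approximate coordinates stands in for a real geocoding service.

```python
def geocode_unique(rows, address_key, geocode):
    """Geocode each distinct address once and annotate every row with the result.

    geocode: callable taking an address string and returning (lat, lon) or None.
    """
    cache = {}
    for row in rows:
        address = row[address_key]
        if address not in cache:
            cache[address] = geocode(address)  # one lookup per unique address

    annotated = []
    for row in rows:
        coords = cache[row[address_key]]
        new_row = dict(row)  # leave the input rows untouched
        if coords is not None:
            new_row["latitude"], new_row["longitude"] = coords
        annotated.append(new_row)
    return annotated, cache


# Stand-in geocoder (assumption): a fixed table of approximate coordinates.
LOOKUP = {"London, UK": (51.5074, -0.1278), "Paris, France": (48.8566, 2.3522)}
calls = []


def fake_geocode(address):
    calls.append(address)
    return LOOKUP.get(address)


rows = [
    {"id": 1, "address": "London, UK"},
    {"id": 2, "address": "Paris, France"},
    {"id": 3, "address": "London, UK"},
]
annotated, cache = geocode_unique(rows, "address", fake_geocode)
print(len(calls), "lookups for", len(rows), "rows")
```

Caching by address string keeps the number of lookups equal to the number of distinct addresses, which matters when many rows share the same location.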

Two views will open:

  • The "geocoded" taxonomy view displays the new latitude and longitude tags alongside the original location features.
  • The scatter plot "Spatial view latitude vs longitude" visualizes your address features.

Transferring Coordinates to Data Nodes

Note that the address feature nodes are now annotated with latitude and longitude, but the data (row) nodes that the address features point to are not yet annotated themselves. To position these data nodes at the right location, either use Propagate to transfer edges from the address nodes, or send them directly into the 'Spatial view' scatter plot and use the align by neighbours layout to move them to the right position, as shown in the following video.

Clustering

Create new features using the clustering agent, which acts on your selection or, when nothing is selected, on all nodes in your active view. When the number of clusters is specified, the clustering agent uses a k-means approach; when no input is given, DBSCAN is used.

Describer

The Describer tells you which predecessor nodes best “describe” your selection of nodes. By default, this description is made by comparison to the other “background” nodes in the active view; open the settings by clicking the button in the bottom right for more advanced options.

Describing predecessors are presented in a table alongside their importance: large values of importance, whether positive or negative, indicate that a given predecessor is a good description of the selection. Positive importance values indicate that the predecessor is more likely to connect to the selection of nodes, or does so with a greater edge weight, than to the background nodes; negative values indicate the converse.
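One way to build intuition for the sign and magnitude of importance is a toy score: the share of selected nodes a predecessor connects to, minus the share of background nodes it connects to. This is an assumption made purely for illustration; it is not the Describer's actual computation, it ignores edge weights, and the names `describe`, `edges`, `selection` and `background` are hypothetical.

```python
def describe(edges, selection, background):
    """Rank predecessors by how differently they link to selection vs background.

    edges: dict mapping a predecessor to the set of nodes it points to.
    Returns (predecessor, score) pairs, largest absolute score first.
    """
    scores = {}
    for predecessor, successors in edges.items():
        in_selection = len(successors & selection) / len(selection)
        in_background = len(successors & background) / len(background)
        scores[predecessor] = in_selection - in_background
    # Sort by magnitude: strongly negative scores are informative too.
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)


selection = {"a", "b"}
background = {"c", "d", "e", "f"}
edges = {
    "genre:comedy": {"a", "b", "c"},  # mostly points at the selection
    "genre:drama": {"c", "d", "e"},   # mostly points at the background
    "year:2020": {"a", "c", "d"},     # equally common in both groups
}
ranking = describe(edges, selection, background)
for predecessor, score in ranking:
    print(predecessor, score)
```

In this toy setup a predecessor that is equally common in both groups scores zero, which matches the reading that only large positive or negative values make a good description.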

To see all of the describing predecessors and their importances in a view, click “Create View”.


Last update: 2024-05-21