
Clean your data

Transform Numeric Features

When working with numeric variables, you can modify the values of their outgoing edges with operations such as Add, Multiply, Subtract, and Divide, using either another numeric feature or a numeric value as the operand.

Example 1

Consider a dataset detailing product features, including a column labeled "box width [m]". To convert these units into centimeters, simply select the corresponding node and apply the Multiply operation with a Numeric Value of 100.
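As a rough illustration of what this operation does to the edge weights, the following Python sketch multiplies every value by a constant. The list `box_width_m` and the helper `multiply` are hypothetical stand-ins for the column and the agent; they are not part of the product.

```python
# Hypothetical stand-in for the edge weights of the "box width [m]" node.
box_width_m = [0.25, 1.5, 2.0]

def multiply(values, numeric_value):
    """Multiply every edge weight by a constant numeric value."""
    return [value * numeric_value for value in values]

# Metres to centimetres: multiply by 100.
box_width_cm = multiply(box_width_m, 100)
print(box_width_cm)
```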

Example 2

Suppose you've just imported a dataset containing customer orders, with two numeric features in your taxonomy labeled "orders of product 1" and "orders of product 2". By selecting these nodes and utilizing the Add transformation with Headers, you can create a new node indicating the total number of product orders per customer.
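The effect of adding two features can be sketched as an element-wise sum per customer. The dictionaries and the `add_features` helper below are assumptions made for illustration only; the customer names and counts are invented.

```python
# Hypothetical edge weights, keyed by customer, for the two numeric features.
orders_product_1 = {"alice": 3, "bob": 0, "carol": 5}
orders_product_2 = {"alice": 1, "bob": 4, "carol": 2}

def add_features(first, second):
    """Sum two numeric features for every data node present in both."""
    shared = first.keys() & second.keys()
    return {node: first[node] + second[node] for node in shared}

# New feature: total number of product orders per customer.
total_orders = add_features(orders_product_1, orders_product_2)
print(total_orders)
```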

Transform Node Labels

You can modify node labels using simple pattern matching in the following transformations:

Replace

The Replace agent lets you update portions of the labels of all your selected nodes using pattern matching.

  • From: Specify the segment of the label you wish to update.
  • To: Define the replacement for the segment specified in "From".
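A minimal sketch of the From/To behaviour, assuming the pattern is matched as a literal substring (the exact matching rules of the Replace agent are not specified here). The labels and the `replace_in_labels` helper are hypothetical.

```python
# Hypothetical labels of the selected nodes.
labels = ["color_red", "color_blue", "size_large"]

def replace_in_labels(labels, from_text, to_text):
    """Replace every occurrence of from_text with to_text in each label.

    Assumption: literal substring matching, as done by str.replace.
    """
    return [label.replace(from_text, to_text) for label in labels]

renamed = replace_in_labels(labels, "color_", "colour: ")
print(renamed)
```

Labels that do not contain the "From" text are left unchanged.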

Split

The Split agent generates a new set of nodes, each containing a portion of the labels from your current selection, split based on the input provided in the Value field.
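The sketch below illustrates the idea with plain Python strings. It assumes that identical portions are merged into a single new node and that each new node keeps track of the labels it came from; both are assumptions for illustration, and `split_labels` is a hypothetical helper.

```python
# Hypothetical labels of the selected nodes.
labels = ["Paris, France", "Lyon, France", "Berlin, Germany"]

def split_labels(labels, value):
    """Split each label on `value` and group the source labels by portion."""
    new_nodes = {}
    for label in labels:
        for part in label.split(value):
            new_nodes.setdefault(part, []).append(label)
    return new_nodes

split_nodes = split_labels(labels, ", ")
for part, sources in split_nodes.items():
    print(part, "<-", sources)
```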

Transform Feature Types

You can modify the data types of features within your taxonomy using the Convert Nodes agent located in the Transformations drawer.

String Transformation:

To convert a numeric or datetime variable into a categorical variable, select the corresponding feature node and apply a string transformation. This action converts the outgoing edges from the header node into intermediate nodes, with labels representing the weight of the edges.

For instance, imagine transforming a metric like "maximum vehicle speed," which is directly connected to nodes representing collision events, with edge weights equal to the vehicle's speed during those events. After the transformation, the same header node "maximum vehicle speed" links to a new set of nodes whose labels show the vehicle speeds. These nodes, in turn, connect to the event nodes.
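The restructuring can be sketched with a small dictionary model. The event names and speeds are invented, `to_categorical` is a hypothetical helper, and the sketch assumes that events with the same speed share one intermediate node.

```python
# Before: header -> event nodes, with the speed stored as the edge weight.
weighted_edges = {"collision_1": 88.0, "collision_2": 120.5, "collision_3": 88.0}

def to_categorical(weighted_edges):
    """Turn edge weights into labelled intermediate nodes.

    Returns a mapping: intermediate node label -> connected event nodes.
    """
    intermediate = {}
    for event, weight in weighted_edges.items():
        intermediate.setdefault(str(weight), []).append(event)
    return intermediate

# After: header -> speed-labelled nodes -> event nodes.
speed_nodes = to_categorical(weighted_edges)
print(speed_nodes)
```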

Numerical Transformation:

You can convert a categorical variable into a numerical variable when all categorical values stored in node labels are numbers.

For example, consider a categorical header "number of items purchased," linked to three intermediate nodes labeled "10," "5," and "8". These nodes connect to data nodes representing rows in your database. Upon applying a numerical transformation, the "number of items purchased" header node will directly connect to the data nodes with edge weights of 10, 5, and 8 respectively.
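The reverse direction can be sketched in the same way. The row names are hypothetical, as is the `to_numeric` helper; the sketch raises an error when a label is not a number, which mirrors the stated precondition but is not necessarily how the product reports it.

```python
# Before: header -> intermediate nodes (labels) -> data nodes (rows).
intermediate_nodes = {"10": ["row_1"], "5": ["row_2", "row_3"], "8": ["row_4"]}

def to_numeric(intermediate_nodes):
    """Collapse numeric labels into edge weights on the data nodes."""
    weights = {}
    for label, rows in intermediate_nodes.items():
        value = float(label)  # raises ValueError if the label is not a number
        for row in rows:
            weights[row] = value
    return weights

# After: header -> data nodes, with the former label as the edge weight.
edge_weights = to_numeric(intermediate_nodes)
print(edge_weights)
```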

Note that for all transformations, enabling "In Place" applies changes directly to your current selection of nodes.

When the "In Place" box is unchecked, a new set of nodes will be created for each transformation and the new header node will be stored in the Home view under a "transformed_columns" header.
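The difference between the two modes can be sketched as follows. The dictionary model, the `apply_transformation` helper, and the name given to the new header are all assumptions for illustration; only the "transformed_columns" header comes from the description above.

```python
# Hypothetical taxonomy: header label -> edge weights.
taxonomy = {"box width [m]": [0.5, 2.0]}

def apply_transformation(taxonomy, header, func, in_place):
    """Apply func to every value under header; return the header that holds the result."""
    new_values = [func(value) for value in taxonomy[header]]
    if in_place:
        # Overwrite the current selection.
        taxonomy[header] = new_values
        return header
    # Keep the original and store the result under "transformed_columns".
    new_header = f"{header} (transformed)"  # hypothetical naming
    taxonomy.setdefault("transformed_columns", {})[new_header] = new_values
    return new_header

created = apply_transformation(taxonomy, "box width [m]", lambda v: v * 100, in_place=False)
print(taxonomy)
```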

Cleaning using the Feature Extractor Agent

Note that the Feature Extractor Agent can also be used to clean your taxonomy via natural language queries such as "Remove the punctuation from the labels of my selected nodes". New feature nodes will always be created in this case.
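For reference, the effect of the example query is roughly equivalent to the following Python sketch. It is not how the agent is implemented; the labels are invented, and "punctuation" is assumed to mean the ASCII characters in Python's `string.punctuation`.

```python
import string

# Hypothetical labels of the selected nodes.
labels = ["Hello, world!", "price: $10.50", "clean"]

# Translation table that deletes every ASCII punctuation character.
_REMOVE_PUNCTUATION = str.maketrans("", "", string.punctuation)

def strip_punctuation(label):
    """Return the label with ASCII punctuation removed."""
    return label.translate(_REMOVE_PUNCTUATION)

# New labels are produced; the original list is left untouched.
cleaned = [strip_punctuation(label) for label in labels]
print(cleaned)
```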

Propagate

The Propagate tool can be used to transfer labels, edges, and URLs from one group of nodes to another, provided they are connected by edges and intermediary nodes.

  • Source: Nodes containing the label, URL, or edge to be transferred.
  • Target: Nodes set to receive the updated label, URL, or edges.
  • Header: Required for edge propagation, these nodes possess outgoing edges intended for transfer.

Note that Propagate can only transfer properties between nodes that are at most four hops apart in the graph, where one hop is an edge plus the node it leads to.
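To check in advance whether two nodes fall within this limit, a breadth-first search over the graph is enough. The sketch below assumes edges can be traversed in both directions and uses a hypothetical chain of nodes; `hop_distance` and `can_propagate` are illustrative helpers, not product functions.

```python
from collections import deque

# Hypothetical chain a - b - c - d - e - f, stored as undirected adjacency lists.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
graph = {}
for u, v in edges:
    graph.setdefault(u, []).append(v)
    graph.setdefault(v, []).append(u)

def hop_distance(graph, source, target):
    """Return the fewest hops from source to target, or None if unreachable."""
    queue = deque([(source, 0)])
    seen = {source}
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    return None

def can_propagate(graph, source, target, max_hops=4):
    """True when target is reachable from source within max_hops hops."""
    distance = hop_distance(graph, source, target)
    return distance is not None and distance <= max_hops

print(can_propagate(graph, "a", "e"), can_propagate(graph, "a", "f"))
```

In this chain, "e" is four hops from "a" and is within the limit, while "f" is five hops away and is not.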

Delete Nodes

To delete nodes, select the nodes you wish to delete, then use the Delete Nodes option, located in the Node menu along the top toolbar or towards the bottom of your context menu.


Last update: 2024-04-12