Fuse your data
Data fusion is the process of connecting siloed, complex and heterogeneous datasets to uncover valuable relationships and insights. Conode's Data Fusion Agent can connect all types of data into a unified, navigable structure.
How does Conode achieve seamless data fusion?
Enterprise data often reside in siloed systems, come in heterogeneous structures, and contain semantic mismatches that traditional integration methods struggle to address. Our AI agent overcomes these challenges by identifying overlaps between different datasets (and sometimes within the same dataset), resolving the inconsistencies, and linking related items to produce a unified graph on which to begin analysis.
Auto Schema Fusion
To speed up the fusing of structured tabular data sources, Conode can automatically fuse data imported through a Postgres database connection. Once the data is uploaded, Conode examines all the nodes and merges those with identical labels, eliminating duplicates among their successors. The result is one unified dataset, free of redundant variables.
Auto-Fuse Example
In the video below, we see a dataset of police reports on personal-injury collisions that consists of three separate tables: casualties, vehicles, and collisions. The tables share a few features (also known as headers in the tables), such as accident_year, and Conode simplifies the dataset by identifying these shared features and fusing each one into a single feature across all three tables.
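As a rough illustration of the idea (not Conode's actual implementation), the sketch below treats every table header as a node and fuses headers that carry identical labels. Only accident_year comes from the example above; the other column names and the fuse_headers helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical headers: only accident_year is taken from the example above.
tables = {
    "casualties": ["accident_year", "casualty_id"],
    "vehicles": ["accident_year", "vehicle_id"],
    "collisions": ["accident_year", "collision_id"],
}

def fuse_headers(tables):
    """Map each distinct header label to the tables that contain it."""
    fused = defaultdict(list)
    for table_name, headers in tables.items():
        for header in headers:
            # Identical labels land on the same key, so they become one node.
            fused[header].append(table_name)
    return dict(fused)

fused = fuse_headers(tables)
# Headers that appear in more than one table are the ones that were fused.
shared = {header: names for header, names in fused.items() if len(names) > 1}
print(shared)
```

Six headers go in and four distinct features come out, with accident_year linked to all three tables.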
Group vs Merge
- Group creates additional predecessor nodes that are used to categorise the input nodes.
- Merge only acts on the nodes in your selection, and reduces the number of nodes by combining similar ones.
In short, grouping organises nodes based on shared characteristics, while merging combines distinct nodes into a single representation, eliminating redundancies.
Group your Data
Data can be grouped in three ways:
Group By Meaning
Finds common themes between node labels and groups them by proposed categories. For example, given the nodes “Apple”, “Orange”, “Oak” and “Maple”, it would likely identify two groups, “Fruit Types” and “Tree Types”, and add these two groups as features to a new view.
Group By Identical Label
Finds and groups nodes with exactly the same labels. String matching takes punctuation, numerical characters, capitalisation and white space into account, so “Conode” and “conode” will not be grouped together.
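The strict matching described here can be pictured with a minimal sketch. The group_by_identical_label function is a hypothetical illustration, not part of Conode; it compares labels exactly, with no trimming or case folding.

```python
from collections import defaultdict

def group_by_identical_label(labels):
    """Group node indices by exact label (hypothetical helper)."""
    groups = defaultdict(list)
    for index, label in enumerate(labels):
        # No normalisation: case, punctuation and white space all count.
        groups[label].append(index)
    return dict(groups)

labels = ["Conode", "conode", "Conode", "Conode "]
print(group_by_identical_label(labels))
```

Here only the first and third nodes share a group. The lowercase label and the label with a trailing space each stay on their own.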
Group By Position
Clusters nodes based on their spatial position in the view. When the number of groups is specified in the Advanced Settings section, the clustering agent uses a K-Means approach; if no number is specified, it uses Density-Based Spatial Clustering of Applications with Noise (DBSCAN).
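The choice between the two algorithms can be sketched as follows. This is a simplified, pure-Python illustration under stated assumptions, not Conode's implementation: the function names, the eps and min_samples defaults, and the choice of the first k points as the K-Means starting centroids are all hypothetical.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic Lloyd's algorithm; the first k points seed the centroids (assumption)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members)
                                     for axis in zip(*members))
    return labels

def dbscan(points, eps, min_samples):
    """Basic DBSCAN; the label -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        # Includes the point itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nearby = neighbours(i)
        if len(nearby) < min_samples:
            labels[i] = -1  # provisionally noise
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nearby if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point joins as a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels

def group_by_position(points, n_groups=None, eps=1.5, min_samples=2):
    """Use K-Means when a group count is given, otherwise DBSCAN."""
    if n_groups is not None:
        return kmeans(points, n_groups)
    return dbscan(points, eps, min_samples)

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(group_by_position(points, n_groups=2))
print(group_by_position(points + [(50, 50)]))
```

One practical difference shows up in the second call: without a group count, an isolated node can be left unassigned as noise, whereas K-Means always places every node in one of the requested groups.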
Merge your Data
Merging reduces the nodes in your selection to a single node that contains the combined properties of your selection. Imagine two car lanes merging into one: all the cars from both lanes are now in this single lane.
Merge All Nodes
Combines all selected nodes into a single node. The label for the representative node becomes a combination of the labels of all the selected nodes. Note that the successors of the selected nodes get combined under this new single representative node.
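A minimal sketch of this behaviour, reusing the car-lane picture, might look like the following. The (label, successors) node representation, the merge_all helper and the " + " label separator are assumptions made for illustration; the exact form of the combined label in Conode may differ.

```python
def merge_all(selected):
    """Merge (label, successors) pairs into one representative pair."""
    # The " + " separator is an assumption for this sketch.
    merged_label = " + ".join(label for label, _ in selected)
    merged_successors = []
    for _, successors in selected:
        # Every successor of every selected node moves under the new node.
        merged_successors.extend(successors)
    return merged_label, merged_successors

lanes = [("Lane A", ["car 1", "car 2"]), ("Lane B", ["car 3"])]
print(merge_all(lanes))
# ('Lane A + Lane B', ['car 1', 'car 2', 'car 3'])
```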
Merge By Label
Combines all nodes with identical labels. Remember that merging only takes place on the nodes that you select, so if there are any hierarchical relations that you want to preserve, ensure that you do not select the predecessor node. The video below shows how merging results differ with slightly different selections.
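To see why the selection matters, consider a small sketch in which node 1 is a predecessor labelled "Energy" and nodes 2 and 3 are its successors with the same label. The node ids, labels and the merge_by_label helper are hypothetical; the function simply maps every node to the node that represents it after the merge.

```python
def merge_by_label(labels, selection):
    """Map each node id to its representative after merging the selection."""
    representative = {}
    first_seen = {}  # label -> first selected node carrying that label
    for node_id in sorted(labels):
        if node_id in selection:
            first = first_seen.setdefault(labels[node_id], node_id)
            representative[node_id] = first
        else:
            # Unselected nodes are never touched, even if their label matches.
            representative[node_id] = node_id
    return representative

labels = {1: "Energy", 2: "Energy", 3: "Energy", 4: "Solar"}

# Selecting only the successors keeps the predecessor (node 1) separate.
print(merge_by_label(labels, {2, 3}))     # {1: 1, 2: 2, 3: 2, 4: 4}

# Including the predecessor collapses the hierarchy into node 1.
print(merge_by_label(labels, {1, 2, 3}))  # {1: 1, 2: 1, 3: 1, 4: 4}
```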
Merge By Group
Combines all successor nodes into their predecessor. Even if the nodes in the group do not have the same labels, as long as they are connected to a predecessor node, they will be ‘absorbed’ into it with a merge-by-group action, and the representative node will no longer have its successor nodes.
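The effect can be sketched with a graph stored as a mapping from each predecessor to its successors. Both the representation and the merge_by_group helper are assumptions for illustration, and the sketch only handles a single level of successors.

```python
def merge_by_group(graph, predecessor):
    """Absorb the successors of `predecessor` into it (hypothetical helper)."""
    absorbed = list(graph.get(predecessor, []))
    # Drop the absorbed nodes from the graph; labels need not match.
    merged = {node: successors for node, successors in graph.items()
              if node != predecessor and node not in absorbed}
    merged[predecessor] = []  # the representative keeps no successor nodes
    return merged, absorbed

graph = {"Fruit Types": ["Apple", "Orange"], "Tree Types": ["Oak", "Maple"]}
merged, absorbed = merge_by_group(graph, "Fruit Types")
print(merged)    # "Fruit Types" now has no successors
print(absorbed)  # ['Apple', 'Orange']
```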
Fusing in Action
You might find it useful to use both Group and Merge in succession, especially when there are duplicates in your data. In this example, we reduced the number of scientific paper topics from 79 to 22 with this method.