Surfacing Evidence of Collusion and Audit Failures from Arthur Andersen

Introduction

In the first chapter of our investigation into the Enron Email Corpus, we queried our graph to evaluate its content and retrieve examples of data within the chat, and generated visualisations of email distribution by sender and over time.

In the second chapter, we enriched our analysis by using the Extract Agent to identify emails that referenced accounting irregularities and emails that exhibited fraudulent behaviour by certain key board members.

In the third chapter, we applied the same set of tools to identify Special Purpose Entities (SPEs), a key mechanism Enron used to manipulate its balance sheet, including hiding debts and inflating profits. We saw how iterating on this analysis filtered our large dataset of 10k emails down to a manageable, more meaningful subset.

Now, wrapping up the investigation in this final chapter, we will build a graph from scratch using the Upload Agent to investigate the role of the accounting firm Arthur Andersen, examining its communications with Enron staff to surface any indications of collusion and/or audit failings.

Try it with us!

We’ve included this dataset as an example graph in Conode, so you can jump straight in and follow along as we query, enrich, and explore the graph structure to analyse the Enron Corpus. If you are a new user, please follow the simple directions to create an account first, and then feel free to explore Conode to your heart’s content!


Step 1: Build Graph 🛠️

To gather all emails in the Postgres database that were sent to or from Arthur Andersen, we simply ask the Upload Agent to import emails containing the auditing firm’s domain, “andersen.com”. Follow along!

1. Navigate to the Upload Agent on the right-side drawer.

2. Drag and drop the “ENRON_Email_Dataset” node from the Home view into the PostgreSQL Connection Node box.

3. Prompt with:
Get me all the records where the emails "From" or "To" contain andersen.com.
The agent returns a SQL query, which you can review and refine if necessary, and reports how many rows (data points) it will import.

4. Once satisfied, hit ‘Run SQL Query’.
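The Upload Agent writes the SQL for you, and the exact query it generates isn’t reproduced here. As a rough illustration of the filter it needs to express, here is a minimal Python sketch that runs an equivalent query against an in-memory SQLite database standing in for Postgres. The table name `emails`, its column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema: an "emails" table with "From", "To" and "Body" columns.
# An in-memory SQLite database stands in for the PostgreSQL source.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE emails ("From" TEXT, "To" TEXT, "Body" TEXT)')
conn.executemany(
    "INSERT INTO emails VALUES (?, ?, ?)",
    [
        ("alice@enron.com", "bob@andersen.com", "Audit schedule"),
        ("carol@andersen.com", "dave@enron.com, erin@enron.com", "Q3 review"),
        ("frank@enron.com", "grace@enron.com", "Lunch?"),
    ],
)

# Keep rows where either address field mentions the Andersen domain.
query = """
SELECT "From", "To", "Body"
FROM emails
WHERE "From" LIKE '%andersen.com%' OR "To" LIKE '%andersen.com%'
"""
rows = conn.execute(query).fetchall()
print(len(rows), "rows to import")
```

SQLite’s `LIKE` is case-insensitive for ASCII text, whereas PostgreSQL’s `LIKE` is case-sensitive; in Postgres you would likely use `ILIKE` to match the same rows regardless of case.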

What am I looking at?

Upload Agent Output Table: Gives us an overview of all the data imported by the agent.
Upload Agent Output: This view shows exactly the same data, depicted as a graph structure.

👀 We can browse through the results table and verify that these emails were either sent from, or to, an email address containing the domain andersen.com.

Step 2: Prepare Graph 💪🏻

Now that we have imported all the emails we need for our investigation, we will perform a couple of data cleaning operations on the graph. The goal here is to curate and simplify the graph for our upcoming analytics.

Update Label Property of Email Nodes

You might have noticed that the label of each data point has been set to the default ‘row_0’, ‘row_1’, etc. Since we know that each row represents an email, and that the email text is found in the Body column/feature, we want to update each node’s label from the default ‘row_x’ to the content of the email body.

We do this using the Propagate Agent:

Search for the ‘Body’ feature using Ctrl+F and select it.
Navigate to the top toolbar and Select > Successors Union.
Pop the selections from the previous two steps into a new view.
Select all the successors and Get > Successors Union.
You should now have 3 layers of nodes.
Lay the nodes horizontally for better visibility.
Navigate to the Propagate Agent in the right-side drawer.
Input the Body nodes into “From”.
Input Row nodes into “To”.
Hit “Apply”!

Now take a look at the nodes and you’ll notice the labels of the individual email nodes have changed from the generic ‘row_x’ to the body of text we wish to analyse! 💯
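Conceptually, the Propagate Agent copies each Body value onto the row node it belongs to. The sketch below models that with plain Python dictionaries; it is not Conode’s implementation, and the `rows` structure, the sample values, and the `relabel_by_feature` helper are hypothetical.

```python
# Hypothetical stand-in for the graph: each row node maps feature names to values.
rows = {
    "row_0": {"From": "alice@enron.com", "Body": "Audit schedule"},
    "row_1": {"From": "carol@andersen.com", "Body": "Q3 review"},
}

def relabel_by_feature(rows, feature):
    """Return a mapping of old label -> new label, taken from `feature`."""
    return {row_id: values[feature] for row_id, values in rows.items()}

labels = relabel_by_feature(rows, "Body")
print(labels)  # {'row_0': 'Audit schedule', 'row_1': 'Q3 review'}
```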

Clean the Email Addressee Field

The final graph preparation step is to clean the ‘To’ feature. Why is this necessary? Reopen the Upload Agent Output Table and take a look 👇🏻

The ‘To’ feature contains lists of email addresses, whereas we want a single node for each email address so that we can see exactly who was emailing whom. To achieve this, we:

Reopen the Home view.
Select the ‘To’ feature and Get > Successors Union.
Select and input these into the Extract Agent with this prompt:
Extract the unique email addresses, ignore characters like <>.
Hit “Extract Features”.
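The Extract Agent handles this step from a natural-language prompt. If you want a deterministic way to spot-check its output, a regular expression gets you most of the way. This is a minimal sketch under our own assumptions: the pattern is deliberately loose rather than RFC-complete, lower-casing addresses is our choice, and `unique_addresses` is a hypothetical helper.

```python
import re

# Loose address pattern; angle brackets, commas and spaces fall outside it.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")

def unique_addresses(to_field):
    """Return unique, lower-cased addresses from a raw 'To' string, in first-seen order."""
    seen = []
    for addr in EMAIL_RE.findall(to_field):
        addr = addr.lower()
        if addr not in seen:
            seen.append(addr)
    return seen

print(unique_addresses("<dave@enron.com>, erin@enron.com, <Dave@enron.com>"))
# ['dave@enron.com', 'erin@enron.com']
```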

Now that we have unique email addresses, we no longer need the original nodes holding lists of addresses, so we can simplify the graph structure by removing them.

Here’s how to do this:

Navigate to the Data Fusion Agent on the right-side drawer.
Set to Merge, Merge By Group, and check Merge in place.
Input the nodes holding the lists of email addresses and the extracted individual email addresses.
Hit “Apply”.

✅ Perfect! We have simplified the graph structure such that the unique email address nodes connect directly to our email nodes.
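What the merge achieves can be pictured as inverting the mapping from each email to its extracted addresses, so that every address links straight to the emails it received, with no list node in between. A small sketch with hypothetical row IDs and addresses; `link_addresses_to_emails` is an illustrative helper, not a Conode API.

```python
from collections import defaultdict

# Hypothetical output of the extraction step: each email (row) node mapped to
# the unique addresses pulled out of its raw 'To' list.
extracted = {
    "row_0": ["bob@andersen.com"],
    "row_1": ["dave@enron.com", "erin@enron.com"],
    "row_2": ["dave@enron.com"],
}

def link_addresses_to_emails(extracted):
    """Invert row -> addresses into address -> rows, bypassing the list nodes."""
    links = defaultdict(set)
    for row_id, addresses in extracted.items():
        for addr in addresses:
            links[addr].add(row_id)
    return dict(links)

for addr, row_ids in sorted(link_addresses_to_emails(extracted).items()):
    print(addr, sorted(row_ids))
```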

Step 3: Data Exploration 🔍

With a tidied graph, we can start to do some basic exploration to get an understanding of our data. Quite simply, we can generate some views to give us visibility into communication patterns of the Arthur Andersen staff.

Emails sent over time

Open the Home View.
Select ‘From’ and ‘Date (tstamp ms)’.
From the top toolbar, choose New View > Open View from Node.
Customise view:
1. Format x-axis values to datetime by toggling temporal axis in the view menu.
2. Rename the view title, for example to “Emails sent over time, by sender”.
You can scale the axes to zoom in and spread out the data by hovering the mouse over an axis label and pinching with two fingers!
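Toggling the temporal axis amounts to reading the ‘Date (tstamp ms)’ values as timestamps. Assuming those values are milliseconds since the Unix epoch (the feature name suggests so, but we have not confirmed it), the per-sender series behind the plot could be rebuilt like this; the records and the `times_by_sender` helper are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical (sender, timestamp in ms since the Unix epoch) pairs,
# mirroring the 'From' and 'Date (tstamp ms)' features.
records = [
    ("carol@andersen.com", 988_329_600_000),
    ("carol@andersen.com", 988_416_000_000),
    ("bob@andersen.com", 988_502_400_000),
]

def times_by_sender(records):
    """Group send times per sender, converting ms timestamps to UTC datetimes."""
    series = defaultdict(list)
    for sender, ts_ms in records:
        series[sender].append(datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc))
    return {sender: sorted(times) for sender, times in series.items()}

for sender, times in times_by_sender(records).items():
    print(sender, [t.isoformat() for t in times])
```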

Who was emailing whom?

Select ‘To’ and ‘From’ fields from the Home view.
Add these two nodes to a new view.
Get all their successors, then get the successors of those nodes.
Colour by group to keep track of which email address was the sender, and which was the recipient.
Remove the ‘To’ and ‘From’ nodes in the view.
Choose the force-directed layout from the top toolbar.
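Behind the force-directed view is simply a set of sender-to-recipient links. A minimal sketch of that edge list, assuming the ‘To’ field has already been cleaned into individual addresses; the sample emails and the `edge_weights` helper are hypothetical, and counting repeated links is our own addition.

```python
from collections import Counter

# Hypothetical (sender, recipients) pairs after the 'To' field has been cleaned.
emails = [
    ("carol@andersen.com", ["dave@enron.com", "erin@enron.com"]),
    ("carol@andersen.com", ["dave@enron.com"]),
    ("alice@enron.com", ["bob@andersen.com"]),
]

def edge_weights(emails):
    """Count how many emails each sender sent to each recipient."""
    return Counter((sender, r) for sender, recipients in emails for r in recipients)

for (sender, recipient), n in edge_weights(emails).most_common():
    print(f"{sender} -> {recipient}: {n}")
```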

Step 4: Characterise Emails 📧

Classify emails by tone

Open the “Emails sent over time, by sender” view.
Select all the email (black-coloured) nodes.
Drop them into the Extract Agent.
Prompt with:
Can you classify these emails by the emotions expressed in them?
Run the featurizer!

Classify emails by topic

We classify emails by topic in the same way as we did by tone above; we simply change the prompt to describe what we want to characterise our emails by.

Export enriched data

Now that we’ve characterised our data, say we’d like to export the results to share with someone. This is a straightforward process:

Select header nodes for ‘emotion’ and ‘email_topic’ from earlier classification steps.
Select New Table View from New View up in the toolbar.
A table view is generated, with each row showing the email body and the emotion and topic it was labelled with.
With the feature headers still selected, navigate to the same top toolbar and choose Export > Selected subgraph as .csv.
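If you later want to reproduce the export outside Conode, for instance from rows copied out of the table view, Python’s standard `csv` module is enough. The column names mirror the ‘emotion’ and ‘email_topic’ headers above, but the rows and the `to_csv` helper are hypothetical, and Conode’s own export layout may differ.

```python
import csv
import io

# Hypothetical enriched rows: email body plus the labels from the two
# classification steps. Real label values come from the Extract Agent.
enriched = [
    {"Body": "Q3 review", "emotion": "neutral", "email_topic": "Audit"},
    {"Body": "Ha, nice trade", "emotion": "humor", "email_topic": "Trading Strategy"},
]

def to_csv(rows, fields=("Body", "emotion", "email_topic")):
    """Serialise the selected columns to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

print(to_csv(enriched))
```

When writing to a file instead of a string buffer, open it with `newline=""`, as the `csv` module documentation recommends.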

You can just as easily replicate this export process with the results of any extraction task you run, capture views as images, or share the entire graph with a fellow analyst. Learn more about exporting and sharing in our documentation!

Step 5: Identify Collusion Patterns 🕵🏻

With the initial data exploration done, we return to the aim of the investigation: identifying instances or indications of collusion and/or audit failings. You might already be thinking, “Why not just ask the AI Agent again?” So let’s do exactly that!

As before, drop our email nodes into the Extract Agent and prompt it with:
Can you identify which of these emails from auditors and executives contain inappropriate topics, and which contain potential collusion indicators?

From the results, you can then use Human Oversight to review the email contents. Notice how you can traverse back through the graph to check the source of each email: who sent it to whom, and when it was sent. If you’d like to dig deeper into a sender, you can simply see all the emails they sent in the strip plot.

Enriching our graph with the Extract Agent allows us to hone in on a specific feature or a combination of features. In the following image, you can easily look into emails discussing Trading Strategy in which humor was expressed.
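Honing in on a combination of features is, at heart, a conjunctive filter over the enriched labels. A minimal sketch, assuming each email carries its labels as plain fields; the rows, label values, and `filter_by` helper are hypothetical.

```python
# Hypothetical enriched rows, as produced by the classification steps.
enriched = [
    {"Body": "Q3 review", "emotion": "neutral", "email_topic": "Audit"},
    {"Body": "Ha, nice trade", "emotion": "humor", "email_topic": "Trading Strategy"},
    {"Body": "Hedge is in place", "emotion": "confident", "email_topic": "Trading Strategy"},
]

def filter_by(rows, **criteria):
    """Keep rows whose fields equal every given criterion value."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

hits = filter_by(enriched, email_topic="Trading Strategy", emotion="humor")
print([r["Body"] for r in hits])  # ['Ha, nice trade']
```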

Conclusion

In this workflow, we built and prepared a custom graph in minutes, visualised patterns of communication between all the entities involved, extracted tone and topics from the emails, and surfaced indications of collusion and audit failings that we would otherwise not have managed to find without our AI agents!

Last update: 2025-03-25