Investigating the Enron Email Corpus Part 2
Introduction
👈🏼 Last Session
We introduced Conode and showed how to get started on a project using the Ask Your Graph agent: to evaluate the graph content, to retrieve examples of data within the chat, and to create visualisations such as time-series histograms and bar plots of email frequencies, which highlighted anomalies and trends in the data.
We then jumped into Conode's graph views to interact with the data directly, for example inspecting individual email nodes and cross-highlighting between views to understand the communication patterns of key individuals over time.
We also briefly introduced how to build on graphs using the chat-based Extract Agent. This agent specialises in enriching a graph by generating new features, which let us navigate through the email dataset to emails with a high-stress, urgent tone.
📍 This session
We will build on the tools we introduced last time to further enrich our analysis of the Enron communications.
Our aim will be to find emails showing awareness of accounting irregularities by key executives such as Ken Lay, Jeff Skilling, and Andy Fastow.
We will perform keyword extraction, show how you can add human-made annotations, use the Feature Extract Agent to dive deeper into sentiment analysis, and then combine our new features with views on the data that should reveal patterns of suspicious and fraudulent behaviour.
👉🏼 Next session
We will identify Special Purpose Entities (SPEs), now widely known to have been used by executives to conceal fraud. The week after, we'll cover how to build a graph from scratch using the Upload agent, with Arthur Andersen as an example, and explore Enron's company culture.
You should then be comfortable with the graph-building, enrichment, and analytic workflows, and free to repeat everything we've done with any subset of the Enron Corpus, and in future projects. All with speed, and no code!
Getting Started
This week we will remain in the curated graph of emails sent by the 26 C-Level Executives of the Enron Corporation. The graph is available on the Conode Homepage under Example Graphs.
The goal for today will be to filter the graph down to emails containing references to Enron's fraudulent financial activities. We will achieve this by finding emails at the intersection of the tones and topics that interest us.
Let's begin our investigation by filtering on a key individual who we know (with the benefit of hindsight) expressed concerns over the company's accounting practices. Namely, let's check whether Sherron Watkins' whistleblower activity is contained in the email corpus.
Identifying Sherron Watkins’ Emails Which Express Concern over Accounting Practices
1. Open the Enron Email graph and navigate to the ‘Ask’ toggle button located at the top right corner of your screen.
2. Ask the agent what it knows about Sherron Watkins:
Using the FROM field how many emails do we have sent by Sherron Watkins?
The agent should confirm that 7 of her emails are loaded into this graph.
3. Let’s be bold and ask directly if these contain any references to concerns over accounting practices:
Which of the emails sent from sherron.watkins@enron.com express her concern around accounting? I'd like to read the email contents.
Here we see that the agent has identified a snippet from one of her early emails which expresses concern:
I've been horribly uncomfortable about some of our accounting in the past few years and with the number of 'redeployments' up, I'm concerned some disgruntled employee will tattle. Can you influence some sanity?
-sherron.watkins@enron.com
4. Sometimes the agent struggles to jump straight to the email containing her expression of concern. In such cases, we can simply ask the agent to add all of her emails to a new view, then read through the email contents to arrive at the same proof point.
Conode enables users to identify key whistleblower communications from individuals such as Sherron Watkins.
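Conode's agent answers the FROM-field question without any code. For readers curious what that lookup amounts to, here is a minimal Python sketch, assuming each email is a simple record with hypothetical `sender`, `subject` and `body` fields (not Conode's actual schema). The sample records are placeholders, not corpus data.

```python
from dataclasses import dataclass


@dataclass
class Email:
    # Hypothetical record shape; Conode's real node properties may differ.
    sender: str
    subject: str
    body: str


def emails_from(emails: list[Email], address: str) -> list[Email]:
    """Return the emails whose FROM field matches `address`, ignoring case and stray whitespace."""
    target = address.strip().casefold()
    return [e for e in emails if e.sender.strip().casefold() == target]


# Placeholder records for illustration only.
emails = [
    Email("sherron.watkins@enron.com", "placeholder subject 1", "placeholder body 1"),
    Email("Sherron.Watkins@enron.com ", "placeholder subject 2", "placeholder body 2"),
    Email("someone.else@enron.com", "placeholder subject 3", "placeholder body 3"),
]

watkins = emails_from(emails, "sherron.watkins@enron.com")
for email in watkins:
    print(email.subject)
```

Normalising case and whitespace is a defensive choice, in case the same address was recorded with different capitalisation or trailing spaces.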
Identifying Emails containing Financial and Accounting Terms
Let’s zoom out to the full graph of emails from C-Level Execs and assess whether we can efficiently find references to complex accounting practices across these data.
Keyword Search
1. Searching in View
With the histogram open, we now have full visibility over the emails sent by key individuals, so we can simply search the view for keywords of interest using Cmd-F (or Ctrl-F), and the relevant emails will be selected. Here, we searched for the keyword ‘Suspend’.
Email nodes that contain that keyword in their label will be highlighted, and we can directly annotate the graph by creating a node group from this selection using the Node menu (or right-click context menu) → Create Node.
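The exact matching rules of Conode's in-view search aren't stated here. As a rough sketch, assuming a case-insensitive substring match on node labels, the selection and grouping steps look like this; the node ids, labels and the `node_groups` dict are hypothetical.

```python
def search_labels(labels: dict[str, str], keyword: str) -> set[str]:
    """Return ids of nodes whose label contains `keyword` as a case-insensitive substring."""
    needle = keyword.casefold()
    return {node_id for node_id, label in labels.items() if needle in label.casefold()}


# Hypothetical node ids and labels.
labels = {
    "email-1": "Proposal to suspend payments",
    "email-2": "Weekly update",
    "email-3": "SUSPENDED account review",
}

selection = search_labels(labels, "Suspend")
# Store the selection under a name, loosely mirroring a node group annotation.
node_groups = {"Suspend": selection}
print(sorted(selection))
```

Substring matching also catches "SUSPENDED", which is usually what you want for a quick scan, but it means very short keywords can over-match.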
2. Extract Agent
Alternatively, drag and drop the email nodes into the Extract agent, then ask the following:
Run keyword extraction using python, against the following accounting terms: bid, fund, stock, mark-to-market, mark-to-model, insider trading, omit, debt, profit, audit, hedge.
More examples of financial terms
cash flow, budget, turnover, credit, equity, liabilities, interest, margin, capital, financial health, financial performance, liquidity, profit, asset, balance sheet, accounting.
Tip!
- Ask the agent to generate a unique binary feature for each term, and to use a Python approach. This will simplify the structure of the graph output.
- Note that you can also remove any emails that were not flagged as containing keywords, by selecting the successor of “no features returned” and removing those nodes from the view.
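The Extract Agent writes its own Python for this step, and we don't reproduce it here. As a sketch of what "one binary feature per term" can look like, the snippet below uses whole-word regular expressions; the function names, email ids and texts are placeholders, and `drop_unflagged` plays the role of removing the “no features returned” nodes.

```python
import re

TERMS = ["bid", "fund", "stock", "mark-to-market", "mark-to-model",
         "insider trading", "omit", "debt", "profit", "audit", "hedge"]


def compile_terms(terms: list[str]) -> dict[str, re.Pattern]:
    # \b stops "bid" matching inside words like "forbidden";
    # re.escape keeps the hyphens and spaces in multi-word terms literal.
    return {t: re.compile(r"\b" + re.escape(t) + r"\b", re.IGNORECASE) for t in terms}


def keyword_features(emails: dict[str, str], terms: list[str] = TERMS) -> dict[str, dict[str, bool]]:
    """Map each email id to one True/False feature per term."""
    patterns = compile_terms(terms)
    return {eid: {t: bool(p.search(text)) for t, p in patterns.items()}
            for eid, text in emails.items()}


def drop_unflagged(features: dict[str, dict[str, bool]]) -> dict[str, dict[str, bool]]:
    """Keep only emails with at least one keyword hit."""
    return {eid: flags for eid, flags in features.items() if any(flags.values())}


# Placeholder email texts for illustration only.
emails = {
    "email-1": "We should hedge the exposure and revisit the audit schedule.",
    "email-2": "Lunch on Friday?",
    "email-3": "That structure is forbidden under mark-to-market rules.",
}

features = drop_unflagged(keyword_features(emails))
for eid, flags in features.items():
    print(eid, [t for t, hit in flags.items() if hit])
```

Whole-word matching is a trade-off: it avoids false hits such as "forbidden" for "bid", but "profitable" will not count as "profit" either, so add variants to the term list if you want them.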
Exploring the results of the Extract Agent
Once the feature extraction has been run, we can investigate the results in a few ways:
- Lay out the results horizontally (using the Layouts menu along the top) for better visibility.
- Open a table view (select the feature nodes, then navigate to New View → Table).
- Force direct the view to cluster emails by the keywords they contain!
- Or even run a quick count (using the Statistics agent on the RHS) to see which terms were referenced the most!
Conode enables the user to quickly find communication about complex accounting practices such as mark-to-market and ones expressing concerns about their use.
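The Statistics agent does this count for you. The sketch below only shows the arithmetic behind "which terms were referenced the most", assuming each email carries one True/False flag per term; the data is a placeholder. Note that it counts emails mentioning a term, not total occurrences of the term.

```python
from collections import Counter


def term_counts(features: dict[str, dict[str, bool]]) -> Counter:
    """Count how many emails were flagged for each term."""
    counts: Counter = Counter()
    for flags in features.values():
        counts.update(t for t, hit in flags.items() if hit)
    return counts


# Placeholder binary features: one dict of True/False flags per email.
features = {
    "email-1": {"audit": True, "hedge": True, "debt": False},
    "email-2": {"audit": True, "hedge": False, "debt": False},
    "email-3": {"audit": False, "hedge": False, "debt": True},
}

for term, n in term_counts(features).most_common():
    print(f"{term}: {n}")
```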
Identifying Emails Expressing Concerns about Financial Health
Now that we’ve extracted keywords related to accounting and finance, let’s filter down further based on the tone of the email.
1. Select all the emails which contained keywords of interest from open views, drag & drop them into the Extract Agent.
2. Run Keyword Extraction by telling the Extract Agent to:
Find emails discussing financial health or performance
3. Ask the Extract Agent to run LLM-based sentiment analysis, suggesting tones such as negative, suspicious, and urgent, to capture any instances of dispute, urgency, or strong language.
Conode enables users to identify emails that discuss concerns about Enron’s financial health and pressure on financial performance within seconds.
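The Extract Agent handles the LLM call itself. If you wanted to reproduce the idea outside Conode, one common pattern is to ask the model for structured JSON and convert the reply into one binary feature per tone. The sketch below covers only the prompt and the parsing: the prompt wording, the tone list and the sample reply are assumptions, and no model is actually called.

```python
import json

TONES = ["negative", "suspicious", "urgent", "dispute", "strong language"]


def build_prompt(email_text: str, tones: list[str] = TONES) -> str:
    """Build an illustrative prompt asking for one true/false flag per tone."""
    return (
        "Classify the tone of the email below. Reply with only a JSON object "
        f"whose keys are exactly {json.dumps(tones)} and whose values are true or false.\n\n"
        f"EMAIL:\n{email_text}"
    )


def parse_tone_reply(reply: str, tones: list[str] = TONES) -> dict[str, bool]:
    """Turn a model's JSON reply into binary tone features.

    Only a literal JSON `true` counts as a hit, so strings like "false" and
    missing keys become False, and unexpected keys are ignored.
    """
    data = json.loads(reply)
    return {t: data.get(t) is True for t in tones}


prompt = build_prompt("placeholder email text")
# A made-up reply standing in for whatever the model returns.
reply = '{"negative": true, "suspicious": "false", "urgent": true, "extra": true}'
print(parse_tone_reply(reply))
```

The strict `is True` check is deliberate: models sometimes return `"false"` as a string, which plain `bool()` would wrongly treat as a hit.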
Clustering Emails based on Sentiment and Topic
Finally, let’s chuck all the features we’ve extracted from the emails, as well as any indicators of negative tone, into a new view along with the email nodes.
Using Layout → Force Direct, we can now explore clusters of emails containing the same combinations of topic and sentiment. For example, show the emails at the intersection of urgency and strong language.
Such data can also be represented in a table for easier visibility. Simply select all the feature nodes, then navigate to New View → New Table View to visualise the data accordingly.
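Conceptually, the clusters in a force-directed layout roughly correspond to emails linked to the same feature nodes. A minimal sketch of that grouping, plus the "intersection" query, assuming each email has a dict of True/False features (ids and flags are placeholders):

```python
from collections import defaultdict


def group_by_combination(features: dict[str, dict[str, bool]]) -> dict[frozenset, list[str]]:
    """Group email ids by the exact set of features they were flagged with."""
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for eid, flags in features.items():
        key = frozenset(t for t, hit in flags.items() if hit)
        groups[key].append(eid)
    return dict(groups)


def intersect(features: dict[str, dict[str, bool]], *required: str) -> list[str]:
    """Return ids of emails flagged for every feature in `required`."""
    return sorted(eid for eid, flags in features.items()
                  if all(flags.get(r, False) for r in required))


features = {
    "email-1": {"urgent": True, "strong language": True, "audit": True},
    "email-2": {"urgent": True, "strong language": False, "audit": True},
    "email-3": {"urgent": True, "strong language": True, "audit": False},
}

for combo, ids in group_by_combination(features).items():
    print(sorted(combo), ids)
print(intersect(features, "urgent", "strong language"))
```

The same `features` dict is essentially what a table view shows: one row per email and one column per feature.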
Visualising Communication Patterns
Let’s zoom out to the full graph of emails from C-Level Execs and use some cross-highlighting to find WHO was talking about WHAT, WHEN, and with what TONE. 🙂
Communication over Time
You’ll notice a new view (strip plot) in the example graph which displays the emails sent per individual over time. It may look a bit foreign to begin with, but it provides useful insights into the communication patterns of key individuals.
How do I get that view?
The x axis is the date each email was sent, while the y axis splits the emails up by sender. Senders are ordered vertically by how many emails they sent: for example, David Delainey sent the most emails, followed by Louise Kitchen, and so on.
Conode's e-review capabilities enable advanced searching and filtering across communication patterns.
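The row ordering described above can be sketched in a few lines: count emails per sender, rank senders by that count, and place each email at (send date, sender's row). Sender names and dates below are placeholders, and the ordering rule is our reading of the description rather than Conode's implementation; the resulting points could be handed to any plotting library.

```python
from collections import Counter
from datetime import date


def strip_plot_points(emails: list[tuple[str, date]]) -> tuple[list[str], list[tuple[date, int]]]:
    """Return the sender order and one (date, row) point per email.

    Senders are ranked by email count, highest first (row 0 is meant to be the
    top row); senders with equal counts keep the order they first appear in.
    """
    counts = Counter(sender for sender, _ in emails)
    order = [sender for sender, _ in counts.most_common()]
    row = {sender: i for i, sender in enumerate(order)}
    return order, [(sent, row[sender]) for sender, sent in emails]


emails = [
    ("sender_b", date(2001, 5, 1)),
    ("sender_a", date(2001, 5, 2)),
    ("sender_a", date(2001, 6, 1)),
    ("sender_a", date(2001, 7, 1)),
    ("sender_b", date(2001, 8, 1)),
    ("sender_c", date(2001, 9, 1)),
]

order, points = strip_plot_points(emails)
print(order)
```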
Summary
In this session we have used our Extract agent more heavily to find keywords of interest about financial health and accounting, and conducted sentiment analysis to gauge email tones.
We looked into whistleblower Sherron Watkins' emails and identified one expressing concern over accounting practices, and visualised communication patterns using a helpful strip plot.