A Georgia Tech researcher's deep dive into about half a million Enron emails could someday result in smarter messaging software.
The "Enron corpus", as the database of employee emails revealed during investigations into Enron’s massive fraud in the early 2000s became known, has turned into a gift that keeps on giving to researchers.
For Eric Gilbert, assistant professor in the School of Interactive Computing at Georgia Tech, the email database has shed light on which words are used most frequently in messages going up and down the corporate hierarchy (and yes, a conservative approach and many filters were used to ensure the types of messages examined would apply to a typical organisation, not just a fraudulent one).
The top five "upward predictors" found in the Enron corpus were 'the ability to', 'I took', 'are available', 'kitchen' and 'thought you would'. The words and phrases found most frequently in messages heading in the opposite direction, down the hierarchy, were 'have you been', 'you gave', 'we are in', 'title' and 'need in'.
The value of identifying such words and phrases is that they can be treated as reliable indicators of whether a message is travelling up or down the hierarchy. That kind of insight could be used to develop artificial intelligence-based messaging systems that automatically prioritise emails (say, fast-tracking only emails from other higher-ups into the CEO’s inbox).
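As a rough illustration of how such phrases might feed a prioritisation rule, here is a minimal Python sketch. It is not the model used in the research: the function name direction_score, the equal weighting of every phrase and the simple substring matching are all assumptions made for this example. Only the ten phrases themselves come from the findings reported above.

```python
# Phrases reported in the article. Treating each hit as +1 or -1 is an
# assumption for illustration, not the weighting used in the research.
UPWARD = ("the ability to", "i took", "are available", "kitchen", "thought you would")
DOWNWARD = ("have you been", "you gave", "we are in", "title", "need in")


def direction_score(body: str) -> int:
    """Return upward-phrase hits minus downward-phrase hits for one message."""
    # Lowercase and collapse whitespace so phrases split across lines still match.
    text = " ".join(body.lower().split())
    # Plain substring counting: crude, since "title" also matches "entitled".
    up = sum(text.count(phrase) for phrase in UPWARD)
    down = sum(text.count(phrase) for phrase in DOWNWARD)
    return up - down


if __name__ == "__main__":
    print(direction_score("I took the notes and thought you would want them."))
    print(direction_score("Have you been able to review the numbers you gave me?"))
```

A positive score hints that a message reads like one sent up the hierarchy, a negative score like one sent down. A real system would need weights learned from data and far more than ten phrases.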
"We have organisational charts, but they don’t tell the whole story," and the research could help map "informal power and reporting structures," said Gilbert, in a statement. "A classic example is the CEO’s administrative assistant: That person may not occupy a high box on the org chart, but he or she still has a large amount of influence."
Gilbert formally presented his research at the ACM Conference on Computer Supported Cooperative Work. His track record also includes an effort to reduce email overload via a tool called Courteous.ly, which shows would-be email senders how busy recipients are.