Finding Alpha Needles in Unstructured Data Haystacks
By Barry L. Star, CEO, Wall Street Horizon
The challenges of working with unstructured data came into sharp focus last year when a self-driving car was involved in a fatal crash on a Florida highway. The National Transportation Safety Board ultimately found that the driver-assistance system was not at fault, but federal regulators warned that drivers can only rely on these systems to handle some of the situations that occur on the roads.
The Wall Street equivalent changes only a few words of that warning: traders can only rely on these systems to handle some of the situations that occur in the markets. While not as catastrophic as a collision with an 18-wheeler, a bad trade based on a poor assessment of unstructured data could cost millions. Yet untold value and potential alpha lie in unstructured data. The key question is how best to identify, extract and leverage that data to improve trading.
Unstructured data does not follow a defined data model. Unlike the tidy row-and-column format of structured data, unstructured data is free-form content that people generate. It’s data from email and text messages, blogs, videos, podcasts and chats. It’s whatever does not fit neatly into spreadsheets.
Gartner and other market research firms agree that unstructured data comprises the lion’s share of most organizations’ informational assets – somewhere in the 80% range. Yet we’re still doing a poor job of mining the value out of this data.
Wall Street firms want to leverage unstructured data to generate profits and avoid losses. They seek to wring out its value to gain a clearer picture of their markets, spot patterns and anticipate developments more effectively – taking faster action to seize opportunities while sidestepping risk. Firms ultimately have three choices. They can build the software themselves; take a shortcut that skips the hard translation and analytics and instead uses keywords to try to gauge sentiment; or buy from, or subscribe to, a third-party provider.