In an effort to beat benchmarks, investment companies are looking at Twitter's entire data stream, known in the business as the “full firehose”.
Few people can manage the sheer scale and storage challenges that come with it, not to mention the costs. Instead, you could start by searching the social media stream using a hashtag approach.
Peter Hafez, chief data scientist at data analytics firm RavenPack, knows how tricky it is to process large volumes of noisy unstructured data.
He remembers a small hedge fund that tried the hashtag approach on “gold”, hoping to create a gold sentiment indicator for trading the related futures contracts.
Unfortunately, its algorithms didn’t take into account how often gold is mentioned during major sporting events such as the Olympic Games, and the attempt ended up not being particularly successful.