The rise of alternative data has given quants a new source of information, but how do you quantify text-based or other multimedia generated data? And more importantly, how can you translate that into a strategy? Sylvain Forté, CEO of SESAMm, and Guillaume Garchery, Head of Quantitative Research and Development at La Française Investment Solutions, presented how they collaborated to create a functioning framework and how they avoided irrelevant methodologies at QuantMinds International.
Data is a very real fact of modern life. Everything we do – every time we browse the internet, buy something online or post on social media – we create a digital record, and one that is alive with possibilities for asset managers.
While the sheer volume available has the potential to give money managers the edge they seek, it also presents many challenges.
How can such vast swathes of information be captured, stored, analysed, processed and used to create predictive indicators and investment strategies?
Harnessing the power of data and making sense of it is set to separate the winners from the losers, as finance continues to be disrupted by tech. At QuantMinds International in Vienna, delegates learnt how one asset manager teamed up with a start-up to create a machine-learning framework that could add a new dimension to its investment decisions.
“The idea was to go from signals to actual trading strategies,” said Sylvain Forté, CEO of fintech company, SESAMm. “The aim was to exploit alternative data to build robust forecasting models based on machine learning or systematic investment strategies.”
Technologies like artificial intelligence, natural language processing and machine learning matter because they can help quickly decipher the large volumes of data that are now available. Sifting through the data, recognising complex patterns in it and using those to inform strategy could result in better trades and, ultimately, lower transaction costs.
“You need a framework to mine and extract the value of more and more data sets,” said Guillaume Garchery, Head of Quantitative Research and Development at La Française Investment Solutions in Paris, who worked with SESAMm on the project.
With companies easily amassing terabytes of data each day, it’s becoming ever more important to recognise complex patterns and turn them into something useful. Garchery and Forté set out to answer several questions, including what added value a data set would create for a specific use case and whether the chosen methodology was relevant. They were also mindful of the need to create something robust, that could be trusted and that avoided overfitting.
“We wanted to be as agnostic as possible to the data type,” said Forté. “We wanted to be able to absorb numerical data from all sources, including natural language processing and images.”
They also aimed to benchmark their results against market data as well as random data, and to make the technology massively distributable. The outputs would be a series of statistical scores, or indicators, rather than a complete strategy, giving insight into the explanatory power of each alternative data set.
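The idea of scoring a data set against a random benchmark can be sketched in a few lines. This is a simplified illustration, not the firms' actual methodology: it scores a hypothetical alternative-data signal by its correlation with next-day returns, then compares that score with the scores of shuffled copies of the same signal.

```python
import numpy as np

rng = np.random.default_rng(0)

def explanatory_score(signal, returns):
    """Absolute correlation between today's signal and tomorrow's return."""
    s = (signal[:-1] - signal[:-1].mean()) / signal[:-1].std()
    r = (returns[1:] - returns[1:].mean()) / returns[1:].std()
    return abs(float(np.mean(s * r)))

# Synthetic data for illustration: the signal weakly leads returns.
n = 500
signal = rng.normal(size=n)
returns = 0.3 * np.roll(signal, 1) + rng.normal(size=n)
returns[0] = rng.normal()

real_score = explanatory_score(signal, returns)

# Random benchmark: score the same returns against shuffled signals.
random_scores = [explanatory_score(rng.permutation(signal), returns)
                 for _ in range(200)]

print(f"signal score:          {real_score:.3f}")
print(f"random benchmark mean: {np.mean(random_scores):.3f}")
```

A signal whose score sits well above the random benchmark has some explanatory power; one indistinguishable from the shuffled scores is likely noise, which is one simple guard against overfitting.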
A simple example of how the technology works in practice looked at the pound’s exchange rate against the dollar and the stock price of coffee company Starbucks. Using four years’ worth of history, 650,000 market data points were incorporated, including volatility, as well as 106 million data points from natural language processing.
Scanning millions of articles and messages from different sources, the team analysed the language used sentence by sentence, picking out the sentiment and emotions and converting the results to data that could then be processed using machine learning techniques.
“You get very different results if you analyse the whole article, or just the headline,” Forté explained. “What we can see with the sentences is the sentiment.”
The result was a framework for time-series analysis that can help guide investment decisions. The team is now extending the approach to other areas, such as market shocks, qualitative risks, and environmental, social and governance factors.
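One common way to go from such an indicator to a simple systematic position, sketched here as an assumption rather than the approach the firms describe, is to standardise the indicator and threshold it into long, flat or short signals.

```python
import numpy as np

def indicator_to_positions(indicator, threshold=0.5):
    """Map a standardised indicator to long (1), flat (0) or short (-1)."""
    z = (indicator - indicator.mean()) / indicator.std()
    return np.where(z > threshold, 1, np.where(z < -threshold, -1, 0))

# Illustrative input: a synthetic daily sentiment indicator.
rng = np.random.default_rng(1)
sentiment = rng.normal(size=10)
positions = indicator_to_positions(sentiment)
print(positions)  # entries in {-1, 0, 1}
```

The threshold keeps the strategy out of the market when the indicator is near its mean, so only comparatively strong readings trigger a trade.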
Analysed in this robust and scientifically rigorous way, the data and the methodology earn a level of trust that has been questioned in the past. In this way, the work illustrates how data science and natural language processing are set to become central to the analysis of financial markets.
“We have a system that is very complete, the universe of possibilities in data testing is wide open and it takes much less time to evaluate the data sets,” said Forté. “So we’re scaling the capability very quickly and the goal is to have the complete, real-time, on-demand system on billions of articles and messages.”