Smart Investing: The Rise of Artificial Intelligence in Investing


The application of machine learning techniques has significantly impacted the investment and portfolio-building process, making it an essential tool for investors. By utilising machine learning, investors can construct well-diversified investment portfolios to enhance investment returns and reduce portfolio risks.

Author: FundFront




  • Machine learning is a subfield of artificial intelligence that focuses on developing algorithms to analyse, learn and make decisions based on data input.
  • The use of machine learning algorithms allows for quick and efficient data analysis, potentially leading to unique insights and observations that may not be immediately apparent to a human observer. These algorithms can also work within established parameters to draw conclusions faster than a human brain.
  • By utilising machine learning, investors can construct well-diversified investment portfolios to enhance investment returns and reduce portfolio risks. The application of machine learning techniques has significantly impacted the investment and portfolio-building process, making it an essential tool for investors.


Investors are constantly looking for ways to optimise their investment choices. One common technique is systematic or quantitative analysis, which uses mathematical models to forecast the outcomes of different investment options. As an enhancement to this approach, artificial intelligence has gained popularity as a means of making better investment choices, allowing models to use data input from investors to train themselves. This method utilises advanced algorithms and data analysis to provide more accurate and efficient investment recommendations.


Artificial intelligence and access to large amounts of data are revolutionising the way investment decisions are made. Specifically, machine learning, a type of artificial intelligence, offers numerous advanced capabilities such as predicting asset prices, selecting the best assets to invest in, and determining the optimal mix and weighting of assets. It can also analyse news stories and calculate the risk of a recession. As a result, machine learning is becoming an increasingly valuable tool for investors.


Understanding the artificial intelligence and machine learning divide


Artificial intelligence is a term that encompasses various algorithms and technologies that enable computers to mimic human intelligence. Different approaches to artificial intelligence involve gathering, analysing, and utilising data in various ways to enable intelligent decision-making. Machine learning is a subfield of artificial intelligence that uses algorithms and statistical models to allow computers to learn and improve their performance on a specific task. This is achieved through the use of historical data inputs and outputs, which the machine uses to identify patterns and make predictions or decisions.


Applications of machine learning in investing


When it comes to investing, machine learning can be used in algorithmic or systematic trading to process vast amounts of data and reach investment decisions faster, and often more accurately, than humans can. Because emotions do not influence algorithms, using them can also help remove emotional bias from investment decisions.


Machine learning can also be used to improve retirement planning by creating personalised plans based on inputs such as short surveys and historical data. Robo-advisors, for example, use artificial intelligence to offer investors personalised portfolios and financial advice. In addition, their digital services are often cheaper than human advisors and are available 24/7.


Optimising investment portfolios using machine learning


For investors, selecting the optimal assets and investment strategies for a portfolio that meets performance, risk, diversification, and other requirements is essential. The optimal portfolio construction process involves choosing diverse asset classes, investment strategies, and geographies. This process involves analysing a large amount of data from various investment products in terms of their absolute performance and their similarities and differences to each other and the broader capital markets. Investors can leverage machine learning to construct optimal portfolios designed to enhance investment returns by maximising diversification and minimising risks.


Types of machine learning techniques


There are various machine learning techniques available today. Figure 1 shows a sample taxonomy of machine learning that illustrates the investigative tools that investors can use to understand data, relationships, and behaviours. This article discusses the clustering technique and uses a portfolio of hedge funds as a case study to demonstrate its application.




Figure 1: Example taxonomy of machine learning techniques


Clustering involves using algorithms to group similar items. These items could be anything from companies to products to animals. The groups are formed based on features, which are characteristics that can be used to measure the similarity between the items. Of course, the similarity measure is also essential, but more on that later. Figure 2 below shows a simple clustering with three clusters represented by yellow, red, and blue.



Figure 2: Illustration of clustering
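
As a rough illustration of the grouping idea described above, here is a minimal k-means sketch in plain Python. The items, their two features (annualised return and volatility) and the numbers are all hypothetical, and the farthest-first starting points are an assumption chosen to keep the example deterministic. It is not the method used to produce the figures in this article.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def farthest_first(points, k):
    """Pick k starting centroids: the first point, then repeatedly the
    point farthest from the centroids chosen so far (an assumption made
    here so that the example is deterministic)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iterations=100):
    """Group points into k clusters by alternating two steps."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                updated.append(centroids[c])  # leave an empty cluster in place
        if updated == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = updated
    return labels, centroids


# Hypothetical features per item: (annualised return %, annualised volatility %).
points = [
    (2.0, 3.0), (2.5, 3.5), (1.8, 2.9),        # low-risk items
    (8.0, 12.0), (8.5, 11.5), (7.9, 12.4),     # medium-risk items
    (15.0, 30.0), (14.5, 29.0), (15.5, 31.0),  # high-risk items
]
labels, centroids = kmeans(points, k=3)
print(labels)
```

Items that share a label belong to the same cluster. In practice the features would usually be rescaled first so that no single feature dominates the distance calculation.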


Case study 1: Using clustering to analyse hedge fund returns


Using clustering on individual hedge fund returns over time or on groups or categories of hedge funds can be very insightful. For example, once the clustering algorithm has been run, the output can be plotted in two dimensions using a scaling algorithm, resulting in a figure like the one below, which shows categories of hedge funds. Note that the data for this section is taken from the Credit Suisse Hedge Fund indices dataset (reference).



Figure 3: Clustering diagram of hedge fund types


To explain how the chart above is generated: the performance of each hedge fund index over time is used (e.g., Global Macro, Managed Futures, etc.). The similarity of each possible pair of indices is then calculated. For example, the similarity of Global Macro to Managed Futures, Global Macro to Event Driven, etc. All possible pairs have a corresponding similarity value, which is then used in another algorithm called Multidimensional Scaling or MDS to produce a 2D map of the hedge fund types, as shown in the figure above. The distance between each pair of points reflects the calculated similarity values, allowing us to see which hedge fund types are most similar to each other and which are different.
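
The pipeline described above (pairwise similarity, then MDS) can be sketched in plain Python. This is only an illustration under stated assumptions. The return series are made-up numbers rather than the Credit Suisse data. The distance used is sqrt(2 * (1 - correlation)), a simple stand-in for the similarity score used in this article. The MDS variant is classical MDS, solved with a basic power iteration. All function names are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length return series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def distance_matrix(series):
    """Distance for every pair of series: sqrt(2 * (1 - correlation))."""
    names = list(series)
    matrix = [
        [math.sqrt(max(0.0, 2.0 * (1.0 - correlation(series[a], series[b]))))
         for b in names]
        for a in names
    ]
    return names, matrix


def top_eigenpair(matrix, iterations=500):
    """Dominant eigenvalue and eigenvector of a symmetric matrix (power iteration)."""
    n = len(matrix)
    vector = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm < 1e-12:
            return 0.0, [0.0] * n
        vector = [x / norm for x in product]
    product = [sum(matrix[i][j] * vector[j] for j in range(n)) for i in range(n)]
    return sum(v * p for v, p in zip(vector, product)), vector


def classical_mds(dist, dims=2):
    """Place n items in `dims` dimensions so that map distances approximate `dist`."""
    n = len(dist)
    squared = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in squared]
    total_mean = sum(row_mean) / n
    # Double centring turns squared distances into an inner-product matrix.
    b = [[-0.5 * (squared[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        eigenvalue, vector = top_eigenpair(b)
        scale = math.sqrt(max(eigenvalue, 0.0))
        for i in range(n):
            coords[i][d] = scale * vector[i]
        # Deflate so that the next pass finds the next-largest eigenpair.
        b = [[b[i][j] - eigenvalue * vector[i] * vector[j] for j in range(n)]
             for i in range(n)]
    return coords


# Hypothetical monthly returns (%) for four index categories.
returns = {
    "Global Macro": [1.2, -0.5, 0.8, 1.5, -1.0, 0.6],
    "Managed Futures": [0.9, -0.2, 1.1, 1.0, -1.4, 0.3],
    "Event Driven": [-0.4, 1.3, 0.2, -0.8, 1.1, 0.5],
    "Convertible Arbitrage": [-0.1, 1.0, 0.4, -0.5, 0.9, 0.7],
}
names, dist = distance_matrix(returns)
coords = classical_mds(dist)
for name, (x, y) in zip(names, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Each printed pair is a position on a 2D map. Categories whose returns move together end up close to each other, and the absolute orientation of the map carries no meaning.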


As shown in Figure 3, it’s interesting (and somewhat reassuring) to see that most hedge fund types differ. There are few clusters in the data, and the clusters that do exist tend to be pairs of strategies rather than larger groups. The closest pairs are highlighted with red rectangles. For instance, event-driven risk arbitrage is somewhat similar to convertible arbitrage in terms of returns and behaviour. The same can be said of multi-strategy and global macro, as well as event-driven multi-strategy and the world equity market. However, even though these fund types appear close together on the diagram, they are not necessarily very similar to each other (it’s all relative). For example, note the orange multi-strategy fund discussed later in this article.



Figure 4: Returns of Convertible Arbitrage compared to Event-Driven Risk Arbitrage


In the above figure, you can see some similarities between 1995 and 2000 and again from 2012 to 2020. However, some differences exist, particularly the significant drawdown in 2008. Moreover, the differences are even more striking for fund types like Emerging Markets and Managed Futures, as shown below. Therefore, finding many similarities between these two fund types would be difficult.




Figure 5: Returns of Emerging Markets compared to Managed Futures


This analysis highlights similarities and differences systematically, and it shows that most of the leading fund types are reasonably different from each other. The following figure shows a comparison between the two fund types mentioned earlier and the multi-strategy version. Note that these are all far from each other on the MDS chart in Figure 3, indicating that they should behave quite differently. Looking at the figure, this certainly seems to be the case. Interestingly, the multi-strategy fund has the smoothest line of all three. This shows that a multi-strategy fund (made up of many different types of funds) generally has smoother, more predictable returns than other fund types, a valuable characteristic and part of the reason why they are so popular with investors.



Figure 6: Returns of Emerging Markets, Managed Futures, and Multi-Strategy
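
To see why blending dissimilar strategies can smooth returns, consider the small sketch below. The monthly returns are made-up numbers, not index data, and the equal-weighted blend is only a simplified stand-in for a multi-strategy fund.

```python
import statistics

# Hypothetical monthly returns (%) for two strategies that tend to move
# in opposite directions. These are illustrative numbers only.
emerging_markets = [4.0, -3.0, 5.0, -2.0, 3.0, -4.0]
managed_futures = [-2.0, 3.0, -3.0, 4.0, -1.0, 3.0]

# An equal-weighted blend, standing in for a simple multi-strategy fund.
blend = [(a + b) / 2 for a, b in zip(emerging_markets, managed_futures)]

# Volatility measured as the standard deviation of monthly returns.
vol_em = statistics.pstdev(emerging_markets)
vol_mf = statistics.pstdev(managed_futures)
vol_blend = statistics.pstdev(blend)

print(f"Emerging Markets volatility: {vol_em:.2f}")
print(f"Managed Futures volatility:  {vol_mf:.2f}")
print(f"Blend volatility:            {vol_blend:.2f}")
```

The blend is less volatile than either component because the two series often offset each other. The effect shrinks as the components become more similar.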


Case study 2: Using clustering to analyse asset returns


Another type of analysis that can be performed using clustering is examining how different asset types respond to market conditions. A list of the assets used is shown in the table below.


Asset Name
USA Equities
USA High Yield Bonds
USA Government Bonds
USA Real Estate
Europe Equities
China Equities


Table 1: List of assets used in the cluster diagram of Figures 7 & 8


The same set of assets is used to create the MDS plot below, which covers approximately 1996 to the present day, a period of around 25 years.




Figure 7: Cluster Diagram (MDS plot) of a variety of asset types


As shown in the figure above, there is a main cluster to the right and then a set of spaced-out assets moving to the left. The main cluster is centred on US-centric equities and bonds, while other country/regional equities (Europe and China), together with gold, are further away from the US cluster. Energy, which has the most extreme returns of all the assets, is far from all others.


The clustering diagram makes sense, but what else can we learn from this analysis? It’s possible to look at how each asset responds in crisis situations, such as the 2008 financial crisis, the 2012 European debt crisis, the 2020 COVID crash, etc. For example, examining the 2008 financial crisis from 2007 to 2010 gives the following MDS plot.



Figure 8: Cluster diagram for asset returns during the financial crisis


It’s immediately apparent that the arrangement has changed significantly. Many equity markets are clustered together, particularly the US and Europe, as they exhibited similar responses during this time. However, there is some difference with Chinese equities, which saw a more substantial initial rise but a drawdown at the same time as the other equity markets. The other assets – bonds, gold, and energy – all experienced different responses compared to equities. Bonds and gold had relatively flat returns during the period, while energy experienced a significant increase, a large drawdown, and a similar return pattern to bonds and gold immediately after the crisis. This chart helps to highlight the lower similarity between the assets on the left and those on the right and suggests that there may be some diversification benefit during periods of extreme market stress.
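
The idea of re-running the analysis on a crisis window can be sketched as follows. The annual returns are invented for illustration, the `window` helper is hypothetical, and plain correlation stands in for the similarity score used in this article.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length return series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def window(dates, returns, start, end):
    """Keep only the returns whose date label falls within [start, end]."""
    return [r for d, r in zip(dates, returns) if start <= d <= end]


# Hypothetical annual returns (%); made-up numbers, not market data.
years = ["2003", "2004", "2005", "2006", "2007",
         "2008", "2009", "2010", "2011", "2012"]
us_equities = [2.0, -1.0, 3.0, -2.0, -5.0, -10.0, 8.0, 4.0, 1.0, -2.0]
europe_equities = [-1.0, 2.0, -2.0, 3.0, -6.0, -11.0, 9.0, 3.0, -2.0, 1.0]

full_corr = correlation(us_equities, europe_equities)
crisis_corr = correlation(
    window(years, us_equities, "2007", "2010"),
    window(years, europe_equities, "2007", "2010"),
)
print(f"Full period: {full_corr:.2f}, crisis window: {crisis_corr:.2f}")
```

In this toy data the two equity series move almost in lockstep during the crisis window, so their similarity is higher there than over the full period. A crisis-period map is built by applying the same windowing to every asset before recomputing the pairwise similarities.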




This article introduced some applications of machine learning, specifically clustering, in asset management and discussed some of the primary considerations when using clustering. Clustering is an unsupervised learning process that is human-driven. Therefore, it requires careful consideration of the data, the process, and the end goal, effectively defining the key features, similarity measures, clustering methods, and what defines a “good” cluster at the end.


Interested in learning more about clustering? See below for a deeper dive


There are hundreds of clustering algorithms, so there are many different ways to split and group the items. Table 1 below shows some of the more common methods and their respective advantages and disadvantages.


Clustering Method | Advantages | Disadvantages
K-means | Simple and efficient. | The number of clusters must be defined in advance.
Hierarchical | Easy to implement; doesn’t require a specific number of clusters in advance. | Can be slow to run. Can be sensitive to noise and outliers in the data.
Density | No need to specify the number of clusters in advance. Works well with noisy data. | Results depend on the distance measure.


Table 1: Comparison of some standard clustering methods


Most simple analyses start with the k-means algorithm. Although we need to define the number of clusters in advance, there are ways of analysing the data and inferring the optimum number of clusters to use with the k-means algorithm. More complex analyses can potentially use the other two methods or one of the several hundred different algorithms available.
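
One common heuristic for inferring the number of clusters is the silhouette score, which rewards clusters that are tight internally and well separated from each other. The sketch below is an assumption on our part (the text above does not say which technique is meant). It uses a hypothetical one-dimensional feature and a deliberately simple k-means with evenly spaced starting points.

```python
def kmeans_1d(values, k, iterations=100):
    """Very small k-means for one-dimensional data (k >= 2)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: k evenly spaced values from the sorted data.
    centroids = [ordered[i * (n - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(values, labels):
    """Mean silhouette score: near 1 is well separated, near 0 is ambiguous."""
    clusters = {}
    for v, lab in zip(values, labels):
        clusters.setdefault(lab, []).append(v)
    scores = []
    for v, lab in zip(values, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # convention for single-member clusters
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(abs(v - u) for u in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(abs(v - u) for u in other) / len(other)
            for key, other in clusters.items() if key != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical feature values (for example, annualised volatility in %).
values = [1.0, 4.1, 9.2, 0.8, 3.9, 8.8, 1.2, 4.0, 9.0]
scores = {k: silhouette(values, kmeans_1d(values, k)) for k in range(2, 6)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

For this toy data the score peaks at three clusters, matching the three visible groups of values. On real return data the peak is usually less clear-cut, so the score is best treated as a guide rather than a definitive answer.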


One inherent challenge in clustering is the coupling between the context and the act of clustering itself. The context is defined by why the user wants to cluster the data and what they want to do with the resulting clusters. It is a problem-specific, human-driven operation, so it can be challenging to find the right combination of techniques and analysis methods to apply to a particular problem. Another way to think about this is that different clustering methods will produce different clusters for the same data and problem, but not all arrangements are equally helpful. Some argue that clustering is as much an art as it is a science, but we leave further discussion and debate on this idea to other authors.


Measuring Similarity


One of the critical elements of the clustering process is the choice of similarity measure. In the above analysis (for both case studies), we use a proprietary similarity score that measures the time-varying similarity of each set of asset returns.


It’s possible to use other measures, and there is a great deal of research in this area. In conventional comparisons of asset returns, the most common measure is correlation. Correlation is a good choice for a simple, easy-to-understand analysis of returns. It produces a value that can be used as a measure of similarity, and several more complex variations on the correlation coefficient can be used. However, it is a linear measure, and in the case of time series, non-linear relationships can provide helpful information. This is the main reason we use the aforementioned proprietary measure, which considers non-linear information and, in our opinion, produces higher-quality clusters. The following table summarises some of the potential similarity measures that can be used in clustering problems, particularly for time series. There is no “best” similarity measure; some work better than others for specific domains and problem types.


Distance / Similarity Measure | Notes
Euclidean Distance | One of the simplest distance measures. Easy to understand and fast.
Correlation | A relatively simple calculation. Strictly speaking, it is not a distance metric, as it does not satisfy the triangle inequality.
Dynamic Time Warping | A signal processing technique that forms a non-linear mapping between two time series to measure the distance between them. Can work better than the above for certain problems but is also slower to compute.


Table 2: Some example similarity measures
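
The three measures in the table can be compared on a toy example. The sketch below uses two made-up series with the same shape, one lagging the other by a single step. The function names are hypothetical, the dynamic time warping shown is the basic textbook form with no window constraint, and none of this is the proprietary measure mentioned above.

```python
import math


def euclidean(a, b):
    """Point-by-point straight-line distance between two series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def correlation_distance(a, b):
    """One minus the Pearson correlation, so identical shapes score 0."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)


def dtw(a, b):
    """Basic dynamic time warping distance using absolute differences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] is the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a advances, b repeats its point
                cost[i][j - 1],      # b advances, a repeats its point
                cost[i - 1][j - 1],  # both advance together
            )
    return cost[n][m]


# Two hypothetical series with the same shape, shifted by one step.
series_a = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
series_b = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]

print("Euclidean:", euclidean(series_a, series_b))
print("Correlation distance:", correlation_distance(series_a, series_b))
print("DTW:", dtw(series_a, series_b))
```

Euclidean distance and correlation both penalise the one-step lag, while dynamic time warping stretches the time axis to line the two shapes up and reports no difference. Whether ignoring a lag like this is desirable depends on the question being asked of the data.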









Disclaimer – FundFront Ltd., does not provide advice and the information in this article should not be construed as such. FundFront Ltd., is registered in England and Wales, and its Registered Office is at C/O Zeeta House, 200 Upper Richmond Road, London, United Kingdom, SW15 2SH. Company Number: 13711456. FundFront Ltd., is an Appointed Representative of Brooklands Fund Management Limited, which is authorised and regulated by the Financial Conduct Authority with the firm reference number 757575 and the Securities and Exchange Commission with the registered number 286221.
