Creating a Systematic ESG Scoring System: Related Works

15 Jun 2024


(1) Aarav Patel, Amity Regional High School – email:

(2) Peter Gloor, Center for Collective Intelligence, Massachusetts Institute of Technology; corresponding author – email:

Existing ESG-related research falls into two main categories. Some papers correlate ESG performance with financial performance and examine whether a company’s Corporate Social Responsibility (CSR) record can predict future stock performance (Jain et al., 2019). Others propose new data-driven methods that enhance and automate ESG rating measurement, addressing the shortcomings and inefficiencies of existing ratings (Hisano et al., 2020; Krappel et al., 2021; Liao et al., 2017; Lin et al., 2018; Shahi et al., 2011; Sokolov et al., 2021; Venturelli et al., 2017; Wicher et al., 2019). This paper falls into the latter category.

Because many firms publish sustainability reports annually, researchers frequently analyze this content, typically using text mining to identify ESG topics and trends. To extract and use this data, researchers have built classification models that assign sentences or paragraphs to ESG subdimensions (Liao et al., 2017; Lin et al., 2018). Others have applied such text classification algorithms to assess the completeness of sustainability reports (Shahi et al., 2011), since companies sometimes limit disclosure of negative ESG aspects in their filings. Both tools support automated ESG scoring from company filings, which extends ratings to companies that lack ESG coverage.
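To illustrate the idea behind classification-based completeness analysis, here is a minimal Python sketch. It assumes a hypothetical keyword lexicon (`ESG_KEYWORDS`) in place of the trained classification models used in the cited studies, and it defines "completeness" simply as the share of E/S/G subdimensions that receive at least one sentence; neither choice is taken from Liao et al., Lin et al., or Shahi et al. The names `classify_sentence` and `coverage_report` are illustrative only.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical keyword lexicon standing in for a trained classifier;
# the cited studies used statistical models, not keyword matching.
ESG_KEYWORDS = {
    "environmental": {"emissions", "carbon", "water", "waste", "energy"},
    "social": {"employees", "safety", "community", "diversity", "labor"},
    "governance": {"board", "audit", "compensation", "shareholders", "ethics"},
}


def classify_sentence(sentence: str) -> str | None:
    """Return the subdimension with the most keyword hits, or None if there are none."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    hits = {dim: sum(tok in words for tok in tokens) for dim, words in ESG_KEYWORDS.items()}
    best = max(hits, key=hits.get)  # ties resolve to the first subdimension listed
    return best if hits[best] > 0 else None


def coverage_report(report_text: str) -> tuple[Counter, float]:
    """Count sentences per subdimension and the fraction of subdimensions disclosed."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", report_text.strip())
    counts = Counter(
        label for s in sentences if (label := classify_sentence(s)) is not None
    )
    completeness = len(counts) / len(ESG_KEYWORDS)
    return counts, completeness


if __name__ == "__main__":
    text = "We cut carbon emissions by 10%. Our board approved a new audit policy."
    counts, completeness = coverage_report(text)
    print(dict(counts), round(completeness, 2))
```

In this toy filing no sentence maps to the social subdimension, so coverage is two out of three; a gap like this is the kind of signal a completeness analysis would surface as possible under-disclosure, although a real system would need a far more robust classifier than keyword counts.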

However, relying solely on self-reported filings is problematic because it ignores omitted information and developments that occur after a report is published. Researchers have therefore tested alternative approaches. For instance, some use a Fuzzy Expert System (FES) or a Fuzzy Analytic Network Process (FANP), drawing on quantitative indicators (such as metrics provided by the Global Reporting Initiative) and on qualitative features from surveys and interviews (Venturelli et al., 2017; Wicher et al., 2019). Others collect data from online social networks such as Twitter to analyze a company’s sustainability profile; for example, Sokolov et al. (2021) used Natural Language Processing (NLP) frameworks to classify tweets into ESG topics and determine whether they are positive or negative. Hisano et al. (2020) built heterogeneous information networks that combine several negative-news datasets and applied machine learning to predict ESG ratings. Finally, Krappel et al. (2021) explored the viability of predicting ESG ratings from fundamental data such as a company’s profile and financials. Overall, these methods aim to improve on self-reported filings by drawing on more balanced, unbiased, and real-time data.

This paper is available on arxiv under CC BY-NC-ND 4.0 DEED license.