Tuesday, September 26, 2023

Unlocking Deeper Insights into Customer Engagement Through AI-Powered Analysis of Social Media Data

P.K. Kannan
Robert H. Smith School of Business, University of Maryland

Yi Yang
Hong Kong University of Science and Technology

Kunpeng Zhang
Robert H. Smith School of Business, University of Maryland

P.K. Kannan, Yi Yang, and Kunpeng Zhang describe their method for using AI to map and analyze the structure of social media engagement, which spans thousands of brands in different categories. By using this method, managers can extract valuable information about customers, trends, ties to other firms, and impending opportunities or threats.

Brands use social media channels to engage with their customers by posting content informing them of new products and services, requesting feedback, and increasing sales through influencer marketing. Users interact and engage with these posts by liking, commenting on, or sharing the content.

Using artificial intelligence (AI), companies can analyze the social engagement structure that spans several thousands of brands in different product and service categories, anticipating trends and opportunities in the market and gathering intelligence on competitors. Managers can use this analysis to engage customers and grow their business.

In our paper, “Identifying Market Structure: A Deep Network Representation Learning of Social Engagement,”1 we investigated companies’ less common uses of social media: anticipating trends and opportunities in the market and gathering competitor intelligence. Specifically, we focused on how AI can help firms to understand the structure of social engagement and draw insights from it to engage potential customers.

A user could interact with any number of different brands on these sites and a brand could have similar interactions with any number of users. If we consider each interaction between a brand and a user to be a link between them, social media interactions can be characterized as a network of links between brands and users, highlighting the users who different brands have in common and the brands that different users have in common.

We define these nodes of brands and users, and their links, as comprising the social engagement structure on a social media platform. Firms can analyze this structure, using deep learning auto-encoders to process the network data and identify the competitive structures – those that exist within the brands’ markets as well as those that cross the boundaries into other products and brands.

Relevance of the Social Engagement Structure

The information inherent in the social engagement structure on social media platforms can be useful to companies. At a minimum, it can help them to identify other brands with which they share common users and fans. These brands could be either substitutes or complementary.

An automobile company could identify its close competitors from the engagement structure, spotting other brands of automobiles that users also like or with which they engage. These brands share the same group of users and so could be substituted for one another. A hotel that highlights an airline with which it shares many common users is pointing out a complementary link which could lead to opportunities for cross-promotion and brand tie-ins.

But beyond such obvious conclusions lies a wealth of latent information that our AI unearths from the overall network and that can be immensely valuable to companies.

The social engagement structure allows us to identify product-market boundaries, that is, brands that compete within a market, and changes in those boundaries that occur over time. Identifying the strength of competition between brands within the product market can inform strategy for next-generation product design, product positioning, new customer acquisition, and pricing and promotion decisions.

As technology advances, however, the product-market boundaries themselves are changing. These changes open the door to competitive threats and business opportunities outside the previously narrow boundaries of markets. The social engagement structure will immediately reflect these changes, allowing companies to identify both threats and opportunities.

And product-market boundaries are extremely fluid. Not only do technological developments upend markets but new products also change their structure. Tesla, which initially entered the high-end automobile market with an innovative fuel technology, has since rolled out products for the lower-end market, changing competition in that market as well.

Firms also acquire other firms and enter new markets. Amazon crossed market boundaries when it acquired Whole Foods and entered the offline grocery product market. In many such situations, product-market boundaries are defined on the basis of SIC (Standard Industrial Classification) and NAICS (North American Industry Classification System) industry classification codes, which cannot provide adequate indicators of emerging threats and opportunities.2 The social engagement structure, however, offers a more dynamic view as customers react to these changes, and thus provides more forward-looking information on these fluid boundaries.

Our AI-based technique differs from prior methods in that it scrutinizes disaggregated individual relationships between users and brands within the social engagement structure. This distinction is important when the product market is not specified in advance because it captures relationships between a diversity of brands in a range of markets.

For example, if a user likes American Airlines and Marriott, while another user likes Delta and Marriott, the indirect relationship between American Airlines and Delta is captured through Marriott, which is in a different market. Identifying such latent relationships is the essence of our approach.
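The American Airlines, Delta, and Marriott example can be traced in a few lines of Python. This is only an illustrative sketch: the user identifiers and the helper names `co_engaged_brands` and `indirect_brands` are assumptions made for this example, and it follows the links by hand rather than learning them with the autoencoder.

```python
def co_engaged_brands(brand, likes):
    """Brands that share at least one user with `brand` (direct links)."""
    linked = set()
    for brands in likes.values():
        if brand in brands:
            linked |= brands
    linked.discard(brand)
    return linked


def indirect_brands(brand, likes):
    """Brands with no shared user that are reachable through a shared brand."""
    direct = co_engaged_brands(brand, likes)
    two_step = set()
    for bridge in direct:
        two_step |= co_engaged_brands(bridge, likes)
    return two_step - direct - {brand}


# The two users from the example above (user ids are placeholders).
likes = {
    "user_1": {"American Airlines", "Marriott"},
    "user_2": {"Delta", "Marriott"},
}

print(co_engaged_brands("American Airlines", likes))  # {'Marriott'}
print(indirect_brands("American Airlines", likes))    # {'Delta'}
```

American Airlines and Delta share no user here, yet Delta surfaces as an indirect neighbor because both airlines are linked to Marriott.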

Data Collection

We collected the social engagement data from Facebook because it is one of the largest and most representative social network platforms. We began by using Socialbakers, a social media marketing website, to obtain a list of U.S. brands with the most followers. Public fan pages on Facebook are categorized into several groups on Socialbakers, including brands, celebrities, community, entertainment, media, place, society, and sport. For this analysis, we focused on brands. Every brand on Facebook selects its category from a set of predefined options when it creates its public page. We included 5,478 different brands, spanning twenty-five categories.

We used the Facebook Graph API to download from each brand page all visible activities, including posts by the brand administrator as well as posts by users and their comments and likes on brand posts. To ensure privacy protection, we did not download any user profile information, nor did we examine the content of user comments. All engagement activities were represented by unique user identifiers, regardless of whether the user had a public or private Facebook profile, and by brand identifiers.

The data for this study covered the period between January 1, 2017 and January 1, 2018. In total, we used 106,580,172 user-brand engagement activities from 25,992,832 unique users.

To ensure data quality and robust results, we designed a set of rules to filter out fake users and their activities.3 We then constructed a brand-user network including all of our selected brands and all users engaging with them. We considered a brand node and a user node to be connected if the user engaged with the brand. The strength of a link between a brand node and a user node is defined by the frequency of engagement.
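As a rough sketch of this construction, the snippet below aggregates engagement events into weighted brand-user links, with the weight equal to the number of engagements. The event format, the identifiers, and the function name `build_brand_user_network` are assumptions for illustration; the filtering of fake users described above is not shown.

```python
from collections import Counter, defaultdict


def build_brand_user_network(events):
    """Aggregate (user_id, brand_id) engagement events into weighted links.

    Returns a dict mapping brand_id -> {user_id: engagement_count}.
    """
    weights = Counter(events)  # (user, brand) -> number of engagements
    network = defaultdict(dict)
    for (user, brand), count in weights.items():
        network[brand][user] = count
    return dict(network)


# Placeholder events: one tuple per like, comment, or post.
events = [
    ("user_1", "brand_a"), ("user_1", "brand_b"), ("user_1", "brand_b"),
    ("user_2", "brand_c"), ("user_2", "brand_b"),
]
network = build_brand_user_network(events)
print(network["brand_b"])  # {'user_1': 2, 'user_2': 1}
```

A brand and a user are connected whenever the user appears in the brand's inner dictionary, and the stored count plays the role of link strength.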

AI-Based Deep Network Representation Learning

The AI system we use is a network representation tool, also known as network embedding, whose goal is to compactly represent the underlying structure and relationships in a network.

The system uses an autoencoder whose mapping function translates the original network data of brands and users into a low-dimensional form, preserving the brand proximities, brand-user proximities, and user proximities as much as possible.4

Specifically, in encoding these brands and users, we seek to preserve two network structures: proximity to neighbors and proximity to neighbors of neighbors.

The autoencoder creates a bottleneck through which the input data passes and, using this encoded data as input, constructs a representation of the original data while discarding unhelpful complexity. We think of this as training the autoencoder to ignore the noise in the data and focus on the primary latent structure in the network data, a method comparable to principal component analysis (PCA) or multi-dimensional scaling (MDS).5
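To make the bottleneck idea concrete, here is a minimal sketch of a one-hidden-layer autoencoder written with NumPy. It is not the architecture used in our paper: the toy engagement matrix, the two-dimensional code, the tanh activation, and the training settings are all assumptions chosen to keep the example small.

```python
import numpy as np


def train_autoencoder(X, code_dim=2, epochs=2000, lr=0.05, seed=0):
    """Fit a one-hidden-layer autoencoder with plain gradient descent.

    Returns (codes, losses): the bottleneck code for each row of X and
    the mean squared reconstruction error recorded at every epoch.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(0.0, 0.1, size=(n_features, code_dim))
    b_enc = np.zeros(code_dim)
    W_dec = rng.normal(0.0, 0.1, size=(code_dim, n_features))
    b_dec = np.zeros(n_features)
    losses = []
    for _ in range(epochs):
        codes = np.tanh(X @ W_enc + b_enc)   # encoder: non-linear bottleneck
        X_hat = codes @ W_dec + b_dec        # decoder: linear reconstruction
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error (backpropagation by hand).
        g_out = 2.0 * err / err.size
        g_code = (g_out @ W_dec.T) * (1.0 - codes ** 2)
        W_dec -= lr * (codes.T @ g_out)
        b_dec -= lr * g_out.sum(axis=0)
        W_enc -= lr * (X.T @ g_code)
        b_enc -= lr * g_code.sum(axis=0)
    return np.tanh(X @ W_enc + b_enc), losses


# Toy brand-by-user engagement counts (rows: brands, columns: users).
counts = np.array([
    [3, 2, 0, 0, 1],
    [2, 3, 1, 0, 0],
    [0, 0, 1, 3, 2],
    [0, 1, 0, 2, 3],
], dtype=float)
X = counts / counts.max()  # scale to [0, 1] for stable training

codes, losses = train_autoencoder(X)
print(codes.shape)             # one 2-dimensional code per brand
print(losses[0], losses[-1])   # reconstruction error before and after
```

Each row of `codes` is the compressed representation of one brand. In this sketch, those rows stand in for the brand vectors used in the later analyses.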

For purposes of developing market structure, it is this bottleneck-reduced encoding that interests us. It allows us to identify and visualize the product markets downstream (See Figure 1).

Figure 1. The Overall Approach

Visualizing the Structure

We next graphed the global structure of the brands in our Facebook social engagement data. Each data point in Figure 2 represents a brand belonging to one of the twenty-five Facebook categories, with each category indicated by a different color. The closer any two brands are in the figure, the more similar their brand representations are. The two dimensions of the map range from service- to goods-focused categories (x-axis) and from retail- to technology-focused categories (y-axis).6 The map reveals some salient clusters of categories.

Figure 2. The Global Market Structure7

Zooming in on one area in Figure 3, we see non-luxury domestic and imported automobile brands such as Toyota, Nissan, and Mazda, as well as automobile accessory brands like General Tire, Auto Alliance, and Auto Parts. Meanwhile, several luxury automobile brands, including BMW, Mercedes-Benz, Audi, Tesla, and Maserati, are clustered in a different region of the map, with other luxury brands like Chanel, Gucci, and Cartier. This separation between luxury car brands and non-luxury car brands further confirms that our approach captures brand representation in multiple dimensions, not just in industry verticals like product categories, but also in price, luxury, and more. The strength of our methodology lies in how easily it expresses all these relationships on a single map.

Figure 3. Automobile Brands and Their Vicinity

Another zoomed area, Figure 4, shows airline brands as well as some hotel and cruise brands that have complementary relationships with airlines. These maps validate our methodology, revealing the core brands that make up an industry as well as the overlaps between markets. Disney Cruise Line, Hyatt, and Southwest Airlines, for example, appear near each other in the circled area, indicating the possibility of joint promotions. This map also shows some customer segments.

Figure 4. Airline Brands and Their Vicinity

Finding Proximal Brands

Visual mapping provides a gestalt of all 5,000-plus brands in the aggregate, but it does not show the distance between the brand vectors in the reduced space. Since identifying proximal brands for substitute or complement analysis is critical to marketing decisions, we identify proximal brands from the perspective of a focal brand. This perspective reflects the various relationships in the social engagement space, from substitute to complement.

In Table 1, we chose United Airlines and Southwest Airlines from the airlines category and Audi USA and Nissan from the automobile category because these brands are generally considered to have different consumer bases and to belong to different sub-markets. Using each of these as a focal brand, we found their top ten proximal brands based on similarity between their vectors.

      Focal brand
Rank  United           Southwest Airlines  Audi USA           Nissan
1     American         JetBlue             Mercedes-Benz USA  Mazda
2     Delta            Frontier            BMW USA            Toyota
3     Lufthansa        Allegiant           Land Rover         Volkswagen
4     Southwest        Delta               Lexus              Kia Motors America
5     Alaska           Alaska              Chevrolet Camaro   Subaru of America
6     All Nippon       United              Maserati USA       Chrysler
7     Air China        Airfarewatchdog     Kawasaki USA       FIAT
8     LATAM            American            Firestone Tires    Jaguar
9     Air New Zealand  Virgin America      Tesla              Alfa Romeo
10    Airfarewatchdog  Hyatt               Ram Trucks         KLIM

Table 1. Proximal Brands
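A ranking like the one in Table 1 can be sketched as a nearest-neighbor lookup by cosine similarity. The brand names and two-dimensional vectors below are placeholders rather than values from our study, and `top_k_proximal` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_proximal(focal, vectors, k=10):
    """Return the k brands most similar to `focal`, most similar first."""
    scores = [
        (brand, cosine_similarity(vectors[focal], vec))
        for brand, vec in vectors.items()
        if brand != focal
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Placeholder brand vectors (not taken from the study).
vectors = {
    "airline_a": [1.0, 0.0],
    "airline_b": [0.9, 0.1],
    "hotel_c": [0.6, 0.8],
    "tire_d": [0.0, 1.0],
}

for brand, score in top_k_proximal("airline_a", vectors, k=2):
    print(brand, round(score, 3))
```

Cosine similarity itself is symmetric, but each list is ranked from the focal brand's own perspective, so brand A's rank in brand B's list need not match brand B's rank in brand A's list.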

These proximal brands reveal several interesting points. First, our method reveals the latent characteristics of specific brands. Southwest Airlines is generally considered a lower-budget airline than United, and an examination of proximal brands reflects this difference. The brands closest to Southwest are JetBlue, Frontier Airlines, and Allegiant, while those closest to United are major domestic and international airlines, including American, Delta, Lufthansa, All Nippon, Air China, LATAM Airlines, and Air New Zealand. Second, we can observe asymmetric competition: Southwest is the fourth closest brand to United, while United is the sixth closest to Southwest.

Third, by analyzing the social engagement structure, we discover that brands close to each of our focal brands represent different industries. A brand called Airfarewatchdog, a flight deal finder with over a million followers on Facebook, is close to both United and Southwest Airlines. Traditional market analyses could simply ignore this brand, since it is not an airline.8 Southwest Airlines is also closer to Airfarewatchdog than United is, which may indicate that fans of Southwest Airlines are more likely to use a deal finder before purchasing flight tickets. Airfarewatchdog could thus be a complement to Southwest, directing customers to Southwest’s cheap flights, or it could compete with Southwest by directing customers to other airlines.

Identifying Opportunities and Threats

Our social engagement mapping can help managers to spot brands outside their product market that are close to a given brand, and thus see the opportunities and threats posed by different brands. Consider the airline product market as an example.

Our analysis identifies Disney Cruise Line and Hyatt, both outside the airline market, as proximal to Southwest but not United. This proximity arises from the greater number of users in our dataset who liked both Southwest and Hyatt (2,709) (Segment 2 in Figure 4) compared with those who liked both United and Hyatt (954). Similarly, a greater number of users liked both Southwest and Disney Cruise Line (3,050) (Segment 1 in Figure 4) than liked both United and Disney Cruise (729).
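Counts of shared users such as these reduce to set intersections over each brand's engaged users. The sketch below uses placeholder brands and user identifiers, not our data, and `shared_user_count` is a hypothetical helper name.

```python
def shared_user_count(brand_a, brand_b, engaged_users):
    """Number of users who engaged with both brands."""
    return len(engaged_users[brand_a] & engaged_users[brand_b])


# Placeholder data: brand -> set of user ids that liked the brand.
engaged_users = {
    "airline_x": {"u1", "u2", "u3", "u4"},
    "airline_y": {"u4", "u5"},
    "hotel_z": {"u1", "u2", "u3", "u5"},
}

for airline in ("airline_x", "airline_y"):
    print(airline, shared_user_count(airline, "hotel_z", engaged_users))
```

In this toy data, `airline_x` shares three users with `hotel_z` while `airline_y` shares one, which mirrors the kind of gap that separates Southwest from United in their overlap with Hyatt.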

These proximities reveal opportunities for Southwest to target users who liked Disney Cruise and Hyatt on social media. Southwest could cross-promote with Disney Cruise and perhaps Hyatt on each other’s websites. They could also launch coalition loyalty programs to take advantage of their common user base.

From the viewpoint of Hyatt’s competitors, these possibilities could be threats, so the same information could help them to take preemptive action. These opportunities and threats are hard to spot using pre-specified categories, so they are difficult, if not impossible, to identify through other means.

When Brand Relationships Change

The structure of markets evolves over time but can change rapidly, especially under an unexpected shock. By learning adaptively from such changes, our method could provide useful insights to practitioners. We analyzed how the structure of markets changes under the influence of outside shocks, using Amazon’s acquisition of Whole Foods and Tesla’s introduction of the Model 3 as case studies.

We used data from three months before the event was announced and three months after to calculate changes in the distance between the focal brands (Amazon and Tesla) and other brands selected from the same category.

We hoped to discover how a major event changes the focal brand’s relationship with other brands. We selected several brands from the retail and e-commerce category for comparison with Amazon-Whole Foods, and several automobile brands for comparison with Tesla.

We then examined how the model’s representation of the proximity between focal brand i and target brand j changed after the event. We measured this change as the difference between the cosine similarity9 of the two brands’ representations after the event and before it: CosSim(after) – CosSim(before). Positive numbers indicate an increase in similarity while negative numbers indicate a decrease.
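The measure CosSim(after) – CosSim(before) can be sketched as follows. The two snapshots hold placeholder vectors, and the dictionary layout and the name `similarity_change` are assumptions for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def similarity_change(focal, target, before, after):
    """CosSim(after) - CosSim(before) for one focal/target brand pair."""
    sim_before = cosine_similarity(before[focal], before[target])
    sim_after = cosine_similarity(after[focal], after[target])
    return sim_after - sim_before


# Placeholder brand vectors learned before and after an event.
before = {"focal": [1.0, 0.0], "target": [0.0, 1.0]}
after = {"focal": [1.0, 1.0], "target": [0.0, 1.0]}

change = similarity_change("focal", "target", before, after)
print(round(change, 4))  # positive: the two brands moved closer
```

Each similarity is computed between two brands within the same snapshot, so the before and after vectors are never compared with each other directly.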

Figure 5. Changes in Cosine Similarities for Amazon

Amazon Acquires Whole Foods

Amazon acquired Whole Foods in June 2017. The event had a significant impact on the grocery and retail industries. At the time, many believed that Amazon’s plan for Whole Foods was to fulfill online orders by entering the offline grocery delivery business. (Amazon and Whole Foods ran separate Facebook pages.) After the merger, the system shows that Amazon moved closer to retail brands, while its proximity to other brands decreased slightly.

In Figure 5, for example, the cosine similarity between Amazon and Lowe’s Home Improvement decreased by 0.184. Meanwhile, the cosine similarity between Amazon and other supermarket retailers increased. The proximity of Amazon to Whole Foods increased by 0.202, and that between Amazon and Kroger by 0.165.

Figure 6. Changes in Cosine Similarities for Tesla

Our model showed that Amazon even moved closer to Walmart, indicating that Amazon’s competitive market landscape has shifted. We also found that, after the Whole Foods acquisition, the number of common users who interacted with both Amazon and Whole Foods on their Facebook public pages increased.

In short, after Amazon acquired Whole Foods, online social media users who were Amazon’s fans paid more attention to Whole Foods, and users who were fans of other supermarket brands engaged more with Whole Foods as well. The deep autoencoder captures these dynamics and updates the brand representation accordingly.

Being acquired by Amazon had an impact on the market structure of Whole Foods, too. When we examined Whole Foods as the focal brand and calculated the change in its proximities to other brands before and after, we found that Whole Foods’ proximity to other retail brands such as Target, Walmart, and Best Buy increased.

Perhaps unsurprisingly, Whole Foods’ proximity to Amazon rose the most, alongside an increase in the number of users who liked both. Meanwhile, its proximity to supermarket brands such as Goya Foods, Enjoy Life Foods, and HelloFresh decreased slightly.

The magnitude of change in Whole Foods’ proximity to other brands was smaller than that of Amazon. This difference seems to indicate that the acquisition affected Whole Foods less, leaving it still positioned near other supermarket brands, while Amazon expanded closer to the grocery retail category. While this analysis is retrospective, it demonstrates how our approach offers managers a series of snapshots of the structure by which to measure changes in a brand’s relative position, allowing them to identify potential shifts in the market structure as social engagement with these brands changes.

Suppose, for example, that the leaders of supermarket chain A observe that Amazon is moving closer to its position on the map. This shift may indicate that Amazon is getting more likes or comments from A’s customers. Since one motivation for liking a brand on Facebook is to receive some benefit, like a coupon or discount, it could specifically indicate that Amazon is conducting effective promotional marketing campaigns on social media.

Whatever the underlying reasons, the increasing proximity of Amazon on the brand map can give A’s marketing managers early warning of the potential threat.

Tesla Announces the Model 3

Tesla sells two types of sedans, the Model S and the Model 3. The Model S, released first, is a luxury premium sedan with a greater range of acceleration and customization options. The Model 3 is a more affordable mass-market electric vehicle. The Model S can cost over $100,000 depending on the configuration, while the Model 3 costs about $35,000. Our method reveals that, after unveiling the Model 3, Tesla moved further from luxury car brands and closer to non-luxury car brands.

In Figure 6, we can see that the cosine similarity between Tesla and the luxury car brand Maserati decreased by 0.209, and that the proximity between Tesla and other high-end or luxury car brands, such as BMW, Mercedes-Benz, and Audi, changed in similar ways. Meanwhile, Tesla drew closer to Kia, Mazda, and other more affordable car brands.

Potential Uses of the Social Engagement Structure

As brands increasingly use social media to engage, new opportunities arise, allowing them to interact with customers, better understand their preferences, and serve them better. They also see new opportunities to use data describing engagement with the broader market to learn about customers’ affinities for brands both inside and outside their own product markets.

In addition to mapping the broader social engagement structure, our methodology can also produce insights that will help managers decide which segments of their customer base to target.

Returning to Figure 4, we can locate user groups on the same maps we use for brands, identifying those most receptive to marketing from nearby brands (Segments 1 and 2). Disney Cruise Line and Hyatt are outside of the airline market but are proximal to Southwest and not United. Identifying clusters of users who are also proximal can help these companies to target them accordingly.

Another important strategic use of our market structure maps is to identify competitors and complementors across industries and track how these relationships change over time. We provide a more dynamic structure than previous methods, rooted in actual customer or user social media activity.

Our market structure map is also more prescient than earlier methods, predicting emerging competition and complementors by focusing on each stage of the customer’s purchase journey, through many categories. By understanding customers’ affinities to brands without being confined by product category, managers can keep alert to impending opportunities and threats.

The power of our method lies in its ability to capture the dynamic changes in market structure. We recommend that firms make such analyses of social engagement structure, and of the related maps, part of their routine environment and market monitoring and intelligence gathering.

By applying this method on a quarterly basis, managers will be able to spot changes without being distracted by noise in the data. Our method is readily generalized to other platforms, as long as we can construct a heterogeneous brand-user network from the engagement data of public pages. This versatility opens new vistas of opportunity for gathering powerful information from customers’ social engagements.

Author Bio

P. K. Kannan

P. K. Kannan is the Dean’s Chair in Marketing Science and the Associate Dean for Strategic Initiatives at the Robert H. Smith School of Business at the University of Maryland. His research is on marketing modeling, applying statistical, econometric, machine learning, and AI methods to marketing data. He has won several prestigious awards and grants. He is a Distinguished Scholar-Teacher at the University of Maryland and consults widely for nonprofit and profit organizations.

Yi Yang

Yi Yang is an Assistant Professor in the Department of Information Systems, Business Statistics and Operations Management (ISOM), School of Business and Management, at the Hong Kong University of Science and Technology (HKUST). His research interests include natural language processing (NLP), machine learning, and statistical inference, as well as their applications in management science. He has received several research grants from the Hong Kong government. He has been consulting for a leading hedge fund firm on financial NLP topics.

Kunpeng Zhang

Kunpeng Zhang is an Assistant Professor at the Robert H. Smith School of Business at the University of Maryland. His research concerns developing and applying machine learning algorithms to analyze large-scale unstructured data in social media, healthcare, and finance. He has won several research and teaching awards. He has also received several research grants from government and industry agencies.


  1. Yang, Yi, Kunpeng Zhang, and P. K. Kannan (2022), “Identifying Market Structure: A Deep Network Representation Learning of Social Engagement,” Journal of Marketing, 86 (4), 37–56. doi:10.1177/00222429211033585
  2. Hoberg, Gerard, and Gordon Phillips (2010), “Product Market Synergies and Competition in Mergers and Acquisitions: A Text-Based Analysis,” The Review of Financial Studies, 23 (10), 3773–3811.
  3. Zhang, Kunpeng, Siddhartha Bhattacharyya, and Sudha Ram (2016), “Large-Scale Network Analysis for Online Social Brand Advertising,” MIS Quarterly, 40 (4).
  4. The system learns through a deep autoencoder, an unsupervised learning model consisting of two joint components, an encoder and a decoder. The encoder, built on an FFNN (fully-connected feedforward neural network), is a compressor that transforms input data into a compressed low-dimensional latent representation, while the decoder reconstructs the original input data from that representation. Because the input data is often high-dimensional, such as images or text, with dimensions in the millions, learning an effective low-dimensional representation (in the low hundreds, typically 300) efficiently while preserving as much of the information in the input data as possible is not trivial.
    See also: Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean (2013), “Efficient Estimation of Word Representations in Vector Space,” arXiv:1301.3781 [cs].
  5. While PCA reduces dimensions by linear combinations of the input variables, the autoencoder’s reduced dimensions are non-linear and non-orthogonal. They are achieved through non-linear activations of the neurons, allowing the model to learn more powerful generalizations than PCA can. We can also compare it to multidimensional scaling (MDS), which reduces the information while preserving brand similarity data. While MDS can deal with data in the thousands, the autoencoder can handle big data in the millions. Extant literature shows that the autoencoder performs better than MDS and provides generally different results. For further information about our approach, technical details, and a discussion of the validation of the results, see Yang, Zhang, and Kannan (2022).
  6. Maaten, Laurens van der, and Geoffrey Hinton (2008), “Visualizing data using t-SNE,” Journal of Machine Learning Research, 9 (Nov): 2579–2605.
  7. https://market-structure.github.io/index.html
  8. Shugan, Steven M. (2014), “Market Structure Research,” in The History of Marketing Science, 129–64.
  9. https://en.wikipedia.org/wiki/Cosine_similarity