Social networks nowadays are big data production engines. Their analytics can produce insights on trending topics that can be used in various domains, from advertising to politics. Social media trends are also indicators for various phenomena, from opinion shifts to emergency situations and even disease outbreaks. However, a topic only counts as a trend once the social network itself (e.g. Twitter) declares it one, so predicting whether a topic will become a trend can be treated as a classification problem.
Managing massive data volumes to extract valuable information, and doing so in real time, are additional obstacles to predicting trending topics on social networks.
Athena Vakali and her colleagues at the Aristotle University of Thessaloniki in Greece addressed these challenges by working on a new model for detecting social media trends. The team wanted to observe how effective some of the known techniques and algorithms of this field are in a near-real-world setting.
They started with large-scale threads of actual Twitter data and ran trend prediction in real time within a framework built on the Lambda architecture.
Lambda architecture is a data-processing model capable of handling massive quantities of data using both batch-processing and stream-processing methods to provide views of online data. The team chose to use this model because it tackles the manipulation problems of both the volume and the velocity of data.
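The interplay between the batch and stream layers can be illustrated with a minimal Python sketch. It is a toy, single-process illustration of the general Lambda pattern and not the team's actual framework; the class `LambdaCounter`, its method names and the hashtag-counting task are assumptions made purely for the example.

```python
from collections import Counter


class LambdaCounter:
    """Toy Lambda-style pipeline (hypothetical, for illustration only).

    A batch view is recomputed from the full master dataset, a speed view
    covers events that arrived since the last batch run, and queries merge
    the two views.
    """

    def __init__(self):
        self.master = []             # append-only master dataset
        self.batch_view = Counter()  # precomputed by the batch layer
        self.speed_view = Counter()  # incremental, recent events only

    def ingest(self, hashtag):
        # Every event is stored for later batch processing and is also
        # counted immediately by the speed layer.
        self.master.append(hashtag)
        self.speed_view[hashtag] += 1

    def run_batch(self):
        # Batch layer: recompute the view from scratch over all data,
        # then discard the speed view because the batch view now covers it.
        self.batch_view = Counter(self.master)
        self.speed_view = Counter()

    def query(self, hashtag):
        # Serving layer: merge the batch and real-time views.
        return self.batch_view[hashtag] + self.speed_view[hashtag]


pipeline = LambdaCounter()
for tag in ["#news", "#sport", "#news"]:
    pipeline.ingest(tag)
pipeline.run_batch()
pipeline.ingest("#news")  # arrives after the batch run
print(pipeline.query("#news"))
```

In a real deployment each layer would run on a separate distributed system, so the slow but complete batch recomputation handles volume while the incremental speed layer handles velocity.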
While choosing the model was easy, the project lacked the infrastructure resources needed to build the whole architecture.
Vakali and her colleagues decided to contact GRNET, one of EGI’s federated cloud providers, for help with the much-needed Cloud Compute resources. The team was invited to install their model on GRNET’s cluster Okeanos and to implement a distribution of the Lambda architecture.
“Lambda architecture is by its definition, a complex consisting of a couple of frameworks for distributed analysis and NoSQL databases. In that manner, it would be useless to execute our experiments in our lab’s standalone servers. Our need for infrastructure resources that would make the build of such architecture possible was accommodated by GRNET”, says Vakali.
In total, they used about 48 CPU cores, 46 GB of memory and 600 GB of disk storage available at Okeanos and installed 14 virtual machines to run the experiments.
They found that almost 80% of the actual trending topics were classified as potential trending topics. The results, published in Advances in Big Data, validate the performance of the proposed research framework and emphasise its ability to detect trending topics early.
Vakali et al. 2016. A Distributed Framework for Early Trending Topics Detection on Big Social Networks Data Threads. Advances in Big Data. doi: 10.1007/978-3-319-47898-2_20