Over 1000 Masters-level students at the University of Melbourne have been taught big data analytics on the
NeCTAR Research Cloud since 2013 as part of the Cluster and Cloud Computing course delivered by the
presenter. The course covers HPC programming, including MPI, as well as hands-on experience in the dynamic
deployment and scaling of applications on the Cloud. Students are exposed to NoSQL systems such as CouchDB,
big data frameworks such as Hadoop/HDFS and Spark, scripting approaches for writing scalable Cloud
solutions such as Boto and Ansible, and newer container technologies such as Docker, Docker Swarm
and Kubernetes.
The cornerstone of this course is teaching students how to develop and scale big data solutions. Social
media has been used throughout the course as a live and challenging big data resource. This
talk will present examples of student work on social media data analytics covering Twitter,
Instagram, Flickr, Foursquare and Reddit. A range of scenarios and solutions will be presented, including the
use of such data to better understand how individuals move around the city; commuting patterns; the
adult (sex) industry; identification of the gender of social media users; the dietary habits of individuals; linking
users across different platforms; and historic data mining to identify the social media use of suicide
completers, and the challenges that arise in the reliable re-identification of accounts. Wherever possible, the
social media data scenarios are compared and validated against official data sources from the AURIN platform
(www.aurin.org.au), which is also developed, supported and maintained by Prof Sinnott’s team.
The talk will also cover a recent grant funded by ARDC to capture all social media data across Australia
through the Digital Data Observatory. The architecture of this platform is shown in Figure 1. The images are
examples of student work that will form part of the talk.
ABOUT THE AUTHOR
Professor Richard O. Sinnott is Professor of Applied Computing Systems and Director of the Melbourne
eResearch Group at the University of Melbourne. He has been lead software engineer/architect on an
extensive portfolio of national and international projects, with specific focus on those research domains
requiring finer-grained access control (security) and those dealing with big data challenges. He has over 400
peer-reviewed publications across a range of applied computing research areas.