Fall 2017 Workshops

Collaborating with ECDS - Please RSVP

Thursday, Sept 14 - Data Viz and R (John Bernau) 4:30-7:30 pm

Wednesday, Oct 4 - Carto+Tableau (Megan Slemons) 11-2pm (12:15-12:45 lunch break)

Wednesday, Nov 1 - Text Mining (Sara Palmer) 11-2pm (12:15-12:45 lunch break)

Tuesday, Nov 14 - Voyant (Sara Palmer) – 2:30-4pm

Python for Data Scientists (Spring)

Thank you for your interest; this workshop has reached capacity. Please check back often for more events.

Geographic Information Systems (GIS) Workshop

Thank you for your interest; this workshop has reached capacity. Please check back often for more events.

Data Visualization in R Workshop

Tableau Workshop

Text Analysis for Social Scientists

January 20, 2017, 1:00-5:00 pm, Modern Languages Building RM 201

Python for Data Science

November 2-3, 2016, 7:15-8:45
Modern Languages Building RM 201

Please RSVP; seating is limited. Workshop prep: bring a laptop with Python 2.7 installed. Use this link for an easy download.

Text Analysis for Social Scientists

February 12, 2016, 1:00-5:00pm

Modern Languages Building 201

Led by Joshua Fjelstul

This workshop is designed to provide attendees with a basic introduction to text analysis — computational techniques for studying the content of text — for social science applications. It will feature applications to social media analysis (Twitter). No prior experience working with text data is needed. Basic knowledge of R is helpful but not required. The workshop will include three mini-units: (1) collecting and structuring raw text for analysis; (2) cleaning and manipulating text to prepare it for analysis; and (3) describing and analyzing text for social science applications. 

The first mini-unit will focus on how to collect text data. It will cover how to import pre-collected text-based data into R, how to implement basic web-scraping tools in R to automatically collect data from HTML sources, and how to use the Twitter API in R (as an example of how to extract social media content). It will also cover the best ways to organize and structure raw text in R. 
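As an illustration of the extraction step described above, here is a minimal sketch using Python's standard-library HTML parser; the HTML snippet is invented for the example, and the workshop itself performs these steps in R:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of <p> elements from raw HTML."""
    def __init__(self):
        super().__init__()
        self.in_p = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data

html = "<html><body><h1>News</h1><p>First story.</p><p>Second story.</p></body></html>"
parser = TextExtractor()
parser.feed(html)
print(parser.paragraphs)  # ['First story.', 'Second story.']
```

In practice the raw HTML would come from an HTTP request or an API response rather than a hard-coded string, but the parsing logic is the same.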
The second mini-unit will focus on how to clean and manipulate text in R. It will introduce regular expressions — a syntax for matching strings — and a suite of functions for implementing them in R; cover the basics of creating, cleaning, and manipulating corpora (collections of text fragments); and show how to make document-term matrices (the primary input for many forms of text analysis). 
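The document-term matrix described above can be sketched in a few lines. This Python example, with an invented two-document corpus, shows the idea; the workshop builds these structures in R:

```python
import re
from collections import Counter

docs = ["The cat sat on the mat.", "The dog sat on the log!"]

def tokenize(text):
    # Lowercase and keep alphabetic tokens only (a minimal cleaning step
    # using a regular expression)
    return re.findall(r"[a-z]+", text.lower())

tokenized = [tokenize(d) for d in docs]
vocab = sorted(set(w for doc in tokenized for w in doc))

# Document-term matrix: one row per document, one count per vocabulary word
dtm = [[Counter(doc)[w] for w in vocab] for doc in tokenized]
print(vocab)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(dtm)    # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```

Each row counts how often each vocabulary word appears in one document; most text-analysis methods (clustering, topic models, classifiers) start from exactly this structure.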
The third mini-unit will focus on how to describe, visualize, and analyze text data. It will cover common statistics for describing text data — including frequency, association, and clustering — and how to calculate them in R; show how to visualize text data to communicate the content of documents; and introduce basic tools (supervised and unsupervised) for estimating the topic and sentiment of text fragments, including the unsupervised latent Dirichlet allocation (LDA) model for topic classification and the supervised Naive Bayes classifier for sentiment analysis. The emphasis will be on implementing these tools in R. 
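As one concrete instance of the supervised tools mentioned, here is a minimal Naive Bayes sentiment classifier in Python; the tiny training corpus is invented for illustration, and the workshop implements these tools in R:

```python
import math
from collections import Counter

# Tiny labeled corpus (illustrative only)
train = [("good great fun", "pos"), ("happy good", "pos"),
         ("bad awful boring", "neg"), ("sad bad", "neg")]

counts = {"pos": Counter(), "neg": Counter()}
class_docs = Counter()
for text, label in train:
    counts[label].update(text.split())
    class_docs[label] += 1

vocab = set(w for c in counts.values() for w in c)

def predict(text):
    scores = {}
    for label in counts:
        # log prior + sum of Laplace-smoothed log likelihoods
        score = math.log(class_docs[label] / sum(class_docs.values()))
        total = sum(counts[label].values())
        for w in text.split():
            score += math.log((counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("good fun"))   # pos
print(predict("awful sad"))  # neg
```

With only four training documents this is a toy, but the structure (class priors plus smoothed word likelihoods) is the same one used at scale.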

Spatial Analysis in Architecture and Archaeology

November 14th, 2014, 1:30-4:00pm

Callaway Center S108

Led by Dr. Ermal Shpuza

The workshop introduces the study of space in the built environment from a social viewpoint. Continuous built space is represented as discrete components according to basic aspects of human behavior and activity, including movement, co-awareness, and co-presence. Case studies at various scales, including houses, complex buildings, settlements, and archaeological records, are examined according to three main representational techniques: convex partitions, axial maps, and visibility fields. Relational patterns of connections and separations among spatial components are analyzed as networks, using graph-theoretic methods, to reveal the underlying social function (interfaces and program). The workshop covers the drawing of maps and isovists; justified graphs; UCL Depthmap software; data export and import; space syntax configuration and topological measures; and interpretation of graphical and numerical results. No digital drawing experience is required.
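The graph-theoretic approach described above (spatial components as nodes, connections as edges) can be illustrated with a short sketch. The floor plan below is an invented toy example, not workshop material; the measure computed is mean topological depth, one of the basic space syntax measures:

```python
from collections import deque

# Adjacency list for a toy convex map: rooms connected by doorways
# (hypothetical layout in which the hall connects to every other room)
plan = {
    "hall":    ["kitchen", "living", "bedroom"],
    "kitchen": ["hall"],
    "living":  ["hall"],
    "bedroom": ["hall"],
}

def mean_depth(graph, root):
    """Average shortest-path (topological) depth from root to all other rooms,
    computed by breadth-first search."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    others = [d for n, d in depth.items() if n != root]
    return sum(others) / len(others)

print(mean_depth(plan, "hall"))     # 1.0 -- the hall is shallow / integrated
print(mean_depth(plan, "kitchen"))  # (1 + 2 + 2) / 3 -- the kitchen is deeper
```

Rooms with low mean depth are topologically integrated (easy to reach from everywhere); tools such as UCL Depthmap automate this kind of calculation over much larger convex and axial maps.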

Data Curation

January 12th, 2015, 9:00 am - 5:00 pm

Woodruff Library 312

Led by instructors from ICPSR

Instructors from the Inter-university Consortium for Political and Social Research (ICPSR) Data Archives are partnering with the Emory Center for Digital Scholarship (ECDS) and the Institute for Quantitative Theory and Methods to host a Data Curation Workshop in the spring of 2015. ECDS and ICPSR have created a full-day agenda for researchers to learn about and get hands-on experience curating quantitative datasets for long-term access and preservation, in accordance with federal funding agency mandates and requirements from journals and publishers. Additional information and registration are available online.

Data Scraping for Non-programmers

April 13th, 2015, 10:00am-12:00pm

Woodruff Library 314

Led by Trent Ryan, Sociology graduate student

“Big data” research has become very popular among social scientists over the past decade. Many psychology, sociology, and political science researchers use data from online sources to study individual and group behavior. Similarly, literature classics have been digitally archived into large collections of text to be studied by linguistics and history scholars. However, some social scientists and humanists struggle with mastering data extraction due to a lack of programming and/or tech skills. While platforms such as R, Python, Java, and Perl have excellent packages for web scraping, they often require considerable time investment to learn and may be intimidating to those who have no interest in computational methods beyond accessing such data.

This workshop is designed to provide an introduction to web scraping for non-programmers. Scraping is the method by which researchers collect large amounts of online data quickly and efficiently. We will use a free point-and-click software program called OutWit Hub to explore this method. We will begin by explaining what scraping is and how it works. Along the way, we will learn how to navigate a website’s structure, identify the information we need, and extract and output it as data. We will also explore some of the common misconceptions and legal implications of scraping copyrighted content. Finally, we will have a demo to get some hands-on experience with this powerful tool. By the end of this workshop, participants will be able to use this method on their own.

For those planning to attend, we ask that you download the free software program needed for the workshop here:

Research Design in Anthropological Studies

September 25th, 26th, and 27th, 2013, 9:00 am-4:00 pm 

ANT 206

Led by Dr. Russell Bernard

This workshop is about writing effective research proposals. We begin with the basics of research design, including units of analysis, measurement, independent and dependent variables, validity, reliability, and cause and effect. Then we cover sampling (probability and nonprobability sampling) and methods for collecting and analyzing data. Throughout, the focus will be on research design: formulating a research question; tying the question to existing knowledge or theory; developing hypotheses; and laying out the methods for testing the hypotheses. The objective is to produce intellectually convincing, fundable research proposals at the Ph.D. level. Students should come to the course with a research agenda in mind or a draft of a research proposal. We will spend much of the time going over students' proposals and ideas for research.


How to Collect Social and Health Data with Mechanical Turk and Qualtrics

October 9th, 2013, 10:00-11:30 am

Woodruff Library 312

Led by Chris Martin

Mechanical Turk, or MTurk, is a low-cost service provided by Amazon that allows researchers to collect survey data from people around the world. Social, psychological, and health researchers have used MTurk extensively in the past five years to gather survey data or implement simple experiments. In this workshop, participants will learn how to use MTurk efficiently in combination with Qualtrics to implement a study. Participants will also learn the advantages and drawbacks of Mechanical Turk and which online resources can keep their MTurk knowledge up to date. The workshop will include a hands-on implementation of a survey.

Please activate your Emory Qualtrics account and create an MTurk Requester account before you attend. We also suggest you create an Amazon Payments account and deposit two dollars in it prior to the workshop.

Chris Martin has master's degrees in human-computer interaction and experimental psychology, and is currently a doctoral student in sociology. He has used Mechanical Turk to study political, social, and fiscal attitudes among American adults.

Networks from the Real World

December 2nd and 3rd, 2013, 1:00-5:00 pm

Woodruff Library 312

Led by Dr. Ernesto Estrada

Participants will have the opportunity to work with real-world networks from different scenarios, calculate the parameters studied in the course, and visualize these networks. 

Dr. Ernesto Estrada is a QuanTM faculty fellow visiting Emory for the Fall 2013 semester.

Collecting Social Science and Public Health Data with Qualtrics

February 3rd, 2014, 10:00-11:30 am

Woodruff Library 312

Led by Chris Martin

Qualtrics is an application for collecting and analyzing survey data. Qualtrics has many advanced features that make it a more useful and efficient tool than SurveyMonkey and other competitors. This workshop will introduce participants to these features.

  • Features that save time in pilot testing include the simulation of data collection.
  • Features that allow better survey design include skip rules, display rules, randomization, piped text, and real-time survey customization.
  • Features that allow better question design include heat maps, hot spots and sliding scales.
  • Features that allow collaboration and re-use include public and private libraries.
  • Features that save time during data analysis include automatic data recoding.

Participants will also learn how to integrate Qualtrics with Amazon Mechanical Turk. 


Large-Scale Topic Analysis with Mallet 

April 10th, 2014, 10:00 am-3:00 pm

Woodruff Library 312

Led by Dr. David Mimno

Over the last ten years we have seen the creation of massive digital text collections, from Twitter feeds to million-book libraries. At the same time, researchers have developed text-mining methods that go beyond simple word frequency analysis to uncover thematic patterns. This workshop will introduce participants to topic modeling through hands-on tutorials using the Mallet package and the R statistical language. After a theoretical presentation of the method, we will discuss inference and model training, data preparation techniques such as stoplist curation and text segmentation, model analysis techniques in the presence of metadata, and finally model diagnostics.
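The data preparation steps mentioned above (stoplist curation and text segmentation) can be sketched in a few lines of Python; the stoplist, sample text, and chunk size below are invented for illustration, and the workshop itself uses the Mallet package and R:

```python
import re

# A tiny hand-curated stoplist (real stoplists are much longer and
# are tuned to the collection being modeled)
STOPLIST = {"the", "of", "and", "a", "to", "in"}

def prepare(text, chunk_size=5):
    """Tokenize, drop stopwords, and segment into fixed-size chunks,
    a common preprocessing pipeline before training a topic model."""
    tokens = [w for w in re.findall(r"[a-z]+", text.lower())
              if w not in STOPLIST]
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]

text = "The history of the book and the rise of a reading public in the age of print"
chunks = prepare(text)
print(chunks)
# [['history', 'book', 'rise', 'reading', 'public'], ['age', 'print']]
```

Segmenting long documents into smaller chunks matters because topic models treat each input document as one mixture of topics; a whole book as a single document would blur its thematic structure.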

David Mimno is an assistant professor in the Information Science department at Cornell University. His research is on developing machine learning models and algorithms, with a particular focus on applications in Humanities and Social Science. He received his BA in Classics and Computer Science from Swarthmore College and PhD in Computer Science from the University of Massachusetts, Amherst. He was a CRA Computing Innovation fellow at Princeton University. Before graduate school, he served as Head Programmer at the Perseus Project, a digital library for cultural heritage materials, at Tufts University. Mimno is currently chief architect for the MALLET machine learning toolkit.

Scraping Data from the Web

November 16th & 30th, 2012, 10:30am-3:30pm

ECIT Room 214

Led by Qiaoling Liu and Yu Wang

In this workshop, we will talk about how to crawl web pages and PDF files from the web, as well as how to extract target information from the crawled raw data using R, Python, and Java. Several examples will be given, e.g., scraping data from Twitter and Wikipedia.

We will start by introducing two crawling methods: crawling based on HTTP requests and crawling via web APIs. After the pages are crawled, we will show several methods of extracting data from them, including table data extraction, regular-expression-based extraction, and XPath-based extraction. Several data-scraping examples will be demonstrated, including the entire pipeline, from crawling to analysis, using Twitter and Wikipedia. We will also demonstrate how to automatically crawl a batch of PDF files from the web and extract their text.
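As a sketch of two of the extraction methods listed above, the following Python example applies regular-expression-based and XPath-based extraction to an invented, well-formed HTML snippet (the workshop covers equivalents in R, Python, and Java):

```python
import re
import xml.etree.ElementTree as ET

page = """<html><body>
<table><tr><td>Atlanta</td><td>498044</td></tr>
<tr><td>Savannah</td><td>147780</td></tr></table>
</body></html>"""

# Regular-expression-based extraction: pull (city, population) pairs
pairs = re.findall(r"<td>(\w+)</td><td>(\d+)</td>", page)
print(pairs)   # [('Atlanta', '498044'), ('Savannah', '147780')]

# XPath-based extraction: first <td> of every row
# (this works here because the snippet happens to be well-formed XML;
# real pages usually need an HTML-tolerant parser)
root = ET.fromstring(page)
cities = [td.text for td in root.findall(".//tr/td[1]")]
print(cities)  # ['Atlanta', 'Savannah']
```

Regular expressions are quick for simple, regular markup; XPath queries are more robust once the page is parsed into a tree, since they address elements by structure rather than by surface text.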

Missed the workshop?

11/30 Workshop Part 1

11/30 Workshop Part 2


A Primer on Recent Advances in Nonparametric Estimation and Inference

February 25th & 27th, March 1st, 4th & 6th, 2013, 2-3:30pm & 4-5:30pm

Woodruff Library Room 312

Led by Jeffrey Racine

In this workshop we shall study a unified framework for nonparametric and semiparametric kernel-based analysis with an emphasis on applied modeling. We focus on kernel-based methods capable of handling the mix of categorical (nominal and ordinal) and continuous datatypes one typically encounters in the course of applied data analysis. Applications will be emphasized throughout, and we shall use R for data analysis.
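As a minimal illustration of kernel-based estimation, here is a Gaussian kernel density estimator in Python; the sample and bandwidth are invented for the example, and the workshop itself works in R:

```python
import math

def gaussian_kernel(u):
    # Standard normal density, the most common kernel choice
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, sample, bandwidth):
    """Kernel density estimate at point x from an i.i.d. sample:
    an average of kernel bumps centered on the observations."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)

sample = [-1.0, 0.0, 1.0]
# The estimate is highest near the centre of the data and
# falls off away from the observations
print(kde(0.0, sample, bandwidth=1.0))
print(kde(3.0, sample, bandwidth=1.0))
```

The bandwidth controls the bias-variance trade-off at the heart of nonparametric estimation: small bandwidths track the sample closely but are noisy, while large bandwidths oversmooth. Choosing it well (e.g., by cross-validation) is a central topic of the methods covered in the workshop.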

Data Visualization

April 18th & 23rd, 2013, 10:00 am-3:00 pm

Woodruff Library Room 312

Led by Robi Ragan

Data visualization is an essential tool for the discovery and communication of quantitative information, especially as datasets have grown in size and complexity. This workshop will provide an overview of the "language" of data visualization, giving participants a set of ideas, techniques, and best practices for creating effective graphical data presentations. 
The workshop will start with R basics, including importing data from various formats and reshaping data before displaying it. It will cover many different visualization formats, including scatter plots, line graphs, histograms, box plots, balloon plots, density plots, violin plots, correlation matrices, network graphs, heat maps, and geospatial maps, as well as several more advanced visualizations and, if time allows, visualizations requested by the workshop's participants. The workshop will also cover the final crucial step: how best to output your graphics for various presentation formats. 
The workshop will be conducted using the R statistical language, along with the RStudio development environment (both are free!). Prior knowledge of R is not required, but would be helpful.