Our tutorial focuses on blind spots in social data. The figure shows a small example of real blind spots: staring at the central cross causes the colored dots to disappear because of the natural blind spots in human vision.
Image Source: http://www.psy.ritsumei.ac.jp/~akitaoka/kieru-e.html.
To build machine learning models that automatically make accurate predictions about the world around them, we need to train these models on representative datasets. When this condition is not met, major errors can occur. These errors range from camera software that over-predicts Asians as "blinking", to models that over-predict African-Americans as likely criminals. Non-representative datasets can also lead us to errors when interpreting social media data. In this tutorial we will present different ways in which a dataset can be biased, and the issues that can arise from using biased datasets in a predictive setting. Finally, we will present solutions to this problem, ranging from adjusting existing datasets to collecting data in ways that minimize bias in the resulting dataset.
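One common way to adjust an existing dataset is to reweight its items so that each group's share matches a known reference distribution (often called post-stratification). The following is a minimal sketch, not the tutorial's own method. It assumes that every item carries a group label, that the reference shares are known, and that every group in the sample appears in the reference. The function name `reweight` is hypothetical.

```python
from collections import Counter


def reweight(sample_groups, population_shares):
    """Return one weight per item so that weighted group shares match population_shares.

    Hypothetical helper. Assumes every label in sample_groups is a key of
    population_shares and that the shares sum to 1.
    """
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = []
    for g in sample_groups:
        sample_share = counts[g] / n
        # Over-represented groups get weights < 1; under-represented groups get weights > 1.
        weights.append(population_shares[g] / sample_share)
    return weights


if __name__ == "__main__":
    # Toy sample: group "a" makes up 75% of the data, but 50% of the reference population.
    weights = reweight(["a", "a", "a", "b"], {"a": 0.5, "b": 0.5})
    print([round(w, 3) for w in weights])  # prints [0.667, 0.667, 0.667, 2.0]
```

After weighting, the three "a" items and the single "b" item each contribute a total weight of 2, so both groups carry equal influence, matching the 50/50 reference. Reweighting can only rebalance groups that were sampled at all; groups missing entirely from the data need changes to data collection.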
This tutorial is designed for researchers and practitioners in artificial intelligence. Attendees will leave with an understanding of various sources of bias, the effects of using biased data to train machine learning models, and approaches for mitigating and correcting bias in their own data.
- Crawford, Kate. "Artificial Intelligence's White Guy Problem." New York Times, June 25, 2016. http://nyti.ms/28VgTst
- Angwin, Julia, Jeff Larson, Surya Mattu, and Lauren Kirchner. "Machine Bias: There's Software Used Across the Country to Predict Future Criminals. And It's Biased Against Blacks." ProPublica, May 23, 2016. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
- Morstatter, Fred, Jürgen Pfeffer, Huan Liu, and Kathleen M. Carley. "Is the Sample Good Enough? Comparing Data from Twitter's Streaming API with Twitter's Firehose." In Seventh International AAAI Conference on Weblogs and Social Media, 2013. http://www.public.asu.edu/~fmorstat/paperpdfs/icwsm2013.pdf
Saturday, February 4th from 4:15 - 6:00 PM.
Introduction - Blind Spots and Bias in Social Data
- Blind spot identification exercise
- Machine learning examples
Bias in Social Media Data
- Overview of bias in social media data
- Social data bias
- Correction techniques
- Mitigation techniques
Social Bias in Other Areas of Machine Learning
- When to stop collecting data?
- When are learned patterns valid?
- The cost of false positives
Conclusion and Q&A
Handouts and Reference Materials
Fred Morstatter is a PhD student in computer science at Arizona State University in Tempe, Arizona. Fred won the Dean's Fellowship for outstanding leadership and scholarship during his time at ASU. He is a 2016 Faculty Emeriti Fellow and has won the 2016 University Graduate Fellowship. Fred's research focuses on finding and removing biases that impinge on social media research. Among his publications are an ICWSM paper that investigates the representativeness of Twitter's Streaming API, a WWW Web Science paper that seeks to find periods of bias automatically in streaming Twitter data, two KDD demo papers, an article in IEEE Intelligent Systems, and a book: Twitter Data Analytics. He won the World Wide Web conference's Best Poster Award in 2016. He has served as a PC member of ICWSM 2014, 2016, and 2017 and of the IEEE/CIC ICCC 2014 Symposium on Social Networks and Big Data, and has been a co-chair of the Social Computing, Behavioral-Cultural Modeling and Prediction Conference's Grand Challenge organizing committee in 2014, 2015, and 2016. He has been a Visiting Scholar at Carnegie Mellon University and a Research Intern at Microsoft Research. He is the Principal Architect of TweetXplorer, an advanced visual analytics system for Twitter data. More information can be found at http://www.public.asu.edu/~fmorstat. Contact him at email@example.com.
Huan Liu is a professor of Computer Science and Engineering at Arizona State University. He obtained his Ph.D. in Computer Science from the University of Southern California and his B.Eng. in Computer Science and Electrical Engineering from Shanghai JiaoTong University. Before he joined ASU, he worked at Telecom Australia Research Labs and was on the faculty of the National University of Singapore. At Arizona State University, he was recognized for excellence in teaching and research in Computer Science and Engineering and received the 2014 President's Award for Innovation. His research interests are in data mining, machine learning, social computing, and artificial intelligence, investigating interdisciplinary problems that arise in many real-world, data-intensive applications with high-dimensional data of disparate forms, such as social media. His well-cited publications include books, book chapters, and encyclopedia entries, as well as conference and journal papers. He is a co-author of Social Media Mining: An Introduction (Cambridge University Press). He serves on journal editorial boards and numerous conference program committees, and is a founding organizer of the International Conference Series on Social Computing, Behavioral-Cultural Modeling, and Prediction. He is an IEEE Fellow. More can be found at http://www.public.asu.edu/~huanliu.
Page last updated February 3, 2017.