Lecture 11
Duke University
STA 199 - Summer 2023
2023-10-03
– Congrats! Exam-1 is done!
– Clone ae-010
– hw-4 - released Thursday
– lab-4 will be in groups
Without teamwork, real-world data science problems would be impossible to solve
– We are not experts in everything
– Learn from different perspectives
– Efficiency
– etc
Practice working in groups
– Technical skills
– Communication
Fill out the survey on Sakai (due Wednesday at 11:59 PM)
– Groups of 4-5
– Preferences will be taken into account (teammates must be in the same lab section)
– Randomly assigned
Think
Data Ethics
Data privacy
Bias
– Response Bias
– Non-Response Bias
– Sampling Bias
– Response bias: a systematic pattern of inaccurate or false responses driven by external factors (e.g., social pressure, avoiding scrutiny)
In North Carolina, you need to be 18 to legally get a tattoo. Do you think the legal age of drinking should be lowered to 18?
Do you think the legal age of drinking should be lowered to 18?
Do you think it’s appropriate to drink alcohol every single day?
A respondent may want to answer honestly but still give the more socially acceptable answer
What is your political party affiliation?
Do you approve of the president?
Answering the first question can influence how you answer the next one, as you try to avoid social pressure or judgment
– Suppose you wanted to collect data on students’ GPAs. What is an example of a question (or a way of asking it) that may elicit response bias?
– Think about leading questions
– Think about how multiple questions relate to each other
– Think about what your question is asking, and if you are asking multiple questions
– Think about sensitive questions
– Think about the information you are collecting
– Think about “setting the stage”
Did you cheat on the STA199 Exam?
Students who cheated may be less likely to respond
Assessing the link between smoking and heart disease
Studies generally show that respondents report better health outcomes and more positive health-related behaviors than nonrespondents. People with poorer health tend to avoid participating in health surveys
Suppose you were interested in alcohol consumption by residents in Durham. What is an example of a question (or how it is asked) that may elicit non-response bias?
– Think critically about your target audience
– Think about potential sensitive questions
– Language barrier
– Think about how participants are being contacted
What is your name?
Who is your least favorite instructor at Duke?
Response bias if there is a pattern of inaccurate responses
Non-response bias if there is a pattern of missingness
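The difference between the two biases can be sketched with a small simulation of the exam-cheating question. Everything numeric here is a made-up assumption for illustration (a 20% true cheating rate, the response probabilities, and the 40% chance that a cheater admits it), and `simulate_survey` is a hypothetical helper, not course code.

```python
import random


def simulate_survey(n=10_000, seed=199):
    """Compare a true rate with estimates distorted by two kinds of bias."""
    rng = random.Random(seed)

    # Hypothetical population: each student cheated with probability 0.20.
    cheated = [rng.random() < 0.20 for _ in range(n)]

    # Non-response bias: everyone who answers is honest, but students who
    # cheated respond less often (assumed 0.3 vs 0.8), so answers are
    # missing in a pattern.
    responded = [c for c in cheated if rng.random() < (0.3 if c else 0.8)]

    # Response bias: everyone answers, but a student who cheated admits it
    # only 40% of the time (assumed), so answers are inaccurate in a pattern.
    reported = [c and rng.random() < 0.4 for c in cheated]

    return {
        "true_rate": sum(cheated) / n,
        "nonresponse_estimate": sum(responded) / len(responded),
        "response_bias_estimate": sum(reported) / n,
    }


if __name__ == "__main__":
    for name, value in simulate_survey().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions both estimates fall well below the true rate, but for different reasons: one because of who is missing, the other because of what people say.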
– Sampling bias: bias in which a sample is collected in such a way that some members of the intended population have a lower or higher sampling probability than others
– I’m interested in the heights of all Duke students! I go out and collect data on:
The basketball team
This 199 class
Students walking by the chapel
Put a survey out on Facebook
– Random Sample
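As a rough illustration of the height example, the sketch below compares a sample made up only of the basketball team with a simple random sample. The population is entirely invented (6,000 students with heights drawn from a normal distribution centered at 170 cm, plus 15 players centered at 200 cm), and `simulate_height_samples` is a hypothetical name.

```python
import random
import statistics


def simulate_height_samples(seed=199, sample_size=200):
    """Compare a convenience sample with a simple random sample."""
    rng = random.Random(seed)

    # Invented population, in centimeters (assumed values, not real data).
    students = [rng.gauss(170, 10) for _ in range(6000)]
    team = [rng.gauss(200, 6) for _ in range(15)]
    population = students + team

    # Simple random sample: every member has the same chance of selection.
    random_sample = rng.sample(population, sample_size)

    return {
        "population_mean": statistics.mean(population),
        # Sampling only the team gives everyone else a sampling probability of 0.
        "team_only_mean": statistics.mean(team),
        "random_sample_mean": statistics.mean(random_sample),
    }


if __name__ == "__main__":
    for name, value in simulate_height_samples().items():
        print(f"{name}: {value:.1f} cm")
```

In this sketch the team-only mean lands far above the population mean, while the random sample's mean stays close to it.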
Every time we use apps, websites, and devices, our data is being collected and used or sold to others. More importantly, law enforcement, financial institutions, and governments use data to make decisions that directly affect people's lives.
– Name
– Age
– Phone Number
– How long you spend on different content
– List of all your private messages (date, time, person sent to)
– Info about your photos (how, where (GPS), and when each was taken)
– Browsing history
– In 2016, researchers published data on 70,000 OkCupid users, including usernames, political leanings, drug usage, and intimate sexual details
– The researchers didn’t release the real names or pictures of OkCupid users, but identities could easily be uncovered from the details provided, e.g. usernames
“Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”
– Researchers Emil Kirkegaard and Julius Daugbjerg Bjerrekær
Should you scrape these data?
How do you not violate reasonable expectations of privacy?
Bias is a disproportionate weight in favor of or against an idea or thing
We all have bias
Bias can be a part of science and research
Ask questions
Slow down
Think critically