library(tidyverse)
Group 6 Project
Proposal
Data 1
Introduction and data
Identify the source of the data.
We are using a dataset about food access from the CORGIS Dataset Project. The data is originally from the United States Department of Agriculture’s Economic Research Service.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data comes from multiple sources. The population data, such as group quarter residences and population sizes of counties, were taken from the 2010 Census of the Population. Information on income levels and access to vehicles came from American Community Survey responses from 2014-2018. The grocery store data is combined from two existing lists of grocery stores. The data was then divided into counties.
Write a brief description of the observations.
Each observation is a county and its corresponding information on population, state, and how many people do or do not have stable access to food based on their distance from the nearest grocery store. Each observation also has data on how many children, seniors, people with low income, and people without a car live far from or close to a grocery store. The observations come from counties all over the US.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How do demographic factors like age and income level relate to the levels of food accessibility that different counties have?
A description of the research topic along with a concise statement of your hypotheses on this topic.
Individuals with lower income levels and higher ages are more likely to be in the “low access” pool; counties with more lower-income and older individuals are likely to have a large portion of their population living more than 1 mile from the nearest grocery store.
When people live in food deserts, they don’t have convenient or stable access to healthy and fresh foods. For example, the nearest grocery stores may be too far away to make consistent trips. The research topic we’re exploring is which populations are more likely to be affected by food deserts.
Identify the types of variables in your research question. Categorical? Quantitative?
The variables used would be the county (categorical), population size (quantitative), and the number of overall people, low-income individuals, children, and seniors who are more than 1/2, 1, 10, and 20 miles away from the nearest supermarket (quantitative variables). It would also be interesting to include the number of people who have vehicle access to supermarkets (quantitative) to add some nuance to our findings.
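Because the low-access variables are raw counts, larger counties will tend to have larger values regardless of how accessible food actually is. A minimal sketch of one way to put counties on a common scale is to divide each count by the county population. The column names below are taken from the glimpse of the data; the three rows are the first values printed there, and the new `share_*` names are our own hypothetical labels. It assumes each low-access count is a subset of `Population`.

```r
library(tidyverse)

# First three rows as printed by glimpse(food_access); in the real analysis
# this would be the full food_access data frame.
food_access_sample <- tibble(
  Population = c(54571, 182265, 27457),
  `Low Access Numbers.People.1 Mile` = c(37424, 132442, 19007),
  `Low Access Numbers.Low Income People.1 Mile` = c(12067, 38848, 9290),
  `Low Access Numbers.Seniors.1 Mile` = c(4393, 21828, 2537)
)

# Convert counts to shares of the county population (assumption: each count
# is a subset of Population, so each share falls between 0 and 1).
access_shares <- food_access_sample |>
  mutate(
    share_low_access_1mi = `Low Access Numbers.People.1 Mile` / Population,
    share_low_income_low_access_1mi =
      `Low Access Numbers.Low Income People.1 Mile` / Population,
    share_seniors_low_access_1mi =
      `Low Access Numbers.Seniors.1 Mile` / Population
  ) |>
  select(Population, starts_with("share_"))

access_shares
```

Shares like these could then be compared across counties or plotted against each other without county size dominating the pattern.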
Literature
Find one published credible article on the topic you are interested in researching.
https://www.aecf.org/blog/exploring-americas-food-deserts
Provide a one paragraph summary about the article.
“Food desert” is a term for a geographic area where residents have little to no access to groceries, particularly produce like fruits and vegetables. Generally, food deserts are found in areas with small populations, rural locations, and residents with low income and education levels. Towns with a high Black population also tend to have less access to grocery stores. Several factors officially identify an area as a food desert: distance from the average home in the community to a grocery store, household resources (average income level, employment), and availability of community resources (public transportation, average income) are all defining characteristics. Approximately 6.2% of Americans (39.5 million people) lack access to nutritious foods. Food deserts exist for reasons such as limited transportation access, the prevalence of snack and convenience foods, income inequality, and business profits. Although the issue may seem difficult to tackle, there are potential solutions to this crisis. First, the government can incentivize grocery stores to open in underprivileged and rural areas. Similarly, growing the local agriculture industry can allow greater access to fresh produce. Cities can also implement programs to encourage healthier diets.
In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.
This article directly supports the argument that factors such as age and income level relate to levels of food accessibility. It claims that food deserts are most commonly found in rural areas with low household incomes, while also mentioning that the racial demographics of an area have a strong correlation with food access.
Glimpse of data
food_access <- read_csv("data/food_access.csv")
Rows: 3142 Columns: 25
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): County, State
dbl (23): Population, Housing Data.Residing in Group Quarters, Housing Data....
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(food_access)
Rows: 3,142
Columns: 25
$ County <chr> "Autauga County", "Bal…
$ Population <dbl> 54571, 182265, 27457, …
$ State <chr> "Alabama", "Alabama", …
$ `Housing Data.Residing in Group Quarters` <dbl> 455, 2307, 3193, 2224,…
$ `Housing Data.Total Housing Units` <dbl> 20221, 73180, 9820, 79…
$ `Vehicle Access.1 Mile` <dbl> 834, 1653, 545, 312, 7…
$ `Vehicle Access.1/2 Mile` <dbl> 1045, 2178, 742, 441, …
$ `Vehicle Access.10 Miles` <dbl> 222, 32, 201, 0, 0, 80…
$ `Vehicle Access.20 Miles` <dbl> 0, 0, 0, 0, 0, 0, 0, 0…
$ `Low Access Numbers.Children.1 Mile` <dbl> 9973, 30633, 3701, 419…
$ `Low Access Numbers.Children.1/2 Mile` <dbl> 13281, 38278, 4943, 48…
$ `Low Access Numbers.Children.10 Miles` <dbl> 1199, 516, 791, 90, 0,…
$ `Low Access Numbers.Children.20 Miles` <dbl> 0, 0, 0, 0, 0, 0, 0, 0…
$ `Low Access Numbers.Low Income People.1 Mile` <dbl> 12067, 38848, 9290, 64…
$ `Low Access Numbers.Low Income People.1/2 Mile` <dbl> 15518, 48117, 11901, 8…
$ `Low Access Numbers.Low Income People.10 Miles` <dbl> 2307, 846, 2440, 102, …
$ `Low Access Numbers.Low Income People.20 Miles` <dbl> 0, 0, 0, 0, 0, 0, 0, 0…
$ `Low Access Numbers.People.1 Mile` <dbl> 37424, 132442, 19007, …
$ `Low Access Numbers.People.1/2 Mile` <dbl> 49497, 165616, 23762, …
$ `Low Access Numbers.People.10 Miles` <dbl> 5119, 2308, 4643, 365,…
$ `Low Access Numbers.People.20 Miles` <dbl> 0, 0, 0, 0, 0, 0, 0, 0…
$ `Low Access Numbers.Seniors.1 Mile` <dbl> 4393, 21828, 2537, 226…
$ `Low Access Numbers.Seniors.1/2 Mile` <dbl> 5935, 27241, 3348, 263…
$ `Low Access Numbers.Seniors.10 Miles` <dbl> 707, 390, 629, 72, 0, …
$ `Low Access Numbers.Seniors.20 Miles` <dbl> 0, 0, 0, 0, 0, 0, 0, 0…
Data 2
Introduction and data
Identify the source of the data.
EDGAR - Emissions Database for Global Atmospheric Research
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The data covers 1970 through 2012. It was compiled by referencing various other sources to examine global greenhouse gas emissions. The GDP figures, for example, are most likely drawn from the World Bank, and the greenhouse gas emissions may derive from the IEA's Greenhouse Gas Emissions from Energy data. Sector-specific emissions data comes from sources on those sectors, such as International Air Transport Association (IATA) statistics (2022). Refer to https://edgar.jrc.ec.europa.eu/dataset_ghg70#p1 for the complete list of sources.
Write a brief description of the observations.
Each observation represents a country in a specific year from 1970 to 2012, with its corresponding emissions and related ratios. There are columns breaking down emissions by type: CO2, N2O, and CH4. There are also columns breaking down emissions by sector (power industry, buildings, transport, other industry, and other sectors), allowing for helpful visualizations of the data. Finally, the dataset includes per-GDP and per-capita ratios (the Ratio.Per.GDP and Ratio.Per.Capita columns), which may allow for better comparisons between countries because they scale by economic output and by population.
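Since the emissions types sit in separate columns, plotting them together over time is easier once they are stacked into one column. The sketch below shows one possible reshaping step with `pivot_longer()`. The column names and the three rows of values are copied from the glimpse of the data; the new column names `gas` and `emissions` are our own hypothetical choices.

```r
library(tidyverse)

# First three rows as printed by glimpse(emissions); in the real analysis
# this would be the full emissions data frame.
emissions_sample <- tibble(
  Country = "Afghanistan",
  Year = 1970:1972,
  Emissions.Type.CO2 = c(2670, 2630, 2180),
  Emissions.Type.N2O = c(1820, 1850, 1810),
  Emissions.Type.CH4 = c(12800, 12900, 11900)
)

# Stack the three emissions-type columns into one row per
# country, year, and gas.
emissions_long <- emissions_sample |>
  pivot_longer(
    cols = starts_with("Emissions.Type."),
    names_to = "gas",
    names_prefix = "Emissions.Type.",
    values_to = "emissions"
  )

emissions_long
```

In this long format, a single `ggplot()` call could map `gas` to color and draw one line per emissions type.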
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
How have emissions in countries changed over time with respect to Gross Domestic Product (GDP)?
A description of the research topic along with a concise statement of your hypotheses on this topic.
Hypothesis: There is a positive correlation between GDP and greenhouse gas emissions, suggesting that higher GDP is associated with higher emissions.
The research topic aims to explore the relationship between greenhouse gas emissions, Gross Domestic Product (GDP), and time in various countries. It seeks to understand how emissions have evolved over a specified time period (1970-2012) and how this evolution relates to economic growth (GDP) while also considering the specific types of emissions, such as carbon dioxide (CO2), methane (CH4), and nitrous oxide (N2O). The study will offer insights into the environmental impact of economic development and the relative contributions of different greenhouse gases.
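One way to start looking at change over time is to express each year's value relative to a country's first recorded year. The sketch below does this for the `Ratio.Per.GDP` column, using the four Afghanistan values shown in the glimpse of the data. It assumes, based only on the column name, that `Ratio.Per.GDP` measures emissions relative to GDP; `pct_change_from_start` is a hypothetical name of our own.

```r
library(tidyverse)

# First four rows as printed by glimpse(emissions); in the real analysis
# this would be the full emissions data frame.
ratio_sample <- tibble(
  Country = "Afghanistan",
  Year = 1970:1973,
  Ratio.Per.GDP = c(1.557705, 1.517670, 1.357590, 1.307901)
)

# Within each country, compare every year's ratio with the earliest year.
ratio_trend <- ratio_sample |>
  group_by(Country) |>
  arrange(Year, .by_group = TRUE) |>
  mutate(
    pct_change_from_start =
      (Ratio.Per.GDP / first(Ratio.Per.GDP) - 1) * 100
  ) |>
  ungroup()

ratio_trend
```

A negative value means the ratio has fallen since the country's first year in the data, and a positive value means it has risen.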
Identify the types of variables in your research question. Categorical? Quantitative?
In our research question, the dependent variables are the emissions (each specific type), which are all quantitative. Time, in years, is an independent variable that is categorical in nature because it does not hold numerical significance to the data, only categorical significance. Finally, GDP is an independent, quantitative variable that helps to explain emissions (an explanatory variable).
Literature
Find one published credible article on the topic you are interested in researching.
https://unctad.org/news/carbon-emissions-anywhere-threaten-development-everywhere
Provide a one paragraph summary about the article.
During the past six decades, economic development and a rapidly increasing global population have caused significant environmental costs: CO2 emissions have nearly tripled per capita. GDP has increased in correlation. The top three emitters today (China, the US, and India) are also countries in which industrial production is high. Although the list of top CO2 emitters is diverse in GDP, ranging from the most developed to developing nations, recently developed countries with high GDPs tend to have higher CO2 emissions. For example, China’s remarkable economic growth, poverty reduction, and dominance of the global production chain have come with a skyrocketing rate of CO2 emissions.
In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.
This article supports the argument that CO2 emissions have increased over time in correlation to GDP. It claims that CO2 emissions are higher in recently developed countries, such as China—leading to the conclusion that there is a correlation between economic development and environmental impact.
Glimpse of data
emissions <- read.csv("data/emissions.csv")
glimpse(emissions)
Rows: 8,385
Columns: 12
$ Country <chr> "Afghanistan", "Afghanistan", "Afghani…
$ Year <int> 1970, 1971, 1972, 1973, 1974, 1975, 19…
$ Emissions.Type.CO2 <dbl> 2670, 2630, 2180, 2310, 2520, 2720, 28…
$ Emissions.Type.N2O <dbl> 1820, 1850, 1810, 1830, 2190, 1930, 18…
$ Emissions.Type.CH4 <dbl> 12800, 12900, 11900, 11600, 12800, 128…
$ Emissions.Sector.Power.Industry <dbl> 0.06, 0.06, 0.12, 0.17, 0.21, 0.21, 0.…
$ Emissions.Sector.Buildings <dbl> 0.58, 0.58, 0.46, 0.57, 0.77, 0.59, 0.…
$ Emissions.Sector.Transport <dbl> 0.23, 0.23, 0.27, 0.24, 0.24, 0.29, 0.…
$ Emissions.Sector.Other.Industry <dbl> 0.07, 0.07, 0.05, 0.02, 0.03, 0.02, 0.…
$ Emissions.Sector.Other.sectors <dbl> 0.53, 0.53, 0.61, 0.47, 0.65, 0.58, 0.…
$ Ratio.Per.GDP <dbl> 1.557705, 1.517670, 1.357590, 1.307901…
$ Ratio.Per.Capita <dbl> 0.000000, 0.000000, 0.000000, 0.000000…
Data 3
Introduction and data
Identify the source of the data.
This data is from Our World in Data, specifically a section focused on child mortality in relation to global health.
State when and how it was originally collected (by the original data curator, not necessarily how you found the data).
The original data was published by the Institute for Health Metrics and Evaluation in 2021 as part of a global burden of disease study covering data up to 2019.
Write a brief description of the observations.
The data shows the top five most lethal infectious diseases from 1990-2019. It specifically focuses on the number of deaths among children under the age of 5. Each observation is a country in a given year, with the corresponding number of child deaths (under the age of 5) for each infectious disease: HIV/AIDS, measles, diarrheal diseases, malaria, and lower respiratory infections. The year in which these deaths were recorded ranges from 1990 to 2019.
Research question
A well formulated research question. (You may include more than one research question if you want to receive feedback on different ideas for your project. However, one per data set is required.)
Which countries have made significant progress in reducing childhood deaths from the categorized infectious diseases?
A description of the research topic along with a concise statement of your hypotheses on this topic.
This topic relates to the top 5 infectious diseases contributing to global child mortality. The data will be used to better understand which areas have been most affected and potentially to analyze the factors that have worked to decrease deaths. The analysis could look deeper into individual countries and the roles of climate, environment, health policy, and the economy. We hypothesize that some countries have made significant progress in reducing childhood deaths, since the number of global deaths has decreased, which could be due to improved healthcare, socioeconomic factors, and more.
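One possible way to quantify "progress" is the percent change in deaths between the first and last years in the data. The sketch below illustrates this for the malaria column, whose name is copied from the glimpse of the data. The entity names and death counts are hypothetical toy values for illustration only, not taken from the dataset, and `malaria_deaths` and `pct_change` are our own names.

```r
library(tidyverse)

# Hypothetical toy values for illustration only; not taken from the dataset.
deaths_sample <- tibble(
  Entity = c("Country A", "Country A", "Country B", "Country B"),
  Year = c(1990, 2019, 1990, 2019),
  `Deaths - Malaria - Sex: Both - Age: Under 5 (Number)` = c(1000, 400, 500, 450)
)

# Put the 1990 and 2019 counts side by side, then compute the percent change.
malaria_progress <- deaths_sample |>
  rename(
    malaria_deaths = `Deaths - Malaria - Sex: Both - Age: Under 5 (Number)`
  ) |>
  filter(Year %in% c(1990, 2019)) |>
  pivot_wider(
    names_from = Year,
    values_from = malaria_deaths,
    names_prefix = "deaths_"
  ) |>
  mutate(pct_change = (deaths_2019 - deaths_1990) / deaths_1990 * 100) |>
  arrange(pct_change)

malaria_progress
```

With these toy values, the entity with the largest relative decline is sorted to the top. A percent change is used rather than a raw difference so that countries with very different starting counts can be compared.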
Identify the types of variables in your research question. Categorical? Quantitative?
Quantitative: Number of deaths from the categorized infectious diseases
Categorical: Names of countries
Literature
Find one published credible article on the topic you are interested in researching.
https://www.who.int/southeastasia/news/detail/19-09-2023-experts--officials-meet-to-accelerate-reduction-of-maternal--newborn-and-child-deaths-in-who-south-east-asia-region
Provide a one paragraph summary about the article.
Health experts in the Southeast Asian region have initiated strategies and interventions to reduce deaths related to childbirth by addressing inadequate healthcare. While the region has made significant progress in reducing infant mortality rates, there are still disparities in health care within its countries. Currently, World Health Organization officials are discussing ways to achieve universal coverage for high-risk patients in the region in order to improve child health. Several barriers to this goal exist: low coverage of essential practices (breastfeeding, newborn care, and nutrition) and the impact of environmental factors (birth defects, disabilities, climate change, disease). Despite these challenges, the Southeast Asian region has seen a significant decline in maternal, newborn, and child deaths in the past ten years.
In 1-2 sentences, explain how your research question builds on / is different than the article you have cited.
This article does not fully answer our research question of which countries have made significant progress in reducing childhood deaths. Rather, it gives an example of one specific region that has worked to improve its child mortality rates.
Glimpse of data
lethal_infectious_diseases <- read_csv("data/lethal_infectious_diseases.csv")
Rows: 6840 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Entity, Code
dbl (6): Year, Deaths - HIV/AIDS - Sex: Both - Age: Under 5 (Number), Deaths...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
glimpse(lethal_infectious_diseases)
Rows: 6,840
Columns: 8
$ Entity <chr> …
$ Code <chr> …
$ Year <dbl> …
$ `Deaths - HIV/AIDS - Sex: Both - Age: Under 5 (Number)` <dbl> …
$ `Deaths - Measles - Sex: Both - Age: Under 5 (Number)` <dbl> …
$ `Deaths - Diarrheal diseases - Sex: Both - Age: Under 5 (Number)` <dbl> …
$ `Deaths - Malaria - Sex: Both - Age: Under 5 (Number)` <dbl> …
$ `Deaths - Lower respiratory infections - Sex: Both - Age: Under 5 (Number)` <dbl> …