My first newspaper article combining journalism and data science methods was written for the newspaper La Nación (Argentina). I would like to share my experience of the process.
Methodology
Topic: Journalistic note applying data science to life stories
Data science techniques: Data collection, statistics, and visualization with RStudio
Journalism techniques: Research and interviews
Complementary material delivered to the newspaper: Code and database
Additional publishing alternatives: Cross-platform
Goal: Provide scientific analysis of human events in a storytelling format. A story lets readers connect emotionally with its subject, while the data behind it are analyzed more rigorously.
In vitro fertilization (IVF) is a laboratory technique where oocytes are fertilized with sperm. This article focuses on the story of Alejandra Ginel, a 44-year-old woman who shares her experience of becoming a mother through IVF.
Data
The data on pregnancies and births used in the article were collected from the Latin American Registry of Assisted Reproduction and the Argentine Registry of Assisted Fertilization (RAFA).
One of the challenges faced during this stage was the limited volume and inconsistency of data collection. As a result, a significant amount of time was dedicated to working with the obtained data and generating the required estimators.
summary(trat_nac) # Fertilization treatments performed
      Años       Tratamientos    Nacimientos
 Min.   :1990   Min.   :  741   Min.   :  73
 1st Qu.:1998   1st Qu.: 2716   1st Qu.: 453
 Median :2005   Median : 6614   Median :1491
 Mean   :2005   Mean   : 8175   Mean   :1529
 3rd Qu.:2012   3rd Qu.:11678   3rd Qu.:2472
 Max.   :2020   Max.   :21409   Max.   :4024
summary(e_eyn2020) # Births and Pregnancies data
    Edades            Nacimientos      Embarazos          Prob
 Length:4           Min.   : 79.0   Min.   :102.0   Min.   :0.6242
 Class :character   1st Qu.:213.2   1st Qu.:295.5   1st Qu.:0.6685
 Mode  :character   Median :322.5   Median :490.0   Median :0.7000
                    Mean   :335.2   Mean   :496.2   Mean   :0.6997
                    3rd Qu.:444.5   3rd Qu.:690.8   3rd Qu.:0.7311
                    Max.   :617.0   Max.   :903.0   Max.   :0.7745
Data visualization
As part of the exploratory analysis, a scatterplot (@fig-scatterplot) was created to show the number of fertilization treatments performed (Y-axis) over time (X-axis).
#| label: fig-scatterplot
#| fig-cap: Evolution of IVF treatments in Argentina
#| fig-subcap:
#|   - "Color by number of treatments"
#|   - "Confidence band"
#| warning: false
#| layout-ncol: 2
#| column: page-right
library(ggplot2)
# Base plot: year on the X-axis, number of treatments on the Y-axis
t <- ggplot(trat_nac, aes(Años, Tratamientos))
t + geom_point()
# Color points by number of treatments and add a smoothed trend with its confidence band
t + geom_point(aes(color = Tratamientos), size = 3) +
  geom_smooth() +
  ggtitle("Evolution of IVF treatments in Argentina") +
  theme_minimal()
`geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Analysis: The number of treatments grew at a fairly steady annual rate over time. However, there was a marked jump from 2013 to 2014, when the annual growth rate more than doubled, to 29% from 12% the previous year. This leap can be attributed to the implementation of the In Vitro Fertilization Law in Argentina (Law 27862), which regulated the coverage provided by health insurance companies in the country. Another notable event occurred in 2020, when the number of treatments dropped sharply due to the impact of the COVID-19 pandemic.
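The growth rates quoted in this analysis are year-over-year percentage changes. A minimal sketch of that calculation in R is shown here. The yearly totals are hypothetical placeholders, not the registry figures, chosen only so that the result lands on 12% and 29%; the names `tratamientos_demo` and `crecimiento` are illustrative and do not come from the original analysis.

```r
# Hypothetical yearly totals, NOT the registry data: picked so the growth
# rates match the 12% and 29% mentioned in the analysis.
tratamientos_demo <- c("2012" = 10000, "2013" = 11200, "2014" = 14448)

# Year-over-year growth in percent: (current - previous) / previous * 100.
# diff() gives current - previous; head(x, -1) drops the last value so each
# difference is divided by the previous year's total.
crecimiento <- diff(tratamientos_demo) / head(tratamientos_demo, -1) * 100

print(round(crecimiento, 1))
```

With the article's data, the same expression could presumably be applied to `trat_nac$Tratamientos`, assuming the rows are sorted by year.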
Probability of births over pregnancies
Next, a boxplot (@fig-boxplot) was generated to depict the probability of births over pregnancies using data from the year 2020. The plot is faceted by age group, with pregnancies on the X-axis and the probability of birth on the Y-axis.
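# Note: qplot() is deprecated as of ggplot2 3.4.0; it is kept here as in the original analysis
e <- qplot(Embarazos, Prob, data = e_eyn2020, facets = ~Edades, color = Edades)
e + geom_point(size = 3)
b <- e + geom_boxplot(size = 1.7) +
  ggtitle("Probability of births over pregnancies")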
Analysis: The probability of births over pregnancies decreases with age. After the age of 40, the probability of a successful birth through IVF drops to around 62%, compared to 77% for women under 30. Additionally, about 46% of pregnancies in 2020 were among women aged 35 to 39.
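The `Prob` column behind these figures appears to be the ratio of births to pregnancies in each age group. The sketch below illustrates that estimator and the share of pregnancies per group. The counts are back-calculated from the `summary(e_eyn2020)` output shown earlier rather than read from the registry file, and the age labels are assumptions inferred from the analysis text; `eyn_demo` and `Share` are illustrative names.

```r
# Counts back-calculated from the summary() output of e_eyn2020.
# ASSUMPTION: the age labels are inferred from the analysis text and may not
# match the labels used in the original data set.
eyn_demo <- data.frame(
  Edades      = c("<30", "30-34", "35-39", "40+"),
  Nacimientos = c(79, 258, 617, 387),
  Embarazos   = c(102, 360, 903, 620)
)

# Probability of a birth given a pregnancy, per age group.
eyn_demo$Prob <- round(eyn_demo$Nacimientos / eyn_demo$Embarazos, 4)

# Share of all pregnancies contributed by each age group.
eyn_demo$Share <- round(eyn_demo$Embarazos / sum(eyn_demo$Embarazos), 4)

print(eyn_demo)
```

Under these assumptions, the youngest and oldest groups come out at roughly 77% and 62%, and the 35 to 39 group accounts for about 45.5% of pregnancies, in line with the roughly 46% quoted in the analysis.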