| Type: | Package | 
| Title: | A Curated Collection of Pulmonary and Respiratory Disease Datasets | 
| Version: | 0.2.0 | 
| Maintainer: | Renzo Caceres Rossi <arenzocaceresrossi@gmail.com> | 
| Description: | Provides a comprehensive and curated collection of datasets related to the lungs, respiratory system, and associated diseases. This package includes epidemiological, clinical, experimental, and simulated datasets on conditions such as lung cancer, asthma, Chronic Obstructive Pulmonary Disease (COPD), tuberculosis, whooping cough, pneumonia, influenza, and other respiratory illnesses. It is designed to support data exploration, statistical modeling, teaching, and research in pulmonary medicine, public health, environmental epidemiology, and respiratory disease surveillance. | 
| License: | GPL-3 | 
| Language: | en | 
| URL: | https://github.com/lightbluetitan/pulmodatasets, https://lightbluetitan.github.io/pulmodatasets/ | 
| BugReports: | https://github.com/lightbluetitan/pulmodatasets/issues | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Depends: | R (≥ 4.1.0) | 
| Imports: | utils | 
| Suggests: | ggplot2, dplyr, testthat (≥ 3.0.0), knitr, rmarkdown | 
| RoxygenNote: | 7.3.2 | 
| Config/testthat/edition: | 3 | 
| VignetteBuilder: | knitr | 
| NeedsCompilation: | no | 
| Packaged: | 2025-09-06 05:55:51 UTC; Renzo | 
| Author: | Renzo Caceres Rossi | 
| Repository: | CRAN | 
| Date/Publication: | 2025-09-07 17:30:24 UTC | 
PulmoDataSets: A Curated Collection of Pulmonary and Respiratory Disease Datasets
Description
This package provides a wide variety of datasets focused on the lungs, respiratory system, tuberculosis, whooping cough, pneumonia, influenza and associated diseases.
Details
PulmoDataSets: A Curated Collection of Pulmonary and Respiratory Disease Datasets
 
Author(s)
Maintainer: Renzo Caceres Rossi arenzocaceresrossi@gmail.com
See Also
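Useful links:
- https://github.com/lightbluetitan/pulmodatasets
- https://lightbluetitan.github.io/pulmodatasets/
- Report bugs at https://github.com/lightbluetitan/pulmodatasets/issues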
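As a quick orientation, the datasets registered by an installed package can be listed with utils::data(). The sketch below is illustrative only: the helper name list_datasets is hypothetical, and the PulmoDataSets call is attempted only when the package is installed.

```r
# Hypothetical helper: return the dataset names that a package registers.
list_datasets <- function(pkg) {
  items <- utils::data(package = pkg)$results[, "Item"]
  unname(items)
}

# Works for any installed package; "datasets" ships with base R.
print(head(list_datasets("datasets")))

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  print(list_datasets("PulmoDataSets"))
}
```

Because the package sets LazyData to true, each listed object should also be reachable as PulmoDataSets::name without calling data() first.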
UK Female Lung Disease Deaths
Description
This dataset, UK_female_lung_deaths_ts, is a time series object containing monthly deaths from bronchitis, emphysema and asthma in the UK from 1974 to 1979, for females.
Usage
data(UK_female_lung_deaths_ts)
Format
A time series (ts) object with 72 monthly observations from 1974 to 1979.
- value
- Number of deaths (numeric vector) 
- time
- Time index (1974 to 1979) 
Details
The dataset name has been kept as 'UK_female_lung_deaths_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series object. The original content has not been modified in any way.
Source
Data taken from the datasets package (R version 4.5.0), fdeaths dataset
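A minimal sketch of working with this series: collapsing the 72 monthly counts into yearly totals with aggregate(). The object name female_deaths is illustrative; when PulmoDataSets is not installed, the code falls back to datasets::fdeaths, which the Source section names as the origin.

```r
# Use the packaged series when available, otherwise the original in base R.
female_deaths <- if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  PulmoDataSets::UK_female_lung_deaths_ts
} else {
  datasets::fdeaths
}

# Sum the 12 monthly counts of each calendar year into one value per year.
yearly_totals <- aggregate(female_deaths, nfrequency = 1, FUN = sum)
print(yearly_totals)
```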
UK Male Lung Disease Deaths
Description
This dataset, UK_male_lung_deaths_ts, is a time series object containing monthly deaths from bronchitis, emphysema and asthma in the UK from 1974 to 1979, for males.
Usage
data(UK_male_lung_deaths_ts)
Format
A time series (ts) object with 72 monthly observations from 1974 to 1979.
- value
- Number of deaths (numeric vector) 
- time
- Time index (1974 to 1979) 
Details
The dataset name has been kept as 'UK_male_lung_deaths_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series object. The original content has not been modified in any way.
Source
Data taken from the datasets package (R version 4.5.0), mdeaths dataset
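Monthly series such as this one are often summarised by calendar month to expose seasonality. The following sketch averages the six observations available for each month; male_deaths and monthly_mean are illustrative names, and datasets::mdeaths is used as a fallback when PulmoDataSets is not installed.

```r
# Use the packaged series when available, otherwise the original in base R.
male_deaths <- if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  PulmoDataSets::UK_male_lung_deaths_ts
} else {
  datasets::mdeaths
}

# cycle() gives each observation's position within the year (1 = January).
month_index <- as.integer(cycle(male_deaths))
monthly_mean <- tapply(as.numeric(male_deaths), month_index, mean)
names(monthly_mean) <- month.abb
print(round(monthly_mean, 1))
```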
US Mortality Rates by Cause and Gender
Description
This dataset, USMortality_df, is a data frame containing mortality rates across all ages in the USA by cause of death, sex, rural and urban status from 2011 to 2013. The data represent national aggregate rates under the Department of Health and Human Services (HHS).
Usage
data(USMortality_df)
Format
A data frame with 40 observations and 5 variables:
- Status
- Rural/Urban status (factor with 2 levels) 
- Sex
- Gender (factor with 2 levels) 
- Cause
- Cause of death (factor with 10 levels) 
- Rate
- Mortality rate (numeric vector) 
- SE
- Standard error of mortality rate (numeric vector) 
Details
The dataset name has been kept as 'USMortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the lattice package version 0.22-6
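Since the 40 rows correspond to every combination of Status, Sex, and Cause, the rates can be arranged as a three-way table. This is a sketch under that assumption; rate_table is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: arrange the Rate column as a Cause x Sex x Status array.
rate_table <- function(d) {
  xtabs(Rate ~ Cause + Sex + Status, data = d)
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  rates <- rate_table(PulmoDataSets::USMortality_df)
  print(ftable(rates, row.vars = "Cause"))
}
```

Note that xtabs() sums Rate within each cell, so the result equals the original rates only when each combination occurs once.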
US Regional Mortality Rates by Cause and Gender
Description
This dataset, USRegionalMortality_df, is a data frame containing region-wise mortality rates across all ages in the USA by cause of death, sex, rural and urban status from 2011 to 2013. The data represent rates for each administrative region under the Department of Health and Human Services (HHS).
Usage
data(USRegionalMortality_df)
Format
A data frame with 400 observations and 6 variables:
- Region
- HHS administrative region (factor with 10 levels) 
- Status
- Rural/Urban status (factor with 2 levels) 
- Sex
- Gender (factor with 2 levels) 
- Cause
- Cause of death (factor with 10 levels) 
- Rate
- Mortality rate (numeric vector) 
- SE
- Standard error of mortality rate (numeric vector) 
Details
The dataset name has been kept as 'USRegionalMortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the lattice package version 0.22-6
AI Assessment of Pulmonary Nodules
Description
This dataset, ai_ipn_performance_dt, is a data table containing performance metrics of an artificial intelligence tool for risk stratification of 200 indeterminate pulmonary nodules (IPNs) on chest CT scans.
Usage
data(ai_ipn_performance_dt)
Format
A data table with 200 observations and 2 variables:
- cancer
- Malignancy status (0 = benign, 1 = malignant) (integer) 
- rating
- AI risk assessment rating (integer) 
Details
The dataset name has been kept as 'ai_ipn_performance_dt' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'dt' indicates that this is a data table object. The original content has not been modified in any way.
Source
Data taken from the R4HCR package version 0.1
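One common way to summarise a rating against a binary outcome is the area under the ROC curve. The sketch below computes it from ranks (the Mann-Whitney formulation) in base R. It assumes that a higher rating means a higher assessed risk of malignancy, which the description above does not state explicitly; auc_from_ratings is a hypothetical helper.

```r
# Hypothetical helper: rank-based AUC, assuming higher rating = higher risk.
auc_from_ratings <- function(cancer, rating) {
  r <- rank(rating)                 # average ranks handle tied ratings
  n_pos <- sum(cancer == 1)         # malignant nodules
  n_neg <- sum(cancer == 0)         # benign nodules
  (sum(r[cancer == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  d <- PulmoDataSets::ai_ipn_performance_dt
  print(auc_from_ratings(d$cancer, d$rating))
}
```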
Air Pollution and Mortality
Description
This dataset, air_polution_mortality_df, is a data frame containing information from an early study exploring the relationship between air pollution and mortality across 60 Standard Metropolitan Statistical Areas in the U.S. between 1959 and 1961.
Usage
data(air_polution_mortality_df)
Format
A data frame with 60 observations and 7 variables:
- City
- Metropolitan area (factor with 60 levels) 
- Mort
- Mortality rate (numeric vector) 
- Precip
- Annual precipitation in inches (integer vector) 
- Educ
- Median years of education (numeric vector) 
- NonWhite
- Percentage of non-white population (numeric vector) 
- NOX
- Nitrogen oxide concentration (integer vector) 
- SO2
- Sulfur dioxide concentration (integer vector) 
Details
The dataset name has been kept as 'air_polution_mortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the Sleuth3 package version 1.0-6
COPD and Asthma Patients
Description
This dataset, asthma_patients_tbl_df, is a tibble containing clinical information about 300 patients with asthma or COPD tracked over 3 years, including demographics, smoking status, diagnosis details, medications, and peak flow measurements.
Usage
data(asthma_patients_tbl_df)
Format
A tibble with 300 observations and 7 variables:
- Patient_ID
- Unique patient identifier (numeric) 
- Age
- Patient age in years (numeric) 
- Gender
- Patient gender (character) 
- Smoking_Status
- Current/Former/Never smoker status (character) 
- Asthma_Diagnosis
- Specific asthma/COPD diagnosis (character) 
- Medication
- Prescribed treatment regimen (character) 
- Peak_Flow
- Peak expiratory flow rate (numeric) 
Details
The dataset name has been kept as 'asthma_patients_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble object. The original content has not been modified in any way.
Source
Data taken from Kaggle: https://www.kaggle.com/datasets/jatinthakur706/copd-asthma-patient-dataset
Chronic Bronchitis in Cardiff Men
Description
This dataset, bronchitis_Cardiff_df, is a data frame containing information from a study assessing the effects of smoking and pollution on bronchitis diagnosis in a sample of 212 men from Cardiff.
Usage
data(bronchitis_Cardiff_df)
Format
A data frame with 212 observations and 4 variables:
- cig
- Number of cigarettes smoked per day (numeric) 
- poll
- Pollution exposure level (numeric) 
- r
- Bronchitis diagnosis (0 = no, 1 = yes) (integer) 
- rfac
- Bronchitis diagnosis as a factor with 2 levels (factor) 
Details
The dataset name has been kept as 'bronchitis_Cardiff_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the gamclass package version 0.62.5
Chicago Mortality and Pollution
Description
This dataset, chicago_pollution_df, is a data frame containing daily mortality, weather, and pollution data for Chicago from 1987 to 2000 from the National Morbidity, Mortality and Air Pollution Study (NMMAPS). It includes all-cause mortality, cardiovascular and respiratory deaths, temperature, humidity, and pollution levels (PM10 and ozone).
Usage
data(chicago_pollution_df)
Format
A data frame with 5114 observations and 14 variables:
- date
- Date (Date object) 
- time
- Time index (integer vector) 
- year
- Year (numeric vector) 
- month
- Month (numeric vector) 
- doy
- Day of year (integer vector) 
- dow
- Day of week (factor with 7 levels) 
- death
- All-cause mortality count (integer vector) 
- cvd
- Cardiovascular mortality count (integer vector) 
- resp
- Respiratory mortality count (integer vector) 
- temp
- Temperature (numeric vector) 
- dptp
- Dew point temperature (numeric vector) 
- rhum
- Relative humidity (numeric vector) 
- pm10
- PM10 pollution level (numeric vector) 
- o3
- Ozone level (numeric vector) 
Details
The dataset name has been kept as 'chicago_pollution_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the dlnm package version 2.4.10
Child Wheeze and Pollution
Description
This dataset, child_wheeze_pollution_df, is a data frame containing longitudinal data on wheezing status for 16 children, each measured once a year at ages 9 through 12 (four measurements per child, 64 observations in total), with associated pollution exposure information.
Usage
data(child_wheeze_pollution_df)
Format
A data frame with 64 observations and 5 variables:
- ID
- Child identifier (integer vector) 
- Wheeze
- Wheezing status (integer vector) 
- City
- City identifier (integer vector) 
- Age
- Child's age in years (integer vector) 
- Smoke
- Smoking exposure indicator (integer vector) 
Details
The dataset name has been kept as 'child_wheeze_pollution_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the geessbin package version 1.0.0
Children Respiratory Rates Data
Description
This dataset, children_respiratory_rates_df, is a data frame containing respiratory rate measurements from 618 Italian children aged between 15 days and 3 years, collected to establish normal respiratory rate distributions for clinical assessment.
Usage
data(children_respiratory_rates_df)
Format
A data frame with 618 observations and 2 variables:
- Age
- Child's age in months (numeric vector) 
- Rate
- Respiratory rate in breaths per minute (integer vector) 
Details
The dataset name has been kept as 'children_respiratory_rates_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the Sleuth3 package version 1.0-6
Lung cancer in 4 Danish cities 1968-71
Description
This dataset, danish_lung_incidence_df, is a data frame containing counts of incident lung cancer cases and population size in four neighbouring Danish cities by age group from 1968 to 1971.
Usage
data(danish_lung_incidence_df)
Format
A data frame with 24 observations and 4 variables:
- city
- City of observation (factor with 4 levels) 
- age
- Age group (factor with 6 levels) 
- pop
- Population size (integer) 
- cases
- Number of incident lung cancer cases (integer) 
Details
The dataset name has been kept as 'danish_lung_incidence_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the ISwR package version 2.0-10
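Counts with a population denominator are usually analysed as rates. The sketch below adds a crude incidence per 1,000 population (over the whole 1968 to 1971 period) and, when the package is installed, fits a Poisson regression with log(pop) as an offset. The helper add_incidence and the model formula are illustrative choices, not something prescribed by the package.

```r
# Hypothetical helper: crude incidence per 1,000 population for each row.
add_incidence <- function(d) {
  d$rate_per_1000 <- 1000 * d$cases / d$pop
  d
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  danish <- add_incidence(PulmoDataSets::danish_lung_incidence_df)
  print(head(danish))

  # The offset makes the model describe cases per person, not raw counts.
  fit <- glm(cases ~ city + age + offset(log(pop)),
             family = poisson, data = danish)
  print(summary(fit)$coefficients)
}
```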
UK lung and nasal cancer deaths 1936–80
Description
This dataset, engwales_cancer_mortality_df, is a data frame containing England and Wales mortality rates from lung cancer, nasal cancer, and all causes between 1936 and 1980. The 1936 rates are repeated as 1931 rates in order to accommodate follow-up for the nickel study.
Usage
data(engwales_cancer_mortality_df)
Format
A data frame with 150 observations and 5 variables:
- year
- Year of observation (numeric) 
- age
- Age group (numeric) 
- lung
- Lung cancer mortality rate (numeric) 
- nasal
- Nasal cancer mortality rate (numeric) 
- other
- All-cause mortality rate (numeric) 
Details
The dataset name has been kept as 'engwales_cancer_mortality_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the ISwR package version 2.0-10
US 1975-76 Influenza-Like Illness Data
Description
This dataset, influenza_us_1975_df, is a data frame containing influenza-like illness (ILI) data for the lower 48 US states and the District of Columbia during the 1975-76 season, which was dominated by the A/H3N2 Victoria strain.
Usage
data(influenza_us_1975_df)
Format
A data frame with 49 observations (states + DC) and 7 variables:
- State
- State identifier (integer) 
- Acronym
- State abbreviation (factor with 51 levels) 
- Pop
- State population (integer) 
- Latitude
- Geographic latitude (numeric) 
- Longitude
- Geographic longitude (numeric) 
- Start
- Week of season start (integer) 
- Peak
- Week of peak activity (integer) 
Details
The dataset name has been kept as 'influenza_us_1975_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the epimdr package version 0.6-5
Lung Cancer Survival Data
Description
This dataset, lung_cancer_survival_df, is a data frame containing survival information for 228 lung cancer patients, with 10 clinical variables including survival time, patient status, age, gender, performance scores, and nutritional indicators.
Usage
data(lung_cancer_survival_df)
Format
A data frame with 228 observations (patients) and 10 variables:
- inst
- Institution code where patient was treated (numeric) 
- time
- Survival time in days from diagnosis (numeric) 
- status
- Censoring status (1 = censored, 2 = died) (numeric) 
- age
- Patient age at diagnosis in years (numeric) 
- sex
- Gender (1 = male, 2 = female) (numeric) 
- ph.ecog
- ECOG performance score (0=asymptomatic to 4=fully disabled) (numeric) 
- ph.karno
- Karnofsky performance score (0-100) as rated by physician (numeric) 
- pat.karno
- Karnofsky performance score (0-100) as self-reported by patient (numeric) 
- meal.cal
- Daily calories consumed at meals (numeric) 
- wt.loss
- Weight loss in last six months (pounds) (numeric) 
Details
The dataset name has been kept as 'lung_cancer_survival_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the acro package version 0.1.4
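Because status is coded 1 = censored and 2 = died, it is often convenient to derive an explicit event indicator before any survival analysis. The sketch below does that in base R and, if the survival package happens to be installed, fits Kaplan-Meier curves by sex. The helper add_event and the column name died are hypothetical.

```r
# Hypothetical helper: add a logical event column from the documented coding
# (status: 1 = censored, 2 = died).
add_event <- function(d) {
  d$died <- d$status == 2
  d
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  lung <- add_event(PulmoDataSets::lung_cancer_survival_df)

  # Share of patients with an observed death, by sex (1 = male, 2 = female).
  print(tapply(lung$died, lung$sex, mean))

  # Optional: Kaplan-Meier estimate, only if the survival package is present.
  if (requireNamespace("survival", quietly = TRUE)) {
    surv_obj <- survival::Surv(lung$time, lung$died)
    print(survival::survfit(surv_obj ~ sex, data = lung))
  }
}
```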
Incidental or Screen-Detected Lung Nodules
Description
This dataset, lung_nodules_detection_dt, is a data table containing clinical and radiological characteristics of 999 pulmonary nodules (up to 15 mm in size) detected on routine chest CT scans from 3 UK academic centers.
Usage
data(lung_nodules_detection_dt)
Format
A data table with 999 observations and 8 variables:
- sex
- Patient sex (factor with 2 levels) 
- age
- Patient age in years (numeric) 
- num.annotated
- Number of annotated nodules (numeric) 
- location
- Nodule location (factor with 6 levels) 
- spiculate
- Spiculation status (factor with 2 levels) 
- smoke.status
- Smoking history (factor with 5 levels) 
- diameter
- Nodule diameter in mm (numeric) 
- malignant
- Malignancy status (0=benign, 1=malignant) (numeric) 
Details
The dataset name has been kept as 'lung_nodules_detection_dt' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'dt' indicates that this is a data table object. The original content has not been modified in any way.
Source
Data taken from the R4HCR package version 0.1
Male Lung Cancer by Smoking Duration
Description
This dataset, lungca_cancer_deaths_df, is a data frame containing data on man-years of smoking risk and observed lung cancer deaths among male smokers. It includes 63 observations across 4 variables measuring smoking exposure and mortality outcomes.
Usage
data(lungca_cancer_deaths_df)
Format
A data frame with 63 observations and 4 variables:
- yrs_smk
- Years of smoking (factor with 9 levels) 
- pys
- Person-years of smoking exposure (numeric) 
- num_cigs
- Number of cigarettes smoked daily (factor with 7 levels) 
- deaths
- Number of lung cancer deaths (numeric) 
Details
The dataset name has been kept as 'lungca_cancer_deaths_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the R4HCR package version 0.1
Neonatal Intubation Simulation
Description
This dataset, neonatal_intubation_times_df, is a data frame containing execution times (in seconds) for specific actions performed by 37 midwife students during a high-fidelity neonatal resuscitation simulation. The simulation was video recorded, and each critical action in the intubation process was tagged for timing analysis.
Usage
data(neonatal_intubation_times_df)
Format
A data frame with 37 observations and 7 variables:
- id
- Participant ID (integer) 
- deci_intub
- Time to decision to intubate (seconds) (integer) 
- stop_ventil
- Time to stop ventilation (seconds) (integer) 
- blade_in
- Time to insert laryngoscope blade (seconds) (integer) 
- insert_tube
- Time to insert endotracheal tube (seconds) (integer) 
- blade_out
- Time to remove laryngoscope blade (seconds) (integer) 
- restart_ventil
- Time to restart ventilation (seconds) (integer) 
Details
The dataset name has been kept as 'neonatal_intubation_times_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the ViSiElse package version 1.2.2
Nicotine Gum and Smoking Cessation
Description
This dataset, nicotine_gum_df, is a data frame containing meta-analysis data on the effectiveness of nicotine gum for smoking cessation across 26 studies.
Usage
data(nicotine_gum_df)
Format
A data frame with 26 observations (studies) and 4 variables:
- qt
- Number of successful quitters in treatment group (integer) 
- tt
- Total participants in treatment group (integer) 
- qc
- Number of successful quitters in control group (integer) 
- tc
- Total participants in control group (integer) 
Details
The dataset name has been kept as 'nicotine_gum_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the HSAUR3 package version 1.0-15
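The four count columns are enough to compute an effect size for each study. As an illustration, the sketch below derives per-study odds ratios of quitting (treatment versus control); it does not pool the studies, and the helper name study_odds_ratio is hypothetical.

```r
# Hypothetical helper: odds ratio of quitting, treatment vs control, per study.
study_odds_ratio <- function(d) {
  odds_treat <- d$qt / (d$tt - d$qt)   # quitters / non-quitters, treatment
  odds_ctrl  <- d$qc / (d$tc - d$qc)   # quitters / non-quitters, control
  odds_treat / odds_ctrl
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  gum <- PulmoDataSets::nicotine_gum_df
  print(round(study_odds_ratio(gum), 2))
}
```

Values above 1 indicate higher odds of quitting in the treatment group of that study.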
Ohio Children Wheeze Status
Description
This dataset, ohio_children_wheeze_df, is a data frame containing wheeze status data from 2148 observations of children in Ohio. The data are part of a subset from the Six-City Study, a longitudinal study examining the health effects of air pollution on children.
Usage
data(ohio_children_wheeze_df)
Format
A data frame with 2148 observations and 4 variables:
- resp
- Wheeze status (0 = no wheeze, 1 = wheeze) (integer) 
- id
- Child identifier (integer) 
- age
- Age of the child in years (integer) 
- smoke
- Parental smoking status (0 = no, 1 = yes) (integer) 
Details
The dataset name has been kept as 'ohio_children_wheeze_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the geepack package version 1.3.12
Lung Disease Patients
Description
This dataset, patients_lung_diseases_tbl_df, is a tibble containing detailed clinical information about 5,200 patients with various lung conditions, including demographics, smoking status, lung capacity measurements, disease types, treatments received, hospital visits, and recovery status.
Usage
data(patients_lung_diseases_tbl_df)
Format
A tibble with 5,200 observations and 8 variables:
- Age
- Patient age in years (numeric) 
- Gender
- Patient gender (character) 
- Smoking Status
- Smoker or non-smoker status (character) 
- Lung Capacity
- Measured lung function (numeric) 
- Disease Type
- Specific lung condition (character) 
- Treatment Type
- Therapy, medication or surgery received (character) 
- Hospital Visits
- Number of hospital visits (numeric) 
- Recovered
- Recovery status (character) 
Details
The dataset name has been kept as 'patients_lung_diseases_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'tbl_df' indicates that the dataset is a tibble object. The original content has not been modified in any way.
Source
Data taken from Kaggle: https://www.kaggle.com/datasets/samikshadalvi/lungs-diseases-dataset
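Several column names in this tibble contain spaces, so they cannot be written as plain d$Name. The sketch below shows the two usual ways to reach such columns: double-bracket indexing with a string, and backticks. The helper mean_capacity_by_disease is hypothetical.

```r
# Hypothetical helper: mean lung capacity for each disease type.
# [["..."]] accepts names with spaces directly.
mean_capacity_by_disease <- function(d) {
  tapply(d[["Lung Capacity"]], d[["Disease Type"]], mean, na.rm = TRUE)
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  patients <- PulmoDataSets::patients_lung_diseases_tbl_df
  print(mean_capacity_by_disease(patients))

  # Backticks are the alternative when using the $ operator.
  print(summary(patients$`Hospital Visits`))
}
```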
Monthly Pneumonia and Influenza Deaths in the U.S.
Description
This dataset, pneumonia_influenza_ts, is a time series containing monthly rates of pneumonia and influenza deaths in the United States from 1968 to 1978.
Usage
data(pneumonia_influenza_ts)
Format
A time series with 132 monthly observations from January 1968 to December 1978:
- Value
- Mortality rate (numeric vector) 
- Time
- Monthly index from 1968 to 1978 (time series vector) 
Details
The dataset name has been kept as 'pneumonia_influenza_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'ts' indicates that the dataset is a time series. The original content has not been modified in any way.
Source
Data taken from the astsa package version 2.2
Respiratory Clinical Trial
Description
This dataset, respiratory_clinical_trial_df, is a data frame containing information from a clinical trial of patients with respiratory illness, where 111 patients from two different clinics were randomized to receive either placebo or an active treatment. Patients were examined at baseline and at four visits during treatment. The respiratory status was determined at each visit, with 1 representing good status and 0 representing poor status.
Usage
data(respiratory_clinical_trial_df)
Format
A data frame with 444 observations and 8 variables:
- center
- Clinic (study center) identifier (integer vector) 
- id
- Patient identifier (integer vector) 
- treat
- Treatment group (factor with 2 levels) 
- sex
- Patient sex (factor with 2 levels) 
- age
- Patient age in years (integer vector) 
- baseline
- Baseline respiratory status (integer vector) 
- visit
- Visit number (integer vector) 
- outcome
- Respiratory status (integer vector) 
Details
The dataset name has been kept as 'respiratory_clinical_trial_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the geepack package version 1.3.12
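With one row per patient and visit, and outcome coded 1 for good and 0 for poor status, the mean of outcome within a group is the proportion of patients in good status. The sketch below tabulates that proportion by treatment and visit; good_rate_by_visit is a hypothetical helper.

```r
# Hypothetical helper: proportion with good status (outcome == 1)
# for each treatment group at each visit.
good_rate_by_visit <- function(d) {
  aggregate(outcome ~ treat + visit, data = d, FUN = mean)
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  trial <- PulmoDataSets::respiratory_clinical_trial_df
  print(good_rate_by_visit(trial))
}
```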
Azithromycin for Respiratory Infections
Description
This dataset, respiratory_infections_df, is a data frame containing results from 15 clinical trials comparing the effectiveness of azithromycin versus amoxycillin or amoxycillin/clavulanic acid (amoxyclav) in the treatment of acute lower respiratory tract infections.
Usage
data(respiratory_infections_df)
Format
A data frame with 15 observations and 11 variables:
- author
- Study author(s) (character vector) 
- year
- Year of publication (integer vector) 
- ai
- Number of successful treatments in azithromycin group (integer vector) 
- n1i
- Total number of participants in azithromycin group (integer vector) 
- ci
- Number of successful treatments in control group (integer vector) 
- n2i
- Total number of participants in control group (integer vector) 
- age
- Patient age characteristics (character vector) 
- diag.ab
- Number diagnosed with acute bronchitis (integer vector) 
- diag.cb
- Number diagnosed with chronic bronchitis (integer vector) 
- diag.pn
- Number diagnosed with pneumonia (integer vector) 
- ctrl
- Type of control treatment (character vector) 
Details
The dataset name has been kept as 'respiratory_infections_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the metadat package version 1.4-0
Respiratory Illness Clinical Trial
Description
This dataset, respiratory_trial_df, is a data frame containing the respiratory status of patients recruited for a randomized clinical multicenter trial, with 555 observations across 111 subjects.
Usage
data(respiratory_trial_df)
Format
A data frame with 555 observations and 7 variables:
- centre
- Study center (factor with 2 levels) 
- treatment
- Treatment group (factor with 2 levels) 
- gender
- Patient gender (factor with 2 levels) 
- age
- Patient age in years (numeric) 
- status
- Respiratory status (factor with 2 levels) 
- month
- Follow-up month (ordered factor with 5 levels) 
- subject
- Patient identifier (factor with 111 levels) 
Details
The dataset name has been kept as 'respiratory_trial_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the HSAUR3 package version 1.0-15
Ordinal respiratory outcomes
Description
This dataset, respiratory_trial_outcomes_df, is a data frame containing outcome data from a randomized clinical trial described in Miller et al. (1993) evaluating a new treatment for respiratory disorder. The study includes 111 patients who were randomly assigned to one of two treatments (active or placebo). The patients were followed up at four visits, and their response status was classified on an ordinal scale at each visit.
Usage
data(respiratory_trial_outcomes_df)
Format
A data frame with 111 observations and 5 variables:
- y1
- Ordinal response at visit 1 (integer) 
- y2
- Ordinal response at visit 2 (integer) 
- y3
- Ordinal response at visit 3 (integer) 
- y4
- Ordinal response at visit 4 (integer) 
- trt
- Treatment group (0 = placebo, 1 = active) (integer) 
Details
The dataset name has been kept as 'respiratory_trial_outcomes_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the geepack package version 1.3.12
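This data frame is in wide format (one row per patient, one column per visit), whereas many longitudinal modelling functions expect one row per patient and visit. The sketch below converts it with base R's reshape(). The helper to_long and the new column names id, visit, and y are assumptions made for illustration.

```r
# Hypothetical helper: wide (y1..y4) to long (one row per patient and visit).
to_long <- function(d) {
  d$id <- seq_len(nrow(d))          # one identifier per patient (row)
  long <- reshape(d, direction = "long",
                  varying = c("y1", "y2", "y3", "y4"),
                  v.names = "y", timevar = "visit", times = 1:4,
                  idvar = "id")
  rownames(long) <- NULL
  long[order(long$id, long$visit), c("id", "trt", "visit", "y")]
}

# Only attempted when PulmoDataSets is actually installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  long <- to_long(PulmoDataSets::respiratory_trial_outcomes_df)
  print(head(long, 8))
}
```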
UK Smoking Habits
Description
This dataset, smoking_UK_tbl_df, is a tibble containing survey data on smoking habits from the UK, with demographic characteristics and tobacco consumption patterns from 1,691 respondents.
Usage
data(smoking_UK_tbl_df)
Format
A tibble with 1,691 observations and 12 variables:
- gender
- Gender of respondent (factor with 2 levels) 
- age
- Age in years (integer) 
- marital_status
- Marital status (factor with 5 levels) 
- highest_qualification
- Highest education qualification (factor with 8 levels) 
- nationality
- Nationality (factor with 8 levels) 
- ethnicity
- Ethnic group (factor with 7 levels) 
- gross_income
- Income bracket (factor with 10 levels) 
- region
- UK region (factor with 7 levels) 
- smoke
- Smoking status (factor with 2 levels) 
- amt_weekends
- Cigarettes smoked on weekends (integer) 
- amt_weekdays
- Cigarettes smoked on weekdays (integer) 
- type
- Type of tobacco used (factor with 5 levels) 
Details
The dataset name has been kept as 'smoking_UK_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'tbl_df' indicates that this is a tibble data frame. The original content has not been modified in any way.
Source
Data taken from the openintro package version 2.5.0
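One simple way to summarise this survey is the share of smokers within each gender. The sketch below assumes only the gender and smoke columns listed under Format; the helper name smoking_share_by_gender is hypothetical, and the factor labels are left to whatever the data contain.

```r
# Hypothetical helper (not part of PulmoDataSets): row-wise proportions of
# smoking status within each gender.
smoking_share_by_gender <- function(df) {
  prop.table(table(gender = df$gender, smoke = df$smoke), margin = 1)
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  smoking_share_by_gender(PulmoDataSets::smoking_UK_tbl_df)
}
```

Each row of the returned table sums to 1.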
Smoking Deaths Among Doctors (British)
Description
This dataset, smoking_doctors_df, is a data frame containing data from a study on smoking habits and coronary artery disease mortality among British doctors. It includes 10 observations across 5 variables representing person-years of observation and deaths during the study period.
Usage
data(smoking_doctors_df)
Format
A data frame with 10 observations and 5 variables:
- age
- Age group (factor with 5 levels) 
- smoke
- Smoking status (numeric) 
- n
- Number of person-years at risk (numeric) 
- y
- Number of deaths from coronary artery disease (numeric) 
- ns
- Number of smoker person-years in the category, i.e. smoke × n (numeric) 
Details
The dataset name has been kept as 'smoking_doctors_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the boot package version 1.3-31
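Because n holds person-years at risk and y holds deaths, a crude death rate per 1,000 person-years can be computed for each age-by-smoking cell. The following is a minimal sketch; the helper name death_rate_per_1000 is hypothetical and the scaling to 1,000 person-years is an illustrative choice.

```r
# Hypothetical helper (not part of PulmoDataSets): crude death rate per
# 1,000 person-years for each row (age group by smoking status).
death_rate_per_1000 <- function(df) {
  out <- df[, c("age", "smoke")]
  out$rate_per_1000 <- 1000 * df$y / df$n
  out
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  death_rate_per_1000(PulmoDataSets::smoking_doctors_df)
}
```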
Smoking and Lung Cancer
Description
This dataset, smoking_lung_cancer_df, is a data frame containing data from a retrospective case-control study comparing smoking status between 86 lung cancer patients and 86 controls.
Usage
data(smoking_lung_cancer_df)
Format
A data frame with 2 observations and 3 variables:
- Smoking
- Smoking status (factor with 2 levels: "NonSmokers", "Smokers") 
- Cancer
- Number of lung cancer cases (integer vector) 
- Control
- Number of control cases (integer vector) 
Details
The dataset name has been kept as 'smoking_lung_cancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the Sleuth3 package version 1.0-6
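For a case-control table of this shape, the usual summary is the odds ratio of exposure (smoking) among cases versus controls. The sketch below assumes the two factor levels given under Format ("NonSmokers", "Smokers"); the helper name smoking_odds_ratio is hypothetical.

```r
# Hypothetical helper (not part of PulmoDataSets): odds ratio of smoking among
# lung cancer cases relative to controls, from the 2 x 2 count table.
smoking_odds_ratio <- function(df) {
  smokers     <- df[df$Smoking == "Smokers", ]
  non_smokers <- df[df$Smoking == "NonSmokers", ]
  (smokers$Cancer * non_smokers$Control) /
    (non_smokers$Cancer * smokers$Control)
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  smoking_odds_ratio(PulmoDataSets::smoking_lung_cancer_df)
}
```

A value above 1 indicates that smoking is more common among cases than among controls.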
Youth Smoking and Lung Function
Description
This dataset, smoking_youth_tbl_df, is a tibble containing data from the Childhood Respiratory Disease Study collected in the late 1970s, examining the effects of smoking and second-hand smoke exposure on pulmonary function in 654 youths.
Usage
data(smoking_youth_tbl_df)
Format
A tibble with 654 observations and 5 variables:
- age
- Age in years (integer) 
- FEV
- Forced Expiratory Volume in liters (numeric) 
- height
- Height in centimeters (numeric) 
- sex
- Sex of participant (character) 
- smoker
- Smoking status (character) 
Details
The dataset name has been kept as 'smoking_youth_tbl_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'tbl_df' indicates that this is a tibble data frame. The original content has not been modified in any way.
Source
Data taken from the LSTbook package version 0.6
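A common teaching use of these data is to compare lung function between smokers and non-smokers while adjusting for age and height. The following sketch fits an ordinary linear model with the columns listed under Format; the helper name fit_fev_model and the choice of covariates are assumptions made for illustration, not an analysis prescribed by the package.

```r
# Hypothetical helper (not part of PulmoDataSets): linear model of FEV on
# age, height and smoking status.
fit_fev_model <- function(df) {
  stats::lm(FEV ~ age + height + smoker, data = df)
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  fit <- fit_fev_model(PulmoDataSets::smoking_youth_tbl_df)
  print(summary(fit))
}
```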
Total Lung Capacity
Description
This dataset, tlc_lung_capacity_df, is a data frame containing data on pretransplant total lung capacity (TLC) measured by whole-body plethysmography for recipients of heart-lung transplants.
Usage
data(tlc_lung_capacity_df)
Format
A data frame with 32 observations and 4 variables:
- age
- Age in years (integer) 
- sex
- Sex (0 = female, 1 = male) (integer) 
- height
- Height in centimeters (integer) 
- tlc
- Total lung capacity in liters (numeric) 
Details
The dataset name has been kept as 'tlc_lung_capacity_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the ISwR package version 2.0-10
BCG Vaccine Against Tuberculosis
Description
This dataset, tuberculosis_vaccine_df, is a data frame containing results from 13 clinical trials examining the effectiveness of the Bacillus Calmette-Guerin (BCG) vaccine against tuberculosis.
Usage
data(tuberculosis_vaccine_df)
Format
A data frame with 13 observations and 9 variables:
- trial
- Trial identifier number (integer vector) 
- author
- Study author(s) (character vector) 
- year
- Year of publication (integer vector) 
- tpos
- Number of TB positive cases in vaccinated group (integer vector) 
- tneg
- Number of TB negative cases in vaccinated group (integer vector) 
- cpos
- Number of TB positive cases in control group (integer vector) 
- cneg
- Number of TB negative cases in control group (integer vector) 
- ablat
- Absolute latitude of study location (integer vector) 
- alloc
- Method of treatment allocation (character vector) 
Details
The dataset name has been kept as 'tuberculosis_vaccine_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package and assists users in identifying its specific characteristics. The suffix 'df' indicates that the dataset is a data frame. The original content has not been modified in any way.
Source
Data taken from the metadat package version 1.4-0
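The four count columns define a 2 x 2 table per trial, from which a log risk ratio (vaccinated versus control) can be derived. This is a minimal sketch with a hypothetical helper name, log_risk_ratio; it computes the effect size by hand and does not perform a meta-analysis.

```r
# Hypothetical helper (not part of PulmoDataSets): per-trial log risk ratio of
# tuberculosis in the vaccinated group relative to the control group.
log_risk_ratio <- function(df) {
  risk_vaccinated <- df$tpos / (df$tpos + df$tneg)
  risk_control    <- df$cpos / (df$cpos + df$cneg)
  log(risk_vaccinated / risk_control)
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  log_risk_ratio(PulmoDataSets::tuberculosis_vaccine_df)
}
```

Negative values correspond to a lower risk of tuberculosis in the vaccinated group.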
Veterans Administration Lung Cancer Study
Description
This dataset, veterans_lung_cancer_df, is a data frame containing information from a randomized trial of two treatment regimens for lung cancer. It is a standard dataset for survival analysis.
Usage
data(veterans_lung_cancer_df)
Format
A data frame with 137 observations and 8 variables:
- trt
- Treatment group (1 = standard, 2 = test) (numeric) 
- celltype
- Cell type (factor with 4 levels) 
- time
- Survival time in days (numeric) 
- status
- Censoring status (numeric) 
- karno
- Karnofsky performance score (numeric) 
- diagtime
- Time from diagnosis to randomization in months (numeric) 
- age
- Age in years (numeric) 
- prior
- Prior therapy (0 = no, 10 = yes) (numeric) 
Details
The dataset name has been kept as 'veterans_lung_cancer_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a data frame object. The original content has not been modified in any way.
Source
Data taken from the survival package version 3.8-3
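Since this is a survival dataset, a Kaplan-Meier fit by treatment group is a natural first look. The sketch below relies on the survival package, which is not listed among this package's dependencies, and assumes that status equal to 1 marks an observed death; the helper name fit_km_by_treatment is hypothetical.

```r
# Hypothetical helper (not part of PulmoDataSets): Kaplan-Meier curves by
# treatment group. Assumes status == 1 marks an observed event.
fit_km_by_treatment <- function(df) {
  survival::survfit(survival::Surv(time, status) ~ trt, data = df)
}

# Run only if both packages are installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE) &&
    requireNamespace("survival", quietly = TRUE)) {
  fit <- fit_km_by_treatment(PulmoDataSets::veterans_lung_cancer_df)
  print(fit)
}
```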
View Available Datasets in PulmoDataSets
Description
This function lists all datasets available in the 'PulmoDataSets' package. If the 'PulmoDataSets' package is not loaded, it stops and shows an error message. If no datasets are available, it returns a message and an empty vector.
Usage
view_datasets_PulmoDataSets()
Value
A character vector with the names of the available datasets. If no datasets are found, it returns an empty character vector.
Examples
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  library(PulmoDataSets)
  view_datasets_PulmoDataSets()
}
Copenhagen Whooping Cough 1900-1937
Description
This dataset, whooping_cough_dk_df, is a data frame containing weekly incidence data of whooping cough in Copenhagen, Denmark between January 1900 and December 1937. It includes 1,982 weekly observations across 8 demographic and epidemiological variables.
Usage
data(whooping_cough_dk_df)
Format
A data frame with 1,982 weekly observations and 8 variables:
- date
- Date of observation (factor) 
- births
- Number of births (integer) 
- day
- Day of month (integer) 
- month
- Month (integer 1-12) 
- year
- Year (integer 1900-1937) 
- cases
- Number of whooping cough cases (integer) 
- deaths
- Number of whooping cough deaths (integer) 
- popsize
- Population size (numeric) 
Details
The dataset name has been kept as 'whooping_cough_dk_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the epimdr package version 0.6-5
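Weekly counts are often easier to read once aggregated. As a minimal sketch, the weekly cases can be summed within each calendar year using the year and cases columns listed under Format; the helper name yearly_cases is hypothetical.

```r
# Hypothetical helper (not part of PulmoDataSets): total whooping cough cases
# per calendar year, summed from the weekly records.
yearly_cases <- function(df) {
  stats::aggregate(cases ~ year, data = df, FUN = sum)
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  head(yearly_cases(PulmoDataSets::whooping_cough_dk_df))
}
```

Rows with a missing value in cases are dropped by the formula interface before summing.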
Philadelphia Whooping Cough 1925-1947
Description
This dataset, whooping_cough_phila_df, is a data frame containing weekly incidence data of whooping cough in Philadelphia between 1925 and 1947, with 1,200 weekly observations across 5 variables.
Usage
data(whooping_cough_phila_df)
Format
A data frame with 1,200 weekly observations and 5 variables:
- YEAR
- Year of observation (integer) 
- WEEK
- Week number (integer) 
- PHILADELPHIA
- Weekly incidence count of whooping cough cases (integer) 
- TIME
- Time index (numeric) 
- TM
- Time marker (integer) 
Details
The dataset name has been kept as 'whooping_cough_phila_df' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'df' indicates that this is a standard data frame. The original content has not been modified in any way.
Source
Data taken from the epimdr package version 0.6-5
Whooping Cough Deaths in London (1740-1881)
Description
This dataset, whooping_cough_ts, is a time series object containing annual counts of deaths from whooping cough in London from 1740 to 1881, with three measurement variables recorded each year.
Usage
data(whooping_cough_ts)
Format
A multivariate time series with 142 annual observations from 1740 to 1881 and 3 variables:
- wcough
- Number of whooping cough deaths (integer) 
- ratio
- Death ratio (numeric) 
- alldeaths
- Total deaths from all causes (integer) 
Details
The dataset name has been kept as 'whooping_cough_ts' to avoid confusion with other datasets in the R ecosystem. This naming convention helps distinguish this dataset as part of the PulmoDataSets package. The suffix 'ts' indicates that this is a time series object. The original content has not been modified in any way.
Source
Data taken from the DAAG package version 1.25.6
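Because the object is a multivariate time series rather than a data frame, subsetting by year is done with window() instead of row indexing. The sketch below uses a hypothetical helper name, subset_years, and an arbitrary illustrative range of years.

```r
# Hypothetical helper (not part of PulmoDataSets): restrict a time series to
# a range of years while keeping its time attributes.
subset_years <- function(x, from, to) {
  stats::window(x, start = from, end = to)
}

# Run on the packaged data only if the package is installed.
if (requireNamespace("PulmoDataSets", quietly = TRUE)) {
  early <- subset_years(PulmoDataSets::whooping_cough_ts, 1740, 1760)
  plot(early[, "wcough"], ylab = "Whooping cough deaths")
}
```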