Title: | Data, Functions and Support Materials from the Book "industRial Data Science" |
---|---|
Description: | Companion package to the book "industRial data science", J.Ramalho (2021) <https://j-ramalho.github.io/industRial/>. Provides data sets and functions to complete the case studies and contains the book original Rmd files and tutorials. |
Authors: | Joao Ramalho [aut, cre] |
Maintainer: | Joao Ramalho <[email protected]> |
License: | GPL (>= 3) |
Version: | 0.1.0 |
Built: | 2025-03-04 04:04:13 UTC |
Source: | https://github.com/j-ramalho/industrial |
A data set with the charging time, in hours, required to recharge a lithium-ion battery, based on a full factorial design of experiments with four variables (A, B, C, D) coded as +/- 1. Design effects are coded as numerical variables so that models can be built without coding the contrasts and predictions can be made on a continuous range from -1 to +1.
Variable A (numerical)
Variable B (numerical)
Variable C (numerical)
Variable D (numerical)
The independent repeat of each unique factor combination.
Battery charging time [h]
battery_charging
A tibble with 32 observations on 6 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
data(battery_charging)
head(battery_charging)
# Building a linear model:
battery_lm <- lm(
  formula = charging_time ~ A * B * C,
  data = battery_charging
)
summary(battery_lm)
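Because the design variables are numeric rather than factors, a fitted model can be evaluated at settings between the two tested levels. The following is a minimal sketch of that idea in base R. It uses simulated data laid out like a replicated 2^4 full factorial (32 runs); the effect sizes and noise level are made up for illustration and are not taken from the package data.

```r
# Simulated stand-in for a replicated 2^4 full factorial; NOT the package data.
set.seed(1)
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1),
                      replicate = 1:2)

# Made-up response with arbitrary effects, for illustration only.
design$charging_time <- 5 + 0.8 * design$A - 0.5 * design$B +
  0.3 * design$A * design$B + rnorm(nrow(design), sd = 0.1)

# Same model structure as in the example above.
fit <- lm(charging_time ~ A * B * C, data = design)

# Numeric coding lets predict() interpolate between the -1 and +1 levels.
new_points <- data.frame(A = c(-0.5, 0, 0.5), B = 0, C = 0)
pred <- predict(fit, newdata = new_points)
pred
```

Had A, B and C been stored as factors, predict() would only accept the levels present in the training data.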
Generate a histogram type chart from a set of consecutive measurements.
chart_Cpk(data)
data |
A dataset generated by the function |
This type of chart is typically applied in product manufacturing to monitor
deviations from the target value over time. It is usually accompanied by
the statistical process control time series charts chart_I and chart_IMR.
This function returns an object of class ggplot
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Generate a single point time series chart from a set of consecutive measurements.
chart_I(data)
data |
A dataset generated by the function |
This type of chart is typically applied in product manufacturing to monitor
deviations from the target value over time. It is usually accompanied by
the chart_IMR chart.
This function returns an object of class ggplot
For a complete case study application refer to https://j-ramalho.github.io/industRial/
Generate a moving range chart from a set of consecutive measurements.
chart_IMR(data)
data |
A dataset generated by the function |
This type of chart is typically applied in product manufacturing to monitor
deviations from the target value over time. It is usually accompanied by
the chart_I chart.
This function returns an object of class ggplot
For a complete case study application refer to https://j-ramalho.github.io/industRial/
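The individuals and moving range charts rest on a small amount of arithmetic. The following is a minimal sketch in base R, assuming the conventional constants for moving ranges of two consecutive points (2.66 and 3.267). The measurements are made up, and this is not necessarily how chart_I or chart_IMR compute their limits.

```r
# Made-up consecutive measurements, for illustration only.
x <- c(10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.2)

mr     <- abs(diff(x))   # moving ranges between consecutive points
mr_bar <- mean(mr)       # average moving range
x_bar  <- mean(x)        # process centre line

# Conventional constants for moving ranges of span 2:
# 2.66 = 3 / d2 with d2 = 1.128, and D4 = 3.267.
i_limits  <- c(LCL = x_bar - 2.66 * mr_bar, CL = x_bar,
               UCL = x_bar + 2.66 * mr_bar)
mr_limits <- c(LCL = 0, CL = mr_bar, UCL = 3.267 * mr_bar)

i_limits
mr_limits
```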
This data set contains observations of visual defects present
in watch dials, such as indentations and scratches, recorded during production.
It provides a practical case for building Pareto charts, typically with a
function like paretochart.
The shop floor operator collecting the data
Data collection date
Defect type ("Indent", "Scratch")
Position on the watch dial, referred to by the hour (1h, 2h)
Part unique id number
dial_control
An object of class tibble with 58 observations on 4 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
head(dial_control)
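The table behind a Pareto chart is a count per category, sorted in decreasing order, with a cumulative percentage. The following is a minimal base R sketch on a made-up vector of defect types. The column names of the result are hypothetical, and a dedicated plotting function would normally take over from here.

```r
# Made-up defect records, for illustration only.
defects <- c("Indent", "Scratch", "Scratch", "Indent", "Scratch",
             "Scratch", "Indent", "Scratch")

# Count each defect type and sort from most to least frequent.
counts <- sort(table(defects), decreasing = TRUE)

# Assemble the Pareto table with the cumulative percentage.
pareto <- data.frame(
  defect = names(counts),
  n = as.vector(counts),
  cum_percent = 100 * cumsum(as.vector(counts)) / sum(counts)
)
pareto
```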
A data set with the results of aging tests on several groups of ebike frames (g1, g2, ...). Each entry corresponds to the number of cycles to failure for each level of treatment temperature.
Position of the part on the device
group 1, remaining groups have names g2 to g5
ebike_hardening
A tibble with 4 observations on 6 variables.
The ebike_hardening2 dataset contains alternative data that gives non-significant results in the analysis of variance study.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
data(ebike_hardening)
Takes a linear model formula and returns its expanded version.
expand_formula(formulae)
formulae |
Takes as input an object of class formula, e.g.: Y ~ A * B, see ?formula for syntax details |
Supports verification and understanding of linear model formula syntax, such as the * and + operators and other conventions.
Returns a character vector such as A + B + A:B
For an example application refer to https://j-ramalho.github.io/industRial/
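The expansion can be reproduced with base R alone, which is useful for checking what a formula operator does. The following is a minimal sketch; the helper name expand_formula_sketch is hypothetical and this is not presented as the package's implementation.

```r
# Hypothetical helper, shown for illustration only.
expand_formula_sketch <- function(f) {
  stopifnot(inherits(f, "formula"))
  # terms() resolves operators such as * into main effects and interactions.
  paste(attr(terms(f), "term.labels"), collapse = " + ")
}

expand_formula_sketch(Y ~ A * B)
# → "A + B + A:B"
```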
This package contains datasets and toy functions to run the examples from the book "industRial data science". It also contains all the book's original Rmd files and the original learnr Rmd tutorial files.
João Ramalho
For complete case studies refer to https://j-ramalho.github.io/industRial/
This data set contains laboratory measurements of the dry matter content of different fruit juices obtained with two different measurement devices. One of the devices is considered the reference (REF) and the other one is a new device (DRX) on which a linearity and bias study has to be performed.
The juice base fruit ("Apple", "Beetroot")
Target drymatter content in [g]
Production line speed
Dry matter powder particle size [micrometers]
Part number
Drymatter content measured with device DRX
Drymatter content measured with reference device
juice_drymatter
An object of class tibble with 108 observations on 7 variables.
Adapted from a real gage bias and linearity study performed in 2021 on industrial beverages dry matter content measurement. The structure of the data corresponds to a full factorial design of 5 factors (3 with 3 levels and 2 with 2 levels).
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
library(dplyr)
# Calculate the bias between the new device and the reference:
juice_drymatter <- juice_drymatter %>%
  dplyr::mutate(bias = drymatter_DRX - drymatter_REF)
# Establish the analysis of variance:
juice_drymatter_aov <- aov(
  bias ~ drymatter_TGT * speed * particle_size,
  data = juice_drymatter)
summary(juice_drymatter_aov)
This function takes process variables and calculates the probability that parts are produced out of specification in the long run.
off_spec(UCL, LCL, mean, sd)
UCL |
the process upper control limit |
LCL |
the process lower control limit |
mean |
the process mean |
sd |
the process standard deviation |
This function returns an object of class numeric
For a complete case study application refer to https://j-ramalho.github.io/industRial/
off_spec(100, 0, 10, 3)
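For a normally distributed process, the long-run out-of-specification probability is the sum of the two tail areas outside the limits. The following is a minimal base R sketch under that normality assumption. The helper name off_spec_sketch is hypothetical, and it returns a fraction between 0 and 1, which may differ from the scale used by off_spec.

```r
# Hypothetical helper assuming a normal process; returns a fraction in [0, 1].
off_spec_sketch <- function(upper, lower, mean, sd) {
  below <- pnorm(lower, mean = mean, sd = sd)                      # lower tail
  above <- pnorm(upper, mean = mean, sd = sd, lower.tail = FALSE)  # upper tail
  below + above
}

# Same inputs as the example above: limits 100 and 0, mean 10, sd 3.
p <- off_spec_sketch(100, 0, 10, 3)
p
```

With these inputs the lower limit sits about 3.3 standard deviations below the mean, so almost all of the probability comes from the lower tail.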
The data set contains the expected correlation (expressed on a scale from 1 to 10) between the anonymized input variables of an experiment. It is a double-entry table with the same variables in rows and columns. It is coded as a tibble, but subsequent use in network plots requires converting it to a matrix.
perfume_experiment
A tibble with 22 observations on 23 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
data(perfume_experiment)
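Converting a double-entry table to a matrix usually means moving the label column into the row names. The following is a minimal base R sketch on a made-up 3 x 3 table. The column name input and the values are hypothetical and do not come from perfume_experiment.

```r
# Made-up double-entry table: the first column holds the row labels.
corr_tbl <- data.frame(
  input = c("v1", "v2", "v3"),
  v1 = c(10, 3, 1),
  v2 = c(3, 10, 7),
  v3 = c(1, 7, 10)
)

# Drop the label column, convert, then restore the labels as row names.
corr_mat <- as.matrix(corr_tbl[, -1])
rownames(corr_mat) <- corr_tbl$input
corr_mat
```

The same steps apply to a tibble, since as.matrix() also accepts one.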
Measurements of tensile strength of two different deliveries of PET raw material used in the clothing industry. The two data sets follow approximately a normal distribution.
Tensile strength measurements for product A [MPa] (numeric)
Tensile strength measurements for product B [MPa] (numeric)
pet_delivery
An object of class tibble with 28 observations on 2 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
data(pet_delivery)
The data corresponds to a full factorial design with two factors coded as +/- and 3 replicates for each combination.
PET formulation A (factor)
PET formulation B (factor)
the measurement replicate I to III (factor)
the output variable measured on the PET, (numerical)
pet_doe
An object of classes design and data.frame with 12 observations of 4 variables.
Original data set generated with the function fac.design from the package DoE.base.
For a complete case study application refer to https://j-ramalho.github.io/industRial/
data(pet_doe)
contrasts(pet_doe$A)
This function takes process variables and calculates the Cpk index which is a measure of the process centering and variability against specification.
process_Cpk(UCL, LCL, mean, sd)
UCL |
the process upper control limit |
LCL |
the process lower control limit |
mean |
the process mean |
sd |
the process standard deviation |
This function returns an object of class numeric
For a complete case study application refer to https://j-ramalho.github.io/industRial/
process_Cpk(100, 0, 10, 3)
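The usual textbook definition of Cpk is the distance from the mean to the nearest limit divided by three standard deviations. The following is a minimal base R sketch of that definition. The helper name process_cpk_sketch is hypothetical, and the package function may differ in details such as rounding.

```r
# Hypothetical helper implementing the textbook Cpk definition.
process_cpk_sketch <- function(upper, lower, mean, sd) {
  # Distance to the nearest limit, in units of three standard deviations.
  min(upper - mean, mean - lower) / (3 * sd)
}

# Same inputs as the example above: the lower limit is the nearest one,
# so Cpk = (10 - 0) / (3 * 3) = 10 / 9.
cpk <- process_cpk_sketch(100, 0, 10, 3)
cpk
```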
This function takes process variables, calculates summary statistics and presents them in an easily readable table format.
process_stats(data, part_spec_percent)
data |
This function takes the dataset tablet_thickness cleaned with the clean_names function from the janitor package |
part_spec_percent |
the process tolerance in percentage. |
This function returns an object with class tibble (tbl_df)
For a complete case study application refer to https://j-ramalho.github.io/industRial/
This function takes summary statistics and presents them in an easily readable table format.
process_stats_table(data)
data |
A data set generated by the function |
This function returns an object with classes gt_tbl and list
For a complete case study application refer to https://j-ramalho.github.io/industRial/
A dataset with the energy output resulting from tests on solar cells made in three different configurations. The fill factor provides an indication of the cell quality and is an uncontrolled variable that can be taken into account in an analysis of covariance to better assess the output variation from material to material.
The solar cell material (character)
The yearly energy output (numeric)
The fill factor measured for each cell (numeric)
solarcell_fill
A tibble with 15 observations of 3 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
hist(solarcell_fill$output)
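An analysis of covariance of this kind can be set up with aov() by entering the covariate before the factor. The following is a minimal sketch on simulated data with 15 rows, matching the size of the data set. The column names material and fillfactor are hypothetical (only output appears in the example above), and all values and effects are made up.

```r
# Simulated stand-in: 3 materials x 5 cells; NOT the package data.
set.seed(7)
cells <- data.frame(
  material   = rep(c("m1", "m2", "m3"), each = 5),
  fillfactor = runif(15, min = 15, max = 35)
)

# Made-up response: a slope on the covariate plus a shift per material.
shift <- c(m1 = 0, m2 = 2, m3 = -1)
cells$output <- 30 + 0.5 * cells$fillfactor +
  unname(shift[cells$material]) + rnorm(15, sd = 0.5)

# Covariate first, factor second: with sequential sums of squares the
# material effect is then assessed after adjusting for the fill factor.
ancova <- aov(output ~ fillfactor + material, data = cells)
summary(ancova)
```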
A dataset with the energy output resulting from tests on solar cells made of three different raw materials / configurations.
The solar cell type (character)
The test run (numeric)
The yearly output for the test result at temperature of 10°C
The yearly output for the test result at temperature of 20°C
The yearly output for the test result at temperature of 50°C
solarcell_output
A tibble with 12 observations of 5 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
data(solarcell_output)
Extracts stand alone plots from the ss.rr function of the SixSigma package.
ss.rr.plots( var, part, appr, lsl = NA, usl = NA, sigma = 6, data, main = "Six Sigma Gage R&R Study", sub = "", alphaLim = 0.05, errorTerm = "interaction", digits = 4 )
var |
Measured variable |
part |
Factor for parts |
appr |
Factor for appraisers (operators, machines, ...) |
lsl |
Numeric value of lower specification limit used with USL to calculate Study Variation as %Tolerance |
usl |
Numeric value of upper specification limit used with LSL to calculate Study Variation as %Tolerance |
sigma |
Numeric value for number of std deviations to use in calculating Study Variation |
data |
Data frame containing the variables |
main |
Main title for the graphic output |
sub |
Subtitle for the graphic output (recommended the name of the project) |
alphaLim |
Limit to take into account interaction |
errorTerm |
Which term of the model should be used as error term (for the model with interaction) |
digits |
Number of decimal digits for output |
This is a modified version of the function ss.rr
from the SixSigma package that makes it possible to extract the individual plots from
the output report. The input arguments are the same
as in the original function. See the original function help with ?ss.rr for
full documentation.
Generates a list that can be assigned to a user-created variable. The plots can then be accessed with the syntax variable$plot1 through variable$plot6.
For an example application refer to https://j-ramalho.github.io/industRial/
This dataset contains process control measurements of the barrel diameters of pharmaceutical syringes. The sampling rate is hourly and the sample size is 6 syringes.
The sampling hour expressed as Hour1, Hour2 (character)
Syringe diameter of sample 1 (numerical)
Syringe diameter of sample 2 (numerical)
syringe_diameter
A tibble with 25 observations on 7 variables.
Original data set.
For a complete case study application refer to https://j-ramalho.github.io/industRial/.
data(syringe_diameter)
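Data sampled in hourly subgroups is typically summarised by a mean and a range per subgroup before any control chart is drawn. The following is a minimal base R sketch on simulated data with 25 subgroups of 6 measurements. The row and column labels and all values are made up and are not the package data.

```r
# Simulated stand-in: 25 hourly subgroups of 6 diameters; NOT the package data.
set.seed(42)
samples <- matrix(
  rnorm(25 * 6, mean = 5, sd = 0.01),
  nrow = 25,
  dimnames = list(paste0("Hour", 1:25), paste0("Sample", 1:6))
)

# One mean and one range (max minus min) per hourly subgroup.
subgroup_mean  <- rowMeans(samples)
subgroup_range <- apply(samples, 1, function(r) diff(range(r)))

head(data.frame(mean = subgroup_mean, range = subgroup_range))
```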
This data set contains physical measurements of pharmaceutical tablets (pills) including measurement room conditions. The data and the insights it provides are typical of an industrial context with high production throughput and stringent dimensional requirements.
tablet_thickness
An object of class tibble with 675 observations on 11 variables
The data set contains other variables, not used in the textbook, related to the measurement room conditions (not listed).
Position of the part on the measurement device
Size class (L, M, S)
Part number (L001, L002, ...)
Measurement replicate, a sequential number
Measurement day, a sequential number
Measurement date (POSIXct)
Operator name (fictitious)
Tablet thickness (micrometers)
Room temperature
Based on a gage R&R (gage repeatability and reproducibility) study performed in 2020 on a physical measurement of parts coming out of high-throughput industrial equipment.
For a complete case study application refer to https://j-ramalho.github.io/industRial/
data(tablet_thickness)
This data set contains weight measurements of pharmaceutical tablets (pills). The data and the insights it provides are typical of an industrial context with high production throughput and stringent dimensional requirements.
tablet_weight
An object of class tibble with 137 observations on 3 variables
The data set contains other variables, not used in the textbook, related to the measurement room conditions (not listed).
Unique sequential identifier given during production (numeric)
Tablet weight target specification value in [mg] (numeric)
Tablet weight measured value [mg] (numeric)
Anonymized data based on statistical process control data obtained in a high volume production setup.
For a complete case study application refer to https://j-ramalho.github.io/industRial/
hist(tablet_weight$`Weight value`)
This theme aims at an optimal balance between readability and precision. It has been adapted from the package cowplot by Claus O. Wilke and reflects the principles of his book Fundamentals of Data Visualization.
theme_industRial( font_size = 14, font_family = "", line_size = 0.5, rel_small = 12/14, rel_tiny = 11/14, rel_large = 16/14, base_size = font_size, base_family = font_family )
font_size |
defaults to 14 |
font_family |
defaults to "" |
line_size |
defaults to 0.5 |
rel_small |
defaults to 12/14 |
rel_tiny |
defaults to 11/14 |
rel_large |
defaults to 16/14 |
base_size |
internal argument, defaults to font_size |
base_family |
internal argument, defaults to font_family |
Apply this theme by adding it at the end of the code of any ggplot chart.
It combines the half-open theme with a grid background from cowplot.
This function returns an object of classes theme and gg from the ggplot2 package
For a complete case study application refer to https://j-ramalho.github.io/industRial/
library(dplyr)
library(ggplot2)
pet_delivery %>%
  ggplot(aes(x = A)) +
  geom_histogram(color = "grey", fill = "grey90") +
  labs(title = "PET clothing case study",
       subtitle = "Raw data plot",
       x = "Treatment",
       y = "Tensile strength [MPa]") +
  theme_industRial()
This theme provides a look and feel similar to the statistical process control (SPC)
charts of the package qcc, which themselves resemble Minitab charts. It aims at
providing a layout that is familiar to readers of Minitab charts, to ease the
transition to reports and charts built in R.
theme_qcc(base_size = 12, base_family = "")
base_size |
font size, defaults to 12 |
base_family |
font family, defaults to "" |
Apply this theme by adding it at the end of the code of any ggplot chart.
It provides a grey background and some highlights to help reading key
process statistics such as the population mean.
This function returns an object of classes theme and gg from the ggplot2 package
For a complete case study application refer to https://j-ramalho.github.io/industRial/
library(dplyr)
library(ggplot2)
pet_delivery %>%
  ggplot(aes(x = A)) +
  geom_histogram(color = "grey", fill = "grey90") +
  labs(title = "PET clothing case study",
       subtitle = "Raw data plot",
       x = "Treatment",
       y = "Tensile strength [MPa]") +
  theme_qcc()