Package 'industRial'

Title: Data, Functions and Support Materials from the Book "industRial Data Science"
Description: Companion package to the book "industRial data science", J. Ramalho (2021) <https://j-ramalho.github.io/industRial/>. Provides data sets and functions to complete the case studies and contains the book's original Rmd files and tutorials.
Authors: Joao Ramalho [aut, cre]
Maintainer: Joao Ramalho <[email protected]>
License: GPL (>= 3)
Version: 0.1.0
Built: 2025-03-04 04:04:13 UTC
Source: https://github.com/j-ramalho/industrial

Help Index


Charging time of a lithium-ion battery.

Description

A data set with the charging time in hours required to recharge a lithium-ion battery, based on a full factorial design of experiment with four variables (A, B, C, D) coded as +/- 1. Design effects are coded as numerical variables so that models can be built without coding the contrasts and predictions can be made on a continuous range from -1 to +1.

A

Variable A (numerical)

B

Variable B (numerical)

C

Variable C (numerical)

D

Variable D (numerical)

Replicate

The independent repeat of each unique factor combination.

charging_time

Battery charging time [h]

Usage

battery_charging

Format

A tibble with 32 observations on 6 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

data(battery_charging)
head(battery_charging)

# Building a linear model:
battery_lm <- lm(
    formula = charging_time ~ A * B * C, 
    data = battery_charging
)
summary(battery_lm)
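
Because the design variables are stored as numbers, a fitted model can be evaluated at intermediate settings between -1 and +1. The sketch below illustrates the idea on a synthetic stand-in design built with base R; the object names (toy_design, toy_lm, toy_pred) and the simulated response are assumptions made for illustration and are not the package data.

```r
# Synthetic stand-in for a 2^4 full factorial with 2 replicates (32 runs).
# The response is simulated for illustration; it is NOT battery_charging.
set.seed(1)
toy_design <- expand.grid(
  A = c(-1, 1), B = c(-1, 1), C = c(-1, 1), D = c(-1, 1),
  Replicate = 1:2
)
toy_design$charging_time <- 5 + 0.8 * toy_design$A - 0.5 * toy_design$B +
  0.3 * toy_design$A * toy_design$B + rnorm(nrow(toy_design), sd = 0.1)

# Numeric coding: the model is fitted without defining contrasts
toy_lm <- lm(charging_time ~ A * B, data = toy_design)

# Predict at an intermediate setting inside the coded range [-1, +1]
new_point <- data.frame(A = 0.5, B = -0.25)
toy_pred <- predict(toy_lm, newdata = new_point)
toy_pred
```

Had A and B been stored as factors, predict() would only accept the two levels present in the design.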

Create a capability chart for statistical process control

Description

Generate a histogram type chart from a set of consecutive measurements.

Usage

chart_Cpk(data)

Arguments

data

A dataset generated by the function process_stats

Details

This type of chart is typically applied in product manufacturing to monitor deviations from the target value over time. It is usually accompanied by the statistical process control time series charts chart_I and chart_IMR.

Value

This function returns an object of class ggplot

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/


Create I chart for statistical process control

Description

Generate a single point time series chart from a set of consecutive measurements.

Usage

chart_I(data)

Arguments

data

A dataset generated by the function process_stats

Details

This type of chart is typically applied in product manufacturing to monitor deviations from the target value over time. It is usually accompanied by the chart_IMR.

Value

This function returns an object of class ggplot

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/


Create MR chart for statistical process control

Description

Generate a moving range chart from a set of consecutive measurements.

Usage

chart_IMR(data)

Arguments

data

A dataset generated by the function process_stats

Details

This type of chart is typically applied in product manufacturing to monitor deviations from the target value over time. It is usually accompanied by the chart_I.

Value

This function returns an object of class ggplot

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/
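
As a rough illustration of what individuals and moving range charts display, the base R sketch below computes the centre lines and control limits from a short series of consecutive measurements. It uses the standard constants for a moving range of two points and is not claimed to be the implementation behind chart_I or chart_IMR; the measurement values and object names are hypothetical.

```r
# Hypothetical consecutive measurements (illustrative values only)
x <- c(10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7)

# Moving range: absolute difference between consecutive points
mr <- abs(diff(x))
mr_bar <- mean(mr)

# Individuals chart limits: 2.66 = 3 / d2, with d2 = 1.128 for n = 2
i_limits <- c(LCL = mean(x) - 2.66 * mr_bar,
              CL  = mean(x),
              UCL = mean(x) + 2.66 * mr_bar)

# Moving range chart limits: D3 = 0 and D4 = 3.267 for n = 2
mr_limits <- c(LCL = 0, CL = mr_bar, UCL = 3.267 * mr_bar)

i_limits
mr_limits
```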


Collection of visual defects on watch dial production.

Description

This data set contains observations of visual defects present in watch dials, such as indentations and scratches, taken during production. It provides a practical case to establish Pareto charts, typically with a function like paretochart.

Operator

The shop floor operator collecting the data

Date

Data collection date

Defect

Defect type ("Indent", "Scratch")

Location

Position on the watch dial referred to as the hour (1h, 2h)

id

Part unique id number

Usage

dial_control

Format

An object of class tibble with 58 observations on 4 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

head(dial_control)

Cycles to failure of ebike frames after temperature treatment.

Description

A data set with the results of aging tests on several groups of ebike frames (g1, g2, ...). Each entry corresponds to the number of cycles to failure for each level of treatment temperature.

temperature

Treatment temperature level

g1

group 1, remaining groups have names g2 to g5

Usage

ebike_hardening

Format

A tibble with 4 observations on 6 variables.

Details

The ebike_hardening2 dataset contains alternative data that gives non-significant results in the analysis of variance study.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

data(ebike_hardening)

Formula expansion

Description

Takes a linear model formula and returns its expanded version.

Usage

expand_formula(formulae)

Arguments

formulae

An object of class formula, e.g. Y ~ A * B; see ?formula for syntax details

Details

Helps to verify and understand linear model formula syntax, such as the operators * and + and other conventions.

Value

Returns a character vector such as A + B + A:B

References

For an example application refer to https://j-ramalho.github.io/industRial/
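
The expansion itself can be reproduced with base R, which may help when checking a formula without the package loaded. The sketch below relies only on stats::terms(); the object names are illustrative and this is not necessarily how expand_formula is implemented.

```r
# Base R sketch: list the terms that a model formula expands to
f <- Y ~ A * B
term_labels <- attr(terms(f), "term.labels")

# Collapse the term labels into a single string
expanded <- paste(term_labels, collapse = " + ")
expanded
```

For Y ~ A * B this yields the main effects followed by the interaction, A + B + A:B.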


industRial: companion package to the book "industRial data science"

Description

This package contains datasets and toy functions to run the examples from the book "industRial data science". It also contains all the book's original Rmd files and the original learnr Rmd tutorial files.

Author(s)

João Ramalho

References

For complete case studies refer to https://j-ramalho.github.io/industRial/


Dry matter content of different juices obtained with two different measurement devices.

Description

This data set contains laboratory measurements of the dry matter content of different fruit juices obtained with two different measurement devices. One of the devices is considered the reference (REF) and the other one is a new device (DRX) on which a linearity and bias study has to be performed.

product

The juice base fruit ("Apple", "Beetroot")

drymatter_TGT

Target drymatter content in [g]

speed

Production line speed

particle_size

Dry matter powder particle size [micrometers]

part

Part number

drymatter_DRX

Drymatter content measured with device DRX

drymatter_REF

Drymatter content measured with reference device

Usage

juice_drymatter

Format

An object of class tibble with 108 observations on 7 variables.

Source

Adapted from a real gage bias and linearity study performed in 2021 on industrial beverages dry matter content measurement. The structure of the data corresponds to a full factorial design of 5 factors (3 with 3 levels and 2 with 2 levels).

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

library(dplyr)
# Calculate the bias between the new device and the reference:
juice_drymatter <- juice_drymatter %>% dplyr::mutate(bias = drymatter_DRX - drymatter_REF)
# Establish the analysis of variance:
juice_drymatter_aov <- aov(
     bias ~ drymatter_TGT * speed * particle_size,
     data = juice_drymatter)
summary(juice_drymatter_aov)

Calculate percentage of out of specification for Statistical Process Control

Description

This function takes process variables and calculates the probability that parts are produced out of specification in the long run.

Usage

off_spec(UCL, LCL, mean, sd)

Arguments

UCL

the process upper control limit

LCL

the process lower control limit

mean

the process mean

sd

the process standard deviation

Value

This function returns an object of class numeric

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

off_spec(100, 0, 10, 3)
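
For readers who want to see the underlying idea, the sketch below computes the share of a normally distributed process that falls outside two limits using stats::pnorm(). The helper off_spec_sketch is hypothetical: it assumes a normal distribution and a result expressed in percent, which may differ from the scaling or rounding used by off_spec.

```r
# Hypothetical helper, not the package function: percentage of a normal
# process with the given mean and sd that falls outside [LCL, UCL]
off_spec_sketch <- function(UCL, LCL, mean, sd) {
  below <- pnorm(LCL, mean = mean, sd = sd)
  above <- pnorm(UCL, mean = mean, sd = sd, lower.tail = FALSE)
  100 * (below + above)
}

# Same inputs as the example above
pct <- off_spec_sketch(100, 0, 10, 3)
pct
```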

Correlation matrix of the input variables of an experiment design in perfume formulation.

Description

The data set contains the expected correlation (expressed on a scale of 1 to 10) between the anonymized input variables of an experiment. The dataset consists of a double entry table with the same variables in rows and columns. It is coded as a tibble, but subsequent use in network plots requires it to be converted to a matrix format.

Usage

perfume_experiment

Format

A tibble with 22 observations on 23 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

data(perfume_experiment)
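
The conversion to a matrix mentioned in the description can be sketched as follows on a small toy table. The sketch assumes that the first column holds the row labels, which is a guess about the layout of perfume_experiment; the toy values and the names toy_tbl and toy_mat are illustrative only.

```r
# Toy double entry table (illustrative values, not the package data)
toy_tbl <- data.frame(
  input = c("x1", "x2", "x3"),
  x1 = c(10, 3, 1),
  x2 = c(3, 10, 7),
  x3 = c(1, 7, 10)
)

# Drop the label column, convert to a numeric matrix, restore row names
toy_mat <- as.matrix(toy_tbl[, -1])
rownames(toy_mat) <- toy_tbl$input
toy_mat
```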

Tensile strength values on PET raw material for the clothing industry.

Description

Measurements of tensile strength of two different deliveries of PET raw material used in the clothing industry. The two data sets follow approximately a normal distribution.

A

Tensile strength measurements for product A [MPa] (numeric)

B

Tensile strength measurements for product B [MPa] (numeric)

Usage

pet_delivery

Format

An object of class tibble with 28 observations on 2 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

data(pet_delivery)

A factorial design for the improvement of PET film tensile strength.

Description

The data corresponds to a full factorial design with two factors coded as +/- and 3 replicates for each combination.

A

PET formulation A (factor)

B

PET formulation B (factor)

replicate

the measurement replicate I to III (factor)

yield

the output variable measured on the PET, (numerical)

Usage

pet_doe

Format

An object of classes design and data.frame with 12 observations of 4 variables.

Source

Original data set generated with the function fac.design from the package DoE.base.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

data(pet_doe)
contrasts(pet_doe$A)
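
The structure of such a design (two two-level factors with three replicates each) can be sketched in base R with expand.grid(). This is only an illustration of the layout: the object toy_doe is hypothetical, it lacks the design class, and its default contrasts may differ from those produced by DoE.base.

```r
# Base R sketch of a 2 x 2 full factorial with 3 replicates (12 runs)
toy_doe <- expand.grid(
  A = c("-", "+"),
  B = c("-", "+"),
  replicate = c("I", "II", "III")
)

# Each factor combination should appear once per replicate
nrow(toy_doe)
table(toy_doe$A, toy_doe$B)
```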

Calculate process capability index for Statistical Process Control

Description

This function takes process variables and calculates the Cpk index which is a measure of the process centering and variability against specification.

Usage

process_Cpk(UCL, LCL, mean, sd)

Arguments

UCL

the process upper control limit

LCL

the process lower control limit

mean

the process mean

sd

the process standard deviation

Value

This function returns an object of class numeric

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

process_Cpk(100, 0, 10, 3)
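
The textbook definition of Cpk takes the smaller of the distances from the mean to each limit, expressed in units of three standard deviations. The sketch below implements that definition in a hypothetical helper, process_cpk_sketch; it is offered as an aid to interpretation and is not claimed to match the package function in rounding or input checks.

```r
# Hypothetical helper, not the package function: textbook Cpk
process_cpk_sketch <- function(UCL, LCL, mean, sd) {
  cpu <- (UCL - mean) / (3 * sd)  # capability against the upper limit
  cpl <- (mean - LCL) / (3 * sd)  # capability against the lower limit
  min(cpu, cpl)
}

# Same inputs as the example above: the lower limit is the closer one
cpk <- process_cpk_sketch(100, 0, 10, 3)
cpk
```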

Calculate summary statistics for Statistical Process Control

Description

This function takes process variables and calculates summary statistics and presents them in an easily readable table format.

Usage

process_stats(data, part_spec_percent)

Arguments

data

This function takes the dataset tablet_thickness cleaned with the clean_names function from the janitor package

part_spec_percent

the process tolerance in percentage.

Value

This function returns an object with class tibble (tbl_df)

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/


Summary statistics table outputs for Statistical Process Control

Description

This function takes summary statistics and presents them in an easily readable table format.

Usage

process_stats_table(data)

Arguments

data

A data set generated by the function process_stats

Value

This function returns an object with classes gt_tbl and list

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/


Yearly outputs and fill factor of solarcells of different types.

Description

A dataset with the energy output resulting from tests on solarcells made of three different configurations. The fill factor provides an indication of the cell quality and is a non-controlled variable that can be taken into consideration in an analysis of covariance to better assess the output variation from material to material.

material

The solar cell material (character)

output

The yearly energy output (numeric)

fillfactor

The fill factor measured for each cell (numeric)

Usage

solarcell_fill

Format

A tibble with 15 observations of 3 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

hist(solarcell_fill$output)
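
The analysis of covariance mentioned in the description can be sketched as below. To keep the example self-contained it uses simulated data with the same column names as solarcell_fill; the values, effect sizes and the object names toy_cells and toy_ancova are assumptions made for illustration and carry no information about the real data set.

```r
# Simulated stand-in with the same column names (NOT the package data)
set.seed(42)
toy_cells <- data.frame(
  material = factor(rep(c("m1", "m2", "m3"), each = 5)),
  fillfactor = runif(15, min = 0.70, max = 0.85)
)
material_effect <- c(m1 = 0, m2 = 5, m3 = 10)[as.character(toy_cells$material)]
toy_cells$output <- 100 + 200 * toy_cells$fillfactor + material_effect +
  rnorm(15, sd = 2)

# Covariate first, then the factor: the material effect is assessed
# after adjusting for the fill factor
toy_ancova <- aov(output ~ fillfactor + material, data = toy_cells)
summary(toy_ancova)
```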

Yearly outputs of solarcells of different types.

Description

A dataset with the energy output resulting from tests on solarcells made of three different raw materials / configurations.

material

The solar cell type (character)

run

The test run (numeric)

T-10

The yearly output for the test result at temperature of 10°C

T20

The yearly output for the test result at temperature of 20°C

T50

The yearly output for the test result at temperature of 50°C

Usage

solarcell_output

Format

A tibble with 12 observations of 5 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

data(solarcell_output)

Gage R & R plots

Description

Extracts standalone plots from the ss.rr function of the SixSigma package.

Usage

ss.rr.plots(
  var,
  part,
  appr,
  lsl = NA,
  usl = NA,
  sigma = 6,
  data,
  main = "Six Sigma Gage R&R Study",
  sub = "",
  alphaLim = 0.05,
  errorTerm = "interaction",
  digits = 4
)

Arguments

var

Measured variable

part

Factor for parts

appr

Factor for appraisers (operators, machines, ...)

lsl

Numeric value of lower specification limit used with USL to calculate Study Variation as %Tolerance

usl

Numeric value of upper specification limit used with LSL to calculate Study Variation as %Tolerance

sigma

Numeric value for number of std deviations to use in calculating Study Variation

data

Data frame containing the variables

main

Main title for the graphic output

sub

Subtitle for the graphic output (recommended the name of the project)

alphaLim

Limit to take into account interaction

errorTerm

Which term of the model should be used as error term (for the model with interaction)

digits

Number of decimal digits for output

Details

This is a modified version of the function ss.rr from the SixSigma package that makes it possible to extract the individual plots from the output report. The input arguments of the function are the same as those of the original function. See the original function help with ?ss.rr for full documentation.

Value

Generates a list output that can be assigned to a user created variable. The plots can then be accessed with the syntax variable$plot1 to plot6.

References

For an example application refer to https://j-ramalho.github.io/industRial/


Production measurements of the inner diameter of syringe barrels.

Description

This dataset contains process control measurements of the barrel diameters of pharmaceutical syringes. The sampling rate is hourly and the sample size is 6 syringes.

Hour

The sampling hour expressed as Hour1, Hour2 (character)

Sample1

Syringe diameter of sample 1 (numerical)

Sample2

Syringe diameter of sample 2 (numerical)

Usage

syringe_diameter

Format

A tibble with 25 observations on 7 variables.

Source

Original data set.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/.

Examples

data(syringe_diameter)

Thickness measurements of pharmaceutical tablets

Description

This data set contains physical measurements of pharmaceutical tablets (pills) including measurement room conditions. The data and the insights it provides are typical of an industrial context with high production throughput and stringent dimensional requirements.

Usage

tablet_thickness

Format

An object of class tibble with 675 observations on 11 variables

Details

The data set contains other variables not used in the text book, related to the measurement room conditions (not listed).

Position

Position of the part on the measurement device

Size

Size class (L, M, S)

Tablet

Part number (L001, L002, ...)

Replicate

Measurement replicate, a sequential number

Day

Measurement day, a sequential number

Date [DD.MM.YYYY]

Measurement date (POSIXct)

Operator

Operator name (fictitious)

Thickness [micron]

Tablet thickness (micrometers)

Temperature [°C]

Room temperature

Source

Based on a gage R&R (gage repeatability and reproducibility) study performed in 2020 on a physical measurement of parts coming out of high throughput industrial equipment.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

data(tablet_thickness)

Weight measurements of pharmaceutical tablets

Description

This data set contains weight measurements of pharmaceutical tablets (pills). The data and the insights it provides are typical of an industrial context with high production throughput and stringent dimensional requirements.

Usage

tablet_weight

Format

An object of class tibble with 137 observations on 3 variables

Details

The data set contains other variables not used in the text book, related to the measurement room conditions (not listed).

part_id

Unique sequential identifier given during production (numeric)

Weight Target Value

Tablet weight target specification value in [mg] (numeric)

Weight Value

Tablet weight measured value [mg] (numeric)

Source

Anonymized data based on statistical process control data obtained in a high volume production setup.

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

hist(tablet_weight$`Weight value`)

Custom theme "industRial" for the book industRial Data Science plots

Description

This theme aims at an optimal balance between readability and precision. It has been adapted from the package cowplot by Claus O. Wilke and reflects the principles of his book Fundamentals of Data Visualization.

Usage

theme_industRial(
  font_size = 14,
  font_family = "",
  line_size = 0.5,
  rel_small = 12/14,
  rel_tiny = 11/14,
  rel_large = 16/14,
  base_size = font_size,
  base_family = font_family
)

Arguments

font_size

defaults to 14

font_family

defaults to ""

line_size

defaults to 0.5

rel_small

defaults to 12/14

rel_tiny

defaults to 11/14

rel_large

defaults to 16/14

base_size

internal argument, defaults to font_size

base_family

internal argument, defaults to font_family

Details

Apply this theme by adding it at the end of the code of any ggplot chart. It basically combines the half open theme with a grid background from cowplot.

Value

This function returns an object of classes theme and gg from the ggplot2 package

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

library(dplyr)
library(ggplot2)

# A histogram shows the measured variable on x and the count on y
pet_delivery %>%
  ggplot(aes(x = A)) +
  geom_histogram(color = "grey", fill = "grey90", bins = 10) +
  labs(title = "PET clothing case study",
       subtitle = "Raw data plot",
       x = "Tensile strength [MPa]",
       y = "Count") +
  theme_industRial()

Custom theme "qcc" for the book industRial Data Science plots

Description

This theme provides a look and feel similar to the statistical process control (SPC) charts of the package qcc, which themselves resemble Minitab charts. It aims to provide a layout that is familiar to readers of Minitab charts, easing the transition to reports and charts built in R.

Usage

theme_qcc(base_size = 12, base_family = "")

Arguments

base_size

font size, defaults to 12

base_family

font family defaults to ""

Details

Apply this theme by adding it at the end of the code of any ggplot chart. It basically provides a grey background and some highlights that help in reading key process statistics such as the population mean.

Value

This function returns an object of classes theme and gg from the ggplot2 package

References

For a complete case study application refer to https://j-ramalho.github.io/industRial/

Examples

library(dplyr)
library(ggplot2)

# A histogram shows the measured variable on x and the count on y
pet_delivery %>%
  ggplot(aes(x = A)) +
  geom_histogram(color = "grey", fill = "grey90", bins = 10) +
  labs(title = "PET clothing case study",
       subtitle = "Raw data plot",
       x = "Tensile strength [MPa]",
       y = "Count") +
  theme_qcc()