Package 'vvfiller' reference manual

Title:	Fill Data Points
Description:	Provides numerous functions to fill data. These can be applied either to missing or skewed data. The functions are designed within the scope of Student Analytics.
Authors:	Tomer Iwan [aut, cre], Yaïr Jacob [ctb], VU Analytics [cph]
Maintainer:	Tomer Iwan <[email protected]>
License:	MIT + file LICENSE
Version:	0.6.7.9000
Built:	2025-03-11 05:01:25 UTC
Source:	https://github.com/vusaverse/vvfiller

Check if some missing values are present

Description

Checks whether some, but not all, values are missing and returns a boolean. The check saves time by skipping vectors where filling is not needed.

Usage

check_some_missing(x)

Arguments

`x`	the vector to check

Value

TRUE or FALSE
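
The check described here can be approximated in base R. This is a minimal sketch under that reading, not the package's implementation; the name `some_missing_sketch` is hypothetical.

```r
# Hypothetical stand-in for the documented check (assumption, not package code).
some_missing_sketch <- function(x) {
  n_missing <- sum(is.na(x))
  # TRUE only when at least one, but not every, value is missing
  n_missing > 0 && n_missing < length(x)
}

some_missing_sketch(c(1, NA, 3))  # TRUE: there is something to fill
some_missing_sketch(c(1, 2, 3))   # FALSE: nothing is missing
some_missing_sketch(c(NA, NA))    # FALSE: no known value to fill from
```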

Fill column with aggregate by group

Description

Calculate a summary statistic (mean, median, vvconverter::mode, min, max etc.) by group and use it to fill missing values in a column. Primarily for use in fill_df_with_agg_by_group().

Usage

fill_col_with_agg_by_group(df, group, col, statistic)

Arguments

`df`	tibble to use
`group`	string or vector of strings: columns to group by
`col`	string: column to impute
`statistic`	function: summary statistic to use (mean, median, min etc.). Currently this must be a function that accepts an na.rm argument

Value

a filled vector
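
Group-wise filling of a single column can be sketched with base R's `ave()`. The function name `fill_col_sketch`, the example data, and the choice to leave all-NA groups untouched are assumptions made for illustration; the package's own code may differ.

```r
# Hypothetical sketch: fill NAs in one column with a per-group statistic.
fill_col_sketch <- function(df, group, col, statistic) {
  # One grouping factor built from all grouping columns
  g <- interaction(df[group], drop = TRUE)
  ave(df[[col]], g, FUN = function(v) {
    if (all(is.na(v))) return(v)          # assumption: leave empty groups as NA
    v[is.na(v)] <- statistic(v, na.rm = TRUE)
    v
  })
}

students <- data.frame(
  faculty = c("X", "X", "X", "Y", "Y"),
  cohort  = c(1, 1, 2, 1, 1),
  credits = c(30, NA, 60, 20, NA)
)
filled <- fill_col_sketch(students, c("faculty", "cohort"), "credits", median)
filled  # 30 30 60 20 20
```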

Fill with aggregate by group

Description

Function to calculate a summary statistic (mean, median, vvconverter::mode, min, max etc.) by group and use it to fill missing values. Note: this takes and produces a tibble rather than a vector.

Usage

fill_df_with_agg_by_group(
  df,
  group,
  columns,
  overwrite_col = FALSE,
  statistic = mean,
  fill_empty_group = FALSE
)

Arguments

`df`	tibble to use
`group`	string or vector of strings: columns to group by
`columns`	string or vector of strings: columns to impute
`overwrite_col`	boolean: whether to overwrite column. If FALSE, a new column with suffix _imputed will be created
`statistic`	function: summary statistic to use (mean, median, min etc.). Currently this must be a function that accepts an na.rm argument
`fill_empty_group`	boolean: If TRUE, fills groups that only contain NA with summary statistic of entire column

Value

a tibble with filled column(s)
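
The interplay of `overwrite_col` and `fill_empty_group` may be easier to follow in code. The following base R sketch mirrors the documented arguments but is not the package's implementation; `fill_df_sketch` and the example data are hypothetical, and a plain data frame stands in for a tibble.

```r
# Hypothetical sketch of the data-frame-level fill (assumption, not package code).
fill_df_sketch <- function(df, group, columns, overwrite_col = FALSE,
                           statistic = mean, fill_empty_group = FALSE) {
  g <- interaction(df[group], drop = TRUE)
  for (col in columns) {
    x <- df[[col]]
    split(x, g) <- lapply(split(x, g), function(v) {
      if (all(is.na(v))) return(v)        # empty group: handled after the loop body
      v[is.na(v)] <- statistic(v, na.rm = TRUE)
      v
    })
    # Groups that were entirely NA fall back to the whole-column statistic
    if (fill_empty_group) x[is.na(x)] <- statistic(df[[col]], na.rm = TRUE)
    target <- if (overwrite_col) col else paste0(col, "_imputed")
    df[[target]] <- x
  }
  df
}

grades <- data.frame(programme = c("A", "A", "B", "B", "C"),
                     grade     = c(6, NA, 8, NA, NA))
kept   <- fill_df_sketch(grades, "programme", "grade")
backed <- fill_df_sketch(grades, "programme", "grade", fill_empty_group = TRUE)
kept$grade_imputed    # 6 6 8 8 NA
backed$grade_imputed  # 6 6 8 8 7
```

With `overwrite_col = TRUE` the sketch writes the filled values back into `grade` and adds no new column.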

Fill missing

Description

Wrapper function that runs the checks and then calls the fill_vector function matching `type`.

Usage

fill_missing(x, min_known_n = NULL, min_known_p = NULL, type)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values
`type`	the type of fill missing function to be called

Value

filled vector
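
A rough sketch of how such a wrapper could combine the two thresholds with a dispatch on `type` is given below. The accepted `type` strings ("min", "max", "last") and the choice to return `x` unchanged when a threshold is not met are assumptions; the manual does not list the valid values or the fallback behaviour.

```r
# Hypothetical wrapper sketch (assumption, not package code).
fill_missing_sketch <- function(x, min_known_n = NULL, min_known_p = NULL, type) {
  known <- x[!is.na(x)]
  n_known <- length(known)
  # Skip vectors with nothing to fill or nothing to fill from
  if (n_known == 0 || n_known == length(x)) return(x)
  # Assumed behaviour: leave x untouched when too few values are known
  if (!is.null(min_known_n) && n_known < min_known_n) return(x)
  if (!is.null(min_known_p) && n_known / length(x) < min_known_p) return(x)
  fill <- switch(type,
    min  = min(known),
    max  = max(known),
    last = known[n_known],
    stop("unsupported type: ", type)
  )
  x[is.na(x)] <- fill
  x
}

fill_missing_sketch(c(1, 5, NA, 3), type = "max")                   # 1 5 5 3
fill_missing_sketch(c(1, 5, NA, 3), min_known_n = 4, type = "max")  # unchanged
fill_missing_sketch(c(1, NA, NA, NA), min_known_p = 0.5, type = "min")  # unchanged
```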

Fill missing interval

Description

Fill all missing values for an interval observed in the vector

Usage

fill_missing_interval(x, min_known_n = NULL, min_known_p = NULL)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values

Value

a filled vector

Examples

fill_missing_interval(c(NA, 1, 2, NA))
fill_missing_interval(c(NA, 10, 20, NA))
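
The description does not spell out how the interval is derived. One plausible reading, assumed here, is that the known values are equally spaced and the missing positions are extrapolated with that constant step. The sketch below implements only that reading; `fill_interval_sketch` is a hypothetical name and the package may behave differently.

```r
# Hypothetical constant-step fill (assumption, not package code).
fill_interval_sketch <- function(x) {
  known <- which(!is.na(x))
  if (length(known) < 2) return(x)             # no interval can be observed
  steps <- unique(diff(x[known]) / diff(known))
  if (length(steps) != 1) return(x)            # no single constant interval
  anchor <- known[1]
  projected <- x[anchor] + (seq_along(x) - anchor) * steps
  x[is.na(x)] <- projected[is.na(x)]
  x
}

fill_interval_sketch(c(NA, 1, 2, NA))     # 0 1 2 3
fill_interval_sketch(c(NA, 10, 20, NA))   # 0 10 20 30
fill_interval_sketch(c(NA, 1, 5, 7, NA))  # unchanged: steps differ
```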

Fill missing last

Description

Fill all missing values in a vector with the last value if it is known.

Usage

fill_missing_last(x, min_known_n = NULL, min_known_p = NULL)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values

Value

a filled vector

Examples

fill_missing_last(c(1, 2, NA))
fill_missing_last(c(NA, 1, 2, NA))
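
As a rough illustration, filling with the last value might look like the base R sketch below. It assumes that "last value" means the last non-missing element of the vector; `fill_last_sketch` is a hypothetical name, and the output shown is that of the sketch, not verified package output.

```r
# Hypothetical sketch: every NA takes the last known value (assumption).
fill_last_sketch <- function(x) {
  known <- x[!is.na(x)]
  if (length(known) > 0) x[is.na(x)] <- known[length(known)]
  x
}

fill_last_sketch(c(1, 2, NA))      # 1 2 2
fill_last_sketch(c(NA, 1, 2, NA))  # 2 1 2 2
```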

Fill missing maximum

Description

Fill all missing values in a vector with the maximum value if it is known.

Usage

fill_missing_max(x, min_known_n = NULL, min_known_p = NULL)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values

Value

a filled vector

Examples

fill_missing_max(c(1, 2, NA))
fill_missing_max(c(NA, 1, 2, NA))

Fill missing minimum

Description

Fill all missing values in a vector with the minimum value if it is known.

Usage

fill_missing_min(x, min_known_n = NULL, min_known_p = NULL)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values

Value

a filled vector

Examples

fill_missing_min(c(1, 2, NA))
fill_missing_min(c(NA, 1, 2, NA))

Fill missing previous

Description

Fill all missing values in a vector with the previous value if it is known.

Usage

fill_missing_previous(x, min_known_n = NULL, min_known_p = NULL)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values

Value

a filled vector

Examples

fill_missing_previous(c(1, 2, NA))
fill_missing_previous(c(NA, 1, 2, NA))
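
Filling with the previous value is commonly called "last observation carried forward". A minimal base R sketch of that idea follows; `fill_previous_sketch` is a hypothetical name, and leaving a leading NA untouched is an assumption based on the phrase "if it is known".

```r
# Hypothetical carry-forward sketch (assumption, not package code).
fill_previous_sketch <- function(x) {
  for (i in seq_along(x)[-1]) {
    # Copy the value to the left whenever it is known
    if (is.na(x[i]) && !is.na(x[i - 1])) x[i] <- x[i - 1]
  }
  x
}

fill_previous_sketch(c(1, 2, NA))      # 1 2 2
fill_previous_sketch(c(NA, 1, 2, NA))  # NA 1 2 2
fill_previous_sketch(c(1, NA, NA, 4))  # 1 1 1 4
```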

Fill missing rownumber

Description

Impute missing values of a count variable. Imputation is done by counting up from the last known value. Example: c(NA,4,NA,NA) then becomes c(NA,4,5,6).

Usage

fill_missing_rownumber(x)

Arguments

`x`	Integer vector.

Value

Integer vector with filled values.

Examples

fill_missing_rownumber(c(NA,4,NA,NA))
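
Under the reading that "counting from the last known value" means each missing entry becomes the preceding value plus one, the imputation can be sketched as follows. `fill_rownumber_sketch` is a hypothetical name and the result shown is the sketch's own output, not verified package output.

```r
# Hypothetical count-forward sketch (assumption, not package code).
fill_rownumber_sketch <- function(x) {
  for (i in seq_along(x)[-1]) {
    # Continue the count from the value to the left when it is known
    if (is.na(x[i]) && !is.na(x[i - 1])) x[i] <- x[i - 1] + 1L
  }
  x
}

fill_rownumber_sketch(c(NA, 4L, NA, NA))  # NA 4 5 6
```

A leading NA stays missing in this sketch because there is no earlier value to count from.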

Fill missing strict

Description

Fill all missing values in a vector with the same value if it is known. Only fills the value when all known values are the same

Usage

fill_missing_strict(x, min_known_n = NULL, min_known_p = NULL)

Arguments

`x`	The vector to fill
`min_known_n`	numeric value: the minimum number of not-missing values
`min_known_p`	numeric value between 0 and 1: the minimum fraction of not-missing values

Value

a filled vector

Examples

fill_missing_strict(c(NA, 1))
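
The strict rule, fill only when all known values agree, can be sketched in a few lines of base R. `fill_strict_sketch` is a hypothetical name used for illustration only.

```r
# Hypothetical strict-fill sketch (assumption, not package code).
fill_strict_sketch <- function(x) {
  known <- unique(x[!is.na(x)])
  # Fill only when exactly one distinct known value exists
  if (length(known) == 1) x[is.na(x)] <- known
  x
}

fill_strict_sketch(c(NA, 1))     # 1 1
fill_strict_sketch(c(NA, 1, 1))  # 1 1 1
fill_strict_sketch(c(NA, 1, 2))  # unchanged: known values disagree
```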

Fill missing value

Description

Returns a vector with all missing values filled with another value

Usage

fill_value(x, value)

Arguments

`x`	the vector whose missing values are filled
`value`	a value with the same class as x

Value

a vector with the same length as x

Examples

fill_value(c(NA,1), 2)
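
A base R sketch of constant-value filling is shown below. The explicit class check is an assumption that follows the argument description ("a value with the same class as x"); the package may enforce this differently or not at all. `fill_value_sketch` is a hypothetical name.

```r
# Hypothetical constant-fill sketch (assumption, not package code).
fill_value_sketch <- function(x, value) {
  # Assumed guard: a single replacement value of the same class as x
  stopifnot(length(value) == 1, identical(class(x), class(value)))
  x[is.na(x)] <- value
  x
}

fill_value_sketch(c(NA, 1), 2)          # 2 1
fill_value_sketch(c("a", NA), "none")   # "a" "none"
```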

fill_vector_interval

Description

fill_vector_interval

Usage

fill_vector_interval(x)

Arguments

`x`	the vector to be filled

fill_vector_last

Description

fill_vector_last

Usage

fill_vector_last(x, x_na_omit)

Arguments

`x`	the vector to be filled
`x_na_omit`	the x vector without NA values

fill_vector_max

Description

fill_vector_max

Usage

fill_vector_max(x, x_na_omit)

Arguments

`x`	the vector to be filled
`x_na_omit`	the x vector without NA values

fill_vector_min

Description

fill_vector_min

Usage

fill_vector_min(x, x_na_omit)

Arguments

`x`	the vector to be filled
`x_na_omit`	the x vector without NA values
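
The two-argument signature suggests that the caller drops the missing values once and passes the result in, so the helper does not have to recompute it. The sketch below illustrates that calling pattern for the maximum and minimum helpers; the function names are hypothetical and the bodies are assumptions, not the package's code.

```r
# Hypothetical helper sketches taking a precomputed NA-free copy of x.
fill_vector_max_sketch <- function(x, x_na_omit) {
  x[is.na(x)] <- max(x_na_omit)
  x
}
fill_vector_min_sketch <- function(x, x_na_omit) {
  x[is.na(x)] <- min(x_na_omit)
  x
}

x <- c(NA, 1, 2, NA)
x_known <- x[!is.na(x)]              # computed once by the caller
fill_vector_max_sketch(x, x_known)   # 2 1 2 2
fill_vector_min_sketch(x, x_known)   # 1 1 2 1
```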

fill_vector_previous

Description

fill_vector_previous

Usage

fill_vector_previous(x)

Arguments

`x`	the vector to be filled

fill_vector_strict

Description

fill_vector_strict

Usage

fill_vector_strict(x, x_na_omit)

Arguments

`x`	the vector to be filled
`x_na_omit`	the x vector without NA values

NA impute median

Description

A specialized function that takes a variable and turns it into two new variables for use in a prediction model: (1) the variable, with missing values imputed by the median for the given year, and (2) an indicator of whether the variable was missing.

Usage

na_impute_median(data, var, year = 2014, year_column)

Arguments

`data`	The data frame.
`var`	The variable used to create new variables.
`year`	Year used for the median for imputation.
`year_column`	Column with year to use median on.

Value

New data frame in which missing values are filled.
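
The two derived variables can be sketched in base R as below. The sketch assumes that the median is taken over the rows where `year_column` equals `year` and is then applied to every missing value; the output column suffixes `_imputed` and `_missing`, the function name, and the example data are all hypothetical.

```r
# Hypothetical sketch of median imputation plus a missingness indicator.
impute_median_sketch <- function(data, var, year = 2014, year_column) {
  in_year <- data[[year_column]] == year
  # Median of the known values observed in the reference year
  med <- median(data[[var]][in_year], na.rm = TRUE)
  was_missing <- is.na(data[[var]])
  data[[paste0(var, "_missing")]] <- was_missing
  data[[paste0(var, "_imputed")]] <- ifelse(was_missing, med, data[[var]])
  data
}

results <- data.frame(year  = c(2014, 2014, 2014, 2015),
                      grade = c(6, 8, NA, NA))
out <- impute_median_sketch(results, "grade", year = 2014, year_column = "year")
out$grade_imputed  # 6 8 7 7
out$grade_missing  # FALSE FALSE TRUE TRUE
```

Keeping the indicator alongside the imputed variable lets a prediction model distinguish observed values from filled ones.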
