Title: | Apply Transformations to Data |
---|---|
Description: | Provides a set of functions for data transformations. Transformations are performed on character and numeric data. As the scope of the package is within Student Analytics, there are functions focused around the academic year. |
Authors: | Tomer Iwan [aut, cre, cph] |
Maintainer: | Tomer Iwan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.5.10 |
Built: | 2025-03-10 04:04:12 UTC |
Source: | https://github.com/vusaverse/vvconverter |
Translates a date to the academic year in which it falls. By default, the academic year is taken to start on 1 September.
academic_year(x, start_1_oct = FALSE)
x |
A date, or vector with multiple dates. POSIXct is also accepted. |
start_1_oct |
Logical. If TRUE, the academic year starts on 1 October. Default FALSE: the academic year starts on 1 September. |
The academic year in which the specified date falls
Other vector calculations: clean_multiple_underscores(), interval_round(), month_name(), sum_0_1(), transform_01_to_ft()
academic_year(lubridate::today())
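The rule described for academic_year() can be sketched in base R. The helper name academic_year_sketch is hypothetical, and labelling an academic year by the calendar year in which it starts is an assumption; the package's actual return format may differ.

```r
# Hypothetical helper, not the package's implementation.
# Assumption: an academic year is labelled by the calendar year in which it starts.
academic_year_sketch <- function(x, start_1_oct = FALSE) {
  x <- as.Date(x)
  start_month <- if (start_1_oct) 10L else 9L
  year  <- as.integer(format(x, "%Y"))
  month <- as.integer(format(x, "%m"))
  # Dates before the start month belong to the year that began in the previous calendar year
  ifelse(month >= start_month, year, year - 1L)
}

academic_year_sketch(as.Date(c("2024-08-31", "2024-09-01")))  # 2023 2024
```

With start_1_oct = TRUE, a date such as 15 September 2024 would still fall in the year that started in 2023 under this sketch.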
Replaces runs of multiple underscores with a single underscore in a vector or string.
clean_multiple_underscores(x)
x |
The vector or string to be cleaned. |
cleaned vector or string.
Other vector calculations: academic_year(), interval_round(), month_name(), sum_0_1(), transform_01_to_ft()
clean_multiple_underscores("hello___world")
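The same effect can be reproduced with a base R regular expression. This is a minimal sketch of the idea, not necessarily how the package implements it; the variable name cleaned is only for illustration.

```r
# Collapse any run of two or more underscores into a single underscore
cleaned <- gsub("_{2,}", "_", c("hello___world", "a_b__c"))
cleaned  # "hello_world" "a_b_c"
```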
Convert character vector to numeric, ignoring irrelevant characters.
destring(x, keep = "0-9.-")
x |
A vector to be operated on |
keep |
Characters to keep, in bracket regular expression form. Typically includes 0-9 as well as the decimal separator (. in the US and , in Europe). |
vector of type numeric
destring("24k")
destring("5,5")
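How the keep argument could act as a bracket expression is sketched below. The helper destring_sketch is hypothetical and only assumes that every character outside the keep set is removed before conversion; the package's own code may differ.

```r
# Hypothetical helper: drop every character not listed in `keep`, then convert
destring_sketch <- function(x, keep = "0-9.-") {
  pattern <- paste0("[^", keep, "]")
  as.numeric(gsub(pattern, "", x))
}

destring_sketch("24k")  # 24
destring_sketch("5,5")  # 55, because the comma is not in the keep set
# For a European decimal comma, convert it to a dot first
destring_sketch(sub(",", ".", "5,5", fixed = TRUE))  # 5.5
```

Under this assumption, a decimal comma is silently dropped with the default keep value, so it is worth checking the result when the input uses European number formatting.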
Calculates the mean (or another function) per group to analyze how each segment behaves. Each variable's group means are scaled into the 0 to 1 range, which makes it easy to profile the groups by their means. The function also adds a row with the summary of each column regardless of the group_var categories, so that each segment can be compared with the whole population. This is also useful for profiling cluster results in terms of their means. All factor and character variables are excluded.
group_summary(data, group_var, group_func = mean)
data |
Input data source. |
group_var |
Variable to make the group by. |
group_func |
Function to be used in the group by. Default is mean. |
Grouped data frame.
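The idea of per-group means scaled to the 0 to 1 range can be illustrated with base R on the built-in mtcars data. This is a sketch only: min-max scaling is an assumption about what "scaled into the 0 to 1 range" means, the helper rescale01 is hypothetical, and the shape of the package's output is not reproduced here.

```r
# Per-group means for two numeric columns, grouped by number of cylinders
means <- aggregate(mtcars[, c("mpg", "hp")],
                   by = list(cyl = mtcars$cyl), FUN = mean)

# Hypothetical helper: min-max scale a vector of group means into [0, 1]
rescale01 <- function(v) (v - min(v)) / (max(v) - min(v))

scaled <- means
scaled[, c("mpg", "hp")] <- lapply(means[, c("mpg", "hp")], rescale01)

# Column means over the whole data set, for comparison with each group
overall <- colMeans(mtcars[, c("mpg", "hp")])

scaled
```

After scaling, the group with the highest mean of a variable has the value 1 and the group with the lowest mean has the value 0.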
Similar to group_summary(), but computes the rank of each value, so it is quick to see which segment has the highest value for each variable. Rank 1 represents the highest value. All factor and character variables are excluded.
group_summary_rank(data, group_var, group_func = mean)
data |
Input data source. |
group_var |
Variable to make the group by. |
group_func |
Function to be used in the group by. Default is mean. |
Grouped data frame, showing the rank instead of the absolute values.
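Ranking group means so that 1 marks the highest value can be sketched with base R's rank() on the built-in mtcars data. This is an illustration of the ranking idea only; the variable names are hypothetical and the package's output format may differ.

```r
# Per-group means for two numeric columns, grouped by number of cylinders
means <- aggregate(mtcars[, c("mpg", "hp")],
                   by = list(cyl = mtcars$cyl), FUN = mean)

# Rank the negated means so that the largest mean receives rank 1
ranks <- means
ranks[, c("mpg", "hp")] <- lapply(means[, c("mpg", "hp")],
                                  function(v) rank(-v))
ranks
```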
Function to round numeric values in a vector to values from an interval sequence.
interval_round(x, interval)
x |
The numeric vector to adjust |
interval |
The interval sequence |
The vector corrected for the given interval
Other vector calculations: academic_year(), clean_multiple_underscores(), month_name(), sum_0_1(), transform_01_to_ft()
interval_round(c(5, 4, 2, 6), interval = seq(1:4))
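One possible reading of "rounding to values from an interval sequence" is snapping each value to the nearest element of the sequence. The sketch below implements that reading; both the helper name nearest_in_interval and the nearest-value rule are assumptions, and the package may resolve values differently.

```r
# Hypothetical helper. Assumption: each value is mapped to the closest
# element of `interval`; on ties, the first closest element is used.
nearest_in_interval <- function(x, interval) {
  interval <- as.numeric(interval)
  vapply(x, function(v) interval[which.min(abs(interval - v))], numeric(1))
}

nearest_in_interval(c(5, 4, 2, 6), interval = 1:4)  # 4 4 2 4
```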
Trim leading whitespace from a string.
ltrim(x)
x |
A text string. |
Cleaned string.
ltrim(" hello")
Calculate the median of the top ten percent of the values.
median_top_10(x, na.rm = FALSE)
x |
A numerical vector |
na.rm |
Logical. If TRUE, NAs are removed before the calculation. Default FALSE, as in the usage above. |
A numerical value
median_top_10(mtcars$cyl)
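A base R sketch of the calculation follows. The helper median_top_10_sketch is hypothetical, and defining the top ten percent as all values at or above the 90th percentile is an assumption; the package may select the top values differently.

```r
# Hypothetical helper. Assumption: "top ten percent" means all values
# at or above the 90th percentile.
median_top_10_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) {
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    return(NA_real_)
  }
  cutoff <- stats::quantile(x, probs = 0.9, names = FALSE)
  stats::median(x[x >= cutoff])
}

median_top_10_sketch(1:100)  # 95.5
```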
Determine the most common value in a vector. If two values have the same frequency, the first occurring value is used.
mode(x, na.rm = FALSE)
x |
a vector |
na.rm |
If TRUE, NAs are removed before the calculation. |
the most common value in the vector x
mode(c(0, 3, 5, 7, 5, 3, 2))
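The tie-breaking rule (the first occurring value wins) can be sketched in base R as follows. The helper mode_sketch is hypothetical and is not taken from the package source.

```r
# Hypothetical helper: most frequent value, first occurrence wins on ties
mode_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  values <- unique(x)                      # keeps order of first appearance
  counts <- tabulate(match(x, values))     # frequency of each unique value
  values[which.max(counts)]                # which.max returns the first maximum
}

mode_sketch(c(0, 3, 5, 7, 5, 3, 2))  # 3
```

Base R already has a function called mode() that returns an object's storage mode, so calling the package version with an explicit namespace prefix avoids ambiguity.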
Transform month from numeric to equivalent in specified language.
month_name(month_numeric, lang = "nl")
month_numeric |
Numeric in range 1 - 12. |
lang |
The language of the month names. Default is "nl" (Dutch). |
Character string representation of month in specified language.
Other vector calculations: academic_year(), clean_multiple_underscores(), interval_round(), sum_0_1(), transform_01_to_ft()
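The translation performed by month_name() amounts to a lookup from month number to name. The sketch below shows this for Dutch; the names month_names_nl and month_name_sketch are hypothetical, and the capitalisation of the package's output is not confirmed.

```r
# Hypothetical lookup table with the Dutch month names
month_names_nl <- c("januari", "februari", "maart", "april", "mei", "juni",
                    "juli", "augustus", "september", "oktober", "november",
                    "december")

# Hypothetical helper: index the table by month number (1 to 12)
month_name_sketch <- function(month_numeric) month_names_nl[month_numeric]

month_name_sketch(c(1, 3, 12))  # "januari" "maart" "december"
```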
Trim trailing whitespace from a string.
rtrim(x)
x |
A text string. |
Cleaned string.
rtrim("hello ")
Replace all occurrences of a pattern in a file.
str_replace_all_in_file( file, pattern, replacement = "[...]", only_comments = TRUE, collapse = FALSE )
file |
character, path of file to be modified |
pattern |
character, pattern to be replaced |
replacement |
character, replacement text |
only_comments |
logical, should the replacement only be done in comments |
collapse |
logical, should the lines be collapsed into a single line before replacement |
NULL, the file is modified in place
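An in-place replacement restricted to comments can be sketched with base R on a temporary file. This is only an illustration: treating a comment as a line whose first non-blank character is # is an assumption, and the package may detect comments differently.

```r
# Create a temporary R script to work on
path <- tempfile(fileext = ".R")
writeLines(c("# old note", "x <- 'old'"), path)

lines <- readLines(path)
# Assumption: a comment is a line whose first non-blank character is '#'
is_comment <- grepl("^\\s*#", lines)
lines[is_comment] <- gsub("old", "new", lines[is_comment], fixed = TRUE)

# Write the modified lines back to the same file
writeLines(lines, path)
result <- readLines(path)
result  # "# new note" "x <- 'old'"
```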
Works like sum(), with one exception: if the sum is greater than 1, the function returns 1.
sum_0_1(x)
x |
a vector with numeric values |
0 or 1, depending on whether the sum is greater than 0.
Other vector calculations: academic_year(), clean_multiple_underscores(), interval_round(), month_name(), transform_01_to_ft()
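For a vector of 0/1 values, the capped sum behaves like an "any" check. A minimal sketch, with the hypothetical helper name sum_0_1_sketch and the assumption that the input contains only 0s and 1s:

```r
# Hypothetical helper: sum the values, but never return more than 1
sum_0_1_sketch <- function(x) min(sum(x), 1)

sum_0_1_sketch(c(0, 1, 1))  # 1
sum_0_1_sketch(c(0, 0, 0))  # 0
```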
Tests whether a vector is effectively a boolean that is encoded as a 0/1 variable. For numeric vectors, the function checks whether the only occurring values are 0, 1, or NA. For character and factor vectors, it checks whether the only occurring values are "0", "1", or NA. If the vector is a 0/1 variable, TRUE is returned; in all other cases, FALSE.
test_01(x)
x |
The vector to test |
A TRUE/FALSE value on the test
Other booleans: transform_01_to_ft()
vector <- c(0, 1, 0, 1, 1, 1, 0)
test_01(vector)
Tests whether each response in a vector is a yes or a no.
test_yes_no(responses)
responses |
A vector of responses. |
A logical vector indicating if each response is yes or no.
If the vector is a 0/1 vector, it is converted to a logical TRUE/FALSE vector. The transformation is performed only if the vector contains no values other than 0, 1, or NA; otherwise the original vector is returned. The transformation works on numeric, character, and factor vectors.
transform_01_to_ft(x)
x |
the vector to be tested and transformed. |
The transformed vector if a transformation is possible. If no transformation is possible, the original vector is returned.
Other vector calculations: academic_year(), clean_multiple_underscores(), interval_round(), month_name(), sum_0_1()
Other booleans: test_01()
vector <- c(0, 1, 0, 1, 1, 1, 0)
transform_01_to_ft(vector)
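The test and the conversion described for test_01() and transform_01_to_ft() can be sketched together in base R. The helpers is_01 and to_logical are hypothetical names, and comparing values through as.character() is an assumption made to treat numeric, character, and factor vectors uniformly.

```r
# Hypothetical helper: TRUE if the only values are 0, 1, or NA
is_01 <- function(x) all(as.character(x) %in% c("0", "1", NA))

# Hypothetical helper: convert a 0/1 vector to logical, else return it unchanged
to_logical <- function(x) {
  if (is_01(x)) as.character(x) == "1" else x
}

to_logical(c(0, 1, NA))          # FALSE TRUE NA
to_logical(factor(c("1", "0")))  # TRUE FALSE
to_logical(c(0, 2))              # 0 2 (unchanged)
```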
This function transforms a logical vector to a vector of yes/no strings or vice versa.
transform_logical_yes_no(x, lang = "nl")
x |
A logical or character vector. |
lang |
The language of the yes/no strings. Default is "nl" (Dutch). |
A vector of yes/no strings or a logical vector.
This function translates yes/no responses from a given language to English.
translate_yes_no(responses, source_language = "nl")
responses |
A vector of responses. |
source_language |
The language of the responses. Default is "nl" (Dutch). |
A vector of translated responses.
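Both yes/no transformations come down to a small lookup. The sketch below assumes the Dutch words "ja" and "nee" and case-insensitive matching; the table name nl_to_en is hypothetical, and how the package treats unrecognised responses is not known, so they simply become NA here.

```r
# Hypothetical lookup table from Dutch to English
nl_to_en <- c(ja = "yes", nee = "no")

responses <- c("Ja", "nee", "misschien")
# Match case-insensitively; unknown responses become NA in this sketch
translated <- unname(nl_to_en[tolower(responses)])
translated  # "yes" "no" NA

# Logical to Dutch yes/no strings, keeping NA as NA
as_yes_no <- ifelse(c(TRUE, FALSE, NA), "ja", "nee")
as_yes_no  # "ja" "nee" NA
```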
Trim both leading and trailing whitespace from a string.
trim(x)
x |
A text string. |
Cleaned string.
trim(" hello ")
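For comparison, base R's trimws() covers the same three cases as ltrim(), rtrim(), and trim() through its which argument. The snippet below uses only base R and makes no claim about how the package functions are implemented.

```r
x <- "  hello  "

left  <- trimws(x, which = "left")   # "hello  "
right <- trimws(x, which = "right")  # "  hello"
both  <- trimws(x, which = "both")   # "hello"
```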