2 Foundational Skills
2.1 Introduction
This section covers the Business Science foundational skills. It spans the entire Business Science process: data importing, cleaning, wrangling, exploratory data analysis (EDA), feature engineering, splitting, model building and evaluation, and the reporting and communication of results.
2.1.1 Business Science Workflow in R
2.2 Data Cleaning
This involves:
removing duplicates,
checking missing data and performing imputations, if necessary,
verifying that data types match the data dictionary,
dropping irrelevant columns.
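The four steps above can be sketched in base R. This is a minimal illustration, not a prescribed method: the `raw` data frame, its column names, and the `dictionary` vector are hypothetical, and median imputation is only one of several reasonable choices.

```r
# Hypothetical raw data: one duplicated row, one missing age,
# `sex` stored as character, and an irrelevant `notes` column.
raw <- data.frame(
  id    = c(1, 2, 2, 3, 4),
  age   = c(34, 28, 28, NA, 45),
  sex   = c("F", "M", "M", "F", "M"),
  notes = c("a", "b", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# 1. Remove duplicate rows.
clean <- raw[!duplicated(raw), ]

# 2. Count missing values per column, then impute `age` with its median
#    (an assumption; choose the imputation that suits your data).
n_missing <- colSums(is.na(clean))
clean$age[is.na(clean$age)] <- median(clean$age, na.rm = TRUE)

# 3. Compare actual column classes with a (hypothetical) data dictionary
#    and convert the columns that do not match.
dictionary <- c(id = "numeric", age = "numeric", sex = "factor")
actual <- vapply(clean[names(dictionary)], function(x) class(x)[1], character(1))
mismatched <- names(dictionary)[actual != dictionary]
clean$sex <- factor(clean$sex)

# 4. Drop the irrelevant column.
clean$notes <- NULL

str(clean)
```

Recording `n_missing` and `mismatched` before changing anything leaves a record of what was changed, which helps when the cleaning steps are reported later.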
2.2.1 Libraries
The skimr package scans your data and gives a skeletal view of it, together with the most important descriptive summaries of the variables in the data set.
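The call that produced the summary is not shown. The sketch below assumes the data frame named `data` is R's built-in `Titanic` table converted with `as.data.frame()`, which is an inference from the reported dimensions (32 rows, 5 columns) and column names rather than something the text states. The `skim()` call is guarded so the snippet still runs when skimr is not installed.

```r
# Assumption: `data` is the built-in Titanic contingency table in long form.
data <- as.data.frame(Titanic)

# Quick checks in base R: dimensions and column classes.
dim(data)
sapply(data, class)

# Full skimr summary, if the package is available.
if (requireNamespace("skimr", quietly = TRUE)) {
  print(skimr::skim(data))
}
```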
Item | Value |
---|---|
Name | data |
Number of rows | 32 |
Number of columns | 5 |
Column type frequency: | |
factor | 4 |
numeric | 1 |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
Class | 0 | 1 | FALSE | 4 | 1st: 8, 2nd: 8, 3rd: 8, Cre: 8 |
Sex | 0 | 1 | FALSE | 2 | Mal: 16, Fem: 16 |
Age | 0 | 1 | FALSE | 2 | Chi: 16, Adu: 16 |
Survived | 0 | 1 | FALSE | 2 | No: 16, Yes: 16 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
Freq | 0 | 1 | 68.78 | 136 | 0 | 0.75 | 13.5 | 77 | 670 |
2.3 Data Wrangling
2.4 Visualization
2.5 Exploratory Data Analysis
2.6 Machine Learning
Clustering
Reporting
Programming