2  Foundational Skills

2.1 Introduction

This section covers the foundational skills of business science. It spans the entire business science process: data importing, cleaning, wrangling, exploratory data analysis (EDA), feature engineering, data splitting, model building and evaluation, and the reporting and communication of results.

2.1.1 Business Science Workflow in R

flowchart LR
  A(IMPORT <br> readr, readxl <br> tidyquant, rvest)
  B(TIDY <br> tidyr, tidytext <br> tibble)
  C(VISUALIZE <br> ggplot2, plotly)
  D(TRANSFORM <br> lubridate, forcats <br> dplyr, stringr)
  E(MODEL <br> tidymodels)
  F(COMMUNICATE <br> Rmarkdown, Shiny)
  A --> B
  B --> C
  C --> D
  D --> E
  E --> C
  E --> F
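
The stages in the flowchart above can be sketched in a few lines of R. This is a minimal illustration under stated assumptions, not the workflow's reference implementation: the built-in mtcars data set stands in for an imported file, base R lm() stands in for a tidymodels specification, and all object names (cars_raw, cars_tidy, cars_tf, p, fit) are hypothetical.

```r
library(tibble)
library(dplyr)
library(ggplot2)

# IMPORT: a built-in data set stands in for a file read with readr or readxl
cars_raw <- mtcars

# TIDY: move the row names into a regular column and convert to a tibble
cars_tidy <- cars_raw %>%
  rownames_to_column(var = "model") %>%
  as_tibble()

# VISUALIZE: a first look at weight against fuel economy
p <- ggplot(cars_tidy, aes(x = wt, y = mpg)) +
  geom_point()

# TRANSFORM: recode the transmission flag as a labelled factor
cars_tf <- cars_tidy %>%
  mutate(
    transmission = factor(am, levels = c(0, 1),
                          labels = c("automatic", "manual"))
  )

# MODEL: base R lm() used here in place of tidymodels (assumption)
fit <- lm(mpg ~ wt + transmission, data = cars_tf)
summary(fit)
```

Printing p draws the scatter plot; the COMMUNICATE stage would then place the plot and the model summary in a report or an app.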

journey
    title Business Science Workflow
    section Prepare Data
      Sourcing: 5: Business Problem
      Cleaning: 5: Business Problem
      Recasting: 5: Business Problem
    section Experimentation
      Feature engineering: 2: Business Value
      Model building and evaluation: 2: Business Value
    section Distribution
      Reporting:
      Distribution:

2.2 Data Cleaning

Data cleaning involves:

  • removing duplicates,

  • checking for missing data and imputing values where necessary,

  • verifying that data types match the data dictionary,

  • dropping irrelevant columns.
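
The steps above can be sketched with dplyr and tidyr. This is a minimal example under stated assumptions: the raw table, its column names, the expected_types vector standing in for a data dictionary, and the choice of median imputation are all hypothetical and only illustrate the pattern.

```r
library(dplyr)
library(tidyr)

# Hypothetical raw data: one duplicated row, a missing amount,
# an id stored as character, and an irrelevant notes column
raw <- tibble(
  id     = c("1", "2", "2", "3"),
  amount = c(10.5, NA, NA, 7.0),
  notes  = c("a", "b", "b", "c")
)

# Missing values per column before cleaning
colSums(is.na(raw))

cleaned <- raw %>%
  distinct() %>%                       # remove exact duplicate rows
  mutate(
    id     = as.integer(id),           # cast to the type the dictionary expects
    amount = replace_na(amount, median(amount, na.rm = TRUE))  # median imputation
  ) %>%
  select(-notes)                       # drop an irrelevant column

# Verify column types against a (hypothetical) data dictionary
expected_types <- c(id = "integer", amount = "numeric")
actual_types   <- vapply(cleaned, function(x) class(x)[1], character(1))
types_ok       <- all(actual_types == expected_types)
```

Median imputation is only one option; whether to impute at all, and how, depends on why the values are missing.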

2.2.1 Libraries

The skimr package scans a data set and returns a skeletal overview together with the most important descriptive statistics for each variable.
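
A summary like the one that follows can be produced with a single call to skim(). The sketch below assumes the data are the built-in Titanic contingency table converted to a data frame; this is inferred from the column names and the row count in the summary, since the original code is not shown.

```r
library(skimr)

# Assumption: the built-in Titanic table, flattened to a data frame
# with four factor columns and one numeric count column (Freq)
data <- as.data.frame(Titanic)

# Print the overview and the per-variable summaries
skim(data)
```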

Data summary

Name                     data
Number of rows           32
Number of columns        5
Column type frequency:
  factor                 4
  numeric                1
Group variables          None

Variable type: factor

skim_variable  n_missing  complete_rate  ordered  n_unique  top_counts
Class          0          1              FALSE    4         1st: 8, 2nd: 8, 3rd: 8, Cre: 8
Sex            0          1              FALSE    2         Mal: 16, Fem: 16
Age            0          1              FALSE    2         Chi: 16, Adu: 16
Survived       0          1              FALSE    2         No: 16, Yes: 16

Variable type: numeric

skim_variable  n_missing  complete_rate  mean   sd   p0  p25   p50   p75  p100
Freq           0          1              68.78  136  0   0.75  13.5  77   670

2.3 Data Wrangling

2.4 Visualization

2.5 Exploratory Data Analysis

2.6 Machine Learning

  • Clustering

  • Reporting

  • Programming