Introduction
This section covers the foundational skills of Business Science. It spans the entire Business Science process: importing data, cleaning, wrangling, exploratory data analysis (EDA), feature engineering, splitting, model building and evaluation, and reporting and communicating results.
Business Science Workflow in R
flowchart LR
A(IMPORT <br> readr, readxl <br> tidyquant, rvest) --> B(TIDY <br> tidyr, tidytext<br> tibble)
B(TIDY <br> tidyr, tidytext <br> tibble) --> C(VISUALIZE <br> ggplot2, plotly)
C(VISUALIZE <br> ggplot2, plotly) --> D(TRANSFORM <br> lubridate, forcats <br> dplyr, stringr)
D(TRANSFORM <br> lubridate, forcats <br> dplyr, stringr) --> E(MODEL <br> tidymodels)
E(MODEL <br> tidymodels) --> C(VISUALIZE <br> ggplot2, plotly)
E(MODEL <br> tidymodels) --> F(COMMUNICATE <br> Rmarkdown, Shiny)
journey
title Business Science Workflow
section Prepare Data
Sourcing: 5: Business Problem
Cleaning: 5: Business Problem
Recasting: 5: Business Problem
section Experimentation
Go downstairs: 2: Business Value
Sit down: 2: Business Value
section Distribution
Reporting:
Distribution:
Data Cleaning
Data cleaning involves:
removing duplicates,
checking for missing data and imputing values where necessary,
verifying that data types match the data dictionary,
dropping irrelevant columns.
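The cleaning steps listed above can be sketched with dplyr. This is a minimal illustration, not the author's own code: the `raw` data frame, its column names, and the choice of median imputation are all assumptions made for the example.

```r
library(dplyr)

# Hypothetical raw data (assumption): one duplicated row, one missing value,
# a numeric column stored as text, and a column we do not need.
raw <- data.frame(
  id     = c(1, 2, 2, 3),
  amount = c("10.5", "20", "20", NA),
  region = c("north", "south", "south", "north"),
  notes  = c("a", "b", "b", "c"),
  stringsAsFactors = FALSE
)

# Count missing values per column before cleaning
colSums(is.na(raw))

cleaned <- raw %>%
  distinct() %>%                              # remove exact duplicate rows
  mutate(amount = as.numeric(amount)) %>%     # cast to the type the data dictionary expects
  mutate(amount = if_else(is.na(amount),      # impute missing values with the median
                          median(amount, na.rm = TRUE),
                          amount)) %>%
  select(-notes)                              # drop an irrelevant column

str(cleaned)
```

Median imputation is only one option. Whether to impute at all, and how, depends on why the values are missing.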
Libraries
The skimr package scans a data set and returns an overview of its structure together with the most important descriptive statistics for each variable.
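A summary like the one that follows can be produced with `skim()`. The sketch below assumes the data is R's built-in `Titanic` table converted to a data frame. That is an inference from the variable names and counts in the summary, not something the source states. The object name `data` is taken from the summary itself.

```r
# Assumption: the summarised data set is the built-in Titanic table,
# flattened to one row per Class/Sex/Age/Survived combination.
data <- as.data.frame(Titanic)

# Quick base R checks of the dimensions and the numeric column
dim(data)
mean(data$Freq)

# skim() prints the structural overview and the per-variable summaries.
# The guard keeps the script runnable when skimr is not installed.
if (requireNamespace("skimr", quietly = TRUE)) {
  skimr::skim(data)
}
```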
Data summary

| | |
|---|---|
| Name | data |
| Number of rows | 32 |
| Number of columns | 5 |
| Column type frequency: | |
| factor | 4 |
| numeric | 1 |
| Group variables | None |

Variable type: factor

| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| Class | 0 | 1 | FALSE | 4 | 1st: 8, 2nd: 8, 3rd: 8, Cre: 8 |
| Sex | 0 | 1 | FALSE | 2 | Mal: 16, Fem: 16 |
| Age | 0 | 1 | FALSE | 2 | Chi: 16, Adu: 16 |
| Survived | 0 | 1 | FALSE | 2 | No: 16, Yes: 16 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
|---|---|---|---|---|---|---|---|---|---|
| Freq | 0 | 1 | 68.78 | 136 | 0 | 0.75 | 13.5 | 77 | 670 |
Exploratory Data Analysis
Machine Learning
Clustering
Reporting
Programming