Data science/machine learning with R 2019-08-28T12:14:41+00:00

# Data Science & Machine Learning with R (DSMLR)

Get familiar with R through data science and machine learning techniques.

R is a programming language and free software environment for statistical computing and graphics, supported by the R Foundation for Statistical Computing. It is widely used among statisticians and data miners for developing statistical software and for data analysis. R is a GNU project and is similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered a different implementation of S: there are some important differences, but much code written for S runs unaltered under R.

R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

## CURRICULUM

#### Module 01: Introduction to Data Science

• Introduction to Data Science
• Life cycle of data science
• Skills required for data science
• Career path in data science
• Applications of data science

#### Module 02: Statistics in Data science

1. Introduction to Data:

• Data types
• Data Collection Techniques

2. Descriptive Statistics:

• Measures of Central Tendency
• Measures of Dispersion
• Measures of Skewness and Kurtosis
• Visualization

3. Inferential Statistics:

• Sampling variability and Central Limit Theorem
• Confidence Interval for Mean
• Hypothesis Testing: t-Test, F-Test, Chi-square Test
• ANOVA

4. Random Sampling and Probability Distribution:

• Probability and Limitations, Discrete Probability, Continuous Probability
• Bernoulli, Binomial, Poisson and Normal Distributions

#### Module 3: Statistical Learning

• What Is Statistical Learning?
• Why Estimate f?
• How Do We Estimate f?
• The Trade-Off Between Prediction Accuracy and Model Interpretability
• Supervised Versus Unsupervised Learning
• Regression Versus Classification Problems
• Assessing Model Accuracy
• Measuring the Quality of Fit
• The Bias-Variance Trade-Off

1. Linear Regression:

• Simple Linear Regression:
• Estimating the Coefficients
• Assessing the Accuracy of the Coefficient Estimates
• Assessing the Accuracy of the Model

2. Multiple Linear Regression:

• Estimating the Regression Coefficients
• Some Important Questions
• Other Considerations in the Regression Model
• Qualitative Predictors
• Interaction Terms
• Non-linear Transformations of the Predictors
• Extensions of the Linear Model
• Potential Problems

3. Classification:

• An Overview of Classification
• Why Not Linear Regression

4. Logistic Regression:

• The Logistic Model
• Estimating the Regression Coefficients
• Making Predictions
• Multiple Logistic Regression
• Logistic Regression for >2 Response Classes

5. Resampling Methods:

• Cross-Validation:
• The Validation Set Approach
• Leave-One-Out Cross-Validation
• k-Fold Cross-Validation
• Cross-Validation
• Cross-Validation on Classification Problems
• The Bootstrap

6. Linear Model Selection and Regularization:

• Subset Selection
• Best Subset Selection
• Stepwise Selection
• Forward and Backward Stepwise Selection
• Choosing the Optimal Model

7. Shrinkage Methods:

• Ridge Regression
• Lasso Regression
• K-Nearest Neighbor

#### Module 4: Deep Dive into Machine Learning

Tree-Based Methods

1. Basics of Decision Trees:

• Regression Trees
• Classification Trees
• Trees Versus Linear Models

2. Bagging, Random Forests, Boosting:

• Bagging
• Random Forests
• Boosting

3. Support Vector Machines:

• Hyperplane
• The Maximal Margin Classifier
• Support Vector Classifiers
• Support Vector Machines
• Kernel Trick
• Gamma, Cost and Epsilon
• SVMs with More than Two Classes

#### Module 5: R for Data Science

1. Python programming:

• Environment Setup
• Jupyter Notebook Overview
• Data types:
• Numbers, Strings
• Printing, Lists
• Dictionaries, Booleans, Tuples, Sets
• Comparison Operators
• if, elif, else Statements

2. Loops:

• for Loops, while Loops
• range()
• list comprehension
• functions

3. Python for Exploratory Data Analysis:

• Numpy
• Pandas

4. Python for Data Visualization:

• Matplotlib
• Seaborn
• Pandas built in visualization

#### Module 6: Unsupervised Learning

The Challenges of Unsupervised Learning

1. Principal Components Analysis:

• What Are Principal Components?
• Another Interpretation of Principal Components
• More on PCA
• Other Uses for Principal Components

2. Clustering Methods:

• K-Means Clustering
• Hierarchical Clustering
• Practical Issues in Clustering

#### Module 7: Association Rules Mining and Time Series Analysis

1. Association Rules Mining:

• Apriori/Support/Confidence/Lift
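
As a rough illustration of the three measures named above, the sketch below computes support, confidence and lift for a single rule. It is written in Python (covered in Module 5) rather than R, and the transactions and function names are invented for illustration. It is not the Apriori algorithm itself, only the metrics Apriori relies on to rank rules.

```python
# Toy shopping baskets, invented purely for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule relative to the baseline frequency of the consequent."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Evaluate the rule {bread} -> {milk}.
rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)

print("support:", round(rule_support, 2))
print("confidence:", round(rule_confidence, 2))
print("lift:", round(rule_lift, 2))
```

A lift below 1, as for this toy rule, suggests the two items appear together slightly less often than they would if they were independent; a lift above 1 suggests a positive association.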

2. Time Series Analysis:

• What is Time Series Data?
• Stationarity in Time Series Data
• Augmented Dickey Fuller Test
• The Box-Jenkins Approach
• The AR Process
• The MA Process
• What is ARIMA?
• ACF, PACF and IACF plots
• Decomposition of Time Series
• Trend, Seasonality and Cycles
• Exponential Smoothing
• EWMA

## Course Details

The advanced certification program is delivered through a pragmatic learning approach that blends theoretical and practical learning, so that participants’ comprehension is accurate.

• Technology infused learning
• Guest Lectures by Industry experts
• Hackathons & Real time projects
• A friendly & supportive environment

Case Studies:

• Education industry using Linear Regression in R
• Insurance domain using Logistic Regression in R
• Banking industry using Decision Tree in R
• Network Intrusion using Decision Tree in R
• Manufacturing industry using Support Vector Machine in R
• BPO using Time Series in R
• Crime analysis using PCA in R
• Liquor industry using Clustering in R
• Salary Analysis using Lasso and Ridge Regression in R

## Artificial Intelligence and Data Science

In 2012, Harvard Business Review named data scientist the “sexiest job of the 21st century.” More recently, Glassdoor named it the “best job of the year” for 2016.

“It isn’t a big surprise,” Dr. Andrew Chamberlain, Glassdoor’s chief economist, told Business Insider. “It’s one of the hottest and fastest growing jobs we’re seeing right now.” According to Glassdoor, data scientists earn a base pay of \$116,840 a year, on average.

Here’s how much they take in, on average, at some of the hottest tech companies, according to Glassdoor’s employee salary reviews:

• Apple: \$149,963
• Microsoft: \$119,129
• Airbnb: \$117,229

The advanced certification program suits participants who are keen to work in analytics, automation and AI, and who want to strengthen their skills in these technologies.

1. Why is R used?
Facebook uses R’s graphical capabilities for its social network graph, and also uses R to predict colleague interaction. Google uses R to predict economic activity, and for statistical analysis and visualization to ensure that its advertisers get the best return on their marketing investment.

2. What are the advantages of R programming?
R performs a wide variety of functions, such as data manipulation, statistical modeling, and graphics. Its one really big advantage, however, is extensibility: developers can easily write their own software and distribute it in the form of add-on packages.

3. Why is the “R” language important?
R is an open-source programming language created by Ross Ihaka and Robert Gentleman in the early 1990s and released under an open-source license in 1995. It was developed to offer a better, more user-friendly way to perform statistics, data analysis, and graphics.

4. Is R related to Python?
R and Python are both open-source programming languages with large communities. Learning either one requires a time investment that not everyone can afford. Python is a general-purpose language with a readable syntax. R, by contrast, was built by statisticians and reflects their terminology and way of working.

5. How is R used in data analytics?
R is a language used for statistical computation, data analysis and graphical representation of data. Created in the 1990s by Ross Ihaka and Robert Gentleman, R was designed as a statistical platform for data cleaning, analysis, and representation, which is why it is widely used in data science.

6. Is Python better than R for data science?
Python has caught up somewhat with advances in Matplotlib, but R still seems to be much better at data visualization (ggplot2, htmlwidgets, Leaflet). Python is a powerful, versatile language that programmers can use for a variety of tasks in computer science. Treating the choice as a Python-versus-R contest needlessly confines you to one programming language.

7. What is machine learning with R?
Machine learning is a branch of computer science that studies the design of algorithms that can learn. Typical machine learning tasks are concept learning, function learning or “predictive modeling”, clustering, and finding predictive patterns.
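
To make “function learning” concrete, here is a minimal sketch of an algorithm that learns a straight-line relationship from examples and then predicts an unseen case. It uses plain Python (which the curriculum’s Module 5 covers) and ordinary least squares; the data and the `fit_line` helper are made up for illustration. In R, the same fit would typically be done with the built-in `lm()` function.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: hours studied and the resulting test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

# "Learning" here means estimating the two coefficients from the examples.
slope, intercept = fit_line(hours, scores)

# Use the learned function to predict a case that was not in the data.
predicted = intercept + slope * 6

print("slope:", slope)
print("intercept:", intercept)
print("predicted score for 6 hours:", predicted)
```

The model is never told the rule behind the data; it recovers the coefficients from the examples alone, which is the sense in which the algorithm “learns”.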

8. Is R good for machine learning?
Python and R are the two most popular tools used by data scientists. Both are open-source and free. But while Python was designed as a general-purpose programming language, R was developed for statistical analysis.