DSBDA : Data Science and Big Data Analytics

This course provides practical, foundation-level training that enables immediate and effective participation in big data and other analytics projects. It introduces big data and the Data Analytics Lifecycle as a framework for addressing business challenges that leverage big data. The course provides grounding in basic and advanced analytic methods and an introduction to big data analytics technology and tools, including MapReduce and Hadoop. Labs give students the opportunity to see how a practicing data scientist applies these methods and tools to real-world business challenges. The course takes an “Open”, or technology-neutral, approach and includes a final lab that addresses a big data analytics challenge by applying the concepts taught in the course in the context of the Data Analytics Lifecycle. The course prepares the student for the EMC Proven™ Professional Data Scientist Associate (EMCDSA) certification exam.

Audience

This course is intended for individuals seeking to develop an understanding of Data Science from the perspective of a practicing Data Scientist, including:

  • Managers of teams of business intelligence, analytics, and big data professionals
  • Current business and data analysts looking to add big data analytics to their skills
  • Data and database professionals looking to exploit their analytic skills in a big data environment
  • Recent college graduates and graduate students with academic experience in a related discipline looking to move into the world of data science and big data
  • Individuals seeking to take advantage of the EMC Proven™ Professional Data Scientist Associate (EMCDSA) certification

Exams Covered

Upon successful completion of this course, participants should be able to:

  • Immediately participate and contribute as a Data Science Team Member on big data and other analytics projects by:
    • Deploying the Data Analytics Lifecycle to address big data analytics projects
    • Reframing a business challenge as an analytics challenge
    • Applying appropriate analytic techniques and tools to analyze big data, create statistical models, and identify insights that can lead to actionable results
    • Selecting appropriate data visualizations to clearly communicate analytic insights to business sponsors and analytic audiences
    • Using tools such as R and RStudio, MapReduce/Hadoop, in-database analytics, and window and MADlib functions
  • Explain how advanced analytics can be leveraged to create competitive advantage and how the data scientist role and skills differ from those of a traditional business intelligence analyst

The classes will cover six modules of data science and big data analytics. 

1. Introduction to Big Data Analytics

1.1.   What is big data

1.2.   State of the practice in analytics

1.3.   Roles of a Data Scientist

2. Big Data Analytics in Industry Standards

2.1.   Data analytics lifecycle

2.2.   Discovery

2.3.   Data preparation

2.4.   Model planning

2.5.   Model building

2.6.   Communicating results

2.7.   Operationalizing

3. Data Analytics with R

3.1.   Introduction to R

3.2.   Analyzing and exploring the data

3.3.   R toolboxes and uses

3.4.   Statistics for model building and evaluation

4. Advanced Data Analytics

4.1.   Linear regression

4.2.   Logistic regression

4.3.   K-means clustering

4.4.   Association rules

4.5.   Naïve Bayesian classifier

4.6.   Decision Trees

4.7.   Time series analysis

4.8.   Text analysis

5. Advanced Data Analytics Technologies and Tools

5.1.   Analytics for unstructured data

5.2.   MapReduce and Hadoop

5.3.   The Hadoop ecosystem

5.3.1. SQL essentials

5.3.2. Advanced SQL and MADlib

6. Summary and Putting It All Together

6.1.   Operationalizing an analytics project

6.2.   Creating the final deliverables

6.3.   Data visualization techniques

Final lab exercise on Big Data analytics
