Concepts, Drivers & Techniques
Co-authored and Edited by Best-Selling Author Thomas Erl
Co-authored by Wajid Khattak and Paul Buhler, PhD
PRENTICE HALL
ServiceTech.Press
BOSTON • COLUMBUS • INDIANAPOLIS • NEW YORK • SAN FRANCISCO
AMSTERDAM • CAPE TOWN • DUBAI • LONDON • MADRID • MILAN • MUNICH
PARIS • MONTREAL • TORONTO • DELHI • MEXICO CITY • SAO PAULO
SYDNEY • HONG KONG • SEOUL • SINGAPORE • TAIPEI • TOKYO
Big Data Fundamentals
To my family and friends.
—Thomas Erl
I dedicate this book to my daughters Hadia and Areesha,
my wife Natasha, and my parents.
—Wajid Khattak
I thank my wife and family for their patience and for
putting up with my busyness over the years.
I appreciate all the students and colleagues I have had the
privilege of teaching and learning from.
John 3:16, 2 Peter 1:5-8
—Paul Buhler, PhD
About the Authors
Thomas Erl is a top-selling IT author, founder of Arcitura Education and series editor of
the Prentice Hall Service Technology Series from Thomas Erl. With more than 200,000
copies in print worldwide, his books have become international bestsellers and have been
formally endorsed by senior members of major IT organizations, such as IBM, Microsoft,
Oracle, Intel, Accenture, IEEE, HL7, MITRE, SAP, CISCO, HP and many others. As CEO
of Arcitura Education Inc., Thomas has led the development of curricula for the
internationally recognized Big Data Science Certified Professional (BDSCP), Cloud
Certified Professional (CCP) and SOA Certified Professional (SOACP) accreditation
programs, which have established a series of formal, vendor-neutral industry certifications
obtained by thousands of IT professionals around the world. Thomas has toured more than
20 countries as a speaker and instructor. More than 100 articles and interviews by Thomas
have appeared in numerous publications, including The Wall Street Journal and CIO Magazine.
Wajid Khattak is a Big Data researcher and trainer at Arcitura Education Inc. His areas of
interest include Big Data engineering and architecture, data science, machine learning,
analytics and SOA. He has extensive .NET software development experience in the
domains of business intelligence reporting solutions and GIS.
Wajid completed his MSc in Software Engineering and Security with distinction from
Birmingham City University in 2008. Prior to that, in 2003, he earned his BSc (Hons)
degree in Software Engineering from Birmingham City University with first-class
recognition. He holds MCAD & MCTS (Microsoft), SOA Architect, Big Data Scientist,
Big Data Engineer and Big Data Consultant (Arcitura) certifications.
Dr. Paul Buhler is a seasoned professional who has worked in commercial, government
and academic environments. He is a respected researcher, practitioner and educator of
service-oriented computing concepts, technologies and implementation methodologies.
His work in XaaS naturally extends to cloud, Big Data and IoE areas. Dr. Buhler’s more
recent work has been focused on closing the gap between business strategy and process
execution by leveraging responsive design principles and goal-based execution.
As Chief Scientist at Modus21, Dr. Buhler is responsible for aligning corporate strategy
with emerging trends in business architecture and process execution frameworks. He also
holds an Affiliate Professorship at the College of Charleston, where he teaches both
graduate and undergraduate computer science courses. Dr. Buhler earned his Ph.D. in
Computer Engineering at the University of South Carolina. He also holds an MS degree in
Computer Science from Johns Hopkins University and a BS in Computer Science from The Citadel.
Acknowledgments
In alphabetical order by last name:
• Allen Afuah, Ross School of Business, University of Michigan
• Thomas Davenport, Babson College
• Hugh Dubberly, Dubberly Design Office
• Joe Gollner, Gnostyx Research Inc.
• Dominic Greenwood, Whitestein Technologies
• Gareth Morgan, The Schulich School of Business, York University
• Peter Morville, Semantic Studios
• Michael Porter, The Institute for Strategy and Competitiveness, Harvard Business
School
• Mark von Rosing, LEADing Practice
• Jeanne Ross, Center for Information Systems Research, MIT Sloan School of
Management
• Jim Sinur, Flueresque
• John Sterman, MIT System Dynamics Group, MIT Sloan School of Management
Special thanks to the Arcitura Education and Big Data Science School research and
development teams that produced the Big Data Science Certified Professional (BDSCP)
course modules upon which this book is based.
Product details
File Size: 10,357 KB
Pages: 235
File Type: PDF
ISBN-13: 978-0-13-429107-9
ISBN-10: 0-13-429107-7
Copyright: 2016 Arcitura Education Inc.
Contents at a Glance
PART I: THE FUNDAMENTALS OF BIG DATA
CHAPTER 1: Understanding Big Data
CHAPTER 2: Business Motivations and Drivers for Big Data Adoption
CHAPTER 3: Big Data Adoption and Planning Considerations
CHAPTER 4: Enterprise Technologies and Big Data Business Intelligence
PART II: STORING AND ANALYZING BIG DATA
CHAPTER 5: Big Data Storage Concepts
CHAPTER 6: Big Data Processing Concepts
CHAPTER 7: Big Data Storage Technology
CHAPTER 8: Big Data Analysis Techniques
APPENDIX A: Case Study Conclusion
About the Authors
Index
Contents
Acknowledgments
Reader Services
PART I: THE FUNDAMENTALS OF BIG DATA
CHAPTER 1: Understanding Big Data
Concepts and Terminology
Datasets
Data Analysis
Data Analytics
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
Business Intelligence (BI)
Key Performance Indicators (KPI)
Big Data Characteristics
Volume
Velocity
Variety
Veracity
Value
Different Types of Data
Structured Data
Unstructured Data
Semi-structured Data
Metadata
Case Study Background
History
Technical Infrastructure and Automation Environment
Business Goals and Obstacles
Case Study Example
Identifying Data Characteristics
Volume
Velocity
Variety
Veracity
Value
Identifying Types of Data
CHAPTER 2: Business Motivations and Drivers for Big Data Adoption
Marketplace Dynamics
Business Architecture
Business Process Management
Information and Communications Technology
Data Analytics and Data Science
Digitization
Affordable Technology and Commodity Hardware
Social Media
Hyper-Connected Communities and Devices
Cloud Computing
Internet of Everything (IoE)
Case Study Example
CHAPTER 3: Big Data Adoption and Planning Considerations
Organization Prerequisites
Data Procurement
Privacy
Security
Provenance
Limited Realtime Support
Distinct Performance Challenges
Distinct Governance Requirements
Distinct Methodology
Clouds
Big Data Analytics Lifecycle
Business Case Evaluation
Data Identification
Data Acquisition and Filtering
Data Extraction
Data Validation and Cleansing
Data Aggregation and Representation
Data Analysis
Data Visualization
Utilization of Analysis Results
Case Study Example
Big Data Analytics Lifecycle
Business Case Evaluation
Data Identification
Data Acquisition and Filtering
Data Extraction
Data Validation and Cleansing
Data Aggregation and Representation
Data Analysis
Data Visualization
Utilization of Analysis Results
CHAPTER 4: Enterprise Technologies and Big Data Business Intelligence
Online Transaction Processing (OLTP)
Online Analytical Processing (OLAP)
Extract Transform Load (ETL)
Data Warehouses
Data Marts
Traditional BI
Ad-hoc Reports
Dashboards
Big Data BI
Traditional Data Visualization
Data Visualization for Big Data
Case Study Example
Enterprise Technology
Big Data Business Intelligence
PART II: STORING AND ANALYZING BIG DATA
CHAPTER 5: Big Data Storage Concepts
Clusters
File Systems and Distributed File Systems
NoSQL
Sharding
Replication
Master-Slave
Peer-to-Peer
Sharding and Replication
Combining Sharding and Master-Slave Replication
Combining Sharding and Peer-to-Peer Replication
CAP Theorem
ACID
BASE
Case Study Example
CHAPTER 6: Big Data Processing Concepts
Parallel Data Processing
Distributed Data Processing
Hadoop
Processing Workloads
Batch
Transactional
Cluster
Processing in Batch Mode
Batch Processing with MapReduce
Map and Reduce Tasks
Map
Combine
Partition
Shuffle and Sort
Reduce
A Simple MapReduce Example
Understanding MapReduce Algorithms
Processing in Realtime Mode
Speed Consistency Volume (SCV)
Event Stream Processing
Complex Event Processing
Realtime Big Data Processing and SCV
Realtime Big Data Processing and MapReduce
Case Study Example
Processing Workloads
Processing in Batch Mode
Processing in Realtime
CHAPTER 7: Big Data Storage Technology
On-Disk Storage Devices
Distributed File Systems
RDBMS Databases
NoSQL Databases
Characteristics
Rationale
Types
Key-Value
Document
Column-Family
Graph
NewSQL Databases
In-Memory Storage Devices
In-Memory Data Grids
Read-through
Write-through
Write-behind
Refresh-ahead
In-Memory Databases
Case Study Example
CHAPTER 8: Big Data Analysis Techniques
Quantitative Analysis
Qualitative Analysis
Data Mining
Statistical Analysis
A/B Testing
Correlation
Regression
Machine Learning
Classification (Supervised Machine Learning)
Clustering (Unsupervised Machine Learning)
Outlier Detection
Filtering
Semantic Analysis
Natural Language Processing
Text Analytics
Sentiment Analysis
Visual Analysis
Heat Maps
Time Series Plots
Network Graphs
Spatial Data Mapping
Case Study Example
Correlation
Regression
Time Series Plot
Clustering
Classification
APPENDIX A: Case Study Conclusion
About the Authors
Thomas Erl
Wajid Khattak
Paul Buhler
Index