Intelligent data analysis : from data gathering to data comprehension /
edited by Dr. Deepak Gupta, Dr. Siddhartha Bhattacharyya, Dr. Ashish Khanna, Ms. Kalpna Sagar.
- 1 online resource.
- The Wiley series in intelligent signal and data processing.
Includes bibliographical references and index.
Cover -- Title Page -- Copyright -- Contents -- List of Contributors -- Series Preface -- Preface -- Chapter 1 Intelligent Data Analysis: Black Box Versus White Box Modeling -- 1.1 Introduction -- 1.1.1 Intelligent Data Analysis -- 1.1.2 Applications of IDA and Machine Learning -- 1.1.3 White Box Models Versus Black Box Models -- 1.1.4 Model Interpretability -- 1.2 Interpretation of White Box Models -- 1.2.1 Linear Regression -- 1.2.2 Decision Tree -- 1.3 Interpretation of Black Box Models -- 1.3.1 Partial Dependence Plot -- 1.3.2 Individual Conditional Expectation -- 1.3.3 Accumulated Local Effects -- 1.3.4 Global Surrogate Models -- 1.3.5 Local Interpretable Model-Agnostic Explanations -- 1.3.6 Feature Importance -- 1.4 Issues and Further Challenges -- 1.5 Summary -- References -- Chapter 2 Data: Its Nature and Modern Data Analytical Tools -- 2.1 Introduction -- 2.2 Data Types and Various File Formats -- 2.2.1 Structured Data -- 2.2.2 Semi-Structured Data -- 2.2.3 Unstructured Data -- 2.2.4 Need for File Formats -- 2.2.5 Various Types of File Formats -- 2.2.5.1 Comma Separated Values (CSV) -- 2.2.5.2 ZIP -- 2.2.5.3 Plain Text (txt) -- 2.2.5.4 JSON -- 2.2.5.5 XML -- 2.2.5.6 Image Files -- 2.2.5.7 HTML -- 2.3 Overview of Big Data -- 2.3.1 Sources of Big Data -- 2.3.1.1 Media -- 2.3.1.2 The Web -- 2.3.1.3 Cloud -- 2.3.1.4 Internet of Things -- 2.3.1.5 Databases -- 2.3.1.6 Archives -- 2.3.2 Big Data Analytics -- 2.3.2.1 Descriptive Analytics -- 2.3.2.2 Predictive Analytics -- 2.3.2.3 Prescriptive Analytics -- 2.4 Data Analytics Phases -- 2.5 Data Analytical Tools -- 2.5.1 Microsoft Excel -- 2.5.2 Apache Spark -- 2.5.3 Open Refine -- 2.5.4 R Programming -- 2.5.4.1 Advantages of R -- 2.5.4.2 Disadvantages of R -- 2.5.5 Tableau -- 2.5.5.1 How Tableau Works -- 2.5.5.2 Tableau Feature -- 2.5.5.3 Advantages -- 2.5.5.4 Disadvantages -- 2.5.6 Hadoop -- 2.5.6.1 Basic Components of Hadoop -- 2.5.6.2 Benefits -- 2.6 Database Management System for Big Data Analytics -- 2.6.1 Hadoop Distributed File System -- 2.6.2 NoSql -- 2.6.2.1 Categories of NoSql -- 2.7 Challenges in Big Data Analytics -- 2.7.1 Storage of Data -- 2.7.2 Synchronization of Data -- 2.7.3 Security of Data -- 2.7.4 Fewer Professionals -- 2.8 Conclusion -- References -- Chapter 3 Statistical Methods for Intelligent Data Analysis: Introduction and Various Concepts -- 3.1 Introduction -- 3.2 Probability -- 3.2.1 Definitions -- 3.2.1.1 Random Experiments -- 3.2.1.2 Probability -- 3.2.1.3 Probability Axioms -- 3.2.1.4 Conditional Probability -- 3.2.1.5 Independence -- 3.2.1.6 Random Variable -- 3.2.1.7 Probability Distribution -- 3.2.1.8 Expectation -- 3.2.1.9 Variance and Standard Deviation -- 3.2.2 Bayes' Rule -- 3.3 Descriptive Statistics -- 3.3.1 Picture Representation -- 3.3.1.1 Frequency Distribution -- 3.3.1.2 Simple Frequency Distribution -- 3.3.1.3 Grouped Frequency Distribution -- 3.3.1.4 Stem and Leaf Display -- 3.3.1.5 Histogram and Bar Chart
"The new tool for analyses is 'Intelligent Data Analysis (IDA)'. IDA can be defined as the use of specialized statistical, pattern recognition, machine learning, data abstraction, and visualization tools for analysis of data and discovery of mechanisms that created the data. Such data are typically complex, meaning that they are characterized by many records, many variables, subtle interactions between variables, or a combination of all three. Engineering, computing sciences, database science, machine learning, and even artificial intelligence are bringing their powers to this newly born data analysis discipline. The main idea underlying the concept of Intelligent Data Analysis is extracting knowledge from a very large amount of data, with a very large amount of variables; data that represents very complex, non-linear, real-life problems. Moreover, IDA can help when starting from the raw data, coping with prediction tasks without knowing the theoretical description of the underlying process, classification tasks of new events based on past ones, or modeling the aforementioned unknown process. Classification, prediction, and modeling are the cornerstones that Intelligent Data Analysis can bring to us"--