Chapter 3 Design

Big Data and NoSQL Design

Big data analytics can uncover the patterns hidden in your organization's data. By connecting the DXP to multiple data sources, you can build efficient data streams that learn, predict, and take action, integrate the resulting actionable insights with the DXP, and apply machine learning (ML) algorithms to better understand your customers.

Big Data and NoSQL Integration

Big data solutions are built on open-source projects such as Apache Spark, Hadoop, and Kafka. These projects help you collect data from multiple data sources and process it in a distributed way. ML algorithms and data visualization methods then help you analyze the data and turn it into insights for management, as shown in Figure 3-26. In this section, we look at the main big data components: ETL, ML models for efficient streaming of predictive data models, search and query web services, and NoSQL databases.

• Extract, transform, and load (ETL):
  • You can load data from multiple data sources using open-source big data processing and streaming engines such as Apache Spark, which can read from the Hadoop Distributed File System (HDFS), NoSQL databases, and SQL databases.
  • Spark holds the elements of your dataset as a collection stored in memory or on disk across a cluster of machines (its resilient distributed dataset, or RDD).
  • A data frame organizes the data into named columns, which makes large datasets easier to process. Spark's Dataset and DataFrame APIs let developers express transformations on domain objects concisely.
• Train and test predictive data models:
  • Depending on the nature of the problem, you can use different kinds of ML algorithms: supervised learning, unsupervised learning, or reinforcement learning.
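The ETL bullets describe pulling records from several kinds of sources, reshaping them, and writing the result somewhere useful. The sketch below mimics that flow in plain Python so it runs without a cluster; it is not Spark's API. The two sources (a CSV export standing in for an SQL table and JSON documents standing in for a NoSQL collection), the field names, and the functions `extract_csv`, `extract_json`, `transform`, and `load` are all hypothetical.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical source 1: a CSV export standing in for an SQL table.
SQL_EXPORT = """customer_id,country
1,US
2,DE
3,US
"""

# Hypothetical source 2: JSON documents standing in for a NoSQL collection.
NOSQL_DOCS = [
    '{"customer_id": 1, "event": "page_view", "amount": 0}',
    '{"customer_id": 1, "event": "purchase", "amount": 40.0}',
    '{"customer_id": 3, "event": "purchase", "amount": 25.5}',
]


def extract_csv(text):
    """Extract: parse CSV rows into dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(documents):
    """Extract: parse each JSON document into a dictionary."""
    return [json.loads(doc) for doc in documents]


def transform(customers, events):
    """Transform: join events to customers and total purchases per country."""
    country_of = {int(row["customer_id"]): row["country"] for row in customers}
    totals = defaultdict(float)
    for event in events:
        if event["event"] == "purchase":
            totals[country_of[event["customer_id"]]] += event["amount"]
    return dict(totals)


def load(totals, store):
    """Load: write the aggregated rows into a target store (a dict here)."""
    store.update(totals)
    return store


warehouse = load(transform(extract_csv(SQL_EXPORT), extract_json(NOSQL_DOCS)), {})
print(warehouse)  # → {'US': 65.5}
```

In a real deployment, each of the three stages would typically be a distributed job reading from HDFS or a database rather than from in-memory strings.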
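A distributed collection such as Spark's RDD splits a dataset into partitions, processes each partition independently, and then merges the partial results. As a rough single-machine analogy only, the sketch below uses a thread pool to stand in for the cluster's worker nodes and counts words in a map step followed by a reduce step. The helpers `partition` and `count_words` and the sample lines are hypothetical and are not Spark functions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items, num_partitions):
    """Split a list into roughly equal, contiguous partitions."""
    if not items:
        return []
    size = -(-len(items) // num_partitions)  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_words(lines):
    """Map step: count words within a single partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


lines = [
    "Spark reads data",
    "Kafka streams data",
    "Hadoop stores data",
    "Spark processes data",
]

partitions = partition(lines, num_partitions=2)

# Each partition is processed independently, as a worker node would do.
with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
    partial_counts = list(pool.map(count_words, partitions))

# Reduce step: merge the per-partition results into one answer.
word_counts = reduce(lambda a, b: a + b, partial_counts, Counter())
print(word_counts.most_common(2))  # → [('data', 4), ('spark', 2)]
```

The design point carried over from Spark is that the map step needs no shared state, so partitions can run on different machines and only their small partial results travel over the network.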
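To make the train-and-test step concrete, here is a minimal supervised-learning sketch in plain Python. It assumes a hypothetical dataset of site visits versus customer spend, fits a straight line with ordinary least squares on a training split, and measures mean absolute error on the held-out split. The data, the 80/20 split, the seed, and the helper names are illustrative assumptions; a production pipeline would more likely use a library such as Spark MLlib.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle rows reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def fit_line(rows):
    """Train: ordinary least squares for y = slope * x + intercept."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    var = sum((x - mean_x) ** 2 for x, _ in rows)
    slope = cov / var
    return slope, mean_y - slope * mean_x


def mean_absolute_error(model, rows):
    """Test: average absolute gap between predictions and actual values."""
    slope, intercept = model
    return sum(abs(slope * x + intercept - y) for x, y in rows) / len(rows)


# Hypothetical data: site visits (x) versus spend (y), with a small +/-1 wobble.
data = [(x, 5 * x + 10 + (1 if x % 2 == 0 else -1)) for x in range(1, 21)]

train, test = train_test_split(data)
model = fit_line(train)
print("slope=%.2f intercept=%.2f" % model)
print("test MAE=%.2f" % mean_absolute_error(model, test))
```

Evaluating on rows the model never saw during fitting is what distinguishes testing from training; a low training error alone says little about how the model will predict for new customers.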
Building Digital Experience Platforms