Professional training module

TID-020/ Introduction to Big Data architecture

Build Smarter Infrastructure. Deliver Real Value. This 3-day intensive workshop teaches you how to design scalable data platforms and monetize them through APIs, services, and marketplaces. Learn to package your data into business-ready products — with billing, privacy, and licensing built in from day one.

Track
Analytics & Artificial Intelligence
Duration
21 hours
Format
Schools, cohorts, or programme teams
Price
75 €

Learning outcomes

What learners should be able to do

6 outcomes
  1. Architect for value: design pipelines, APIs, and scalable infrastructure

  2. Build and expose data products and services

  3. Integrate monetization mechanisms: billing, licensing, usage tracking

  4. Navigate privacy, compliance, and governance concerns

  5. Deploy production-ready data services with modern DevOps patterns

  6. Identify monetization-ready data across the organization

Module content

Course description

3-Day Intensive Course for Technical Professionals

3 Intense Days
7 Hours per Day (Split into two 3.5-hour sessions)

Learning Path Visual

Your hands-on journey from infrastructure to monetization:

Day 1: Architecting for Value — Data Value Chains & Infrastructure Setup
Map your data’s monetization potential and build the architecture to support it — from pipelines and APIs to governance frameworks.

Day 2: Engineering the Flow — Pipelines, Products & Insights
Design and implement pipelines that turn raw data into reusable assets. Create and expose data products using real tools and cloud infrastructure.

Day 3: Monetization Engines — APIs, Marketplaces & Security at Scale
Build and launch monetizable APIs, integrate billing and licensing controls, and distribute products across marketplaces with privacy and compliance by design.

Course Overview

Data is the new oil — but only if you can refine and monetize it. This workshop equips engineers, developers, and cloud architects with the technical and strategic skills to design scalable data architectures and turn them into revenue-generating platforms.

From backend infrastructure to API deployment, this course walks you through the entire lifecycle of data monetization using open-source and cloud-native tools.

You’ll learn how to:

  • Architect for value: design pipelines, APIs, and scalable infrastructure

  • Build and expose data products and services

  • Integrate monetization mechanisms: billing, licensing, usage tracking

  • Navigate privacy, compliance, and governance concerns

  • Deploy production-ready data services with modern DevOps patterns

This course bridges data engineering, API productization, and business model integration — giving you both the code and the context to drive value from data.

What’s Inside Each Day

Day 1 — Architecting for Value: Data Value Chains & Infrastructure Setup

  • Identify monetization-ready data across the organization

  • Map direct and indirect monetization strategies (internal, external, hybrid)

  • Set up infrastructure: Docker, Airflow, Spark clusters, cloud functions

  • Understand storage layers: Data lakes vs. warehouses (Delta Lake, BigQuery)

  • Implement data governance frameworks (DAMA-DMBOK) and GDPR compliance requirements

  • Manage catalogs and metadata (Apache Atlas, OpenMetadata)

Tools: Apache Spark, Docker, Airflow, Delta Lake, BigQuery
Focus: Architecture • Infrastructure • Value Mapping
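
As an illustration of the Day 1 cataloguing and value-mapping topics, here is a minimal sketch in plain Python. It does not use Apache Atlas or OpenMetadata. The `DatasetEntry` fields, the tag names, and the selection rule are all assumptions made for this example, not part of the course material.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DatasetEntry:
    """One hypothetical catalog record; field names are illustrative only."""
    name: str
    owner: str
    storage: str                    # e.g. "lake" or "warehouse"
    contains_personal_data: bool    # simplified stand-in for a GDPR tag
    tags: frozenset = field(default_factory=frozenset)


def monetization_candidates(catalog):
    """Return names of datasets that are curated and hold no personal data.

    This rule is an assumption for the example; a real assessment would
    also weigh licensing, data quality, and demand.
    """
    return [
        entry.name
        for entry in catalog
        if not entry.contains_personal_data and "curated" in entry.tags
    ]


catalog = [
    DatasetEntry("web_clickstream_raw", "platform", "lake", True,
                 frozenset({"raw"})),
    DatasetEntry("store_footfall_daily", "analytics", "warehouse", False,
                 frozenset({"curated", "aggregated"})),
    DatasetEntry("customer_profiles", "crm", "warehouse", True,
                 frozenset({"curated"})),
]

print(monetization_candidates(catalog))  # ['store_footfall_daily']
```

Keeping the personal-data flag on the catalog entry itself means the governance check runs before any dataset is considered for external use.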

Day 2 — Engineering the Flow: Pipelines, Products & Insights

  • Design batch and real-time data pipelines (Kafka ➝ Spark ➝ BigQuery ➝ API)

  • Transform raw data into monetizable assets: enriched datasets, insights, ML features

  • Visualize and share: Kibana dashboards, Power BI tiles

  • Publish data products: FastAPI/Flask endpoints, API documentation

  • Package outputs for portability (Parquet, Arrow, JSON API)

Tools: Kafka, Spark, dbt, Power BI, Kibana, FastAPI
Focus: Pipelines • Productization • Delivery
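
The Day 2 flow from raw records to a packaged data product can be sketched with the standard library alone. This is a simplified batch example, not a Kafka or Spark pipeline. The event fields (`store_id`, `amount`), the function names, and the payload layout are hypothetical.

```python
import json
from collections import defaultdict


def clean(events):
    """Yield only events with a store id and a numeric amount."""
    for event in events:
        if event.get("store_id") and isinstance(event.get("amount"), (int, float)):
            yield event


def aggregate_by_store(events):
    """Turn cleaned events into one summary row per store, sorted by store id."""
    totals = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for event in events:
        bucket = totals[event["store_id"]]
        bucket["orders"] += 1
        bucket["revenue"] += event["amount"]
    return [
        {"store_id": store, "orders": t["orders"], "revenue": round(t["revenue"], 2)}
        for store, t in sorted(totals.items())
    ]


def to_json_payload(rows, version="v1"):
    """Package rows as a versioned JSON document ready to serve from an API."""
    return json.dumps({"version": version, "count": len(rows), "data": rows},
                      sort_keys=True)


raw_events = [
    {"store_id": "A", "amount": 10.0},
    {"store_id": "B", "amount": 5.5},
    {"store_id": "A", "amount": 2.5},
    {"store_id": None, "amount": 3.0},   # dropped: no store id
    {"store_id": "B", "amount": "n/a"},  # dropped: non-numeric amount
]

rows = aggregate_by_store(clean(raw_events))
payload = to_json_payload(rows)
print(payload)
```

In a production setting the same three stages (clean, aggregate, package) would typically map to streaming or distributed jobs, with the payload exposed through an API endpoint.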

Day 3 — Monetization Engines: APIs, Marketplaces & Security at Scale

  • Build and launch monetization-ready APIs (FastAPI + Swagger + billing)

  • Enable usage-based pricing, quotas, and access control (Stripe, OAuth2, JWT)

  • Integrate data marketplaces (Snowflake Marketplace, Dawex, Azure Data Share)

  • Enforce privacy, licensing, and IP policies (Open Policy Agent, GDPR tags)

  • Deploy full-stack services with observability, rate-limiting, and metering

Tools: FastAPI, Swagger, Stripe APIs, OAuth2, Open Policy Agent, Snowflake
Focus: Monetization • API Security • Licensing
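
Quotas, metering, and usage-based pricing from Day 3 can be illustrated with a small in-memory sketch. It does not call Stripe or any real billing or authentication API. The `Plan` and `UsageMeter` classes, their fields, and the pricing rule (a base fee plus a per-call overage) are assumptions chosen for the example. Amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Plan:
    """Hypothetical pricing plan; all amounts are in cents."""
    name: str
    monthly_quota: int           # hard cap on calls per billing period
    included_calls: int          # calls covered by the base fee
    base_fee_cents: int
    overage_cents_per_call: int


@dataclass
class UsageMeter:
    """Counts calls per API key and enforces the plan's quota."""
    plan: Plan
    calls: dict = field(default_factory=dict)

    def record_call(self, api_key: str) -> bool:
        """Record one call; return False if the quota is already exhausted."""
        used = self.calls.get(api_key, 0)
        if used >= self.plan.monthly_quota:
            return False
        self.calls[api_key] = used + 1
        return True

    def invoice_cents(self, api_key: str) -> int:
        """Base fee plus overage for calls beyond the included allowance."""
        used = self.calls.get(api_key, 0)
        overage = max(0, used - self.plan.included_calls)
        return self.plan.base_fee_cents + overage * self.plan.overage_cents_per_call


starter = Plan("starter", monthly_quota=5, included_calls=3,
               base_fee_cents=1000, overage_cents_per_call=50)
meter = UsageMeter(starter)

# Seven attempts against a quota of five: the last two are rejected.
results = [meter.record_call("key-123") for _ in range(7)]
print(results)
print(meter.invoice_cents("key-123"))  # 1100 = 1000 base + 2 overage calls * 50
```

A deployed service would persist the counters, reset them per billing period, and hand the invoice amount to a payment provider; those parts are intentionally left out here.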

Course Goals

By the end of this course, you’ll be able to:

  • Architect systems for scalable data monetization

  • Create and expose data products and APIs for internal or external use

  • Build pipelines that align with business value and reuse

  • Integrate billing, metering, and licensing into data services

  • Deploy compliant, secure, monetizable data workflows at scale

  • Understand and apply data governance frameworks across platforms

Who Should Take This Course?

  • Data engineers expanding into product and revenue-focused architecture

  • Backend developers building API-first services from data pipelines

  • Cloud architects implementing scalable, secure data platforms

  • DevOps professionals automating deployment of monetizable data workflows

  • ML engineers preparing data for external or multi-tenant delivery

  • CTOs and tech leads designing data business models

Class Reference: TID-020
Form Updated on: 06/16/2025 (Version 1)
Last Modified on: 06/16/2025

Program Note

This course is actively updated with new APIs, governance standards, and monetization frameworks to reflect the fast-moving data economy.

Teaching brief

TID-020/ Introduction to Big Data architecture is presented here in summary form so that teaching teams can quickly assess the module's relevance.

The module belongs to the Analytics & Artificial Intelligence track. It can be adapted to the school's calendar, to the level (all levels), to the 21-hour teaching volume, and to the planned assessment methods.

Teaching objective

This module aims to connect data, AI, and automation tools to concrete professional uses.

Possible deliverables and activities

  • use cases, prompts, automation scenarios, or data analyses
  • critical evaluation of results, limitations, and risks
  • clear communication of technical and business choices

School adaptation

LC can adjust the sequence, the teaching language, the materials, the exercises, and the assessment criteria to suit the cohort, the degree, the expected level of autonomy, and scheduling constraints.

For a detailed French-language version of the syllabus, LC confirms the final programme after scoping the level, hours, schedule, and expected deliverables.

Academic delivery team

Instructor matching for this module

After reviewing the module content, LC confirms the right delivery profile by topic, level, teaching language and assessment expectations.

Instructor matching • Curriculum fit • Assessment support

AI, data & software instructor

Meriam Mbindyo

Instructor for AI, data, DevOps, Agile and software modules, with experience across Paris-based IT and business schools.

Artificial intelligence • Machine learning • Data mining

Digital strategy, AI & technical communication instructor

Syed Mohammad Shah Mostafa

Instructor for English-medium web, AI, technical communication and employability modules in higher-education technical programmes.

Digital strategy • Web development • AI in business