Databricks Flashcards

🔥 What is Databricks?
Databricks is a cloud-based unified data analytics platform built on Apache Spark, supporting data engineering, data science, and machine learning.

💡 What is the Lakehouse architecture?
The Lakehouse architecture combines the low-cost, flexible storage of a data lake with the reliability and query performance of a data warehouse, in a single platform for structured and unstructured data.

🧪 What is a Databricks Notebook?
A collaborative web-based interface where users can write and run code in Python, SQL, Scala, and R for data processing and ML.

🛠️ What is a Cluster?
A cluster is a set of compute resources in Databricks used to execute notebooks, jobs, and SQL queries.

📦 What is Delta Lake?
Delta Lake is an open-source storage layer that adds ACID transactions and schema enforcement on top of data lakes.
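
To make the card concrete, here is a minimal PySpark sketch of writing to and reading from a Delta table. It assumes a `spark` session is passed in (Databricks notebooks predefine one) and that the Delta format is available on the cluster. The function name, the sample rows, and the path are hypothetical.

```python
def append_and_read(spark, path: str):
    """Append two rows to a Delta table at `path`, then read the table back.

    `spark` is an existing SparkSession; `path` is a hypothetical storage location.
    """
    # Illustrative rows only; the column names are assumptions for this sketch.
    df = spark.createDataFrame([(1, "click"), (2, "view")], ["id", "event"])

    # Writing with format("delta") records the append as one atomic commit.
    df.write.format("delta").mode("append").save(path)

    # Readers see only committed data.
    return spark.read.format("delta").load(path)
```

With schema enforcement, a later append whose columns do not match the table's schema is rejected instead of being written silently.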

โš™๏ธ What is a Job in Databricks?
A Job is a scheduled task that runs notebooks or JAR/py files in an automated way, often used in production pipelines.
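
As a rough illustration, a scheduled notebook job can be described as a JSON payload for the Jobs API (`/api/2.1/jobs/create`). The sketch below only builds and prints the payload and does not send it. The job name, notebook path, and cluster ID are hypothetical placeholders.

```python
import json


def build_nightly_job(notebook_path: str, cluster_id: str) -> dict:
    """Build a Jobs API payload that runs one notebook every day at 02:00 UTC."""
    return {
        "name": "nightly-etl",  # hypothetical job name
        "tasks": [
            {
                "task_key": "run_notebook",
                "notebook_task": {"notebook_path": notebook_path},
                "existing_cluster_id": cluster_id,
            }
        ],
        "schedule": {
            # Quartz cron fields: second minute hour day-of-month month day-of-week
            "quartz_cron_expression": "0 0 2 * * ?",
            "timezone_id": "UTC",
        },
    }


# Placeholder values; substitute a real notebook path and cluster ID.
payload = build_nightly_job("/Workspace/Shared/etl_notebook", "0000-000000-example")
print(json.dumps(payload, indent=2))
```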

🔒 How does Databricks handle security?
Databricks supports role-based access control (RBAC), workspace isolation, encryption, audit logs, and compliance with standards such as HIPAA and SOC 2.

📊 What is Unity Catalog?
Unity Catalog provides centralized data governance for managing access and auditing across all Databricks assets.
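
One way to picture this: Unity Catalog addresses tables with a three-level name (`catalog.schema.table`), and access is granted with SQL `GRANT` statements. The helper below is a small sketch that only builds such a statement as a string. The function name and the example catalog, schema, table, and group are hypothetical.

```python
def grant_select(catalog: str, schema: str, table: str, principal: str) -> str:
    """Return a GRANT statement giving `principal` read access to one table."""
    # Backticks quote the principal, which may contain characters such as '-' or '@'.
    return f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`"


stmt = grant_select("main", "sales", "orders", "analysts")
print(stmt)
```

In a notebook, the resulting string could be run with `spark.sql(stmt)` by a user who is allowed to grant privileges on that table.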

📈 What is MLflow in Databricks?
MLflow is an open-source platform, integrated into Databricks, for managing the ML lifecycle: experiment tracking, model packaging and deployment, and a model registry.
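
The tracking part of that lifecycle can be sketched in a few lines. This example assumes the `mlflow` package is installed (it is imported inside the function so the snippet loads without it). The experiment path, run name, and metric name are hypothetical.

```python
def log_training_run(params: dict, rmse: float,
                     experiment: str = "/Shared/flashcards-demo") -> str:
    """Log one training run's parameters and a metric; return the run ID."""
    import mlflow  # lazy import: only needed when the function is called

    mlflow.set_experiment(experiment)  # hypothetical experiment path
    with mlflow.start_run(run_name="baseline") as run:
        mlflow.log_params(params)        # e.g. {"alpha": 0.1}
        mlflow.log_metric("rmse", rmse)  # one scalar metric for this run
        return run.info.run_id
```

A call such as `log_training_run({"alpha": 0.1}, 0.42)` would record a run that can then be compared with other runs in the experiment UI.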

๐ŸŒ Does Databricks support SQL?
Yes, with Databricks SQL users can run interactive queries, visualize results, and build dashboards using familiar SQL syntax.

📂 What are Workspaces?
Workspaces are environments in Databricks where teams collaborate and share notebooks, libraries, and data assets.

⚡ Is Databricks multi-cloud?
Yes. Databricks runs on AWS, Microsoft Azure, and Google Cloud, with a largely consistent feature set across clouds.