
Data Science Notebook


Last updated 3 years ago

Data Science Notebooks provide an interface for interactive computing: users write and execute code, visualize the output, and share the results with others. Notebooks are widely used by data scientists for data analysis and exploration.
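The core idea of interactive computing is that code runs in small steps (cells) against state that persists between them, with each step's output shown right away. The sketch below is a simplified illustration of that idea only. It is not how Jupyter is implemented: real kernels such as ipykernel run in a separate process and communicate with the front end over the Jupyter messaging protocol. The function name `run_cells` is hypothetical.

```python
import contextlib
import io


def run_cells(cells):
    """Execute code cells in order against one shared namespace.

    Simplified illustration only: a real Jupyter kernel runs in its own
    process and exchanges messages with the notebook front end.
    """
    namespace = {}  # shared state, like variables kept alive in a notebook session
    outputs = []
    for source in cells:
        buffer = io.StringIO()
        # Capture whatever the cell prints so it can be displayed under the cell
        with contextlib.redirect_stdout(buffer):
            exec(source, namespace)
        outputs.append(buffer.getvalue())
    return namespace, outputs


if __name__ == "__main__":
    cells = [
        "prices = [12.5, 13.0, 12.75]",
        "avg = sum(prices) / len(prices)",
        "print(f'average price: {avg:.2f}')",
    ]
    _, outputs = run_cells(cells)
    print(outputs[-1], end="")
```

Note that the third cell can use `avg` because the second cell's result is still held in the shared namespace, which is what lets notebook users explore data step by step.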

Jupyter Notebook

Jupyter Notebook is a popular web application for creating and sharing documents that contain live code, text, equations, and visualizations. It is Python-based but not limited to Python.
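A notebook document is stored as a JSON file with the `.ipynb` extension, made up of a list of cells (for example, markdown cells for text and equations, and code cells for live code). The sketch below builds a minimal notebook in what we understand to be the nbformat version 4 layout; the helper name `make_notebook` and the file name `example.ipynb` are hypothetical, and the exact schema should be checked against the nbformat documentation for your Jupyter version.

```python
import json
import uuid


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook document in the nbformat v4 JSON layout."""
    cells = [
        {
            "id": uuid.uuid4().hex[:8],
            "cell_type": "markdown",  # narrative text; may include LaTeX equations
            "metadata": {},
            "source": markdown_text,
        },
        {
            "id": uuid.uuid4().hex[:8],
            "cell_type": "code",  # live code, executed by the kernel
            "execution_count": None,  # None means the cell has not been run yet
            "metadata": {},
            "outputs": [],  # Jupyter fills this in when the cell runs
            "source": code_text,
        },
    ]
    return {
        "cells": cells,
        "metadata": {
            "kernelspec": {"name": "python3", "display_name": "Python 3", "language": "python"}
        },
        "nbformat": 4,
        "nbformat_minor": 5,
    }


if __name__ == "__main__":
    nb = make_notebook("# Growth\nRate: $g = (x_1 - x_0) / x_0$", "print(1 + 1)")
    # The resulting file can be opened in Jupyter Notebook or JupyterLab
    with open("example.ipynb", "w", encoding="utf-8") as f:
        json.dump(nb, f, indent=1)
```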

Jupyter Notebook can be used for the following use cases:

  • Data cleaning and transformation

  • Statistical modeling

  • Data visualization

  • Numerical simulation

  • Machine learning

For more information, visit https://jupyter.org/.
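As a small illustration of the data cleaning and statistical modeling use cases, the cell below parses a CSV extract, drops rows with missing or non-numeric values, and computes descriptive statistics. The sample data and the function names `clean_revenue` and `summarize` are hypothetical; in practice a notebook would typically load data from a file or database and might use libraries such as pandas instead of the standard library used here.

```python
import csv
import io
import statistics

# Hypothetical sample data with a blank value and a non-numeric value
RAW = """region,revenue
north,120.5
south,
east,98.0
west,n/a
north,130.5
"""


def clean_revenue(text):
    """Parse CSV text and keep only rows whose revenue is numeric."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        try:
            record["revenue"] = float(record["revenue"])
        except ValueError:
            continue  # skip blank or non-numeric entries such as "" or "n/a"
        rows.append(record)
    return rows


def summarize(rows):
    """Return simple descriptive statistics for the cleaned revenue column."""
    values = [r["revenue"] for r in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


if __name__ == "__main__":
    print(summarize(clean_revenue(RAW)))
```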

JupyterLab

JupyterLab is a web-based interactive development environment (IDE) for Jupyter notebooks, code, and data. Its configurable user interface can be adapted to support a variety of workflows in data science and scientific computing. JupyterLab is also modular and extensible: developers can write plugins that add new components and integrate with existing ones.

For details, refer to https://jupyterlab.readthedocs.io/en/stable/.
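One way an administrator can pre-configure the JupyterLab interface for every user of an environment is an `overrides.json` file, which sets new default values for plugin settings. The sketch below merges such overrides into that file. The default location (`share/jupyter/lab/settings` under the Python environment prefix) and the example plugin ID are our understanding of the JupyterLab user-settings documentation and should be verified for your JupyterLab version; the function name `write_overrides` is hypothetical.

```python
import json
import os
import sys

# Assumed app-level settings directory for the active Python environment;
# check it against the application directory reported by `jupyter lab path`.
DEFAULT_SETTINGS_DIR = os.path.join(sys.prefix, "share", "jupyter", "lab", "settings")


def write_overrides(overrides, settings_dir=DEFAULT_SETTINGS_DIR):
    """Merge plugin setting overrides into overrides.json and return the result."""
    os.makedirs(settings_dir, exist_ok=True)
    path = os.path.join(settings_dir, "overrides.json")
    current = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            current = json.load(f)
    for plugin_id, settings in overrides.items():
        # Keys are plugin IDs; values are the new defaults for that plugin's settings
        current.setdefault(plugin_id, {}).update(settings)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(current, f, indent=2)
    return current
```

For example, `write_overrides({"@jupyterlab/apputils-extension:themes": {"theme": "JupyterLab Dark"}})` would make the dark theme the default, while still letting individual users change it in their own settings.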

JupyterHub

JupyterHub brings the power of Jupyter notebooks to multiple users. Because it runs as a managed environment, users get access to computational environments and resources without having to set up and maintain them locally. Each user has a separate workspace, while the server runs on shared resources that system administrators can manage centrally.
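Central management is exposed through the JupyterHub REST API, which administrators can use to list users or start a user's notebook server. The sketch below only builds the HTTP requests with the standard library; it assumes the Hub is reachable at a URL such as `http://localhost:8000` without a custom base URL, and that you hold an API token with sufficient permissions (for example, one created from the Hub's `/hub/token` page). The helper names are hypothetical.

```python
import urllib.parse
import urllib.request


def hub_request(hub_url, path, token, method="GET"):
    """Build an authenticated request for the JupyterHub REST API."""
    url = f"{hub_url.rstrip('/')}/hub/api{path}"
    # The Hub API accepts tokens in an "Authorization: token <token>" header
    return urllib.request.Request(
        url, method=method, headers={"Authorization": f"token {token}"}
    )


def list_users_request(hub_url, token):
    # GET /hub/api/users returns a JSON list of user models
    return hub_request(hub_url, "/users", token)


def start_server_request(hub_url, token, username):
    # POST /hub/api/users/{name}/server asks the Hub to spawn that user's server
    name = urllib.parse.quote(username, safe="")
    return hub_request(hub_url, f"/users/{name}/server", token, method="POST")
```

To send a request, pass it to `urllib.request.urlopen`, for example `json.load(urllib.request.urlopen(list_users_request("http://localhost:8000", token)))` to read the user list.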

Invariant data science notebooks can run either in the cloud or on premises. This makes it possible to serve a pre-configured environment that can be customized and scaled to suit your needs.

For more information about the Jupyter project, see

https://jupyter.org/
https://jupyterlab.readthedocs.io/en/stable/
https://jupyter.org/hub