Apache Spark

Data Processing Engine

Unified analytics engine for large-scale data processing and machine learning.

Open Source Project
Founded 2009

Product Overview

Apache Spark is an open-source unified analytics engine for large-scale data processing. It runs batch processing, stream processing, machine learning, and graph processing workloads across distributed clusters, keeping intermediate data in memory where possible.

Unique Value Proposition

Apache Spark speeds up processing through in-memory computing and optimized query execution. The project has cited speedups of up to 100x over Hadoop MapReduce for some in-memory workloads, such as iterative algorithms; actual gains depend on the workload and the cluster. Its unified engine supports workloads from ETL to machine learning in a single framework, with APIs in Java, Scala, Python, and R.

Categories

Big Data Integration Platforms
Analytics
Data Processing

Target Market

Industries

Technology
Financial Services
E-commerce
Telecommunications
Healthcare
Manufacturing

Company Size

100 to 50,000 employees

Pricing Information

Pricing Model

Open source (free to use under the Apache License 2.0)

Key Features

In-Memory Processing
Resilient Distributed Datasets (RDDs)
DataFrame API
Structured Streaming
MLlib Machine Learning
GraphX Graph Processing
Spark SQL
Catalyst Optimizer
Tungsten Execution Engine
Dynamic Resource Allocation
Fault Tolerance
Lazy Evaluation
DAG Scheduler
Broadcast Variables
Accumulators
Checkpointing
Data Source APIs
UDF Support
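
Several of the features above (Lazy Evaluation, the DAG Scheduler, and lineage-based Fault Tolerance) rest on one idea: transformations are recorded rather than run, and work happens only when an action asks for a result. The sketch below is a toy model of that idea in plain Python. It is not Spark's API; the class name LazyDataset and its fields are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not Spark's API)."""

    source: Tuple[int, ...]
    # Recorded steps in order; each entry is ("map" | "filter", function).
    lineage: Tuple[Tuple[str, Callable], ...] = ()

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: record the step and return a new dataset; nothing runs.
        return LazyDataset(self.source, self.lineage + (("map", fn),))

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: record the step and return a new dataset; nothing runs.
        return LazyDataset(self.source, self.lineage + (("filter", pred),))

    def collect(self) -> list:
        # Action: replay the recorded lineage over the source data.
        rows = list(self.source)
        for kind, fn in self.lineage:
            if kind == "map":
                rows = [fn(r) for r in rows]
            else:
                rows = [r for r in rows if fn(r)]
        return rows


calls = []


def double(x: int) -> int:
    calls.append(x)  # track when the function actually executes
    return x * 2


ds = LazyDataset((1, 2, 3, 4)).map(double).filter(lambda x: x > 4)
ran_before_action = list(calls)  # still empty: only the plan exists so far
result = ds.collect()            # the action triggers execution
```

Because the dataset holds only its source and its list of steps, calling collect() again recomputes the same result from scratch. That is the intuition behind recovering lost data from lineage instead of from replicas; a real engine also partitions the data, schedules stages across a cluster, and optimizes the plan, none of which this sketch attempts.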

Integrations

Hadoop HDFS
Apache Kafka
Apache Cassandra
Amazon S3
Azure Data Lake
Google Cloud Storage
Delta Lake
Apache Hive
Apache HBase
PostgreSQL
MySQL
MongoDB
Elasticsearch
Apache Parquet
Apache Avro
Apache ORC
Kubernetes
YARN
Apache Mesos (deprecated since Spark 3.2)
Databricks

Security Features

Authentication
SSL/TLS Encryption
Network Encryption
Kerberos Support
Access Control Lists
Column-level Security
Row-level Security
Audit Logging

Implementation & Support

Implementation Time

About 30 days (roughly 4 to 5 weeks)

Deployment Options

Cloud
On-Premise
Hybrid

Support Hours

Community Support