Highly Parallel Programming with Apache Spark

Tutorials – Apache Spark

Article from Issue 202/2017

Author(s): Ben Everard

Churn through lots of data with cluster computing on Apache's Spark platform.

As a society, we're creating more data than ever before. We're monitoring everything from the planet's weather to the performance of our computers, and we're storing all of this information. But how do you process it all? On a single machine, you can get a few terabytes of disk space and a few hundred gigabytes of memory (at least, you can if your pockets are deep enough), but how do you churn through a petabyte of raw ones and zeros? Basically, you're going to need more than one computer, along with a way to run your programs on many machines at the same time. That's where Apache Spark [1] comes in.

Before you run off and buy a rack of servers, slow down! We're going to start by introducing Spark on a single machine. Once you've mastered the basics, you can scale up.

Spark is a data processing engine that is often used with Hadoop for managing large amounts of data in a highly distributed manner. If you move forward with Spark, you're probably going to end up with a complete Hadoop setup; however, that's also getting ahead of ourselves. We can start Spark as a standalone service on a single computer.
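The excerpt doesn't show any Spark code, so the following is only a rough, hypothetical illustration of the idea Spark is built around: split the data into partitions, process each partition independently, then merge the partial results. It is plain Python, not Spark's API, and the function names `partition`, `count_words`, and `word_count` are made up for this example. It runs on one machine with a pool of threads, which is loosely analogous to Spark's local mode, where a master URL of `local[*]` runs one worker thread per logical core.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partition(items, n):
    """Split items round-robin into n slices (Spark calls these partitions)."""
    return [items[i::n] for i in range(n)]


def count_words(lines):
    """Map step: count the words in one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, workers=4):
    """Count words by processing partitions concurrently, then merging the results."""
    parts = partition(lines, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, parts))
    # Reduce step: add the per-partition counts together into one total.
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    text = [
        "Spark spreads work across workers",
        "each worker handles one partition",
        "Spark then merges the partial results",
    ]
    print(word_count(text, workers=2).most_common(3))
```

In PySpark, the same pipeline is usually expressed with the RDD methods `flatMap`, `map`, and `reduceByKey`. The key difference is that Spark can ship each partition to a different machine in a cluster, whereas this sketch never leaves one process. Because of CPython's global interpreter lock, the threads here illustrate the structure of the computation rather than deliver a real speedup.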

[...]