Linux Journal

A Good Front End for R

3 weeks 4 days ago
Joey Bernard | Thu, 04/26/2018 - 09:30 | Data Analysis, R, Science

R is the de facto statistical package in the Open Source world. It's also quickly becoming the default data-analysis tool in many scientific disciplines.

R's core design includes a central processing engine that runs your code, with a very simple interface to the outside world. That simple interface makes it easy to build graphical front ends that wrap R's core engine, so there are plenty of GUI options to choose from.

In this article, I look at one of the available GUIs: RStudio. RStudio is a commercial program, with a free community version, available for Linux, macOS and Windows, so your data-analysis work should port easily regardless of environment.

For Linux, you can install the main RStudio package from the download page. From there, you can download RPM files for Red Hat-based distributions or DEB files for Debian-based distributions, then use either rpm or dpkg to do the installation.

For example, in Debian-based distributions, use the following to install RStudio:

sudo dpkg -i rstudio-xenial-1.1.423-amd64.deb

It's important to note that RStudio is only the GUI interface. This means you need to install R itself as a separate step. Install the core parts of R with:

sudo apt-get install r-base

There's also a community repository of available packages, called CRAN, that can add huge amounts of functionality to R. You'll want to install at least some of them in order to have some common tools to use:

sudo apt-get install r-recommended

There are equivalent commands for RPM-based distributions too.
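For example, on Fedora, or on Red Hat-based systems with the EPEL repository enabled, the following should pull in the core of R (package names can differ slightly between distributions, so treat this as a sketch rather than a definitive recipe):

sudo dnf install R      # Fedora
sudo yum install R      # RHEL/CentOS, with EPEL enabled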

At this point, you should have a complete system to do some data analysis.

When you first start RStudio, you'll see a window that looks somewhat like Figure 1.

Figure 1. RStudio creates a new session, including a console interface to R, where you can start your work.

The main pane of the window, on the left-hand side, provides a console interface where you can interact directly with the R session that's running in the back end.

The right-hand side is divided into two sections, where each section has multiple tabs. The default tab in the top section is an environment pane. Here, you'll see all the objects that have been created and exist within the current R session.

The other two tabs provide the history of every command given and a list of any connections to external data sources.

The bottom pane has five tabs available. The default tab gives you a file listing of the current working directory. The second tab provides a plot window where any data plots you generate are displayed. The third tab provides a nicely ordered view into R's library system. It shows a list of all of the currently installed libraries, along with tools to manage updates and install new libraries. The fourth tab is the help viewer. R includes a very complete and robust help system modeled on Linux man pages. The last tab is a general "viewer" pane to view other types of objects.

One part of RStudio that's a great help to people managing multiple areas of research is the ability to use projects. Clicking the menu item File→New Project pops up a window where you can select how your new project will exist on the filesystem.

Figure 2. When you create a new project, it can be created in a new directory, an existing directory or be checked out from a code repository.

As an example, let's create a new project hosted in a local directory. The file display in the bottom-right pane changes to the new directory, and you should see a new file named after the project name, with the filename ending .Rproj. This file contains the configuration for your new project. Although you can interact with the R session directly through the console, doing so doesn't really lead to easily reproduced workflows. A better solution, especially within a project, is to open a script editor and write your code within a script file. This way you automatically have a starting point when you move beyond the development phase of your research.

When you click File→New File→R Script, a new pane opens in the top left-hand side of the window.

Figure 3. The script editor allows you to construct more complicated pieces of code than is possible using just the console interface.

From here, you can write your R code with all the standard tools you'd expect in a code editor. To execute this code, you have two options. The first is simply to click the run button in the top right of this editor pane. This will run either the single line where the cursor is located or an entire block of code that previously had been highlighted.

Figure 4. You can enter code in the script editor and then have it run, making code development and data analysis a bit easier on your brain.

If you have an entire script file that you want to run as a whole, you can click the source button in the top right of the editor pane. This lets you reproduce analysis that was done at an earlier time.

The last item to mention is data visualization in RStudio. Actually, data visualization is handled by libraries within R itself. The core of R includes very complete, and complex, graphics capabilities. For normal humans, several libraries are built on top of this. One of the most popular, and for good reason, is ggplot2. If it isn't already installed on your system, you can get it with:

install.packages(c('ggplot2'))

Once it's installed, you can make a simple scatter plot of two numeric vectors, a and b, with this:

library(ggplot2)
a <- rnorm(100)   # example data: any two equal-length
b <- rnorm(100)   # numeric vectors will do
c <- data.frame(x=a, y=b)
ggplot(c, aes(x=x, y=y)) + geom_point()

As you can see, ggplot takes dataframes as the data to plot, and you control the display with aes() function calls and geom function calls. In this case, I used the geom_point() function to get a scatter plot of points. The plot then is generated in the bottom-left pane.

Figure 5. ggplot2 is one of the most powerful and popular graphing tools available in the R environment.

There's a lot more functionality available in RStudio, including a server portion that can be run on a cluster, allowing you to develop code locally and then send it off to a server for the actual processing.

Joey Bernard

Official Ubuntu 18.04 LTS Release, Gmail Redesign, New Cinnamon 3.8 Desktop and More

3 weeks 4 days ago
News Distributions Google Ubuntu Desktop Amazon

News briefs for April 26, 2018.

Ubuntu 18.04 "Bionic Beaver" LTS is scheduled to be released officially today. This release features major changes, including kernel version updated to 4.15, GNOME instead of Unity, Python 2 no longer installed by default and much more. According to the release announcement "Ubuntu Server 18.04 LTS includes the Queens release of OpenStack including the clustering enabled LXD 3.0, new network configuration via netplan.io, and a next-generation fast server installer. See the Release Notes for download links.

Google's new Gmail redesign launched yesterday, with several new privacy features including a new "confidential mode", which allows users to set an expiration date for private email, and "integrated rights management", which lets users block forwarding, copying, downloading or printing certain messages. See the story on The Verge for more information.

openSUSE Tumbleweed had four snapshots released this week, including new updates for the kernel, Mesa, KDE Frameworks and a major version update of libglvnd. See the post on openSUSE for all the details.

The Cinnamon 3.8 desktop has been released and is already available in some repositories, including Arch Linux. Cinnamon 3.8, which is scheduled to ship with Linux Mint 19 "Tara" later this summer, "brings numerous improvements, new features, and lots of Python 3 ports for a bunch of components." (Source: Softpedia News.)

Attackers subverted Amazon's domain-resolution service on Tuesday, and according to an Ars Technica report, "masqueraded as cryptocurrency website MyEtherWallet.com and stole about $150,000 in digital coins from unwitting end users. They may have targeted other Amazon customers as well."

Jill Franklin

ONNX: the Open Neural Network Exchange Format

3 weeks 5 days ago
Braddock Gaskill | Wed, 04/25/2018 - 09:19 | Deep Learning, HPC, Machine Learning, ONNX, open source, python

An open-source battle is being waged for the soul of artificial intelligence. It is being fought by industry titans, universities and communities of machine-learning researchers world-wide. This article chronicles one small skirmish in that fight: a standardized file format for neural networks. At stake is the open exchange of data among a multitude of tools instead of competing monolithic frameworks.

The good news is that the battleground is Free and Open. None of the big players are pushing closed-source solutions. Whether it is Keras and TensorFlow backed by Google, MXNet by Apache endorsed by Amazon, or Caffe2 or PyTorch supported by Facebook, all solutions are open-source software.

Unfortunately, while these projects are open, they are not interoperable. Each framework constitutes a complete stack that until recently could not interface in any way with any other framework. A new industry-backed standard, the Open Neural Network Exchange format, could change that.

Now, imagine a world where you can train a neural network in Keras, run the trained model through the NNVM optimizing compiler and deploy it to production on MXNet. And imagine that is just one of countless combinations of interoperable deep learning tools, including visualizations, performance profilers and optimizers. Researchers and DevOps no longer need to compromise on a single toolchain that provides a mediocre modeling environment and so-so deployment performance.

What is required is a standardized format that can express any machine-learning model and store trained parameters and weights, readable and writable by a suite of independently developed software.

Enter the Open Neural Network Exchange Format (ONNX).

The Vision

To understand the drastic need for interoperability with a standard like ONNX, we first must understand the ridiculous requirements we have for existing monolithic frameworks.

A casual user of a deep learning framework may think of it as a language for specifying a neural network. For example, I want 100 input neurons, three fully connected layers each with 50 ReLU outputs, and a softmax on the output. My framework of choice has a domain language to specify this (like Caffe) or bindings to a language like Python with a clear API.

However, the specification of the network architecture is only the tip of the iceberg. Once a network structure is defined, the framework still has a great deal of complex work to do to make it run on your CPU or GPU cluster.

Python, obviously, doesn't run on a GPU. To make your network definition run on a GPU, it needs to be compiled into code for the CUDA (NVIDIA) or OpenCL (AMD and Intel) APIs or processed in an efficient way if running on a CPU. This compilation is complex, which is why most frameworks don't support both NVIDIA and AMD GPU back ends.

The job is still not complete though. Your framework also has to balance resource allocation and parallelism for the hardware you are using. Are you running on a Titan X card with more than 3,000 compute cores, or a GTX 1060 with far less than half as many? Does your card have 16GB of RAM or only 4? All of this affects how the computations must be optimized and run.

And still it gets worse. Do you have a cluster of 50 multi-GPU machines on which to train your network? Your framework needs to handle that too. Network protocols, efficient allocation, parameter sharing—how much can you ask of a single framework?

Now you say you want to deploy to production? You wish to scale your cluster automatically? You want a solid language with secure APIs?

When you add it all up, it seems absolutely insane to ask one monolithic project to handle all of those requirements. You cannot expect the authors who write the perfect network definition language to be the same authors who integrate deployment systems in Kubernetes or write optimal CUDA compilers.

The goal of ONNX is to break up the monolithic frameworks. Let an ecosystem of contributors develop each of these components, glued together by a common specification format.

The Ecosystem (and Politics)

Interoperability is a healthy sign of an open ecosystem. Unfortunately, until recently, it did not exist for deep learning. Every framework had its own format for storing computation graphs and trained models.

Late last year that started to change. The Open Neural Network Exchange format initiative was launched by Facebook, Amazon and Microsoft, with support from AMD, ARM, IBM, Intel, Huawei, NVIDIA and Qualcomm. Let me rephrase that: everyone but Google. The format has been included in most well-known frameworks except Google's TensorFlow (for which a third-party converter exists).

This seems to be the classic scenario where the clear market leader, Google, has little interest in upending its dominance for the sake of openness. The smaller players are banding together to counter the 500-pound gorilla.

Google is committed to its own TensorFlow model and weight file format, SavedModel, which shares much of the functionality of ONNX. Google is building its own ecosystem around that format, including TensorFlow Serving, Estimator and Tensor2Tensor, to name a few.

The ONNX Solution

Building a single file format that can express all of the capabilities of all the deep learning frameworks is no trivial feat. How do you describe convolutions or recurrent networks with memory? Attention mechanisms? Dropout layers? What about embeddings and nearest neighbor algorithms found in fastText or StarSpace?

ONNX cribs a note from TensorFlow and declares everything is a graph of tensor operations. That statement alone is not sufficient, however. Dozens, perhaps hundreds, of operations must be supported, not all of which will be supported by all other tools and frameworks. Some frameworks may also implement an operation differently from their brethren.

There has been considerable debate in the ONNX community about what level tensor operations should be modeled at. Should ONNX be a mathematical toolbox that can support arbitrary equations with primitives such as sine and multiplication, or should it support higher-level constructs like integrated GRU units or Layer Normalization as single monolithic operations?

As it stands, ONNX currently defines about 100 operations. They range in complexity from arithmetic addition to a complete Long Short-Term Memory implementation. Not all tools support all operations, so just because you can generate an ONNX file of your model does not mean it will run anywhere.

Generation of an ONNX model file also can be awkward in some frameworks because it relies on a rigid definition of the order of operations in a graph structure. For example, PyTorch boasts a very pythonic imperative experience when defining models. You can use Python logic to lay out your model's flow, but you do not define a rigid graph structure as in other frameworks like TensorFlow. So there is no graph of operations to save; you actually have to run the model and trace the operations. The trace of operations is saved to the ONNX file.

Conclusion

It is early days for deep learning interoperability. Most users still pick a framework and stick with it. And an increasing number of users are going with TensorFlow. Google throws many resources and real-world production experience at it—it is hard to resist.

All frameworks are strong in some areas and weak in others. Every new framework must re-implement the full "stack" of functionality. Break up the stack, and you can play to the strengths of individual tools. That will lead to a healthier ecosystem.

ONNX is a step in the right direction.

Note: the ONNX GitHub page is here.

Disclaimer

Braddock Gaskill is a research scientist with eBay Inc. He contributed to this article in his personal capacity. The views expressed are his own and do not necessarily represent the views of eBay Inc.

About the Author

Braddock Gaskill has 25 years of experience in AI and algorithmic software development. He also co-founded the Internet-in-a-Box open-source project and developed the libre Humane Wikipedia Reader for getting content to students in the developing world.

Braddock Gaskill

Purism Partners with UBports to Offer Ubuntu Touch on the Librem 5, Red Hat Storage One Launches and More

3 weeks 5 days ago
News Purism Mobile Red Hat Storage Community

News briefs for April 25, 2018.

Purism has partnered with UBports to offer Ubuntu Touch on its Librem 5 smartphone. By default, the smartphone runs Purism's PureOS, which supports GNOME and KDE Plasma mobile interfaces. UBports is ensuring Ubuntu Touch will run on the phones as well, so the Librem 5 can "now offer users three fully free and open mobile operating system options".

Red Hat today announced the general availability of Red Hat Storage One, a "new approach to software-defined storage aimed at providing pre-configured, workload optimized systems on a number of hardware choices." Features include ease of installation, workload and hardware optimization, flexible scalability and cost-effectiveness.

Microsoft yesterday announced vcpkg, a single C++ library manager for Linux, macOS and Windows: "This gives you immediate access to the vcpkg catalog of C++ libraries on two new platforms, with the same simple steps you are familiar with on Windows and UWP today."

The Linux Foundation and Dice are looking for people to take their open-source job surveys: "this is your chance to let companies, HR and hiring managers and industry organizations know what motivates you as an open source professional." There are two surveys, one for open-source professionals and one specifically for hiring managers.

Outreachy, which provides three-month internships for people from groups traditionally underrepresented in tech, has announced its accepted summer interns. Out of 1,264 applicants, 44 were chosen. Here's a list of all the interns and their projects. If you're interested in participating, the next round of Outreachy internships opens in September 2018 for the December 2018 to March 2019 internship round.

Jill Franklin

Lying with statistics, distributions, and popularity contests on Cooking With Linux (without a net)

3 weeks 6 days ago

Please support Linux Journal by subscribing or becoming a patron.

It's Tuesday and that means it's time for Cooking With Linux (without a net), sponsored and supported by Linux Journal. Today, I'm courting controversy by discussing numbers, OS popularity, and how to pick the right Linux distribution if you want to be where the beautiful people hang out. And yes, I'll do it all live, without a net, and with a high probability of falling flat on my face.

Cooking with Linux Distributions
Marcel Gagné

Blockchain, Part II: Configuring a Blockchain Network and Leveraging the Technology

3 weeks 6 days ago
Petros Koutoupis | Tue, 04/24/2018 - 11:30 | Blockchain, HOW-TOs, Cryptocurrency, Cryptomining

How to set up a private ethereum blockchain using open-source tools and a look at some markets and industries where blockchain technologies can add value.

In Part I, I spent quite a bit of time exploring cryptocurrency and the mechanism that makes it possible: the blockchain. I covered details on how the blockchain works and why it is so secure and powerful. In this second part, I describe how to set up and configure your very own private ethereum blockchain using open-source tools. I also look at where this technology can bring some value or help redefine how people transact across a more open web.

Setting Up Your Very Own Private Blockchain Network

In this section, I explore the mechanics of an ethereum-based blockchain network—specifically, how to create a private ethereum blockchain, a private network to host and share this blockchain, an account, and then how to do some interesting things with the blockchain.

What is ethereum, again? Ethereum is an open-source and public blockchain platform featuring smart contract (that is, scripting) functionality. It is similar to bitcoin but differs in that it extends beyond monetary transactions.

Smart contracts are written in programming languages, such as Solidity (similar to C and JavaScript), Serpent (similar to Python), LLL (a Lisp-like language) and Mutan (Go-based). Smart contracts are compiled into EVM (see below) bytecode and deployed across the ethereum blockchain for execution. Smart contracts help in the exchange of money, property, shares or anything of value, and they do so in a transparent and conflict-free way, avoiding the traditional middleman.

If you recall from Part I, a typical layout for any blockchain is one where all nodes are connected to every other node, creating a mesh. In the world of ethereum, these nodes are referred to as Ethereum Virtual Machines (EVMs), and each EVM will host a copy of the entire blockchain. Each EVM also will compete to mine the next block or validate a transaction. Once the new block is appended to the blockchain, the updates are propagated to the entire network, so that each node is synchronized.

In order to become an EVM node on an ethereum network, you'll need to download and install the proper software. To accomplish this, you'll be using Geth (Go Ethereum). Geth is the official Go implementation of the ethereum protocol. It is one of three such implementations; the other two are written in C++ and Python. These open-source software packages are licensed under the GNU Lesser General Public License (LGPL) version 3. The standalone Geth client packages for all supported operating systems and architectures, including Linux, are available here. The source code for the package is hosted on GitHub.

Geth is a command-line interface (CLI) tool that's used to communicate with the ethereum network. It's designed to act as a link between your computer and all other nodes across the ethereum network. When a block is being mined by another node on the network, your Geth installation will be notified of the update and then pass the information along to update your local copy of the blockchain. With the Geth utility, you'll be able to mine ether (similar to bitcoin but the cryptocurrency of the ethereum network), transfer funds between two addresses, create smart contracts and more.

Download and Installation

In my examples here, I'm configuring this ethereum blockchain on the latest LTS release of Ubuntu. Note that the tools themselves are not restricted to this distribution or release.

Downloading and Installing the Binary from the Project Website

Download the latest stable release, extract it and copy it to a proper directory:

$ wget https://gethstore.blob.core.windows.net/builds/geth-linux-amd64-1.7.3-4bb3c89d.tar.gz
$ tar xzf geth-linux-amd64-1.7.3-4bb3c89d.tar.gz
$ cd geth-linux-amd64-1.7.3-4bb3c89d/
$ sudo cp geth /usr/bin/

Building from Source Code

If you are building from source code, you need to install both Go and C compilers:

$ sudo apt-get install -y build-essential golang

Change into the directory and do:

$ make geth
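The Makefile lives at the top of the go-ethereum source tree, so if you haven't already cloned the GitHub repository mentioned above, the full sequence looks roughly like this:

$ git clone https://github.com/ethereum/go-ethereum
$ cd go-ethereum
$ make geth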

Installing from a Public Repository

If you are running on Ubuntu and decide to install the package from a public repository, run the following commands:

$ sudo apt-get install software-properties-common
$ sudo add-apt-repository -y ppa:ethereum/ethereum
$ sudo apt-get update
$ sudo apt-get install ethereum

Getting Started

Here's the thing: you don't have any ether to start with. With that in mind, let's limit this deployment to a "private" blockchain network that will sort of run as a development or staging version of the main ethereum network. From a functionality standpoint, this private network will be identical to the main blockchain, with the exception that all transactions and smart contracts deployed on this network will be accessible only to the nodes connected in this private network. Geth will aid in this private or "testnet" setup. Using the tool, you'll be able to do everything the ethereum platform advertises, without needing real ether.

Remember, the blockchain is nothing more than a digital and public ledger preserving transactions in their chronological order. When new transactions are verified and configured into a block, the block is then appended to the chain, which is then distributed across the network. Every node on that network will update its local copy of the chain to the latest copy. But you need to start from some point—a beginning or a genesis. Every blockchain starts with a genesis block, that is, a block "zero" or the very first block of the chain. It will be the only block without a predecessor. To create your private blockchain, you need to create this genesis block. To do this, you need to create a custom genesis file and then tell Geth to use that file to create your own genesis block.

Create a directory path to host all of your ethereum-related data and configurations and change into the config subdirectory:

$ mkdir ~/eth-evm
$ cd ~/eth-evm
$ mkdir config data
$ cd config

Open your preferred text editor and save the following contents to a file named Genesis.json in that same directory:

{ "config": { "chainId": 999, "homesteadBlock": 0, "eip155Block": 0, "eip158Block": 0 }, "difficulty": "0x400", "gasLimit": "0x8000000", "alloc": {} }

This is what your genesis file will look like. This simple JSON-formatted string describes the following:

  • config — this block defines the settings for your custom chain.

  • chainId — this identifies your Blockchain, and because the main ethereum network has its own, you need to configure your own unique value for your private chain.

  • homesteadBlock — defines the version and protocol of the ethereum platform.

  • eip155Block / eip158Block — these fields add support for non-backward-compatible protocol changes to the Homestead version used. For the purposes of this example, you won't be leveraging these, so they are set to "0".

  • difficulty — this value controls block generation time of the blockchain. The higher the value, the more calculations a miner must perform to discover a valid block. Because this example is simply deploying a test network, let's keep this value low to reduce wait times.

  • gasLimit — gas is ethereum's fuel spent during transactions. As you do not want to be limited in your tests, keep this value high.

  • alloc — this section prefunds accounts, but because you'll be mining your ether locally, you don't need this option.

Now it's time to instantiate the data directory. Open a terminal window, and assuming you have the Geth binary installed and that it's accessible via your working path, type the following:

$ geth --datadir /home/petros/eth-evm/data/PrivateBlockchain init /home/petros/eth-evm/config/Genesis.json
WARN [02-10|15:11:41] No etherbase set and no accounts found as default
INFO [02-10|15:11:41] Allocated cache and file handles database=/home/petros/eth-evm/data/PrivateBlockchain/geth/chaindata cache=16 handles=16
INFO [02-10|15:11:41] Writing custom genesis block
INFO [02-10|15:11:41] Successfully wrote genesis state database=chaindata hash=d1a12d...4c8725
INFO [02-10|15:11:41] Allocated cache and file handles database=/home/petros/eth-evm/data/PrivateBlockchain/geth/lightchaindata cache=16 handles=16
INFO [02-10|15:11:41] Writing custom genesis block
INFO [02-10|15:11:41] Successfully wrote genesis state database=lightchaindata

The command will need to reference a working data directory to store your private chain data. Here, I have specified eth-evm/data/PrivateBlockchain subdirectories in my home directory. You'll also need to tell the utility to initialize using your genesis file.

This command populates your data directory with a tree of subdirectories and files:

$ ls -R /home/petros/eth-evm/
.:
config  data

./config:
Genesis.json

./data:
PrivateBlockchain

./data/PrivateBlockchain:
geth  keystore

./data/PrivateBlockchain/geth:
chaindata  lightchaindata  LOCK  nodekey  nodes  transactions.rlp

./data/PrivateBlockchain/geth/chaindata:
000002.ldb  000003.log  CURRENT  LOCK  LOG  MANIFEST-000004

./data/PrivateBlockchain/geth/lightchaindata:
000001.log  CURRENT  LOCK  LOG  MANIFEST-000000

./data/PrivateBlockchain/geth/nodes:
000001.log  CURRENT  LOCK  LOG  MANIFEST-000000

./data/PrivateBlockchain/keystore:

Your private blockchain is now created. The next step involves starting the private network that will allow you to mine new blocks and have them added to your blockchain. To do this, type:

petros@ubuntu-evm1:~/eth-evm$ geth --datadir /home/petros/eth-evm/data/PrivateBlockchain --networkid 9999
WARN [02-10|15:11:59] No etherbase set and no accounts found as default
INFO [02-10|15:11:59] Starting peer-to-peer node instance=Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9.2
INFO [02-10|15:11:59] Allocated cache and file handles database=/home/petros/eth-evm/data/PrivateBlockchain/geth/chaindata cache=128 handles=1024
WARN [02-10|15:11:59] Upgrading database to use lookup entries
INFO [02-10|15:11:59] Initialised chain configuration config="{ChainID: 999 Homestead: 0 DAO: DAOSupport: false EIP150: EIP155: 0 EIP158: 0 Byzantium: Engine: unknown}"
INFO [02-10|15:11:59] Disk storage enabled for ethash caches dir=/home/petros/eth-evm/data/PrivateBlockchain/geth/ethash count=3
INFO [02-10|15:11:59] Disk storage enabled for ethash DAGs dir=/home/petros/.ethash count=2
INFO [02-10|15:11:59] Initialising Ethereum protocol versions="[63 62]" network=9999
INFO [02-10|15:11:59] Database deduplication successful deduped=0
INFO [02-10|15:11:59] Loaded most recent local header number=0 hash=d1a12d...4c8725 td=1024
INFO [02-10|15:11:59] Loaded most recent local full block number=0 hash=d1a12d...4c8725 td=1024
INFO [02-10|15:11:59] Loaded most recent local fast block number=0 hash=d1a12d...4c8725 td=1024
INFO [02-10|15:11:59] Regenerated local transaction journal transactions=0 accounts=0
INFO [02-10|15:11:59] Starting P2P networking
INFO [02-10|15:12:01] UDP listener up self=enode://f51957cd4441f19d187f9601541d66dcbaf560770d3da1db4e71ce5ba3ebc66e60ffe73c2ff01ae683be0527b77c0f96a178e53b925968b7aea824796e36ad27@[::]:30303
INFO [02-10|15:12:01] IPC endpoint opened: /home/petros/eth-evm/data/PrivateBlockchain/geth.ipc
INFO [02-10|15:12:01] RLPx listener up self=enode://f51957cd4441f19d187f9601541d66dcbaf560770d3da1db4e71ce5ba3ebc66e60ffe73c2ff01ae683be0527b77c0f96a178e53b925968b7aea824796e36ad27@[::]:30303

Notice the use of the new parameter, networkid. This networkid helps ensure the privacy of your network. Any number can be used here. I have decided to use 9999. Note that other peers joining your network will need to use the same ID.

Your private network is now live! Remember, every time you need to access your private blockchain, you will need to use these last two commands with the exact same parameters (the Geth tool will not remember them for you):

$ geth --datadir /home/petros/eth-evm/data/PrivateBlockchain init /home/petros/eth-evm/config/Genesis.json
$ geth --datadir /home/petros/eth-evm/data/PrivateBlockchain --networkid 9999

Configuring a User Account

So, now that your private blockchain network is up and running, you can start interacting with it. But in order to do so, you need to attach to the running Geth process. Open a second terminal window. The following command will attach to the instance running in the first terminal window and bring you to a JavaScript console:

$ geth attach /home/petros/eth-evm/data/PrivateBlockchain/geth.ipc
Welcome to the Geth JavaScript console!

instance: Geth/v1.7.3-stable-4bb3c89d/linux-amd64/go1.9.2
modules: admin:1.0 debug:1.0 eth:1.0 miner:1.0 net:1.0 personal:1.0 rpc:1.0 txpool:1.0 web3:1.0

>

Time to create a new account that will manipulate the Blockchain network:

> personal.newAccount()
Passphrase:
Repeat passphrase:
"0x92619f0bf91c9a786b8e7570cc538995b163652d"

Remember this string. You'll need it shortly. If you forget this hexadecimal string, you can reprint it to the console by typing:

> eth.coinbase
"0x92619f0bf91c9a786b8e7570cc538995b163652d"

Check your ether balance by typing the following script:

> eth.getBalance("0x92619f0bf91c9a786b8e7570cc538995b163652d") 0

Here's another way to check your balance without needing to type the entire hexadecimal string:

> eth.getBalance(eth.coinbase)
0

Mining

Doing real mining in the main ethereum blockchain requires some very specialized hardware, such as dedicated Graphics Processing Units (GPU), like the ones found on the high-end graphics cards mentioned in Part I. However, since you're mining for blocks on a private chain with a low difficulty level, you can do without that requirement. To begin mining, run the following script on the JavaScript console:

> miner.start()
null

Updates in the First Terminal Window

You'll observe mining activity in the output logs displayed in the first terminal window:

INFO [02-10|15:14:47] Updated mining threads threads=0
INFO [02-10|15:14:47] Transaction pool price threshold updated price=18000000000
INFO [02-10|15:14:47] Starting mining operation
INFO [02-10|15:14:47] Commit new mining work number=1 txs=0 uncles=0 elapsed=186.855us
INFO [02-10|15:14:57] Generating DAG in progress epoch=1 percentage=0 elapsed=7.083s
INFO [02-10|15:14:59] Successfully sealed new block number=1 hash=c81539...dc9691
INFO [02-10|15:14:59] mined potential block number=1 hash=c81539...dc9691
INFO [02-10|15:14:59] Commit new mining work number=2 txs=0 uncles=0 elapsed=211.208us
INFO [02-10|15:15:04] Generating DAG in progress epoch=1 percentage=1 elapsed=13.690s
INFO [02-10|15:15:06] Successfully sealed new block number=2 hash=d26dda...e3b26c
INFO [02-10|15:15:06] mined potential block number=2 hash=d26dda...e3b26c
INFO [02-10|15:15:06] Commit new mining work number=3 txs=0 uncles=0 elapsed=510.357us
[ ... ]
INFO [02-10|15:15:52] Generating DAG in progress epoch=1 percentage=8 elapsed=1m2.166s
INFO [02-10|15:15:55] Successfully sealed new block number=15 hash=d7979f...e89610
INFO [02-10|15:15:55] block reached canonical chain number=10 hash=aedd46...913b66
INFO [02-10|15:15:55] mined potential block number=15 hash=d7979f...e89610
INFO [02-10|15:15:55] Commit new mining work number=16 txs=0 uncles=0 elapsed=105.111us
INFO [02-10|15:15:57] Successfully sealed new block number=16 hash=61cf68...b16bf2
INFO [02-10|15:15:57] block reached canonical chain number=11 hash=6b89ff...de8f88
INFO [02-10|15:15:57] mined potential block number=16 hash=61cf68...b16bf2
INFO [02-10|15:15:57] Commit new mining work number=17 txs=0 uncles=0 elapsed=147.31us

Back to the Second Terminal Window

Wait 10–20 seconds, and on the JavaScript console, start checking your balance:

> eth.getBalance(eth.coinbase)
10000000000000000000

Wait some more, and list it again:

> eth.getBalance(eth.coinbase)
75000000000000000000

Remember, this is fake ether, so don't open that bottle of champagne, yet. You are unable to use this ether in the main ethereum network.
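Incidentally, the balances the console prints are denominated in wei, the smallest unit of ether (1 ether = 10^18 wei), which is why the numbers look so large; the 75000000000000000000 above is simply 75 ether. If you'd rather see the balance expressed in ether, the console's web3 helper can do the conversion for you. Here's a minimal sketch run from the host shell, assuming the same IPC path used earlier:

$ geth --exec 'web3.fromWei(eth.getBalance(eth.coinbase), "ether")' \
       attach /home/petros/eth-evm/data/PrivateBlockchain/geth.ipc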

To stop the miner, invoke the following script:

> miner.stop()
true

Well, you did it. You created your own private blockchain and mined some ether.

Who Will Benefit from This Technology Today and in the Future?

Although the blockchain originally was developed around cryptocurrency (more specifically, bitcoin), its uses don't end there. Today, it may seem like that's the case, but there are untapped industries and markets where blockchain technologies can redefine how transactions are processed. The following are some examples that come to mind.

Improving Smart Contracts

Ethereum, the same open-source blockchain project deployed earlier, already is doing the whole smart-contract thing, but the idea is still in its infancy, and as it matures, it will evolve to meet consumer demands. There's plenty of room for growth in this area. It probably will eventually creep into the governance of companies (such as verifying digital assets, equity and so on), trading stocks, handling intellectual property and managing property ownership, such as land title registration.

Enabling Market Places and Shared Economies

Think of eBay but refocused to be peer-to-peer. This would mean no more transaction fees, but it would also emphasize the importance of your personal reputation, since there will be no single body governing the market in which goods or services are being traded or exchanged.

Crowdfunding

Following in the same direction as my previous remarks about a decentralized marketplace, there also are opportunities for individuals or companies to raise the capital necessary to help "kickstart" their initiatives. Think of a more open and global Kickstarter or GoFundMe.

Multimedia Sharing or Hosting

A peer-to-peer network for aspiring or established musicians definitely could go a long way here—one where the content will reach its intended audiences directly and also avoid those hefty royalty costs paid out to the studios, record labels and content distributors. The same applies to video and image content.

File Storage and Data Management

By enabling a global peer-to-peer network, blockchain technology takes cloud computing to a whole new level. As the technology continues to push itself into existing cloud service markets, it will challenge traditional vendors, including Amazon AWS and even Dropbox and others—and it will do so at a fraction of the price. For example, cold storage data offerings are a multi-hundred billion dollar market today. By distributing your encrypted archives across a global and decentralized network, the need to maintain local data-center equipment by a single entity is reduced significantly.

Social media and how your posted content is managed would change under this model as well. Under the blockchain, Facebook or Twitter or anyone else cannot lay claim to what you choose to share.

Another added benefit of leveraging blockchain here is making use of cryptography to secure your valuable data against getting hacked or lost.

Internet of Things

What is the Internet of Things (IoT)? It is a broad term describing the networked management of very specific electronic devices, which include heating and cooling thermostats, lights, garage doors and more. Using a combination of software, sensors and networking facilities, people can easily enable an environment where they can automate and monitor home and/or business equipment.

Supply Chain Audits

With a distributed public ledger made available to consumers, retailers can't falsify claims made against their products. Consumers will have the ability to verify their sources, be it food, jewelry or anything else.

Identity Management

There isn't much to explain here. The threat is very real. Identity theft never takes a day off. The dated user name/password systems of today have run their course, and it's about time that existing authentication frameworks leverage the cryptographic capabilities offered by the blockchain.

Summary

This revolutionary technology has enabled organizations in ways that weren't possible a decade ago. Its possibilities are enormous, and it seems that any industry dealing with some sort of transaction-based model will be disrupted by the technology. It's only a matter of time until it happens.

Now, what will the future for blockchain look like? At this stage, it's difficult to say. One thing is for certain though; large companies, such as IBM, are investing big into the technology and building their own blockchain infrastructure that can be sold to and used by corporate enterprises and financial institutions. This may create some issues, however. As these large companies build their blockchain infrastructures, they will file for patents to protect their technologies. And with those patents in their arsenal, there exists the possibility that they may move aggressively against the competition in an attempt to discredit them and their value.

Anyway, if you will excuse me, I need to go make some crypto-coin.

Petros Koutoupis

Eclipse Foundation's New Open-Source Governance Model for Jakarta EE, Turris MOX Modular Router Campaign and More

3 weeks 6 days ago
News open source Cloud Android Google SUSE GNOME

News briefs for April 24, 2018.

The Eclipse Foundation announced today a new open-source governance model and "a 'cloud native Java' path forward for Jakarta EE, the new community-led platform created from the contribution of Java EE." According to the press release, with this move to the community-driven open-source governance model, "Jakarta EE promises faster release and innovation cycles." See https://jakarta.ee for more details or to join the Jakarta EE Working Group.

A Czech Republic company has launched an Indiegogo campaign to create a modular and open-source router called Turris MOX. The Turris MOX team claims it's "probably the first router configurable as easily as a sandwich. Choose the parts you actually want." (Source: It's FOSS.)

Amnesty International's Technology and Human Rights researcher Joe Westby recently said that Google "shows total contempt for Android users' privacy" in reference to the launch of its new "Chat" messaging service: "With its baffling decision to launch a messaging service without end-to-end encryption, Google has shown utter contempt for the privacy of Android users and handed a precious gift to cybercriminals and government spies alike, allowing them easy access to the content of Android users' communications." (Source: BleepingComputer.)

SUSE recently announced that SUSE Linux Enterprise High Performance Computing is in the SLE 15 Beta program: "as part of the (upcoming) major version of SUSE Linux Enterprise 15, SUSE Linux Enterprise High Performance Computing (HPC) is now a dedicated SLE product based on SLES 15. This product is available for the x86_64 and ARM aarch64 hardware platforms." For more info on the SLE 15 Public Beta program, visit here.

According to Phoronix, the GNOME shell memory leak has been fixed: "the changes are currently staged in Git for what will become GNOME 3.30 but might also be backported to 3.28". See GNOME developer Georges Stavracas' blog post for more details on the leak and its fix.

Jill Franklin

Heptio Announces Gimbal, Netflix Open-Sources Titus, Linux 4.15 Reaches End of Life and More

4 weeks ago
News Cloud Containers AWS Kubernetes OpenStack Gimbal kernel multimedia FFmpeg

News briefs for April 23, 2018.

Heptio this morning announced Gimbal, "an open source initiative to unify and scale the flow of network traffic into hybrid environments consisting of multiple Kubernetes clusters and traditional infrastructure technologies including OpenStack". The initiative is in collaboration with Actapio, a subsidiary of Yahoo Japan Corporation, and according to Craig McLuckie, founder and CEO of Heptio, "This collaboration demonstrates the full potential of cloud native technologies and open source as a way to not only manage applications, but address broader infrastructure considerations."

Netflix open-sources its Titus container management system. According to Christine Hall's DataCenter Knowledge article, Titus is tightly integrated with AWS, and it "launches as many as three million containers per week, to host thousands of applications over seven regionally isolated stacks across tens of thousands of EC2 virtual machines."

Linux 4.15 has reached end of life. Greg Kroah-Hartman announced on the LKML that if you're still using the 4.15 kernel series, it's time to upgrade to the 4.16.y kernel tree.

GNU Parallel 20180422 was released yesterday. GNU Parallel is "a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input". More info is available here.

FFmpeg has a new major release: version 4.0 "Wu". Some highlights include "bitstream filters for editing metadata in H.264, HEVC and MPEG-2 streams", experimental Magic YUV encoder, TiVo ty/ty+ demuxer and more.

Jill Franklin

Userspace Networking with DPDK

4 weeks ago
Rami Rosen | Mon, 04/23/2018 - 07:07 | Networking, kernel, Virtualization

DPDK is a fully open-source project that operates in userspace. It's a multi-vendor and multi-architecture project, and it aims at achieving high I/O performance and reaching high packet processing rates, which are some of the most important features in the networking arena. It was created by Intel in 2010 and moved to the Linux Foundation in April 2017. This move positioned it as one of the most dominant and most important open-source Linux projects. DPDK was created for the telecom/datacom infrastructure, but today, it's used almost everywhere, including the cloud, data centers, appliances, containers and more. In this article, I present a high-level overview of the project and discuss features that were released in DPDK 17.08 (August 2017).

Undoubtedly, a lot of effort in many networking projects is geared toward achieving high speed and high performance. Several factors contribute to achieving this goal with DPDK. One is that DPDK is a userspace application that bypasses the heavy layers of the Linux kernel networking stack and talks directly to the network hardware. Another factor is the use of memory hugepages. By using hugepages (of 2MB or 1GB in size), a smaller number of memory pages is needed than when using standard memory pages (which on many platforms are 4KB in size). As a result, the number of Translation Lookaside Buffer (TLB) misses is reduced significantly, and performance is increased. Yet another factor is that low-level optimizations are done in the code, some of them related to memory cache line alignment, aiming at achieving optimal cache use, prefetching and so on. (Delving into the technical details of those optimizations is outside the scope of this article.)
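Note that DPDK doesn't reserve hugepages for you; they have to be set aside on the host before a DPDK application starts. As a rough sketch (the page count and mountpoint here are only examples), reserving 2MB hugepages at runtime and mounting hugetlbfs looks something like this:

# reserve 1024 2MB hugepages (add hugepages=1024 to the kernel
# command line instead if the reservation should survive reboots)
echo 1024 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages

# mount hugetlbfs so DPDK applications can map the reserved pages
mkdir -p /mnt/huge
mount -t hugetlbfs nodev /mnt/huge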

DPDK has gained popularity in recent years, and it's used in many open-source projects. Many Linux distributions (Fedora, Ubuntu and others) have included DPDK support in their packaging systems as well.

The core DPDK ingredients are libraries and drivers, also known as Poll Mode Drivers (PMDs). There are more than 35 libraries at the time of this writing. These libraries abstract away the low-level implementation details, which provides flexibility as each vendor implements its own low-level layers.

The DPDK Development Model

DPDK is written mostly in C, but the project also has a few tools that are written in Python. All code contributions to DPDK are done by patches sent and discussed over the dpdk-dev mailing list. Patches aiming at getting feedback first are usually titled RFCs (Request For Comments). In order to keep the code as stable as possible, the preference is to preserve the ABI (Application Binary Interface) whenever possible. When it seems that there's no other choice, developers should follow a strict ABI deprecation process, including announcement of the requested ABI changes over the dpdk-dev mailing list ahead of time. The ABI changes that are approved and merged are documented in the Release Notes. When acceptance of new features is in doubt, but the respective patches are merged into the master tree anyway, they are tagged as "EXPERIMENTAL". This means that those patches may be changed or even could be removed without prior notice. Thus, for example, new rte_bus experimental APIs were added in DPDK 17.08. I also should note that usually whenever patches for a new, generic API (which should support multiple hardware devices from different vendors) are sent over the mailing list, it's expected that at least one hardware device that supports the new feature is available on the market (if the device is merely announced and not available, developers can't test it).

There's a technical board of nine members from various companies (Intel, NXP, 6WIND, Cavium and others). Meetings typically are held every two weeks over IRC, and the minutes of those meetings are posted on the dpdk-dev mailing list.

As with other large open-source projects, there are community-driven DPDK events across the globe on a regular basis every year. First, there are various DPDK Summits. Among them, DPDK Summit Userspace is focused on being more interactive and on getting feedback from the community. There also are several DPDK meetups in different locations around the world. Moreover, from time to time there is an online community survey, announced over the dpdk-dev mailing list, in order to get feedback from the community, and everyone can participate in it.

The DPDK website hosts the master DPDK repo, but several other repos are dedicated for new features. Several tools and utilities exist in the DPDK tree, among them are the dpdk-devbind.py script, which is for associating a network device or a crypto device with DPDK, and testpmd, which is a CLI tool for various tasks, such as forwarding, monitoring statistics and more. There are almost 50 sample applications under the "examples" folder, bundled with full detailed documentation.
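To give a feel for those two tools, here's a minimal sketch of a typical first session: check which devices are bound to which drivers, then start testpmd in interactive mode. The core list, memory-channel count and build path below are assumptions that will vary with your system and with how you built DPDK:

# show which NICs are bound to kernel drivers vs. DPDK-compatible drivers
./usertools/dpdk-devbind.py --status

# run testpmd interactively (-i) on cores 0-3 with 4 memory channels
./build/app/testpmd -l 0-3 -n 4 -- -i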

Apart from DPDK itself, the DPDK site hosts several other open-source projects. One is the DPDK Test Suite (DTS), which is a Python-based framework for DPDK. It has more than 100 test modules for various features, including the most advanced and most recent features. It runs with IXIA and Scapy traffic generators. It includes both functional and benchmarking tests, and it's very easy to configure and run, as you need to set up only three or four configuration files. You also can set the DPDK version with which you want to run it. DTS development is handled over a dedicated mailing list, and it currently has support for Intel NICs and Mellanox NICs.

DPDK is released every three months. This release cadence is designed to allow DPDK to keep evolving at a rapid pace while giving enough opportunity to review, discuss and improve the contributions. There are usually 3–5 release candidates (RCs) before the final release. For the 17.08 release, there were 1,023 patches from 125 authors, including patches from Intel, Cavium, 6WIND, NXP and others. The release numbers follow the Ubuntu versions convention. A Long Term Stable (LTS) release is maintained for two years. Plans for future LTS releases currently are being discussed in the DPDK community. The plan is to make every .11 release in an even-numbered year (16.11, 18.11 and so forth) an LTS release and to maintain it for two years.

Recent Features and New Ideas

Several interesting features were added last year. One of the most fascinating capabilities (added in DPDK 17.05, with new features enabled in 17.08 and 17.11) is "Dynamic Device Personalization" (DDP) for the Intel I40E driver (10Gb/25Gb/40Gb). This feature allows applying a per-device profile to the I40E firmware dynamically. You can load a profile by running a testpmd CLI command (ddp add), and you can remove it with ddp del. You also can apply or remove profiles while traffic is flowing, with a small number of packets dropped while a profile is being handled. These profiles are created by Intel and not by customers, as I40E firmware programming requires deep knowledge of the I40E device internals.

Other features to mention include Bruce Richardson's build system patch, which provides a more efficient build system for DPDK with meson and ninja, a new kernel module called Kernel Control Path (KCP), port representors and more.

DPDK and Networking Projects

DPDK is used in various important networking projects. The list is quite long, but I want to mention a few of them briefly:

  • Open vSwitch (OvS): the OvS project implements a virtual network switch. It was transitioned to the Linux Foundation in August 2016 and gained a lot of popularity in the industry. DPDK was first integrated into OvS 2.2 in 2015. Later, in OvS 2.4, support for vHost user, which is a virtual device, was added. Support for advanced features like multiqueue and NUMA awareness was added in subsequent releases.
  • Contrail vRouter: Contrail Systems was a startup that developed SDN controllers. Juniper Networks acquired it in 2012, and Juniper Networks released the Contrail vRouter later as an open-source project. It uses DPDK to achieve better network performance.
  • pktgen-dpdk: an open-source traffic generator based on DPDK (hosted on the DPDK site).
  • TREX: a stateful and stateless open-source traffic generator based on DPDK.
  • Vector Packet Processing (VPP): an FD.io project.
Getting Started with DPDK

For those who are newcomers to DPDK, both users and developers, there is excellent documentation hosted on the DPDK site. It's recommended that you actually try running several of the sample applications (following the "Sample Applications User Guides"), starting with the "Hello World" application. It's also a good idea to follow the dpdk-users mailing list on a regular basis. For those who are interested in development, the Programmer's Guide is a good source of information about the architecture and development environment, and developers should follow the dpdk-dev mailing list as well.

DPDK and SR-IOV Example

I want to conclude this article with a very basic example (based on SR-IOV) of how to create a DPDK VF and how to attach it to a VM with qemu. I also show how to create a non-DPDK VF ("kernel VF"), attach it to a VM, run a DPDK app on that VF and communicate with it from the host.

As a preparation step, you need to enable IOMMU and virtualization on the host. To support this, add intel_iommu=on iommu=pt as kernel parameters to the kernel command line (in grub.cfg), and also enable virtualization and VT-d in the BIOS (VT-d stands for "Intel Virtualization Technology for Directed I/O"). You'll use the Intel I40E network interface card for this example. The I40E device driver supports up to 128 VFs per device, divided equally across ports, so if you have a quad-port I40E NIC, you can create up to 32 VFs on each port.
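On most grub2-based distributions, the usual way to do that is to append the parameters to GRUB_CMDLINE_LINUX in /etc/default/grub and then regenerate grub.cfg rather than editing it by hand; a sketch (the regeneration command varies by distribution):

# in /etc/default/grub, extend the existing line, for example:
#   GRUB_CMDLINE_LINUX="... intel_iommu=on iommu=pt"
sudo update-grub    # Debian/Ubuntu; on Fedora/RHEL: grub2-mkconfig -o /boot/grub2/grub.cfg
sudo reboot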

For this example, I also show a simple usage of the testpmd CLI, as mentioned earlier. This example is based on DPDK-17.08, the most recent release of DPDK at the time of this writing. In this example, you'll use Single Root I/O Virtualization (SR-IOV), which is an extension of the PCI Express (PCIe) specification and allows sharing a single physical PCI Express resource across several virtual environments. This technology is very popular in data-center/cloud environments, and many network adapters, and likewise their drivers, support this feature. I should note that SR-IOV is not limited to network devices, but is available for other PCI devices as well, such as graphics cards.

DPDK VF

You create DPDK VFs by writing the number of requested VFs into a DPDK sysfs entry called max_vfs. Say that eth8 is the PF on top of which you want to create a VF and its PCI address is 0000:07:00.0. (You can fetch the PCI address with ethtool -i eth8 | grep bus-info.) The following is the sequence you run on the host in order to create a VF and launch a VM. First, bind the PF to DPDK with usertools/dpdk-devbind.py, for example:

modprobe uio
insmod ./build/kmod/igb_uio.ko
./usertools/dpdk-devbind.py -b igb_uio 0000:07:00.0

Then, create two DPDK VFs with:

echo 2 > /sys/bus/pci/devices/0000:07:00.0/max_vfs

You can verify that the two VFs were created by this operation by checking whether two new entries were added when running lspci | grep "Virtual Function", or by verifying that you now have two new symlinks under /sys/bus/pci/devices/0000:07:00.0/ for the two newly created VFs: virtfn0 and virtfn1.
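In other words, both of the following (using the same PF address as above) should now show the two new virtual functions:

lspci | grep "Virtual Function"
ls -l /sys/bus/pci/devices/0000:07:00.0/virtfn*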

Next, launch the VMs via qemu using PCI Passthrough, for example:

qemu-system-x86_64 -enable-kvm -cpu host \
    -drive file=Ubuntu_1604.qcow2,index=0,media=disk,format=qcow2 \
    -smp 5 -m 2048 -vga qxl \
    -vnc :1 \
    -device pci-assign,host=0000:07:02.0 \
    -net nic,macaddr=00:00:00:99:99:01 \
    -net tap,script=/etc/qemu-ifup

Note: qemu-ifup is a shell script that's invoked when the VM is launched, usually for setting up networking.

Next, you can start a VNC client (such as RealVNC client) to access the VM, and from there, you can verify that the VF was indeed assigned to it, with lspci -n. You should see a single device, which has "8086 154c" as the vendor ID/device ID combination; "8086 154c" is the virtual function PCI ID of the I40E NIC. You can launch a DPDK application in the guest on top of that VF.

Kernel VF

To conclude this example, let's create a kernel VF on the host and run a DPDK on top of it in the VM, and then let's look at a simple interaction with the host PF.

First, create two kernel VFs with:

echo 2 > /sys/bus/pci/devices/0000:07:00.0/sriov_numvfs

Here again you can verify that these two VFs were created by running lspci | grep "Virtual Function".

Next, run this sequence:

echo "8086 154c" > /sys/bus/pci/drivers/pci-stub/new_id echo 07:02.0 > /sys/bus/pci/devices/$VF_PCI_0/driver/unbind echo 07:02.0 > /sys/bus/pci/drivers/pci-stub/bind

Then launch the VM the same way as before, with the same qemu-system-x86_64 command mentioned earlier. Again, in the guest, you should be able to see the I40E VF with lspci -n. On the host, ip link show will show the two VFs of eth8: vf 0 and vf 1. You can set the MAC address of a VF from the host with ip link set, for example:

ip link set eth8 vf 0 mac 00:11:22:33:44:55

Then, when you run a DPDK application like testpmd in the guest and execute, for example, show port info 0 from the testpmd CLI, you'll see that the MAC address you set on the host is indeed reflected for that VF in DPDK.
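As a rough sketch (the testpmd path, EAL options and printed fields depend on your build and guest setup), the check in the guest might look like this:

./testpmd -l 0-1 -n 4 -- -i
testpmd> show port info 0
...
MAC address: 00:11:22:33:44:55
...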

Summary

This article provides a high-level overview of the DPDK project, which is growing dynamically and gaining popularity in the industry. The near future likely will bring support for more network interfaces from different vendors, as well as new features.

Rami Rosen

Weekend Reading: Networking

1 month ago
Weekend Reading: Networking Image Carlie Fairchild Sat, 04/21/2018 - 08:08 Networking Networking is one of Linux's strengths and a popular topic for our subscribers. For your weekend reading, we've curated some of Linux Journal's most popular networking articles. 

 

NTPsec: a Secure, Hardened NTP Implementation

by Eric S. Raymond

Network time synchronization—aligning your computer's clock to the same Universal Coordinated Time (UTC) that everyone else is using—is both necessary and a hard problem. Many internet protocols rely on being able to exchange UTC timestamps accurate to small tolerances, but the clock crystal in your computer drifts (its frequency varies by temperature), so it needs occasional adjustments.

 

smbclient Security for Windows Printing and File Transfer

by Charles Fisher

Microsoft Windows is usually a presence in most computing environments, and UNIX administrators likely will be forced to use resources in Windows networks from time to time. Although many are familiar with the Samba server software, the matching smbclient utility often escapes notice.

 

Understanding Firewalld in Multi-Zone Configurations

by Nathan R. Vance and William F. Polik

Stories of compromised servers and data theft fill today's news. It isn't difficult for someone who has read an informative blog post to access a system via a misconfigured service, take advantage of a recently exposed vulnerability or gain control using a stolen password. Any of the many internet services found on a typical Linux server could harbor a vulnerability that grants unauthorized access to the system.

 

Papa's Got a Brand New NAS

by Kyle Rankin

It used to be that the true sign you were dealing with a Linux geek was the pile of computers lying around that person's house. How else could you experiment with networked servers without a mass of computers and networking equipment? If you work as a sysadmin for a large company, sometimes one of the job perks is that you get first dibs on decommissioned equipment. Through the years, I was able to amass quite a home network by combining some things I bought myself with some equipment that was too old for production. A major point of pride in my own home network was the 24U server cabinet in the garage. It had a gigabit top-of-rack managed switch, a 2U UPS at the bottom, and in the middle was a 1U HP DL-series server with a 1U eSATA disk array attached to it. Above that was a slide-out LCD and keyboard in case I ever needed to work on the server directly.

 

Banana Backups

by Kyle Rankin

I wrote an article called "Papa's Got a Brand New NAS" where I described how I replaced my rackmounted gear with a small, low-powered ARM device—the Odroid XU4. Before I settled on that solution, I tried out a few others including a pair of Banana Pi computers—small single-board computers like Raspberry Pis only with gigabit networking and SATA2 controllers on board. In the end, I decided to go with a single higher-powered board and use a USB3 disk enclosure with RAID instead of building a cluster of Banana Pis that each had a single disk attached. Since I had two Banana Pis left over after this experiment, I decided to put them to use, so in this article, I describe how I turned one into a nice little backup server.

 

Roll Your Own Enterprise Wi-Fi

by Shawn Powers

The UniFi line of products from Ubiquiti is affordable and reliable, but the really awesome feature is its (free!) Web-based controller app. The only UniFi products I have are wireless access points, even though the company also has added switches, gateways and even VoIP products to the mix. Even with my limited selection of products, however, the Web controller makes designing and maintaining a wireless network not just easy, but fun!

 

Tracking Down Blips

by Shawn Powers

In a previous article, I explained the process for setting up Cacti, which is a great program for graphing just about anything. One of the main things I graph is my internet usage. And, it's great information to have, until there is internet activity you can't explain. In my case, there was a "blip" every 20 minutes or so that would use about 4mbps of bandwidth (Figure 1). In the grand scheme of things, it wasn't a big deal, because my connection is 60mbps down. Still, it was driving me crazy. I don't like the idea of something on my network doing things on the internet without my knowledge. So, the hunt began.

 

Carlie Fairchild

Caption This!

1 month ago
Caption This! Image Carlie Fairchild Fri, 04/20/2018 - 11:21 caption this Contest

Each month, we provide a cartoon in need of a caption. You submit your caption, we choose three finalists, and readers vote for their favorite. The winning caption for this month's cartoon will appear in the June issue of Linux Journal.

To enter, simply type in your caption in the comments below or email us, publisher@linuxjournal.com.

Carlie Fairchild

Mozilla's Common Voice Project, Red Hat Announces Vault Operator, VirtualBox 5.2.10 Released and More

1 month ago
Mozilla open source Apple Red Hat News VirtualBox Chrome OS Cloud Security Containers

News briefs April 20, 2018.

Participate in Mozilla's open-source Common Voice Project, an initiative to help teach machines how real people speak: "Now you can donate your voice to help us build an open-source voice database that anyone can use to make innovative apps for devices and the web." For more about the Common Voice Project, see the story on opensource.com.

Red Hat yesterday announced the Vault Operator, a new open-source project that "aims to make it easier to install, manage, and maintain instances of Vault—a tool designed for storing, managing, and controlling access to secrets, such as tokens, passwords, certificates, and API keys—on Kubernetes clusters."

Google might be working on implementing dual-boot functionality in Chrome OS to allow Chromebook users to boot multiple OSes. Softpedia News reports on a Reddit thread that references "Alt OS" in recent Chromium Gerrit commits. This is only speculation so far, and Google has not confirmed it is working on dual-boot support for Chrome OS on Chromebooks.

Oracle recently released VirtualBox 5.2.10. This release addresses the Critical Patch Update (CPU) Advisory for April 2018 related to Oracle VM VirtualBox and brings several other improvements, including a fix for a KDE Plasma hang and support for multiple NVMe controllers with ICH9 enabled. See the Changelog for all the details.

Apple yesterday announced it has open-sourced its FoundationDB cloud database. Apple's goal is "to build a community around the project and make FoundationDB the foundation for the next generation of distributed databases". The project is now available on GitHub.

Jill Franklin

More L337 Translations

1 month ago
More L337 Translations Image Dave Taylor Thu, 04/19/2018 - 09:20 HOW-TOs Programming Shell Scripting

Dave continues with his shell-script L33t translator.

In my last article, I talked about the inside jargon of hackers and computer geeks known as "Leet Speak" or just "Leet". Of course, that's a shortened version of the word Elite, and it's best written as L33T or perhaps L337 to be ultimately kewl. But hey, I don't judge.

Last time I looked at a series of simple letter substitutions that allow you to convert a sentence like "I am a master hacker with great skills" into something like this:

I AM A M@ST3R H@XR WITH GR3@T SKILLZ

It turns out that I missed some nuances of Leet and didn't realize that most often the letter "a" is actually turned into a "4", not an "@", although as with just about everything about the jargon, it's somewhat random.

In fact, every single letter of the alphabet can be randomly tweaked and changed, sometimes from a single letter to a sequence of two or three symbols. For example, another variation on "a" is "/-\" (for what are hopefully visually obvious reasons).

Continuing in that vein, "B" can become "|3", "C" can become "[", "I" can become "1", and one of my favorites, "M" can change into "[]V[]". That's a lot of work, but since one of the goals is to have a language no one else understands, I get it.

There are additional substitutions: a word can have its trailing "S" replaced by a "Z", a trailing "ED" can become "'D" or just "D", and another interesting one is that words containing "and", "anned" or "ant" can have that sequence replaced by an ampersand (&).
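For instance, the trailing-"S" rule can be handled with the same sed approach used for the other substitutions; the following line is a sketch of one way to write it, not necessarily how the final script spells it:

word="$(echo $word | sed "s/s$/z/")"    # a trailing "s" becomes "z"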

Let's add all these L337 filters and see how the script is shaping up.

But First, Some Randomness

Since many of these transformations are going to have a random element, let's go ahead and produce a random number between 0 and 9 to figure out whether to do one or another action. That's easily done with the $RANDOM variable:

doit=$(( $RANDOM % 10 )) # random virtual coin flip

Now let's say that there's a 50% chance that a -ed suffix is going to change to "'D" and a 50% chance that it's just going to become "D", which is coded like this:

if [ $doit -ge 5 ] ; then
  word="$(echo $word | sed "s/ed$/d/")"
else
  word="$(echo $word | sed "s/ed$/'d/")"
fi

Let's add the additional transformations, but not do them every time. Let's give them a 70–90% chance of occurring, based on the transform itself. Here are a few examples:

if [ $doit -ge 3 ] ; then
  word="$(echo $word | sed "s/cks/x/g;s/cke/x/g")"
fi

if [ $doit -ge 4 ] ; then
  word="$(echo $word | sed "s/and/\&/g;s/anned/\&/g;s/ant/\&/g")"
fi

And so, here's the second translation, a bit more sophisticated:

$ l33t.sh "banned? whatever. elite hacker, not scriptie."
B&? WH4T3V3R. 3LIT3 H4XR, N0T SCRIPTI3.

Note that it hasn't realized that "elite" should become L337 or L33T, but since it is supposed to be rather random, let's just leave this script as is. Kk? Kewl.

If you want to expand it, an interesting programming problem is to break each word down into individual letters, then randomly change lowercase to uppercase or vice versa, so you get those great ransom-note-style WeiRD LeTtEr pHrASes.
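Here's one rough sketch of that exercise, using a hypothetical helper function (not part of the translator script) that flips a virtual coin for each letter:

ransomize() {
  local out="" ch i
  for (( i=0; i<${#1}; i++ )); do
    ch="${1:$i:1}"
    if [ $(( RANDOM % 2 )) -eq 0 ] ; then
      out+="$(echo "$ch" | tr '[:lower:]' '[:upper:]')"    # flip this letter to uppercase
    else
      out+="$(echo "$ch" | tr '[:upper:]' '[:lower:]')"    # flip this letter to lowercase
    fi
  done
  echo "$out"
}

ransomize "weird letter phrases"    # something like: WeIrD lEttEr PHrasES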

Next time, I plan to move on, however, and look at the great command-line tool youtube-dl, exploring how to use it to download videos and even just the audio tracks as MP3 files.

Dave Taylor

Help Canonical Test GNOME Patches, Android Apps Illegally Tracking Kids, MySQL 8.0 Released and More

1 month ago
GNOME Desktop News Android Security Privacy MySQL KDE LibreOffice Cloud

News briefs for April 19, 2018.

Help Canonical test the GNOME desktop memory leak fixes in Ubuntu 18.04 LTS (Bionic Beaver) by downloading and installing the current daily ISO for your hardware from here: http://cdimage.ubuntu.com/daily-live/current/bionic-desktop-amd64.iso. Then download the patched version of gjs, install, reboot, and then just use your desktop normally. If performance seems impacted by the new packages, re-install from the ISO again, but don't install the new packages and see if things are better. See the Ubuntu Community page for more detailed instructions.

Thousands of Android apps downloaded from the Google Play store may be tracking kids' data illegally, according to a new study. NBC News reports: "Researchers at the University of California's International Computer Science Institute analyzed 5,855 of the most downloaded kids apps, concluding that most of them 'are potentially in violation' of the Children's Online Privacy Protection Act of 1998, or COPPA, a federal law making it illegal to collect personally identifiable data on children under 13."

MySQL 8.0 has been released. This new version "includes significant performance, security and developer productivity improvements enabling the next generation of web, mobile, embedded and Cloud applications." MySQL 8.0 features include MySQL document store, transactional data dictionary, SQL roles, default to utf8mb4 and more. See the white paper for all the details.

KDE announced this morning that KDE Applications 18.04.0 are now available. New features include improvements to panels in the Dolphin file manager; Wayland support for KDE's JuK music player; improvements to Gwenview, KDE's image viewer and organizer; and more.

Collabora Productivity, "the driving force behind putting LibreOffice in the cloud", announced a new release of its enterprise-ready cloud document suite—Collabora Online 3.2. The new release includes implemented chart creation, data validation in Calc, context menu spell-checking and more.

Jill Franklin

An Update on Linux Journal

1 month ago
An Update on Linux Journal Image Carlie Fairchild Wed, 04/18/2018 - 12:41 Linux Journal Subscribe Patron Advertise Write

So many of you have asked how to help Linux Journal continue to be published* for years to come.

First, keep the great ideas coming—we all want to continue making Linux Journal 2.0 something special, and we need this community to do it.

Second, subscribe or renew. Magazines have a built-in fundraising program: subscriptions. It's true that most magazines don't survive on subscription revenue alone, but having a strong subscriber base tells Linux Journal, prospective authors, and yes, advertisers, that there is a community of people who support and read the magazine each month.

Third, if you prefer reading articles on our website, consider becoming a Patron. We have different Patreon reward levels, one even gets your name immortalized in the pages of Linux Journal.

Fourth, spread the word within your company about corporate sponsorship of Linux Journal. We as a community reject tracking, but we explicitly invite high-value advertising that sponsors the magazine and values readers. This is new and unique in online publishing, and just one example of our pioneering work here at Linux Journal.  

Finally, write for us! We are always looking for new writers, especially now that we are publishing more articles more often.  

With all our gratitude,

Your friends at Linux Journal

 

*We'd be remiss not to acknowledge and thank Private Internet Access for saving the day and bringing Linux Journal back from the dead. They are incredibly supportive partners, and sincerely, we cannot thank them enough for keeping us going. At a certain point, however, Linux Journal has to become sustainable on its own.

Carlie Fairchild

Rise of the Tomb Raider Comes to Linux Tomorrow, IoT Developers Survey, New Zulip Release and More

1 month ago
News gaming Chrome GIMP IOT openSUSE Distributions Desktop

News briefs for April 18, 2018.

Rise of the Tomb Raider: 20 Year Celebration comes to Linux tomorrow! A minisite dedicated to Rise of the Tomb Raider is available now from Feral Interactive, and you also can view the trailer on Feral's YouTube channel.

Zulip, the open-source team chat software, announces the release of Zulip Server 1.8. This is a huge release, with more than 3,500 new commits since the last release in October 2017. Zulip "is an alternative to Slack, HipChat, and IRC. Zulip combines the immediacy of chat with the asynchronous efficiency of email-style threading, and is 100% free and open-source software".

The IoT Developers Survey 2018 is now available. The survey was sponsored by the Eclipse IoT Working Group, Agile IoT, IEEE and the Open Mobile Alliance "to better understand how developers are building IoT solutions". The survey covers what people are building, key IoT concerns, top IoT programming languages and distros, and more.

Google released Chrome 66 to its stable channel for desktop/mobile users. This release includes many security improvements as well as new JavaScript APIs. See the Chrome Platform Status site for details.

openSUSE Leap 15 is scheduled for release May 25, 2018. Leap 15 "shares a common core with SUSE Linux Enterprise (SLE) 15 sources and has thousands of community packages on top to meet the needs of professional and semi-professional users and their workloads."

GIMP 2.10.0 RC 2 has been released. This release fixes 44 bugs and introduces important performance improvements. See the complete list of changes here.

Jill Franklin

Create Dynamic Wallpaper with a Bash Script

1 month ago
Create Dynamic Wallpaper with a Bash Script Image Patrick Wheelan Wed, 04/18/2018 - 09:58 bash Desktop Programming

Harness the power of bash and learn how to scrape websites for exciting new images every morning.

So, you want a cool dynamic desktop wallpaper without dodgy programs and a million viruses? The good news is, this is Linux, and anything is possible. I started this project because I was bored of my standard OS desktop wallpaper, and I have slowly created a plethora of scripts to pull images from several sites and set them as my desktop background. It's a nice little addition to my day—being greeted by a different cat picture or a panorama of a country I didn't know existed. The great news is that it's easy to do, so let's get started.

Why Bash?

Bash (the Bourne Again SHell) is standard across almost all *NIX systems and provides a wide range of operations "out of the box" that would take time and copious lines of code to achieve in a conventional compiled or even scripting language. Additionally, there's no need to re-invent the wheel. It's much easier to use somebody else's program to download web pages, for example, than to deal with low-level system sockets in C.

How's It Going to Work?

The concept is simple. Choose a site with images you like and "scrape" the page for those images. Then once you have a direct link, you download them and set them as the desktop wallpaper using the display manager. Easy right?

A Simple Example: xkcd

To start off, let's venture to every programmer's second-favorite page after Stack Overflow: xkcd. Loading the page, you should be greeted by the daily comic strip and some other data.

Now, what if you want to see this comic without venturing to the xkcd site? You need a script to do it for you. First, you need to know how the webpage looks to the computer, so download it and take a look. To do this, use wget, an easy-to-use, commonly installed, non-interactive, network downloader. So, on the command line, call wget, and give it the link to the page:

user@LJ $: wget https://www.xkcd.com/
--2018-01-27 21:01:39--  https://www.xkcd.com/
Resolving www.xkcd.com... 151.101.0.67, 151.101.192.67, 151.101.64.67, ...
Connecting to www.xkcd.com|151.101.0.67|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2606 (2.5K) [text/html]
Saving to: 'index.html'

index.html          100%[==========================================================>]   2.54K  --.-KB/s    in 0s

2018-01-27 21:01:39 (23.1 MB/s) - 'index.html' saved [6237]

As you can see in the output, the page has been saved to index.html in your current directory. Using your favourite editor, open it and take a look (I'm using nano for this example):

user@LJ $: nano index.html

Now you might realize, despite this being a rather bare page, there's a lot of code in that file. Instead of going through it all, let's use grep, which is perfect for this task. Its sole function is to print lines matching your search. Grep uses the syntax:

user@LJ $: grep [search] [file]

Looking at the daily comic, its current title is "Night Sky". Searching for "night" with grep yields the following results:

user@LJ $: grep "night" index.html Image URL (for hotlinking/embedding): ↪https://imgs.xkcd.com/comics/night_sky.png

The grep search has returned two image links in the file, each related to "night". Looking at those two lines, one is the image in the page, and the other is for hotlinking and is already a usable link. You'll be obtaining the first link, however, as it is more representative of other pages that don't provide an easy link, and it serves as a good introduction to the use of grep and cut.

To get the first link out of the page, you first need to identify it in the file programmatically. Let's try grep again, but this time, instead of using a string you already know ("night"), let's approach it as if you know nothing about the page. Although the link will be different, the surrounding HTML should remain the same; therefore, "img src=" always should appear before the link you want:

user@LJ $: grep "img src=" index.html

It looks like there are three images on the page. Comparing these results with those from the first grep, you'll see which of the three is the comic. The other two links contain "/s/", whereas the link you want contains "/comics/". So, you need to grep the output of the last command for "/comics/". To pass along the output of the last command, use the pipe character (|):

user@LJ $: grep "img src=" index.html | grep "/comics/"

And, there's the line! Now you just need to separate the image link from the rest of it with the cut command. cut uses the syntax:

user@LJ $: cut [-d delimiter] [-f field] [-c characters]

To cut the link from the rest of the line, you'll want to cut next to the quotation mark and select the field before the next quotation mark. In other words, you want the text between the quotes, or the link, which is done like this:

user@LJ $: grep "img src=" index.html | grep "/comics/" | ↪cut -d\" -f2 //imgs.xkcd.com/comics/night_sky.png

And, you've got the link. But wait! What about those pesky forward slashes at the beginning? You can cut those out too:

user@LJ $: grep "img src=" index.html | grep "/comics/" | ↪cut -d\" -f 2 | cut -c 3- imgs.xkcd.com/comics/night_sky.png

Now you've cut the first two characters (the leading slashes) from the line, and you're left with a link straight to the image. Using wget again, you can download the image:

user@LJ $: wget imgs.xkcd.com/comics/night_sky.png
--2018-01-27 21:42:33--  http://imgs.xkcd.com/comics/night_sky.png
Resolving imgs.xkcd.com... 151.101.16.67, 2a04:4e42:4::67
Connecting to imgs.xkcd.com|151.101.16.67|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 54636 (53K) [image/png]
Saving to: 'night_sky.png'

night_sky.png       100%[===========================================================>]  53.36K  --.-KB/s    in 0.04s

2018-01-27 21:42:33 (1.24 MB/s) - 'night_sky.png' saved [54636/54636]

Now you have the image in your directory, but its name will change when the comic's name changes. To fix that, tell wget to save it with a specific name:

user@LJ $: wget "$(grep "img src=" index.html | grep "/comics/" ↪| cut -d\" -f2 | cut -c 3-)" -O wallpaper --2018-01-27 21:45:08-- http://imgs.xkcd.com/comics/night_sky.png Resolving imgs.xkcd.com... 151.101.16.67, 2a04:4e42:4::67 Connecting to imgs.xkcd.com|151.101.16.67|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 54636 (53K) [image/png] Saving to: 'wallpaper' wallpaper 100% [==========================================================>] 53.36K --.-KB/s in 0.04s 2018-01-27 21:45:08 (1.41 MB/s) - 'wallpaper' saved [54636/54636]

The -O option means that the downloaded image now has been saved as "wallpaper". Now that you know the name of the image, you can set it as a wallpaper. This varies depending upon which display manager you're using. The most popular are listed below, assuming the image is located at /home/user/wallpaper.

GNOME:

gsettings set org.gnome.desktop.background picture-uri "file:///home/user/wallpaper"
gsettings set org.gnome.desktop.background picture-options scaled

Cinnamon:

gsettings set org.cinnamon.desktop.background picture-uri "file:///home/user/wallpaper"
gsettings set org.cinnamon.desktop.background picture-options scaled

Xfce:

xfconf-query --channel xfce4-desktop --property /backdrop/screen0/monitor0/image-path --set /home/user/wallpaper

You can set your wallpaper now, but you need different images to mix in. Looking at the webpage, there's a "random" button that takes you to a random comic. Searching with grep for "random" returns the following:

user@LJ $: grep random index.html
<li><a href="//c.xkcd.com/random/comic/">Random</a></li>
<li><a href="//c.xkcd.com/random/comic/">Random</a></li>

This is the link to a random comic, and downloading it with wget and reading the result, it looks like the initial comic page. Success!

    Now that you've got all the components, let's put them together into a script, replacing www.xkcd.com with the new c.xkcd.com/random/comic/:

#!/bin/bash

wget c.xkcd.com/random/comic/

wget "$(grep "img src=" index.html | grep /comics/ | cut -d\" -f 2 | cut -c 3-)" -O wallpaper

gsettings set org.gnome.desktop.background picture-uri "file:///home/user/wallpaper"
gsettings set org.gnome.desktop.background picture-options scaled

All of this should be familiar except the first line, which designates this as a bash script, and the second wget command. To capture the output of commands into a variable, you use $(). In this case, you're capturing the grepping and cutting process that produces the final link, then handing it to wget to download. When the script runs, the commands inside the parentheses are executed first to produce the image link, and then wget is called to download it.
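If the nested command substitution is hard to read, the second wget line is equivalent to this two-step form (link is just an illustrative variable name):

link="$(grep "img src=" index.html | grep /comics/ | cut -d\" -f 2 | cut -c 3-)"
wget "$link" -O wallpaper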

    There you have it—a simple example of a dynamic wallpaper that you can run anytime you want.

    If you want the script to run automatically, you can add a cron job to have cron run it for you. So, edit your cron tab with:

    user@LJ $: crontab -e

    My script is called "xkcd", and my crontab entry looks like this:

    @reboot /bin/bash /home/user/xkcd

    This will run the script (located at /home/user/xkcd) using bash, every restart.

    Reddit

    The script above shows how to search for images in HTML code and download them. But, you can apply this to any website of your choice—although the HTML code will be different, the underlying concepts remain the same. With that in mind, let's tackle downloading images from Reddit. Why Reddit? Reddit is possibly the largest blog on the internet and the third-most-popular site in the US. It aggregates content from many different communities together onto one site. It does this through use of "subreddits", communities that join together to form Reddit. For the purposes of this article, let's focus on subreddits (or "subs" for short) that primarily deal with images. However, any subreddit, as long as it allows images, can be used in this script.

    Figure 1. Scraping the Web Made Simple—Analysing Web Pages in a Terminal

    Diving In

    Just like the xkcd script, you need to download the web page from a subreddit to analyse it. I'm using reddit.com/r/wallpapers for this example. First, check for images in the HTML:

user@LJ $: wget https://www.reddit.com/r/wallpapers/ && grep "img src=" index.html
--2018-01-28 20:13:39--  https://www.reddit.com/r/wallpapers/
Resolving www.reddit.com... 151.101.17.140
Connecting to www.reddit.com|151.101.17.140|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 27324 (27K) [text/html]
Saving to: 'index.html'

index.html          100%[==========================================================>]  26.68K  --.-KB/s    in 0.1s

2018-01-28 20:13:40 (270 KB/s) - 'index.html' saved [169355]

a community for 9 years ....Forever and ever......
--- SNIP ---

    All the images have been returned in one long line, because the HTML for the images is also in one long line. You need to split this one long line into the separate image links. Enter Regex.

Regex is short for regular expression, a system used by many programs to allow users to match an expression against a string. It contains wild cards, special characters that stand in for other characters; for example, .* matches any sequence of characters. For this example, you want an expression that matches every link in the HTML file. All HTML links have one string in common: they take the form href="LINK". Let's write a regex expression to match it:

    href="([^"#]+)"

    Now let's break it down:

    • href=" — simply states that the first characters should match these.

    • () — forms a capture group.

    • [^] — forms a negated set. The string shouldn't match any of the characters inside.

    • + — the string should match one or more of the preceding tokens.

    Altogether the regex matches a string that begins href=", doesn't contain any quotation marks or hashtags and finishes with a quotation mark.

    This regex can be used with grep like this:

user@LJ $: grep -o -E 'href="([^"#]+)"' index.html
href="/static/opensearch.xml"
href="https://www.reddit.com/r/wallpapers/"
href="//out.reddit.com"
href="//out.reddit.com"
href="//www.redditstatic.com/desktop2x/img/favicon/apple-icon-57x57.png"
--- SNIP ---

The -E option enables extended regular expressions, and the -o switch means grep prints only the part of each line that matches the pattern, not the whole line. You now have a much more manageable list of links. From there, you can use the same techniques from the first script to extract the links and filter for images. This looks like the following:

user@LJ $: grep -o -E 'href="([^"#]+)"' index.html | cut -d'"' -f2 | sort | uniq | grep -E '.jpg|.png'
https://i.imgur.com/6DO2uqT.png
https://i.imgur.com/Ualn765.png
https://i.imgur.com/UO5ck0M.jpg
https://i.redd.it/s8ngtz6xtnc01.jpg
//www.redditstatic.com/desktop2x/img/favicon/android-icon-192x192.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-114x114.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-120x120.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-144x144.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-152x152.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-180x180.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-57x57.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-60x60.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-72x72.png
//www.redditstatic.com/desktop2x/img/favicon/apple-icon-76x76.png
//www.redditstatic.com/desktop2x/img/favicon/favicon-16x16.png
//www.redditstatic.com/desktop2x/img/favicon/favicon-32x32.png
//www.redditstatic.com/desktop2x/img/favicon/favicon-96x96.png

    The final grep uses regex again to match .jpg or .png. The | character acts as a boolean OR operator.

    As you can see, there are four matches for actual images: two .jpgs and two .pngs. The others are Reddit default images, like the logo. Once you remove those images, you'll have a final list of images to set as a wallpaper. The easiest way to remove these images from the list is with sed:

user@LJ $: grep -o -E 'href="([^"#]+)"' index.html | cut -d'"' -f2 | sort | uniq | grep -E '.jpg|.png' | sed /redditstatic/d
https://i.imgur.com/6DO2uqT.png
https://i.imgur.com/Ualn765.png
https://i.imgur.com/UO5ck0M.jpg
https://i.redd.it/s8ngtz6xtnc01.jpg

    sed works by matching what's between the two forward slashes. The d on the end tells sed to delete the lines that match the pattern, leaving the image links.

    The great thing about sourcing images from Reddit is that every subreddit contains nearly identical HTML; therefore, this small script will work on any subreddit.

    Creating a Script

To create a script for Reddit, it should be possible to choose which subreddits you'd like to source images from. I've created a directory for my script and placed a file called "links" alongside it. This file contains the subreddit links in the following format:

https://www.reddit.com/r/wallpapers
https://www.reddit.com/r/wallpaper
https://www.reddit.com/r/NationalPark
https://www.reddit.com/r/tiltshift
https://www.reddit.com/r/pic

    At run time, I have the script read the list and download these subreddits before stripping images from them.

    Since you can have only one image at a time as desktop wallpaper, you'll want to narrow down the selection of images to just one. First, however, it's best to have a wide range of images without using a lot of bandwidth. So you'll want to download the web pages for multiple subreddits and strip the image links but not download the images themselves. Then you'll use a random selector to select one image link and download that one to use as a wallpaper.

Finally, if you're downloading lots of subreddits' web pages, the script will become very slow, because it waits for each command to complete before proceeding. To circumvent this, you can fork a command by appending an ampersand (&) character. This creates a new process for the command, "forking" it from the main process (the script).
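As a minimal illustration of forking (the output filenames here are made up), both downloads below start at the same time, and wait blocks until they have both finished:

wget -q https://www.reddit.com/r/wallpapers -O wallpapers.html &
wget -q https://www.reddit.com/r/tiltshift -O tiltshift.html &
wait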

    Here's my fully annotated script:

#!/bin/bash

DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"  # Get the script's current directory
linksFile="links"

mkdir $DIR/downloads
cd $DIR/downloads

# Strip the image links from the html
function parse {
    grep -o -E 'href="([^"#]+)"' $1 | cut -d'"' -f2 | sort | uniq | grep -E '.jpg|.png' >> temp
    grep -o -E 'href="([^"#]+)"' $2 | cut -d'"' -f2 | sort | uniq | grep -E '.jpg|.png' >> temp
    grep -o -E 'href="([^"#]+)"' $3 | cut -d'"' -f2 | sort | uniq | grep -E '.jpg|.png' >> temp
    grep -o -E 'href="([^"#]+)"' $4 | cut -d'"' -f2 | sort | uniq | grep -E '.jpg|.png' >> temp
}

# Download the subreddit's webpages
function download {
    rname=$( echo $1 | cut -d / -f 5 )
    tname=$(echo t.$rname)
    rrname=$(echo r.$rname)
    cname=$(echo c.$rname)
    wget --load-cookies=../cookies.txt -O $rname $1 &>/dev/null &
    wget --load-cookies=../cookies.txt -O $tname $1/top &>/dev/null &
    wget --load-cookies=../cookies.txt -O $rrname $1/rising &>/dev/null &
    wget --load-cookies=../cookies.txt -O $cname $1/controversial &>/dev/null &
    wait # wait for all forked wget processes to return
    parse $rname $tname $rrname $cname
}

# For each line in links file
while read l; do
    if [[ $l != *"#"* ]]; then # if line doesn't contain a hashtag (comment)
        download $l&
    fi
done < ../$linksFile

wait # wait for all forked processes to return

sed -i '/www.redditstatic.com/d' temp # remove reddit pics that exist on most pages from the list

wallpaper=$(shuf -n 1 temp) # select randomly from file and DL

echo $wallpaper >> $DIR/log # save image into log in case we want it later

wget -b $wallpaper -O $DIR/wallpaperpic 1>/dev/null # Download wallpaper image

gsettings set org.gnome.desktop.background picture-uri file://$DIR/wallpaperpic # Set wallpaper (Gnome only!)

rm -r $DIR/downloads # cleanup

    Just like before, you can set up a cron job to run the script for you at every reboot or whatever interval you like.
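For example, to refresh the wallpaper every hour instead of only at boot, a crontab entry along these lines should work (assuming the script is saved at /home/user/reddit-wallpaper, a made-up path):

0 * * * * /bin/bash /home/user/reddit-wallpaper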

    And, there you have it—a fully functional cat-image harvester. May your morning logins be greeted with many furry faces. Now go forth and discover new subreddits to gawk at and new websites to scrape for cool wallpapers.

    Patrick Wheelan

    Cooking With Linux (without a net): A CMS Smorgasbord

    1 month ago

    Please support Linux Journal by subscribing or becoming a patron.

Note: You are watching a recording of a live show. It's Tuesday and that means it's time for Cooking With Linux (without a net), sponsored and supported by Linux Journal. Today, I'm going to install four popular content management systems. These will be Drupal, Joomla, WordPress, and Backdrop. If you're trying to decide on what your next CMS platform should be, this would be a great time to tune in. And yes, I'll do it all live, without a net, and with a high probability of falling flat on my face. Join me today, at 12 noon, Eastern Time. Be part of the conversation.

    Content management systems covered include:

Drupal, Joomla, WordPress and Backdrop.
    Marcel Gagné