apache-nifi | The Blog Pros

May 19, 2021

Best Practices for Data Pipeline Error Handling in Apache NiFi

According to a McKinsey report, "the best analytics are worth nothing with bad data." We as data engineers and developers know this simply as "garbage in, garbage out". Today, with the success of the cloud, data sources are many and varied. Data pipelines help us consolidate data from these different sources and work on it. However, we must ensure that the data used is of good quality. As data engineers, we mold data into the right shape, size, and type with close attention to detail.

Fortunately, we have tools such as Apache NiFi, which allow us to design and manage our data pipelines, reducing the amount of custom programming and increasing overall efficiency. Yet when creating pipelines, a key and often neglected aspect is minimizing potential errors.

September 24, 2020

Real-Time Streaming Deep Learning Pipelines With DJL and Apache NiFi

Introduction:

I will be talking about this processor at ApacheCon @Home 2020 in my "Apache Deep Learning 301" talk with Dr. Ian Brooks.

Sometimes you want your deep learning easy and in Java, so let's do that with DJL in custom Apache NiFi processors running in CDP Data Hubs, Private Cloud, and laptop deployments.

December 20, 2019

Combining DJL.AI With Apache NiFi for Deep Learning Workflows

NiFi + DJL.AI = A Merry Deep Learning Christmas

Happy Mmm...FLaNK Day!

December 19, 2019

Modern Apache NiFi Load Balancing

In today's Apache NiFi, there is a new and improved means of load balancing data between nodes in a cluster. As of NiFi 1.8.0, load balancing can be enabled on any connection between processors. You now have an easy-to-set option for automatically load balancing between your nodes.

The legacy days of using Remote Process Groups to distribute the load between Apache NiFi nodes are over. For maximum flexibility, performance, and ease, please make sure you upgrade your existing flows to use the built-in Connection Load Balancing.
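
NiFi configures load balancing per connection in the UI rather than in code, so the following is only a rough Python sketch of the ideas behind two distribution strategies, round robin and partitioning by an attribute. It is not NiFi's implementation; the node names, the flowfile dictionaries, and the `customer` attribute are all hypothetical.

```python
import zlib
from itertools import cycle


def round_robin(flowfiles, nodes):
    """Assign each flowfile to the next node in turn."""
    assignment = {node: [] for node in nodes}
    for flowfile, node in zip(flowfiles, cycle(nodes)):
        assignment[node].append(flowfile)
    return assignment


def partition_by_attribute(flowfiles, nodes, attribute):
    """Send flowfiles that share an attribute value to the same node."""
    assignment = {node: [] for node in nodes}
    for flowfile in flowfiles:
        key = str(flowfile.get(attribute, "")).encode("utf-8")
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        node = nodes[zlib.crc32(key) % len(nodes)]
        assignment[node].append(flowfile)
    return assignment


# Hypothetical cluster and data, for illustration only.
nodes = ["node-1", "node-2", "node-3"]
flowfiles = [{"id": i, "customer": f"c{i % 2}"} for i in range(6)]

by_turn = round_robin(flowfiles, nodes)
by_customer = partition_by_attribute(flowfiles, nodes, "customer")
```

Round robin spreads the load evenly, while partitioning trades some evenness for the guarantee that related data lands on the same node.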

November 26, 2019

Exploring Apache NiFi 1.10: Parameters and Stateless Engine

Apache NiFi 1.10 Is Now Available!

https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12316020&version=12344993

You can now use JDK 8 or JDK 11! I am running on JDK 11, and it seems a bit faster. A huge feature is the addition of Parameters! You can use them to pass parameters to Apache NiFi Stateless!

October 15, 2019

Migrating Apache Flume Flows to Apache NiFi: Kafka Source to Multiple Sinks

The world of streaming is constantly moving... yes, I said it. Every few years some projects get favored by the community and by developers. Apache NiFi has stepped ahead and has been the go-to for quickly ingesting sources and storing those resources to sinks with routing, aggregation, basic ETL/ELT, and security. I am recommending a migration from legacy Flume to Apache NiFi. The time is now.

Below, I walk you through a common use case. It's easy to integrate Kafka as a source or sink with Apache NiFi or MiNiFi agents. We can also add HDFS or Kudu sinks. All of this comes with full security, SSO, governance, cloud and K8s support, schema support, full data lineage, and an easy-to-use UI. Don't get fluming mad, let's try another great Apache project.

August 23, 2019

EFM Series: Using MiNiFi Agents on Raspberry Pi 4 With Intel Movidius Neural Compute Stick 2, Apache NiFi, and AI

The good news is that Raspberry Pi 4 can run MiNiFi Java Agents, Intel Movidius Neural Compute Stick 2, and AI libraries. You can now use this device, with its 4 GB of RAM, to run IoT with AI on the edge.

Flow From MiNiFi Agent Running OpenVINO, SysLog Tail, and Grabbing WebCam Images

August 16, 2019

Arm Twisting Apache NiFi

Introduction

Apache NiFi is a software project from the Apache Software Foundation designed to automate the flow of data between software systems.

Early this year, I created a generic, metadata-driven data offloading framework using Talend. While I was championing that tool, many accounts raised concerns regarding the Talend license. While some were apprehensive about the additional cost, many others questioned the tool itself, because their account already had licenses for competing ETL tools like DataStage and Informatica (to name a few). A few accounts also wanted to know if the same concept of offloading could be made available using NiFi. Therefore, it was most logical to explore NiFi.

June 18, 2019 | August 6, 2019

Apache NiFi Overview

What Is Apache NiFi?

Apache NiFi is a robust open-source framework for data ingestion, distribution, and more. It can propagate any data content from any source to any destination.

NiFi is based on a different programming paradigm called Flow-Based Programming (FBP). I’m not going to explain the definition of Flow-Based Programming. Instead, I will explain how NiFi works, and then you can connect it with the definition of Flow-Based Programming.

February 15, 2019

Integration of Apache NiFi and Cloudera Data Science Workbench for Deep Learning Workflows

Summary

Now that we have shown that it is easy to do standard NLP, next up is Deep Learning. As you can see, NLP, Machine Learning, Deep Learning, and more are all within your reach for building your own AI as a Service using tools from Cloudera. These can run in public or private clouds at scale. Now you can run and integrate machine learning services, computer vision APIs, and anything you have created in-house with your own Data Scientists. To process an image, the pre-trained YOLO model downloads it from the URL to /tmp. The Python 3 script will also download the GluonCV model for YOLOv3.

Using Pre-trained Model:

February 8, 2019

Reading SUDO Logs With Apache NiFi

Log, Log, Log

Sudo logs contain a lot of information on hosts, users, and auditable actions that can be useful for cybersecurity, capacity planning, user tracking, data lake population, user management, and general security.
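
To make the host, user, and action fields concrete, here is a minimal Python sketch, outside NiFi, that pulls them out of a single sudo log line. It assumes the common syslog-style layout (`TTY=... ; PWD=... ; USER=... ; COMMAND=...`); the sample line, the `parse_sudo_line` helper, and the field names are hypothetical, and real logs may vary by distribution and sudo configuration.

```python
import re

# Assumed layout of a syslog-style sudo entry; adjust for your own logs.
SUDO_LINE = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) sudo(?:\[\d+\])?:\s+"
    r"(?P<user>\S+) : "
    r"TTY=(?P<tty>\S+) ; "
    r"PWD=(?P<pwd>.+?) ; "
    r"USER=(?P<run_as>\S+) ; "
    r"COMMAND=(?P<command>.+)$"
)


def parse_sudo_line(line):
    """Return a dict of fields for a sudo command entry, or None if it does not match."""
    match = SUDO_LINE.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical sample line, for illustration only.
sample = (
    "Feb  8 10:15:01 web01 sudo:    alice : TTY=pts/0 ; "
    "PWD=/home/alice ; USER=root ; COMMAND=/usr/bin/systemctl restart nginx"
)
record = parse_sudo_line(sample)
```

Lines that do not match, such as entries from other services, come back as `None` and can be routed separately.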


February 8, 2019

Using Cloudera Data Science Workbench With Apache NiFi

Using Deployed Models as a Function as a Service

Using Cloudera Data Science Workbench with Apache NiFi, we can easily call functions within our deployed models from Apache NiFi as part of flows. I am working against CDSW on HDP, but it will work for all CDSW regardless of install type.

In my simple example, I built a Python model that uses TextBlob to run sentiment analysis on a sentence passed to it. It returns Sentiment Polarity and Subjectivity, which we can immediately act upon in our flow.
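
A model function of this kind might look roughly like the sketch below. It is not the author's code: the `score_sentence` name and the optional `analyzer` parameter are assumptions added for illustration, and the only TextBlob feature relied on is the `sentiment` property with its `polarity` and `subjectivity` values.

```python
def score_sentence(sentence, analyzer=None):
    """Return sentiment polarity and subjectivity for one sentence.

    `analyzer` is a hypothetical hook so the scoring class can be swapped
    out, for example in tests; by default TextBlob is used.
    """
    if analyzer is None:
        # Third-party dependency: pip install textblob
        from textblob import TextBlob
        analyzer = TextBlob
    sentiment = analyzer(sentence).sentiment
    return {
        "polarity": float(sentiment.polarity),
        "subjectivity": float(sentiment.subjectivity),
    }
```

Returning a plain dictionary keeps the result easy to serialize as JSON, so a flow can read both values from the response.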