- Lab 1
- Build lab environment
- Lab 2
- Build a simple NiFi data flow
- Lab 3
- Using Nifi Templates
- Lab 4
- MiNifi and remote process group
- Lab 5
- Kafka basics
- Lab 6
- Integrate Kafka with Nifi
- Setup sandbox lab environment
- Consume Meetup RSVP stream using Nifi
- Extract the JSON elements we are interested in
- Split the JSON into smaller fragments
- Create, save, upload Nifi template
- Create Nifi flow from template
- Send data to remote Nifi instance
- Create MiNifi flow from template
- Deploy and execute MiNifi flow
- Write data to Kafka with toolkit and Nifi
- Consume data from Kafka with toolkit and Nifi
- Write data to local disk with Nifi
We will work through a series of labs, step by step, to achieve all of the above goals.
- Download and start Hortonworks HDF Sandbox
- Access sandbox Ambari UI and Nifi UI
- Remote to Sandbox console
Download the HDF Sandbox from the Hortonworks website.
All of the following instructions are based on the VirtualBox version of the Sandbox. If you use VMware, you might need to make slight changes to some of the settings.
Once the Sandbox is started in VirtualBox, the VM window will show the link to the start page, such as http://127.0.0.1:18888
- host: 127.0.0.1
- port: 12222
- user: root
- password: hadoop
- Once you are logged in to the sandbox, use the following command to reset the Ambari password
# Updates password
ambari-admin-password-reset
# If Ambari doesn't restart automatically, restart ambari service
ambari-agent restart
- Access the Ambari UI at http://127.0.0.1:9080 (user: admin, password: the one you set in the previous step)
- Access the NiFi UI at http://127.0.0.1:19090/nifi
- For more information about NiFi, see the NiFi documentation
- Please use Chrome, Firefox, Safari or Edge for the labs.
- Internet Explorer is not supported by Nifi and should not be used for the labs.
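Before opening the UIs, it can help to confirm that the forwarded ports are reachable from the host. The following is a minimal Python sketch and not part of the original lab. The names `port_is_open`, `check_sandbox`, and `SANDBOX_PORTS` are illustrative, and the port numbers are the ones listed above for the VirtualBox version.

```python
import socket

# Ports taken from the steps above (VirtualBox port forwarding).
SANDBOX_PORTS = {
    "start page": 18888,
    "ssh": 12222,
    "ambari": 9080,
    "nifi": 19090,
}


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_sandbox(host: str = "127.0.0.1") -> dict:
    """Map each service name to whether its forwarded port accepts connections."""
    return {name: port_is_open(host, port) for name, port in SANDBOX_PORTS.items()}


if __name__ == "__main__":
    for name, is_open in check_sandbox().items():
        print(f"{name:10s} {'open' if is_open else 'closed'}")
```

An open port only shows that something is listening. Ambari and NiFi can still take a few minutes after the VM boots before their pages load.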
There are more tutorials available on the Hortonworks website that you can follow using the same sandbox.
- Consume Meetup RSVP stream
- Extract the JSON elements we are interested in
- Split the JSON into smaller fragments
- Use funnel as a temporary processor sink
To get started we need to consume the data from the Meetup RSVP stream, extract what we need, split the content and save it to a file:
Our final flow for this lab will look like the following:
- Drag a Process Group to the canvas and name it `Nifi Lab`
  - Double-click the newly created Process Group and create a new Process Group called `Lab 2`
  - Double-click the `Lab 2` Process Group and continue the following steps inside it
- Add a ConnectWebSocket processor to the canvas
- Add an UpdateAttribute processor
- Add an EvaluateJsonPath processor and configure it as shown below:

  | Property | Value |
  | --- | --- |
  | event.name | $.event.event_name |
  | event.url | $.event.event_url |
  | group.city | $.group.group_city |
  | group.state | $.group.group_state |
  | group.country | $.group.group_country |
  | group.name | $.group.group_name |
  | venue.lat | $.venue.lat |
  | venue.lon | $.venue.lon |
  | venue.name | $.venue.venue_name |
- Add a SplitJson processor and configure the JsonPath Expression to be `$.group.group_topics`
- Add a ReplaceText processor and configure the Search Value to be `([{])([\S\s]+)([}])` and the Replacement Value to be
{
  "event_name": "${event.name}",
  "event_url": "${event.url}",
  "venue" : {
    "lat": "${venue.lat}",
    "lon": "${venue.lon}",
    "name": "${venue.name}"
  },
  "group" : {
    "group_city" : "${group.city}",
    "group_country" : "${group.country}",
    "group_name" : "${group.name}",
    "group_state" : "${group.state}",
    $2
  }
}
- Add a funnel to the canvas and connect ReplaceText to it
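To see what the EvaluateJsonPath, SplitJson, and ReplaceText steps do to a single message, the following Python sketch imitates them outside NiFi. It is an illustration under assumptions, not how NiFi implements these processors. `SAMPLE_RSVP` is a hypothetical, heavily trimmed message (including the `urlkey` and `topic_name` fields), the function names are made up for this example, and `resolve` only understands plain dotted paths such as `$.a.b`.

```python
import json
import re

# Hypothetical, heavily trimmed RSVP message (real messages carry many more fields).
SAMPLE_RSVP = {
    "event": {"event_name": "NiFi Meetup", "event_url": "https://example.com/e/1"},
    "venue": {"venue_name": "Town Hall", "lat": 1.5, "lon": 2.5},
    "group": {
        "group_city": "Austin",
        "group_state": "TX",
        "group_country": "us",
        "group_name": "Data Flow Fans",
        "group_topics": [
            {"urlkey": "big-data", "topic_name": "Big Data"},
            {"urlkey": "kafka", "topic_name": "Kafka"},
        ],
    },
}

# Attribute name -> JSONPath, copied from the EvaluateJsonPath configuration.
ATTRIBUTE_PATHS = {
    "event.name": "$.event.event_name",
    "event.url": "$.event.event_url",
    "group.city": "$.group.group_city",
    "group.state": "$.group.group_state",
    "group.country": "$.group.group_country",
    "group.name": "$.group.group_name",
    "venue.lat": "$.venue.lat",
    "venue.lon": "$.venue.lon",
    "venue.name": "$.venue.venue_name",
}

# Search Value and Replacement Value, copied from the ReplaceText step.
SEARCH_VALUE = re.compile(r"([{])([\S\s]+)([}])")
REPLACEMENT_VALUE = """{
  "event_name": "${event.name}",
  "event_url": "${event.url}",
  "venue": {
    "lat": "${venue.lat}",
    "lon": "${venue.lon}",
    "name": "${venue.name}"
  },
  "group": {
    "group_city": "${group.city}",
    "group_country": "${group.country}",
    "group_name": "${group.name}",
    "group_state": "${group.state}",
    $2
  }
}"""


def resolve(doc, path):
    """Follow a plain dotted JSONPath such as $.a.b (no wildcards or filters)."""
    node = doc
    for key in path[2:].split("."):
        node = node[key]
    return node


def evaluate_json_path(doc):
    """Imitate EvaluateJsonPath: copy selected values into string attributes."""
    return {name: str(resolve(doc, path)) for name, path in ATTRIBUTE_PATHS.items()}


def split_json(doc):
    """Imitate SplitJson on $.group.group_topics: one JSON string per topic."""
    return [json.dumps(topic) for topic in resolve(doc, "$.group.group_topics")]


def replace_text(fragment, attrs):
    """Imitate ReplaceText: wrap the topic's fields ($2) in the event envelope."""
    inner = SEARCH_VALUE.search(fragment).group(2)  # text between the outer braces
    # Fill ${attribute} references; values are inserted as-is, without escaping.
    filled = re.sub(r"\$\{([^}]+)\}", lambda m: attrs[m.group(1)], REPLACEMENT_VALUE)
    return filled.replace("$2", inner)


if __name__ == "__main__":
    attributes = evaluate_json_path(SAMPLE_RSVP)
    for fragment in split_json(SAMPLE_RSVP):
        print(replace_text(fragment, attributes))
```

Each topic in `group_topics` becomes its own output document, with the event, venue, and group attributes repeated around the topic's own fields. Because the search pattern is greedy, `$2` is everything between the first `{` and the last `}` of the fragment.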
- What does a full RSVP JSON object look like?
- How many output files do you end up with?
- How can you change the file name that Json is saved as from PutFile?
- Why do you think we are splitting out the RSVP's by group?
- Why are we using the Update Attribute processor to add a mime.type?
- How can you change the flow to get the member photo from the JSON and download it?
In this lab, we will learn how to create, save, and upload a NiFi template, and how to create a NiFi flow from a template.
- Create a NiFi template from an existing flow
- Save the template to an XML file and upload the XML template file to NiFi
- Create a new flow from an existing template
- Select everything inside the Process Group `Lab 2` and create a template from the selection
- Go to the Template manager and download the newly created template to disk as an XML file
- Upload the template XML file from local disk to create another template
- Create a new flow from existing templates
  - Create a new Process Group `Lab 3` under `Nifi Lab` and double-click to go inside
  - Drag and drop the Template icon onto the canvas and select one of the templates to create a new flow
  - You now have a flow created from the template. There will be a warning on the WebSocket processor because its controller service is not enabled. Once you enable the controller service by clicking the flash icon, everything works.
In this lab, we will learn how to use a Remote Process Group for site-to-site communication and how to use MiNiFi to send data to a remote NiFi instance.
- Understand how to communicate with a remote NiFi instance
- Prepare a flow for MiNiFi
NOTE: Before starting this lab, we need to enable site-to-site communication and install MiNiFi on the sandbox VM.
Make the change via the Ambari UI:
- Go to Nifi => Config => Advanced nifi-properties, and change the following values:
- Then restart NiFi via Ambari
- Connect to the VM console and execute the following commands to install MiNiFi on the VM
cd /usr/hdf/current/
mkdir minifi
cd minifi/
wget http://public-repo-1.hortonworks.com/HDF/3.0.1.1/minifi-1.0.3.0.1.1-5-bin.tar.gz
wget http://public-repo-1.hortonworks.com/HDF/3.0.1.1/minifi-toolkit-1.0.3.0.1.1-5-bin.tar.gz
tar -xzf minifi-1.0.3.0.1.1-5-bin.tar.gz
tar -xzf minifi-toolkit-1.0.3.0.1.1-5-bin.tar.gz
Now we should be ready to create our flow. To do this do the following:
- The first thing we are going to do is set up an Input Port. This is the port that MiNiFi will send data to. Drag the Input Port icon to the canvas and call it `From MiNiFi`.
- Now that the Input Port is configured, we need somewhere for the data to go once we receive it. In this case we will use a funnel, so all incoming data will be buffered in the queue.
- Now that we have the Input Port and the funnel to handle our data, we need to connect them.
- We are now ready to build the MiNiFi side of the flow:
  - Create a new Process Group called `Lab 4` inside `Nifi Lab` and go inside
  - Add a GenerateFlowFile processor to the canvas and change `File Size` to `10B` and `Run Schedule` to `10 sec`
  - Add a Remote Process Group to the canvas
  - Set the URL to `http://sandbox-hdf:19090/nifi/`
  - Connect the GenerateFlowFile to the Remote Process Group (you may need to wait a bit for the remote input ports to be refreshed)
  - Right-click the Remote Process Group and select Enable Transmission
- Now go back to the root canvas. You should see data being buffered in the queue after the Input Port `From MiNiFi`.
- The next step is to generate the flow we need for MiNiFi:
  - Go into `Lab 4` and select the GenerateFlowFile processor and the NiFi Flow Remote Process Group (these are the only things needed for MiNiFi)
  - Select the "Create Template" button from the toolbar
  - Choose a name for your template
- Download the template to your local disk
- Now SCP the template you downloaded to the `/tmp` directory on your VM. (You can use WinSCP if you are on Windows.)
scp -P 12222 <local MiNifi template> root@127.0.0.1:/tmp
root@127.0.0.1's password: hadoop
- We are now ready to set up MiNiFi. First we need to convert the template to the YAML format that MiNiFi uses. `config.yml` is the file that MiNiFi uses to generate the nifi.properties file and the flow.xml.gz.
cd /usr/hdf/current/minifi/minifi-toolkit-1.0.3.0.1.1-5
bin/config.sh transform /tmp/MiNiFi_Flow.xml /usr/hdf/current/minifi/minifi-1.0.3.0.1.1-5/conf/config.yml
- That is it, we are now ready to start MiNiFi. To start MiNiFi from a command prompt execute the following:
cd /usr/hdf/current/minifi/minifi-1.0.3.0.1.1-5
bin/minifi.sh start
- You should now be able to go to your NiFi flow and see data coming in from MiNiFi. Once you confirm everything is working, stop MiNiFi with the following commands
cd /usr/hdf/current/minifi/minifi-1.0.3.0.1.1-5
bin/minifi.sh stop
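The MiNiFi and toolkit paths in the commands above all derive from the version string in the downloaded archive names. If you install a different build, the following Python sketch shows how the paths and the `config.sh transform` invocation fit together. The helper names are illustrative, and the default version is simply the one used in this lab.

```python
from pathlib import PurePosixPath

# Version string taken from the archive names downloaded earlier in this lab.
MINIFI_VERSION = "1.0.3.0.1.1-5"
BASE_DIR = PurePosixPath("/usr/hdf/current/minifi")


def minifi_home(version: str = MINIFI_VERSION) -> PurePosixPath:
    """Directory created by extracting minifi-<version>-bin.tar.gz."""
    return BASE_DIR / f"minifi-{version}"


def toolkit_home(version: str = MINIFI_VERSION) -> PurePosixPath:
    """Directory created by extracting minifi-toolkit-<version>-bin.tar.gz."""
    return BASE_DIR / f"minifi-toolkit-{version}"


def transform_command(template_xml: str, version: str = MINIFI_VERSION) -> list:
    """Argument list that converts a template XML file into MiNiFi's config.yml."""
    return [
        str(toolkit_home(version) / "bin" / "config.sh"),
        "transform",
        template_xml,
        str(minifi_home(version) / "conf" / "config.yml"),
    ]


if __name__ == "__main__":
    print(" ".join(transform_command("/tmp/MiNiFi_Flow.xml")))
```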
In this lab we are going to explore creating, writing to and consuming Kafka topics. This will come in handy when we later integrate Kafka with NiFi.
- Create a Kafka topic using the console tool
- Send messages to the Kafka topic
- Receive messages from the Kafka topic
Before starting the lab steps, make sure the Kafka service is started in the Ambari UI. If Kafka is not started, start the service manually from Ambari.
- Creating a topic
- Open an SSH connection to your VM.
- Navigate to the Kafka directory (`/usr/hdp/current/kafka-broker`). This is where Kafka is installed; we will use the utilities located in the bin directory.
cd /usr/hdp/current/kafka-broker/
- Create a topic using the kafka-topics.sh script
bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 --replication-factor 1 --topic first-topic
- Ensure the topic was created
bin/kafka-topics.sh --list --zookeeper localhost:2181
- Testing Producers and Consumers
- Open a second terminal to your VM and navigate to the Kafka directory
- In one shell window connect a consumer:
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic first-topic
- In the second shell window connect a producer:
bin/kafka-console-producer.sh --broker-list sandbox-hdf:6667 --topic first-topic
- Sending messages. Now that the producer is connected, we can type messages.
- Type a message in the producer window
- Messages should appear in the consumer window.
- Close the consumer (Ctrl-C) and reconnect using the default offset, which is latest (that is, without `--from-beginning`). You will now see only new messages typed in the producer window.
- As you type messages in the producer window they should appear in the consumer window.
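The difference between `--from-beginning` and the default latest offset can be pictured with a toy model. The Python sketch below is not a Kafka client and makes no claim about Kafka internals. `TopicLog` and `Consumer` are hypothetical classes that treat a single-partition topic as an append-only list and a consumer as a position in that list.

```python
class TopicLog:
    """In-memory stand-in for a single-partition topic: an append-only list."""

    def __init__(self):
        self.messages = []

    def send(self, message):
        """Append a message and return its offset."""
        self.messages.append(message)
        return len(self.messages) - 1

    def end_offset(self):
        """Offset that the next appended message will receive."""
        return len(self.messages)


class Consumer:
    """Reads from a TopicLog, starting at offset 0 or at the current end."""

    def __init__(self, log, from_beginning=False):
        self.log = log
        self.offset = 0 if from_beginning else log.end_offset()

    def poll(self):
        """Return all messages at or after the current offset, then advance."""
        new_messages = self.log.messages[self.offset:]
        self.offset += len(new_messages)
        return new_messages


if __name__ == "__main__":
    topic = TopicLog()
    topic.send("hello")
    topic.send("world")

    replaying = Consumer(topic, from_beginning=True)  # like --from-beginning
    latest_only = Consumer(topic)                     # like the default offset

    topic.send("new message")
    print(replaying.poll())
    print(latest_only.poll())
```

The consumer created with `from_beginning=True` sees all three messages, while the one created with the default sees only the message sent after it connected.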
In this lab we will learn how to use NiFi to push data to a Kafka topic, as well as consume data from a Kafka topic.
- Send Meetup JSON messages to a Kafka topic using NiFi
- Receive data from a Kafka topic using NiFi
- Write data to a local folder
- Creating the Kafka topic
- Open an SSH connection to your VM and navigate to the Kafka directory
- For our integration with NiFi create a Kafka topic called `meetup-raw-rsvps`
bin/kafka-topics.sh --zookeeper localhost:2181 --create --partitions 1 --replication-factor 1 --topic meetup-raw-rsvps
- We are going to reuse the flow from Lab 2. Add a PublishKafka_0_10 processor to the canvas, then connect the funnel to the PublishKafka_0_10 processor.
- Start the flow and, using the Kafka tools, verify the data is flowing all the way to Kafka.
bin/kafka-console-consumer.sh --zookeeper localhost:2181 --from-beginning --topic meetup-raw-rsvps
- Create a new Process Group called `Lab 6` under `Nifi Lab`.
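When verifying the data in Kafka, it can be useful to check that a consumed message still has the shape produced by the ReplaceText step in Lab 2. The following Python sketch does that for one message at a time. The function name `check_message` is illustrative, and the expected keys are an assumption taken from the Lab 2 Replacement Value, so adjust them if your flow differs.

```python
import json

# Keys expected from the Lab 2 ReplaceText template (an assumption of this sketch).
REQUIRED_TOP_LEVEL = ("event_name", "event_url", "venue", "group")
REQUIRED_NESTED = {
    "venue": ("lat", "lon", "name"),
    "group": ("group_city", "group_country", "group_name", "group_state"),
}


def check_message(message: str) -> list:
    """Return a list of problems found in one consumed message (empty if none)."""
    try:
        doc = json.loads(message)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(doc, dict):
        return ["top-level value is not an object"]

    problems = [f"missing key: {key}" for key in REQUIRED_TOP_LEVEL if key not in doc]
    for section, keys in REQUIRED_NESTED.items():
        nested = doc.get(section)
        if isinstance(nested, dict):
            problems += [f"missing key: {section}.{key}" for key in keys if key not in nested]
    return problems


if __name__ == "__main__":
    sample = '{"event_name": "NiFi Meetup", "venue": {"lat": "1.5"}}'
    for problem in check_message(sample):
        print(problem)
```

Note that a message built from the Lab 2 template spans several lines, so pass the whole message text to `check_message` rather than a single line of consumer output.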