
Chapter 9. Pentaho App Builder

Pentaho App Builder is a plugin you can use to build your own Pentaho plugins. The most interesting part is that you don't need to write any code to get it working. Yes, you heard right: no code.

In this chapter, you will learn about:

· Pentaho App Builder

· Community Plugin Kick-starter

· Creating a dashboard

· Making a plugin available on the marketplace

By the end of this chapter, you will understand Pentaho App Builder and how to work with it. There was a time when you would have needed to know how to write Java code for the back end of a plugin, but now it's much simpler and accessible to many more people.

You will also know what the Community Plugin Kick-starter (CPK) is and its relationship with Pentaho App Builder. You really need to understand the concepts behind CPK, because that's where most of the magic happens; Pentaho App Builder is just a graphical interface that leverages CPK to do the work. You will also see that with CPK you are able to make use of jobs directly, and not just transformations as in CDA. We'll give you some tips and tricks that you will find very useful.

Understanding Pentaho App Builder

"Sparkl, or Pentaho App Builder, is a plugin creator instrument that sits on 2 major cornerstones of Pentaho: CTools and PDI, aiming to leverage as much as possible of our existing stack."

--Pedro Alves

The main idea is to combine the two most amazing tools in Pentaho: the CTools and Kettle (also known as Pentaho Data Integration). If you know how to build Kettle jobs and transformations and also know how to build a dashboard, you should be able to build a Pentaho plugin. If not, it's about time to learn. I can recommend two books: https://www.packtpub.com/big-data-and-business-intelligence/pentaho-data-integration-beginners-guide-second-edition and https://www.packtpub.com/big-data-and-business-intelligence/pentaho-data-integration-4-cookbook.

In the past, if you didn't know Java, it was hard to create a plugin, but that's no longer the case: you can now do it without writing any Java code. You can also create a CTools dashboard without writing a line of code; however, as I already told you earlier in the book, you will need to write some JavaScript to build remarkable dashboards. Not that you need to know a lot of JavaScript; you just need to understand it and adjust some of the code in the book or in the included samples. This way, we make the absolute most of the existing skills of data developers. When we talk about building a web application, we usually talk about having a back end and a front end.

Pentaho App Builder works on top of CPK. CPK provides a way to simplify the structure of a packaged Pentaho application, where the UI is built as a CDE dashboard and the back end as Kettle transformations/jobs. There are other options, such as writing the back end in JavaScript or Java code; however, there is no need to do this if you can use Kettle.

Installing Pentaho App Builder

You can install Pentaho App Builder using Marketplace, and you just need to refer to the instructions in the first chapter. Pentaho App Builder has some dependencies, so make sure you have them installed:

· CPF: Community Plugin Framework

· CDE: Community Dashboard Editor

· CDF: Community Dashboard Framework

· CDA: Community Data Access

Create a new plugin

Open Pentaho App Builder using the PUC menu, or directly from http://localhost:8080/pentaho/plugin/sparkl/api/main. When you start Pentaho App Builder, you will land on the following dashboard. I say dashboard because Pentaho App Builder is itself a dashboard:

Create a new plugin

In the preceding image, you will find the following buttons/options:

1. Sort plugins by: This is to sort the plugins that are available in your Pentaho instance. Here, you only see the plugins that were built using Pentaho App Builder or CPK.

2. Refresh: This refreshes the list of available plugins.

3. Create a new Plugin: Click this plus sign to create a new plugin. You will learn more about this later in the chapter.

4. Play: This will open/execute the main dashboard of the plugin.

5. Edit: You will be redirected to another window where you can edit the metadata of the plugin, or create/edit/remove endpoints.

6. Remove: This deletes the plugin; its files will be removed from the system folder of the server.

When you create a new plugin using option 3, you need to enter the name of the plugin and click on the Create plugin button. You will get a message telling you to restart the server, so please do. In future releases of Pentaho App Builder for Pentaho 6.x, it will be possible to generate a new plugin that becomes available immediately.

After the restart, you need to go back into Pentaho App Builder and edit your plugin. If you didn't set the metadata of the plugin yet, now is a good time to do it. Just fill out the form and click on Apply Changes. The following image is an example of what you will get:

Create a new plugin

You will see two tabs: About and Elements. About is the information/metadata of the plugin. Elements is about the endpoints of the plugin:

Create a new plugin

In the preceding image, we can identify the following sections:

1. Plugin name: The plugin's ID/name.

2. Add new element: You can click on this plus button to add a new endpoint. It can be a dashboard or a Kettle job/transformation. We will cover these separately in this chapter.

3. Display options: You can select the type of element you want displayed. It can be a dashboard or a Kettle transformation/job, or both at the same time. In the image, you can see a dashboard that can be used as the front end and a Kettle transformation/job as part of the back end.

4. Refresh button: This lets you refresh the available endpoints. You can add a dashboard or transformation/job to the endpoints folder in the plugin's system folder and then refresh the plugin to see the new endpoints listed, so there is no need for a restart.

5. Available endpoints: Here you will get a list of the available endpoints. The list will include the name, type, and permissions. The type will give you information about whether it's a dashboard or Kettle endpoint.

Creating a new endpoint

An endpoint can be considered a URI and HTTP method that directly gets a response from the server. For instance, when invoking the URI, we might get a dashboard or the result of a Kettle job/transformation. It does not need to be invoked from the plugin itself, as you can run a Kettle endpoint from another application, but of course one of the use cases is to use the endpoints inside the plugin itself.
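
For instance, here is a minimal sketch of invoking a Kettle endpoint from JavaScript using jQuery (which is already available inside the Pentaho User Console); the plugin ID myPlugin and the endpoint name getData are hypothetical:

// Hypothetical plugin ID (myPlugin) and Kettle endpoint (getData)
$.getJSON("/pentaho/plugin/myPlugin/api/getData", function(result) {
    // The response carries the rows fetched from the endpoint
    console.log(result);
});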

We saw in Chapter 2, Acquiring Data with CDA, that we can use CDA to get data from a Kettle transformation. Using Pentaho App Builder, you can also use jobs to execute some actions and return success or error messages, download one or multiple files, or return a custom format. Note that you should use Pentaho App Builder only if you're building an application/plugin; otherwise, you should build a normal dashboard and use CDA endpoints to get data. If you can't do what you need with a regular dashboard and CDA data sources, then use Pentaho App Builder.

As we have already covered, there are two kinds of endpoints that you can use in Pentaho App Builder/CPK. When creating a new element, you need to specify its name and one of the following types:

· A Pentaho Data Integration (Kettle) job/transformation

· A CDE dashboard

Creating a job/transformation

To create a new Kettle endpoint, you need to specify the name and the type Kettle Endpoint. You will also need to choose between Clean Job and Clean Transformation. You can also make a Kettle endpoint executable only by an administrator:

Creating a job/transformation

After the endpoint has been created, you will see it as shown in the following image:

Creating a job/transformation

There are two buttons here you can use for each Kettle endpoint:

1. Execute the endpoint: This will trigger the execution of the endpoint.

2. Remove/delete the endpoint: This removes the endpoint, and will ask for confirmation.

There is no button to edit the Kettle transformation/job, because it's not possible to open Pentaho Data Integration in the browser.

Starting to learn about CPK

As we covered earlier, CPK is where almost all the magic happens. In reality, the plugins that you create with Pentaho App Builder are CPK plugins; Pentaho App Builder is just the web interface that allows you to create them easily.

CPK lets you expose Kettle jobs, Kettle transformations, and dashboards as REST endpoints. You can call them using either of the following calls:

· http://<host>:<port>/<webapp>/plugin/<cpkPluginId>/api/{kettleFileName}

· http://<host>:<port>/<webapp>/plugin/<cpkPluginId>/api/{dashboardFileName}

Here:

· host: This is the hostname or IP of the server. It can be localhost when Pentaho is installed on the same machine that makes the request.

· port: This is the port we can use to access the Pentaho server. It is 8080 by default.

· webapp: This is the web app's name, which by default is pentaho.

· cpkPluginId: This is the ID of the plugin that you specified when creating the plugin.

· kettleFileName or dashboardFileName: This is the name of the endpoint you are requesting. You can specify the name of the dashboard or the name of a Kettle transformation or job.
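
For example, assuming a plugin with the ID myPlugin that contains a transformation saved as getData.ktr and a dashboard named main (both names are hypothetical), the calls would look like this; note that the Kettle file extension is not used in the call:

http://localhost:8080/pentaho/plugin/myPlugin/api/getData
http://localhost:8080/pentaho/plugin/myPlugin/api/main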

When you create a Kettle transformation or job using Pentaho App Builder, as explained earlier, the following parameters are automatically created for it (where a default value exists, it is shown in parentheses):

· #cpk.cache.isEnabled (default: false): This enables/disables the caching of results.

· #cpk.cache.timeToLiveSeconds (default: 3600): This is how many seconds a result will be cached for. Setting this value to 0 means the result will be cached forever.

· #cpk.executeAtStart: This indicates whether the transformation is to be executed when the plugin is initialized or not.

· #cpk.plugin.dir: This is the plugin folder.

· #cpk.plugin.id: This is the ID of the plugin.

· #cpk.response.attachmentName: This is the attachment name used when downloading a file from a result.

· #cpk.response.download (default: false): This indicates whether or not to mark the HTTP response body as an attachment.

· #cpk.response.kettleOutput: This is the output format to be used by default. The possible values are: Infered, Json, SingleCell, ResultFiles, and ResultOnly.

· #cpk.response.mimeType: This is the mime type of the HTTP response. If this value is not set, the plugin will try to determine it from the file extension.

· #cpk.result.stepName (default: OUTPUT): This is the default output step from which the rows will be fetched for the result.

· #cpk.session.[sessionVarName]: This is the value of the session variable named [sessionVarName]. It will be automatically injected when the variable is enabled.

· #cpk.session.roles: These are the roles of the user executing this transformation.

· #cpk.session.username: This is the username executing this transformation.

· #cpk.solution.system.dir: This is the pentaho-solutions folder.

· #cpk.webapp.dir: This is the webapp folder.

By default, all these parameters are disabled. To enable a parameter, remove the # from the beginning of its name; otherwise, it will be treated as a comment.

Specifying the step to get results from

A Kettle endpoint (job or transformation) may have multiple steps or job entries that we can get results from, and you are able to choose which one to retrieve data from. This can be done by making the name of the step/job entry start with OUTPUT. Just prefix the name of your step with this string and the results will be returned to the caller of the endpoint.

There is also another way: it is possible to specify the step we want to fetch the row results from by including the stepName parameter in the query string of the request. To identify the step to pull the information from, we add ?stepName=OUTPUT to our URI:

http://<host>:<port>/<webapp>/plugin/<cpkPluginId>/api/{kettleFileName}?stepName=OUTPUT

By default, CPK will try to find a step named OUTPUT. As covered in the parameters list earlier, you can also change the default step name directly in the jobs/transformations using the cpk.result.stepName property.

Specifying parameters' values

It's very useful that you can pass parameters to a transformation/job. To do so, you need to prefix the name of the parameter with param. If the parameter is called territory, you need to use paramterritory, as shown here:

http://<host>:<port>/pentaho/plugin/<cpkPluginId>/api/{kettleFileName}?paramterritory=EMEA
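
If the transformation expects more than one parameter, just append more pairs to the query string; here, territory and year are hypothetical parameter names:

http://<host>:<port>/pentaho/plugin/<cpkPluginId>/api/{kettleFileName}?paramterritory=EMEA&paramyear=2005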

Changing the output format type

CPK will try to guess the output type; however, you can specify the output type you desire. We may want to return data to the caller, or just return a file that can be downloaded. The available options are as follows:

· Json: This returns the result rows in a standard CDA-like result set format (metadata/queryInfo/resultset), just like a CDA data source; a sample of this structure is shown after this list.

· ResultFiles: This gets the files that were set in the result. For this to be enabled, we need to set the option Add filename to the result set.

· SingleCell: This returns the content of the first cell of the first row in the result. This allows us to return other format types, for instance XML.

· ResultOnly: This returns status information about the execution. This is usually the output of the executions of a job.
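
To give you an idea of the Json output, the following is a minimal sketch of the CDA-like structure you can expect back; the column and the values are just an illustration:

{
  "queryInfo": { "totalRows": "2" },
  "resultset": [ [ "EMEA" ], [ "APAC" ] ],
  "metadata": [
    { "colIndex": 0, "colType": "String", "colName": "territory" }
  ]
}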

We can select the desired format by setting a query string parameter, as in the following link. The query string parameter name is kettleOutput:

http://<host>:<port>/pentaho/plugin/<cpkPluginId>/api/{kettleFileName}?kettleOutput=Json

If you want to avoid the use of the query string parameter, you can instead set the transformation/job parameter cpk.response.kettleOutput. When cpk.response.kettleOutput isn't set, CPK will try to infer the output format. Take a look at the following diagram:

Changing the output format type

The preceding diagram shows the decision logic used to determine the result returned from the endpoint.

When the ResultFiles option is used, CPK will compress all the files into a .zip file and return it. Keep in mind that if the result includes only one file, CPK will not zip it; it will return the file itself. If you're calling the endpoint from a browser, the browser will try to determine the mime type from the file extension; if the mime type is known, the browser will try to render the file instead of downloading it. You can force a single file to be downloaded by setting cpk.response.download to true.

If you also want to specify the mime type so that the browser can understand the content, you can do this by setting the parameter cpk.response.mimeType to the desired value (for example, application/xml).

Tip

Specify the filename for the downloaded file

To specify the filename, you can use the parameter cpk.response.attachmentName, or set the query string parameter attachmentName.
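
For example, the following call, with hypothetical plugin and endpoint names, would download the result as report.zip:

http://localhost:8080/pentaho/plugin/myPlugin/api/exportData?kettleOutput=ResultFiles&attachmentName=report.zip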

Returning a single cell

Let's suppose you want to return your own JSON or XML structure. How would you do so? To achieve this, you need to return a single cell. A single-cell result follows the same behavior as result files, as defined by the parameters cpk.response.download, cpk.response.mimeType, and cpk.response.attachmentName.

Other considerations

Each CPK plugin has its own cache to store the results obtained from the execution of its endpoints. By default, caching is disabled; to enable it, set the value of the transformation/job parameter cpk.cache.isEnabled to true.

The length of time that results remain cached can be set, in seconds, using the parameter cpk.cache.timeToLiveSeconds.
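
As an illustration, to cache the results of a transformation for ten minutes, you would enable both parameters (removing the #) in the transformation's parameters grid and set them as follows:

cpk.cache.isEnabled = true
cpk.cache.timeToLiveSeconds = 600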

Creating a dashboard

To create a new dashboard, just add the name of your dashboard and select the type Dashboard. You will also need to choose its style: a clean dashboard or a Pentaho App Builder template dashboard. I advise you to use the clean dashboard and apply your own styles, or create your own style (similar to what was explained in Chapter 7, Advanced Concepts using CDF and CDE):

Creating a dashboard

There is also a checkbox you can check if the dashboard should be accessible only to Pentaho administrators. When you create the new endpoint, the dashboard will become available.

After the endpoint has been created, you will see a row displaying the dashboard endpoint, as shown in the following image:

Creating a dashboard

There are three buttons you can use for each frontend endpoint (dashboard):

1. Open the dashboard: This will trigger the execution of the dashboard.

2. Edit the dashboard: This will open the dashboard in edit mode.

3. Remove / Delete the endpoint: This removes the endpoint, and will ask for confirmation.

A dashboard endpoint is used just like a Kettle endpoint. The URL to call is basically the same; the only change is the name of the endpoint, which should be the name of the dashboard.

You can invoke dashboards using the following calls:

· http://<host>:<port>/<webapp>/plugin/<cpkPluginId>/api/{dashboardFileName}

· http://<host>:<port>/<webapp>/plugin/<cpkPluginId>/api/{dashboardFileName}?mode=edit

In the second call, you will see ?mode=edit, meaning that we want to open the dashboard in edit mode, while the first one will open it in render mode.

There are some default endpoints already defined, so you should not give your dashboards or transformations/jobs the same names. The endpoints are as follows:

· status: Displays the status of the plugin and all its endpoints

· refresh or reload: Reloads all the configurations, endpoints, and dashboards, and also clears the endpoints cache

· version: Returns the plugin version (defined in the plugin's version.xml file or through the control panel)

· getSitemapJson: Returns a JSON object with the plugin's sitemap (for dashboards only!)

· getElementsList: Returns a JSON object with the whole list of elements present in the plugin (dashboards and Kettle endpoints)

Folder structure

The folder structure of the plugins is as follows:

dashboards
endpoints
static
resources
lib
plugin.xml

These folders and files are explained as follows (a sample layout is sketched after this list):

· dashboards: Inside this folder are all the dashboards that have been created and that should be accessible to all users. Any dashboard placed inside an admin subfolder will only be available to administrators.

· endpoints: All backend endpoints, that is, Kettle jobs and transformations, should be placed inside a kettle subfolder. When the name of a Kettle job or transformation is prefixed with _, it will not become available as an external endpoint. This makes it possible to have private endpoints that can only be used by another job and/or transformation.

· static: All CSS, JavaScript, and images can be placed here, and will be available to be used inside the dashboard.

· lib: This is where the Java libraries can be placed.

· plugin.xml: This is the file where all the configurations are set. Here you may uncomment the menu-item tag and make a menu item available to open the dashboard when the option is clicked:

<!-- Menu entry -->
<menu-items>
  <menu-item id="myPlugin_main" anchor="tools-submenu"
    label="MyPlugin" command="content/myPlugin/"
    type="MENU_ITEM" how="LAST_CHILD"/>
</menu-items>
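
As an illustration only, the layout of a plugin named myPlugin with one public Kettle endpoint, one private helper job, and a single dashboard might look like this (all file names are hypothetical):

myPlugin/
  dashboards/
    main.wcdf
  endpoints/
    kettle/
      getData.ktr
      _helper.kjb
  static/
    myPlugin.css
  lib/
  plugin.xml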

Making use of Kettle endpoints on a dashboard

You may use the Kettle endpoints you have created in each of the plugins available on the server. When editing a dashboard, if you go to the data sources perspective, you will find some new groups in the list of data source types. Inside each of those groups, you will find the list of endpoints created for that plugin, which can also be used in other plugins. Each plugin has its own group, and the data sources inside it are that plugin's endpoints, as shown here:

Making use of Kettle endpoints on a dashboard

The previous image is an example of the groups that become available, where you will also see the data sources you have created for your own plugin. In the following image, you can see two plugins, myPlugin and d3ComponentLibrary, and two endpoints that belong to myPlugin: getData and removeMe:

Making use of Kettle endpoints on a dashboard

You can use such an endpoint like any other data source, except that you will have fewer options. The preceding image shows that an endpoint provides three properties:

· Name: Used in the same way as with other data sources. This is the name of the data source that can be set on a component.

· Output step: This is where you can set the name of the step you want to extract data from.

· Output type: This sets the output type of the data source. The options available in the dropdown are as follows:

· Inferred: This lets CPK infer the result, as explained earlier.

· JSON: The result will be a Json object like any other CDA data source.

· ResultFiles: This will return a file, which may be a single file or a .zip file containing all the files set in the result. You will need to check the option to add the filenames to the result in the relevant steps. Doing so is not mandatory for all the steps, only for the ones the files will be exported from. The filenames are used to identify the files to export.

· ResultOnly: This is usually used when the endpoint is a job.

· SingleCell: This is used when you want to return a custom result. You set the output you want as a string in a single cell, and the content of that single cell will be returned.

But what about parameters? Can't we pass parameters to a transformation/job? Yes, you can, but you need to define them in the parameters of the component that is using the data source.

Tip

Using another plugin's endpoint

When using a transformation that belongs to another Pentaho App Builder plugin, you need to take care, because that endpoint belongs to another plugin, and that's a dependency. There is no problem doing so; you just need to make sure that you declare the other plugin as a dependency, so that users installing your plugin know it also needs to be installed.

When you use a data source (that is, a plugin endpoint) in a component, you may specify the parameters to pass to the transformation/job. Check the following image:

Making use of Kettle endpoints on a dashboard

Let's suppose you have a parameter in your transformation called myKettleParameter. You can send a value to the transformation by creating a mapping with a parameter of the dashboard. When using a table component, for instance, you may define the data source and the parameters to be sent to the data source.

The preceding image is an example of what you will see when setting the parameters. On the left, you specify the names of the parameters of the transformation, and on the right, the names of the parameters of the dashboard. This creates a mapping between the two, and that's how we can send values to Kettle parameters. You can add as many parameters as you need; just don't forget that if you are adding too many parameters, you may be doing something wrong.

When creating a plugin, you don't always want a job/transformation to be used just as a way to get data for the front end. You may want the dashboard of the plugin to perform some action, and for that purpose you can use a Kettle transformation or job. You can write your own code to trigger it, or you can use the button component.

The button component has a property called Action Datasource, where you can choose the data source to execute when the button is clicked. The Action Parameters property is where we can specify the parameter mapping between the Kettle job/transformation and the dashboard parameters. There are two callbacks you can define when setting Action Datasource:

· Success Callback: This will be executed when the job/transformation is executed with success.

· Failure Callback: This will be executed when the job/transformation fails to execute.

The functions that can be defined are as follows:

function(result) {

// Make use of the result returned to the dashboard.

}

The function receives one argument, which is the result returned to the dashboard. Inside the function, you will need to write the code to parse the information and display it, or just interpret it.
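
A minimal sketch of both callbacks follows; using JSON.stringify with Dashboards.log (the CDF logging helper) is just one way to inspect the result, and the messages are hypothetical:

function(result) {
    // Success: inspect the answer returned by the Kettle endpoint
    Dashboards.log("Endpoint finished: " + JSON.stringify(result));
}

function(result) {
    // Failure: let the user know the execution did not succeed
    Dashboards.log("Endpoint execution failed", "warn");
}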

How do I make the plugin available on the marketplace?

As soon as the plugin is developed and in a stable state, it is ready to be shared with the community. CPK is able to generate a .zip file with metadata information so that it can be published to the marketplace. Of course, if you are building the plugin for a customer, you don't want to make it public, so you don't need to go through these steps.

To submit the plugin, you need to follow the instructions provided at the following link: https://github.com/pentaho/marketplace-metadata.

The following instructions assume that you have the git command line installed and available.

As you can see, there are three main steps:

1. Clone the repository:

To do this, you first need a GitHub account, which can be created for free. Go to https://github.com/pentaho/marketplace-metadata and click on the Fork button, which will create a fork of the repository in your account. You will then be able to clone the fork from your account. There are many ways to clone the repository; one of them is to execute the following command:

git clone git@github.com:mfgaspar/marketplace-metadata.git

This will create a folder with all the files needed. The file you will need to change is marketplace.xml. These steps are required to make it possible to submit the pull request later.

2. Update the marketplace.xml file with your market-entry:

You should edit the file and add an entry as explained in the instructions provided at the Pentaho marketplace metadata link we just mentioned. After you've filled out all the necessary information, you can proceed with the pull request, but first you need to commit the changes to your repository. To do so, run the following in the command line:

git add marketplace.xml

git commit -m "Adding myPlugin to the marketplace"
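
Before opening the pull request, you also need to push the commit to your fork on GitHub; assuming the remote created by the clone is named origin, that would be:

git push origin master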

3. Submit a pull request to have your plugin reviewed for inclusion on both the marketplace plugins and the Pentaho Marketplace website.

To submit the pull request, you need to go back to your marketplace-metadata fork page, click on Pull Request, and include a message.

Pentaho will need to categorize and approve the plugin before it becomes available on the marketplace.

Summary

As you can see, it's pretty simple to create a new plugin for Pentaho. I really hope you have a brilliant idea that becomes available on the marketplace for all of us to use.

In this chapter, you learned that you can build a plugin just by creating endpoints that are accessible from the browser, and this can be very useful when integrating Pentaho with third-party applications.

In the next chapter, we will cover what we have missed until now; for example, how to embed a CDF/CDE dashboard in a third-party application and how to perform debugging.