RapidMiner Advanced Reporting Extension Released!


The idea is to use the capabilities of RapidMiner to automate any regular reporting task that results in an Excel sheet. Now you can simply start at the beginning to create a nearly zero overhead reporting, even if you don’t have or can’t use real business intelligence tools like tableau or qlik.

Download :
  RapidMiner Advanced Reporting Extension 

Let's start using it -

Create a Template in Excel


First we create a dummy sheet and add all of the desired layout components, diagrams, texts and of course areas for data. We can use any formatting, chart type or conditional coloring. Just one thing is important: We need to reserve space for inserting the data.

What will happen later is, that we overwrite parts of the content of the table with data from RapidMiner.
Insert some dummy values so that you can see the charts in action.


Create a process in RapidMiner to load the data

RapidMiner is very versatile to get the data into the shape you want. It can read and combine many different formats and sources and then aggregate, join, pivot and process the data into the shape that you need it. On the right you see a process combining data from four different sources with multiple joins and preprocessing steps to match the data. Such a process could just deliver us the data we want to put into our nice Worktime sheet. The trick is that we can leverage the entire flexibility of RapidMiner to get the data we want to put into an Excel sheet.

Open Report


Once we have the data in the desired format, we add an Open Report (Excel) operator from our extension. You see it on the right hand side in the operator tree. We need to point the operator on two files: The template file we created and saved in Step 1. You can either use the parameter form template file or the tem input port. The second file can be specified as target file parameter or by using the tar output port.

The ports allows you to handle the files conveniently in scenarios where you want to do stuff with them in the process later. You could even create a template file in a RapidMiner process, or less fancy and more realistic: Store the file in the repository of a RapidMiner Server to share among many users. 

Any data we want to insert into the Excel file, we need to forward to the input ports of the Open Report (Excel) operator. We will use the data delivered to these ports in the inner subprocess to do the actual insertion.

Insert Tabluar Data

If we entered the inner process of the Open Report (Excel), we can add the Write Data Entry (Excel) operator to insert an ExampleSet into the excel. The operator allows to select which attributes to use and where to place it. Therefore you specify the sheet where it will be insert by it’s index. Then you point it to a fill range. For our little employee table from Step 1, we set it to B11:C13. Unless we select fit to range, the process will now fail if our data does not fit into this range. We will add another operator of this type to output the second table.

Insert Data


Therefore we first use a Generate Macro operator from RapidMiner’s core functionality to create a process variable containing the current date and time. We then add a Write Cell (Excel) operator from the Advanced Reporting Extension and connect the ports. Although there will be no data flowing from the Generate Macro operator to the Write Cell (Excel) operator, the connection makes sure that the Generate Macro will be executed first and set the process variable before it is read.

Then we just need to point the Write Cell (Excel) operator to the right fill position, which is F5 in our case. Setting the value and type correctly and we are good to go.

Short notice on dates: There is an unlimited number of different date formats out there. If you want to write a date to excel, you first need to parse the date format that the value has in RapidMiner. So if you enter something like 2017-03-29 23:59:59 as value, you should enter “yyyy-MM-dd HH:mm:ss” in the date format parameter of the Write Cell (Excel) operator. Once it knows the date, it will automatically transform it in the correct format of the Excel Template Sheet, where you set it with the Cell Format.