
How to split a high volume table into multiple files

By Imran, Mohammad January 24, 2022 under BODS


You might have a requirement to split a high-volume table into multiple files; it could be any number of files. This is typically needed when the data is too voluminous to open or work with in a single file. For example, suppose you have one million records in a table and you want them in a file, but you cannot open a file with one million records. Instead, you can divide it into 10 files of 100,000 records each, or more, depending on your requirement.

The same approach can be used to tackle dump issues when loading high-volume data. Just change the target from a file to a permanent table and make sure the table's Delete data from table before loading option is unchecked. The while loop will keep executing a certain number of times (defined by you) and insert the records batch by batch.


Source Data

In the example below, the table is too large to extract into a single file. Its count is about 4.9 million records, as shown below. With 900,000 records per file, the formula gives trunc(4.9M / 900,000) = 5 full files, plus one more for the remaining records, so 6 files in total.

Source Count
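
To spell out the arithmetic (the exact count will vary, so treat the numbers as an illustration):

number of files = trunc(total count / records per file) + 1
                = trunc(4,900,000 / 900,000) + 1
                = 5 + 1
                = 6

Five files hold 900,000 records each and the sixth holds the remaining records.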

Query Design

Let's build the job in BODS. In the design below, notice that a script and a while loop have been used. The script decides how many records go into each file, and the while loop keeps executing until the final file has been generated. A dataflow at the beginning generates a row number for every record, in ascending order, so that we can split the files based on row numbers.

Query Design

Dataflow Design

In the example below, the dataflow at the beginning is created to sort the records based on key fields (this is up to you; it only matters if you want the files created in ascending order) and to generate row numbers.

DF Design

Dataflow to sort the records

The first query (SORT) sorts the records in ascending order.

Sorting

Dataflow to generate row number

The second query (ROW_NUM) generates a row number for every record.

Gen Row Num
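
In case the mapping is not obvious from the screenshot: a minimal sketch, assuming the output column is named ROW_NUM, is to set its mapping expression to the built-in gen_row_num() function, which returns an incrementing integer for each row that passes through the query:

ROW_NUM:  gen_row_num()    # returns 1, 2, 3, ... for every record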

Script initialization

The script will first retrieve the count of the source table and then apply the formula we discussed earlier.

$G_LOOP_COUNT = $G_COUNT / 900000; gives us 5.54, which is treated as 5 because the variable holds an integer. We need to make it 6 so that, after the first 5 files are generated, the remaining records are written to a 6th file; $G_LOOP_COUNT = $G_LOOP_COUNT + 1; takes care of that. $G_START and $G_END define the row-number window for each file: after a loop writes its 900,000 records, they are shifted so the next loop picks up where the previous one ended.
Script
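
For reference, a minimal sketch of what such an initialization script could look like. The datastore DS_SOURCE and table SRC_TABLE are placeholder names, and $G_LOOP_COUNT is assumed to be declared as an int variable so that the division truncates the decimal part:

# Placeholder datastore/table names - replace with your own objects
$G_COUNT = sql('DS_SOURCE', 'SELECT COUNT(*) FROM SRC_TABLE');
$G_LOOP_COUNT = $G_COUNT / 900000;   # int variable, so 5.54 is treated as 5
$G_LOOP_COUNT = $G_LOOP_COUNT + 1;   # one extra file for the remaining records
$G_LOOP  = 1;                        # current iteration of the while loop
$G_START = 0;                        # row-number window start for the first file
$G_END   = 900000;                   # row-number window end for the first file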

While loop to generate files

The while loop contains a dataflow to generate the files and a script to increment the loop variables.

Script
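
The loop condition itself is just a comparison of the iteration counter against the number of files to generate, along these lines:

$G_LOOP <= $G_LOOP_COUNT

One practical note: so that each iteration does not overwrite the previous output, the target file name should change per loop. One way (the naming here is my assumption) is to reference a global variable in the file format's file name and rebuild it each iteration, e.g. $G_FILENAME = 'SPLIT_FILE_' || $G_LOOP || '.txt';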

Dataflow inside while loop

The dataflow inside the while loop generates the files. Each time the loop executes, the dataflow runs with a where clause that determines which records are processed during that iteration.

Loop

Where clause in dataflow

The where clause ensures that only the records between START and END are written. In the first iteration START holds 0 and END holds 900,000, so the records whose row numbers fall between them are written to the first file. Once those 900,000 records are processed, the next iteration starts. The script after the dataflow takes care of incrementing the START and END variables.

DF Loop
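
Assuming the row number column from the ROW_NUM query is named ROW_NUM, the where clause could look like this:

ROW_NUM > $G_START AND ROW_NUM <= $G_END

In the first iteration this selects rows 1 through 900,000; after the variables are incremented, the second iteration selects rows 900,001 through 1,800,000, and so on.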

Script in while loop

$G_LOOP = $G_LOOP + 1 increments the counter so the loop moves on to the next iteration. $G_START = $G_START + 900000 and $G_END = $G_END + 900000 shift the window: after the first iteration $G_START becomes 900,000 (it was 0) and $G_END becomes 1,800,000 (it was 900,000). When the second iteration runs, the records from 900,001 to 1,800,000 are written, and so on. Because $G_LOOP_COUNT holds 6, the while loop condition fails once the counter reaches 7, so a total of 6 files are generated.

Script
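
A minimal sketch of this increment script, using the same variable names as above:

$G_LOOP  = $G_LOOP + 1;          # move to the next iteration
$G_START = $G_START + 900000;    # shift the window start
$G_END   = $G_END + 900000;      # shift the window end

If the target file name is driven by a variable as suggested earlier, it would be refreshed here as well.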

Conclusion

I have tried to list down all the steps required to generate multiple files from a huge table. This is very useful when you have a requirement to load data using LSMW or any other program that accepts only text files as input: such programs cannot process one very large file, so you need to divide it into chunks. If you need more help on this or any other topic related to data migration, do connect with me.