Interview Questions on SAP HANA Architecture: Part-2

Qs. In HANA, which type of table should be preferred: row-based or column-based?

In row-based tables, SQL queries involving aggregation functions take a long time on large amounts of data, because every single row has to be touched to collect the data for the query response.
In columnar tables, the values of each column are stored physically next to each other, which significantly increases the speed of queries that read only some columns. The data is also compressed, enabling shorter loading times.
Conclusion:

To enable fast on-the-fly aggregations and ad-hoc reporting, and to benefit from compression mechanisms, it is recommended that transaction data be stored in column-based tables.
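
To illustrate why a columnar layout suits on-the-fly aggregation, here is a minimal Python sketch. It is a conceptual model, not HANA code, and the column names and values are made up for the example. The table is held as one list per column, so a SUM ... GROUP BY reads only the two columns it needs; the `product` column is never touched when grouping by `region`.

```python
from collections import defaultdict

# Hypothetical sales table held column-wise: one list per column.
# Position i in every list belongs to the same logical row.
region = ["EMEA", "APJ", "EMEA", "AMER", "APJ"]
product = ["P1", "P2", "P1", "P3", "P1"]
amount = [100, 250, 75, 300, 50]


def sum_by(group_col, value_col):
    """SUM(value_col) GROUP BY group_col, reading only the two columns involved."""
    totals = defaultdict(int)
    for key, value in zip(group_col, value_col):
        totals[key] += value
    return dict(totals)


print(sum_by(region, amount))  # {'EMEA': 175, 'APJ': 300, 'AMER': 300}
```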

The SAP HANA database allows joining row-based tables with column-based tables. However, it is more efficient to join tables that are located in the same store (row or column). For example, master data that is frequently joined with transaction data should also be stored in column-based tables.

A few more important points about column tables:

1. HANA modeling views are only possible for column tables. Row-based tables cannot be used in modeling views.
2. For that reason, Replication Server creates SAP HANA tables in the column store by default.
3. Data Services also creates target tables in the column store by default when the target is an SAP HANA database.
4. The SQL command to create a column table is “CREATE COLUMN TABLE Table_Name..”.
5. The storage type of a table can be changed from row to column storage with the SQL command “ALTER TABLE Table_Name COLUMN”.
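
Conceptually, changing a table from row to column storage regroups the same values: each record becomes one position in every per-column list. The Python sketch below shows only that transposition and its inverse on a hypothetical three-row table; it is an illustration under that simplifying assumption and is not how SAP HANA implements the ALTER TABLE conversion internally.

```python
# Hypothetical table, first in row layout: one tuple per record.
rows = [
    (1, "Alice", "Berlin"),
    (2, "Bob", "Paris"),
    (3, "Carol", "Berlin"),
]
column_names = ("id", "name", "city")


def rows_to_columns(rows, names):
    """Transpose a list of row tuples into a dict of per-column lists."""
    return {name: list(values) for name, values in zip(names, zip(*rows))}


def columns_to_rows(columns, names):
    """Inverse transformation: rebuild the row tuples from the column lists."""
    return list(zip(*(columns[name] for name in names)))


columns = rows_to_columns(rows, column_names)
print(columns["city"])  # ['Berlin', 'Paris', 'Berlin']
```

The data itself is unchanged; only its physical grouping differs, which is why the conversion can be done with a single DDL statement rather than by reloading the table.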

Qs. Why are materialized aggregates not required in HANA?

Because the SAP HANA database resides entirely in memory, complex calculations, functions and data-intensive operations can be executed directly on the data in the database. Aggregates can therefore be computed on the fly at query time instead of being precomputed and stored, so materialized aggregates are not required.

Avoiding materialized aggregates also provides benefits such as:

    • Simplified data model
    • Simplified application logic
    • Higher level of concurrency
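
The difference can be sketched in a few lines of Python. This is a conceptual model with hypothetical class names, not HANA code: one store keeps a precomputed (materialized) total that every write path must maintain, while the other keeps only the base data and aggregates when queried.

```python
class WithMaterializedTotal:
    """Base data plus a stored (materialized) total that every write must maintain."""

    def __init__(self):
        self.amounts = []
        self.total = 0  # the materialized aggregate

    def insert(self, amount):
        self.amounts.append(amount)
        self.total += amount  # extra bookkeeping on every insert

    def delete_at(self, index):
        self.total -= self.amounts.pop(index)  # ...and on every delete


class OnTheFly:
    """Only the base data is stored; the total is computed when it is queried."""

    def __init__(self):
        self.amounts = []

    def insert(self, amount):
        self.amounts.append(amount)

    def delete_at(self, index):
        self.amounts.pop(index)

    @property
    def total(self):
        return sum(self.amounts)  # always consistent with the base data


materialized, on_the_fly = WithMaterializedTotal(), OnTheFly()
for store in (materialized, on_the_fly):
    for amount in (10, 20, 30):
        store.insert(amount)
    store.delete_at(0)

print(materialized.total, on_the_fly.total)  # 50 50
```

The second class has no extra field to model and no bookkeeping in its write paths, which mirrors the "simplified data model" and "simplified application logic" points. A plausible reading of the concurrency point is similar: a single stored total is a value every concurrent writer must update, whereas computing it at query time lets writers touch only their own records.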

Qs. How does SAP HANA support Massively Parallel Processing?

With the availability of multi-core CPUs, higher execution speeds can be achieved by distributing work across cores.
Furthermore, HANA's column-based storage makes it easy to execute operations in parallel using multiple processor cores.
In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
In addition, operations on one column can be parallelized by partitioning the column into multiple sections that can be processed by different processor cores. With the SAP HANA database, queries can therefore be executed rapidly and in parallel.
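
The two kinds of parallelism described above can be sketched with Python's `concurrent.futures`. The table, its columns and the `partitions` helper are hypothetical. Note that in CPython, threads running pure-Python code do not give real CPU parallelism because of the global interpreter lock; the sketch shows how the work is decomposed into independent tasks, and `ProcessPoolExecutor` would be the drop-in choice for actual multi-core execution.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table held column-wise (one list per column).
columns = {
    "quantity": list(range(1, 1001)),
    "price": [2] * 1000,
    "discount": [0, 1] * 500,
}


def column_sum(values):
    """Aggregate one column (or one section of a column)."""
    return sum(values)


def partitions(values, parts):
    """Split one column into at most `parts` contiguous sections of similar size."""
    size = max(1, -(-len(values) // parts))  # ceiling division, at least 1
    return [values[i:i + size] for i in range(0, len(values), size)]


with ThreadPoolExecutor(max_workers=4) as pool:
    # 1) Inter-column parallelism: one task per column.
    per_column = dict(zip(columns, pool.map(column_sum, columns.values())))
    # 2) Intra-column parallelism: one task per section of a single column,
    #    then a cheap merge of the partial results.
    partial_sums = list(pool.map(column_sum, partitions(columns["quantity"], 4)))

print(per_column)         # {'quantity': 500500, 'price': 2000, 'discount': 500}
print(sum(partial_sums))  # 500500
```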

Qs. What are the advantages and disadvantages of row-based tables?

Row-based tables have advantages in the following circumstances:
    • The application needs to only process a single record at one time (many selects and/or updates of single records).
    • The application typically needs to access a complete record (or row).
    • Neither aggregations nor fast searching are required.
    • The table has a small number of rows (e.g., configuration tables, system tables).
Row-based tables have disadvantages in the following circumstances:
    • Analytic applications that use aggregations and require fast search and processing. In row-based tables, all data in a row has to be read even if only a few columns are needed.
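
Both sides of this trade-off can be seen in a small Python model of the same hypothetical table stored in both layouts. Counting "values read" is a deliberate simplification (real engines read memory pages and cache lines, not individual values), but it shows the pattern: aggregating one column reads every field in the row store, while fetching one complete record is a single lookup in the row store and one lookup per column in the column store.

```python
# Hypothetical 4-column table with 1,000 rows, stored in both layouts.
NUM_ROWS = 1000
COLS = ("id", "customer", "region", "amount")
AMOUNT_POS = COLS.index("amount")

row_store = [(i, f"C{i}", "EMEA" if i % 2 else "APJ", i * 10) for i in range(NUM_ROWS)]
column_store = {name: [row[pos] for row in row_store] for pos, name in enumerate(COLS)}


def row_store_sum_amount(rows):
    """SUM(amount) on a row store: each record is read in full."""
    values_read, total = 0, 0
    for row in rows:
        values_read += len(row)  # the whole record is touched
        total += row[AMOUNT_POS]
    return total, values_read


def column_store_sum_amount(cols):
    """SUM(amount) on a column store: only the amount column is read."""
    values = cols["amount"]
    return sum(values), len(values)


def row_store_get_record(rows, i):
    """Fetch one complete record: a single tuple in the row store."""
    return rows[i]


def column_store_get_record(cols, i):
    """Fetch one complete record from a column store: one lookup per column."""
    return tuple(cols[name][i] for name in COLS)


print(row_store_sum_amount(row_store))        # (4995000, 4000)
print(column_store_sum_amount(column_store))  # (4995000, 1000)
```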

Qs. What are the advantages of column-based tables?

Advantages: 
    • Faster Data Access:
Only the columns referenced by a query have to be read during its selection process. Any of the columns can serve as an index.
    • Better Compression:
Columnar data storage allows highly efficient compression because most columns contain only a few distinct values compared to the number of rows.
    • Better Parallel Processing:
In a column store, data is already vertically partitioned. This means that operations on different columns can easily be processed in parallel. If multiple columns need to be searched or aggregated, each of these operations can be assigned to a different processor core.
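
Dictionary encoding is one common way a column store exploits the small number of distinct values, and SAP HANA's column store is known to use it. The Python sketch below is a simplified model of the idea, not HANA's implementation, and all function names are hypothetical: the column is replaced by a sorted dictionary of distinct values plus a vector of small integer IDs, and an equality search can then compare integers instead of strings.

```python
from bisect import bisect_left


def dictionary_encode(column):
    """Return a sorted list of distinct values and, per row, the value's position in it."""
    dictionary = sorted(set(column))
    value_id = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [value_id[v] for v in column]


def dictionary_decode(dictionary, ids):
    """Rebuild the original column from the dictionary and the ID vector."""
    return [dictionary[i] for i in ids]


def bits_per_id(dictionary):
    """Bits needed to store one value ID if the IDs are bit-packed."""
    return max(1, (len(dictionary) - 1).bit_length())


def find_rows(dictionary, ids, value):
    """Equality search: look the value up once, then scan the integer IDs."""
    i = bisect_left(dictionary, value)
    if i == len(dictionary) or dictionary[i] != value:
        return []  # value does not occur in this column
    return [row for row, vid in enumerate(ids) if vid == i]


# Hypothetical "country" column: 10,000 rows but only 3 distinct values.
country = ["DE", "FR", "US", "DE"] * 2500
dictionary, ids = dictionary_encode(country)
print(dictionary)               # ['DE', 'FR', 'US']
print(ids[:4])                  # [0, 1, 2, 0]
print(bits_per_id(dictionary))  # 2
```

With bit-packing, the 10,000 IDs need about 20,000 bits (2,500 bytes) plus a three-entry dictionary, compared with 20,000 bytes for the same two-character codes stored as plain ASCII. The fewer distinct values a column has relative to its row count, the larger this gap becomes.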
