Tag Archives: DBA

Oracle eBS 11i Infrastructure

Oracle eBS 11i Infrastructure

 

In this article we will describe the infrastructure of Oracle eBusiness Suite (eBS). In its simplest form, eBS is a 3-tier application with a client tier, Application-tier and DB-tier.

Database-Tier

Let’s start with the DB-Tier. Surprisingly, the database tier has only very little eBS specific features.

Of course we need a database (instance) and therefore an ORACLE_HOME. But the database can either be a single-instance or a RAC-installation and all Oracle RDBMS features are transparently available for eBS.

The management of the RDBMS Installation is also independent of eBS.

 

DB-Tier filesystem

Let’s start with the filesystem on the DB-Tier. Of course there is an Oracle_home installation needed, for the RDBMS-Instance. This will be installed during installation of eBS. But also a fresh installed ORACLE_HOME can be used, with an eBS database.

In the ORACLE_HOME, an extra directory is added. The Appsutil directory. This directory contains the software and data needed for running Autoconfig and Rapidclone.

All other directories are at the discretion of the eBS DBA.

Oracle Instance

When we look at the instance to run eBS, we find a number of mandatory parameters for eBS. These are found in Metalink notes 216205.1 and 396009.1 (At the time of writing. Please verify these notes for yourself).

These parameters are recommended or mandatory based on testing by Oracle Corp. They will automatically be set by the eBS installer. But you should take note of them when you use a fresh installed ORACLE_HOME.

Then we finally come to the contents of the database.

The eBS Database

Let’s start with the schemas in the database. Oracle eBS creates a separate schema for every module. The schema is named as the short_name of the module, for example AP (Oracle Payables / Accounts Payable), AR (Oracle Receivables / Oracle Receivables).

There is a separate schema for the Application owner APPS.

The Application schemas contain the tables, indexes and sequences for the different applications. All objects in these schemas (except indexes, of course) have a synonym in the APPS Schema. In the APPS Schema we also find all PL/SQL objects, views and Materialized Views.

A major part of eBS is written in PL/SQL. All PL/SQL objects are also installed in the APPS Schema.

User sessions within eBS will usually run in the APPS Schema as well.

That brings us to an extra schema in the database: APPLSYSPUB. This schema has access to some of the eBS tables and packages, that allow it to validate eBS logins and start an APPS-session based on that login information. We will see the details of this later on.

Before release 11.5.2 every schema had its own tablespace. However, the number of modules for eBS (and with that the number of schemas) is ever increasing. So managing the database became more and more complex. In 11.5.2 Oracle introduced the Oracle Applications Tablespace Model (OATM). Within this model, the tablespaces in eBS are based on functionality, rather than schemas.

In this model, we see the following tablespaces:

APPS_TS_TX_DATA – Containing all transaction tables, Materialized Views and IOT’s

APPS_TS_TX_IDX – Containing all indexes for transaction tables

APPS_TS_SEED – Containing the tables and indexes with seeded data (as opposed to transaction data).

APPS_TS_INTERFACE – For Open Interface tables

APPS_TS_SUMMARY – Contains summary tables for several modules (AR, PA, BIM, etc)

APPS_TS_NOLOGGING – For tables and objects that are created with the NOLOGGING option

APPS_TS_ARCHIVE – Containing archive and history tables (and indexes)

APPS_TS_QUEUES – Containing the AQ (Advanced Queuing) objects

APPS_TS_MEDIA – Containing tables with LOB’s. For media objects or documents.

The Undo and Temp tablespaces are not part of the tablespace model.

 

Application Tier

Now it’s time to look at the Application Tier. In fact the Application Tier consists of 3 different services: Web service, Forms service and Concurrent Processing. In 11i installations, there is also an Administration service.

The Application Tier significantly changed from R11i to R12. We’ll discuss the 11i Apps Tier shortly, and then discuss the R12 tier in more detail.

The 11i infrastructure

Both the R11 and R12 infrastructure consist of a Web-service, a Forms Service and a Concurrent Processing part.

We will be discussing the different services here. The following picture shows all components and their communications. You might want to keep it for reference during this article.

 

 

R11i Web Service

For 11i, the web tier is built on Oracle iAS 9i. The iAS installation provides webservices (Apache HTTP/HTTPS), a Java Runtime Engine (JSERV) and a PL/SQL engine (modplsql).

The web service also acts a gateway for the Concurrent Request log and output files. And it is the first point of access when starting a forms session. (When using a socket forms connection, when using a forms servlet the web tier will host the forms process).

A detail from the picture above shows the iAS structure.

 

The core of iAS is the web server. This is the front-end for the client. Requests can also be forwarded to and from the forms server and the concurrent processing. We’ll see that in the next paragraphs.

Within the iAS Jserv and modplsql are plugins. They are the only components that communicate with the database. When they are called, they execute java (Jserv) or PL/SQL (modplsql) and return an html page. This page is then sent to the client through the HTTP service.

The Jserv delivers a Java Runtime Environment. In the Jserv, java servlets can be run. Also the JSP-files are executed in the Jserv. A JSP-file (Java Server Page) is a page with java code that returns an html-page (similar to the way scripting languages like PhP work). The java part is executed in Jserv, which returns the html to the webserver. The webserver redirects the html to the client.

 

Let’s take a closer look at the components and their executions:

 

Webserver

The webserver is based on the regular Apache 2 webserver. The configuration file is also equal to the Apache config file. The configuration is set in httpd.conf (or httpds.conf for SSL).

Instead of starting the webserver through $APACHE_HOME/bin/httpdctl, we start through $APACHE_HOME/apachectl. The default port number used for eBS 11i is 8000. This is part of port-pool 0. For different port-pools, the port number is increased. So for port-pool 1 the webserver runs on port 8001.

The root directory for the webserver is set to $OA_HTML, which is by default $COMMON_TOP/html. This directory contains all *.html files for eBS.

A number of virtual directories are set up within eBS.

 

JServ

As mentioned before, java code is executed by Jserv. Jserv is a java servlet engine. That means that it can run both servlets and jsp-files.

These servlets are mostly located in the $JAVA_TOP. The *.jsp files are located in the $OA_HTML directory.

One of the options of Jserv is to create a database connection to the rdbms database. This is done by a JDBC Thin Client connection.

Before we look at the configuration for Jserv, examine the following picture.

Within Jserv, we can define different java environments, called zones. These zones are configured with different servlets or java archives (jar-files). Each zone is configured with its own configuration file. Within the zone the startup parameters (initargs) for the servlet are defined.

On the other side of the picture, you see a group. All java processes within Jserv are grouped together. You must define at least one group. The default group is ‘OACoreGroup’. Within each group, we create one or more processes that will be mapped to our zones.

This mapping is done by mounting the zones and the groups to different logical directories. In the picture, a mountpoint is created: /oa_servlets/. It refers to the group ‘OACoreGroup’, which holds 3 java processes. And it is mapped to zone ‘root’, which includes the servlet ‘dummy’ with a set of startup parameters.

When iAS receives a call to the virtual directory /oa_servlets/ it will be recognized as a Jserv mount point and the request will be forwarded to Jserv. In this example Jserv has 3 java processes in the group for this mount point. And they will be able to run all the servlets in the zone.

Sounds complicated? Take a look at the following configurations:

Jserv.conf:

ApJServGroup OACoreGroup 3 1 /etc/oracle/iAS/Jserv/etc/jserv.properties

ApJServGroupMount /oa_servlets balance://OACoreGroup/root

 

Jserv.properties:

 

wrapper.bin=/opt/oracle/iAS/Apache/Apache/bin/java.sh

zones=root<host>

root<host>.properties = /etc/oracle/iAS/Jserv/etc/zone.properties

 

Zone.properties

servlet.Dummy.initArgs=message=I'm a dummy servlet

 

Within jserv.conf we define a group called OACoreGroup. This group is running 3 Java processes. And the definition of the group is in the jserv.properties file. The 1 indicates the weight for load-balancing with multiple groups.

Then we mount the zone ‘root’ to the group ‘OACoreGroup’. This mount point is linked to the virtual directory /oa_servlets/.

The virtual directory is used for redirection to the JServ. When a request is made to the virtual directory Jserv will be called. The part of the URL after the virtual directory is the path to the servlet. This path will be searched for in the $CLASSPATH.

When the java servlets need to connect to the database, they can build a connection using JDBC. The access information is stored in a *.dbc file in $FND_TOP/secure. The dbc-file is referred to in the parameters for the zone.

 

Modplsql

Let’s take a look at the modplsql module. This module is designed to run pl/sql procedures within the database. The connection is based on the wdbsvr.app file. This file contains the DAD (Database Access Descriptor), including the access data to the eBS database.

The module is also called through a virtual directory. For example http://<host>:<port>/pls/TESTDB/ dummy. /pls/ is the virtual directory that refers to modplsql. TESTDB is the name of the DAD and dummy is the name of a pl/sql procedure accessible for the db-user from the DAD.

 

That concludes the 9i iAS module for now.

 

Formsserver

Let’s take a look at the forms server.

Oracle forms can be set up in two ways, socket connection and servlet. The default is socket connection. With a socket connection, a separate forms server and dedicated forms processes are used. For the servlet connection, a java servlet is called within the iAS.

The formsserver itself is installed in the 8.0.9 ORACLE_HOME. Forms has a forms server, and one or more client processes. The forms server is started with f60svr. It will spawn a f60webmx process for every client session connecting.

On the server side, forms are run in the forms client processes. On the client side, they are run in a java applet. When the client clicks a forms based function in the Personal Home Page, it calls an URL that refers to the forms client executable in the 8.0.6 ORACLE_HOME.

This URL is taken from the profile option ‘ICX: Forms Launcher’, and the default value is like ‘http://<server>:<port>/dev60cgi/f60cgi’. The parameters referring to the function being clicked are added to this URL as parameters. (i.e. the name of the form to be started)

When this URL is called, the webserver will execute the executable f60cgi. This executable returns a HTML page to the client. This page is called the ‘Base HTML’ for this forms server. (by default this is $OA_HTML/US/appsbase.htm)

This HTML page calls the J-initiator plugin (or the native JVM when configured). It also includes the parameters to connect to the forms server and the name of the form to start.

The J-initiator will start an applet on the client, which connects to the forms listener process. The forms listener process then assigns a dedicated forms client process.

At this point the whole chain looks like this:

 

The configuration for the forms server is in the appsweb.cfg file in ($OA_HTML/bin). This file contains the basic coloring scheme for the forms server, the forms settings and the referral information to the J-Initiator plugin. The plugin on the client side is called through its class-id, which is also set in the appsweb.cfg.

 

Concurrent Managers

The last part of the application tier is the concurrent processing part. The concurrent managers are used to execute background and batch processes.

Different executables including host-command files, pl/sql procedures, Oracle Reports, SQL*Loader control files and Binary executables can be defined to be run as concurrent programs. Parameters are also optionally defined with the concurrent programs.

The executable files are defined separate from the concurrent programs. So an executable can be run as different programs with different parameters.

The programs are executed through ‘Requests’. A request is started as a concurrent program and the values of its parameters. It can be scheduled to start at a specific time, or in a specific schedule. The output of the program can be sent to a printer. It is also available through the application.

This picture shows the relation between programs, requests and managers.

 

The managers

We’ll take a closer look at the concurrent programs later. Let’s first look at the concurrent managers. There are concurrent managers and transaction managers. Also a number of control managers are defined.

We’ll start with the ‘Internal Manager’. This is the first manager to be started. Its purpose is to control the stopping and (re-)starting of the other managers. When Generic Service Management is enabled (default as of 11.5.7), it delegates to the ‘Service Managers’. On every node where concurrent processing is enabled, a ‘Service Manager’ is started. However, only one Internal Manager is running at any time.

The other concurrent managers are defined with a work shift that controls how many processes a concurrent manager should have at certain times. The work shifts consist of a time-range and a number of processes. The Service Managers (or Internal Manager) will start and stop processes according to these work shifts.

Another part of the setup of concurrent managers is their specialization rules. The specialization rules indicate which programs are valid for a concurrent manager, or are excluded for that manager. They work on an include/exclude principle. When programs are included for that manager, the manager can only run those programs. When programs are excluded, the manager can run any program except the excluded ones.

When a request (to run a program) is submitted from eBS, it will be placed in FND_CONCURRENT_REQUESTS with a status_code ‘I’ (The eBS forms have fewer statuses than the codes in the table). The manager processes will query this table for requests that they are eligible to run.

Once a manager process finds a request with status_code ‘I’, which it is eligible to run then it will put the request on its own queue. It will then run the executable with the defined parameters. The logfile and outputfile are written to the filesystem in $APPLCSF/$APPLLOG resp. $APPLCSF/$APPLOUT.

There are some special cases that need to be discussed. The first is the incompatibility. Concurrent Programs can be made incompatible with each other. That means that they cannot run at the same time. Once a program is started that is defined as incompatible with another, it will be automatically put on the queue for the ‘Conflict Resolution Manager’. This special manager will check if any incompatible program is running or ‘Pending’ with code ‘I’. If so, it will hold the request on its own queue. If no incompatible program is running or ‘Pending’, then it will set the status of the request to ‘Pending’.

Another special case are the ‘Transaction Managers’. They are started and stopped the same way as the other concurrent managers. But they do not use the request queue. Transaction managers are called online from the eBS forms. And they execute a limited number of programs. These programs defined within their executables. They are called through the ‘FND_TRANSACTION.SYNCHRONOUS’ procedure, which uses the ‘DBMS_PIPE’ package.

 

The programs

It’s time to look at the concurrent programs. As mentioned before, a concurrent program is an instantiation of an executable. The executable is defined with a short name, an application (module), a filename and a method.

The short name will uniquely identify the executable. The other data is needed to determine what should be run for this executable. If the executable is an OS-based program, the application will be used to derive the directory on the file system where the executable is found.

When the executable is defined as PL/SQL, the filename will contain the procedure that needs to be run.

The concurrent program is defined as the executable with an (optional) set of parameters. It also has some properties for the printing of the output (print-style, pre-defined printer, size of the output).

Depending on the type of executable that needs to be run, the parameters will be sent to the executable ordered or named. For PL/SQL and host files, the parameters are ordered. And the order in which they are defined in the form defines how they will be sent to the executable. For reports the parameters are named, which means they are sent as <parameter>=<value>, ….

After the request has finished, the request table ‘FND_CONCURRENT_REQUESTS’ will be updated with the status information and a reference to the log- and output file. During the execution of the request, a status_code and phase_code are updated. The exact values of these fields are described in one of the next articles. That will go deeper into concurrent processing.

The output

The log- and output file from the requests are written to the directories $APPLCSF/$APPLLOG and $APPLCSF/$APPLOUT. But they of course also need to be made available to the user. This is done through the ‘Applications Report Review Agent’.

There is quite a lot of setup that can be done for the whole process. But within the scope of this article, we’ll only look at the basic architecture.

From the eBS form the log and output file are available from 2 buttons. These buttons call the web server for ‘FNDWRR.exe’ (.exe on both Windows and Unix).

FNDWRR.exe is a cgi-executable that will call the ‘FNDFS listener’.

This is an 8.0.6 TNS-listener in the eBS ORACLE_HOME on the eBS application tier. One of the less known features of the TNS-listeners is that they can do more than create database connections.

In the listener.ora, you can define a program to be called when a connection is made on a certain tns-entry. That feature is used for the FNDFS listener. When it is called, it will redirect traffic to the ‘FNDFS’ (FND File System) executable. This executable will read the requested file from the file-system and send it to FNDWRR.exe.

Again, we have a schema to show the whole flow:

 

This complex retrieval is of course needed because the concurrent processing tier can be separate from the forms and web-tiers.

 

Writing efficient SQL

The other day I gave a presentation on ‘Efficient SQL’. It was the first of a number of presentations, so I started with explaining some basic concepts.

Maybe it will be interesting for other people too. So here is the summary of it.

 

MAIN RULES OF EFFICIENT SQL

 

The main rules of efficient SQL are:

·         Less is better

·         More is better

That seems easy, since everything is better. But let me explain.

Less is better: The less I/O generated the better your statement will perform. Even though somebody might be able to think of some exceptions. It is safe to keep this as a rule of thumb.

Avoid unnecessary work for your query. You can do that by selecting only the rows that you need. (That sounds obvious, but I’ll give an example soon). In complex queries, make sure that you select the smallest possible set in every part of your query. Rather than collecting a huge amount of data and then selecting what you need, select the smallest set possible before you join.

Also select only the columns that you need. One argument is that it might give Oracle a chance to skip the table access, and use an index-only access.

Think about the use of ‘select *’. Most often this is a complete waste of resources. In packaged software, it can easily lead to bugs when the table definition changes. In all situations, it will cost extra resources to collect the data, send them to the client and then filter out the data that is not needed.

 

More is better: This is of course not about I/O. It is about the information that you give Oracle about your data and your query. Add as many predicates as possible, since it can help the optimizer do a better job. When 2 tables are joined on an ID-column. But you know, from your knowledge of the data, that another column can also be used as a join-condition then use both join conditions.

It will help the optimizer work out the relationship between the tables, and select the optimal plan. If 2 columns have related data, and the predicate on one column means that the other column is restricted too, put a predicate on both columns.

 

Consider the following:

SQL> create table Xxx_inner

  2  as select * from dba_objects

  3  where object_type='TABLE';

 

Table created.

 

SQL> create index xxx_inner_n1 on xxx_inner(object_id,object_type);

 

Index created.

 

SQL> create index xxx_inner_n2 on xxx_inner(Object_type);

 

Index created.

 

SQL> create table xxx_outer

  2  as select * from dba_objects;

 

Table created.

 

SQL> insert into  xxx_outer

  2  select * from dba_objects;

 

71136 rows created.

 

SQL> create index xxx_outer_n1 on xxx_outer(Object_id,object_type);

 

Index created.

 

SQL> create index xxx_outer_n2 on xxx_outer(object_type);

 

Index created.

 

SQL> analyze table xxx_inner compute statistics;

 

Table analyzed.

 

SQL> analyze table xxx_outer compute statistics;

 

Table analyzed.

 

SQL> set autotrace traceonly;

SQL> select o.*

  2  from xxx_outer o

  3  where o.object_id in (select object_id from xxx_inner i)

  4  and o.object_type='PROCEDURE';

 

no rows selected

 

Statistics

———————————————————-

        113  consistent gets

       1064  bytes sent via SQL*Net to client

 

Consistent gets is the number of times data was read from the buffer cache into our session memory. It is therefore a good measure of the amount of work a session needed to do while executing a query.

SQL> select o.*

  2  from xxx_outer o

  3  where o.object_id in (select object_id

    from xxx_inner i

    where i.object_type=o.object_type)

  4  and o.object_type='PROCEDURE';

 

no rows selected

 

Statistics

———————————————————-

         15  consistent gets

       1064  bytes sent via SQL*Net to client

 

SQL> select o.object_id

  2  from xxx_outer o

  3  where o.object_id in (select object_id

    from xxx_inner i

    where i.object_type=o.object_type)

  4  and o.object_type='PROCEDURE';

 

no rows selected

 

Statistics

———————————————————-

         15  consistent gets

        254  bytes sent via SQL*Net to client

 

 

Here we see two tables created from ‘DBA_OBJECTS’. We join these tables on ‘OBJECT_ID’. In this case, we have some information that the Oracle Optimizer does not have. The object_id,object_type combination is the same in both tables. Unaware of this fact, Oracle has to search all object_id’s for ‘XXX_INNER’. When we tell Oracle that the object_type is the same, it can skip most of the records resulting in less I/O.

When we finally tell Oracle that we’re only interested in the object_id, we also reduce the network traffic by 75%.

 

The concepts

 

For every query, Oracle will have to collect some data. And in most cases, it will have to join one or more data-sets. Let’s see what options Oracle has for collecting data and joining the data together. In this presentation we only look at the basic options, not the more sophisticated features. So don’t expect this list to be complete. We will see how data can be retrieved from the database, and how tables / data-sets can be joined together. We will not yet go into the most efficient way to do it, because the most efficient way to retrieve data depends on many factors. Only when you have a basic understanding of the concepts, we can start thinking about the most efficient way to do things.

 

Collecting data

 

To gather data from a table, Oracle can use 3 different options, called ‘Access Paths’:

·         Full Table Scan

·         Index Scan / Table Access by Rowid

·         Index Scan

The Full Table Scan (FTS) is exactly as the name implies, a full scan of the table. All the records of the table are read into memory, and checked against the predicates in the query (where clause or Join condition). The blocks of the table don’t have to be read in any particular order. Oracle will just try to get the blocks of the table as quickly as possible.

When a sizable part of the table needs to be selected, this will be the most efficient access path. It will be the only one available, if there is no predicate available that matches (part of) an index.

The ‘Index Scan / Table Access by Rowid’, will use an index to decide which rows need to be read into memory. Then based on the rowid in the index, the correct row will be retrieved. The rowid refers directly to the position on disk where the row is located.

And the last option is the ‘Index Scan’. The difference with the previous option is that Oracle does not need to get the table data anymore. The data in the index is sufficient to answer the query.

 

So which one of these is the most efficient?

As usual, it depends. Many people think a FTS is less efficient than an Index Scan. But consider this:

SQL> create table xxx_access as select * from dba_objects;

 

Table created.

 

SQL> create index xxx_access_n1 on xxx_access (object_id);

 

Index created.

 

SQL> select object_id,object_name from xxx_access where object_id>0;

 

71139 rows selected.

 

Elapsed: 00:00:08.02

 

Execution Plan

———————————————————-

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=287 Card=78799 Bytes=6225121)

   1    0   TABLE ACCESS (FULL) OF 'XXX_ACCESS' (TABLE) (Cost=287 Card=78799 Bytes=6225121)

 

Statistics

———————————————————-

       5803  consistent gets

    2727442  bytes sent via SQL*Net to client

      71139  rows processed

 

SQL> select /*+ INDEX (xxx_access) */ object_id, object_name from xxx_access where object_id>0;

 

71139 rows selected.

 

Elapsed: 00:00:10.05

 

Execution Plan

———————————————————-

0     SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1290 Card=78799 Bytes=6225121)

1   0   TABLE ACCESS (BY INDEX ROWID) OF 'XXX_ACCESS' (TABLE) (Cost=1290 Card=78799 By=6225121)

2   1     INDEX (RANGE SCAN) OF 'XXX_ACCESS_N1' (INDEX) (Cost=161 Card=78799)

 

Statistics

———————————————————-

      10771  consistent gets

    2727442  bytes sent via SQL*Net to client

      71139  rows processed

 

The FTS needs 5803 I/O operations. While the Index Scan takes almost double at 10771 I/O’s and 2 seconds more. (out of 10 sec’s!)

Imagine what will happen, when you try this with a multi-million row table.

 

But for sure an Index Scan will be the most efficient when it can be done? Again, it depends. Consider this:

 

SQL> create table test1 (l number, txt varchar2(500));

Table created.

 

SQL> create index test1_idx on test1(l,txt);

Index created.

 

SQL> Begin

  2  For I In (Select  Level L, Rpad('ABC',500,To_char(Level)) Txt

  3            From Dual Connect By Level <= 50000) Loop

  4    Insert Into Test1 Values (I.L,I.Txt);

  5    If Mod(I.L,5)!=0 Then

  6       Delete From Test1 Where L=I.L;

  7    End If;

  8  End Loop;

  9  Commit;

 10  end;

 11  /

PL/SQL procedure successfully completed.

 

SQL> analyze table test1 compute statistics;

Table analyzed.

 

SQL> set autotrace traceonly;

SQL> select l from test1 t where l>0;

10000 rows selected.

 

Execution Plan

———————————————————-

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=773 Card=10000 Bytes=40000)

   1    0   TABLE ACCESS (FULL) OF 'TEST1' (TABLE) (Cost=773 Card=10000 Bytes=40000)

 

Statistics

———————————————————-

       1393  consistent gets

     140549  bytes sent via SQL*Net to client

      10000  rows processed

 

SQL> select /*+ INDEX(t,test1_idx) */ l from test1 t where l>0;

10000 rows selected.

 

Execution Plan

———————————————————-

   0      SELECT STATEMENT Optimizer=ALL_ROWS (Cost=3338 Card=10000 Bytes=40000)

   1    0   INDEX (RANGE SCAN) OF 'TEST1_IDX' (INDEX) (Cost=3338 Card=10000 Bytes=40000)

Statistics

———————————————————-

       4003  consistent gets

     140549  bytes sent via SQL*Net to client

      10000  rows processed

 

 

 

 

How can the Full Table Scan be more efficient?

 

Good question. To answer it, we have to look at the amount of work that Oracle has to do to get the data. We start with the FTS:

As mentioned, a FTS needs to scan all the blocks in a table. In my system, the table XXX_ACCESS was created with 1152 blocks. So basically Oracle will read 1152 blocks. However Oracle has optimized this process. One of the most noticeable optimizations is the db_file_multiblock_readcount, which tells Oracle to read multiple blocks in one I/O operation. So instead of doing 1152 reads, Oracle reads 4 (on my system) blocks at a time in 288 reads. (Take a look at the explain plan above again, and notice the cost of the Full Table Scan!).

Now how much work does an Index Scan / Table Access rowid need to do?

To answer that question, you need to know the structure of an index. An index in Oracle is a B-Tree structure. It looks like an inverted tree, with the top being the ‘Root-block’. The lowest level contains the ‘Leaf blocks’, that hold the index keys and the matching rowid’s. The index keys in the leaf blocks are sorted. So the lowest value will be in the utter-left block, the highest in the far right block.

All upper level blocks show the ranges that the lower level blocks contain.

To find a range of data, Oracle starts at the root block, and follows the pointers to the first leaf block containing an index key within the range. From here Oracle can walk the leaf blocks from left to right (or right to left, if needed).

See the following picture:

That means that Oracle has to walk the ‘height’ of the index (Index level). Then read a number of leaf blocks. And for every entry in the leaf-block, it needs to retrieve the data from the table.

In the worst case scenario, the data is spread throughout the table. And for every index key, Oracle has to retrieve a different table block. In that case, the amount of work is: (index level – 1) + number of index leaf blocks + (Number of keys * Number of table blocks).

It will be obvious that is a lot more I/O than just reading the table once. The formula is not completely correct, because Oracle does not read the same table block twice, when the next rowid from the index is in the same block.

But the formula should make it clear that with more data being retrieved, the cost of the index access / Table Rowid, is increasing faster than the cost of the Full Table Scan.

 

What happened to Index Only Access?

 

The example showed that a Full Table Scan is still more efficient than using only an index. Even though Oracle would only have to walk through the index to retrieve all the data.

This is caused by a feature of the index structure and the way the data was entered into the table. This caused the index to grow bigger than the actual table:

SQL> select table_name,num_rows,blocks from dba_Tables where table_name='TEST1';

 

TABLE_NAME             NUM_ROWS     BLOCKS

——————– ———- ———-

TEST1                     10000        772

 

SQL> select index_name, distinct_keys, leaf_blocks from dba_indexes where index_name='TEST1_IDX';

 

INDEX_NAME           DISTINCT_KEYS LEAF_BLOCKS

——————– ————- ———–

TEST1_IDX                    10000        3334

 

This situation can occur when a lot of inserts and deletes are taking place in the same transaction. The space for the deleted table rows can be reused immediately. But the space for the deleted index keys only comes available after the commit. NOTE: The space in the index will be reusable after the commit!

Another situation where this can happen is with an index key based on a sequence. So new data is only inserted at the end of the range. When you delete a lot of values on the lower end of the range (but not all). Richard Foote has some excellent material on his website about Deleted Index Keys:

http://en.wordpress.com/tag/index-delete-operations/

 

Join Mechanisms

 

I hope you’re not yet in despair. Because so far we have only retrieved data from single tables. In most cases, we need to join tables together to get our data. Now we will look at 3 basic forms of joining data-sets together. Note that it is not necessarily only tables that we join. It can also be a result set from an earlier part of your query. For example we might collect some data from an Index Only Access and then join it to data from a Full Table Scan.

Oracle has 3 mechanisms to perform Joins: Nested Loops, Hash Join, Merge Join. Let’s take a look at them.

Nested Loops

 

The Nested Loop join loops over a smaller data set, and for every record in that set, it searches a matching record in the second data set. Visually, it looks like this:

 

You can see that the first set is fully scanned. This is the Outer or Driving Set. For every record in this set, we look up a matching record in the second set. Usually this will be done through an index, even though this is not mandatory.

The efficiency of this mechanism depends on the size of Set A, and the number of records we need to  retrieve from Set B.

We will see some situations where a Nested Loops join is more or less efficient.

The second join mechanism is the

Hash Join

 

For a Hash Join, Oracle builds a hash table from a (preferably) smaller data set (Build set), where the key is a hash value derived from the join columns. Then Oracle reads the second data set (Probe set), derives the hash value on the join columns and probes the hash table for a match.

Schematically, it looks like this:

For the Hash Join, it is not relevant if the data is retrieved through an index, or from a Table scan. Both data sets need to be read fully. One of the features of hashing is that collisions may occur. A collision means that 2 different values result in the same hash value. Therefore, the full data set is stored in the hash table. When a match is found on the hash values, Oracle double-checks the actual values of the join columns to see if they match.

There is a major drawback for this mechanism. When the hash table does not fit into the available memory (hash_area_size). Then Oracle will dump parts of the hash table to disk (Temp tablespace). A bitmap of all possible hash values is kept in memory with a bit indicating if a hash-value is used or not.

So while scanning the Probe set, Oracle can check if there is a possible match. If that part of the hash table is not in memory, Oracle will set aside the records from the probe table. When finished with the probe set, the next part of the hash table is loaded into memory, and the records that were set aside are tested again. This is called a ‘Multipass Hash Join’, instead of the ‘Onepass Hash Join’ where the entire hash table is held in memory.

You can imagine that the hash join can be very efficient even without index access. However it can deteriorate quickly when a ‘Multipass Hash Join’  is needed.

That brings us to the

Merge Join

 

A Merge Join is possible when 2 datasets are both ordered on their join columns. The datasets can then be read in an alternating way. You start reading the first dataset, then you read the second dataset until you find a matching value, or exceed the value. If it is a matching value, you can start building your result set. If you pass the join value, you continue reading the first dataset again.

Schematically, it looks like this:

 The main requirement is of course that both data sets are ordered, before the join takes place. If that is the case, this is probably the most efficient join mechanism. It can handle all kinds of join comparisons, including range comparisons.

Back to efficient SQL

 

Now it is time to go back to the main focus of this article. How to write efficient SQL.

We have seen how Oracle can access data and join it together. It is the job of the Oracle Optimizer to find out the most efficient way to do that, for a given query. It does this based on statistics on the tables and indexes. It is not in the scope of this article to discuss how to gather statistics. But we will see the use of several of the statistics. These statistics include (but are not limited to) the number of rows in the table, the number of blocks, the average row length, etc. For an index, they include (but are not limited to) the number of leaf blocks, the number of keys per leaf block, the number of datablocks in the table per key, etc.

In an ideal world, we would be able to trust the optimizer to do the right thing all the time.

However, we live in an imperfect world, and the Oracle Optimizer does not always have perfect information about the data or your query. Either the statistics might not be up to date, or they might not include some dependencies within the data. Also your query might not give the optimizer all the information that you have. (see the first example in this article, where we can give the optimizer extra information by adding a predicate).

Consider this query:

Select * From

Tiny  T, –7089 rec

Small S   — 71151 Rec

Where t.Object_id = s.Object_id;

 

Which explain plan is more efficient:

 

0 SELECT STATEMENT  

1  HASH JOIN        

2   TABLE ACCESS FULL  TINY

3   TABLE ACCESS FULL  SMALL

 

Or:

 

0 SELECT STATEMENT                      

1  TABLE ACCESS BY INDEX ROWID SMALL     

2   NESTED LOOPS                       

3    TABLE ACCESS FULL         TINY    

4    INDEX RANGE SCAN          SMALL_IDX

 

The first explain plan uses a Hash Join, with Tiny as the Build Table. The second uses a Nested Loops Join, with Tiny as the Inner Table.

I hope you have decided that you don’t know which one is the most efficient. Because you cannot know based on the information you have. If I selected 2 non-overlapping sets of data, the Nested Loop Join might be more efficient.  It would scan Tiny for 7089 records, then do 7089 index lookups on Small (without result), so it would need to read approximately 7089+7089=14178 blocks.

 

If the 2 sets would have a 1-n relationship, and every occurrence of object_id in Small is also in Tiny, both tables would need to be read fully and the Hash Join would be more efficient.

 

That means that to write efficient SQL, you’ll need to have an understanding of the data. And you must have considered the optimal execution plan for your query. When you write your query, keep in mind the mantra in first part of this article. Less is Better, More is Better. You want to give Oracle as much information as possible, and you want to get as small as possible result sets.

 

The last part that we will do in this part, is to look at the

Explain plan

 

To see what Oracle will do when executing a query, you can make an explain plan. Many tools have built-in options to show explain plans on queries. Alternatively, you can use the Oracle commands:

SQL> Explain plan for

  2  Select * From

  3  Tiny  T, –7089 rec

  4  Small S   — 71151 Rec

  5  Where t.Object_id = s.Object_id;

 

Explained.

 

SQL> select * from table( dbms_xplan.display() );

 

PLAN_TABLE_OUTPUT

——————————————————————————–

Plan hash value: 803478362

 

—————————————————————————-

| Id  | Operation          | Name  | Rows  | Bytes | Cost (%CPU)| Time     |

—————————————————————————-

|   0 | SELECT STATEMENT   |       |  7089 |  1287K|  1264   (1)| 00:00:16 |

|*  1 |  HASH JOIN         |       |  7089 |  1287K|  1264   (1)| 00:00:16 |

|   2 |   TABLE ACCESS FULL| TINY  |  7089 |   650K|   213   (0)| 00:00:03 |

|   3 |   TABLE ACCESS FULL| SMALL | 71151 |  6392K|  1049   (1)| 00:00:13 |

—————————————————————————-

 

 

PLAN_TABLE_OUTPUT

——————————————————————————–

Predicate Information (identified by operation id):

—————————————————

 

   1 – access("T"."OBJECT_ID"="S"."OBJECT_ID")

 

15 rows selected.

 

I took the previous query, and ran an explain plan on it. First I should tell you how I created tiny and small, so you can verify:

 

create table tiny as select * from dba_objects where mod(object_id,10)=0;

 

create table small as select * from dba_objects;

 

create index tiny_idx on tiny(object_id);

 

create index small_idx on small(object_id);

 

Analyze Table Tiny Compute Statistics;

 

analyze table small compute statistics;

 

You can see that both tables were created from dba_objects and tiny is a subset of small. Therefore a Hash Join makes most sense.

 

Let’s take a closer look at the explain plan now. Keep in mind that all the numbers are estimates from the optimizer. They are based on the statistics available, and the runtime numbers might be completely different.

The execution starts at the rows furthest right. And top down. In this case that means that we start with a Full Table Scan of Tiny. Then we do a Full Table Scan of Small, and finally we Hash join the 2 sets together.

The result of the Hash Join is returned, and becomes the result of the query (Select).

 

Let’s take a closer look at the Full Table Scans. After the name of the table, we see ‘Rows’,’Bytes’,’Cost’ and ‘Time’.

The ‘Time’ is an estimate of the amount of time it will take to complete this step. It is useful in query tuning  to see which step will take the most time.

The ‘Bytes’ are an estimate of the amount of data used to complete this step. I find it useful when a hash join is involved. Because when it exceeds my hash_area_size, Oracle will need a ‘Multi-Pass Hash Join’.

The ‘Cost’ is often used for query tuning, and people will try to ‘tune down’ the cost. This makes sense, because the cost is an estimate for the amount of I/O Oracle needs to do for this step. However, there is a logical trap in this.

Oracle has used the cost of different explain plans for the original query to decide the most efficient one. Now when we change the query, the cost can no longer be compared to the original query. After all, it is a different query. It will return a different result. Unless, of course there is some information about the data we are selecting, that we didn’t give the optimizer initially.

 

To me, it makes more sense to look at the ‘Rows’. This is the number of records Oracle expects from that step in the query. If this is very different from the number you are expecting, something is wrong. Either the statistics are outdated, or the Optimizer is missing some information that you have, or you are not selecting the data you are expecting.

 

In the query above, we see that Oracle has made the perfect assumptions. Tiny will return 7089 rows, Small will return 71151 rows and there is a 1-1 relationship  between them. So the join will return one record for each record in Tiny. No use looking for a more efficient plan here. Unless again, we did not query what we are looking for.

 

 

 

Oracle workflow for eBS DBA’s (Part 4b ;-)

 A reader pointed out that I didn’t fulfill my promise to write about suspend, resume and abort in part 4 of ‘Oracle workflow for eBS DBA’s.

 
So to make up for that omission, I will first write a separate note about it here. Then I’ll incorporate it in the article at a later time. 
 
Let’s start with the abort. 
In part 3, we saw that we can get workflows in a state where they will never be able to continue again. We did this by setting the ‘On Revisit’ property to ignore. 
 
The correct way to handle these workflows is to run a workflow background engine with parameter ‘process_stuck’ set to ‘TRUE’. That will set the item status to ‘Error’ and run the appropriate error process. 
 
But there may be reasons where you want to just abort the item, without error processing. 
 
For those situations Oracle provides the ‘wf_engine.abortprocess’ API. 
The API will set the status of the process to complete. If a result is needed, you can set this on the call to the API. It defaults to ‘#FORCE’ (wf_engine.eng_force constant).
 
Let’s see how this works. First I used the ‘MAIN_DBA_PROCESS’ from part 3 of the series, and I set the ‘On Revisit’ for the ‘LOOP_COUNTER’ to ‘Ignore’.
Now when I run the process, we get this result: 
 
INSTANCE_LABEL   FUNCTION                      BEGIN_DATE         END_DATE           STATUS   RESULT   OUTBOUND_QUEUE_ID
DBA_MAIN_PROCESS                               31-7-2009 12:36:58                    ACTIVE   #NULL 
START            START                         31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE #NULL 
INITIALIZE_FLOW  XXX_WF_DBA.init               31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE COMPLETE 
COMPARETEXT      WF_STANDARD.COMPARE           31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE EQ 
CHECK_INVALIDS   XXX_CHECK_INVALIDS            31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE Y 
GET_INVALIDS     XXX_WF_UTILS.get_invalids     31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE  
LOOPCOUNTER      WF_STANDARD.LOOPCOUNTER       31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE LOOP 
PROCESS_INVALIDS XXX_WF_UTILS.process_invalids 31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE  
DEFER            WF_STANDARD.DEFER             31-7-2009 12:37:33 31-7-2009 12:37:33 COMPLETE #NULL 
AND              WF_STANDARD.ANDJOIN           31-7-2009 12:37:33                    WAITING  
TRACK_FLOW-1     TRACK_FLOW_PROGRESS           31-7-2009 12:37:42 31-7-2009 12:37:42 COMPLETE  
 
 
Now we can abort the item with the API:
 
 
begin
         wf_engine.abortprocess(itemtype=>'DBA_TYPE'
                               ,itemkey=>'30'
                               ,process=>'ROOT:DBA_MAIN_PROCESS');
end;
 
Note how we have to indicate that we want to abort the root process of the DBA_MAIN_PROCESS. The workflow engine needs to know unambiguously which process to abort. The way to do that is to set ‘process:<activity>’ to indicate the process. In our case this would be ‘ROOT:DBA_MAIN_PROCESS’.
 
And this is the result afterwards. 
 
INSTANCE_LABEL FUNCTION BEGIN_DATE END_DATE STATUS RESULT OUTBOUND_QUEUE_ID
DBA_MAIN_PROCESS                               31-7-2009 12:36:58 31-7-2009 12:39:10 COMPLETE #FORCE 
START            START                         31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE #NULL 
INITIALIZE_FLOW  XXX_WF_DBA.init               31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE COMPLETE 
COMPARETEXT      WF_STANDARD.COMPARE           31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE EQ 
CHECK_INVALIDS   XXX_CHECK_INVALIDS            31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE Y 
GET_INVALIDS     XXX_WF_UTILS.get_invalids     31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE  
LOOPCOUNTER      WF_STANDARD.LOOPCOUNTER       31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE LOOP 
PROCESS_INVALIDS XXX_WF_UTILS.process_invalids 31-7-2009 12:36:58 31-7-2009 12:36:58 COMPLETE  
DEFER            WF_STANDARD.DEFER             31-7-2009 12:37:33 31-7-2009 12:37:33 COMPLETE #NULL 
AND              WF_STANDARD.ANDJOIN           31-7-2009 12:37:33 31-7-2009 12:39:10 COMPLETE #FORCE 
TRACK_FLOW-1     TRACK_FLOW_PROGRESS           31-7-2009 12:37:42 31-7-2009 12:37:42 COMPLETE  
 
 
 
Compare this with the result from running a background engine with parameter ‘process_stuck=>TRUE’:
 
ORA-20002: 3150: Process 'DBA_TYPE/33' is being worked upon. Please retry the current request on the process later.
ORA-06512: at "APPS.WF_CORE", line 300
ORA-06512: at "APPS.WF_ENGINE", line 4528
ORA-06512: at line 2
 
Of course this error can be captured and handled as we saw in ‘Oracle workflow for eBS DBA’s (Part 4)’
 
Then lets take a look at the ‘wf_engine.Suspend’ function. This is basically a ‘pause’-API for a workflow item. It sets the active process to ‘SUSPEND’. 
 
The workflow engine will not pick it up any more until the wf_engine.resume API is called. 
 
Let’s see the resume and suspend with a small example. I used the same dba_control_process. After launching it, it will be deferred. Instead of running a background engine, we suspend it. 
 
begin
     wf_engine.suspend(itemtype=>'DBA_TYPE',itemkey=>'34');
end;
 
And the status becomes:
 
INSTANCE_LABEL      FUNCTION          BEGIN_DATE         END_DATE           STATUS   RESULT OUTBOUND_QUEUE_ID
DBA_CONTROL_PROCESS                   31-7-2009 18:30:45                    SUSPEND  #NULL 
START               START             31-7-2009 18:30:45 31-7-2009 18:30:45 COMPLETE #NULL 
DEFER               WF_STANDARD.DEFER 31-7-2009 18:31:02                    DEFERRED #NULL  6FFEFFF31C2604F5E0440003BAB3AD6B
 
The interesting thing is that the deferred status is still there. Including its queue_id. So when we run a background engine. We’ll see that it indeed picks up the item. It dequeues the message but leaves the status on deferred. 
 
Now when we resume the process:
 
begin
    wf_engine.resume(itemtype=>'DBA_TYPE',itemkey=>'34');
end;
 
The function is performed, and the item continues as usual.
 

Oracle Workflow for eBS DBA’s (Part 1)

For many Oracle eBS-DBA’s workflow is a strange and hardly understood module.Still it is widely used in 11i and 12i. So let’s dive into its workings in more detail.

This is part one of a so far unlimited series. Don’t be put off by the lack of code in this part. We first need to go through the basics. In the next parts we’ll get more action.
During this series, I used an 11.5.10 instance on a 9.2.0.8 database. The basic statements in these articles will still hold for earlier and later versions, but small modifications may be needed.
In this first part we go into the definitions and the basics of the Workflow engine. We start with some definitions, and then we build a simple basic workflow.
We’ll see how this relates to the wf_tables in the database.

The basic terminology
 

First. What is a workflow? A workflow is a sequence of functions and events that follow a certain path based on decisions made during the progress of the sequence.
Most of us know the pictures from workflow builder. With the pictograms for functions joined together with lines.
That set is a definition of a workflow. In the Oracle workflow world it is called a ‘process’. The nodes in the process can be functions, processes or notifications.

All these are grouped together in containers that Oracle called an ‘Itemtype’. The itemtype is very important, since it will become part of the primary key in the underlying tables.
The actual workflows that are running according to the definition of the itemtype are called ‘Items´. The item is started as a specific process within an ‘itemtype’. And it is uniquely identified by the ‘itemtype’ and an ‘itemkey’.

Every process consists of 2 or more nodes, which are joined together by transitions. At least 2 nodes are required, because a process needs a ’start’ and a ’stop’-node.
Ok. We talked enough for now. Let’s build a process and find out the rest along the way.
By the way, all the definitions above will be linked to a glossary later on.

Getting Started

To start building our process, we first need the itemtype. To create an itemtype, we use ‘Workflow builder’. In workflow builder, when we click the new button we are greeted with this screen:

wf builder start

On right clicking the ‘untitled’ map, we can create a new itemtype.

New_item_type

Internal name is the unique name that will be used in the table keys for this itemtype and its items. It is limited to 8 characters. So choose wisely!
Display name is the name that will be shown to users when they need to interact with items from this itemtype.
The description…….. you can guess that one.
We will discuss the other three fields in a later article.

My first Itemtype

I choose to start building a flow that will do some DBA checks and tries to fix problems or notify the DBA if there is a problem.
During the course of building this flow, we’ll stumble on different features of the workflow engine.
The first step is to build the itemtype.
I called it: DBA_TYPE.
With a display name: DBA Itemtype
And a description: Itemtype to hold DBA processes and functions.

dba_item_type

When you open your newly created itemtype, you see the components that can be created within this itemtype.
You’ll remember that the flow definition was called a process. So next we create a new ‘Process’: 

dba_main_process

Because this is a process that we will only be calling from our client, we have no use for the result type at the moment.
Later on, we’ll see nested processes, where the result of a process will determine the direction of the calling process.
When we go to the Process Detail (right click the process). We again have a virgin drawing board.
This will be where the actual flow is created. Every process consists of activities (functions, notifications and processes) and the transitions between them (based on the results of the activities).
Also every process has to start with a ‘Start’ Activity and finish with an ‘End’ activity. (Take care to avoid loose ends, since the end finalizes the flow and gives back control, or makes the flow purgeable).
So first we create a new function to start our flow.

start_1

Note the wf_standard.noop for the function. This is a dummy function because the only purpose of this node is to indicate the starting point for the process.
Even though we named this function ‘START’, we still need to flag it as a ‘Start’ function. That is in the node tab.

start_2

We then create an ‘END’ function in the same way.
Finally we create our own function.
 

Now we have an item_type with 1 process, and 3 functions. It’s time to connect the functions together.
Right click START, and drag to INITIALIZE_FLOW. Then right click there and drag to END. The result should be like:  

process-1

Now we have a runnable flow.

You can even start it already, if you create an empty packaged function: ‘ XXX_WF_DBA.init’.

But there is more work to be done.
 

First we are going to see how this is recorded in the wf_ tables in our database in part 2 of our series.