CAS Registry Number Validation using PL/SQL

CAS Registry Numbers are universally used to provide a unique, unmistakable identifier for chemical substances. CAS stands for Chemical Abstracts Service, a division of the American Chemical Society that assigns a unique numerical identifier to every chemical substance described in the open scientific literature.

The CAS Registry database contains more than 113 million unique organic and inorganic chemical substances, such as alloys, coordination compounds, minerals, mixtures, polymers and salts, and more than 66 million sequences. CAS Registry contains substances reported in the literature back to the early 1800s and over 15,000 substances are added daily. [1]

A CAS Registry Number itself has no inherent chemical significance but provides an unambiguous way to identify a chemical substance or molecular structure when there are many possible systematic, generic, proprietary or trivial names. [2] These numbers are assigned in sequential, increasing order when a substance is identified by CAS scientists for inclusion in the CAS REGISTRY database.

A CAS number is separated by hyphens into three parts: the first consists of two to seven digits, the second of two digits, and the third of a single check digit. The check digit is calculated from the digits of the first two parts, read from right to left: multiply the last digit by 1, the one before it by 2, the one before that by 3, and so on, then add the products and take the remainder of the sum divided by 10.
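The three-part format described above can be checked with a single regular expression. The following is only a minimal sketch: the name cas_syntax_ok_sketch is hypothetical, and this is not the code of f_is_valid_cas_number from the repository. It checks the shape of the string only and says nothing about the check digit.

```sql
-- Hypothetical helper for illustration; not the author's repository code.
CREATE OR REPLACE FUNCTION cas_syntax_ok_sketch (p_cas IN VARCHAR2)
  RETURN NUMBER
IS
BEGIN
  -- Two to seven digits, a hyphen, two digits, a hyphen, one check digit.
  IF REGEXP_LIKE(p_cas, '^[0-9]{2,7}-[0-9]{2}-[0-9]$') THEN
    RETURN 1;
  END IF;

  -- Anything else, including NULL input, is treated as invalid syntax.
  RETURN 0;
END cas_syntax_ok_sketch;
/
```

Returning 1 or 0 rather than a BOOLEAN keeps the sketch callable from plain SQL, in the same style as the function described in this post.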

For example, the CAS number of water is 7732-18-5: the checksum 5 is calculated as (8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6) = 105; 105 mod 10 = 5.
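The same arithmetic can be written as a short PL/SQL function. This is a sketch under stated assumptions, not the implementation from the repository: the name cas_check_digit_sketch is hypothetical, and the input is assumed to be a well-formed CAS number (digits and hyphens only, ending in the check digit).

```sql
-- Hypothetical helper for illustration; not the author's repository code.
-- Assumes p_cas is already well formed, e.g. '7732-18-5'.
CREATE OR REPLACE FUNCTION cas_check_digit_sketch (p_cas IN VARCHAR2)
  RETURN NUMBER
IS
  v_digits VARCHAR2(32767) := REPLACE(p_cas, '-');  -- strip the hyphens
  v_body   VARCHAR2(32767);                         -- digits without the check digit
  v_len    PLS_INTEGER;
  v_sum    PLS_INTEGER := 0;
BEGIN
  v_body := SUBSTR(v_digits, 1, LENGTH(v_digits) - 1);
  v_len  := LENGTH(v_body);

  -- i = 1 is the rightmost remaining digit, i = 2 the one before it, and so on.
  FOR i IN 1 .. v_len LOOP
    v_sum := v_sum + i * TO_NUMBER(SUBSTR(v_body, v_len - i + 1, 1));
  END LOOP;

  RETURN MOD(v_sum, 10);
END cas_check_digit_sketch;
/
```

For water, SELECT cas_check_digit_sketch('7732-18-5') FROM dual; returns 5, which matches the last digit of the number, so the checksum is correct.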

I wrote a PL/SQL function to validate CAS Registry Numbers for a database application that a semiconductor company uses to manage the chemicals in its chip manufacturing facilities.

The function f_is_valid_cas_number takes a CAS number as input and returns 1 if it is valid and 0 otherwise. The following examples show how the function can be used:

-- Valid CAS Number for Water, returns 1
SELECT f_is_valid_cas_number('7732-18-5') FROM dual;

-- Invalid syntax, returns 0
SELECT f_is_valid_cas_number('7A2-181-522') FROM dual;

-- Valid syntax but incorrect checksum (the check digit should be 5), returns 0
SELECT f_is_valid_cas_number('7732-18-4') FROM dual;

In an Oracle PL/SQL program, the function can be used to set a flag as follows:

-- Valid CAS Number for Water, sets is_valid_cas_number flag to 1
is_valid_cas_number NUMBER(1) := f_is_valid_cas_number('7732-18-5');

The code for the function is available in the CAS Number Validator repository on GitHub. I hope you find it useful.

ORA-03113: end-of-file on communication channel

One of the lesser-used features of an application backed by an Oracle Database on a UNIX server recently stopped working, throwing the ORA-03113 error. The oerr utility provided the following information:

$ oerr ora 3113

ORA-03113: end-of-file on communication channel

Cause: The connection between Client and Server process was broken.

Action: There was a communication error that requires further investigation.

This meant that one of two things had most likely happened:

  • The network connection was lost
  • The server process had crashed

(Note that ORA-03113 is a generic error and there are over 1000 bugs related to it on Oracle Support!)

The first possibility was quickly eliminated. A rare, momentary glitch in network connectivity might cause the error once or twice, but this error occurred every single time that particular feature was used.

This meant that something in the function invoked by the feature was causing the server process to die. Stepping through the code with a test script led to the point of failure.

The failure occurred while a simple dynamic SQL query was being executed:

-- The following query is being used multiple times later on
v_sqlstr := 'SELECT column1, column2, SUM( column_qty ) AS proposed_qty
             FROM a_global_temporary_table	-- or any table
             WHERE column1 = ' || column1_value ||
             ' GROUP BY column1, column2' ||
             ' ORDER BY MOD( SUM( column_qty ) )';

-- Count the number of rows returned by the "ordered" query above
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM (' || v_sqlstr || ')' INTO v_count;

But why?

Playing around with bits and pieces of the query stored in v_sqlstr showed that, when v_sqlstr was used as the source for COUNT(*), including MOD() in its ORDER BY clause caused the server process to crash. Weird!

To fix the issue, and because you don’t need an ordered result set to take a count, I removed the ORDER BY clause from the original query and appended it back only after taking the count, as shown below:

-- The following query is being used multiple times
v_sqlstr := 'SELECT column1, column2, SUM( column_qty ) AS proposed_qty
             FROM a_global_temporary_table	-- or any table
             WHERE column1 = ' || column1_value ||
             ' GROUP BY column1, column2';

-- Count the number of rows returned by the query above
EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM (' || v_sqlstr || ')' INTO v_count;

-- Append the ORDER BY clause to the original query
v_sqlstr := v_sqlstr || ' ORDER BY MOD( SUM( column_qty ) )';

And things were back on track!
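For anyone who wants to try the count-first, order-later pattern in isolation, here is a self-contained sketch. Everything in it is an assumption made for illustration: the inline view over dual stands in for the real table, the column names grp and qty are made up, the MOD divisor of 10 is arbitrary, and the bind variable :excluded_grp is this sketch's own choice in place of the string concatenation used above. It demonstrates the workaround only and is not expected to reproduce the crash.

```sql
DECLARE
  v_sqlstr VARCHAR2(4000);
  v_count  PLS_INTEGER;
BEGIN
  -- Base query without ORDER BY. The inline view is made-up sample data
  -- (levels 1 to 10, grouped by MOD(LEVEL, 3)) standing in for the real table.
  v_sqlstr := 'SELECT grp, SUM(qty) AS proposed_qty
               FROM (SELECT MOD(LEVEL, 3) AS grp, LEVEL AS qty
                     FROM dual CONNECT BY LEVEL <= 10)
               WHERE grp <> :excluded_grp
               GROUP BY grp';

  -- Count first, while the query is still unordered.
  EXECUTE IMMEDIATE 'SELECT COUNT(*) FROM (' || v_sqlstr || ')'
    INTO v_count
    USING 0;

  -- Append the ordering only for the later uses that need it.
  v_sqlstr := v_sqlstr || ' ORDER BY MOD(SUM(qty), 10)';

  DBMS_OUTPUT.PUT_LINE('Rows: ' || v_count);
END;
/
```

With server output enabled (SET SERVEROUTPUT ON in SQL*Plus), the block prints the number of groups left after the filter. Because the ORDER BY is appended after the count, the counting statement never contains the expression that triggered the problem.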

This is just one of the many possible causes and fixes for this error. I hope you find it useful in case you get stuck in a similar situation. Cheers!