Bald DBA – All the DBA, None of the hair

Ransomeware Attack

February 7, 2023 by Aaron

I got hit by a ransomware attack, that destroyed my server. Thankfully I have keep really good backups and was able to just wipe the server and restore my backups. Some configuration information was lost, so it’s taking me a big to get all the bugs worked out. The good news is I didn’t lose any data. Its just a pain in the ass to find the time to re-configure everything.

Aaron

RMAN table recovery error

May 20, 2022 by Aaron

I had a user delete the data in a table that they didn’t mean to. It took them a week to notify us, so no chance to use flashback to get the data back. I was not looking forward to restoring 100TB of datafiles and recovery logs just to restart a couple GBs of data. Thankfully the database is running 19c, and rman can now do a table recovery.

I tried to recover the table

run {
  set auxiliary instance parameter file to '/opt/tempam/pfile.ora';
  recover table amyers.mytable until time 'trunc(sysdate-12)'
  auxiliary destination '/opt/tempam'
  remap table 'amyers'.'mytable':'mytable_recovered';
}

I had 20TB of space provisioned to /opt/tempam. To do table recovery, the system, sysaux, undo and the source table tablespaces are restored. in my case this should come to about 3TB.

After waiting an hour for everything to restore and roll forward, I get the following error in the final step where it should be exporting the table with datapump

RMAN-06960: EXPDP> ORA-31626: job does not exist
ORA-31637: cannot create job TSPITR_EXP_enpf_BDAD for user SYS
ORA-39062: error creating master process DM00
ORA-31613: Master process DM00 failed during startup.
RMAN-05122: There is insufficient disk space 951901 MB to perform table recovery of 1921807 MB.
RMAN-05122: There is insufficient disk space 951901 MB to perform table recovery of 1058623 MB.

This is a lie, there was plenty of space.

Turns out this is a known bug, Bug 9109785 : ORA-1555 TRANSPORT TABLESPACE FAILS WITH ORA-01555 DURING THE EXPDP PHASE.

The work around is to prevent the database from being dropped after the failure by adding the “keep auxiliary” statement to the restore, and manually performing the datapump export.

run{  
  set auxiliary instance parameter file to '/opt/tempam/pfile.ora';
  recover table amyers.mytable until time 'trunc(sysdate-12)'
  auxiliary destination '/opt/tempam' keep auxiliary
  remap table 'amyers'.'mytable':'mytable_recovered';
}

Getting and formatting index ddl

September 17, 2021 by Aaron

This is a little useful sql I used this morning for stripping out all the additional information that dbms_metadata.get_ddl outputs. This strips the create index statements down to their bare bones.

set linesize 32767
SET TRIMSPOOL ON
SET TRIMOUT ON
SET WRAP OFF
set pagesize 0
 
begin
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'STORAGE',false);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'TABLESPACE',false);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'SEGMENT_ATTRIBUTES', false);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'REF_CONSTRAINTS', TRUE);
       dbms_metadata.set_transform_param (dbms_metadata.session_transform,'CONSTRAINTS', false);
end;
/
 
SELECT DBMS_METADATA.get_ddl ('INDEX', index_name, owner)
  FROM all_indexes
WHERE     table_owner = UPPER ('OWNER')
       AND table_name IN ('TABLE1', 'TABLE2');

pythonGit private repository

September 2, 2020 by Aaron

I built an online editor for modifying some files. They need to be checked into git after they are edited. Unfortunately there is no direct way of setting the username and password for the git repository. Instead you need to use http authentication, and put the username and password into the URL of the repository. This is just a little code snippet of what I needed to do to update the file. I left out the middle part updating the file, instead just put some junk code in to add a new line.

import git
import urllib.parse 
# Set the variables we are going to need
# These are completely made up, so won't run as written
l_git_username = 'amyers'
l_git_password = urllib.parse.quote('myP@$$W0rd')
l_git_repo = 'https://git.balddba.com/scm/or/someproject.git'
l_tmp_dir = '/tmp/gitproject/'
l_file_to_update = tns_tmp_dir + 'pyfile.py'

# insert the username and password, URL Encode any special characters
l_git_repo = l_repo[:8] + l_git_username + ':' + l_git_password + '@' + l_repo[8:]

# Clone the repository
git.Repo.clone_from(l_git_repo, l_tmp_dir, env={'GIT_SSL_NO_VERIFY': '1'})
repo = git.Repo(l_tmp_dir)

#Make some file modification
with open(l_file_to_update, 'a') as f:
  f.write("This is a new line\n")

# Commit the file back to the repository
l_commit_message = 'File modified in python'
repo.index.commit(l_commit_message)
origin = repo.remote(name='origin')
origin.push(env={'GIT_SSL_NO_VERIFY': '1'})

cx_Oracle ORA-24418

April 30, 2020 by Aaron

I am writing a web portal for a bunch of database tasks, and storing information in an oracle database. I am creating a connection pool but seeing this error.

> File "C:\Users\aar00287\PycharmProjects\oracle-flask\src\um\api\app.py", line 86, in getDbHostname
    db = pool.acquire()
         |    -> <method 'acquire' of 'cx_Oracle.SessionPool' objects>
         -> <cx_Oracle.SessionPool object at 0x00000262107B81B0>

cx_Oracle.DatabaseError: ORA-24418: Cannot open further sessions.
2020-04-30T08:52:35.861481-0400 ERROR ORA-24418: Cannot open further sessions.
Traceback (most recent call last):

This looks like an ORA-24418 out of sessions error. However when I check the database, there are only 120 of the 1200 available sessions being consumed. The alert logs don’t show any error at all which is really confusing.

After some digging, I found a bug report which states you will an ORA-24418 error if the pool is in the process of being resized.

Here is my session pool creation

pools = cx_Oracle.SessionPool(l_user,l_pass, l_server + '/' + l_service, encoding="UTF-8", min=5, max=50, increment=1, threaded=True)

adding the set mode parameter to the pool creation will cause a call to wait while a connection pool is started or resized instead of immediately erring out.

pool = cx_Oracle.SessionPool(l_user,l_pass, l_server + '/' + l_service, encoding="UTF-8", min=5, max=50, increment=1, threaded=True, setmode = cx_Oracle.SPOOL_ATTRVAL_WAIT)

Problem solved

Patching 19.3 to 19.6

March 19, 2020 by Aaron

I’m not going to go through the entire patching process, there are lots of other sites that can walk you through this. What I am going to talk about is some issues that I had after patching. The patch to the binaries went fine, however when I tried to run Datapatch against my database, I got the following error


[oracle@oradb1 OPatch]$ ./datapatch 
SQL Patching tool version 19.6.0.0.0 Production on Tue Mar 17 13:02:31 2020 
Copyright (c) 2012, 2019, Oracle. All rights reserved. 

Log file for this invocation: /opt/oracle/cfgtoollogs/sqlpatch/sqlpatch_1929_2020_03_17_13_02_31/sqlpatch_invocation.log 

Connecting to database... 
Error: prereq checks failed! 
Database connect failed with: ORA-12547: TNS:lost contact (DBD ERROR: OCIServerAttach) 

Please refer to MOS Note 1609718.1 and/or the invocation log 
/opt/oracle/cfgtoollogs/sqlpatch/sqlpatch_1929_2020_03_17_13_02_31/sqlpatch_invocation.log 
for information on how to resolve the above errors. 

SQL Patching tool complete on Tue Mar 17 13:02:32 2020

I have see this error before when the parameter remote_listener is not set on a database, but this is a single instance database.

I opened an SR with oracle support. After some trouble shooting, they recommended I remove the trailing slash from my ORACLE_HOME variable and restart the database.

So change
ORACLE_HOME=/opt/oracle/product/19c/dbhome_1/
to
ORACLE_HOME=/opt/oracle/product/19c/dbhome_1

after restarting the database, I got a new error

[oracle@oradb1 OPatch]$ ./datapatch -verbose 
SQL Patching tool version 19.6.0.0.0 Production on Wed Mar 18 16:52:03 2020 
Copyright (c) 2012, 2019, Oracle. All rights reserved. 

Log file for this invocation: /opt/oracle/cfgtoollogs/sqlpatch/sqlpatch_31007_2020_03_18_16_52_03/sqlpatch_invocation.log 

Connecting to database...OK 
Gathering database info...done 
Bootstrapping registry and package to current versions...done 
Determining current state...done 

Current state of interim SQL patches: 
No interim patches found 

Current state of release update SQL patches: 
Binary registry: 
19.6.0.0.0 Release_Update 191217155004: Installed 
SQL registry: 
Applied 19.6.0.0.0 Release_Update 191217155004 with errors on 18-MAR-20 04.51.13.824844 PM 

Adding patches to installation queue and performing prereq checks...done 
Installation queue: 
No interim patches need to be rolled back 
Patch 30557433 (Database Release Update : 19.6.0.0.200114 (30557433)): 
Apply from 19.3.0.0.0 Release_Update 190410122720 to 19.6.0.0.0 Release_Update 191217155004 
No interim patches need to be applied 

Installing patches... 
Patch installation complete. Total patches installed: 1 

Validating logfiles...done 
Patch 30557433 apply: WITH ERRORS 
logfile: /opt/oracle/cfgtoollogs/sqlpatch/30557433/23305305/30557433_apply_DB1_2020Mar18_16_52_52.log (errors) 
-> Error at line 7738: script rdbms/admin/prvtgwm.sql 
- ORA-03113: end-of-file on communication channel 
- Process ID: 32448 
- Session ID: 270 Serial number: 1817 
-> Error at line 7745: script rdbms/admin/prvtgwm.sql 
- ORA-03114: not connected to ORACLE 
-> Error at line 7749: script rdbms/admin/prvtgwm.sql 
- ORA-03114: not connected to ORACLE 
-> Error at line 7753: script rdbms/admin/prvtgwm.sql 
- ORA-03114: not connected to ORACLE 
-> Error at line 7757: script rdbms/admin/prvtgwm.sql

So some progress. Something was causing a core dump and the connection was terminated.

Looking at the script it was failing on, prvtgwm.sql, that script is all related to the schema gsmadmin_internal

Looking at the patching log, I see the following

GRANT datapump_exp_full_database TO package gsmadmin_internal.exchange
                                                              *
ERROR at line 1:
ORA-03113: end-of-file on communication channel
Process ID: 32448
Session ID: 270 Serial number: 1817

So looking back at the database, there are several package inside the GSMADMIN_INTERNAL schema that are invalid

======================================================
Count of Invalids by Schema
======================================================


OWNER        OBJECT_TYPE                                COUNT(*)
------------ ---------------------------------------- ----------
GSMADMIN_INT PACKAGE                                           1
ERNAL

GSMADMIN_INT PACKAGE BODY                                      7
ERNAL

GSMADMIN_INT PROCEDURE                                         2
ERNAL

GSMADMIN_INT TRIGGER                                           4
ERNAL

GSMADMIN_INT VIEW                                              4
ERNAL

MDSYS        FUNCTION                                          1
MDSYS        PACKAGE BODY                                      2
MDSYS        TRIGGER                                           1
PUBLIC       SYNONYM                                           5
SYS          PACKAGE BODY                                      3
==============================================================
Identifying whether a database was created as 32-bit or 64-bit
==============================================================

I tried running utlrp several times, however I was able to compile most of the objects manually issuing compile commands.

I was then able to run Datapatch successfully

[oracle@oradb1 OPatch]$ ./datapatch -verbose
SQL Patching tool version 19.6.0.0.0 Production on Thu Mar 19 09:33:50 2020
Copyright (c) 2012, 2019, Oracle.  All rights reserved.

Log file for this invocation: /opt/oracle/cfgtoollogs/sqlpatch/sqlpatch_3442_2020_03_19_09_33_50/sqlpatch_invocation.log

Connecting to database...OK
Gathering database info...done
Bootstrapping registry and package to current versions...done
Determining current state...done

Current state of interim SQL patches:
  No interim patches found

Current state of release update SQL patches:
  Binary registry:
    19.6.0.0.0 Release_Update 191217155004: Installed
  SQL registry:
    Applied 19.6.0.0.0 Release_Update 191217155004 with errors on 18-MAR-20 04.53.44.788558 PM

Adding patches to installation queue and performing prereq checks...done
Installation queue:
  No interim patches need to be rolled back
  Patch 30557433 (Database Release Update : 19.6.0.0.200114 (30557433)):
    Apply from 19.3.0.0.0 Release_Update 190410122720 to 19.6.0.0.0 Release_Update 191217155004
  No interim patches need to be applied

Installing patches...
Patch installation complete.  Total patches installed: 1

Validating logfiles...done
Patch 30557433 apply: SUCCESS
  logfile: /opt/oracle/cfgtoollogs/sqlpatch/30557433/23305305/30557433_apply_DB1_2020Mar19_09_34_28.log (no errors)
SQL Patching tool complete on Thu Mar 19 09:39:03 2020

wrong number or types of arguments in call to ‘CHECKDATAFILEFORSTANDBY’

January 24, 2020 by Aaron

I started seeing this today in one of our databases

RMAN> resync catalog;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of resync command on default channel at 01/24/2020 11:45:49
RMAN-10015: error compiling PL/SQL program
RMAN-10014: PL/SQL error 0 on line 2620 column 18: Statement ignored
RMAN-10014: PL/SQL error 306 on line 2620 column 18: wrong number or types of arguments in call to 'CHECKDATAFILEFORSTANDBY'

I found metalink article 2360045.1 which states that it is a catalog version mismatch, however running upgrade catalog doesn’t resolve the problem.

I remember hitting this error a long time ago, and although I can’f find the reason why, running upgrade catalog two times in a row seems to fix the problem (so typing upgrade catalog 4 times).

TDE Wallet on RAC

September 13, 2019 by Aaron

So I have been having trouble getting the oracle wallet to auto open, and it’s always messed up in gv$encryption_wallet

select * from gv$encryption_wallet;

   INST_ID WRL_TYPE		WRL_PARAMETER STATUS   WALLET_TYPE   WALLET_OR  FULLY_BAC    CON_ID
---------- ------------ ---------------------- ------------- ---------- ------------ ------
         4 FILE 		OPEN                   AUTOLOGIN     SINGLE     NO           0
         1 FILE			OPEN                   AUTOLOGIN     SINGLE     NO           0
         3 FILE			NOT_AVAILABLE          UNKNOWN       SINGLE     UNDEFINED    0
         2 FILE			NOT_AVAILABLE          UNKNOWN       SINGLE     UNDEFINED    0

According to Oracle this is is because TNS_ADMIN is not set, so I set it and several other suggested variables.

$ srvctl setenv database -d oradb1 -T "ORACLE_UNQNAME=oradb1"
$ srvctl setenv database -d oradb1 -T "ORACLE_BASE=/u01/app/oracle"
$ srvctl setenv database -d oradb1 -T "ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/"
$ srvctl setenv database -d oradb1 -T "TNS_ADMIN=/u01/app/oracle/product/12.2.0.1/db/network/admin/"

$ srvctl getenv database -d oradb1
oradb1:
ORACLE_UNQNAME=oradb1
ORACLE_BASE=/u01/app/oracle
TNS_ADMIN=/u01/app/oracle/product/12.2.0.1/db/network/admin/
ORACLE_HOME=/u01/app/oracle/product/12.2.0.1/

After restarting the database, the ACFS volume now shows

select * from gv$encryption_wallet;

   INST_ID WRL_TYPE             WRL_PARAMETER                    STATUS   WALLET_TYPE          WALLET_OR FULLY_BAC     CON_ID
---------- -------------------- -------------------------------- -------- -------------------- --------- --------- ----------
         1 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0
         2 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0
         3 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0
         4 FILE                 /u01/app/oracle/wallet/oradb1/   OPEN     AUTOLOGIN            SINGLE    NO                 0

DBSNMP long running query

September 9, 2019 by Aaron

I noticed a long running query in one of my databases this morning.

  SELECT TABLESPACE,
           ROUND ( ( (max_size_mb - used_space + free_space_mb) / max_size_mb),
                  :"SYS_B_00")
         * :"SYS_B_01"
             AS pct_free
    FROM (  SELECT ts.tablespace_name                             TABLESPACE,
                     ROUND (SUM (NVL (fs.bytes, :"SYS_B_02")))
                   / :"SYS_B_03"
                   / :"SYS_B_04"
                       free_space_mb,
                   ROUND (SUM (df.bytes)) / :"SYS_B_05" / :"SYS_B_06" used_space,
                     ROUND (
                         SUM (
                             CASE autoextensible
                                 WHEN :"SYS_B_07" THEN df.maxbytes
                                 ELSE df.bytes
                             END))
                   / :"SYS_B_08"
                   / :"SYS_B_09"
                       AS max_size_mb,
                   ts.block_size
                       AS block_size
              FROM dba_data_files df
                   LEFT OUTER JOIN (  SELECT file_id, SUM (bytes) bytes
                                        FROM dba_free_space fs
                                    GROUP BY file_id) fs
                       ON df.file_id = fs.file_id
                   INNER JOIN dba_tablespaces ts
                       ON df.tablespace_name = ts.tablespace_name
             WHERE ts.contents = :"SYS_B_10"
          GROUP BY ts.tablespace_name, ts.block_size)
ORDER BY pct_free DESC

Its computing tablespace size. The average runtime is 272.03 sec, or 4.5 minutes. This isn’t right, its running in a few seconds in other databases.

I started looking at the query in OEM, and noticed almost all the execution time was being spent on a FTS of X$KTFBUE.

A little googling turned up the following article:

Fixed Table x$ktfbue has not statistics

My issue appears to be the same, there are not statistics on teh X$KTFBUE table.

 SYS pp1oim1> exec print_table('select * from dba_tab_statistics where  table_name=''X$KTFBUE'' ');
OOWNER                        : SYS
TABLE_NAME                    : X$KTFBUE
PARTITION_NAME                :
PARTITION_POSITION            :
SUBPARTITION_NAME             :
SUBPARTITION_POSITION         :
OBJECT_TYPE                   : FIXED TABLE
NUM_ROWS                      :
BLOCKS                        :
EMPTY_BLOCKS                  :
AVG_SPACE                     :
CHAIN_CNT                     :
AVG_ROW_LEN                   :
AVG_SPACE_FREELIST_BLOCKS     :
NUM_FREELIST_BLOCKS           :
AVG_CACHED_BLOCKS             :
AVG_CACHE_HIT_RATIO           :
IM_IMCU_COUNT                 :
IM_BLOCK_COUNT                :
IM_STAT_UPDATE_TIME           :
SCAN_RATE                     :
SAMPLE_SIZE                   :
LAST_ANALYZED                 :
GLOBAL_STATS                  :
USER_STATS                    :
STATTYPE_LOCKED               :
STALE_STATS                   :
SCOPE                         : SHARED
??????????????????

So gather stats on the table

 
 SYS pp1oim1> exec DBMS_STATS.GATHER_TABLE_STATS('SYS','X$KTFBUE');

PL/SQL procedure successfully completed.

Elapsed: 00:00:17.24
 SYS pp1oim1> exec print_table('select * from dba_tab_statistics where  table_name=''X$KTFBUE'' ');
OOWNER                        : SYS
TABLE_NAME                    : X$KTFBUE
PARTITION_NAME                :
PARTITION_POSITION            :
SUBPARTITION_NAME             :
SUBPARTITION_POSITION         :
OBJECT_TYPE                   : FIXED TABLE
NUM_ROWS                      : 158077
BLOCKS                        :
EMPTY_BLOCKS                  :
AVG_SPACE                     :
CHAIN_CNT                     :
AVG_ROW_LEN                   : 66
AVG_SPACE_FREELIST_BLOCKS     :
NUM_FREELIST_BLOCKS           :
AVG_CACHED_BLOCKS             :
AVG_CACHE_HIT_RATIO           :
IM_IMCU_COUNT                 :
IM_BLOCK_COUNT                :
IM_STAT_UPDATE_TIME           :
SCAN_RATE                     :
SAMPLE_SIZE                   : 158077
LAST_ANALYZED                 : 09-sep-2019 10:32:52
GLOBAL_STATS                  : YES
USER_STATS                    : NO
STATTYPE_LOCKED               :
STALE_STATS                   :
SCOPE                         : SHARED
??????????????????

With the new statistics the query finishes in a few seconds.

ORA-01422: exact fetch returns more than requested number of rows

May 14, 2019 by Aaron

The other night on of my databases started throwing the following error.

RMAN> delete noprompt archivelog all backed up 1 times to 'sbt'; 
RMAN-00571: =========================================================== 
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS =============== 
RMAN-00571: =========================================================== 
RMAN-03002: failure of delete command at 05/03/2019 08:53:40 
RMAN-03014: implicit resync of recovery catalog failed 
RMAN-03009: failure of partial resync command on default channel at 05/03/2019 08:53:40 
ORA-01422: exact fetch returns more than requested number of rows

From the RMAN trace

DBGMISC: Trimming message: ORA-01422: exact fetch returns more than requested number of rows [09:01:19.157] (krmstrim) 
DBGMISC: ORA-06512: at "RMAN_NB_TP0LAWS.DBMS_RCVCAT", line 13092 (krmstrim) 
DBGMISC: ORA-06512: at line 3995 (krmstrim) 
DBGMISC: (146) (krmstrim)

The solution here is actually quite simple, the rout table, which stores previous rman output from previous runs so that it can displayed in OEM had excessive rows in it. Since we don’t really use OEM to track our rman backups, we can safely purge them.

SQL> truncate table rout ; 
SQL> truncate table rsr ;

This resolved our issue. I was able to resync the catalog with out any further errors.