Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Hackers > Re: the un-vacu...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 6 of 16 Topic 9559 of 11009
Post > Topic >>

Re: the un-vacuumable table

by andrew.george.hammond@[EMAIL PROTECTED] ("Andrew Hammond") Jun 30, 2008 at 07:20 PM

On Fri, Jun 27, 2008 at 8:14 PM, Tom Lane <tgl@[EMAIL PROTECTED]
> wrote:
> "Andrew Hammond" <andrew.george.hammond@[EMAIL PROTECTED]
> writes:
>> (I thought this line was interesting)
>> Jun 27 15:54:31 qadb2 postgres[92519]: [44-1] PANIC:  could not open
>> relation 1663/16386/679439393: No such file or directory
>
>> I googled to find out what the numbers 1663/16386/679439393 from the
>> PANIC message mean, but no luck.
>
> tablespaceOID/databaseOID/relfilenode.  Looks like just some random user
> table.  Not clear why this would be a crash, *especially* since WAL
> recovery is generally willing to create nonexistent files.  Is this
> reproducible?

Yes, both when I just tried to restart the recovery:

Jun 30 16:04:43 qadb2 postgres[20797]: [27-1] DEBUG:  invoking
IpcMemoryCreate(size=92938240)
Jun 30 16:04:43 qadb2 postgres[20797]: [28-1] DEBUG:  max_safe_fds =
983, usable_fds = 1000, already_open = 7
Jun 30 16:04:43 qadb2 postgres[20798]: [29-1] LOG:  database system
was interrupted while in recovery at 2008-06-27 15:54:31 PDT
Jun 30 16:04:43 qadb2 postgres[20798]: [29-2] HINT:  This probably
means that some data is corrupted and you will have to use the last
backup for recovery.
Jun 30 16:04:43 qadb2 postgres[20798]: [30-1] LOG:  starting archive
recovery
Jun 30 16:04:43 qadb2 postgres[20798]: [31-1] LOG:  restore_command =
"cp -p /usr/tmp/2008-06-25_wals/%f %p"
Jun 30 16:04:43 qadb2 postgres[20798]: [32-1] DEBUG:  executing
restore command "cp -p /usr/tmp/2008-06-25_wals/00000001.history
pg_xlog/RECOVERYHISTORY"
Jun 30 16:04:43 qadb2 postgres[20798]: [33-1] DEBUG:  could not
restore file "00000001.history" from archive: return code 256
Jun 30 16:04:43 qadb2 postgres[20798]: [34-1] DEBUG:  executing
restore command "cp -p
/usr/tmp/2008-06-25_wals/00000001000001D600000078.0055F0B8.backup
Jun 30 16:04:43 qadb2 postgres[20798]: [34-2]  pg_xlog/RECOVERYHISTORY"
Jun 30 16:04:43 qadb2 postgres[20798]: [35-1] LOG:  restored log file
"00000001000001D600000078.0055F0B8.backup" from archive
Jun 30 16:04:43 qadb2 postgres[20798]: [36-1] DEBUG:  executing
restore command "cp -p
/usr/tmp/2008-06-25_wals/00000001000001D600000078
pg_xlog/RECOVERYXLOG"
Jun 30 16:04:44 qadb2 postgres[20797]: [29-1] DEBUG:  forked new
backend, pid=20805 socket=8
Jun 30 16:04:44 qadb2 postgres[20805]: [29-1] LOG:  connection
received: host=[local]
Jun 30 16:04:44 qadb2 postgres[20805]: [30-1] FATAL:  the database
system is starting up
Jun 30 16:04:44 qadb2 postgres[20805]: [31-1] DEBUG:  proc_exit(0)
Jun 30 16:04:44 qadb2 postgres[20805]: [32-1] DEBUG:  shmem_exit(0)
Jun 30 16:04:44 qadb2 postgres[20805]: [33-1] DEBUG:  exit(0)
Jun 30 16:04:44 qadb2 postgres[20797]: [30-1] DEBUG:  reaping dead
processes
Jun 30 16:04:44 qadb2 postgres[20797]: [31-1] DEBUG:  server process
(PID 20805) exited with exit code 0
Jun 30 16:04:44 qadb2 postgres[20798]: [37-1] LOG:  restored log file
"00000001000001D600000078" from archive
Jun 30 16:04:44 qadb2 postgres[20798]: [38-1] LOG:  checkpoint record
is at 1D6/7855F0B8
Jun 30 16:04:44 qadb2 postgres[20798]: [39-1] LOG:  redo record is at
1D6/7855F0B8; undo record is at 0/0; shutdown FALSE
Jun 30 16:04:44 qadb2 postgres[20798]: [40-1] LOG:  next transaction
ID: 397171279; next OID: 679516596
Jun 30 16:04:44 qadb2 postgres[20798]: [41-1] LOG:  next MultiXactId:
857318; next MultiXactOffset: 1718141
Jun 30 16:04:44 qadb2 postgres[20798]: [42-1] LOG:  automatic recovery
in progress
Jun 30 16:04:44 qadb2 postgres[20798]: [43-1] LOG:  redo starts at
1D6/7855F108

Jun 30 16:04:45 qadb2 postgres[20798]: [44-1] PANIC:  could not open
relation 1663/16386/679439393: No such file or directory

Jun 30 16:04:45 qadb2 postgres[20797]: [32-1] DEBUG:  reaping dead
processes
Jun 30 16:04:45 qadb2 postgres[20797]: [33-1] LOG:  startup process
(PID 20798) was terminated by signal 6
Jun 30 16:04:45 qadb2 postgres[20797]: [34-1] LOG:  aborting startup
due to startup process failure
Jun 30 16:04:45 qadb2 postgres[20797]: [35-1] DEBUG:  proc_exit(1)
Jun 30 16:04:45 qadb2 postgres[20797]: [36-1] DEBUG:  shmem_exit(1)
Jun 30 16:04:45 qadb2 postgres[20797]: [37-1] DEBUG:  exit(1)



And also when I tried to wipe the slate clean and recover it freshly.

Jun 30 19:11:59 qadb2 postgres[23091]: [1-1] DEBUG:  postmaster:
PostmasterMain: initial environ dump:
Jun 30 19:11:59 qadb2 postgres[23091]: [2-1] DEBUG:
-----------------------------------------
Jun 30 19:11:59 qadb2 postgres[23091]: [3-1] DEBUG:     USER=pgsql
Jun 30 19:11:59 qadb2 postgres[23091]: [4-1] DEBUG:    
MAIL=/var/mail/pgsql
Jun 30 19:11:59 qadb2 postgres[23091]: [5-1] DEBUG:
LD_LIBRARY_PATH=:/usr/local/adecn/lib
Jun 30 19:11:59 qadb2 postgres[23091]: [6-1] DEBUG:    
HOME=/usr/local/pgsql
Jun 30 19:11:59 qadb2 postgres[23091]: [7-1] DEBUG:    
PGLIB=/usr/local/lib
Jun 30 19:11:59 qadb2 postgres[23091]: [8-1] DEBUG:     PS1=[QA2]
\u@[EMAIL PROTECTED]
 30 19:11:59 qadb2 postgres[23091]: [9-1] DEBUG:     BLOCKSIZE=K
Jun 30 19:11:59 qadb2 postgres[23091]: [10-1] DEBUG:    TERM=xterm
Jun 30 19:11:59 qadb2 postgres[23091]: [11-1] DEBUG:
PGSYSCONFDIR=/usr/local/etc/postgresql
Jun 30 19:11:59 qadb2 postgres[23091]: [12-1] DEBUG:
PGLOCALEDIR=/usr/local/share/locale
Jun 30 19:11:59 qadb2 postgres[23091]: [13-1] DEBUG:
Jun 30 19:11:59 qadb2 postgres[23091]: [13-2]
PATH=/usr/local/adecn/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin
Jun 30 19:11:59 qadb2 postgres[23091]: [13-3] :/usr/local/pgsql/bin

(snip junk about environment)

Jun 30 19:11:59 qadb2 postgres[23091]: [19-1] DEBUG:
PGDATA=/var/db/adecn/adecndb
Jun 30 19:11:59 qadb2 postgres[23091]: [20-1] DEBUG:    LC_COLLATE=C
Jun 30 19:11:59 qadb2 postgres[23091]: [21-1] DEBUG:    LC_CTYPE=C
Jun 30 19:11:59 qadb2 postgres[23091]: [22-1] DEBUG:    LC_MESSAGES=C
Jun 30 19:11:59 qadb2 postgres[23091]: [23-1] DEBUG:    LC_MONETARY=C
Jun 30 19:11:59 qadb2 postgres[23091]: [24-1] DEBUG:    LC_NUMERIC=C
Jun 30 19:11:59 qadb2 postgres[23091]: [25-1] DEBUG:    LC_TIME=C
Jun 30 19:11:59 qadb2 postgres[23091]: [26-1] DEBUG:
-----------------------------------------
Jun 30 19:11:59 qadb2 postgres[23092]: [27-1] DEBUG:  invoking
IpcMemoryCreate(size=92938240)
Jun 30 19:11:59 qadb2 postgres[23092]: [28-1] DEBUG:  max_safe_fds =
983, usable_fds = 1000, already_open = 7
Jun 30 19:11:59 qadb2 postgres[23093]: [29-1] LOG:  database system
was interrupted at 2008-06-25 03:01:02 PDT
Jun 30 19:11:59 qadb2 postgres[23093]: [30-1] LOG:  starting archive
recovery
Jun 30 19:11:59 qadb2 postgres[23093]: [31-1] LOG:  restore_command =
"cp -p /usr/tmp/2008-06-25_wals/%f %p"
Jun 30 19:11:59 qadb2 postgres[23093]: [32-1] DEBUG:  executing
restore command "cp -p /usr/tmp/2008-06-25_wals/00000001.history
pg_xlog/RECOVERYHISTORY"
Jun 30 19:11:59 qadb2 postgres[23093]: [33-1] DEBUG:  could not
restore file "00000001.history" from archive: return code 256
Jun 30 19:11:59 qadb2 postgres[23093]: [34-1] DEBUG:  executing
restore command "cp -p
/usr/tmp/2008-06-25_wals/00000001000001D600000078.0055F0B8.backup
Jun 30 19:11:59 qadb2 postgres[23093]: [34-2]  pg_xlog/RECOVERYHISTORY"
Jun 30 19:11:59 qadb2 postgres[23093]: [35-1] LOG:  restored log file
"00000001000001D600000078.0055F0B8.backup" from archive
Jun 30 19:11:59 qadb2 postgres[23093]: [36-1] DEBUG:  executing
restore command "cp -p
/usr/tmp/2008-06-25_wals/00000001000001D600000078
pg_xlog/RECOVERYXLOG"
Jun 30 19:12:00 qadb2 postgres[23093]: [37-1] LOG:  restored log file
"00000001000001D600000078" from archive
Jun 30 19:12:00 qadb2 postgres[23093]: [38-1] LOG:  checkpoint record
is at 1D6/7855F0B8
Jun 30 19:12:00 qadb2 postgres[23093]: [39-1] LOG:  redo record is at
1D6/7855F0B8; undo record is at 0/0; shutdown FALSE
Jun 30 19:12:00 qadb2 postgres[23093]: [40-1] LOG:  next transaction
ID: 397171279; next OID: 679516596
Jun 30 19:12:00 qadb2 postgres[23093]: [41-1] LOG:  next MultiXactId:
857318; next MultiXactOffset: 1718141
Jun 30 19:12:00 qadb2 postgres[23093]: [42-1] LOG:  automatic recovery
in progress
Jun 30 19:12:00 qadb2 postgres[23093]: [43-1] LOG:  redo starts at
1D6/7855F108

Jun 30 19:12:00 qadb2 postgres[23093]: [44-1] PANIC:  could not open
relation 1663/16386/679439393: No such file or directory

Jun 30 19:12:00 qadb2 postgres[23092]: [29-1] DEBUG:  reaping dead
processes
Jun 30 19:12:00 qadb2 postgres[23092]: [30-1] LOG:  startup process
(PID 23093) was terminated by signal 6
Jun 30 19:12:00 qadb2 postgres[23092]: [31-1] LOG:  aborting startup
due to startup process failure
Jun 30 19:12:00 qadb2 postgres[23092]: [32-1] DEBUG:  proc_exit(1)
Jun 30 19:12:00 qadb2 postgres[23092]: [33-1] DEBUG:  shmem_exit(1)
Jun 30 19:12:00 qadb2 postgres[23092]: [34-1] DEBUG:  exit(1)

So yes, I believe this is entirely repeatable.

>> (On Thursday night)
>> vacuumdb: vacuuming of database "adecndb" failed: ERROR:  could not
>> write block 209610 of relation 1663/16386/236356665: No space left on
>> device
>> CONTEXT:  writing block 209610 of relation 1663/16386/236356665
>
> That's pretty frickin' odd as well, because as a rule we make sure that
> backing store exists for each table page before we open it up for
> normal writing.  Do you have a way to find out what relation
> 1663/16386/236356665 is?

1663 = pg_default tablespace
16386 = adecndb, the only database (aside from postgres, template1 & 0
on this backend).

SELECT relname FROM pg_class WHERE relfilenode = 236356665; -- got nothing

But this is on the production database where a vacuum has already
succeeded.

> What filesystem is this database sitting on?

FreeBSD 6.2 ufs. Here are the options from /etc/fstab.

/dev/da1                /xraid          ufs     rw,noatime      2       2

>> Now, the first message is very strange since we have monitoring on the
>> file system used by the database and it's been hovering at about 18%
>> space used for the last month. So I can't figure out why we'd get "No
>> space left on device", assuming the device is actually the disk (which
>> seems reasonable given the context) and not shared memory.
>
> Yeah, this is definitely a case of ENOSPC being returned by write(),
> If you're sure the disk wasn't out of space, maybe some per-user quota
> was getting in the way?

Monitoring is using df to determine space available and runs every 5
minutes. It will alarm at 70% usage and critical at 80% usage. We
received no alarms or other notifications. We do not yet have tracking
in place to story the history of this, so I can't say definitively
that we didn't spike over that usage level. However we are only at 18%
currently, so it seems unlikely. Nothing else uses this disk.

I checked /etc/rc.conf and did not find quotas turned on. Neither are
they enabled in the fstab. I confirmed using
quota -v pgsql
It says "none".

Andrew

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
 




 16 Posts in Topic:
the un-vacuumable table
andrew.george.hammond@[EM  2008-06-25 00:40:35 
Re: the un-vacuumable table
heikki@[EMAIL PROTECTED]   2008-06-25 12:58:04 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-06-25 09:57:55 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-06-27 16:31:05 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-06-27 23:14:36 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-06-30 19:20:50 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-03 13:42:30 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-07-03 17:35:23 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-03 15:16:32 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-07-03 18:47:51 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-03 22:57:36 
Re: the un-vacuumable table
stark@[EMAIL PROTECTED]   2008-07-04 09:20:59 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-07 12:00:12 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-07-07 15:33:10 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-07 14:08:03 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-07 15:04:47 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Fri Dec 5 8:52:30 CST 2008.