Talk About Network

Google


Register and Login
Nick
Password
Register create new account Sign up is FREE and you can post replies, new topics, bookmark posts and more!
Recover lost password


Data Bases > Pgsql Hackers > Re: the un-vacu...
Latest [ Topics | Posts ] Archive Post A New Topic Post a Reply
<< Topic < Post Post 7 of 16 Topic 9559 of 11009
Post > Topic >>

Re: the un-vacuumable table

by andrew.george.hammond@[EMAIL PROTECTED] ("Andrew Hammond") Jul 3, 2008 at 01:42 PM

Does anyone else have any suggestions about what I can do to diagnose this?

Do I need to re-initdb or can I reasonably keep running with the existing
db?

A

On Mon, Jun 30, 2008 at 7:20 PM, Andrew Hammond
<andrew.george.hammond@[EMAIL PROTECTED]
> wrote:
> On Fri, Jun 27, 2008 at 8:14 PM, Tom Lane <tgl@[EMAIL PROTECTED]
> wrote:
>> "Andrew Hammond" <andrew.george.hammond@[EMAIL PROTECTED]
> writes:
>>> (I thought this line was interesting)
>>> Jun 27 15:54:31 qadb2 postgres[92519]: [44-1] PANIC:  could not open
>>> relation 1663/16386/679439393: No such file or directory
>>
>>> I googled to find out what the numbers 1663/16386/679439393 from the
>>> PANIC message mean, but no luck.
>>
>> tablespaceOID/databaseOID/relfilenode.  Looks like just some random
user
>> table.  Not clear why this would be a crash, *especially* since WAL
>> recovery is generally willing to create nonexistent files.  Is this
>> reproducible?
>
> Yes, both when I just tried to restart the recovery:
>
> Jun 30 16:04:43 qadb2 postgres[20797]: [27-1] DEBUG:  invoking
> IpcMemoryCreate(size=92938240)
> Jun 30 16:04:43 qadb2 postgres[20797]: [28-1] DEBUG:  max_safe_fds =
> 983, usable_fds = 1000, already_open = 7
> Jun 30 16:04:43 qadb2 postgres[20798]: [29-1] LOG:  database system
> was interrupted while in recovery at 2008-06-27 15:54:31 PDT
> Jun 30 16:04:43 qadb2 postgres[20798]: [29-2] HINT:  This probably
> means that some data is corrupted and you will have to use the last
> backup for recovery.
> Jun 30 16:04:43 qadb2 postgres[20798]: [30-1] LOG:  starting archive
recovery
> Jun 30 16:04:43 qadb2 postgres[20798]: [31-1] LOG:  restore_command =
> "cp -p /usr/tmp/2008-06-25_wals/%f %p"
> Jun 30 16:04:43 qadb2 postgres[20798]: [32-1] DEBUG:  executing
> restore command "cp -p /usr/tmp/2008-06-25_wals/00000001.history
> pg_xlog/RECOVERYHISTORY"
> Jun 30 16:04:43 qadb2 postgres[20798]: [33-1] DEBUG:  could not
> restore file "00000001.history" from archive: return code 256
> Jun 30 16:04:43 qadb2 postgres[20798]: [34-1] DEBUG:  executing
> restore command "cp -p
> /usr/tmp/2008-06-25_wals/00000001000001D600000078.0055F0B8.backup
> Jun 30 16:04:43 qadb2 postgres[20798]: [34-2]  pg_xlog/RECOVERYHISTORY"
> Jun 30 16:04:43 qadb2 postgres[20798]: [35-1] LOG:  restored log file
> "00000001000001D600000078.0055F0B8.backup" from archive
> Jun 30 16:04:43 qadb2 postgres[20798]: [36-1] DEBUG:  executing
> restore command "cp -p
> /usr/tmp/2008-06-25_wals/00000001000001D600000078
> pg_xlog/RECOVERYXLOG"
> Jun 30 16:04:44 qadb2 postgres[20797]: [29-1] DEBUG:  forked new
> backend, pid=20805 socket=8
> Jun 30 16:04:44 qadb2 postgres[20805]: [29-1] LOG:  connection
> received: host=[local]
> Jun 30 16:04:44 qadb2 postgres[20805]: [30-1] FATAL:  the database
> system is starting up
> Jun 30 16:04:44 qadb2 postgres[20805]: [31-1] DEBUG:  proc_exit(0)
> Jun 30 16:04:44 qadb2 postgres[20805]: [32-1] DEBUG:  shmem_exit(0)
> Jun 30 16:04:44 qadb2 postgres[20805]: [33-1] DEBUG:  exit(0)
> Jun 30 16:04:44 qadb2 postgres[20797]: [30-1] DEBUG:  reaping dead
processes
> Jun 30 16:04:44 qadb2 postgres[20797]: [31-1] DEBUG:  server process
> (PID 20805) exited with exit code 0
> Jun 30 16:04:44 qadb2 postgres[20798]: [37-1] LOG:  restored log file
> "00000001000001D600000078" from archive
> Jun 30 16:04:44 qadb2 postgres[20798]: [38-1] LOG:  checkpoint record
> is at 1D6/7855F0B8
> Jun 30 16:04:44 qadb2 postgres[20798]: [39-1] LOG:  redo record is at
> 1D6/7855F0B8; undo record is at 0/0; shutdown FALSE
> Jun 30 16:04:44 qadb2 postgres[20798]: [40-1] LOG:  next transaction
> ID: 397171279; next OID: 679516596
> Jun 30 16:04:44 qadb2 postgres[20798]: [41-1] LOG:  next MultiXactId:
> 857318; next MultiXactOffset: 1718141
> Jun 30 16:04:44 qadb2 postgres[20798]: [42-1] LOG:  automatic recovery
> in progress
> Jun 30 16:04:44 qadb2 postgres[20798]: [43-1] LOG:  redo starts at
1D6/7855F108
>
> Jun 30 16:04:45 qadb2 postgres[20798]: [44-1] PANIC:  could not open
> relation 1663/16386/679439393: No such file or directory
>
> Jun 30 16:04:45 qadb2 postgres[20797]: [32-1] DEBUG:  reaping dead
processes
> Jun 30 16:04:45 qadb2 postgres[20797]: [33-1] LOG:  startup process
> (PID 20798) was terminated by signal 6
> Jun 30 16:04:45 qadb2 postgres[20797]: [34-1] LOG:  aborting startup
> due to startup process failure
> Jun 30 16:04:45 qadb2 postgres[20797]: [35-1] DEBUG:  proc_exit(1)
> Jun 30 16:04:45 qadb2 postgres[20797]: [36-1] DEBUG:  shmem_exit(1)
> Jun 30 16:04:45 qadb2 postgres[20797]: [37-1] DEBUG:  exit(1)
>
>
>
> And also when I tried to wipe the slate clean and recover it freshly.
>
> Jun 30 19:11:59 qadb2 postgres[23091]: [1-1] DEBUG:  postmaster:
> PostmasterMain: initial environ dump:
> Jun 30 19:11:59 qadb2 postgres[23091]: [2-1] DEBUG:
> -----------------------------------------
> Jun 30 19:11:59 qadb2 postgres[23091]: [3-1] DEBUG:     USER=pgsql
> Jun 30 19:11:59 qadb2 postgres[23091]: [4-1] DEBUG:    
MAIL=/var/mail/pgsql
> Jun 30 19:11:59 qadb2 postgres[23091]: [5-1] DEBUG:
> LD_LIBRARY_PATH=:/usr/local/adecn/lib
> Jun 30 19:11:59 qadb2 postgres[23091]: [6-1] DEBUG:    
HOME=/usr/local/pgsql
> Jun 30 19:11:59 qadb2 postgres[23091]: [7-1] DEBUG:    
PGLIB=/usr/local/lib
> Jun 30 19:11:59 qadb2 postgres[23091]: [8-1] DEBUG:     PS1=[QA2]
\u@[EMAIL PROTECTED]
> Jun 30 19:11:59 qadb2 postgres[23091]: [9-1] DEBUG:     BLOCKSIZE=K
> Jun 30 19:11:59 qadb2 postgres[23091]: [10-1] DEBUG:    TERM=xterm
> Jun 30 19:11:59 qadb2 postgres[23091]: [11-1] DEBUG:
> PGSYSCONFDIR=/usr/local/etc/postgresql
> Jun 30 19:11:59 qadb2 postgres[23091]: [12-1] DEBUG:
> PGLOCALEDIR=/usr/local/share/locale
> Jun 30 19:11:59 qadb2 postgres[23091]: [13-1] DEBUG:
> Jun 30 19:11:59 qadb2 postgres[23091]: [13-2]
>
PATH=/usr/local/adecn/bin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/games:/usr/local/sbin:/usr/local/bin:/usr/X11R6/bin
> Jun 30 19:11:59 qadb2 postgres[23091]: [13-3] :/usr/local/pgsql/bin
>
> (snip junk about environment)
>
> Jun 30 19:11:59 qadb2 postgres[23091]: [19-1] DEBUG:
> PGDATA=/var/db/adecn/adecndb
> Jun 30 19:11:59 qadb2 postgres[23091]: [20-1] DEBUG:    LC_COLLATE=C
> Jun 30 19:11:59 qadb2 postgres[23091]: [21-1] DEBUG:    LC_CTYPE=C
> Jun 30 19:11:59 qadb2 postgres[23091]: [22-1] DEBUG:    LC_MESSAGES=C
> Jun 30 19:11:59 qadb2 postgres[23091]: [23-1] DEBUG:    LC_MONETARY=C
> Jun 30 19:11:59 qadb2 postgres[23091]: [24-1] DEBUG:    LC_NUMERIC=C
> Jun 30 19:11:59 qadb2 postgres[23091]: [25-1] DEBUG:    LC_TIME=C
> Jun 30 19:11:59 qadb2 postgres[23091]: [26-1] DEBUG:
> -----------------------------------------
> Jun 30 19:11:59 qadb2 postgres[23092]: [27-1] DEBUG:  invoking
> IpcMemoryCreate(size=92938240)
> Jun 30 19:11:59 qadb2 postgres[23092]: [28-1] DEBUG:  max_safe_fds =
> 983, usable_fds = 1000, already_open = 7
> Jun 30 19:11:59 qadb2 postgres[23093]: [29-1] LOG:  database system
> was interrupted at 2008-06-25 03:01:02 PDT
> Jun 30 19:11:59 qadb2 postgres[23093]: [30-1] LOG:  starting archive
recovery
> Jun 30 19:11:59 qadb2 postgres[23093]: [31-1] LOG:  restore_command =
> "cp -p /usr/tmp/2008-06-25_wals/%f %p"
> Jun 30 19:11:59 qadb2 postgres[23093]: [32-1] DEBUG:  executing
> restore command "cp -p /usr/tmp/2008-06-25_wals/00000001.history
> pg_xlog/RECOVERYHISTORY"
> Jun 30 19:11:59 qadb2 postgres[23093]: [33-1] DEBUG:  could not
> restore file "00000001.history" from archive: return code 256
> Jun 30 19:11:59 qadb2 postgres[23093]: [34-1] DEBUG:  executing
> restore command "cp -p
> /usr/tmp/2008-06-25_wals/00000001000001D600000078.0055F0B8.backup
> Jun 30 19:11:59 qadb2 postgres[23093]: [34-2]  pg_xlog/RECOVERYHISTORY"
> Jun 30 19:11:59 qadb2 postgres[23093]: [35-1] LOG:  restored log file
> "00000001000001D600000078.0055F0B8.backup" from archive
> Jun 30 19:11:59 qadb2 postgres[23093]: [36-1] DEBUG:  executing
> restore command "cp -p
> /usr/tmp/2008-06-25_wals/00000001000001D600000078
> pg_xlog/RECOVERYXLOG"
> Jun 30 19:12:00 qadb2 postgres[23093]: [37-1] LOG:  restored log file
> "00000001000001D600000078" from archive
> Jun 30 19:12:00 qadb2 postgres[23093]: [38-1] LOG:  checkpoint record
> is at 1D6/7855F0B8
> Jun 30 19:12:00 qadb2 postgres[23093]: [39-1] LOG:  redo record is at
> 1D6/7855F0B8; undo record is at 0/0; shutdown FALSE
> Jun 30 19:12:00 qadb2 postgres[23093]: [40-1] LOG:  next transaction
> ID: 397171279; next OID: 679516596
> Jun 30 19:12:00 qadb2 postgres[23093]: [41-1] LOG:  next MultiXactId:
> 857318; next MultiXactOffset: 1718141
> Jun 30 19:12:00 qadb2 postgres[23093]: [42-1] LOG:  automatic recovery
> in progress
> Jun 30 19:12:00 qadb2 postgres[23093]: [43-1] LOG:  redo starts at
1D6/7855F108
>
> Jun 30 19:12:00 qadb2 postgres[23093]: [44-1] PANIC:  could not open
> relation 1663/16386/679439393: No such file or directory
>
> Jun 30 19:12:00 qadb2 postgres[23092]: [29-1] DEBUG:  reaping dead
processes
> Jun 30 19:12:00 qadb2 postgres[23092]: [30-1] LOG:  startup process
> (PID 23093) was terminated by signal 6
> Jun 30 19:12:00 qadb2 postgres[23092]: [31-1] LOG:  aborting startup
> due to startup process failure
> Jun 30 19:12:00 qadb2 postgres[23092]: [32-1] DEBUG:  proc_exit(1)
> Jun 30 19:12:00 qadb2 postgres[23092]: [33-1] DEBUG:  shmem_exit(1)
> Jun 30 19:12:00 qadb2 postgres[23092]: [34-1] DEBUG:  exit(1)
>
> So yes, I believe this is entirely repeatable.
>
>>> (On Thursday night)
>>> vacuumdb: vacuuming of database "adecndb" failed: ERROR:  could not
>>> write block 209610 of relation 1663/16386/236356665: No space left on
>>> device
>>> CONTEXT:  writing block 209610 of relation 1663/16386/236356665
>>
>> That's pretty frickin' odd as well, because as a rule we make sure that
>> backing store exists for each table page before we open it up for
>> normal writing.  Do you have a way to find out what relation
>> 1663/16386/236356665 is?
>
> 1663 = pg_default tablespace
> 16386 = adecndb, the only database (aside from postgres, template1 & 0
> on this backend).
>
> SELECT relname FROM pg_class WHERE relfilenode = 236356665; -- got
nothing
>
> But this is on the production database where a vacuum has already
succeeded.
>
>> What filesystem is this database sitting on?
>
> FreeBSD 6.2 ufs. Here are the options from /etc/fstab.
>
> /dev/da1                /xraid          ufs     rw,noatime      2      
2
>
>>> Now, the first message is very strange since we have monitoring on the
>>> file system used by the database and it's been hovering at about 18%
>>> space used for the last month. So I can't figure out why we'd get "No
>>> space left on device", assuming the device is actually the disk (which
>>> seems reasonable given the context) and not shared memory.
>>
>> Yeah, this is definitely a case of ENOSPC being returned by write(),
>> If you're sure the disk wasn't out of space, maybe some per-user quota
>> was getting in the way?
>
> Monitoring is using df to determine space available and runs every 5
> minutes. It will alarm at 70% usage and critical at 80% usage. We
> received no alarms or other notifications. We do not yet have tracking
> in place to story the history of this, so I can't say definitively
> that we didn't spike over that usage level. However we are only at 18%
> currently, so it seems unlikely. Nothing else uses this disk.
>
> I checked /etc/rc.conf and did not find quotas turned on. Neither are
> they enabled in the fstab. I confirmed using
> quota -v pgsql
> It says "none".
>
> Andrew
>

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@[EMAIL PROTECTED]
)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
 




 16 Posts in Topic:
the un-vacuumable table
andrew.george.hammond@[EM  2008-06-25 00:40:35 
Re: the un-vacuumable table
heikki@[EMAIL PROTECTED]   2008-06-25 12:58:04 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-06-25 09:57:55 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-06-27 16:31:05 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-06-27 23:14:36 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-06-30 19:20:50 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-03 13:42:30 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-07-03 17:35:23 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-03 15:16:32 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-07-03 18:47:51 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-03 22:57:36 
Re: the un-vacuumable table
stark@[EMAIL PROTECTED]   2008-07-04 09:20:59 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-07 12:00:12 
Re: the un-vacuumable table
tgl@[EMAIL PROTECTED] (T  2008-07-07 15:33:10 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-07 14:08:03 
Re: the un-vacuumable table
andrew.george.hammond@[EM  2008-07-07 15:04:47 

Post A Reply:
  Go here to Signup

AddThis Feed Button


About - Advertising - Contact - Frequently Asked Questions - Privacy Policy - Terms of Use - Signup

Contact
tan12V112 Fri Dec 5 8:53:20 CST 2008.