This is a multi-part message in MIME format.
--Boundary_(ID_F3d5i5myhqrMBN5f910PBQ)
Content-type: text/plain; format=flowed; charset=ISO-8859-2
Content-transfer-encoding: 7BIT
I attach a patch which removes support for nonsegmented mode. It was
discussed during the last commit fest. Nonsegmented mode is usable on only
a couple of filesystems (ZFS, XFS), and it is not safe on any OS because
each OS supports other filesystems as well.
I added a --with-segsize option (RELSEG_SIZE) to the configure script so
that it is easy to compile with a different segment size (on most
filesystems 1 TB is a safe value). As a bonus, I also added a
--with-blocksize option (BLCKSZ) to the configure script. It is not
important for this patch, but it could be useful, e.g. for buildfarm
testing with different BLCKSZ values.
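To make the arithmetic behind the two options concrete, here is a minimal sketch in C. The function names are hypothetical and are not part of the patch. Only the formulas follow it: RELSEG_SIZE is the segment size in bytes divided by BLCKSZ, and the seek position inside a segment is BLCKSZ * (blocknum % RELSEG_SIZE). The segment index as an integer quotient is an assumption here, since that line is not part of the diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; they are not in the patch.
 * segsize_gb and blcksz stand for the values chosen with --with-segsize
 * and --with-blocksize (the latter already converted to bytes).
 */

/* Blocks per segment file: (segsize_gb GB) / blcksz, like the RELSEG_SIZE define. */
uint32_t relseg_size_blocks(uint32_t segsize_gb, uint32_t blcksz)
{
    assert(segsize_gb > 0 && blcksz > 0);
    return (uint32_t) ((1024LL * 1024 * 1024 * segsize_gb) / blcksz);
}

/* Index of the segment file holding blocknum (assumed: integer quotient). */
uint32_t segment_of_block(uint32_t blocknum, uint32_t relseg_size)
{
    return blocknum / relseg_size;
}

/* Byte offset inside that segment: BLCKSZ * (blocknum % RELSEG_SIZE). */
int64_t seekpos_of_block(uint32_t blocknum, uint32_t relseg_size, uint32_t blcksz)
{
    return (int64_t) blcksz * (blocknum % relseg_size);
}
```

With the defaults (1 GB segments, 8192-byte blocks) a segment holds 131072 blocks, so block 131072 is the first block of segment 1 and sits at offset 0 in that file.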
The patch requires running autoconf and autoheader.
Zdenek
PS: --with-segsize=1/1024 sets the segment size to 1 MB, which is good for testing.
--Boundary_(ID_F3d5i5myhqrMBN5f910PBQ)
Content-type: text/x-patch; name=seg.patch
Content-transfer-encoding: 7BIT
Content-disposition: inline; filename=seg.patch
Index: configure.in
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/configure.in,v
retrieving revision 1.555
diff -c -r1.555 configure.in
*** configure.in 30 Mar 2008 04:08:14 -0000 1.555
--- configure.in 21 Apr 2008 15:19:59 -0000
***************
*** 220,233 ****
#
# Data file segmentation
#
! PGAC_ARG_BOOL(enable, segmented-files, yes,
! [ --disable-segmented-files disable data file segmentation (requires largefile support)])
#
# C compiler
#
# For historical reasons you can also use --with-CC to specify the C
compiler
# to use, although the standard way to do this is to set the CC
environment
# variable.
PGAC_ARG_REQ(with, CC, [], [CC=$with_CC])
--- 220,287 ----
#
# Data file segmentation
#
! AC_MSG_CHECKING([for default relation segment size])
! PGAC_ARG_REQ(with, segsize, [ --with-segsize=RELSEG_SIZE change default relation segment size in GB [[1]]],
! [default_segsize=$withval],
! [default_segsize=1])
! AC_MSG_RESULT([${default_segsize}GB])
! AC_DEFINE_UNQUOTED([RELSEG_SIZE], 1024*1024*1024LL*${default_segsize}/BLCKSZ, [
! RELSEG_SIZE is the maximum number of blocks allowed in one disk
! file. Thus, the maximum size of a single file is RELSEG_SIZE * BLCKSZ;
! relations bigger than that are divided into multiple files.
!
! RELSEG_SIZE * BLCKSZ must be less than your OS' limit on file size.
! This is often 2 GB or 4GB in a 32-bit operating system, unless you
! have large file support enabled. By default, we make the limit 1
! GB to avoid any possible integer-overflow problems within the OS.
! A limit smaller than necessary only means we divide a large
! relation into more chunks than necessary, so it seems best to err
! in the direction of a small limit. (Besides, a power-of-2 value
! saves a few cycles in md.c.)
+ Changing RELSEG_SIZE requires an initdb.
+ ])
+ AC_SUBST(default_segsize)
+
+ #
+ # Block size
#
+ AC_MSG_CHECKING([for default block size])
+ PGAC_ARG_REQ(with, blocksize, [ --with-blocksize=BLCKSZ change default block size (1,2,4,8,16,32 are allowed values). [[8]]],
+ [default_blocksize=$withval],
+ [default_blocksize=8])
+ case ${default_blocksize} in
+ 1) default_blocksize=1024;;
+ 2) default_blocksize=2048;;
+ 4) default_blocksize=4096;;
+ 8) default_blocksize=8192;;
+ 16) default_blocksize=16384;;
+ 32) default_blocksize=32768;;
+ *) AC_MSG_ERROR([Invalid block size. Allowed values are 1,2,4,8,16,32.])
+ esac
+
+ AC_MSG_RESULT([${default_blocksize}B])
+ AC_DEFINE_UNQUOTED([BLCKSZ], ${default_blocksize}, [
+ Size of a disk block --- this also limits the size of a tuple. You
+ can set it bigger if you need bigger tuples (although TOAST should
+ reduce the need to have large tuples, since fields can be spread
+ across multiple tuples).
+
+ BLCKSZ must be a power of 2. The maximum possible value of BLCKSZ
+ is currently 2^15 (32768). This is determined by the 15-bit widths
+ of the lp_off and lp_len fields in ItemIdData (see
+ include/storage/itemid.h).
+
+ Changing BLCKSZ requires an initdb.
+ ])
+ AC_SUBST(default_blocksize)
+
+
# C compiler
#
# For historical reasons you can also use --with-CC to specify the C
compiler
+
# to use, although the standard way to do this is to set the CC
environment
# variable.
PGAC_ARG_REQ(with, CC, [], [CC=$with_CC])
***************
*** 1435,1443 ****
# Check for largefile support (must be after AC_SYS_LARGEFILE)
AC_CHECK_SIZEOF([off_t])
!
! if test "$ac_cv_sizeof_off_t" -lt 8 -o "$enable_segmented_files" = "yes"; then
! AC_DEFINE([USE_SEGMENTED_FILES], 1, [Define to split data files into 1GB segments.])
fi
# SunOS doesn't handle negative byte comparisons properly with +/-
return
--- 1489,1496 ----
# Check for largefile support (must be after AC_SYS_LARGEFILE)
AC_CHECK_SIZEOF([off_t])
! if test "$ac_cv_sizeof_off_t" -lt 8 -a "$default_segsize" != "1"; then
! AC_MSG_ERROR([Large file support is not enabled. Segment size cannot be larger than 1GB.])
fi
# SunOS doesn't handle negative byte comparisons properly with +/-
return
Index: src/backend/storage/file/buffile.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/file/buffile.c,v
retrieving revision 1.30
diff -c -r1.30 buffile.c
*** src/backend/storage/file/buffile.c 10 Mar 2008 20:06:27 -0000 1.30
--- src/backend/storage/file/buffile.c 18 Apr 2008 08:13:45 -0000
***************
*** 38,45 ****
#include "storage/buffile.h"
/*
! * We break BufFiles into gigabyte-sized segments, whether or not
! * USE_SEGMENTED_FILES is defined. The reason is that we'd like large
* temporary BufFiles to be spread across multiple tablespaces when available.
*/
#define MAX_PHYSICAL_FILESIZE 0x40000000
--- 38,44 ----
#include "storage/buffile.h"
/*
! * We break BufFiles into gigabyte-sized segments. The reason is that we'd like large
* temporary BufFiles to be spread across multiple tablespaces when available.
*/
#define MAX_PHYSICAL_FILESIZE 0x40000000
Index: src/backend/storage/smgr/md.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/smgr/md.c,v
retrieving revision 1.137
diff -c -r1.137 md.c
*** src/backend/storage/smgr/md.c 18 Apr 2008 06:48:38 -0000 1.137
--- src/backend/storage/smgr/md.c 18 Apr 2008 08:12:02 -0000
***************
*** 89,106 ****
*
* All MdfdVec objects are palloc'd in the MdCxt memory context.
*
- * On platforms that support large files, USE_SEGMENTED_FILES can be
- * #undef'd to disable the segmentation logic. In that case each
- * relation is a single operating-system file.
*/
typedef struct _MdfdVec
{
File mdfd_vfd; /* fd number in fd.c's pool */
BlockNumber mdfd_segno; /* segment number, from 0 */
- #ifdef USE_SEGMENTED_FILES
struct _MdfdVec *mdfd_chain; /* next segment, or NULL */
- #endif
} MdfdVec;
static MemoryContext MdCxt; /* context for all md.c allocations */
--- 89,101 ----
***************
*** 162,171 ****
static void register_unlink(RelFileNode rnode);
static MdfdVec *_fdvec_alloc(void);
- #ifdef USE_SEGMENTED_FILES
static MdfdVec *_mdfd_openseg(SMgrRelation reln, BlockNumber segno,
int oflags);
- #endif
static MdfdVec *_mdfd_getseg(SMgrRelation reln, BlockNumber blkno,
bool isTemp, ExtensionBehavior behavior);
static BlockNumber _mdnblocks(SMgrRelation reln, MdfdVec *seg);
--- 157,164 ----
***************
*** 258,266 ****
reln->md_fd->mdfd_vfd = fd;
reln->md_fd->mdfd_segno = 0;
- #ifdef USE_SEGMENTED_FILES
reln->md_fd->mdfd_chain = NULL;
- #endif
}
/*
--- 251,257 ----
***************
*** 344,350 ****
rnode.relNode)));
}
- #ifdef USE_SEGMENTED_FILES
/* Delete the additional segments, if any */
else
{
--- 335,340 ----
***************
*** 374,380 ****
}
pfree(segpath);
}
- #endif
pfree(path);
--- 364,369 ----
***************
*** 420,431 ****
v = _mdfd_getseg(reln, blocknum, isTemp, EXTENSION_CREATE);
- #ifdef USE_SEGMENTED_FILES
seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
- #else
- seekpos = (off_t) BLCKSZ * blocknum;
- #endif
/*
* Note: because caller usually obtained blocknum by calling mdnblocks,
--- 409,416 ----
***************
*** 469,477 ****
if (!isTemp)
register_dirty_segment(reln, v);
- #ifdef USE_SEGMENTED_FILES
Assert(_mdnblocks(reln, v) <= ((BlockNumber) RELSEG_SIZE));
- #endif
}
/*
--- 454,460 ----
***************
*** 530,539 ****
mdfd->mdfd_vfd = fd;
mdfd->mdfd_segno = 0;
- #ifdef USE_SEGMENTED_FILES
mdfd->mdfd_chain = NULL;
Assert(_mdnblocks(reln, mdfd) <= ((BlockNumber) RELSEG_SIZE));
- #endif
return mdfd;
}
--- 513,520 ----
***************
*** 552,558 ****
reln->md_fd = NULL; /* prevent dangling pointer after error */
- #ifdef USE_SEGMENTED_FILES
while (v != NULL)
{
MdfdVec *ov = v;
--- 533,538 ----
***************
*** 564,574 ****
v = v->mdfd_chain;
pfree(ov);
}
- #else
- if (v->mdfd_vfd >= 0)
- FileClose(v->mdfd_vfd);
- pfree(v);
- #endif
}
/*
--- 544,549 ----
***************
*** 583,594 ****
v = _mdfd_getseg(reln, blocknum, false, EXTENSION_FAIL);
- #ifdef USE_SEGMENTED_FILES
seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
- #else
- seekpos = (off_t) BLCKSZ * blocknum;
- #endif
if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
ereport(ERROR,
--- 558,565 ----
***************
*** 653,664 ****
v = _mdfd_getseg(reln, blocknum, isTemp, EXTENSION_FAIL);
- #ifdef USE_SEGMENTED_FILES
seekpos = (off_t) BLCKSZ * (blocknum % ((BlockNumber) RELSEG_SIZE));
Assert(seekpos < (off_t) BLCKSZ * RELSEG_SIZE);
- #else
- seekpos = (off_t) BLCKSZ * blocknum;
- #endif
if (FileSeek(v->mdfd_vfd, seekpos, SEEK_SET) != seekpos)
ereport(ERROR,
--- 624,631 ----
***************
*** 708,714 ****
{
MdfdVec *v = mdopen(reln, EXTENSION_FAIL);
- #ifdef USE_SEGMENTED_FILES
BlockNumber nblocks;
BlockNumber segno = 0;
--- 675,680 ----
***************
*** 764,772 ****
v = v->mdfd_chain;
}
- #else
- return _mdnblocks(reln, v);
- #endif
}
/*
--- 730,735 ----
***************
*** 777,786 ****
{
MdfdVec *v;
BlockNumber curnblk;
-
- #ifdef USE_SEGMENTED_FILES
BlockNumber priorblocks;
- #endif
/*
* NOTE: mdnblocks makes sure we have opened all active segments, so
that
--- 740,746 ----
***************
*** 804,810 ****
v = mdopen(reln, EXTENSION_FAIL);
- #ifdef USE_SEGMENTED_FILES
priorblocks = 0;
while (v != NULL)
{
--- 764,769 ----
***************
*** 866,884 ****
}
priorblocks += RELSEG_SIZE;
}
- #else
- /* For unsegmented files, it's a lot easier */
- if (FileTruncate(v->mdfd_vfd, (off_t) nblocks * BLCKSZ) < 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not truncate relation %u/%u/%u to %u blocks: %m",
- reln->smgr_rnode.spcNode,
- reln->smgr_rnode.dbNode,
- reln->smgr_rnode.relNode,
- nblocks)));
- if (!isTemp)
- register_dirty_segment(reln, v);
- #endif
}
/*
--- 825,830 ----
***************
*** 901,907 ****
v = mdopen(reln, EXTENSION_FAIL);
- #ifdef USE_SEGMENTED_FILES
while (v != NULL)
{
if (FileSync(v->mdfd_vfd) < 0)
--- 847,852 ----
***************
*** 914,928 ****
reln->smgr_rnode.relNode)));
v = v->mdfd_chain;
}
- #else
- if (FileSync(v->mdfd_vfd) < 0)
- ereport(ERROR,
- (errcode_for_file_access(),
- errmsg("could not fsync relation %u/%u/%u: %m",
- reln->smgr_rnode.spcNode,
- reln->smgr_rnode.dbNode,
- reln->smgr_rnode.relNode)));
- #endif
}
/*
--- 859,864 ----
***************
*** 1476,1483 ****
return (MdfdVec *) MemoryContextAlloc(MdCxt, sizeof(MdfdVec));
}
- #ifdef USE_SEGMENTED_FILES
-
/*
* Open the specified segment of the relation,
* and make a MdfdVec object for it. Returns NULL on failure.
--- 1412,1417 ----
***************
*** 1522,1528 ****
/* all done */
return v;
}
- #endif /* USE_SEGMENTED_FILES */
/*
* _mdfd_getseg() -- Find the segment of the relation holding the
--- 1456,1461 ----
***************
*** 1538,1544 ****
{
MdfdVec *v = mdopen(reln, behavior);
- #ifdef USE_SEGMENTED_FILES
BlockNumber targetseg;
BlockNumber nextsegno;
--- 1471,1476 ----
***************
*** 1600,1607 ****
}
v = v->mdfd_chain;
}
- #endif
-
return v;
}
--- 1532,1537 ----
Index: src/include/pg_config_manual.h
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/include/pg_config_manual.h,v
retrieving revision 1.31
diff -c -r1.31 pg_config_manual.h
*** src/include/pg_config_manual.h 11 Apr 2008 22:54:23 -0000 1.31
--- src/include/pg_config_manual.h 21 Apr 2008 15:17:07 -0000
***************
*** 11,57 ****
*/
/*
- * Size of a disk block --- this also limits the size of a tuple. You
- * can set it bigger if you need bigger tuples (although TOAST should
- * reduce the need to have large tuples, since fields can be spread
- * across multiple tuples).
- *
- * BLCKSZ must be a power of 2. The maximum possible value of BLCKSZ
- * is currently 2^15 (32768). This is determined by the 15-bit widths
- * of the lp_off and lp_len fields in ItemIdData (see
- * include/storage/itemid.h).
- *
- * Changing BLCKSZ requires an initdb.
- */
- #define BLCKSZ 8192
-
- /*
- * RELSEG_SIZE is the maximum number of blocks allowed in one disk
- * file when USE_SEGMENTED_FILES is defined. Thus, the maximum size
- * of a single file is RELSEG_SIZE * BLCKSZ; relations bigger than that
- * are divided into multiple files.
- *
- * RELSEG_SIZE * BLCKSZ must be less than your OS' limit on file size.
- * This is often 2 GB or 4GB in a 32-bit operating system, unless you
- * have large file support enabled. By default, we make the limit 1
- * GB to avoid any possible integer-overflow problems within the OS.
- * A limit smaller than necessary only means we divide a large
- * relation into more chunks than necessary, so it seems best to err
- * in the direction of a small limit. (Besides, a power-of-2 value
- * saves a few cycles in md.c.)
- *
- * When not using segmented files, RELSEG_SIZE is set to zero so that
- * this behavior can be distinguished in pg_control.
- *
- * Changing RELSEG_SIZE requires an initdb.
- */
- #ifdef USE_SEGMENTED_FILES
- #define RELSEG_SIZE (0x40000000 / BLCKSZ)
- #else
- #define RELSEG_SIZE 0
- #endif
-
- /*
* Size of a WAL file block. This need have no particular relation to BLCKSZ.
* XLOG_BLCKSZ must be a power of 2, and if your system supports O_DIRECT I/O,
* XLOG_BLCKSZ must be a multiple of the alignment requirement for direct-I/O
--- 11,16 ----
--Boundary_(ID_F3d5i5myhqrMBN5f910PBQ)--

