gzread and NFS issues
Guest
Posted: Wed Apr 16, 2008 7:34 am
Hi all,

My application reads a lot of gzip files over NFS and uncompresses them. We
use the gzread function provided by the zlib library. I wrote a sample
program to see where the I/O bottleneck was:

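#include <stdio.h>
#include <stdlib.h>   /* atoi, malloc, free, exit */
#include <zlib.h>

int main(int argc, char **argv)
{
    const char *file;
    unsigned char *buffer;
    gzFile gfile;
    int bufsize;
    int r;

    if (argc < 3) {
        fprintf(stderr, "usage: %s <file.gz> <bufsize>\n", argv[0]);
        exit(1);
    }
    file = argv[1];
    bufsize = atoi(argv[2]);
    if (bufsize <= 0) {
        fprintf(stderr, "bufsize must be a positive integer\n");
        exit(1);
    }

    buffer = malloc((size_t)bufsize);
    if (!buffer) {
        fprintf(stderr, "malloc error\n");
        exit(1);
    }

    gfile = gzopen(file, "rb");
    if (gfile == NULL) {
        fprintf(stderr, "gzopen error\n");
        free(buffer);
        exit(1);
    }

    /* Print the number of uncompressed bytes returned by each gzread() call. */
    while ((r = gzread(gfile, buffer, (unsigned)bufsize)) > 0) {
        printf("%d\n", r);
    }

    /* gzread() returns -1 on error. */
    if (r < 0) {
        fprintf(stderr, "gzread error\n");
    }

    gzclose(gfile);
    free(buffer);
    return r < 0 ? 1 : 0;
}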


Using ktrace, I see that the underlying read system calls use an 8K
buffer for local disk files (even if I pass a larger buffer to
gzread). With NFS files, each read is limited to just 512 bytes! Is
there a way to get around this bottleneck? I am considering using the
read system call over NFS, followed by an in-memory inflate. Would
that help?

Thanks ..
P.S - I am on FreeBSD 4.11 (zlib 1.4.1)
Guest
Posted: Wed Apr 16, 2008 7:45 am
On Apr 16, 10:34 am, km_jr_use...@yahoo.com wrote:

Quote:

Using ktrace, I see that the underlying read system calls use an 8K
buffer for local disk files (even if I pass a larger buffer to
gzread). With NFS files, each read is limited to just 512 bytes! Is
there a way to get around this bottleneck? I am considering using the
read system call over NFS, followed by an in-memory inflate. Would
that help?

Also, the plain read system call over NFS doesn't show any bottleneck
(I could see 10240 bytes being read per call).

Thanks!
Mark Adler
Posted: Wed Apr 16, 2008 9:46 am
Guest
I can only guess where the 512 is coming from. The gz* routines use
an I/O block size of 4096 bytes. Perhaps the underlying stdio fread()
routine is using 512-byte buffers.

You can avoid the stdio routines by using the lower level open() /
read() / close() routines with zlib's inflate() or inflateBack()
routines. You can look at the gun.c code in the examples directory
of the zlib distribution for the fastest implementation of gzip file
decompression using zlib.
Guest
Posted: Wed Apr 16, 2008 10:29 am
On Apr 16, 12:46 pm, Mark Adler <mad...@alumni.caltech.edu> wrote:
Quote:
I can only guess where the 512 is coming from. The gz* routines use
an I/O block size of 4096 bytes. Perhaps the underlying stdio fread()
routine is using 512-byte buffers.

Right. I migrated to FreeBSD 6.2, which has zlib v1.2.3. I now see reads
of 4096 bytes.

Quote:
You can avoid the stdio routines by using the lower level open() /
read() / close() routines with zlib's inflate() or inflateBack()
routines. You can look at the gun.c code in the examples directory
of the zlib distribution for the fastest implementation of gzip file
decompression using zlib.

Thanks Mark!
 
All times are GMT - 5 Hours