diff --git a/external/libsdf/libsw/GNUmakefile b/external/libsdf/libsw/GNUmakefile new file mode 100644 index 0000000..6890aec --- /dev/null +++ b/external/libsdf/libsw/GNUmakefile @@ -0,0 +1,469 @@ +# Make.$(ARCH) sets many of the variables used below including: +# CC, CFLAGS, AS, RANLIB, objdir, objsuf, asmdir + +treedir=.. +treedir_sed=\.\. +appexcludes= +libname=libsw + +# The "Salmon Warren Utility library" +src:= \ +Malloc.c key.c \ +Msgs.c error.c files.c \ +chn.c dll.c \ +stk.c heap.c randoms.c \ +malloc.c gc.c byteswap.c \ +timers.c SDFwrite.c SDFread.c \ +ring.c qromo.c qromod.c \ +cosmo.c mpmy_combine.c counters.c \ +mpmy_gather.c memfile.c rsort.c \ +msgdirinit.c singlio.c abm.c \ +mpi_bcast.c mpi_reduce.c keycvt.c \ +peano.c SDFreadf.c hwclock.c + +include $(treedir)/Make-common/Make.$(ARCH) + +include $(treedir)/Make-common/Make.generic + +# DO NOT DELETE THIS LINE -- make depend depends on it. + +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): $(treedir)/include/libsdf/Msgs.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/Malloc$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/Malloc$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): +$(objdir)/key$(objsuf): $(treedir)/include/libsdf/protos.h +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): 
+$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/Msgs$(objsuf): +$(objdir)/Msgs$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/Msgs$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/Msgs.h +$(objdir)/Msgs$(objsuf): $(treedir)/include/libsdf/protos.h +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): $(treedir)/include/libsdf/error.h +$(objdir)/error$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/Msgs.h +$(objdir)/error$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/error$(objsuf): +$(objdir)/error$(objsuf): $(treedir)/include/libsdf/mpmy_abnormal.h $(treedir)/include/libsdf/protos.h +$(objdir)/error$(objsuf): $(treedir)/include/libsdf/memfile.h +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): +$(objdir)/files$(objsuf): $(treedir)/include/libsdf/protos.h +$(objdir)/files$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/files$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/mpmy.h +$(objdir)/files$(objsuf): 
$(treedir)/include/libsdf/timers.h +$(objdir)/files$(objsuf): +$(objdir)/chn$(objsuf): $(treedir)/include/libsdf/Assert.h $(treedir)/include/libsdf/error.h +$(objdir)/chn$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/chn.h +$(objdir)/chn$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/chn$(objsuf): +$(objdir)/chn$(objsuf): +$(objdir)/chn$(objsuf): +$(objdir)/chn$(objsuf): +$(objdir)/chn$(objsuf): +$(objdir)/chn$(objsuf): +$(objdir)/chn$(objsuf): +$(objdir)/dll$(objsuf): $(treedir)/include/libsdf/chn.h $(treedir)/include/libsdf/Malloc.h +$(objdir)/dll$(objsuf): $(treedir)/include/libsdf/error.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/stk$(objsuf): $(treedir)/include/libsdf/Assert.h $(treedir)/include/libsdf/error.h +$(objdir)/stk$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/heap$(objsuf): $(treedir)/include/libsdf/Assert.h $(treedir)/include/libsdf/error.h +$(objdir)/heap$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/Malloc.h +$(objdir)/randoms$(objsuf): $(treedir)/include/libsdf/error.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): +$(objdir)/malloc$(objsuf): $(treedir)/include/libsdf/Msgs.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): +$(objdir)/gc$(objsuf): 
+$(objdir)/gc$(objsuf): +$(objdir)/byteswap$(objsuf): +$(objdir)/byteswap$(objsuf): +$(objdir)/byteswap$(objsuf): +$(objdir)/byteswap$(objsuf): +$(objdir)/byteswap$(objsuf): $(treedir)/include/libsdf/byteswap.h +$(objdir)/timers$(objsuf): +$(objdir)/timers$(objsuf): +$(objdir)/timers$(objsuf): +$(objdir)/timers$(objsuf): +$(objdir)/timers$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/timers$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/timers.h +$(objdir)/timers$(objsuf): +$(objdir)/timers$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/mpmy_time.h +$(objdir)/timers$(objsuf): $(treedir)/include/libsdf/Assert.h +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): +$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/SDF.h $(treedir)/include/libsdf/SDFwrite.h +$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/mpmy.h +$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/timers.h +$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/mpmy_io.h +$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/Msgs.h $(treedir)/include/libsdf/protos.h 
+$(objdir)/SDFwrite$(objsuf): $(treedir)/include/libsdf/singlio.h +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/SDFread$(objsuf): +$(objdir)/SDFread$(objsuf): $(treedir)/include/libsdf/SDF.h $(treedir)/include/libsdf/Assert.h +$(objdir)/SDFread$(objsuf): $(treedir)/include/libsdf/error.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/SDFread$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/Msgs.h +$(objdir)/SDFread$(objsuf): $(treedir)/include/libsdf/SDFread.h $(treedir)/include/libsdf/singlio.h +$(objdir)/ring$(objsuf): +$(objdir)/ring$(objsuf): +$(objdir)/ring$(objsuf): +$(objdir)/ring$(objsuf): +$(objdir)/ring$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/ring$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/mpmy.h +$(objdir)/ring$(objsuf): $(treedir)/include/libsdf/timers.h +$(objdir)/ring$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/ring$(objsuf): $(treedir)/include/libsdf/singlio.h +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): +$(objdir)/qromo$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/qromo$(objsuf): 
$(treedir)/include/libsdf/gccextensions.h +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): +$(objdir)/qromod$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/qromod$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): +$(objdir)/cosmo$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/cosmo$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/cosmo.h +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/mpmy_combine$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/mpmy_combine$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/mpmy_combine$(objsuf): +$(objdir)/mpmy_combine$(objsuf): $(treedir)/include/libsdf/Msgs.h op_template.c +$(objdir)/counters$(objsuf): +$(objdir)/counters$(objsuf): +$(objdir)/counters$(objsuf): +$(objdir)/counters$(objsuf): +$(objdir)/counters$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/counters$(objsuf): $(treedir)/include/libsdf/gccextensions.h 
+$(objdir)/counters$(objsuf): $(treedir)/include/libsdf/timers.h +$(objdir)/counters$(objsuf): $(treedir)/include/libsdf/mpmy.h +$(objdir)/counters$(objsuf): $(treedir)/include/libsdf/Assert.h +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/mpmy_gather$(objsuf): +$(objdir)/mpmy_gather$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/mpmy_gather$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/mpmy_gather$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/error.h +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): +$(objdir)/memfile$(objsuf): $(treedir)/include/libsdf/Malloc.h +$(objdir)/memfile$(objsuf): $(treedir)/include/libsdf/error.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/memfile$(objsuf): $(treedir)/include/libsdf/Msgs.h $(treedir)/include/libsdf/protos.h +$(objdir)/memfile$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/memfile$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): +$(objdir)/rsort$(objsuf): $(treedir)/include/libsdf/Malloc.h +$(objdir)/rsort$(objsuf): 
$(treedir)/include/libsdf/error.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/Malloc.h +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/error.h +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/mpmy_abnormal.h +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/protos.h $(treedir)/include/libsdf/Msgs.h +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/msgdirinit$(objsuf): +$(objdir)/msgdirinit$(objsuf): $(treedir)/include/libsdf/mpmy_io.h +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): $(treedir)/include/libsdf/protos.h +$(objdir)/singlio$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/singlio$(objsuf): +$(objdir)/singlio$(objsuf): $(treedir)/include/libsdf/Msgs.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/singlio$(objsuf): $(treedir)/include/libsdf/singlio.h +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): 
+$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): $(treedir)/include/libsdf/protos.h $(treedir)/include/libsdf/Assert.h +$(objdir)/abm$(objsuf): $(treedir)/include/libsdf/error.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/abm$(objsuf): $(treedir)/include/libsdf/mpmy.h $(treedir)/include/libsdf/timers.h +$(objdir)/abm$(objsuf): +$(objdir)/abm$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/Msgs.h +$(objdir)/mpi_bcast$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/mpi_bcast$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/mpi_bcast$(objsuf): $(treedir)/include/libsdf/error.h +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): +$(objdir)/mpi_reduce$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/mpi_reduce$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/mpi_reduce$(objsuf): $(treedir)/include/libsdf/error.h mpi_template.c +$(objdir)/keycvt$(objsuf): $(treedir)/include/libsdf/protos.h $(treedir)/include/libsdf/mpmy.h +$(objdir)/keycvt$(objsuf): $(treedir)/include/libsdf/timers.h +$(objdir)/keycvt$(objsuf): +$(objdir)/keycvt$(objsuf): +$(objdir)/keycvt$(objsuf): +$(objdir)/keycvt$(objsuf): $(treedir)/include/libsdf/Msgs.h +$(objdir)/keycvt$(objsuf): $(treedir)/include/libsdf/gccextensions.h $(treedir)/include/libsdf/Assert.h +$(objdir)/keycvt$(objsuf): $(treedir)/include/libsdf/error.h 
+$(objdir)/peano$(objsuf): $(treedir)/include/libsdf/Msgs.h $(treedir)/include/libsdf/gccextensions.h +$(objdir)/peano$(objsuf): $(treedir)/include/libsdf/Assert.h $(treedir)/include/libsdf/error.h +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/mpmy.h +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/timers.h +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/SDF.h +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/Assert.h $(treedir)/include/libsdf/error.h +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/gccextensions.h +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/Malloc.h $(treedir)/include/libsdf/Msgs.h +$(objdir)/SDFreadf$(objsuf): $(treedir)/include/libsdf/SDFread.h $(treedir)/include/libsdf/singlio.h +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): +$(objdir)/hwclock$(objsuf): $(treedir)/include/libsdf/error.h +$(objdir)/hwclock$(objsuf): $(treedir)/include/libsdf/gccextensions.h diff --git a/external/libsdf/libsw/Malloc.c b/external/libsdf/libsw/Malloc.c new file mode 100644 index 0000000..0c8a5d8 --- /dev/null +++ b/external/libsdf/libsw/Malloc.c @@ -0,0 +1,139 @@ +/* + * Copyright 1991 Michael S. Warren and John K. 
Salmon. All Rights Reserved. + */ + +#include +#include +#include +#include "Msgs.h" +#include "Malloc.h" +#include "malloc.h" + +#define WARNSIZEINITIAL (1024L*1024*1024*2) +static size_t WarnSize = WARNSIZEINITIAL; + +#include "error.h" + +static void Error_and_mprint(const char *fmt, ...){ + va_list alist; + malloc_print(); + va_start(alist, fmt); + vError(fmt, alist); + va_end(alist); +} + +/* Try to do this without the typedef Error_t!!! */ +static Error_t reporter = Error_and_mprint; + +Error_t MallocHandler(Error_t new){ + Error_t ret = reporter; + reporter = new; + return ret; +} + +void +xFree(void *ptr, const char *file, int lineno) +{ + Msgf(("%s(%d): f(%#lx)\n", file, lineno, (unsigned long)ptr)); + if( ptr != (void *)0 ) + free(ptr); +} + +void * +xMalloc(size_t size, const char *file, int lineno) +{ + void *ptr; + + Msgf(("%s(%d): m(%lu)=", file, lineno, (unsigned long)size)); + if (size > WarnSize){ + Shout("Large Malloc Warning, size %ld\n", (long)size); + WarnSize *=2; + } + if (size == 0) { + Msgf(("0x0")); + return (void *)0; + } + ptr = malloc(size); + if (ptr == (void *)0 && reporter) { + reporter("%s(%d) Malloc(%ld) failed\n", file, lineno, (long)size); + } + Msgf(("%#lx\n", (unsigned long)ptr)); + return(ptr); +} + + +void * +xRealloc(void *ptr, size_t size, const char *file, int lineno) +{ + void *p1 = ptr; + + Msgf(("%s(%d): r(%#lx,%lu)=", file, lineno, (unsigned long)ptr, + (unsigned long)size)); + if (size > WarnSize){ + Shout("Large Realloc Warning, size %ld\n", (long)size); + WarnSize *= 2; + } + + if( ptr == (void *)0 ){ + Msgf(("-> malloc\n")); + return Malloc(size); + } + if( size == 0 ){ + Msgf(("-> free\n")); + Free(ptr); + return (void *)0; + } + ptr = realloc(ptr, size); + if (ptr == (void *)0 && reporter) { + reporter("%s(%d): Realloc(%p, %ld) failed\n", file, lineno, p1, (long)size); + } + Msgf(("%#lx\n", (unsigned long)ptr)); + return(ptr); +} + +void * +xCalloc(size_t n, size_t s, const char *file, int lineno) +{ + void *ptr; 
+ + Msgf(("%s(%d): c(%lu,%lu)=", file, lineno, + (unsigned long)n, (unsigned long)s)); + if (n*s > WarnSize){ + Shout("Large Calloc Warning, size %ld\n", (long)(n*s)); + WarnSize *= 2; + } + if( n==0 || s==0 ){ + Msgf(("0x0\n")); + return (void *)0; + } + ptr = calloc(n,s); + if (ptr == (void *)0 && reporter) { + reporter("%s(%d): Calloc(%ld) failed\n", file, lineno, (long)n); + } + Msgf(("%#lx\n", (unsigned long)ptr)); + return(ptr); +} + +/* Use these, for example, when you pass a ptr-to-function arg */ +/* to another function. They call the macro version, which prints */ +/* a message. If you don't intend to ever use the old K&R /lib/cpp */ +/* pre-processor, then these are superfluous. You could pass, e.g. */ +/* foo(Realloc), and you could call these functions, e.g., Realloc */ +/* and everything would be fine... */ + +void *Realloc_f(void *ptr, size_t size){ + return Realloc(ptr, size); +} + +void *Malloc_f(size_t n){ + return Malloc(n); +} + +void *Calloc_f(size_t n, size_t s){ + return Calloc(n, s); +} + +void Free_f(void *p){ + Free(p); +} +/* void MallocOnNULLReturn(void (*printf_like)(const char *fmt, ...)) */ diff --git a/external/libsdf/libsw/Msgs.c b/external/libsdf/libsw/Msgs.c new file mode 100644 index 0000000..45754d8 --- /dev/null +++ b/external/libsdf/libsw/Msgs.c @@ -0,0 +1,375 @@ +/* + * This file implements a common mechanism for delivering messages + * to the user. It also provides a way to selectively turn + * on and off the status messages emanating from a part of the + * program. + * Call msg_on("foo"); to see all messages of the form: + * msg("foo", ("printf-format", printf-args)); + * Note how __FILE__ can be used in place of "foo"... + * + * This file does not use <stdio.h>. This is because in the wonderful + * world of distributed parallel, figuring out whose <stdio.h> you're getting + * and whether or not the names have been secretly scrambled is a complete + * nightmare. 
The solution is to make the user pass in two function pointers + * that are supposed to behave like vfprintf and fflush, and a void * which + * acts like a FILE * (when passed to the two functions). How the caller + * arranges these functions is not our problem. + * The down-side is that picky compilers may complain about not having + * prototypes for sscanf. Too bad... + */ +#include +#include +#include /* only for sscanf! */ +#include "mpmy.h" +#include "Malloc.h" +#include "Msgs.h" +#include "protos.h" + +#ifndef Static +#define Static static +#endif + +/* Don't increase MAXNAMES arbitrarily. If it grows much bigger */ +/* than O(100), a different search algorithm would be appropriate. */ +#define MAXNAMES 50 + +/* A substantially larger MAXFILES is probably a mistake. You'll run out */ +/* of file descriptors soon enough anyway. */ +#define MAXFILES 12 + +/* The sum of the lengths of all the NAMES (including terminal null). */ +#define NAMEPOOLLENGTH (MAXNAMES*32) + +Static int nnames; +Static struct nt_s{ + const char *name; + int status; + struct nt_s *next; +} name_tbl[MAXNAMES], *first; + +Static int nfiles; +Static struct ft_s{ + void *fp; + int (*vfprintf_like)(void *, const char *, va_list); + int (*fflush_like)(void *); +} file_tbl[MAXFILES]; + +char name_pool[NAMEPOOLLENGTH] ; +char *poolptr = name_pool; +char *poolend = name_pool + NAMEPOOLLENGTH; + +Static int look_name(const char *name); +Static int Msg_restriction(const char *arg); + +int _Msg_enabled = 1; + +Static int _Msg_flushalways = 0; +Static int called_addfile = 0; + +int Msg_addfile(void *fp, + int (*vfprintf_like)(void *, const char *, va_list), + int (*fflush_like)(void *)){ + if( nfiles == MAXFILES ) + return -1; + + file_tbl[nfiles].fp = fp; + file_tbl[nfiles].vfprintf_like = vfprintf_like; + file_tbl[nfiles].fflush_like = fflush_like; + nfiles++; + called_addfile = 1; + return 0; +} + +int Msg_delfile(void *fp){ + int i; + + for(i=0; ivfprintf_like)(ft->fp, fmt, args); + va_end(args); 
+ if( _Msg_flushalways && ft->fflush_like) + (*ft->fflush_like)(ft->fp); + } +#ifndef NO_STDIO + if( nfiles == 0 && !called_addfile ){ + va_start(args, fmt); + vfprintf(stderr, fmt, args); + if( _Msg_flushalways ) + fflush(stderr); + va_end(args); + extra = 1; + } +#endif + } + --recursion; + _Msg_enabled = save; + return nfiles+extra; +} + +int +Msg_doalist(const char *fmt, va_list alist) +{ + int i; + struct ft_s *ft; + int save; + int extra = 0; + + save = _Msg_enabled; + if( recursion++ == 0 ){ + for(i=0; ivfprintf_like)(ft->fp, fmt, alist); + if( _Msg_flushalways && ft->fflush_like ) + (*ft->fflush_like)(ft->fp); + } +#ifndef NO_STDIO + if( nfiles == 0 && !called_addfile ){ + vfprintf(stderr, fmt, alist); + if( _Msg_flushalways ) + fflush(stderr); + extra = 1; + } +#endif + } + --recursion; + _Msg_enabled = save; + return nfiles+extra; +} + +void Msg_turnon(const char *msg_turn_on){ + char *copy; + char *msg_key; + + /* You can turn off msgs with a "nomsgs" or a null string or an */ + /* empty string. */ + if( msg_turn_on == NULL + || msg_turn_on[0] == '\0' + || strcmp(msg_turn_on, "nomsgs")==0 ){ + Msg_set_enable(0); + return; + } + + copy = strcpy(Malloc(strlen(msg_turn_on)+1), msg_turn_on); + /* Look for comma or space separated arguments to Msg_on */ + for(msg_key = strtok(copy, " ,\t\n"); + msg_key; + msg_key = strtok(NULL, " ,\t\n")){ + char *restriction_begin = strchr(msg_key, ':'); + if( restriction_begin ){ + if( !Msg_restriction(restriction_begin+1) ) + continue; + *restriction_begin='\0'; + } + Msg_on(msg_key); + Msg_do("Turning on Msgs for \"%s\"\n", msg_key); + /* Do we need to add "./" to the key in addition??? */ + if( strncmp( __FILE__, "./", 2) == 0 && + strncmp( msg_key, "./", 2 ) != 0 ){ + char *dot_slash_key = Malloc(strlen(msg_key)+3); + /* sprintf might be easier, but we are trying to avoid */ + /* using stdio. 
*/ + /* sprintf(dot_slash_key, "./%s", msg_key) */ + strcpy(dot_slash_key, "./"); + strcat(dot_slash_key, msg_key); + Msg_on(dot_slash_key); + Msg_do("Turning on Msgs for \"%s\"\n", dot_slash_key); + Free(dot_slash_key); + } + } + Free(copy); +} + +int Msg_flush(void) +{ + int i, ret; + struct ft_s *ft; + int save; + + save = _Msg_enabled; + ret = 0; + if( recursion++ == 0 ){ + for(i=0; ifflush_like ) + ret |= (*ft->fflush_like)(ft->fp); + } + } + --recursion; + _Msg_enabled = save; + return ret; +} + +Static int look_name(const char *name) +{ + int i; + struct nt_s *last, *this; + unsigned int len; + + last = NULL; + this = first; + if( name == NULL ){ + Warning("look_name(NULL)\n"); + return -1; + } + while( this && this->name && strcmp(name, this->name) != 0 ) { + last = this; + this = this->next; + } + if( this ){ + if( last ){ + last->next = this->next; + this->next = first; + first = this; + } + return this - name_tbl; + }else{ + /* Create a new entry. Add it to the front. */ + /* NOTE: we never free any of this! */ + if(nnames == MAXNAMES){ + return -1; + } + len = strlen(name)+1; + if( poolptr + len >= poolend ){ + return -1; + } + i = nnames++; + strcpy(poolptr, name); + + name_tbl[i].name = poolptr; + poolptr += len; + name_tbl[i].next = first; + first = &name_tbl[i]; + return i; + } +} + +Static int Msg_restriction(const char *arg){ + int first, last; + + if( strchr(arg, '-') ){ + if( sscanf(arg, "%d-%d", &first, &last) != 2 ){ + Msg_do("Unparseable msg restriction: \"%s\"\n", arg); + return 0; + } + return MPMY_Procnum() <= last && MPMY_Procnum()>= first ; + }else{ + if( sscanf(arg, "%d", &first) != 1 ){ + Msg_do("Unparseable msg restriction: \"%s\"\n", arg); + return 0; + } + return MPMY_Procnum() == first; + } +} + +#ifdef STANDALONE +#include +/* Of course, Sun's stdio.h doesn't declare fflush or vfprintf! 
*/ +int vfprintf(FILE *, const char *, va_list); +int fflush(FILE *); + +int main(int argc, char **argv){ + FILE *aux; + + Msg_addfile(stdout, vfprintf, fflush); + + aux = fopen("Msg.out", "w"); + if( aux == NULL ){ + fprintf(stderr, "Couldn't fopen(\"Msg.out\"), errno=%d\n", errno); + exit(1); + } + Msg_addfile(aux, vfprintf, fflush); + Msg_on("foo"); + Msg("foo", ("Hello world foo seventeen=%d\n", 17)); + Msg("bar", ("Hello bar")); + Msg_on("Msgs.c"); + Msg_on("bar"); + Msg("bar", ("Hello bar pi=%g\n", 3.14159)); + Msglno("bar", ("What was that value again??... %g\n,", 3.1415)); + Msgf(("This is a Msgf message.\n")); + Msg_off("foo"); + Msg_off(__FILE__); + Msgf(("This is a blocked Msgf message.\n")); + Msg("foo", ("Not seen.\n")); + exit(0); +} +#endif + diff --git a/external/libsdf/libsw/SDFread.c b/external/libsdf/libsw/SDFread.c new file mode 100644 index 0000000..121f93d --- /dev/null +++ b/external/libsdf/libsw/SDFread.c @@ -0,0 +1,290 @@ +#include +#include +#include +#include +#include +#include +#include "mpmy.h" +#include "SDF.h" +#include "Assert.h" +#include "Malloc.h" +#include "Msgs.h" +#include "error.h" +#include "verify.h" +#include "SDFread.h" +#include "gc.h" +#include "singlio.h" +#include "timers.h" + +#define MAXNAMES 64 + +Timer_t SDFreadTm; + +/* These ought to be arguments to SDFread, but that breaks too many + things that call SDFread. The varargs makes creating a wrapper a pain. */ +char *SDFread_datafile = "datafile"; +char *SDFread_hdrfile = "hdrfile"; +char *SDFread_npart = "npart"; + +SDF *SDFread(SDF *csdfp, void **btabp, int *gnobjp, int *nobjp, + int stride, + /* char *name, offset_t offset, int *confirm */...) 
+{ + va_list ap; + char name[256]; + int start; + SDF *sdfp; + int gnobj, nobj; + void *btab; + int Nfiles, procs_per_file, myfile, mysection; + char hdrname[256]; + void *addrs[MAXNAMES]; + char *names[MAXNAMES]; + int strides[MAXNAMES]; + int nobjs[MAXNAMES]; + int64_t starts[MAXNAMES]; + int *confirm; + int nnames; + int sz = 0; + + EnableTimer(&SDFreadTm, "SDFread"); + StartTimer(&SDFreadTm); + + if( !SDFread_datafile || !SDFhasname(SDFread_datafile, csdfp) ){ + sdfp = csdfp; + SinglWarning("SDFread: Looking for data in 'control' file\n"); + Nfiles = 1; + }else{ + sdfp = NULL; + Nfiles = SDFnrecs(SDFread_datafile, csdfp); + if( Nfiles > MPMY_Nproc() || Nfiles < 0){ + SinglError("Sorry, bad Nfiles (%d)!\n", Nfiles); + } + } + + /* Pick out which file in the control file will be ours. */ + /* and which "section" of the file. */ + procs_per_file = MPMY_Nproc()/Nfiles; + Verify(procs_per_file*Nfiles == MPMY_Nproc()); + myfile = MPMY_Procnum()/procs_per_file; + mysection = MPMY_Procnum()%procs_per_file; + + if( sdfp == NULL ){ + VerifySX(0==SDFseekrdvecs(csdfp, + SDFread_datafile, myfile, 1, name, sizeof(name), + NULL), + SinglShout("%s", SDFerrstring)); + if( SDFread_hdrfile ) + SDFgetstringOrDefault(csdfp, SDFread_hdrfile, hdrname, sizeof(hdrname), ""); + else + hdrname[0] = '\0'; + + /* This was moved from above where it used name uninitialized */ + if (hdrname[0] && SDFissdf(name) ) { + SinglWarning("Superfluous headerfile %s ignored\n", hdrname); + hdrname[0] = '\0'; + } + VerifySX(sdfp = SDFopen(hdrname, name),SinglShout("%s", SDFerrstring)); + }else{ + strncpy(name, "", sizeof(name)); + } + + if( SDFbyteorder(sdfp) == 0 ){ + int swap; + /* The data/hdr file itself doesn't specify a byte order. 
*/ + SDFgetintOrDefault(csdfp, "swapbytes", &swap, 0); + if( swap ) + SDFswap(sdfp); + /* If there's no byteorder specified, then it won't swap */ + } + + if( SDFread_npart && SDFgetint(sdfp, SDFread_npart, &gnobj) ){ + /* Hopefully calling va_start and va_end in here won't disturb */ + /* the real loop over arguments below... */ + va_start(ap, stride); + names[0] = va_arg(ap, char *); + gnobj = SDFnrecs(names[0], sdfp); + va_end(ap); + if( MPMY_Procnum() == 0 ){ + SinglShout("%s does not have an \"%s\".\n", name, SDFread_npart); + SinglShout("Guessing %s=%d from SDFnrecs(., %s)\n", + SDFread_npart, gnobj, names[0]); + } + } + + NobjInitial(gnobj, procs_per_file, mysection, &nobj, &start); + btab = Calloc(nobj, stride); + Msgf(("Proc %d starting at %d in file, reading %d of %d\n", + MPMY_Procnum(), start, nobj, gnobj)); + + nnames = 0; + va_start(ap, stride); + while(( names[nnames] = va_arg(ap, char *)) != NULL ){ + assert(nnames < MAXNAMES); + addrs[nnames] = va_arg(ap, int) + (char *)btab; + confirm = va_arg(ap, int *); + if( !SDFhasname(names[nnames], sdfp) ){ + *confirm = 0; + Msgf(("SDF file does not have %s\n", names[nnames])); + continue; + }else{ + *confirm = 1; + } + starts[nnames] = start; + nobjs[nnames] = nobj; + sz += SDFtype_sizes[SDFtype(names[nnames], sdfp)]; + strides[nnames] = stride; + nnames++; + } + va_end(ap); + + VerifyX(0==SDFseekrdvecsarr(sdfp, nnames, + names, starts, nobjs, addrs, strides), + Shout("%s", SDFerrstring)); + + *nobjp = nobj; + MPMY_Combine(nobjp, gnobjp, 1, MPMY_INT, MPMY_SUM); + Msgf(("nobj=%d, gnobj=%d\n", *nobjp, *gnobjp)); + + *btabp = btab; + StopTimer(&SDFreadTm); + OutputTimer(&SDFreadTm, singlPrintf); /* global sync and sets timer->max */ + singlPrintf("read %d x %d at %.0f MB/s\n", *gnobjp, sz, (*gnobjp/(1000.*1000.))*(sz/ReadTimer(&SDFreadTm))); + DisableTimer(&SDFreadTm); + return sdfp; +} + +SDF *SDFread64(SDF *csdfp, void **btabp, int64_t *gnobjp, int *nobjp, + int stride, + /* char *name, offset_t offset, int 
*confirm */...) +{ + va_list ap; + char name[256]; + int64_t start; + SDF *sdfp; + int64_t gnobj; + int nobj; + void *btab; + int Nfiles, procs_per_file, myfile, mysection; + char hdrname[256]; + void *addrs[MAXNAMES]; + char *names[MAXNAMES]; + int strides[MAXNAMES]; + int nobjs[MAXNAMES]; + int64_t starts[MAXNAMES]; + int *confirm; + int nnames; + int sz = 0; + + EnableTimer(&SDFreadTm, "SDFread"); + StartTimer(&SDFreadTm); + + if( !SDFread_datafile || !SDFhasname(SDFread_datafile, csdfp) ){ + sdfp = csdfp; + SinglWarning("SDFread: Looking for data in 'control' file\n"); + Nfiles = 1; + }else{ + sdfp = NULL; + Nfiles = SDFnrecs(SDFread_datafile, csdfp); + if( Nfiles > MPMY_Nproc() || Nfiles < 0){ + SinglError("Sorry, bad Nfiles (%d)!\n", Nfiles); + } + } + + /* Pick out which file in the control file will be ours. */ + /* and which "section" of the file. */ + procs_per_file = MPMY_Nproc()/Nfiles; + Verify(procs_per_file*Nfiles == MPMY_Nproc()); + myfile = MPMY_Procnum()/procs_per_file; + mysection = MPMY_Procnum()%procs_per_file; + + if( sdfp == NULL ){ + VerifySX(0==SDFseekrdvecs(csdfp, + SDFread_datafile, myfile, 1, name, sizeof(name), + NULL), + SinglShout("%s", SDFerrstring)); + if( SDFread_hdrfile ) + SDFgetstringOrDefault(csdfp, SDFread_hdrfile, hdrname, sizeof(hdrname), ""); + else + hdrname[0] = '\0'; + + /* This was moved from above where it used name unitialized */ + if (hdrname[0] && SDFissdf(name) ) { + SinglWarning("Superfluous headerfile %s ignored\n", hdrname); + hdrname[0] = '\0'; + } + VerifySX(sdfp = SDFopen(hdrname, name),SinglShout("%s", SDFerrstring)); + }else{ + strncpy(name, "", sizeof(name)); + } + + if( SDFbyteorder(sdfp) == 0 ){ + int swap; + /* The data/hdr file itself doesn't specify a byte order. 
*/ + SDFgetintOrDefault(csdfp, "swapbytes", &swap, 0); + if( swap ) + SDFswap(sdfp); + /* If there's no byteorder specified, then it won't swap */ + } + + if( SDFread_npart && SDFgetint64(sdfp, SDFread_npart, &gnobj) ){ + /* Hopefully calling va_start and va_end in here won't disturb */ + /* the real loop over arguments below... */ + va_start(ap, stride); + names[0] = va_arg(ap, char *); + gnobj = SDFnrecs(names[0], sdfp); + va_end(ap); + if( MPMY_Procnum() == 0 ){ + SinglShout("%s does not have an \"%s\".\n", name, SDFread_npart); +#if __WORDSIZE==64 + SinglShout("Guessing %s=%ld from SDFnrecs(., %s)\n", + SDFread_npart, gnobj, names[0]); +#else + SinglShout("Guessing %s=%lld from SDFnrecs(., %s)\n", + SDFread_npart, gnobj, names[0]); +#endif + } + } + + NobjInitial64(gnobj, procs_per_file, mysection, &nobj, &start); + btab = Calloc(nobj, stride); + Msgf(("Proc %d starting at %ld in file, reading %d of %ld\n", + MPMY_Procnum(), start, nobj, gnobj)); + + nnames = 0; + va_start(ap, stride); + while(( names[nnames] = va_arg(ap, char *)) != NULL ){ + assert(nnames < MAXNAMES); + addrs[nnames] = va_arg(ap, int) + (char *)btab; + confirm = va_arg(ap, int *); + if( !SDFhasname(names[nnames], sdfp) ){ + *confirm = 0; + Msgf(("SDF file does not have %s\n", names[nnames])); + continue; + }else{ + *confirm = 1; + } + starts[nnames] = start; + nobjs[nnames] = nobj; + strides[nnames] = stride; + sz += SDFtype_sizes[SDFtype(names[nnames], sdfp)]; + nnames++; + } + va_end(ap); + + VerifyX(0==SDFseekrdvecsarr(sdfp, nnames, + names, starts, nobjs, addrs, strides), + Shout("%s", SDFerrstring)); + + *gnobjp = *nobjp = nobj; + MPMY_Combine(gnobjp, gnobjp, 1, MPMY_INT64, MPMY_SUM); + Msgf(("nobj=%d, gnobj=%ld\n", *nobjp, *gnobjp)); + + *btabp = btab; + StopTimer(&SDFreadTm); + OutputTimer(&SDFreadTm, singlPrintf); /* global sync and sets timer->max */ + singlPrintf("read %ld x %d at %.0f MB/s\n", *gnobjp, sz, (*gnobjp/(1000.*1000.))*(sz/ReadTimer(&SDFreadTm))); + 
DisableTimer(&SDFreadTm); + return sdfp; +} + diff --git a/external/libsdf/libsw/SDFreadf.c b/external/libsdf/libsw/SDFreadf.c new file mode 100644 index 0000000..21f249c --- /dev/null +++ b/external/libsdf/libsw/SDFreadf.c @@ -0,0 +1,177 @@ +#include +#include +#include +#include +#include +#include +#include "mpmy.h" +#include "SDF.h" +#include "Assert.h" +#include "Malloc.h" +#include "Msgs.h" +#include "error.h" +#include "verify.h" +#include "SDFread.h" +#include "gc.h" +#include "singlio.h" +#include "timers.h" + +#define MAXNAMES 64 + +extern Timer_t SDFreadTm; + +SDF *SDFreadf(char *hdr, char *name, void **btabp, int *gnobjp, int *nobjp, + int stride, + /* char *name, offset_t offset, int *confirm */...) +{ + va_list ap; + int start; + SDF *sdfp; + int gnobj, nobj; + void *btab; + void *addrs[MAXNAMES]; + char *names[MAXNAMES]; + int strides[MAXNAMES]; + int nobjs[MAXNAMES]; + int64_t starts[MAXNAMES]; + int *confirm; + int nnames; + + EnableTimer(&SDFreadTm, "SDFread"); + StartTimer(&SDFreadTm); + + VerifySX(sdfp = SDFopen(hdr, name),SinglShout("%s", SDFerrstring)); + + if( SDFgetint(sdfp, "npart", &gnobj) ){ + /* Hopefully calling va_start and va_end in here won't disturb */ + /* the real loop over arguments below... 
*/ + va_start(ap, stride); + names[0] = va_arg(ap, char *); + gnobj = SDFnrecs(names[0], sdfp); + va_end(ap); + if( MPMY_Procnum() == 0 ){ + SinglShout("%s does not have an \"npart\".\n", name); + SinglShout("Guessing npart=%d from SDFnrecs(., %s)\n", + gnobj, names[0]); + } + } + + NobjInitial(gnobj, MPMY_Nproc(), MPMY_Procnum(), &nobj, &start); + btab = Calloc(nobj, stride); + Msgf(("Proc %d starting at %d in file, reading %d of %d\n", + MPMY_Procnum(), start, nobj, gnobj)); + + nnames = 0; + va_start(ap, stride); + while(( names[nnames] = va_arg(ap, char *)) != NULL ){ + assert(nnames < MAXNAMES); + addrs[nnames] = va_arg(ap, int) + (char *)btab; + confirm = va_arg(ap, int *); + if( !SDFhasname(names[nnames], sdfp) ){ + *confirm = 0; + Msgf(("SDF file does not have %s\n", names[nnames])); + continue; + }else{ + *confirm = 1; + } + starts[nnames] = start; + nobjs[nnames] = nobj; + strides[nnames] = stride; + nnames++; + } + va_end(ap); + + VerifyX(0==SDFseekrdvecsarr(sdfp, nnames, + names, starts, nobjs, addrs, strides), + Shout("%s", SDFerrstring)); + + *nobjp = nobj; + MPMY_Combine(nobjp, gnobjp, 1, MPMY_INT, MPMY_SUM); + Msgf(("nobj=%d, gnobj=%d\n", *nobjp, *gnobjp)); + + *btabp = btab; + StopTimer(&SDFreadTm); + OutputTimer(&SDFreadTm, singlPrintf); /* global sync and sets timer->max */ + singlPrintf("read speed %.0f Mb/s\n", (*gnobjp/(1024.*1024.))*(stride/SDFreadTm.max)); + DisableTimer(&SDFreadTm); + return sdfp; +} + +SDF *SDFreadf64(char *hdr, char *name, void **btabp, int64_t *gnobjp, int *nobjp, + int stride, + /* char *name, offset_t offset, int *confirm */...) 
+{ + va_list ap; + int64_t start; + SDF *sdfp; + int64_t gnobj; + int nobj; + void *btab; + void *addrs[MAXNAMES]; + char *names[MAXNAMES]; + int strides[MAXNAMES]; + int nobjs[MAXNAMES]; + int64_t starts[MAXNAMES]; + int *confirm; + int nnames; + + EnableTimer(&SDFreadTm, "SDFread"); + StartTimer(&SDFreadTm); + + VerifySX(sdfp = SDFopen(hdr, name),SinglShout("%s", SDFerrstring)); + + if( SDFgetint64(sdfp, "npart", &gnobj) ){ + /* Hopefully calling va_start and va_end in here won't disturb */ + /* the real loop over arguments below... */ + va_start(ap, stride); + names[0] = va_arg(ap, char *); + gnobj = SDFnrecs(names[0], sdfp); + va_end(ap); + if( MPMY_Procnum() == 0 ){ + SinglShout("%s does not have an \"npart\".\n", name); + SinglShout("Guessing npart=%ld from SDFnrecs(., %s)\n", + gnobj, names[0]); + } + } + + NobjInitial64(gnobj, MPMY_Nproc(), MPMY_Procnum(), &nobj, &start); + btab = Calloc(nobj, stride); + Msgf(("Proc %d starting at %ld in file, reading %d of %ld\n", + MPMY_Procnum(), start, nobj, gnobj)); + + nnames = 0; + va_start(ap, stride); + while(( names[nnames] = va_arg(ap, char *)) != NULL ){ + assert(nnames < MAXNAMES); + addrs[nnames] = va_arg(ap, int) + (char *)btab; + confirm = va_arg(ap, int *); + if( !SDFhasname(names[nnames], sdfp) ){ + *confirm = 0; + Msgf(("SDF file does not have %s\n", names[nnames])); + continue; + }else{ + *confirm = 1; + } + starts[nnames] = start; + nobjs[nnames] = nobj; + strides[nnames] = stride; + nnames++; + } + va_end(ap); + + VerifyX(0==SDFseekrdvecsarr(sdfp, nnames, + names, starts, nobjs, addrs, strides), + Shout("%s", SDFerrstring)); + + *gnobjp = *nobjp = nobj; + MPMY_Combine(gnobjp, gnobjp, 1, MPMY_INT64, MPMY_SUM); + Msgf(("nobj=%d, gnobj=%ld\n", *nobjp, *gnobjp)); + + *btabp = btab; + StopTimer(&SDFreadTm); + OutputTimer(&SDFreadTm, singlPrintf); /* global sync and sets timer->max */ + singlPrintf("read speed %.0f Mb/s\n", (*gnobjp/(1024.*1024.))*(stride/SDFreadTm.max)); + DisableTimer(&SDFreadTm); + return 
sdfp; +} + diff --git a/external/libsdf/libsw/SDFwrite.c b/external/libsdf/libsw/SDFwrite.c new file mode 100644 index 0000000..85c7c51 --- /dev/null +++ b/external/libsdf/libsw/SDFwrite.c @@ -0,0 +1,435 @@ +/* This subroutine writes a restricted class of SDF files. + The files contain an arbitrary set of ascii scalars, (specified in the + variable length arg-list) and an array of binary struct records. + This is good enough for our purposes at the moment. More sophisticated + interfaces will probably be needed as time goes on. +*/ +#include +#include +#include +#include +#include +#include /* for open */ +#include /* for close */ +#include "SDF.h" +#include "SDFwrite.h" +#include "Malloc.h" +#include "mpmy.h" +#include "mpmy_io.h" +#include "Msgs.h" +#include "error.h" +#include "protos.h" +#include "timers.h" +#include "singlio.h" + +Timer_t SDFwriteTm; +static char *header_buf; +static int header_size; +static int header_len; + + +/* We would like to write some headers separately from the data. */ +/* We can let SDFwritehdr set this variable, and SDFwrite_alist */ +/* will avoid writing the header if wrote_header is true */ +static int wrote_header = 0; + +/* How much to increase header buffer size when realloced */ +#define BUF_INC 4096 + +/* How big is our line buffer */ +#define LINE_LEN 512 + +/* Align the data segment on this size boundary. */ +/* It MUST be less than LINE_LEN */ +#define DATAALIGN 32 + +static void +SDFwrite_alist(const char *filename, int gnobj, int nobj, + const void *btab, int bsize, const char *bodydesc, va_list alist); + +static void +SDFwrite_alist64(const char *filename, int mode, int64_t gnobj, int64_t nobj, + const void *btab, int bsize, const char *bodydesc, va_list alist); + +static void +outstr(const char *str) + /* This is an obstack, but who's counting? 
*/ +{ + int len; + + len = strlen(str); + if (header_size - header_len <= len) { /* <= deals with terminal null */ + header_size += BUF_INC + len; + header_buf = Realloc(header_buf, header_size); + } + strcpy(header_buf + header_len, str); + header_len += len; +} + +void +SDFwrite(const char *filename, int gnobj, int nobj, + const void *btab, int bsize, const char *bodydesc, ...){ + va_list alist; + + EnableTimer(&SDFwriteTm, "SDFwrite"); + StartTimer(&SDFwriteTm); + va_start(alist, bodydesc); + SDFwrite_alist(filename, gnobj, nobj, btab, bsize, bodydesc, alist); + va_end(alist); + StopTimer(&SDFwriteTm); + OutputTimer(&SDFwriteTm, singlPrintf); /* global sync and set timer->max */ + if (SDFwriteTm.max != 0.0) + singlPrintf("write speed %.0f MB/s\n", + (gnobj/(1000.0*1000.0))*(bsize/SDFwriteTm.max)); + DisableTimer(&SDFwriteTm); /* suppress printing again in OutputTimers */ +} + +void +SDFwrite64(const char *filename, int64_t gnobj, int64_t nobj, + const void *btab, int bsize, const char *bodydesc, ...){ + va_list alist; + int mode = MPMY_WRONLY|MPMY_CREAT|MPMY_TRUNC|MPMY_MULTI; + + EnableTimer(&SDFwriteTm, "SDFwrite"); + StartTimer(&SDFwriteTm); + va_start(alist, bodydesc); + SDFwrite_alist64(filename, mode, gnobj, nobj, btab, bsize, bodydesc, alist); + va_end(alist); + StopTimer(&SDFwriteTm); + OutputTimer(&SDFwriteTm, singlPrintf); /* global sync and set timer->max */ + if (SDFwriteTm.max != 0.0) + singlPrintf("write speed %.0f MB/s\n", + (gnobj/(1000.0*1000.0))*(bsize/SDFwriteTm.max)); + DisableTimer(&SDFwriteTm); /* suppress printing again in OutputTimers */ +} + +void +SDFappend64(const char *filename, int64_t gnobj, int64_t nobj, + const void *btab, int bsize, const char *bodydesc, ...){ + va_list alist; + int mode = MPMY_WRONLY|MPMY_MULTI; + + EnableTimer(&SDFwriteTm, "SDFwrite"); + StartTimer(&SDFwriteTm); + va_start(alist, bodydesc); + SDFwrite_alist64(filename, mode, gnobj, nobj, btab, bsize, bodydesc, alist); + va_end(alist); + wrote_header = 1; + 
StopTimer(&SDFwriteTm); + OutputTimer(&SDFwriteTm, singlPrintf); /* global sync and set timer->max */ + if (SDFwriteTm.max != 0.0) + singlPrintf("write speed %.0f MB/s\n", + (gnobj/(1000.0*1000.0))*(bsize/SDFwriteTm.max)); + DisableTimer(&SDFwriteTm); /* suppress printing again in OutputTimers */ +} + +void +SDFwritehdr(const char *filename, const char *bodydesc, ...){ + va_list alist; + + va_start(alist, bodydesc); + SDFwrite_alist(filename, 0, 0, NULL, 0, bodydesc, alist); + va_end(alist); + wrote_header = 1; +} + +void +SDFunsetwroteheader(void) +{ + wrote_header = 0; +} + +void +SDFsetwroteheader(void) +{ + wrote_header = 1; +} + +static void +SDFwrite_alist(const char *filename, int gnobj, int nobj, + const void *btab, int bsize, const char *bodydesc, va_list alist) +{ + MPMYFile *myfd; + int i, pad; + char line[LINE_LEN]; + int ival; + double dval; + char *sval; + char *name; + char *buf; + int len; + int mode; + int ok, allok, retried; + + Msgf(("In Wtdata\n")); + header_len = 0; + + header_size = BUF_INC; + header_buf = Malloc(header_size); + + if (MPMY_Procnum() == 0 && wrote_header == 0) { + outstr ("# SDF\n"); + sprintf(line, "parameter byteorder = 0x%x;\n", + SDFcpubyteorder()); outstr(line); + while( (name = va_arg(alist, char *)) ){ + Msgf(("name(%lx)=%s\n", (unsigned long int)name, name)); + switch( va_arg(alist, enum SDF_type_enum) ){ + case SDF_INT: + ival = va_arg(alist, int); + sprintf(line, "int %s = %d;\n", name, ival); outstr(line); + break; + case SDF_FLOAT: + dval = va_arg(alist, double); + sprintf(line, "float %s = %.8g;\n", name, dval); outstr(line); + break; + case SDF_DOUBLE: + dval = va_arg(alist, double); + sprintf(line, "double %s = %.16g;\n", name, dval); + outstr(line); + break; + case SDF_STRING: + sval = va_arg(alist, char *); + sprintf(line, "char %s[] = \"%s\";\n", name, sval); + outstr(line); + break; + default: + Shout("Unexpected type in wtdata\n"); + break; + } + } + if( bodydesc ){ + outstr(bodydesc); + if( gnobj > 0 ) + 
sprintf(line, "[%d];\n", gnobj); + else + sprintf(line, "[];\n"); + outstr(line); + } + outstr("#\f\n"); + outstr ("# SDF-EOH "); + /* This little bit of magic will cause the first word of data */ + /* to be aligned. This isn't required by anything, but it makes */ + /* it a lot easier to use really primitive tools like od. */ + pad = (header_len+1)%DATAALIGN; /* the +1 is to account for the '\n' */ + if( pad ) + pad = DATAALIGN - pad; + for(i=0; i 0 ) { +#if __WORDSIZE==64 + sprintf(line, "[%ld];\n", gnobj); +#else + sprintf(line, "[%lld];\n", gnobj); +#endif + } else { + sprintf(line, "[];\n"); + } + outstr(line); + } + outstr("#\f\n"); + outstr ("# SDF-EOH "); + /* This little bit of magic will cause the first word of data */ + /* to be aligned. This isn't required by anything, but it makes */ + /* it a lot easier to use really primitive tools like od. */ + pad = (header_len+1)%DATAALIGN; /* the +1 is to account for the '\n' */ + if( pad ) + pad = DATAALIGN - pad; + for(i=0; i +#include +#include +#include +#include "abm.h" +#include "protos.h" +#include "Assert.h" +#include "mpmy.h" +#include "Malloc.h" +#include "Msgs.h" +#include "verify.h" +#include "gc.h" +#include "dll.h" + +#define Ver(x) Verify((x) == MPMY_SUCCESS) + +#define ABMDONETYPE -1 +#define ABMALLDONETYPE -2 + +/* Use this for queueing entries for later packetization */ +typedef struct{ + void *arg; + int type; /* request/reply/done */ + int sz; + ABMpktz_t *func; /* how to bundle the data */ +} Qelmt_t ; + +/* Send exaclty one pkt */ +static void SendPkt(ABM *abm, void *ptr, int sz, int dest); + +/* Keep track of the resources tied up in outgoing messages */ +static void DeliveryWait(ABM *abm); +static void DeliveryTest(ABM *abm); + +/* Common subroutine between ABMPoll and ABMPollWait */ +static int ABMPoll_common(ABM *abm, MPMY_Status *status); + +/* Keep track of the messages that haven't been MPMY_Test'ed affirmative yet */ +typedef struct { + void *ptr; + MPMY_Comm_request req; +} 
undelivered_t; + +/* To be "safe", ABMFlushOne should also eliminate dest from destarr[], but + the precise behavior of the callers of this function make that + unnecessary. If you call ABMFlushOne, you must guarantee that + destarr[] is left in a correct state when you are done. */ +static void ABMFlushOne(ABM *abm, int dest); + +static int hist_enable; +Timer_t ABMDlvrTm; +Counter_t ABMIsendCnt; +Counter_t ABMPostCnt; +Counter_t ABMByteCnt; +Counter_t ABMHistCnt[ABMHISTLEN]; + +void +ABMHistEnable(int log2lo, int log2hi){ + int i; + char name[32]; + + /* Sanity check the args. Maybe we should assert these?? */ + if( log2lo < 0 ) + log2lo = 0; + if( log2hi >= ABMHISTLEN ) + log2hi = ABMHISTLEN-1; + if( log2lo > log2hi ) + log2lo = log2hi; + if( log2hi < log2lo ) + log2hi = log2lo; + for(i=log2lo; i<=log2hi; i++){ + sprintf(name, "Pkt(>=%d)", 1<pktsize = pktsize; + abm->nfuncs = nfuncs; + abm->hndlarray = Malloc(nfuncs*sizeof(ABMhndlr_t *)); + memcpy(abm->hndlarray, hndlarray, nfuncs*sizeof(ABMhndlr_t *)); + nproc = MPMY_Nproc(); + procnum = MPMY_Procnum(); + abm->doc = ilog2(nproc); + if (nproc != 1 << (abm->doc)) + abm->doc++; /* for non power-of-two sizes */ + /* This shouldn't require 10 lines of code... */ + if( procnum == 0 ){ + abm->allbitsdone = (1 << (abm->doc+1))-1; + }else if( lobit(procnum) == 0 ){ + abm->allbitsdone = 1; + }else{ + bit = 1<<(lobit(procnum)-1); + Msg("abmdone", ("bit=%d\n", bit)); + while( (bit^procnum) >= nproc ){ + Msg("abmdone", ("looping, bit=%d\n", bit)); + bit >>= 1; + } + abm->allbitsdone = (bit==0)? 
1 : (bit<<2)-1; + } + abm->done = 0; + abm->alldone = 0; + Msg("abmdone", ("ABM:allbitsdone=%d, done=%d\n", abm->allbitsdone, abm->done)); + abm->recvbufA = abm->recvbuf1 = Malloc(pktsize); + abm->recvbufB = abm->recvbuf2 = Malloc(pktsize); + abm->tag = tag; + Ver(MPMY_Irecv(abm->recvbufA, abm->pktsize, + MPMY_SOURCE_ANY, abm->tag, &abm->Recv_Hndl)); + abm->Enqueued = Malloc(sizeof(Dll)*nproc); + abm->destarr = Malloc(sizeof(int)*nproc); + abm->cntarr = Calloc(nproc, sizeof(int)); + abm->ndests = 0; + DllCreateChn(&abm->QelmtChn, sizeof(Qelmt_t), 10); + for(i=0; iEnqueued[i], &abm->QelmtChn); + } + + DllCreateChn(&abm->undelChn, sizeof(undelivered_t), 20); + DllCreate(&abm->undeliveredLL, &abm->undelChn); +} + +void +ABMIamDone(ABM *abm){ + int i, procnum; + + procnum = MPMY_Procnum(); + if( abm->done & 1 ){ + SeriousWarning("ABMIamDone apparently called twice!\n"); + Shout("allbitsdone=%d, done=%d\n", abm->allbitsdone, abm->done); + return; + } + if( abm->alldone ){ + SeriousWarning("ABMIamDone called after alldone\n"); + Shout("allbitsdone=%d, done=%d\n", abm->allbitsdone, abm->done); + return; + } + Msg("abmdone", ("ABMIamDone\n")); + ABMPost(abm, procnum, 0, ABMDONETYPE, NULL, NULL); + ABMFlushOne(abm, procnum); + for(i=abm->ndests-1; i>=0; --i){ + if( abm->destarr[i] == procnum ){ + abm->destarr[i] = abm->destarr[--abm->ndests]; + break; + } + } + assert(i >= 0); +} + +int +ABMAllDone(ABM *abm){ + return abm->alldone; +} + +void +ABMShutdown(ABM *abm){ + int i; + int junk = 0; + MPMY_Status stat; + MPMY_Comm_request req; + + while( !ABMAllDone(abm) ){ + ABMFlush(abm); + if( ABMPoll(abm) < 0 ) + Error("AbmPoll failed in ABMShutdown!\n"); + } + /* There's still a recv outstanding in NLPoll */ + Ver(MPMY_Isend(&junk, sizeof(int), MPMY_Procnum(), abm->tag, &req)); + MPMY_Wait2(req, NULL, abm->Recv_Hndl, &stat); + assert(MPMY_Source(&stat) == MPMY_Procnum() + && MPMY_Count(&stat) == sizeof(int)); + + for(i=0; iEnqueued[i]); + } + ChnTerminate(&abm->QelmtChn); + 
Free(abm->hndlarray); + Free(abm->Enqueued); + Free(abm->destarr); + Free(abm->cntarr); + Free(abm->recvbuf1); + Free(abm->recvbuf2); + DllTerminate(&abm->undeliveredLL); + ChnTerminate(&abm->undelChn); +} + +static void +Donehndlr(ABM *abm, int who){ + int mask, dest, type; + int procnum= MPMY_Procnum(); + + mask = who ^ procnum; + + if( mask == 0 ) + mask = 1; + else + mask <<= 1; + Msg("abmdone", ("Donehndlr(who=%d), mask=%x\n", who, mask)); + if( (abm->done & mask) || !(abm->allbitsdone & mask)){ + SeriousWarning("Unexepected bit set in 'done' or 'allbitsdone':\n"); + Shout("done=%x, allbitsdone=%x, who=%x, procnum=%d, mask=%x\n", + abm->done, abm->allbitsdone, who, procnum, mask); + } + abm->done |= mask; + + if( abm->done == abm->allbitsdone ){ + if( procnum == 0 ){ + type = ABMALLDONETYPE; + dest = 0; + Msg("abmdone", ("Proc 0 is done. Initiatiing ALLDONE cascade\n")); + }else{ + type = ABMDONETYPE; + dest = (1<done: %d\n", abm->alldone, abm->done)); + } +} + +static void +AllDonehndlr(ABM *abm) +{ + int chan; + int procnum = MPMY_Procnum(); + int nproc = MPMY_Nproc(); + + Msg("abmdone", ("AllDonehndlr\n")); + + for (chan = hibit(MPMY_Procnum())+1; chan < abm->doc; chan++) { + int dest = procnum ^ (1 << chan); + if (!(procnum & (1 << chan)) && dest < nproc) { + Msgf(("AllDone: informing p%d\n", dest)); + ABMPost(abm, dest, 0, ABMALLDONETYPE, NULL, NULL); + ABMFlush(abm); + } + } + assert(abm->ndests == 0); + Msg("abmdone", ("Waiting for all Isends to complete\n")); + Msg_flush(); + DeliveryWait(abm); + Msg("abmdone", ("Everybody done. 
Everybody informed\n")); + Msg_flush(); + abm->alldone = 1; +} + +int +ABMPoll(ABM *abm){ + MPMY_Status status; + int flag; + + Ver(MPMY_Test(abm->Recv_Hndl, &flag, &status)); + if( flag == 0 ) + return 0; + return ABMPoll_common(abm, &status); +} + +int +ABMPollWait(ABM *abm){ + MPMY_Status status; + + ABMFlush(abm); + if( Msg_test(__FILE__) ){ + Msg_do("Waiting in ABMPollWait\n"); + Msg_flush(); + } + Ver(MPMY_Wait(abm->Recv_Hndl, &status)); + return ABMPoll_common(abm, &status); +} + +static int +ABMPoll_common(ABM *abm, MPMY_Status *status){ + int type; + int flag; + int len; + int sz; + int src; + char *in, *end; + int nloop = 0; + + do{ + /* Testing inside the while with a "," operator breaks the T3D */ + nloop++; + if (MPMY_Tag(status) != abm->tag) + Error("Bad tag (%d) in ABMPoll(), should be %d, source %d, len %d\n", + MPMY_Tag(status), abm->tag, MPMY_Source(status), + MPMY_Count(status)); + src = MPMY_Source(status); + len = MPMY_Count(status); + Msgf(("ABMPoll Received %d byte packet from p%d\n", len, src)); + + in = abm->recvbufA; + Ver(MPMY_Irecv(abm->recvbufB, abm->pktsize, + MPMY_SOURCE_ANY, abm->tag, &abm->Recv_Hndl)); + abm->recvbufA = abm->recvbufB; + abm->recvbufB = in; + end = in + len; + while( in < end ){ + if( abm->alldone ){ + long int *ip = (long int *)in; + SeriousWarning("message arrived after alldone.\n"); + Shout("allbitsdone=%d, done=%d\n", abm->allbitsdone, + abm->done); + Shout("src=%d, total len=%d, buf=%#lx, in=%#lx\n", + src, len, (long)abm->recvbufB, (long)in); + Shout("type=%ld, message size: %ld, contents: %#lx %#lx %#lx %#lx\n", + ip[0], ip[1], ip[2], ip[3], ip[4], ip[5]); + return -1; + } + type = *(int *)in; + in += sizeof(int); + sz = *(int *)in; + in += sizeof(int); + + switch(type){ + case ABMALLDONETYPE: + Msgf(("ABMALLDONE recvd from p%d.\n", src)); + AllDonehndlr(abm); + break; + case ABMDONETYPE: + Msgf(("ABMDONE recvd from p%d.\n", src)); + Donehndlr(abm, src); + break; + default: + Msgf(("ABMPoll type %d\n", type)); 
+ if( type >= 0 && type < abm->nfuncs ) + abm->hndlarray[type](src, sz, in); + else{ + Error("Bad type, %d!\n", type); + } + break; + } + in += sz; + } + Ver(MPMY_Test(abm->Recv_Hndl, &flag, status)); + }while(flag); + return nloop; +} + +void +ABMPost(ABM *abm, int dest, int sz, int type, ABMpktz_t *func, void *arg){ + Dll *Q = &abm->Enqueued[dest]; + Qelmt_t *new; + + IncrCounter(&ABMPostCnt); + if( sz + 2*sizeof(int) > abm->pktsize){ + Error("Can't ABMPost a message of size %d. pktsize=%d\n", + sz, abm->pktsize); + } + if( DllLength(Q) == 0 ){ + abm->destarr[abm->ndests++] = dest; + } + if( abm->cntarr[dest] + sz + 2*sizeof(int) > abm->pktsize ){ + ABMFlushOne(abm, dest); + } + abm->cntarr[dest] += sz + 2*sizeof(int); + Msgf(("ABMPost %ld for p%d, cntarr now %d\n", sz + 2*sizeof(int), dest, abm->cntarr[dest])); + new = DllData( DllInsertAtBottom(Q) ); + new->sz = sz; + new->func = func; + new->arg = arg; + new->type = type; +} + +static int +cmp_xor(const void *p1, const void *p2){ + int d1 = *(int *)p1; + int d2 = *(int *)p2; + int procnum = MPMY_Procnum(); + + return (d1^procnum) - (d2^procnum); +} + +/* To be "safe", ABMFlushOne should also eliminate dest from destarr[], but + the precise behavior of the callers of this function make that + unnecessary. If you call this function, you must guarantee that + destarr[] is left in a correct state when you are done. */ +static void +ABMFlushOne(ABM *abm, int dest){ + char *cp; + void *buf; + Qelmt_t *Qelmt; + Dll *Q; + Dll_elmt *q; + int szleft, used; + + Q = &abm->Enqueued[dest]; + if( DllLength(Q) == 0 ) + return; + +#if 0 + /* This loop is wasting time. If we get here from ABMFlush, then + we should already know which element in destarr is in question. + If we get here from ABMPost, then we know the very next thing + we are going to do is put the same destination back in destarr. + Thus, we can just forget this altogether! 
*/ + for(i=0; i<abm->ndests; i++){ + if( abm->destarr[i] == dest ) + break; + } + assert(i < abm->ndests); /* we actually found one! */ + abm->destarr[i] = abm->destarr[--abm->ndests]; +#endif + + Msgf(("ABMFl to p%d\n", dest)); + cp = buf = Malloc(abm->pktsize); + szleft = abm->pktsize; + + for(q = DllTop(Q); q!=DllInf(Q); q=DllDeleteDown(Q, q)){ + Qelmt = (Qelmt_t *)DllData(q); + if( Qelmt->sz + 2*sizeof(int) > szleft ){ + /* Now that ABMPost flushes automatically when the count + passes pktsize, this test may be overkill. It's not worth + simplifying. */ + used = abm->pktsize - szleft; + buf = Realloc(buf, used); + Msgf(("SendQ full packet for p%d (len=%d)\n", + dest, used)); + SendPkt(abm, buf, used, dest); + cp = buf = Malloc(abm->pktsize); + szleft = abm->pktsize; + } + /* I think it would be possible to recover here if we + introduce another TYPE of message 'NLPKTGROWTYPE' which + tells the recipient to Realloc his receive buf. A good project + for a rainy day.... */ + assert(Qelmt->sz + 2*sizeof(int) <= szleft && Qelmt->sz >= 0 ); + *(int *)cp = Qelmt->type; + cp += sizeof(int); + szleft -= sizeof(int); + *(int *)cp = Qelmt->sz; + cp += sizeof(int); + szleft -= sizeof(int); + if( Qelmt->func ){ + Qelmt->func(cp, Qelmt->arg, Qelmt->sz); + cp += Qelmt->sz; + szleft -= Qelmt->sz; + } + } + assert(szleft < abm->pktsize); + used = abm->pktsize - szleft; + Msgf(("SendQ: Remaining packet for p%d (len=%d)\n", + dest, used)); + SendPkt(abm, buf, used, dest); + abm->cntarr[dest] = 0; +} + +void +ABMFlush(ABM *abm){ + int i; + + if (abm->ndests == 0) return; /* Required to avoid huge overhead. */ + + /* reorder the destinations so everybody doesn't immediately dump on 0 ! */ + qsort(abm->destarr, abm->ndests, sizeof(int), cmp_xor); /* unnecessary? */ + + /* loop over the destinations that have something queued. 
*/ + for(i=0; i<abm->ndests; i++){ + ABMFlushOne(abm, abm->destarr[i]); + } + abm->ndests = 0; + DeliveryTest(abm); +} + + +static +void SendPkt(ABM *abm, void *ptr, int sz, int dest){ + undelivered_t *new; + + new = DllData(DllInsertAtTop(&abm->undeliveredLL)); + new->ptr = ptr; + AddCounter(&ABMByteCnt, sz); + IncrCounter(&ABMIsendCnt); + if( hist_enable ){ + int h = ilog2(sz); + /* If I weren't so lazy, I'd record the outliers separately... */ + if( h < ABMHISTFIRST ) h = ABMHISTFIRST; + if( h >=ABMHISTLEN ) h = ABMHISTLEN-1; + /* We could just increment these by 1, or by sz. You learn + something slightly different either way... */ + AddCounter(&ABMHistCnt[h], sz); + } + Ver(MPMY_Isend(ptr, sz, dest, abm->tag, &new->req)); + /* Calling DeliveryTest here is not strictly necessary for correctness, + but it is a good idea to try to reduce the number of outstanding + requests. See the comment near DeliveryTest for other ideas */ + DeliveryTest(abm); +} + +/* It might be useful to have two versions of this. The one called + from ABMFlush (and hence at user request) should be aggressive and try + each and every outstanding request. The one called from ABMFlushOne, + (and hence more frequently, but perhaps asynchronously), could give up + after the first failure. This would keep those machines that need + constant prodding working, without putting an unnecessary burden on + others. */ +static void DeliveryTest(ABM *abm){ + int flag; + undelivered_t *p; + Dll_elmt *pp; + + /* Another good opportunity to call Flick? This is done once per + ABMFlush. 
*/ + StartTimer(&ABMDlvrTm); + MPMY_Flick(); + for(pp=DllBottom(&abm->undeliveredLL); + pp != DllSup(&abm->undeliveredLL); + /* pp incremented inside body */){ + p = DllData(pp); + Ver( MPMY_Test(p->req, &flag, NULL) ); + if( flag ){ + Free(p->ptr); + pp = DllDeleteUp(&abm->undeliveredLL, pp); + }else{ + pp = DllUp(pp); + } + } + StopTimer(&ABMDlvrTm); +} + +static void DeliveryWait(ABM *abm){ + Msgf(("ABMWait\n")); + while( DllLength(&abm->undeliveredLL) ){ + DeliveryTest(abm); + } +} + diff --git a/external/libsdf/libsw/alloca.c b/external/libsdf/libsw/alloca.c new file mode 100644 index 0000000..854abb1 --- /dev/null +++ b/external/libsdf/libsw/alloca.c @@ -0,0 +1,204 @@ +/* + alloca -- (mostly) portable public-domain implementation -- D A Gwyn + + last edit: 93/06/06 johns + use Malloc instead of xmalloc. + + last edit: 86/05/30 rms + include config.h, since on VMS it renames some symbols. + Use xmalloc instead of malloc. + + This implementation of the PWB library alloca() function, + which is used to allocate space off the run-time stack so + that it is automatically reclaimed upon procedure exit, + was inspired by discussions with J. Q. Johnson of Cornell. + + It should work under any C implementation that uses an + actual procedure stack (as opposed to a linked list of + frames). There are some preprocessor constants that can + be defined when compiling for your specific system, for + improved efficiency; however, the defaults should be okay. + + The general concept of this implementation is to keep + track of all alloca()-allocated blocks, and reclaim any + that are found to be deeper in the stack than the current + invocation. This heuristic does not reclaim storage as + soon as it becomes invalid, but it will do so eventually. + + As a special case, alloca(0) reclaims storage without + allocating any. It is a good idea to use alloca(0) in + your main control loop, etc. to force garbage collection. 
+*/ +#ifndef lint +static char SCCSid[] = "@(#)alloca.c 1.1"; /* for the "what" utility */ +#endif + +#ifdef emacs +#include "config.h" +#ifdef static +/* actually, only want this if static is defined as "" + -- this is for usg, in which emacs must undefine static + in order to make unexec workable + */ +#ifndef STACK_DIRECTION +you +lose +-- must know STACK_DIRECTION at compile-time +#endif /* STACK_DIRECTION undefined */ +#endif /* static */ +#endif /* emacs */ + +#ifndef alloca /* If compiling with GCC, this file's not needed. */ + +#ifdef __STDC__ +typedef void *pointer; /* generic pointer type */ +#else +typedef char *pointer; /* generic pointer type */ +#endif + +#define NULL 0 /* null pointer constant */ + +#if 0 +extern void free(); +extern pointer xmalloc(); +#else +/* Johns mods for swutils library. */ +#define xmalloc Malloc +#define free Free +#include "Malloc.h" +#endif + +/* + Define STACK_DIRECTION if you know the direction of stack + growth for your system; otherwise it will be automatically + deduced at run-time. 
+ + STACK_DIRECTION > 0 => grows toward higher addresses + STACK_DIRECTION < 0 => grows toward lower addresses + STACK_DIRECTION = 0 => direction of growth unknown +*/ + +#ifndef STACK_DIRECTION +#define STACK_DIRECTION 0 /* direction unknown */ +#endif + +#if STACK_DIRECTION != 0 + +#define STACK_DIR STACK_DIRECTION /* known at compile-time */ + +#else /* STACK_DIRECTION == 0; need run-time code */ + +static int stack_dir; /* 1 or -1 once known */ +#define STACK_DIR stack_dir + +static void +find_stack_direction (/* void */) +{ + static char *addr = NULL; /* address of first + `dummy', once known */ + auto char dummy; /* to get stack address */ + + if (addr == NULL) + { /* initial entry */ + addr = &dummy; + + find_stack_direction (); /* recurse once */ + } + else /* second entry */ + if (&dummy > addr) + stack_dir = 1; /* stack grew upward */ + else + stack_dir = -1; /* stack grew downward */ +} + +#endif /* STACK_DIRECTION == 0 */ + +/* + An "alloca header" is used to: + (a) chain together all alloca()ed blocks; + (b) keep track of stack depth. + + It is very important that sizeof(header) agree with malloc() + alignment chunk size. The following default should work okay. +*/ + +#ifndef ALIGN_SIZE +#define ALIGN_SIZE sizeof(double) +#endif + +typedef union hdr +{ + char align[ALIGN_SIZE]; /* to force sizeof(header) */ + struct + { + union hdr *next; /* for chaining headers */ + char *deep; /* for stack depth measure */ + } h; +} header; + +/* + alloca( size ) returns a pointer to at least `size' bytes of + storage which will be automatically reclaimed upon exit from + the procedure that called alloca(). Originally, this space + was supposed to be taken from the current stack frame of the + caller, but that method cannot be made to work for some + implementations of C, for example under Gould's UTX/32. 
+*/ + +static header *last_alloca_header = NULL; /* -> last alloca header */ + +pointer +alloca (size) /* returns pointer to storage */ + unsigned size; /* # bytes to allocate */ +{ + auto char probe; /* probes stack depth: */ + register char *depth = &probe; + +#if STACK_DIRECTION == 0 + if (STACK_DIR == 0) /* unknown growth direction */ + find_stack_direction (); +#endif + + /* Reclaim garbage, defined as all alloca()ed storage that + was allocated from deeper in the stack than currently. */ + + { + register header *hp; /* traverses linked list */ + + for (hp = last_alloca_header; hp != NULL;) + if ((STACK_DIR > 0 && hp->h.deep > depth) + || (STACK_DIR < 0 && hp->h.deep < depth)) + { + register header *np = hp->h.next; + + free ((pointer) hp); /* collect garbage */ + + hp = np; /* -> next header */ + } + else + break; /* rest are not deeper */ + + last_alloca_header = hp; /* -> last valid storage */ + } + + if (size == 0) + return NULL; /* no allocation required */ + + /* Allocate combined header + user data storage. */ + + { + register pointer new = xmalloc (sizeof (header) + size); + /* address of header */ + + ((header *)new)->h.next = last_alloca_header; + ((header *)new)->h.deep = depth; + + last_alloca_header = (header *)new; + + /* User storage begins just after header. 
*/ + + return (pointer)((char *)new + sizeof(header)); + } +} + +#endif /* no alloca */ diff --git a/external/libsdf/libsw/batch.c b/external/libsdf/libsw/batch.c new file mode 100644 index 0000000..746dd66 --- /dev/null +++ b/external/libsdf/libsw/batch.c @@ -0,0 +1,82 @@ +/* batch.c: Collect a series of small sends into larger ones */ + +#include "batch.h" +#include "mpmy.h" +#include "stk.h" +#include "Malloc.h" +#include "Msgs.h" + +void PollWait(MPMY_Comm_request req, int tag); + +static Stk **stks; +static int tag; +static int batch_size; + +void +SetupBatch(int ttag, int size) +{ + int dest; + + Msgf(("SetupBatch: tag %d size %d\n", ttag, size)); + tag = ttag; + batch_size = size; + stks = Calloc(MPMY_Nproc(), sizeof(Stk *)); + /* allocate all memory beforehand */ + /* Otherwise the incoming poll buffer will fight with the batch stks */ + /* for heap space, and we end up with a bunch of holes in the heap */ + for (dest = 0; dest < MPMY_Nproc(); dest++) { + stks[dest] = Malloc(sizeof(Stk)); + StkInit(stks[dest], batch_size, Realloc_f, 0); + StkPushType(stks[dest], dest, int); + } +} + +void +FinishBatch(void) +{ + int i; + MPMY_Comm_request req; + int nproc = MPMY_Nproc(); + int procnum = MPMY_Procnum(); + int procs_per_node = MPMY_ProcsPerNode(); + int my_master = (procnum / procs_per_node) * procs_per_node; + int ddest; + + for (i = 0; i < nproc; i++) { + Stk *s = stks[i]; + if (StkSz(s) > sizeof(int)) { + ddest = (i / procs_per_node) * procs_per_node; + if (ddest == my_master) ddest = i; + Msgf(("SendBatch: %ld to %d via %d\n", StkSz(s), i, ddest)); + MPMY_Isend(StkBase(s), StkSz(s), ddest, tag, &req); + PollWait(req, tag); + } + StkTerminate(s); + Free(s); + } + Free(stks); + Msgf(("FinishBatch done\n")); +} + +void +SendBatch(void *outbuf, int size, int dest) +{ + Stk *s = stks[dest]; + + StkPushData(s, outbuf, size); + if (StkSz(s) > batch_size - size) { + MPMY_Comm_request req; + int ddest, my_master; + int procs_per_node = MPMY_ProcsPerNode(); + int 
procnum = MPMY_Procnum(); + + ddest = (dest / procs_per_node) * procs_per_node; + my_master = (procnum / procs_per_node) * procs_per_node; + if (ddest == my_master) ddest = dest; + Msgf(("SendBatch: %ld to %d via %d\n", StkSz(s), dest, ddest)); + MPMY_Isend(StkBase(s), StkSz(s), ddest, tag, &req); + PollWait(req, tag); + StkClear(s); + StkPushType(s, dest, int); + } +} diff --git a/external/libsdf/libsw/bcopy.c b/external/libsdf/libsw/bcopy.c new file mode 100644 index 0000000..20855f5 --- /dev/null +++ b/external/libsdf/libsw/bcopy.c @@ -0,0 +1,6 @@ +#include <string.h> + +void bcopy(void *b1, void *b2, int length){ + memmove(b2, b1, length); +} + diff --git a/external/libsdf/libsw/byteswap.c b/external/libsdf/libsw/byteswap.c new file mode 100644 index 0000000..4d2bdad --- /dev/null +++ b/external/libsdf/libsw/byteswap.c @@ -0,0 +1,127 @@ +/* Functions to do general byte swapping: */ + +#include <string.h> +#include "byteswap.h" + +static int swap2(int, void *, void *); +static int swap4(int, void *, void *); +static int swap8(int, void *, void *); +static int swapgen(int, int, void *, void *); + +/* A general, in-place, byte-swapper. */ +/* It swaps a total of unit_len*n_units bytes, unit_len bytes at a time. */ +/* Thus, you can use it for arrays of doubles with unit_len=8, or */ +/* arrays of chars with unit_len=1 (which is equivalent to memcpy). */ +/* It checks for stupid arguments. It works fine when */ +/* from and to are identical. It breaks if from and to are */ +/* almost the same. */ +int Byteswap(int unit_len, int n_units, void *from, void *to) +{ + int ptrdiff; + + if(n_units < 0 || unit_len < 0){ + return -1; + } + + /* This isn't ANSI conforming. I don't think it can be. */ + /* It's the canonical "You can't write memcpy in ANSI C because */ + /* you can't compare the pointers and learn anything reliable." */ + /* problem. The compiler is free to return nonsense for ptrdiff. 
*/ + ptrdiff = (char *)from - (char *)to; + if( ptrdiff != 0 && (ptrdiff < unit_len*n_units && ptrdiff > -unit_len) ){ + return -1; + } + + switch(unit_len){ + case 1: + memcpy(to, from, n_units); + return 0; + case 2: + return swap2(n_units, from, to); + case 4: + return swap4(n_units, from, to); + case 8: + return swap8(n_units, from, to); + default: + return swapgen(unit_len, n_units, from, to); + } +} + +static int swap2(int n, void *from, void *to) +{ + char *fromc = from; + char *toc = to; + char tmp; + + while(n--){ + tmp = fromc[0]; + toc[0] = fromc[1]; + toc[1] = tmp; + fromc+= 2; + toc += 2; + } + return 0; +} + +static int swap4(int n, void *from, void *to) +{ + char *toc = to; + char *fromc = from; + char tmp; + + while(n--){ + tmp = fromc[3]; + toc[3] = fromc[0]; + toc[0] = tmp; + tmp = fromc[2]; + toc[2] = fromc[1]; + toc[1] = tmp; + fromc += 4; + toc += 4; + } + return 0; +} + +static int swap8(int n, void *from, void *to) +{ + char *toc = to; + char *fromc = from; + char tmp; + + while(n--){ + tmp = fromc[7]; + toc[7] = fromc[0]; + toc[0] = tmp; + tmp = fromc[6]; + toc[6] = fromc[1]; + toc[1] = tmp; + tmp = fromc[5]; + toc[5] = fromc[2]; + toc[2] = tmp; + tmp = fromc[4]; + toc[4] = fromc[3]; + toc[3] = tmp; + fromc += 8; + toc += 8; + } + return 0; +} + +static int swapgen(int unit_len, int nunits, void *from, void *to) +{ + char *toc = to; + char *fromc = from; + char tmp; + int i; + + while(nunits--){ + for(i=0; i +#include "Assert.h" +#include "chn.h" +#include "Msgs.h" +#include "malloc.h" + +/* The Chain macro is used to chain together 'freed' nodes. */ +/* We write a pointer on top of the first word and a magic */ +/* number on top of the second word. */ +#if !defined(ChnNext) || !defined(ChnMagic) || !defined(ChnMAGIC) + # error Chn macros undefined +#endif + +/* STOLEN FROM OBSTACK.C */ +#ifdef __STDC__ +#define PTR_INT_TYPE ptrdiff_t +#else +#define PTR_INT_TYPE long +#endif +/* Determine default alignment. 
*/ +struct fooalign {char x; double d;}; +#define DEFAULT_ALIGNMENT \ + ((PTR_INT_TYPE) ((char *)&((struct fooalign *) 0)->d - (char *)0)) +/* END OF STOLEN CODE */ +#define ALIGNMENT_MASK (DEFAULT_ALIGNMENT-1) +#define Align(x) ((x)+ALIGNMENT_MASK)&(~(ALIGNMENT_MASK)) +#define ChnHDRSZ (Align(sizeof(int)+sizeof(void *))) + +void ChnInit(Chn *new, int sz, int nalloc, + void *(*realloc_like)(void *, size_t)) +{ + if(sz < ChnHDRSZ) + sz = ChnHDRSZ; + + new->sz = Align(sz); + Msgf(("chn=%#lx, sz=%d, new->sz=%d\n", (unsigned long)new, sz, new->sz)); + new->nalloc = nalloc; + new->free_list = NULL; + new->free_cnt = 0; + new->nmalloced = 0; + new->realloc_like = realloc_like; + new->first_chunk = NULL; +} + +void ChnFreeAll(Chn *id) +{ + void *chunk, *nextchunk; + + Msgf(("ChnFreeAll(%#lx)\n", (unsigned long)id)); + for(chunk=id->first_chunk; chunk ; chunk = nextchunk){ + nextchunk = ChnNext(chunk); + Msgf(("ChnFree(%#lx)\n", (unsigned long)chunk)); + id->realloc_like(chunk, 0); + } + id->free_cnt = 0; + id->free_list = NULL; + id->nmalloced = 0; +} + +void ChnTerminate(Chn *id) +{ + ChnFreeAll(id); + id->realloc_like = NULL; +} + +int ChnCheck(Chn *id) +/* Return < 0 if there is something wrong with the freelist. */ +/* Return 0 if all is ok. */ +{ + int i=0; + void *p = id->free_list; + + while(p){ + if(ChnMagic(p) != ChnMAGIC) + return -1; + /* More nodes on the freelist than free_cnt claims. */ + if(i++ >= id->free_cnt) + return -2; + p = ChnNext(p); + } + return 0; +} + +/* + * ChnMoreMem: grab some more memory for a Chn. 
+ */ +int ChnMoreMem(Chn *id) +{ + void *newchunk; + int i, sz, nnew; + char *p; + + if( id->free_cnt == 0 && id->free_list != NULL ){ + SeriousWarning("Impossible situation in ChnMoreMem, freecnt=0, free_list = %#lx\n", (unsigned long)id->free_list); + malloc_print(); + } + + nnew = id->nalloc; + sz = id->sz; + while( (newchunk=id->realloc_like(NULL, nnew*sz + ChnHDRSZ)) == 0 && + nnew > 0 ) + nnew /= 2; + if(newchunk == 0){ + return -1; + } + Msgf(("ChnMoreMem(chn=%#lx, newsz=%ld, address=%#lx)\n", + (unsigned long)id, (unsigned long)(nnew*sz+ChnHDRSZ), + (unsigned long)newchunk)); + + id->nmalloced += nnew; + ChnNext(newchunk) = id->first_chunk; + ChnMagic(newchunk) = ChnMAGIC; + id->first_chunk = newchunk; + newchunk = (char *)newchunk + ChnHDRSZ; + + p = (char *)newchunk; + for(i=0; ifree_list; + ChnMagic(p) = ChnMAGIC; + id->free_list = newchunk; + id->free_cnt += nnew; + return 0; +} + +/* + * end of: CHN.C + */ diff --git a/external/libsdf/libsw/class_params.c b/external/libsdf/libsw/class_params.c new file mode 100644 index 0000000..2d89495 --- /dev/null +++ b/external/libsdf/libsw/class_params.c @@ -0,0 +1,96 @@ +#include "class.h" +#include "cosmo.h" +#include "Malloc.h" +#include "error.h" + +#define class_fail(function, \ + error_message_from_function, \ + error_message_output) \ + do { \ + if (function == _FAILURE_) { \ + ErrorMsg Transmit_Error_Message; \ + sprintf(Transmit_Error_Message,"%s(L:%d) : error in %s;\n=>%s", \ + __func__,__LINE__,#function,error_message_from_function); \ + Error("%s",Transmit_Error_Message); \ + } \ + } while(0); + +struct class_s { + struct precision pr; /* for precision parameters */ + struct background ba; /* for cosmological background */ + struct thermo th; /* for thermodynamics */ + struct perturbs pt; /* for source functions */ + struct bessels bs; /* for bessel functions */ + struct transfers tr; /* for transfer functions */ + struct primordial pm; /* for primordial spectra */ + struct spectra sp; /* for output 
spectra */ + struct nonlinear nl; /* for non-linear spectra */ + struct lensing le; /* for lensed spectra */ + struct output op; /* for output files */ +}; + +void +class_params(cosmology *c, char *class_ini) +{ + struct file_content fc; + struct class_s *p; + double *pvec; + ErrorMsg errmsg; + + fc.size = 0; + + p = Calloc(1,sizeof(struct class_s)); + + class_fail(parser_read_file(class_ini,&fc,errmsg), + errmsg,errmsg); + + class_fail(input_init(&fc, &p->pr, &p->ba, &p->th, &p->pt, &p->bs, &p->tr, + &p->pm, &p->sp, &p->nl, &p->le, &p->op, errmsg), + errmsg, errmsg); + + class_fail(parser_free(&fc),errmsg,errmsg); + + p->ba.background_verbose = 0; + + class_fail(background_init(&p->pr, &p->ba),errmsg,errmsg); + + pvec = Malloc(p->ba.bg_size*sizeof(double)); + class_fail(background_functions(&p->ba, 1.0, p->ba.long_info, pvec), + errmsg, errmsg); + c->Omega0_m = pvec[p->ba.index_bg_Omega_m]; + c->Omega0_r = pvec[p->ba.index_bg_Omega_r]; + Free(pvec); + + c->Omega0 = p->ba.Omega0_g + p->ba.Omega0_b; + if (p->ba.has_cdm == _TRUE_) { + c->Omega0 += p->ba.Omega0_cdm; + } + if (p->ba.has_ncdm == _TRUE_) { + c->Omega0 += p->ba.Omega0_ncdm_tot; + } + if (p->ba.has_lambda == _TRUE_) { + c->Omega0 += p->ba.Omega0_lambda; + } + if (p->ba.has_fld == _TRUE_) { + c->Omega0 += p->ba.Omega0_fld; + } + if (p->ba.has_ur == _TRUE_) { + c->Omega0 += p->ba.Omega0_ur; + } + c->h_100 = p->ba.h; + c->H0 = p->ba.H0*_Gyr_over_Mpc_; + c->Omega0_cdm = p->ba.Omega0_cdm; + c->Omega0_ncdm_tot = p->ba.Omega0_ncdm_tot; + c->Omega0_b = p->ba.Omega0_b; + c->Omega0_g = p->ba.Omega0_g; + c->Omega0_ur = p->ba.Omega0_ur; + c->Omega0_lambda = p->ba.Omega0_lambda; + c->Omega0_fld = p->ba.Omega0_fld; + c->w0_fld = p->ba.w0_fld; + c->wa_fld = p->ba.wa_fld; + c->age = p->ba.age; + c->Gnewt = GM_cgs*(g_Msol10/g_Msol)*pow(sec_Gyr, 2)/pow(cm_kpc,3); + + class_fail(background_free(&p->ba),errmsg,errmsg); + Free(p); +} diff --git a/external/libsdf/libsw/class_wrap.c b/external/libsdf/libsw/class_wrap.c new file 
mode 100644 index 0000000..dbe886b --- /dev/null +++ b/external/libsdf/libsw/class_wrap.c @@ -0,0 +1,366 @@ +#include "class.h" +#include "cosmo.h" +#include "Malloc.h" +#include "error.h" + +#define ERRTOL 5e-8 + +#define class_fail(function, \ + error_message_from_function, \ + error_message_output) \ + do { \ + if (function == _FAILURE_) { \ + ErrorMsg Transmit_Error_Message; \ + sprintf(Transmit_Error_Message,"%s(L:%d) : error in %s;\n=>%s", \ + __func__,__LINE__,#function,error_message_from_function); \ + Error("%s",Transmit_Error_Message); \ + } \ + } while(0); + + +struct class_s { + struct precision pr; /* for precision parameters */ + struct background ba; /* for cosmological background */ + struct thermo th; /* for thermodynamics */ + struct perturbs pt; /* for source functions */ + struct bessels bs; /* for bessel functions */ + struct transfers tr; /* for transfer functions */ + struct primordial pm; /* for primordial spectra */ + struct spectra sp; /* for output spectra */ + struct nonlinear nl; /* for non-linear spectra */ + struct lensing le; /* for lensed spectra */ + struct output op; /* for output files */ +}; + +static void +class_background_at_tau(cosmology *c, double tau) +{ + struct class_s *p = c->private; + int idx; + double *pvec; + ErrorMsg errmsg; + + pvec = Malloc(p->ba.bg_size*sizeof(double)); + class_fail(background_at_tau(&p->ba, tau, p->ba.long_info, p->ba.inter_normal, + &idx, pvec), errmsg, errmsg); + + c->a = pvec[p->ba.index_bg_a]; + c->z = 1.0/c->a - 1.0; + c->t = pvec[p->ba.index_bg_time]/_Gyr_over_Mpc_; + c->tau = tau; + c->H = pvec[p->ba.index_bg_H]*_Gyr_over_Mpc_; + c->conf_distance = pvec[p->ba.index_bg_conf_distance]*1000.0; /* kpc */ + c->kick = pvec[p->ba.index_bg_kick]/_Gyr_over_Mpc_; + c->drift = pvec[p->ba.index_bg_drift]/_Gyr_over_Mpc_; + c->Omega_r = pvec[p->ba.index_bg_Omega_r]; + c->Omega_m = pvec[p->ba.index_bg_Omega_m]; + Free(pvec); +} + + +static void +class_background_at_z(cosmology *c, double z) +{ + struct 
class_s *p = c->private; + double tau; + double err; + ErrorMsg errmsg; + + class_fail(background_tau_of_z(&p->ba, z, &tau), errmsg, errmsg); + class_background_at_tau(c, tau); + err = (1.0+c->z)/(1.0+z) - 1.0; + if (fabs(err) > ERRTOL) + Error("Poor precision in background_at_z, relerr = %g\n", err); +} + +static void +class_background_at_t(cosmology *c, double t) +{ + struct class_s *p = c->private; + double tau; + double err; + ErrorMsg errmsg; + + class_fail(background_tau_of_t(&p->ba, t*_Gyr_over_Mpc_, &tau), + errmsg, errmsg); + class_background_at_tau(c, tau); + err = c->t/t - 1.0; + if (fabs(err) > ERRTOL) + Error("Poor precision in background_at_t, relerr = %g\n", err); +} + +static double +class_t_at_z(cosmology *c, double z) +{ + if (c->z != z) class_background_at_z(c, z); + return(c->t); +} + +static double +class_z_at_t(cosmology *c, double t) +{ + if (c->t != t) class_background_at_t(c, t); + return(c->z); +} + +static double +class_a_at_t(cosmology *c, double t) +{ + if (c->t != t) class_background_at_t(c, t); + return(c->a); +} + +static double +class_t_at_a(cosmology *c, double a) +{ + if (c->a != a) class_background_at_z(c, 1.0/a-1.0); + return(c->t); +} + +static double +class_H_at_z(cosmology *c, double z) +{ + struct class_s *p = c->private; + double *pvec; + ErrorMsg errmsg; + double a = 1.0/(1.0+z); + double H; + + pvec = Malloc(p->ba.bg_size_normal*sizeof(double)); + class_fail(background_functions(&p->ba, a, p->ba.short_info, pvec), + errmsg, errmsg); + H = pvec[p->ba.index_bg_H]*_Gyr_over_Mpc_; + Free(pvec); + return(H); +} + +static double +class_H_at_t(cosmology *c, double t) +{ + if (c->t != t) class_background_at_t(c, t); + return(c->H); +} + + +static double +class_conformal_distance_at_z(cosmology *c, double z) +{ + if (c->z != z) class_background_at_z(c, z); + return(c->conf_distance); +} + +static double +class_conformal_distance_at_t(cosmology *c, double t) +{ + if (c->t != t) class_background_at_t(c, t); + 
return(c->conf_distance); +} + +static double +class_angular_diameter_distance_at_z(cosmology *c, double z) +{ + if (c->z != z) class_background_at_z(c, z); + return(c->conf_distance/(1.0+z)); +} + +static double +class_angular_diameter_distance_at_t(cosmology *c, double t) +{ + if (c->t != t) class_background_at_t(c, t); + return(c->conf_distance/(1.0+c->z)); +} + +static double +class_luminosity_distance_at_z(cosmology *c, double z) +{ + if (c->z != z) class_background_at_z(c, z); + return(c->conf_distance*(1.0+z)); +} + +static double +class_luminosity_distance_at_t(cosmology *c, double t) +{ + if (c->t != t) class_background_at_t(c, t); + return(c->conf_distance*(1.0+c->z)); +} + +static double +class_growthfac_at_z(cosmology *c, double z) +{ + struct class_s *p = c->private; + double pk, pk0, *pk_ic=NULL; + double k = 1e-7; + ErrorMsg errmsg; + + class_fail(spectra_pk_at_k_and_z(&p->ba, &p->pm, &p->sp, k, 0.0, &pk0, pk_ic), + errmsg, errmsg); + + class_fail(spectra_pk_at_k_and_z(&p->ba, &p->pm, &p->sp, k, z, &pk, pk_ic), + errmsg, errmsg); + + return sqrt(pk/pk0); +} + +static double +class_growthfac_at_t(cosmology *c, double t) +{ + return(class_growthfac_at_z(c, class_z_at_t(c, t))); +} + +static double +class_kick_t0_t1(cosmology *c, double t0, double t1) +{ + double k0, k1; + class_background_at_t(c, t0); + k0 = c->kick; + class_background_at_t(c, t1); + k1 = c->kick; + return(k1-k0); +} + + +static double +class_drift_t0_t1(cosmology *c, double t0, double t1) +{ + double d0, d1; + class_background_at_t(c, t0); + d0 = c->drift; + class_background_at_t(c, t1); + d1 = c->drift; + return(d1-d0); +} + +static void +class_free(cosmology *c) +{ + struct class_s *p = c->private; + ErrorMsg errmsg; + + class_fail(spectra_free(&p->sp),errmsg,errmsg); + class_fail(primordial_free(&p->pm),errmsg,errmsg); + class_fail(transfer_free(&p->tr),errmsg,errmsg); + class_fail(perturb_free(&p->pt),errmsg,errmsg); + class_fail(thermodynamics_free(&p->th),errmsg,errmsg); + 
class_fail(background_free(&p->ba),errmsg,errmsg); + Free(p); +} + + +void +class_init(cosmology *c, char *class_ini, char *class_pre, double zmax) +{ + struct file_content fc; + struct file_content fc_input; + struct file_content fc_precision; + struct class_s *p; + ErrorMsg errmsg; + + fc.size = 0; + fc_input.size = 0; + fc_precision.size = 0; + + memset(c, 0, sizeof(cosmology)); + p = Calloc(1,sizeof(struct class_s)); + + if (class_ini) + class_fail(parser_read_file(class_ini,&fc_input,errmsg), + errmsg,errmsg); + + if (class_pre) + class_fail(parser_read_file(class_pre,&fc_precision,errmsg), + errmsg, errmsg); + + if (class_ini || class_pre) + class_fail(parser_cat(&fc_input,&fc_precision,&fc,errmsg), + errmsg, errmsg); + + class_fail(parser_free(&fc_input),errmsg,errmsg); + class_fail(parser_free(&fc_precision),errmsg,errmsg); + + class_fail(input_init(&fc, &p->pr, &p->ba, &p->th, &p->pt, &p->bs, &p->tr, + &p->pm, &p->sp, &p->nl, &p->le, &p->op, errmsg), + errmsg, errmsg); + + class_fail(parser_free(&fc),errmsg,errmsg); + + p->ba.background_verbose = 0; + p->th.thermodynamics_verbose = 0; + p->pt.perturbations_verbose = 0; + p->tr.transfer_verbose = 0; + p->pm.primordial_verbose = 0; + p->sp.spectra_verbose = 0; + + if (p->sp.z_max_pk < zmax) p->sp.z_max_pk = zmax; + + class_fail(background_init(&p->pr, &p->ba),errmsg,errmsg); + + class_fail(thermodynamics_init(&p->pr,&p->ba,&p->th),errmsg,errmsg); + + class_fail(perturb_init(&p->pr,&p->ba,&p->th,&p->pt),errmsg,errmsg); + + class_fail(transfer_init(&p->pr,&p->ba,&p->th,&p->pt,&p->bs,&p->tr), + errmsg,errmsg); + + class_fail(primordial_init(&p->pr,&p->pt,&p->pm),errmsg,errmsg); + + class_fail(spectra_init(&p->pr,&p->ba,&p->pt,&p->tr,&p->pm,&p->sp), + errmsg,errmsg); + + c->Omega0 = p->ba.Omega0_g + p->ba.Omega0_b; + if (p->ba.has_cdm == _TRUE_) { + c->Omega0 += p->ba.Omega0_cdm; + } + if (p->ba.has_ncdm == _TRUE_) { + c->Omega0 += p->ba.Omega0_ncdm_tot; + } + if (p->ba.has_lambda == _TRUE_) { + c->Omega0 += 
p->ba.Omega0_lambda; + } + if (p->ba.has_fld == _TRUE_) { + c->Omega0 += p->ba.Omega0_fld; + } + if (p->ba.has_ur == _TRUE_) { + c->Omega0 += p->ba.Omega0_ur; + } + c->h_100 = p->ba.h; + c->H0 = p->ba.H0*_Gyr_over_Mpc_; + c->Omega0_cdm = p->ba.Omega0_cdm; + c->Omega0_ncdm_tot = p->ba.Omega0_ncdm_tot; + c->Omega0_b = p->ba.Omega0_b; + c->Omega0_g = p->ba.Omega0_g; + c->Omega0_ur = p->ba.Omega0_ur; + c->Omega0_lambda = p->ba.Omega0_lambda; + c->Omega0_fld = p->ba.Omega0_fld; + c->w0_fld = p->ba.w0_fld; + c->wa_fld = p->ba.wa_fld; + c->age = p->ba.age; + c->Gnewt = GNEWT; + + c->private = p; + class_background_at_z(c, 0.0); + c->Omega0_m = c->Omega_m; + c->Omega0_r = c->Omega_r; + + /* Function pointers */ + c->background_at_z = class_background_at_z; + c->background_at_t = class_background_at_t; + c->background_at_tau = class_background_at_tau; + c->t_at_z = class_t_at_z; + c->z_at_t = class_z_at_t; + c->a_at_t = class_a_at_t; + c->t_at_a = class_t_at_a; + c->H_at_z = class_H_at_z; + c->H_at_t = class_H_at_t; + c->conformal_distance_at_z = class_conformal_distance_at_z; + c->conformal_distance_at_t = class_conformal_distance_at_t; + c->angular_diameter_distance_at_z = class_angular_diameter_distance_at_z; + c->angular_diameter_distance_at_t = class_angular_diameter_distance_at_t; + c->luminosity_distance_at_z = class_luminosity_distance_at_z; + c->luminosity_distance_at_t = class_luminosity_distance_at_t; + c->growthfac_at_z = class_growthfac_at_z; + c->growthfac_at_t = class_growthfac_at_t; + c->kick_t0_t1 = class_kick_t0_t1; + c->drift_t0_t1 = class_drift_t0_t1; + c->free = class_free; +} + diff --git a/external/libsdf/libsw/cosmo.c b/external/libsdf/libsw/cosmo.c new file mode 100644 index 0000000..09e84ee --- /dev/null +++ b/external/libsdf/libsw/cosmo.c @@ -0,0 +1,288 @@ +#include +#include +#include "Msgs.h" +#include "qromo.h" +#include "cosmo.h" + +static struct cosmo_s C; + +static double +adot(double a) +{ + return C.H0*sqrt(C.Omega_m/a + C.Omega_r/(a*a) + 
C.Lambda*a*a + (1.0 - C.Omega0 - C.Lambda)); +} + +static double +addot(double a) +{ /* factor of a? */ + return C.H0 * C.H0 * (C.Lambda*a-C.Omega_r/(a*a*a)-0.5*C.Omega_m/(a*a)); +} + +static double +integrand(double a) +{ + double x; + + x = adot(a); + return 1.0/(x*x*x); + +} + +static double +t_integrand(double a) +{ + double x; + + x = adot(a); + return 1.0/x; +} + +static double +dp_integrand(double a) +{ + double x; + + x = adot(a); + return 1.0/(a*x); + +} + +static double +kick_integrand(double t) +{ + double a; + + a = Anow(&C, t); + return 1.0/a; + +} + +static double +drift_integrand(double t) +{ + double a; + + a = Anow(&C, t); + return 1.0/(a*a); +} + +double +growthfac_from_Z(struct cosmo_s *c, double z) +{ + double a = 1.0/(1.0+z); + C = *c; + return 2.5*c->H0*c->H0*adot(a)*qromod(integrand, 0.0, a, midpntd)/a; +} + +double +velfac_from_Z(struct cosmo_s *c, double z) +{ + double d, a_dot; + double a = 1.0/(1.0+z); + C = *c; + d = qromod(integrand, 0.0, a, midpntd); + a_dot = adot(a); + return addot(a) + *a/(a_dot*a_dot) - 1.0 + a/(a_dot*a_dot*a_dot*d); +} + +double +velfac_approx_from_Z(struct cosmo_s *c, double z) +{ + double aomega; + double a = 1.0/(1.0+z); + C = *c; + aomega = C.Omega_m + C.Omega_r/a + C.Lambda*a*a*a + (1.0 - C.Omega0 - C.Lambda)*a; + return pow(C.Omega0/aomega, 0.6); +} + +double +t_from_Z(struct cosmo_s *c, double z) +{ + double d; + double a = 1.0/(1.0+z); + C = *c; + d = qromod(t_integrand, 0.0, a, midpntd); + return (d); +} + + +double +dp_from_Z(struct cosmo_s *c, double z) +{ + double d; + double a = 1.0/(1.0+z); + if (a == 1.0) return 0.0; + C = *c; + d = qromod(dp_integrand, a, 1.0, midpntd); + return (d); +} + +double +comoving_distance_from_Z(struct cosmo_s *c, double z) +{ + return speed_of_light*(one_Gyr/one_kpc)*dp_from_Z(c, z); +} + +double +hubble_from_Z(struct cosmo_s *c, double z) +{ + double a = 1.0/(1.0+z); + C = *c; + return adot(a)/a; +} + +double +kick_delta(struct cosmo_s *c, double t0, double t1) +{ + 
double d; + C = *c; + Msgf(("kick_delta %lf %lf\n", t0, t1)); + if (t0 == t1) return 0.0; + d = qromod(kick_integrand, t0, t1, midpntd); + return (d); +} + +double +drift_delta(struct cosmo_s *c, double t0, double t1) +{ + double d; + C = *c; + Msgf(("drift_delta %lf %lf\n", t0, t1)); + if (t0 == t1) return 0.0; + d = qromod(drift_integrand, t0, t1, midpntd); + return (d); +} + +double +Anow(struct cosmo_s *c, double time) +{ + struct cosmo_s foo; + + foo = *c; + CosmoPush(&foo, time); + return foo.a; +} + +double +Znow(struct cosmo_s *c, double time) +{ + return 1.0/Anow(c, time) - 1.0; +} + +double +Hnow(struct cosmo_s *c, double time) +{ + struct cosmo_s foo; + + foo = *c; + CosmoPush(&foo, time); + C = *c; + return adot(foo.a)/foo.a; +} + +void CosmoPush(struct cosmo_s *p, double time) +{ + double Omega0 = p->Omega0; + double Omega_r = p->Omega_r; + double Omega_m = p->Omega_m; + double Lambda = p->Lambda; + double H0 = p->H0; + double H, a2, a3, aold, anew, a2dot; + double deltat, dt; + int i; + int nstep; + + /* The cosmo structure holds,H0, Omega0, Lambda' = Lambda/3H0^2, a + and t. We integrate (forward or backward) to the new 'time' */ + + deltat = time - p->t; + if (deltat == 0.0) + return; + + /* Felten et al do all their integrals with dt=1/(400 H0). We can + do the same by choosing Nstep appropriately. In fact, we can + do better by ensuring dt < 1/(800 H). */ + aold = p->a; + a2 = aold*aold; + H = (H0 / aold) * sqrt(Omega_m/aold + Omega_r/a2 + Lambda*a2 + (1.0 - Omega0 - Lambda)); + nstep = (int)(800.*H*fabs(deltat)) + 1; + Msgf(("Cosmo push %d steps, deltat=%g, H*deltat=%g\n", + nstep, deltat, deltat*H)); + dt = deltat/(double)nstep; + + anew = p->a; + for (i = 0; i < nstep; i++) { + aold = anew; + a2 = aold*aold; + a3 = a2*aold; + H = (H0 / aold) * sqrt(Omega_m/aold + Omega_r/a2 + Lambda*a2 + (1.0 - Omega0 - Lambda)); + /* Follow the advice of Felten et al. 
Do this to second-order */ + a2dot = H0*H0*(-0.5*Omega_m/a2 - Omega_r/a3 + Lambda*aold); + anew = aold + dt*H*aold + 0.5*dt*dt*a2dot; + } + Msgf(("After push Z=%g\n", 1./anew - 1.)); + p->a = anew; + p->t = time; +} + +#if 0 +/* Crays don't have acosh */ +static double Acosh(double x) +{ + return log(x + sqrt(x*x-1.0)); +} + +static double +growthfac_from_Z(double Omega0, double H0, double Z) +{ + /* This is just the growing mode */ + /* See Weinberg 15.9.27--15.9.31 or Peebles LSS 11.16 */ + double d, d0; + + if (Omega0 == 1.0) { + d = 1.0/(1.0+Z); + d0 = 1.0; + } else if(Omega0 < 1.0) { + /* Using doubles can cause roundoff problems near Omega0=1 */ + double psi, coshpsi; + coshpsi = 1.0 + 2.0*(1.0 - Omega0)/(Omega0*(1.0+Z)); + psi = Acosh(coshpsi); + d = - 3.0 * psi * sinh(psi)/((coshpsi-1.0)*(coshpsi-1.0)) + + (5.0+coshpsi)/(coshpsi-1.0); + coshpsi = 1.0 + 2.0*(1.0 - Omega0)/Omega0; + psi = Acosh(coshpsi); + d0 = - 3.0 * psi * sinh(psi)/((coshpsi-1.0)*(coshpsi-1.0)) + + (5.0+coshpsi)/(coshpsi-1.0); + } else { + double theta, costheta; + costheta = 1.0 - 2.0*(Omega0-1.0)/(Omega0*(1.0+Z)); + theta = acos(costheta); + d = - 3.0 * theta * sin(theta)/((1.0-costheta)*(1.0-costheta)) + + (5.0+costheta)/(1.0-costheta); + costheta = 1.0 - 2.0*(Omega0-1.0)/Omega0; + theta = acos(costheta); + d0 = - 3.0 * theta * sin(theta)/((1.0-costheta)*(1.0-costheta)) + + (5.0+costheta)/(1.0-costheta); + } + return d/d0; +} + +static double +t_from_Z(double Omega0, double H0, double Z) +{ + double t, theta, psi; + + if(Omega0 == 1.0){ + t = (2.0/3.0) * pow(1.0+Z, -1.5); + }else if(Omega0 < 1.0){ + psi = Acosh( 1.0 + 2.0*(1.0 - Omega0)/(Omega0*(1.0+Z)) ); + t = (Omega0/2.0)*pow(1.0-Omega0, -1.5)*(sinh(psi) - psi) ; + }else{ + theta = acos( 1.0 - 2.0*(Omega0-1.)/(Omega0*(1.0+Z)) ); + t = (Omega0/2.0)*pow(Omega0-1.0, -1.5)*(theta-sin(theta)); + } + t /= H0; + return t; +} +#endif diff --git a/external/libsdf/libsw/cosmo_tbl.c b/external/libsdf/libsw/cosmo_tbl.c new file mode 100644 index 
0000000..5dc9de1 --- /dev/null +++ b/external/libsdf/libsw/cosmo_tbl.c @@ -0,0 +1,575 @@ +#include +#include +#include "cosmo.h" +#include "Malloc.h" +#include "error.h" +#include "macr.h" + +#define _SUCCESS_ 0 +#define _FAILURE_ 1 +#define _ERRORMSGSIZE_ 256 +typedef char ErrorMsg[_ERRORMSGSIZE_]; +#define _SPLINE_NATURAL_ 0 +#define _SPLINE_EST_DERIV_ 1 + +#define _Gyr_over_Mpc_ 3.06601394e2 + +struct tbl_s { + int bt_size, bg_size; + double *tau_tbl, *z_tbl, *t_tbl, *tbl; + double *d2tau_dz2_tbl, *d2tau_dt2_tbl, *d2b_dtau2_tbl; + ErrorMsg error_message; +}; + +static int +array_spline_table_lines( + double * x, /* vector of size x_size */ + int x_size, + double * y_array, /* array of size x_size*y_size with elements + y_array[index_x*y_size+index_y] */ + int y_size, + double * ddy_array, /* array of size x_size*y_size */ + short spline_mode, + ErrorMsg errmsg + ) { + double * p; + double * qn; + double * un; + double * u; + double sig; + int index_x; + int index_y; + double dy_first; + double dy_last; + + u = Malloc((x_size-1) * y_size * sizeof(double)); + p = Malloc(y_size * sizeof(double)); + qn = Malloc(y_size * sizeof(double)); + un = Malloc(y_size * sizeof(double)); + + index_x=0; + + if (spline_mode == _SPLINE_NATURAL_) { + for (index_y=0; index_y < y_size; index_y++) { + ddy_array[index_x*y_size+index_y] = u[index_x*y_size+index_y] = 0.0; + } + } + else { + if (spline_mode == _SPLINE_EST_DERIV_) { + + for (index_y=0; index_y < y_size; index_y++) { + + dy_first = + ((x[2]-x[0])*(x[2]-x[0])* + (y_array[1*y_size+index_y]-y_array[0*y_size+index_y])- + (x[1]-x[0])*(x[1]-x[0])* + (y_array[2*y_size+index_y]-y_array[0*y_size+index_y]))/ + ((x[2]-x[0])*(x[1]-x[0])*(x[2]-x[1])); + + ddy_array[index_x*y_size+index_y] = -0.5; + + u[index_x*y_size+index_y] = + (3./(x[1] - x[0]))* + ((y_array[1*y_size+index_y]-y_array[0*y_size+index_y])/ + (x[1] - x[0])-dy_first); + + } + } + else { + sprintf(errmsg,"%s(L:%d) Spline mode not identified: 
%d",__func__,__LINE__,spline_mode); + return _FAILURE_; + } + } + + + for (index_x=1; index_x < x_size-1; index_x++) { + + sig = (x[index_x] - x[index_x-1])/(x[index_x+1] - x[index_x-1]); + + for (index_y=0; index_y < y_size; index_y++) { + + p[index_y] = sig * ddy_array[(index_x-1)*y_size+index_y] + 2.0; + + ddy_array[index_x*y_size+index_y] = (sig-1.0)/p[index_y]; + + u[index_x*y_size+index_y] = + (y_array[(index_x+1)*y_size+index_y] - y_array[index_x*y_size+index_y]) + / (x[index_x+1] - x[index_x]) + - (y_array[index_x*y_size+index_y] - y_array[(index_x-1)*y_size+index_y]) + / (x[index_x] - x[index_x-1]); + + u[index_x*y_size+index_y] = (6.0 * u[index_x*y_size+index_y] / + (x[index_x+1] - x[index_x-1]) + - sig * u[(index_x-1)*y_size+index_y]) / p[index_y]; + } + + } + + if (spline_mode == _SPLINE_NATURAL_) { + + for (index_y=0; index_y < y_size; index_y++) { + qn[index_y]=un[index_y]=0.0; + } + + } + else { + if (spline_mode == _SPLINE_EST_DERIV_) { + + for (index_y=0; index_y < y_size; index_y++) { + + dy_last = + ((x[x_size-3]-x[x_size-1])*(x[x_size-3]-x[x_size-1])* + (y_array[(x_size-2)*y_size+index_y]-y_array[(x_size-1)*y_size+index_y])- + (x[x_size-2]-x[x_size-1])*(x[x_size-2]-x[x_size-1])* + (y_array[(x_size-3)*y_size+index_y]-y_array[(x_size-1)*y_size+index_y]))/ + ((x[x_size-3]-x[x_size-1])*(x[x_size-2]-x[x_size-1])*(x[x_size-3]-x[x_size-2])); + + qn[index_y]=0.5; + + un[index_y]= + (3./(x[x_size-1] - x[x_size-2]))* + (dy_last-(y_array[(x_size-1)*y_size+index_y] - y_array[(x_size-2)*y_size+index_y])/ + (x[x_size-1] - x[x_size-2])); + + } + } + else { + sprintf(errmsg,"%s(L:%d) Spline mode not identified: %d",__func__,__LINE__,spline_mode); + return _FAILURE_; + } + } + + index_x=x_size-1; + + for (index_y=0; index_y < y_size; index_y++) { + ddy_array[index_x*y_size+index_y] = + (un[index_y] - qn[index_y] * u[(index_x-1)*y_size+index_y]) / + (qn[index_y] * ddy_array[(index_x-1)*y_size+index_y] + 1.0); + } + + for (index_x=x_size-2; index_x >= 0; 
index_x--) { + for (index_y=0; index_y < y_size; index_y++) { + + ddy_array[index_x*y_size+index_y] = ddy_array[index_x*y_size+index_y] * + ddy_array[(index_x+1)*y_size+index_y] + u[index_x*y_size+index_y]; + + } + } + + Free(qn); + Free(un); + Free(p); + Free(u); + + return _SUCCESS_; + } + + /** + * interpolate to get y_i(x), when x and y_i are in different arrays + * + */ +static int +array_interpolate_spline( + double * x_array, + int n_lines, + double * array, + double * array_splined, + int n_columns, + double x, + int * last_index, + double * result, + int result_size, /** from 1 to n_columns */ + ErrorMsg errmsg) { + + int inf,sup,mid,i; + double h,a,b; + + inf=0; + sup=n_lines-1; + + if (x_array[inf] < x_array[sup]){ + + if (x < x_array[inf]) { + sprintf(errmsg,"%s(L:%d) : x=%e < x_min=%e",__func__,__LINE__,x,x_array[inf]); + return _FAILURE_; + } + + if (x > x_array[sup]) { + sprintf(errmsg,"%s(L:%d) : x=%e > x_max=%e",__func__,__LINE__,x,x_array[sup]); + return _FAILURE_; + } + + while (sup-inf > 1) { + + mid=(int)(0.5*(inf+sup)); + if (x < x_array[mid]) {sup=mid;} + else {inf=mid;} + + } + + } + + else { + + if (x < x_array[sup]) { + sprintf(errmsg,"%s(L:%d) : x=%e < x_min=%e",__func__,__LINE__,x,x_array[sup]); + return _FAILURE_; + } + + if (x > x_array[inf]) { + sprintf(errmsg,"%s(L:%d) : x=%e > x_max=%e",__func__,__LINE__,x,x_array[inf]); + return _FAILURE_; + } + + while (sup-inf > 1) { + + mid=(int)(0.5*(inf+sup)); + if (x > x_array[mid]) {sup=mid;} + else {inf=mid;} + + } + + } + + *last_index = inf; + + h = x_array[sup] - x_array[inf]; + b = (x-x_array[inf])/h; + a = 1-b; + + for (i=0; iprivate; + int last_index; + double *pvec = Malloc(p->bg_size*sizeof(double)); + + array_interpolate_spline(p->tau_tbl, + p->bt_size, + p->tbl, + p->d2b_dtau2_tbl, + p->bg_size, + tau, + &last_index, + pvec, + p->bg_size, + p->error_message); + c->z = pvec[0]; + c->t = pvec[1]/_Gyr_over_Mpc_; + c->tau = tau; + c->a = 1.0/(1.0+c->z); + c->H = 
pvec[2]*_Gyr_over_Mpc_; + c->conf_distance = pvec[3]*1000.0; /* kpc */ + c->kick = tau/_Gyr_over_Mpc_;; + c->drift = pvec[4]/_Gyr_over_Mpc_; + c->growthfac = pvec[5]; + /* c->velfac = pow(c->Omega0_m/(c->a*c->a*c->a)*c->H0*c->H0/(c->H*c->H), 0.6); */ + c->velfac = pvec[7]; + c->velfac2 = 2.0*pow(c->Omega0_m/(c->a*c->a*c->a)*c->H0*c->H0/(c->H*c->H), 4./7.); + Free(pvec); +} + + +static void +tbl_background_at_z(cosmology *c, double z) +{ + struct tbl_s *p = c->private; + int last_index; + double tau; + + array_interpolate_spline(p->z_tbl, + p->bt_size, + p->tau_tbl, + p->d2tau_dz2_tbl, + 1, + z, + &last_index, + &tau, + 1, + p->error_message); + tbl_background_at_tau(c, tau); +} + +static void +tbl_background_at_t(cosmology *c, double t) +{ + struct tbl_s *p = c->private; + int last_index; + double tau; + + array_interpolate_spline(p->t_tbl, + p->bt_size, + p->tau_tbl, + p->d2tau_dt2_tbl, + 1, + t*_Gyr_over_Mpc_, + &last_index, + &tau, + 1, + p->error_message); + tbl_background_at_tau(c, tau); +} + +static double +tbl_t_at_z(cosmology *c, double z) +{ + tbl_background_at_z(c, z); + return(c->t); +} + +static double +tbl_z_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->z); +} + +static double +tbl_a_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->a); +} + +static double +tbl_t_at_a(cosmology *c, double a) +{ + tbl_background_at_z(c, 1.0/a-1.0); + return(c->t); +} + +static double +tbl_H_at_z(cosmology *c, double z) +{ + tbl_background_at_z(c, z); + return(c->H); +} + +static double +tbl_H_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->H); +} + +static double +tbl_conformal_distance_at_z(cosmology *c, double z) +{ + tbl_background_at_z(c, z); + return(c->conf_distance); +} + +static double +tbl_conformal_distance_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->conf_distance); +} + +static double +tbl_angular_diameter_distance_at_z(cosmology *c, double z) +{ + 
tbl_background_at_z(c, z); + return(c->conf_distance/(1.0+z)); +} + +static double +tbl_angular_diameter_distance_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->conf_distance/(1.0+c->z)); +} + +static double +tbl_luminosity_distance_at_z(cosmology *c, double z) +{ + tbl_background_at_z(c, z); + return(c->conf_distance*(1.0+z)); +} + +static double +tbl_luminosity_distance_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->conf_distance*(1.0+c->z)); +} + +static double +tbl_growthfac_at_z(cosmology *c, double z) +{ + tbl_background_at_z(c, z); + return(c->growthfac); +} + +static double +tbl_growthfac_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->growthfac); +} + +static double +tbl_velfac_at_z(cosmology *c, double z) +{ + tbl_background_at_z(c, z); + return(c->velfac); +} + +static double +tbl_velfac_at_t(cosmology *c, double t) +{ + tbl_background_at_t(c, t); + return(c->velfac); +} + +static double +tbl_kick_t0_t1(cosmology *c, double t0, double t1) +{ + double k0, k1; + tbl_background_at_t(c, t0); + k0 = c->kick; + tbl_background_at_t(c, t1); + k1 = c->kick; + return(k1-k0); +} + +static double +tbl_drift_t0_t1(cosmology *c, double t0, double t1) +{ + double d0, d1; + tbl_background_at_t(c, t0); + d0 = c->drift; + tbl_background_at_t(c, t1); + d1 = c->drift; + return(d1-d0); +} + +static void +tbl_free(cosmology *c) +{ + struct tbl_s *p = c->private; + + Free(p->d2tau_dz2_tbl); + Free(p->d2tau_dt2_tbl); + Free(p->d2b_dtau2_tbl ); + Free(p->tbl); + Free(p->t_tbl); + Free(p->z_tbl); + Free(p->tau_tbl); + Free(p); + c->private = NULL; +} + +void +tbl_init(cosmology *c, char *tbl) +{ + char line[1024]; + FILE *fp; + int i; + double tau, z, t, H, R, drift, growth, growth_cdm, velfac; + struct tbl_s *p = Malloc(sizeof(struct tbl_s)); + p->bt_size = 8192; + p->bg_size = 8; + + p->tau_tbl = Malloc(p->bt_size*sizeof(double)); + p->z_tbl = Malloc(p->bt_size*sizeof(double)); + p->t_tbl = 
Malloc(p->bt_size*sizeof(double)); + p->tbl = Malloc(p->bg_size*p->bt_size*sizeof(double)); + Fopen(fp, tbl, "r"); + i = 0; + while (fgets(line, sizeof(line), fp)) { + if (line[0] == '#') continue; + if (sscanf(line, "%lg %lg %lg %lg %lg %lg %lg %lg %lg\n", + &tau, &z, &t, &H, &R, &drift, &growth, &growth_cdm, &velfac) != 9) + Error("Did not parse %s", line); + p->tau_tbl[i] = tau; + p->z_tbl[i] = z; + p->t_tbl[i] = t; + p->tbl[i*p->bg_size+0] = z; + p->tbl[i*p->bg_size+1] = t; + p->tbl[i*p->bg_size+2] = H; + p->tbl[i*p->bg_size+3] = R; + p->tbl[i*p->bg_size+4] = drift; + p->tbl[i*p->bg_size+5] = growth; + p->tbl[i*p->bg_size+6] = growth_cdm; + p->tbl[i*p->bg_size+7] = velfac; + i++; + if (i >= p->bt_size) { + p->bt_size *= 2; + p->tau_tbl = Realloc(p->tau_tbl, p->bt_size*sizeof(double)); + p->z_tbl = Realloc(p->z_tbl, p->bt_size*sizeof(double)); + p->t_tbl = Realloc(p->t_tbl, p->bt_size*sizeof(double)); + p->tbl = Realloc(p->tbl, p->bg_size*p->bt_size*sizeof(double)); + } + } + Fclose(fp); + p->bt_size = i; + p->tau_tbl = Realloc(p->tau_tbl, p->bt_size*sizeof(double)); + p->z_tbl = Realloc(p->z_tbl, p->bt_size*sizeof(double)); + p->t_tbl = Realloc(p->t_tbl, p->bt_size*sizeof(double)); + p->tbl = Realloc(p->tbl, p->bg_size*p->bt_size*sizeof(double)); + + p->d2tau_dz2_tbl = Malloc(p->bt_size*sizeof(double)); + p->d2tau_dt2_tbl = Malloc(p->bt_size*sizeof(double)); + p->d2b_dtau2_tbl = Malloc(p->bg_size*p->bt_size*sizeof(double)); + + array_spline_table_lines(p->z_tbl, + p->bt_size, + p->tau_tbl, + 1, + p->d2tau_dz2_tbl, + _SPLINE_EST_DERIV_, + p->error_message); + array_spline_table_lines(p->t_tbl, + p->bt_size, + p->tau_tbl, + 1, + p->d2tau_dt2_tbl, + _SPLINE_EST_DERIV_, + p->error_message); + array_spline_table_lines(p->tau_tbl, + p->bt_size, + p->tbl, + p->bg_size, + p->d2b_dtau2_tbl, + _SPLINE_EST_DERIV_, + p->error_message); + + c->private = p; + + /* Function pointers */ + c->background_at_z = tbl_background_at_z; + c->background_at_t = tbl_background_at_t; + 
c->background_at_tau = tbl_background_at_tau; + c->t_at_z = tbl_t_at_z; + c->z_at_t = tbl_z_at_t; + c->a_at_t = tbl_a_at_t; + c->t_at_a = tbl_t_at_a; + c->H_at_z = tbl_H_at_z; + c->H_at_t = tbl_H_at_t; + c->conformal_distance_at_z = tbl_conformal_distance_at_z; + c->conformal_distance_at_t = tbl_conformal_distance_at_t; + c->angular_diameter_distance_at_z = tbl_angular_diameter_distance_at_z; + c->angular_diameter_distance_at_t = tbl_angular_diameter_distance_at_t; + c->luminosity_distance_at_z = tbl_luminosity_distance_at_z; + c->luminosity_distance_at_t = tbl_luminosity_distance_at_t; + c->growthfac_at_z = tbl_growthfac_at_z; + c->growthfac_at_t = tbl_growthfac_at_t; + c->velfac_at_z = tbl_velfac_at_z; + c->velfac_at_t = tbl_velfac_at_t; + c->kick_t0_t1 = tbl_kick_t0_t1; + c->drift_t0_t1 = tbl_drift_t0_t1; + c->free = tbl_free; +} diff --git a/external/libsdf/libsw/counters.c b/external/libsdf/libsw/counters.c new file mode 100644 index 0000000..47ec3d1 --- /dev/null +++ b/external/libsdf/libsw/counters.c @@ -0,0 +1,123 @@ +#include +#include +#include +#include "Malloc.h" +#include "timers.h" +#include "mpmy.h" +#include "Assert.h" + +#define MAXENABLED 100 + +static Counter_t *enabled_counters[MAXENABLED]; +static int nenabled_counters; + +void ClearCounter(Counter_t *c){ + c->counter = 0; +} + +void ClearEnabledCounters(void){ + int i; + for(i=0; iname = Malloc(strlen(name)+1); + strcpy(c->name, name); + c->enabled = 1; + c->counter = 0; +} + +void DisableCounter(Counter_t *t){ + int i; + + for(i=0; ienabled = 0; + Free(t->name); + t->name = NULL; + enabled_counters[i] = enabled_counters[--nenabled_counters]; +} + +void SumCounters(void){ + double nprocinv; + Counter_t *c; + int i; + MPMY_Comm_request req; + + MPMY_ICombine_Init(&req); + for(i=0; isum = c->min = c->max = c->counter; + MPMY_ICombine(&c->min, &c->min, 1, MPMY_INT64, MPMY_MIN, req); + MPMY_ICombine(&c->max, &c->max, 1, MPMY_INT64, MPMY_MAX, req); + MPMY_ICombine(&c->sum, &c->sum, 1, MPMY_DOUBLE, 
MPMY_SUM, req); + } + MPMY_ICombine_Wait(req); + + /* Now loop a second time and divide the mean by Nproc */ + nprocinv = 1./MPMY_Nproc(); + for(i=0; imean = c->sum * nprocinv; + } +} + +void OutputCounters(int (*Printf_Like)(const char *, ...)){ + int i; + Counter_t *c; + + SumCounters(); + Printf_Like("%12s %12s %12s %15s %14s\n", + "Counters", "Min", "Max", "Sum", "Avg"); + for (i = 0; i < nenabled_counters; i++) { + c = enabled_counters[i]; + if( c->enabled && c->name ) + Printf_Like("%12s %12ld %12ld %15.0f %14.2f\n", c->name, + c->min, c->max, c->sum, c->mean); + } +} + +void OutputIndividualCounters(int (*Printf_Like)(const char *, ...)){ + int i; + Counter_t *c; + + Printf_Like("%12s %12s\n", "Counters", "Count"); + for (i = 0; i < nenabled_counters; i++) { + c = enabled_counters[i]; + if( c->enabled && c->name ) + Printf_Like("%12s %12ld\n", c->name, c->counter); + } +} + +int64_t ReadCounter(Counter_t *c){ + return c->counter; +} + +int64_t ReadCounter64(Counter_t *c){ + return c->counter; +} + +static void SumOneCounter(Counter_t *c){ + MPMY_Comm_request req; + MPMY_ICombine_Init(&req); + c->sum = c->min = c->max = c->counter; + MPMY_ICombine(&c->min, &c->min, 1, MPMY_INT64, MPMY_MIN, req); + MPMY_ICombine(&c->max, &c->max, 1, MPMY_INT64, MPMY_MAX, req); + MPMY_ICombine(&c->sum, &c->sum, 1, MPMY_DOUBLE, MPMY_SUM, req); + MPMY_ICombine_Wait(req); + c->mean = c->sum / MPMY_Nproc(); +} + +void OutputOneCounter(Counter_t *c, int (*Printf_Like)(const char *, ...)){ + SumOneCounter(c); + Printf_Like("%12s %12ld %12ld %14.0g %14.2g\n", + (c->name)?c->name:"(noname)", + c->min, c->max, c->sum, c->mean); +} diff --git a/external/libsdf/libsw/dbg.c b/external/libsdf/libsw/dbg.c new file mode 100644 index 0000000..2cc9031 --- /dev/null +++ b/external/libsdf/libsw/dbg.c @@ -0,0 +1,36 @@ +#include + +#define FORCE(t) ( (t) | (1<<30)) +#define DBG_REQUEST_TYPE FORCE(0x1a2b3c4) + +#define WHAT_DBGBUF 1 +#define WHAT_MEMORY 2 +#define WHAT_STACK 3 +#define WHAT_CODE 4 + 
+typedef struct { + int what; +} dbg_request_t; + +void PrintMemfile(); +static dbg_request_t dbg_request; + +void +jab_dbg_handler(int proc) +{ + csend(DBG_REQUEST_TYPE, (char *)&dbg_request, sizeof(dbg_request), proc,0); +} + +void +set_dbg_handler(void){ + hrecv(DBG_REQUEST_TYPE, (char *)&dbg_request, + sizeof(dbg_request), dbg_handler); +} + +static void +dbg_handler(long type, long count, long node, long pid) +{ + PrintMemfile(); + set_dbg_handler(); + return; +} diff --git a/external/libsdf/libsw/dll.c b/external/libsdf/libsw/dll.c new file mode 100644 index 0000000..6e5ac55 --- /dev/null +++ b/external/libsdf/libsw/dll.c @@ -0,0 +1,237 @@ +/* DLL: doubly linked lists. + + Support: + insertion either before or after a given element (Above, Below). + deletion. + traversal in either direction. (Up, Down) + + Use a vertical metaphor for mnemonics. You can grow the data + structure up, down, or outward from the middle. It makes + absolutely no difference. + + The implementation uses two 'sentinels'. The one above the topmost + element is called 'Sup' and the one below the lowest element is + called 'Inf'. These are good places to start and end 'for' loops, e.g., + + for(p=DllTop(dll); p!=DllInf(dll); p = DllDown(dll, p))... + + or + + for(p=DllBottom(dll); p!=DllSup(dll); p = DllUp(dll, p))... + + Space for 'user' data of size 'sz' is allocated with each element. + This is ideal if you want to associate a fixed size object + with each list element. If you want more flexibility, there's + nothing stopping you from making that fixed size object a void* + that points to something else. + +*/ + +#include "chn.h" +#include "dll.h" +#include "Malloc.h" + +/* Initializing is a two step process because we have to do some + funny stuff to get a chain allocating chunks just slightly bigger than + what the user asks for. The resulting chain can be ChnTerminated + at the caller's leisure.
*/ +void DllCreateChn(Chn *chn, int sz, int n){ + n+=2; + if( sz > sizeof(int) ) + sz -= sizeof(int); + ChnInit(chn, sizeof(Dll_elmt)+sz, n, Realloc_f); +} + +void DllCreate(Dll *dll, Chn *chn){ + dll->chn = chn; + dll->Sup.down = &dll->Inf; + dll->Sup.up = NULL; + dll->Inf.down = NULL; + dll->Inf.up = &dll->Sup; + dll->length = 0; +} + +/* Terminate a dll */ +void DllTerminate(Dll *dll){ + /* Hmmm. There's really nothing to do since the user is now + empowered to free the chain. We can't even do a FreeAll because + there might be other stuff (other DLL's?) using the chain. */ +} + +/* Insert closer to the Top */ +Dll_elmt *DllInsertAbove(Dll *dll, Dll_elmt *down){ + Dll_elmt *new = ChnAlloc(dll->chn); + Dll_elmt *up = down->up; + + if( new == NULL ){ + Shout("ChnAlloc returns null in DllInsertAbove\n"); + return NULL; + } + dll->length++; + new->up = up; + new->down = down; + down->up = new; + up->down = new; + return new; +} + +/* Insert closer to the bottom */ +Dll_elmt *DllInsertBelow(Dll *dll, Dll_elmt *up){ + Dll_elmt *new = ChnAlloc(dll->chn); + Dll_elmt *down = up->down; + + if( new == NULL ){ + Shout("ChnAlloc returns null in DllInsertBelow\n"); + return NULL; + } + dll->length++; + new->up = up; + new->down = down; + down->up = new; + up->down = new; + return new; +} + +/* These two should be inlined, with __inline__ ... */ +/* Insert at the bottom */ +Dll_elmt *DllInsertAtBottom(Dll *dll){ + return DllInsertAbove(dll, &(dll->Inf)); +} + +/* Insert at the top */ +Dll_elmt *DllInsertAtTop(Dll *dll){ + return DllInsertBelow(dll, &(dll->Sup)); +} + +/* These three are VERY similar, but they are useful for traversals in + different directions. Otherwise the caller needs to save the 'up' or + 'down' element */ +/* Delete an entry. Return nothing. */ +void DllDelete(Dll *dll, Dll_elmt *old){ + Dll_elmt *up = old->up; + Dll_elmt *down = old->down; + + dll->length--; + up->down = down; + down->up = up; + ChnFree(dll->chn, old); +} + +/* Delete an entry.
Return the entry that used to be above it. */ +Dll_elmt *DllDeleteUp(Dll *dll, Dll_elmt *old){ + Dll_elmt *up = old->up; + Dll_elmt *down = old->down; + + dll->length--; + up->down = down; + down->up = up; + ChnFree(dll->chn, old); + return up; +} + +/* Delete an entry. Return the entry that used to be below it. */ +Dll_elmt *DllDeleteDown(Dll *dll, Dll_elmt *old){ + Dll_elmt *up = old->up; + Dll_elmt *down = old->down; + + dll->length--; + up->down = down; + down->up = up; + ChnFree(dll->chn, old); + return down; +} + +/* The next two have two plausible returns: the item directly above or + below the original position of the mover. Rather than confuse + things, I won't return either, and leave it up to the caller to keep + track of whatever he wants. */ + +/* Extract the 'mover' and place it immediately below 'up'. + Like DllDelete, followed by DllInsertBelow, but preserve the + data in the object. */ +void DllMoveBelow(Dll *dll, Dll_elmt *mover, Dll_elmt *up){ + Dll_elmt *down; + /* Extract the mover */ + mover->down->up = mover->up; + mover->up->down = mover->down; + /* Now insert it below up */ + down = up->down; + mover->up = up; + mover->down = down; + down->up = mover; + up->down = mover; +} + +/* Extract the 'mover' and place it immediately above 'down'. + Like DllDelete, followed by DllInsertAbove, but preserve the + data in the object. */ +void DllMoveAbove(Dll *dll, Dll_elmt *mover, Dll_elmt *down){ + Dll_elmt *up; + /* Extract the mover */ + mover->down->up = mover->up; + mover->up->down = mover->down; + /* Now insert it above down */ + up = down->up; + mover->up = up; + mover->down = down; + down->up = mover; + up->down = mover; +} + +/* These should be inlined... 
*/ +/* Move to bottom */ +void DllMoveToBottom(Dll *dll, Dll_elmt *mover){ + DllMoveAbove(dll, mover, &(dll->Inf)); +} + +/* Move to top */ +void DllMoveToTop(Dll *dll, Dll_elmt *mover){ + DllMoveBelow(dll, mover, &(dll->Sup)); +} + +/* These are generally 'inlined' with appropriate #defines in dll.h. + They're simple enough to allow use of the pre-processor instead of + __inline__. */ +#undef DllLength +int DllLength(Dll *dll){ + return dll->length; +} + +#undef DllSup +Dll_elmt *DllSup(Dll *dll){ + return &dll->Sup; +} + +#undef DllInf +Dll_elmt *DllInf(Dll *dll){ + return &dll->Inf; +} + +/* The highest 'real' element */ +#undef DllTop +Dll_elmt *DllTop(Dll *dll){ + return dll->Sup.down; +} + +/* The lowest 'real' element */ +#undef DllBottom +Dll_elmt *DllBottom(Dll *dll){ + return dll->Inf.up; +} + +#undef DllData +void *DllData(Dll_elmt *elmt){ + return &elmt->stuff; +} + +#undef DllUp +Dll_elmt *DllUp(Dll_elmt *elmt){ + return elmt->up; +} + +#undef DllDown +Dll_elmt *DllDown(Dll_elmt *elmt){ + return elmt->down; +} + + diff --git a/external/libsdf/libsw/dofZ.c b/external/libsdf/libsw/dofZ.c new file mode 100644 index 0000000..dffddf4 --- /dev/null +++ b/external/libsdf/libsw/dofZ.c @@ -0,0 +1,211 @@ +#include +#include +#include "qromo.h" +#include "dofz.h" + +static double Omega0; +static double Omega_m; +static double Omega_r; +static double Omega_de; +static double w0; +static double wa; +static double Lambda_prime; +static double H0; + +static double +adot(double a) +{ + return H0*sqrt(Omega_m/a + Omega_r/(a*a) + Lambda_prime*a*a + (1.0 - Omega0 - Lambda_prime)); +} + +static double +addot(double a) +{ + return H0 * H0 * (Lambda_prime*a-Omega_r/(a*a*a)-0.5*Omega_m/(a*a)); +} + +static double +integrand(double a) +{ + double x; + + x = adot(a); + return 1.0/(x*x*x); + +} + +static double +t_integrand(double a) +{ + double x; + + x = adot(a); + return 1.0/x; +} + +static double +dp_integrand(double a) +{ + double x; + + x = adot(a); + return 1.0/(a*x); + 
+} + +double +growthfac_from_Z(double omega0, double h0, double lambda_prime, double z) +{ + double z0gf; + double a = 1.0/(1.0+z); + double h = 10.0*h0*(one_kpc/one_Gyr); + Omega0 = omega0; + Omega_r = omega_r / (h * h); + Omega_m = omega0-Omega_r; + H0 = h0; + Lambda_prime = lambda_prime; + return 2.5*H0*H0*adot(a)*qromod(integrand, 0.0, a, midpntd)/a; +} + +double +velfac_from_Z(double omega0, double h0, double lambda_prime, double z) +{ + double d, a_dot; + double a = 1.0/(1.0+z); + double h = 10.0*h0*(one_kpc/one_Gyr); + Omega0 = omega0; + Omega_r = omega_r / (h * h); + Omega_m = omega0-Omega_r; + H0 = h0; + Lambda_prime = lambda_prime; + d = qromod(integrand, 0.0, a, midpntd); + a_dot = adot(a); + return addot(a) + *a/(a_dot*a_dot) - 1.0 + a/(a_dot*a_dot*a_dot*d); +} + +double +t_from_Z(double omega0, double h0, double lambda_prime, double z) +{ + double d; + double a = 1.0/(1.0+z); + double h = 10.0*h0*(one_kpc/one_Gyr); + Omega0 = omega0; + Omega_r = omega_r / (h * h); + Omega_m = omega0-Omega_r; + H0 = h0; + Lambda_prime = lambda_prime; + d = qromod(t_integrand, 0.0, a, midpntd); + return (d); +} + +double +dp_from_Z(double omega0, double h0, double lambda_prime, double z) +{ + double d; + double a = 1.0/(1.0+z); + double h = 10.0*h0*(one_kpc/one_Gyr); + if (a == 1.0) return 0.0; + Omega0 = omega0; + Omega_r = omega_r / (h * h); + Omega_m = omega0-Omega_r; + H0 = h0; + Lambda_prime = lambda_prime; + d = qromod(dp_integrand, a, 1.0, midpntd); + return (d); +} + +double +hubble_from_Z(double omega0, double h0, double lambda_prime, double z) +{ + double a = 1.0/(1.0+z); + double h = 10.0*h0*(one_kpc/one_Gyr); + Omega0 = omega0; + Omega_r = omega_r / (h * h); + Omega_m = omega0-Omega_r; + H0 = h0; + Lambda_prime = lambda_prime; + return adot(a)/a; +} + +#if 0 +/* Crays don't have acosh */ +static double Acosh(double x) +{ + return log(x + sqrt(x*x-1.0)); +} + +static double +growthfac_from_Z(double Omega0, double H0, double Z) +{ + /* This is just the growing 
mode */ + /* See Weinberg 15.9.27--15.9.31 or Peebles LSS 11.16 */ + double d, d0; + + if (Omega0 == 1.0) { + d = 1.0/(1.0+Z); + d0 = 1.0; + } else if(Omega0 < 1.0) { + /* Using doubles can cause roundoff problems near Omega0=1 */ + double psi, coshpsi; + coshpsi = 1.0 + 2.0*(1.0 - Omega0)/(Omega0*(1.0+Z)); + psi = Acosh(coshpsi); + d = - 3.0 * psi * sinh(psi)/((coshpsi-1.0)*(coshpsi-1.0)) + + (5.0+coshpsi)/(coshpsi-1.0); + coshpsi = 1.0 + 2.0*(1.0 - Omega0)/Omega0; + psi = Acosh(coshpsi); + d0 = - 3.0 * psi * sinh(psi)/((coshpsi-1.0)*(coshpsi-1.0)) + + (5.0+coshpsi)/(coshpsi-1.0); + } else { + double theta, costheta; + costheta = 1.0 - 2.0*(Omega0-1.0)/(Omega0*(1.0+Z)); + theta = acos(costheta); + d = - 3.0 * theta * sin(theta)/((1.0-costheta)*(1.0-costheta)) + + (5.0+costheta)/(1.0-costheta); + costheta = 1.0 - 2.0*(Omega0-1.0)/Omega0; + theta = acos(costheta); + d0 = - 3.0 * theta * sin(theta)/((1.0-costheta)*(1.0-costheta)) + + (5.0+costheta)/(1.0-costheta); + } + return d/d0; +} + +static double +t_from_Z(double Omega0, double H0, double Z) +{ + double t, theta, psi; + + if(Omega0 == 1.0){ + t = (2.0/3.0) * pow(1.0+Z, -1.5); + }else if(Omega0 < 1.0){ + psi = Acosh( 1.0 + 2.0*(1.0 - Omega0)/(Omega0*(1.0+Z)) ); + t = (Omega0/2.0)*pow(1.0-Omega0, -1.5)*(sinh(psi) - psi) ; + }else{ + theta = acos( 1.0 - 2.0*(Omega0-1.)/(Omega0*(1.0+Z)) ); + t = (Omega0/2.0)*pow(Omega0-1.0, -1.5)*(theta-sin(theta)); + } + t /= H0; + return t; +} +#endif + + +#ifdef STANDALONE + +int +main(int argc, char *argv[]) +{ + double z; + double omega0 = atof(argv[1]); + double h0 = atof(argv[2]); + double lp = atof(argv[3]); + + for (z = 0; z <= 100; z++) { + printf("%g %g %g %g %g\n", z, growthfac_from_Z(omega0, h0, lp, z), + velfac_from_Z(omega0, h0, lp, z), t_from_Z(omega0, h0, lp, z), + dp_from_Z(omega0, h0, lp, z)); + } + exit(0); +} + +#endif diff --git a/external/libsdf/libsw/error.c b/external/libsdf/libsw/error.c new file mode 100644 index 0000000..35698d7 --- /dev/null +++ 
b/external/libsdf/libsw/error.c @@ -0,0 +1,201 @@ +/* + * Copyright 1991 Michael S. Warren and John K. Salmon. All Rights Reserved. + */ +#ifdef __SRV__ + # error This file should not use SRV +#endif + +#include +#include +#include +#include +#include "error.h" +#include "Msgs.h" +#include "mpmy.h" +#include "mpmy_abnormal.h" +#include "gccextensions.h" +#include "protos.h" +#include "memfile.h" + +#undef Error +#undef SinglError +#undef Warning +#undef SinglWarning +#undef SeriousWarning +#undef Shout +#undef SinglShout + +static int recursion; + +/* We call this SWError because of a namespace conflict when linking SDF + into perl5. error.h should do the switcheroo automatically... */ +void SWError(const char * mesg, ...) +{ + va_list alist; + + if( recursion++ ) + MPMY_SystemAbort(); /* errors within errors. A very bad sign */ + + va_start(alist, mesg); + fprintf(stderr, "ERROR: Node %d (%s) ", MPMY_Procnum(), MPMY_Physnode()); + vfprintf(stderr, mesg, alist); + fflush(stderr); + va_end(alist); + Msg_do("ERROR: "); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + Msg_flush(); + MPMY_Abort(); +} + +/* Is this right? Is it even possible? Can I pass the same va_list + to two different subroutines? I can't use va_start and va_end because + this isn't a varargs function! It works in Msgs, so it ought to work + here too.*/ +void vError(const char * mesg, va_list alist) +{ + if( recursion++ ) + MPMY_SystemAbort(); /* errors within errors. A very bad sign */ + + fprintf(stderr, "ERROR: Node %d (%s) ", MPMY_Procnum(), MPMY_Physnode()); + vfprintf(stderr, mesg, alist); + fflush(stderr); + + Msg_do("ERROR: "); + Msg_doalist(mesg, alist); + Msg_flush(); + PrintMemfile(); + MPMY_Abort(); +} + +void SinglError(const char * mesg, ...) 
+{ + va_list alist; + + if( recursion++ ) + MPMY_SystemAbort(); + if( MPMY_Procnum() == 0 ){ + va_start(alist, mesg); + fprintf(stderr, "Single ERROR: "); + vfprintf(stderr, mesg, alist); + fflush(stderr); + va_end(alist); + Msg_do("Single ERROR: "); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + PrintMemfile(); + Msg_flush(); + } + MPMY_Abort(); +} + +void +Warning(const char *mesg, ...) +{ + va_list alist; + + if( recursion++ ){ + --recursion; + return; + } + Msg_do("WARNING: "); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + Msg_flush(); + --recursion; +} + +void +SinglWarning(const char *mesg, ...) +{ + va_list alist; + + if( recursion++ ){ + --recursion; + return; + } + if( MPMY_Procnum() == 0 ){ + /* Since there's only one, it's safe to send it to stderr too... */ + va_start(alist, mesg); + fprintf(stderr, "WARNING: (single): "); + vfprintf(stderr, mesg, alist); + fflush(stderr); + va_end(alist); + Msg_do("WARNING (Single mode): "); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + Msg_flush(); + } + --recursion; +} + +/* This one goes to stderr, for all to see immediately */ +void +SeriousWarning(const char *mesg, ...) +{ + va_list alist; + + if( recursion++ ){ + --recursion; + return; + } + va_start(alist, mesg); + fprintf(stderr, "WARNING: Node %d ", MPMY_Procnum()); + vfprintf(stderr, mesg, alist); + fflush(stderr); + va_end(alist); + Msg_do("WARNING: "); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + Msg_flush(); + --recursion; +} + +/* No "WARNING", no line numbers, etc... Just the arguments */ +void +Shout(const char *mesg, ...) +{ + va_list alist; + + if( recursion++ ){ + --recursion; + return; + } + va_start(alist, mesg); + vfprintf(stderr, mesg, alist); + fflush(stderr); + va_end(alist); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + Msg_flush(); + --recursion; +} + +void +SinglShout(const char *mesg, ...) 
+{ + va_list alist; + + if( recursion++ ){ + --recursion; + return; + } + if( MPMY_Procnum() == 0 ){ + va_start(alist, mesg); + vfprintf(stderr, mesg, alist); + fflush(stderr); + va_end(alist); + va_start(alist, mesg); + Msg_doalist(mesg, alist); + va_end(alist); + Msg_flush(); + } + --recursion; +} + diff --git a/external/libsdf/libsw/files.c b/external/libsdf/libsw/files.c new file mode 100644 index 0000000..0dc42cc --- /dev/null +++ b/external/libsdf/libsw/files.c @@ -0,0 +1,59 @@ +/* Some common routines for dealing with files. */ +#include +#include +#include "protos.h" +#include "Malloc.h" +#include "mpmy.h" + +int fexists(const char *name){ + int fd, ret; + + ret = 0; + if( MPMY_Procnum() == 0 ){ + /* We could call stat, but then we'd have to deal with the */ + /* complications of different flavors of struct stat on different */ + /* machines...Yuck. */ + if( (fd=open(name, O_RDONLY)) >= 0 ){ + close(fd); + ret = 1; + } + } + MPMY_Combine(&ret, &ret, 1, MPMY_INT, MPMY_SUM); + return ret; +} + +int fexists_and_unlink(const char *name){ + int fd, ret; + + ret = 0; + if( MPMY_Procnum() == 0 ){ + /* We could call stat, but then we'd have to deal with the */ + /* complications of different flavors of struct stat on different */ + /* machines...Yuck. */ + if( (fd=open(name, O_RDONLY)) >= 0 ){ + close(fd); + ret = 1; + } + unlink(name); + } + MPMY_Combine(&ret, &ret, 1, MPMY_INT, MPMY_SUM); + return ret; +} + +int +ForceCheckpoint(void) +{ + return fexists_and_unlink("_ForceCheckpoint_") || fexists("_ForceStop_"); +} + +int +ForceOutput(void) +{ + return fexists_and_unlink("_ForceOutput_"); +} + +int +ForceStop(void) +{ + return fexists_and_unlink("_ForceStop_"); +} diff --git a/external/libsdf/libsw/finite.c b/external/libsdf/libsw/finite.c new file mode 100644 index 0000000..67c687e --- /dev/null +++ b/external/libsdf/libsw/finite.c @@ -0,0 +1,11 @@ +/* This really should be implemented for us... */ +/* Nevertheless, we should do better.. 
*/ +#include +#ifndef HUGE +#define HUGE 1.e38 +#endif + +int finite(double x){ + return x-HUGE; +} + diff --git a/external/libsdf/libsw/gc.c b/external/libsdf/libsw/gc.c new file mode 100644 index 0000000..f142a78 --- /dev/null +++ b/external/libsdf/libsw/gc.c @@ -0,0 +1,180 @@ +/* little functions for doing gray-code stuff. */ + +#include +#include +#include "gc.h" + +/* Both parity and firstbit could be sped up with some lookup tables. */ +/* Who cares? */ +unsigned int parity(unsigned int num) +{ + unsigned int answer = 0; + while( num ){ + if( num&1 ) + answer ^= 1; + num >>= 1; + } + return answer; +} + +int hibit(unsigned int num){ + /* return the index of the highest bit in num. */ + /* -1 if num == 0 */ + int bit = -1; + + while(num) { + bit++; + num >>= 1; + } + return bit; +} + +int lobit(unsigned int num){ + /* Return the index of the lowest bit in num */ + /* Return BITS_PER_WORD if num==0. This relies on overflow in left- + shift returning 0 */ + unsigned int m=1; + int bit = 0; + + while((num&m) != m){ + m <<= 1; + bit++; + } + return bit; +} + +int ilog2(unsigned int n){ + return hibit(n); +} + +unsigned int cksum(const void *buf, unsigned int n){ + unsigned int sum = 0; + unsigned int leftover; + const unsigned int *ip = buf; + const unsigned char *cp; + + /* Worthwhile to make the result independent of word-ordering, wordsize, + etc?? Not now... */ + leftover = n%sizeof(unsigned int); + n /= sizeof(unsigned int); + while(n--){ + sum ^= *ip++; + } + cp = (const unsigned char *)ip; + for(n=0; n>= 1)); + return ret; +} + +/* My two neighbors are obtained by : + a) toggle the lowest bit of procnum + b) toggle one past the lowest turned-on bit + ( but be careful when there are no turned on bits, or only the highest + is turned on). 
+ */ +int Gcup(unsigned int proc, unsigned int nproc){ + if( proc == nproc>>1 ){ + return -1; + }else if( parity(proc) ){ + return proc^(1<>1); +} + +/* If you look back at the properties of ^ and >>, you will notice that ^ behaves + like addition: + + x + y = y + x <-> x ^ y = y ^ x + (x + y) + z = x + (y + z) <-> (x ^ y) ^ z = x ^ (y ^ z) + + and >> behaves like a linear operator L: + + L(x + y) = L(x) + L(y) <-> (x ^ y)>>1 = (x>>1) ^ (y>>1) + + Hence the bin2gray operation behaves just like the linear operator: + + 1 + L + + with inverse (L**k below means L applied k times) + + (1+L)**-1 = 1 - L + L**2 - L**3 + ... = (1 - L)(1 + L**2)(1 + L**4)(1 + L**8)... (*) + + Since x ^ y ^ y = x, addition is just the same as subtraction. Furthermore, + x>>32 = 0 for any 32 bit integer and hence all of + + (1 + L**32), (1 + L**64), ... are equal to 1. + + As a result, you just need to keep the first 5 terms of the RHS of (*) +*/ +unsigned long +gray2bin(unsigned long g){ + register unsigned long t = g; + t ^= t>>1; /* 1 - L */ + + t ^= t>>2; /* 1 + L**2 */ + + t ^= t>>4; /* 1 + L**4 */ + + t ^= t>>8; /* 1 + L**8 */ + + t ^= t>>16; /* 1 + L**16 */ + return t; +} diff --git a/external/libsdf/libsw/gnusort.c b/external/libsdf/libsw/gnusort.c new file mode 100644 index 0000000..82143a0 --- /dev/null +++ b/external/libsdf/libsw/gnusort.c @@ -0,0 +1,269 @@ +/* Use a static buffer instead of malloc for small 'size' */ +/* Fixed a bug when called with total_elems=0, ptr=NULL */ +/* The default is now to not use alloca unless ALLOCA_PREFERRED is defined */ +/* -DNO_ALLOCA and -DC_ALLOCA are equivalent. Neither uses alloca herein. */ +/* Changed alloca -> Malloc/Free (johns) */ +/* Hacked by msw for even more speed (2x - 3x) */ +/* Assumes size is a multiple of sizeof(int) */ +/* Originally from glibc-1.03.tar.Z stdlib/qsort.c */ + +/* Copyright (C) 1991 Free Software Foundation, Inc. +This file is part of the GNU C Library. +Written by Douglas C. Schmidt (schmidt@ics.uci.edu).
+ +The GNU C Library is free software; you can redistribute it and/or +modify it under the terms of the GNU Library General Public License as +published by the Free Software Foundation; either version 2 of the +License, or (at your option) any later version. + +The GNU C Library is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +Library General Public License for more details. + +You should have received a copy of the GNU Library General Public +License along with the GNU C Library; see the file COPYING.LIB. If +not, write to the Free Software Foundation, Inc., 675 Mass Ave, +Cambridge, MA 02139, USA. */ + +#include +#include +#ifndef ALLOCA_PREFERRED +#include "Malloc.h" +static char static_pivot[128]; +#endif +#include "error.h" + +/* Int-wise swap two items of size SIZE. */ +#define SWAP(a, b, size) \ + do \ + { \ + register size_t __size = (size)/sizeof(int); \ + register int *__a = (int *)(a), *__b = (int *)(b); \ + do \ + { \ + int __tmp = *__a; \ + *__a++ = *__b; \ + *__b++ = __tmp; \ + } while (--__size > 0); \ + } while (0) + +/* Discontinue quicksort algorithm when partition gets below this size. + This particular magic number was chosen to work best on a Sun 4/260. */ +#define MAX_THRESH 4 + +/* Stack node declarations used to store unfulfilled partition obligations. */ +typedef struct + { + char *lo; + char *hi; + } stack_node; + +/* The next 4 #defines implement a very fast in-line stack abstraction. */ +#define STACK_SIZE (8 * sizeof(unsigned long int)) +#define PUSH(low, high) ((void) ((top->lo = (low)), (top->hi = (high)), ++top)) +#define POP(low, high) ((void) (--top, (low = top->lo), (high = top->hi))) +#define STACK_NOT_EMPTY (stack < top) + + +/* Order size using quicksort. This implementation incorporates + four optimizations discussed in Sedgewick: + + 1. 
Non-recursive, using an explicit stack of pointers that store the + next array partition to sort. To save time, the maximum amount + of space required to store an array of MAX_INT is allocated on the + stack. Assuming a 32-bit integer, this needs only 32 * + sizeof(stack_node) == 256 bytes with 4-byte pointers. Pretty cheap, actually. + + 2. Choose the pivot element using a median-of-three decision tree. + This reduces the probability of selecting a bad pivot value and + eliminates certain extraneous comparisons. + + 3. Only quicksorts TOTAL_ELEMS / MAX_THRESH partitions, leaving + insertion sort to order the MAX_THRESH items within each partition. + This is a big win, since insertion sort is faster for small, mostly + sorted array segments. + + 4. The larger of the two sub-partitions is always pushed onto the + stack first, with the algorithm then concentrating on the + smaller partition. This *guarantees* no more than log (n) + stack size is needed (actually O(1) in this case)! */ + +/* STDC says it's void, but Sun knows better, but SRV knows better still... */ +#if defined(__SRV__) || defined(__SUN_CC_UNPROTO__) || defined(__SUN5__) || !defined(sparc) +void +#else +int +#endif +qsort(void *pbase, size_t total_elems, size_t size, + int (*cmp)(const void *, const void *)) +{ + register char *base_ptr = (char *) pbase; + + /* Allocating SIZE bytes for a pivot buffer facilitates a better + algorithm below since we can do comparisons directly on the pivot. */ +#ifdef ALLOCA_PREFERRED + char *pivot_buffer = (char *) alloca (size); +#else + char *pivot_buffer = (size > sizeof(static_pivot)) ? (char *)Malloc(size) : static_pivot; +#endif + const size_t max_thresh = MAX_THRESH * size; + + if (size % sizeof(int) || (int)pbase % sizeof(int)) + Error("This qsort only works on int aligned stuff\n"); + + if (total_elems > MAX_THRESH) + { + char *lo = base_ptr; + char *hi = &lo[size * (total_elems - 1)]; + /* Largest size needed for 32-bit int!!!
*/ + stack_node stack[STACK_SIZE]; + stack_node *top = stack + 1; + + while (STACK_NOT_EMPTY) + { + char *left_ptr; + char *right_ptr; + + char *pivot = pivot_buffer; + + /* Select median value from among LO, MID, and HI. Rearrange + LO and HI so the three values are sorted. This lowers the + probability of picking a pathological pivot value and + skips a comparison for both the LEFT_PTR and RIGHT_PTR. */ + + char *mid = lo + size * ((hi - lo) / size >> 1); + + if ((*cmp)((void *) mid, (void *) lo) < 0) + SWAP(mid, lo, size); + if ((*cmp)((void *) hi, (void *) mid) < 0) + SWAP(mid, hi, size); + else + goto jump_over; + if ((*cmp)((void *) mid, (void *) lo) < 0) + SWAP(mid, lo, size); + jump_over:; + memcpy(pivot, mid, size); + pivot = pivot_buffer; + + left_ptr = lo + size; + right_ptr = hi - size; + + /* Here's the famous ``collapse the walls'' section of quicksort. + Gotta like those tight inner loops! They are the main reason + that this algorithm runs much faster than others. */ + do + { + while ((*cmp)((void *) left_ptr, (void *) pivot) < 0) + left_ptr += size; + + while ((*cmp)((void *) pivot, (void *) right_ptr) < 0) + right_ptr -= size; + + if (left_ptr < right_ptr) + { + SWAP(left_ptr, right_ptr, size); + left_ptr += size; + right_ptr -= size; + } + else if (left_ptr == right_ptr) + { + left_ptr += size; + right_ptr -= size; + break; + } + } + while (left_ptr <= right_ptr); + + /* Set up pointers for next iteration. First determine whether + left and right partitions are below the threshold size. If so, + ignore one or both. Otherwise, push the larger partition's + bounds on the stack and continue sorting the smaller one. */ + + if ((size_t) (right_ptr - lo) <= max_thresh) + { + if ((size_t) (hi - left_ptr) <= max_thresh) + /* Ignore both small partitions. */ + POP(lo, hi); + else + /* Ignore small left partition. */ + lo = left_ptr; + } + else if ((size_t) (hi - left_ptr) <= max_thresh) + /* Ignore small right partition. 
*/ + hi = right_ptr; + else if ((right_ptr - lo) > (hi - left_ptr)) + { + /* Push larger left partition indices. */ + PUSH(lo, right_ptr); + lo = left_ptr; + } + else + { + /* Push larger right partition indices. */ + PUSH(left_ptr, hi); + hi = right_ptr; + } + } + } + + /* Once the BASE_PTR array is partially sorted by quicksort the rest + is completely sorted using insertion sort, since this is efficient + for partitions below MAX_THRESH size. BASE_PTR points to the beginning + of the array to sort, and END_PTR points at the very last element in + the array (*not* one beyond it!). */ + +#define min(x, y) ((x) < (y) ? (x) : (y)) + + /* johns - avoid potential segfault if total_elems==0 and base_ptr == 0 !*/ + if( total_elems > 0 ) + { + char *const end_ptr = &base_ptr[size * (total_elems - 1)]; + char *tmp_ptr = base_ptr; + char *thresh = min(end_ptr, base_ptr + max_thresh); + register char *run_ptr; + + /* Find smallest element in first threshold and place it at the + array's beginning. This is the smallest array element, + and the operation speeds up insertion sort's inner loop. */ + + for (run_ptr = tmp_ptr + size; run_ptr <= thresh; run_ptr += size) + if ((*cmp)((void *) run_ptr, (void *) tmp_ptr) < 0) + tmp_ptr = run_ptr; + + if (tmp_ptr != base_ptr) + SWAP(tmp_ptr, base_ptr, size); + + /* Insertion sort, running from left-hand-side up to right-hand-side. 
*/ + + run_ptr = base_ptr + size; + while ((run_ptr += size) <= end_ptr) + { + tmp_ptr = run_ptr - size; + while ((*cmp)((void *) run_ptr, (void *) tmp_ptr) < 0) + tmp_ptr -= size; + + tmp_ptr += size; + if (tmp_ptr != run_ptr) + { + int *trav; + + trav = (int *)(run_ptr + size); + while (--trav >= (int *)run_ptr) + { + int c = *trav; + int *hi, *lo; + + for (hi = lo = trav; + (lo -= size/sizeof(int)) >= (int *)tmp_ptr; hi = lo) + *hi = *lo; + *hi = c; + } + } + } + } +#ifndef ALLOCA_PREFERRED + if( pivot_buffer != static_pivot) + Free(pivot_buffer); +#endif +} diff --git a/external/libsdf/libsw/heap.c b/external/libsdf/libsw/heap.c new file mode 100644 index 0000000..edfeeda --- /dev/null +++ b/external/libsdf/libsw/heap.c @@ -0,0 +1,87 @@ +/* Super-fast priority queue. ? Completely inlined by gcc. */ +/* Assume that each pointer points at */ +/* a key, AND whatever else the caller is interested in keeping. */ + +#ifndef assert +#include "Assert.h" +#endif +#include +#include +#include "Malloc.h" +#define HEAPdotC +#include "heap.h" + +#if !defined(HeapKey) || !defined(HeapLeft) + # error Heap macros undefined +#endif + +/* These save us from an extra comparison inside our loops */ +const float HeapMinf = -FLT_MAX; +const float HeapInf = FLT_MAX; + +/* I wonder if some const decls would be correct? 
*/ +void HeapInit(Heap *hp, unsigned int sz){ + assert(sz > 0); + hp->arr = Malloc( (sz+1)*sizeof(*(hp->arr)) ); + assert(hp->arr); + hp->sz = sz; + hp->cnt = 1; + hp->arr[0] = &HeapInf; +} + +void HeapTerminate(Heap *hp){ + Free((void *)hp->arr); + hp->arr = NULL; + hp->sz = hp->cnt = 0; +} + +int HeapIsBad(const Heap *hp){ + unsigned int i, ri, li; + const float **arr = hp->arr; + + for( i=1, li=2, ri=3; + li < hp->cnt; + i++, li = HeapLeft(i), ri = li+1){ + if( HeapKey(i) < HeapKey(li) ) + return li; + if( ri < hp->cnt && HeapKey(i) < HeapKey(ri) ) + return ri; + } + return 0; +} + +#ifdef STANDALONE +#include + +float data[] = {5., 1., 3., 2., 7., 3., 4.}; + +void HeapPrint(Heap *hp){ + unsigned int i; + for(i=1; icnt; i++){ + printf("%d %g\n", i, *(hp->arr[i])); + } +} + +int main(int argc, char **argv){ + Heap H; + float *top; + unsigned int i; + unsigned int ndata; + + HeapInit(&H, 2); + ndata = sizeof(data)/sizeof(*data); + for(i=0; i +#include +#include "error.h" +#include "hwclock.h" + +static double hwtick_val; +#define two_to_32 4294967296.0 + +static double hwclock_offset; + +#if defined(__x86_64__) && defined(__GNUC__) +#define DEFAULT_MHZ 1200.e6 +static double +mhz_from_proc(){ + FILE *fp = fopen("/proc/cpuinfo", "r"); + char line[512]; + + if( fp == NULL ){ + Warning("Can't open /proc/cpuinfo\n"); + return DEFAULT_MHZ; + } + + while( fgets(line, sizeof(line), fp) ){ + char *m; + int nscan; + double mhz; + + if( (m = strstr(line, "cpu MHz")) ){ + /* sscanf should match any amount of whitespace around the : */ + nscan = sscanf(m, "cpu MHz : %lf\n", &mhz); + if( nscan == 1 ){ + fclose(fp); + return mhz*1.0e6; + }else{ + Warning("sscanf returns %d, on '%s' of /proc/cpuinfo mhz line failed\n", nscan, line); + fclose(fp); + return DEFAULT_MHZ; /* let's just guess 200Mhz */ + } + } + } + Warning("Did not find cpu MHz in /proc/cpuinfo\n"); + fclose(fp); + return DEFAULT_MHZ; +} +#endif + +/* This doesn't belong here, but I can't be bothered to figure out 
where + it does belong! */ +double +hwclock(void) +{ +#if defined(__x86_64__) + static int init = 1; + unsigned int counter[2]; + + __asm__("rdtsc \n\t" + "movl %%eax,%0 \n\t" + "movl %%edx,%1 \n\t" + : "=m" (((unsigned *)counter)[0]), "=m" (((unsigned *)counter)[1]) + : + : "eax" , "edx"); + + if (init) { + init = 0; + hwtick_val = 1.0/mhz_from_proc(); + hwclock_offset = hwtick_val*((double)counter[1]*two_to_32 + (double)counter[0]); + return 0.0; + } + return(hwtick_val*((double)counter[1]*two_to_32 + (double)counter[0])-hwclock_offset); +#else + { + struct timeval tp; + struct timezone tzp; + + gettimeofday(&tp,&tzp); + return ( (double) tp.tv_sec + (double) tp.tv_usec * 1.e-6 ); + } +#endif +} + +void +zero_hwclock(void) +{ + hwclock_offset += hwclock(); +} + +double +hwtick(void) +{ + return hwtick_val; +} diff --git a/external/libsdf/libsw/ivfprintf.c b/external/libsdf/libsw/ivfprintf.c new file mode 100644 index 0000000..51d56a3 --- /dev/null +++ b/external/libsdf/libsw/ivfprintf.c @@ -0,0 +1,54 @@ +#ifdef __SRV__ + # error This file should not be compiled with srv +#endif + +#include +#include +#include +#ifndef FORCE_TYPE +/* mesh.h is not protected against multiple inclusion ! */ +#include +#endif + +/* Why isn't this in mesh.h? */ +extern void iowait(int); + +/* A fast asynchronous disk message primitive */ + +/* There is circumstantial evidence that the delta dies if we have many */ +/* iwrites pending and call ifflush ???? (maybe fixed - johns ) ????*/ + +static int id[] = {-1, -1, -1, -1, -1, -1, -1, -1}; +#define NID (sizeof(id)/sizeof(id[0])) +static int nid = 0; +static char buf[NID][1024]; + +void +ivfprintf(FILE *stream, const char *fmt, va_list args) +{ + if (id[nid] != -1){ + iowait(id[nid]); + } + vsprintf(buf[nid], fmt, args); + id[nid] = iwrite(fileno(stream), buf[nid], strlen(buf[nid])); + if( ++nid == NID ) + nid = 0; +} + + +void +ifflush(FILE *stream) +{ + int i; + int ii; + + /* stream not used. 
Just wait for all the pending i/o */ + for (ii = nid, i = 0; i < NID; i++){ + if (id[ii] != -1) { + iowait(id[ii]); + id[ii] = -1; + } + if( ++ii == NID ) + ii = 0; + } +} diff --git a/external/libsdf/libsw/key.c b/external/libsdf/libsw/key.c new file mode 100644 index 0000000..e801fdd --- /dev/null +++ b/external/libsdf/libsw/key.c @@ -0,0 +1,79 @@ +#define KEYdotC +/* Most of the definitions are in key.h */ +#include /* just for sprintf */ +#include "key.h" +#include "protos.h" + +/* Non-inlined definitions go here. */ + +char * +PrintKey(Key_t key) +{ + static char str[128]; + + if (NK == 1) + sprintf(str, "%lo", key.k[0]); + else { + if (key.k[1] == 0) { + sprintf(str, "%lo", key.k[0]); + } else { + /* This only works for NDIM==3 */ + sprintf(str, "%lo%01lo%021lo", key.k[1] >> 2, /* bits 66-126*/ + ((key.k[1] & 03) << 1) | (key.k[0] >> 63), /* bits 63,64,65 */ + key.k[0] & ~(1L << 63)); /* bits 0-62 */ + } + } + return str; +} + +#if 0 +int +TreeLevel(Key_t key, int ndim) +{ + int level; + int chubits = (KEYBITS-1)/ndim; + Key_t testkey; + + /* First check whether it's a 'body' (at the deepest level.) */ + /* This will save considerable time... */ + testkey = KeyLshift(KeyInt(1), chubits*ndim); + if( KeyEQ( testkey, KeyAnd(testkey, key) ) ) + return chubits; + + /* Now start looking from low levels */ + testkey = KeyInt(1); + for (level = 0; level +#include +#include "protos.h" +#include "mpmy.h" +#include "Msgs.h" +#include "Assert.h" +#include "key.h" +#include "peano.h" +#include "keycvt.h" + +#ifndef FLT_MAX +/* I wonder what they put in float.h, anyway */ +#define FLT_MAX 1.e38 +#endif + +int KeyOutOfBounds; + +/* This gives a tight bounding box around the list of positions. Note + that the result is rmin <= pos[i] and rmax >= pos[i]. The equality + can be a headache for float-to-int conversions! Consider using + InflateBbox and or CubeBbox! 
*/ +void +TightBbox(float *pstart, int nobj, int pstride, int ndim, tbbox *bb){ + MPMY_Comm_request req; + int d; + float *pos = pstart; + float rmin[MAXNDIMKU]; + float rmax[MAXNDIMKU]; + + assert(ndim < MAXNDIMKU); + + for(d=0; dndim = ndim; + while(nobj--){ + for(d=0; d pd ) rmin[d] = pd; + if( rmax[d] < pd ) rmax[d] = pd; + } + pos = (float *)(pstride + (char *)pos); + } + MPMY_ICombine_Init(&req); + MPMY_ICombine(rmin, bb->rmin, ndim, MPMY_FLOAT, MPMY_MIN, req); + MPMY_ICombine(rmax, rmax, ndim, MPMY_FLOAT, MPMY_MAX, req); + MPMY_ICombine_Wait(req); + for(d=0; dsz[d] = rmax[d] - bb->rmin[d]; + } +} + +void +CenterBbox(tbbox *bb, float *center){ + int d; + for(d=0; dndim; d++){ + center[d] = bb->rmin[d] + 0.5F*bb->sz[d]; + } +} + +int +ContainsBbox(tbbox *bb1, tbbox *bb2){ + /* Return 1 if bb1 completely contains bb2 */ + int d; + + for(d=0; dndim; d++){ + if( bb1->rmin[d] > bb2->rmin[d] || + bb1->rmin[d] + bb1->sz[d] < bb2->rmin[d]+bb2->sz[d] ) + return 0; + } + return 1; +} + +/* Construct bbu, the 'union' of bb1 and bb2 */ +void +UnionBbox(tbbox *bb1, tbbox *bb2, tbbox *bbu){ + int d; + + bbu->ndim = bb1->ndim; + for(d=0; dndim; d++){ + float max1, max2; + float min1, min2; + + /* Read these first in case bbu is equal to bb1 or bb2! */ + min1 = bb1->rmin[d]; + min2 = bb2->rmin[d]; + bbu->rmin[d] = (min1 < min2) ? min1 : min2; + max1 = min1 + bb1->sz[d]; + max2 = min2 + bb2->sz[d]; + bbu->sz[d] = (max1 > max2) ? max1 - bbu->rmin[d] : max2 - bbu->rmin[d]; + } +} + +/* Make the bbox a cube by expanding the smaller dimensions. 
*/ +void +CubeBbox(tbbox *bb){ + int d; + float maxs, halfmaxs; + float center[MAXNDIMKU]; + + maxs = 0.; + for(d=0; d<bb->ndim; d++) { + center[d] = bb->rmin[d] + 0.5f * bb->sz[d]; + if( maxs < bb->sz[d] ) + maxs = bb->sz[d]; + } + halfmaxs = 0.5f*maxs; + for( d=0; d<bb->ndim; d++){ + bb->rmin[d] = center[d] - halfmaxs; + bb->sz[d] = maxs; + } +} + +/* Increase the linear dimension by 'factor' on all sides */ +void +InflateBbox(tbbox *bb, float factor){ + float center; + int d; + for( d=0; d<bb->ndim; d++){ + center = 0.5f*bb->sz[d] + bb->rmin[d]; + bb->sz[d] *= factor; + bb->rmin[d] = center - 0.5f*bb->sz[d]; + } +} + +void +GenerateKeys(float *pstart, int nobj, int pstride, tbbox *bb, + Key_t *kstart, int kstride, int order_type){ + int i, d; + float keyfactor[MAXNDIMKU]; + unsigned int ik[MAXNDIMKU]; + unsigned int chubits; + Key_t *kp; + float *pos; + unsigned int maxikey; + float fmaxikey; + Key_t (*KfI)(unsigned int *xp, int ndim, int nbits); + + chubits = ((KEYBITS-1) / bb->ndim ); + maxikey = (1U<ndim; d++){ + /* Divide by 0 ?? */ + keyfactor[d] = (1U<sz[d]); + } + pos = pstart; + kp = kstart; + switch( order_type ){ + case MORTON_ORDER: + KfI = &KeyFromInts; + break; + case PH_ORDER: + assert(bb->ndim == 3); /* assumed by the current implementation*/ + KfI = &PHKeyFromInts; + break; + default: + Error("Unrecognized order_type in GenerateKeys\n"); + } + + for(i=0; indim; d++){ + float fk = keyfactor[d] * (pos[d] - bb->rmin[d]); + if( fk < 0.
){ + KeyOutOfBounds = 1; + ik[d] = 0; + }else if( fk > fmaxikey ){ + ik[d] = maxikey; + KeyOutOfBounds = 1; + }else + ik[d] = (unsigned int)fk; + } + *kp = (*KfI)(ik, bb->ndim, chubits); + kp = (Key_t *) ( kstride + (char *)kp); + pos = (float *) ( pstride + (char *)pos); + } +} + +void +CellBBFromKey(Key_t key, tbbox *bb, tbbox *cellbb, int order_type) +{ + unsigned int icorner[MAXNDIMKU]; + unsigned int iscale; + float factor; + int d; + + switch(order_type){ + case MORTON_ORDER: + iscale = (1<ndim)); + break; + case PH_ORDER: + iscale = (1<ndim)); + break; + default: + Error("Unrecognized ordering in CellBBFromKey\n"); + } + + /* Now scale it back to "physical" units */ + cellbb->ndim = bb->ndim; + for(d=0; dndim; d++){ + factor = (bb->sz[d])/iscale; + cellbb->rmin[d] = bb->rmin[d] + factor*icorner[d]; + cellbb->sz[d] = factor; + } +} + +void +IntsFromFloats(const float *x, unsigned int *ix, tbbox *bb, int nbits){ + int d; + unsigned int iscale = 1<= CHAR_BIT*sizeof(iscale)) { + Error("nbits out of range in IntsFromFloats\n"); + } + for(d=0; dndim; d++){ + ix[d] = iscale * (x[d] - bb->rmin[d]) / (bb->sz[d]); + } +} + +void +FloatsFromInts(const int *ix, float *x, tbbox *bb, int nbits){ + unsigned int iscale = 1<= CHAR_BIT*sizeof(iscale)) { + Error("nbits out of range in FloatsFromInts\n"); + } + for(d=0; dndim; d++){ + x[d] = ix[d] * ((bb->sz[d])/iscale); + } +} + +/* These two names conflict with physics_generic.c. Good. It will + keep me from using physics_generic.c accidentally. */ +Key_t +KeyFromInts(unsigned int *xp, int ndim, int nbits){ + Key_t key; + unsigned int rshift, bits, dim; + + /* Set first bit. Important! */ + key = KeyInt(1); + for(rshift=nbits; rshift; ){ + rshift--; + bits = 0; + for(dim = 0; dim < ndim; dim++ ) + bits |= ((xp[dim]>>rshift)&1) << dim; + key = KeyOrInt(KeyLshift(key, ndim), bits); + } + return(key); +} + +/* Notice that the return value is DIFFERENT FROM physics_generic.c + Here, we return the number of bits. 
It's easier for the caller to do + left-shift than ilog2. */ +int +IntsFromKey(Key_t key, unsigned int *ip, int ndim){ + unsigned int iscale = 1; + unsigned int lev = 0; + int i; + + for(i=0; i= CHAR_BIT*sizeof(iscale)) { + Error("lev out of range in IntsFromKey\n"); + } + } + return lev; +} + diff --git a/external/libsdf/libsw/lsv.c b/external/libsdf/libsw/lsv.c new file mode 100644 index 0000000..4de9e83 --- /dev/null +++ b/external/libsdf/libsw/lsv.c @@ -0,0 +1,792 @@ +/* + * Copyright 1992,1993 Michael S. Warren. All Rights Reserved. + */ + +/* define this to skip the FIONREAD code entirely */ +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#ifndef FD_SET +/* Sigh. Where o' where is FD_SET found sys/types?, sys/select? */ +/* Unfortunately, sys/select doesn't always exist! */ +#include +#endif +#include "protos.h" +#include "lsv.h" +#include "error.h" +#include "Msgs.h" +#include "Assert.h" +#include "Malloc.h" + +#ifdef __DO_SWAP__ +/* There's probably a better way to do most of the multi-word swaps... */ +#include "byteswap.h" +static int _x, _y; +#define Swap(x) (_x=x, Byteswap(sizeof(int), 1, &_x, &_y), _y) +#else +#define Swap(x) x +#endif + +/* This test should be accomplished some other way!! 
*/ +#if defined(USE_ALARM) && !( defined(__x86_64__) || defined(_AIX) || defined(__SUN5__) ) +#define HAVE_SIGVEC +#endif + +#if defined(__SUN4__) +/* These should really be prototyped somewhere else, + but this will do for now */ +int close(int); +int getpid(void); +#ifdef HAVE_SIGVEC +int sigvec(int sig, struct sigvec *vec, struct sigvec *ovec); +#endif +int ioctl(int fd, int cmd, void *p); +int gethostname(char *name, int namelen); +char *inet_ntoa(struct in_addr in); +int socket(int domain, int type, int protocol); +int bind(int s, struct sockaddr *name, int namelen); +int recvfrom(int s, void *buf, int len, int flags, + struct sockaddr *from, int *fromlen); +int sendto(int s, const void *msg, int len, + int flags, struct sockaddr *to, int tolen); +int select(int width, fd_set *readfds, fd_set *writefds, fd_set *exceptfds, + struct timeval *timeout); +int getsockname(int s, struct sockaddr *name, int *namelen); +void bzero(char *b, int length); +#endif + +#if defined(__SUN5__) +/* This is a bsd-ism. It may not even exist everywhere?! */ +int gethostname(char *name, int namelen); +#endif + +#define MAXDEFER 4000 /* maximum number of messages deferred */ +#define HLEN (5*sizeof(int)) /* size of packet header */ +#define MAXLEN 8192 /* size of total packet */ +#define BLOCK (MAXLEN-HLEN) /* size of basic data packet */ +#define ACK_TYPE (-1) /* Message type for ack */ +#define H_MAGIC (0x9f07) /* Magic number for headers */ + +#define MAX_PORT_ATTEMPTS 150 /* Number of ports to try before giving up */ +#define LSV_TYPE 2 /* Type used to communicate with host */ +#define HOST_NUM (-1) /* can't redefine this without mem adjust. 
*/ +#ifndef INADDR_NONE +#define INADDR_NONE -1 /* 'bad' return from inet_addr */ +#endif + +static int host_num = HOST_NUM; /* host integer address */ +static int my_pid; /* my process id */ +static unsigned int *seqout; /* array of outgoing sequence counters */ +static unsigned int *seqin; /* array of incoming sequence counters */ +static unsigned int *nretry; /* array of retry counts */ +static struct sockaddr_in host_addr; +static struct sockaddr_in my_addr; + +static int sock; /* file descriptor for my UDP socket */ +static struct sockaddr_in *addr; /* sockaddrs for all processors */ +static char *msgbuf[MAXDEFER]; /* pointers to deferred messages */ +static int msgcnt; /* number of deferred messages */ +#ifdef USE_ALARM +static volatile int failed; /* flag for recvfrom timeout */ +static void to_alarm(int); +#endif +static void sock_init(struct sockaddr_in *acc, int bind_flag); +static void common_init(const int n); +static int bsend(int s, const void *outbuf, int sent, int dest, int type); +static int brecv(int s, void *inb, int sent, int *dest, int type, int block); +static int chk_deferw(int type, int *src); +static void send_ack(int s, int dest, int seq, struct sockaddr_in *dest_addr); +static int chk_defer(int src, int type, int seq); + +int LSV_procnum; /* my integer address (procnum) */ +int LSV_nproc; /* how many 'elt's */ + +/* If hears nothing before its 'block' arg expires, it returns + BRECV_TIMEDOUT */ +#define BRECV_TIMEDOUT (-2) + +/* It sure would be nice to be able to set these at runtime!!! */ +/* TIMEOUT1 passed to brecv when we're 'blocking', i.e., from Srecv_block. */ +#define TIMEOUT1 200 +/* TIMEOUT2 is what we pass when we really expect something, i.e., when + we are waiting for subsequent blocks after we have received the first + one. */ +#define TIMEOUT2 50 +/* Srecv{_block} will allow retry brecv after a timeout this many + times before giving up altogether. */ +#define NRETRY 2 +/* Ssend waits for an ack from the recipient. 
It retries this many + times, incrementing the timer by one each time, so in the end, the + total time waited is ACK_NRETRY*(ACK_NRETRY+1)/2 seconds. Now that + the user code is free to use alarm, we can rely on the user to bail out + if things seem to be taking too long. Thus, we set this quite high. */ +#ifndef USE_ALARM +#define ACK_NRETRY 100 +#else +#define ACK_NRETRY 20 +#endif + +void +Sclose(void) +{ + close(sock); + /* Should we free some of the arrays??? */ +} + +void +Sinit_elt(void) +{ + int suspend_proc; + int pid; + int nin; + int size = sizeof(struct sockaddr_in); + int hostport; + char *hostip; + unsigned long inaddr; + + assert(getenv("LSV_PROCNUM") && getenv("LSV_NPROC") + && getenv("LSV_HOSTPORT") && getenv("LSV_HOST") ); + + LSV_procnum = atoi(getenv("LSV_PROCNUM")); + LSV_nproc = atoi(getenv("LSV_NPROC")); + hostip = getenv("LSV_HOST"); + hostport = atoi(getenv("LSV_HOSTPORT")); + + /* fill in host_addr here */ + memset(&host_addr, 0, sizeof(host_addr)); + host_addr.sin_family = AF_INET; + /* Now try to figure out the host's address from hostname */ + /* Don't bother with gethostbyname here.
Assume that the host + has worked out its preferred numeric IP address and passed that + to us through the environment var LSV_HOST */ + if ((inaddr = inet_addr(hostip)) != INADDR_NONE) /* it is numeric */ + host_addr.sin_addr.s_addr = inaddr; + else + Error("inet_addr(%s) failed, errno=%d\n", hostip, errno); + host_addr.sin_port = htons(hostport); + Msgf(("hostip: %s, host_addr.sin_addr %s, host_addr.sin_port: %d\n", + hostip, inet_ntoa(host_addr.sin_addr), ntohs(host_addr.sin_port))); + + sock_init(&host_addr, 0); /* get host sockaddr */ + sock_init(&my_addr, 1); /* get my sockaddr */ + Msgf(("my_addr.sin_addr: %s, my_addr.sin_port: %d\n", + inet_ntoa(my_addr.sin_addr), ntohs(my_addr.sin_port))); + + + common_init(LSV_nproc); + + Ssend(&my_addr, size, HOST_NUM, LSV_TYPE); /* send to host */ + + nin = Srecv_block(addr, LSV_nproc * size, LSV_TYPE, &host_num); + if (nin != LSV_nproc * size) + Error("Bad number of addrs received\n"); + + if( getenv("LSV_SUSPEND") && strlen(getenv("LSV_SUSPEND")) > 0 ){ + pid = getpid(); + suspend_proc = atoi(getenv("LSV_SUSPEND")); + if (suspend_proc == -1) /* suspend all */ + suspend_proc = LSV_procnum; + if (LSV_procnum == suspend_proc){ + Shout("suspending pid=%d, LSV_procnum=%d\n", + pid, LSV_procnum); + kill(pid, SIGSTOP); + } + } +} + +/* Sinit_host1 is called BEFORE the host forks off child processes. + It has to figure out its own port and hostname, so it can pass it + to the children in the environment. */ +void +Sinit_host1(int *portp, char **namep){ + char *p; + + sock_init(&host_addr, 1); + *portp = ntohs(host_addr.sin_port); + if( (p=getenv("LSV_HOST")) == NULL ){ + /* If there is no LSV_HOST in the environment, then ask the system */ + p = inet_ntoa(host_addr.sin_addr); + } + *namep = malloc(strlen(p)+1); + strcpy(*namep, p); +} + +/* Sinit_host is called after the host forks the child processes. + Its job is to communicate all the port info with them so they + can talk to one another directly. 
*/ +void +Sinit_host(int n) /* n is how many nodes we talk to */ +{ + int i, nin; + int size = sizeof(struct sockaddr_in); + + common_init(n); + Msgf(("After common_init\n")); + LSV_procnum = HOST_NUM; + + for (i = 0; i < n; i++) { + nin = Srecv_block(&addr[i], size, LSV_TYPE, &i); + if (nin != size) + Error("Bad recv size (%d) in Sinit_host, i=%d\n", nin, i); + Msgf(("Received addr[%d]: family: %d, port: %d, addr: %s\n", + i, addr[i].sin_family, ntohs(addr[i].sin_port), + inet_ntoa(addr[i].sin_addr))); + } + Msgf(("All sockaddrs received. Broadcasting to compute processes\n")); + for (i = 0; i < n; i++) { + Ssend(addr, n * size, i, LSV_TYPE); + Msgf(("Sent addrs to %d\n", i)); + } +} + +static void +common_init(const int n) +{ +#if defined(HAVE_SIGVEC) + struct sigvec vec; +#endif + + /* to enforce sequencing */ + seqin = (unsigned int *) calloc(n+1, sizeof(unsigned int)); + seqout = (unsigned int *) calloc(n+1, sizeof(unsigned int)); + nretry = (unsigned int *) calloc(n+1, sizeof(unsigned int)); + addr = (struct sockaddr_in *) calloc(n+1, sizeof(struct sockaddr_in)); + if (seqin == NULL || seqout == NULL || nretry == NULL || addr == NULL) + Error("No more memory in Sinit\n"); + seqin++; /* allow indexing [-1:n-1] */ + seqout++; + nretry++; + addr++; + + /* We use the convention that 0, ..., n-1 are nodes, and -1 is the host */ + addr[HOST_NUM] = host_addr; /* struct assignment */ + my_pid = getpid(); + +#ifdef USE_ALARM +#if defined (HAVE_SIGVEC) + vec.sv_handler = to_alarm; + vec.sv_flags = SV_INTERRUPT; + vec.sv_mask = 0; + sigvec(SIGALRM, &vec, NULL); +#else + signal(SIGALRM, to_alarm); +#endif /* HAVE_SIGVEC */ +#endif /* USE_ALARM */ + +} + +#ifdef USE_ALARM +static void +to_alarm(int sig) +{ + assert( sig == SIGALRM ); + failed = 1; + signal(SIGALRM, to_alarm); +} +#endif + +/* dest should be last argument */ + +void +Ssend(const void *outb, int outcnt, int dest, int type) +{ + int sent; + const char *buf = outb; + int ret; + + Msgf(("Ssend: (%d) to %d len 
%d\n", type, dest, outcnt)); + do { + sent = (outcnt > BLOCK) ? BLOCK : outcnt; + ret = bsend(sock, buf, sent, dest, type); + if( ret < 0 ){ + Error("Ssend: bsend failed, type=%d, dest=%d, outcnt=%d\n", + type, dest, outcnt); + } + outcnt -= sent; + buf += sent; + } while (outcnt || sent == BLOCK); +} + +int +Srecv(void *inb, int size, int type, int *from) +{ + int sent, inbytes = 0; + char *buf = inb; + int timeout; + int verbose = 0; + + /* Only the first brecv is non-blocking. We wait for the rest. */ + sent = brecv(sock, buf+inbytes, size, from, type, 0); + if( sent < 0 ){ + if( sent == BRECV_TIMEDOUT ){ + return -1; /* nothing available, not an error! */ + } + Error("First brecv failed in Srecv, errno=%d\n", errno); + } + + inbytes += sent; + size -= sent; + timeout = TIMEOUT2; + while(sent == BLOCK){ + tryagain: + sent = brecv(sock, buf+inbytes, size, from, type, timeout); + if( sent < 0 ){ + if( sent == BRECV_TIMEDOUT ){ + Warning("Srecv: brecv timed out, will wait indefinitely now\n"); + verbose = 1; + timeout = -1; + goto tryagain; + } + Error("brecv failed in Srecv, inbytes=%d, errno=%d\n", + inbytes, errno); + } + inbytes += sent; + size -= sent; + } + if( verbose || Msg_test(__FILE__) ) + Msg_do("Srecv: (%d) from %d ret %d\n", type, *from, inbytes); + return(inbytes); +} + +/* last arg should not be a pointer */ + +int +Srecv_block(void *inb, int size, int type, int *from) +{ + int sent, inbytes = 0; + char *buf = inb; + int timeout = TIMEOUT1; + int verbose = 0; + + do { + tryagain: + sent = brecv(sock, buf+inbytes, size, from, type, timeout); + if( sent < 0 ){ + if( sent == BRECV_TIMEDOUT ){ + Warning("Srecv_block: brecv(size=%d, from=%d, type=%d, timeout=%d) timed out, inbytes=%d, errno=%d\n", + size, *from, type, timeout, + inbytes, errno); + timeout = -1; + verbose = 1; + goto tryagain; + } + return -1; + } + inbytes += sent; + size -= sent; + timeout = TIMEOUT2; + } while (sent == BLOCK); + if( verbose || Msg_test(__FILE__) ) + 
Msg_do("Srecv_block: (%d) from %d ret %d\n", type, *from, inbytes); + return(inbytes); +} + +void Sdiag(int (*printf_like)(const char *, ...)){ + int i; + int *buf; + printf_like("lsv.c: counters to and from various destinations\n[dest seqout[dest] seqin[dest] nretry[dest]]\n"); + for(i=-1; i ACK_NRETRY ){ + SeriousWarning("bsend timed out %d times waiting for ack\n", + retry); + return -1; + } + verbose = 1; + goto try_again; + } + if( verbose || Msg_test(__FILE__) ) + Msg_do("bsend ackrecv ready\n"); + + memset(&src_addr, 0, sizeof(struct sockaddr_in)); + incnt = recvfrom(s, (void *)inbuf, MAXLEN, 0, (struct sockaddr *)&src_addr, &len); +#else + memset(&src_addr, 0, sizeof(struct sockaddr_in)); + failed = 0; + alarm(1+retry); + errno = 0; + incnt = recvfrom(s, inbuf, MAXLEN, 0, (struct sockaddr *)&src_addr, &len); + if (failed) { + Msgf(("retry, errno=%d\n", errno)); + if (retry > ACK_NRETRY) { + SeriousWarning("bsend timed out on ack, errno=%d\n", errno); + return -1; + } + goto try_again; + } + alarm(0); +#endif + + if (incnt < 0) { + Warning("recvfrom(ack from %d) returns %d, errno=%d\n", dest, incnt, errno); + if(nfail++ > 5){ + SeriousWarning("recvfrom(ack from %d) returns %d, errno=%d\n", dest, incnt, errno); + return -1; + } + goto try_again; + } + if (Swap(inbuf[0]) != H_MAGIC) { + Warning("Bad header in bsend\n"); + goto ackrecv; + } + +#ifdef __DO_SWAP__ + inbuf[1] = Swap(inbuf[1]); + inbuf[2] = Swap(inbuf[2]); + inbuf[3] = Swap(inbuf[3]); + inbuf[4] = Swap(inbuf[4]); +#endif + + src = inbuf[1]; + seq = inbuf[2]; + intype = inbuf[3]; + inlen = inbuf[4]; + + if (intype != ACK_TYPE) { /* got a data packet */ + if (seq < seqin[src]) + Msgf(("aignore2 %d from %d ", seq, src)); + else if (chk_defer(src, intype, seq) > -1) + Msgf(("aignore %d from %d ", seq, src)); + else { + msgbuf[msgcnt] = Malloc((incnt+8)&~07); + memcpy(msgbuf[msgcnt], inbuf, incnt); + msgcnt++; + if (msgcnt >= MAXDEFER) Error("msgcnt too large\n"); + Msgf(("adefer seq:%d type:%d, len:%d 
from %d ", + seq, intype, inlen, src)); + } + send_ack(s, src, seq, &src_addr); + goto ackrecv; + } else { + if (incnt != HLEN) + Warning("Bad incnt, errno=%d", errno); + else if (dest != src || seq != seqout[dest]) + Msgf(("aduplicate ack %d from %d ", seq, src)); + else { + Msgf(("ack seq: %d, dest: %d\n", seq, dest)); + seqout[dest]++; + return (sent-HLEN); + } + goto ackrecv; + } +} + +/* Block is a timeout. We treat a negative value as meaning to block + forever*/ +static int +brecv(int s, void *inb, int sent, int *dest, int type, int block) +{ + int src, seq, incnt; + struct sockaddr_in src_addr; + int len = sizeof(struct sockaddr_in); + int inlen, i; + int intype; + int inbuf[(BLOCK+HLEN)/sizeof(int)]; + fd_set rdset; + struct timeval timeout, *timeoutp; + int selreturn; + + Msgf(("brecv(sent=%d, dest=%d, type=%d, block=%d)\n", + sent, *dest, type, block)); + if (*dest == LSV_ANY) + i = chk_deferw(type, &src); /* sets src */ + else + i = chk_defer(*dest, type, seqin[*dest]); + if (i != -1) { + if (*dest == LSV_ANY) *dest = src; + inlen = *(int *)(msgbuf[i]+4*sizeof(int)); + if (inlen > sent) Error("Too much data\n"); + memcpy(inb, msgbuf[i]+HLEN, inlen); + Free(msgbuf[i]); + msgbuf[i] = msgbuf[--msgcnt]; + Msgf(("brecv %d (%d) from %d.\n", seqin[*dest], type, *dest)); + seqin[*dest]++; + return(inlen); + } + + datrecv: + if( block >= 0 ){ + timeout.tv_sec = block; + timeout.tv_usec = 0; + timeoutp = &timeout; + }else{ + timeoutp = NULL; + } + FD_ZERO(&rdset); + FD_SET(s, &rdset); + selreturn = select(s+1, &rdset, NULL, NULL, timeoutp); + if( selreturn < 0 ){ + if (errno == EINTR) + goto datrecv; /* SIGPROF interrupts select */ + else { + SeriousWarning("select failed, errno=%d\n", errno); + return -1; + } + }else if(selreturn == 0){ + return BRECV_TIMEDOUT; + } + Msgf(("brecv any (%d)...", type)); + + memset(&src_addr, 0, sizeof(struct sockaddr_in)); + /* What's the best thing to do here if we timed out? + return? goto datrecv? 
something else?*/ + incnt = recvfrom(s, (void *)inbuf, MAXLEN, 0, (struct sockaddr *)&src_addr, &len); + + if (incnt < 0) { + Warning("brecv: recvfrom, errno=%d", errno); + goto datrecv; + } + if (Swap(inbuf[0]) != H_MAGIC) { + Warning("Bad header in brecv (%x,%x,%x,%x), incnt %d\n", + Swap(inbuf[0]), Swap(inbuf[1]), Swap(inbuf[2]), Swap(inbuf[3]), + incnt); + goto datrecv; + } + +#ifdef __DO_SWAP__ + inbuf[1] = Swap(inbuf[1]); + inbuf[2] = Swap(inbuf[2]); + inbuf[3] = Swap(inbuf[3]); + inbuf[4] = Swap(inbuf[4]); +#endif + + src = inbuf[1]; + seq = inbuf[2]; + intype = inbuf[3]; + inlen = inbuf[4]; + + Msgf((" [%d, %d, %d, %d] ", src, seq, intype, inlen)); + + if (intype == ACK_TYPE) { + Msgf(("duplicate ack %d from %d ", seq, src)); + goto datrecv; + } + else if (inlen != incnt-HLEN) { + Shout("Bad inlen in bsend, errno=%d", errno); + goto datrecv; + } + else if (type == intype && seq == seqin[src] && + (*dest == src || *dest == LSV_ANY)) { + if (*dest == LSV_ANY) { + *dest = src; + Msgf(("%d from %d ", seq, src)); + } + send_ack(s, src, seq, &src_addr); + if (inlen > sent) Error("Too much data\n"); + memcpy(inb, inbuf+HLEN/sizeof(int), inlen); + seqin[src]++; + return(inlen); + } else { + if (seq < seqin[src]) + Msgf(("ignore2 %d from %d ", seq, src)); + else if (chk_defer(src, intype, seq) > -1) + Msgf(("ignore3 %d from %d ", seq, src)); + else if (src != *dest || type != intype || seq != seqin[src]) { + msgbuf[msgcnt] = Malloc((incnt+8)&~07); + memcpy(msgbuf[msgcnt], inbuf, incnt); + msgcnt++; + if (msgcnt >= MAXDEFER) Error("msgcnt too large\n"); + Msgf(("defer %d (%d) seqin[%d]=%d ", seq, intype, src, seqin[src])); + } + send_ack(s, src, seq, &src_addr); + goto datrecv; + } +} + +static int +chk_defer(int src, int type, int seq) +{ + int i, insrc, inseq, intype; + + Msgf(("chk_defer(src=%d, type=%d, seq=%d)\n", src, type, seq)); + for (i = 0; i < msgcnt; i++) { + insrc = *(int *)(msgbuf[i]+sizeof(int)); + inseq = *(int *)(msgbuf[i]+2*sizeof(int)); + intype = 
*(int *)(msgbuf[i]+3*sizeof(int)); + if (src == insrc && type == intype && seq == inseq){ + Msgf(("deferred match: msgbuf[%d]\n", i)); + return(i); + } + } + Msgf(("no match\n")); + return(-1); +} + +static int +chk_deferw(int type, int *src) +{ + int i, insrc, inseq, intype; + + for (i = 0; i < msgcnt; i++) { + insrc = *(int *)(msgbuf[i]+sizeof(int)); + inseq = *(int *)(msgbuf[i]+2*sizeof(int)); + intype = *(int *)(msgbuf[i]+3*sizeof(int)); + if (type == intype && inseq == seqin[insrc]) { + *src = insrc; + return(i); + } + } + return(-1); +} + +static void +send_ack(int s, int dest, int seq, struct sockaddr_in *dest_addr) +{ + int ack[HLEN/sizeof(int)]; + + ack[0] = Swap(H_MAGIC); + ack[1] = Swap(LSV_procnum); + ack[2] = Swap(seq); + ack[3] = Swap(ACK_TYPE); + ack[4] = Swap(0); + if (sendto(s, (void *)ack, HLEN, 0, (struct sockaddr *)dest_addr, + sizeof(struct sockaddr_in)) != HLEN) + Warning("sendto, errno=%d", errno); + else + Msgf(("acked\n")); +} + +static void +sock_init(struct sockaddr_in *acc, int bind_flag) +{ + sock = socket(AF_INET, SOCK_DGRAM, 0); + if( sock < 0 ){ + Error("socket failed, errno=%d\n", errno); + } +#ifdef SOCKBUF + { + int sockbuf; + /* These appear not to be supported on the delta */ + sockbuf= SOCKBUF; + if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &sockbuf, sizeof(int))) { + SeriousWarning("sockopt sndbuf, errno=%d", errno); + exit(1); + } + if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &sockbuf, sizeof(int))) { + SeriousWarning("sockopt rcvbuf, errno=%d", errno); + exit(1); + } + } +#endif + + if (bind_flag) { + int ret; + int len = sizeof(struct sockaddr_in); + char hostname[256]; + struct hostent *hp; + char *p; + + /* If we're being asked to do a 'bind', then that implies that + we should fill in the sockaddr as well */ + memset(acc, 0, sizeof(struct sockaddr_in)); + acc->sin_family = AF_INET; + acc->sin_addr.s_addr = INADDR_ANY; + acc->sin_port = 0; /* INADDR_ANY? 
*/ + ret = bind(sock,(struct sockaddr *)acc,sizeof(struct sockaddr_in)); + if (ret < 0 ) { + Error("LSV: Can't bind socket. errno=%d\n", + errno); + } + if( getsockname(sock, (struct sockaddr *)acc, &len) ){ + Error("LSV: Can't getsockname. errno=%d\n", errno); + } + + /* Unfortunately, getsockname doesn't replace INADDR_ANY + with a valid saddr_in. We should be able to overrule + gethostname with an env var or a cmd-line arg */ + if( (p = getenv("LSV_MYNAME")) ){ + strncpy(hostname, p, sizeof(hostname)); + }else{ + if( gethostname(hostname, sizeof(hostname)) ) + Error("gethostname failed, errno=%d\n", errno); + } + hostname[sizeof(hostname)-1] = '\0'; + if( (hp = gethostbyname(hostname)) == NULL ) + Error("gethostbyname(%s) failed\n", hostname); + memcpy(&(acc->sin_addr), hp->h_addr, hp->h_length); + + /* Now acc holds 'correct' info about the socket */ + Msgf(("Bound port %d\n", ntohs(acc->sin_port))); + errno = 0; /* clear errors */ + } +} diff --git a/external/libsdf/libsw/malloc.c b/external/libsdf/libsw/malloc.c new file mode 100644 index 0000000..8db5f9f --- /dev/null +++ b/external/libsdf/libsw/malloc.c @@ -0,0 +1,1067 @@ +/* Combine malloc.c, prt_mem.c and _malloc.h into one file. */ +/* Then remove abort() in favor of Error(). - johns, 12/24/92*/ + +/* Throw out the whole thing if USE_SYSTEM_MALLOC is defined. 
*/ +#ifdef USE_SYSTEM_MALLOC +#include +#include +#include +#include +#include "Msgs.h" +#ifndef __INSIGHT__ +int malloc_debug(int i){ return -1; } +int malloc_verify(void){return 0;} +void malloc_print(void){Msg_do("Can't print malloc structures for system malloc\n");} +#endif +size_t malloc_avail(void){return -1;} + +size_t +malloc_heapsz(void) +{ + char fname[128]; + char line[128]; + FILE *fp; + int ret; + size_t size = 0; + + sprintf(fname, "/proc/%d/status", getpid()); + if ((fp = fopen(fname, "r")) == NULL) return 0; + while (fgets(line, sizeof(line), fp)) { + ret = sscanf(line, "VmPeak: %zu kB", &size); + if (ret == 1) break; + } + fclose(fp); + return size*1024; +} + +size_t +malloc_used(void) +{ + char fname[128]; + char line[128]; + FILE *fp; + int ret; + size_t size = 0; + + sprintf(fname, "/proc/%d/status", getpid()); + if ((fp = fopen(fname, "r")) == NULL) return 0; + while (fgets(line, sizeof(line), fp)) { + ret = sscanf(line, "VmRSS: %zu kB", &size); + if (ret == 1) break; + } + fclose(fp); + return size*1024; +} + + + +/* Do not define malloc, calloc, realloc or free!!! */ + +#else /* USE_SYSTEM_MALLOC */ + +#include +#include +#include +#include +#include "malloc.h" +#include "protos.h" +#include "Msgs.h" +#include "error.h" + +#define ALIGN 8 + +/* KNUTHS_C must be >= (sizeof(fheader_t) + sizeof(aheader_t)) */ +/* See p. 438 of Knuth, and the soln to exercise 12. */ +/* When equal to the above amount, */ +/* the control data in a free block created */ +/* at the end of a not-quite-filled block completely fills the block. */ +/* Since an aheader_t is smaller than an fheader_t, you can still */ +/* get it if you ask for <= (sizeof(fheader_t) - sizeof(aheader_t)) */ + +#define KNUTHS_C (sizeof(fheader_t) + sizeof(aheader_t)) +#define FREE '-' +#define INUSE '+' + +/* The CHUNKSIZE is the amount of memory allocated */ +/* by each call to sbrk. It should be reasonably large, */ +/* but not so large that it is likely to fail. 
*/ +#ifdef __NCUBE1__ +#define CHUNKSIZE (16*1024) +#else +#define CHUNKSIZE (256*1024*1024) +#endif + +/* MAXCHUNKS is the maximum number of chunks */ +/* that may be obtained. The value is used */ +/* exclusively for the size of an array with */ +/* information describing each chunk. With a */ +/* little cleverness, we could chain the chunks */ +/* together, but it hardly seems worth it since */ +/* the only purpose of this information is to */ +/* allow debugging/printing of diagnostics. */ +/* Now that chunks get coalesced when sbrk has */ +/* not been called by the user, there is even less */ +/* reason for MAXCHUNKS to be large... */ +#define MAXCHUNKS 8 + +/* If any of these typedefs are changed, be sure that */ +/* sizeof(aheader_t) and sizeof(trailer_t) both are multiples */ +/* of ALIGN */ + +typedef int tag_t; + +/* An aheader is the header at the beginning of an */ +/* allocated block. It must be identical to an fheader, */ +/* except that allocated blocks don't need linking info */ +/* WARNING: If this struct is changed, it must also be */ +/* changed in ../stdio/prt_mem.c!!!! */ +typedef struct aheader{ + tag_t tag; + size_t size; +} aheader_t ; + +typedef struct fheader{ + tag_t tag; /* The tag is a single bit */ + size_t size; /* it should be merged into size */ + struct fheader *link; + struct fheader *prev; +} fheader_t; + +typedef struct trailer{ + tag_t tag; + size_t size; +} trailer_t; + + +/* Each piece of memory is preceded by a header, and followed */ +/* by a trailer. If the piece is a free block, the tag fields */ +/* in the header and trailer are FREE, otherwise, they are INUSE */ +/* For free blocks, the size field in the trailer is valid, but it */ +/* is not necessarily valid for allocated blocks. */ +/* The point of all this is to make freeing an object take */ +/* constant time. 
One can examine the trailer that immediately */ +/* precedes the to-be-freed object, and determine whether the */ +/* block immediately below should be coalesced. Similarly for */ +/* the block immediately above. */ + +/* The avail fheader_t is special. Its link field */ +/* points to the front of the free list. Its prev field */ +/* points to the end of the free list, and its size field */ +/* contains the total available space in the free list, so */ +/* malloc_avail runs quickly. Unfortunately, malloc_avail can't */ +/* give any info about fragmentation. It also doesn't */ +/* call getrlimit, although it might be useful to ask the */ +/* OS exactly how much farther we will be able to extend */ +/* memory. */ + +/* Export */ +/* In addition to malloc, calloc, realloc, free */ +char malloc_errstring[256]; +int malloc_verify(void); +void malloc_print(void); +int malloc_debug(int); + +/* The avail fheader_t is special. Its link field */ +/* points to the front of the free list. Its prev field */ +/* points to the end of the free list, and its size field */ +/* contains the total available space in the free list, so */ +/* malloc_avail runs quickly. Unfortunately, malloc_avail can't */ +/* give any info about fragmentation. */ + +static fheader_t avail = { + FREE, /* tag */ + 0, /* size */ + &avail, /* link */ + &avail /* prev */ +}; + +/* The last entry in the chunk_tbl is special */ +struct _chunk_desc{ + size_t size; + void *begin; +}; + +static fheader_t *rover = &avail; +static size_t chunksize = CHUNKSIZE; +static int debug_lev = 1; +static int debug_verify; +static int debug_abort_nomem; +static int debug_test; +static fheader_t *top_of_mem, *bottom_of_mem; +static int n_free_blocks; +static struct _chunk_desc _chunk_tbl[MAXCHUNKS + 1]; +static int _nchunks; + +static int extend_mem(size_t n); +static int verify_blk(fheader_t *p); + +/* We use these system-dependent functions. 
*/ +extern void *sbrk(int incr); +extern int brk(void *addr); +static void *getmaxbrk(void); + +void +*calloc(size_t nmemb, size_t size) +{ + void *p; + + /* What are we supposed to do here??? Should we round up the size */ + /* so all elements are aligned? I don't think so, but I'm not */ + /* confident. */ +/* + if(size > 1 && size % ALIGN) + size += ALIGN - size%ALIGN; +*/ + + p = malloc(size*nmemb); + if(p) + (void)memset(p, 0, size*nmemb); + return p; +} + +void +*malloc(size_t size) +/* Implement Knuth's Algorithm A, modified for doubly linked */ +/* lists, and using the A4' step to avoid small blocks. */ +/* It may be found on p. 598 in the soln. to problem 12. */ +{ + size_t N, k; + int looped; + fheader_t *p, *p1; + aheader_t *L; + trailer_t *trl; + size_t askfor; + int got; + + if(debug_verify && malloc_verify()){ + Error("bad malloc structures: %s\n", malloc_errstring); + } + + if(size == 0) + return (void *)0; + + if(size%ALIGN) + size += ALIGN - size%ALIGN; + + /* If we allocate a block that's too small, we will have */ + /* BIG problems when we try to free it!! */ + if(size + sizeof(aheader_t) < sizeof(fheader_t)) + N = sizeof(fheader_t) + sizeof(trailer_t); + else + N = size + sizeof(aheader_t) + sizeof(trailer_t); + p = rover; + looped = 0; + while(p->size < N || p == &avail){ + if(p == &avail){ + if(looped){ + askfor = (Nlink; + } + + if(debug_test && verify_blk(p)){ + Error("Bad block (%#lx) in malloc: %s\n", + (unsigned long)p, malloc_errstring); + return (void *)0; + } + rover = p->link; + k = p->size - N; +#if 0 + /* This code allocates the new block at the top of */ + /* the free block. This has very bad consequences */ + /* for fragmentation when sbrk is called many times */ + /* to get new chunks. Thus, the improved code below. 
*/ + if(k < KNUTHS_C){ + avail.size -= p->size; + p->prev->link = rover; + rover->prev = p->prev; + L = (aheader_t *)p; + n_free_blocks--; + }else{ + /* Here, we split the large block */ + avail.size -= N; + p->size = k; + L = (aheader_t *)((char *)p + k); + trl = (trailer_t *)L - 1; + trl->size = k; + trl->tag = FREE; + L->size = N; + } + L->tag = INUSE; + trl = (trailer_t *)((char *)L + L->size) - 1; + trl->tag = INUSE; + trl->size = L->size; +#else + L = (aheader_t *)p; + if(k < KNUTHS_C){ + avail.size -= p->size; + p->prev->link = rover; + rover->prev = p->prev; + n_free_blocks--; + }else{ + /* Here, we split a large free block into an allocated block */ + /* and a smaller free block. */ + /* Knuth's method is considerably easier, since */ + /* it leaves the free block essentially alone. There */ + /* is no need to poke around in the free list and fix pointers */ + /* Undoubtedly, the correct thing is to make the linked lists */ + /* out of the trailers instead... Maybe in some other life */ + avail.size -= N; + L->size = N; + p1 = (fheader_t *)((char *)L + N); + p1->tag = FREE; + p1->size = k; + p1->link = p->link; + p1->prev = p->prev; + p->prev->link = p1; + p->link->prev = p1; + trl = (trailer_t *)((char *)p1 + k) - 1; + trl->size = k; + /* the tag is FREE already */ + } + L->tag = INUSE; + trl = (trailer_t *)((char *)L + L->size) - 1; + trl->tag = INUSE; + trl->size = L->size; +#endif + return (void *)(L+1); +} + +void +*realloc(void *ptr, size_t size) +{ + void *ret; + aheader_t *p0; + fheader_t *p, *p1, *p2; + trailer_t *trl, *trl0; + size_t N; + size_t ptrsz; + int k; + + if(debug_verify && malloc_verify()){ + Error("Bad structures in realloc: %s\n", malloc_errstring); + return (void *)0; + } + + if(size == 0){ + if(ptr) + free(ptr); + return (void *)0; + } + + if(ptr == (void *)0) + return malloc(size); + + if(size%ALIGN) + size += ALIGN - size%ALIGN; + + /* If we allocate a block that's too small, we will have */ + /* BIG problems when we try to free 
it!! */ + if(size + sizeof(aheader_t) < sizeof(fheader_t)) + N = sizeof(fheader_t) + sizeof(trailer_t); + else + N = size + sizeof(aheader_t) + sizeof(trailer_t); + + /* This code is very similar to that in free. */ + /* They should probably be combined in a common subroutine */ + + p0 = (aheader_t *) ((aheader_t *)ptr - 1); + /* How much data is actually stored under ptr??? */ + /* I think this is correct even for 'small' N */ + ptrsz = p0->size - (sizeof(aheader_t) + sizeof(trailer_t)); + p = (fheader_t *)((char *)p0 + p0->size); + if(debug_test && verify_blk(p)){ + Error("Bad blk (%#lx) in realloc: %s\n", + (unsigned long)p, malloc_errstring); + return (void *)0; + } + if(p->tag == FREE){ + /* The next block is free. Combine them */ + p1 = p->link; + p2 = p->prev; + p1->prev = p2; + p2->link = p1; + p0->size += p->size; + avail.size -= p->size; + /* p just got removed from the free list. */ + /* Make sure it's not equal to rover */ + if(p == rover) + rover = &avail; + p = (fheader_t *)( (char *)p + p->size ); + trl = (trailer_t *)p - 1; + trl->tag = INUSE; + trl->size = p0->size; + n_free_blocks--; + } + + trl = (trailer_t *)p0 -1; +#ifdef UNFRAGMENT + /* Combine this block and the one before it whenever possible */ + if(trl->tag == FREE && (trl->size + p0->size) > N){ +#else + /* Only combine this block with the one before it if we don't have */ + /* enough space yet. */ + if(trl->tag == FREE && (p0->size < N) && (trl->size + p0->size) > N){ +#endif + /* In the event that the previous blk is free, and it doesn't meet */ + /* our needs, we will coalesce it when the current block */ + /* is freed, postponing any irreversible damage until after */ + /* new space has been successfully found */ + p = (fheader_t *)((char *)p0 - trl->size); + p1 = p->link; + p2 = p->prev; + p1->prev = p2; + p2->link = p1; + avail.size -= p->size; + p->size += p0->size; + /* p just got removed from the free list. 
*/ + /* Make sure it's not equal to rover */ + if(p == rover) + rover = &avail; + /* Copy the data from p0 to p */ + if( (char *)((aheader_t*)p+1) < (char *)ptr - ptrsz ) + memcpy(((aheader_t *)p+1), ptr, ptrsz); + else + memmove(((aheader_t *)p+1), ptr, ptrsz); + p->tag = INUSE; + p0 = (aheader_t *)p; + trl0 = ((trailer_t *)((char *)p + p->size)) -1; + trl0->size = p->size; + n_free_blocks--; + } + + if(debug_test && verify_blk((fheader_t *)p0)){ + Error("Bad block (p0) (%#lx) in realloc: %s\n", + (unsigned long)p0, malloc_errstring); + return (void *)0; + } + k = p0->size - N; + if(k<0){ + ret = malloc(size); + if(ret){ + memcpy(ret, ptr, p0->size - sizeof(aheader_t)-sizeof(trailer_t)); + free(ptr); + }else if(debug_abort_nomem){ + Error("Out of mem in realloc\n"); + } + /* Fortunately, we didn't combine the preceding block */ + /* with the given block unless we were sure that would */ + /* result in enough space. Thus, we didn't molest the data */ + /* or the list structures in any way that would need to */ + /* be reversed. */ + return ret; + }else if(k >= KNUTHS_C){ + /* First, shrink this node. */ + p0->size = N; + p1 = (fheader_t *)((char *)p0 +N); + trl = (trailer_t *)p1 - 1; + trl->tag = INUSE; + trl->size = N; + /* then create a new free node at the top of this block */ + n_free_blocks++; + avail.size += k; + p1->size = k; + p1->tag = FREE; + p1->link = avail.link; + p1->prev = &avail; + avail.link->prev = p1; + avail.link = p1; + trl = (trailer_t *) ((char *)p1 + k) -1; + trl->tag = FREE; + trl->size = k; + }/* else there is nothing to do. The change is not significant */ + return (void *)((aheader_t *)p0 + 1); +} + +void +free(void *ptr) +/* Implement Knuth's Algorithm C, p. 
442 */ +{ + fheader_t *p, *p0, *p1, *p2; + trailer_t *trl; + + if(debug_verify && malloc_verify()){ + Error("Bad structures in free: %s\n", malloc_errstring); + return; + } + + if(ptr == (void *)0) + return; + + p0 = (fheader_t *) ((aheader_t *)ptr - 1); + if(debug_test && verify_blk(p0)){ + Error("Bad block (%#lx) in free: %s\n", + (unsigned long)p0, malloc_errstring); + } + trl = (trailer_t *)p0 -1; + avail.size += p0->size; + if(trl->tag == FREE){ + /* The preceding block is free. Combine them */ + p = (fheader_t *)((char *)p0 - trl->size); + p1 = p->link; + p2 = p->prev; + p1->prev = p2; + p2->link = p1; + p->size += p0->size; + /* p just got removed from the free list. */ + /* Make sure it's not equal to rover */ + if(p == rover) + rover = &avail; + p0 = p; + n_free_blocks--; + } + + p = (fheader_t *)((char *)p0 + p0->size); + if(debug_test && verify_blk(p)){ + Error("Bad block (p) (%#lx) in free: %s\n", + (unsigned long)p, malloc_errstring); + } + if(p->tag == FREE){ + /* The following block is free. Combine them */ + p1 = p->link; + p2 = p->prev; + p1->prev = p2; + p2->link = p1; + p0->size += p->size; + /* p just got removed from the free list. */ + /* Make sure it's not equal to rover */ + if(p == rover) + rover = &avail; + p = (fheader_t *)( (char *)p + p->size ); + n_free_blocks--; + } + + /* link the new block into the front of the avail list */ + trl = (trailer_t *)p - 1; + trl->size = p0->size; + trl->tag = FREE; + p0->link = avail.link; + p0->prev = &avail; + avail.link->prev = p0; + avail.link = p0; + p0->tag = FREE; + n_free_blocks++; +} + + +size_t +malloc_avail(void) +{ + /* The space unallocated in the blocks we have appropriated, */ + /* plus the space that the OS can still give us. */ + /* If the OS won't tell us, we have to return -1, which is */ + /* more likely to mean we don't know than that there is none! 
*/ + void *memlim = getmaxbrk(); + void *curtop = sbrk(0); + + if((long)memlim == -1 || (long)curtop == -1) + return -1; + return avail.size + ((char *)memlim - (char *)curtop); +} + +size_t +malloc_heapsz(void) +{ + void *curtop = sbrk(0); + return((long)curtop - (long)_chunk_tbl[0].begin); +} + +size_t +malloc_used(void) +{ + return malloc_heapsz()-avail.size; +} + + +static int +extend_mem(size_t n) +/* Try to obtain a block of at least size n */ +/* from the OS using sbrk. Return the size */ +/* of the block actually obtained. */ +{ + trailer_t *trl0; + aheader_t *hdr0; + aheader_t *hdr1; + fheader_t *hdr; + trailer_t *trl; + char *old_top; + void *max_brk, *begin; + + Msgf(("extend_mem(%ld)\n", (long)n)); + /* Check structure sizes */ + if (sizeof(aheader_t) % ALIGN != 0) + Error("sizeof(aheader_t) (%d) not multiple of ALIGN (%d)\n", + (int)sizeof(aheader_t), ALIGN); + if (sizeof(aheader_t) != sizeof(trailer_t)) + Error("sizeof(aheader_t) (%d) != sizeof(trailer_t) (%d)\n", + (int)sizeof(aheader_t), (int)sizeof(trailer_t)); + /* First, try to get a chunk of memory */ + /* If the system won't give us one of the */ + /* size we ask for, then halve the size */ + /* until we get something */ + + /* Technically, sbrk returns a caddr_t, which */ + /* is typedef'ed to be a char *. It returns */ + /* an error condition, however, by returning the */ + /* integer -1, so the casts are necessary. 
*/ + while (1) { + if(n < (sizeof(fheader_t) + sizeof(trailer_t))) { + Error("no more memory!\n"); + } + begin = sbrk(n + sizeof(aheader_t) +sizeof(trailer_t)); + if ((long int)begin != -1L) break; + Msgf(("sbrk(%ld) returns -1\n", + (long)(n+sizeof(aheader_t) + sizeof(trailer_t)))); + n = n/2; + } + old_top = (char *)top_of_mem; + if(bottom_of_mem == 0){ + /* The first time through here, remember the lowest possible */ + /* malloc'ed location */ + bottom_of_mem = (fheader_t *)(begin); + } + /* Assume sbrk is strictly increasing, so the end of memory */ + /* is represented by the end of the most recent sbrk call */ + top_of_mem = (fheader_t *)((char*)begin + n+sizeof(aheader_t)+sizeof(trailer_t)); + + if(begin == old_top){ + /* put a fake INUSE block at the top */ + Msgf(("Adding extended mem to previous chunk\n")); + hdr1 = ((aheader_t *)top_of_mem) - 1; + hdr1->size = sizeof(aheader_t); + hdr1->tag = INUSE; + /* Now change the hdr that used to be at the top of memory */ + hdr0 = ((aheader_t *)old_top) - 1; + hdr0->tag = INUSE; + hdr0->size = n + sizeof(aheader_t) + sizeof(trailer_t); + /* Now put a consistent trailer at the end of it */ + trl0 = ((trailer_t *)hdr1) - 1; + trl0->tag = INUSE; + trl0->size = hdr0->size; + /* And now free it to connect it with the free-list */ + /* This might change trl0->size */ + free((void *)(hdr0 + 1)); + /* Finally, record that the chunk has grown */ +#if 0 /* wrong? */ + _chunk_tbl[_nchunks-1].size += n + sizeof(trailer_t); +#else + _chunk_tbl[_nchunks-1].size += n+sizeof(aheader_t)+sizeof(trailer_t); +#endif + Msgf(("extend_mem returns %lu\n", (unsigned long)trl0->size)); + return trl0->size; + } + + /* Now we've got a big chunk of memory */ + /* Prepare it by placing INUSE markers at */ + /* both ends, and a header after the INUSE */ + /* marker at the beginning. */ + trl0 = (trailer_t *)begin; + trl0->tag = INUSE; + hdr0 = (aheader_t *)((char *)begin + n + sizeof(trailer_t)); + hdr0->tag = INUSE; + /* WARNING! 
These two lines will keep verify_blk happy */ +/* if it is asked to verify the markers at the beginning */ +/* and end of memory. They RELY ON THE FACT that: */ +/* sizeof(aheader_t) == sizeof(trailer_t) */ + hdr0->size = sizeof(aheader_t); + trl0->size = sizeof(trailer_t); + hdr = (fheader_t *)(trl0 + 1); /* the header of the new free block */ + trl = ((trailer_t *)hdr0) - 1; /* the trailer of the new free block */ + + /* Now link the header into the free chain */ + hdr->link = avail.link; + hdr->prev = &avail; + avail.link->prev = hdr; + avail.link = hdr; + hdr->tag = FREE; + hdr->size = n; + trl->tag = FREE; + trl->size = n; + avail.size += n; + n_free_blocks++; + + /* Now record info about this chunk in the table */ + if(_nchunks < MAXCHUNKS){ + _chunk_tbl[_nchunks].size = n; + _chunk_tbl[_nchunks].begin = (void *)hdr; + _nchunks++; + }else{ + /* There's no more room in the table! */ + /* Don't panic, though, just give up on */ + /* recording the sizes of individual chunks for */ + /* posterity. */ + _chunk_tbl[MAXCHUNKS].size += n; + } + + Msgf(("extend_mem returns %lu\n", (unsigned long)n)); + return n; +} + +/* These two are patterned after the diagnostic version of */ +/* malloc available on SUNs. malloc_debug(lev) sets the debugging */ +/* level to its arg. Values are: */ +/* 0 : no checking */ +/* 1 : abort if any problem is detected in malloc's data structures */ +/* Level 1 does not actively seek out problems, but if it happens upon */ +/* one, it calls abort. This is the default level. It should */ +/* be only marginally slower than level 0. */ +/* 2 : Examine the entire data structure on every call of malloc/calloc/ */ +/* realloc/free. This may be VERY slow. */ +/* 3 : Same as 1, but also abort when malloc would return NULL. */ +/* 4 : Same as 2, but also abort when malloc would return NULL. */ +/* Neither of the last two abort if 0 bytes are requested */ +/* malloc_verify() performs a very thorough check of the entire malloc */ +/* data structures. 
If all is well, it returns 0, otherwise it */ +/* returns 1 */ +/* Whenever an error is detected, the external (void *) malloc_bad_block */ +/* is set to point to the bad block. It points to the value that */ +/* would have been returned by malloc, i.e. sizeof(aheader_t) past */ +/* the address of the header. If the error is a global one that */ +/* cannot be blamed on a single block, e.g. sizes not adding up, */ +/* malloc_bad_block points to the static avail. */ +int +malloc_debug(int level) +{ + int ret; + + if(level > 4 || level < 0) + return -1; + + ret = debug_lev; + debug_lev = level; + switch(debug_lev){ + case 0: + debug_test = 0; + debug_verify = 0; + debug_abort_nomem = 0; + break; + case 1: + debug_test = 1; + debug_verify = 0; + debug_abort_nomem = 0; + break; + case 2: + debug_test = 1; + debug_verify = 1; + debug_abort_nomem = 0; + break; + case 3: + debug_test = 1; + debug_verify = 0; + debug_abort_nomem = 1; + break; + case 4: + debug_test = 1; + debug_verify = 1; + debug_abort_nomem = 1; + break; + } + return ret; +} + +int +malloc_verify(void) +{ + fheader_t *last; + fheader_t *hdr; + int i; + int n_free_blocks1 = 0; + int n_free_blocks2 = 0; + size_t sz_free = 0; + + /* This is very similar to the loop in prt_mem... */ + /* we loop over all chunks, and all blocks in the chunk, */ + /* calling verify_blk for each one. We also count the */ + /* number of free blocks, which we compare with a scan down the */ + /* linked list of free blocks starting at avail. 
*/ + for(i=0; i<_nchunks; i++){ + hdr = (fheader_t *)_chunk_tbl[i].begin; + last = (fheader_t *)((char *)hdr + _chunk_tbl[i].size); + while(hdr < last){ + if(verify_blk(hdr)){ + return 1; + } + if(hdr->tag == FREE){ + n_free_blocks1++; + sz_free += hdr->size; + } + /* Avoid infinite loops by counting free blocks */ + if(n_free_blocks1 > n_free_blocks){ + sprintf(malloc_errstring, "nfree_blocks1(%d) > nfreeblocks(%d)\n", + n_free_blocks1, n_free_blocks); + return 1; + } + hdr = (fheader_t *)((char *)hdr + hdr->size); + } + if( hdr != last ){ + SeriousWarning("Possible too-long chunk: hdr=%#lx != last=%#lx\n", + (unsigned long)hdr, (unsigned long)last); + } + } + + /* Only make this test if the chunk_tbl contains all the relevant */ + /* information */ + if((_nchunks != MAXCHUNKS) && + (sz_free != avail.size || n_free_blocks != n_free_blocks1)){ + sprintf(malloc_errstring, "Sizes don't add up, sz_free=%ld, avail.size=%ld, n_free_blocks=%d, n_free_blocks1=%d!\n", + (long)sz_free, (long)avail.size, + n_free_blocks, n_free_blocks1); + SeriousWarning("%s", malloc_errstring); + /* return 1; */ + } + + /* Now scan the linked list of free blocks and make sure */ + /* it is the right length, and that its size adds up too */ + sz_free = 0; + for(hdr = avail.link; hdr != &avail; hdr = hdr->link){ + n_free_blocks2++; + sz_free += hdr->size; + /* avoid infinite loops this way */ + if(n_free_blocks2 > n_free_blocks1){ + sprintf(malloc_errstring, "Too many links in chain\n"); + return 1; + } + } + if(sz_free != avail.size || n_free_blocks2 != n_free_blocks){ + sprintf(malloc_errstring, "Sizes or counts don't add up!\n"); + return 1; + } + + return 0; +} + +static int +verify_blk(fheader_t *p) +{ + fheader_t *l; + trailer_t *tp; + /* rely on the two variables top_of_mem and bottom_of_mem */ + /* being set before entering here... */ + /* How slow is this????? Is it unreasonable to call it */ + /* every time through the linked list in malloc??? 
*/ + if(p >= top_of_mem || p < bottom_of_mem){ + sprintf(malloc_errstring, "ptr (%#lx) outside of memory\n", + (unsigned long)p); + return 1; + } + + tp = ((trailer_t *)((char *)p + p->size)) - 1; + /* make sure the size isn't preposterous */ + if((fheader_t *)tp > top_of_mem){ + sprintf(malloc_errstring, "block goes past (%#lx) top of memory\n", + (unsigned long)tp); + return 1; + } + + /* Verify that the flag is ok */ + if((p->tag != FREE && p->tag != INUSE)){ + sprintf(malloc_errstring, + "Bad magic byte in hdr %#lx\n", (unsigned long)p); + return 1; + } + + /* Verify that the header and trailer match */ + if(p->tag != tp->tag){ + sprintf(malloc_errstring, + "Tags don't match %#lx, %#lx\n", (unsigned long)p, (unsigned long)tp); + return 1; + } + + /* Check that the sizes agree in header and trailer */ + if(p->size != tp->size){ + sprintf(malloc_errstring, "Sizes don't match %#lx, %#lx\n", + (unsigned long)p, (unsigned long)tp); + return 1; + } + + /* Check that the size is reasonable */ + /* Accepting sizes exactly equal to sizeof(trailer_t) */ + /* allows the "dummy" headers at both ends of each chunk */ + /* to pass. 
It hardly seems worth the effort of looking through */ + /* the chunk_tbl, to verify that we are actually looking at */ + /* such a block */ + if(p->size < sizeof(fheader_t) + sizeof(trailer_t) && + p->size != sizeof(trailer_t)){ + sprintf(malloc_errstring, "Size doesn't make sense p=%#lx\n", + (unsigned long)p); + return 1; + } + + /* Verify that the forward and backward pointers are reasonable */ + if(p->tag == FREE){ + l = p->link; + if(l != &avail && (l >= top_of_mem || l < bottom_of_mem)){ + sprintf(malloc_errstring, "Link ptr (%#lx) of %#lx out of range\n", + (unsigned long)l, (unsigned long)p); + return 1; + } + + l = p->prev; + if(l != &avail && (l >= top_of_mem || l < bottom_of_mem)){ + sprintf(malloc_errstring, "Prev ptr (%#lx) of %#lx out of range\n", + (unsigned long)l, (unsigned long)p); + return 1; + } + } + + /* Amazing, all is well. */ + return 0; +} + + +void +malloc_print(void) +/* Print the map of allocated memory. */ +/* Beware that printf might call malloc if */ +/* the buffer for the file, fp, is extensible */ +/* This will have dire consequences. */ +/* For the moment, a solution is to guarantee that */ +/* fprintf is linked with any paralib program via */ +/* the trick in stdio/data.c */ +{ + aheader_t *last; + aheader_t *hdr; + fheader_t *fhdr; + int ch, i; + int nfb; + + Msg_do("Malloc_print called. 
heapsz: %ld, avail: %ld, used: %ld\n", + (long)malloc_heapsz(), (long)malloc_avail(), (long)malloc_used()); + Msg_do( "Memory map:\n"); + for(i=0; i<_nchunks; i++){ + Msg_do( "Chunk %d of size %lu\n", + i, (unsigned long)_chunk_tbl[i].size); + Msg_do( "address size [allocated|free]\n"); + hdr = (aheader_t *)_chunk_tbl[i].begin; + last = (aheader_t *)((char *)hdr + _chunk_tbl[i].size); + while(hdr < last){ + if( verify_blk((fheader_t *)hdr) ){ + Msg_do( "Bad block (%#lx): %s\n", + (unsigned long)hdr, malloc_errstring); + break; + } + /* ch is 'f' for free, 'a' for allocated */ + /* We print out the address that would have been */ + /* returned by malloc, and the maximum possible */ + /* size of USER memory in the block. Due to */ + /* rounding, this may be more than he asked for, */ + /* but the size of any headers and/or trailers is */ + /* subtracted. Note the use of the %p conversion */ + /* specifier. */ + ch = (hdr->tag == INUSE)? 'a' : 'f'; + Msg_do( "%#lx %lu %c\n", (unsigned long)(hdr+1), + (unsigned long)(hdr->size - sizeof(aheader_t) - sizeof(trailer_t)), + ch); + hdr = (aheader_t *)((char *)hdr + hdr->size); + } + Msg_do( "\n"); + } + if(_nchunks == MAXCHUNKS){ + Msg_do( "There are more chunks with\n"); + Msg_do( "a total size of %lu. I don't\n", + (unsigned long)_chunk_tbl[MAXCHUNKS].size); + Msg_do( "have detailed info about them though.\n"); + } + Msg_do("Free list:\n"); + nfb = 0; + for(fhdr = avail.link; fhdr != &avail; fhdr = fhdr->link){ + nfb++; + ch = (fhdr->tag == INUSE)? 'a' : 'f'; + if( verify_blk(fhdr) ){ + Msg_do("Bad block (%#lx): %s\n", (unsigned long)fhdr, malloc_errstring); + } + Msg_do( "%#lx %lu %c\n", (unsigned long)((aheader_t *)fhdr+1), + (unsigned long)(fhdr->size - sizeof(aheader_t) - sizeof(trailer_t)), + ch); + /* avoid infinite loops this way */ + if(nfb > n_free_blocks){ + Msg_do("Too many free blocks. 
Possible loop\n"); + break; + } + } + +} + +/* This uses the same discredited idea that we abandoned */ +/* in the tree11/sysdep.c: define a GETMAXBRK_DEFINED symbol */ +/* as soon as we find a system-predicate we like. When we hit the */ +/* end, complain if we haven't yet defined GETMAXBRK_DEFINED. */ +#undef GETMAXBRK_DEFINED + +#if defined(__hpux) +#define GETMAXBRK_DEFINED +#include +static void *getmaxbrk(void){ + return (void *)ulimit(UL_GETMAXBRK); +} +#endif + +#if defined(sun)||defined(__PARAGON__)||defined(linux) +/* is getrlimit a sunos'ism, a bsd'ism or what??? */ +#define GETMAXBRK_DEFINED +#include +#include + +extern char etext; + +static void * +getmaxbrk(void) +{ + struct rlimit rl; + + if(getrlimit(RLIMIT_DATA, &rl)) + return (void *)-1; + if(rl.rlim_max == RLIM_INFINITY){ + return (void *)-1; + }else{ + return &etext + 2750LL*1024LL*1024LL; + } +} + +#endif /* sun||PARAGON */ + +#ifdef __DELTA__ +#define GETMAXBRK_DEFINED +#define BEGIN 0x10000000 +static void * +getmaxbrk(void){ + return (void *)(BEGIN + 12*1024*1024); /* 12Meg */ +} +#endif + +#ifndef GETMAXBRK_DEFINED +static void * +getmaxbrk(void) +{ + return (void *)-1; +} +#endif /* GETMAXBRK_DEFINED */ + +#endif /* USE_SYSTEM_MALLOC */ + diff --git a/external/libsdf/libsw/memfile.c b/external/libsdf/libsw/memfile.c new file mode 100644 index 0000000..556c167 --- /dev/null +++ b/external/libsdf/libsw/memfile.c @@ -0,0 +1,63 @@ +#include +#include +#include "Malloc.h" +#include "Msgs.h" +#include "error.h" +#include "protos.h" +#include "mpmy.h" + +/* Here we try to implement a circular memory buffer which we can use as */ +/* the vfprintf-like arg to Msg_init */ + +static char *memfile; +static int memfile_offset; +static int memfile_bufsz; + +void +memfile_init(int sz) +{ + memfile = Malloc(sz); + memfile_offset = 0; + memfile_bufsz = sz-1; + memfile[sz-1] = 0; /* final null to make wrapped output cleaner */ +} + +void +memfile_delete(void) +{ + Free(memfile); + memfile_offset = 0; + 
memfile_bufsz = 0; +} + +#define BUFSZ 1024 + +void +memfile_vfprintf(void *junk, const char *fmt, va_list args) +{ + char tbuf[BUFSZ]; /* This might overflow, but msgs should be small */ + int i, len; + + vsprintf(tbuf, fmt, args); + len = strlen(tbuf); + if (len >= BUFSZ) Error("Buffer overflowed in mem_vfprintf\n"); + for (i = 0; i <= len; i++) { + memfile[(memfile_offset+i)%memfile_bufsz] = tbuf[i]; + } + memfile_offset += len; +} + +void +PrintMemfile(void) +{ + if (memfile_offset == 0) return; + printf("----- Messages from procnum %d -----\n", MPMY_Procnum()); + if (memfile_offset < memfile_bufsz) { + printf("%s\n", memfile); + } else { + printf("Buffer has wrapped\n"); + printf("%s\n%s\n", memfile+(memfile_offset+1)%memfile_bufsz, + memfile); + } + fflush(stdout); +} diff --git a/external/libsdf/libsw/memmove.c b/external/libsdf/libsw/memmove.c new file mode 100644 index 0000000..517805c --- /dev/null +++ b/external/libsdf/libsw/memmove.c @@ -0,0 +1,41 @@ +#include + +void +*memmove(void *s1, const void *s2, size_t n) +{ +#if 0 /* Assume the caller knew this when he decided to use memmove */ + if ((char *)s1 >= (char *)s2 + n || (char *)s1 + n <= (char *)s2){ + return memcpy(s1, s2, n); + } +#endif + if(((unsigned long)s1)%sizeof(int)==0 && + ((unsigned long)s2)%sizeof(int)==0 && + n%sizeof(int)==0){ + /* Everything is aligned. We can use int assignment. 
*/ + int *ip1 = s1; + const int *ip2 = s2; + n /= sizeof(int); + if( ip1 < ip2 ){ + while(n--) + *ip1++ = *ip2++; + }else{ + ip2 += n; + ip1 += n; + while(n--) + *--ip1 = *--ip2; + } + }else{ + char *cp1 = s1; + const char *cp2 = s2; + if(cp1 < cp2) { + while(n--) + *cp1++ = *cp2++; + }else{ + cp2 += n; + cp1 += n; + while(n--) + *--cp1 = *--cp2; + } + } + return s1; +} diff --git a/external/libsdf/libsw/mpi_bcast.c b/external/libsdf/libsw/mpi_bcast.c new file mode 100644 index 0000000..796e958 --- /dev/null +++ b/external/libsdf/libsw/mpi_bcast.c @@ -0,0 +1,56 @@ +#include "swampi.h" +#include "Msgs.h" +#include "gc.h" /* for ilog2 */ +#include "error.h" + +#define TAG 4 + +int +MPI_Bcast(void *buf, int count, MPI_Datatype type, int srcproc, MPI_Comm comm) +{ + int chan; + int doc; + int sendproc; + int ret; + MPI_Request rreq, sreq; + MPI_Status status; + int procnum = _MPI_Procnum; + int nproc = _MPI_Nproc; + int nbytes = _MPI_Datasize[type] * count; + + Msgf(("mpi: Bcast\n")); + if (srcproc != 0) { /* Is this stupid? 
*/ + if (_MPI_Procnum == 0) { + MPI_Irecv(buf, nbytes, MPI_BYTE, srcproc, TAG, MPI_COMM_PRIVATE, + &rreq); + MPI_Wait(&rreq, &status); + MPI_Get_count(&status, MPI_BYTE, &ret); + if (ret != nbytes) Error("Bcast got wrong len\n"); + } else if (_MPI_Procnum == srcproc) { + MPI_Isend(buf, nbytes, MPI_BYTE, 0, TAG, MPI_COMM_PRIVATE, &sreq); + MPI_Wait(&sreq, 0); + } + } + + doc = ilog2(nproc); + if (nproc != 1 << doc) + doc++; /* for non power-of-two sizes */ + + for (chan = 0; chan < doc; chan++) { + sendproc = procnum ^ (1 << chan); + if (sendproc >= 0 && sendproc < nproc) { + if (procnum & (1 << chan)) { + MPI_Irecv(buf, nbytes, MPI_BYTE, sendproc, TAG, + MPI_COMM_PRIVATE, &rreq); + MPI_Wait(&rreq, &status); + MPI_Get_count(&status, MPI_BYTE, &ret); + if (ret != nbytes) Error("Bcast got wrong len\n"); + } else { + MPI_Isend(buf, nbytes, MPI_BYTE, sendproc, TAG, + MPI_COMM_PRIVATE, &sreq); + MPI_Wait(&sreq, 0); + } + } + } + return MPI_SUCCESS; +} diff --git a/external/libsdf/libsw/mpi_reduce.c b/external/libsdf/libsw/mpi_reduce.c new file mode 100644 index 0000000..050231d --- /dev/null +++ b/external/libsdf/libsw/mpi_reduce.c @@ -0,0 +1,378 @@ +#include +#include "swampi.h" +#include "stk.h" +#include "gc.h" +#include "Msgs.h" +#include "error.h" + +static int setup; +static int mask; + +static Stk cstk, outdata; + +#define TAG 5 + +struct comb_st { + void *recvbuf; + int count; + MPI_Datatype datatype; + union{ + MPI_Op op; + MPI_user_comb_func user_func; + } u; +}; + +/* This currently only works with one request outstanding at a time */ + +static int +MPI_IreduceInit(MPI_Request *reqp) +{ + StkInitEz(&cstk); + StkInitEz(&outdata); + setup = 1; + mask = ~0; + *reqp = 0; + return MPI_SUCCESS; +} + +static int +MPI_Ireduce(void *sendbuf, void *recvbuf, int count, + MPI_Datatype datatype, MPI_Op op, + MPI_Request req, MPI_Comm comm) +{ + struct comb_st *combuf; + int total_size; + + if (setup == 0) + Error("MPI_Ireduce with no call to MPI_IreduceInit\n"); + 
total_size = count * _MPI_Datasize[datatype]; + + StkPushData(&outdata, (void *)sendbuf, total_size); + combuf = StkPush(&cstk, sizeof(struct comb_st)); + + combuf->recvbuf = recvbuf; + combuf->count = count; + combuf->datatype = datatype; + combuf->u.op = op; + + return MPI_SUCCESS; +} + +#if 0 +static int +MPI_IreduceFunc(void *sendbuf, void *recvbuf, int size, + void (*func)(void *, void *, void*), + MPI_Request req) +{ + struct comb_st *combuf; + int total_size; + + if (setup == 0) + Error("MPI_Ireduce_func with no call to MPI_IreduceInit\n"); + total_size = size; + + StkPushData(&outdata, (void *)sendbuf, total_size); + combuf = StkPush(&cstk, sizeof(struct comb_st)); + + combuf->recvbuf = recvbuf; + combuf->count = size; + combuf->datatype = MPI_USER_DATA; + combuf->u.user_func = func; + + return MPI_SUCCESS; +} + +static int +MPI_Setmask(unsigned int maskval) +{ + mask = maskval; + return MPI_SUCCESS; +} +#endif + +#define ALIGN(n) ((n + _STK_DEFAULT_ALIGNMENT)&~_STK_DEFAULT_ALIGNMENT) + +/* gcc with optimization will cause this to fail efence */ +#define Do_Op(outbuf, op, inbuf, type, cnt) \ +do{char *oend = (char *)outbuf + ALIGN(cnt*sizeof(type)); \ + char *iend = (char *)inbuf + ALIGN(cnt*sizeof(type)); \ + while (cnt--) { \ + *(type *)(outbuf) op *(type *)(inbuf); \ + (outbuf) = (char *)(outbuf) + sizeof(type); \ + (inbuf) = (char *)(inbuf) + sizeof(type); \ + } \ + outbuf = oend; \ + inbuf = iend; \ +}while(0) + + +static void +do_combine(void *inbuf, void *outbuf, int n, struct comb_st *manifest) +{ + int i; + int count; + + for (i = 0; i < n; i++, manifest++) { + count = manifest->count; + if (count < 0) Error("Bad count in reduce\n"); + + Msgf(("mpi: reduce Op %s Datatype %s\n", mpi_op_name[manifest->u.op], + mpi_datatype_name[manifest->datatype])); + switch(manifest->datatype) { + case MPI_FLOAT: +#define Type float +#include "mpi_template.c" +#undef Type + break; + case MPI_DOUBLE: +#define Type double +#include "mpi_template.c" +#undef Type + 
break; + case MPI_LONG_DOUBLE: +#define Type double +#include "mpi_template.c" +#undef Type + break; + +#define BIT_OPS /* Turns on bitwise ops in mpi_template.c */ + case MPI_BYTE: + case MPI_CHAR: +#define Type char +#include "mpi_template.c" +#undef Type + break; + case MPI_SHORT: +#define Type short +#include "mpi_template.c" +#undef Type + break; + case MPI_INT: +#define Type int +#include "mpi_template.c" +#undef Type + break; + case MPI_LONG: +#define Type long +#include "mpi_template.c" +#undef Type + break; + case MPI_LONG_LONG: +#define Type long +#include "mpi_template.c" +#undef Type + break; + case MPI_UNSIGNED: + case MPI_UNSIGNED_INT: +#define Type unsigned int +#include "mpi_template.c" +#undef Type + break; + case MPI_UNSIGNED_CHAR: +#define Type unsigned char +#include "mpi_template.c" +#undef Type + break; + case MPI_UNSIGNED_SHORT: +#define Type unsigned short +#include "mpi_template.c" +#undef Type + break; + case MPI_UNSIGNED_LONG: +#define Type unsigned long +#include "mpi_template.c" +#undef Type + break; + case MPI_UNSIGNED_LONG_LONG: +#define Type unsigned long +#include "mpi_template.c" +#undef Type + break; +#undef BIT_OPS + +#define LOC_OPS + case MPI_FLOAT_INT: +#define Type MPI_float_int +#include "mpi_template.c" +#undef Type + break; + case MPI_DOUBLE_INT: +#define Type MPI_double_int +#include "mpi_template.c" +#undef Type + break; + case MPI_LONG_INT: +#define Type MPI_long_int +#include "mpi_template.c" +#undef Type + break; + case MPI_2INT: +#define Type MPI_2int +#include "mpi_template.c" +#undef Type + break; + case MPI_SHORT_INT: +#define Type MPI_short_int +#include "mpi_template.c" +#undef Type + break; + case MPI_LONG_DOUBLE_INT: +#define Type MPI_long_double_int +#include "mpi_template.c" +#undef Type + break; +#undef LOC_OPS + case MPI_COMPLEX: +#define Type MPI_complex + { + Type *out = outbuf; + Type *in = inbuf; + outbuf = (char *)outbuf + ALIGN(count*sizeof(Type)); + inbuf = (char *)inbuf + ALIGN(count*sizeof(Type)); 
+ switch(manifest->u.op) { + case MPI_SUM: + while (count--) { + out->real += in->real; + out->imag += in->imag; + out++; + in++; + } + break; + case MPI_PROD: + while (count--) { + Type t = *out; /* copy: out->real is overwritten before imag is computed */ + out->real = t.real*in->real - t.imag*in->imag; + out->imag = t.real*in->imag + t.imag*in->real; + out++; + in++; + } + break; + default: + Error("Unknown op in MPI_reduce\n"); + } + } +#undef Type + break; + case MPI_DOUBLE_COMPLEX: +#define Type MPI_double_complex + { + Type *out = outbuf; + Type *in = inbuf; + outbuf = (char *)outbuf + ALIGN(count*sizeof(Type)); + inbuf = (char *)inbuf + ALIGN(count*sizeof(Type)); + switch(manifest->u.op) { + case MPI_SUM: + while (count--) { + out->real += in->real; + out->imag += in->imag; + out++; + in++; + } + break; + case MPI_PROD: + while (count--) { + Type t = *out; /* copy: out->real is overwritten before imag is computed */ + out->real = t.real*in->real - t.imag*in->imag; + out->imag = t.real*in->imag + t.imag*in->real; + out++; + in++; + } + break; + default: + Error("Unknown op in MPI_reduce\n"); + } + } +#undef Type + break; + case MPI_USER_DATA: + /* This came from a MPI_ICombine_func */ + (*manifest->u.user_func)(inbuf, outbuf, outbuf); + (outbuf) = (char *)(outbuf) + ALIGN(manifest->count); + (inbuf) = (char *)(inbuf) + ALIGN(manifest->count); + break; + default: + Error("Unknown type in Combine\n"); + } + } +} + +int +MPI_IreduceWait(MPI_Request req, MPI_Comm comm) +{ + int chan; + int doc; + int i; + void *inbuf; + void *outbuf = StkBase(&outdata); + MPI_Status stat; + int ret; + int total_size; + int sendproc; + int procnum = _MPI_Procnum; + int nproc = _MPI_Nproc; + int bufsz = StkSz(&outdata); + struct comb_st *manifest = StkBase(&cstk); + int n = StkSz(&cstk)/sizeof(struct comb_st); + + doc = ilog2(nproc); + if (nproc != 1 << doc) + doc++; /* for non power-of-two sizes */ + if ((inbuf = malloc(bufsz)) == NULL) + Error("out of memory\n"); + + for (chan = 0; chan < doc; chan++) { + if (mask & (1 << chan)) { + sendproc = procnum^(1<<chan); + if (sendproc >= 0 && sendproc < nproc) { + MPI_Sendrecv(outbuf, 
bufsz, MPI_BYTE, sendproc, TAG, + inbuf, bufsz, MPI_BYTE, sendproc, TAG, + comm, &stat); + MPI_Get_count(&stat, MPI_BYTE, &ret); + if (ret != bufsz) + Error("Shift failed, expected %d got %d\n", bufsz, ret); + do_combine(inbuf, outbuf, n, manifest); + } + } + } + if (nproc != 1 << doc) { + MPI_Bcast(outbuf, bufsz, MPI_BYTE, 0, comm); + } + for (i = 0; i < n; i++) { + total_size = manifest->count * _MPI_Datasize[manifest->datatype]; + memcpy(manifest->recvbuf, outbuf, total_size); + outbuf = (char *)outbuf + ALIGN(total_size); + manifest = (struct comb_st *)((char *)manifest + ALIGN(sizeof(*manifest))); + } + free(inbuf); + StkTerminate(&outdata); + StkTerminate(&cstk); + setup = 0; + return MPI_SUCCESS; +} + +int +MPI_Allreduce(void *sendbuf, void *recvbuf, int count, + MPI_Datatype datatype, MPI_Op op, MPI_Comm comm) +{ + MPI_Request req; + + Msgf(("mpi: Allreduce\n")); + if (op < 0 || op >= _MPI_NUMOPS) Error("Bad MPI_Op\n"); + MPI_IreduceInit(&req); + MPI_Ireduce(sendbuf, recvbuf, count, datatype, op, req, MPI_COMM_PRIVATE); + MPI_IreduceWait(req, MPI_COMM_PRIVATE); + return MPI_SUCCESS; +} + +int +MPI_Reduce(void *sendbuf, void *recvbuf, int count, MPI_Datatype datatype, + MPI_Op op, int root, MPI_Comm comm) +{ + Msgf(("mpi: Reduce\n")); + if (op < 0 || op >= _MPI_NUMOPS) Error("Bad MPI_Op\n"); + if (_MPI_Procnum != root) + if ((recvbuf = malloc(count * _MPI_Datasize[datatype])) == NULL) + Error("out of memory\n"); + MPI_Allreduce(sendbuf, recvbuf, count, datatype, op, comm); + if (_MPI_Procnum != root) free(recvbuf); + return MPI_SUCCESS; +} + diff --git a/external/libsdf/libsw/mpi_template.c b/external/libsdf/libsw/mpi_template.c new file mode 100644 index 0000000..b2b7e57 --- /dev/null +++ b/external/libsdf/libsw/mpi_template.c @@ -0,0 +1,66 @@ + + /* This is a template that is included multiple times in MPI_reduce.c */ + + switch(manifest->u.op) { +#ifndef LOC_OPS + case MPI_SUM: + Do_Op(outbuf, +=, inbuf, Type, count); + break; + case MPI_PROD: + 
Do_Op(outbuf, *=, inbuf, Type, count); + break; + case MPI_MAX: + Do_Op(outbuf, + = (*(Type *)outbuf > *(Type *)inbuf) ? *(Type *)outbuf :, + inbuf, Type, count); + break; + case MPI_MIN: + Do_Op(outbuf, + = (*(Type *)outbuf < *(Type *)inbuf) ? *(Type *)outbuf :, + inbuf, Type, count); + break; +#ifdef BIT_OPS + case MPI_BAND: + Do_Op(outbuf, &=, inbuf, Type, count); + break; + case MPI_BOR: + Do_Op(outbuf, |=, inbuf, Type, count); + break; + case MPI_BXOR: + Do_Op(outbuf, ^=, inbuf, Type, count); + break; + case MPI_LAND: + Do_Op(outbuf, = *(Type *)outbuf && , inbuf, Type, count); + break; + case MPI_LOR: + Do_Op(outbuf, = *(Type *)outbuf || , inbuf, Type, count); + break; + case MPI_LXOR: /* cripes */ + Do_Op(outbuf, = (!*(Type *)outbuf == !*(Type *)inbuf) ? + 0 : 1 ||, inbuf, Type, count); + break; +#endif /*BIT_OPS */ +#else /* LOC_OPS */ + case MPI_MAXLOC: + Do_Op(outbuf, = (((Type *)outbuf)->x > ((Type *)inbuf)->x) ? + *(Type *)outbuf :, inbuf, Type, count); + break; + case MPI_MINLOC: + Do_Op(outbuf, = (((Type *)outbuf)->x < ((Type *)inbuf)->x) ? 
+ *(Type *)outbuf :, inbuf, Type, count); + break; +#endif /* LOC_OPS */ + default: + Error("Unknown op in MPI_reduce\n"); + } + + + + + + + + + + + diff --git a/external/libsdf/libsw/mpmy_combine.c b/external/libsdf/libsw/mpmy_combine.c new file mode 100644 index 0000000..67f61d2 --- /dev/null +++ b/external/libsdf/libsw/mpmy_combine.c @@ -0,0 +1,252 @@ +#include +#include "Malloc.h" +#include "mpmy.h" +#include "stk.h" +#include "gc.h" +#include "Msgs.h" + +static int setup; +static int mask; + +static Stk cstk, outdata; + +struct comb_st { + void *recvbuf; + int count; + int datatype; + union{ + MPMY_Op op; + MPMY_user_comb_func user_func; + } u; +}; + +/* This currently only works with one request outstanding at a time */ + +int +MPMY_ICombine_Init(MPMY_Comm_request *reqp) +{ + StkInitEz(&cstk); + StkInitEz(&outdata); + setup = 1; + mask = ~0; + *reqp = 0; + return MPMY_SUCCESS; +} + +int +MPMY_ICombine(const void *sendbuf, void *recvbuf, int count, + MPMY_Datatype datatype, MPMY_Op op, + MPMY_Comm_request req) +{ + struct comb_st *combuf; + int total_size; + + if (setup == 0) + Error("MPMY_ICombine with no call to Init\n"); + total_size = count * MPMY_Datasize[datatype]; + + StkPushData(&outdata, (void *)sendbuf, total_size); + combuf = StkPush(&cstk, sizeof(struct comb_st)); + + combuf->recvbuf = recvbuf; + combuf->count = count; + combuf->datatype = datatype; + combuf->u.op = op; + + return MPMY_SUCCESS; +} + +int +MPMY_ICombine_func(const void *sendbuf, void *recvbuf, int size, + void (*func)(const void *, const void *, void*), + MPMY_Comm_request req) +{ + struct comb_st *combuf; + int total_size; + + Msgf(("MPMY_Icombine_func(): %p %p %d %p\n", + sendbuf, recvbuf, size, func)); + if (setup == 0) + Error("MPMY_ICombine with no call to Init\n"); + total_size = size; + + StkPushData(&outdata, (void *)sendbuf, total_size); + combuf = StkPush(&cstk, sizeof(struct comb_st)); + + combuf->recvbuf = recvbuf; + combuf->count = size; + combuf->datatype = 
MPMY_USER_DATA; + combuf->u.user_func = func; + + return MPMY_SUCCESS; +} + +void +MPMY_Combine_Mask(unsigned int maskval) +{ + mask = maskval; +} + + +/* gcc with optimization will cause this to fail efence */ +#define Do_Op(outbuf, op, inbuf, type, cnt) \ +do{char *oend = (char *)outbuf + StkAlign(&outdata, cnt*sizeof(type)); \ + char *iend = (char *)inbuf + StkAlign(&outdata, cnt*sizeof(type)); \ + while (cnt--) { \ + *(type *)(outbuf) op *(type *)(inbuf); \ + (outbuf) = (char *)(outbuf) + sizeof(type); \ + (inbuf) = (char *)(inbuf) + sizeof(type); \ + } \ + outbuf = oend; \ + inbuf = iend; \ +}while(0) + + +static void +do_combine(const void *inbuf, void *outbuf, const int n, + const struct comb_st *manifest) +{ + int i; + int count; + + for (i = 0; i < n; i++, manifest++) { + count = manifest->count; + if (count < 0) Error("Bad count in Combine\n"); + Msgf(("do_combine: %p %p %d %d\n", + inbuf, outbuf, manifest->datatype, manifest->u.op)); + switch(manifest->datatype) { + case MPMY_FLOAT: +#define Type float +#include "op_template.c" +#undef Type + break; + case MPMY_DOUBLE: +#define Type double +#include "op_template.c" +#undef Type + break; + case MPMY_INT: +#define BIT_OPS /* Turns on bitwise ops in op_template.c */ +#define Type int +#include "op_template.c" +#undef Type + break; + case MPMY_CHAR: +#define Type char +#include "op_template.c" +#undef Type + break; + case MPMY_SHORT: +#define Type short +#include "op_template.c" +#undef Type + break; + case MPMY_LONG: +#define Type long +#include "op_template.c" +#undef Type + break; /* without this, MPMY_LONG falls through into the off_t case */ + case MPMY_OFFT: +#define Type off_t +#include "op_template.c" +#undef Type + break; + case MPMY_INT64: +#define Type int64_t +#include "op_template.c" +#undef Type + break; + case MPMY_UNSIGNED_INT: +#define Type unsigned int +#include "op_template.c" +#undef Type + break; + case MPMY_UNSIGNED_CHAR: +#define Type unsigned char +#include "op_template.c" +#undef Type + break; + case MPMY_UNSIGNED_SHORT: +#define Type unsigned short 
+#include "op_template.c" +#undef Type + break; + case MPMY_UNSIGNED_LONG: +#define Type unsigned long +#include "op_template.c" +#undef Type + break; +#undef BIT_OPS + case MPMY_USER_DATA: + /* This came from a MPMY_ICombine_func */ + (*manifest->u.user_func)(inbuf, outbuf, outbuf); + (outbuf) = (char *)(outbuf) + StkAlign(&outdata, manifest->count); + (inbuf) = (char *)(inbuf) + StkAlign(&outdata, manifest->count); + break; + default: + Error("Unknown type in Combine\n"); + } + } +} + +int +MPMY_ICombine_Wait(MPMY_Comm_request req) +{ + int chan; + int doc; + int i; + void *inbuf; + void *outbuf = StkBase(&outdata); + MPMY_Status stat; + int ret; + int total_size; + int sendproc; + int procnum = MPMY_Procnum(); + int nproc = MPMY_Nproc(); + int bufsz = StkSz(&outdata); + const struct comb_st *manifest = StkBase(&cstk); + int n = StkSz(&cstk)/sizeof(struct comb_st); + + Msgf(("MPMY_ICombine_Wait()\n")); + doc = ilog2(nproc); + if (nproc != 1 << doc) + doc++; /* for non power-of-two sizes */ + inbuf = Malloc(bufsz); + + for (chan = 0; chan < doc; chan++) { + if (mask & (1 << chan)) { + sendproc = procnum^(1<<chan); + if (sendproc >= 0 && sendproc < nproc) { + MPMY_Shift(sendproc, inbuf, bufsz, outbuf, bufsz, &stat); + ret = MPMY_Count(&stat); + if (ret != bufsz) + Error("Shift failed, expected %d got %d\n", bufsz, ret); + do_combine(inbuf, outbuf, n, manifest); + } + } + } + if (nproc != 1 << doc) { + MPMY_Bcast(outbuf, bufsz, MPMY_CHAR, 0); + } + for (i = 0; i < n; i++) { + total_size = manifest->count * MPMY_Datasize[manifest->datatype]; + memcpy(manifest->recvbuf, outbuf, total_size); + outbuf = (char *)outbuf + StkAlign(&outdata, total_size); + manifest = (const struct comb_st *)((char *)manifest + StkAlign(&outdata, sizeof(*manifest))); + } + Free(inbuf); + StkTerminate(&outdata); + StkTerminate(&cstk); + setup = 0; + return MPMY_SUCCESS; +} + +int +MPMY_Combine(const void *sendbuf, void *recvbuf, const int count, + const MPMY_Datatype datatype, const MPMY_Op op) +{ + 
MPMY_Comm_request req; + + MPMY_ICombine_Init(&req); + MPMY_ICombine(sendbuf, recvbuf, count, datatype, op, req); + return MPMY_ICombine_Wait(req); +} diff --git a/external/libsdf/libsw/mpmy_gather.c b/external/libsdf/libsw/mpmy_gather.c new file mode 100644 index 0000000..f1ac585 --- /dev/null +++ b/external/libsdf/libsw/mpmy_gather.c @@ -0,0 +1,556 @@ +#include +#include +#include "mpmy.h" +#include "Msgs.h" +#include "Malloc.h" +#include "gc.h" /* for ilog2 */ +#include "verify.h" + +/* Could we implement gather using MPMY_Combine with MPMY_Op == MPMY_GATHER? */ + + +#define BCAST_DEFAULT_TAG 0x47 + +#define GATHER_BCAST_TAG 0x1145 +#define GATHER_TAG 0x2145 +#define NGATHER_TAG 0x3145 +#define ALLTOALL_TAG 0x4145 +#define ALLTOALL_RTAG 0x4146 +#define ALLTOALL_NTAG 0x4147 + +unsigned int MPMY_Datasize[] = +{ sizeof(float), sizeof(double), sizeof(int), sizeof(char), sizeof(short), + sizeof(long), sizeof(unsigned int), sizeof(unsigned char), + sizeof(unsigned short), sizeof(unsigned long), sizeof(off_t), sizeof(int64_t), 1/*user_data*/ +}; + +void +MPMY_send(const void *buf, int cnt, int dest, int tag) +{ + MPMY_Comm_request req; + + MPMY_Isend(buf, cnt, dest, tag, &req); + MPMY_Wait(req, 0); + if( Msg_test(__FILE__)){ + int i; + int sum = 0; + const char *cbuf = buf; + for(i=0; i= 0 && sendproc < nproc) { + nin = nout = (1 << chan); + if (procnum & (1 << chan)) { + if (nproc - procnum < nout) nout = nproc - procnum; + outbufsz = nout * nbytes; + outptr = (char *)recvbuf + (procnum & mask) * nbytes; + Msgf(("Gather: to %d, outidx %d, outsz %d\n", + sendproc, (procnum & mask), nout)); + MPMY_send(outptr, outbufsz, sendproc, GATHER_TAG+chan); + break; + } else { + if (nproc - sendproc < nin) nin = nproc - sendproc; + inbufsz = nin * nbytes; + inptr = (char *)recvbuf + (sendproc & mask) * nbytes; + Msgf(("Gather: from %d, inidx %d, insz %d\n", + sendproc, (sendproc & mask), nin)); + MPMY_recvn(inptr, inbufsz, sendproc, GATHER_TAG+chan); + } + } + mask <<= 1; + } + 
if (procnum != recvproc) + Free(recvbuf); + return MPMY_SUCCESS; +} + +/* "count" can vary for each processor in NGather */ + +int +MPMY_NGather(const void *sendbuf, int count, MPMY_Datatype type, + void **recvhndl, int recvproc) +{ + int chan; + int doc; + int sendproc; + int bufsz; + int inbytes; + unsigned int mask; + void *buf; + int procnum = MPMY_Procnum(); + int nproc = MPMY_Nproc(); + int nbytes = MPMY_Datasize[type] * count; + + if (recvproc != 0) Error("NGather to procnum != 0 not supported yet.\n"); + + doc = ilog2(nproc); + if (nproc != 1 << doc) + doc++; /* for non power-of-two sizes */ + + buf = Malloc(nbytes); + memcpy(buf, sendbuf, nbytes); + bufsz = nbytes; + + mask = ~0; + for (chan = 0; chan < doc; chan++) { + sendproc = procnum ^ (1 << chan); + if (sendproc >= 0 && sendproc < nproc) { + if (procnum & (1 << chan)) { + Msgf(("NGather: to %d, bufsz %d\n", sendproc, bufsz)); + MPMY_send(&bufsz, sizeof(int), sendproc, NGATHER_TAG+chan); + MPMY_send(buf, bufsz, sendproc, NGATHER_TAG+chan); + break; + } else { + MPMY_recvn(&inbytes, sizeof(int), sendproc, NGATHER_TAG+chan); + Msgf(("NGather: from %d, inbytes %d\n", sendproc, inbytes)); + buf = Realloc(buf, bufsz+inbytes); + MPMY_recvn(bufsz+(char*)buf, inbytes, sendproc, NGATHER_TAG+chan); + bufsz += inbytes; + } + } + mask <<= 1; + } + if (procnum != recvproc) { + Free(buf); + return 0; + } else { + Msgf(("NGather: Final bufsz %d\n", bufsz)); + *recvhndl = buf; + return bufsz/MPMY_Datasize[type]; + } +} + +int MPMY_Bcast(void *buf, int count, MPMY_Datatype type, int srcproc) +{ + return MPMY_BcastTag(buf, count, type, srcproc, BCAST_DEFAULT_TAG); +} + +int +MPMY_BcastTag(void *buf, int count, MPMY_Datatype type, int srcproc, int tag) +{ + int chan; + int doc; + int sendproc; + int procnum = MPMY_Procnum(); + int nproc = MPMY_Nproc(); + int nbytes = MPMY_Datasize[type] * count; + + if (srcproc != 0) Error("Bcast from procnum != 0 not supported yet.\n"); + + Msgf(("MPMYBcast(buf=%p, count=%d, type=%d, 
srcproc=%d\n", + buf, count, type, srcproc)); Msg_flush(); + doc = ilog2(nproc); + if (nproc != 1 << doc) + doc++; /* for non power-of-two sizes */ + + for (chan = 0; chan < doc; chan++) { + sendproc = procnum ^ (1 << chan); + if (sendproc >= 0 && sendproc < nproc) { + if (procnum & (1 << chan)) { + Msgf(("Bcast: recv from %d\n", sendproc)); + MPMY_recvn(buf, nbytes, sendproc, tag+chan); + } else { + Msgf(("Bcast: send to %d\n", sendproc)); + MPMY_send(buf, nbytes, sendproc, tag+chan); + } + } + } + return MPMY_SUCCESS; +} + + +int +MPMY_Alltoall(void *sendbuf, int sendcount, MPMY_Datatype sendtype, + void *recvbuf, int recvcount, MPMY_Datatype recvtype) +{ + MPMY_Status stat; + int i; + int ret; + int doc_procs, doc_nodes, relative, dest, proc; + int sendbytes = MPMY_Datasize[sendtype] * sendcount; + int recvbytes = MPMY_Datasize[recvtype] * recvcount; + int my_node = MPMY_Procnum() / PROCS_PER_NODE; + int my_leader = my_node * PROCS_PER_NODE; + int local_rank = MPMY_Procnum() % PROCS_PER_NODE; + int dest_leader; + char *tmpbuf_s = NULL; + char *tmpbuf_r = NULL; + MPMY_Comm_request req; + + doc_nodes = ilog2(MPMY_Nproc()/PROCS_PER_NODE-1) + 1; + doc_procs = ilog2(PROCS_PER_NODE-1) + 1; + Msgf(("Alltoall my_leader %d, local_rank %d, doc_nodes %d, doc_procs %d\n", + my_leader, local_rank, doc_nodes, doc_procs)); + /* Within a node */ + for (relative = 0; relative < 1 << doc_procs; relative++) { + dest = my_leader + (relative ^ local_rank); + if ((dest >= MPMY_Nproc()) || (dest >= my_leader + PROCS_PER_NODE)) + continue; + Msgf(("local Alltoall MPMY_Shift(dest=%d, instart=%p, incnt=%d, outstart=%p, outcnt=%d)\n", + dest, recvbuf + dest * recvbytes, recvbytes, + sendbuf + dest * sendbytes, sendbytes)); + MPMY_Shift(dest, recvbuf + dest * recvbytes, recvbytes, + sendbuf + dest * sendbytes, sendbytes, &stat); + ret = MPMY_Count(&stat); + if (ret != recvbytes) + Error("Shift failed, expected %d got %d\n", recvbytes, ret); + } + if (MPMY_Procnum() == my_leader) { + 
tmpbuf_s = Malloc(sendbytes*PROCS_PER_NODE); + tmpbuf_r = Malloc(recvbytes*PROCS_PER_NODE); + } + /* Across nodes */ + for (relative = 1; relative < 1 << doc_nodes; relative++) { + for (proc = 0; proc < PROCS_PER_NODE; proc++) { + dest_leader = (relative ^ my_node) * PROCS_PER_NODE; + dest = dest_leader + proc; + if (dest >= MPMY_Nproc()) + continue; + if (MPMY_Procnum() != my_leader) { + MPMY_Isend(sendbuf + dest * sendbytes, sendbytes, my_leader, ALLTOALL_RTAG, &req); + MPMY_Wait(req, 0); + if (MPMY_Procnum() == my_leader + proc) { + Msgf(("Alltoall relay recv from %d (source=%d, incnt=%d)\n", + my_leader, dest_leader, PROCS_PER_NODE * recvbytes)); + MPMY_Irecv(recvbuf + dest_leader * recvbytes, PROCS_PER_NODE * recvbytes, + my_leader, ALLTOALL_TAG, &req); + MPMY_Wait(req, &stat); + ret = MPMY_Count(&stat); + if (ret != PROCS_PER_NODE * recvbytes) { + Error("Irecv failed, expected %d got %d\n", PROCS_PER_NODE * recvbytes, ret); + } + } + } else { + memcpy(tmpbuf_s, sendbuf + dest * sendbytes, sendbytes); + for (i = 1; i < PROCS_PER_NODE; i++) { + if (my_leader + i >= MPMY_Nproc()) break; + MPMY_Irecv(tmpbuf_s + i * recvbytes, recvbytes, my_leader+i, ALLTOALL_RTAG, &req); + MPMY_Wait(req, &stat); + ret = MPMY_Count(&stat); + if (ret != recvbytes) { + Error("Irecv failed, expected %d got %d\n", recvbytes, ret); + } + } + Msgf(("Alltoall relay for %d MPMY_Shift(dest=%d, incnt=%d, outcnt=%d)\n", + dest, dest_leader, PROCS_PER_NODE * recvbytes, PROCS_PER_NODE * sendbytes)); + MPMY_Shift(dest_leader, tmpbuf_r, PROCS_PER_NODE * recvbytes, + tmpbuf_s, PROCS_PER_NODE * sendbytes, &stat); + ret = MPMY_Count(&stat); + if (ret != PROCS_PER_NODE * recvbytes) + Error("Shift failed, expected %d got %d\n", PROCS_PER_NODE * recvbytes, ret); + if (proc == 0) { + Msgf(("Alltoall local copy (source=%d, incnt=%d)\n", + dest_leader, PROCS_PER_NODE * recvbytes)); + memcpy(recvbuf + dest_leader * recvbytes, tmpbuf_r, PROCS_PER_NODE * recvbytes); + } else { + Msgf(("Alltoall relay send 
to %d (source=%d, incnt=%d)\n", + my_leader+proc, dest_leader, PROCS_PER_NODE * recvbytes)); + MPMY_Isend(tmpbuf_r, PROCS_PER_NODE * recvbytes, my_leader+proc, ALLTOALL_TAG, &req); + MPMY_Wait(req, 0); + } + } + } + } + if (MPMY_Procnum() == my_leader) { + Free(tmpbuf_r); + Free(tmpbuf_s); + } + return MPMY_SUCCESS; +} + +int +MPMY_Alltoallv(void *sendbuf, int *scount, int *soff, MPMY_Datatype sendtype, + void *recvbuf, int *rcount, int *roff, MPMY_Datatype recvtype) +{ + MPMY_Status stat; + int i; + int ret; + int doc_procs, doc_nodes, relative, dest, proc; + int ssize = MPMY_Datasize[sendtype]; + int rsize = MPMY_Datasize[recvtype]; + int my_node = MPMY_Procnum() / PROCS_PER_NODE; + int my_leader = my_node * PROCS_PER_NODE; + int local_rank = MPMY_Procnum() % PROCS_PER_NODE; + int dest_leader; + char *tmpbuf_s = NULL; + char *tmpbuf_r = NULL; + int *stcnt = NULL; + int *stoff = NULL; + MPMY_Comm_request req; + int sendbytes, recvbytes; + + doc_nodes = ilog2(MPMY_Nproc()/PROCS_PER_NODE-1) + 1; + doc_procs = ilog2(PROCS_PER_NODE-1) + 1; + if (MPMY_Procnum() == my_leader) { + stcnt = Calloc(PROCS_PER_NODE, sizeof(int)); + stoff = Calloc(PROCS_PER_NODE, sizeof(int)); + } + Msgf(("Alltoallv my_leader %d, local_rank %d, doc_nodes %d, doc_procs %d\n", + my_leader, local_rank, doc_nodes, doc_procs)); + /* Within a node */ + for (relative = 0; relative < 1 << doc_procs; relative++) { + dest = my_leader + (relative ^ local_rank); + if ((dest >= MPMY_Nproc()) || (dest >= my_leader + PROCS_PER_NODE)) + continue; + if ((rcount[dest] == 0) && (scount[dest] == 0)) + continue; + Msgf(("local Alltoallv MPMY_Shift(dest=%d, instart=%p, incnt=%d, outstart=%p, outcnt=%d)\n", + dest, recvbuf + roff[dest] * rsize, rcount[dest] * rsize, + sendbuf + soff[dest] * ssize, scount[dest] * ssize)); + MPMY_Shift(dest, recvbuf + roff[dest] * rsize, rcount[dest] * rsize, + sendbuf + soff[dest] * ssize, scount[dest] * ssize, &stat); + ret = MPMY_Count(&stat); + if (ret != rcount[dest] * rsize) + 
Error("Shift failed, expected %d got %d\n", rcount[dest] * rsize, ret); + } + /* Across nodes */ + for (relative = 1; relative < 1 << doc_nodes; relative++) { + for (proc = 0; proc < PROCS_PER_NODE; proc++) { + dest_leader = (relative ^ my_node) * PROCS_PER_NODE; + dest = dest_leader + proc; + if (dest >= MPMY_Nproc()) + continue; + if (MPMY_Procnum() != my_leader) { + MPMY_Isend(&scount[dest], sizeof(int), my_leader, ALLTOALL_NTAG, &req); + MPMY_Wait(req, 0); + if (MPMY_Procnum() == my_leader + proc) { + for (recvbytes = 0, i = dest_leader; i < dest_leader + PROCS_PER_NODE; i++) { + if (i >= MPMY_Nproc()) break; + recvbytes += rcount[i] * rsize; + } + MPMY_Isend(&recvbytes, sizeof(int), my_leader, ALLTOALL_NTAG, &req); + MPMY_Wait(req, 0); + } + if (scount[dest]) { + MPMY_Isend(sendbuf + soff[dest] * ssize, scount[dest] * ssize, my_leader, + ALLTOALL_RTAG, &req); + MPMY_Wait(req, 0); + } + if ((MPMY_Procnum() == my_leader + proc) && recvbytes) { + Msgf(("Alltoallv relay recv from %d (source=%d, incnt=%d)\n", + my_leader, dest_leader, recvbytes)); + MPMY_Irecv(recvbuf + roff[dest_leader] * rsize, recvbytes, + my_leader, ALLTOALL_TAG, &req); + MPMY_Wait(req, &stat); + ret = MPMY_Count(&stat); + if (ret != recvbytes) { + Error("Irecv failed, expected %d got %d\n", recvbytes, ret); + } + } + } else { + stcnt[0] = scount[dest]; + stoff[0] = 0; + sendbytes = stcnt[0] * ssize; + for (i = 1; i < PROCS_PER_NODE; i++) { + if (my_leader + i >= MPMY_Nproc()) break; + MPMY_Irecv(stcnt+i, sizeof(int), my_leader+i, ALLTOALL_NTAG, &req); + MPMY_Wait(req, &stat); + ret = MPMY_Count(&stat); + if (ret != sizeof(int)) { + Error("Irecv failed, expected %ld got %d\n", sizeof(int), ret); + } + stoff[i] = stoff[i-1] + stcnt[i-1]; + sendbytes += stcnt[i] * ssize; + } + if (proc == 0) { + for (recvbytes = 0, i = dest_leader; i < dest_leader + PROCS_PER_NODE; i++) { + if (i >= MPMY_Nproc()) break; + recvbytes += rcount[i] * rsize; + } + } else if (my_leader+proc < MPMY_Nproc()) { + 
MPMY_Irecv(&recvbytes, sizeof(int), my_leader+proc, ALLTOALL_NTAG, &req); + MPMY_Wait(req, &stat); + ret = MPMY_Count(&stat); + if (ret != sizeof(int)) { + Error("Irecv failed, expected %ld got %d\n", sizeof(int), ret); + } + } + if (sendbytes || recvbytes) { + Msgf(("allocating %d bytes for send\n", sendbytes)); + tmpbuf_s = Malloc(sendbytes); + memcpy(tmpbuf_s, sendbuf + soff[dest] * ssize, scount[dest] * ssize); + for (i = 1; i < PROCS_PER_NODE; i++) { + if (my_leader + i >= MPMY_Nproc()) break; + if (stcnt[i]) { + MPMY_Irecv(tmpbuf_s + stoff[i] * ssize, stcnt[i] * ssize, + my_leader+i, ALLTOALL_RTAG, &req); + MPMY_Wait(req, &stat); + ret = MPMY_Count(&stat); + if (ret != stcnt[i] * ssize) { + Error("Irecv failed, expected %d got %d\n", stcnt[i] * ssize, ret); + } + } + } + Msgf(("allocating %d bytes for recv\n", recvbytes)); + Msgf(("Alltoallv relay for %d MPMY_Shift(dest=%d, incnt=%d, outcnt=%d)\n", + dest, dest_leader, recvbytes, sendbytes)); + tmpbuf_r = Malloc(recvbytes); + if (sendbytes || recvbytes) { + MPMY_Shift(dest_leader, tmpbuf_r, recvbytes, tmpbuf_s, sendbytes, &stat); + ret = MPMY_Count(&stat); + if (ret != recvbytes) + Error("Shift failed, expected %d got %d\n", recvbytes, ret); + } + Free(tmpbuf_s); + if (recvbytes) { + if (proc == 0) { + Msgf(("Alltoallv local copy (source=%d, incnt=%d)\n", + dest_leader, recvbytes)); + memcpy(recvbuf + roff[dest_leader] * rsize, tmpbuf_r, recvbytes); + } else if (my_leader+proc < MPMY_Nproc()) { + Msgf(("Alltoallv relay send to %d (source=%d, incnt=%d)\n", + my_leader+proc, dest_leader, recvbytes)); + MPMY_Isend(tmpbuf_r, recvbytes, my_leader+proc, ALLTOALL_TAG, &req); + MPMY_Wait(req, 0); + } + } + Free(tmpbuf_r); + } + } + } + } + if (MPMY_Procnum() == my_leader) { + Free(stcnt); + Free(stoff); + } + return MPMY_SUCCESS; +} + +int +MPMY_Alltoallv_simple(void *sendbuf, int *scount, int *soff, MPMY_Datatype sendtype, + void *recvbuf, int *rcount, int *roff, MPMY_Datatype recvtype) +{ + MPMY_Status stat; + int 
ret; + int doc, relative, i; + int ssize = MPMY_Datasize[sendtype]; + int rsize = MPMY_Datasize[recvtype]; + + doc = ilog2(MPMY_Nproc()-1) + 1; + i = MPMY_Procnum(); + if (scount[i]) { + memcpy(recvbuf + roff[i] * rsize, sendbuf + soff[i] * ssize, scount[i] * ssize); + } + for (relative = 1; relative < 1<<doc; relative++) { + i = MPMY_Procnum() ^ relative; + if (i >= MPMY_Nproc()) + continue; + if ((scount[i] == 0) && (rcount[i] == 0)) + continue; + Msgf(("Alltoallv MPMY_Shift(dest=%d, instart=%p, incnt=%d, outstart=%p, outcnt=%d)\n", + i, recvbuf + roff[i] * rsize, rcount[i] * rsize, + sendbuf + soff[i] * ssize, scount[i] * ssize)); + MPMY_Shift(i, recvbuf + roff[i] * rsize, rcount[i] * rsize, + sendbuf + soff[i] * ssize, scount[i] * ssize, &stat); + ret = MPMY_Count(&stat); + if (ret != rcount[i] * rsize) + Error("Shift failed, expected %d got %d\n", rcount[i] * rsize, ret); + } + return MPMY_SUCCESS; +} diff --git a/external/libsdf/libsw/msgdirinit.c b/external/libsdf/libsw/msgdirinit.c new file mode 100644 index 0000000..bd92246 --- /dev/null +++ b/external/libsdf/libsw/msgdirinit.c @@ -0,0 +1,83 @@ +#define NO_MSGS +#include +#include +#include +#include +#include +#include +#include +#include +#include "Malloc.h" +#include "mpmy_abnormal.h" +#include "protos.h" +#include "error.h" +#include "Msgs.h" +#include "mpmy.h" +#include "files.h" +#include "mpmy_io.h" + +#ifdef __DELTA__ +void ivfprintf(FILE *stream, const char *fmt, va_list args); +void ifflush(FILE *stream); +#endif + +void MsgdirInit(const char *name) +{ + char *junk; + void *dbgfp; + char *lastslash; + char *dirname; + +#ifdef __PARAGON__ +#define PARAGON_MAX_OPEN 128 + if (MPMY_Nproc() > PARAGON_MAX_OPEN) { + SinglWarning("Can't open more than %d files on paragon because of NORMA-IPC\n", + PARAGON_MAX_OPEN); + return; + } +#endif + +#ifdef sun + if (MPMY_Nproc() > 32) { + SinglWarning("Can't open %d files on cm5 because of fd limit\n", + MPMY_Nproc()); + return; + } +#endif + + junk = Malloc(strlen(name)+1); /* enough?
*/ + lastslash = strrchr(name, '/'); + if( lastslash == NULL || lastslash == &name[0] ){ + /* Either it's in "." or in "/" */ + /* In either case we don't need to do a mkdir */ + dirname = NULL; + }else{ + strncpy(junk, name, lastslash-name); + junk[lastslash-name] = '\0'; + dirname = junk; + } + + /* We only try to create the lowest level of the name. No + recursion here... Nosiree. */ + if( dirname && !fexists(dirname) ){ + if( MPMY_Mkdir(dirname, 0777) ){ + SinglWarning("Mkdir(%s) failed\n", dirname); + goto outahere; + } + } + + /* Now the directory is sure to exist. */ + dbgfp = fopen(name, "w"); + if( dbgfp == NULL ){ + Error("Could not fopen %s, errno=%d\n", name, errno); + } + +#ifdef __DELTA__ + Msg_addfile(dbgfp, (Msgvfprintf_t)ivfprintf, (Msgfflush_t)ifflush); +#else + Msg_addfile(dbgfp, (Msgvfprintf_t)vfprintf, (Msgfflush_t)fflush); +#endif + outahere: + Free(junk); +} + diff --git a/external/libsdf/libsw/obstack.c b/external/libsdf/libsw/obstack.c new file mode 100644 index 0000000..75440d9 --- /dev/null +++ b/external/libsdf/libsw/obstack.c @@ -0,0 +1,441 @@ +/* obstack.c - subroutines used implicitly by object stack macros + Copyright (C) 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1996, 1997, 1998, + 1999, 2000, 2001, 2002, 2003, 2004, 2005 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. 
+ + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, write to the Free + Software Foundation, Inc., 51 Franklin Street, Fifth Floor, + Boston, MA 02110-1301, USA. */ + + +#ifdef HAVE_CONFIG_H +# include +#endif + +#ifdef _LIBC +# include +# include +#else +# include "obstack.h" +#endif + +/* NOTE BEFORE MODIFYING THIS FILE: This version number must be + incremented whenever callers compiled using an old obstack.h can no + longer properly call the functions in this obstack.c. */ +#define OBSTACK_INTERFACE_VERSION 1 + +/* Comment out all this code if we are using the GNU C Library, and are not + actually compiling the library itself, and the installed library + supports the same library interface we do. This code is part of the GNU + C Library, but also included in many other GNU distributions. Compiling + and linking in this code is a waste when using the GNU C library + (especially if it is a shared library). Rather than having every GNU + program understand `configure --with-gnu-libc' and omit the object + files, it is simpler to just do this in the source for each such file. */ + +#include /* Random thing to get __GNU_LIBRARY__. */ +#if !defined _LIBC && defined __GNU_LIBRARY__ && __GNU_LIBRARY__ > 1 +# include +# if _GNU_OBSTACK_INTERFACE_VERSION == OBSTACK_INTERFACE_VERSION +# define ELIDE_CODE +# endif +#endif + +#include + +#ifndef ELIDE_CODE + + +# if HAVE_INTTYPES_H +# include +# endif +# if HAVE_STDINT_H || defined _LIBC +# include +# endif + +/* Determine default alignment. */ +union fooround +{ + uintmax_t i; + long double d; + void *p; +}; +struct fooalign +{ + char c; + union fooround u; +}; +/* If malloc were really smart, it would round addresses to DEFAULT_ALIGNMENT. + But in fact it might be less smart and round addresses to as much as + DEFAULT_ROUNDING. So we prepare for it to do that. 
*/ +enum + { + DEFAULT_ALIGNMENT = offsetof (struct fooalign, u), + DEFAULT_ROUNDING = sizeof (union fooround) + }; + +/* When we copy a long block of data, this is the unit to do it with. + On some machines, copying successive ints does not work; + in such a case, redefine COPYING_UNIT to `long' (if that works) + or `char' as a last resort. */ +# ifndef COPYING_UNIT +# define COPYING_UNIT int +# endif + + +/* The functions allocating more room by calling `obstack_chunk_alloc' + jump to the handler pointed to by `obstack_alloc_failed_handler'. + This can be set to a user defined function which should either + abort gracefully or use longjump - but shouldn't return. This + variable by default points to the internal function + `print_and_abort'. */ +static void print_and_abort (void); +void (*obstack_alloc_failed_handler) (void) = print_and_abort; + +/* Exit value used when `print_and_abort' is used. */ +# include +# ifdef _LIBC +int obstack_exit_failure = EXIT_FAILURE; +# else +# include "exitfail.h" +# define obstack_exit_failure exit_failure +# endif + +# ifdef _LIBC +# if SHLIB_COMPAT (libc, GLIBC_2_0, GLIBC_2_3_4) +/* A looong time ago (before 1994, anyway; we're not sure) this global variable + was used by non-GNU-C macros to avoid multiple evaluation. The GNU C + library still exports it because somebody might use it. */ +struct obstack *_obstack_compat; +compat_symbol (libc, _obstack_compat, _obstack, GLIBC_2_0); +# endif +# endif + +/* Define a macro that either calls functions with the traditional malloc/free + calling interface, or calls functions with the mmalloc/mfree interface + (that adds an extra first argument), based on the state of use_extra_arg. + For free, do not use ?:, since some compilers, like the MIPS compilers, + do not allow (expr) ? void : void. */ + +# define CALL_CHUNKFUN(h, size) \ + (((h) -> use_extra_arg) \ + ? 
(*(h)->chunkfun) ((h)->extra_arg, (size)) \ + : (*(struct _obstack_chunk *(*) (long)) (h)->chunkfun) ((size))) + +# define CALL_FREEFUN(h, old_chunk) \ + do { \ + if ((h) -> use_extra_arg) \ + (*(h)->freefun) ((h)->extra_arg, (old_chunk)); \ + else \ + (*(void (*) (void *)) (h)->freefun) ((old_chunk)); \ + } while (0) + + +/* Initialize an obstack H for use. Specify chunk size SIZE (0 means default). + Objects start on multiples of ALIGNMENT (0 means use default). + CHUNKFUN is the function to use to allocate chunks, + and FREEFUN the function to free them. + + Return nonzero if successful, calls obstack_alloc_failed_handler if + allocation fails. */ + +int +_obstack_begin (struct obstack *h, + int size, int alignment, + void *(*chunkfun) (long), + void (*freefun) (void *)) +{ + register struct _obstack_chunk *chunk; /* points to new chunk */ + + if (alignment == 0) + alignment = DEFAULT_ALIGNMENT; + if (size == 0) + /* Default size is what GNU malloc can fit in a 4096-byte block. */ + { + /* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc. + Use the values for range checking, because if range checking is off, + the extra bytes won't be missed terribly, but if range checking is on + and we used a larger request, a whole extra 4096 bytes would be + allocated. + + These numbers are irrelevant to the new GNU malloc. I suspect it is + less sensitive to the size of the request.
*/ + int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1)) + + 4 + DEFAULT_ROUNDING - 1) + & ~(DEFAULT_ROUNDING - 1)); + size = 4096 - extra; + } + + h->chunkfun = (struct _obstack_chunk * (*)(void *, long)) chunkfun; + h->freefun = (void (*) (void *, struct _obstack_chunk *)) freefun; + h->chunk_size = size; + h->alignment_mask = alignment - 1; + h->use_extra_arg = 0; + + chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size); + if (!chunk) + (*obstack_alloc_failed_handler) (); + h->next_free = h->object_base = __PTR_ALIGN ((char *) chunk, chunk->contents, + alignment - 1); + h->chunk_limit = chunk->limit + = (char *) chunk + h->chunk_size; + chunk->prev = 0; + /* The initial chunk now contains no empty object. */ + h->maybe_empty_object = 0; + h->alloc_failed = 0; + return 1; +} + +int +_obstack_begin_1 (struct obstack *h, int size, int alignment, + void *(*chunkfun) (void *, long), + void (*freefun) (void *, void *), + void *arg) +{ + register struct _obstack_chunk *chunk; /* points to new chunk */ + + if (alignment == 0) + alignment = DEFAULT_ALIGNMENT; + if (size == 0) + /* Default size is what GNU malloc can fit in a 4096-byte block. */ + { + /* 12 is sizeof (mhead) and 4 is EXTRA from GNU malloc. + Use the values for range checking, because if range checking is off, + the extra bytes won't be missed terribly, but if range checking is on + and we used a larger request, a whole extra 4096 bytes would be + allocated. + + These numbers are irrelevant to the new GNU malloc. I suspect it is + less sensitive to the size of the request.
*/ + int extra = ((((12 + DEFAULT_ROUNDING - 1) & ~(DEFAULT_ROUNDING - 1)) + + 4 + DEFAULT_ROUNDING - 1) + & ~(DEFAULT_ROUNDING - 1)); + size = 4096 - extra; + } + + h->chunkfun = (struct _obstack_chunk * (*)(void *,long)) chunkfun; + h->freefun = (void (*) (void *, struct _obstack_chunk *)) freefun; + h->chunk_size = size; + h->alignment_mask = alignment - 1; + h->extra_arg = arg; + h->use_extra_arg = 1; + + chunk = h->chunk = CALL_CHUNKFUN (h, h -> chunk_size); + if (!chunk) + (*obstack_alloc_failed_handler) (); + h->next_free = h->object_base = __PTR_ALIGN ((char *) chunk, chunk->contents, + alignment - 1); + h->chunk_limit = chunk->limit + = (char *) chunk + h->chunk_size; + chunk->prev = 0; + /* The initial chunk now contains no empty object. */ + h->maybe_empty_object = 0; + h->alloc_failed = 0; + return 1; +} + +/* Allocate a new current chunk for the obstack *H + on the assumption that LENGTH bytes need to be added + to the current object, or a new object of length LENGTH allocated. + Copies any partial object from the end of the old chunk + to the beginning of the new one. */ + +void +_obstack_newchunk (struct obstack *h, int length) +{ + register struct _obstack_chunk *old_chunk = h->chunk; + register struct _obstack_chunk *new_chunk; + register long new_size; + register long obj_size = h->next_free - h->object_base; + register long i; + long already; + char *object_base; + + /* Compute size for new chunk. */ + new_size = (obj_size + length) + (obj_size >> 3) + h->alignment_mask + 100; + if (new_size < h->chunk_size) + new_size = h->chunk_size; + + /* Allocate and initialize the new chunk. 
*/ + new_chunk = CALL_CHUNKFUN (h, new_size); + if (!new_chunk) + (*obstack_alloc_failed_handler) (); + h->chunk = new_chunk; + new_chunk->prev = old_chunk; + new_chunk->limit = h->chunk_limit = (char *) new_chunk + new_size; + + /* Compute an aligned object_base in the new chunk */ + object_base = + __PTR_ALIGN ((char *) new_chunk, new_chunk->contents, h->alignment_mask); + + /* Move the existing object to the new chunk. + Word at a time is fast and is safe if the object + is sufficiently aligned. */ + if (h->alignment_mask + 1 >= DEFAULT_ALIGNMENT) + { + for (i = obj_size / sizeof (COPYING_UNIT) - 1; + i >= 0; i--) + ((COPYING_UNIT *)object_base)[i] + = ((COPYING_UNIT *)h->object_base)[i]; + /* We used to copy the odd few remaining bytes as one extra COPYING_UNIT, + but that can cross a page boundary on a machine + which does not do strict alignment for COPYING_UNITS. */ + already = obj_size / sizeof (COPYING_UNIT) * sizeof (COPYING_UNIT); + } + else + already = 0; + /* Copy remaining bytes one by one. */ + for (i = already; i < obj_size; i++) + object_base[i] = h->object_base[i]; + + /* If the object just copied was the only data in OLD_CHUNK, + free that chunk and remove it from the chain. + But not if that chunk might contain an empty object. */ + if (! h->maybe_empty_object + && (h->object_base + == __PTR_ALIGN ((char *) old_chunk, old_chunk->contents, + h->alignment_mask))) + { + new_chunk->prev = old_chunk->prev; + CALL_FREEFUN (h, old_chunk); + } + + h->object_base = object_base; + h->next_free = h->object_base + obj_size; + /* The new chunk certainly contains no empty object yet. */ + h->maybe_empty_object = 0; +} +# ifdef _LIBC +libc_hidden_def (_obstack_newchunk) +# endif + +/* Return nonzero if object OBJ has been allocated from obstack H. + This is here for debugging. + If you use it in a program, you are probably losing. */ + +/* Suppress -Wmissing-prototypes warning. We don't want to declare this in + obstack.h because it is just for debugging. 
*/ +int _obstack_allocated_p (struct obstack *h, void *obj); + +int +_obstack_allocated_p (struct obstack *h, void *obj) +{ + register struct _obstack_chunk *lp; /* below addr of any objects in this chunk */ + register struct _obstack_chunk *plp; /* point to previous chunk if any */ + + lp = (h)->chunk; + /* We use >= rather than > since the object cannot be exactly at + the beginning of the chunk but might be an empty object exactly + at the end of an adjacent chunk. */ + while (lp != 0 && ((void *) lp >= obj || (void *) (lp)->limit < obj)) + { + plp = lp->prev; + lp = plp; + } + return lp != 0; +} + +/* Free objects in obstack H, including OBJ and everything allocated + more recently than OBJ. If OBJ is zero, free everything in H. */ + +# undef obstack_free + +void +obstack_free (struct obstack *h, void *obj) +{ + register struct _obstack_chunk *lp; /* below addr of any objects in this chunk */ + register struct _obstack_chunk *plp; /* point to previous chunk if any */ + + lp = h->chunk; + /* We use >= because there cannot be an object at the beginning of a chunk. + But there can be an empty object at that address + at the end of another chunk. */ + while (lp != 0 && ((void *) lp >= obj || (void *) (lp)->limit < obj)) + { + plp = lp->prev; + CALL_FREEFUN (h, lp); + lp = plp; + /* If we switch chunks, we can't tell whether the new current + chunk contains an empty object, so assume that it may. */ + h->maybe_empty_object = 1; + } + if (lp) + { + h->object_base = h->next_free = (char *) (obj); + h->chunk_limit = lp->limit; + h->chunk = lp; + } + else if (obj != 0) + /* obj is not in any of the chunks! */ + abort (); +} + +# ifdef _LIBC +/* Older versions of libc used a function _obstack_free intended to be + called by non-GCC compilers.
*/ +strong_alias (obstack_free, _obstack_free) +# endif + +int +_obstack_memory_used (struct obstack *h) +{ + register struct _obstack_chunk* lp; + register int nbytes = 0; + + for (lp = h->chunk; lp != 0; lp = lp->prev) + { + nbytes += lp->limit - (char *) lp; + } + return nbytes; +} + +/* Define the error handler. */ +# ifdef _LIBC +# include +# else +# include "gettext.h" +# endif +# ifndef _ +# define _(msgid) gettext (msgid) +# endif + +# ifdef _LIBC +# include +# endif + +# ifndef __attribute__ +/* This feature is available in gcc versions 2.5 and later. */ +# if __GNUC__ < 2 || (__GNUC__ == 2 && __GNUC_MINOR__ < 5) +# define __attribute__(Spec) /* empty */ +# endif +# endif + +static void +__attribute__ ((noreturn)) +print_and_abort (void) +{ + /* Don't change any of these strings. Yes, it would be possible to add + the newline to the string and use fputs or so. But this must not + happen because the "memory exhausted" message appears in other places + like this and the translation should be reused instead of creating + a very similar string which requires a separate translation. */ +# ifdef _LIBC + (void) __fxprintf (NULL, "%s\n", _("memory exhausted")); +# else + fprintf (stderr, "%s\n", _("memory exhausted")); +# endif + exit (obstack_exit_failure); +} + +#endif /* !ELIDE_CODE */ diff --git a/external/libsdf/libsw/op_template.c b/external/libsdf/libsw/op_template.c new file mode 100644 index 0000000..afff00b --- /dev/null +++ b/external/libsdf/libsw/op_template.c @@ -0,0 +1,34 @@ + + /* This is a template that is included multiple times in mpmy_combine.c */ + + switch(manifest->u.op) { + case MPMY_SUM: + Do_Op(outbuf, +=, inbuf, Type, count); + break; + case MPMY_PROD: + Do_Op(outbuf, *=, inbuf, Type, count); + break; + case MPMY_MAX: + Do_Op(outbuf, + = (*(Type *)outbuf > *(Type *)inbuf) ? *(Type *)outbuf :, + inbuf, Type, count); + break; + case MPMY_MIN: + Do_Op(outbuf, + = (*(Type *)outbuf < *(Type *)inbuf) ? 
*(Type *)outbuf :, + inbuf, Type, count); + break; +#ifdef BIT_OPS + case MPMY_BAND: + Do_Op(outbuf, &=, inbuf, Type, count); + break; + case MPMY_BOR: + Do_Op(outbuf, |=, inbuf, Type, count); + break; + case MPMY_BXOR: + Do_Op(outbuf, ^=, inbuf, Type, count); + break; +#endif + default: + Error("Unknown op in mpmy_combine\n"); + } diff --git a/external/libsdf/libsw/peano.c b/external/libsdf/libsw/peano.c new file mode 100644 index 0000000..72609e5 --- /dev/null +++ b/external/libsdf/libsw/peano.c @@ -0,0 +1,363 @@ +#ifndef NDIM +#define NDIM 3 +#endif + +#include "Msgs.h" +#include "Assert.h" +#include "key.h" +#include "vop.h" + + +/* The long-awaited peano-hilbert key. */ +/* "When the going gets wierd, the wierd turn pro." */ +/* Interestingly, it isn't particularly more complicated by virtue */ +/* of the arbitrary NDIM support. The NDIM=3 only code was essentially */ +/* the same except for some loop indices */ + +#if NDIM==3 +/* The possible places to start. */ +/* They can only have an even number of bits turned on. */ +#define S000 0 +#define S011 1 +#define S101 2 +#define S110 3 +/* one for each of the possible startindices */ +static int sindex_to_mask[1<<(NDIM-1)] = {0, 3, 5, 6}; +static int smask_to_index[1<>rshift)&1; +#if NDIM > 1 + bits |= ((ikey1>>rshift)&1) << 1; +#if NDIM > 2 + bits |= ((ikey2>>rshift)&1) << 2; +#endif +#endif + ret = KeyOrInt(KeyLshift(ret, NDIM), bitmap[start][type][bits]); + otype = type; + type = typmap[start][otype][bits]; + start = startmap[start][otype][bits]; + } + return ret; +} + +Key_t PHKeyFromInts(unsigned int ikey[NDIM], int ndim, unsigned int depth){ + assert(ndim == NDIM); + return _PHKeyFromInts(ikey, depth, 0, 0); +} + +/* Convert from a PH key to a NDIM-tuple of ints. 
*/ +/* Return the "depth" of the key */ + +static unsigned int +_IntsFromPHKey(Key_t key, unsigned int ikey[NDIM], + int depth, int start, int type){ + unsigned int lobits, unscrambled; + unsigned int otype; + unsigned int rshift, ret; + Vxd(unsigned int out); + Key_t keymax; + Key_t key0; + + if( !setup_done ) + setup(); + keymax = KeyLshift(KeyInt(1), NDIM*depth); + key0 = KeyInt(0); + + VxS(out, = 0); + /* We need to start at the left, so we need to figure out where the */ + /* left of key is! */ + while( KeyEQ(KeyAnd(keymax, key), key0) ){ + keymax = KeyRshift(keymax, NDIM); + depth--; + } + ret = depth; + + rshift = depth*NDIM; + Msgf(("IntsFromPHKey(%s)\n", PrintKey(key))); + while( rshift > 0 ){ + Key_t kr; + rshift -= NDIM; + depth--; + kr = KeyRshift(key, rshift); + lobits = KeyAndInt(kr, (1< 1 + out1 |= ((unscrambled>>1)&1)< 2 + out2 |= ((unscrambled>>2)&1)<= NDIM ) + Error("bad type\n"); + if( start < 0 || start >= 1<<(NDIM-1) ) + Error("bad start"); + + /* Test each ikey in a 3-d grid. Make sure that + _IntsFromKey(KeyFromInts(ik)) == ik */ + for(ik[0]=0; ik[0]<(1< %s -> revk=(%x %x %x)\n", + ik[0], ik[1], ik[2], + PrintKey(key), + revk[0], revk[1], revk[2]); + } + } + } + + /* Test each key starting at 0. Make sure that + KeyFromInts(IntsFromKey(k))==k */ + base = KeyLshift(KeyInt(1), depth*NDIM); + for(ikey=0; ikey < (1<<(NDIM*depth)); ikey++){ + Key_t revkey; + unsigned int di[NDIM]; + unsigned int iklast[NDIM]; + int d2; + + revd = _IntsFromPHKey(base, ik, depth, start, type); + assert(revd == depth); + revkey = _PHKeyFromInts(ik, depth, start, type); + if( KeyNEQ(revkey, base)){ + char s[64]; + strcpy(s, PrintKey(revkey)); + Warning("key %s -> %x %x %x -> %s\n", + PrintKey(base), ik[0], ik[1], ik[2], s); + } + VVV(di, = ik, - iklast); + d2 = Dot(di, di); + if( d2 != 1 && ikey ){ + Warning("Move by more than 1 at ikey=%#0o\n", ikey); + } + + VV(iklast, = ik); + base = KeyAddInt(base, 1); + } + exit(0); +} + +#endif + +#if NDIM==2 + +#define SZ 6. 
/* inches */ +main(int argc, char **argv){ + int depth; + unsigned int i, ikey; + unsigned int ik[NDIM]; + int type, start; + unsigned int revk[NDIM], revd; + Key_t key, base; + + depth = atoi(argv[1]); + start = atoi(argv[2]); + type = atoi(argv[3]); + if( depth < 0 ) + Error("bad depth\n"); + if( type < 0 || type >= NDIM ) + Error("bad type\n"); + if( start < 0 || start >= 1<<(NDIM-1) ) + Error("bad start"); + + Warning("This is a test of the emergency warning system\n"); + /* Test each ikey in a 2-d grid. Make sure that + _IntsFromKey(KeyFromInts(ik)) == ik */ + for(ik[0]=0; ik[0]<(1< %s -> revk=(%x %x)\n", + ik[0], ik[1], + PrintKey(key), + revk[0], revk[1]); + } + } + + /* Test each key starting at 0. Make sure that + KeyFromInts(IntsFromKey(k))==k */ + printf("%%!\n"); + printf("/L {2 copy lineto stroke moveto pop} def\n"); + printf("72 72 translate\n"); + printf("%%scale to 6 inches width, hgt\n"); + printf("%g %g scale\n", 72.*SZ/(1< %x %x -> %s\n", + PrintKey(base), ik[0], ik[1], s); + } + printf("%d %d %d L\n", i, ik[0], ik[1]); + VVV(di, = ik, - iklast); + d2 = Dot(di, di); + if( d2 != 1 && ikey ){ + Warning("Move by more than 1 at ikey=%#0o\n", ikey); + } + + VV(iklast, = ik); + base = KeyAddInt(base, 1); + } + printf("showpage\n"); + exit(0); +} + +#endif /* NDIM==2 */ +#endif /* STANDALONE */ diff --git a/external/libsdf/libsw/poll.1.c b/external/libsdf/libsw/poll.1.c new file mode 100644 index 0000000..ac904ae --- /dev/null +++ b/external/libsdf/libsw/poll.1.c @@ -0,0 +1,225 @@ +#include +#include "mpmy.h" +#include "Msgs.h" +#include "assert.h" +#include "gc.h" +#include "timers.h" +#include "error.h" +#include "Malloc.h" +#include "dll.h" +#include "chn.h" +#include "poll.h" + +#define INBUFSZ (16384*sizeof(int)) +#define MAXRELAY 4096 +#define MAXLOCAL 64 +#define MAXREMOTE 2048 +Timer_t PollWaitTm; + +static void (*func)(); +static int size; +static int polldone; +static int localdone; +static int remotedone; +static int localid; +static int
nremote; +static int nrelay_active; +static int nrelay_highwater; +static MPMY_Comm_request inreq; +static char localpoll[MAXLOCAL]; +static char remotepoll[MAXREMOTE]; +static int *relaybuf[MAXRELAY]; +static int inbuf[INBUFSZ/sizeof(int)]; /* avoid using malloc */ + +static void +process(int count, int tag) +{ + int dest = inbuf[0]; + + if (dest < 0 || dest >= MPMY_Nproc()) { + Error("Bad dest in Poll()\n"); + } + if (dest != MPMY_Procnum()) { /* relay */ + MPMY_Comm_request req; + + Msgf(("PollRelay: %d to %d using buffer %d\n", count, dest, nrelay_active)); + if (nrelay_active >= MAXRELAY) Error("Out of relay buffers\n"); + memcpy(relaybuf[nrelay_active], inbuf, count); + MPMY_Isend(relaybuf[nrelay_active++], count, dest, tag, &req); + if (nrelay_active > nrelay_highwater) nrelay_highwater = nrelay_active; + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); + /* This should be something more like a select() to free intermediate buffers */ + PollWait(req, tag); + if (--nrelay_active < 0) Error("Bad value for nrelay_active\n"); + } else if (count == 2*sizeof(int)) { + int src = inbuf[1]; + if (src < 0 || src >= MAXLOCAL+MAXREMOTE) { + Error("Bad src in Poll()\n"); + } + Msgf(("PollDone msg from %d\n", src)); + if (localid) polldone = 1; + else if (src < MAXLOCAL) { + /* local done message */ + if (src >= MPMY_ProcsPerNode()) Error("Bad src in Poll()\n"); + if (localpoll[src]) + Error("localpoll[%d] already set!\n", src); + if (localdone >= MPMY_ProcsPerNode()) Error("localdone already achieved\n"); + localpoll[src] = 1; + localdone++; + Msgf(("Poll: localdone is %d\n", localdone)); + } else { + /* remote done message */ + src -= MAXLOCAL; + if (src >= nremote) Error("Bad src in Poll()\n"); + if (remotepoll[src]) + Error("remotepoll[%d] already set!\n", src); + if (remotedone >= nremote) Error("remotedone already achieved\n"); + remotepoll[src] = 1; + if (++remotedone >= nremote) polldone = 1; + Msgf(("Poll: remotedone is %d\n", remotedone)); + } +
MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); + } else { + Msgf(("PollProcess: %d\n", count)); + func(inbuf+1, count-sizeof(int)); + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); + } +} + + +void +PollSetup(void put(void *buf, int size), int max_size, int tag) +{ + int i; + int procs_per_node = MPMY_ProcsPerNode(); + + Msgf(("PollSetup\n")); + func = put; + if (max_size > INBUFSZ) SinglError("INBUFSZ too small\n"); + if (procs_per_node > MAXLOCAL) SinglError("MAXLOCAL too small\n"); + if ((MPMY_Nproc()-1)/procs_per_node >= MAXREMOTE) SinglError("MAXREMOTE too small\n"); + nrelay_active = nrelay_highwater = polldone = localdone = remotedone = 0; + localid = MPMY_Procnum() % procs_per_node; + nremote = (MPMY_Nproc() + procs_per_node - 1) / procs_per_node; + for (i = 0; i < procs_per_node; i++) localpoll[i] = 0; + for (i = 0; i < nremote; i++) remotepoll[i] = 0; + if (MPMY_Nproc() % procs_per_node) { + if (MPMY_Procnum() == MPMY_Nproc() - 1) { + for (i = MPMY_Nproc() % procs_per_node; i < procs_per_node; i++) { + localpoll[i] = 1; + ++localdone; + } + } + } + if (localid == 0) { + for (i = 0; i < MAXRELAY; i++) { + relaybuf[i] = Malloc(INBUFSZ); + } + } + + /* In fact, this test is insufficient if somebody decides to send + a short message anyway we'll still be confused! */ + if (max_size == sizeof(int) || max_size == 2*sizeof(int)) + SinglError("Poll uses size for message sorting. 
You can't use size=%ld or %ld without some new coding\n", (long)sizeof(int), (long)2*sizeof(int)); + + size = max_size; + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); +} + +void +Poll(int tag) +{ + int flag; + MPMY_Status stat; + + Msgf(("P(tag=%d)\n", tag)); + MPMY_Flick(); + while (MPMY_Test(inreq, &flag, &stat), flag) { + process(stat.count, tag); + } +} + +/* If we use a plain MPMY_Wait() during a poll session, deadlock may */ +/* result from isends blocking */ +void +PollWait(MPMY_Comm_request req, int tag) +{ + int flag; + MPMY_Status stat; + + Msgf(("PW(tag=%d)\n", tag)); + while (1) { + if (MPMY_Test(req, &flag, 0), flag) return; + if (MPMY_Test(inreq, &flag, &stat), flag) process(stat.count, tag); + MPMY_Flick(); + } +} + +void +PollUntilDone(int tag) +{ + MPMY_Comm_request req; + MPMY_Status stat; + int buf[2]; + int i; + int procnum = MPMY_Procnum(); + int procs_per_node = MPMY_ProcsPerNode(); + + Msg("polldone", ("PUD(tag=%d)\n", tag)); + StartTimer(&PollWaitTm); + if (localid == 0) { + /* I'm a group master */ + if (localpoll[0]) Error("localpoll[0] already set!\n"); + localpoll[0] = 1; + ++localdone; + while (localdone != procs_per_node) { /* not right if Nproc() % procs_per_node */ + MPMY_Wait(inreq, &stat); + process(stat.count, tag); + } + Msg("polldone", ("PollDone local group finished\n")); + /* Tell masters our group is done */ + for (i = 0; i < nremote; i++) { + buf[0] = i*procs_per_node; + buf[1] = MAXLOCAL + MPMY_Procnum() / procs_per_node; + Msg("polldone", ("PollDone sent to remote master %d\n", i*procs_per_node)); + MPMY_Isend(buf, 2*sizeof(int), i*procs_per_node, tag, &req); + PollWait(req, tag); + } + while (!polldone) { + MPMY_Wait(inreq, &stat); + process(stat.count, tag); + } + /* Tell local group we are done */ + for (i = MPMY_Procnum()+1; i < MPMY_Procnum()+procs_per_node && i < MPMY_Nproc(); i++) { + buf[0] = i; + buf[1] = 0; + Msg("polldone", ("PollDone sent to %d\n", i)); + MPMY_Isend(buf, 2*sizeof(int), i, tag, &req); 
+ PollWait(req, tag); + } + } else { + /* I'm a slave in the local group */ + int dest = (procnum / procs_per_node) * procs_per_node; + buf[0] = dest; + buf[1] = localid; + /* Tell local master we're done */ + Msg("polldone", ("PollDone sent to local master %d\n", dest)); + MPMY_Isend(buf, 2*sizeof(int), dest, tag, &req); + PollWait(req, tag); + while (!polldone) { + MPMY_Wait(inreq, &stat); + process(stat.count, tag); + } + } + /* self send to clean up inreq */ + MPMY_Isend(buf, sizeof(int), MPMY_Procnum(), tag, &req); + MPMY_Wait(req, 0); + MPMY_Wait(inreq, 0); + StopTimer(&PollWaitTm); + if (localid == 0) { + for (i = 0; i < MAXRELAY; i++) { + Free(relaybuf[i]); + } + } + Msgf(("PollDone, nrelay_highwater = %d\n", nrelay_highwater)); +} diff --git a/external/libsdf/libsw/poll.c b/external/libsdf/libsw/poll.c new file mode 100644 index 0000000..6e2cd9e --- /dev/null +++ b/external/libsdf/libsw/poll.c @@ -0,0 +1,244 @@ +#include +#include "mpmy.h" +#include "Msgs.h" +#include "assert.h" +#include "gc.h" +#include "timers.h" +#include "error.h" +#include "dll.h" +#include "chn.h" +#include "poll.h" + +#define INBUFSZ (16384*sizeof(int)) +#define MAXRELAY 512 +#define MAXLOCAL 64 +#define MAXREMOTE 2048 +Timer_t PollWaitTm; + +static void (*func)(); +static int size; +static int polldone; +static int localdone; +static int remotedone; +static int localid; +static int nremote; +static MPMY_Comm_request inreq; +static char localpoll[MAXLOCAL]; +static char remotepoll[MAXREMOTE]; +static int bufused[MAXRELAY]; +MPMY_Comm_request Rreq[MAXRELAY]; +static int relaybuf[MAXRELAY][INBUFSZ/sizeof(int)]; +static int inbuf[INBUFSZ/sizeof(int)]; /* avoid using malloc */ + +static int +allocbuf(void) +{ + int i; + int flag; + int inuse; + + inuse = 0; + for (i = 0; i < MAXRELAY; i++) { + if (bufused[i] == 1) { + if (MPMY_Test(Rreq[i], &flag, 0), flag) { + bufused[i] = 0; + } else inuse++; + } + } + Msgf(("%d relay inuse\n", ++inuse)); + for (i = 0; i < MAXRELAY; i++) { + if 
(bufused[i] == 0) { + bufused[i] = 1; + return i; + } + } + Error("Out of relay buffers\n"); +} + + +static void +process(int count, int tag) +{ + int dest = inbuf[0]; + + if (dest < 0 || dest >= MPMY_Nproc()) { + Error("Bad dest in Poll()\n"); + } + if (dest != MPMY_Procnum()) { /* relay */ + int i; + i = allocbuf(); + Msgf(("PollRelay: %d to %d using buffer %d\n", count, dest, i)); + memcpy(relaybuf[i], inbuf, count); + MPMY_Isend(relaybuf[i], count, dest, tag, Rreq+i); + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); + } else if (count == 2*sizeof(int)) { + int src = inbuf[1]; + if (src < 0 || src >= MAXLOCAL+MAXREMOTE) { + Error("Bad src in Poll()\n"); + } + Msgf(("PollDone msg from %d\n", src)); + if (localid) polldone = 1; + else if (src < MAXLOCAL) { + /* local done message */ + if (src >= MPMY_ProcsPerNode()) Error("Bad src in Poll()\n"); + if (localpoll[src]) + Error("localpoll[%d] already set!\n", src); + if (localdone >= MPMY_ProcsPerNode()) Error("localdone already achieved\n"); + localpoll[src] = 1; + localdone++; + Msgf(("Poll: localdone is %d\n", localdone)); + } else { + /* remote done message */ + src -= MAXLOCAL; + if (src >= nremote) Error("Bad src in Poll()\n"); + if (remotepoll[src]) + Error("remotepoll[%d] already set!\n", src); + if (remotedone >= nremote) Error("remotedone already achieved\n"); + remotepoll[src] = 1; + if (++remotedone >= nremote) polldone = 1; + Msgf(("Poll: remotedone is %d\n", remotedone)); + } + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); + } else { + Msgf(("PollProcess: %d\n", count)); + func(inbuf+1, count-sizeof(int)); + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); + } +} + + +void +PollSetup(void put(void *buf, int size), int max_size, int tag) +{ + int i; + int procs_per_node = MPMY_ProcsPerNode(); + + Msgf(("PollSetup\n")); + func = put; + if (max_size > INBUFSZ) SinglError("INBUFSZ too small\n"); + if (procs_per_node > MAXLOCAL) SinglError("MAXLOCAL too small\n"); + if
((MPMY_Nproc()-1)/procs_per_node >= MAXREMOTE) SinglError("MAXREMOTE too small\n"); + polldone = localdone = remotedone = 0; + localid = MPMY_Procnum() % procs_per_node; + nremote = (MPMY_Nproc() + procs_per_node - 1) / procs_per_node; + for (i = 0; i < procs_per_node; i++) localpoll[i] = 0; + for (i = 0; i < nremote; i++) remotepoll[i] = 0; + for (i = 0; i < MAXRELAY; i++) bufused[i] = 0; + if (MPMY_Nproc() % procs_per_node) { + if (MPMY_Procnum() == MPMY_Nproc() - 1) { + for (i = MPMY_Nproc() % procs_per_node; i < procs_per_node; i++) { + localpoll[i] = 1; + ++localdone; + } + } + } + + /* In fact, this test is insufficient if somebody decides to send + a short message anyway we'll still be confused! */ + if (max_size == sizeof(int) || max_size == 2*sizeof(int)) + SinglError("Poll uses size for message sorting. You can't use size=%ld or %ld without some new coding\n", (long)sizeof(int), (long)2*sizeof(int)); + + size = max_size; + MPMY_Irecv(&inbuf, size, MPMY_SOURCE_ANY, tag, &inreq); +} + +void +Poll(int tag) +{ + int flag; + MPMY_Status stat; + + Msgf(("P(tag=%d)\n", tag)); + while (MPMY_Test(inreq, &flag, &stat), flag) { + process(stat.count, tag); + MPMY_Flick(); + } +} + +/* If we use a plain MPMY_Wait() during a poll session, deadlock may */ +/* result from isends blocking */ +void +PollWait(MPMY_Comm_request req, int tag) +{ + int flag; + MPMY_Status stat; + + Msgf(("PW(tag=%d)\n", tag)); + while (1) { + if (MPMY_Test(req, &flag, 0), flag) return; + if (MPMY_Test(inreq, &flag, &stat), flag) process(stat.count, tag); + MPMY_Flick(); + } +} + +void +PollUntilDone(int tag) +{ + MPMY_Comm_request req; + MPMY_Status stat; + int buf[2]; + int i; + int procnum = MPMY_Procnum(); + int procs_per_node = MPMY_ProcsPerNode(); + + Msg("polldone", ("PUD(tag=%d)\n", tag)); + StartTimer(&PollWaitTm); + if (localid == 0) { + /* I'm a group master */ + if (localpoll[0]) Error("localpoll[0] already set!\n"); + localpoll[0] = 1; + ++localdone; + while (localdone != 
procs_per_node) { /* not right if Nproc() % procs_per_node */ + MPMY_Wait(inreq, &stat); + process(stat.count, tag); + } + Msg("polldone", ("PollDone local group finished\n")); + /* Tell masters our group is done */ + for (i = 0; i < nremote; i++) { + buf[0] = i*procs_per_node; + buf[1] = MAXLOCAL + MPMY_Procnum() / procs_per_node; + Msg("polldone", ("PollDone sent to remote master %d\n", i*procs_per_node)); + MPMY_Isend(buf, 2*sizeof(int), i*procs_per_node, tag, &req); + PollWait(req, tag); + } + while (!polldone) { + MPMY_Wait(inreq, &stat); + process(stat.count, tag); + for (i = 0; i < MAXRELAY; i++) { + int flag; + if (bufused[i] == 1) { + if (MPMY_Test(Rreq[i], &flag, 0), flag) { + bufused[i] = 0; + } + } + } + } + /* Tell local group we are done */ + for (i = MPMY_Procnum()+1; i < MPMY_Procnum()+procs_per_node && i < MPMY_Nproc(); i++) { + buf[0] = i; + buf[1] = 0; + Msg("polldone", ("PollDone sent to %d\n", i)); + MPMY_Isend(buf, 2*sizeof(int), i, tag, &req); + PollWait(req, tag); + } + } else { + /* I'm a slave in the local group */ + int dest = (procnum / procs_per_node) * procs_per_node; + buf[0] = dest; + buf[1] = localid; + /* Tell local master we're done */ + Msg("polldone", ("PollDone sent to local master %d\n", dest)); + MPMY_Isend(buf, 2*sizeof(int), dest, tag, &req); + PollWait(req, tag); + while (!polldone) { + MPMY_Wait(inreq, &stat); + process(stat.count, tag); + } + } + /* self send to clean up inreq */ + MPMY_Isend(buf, sizeof(int), MPMY_Procnum(), tag, &req); + MPMY_Wait(req, 0); + MPMY_Wait(inreq, 0); + StopTimer(&PollWaitTm); + Msgf(("PollDone\n")); +} diff --git a/external/libsdf/libsw/qromo.c b/external/libsdf/libsw/qromo.c new file mode 100644 index 0000000..f781a00 --- /dev/null +++ b/external/libsdf/libsw/qromo.c @@ -0,0 +1,111 @@ +#include +#include "Malloc.h" +#include "error.h" +#include "qromo.h" + +#define NR_END 1 + +static float *vector(long nl, long nh) +/* allocate a float vector with subscript range v[nl..nh] */ +{ + float 
*v; + + v = Malloc((nh-nl+1+NR_END)*sizeof(float)); + return v-nl+NR_END; +} + +static void free_vector(v,nl,nh) +float *v; +long nh,nl; +/* free a float vector allocated with vector() */ +{ + Free(v+nl-NR_END); +} + +void polint(float xa[], float ya[], int n, float x, float *y, float *dy) +{ + int i,m,ns=1; + float den,dif,dift,ho,hp,w; + float *c,*d; + + dif=fabs(x-xa[1]); + c=vector(1,n); + d=vector(1,n); + for (i=1;i<=n;i++) { + if ( (dift=fabs(x-xa[i])) < dif) { + ns=i; + dif=dift; + } + c[i]=ya[i]; + d[i]=ya[i]; + } + *y=ya[ns--]; + for (m=1;m= K) { + polint(&h[j-K],&s[j-K],K,0.0,&ss,&dss); + if (fabs(dss) < EPS*fabs(ss)) return ss; + } + s[j+1]=s[j]; + h[j+1]=h[j]/9.0; + } + Error("Too many steps in routing qromo"); + return 0.0; +} diff --git a/external/libsdf/libsw/qromod.c b/external/libsdf/libsw/qromod.c new file mode 100644 index 0000000..d50fed9 --- /dev/null +++ b/external/libsdf/libsw/qromod.c @@ -0,0 +1,111 @@ +#include +#include "Malloc.h" +#include "error.h" +#include "qromo.h" + +#define NR_END 1 + +static double *vector(long nl, long nh) +/* allocate a double vector with subscript range v[nl..nh] */ +{ + double *v; + + v = Malloc((nh-nl+1+NR_END)*sizeof(double)); + return v-nl+NR_END; +} + +static void free_vector(v,nl,nh) +double *v; +long nh,nl; +/* free a double vector allocated with vector() */ +{ + Free(v+nl-NR_END); +} + +void polintd(double xa[], double ya[], int n, double x, double *y, double *dy) +{ + int i,m,ns=1; + double den,dif,dift,ho,hp,w; + double *c,*d; + + dif=fabs(x-xa[1]); + c=vector(1,n); + d=vector(1,n); + for (i=1;i<=n;i++) { + if ( (dift=fabs(x-xa[i])) < dif) { + ns=i; + dif=dift; + } + c[i]=ya[i]; + d[i]=ya[i]; + } + *y=ya[ns--]; + for (m=1;m= K) { + polintd(&h[j-K],&s[j-K],K,0.0,&ss,&dss); + if (fabs(dss) < EPS*fabs(ss)) return ss; + } + s[j+1]=s[j]; + h[j+1]=h[j]/9.0; + } + Error("Too many steps in routine qromod\n"); + return 0.0; +} diff --git a/external/libsdf/libsw/raise.c b/external/libsdf/libsw/raise.c new file 
mode 100644 index 0000000..7bdb4d0 --- /dev/null +++ b/external/libsdf/libsw/raise.c @@ -0,0 +1,6 @@ +#include +#include + +int raise(int sig){ + return kill(getpid(), sig); +} diff --git a/external/libsdf/libsw/randoms.c b/external/libsdf/libsw/randoms.c new file mode 100644 index 0000000..4e9513d --- /dev/null +++ b/external/libsdf/libsw/randoms.c @@ -0,0 +1,126 @@ +#include "error.h" +#include "randoms.h" + +/* ran2 from Numerical Recipes 2nd ed. modified to conform to our interface */ +/* You win $1000 from Press et al. if it fails non-trivially */ + +#define IM1 2147483563 +#define IM2 2147483399 +#define AM (1.0/IM1) +#define IMM1 (IM1-1) +#define IA1 40014 +#define IA2 40692 +#define IQ1 53668 +#define IQ2 52774 +#define IR1 12211 +#define IR2 3791 +#define NDIV (1+IMM1/NTAB) +#define EPS 1.2e-7 +#define RNMX (1.0-EPS) + +void +ran_init(int seed, ran_state *s) +{ + int j; + long k; + + if (seed < 1) + Error("Bad seed in ran_init2 (%d)\n", seed); + s->idum = s->idum2 = seed; + s->next_norml_ok = 0; + for (j=NTAB+7;j>=0;j--) { + k=(s->idum)/IQ1; + s->idum=IA1*(s->idum-k*IQ1)-k*IR1; + if (s->idum < 0) s->idum += IM1; + if (j < NTAB) s->iv[j] = s->idum; + } + s->iy=s->iv[0]; + s->did_init = IQ1; +} + +float +uniform_rand(ran_state *s) +{ + int j; + long k; + float temp; + + if (s->did_init != IQ1) + Error("You forgot to call ran_init2\n"); + k=(s->idum)/IQ1; + s->idum=IA1*(s->idum-k*IQ1)-k*IR1; + if (s->idum < 0) s->idum += IM1; + k=s->idum2/IQ2; + s->idum2=IA2*(s->idum2-k*IQ2)-k*IR2; + if (s->idum2 < 0) s->idum2 += IM2; + j=s->iy/NDIV; + s->iy=s->iv[j]-s->idum2; + s->iv[j] = s->idum; + if (s->iy < 1) s->iy += IMM1; + if ((temp=AM*s->iy) > RNMX) return RNMX; + else return temp; +} + +float normal_rand(ran_state *st) +/* +This is the Polar method for normal distributions, as described on or near +page 104 of Knuth, Semi-numerical Algorithms. 
To quote Knuth, "The polar +method is quite slow, but it has essentially perfect accuracy, and it is very +easy to write a program for the polar method..." 'nuf said. Algorithm due +to Box, Muller and Marsaglia. +*/ +{ + float v1, v2; /* uniformly distributed on [-1, 1) */ + float s; /* radius of a point pulled from a uniform circle */ + float foo; /* A useful intermediate value. */ + double log(double), sqrt(double); + + if(st->next_norml_ok){ + st->next_norml_ok = 0; + return st->next_norml; + } + + do{ + v1 = 2.0F * uniform_rand(st) - 1.0F; + v2 = 2.0F * uniform_rand(st) - 1.0F; + s = v1*v1 + v2*v2; + } while(s >= 1.0F); + foo = sqrt( -2.0F * log(s)/s); + st->next_norml_ok = 1; + st->next_norml = v1*foo; + return v2*foo; +} + +/* Return a uniform point in an ndim-sphere by rejection. */ +/* If ndim is large (bigger than 4 or so), this becomes very inefficient */ +/* Return the radius-squared of the result. */ +float sphere_rand(ran_state *st, int ndim, float *x) +{ + int k; + float rsqx; + + do { + rsqx = (float)0.0; + for (k = 0; k < ndim; k++) { + x[k] = uniform_rand(st)*2.0F - 1.0F; /* a pt in (-1,1) */ + rsqx += x[k] * x[k]; + } + } while (rsqx > (float)1.0); + return rsqx; +} + +/* Return a uniform point in an ndim-cube */ +/* Return the radius-squared of the result. 
*/ +float cube_rand(ran_state *st, int ndim, float *x) +{ + int k; + float rsqx; + + rsqx = (float)0.0; + for (k = 0; k < ndim; k++) { + x[k] = uniform_rand(st)*2.0F - 1.0F; /* a pt in (-1,1) */ + rsqx += x[k] * x[k]; + } + return rsqx; +} + diff --git a/external/libsdf/libsw/ring.c b/external/libsdf/libsw/ring.c new file mode 100644 index 0000000..82413f0 --- /dev/null +++ b/external/libsdf/libsw/ring.c @@ -0,0 +1,153 @@ +#include +#include "ring.h" +#include "Malloc.h" +#include "mpmy.h" +#include "gc.h" +#include "Msgs.h" +#include "singlio.h" + +#define MSGTYPE 142 + +void +Ring(void *bptr, int bsize, int bnobj, + void *optr, int osize, int onobj, int oused, + void initf(void *, void *), void interactf(void *, void *, int, int)) +{ + char *p; + void *travel_btab; + void *tmpbuf; + int travel_size; + int n, i; + int from_proc, to_proc; + int procnum = MPMY_Procnum(); + int nproc = MPMY_Nproc(); + int max_nobj = onobj; + + MPMY_Combine(&onobj, &max_nobj, 1, MPMY_INT, MPMY_MAX); + + travel_size = max_nobj * oused; + travel_btab = Malloc(travel_size); + tmpbuf = Malloc(travel_size); + for (i = 0; i < onobj; i++) { + p = (char *)optr+i*osize; + initf((char *)travel_btab+i*oused, p); + } + +#if GRAYDECOMP + /* There should be functions like Gcup() which are periodic */ + to_proc = Gcup(procnum, nproc); + from_proc = Gcdown(procnum, nproc); + if (to_proc == -1) to_proc = bin2gray(0); + if (from_proc == -1) from_proc = bin2gray(nproc-1); +#else + to_proc = (procnum+1)%nproc; + from_proc = (procnum+nproc-1)%nproc; +#endif + + /* local part */ + for (p = bptr; p < (char *)bptr + bnobj * bsize; p += bsize) { + interactf(p, travel_btab, oused, onobj); + } + + for(n = 1; n < nproc; n++) { + MPMY_Comm_request req, req2; + MPMY_Status stat; + + singlPrintf("cycle %d starting\n", n); + Msgf(("communicate, cycle %d\n", n)); + /* This uses a lot more memory than packets would */ + memcpy(tmpbuf, travel_btab, travel_size); + MPMY_Irecv(travel_btab, travel_size, from_proc, 
MSGTYPE, &req); + MPMY_Isend(tmpbuf, onobj * oused, to_proc, MSGTYPE, &req2); + MPMY_Wait2(req, &stat, req2, 0); + onobj = MPMY_Count(&stat)/oused; + + Msgf(("compute, cycle %d\n", n)); + for (p = bptr; p < (char *)bptr + bnobj * bsize; p += bsize) { + interactf(p, travel_btab, oused, onobj); + } + } + Free(tmpbuf); + Free(travel_btab); +} + +/* Swap source and sink, and provide finishf() */ +void +Ring2(void *bptr, int bsize, int bnobj, + void *optr, int osize, int onobj, int tsize, + void initf(void *, void *), void interactf(void *, void *, int, int), void finishf(void *, void *)) +{ + char *p; + void *travel_btab; + void *tmpbuf; + int travel_size; + int n, i; + int from_proc, to_proc; + int procnum = MPMY_Procnum(); + int nproc = MPMY_Nproc(); + int max_nobj = onobj; + int initial_onobj = onobj; + MPMY_Comm_request req, req2; + MPMY_Status stat; + + + MPMY_Combine(&onobj, &max_nobj, 1, MPMY_INT, MPMY_MAX); + + travel_size = max_nobj * tsize; + travel_btab = Malloc(travel_size); + tmpbuf = Malloc(travel_size); + for (i = 0; i < onobj; i++) { + p = (char *)optr+i*osize; + initf((char *)travel_btab+i*tsize, p); + } + +#if GRAYDECOMP + /* There should be functions like Gcup() which are periodic */ + to_proc = Gcup(procnum, nproc); + from_proc = Gcdown(procnum, nproc); + if (to_proc == -1) to_proc = bin2gray(0); + if (from_proc == -1) from_proc = bin2gray(nproc-1); +#else + to_proc = (procnum+1)%nproc; + from_proc = (procnum+nproc-1)%nproc; +#endif + + /* local part */ + for (p = travel_btab; p < (char *)travel_btab + onobj * tsize; p += tsize) { + interactf(p, bptr, bsize, bnobj); + } + + for(n = 1; n < nproc; n++) { + singlPrintf("ring2, cycle %d starting\n", n); + Msgf(("communicate, cycle %d\n", n)); + /* This uses a lot more memory than packets would */ + memcpy(tmpbuf, travel_btab, onobj * tsize); + MPMY_Irecv(travel_btab, travel_size, from_proc, MSGTYPE, &req); + MPMY_Isend(tmpbuf, onobj * tsize, to_proc, MSGTYPE, &req2); + MPMY_Wait2(req, &stat, req2, 0); + 
onobj = MPMY_Count(&stat)/tsize; + + Msgf(("compute, cycle %d\n", n)); + for (p = travel_btab; p < (char *)travel_btab + onobj * tsize; p += tsize) { + interactf(p, bptr, bsize, bnobj); + } + } + + memcpy(tmpbuf, travel_btab, onobj * tsize); + MPMY_Irecv(travel_btab, travel_size, from_proc, MSGTYPE, &req); + MPMY_Isend(tmpbuf, onobj * tsize, to_proc, MSGTYPE, &req2); + MPMY_Wait2(req, &stat, req2, 0); + onobj = MPMY_Count(&stat)/tsize; + Free(tmpbuf); + + if (onobj != initial_onobj) Error("onobj doesn't match after trip around ring\n"); + + Msgf(("finish ring\n")); + if (finishf) { + for (i = 0; i < onobj; i++) { + p = (char *)optr+i*osize; + finishf((char *)travel_btab+i*tsize, p); + } + } + Free(travel_btab); +} diff --git a/external/libsdf/libsw/rsort.c b/external/libsdf/libsw/rsort.c new file mode 100644 index 0000000..de78638 --- /dev/null +++ b/external/libsdf/libsw/rsort.c @@ -0,0 +1,93 @@ +#include +#include +#include "Malloc.h" +#include "key.h" + +#define pswap(a, b, sz) do { \ + char _c[sz]; \ + memcpy((void *)&_c, (void *)a, sz); \ + memcpy((void *)a, (void *)b, sz); \ + memcpy((void *)b, (void *)&_c, sz); \ + } while (0) + +#define STACKSIZE 16384 + +/* Only need to continue if there is more than 1 element */ +#define push(_shift, _offset, _n) \ + if (_n > 1LL && _shift >= 0) { \ + sp->shift = _shift; \ + sp->offset = _offset; \ + (sp++)->n = _n; \ + if (sp >= stack+STACKSIZE) Error("Stack overflow\n"); \ + } + +#define pop(_shift, _offset, _n) do { \ + _shift = (--sp)->shift; \ + _offset = sp->offset; \ + _n = sp->n; \ + } while (0) + + +/* insertion sort for small lists */ +static void +isort(void *k, int n, int sz, Key_t (*getkey)(const void *)) +{ + void *p, *q; + + for (p = k+sz; --n >= 1; p += sz) { + for (q = p; q > k; q -= sz) { + if (KeyGT(getkey(q), getkey(q-sz))) + break; + pswap(q, q-sz, sz); + } + } +} + +/* American Flag radix sort using aux storage of O(1) and stack of O(logN) */ +void +rsort(void *tag, int64_t n, int sz, int radixBits, 
int sortBits, Key_t (*getkey)(const void *)) +{ + int i, shift, c; + int64_t offset, *keyden, *pile; + void *ak; + unsigned int radix = 1 << radixBits; + unsigned int mask = radix - 1; + struct { + int shift; + int64_t offset; + int64_t n; + } stack[STACKSIZE], *sp = stack; + + keyden = Malloc(radix * sizeof(int64_t)); + pile = Malloc(radix * sizeof(int64_t)); + + /* start so last mask cycle uses all radixBits */ + push((sortBits-1)/radixBits*radixBits, 0, n); + + while (sp > stack) { + pop(shift, offset, n); + if (n < 64) { + isort(tag+offset*sz, n, sz, getkey); + continue; + } + + for (i = 0; i < radix; i++) + keyden[i] = 0; + for (ak = tag+offset*sz; ak < tag+(offset+n)*sz; ak += sz) + ++keyden[KeyAndInt(KeyRshift(getkey(ak), shift), mask)]; + for (pile[0] = keyden[0], i = 1; i < radix; i++) + pile[i] = pile[i-1]+keyden[i]; + push(shift-radixBits, offset, keyden[0]); + for (i = 1; i < radix; i++) + push(shift-radixBits, offset+pile[i-1], keyden[i]); + for (ak = tag+offset*sz; ak < tag+(offset+n-keyden[mask])*sz; ak += keyden[c]*sz) { + char tag_aux[sz]; + memcpy(tag_aux, ak, sz); /* in-place permutation */ + while (--pile[c = KeyAndInt(KeyRshift(getkey(tag_aux), shift), mask)] > (ak-tag)/sz-offset) + pswap(tag_aux, tag+(pile[c]+offset)*sz, sz); + memcpy(ak, tag_aux, sz); + } + } + Free(pile); + Free(keyden); +} diff --git a/external/libsdf/libsw/sigio_dump.c b/external/libsdf/libsw/sigio_dump.c new file mode 100644 index 0000000..47ad93a --- /dev/null +++ b/external/libsdf/libsw/sigio_dump.c @@ -0,0 +1,113 @@ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include "error.h" +#include "singlio.h" +#include "Msgs.h" +#include "protos.h" +#include "mpmy.h" +#include "byteswap.h" +#include "memfile.h" + +static void sock_init(char *hostname, int *port, + struct sockaddr_in *acc, int bind_flag); + +static void setup_handler(void); +static void io_ready(int); + +static int sock; /* file descriptor for my UDP socket 
*/ + +void +sigio_setup(void) +{ + int port = 4000; + struct sockaddr_in my_addr; + + setup_handler(); + sock_init(NULL, &port, &my_addr, 1); /* get my sockaddr */ + + if (fcntl(sock, F_SETOWN, getpid()) < 0) + Error("F_SETOWN error\n"); + +#ifdef FASYNC + if (fcntl(sock, F_SETFL, FASYNC) < 0) + Error("F_SETFL FASYNC error\n"); +#endif + +} + + +static void +setup_handler(void) +{ + signal(SIGIO, io_ready); +} + +static void +io_ready(int sig) +{ + int node; + + PrintMemfile(); + signal(SIGIO, io_ready); + return; +} + +/* This is virtually identical to the lsv code */ + +static void +sock_init(char *hostname, int *port, struct sockaddr_in *acc, int bind_flag) +{ + struct hostent *hp; + char host_name[256]; + unsigned long inaddr; + int tries = 0; + + if (hostname == NULL) { + if( (hostname = getenv("LSV_HOSTNAME")) == NULL ){ + if (gethostname(host_name, 256)) + Error("sock_create: gethostname failed\n"); + hostname = host_name; + } + } + memset(acc, 0, sizeof(struct sockaddr_in)); + acc->sin_family = AF_INET; /* sin_family is in host byte order; no htons() */ + + if ((inaddr = inet_addr(hostname)) != -1) /* it is numeric */ + acc->sin_addr.s_addr = inaddr; + else if ((hp = gethostbyname(hostname)) != (struct hostent *)0) + memcpy(&(acc->sin_addr), hp->h_addr, hp->h_length); + else + Error("gethostbyname failed\n"); + + if (bind_flag) { + sock = socket(AF_INET, SOCK_DGRAM, 0); + } + try_again: + acc->sin_port = htons(*port); + if (bind_flag) { + int ret; + ret = bind(sock,(struct sockaddr *)acc,sizeof(struct sockaddr_in)); + if (ret < 0 ) { + if (tries < 100) { + /* printf("bind returns %d\n", ret); */ + (*port)++; + tries++; + goto try_again; + } else { + Error("Can't bind socket. 
Tried %d, up to port %d\n", + tries, *port); + exit(1); + } + } + Msg_do("sigio_dump at %s port %d\n", hostname, *port); + errno = 0; /* clear errors */ + } +} diff --git a/external/libsdf/libsw/singlio.c b/external/libsdf/libsw/singlio.c new file mode 100644 index 0000000..17523df --- /dev/null +++ b/external/libsdf/libsw/singlio.c @@ -0,0 +1,35 @@ +#include +#include +#include "protos.h" +#include "mpmy.h" +#include "Msgs.h" +#include "singlio.h" + +static int singl_auto_flush = 1; + +int singlAutoflush(int new){ + int ret = singl_auto_flush; + singl_auto_flush = new; + return ret; +} + +int singlPrintf(const char *fmt, ...){ + va_list ap; + int ret; + + if(MPMY_Procnum() != 0 ) + return 0; + va_start(ap, fmt); + ret = vfprintf(stdout, fmt, ap); + va_end(ap); + if( singl_auto_flush ) + fflush(stdout); + return ret; +} + +void singlFflush(void) +{ + if( MPMY_Procnum() != 0 ) + return; + fflush(stdout); +} diff --git a/external/libsdf/libsw/stk.c b/external/libsdf/libsw/stk.c new file mode 100644 index 0000000..1800e6b --- /dev/null +++ b/external/libsdf/libsw/stk.c @@ -0,0 +1,67 @@ +#define STKdotC +#include "Assert.h" +#include "stk.h" +#include "error.h" +/* Everything else is inlined in stk.h */ + +/* Any non-inlined definitions can go here. 
*/ + +void StkInit(struct stk *s, size_t initial_sz, + void *(*realloc_like)(void *, size_t), unsigned int alignment){ + s->growby = initial_sz; + s->realloc_like = realloc_like; + s->ptr = s->bottom = realloc_like(NULL, initial_sz); + s->top = s->bottom + initial_sz; + if( alignment == 0 ) + alignment = _STK_DEFAULT_ALIGNMENT; + assert( (alignment & (alignment-1)) == 0 ); + s->align_mask = alignment - 1; +} + +void StkInitWithData(struct stk *s, size_t initial_sz, + void *(*realloc_like)(void *, size_t), void *data, + unsigned int alignment){ + s->growby = initial_sz; + s->realloc_like = realloc_like; + s->bottom = data; + s->top = s->ptr = s->bottom + initial_sz; + if( alignment == 0 ) + alignment = _STK_DEFAULT_ALIGNMENT; + assert( (alignment & (alignment-1)) == 0 ); + s->align_mask = alignment - 1; +} + +void StkCopy(struct stk *to, const struct stk *from){ + size_t initial_sz; + + to->growby = from->growby; + to->realloc_like = from->realloc_like; + initial_sz = StkSz(from); + to->bottom = to->realloc_like(NULL, initial_sz); + to->top = to->ptr = to->bottom + initial_sz; + memcpy(to->bottom, from->bottom, initial_sz); +} + +void StkTerminate(struct stk *s){ + (*s->realloc_like)(s->bottom, 0); /* free */ + s->ptr = s->bottom = s->top = NULL; +} + +void StkGrow(Stk *s, int nbytes){ + size_t newsz = (s->ptr - s->bottom) + nbytes + s->growby; + char *newbottom = (*s->realloc_like)(s->bottom, newsz); + if( newbottom == NULL ){ + Error("Can't realloc to sz=%ld in StkGrow\n", newsz); + } + s->ptr = (s->ptr - s->bottom) + newbottom; + s->top = newbottom + newsz; + s->bottom = newbottom; +} + +void *StkCrunch(struct stk *s){ + size_t newsz = s->ptr - s->bottom; + char *newbottom = (*s->realloc_like)(s->bottom, newsz); + s->ptr = s->top = newsz + newbottom; + s->bottom = newbottom; + return newbottom; +} diff --git a/external/libsdf/libsw/swampi.c b/external/libsdf/libsw/swampi.c new file mode 100644 index 0000000..6c6979a --- /dev/null +++ 
b/external/libsdf/libsw/swampi.c @@ -0,0 +1,1428 @@ +/* + * Copyright 1997 Michael Warren & John Salmon. All Rights Reserved. + */ +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#if defined(__SUN4__) || defined(__SUN5__) || defined(linux) +/* SunOS hides TCP_NODELAY and TCP_MAXSEG in netinet/tcp.h */ +#include +#endif +#ifdef __alpha +#include +#endif +#include "protos.h" +#include "swampi.h" +#include "mpmy.h" +#include "mpmy_io.h" +#include "mpmy_abnormal.h" +#include "error.h" +#include "Msgs.h" +#include "dll.h" +#include "hwclock.h" + +#ifndef INADDR_NONE +/* e.g., on SunOS */ +#define INADDR_NONE (-1) +#endif + +#ifndef MAX +#define MAX(a, b) ((a > b) ? a : b) +#define MIN(a, b) ((a < b) ? a : b) +#endif + +#define H_MAGIC (0x9f07) /* Magic number for headers */ + +#define HOST_NUM (-1) /* can't redefine this without mem adjust. */ + +/* message header */ +typedef struct { + int src : 20; + unsigned int comm : 6; + int len; + int tag; + int magic; +} msghdr_t; + +typedef struct { + int proc : 20; + unsigned int comm : 6; + unsigned int pending : 1; + unsigned int outgoing : 1; + unsigned int hdr_flag : 1; + unsigned int wild_src : 1; + unsigned int wild_tag : 1; + unsigned int buffered : 1; + int len; + int left; + int tag; + char *buf; /* This is where the message belongs */ + char *ptr; /* This might be a malloced temp buffer */ +} req_t; + +/* Must match MPI_Datatype enum */ +static unsigned int MPI_Datasize[] = +{ sizeof(float), sizeof(double), sizeof(long double), + sizeof(char), sizeof(char), sizeof(short), sizeof(int), + sizeof(long), sizeof(long long), + sizeof(unsigned), sizeof(unsigned int), sizeof(unsigned char), + sizeof(unsigned short), sizeof(unsigned long), sizeof(unsigned long long), + sizeof(MPI_float_int), sizeof(MPI_double_int), sizeof(MPI_long_int), + sizeof(MPI_2int), sizeof(MPI_short_int), sizeof(MPI_long_double_int), + 2*sizeof(float), 
2*sizeof(double), + 1/*user_data*/ +}; + +char *mpi_datatype_name[_MPI_NUMDATATYPES] = { + "float", "double", "long double", + "byte", "char", "short", "int", + "long", "long long", + "unsigned", "unsigned int", "unsigned char", + "unsigned short", "unsigned long", "unsigned long long", + "float int", "double int", "long int", + "2int", "short int", "long double int", + "complex", "double complex", + "user data" +}; + +/* Must match MPI_Op enum */ +char *mpi_op_name[_MPI_NUMOPS] = { + "sum", "prod", "max", "min", "band", "bor", + "bxor", "land", "lor", "lxor", "maxloc", "minloc" +}; + +typedef enum { + ExecTm, WaitTm, TestTm, SelectTm, SendTm, RecvTm, SendBytes, RecvBytes, + NSends, NRecvs, + _MPI_NUMSTATS +} MPI_Statistics; + +char *mpi_stats_name[_MPI_NUMSTATS] = { + "Exec Time", "Wait Time", "Test Time", "Select Time", "Send Time", "Recv Time", + "Send MBytes", "Recv MBytes", "Num Sends", "Num Recvs" +}; + +static double mpi_statistics[_MPI_NUMSTATS]; + +unsigned int *_MPI_Datasize = MPI_Datasize; +int _MPI_Procnum, _MPI_Nproc; + +static int MPI_Procnum, MPI_Nproc; +static int Max_fd; /* for select */ + +/* these limits are arbitrary, and mostly to catch programming errors */ +#define MAX_NPOST 1000 /* This many req_t ptrs for each proc */ +#define MAX_NPOSTANY 1000 /* This many req ptrs for MPI_ANY_SOURCE */ + +#define CheckTypeOK(type) \ +if (type < 0 || type >= _MPI_NUMDATATYPES) Error("Invalid type (%d)\n", type) + +/* req_chn stores all send and recv requests. 
It has a separate dll of send */ +/* and receive entries for each socket, and one wildcard receive entry */ +static Chn req_chn; +static Dll *recv_list, *send_list; /* we malloc Nproc of these */ +static Dll wild_dll, *wild_list; /* only need one of these */ + +/* For messages that arrive without the corresponding Irecv posted */ +static Dll bufr_dll, *bufr_list; /* only need one of these */ + +/* we malloc Nproc of these */ +static req_t **read_active, **write_active; + +static int my_pid; /* my process id */ +static struct sockaddr_in host_addr; +static struct sockaddr_in my_addr; +static int sock; /* file descriptor for listen socket */ +static int *s; /* array of file descriptor for all channels */ +static struct sockaddr_in *addr; /* sockaddrs for all listen sockets */ +static int sock_bind(struct sockaddr_in *acc); +static int sock_connect(struct sockaddr_in *acc); +static void sock_setopt(int fd); +static void req_done(Dll_elmt *req, MPI_Status *stat); +static int writemsg(int fd, const void *ptr, int nbytes); +static int readmsg(int fd, void *ptr, int nbytes); +static void writemsg_block(int fd, const void *ptr, int nbytes); +static void readmsg_block(int fd, void *ptr, int nbytes); +req_t *parse_hdr(msghdr_t *hdr, req_t *req, int proc); +static req_t *io_pending(Dll *d); +static void spin_io(Dll_elmt *req); +static int do_io(req_t *req, int proc); +static void do_io_local(req_t *ireq, int proc); +static void init_elt(void); +static void sock_getopt(int fd); +static void mpi_diagnostics(void); + +/* This controls the maximum amount read/written per system call */ +#define PKTSIZE (20*1460) +static void +init_elt(void) +{ + int suspend_proc; + int i, j; + int size = sizeof(struct sockaddr_in); + int hostport; + char *hostip; + unsigned long inaddr; + int addr_len = sizeof(struct sockaddr_in); + struct sockaddr_in tmp_addr; + int noblock = 1; + char msgfile[256]; + extern int singlAutoflush(int); + extern int _MPMY_procnum_, _MPMY_nproc_, _MPMY_initialized_; 
+ + sock = sock_bind(&my_addr); /* establish port to listen */ + my_pid = getpid(); + + if (!(getenv("MPI_PROCNUM") && getenv("MPI_NPROC") + && getenv("MPI_HOSTPORT") && getenv("MPI_HOST"))) + Error("startup variables not in environment\n"); + + MPI_Procnum = atoi(getenv("MPI_PROCNUM")); + MPI_Nproc = atoi(getenv("MPI_NPROC")); + hostip = getenv("MPI_HOST"); + hostport = atoi(getenv("MPI_HOSTPORT")); + + if (listen(sock, MPI_Nproc+1) < 0) Error("listen failed, errno=%d\n", errno); + + if( getenv("MPI_SUSPEND") && strlen(getenv("MPI_SUSPEND")) > 0 ){ + suspend_proc = atoi(getenv("MPI_SUSPEND")); + if (suspend_proc == -1) /* suspend all */ + suspend_proc = MPI_Procnum; + if (MPI_Procnum == suspend_proc){ + Shout("suspending pid=%d, MPI_Procnum=%d\n", + my_pid, MPI_Procnum); +#ifdef linux + sleep(10); /* hack. How can we do it right? */ +#else + kill(my_pid, SIGSTOP); +#endif + } + } + + if (getenv("MPI_MESSAGE_TURNON") + && strlen(getenv("MPI_MESSAGE_TURNON")) > 0 ) { + sprintf(msgfile, "msgs/msg.%d", MPI_Procnum); + MsgdirInit(msgfile); + Msg_turnon(getenv("MPI_MESSAGE_TURNON")); + } + /* This allows us to be polite about hung processes */ + if (getenv("MPI_TIMEOUT") && strlen(getenv("MPI_TIMEOUT")) > 0 ) + MPMY_TimeoutSet(atoi(getenv("MPI_TIMEOUT"))); + + /* This lets us use the MPMY stuff like Error */ + _MPI_Procnum = MPI_Procnum; + _MPI_Nproc = MPI_Nproc; + _MPMY_procnum_ = MPI_Procnum; + _MPMY_nproc_ = MPI_Nproc; + _MPMY_initialized_ = 1; + _MPMY_setup_absigs(); + MPMY_OnAbnormal(MPMY_SystemAbort); +#if 1 /* should this be under argc/argv control? */ + sprintf(MPMY_Abchdir_arg, "mpi/%03d", MPI_Procnum); + MPMY_OnAbnormal(MPMY_Abchdir); +#endif + MPMY_OnAbnormal(mpi_diagnostics); + MPMY_OnAbnormal(MPMY_Abannounce); + singlAutoflush(1); + + /* fill in host_addr here */ + memset(&host_addr, 0, sizeof(host_addr)); + host_addr.sin_family = AF_INET; + /* Now try to figure out the host's address from hostname */ + /* Don't bother with gethostbyname here.
Assume that the host + has worked out its preferred numeric IP address and passed that + to us through the environment var MPI_HOST */ + if ((inaddr = inet_addr(hostip)) != INADDR_NONE) /* it is numeric */ + host_addr.sin_addr.s_addr = inaddr; + else + Error("inet_addr(%s) failed, errno=%d\n", hostip, errno); + host_addr.sin_port = htons(hostport); + Msgf(("mpi: host is %s, address is %s, port is %d\n", + hostip, inet_ntoa(host_addr.sin_addr), ntohs(host_addr.sin_port))); + + addr = (struct sockaddr_in *) calloc(MPI_Nproc+1, size); + s = calloc(MPI_Nproc+1, sizeof(int)); + if (addr == NULL || s == NULL) Error("out of memory\n"); + addr++; /* offset by one so host is at -1 */ + s++; + + read_active = malloc(MPI_Nproc * sizeof(req_t *)); + write_active = malloc(MPI_Nproc * sizeof(req_t *)); + recv_list = malloc(MPI_Nproc * sizeof(Dll)); + send_list = malloc(MPI_Nproc * sizeof(Dll)); + if (read_active == NULL || write_active == NULL || + recv_list == NULL || send_list == NULL) Error("out of memory\n"); + /* I want to use plain realloc here, and DllCreatChn won't let me */ + /* Thus, I use ChnInit explicitly */ + /* DllCreateChn(&req_chn, sizeof(req_t), 100); */ + ChnInit(&req_chn, sizeof(Dll_elmt)+sizeof(req_t)-sizeof(int), 100, + realloc); + for (i = 0; i < MPI_Nproc; i++) { + DllCreate(recv_list+i, &req_chn); + DllCreate(send_list+i, &req_chn); + read_active[i] = 0; + write_active[i] = 0; + } + wild_list = &wild_dll; + DllCreate(wild_list, &req_chn); + bufr_list = &bufr_dll; + DllCreate(bufr_list, &req_chn); + + s[HOST_NUM] = sock_connect(&host_addr); /* establish connection to host */ + + writemsg_block(s[HOST_NUM], &MPI_Procnum, sizeof(int)); + /* send port on which we listen for connections from other procs */ + writemsg_block(s[HOST_NUM], &my_addr, size); + + /* read array of listening ports */ + readmsg_block(s[HOST_NUM], addr, MPI_Nproc * size); + Msgf(("mpi: Got port list\n")); + + for (i = 0; i < MPI_Nproc; i++) { + if (i == MPI_Procnum) { + for (j = 
MPI_Procnum; j < MPI_Nproc-1; j++) { + int proc, ts; + memset(&tmp_addr, 0, sizeof(struct sockaddr_in)); + ts = accept(sock, (struct sockaddr *)&tmp_addr, &addr_len); + if (ts < 0) Error("accept failed, errno=%d\n", errno); + sock_setopt(ts); + readmsg_block(ts, &proc, sizeof(int)); + s[proc] = ts; +#ifdef FIONBIO + /* Isn't this the same as the TCP_NODELAY that we set in + sock_setopt() ? Solaris doesn't have FIONBIO at all. + Does it matter? */ + if(ioctl(s[proc], FIONBIO, &noblock)) { + Error("ioctl, errno=%d", errno); + } +#endif + Msgf(("mpi: Accepted %d on socket %d\n", proc, ts)); + } + } else if (MPI_Procnum > i) { + s[i] = sock_connect(addr+i); + writemsg_block(s[i], &MPI_Procnum, sizeof(int)); +#ifdef FIONBIO + /* Isn't this the same as the TCP_NODELAY that we set in + sock_setopt() ? Solaris doesn't have FIONBIO at all. + Does it matter? */ + if(ioctl(s[i], FIONBIO, &noblock)) { + Error("ioctl, errno=%d", errno); + } +#endif + Msgf(("mpi: Connected to %d on socket %d\n", i, s[i])); + } + } + Max_fd = 0; + for (i = 0; i < MPI_Nproc; i++) { + if (s[i] > Max_fd) Max_fd = s[i]; + } + s[MPI_Procnum] = -1; /* we should never connect to ourself */ + if (MPI_Nproc > 1 && MPI_Procnum == 0) sock_getopt(s[1]); + MPI_Barrier(MPI_COMM_PRIVATE); + zero_hwclock(); /* zero timer here */ + Msgf(("mpi: hwclock zeroed\n")); +} + +/* _MPI_init_host1 is called BEFORE the host forks off child processes. + It has to figure out its own port and hostname, so it can pass it + to the children in the environment. */ +void +_MPI_init_host1(int *portp, char **namep, int nproc){ + char *p; + + sock = sock_bind(&host_addr); + if (listen(sock, nproc+1) < 0) Error("listen failed, errno=%d\n", errno); + my_pid = getpid(); + MPI_Procnum = HOST_NUM; + *portp = ntohs(host_addr.sin_port); + p = inet_ntoa(host_addr.sin_addr); + *namep = malloc(strlen(p)+1); + strcpy(*namep, p); +} + +/* _MPI_init_host is called after the host forks the child processes. 
+ Its job is to communicate all the port info with them so they + can talk to one another directly. */ +void +_MPI_init_host(int n) /* n is how many nodes we talk to */ +{ + int i; + int proc; + int addr_len = sizeof(struct sockaddr_in); + struct sockaddr_in tmp_addr; + int size = sizeof(struct sockaddr_in); + + addr = (struct sockaddr_in *) calloc(n+1, size); + s = calloc(n+1, sizeof(int)); + if (addr == NULL || s == NULL) Error("out of memory\n"); + addr++; /* offset by one so host is at -1 */ + s++; + + for (i = 0; i < n; i++) { + int ts; + memset(&tmp_addr, 0, sizeof(struct sockaddr_in)); + ts = accept(sock, (struct sockaddr *)&tmp_addr, &addr_len); + if (ts < 0) Error("accept failed, errno=%d\n", errno); + readmsg_block(ts, &proc, sizeof(int)); /* who did it come from */ + readmsg_block(ts, addr+proc, size); /* address that is listening */ + s[proc] = ts; + Msgf(("mpi: Received addr from %d\n", proc)); + } + s[HOST_NUM] = -1; + + for (i = 0; i < n; i++) { + writemsg_block(s[i], addr, n * size); + Msgf(("mpi: Sent addrs to %d\n", i)); + } +} + +static void +sock_setopt(int fd) +{ + int nodelay = 1; + int no_check = 0; /* not clear if this does anything */ + int sendbuf = 65535; + int recvbuf = 65535; + + + if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, (const void *)&nodelay, sizeof(int))) + Error("setsockopt tcp_nodelay, errno=%d", errno); + /* The idea is to turn off checksumming, since ethernet does it anyway */ +#ifdef SO_NO_CHECK + /* SunOS doesn't have SO_NO_CHECK */ + if (setsockopt(fd, SOL_SOCKET, SO_NO_CHECK, (const void *)&no_check, sizeof(int))) + Error("sockopt no_check, errno=%d", errno); +#endif + if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, (const void *)&sendbuf, sizeof(int))) + Error("sockopt sndbuf, errno=%d", errno); + if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, (const void *)&recvbuf, sizeof(int))) + Error("sockopt rcvbuf, errno=%d", errno); +} + +static void +sock_getopt(int fd) +{ + int val, len; + + len = sizeof(int); + if (getsockopt(fd, 
IPPROTO_TCP, TCP_NODELAY, (void *)&val, &len)) + Error("getsockopt, errno=%d", errno); + Msgf(("mpi: tcp_nodelay %d\n", val)); + len = sizeof(int); + if (getsockopt(fd, IPPROTO_TCP, TCP_MAXSEG, (void *)&val, &len)) + Error("getsockopt, errno=%d", errno); + Msgf(("mpi: tcp_maxseg %d\n", val)); + len = sizeof(int); +#ifdef SO_NO_CHECK + if (getsockopt(fd, SOL_SOCKET, SO_NO_CHECK, (void *)&val, &len)) + Error("getsockopt, errno=%d", errno); + Msgf(("mpi: so_no_check %d\n", val)); +#endif + len = sizeof(int); + if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, (void *)&val, &len)) + Error("getsockopt, errno=%d", errno); + Msgf(("mpi: so_sndbuf %d\n", val)); + len = sizeof(int); + if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, (void *)&val, &len)) + Error("getsockopt, errno=%d", errno); + Msgf(("mpi: so_rcvbuf %d\n", val)); + len = sizeof(int); +} + +static char * +printSockaddr(const struct sockaddr_in *sa){ + static char ans[512]; + sprintf(ans, "sockaddr_in: family: %d, sin_addr: %s, sin_port is %d\n", + sa->sin_family, inet_ntoa(sa->sin_addr), ntohs(sa->sin_port)); + return ans; +} + +/* Connect with listening socket described by acc, return a descriptor */ +static int +sock_connect(struct sockaddr_in *acc) +{ + int ret, fd; + + if ((fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP)) < 0) + Error("socket failed, errno=%d\n", errno); + sock_setopt(fd); +#if 0 + do { + /* we should use a select here */ + ret = connect(fd, (struct sockaddr *)acc, sizeof(struct sockaddr_in)); + if (ret && errno != ECONNREFUSED) + Error("connect failed, errno=%d\n", errno); + else Msgf(("mpi: connection refused\n")); + if (max_retries < 10) sleep(1); /* try not to beat on the other side */ + } while (ret && max_retries--); +#else + /* We want to make this non-blocking, so we can select on it. 
*/ + fcntl(fd, F_SETFL, O_NONBLOCK); + ret = connect(fd, (struct sockaddr *)acc, sizeof(struct sockaddr_in)); + if( ret ){ + if( errno == EINPROGRESS ){ + fd_set wtfds; + struct timeval timeout; + int intlen; + FD_ZERO(&wtfds); + FD_SET(fd, &wtfds); + timeout.tv_sec = 20; + timeout.tv_usec = 0; + ret = select(fd+1, NULL, &wtfds, NULL, &timeout); + if( ret < 0 ) + Error("initial select on socket fails, errno=%d\n", errno); + if( ret == 0 ) + Error("Socket never became ready, timing out\n"); + if( ret != 1 ) + Error("Select returns unexpected value: %d. Giving up\n", ret); + /* Ok, there's one fd ready, we can look at it to see whether + it had a genuine error, or it's hunky dory (c.f., man connect) */ + intlen = sizeof(ret); + getsockopt(fd, SOL_SOCKET, SO_ERROR, (void *)&ret, &intlen); + if( ret ){ + Error("getsockopt says SO_ERROR=%d trying to connect to %s, Dazed and confused\n", ret, printSockaddr(acc)); + } + }else{ + Error("connect failed with errno=%d\n", errno); + } + } +#endif + return fd; +} + +/* Bind a port to listen on, fill acc with info, return a descriptor */ +static int +sock_bind(struct sockaddr_in *acc) +{ + int fd, ret; + int len = sizeof(struct sockaddr_in); + char hostname[256]; + struct hostent *hp; + + fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP); + if (fd < 0 ) Error("socket failed, errno=%d\n", errno); + sock_setopt(fd); + memset(acc, 0, sizeof(struct sockaddr_in)); + acc->sin_family = AF_INET; + acc->sin_addr.s_addr = INADDR_ANY; + acc->sin_port = 0; /* INADDR_ANY? */ + ret = bind(fd, (struct sockaddr *) acc, sizeof(struct sockaddr_in)); + if (ret < 0 ) Error(" Can't bind socket. errno=%d\n", errno); + if (getsockname(fd, (struct sockaddr *)acc, &len)) + Error(" Can't getsockname. 
errno=%d\n", errno); + + /* Unfortunately, getsockname doesn't replace INADDR_ANY with + a valid saddr_in */ + /* We should be able to overrule gethostname with an env var + or a cmd-line arg */ + if (gethostname(hostname, sizeof(hostname))) + Error("gethostname failed, errno=%d\n", errno); + hostname[sizeof(hostname)-1] = '\0'; + if ((hp = gethostbyname(hostname)) == NULL) + Error("gethostbyname(%s) failed\n", hostname); + memcpy(&(acc->sin_addr), hp->h_addr, hp->h_length); + + /* Now acc holds 'correct' info about the socket */ + Msgf(("mpi: Bound port %d\n", ntohs(acc->sin_port))); + errno = 0; /* clear errors */ + return fd; +} + +static void +req_done(Dll_elmt *elmt, MPI_Status *status) +{ + req_t *req = DllData(elmt); + Dll *remove; + int len; + + if (req->left || req->pending) + Error("req_done called before all data was delivered\n"); + + if (status) { + status->MPI_SOURCE = req->proc; + status->MPI_TAG = req->tag; + status->count = req->len; + } + if (req->outgoing) { + Msgf(("mpi: %9f delivered (%d.%d) to %d len %d\n", hwclock(), + req->tag, req->comm, req->proc, req->len)); + remove = send_list+req->proc; + } else { + Msgf(("mpi: %9f %s (%d.%d) from %d len %d\n", hwclock(), + (req->buffered) ? 
"unbuffered" : "received", + req->tag, req->comm, req->proc, req->len)); + if (req->buffered) remove = bufr_list; + else if (req->wild_src) remove = wild_list; + else remove = recv_list+req->proc; + } + DllDelete(remove, elmt); + len = DllLength(remove); + if (len < 0 || len > MAX_NPOSTANY) Error("bad dll length (%d)\n", len); +} + +static req_t * +io_pending(Dll *d) +{ + Dll_elmt *p; + + for (p = DllBottom(d); p != DllSup(d); p = DllUp(p)) { + req_t *r = DllData(p); + if (r->pending) return r; + } + return NULL; +} + + +/* if req is set, block until req is clear */ +static void +spin_io(Dll_elmt *elmt) +{ + int i, j, ret; + fd_set rdset, wtset; + struct timeval timeout, *tv; + req_t *req = NULL; + req_t *r; + req_t *wild_pending; + double t1, t2; + + if (elmt) { + int npoll = 100; + req = DllData(elmt); + Msgf(("mpi: %9f spin_io %s (%d.%d) %d \n", hwclock(), (req->outgoing) + ? "sending" : "receiving", req->tag, req->comm, req->proc)); + + /* Fast path to avoid select overhead */ + + if (req->proc != MPI_ANY_SOURCE && req->proc != MPI_Procnum) { + i = req->proc; + t1 = hwclock(); + while (req->pending && npoll-- > 0) { + if (req->outgoing) { + if (write_active[i]) + do_io(write_active[i], i); + else if ((r = io_pending(send_list+i)) != NULL) + do_io(r, i); + } else { + if (read_active[i]) + do_io(read_active[i], i); + else if ((r = io_pending(recv_list+i)) != NULL) + do_io(r, i); + } + } +#if 0 + t2 = hwclock(); + if (req->pending) + Msgf(("mpi: %9f polled %6.3f msec without success\n", t1, + (t2-t1)*1e3)); + else + Msgf(("mpi: %9f polled %6.3f msec (%d times)\n", t1, + (t2-t1)*1e3, 30-npoll)); +#endif + } + if (req->pending == 0) return; + } + + do { + FD_ZERO(&rdset); + FD_ZERO(&wtset); + + wild_pending = io_pending(wild_list); + for (i = 0; i < MPI_Nproc; i++) { + if (i == MPI_Procnum) continue; + /* We need to check all read ports if there is a wildcard recv */ + if (read_active[i] || wild_pending != NULL + || io_pending(recv_list+i) != NULL) + FD_SET(s[i], 
&rdset); + if (write_active[i] || io_pending(send_list+i) != NULL) + FD_SET(s[i], &wtset); + } + timeout.tv_sec = 0; + timeout.tv_usec = 0; + if (req && MPI_Nproc > 1 && req->proc != MPI_Procnum) { + tv = NULL; /* Don't let select block if there is only 1 proc */ + } else { + tv = &timeout; + } + + again: + t1 = hwclock(); + ret = select(Max_fd+1, &rdset, &wtset, NULL, tv); + t2 = hwclock(); + if (ret == -1) { + if (errno == EINTR) goto again; /* SIGPROF interrupts select */ + else Error("select, errno=%d\n", errno); + } + mpi_statistics[SelectTm] += t2-t1; + Msgf(("mpi: %9f select %6.3f msec\n", t1, (t2-t1)*1e3)); + + /* We use read_active and write_active to make sure we finish */ + /* reading an entire message before starting on a new one */ + + /* This might work better if one did hypercube channels first */ + /* or, flipped reads and writes based on procnum parity, etc. */ + + for (j = 0; j < MPI_Nproc; j++) { + i = (j + MPI_Procnum) % MPI_Nproc; + if (i == MPI_Procnum) continue; + if (FD_ISSET(s[i], &rdset)) { + if (read_active[i]) + do_io(read_active[i], i); + else if ((r = io_pending(recv_list+i)) != NULL) + do_io(r, i); + else if ((r = io_pending(wild_list)) != NULL) + if (!r->hdr_flag || r->proc == i) + do_io(r, i); + } + } + + for (j = 0; j < MPI_Nproc; j++) { + i = (j + MPI_Procnum) % MPI_Nproc; + if (i == MPI_Procnum) continue; + if (FD_ISSET(s[i], &wtset)) { + if (write_active[i]) + do_io(write_active[i], i); + else if ((r = io_pending(send_list+i)) != NULL) + do_io(r, i); + } + } + + /* Take care of messages to ourself */ + if (read_active[MPI_Procnum]) + do_io_local(read_active[MPI_Procnum], MPI_Procnum); + else if (io_pending(send_list+MPI_Procnum) != NULL) { + if ((r = io_pending(recv_list+MPI_Procnum)) != NULL) + do_io_local(r, MPI_Procnum); + else if ((r = io_pending(wild_list))) + if (!r->hdr_flag || r->proc == MPI_Procnum) + do_io_local(r, MPI_Procnum); + } + } while (req && req->pending); +} + +/* Return number of bytes if we got any, to help 
out spin_io */ +static int +do_io(req_t *req, int proc) +{ + int n = 0; + int left; + double t1, t2; + msghdr_t hdr; + int fd = s[proc]; + + if (!req->hdr_flag) { + + /* In order to reduce latency, we package the header together */ + /* with some data. One would think writev would work, but it */ + /* does not. tcpdump indicates we get multiple packets from writev */ + /* Doing this gets latency from 350 usec down to 220 (with polling) */ + + if (req->outgoing) { + char pktbuf[1024+sizeof(msghdr_t)]; + msghdr_t *hdrp; + hdrp = (msghdr_t *)pktbuf; + hdrp->src = MPI_Procnum; + hdrp->comm = req->comm; + hdrp->tag = req->tag; + hdrp->len = req->len; + hdrp->magic = H_MAGIC; + n = MIN(req->left, 1024); + memcpy(pktbuf + sizeof(msghdr_t), req->ptr, n); + t1 = hwclock(); + if (writemsg(fd, pktbuf, n+sizeof(msghdr_t)) == -1) return 0; + t2 = hwclock(); + Msgf(("mpi: %9f %s %6.3f msec %2d %5d - %5.2f Mb/s\n", + t1, "wrote ", (t2-t1)*1000.0, proc, n, n/(1e6*(t2-t1)))); + write_active[proc] = req; + req->left -= n; + req->ptr += n; + mpi_statistics[SendBytes] += n; + mpi_statistics[SendTm] += t2-t1; + } else { + if (readmsg(fd, &hdr, sizeof(msghdr_t)) == -1) return 0; + /* We may decide we match a different request */ + req = parse_hdr(&hdr, req, proc); + read_active[proc] = req; + } + req->hdr_flag = 1; /* mark header done */ + } + + if (req->pending == 0) Error("req already completed\n"); + if (req->proc != proc) Error("req is inconsistent with proc\n"); + left = MIN(req->left, PKTSIZE); + /* Small messages get sent with the header */ + if (left) { + t1 = hwclock(); +#if 1 + n = (req->outgoing) ? write(fd, req->ptr, left) + : read(fd, req->ptr, left); +#else + n = (req->outgoing) ? send(fd, req->ptr, left, 0) + : recv(fd, req->ptr, left, 0); +#endif + t2 = hwclock(); + if (n < 0) { + if (errno == EAGAIN) n = 0; + else Error("%s failed, errno=%d\n", + (req->outgoing) ? 
"write" : "read", errno); + } + if (req->outgoing) { + mpi_statistics[SendBytes] += n; + mpi_statistics[SendTm] += t2-t1; + } else { + mpi_statistics[RecvBytes] += n; + mpi_statistics[RecvTm] += t2-t1; + } + Msgf(("mpi: %9f %s %6.3f msec %2d %5d - %5.2f Mb/s\n", + t1, (req->outgoing) ? "wrote " : "read ", + (t2-t1)*1000.0, proc, n, n/(1e6*(t2-t1)))); + req->left -= n; + req->ptr += n; + } + if (req->left == 0) { + req->pending = 0; + if (req->outgoing) write_active[proc] = NULL; + else read_active[proc] = NULL; + } + return n; +} + +static int +writemsg(int fd, const void *p, int n) +{ + int nwrote; + int left = n; + const char *ptr = p; + while (left > 0) { + if ((nwrote = send(fd, ptr, left, 0)) < 0) { + if (errno == EAGAIN) { + if (left == n) return -1; + else continue; + } + else Error("write failed, errno=%d\n", errno); + } + left -= nwrote; + ptr += nwrote; + } + return 0; +} + +static int +readmsg(int fd, void *p, int n) +{ + int nread; + int left = n; + char *ptr = p; + while (left > 0) { + if ((nread = recv(fd, ptr, left, 0)) < 0) { + if (errno == EAGAIN) { + if (left == n) return -1; + else continue; + } + else Error("read failed, errno=%d\n", errno); + } + left -= nread; + ptr += nread; + } + return 0; +} + +/* Blocking write. Don't return until all data is sent */ +static void +writemsg_block(int fd, const void *p, int n) +{ + int nwrote; + int left = n; + const char *ptr = p; + while (left > 0) { + if ((nwrote = write(fd, ptr, left)) < 0) { + if (errno == EAGAIN) continue; + else Error("write failed, errno=%d\n", errno); + } + left -= nwrote; + ptr += nwrote; + } +} + +/* Blocking read. 
Don't return until all data is read */ +static void +readmsg_block(int fd, void *p, int n) +{ + int nread; + int left = n; + char *ptr = p; + while (left > 0) { + if ((nread = read(fd, ptr, left)) < 0) { + if (errno == EAGAIN) continue; + else Error("read failed, errno=%d\n", errno); + } + left -= nread; + ptr += nread; + } +} + +static req_t * +match_tag(msghdr_t *hdr, Dll *wildp, Dll *recvp) +{ + Dll_elmt *p; + req_t *req; + + for (p = DllBottom(wildp); p != DllSup(wildp); p = DllUp(p)) { + req_t *r = DllData(p); + if (r->pending && hdr->tag == r->tag && hdr->comm == r->comm) + return r; + } + for (p = DllBottom(recvp); p != DllSup(recvp); p = DllUp(p)) { + req_t *r = DllData(p); + if (r->pending && hdr->tag == r->tag && hdr->comm == r->comm) + return r; + } + /* No matches, so make a new entry which points to a malloced buffer */ + /* When Irecv is called with a matching tag, it will find this */ + p = DllInsertAtTop(bufr_list); + if (DllLength(bufr_list) > MAX_NPOSTANY) + Error("Too many recvs buffered\n"); + if (p == NULL) Error("dll is NULL\n"); + req = DllData(p); + req->proc = hdr->src; + req->pending = 1; + req->wild_src = 0; + req->outgoing = req->hdr_flag = req->wild_tag = 0; + req->buffered = 1; + req->len = req->left = hdr->len; + req->comm = hdr->comm; + req->tag = hdr->tag; + req->buf = req->ptr = malloc(hdr->len); + if (req->ptr == NULL) Error("out of memory\n"); + Msgf(("mpi: %9f buffering (%d.%d) from %d len %d\n", + hwclock(), req->tag, req->comm, req->proc, req->len)); + return req; +} + +/* return a req_t which matches the header we just read, or else buffer */ +req_t * +parse_hdr(msghdr_t *hdr, req_t *req, int proc) +{ + if (hdr->magic != H_MAGIC) + Error("bad magic number %d\n", hdr->magic); + if (hdr->tag != req->tag || hdr->comm != req->comm) { + if (req->wild_tag && hdr->comm == req->comm) + req->tag = hdr->tag; + else + req = match_tag(hdr, wild_list, recv_list+proc); + } + if (hdr->src != req->proc) { + if (req->wild_src) req->proc = 
hdr->src; + else Error("bad src, got %d expected %d\n", hdr->src, req->proc); + } + if (hdr->len != req->len) { + if (hdr->len >= 0 && hdr->len < req->len) + req->len = req->left = hdr->len; + else Error("bad len, got %d expected %d\n", hdr->len, req->len); + } + Msgf(("mpi: %9f receiving (%d.%d)\n", hwclock(), req->tag, req->comm)); + return req; +} + +static void +do_io_local(req_t *ireq, int proc) +{ + msghdr_t hdr; + int left; + double t1, t2; + req_t *oreq = io_pending(send_list + proc); + + hdr.src = MPI_Procnum; + hdr.tag = oreq->tag; + hdr.comm = oreq->comm; + hdr.len = oreq->len; + hdr.magic = H_MAGIC; + ireq = parse_hdr(&hdr, ireq, proc); + ireq->hdr_flag = oreq->hdr_flag = 1; + + left = MIN(ireq->left, PKTSIZE); + t1 = hwclock(); + memcpy(ireq->ptr, oreq->ptr, left); + t2 = hwclock()-t1; + Msgf(("mpi: %9f %s %2d %5d - %5.2f Mb/s\n", hwclock(), "copy ", + MPI_Procnum, left, left/(1e6*t2))); + ireq->left -= left; + ireq->ptr += left; + oreq->ptr += left; + if (ireq->left == 0) { + ireq->pending = 0; + oreq->pending = 0; + oreq->left = 0; /* input may be smaller than output */ + } +} + +static void +dump_req(Dll *d) +{ + Dll_elmt *p; + + for (p = DllBottom(d); p != DllSup(d); p = DllUp(p)) { + req_t *r = DllData(p); + Msg_do("\tproc %3d tag %6d flags %d%d%d%d%d len %6d left %6d\n", + r->proc, r->tag, r->pending, r->hdr_flag, r->wild_src, + r->wild_tag, r->buffered, r->len, r->left); + } +} + +static void +mpi_diagnostics(void) +{ + int i, sum; + + sum = 0; + for (i = 0; i < MPI_Nproc; i++) + sum += DllLength(send_list+i); + if (sum) { + Msg_do("mpi: %d writes active:\n", sum); + for (i = 0; i < MPI_Nproc; i++) + dump_req(send_list+i); + } + sum = 0; + for (i = 0; i < MPI_Nproc; i++) + sum += DllLength(recv_list+i); + if (sum) { + Msg_do("mpi: %d reads active:\n", sum); + for (i = 0; i < MPI_Nproc; i++) + dump_req(recv_list+i); + } + if (DllLength(bufr_list)) { + Msg_do("mpi: %d buffered reads active:\n", DllLength(bufr_list)); + dump_req(bufr_list); + } + 
if (DllLength(wild_list)) { + Msg_do("mpi: %d ANY_SOURCE reads active:\n", DllLength(wild_list)); + dump_req(wild_list); + } +} + + +int +MPI_Init(int *argcp, char ***argvp) +{ + int i; + init_elt(); + for (i = 0; i < _MPI_NUMSTATS; i++) mpi_statistics[i] = 0.0; + mpi_statistics[ExecTm] = hwclock(); + return MPI_SUCCESS; +} + +int +MPI_Finalize(void) +{ + int i; + Msgf(("mpi: Finalize\n")); + MPI_Barrier(MPI_COMM_PRIVATE); + mpi_statistics[ExecTm] = hwclock() - mpi_statistics[ExecTm]; + mpi_statistics[SendBytes] /= 1e6; + mpi_statistics[RecvBytes] /= 1e6; + for (i = 0; i < _MPI_NUMSTATS; i++) { + Msg_do("%14s %12.2f\n", mpi_stats_name[i], mpi_statistics[i]); + } + MPMY_Abchdir(); /* puts gmon.out in different dirs */ + /* Should we free some of the arrays??? */ + return MPI_SUCCESS; +} + +int +MPI_Abort(MPI_Comm comm, int errorcode) +{ + Error("MPI_Abort, errorcode=%d\n", errorcode); +} + +double +MPI_Wtime(void) +{ + return hwclock(); +} + +double +MPI_Wtick(void) +{ + return hwtick(); +} + +int +MPI_Comm_rank(MPI_Comm comm, int *rank) +{ + *rank = MPI_Procnum; + return MPI_SUCCESS; +} + +int +MPI_Comm_size(MPI_Comm comm, int *size) +{ + *size = MPI_Nproc; + return MPI_SUCCESS; +} + +int +MPI_Get_count(MPI_Status *status, MPI_Datatype type, int *cnt) +{ + if (type < 0 || type >= _MPI_NUMDATATYPES) + Error("Datatype invalid in Get_count (%d)\n", type); + *cnt = status->count/MPI_Datasize[type]; + return MPI_SUCCESS; +} + +int +MPI_Isend(void *buf, int cnt, MPI_Datatype type, int dest, int tag, + MPI_Comm comm, MPI_Request *req) +{ + Dll_elmt *dll; + req_t *request; + + Msgf(("mpi: %9f Isend: (%d.%d) to %d len %d\n", + hwclock(), tag, comm, dest, cnt * MPI_Datasize[type])); + if ((long)buf % MPI_Datasize[type]) + Msg_do("mpi: Unaligned send of %s at %p\n", + mpi_datatype_name[type], buf); + mpi_statistics[NSends] += 1.0; + + CheckTypeOK(type); + if (dest < 0 || dest >= MPI_Nproc) + Error("dest invalid in Isend (%d)\n", dest); + + dll = DllInsertAtTop(send_list+dest);
+ if (dll == 0) Error("dll is NULL\n"); + if (DllLength(send_list+dest) > MAX_NPOST) + Error("Too many Isends pending\n"); + request = DllData(dll); + request->proc = dest; + request->pending = 1; + request->outgoing = 1; + request->hdr_flag = request->wild_src = request->wild_tag = 0; + request->buffered = 0; + request->len = request->left = cnt * MPI_Datasize[type]; + request->comm = comm; + request->tag = tag; + request->buf = request->ptr = (void *)buf; + *(MPI_Request *)req = dll; + return MPI_SUCCESS; +} + +int +MPI_Irecv(void *buf, int cnt, MPI_Datatype type, int src, int tag, + MPI_Comm comm, MPI_Request *req) +{ + Dll_elmt *elmt; + req_t *request; + + Msgf(("mpi: %9f Irecv: (%d.%d) from %d len %d\n", + hwclock(), tag, comm, src, cnt * MPI_Datasize[type])); + if ((long)buf % MPI_Datasize[type]) + Msg_do("mpi: Unaligned recv of %s at %p\n", + mpi_datatype_name[type], buf); + mpi_statistics[NRecvs] += 1.0; + + CheckTypeOK(type); + if (src != MPI_ANY_SOURCE && (src < 0 || src >= MPI_Nproc)) + Error("source invalid in Irecv (%d)\n", src); + + /* Check if we already have the message */ + for (elmt = DllBottom(bufr_list); elmt != DllSup(bufr_list); + elmt = DllUp(elmt)) { + request = DllData(elmt); + if ((tag == request->tag || tag == MPI_ANY_TAG) + && (src == request->proc || src == MPI_ANY_SOURCE) + && comm == request->comm) { + Msgf(("mpi: %9f matched (%d.%d) len %d\n", hwclock(), tag, comm, + request->len-request->left)); + /* We might not have buffered the entire message yet */ + memcpy(buf, request->buf, request->len-request->left); + free(request->buf); + request->buf = buf; + request->ptr = request->buf + (request->len-request->left); + if (request->pending && read_active[request->proc] != request) + Error("Irecv request was buffered+pending but not active\n"); + *(MPI_Request *)req = elmt; + return MPI_SUCCESS; + } + } + if (src == MPI_ANY_SOURCE) { + elmt = DllInsertAtTop(wild_list); + if (DllLength(wild_list) > MAX_NPOSTANY) + Error("Too many wildcard
Irecvs pending\n"); + } else { + elmt = DllInsertAtTop(recv_list+src); + if (DllLength(recv_list+src) > MAX_NPOST) + Error("Too many Irecvs pending\n"); + } + if (elmt == 0) Error("elmt is NULL\n"); + request = DllData(elmt); + if (request == 0) Error("dlldata is NULL\n"); + request->proc = src; + request->pending = 1; + request->wild_src = (src == MPI_ANY_SOURCE) ? 1 : 0; + request->outgoing = request->hdr_flag = request->wild_tag = 0; + request->buffered = 0; + request->len = request->left = cnt * MPI_Datasize[type]; + request->comm = comm; + request->tag = tag; + request->buf = request->ptr = buf; + *(MPI_Request *)req = elmt; + return MPI_SUCCESS; +} + +int +MPI_Test(MPI_Request *rptr, int *flag, MPI_Status *status) +{ + Dll_elmt *elmt = *rptr; + req_t *req; + double t1; + + if (elmt == NULL) Error("NULL message request\n"); + req = DllData(elmt); + +/* req->pending is the same as left being non-zero */ +/* except for zero length messages */ + + t1 = hwclock(); + if (req->pending) + spin_io(NULL); + t1 = hwclock() - t1; + + mpi_statistics[TestTm] += t1; + if (req->pending) { + *flag = 0; + } else { + req_done(elmt, status); + *flag = 1; + } + return MPI_SUCCESS; +} + +int +MPI_Wait(MPI_Request *rptr, MPI_Status *status) +{ + Dll_elmt *elmt = *rptr; + req_t *req; + double t1; + + if (elmt == NULL) Error("NULL message request\n"); + req = DllData(elmt); + + t1 = hwclock(); + while (req->pending) + spin_io(elmt); + t1 = hwclock() - t1; + + mpi_statistics[WaitTm] += t1; + req_done(elmt, status); + + return MPI_SUCCESS; +} + +int +MPI_Waitall(int count, MPI_Request *reqv, MPI_Status *statusv) +{ + int i; + for (i = 0; i < count; i++) + if (statusv == NULL) MPI_Wait(reqv+i, NULL); + else MPI_Wait(reqv+i, statusv+i); + return MPI_SUCCESS; +} + +int +MPI_Send(void *buf, int cnt, MPI_Datatype type, int dest, int tag, + MPI_Comm comm) +{ + MPI_Request req; + + MPI_Isend(buf, cnt, type, dest, tag, comm, &req); + MPI_Wait(&req, 0); + return MPI_SUCCESS; +} + +int 
+MPI_Recv(void *buf, int cnt, MPI_Datatype type, int src, int tag, + MPI_Comm comm, MPI_Status *status) +{ + MPI_Request req; + MPI_Irecv(buf, cnt, type, src, tag, comm, &req); + MPI_Wait(&req, status); + return MPI_SUCCESS; +} + + +int +MPI_Sendrecv(void *sendbuf, int sendcount, MPI_Datatype sendtype, + int dest, int sendtag, void *recvbuf, int recvcount, + MPI_Datatype recvtype, int source, int recvtag, + MPI_Comm comm, MPI_Status *status) +{ + MPI_Request rreq, sreq; + MPI_Status sstatus; + + Msgf(("mpi: Sendrecv\n")); + MPI_Irecv(recvbuf, recvcount, recvtype, source, recvtag, comm, &rreq); + MPI_Isend(sendbuf, sendcount, sendtype, dest, sendtag, comm, &sreq); + MPI_Wait(&sreq, &sstatus); + MPI_Wait(&rreq, status); + return MPI_SUCCESS; +} + +int +MPI_Barrier(MPI_Comm comm) +{ + int i, junk; + + Msgf(("mpi: Barrier\n")); + if (MPI_Nproc == 1) return MPI_SUCCESS; + if (MPI_Procnum != 0) { + MPI_Send(&junk, 1, MPI_INT, 0, 1, MPI_COMM_PRIVATE); + MPI_Recv(&junk, 1, MPI_INT, 0, 2, MPI_COMM_PRIVATE, NULL); + } else { + MPI_Request *req = malloc(MPI_Nproc * sizeof(MPI_Request)); + for (i = 1; i < MPI_Nproc; i++) + MPI_Irecv(&junk, 1, MPI_INT, i, 1, MPI_COMM_PRIVATE, req+i-1); + MPI_Waitall(MPI_Nproc-1, req, NULL); + for (i = 1; i < MPI_Nproc; i++) + MPI_Send(&junk, 1, MPI_INT, i, 2, MPI_COMM_PRIVATE); + free(req); + } + return MPI_SUCCESS; +} + +#define TAG 3 + +int +MPI_Alltoall(void *sendbuf, int sendcount, MPI_Datatype sendtype, + void *recvbuf, int recvcount, MPI_Datatype recvtype, + MPI_Comm comm) +{ + int i; + char *sbuf = sendbuf; + char *rbuf = recvbuf; + MPI_Request *rreq, *sreq; + MPI_Status *status; + + Msgf(("mpi: Alltoall\n")); + CheckTypeOK(sendtype); + CheckTypeOK(recvtype); + rreq = malloc(MPI_Nproc * sizeof(MPI_Request)); + sreq = malloc(MPI_Nproc * sizeof(MPI_Request)); + status = malloc(MPI_Nproc * sizeof(MPI_Status)); + if (rreq == NULL || sreq == NULL || status == NULL) + Error("out of memory\n"); + for (i = 0; i < MPI_Nproc; i++) { + 
MPI_Irecv(rbuf+i*recvcount*MPI_Datasize[recvtype], recvcount, + recvtype, i, TAG, MPI_COMM_PRIVATE, &rreq[i]); + } + for (i = 0; i < MPI_Nproc; i++) { + MPI_Isend(sbuf+i*sendcount*MPI_Datasize[sendtype], sendcount, + sendtype, i, TAG, MPI_COMM_PRIVATE, &sreq[i]); + } + for (i = 0; i < MPI_Nproc; i++) { + MPI_Wait(&rreq[i], &status[i]); + } + for (i = 0; i < MPI_Nproc; i++) { + MPI_Wait(&sreq[i], NULL); + } + free(status); + free(sreq); + free(rreq); + return MPI_SUCCESS; +} + +int +MPI_Alltoallv(void *sendbuf, int *sendcnts, int *sdispls, MPI_Datatype stype, + void *recvbuf, int *recvcnts, int *rdispls, MPI_Datatype rtype, + MPI_Comm comm) +{ + int i; + char *sbuf = sendbuf; + char *rbuf = recvbuf; + MPI_Request *rreq, *sreq; + MPI_Status *status; + + Msgf(("mpi: Alltoallv\n")); + CheckTypeOK(stype); + CheckTypeOK(rtype); + rreq = malloc(MPI_Nproc * sizeof(MPI_Request)); + sreq = malloc(MPI_Nproc * sizeof(MPI_Request)); + status = malloc(MPI_Nproc * sizeof(MPI_Status)); + if (rreq == NULL || sreq == NULL || status == NULL) + Error("out of memory\n"); + for (i = 0; i < MPI_Nproc; i++) { + MPI_Irecv(rbuf+rdispls[i]*MPI_Datasize[rtype], recvcnts[i], + rtype, i, TAG, MPI_COMM_PRIVATE, &rreq[i]); + } + for (i = 0; i < MPI_Nproc; i++) { + MPI_Isend(sbuf+sdispls[i]*MPI_Datasize[stype], sendcnts[i], + stype, i, TAG, MPI_COMM_PRIVATE, &sreq[i]); + } + for (i = 0; i < MPI_Nproc; i++) { + MPI_Wait(&rreq[i], &status[i]); + } + for (i = 0; i < MPI_Nproc; i++) { + MPI_Wait(&sreq[i], NULL); + } + free(status); + free(sreq); + free(rreq); + return MPI_SUCCESS; +} + +/* These Comm functions are not really implemented */ + +int +MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm) +{ + *newcomm = comm; + return MPI_SUCCESS; +} + +int +MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm) +{ + *newcomm = comm; + return MPI_SUCCESS; +} + +int +MPI_Comm_free(MPI_Comm *commp) +{ + return MPI_SUCCESS; +} + + + +/* These Type functions are not really implemented */ + +int 
+MPI_Type_contiguous(int len, MPI_Datatype type, MPI_Datatype *ptr) +{ + return MPI_SUCCESS; +} + +int +MPI_Type_commit(MPI_Datatype *ptr) +{ + return MPI_SUCCESS; +} + +/* g77 */ +#define _F77(sym) sym##__ +#include "swampif.c" +#undef _F77 + +/* pgf77 */ +#define _F77(sym) sym##_ +#include "swampif.c" +#undef _F77 + diff --git a/external/libsdf/libsw/swampif.c b/external/libsdf/libsw/swampif.c new file mode 100644 index 0000000..3c67db8 --- /dev/null +++ b/external/libsdf/libsw/swampif.c @@ -0,0 +1,140 @@ +void _F77(mpi_init)(int *ierr) +{ + *ierr = MPI_Init(0, 0); +} + +void _F77(mpi_finalize)(int *ierr) +{ + *ierr = MPI_Finalize(); +} + +void _F77(mpi_abort)(int *comm, int *err, int *ierr) +{ + *ierr = MPI_Abort(*comm, *err); +} + +void _F77(mpi_comm_rank)(int *comm, int *rank, int *ierr) +{ + *ierr = MPI_Comm_rank(*comm, rank); +} + +void _F77(mpi_comm_size)(int *comm, int *size, int *ierr) +{ + *ierr = MPI_Comm_size(*comm, size); +} + +void _F77(mpi_get_count)(MPI_Status *status, int *type, int *cnt, int *ierr) +{ + *ierr = MPI_Get_count(status, *type, cnt); +} + +void _F77(mpi_isend)(void *buf, int *cnt, int *type, int *dest, int *tag, + int *comm, MPI_Request req, int *ierr) +{ + *ierr = MPI_Isend(buf, *cnt, *type, *dest, *tag, *comm, req); +} + +void _F77(mpi_irecv)(void *buf, int *cnt, int *type, int *src, int *tag, + int *comm, MPI_Request req, int *ierr) +{ + *ierr = MPI_Irecv(buf, *cnt, *type, *src, *tag, *comm, req); +} + +void _F77(mpi_test)(MPI_Request req, int *flag, MPI_Status *stat, int *ierr) +{ + *ierr = MPI_Test(req, flag, stat); +} + +void _F77(mpi_wait)(MPI_Request req, MPI_Status *status, int *ierr) +{ + *ierr = MPI_Wait(req, status); +} + +void _F77(mpi_waitall)(int *count, MPI_Request *reqv, + MPI_Status *statusv, int *ierr) +{ + *ierr = MPI_Waitall(*count, reqv, statusv); +} + +void _F77(mpi_send)(void *buf, int *cnt, int *type, int *dest, int *tag, + int *comm, int *ierr) +{ + *ierr = MPI_Send(buf, *cnt, *type, *dest, *tag, *comm); +} + 
+void _F77(mpi_recv)(void *buf, int *cnt, int *type, int *src, int *tag, + int *comm, MPI_Status *status, int *ierr) +{ + *ierr = MPI_Recv(buf, *cnt, *type, *src, *tag, *comm, status); +} + +void _F77(mpi_sendrecv)(void *sendbuf, int *sendcnt, int *sendtype, + int *dest, int *sendtag, void *recvbuf, int *recvcnt, + int *recvtype, int *source, int *recvtag, + int *comm, MPI_Status *status, int *ierr) +{ + *ierr = MPI_Sendrecv(sendbuf, *sendcnt, *sendtype, *dest, *sendtag, + recvbuf, *recvcnt, *recvtype, *source, *recvtag, + *comm, status); +} + +void _F77(mpi_bcast)(void *buf, int *cnt, int *type, int *src, int *comm, + int *ierr) +{ + *ierr = MPI_Bcast(buf, *cnt, *type, *src, *comm); +} + +void _F77(mpi_reduce)(void *sendbuf, void *recvbuf, int *count, int *datatype, + int *op, int *root, int *comm, int *ierr) +{ + *ierr = MPI_Reduce(sendbuf, recvbuf, *count, *datatype, *op, *root, *comm); +} + +void _F77(mpi_allreduce)(void *sendbuf, void *recvbuf, int *count, + int *datatype, int *op, int *comm, int *ierr) +{ + *ierr = MPI_Allreduce(sendbuf, recvbuf, *count, *datatype, *op, *comm); +} + +void _F77(mpi_barrier)(int *comm, int *ierr) +{ + *ierr = MPI_Barrier(*comm); +} + +void _F77(mpi_alltoallv)(void *sbuf, int *sendcnts, int *sdispls, int *stype, + void *rbuf, int *recvcnts, int *rdispls, int *rtype, + int *comm, int *ierr) +{ + *ierr = MPI_Alltoallv(sbuf, sendcnts, sdispls, *stype, + rbuf, recvcnts, rdispls, *rtype, *comm); +} + +void _F77(mpi_alltoall)(void *sendbuf, int *sendcount, int *sendtype, + void *recvbuf, int *recvcount, int *recvtype, + int *comm, int *ierr) +{ + *ierr = MPI_Alltoall(sendbuf, *sendcount, *sendtype, + recvbuf, *recvcount, *recvtype, *comm); +} + +void _F77(mpi_comm_dup)(MPI_Comm *comm, MPI_Comm *newcomm, int *ierr) +{ + *ierr = MPI_Comm_dup(*comm, newcomm); +} + +void _F77(mpi_comm_split)(MPI_Comm *comm, int *color, int *key, + MPI_Comm *newcomm, int *ierr) +{ + *ierr = MPI_Comm_split(*comm, *color, *key, newcomm); +} + + +double 
_F77(mpi_wtime)(void) +{ + return MPI_Wtime(); +} + +double _F77(mpi_wtick)(void) +{ + return MPI_Wtick(); +} diff --git a/external/libsdf/libsw/timers.c b/external/libsdf/libsw/timers.c new file mode 100644 index 0000000..88322a7 --- /dev/null +++ b/external/libsdf/libsw/timers.c @@ -0,0 +1,159 @@ +#include +#include +#include +#include "Malloc.h" +#include "timers.h" +#include "mpmy.h" +#include "mpmy_time.h" +#include "Assert.h" + +/* Make sure any #defines in timers.h don't interfere... */ +#undef StartTimer +#undef StopTimer +#undef StartWCTimer +#undef StopWCTimer + +#define MAXENABLED 100 + +static Timer_t *enabled_timers[MAXENABLED]; +static int nenabled_timers; + +void ClearTimer(Timer_t *t) +{ + MPMY_ClearTimer(t->mpmy_tm); + return; +} + +double ReadTimer(Timer_t *t) +{ + return MPMY_ReadTimer(t->mpmy_tm); +} + +void StartTimer(Timer_t *t) +{ + if (t->enabled) + MPMY_StartTimer(t->mpmy_tm); + return; +} + +void CopyTimer(Timer_t *src, Timer_t *dest) +{ + MPMY_CopyTimer(src->mpmy_tm, dest->mpmy_tm); +} + +void StopTimer(Timer_t *t) +{ + if (t->enabled) + MPMY_StopTimer(t->mpmy_tm); + return; +} + +void ClearEnabledTimers(void){ + int i; + for(i=0; iname = Malloc(strlen(name)+1); + strcpy(t->name, name); + t->enabled = 1; + t->mpmy_tm = MPMY_CreateTimer(MPMY_WC_TIME); + ClearTimer(t); +} + +void EnableCPUTimer(Timer_t *t, char *name){ + assert(nenabled_timers < MAXENABLED); + enabled_timers[nenabled_timers++] = t; + t->name = Malloc(strlen(name)+1); + strcpy(t->name, name); + t->enabled = 1; + t->mpmy_tm = MPMY_CreateTimer(MPMY_CPU_TIME); + ClearTimer(t); +} + +void DisableTimer(Timer_t *t){ + int i; + + for(i=0; ienabled = 0; + Free(t->name); + t->name = NULL; + enabled_timers[i] = enabled_timers[--nenabled_timers]; + MPMY_DestroyTimer(t->mpmy_tm); +} + +void SumTimers(void){ + double nprocinv; + Timer_t *t; + int i; + MPMY_Comm_request req; + + MPMY_ICombine_Init(&req); + for(i=0; i<nenabled_timers; i++){ + t = enabled_timers[i]; + t->mean = t->min = t->max = MPMY_ReadTimer(t->mpmy_tm); +
MPMY_ICombine(&t->min, &t->min, 1, MPMY_DOUBLE, MPMY_MIN, req); + MPMY_ICombine(&t->max, &t->max, 1, MPMY_DOUBLE, MPMY_MAX, req); + MPMY_ICombine(&t->mean, &t->mean, 1, MPMY_DOUBLE, MPMY_SUM, req); + } + MPMY_ICombine_Wait(req); + + /* Now loop a second time and divide the mean by Nproc */ + nprocinv = 1./MPMY_Nproc(); + for(i=0; i<nenabled_timers; i++){ + t = enabled_timers[i]; + t->mean *= nprocinv; + } +} + +void OutputTimers(int (*Printf_Like)(const char *, ...)){ + int i; + Timer_t *t; + + SumTimers(); + Printf_Like("%12s %10s %10s %10s\n", "Timers", "Min", "Max", "Mean"); + for (i = 0; i < nenabled_timers; i++) { + t = enabled_timers[i]; + if( t->enabled && t->name ) + Printf_Like("%12s %10.2f %10.2f %10.2f\n", t->name, + t->min, t->max, t->mean); + } +} + +void OutputTimer(Timer_t *t, int (*Printf_Like)(const char *, ...)){ + MPMY_Comm_request req; + + if( t->enabled && t->name ) { + MPMY_ICombine_Init(&req); + t->mean = t->min = t->max = MPMY_ReadTimer(t->mpmy_tm); + MPMY_ICombine(&t->min, &t->min, 1, MPMY_DOUBLE, MPMY_MIN, req); + MPMY_ICombine(&t->max, &t->max, 1, MPMY_DOUBLE, MPMY_MAX, req); + MPMY_ICombine(&t->mean, &t->mean, 1, MPMY_DOUBLE, MPMY_SUM, req); + MPMY_ICombine_Wait(req); + t->mean /= MPMY_Nproc(); + Printf_Like("%12s %10.2f %10.2f %10.2f\n", t->name, + t->min, t->max, t->mean); + } +} + +void OutputIndividualTimers(int (*Printf_Like)(const char *, ...)){ + int i; + Timer_t *t; + + Printf_Like("%12s %10s\n", "Timers", "(sec)"); + for (i = 0; i < nenabled_timers; i++) { + t = enabled_timers[i]; + if( t->enabled && t->name ){ + Printf_Like("%12s %10.2f\n", t->name, MPMY_ReadTimer(t->mpmy_tm)); + } + } +} +
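Reviewer note on the timer summary in timers.c: SumTimers and OutputTimer seed min, max and mean with the local reading, combine them across processes with MPMY_MIN, MPMY_MAX and MPMY_SUM, and then scale the summed value by 1/Nproc to obtain the mean. The sketch below imitates that arithmetic in a single process so the result can be checked without MPMY. It is an illustration under stated assumptions, not part of libsw: `timer_summary_t` and `summarize_readings` are hypothetical names, and the `readings` array stands in for the per-process values of MPMY_ReadTimer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the min/max/mean fields of Timer_t. */
typedef struct {
    double min;
    double max;
    double mean;
} timer_summary_t;

/* readings[p] plays the role of MPMY_ReadTimer() on process p.
 * This serial loop only imitates what the MPMY_ICombine calls
 * compute collectively; it does not use MPMY at all. */
timer_summary_t summarize_readings(const double *readings, size_t nproc)
{
    timer_summary_t s;
    double nprocinv;
    size_t p;

    assert(readings != NULL && nproc > 0);

    /* Seed all three fields with one reading, as the original does. */
    s.min = s.max = s.mean = readings[0];
    for (p = 1; p < nproc; p++) {
        if (readings[p] < s.min) s.min = readings[p];  /* like MPMY_MIN */
        if (readings[p] > s.max) s.max = readings[p];  /* like MPMY_MAX */
        s.mean += readings[p];                         /* like MPMY_SUM */
    }

    /* Second step: turn the sum into a mean by scaling with 1/nproc. */
    nprocinv = 1.0 / (double)nproc;
    s.mean *= nprocinv;
    return s;
}
```

Calling `summarize_readings` on the four readings 1.0, 2.0, 3.0 and 6.0 gives a minimum of 1.0, a maximum of 6.0 and a mean of 3.0, which corresponds to the Min, Max and Mean columns that OutputTimers prints for each enabled timer.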