Tip 62: Get to know configure.ac
level: Autotools user
purpose: Write a bang-up setup script
Prerequisite: Tip 60 and Tip 61.
You are going to write a script named configure.ac, and it is going to produce two outputs: a makefile, and a header file named config.h.
At this point, I expect that you've already opened one of the sample configure.ac files produced so far, and noted that it looks nothing at all like a shell script. This is because it makes heavy use of a set of macros (in the m4 macro language) that are predefined by Autoconf. Rest assured that every one of them will blow up into familiar-looking (but probably illegible) lines of shell script. That is, configure.ac isn't a script to generate the configure shell script, it is configure, just compressed by some very impressive macros.
The m4 language doesn't have all that much syntax that we as users have to care about. Every macro is function-like, with parens after the macro name listing the comma-separated arguments (if any; else the parens are dropped). Where most languages write 'literal text', m4-via-autoconf writes [literal text], and to prevent surprises where m4 macro-expands your input, all of your macro inputs should be wrapped in those square brackets.
The first line that Autoscan generated is a good example:
AC_INIT([FULL-PACKAGE-NAME], [VERSION], [BUG-REPORT-ADDRESS])
We know that this is going to generate a few hundred lines of shell code, and somewhere in there, the given elements will be set. Change the values in square brackets to whatever is relevant.
Also, you can often omit elements, so something like
AC_INIT([hello], [1.0])
is valid. At the extreme, one might give zero arguments to a macro like AC_OUTPUT, in which case you don't need to bother with the parentheses. By the way, the current custom in m4 documentation is to mark optional arguments with--I am not making this up--square brackets. So bear in mind that in m4 macros for Autoconf, square brackets mean literal not-for-expansion text, and in m4 macro documentation they mean an optional argument.
What do we need for a functional Autoconf file? In order of appearance:
- AC_INIT(...), as above.
- AM_INIT_AUTOMAKE, to have Automake generate the makefile.
- LT_INIT sets up Libtool, so you need this if and only if you are installing a shared library.
- AC_CONFIG_FILES([Makefile subdir/Makefile]), which tells Autoconf to go through those files listed and replace variables like @CC@ with their appropriate value. If you have several makefiles (like subdirectories), then list them here.
- AC_OUTPUT to ship out.
So we have a functional build package for any POSIX system in four or five lines, three of which Autoscan probably wrote for you.
But the real art that takes configure.ac from functional to intelligent is in identifying what needs to be checked or set up--predicting problems some users might have--and finding the macro that handles it.
Getting back to the outputs, config.h is a standard C header consisting of a series of #define statements. For example, if Autoconf verified the presence of the GSL, you would find
#define HAVE_LIBGSL 1
in config.h. You can then put #ifdefs into your code to behave appropriately under appropriate circumstances.
Autoconf's check doesn't just find the library based on some naming scheme and hope that it actually works--it compiles a program using that library and any one function somewhere in the library. So autoscan can't autogenerate a check for the library, because it doesn't know what functions are to be found in it. The macro to check for a library is an easy one-liner, to which you provide the library name and a function that can be used for the check. E.g.:
AC_CHECK_LIB([glib-2.0],[g_free])
AC_CHECK_LIB([gsl],[gsl_blas_dgemm])
Add one line to configure.ac for every library you use that is not 100% guaranteed by the POSIX standard, and those one-liners will blossom into the appropriate shell script snippets in configure.
Further, you may recall my complaint about how package managers always split libraries into the binary shared object package and the header package. Users of your library may not remember (or even know) to install the header package, so check for it with, e.g.:
AC_CHECK_HEADER([gsl/gsl_matrix.h], , [AC_MSG_ERROR(
[Couldn't find the GSL header files (I searched for <gsl/gsl_matrix.h> \
on the include path). If you are using a package manager, don't forget \
to install the libgsl-devel package, as well as libgsl itself.])])
Notice the two commas: the arguments are (header to check, action if found, action if not found), and we are leaving the second blank.
What else could go wrong in a compilation? After the basic makefile from last time, what else needs to be prepped? It's hard to become an authority on all the glitches of all the world's computers, given that we all have only a couple of machines at our disposal. The best reference I have seen--a veritable litany of close readings of the POSIX standard, implementation failures, and practical advice--is the Autoconf manual itself. Some of it will just be nitpicking to the rest of us,[1] some of it is good advice for your code-writing, and some of the descriptions of system quirks are followed by the name of an Autoconf macro to include in your project's configure.ac should it be relevant to your situation.
Then, check the GNU Autoconf macro archive for additional macros that don't ship with Autoconf itself but which you can save to an m4 subdirectory in your project directory, where Autoconf will be able to find and use them.
More bits of shell
Because configure.ac is a compressed version of the configure script the user will run, you can throw in any arbitrary shell code you'd like. Before you do, double-check that what you want to do isn't already handled by an existing macro--is your situation really so unique that it never happened to any Autotools users before?
A banner notifying the user that they've made it through the configure process might be nice, and there's no macro for it because all you need is echo.
You'll notice that I use several variables defined by Autoconf. There's documentation about what shell variables the system defines for you to use, but I find that the easiest way to find out what a macro did is to just run autoreconf to fully expand configure and look at the post-expansion details.
Anyway, here's a sample banner:
echo \
"-----------------------------------------------------------
${PACKAGE_NAME} version ${PACKAGE_VERSION}
Installation directory prefix: '${prefix}'.
Compilation command: '${CC} ${CFLAGS} ${CPPFLAGS}'
Now type 'make; sudo make install' to generate the program
and install it to your system.
------------------------------------------------------------"
Footnotes
- [1] E.g., “Solaris 10 dtksh and the UnixWare 7.1.1 Posix shell [...] mishandle braced variable expansion that crosses a 1024- or 4096-byte buffer boundary within a here-document.”
Tip 61: Get to know Makefile.am
level: Autotools user
purpose: learn the conventions and let Automake do the rest
Last time, you met Autotools, and saw how friendly it can be for a simple project. Except your project is more than one .c file, which is why you need Autotools to begin with. The hard part is writing Makefile.am, which gives a somewhat encrypted summary of your project. Once you learn the language, though, it's not so bad.
First, if you include a target and its associated actions in Makefile.am, then Automake will copy it into the final makefile verbatim.
If you add a variable, that too gets added verbatim. This will especially be useful in conjunction with Autoconf, because if Makefile.am has variable assignments like
TEMP=@autotemp@
HUMIDITY=@autohum@
and your configure.ac has
#configure is a plain shell script; these are plain shell vars
autotemp=40
autohum=.8
AC_SUBST(autotemp)
AC_SUBST(autohum)
then the final makefile will have text reading
TEMP=40
HUMIDITY=.8
So you have an easy conduit from the shell script that Autoconf spits out to the final makefile.
The rest of Makefile.am will largely consist of two types of entry, neither of which looks anything like the final makefile. They are product list variables and product source/option variables, but in an effort to avoid like-sounding jargon I will refer to them as form variables and content variables, respectively.
Form variables
The example of this from last time was this line:
bin_PROGRAMS=hello
If the install location is il and the type of compilation
TYPE, these all have the form il_TYPE. The most important examples:
bin_PROGRAMS        #programs
include_HEADERS     #headers to install in system-wide includedir.
pkginclude_HEADERS  #same, but install in includedir/yourprogram subdir.
lib_LTLIBRARIES     #dynamic libraries, via libtool
EXTRA_DIST          #distribute with pkg, but don't install
There are many others; python_PYTHON, for example.
The location/TYPE combo provides a bit of false generality, because it makes no sense to
install programs in the include directory, for example, even if the system would let you
do it. However, the location half can usefully be noinst, meaning that something
gets produced but not installed, and you can put pkg in front of several locations
to produce pkgbin, pkglib, et cetera.
nodist_EXTRAS: files that have to be in the package for the thing to compile, but
which won't be installed in the system. I could never work out whether this is the right
place for 'em, but this is where I put Apophenia's test data, needed for the tests
but not worth installing.
The TYPE half tells the system what form of make target to generate. It has built-in rules for generating a program from source and built-in rules for generating a library via Libtool, and you are telling it which template to use.
Put as many items on each line as you'd like, e.g.:
pkginclude_HEADERS = firstpart.h secondpart.h
EXTRA_DIST = sample1.csv sample2.csv \
sample99.csv sample100.csv
Content variables
Items under EXTRA_DIST just get copied over, and
the process for dealing with header files is basically just to copy them to the right
place. So those are basically settled.
For the compilation steps like ..._PROGRAMS and ..._LTLIBRARIES,
Automake needs to know more details about
how the compilation works. At the very least, it needs to know what source files are being
compiled. Thus, for every item on the right-hand side of an equals sign of a form variable
about compilation, we need a variable specifying the sources:
bin_PROGRAMS= weather wxpredict
weather_SOURCES= temp.c barometer.c
wxpredict_SOURCES=rng.c tarotdeck.c
This may be all you need for a basic package.
Notice that the content variables have the same lower_UPPER look as the form
variables above, but they are formed from entirely different parts and serve entirely
different purposes.
Back to traditional makefiles for a second: if you don't specify a rule for compiling (but not linking) from source to object, make will apply a POSIX-standard implicit rule that it has memorized:
$(CC) $(CPPFLAGS) $(CFLAGS) -c
To link together object files, the implicit rule is:
$(CC) $(LDFLAGS) obj1.o obj2.o $(LOADLIBES) $(LDLIBS)
Let's just look the other way from the variable LOADLIBES [sic], and Automake prefers LDADD for the second half of the link line anyway (i.e., always use LDLIBS with make; always use LDADD with Automake).
That little caveat noted, you can set all of these variables on a per-program or
per-library basis, like weather_CFLAGS=-O1. Or, use AM_ to set a variable
for all compilations or linkings. I consider this line to be essential, giving debugger
symbols and all warnings for every compilation/link:
AM_CFLAGS=-g -Wall -O3
If you've been following me for very long, then you know that I always use -std=gnu99 to get GCC to use a less obsolete standard. However, this is a very compiler-specific flag. If I put
AC_PROG_CC_C99
in configure.ac, then Autoconf will set the CC variable to gcc -std=gnu99 for me. Autoscan isn't (yet) smart enough to put this into the configure.scan that it generates for you, so you will probably have to put it into configure.ac yourself.
Specific rules override AM_-based rules, so here's how we'd keep the general rules and add on an override for one flag:
AM_CFLAGS=-g -Wall -O3
hello_CFLAGS = $(AM_CFLAGS) -O0
To give a fuller example, say that several programs all depend on common source
files. Then perhaps you could generate a no-install static library (without Libtool) and
link everything to that library. Automake expects static library names to have the form
libsomething.a, so the library here is libhello.a. Notice how libhello.a turns into
libhello_a for the purposes of the content variable naming scheme, as all of the
characters that aren't alphanumeric get converted to underscores.
noinst_LIBRARIES = libhello.a
libhello_a_SOURCES = guts1.c guts2.c
AM_CFLAGS=-g -Wall -O3
bin_PROGRAMS = hello hi
hello_SOURCES= hello.c
hello_LDADD=libhello.a
hi_SOURCES= hi.c
hi_LDADD=libhello.a
OK, those are all the parts of Makefile.am: make variables as usual and
make target/rules as usual are copied verbatim (after Autoconf does variable
substitutions); form variables specify which files are to be handled how and where
to put them; and content variables specify the details of how compilation happens for
each output file.
Tip 60: Package your code with Autotools
level: your code is good enough to share
purpose: love your users
The Autotools are what make it possible for you to download a library or program, and run
./configure
make
sudo make install
(and nothing else) to set it up. Please recognize what a miracle of modern science this is: the developer has no idea what the name of your compiler is, what sort of computer you have, where you keep your programs and libraries (/usr/bin? /sw? /cygdrive/c/bin?), and whatever quirks your machine demonstrates, and yet configure sorted everything out so that make could run seamlessly. And so, Autotools is the center of how anything gets distributed in the modern day. If you want anybody who is not your personal friend to use your code (or if you want a Linux distro to include your program in their package manager), then you need to have Autotools generate the build for you.
You will quickly realize how complicated the Autotools can get, but the basics are darn simple. By the end of this, we will have written six lines of packaging text and run four commands, and will have a complete (but rudimentary) package ready for distribution.
Here's how I imagine it all happening. The actual history is different from the sequence here. These are distinct packages, and there is a reason to run any of them without the other. But all that is irrelevant, and the purpose of this little dialogue is to help you think of the several tools as a unified whole working toward a unified goal.
P1: I love make. It's so nice that I can write down all the little steps to building my project in one place.
P2: Yes, automation is great. Everything should be automated, all the time.
P1: Yeah, I started adding lots of steps to my makefile, so users can type make to just produce the program, make install to install, make check to run tests, and so on. It's a lot of work to write all those makefile targets, but so smooth when it's all assembled.
P2: OK, I shall write a system--it will be called Automake--that will automatically generate makefiles from a very short pre-Makefile.
P1: That's great. Producing shared libraries is especially annoying, because every system has a different procedure.
P2: It is annoying. Given the system information, I shall write a program for generating the scripts needed to produce shared libraries from source code, and then put those into Automade makefiles.
P1: Wow, so all I have to do is tell you my operating system, and whether my compiler is named cc or clang or gcc or whatever, and you'll drop in the right code for the system I'm on?
P2: That's too much work. I will write a system called Autoconf that will be aware of every system out there and that will produce a report of everything Automake and your program needs to know about the system. Then Automake can use the list of environment variables in my report to produce a makefile.
P1: I am flabbergasted--you've automated the process of autogenerating Makefiles. But it sounds like we've just changed the work I have to do from inspecting the various platforms to writing configuration files for Autoconf and makefile templates for Automake.
P2: You're right. I shall write a tool, Autoscan, that will scan the Makefile.am you wrote for Automake, and autogenerate Autoconf's configure.ac for you.
P1: Now all you have to do is autogenerate Makefile.am.
P2: Yeah, whatever. RTFM and do it yourself.[1]
Each step in the story adds a little more automation to the step that came before it: Automake uses a simple script to generate makefiles (which already go pretty far in automating compilation over manual command-typing); Autoconf tests the environment and uses that information to run Automake; Autoscan checks your code for what you need to write to make Autoconf run. Libtool works in the background to assist Automake.
If you are doing something reasonably common (and compiling straight-up C code is the #1 most common task for the Autotools), then the system will do the right thing without your needing to go off the beaten path. The trouble shows up when you need to modify the defaults, at which point you're going to need to know what macro-generating macros to modify where.
Here's a demo, in which we get Autotools to take care of Hello, World. As per the count below, the script writes nine lines of text, and it produces a full package ready for distribution to the world.
This is a shell script you can copy/paste onto your command line (as long as you make sure there are no spaces after the backslashes). Of course, it won't run until you ask your package manager to install Autotools, Autoconf, Automake, and Libtool.
- The first few lines create a directory and write hello.c to it.
- Then we need to hand-write Makefile.am, which is two lines long, and four files that are required by the GNU coding standards (so GNU Autotools won't proceed without them).
- Getting Autotools up and running after this takes us three steps:
  - Run autoscan, which produces configure.scan.
  - Edit the file to give the specs of your project (name, version, contact email), and add the line AM_INIT_AUTOMAKE to initialize Automake. Yes, this is annoying, especially given that Autoscan used Automake's Makefile.am to gather info, so it is well aware that we want to use Automake. You could do this by hand; I used sed to directly stream the corrected version to configure.ac.
  - Run autoreconf to use configure.ac to generate the files to ship out.
#Make a directory; write a hello world program to it.
mkdir -p autodemo
cd autodemo
cat > hello.c <<\
"--------------"
#include <stdio.h>
int main(){ printf("Hi.\n"); }
--------------
#Autoscan needs a Makefile.am.
cat > Makefile.am <<\
"--------------"
bin_PROGRAMS=hello
hello_SOURCES=hello.c
--------------
#GNU coding standards require these; a human has to write them:
echo 'No news' > NEWS
echo 'Just run it.' > README
echo 'Kernighan & Ritchie' > AUTHORS
echo 'None yet' > ChangeLog
#Autoscan creates configure.scan, then we edit a few
#things to make configure.ac.
#Notably, add AM_INIT_AUTOMAKE
autoscan
sed -e 's/FULL-PACKAGE-NAME/hello/' \
-e 's/VERSION/1/' \
-e 's|BUG-REPORT-ADDRESS|/dev/null|' \
-e '10iAM_INIT_AUTOMAKE' \
<configure.scan > configure.ac
#Given configure.ac, run autoreconf to produce everything.
autoreconf -i -vv
So how much do all these macros do? The hello.c program itself is a leisurely three lines, Makefile.am is two lines, and we wrote four one-line files, for nine lines of user-written text. Your results may differ a little, but when I run wc -l * in the post-script directory, I find 8,920 lines of text, including a 4,700-line configure script. It's so bloated because it's so portable: this script doesn't depend on Autotools, and can be run on any system with basic POSIX-compliance.
Run ./configure at the command prompt, and now you have the 600-line Makefile. Thanks to zsh's autocomplete, I can tell you there are 216 targets in this makefile. The default target, when you just type make on the command line, produces the executable, and sudo make install would install this program if you are so interested; run sudo make uninstall to clear it out.
As the author of the package, you will also be interested in make dist, which generates a tar file with everything a user would need to unpack and run the usual ./configure; make; sudo make install (without the aid of the Autotools system that you have on your development box). Also, make check runs the package's tests (there are none yet), and make distcheck goes further, building the tar file and then verifying that it unpacks, configures, and builds cleanly. Don't forget that you can list multiple targets to make, like make dist check to produce the output package and then run the checks.
So, having run autoscan and autoreconf, two more commands and we've got a distributable package in one tar file:
./configure
make dist check
In the next few episodes, some notes on reading and producing the Makefile.am and
configure.ac files for when you need more than the defaults.
Footnotes
- [1] RTFM is an acronym meaning Read The Manual.
Tip 59: Use a package manager
level: Computer user
purpose: let a librarian organize your libraries
Oh man, if you are not using a package manager, you are missing out.
If you are already using one, then here's a simple tip: open a listing of available packages via whatever means you are used to, and skip down to the packages beginning with lib. There, you'll find a few hundred C libraries available for your use. Some will be attached to a specific program, but these are the portions of the program that the authors felt might be useful outside of the program itself, so some of those may be useful to you.
Different package managers have different customs, but they tend to split a single library into two or three packages, typically a base libsomething package for installation with executables that rely on that library; a libsomething-dev or libsomething-devel package for developers (that's you), including the headers; and a libsomething-doc package, in case you decide you want a local copy of the online documentation. Be sure to install at least the first two.
For those of you who aren't using a package manager, the rest of this section is a brief introduction, in two parts. The first is about language-specific package managers, explaining why we C users are going to be sticking with the system-wide package managers. The second part discusses the options for system-wide package managers.
CPAN, et al
You may be familiar with the repository systems for Perl, Python, Ruby, and company. From within the interpreter, you can call a command that will pull a package from a central repository and install it into a local directory. They're pretty spiffy, and when they work they save a whole lot of trouble.
What would a package manager for C look like? For the most part, what other languages call a package, we call a library. If you give me the header files, a compiled object file, and some documentation, I can use the library for my own code. Or you can give me the source and I can compile the object files for myself.
Here in reality, there are complications. The library may depend on other libraries, so we will need to have a list of dependencies, and will then need to work out from where to get them. We may need the installation location. The compilation may depend on nonstandard tools. These are exactly the considerations that a package manager keeps track of.
Language-specific managers, like the CPAN (Comprehensive Perl Archive Network) or the Ruby Gem system, solve many of the dependency problems by imposing some restrictions. The first is of course that the package must be written in the language of the associated system. If your R package depends on a Ruby gem, well, good luck with that. Of course, every scripting language package manager includes a mechanism for linking to C code, although some are more harrowing than others. None that I have seen provides an automated means of dealing with a C library that calls other C libraries, and as a result, we often find script-supporting C code that reimplements all the basics, just in case the build system at the central repository may not have a copy of Glib installed.
There are people whose livelihoods depend on advancing a given language, and those are the people who (hire people to) put in the labor to make the repositories work. For example, the R implementation of the S programming language is maintained by The R Foundation, a Viennese non-profit funded by pharmaceutical companies, academic statistics departments, et al. The R Foundation distributes the one and only R interpreter, which is of course built to talk to the CRAN.
There are no insurmountable technical reasons for why a C package manager couldn't happen, but there are social reasons preventing this. C is far bigger than any one foundation or compiler, and it is unlikely that we will ever see one effort to be a central repository for C code really take off (though there have been many efforts, including a CCAN). Further, most of what we need is already provided by the system-wide package managers.
The system-wide package manager
These provide all of the tools one would need to get a full POSIX subsystem up and running. Of course, you can't have a POSIX subsystem without C tools and libraries; this is trivially true, because the standard requires certain C libraries. So any POSIX-oriented package manager will have a system in place for handling libraries.
As above, there isn't all that much distance between a C library and a package as it is typically understood. To bridge the gap, we need an address where our package manager can find the files, a means of expressing what other packages have to be installed before this one, and perhaps some pre- and post-install scripts that establish the important environment variables and go through all that stuff about ./configure; make; sudo make install for the user.
There isn't all that much serious divergence from those basics. Here's my sampling from Wikipedia's list. They are variously maintained by a mix of operating system vendors and volunteers.
- Windows
- The front-runner here is Cygwin, which
is based on a Windows-native base library (DLL) that provides all the POSIX-type stuff
missing from a basic Windows machine. In theory, a program compiled under Cygwin could run
on boxes that don't have Cygwin installed if you were to copy over the Cygwin DLL with
your compiled program; I've never tried it.
On top of this very basic concept, Cygwin has its own package management system, which is pretty together.
- Mac OS X
- There are many. Deb-based Fink
seems to be the front-runner.
- Debian/RPM
- RPM stands for Red Hat Package Manager, and Debian stands for Debra
and Ian. Both lean toward precompiled binary packages, meaning that there's a different
package and different subrepository for every type of OS. Between the two of them, it's
hard to say much about the differences. I've found it a little easier to write packages for RPM.
- Source-based
- Autotools already gets us most of the way toward standardized installation (I promise a discussion of Autotools soon), because its configure script works out all the platform-specific stuff. From there, there are several systems that will do the additional stuff about resolving dependencies and such.
In all cases, there's little technical difference between the different versions in each category. Generally, the oldest and biggest network will have the most packages available.
I sometimes lament the great diversity here, because each system is slightly incompatible with every other: should my package depend on libgsl-dev, libgsl-0-dev, or libgsl-devel? Also, I'm somewhat certain that five years from now the above list of recommendations will be very different.
But in the meantime, the recommendation is simple: if you are a Mac or Windows user, get
a package manager immediately. It's already an immense streamlining of the
software-obtention process, but if you're a C author, it's how you're going to get your
libraries.
Tip 58: Destroy your inputs
level: function writer
purpose: use the declarations you already have
This is not a new tip. In fact, it's on page 24 of the 1ed. of K&R:
Call by value is an asset, however, not a liability. It usually leads to more compact programs with fewer extraneous variables, because arguments can be treated as conveniently initialized local variables in the called routine.
I really hope this is an ex post justification for the system, not their real motivation, because the technique of screwing around with the input variables knowing they can't affect the main program has somewhat limited utility.
Integers can often be used as countdown variables. To rewrite K&R's example:
#include <stdio.h>

double power(double x, int n){ //assume n>=0
    double out =1;
    for (; n>0; n--) out *= x; //n is our local copy, so count it down freely
    return out;
}

int main(){
    printf("2^5: %g\n", power(2, 5));
    printf("2.5^5: %g\n", power(2.5, 5));
    printf("3^5: %g\n", power(3, 5));
}
If you have a NULL-terminated list, you already have what you need to step through
it. We guaranteed such a list in Tip 26, so
let's rewrite the code there without bothering with an index in the sum_base function:
#include <math.h> //NAN
#include <stdio.h>

#define sum(...) sum_base((double[]){__VA_ARGS__, NAN})

double sum_base(double in[]){
    double out=0;
    for ( ; !isnan(*in); in++) out += *in;
    return out;
}

int main(){
    double two_and_two = sum(2, 2);
    printf("2+2 = %g\n", two_and_two);
    printf("(2+2)*3 = %g\n", sum(two_and_two, two_and_two, two_and_two));
    printf("sum(asst) = %g\n", sum(3.1415, two_and_two, 3, 8, 98.4));
}
The code isn't actually shorter, but the for loop uses the input pointer to step through the array, rather than using an index to step through. If we had a linked list, then it may be impossible to use the index-based form, so you're forced to use a form like this one.
This may come up frequently in main, because argv is just the sort of list you would want to step through--and the standard says that its last element (argv[argc]) is NULL (C99 & C11 §5.1.2.2.1, paragraph 2, which also says that you are allowed to modify the values of argv in place--they are not string constants). So, in fact, argc may be entirely redundant.
OK, you can count down with integers, and you can use pointers to arrays or lists to step
through the structure and check up on each element individually. Are there any other ways
to take advantage of inputs as conveniently initialized variables?
Tip 57: Base your code on pointers to objects
level: Your code depends on a class of objects
purpose: don't worry about scope when you don't want to
So you've seen several examples at this point of an object setup, with a typedef and new/copy/free functions. There was the fstr*, a pointer to a file-as-string, the group*, a pointer to a social group which our imaginary agents joined and left, and the apop_model *, a pointer to a statistical or simulation (or other) model.
Why did I base all of these things on pointers to data structures, instead of just passing around data structures directly? Using a plain struct is as easy as rolling out of bed. If you use a pointer-to-struct, you require a new/copy/free function; if you use a plain struct, then:
- new
- Use designated initializers on the first line where you need a struct.
- copy
- The equals sign does this.
- free
- Don't bother; it'll go out of scope soon enough.
So we're making things more difficult for ourselves with pointers. Yet from what I've seen, there's consensus on using pointers to objects as the base of our designs.
Pros to using pointers:
- Copying a single pointer is cheaper than copying a full structure, so you save a picosecond on every function call with a struct as an input. Of course, this only adds up after a few hundred million function calls.
- Data structure libraries (your trees and linked lists) all have hooks for a void pointer.1
- Now that you're filling a data structure, having the system automatically free the struct at the end of the scope in which it was created may be an annoyance.
- Many of your functions that take in a struct will modify the struct's contents, meaning that you've got to pass a pointer to the struct anyway. Having some functions that just take in the struct and some that take in a pointer to struct is confusing (I have one interface like this and I regret it), so you might as well just send a pointer every time.
- Once you have a pointer inside the struct, then the convenience bonus from using a plain struct evaporates anyway: if you want a deep copy (wherein the data pointed to is copied, not just the pointer) then you need a copy function, and you will probably want a free function to make sure the internal data is eliminated.
As your project gets bigger, and a throwaway struct grows into a core of how your data is
organized, the pros for pointers wax and the pros for non-pointers wane. That's why these
tips, taken together, don't provide a one-size-fits-all set of rules for using structs. The best techniques for
using the throwaway structs are different from those for using structs as the core of your data
organization.
Footnotes
- ... pointer.1
- A side note on the void pointer thing: using a void pointer is
giving up the type-checking that is decidedly a good thing and which has saved all of us
heartache at some time or another. With regard to linked lists and such, however, I've
never perceived a problem. If I have a linked list named active_groups and another
list named persons, both of which take in a void pointer, then the compiler would
let me append person to the active_groups list, but it is obvious to me as a human
that I'm touching the wrong list. As a practical matter, it doesn't take all that much
care to check that you don't hook the wrong thing onto a data structure's void pointer hook.
Things get much worse when you have a void pointer as input to a function, and when using that kind of mechanism, I try to keep the function and the calls to the function close together.
