Problem 1: Find the first occurrence of a given character in a character string. Return a pointer to the character if found, NULL otherwise.
This problem is solved in the standard library by the strchr function. Let's write it. The algorithm is very simple: We examine each character. If it is zero, this is the end of the string, we are done and we return NULL to indicate that the character is not there. If we find it, we stop searching and return the pointer to the character position.
char *strchr(char *str, int ch)
if (*str == ch)
return str;
return NULL;
We loop through the characters in the string. We use a while condition requiring that the character pointed to by our pointer "str" is different than zero and it is different than the character given. In that case we continue with the next character. We just increment our pointer. When the while loop ends, we have either found a character, or we have arrived at the end of the string. We discriminate between these two cases after the loop.
How can this program fail?
We do not test for NULL. Any NULL pointer passed to this program will provoke a trap. A way of making this more robust would be to return NULL if we receive a NULL pointer. This would indicate to the calling function that the character wasn't found, what is always true if our pointer doesn't point anywhere.
A more serious problem happens when our string is missing the zero byte. In that case the program will blindly loop through memory, until it either finds the byte is looking for, or a zero byte somewhere. This is a much more serious problem, since if the search ends by finding a random character somewhere, it will return an invalid pointer to the calling program!
This is really bad news, since the calling program may not use the result immediately. It could be that the result is stored in a variable, for instance, and then used in another, completely unrelated section of the program. The program would crash without any hints of what is wrong and where was the failure.
Problem 2: Return the length of a given string.
This is solved by the strlen function. We just count the chars in the string, stopping when we find a zero byte.
int strlen(char *str)
return p - str;
We copy our pointer into a new one that will loop through the string. We test for a zero byte in the while condition. Note the expression *p != 0. This means "Fetch the value this pointer is pointing to (*p), and compare it to zero". If the comparison is true, then we increment the pointer to the next byte.
We return the number of characters between our pointer p and the saved pointer to the start of the string. This pointer arithmetic is quite handy.
How can this program fail?
The same problems apply that we discussed in the previous example, but in an attenuated form: only a wrong answer is returned, not an outright wrong pointer. The program will only stop at a zero byte.
Problem 3: Given a positive number, find out if it is a power of two.
Algorithm: A power of two has only one bit set, in binary representation. We count the bits. If we find a bit count different than one we return 0, if there is only one bit set we return 1.
Implementation: We test the rightmost bit, and we use the shift operator to shift the bits right, shifting out the bit that we have tested. For instance, if we have the bit pattern 1 0 0 1, shifting it right by one gives 0 1 0 0: the rightmost bit has disappeared, and at the left we have a new bit shifted in, that is always zero.
int ispowerOfTwo(int n)
n = n >> 1;
if (bitcount == 1)
return 1;
return 0;
Our condition here is that n must be different than zero, i.e. there must be still some bits to count to go on. We test the rightmost bit with the binary and operation. The number one has only one bit set, the rightmost one. By the way, one is a power of two .
Note that the return expression could have also been written like this:
return bitcount == 1;
The intention of the program is clearer with the "if" expression.
How can this program fail?
The while loop has only one condition: that n is different than zero, i.e. that n has some bits set. Since we are shifting out the bits, and shifting in always zero bits, in a 32 bit machine like a PC this program will stop after at most 32 iterations. Running mentally some cases (a good exercise) we see that for an input of zero, we will never enter the loop, bitcount will be zero, and we will return 0, the correct answer. For an input of 1 we will make only one iteration of the loop. Since 1 & 1 is 1, bitcount will be incremented, and the test will make the routine return 1, the correct answer. If n is three, we make two passes, and bitcount will be two. This will be different than 1, and we return zero, the correct answer.
Anh Vu Tran [email protected] made me discover an important bug. If you change the declaration of "i" from unsigned int to int, without qualification, the above function will enter an infinite loop if n is negative.
Why?
When shifting signed numbers sign is preserved, so the sign bit will be carried through, provoking that n will become eventually a string of 1 bits, never equal to zero, hence an infinite loop.
.
Problem 4: Given a string containing upper case and lower case characters, transform it in a string with only lower case characters. Return a pointer to the start of the given string.
This is the library function strlwr. We make the transformation in-place, i.e. we transform the given string.
#include <ctype.h> /* needed for using isupper and tolower
char *strlwr(char *str)
return result;
We include the standard header ctype.h, which contains the definition of several character classification functions (or macros) like "isupper" that determines if a given character is upper case, and many others like "isspace", or "isdigit".
The first thing we do is to test if the given pointer is NULL. If it is, we return NULL. Then, we start our loop that will span the entire string. The construction while (*str) tests if the contents of the character pointer str are different than zero. If this is the case, we test if the character is an upper case character using the isupper classification function. If it is an upper case character, we transform it into a lower case one. We increment our pointer to point to the next character, and we restart the loop. When the loop finishes because we hit the zero byte that terminates the string, we stop and return the saved position of the start of the string.
How can this program fail?
Since we test for NULL, a NULL pointer can't provoke a trap. Is this a good idea?
Well this depends. This function will not trap with NULL pointers, but then the error will be detected later when other operations are done with that pointer anyway. Maybe making a trap when a NULL pointer is passed to us is not that bad, since it will uncover the error sooner rather than later. There is a big probability that if the user of our function is calling us to transform the string to lower case, is because he/she wants to use it later in a display, or otherwise. Avoiding a trap here only means that the trap will appear later, probably making error finding more difficult.
Writing software means making this type of decisions over and over again.
Obviously this program will fail with any incorrect string, i.e. a string that is missing the final zero byte. The failure behavior of our program is quite awful: in this case, this program will start destroying all bytes that happen to be in the range of uppercase characters until it hits a random zero byte. This means that if you pass a non-zero terminated string to this apparently harmless routine, you activate a randomly firing machine gun that will start destroying your program's data in a random fashion. The absence of a zero byte in a string is a catastrophe for any C programmer. In a tutorial this can't be too strongly emphasized!
The expression (*p != 0) could have been written in the form while (*p), using the implicit test for a non-zero result in any logical expression. Any expression will be considered true if its value is anything but zero. It is better, however, to make comparisons explicit.
Different than is written in C != instead of . The symbol wasn't included in the primitive typewriters in use when the C language was designed, and we have kept that approximation. It is consistent with the usage of ! as logical not, i.e. != would mean not equal.
For a more detailed discussion, see the section Newsgroups at the end of this tutorial.
This convention is used in the library function. Actually, it is a quite bad interface, since the return value doesn't give any new information to the user, besides the expected side effect of transforming the given string. A better return value would be the number of changed characters, for instance, that would allow the caller to know if a transformation was done at all, or the length of the string, or several others. But let's implement this function as it is specified in the standard library. Many times, you will see that even if it is obvious that software must be changed, the consequences of a change are so vast that nobody wants to assume it, and we get stuck with software "for compatibility reasons". Here is yet another example.
|