Programming:C variables
Variables
Like most programming languages, C is able to use and process named variables and their contents. Variables are most simply described as names by which we refer to some location in memory - a location that holds a value with which we are working.
It often helps to think of variables as a "pigeonhole", or a placeholder for a value. You can think of a variable as being equivalent to its value. So, if you have a variable i that is initialized to 4, i+1 will equal 5.
Since C is a relatively low-level programming language, it is necessary for a C program to claim the memory needed to store the values for variables before using that memory. This is done by declaring variables, the way in which a C program shows the number of variables it needs, what they are going to be named, and how much memory they will need.
Within the C programming language, when we manage and work with variables, it is important for us to know the type of our variables and the size of these types. This is because C is a sufficiently low-level programming language that these aspects of its working can be hardware specific - that is, how the language is made to work on one type of machine can be different from how it is made to work on another.
All variables in C are typed. That is, you must give a type for every variable you declare.
Declaring, Initializing, and Assigning Variables
Here is an example of declaring an integer, which we've called some_number. (Note the semicolon at the end of the line - that is how your compiler separates one program statement from another.)
int some_number;
This statement means we're declaring some space for a variable called some_number, which will be used to store integer data. Note that we must specify the type of data that a variable will store. There are specific keywords to do this - we'll look at them in the next section.
You can also declare multiple variables with one statement:
int anumber, anothernumber, yetanothernumber;
We can also declare and assign some content to a variable at the same time. This is called initialization because it is the "initial" time a value has been assigned to the variable:
int some_number=3;
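In C89 (the original ANSI standard), all variable declarations inside a function must appear at the beginning of a block, before any other statements. You cannot declare your variables, insert some other statements, and then declare more variables in the same block. C99 and later standards lift this restriction and allow declarations to be mixed with statements, but placing declarations at the top of a block remains a common convention, and the examples in this chapter follow it.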
After declaring variables, you can assign a value to a variable later on using a statement like this:
some_number=3;
You can also assign a variable the value of another variable, like so:
anumber = anothernumber;
Or assign multiple variables the same value with one statement:
anumber = anothernumber = yetanothernumber = 3;
This works because an assignment is itself an expression whose value is the value that was assigned. Assignment associates from right to left, so x = y = z is really shorthand for x = (y = z).
Naming Variables
Variable names in C are made up of letters (upper and lower case) and digits. The underscore character ("_") is also permitted. Names must not begin with a digit. Unlike some languages (such as Perl and some BASIC dialects), C does not use any special prefix characters on variable names.
Some examples of valid (but not very descriptive) C variable names:
foo
Bar
BAZ
foo_bar
_foo42
_
QuUx
Some examples of invalid C variable names:
2foo (must not begin with a digit)
my foo (spaces not allowed in names)
$foo ($ not allowed -- only letters, digits, and _)
while (language keywords cannot be used as names)
As the last example suggests, certain words are reserved as keywords in the language, and these cannot be used as variable names.
In addition there are certain sets of names that, while not language keywords, are reserved for one reason or another. For example, a C compiler might use certain names "behind the scenes", and this might cause problems for a program that attempts to use them. Also, some names are reserved for possible future use in the C standard library. The rules for determining exactly what names are reserved (and in what contexts they are reserved) are too complicated to describe here, and as a beginner you don't need to worry about them much anyway. For now, just avoid using names that begin with an underscore character.
The naming rules for C variables also apply to other language constructs such as function names, struct tags, and macros, all of which will be covered later.
Literals
Whenever you specify a value explicitly in a program instead of referring to a variable or some other form of data, that value is referred to as a literal. In the initialization example above, 3 is a literal. Each type has its own literal forms (more on that soon). Integer literals can also be written in hexadecimal (hex) notation, which is always preceded by 0x; for example, 0x10 is the same value as 16. Hex is only another way of spelling a number and does not change the type of the variable that receives it. For now, though, you probably shouldn't be too concerned with hex.
The Four Basic Types
In Standard C there are four basic data types. They are int, char, float, and double.
The int type
The int type, which you've already seen, is meant to store integers, which you may also know as "whole numbers". The standard only guarantees that an int is at least 16 bits wide; on most modern home PCs it is 4 bytes (32 bits). Examples of literals are 3, 42, 100... When int is 4 bytes, it can store any whole number between -2147483648 and 2147483647. (About 2 billion either way.)
If you want to declare a new int variable, use the int keyword. For example:
int numberOfStudents, i, j=5;
The char type
The char type is similar to the int type, but it is only large enough to hold one character. It stores the same kind of data as an int (i.e. integers), but always has a size of one byte. It is most often used to store character data, hence its name.
Examples of character literals are 'a', 'b', '1', etc., as well as special characters such as '\0' (the null character) and '\n' (newline, recall "Hello, World").
One byte is seen as the ideal size for character data because it is large enough to provide one slot for each member of the ASCII character set, a set of 128 characters that maps one-to-one onto the integers 0 to 127. At compile time, each character literal is converted into its corresponding integer. For example, on an ASCII-based system 'A' is converted to 65 (0x41). (Knowing the ASCII character set is often useful.)
When we initialize a character variable, we can do it two ways. One is preferred, the other way is bad programming practice.
The first way is to write
char letter1='a';
This is good programming practice in that it allows a person reading your code to understand that letter1 is being initialized with the letter "a" to start off with.
The second way, which should not be used when you are coding letter characters, is to write
char letter2=97; /* in ASCII, 97 = 'a' */
This is considered by some to be extremely bad practice when the variable stores a character rather than a small number, because most readers are forced to look up which character corresponds to the number 97 in the encoding scheme. In the end, letter1 and letter2 both store the same thing, the letter "a" (on an ASCII-based system), but the first method is clearer, easier to debug, and much more straightforward.
One important thing to mention is that characters for numerals are represented differently than their corresponding number, i.e. '1' is not equal to 1.
There is one more kind of literal that needs to be explained in connection with chars: the string literal. A string is a series of characters, usually intended to be output to the screen. String literals are surrounded by double quotes (" ", not ' '). An example of a string literal is the "Hello, world!\n" in the "Hello, World" example.
The float type
float is short for floating point. Like double, it stores real numbers, but it is typically only 4 bytes in size. It is therefore used when less precision than a double provides is acceptable. float literals must be suffixed with F or f, otherwise they will be interpreted as doubles. Examples are: 3.1415926f, 4.0f, 6.022e+23f. float variables can be declared using the float keyword.
The double type
The double and float types are very similar. The float type allows you to store single-precision floating point numbers, while the double type allows you to store double-precision floating point numbers: approximations of real numbers, in other words, covering both integer and non-integer values. A double is typically 8 bytes on most machines. Examples of double literals are 3.1415926535897932, 4.0, 6.022e+23 (scientific notation). If you use 4 instead of 4.0, the 4 will be interpreted as an int.
The distinction between floats and doubles was made because of the differing sizes of the two types. When C was first used, space was at a minimum and so the judicious use of a float instead of a double saved some memory. Nowadays, with memory more freely available, you do not really need to conserve memory like this - it may be better to use doubles consistently. Indeed, C itself converts a float to a double in some situations, for example when a float is passed to printf.
If you want to use a double variable, use the double keyword.
sizeof
If you have any doubts as to the amount of memory actually used by any type (and this goes for types we'll discuss later, also), you can use the sizeof operator to find out for sure. (For completeness, it is important to mention that sizeof is an operator, not a function, even though it looks like a function. It does not have the overhead associated with a function, nor do you need to #include anything to use it.) Syntax is:
int i;
i = sizeof(int);
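i will be set to 4 on a typical platform where int is 4 bytes. Strictly speaking, sizeof yields a value of the unsigned type size_t, which is converted to int by the assignment here.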
Data type modifiers
One can alter the data storage of any data type by preceding it with certain modifiers.
long and short are modifiers that make it possible for a data type to use either more or less memory. The int keyword need not follow the short and long keywords, and it is most commonly left out. A short can be used where the values fall within a lesser range than that of an int, typically -32768 to 32767. A long can be used to contain an extended range of values. It is not guaranteed that a short uses less memory than an int, nor is it guaranteed that a long takes up more memory than an int. It is only guaranteed that sizeof(short) <= sizeof(int) <= sizeof(long). Typically a short is 2 bytes, an int is 4 bytes, and a long either 4 or 8 bytes.
In the signed integer types described above, one bit is effectively used to store the sign (positive or negative) of a value. If you decide that a variable will never hold a negative value, you may use the unsigned modifier to use that one bit for storing other data, effectively doubling the range of non-negative values while ruling out negative ones. The unsigned specifier may also be used without a trailing int, in which case the size defaults to that of an int. unsigned applies only to integer types, not to float or double. There is also a signed modifier which is the opposite, but it is seldom used since short, int, and long are signed by default. The one exception is char: whether a plain char is signed or unsigned is left to the implementation.
To use a modifier, just declare a variable with the data type and relevant modifiers attached:
unsigned short int i;
short things;
unsigned long apples;
const modifier
When const is added as a modifier, the declared variable should be initialized at declaration, because the program is not allowed to assign to it afterwards. Casting away the const and writing to the variable anyway results in undefined behavior, so it should not be relied on.
While the idea of a variable that never changes may not seem useful, there are good reasons to use const. For one thing, many compilers can perform some small optimizations on data when they know that data will never change. For another, const protects a value from accidental modification. For example, if you need the value of π in your calculations, you can declare a const variable named pi, so that code written by you or by someone else cannot change its value by mistake.
Magic numbers
When you write C programs, you may be tempted to write code that will depend on certain numbers. For example, you may be writing a program for a grocery store. This complex program has thousands upon thousands of lines of code. The programmer decides to represent the cost of a can of corn, currently 99 cents, as a literal throughout the code. Now, assume the cost of a can of corn changes to 89 cents. The programmer must now go in and manually change each entry of 99 cents to 89. While this is not that big of a problem, considering the "global find-replace" function of many text editors, consider another problem: the cost of a can of green beans is also initially 99 cents. To reliably change the price, you have to look at every occurrence of the number 99.
C provides two ways to avoid this. They are approximately equivalent, though each can be more useful than the other in particular circumstances.
Using the const keyword
The const keyword helps eradicate magic numbers. By declaring a variable const corn at the beginning of a block, a programmer can simply change that const and not have to worry about setting the value elsewhere.
There is also another method for avoiding magic numbers. It is much more flexible than const, and also much more problematic in many ways. It also involves the preprocessor, as opposed to the compiler. Behold...
#define
When you write programs, you can create what is known as a macro, so that when the preprocessor reads your code, it replaces all instances of a word with the specified expression.
Here's an example. If you write
#define PRICE_OF_CORN 0.99
when you want to, for example, print the price of corn, you use the word PRICE_OF_CORN instead of the number 0.99 - the preprocessor will replace all instances of PRICE_OF_CORN with the text 0.99, which the compiler will interpret as the double literal 0.99. Notice that, since this is a preprocessor directive (the compiler proper never sees this line), there is no need for a semicolon.
It is important to note that #define has basically the same functionality as the "find-and-replace" function in a lot of text editors/word processors.
#define can also cause harm, and it is usually preferable to use const when a macro is not needed. It is possible, for instance, to #define a macro DOG as the number 3, but if you then try to print the macro, thinking that DOG represents a string that you can show on the screen, the program will not behave as intended. #define also has no regard for type. It disregards the structure of your program, replacing the text everywhere after the directive (in effect, disregarding scope), which could be advantageous in some circumstances, but can be the source of hard-to-find bugs.
You will see further instances of the #define directive later in the text. It is good convention to write #defined words in all capitals, so a programmer will know that this is not a variable that you have declared but a #defined macro.
Scope
In the Basic Concepts section, the concept of scope was introduced. It is important to revisit the distinction between local variables and global variables, and how to declare each. To declare a local variable, you place the declaration inside the block the variable is intended to be local to, conventionally at the beginning (i.e. before any non-declarative statements; C89 requires this). To declare a global variable, declare the variable outside of any block. If a variable is global, it can be read, and written, from anywhere in your program.
Global variables are not considered good programming practice, and should be avoided whenever possible. They inhibit code readability, create naming conflicts, occupy memory for the whole run of the program, and can create difficult-to-trace bugs. Excessive usage of globals is usually a sign of laziness and/or poor design. However, if there is a situation where local variables would make the code more obscure and unreadable, there's no shame in using globals. (Implementing malloc, which is a function discussed later, is one example of something that is much more difficult to write without at least one global variable.)
Other Modifiers
Included here, for completeness, are more of the modifiers that standard C provides. For the beginning programmer, static and extern may be useful. volatile is more of interest to advanced programmers. register and auto are largely deprecated and are generally not of interest to either beginning or advanced programmers.
static is sometimes a useful keyword. When you declare a variable as static, it is created just like any other variable. However, when the variable goes out of scope (i.e. the block it was local to is finished) the variable stays in memory, retaining its value. The variable stays in memory until the program ends. While this behaviour resembles that of global variables, static variables still obey scope rules and therefore cannot be accessed outside of their scope.
Variables declared static are initialized to zero (or for pointers, NULL) by default.
You can use static in (at least) two different ways. Consider this code, and imagine it is in a file called jfile.c:
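static int j = 0;    /* file scope: visible only within jfile.c */

void upj(void)
{
    static int k = 0;    /* local to upj, but keeps its value between calls */
    k++;                 /* counts how many times upj has been called */
    j++;
}

void downj(void)
{
    j--;
}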
The variable j is accessible by both upj and downj and retains its value. Because it is declared static outside any function, it cannot be accessed from other source files. The variable k also retains its value between calls, but is only accessible to upj. static variables are a good way to implement encapsulation, a term from the object-oriented way of thinking that effectively means not allowing changes to be made to a variable except through function calls.
extern is used when a file needs to access a variable that is defined in another file. An extern declaration does not carve out space for a new variable; it just provides the compiler with sufficient information (the name and type) to access the variable defined elsewhere. Such declarations are usually placed in a header file that is #included wherever the variable is used.
volatile is a special type modifier which informs the compiler that the value of the variable may be changed by external entities other than the program itself. This is necessary for certain programs compiled with optimisations - if a variable were not defined volatile then the compiler may assume that certain operations involving the variable are safe to optimise away when in fact they aren't. volatile is particularly relevant when working with embedded systems (where a program may not have complete control of a variable, such as a hardware register) and with signal handlers. Note that volatile by itself does not make access to a variable safe in multi-threaded applications; those need proper synchronization.
auto is a modifier which specifies an "automatic" variable that is automatically created when in scope and destroyed when out of scope. If you think this sounds like pretty much what you've been doing all along when you declare a variable, you're right: all declared items within a block are implicitly "automatic". For this reason, the auto keyword is more like the answer to a trivia question than a useful modifier, and there are lots of very competent programmers that are unaware of its existence.
register is a hint to the compiler to attempt to optimise the storage of the given variable by storing it in a register of the computer's CPU when the program is run. Most optimising compilers do this anyway, so use of this keyword is often unnecessary. In fact, ANSI C states that a compiler can ignore this keyword if it so desires -- and many do. Microsoft Visual C++ is an example of an implementation that completely ignores the register keyword.