A computer program is a set of instructions to perform specific tasks. In computers, these sets of instructions are simply referred to as a program.

When a computer executes a program, it follows the program's instructions in a predefined manner. This is much like cooking by following a recipe. The recipe, however, has to be precisely defined; otherwise, in the case of a computer, the program will not execute or will execute incorrectly.

You might ask what a well-defined recipe or instruction means. To answer that, let's take the recipe for making an egg omelet shown in figure 21.

Figure 21: Recipe for Egg Omelette

Depending on how well you understand the recipe and execute the instructions, you might still end up with completely different results than intended. This happens because instructions in figure 21 are still pretty ambiguous, which leaves room for confusion.

For example, when the recipe says "Beat the eggs, salt & water in a bowl" it doesn't specify the exact procedure to beat the eggs or the order in which we should follow the instruction.

Or how to "Heat the pan until hot".

Why do you think the instructions are ambiguous?

In our day-to-day communication, the intended meaning behind the words is usually inferred from multiple competing meanings.

Every sentence is written with respect to a given context. The meaning of the sentence is derived from the context in which it is presented.

When we write instructions for the computer to perform a specific task, we need to ensure that there is no room for ambiguity. We need to specify every detail of how to perform the task. It turns out that such a level of specificity is quite difficult to achieve in English or any other natural language such as French, German, or Hindi. To instruct the computer to perform a specific task, we require a specific type of language designed to be unambiguous.

Such languages are called formal languages. Examples of formal languages include the notations used to describe mathematical axioms, proofs, and equations.

The formal language in which we write a set of instructions for the computer to perform a specific task is called a programming language.

We have already encountered two programming languages in the earlier sections. Can you guess which programming languages I am referring to?

Machine Language and Assembly Language are both programming languages used to write instructions for the computer.

Before covering formal languages in detail, let's look at some of a language's basic characteristics.

The following are the general characteristics of a language.

  • The building blocks of any written language are symbols. In English, the letters, digits, and punctuation marks constitute the symbols of the language.
  • When a series of symbols is strung together, it forms another element of a written language called a word. Only a limited set of combinations of symbols is accepted as valid words.
  • Each valid word has an underlying meaning.
  • A set of words can be combined in an accepted manner to form a sentence to express a statement, question, or command.
  • Based on a set of rules, a sentence might mean something or might be complete gibberish.

These are the basics of a language.

A language is characterized by mainly two sets of rules relating to syntax and semantics.

  • Syntax: Rules that tell you which symbols are allowed and how to put them together to form valid expressions.
  • Semantics: Rules that tell you the meanings of valid expressions and symbols.

Before understanding these rules in detail, can you take a guess at what syntax means?

In general, syntax refers to the arrangement of words and phrases to create well-formed sentences in a language.

We can define syntax as the rules that tell you which symbols are allowed and how to put them together to form valid expressions. Syntactic rules let us check the validity of a word or a sentence.

Can you now take a guess at what semantics means?

The word semantic relates to meaning in a language. The rules of semantics tell you the meanings of valid expressions & symbols.

The valid symbols used in a language are often referred to as tokens. The rules about the order in which these tokens can be combined are called structure. Syntactic rules govern the structure of sentences or expressions, including the order of the valid symbols and groups of such symbols.

Therefore, syntactic rules relate to the form of the language. In contrast, semantic rules deal with the underlying meaning.

In natural languages, the syntactic rules are flexible and lenient. Because they are flexible, they have space for creating new words and phrases such as:

  • Sup?, which is short for What's up?
  • Peace out, bruh!, which is, erm... I have little idea what this means, but it is a valid expression in the English language in some parts of the world.

What can you say about the syntactic and semantic rules of formal languages as opposed to natural languages?

For formal languages, the syntactic and semantic rules are precise and strict. For instance, a mathematical statement in formal notation is considered correct only when it uses the proper mathematical notation (syntax) and the resulting statement is meaningful (semantics).

Now, let's look at formal languages.

Formal Languages

The rules of natural languages are flexible and lenient because they are not deliberately designed but evolve naturally over a long period. In contrast, formal languages are created artificially.

For formal languages, the syntactic and semantic rules are precise and strict.

Consider the following statement expressed in the formal language of mathematics:
$$
1 + 4 = 5 \tag{A}
$$

Here, tokens present in the expression are $+, =, 1, 4$, and $5$. These are valid symbols in mathematics, and the order in which these tokens are arranged also forms a valid expression. Therefore, syntactically, it is a valid mathematical expression.
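
As a rough illustration of what tokens are, Python's standard `tokenize` module can split statement A into its symbols. This is only a sketch: it borrows Python's own tokenizer, which happens to accept the same digits and operators as this small piece of mathematical notation, and the helper name `tokens_of` is made up for this example.

```python
import io
import tokenize

def tokens_of(statement):
    """Return the number and operator tokens found in a statement."""
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(statement).readline):
        # Keep only numbers and operators; skip end-of-line bookkeeping tokens.
        if tok.type in (tokenize.NUMBER, tokenize.OP):
            found.append(tok.string)
    return found

print(tokens_of("1 + 4 = 5"))  # ['1', '+', '4', '=', '5']
```

Note that the tokenizer only splits the text into tokens. It does not check whether the order of those tokens forms a valid expression.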

Is the statement A semantically correct? Why or why not?

The statement A states that adding one to four results in five, which is correct. Therefore semantically, this statement is also valid.

Let's take another mathematical statement.
$$
1+4=7 \tag{B}
$$

Is the statement B syntactically correct? Why or why not?

Similar to statement A above, statement B also has valid tokens and correct structure. Therefore, statement B is syntactically correct.

Is the statement B semantically correct? Why or why not?

The statement B expresses that adding one to four results in seven, which, as we know, is incorrect. Therefore, statement B is syntactically correct, although semantically incorrect.

Let's take another statement.
$$
\text{+ 1  3} = 4 \tag{C}
$$

Is the statement C syntactically correct? Why or why not?

Although statement C has valid tokens, they are arranged in a wrong order, making it an invalid or syntactically incorrect expression.

Is the statement C semantically correct? Why or why not?

In formal languages, syntactically incorrect statements often make it difficult to understand the underlying meaning. Therefore, we cannot comment if the statement C is semantically correct.
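
The same three cases can be tried out in Python, which is itself a formal language. The sketch below is an illustration under a few assumptions: the statements are transcribed into Python, where equality is written `==` rather than `=`, and the function name `check` is hypothetical. Parsing a statement tests its syntax; evaluating it tests whether what it says is true.

```python
import ast

def check(statement):
    """Classify a statement by its syntax first, then by its meaning."""
    try:
        tree = ast.parse(statement, mode="eval")  # syntax check only
    except SyntaxError:
        # Without a valid structure there is no meaning to evaluate.
        return "syntactically incorrect"
    value = eval(compile(tree, "<statement>", "eval"))
    verdict = "correct" if value else "incorrect"
    return "syntactically correct, semantically " + verdict

for statement in ["1 + 4 == 5", "1 + 4 == 7", "+ 1 3 == 4"]:
    print(statement, "->", check(statement))
```

The third statement never reaches the evaluation step, which mirrors the point above: when the syntax is wrong, we cannot comment on the semantics.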

Let's take a statement in the English Language.
$$
\text{Th1s \$tatement has c0rrect \$tructure but 1nvalid t0ken\$.} \tag{D}
$$

Is the statement D syntactically correct? Why or why not?

Statement D is an English sentence with a correct structure but invalid tokens, which makes it syntactically incorrect.

What about the meaning of statement D? Is the statement D semantically correct? Why or why not?

Although the sentence is syntactically incorrect, it is still valid semantically because we can understand its underlying meaning.

This also shows how lenient natural languages are about their rules: the meaning can often be recovered even when the syntax is broken.

Now let's examine the expressiveness of natural and formal languages.

Take the following mathematical statement.
$$
x = \sqrt{(y + 2)^3} \tag{E}
$$
Statement E shows a mathematical expression involving two unknown variables $x$ and $y$. If you were to read the expression in the English language, it would be something similar to statement F.
$$
\text{The unknown variable } x \text{ equals the square root of the sum} \\ \text{ of the unknown variable } y \text{ and 2, raised to the power 3.} \tag{F}
$$

The two statements E and F express the same meaning, one in a formal language and the other in a natural language.

Can you comment on the expressiveness of formal languages compared to natural language?

If you compare statement F with statement E, you can see that the formal language of mathematics expresses the same thing in a more concise manner.

It is easier to read and understand statement E than reading statement F, given that you are familiar with the underlying rules.

Formal languages are more concise in expressing meaning than natural languages.
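
Programming languages share this conciseness. As a small illustration, statement E can be written in one line of Python; the value chosen for `y` is an arbitrary assumption made only so that the line can be run.

```python
import math

y = 2                          # arbitrary example value for the unknown y
x = math.sqrt((y + 2) ** 3)    # statement E written in Python
print(x)                       # 8.0, because (2 + 2) ** 3 is 64
```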

Natural languages are full of words and sentences whose meanings depend on the context. Words might have a different meaning for different situations. Natural language employs more words to provide context to reduce the ambiguity, which makes expressing in the language more verbose with many redundant parts.

Natural languages also contain symbolism, idioms, metaphors, and allegory, which keep the underlying meaning of an expression well hidden unless the reader already knows what is meant.
$$
\text{Luffy keeps trying to get it working,
but I think he's beating a dead horse.} \tag{G}
$$
Statement G says something about beating a dead horse which is an idiom meaning to waste effort on something when there is no chance of succeeding. Someone unfamiliar with the idiom might take the statement at its face value and think Luffy is literally beating a dead horse.

In contrast to natural languages, statements and expressions in formal languages are unambiguous and have a well-defined meaning. Because formal languages are straightforward, their expressions are less verbose and more concise. Also, a statement in a formal language means precisely what it expresses.

Table 16: Difference between Natural Language & Formal Language

| Natural Language | Formal Language |
|---|---|
| Ambiguous | Unambiguous |
| More redundant and verbose | Less redundant and more concise |
| Obfuscates the actual meaning by employing symbolism | Precisely means what it expresses |
| Evolves naturally | Artificially created for a specific domain |
| Rules related to syntax and semantics are flexible | Rules pertaining to syntax and semantics are precise |
| Examples: English, Hindi, French, and Japanese | Examples: chemical equations, programming languages, and mathematical formulae |

The differences between Natural & Formal Languages are summarised in Table 16. In the next section, we will dive into more about the semantics of expressions in a formal language.

Semantics

As we stated before, the semantics of a language relates to the meaning of symbols, tokens, and expressions. Consider the recipe of an omelet shown in figure 21. To follow the recipe, we must understand what each instruction means.

I mentioned earlier that the recipe is not well defined. This is because the recipe's instructions hide many details, which are assumed to be commonly understood by everyone.

For example, the instruction to heat the pan assumes that everyone knows what a pan is and how to heat it. This simplified instruction stands in for a number of hidden instructions, which keeps it general.

This process of simplifying instructions or communication by leaving out concrete details and keeping only the essential features is called abstraction.

Abstractions help in hiding away a lot of underlying inferred meaning. Effective communication about things in the abstract requires an intuitive or shared experience between the communicator and the communication recipient.

Abstraction in daily life
Figure 22: Communication is simplified using abstractions.

Let's take figure 22 as an example, in which a person is asking his friend to bring him a glass of water. The part of figure 22 without abstractions (or, more accurately, with fewer abstractions) expands the instruction into highly specific steps.

At this point, can you describe what abstraction means in your words from whatever you have understood so far?

Abstraction hides away many details of communications. If we start adding more details and specificity to the sentence, we will realize that we can do this for a very long time. Let's expand upon the original statement by adding more details to it.

Abstraction Pyramid

When we expand upon each word, we get an abstraction pyramid. The further we go down in the pyramid, the more we have to unpack the words' specific meaning. This can go on forever in natural languages, as humans communicate in shared abstractions, which cannot be explicitly detailed.

For example, take the conversation shown in figure 22. We can expand upon the original statement by adding more details to it, as shown in figure 23.

Abstraction Tree
Figure 23: Expanding on the abstractions to show the underlying abstraction pyramid

In formal languages, and especially in programming languages, abstraction is a higher-level thinking tool, and, unlike in natural languages, the abstraction pyramid may have an end.
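
In a program, abstraction often takes the form of one function that hides several smaller steps. The sketch below is purely illustrative: the function names and the individual steps are hypothetical and are not taken from figure 22 or figure 23.

```python
def walk_to(place):
    return f"walk to the {place}"

def pick_up(item):
    return f"pick up the {item}"

def fill(item, liquid):
    return f"fill the {item} with {liquid}"

def bring_glass_of_water():
    """High-level instruction: the caller never sees the individual steps."""
    return [
        walk_to("kitchen"),
        pick_up("glass"),
        fill("glass", "water"),
        walk_to("friend"),
    ]

for step in bring_glass_of_water():
    print(step)
```

Whoever calls `bring_glass_of_water()` works one level up the pyramid, while each helper could in turn be expanded into even smaller steps.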

Before learning more about the abstractions in programming languages, let me ask you a question. What forms the foundation for instructions written using programming languages?

Semantics in Programming Language

Machine language is the only language that a processor executes directly, without any prior transformation. Also, each instruction written in machine language usually corresponds to a specific operation for the processor to execute. So, it is safe to say that machine language forms the basis on which higher abstractions can be built.

Programming languages are used to instruct the computer to perform specific tasks or computations.

Because a programming language is a formal language, the instructions for performing the tasks are completely detailed, and the rules for deriving their meaning are precise. When you write a program using a programming language, the computer first reads those instructions and then performs them. Performing an instruction of a program is called the execution of the instruction. Every instruction of the program should be properly detailed for the computer to execute it.

For example, let's say we want to write a program that accepts two numbers, multiplies them, and prints the result. Before we can instruct the computer to do that, the definition of numbers in terms of bit patterns in memory, how to take inputs from the keyboard, how to multiply two numbers, and how to print the result on the screen must be understood or known by the computer.

However, if every single time we need to multiply two numbers, we were to define what numbers are and what multiplication means, that would be a bit of drudgery. Software development in programming languages employs many tactics that hide away a lot of implementation details so that we can focus on solving problems at hand.
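
To get a feel for the kind of detail that usually stays hidden, here is a minimal sketch of multiplication spelled out with nothing but bit shifts and additions. It is an illustration only, written for non-negative integers; the function name `multiply` is an assumption of this example, and real processors implement multiplication in hardware rather than in this way.

```python
def multiply(a, b):
    """Multiply two non-negative integers using only shifts and additions."""
    result = 0
    while b > 0:
        if b & 1:        # the lowest bit of b is set
            result += a
        a <<= 1          # shifting left by one bit doubles a
        b >>= 1          # move on to the next bit of b
    return result

print(bin(7), bin(20))   # 0b111 0b10100, the bit patterns of the inputs
print(multiply(7, 20))   # 140
```

In everyday code, all of this collapses into the single expression `7 * 20`.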

An example is the one we encountered in the earlier section, where we converted an instruction in machine language to our imaginary assembly language. We did this by referring to the three memory addresses through the symbolic address names length, height, and area, and by using the mnemonic opcode MUL.

$$
\text{Sample Instruction in hex : }\overbrace{\text{8b}}^{\text{opcode}} \underbrace{\text{ 48 3f 4e}}_{\text{operands}}
$$

$$
\text{Sample Instruction in Assembly : }\overbrace{\text{MUL}}^{\text{opcode}} \text{    } \text{ } \underbrace{\text{  length   height   area}}_{\text{operands}}
$$

The amount of implementation detail a language takes care of for us determines how easy or difficult it is to program in that language. Even though assembly languages provide very little abstraction over machine language, they are still much easier to work with than machine language. This is similar to the process of abstraction we saw in figure 23.
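
The translation from the assembly form back to the hex form is mechanical, which is what an assembler does. The following is a toy sketch for our imaginary language only. It assumes that the symbolic names length, height, and area stand for the addresses 48, 3f, and 4e in that order; the two lookup tables and the function name `assemble` are made up for this example.

```python
# Lookup tables for the imaginary assembly language (assumed values).
OPCODES = {"MUL": 0x8B}
SYMBOLS = {"length": 0x48, "height": 0x3F, "area": 0x4E}

def assemble(line):
    """Translate one line of the imaginary assembly into hex machine code."""
    mnemonic, *operands = line.split()
    codes = [OPCODES[mnemonic]] + [SYMBOLS[name] for name in operands]
    return " ".join(f"{code:02x}" for code in codes)

print(assemble("MUL length height area"))  # 8b 48 3f 4e
```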

Based on the levels of abstraction that the languages employ, the programming languages are divided into high-level languages and low-level languages.

Can you guess what low-level languages are?

Low-level Programming Languages

The low-level programming languages provide little or no abstraction from a computer's instruction set. The commands or functions in the language map closely to processor instructions.

The term low-level indicates the low abstraction between the language and instruction set understood by the computer, which is why low-level languages are sometimes also described as being close to the hardware.

What are some low-level languages that we have already covered?

We are already familiar with two low-level programming languages: machine language and assembly language.

We earlier mentioned that each processor has a somewhat different instruction set based on its internal architecture. This is often referred to as instruction set architecture (ISA) or computer architecture or simply architecture.

An instruction set architecture is an abstract model of a computer. A realization of an ISA, such as a central processing unit (CPU), is called an implementation. Instructions in machine language correspond directly to a specific instruction set architecture.

A machine language program maps closely to the instruction set of a particular processor or a family of processors. Because assembly depends on the machine code instructions, every assembler has its own assembly language, designed for precisely one specific computer architecture.

A program is said to be portable if the same code works in different environments, for example, on different processors or different operating systems. Programs written in low-level languages tend to be relatively non-portable because they are tied to a certain type of processor architecture. Furthermore, writing them requires memorizing or looking up numerical opcodes or mnemonic codes for every instruction, and they are difficult to modify.
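
As a small aside, a high-level language can report which architecture it is running on while remaining portable itself. The sketch below uses Python's standard `platform` and `sys` modules; the exact strings printed depend on the machine, so no fixed output is shown.

```python
import platform
import sys

# Name of the processor architecture as reported by the operating system,
# for example x86_64 or arm64. It can be an empty string if unknown.
print(platform.machine())

# Byte order used by the processor: 'little' or 'big'.
print(sys.byteorder)
```

The same two lines run unchanged on different processors, whereas a machine language program would have to be rewritten for each architecture.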

Programmers rarely write programs directly in machine code because it requires attention to numerous details that a high-level language handles automatically.

Examples of high-level languages are Python, C, C++, Java, and Lisp.

I earlier mentioned that the low-level languages have low abstraction. In contrast to low-level languages, high-level languages have higher or stronger abstractions. What do you think a high level of abstraction means?

High-level Programming Languages

High-level programming languages let you express instructions without worrying about the implementation details. When you drive a car, you only need to work with a few things, such as the brakes, the accelerator, and the gears. You don't worry about how every tire or the underlying engine works.

The underlying working of the car is the implementation detail. That's an example of working with higher-level abstraction.

High-level abstraction can also be understood through the example of getting a taxi or cab in a city. Can you arrange the following steps in the order needed to get a taxi?

  1. Ask the taxi driver if they can take you to your destination.
  2. Go outside near the street and wait for an unreserved taxi to show up.
  3. If they answer in the affirmative, get inside the taxi.
  4. If the answer is negative, wait for another taxi and repeat all the above steps.
  5. Raise your hand to signal the taxi to stop.

The previous exercise is a high-level overview of how to get a taxi. Now, contrast this with the cab-sharing app called Uber. One of the earliest pitches by the founders of Uber was this:

You push few buttons, and a black car shows up at your doorstep.

Where did the implementation details go?

The implementation details are taken care of by the Uber app. The app user can schedule a cab and choose a destination without performing any of the previous steps to get a taxi.

It's worth noting that the user interacts with an abstraction of a taxi, not the physical taxi itself. In a way, the Uber app is a higher abstraction on top of the activity of getting a taxi.

Can you think of another example that provides an abstraction on top of a physical activity?

E-commerce companies such as Amazon let you shop around on their website and add items to a virtual cart. Contrast this with going to a mall and shopping while dragging around a physical cart. It's only fitting to say that e-commerce sites such as Amazon provide an abstraction over shopping while hiding the implementation details.

Similarly, higher-level languages take care of the implementation details and provide you a framework to work towards problem-solving rather than dealing with the implementation details.

Let's revisit the earlier sample instruction where we wrote the instructions for multiplying two numbers in our imaginary machine language.

Which of the following is true for a programmer writing a sample instruction for multiplying two numbers in our imaginary machine language?

  1. The programmer needs to look up the opcode or mnemonic code for multiplication.
  2. The programmer needs to convert the input into binary or hexadecimal representation.
  3. The programmer needs to store the converted input's value in two unused memory addresses.
  4. All of the above.

Multiplying two numbers in machine language involves a lot of implementation details. Let's contrast this with how something similar is done in a higher-level language.

We can perform the same calculation in Python in the following manner.

>>> 7 * 20
140

That's it. Notice the following.

  1. No need to look up memory addresses to store values.
  2. No need to change the representation of the numbers to binary or hexadecimal.
  3. No need to look up the correct opcode for the operation.
  4. The code is much more readable.

You can certainly see how convenient it is to work with a higher-level language. In contrast to low-level programming languages, higher-level languages may use natural language elements, be easier to use, or automate (or even hide entirely) significant areas of computing systems (such as memory management), making the process of developing a program simpler and more understandable than when using a lower-level language.

The amount of abstraction provided defines how "high-level" a programming language is. Rather than dealing with registers, memory addresses and call stacks, high-level languages deal with names, lists, objects, complex arithmetic or Boolean expressions, functions, loops, and other abstract computer science concepts, with a focus on usability over optimal program efficiency.
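
As a brief taste of these concepts, the sketch below uses names, a list, a function, and a loop in Python. The rectangle sizes and the function name `area` are arbitrary choices made for this example.

```python
def area(length, height):
    """Return the area of a rectangle."""
    return length * height

# A list of (length, height) pairs.
rectangles = [(7, 20), (3, 4), (10, 10)]

# A loop that visits every rectangle in the list.
for length, height in rectangles:
    print(length, "x", height, "=", area(length, height))

# A Boolean expression combining two comparisons.
largest = max(area(length, height) for length, height in rectangles)
print(largest > 100 and largest < 200)  # True, since the largest area is 140
```

Nowhere in this code do registers or memory addresses appear.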

We will look into all these concepts in later chapters. For now, the only thing you need to know is that high-level programming languages make it easier to write instructions for computers, and that those instructions are also easier to read and maintain.

For instance, the following is a sample program written in the high-level language Python.

for num in range(51):

    # Divisible by both 3 & 5
    if num % 3 == 0 and num % 5 == 0:
        print("fizzbuzz")

    # Divisible by only 3
    elif num % 3 == 0:
        print("fizz")

    # Divisible by only 5
    elif num % 5 == 0:
        print("buzz")

    # Divisible by neither 3 nor 5
    else:
        print(num)
Listing 1: FizzBuzz program written in Python

At this point, you don't need to know what this program does. However, you can see that the program is quite readable.

I mentioned earlier that the processor could only process instructions in machine language. How do you think the processor understands the instruction written in the above form?

When instructions are written for a computer in a language other than machine code, it is called source code. The source code needs to be converted to a representation that the processor can execute.

Based on how the source code is converted, programming languages are classified into compiled and interpreted.

Compiled Languages

Let's look into compiled languages first.

Programs written in many high-level languages are turned into an executable program entirely by compilation.

Compilation is a process of converting the source code into machine language.

To understand how compilation works, let's take a program written in the C programming language. The program takes user input and prints whether the number is odd or even.

#include <stdio.h>

int main(void) {

    int num;
    printf("Enter an integer: ");

    // Read the user's input; scanf returns the number of values it read
    if (scanf("%d", &num) != 1) {
        printf("Please enter an integer.\n");
        return 1;
    }

    // True if num is perfectly divisible by 2
    if (num % 2 == 0)
        printf("The number %d is even.\n", num);
    else
        printf("The number %d is odd.\n", num);

    return 0;
}
Listing 2: Odd and even in C, OddEven.c

The above source code is saved in a text file named OddEven.c. The source code first needs to be converted into an executable program containing machine language instructions before the processor can execute it. This is done using a compiler for the C programming language, as shown in figure 24.

Compiler
Figure 24: Compiling a C program

The C compiler converts the source into an executable file that can be executed directly by the processor. It takes the source code file as input and produces a machine language program as output. Figure 25 shows how the program is executed.

Executable Program
Figure 25: Execution of compiled program

As we mentioned earlier, a machine language program is called an executable program or, sometimes, just an executable. In the above case, the executable program is called OddEven.exe and can be saved on the hard disk. To run the program, the executable is copied into main memory, and the processor executes its instructions.

The entire source code is compiled to a corresponding executable file. How does the programmer modify the compiled program?

For compiled languages such as C, the programmer needs to modify the source code in the OddEven.c file and re-compile it to generate an executable file that corresponds to the modified source code.

Interpreted Languages

In an interpreted language, the source code is converted into another representation, not necessarily machine language. The program is executed with the help of an interpreter program.

Before going into details, let me ask you a question. What's the difference between an interpreter and a translator?

Usually, translators accept a whole document, for instance, one written in Swahili, and translate it into another language, say English. Compilation is similar to this end-to-end translation of a document.

Interpreters, on the other hand, are people who interpret what a person is saying in real time, as it is being said.

Figure 26 is an example of such interpretation between two roommates.

Figure 26: With and without an interpreter

In interpreted languages such as Python, the source code is converted into an intermediate form, which in Python is called bytecode.
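
If you are curious what bytecode looks like, Python's standard `dis` module can display it. The sketch below compiles a two-line snippet with the built-in `compile` function and prints its bytecode. The snippet and the file label `<example>` are arbitrary choices for this illustration, and the exact bytecode listing differs between Python versions, so it is not reproduced here.

```python
import dis

source = "num = 7\nprint(num % 2 == 0)"

# Convert the source text into a code object that holds the bytecode.
code = compile(source, "<example>", "exec")

print(type(code.co_code))  # the raw bytecode is stored as bytes
dis.dis(code)              # human-readable listing of the bytecode
exec(code)                 # run the bytecode; prints False because 7 is odd
```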

Let's rewrite the same program in Python.

try:
    num = int(input("Enter an integer: "))
    if num % 2 == 0:
        print("The number {} is even.".format(num))
    else:
        print("The number {} is odd.".format(num))
except ValueError:
    # int() raises ValueError when the input is not a valid integer
    print("Please enter an integer")
Listing 3: Odd and even in Python, OddEven.py

A Python interpreter is required to run the program. To execute the OddEven.py file, we use the following command:

python OddEven.py

The Python interpreter reads the instructions one by one and executes them.

Interpreter Program
Figure 27: Execution of a Python program using Python interpreter

Figure 27 shows how the Python interpreter executes source code written in Python.
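
The idea of reading an instruction and executing it straight away can be sketched with a toy interpreter. Everything here is hypothetical: the three instructions SET, MUL, and PRINT and the function name `run` are invented for this illustration, and the real Python interpreter is far more involved and works on bytecode rather than on raw text lines.

```python
def run(program):
    """Execute a toy program line by line and return what it printed."""
    variables = {}
    output = []
    for line in program.strip().splitlines():
        op, *args = line.split()
        if op == "SET":        # SET name value
            variables[args[0]] = int(args[1])
        elif op == "MUL":      # MUL a b target
            variables[args[2]] = variables[args[0]] * variables[args[1]]
        elif op == "PRINT":    # PRINT name
            output.append(str(variables[args[0]]))
        else:
            raise ValueError(f"unknown instruction: {op}")
    return output

program = """
SET length 7
SET height 20
MUL length height area
PRINT area
"""

print(run(program))  # ['140']
```

Each line is executed as soon as it is read, and no executable file is produced along the way.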

If a programmer wishes to modify a Python program, how do they do that?

The programmer can modify the source code in OddEven.py and re-run the program using the interpreter. Unlike with compiled languages, there is no extra compilation step. This is a crucial difference between interpreted and compiled languages.

This brings us to the end of our chapter on fundamentals of computing.