Let us learn about the lexical analyzer in C programming and understand how lexical analysis works with an example.
What is Lexical Analysis?
Lexical analysis is the first phase of the compilation process. It is also commonly known as tokenization. Grouping characters into tokens up front simplifies the later phases, because the parser then works with tokens rather than with individual characters.
Lexical analysis is the process of converting the sequence of characters in source code into a sequence of tokens. Some common token types are listed below:
Token Name | Example |
---|---|
Keyword | auto, int, if |
Identifier | x, temp, arr |
Separator | , ; () {} |
Literal/Constants | “3.45677” |
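One way to represent the token categories from the table in C is an enumeration paired with a small struct. This is only a sketch: the names `TokenType`, `Token`, `token_type_name` and `print_token` are hypothetical and are not used by the program later in this article.

```c
#include <stdio.h>

/* Token categories from the table above (names are illustrative). */
typedef enum {
    TOKEN_KEYWORD,
    TOKEN_IDENTIFIER,
    TOKEN_SEPARATOR,
    TOKEN_LITERAL
} TokenType;

/* A token pairs a category with the text (lexeme) that was matched. */
typedef struct {
    TokenType type;
    char lexeme[40];
} Token;

/* Return a printable name for a token category. */
const char *token_type_name(TokenType type)
{
    switch (type) {
    case TOKEN_KEYWORD:    return "Keyword";
    case TOKEN_IDENTIFIER: return "Identifier";
    case TOKEN_SEPARATOR:  return "Separator";
    case TOKEN_LITERAL:    return "Literal";
    }
    return "Unknown";
}

/* Print a token in "Category: lexeme" form. */
void print_token(const Token *token)
{
    printf("%s: %s\n", token_type_name(token->type), token->lexeme);
}
```

Keeping the category and the lexeme together means a later phase can ask what kind of token it received without re-reading the characters.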
A lexer is usually combined with a parser: the lexer scans the source code and generates the tokens, and the parser (the syntax analyser) consumes them.
A lexical analyzer finds the tokens within a given C program and can also count the total number of tokens present in it.
Some elements of a source file are not categorized as tokens:
- Pre-processor directives
- Macros
- Blanks
- Newlines
- Tabs
- Comments
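A lexer typically discards whitespace and comments before it tries to match a token. The sketch below shows one way to do that on an in-memory string. The helper name `skip_ignored` is hypothetical, and pre-processor directives and macros are left out here on the assumption that the preprocessor has already handled them.

```c
#include <ctype.h>

/* Advance past whitespace and C comments; return a pointer to the
   first character that could start a token (or to the final '\0'). */
const char *skip_ignored(const char *p)
{
    for (;;) {
        if (isspace((unsigned char)*p)) {
            p++;                                  /* blank, tab, newline */
        } else if (p[0] == '/' && p[1] == '/') {
            while (*p != '\0' && *p != '\n')      /* line comment */
                p++;
        } else if (p[0] == '/' && p[1] == '*') {
            p += 2;                               /* block comment body */
            while (*p != '\0' && !(p[0] == '*' && p[1] == '/'))
                p++;
            if (*p != '\0')
                p += 2;                           /* skip the closing marker */
        } else {
            return p;
        }
    }
}
```

An unterminated block comment simply runs to the end of the input in this sketch; a real lexer would probably report it as an error.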
The following lexical analyzer program in C includes a function that checks each word against the list of keywords of the C language. Lexical analysis is used in the compiler design process.
Also, only the arithmetic operators (together with the assignment operator =) are listed in the arithmetic_operator variable, which can be modified to include other C operators such as relational, logical, and ternary operators.
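Adding relational and logical operators means dealing with two-character operators such as `<=` and `&&`, which a single-character comparison cannot recognise. A minimal sketch of one approach follows. The function name `match_operator` and both operator lists are assumptions made for illustration, and the lists are deliberately incomplete.

```c
#include <string.h>

/* Return the length (1 or 2) of the operator at the start of s,
   or 0 if s does not begin with a recognised operator.
   Two-character operators are tried first so that "<=" is not
   reported as "<" followed by "=". The lists are not exhaustive. */
int match_operator(const char *s)
{
    static const char *two_char[] = { "==", "!=", "<=", ">=", "&&", "||" };
    static const char one_char[] = "=+-*/%<>!?:";
    size_t i;

    for (i = 0; i < sizeof two_char / sizeof two_char[0]; i++) {
        if (strncmp(s, two_char[i], 2) == 0)
            return 2;
    }
    if (*s != '\0' && strchr(one_char, *s) != NULL)
        return 1;
    return 0;
}
```

Trying the longer operators first is a common design choice (often called longest match); without it, `a <= b` would yield two separate operator tokens.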
Functions of Lexical Analyser in Compiler Design
- Divide the source code into different tokens
- Remove comments
- Remove whitespaces
- Generate errors for invalid tokens, with line and column numbers.
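To report where an invalid token occurs, a lexer has to keep track of its position while it scans. The sketch below counts lines and columns over a string and reports the first character it considers invalid. The names `Position`, `find_invalid_char` and `report_invalid_char` are hypothetical, and treating `@`, `$` and the backtick as invalid is a simplifying assumption that ignores string literals and comments.

```c
#include <stdio.h>
#include <string.h>

/* Position of a character in the source, both counted from 1. */
typedef struct {
    int line;
    int column;
} Position;

/* Scan src and store the position of the first character that this
   sketch treats as invalid ('@', '$' or '`'). Returns 1 if one was
   found and 0 otherwise. String literals and comments are NOT
   handled here; that is a simplifying assumption. */
int find_invalid_char(const char *src, Position *where)
{
    int line = 1, column = 1;

    for (; *src != '\0'; src++) {
        if (strchr("@$`", *src) != NULL) {
            where->line = line;
            where->column = column;
            return 1;
        }
        if (*src == '\n') {
            line++;          /* the next line starts at column 1 */
            column = 1;
        } else {
            column++;
        }
    }
    return 0;
}

/* Print a diagnostic in "line:column: message" form. */
void report_invalid_char(const char *src)
{
    Position where;

    if (find_invalid_char(src, &where))
        fprintf(stderr, "%d:%d: error: invalid character\n",
                where.line, where.column);
}
```

For the input `"int x;\nx = 3 @ 4;\n"`, the `@` sits on line 2 at column 7, so that is the position the sketch reports.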
Lexical Analysis Example
Source code
```c
#include<stdio.h>

int main(void)
{
    printf("Hello World\n");
    return 0;
}
```
Result of Lexical Analysis
Identifier: includestdioh
Keyword: int
Identifier: mainvoid
Identifier: printfHello
Identifier: Worldn
Keyword: return
Identifier: 0
This result reflects how simple the program is: it keeps only letters and digits and ends a word only at a space or a newline, so punctuation is dropped, neighbouring words merge (#include<stdio.h> becomes includestdioh), and the number 0 is labelled as an identifier.
Note: This simple C program to implement lexical analysis is compiled with GNU GCC compiler using CodeLite IDE on Microsoft Windows 10 operating system.
Implementation of Lexical Analyzer in C Programming
```c
#include <stdio.h>
#include <ctype.h>
#include <string.h>
#include <stdlib.h>

#define MAX_TOKEN_LENGTH 40

int keyword_library(const char temp[]);
void print_word(const char temp[]);

int main(void)
{
    /* fgetc() returns an int so that EOF can be told apart from any character */
    int ch;
    char temp[MAX_TOKEN_LENGTH];
    char arithmetic_operator[] = "=+%*/-";
    FILE *file_pointer;
    size_t count, x = 0;

    file_pointer = fopen("C:\\Users\\tusoni\\Desktop\\demo.txt", "r");
    if (file_pointer == NULL)
    {
        printf("The file could not be opened.\n");
        exit(EXIT_FAILURE);
    }
    while ((ch = fgetc(file_pointer)) != EOF)
    {
        /* Report the character if it is one of the listed operators */
        for (count = 0; arithmetic_operator[count] != '\0'; count++)
        {
            if (ch == arithmetic_operator[count])
            {
                printf("\nOperator:\t%c", ch);
            }
        }
        if (isalnum(ch))
        {
            /* Collect the word, leaving room for the terminating '\0' */
            if (x < MAX_TOKEN_LENGTH - 1)
            {
                temp[x++] = (char)ch;
            }
        }
        else if ((ch == '\n' || ch == ' ') && (x != 0))
        {
            /* A space or a newline ends the current word */
            temp[x] = '\0';
            x = 0;
            print_word(temp);
        }
    }
    /* Flush a word that reaches end of file without a trailing newline */
    if (x != 0)
    {
        temp[x] = '\0';
        print_word(temp);
    }
    printf("\n");
    fclose(file_pointer);
    return 0;
}

/* Print a collected word as a keyword or as an identifier */
void print_word(const char temp[])
{
    if (keyword_library(temp) == 1)
    {
        printf("\nKeyword:\t%s", temp);
    }
    else
    {
        printf("\nIdentifier:\t%s", temp);
    }
}

/* Return 1 if temp is one of the 32 keywords listed below, 0 otherwise */
int keyword_library(const char temp[])
{
    static const char *keywords[] = {
        "return", "continue", "extern", "static", "long", "signed",
        "switch", "char", "else", "unsigned", "if", "struct",
        "union", "goto", "while", "float", "enum", "sizeof",
        "double", "volatile", "const", "case", "for", "break",
        "void", "register", "int", "do", "default", "short",
        "typedef", "auto"
    };
    size_t count;

    for (count = 0; count < sizeof keywords / sizeof keywords[0]; count++)
    {
        if (strcmp(keywords[count], temp) == 0)
        {
            return 1;
        }
    }
    return 0;
}
```
If you have any doubts about the implementation of a lexical analyzer in C programming, let us know about it in the comment section.
What is the real-world application of a lexical analyzer?
It is mainly used to analyze the structure of web pages, web forms, and programming languages, including high-level, low-level, and assembly languages, and much more.
Lexical analysis seems to have a wide scope in compiler design.