Tuesday, 2 May 2017

Grep Command in UNIX and Linux

Grep is one of the most frequently used commands in UNIX (or Linux). Most of us use grep just to find words in a file, but its real power comes from its options and regular expressions. With the help of grep, you can analyze large sets of log files.

Grep stands for "globally search for a regular expression and print" (from the ed command g/re/p).

The basic syntax of grep command is

grep [options] pattern [list of files]

Let's see some practical examples of the grep command.

1. Running the last executed grep command

This saves a lot of time if you are executing the same command again and again.

!grep
This uses the shell's history expansion to rerun the most recent command that starts with grep. The shell echoes the expanded command and then prints its results on the terminal.

2. Search for a string in a file

This is the basic usage of grep command. It searches for the given string in the specified file.

grep "Error" logfile.txt
This searches for the string "Error" in the log file and prints all the lines that contain it.

3. Searching for a string in multiple files.

grep "string" file1 file2
grep "string" file_pattern
This is also basic usage of the grep command. You can manually specify the list of files you want to search, or you can specify a file pattern (a shell wildcard such as *.log, which the shell expands before grep runs; this is not a regular expression).
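
As a rough illustration of searching several files at once, the sketch below builds a few throwaway files (the names app1.log, app2.log and notes.txt are made up for this example) and lets the shell expand *.log before grep runs.

```shell
# Hypothetical sample files in a scratch directory.
tmpdir=$(mktemp -d)
printf 'disk error\nall good\n' > "$tmpdir/app1.log"
printf 'all good\nnetwork error\n' > "$tmpdir/app2.log"
printf 'error in notes\n' > "$tmpdir/notes.txt"

# The shell expands *.log to "app1.log app2.log"; notes.txt is not searched.
# When given more than one file, grep prefixes each match with the file name.
multi_out=$(cd "$tmpdir" && grep "error" *.log)
printf '%s\n' "$multi_out"

rm -rf "$tmpdir"
```

Each output line has the form file:matching line, which is how you tell the matches from different files apart.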

4. Case insensitive search

The -i option makes the search case insensitive. It matches words like "UNIX", "Unix" and "unix".

grep -i "UNix" file.txt
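
A small sketch of the difference, using a made-up sample file. The -c option is used here only to count the matching lines.

```shell
# Hypothetical sample file with the word in three different casings.
tmpfile=$(mktemp)
printf 'UNIX is old\nUnix is stable\nunix is everywhere\nLinux is newer\n' > "$tmpfile"

# Without -i nothing matches, because no line contains exactly "UNix".
# grep exits with status 1 when nothing matches, hence the "|| true".
sensitive_count=$(grep -c "UNix" "$tmpfile" || true)

# With -i the three unix/Unix/UNIX lines match.
insensitive_count=$(grep -i -c "UNix" "$tmpfile")

echo "case sensitive: $sensitive_count, case insensitive: $insensitive_count"
rm -f "$tmpfile"
```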

5. Specifying the search string as a regular expression pattern.


grep "^[0-9].*" file.txt
This will search for the lines which start with a number. (The trailing .* is optional; "^[0-9]" selects the same lines.) Regular expressions are a huge topic and I am not covering them here. This example just shows how a regular expression is used.

6. Checking for the whole words in a file.

By default, grep matches the given string/pattern even if it is found as a substring of a longer word. The -w option makes grep match only whole words.

grep -w "world" file.txt
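
The sketch below, on a made-up file, compares a plain search with a whole-word search. Note that a hyphen is not a word character, so "world-class" still counts as containing the whole word "world".

```shell
# Hypothetical sample file.
tmpfile=$(mktemp)
printf 'hello world\nhello worlds\nworldwide news\nworld-class\n' > "$tmpfile"

# Plain search: every line contains the substring "world".
substring_count=$(grep -c "world" "$tmpfile")

# Whole-word search: "worlds" and "worldwide" no longer match.
word_count=$(grep -w -c "world" "$tmpfile")

echo "substring: $substring_count, whole word: $word_count"
rm -f "$tmpfile"
```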

7. Displaying the lines before the match.

Sometimes, when you are searching for an error in a log file, it is good to see the lines around the error to find its cause.

grep -B 2 "Error" file.txt
This will print the matched lines along with the two lines before each matched line.

8. Displaying the lines after the match.

grep -A 3 "Error" file.txt
This will display the matched lines along with the three lines after the matched lines.

9. Displaying the lines around the match

grep -C 5 "Error" file.txt
This will display the matched lines and also five lines before and after the matched lines.
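
To compare the three context options side by side, here is a small sketch on a made-up seven-line log. The -A, -B and -C options are extensions found in GNU and BSD grep rather than part of POSIX, so this assumes one of those implementations.

```shell
# Hypothetical log file with a single error on line 4.
tmpfile=$(mktemp)
printf 'step 1\nstep 2\nstep 3\nError: failed\nstep 5\nstep 6\nstep 7\n' > "$tmpfile"

before_out=$(grep -B 2 "Error" "$tmpfile")   # match plus 2 lines before it
after_out=$(grep -A 1 "Error" "$tmpfile")    # match plus 1 line after it
around_out=$(grep -C 1 "Error" "$tmpfile")   # match plus 1 line on each side

printf '%s\n---\n%s\n---\n%s\n' "$before_out" "$after_out" "$around_out"
rm -f "$tmpfile"
```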

10. Searching for a string in all files recursively

You can search for a string in all the files under the current directory and its sub-directories with the help of the -r option.

grep -r "string" *
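
One detail worth knowing: the shell does not expand * to hidden files, so grep -r "string" * skips dot files in the top directory, while grep -r "string" . includes them. The sketch below shows this on a made-up directory tree, assuming a grep that supports -r (GNU and BSD grep do). The -l option is used only to keep the output short.

```shell
# Hypothetical directory tree.
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/sub/deeper"
echo "needle" > "$tmpdir/top.txt"
echo "needle" > "$tmpdir/sub/deeper/c.txt"
echo "needle" > "$tmpdir/.hidden.txt"
echo "hay"    > "$tmpdir/sub/other.txt"

# Starting from "." searches everything, including the hidden file.
from_dot=$(cd "$tmpdir" && grep -rl "needle" . | LC_ALL=C sort)

# Starting from "*" misses .hidden.txt, because the shell does not expand * to dot files.
from_star=$(cd "$tmpdir" && grep -rl "needle" * | LC_ALL=C sort)

printf '%s\n---\n%s\n' "$from_dot" "$from_star"
rm -rf "$tmpdir"
```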

11. Inverting the pattern match

You can display the lines that do not match the specified search string pattern using the -v option.

grep -v "string" file.txt

12. Displaying the non-empty lines

You can remove the blank lines using the grep command.

grep -v "^$" file.txt
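
Note that "^$" only matches lines with nothing at all on them. A line that contains just spaces or tabs is kept. The sketch below, on a made-up file, contrasts it with a pattern that also drops whitespace-only lines.

```shell
# Hypothetical file: one truly empty line and one line holding only spaces.
tmpfile=$(mktemp)
printf 'first\n\nsecond\n   \nthird\n' > "$tmpfile"

# Drops only the truly empty line; the spaces-only line survives.
nonempty_count=$(grep -c -v '^$' "$tmpfile")

# Drops empty lines and lines made up solely of whitespace.
nonblank_count=$(grep -c -v '^[[:space:]]*$' "$tmpfile")

echo "non-empty: $nonempty_count, non-blank: $nonblank_count"
rm -f "$tmpfile"
```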

13. Displaying the count of matching lines.

We can find the number of lines that match the given string/pattern.

grep -c "string" file.txt
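
Keep in mind that -c counts matching lines, not individual occurrences. If a line contains the string twice it is still counted once. A common workaround, sketched below on a made-up file, is to print each match on its own line with -o (a GNU/BSD extension) and count those lines with wc -l.

```shell
# Hypothetical file: the first line contains the string twice.
tmpfile=$(mktemp)
printf 'string string\nno match\nstring\n' > "$tmpfile"

# Number of lines that contain at least one match.
line_count=$(grep -c "string" "$tmpfile")

# Number of individual matches; tr strips the padding some wc versions add.
occurrence_count=$(grep -o "string" "$tmpfile" | wc -l | tr -d ' ')

echo "lines: $line_count, occurrences: $occurrence_count"
rm -f "$tmpfile"
```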

14. Display the file names that match the pattern.

We can display just the names of the files that contain the given string/pattern.

grep -l "string" *

15. Display the file names that do not contain the pattern.

We can display the files which do not contain the matched string/pattern.

grep -L "string" *
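
The -l and -L options are complements of each other, as the sketch below shows on three made-up files. The -L option is an extension available in GNU and BSD grep, and its exit status has varied between GNU grep releases, so the example ignores it.

```shell
# Hypothetical files: a.txt and c.txt contain the string, b.txt does not.
tmpdir=$(mktemp -d)
echo "has string"  > "$tmpdir/a.txt"
echo "nothing"     > "$tmpdir/b.txt"
echo "string too"  > "$tmpdir/c.txt"

# Files with at least one matching line.
with_match=$(cd "$tmpdir" && grep -l "string" *)

# Files with no matching line at all.
without_match=$(cd "$tmpdir" && grep -L "string" * || true)

printf 'with: %s\nwithout: %s\n' "$(echo $with_match)" "$without_match"
rm -rf "$tmpdir"
```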

16. Displaying only the matched pattern.

By default, grep displays the entire line which has the matched string. We can make grep display only the matched string by using the -o option.

grep -o "string" file.txt

17. Displaying the line numbers.

We can make the grep command display the line number of each line that contains the matched string using the -n option.

grep -n "string" file.txt

18. Displaying the position of the matched string in the file

The -b option makes grep display the byte offset, counted from the start of the file, before each line of output. Combined with -o, GNU grep prints the offset of the matched string itself.

grep -o -b "string" file.txt
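
The offsets printed by -b are 0-based byte offsets from the start of the file, not positions within the line. The sketch below assumes GNU grep (other implementations may report offsets differently) and uses a made-up two-line file.

```shell
# Hypothetical file; the first line "abc string" plus its newline is 11 bytes long.
tmpfile=$(mktemp)
printf 'abc string\nxx string\n' > "$tmpfile"

# Without -o: offset of the start of each matching line.
line_offsets=$(grep -b "string" "$tmpfile")

# With -o: offset of each match itself.
match_offsets=$(grep -o -b "string" "$tmpfile")

printf '%s\n---\n%s\n' "$line_offsets" "$match_offsets"
rm -f "$tmpfile"
```

The second match is reported at offset 14, that is 11 bytes for the first line plus 3 for "xx ", even though it sits at position 3 within its own line.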

19. Matching the lines that start with a string

The ^ regular expression pattern specifies the start of a line. This can be used in grep to match the lines which start with the given string or pattern.

grep "^start" file.txt

20. Matching the lines that end with a string

The $ regular expression pattern specifies the end of a line. This can be used in grep to match the lines which end with the given string or pattern.

grep "end$" file.txt
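
A short sketch of both anchors on a made-up file. Single quotes are used around the patterns so the shell never tries to interpret the $ sign.

```shell
# Hypothetical sample file.
tmpfile=$(mktemp)
printf 'start here\nnot at start\nthe end\nend of story\n' > "$tmpfile"

# Only the line that begins with "start" matches.
starts_out=$(grep '^start' "$tmpfile")

# Only the line that finishes with "end" matches.
ends_out=$(grep 'end$' "$tmpfile")

printf '%s\n%s\n' "$starts_out" "$ends_out"
rm -f "$tmpfile"
```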


What is UNIX

UNIX is a multi-user, multitasking operating system that runs on a variety of hardware platforms.

1. What command can you use to display the first 3 lines of text from a file and how does it work?
There are two commands that can be used to complete this task:
        head -n 3 test.txt - this uses the "head" command with the "-n 3" parameter, which gives the number of lines to be displayed (the older form "head -3" also works on most systems);
        sed '4,$ d' test.txt - this command uses the sed stream editor to perform the task. With an empty script (sed '' test.txt) the whole file would have been displayed; however, in our example the delete command (d) makes sed delete everything between the 4th and the last line (the last line is addressed by $), leaving only the first 3 lines of the file. It is important to mention that sed does not actually delete the lines from the file itself, but just from the output.
2. How can you remove the 7th line from a file?
The easiest way is by using the following command: sed -i '7 d' test.txt
Unlike the previous sed command, this command also has the "-i" parameter, which tells sed to make the change in the actual file. (This is the GNU sed form; BSD sed expects a backup suffix argument after -i.)
3. How do you reverse a string?
You can reverse a string by piping two commands together: echo "Mary" | rev
The first command will generate the output "Mary", which will become the input for the rev command, making it return the reverse: "yraM".
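
The first two answers can be checked with a quick sketch like the one below. It uses a made-up eight-line file and leaves out -i, so the file itself is never modified and the commands behave the same with GNU and BSD sed.

```shell
# Hypothetical eight-line test file: "line 1" .. "line 8".
tmpfile=$(mktemp)
printf 'line %s\n' 1 2 3 4 5 6 7 8 > "$tmpfile"

# Both commands print only the first three lines.
head_out=$(head -n 3 "$tmpfile")
sed_out=$(sed '4,$ d' "$tmpfile")

# Prints the file without its 7th line; the file on disk is unchanged.
without_seventh=$(sed '7 d' "$tmpfile")

printf '%s\n---\n%s\n' "$head_out" "$without_seventh"
rm -f "$tmpfile"
```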
AIX (Advanced Interactive eXecutive)
AIX is an open operating system from IBM that is based on a version of UNIX.
AIX/ESA was designed for IBM's System/390 or large server hardware platform.
BASIC FILE HANDLING
ls
- list files in directory; use with options
-l (long format)
-a (list . files too)
-r (reverse order)
-t (newest appears first)
-d (list a directory itself, not its contents)
-i (show inodes)

- used to control input by pages - like the dos /p argument with dir. e.g. 

pwd
- show present working directory. e.g.
$ pwd
/usr/live/data/epx/vss2

cd <Directory>
- change the current working directory (without arguments, this is the same as $ cd $HOME or $ cd ~)

mv <source> <destination>
- move a file from one location to another. e.g.
$ mv /tmp/jon/handycommands.txt . # move handycommands.txt in /tmp/jon to current directory
$ mv -f vihelp vihelp.txt # Move file vihelp to vihelp.txt (forced) 

Options
·         -f (to force the move to occur)
Note that mv takes no -r or -p option (those belong to cp): it moves a directory together with its contents and normally keeps the permissions.

rm <filename>
- removes a file. e.g.
$ rm /tmp/jon/*.unl # remove all *.unl files in /tmp/jon
$ rm -r /tmp/jon/usr # remove /tmp/jon/usr and everything beneath it
Options
·         -f (to force the removal of the file)
·         -r (to recursively remove a directory)

du <Directory>
- Recursively lists directories and their sizes. e.g.
$ du /etc # list recursively all directories off /etc
712 /etc/objrepos
64 /etc/security/audit
536 /etc/security
104 /etc/uucp
8 /etc/vg
232 /etc/lpp/diagnostics/data
240 /etc/lpp/diagnostics
248 /etc/lpp
16 /etc/aliasesDB
16 /etc/acct
8 /etc/ncs
8 /etc/sm
8 /etc/sm.bak
4384 /etc
The sizes displayed are in 512-byte blocks. To view them in 1024-byte (1 KB) blocks use the option -k

lp -d<Printername> <Filename>
- send file to printer. e.g.
$ lp -dhplas14 /etc/motd # send file /etc/motd to printer hplas14
$ lp /etc/motd # send file /etc/motd to default printer

chmod <Octal Permissions> <file(s)>
- change file permissions. e.g.
$ chmod 666 handycommands
changes the permissions (seen by ls -l) of the file handycommands to -rw-rw-rw-
r = 4, w = 2, x = 1. To give read and write permission we add r + w = 6. To give read-write permission to the user (owner), the group and all others, we use one 6 for each, which gives 666, as in the command above.
$ chmod 711 a.out
Changes permissions to: 
-rwx--x--x
Additional explanation of file permissions and the user/group/all meaning is given in the description of ls -l
You may specify chmod differently, by expressing it in terms of + and - operators. For example
$ chmod u+s /usr/bin/su
will set the setuid bit on su (often confused with the sticky bit, which is shown as t), which lets the program run with the access rights of the owner of the file. What it means is "add s permission to user". So a file that started off with permissions of "-rwxr-xr-x" will change to "-rwsr-xr-x" when the above command is executed. You may use "u" for owner permissions, "g" for group permissions, "o" for others and "a" for all.
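
The octal and symbolic forms can be tried safely on a scratch file. The sketch below uses a temporary file created just for this purpose and reads back the first ten characters of the ls -l output, which is the permission string.

```shell
# Scratch file to experiment on.
tmpfile=$(mktemp)

chmod 666 "$tmpfile"                       # rw for user, group and others
perms_666=$(ls -l "$tmpfile" | cut -c1-10)

chmod 711 "$tmpfile"                       # rwx for user, x only for group and others
perms_711=$(ls -l "$tmpfile" | cut -c1-10)

chmod go-x "$tmpfile"                      # symbolic: take x away from group and others
perms_gox=$(ls -l "$tmpfile" | cut -c1-10)

chmod a+r "$tmpfile"                       # symbolic: give r to everybody
perms_ar=$(ls -l "$tmpfile" | cut -c1-10)

printf '%s\n' "$perms_666" "$perms_711" "$perms_gox" "$perms_ar"
rm -f "$tmpfile"
```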

chown <Login Name> <file(s)>
- Change ownership of a file. Must be done as root. e.g.
$ chown informix *.dat # change all files ending .dat to be owned by informix

chgrp <Group Name> <file(s)>
- Change group ownership of a file. Must be done as root. e.g.
chgrp sys /.netrc # change file /.netrc to be owned by the group sys

mvdir <Source Directory> <Destination Directory>
- move a directory. This can only be done within a volume group. To move a directory between volume groups, use mv (which takes no -r option), or copy the tree with cpio and then remove the original:
find <dirname> -print | cpio -pdumv <dirname2>; rm -r <dirname>

cpdir <Source Directory> <Destination Directory>
- copy a directory. See mvdir

rmdir <Directory>
- removes a directory, but only if it is empty. To remove a directory together with its contents use rm -r instead

mkdir <Directory>
- Creates a directory. e.g.
$ mkdir /tmp/jon/ # create directory called /tmp/jon/ 

head -<Number> <FileName>
- prints out the first few lines of a file to screen. Specify a number to indicate how many lines (default is 10). e.g. If you sent something to a labels printer and it wasn't lined up, then you could print the first few labels again using:
$ head -45 label1.out | lp -dlocal1

tail -<Number> <FileName>
- prints out the end of a file. Very similar to head, but with a very useful option '-f' which allows you to follow the end of a file as it is being written. e.g.
$ tail -f vlink.log # follow end of vlink.log file as it is created.

wc -<options> <FileName>
- Word Count (wc) program. Counts the number of chars, words, and lines in a file or in a pipe. Options:
·         -l (lines)
·         -c (chars)
·         -w (words)
To find out how many files there are in a directory do ls | wc -l
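
A quick sketch of the three counters and of the ls | wc -l idiom, using made-up files. Some wc implementations pad their output with spaces, so the example strips them with tr.

```shell
# Hypothetical directory with three files; one of them has known contents.
tmpdir=$(mktemp -d)
printf 'one two\nthree\n' > "$tmpdir/sample.txt"
touch "$tmpdir/a.dat" "$tmpdir/b.dat"

line_count=$(wc -l < "$tmpdir/sample.txt" | tr -d ' ')   # lines
word_count=$(wc -w < "$tmpdir/sample.txt" | tr -d ' ')   # words
char_count=$(wc -c < "$tmpdir/sample.txt" | tr -d ' ')   # bytes, newlines included

# ls prints one name per line when its output goes to a pipe.
file_count=$(ls "$tmpdir" | wc -l | tr -d ' ')

echo "lines=$line_count words=$word_count chars=$char_count files=$file_count"
rm -rf "$tmpdir"
```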

split -<split> <FileName>
- Splits a file into several files. e.g.
$ split -5000 CALLS1 # will split file CALLS1 into smaller files of 5000 lines each called xaa, xab, xac, etc.
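
The same idea on a small scale, as a sketch: a made-up five-line file is split into pieces of two lines each. The example uses the -l <lines> spelling, which is the standard equivalent of the -<number> form shown above.

```shell
# Hypothetical five-line input file in a scratch directory.
tmpdir=$(mktemp -d)
printf 'call %s\n' 1 2 3 4 5 > "$tmpdir/CALLS1"

# Split into pieces of 2 lines; split names them xaa, xab, xac, ...
(cd "$tmpdir" && split -l 2 CALLS1)

pieces=$(cd "$tmpdir" && ls x*)
last_piece=$(cat "$tmpdir/xac")      # the leftover fifth line ends up here

printf '%s\n---\n%s\n' "$pieces" "$last_piece"
rm -rf "$tmpdir"
```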

cut <options> <FileName>
- cuts the file or pipe into various fields. e.g.
$ cut -d "|" -f1,2,3 active.unl # will take the file active.unl, which is delimited by pipe symbols, and print the first 3 fields
Options:
·         -d <delimiter>
·         -f <fields>
Not too useful when fields are separated by a variable amount of white space, as the delimiter must be a single character (defaults to tab). Alternatively, you can 'cut' up files by character positioning (useful with a fixed width file). e.g.
$ cut -c8-28 "barcode.txt" # would cut columns 8 to 28 out of the barcode.txt file.
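
Here is a small sketch of both modes on made-up data, together with one common workaround for whitespace-separated input: squeezing runs of spaces with tr -s before handing the text to cut.

```shell
# Hypothetical pipe-delimited file.
tmpfile=$(mktemp)
printf 'id1|Fred|Sales|London\nid2|Jane|IT|Leeds\n' > "$tmpfile"

# Field mode: the first three pipe-delimited fields.
first_three=$(cut -d "|" -f1,2,3 "$tmpfile")

# Character mode: columns 5 to 8 of every line.
columns=$(cut -c5-8 "$tmpfile")

# Whitespace workaround: collapse repeated spaces, then cut on a single space.
second_word=$(printf 'a   b    c\n' | tr -s ' ' | cut -d ' ' -f2)

printf '%s\n---\n%s\n---\n%s\n' "$first_three" "$columns" "$second_word"
rm -f "$tmpfile"
```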
paste <file1> <file2>
- paste will join two files together horizontally rather than just tacking one on to the end of the other. e.g. If you had one file with two lines:
Name:
Employee Number:
and another file with the lines:
Fred Bloggs
E666
then by doing:
$ paste file1 file2 > file3 # this would then produce (in file3, with a tab between the two columns):
Name: Fred Bloggs
Employee Number: E666
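
The example can be reproduced with the sketch below. By default paste joins the columns with a tab, so to get a single space between them the example passes -d ' '.

```shell
# The two files from the example above, created in a scratch directory.
tmpdir=$(mktemp -d)
printf 'Name:\nEmployee Number:\n' > "$tmpdir/file1"
printf 'Fred Bloggs\nE666\n' > "$tmpdir/file2"

# Default behaviour: columns separated by a tab character.
tab_joined=$(paste "$tmpdir/file1" "$tmpdir/file2")

# With -d ' ' the columns are separated by a single space instead.
space_joined=$(paste -d ' ' "$tmpdir/file1" "$tmpdir/file2")

printf '%s\n' "$space_joined"
rm -rf "$tmpdir"
```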

who
- list users who are currently logged on (useful with option 'am i', i.e. 'who am i'; the separate 'whoami' command prints just your user name)

exit
- end current shell process. If you log in, then type this command, it will return you to login. ^D (control-D) and logout (in some shells) do the same.

rlogin <hostname>
- login to a remote machine, e.g.
$ rlogin hollandrs # log in to machine called hollandrs
Useful with -l option to specify username - e.g.
$ rlogin cityrs -l ismsdev # log in to machine cityrs as user ismsdev
For further info about trusted hosts see the .rhosts file and /etc/hosts.equiv.

telnet
- very similar to rlogin except that it is more flexible (just type telnet with no arguments and then '?' to see the options). Useful because you can specify a telnet to a different port.

ftp
- File Transfer Protocol - a quick and easy method for transferring files between machines. The .netrc file in your $HOME directory holds auto-login details and initial commands. (Type ftp without arguments and then '?' to see options.)

rcp
- Remote copy. Copies a file from one unix box to another, as long as they trust each other (see the .rhosts file or /etc/hosts.equiv). Options
·         -f (to force the copy to occur)
·         -r (to recursively copy a directory)
·         -p (to attempt to preserve permissions when copying)
logout
syntax: logout
This command should be issued to terminate your session and allow the next user to access the computer. The logout command will not execute if there are any stopped jobs; to log out, you must first kill or resume them. The logout command will not terminate jobs running in the background, so remember to kill any background jobs you do not want left running before logging out.
kill
syntax: kill [-signal] processID

This command is used to terminate a process. For the processID, you can specify a percent sign followed by the job ID returned by the jobs command, or the process ID returned by the ps command. Some processes may ignore the SIGTERM signal sent by the kill command. To terminate these processes, you must specify the -KILL signal (kill -KILL processID, or equivalently kill -9 processID), which cannot be ignored.
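
As a rough sketch of the two signals, the example below starts two throwaway sleep processes in the background and terminates them. A process killed by a signal is reported by the shell with an exit status of 128 plus the signal number, so SIGTERM (15) shows up as 143 and SIGKILL (9) as 137.

```shell
# Start a throwaway background process and remember its process ID.
sleep 60 &
term_pid=$!
kill "$term_pid"                 # default signal is SIGTERM (15)
term_status=0
wait "$term_pid" || term_status=$?

# Same again, this time with the signal that cannot be caught or ignored.
sleep 60 &
kill_pid=$!
kill -KILL "$kill_pid"           # SIGKILL (9), same as kill -9
kill_status=0
wait "$kill_pid" || kill_status=$?

echo "SIGTERM status: $term_status, SIGKILL status: $kill_status"
```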

Big Data FAQ

1) What is a Big Data Architecture? Generally speaking, a Big Data Architecture is one which involves the storing and processing of data...