UNIX Course Part II Working with files
Andy Hauser LAFUGA & Chair of Animal Breeding and Husbandry Gene Center Munich LMU June, 2016
Recall
ls          list files, information about files
cd          change working directory
mkdir       make directory
whoami; id  information about the user
groups; id  information about group memberships
df -h       information about disks
man         manual pages
[Figure: system overview. Labels: CPU, RAM, disks, net, keyboard, mouse (hardware); communication; kernel drivers, kernel resource management, kernel user ABI/API; command line interface, desktop environment.]
command: stdin (0), stdout (1), stderr (2)
>    stdout to a file: create / overwrite
>>   stdout to a file: append
2>   stderr to a file: create / overwrite
2>>  stderr to a file: append
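The four redirection operators can be tried out with a minimal sketch. The temporary directory and the file names exists, missing, out.txt and err.txt are assumptions made up for this illustration.

```shell
# Work in a throwaway directory (hypothetical file names).
workdir=$(mktemp -d)
cd "$workdir"
touch exists

# ls reports the existing file on stdout and the missing one on stderr.
# "|| true" only hides the non-zero exit status of ls.
ls exists missing > out.txt 2> err.txt || true    # create / overwrite
ls exists missing >> out.txt 2>> err.txt || true  # append

# Each run contributed one line to each file.
out_lines=$(wc -l < out.txt | tr -d ' ')
err_lines=$(wc -l < err.txt | tr -d ' ')
echo "stdout lines: $out_lines, stderr lines: $err_lines"
```

Replacing the >> and 2>> in the second call with > and 2> would leave only one line in each file.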
$ command1 | command2 | command3

$ command1 > /tmp/file1
$ command2 < /tmp/file1 > /tmp/file2
$ command3 < /tmp/file2
Advantages:
The stdout of one command is connected to the stdin of the next command with a pipe "|".
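As a small sketch of this equivalence, the following runs the same three steps once through temporary files and once through pipes. The commands sort and uniq and the sample input are chosen only for illustration; the file names are hypothetical.

```shell
workdir=$(mktemp -d)

# Variant 1: intermediate results go through temporary files.
printf 'b\na\nb\n' > "$workdir/file1"
sort < "$workdir/file1" > "$workdir/file2"
via_files=$(uniq < "$workdir/file2")

# Variant 2: the same three steps connected with pipes.
via_pipe=$(printf 'b\na\nb\n' | sort | uniq)

echo "$via_files"
echo "$via_pipe"
```

Both variants print the same two lines; the pipe version needs no file names and no cleanup.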
$ echo foo
foo
$ echo foo bar
foo bar
$ echo foo     bar
foo bar
$ echo "foo     bar"
foo     bar
$ echo -n "foo bar"
foo bar$
$ echo hello > hello
$ echo world >> hello
Arguments are separated by one or more spaces. An argument that contains spaces must be protected with quotes.
>   stdout goes to a file, creating / overwriting it
>>  stdout goes to a file, appending to it
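The quoting and redirection rules can be checked with a short sketch. The file name greeting.txt and the temporary directory are assumptions for this example only.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Unquoted: the shell splits on spaces, echo gets two arguments
# and joins them with a single space.
unquoted=$(echo foo     bar)
# Quoted: echo gets one argument, the inner spaces survive.
quoted=$(echo "foo     bar")

echo hello > greeting.txt    # creates greeting.txt
echo world >> greeting.txt   # appends a second line
appended=$(cat greeting.txt)

echo again > greeting.txt    # overwrites the previous content
overwritten=$(cat greeting.txt)
```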
$ cat hello
hello
world
$ echo hello > hello
$ cat hello
hello
$ cat < hello
$ echo world > world
$ cat hello world
hello
world
$ cat < hello < world
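Several input redirections on one command (cat < hello < world) are not necessarily supported by the shell: zsh reads both files, while bash only uses the last one.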
$ touch touch.whatever
$ echo abc > abc.txt
$ cp abc.txt abc.copy.txt
$ ls -l abc.txt
Note that default permissions are influenced by umask.
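A minimal sketch of the umask effect, assuming two common mask values (022 and 077) and hypothetical file names. The umask is changed only inside subshells so the calling shell keeps its setting.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# touch asks for rw-rw-rw-; the umask removes bits from that request.
(umask 022; touch default.txt)   # removes write for group and others
(umask 077; touch private.txt)   # removes everything for group and others

# The first ten characters of ls -l are file type and permissions.
default_mode=$(ls -l default.txt | cut -c 1-10)
private_mode=$(ls -l private.txt | cut -c 1-10)
echo "$default_mode $private_mode"
```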
$ rm touch.whatever
$ ls -l abc.txt
Fields of the ls -l output, left to right: permissions (user, group, others), link count, user (owner), group, size in bytes, modification date, filename.
The closest matching class applies: user first, then group, then others. E.g. rwx---rwx denies access to every group member, but not to others.
$ ls -l world
$ chmod a-rwx world
$ ls -l world
$ chmod o+r world
$ ls -l world
$ chmod ug+rwx world
$ ls -l world
$ chmod g-w world
$ ls -l world
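The effect of each chmod step can be followed with a small sketch that records the permission string after every change. The helper function mode and the temporary directory are assumptions added for this illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir"
echo world > world

# Hypothetical helper: print only the permission column of ls -l.
mode() { ls -l world | cut -c 1-10; }

chmod a-rwx world;  m1=$(mode)   # remove everything for all classes
chmod o+r world;    m2=$(mode)   # others may read
chmod ug+rwx world; m3=$(mode)   # user and group get full access
chmod g-w world;    m4=$(mode)   # group loses write again
printf '%s\n' "$m1" "$m2" "$m3" "$m4"
```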
$ cp abc.txt abc.copy.txt
$ scp abc.txt housedata:abc.backup.txt
$ cp -r foo/ bar/
$ scp -r foo/ housedata:bar/
$ rsync -av foo/ housedata:bar/

bar/ needs to exist!
bar/ will be created
can copy over SSH
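Whether the target directory already exists changes where a recursive copy ends up. The sketch below shows this for a local cp -r only, since scp and rsync need a remote host; the directory names new_copy and existing are made up, and the source is written without a trailing slash because cp implementations treat that slash differently.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir foo
echo abc > foo/abc.txt

# Target does not exist yet: it is created as a copy of foo.
cp -r foo new_copy

# Target already exists: foo is copied into it as a subdirectory.
mkdir existing
cp -r foo existing

ls new_copy existing
```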
$ echo *
.CFUserTextEncoding .DS_Store .RData .Rapp.history .Rhistory .Rprofile .Trash .Xauth ierc .emai .gem .gitconfig .gnupg .hgrc .ionic .lesshst .lldb .local .npm .plugman .rnd .rstudio-desktop .screenrc .ssh .subversion .toprc .vim .viminfo .vimrc .wmii .xbindkeysrc .xpdfrc .zcompdump .zshrc .zshrc.local AndroidStudioProjects Attachments Bout2 Desktop Documents Downloads HistTexte.pdf Library MA Movies Music Pictures Public URLS2 admix backup brew-install.rb bta4_bending_andy_filtered_10_logl.png bta4_bending_andy_filtered_logl.png bta4_bending_andy_logl.png bta4_bending_logl.png chess config dot emai github ivica lm.RData src titel-small.png tmp wrk
$ echo */
.Trash/ .cache/ .config/ .cordova/ .cups/ .emai/ .gem/ .gnupg/ .ionic/ .lldb/ .local/ .npm/ .plugman/ .rstudio-desktop/ .ssh/ .subversion/ .vim/ AndroidStudioProjects/ Attachments/ Bout2/ Desktop/ Documents/ Downloads/ Library/ MA/ Movies/ Music/ Pictures/ Public/ admix/ backup/ chess/ dot/ emai/ github/ ivica/ src/ tmp/ wrk/
$ echo .*
.CFUserTextEncoding .DS_Store .RData .Rapp.history .Rhistory .Rprofile .Trash .Xauth ierc .emai .gem .gitconfig .gnupg .hgrc .ionic .lesshst .lldb .local .npm .plugman .rnd .rstudio-desktop .screenrc .ssh .subversion .toprc .vim .viminfo .vimrc .wmii .xbindkeysrc .xpdfrc .zcompdump .zshrc .zshrc.local
$ echo *.png
bta4_bending_andy_filtered_10_logl.png bta4_bending_andy_filtered_logl.png bta4_bending_andy_logl.png bta4_bending_logl.png titel-small.png
$ echo bta4_bending_*
bta4_bending_andy_filtered_10_logl.png bta4_bending_andy_filtered_logl.png bta4_bending_andy_logl.png bta4_bending_logl.png
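The same patterns are easier to follow in a small, controlled directory. This is a sketch with made-up file names; it assumes a shell with default settings, where * does not match names that start with a dot (shells can be configured otherwise).

```shell
export LC_ALL=C   # fixed sort order for the expanded names
workdir=$(mktemp -d)
cd "$workdir"
touch plot_a.png plot_b.png notes.txt .hidden
mkdir data

all=$(echo *)        # every name that does not start with a dot
dirs=$(echo */)      # directories only
hidden=$(echo .h*)   # dotfiles need an explicit leading dot
pngs=$(echo *.png)   # names ending in .png
plots=$(echo plot_*) # names starting with plot_
printf '%s\n' "$all" "$dirs" "$hidden" "$pngs" "$plots"
```

The shell expands the pattern before echo runs, so echo is a safe way to preview what a pattern would match before using it with rm or cp.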
$ echo ?
zsh: no matches found: ?
$ touch a
$ echo ?
a
$ touch b
$ echo ?
a b
$ echo [a-z]
a b
$ echo [abc]
a b
$ echo [^abc]
zsh: no matches found: [^abc]
$ echo [^b-z]
? matches any one character.
[] matches one character from a set and allows ranges like a-z.
[^] matches one character that is not in the set (the portable POSIX spelling is [!]).
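A short sketch of these three pattern types, using made-up file names in a temporary directory. It uses the [!...] spelling for negation so that it also runs in plain POSIX shells.

```shell
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir"
touch a b ab

single=$(echo ?)        # names that are exactly one character long
two=$(echo ??)          # names that are exactly two characters long
set_abc=$(echo [abc])   # one character out of a, b, c
not_b_z=$(echo [!b-z])  # one character outside the range b-z
printf '%s\n' "$single" "$two" "$set_abc" "$not_b_z"
```

The file ab never matches ? or a bracket expression, because each of them stands for exactly one character.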
$ wget ftp://ftp.ensemblgenomes.org/pub/release-29/bacteria//gtf/bacteria_86_collection/escherichia_coli_gca_000770055/Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz
bacteria_86_collection/escherichia_coli_gca_000770055/Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz
           => 'Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz'
Resolving ftp.ensemblgenomes.org... 193.62.197.94
Connecting to ftp.ensemblgenomes.org|193.62.197.94|:21... connected.
Logging in as anonymous ... Logged in!
==> SYST ... done.    ==> PWD ... done.
==> TYPE I ... done.  ==> CWD (1) /pub/release-29/bacteria//gtf/bacteria_86_collection/escherichia_coli_gca_000770055 ... done.
==> SIZE Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz ... 323321
==> PASV ... done.    ==> RETR Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz ... done.
Length: 323321 (316K) (unauthoritative)
Escherichia_coli_gca_0007700 100%[=============================================>] 315.74K 64.5KB/s in 4.9s
2016-06-14 21:22:19 (64.5 KB/s) - 'Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz' saved [323321]
$ gunzip Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf.gz
$ ls Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf
$ head Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf
#!genome-build ASM77005v1
#!genome-version GCA_000770055.1
#!genome-date 2014-11
#!genome-build-accession GCA_000770055.1
#!genebuild-last-updated 2014-11
Contig0000020 ena gene 680 1846 . + . gene_id "JQ56_06920"; gene_version "1"; gene_name "nhaA"; gene_source "ena"; gene_biotype "protein_coding";
Contig0000020 ena transcript 680 1846 . + . gene_id "JQ56_06920"; gene_version "1"; transcript_id "KGP19944"; transcript_version "1"; gene_name "nhaA"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "nhaA-1"; transcript_source "ena"; transcript_biotype "protein_coding";
Contig0000020 ena exon 680 1846 . + . gene_id "JQ56_06920"; gene_version "1"; transcript_id "KGP19944"; transcript_version "1"; exon_number "1"; gene_name "nhaA"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "nhaA-1"; transcript_source "ena"; transcript_biotype "protein_coding"; exon_id "KGP19944-1"; exon_version "1";
Contig0000020 ena CDS 680 1843 . + 0 gene_id "JQ56_06920"; gene_version "1"; transcript_id "KGP19944"; transcript_version "1"; exon_number "1"; gene_name "nhaA"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "nhaA-1"; transcript_source "ena"; transcript_biotype "protein_coding"; protein_id "KGP19944"; protein_version "1";
Contig0000020 ena start_codon 680 682 . + 0 gene_id "JQ56_06920"; gene_version "1"; transcript_id "KGP19944"; transcript_version "1"; exon_number "1"; gene_name "nhaA"; gene_source "ena"; gene_biotype "protein_coding"; transcript_name "nhaA-1"; transcript_source "ena"; transcript_biotype "protein_coding";
$ head -5 Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf
#!genome-build ASM77005v1
#!genome-version GCA_000770055.1
#!genome-date 2014-11
#!genome-build-accession GCA_000770055.1
#!genebuild-last-updated 2014-11
$ cut -f 1-7 Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf > ecoli_1-7.gtf
$ head ecoli_1-7.gtf
#!genome-build ASM77005v1
#!genome-version GCA_000770055.1
#!genome-date 2014-11
#!genome-build-accession GCA_000770055.1
#!genebuild-last-updated 2014-11
Contig0000020 ena gene 680 1846 . +
Contig0000020 ena transcript 680 1846 . +
Contig0000020 ena exon 680 1846 . +
Contig0000020 ena CDS 680 1843 . +
Contig0000020 ena start_codon 680 682 . +
Fields (-f) can be given as ranges like 1-7 or as a comma-separated list like 1,3,5. The delimiter is a TAB (\t) by default, but -d can set it to nearly any single character.
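The field and delimiter options can be tried on a tiny table. The sample rows below are invented for this sketch and are not taken from the GTF file.

```shell
# Three TAB-separated columns (made-up sample data).
table=$(printf 'chr1\tgene\t100\nchr2\texon\t200\n')

# A range of fields: columns 1 to 2.
first_two=$(printf '%s\n' "$table" | cut -f 1-2)
# A comma-separated list of fields: columns 1 and 3.
first_and_third=$(printf '%s\n' "$table" | cut -f 1,3)
# -d switches the delimiter, here to a comma.
csv_second=$(printf 'a,b,c\n' | cut -d , -f 2)

printf '%s\n' "$first_two" "$first_and_third" "$csv_second"
```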
$ echo a > abc
$ echo b >> abc
$ echo c >> abc
$ grep b abc
b
$ grep c abc
c
$ grep -v b abc
a
c
$ cut -f 1-7 Escherichia_coli_gca_000770055.GCA_000770055.1.29.gtf | grep -v '^#' > ecoli_1-7.gtf
$ head ecoli_1-7.gtf
Contig0000020 ena gene 680 1846 . +
Contig0000020 ena transcript 680 1846 . +
Contig0000020 ena exon 680 1846 . +
Contig0000020 ena CDS 680 1843 . +
Contig0000020 ena start_codon 680 682 . +
Contig0000020 ena stop_codon 1844 1846 . +
Contig0000020 ena gene 1912 2811 . +
Contig0000020 ena transcript 1912 2811 . +
Contig0000020 ena exon 1912 2811 . +
Contig0000020 ena CDS 1912 2808 . +
The expression can be a simple string like "a". Certain characters are interpreted, though. E.g. ^ means match at the beginning of the line. The option -v inverts the match and prints the lines that do not match.
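The difference between a plain string and an anchored expression shows up on lines where the character also occurs in the middle. The three sample lines in this sketch are invented for illustration.

```shell
# Made-up input: one comment line, two data lines.
lines=$(printf '#comment\ngene a\nexon #1\n')

# A plain '#' matches anywhere in the line.
with_hash=$(printf '%s\n' "$lines" | grep '#')
# '^#' matches only when the line starts with '#'.
starts_hash=$(printf '%s\n' "$lines" | grep '^#')
# -v keeps the lines that do not match.
no_comment=$(printf '%s\n' "$lines" | grep -v '^#')

printf '%s\n' "$no_comment"
```

The pattern is quoted so that the shell passes it to grep unchanged.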
$ head ecoli_1-7.gtf | sort -k 3
Contig0000020 ena CDS 1912 2808 . +
Contig0000020 ena CDS 680 1843 . +
Contig0000020 ena exon 1912 2811 . +
Contig0000020 ena exon 680 1846 . +
Contig0000020 ena gene 1912 2811 . +
Contig0000020 ena gene 680 1846 . +
Contig0000020 ena start_codon 680 682 . +
Contig0000020 ena stop_codon 1844 1846 . +
Contig0000020 ena transcript 1912 2811 . +
Contig0000020 ena transcript 680 1846 . +
$ head ecoli_1-7.gtf | sort -n -k 5
Contig0000020 ena start_codon 680 682 . +
Contig0000020 ena CDS 680 1843 . +
Contig0000020 ena exon 680 1846 . +
Contig0000020 ena gene 680 1846 . +
Contig0000020 ena stop_codon 1844 1846 . +
Contig0000020 ena transcript 680 1846 . +
Contig0000020 ena CDS 1912 2808 . +
Contig0000020 ena exon 1912 2811 . +
Contig0000020 ena gene 1912 2811 . +
Contig0000020 ena transcript 1912 2811 . +
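The listing sorts once as text (-k 3) and once numerically (-n -k 5), which is why "CDS 1912" comes before "CDS 680" in the first output. A minimal sketch with invented values makes the difference visible; LC_ALL=C is set here as an assumption to get a fixed, byte-wise text order.

```shell
export LC_ALL=C
numbers=$(printf '680\n1843\n99\n')

# Text sort compares character by character: "1" < "6" < "9".
as_text=$(printf '%s\n' "$numbers" | sort)
# -n compares the values as numbers.
as_numbers=$(printf '%s\n' "$numbers" | sort -n)

# -k 2 starts the sort key at the second field
# (and continues to the end of the line).
by_second=$(printf 'x gene\ny CDS\nz exon\n' | sort -k 2)

printf '%s\n' "$as_text" "$as_numbers" "$by_second"
```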
$ head ecoli_1-7.gtf | cut -f 5 | uniq
1846
1843
682
1846
2811
2808
$ head ecoli_1-7.gtf | cut -f 5 | sort -n | uniq
682
1843
1846
2808
2811
$ head ecoli_1-7.gtf | cut -f 5 | sort -n | uniq -c
   1 682
   1 1843
   4 1846
   1 2808
   3 2811
$ head ecoli_1-7.gtf | cut -f 5 | sort -n | uniq -c | sort -n
   1 1843
   1 2808
   1 682
   3 2811
   4 1846
$ cut -f 7 ecoli_1-7.gtf | sort | uniq -c
14313 +
13246 -
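uniq only merges identical lines that are next to each other, which is why the examples sort first. The following sketch shows this on four invented strand values; the final awk step is an addition that only strips the padding uniq -c puts in front of the counts.

```shell
export LC_ALL=C
strands=$(printf '+\n-\n+\n+\n')

# Without sorting, the "+" before and after "-" stay separate.
unsorted=$(printf '%s\n' "$strands" | uniq | wc -l | tr -d ' ')
# After sorting, every distinct value appears exactly once.
sorted=$(printf '%s\n' "$strands" | sort | uniq | wc -l | tr -d ' ')
# Count each value, then order by count (smallest first).
counts=$(printf '%s\n' "$strands" | sort | uniq -c | sort -n | awk '{print $1, $2}')

echo "$unsorted $sorted"
printf '%s\n' "$counts"
```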
Tab: automatic completion. The most important key. Works for commands and paths in most shells.
Ctrl + R: search in the history
Ctrl + W: delete a word
Ctrl + A: jump to the beginning of the line
Ctrl + E: jump to the end of the line
"Der UNIX-Werkzeugkasten. Programmieren mit UNIX", Brian W. Kernighan, Rob Pike. Year: 1986. ISBN: 3446142738
"The Unix Programming Environment", Brian W. Kernighan, Rob Pike. Year: 1983. ISBN: 978-0139376818
"UNIX Power Tools, 2nd Edition", Jerry Peek, Tim O'Reilly & Mike Loukides. Year: 1997. ISBN: 978-1565922600