Manipulating a CD-Text string to conform to CDDB style

“Interactive is overstated”  -Aixa Ardín

I am using the abcde encoder on Ubuntu to finally digitize my CD collection into FLAC files with ReplayGain applied. Many of my CDs come from Spanish-speaking countries, so they are either not available in CDDB or have accents and other special characters in the text that will be used for tagging the tracks. When CDDB data is not found, abcde falls back to the CD-Text data included on the CD. The problems I have encountered with abcde's CD-Text reading are:

  1. Accents and quotes cause invalid-encoding errors.
  2. The disc information in CD-Text for ARTIST and ALBUMTITLE does not conform to CDDB style, which affects both later tagging and directory naming.
  3. Because of #2, ReplayGain will not work correctly from within abcde.
  4. Optional: figure out a way to disable CDDB data retrieval and force CD-Text reading within the script when the CDDB data is way off. (This happens with many international CDs.)
  5. Year and genre must be edited in by hand in nano. Is this because they are not included in the CD-Text info or because they are not processed?
  6. Problem #2 also occurs on occasion with multiple-artist CDs, when the pattern is not one of the 7 patterns abcde offers to format the information.
  • The ReplayGain action does not activate, but maybe I just have my config file set up incorrectly for this.

 

I have worked many hours trying to solve these problems. The solutions will be included in my own copy of the ‘abcde’ script in /usr/bin, now named ‘abcdef’. My solutions do not aim to be elegant or optimized for either resources or speed. In this post I will share a script that tries to solve #2 above.

I have concocted this code by hacking and hammering together many pieces of sed, grep, tr, cat and echo examples out there, without a complete understanding of these tools. I do have a general understanding of their purpose, commands and possibilities, as I DO READ man pages, FAQs and tutorials. But I must say that when complex problems involving special characters come up in forums, many of the solutions offered sadly do not include a complete explanation of what is happening inside the command, leaving us sporadic users of the CLI dumbfoundedly copying commands into scripts without learning a thing. The code has various issues pending solution, as stated in the verbose comments, but it is a starting point. I try explaining to myself what is happening, in the hope of eventually internalizing the knowledge.

Problem #2
#example
#CD-Text style:  DTITLE='Alma Adentro: Songbook of Puerto Rico'[from Miguel Zenon]
#CDDB style:     DTITLE=Led Zeppelin / Presence
#or              DTITLE=The Beatles / With The Beatles

#Desired:        DTITLE=Miguel Zenon / Alma Adentro: Songbook of Puerto Rico

#IF non-cddbstyle Disc title from Cd-text is obtained PERFORM proper restyling

#example
#Cd-text style:		DTITLE='Alma Adentro: Songbook of Puerto Rico'[from Miguel Zenon]
#CDDB style:		DTITLE=Led Zeppelin / Presence  or DTITLE=The Beatles / With The Beatles

# try reorganizing output from cd-text format of ATITLE to conform to  'Album Artist' /
#'Album title'

#issues pending:
#	when title or artist name have quotes there will probably be problems
# 	when the word 'from' is not included there will probably be problems
#	when the  word 'FROM' has different casing there will probably be problems

echo -e "\n\n***********************"

ATITLE="'Alma Adentro: Songbook of Puerto Rico'[from Miguel Zenon]"
echo -e "The string is $ATITLE \n"

#remove old file
rm -f ./temp.txt

#Get the disc title found between single quotes
# the escaped parentheses \( \)
# 	--although most tutorials do not list them as special characters for sed, they are
# 	special to REGULAR EXPRESSIONS, see http://www.regular-expressions.info/brackets.html
# capture what is found between the single quotes WITH the single quotes included;
# tr -d then deletes those quotes
echo "$ATITLE" | sed -n "s/.*\('.*'\).*/\1/p" | tr -d \' > temp.txt
echo "The Album Title is now correctly defined "
head ./temp.txt
echo -e " \n"

#assign to a variable
#no need to prune the newline: $(...) strips trailing newlines, one less problem
NEWVAR=$(head ./temp.txt)

# APPEND somevar="$somevar$somevar"

#Get the ARTIST found between square brackets []
#	using the same syntax as the single-quote extraction above did not work for brackets,
#	even when I double-quoted or escaped the brackets --another thing I do not understand
#The four repeated lines came from the 'p' flag, not from the earlier redirection:
#without -n, sed already prints every line once and 'p' prints it a second time,
#so each of the two sed calls doubled the output. Dropping 'p' leaves a single line.
echo "$ATITLE" | grep -o '\[.*\]' | sed "s/\[from //" | sed "s/\]//" >> temp.txt
echo "$(tail -4 ./temp.txt)"   #confirm the repetition is gone: title line, then artist line
echo -e " \n The Artist is now correctly defined "
tail -1 ./temp.txt  #the last line is the artist
echo -e " \n"

#now reconstruct the string as needed
NEWVAR="$(tail -1 ./temp.txt) / $NEWVAR"

echo "$NEWVAR"
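
For comparison, here is a minimal sketch of the same rearrangement done in a single sed call, with no temporary file. The function name cdtext_to_cddb is hypothetical (it is not part of abcde), and the sketch assumes the CD-Text string always has the exact layout 'Title'[from Artist]. Matching [Ff][Rr][Oo][Mm] is meant to cover the casing issue listed in the pending issues, and a string without the [from ...] part is passed through unchanged rather than mangled.

```shell
#!/bin/sh
# Sketch only: cdtext_to_cddb is a hypothetical helper name, not part of abcde.
# Assumes the CD-Text layout is exactly 'Title'[from Artist].
cdtext_to_cddb() {
    # First \(.*\): the title between the outer single quotes (group 1).
    # [Ff][Rr][Oo][Mm]: the word "from" in any casing.
    # Second \(.*\): the artist, up to the closing bracket (group 2).
    # A string that does not match the pattern is printed unchanged.
    printf '%s\n' "$1" |
        sed "s|^'\(.*\)'\[[Ff][Rr][Oo][Mm]  *\(.*\)]\$|\2 / \1|"
}

ATITLE="'Alma Adentro: Songbook of Puerto Rico'[from Miguel Zenon]"
NEWVAR=$(cdtext_to_cddb "$ATITLE")
echo "$NEWVAR"
```

Because the first .* is greedy, a title with an apostrophe inside should still be captured whole, though that has not been checked against real discs.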

 


