I used \s{2,} to find every run of two or more whitespace characters. The
\s matches a single whitespace character, the 2 sets the minimum number of
repeats to two, and the comma, with no upper bound after it, means two or
more. I then replaced each match with a comma followed by a space.
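As a quick check outside the text editor, the same substitution can be sketched in Python with re.sub. The sample line below is made up for illustration and is not the assignment data.

```python
import re

# Hypothetical sample line: fields separated by runs of spaces.
line = "First String    Second      1.22      3.4"

# \s{2,} matches two or more consecutive whitespace characters,
# so the single space inside "First String" is left alone.
result = re.sub(r"\s{2,}", ", ", line)
print(result)
```

Because \s also matches line breaks, this pattern is best applied one line at a time; on a whole file, a blank line would count as a run of whitespace too.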
I used (\w+), (\w+), (.*) to capture the first word as the first capture,
the second word as the second capture, and the rest of the line as the
third capture. I then replaced it with \2 \1 \(\3) to swap the first and
last names and to put the university name (capture 3) in parentheses.
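A minimal Python sketch of this swap, assuming a made-up name and institution. One difference from the editor: in a Python replacement string the parentheses are written unescaped, since \( would be kept as a literal backslash followed by a parenthesis.

```python
import re

# Hypothetical sample line in "Last, First, Institution" order.
line = "Smith, Jane, Example State University"

# Groups: 1 = first word, 2 = second word, 3 = rest of the line.
pattern = r"(\w+), (\w+), (.*)"

# Emit group 2, then group 1, then group 3 wrapped in parentheses.
swapped = re.sub(pattern, r"\2 \1 (\3)", line)
print(swapped)
```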
I used mp3 as my find and replaced it with mp3 \r to keep the mp3 and add
a line return after each file name, so that each track sits on its own
line.
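The line-splitting step can be sketched in Python as follows. The track names are invented, and "\n" stands in for the editor's \r, which is an assumption about how the editor treats line returns.

```python
import re

# Hypothetical single line holding two tracks back to back.
line = "0001 First Song.mp3 0002 Second Song.mp3"

# Keep each "mp3" and add a space plus a line break after it.
broken = re.sub(r"mp3", "mp3 \n", line)

# The replacement leaves stray spaces around each line, so strip them.
tracks = [t.strip() for t in broken.splitlines()]
print(tracks)
```

A literal mp3 would also match inside a song title that happens to contain those letters, so \.mp3 is a slightly safer search term.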
I used (\d+)\s(.*)(\.mp3) to capture the leading digits, match the space
without capturing it, capture the rest of the song name, and capture the
.mp3 at the end of the song title. I then used \2_\1\3 as the replacement
to put the song name first, add an underscore before the digits, and keep
the .mp3 at the end.
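Here is a short Python sketch of the same rearrangement, using an invented track name rather than the assignment data.

```python
import re

# Hypothetical track line: number, space, title, extension.
line = "0001 First Song.mp3"

# Groups: 1 = digits, 2 = song title, 3 = ".mp3".
# The \s between groups 1 and 2 is matched but not captured, so it is dropped.
pattern = r"(\d+)\s(.*)(\.mp3)"

renamed = re.sub(pattern, r"\2_\1\3", line)
print(renamed)
```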
I used (\w)\w+,(\w+),\d+\.\d+(,\d+) to keep the first letter of the
genus, drop the rest of the genus, keep the species, drop the first
number, and keep the final comma and number. In the replace box, I used
\1_\2\3 to produce the first letter of the genus, an underscore, the
species, and the final number.
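The same pattern can be tried in Python on a placeholder row. The genus and species names below are stand-ins, not the real data.

```python
import re

# Hypothetical row: genus, species, decimal number, integer.
line = "Genusname,speciesname,10.2,44"

# Groups: 1 = first letter of genus, 2 = species, 3 = final comma and number.
# The rest of the genus (\w+) and the decimal (\d+\.\d+) are matched
# outside any group, so they disappear from the output.
pattern = r"(\w)\w+,(\w+),\d+\.\d+(,\d+)"

short = re.sub(pattern, r"\1_\2\3", line)
print(short)
```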
Using the original data, I used (\w)(\w+),(\w{4})(\w+),(\d+).(\d+)(,\d+)
to capture each part separately, making sure to capture the first 4
letters of the species name as their own group. Then, following the order
of the captures, I replaced it with \1_\3\7 to obtain the first letter of
the genus, an underscore, the first 4 letters of the species, a comma, and
the final number.
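A Python sketch of this version, again on a placeholder row. It shows that only groups 1, 3, and 7 reach the output even though seven groups are captured.

```python
import re

# Hypothetical row: genus, species, decimal number, integer.
line = "Genusname,speciesname,10.2,44"

# Groups: 1 = first genus letter, 2 = rest of genus,
# 3 = first four species letters, 4 = rest of species,
# 5 and 6 = digits on either side of the decimal point, 7 = ",44".
pattern = r"(\w)(\w+),(\w{4})(\w+),(\d+).(\d+)(,\d+)"

abbreviated = re.sub(pattern, r"\1_\3\7", line)
print(abbreviated)
```

Two things to watch for: the unescaped . between groups 5 and 6 matches any character (writing \. would be stricter), and (\w{4})(\w+) only matches species names of at least five letters.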
Using the original data, I searched for
(\w{3})(\w+),(\w{3})(\w+),(\d+).(\d+)(,\d+), capturing each part
separately and specifying how many letters of each word I wanted to keep.
To rearrange the digits, I used \1\3\7,\5.\6 as the replacement, which
gave me the final correct format.
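The reordering of the two numbers is easier to see when run on a sample. This Python sketch uses a placeholder row, not the assignment data.

```python
import re

# Hypothetical row: genus, species, decimal number, integer.
line = "Genusname,speciesname,10.2,44"

# Groups 1 and 3 hold the first three letters of genus and species,
# groups 5 and 6 hold the two halves of the decimal, group 7 is ",44".
pattern = r"(\w{3})(\w+),(\w{3})(\w+),(\d+).(\d+)(,\d+)"

# Fuse the two three-letter prefixes, then the integer, then the
# decimal rebuilt from its two halves.
reordered = re.sub(pattern, r"\1\3\7,\5.\6", line)
print(reordered)
```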
For the spaces, I typed a space into the find bar and replaced it with
nothing. For the special characters, I went character by character using
versions of \% until all of the characters were gone. When it came to
replacing the random underscores, I made sure to apply \_ only to the
lines of data. For the NA values in the binary column, I replaced every NA
with 0.
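The cleanup steps above can be sketched in Python as one short sequence. This is an illustration under assumptions: the sample row and the list of special characters are invented, a single character class replaces the character-by-character passes, and the word boundaries around NA are an added safeguard, not part of the original find.

```python
import re

# Hypothetical messy data row (not the assignment data).
row = "s p_ec%ies,NA,12"

# 1. Remove spaces by replacing them with nothing.
step1 = row.replace(" ", "")

# 2. Remove special characters; the class below is an assumed list.
step2 = re.sub(r"[%$#@!*]", "", step1)

# 3. Remove stray underscores (apply to data lines only, not headers).
step3 = step2.replace("_", "")

# 4. Replace NA with 0; \b keeps words that merely contain "NA" intact.
cleaned = re.sub(r"\bNA\b", "0", step3)
print(cleaned)
```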