I have certain 'processes' when I search for new software. The 'methodology' I use is based on 30+ years of IT experience (19 of them in a role where my primary job was to evaluate new hardware and software technologies for Exxon). I do not look for software by first identifying the candidates and then selecting among them based on criteria. But in a way perhaps I do; it depends on what you call criteria. I first imagine what the solution I am looking for will look like and how it will behave. In the case of my search for a virtual singer solution, I judge that today most software claiming 'artificial intelligence' has put only some of the 'intelligence' of the solution in its 'underpinnings'. Most of the intelligence and actions must still be customized, invoked and 'shaped' by the user of the software. In other words, some of the intelligence is in the software and most remains with the user of that software.
The solution I am looking for is software where that balance is switched - where most of the intelligence lies in the software and the user merely guides it towards the result. I believe that is how we operate in real life. We care primarily about how a singer 'sounds' and often care little about how the singer creates the sound. However, sometimes we ask the singer to make modifications. For example, suppose the singer sings "H AND" and we want them to pronounce each occurrence as "HA ND" (hopefully you get the difference between the two pronunciations). In the current generation of software, the user goes to every place the word is sung and assigns a different phoneme code to change the sung pronunciation. In real life we just tell the singer "sing each occurrence of 'hand' as ...". The solution I am looking for allows me to interact with the virtual singer "in English" the way I would in real life, and not through a bunch of clicking to modify rectangles and curves in a piano roll.
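To make the difference concrete, here is a minimal Python sketch of what a single "sing each occurrence of ... as ..." instruction could turn into behind the scenes. Everything in it is hypothetical: the `Note` class, the `apply_pronunciation_rule` function and the pronunciation strings (borrowed from the "H AND" / "HA ND" example above) are illustrations only, not the data model or phoneme codes of any real product.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Note:
    """One sung note: the lyric word and how it is currently pronounced (hypothetical model)."""
    lyric: str
    pronunciation: str


def apply_pronunciation_rule(notes, word, pronunciation):
    """Return a new list of notes where every occurrence of `word` uses `pronunciation`.

    This stands in for one spoken instruction to the singer, as opposed to
    editing each note by hand in a piano roll.
    """
    target = word.lower()
    return [
        replace(note, pronunciation=pronunciation)
        if note.lyric.lower() == target
        else note
        for note in notes
    ]


# Placeholder pronunciations taken from the example in the text, not real phoneme codes.
notes = [Note("hand", "H AND"), Note("in", "IN"), Note("Hand", "H AND")]
updated = apply_pronunciation_rule(notes, "hand", "HA ND")
print([note.pronunciation for note in updated])
```

The point of the sketch is only that the user states the rule once and the software finds every place it applies.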
This high-level, English-language approach to AI-based software, in which most of the intelligence sits in the virtual singing software and the underlying customization is handled by the software as much as possible, has over the years been described as an "expert system". I participated in expert system research while a graduate student at Rutgers from 1972 to 1976. And here we are 50+ years later and music expert systems are still waiting to be conceived. Again, for my envisioned software solution, I want to interact with a virtual vocalist the way I do in real life - by communicating in English using the three basic audio entities the singer cares about: the audio file or other entity they used to learn the song, the instrumental track they will be listening to while they sing, and the singer's vocal track (the 'output', which will be mixed with the instrumental track). Every object other than those three should be hidden.
So before I delve into more details on what I believe a virtual vocalist expert system should look like versus the current piano roll-based Audio Engineering (AE) model, here is my personal list (from my own experience) of how I collaborate with singers. Again, I am looking for software that handles most of the collaboration, and whatever remains I should be able to tackle with little technical knowledge of the underpinnings of the singing. Some of the software I am looking at (like Dreamtonics Synthesizer V) has an increasing amount of intelligent functionality, but what it all (so far) seems to lack is an intelligent English interface that hides the technical underpinnings.
The Collaboration Process
• Auditioning (Finding the Vocalist) – There are five basic methods by which a primary vocalist is determined: (a) listening to demos/snippets of the vocalist singing, (b) filtering out a smaller subset of vocalists based on various criteria including singing style, genre, etc., (c) choosing a vocalist that ‘sounds like’ another vocalist (the latter is often well known), (d) choosing a vocalist previously collaborated with, (e) asking vocalists to sing part or all of the song being recorded.
• Learning The Song – (a) Many vocalists have no formal music training and no interest in learning the basic concepts or the theory. They learn a song by hearing it played with the melody sung by someone else or perhaps played by an instrument. (b) Other vocalists ask for lyric sheets, which contain merely the lyrics and possibly chords on top of the lyrics. (c) Some vocalists learn songs with ‘lead sheets’ (which contain melody, chords, and lyrics) or other codified sheet music.
• The Sweet Spot – Once a song is learnt and ready to sing, there must be an agreement on where the vocalist’s ‘sweet spot’ is located. That way, it can be determined whether the key of the song needs adjusting so that the vocalist can be most effective at the ‘highlights of the song’. Adjusting the key the vocalist sings in could impact the accompanying music track and may necessitate re-recording it.
• Octaving – Sometimes the melody of the written song is way too low or high for the vocalist, necessitating singing the whole melody an octave higher/lower. Or perhaps only parts of the song are out of range or at the ‘edges’ of the vocalist’s range, necessitating moving those parts up or down an octave. However, if there is too much movement/too many jumps, then it may be more effective to change the key to place the melody in the middle of the vocalist’s range.
• Singing Style – A vocalist chosen in the real world comes with a primary singing style. If that style does not match the criteria for the choice, the vocalist may have a certain amount of leeway to adapt. At some point, if the style still does not match the criteria, another vocalist needs to be chosen.
• Tempo – The vocalist may prefer a particular tempo as they are better singing at certain tempos.
• Articulation – Singers have a primary accent and many can sing in different accents. The singing accent will impact the pronunciation of certain words. For example, the words “hand” or “potato.”
• Syncing – If the instrument track was created digitally, then the vocalist merely sings at a constant tempo. However, analog tracks (not recorded with a click track), especially live performances, could vary slightly in tempo in various spots. A quality vocalist can find a way to start and stop at the beginning of phrases so that the resulting vocal track does not have to have certain parts shifted/sync’d.
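
The Sweet Spot and Octaving items above boil down to a small piece of reasoning that software could do on the user's behalf: find the shift that puts the melody inside the vocalist's range and as close to the middle of it as possible. Here is a minimal Python sketch of that idea. It assumes pitches are given as MIDI note numbers (60 = middle C) and that the "middle of the range" is simply the midpoint between the lowest and highest comfortable notes; the function name `best_transposition` and the example range are my own illustrations, and a real sweet spot would be more subtle than a midpoint.

```python
def best_transposition(melody, low, high, max_shift=12):
    """Pick a semitone shift that fits the melody into [low, high].

    melody    -- list of MIDI note numbers for the written melody
    low, high -- lowest and highest comfortable notes of the vocalist
    max_shift -- largest shift considered, in semitones (12 = one octave)

    Returns the shift that places the melody closest to the center of the
    range (smaller shifts win ties), or None if no shift makes it fit.
    """
    melody_low, melody_high = min(melody), max(melody)
    melody_center = (melody_low + melody_high) / 2
    range_center = (low + high) / 2

    candidates = []
    for shift in range(-max_shift, max_shift + 1):
        fits = low <= melody_low + shift and melody_high + shift <= high
        if fits:
            distance = abs(melody_center + shift - range_center)
            candidates.append((distance, abs(shift), shift))

    return min(candidates)[2] if candidates else None


# Hypothetical example: a melody written from C3 to C4 (48..60)
# for a vocalist comfortable from A3 to E5 (57..76).
print(best_transposition([48, 52, 55, 60], low=57, high=76))  # 12, i.e. up one octave
```

A shift of 12 semitones is the "sing it an octave higher" case, which leaves the key and the instrumental track alone; any other non-zero shift is a key change and raises the question of re-recording the music track. A result of `None` means the melody spans more than the vocalist's range, which is the situation where only parts of the song would have to be moved by an octave.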