
Title:
ARTIFICIAL INTELLIGENCE SYSTEM & METHODOLOGY TO AUTOMATICALLY PERFORM AND GENERATE MUSIC & LYRICS
Document Type and Number:
WIPO Patent Application WO/2021/159203
Kind Code:
A1
Abstract:
A method and system of using artificial intelligence to automatically create and generate music and lyrics that can later be played through a music engine or be played by human performers.

Inventors:
HANG CHU (CA)
KYUN KOH SANG (US)
MASSIMO GUIDA (CA)
Application Number:
PCT/CA2021/050143
Publication Date:
August 19, 2021
Filing Date:
February 10, 2021
Assignee:
1227997 B C LTD (CA)
International Classes:
G01H1/00; G06N3/02; G06N3/08; G06N20/00; G09B15/00
Foreign References:
US20190362696A12019-11-28
US20170092247A12017-03-30
Other References:
OPENAI, MUSENET, 25 April 2019 (2019-04-25), XP055847213, Retrieved from the Internet
Attorney, Agent or Firm:
FASKEN MARTINEAU DUMOULIN LLP (CA)
Claims:
THE EMBODIMENTS FOR WHICH AN EXCLUSIVE PRIVILEGE OR PROPERTY IS CLAIMED ARE AS FOLLOWS:

1. A method for composing a music composition, the method comprising the steps of:

(a) a user selecting target music data and a compositional rule set for the music composition;

(b) processing the target music data through a rule annotator to detect and compile a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set;

(c) training a machine learning model for composing the music composition using a training data, wherein the training data includes the compiled rule violations and the target music data;

(d) composing the music composition using the machine learning model retrained with the training data and the compositional rule set.

2. The method of claim 1 wherein the compositional rule set comprises a musical rule set and a lyric rule set.

3. The method of claim 2 wherein the compositional rule set is based on a plurality of species counterpoint rules.

4. The method of claim 3 wherein the species counterpoint rule is based on 16th century species-counterpoint rules.

5. The method of claim 3 wherein the species counterpoint rule is based on 17th century species-counterpoint rules.

6. The method of claim 3 wherein the compositional rules are selected from the group consisting of melodic, horizontal, contrapuntal, vertical, musical phrase, cadence, motivic, and text-setting rules.

7. The method of claim 6 wherein the target music data is rendered from music composed by a human.

8. The method of claim 7 wherein the rule annotator represents each musical note or a lyric in the music data as a pair of two (or three) integer values based on or corresponding to the musical note’s pitch and time duration enabling querying of neighboring musical notes or lyrics to verify a rule violation.

9. A method for comparing a first and second music composition, the method comprising the steps of:

(a) a first user selecting an input criteria for composing the first and second music composition;

(b) a machine learning model composing the first music composition based on the input criteria, the machine learning model employing multiple types of machine learning algorithms to compose the first music composition;

(c) a human composing the second music composition based on the input criteria;

(d) performing the first and second music compositions; and

(e) a second user comparing the performance of the first music composition and the performance of the second music composition to identify which music composition was composed by the human or the machine learning model, wherein the second user does not know which of the first and second compositions was composed by the human or the machine learning model.

10. A system implemented on a machine that utilizes predictive models of human memory to compose a music composition, the system comprising a component that

(a) acquires a target music data and a compositional rule set via a user interface;

(b) processes the target music data through a rule annotator to detect a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set;

(c) trains a machine learning model for composing the music composition using the training data, wherein the training data includes the compiled rule violations and the target music data; and

(d) composes the music composition using the machine learning model trained with the training data and the compositional rule set.

11. The system of claim 10 wherein the compositional rule set comprises a musical rule set and a lyric rule set.

12. The system of claim 11 wherein the compositional rule set is based on a plurality of species counterpoint rules.

13. The system of claim 12 wherein the species counterpoint rule is based on 16th century species-counterpoint rules.

14. The system of claim 13 wherein the species counterpoint rule is based on 17th century species-counterpoint rules.

15. The system of claim 12 wherein the compositional rules are selected from the group consisting of melodic, horizontal, contrapuntal, vertical, musical phrase, cadence, motivic, and text-setting rules.

16. The system of claim 15 wherein the target music data is rendered from music composed by a human.

17. The system of claim 16 wherein the rule annotator represents each musical note in the music data as a pair of two integer values based on or corresponding to the musical note’s pitch and time duration enabling querying of neighboring musical notes to verify a rule violation.

18. One or more non-transitory computer-readable media storing computer-executable instructions that upon execution cause one or more processors to perform acts comprising:

(a) receiving a target music data and compositional rule set for the music composition from a user;

(b) processing the target music data through a rule annotator to detect a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set;

(c) training a machine learning model for composing the music composition using the training data, wherein the training data includes the compiled rule violations and the target music data; and

(d) composing the music composition using the machine learning model trained with the training data and the compositional rule set.

19. The non-transitory computer-readable media of claim 18 wherein the compositional rule set comprises a musical rule set and a lyric rule set.

20. The non-transitory computer-readable media of claim 19 wherein the compositional rule set is based on a plurality of species counterpoint rules.

21. The non-transitory computer-readable media of claim 20 wherein the species counterpoint rule is based on 16th century species-counterpoint rules.

22. The non-transitory computer-readable media of claim 21 wherein the species counterpoint rule is based on 17th century species-counterpoint rules.

23. The non-transitory computer-readable media of claim 20 wherein the compositional rules are selected from the group consisting of melodic, horizontal, contrapuntal, vertical, musical phrase, cadence, motivic, and text-setting rules.

24. The non-transitory computer-readable media of claim 23 wherein the target music data is rendered from music composed by a human.

25. The non-transitory computer-readable media of claim 24 wherein the rule annotator represents each musical note in the music data as a pair of two integer values based on or corresponding to the musical note’s pitch and time duration enabling querying of neighboring musical notes to verify a rule violation.

26. An automated music composition system for composing and performing a music composition in response to a user providing a target music data and a compositional rule set, said automated music composition and generation system comprising:

an automated music composition engine, the automated music composition engine using a trained machine learning model to compose the music composition, the trained machine learning model employing multiple types of machine learning algorithms to compose the music composition;

a user interface subsystem interfaced with the automated music composition engine, and employing a graphical user interface (GUI) for permitting the user to select the target music data and the compositional rule sets for the music composition; and

a processing subsystem interfaced with the automated music composition engine:

(i) employing a rule annotator to detect a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set,

(ii) refining the trained machine learning model using a training data, wherein the training data comprises the compiled rule violations and the target music data, the refining comprising retraining the machine learning model using the training data.

Description:
ARTIFICIAL INTELLIGENCE SYSTEM & METHODOLOGY TO AUTOMATICALLY PERFORM AND GENERATE MUSIC & LYRICS

FIELD OF INVENTION

0001 The present invention relates to artificial intelligence in general, and more particularly to a novel software platform for authoring, analyzing and performing music and lyrics by artificial intelligence.

BACKGROUND TO THE INVENTION

0002 Artificial intelligence (“A.I.”) is a field of computer science concerned with creating software systems and methods that can perform activities traditionally thought to be the exclusive domain of humans. Research in A.I. is known to have impacted medical diagnosis, stock trading, robot control, and several other fields. One subfield in this area relates to creating software systems and methods that can mimic human behavior, including human creativity, so that these software systems and methods can replicate human characteristics or traits.

0003 A.I. has also contributed to the field of music. Artificially intelligent systems and methods have been the subject of much research, including the application of A.I. to music composition, performance, and theory. Several music software programs have been developed that use A.I. to produce music. A prominent feature is the capability of A.I.-based processes or algorithms to learn from the information they obtain, such as computer accompaniment technology, which is capable of listening to and following a human performer so it can perform in synchrony. Artificial intelligence also drives so-called interactive composition technology, wherein a computer composes music in response to the performance of a live musician. There are several other A.I. applications to music that cover not only music composition, production, and performance but also the way music is marketed and consumed.

0004 However, current A.I. systems either rely on human input to initiate the compositional process or depend solely on statistical analysis of musical data. These two characteristics of current A.I. software limit the degree of autonomy and the compositional variety. As a result, some music being produced is a human-A.I. hybrid rather than purely A.I.-generated, and the music meets neither production nor artistic quality standards. Similarly, music produced through “pure” A.I.-generated processes known in the art also meets neither production nor artistic quality standards.

0005 These limitations and challenges give rise to the need for new technologies and techniques to address them.

SUMMARY OF THE INVENTION

0006 There remains a need for techniques, systems, methods and devices that can be used in the development of artificial intelligence.

0007 An aspect of the present invention is directed to a system for using artificial intelligence to automatically create and generate music and lyrics that can later be played through a music engine or be played by human performers.

0008 A further aspect of the invention is directed to a method for composing a music composition, the method comprising the steps of: (a) a user selecting target music data and a compositional rule set for the music composition; (b) processing the target music data through a rule annotator to detect and compile a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set; (c) training a machine learning model for composing the music composition using a training data, wherein the training data includes the compiled rule violations and the target music data; and (d) composing the music composition using the machine learning model retrained with the training data and the compositional rule set.
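Steps (a) through (d) above can be sketched in code. The sketch below is illustrative only: `ToyComposer`, the rule functions, and the dictionary-shaped training data are hypothetical stand-ins for the claimed machine learning model, not the patented implementation.

```python
# Illustrative sketch of steps (a)-(d) of paragraph 0008. ToyComposer and
# the helper names are hypothetical stand-ins, not the patented model.

class ToyComposer:
    """Stand-in for the machine learning model trained in step (c)."""
    def __init__(self):
        self.learned = []

    def train(self, training_data):
        # Step (c): the training data bundles the target music together
        # with the compiled rule violations.
        self.learned = list(training_data["music"])

    def generate(self, length):
        # Step (d): a real model would sample new material; this stub
        # simply echoes what it learned.
        return self.learned[:length]

def compose(target_music, rule_set, length):
    # Step (b): compile (index, rule-name) pairs where the target music
    # deviates from the compositional rule set.
    violations = [(i, rule.__name__)
                  for i in range(len(target_music))
                  for rule in rule_set
                  if not rule(target_music, i)]
    model = ToyComposer()
    model.train({"music": target_music, "violations": violations})
    return model.generate(length)
```

A rule here is any function that takes the note list and an index and reports whether the rule holds at that position; the compositional rule set is simply a collection of such functions.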

0009 Another aspect of the above noted invention is the method wherein the compositional rule set comprises a musical rule set and a lyric rule set.

0010 Yet another aspect of the above noted invention is the method wherein the compositional rule set is based on a plurality of species counterpoint rules.

0011 Yet another aspect of the above noted invention is the method wherein the species counterpoint rule is based on 16th century species-counterpoint rules.

0012 Yet another aspect of the above noted invention is the method wherein the species counterpoint rule is based on 17th century species-counterpoint rules.

0013 Yet another aspect of the above noted invention is the method wherein the compositional rules are selected from the group consisting of melodic, horizontal, contrapuntal, vertical, musical phrase, cadence, motivic, and text-setting rules.

0014 Yet another aspect of the above noted invention is the method wherein the target music data is rendered from music composed by a human.

0015 Yet another aspect of the above noted invention is the method wherein the rule annotator represents each musical note or a lyric in the music data as a pair of two (or three) integer values based on or corresponding to the musical note’s pitch and time duration enabling querying of neighboring musical notes or lyrics to verify a rule violation.
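The integer-pair representation described above makes querying a note's neighbors inexpensive. A minimal sketch of such an annotator follows, using an illustrative leap-size rule (not one of the counterpoint rules named in this disclosure):

```python
# Notes as (pitch, duration) integer pairs, per paragraph 0015. The leap
# rule below is an illustrative example, not a rule from the patent.

def no_large_leap(notes, i):
    """A melodic leap larger than an octave (12 semitones) is a violation."""
    if i == 0:
        return True
    return abs(notes[i][0] - notes[i - 1][0]) <= 12

def annotate_violations(notes, rule_set):
    """Query each note's neighborhood and compile every rule violation."""
    return [(i, rule.__name__)
            for i in range(len(notes))
            for rule in rule_set
            if not rule(notes, i)]

melody = [(60, 4), (62, 4), (76, 2), (64, 4)]   # 62 -> 76 is a 14-semitone leap
print(annotate_violations(melody, [no_large_leap]))   # [(2, 'no_large_leap')]
```

Because each note is just a pair of integers, a rule only needs index arithmetic to reach its neighbors, which is what makes violation checking straightforward.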

0016 Another aspect of the invention is directed to a method for comparing a first and second music composition, the method comprising the steps of: (a) a first user selecting an input criteria for composing the first and second music composition; (b) a machine learning model composing the first music composition based on the input criteria, the machine learning model employing multiple types of machine learning algorithms to compose the first music composition; (c) a human composing the second music composition based on the input criteria; (d) performing the first and second music compositions; and (e) a second user comparing the performance of the first music composition and the performance of the second music composition to identify which music composition was composed by the human or the machine learning model, wherein the second user does not know which of the first and second compositions was composed by the human or the machine learning model.
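The comparison in step (e) only works if the second user cannot infer authorship from presentation order. The following is a hedged sketch of that blinding step; the function name, labels, and data shapes are hypothetical, not part of the disclosed method.

```python
import random

# Sketch of the blinding in step (e): the two performances are presented
# under neutral labels in random order; an answer key is kept separately.

def blind_pair(ai_piece, human_piece, rng=None):
    rng = rng or random.Random()
    pair = [("machine", ai_piece), ("human", human_piece)]
    rng.shuffle(pair)                       # randomize presentation order
    presented = {}   # what the second user receives: label -> performance
    key = {}         # withheld answer key: label -> true composer
    for label, (source, piece) in zip(["Composition 1", "Composition 2"], pair):
        presented[label] = piece
        key[label] = source
    return presented, key
```

The second user works only from `presented`; `key` is consulted afterward to score whether the identification was correct.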

0017 Another aspect of the above noted invention is directed to a system implemented on a machine that utilizes predictive models of human memory to compose a music composition, the system comprising a component that (a) acquires a target music data and a compositional rule set via a user interface; (b) processes the target music data through a rule annotator to detect a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set; (c) trains a machine learning model for composing the music composition using the training data, wherein the training data includes the compiled rule violations and the target music data; and (d) composes the music composition using the machine learning model trained with the training data and the compositional rule set.

0018 Yet another aspect of the above noted invention is the system wherein the compositional rule set comprises a musical rule set and a lyric rule set.

0019 Yet another aspect of the above noted invention is the system wherein the compositional rule set is based on a plurality of species counterpoint rules.

0020 Yet another aspect of the above noted invention is the system wherein the species counterpoint rule is based on 16th century species-counterpoint rules.

0021 Yet another aspect of the above noted invention is the system wherein the species counterpoint rule is based on 17th century species-counterpoint rules.

0022 Yet another aspect of the above noted invention is the system wherein the compositional rules are selected from the group consisting of melodic, horizontal, contrapuntal, vertical, musical phrase, cadence, motivic, and text-setting rules.

0023 Yet another aspect of the above noted invention is the system wherein the target music data is rendered from music composed by a human.

0024 Yet another aspect of the above noted invention is the system wherein the rule annotator represents each musical note in the music data as a pair of two integer values based on or corresponding to the musical note’s pitch and time duration enabling querying of neighboring musical notes to verify a rule violation.

0025 Another aspect of the present invention is directed to one or more non-transitory computer-readable media storing computer-executable instructions that upon execution cause one or more processors to perform acts comprising: (a) a receiving a target music data and compositional rule set for the music composition from a user; (b) processing the target music data through a rule annotator to detect a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set; (c) training a machine learning model for composing the music composition using the training data, wherein the training data includes the complied rule violations and the target music data; and (d) composing the music composition using the machine learning model trained with the training data and the compositional rule set.

0026 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the compositional rule set comprises a musical rule set and a lyric rule set.

0027 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the compositional rule set is based on a plurality of species counterpoint rules.

0028 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the species counterpoint rule is based on 16th century species-counterpoint rules.

0029 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the species counterpoint rule is based on 17th century species-counterpoint rules.

0030 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the compositional rules are selected from the group consisting of melodic, horizontal, contrapuntal, vertical, musical phrase, cadence, motivic, and text-setting rules.

0031 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the target music data is rendered from music composed by a human.

0032 Yet another aspect of the present invention is the one or more non-transitory computer-readable media wherein the rule annotator represents each musical note in the music data as a pair of two integer values based on or corresponding to the musical note’s pitch and time duration enabling querying of neighboring musical notes to verify a rule violation.

0033 Another aspect of the present invention is directed to an automated music composition system for composing and performing a music composition in response to a user providing a target music data and a compositional rule set, said automated music composition and generation system comprising: (a) an automated music composition engine, the automated music composition engine using a trained machine learning model to compose the music composition, the trained machine learning model employing multiple types of machine learning algorithms to compose the music composition; (b) a user interface subsystem interfaced with the automated music composition engine, and employing a graphical user interface (GUI) for permitting the user to select the target music data and the compositional rule sets for the music composition; (c) a processing subsystem interfaced with the automated music composition engine: (i) employing a rule annotator to detect a plurality of rule violations based on the compositional rule set wherein the rule violations comprise where the target music data deviates from an anticipated music data derived from the compositional rule set, and (ii) refining the trained machine learning model using a training data, wherein the training data comprises the compiled rule violations and the target music data, the refining comprising retraining the machine learning model using the training data.

BRIEF DESCRIPTION OF THE DRAWINGS

0034 In the drawings, which illustrate embodiments of the invention:

0035 FIG. 1 illustrates a preferred embodiment of the present invention detailing the overall system architecture and its components.

0036 FIG. 2 illustrates a preferred embodiment of the present invention detailing the Musical Turing Test.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

0037 The description that follows, and the embodiments described therein, is provided by way of illustration of an example, or examples, of particular embodiments of the principles and aspects of the present invention. These examples are provided for the purposes of explanation, and not of limitation, of those principles and of the invention.

0038 It should also be appreciated that the present invention can be implemented in numerous ways, including as a process, a method, an apparatus, a system, or a device. In this specification, these implementations, or any other form that the invention may take, may be referred to as processes. In general, the order of the steps of the disclosed processes may be altered within the scope of the invention.

0039 It will be understood by a person skilled in the relevant art that in different geographical regions and jurisdictions these terms and definitions used herein may be given different names, but relate to the same respective systems.

0040 Although the present specification describes components and functions implemented in the embodiments with reference to standards and protocols known to a person skilled in the art, the present disclosure as well as the embodiments of the present invention are not limited to any specific standard or protocol. Each of the standards for Internet and other forms of computer network transmission (e.g., TCP/IP, UDP/IP, HTML, HTTP, SSL and SFTP) represents an example of the state of the art. Such standards are periodically superseded by faster or more efficient equivalents having essentially the same functions. Accordingly, replacement standards and protocols having the same functions are considered equivalents.

0041 Preferred embodiments of the present invention can be implemented in numerous configurations depending on implementation choices based upon the principles described herein. Various specific aspects are disclosed, which are illustrative embodiments not to be construed as limiting the scope of the disclosure.

0042 Some portions of the detailed descriptions that follow are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits that can be performed on computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer-executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of operations or instructions leading to a desired result. The operations are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like.

0043 A person skilled in the art will understand that the present description references terminology from the field of artificial intelligence, including machine learning, that may be known to such a person skilled in the relevant art. A person skilled in the relevant art will also understand that artificial neural networks generally refer to computing or computer systems that are designed to mimic biological neural networks (e.g. animal brains). Such systems “learn” to perform tasks by considering examples, generally without being programmed with task-specific rules. For example, in music composition, such systems might learn to produce musical scores that contain sequences of notes by analyzing musical data from a particular composer or a specific compositional style. A person skilled in the relevant art will understand that convolutional neural networks, recurrent neural networks, and transformer neural networks are classes of neural networks that specialize in processing data that has a grid-like or sequence-like topology, such as a music score. A digitized music score is a digital representation of music data. It contains a series of notes arranged in a sequence-like fashion that contains pitch and rhythmic values to denote how human or electronic performers should perform the notations.
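To make a digitized score consumable by such a sequential model, its notes can be flattened into an integer token stream. The token scheme below (pitch tokens 0-127, durations offset by 128) is an assumption chosen for illustration, not the encoding used by this disclosure:

```python
# Flatten a digitized score into an integer sequence for a sequential model.
# Token scheme is illustrative: 0-127 = MIDI-style pitch, 128+ = duration.

def score_to_tokens(score):
    tokens = []
    for pitch, duration in score:
        tokens.append(pitch)            # pitch value of the note
        tokens.append(128 + duration)   # rhythmic value, e.g. in sixteenths
    return tokens

score = [(60, 4), (64, 2), (67, 2)]     # C, E, G with rhythmic values
print(score_to_tokens(score))           # [60, 132, 64, 130, 67, 130]
```

The resulting sequence preserves both the pitch and the rhythmic value of every note, which is exactly the information a performer, human or electronic, needs to render the score.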

0044 Machine learning techniques will generally be understood as being used to identify and classify specific reviewed data. Machine learning approaches first tend to involve what is known in the art as a “training phase”. In the context of classifying functions, a training “corpus” is first constructed. This corpus typically comprises a set of known data. Each set is optionally accompanied by a “label” of its disposition. It is preferable to have fewer unknown samples. Furthermore, it is preferable for the corpus to be representative of the real-world scenarios in which the machine learning techniques will ultimately be applied. This is followed by a “training phase” in which the data, together with the labels associated with the data, files, etc., are fed into an algorithm that implements the “training phase”. The goal of this phase is to automatically derive a “generative model”. A person skilled in the relevant art will understand that a generative model effectively encodes a mathematical function whose input is the data and whose output is data of the same kind. By exploiting patterns that exist in the data through the training phase, the model learns the process that generates similar note patterns and arrangements of notes, which are indicative of specific compositional styles. A generative machine learning algorithm should ideally produce a generator that is reasonably consistent with the training examples and that has a reasonable likelihood of generating new instances that are similar to its training data but not identical. Specific generative machine learning algorithms in the art include Autoregressive Recurrent Neural Networks, Variational Auto-Encoders, Generative Adversarial Neural Networks, Energy-Based Models, Flow-Based Neural Networks, and others known in the art. The term generator is also used to describe a model; for example, one may refer to a Recurrent Neural Network Generator. Once the model/generator is established, it can be used to generate new instances, scenarios or data sets that are presented to a computer or computer network in practice.
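The train-then-generate idea above can be illustrated with something far simpler than the neural networks just named: a bigram model over pitches. This stand-in only demonstrates "learn patterns, then sample similar-but-not-identical sequences"; it is not one of the algorithms contemplated by this disclosure.

```python
import random
from collections import defaultdict

# A deliberately tiny "generative model": bigram counts over pitches.
# The training phase records which note followed which in the corpus;
# the generator then samples successors from those learned transitions.

def train_bigrams(corpus):
    counts = defaultdict(list)
    for piece in corpus:
        for a, b in zip(piece, piece[1:]):
            counts[a].append(b)         # record every observed successor
    return counts

def generate(counts, start, length, rng=None):
    rng = rng or random.Random()
    out = [start]
    while len(out) < length and counts[out[-1]]:
        out.append(rng.choice(counts[out[-1]]))  # sample a learned successor
    return out

model = train_bigrams([[60, 62, 64, 62, 60], [60, 64, 67, 64, 60]])
print(generate(model, 60, 6))
```

Every transition the generator emits was seen in the training corpus, so its output is consistent with the training examples yet, through random sampling, generally not identical to any one of them.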

0045 The present invention may be a system, a method, and/or a computer program product such that selected embodiments include software that performs certain tasks. The software discussed herein may include script, batch, or other executable files. The software may be stored on a machine-readable or computer-readable storage medium, and is otherwise available to direct the operation of the computer system as described herein and claimed below. In one embodiment, the software uses a local or database memory to implement the data transformation and data structures so as to automatically generate and add libraries to a library knowledge base for use in detecting library substitution opportunities, thereby improving the quality and robustness of software and educating developers about library opportunities and implementation to generate more readable, reliable, smaller, and robust code with less effort. The local or database memory used for storing firmware or hardware modules in accordance with an embodiment of the invention may also include a semiconductor-based memory, which may be permanently, removably or remotely coupled to a microprocessor system. Other new and various types of computer-readable storage media may be used to store the modules discussed herein. Additionally, those skilled in the art will recognize that the separation of functionality into modules is for illustrative purposes. Alternative embodiments may merge the functionality of multiple software modules into a single module or may impose an alternate decomposition of functionality of modules. For example, a software module for calling sub-modules may be decomposed so that each sub-module performs its function and passes control directly to another sub-module.

0046 In addition, selected aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and/or hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in a computer readable storage medium or media having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention. Thus embodied, the disclosed system, method, and/or computer program product is operative to improve the design, functionality and performance of software programs by adding libraries for use in automatically detecting and recommending library function substitutions for replacing validated code snippets in the software program.

0047 The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a dynamic or static random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a magnetic storage device, a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

0048 A person skilled in the relevant art will understand that the computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a Public Switched Telephone Network (PSTN), a packet-based network, a personal area network (PAN), a local area network (LAN), a wide area network (WAN), a wireless network, or any suitable combination thereof. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device. Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Java, Python, Visual Basic.net, Ruby, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language, Hypertext Preprocessor (PHP), or similar programming languages. The computer readable program instructions may execute entirely on the user’s computer, partly on the user’s computer as a stand-alone software package, partly on the user’s computer and partly on a remote computer, or entirely on the remote computer or server or cluster of servers.
In the latter scenario, the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention. A person skilled in the relevant art will understand that the AI-based or algorithmic processes of the present invention may be implemented in any desired source code language, such as Python, Java, or other programming languages, and may reside in private software repositories or an online hosting service such as GitHub.

0049 A person skilled in the relevant art will understand that the term “deep learning” refers to a type of machine learning based on artificial neural networks. Deep learning is a class of machine learning algorithms (e.g. a set of instructions, typically to solve a class of problems or perform a computation) that use multiple layers to progressively extract higher level features from raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify human-meaningful items such as digits or letters or faces; in music analysis, lower layers may identify local pitch and rhythmic movements, while higher layers may identify emotional artistic expressions of the composer.
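By way of illustration only, and not as a description of any particular network of the present invention, the layered extraction described above may be sketched as a minimal two-layer network in which the first layer computes low-level features from raw input and the second layer combines them into higher-level features (all dimensions here are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

def forward(x, w1, w2):
    low_level = relu(x @ w1)           # e.g. local pitch/rhythmic movements
    high_level = relu(low_level @ w2)  # e.g. larger-scale expressive patterns
    return low_level, high_level

x = rng.normal(size=(4, 16))   # a batch of 4 raw input vectors
w1 = rng.normal(size=(16, 32))
w2 = rng.normal(size=(32, 8))
low, high = forward(x, w1, w2)
print(low.shape, high.shape)   # (4, 32) (4, 8)
```

In a trained network, each successive layer's weights would be learned so that its outputs represent progressively more abstract features of the input.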

0050 A person skilled in the art will understand that the operation of the network-ready device (e.g. mobile device, work station, etc.) may be controlled by a variety of different program modules. Examples of program modules are routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. It will be understood that the present invention may also be practiced with other computer system configurations, including multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, and the like. Furthermore, the invention may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote memory storage devices. One skilled in the relevant art would appreciate that the device connections mentioned herein are for illustration purposes only and that any number of possible configurations and selections of peripheral devices could be coupled to the computer system.

0051 Embodiments of the present invention can be implemented by a software program for processing data through a computer system. It will be understood by a person skilled in the relevant art that the computer system can be a personal computer, mobile device, notebook computer, server computer, mainframe, networked computer (e.g., router), workstation, and the like. The program or its corresponding hardware implementation is operable for providing user authentication. In one embodiment, the computer system includes a processor coupled to a bus and memory storage coupled to the bus. The memory storage can be volatile or non-volatile (i.e. transitory or non-transitory) and can include removable storage media. The computer can also include a display, provision for data input and output, etc. as will be understood by a person skilled in the relevant art.

0052 A person skilled in the art will understand that a “Turing Test”, as the term is used herein, refers to a test designed to tell computers and humans apart. In theory, it is a simple test that can be easily answered by a human but is extremely difficult for a computer to answer. Such tests have been widely used for security purposes, such as, for example, preventing automated registration in web-based services like web-based email. Email providers may use an automated Turing Test as a step in the registration process to prevent automated scripts from subscribing and using their resources for spam distribution. Other applications of automated Turing Tests involve on-line polls, web-blogs, or purchasing products, where only humans are permitted to participate. An automated Turing Test typically presents a human with a token that includes a key. The token is often implemented as an image and the key is often implemented as text within the image. While a human is generally able to identify the text within the image fairly easily, such identification is often difficult for a computer program. Automated Turing Tests typically attempt to frustrate a computer program’s ability to identify the key by embedding text into the image that violates OCR recognition rules. In an embodiment of the present invention, an analogy to a Turing Test is used to determine whether music compositions (containing music and/or lyrics) have been composed by a machine (e.g. a computer) or by a person.

0053 It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “receiving,” “creating,” “providing,” or the like refer to the actions and processes of a computer system, or similar electronic computing device, including an embedded system, that manipulates and transfers data represented as physical (electronic) quantities within the computer system’s registers and memories into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.

Composition Rule Sets

0054 A music composition may comprise multiple voices, where a voice will be generally understood to represent an independent musical line consisting of an ordered sequence of notes. Within each voice, certain compositional rules apply to adjacent notes; these are referred to as the Melodic, or Horizontal, Rules. When notes from two or more voices are involved, the governing rules are referred to as the Contrapuntal, or Vertical, Rules. A Musical Phrase typically comprises multiple measures that are spliced together to form musical sentences and paragraphs, as in written languages. Within the context of a Musical Phrase, Cadential Rules function as the musical equivalent of punctuation, partitioning the musical material into various sections and providing the music a sense of conclusion. Motivic Rules dictate the patterns of specific pitches, rhythms, and intervals that serve to interrelate various sections and provide the music a sense of cohesiveness. Text-Setting Rules determine how unaccented and accented syllables can be paired with specific rhythms and pitch motions. When all the foregoing rules, as seen in FIG. 1, both music rule sets 110 and lyric rule sets 111, are applied as a set of rules, they are referred to as a Composition Rules Set 115.
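Purely by way of illustration, and without limiting the rules actually encoded by the present invention, a Horizontal rule and a Vertical rule may each be expressed as a predicate over MIDI-style pitch values; the two rules shown (no melodic leap beyond an octave, no parallel perfect fifths) are common species-counterpoint examples chosen here as assumptions:

```python
OCTAVE = 12  # semitones in an octave
FIFTH = 7    # semitones in a perfect fifth

def melodic_leap_ok(prev_pitch, next_pitch):
    """Horizontal (Melodic) rule: no leap larger than an octave."""
    return abs(next_pitch - prev_pitch) <= OCTAVE

def no_parallel_fifths(voice_a, voice_b, i):
    """Vertical (Contrapuntal) rule: consecutive perfect fifths are forbidden."""
    prev_interval = abs(voice_a[i - 1] - voice_b[i - 1]) % 12
    cur_interval = abs(voice_a[i] - voice_b[i]) % 12
    return not (prev_interval == FIFTH and cur_interval == FIFTH)

print(melodic_leap_ok(60, 69))                    # True: a sixth is allowed
print(melodic_leap_ok(60, 74))                    # False: a ninth is too large
print(no_parallel_fifths([67, 69], [60, 62], 1))  # False: G->A over C->D is parallel fifths
```

A full Composition Rules Set would combine many such predicates, including Cadential, Motivic, and Text-Setting rules.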

0055 A person skilled in the relevant art will understand that each type, style, era, etc. of music may have specific Compositional Rules Sets. As such, the methods, systems and apparatus of the present invention can be directed to the production and performance of music and lyrics derived from any Composition Rules Set. In a preferred embodiment, a Compositional Rule Set may be derived from or based on 16th- and 17th-century species-counterpoint rules. It will be understood, however, that any species-counterpoint rules can be used in the present invention. In this preferred embodiment, as shown in FIG. 1, the target music data 100 (for example, in MIDI format) may be rendered (see 145 of FIG. 1) from computer readable data of human composed music 140 (e.g. the music from specific known musical compositions). A person skilled in the relevant art will understand that any other computer readable format may be used in the embodiments of the present invention. The rendered target music data 100 may contain note values (e.g. pitch content and rhythmic duration), lyrics, performance parameters (e.g. breathing, note velocity, note amplitude, vibrato, maintenance of sound, etc.), and the like. The AI-based processes of the present invention utilize the Composition Rules Set 115 to determine the best option for the next immediate note or notes to follow in one or multiple voices, based on the specific Compositional Rules Set employed (e.g. 115 in FIG. 1), in order to produce Rules Set music and lyrics data 195 via virtual composer 130. In a preferred embodiment, this process is repeated until the conclusion of the composition. The Melodic (horizontal) Rules and the Contrapuntal (vertical) Rules from the Compositional Rules Sets dictate the solution space, which consists of all the plausible note selections that follow the Compositional Rules Sets. The resulting music/lyrics data 195 may be produced strictly in accordance with that specific Compositional Rule Set (e.g. 115 in FIG. 1). The data 195 may then be said to have zero “errors” (e.g. it complies with the parameters established under the Compositional Rule Set). Although the Composition Rule Set is designated for a specific time period (i.e. the 16th and 17th centuries), it will be understood that these rule sets tend to be fundamental to most, if not all, music in the Western Art Music tradition and provide fertile ground for implementing AI rule-augmented systems.
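By way of illustration only, the notion of a "solution space" may be sketched as filtering candidate next notes through rule predicates; the two rules shown (remain in the C major scale, move by no more than a major third) are stand-in assumptions, not the species-counterpoint encoding of the claimed invention:

```python
C_MAJOR = {0, 2, 4, 5, 7, 9, 11}  # pitch classes of the C major scale

def in_scale(pitch):
    """Stand-in rule: the candidate must belong to the scale."""
    return pitch % 12 in C_MAJOR

def small_motion(prev, cand, max_leap=4):
    """Stand-in Melodic rule: move, but by no more than a major third."""
    return 0 < abs(cand - prev) <= max_leap

def solution_space(prev_pitch, candidates):
    """All plausible next notes that satisfy every rule in the set."""
    return [c for c in candidates if in_scale(c) and small_motion(prev_pitch, c)]

print(solution_space(60, range(55, 66)))  # [57, 59, 62, 64]
```

A virtual composer would select one note from this filtered set at each step and repeat until the conclusion of the composition, so the output is rule-abiding by construction.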

0056 Music compositions may comprise lyrics as well as musical notes. Text-Setting Rules 111 determine how words or syllables (e.g. lyrics/libretto) are applied to the music data 100, dictating how notes and syllables are paired. When the Text-Setting Rules 111 are combined with the Music Setting or Note Placement Rules 110 to arrive at the Compositional Rules Set 115, algorithm-based processes of the present invention, referred to as purely rule-based composition or a searching-based virtual composer 130, may be able to produce new music and/or lyrics data set B 195 that abides by the conventions and rules selected (e.g. 16th- and 17th-century species-counterpoint rules). In a preferred embodiment, the Text-Setting Rules 111 may be a part of the Compositional Rule Set 115 and are activated when the composition involves text.
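Purely as a sketch of the text-setting idea, a pairing routine may align a syllable sequence with notes and flag rule violations; the specific constraint used here (an accented syllable must fall on a note of at least a given duration) and all names are illustrative assumptions:

```python
def pair_syllables(syllables, notes, min_accent_dur=1.0):
    """syllables: list of (text, accented); notes: list of (pitch, duration).
    Returns the syllable/pitch pairing and any text-setting violations."""
    pairs, violations = [], []
    for i, ((text, accented), (pitch, dur)) in enumerate(zip(syllables, notes)):
        pairs.append((text, pitch))
        if accented and dur < min_accent_dur:
            violations.append((i, text))  # accented syllable on too short a note
    return pairs, violations

sylls = [("glo", True), ("ri", False), ("a", False)]
notes = [(67, 2.0), (65, 0.5), (64, 1.0)]
pairs, viol = pair_syllables(sylls, notes)
print(pairs)  # [('glo', 67), ('ri', 65), ('a', 64)]
print(viol)   # [] -- the accented "glo" sits on a long note, so no violation
```

A full Text-Setting rule set would also handle one syllable mapped to multiple notes (melismas) and the reverse, as described in paragraph 0060 below.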

0057 In order to obtain the necessary music-derived computer readable data 100, the raw music data (e.g. MIDI or any other known format) 140 must first be reviewed, transcribed and translated into corresponding mathematical and computer readable code formulations (e.g. “rendering” the music data at 145 in FIG. 1).
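As an illustrative sketch of this rendering step (not an actual MIDI parser), note-on/note-off events of the kind a MIDI file supplies may be converted into (pitch, duration) records; the event format is a simplified assumption:

```python
def render(events):
    """events: list of (time, 'on'|'off', pitch) -> list of (pitch, duration)."""
    started, rendered = {}, []
    for time, kind, pitch in sorted(events):
        if kind == "on":
            started[pitch] = time                          # remember onset time
        elif kind == "off" and pitch in started:
            rendered.append((pitch, time - started.pop(pitch)))
    return rendered

# C4 held for one beat, then D4 for half a beat.
events = [(0.0, "on", 60), (1.0, "off", 60), (1.0, "on", 62), (1.5, "off", 62)]
print(render(events))  # [(60, 1.0), (62, 0.5)]
```

The resulting records correspond to the note values (pitch content and rhythmic duration) described for the rendered target music data 100.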

0058 In an embodiment of the present invention, there are provided AI-based processes responsible for auto-detection and auto-annotation, referred to herein as the “Rule Annotator” (see 150 in FIG. 1). Using the encoded rules provided under the selected Compositional Rules Set 115 as the basis for analysis, the Rule Annotator 150 scans the musical data and detects points in the music data 100 at which the music deviates from the encoded Compositional Rules Set 115. The data representing these “violations” 160 are then automatically identified and compiled using an auto-annotation algorithm. The algorithm encodes each individual horizontal and vertical rule as a logical expression (i.e. a rule-abiding or rule-violating statement), and sequentially applies this logical checking to annotate the music data. Through the repetition of this process, the Rule Annotator 150 collects data on what rules are being violated, how a rule is violated, when it is violated, and how the violation is rectified within each violation’s respective musical context. This is the first step towards developing the present invention’s AI-based “conscious learning” 170 of the music data 100 and the violations data 160 (see 196 and 197 in FIG. 1). Over time, this process will produce a database of rule violations, which serves to identify particular styles, composers, and time periods.
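The annotation loop may be sketched, purely by way of illustration, as a sweep that applies each rule predicate at every position and compiles the violations with their locations; the single rule shown (no melodic tritone leap) is an illustrative assumption, not the invention's actual rule encoding:

```python
def no_tritone_leap(voice, i):
    """Stand-in horizontal rule: forbid a melodic leap of an augmented fourth."""
    return abs(voice[i] - voice[i - 1]) % 12 != 6

def annotate(voice, rules):
    """Sequentially apply each encoded rule and compile every violation."""
    violations = []
    for i in range(1, len(voice)):
        for rule in rules:
            if not rule(voice, i):
                violations.append({"index": i, "rule": rule.__name__})
    return violations

melody = [60, 66, 65, 64]  # C4 -> F#4 is a tritone leap
print(annotate(melody, [no_tritone_leap]))
# [{'index': 1, 'rule': 'no_tritone_leap'}]
```

Accumulating such records over many scores would yield the violations database described above, keyed by rule, location, and musical context.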

0059 The details of this analysis and annotation process may, in a more preferred embodiment, involve the following steps. First, AI-based or algorithmic processes of the present invention represent each musical note (or alternatively, a lyric) as a pair (or triple) of integer values based on or corresponding to the musical note’s pitch and time duration. This data structure enables querying of neighboring notes in the music, both horizontally and vertically, which may be used to verify whether or not certain Compositional Rules Sets are violated (see 160 in FIG. 1).
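This data structure may be sketched as follows, purely by way of illustration: each note is an integer pair (pitch, duration in ticks), a voice is a list of such pairs, a horizontal neighbor is the adjacent note in the same voice, and a vertical neighbor is the note sounding in another voice at the same onset (all function names are assumptions):

```python
def onsets(voice):
    """Annotate each (pitch, duration) pair with its onset time."""
    t, out = 0, []
    for pitch, dur in voice:
        out.append((t, pitch, dur))
        t += dur
    return out

def horizontal_neighbor(voice, i):
    """The next note in the same voice, if any."""
    return voice[i + 1] if i + 1 < len(voice) else None

def vertical_neighbor(voice_a, voice_b, i):
    """The note of voice_b sounding when voice_a's i-th note starts."""
    start = onsets(voice_a)[i][0]
    for t, pitch, dur in onsets(voice_b):
        if t <= start < t + dur:
            return (pitch, dur)
    return None

soprano = [(67, 2), (69, 2)]  # two half notes
bass = [(48, 4)]              # one whole note
print(horizontal_neighbor(soprano, 0))      # (69, 2)
print(vertical_neighbor(soprano, bass, 1))  # (48, 4)
```

Rule predicates such as those of paragraph 0058 can then be evaluated directly over these horizontal and vertical neighbor queries.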

0060 Simultaneously, AI-based or algorithmic processes of the present invention represented by the Rule Annotator 150 divide the music into Musical Phrases, each defining a unit of musical segment that has a complete musical sense of its own, by detecting cadences, i.e. specific note sequences that signal the ends of musical phrases, using a Cadential Point Detector. This detector is based on the aforementioned Cadential Rules and checks specific combinations of note pitch and duration values. Motifs are detected by checking and mining the repetitive note patterns in the note values of the music data, based on the aforementioned Motivic Rules. Also simultaneously, the Rule Annotator 150 first converts the inputted text (e.g. lyrics) into words, then into a sequence of syllables, each consisting of the syllable name, letters, and accentuation. The syllable sequence is sequentially paired with musical notes, where one syllable can be mapped to multiple notes, and vice versa. The Text-Setting Rules 111 dictate when and what syllable can appear based on the notes’ rhythmic durations, pitch values, and horizontal motions from adjacent notes.
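The motif-mining step may be illustrated, under simplifying assumptions, as finding every pitch-interval pattern of a given length that recurs in a melody; a full detector per the Motivic Rules would also match rhythms and handle transposition, which this minimal sketch omits:

```python
from collections import defaultdict

def mine_motifs(pitches, length=3):
    """Return interval patterns of the given length that occur more than once,
    mapped to the positions where they start."""
    seen = defaultdict(list)
    intervals = [b - a for a, b in zip(pitches, pitches[1:])]
    for i in range(len(intervals) - length + 1):
        seen[tuple(intervals[i:i + length])].append(i)
    return {motif: pos for motif, pos in seen.items() if len(pos) > 1}

# An up-up-down figure stated twice.
melody = [60, 62, 64, 62, 60, 62, 64, 62]
print(mine_motifs(melody))  # {(2, 2, -2): [0, 4]}
```

Because intervals rather than absolute pitches are compared, restatements of a figure at the same transposition level are matched regardless of where they occur in the melody.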

0061 In a preferred embodiment of the present invention, a rule-augmented AI-based process referred to as the “conscious learning” algorithm uses the rule violation data 160 from Rule Annotator 150 and music data 100 to train one or more music-generating neural networks (collectively referred to as 185 in FIG. 1) so as to develop a “Conscious Learning” AI-based algorithm at 170. In a preferred embodiment, conscious learning 170 trains the one or more neural networks 185 to allow the generation of music data. After processing 160 and 100, the neural networks 185 learn to jointly model these two data modalities to produce, with AI-based algorithms 180, new music data set A 190. These AI-based algorithms 180 can be referred to as “Mixture of Experts”, as described below.
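The training-on-(music, violations) pairing may be sketched, purely by way of illustration, with a toy model far simpler than the neural networks 185: a logistic regressor trained on note features (here, melodic interval size, an assumed feature) and the Rule Annotator's violation labels, learning which contexts tend to violate the rule set:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Feature: melodic interval size; label: 1 if the Rule Annotator flagged it.
X = np.array([[1.0], [2.0], [9.0], [11.0], [2.0], [10.0]])
y = np.array([0, 0, 1, 1, 0, 1])  # large leaps were annotated as violations

w, b = 0.0, 0.0
for _ in range(2000):  # plain gradient descent on the logistic loss
    p = sigmoid(X[:, 0] * w + b)
    w -= 0.1 * np.mean((p - y) * X[:, 0])
    b -= 0.1 * np.mean(p - y)

# The trained model assigns low violation probability to small intervals
# and high probability to large ones.
print(sigmoid(1.0 * w + b) < 0.5, sigmoid(10.0 * w + b) > 0.5)
```

The networks 185 of the preferred embodiment would model far richer joint structure of the music data 100 and violations data 160, but the training principle, fitting parameters to annotated violation data, is the same.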

0062 In a preferred embodiment, a “Mixture-of-Experts” system 180 may be used with 185 and 115 to generate new music data set A 190, where each “expert” (e.g. a set of rules or probabilities provided by the trained neural networks 185) provides certain constraints or probabilistic distributions over the next note to be added, given the preceding notes 190. A search-based algorithm, e.g. the A* (A-Star) search algorithm, may be used to improve the efficiency of music generation in 130 and/or 180. In a preferred embodiment, the “Mixture-of-Experts” system 180 produces computer readable data that corresponds to the music and/or lyrics. This computer readable music data 190 and/or 195 can then be converted into sound (see FIG. 2).
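One step of the Mixture-of-Experts selection may be sketched as follows, purely by way of illustration: each expert scores the candidate next notes (a hard rule constraint scores 0 or 1; a network stand-in scores a preference), the scores are multiplied, and the best rule-abiding candidate is kept. Both expert definitions here are illustrative placeholders:

```python
def rule_expert(prev, cand):
    """Hard constraint: leaps beyond a third are excluded outright."""
    return 1.0 if abs(cand - prev) <= 4 else 0.0

def network_expert(prev, cand):
    """Stand-in for a trained network's preference: favors nearby notes."""
    return 1.0 / (1.0 + abs(cand - prev))

def next_note(prev, candidates, experts):
    """Multiply expert scores and keep the highest-scoring candidate."""
    scores = {}
    for c in candidates:
        s = 1.0
        for expert in experts:
            s *= expert(prev, c)
        if s > 0:  # rule-violating candidates score zero and are dropped
            scores[c] = s
    return max(scores, key=scores.get)

print(next_note(60, [53, 61, 67], [rule_expert, network_expert]))  # 61
```

In the preferred embodiment, a search algorithm such as A* would explore sequences of such selections efficiently rather than committing greedily to one note at a time.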

Instrumental and Vocal Performing

0063 In a preferred embodiment, the performances of the generated computer readable music data (see 195 and 190 in FIG. 2) may be automated and may be dictated by neural networks that autonomously control velocity, pedal, vocal range, tonal quality, vibrato, amplitude, breathing, and timbre. When multiple voices are being performed, separately trained neural networks, or other ML- or rule-based algorithms 201 synthesize or automatically generate mixing parameters for EQ, compression, normalization, reverb, and acoustical effects, which are conducive to producing the most favourable musical results 210 and/or 211. If lyrics are involved in the music, the Text Setting Rules 111 provide the basis for how the syllables are coupled with the notes in the performance process.

0064 It will be understood that any musical and lyric computer readable data 100 could be used to “train” the AI-based process of the present invention (e.g. see FIG. 1 where 100 is used to train 170). In a preferred embodiment, the musical (and lyric, when available) computer readable data 100 may be derived from music data 140. As will be understood by a person skilled in the relevant art, MIDI files of Palestrina and Bach’s music may be used as the music data 140. MIDI files, widely available on the web, may have to be downloaded and edited to exclude any anomalies and human errors in the transposition process (e.g. rendered as provided in 145). In an embodiment of the present invention, rhythms, instrumentation, and pitch range may also be manually modified in 145 to ensure that the Rule Annotator 150 can properly analyze the music data files 100.

Training data and methodology for rendering training data

0065 Along with manual tweaking of MIDI files, several methods for rendering the data (see 145 in FIG. 1) may be employed to ensure that the converted MIDI files may be formatted into well-paced symbolic music scores, properly transposed between the different music modes known to a person skilled in the relevant art, e.g. Aeolian, Phrygian, Mixolydian, Dorian, Ionian, Lydian, and Locrian, and reduced to the music’s non-ornamental notes, e.g. the notes that convey the main body of the flow of the music. These modifications may then be exported as new MIDI files or other known formats, which may then be processed in accordance with the embodiments of the present invention.
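The mode handling in this rendering step may be illustrated, under standard music-theory assumptions rather than any encoding particular to the invention, by treating each church mode as a rotation of the diatonic step pattern and building its scale on a chosen final:

```python
IONIAN_STEPS = [2, 2, 1, 2, 2, 2, 1]  # whole/half-step pattern of Ionian
MODES = ["Ionian", "Dorian", "Phrygian", "Lydian",
         "Mixolydian", "Aeolian", "Locrian"]

def mode_scale(mode, final):
    """Pitch classes of the given mode built on the given final.
    Each mode's step pattern is a rotation of the Ionian pattern."""
    start = MODES.index(mode)
    steps = IONIAN_STEPS[start:] + IONIAN_STEPS[:start]
    pcs, p = [final % 12], final
    for s in steps[:-1]:
        p += s
        pcs.append(p % 12)
    return pcs

print(mode_scale("Dorian", 2))   # D Dorian:  [2, 4, 5, 7, 9, 11, 0]
print(mode_scale("Aeolian", 9))  # A Aeolian: [9, 11, 0, 2, 4, 5, 7]
```

Transposing a score between modes then amounts to shifting every note by the interval between the source and target finals while preserving the step pattern.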

Design of Musical Turing Test: Comparison of human compositions and AI generations under same compositional parameters

0066 In a preferred embodiment, as shown in FIG. 2, the Musical Turing Test of the present invention is designed to compare AI-generated musical compositions (e.g. containing music and/or lyrics, such as, for example, music data A 190 and music data B 195 in FIGS. 1 and 2) with music composed by humans (see for example 140 in FIGS. 1 and 2) to determine whether or not the AI-based processes of the present invention, or other AI systems, can compose music comparable to and/or indistinguishable from that of humans. The aim of the Musical Turing Test is to explore the qualities that make human compositions human and to isolate (as shown in 240 of FIG. 2) the effective sub-procedures of the algorithm that correlate with human-like AI music generations.

0067 In a preferred embodiment, human composers may be asked to compose music based on specific criteria, such as images, texts, themes, etc. These same criteria may be processed by the AI-based systems of the present invention to produce music similarly “inspired” by the selected criteria. In a more preferred embodiment, an input criterion, such as an image, may be converted into a representational vector of floating-point numbers by AI or algorithm-based processes such as convolutional neural networks (e.g. a class of deep neural networks most commonly applied to analyzing visual imagery), which is then converted into a text consisting of a sequence of words that describes the content of the image via AI or algorithm-based processes such as recurrent neural networks or transformer neural networks. The generated text is then used to condition music generation, such as, for example, generation of music in accordance with the semantics and meaning of the text. Human performers 220 may record the human compositions 140 to produce audio recordings 214 and may record the transcribed AI-generated music 190 and 195 to produce 213 and 212 (see FIG. 2), and these recordings 212, 213 and 214 may be sampled into excerpts (not shown) (e.g. of one minute in duration and/or sixteen bars of composed music). These audio excerpts may be the sound materials that the human participants evaluate as part of the Musical Turing Test (see 230 in FIG. 2). Human participants will listen to randomized sound materials and determine whether the sound files were produced by humans or by AI. AI-generated music that is misidentified as being human-produced will serve as the proof of concept for this music-generating software.

Methodology to automatically evaluate algorithm effectiveness

0068 The first type of evaluation requires collecting a statistically sufficient number of human questionnaires as part of the Musical Turing Test (see 230 of FIG. 2) to evaluate the effectiveness of the generated test results, e.g. via ratings. The second type of evaluation (e.g. via a trained evaluation neural network 240) may take test music data (e.g. 140, 190 and 195) and automatically determine features and metrics that are well-correlated with human ratings. This enables automatic human-like evaluation of the effectiveness of the present invention’s algorithms, even without a human in the loop. In a preferred embodiment, 240 may be used together with 180 (see 241 in FIG. 2) to further improve the quality of 190.

Application of Present Invention

0069 EMUJI™ is a system that automatically converts text into singing music with accompanying instrumental performances. The user of Emuji first inputs text and chooses a desired music style to “Emujify” (e.g. Piano/String/Rock/Pop, etc., male/female voice, etc.). The algorithm generates new and diverse musical samples utilizing the text provided, within the musical constraints identified by the user. When a viewer browses text that has been processed by Emuji, the generated music automatically plays.

0070 The Emuji system consists of an automatic text-to-music API empowered by the aforementioned composition algorithm, together with a GUI for querying and controlling the API and for streaming the generated music data while text is being browsed.

0071 Although this disclosure has described and illustrated certain preferred embodiments of the invention, it is to be understood that the invention is not restricted to those embodiments. Rather, the invention includes all embodiments which are functional or mechanical equivalents of the specific embodiments and features that have been described and illustrated.