BASS Audio Recognition Library
BASS Audio Recognition Library is a library (.dll) for use in Win32, Win64 (Windows XP/Vista/7/8/10) and OSX32 software with BASS.
It makes it easy to add audio recognition functionality to your application. This is not a speech recognition library! It compares audio files and reports a % similarity, for example to check whether two sound files contain the same music or similar sounds, or to find a sound clip in a recording.
Delphi and C++ API included.
- Find audio file 1 in audio file 2
- Check if two audio files contain the same sound
- The files can have different bitrate/resolution, even stereo and mono
- Option to load audio files from memory location
- "Real-time scanning" functionality: pass virtually any BASS channel handle to the library for real-time scanning (microphone, internet streams), with the option of multiple scanners on the same BASS channel handle
- 1D, 2D and frequency peak search modes
- OpenCL accelerated search
- Saving and loading of search objects
- Multi threading supported
Requirements: any dev. environment that supports the stdcall calling convention.
- For the 1D/2D search modes, file 1 needs to have the same loudness inside file 2 as it has on its own. Peak search mode works with any loudness, as only the peak frequencies are compared.
BASS Audio Recognition Library in freeware, shareware and commercial software?
You can use this component in your free programs for a very small registration fee. If you like it and use it for shareware or commercial software (or make money with it in any way: ads, in-app sales, etc.), you have to buy a license.
With a commercial license the source code can be requested (firstname.lastname@example.org). It can be used to adapt the component for your applications, and it also makes it possible to include the audio recognition functionality in Delphi iOS or Android apps. A commercial license also allows the component to be used inside a commercial company.
Loading audio files
Three modes of loading/searching are possible: 1D, 2D and peak mode.
In 1D mode the FFT (frequency) values of the time slice are summed, which gives the average sound power of the time slice. The advantage is that this mode is very fast for searching.
In 2D mode a two-dimensional array is used, with each time slice containing its own frequency spectrum. Because the full frequency spectrum is processed, this mode is much more precise than 1D mode, although searching in this mode is slower.
In 2D peak mode a 2D array is created first, then the peak frequency is extracted from every time slice (frame) and a 1D array containing the peaks is created.
The loaded sound objects need to have the same format. For the BASS_AudioRecognition_Compare() and BASS_AudioRecognition_StartScanRealTime() functions, specify which search mode to use: AR_SEARCH_MODE_1D, AR_SEARCH_MODE_2D or AR_SEARCH_MODE_2D_PEAK_1D.
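The three representations described above can be sketched as follows. This is a minimal illustration, not the library's implementation: it assumes the FFT magnitudes per time slice are already available, and the names `Frame`, `Spectrogram`, `to_1d` and `to_peaks` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One time slice (frame): FFT magnitudes for the bins of that slice.
using Frame = std::vector<double>;
// The "2D" representation: one full spectrum per time slice.
using Spectrogram = std::vector<Frame>;

// "1D" representation: the FFT values of each time slice summed into one value.
std::vector<double> to_1d(const Spectrogram& frames) {
    std::vector<double> sums;
    sums.reserve(frames.size());
    for (const Frame& f : frames) {
        double total = 0.0;
        for (double v : f) total += v;
        sums.push_back(total);
    }
    return sums;
}

// "Peak" representation: the frequency (Hz) of the strongest bin in each slice.
// Bin k of an FFT of size fft_size is centred at k * sample_rate / fft_size.
std::vector<double> to_peaks(const Spectrogram& frames, double sample_rate,
                             std::size_t fft_size) {
    assert(fft_size > 0);
    std::vector<double> peaks;
    peaks.reserve(frames.size());
    for (const Frame& f : frames) {
        std::size_t best = 0;
        for (std::size_t k = 1; k < f.size(); ++k)
            if (f[k] > f[best]) best = k;
        peaks.push_back(best * sample_rate / fft_size);
    }
    return peaks;
}
```

The 1D array is the cheapest to compare because it holds one number per frame. The peak array is just as small but keeps frequency information instead of power.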
To load a sound file the following parameters need to be specified:
- FileName: either a pointer to a Unicode string containing the file name, or a pointer to a memory address if loading from memory. Any format that BASS supports will be loaded.
- FileInMemory: set to True if FileName is a pointer to the memory address of an audio file.
- FileInMemorySize: the size of the memory block containing the audio file. Not used when FileInMemory is False.
- The rest of the parameters are automatically filled by the loading function when it returns.
- FFTResolution: a BASS flag that specifies the FFT resolution to use. A higher FFT size gives more precise frequency resolution but fewer time slices.
- LoadMode: AR_LOAD_1D, AR_LOAD_2D or AR_LOAD_2D_PEAK_1D, as described above.
- SampleRate: resamples the input sound to the specified frequency. For music 44100 should be used; for speech 11025 is more suitable. Note that the FFT always contains the frequency powers from 0 to 'SampleRate' / 2. Resampling the input sound is slower than setting 'SampleRate' to the audio file's own sample rate.
- MaxSilence: used with AR_LOAD_2D_PEAK_1D; specifies the limit below which a value is considered silence.
- Normalize: normalizes the data to 1.0.
- ChannelMapping: if not nil, specifies which channels to load. The format is the same as for BASS_Mix's BASS_Split_StreamCreate(); please read the BASS_Split_StreamCreate() documentation for details.
- StatusCallback: a callback that receives the progress of the loading process.
Example loading from memory with a TMemoryStream:
// ...fill WaveStream here with data...
SearchInObject.FileName := PChar(WaveStream.Memory);
SearchInObject.FileInMemory := True;
SearchInObject.FileInMemorySize := WaveStream.Size;
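The trade-off between FFTResolution and SampleRate can be made concrete with a little arithmetic. The sketch below assumes a plain integer FFT size (the library itself takes a BASS flag) and non-overlapping time slices, which may not match how the library slices the audio. `FftLayout` and `describe_fft` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

struct FftLayout {
    double bin_width_hz;   // frequency range covered by one FFT bin
    double max_frequency;  // highest frequency represented: SampleRate / 2
    double slice_seconds;  // audio duration consumed by one time slice
};

// Derives the frequency and time resolution from sample rate and FFT size.
// Assumption: each slice uses fft_size fresh samples (no overlap).
FftLayout describe_fft(double sample_rate, std::size_t fft_size) {
    assert(sample_rate > 0 && fft_size > 0);
    return FftLayout{
        sample_rate / fft_size,  // a larger FFT gives narrower bins
        sample_rate / 2.0,       // the spectrum always spans 0..SampleRate/2
        fft_size / sample_rate   // a larger FFT gives longer (fewer) slices
    };
}
```

Under these assumptions, at 44100 Hz an FFT size of 1024 gives bins about 43 Hz wide and slices of about 23 ms. Doubling the FFT size halves the bin width and doubles the slice duration.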
Description of the searching parameters
AR_SEARCH_MODE_2D_PEAK_1D only uses the following search parameters: AllowedFrequencyVariation, AllowedFrameVariation, AllowedFrameDefference and Transparent.
- Mode: AR_SEARCH_MODE_1D or AR_SEARCH_MODE_2D or AR_SEARCH_MODE_2D_PEAK_1D.
- AllowedAmplitudeDifference: Specifies by how much (absolute value, + or -) a frequency power value can differ within the time slice. This parameter is applied to every frequency sample. In 1D mode it specifies the tolerance for the summed power of the time slice. The value ranges from 0 to 1.0 (in most cases the maximum value of a frequency's power will be lower than 0.5).
- AllowedFrameDefference: Specifies how many time slices (frames) can differ in total between the two sounds. The value is a percentage expressed from 0 to 1: 0 means no difference, 1 means total difference (there is no point in setting it to 1, as a match will always be reported).
- AllowedFrameValuesDefference: Specifies what percentage of a frame's frequency values can differ. Only used for 2D mode.
- AllowedFrequencyVariation: Specifies in Hz by how much a frequency value can differ. For example, if searching for a 1000Hz power value and 'AllowedFrequencyVariation' is set to 100Hz, then a 900Hz peak will also be considered a match. Setting this to anything other than 0 slows down processing; the higher the value, the slower the processing. Should be used with a higher FFT size loading mode. Only used for 2D mode.
- AllowedFrameVariation: Specifies temporal variation: a time slice (frame) that differs at its expected position still counts as found if it occurs within the specified number of frames before or after it. For music set it to 2 at most, as music files cannot differ by more than 2 frames, and set AllowedFrameDefference to 10% or more (14-15% should be ok). For real life sounds set AllowedFrameVariation to higher values, as real life sounds have a lot of variation in time. Only used by AR_SEARCH_MODE_2D_PEAK_1D.
- Transparent: Useful with filtering. The filter sets frequency power values that are lower than the filtering value to 0. With this flag, the search treats these 0 values as ok. Useful with sound files where there is some background noise in the 'search in' file (if the search-for audio is layered on a background sound in the search-in audio). Also useful with 'AllowedFrequencyVariation'. This flag increases processing time. Only used for 2D and peak search mode.
- MultipleMatches: If set to 'False' search will abort on first match.
- MatchCallback: A callback to receive a match mid processing.
- StatusCallback: A callback to receive the progress of the search process. Always set to nil for BASS_AudioRecognition_StartScanRealTime().
- User: Callbacks will receive this value.
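To show how the tolerance parameters can interact, here is a minimal sketch of a 1D search. It is one plausible reading of the descriptions above, not the library's actual algorithm. `SearchParams` and `search_1d` are hypothetical; only the field names are taken from the parameter list.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the search parameters; not the library's record type.
struct SearchParams {
    double AllowedAmplitudeDifference;  // per-frame power tolerance, 0..1
    double AllowedFrameDefference;      // fraction of frames allowed to differ, 0..1
    bool MultipleMatches;               // false: stop at the first match
};

// Slides search_for over search_in (both 1D arrays of summed frame power) and
// returns every frame offset at which few enough frames are out of tolerance.
std::vector<std::size_t> search_1d(const std::vector<double>& search_in,
                                   const std::vector<double>& search_for,
                                   const SearchParams& p) {
    assert(p.AllowedFrameDefference >= 0.0 && p.AllowedFrameDefference <= 1.0);
    std::vector<std::size_t> matches;
    const std::size_t n = search_for.size();
    if (n == 0 || n > search_in.size()) return matches;
    for (std::size_t off = 0; off + n <= search_in.size(); ++off) {
        std::size_t differing = 0;
        for (std::size_t i = 0; i < n; ++i)
            if (std::fabs(search_in[off + i] - search_for[i]) > p.AllowedAmplitudeDifference)
                ++differing;
        if (static_cast<double>(differing) / n <= p.AllowedFrameDefference) {
            matches.push_back(off);
            if (!p.MultipleMatches) break;  // abort on the first match
        }
    }
    return matches;
}
```

In this reading, setting AllowedFrameDefference to 1 makes every offset a match, which is why the parameter list advises against it.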
OpenCL accelerated search is currently supported with AR_SEARCH_MODE_1D and AR_SEARCH_MODE_2D_PEAK_1D. Note that when using OpenCL the status callback is not used, and the match callback is called after the search has completed.
There is an option to load and save the created ARObjects if you want to reuse them, for example if you need to search a given ARObject database multiple times.
Use BASS_AudioRecognition_SaveObject() and specify with the 'Mode' flags which data to save. To save all data, 'OR' both flags together: "AR_SAVE_FFT OR AR_SAVE_FFT_STATS".
When searching for variable sounds rather than exact music clips, there is an option to "stretch" the ARObject so that the search tolerates slightly longer or shorter occurrences. Use BASS_AudioRecognition_Stretch() with, for example, 'NewSize = ARObject.DataLength + 1'.
The search can be more successful if it is performed in a loop over a range of sizes, from ARObject.DataLength - 10 to ARObject.DataLength + 10. Determine the range needed for the given task.
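The loop over a range of sizes might look like the sketch below. It does not call the library. `stretch` is a hypothetical stand-in that resamples a 1D array by linear interpolation, which is only an assumption about what a stretch step could do, and `candidate_lengths` is a hypothetical helper that lists the sizes to try.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a stretch step: resamples src to new_size values
// using linear interpolation. The library's own method may differ.
std::vector<double> stretch(const std::vector<double>& src, std::size_t new_size) {
    assert(!src.empty() && new_size > 0);
    std::vector<double> dst(new_size);
    if (new_size == 1 || src.size() == 1) {
        for (double& v : dst) v = src.front();
        return dst;
    }
    const double step = static_cast<double>(src.size() - 1) / (new_size - 1);
    for (std::size_t i = 0; i < new_size; ++i) {
        const double pos = i * step;
        const std::size_t lo = static_cast<std::size_t>(pos);
        const std::size_t hi = lo + 1 < src.size() ? lo + 1 : lo;  // clamp at the end
        const double frac = pos - lo;
        dst[i] = src[lo] * (1.0 - frac) + src[hi] * frac;
    }
    return dst;
}

// Sizes from length - range to length + range, never going below 1.
std::vector<std::size_t> candidate_lengths(std::size_t length, std::size_t range) {
    std::vector<std::size_t> sizes;
    const std::size_t first = length > range ? length - range : 1;
    for (std::size_t n = first; n <= length + range; ++n) sizes.push_back(n);
    return sizes;
}
```

A caller would stretch the search-for data to each candidate length, run the comparison once per length, and stop at the first match or keep the best result.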
AR_SEARCH_MODE_2D_PEAK_1D is probably the best choice for "variable sounds" (sounds that are not completely the same as a music clip) and has a big advantage over the other modes as it's not sensitive to volume level.
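Because only peak frequencies are compared, amplitude never enters the comparison, which is why volume level does not matter in this mode. The sketch below illustrates a peak comparison with frequency and frame tolerances. It is an interpretation of the parameter descriptions, not the library's code. `PeakParams` and `peaks_match_at` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle; field names follow the search parameter list.
struct PeakParams {
    double AllowedFrequencyVariation;   // Hz
    std::size_t AllowedFrameVariation;  // frames checked before and after
    double AllowedFrameDefference;      // fraction of frames allowed to differ
};

// True if search_for (peak Hz per frame) matches search_in starting at offset.
bool peaks_match_at(const std::vector<double>& search_in,
                    const std::vector<double>& search_for,
                    std::size_t offset, const PeakParams& p) {
    assert(!search_for.empty());
    if (offset + search_for.size() > search_in.size()) return false;
    std::size_t differing = 0;
    for (std::size_t i = 0; i < search_for.size(); ++i) {
        const std::size_t centre = offset + i;
        const std::size_t first =
            centre > p.AllowedFrameVariation ? centre - p.AllowedFrameVariation : 0;
        std::size_t last = centre + p.AllowedFrameVariation;
        if (last >= search_in.size()) last = search_in.size() - 1;
        bool found = false;  // is a close enough peak present in the frame window?
        for (std::size_t j = first; j <= last && !found; ++j)
            found = std::fabs(search_in[j] - search_for[i]) <= p.AllowedFrequencyVariation;
        if (!found) ++differing;
    }
    return static_cast<double>(differing) / search_for.size() <= p.AllowedFrameDefference;
}
```

With AllowedFrequencyVariation set to 100, a 900 Hz peak in the search-in data is accepted for a 1000 Hz peak in the search-for data, mirroring the example in the parameter list.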
Multiple real-time scanners can be applied to virtually any BASS channel handle (such as a recording channel (microphone) or an internet audio stream). The function applies a BASS channel DSP to the channel handle (with priority '-100') and scans it periodically.
- Always use the BASS_SAMPLE_FLOAT flag when creating the BASS channel handle. The library expects 32 bit float samples.
- No normalization is possible as the incoming audio data is processed in chunks.
- Do not use the AR_SCANNER_DOGETDATA flag for real-time streams, such as a recording channel or a live internet stream (for example radio), if the channel is played with BASS_ChannelPlay(). Pass 0 for Flags instead. With AR_SCANNER_DOGETDATA the library calls BASS_ChannelGetData() for as much data as is available, which steals audio data from BASS_ChannelPlay() and makes the audio skip or play too fast.
- TARCreateObjectParameters must be exactly the same for creating the search-for object and for the real-time scanner. You should just use the same object for both functions.
- Important: do not use a status callback when scanning. The match callback is called from a thread, so do not access the UI from it (send a message instead, for example).
- The returned matches do not really have a position value: a recording channel, for example, has no stream length and therefore no real position. For real-time channels, treat the time at which you received the match as its position (Delphi 'Now' function).
- Call BASS_AudioRecognition_StopScanRealTime() to terminate the scanner thread (and remove the DSP). Be careful: the handle needs to be valid, otherwise a crash is likely.
- You will probably want to use AR_SEARCH_MODE_2D_PEAK_1D for microphone input, as it is not sensitive to volume level.
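The notes above on chunked processing and match positions can be illustrated with a small chunk-driven scanner. This is a sketch under stated assumptions, not the library's implementation: `ChunkScanner` is hypothetical, and the per-frame feature (mean absolute sample value) is a simple stand-in for the FFT-based values the library works with.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical scanner fed with 32-bit float samples in arbitrary chunk sizes.
class ChunkScanner {
public:
    using Clock = std::chrono::system_clock;

    ChunkScanner(std::vector<double> pattern, std::size_t frame_size, double tolerance)
        : pattern_(std::move(pattern)), frame_size_(frame_size), tolerance_(tolerance) {
        assert(!pattern_.empty() && frame_size_ > 0);
    }

    // Consumes one incoming chunk; a frame is completed every frame_size samples.
    void feed(const std::vector<float>& chunk) {
        for (float s : chunk) {
            frame_sum_ += std::fabs(s);
            if (++frame_fill_ == frame_size_) {
                push_frame(frame_sum_ / frame_size_);
                frame_sum_ = 0.0;
                frame_fill_ = 0;
            }
        }
    }

    // One entry per match: the wall-clock time at which it was detected.
    const std::vector<Clock::time_point>& matches() const { return matches_; }

private:
    void push_frame(double value) {
        window_.push_back(value);
        if (window_.size() > pattern_.size()) window_.pop_front();
        if (window_.size() < pattern_.size()) return;
        for (std::size_t i = 0; i < pattern_.size(); ++i)
            if (std::fabs(window_[i] - pattern_[i]) > tolerance_) return;
        // A live stream has no length, so the receipt time serves as the position.
        matches_.push_back(Clock::now());
    }

    std::vector<double> pattern_;
    std::size_t frame_size_;
    double tolerance_;
    double frame_sum_ = 0.0;
    std::size_t frame_fill_ = 0;
    std::deque<double> window_;
    std::vector<Clock::time_point> matches_;
};
```

Only the most recent frames are kept, so the full signal is never available at once. That is consistent with the note that normalization is not possible for real-time scanning.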
- BASS_AudioRecognition_CreateObject = function (var ARObject: TARObject; Parameters: TARCreateObjectParameters): Integer;
- BASS_AudioRecognition_FreeObject = function (var ARObject: TARObject): Integer;
- BASS_AudioRecognition_Compare = function (ARObjectSearchIn: TARObject; ARObjectSearchFor: TARObject; Parameters: TARProcessParameters; var CompareResult: TARResult): Integer;
- BASS_AudioRecognition_StartScanRealTime = function (ARObjectSearchFor: TARObject; SearchInChannel: Cardinal; SearchInParameters: TARCreateObjectParameters; ScanParameters: TARProcessParameters; Flags: Cardinal; var ScannerHandle: Pointer): Integer;
- BASS_AudioRecognition_StopScanRealTime = function (ScannerHandle: Pointer): Integer;
- BASS_AudioRecognition_FreeResult = function (var CompareResult: TARResult): Bool;
- BASS_AudioRecognition_Filter = function (SourceARObject: TARObject; var DestinationARObject: TARObject; MinValue: Double): Integer;
- BASS_AudioRecognition_SetRegistration = function (Name: PWideChar; Email: PWideChar; RegistrationNumber: Integer; VersionCode: Integer): Integer;
- BASS_AudioRecognition_SaveObject = function (ARObject: TARObject; FileName: PChar; Mode: Integer): Integer;
- BASS_AudioRecognition_LoadObject = function (var ARObject: TARObject; FileName: PChar): Integer;
- BASS_AudioRecognition_Normalize = function (var ARObject: TARObject): Integer;
- BASS_AudioRecognition_Stretch = function (Source: TARObject; var Destination: TARObject; NewSize: UInt64): Integer;
- BASS_AudioRecognition_OpenCLGetPlatform = function (PlaformIndex: Cardinal): PChar;
- BASS_AudioRecognition_OpenCLGetPlatformDevice = function (PlaformIndex: Cardinal; DeviceIndex: Cardinal): PChar;
- BASS_AudioRecognition_OpenCLInit = function (PlaformIndex: Cardinal; DeviceIndex: Cardinal; CompareType: Integer): TAROCLDevice;
- BASS_AudioRecognition_OpenCLFree = function (var OCLDevice: TAROCLDevice): Bool;
- BASS_AudioRecognition_OpenCLSetSearchIn = function (OCLDevice: TAROCLDevice; ARObjectSearchIn: TARObject): Integer;
- BASS_AudioRecognition_OpenCLSetSearchFor = function (OCLDevice: TAROCLDevice; ARObjectSearchFor: TARObject): Integer;
- BASS_AudioRecognition_OpenCLPerformSearch = function (OCLDevice: TAROCLDevice; Parameters: TARProcessParameters; var CompareResult: TARResult; User: Pointer): Integer;
On Windows the Win64 version is around 20% faster than the Win32 version when searching.
Loading WMA files is slower than any other supported audio file format.
Note on OpenCL info strings
The pointers to wide chars returned by 'BASS_AudioRecognition_OpenCLGetPlatform' and 'BASS_AudioRecognition_OpenCLGetPlatformDevice' are only valid until the corresponding function is called again.
So make a copy of the strings if they are needed afterwards.
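The lifetime issue can be demonstrated without the library. In the sketch below, `fake_get_platform` is a hypothetical stand-in that mimics the documented behaviour by returning a pointer into a buffer that the next call overwrites. It is not the real function, and how the C++ API declares the return type is an assumption.

```cpp
#include <cassert>
#include <cwchar>
#include <string>

// Hypothetical stand-in: returns a pointer into a static buffer that is
// overwritten by the next call, like the info strings described above.
const wchar_t* fake_get_platform(unsigned index) {
    static wchar_t buffer[32];
    std::swprintf(buffer, sizeof buffer / sizeof buffer[0], L"Platform %u", index);
    return buffer;
}

// Copies the text immediately so that it survives later calls.
std::wstring platform_name(unsigned index) {
    const wchar_t* raw = fake_get_platform(index);
    return raw ? std::wstring(raw) : std::wstring();
}
```

When enumerating several platforms or devices, store each name in its own string before requesting the next one, rather than keeping the returned pointers.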