To achieve different levels of relevance, the search phrase is matched against the searchable fields using various matching queries.

Closer matches rank higher than partial matches, so the matched results can be ordered from best match to worst.

Example: Searching for “Jamie’s Shoes” will return the following list of entity names ordered from the best-match to the worst:

  1. Jamie’s Shoes
  2. Jamie’s Shoes Shop
  3. Jamie Shoe Shop
  4. Jamie Has a Shoe
  5. Shoe of Jamie
  6. Shoes Shop (Address: 10 Jamie Smith Street, Townsville, 4565)

Each query looks for a different degree of matching by applying transformation rules to both the search string and the values being searched. We apply four main transformers:

| Transformer | Description | Search String | Transformed String |
|---|---|---|---|
| Original value | Uppercase and remove trailing or leading double spaces. | Jamie’s Shoes | JAMIE’S SHOES |
| Almost identical value | Noise removed and then all spaces removed. | Jamie’s Shoes | JAMYSHOE |
| Original words/tokens | Original value broken into words. | Jamie’s Shoes | “JAMIE’S” and “SHOES” |
| Almost identical words/tokens | Almost identical value before removal of all spaces, broken into words. | Jamie’s Shoes | “JAMY” and “SHOE” |
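The four transformers can be sketched roughly as follows. This is a minimal illustration, not Verne's implementation: the function names are hypothetical, and `simplified_noise_removal` only strips punctuation. The real noise-removal rules are not described in this section and clearly do more (they turn “Jamie’s Shoes” into “JAMY SHOE”), so the almost identical outputs here differ from the table.

```python
import re


def original_value(text):
    # Uppercase and collapse leading, trailing and repeated spaces.
    return " ".join(text.upper().split())


def simplified_noise_removal(text):
    # ASSUMPTION: a stand-in that only drops punctuation and symbols.
    # The real noise-removal rules are more aggressive.
    return re.sub(r"[^\w\s]", "", text.upper())


def almost_identical_value(text):
    # Noise removed, then all spaces removed.
    return "".join(simplified_noise_removal(text).split())


def original_tokens(text):
    # Original value broken into words.
    return original_value(text).split()


def almost_identical_tokens(text):
    # Noise removed, then broken into words (before spaces are removed).
    return simplified_noise_removal(text).split()
```

With this simplified rule, `almost_identical_value("Blue (Sky)")` and `almost_identical_value("BlueSky")` both give `BLUESKY`, which is the property the almost identical queries rely on.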

The relevance score can be assigned based on the field being searched, for example, any matches on previous names can be rated lower than matches on current names.

The following detailed examples describe how various queries are applied to achieve relevance ranking when searching on:

  1. Business entity names, e.g. “Blue Sky Proprietary Limited”
  2. Individual names, e.g. “John James Smith”
  3. Addresses, e.g. “1/45 Norton Rd., Section 2, Rosedale, Sydney, Queensland, PW6 Y67, Australia”
  4. Identifiers/registration numbers, e.g. “123-4556-789” or “COY-7846764576”

Searching Business Entity Names

The same search logic is usually applied to current and previous names. Matches on previous names are given a lower score; by default, 30% of the normal score.
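As a rough illustration of the previous-name weighting, the sketch below scales a match score to 30% when the match is on a previous name. The function name, the base scores and the sample hits are made up for the example; only the 30% default comes from the text.

```python
# Default from the text; described as configurable.
PREVIOUS_NAME_PERCENT = 30


def adjusted_score(base_score, is_previous_name, percent=PREVIOUS_NAME_PERCENT):
    # Matches on previous names keep only a percentage of the normal score.
    if is_previous_name:
        return base_score * percent / 100
    return base_score


# Hypothetical hits: (matched name, base score, matched on a previous name?)
hits = [
    ("Blue Sky Ltd.", 100, True),
    ("Blue Sky Banking Ltd.", 60, False),
]
ranked = sorted(hits, key=lambda h: adjusted_score(h[1], h[2]), reverse=True)
```

Here the partial match on a current name (60) outranks the exact match on a previous name (100 scaled to 30).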

Verne looks for partial, case-insensitive matches of the search string against the entity name value and ranks the matches as follows:

  • High relevance match
    • Exact match on the name without a suffix. E.g. searching for “ Blue  Sky Proprietary Limited ” will match “Blue Sky Ltd.” and “Blue Sky Limited” with high relevance.
  • Medium relevance, partial match
    • Exact match of the almost identical value. E.g. searching for “ Blue  Sky Proprietary Limited ” will match “B l u e S k y Pty”, “Blue (Sky) Limited”, “BlueSky Limited”.
    • “Starts with” match on the name without a suffix. E.g. searching for “ Blue  Sky Proprietary Limited ” will match “Blue Skype”, “Blue Sky Banking Ltd.”.
  • Low relevance, partial match
    • All words from the search string are found in any order as “starts with” among the different words of the entity name. E.g. searching for “ Blue  Sky Proprietary Limited ” will match “Skye’s Blueberry Farm Limited”.
    • All words from the search string are found in any order as “contains” among the different words of the entity name. E.g. searching for “ Blue  Sky Proprietary Limited ” will match “Husky Blues Singing Shows Ltd.”, “Bluesky Cafe”.
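The ranking above can be sketched as a cascade of checks, from the strictest to the loosest. This is an illustrative sketch under stated assumptions, not Verne's actual queries: the `SUFFIXES` list is hypothetical, `squash` is a simplified stand-in for the almost identical transformer, and the tier labels are invented for the example.

```python
import re

# ASSUMPTION: a hypothetical suffix list; the real one is not given here.
SUFFIXES = {"PROPRIETARY", "PTY", "LIMITED", "LTD"}


def strip_suffix(text):
    # Uppercase, split into words, and drop trailing suffix words.
    tokens = text.upper().split()
    while tokens and re.sub(r"[^A-Z0-9]", "", tokens[-1]) in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def squash(text):
    # Simplified "almost identical" value: symbols and spaces removed.
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def entity_name_relevance(query, name):
    q, n = strip_suffix(query), strip_suffix(name)
    if not q:
        return None
    if q == n:
        return "high"
    if squash(q) == squash(n):
        return "medium: almost identical"
    if n.startswith(q):
        return "medium: starts with"
    q_words, n_words = q.split(), n.split()
    if all(any(nw.startswith(qw) for nw in n_words) for qw in q_words):
        return "low: starts with"
    if all(any(qw in nw for nw in n_words) for qw in q_words):
        return "low: contains"
    return None
```

For example, `entity_name_relevance(" Blue  Sky Proprietary Limited ", "Blue (Sky) Limited")` falls through the exact check and lands in the almost identical tier, while “Bluesky Cafe” only satisfies the final “contains” check.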

Searching Individual Names

The standard individual name search is based on the following assumptions:

  • If one word is entered in the search field, the user is most likely looking for matches on the last name and is less likely to be interested in first or middle name matches.
  • If two words are entered, they are likely to be the first and last name in any order.
  • If three or more words are entered, they are likely to be the first, middle and last names in any order or, less likely, parts of the full name, e.g. initials and the last name.

The same search logic is usually applied to current and previous names. Matches on previous names are given a lower score; by default, 30% of the normal score.

Verne looks for partial, case-insensitive matches of the search string against the individual name value and ranks the matches as follows:

  • High relevance match
    • Exact match on the full name including the middle name (with the surname either at the beginning or at the end of the name). E.g. searching for “O’Brian John Robert” or “John Robert O’Brian” will match people whose names are spelled exactly this way. Those whose last name is “Obrian” or “O Brian”, or whose given names are “John-Robert” or “Robert John”, will NOT match with high relevance.
    • Exact match on the full name without the middle name (with the surname either at the beginning or at the end of the name). E.g. searching for “O’Brian Robert” or “Robert O’Brian” will match people whose names are spelled exactly this way. Those whose last name is “Obrian” or “O Brian”, whose middle name is “Robert”, or whose first name is “Roberto” will NOT match with high relevance.
    • Exact match on the last name. E.g. searching for “O’Brian” will match any person whose last name is exactly “O’Brian”. Those whose last name is “Obrian” or “O Brian” will NOT match with high relevance.
  • Medium relevance, partial match
    • All of the above queries run in the same way but with noise removed (almost identical values). E.g. searching for “O’Brian John Robert” will now match those whose last name is “Obrian” or “O Brian” and whose name is “John Robert”, “John-Robert” or “Robert John”, but NOT “Roberto” or “Johnson”; the latter fall into the low relevance category.
  • Low relevance, partial match
    • All words from the search string are found in any order as “starts with” among the different parts of the full name. E.g. search on “SMITH Ann” or “SMITH A R” will match “Anne Rose Smithson”.
    • All words from the search string are found in any order as “contains” among the entire full name as one string. E.g. search on “SMITH Mary Ann” will match “Maryann Smithson” or “Mary-Ann Handsmith”.
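The high and low relevance tiers above can be illustrated with a small sketch. It is an assumption-laden example rather than Verne's implementation: the function name, its parameters and the tier labels are hypothetical, and the medium tier is left out because it depends on the noise-removal rules, which are not specified in this section.

```python
def name_relevance(query, first, last, middle=""):
    # Case-insensitive comparison with repeated spaces collapsed.
    q = " ".join(query.upper().split())
    first, middle, last = first.upper(), middle.upper(), last.upper()
    if not q:
        return None

    given = " ".join(p for p in (first, middle) if p)
    full = f"{given} {last}"

    # High: exact full name (with or without the middle name),
    # surname first or last, or the last name alone.
    exact_forms = {
        full,
        f"{last} {given}",
        f"{first} {last}",
        f"{last} {first}",
        last,
    }
    if q in exact_forms:
        return "high"

    # (Medium tier omitted: it needs the real noise-removal rules.)

    q_words = q.split()
    parts = full.split()
    # Low: every search word starts some part of the full name.
    if all(any(p.startswith(w) for p in parts) for w in q_words):
        return "low: starts with"
    # Low: every search word occurs somewhere in the full name string.
    if all(w in full for w in q_words):
        return "low: contains"
    return None
```

For instance, `name_relevance("SMITH A R", "Anne", "Smithson", "Rose")` is a “starts with” match, while `name_relevance("SMITH Mary Ann", "Mary-Ann", "Handsmith")` only matches as “contains”.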

Searching Addresses

The standard address search is based on the following assumptions:

  • If one word is entered in the search field, the user is likely looking for matches on Country, Region, City, Suburb or Postcode.
  • If multiple words are entered, the user is either searching for an exact address copied from the screen, or has entered a combination of keywords remembered from the address.

The same search logic is applied to current and previous addresses. Matches on previous addresses are given a lower score; by default, 30% of the normal score.

Verne looks for partial, case-insensitive matches of the search string against the address value and ranks the matches as follows:

  • High relevance match
    • Exact match on the full address as it is displayed on the screen. E.g. the address is “1/45 Norton Rd., Section 2, Rosedale, Sydney, Queensland, PW6 Y67, Australia” and the user enters “1/45 Norton Rd., Section 2, Rosedale, Sydney, Queensland, PW6 Y67, Australia”
    • Exact match on the country name (not country code)
    • Exact match on the region name (not region code)
    • Exact match on the city
    • Exact match on the postcode
  • Medium relevance, partial match
    • All words from the search string are found in any order as “exact match” among the different parts of the address value. This allows for a match if the user only knows part of the address. E.g. the address is “1/45 Norton Rd., Section 2, Rosedale, Sydney, Queensland, PW6 Y67, Australia” and the user enters “1/45 Norton Rd Rosedale” or “PW6 45”. Both will match this address.
  • Low relevance, partial match
    • All words from the search string without special characters (almost identical) are found in any order as “starts with” among the different parts of the address value without special characters. This allows for a “starts with” match on any part of the address. E.g. the address is “1/45 Norton Rd., Section 2, Rosedale, Sydney, Queensland, PW6 Y67, Australia” and the user enters “1-45 Norton Rd Rose-dale”, “Sydne”, “Austral” or “PW6”. All of these will match this address.
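A rough sketch of the three address tiers follows. The tokenisation rules are assumptions chosen so that the examples above behave as described (for instance, splitting “1/45” into “1” and “45” for the exact-word tier so that “PW6 45” matches); the function names and the `exact_fields` parameter are hypothetical, and the character classes only handle ASCII letters and digits.

```python
import re


def split_tokens(text):
    # ASSUMPTION: any run of non-alphanumeric characters separates words.
    return [t for t in re.split(r"[^A-Z0-9]+", text.upper()) if t]


def squashed_tokens(text):
    # Split on whitespace, then drop special characters inside each word,
    # so "1/45" and "1-45" both become "145".
    cleaned = (re.sub(r"[^A-Z0-9]", "", t) for t in text.upper().split())
    return [t for t in cleaned if t]


def normalise(text):
    return " ".join(text.upper().split())


def address_relevance(query, address, exact_fields=()):
    q = normalise(query)
    if not q:
        return None
    # High: the full address as displayed, or an exact match on a field
    # such as country name, region name, city or postcode.
    if q == normalise(address) or q in {normalise(f) for f in exact_fields}:
        return "high"
    # Medium: every search word equals some word of the address.
    address_words = set(split_tokens(address))
    q_words = split_tokens(query)
    if q_words and all(w in address_words for w in q_words):
        return "medium"
    # Low: every cleaned search word starts some cleaned address part.
    parts = squashed_tokens(address)
    q_parts = squashed_tokens(query)
    if q_parts and all(any(p.startswith(w) for p in parts) for w in q_parts):
        return "low"
    return None
```

With the example address, “1/45 Norton Rd Rosedale” is a medium match, whereas “1-45 Norton Rd Rose-dale” and “Sydne” only match at low relevance.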

Searching Identifiers

An ID or a number can be collected (or generated by the system) for an entity or an individual person so it can be uniquely identified on the register. 

For example:

  • Entity numbers generated in the system for new registrations (COY00001234)
  • Old entity numbers migrated from legacy systems (CM1983-67/a)
  • Proof of identity document numbers collected for individual persons (e.g. passport number 89 456123)
  • Tax numbers (111-222-333)
  • Social security numbers  (348756438756)
  • Registration number of an overseas entity (unknown format)

Unique identifiers consist of digits, letters and special symbols arranged in a pattern or a specific format. Long numeric IDs are often displayed broken into several parts separated by spaces or special symbols, e.g. a value that is stored as 011222333 can be displayed as 011-222-333. This creates challenges when searching by unique identifiers, as users might not remember the exact format, or they may search using separator symbols that are not actually stored.

The standard identifier search is based on the following assumptions: 

  1. We do not want to force the person searching to provide the full value of the ID exactly as it is stored in the database.
  2. When we relax the search conditions to allow for partial matches, assuming people searching do not use the right format, we should still return only relevant results.
  3. We do not allow searching by ranges or series of IDs, i.e. when the searcher enters only a small portion of the ID value in the search string.

Verne looks for partial, case-insensitive matches of the search string against the ID value and ranks the matches as follows:

  • High relevance match
    • Exact match on the entered ID value (exactly how it is stored in the database). E.g. searching on “BN1983-67” will NOT match on “BN198367” or “BN1983/67” or “BN67-1983” with high relevance
  • Low relevance, partial match
    • The search string is broken into tokens with special symbols removed, and all tokens must be found in any order. E.g. searching on “BN1983-67” will now match on “BN198367” or “BN1983/67” or “BN67-1983”.
  • No match
    • Each ID field that is searched has a setting that determines how many digits the user must enter before matching starts on this field. E.g. searching on “19” will NOT match on “BN1983-67” or any other value in that same field, because in this example almost every entity registered in the 20th century would contain the digits 19.
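The identifier tiers can be sketched as below. This is a hedged illustration, not Verne's implementation: the function names and the `min_digits` default of 4 are hypothetical, and splitting tokens at letter/digit boundaries is an assumption made so that “BN1983-67” can match “BN67-1983” as in the example above. Applying the minimum-digits check before the exact match is also an assumption.

```python
import re


def id_tokens(text):
    # ASSUMPTION: tokens are runs of letters or runs of digits, so special
    # symbols and letter/digit boundaries both separate tokens.
    return re.findall(r"[A-Z]+|[0-9]+", text.upper())


def id_relevance(query, stored, min_digits=4):
    query = query.strip()
    # No match: too few digits entered to start matching on this field.
    if sum(ch.isdigit() for ch in query) < min_digits:
        return None
    # High: exact match on the value as stored (case-insensitive).
    if query.upper() == stored.upper():
        return "high"
    # Low: every token is found, in any order, in the stored value
    # with its special symbols removed.
    squashed = "".join(id_tokens(stored))
    tokens = id_tokens(query)
    if tokens and all(t in squashed for t in tokens):
        return "low"
    return None
```

This also covers the display-format case described earlier: `id_relevance("011-222-333", "011222333")` returns a low relevance match even though the separators are not stored.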