About Regular Expressions in Google Analytics



Well, a small defeat: I spent at least two hours this afternoon on regular expressions, all because I wanted a custom report in GA that excludes certain keywords...

It was honestly a painful experience.

GA has its own rules for how regular expressions are used, and for a programming novice like me they are a welcome simplification.

Enough talk. Let me first collect what others have written on the subject and keep it here for later reference.

Symbols commonly used in regular expressions:

.: The dot is a wildcard; it can stand for any single character;

+: Repeat the previous character 1 or more times;

*: Repeat the previous character 0 or more times;

\d: represents any digit (Digit);

\s: represents any whitespace character (Space: space, tab, etc.);

\w: represents any letter, digit, or underscore (Word);

$: The dollar sign matches the end of the string;

-: The hyphen creates a range, such as a-z, which matches any character from a to z;

|: The vertical bar means "or", so a|b matches either a or b;

\: The backslash returns the following symbol to its literal meaning. For example, "?" normally makes the preceding item optional, whereas "\?" matches only a literal question mark;

(): Parentheses group symbols and elements together. For example, (.+) matches one or more of any character. Grouping lets you apply a quantifier to the whole group. For example, to match "web" you simply enter web, but to also match "webweb" you should use (web)+.

[]: Square brackets define a character set and are often used to express ranges. For example, [a-t] matches any lowercase letter between a and t. You can also put many characters inside the brackets: [a-zA-Z0-9\s\-#"=] matches any single letter, digit, whitespace character, hyphen, number sign, quotation mark, or equals sign. (This can be shortened to [\w\s\-#"=], although \w also matches the underscore; the longer form is used here to show ranges.)

{}: Curly braces define the number of repetitions, so (web){2} matches exactly two repetitions (webweb). Similarly, (web){2,7} matches between 2 and 7 repetitions, inclusive;

^: The caret has two roles: it matches at the beginning of the string, or, placed first inside a character set, it negates the set. So ^[a-z]+$ matches only strings made up entirely of lowercase letters, while ^[^a-z]+$ matches only strings that contain no lowercase letters (aaa: no, aAa: no, AAA: yes);

?: The question mark has several uses; most often it makes the preceding item optional. For example, "(1314)?web" matches "1314web" or "web" but does not match "13web" as a whole; likewise "13?" matches "1" or "13".

A concrete example:

We know that long-tail keywords express a more specific need, so in e-commerce they convert better, because they are closer to what the searcher expects. If you want to separate short keywords from long-tail ones, for example one-word keywords from n-word keywords, how should regular expressions be used?

In the Google Analytics Keyword report, select Advanced Options and select the option "Include - Keyword - Match Regular Expressions":

^\s*[^\s]+\s*$ - one-word keyword

^\s*[^\s]+(\s+[^\s]+){1}\s*$ - two-word keyword

^\s*[^\s]+(\s+[^\s]+){2}\s*$ - three-word keyword

^\s*[^\s]+(\s+[^\s]+){3}\s*$ - four-word keyword

This means: whitespace (\s) repeated 0 or more times (*), followed by 1 or more non-whitespace characters ([^\s]+), followed by 0 or more whitespace characters. Each additional word is a group of whitespace plus non-whitespace characters, so to match more words, raise the number in {number}.
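
To try these patterns outside of GA, here is a minimal sketch in Python. The helper names keyword_pattern and word_count are my own, and Python's re module is only a stand-in for GA's matcher, so treat this as an illustration rather than GA's exact behavior.

```python
import re

def keyword_pattern(words):
    """Compile a pattern for keywords of exactly `words` words.

    Hypothetical helper: the {number} slot is words - 1, because the
    first word is matched outside the repeated group.
    """
    if words < 1:
        raise ValueError("words must be at least 1")
    return re.compile(r"^\s*[^\s]+(\s+[^\s]+){%d}\s*$" % (words - 1))

def word_count(keyword, max_words=4):
    """Return the number of words (1..max_words) in `keyword`, or None."""
    for n in range(1, max_words + 1):
        if keyword_pattern(n).search(keyword):
            return n
    return None

for kw in ["iphone", "apple iphone", "  apple iphone 4s ", "how to jailbreak apple iphone"]:
    print(repr(kw), "->", word_count(kw))
```

The last keyword has five words, so none of the four patterns matches it and the sketch reports None.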

Symbol descriptions:

1) \ Backslash

The backslash "\" is the most commonly used symbol in regular expressions, so it is worth introducing first. Google Analytics defines it as: escape, that is, disable the character's special function.

In other words, it turns a special character back into an ordinary one by removing its special meaning. In real-life terms, it is like a celebrity taking off their makeup and going back to their natural look.

The backslash can be used on any special character, and it is most often used with the dot ".".

Example: szwebanalytics\.com, 127\.0\.0\.1
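
A small Python sketch of the difference the backslash makes. Python's re module stands in for GA here, and re.escape is a Python convenience, not a GA feature.

```python
import re

# Unescaped: the "." matches ANY single character.
loose = re.compile(r"szwebanalytics.com")
# Escaped: "\." matches only a literal dot.
strict = re.compile(r"szwebanalytics\.com")

print(bool(loose.search("szwebanalyticsXcom")))   # True
print(bool(strict.search("szwebanalyticsXcom")))  # False
print(bool(strict.search("szwebanalytics.com")))  # True

# re.escape adds the backslashes for you.
ip_pattern = re.escape("127.0.0.1")
print(ip_pattern)  # 127\.0\.0\.1
```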

2) ^ Caret (anchor)

Google Analytics defines the caret "^" as: matches the beginning of the field.

That is, if any other character comes before the text that follows ^, the string does not satisfy the regular expression.

Suppose there are two pages:

http://www.szwebanalytics.com/web/analytics/ and http://www.szwebanalytics.com/analytics/.

In Google Analytics these normally appear as /web/analytics/ and /analytics/ (because GA already knows the domain name, www.szwebanalytics.com).

If I want only /analytics/ and enter /analytics/ as the regular expression, every string containing those characters is included, /web/analytics/ among them, so it needs to be handled as follows:

The regular expression should be: ^/analytics/

This excludes /web/analytics/ and leaves only the paths that begin with /analytics/.
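
The same comparison as a quick Python sketch (re.search looks for the pattern anywhere in the string, which is how the unanchored case above behaves):

```python
import re

paths = ["/web/analytics/", "/analytics/"]

unanchored = [p for p in paths if re.search(r"/analytics/", p)]
anchored = [p for p in paths if re.search(r"^/analytics/", p)]

print(unanchored)  # ['/web/analytics/', '/analytics/']
print(anchored)    # ['/analytics/']
```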

3) . Dot

Google Analytics defines the dot "." as: matches any single character.

Suppose the regular expression is .at; then hat, cat, sat, and any other string with a character in front of "at" can match.

But "at" on its own does not match, because a character must come before "at"; that position cannot be empty.

If the regular expression is web.com

then webbcom, web4com, and webdcom can match.

If you mean to match a literal URL, the regular expression should be homepage\.com. (The backslash stops the dot from matching any character and gives it its literal meaning.)
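
A short Python check of the dot examples above; the helper name found is just for illustration.

```python
import re

def found(pattern, words):
    """Return the words in which `pattern` occurs somewhere."""
    return [w for w in words if re.search(pattern, w)]

# "at" alone fails: the dot needs one character in front of "at".
print(found(r".at", ["hat", "cat", "sat", "at"]))
# ['hat', 'cat', 'sat']

# The unescaped dot accepts any character in that position.
print(found(r"web.com", ["webbcom", "web4com", "webdcom", "webcom"]))
# ['webbcom', 'web4com', 'webdcom']
```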

4) ? Question mark

Google Analytics defines the question mark "?" as: matches 0 or 1 of the preceding character (or expression).

For example, if you want one regular expression that matches either labor or labour, it should be: labou?r
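
A minimal Python sketch of the optional "u". I use re.fullmatch here so the whole word has to match, which makes the effect easier to see than a partial match would.

```python
import re

spelling = re.compile(r"labou?r")

results = {word: spelling.fullmatch(word) is not None
           for word in ["labor", "labour", "labouur"]}
print(results)  # {'labor': True, 'labour': True, 'labouur': False}
```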

5) $ Dollar sign

Google Analytics defines the dollar sign "$" as: matches the end of the string.

1) Suppose some pages on the site end in htm and others in html. If you want only those ending in htm, the regular expression can be: \.htm$

The $ sign tells GA that if any character follows the final m of htm, the string does not satisfy the regular expression.

2) Suppose you need to filter an IP address and start from the regular expression 12\.34\.56\.78

12.34.56.78 matches, but to be sure 12.34.56.789 is excluded the expression should be changed to 12\.34\.56\.78$. To also exclude 512.34.56.78, add a caret at the front: ^12\.34\.56\.78$

3) Another example, using brand terms:

Suppose you want to know how many keyword variations start with the brand name, and the brand is Apple. The regular expression is then ^Apple, and keywords such as Apple iPhone, Apple iPod, and Apple Corporation match.

And keywords that end with the brand name? The expression is Apple$, and keywords such as Apple, What is Apple, and i love Apple match.
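
The IP and brand examples can be checked with a few lines of Python. The helper name matching is my own, and Python's re is only a stand-in for GA's matcher.

```python
import re

def matching(pattern, candidates):
    """Return the candidates in which `pattern` is found anywhere."""
    return [c for c in candidates if re.search(pattern, c)]

ips = ["12.34.56.78", "12.34.56.789", "512.34.56.78"]
print(matching(r"12\.34\.56\.78", ips))    # all three
print(matching(r"12\.34\.56\.78$", ips))   # ['12.34.56.78', '512.34.56.78']
print(matching(r"^12\.34\.56\.78$", ips))  # ['12.34.56.78']

keywords = ["Apple iPhone", "What is Apple", "Apple"]
print(matching(r"^Apple", keywords))  # ['Apple iPhone', 'Apple']
print(matching(r"Apple$", keywords))  # ['What is Apple', 'Apple']
```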

6) | Or (vertical bar)

Google Analytics defines the vertical bar "|" as: or.

I believe this is the simplest character in regular expressions. Suppose you want to find the data for the Apple devices iPhone, iPad, and iPod in the keyword report; the regular expression can then be: iPhone|iPad|iPod
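
A quick Python sketch of the same filter; the sample keywords are made up for illustration, and matching in this sketch is case-sensitive.

```python
import re

devices = re.compile(r"iPhone|iPad|iPod")

keywords = ["buy iPhone 4s", "iPad case", "iPod nano price", "MacBook Air"]
device_keywords = [k for k in keywords if devices.search(k)]

print(device_keywords)  # ['buy iPhone 4s', 'iPad case', 'iPod nano price']
```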

Regular Expression Features - Maximize Match:

Suppose you wrote this regular expression: /mypage, intending to match the site's page called /mypage.

Yes, the regular expression matches /mypage/, but it also matches /mypage/thirdpage-and-something-else, /secondpage/mypage.html, /mypage.htm, /mypage.asp, and so on. The regular expression is greedy in the sense that, without anchors, it matches every string that contains the pattern anywhere.

So how do you narrow the match?

Case 1: There are several ways to solve this problem, all of which tighten the net and make the pattern more explicit:

1) Change the expression to ^/mypage, which matches only strings that begin with /mypage and so excludes /secondpage/mypage.html;

2) Tell the regular expression what the string must end with. If only pages ending in htm are needed, it becomes: /mypage.*\.htm$

The backslash returns the dot to its literal meaning, .* matches any string, and the dollar sign says the string must end in htm, which excludes strings ending in html. Combining the two gives: ^/mypage.*\.htm$
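
Here is a minimal Python sketch of the narrowing steps. It assumes the unanchored pattern is /mypage (no trailing slash) and uses page paths modeled on the ones listed above; both are my assumptions for the illustration.

```python
import re

paths = [
    "/mypage/",
    "/mypage/thirdpage-and-something-else",
    "/secondpage/mypage.html",
    "/mypage.htm",
    "/mypage.asp",
]

def matching(pattern):
    """Return the paths in which `pattern` is found anywhere."""
    return [p for p in paths if re.search(pattern, p)]

print(matching(r"/mypage"))           # every path in the list
print(matching(r"^/mypage"))          # everything except /secondpage/mypage.html
print(matching(r"^/mypage.*\.htm$"))  # ['/mypage.htm']
```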

Regular expression example:

Tracking SEO projects is an important, routine task for an online marketing team. A very important part of that tracking is how keywords perform: visits, average time on site, bounce rate, orders, revenue, and so on. Because there are so many long-tail keywords, tracking each one is unrealistic, so keywords need to be summarized.

Suppose the long-tail keywords are Apple iPhone 4s, Apple iPhone apps, and Apple iPhone themes. The theme keyword is then Apple iPhone, and Apple iPhone can be tracked as a topic.

How to track

Because the keyword report in the original Google Analytics profile lists only the individual long-tail keywords, the only way to get a summary is to type Apple iPhone into the search box and read off the data, which takes a lot of manual effort. Instead, you can use the search-and-replace function of GA filters.

7) () Parentheses

In a mathematical formula, 3 * (1 + 2) is equivalent to 3 * 1 plus 3 * 2.

In a regular expression, parentheses "()" play a similar role: they group characters or strings together.

Suppose the regular expression is iPhone(4|4s); then iPhone4 and iPhone4s both match.

If the regular expression is Ste(ph|v)en, both Stephen and Steven match.
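
A short Python sketch of grouping, including what happens when the parentheses are left out. The helper full is my own and uses re.fullmatch so the whole word must match.

```python
import re

def full(pattern, text):
    """True when the whole of `text` matches `pattern`."""
    return re.fullmatch(pattern, text) is not None

print(full(r"iPhone(4|4s)", "iPhone4"), full(r"iPhone(4|4s)", "iPhone4s"))  # True True
print(full(r"Ste(ph|v)en", "Stephen"), full(r"Ste(ph|v)en", "Steven"))      # True True

# Without the parentheses the "|" splits the whole expression in two:
print(full(r"Steph|ven", "Stephen"))  # False
print(full(r"Steph|ven", "Steph"))    # True
```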

8) [] Square brackets

Google Analytics defines square brackets "[]" as: matches a single item in the list.

But what exactly is the list? What is a single item?

When you use square brackets, as in [aeiou], each character inside the brackets is a single item: a, e, i, o, and u are each one item, and [aeiou] is the list. (Note that you do not need any separator between the items.)

Example:

Suppose a site sells books, and one book is a large series split into several parts, distinguished as Book1, Book2, ... BookN. If you only want to see how a few parts are doing, say Book2, Book3, and Book4, the regular expression can be: Book[234]

9) - Dash

Google Analytics defines the dash "-" as: creates a range in a list.

If you want to say "one of a, b, c, d, e, f", the square-bracket expression is [abcdef]; with a dash it becomes [a-f]. You only need to write the first and last characters with a dash between them.

Sometimes the results we want need to contain a dash themselves. What then?

For example, to match apple-mac and apple4mac, the regular expression should be apple[-4]mac.
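
The bracket and dash examples can be tried in Python like this (a sketch; the extra sample strings are made up to show what does not match):

```python
import re

books = ["Book1", "Book2", "Book3", "Book4", "Book5"]
selected = [b for b in books if re.search(r"Book[234]", b)]
print(selected)  # ['Book2', 'Book3', 'Book4']

# [a-f] and [abcdef] accept exactly the same single characters.
by_range = [c for c in "abcdefgh" if re.fullmatch(r"[a-f]", c)]
by_list = [c for c in "abcdefgh" if re.fullmatch(r"[abcdef]", c)]
print(by_range == by_list)  # True

# A dash placed first inside the brackets is a literal dash, not a range.
names = ["apple-mac", "apple4mac", "applemac", "apple5mac"]
dashed = [n for n in names if re.search(r"apple[-4]mac", n)]
print(dashed)  # ['apple-mac', 'apple4mac']
```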

10) + Plus sign

Google Analytics defines the plus sign "+" as: matches one or more of the preceding character or string.

Case 1: Suppose you enter web+analytics in the filter.

webanalytics, webbanalytics, and webbbbanalytics all qualify.

Case 2: Suppose you use square brackets, for example [abc]+

Then a, ab, cab, c, b, and bbbb all qualify. It looks a little odd, but it fits the definition, because the characters do not have to follow any particular order.

Case 3: If you want to count keywords such as "web site", with one or more spaces between the b and the s, the regular expression should be web[ ]+site.
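
The three cases as a small Python sketch. The helper full is my own; it requires the whole string to match, which makes the "one or more" rule easy to see.

```python
import re

def full(pattern, words):
    """Return the words that match `pattern` from start to end."""
    return [w for w in words if re.fullmatch(pattern, w)]

# Case 1: at least one "b" is required.
print(full(r"web+analytics", ["webanalytics", "webbbbanalytics", "weanalytics"]))
# ['webanalytics', 'webbbbanalytics']

# Case 2: any mix of a, b, c, in any order.
print(full(r"[abc]+", ["a", "ab", "cab", "bbbb", "abd"]))
# ['a', 'ab', 'cab', 'bbbb']

# Case 3: at least one space between "web" and "site".
print(full(r"web[ ]+site", ["web site", "web   site", "website"]))
# ['web site', 'web   site']
```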

11) * Asterisk

Google Analytics defines the asterisk "*" as: matches 0 or more of the preceding character or string.

Suppose a company has several numbers such as NUM00034, and I want to know how many people are searching for 34. In Google Analytics the regular expression NUM0*34 can be used: NUM034, NUM0034, and NUM00000034 all qualify, and NUM34 qualifies as well.

If you want to count website, web site, and keywords with many spaces between the b and the s, the regular expression is: web[ ]*site

Then website, web site, and the like can all match.
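
A Python sketch of the asterisk examples; NUM0134 is an extra sample I added to show a string that does not match.

```python
import re

numbers = ["NUM034", "NUM0034", "NUM00000034", "NUM34", "NUM0134"]
zero_padded = [n for n in numbers if re.fullmatch(r"NUM0*34", n)]
print(zero_padded)  # ['NUM034', 'NUM0034', 'NUM00000034', 'NUM34']

spellings = ["website", "web site", "web    site", "web-site"]
spaced = [s for s in spellings if re.fullmatch(r"web[ ]*site", s)]
print(spaced)  # ['website', 'web site', 'web    site']
```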

12) {} Curly braces

Curly braces "{}" repeat the preceding item (a single character or a string), and usually take two numbers to express a range; {6,8}, for example, repeats the preceding item at least 6 times but no more than 8 times.

Take google as an example: assuming the o is not repeated many times, the regular expression can be: go{2,3}gle

This means the o is repeated at least 2 times and no more than 3 times.
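
A quick Python check of the {2,3} range (a sketch using re.fullmatch so the whole word must match):

```python
import re

spellings = ["gogle", "google", "gooogle", "goooogle"]
accepted = [s for s in spellings if re.fullmatch(r"go{2,3}gle", s)]

print(accepted)  # ['google', 'gooogle']
```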

Differences among the symbols that match the preceding character

Google Analytics defines the asterisk "*" as: matches 0 or more of the preceding character or string.

Google Analytics defines the plus sign "+" as: matches one or more of the preceding character or string.

Google Analytics defines the question mark "?" as: matches 0 or 1 of the preceding character (or expression).

By comparison, the question mark is the weakest at matching the preceding character, the plus sign comes next, and the asterisk is the most powerful.
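
To see the three side by side, here is a small Python sketch that runs each symbol against the same inputs. The patterns ab?c, ab+c, and ab*c and the helper hits_for are made up for this comparison.

```python
import re

inputs = ["ac", "abc", "abbc"]

def hits_for(pattern):
    """Return the inputs that match `pattern` from start to end."""
    return [s for s in inputs if re.fullmatch(pattern, s)]

print(hits_for(r"ab?c"))  # ['ac', 'abc']          0 or 1 "b"
print(hits_for(r"ab+c"))  # ['abc', 'abbc']        1 or more "b"
print(hits_for(r"ab*c"))  # ['ac', 'abc', 'abbc']  0 or more "b"
```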

Advanced regular expression examples:

1) Wikipedia example

The following examples come from Wikipedia's article on regular expressions:

".at" matches any three-character string such as hat, cat, or bat, because the dot (.) matches any character.

"[hc]at" matches hat and cat, because the square brackets create a list in which either h or c is acceptable; this regular expression therefore cannot match bat.

"[^b]at" matches every string that ".at" matches except bat, because the caret ^ has another use: placed at the start of the square brackets, it means "not".

"^[hc]at" matches only strings that start with hat or cat, because here the caret ^ means the beginning.

"[hc]at$" matches only strings that end with hat or cat, because the dollar sign $ means the end.

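The Wikipedia patterns can be tried with a short Python sketch. The words hatstand and that are extra samples I added to show the effect of the anchors.

```python
import re

def found(pattern, words):
    """Return the words in which `pattern` occurs somewhere."""
    return [w for w in words if re.search(pattern, w)]

words = ["hat", "cat", "bat"]
print(found(r".at", words))     # ['hat', 'cat', 'bat']
print(found(r"[hc]at", words))  # ['hat', 'cat']
print(found(r"[^b]at", words))  # ['hat', 'cat']

longer = ["hatstand", "that"]
print(found(r"^[hc]at", longer))  # ['hatstand']
print(found(r"[hc]at$", longer))  # ['that']
```
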
2) Robbin Steif (LunaMetrics CEO) example

((great )*grand)?((fa|mo)ther)

It is easier to understand when broken down: the parentheses create groups, and the question mark is the dividing point.

Looked at with that breakdown in mind, the regular expression is:

(regular expression)?(regular expression)

The question mark means 0 or 1 of the preceding expression, so whether the first half is present does not matter. It is still worth understanding, though:

((great )*grand)

The asterisk means 0, 1, or more of the preceding expression, so as long as there is a grand, it does not matter whether any great comes before it.

So grand, great grand, great great grand, and a whole string of greats followed by grand can all match.

In ((fa|mo)ther) the | means "or", so both father and mother are fine.

In the expression ((great )*grand)?((fa|mo)ther), some of the parentheses are not strictly necessary and are there only to make the structure clearer, but others cannot be removed. For example, if the parentheses in (fa|mo)ther are removed it becomes fa|mother, and the meaning changes: it now matches fa or mother.
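
A minimal Python sketch of this expression. It assumes a space after each "great" inside the group, so that phrases such as "great grandfather" match; that spacing is my assumption.

```python
import re

family = re.compile(r"((great )*grand)?((fa|mo)ther)")

def is_relative(phrase):
    """True when the whole phrase matches the family pattern."""
    return family.fullmatch(phrase) is not None

for phrase in ["father", "grandmother", "great grandfather",
               "great great grandmother", "great father"]:
    print(phrase, "->", is_relative(phrase))

# Dropping the inner parentheses changes the meaning to "fa" OR "mother".
print(re.fullmatch(r"fa|mother", "father") is None)  # True
print(re.fullmatch(r"fa|mother", "fa") is not None)  # True
```

Only "great father" is rejected in the loop, because a "great" is allowed only when "grand" follows it.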

A quiet aside: typing this article out line by line and then reading it through again has given me a much clearer picture. The rest will have to come from more practice.
