Imagine the example:
If I wanted to bold all occurrences of geekzilla, I'd usually do this:
Unfortunately, when dealing with HTML rather than plain text, this will mangle any tag that contains the word and produce the following:
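As a rough illustration of the problem, here is a minimal sketch of the naive approach. The class name NaiveHighlighter, the <b> wrapper and the sample input in the note below are assumptions made for illustration.

```csharp
using System.Text.RegularExpressions;

public static class NaiveHighlighter
{
    // Naive approach: wrap every occurrence of the keyword in <b> tags,
    // with no awareness of whether it sits inside an HTML tag.
    public static string BoldAll(string html)
    {
        return Regex.Replace(html, "(geekzilla)", "<b>$1</b>",
            RegexOptions.IgnoreCase);
    }
}
```

Given the hypothetical input <a href="/geekzilla/about">About geekzilla</a>, BoldAll returns <a href="/<b>geekzilla</b>/about">About <b>geekzilla</b></a>. The href attribute now contains markup, so the link is broken.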
We did a lot of googling and found loads of people discussing ways to ignore the tags. Suggestions ranged from SAX parsers to character-by-character loops (nasty).
Armed with an excellent regex for matching an entire HTML tag, we came up with the following solution.
Our Solution
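Use a custom Regex MatchEvaluator to skip over any tags: the pattern matches either a whole tag or the keyword, and the evaluator returns tags untouched and rewrites only the keyword. This works well and is very fast. There may be a slicker way to do this; I hope someone is inspired enough to figure it out and post a comment.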
// requires: using System.Text.RegularExpressions;
private string replaceString = "";

public string Parse(string content)
{
    // matches an entire HTML tag
    const string regTagName = @"<.[^>]*>";

    // group 1 = a whole tag, group 2 = the text we want to bold
    Regex reg = new Regex(@"(" + regTagName + ")|(geekzilla)",
        RegexOptions.IgnoreCase | RegexOptions.Multiline);

    // this is what I'd like to replace the match with:
    // the matched text wrapped in bold tags
    replaceString = "<b>$1</b>";

    // do the replace
    content = reg.Replace(content, new MatchEvaluator(MatchEval));
    return content;
}

protected string MatchEval(Match match)
{
    if (match.Groups[1].Success)
    {
        // the tag: return it unchanged
        return match.ToString();
    }
    if (match.Groups[2].Success)
    {
        // the text we're interested in
        return Regex.Replace(match.ToString(), "(.+)", replaceString);
    }
    // everything else
    return match.ToString();
}
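One possibly tidier variant, offered as a sketch rather than a definitive answer: pass the keyword in as a parameter, escape it with Regex.Escape, and use a lambda as the MatchEvaluator so that no instance field is needed. The names TagAwareHighlighter and BoldOutsideTags, and the hard-coded <b> wrapper, are assumptions made for illustration.

```csharp
using System.Text.RegularExpressions;

public static class TagAwareHighlighter
{
    // Same tag pattern as above: "<", one character, then anything up to ">".
    private const string TagPattern = @"<.[^>]*>";

    public static string BoldOutsideTags(string content, string keyword)
    {
        // Nothing to highlight: return the input untouched.
        if (string.IsNullOrEmpty(keyword))
        {
            return content;
        }

        // Group 1 captures a whole tag; group 2 captures the keyword.
        // The regex scans left to right, so a tag is matched and consumed
        // in full before any keyword inside it can be reached.
        var reg = new Regex(
            "(" + TagPattern + ")|(" + Regex.Escape(keyword) + ")",
            RegexOptions.IgnoreCase);

        return reg.Replace(content, match =>
            match.Groups[2].Success
                ? "<b>" + match.Value + "</b>"  // keyword in text: wrap it
                : match.Value);                  // tag: return unchanged
    }
}
```

Calling BoldOutsideTags with the hypothetical input <a href="/geekzilla/about">About GeekZilla</a> and the keyword geekzilla leaves the href alone and returns <a href="/geekzilla/about">About <b>GeekZilla</b></a>. Like the original, this relies on a simple tag regex, so it is not a substitute for a real HTML parser on malformed markup.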