
Convert Accented Characters

This is useful, for instance, when you want to use a string as part of a URL and need to make it safe for that kind of use.

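function replace_accents($str) {
   // Turn accented letters into named entities, e.g. "é" becomes "&eacute;"
   $str = htmlentities($str, ENT_COMPAT, "UTF-8");
   // Keep only the base letter of entities such as &eacute; or &uuml;
   $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
   // Decode the remaining entities (e.g. &amp;) using the same charset
   return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}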
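Because the stated goal is a URL-safe string, here is a minimal sketch of how replace_accents() might feed a URL slug. The slugify() helper, the lowercasing and the hyphen separator are assumptions for illustration and are not part of the original snippet. replace_accents() is repeated so the block runs on its own. Characters that have no named HTML entity (most non-Latin scripts, for example) are simply dropped by the final filter.

```php
<?php
// replace_accents() as in the snippet above, with the charset passed to both calls
function replace_accents($str) {
    $str = htmlentities($str, ENT_COMPAT, "UTF-8");
    $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
    return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}

// Hypothetical helper: turn an arbitrary title into a lowercase, hyphen-separated slug
function slugify($str) {
    $str = strtolower(replace_accents($str));
    // Replace every run of characters outside a-z and 0-9 with a single hyphen
    $str = preg_replace('/[^a-z0-9]+/', '-', $str);
    // Strip any hyphens left at either end
    return trim($str, '-');
}

echo slugify('Crème Brûlée, à la carte!'), "\n"; // creme-brulee-a-la-carte
```

Because the accents are removed before the non-alphanumeric filter runs, "è" becomes "e" instead of being replaced by a hyphen.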

Comments

  1. Or you could just use the functions already built into PHP:

    
    $unsafe = 'Hello Daniël';
    $safe = urlencode($unsafe);
    $transferred = $_GET['data']; // PHP has already URL-decoded values in $_GET, so don't call urldecode() again
    
    • Passing $unsafe through urlencode gives you “Hello+Dani%C3%ABl” (urlencode turns spaces into +; rawurlencode gives “Hello%20Dani%C3%ABl”).
      Those are all valid characters in a URL. It even works on so-called multibyte characters, such as those used in Korean, Japanese, Mandarin and similar languages. I found that out the hard way when a survey system started chopping up comments.

      The JavaScript equivalents are encodeURI / encodeURIComponent (both encode a space as %20); to reverse them, use decodeURI / decodeURIComponent.

      
      var response, safe, unsafe = 'Hello Daniël';
      safe = encodeURIComponent(unsafe);
      // ajaxResponse is assumed to hold an encoded string returned by the server
      response = decodeURIComponent(ajaxResponse);
      
  2. To go the other way around: when facing encoded characters, you might want the plain character that most closely matches an accented variant. For example, you want to store a song with DJ Tiësto in the title, but want it to become DJ Tiesto when your script creates a filename.

    
    function transform($title){
        // Decode numeric and named HTML entities (e.g. &#235; or &euml;) into UTF-8 characters
        $title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

        // Map lowercase accented characters to their closest plain letters and drop some punctuation
        $title = strtr($title, array(
            'ì' => 'i', 'í' => 'i', 'î' => 'i', 'ï' => 'i',
            'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'ü' => 'u',
            'ç' => 'c',
            'è' => 'e', 'é' => 'e', 'ê' => 'e', 'ë' => 'e',
            'à' => 'a', 'á' => 'a', 'â' => 'a', 'ã' => 'a', 'ä' => 'a', 'å' => 'a',
            'ý' => 'y', 'ÿ' => 'y', 'þ' => 'th',
            '.' => '', '!' => '', ',' => '', ':' => '', ';' => '', "'" => '', '"' => '',
        ));

        // Remove characters that are not allowed in filenames
        $title = str_replace(
            array('\\','/',':','"','*','?','<','>','|','+','%'),
            '',
            $title
        );

        // Collapse runs of spaces into a single space
        return preg_replace('/ {2,}/', ' ', $title);
    }
    
  3. The original snippet doesn’t cut it: Unicode characters that htmlentities() has no named entity for are left untouched. If you want to (safely) transform a UTF-8 string into alphanumeric characters for use in URLs, give urlify() a shot.

  4. Julien

    I have done this:

    function replace_accents($str) {
        $str = htmlentities($str, ENT_COMPAT, "UTF-8");
        $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|elig|ring|th|slash|zlig|horn);/','$1',$str);
        return html_entity_decode($str, ENT_COMPAT, "UTF-8");
    }

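For the URL-encoding approach in the first comment, PHP has two built-in encoders that differ only in how they treat spaces. The sketch below uses only built-in functions; the sample strings are arbitrary. urlencode() follows the form-encoding convention and turns a space into "+", while rawurlencode() follows RFC 3986 and produces "%20", which matches what JavaScript's encodeURIComponent() emits. Multibyte characters are percent-encoded byte by byte from their UTF-8 form, assuming the PHP source file is saved as UTF-8.

```php
<?php
$unsafe = 'Hello Daniël';

// Form encoding (application/x-www-form-urlencoded): a space becomes "+"
$form = urlencode($unsafe);

// RFC 3986 percent-encoding: a space becomes "%20", like encodeURIComponent() in JavaScript
$raw = rawurlencode($unsafe);

echo $form, "\n"; // Hello+Dani%C3%ABl
echo $raw, "\n";  // Hello%20Dani%C3%ABl

// Multibyte characters are encoded one UTF-8 byte at a time
echo rawurlencode('日本'), "\n"; // %E6%97%A5%E6%9C%AC

// Each decoder reverses its own encoder
var_dump(urldecode($form) === $unsafe);   // bool(true)
var_dump(rawurldecode($raw) === $unsafe); // bool(true)
```

Pick the pair that matches how the other side encodes. A query string built by an HTML form uses "+", and a path segment produced by encodeURIComponent() uses "%20".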
