Convert Accented Characters

This is useful when, for instance, you want to use a string as part of a URL and need to make it safe for that use. The function below replaces common accented letters with their unaccented base letter.

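function replace_accents($str) {
   // Encode accented characters as named entities, e.g. "é" becomes "&eacute;"
   $str = htmlentities($str, ENT_COMPAT, "UTF-8");
   // Keep only the base letter of each accent entity: "&eacute;" becomes "e"
   $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
   // Decode any remaining entities (such as "&amp;") back to plain characters
   return html_entity_decode($str, ENT_COMPAT, "UTF-8");
}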
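
To make the three steps concrete, here is a minimal walk-through that runs them one at a time on a sample string. The sample text and the variable names are assumptions chosen for illustration; the calls themselves are the same ones the snippet uses.

```php
<?php
// Hypothetical sample input containing grave, circumflex and acute accents.
$input = 'Crème brûlée à la carte';

// Step 1: accented letters become named entities.
$encoded = htmlentities($input, ENT_COMPAT, 'UTF-8');
echo $encoded, "\n"; // Cr&egrave;me br&ucirc;l&eacute;e &agrave; la carte

// Step 2: each matching entity is reduced to its first letter.
$stripped = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $encoded);

// Step 3: anything still encoded is decoded again.
$result = html_entity_decode($stripped, ENT_COMPAT, 'UTF-8');
echo $result, "\n"; // Creme brulee a la carte
```

Only entities whose suffix appears in the pattern are reduced; everything else passes through unchanged.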

Comments

  1. Jan-Marten de Boer

    Or you just use the functions that are already implemented in PHP ><

    
    $unsafe = 'Hello Daniël';
    $safe = urlencode($unsafe); // "Hello+Dani%C3%ABl"
    // PHP has already decoded the values in $_GET, so no urldecode() call is needed
    $transfered = $_GET['data'];
    
    • Jan-Marten de Boer

      Passing the $unsafe value through urlencode gives you “Hello+Dani%C3%ABl”; rawurlencode gives “Hello%20Dani%C3%ABl”, with the space encoded as %20.
      Those are all valid characters inside a URL. It even works on so-called multibyte characters, which are used in Korean, Japanese, Mandarin and similar languages. I found that out the hard way when a survey system started chopping up comments.
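
      A minimal sketch comparing PHP's two URL-encoding functions on the same string, plus a round trip with multibyte text. The Korean sample string is an assumption added for illustration. Note that urlencode() encodes a space as "+", while rawurlencode() produces "%20".

      ```php
      <?php
      $unsafe = 'Hello Daniël';

      echo urlencode($unsafe), "\n";    // Hello+Dani%C3%ABl
      echo rawurlencode($unsafe), "\n"; // Hello%20Dani%C3%ABl

      // Multibyte text survives an encode/decode round trip unchanged.
      $korean = '안녕';
      $roundTrip = rawurldecode(rawurlencode($korean));
      var_dump($roundTrip === $korean); // bool(true)
      ```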

      The JavaScript equivalent is encodeURI / encodeURIComponent and to reverse that you use decodeURI / decodeURIComponent.

      
      var response, safe, unsafe = 'Hello Daniël';
      safe = encodeURIComponent(unsafe); // "Hello%20Dani%C3%ABl"
      // ajaxResponse stands for an encoded string received from the server
      response = decodeURIComponent(ajaxResponse);
      
    • parmod

      function replace_bishnoi($str) {
         $str = htmlentities($str, ENT_COMPAT, "UTF-8");
         $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/','$1',$str);
         return html_entity_decode($str);
      }

  2. Jan-Marten de Boer

    To go the other way around, when facing encoded characters, you might want the plain character that most closely matches an accented variant. E.g., you want to store a song with DJ Tiësto in the title, but want it to turn out as DJ Tiesto when creating a filename in your script.

    
    function transform($title){
    	// Decode named and numeric entities (e.g. "&euml;" or "&#235;") into UTF-8 characters
    	$title = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

    	// Replace accented lowercase letters with their closest ASCII equivalent
    	$title = strtr($title, array(
    		'à'=>'a','á'=>'a','â'=>'a','ã'=>'a','ä'=>'a','å'=>'a',
    		'ç'=>'c',
    		'è'=>'e','é'=>'e','ê'=>'e','ë'=>'e',
    		'ì'=>'i','í'=>'i','î'=>'i','ï'=>'i',
    		'ù'=>'u','ú'=>'u','û'=>'u','ü'=>'u',
    		'ý'=>'y','ÿ'=>'y','þ'=>'th',
    	));

    	// Strip punctuation and characters that are not allowed in filenames
    	$title = str_replace(
    		array('.','!',',',':',';',"'",'"','\\','/','*','?','<','>','|','+','%'),
    		'',
    		$title
    	);

    	// Collapse runs of spaces into a single space
    	return preg_replace('/ {2,}/', ' ', $title);
    }
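
    The entity-handling step can be tried on its own. This is a sketch under assumptions: the input string is made up for the example, and the replacement map is deliberately tiny. html_entity_decode() turns the numeric reference into a UTF-8 character, and strtr() then swaps it for a plain letter.

    ```php
    <?php
    // Hypothetical input: a title containing a numeric character reference for "ë".
    $title = 'DJ Ti&#235;sto';

    // Decode "&#235;" into the UTF-8 character "ë".
    $decoded = html_entity_decode($title, ENT_QUOTES, 'UTF-8');

    // Map accented letters to plain ones (a three-entry map, just for this demo).
    $ascii = strtr($decoded, array('ë' => 'e', 'é' => 'e', 'ï' => 'i'));

    echo $ascii, "\n"; // DJ Tiesto
    ```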
    
  3. Rodney Rehm

    The original snippet doesn’t cut it. Unicode characters that aren’t covered by htmlentities() are ignored altogether. If you want to (safely) transform a UTF-8 string to alphanumeric characters for use in URLs, give urlify() a shot.
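
    A small demonstration of that limitation, assuming PHP's default HTML 4.01 entity table. The closure name and the sample word are made up for the example. Letters with no named entity in that table, such as "Ł" and "ź", never reach the regular expression and come out unchanged.

    ```php
    <?php
    // The same three steps as the original snippet, wrapped in a closure for this demo.
    $replace = function ($str) {
        $str = htmlentities($str, ENT_COMPAT, 'UTF-8');
        $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde);/', '$1', $str);
        return html_entity_decode($str, ENT_COMPAT, 'UTF-8');
    };

    // Only "ó" has a matching entity (&oacute;); "Ł" and "ź" are left as they are.
    $out = $replace('Łódź');
    echo $out, "\n"; // Łodź
    ```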

  4. Julien

    I have done this:

    function replace_accents($str) {
       $str = htmlentities($str);
       $str = preg_replace('/&([a-zA-Z])(uml|acute|grave|circ|tilde|cedil|elig|ring|th|slash|zlig|horn);/','$1',$str);
       return html_entity_decode($str);
    }

  5. Arnaud

    Thx !!
