麦子学院

How do you convert between full-width and half-width characters in Python 3?

Posted: 2017-09-20 23:13
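This post covers converting between full-width and half-width characters in Python 3. Hopefully it is useful to anyone learning Python 3.
   1. Background
  ·  Problem solved: quickly and conveniently converting text between full-width and half-width characters
  ·  Use case: replacing full-width characters with half-width ones in student answer data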
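   2. How full-width and half-width characters work
  ·  Full-width: Double Byte Character, abbreviated DBC
  ·  Half-width: Single Byte Character, abbreviated SBC
  ·  On Windows with a Chinese code page (GB2312/GBK), Chinese characters and full-width characters each occupy two bytes, and both bytes fall in the extended ASCII range (codes 128–255).
  ·  The first byte of a full-width character is always 163, and the second byte is the code of the corresponding half-width character plus 128. The space is the exception, so the full-width and half-width spaces have to be handled separately.
  ·  For a Chinese character the first byte is greater than 163, for example 176 162. When Chinese is detected, no conversion is performed.
  ·  Example: half-width A is 65, so full-width A is 163 (first byte) and 193 (second byte, 128 + 65).
  ·  Python 3 strings hold Unicode code points rather than these bytes. In Unicode the full-width forms occupy U+FF01 to U+FF5E, exactly 0xFEE0 above the half-width characters 0x21 to 0x7E, and the full-width space is U+3000.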
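  The byte arithmetic above can be checked directly in Python 3. This is a minimal sketch, assuming the two-byte encoding being described is GBK (Python's built-in `gbk` codec); the helper name `gbk_bytes` is hypothetical and not part of the original post. It also prints the fixed Unicode offset between a full-width letter and its half-width counterpart.

```python
def gbk_bytes(ch):
    """Return the GBK-encoded bytes of one character as a list of ints."""
    return list(ch.encode("gbk"))


half_a = "A"        # half-width A, U+0041
full_a = "\uff21"   # full-width A, U+FF21

print(gbk_bytes(half_a))     # [65]
print(gbk_bytes(full_a))     # [163, 193], i.e. 163 and 128 + 65
print(gbk_bytes("\u963f"))   # [176, 162], a Chinese character: first byte > 163
print(gbk_bytes("\u3000"))   # [161, 161], the full-width space is a special case

# In Unicode the same pair of letters differs by a fixed offset.
print(hex(ord(full_a) - ord(half_a)))  # 0xfee0
```

  The last line is why code working on Python 3 strings can add or subtract 0xFEE0 instead of manipulating bytes.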
  Full-width and half-width example (the text file test.txt contains both kinds of characters):
  F:\test> type test.txt
  123456
  123456
  abcdefg
  abcdefg
  中国你好
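  A file like test.txt could be converted in place along the following lines. This is a sketch under stated assumptions: the file is UTF-8 encoded, the names `FULL_TO_HALF` and `convert_file` are hypothetical, and a temporary file stands in for test.txt. It uses `str.translate` with a code-point table rather than a character-by-character loop.

```python
import os
import tempfile

# Translation table: full-width forms U+FF01..U+FF5E map to ASCII 0x21..0x7E,
# and the full-width space U+3000 maps to an ordinary space.
FULL_TO_HALF = {code + 0xFEE0: code for code in range(0x21, 0x7F)}
FULL_TO_HALF[0x3000] = 0x20


def convert_file(path, encoding="utf-8"):
    """Rewrite the file at `path` with full-width characters made half-width."""
    with open(path, encoding=encoding) as f:
        text = f.read()
    with open(path, "w", encoding=encoding) as f:
        f.write(text.translate(FULL_TO_HALF))


# Demo on a temporary stand-in for test.txt (assumed content, not the original file).
path = os.path.join(tempfile.mkdtemp(), "test.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("\uff11\uff12\uff13\nabc\uff41\uff42\uff43\n中国你好\n")

convert_file(path)
with open(path, encoding="utf-8") as f:
    result = f.read()
print(result)
```

  Characters that are not in the table, such as the Chinese line, pass through `translate` unchanged.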
   3. Converting between full-width and half-width in Python 3
  # -*- coding: utf-8 -*-
  # i@mail.chenpeng.info
  '''
  Full-width: Double Byte Character, abbreviated DBC
  Half-width: Single Byte Character, abbreviated SBC
  '''


  def DBC2SBC(ustring):
      '''Convert full-width characters to half-width.'''
      rstring = ""
      for uchar in ustring:
          inside_code = ord(uchar)
          if inside_code == 0x3000:
              # The full-width space maps directly to the ASCII space.
              rstring += chr(0x0020)
              continue
          inside_code -= 0xfee0
          if not (0x0021 <= inside_code <= 0x7e):
              # Not a full-width form of printable ASCII (Chinese, for example): keep it.
              rstring += uchar
              continue
          rstring += chr(inside_code)
      return rstring


  def SBC2DBC(ustring):
      '''Convert half-width characters to full-width.'''
      rstring = ""
      for uchar in ustring:
          inside_code = ord(uchar)
          if inside_code == 0x0020:
              # The ASCII space maps directly to the full-width space.
              inside_code = 0x3000
          else:
              if not (0x0021 <= inside_code <= 0x7e):
                  # Not printable ASCII: keep the character unchanged.
                  rstring += uchar
                  continue
              inside_code += 0xfee0
          rstring += chr(inside_code)
      return rstring
  s = '''
  array('0' => '0', '1' => '1', '2' => '2', '3' => '3', '4' => '4',
  '5' => '5', '6' => '6', '7' => '7', '8' => '8', '9' => '9',
  'A' => 'A', 'B' => 'B', 'C' => 'C', 'D' => 'D', 'E' => 'E',
  'F' => 'F', 'G' => 'G', 'H' => 'H', 'I' => 'I', 'J' => 'J',
  'K' => 'K', 'L' => 'L', 'M' => 'M', 'N' => 'N', 'O' => 'O',
  'P' => 'P', 'Q' => 'Q', 'R' => 'R', 'S' => 'S', 'T' => 'T',
  'U' => 'U', 'V' => 'V', 'W' => 'W', 'X' => 'X', 'Y' => 'Y',
  'Z' => 'Z', 'a' => 'a', 'b' => 'b', 'c' => 'c', 'd' => 'd',
  'e' => 'e', 'f' => 'f', 'g' => 'g', 'h' => 'h', 'i' => 'i',
  'j' => 'j', 'k' => 'k', 'l' => 'l', 'm' => 'm', 'n' => 'n',
  'o' => 'o', 'p' => 'p', 'q' => 'q', 'r' => 'r', 's' => 's',
  't' => 't', 'u' => 'u', 'v' => 'v', 'w' => 'w', 'x' => 'x',
  'y' => 'y', 'z' => 'z',
  '(' => '(', ')' => ')', '〔' => '[', '〕' => ']', '【' => '[',
  '】' => ']', '〖' => '[', '〗' => ']', '"' => '[', '"' => ']',
  '\'' => '[', '\'' => ']', '{' => '{', '}' => '}', '《' => '<',
  '》' => '>',
  '%' => '%', '+' => '+', '—' => '-', '-' => '-', '~' => '-',
  ':' => ':', '。' => '.', '、' => ',', ',' => '.', '、' => '.',
  ';' => ',', '?' => '?', '!' => '!', '…' => '-', '‖' => '|',
  '"' => '"', '\'' => '`', '\'' => '`', '|' => '|', '〃' => '"',
  ' ' => ' ');
  '''
  # Full-width to half-width
  print(DBC2SBC(s))
  # Half-width to full-width
  print(SBC2DBC(s))
  s = ''' 中文测试 '''
  # Full-width to half-width
  print(DBC2SBC(s))
  # Half-width to full-width
  print(SBC2DBC(s))
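  For comparison, the standard library can handle the full-width to half-width direction without a hand-written loop. The following is a minimal sketch, not part of the original post, using `unicodedata.normalize` with the NFKC form; the function name `to_half_width_nfkc` is hypothetical. It assumes that broader normalization is acceptable: NFKC also rewrites other compatibility characters (the superscript ² becomes 2, for example), so it is not a drop-in replacement for DBC2SBC.

```python
import unicodedata


def to_half_width_nfkc(text):
    """Fold full-width forms (and other compatibility characters) via NFKC."""
    return unicodedata.normalize("NFKC", text)


# Full-width A B C, a full-width space, full-width 1 2 3, a full-width
# exclamation mark, then ordinary text that should be left alone.
sample = "\uff21\uff22\uff23\u3000\uff11\uff12\uff13\uff01 中文"
print(to_half_width_nfkc(sample))    # ABC 123! 中文

# NFKC goes further than full-width folding: superscript two becomes "2".
print(to_half_width_nfkc("\u00b2"))  # 2
```

  When only the ASCII range should change, as with student answers that may contain other symbols, the explicit 0xFEE0 offset approach is the more predictable choice.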
Source: Chen Peng's personal blog (陈鹏个人博客)