2012-01-30

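I'm trying to get school information and save it as an Excel sheet with one column per detail. I have some code to start with and would appreciate further help. Column headers: school name, mascot, address, type, phone number, fax number, etc., for a list of schools that I have. I used one link as an example. Can this web scraping be made better, with correct code?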

Imports System.IO.StreamReader 
Imports System.Text.RegularExpressions 

Public Class Form1 

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click 
     Dim request As System.Net.HttpWebRequest = System.Net.WebRequest.Create("http://www.maxpreps.com/high-schools/abbeville-yellowjackets-(abbeville,al)/home.htm") 
     Dim response As System.Net.HttpWebResponse = request.GetResponse 

     Dim sr As System.IO.StreamReader = New System.IO.StreamReader(response.GetResponseStream()) 
     Dim rsssource As String = sr.ReadToEnd 
     Dim r As New System.Text.RegularExpressions.Regex("<h1 id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Header"">.*</h1>") 
     Dim r1 As New System.Text.RegularExpressions.Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Mascot"">.*</span>") 
     Dim r3 As New System.Text.RegularExpressions.Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Colors"">.*</span>") 
     Dim r4 As New System.Text.RegularExpressions.Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_GenderType"">.*</span>") 
     Dim r5 As New System.Text.RegularExpressions.Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_AthleteDirectorGenericControl"">.*</span>") 
     Dim r6 As New System.Text.RegularExpressions.Regex("<address>.*</address>") 
     Dim r7 As New System.Text.RegularExpressions.Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Phone"">.*</span>") 
     Dim r8 As New System.Text.RegularExpressions.Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Fax"">.*</span>") 

     Dim matches As MatchCollection = r.Matches(rsssource) 
     Dim matches1 As MatchCollection = r1.Matches(rsssource) 
     Dim matches3 As MatchCollection = r3.Matches(rsssource) 
     Dim matches4 As MatchCollection = r4.Matches(rsssource) 
     Dim matches5 As MatchCollection = r5.Matches(rsssource) 
     Dim matches6 As MatchCollection = r6.Matches(rsssource) 
     Dim matches7 As MatchCollection = r7.Matches(rsssource) 
     Dim matches8 As MatchCollection = r8.Matches(rsssource) 


     For Each itemcode As Match In matches 
      ListBox1.Items.Add(itemcode.Value.Split("_").GetValue(4)) 
      ListBox1.Items.Add(itemcode.Value.Split("><").GetValue(1)) 
     Next 
     For Each itemcode As Match In matches1 
      ListBox1.Items.Add(itemcode.Value.Split("_").GetValue(4)) 
      ListBox1.Items.Add(itemcode.Value.Split("><").GetValue(1)) 

     Next 
    End Sub 
End Class 

Answers

What you're looking for is Code Review. Anyway, yes, it can be made a lot better. First, you've already imported the System.Text.RegularExpressions namespace, so there is no need to fully qualify Regex. Next, you can use groups.
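To illustrate what groups buy you, here is a minimal sketch. The `html` string and the shortened `Header_Mascot` id are made-up stand-ins for the real page source, not something taken from the site. The point is only that `Groups(1)` returns the text captured by the parentheses, so no `Split` calls are needed.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module GroupDemo
    Sub Main()
        ' Hypothetical fragment standing in for the downloaded page source.
        Dim html As String = "<span id=""Header_Mascot"">Yellowjackets</span>"

        ' (.*?) is a lazy capture group: it stops at the first closing tag.
        Dim mascotPattern As New Regex("<span id=""Header_Mascot"">(.*?)</span>")

        Dim m As Match = mascotPattern.Match(html)
        If m.Success Then
            ' Groups(0) is the whole match; Groups(1) is the first parenthesized group.
            Console.WriteLine(m.Groups(1).Value)
        End If
    End Sub
End Module
```

With the sample string above this prints `Yellowjackets`.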

Next, you can use WebClient instead of HttpWebRequest. Here's a start:

Imports System.Net 
Imports System.Text.RegularExpressions 

Public Class Form1 

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click 
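     ' Declare the page source outside the Using block so it is still in scope below.
     Dim rsssource As String 
     Using wc As New WebClient() 
      rsssource = wc.DownloadString("http://www.maxpreps.com/high-schools/abbeville-yellowjackets-(abbeville,al)/home.htm") 
     End Using 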

     Dim r As New Regex("<h1 id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Header"">(.*?)</h1>") 
     Dim r1 As New Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Mascot"">(.*?)</span>") 
     Dim r3 As New Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Colors"">(.*?)</span>") 
     Dim r4 As New Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_GenderType"">(.*?)</span>") 
     Dim r5 As New Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_AthleteDirectorGenericControl"">(.*?)</span>") 
     Dim r6 As New Regex("<address>(.*?)</address>") 
     Dim r7 As New Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Phone"">(.*?)</span>") 
     Dim r8 As New Regex("<span id=""ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_Fax"">(.*?)</span>") 

     Dim matches As MatchCollection = r.Matches(rsssource) 
     Dim matches1 As MatchCollection = r1.Matches(rsssource) 
     Dim matches3 As MatchCollection = r3.Matches(rsssource) 
     Dim matches4 As MatchCollection = r4.Matches(rsssource) 
     Dim matches5 As MatchCollection = r5.Matches(rsssource) 
     Dim matches6 As MatchCollection = r6.Matches(rsssource) 
     Dim matches7 As MatchCollection = r7.Matches(rsssource) 
     Dim matches8 As MatchCollection = r8.Matches(rsssource) 

     For Each itemcode As Match In matches 
      'ListBox1.Items.Add(itemcode.Value.Split("_").GetValue(4)) 
      'Use columns or something instead 
      ListBox1.Items.Add(itemcode.Groups(1).Value) 
     Next 

     For Each itemcode As Match In matches1 
      ListBox1.Items.Add(itemcode.Groups(1).Value) 
     Next 
    End Sub 
End Class 

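Next, give your regular expressions meaningful names, make them static (`Shared` in VB) and compiled (`RegexOptions.Compiled`) for efficiency, and consider not using regular expressions at all. Oh, and use an HTML parser instead.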
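As a rough sketch of the naming and "shared, compiled" advice, the patterns could live in one class as `Shared ReadOnly` fields. The class name `SchoolPatterns`, the `IdPrefix` constant, and the `FirstGroup` helper are hypothetical names introduced here for illustration. The element ids are the ones from the question and may not match the live page.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Public Class SchoolPatterns
    ' Prefix shared by the element ids used in the question (assumed unchanged).
    Private Const IdPrefix As String = "ctl00_NavigationWithContentOverRelated_ContentOverRelated_Header_"

    ' Shared + ReadOnly: each Regex is constructed once, not on every button click.
    ' RegexOptions.Compiled trades a slower first use for faster repeated matching.
    Public Shared ReadOnly SchoolName As New Regex("<h1 id=""" & IdPrefix & "Header"">(.*?)</h1>", RegexOptions.Compiled)
    Public Shared ReadOnly Mascot As New Regex("<span id=""" & IdPrefix & "Mascot"">(.*?)</span>", RegexOptions.Compiled)
    Public Shared ReadOnly Phone As New Regex("<span id=""" & IdPrefix & "Phone"">(.*?)</span>", RegexOptions.Compiled)
    Public Shared ReadOnly Fax As New Regex("<span id=""" & IdPrefix & "Fax"">(.*?)</span>", RegexOptions.Compiled)

    ' Returns the first capture group of the first match, or an empty string.
    Public Shared Function FirstGroup(ByVal pattern As Regex, ByVal source As String) As String
        Dim m As Match = pattern.Match(source)
        If m.Success Then
            Return m.Groups(1).Value.Trim()
        End If
        Return String.Empty
    End Function
End Class
```

A call such as `SchoolPatterns.FirstGroup(SchoolPatterns.Mascot, rsssource)` then yields one value per field, which maps naturally onto one spreadsheet column per field.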

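For the HTML parser route, one option is the third-party Html Agility Pack library. The original answer does not name a parser, so this choice is an assumption, and the package has to be added to the project separately. The sketch below parses a made-up inline fragment rather than the real page, and `ReadField` is a hypothetical helper.

```vbnet
Imports System
Imports HtmlAgilityPack

Module ParserSketch
    ' Looks an element up by id and returns its decoded text, or an empty string.
    Function ReadField(ByVal doc As HtmlAgilityPack.HtmlDocument, ByVal id As String) As String
        Dim node As HtmlNode = doc.GetElementbyId(id)
        If node Is Nothing Then
            Return String.Empty
        End If
        ' DeEntitize turns entities such as &amp; back into plain characters.
        Return HtmlEntity.DeEntitize(node.InnerText).Trim()
    End Function

    Sub Main()
        ' Hypothetical fragment; in the real program this would be the downloaded page.
        Dim html As String = "<div><span id=""Header_Mascot"">Yellowjackets</span></div>"

        Dim doc As New HtmlAgilityPack.HtmlDocument()
        doc.LoadHtml(html)

        Console.WriteLine(ReadField(doc, "Header_Mascot"))
    End Sub
End Module
```

Looking elements up by id keeps working when attribute order or whitespace in the markup changes, which is where the regex patterns tend to break. The type is fully qualified as `HtmlAgilityPack.HtmlDocument` because a Windows Forms project also has its own `HtmlDocument` class.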